Markus v2.0.0 released! Better metrics API for Python projects.

What is it?

Markus is a Python library for generating metrics.

Markus makes it easier to generate metrics in your program by:

  • providing multiple backends (Datadog statsd, statsd, logging, logging roll-up, and so on) for sending metrics data to different places
  • sending metrics to multiple backends at the same time
  • providing a testing framework for easy metrics generation testing
  • providing a decoupled architecture that makes it easier to write code that generates metrics without having to worry about whether a metrics client has been created and configured yet, similar to the Python logging module in this way
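To make the "decoupled, like the logging module" idea concrete, here's a toy sketch in plain Python. It is not Markus's implementation or API: `configure`, `get_metrics`, `Metrics`, and `ListBackend` are hypothetical names chosen for illustration. The point is that library code can grab a metrics handle at import time, and the application decides later which backends (possibly several at once) receive the data.

```python
# Toy sketch of a logging-style, decoupled metrics API. This is NOT Markus's
# implementation; configure, get_metrics, and ListBackend are hypothetical names.
from typing import List, Optional, Tuple

# Module-level list of backends; empty until the application calls configure().
_BACKENDS: list = []


def configure(backends: list) -> None:
    """Set the active backends, typically once at application startup."""
    _BACKENDS[:] = backends


class Metrics:
    """Handle that library code can create at import time."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def incr(self, stat: str, value: int = 1, tags: Optional[List[str]] = None) -> None:
        key = f"{self.prefix}.{stat}"
        # Backends are looked up when the metric is emitted, not when the
        # handle is created, so it doesn't matter which happens first.
        for backend in _BACKENDS:
            backend.emit(("incr", key, value, tags or []))


def get_metrics(prefix: str) -> Metrics:
    return Metrics(prefix)


class ListBackend:
    """Backend that keeps records in memory; handy for testing."""

    def __init__(self) -> None:
        self.records: List[Tuple] = []

    def emit(self, record: Tuple) -> None:
        self.records.append(record)


# In a library module: grab a handle without knowing how things are configured.
metrics = get_metrics("myapp")

# At application startup: send metrics to two backends at the same time.
first, second = ListBackend(), ListBackend()
configure([first, second])

metrics.incr("crashes", tags=["product:firefox"])
```

Both backends end up with the same `("incr", "myapp.crashes", 1, ["product:firefox"])` record, even though the handle was created before `configure()` ran.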

We use it at Mozilla on many projects.

v2.0.0 released!

I released v2.0.0 just now. Changes:


  • Use time.perf_counter() if available. Thank you, Mike! (#34)
  • Support Python 3.7 officially.
  • Add filters for adjusting and dropping metrics getting emitted. See documentation for more details. (#40)
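Here's a toy version of what "adjusting and dropping" metrics with filters can look like. This is not Markus's filter API (see its documentation for that); `Record`, `add_tag`, `drop_prefix`, and `apply_filters` are hypothetical names. Each filter takes a record and returns either a (possibly modified) record or `None` to drop it.

```python
# Hypothetical sketch of metric filters; not Markus's filter API.
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Record:
    stat_type: str
    key: str
    value: float
    tags: List[str] = field(default_factory=list)


# A filter returns an adjusted record, or None to drop the record entirely.
Filter = Callable[[Record], Optional[Record]]


def add_tag(tag: str) -> Filter:
    def _filter(record: Record) -> Record:
        # Build a new record rather than mutating the original one.
        return replace(record, tags=record.tags + [tag])
    return _filter


def drop_prefix(prefix: str) -> Filter:
    def _filter(record: Record) -> Optional[Record]:
        return None if record.key.startswith(prefix) else record
    return _filter


def apply_filters(record: Record, filters: List[Filter]) -> Optional[Record]:
    for fn in filters:
        record = fn(record)
        if record is None:  # a filter dropped it; skip the remaining filters
            return None
    return record
```

Running filters in order and stopping at the first `None` keeps the rule simple: a dropped metric never reaches any backend.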

Backwards incompatible changes

  • tags now defaults to [] instead of None which may affect some expected test output.

  • Adjust internals to run .emit() on backends. If you wrote your own backend, you may need to adjust it.

  • Drop support for Python 3.4. (#39)

  • Drop support for Python 2.7.

    If you're still using Python 2.7, you'll need to pin to <2.0.0. (#42)

Bug fixes

  • Document feature support in backends. (#47)
  • Fix MetricsMock.has_record() example. Thank you, John!

Where to go for more

Changes for this release:

Documentation and quickstart here:

Source code and issue tracker here:

Let me know whether this helps you!

Socorro Engineering: July 2019 happenings and putting it on hold


Socorro Engineering team covers several projects:

This blog post summarizes our activities in July.

Highlights of July

  • Socorro: Added modules_in_stack field to super search allowing people to search the set of module/debugid for functions that are in the stack of the crashing thread.

    This lets us reprocess crash reports that have modules for which symbols were just uploaded.

  • Socorro: Added PHC related fields, dom_fission_enabled, and bug_1541161 to super search.

  • Socorro: Fixed some things, further streamlining the local dev environment.

  • Socorro: Reformatted Python code with Black.

  • Socorro: Extracted supersearch and fetch-data commands as a separate Python library:

  • Tecken: Upgraded to Python 3.7 and adjusted storage bucket code to work better for multiple storage providers.

  • Tecken: Added GCS emulator for local development environment.

  • PollBot: Updated to use Buildhub2.

Hiatus and project changes

In April, we picked up Tecken, Buildhub, Buildhub2, and PollBot in addition to working on Socorro. Since then, we've:

  • audited Tecken, Buildhub, Buildhub2, and PollBot
  • updated all projects, updated dependencies, and performed other necessary maintenance
  • documented deploy procedures and basic runbooks
  • deprecated Buildhub in favor of Buildhub2 and updated projects to use Buildhub2

Buildhub is decommissioned now and is being dismantled.

We're passing Buildhub2 and PollBot off to another team. They'll take ownership of those projects going forward.

Socorro and Tecken are switching to maintenance mode as of last week. All Socorro/Tecken related projects are on hold. We'll continue to maintain the two sites doing "keep the lights on" type things:

  • granting access to memory dumps
  • adding new products
  • adding fields to super search
  • making changes to signature generation and updating siggen library
  • responding to outages
  • fixing security issues

All other non-urgent work will be pushed off.

As of August 1st, we've switched to Mozilla Location Services. We'll be auditing that project, getting it back into a healthy state, and bringing it in line with current standards and practices.

Given that, this is the last Socorro Engineering status post for a while.


crashstats-tools v1.0.1 released! cli for Crash Stats.

What is it?

crashstats-tools is a set of command-line tools for working with Crash Stats (

crashstats-tools comes with two commands:

  • supersearch: for performing Crash Stats Super Search queries
  • fetch-data: for fetching raw crash, dumps, and processed crash data for specified crash ids

v1.0.1 released!

I extracted two commands we have in the Socorro local dev environment as a separate Python project. This allows anyone to use those two commands without having to set up a Socorro local dev environment.

The audience for this is pretty limited, but I think it'll help significantly for testing analysis tools.

Say I'm working on an analysis tool that looks at crash report minidump files and does some additional analysis on them. I could use the supersearch command to get a list of crash ids to download data for and the fetch-data command to download the requisite data.

$ mkdir crashdata
$ supersearch --product=Firefox --num=10 | \
    fetch-data --raw --dumps --no-processed crashdata

Then I can run my tools on the dumps in crashdata/upload_file_minidump/.
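A minimal starting point for such an analysis tool might look like the sketch below. It assumes fetch-data writes each dump as its own file in crashdata/upload_file_minidump/; the only real-world fact it relies on is that minidump files begin with the 4-byte signature `MDMP`. `iter_minidumps` is a hypothetical helper name.

```python
# Skeleton for an analysis tool that walks dumps downloaded by fetch-data.
# Assumption: dumps live as individual files in <dir>/upload_file_minidump/.
from pathlib import Path
from typing import Iterator

# Every minidump file starts with the 4-byte signature "MDMP".
MINIDUMP_SIGNATURE = b"MDMP"


def iter_minidumps(crashdata_dir: str) -> Iterator[Path]:
    """Yield paths of files that look like minidumps, in sorted order."""
    dump_dir = Path(crashdata_dir) / "upload_file_minidump"
    for path in sorted(dump_dir.iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fp:
            if fp.read(4) == MINIDUMP_SIGNATURE:
                yield path
```

After the supersearch/fetch-data pipeline above, `for dump in iter_minidumps("crashdata"): ...` hands each dump to whatever analysis code you're writing, skipping anything that isn't a minidump.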

Be thoughtful about using data

Make sure to use these tools in compliance with our data policy:

Where to go for more

See the project on GitHub. It has a README that covers everything about the project, including examples of usage, plus the issue tracker and the source code:

Let me know whether this helps you!

Crash pings (Telemetry) and crash reports (Socorro/Crash Stats)

I keep getting asked questions that stem from confusion about crash pings and crash reports, the details of where they come from, differences between the two data sets, what each is currently good for, and possible future directions for work on both. I figured I'd write it all down.

This is a brain dump and sort of a blog post and possibly not a good version of either. I desperately wish it were more formal and mind-blowing like something written by Chutten or Alessio.

It's likely that this is 90% true today but as time goes on, things will change and it may be horribly wrong depending on how far in the future you're reading this. As I find out things are wrong, I'll keep notes. Any errors are my own.


We (Mozilla) have two different data sets for crashes: crash pings in Telemetry and crash reports in Socorro/Crash Stats. When Firefox crashes, the crash reporter collects information about the crash and this results in crash ping and crash report data. From there, the two kinds of data travel two different paths and end up in two different systems.

This blog post covers these two different journeys, their destinations, and the resulting properties of both data sets.

This blog post specifically talks about Firefox and not other products which have different crash reporting stories.


Updates and changes to this blog post since I wrote it:

  • 2019-07-04: Crash ping data is not publicly available. Blog post updated accordingly. Thank you, Chutten!
  • 2019-11-07: Added note that you can download symbols files using any http client and that the file you get back is gzipped. Thank you, Steven!


Firefox crashes. It happens.

The crash reporter kicks in. It uses the Breakpad library to collect data about the crashed process and package it up into a minidump. The minidump has information about the registers, what's in memory, the stack of the crashing thread, stacks of other threads, what modules are in memory, and so on.

Additionally, the crash reporter collects a set of annotations for the crash. Annotations like ProductName, Version, ReleaseChannel, BuildID and others help us group crashes for the same product and build.

The crash reporter assembles the portions of the crash report that don't have personally identifiable information (PII) in them into a crash ping. It uses minidump-analyzer to unwind the stack. The crash ping with this stack is sent via the crash ping sender to Telemetry.

If Telemetry is disabled, then the crash ping will not get sent to Telemetry.

The crash reporter will show a crash report dialog informing the user that Firefox crashed. The crash report dialog includes a comments field for additional data about the crash and an email field. The user can choose to send the crash report or not.

If the user chooses to send the crash report, then the crash report is sent via HTTP POST to the collector for the crash ingestion pipeline. The entire crash ingestion pipeline is called Socorro. The website part is called Crash Stats.

If the user chooses not to send the crash report, then the crash report never goes to Socorro.

If Firefox is unable to send the crash report, it keeps it on disk. It might ask the user to try to send it again later. The user can access about:crashes and send it explicitly.

Relevant backstory

What is symbolication?

Before we get too much further, let's talk about symbolication.

minidump-whatever will walk the stack starting with the top-most frame. It uses frame information to find the caller frame and works backwards to produce a list of frames. It also includes a list of modules that are in memory.

For example, part of the crash ping might look like this:

  {
    "modules": [
      ...
      {
        "debug_file": "xul.pdb",
        "base_addr": "0x7fecca50000",
        "version": "",
        "debug_id": "4E1555BE725E9E5C4C4C44205044422E1",
        "filename": "xul.dll",
        "end_addr": "0x7fed32a9000",
        "code_id": "5CF2591C6859000"
      },
      ...
    ],
    "threads": [
      {
        "frames": [
          {
            "trust": "context",
            "module_index": 8,
            "ip": "0x7feccfc3337"
          },
          {
            "trust": "cfi",
            "module_index": 8,
            "ip": "0x7feccfb0c8f"
          },
          {
            "trust": "cfi",
            "module_index": 8,
            "ip": "0x7feccfae0af"
          },
          {
            "trust": "cfi",
            "module_index": 8,
            "ip": "0x7feccfae1be"
          },
          ...
        ]
      },
      ...
    ]
  }

The "ip" is an instruction pointer.

The "module_index" is an index into the "modules" list, which covers all the modules that were in memory at the time.

The "trust" refers to how the stack unwinder figured out how to unwind that frame. Sometimes it doesn't have enough information and makes an educated guess.

Symbolication takes the module name, the module debug id, and the offset and looks it up with the symbols it knows about. So for the first frame, it'd do this:

  1. module index 8 is xul.dll
  2. get the symbols for xul.pdb debug id 4E1555BE725E9E5C4C4C44205044422E1 which is at
  3. figure out that 0x7feccfc3337 (ip) - 0x7fecca50000 (base addr for xul.pdb module) is 0x573337
  4. look up 0x573337 in the SYM file and I think that's nsTimerImpl::InitCommon(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const &,unsigned int,nsTimerImpl::Callback &&)
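Here's a minimal Python sketch of steps 3 and 4. It assumes a Breakpad-format SYM file, where `FUNC [m] <address> <size> <parameter_size> <name>` lines give each function's module-relative address range in hex. It only looks at FUNC records; a real symbolicator also uses PUBLIC and line records, inlines, and so on. The function names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def module_offset(ip: str, base_addr: str) -> int:
    """Step 3: offset of an instruction pointer from its module's base address."""
    return int(ip, 16) - int(base_addr, 16)


def parse_func_records(sym_text: str) -> List[Tuple[int, int, str]]:
    """Parse FUNC lines of a Breakpad SYM file into sorted (address, size, name)."""
    funcs = []
    for line in sym_text.splitlines():
        if not line.startswith("FUNC "):
            continue
        rest = line[len("FUNC "):]
        if rest.startswith("m "):  # optional "m" (multiple) marker
            rest = rest[len("m "):]
        # <address> <size> <parameter_size> <name>; names may contain spaces.
        address, size, _param_size, name = rest.split(" ", 3)
        funcs.append((int(address, 16), int(size, 16), name))
    funcs.sort()
    return funcs


def lookup_function(funcs: List[Tuple[int, int, str]], offset: int) -> Optional[str]:
    """Step 4: find the function whose address range contains the offset."""
    starts = [address for address, _, _ in funcs]
    i = bisect.bisect_right(starts, offset) - 1
    if i >= 0:
        address, size, name = funcs[i]
        if address <= offset < address + size:
            return name
    return None
```

With the frame above, `module_offset("0x7feccfc3337", "0x7fecca50000")` is `0x573337`, which then gets passed to `lookup_function` along with the parsed xul.pdb SYM file. Sorting by start address and bisecting keeps each lookup logarithmic, which matters because SYM files for xul are large.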

Symbolication does that for every frame and then we end up with a helpful symbolicated stack.


You can use wget, curl, or any http client to download symbols using urls like in step 2. One thing to know is that the file you get back is gzipped, so you'll need to un-gzip it to read it.
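If you'd rather do it from Python, a small sketch like this works. You supply the full URL (built like the one in step 2); `fetch_sym` and `decode_sym` are hypothetical helper names. Python's `urllib.request` doesn't decompress responses for you, so the helper checks for the gzip magic bytes and decompresses when they're present.

```python
import gzip
import urllib.request


def decode_sym(data: bytes) -> str:
    """Return SYM file text, un-gzipping first if the bytes are gzip data."""
    # gzip streams always start with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data.decode("utf-8", errors="replace")


def fetch_sym(url: str) -> str:
    """Download a symbols file and return its decoded text."""
    with urllib.request.urlopen(url) as resp:
        return decode_sym(resp.read())
```

Checking the magic bytes instead of trusting headers means the same code handles both gzipped and plain responses.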

Tecken has a symbolication API which takes the module and stack information in a minimal form and symbolicates using symbols it manages.

It takes a form like this:

  {
    "memoryMap": [
      ...
    ],
    "stacks": [
      [
        [0, 11723767],
        [1, 65802]
      ]
    ]
  }
This has two data structures. The first is a list of (module name, module debug id) pairs. The second is a list of stacks, where each stack is a list of (module index, memory offset) pairs and the module index points into the first list.
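Going from the crash ping shape to this shape is mostly bookkeeping. The sketch below assumes modules carry `debug_file`, `debug_id`, and `base_addr` and frames carry `module_index` and `ip`, as in the crash ping excerpt earlier. It only builds the payload; it doesn't send it, and `build_symbolication_request` is a hypothetical name. Frames with no module index are skipped for simplicity.

```python
from typing import Dict, List


def build_symbolication_request(modules: List[Dict], frames: List[Dict]) -> Dict:
    """Turn crash-ping-style modules and frames into a memoryMap/stacks payload."""
    # Keep every module in its original order so module_index values still line up.
    memory_map = [[m["debug_file"], m["debug_id"]] for m in modules]
    stack = []
    for frame in frames:
        index = frame.get("module_index")
        if index is None:
            continue  # frame isn't in a known module; nothing to look up
        # Offsets are relative to the module's base address, as in step 3 above.
        offset = int(frame["ip"], 16) - int(modules[index]["base_addr"], 16)
        stack.append([index, offset])
    return {"memoryMap": memory_map, "stacks": [stack]}
```

The result can be serialized with `json.dumps()` and sent to whichever symbolication endpoint you're using.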

What is Socorro-style signature generation?

Socorro has a signature generator that goes through the stack, normalizes the frames so that frames look the same across platforms, and then uses that to generate a "signature" for the crash that suggests a common cause for all the crash reports with that signature.

It's a fragile and finicky process. It works well for some things and poorly for others. There are other ways to generate signatures. This is the one that Socorro currently uses. We're constantly honing it.

I export Socorro's signature generation system as a Python library called siggen.

For examples of stacks -> signatures, look at crash reports on Crash Stats.
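To give a feel for the general shape of the process, here's a toy approximation in Python. This is not siggen's algorithm: the frame lists, the normalization rule, and the frame limit are all made-up assumptions. It normalizes each frame, skips "irrelevant" frames, keeps walking through generic "prefix" frames, and joins the result with " | " the way Socorro signatures look.

```python
import re
from typing import List

# Hypothetical example rules. siggen's real lists are far longer and more nuanced.
IRRELEVANT_FRAMES = [r"^RaiseException$", r"^KiFastSystemCallRet$"]
PREFIX_FRAMES = [r"^mozalloc_abort$", r"^moz_xmalloc$", r"^memcpy$"]
MAX_FRAMES = 5


def normalize(frame: str) -> str:
    """Drop the parameter list so "Foo::Bar(int, char*)" becomes "Foo::Bar"."""
    return re.sub(r"\(.*\)$", "", frame).strip()


def matches(frame: str, patterns: List[str]) -> bool:
    return any(re.match(pattern, frame) for pattern in patterns)


def toy_signature(frames: List[str]) -> str:
    """Build a signature from the top of a symbolicated stack."""
    parts: List[str] = []
    for raw in frames:
        frame = normalize(raw)
        if matches(frame, IRRELEVANT_FRAMES):
            continue  # never part of a signature
        parts.append(frame)
        # Prefix frames are too generic on their own, so keep walking the stack;
        # the first non-prefix frame (or the frame limit) ends the signature.
        if not matches(frame, PREFIX_FRAMES) or len(parts) >= MAX_FRAMES:
            break
    return " | ".join(parts)
```

Most of the finickiness lives in those lists and normalization rules, which is why they get adjusted as compilers and symbols change.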

What is it and where does it go?

Crash pings in Telemetry

Crash pings are only sent if Telemetry is enabled in Firefox.

The crash ping contains the stack for the crash, but little else about the crashed process. No register data, no memory data, nothing found on the heap.

The stack is unwound by minidump-analyzer in the client on the user's machine. Because of that, unwinding can use driver information on that machine, so for some kinds of crashes we may get a better stack than crash reports in Socorro.

Stacks in crash pings are not symbolicated.

There's an error aggregates data set generated from the crash pings which is used by Mission Control.

Crash reports in Socorro

Socorro does not get crash reports if the user chooses not to send a crash report.

Socorro collector discards crash reports for unsupported products.

Socorro collector throttles incoming crash reports for Firefox release channel--it accepts 10% of those for processing and rejects the other 90%.
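The collector-side rules above boil down to something like this sketch. It's a simplification, not the collector's actual code: `should_accept` and `SUPPORTED_PRODUCTS` are hypothetical, and the supported-products set is an illustrative subset. `ProductName` and `ReleaseChannel` are the crash annotations mentioned earlier.

```python
import random
from typing import Callable, Dict

# Hypothetical, illustrative subset of supported products.
SUPPORTED_PRODUCTS = {"Firefox"}
RELEASE_ACCEPT_RATE = 0.10  # keep 10% of Firefox release-channel crash reports


def should_accept(annotations: Dict[str, str],
                  rand: Callable[[], float] = random.random) -> bool:
    """Return True if the collector should keep the crash report for processing."""
    product = annotations.get("ProductName")
    if product not in SUPPORTED_PRODUCTS:
        return False  # unsupported products are discarded
    if product == "Firefox" and annotations.get("ReleaseChannel") == "release":
        return rand() < RELEASE_ACCEPT_RATE  # accept 10%, reject the other 90%
    return True  # other channels aren't sampled in this sketch
```

Passing the random source in as a parameter makes the sampling easy to test deterministically.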

The Socorro processor runs minidump-stackwalk on the minidump which unwinds the stack. Then it symbolicates the stack using symbols uploaded during the build process to

If we don't have symbols for modules, minidump-stackwalk will guess at the unwinding. This can work poorly for crashes that involve drivers and system libraries we don't have symbols for.

Crash pings vs. Crash reports

Because of the above, there are big differences in collection of crash data between the two systems and what you can do with it.

Representative of the real world

Because crash ping data doesn't require explicit consent by users on a crash-by-crash basis and crash pings are sent using the Telemetry infrastructure which is pretty resilient to network issues and other problems, crash ping data in Telemetry is likely more representative of crashes happening for our users.

Crash report data in Socorro is limited to what users explicitly send us. Further, there are cases where Firefox isn't able to run the crash reporter dialog to ask the user.

For example, on Friday, June 28th, 2019 for Firefox release channel:

  • Telemetry got 1,706,041 crash pings
  • Socorro processed 42,939 crash reports, so figure it got around 430,000 crash reports
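The estimate is just the processed count scaled back up by the 10% release-channel acceptance rate. Here's the back-of-the-envelope arithmetic, assuming that rate applied uniformly that day and ignoring reports the collector discarded for other reasons:

```python
crash_pings = 1_706_041       # Telemetry crash pings, Firefox release, 2019-06-28
processed_reports = 42_939    # crash reports Socorro processed that day
accept_rate_pct = 10          # collector keeps 10% of release-channel reports

# Scale the processed count back up to estimate how many reports arrived.
received_reports = processed_reports * 100 // accept_rate_pct
pings_per_report = crash_pings / received_reports

print(f"~{received_reports:,} crash reports received")
print(f"~{pings_per_report:.1f} crash pings per crash report")
```

That works out to about 429,390 crash reports, so Telemetry saw roughly four crash pings for every crash report Socorro received.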

Stack quality

A crash can have a different stack in the crash ping than in the crash report.

Crash ping data in Telemetry is unwound in the client. On Windows, minidump-analyzer can access CFI unwinding data, so the stacks can be better especially in cases where the stack contains system libraries and drivers.

We haven't implemented this yet on non-Windows platforms.

Crash report data in Socorro is unwound by the Socorro processor and is heavily dependent on what symbols we have available. It doesn't do a good job with unwinding through drivers and we often don't have symbols for Linux system libraries.

Gabriele says that for crashes on macOS and Linux, the stacks in crash reports are sometimes unwound better than what the crash ping contains.

Symbolication and signatures

Crash ping data is not symbolicated and we're not generating Socorro-style signatures, so it's hard to bucket them and see change in crash rates for specific crashes.

There's an fx-crash-sig Python library which has code to symbolicate crash ping stacks and generate a Socorro-style signature from that stack. This is helpful for one-off analysis but this is not a long-term solution.

Crash report data in Socorro is symbolicated and has Socorro-style signatures.

The consequence of this is that in Telemetry, we can look at crash rates for builds, but can't look at crash rates for specific kinds of crashes as bucketed by signatures.

The Signature Report and Top Crashers Report in Crash Stats can't be implemented in Telemetry (yet).


Telemetry has better tooling for analyzing crash ping data.

Crash ping data drives Mission Control.

Socorro's tooling is limited to Supersearch web ui and API which is ok at some things and not great at others. I've heard some people really like the Supersearch web ui.

There are parts of the crash report which are not searchable. For example, it's not possible to search for crash reports where a certain module is in the stack. Socorro has a signature report and a topcrashers page which help, but they're not flexible for answering questions outside of what we've explicitly coded them for.

Socorro sends a copy of processed crash reports to Telemetry and this is in the "socorro_crash" dataset.

PII and data access

Telemetry crash ping data does not contain PII. It is not publicly available, except in aggregate via Mission Control.

Socorro crash report data contains PII. A subset of the crash data is available to view and search by anyone. The PII data is restricted to users explicitly granted access to it. PII data includes user email addresses, user-provided comments, CPU register data, what else was in memory, and other things.

Data expiration

Telemetry crash ping data isn't expired, but I think that's changing at some point.

Socorro crash report data is kept for 6 months.

Data latency

Socorro data is near real-time. Crash reports get collected and processed and are available in searches and reports within a few minutes.

Crash ping data gets to Telemetry almost immediately.

Main ping data has some latency between when it's generated and when it is collected. This affects normalization numbers if you were looking at crash rates from crash ping data.

Derived data sets may have some latency depending on how they're generated.

Conclusions and future plans


Socorro is still good for deep dives into specific crash reports since it contains the full minidump and sometimes a user email address and user comments.

Socorro has Socorro-style signatures which make it possible to aggregate crash reports into signature buckets. Signatures are kind of fickle and we adjust how they're generated over time as compilers, symbols extraction, and other things change. We can build Signature Reports and Top Crasher reports and those are ok, but they use total counts and not rates.

I want to tackle switching from Socorro's minidump-stackwalk to minidump-analyzer so we're using the same stack walker in both places. I don't know when that will happen.

Socorro is going to GCP which means there will be different tools available for data analysis. Further, we may switch to BigQuery or some other data store that lets us search the stack. That'd be a big win.


Telemetry crash ping data is more representative of the real world, but stacks are not symbolicated and there's no signature generation, so you can't look at aggregates by cause.

Symbolication and signature generation of crash pings will get figured out at some point.

Work continues on Mission Control 2.0.

Telemetry is going to GCP which means there will be different tools available for data analysis.


At the All Hands, I had a few conversations about fixing tooling for both crash reports and crash pings so the resulting data sets were more similar and you could move from one to the other. For example, if you notice a troubling trend in the crash ping data, can you then go to Crash Stats and find crash reports to deep dive into?

I also had conversations around use cases. Which data set is better for answering certain questions?

We think having a guide that covers which data set is good for what kinds of analysis, tools to use, and how to move between the data sets would be really helpful.


Many thanks to everyone who helped with this: Will Lachance, W Chris Beard, Gabriele Svelto, Nathan Froyd, and Chutten.

Also, many thanks to Chutten and Alessio who write fantastic blog posts about Telemetry things. Those are gold.

Socorro Engineering: June 2019 happenings


Socorro Engineering team covers several projects:

This blog post summarizes our activities in June.

Highlights of June

  • Socorro: Fixed the collector's support of a single JSON-encoded field in the HTTP POST payload for crash reports. This is a big deal because we'll get less junk data in crash reports going forward.
  • Socorro: Reworked how Crash Stats manages featured versions: if the product defines a product_details/PRODUCTNAME.json file, it'll pull from that. Otherwise it calculates featured versions based on the crash reports it's received.
  • Buildhub: deprecated Buildhub in favor of Buildhub2. Current plan is to decommission Buildhub in July.
  • Across projects: Updated tons of dependencies that had security vulnerabilities. It was like a hamster wheel of updates, PRs, and deploys.
  • Tecken: Worked on GCS emulator for local dev environment.
  • All hands discussions:
    • GCP migration plan for Tecken: figuring out what needs to be done.
    • Possible GCP migration schedule for Tecken and Socorro.
    • Migrating applications using Buildhub to Buildhub2 and decommissioning Buildhub in July.
    • What would happen if we switched from Elasticsearch to BigQuery?
    • Switching from Socorro's minidump-stackwalk to minidump-analyzer.
    • Re-implementing the Socorro Top Crashers and Signature reports using Telemetry tools and data.
    • Writing a symbolicator and Socorro-style signature generator in Rust that can be used for crash reports in Socorro and crash pings in Telemetry.
    • The crash ping vs. crash report situation (blog post coming soon).


Socorro: May 2019 happenings


Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in May.


Socorro: April 2019 happenings


Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in April.


Socorro: March 2019 happenings


Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in March.


Code of conduct: supporting in projects

This week, Mozilla added PRs to all the repositories that Mozilla has on GitHub that aren't forks, Servo, or Rust. The PRs add a file and also include some instructions on what projects can do with it. This standardizes inclusion of the code of conduct text in all projects.

I'm a proponent of codes of conduct. I think they're really important. When I was working on Bleach with Greg, we added code of conduct text in September of 2017. We spent a bunch of time thinking about how to do that effectively and all the places that users might encounter Bleach.

I spent some time this week trying to figure out how to do what we did with Bleach in the context of the Mozilla standard. This blog post covers those thoughts.

This blog post covers Python-centric projects. Hopefully, some of this applies to other project types, too.

What we did in Bleach in 2017 and why

In September of 2017, Greg and I spent some time thinking about all the places the code of conduct text needs to show up and how to implement the text to cover as many of those as possible for Bleach.

PR #314 added two things:

  • a CODE_OF_CONDUCT.rst file
  • a copy of the text to the README

In doing this, the code of conduct shows up in the following places:

In this way, users could discover Bleach in a variety of different ways and it's very likely they'll see the code of conduct text before they interact with the Bleach community.

[1] It no longer shows up on the "new issue" page in GitHub. I don't know when that changed.

The Mozilla standard

The Mozilla standard applies to all repositories in Mozilla spaces on GitHub and is covered in the Repository Requirements wiki page.

It explicitly requires that you add a file with the specified text in it to the root of the repository.

This makes sure that all repositories for Mozilla things have a code of conduct specified and also simplifies the work they need to do to enforce the requirement and update the text over time.

This week, a bot added PRs to all repositories that didn't have this file. Going forward, the bot will continue to notify repositories that are missing the file and will update the file's text if it ever gets updated.

How to work with the Mozilla standard

Let's go back and talk about Bleach. We added a file and a blurb to the README and that covered the following places:

With the new standard, we only get this:

In order to make sure the file is in the source tarball, you have to make sure it gets added. The bot doesn't make any changes to fix this. You can use check-manifest to help make sure that's working. You might have to adjust your file or something else in your build pipeline--hence the maybe.
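As an extra check alongside check-manifest, you can look inside the built source tarball directly. This sketch assumes the file is named CODE_OF_CONDUCT.md (the name isn't spelled out above) and that the sdist is a .tar.gz whose members sit under a single `<project>-<version>/` directory, which is the usual layout. `sdist_contains` is a hypothetical helper.

```python
import tarfile


def sdist_contains(sdist_path: str, filename: str = "CODE_OF_CONDUCT.md") -> bool:
    """Return True if the sdist tarball has `filename` at its project root."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        for name in tar.getnames():
            # sdist members look like "<project>-<version>/<path>".
            parts = name.split("/", 1)
            if len(parts) == 2 and parts[1] == filename:
                return True
    return False
```

Running it against the tarball in dist/ after building the sdist (for example, in CI) catches the case where the bot's file exists in the repository but never makes it into the release.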

Because the Mozilla standard suggests they may change the text of the file, it's a terrible idea to copy the contents of the file around your repository because that's a maintenance nightmare--so that idea is out.

It's hard to include .md files in reStructuredText contexts. You can't just add this to the long description of the file and you can't include it in a Sphinx project [2].

Greg and I chatted about this a bit and I think the best solution is to add minimal text that points to the in GitHub to the README. Something like this:

Code of Conduct
===============

This project and repository is governed by Mozilla's code of conduct and
etiquette guidelines. For more details please see the `
file <>`_.

In Bleach, the long description set in includes the README:

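from pathlib import Path
from setuptools import setup


def get_long_desc():
    # Combine README.rst and CHANGES into the PyPI long description.
    desc = Path('README.rst').read_text(encoding='utf-8')
    desc += '\n\n'
    desc += Path('CHANGES').read_text(encoding='utf-8')
    return desc


setup(
    # other arguments elided
    description='An easy safelist-based HTML-sanitizing tool.',
    long_description=get_long_desc(),
)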

In Bleach, the index.rst of the docs also includes the README:

.. include:: ../README.rst


.. toctree::
   :maxdepth: 2


Indices and tables
==================

* :ref:`genindex`
* :ref:`search`

In this way, the README continues to have text about the code of conduct and the link goes to the file which is maintained by the bot. The README is included in the long description of so this code of conduct text shows up on the PyPI page. The README is included in the Sphinx docs so the code of conduct text shows up on the front page of the project documentation.

So now we've got code of conduct text pointing to the file in all these places:

Plus the text will get updated automatically by the bot as changes are made.


[2] You can have Markdown files in a Sphinx project. It's fragile and finicky and requires a specific version of Commonmark. I think this avenue is not worth it. If I had to do this again, I'd be more inclined to run the Markdown file through pandoc and then include the result.

Future possibilities

GitHub has a Community Insights page for each project. This is the one for Bleach. There's a section for "Code of conduct", but you only get a green checkmark if you use one of GitHub's pre-approved code of conduct files.

There's a discussion about that in their forums.

Is this checklist helpful to people? Does it mean something to have all these items checked off? Is there someone checking for this sort of thing? If so, then maybe we should get the Mozilla text approved?

Hope this helps!

I hope to roll this out for the projects I maintain on Monday.

I hope this helps you!

Socorro: February 2019 happenings


Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in February.
