Socorro Engineering: July 2019 happenings and putting it on hold

Summary

Socorro Engineering team covers several projects:

This blog post summarizes our activities in July.

Highlights of July

  • Socorro: Added modules_in_stack field to super search, allowing people to search on the set of module/debug id values for functions in the stack of the crashing thread. (See the example after this list.)

    This lets us reprocess crash reports that have modules for which symbols were just uploaded.

  • Socorro: Added PHC-related fields, dom_fission_enabled, and bug_1541161 to super search.

  • Socorro: Fixed several things to further streamline the local dev environment.

  • Socorro: Reformatted Python code with Black.

  • Socorro: Extracted supersearch and fetch-data commands as a separate Python library: https://github.com/willkg/crashstats-tools

  • Tecken: Upgraded to Python 3.7 and adjusted storage bucket code to work better for multiple storage providers.

  • Tecken: Added GCS emulator for local development environment.

  • PollBot: Updated to use Buildhub2.
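
As an example of what the new modules_in_stack field enables, here's a rough sketch of querying the Super Search API for crash reports that have a particular module/debug id pair in the stack of the crashing thread. The exact value format for modules_in_stack is an assumption here--check the Super Search API documentation.

import os

import requests

API_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"

params = {
    "product": "Firefox",
    # example module/debug id pair -- assumed value format
    "modules_in_stack": "xul.pdb/4E1555BE725E9E5C4C4C44205044422E1",
    "_results_number": 10,
}

headers = {}
if os.environ.get("CRASHSTATS_API_TOKEN"):
    # an API token is only needed for protected fields
    headers["Auth-Token"] = os.environ["CRASHSTATS_API_TOKEN"]

resp = requests.get(API_URL, params=params, headers=headers)
resp.raise_for_status()
for hit in resp.json()["hits"]:
    print(hit["uuid"], hit["signature"])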

Hiatus and project changes

In April, we picked up Tecken, Buildhub, Buildhub2, and PollBot in addition to working on Socorro. Since then, we've:

  • audited Tecken, Buildhub, Buildhub2, and PollBot
  • updated all projects, updated dependencies, and performed other necessary maintenance
  • documented deploy procedures and basic runbooks
  • deprecated Buildhub in favor of Buildhub2 and updated projects to use Buildhub2

Buildhub is decommissioned now and is being dismantled.

We're passing Buildhub2 and PollBot off to another team. They'll take ownership of those projects going forward.

Socorro and Tecken are switching to maintenance mode as of last week. All Socorro/Tecken related projects are on hold. We'll continue to maintain the two sites doing "keep the lights on" type things:

  • granting access to memory dumps
  • adding new products
  • adding fields to super search
  • making changes to signature generation and updating siggen library
  • responding to outages
  • fixing security issues

All other non-urgent work will be pushed off.

As of August 1st, we've switched to working on Mozilla Location Service. We'll be auditing that project, getting it back into a healthy state, and bringing it in line with current standards and practices.

Given that, this is the last Socorro Engineering status post for a while.

Read more…

crashstats-tools v1.0.1 released! CLI for Crash Stats.

What is it?

crashstats-tools is a set of command-line tools for working with Crash Stats (https://crash-stats.mozilla.org/).

crashstats-tools comes with two commands:

  • supersearch: for performing Crash Stats Super Search queries
  • fetch-data: for fetching raw crash data, dumps, and processed crash data for specified crash ids

v1.0.1 released!

I extracted two commands we have in the Socorro local dev environment as a separate Python project. This allows anyone to use those two commands without having to set up a Socorro local dev environment.

The audience for this is pretty limited, but I think it'll help significantly for testing analysis tools.

Say I'm working on an analysis tool that looks at crash report minidump files and does some additional analysis on them. I could use the supersearch command to get a list of crash ids to download data for and the fetch-data command to download the requisite data.

$ export CRASHSTATS_API_TOKEN=foo
$ mkdir crashdata
$ supersearch --product=Firefox --num=10 | \
    fetch-data --raw --dumps --no-processed crashdata

Then I can run my tools on the dumps in crashdata/upload_file_minidump/.
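
From there, an analysis script can just walk that directory. A minimal sketch, where run_analysis is a stand-in for whatever tool I'm testing:

from pathlib import Path

def run_analysis(minidump_path):
    # stand-in for whatever analysis tool is being tested
    print(f"analyzing {minidump_path.name}")

# fetch-data writes dumps into crashdata/upload_file_minidump/,
# one file per crash id
for minidump in sorted(Path("crashdata/upload_file_minidump").iterdir()):
    run_analysis(minidump)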

Be thoughtful about using data

Make sure to use these tools in compliance with our data policy:

https://crash-stats.mozilla.org/documentation/memory_dump_access/

Where to go for more

See the project on GitHub. The README covers everything about the project, including usage examples; the repository also holds the issue tracker and the source code:

https://github.com/willkg/crashstats-tools

Let me know whether this helps you!

Crash pings (Telemetry) and crash reports (Socorro/Crash Stats)

I keep getting asked questions that stem from confusion about crash pings and crash reports, the details of where they come from, differences between the two data sets, what each is currently good for, and possible future directions for work on both. I figured I'd write it all down.

This is a brain dump and sort of a blog post and possibly not a good version of either. I desperately wish it were more formal and mind-blowing like something written by Chutten or Alessio.

It's likely that this is 90% true today but as time goes on, things will change and it may be horribly wrong depending on how far in the future you're reading this. As I find out things are wrong, I'll keep notes. Any errors are my own.

Summary

We (Mozilla) have two different data sets for crashes: crash pings in Telemetry and crash reports in Socorro/Crash Stats. When Firefox crashes, the crash reporter collects information about the crash, and this results in crash ping and crash report data. From there, the two data sets travel different paths and end up in two different systems.

This blog post covers these two different journeys, their destinations, and the resulting properties of both data sets.

This blog post specifically talks about Firefox and not other products which have different crash reporting stories.

CRASH!

Firefox crashes. It happens.

The crash reporter kicks in. It uses the Breakpad library to collect data about the crashed process and package it up into a minidump. The minidump has information about the registers, what's in memory, the stack of the crashing thread, stacks of other threads, what modules are in memory, and so on.

Additionally, the crash reporter collects a set of annotations for the crash. Annotations like ProductName, Version, ReleaseChannel, BuildID and others help us group crashes for the same product and build.

The crash reporter assembles the portions of the crash report that don't have personally identifiable information (PII) in them into a crash ping. It uses minidump-analyzer to unwind the stack. The crash ping with this stack is sent via the crash ping sender to Telemetry.

If Telemetry is disabled, then the crash ping will not get sent to Telemetry.

The crash reporter will show a crash report dialog informing the user that Firefox crashed. The crash report dialog includes a comments field for additional data about the crash and an email field. The user can choose to send the crash report or not.

If the user chooses to send the crash report, then the crash report is sent via HTTP POST to the collector for the crash ingestion pipeline. The entire crash ingestion pipeline is called Socorro. The website part is called Crash Stats.

If the user chooses not to send the crash report, then the crash report never goes to Socorro.

If Firefox is unable to send the crash report, it keeps it on disk. It might ask the user to try to send it again later. The user can access about:crashes and send it explicitly.

Relevant backstory

What is symbolication?

Before we get too much further, let's talk about symbolication.

minidump-whatever will walk the stack starting with the top-most frame. It uses frame information to find the caller frame and works backwards to produce a list of frames. It also includes a list of modules that are in memory.

For example, part of the crash ping might look like this:

  "modules": [
    ...
    {
      "debug_file": "xul.pdb",
      "base_addr": "0x7fecca50000",
      "version": "69.0.0.7091",
      "debug_id": "4E1555BE725E9E5C4C4C44205044422E1",
      "filename": "xul.dll",
      "end_addr": "0x7fed32a9000",
      "code_id": "5CF2591C6859000"
    },
    ...
  ],
  "threads": [
    {
      "frames": [
        {
          "trust": "context",
          "module_index": 8,
          "ip": "0x7feccfc3337"
        },
        {
          "trust": "cfi",
          "module_index": 8,
          "ip": "0x7feccfb0c8f"
        },
        {
          "trust": "cfi",
          "module_index": 8,
          "ip": "0x7feccfae0af"
        },
        {
          "trust": "cfi",
          "module_index": 8,
          "ip": "0x7feccfae1be"
        },
...

The "ip" is an instruction pointer.

The "module_index" refers to another list of modules that were all in memory at the time.

The "trust" refers to how the stack unwinding figured out how to unwind that frame. Sometimes it doesn't have enough information and it does an educated guess.

Symbolication takes the module name, the module debug id, and the offset and looks it up with the symbols it knows about. So for the first frame, it'd do this:

  1. module index 8 is xul.dll
  2. get the symbols for xul.pdb debug id 4E1555BE725E9E5C4C4C44205044422E1 which is at https://symbols.mozilla.org/xul.pdb/4E1555BE725E9E5C4C4C44205044422E1/xul.sym
  3. figure out that 0x7feccfc3337 (ip) - 0x7fecca50000 (base addr for xul.pdb module) is 0x573337
  4. look up 0x573337 in the SYM file and I think that's nsTimerImpl::InitCommon(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const &,unsigned int,nsTimerImpl::Callback &&)

Symbolication does that for every frame and then we end up with a helpful symbolicated stack.
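
Here's that arithmetic as a rough Python sketch. The sym_lookup function is a stand-in for parsing the SYM file and finding the record whose address range covers the offset:

MODULES = {
    # module_index -> (filename, debug_file, debug_id, base_addr);
    # entry 8 is the xul.dll module from the example above
    8: ("xul.dll", "xul.pdb", "4E1555BE725E9E5C4C4C44205044422E1", 0x7FECCA50000),
}

def symbolicate_frame(frame, modules, sym_lookup):
    filename, debug_file, debug_id, base_addr = modules[frame["module_index"]]
    offset = int(frame["ip"], 16) - base_addr
    symbol = sym_lookup(debug_file, debug_id, offset)
    return filename, hex(offset), symbol

frame = {"trust": "context", "module_index": 8, "ip": "0x7feccfc3337"}
print(symbolicate_frame(frame, MODULES, lambda *args: "<symbol goes here>"))
# -> ('xul.dll', '0x573337', '<symbol goes here>')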

Tecken has a symbolication API which takes the module and stack information in a minimal form and symbolicates using symbols it manages.

It takes a form like this:

{
  "memoryMap": [
    [
      "xul.pdb",
      "44E4EC8C2F41492B9369D6B9A059577C2"
    ],
    [
      "wntdll.pdb",
      "D74F79EB1F8D4A45ABCD2F476CCABACC2"
    ]
  ],
  "stacks": [
    [
      [0, 11723767],
      [1, 65802]
    ]
  ]
}

This has two data structures. The first is a list of (module name, module debug id) tuples. The second is a list of stacks, where each stack is a list of (module index, module offset) tuples.
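
For illustration, here's what a request to that API might look like from Python. I'm assuming the /symbolicate/v5 endpoint on symbols.mozilla.org here--check the Tecken docs for the current URL and response format:

import requests

payload = {
    "memoryMap": [
        ["xul.pdb", "44E4EC8C2F41492B9369D6B9A059577C2"],
        ["wntdll.pdb", "D74F79EB1F8D4A45ABCD2F476CCABACC2"],
    ],
    "stacks": [
        [
            [0, 11723767],
            [1, 65802],
        ]
    ],
}

# assumed endpoint -- see the Tecken docs for the current symbolication URL
resp = requests.post("https://symbols.mozilla.org/symbolicate/v5", json=payload)
resp.raise_for_status()
print(resp.json())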

What is Socorro-style signature generation?

Socorro has a signature generator that goes through the stack, normalizes the frames so that frames look the same across platforms, and then uses that to generate a "signature" for the crash that suggests a common cause for all the crash reports with that signature.

It's a fragile and finicky process. It works well for some things and poorly for others. There are other ways to generate signatures. This is the one that Socorro currently uses. We're constantly honing it.
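
To give a flavor of the idea, here's a toy illustration in Python--this is not siggen's actual normalization rules or skip lists, just the shape of the process:

# toy illustration only -- not siggen's actual rules or lists
SKIP_FRAMES = {"RaiseException", "KiFastSystemCallRet", "NtWaitForSingleObject"}

def normalize(frame):
    # real normalization strips template arguments, collapses platform
    # differences, deals with missing symbols, and lots more
    return frame.strip()

def toy_signature(frames, max_frames=5):
    interesting = [
        normalize(frame) for frame in frames
        if normalize(frame) not in SKIP_FRAMES
    ]
    return " | ".join(interesting[:max_frames])

print(toy_signature([
    "RaiseException",
    "mozilla::ipc::MessageChannel::OnChannelErrorFromLink()",
    "mozilla::dom::ContentChild::ProcessingError(...)",
]))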

I export Socorro's signature generation system as a Python library called siggen.

For examples of stacks -> signatures, look at crash reports on Crash Stats.

What is it and where does it go?

Crash pings in Telemetry

Crash pings are only sent if Telemetry is enabled in Firefox.

The crash ping contains the stack for the crash, but little else about the crashed process. No register data, no memory data, nothing found on the heap.

The stack is unwound by minidump-analyzer in the client on the user's machine. Because of that, the unwinder can use driver information available there, so for some kinds of crashes we may get a better stack than the corresponding crash report in Socorro.

Stacks in crash pings are not symbolicated.

There's an error aggregates data set generated from the crash pings which is used by Mission Control.

Crash reports in Socorro

Socorro does not get crash reports if the user chooses not to send a crash report.

Socorro collector discards crash reports for unsupported products.

Socorro collector throttles incoming crash reports for Firefox release channel--it accepts 10% of those for processing and rejects the other 90%.

The Socorro processor runs minidump-stackwalk on the minidump which unwinds the stack. Then it symbolicates the stack using symbols uploaded during the build process to symbols.mozilla.org.

If we don't have symbols for modules, minidump-stackwalk will guess at the unwinding. This can work poorly for crashes that involve drivers and system libraries we don't have symbols for.

Crash pings vs. Crash reports

Because of the above, there are big differences in collection of crash data between the two systems and what you can do with it.

Representative of the real world

Because crash ping data doesn't require explicit consent by users on a crash-by-crash basis and crash pings are sent using the Telemetry infrastructure which is pretty resilient to network issues and other problems, crash ping data in Telemetry is likely more representative of crashes happening for our users.

Crash report data in Socorro is limited to what users explicitly send us. Further, there are cases where Firefox isn't able to run the crash reporter dialog to ask the user.

For example, on Friday, June 28th, 2019 for Firefox release channel:

  • Telemetry got 1,706,041 crash pings
  • Socorro processed 42,939 crash reports, so figure it received around 430,000 crash reports given the 10% throttle

Stack quality

A crash can have a different stack in the crash ping than in the crash report.

Crash ping data in Telemetry is unwound in the client. On Windows, minidump-analyzer can access CFI unwinding data, so the stacks can be better especially in cases where the stack contains system libraries and drivers.

We haven't implemented this yet on non-Windows platforms.

Crash report data in Socorro is unwound by the Socorro processor and is heavily dependent on what symbols we have available. It doesn't do a good job with unwinding through drivers and we often don't have symbols for Linux system libraries.

Gabriele says that for crashes on macOS and Linux, Socorro sometimes unwinds stacks better than what the crash ping contains.

Symbolication and signatures

Crash ping data is not symbolicated and we're not generating Socorro-style signatures, so it's hard to bucket them and see change in crash rates for specific crashes.

There's an fx-crash-sig Python library which has code to symbolicate crash ping stacks and generate a Socorro-style signature from that stack. This is helpful for one-off analysis but this is not a long-term solution.

Crash report data in Socorro is symbolicated and has Socorro-style signatures.

The consequence of this is that in Telemetry, we can look at crash rates for builds, but can't look at crash rates for specific kinds of crashes as bucketed by signatures.

The Signature Report and Top Crashers Report in Crash Stats can't be implemented in Telemetry (yet).

Tooling

Telemetry has better tooling for analyzing crash ping data.

Crash ping data drives Mission Control.

Socorro's tooling is limited to the Super Search web UI and API, which are ok at some things and not great at others. I've heard some people really like the Super Search web UI.

There are parts of the crash report which are not searchable. For example, it's not possible to search for crash reports where a certain module is in the stack. Socorro has a signature report and a topcrashers page which help, but they're not flexible for answering questions outside of what we've explicitly coded them for.

Socorro sends a copy of processed crash reports to Telemetry and this is in the "socorro_crash" dataset.

PII and data access

Telemetry crash ping data does not contain PII. It is not publicly available, except in aggregate via Mission Control.

Socorro crash report data contains PII. A subset of the crash data is available to view and search by anyone. The PII data is restricted to users explicitly granted access to it. PII data includes user email addresses, user-provided comments, CPU register data, what else was in memory, and other things.

Data expiration

Telemetry crash ping data isn't expired, but I think that's changing at some point.

Socorro crash report data is kept for 6 months.

Data latency

Socorro data is near real-time. Crash reports get collected and processed and are available in searches and reports within a few minutes.

Crash ping data gets to Telemetry almost immediately.

Main ping data has some latency between when it's generated and when it's collected. This affects normalization numbers if you're looking at crash rates from crash ping data.

Derived data sets may have some latency depending on how they're generated.

Conclusions and future plans

Socorro

Socorro is still good for deep dives into specific crash reports since it contains the full minidump and sometimes a user email address and user comments.

Socorro has Socorro-style signatures which make it possible to aggregate crash reports into signature buckets. Signatures are kind of fickle and we adjust how they're generated over time as compilers, symbols extraction, and other things change. We can build Signature Reports and Top Crasher reports and those are ok, but they use total counts and not rates.

I want to tackle switching from Socorro's minidump-stackwalk to minidump-analyzer so we're using the same stack walker in both places. I don't know when that will happen.

Socorro is going to GCP which means there will be different tools available for data analysis. Further, we may switch to BigQuery or some other data store that lets us search the stack. That'd be a big win.

Telemetry

Telemetry crash ping data is more representative of the real world, but stacks are not symbolicated and there's no signature generation, so you can't look at aggregates by cause.

Symbolication and signature generation of crash pings will get figured out at some point.

Work continues on Mission Control 2.0.

Telemetry is going to GCP which means there will be different tools available for data analysis.

Together

At the All Hands, I had a few conversations about fixing tooling for both crash reports and crash pings so the resulting data sets were more similar and you could move from one to the other. For example, if you notice a troubling trend in the crash ping data, can you then go to Crash Stats and find crash reports to deep dive into?

I also had conversations around use cases. Which data set is better for answering certain questions?

We think having a guide that covers which data set is good for what kinds of analysis, tools to use, and how to move between the data sets would be really helpful.

Thanks!

Many thanks to everyone who helped with this: Will Lachance, W Chris Beard, Gabriele Svelto, Nathan Froyd, and Chutten.

Also, many thanks to Chutten and Alessio who write fantastic blog posts about Telemetry things. Those are gold.

Updates

  • 2019-07-04: Crash ping data is not publicly available. Blog post updated accordingly. Thanks, Chutten!

Socorro Engineering: June 2019 happenings

Summary

Socorro Engineering team covers several projects:

This blog post summarizes our activities in June.

Highlights of June

  • Socorro: Fixed the collector's support of a single JSON-encoded field in the HTTP POST payload for crash reports. This is a big deal because we'll get less junk data in crash reports going forward.
  • Socorro: Reworked how Crash Stats manages featured versions: if the product defines a product_details/PRODUCTNAME.json file, it'll pull from that. Otherwise it calculates featured versions based on the crash reports it's received.
  • Buildhub: deprecated Buildhub in favor of Buildhub2. Current plan is to decommission Buildhub in July.
  • Across projects: Updated tons of dependencies that had security vulnerabilities. It was like a hamster wheel of updates, PRs, and deploys.
  • Tecken: Worked on GCS emulator for local dev environment.
  • All hands discussions:
    • GCP migration plan for Tecken and figure out what needs to be done.
    • Possible GCP migration schedule for Tecken and Socorro.
    • Migrating applications using Buildhub to Buildhub2 and decommissioning Buildhub in July.
    • What would happen if we switched from Elasticsearch to BigQuery?
    • Switching from Socorro's minidump-stackwalk to minidump-analyzer.
    • Re-implementing the Socorro Top Crashers and Signature reports using Telemetry tools and data.
    • Writing a symbolicator and Socorro-style signature generator in Rust that can be used for crash reports in Socorro and crash pings in Telemetry.
    • The crash ping vs. crash report situation (blog post coming soon).

Read more…

Socorro: May 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in May.

Read more…

Socorro: April 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in April.

Read more…

Socorro: March 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in March.

Read more…

Code of conduct: supporting in projects

CODE_OF_CONDUCT.md

This week, Mozilla added PRs to all of its repositories on GitHub that aren't forks, Servo, or Rust. The PRs add a CODE_OF_CONDUCT.md file and also include some instructions on what projects can do with it. This standardizes inclusion of the code of conduct text in all projects.

I'm a proponent of codes of conduct. I think they're really important. When I was working on Bleach with Greg, we added code of conduct text in September of 2017. We spent a bunch of time thinking about how to do that effectively and all the places that users might encounter Bleach.

I spent some time this week trying to figure out how to do what we did with Bleach in the context of the Mozilla standard. This blog post covers those thoughts.

This blog post covers Python-centric projects. Hopefully, some of this applies to other project types, too.

What we did in Bleach in 2017 and why

In September of 2017, Greg and I spent some time thinking about all the places the code of conduct text needs to show up and how to implement the text to cover as many of those as possible for Bleach.

PR #314 added two things:

  • a CODE_OF_CONDUCT.rst file
  • a copy of the text to the README

In doing this, the code of conduct shows up in the following places:

In this way, users could discover Bleach in a variety of ways, and it's very likely they'll see the code of conduct text before they interact with the Bleach community.

[1] It no longer shows up on the "new issue" page in GitHub. I don't know when that changed.

The Mozilla standard

The Mozilla standard applies to all repositories in Mozilla spaces on GitHub and is covered in the Repository Requirements wiki page.

It explicitly requires that you add a CODE_OF_CONDUCT.md file with the specified text in it to the root of the repository.

This makes sure that all repositories for Mozilla things have a code of conduct specified and also simplifies the work they need to do to enforce the requirement and update the text over time.

This week, a bot added PRs to all repositories that didn't have this file. Going forward, the bot will continue to notify repositories that are missing the file and will update the file's text if it ever gets updated.

How to work with the Mozilla standard

Let's go back and talk about Bleach. We added a file and a blurb to the README and that covered the following places:

With the new standard, we only get this:

In order to make sure the file is in the source tarball, you have to make sure it gets added. The bot doesn't make any changes to fix this. You can use check-manifest to help make sure that's working. You might have to adjust your MANIFEST.in file or something else in your build pipeline--hence the maybe.
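
For a setuptools-based project, that usually boils down to a line like this in the MANIFEST.in file (a sketch--the details depend on your build setup); running check-manifest will then tell you whether the repository and the sdist disagree:

include CODE_OF_CONDUCT.md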

Because the Mozilla standard suggests they may change the text of the CODE_OF_CONDUCT.md file, it's a terrible idea to copy the contents of the file around your repository because that's a maintenance nightmare--so that idea is out.

It's hard to include .md files in reStructuredText contexts. You can't just add this to the long description of the setup.py file and you can't include it in a Sphinx project [2].

Greg and I chatted about this a bit and I think the best solution is to add minimal text that points to the CODE_OF_CONDUCT.md in GitHub to the README. Something like this:

Code of Conduct
===============

This project and repository is governed by Mozilla's code of conduct and
etiquette guidelines. For more details please see the `CODE_OF_CONDUCT.md
file <https://github.com/mozilla/bleach/blob/master/CODE_OF_CONDUCT.md>`_.

In Bleach, the long description set in setup.py includes the README:

def get_long_desc():
    desc = codecs.open('README.rst', encoding='utf-8').read()
    desc += '\n\n'
    desc += codecs.open('CHANGES', encoding='utf-8').read()
    return desc

...

setup(
    name='bleach',
    version=get_version(),
    description='An easy safelist-based HTML-sanitizing tool.',
    long_description=get_long_desc(),
    ...

In Bleach, the index.rst of the docs also includes the README:

.. include:: ../README.rst

Contents
========

.. toctree::
   :maxdepth: 2

   clean
   linkify
   goals
   dev
   changes


Indices and tables
==================

* :ref:`genindex`
* :ref:`search`

In this way, the README continues to have text about the code of conduct and the link goes to the file which is maintained by the bot. The README is included in the long description of setup.py so this code of conduct text shows up on the PyPI page. The README is included in the Sphinx docs so the code of conduct text shows up on the front page of the project documentation.

So now we've got code of conduct text pointing to the CODE_OF_CONDUCT.md file in all these places:

Plus the text will get updated automatically by the bot as changes are made.

Excellent!

[2] You can have Markdown files in a Sphinx project. It's fragile and finicky and requires a specific version of Commonmark. I think this avenue is not worth it. If I had to do this again, I'd be more inclined to run the Markdown file through pandoc and then include the result.

Future possibilities

GitHub has a Community Insights page for each project. This is the one for Bleach. There's a section for "Code of conduct", but you only get a green checkmark if you use one of GitHub's pre-approved code of conduct files.

There's a discussion about that in their forums.

Is this checklist helpful to people? Does it mean something to have all these items checked off? Is there someone checking for this sort of thing? If so, then maybe we should get the Mozilla text approved?

Hope this helps!

I hope to roll this out for the projects I maintain on Monday.

I hope this helps you!

Socorro: February 2019 happenings

Summary

Socorro is the crash ingestion pipeline for Mozilla's products like Firefox. When Firefox crashes, the crash reporter collects data about the crash, generates a crash report, and submits that report to Socorro. Socorro saves the crash report, processes it, and provides an interface for aggregating, searching, and looking at crash reports.

This blog post summarizes Socorro activities in February.

Read more…

Bleach: stepping down as maintainer

What is it?

Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.

I'm stepping down

In October 2015, I had a conversation with James Socol that resulted in me picking up Bleach maintenance from him. That was a little over 3 years ago. In that time, I:

  • did 12 releases
  • improved the tests: switched from nose to pytest, added test coverage for all supported versions of Python and html5lib, added regression tests for XSS strings in the OWASP Testing Guide 4.0 appendix
  • worked with Greg to add browser testing for cleaned strings
  • improved documentation: added docstrings, added lots of examples, added automated testing of examples, improved copy
  • worked with Jannis to implement a security bug disclosure policy
  • improved performance (Bleach v2.0 released!)
  • switched to semver so the version number was more meaningful
  • did a rewrite to work with the extensive html5lib API changes
  • spent a couple of years dealing with the regressions from the rewrite
  • stepped up as maintainer for html5lib and did a 1.0 release
  • added support for Python 3.6 and 3.7

I accomplished a lot.

A retrospective on OSS project maintenance

I'm really proud of the work I did on Bleach. I took a great project and moved it forward in important and meaningful ways. Bleach is used by a ton of projects in the Python ecosystem. You have likely benefitted from my toil.

While I used Bleach on projects like SUMO and Input years ago, I wasn't really using Bleach on anything while I was a maintainer. I picked up maintenance of the project because I was familiar with it, James really wanted to step down, and Mozilla was using it on a bunch of sites--I picked it up because I felt an obligation to make sure it didn't drop on the floor and I knew I could do it.

I never really liked working on Bleach. The problem domain is a total fucking pain-in-the-ass. Parsing HTML like a browser--oh, but not exactly like a browser because we want the output of parsing to be as much like the input as possible, but as safe. Plus, have you seen XSS attack strings? Holy moly! Ugh!

Anyhow, so I did a bunch of work on a project I don't really use, but felt obligated to make sure it didn't fall on the floor, that has a pain-in-the-ass problem domain. I did that for 3+ years.

Recently, I had a conversation with Osmose that made me rethink that. Why am I spending my time and energy on this?

Does it further my career? I don't think so. Time will tell, I suppose.

Does it get me fame and glory? No.

Am I learning while working on this? I learned a lot about HTML parsing. I have scars. It's so crazy what browsers are doing.

Is it a community through which I'm meeting other people and creating friendships? Sort of. I like working with James, Jannis, and Greg. But I interact and work with them on non-Bleach things, too, so Bleach doesn't help here.

Am I getting paid to work on it? Not really. I did some of the work on work-time, but I should have been using that time to improve my skills and my career. So, yes, I spent some work-time on it, but it's not a project I've been tasked with to work on. For the record, I work on Socorro which is the Mozilla crash-ingestion pipeline. I don't use Bleach on that.

Do I like working on it? No.

Seems like I shouldn't be working on it anymore.

I moved Bleach forward significantly. I did a great job. I don't have any half-finished things to do. It's at a good stopping point. It's a good time to thank everyone and get off the stage.

What happens to Bleach?

I'm stepping down without working on what comes next. I think Greg is going to figure that out.

Thank you!

Jannis was a co-maintainer at the beginning because I didn't want to maintain it alone. Jannis stepped down and Greg joined. Both Jannis and Greg were a tremendous help and fantastic people to work with. Thank you!

Sam Snedders helped me figure out a ton of stuff with how Bleach interacts with html5lib. Sam was kind enough to deputize me as a temporary html5lib maintainer to get 1.0 out the door. I really appreciated Sam putting faith in me. Conversations about the particulars of HTML parsing--I'll miss those. Thank you!

While James wasn't maintaining Bleach anymore, he always took the time to answer questions I had. His historical knowledge, guidance, and thoughtfulness were crucial. James was my manager for a while. I miss him. Thank you!

There were a handful of people who contributed patches, too. Thank you!

Thank your maintainers!

My experience from 20 years of OSS projects is that many people are in similar situations: continuing to maintain something because of internal obligations long after they're getting any value from the project.

Take care of the maintainers of the projects you use! You can't thank them enough for their time, their energy, their diligence, their help! Not just the big successful projects, but also the one-person projects, too.

Shout-out for PyCon 2019 maintainers summit

Sumana mentioned that PyCon 2019 has a maintainers summit. That looks fantastic! If you're in the doldrums of maintaining an OSS project, definitely go if you can.

Changes to this blog post

Update March 2, 2019: I completely forgot to thank Sam Snedders which is a really horrible omission. Sam's the best!