html5lib-python 1.0 released!: retrospective (2017)

Note: This is an old post in a blog with a lot of posts over a long span of time. The world has changed, technologies have changed, and I've changed. It's likely this is out of date, the code doesn't work, the ideas haven't aged well, or the ideas were terrible to begin with. Let me know if you think this is something that needs updating.

Project

time:

3 months

impact:
  • reduced technical debt and maintenance friction for html5lib which impacts a variety of projects like PyPI, pip, readme_renderer, Jupyter, TensorFlow

  • reduced security risks for Bleach

html5lib-python v1.0 released!

Yesterday, Sam released html5lib 1.0 [1]! The changes aren't wildly interesting for users, but are important for the health of the project.

The more interesting part for me is how the release happened and experimenting with interim maintainers to get projects going again. I'm going to spend the rest of this post talking about that.

The story of Bleach and html5lib

I work on Bleach which is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML. It relies heavily on another library called html5lib-python. Most of the work that I do on Bleach consists of figuring out how to make html5lib do what I need it to do.

Over the last few years, maintainers of the html5lib library have been working towards a 1.0. Those well-meaning efforts got them into a versioning model which had some unenthusing properties. I would often talk to people about how I was having difficulties with Bleach and html5lib 0.99999999 (8 9s) and I'd have to mentally count how many 9s I had said. It was goofy [2].

In an attempt to deal with the effects of the versioning, there's a parallel set of versions that start with 1.0b. Because there are two sets of versions, it was a total pain in the ass to correctly specify which versions of html5lib Bleach worked with.
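To make the ordering problem concrete, here's a rough sketch of how those two version series sort relative to each other. The version_key helper is hypothetical and handles only dotted integers with an optional "bN" suffix, which is a small subset of what real Python version parsing covers. The "1.0b1" string is just an illustrative stand-in for the 1.0b series.

```python
import re


def version_key(version):
    """Return a sortable key for versions like '0.99999999' or '1.0b1'.

    Hypothetical, simplified helper: it only understands dotted integers
    with an optional 'bN' beta suffix. It is not a full version parser.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:b(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    beta = match.group(2)
    # A beta sorts before the final release that has the same number.
    stage = (0, int(beta)) if beta is not None else (1, 0)
    return (release, stage)


# 8 9s, 10 9s, an illustrative beta, and the final release.
versions = ["1.0", "0.9999999999", "1.0b1", "0.99999999"]

print(sorted(versions, key=version_key))
# ['0.99999999', '0.9999999999', '1.0b1', '1.0']
```

Each extra 9 is a bigger integer after the dot, so it sorts later, and every 1.0b version sorts after all of the 0.999... versions but before 1.0. Any version range for Bleach had to account for both series at once.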

While working on Bleach 2.0, I bumped into a few bugs and upstreamed a patch for at least one of them. That patch sat in the PR queue for months. That's what got me wondering--what's going on with html5lib?

I tracked down Sam and talked with her a bit on IRC. She seems to be the only active maintainer. She was really busy with other things, html5lib doesn't pay, there's a ton of stuff to do, she's burned out, and recently there have been spates of negative comments in the issues and PRs. Generally the project had a lot of stop energy.

Some time in August, I offered to step up as an interim maintainer and shepherd html5lib to 1.0. The goals being:

  1. land or close as many old PRs as possible

  2. triage, fix, and close as many issues as possible

  3. clean up testing and CI

  4. clean up documentation

  5. ship 1.0 which ends the versioning issues

Thoughts on being an interim maintainer

I see a lot of open source projects that are in trouble in the sense that they don't have a critical mass of people and energy. When the sole part-time volunteer maintainer burns out, the project languishes. Then users show up, complain, demand changes, and talk about how horrible the situation is and everyone should be ashamed. It's tough--people are frustrated and then do a bunch of things that make everything so much worse. How do projects escape the raging inferno death spiral?

For a while now, I've been thinking about a model for open source projects where someone else pops in as an interim maintainer for a short period of time with specific goals, works to achieve those goals, and then steps down. Maybe this alleviates users' frustrations? Maybe this gives the part-time volunteer burned-out maintainer a breather? Maybe this can get the project moving again? Maybe the temporary interim maintainer can make some of the hard decisions that a regular long-term maintainer just can't?

I wondered if I should try that model out here. In the process of convincing myself that stepping up as an interim maintainer was a good idea [3], I looked at projects that rely on html5lib [4]:

  • pip vendors it

  • Bleach relies upon it heavily, so anything that uses Bleach uses html5lib (Jupyter, hypermark, readme_renderer, TensorFlow, ...)

  • anything that uses readme_renderer like PyPI and tools around Python packages

  • most web browsers (Firefox, Chrome, Servo, etc.) have it in their repositories because web-platform-tests uses it

I talked with Sam and offered to step up with these goals in mind.

I started with cleaning up the milestones in GitHub. I decided the 0.9999999999 (10 9s) milestone was going to be 1.0. I bumped everything from the 0.9999999999 (10 9s) milestone to the 1.0 milestone. I went through all the issues and PRs and threw any that piqued my interest in the 1.0 milestone bucket.

Then I went through the issue tracker and triaged all the issues. I tried to get steps to reproduce and any other data that would help resolve the issue. I closed some issues I didn't think would ever get resolved.

I triaged all the pull requests. Some of them had been open for a long time. I apologized to people who had spent their time to upstream a fix that sat around for years. In some cases, the changes had bitrotted so severely that they had to be redone [5].

Then I plugged away at issues and pull requests for a couple of months, landing and fixing, and pushed anything out of the milestone that wasn't well-defined or that we couldn't fix in a week.

At the end of all that, Sam released version 1.0 and here we are today!

Conclusion and more thoughts

I finished up as interim maintainer for html5lib. I don't think I'm going to continue actively as a maintainer. Yes, Bleach uses it, but I've got other things I should be doing.

I think this was an interesting experiment. I also think it was a successful experiment with regard to achieving my stated goals, but I don't know if it gave the project much momentum to continue forward.

I'd love to see other examples of interim maintainers stepping up, achieving specific goals, and then stepping down again. Does it bring in new people to the community? Does it affect the raging inferno death spiral at all? What kinds of projects would benefit from this the most? What kinds of projects wouldn't benefit at all?

Want to comment? Send an email to willkg at bluesock dot org. Include the url for the blog entry in your comment so I have some context as to what you're talking about.