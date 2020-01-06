While there are good reasons for why 2019 was crazy, it was soooo crazy. Some highlights:

I released Everett v1.0.0. Everett is a Python library for managing configuration. It's similar to other libraries, but it includes support for documenting configuration and for testing with configuration, which makes developing and using projects that rely on Everett a lot easier. I released version 1.0.0 in January.

I reimplemented crontabber in Socorro. Socorro has a scheduled tasks system and relied on a library called crontabber. crontabber was initially part of Socorro and was extracted so other people could use it. The crontabber code wasn't well maintained and it had a lot of issues. I decided it was easier and more convenient to rewrite it as Django management commands than to take on maintaining the crontabber library. So I did.

I stepped down as the Bleach maintainer. Bleach is a Python library that makes user-provided content safe in an HTML context. It's used in a lot of places. It's slow, it's a difficult and complex problem domain, it's finicky and fragile, and it relies on another library called html5lib which has its own set of daunting problems. In March, I stepped down because I was burned out and needed to reduce the number of things I was working on out of sheer obligation. Months after I put it down, I still feel lousy that I walked away. Every time I think about how I feel lousy, I tell myself it was the right thing to do. I talk about stepping down from Bleach.

I migrated Socorro from RabbitMQ to Google Pub/Sub. I redid how Socorro handles queueing crash reports for processing. Previously, it used RabbitMQ. I switched it to Google Pub/Sub. In doing this, I removed one of the components between the collector and the processor which was sometimes flakey, so that was good. This was the first step in moving all of Socorro to Google Cloud Platform. Later in the year, we decided not to move Socorro to Google Cloud Platform. Fun times!

I got a new co-worker! I was working with Osmose for parts of 2018, but he left in early 2019 and even when he was around, he was on other projects and I was mostly working on my own and increasingly feeling disconnected and isolated. That kind of sucked. In April-ish, John joined me on Socorro. John's great to work with! Not only does he improve the bus factor for our projects, allow me to go on vacations with less anxiety, and have a predilection towards deep dives into mysterious problems, but he's also wonderful to talk with. Yay for co-workers!

We took over and audited Buildhub. In April-ish, John and I inherited Buildhub. It was written a couple of years prior to be an index of build information for Mozilla projects. The build process creates artifacts on archive.mozilla.org which is an AWS S3 bucket with a web interface of the directory structure. Buildhub consumes that information, puts it into an index, and provides a search interface for it. Socorro needs this information for converting (buildid, channel, version) to a proper version string. Mission Control needs this information for similar things. There are other systems that use it, too. John and I audited the Buildhub project. We wrote up a bunch of issues for things that surfaced from the audit. We fixed security issues and performed some necessary maintenance.
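To make the (buildid, channel, version) lookup concrete, here's a minimal sketch of how a consumer might resolve that triple against a build index. This is not Buildhub's API or schema; `BuildKey`, `BUILD_INDEX`, `resolve_version`, and the sample entries are all made up for illustration.

```python
from typing import Dict, NamedTuple


class BuildKey(NamedTuple):
    """Hypothetical lookup key; not Buildhub's real schema."""

    buildid: str
    channel: str
    version: str


# Made-up index entries for illustration only.
BUILD_INDEX: Dict[BuildKey, str] = {
    BuildKey("20190101000000", "beta", "65.0"): "65.0b8",
    BuildKey("20190102000000", "release", "65.0"): "65.0",
}


def resolve_version(index: Dict[BuildKey, str], key: BuildKey) -> str:
    """Return the indexed version string for a build.

    If the build isn't in the index, fall back to the version the
    caller already has rather than guessing.
    """
    return index.get(key, key.version)
```

The fallback is a design choice in this sketch: when the index has no record for a build, the caller keeps the version it started with.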

We took over and audited Buildhub2. The way Buildhub was built, it was really challenging to debug problems with build artifact ingestion. It had a lot of problems with missing information for unclear reasons. Buildhub2 was another attempt at building a build information index. It was launched at the end of 2018. It took a different approach and was a stricter mirror of the information on archive.mozilla.org--it didn't attempt to infer anything from older builds which didn't have buildhub.json files. We audited the Buildhub2 project, wrote up a bunch of issues that surfaced from the audit, fixed security issues, updated dependencies, rewrote the documentation, and wrote a basic runbook.

We shut down Buildhub. During the Buildhub and Buildhub2 audits, I decided that while Buildhub2 has a different set of issues with its data, maintaining just Buildhub2 was better than maintaining two indexes. I wrote up a plan to shut down Buildhub, identified and fixed blocking issues in Buildhub2, and migrated projects from Buildhub to Buildhub2. Then we shut down and dismantled Buildhub.

We took over PollBot and Dependency Dashboard. "Took over" is a bit of a stretch here. We did a rough audit of both projects and fixed some security issues with dependencies. However, we didn't get very far in absorbing either of these and still don't know much about them.

We took over and audited Tecken. Tecken is the Mozilla Symbols Server. It's used by a bunch of projects including Socorro for symbolicating stacks. Tecken was in pretty good shape, so we haven't had to spend a lot of time on urgent work.

I wrote an essay on crash pings (Telemetry) vs crash reports (Socorro/Crash Stats). In July, I wrote Crash pings (Telemetry) and crash reports (Socorro/Crash Stats). It took a while to write because it goes into a lot of detail for specific things. I know there have been changes in Telemetry-land as they moved to GCP, so I bet parts of it are wrong now. Writing it sure helped me and other people understand the current situation regarding crash report data and which data is good to use for what purposes. Will Lachance and I bandied about writing a more permanent manual for crash report data. I think that's a good idea, but I had to switch projects and haven't had time to spend on it, yet. I want to go through the essay and do an update at some point soon.

I released crashstats-tools v1.0.1. In 2018, I was tinkering with crashstats-tools as a standalone set of command line tools that make it easier to manipulate crash report data from Crash Stats using the Crash Stats APIs in a command-line context. I use these tools in a few different ways, mostly when looking into issues with Socorro processing. I wasn't sure if anyone else would use it, so I didn't tell anyone for a while--I didn't want to add another project to my plate that required ongoing maintenance work. In 2019, Gabriele and Marco spent a lot of time improving the situation around system library symbols. Up until recently, we had system library symbols for Windows in some cases and for some versions of Mac OS, but parts of it were really manual and we didn't have a good story for Linux and it was generally just not great. This is a problem when walking and symbolicating stacks. Without symbols, the stackwalker has to guess where the frames are and that's problematic. Further, the result isn't human readable.
For example, you end up with stuff like this:

    0 libxul.so libxul.so@0x43d015b context
    1 libxul.so libxul.so@0x43cffb6 frame_pointer
    2 libxul.so libxul.so@0x415bc0e frame_pointer
    3 libxul.so libxul.so@0x3f6c4f8 scan
    4 libxul.so libxul.so@0x3fe0e4b scan
    5 libxul.so libxul.so@0x3fde1bd scan
    6 libxul.so libxul.so@0x3fde9cf scan
    7 libxul.so libxul.so@0x3fd07f8 scan
    8 libxul.so libxul.so@0x553c1df scan
    9 libxul.so libxul.so@0x3fd088f scan
    10 libxul.so libxul.so@0x553c1df scan
    11 libxul.so libxul.so@0x3fd1bc0 scan
    12 libxul.so libxul.so@0x3fea0a8 scan
    13 libxul.so libxul.so@0x3feac6f scan
    14 libxul.so libxul.so@0x3ff276c scan
    15 libxul.so libxul.so@0x3feadda scan
    16 libxul.so libxul.so@0x3fd6f7b scan
    17 libxul.so libxul.so@0x3fec0a3 scan

After Gabriele and Marco's work, we now have this:

    0 libxul.so <hashglobe::hash_map::HashMap<K, V, S>>::clear /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/servo/components/hashglobe/src/hash_map.rs:1050 context
    1 libxul.so style::stylist::CascadeData::clear /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/servo/components/style/stylist.rs:2412 cfi
    2 libxul.so Servo_StyleSet_FlushStyleSheets /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/servo/components/style/stylist.rs:2048 cfi
    3 libxul.so mozilla::ServoStyleSet::UpdateStylist() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/layout/style/ServoStyleSet.cpp:1374 cfi
    4 libxul.so mozilla::ServoStyleSet::ResolveInheritingAnonymousBoxStyle(nsAtom*, mozilla::ServoStyleContext*) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/layout/style/ServoStyleSet.cpp:592 cfi
    5 libxul.so nsCSSFrameConstructor::ConstructRootFrame() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/layout/base/nsCSSFrameConstructor.cpp:2661 cfi
    6 libxul.so mozilla::PresShell::Initialize() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/layout/base/PresShell.cpp:1685 cfi
    7 libxul.so nsContentSink::StartLayout(bool) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/dom/base/nsContentSink.cpp:1203 cfi
    8 libxul.so nsHtml5TreeOpExecutor::StartLayout(bool*) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/parser/html/nsHtml5TreeOpExecutor.cpp:639 cfi
    9 libxul.so nsHtml5TreeOperation::Perform(nsHtml5TreeOpExecutor*, nsIContent**, bool*, bool*) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/parser/html/nsHtml5TreeOperation.cpp:1110 cfi
    10 libxul.so nsHtml5TreeOpExecutor::RunFlushLoop() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/parser/html/nsHtml5TreeOpExecutor.cpp:456 cfi
    11 libxul.so nsHtml5ExecutorFlusher::Run() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/parser/html/nsHtml5StreamParser.cpp:125 cfi
    12 libxul.so mozilla::SchedulerGroup::Runnable::Run() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/xpcom/threads/SchedulerGroup.cpp:370 cfi
    13 libxul.so nsThread::ProcessNextEvent(bool, bool*) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/xpcom/threads/nsThread.cpp:975 cfi
    14 libxul.so NS_ProcessNextEvent(nsIThread*, bool) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/xpcom/threads/nsThreadUtils.cpp:455 cfi
    15 libxul.so mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/ipc/glue/MessagePump.cpp:88 cfi
    16 libxul.so MessageLoop::Run() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/ipc/chromium/src/base/message_loop.cc:290 cfi
    17 libxul.so nsBaseAppShell::Run() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/widget/nsBaseAppShell.cpp:136 cfi
    18 libxul.so XRE_RunAppShell() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/toolkit/xre/nsEmbedFunctions.cpp:860 cfi
    19 libxul.so MessageLoop::Run() /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/ipc/chromium/src/base/message_loop.cc:290 cfi
    20 libxul.so XRE_InitChildProcess(int, char**, XREChildData const*) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/toolkit/xre/nsEmbedFunctions.cpp:698 cfi
    21 firefox-esr content_process_main(mozilla::Bootstrap*, int, char**) /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/browser/app/../../ipc/contentproc/plugin-container.cpp:49 cfi
    22 firefox-esr main /build/firefox-esr-Mag8OK/firefox-esr-60.7.1esr/browser/app/nsBrowserApp.cpp:254 cfi
    Ø 23 libc-2.24.so libc-2.24.so@0x202e0 cfi
    24 firefox-esr firefox-esr@0x561f scan
    25 firefox-esr firefox-esr@0x596f scan
    Ø 26 ld-2.24.so ld-2.24.so@0xf96a scan
    27 firefox-esr firefox-esr@0x596f scan
    28 firefox-esr _start scan
    29 @0x7ffe1c04eb37

Big difference, right?! Gabriele told me he's using crashstats-tools in their symbols upload scripts. So the scripts upload symbols for modules that are missing in Mozilla Symbols server, then do a search on Crash Stats for crash reports where those modules show up in the stack, and reprocess those crash reports. That's immensely helpful. I wrote about the crashstats-tools release.
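The upload-then-reprocess workflow described above can be sketched roughly like this. Every callable here is a placeholder supplied by the caller; none of them are real crashstats-tools, Crash Stats, or Mozilla Symbols Server APIs, and I'm assuming each step can be expressed as a simple function.

```python
from typing import Callable, Iterable, List, Set


def upload_and_reprocess(
    modules: Iterable[str],
    has_symbols: Callable[[str], bool],
    upload_symbols: Callable[[str], None],
    find_crash_ids: Callable[[str], List[str]],
    reprocess: Callable[[List[str]], None],
) -> List[str]:
    """Upload missing symbols, then reprocess crash reports that involve them.

    All callables are hypothetical stand-ins for the real upload, search,
    and reprocess steps.
    """
    crash_ids: Set[str] = set()
    for module in modules:
        if has_symbols(module):
            # Nothing new to upload, so existing crash reports are fine.
            continue
        upload_symbols(module)
        # Crash reports with this module in the stack were processed
        # without symbols, so they are candidates for reprocessing.
        crash_ids.update(find_crash_ids(module))

    # Deduplicate and sort so each crash report is reprocessed once.
    ordered = sorted(crash_ids)
    if ordered:
        reprocess(ordered)
    return ordered
```

Collecting crash ids into a set first matters because one crash report can include several of the newly symbolicated modules in its stack.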

John and I picked up Mozilla Location Services. Mozilla Location Services had been dormant for years. It was running Python 2.6 on Scientific Linux. It had a deploy pipeline that was several generations old. It was in an unmaintainable state. We overhauled it, finished up the Docker-ization of the services, finished the mostly-done migration from Python 2.6 to Python 3, updated dependencies, reduced a bunch of complexity, wrote a lot of documentation, fixed a ton of issues, pushed out a new deploy pipeline and Docker-based infrastructure, and did a series of stop-gap fixes for processing. It was a massive undertaking. The infrastructure migration went smoothly--the site was unavailable for like 15 minutes during the switch over from the old infrastructure and old code base to the new one. There are still a bunch of issues with the system. We're triaging them now. However, it's maintainable and we can do deploys, so it's a vastly improved situation. This is currently our primary project, so we'll be spending most of our time on this in early 2020.

We passed off some projects. After picking up Mozilla Location Services, John and I were spread waaaay too thin, so we passed off Buildhub2 and PollBot to the build engineering team. That was a little tricky because I had only had these projects for a short period of time, so it was hard to answer questions about them.

I released Markus v2.0.0. Markus is a Python library for metrics generation. It wraps statsd and dogstatsd and some other libraries and makes it much easier to develop and test code that generates metrics. I use it in all my projects. Version 2.0.0 involved a minor rewrite to support filters. Filters let you adjust metrics before they get emitted. This makes it easier to add tags to all metrics generated for a service with things like the host and service type. I wrote about Markus v2.0.0 here.
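As a rough illustration of the filter idea, here's a stdlib-only sketch. It doesn't use Markus's actual API; `MetricsRecord`, `add_tag_filter`, and `emit` are hypothetical names, and the backend is just a callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MetricsRecord:
    """Hypothetical metric record; not Markus's real class."""

    stat_type: str
    key: str
    value: float
    tags: List[str] = field(default_factory=list)


# A filter returns the (possibly modified) record, or None to drop it.
Filter = Callable[[MetricsRecord], Optional[MetricsRecord]]


def add_tag_filter(tag: str) -> Filter:
    """Build a filter that appends the same tag to every record."""

    def _filter(record: MetricsRecord) -> Optional[MetricsRecord]:
        record.tags.append(tag)
        return record

    return _filter


def emit(
    record: MetricsRecord,
    filters: List[Filter],
    backend: Callable[[MetricsRecord], None],
) -> None:
    """Run a record through each filter in order, then hand it to the backend."""
    current: Optional[MetricsRecord] = record
    for metrics_filter in filters:
        current = metrics_filter(current)
        if current is None:
            # A filter dropped the record, so nothing gets emitted.
            return
    backend(current)
```

With filters like `add_tag_filter("host:web1")` configured once for the service, the code that generates metrics doesn't have to repeat those tags on every call.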