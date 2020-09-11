Prepare to dive in!

It's September now and 2020h1 ended a long time ago, but I'm only just getting a chance to catch up. Some important things happened in 2020h1 that are worth divulging, and we don't tell anyone about Socorro events via any other medium.

2020h1 was rough: layoffs, a re-org, the Berlin All Hands, Covid-19, a stretch where I focused on MLS, then switching back to Socorro/Tecken full time, then the virtual All Hands.

We also did a bunch of small features, bug fixes, docs fixes, and other things.

In 2019, I had cobbled together a "system test" for Tecken to verify that the APIs were working in a live system, but the number of issues kept increasing and it effectively stopped working.

I wrote a new system test which effectively tests live environments. I based some bits on the Antenna system test I wrote years ago.

The Tecken system test also has a set of utilities for manipulating stacks and generating test data, which is handy for debugging issues.
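Here's a minimal sketch of the kind of check a system test like this can run against a live environment. It's not Tecken's actual system test. The base URL, the `SymbolCase` type, and the `fetch_status` callable are hypothetical. The only thing borrowed from real practice is the Breakpad symbol store layout `<debug filename>/<debug id>/<symbol filename>`. Passing the HTTP call in as a function keeps the logic testable without a live environment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List
from urllib.parse import quote


@dataclass(frozen=True)
class SymbolCase:
    """One hypothetical test case: a symbol file and the HTTP status we expect."""
    debug_filename: str
    debug_id: str
    expected_status: int


def symbol_url(base_url: str, case: SymbolCase) -> str:
    """Build a URL using the Breakpad layout <debug_filename>/<debug_id>/<sym_filename>."""
    # "xul.pdb" -> "xul.sym"; other names (e.g. "libxul.so") just get ".sym" appended.
    stem = case.debug_filename
    if stem.endswith(".pdb"):
        stem = stem[: -len(".pdb")]
    parts = [case.debug_filename, case.debug_id, stem + ".sym"]
    return base_url.rstrip("/") + "/" + "/".join(quote(p) for p in parts)


def run_system_test(
    base_url: str,
    cases: Iterable[SymbolCase],
    fetch_status: Callable[[str], int],
) -> List[str]:
    """Check every case and return failure messages (an empty list means all passed)."""
    failures = []
    for case in cases:
        url = symbol_url(base_url, case)
        status = fetch_status(url)
        if status != case.expected_status:
            failures.append(f"{url}: expected {case.expected_status}, got {status}")
    return failures
```

In a real run, `fetch_status` would make an HTTP request against the environment under test. In unit tests, it can be a dictionary lookup.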

Crash reports consist of a set of crash annotations and zero or more minidump files. I documented crash annotations: what they are, where they're documented, and the process for adding new ones.
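Here's a minimal sketch of that shape as a data structure. It assumes the report arrives as form fields where text values are crash annotations and binary values are minidumps. The `CrashReport` type and `from_form_fields` helper are hypothetical and are not Socorro's or Antenna's code. The field names are sample data.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Union


@dataclass
class CrashReport:
    # Crash annotations: key/value pairs describing the crash, such as ProductName.
    annotations: Dict[str, str] = field(default_factory=dict)
    # Zero or more minidump files, keyed by form field name.
    minidumps: Dict[str, bytes] = field(default_factory=dict)


def from_form_fields(fields: Mapping[str, Union[str, bytes]]) -> CrashReport:
    """Split submitted fields: text values are annotations, binary values are minidumps."""
    report = CrashReport()
    for name, value in fields.items():
        if isinstance(value, bytes):
            report.minidumps[name] = value
        else:
            report.annotations[name] = value
    return report


report = from_form_fields({
    "ProductName": "Firefox",
    "Version": "80.0",
    "upload_file_minidump": b"MDMP...",  # sample bytes, not a real minidump
})
print(sorted(report.annotations), sorted(report.minidumps))
# prints ['ProductName', 'Version'] ['upload_file_minidump']
```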

Improved language and notices around protected data

Prior to this work, we talked about data as "public" and "requiring minidump access", which made sense a long time ago but makes no sense now. Now we've got "public" and "protected data". Also, we used different phrases in different places, like "minidump access" and "personally identifiable information".

Further, when you were logged in and had access to protected data, it was unclear which data was protected and which was public.

Even further, when you were logged in and didn't have access to protected data, it was unclear what you couldn't see and what you could do about that.

I standardized everything to use the phrase "protected data". I cleaned up the data policy. I added notes to web pages stating whether or not you were seeing protected data, with links to the policy. I improved the iconography around protected data.

If you see things that are still confusing, please let me know.
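Here's a minimal sketch of the public-versus-protected split. It assumes a hypothetical allowlist of public fields and is not how Crash Stats implements this. The field names and notice text are made up for illustration. Any field not on the allowlist is treated as protected, so a newly added field stays hidden until someone decides it's public.

```python
from typing import Any, Dict, FrozenSet, Tuple

# Hypothetical allowlist of public fields. Anything not listed counts as protected data.
PUBLIC_FIELDS: FrozenSet[str] = frozenset({"product", "version", "signature", "os_name"})


def split_fields(
    crash: Dict[str, Any], public_fields: FrozenSet[str] = PUBLIC_FIELDS
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (public, protected) dicts for a processed crash."""
    public = {k: v for k, v in crash.items() if k in public_fields}
    protected = {k: v for k, v in crash.items() if k not in public_fields}
    return public, protected


def view_for_user(
    crash: Dict[str, Any], has_protected_access: bool
) -> Tuple[Dict[str, Any], str]:
    """Return the fields a user may see plus a notice saying whether protected data is shown."""
    public, protected = split_fields(crash)
    if has_protected_access:
        return {**public, **protected}, "You are seeing public and protected data."
    notice = f"You are seeing public data only; {len(protected)} protected field(s) hidden."
    return public, notice
```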

Switch from master to main

We switched all the repositories related to Tecken and Socorro to use "main" as the default branch name.

We did a lot of work on Socorro to support Fenix requirements. Some of that work has benefitted Fennec and other projects as well. Socorro is in a much better place for supporting projects that work differently than Firefox does.

Socorro handles semver versions now in addition to the "Firefox version scheme".

As part of improving Fenix support, we changed a bunch of things around how Socorro supports products and the documentation for that.
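Here's a minimal sketch of a sort key that accepts both schemes. It is not Socorro's implementation. It assumes a simplified Firefox version scheme: `MAJOR.MINOR[.PATCH]` with an optional `a<n>` or `b<n>` suffix and an optional `esr` suffix. Older forms are out of scope. Semver prerelease identifiers are compared using semver's precedence rules, and releases sort after their prereleases.

```python
import re
from typing import Tuple

# Simplified Firefox scheme: MAJOR.MINOR[.PATCH] plus optional a<n>/b<n> and optional "esr".
FIREFOX_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(a|b)(\d+))?(esr)?$", re.ASCII)
# Semver: MAJOR.MINOR.PATCH[-prerelease][+build]; build metadata is ignored for ordering.
SEMVER_RE = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$", re.ASCII
)


def _identifier_key(ident: str) -> Tuple[int, object]:
    # Semver precedence: numeric identifiers compare numerically and sort before alphanumeric ones.
    return (0, int(ident)) if ident.isdigit() else (1, ident)


def version_key(version: str) -> tuple:
    """Return a sort key for a Firefox-scheme or semver version string."""
    match = FIREFOX_RE.match(version)
    if match:
        major, minor, patch, channel, channel_num, _esr = match.groups()
        prerelease = (channel, channel_num) if channel else ()
    else:
        match = SEMVER_RE.match(version)
        if match is None:
            raise ValueError(f"unrecognized version: {version!r}")
        major, minor, patch, pre = match.groups()
        prerelease = tuple(pre.split(".")) if pre else ()
    pre_key = tuple(_identifier_key(p) for p in prerelease)
    # A release (no prerelease) sorts after every prerelease of the same version.
    return (int(major), int(minor), int(patch or 0), 0 if pre_key else 1, pre_key)


print(sorted(["80.0", "80.0b3", "80.0a1"], key=version_key))
# prints ['80.0a1', '80.0b3', '80.0']
```

Plain versions like `1.2.3` match both patterns, but they produce the same key either way, so the order in which the patterns are tried doesn't affect sorting.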

I spent a good chunk of 2020h1 looking at the current state of Socorro and Tecken, what they do, why they do it, similar projects at Mozilla, and finally where we should head over the next 3 years.

For Tecken, there's a lot of overlap between Tecken and what Sentry is working on around symbols and symbolication. We should unify these two efforts where possible. The first step towards this is switching to the Symbolic library.

For Socorro, the story is a little different. Mozilla has multiple data ingestion pipelines. Each pipeline solves the same set of problems:

client to send data to the pipeline
pipeline with the same rough shape: collector, processor, data storage, access to data, reports, ad hoc analysis tools
process for getting access to data and people responsible for handling access requests
process for removing data and handling data removal requests
support for groups adding data and using data

There's a lot of overlap. Further, for Socorro, all those things were covered by John and me. For other ingestion pipelines, there are tens of people covering those things.

Thus the roadmap for Socorro is to unify as many of those things as possible with the Data Platform group's versions and reduce redundancy, specialized tooling, maintenance costs, and other things.

Switch from GCP Pub/Sub to AWS SQS for queueing

I did the bulk of this work at the end of 2019, but we did the switch in early 2020.

Switch to dependabot for dependency management

We had problems with the system we were using and decided to go whole hog and switch to dependabot for all dependency management.

One problem with this is that dependabot creates one pull request per dependency update. So on the first of the month, we get a bazillion pull requests. Multiply this by multiple repositories and projects, and that becomes a ton of work to do every month.

I wrote a tool called paul-mclendahand to deal with this.

Blog post: Switching from pyup to dependabot
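To illustrate the problem, here's a minimal sketch that turns a batch of dependabot pull requests into one combined summary. It doesn't reproduce paul-mclendahand. It assumes dependabot titles look like `Bump <package> from <old> to <new>`, optionally followed by `in <directory>`. The function name and PR numbers are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed dependabot title format: "Bump <package> from <old> to <new>[ in <directory>]".
TITLE_RE = re.compile(
    r"^Bump (?P<package>\S+) from (?P<old>\S+) to (?P<new>\S+)(?: in (?P<directory>\S+))?$"
)


def combined_description(prs: Iterable[Tuple[int, str]]) -> Tuple[str, List[int]]:
    """Summarize dependency-update PRs as one description; also return PR numbers that didn't match."""
    lines = []
    skipped = []
    for number, title in prs:
        match = TITLE_RE.match(title)
        if match is None:
            skipped.append(number)
            continue
        lines.append(f"* {match['package']}: {match['old']} -> {match['new']} (#{number})")
    return "\n".join(sorted(lines)), skipped


description, skipped = combined_description([
    (12, "Bump requests from 2.24.0 to 2.25.0"),
    (15, "Bump markus from 2.2.0 to 3.0.0 in /docs"),
    (20, "Fix typo in README"),
])
print(description)
```

Anything that doesn't look like a dependency bump is set aside rather than guessed at, so a human still reviews it.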

