Socorro Engineering: 2022 retrospective
Summary
2022 took forever. At the same time, it kind of flew by. 2023 is already moving along, so this post is a month late. Here's the retrospective of Socorro engineering in 2022.
Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.
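The core idea behind Bleach is allow-list sanitizing: escape everything that isn't explicitly allowed. Here's a toy stdlib-only sketch of that idea; this is not Bleach's actual implementation, which parses with html5lib and handles attributes, protocols, comments, and many edge cases this ignores:

```python
# Toy illustration of allow-list sanitizing -- NOT Bleach's implementation.
# The idea: escape everything except an explicit allow-list of tags.
from html.parser import HTMLParser
import html

ALLOWED_TAGS = {"b", "em", "i", "strong"}

class TinySanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # drop all attributes
        else:
            # escape the raw tag text so it renders as inert text
            self.out.append(html.escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        text = f"</{tag}>"
        self.out.append(text if tag in ALLOWED_TAGS else html.escape(text))

    def handle_data(self, data):
        self.out.append(html.escape(data))

def tiny_clean(text):
    parser = TinySanitizer()
    parser.feed(text)
    return "".join(parser.out)
```

With this sketch, `tiny_clean('<script>alert(1)</script><b>hi</b>')` keeps the `<b>` but renders the script tag as inert escaped text.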
Bleach 6.0.0 cleans up some issues in linkify and with the way it uses html5lib so it's easier to reason about. It also adds support for Python 3.11 and cleans up the project infrastructure.
There are several backwards-incompatible changes, hence the 6.0.0 version.
https://bleach.readthedocs.io/en/latest/changes.html#version-6-0-0-january-23rd-2023
I did some rough testing with a corpus of Standup messages data and it looks like bleach.clean is slightly faster with 6.0.0 than 5.0.0.
Using Python 3.10.9:
5.0.0: bleach.clean on 58,630 items 10x: minimum 2.793s
6.0.0: bleach.clean on 58,630 items 10x: minimum 2.304s
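The measurement pattern behind numbers like these is to run the whole pass several times and keep the minimum, which is the least noisy estimate. A sketch with timeit; `clean_all` and `CORPUS` are stand-ins since the real run used bleach.clean over a private Standup corpus:

```python
# Sketch of the benchmark pattern above: repeat the full pass and take
# the minimum. CORPUS and the cleaning function are stand-ins -- the
# real measurement used bleach.clean over a private corpus.
import timeit
import html

CORPUS = ["<b>hello</b>", "x < y & z"] * 1000  # stand-in data

def clean_all():
    for item in CORPUS:
        html.escape(item)  # stand-in for bleach.clean(item)

best = min(timeit.repeat(clean_all, repeat=5, number=1))
print(f"minimum: {best:.3f}s")
```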
The other big change 6.0.0 brings is that Bleach is now deprecated.
Bleach sits on top of html5lib which is not actively maintained. It is increasingly difficult to maintain Bleach in that context and I think it's nuts to build a security library on top of a library that's not in active development.
Over the years, we've talked about other options:
find another library to switch to
take over html5lib development
fork html5lib and vendor and maintain our fork
write a new HTML parser
etc
With the exception of option 1, they greatly increase the scope of the work for Bleach. They all feel exhausting to me.
Given that, I think Bleach has run its course and this journey is over.
Possibilities:
Pass it to someone else?
No, I won't be passing Bleach to someone else to maintain. Bleach is a security-related library, so making a mistake when passing it to someone else would be a mess. I'm not going to do that.
Switch to an alternative?
I'm not aware of any alternatives to Bleach. I don't plan to work on coordinating the migration for everyone from Bleach to something else.
Oh my goodness--you're leaving us with nothing?
Sort of.
I'm going to continue doing minimal maintenance:
security updates
support for new Python versions
fixes for egregious bugs (begrudgingly)
I'll do that for at least a year. At some point, I'll stop doing that, too.
I think that gives the world enough time for something to take Bleach's place, for the HTML Sanitizer API to land in browsers, or for everyone to come to the consensus that they never really needed Bleach in the first place.
Bleach. Tired. At the end of its journey.
Many thanks to Greg who I worked with on Bleach for a long while and maintained Bleach for several years. Working with Greg was always easy and his reviews were thoughtful and spot-on.
Many thanks to Jonathan who, over the years, provided a lot of insight into how best to solve some of Bleach's more squirrely problems.
Many thanks to Sam who was an indispensable resource on HTML parsing and sanitizing text in the context of HTML.
For more specifics on this release, see here: https://bleach.readthedocs.io/en/latest/changes.html#version-6-0-0-january-23rd-2023
Documentation and quickstart here: https://bleach.readthedocs.io/en/latest/
Source code and issue tracker here: https://github.com/mozilla/bleach
Timeline: 2+ years

Impact:

radically reduced risk of data leaks due to misconfigured permissions
centralized and simplified configuration and management of fields
normalization and validation performed during processing
documentation of data reviews, data caveats, etc
reduced risk of bugs when adding new fields--testing is done in CI
new crash reporting data dictionary with Markdown-formatted descriptions, real examples, relevant links
I've been working on Socorro (crash ingestion pipeline at Mozilla) since the beginning of 2016. During that time, I've focused on streamlining maintenance of the project, paying down technical debt, reducing risk, and improving crash analysis tooling.
One of the things I identified early on is how the crash ingestion pipeline was chaotic, difficult to reason about, and difficult to document. What did the incoming data look like? What did the processed data look like? Was it valid? Which fields were protected? Which fields were public? How do we add support for a new crash annotation? This was problematic for our ops staff, engineering staff, and all the people who used Socorro. It was something in the back of my mind for a while, but I didn't have any good thoughts.
In 2020, Socorro moved into the Data Org which has multiple data pipelines. After spending some time looking at how their pipelines work, I wanted to rework crash ingestion.
The end result of this project is that:
the project is easier to maintain:
adding support for new crash annotations is done in a couple of schema files and possibly a processor rule
risk of security issues and data breaches is lower:
typos, bugs, and mistakes when adding support for a new crash annotation are caught in CI
permissions are specified in a central location, changing permission for fields is trivial and takes effect in the next deploy, setting permissions supports complex data structures in easy-to-reason-about ways, and mistakes are caught in CI
the data is easier to use and reason about:
normalization and validation of crash annotation data happens during processing and downstream uses of the data can expect it to be valid; further we get a signal when the data isn't valid which can indicate product bugs
schemas describing incoming and processed data
crash reporting data dictionary documenting incoming data fields, processed data fields, descriptions, sources, data gotchas, examples, and permissions
Socorro is the crash ingestion pipeline for Mozilla products like Firefox, Fenix, Thunderbird, and MozillaVPN.
When Firefox crashes, the crash reporter asks the user if the user would like to send a crash report. If the user answers "yes!", then the crash reporter collects data related to the crash, generates a crash report, and submits that crash report as an HTTP POST to Socorro. Socorro saves the submitted crash report, processes it, and has tools for viewing and analyzing crash data.
The crash ingestion system was working and it was usable, but it was in a bad state.
Poor data management
Normalization and validation of data was all over the codebase and not consistent:
processor rule code
AWS S3 crash storage code
Elasticsearch indexing code
Telemetry crash storage code
Super Search querying and result rendering code
report view and template code
signature report code and template code
crontabber job code
any scripts that used the data
tests -- many of which had bad test data so who knows what they were really testing
Naive handling of minidump stackwalker output, which meant that changes in the stackwalker output mostly went unnoticed and there was no indication as to whether changed output created issues in the system.
Further, since it was all over the place, there were no guarantees for data validity when downloading it using the RawCrash, ProcessedCrash, and SuperSearch APIs. Anyone writing downstream systems would also have to normalize and validate the data.
Poor permissions management
Permissions were defined in multiple places:
Elasticsearch json redactor
Super Search fields
RawCrash API allow list
ProcessedCrash API allow list
report view and template code
Telemetry crash storage code
and other places
We couldn't effectively manage permissions of fields in the stackwalker output because we had no idea what was there.
Poor documentation
No documentation of crash annotation fields other than CrashAnnotations.yaml which didn't enforce anything in crash ingestion (process, valid type, data correctness, etc) and was missing important information like data gotchas, data review urls, and examples.
No documentation of processed crash fields at all.
Making changes was high risk
Changing fields from public to protected was high risk because you had to find all the places it might show up, which was intractable. Adding support for new fields often took multiple passes over several weeks because we'd miss things. Server errors happened with some regularity due to weirdness with crash annotation values affecting the Crash Stats site.
Tangled concerns across the codebase
Lots of tangled concerns where things defined in one place affected other places that shouldn't be related. For example, the Super Search fields definition was acting as a "schema" for other parts of the system that had nothing to do with Elasticsearch or Super Search.
Difficult to maintain
It was difficult to support new products.
It was difficult to debug issues in crash ingestion and crash reporting.
The Crash Stats webapp contained lots of if/then/else bits to handle weirdness in the crash annotation values. Nulls, incorrect types, different structures, etc.
Socorro contained lots of vestigial code from half-done field removal, deprecated fields, fields that were removed from crash reports, etc. These vestigial bits were all over the code base. Discovering and removing these bits was time consuming and error prone.
The code for exporting data to Telemetry built the export data using a list of fields to exclude rather than a list of fields to include. This is backwards and impossible to maintain--we never should have been doing this. Further, it pulled data from the raw crash, which we had no validation guarantees for, causing issues downstream in the Telemetry import code.
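The difference between the two approaches can be shown in a few lines. With an exclude-list, any newly added field leaks by default; with an include-list, only deliberately listed fields ever get exported. Field names here are hypothetical, not Socorro's actual schema:

```python
# Why an include-list beats an exclude-list for exports: a new field is
# invisible to an exclude-list and leaks by default, but an include-list
# only ever exports what was deliberately listed. Field names are
# hypothetical.

TELEMETRY_FIELDS = {"product", "version", "signature"}  # explicit include-list

def export_for_telemetry(processed_crash):
    return {key: value for key, value in processed_crash.items()
            if key in TELEMETRY_FIELDS}

crash = {"product": "Firefox", "version": "110.0",
         "signature": "OOM | small", "email": "secret@example.com"}

# The protected "email" field never escapes the include-list:
assert export_for_telemetry(crash) == {
    "product": "Firefox", "version": "110.0", "signature": "OOM | small"}
```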
There was no way to validate the data used in the unit tests, so a lot of it was invalid. That meant CI would pass, but we'd see errors in our stage and production environments.
Different from other similar systems
In 2020, Socorro was moved to the Data Org in Mozilla which had a set of standards and conventions for collecting, storing, analyzing, and providing access to data. Socorro didn't follow any of it which made it difficult to work on, to connect with, and to staff. Things Data Org has that Socorro didn't:
a schema specifying fields, types, and documentation
data flow documentation
data review policy, process, and artifacts for data being collected and how to add new data
data dictionary for fields for users including documentation, data review urls, data gotchas
In summary, we had a system that took a lot of effort to maintain, wasn't serving our users' needs, and carried a high risk of security/data breaches.
Many of these issues can be alleviated and reduced by moving to a schema-driven system where we:
define a schema for annotations and a schema for the processed crash
change crash ingestion and the Crash Stats site to use those schemas
When designing this schema-driven system, we should be thinking about:
how easy is it to maintain the system?
how easy is it to explain?
how flexible is it for solving other kinds of problems in the future?
what kinds of errors will likely happen when maintaining the system and how can we avert them in CI?
what kinds of errors can happen and how much risk do they pose for data leaks? what of those can we avert in CI?
how flexible is the system which needs to support multiple products potentially with different needs?
I worked out a minimal version of that vision that we could migrate to and then work with going forward.
The crash annotations schema should define:
what annotations are in the crash report?
which permissions are required to view a field
field documentation (provenance, description, data review, related bugs, gotchas, analysis tips, etc)
The processed crash schema should define:
what's in the processed crash?
which permissions are required to view a field
field documentation (provenance, description, related bugs, gotchas, analysis tips, etc)
Then we make the following changes to the system:
write a processor rule to copy, normalize, and validate data from the raw crash based on the processed crash schema
switch the Telemetry export code to using the processed crash for data to export
switch the Telemetry export code to using the processed crash schema for permissions
switch Super Search to using the processed crash for data to index
switch Super Search to using the processed crash schema for documentation and permissions
switch Crash Stats site to using the processed crash for data to render
switch Crash Stats site to using the processed crash schema for documentation and permissions
switch the RawCrash, ProcessedCrash, and SuperSearch APIs to using the crash annotations and processed crash schemas for documentation and permissions
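The first change above--a processor rule that copies, normalizes, and validates raw crash data against the schema--might be sketched like this. The schema layout and field names are hypothetical, not Socorro's actual schemas:

```python
# Minimal sketch of a schema-driven processor rule: copy annotations from
# the raw crash into the processed crash, coercing types and recording
# validation problems as processor notes. Schema shape and field names
# are hypothetical.

SCHEMA = {
    "uptime": {"type": int, "default": 0},
    "product": {"type": str, "default": ""},
}

def copy_and_normalize(raw_crash, processed_crash, notes):
    for field, spec in SCHEMA.items():
        value = raw_crash.get(field, spec["default"])
        try:
            processed_crash[field] = spec["type"](value)
        except (TypeError, ValueError):
            # invalid data is a signal -- note it and fall back to default
            notes.append(f"invalid value for {field}: {value!r}")
            processed_crash[field] = spec["default"]
    return processed_crash

notes = []
processed = copy_and_normalize({"uptime": "123", "product": "Firefox"}, {}, notes)
assert processed == {"uptime": 123, "product": "Firefox"}
assert notes == []
```

Downstream consumers then only ever see the processed crash, which is guaranteed to match the schema.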
After doing that, we have:
field documentation is managed in the schemas
permissions are managed in the schemas
data is normalized and validated once in the processor and everything uses the processed crash data for indexing, searching, and rendering
adding support for new fields and changing existing fields is easier and problems are caught in CI
Use JSON Schema.
Data Org at Mozilla uses JSON Schema for schema specification. The schema is written using YAML.
https://mozilla.github.io/glean_parser/metrics-yaml.html
The metrics schema is used to define metrics.yaml files which specify the metrics being emitted and collected.
For example:
https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/metrics.yaml
One long long long term goal for Socorro is to unify standards and practices with the Data Ingestion system. Toward that goal, it's prudent to build out crash annotation and processed crash schemas using whatever we can take from the equivalent metrics schemas.
We'll additionally need to build out tooling for verifying, validating, and testing schema modifications to make ongoing maintenance easier.
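For a feel of what a field entry in that style looks like, here's a hypothetical sketch in the metrics.yaml spirit--this is not Socorro's actual schema file, and the field, URLs, and keys are made up for illustration:

```yaml
# Hypothetical schema entry in the metrics.yaml style -- not Socorro's
# actual schema files.
uptime:
  type: integer
  description: >
    Seconds the process was alive before it crashed.
  permissions:
    - public
  data_reviews:
    - https://example.com/data-review/12345
  bugs:
    - https://example.com/bug/67890
```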
Use schemas to define and drive everything.
We've got permissions, structures, normalization, validation, definition, documentation, and several other things related to the data and how it's used throughout crash ingestion spread out across the codebase.
Instead of that, let's pull it all together into a single schema and change the system to be driven from this schema.
The schema will include:
structure specification
documentation including data gotchas, examples, and implementation details
permissions
processing instructions
We'll have a schema for supported annotations and a schema for the processed crash.
We'll rewrite existing parts of crash ingestion to use the schema:
processing:
1. use processing instructions to validate and normalize annotation data

super search:
1. field documentation
2. permissions
3. remove all the normalization and validation code from indexing

crash stats:
1. field documentation
2. permissions
3. remove all the normalization and validation code from page rendering
Only use processed crash data for indexing and analysis.
The indexing system has its own normalization and validation code since it pulls data to be indexed from the raw crash.
The crash stats code has its own normalization and validation code since it renders data from the raw crash in various parts of the site.
We're going to change this so that all normalization and validation happens during processing, the results are stored in the processed crash, and indexing, searching, and crash analysis only work on processed crash data.
By default, all data is protected.
By default, all data is protected unless it is explicitly marked as public. This has some consequences for the code:
any data not specified in a schema is treated as protected
all schema fields need to specify permissions for that field
any data in a schema is either marked public or lists the permissions required to view it
for nested structures, any child field that is public has public ancestors
We can catch some of these issues in CI and need to write tests to verify them.
This is slightly awkward when maintaining the schema because it would be more reasonable to have "no permissions required" mean that the field is public. However, it's possible to accidentally not specify the permissions and we don't want to be in that situation. Thus, we decided to go with explicitly marking public fields as public.
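The CI checks described above can be sketched as a walk over the schema: every field must state its permissions explicitly, and a public field can't hide under a protected parent. The schema shape here is hypothetical:

```python
# Sketch of the CI checks above: flag fields with no explicit permissions
# and public fields nested under protected parents. Schema shape is
# hypothetical.

def check_permissions(schema, path="", parent_public=True, errors=None):
    errors = [] if errors is None else errors
    for name, spec in schema.items():
        field_path = f"{path}.{name}" if path else name
        perms = spec.get("permissions")
        if perms is None:
            errors.append(f"{field_path}: no permissions specified")
            continue
        is_public = perms == ["public"]
        if is_public and not parent_public:
            errors.append(f"{field_path}: public field under protected parent")
        check_permissions(spec.get("fields", {}), field_path, is_public, errors)
    return errors

schema = {
    "os_name": {"permissions": ["public"]},
    "memory_report": {
        "permissions": ["protected"],
        "fields": {"largest_block": {"permissions": ["public"]}},
    },
    "user_comments": {},  # oops: no permissions specified
}
problems = check_permissions(schema)
assert "user_comments: no permissions specified" in problems
assert "memory_report.largest_block: public field under protected parent" in problems
```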
We had a lot of work to do before we could start defining schemas and changing the system to use those schemas.
remove vestigial code (some of this work was done in other phases as it was discovered)
[bug 1724933]: remove unused/obsolete annotations (2021-08)
[bug 1743487]: remove total_frames (2021-11)
[bug 1743704]: remove jit crash classifier (2022-02)
[bug 1762000]: remove vestigial Winsock_LSP code (2022-03)
[bug 1784485]: remove vestigial exploitability code (2022-08)
[bug 1784095]: remove vestigial contains_memory_report code (2022-08)
[bug 1787933]: exorcise flash things from the codebase (2022-09)
fix signature generation
[bug 1753521]: use fields from processed crash (2022-02)
[bug 1755523]: fix signature generation so it only uses processed crash data (2022-02)
[bug 1762207]: remove hang_type (2022-04)
fix Super Search
[bug 1624345]: stop saving random data to Elasticsearch crashstorage (2020-06)
[bug 1706076]: remove dead Super Search fields (2021-04)
[bug 1712055]: remove system_error from Super Search fields (2021-07)
[bug 1712085]: remove obsolete Super Search fields (2021-08)
[bug 1697051]: add crash_report_keys field (2021-11)
[bug 1736928]: remove largest_free_vm_block and tiny_block_size (2021-11)
[bug 1754874]: remove unused annotations from Super Search (2022-02)
[bug 1753521]: stop indexing items from raw crash (2022-02)
[bug 1762005]: migrate to lower-cased versions of Plugin* fields in processed crash (2022-03)
[bug 1755528]: fix flag/boolean handling (2022-03)
[bug 1762207]: remove hang_type (2022-04)
[bug 1763264]: clean up super search fields from migration (2022-07)
fix data flow and usage
[bug 1740397]: rewrite CrashingThreadInfoRule to normalize crashing thread (2021-11)
[bug 1755095]: fix TelemetryBotoS3CrashStorage so it doesn't use Super Search fields (2022-03)
[bug 1740397]: change webapp to pull crashing_thread from processed crash (2022-07)
[bug 1710725]: stop using DotDict for raw and processed data (2022-09)
clean up the raw crash structure
[bug 1687987]: restructure raw crash (2021-01 through 2022-10)
After cleaning up the code base, removing vestigial code, fixing Super Search, and fixing Telemetry export code, we could move on to defining schemas and writing all the code we needed to maintain the schemas and work with them.
[bug 1762271]: rewrite json schema reducer (2022-03)
[bug 1764395]: schema for processed crash, reducers, traversers (2022-08)
[bug 1788533]: fix validate_processed_crash to handle pattern_properties (2022-08)
[bug 1626698]: schema for crash annotations in crash reports (2022-11)
That allowed us to fix a bunch of things:
[bug 1784927]: remove elasticsearch redactor code (2022-08)
[bug 1746630]: support new threads.N.frames.N.unloaded_modules minidump-stackwalk fields (2022-08)
[bug 1697001]: get rid of UnredactedCrash API and model (2022-08)
[bug 1100352]: remove hard-coded allow lists from RawCrash (2022-08)
[bug 1787929]: rewrite Breadcrumbs validation (2022-09)
[bug 1787931]: fix Super Search fields to pull permissions from processed crash schema (2022-09)
[bug 1787937]: fix Super Search fields to pull documentation from processed crash schema (2022-09)
[bug 1787931]: use processed crash schema permissions for super search (2022-09)
[bug 1100352]: remove hard-coded allow lists from ProcessedCrash models (2022-11)
[bug 1792255]: add telemetry_environment to processed crash (2022-11)
[bug 1784558]: add collector metadata to processed crash (2022-11)
[bug 1787932]: add data review urls for crash annotations that have data reviews (2022-11)
With fields specified in schemas, we can write a crash reporting data dictionary:
[bug 1803558]: crash reporting data dictionary (2023-01)
[bug 1795700]: document raw and processed schemas and how to maintain them (2023-01)
Then we can finish:
[bug 1677143]: documenting analysis gotchas (ongoing)
[bug 1755525]: fixing the report view to only use the processed crash (future)
[bug 1795699]: validate test data (future)
This was a very, very long-term project with many small steps and some really big ones. Trying to land a large project like this in one go is futile; the only way to do it successfully is to break it into a million small steps, each of which stands on its own and doesn't create urgency for getting the next step done.
Any time I changed field names or types, I'd have to do a data migration. Data migrations take 6 months to do because I have to wait for existing data to expire from storage. On the one hand, it's a blessing I could do migrations at all--you can't do this with larger data sets or with data sets where the data doesn't expire without each migration becoming a huge project. On the other hand, it's hard to juggle being in the middle of multiple migrations and sometimes the contortions one has to perform are grueling.
If you're working on a big project that's going to require changing data structures, figure out how to do migrations early with as little work as possible and use that process as often as you can.
This was such a huge project that spanned years. It's so hard to finish projects like this because the landscape for the project is constantly changing. Meanwhile, being mid-project has its own set of complexities and hardships.
I'm glad I tackled it and I'm glad it's mostly done. There are some minor things to do, still, but this new schema-driven system has a lot going for it. Adding support for new crash annotations is much easier, less risky, and takes less time.
It took me about a month to pull this post together.
That's the story of the schema-based overhaul of crash ingestion. There's probably some bits missing and/or wrong, but the gist of it is here.
If you have any questions or bump into bugs, I hang out on #crashreporting on chat.mozilla.org. You can also write up a bug for Socorro.
Hopefully this helps. If not, let us know!
Today is Volunteer Responsibility Amnesty Day where I spend some time taking stock of things and maybe move some projects to the done pile.
In June, I ran a Volunteer Responsibility Amnesty Day [1] for Mozilla Data Org because the idea really struck a chord with me and we were about to embark on 2022h2 where one of the goals was to "land planes" and finish projects. I managed to pass off Dennis and end Puente. I also spent some time mulling over better models for maintaining a lot of libraries.
This time around, I'm just organizing myself.
Here's the list of things I'm maintaining in some way that aren't the big services that I work on:
Bleach is an allowed-list-based HTML sanitizing Python library.
maintainer
no
more on this next year
Python configuration library.
maintainer
yes
keep on keepin on
Python metrics library.
maintainer
yes
keep on keepin on
Python library for scrubbing Sentry events.
maintainer
yes
keep on keepin on
Fake Sentry server for local development.
maintainer
yes
keep on keepin on, but would be happy to pass this off
Sphinx extension for documenting JavaScript and TypeScript.
co-maintainer
yes
keep on keepin on
Command line utilities for interacting with Crash Stats
maintainer
yes
keep on keepin on
Utility for combining GitHub pull requests.
maintainer
yes
keep on keepin on
Firefox addon for attaching GitHub pull requests to Bugzilla.
maintainer
yes
keep on keepin on
Python library for symbolicating stacks and generating crash signatures.
maintainer
maybe
keep on keepin on for now, but figure out a better long term plan
Python library for generating crash signatures.
maintainer
yes
keep on keepin on
Django OpenID Connect library.
contributor (I maintain docker-test-mozilla-django-oidc)
maybe
think about dropping this at some point
That's too many things. I need to pare the list down. There are a few I could probably sunset, but not any time soon.
I'm also thinking about a maintenance model where I'm squishing it all into a burst of activity for all the libraries around some predictable event like Python major releases.
I tried that out this fall and did a release of everything except Bleach (more on that next year) and rob-bugson which is a Firefox addon. I think I'll do that going forward. I need to document it somewhere so as to avoid the pestering of "Is this project active?" issues. I'll do that next year.
I went to NormConf 2022, but didn't attend the whole thing. It was entirely online as a YouTube livestream for something like 14 hours split into three sessions. It had a very active Slack instance.
I like doing post-conference write-ups because then I have some record of what I was thinking at the time. Sometimes that's useful for other people. Often it's helpful for me.
I'm data engineer adjacent. I work on a data pipeline for crash reporting, but it's a streaming pipeline, entirely bespoke, and doesn't use any/many of the tools in the data engineer toolkit. There's no ML. There's no NLP. I don't have a data large-body-of-water. I'm not using SQL much. I'm not having Python packaging problems. Because of that, I kind of skipped over the data engineer related talks.
The conference was well done. Everyone did a great job. The Slack channels I lurked in were hopping. The way they did questions worked really well.
These are my thoughts on the talks I watched.
I work at Mozilla. We get a laptop refresh periodically. I got a new laptop that I was going to replace my older laptop with. I'm a software engineer and I work on services that are built using Docker and tooling that runs on Linux.
This post covers my attempt at setting up a Windows laptop for software development for the projects I work on after having spent the last 20 years predominantly using Linux and Linux-like environments.
Spoiler: This is a failed attempt and I gave up and stuck with Linux.
Back in June, I saw a note about Volunteer Responsibility Amnesty Day in Sumana's Changeset Consulting newsletter. The idea of it really struck a chord with me. I wondered whether running an event like this at work would help. With that, I coordinated an event, ran it, and this is the blog post summarizing how it went.
As people leave Mozilla, the libraries, processes, services, and other responsibilities (hidden and visible) all suddenly become unowned. In some cases, these things get passed to teams and individuals and there's a clear handoff. In a lot of cases, stuff just gets dropped on the floor.
Some of these things should remain on the floor--we shouldn't maintain all the things forever. Sometimes things get maintained because of inertia rather than actual need. Letting these drop and decay over time is fine.
Some of these things turn out to be critical cogs in the machinations of complex systems. Letting these drop and decay over time can sometimes lead to a huge emergency involving a lot of unscheduled scrambling to fix. That's bad. No one likes that.
In the last year, I had picked up a bunch of stuff from people who had left and it was increasingly hard to juggle it all. Thus taking a day to audit all the things on my plate and figuring out which ones I don't want to do anymore seemed really helpful.
Further, even without people leaving, new projects show up, pipelines are added, new services are stood up--there's more stuff running and more stuff to do to keep it all running.
Thus I wondered, what if other people in Data Org at Mozilla had similar issues? What if there were tasks and responsibilities that we had accumulated over the years that, if we stepped back and looked at them, didn't really need to be done anymore? What if there were people who had too many things on their plate and people who had a lot of space? Maybe an audit would surface this and let us collectively shuffle some things around.
In that context, I decided to coordinate a Volunteer Responsibility Amnesty Day for Data Org.
I decided to structure it a little differently because I wanted to run something that people could participate in regardless of what time zone they were in. I wanted it to produce an output that individuals could talk with their managers about--something they could use to take stock of where things were at, surface work individuals were doing that managers may not know about, and provide a punch list of actions to fix any problems that came up.
I threw together a Google doc that summarized the goals, provided a template for the audit, and included next steps, which were pretty much: tell us on Slack and bring it up with your manager in your next 1:1. Here's the doc:
https://docs.google.com/document/d/19NF69uavGXii_DEkRpQsJuklHxWoPTWwxBp_ucRela4/edit#
I talked to my manager about it. I mentioned it in meetings and in various channels on Slack.
On the actual day, I posted a few reminders in Slack.
I figured it was worth doing once. Maybe it would be helpful? Maybe not? Maybe it helps us reduce the amount of stuff we're doing solely for inertia purposes?
I didn't get a lot of signal about how it went, though.
I know chutten participated and the audit was helpful for him. He has a ton of stuff on his plate.
I know Jan-Erik participated. I don't know if it was helpful for him.
I heard that Alessio decided to do this with his team every 6 months or so.
While I did organize the event, I actually didn't participate. I forget what happened, but something came up and I was bogged down with that.
That's about all I know. I think there are specific people who have a lot of stuff on their plate and this was helpful, but generally either people didn't participate (Maybe they were bogged down like me? Maybe they don't have much they're juggling?) or I never found out they participated.
I think it was useful to do. It was a very low-effort experiment to see if something like this would be helpful. If it was the case that people had a lot on their plates, seems like this would have surfaced a bunch of things allowing us to improve peoples' work lives. I think for specific people who have a lot on their plate, it was a helpful exercise.
I didn't get enough signal to make me want to spend the time to run it again in December.
Given that:
I think it's good to run individually. If you're feeling overwhelmed with stuff, an audit is a great place to start figuring out how to fix that.
It might be good to run in a small team as an exercise in taking stock of what's going on and rebalancing things.
It's probably not helpful to run in an org where maybe it ends up being more bookkeeping work than it's worth.
Dennis is a Python command line utility (and library) for working with localization. It includes:
a linter for finding problems in strings in .po files, like invalid Python variable syntax which leads to exceptions
a template linter for finding problems in strings in .pot files that make translators' lives difficult
a statuser for seeing the high-level translation/error status of your .po files
a translator for strings in your .po files to make development easier
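The kind of bug the linter catches--a %-format variable typo in a translation that would raise at runtime--can be sketched in a few lines. This is not Dennis's actual implementation, just the shape of the check:

```python
# Minimal sketch of the kind of check Dennis's linter does: find Python
# %-format variables in the source string and flag translations using
# variables that don't exist in the source. NOT Dennis's actual code.
import re

VAR_RE = re.compile(r"%\((\w+)\)[sd]")

def lint_translation(msgid, msgstr):
    known = set(VAR_RE.findall(msgid))
    used = set(VAR_RE.findall(msgstr))
    return sorted(used - known)  # unknown variables -> runtime KeyError

# A typo in the translated variable name would raise at runtime:
assert lint_translation("%(count)s new messages",
                        "%(cuont)s nouveaux messages") == ["cuont"]
assert lint_translation("%(count)s new messages",
                        "%(count)s nouveaux messages") == []
```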
It's been 5 years since I released Dennis v0.9. That's a long time.
This brings several minor things and clean up. Also, I transferred the repository from "willkg" to "mozilla" in GitHub.
b38a678 Drop Python 3.5/3.6; add Python 3.9/3.10 (#122, #123, #124, #125)
b6d34d7 Redo tarrminal printin' and colorr (#71)
There's an additional backwards-incompatible change here in which we drop the --color and --no-color arguments from dennis-cmd lint.
658f951 Document dubstep (#74)
adb4ae1 Rework CI so it uses a matrix
transfer project from willkg to mozilla for ongoing maintenance and support
I worked on Dennis for 9 years.
It was incredibly helpful! It eliminated an entire class of bugs we were plagued with for critical Mozilla sites like AMO, MDN, SUMO, Input [1], and others. It did it in a way that supported and was respectful of our localization community.
It was pretty fun! The translation transforms are incredibly helpful for fixing layout issues. Some of them also produce hilarious results:
Input has gone to the happy hunting ground in the sky.
SUMO in dubstep.
SUMO in Pirate.
SUMO in Zombie.
There were a variety of dennis recipes including using it in a commit hook to translate commit messages. https://github.com/mozilla/dennis/commits/main
I enjoyed writing silly things at the bottom of all the release blog posts.
I learned a lot about gettext, localization, and languages! Learning about the nuances of plurals was fascinating.
The code isn't great. I wish I had redone the tokenization pipeline. I wish I had gotten around to adding support for other gettext variable formats.
Regardless, this project had a significant impact on Mozilla sites which I covered briefly in my Dennis Retrospective (2013).
It's been 6 years since I worked on sites that have localization, so I haven't really used Dennis in a long time and I'm no longer a stakeholder for it.
I need to reduce my maintenance load, so I looked into whether to end this project altogether. Several Mozilla projects still use it for linting PO files for deploys, so I decided not to end the project, but instead hand it off.
Welcome @diox and @akatsoulas who are picking it up!
For more specifics on this release, see here: https://dennis.readthedocs.io/en/latest/changelog.html#version-1-0-0-june-10th-2022
Documentation and quickstart here: https://dennis.readthedocs.io/en/latest/
Source code and issue tracker here: https://github.com/mozilla/dennis
39 of 7,952,991,938 people were aware that Dennis existed but tens--nay, hundreds!--of millions were affected by it.
Socorro and Tecken make up the services part of our crash reporting system at Mozilla. We ran a small Data Sprint day to onboard a new ops person and a new engineer. I took my existing Socorro presentation and Tecken presentation [1], combined them, reduced them, and then fixed a bunch of issues. This is that presentation.
I never blogged the Tecken 2020 presentation.
Back in January 2020, I wrote How to pick up a project with an audit. I received some comments about it over the last couple of years, but I don't think I really did anything with them. Then Sumana sent an email asking whether I'd blogged about my experiences auditing projects and estimating how long it takes and things like that.
That got me to re-reading the original blog post and it was clear it needed an update, so I did that. One thing I focused on was differentiating between "service" and "non-service" projects. The post feels better now.
But that's not this post! This post is about my experiences with auditing. What happened in that Summer of 2019 which formed the basis of that blog post? What were those 5 [1] fabled projects? How did those audits go? Where are those projects now?
It ended up being 6 projects. I think I didn't originally count Mozilla Location Services for some reason.