Date: 2021-01-14

Socorro: Massive update to socorro-submitter

The socorro-submitter is part of our test infrastructure. It mimics incoming crash reports sent to the collector. It hadn't been updated in a long while, so it didn't support changes to crash report payloads we've made over the last few years. The massive update fixes holes in our testing in the staging environment.

Socorro: Redid infrastructure for minidump-stackwalk

The Socorro processor takes a minidump and runs the stackwalker, which extracts information from the minidump. Mozilla has multiple stackwalkers, and while they all use the Breakpad library, they use different versions of it and in some cases have different patch sets. Extracting minidump-stackwalk into a separate repository makes it easier for other people to help maintain it, apply patches, and test changes. Further, it now has its own release infrastructure, so we can tag versions of minidump-stackwalk and push those to the cloud. When we build Socorro, we now pull those pre-built binaries, which reduced Socorro build times. I also did a round of optimizing minidump-stackwalk builds, including generating PGO profiles. Gabriele vastly improved that work so it builds the profiles as part of the build. That improved stackwalking runtimes slightly.

Socorro: Redid Crash Stats product support

Prior to Fenix, Socorro and Crash Stats supported multiple products, but they were all kind of the same with regard to their needs and build infrastructure. Fenix is really different, so to support it and other GeckoView/android-components-using Android apps, we had to make significant changes to the assumptions Socorro makes when processing and displaying those crash reports. I lump this flexibility into "product support". In 2020h2, we hit a series of requirements that forced me to redo product support. Now it's a set of JSON files in the product_details/ directory. Each JSON file covers a single product. All the product-specific configuration is now in that file. For example, we can now set the create-a-bug links on a product-by-product basis. Fenix now has create-a-bug links for a variety of GitHub issue trackers as well as Bugzilla components. Anyone can submit a PR to add, remove, or modify the create-a-bug links for a product.
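To make the one-file-per-product idea concrete, here is a minimal sketch of loading such files and looking up a product's create-a-bug links. The field names (`name`, `bug_links`, `label`, `url`), the example URLs, and the helper functions are all assumptions for illustration; the actual schema of the files in product_details/ may look different.

```python
import json
import tempfile
from pathlib import Path


def load_products(directory):
    """Load every *.json file in directory into a dict keyed by product name."""
    products = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as fp:
            data = json.load(fp)
        products[data["name"]] = data
    return products


def bug_links_for(products, product_name):
    """Return (label, url) pairs for a product, or an empty list if unknown."""
    product = products.get(product_name, {})
    return [(link["label"], link["url"]) for link in product.get("bug_links", [])]


# Hypothetical file contents; field names and URLs are placeholders.
EXAMPLE_PRODUCT = {
    "name": "Fenix",
    "bug_links": [
        {"label": "GitHub issue", "url": "https://example.com/fenix/issues/new"},
        {"label": "Bugzilla", "url": "https://example.com/bugzilla/enter_bug"},
    ],
}


def demo():
    """Write the example file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        (Path(tmpdir) / "Fenix.json").write_text(json.dumps(EXAMPLE_PRODUCT))
        products = load_products(tmpdir)
        return bug_links_for(products, "Fenix")


if __name__ == "__main__":
    for label, url in demo():
        print(label, url)
```

With a layout like this, adding or changing a create-a-bug link is a small edit to a single JSON file, which is what makes it reasonable to handle in a PR.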

Socorro: Overhauled protected data policy

I worked with Alicia and others to fix the language around protected data, overhaul the protected data policy, and add some tooling to make it easier to grant, audit, and revoke protected data access. It's much clearer, the term "protected data" is now used consistently across the site, and there should be less confusion about how it all works. Later in the half, I sent an email to everyone who has protected data access reminding them of the policy details and where they can find it. The protected data policy is here: https://crash-stats.mozilla.org/documentation/protected_data_access/

Socorro: Documented process for adding crash annotations

Crash annotations are the key/value pairs of metadata that get added to crash reports. Socorro exists in a larger ecosystem of crash data and some crash annotations flow between systems, while others don't. Also, it was not clear how to add a crash annotation and make sure it is searchable, shows up in crash ping data, and shows up in the socorro_crash data set in Telemetry. Documenting this clarifies everything. Further, it codifies how crash annotations fit in with Mozilla data collection practices and moves us one step towards unifying crash data with the rest of the data that Data Org manages. The documentation for adding crash annotations is here: https://socorro.readthedocs.io/en/latest/annotations.html

Socorro: Raised the max body size for incoming crash report payloads

Previously, crash report payloads had to be under 10mb. If they were over, the crash report was rejected. We have metrics around crash report payload sizes. Here's what we're seeing today:

50%: 220kb

mean: 330kb

95%: 1.5mb

It's rare that we have crash reports that exceed 10mb. However, there are scenarios, like stack overflow errors, where the resulting crash reports do exceed 10mb. In 2020h2, we raised the max payload size to 25mb. That didn't seem to have much effect on the number of crash reports rejected because of payload size. It did seem to increase the number of connection timeouts when POSTing crash report payloads. Fennec compresses crash reports because they could be sent over mobile data and that's expensive for users. Fenix and other products that use android-components also compress crash reports. Firefox desktop, however, does not. We decided the next step is to compress Firefox crash reports. [bug 781630] covers that. After that work is done, we'll see where we're at again.
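As a rough illustration of why compression matters here, the sketch below checks a payload against a size limit before and after gzip compression. It rests on assumptions: "25mb" is read as 25 * 1024 * 1024 bytes, the limit is applied to the request body as sent, gzip is the compression used, and the payload is a repetitive stand-in rather than a real crash report. The function names are made up for this example.

```python
import gzip

# Assumption: "25mb" means 25 * 1024 * 1024 bytes and the check is applied
# to the bytes as they arrive in the request body.
MAX_BODY_SIZE = 25 * 1024 * 1024


def is_accepted(body, max_size=MAX_BODY_SIZE):
    """Return True if the body is within the max body size."""
    return len(body) <= max_size


def compress_payload(body):
    """Gzip-compress a payload (assumed compression scheme)."""
    return gzip.compress(body)


def make_repetitive_payload(min_size):
    """Build a stand-in payload of at least min_size bytes.

    The content repeats one line over and over, loosely imitating the very
    deep, repetitive stack of a stack overflow crash.
    """
    line = b"frame: recursive_function\n"
    repeats = min_size // len(line) + 1
    return line * repeats


raw = make_repetitive_payload(26 * 1024 * 1024)
compressed = compress_payload(raw)

print("raw accepted:", is_accepted(raw))
print("compressed accepted:", is_accepted(compressed))
print("compressed size in bytes:", len(compressed))
```

Real crash reports include binary minidumps, so they will not compress nearly as well as this stand-in does. The sketch only shows the mechanism, not an expected compression ratio.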

Socorro: Crash Stats .COM to .ORG

Some Mozilla properties are on a .ORG domain and some are on a .COM domain. Some were on a .COM domain and transitioned to .ORG. We started to transition Crash Stats from .COM to .ORG a long time ago, but never did the next step. We did that this half. Now all crash-stats.mozilla.com requests are redirected to crash-stats.mozilla.org. This ONLY covers Crash Stats--the site where you analyze crash reports. This does not affect where crash reports get sent.

Socorro: Breadcrumbs for Fenix

In 2020h2, Roger and I worked out a structure for sending breadcrumbs data with Fenix crash reports as a crash annotation. We cribbed from the format that Sentry uses. That went through a couple of rounds of honing and the data review process and then landed. The next step is to add some code to make the breadcrumbs data for crash reports viewable on Crash Stats.

Socorro: Structured JavaException

For the last 10 years, we've had mediocre support for crash reports in Java-land. One of the consequences of this is that we have terrible signatures for Java crash reports. Crash reports get an unstructured JavaStackTrace annotation which is junk to parse. We really needed structured data to generate signatures and also to show stack trace breakdown with links to source code and other stuff. In 2020h2, I worked with Roger to establish a structured format for Java exceptions. We cribbed from the format that Sentry uses. That went through honing and data review and it's now in the JavaException crash annotation. In 2021, I'll work on showing that information in the Details page of the crash report. This is covered in [bug 1675560]. In 2021, I'll also implement better signature generation for Java crash reports. This is covered in [bug 1541120].
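To show why structured data helps with signatures, here is a minimal sketch that builds a signature from a structured exception. Everything in it is an assumption: the dict is a simplified, Sentry-inspired shape and not the actual JavaException schema, the frames are assumed to be listed with the most recent call first, and the signature rule (exception type plus the first few frames) is made up for illustration. The real signature generation is the future work described above.

```python
# Hypothetical, simplified structure loosely modeled on Sentry's exception
# format. The real JavaException annotation may differ.
JAVA_EXCEPTION = {
    "exception": {
        "values": [
            {
                "type": "IllegalStateException",
                "value": "widget was not initialized",
                "stacktrace": {
                    # Assumption: most recent call first.
                    "frames": [
                        {
                            "module": "org.example.app.Widget",
                            "function": "render",
                            "filename": "Widget.java",
                            "lineno": 42,
                        },
                        {
                            "module": "org.example.app.Main",
                            "function": "onCreate",
                            "filename": "Main.java",
                            "lineno": 17,
                        },
                    ]
                },
            }
        ]
    }
}


def generate_signature(java_exception, max_frames=3):
    """Join the exception type and the first max_frames frames with " | ".

    This is an illustrative rule only, not Socorro's signature algorithm.
    """
    values = java_exception.get("exception", {}).get("values", [])
    if not values:
        return "EMPTY: no exception values"
    exc = values[0]
    frames = exc.get("stacktrace", {}).get("frames", [])
    parts = [exc["type"]]
    for frame in frames[:max_frames]:
        parts.append(f"{frame['module']}.{frame['function']}")
    return " | ".join(parts)


print(generate_signature(JAVA_EXCEPTION))
```

With structured frames, picking out the module and function is a dictionary lookup. Doing the same from an unstructured stack trace string would require parsing free-form text.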

Tecken: Clean up database tables

Tecken had a bunch of database tables that were monotonically increasing in size despite the fact that they contained metadata about things that had expired. I implemented table cleanup routines. We ran them, which deleted over half of the data and sped up the site.
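A cleanup routine of this kind can be sketched as a delete of every row older than a retention cutoff. This is not Tecken's code: the sketch uses an in-memory SQLite database so it is self-contained, and the table name, column names, and retention period are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def cleanup_expired(conn, now, retention_days):
    """Delete rows older than the retention period; return how many were deleted.

    Timestamps are stored as ISO 8601 strings in one consistent format, so
    comparing them as strings orders them chronologically.
    """
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cursor = conn.execute(
        "DELETE FROM upload_metadata WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount


def demo():
    """Create a hypothetical table, add three rows, and clean up the old one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE upload_metadata (name TEXT, created_at TEXT)")
    rows = [
        ("old.sym", datetime(2019, 6, 1).isoformat()),
        ("recent.sym", datetime(2020, 3, 1).isoformat()),
        ("new.sym", datetime(2020, 12, 15).isoformat()),
    ]
    conn.executemany("INSERT INTO upload_metadata VALUES (?, ?)", rows)
    conn.commit()

    deleted = cleanup_expired(conn, now=datetime(2021, 1, 1), retention_days=365)
    remaining = conn.execute("SELECT COUNT(*) FROM upload_metadata").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Running a routine like this on a schedule keeps the tables from growing without bound once the things they describe have expired.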

Tecken: Rewrote API documentation

I rewrote the Tecken API documentation so it's clearer how to use it to upload symbols, download symbols, and do symbolication. The documentation is here: https://tecken.readthedocs.io/en/latest/index.html

Tecken: dll/exe compression problem

We (Mozilla) have been adjusting the build system to reduce build times for Firefox. One of the recent steps was to move the uploading-symbols step to a later part of the build. When that happened, the build stopped compressing dll and exe files when uploading them to Tecken. Then Tecken wouldn't serve them. I helped figure that out, suggested a fix, and then fixed all the dll/exe data we had accumulated.

Tecken: move admin-y things to Django admin pages

Tecken has a React frontend and a bunch of admin-y pages written using it. That's fine, but Tecken is built on Django and Django has admin pages built in. It's much easier to enable Django admin pages for db tables than it is to build an API and React pages to use that API. I enabled the Django admin and moved a bunch of functionality from React to the Django admin, simplifying administration and maintenance.