Date: 2019-01-03

2018 was a big year. I really can't overstate that. Some highlights:

Switched from Google sign-in to Mozilla's SSO. Alexis (one of our summer interns) switched the Crash Stats site from Google sign-in to Mozilla's SSO. It fixed a ton of problems we've had with sign-in over the years and brings Crash Stats into the fold along with other Mozilla sites. It also created a couple of new problems which I'm still working out. The big one is "Periodic 'An unexpected error occurred' when browsing reports and comments" ([bug 1473068]).

Redid our AWS infrastructure. This was a huge project that reworked everything about Socorro's infrastructure. Now we have:

- aggregated, centralized logs and log history
- CI-triggered deploys
- Docker-based services
- a local development environment that matches stage and prod server environments
- disposable nodes
- version-control managed configuration
- locked-down access to storage systems
- automatic scaling
- AWS S3 bucket names that don't have periods in them

This project took a year and a half to do and simplified deploying and maintaining the project significantly. It also involved rewriting a lot of stuff. I talk more about this project in "Socorro Smooth Mega-Migration 2018". We did a fantastic job on this--it was super smooth!

Rewrote Socorro's signature generation system. Early this summer, Will Lachance took on Ben Wu as an intern to look at Telemetry crash ping data. One of the things Ben wanted to do was generate Socorro-style signatures from the data. Then he could do analysis on crash ping data using Telemetry tools and do deep dives on specific crashes in Socorro. I refactored and extracted Socorro's signature generation code into a Python library that could be used outside of Socorro. I talk more about this project in "Siggen (Socorro signature generator) v0.2.0 released!" After Ben finished up his internship, the project was shut down. I don't think anyone uses the Siggen library. Ted says if we make it a web API, then people could use it in other places. That's the crux of "Add a web API to generate a signature from a list of frames" ([bug 828452]). I want to work on that, but have to hone the signature generation API more first. I also cleaned up a bunch of the signature generation code, removing one of the siglist files we had, generalizing some of the code, and improving signature generation in several cases.

Tried out React. Mike and Alexis investigated switching the Crash Stats front end to React. Towards that, they tested out converting the report view to React to see how it felt, what problems it solved, and what new issues came up. Alexis ended his summer internship and Mike switched to a different project, so I spent some time mulling things over and decided that while I like React and there are some compelling reasons to React-ify Crash Stats, this isn't a good move right now.

Reworked Socorro to support new products. I reworked processing and the web interface to allow Socorro to support products that don't have the same release management process as Firefox and Fennec. Now Socorro supports Focus, FirefoxReality, and the GeckoView ReferenceBrowser.

Switched from FTPScraperCronApp to ArchiveScraperCronApp. Incoming crash reports for the beta channel report the release version and not the beta version. For example, crash reports for "64.0b4" come in saying they're for "64.0". That's tough because then it's hard to group crashes by specific beta. Because of that, the processor has a BetaVersionRule which looks up the (product, channel, buildid) in a table and pulls out the version string for all incoming crash reports in the beta channel. Previously, "a table" was a set of tables containing product build/version data. It was populated by FTPScraperCronApp, which scraped archive.mozilla.org every hour for build information. It would pass the build information through a series of stored procedures and magically data would appear in the table. Most of this code was written many years ago and didn't work with recent changes to releases like release candidates and aurora. I rewrote the BetaVersionRule to do a lookup on Buildhub. However, we hit a bunch of issues that I won't go into, among which is that the data in Buildhub doesn't have exactly what we need for the BetaVersionRule to do its thing correctly. So I wrote a new ArchiveScraperCronApp that scrapes archive.mozilla.org for the data the BetaVersionRule needs to correctly find the version string. It now handles release candidates correctly and also aurora.
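To make the lookup concrete, here's a minimal sketch of the idea behind the BetaVersionRule. This is not the actual Socorro code: the function name, the in-memory dict standing in for the table, the build id, and the fallback for unknown builds are all assumptions made up for illustration.

```python
# Hypothetical stand-in for the build/version table that the scraper fills.
# Keys are (product, channel, buildid); values are the real version string.
BUILD_VERSIONS = {
    ("Firefox", "beta", "20181101000000"): "64.0b4",  # made-up build id
}


def fix_beta_version(crash, table=BUILD_VERSIONS):
    """Return the version string to use for a crash report.

    Beta-channel reports say "64.0" when they mean "64.0b4", so look up the
    specific beta by (product, channel, buildid). Reports from other channels
    keep the version they reported. Falling back to the reported version for
    an unknown build is an assumption of this sketch.
    """
    if crash["channel"] != "beta":
        return crash["version"]
    key = (crash["product"], crash["channel"], crash["buildid"])
    return table.get(key, crash["version"])


report = {
    "product": "Firefox",
    "channel": "beta",
    "buildid": "20181101000000",
    "version": "64.0",
}
print(fix_beta_version(report))
```

The whole thing hinges on the table having a row for every beta build, which is why the scraper that populates it matters so much.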

Removed PostgreSQL from the processor; removed alembic, sqlalchemy, and everything they managed. For years, the Socorro engineering team worked on cleaning up the Gormenghast-like sprawl that was postgres. For years, we've been generating PR after PR tweaking things and removing things to reduce the spaghetti morass. It was like removing a mountain with a plastic beach toy. All that has come to an end (https://github.com/mozilla-services/socorro/pull/4723). We now have one ORM. We now have one migration system. We no longer have stored procedures or other bits that lack unit tests and documentation. We also bid farewell to ftpscraper and that data flow of build/release information that could have been a character or a setting in a Clive Barker novel. This gets rid of a bunch of things that were really hard to maintain and never worked quite right. While I did the final PR, it built upon work that Adrian, Peter, and other people did over the years. Yay us!

Migrated to Python 3. I started the Python 3 migration project a couple of years ago because the death knell for Python 2 had sounded and the clock was ticking. We did this work in a series of baby steps so that we could make progress incrementally without upsetting or blocking other development initiatives. In the process of doing this, we updated and rewrote a lot of code including most of the error handling in the processor. I talk more about this project in "Socorro: migrating to Python 3". This was a big deal. Python 3 is sooooo much easier to deal with. Plus some of the libraries we're using or are planning to use are dropping support for Python 2 and things were going to get increasingly irksome. Big thanks to Ced, Lonnen, and Mike for their efforts on this!

Removed ADI and ADI-related things. Socorro used ADI to normalize crash rates in a couple of reports. There were tons of problems with this. Now we have Mission Control which does a better job with rates and normalizing and has more representative crash data, too. Thus, we removed the reports from Socorro and also all the code we had to fetch and manage ADI data.

Stopped saving crash reports that won't get processed. Socorro was saving roughly 70% of incoming crash reports, over half of which it wasn't processing. That was problematic because it meant we had a whole bunch of crash report data in storage that we didn't know anything about. That's one of the reasons we had to drop all the crash report data back in December 2017--we couldn't figure out in a reasonable amount of time which crash reports were ok to keep and which had to go. Now Socorro saves and processes roughly 20% of incoming crash reports and rejects everything else. Note that this doesn't affect users--they can still go to about:crashes and submit crash reports and those will get processed just like before.

Removed a lot of code. In 2017, we removed a lot of code. We did the same in 2018. At the beginning of 2018, we had this:

    ------------------------------------------------------------
    Language                 files     blank   comment      code
    ------------------------------------------------------------
    Python                     401     12447     10881     61034
    C++                         11       816       474      6052
    HTML                        66       695        24      5167
    JavaScript                  52       904       959      4926
    JSON                        88        21         0      4432
    LESS                        19       146        49      2614
    SQL                         67       398       333      2242
    C/C++ Header                12       322       614      1259
    Bourne Shell                36       298       366      1094
    CSS                         13        55        65      1012
    MSBuild script               3         0         0       463
    YAML                         4        34        44       241
    Markdown                     3        69         0       187
    INI                          4        27         0       120
    make                         3        31        14        96
    Mako                         1        10         0        20
    Bourne Again Shell           1         7        13        13
    Dockerfile                   1         4         2        11
    ------------------------------------------------------------
    SUM:                       785     16284     13838     90983
    ------------------------------------------------------------

At the end of 2018, we had this:

    ------------------------------------------------------------
    Language                 files     blank   comment      code
    ------------------------------------------------------------
    Python                     296      8493      6708     41107
    C++                         11       827       474      6095
    JSON                        92        21         0      4296
    HTML                        50       484        19      4270
    JavaScript                  37       624       773      3368
    LESS                        36       287        51      2712
    C/C++ Header                12       322       614      1259
    CSS                          3        27        53       704
    MSBuild script               3         0         0       463
    Bourne Shell                21       173       263       449
    YAML                         3        28        33       226
    make                         3        36        15       142
    Dockerfile                   1        14        12        35
    INI                          1         0         0         8
    ------------------------------------------------------------
    SUM:                       569     11336      9015     65134
    ------------------------------------------------------------

We're doing roughly the same stuff, but with less code. I don't think we're going to have another year of drastic code reduction, but it's likely we'll remove some more in 2019 as we address the last couple of technical debt projects.
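For a sense of scale, here's a quick sketch that works out the reduction from the SUM rows of the two cloc listings above. The numbers are copied straight from those rows; the helper name is just something made up for this example.

```python
# Totals copied from the SUM rows of the two cloc listings above.
START_2018 = {"files": 785, "code": 90983}
END_2018 = {"files": 569, "code": 65134}


def reduction(before, after):
    """Return (absolute drop, percentage drop) going from before to after."""
    removed = before - after
    return removed, 100 * removed / before


code_removed, code_pct = reduction(START_2018["code"], END_2018["code"])
files_removed, files_pct = reduction(START_2018["files"], END_2018["files"])

print(f"lines of code removed: {code_removed} ({code_pct:.1f}%)")
print(f"files removed: {files_removed} ({files_pct:.1f}%)")
```

That works out to a bit more than a quarter of the code and a bit more than a quarter of the files.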