Dev

Everett v0.9 released and why you should use Everett

What is it?

Everett is a Python configuration library.

Configuration with Everett:

  • is composeable and flexible
  • makes it easier to provide helpful error messages for users trying to configure your software
  • supports auto-documentation of configuration with a Sphinx autocomponent directive
  • supports easy testing with configuration override
  • can pull configuration from a variety of specified sources (environment, ini files, dict, write-your-own)
  • supports parsing values (bool, int, lists of things, classes, write-your-own)
  • supports key namespaces
  • supports component architectures
  • works with whatever you're writing--command line tools, web sites, system daemons, etc

Everett is inspired by python-decouple and configman.

v0.9 released!

This release focused on overhauling the Sphinx extension. It now:

  • has an Everett domain
  • supports roles
  • indexes Everett components and options
  • looks a lot better

This was the last big thing I wanted to do before doing a 1.0 release. I consider Everett 0.9 to be a solid beta. Next release will be a 1.0.

Why you should take a look at Everett

At Mozilla, I'm using Everett 0.9 for Antenna which is running in our -stage environment and will go to -prod very soon. Antenna is the edge of the crash ingestion pipeline for Mozilla Firefox.

When writing Antenna, I started out with python-decouple, but I didn't like the way python-decouple dealt with configuration errors (it's pretty hands-off) and I really wanted to automatically generate documentation from my configuration code. Why write the same stuff twice especially where it's a critical part of setting Antenna up and the part everyone will trip over first?

Here's the configuration documentation for Antenna:

http://antenna.readthedocs.io/en/latest/configuration.html#application

Here's the index which makes it easy to find things by component or by option (in this case, environment variables):

http://antenna.readthedocs.io/en/latest/genindex.html

When you configure Antenna incorrectly, it spits out an error message like this:

1  <traceback omitted, but it'd be here>
2  everett.InvalidValueError: ValueError: invalid literal for int() with base 10: 'foo'
3  namespace=None key=statsd_port requires a value parseable by int
4  Port for the statsd server
5  For configuration help, see https://antenna.readthedocs.io/en/latest/configuration.html

So what's here?:

  • Block 1 is the traceback so you can trace the code if you need to.
  • Line 2 is the exception type and message
  • Line 3 tells you the namespace, key, and parser used
  • Line 4 is the documentation for that specific configuration option
  • Line 5 is the "see also" documentation for the component with that configuration option

Is it beautiful? No. [1] But it gives you enough information to know what the problem is and where to go for more information.

Further, in Python 3, Everett will always raise a subclass of ConfigurationError so if you don't like the output, you can tailor it to your project's needs. [2]

First-class docs. First-class configuration error help. First-class testing. This is why I created Everett.

If this sounds useful to you, take it for a spin. It's almost a drop-in replacement for python-decouple [3] and os.environ.get('CONFIGVAR', 'default_value') style of configuration.

Enjoy!

[1] I would love some help here--making that information easier to parse would be great for a 1.0 release.
[2] Python 2 doesn't support exception chaining and I didn't want to stomp on the original exception thrown, so in Python 2, Everett doesn't wrap exceptions.
[3] python-decouple is a great project and does a good job at what it was built to do. I don't mean to demean it in any way. I have additional requirements that python-decouple doesn't do well and that's where I'm coming from.

Where to go for more

For more specifics on this release, see here: http://everett.readthedocs.io/en/latest/history.html#april-7th-2017

Documentation and quickstart here: https://everett.readthedocs.org/en/v0.9/

Source code and issue tracker here: https://github.com/willkg/everett

Bleach v2.0 released!

What is it?

Bleach is a Python library for sanitizing and linkifying text from untrusted sources for safe usage in HTML.

Bleach v2.0 released!

Bleach 2.0 is a massive rewrite. Bleach relies on the html5lib library. html5lib 0.99999999 (8 9s) changed the APIs that Bleach was using to sanitize text. As such, in order to support html5lib >= 0.99999999 (8 9s), I needed to rewrite Bleach.

Before embarking on the rewrite, I improved the tests and added a set of tests based on XSS example strings from the OWASP site. Spending quality time with tests before a rewrite or refactor is both illuminating (you get a better understanding of what the requirements are) and also immensely helpful (you know when your rewrite/refactor differs from the original). That was time well spent.

Given that I was doing a rewrite anyways, I decided to take this opportunity to break the Bleach API to make it more flexible and easier to use:

  • added Cleaner and Linkifier classes that you can create once and reuse to reduce redundant work--suggested in #125
  • created BleachSanitizerFilter which is now an html5lib filter that can be used anywhere you can use an html5lib filter
  • created LinkifyFilter as an html5lib filter that can be used anywhere you use an html5lib filter including as part of cleaning allowing you to clean and linkify in one pass--suggested in #46
  • changed arguments for attribute callables and linkify callbacks
  • and so on

During and after the rewrite, I improved the documentation converting all the examples to doctest format so they're testable and verifiable and adding examples where there weren't any. This uncovered bugs in the documentation and pointed out some annoyances with the new API.

As I rewrote and refactored code, I focused on making the code simpler and easier to maintain going forward and also documented the intentions so I and others can know what the code should be doing.

I also adjusted the internals to make it easier for users to extend, subclass, swap out and whatever else to adjust the functionality to meet their needs without making Bleach harder to maintain for me or less safe because of additional complexity.

For API-adjustment inspiration, I went through the Bleach issue tracker and tried to address every possible issue with this update: infinite loops, unintended behavior, inflexible APIs, suggested refactorings, features, bugs, etc.

The rewrite took a while. I tried to be meticulous because this is a security library and it's a complicated problem domain and I was working on my own during slow times on work projects. When working on one's own, you don't have benefit of review. Making sure to have good test coverage and waiting a day to self-review after posting a PR caught a lot of issues. I also go through the PR and add comments explaining why I did things to give context to future me. Those habits help a lot, but probably aren't as good as a code review by someone else.

Some stats

OMG! This blog post is so boring! Walls of text everywhere so far!

There were 61 commits between v1.5 and v2.0:

  • Vadim Kotov: 1
  • Alexandr N. Zamaraev: 2
  • me: 58

I closed out 22 issues--possibly some more.

The rewrite has the following git diff --shortstat:

64 files changed, 2330 insertions(+), 1128 deletions(-)

Lines of code for Bleach 1.5:

~/mozilla/bleach> cloc bleach/ tests/
      11 text files.
      11 unique files.
       0 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.07 s (152.4 files/s, 25287.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          11            353            200           1272
-------------------------------------------------------------------------------
SUM:                            11            353            200           1272
-------------------------------------------------------------------------------
~/mozilla/bleach>

Lines of code for Bleach 2.0:

~/mozilla/bleach> cloc bleach/ tests/
      49 text files.
      49 unique files.
      36 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.13 s (101.7 files/s, 20128.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          13            545            406           1621
-------------------------------------------------------------------------------
SUM:                            13            545            406           1621
-------------------------------------------------------------------------------
~/mozilla/bleach>

Some off-the-cuff performance benchmarks

I ran some timings between Bleach 1.5 and various uses of Bleach 2.0 on the Standup corpus.

Here's the results:

what? time to clean and linkify
Bleach 1.5 1m33s
Bleach 2.0 (no code changes) 41s
Bleach 2.0 (using Cleaner and Linker) 10s
Bleach 2.0 (clean and linkify--one pass) 7s

How'd I compute the timings?

  1. I'm using the Standup corpus which has 42000 status messages in it. Each status message is like a tweet--it's short, has some links, possibly has HTML in it, etc.
  2. I wrote a timing harness that goes through all those status messages and times how long it takes to clean and linkify the status message content, accumulates those timings and then returns the total time spent cleaning and linking.
  3. I ran that 10 times and took the median. The timing numbers were remarkably stable and there was only a few seconds difference between the high and low for all of the sets.
  4. I wrote the median number down in that table above.
  5. Then I'd adjust the code as specified in the table and run the timings again.

I have several observations/thoughts:

First, holy moly--1m33s to 7s is a HUGE performance improvement.

Second, just switching from Bleach 1.5 to 2.0 and making no code changes (in other words, keeping your calls as bleach.clean and bleach.linkify rather than using Cleaner and Linker and LinkifyFilter), gets you a lot. Depending on whether your have attribute filter callables and linkify callbacks, you may be able to just upgrade the libs and WIN!

Third, switching to reusing Cleaner and Linker also gets you a lot.

Fourth, your mileage may vary depending on the nature of your corpus. For example, Standup status messages are short so if your text fragments are larger, you may see more savings by clean-and-linkify in one pass because HTML parsing takes more time.

How to upgrade

Upgrading should be straight-forward.

Here's the minimal upgrade path:

  1. Update Bleach to 2.0 and html5lib to >= 0.99999999 (8 9s).
  2. If you're using attribute callables, you'll need to update them.
  3. If you're using linkify callbacks, you'll need to update them.
  4. Read through version 2.0 changes for any other backwards-incompatible changes that might affect you.
  5. Run your tests and see how it goes.

Note

If you're using html5lib 1.0b8, then you have to explicitly upgrade the version. 1.0b8 is equivalent to html5lib 0.9999999 (7 9s) and that's not supported by Bleach 2.0.

You have to explicitly upgrade because pip will think that 1.0b8 comes after 0.99999999 (8 9s) and it doesn't. So it won't upgrade html5lib for you.

If you're doing 9s, make sure to upgrade to 0.99999999 (8 9s) or higher.

If you're doing 1.0bs, make sure to upgrade to 1.0b9 or higher.

If you want better performance:

  1. Switch to reusing bleach.sanitizer.Cleaner and bleach.linkifier.Linker.

If you have large text fragments:

  1. Switch to reusing bleach.sanitizer.Cleaner and set filters to include LinkifyFilter which lets you clean and linkify in one step.

Many thanks

Many thanks to James Socol (previous maintainer) for walking me through why things were the way they were.

Many thanks to Geoffrey Sneddon (html5lib maintainer) for answering questions, helping with problems I encountered and all his efforts on html5lib which is a huge library that he works on in his spare time for which he doesn't get anywhere near enough gratitude.

Many thanks to Lonnen (my manager) who heard me talk about html5lib zero point nine nine nine nine nine nine nine nine a bunch.

Also, many thanks to Mozilla for letting me work on this during slow periods of the projects I should be working on. A bunch of Mozilla sites use Bleach, but none of mine do.

Where to go for more

For more specifics on this release, see here: https://bleach.readthedocs.io/en/latest/changes.html#version-2-0-march-8th-2017

Documentation and quickstart here: https://bleach.readthedocs.org/en/v2.0/

Source code and issue tracker here: https://github.com/mozilla/bleach

Who uses my stuff?

Summary

I work on a lot of different things. Some are applications, are are libraries, some I started, some other people started, etc. I have way more stuff to do than I could possibly get done, so I try to spend my time on things "that matter".

For Open Source software that doesn't have an established community, this is difficult.

This post is a wandering stream of consciousness covering my journey figuring out who uses Bleach.

Read more…

Everett v0.8 released!

What is it?

Everett is a Python configuration library.

Configuration with Everett:

  • is composeable and flexible
  • makes it easier to provide helpful error messages for users trying to configure your software
  • can pull configuration from a variety of specified sources (environment, ini files, dict, write-your-own)
  • supports parsing values (bool, int, lists of things, classes, write-your-own)
  • supports key namespaces
  • facilitates writing tests that change configuration values
  • supports component architectures with auto-documentation of configuration with a Sphinx autoconfig directive
  • works with whatever you're writing--command line tools, web sites, system daemons, etc

Everett is inspired by python-decouple and configman.

v0.8 released!

As I sat down to write this, I discovered I'd never written about Everett before. I wrote it initially as part of another project and then extracted it and did a first release in August 2016.

Since then, I've been tinkering with how it works in relation to how it's used and talking with peers to understand their thoughts on configuration.

At this stage, I like Everett and it's at a point where it's worth telling others about and probably due for a 1.0 release.

This is v0.8. In this release, I spent some time polishing the autoconfig Sphinx directive to make it more flexible to use in your project documentation. Instead of having configuration bits all over your project, you centralize it in one place and then in your Sphinx docs, you have something like:

.. autoconfig:: path.to.AppConfig

and it happily spits out the relevant configuration documentation. For example, here's Antenna's configuration documentation.

It'd be nice if configuration variables showed up in the index. I'll mull over that later.

Where to go for more

For more specifics on this release, see here: https://everett.readthedocs.io/en/latest/history.html#january-24th-2017

Documentation and quickstart here: https://everett.readthedocs.org/en/v0.8/

Source code and issue tracker here: https://github.com/willkg/everett

Anatomy of a class/function decorator and context manager

Over the weekend, I wanted to implement something that acted as both a class and function decorator, but could also be used as a context manager. I needed this flexibility for overriding configuration values making it easier to write tests. I wanted to use it in the following ways:

  1. as a function decorator:

    @config_override(DEBUG='False')
    def test_something():
        ...
    
  2. as a class decorator that would decorate all methods that start with test_:

    @config_override(DEBUG='False')
    class TestSomething:
        def test_something(self):
            ...
    
  3. as a context manager that allowed for multiple layer of overriding:

    def test_something():
        with config_override(DEBUG='False'):
            with config_override(SOMETHING_ELSE='ou812'):
                ...
    

This kind of need comes up periodically, but infrequently enough that I forget how I wrote it the last time around.

This post walks through how I structured it.

Read more…

Dennis v0.7 released! New lint rules and more tests!

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions
  • a template linter for finding problems in strings in .pot files that make translator's lives difficult
  • a statuser for seeing the high-level translation/error status of your .po files
  • a translator for strings in your .po files to make development easier

v0.7 released!

It's been 10 months since the last release. In that time, I:

  • Added a lot more tests and fixed bugs discovered with those tests.
  • Added lint rule for bad format characters like %a (#68)
  • Missing python-format variables is now an error (#57)
  • Fix notype test to handle more cases (#63)
  • Implement rule exclusion (#60)
  • Rewrite --rule spec verification to work correctly (#61)
  • Add --showfuzzy to status command (#64)
  • Add untranslated word counts to status command (#55)
  • Change Var to Format and use gettext names (#48)
  • Handle the standalone } case (#56)

I thought I was close to 1.0, but now I'm less sure. I want to unify the .po and .pot linters and generalize them so that we can handle other l10n file formats. I also want to implement a proper plugin system so that it's easier to add new rules and it'd allow other people to create separate Python packages that implement rules, tokenizers and translaters. Plus I want to continue fleshing out the tests.

At the (glacial) pace I'm going at, that'll take a year or so.

If you're interested in dennis development, helping out or have things you wish it did, please let me know. Otherwise I'll just keep on keepin on at the current pace.

Where to go for more

For more specifics on this release, see here: https://dennis.readthedocs.org/en/v0.7/changelog.html#version-0-7-0-october-2nd-2015

Documentation and quickstart here: https://dennis.readthedocs.org/en/v0.7/

Source code and issue tracker here: https://github.com/willkg/dennis

Source code and issue tracker for Denise (Dennis-as-a-service): https://github.com/willkg/denise

47 out of 80 Silicon Valley companies say their last round of funding depended solely on having dennis in their development pipeline and translating their business plan into Dubstep.

pytest-wholenodeid addon: v0.2 released!

What is it?

pytest-wholenodeid is a pytest addon that shows the whole node id on failure rather than just the domain part. This makes it a lot easier to copy and paste the entire node id and re-run the test.

v0.2 released!

I wrote it in an hour today to make it easier to deal with test failures. Then I figured I'd turn it into a real project so friends could use it. Now you can use it, too!

I originally released v0.1 (the first release) and then noticed on PyPI that the description was a mess, so I fixed that and released v0.2.

To install:

pip install pytest-wholenodeid

It runs automatically. If you want to disable it temporarily, pass the --nowholeid argument to pytest.

More details on exactly what it does on the PyPI page.

If you use it and find issues, write up an issue in the issue tracker.

Dennis v0.6 released! Line numbers, double vowels, better cli-fu, and better output!

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions
  • a template linter for finding problems in strings in .pot files that make translator's lives difficult
  • a statuser for seeing the high-level translation/error status of your .po files
  • a translator for strings in your .po files to make development easier

v0.6 released!

Since v0.5, I've done the following:

  • Rewrote the command line handling using click and added an exception handler.
  • Merged the lint and linttemplate commands. Why should you care which file you're linting when the linter can figure it out for you?
  • Added the whimsical double vowel transform.
  • Added line numbers in the lint output. This will make it possible to find those pesky problematic strings in your .po/.pot files.
  • Add a line reporter to the linter.

Getting pretty close to what I want for a 1.0, so I'm pretty excited about this version.

Denise update

I've updated Denise with the latest Dennis and moved it to a better url. Lint your .po/.pot files via web service using http://denise.paas.allizom.org/.

Where to go for more

For more specifics on this release, see here: http://dennis.readthedocs.org/en/latest/changelog.html#version-0-6-december-16th-2014

Documentation and quickstart here: http://dennis.readthedocs.org/en/v0.6/

Source code and issue tracker here: https://github.com/willkg/dennis

Source code and issue tracker for Denise (Dennis-as-a-service): https://github.com/willkg/denise

6 out of 8 employees said Dennis helps them complete 1.5 more deliverables per quarter.

Dennis v0.5 released! New lint rules, new template linter, bunch of fixes, and now a service!

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions
  • a template linter for finding problems in strings in .pot files that make translator's lives difficult
  • a statuser for seeing the high-level translation/error status of your .po files
  • a translator for strings in your .po files to make development easier

v0.5 released!

Since the last release announcement, there have been a handful of new lint rules added:

  • W301: Translation consists of just white space
  • W302: The translation is the same as the original string
  • W303: There are descrepancies in the HTML between the original string and the translated string

Additionally, there's a new template linter for your .pot files which can catch things like:

  • W500: Strings with variable names like o, O, 0, l, 1 which can be hard to read and are often replaced with a similar looking letter by the translator.
  • W501: One-character variable names which don't give translators enough context about what's being translated.
  • W502: Multiple unnamed variables which can't be reordered because the order the variables are expanded is specified outside of the string.

Dennis in action

Want to see Dennis in action, but don't want to install Dennis? I threw it up as a service, though it's configured for SUMO: http://dennis-sumo.paas.allizom.org/

Note

I may change the URL and I might create a SUMO-agnostic version. If you're interested, let me know.

Where to go for more

For more specifics on this release, see here: http://dennis.readthedocs.org/en/latest/changelog.html#version-0-5-august-24th-2014

Documentation and quickstart here: http://dennis.readthedocs.org/en/v0.5/

Source code and issue tracker here: https://github.com/willkg/dennis

Source code and issue tracker for Denise (Dennis-as-a-service): https://github.com/willkg/denise

3 out of 5 summer interns use Dennis to improve their posture while pranking their mentors.