Will's blog

purpose: Will Kahn-Greene's blog of Miro, PyBlosxom, Python, GNU/Linux, random content, and other projects mixed in there ad hoc, half-baked, and with a twist of lemon

Sun, 20 Jul 2014

Input status: July 20th, 2014

Summary

This is the status report for development on Input. I publish a status report to the input-dev mailing list every couple of weeks or so covering what was accomplished and by whom and also what I'm focusing on over the next couple of weeks. I sometimes ruminate on some of my concerns. I think one time I told a joke.

Last status report was at the end of June. This status report covers the last few things we landed in 2014q2 as well as everything we've done so far in 2014q3.

Development

Landed and deployed:

  • 6ecd0ce [bug 1027108] Change default doc theme to mozilla sphinx (Anna Philips)
  • 070f992 [bug 1030526] Add cors; add api feedback get view
  • f6f5bc9 [bug 1030526] Explicitly declare publicly-visible fields
  • c243b5d [bug 1027280] Add GengoHumanTranslater.translate; cleanup
  • 3c9cdd1 [bug 1027280] Add human tests; overhaul Gengo tests
  • ff39543 [bug 1027280] Add support for the Gengo sandbox
  • 258c0b5 [bug 1027280] Add test for get_balance
  • 44dd8e5 [bug 1027280] Implement Gengo Human push_translations
  • 35ae6ec [bug 1027280] Clean up API code
  • a7bf90a [bug 1027280] Finish pull_translations and tests
  • c9db147 [bug 1027286] Gengo translation system status
  • f975f3f [bug 1027291] Implement spot Gengo human translation
  • f864b6b [bug 1027295] Add translation_sync cron job
  • c58fd44 [bug 1032226] en-GB should copyover, too
  • 7480f87 [bug 1032226] Tweak the code to be more defensive
  • 7ac1114 [bug 1032571] CSRF exempt the API
  • ac856eb [bug 1032571] Fix tests to catch csrf issues in the api
  • 74e8e09 [bug 1032967] Handle unsupported language pairs
  • 74a409e [bug 1026503] First pass at vagrantification
  • a7a440f Continued working on docs; ditched hacking howto
  • 44e702b [bug 1018727] Backfill translations
  • 69f9b5b Fix date_end issue
  • e59d4f6 [bug 1033852] Better handle unsupported src languages
  • cc3c4d7 Add list of unsupported languages to admin
  • 32e7434 [bug 1014874] Fix translate ux
  • 672abba [bug 1038774] Hide responses from hidden products
  • e23eca5 Fix a goof in the last commit
  • 6f78e2e [bug 947767] Nix authentication for API stuff
  • a9f2179 Fix response view re: non-existent products
  • e4c7c6c [Bug 1030905] fjord feedback api tests for dates (Ian Kronquist)
  • 0d8e024 [bug 935731] Add FactoryBoy
  • 646156f Minor fixes to the existing API docs
  • f69b58b [bug 1033419] Heartbeat backend prototype
  • f557433 [bug 1033419] Add docs for heartbeat posting

Landed, but not deployed:

  • 7c7009b [bug 935731] Switch all tests to use FactoryBoy
  • 2351fb5 Generate locales so ubuntu will quite whining (Ian Kronquist)

Current head: 7ea9fc3

High-level

At a high level, this is:

  1. Landed automated Gengo human translation and a bunch of minor fixes to make it work more smoothly.
  2. Reworked how we build development environments to use vagrant. This radically simplifies the instructions and should make it a lot easier for contributors to build a development environment. This in turn should lead to more people working on Input.
  3. Fixed a bug where products marked as "hidden" were still showing up in the dashboard.
  4. Implemented a GET API for Input responses. (https://wiki.mozilla.org/Firefox/Input/Dashboards_for_Everyone)
  5. Implemented the backend for the Heartbeat prototype. (https://wiki.mozilla.org/Firefox/Input/Heartbeat)
  6. Also, I'm fleshing out the Input section in the wiki complete with project plans. (https://wiki.mozilla.org/Firefox/Input)

Over the next two weeks

  1. Continue fleshing out project plans for in-progress projects on the wiki.
  2. Gradient sentiment and product picker work.

What I need help with

  1. We have a new system for setting up development environments. I've tested it on Linux. Ian has, too (pretty sure he's using Linux). We could use some help testing it on Windows and Mac OSX.

Do the instructions work on Windows? Do the instructions work on Mac OSX? Are there important things the instructions don't cover? Is there anything confusing?

http://fjord.readthedocs.org/en/latest/getting_started.html

  2. I'm changing the way I'm managing Fjord development. All project plans will be codified in the wiki. A rough roadmap of which projects are on the drawing board, in progress, completed, etc. is also on the wiki. I threw together a structure for all of this that I think is good, but it could use some review.

Do these project plans provide useful information? Are there important questions that need answering that the plans do not answer?

https://wiki.mozilla.org/Firefox/Input

If you're interested in helping, let me know! We hang out on #input on irc.mozilla.org and there's the input-dev mailing list.

I think that covers it!

Tue, 01 Jul 2014

Input: 2014q2 post-mortem

I'm going to start doing quarterly post-mortems for Input development. The goal is to be more communicative about what happened, why, what's in the works and what I need more help with.

NB: "Fjord" is the name of the codebase that runs Input.

Bug and git stats

Bugzilla
========

Bugs created:        63
Bugs fixed:          54

git
===

Total commits: 151

      Will Kahn-Greene : 142  (+14758, -4599, files 438)
        ossreleasefeed : 3  (+197, -42, files 9)
          Anna Philips : 2  (+734, -6, files 24)
          Joshua Smith : 2  (+65, -31, files 5)
     Swarnava Sengupta : 1  (+2, -2, files 1)
         Ricky Rosario : 1  (+0, -0, files 0)

Total lines added: 15756
Total lines deleted: 4680
Total files changed: 477

We added a lot of lines of code this quarter:

  • April 1st, 2014: 15195 total, 6953 Python
  • July 1st, 2014: 20456 total, 9247 Python

That's a pretty big jump in LOC: roughly 5,200 lines total, about 2,300 of them Python. I think a bunch of that is the translation-related changes.

Contributor stats

5 non-core people contributed to Fjord development.

I spent some time over the weekend finishing up the Vagrant provisioning script and rewriting the docs. I'm planning to spend some more time in 2014q3 reducing the complexity and barriers of setting up a Fjord development environment to the point where someone can contribute.

Additionally, I'm planning to create more bugs that are contributor-friendly. I started doing that in the last week. I think a good goal for Input is to have around 20 contributor-y bugs hanging around at any given time.

Accomplishments

Site health dashboard: I wrote a mediocre site health dashboard that's good enough to give me a feel for how the site is performing before and after a deployment. This still needs some work, but I'll schedule that for a rainy day.

Client side smoke tests: I wrote smoke tests for the client side. I based them on the defunct input-tests code that QA was maintaining up until we rewrote Input. There are still a bunch of tests that I want to write to have better coverage of things, but having something is way better than nothing. I'm hoping the smoke tests will reduce the amount of manual testing I'm doing, too.

Vagrant: I took some inspiration from Erik Rose and DXR and wrote a Vagrant provisioning shell script. This includes a docs overhaul as well. This work is almost done, but needs some more testing and will probably land in the next week or two. This will make people's lives easier.

Automated translation system (human and machine): I wrote an automated translation system. It's generalized so that it isn't model/field specific. It's also generalized so that we can add plugins for other translation systems. It's currently got plugins for Dennis, Gengo machine translation and Gengo human translation. I turned the automated human translation on yesterday and it seems to be working well. That was a HUGE project. I'm glad it's done.

One thing it includes is a lot of auditing and metrics gathering. This will make it possible for me to go back in time and look at how the translation system worked on various Input feedback responses and hone the system going forward to reduce the number of human translations we're doing and also reduce the number of problems we have doing them.
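
To give a flavor of the shape of it, here's a rough sketch of what a translation plugin looks like. The class and method names below are illustrative rather than Fjord's actual code:

class TranslationSystem(object):
    """Base class that translation plugins implement."""
    name = ''

    def translate(self, instance, src_lang, src_field, dst_lang, dst_field):
        """Translate one field of one object right now, if supported."""
        raise NotImplementedError

    def push_translations(self):
        """Send queued-up strings off to the backing service."""
        raise NotImplementedError

    def pull_translations(self):
        """Fetch completed translations and save them on the objects."""
        raise NotImplementedError


class FakeGengoHumanTranslator(TranslationSystem):
    """Illustrative stand-in for a Gengo human translation plugin."""
    name = 'gengo_human'

    def push_translations(self):
        # Batch untranslated strings into a Gengo order here.
        pass

    def pull_translations(self):
        # Poll Gengo for finished jobs and copy the text into the
        # destination field.
        pass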

Better query syntax: We were upgraded to Elasticsearch 0.90.10. I switched the query syntax for the dashboard search field to use Elasticsearch simple_query_string. That allows users to express search queries they weren't previously able to express.

utm_source and utm_campaign handling: I finished the support for handling utm_source and utm_campaign querystring parameters. This allows us to differentiate between organic feedback and non-organic feedback.

More like this: I added a "more like this" section to the response view. This makes it possible for UA analyzers to look at a response and see other responses that are similar.

Dashboards for you, dashboards for everyone!

I'm putting this in its own section because it's intriguing. I'll write another blog post about it later in July as things gel.

On Thursday, a couple of days after the d3 training that Matt organized, I threw together a better GET API for Input feedback responses. It's not documented, it probably has some bugs and it's probably going to change a bit, but the gist of it is that it lets you more easily build a dashboard that meets your needs against live Input data.

Here's a proof-of-concept:

http://bl.ocks.org/willkg/c4d5a272f86ae4510750

That's looking at live Input data using the new GET API. The code is in a GitHub gist. It auto-updates every 2 minutes.
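
If you'd rather poke at the data from Python than from d3, it's just an HTTP GET that returns JSON. Here's a minimal sketch; treat the endpoint path, parameters and field names as illustrative, since the API is undocumented and still changing, and check the gist for the current details:

import requests

# Endpoint and parameters are illustrative; the API is undocumented and
# likely to change, so the gist is the source of truth.
resp = requests.get(
    'https://input.mozilla.org/api/v1/feedback/',
    params={'products': 'Firefox', 'locales': 'en-US'}
)
data = resp.json()

for item in data['results'][:10]:
    print item['created'], item['happy'], item['description'][:60]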

The problem is that I've got a ton of Input work to do and I just can't write dashboard code on Input fast enough. Further, the people I've talked to who use the front page dashboard all have really different questions they're asking of the data. I'm hoping this alleviates that bottleneck by letting you and everyone else write dashboards that meet your needs.

I encourage you to take my proof-of-concept, fork the gist, tweak it, use bl.ocks.org or something to "host" the gist. Build the dashboard that answers your questions. Share it with other people. Plus, let me know about it. If you have issues with the API, submit a bug and tell me.

If this scratches the itch I think needs scratching, it should result in a bunch of interesting dashboards. If that happens, I'll write some code in Input to create a curated list of them so people can find them more easily.

Summary

This was a really crazy quarter and parts of it really sucked, but we got a lot accomplished and we laid some groundwork for some really interesting things for 2014q3.

Mon, 23 Jun 2014

Input status: June 23rd, 2014

I publish a status report to the input-dev mailing list every couple of weeks or so covering what was accomplished and by whom and also what I'm focusing on over the next couple of weeks. I sometimes ruminate on some of my concerns. I think one time I told a joke.

Since the last report:

Landed and deployed:

Landed, but not deployed:

HEAD: 1d9e67a

Mostly I spent the last couple of weeks working on automated Gengo human translation support. This involved some infrastructure rewriting plus some additional infrastructure so that when we push all this out, we can see what's going on as it is happening.

Additionally, I went through and updated the mentor metadata for mentored bugs, added a bunch of new mentored bugs and worked with two potential contributors on them.

Over the next week (last week in 2014q2):

  1. finish up automated Gengo human translation work

First thing in 2014q3, I'll spend some time "opening up" the development side of the project. This will make it easier/possible to follow and participate in development. I'm still figuring out some of the details and it's likely I'll continue to change how things work over the course of the quarter, but I plan to follow advice from the Community Building team and from Erik Rose, who seems to be doing really well with DXR.

(Updated) Using the bug_mentor field with the Bugzilla REST API to get mentored bugs

In my previous post, I mentioned how Bugzilla grew a mentor field and had some code on how to query Bugzilla using the old and new APIs for the list of mentored bugs. This updates that information.

Gerv and Byron pointed out there's an isnotempty operator that's better than the approach I cobbled together to query for bugs that have data in the bug_mentor field.

Thus, the code should look like this:

import requests

# Using the old BzAPI: https://wiki.mozilla.org/Bugzilla:REST_API
r = requests.get(
    'https://api-dev.bugzilla.mozilla.org/latest' +
    '/bug?product=Input&f1=bug_mentor&o1=isnotempty'
)
data = r.json()
print len(data['bugs'])  # Prints 9


# Using the new BMO API. https://wiki.mozilla.org/BMO/REST
r = requests.get(
    'https://bugzilla.mozilla.org/rest' +
    '/bug?product=Input&f1=bug_mentor&o1=isnotempty'
)
data = r.json()
print len(data['bugs'])  # Prints 9

Thu, 19 Jun 2014

Using the bug_mentor field with the Bugzilla REST API to get mentored bugs

Updated June 23rd, 2014:
There's a better way to query the data. See the updated blog post above.

Bugzilla grew a mentor field recently. This is really fantastic as it solves some interesting problems and makes it easier to track various aspects of mentoring which have been previously difficult to track. Yay to everyone involved in making that happen!

Migrating from the old way (sticking [mentor=xxx] in the whiteboard field) to the new way caused a problem that I spent a while working on today. I heard reports of other people having the same problem, hence this blog post.

There are a bunch of Bugzilla-symbiotic systems which would show a list of mentored bugs by checking to see if the string mentor= was in the whiteboard field. That no longer works. Instead we have to check to see if the bug_mentor field is empty. However, this is difficult to express with the old Bugzilla REST API (BzAPI).

The bug_mentor field is unique in that it holds email addresses which have the @ in them. So we can (ab)use this property by seeing if the bug_mentor field contains the @ character.

I did this with the GetInvolved/input.mozilla.org page. Here is the diff in case that's helpful.

Here's some Python that shows this with the old BzAPI and the new BMO API which pulls mentored bugs for Input:

import requests

# Using the old BzAPI: https://wiki.mozilla.org/Bugzilla:REST_API
r = requests.get(
    'https://api-dev.bugzilla.mozilla.org/latest' +
    '/bug?product=Input&bug_mentor_type=contains&bug_mentor=@'
)
data = r.json()
print len(data['bugs'])  # Prints 9


# Using the new BMO API. https://wiki.mozilla.org/BMO/REST
r = requests.get(
    'https://bugzilla.mozilla.org/rest' +
    '/bug?product=Input&bug_mentor_type=substring&bug_mentor=@'
)
data = r.json()
print len(data['bugs'])  # Prints 9

Wed, 14 May 2014

Fiddling with Kibana

I just kicked off a script that's going to take around 4 hours to complete mostly because the API it's running against doesn't want me doing more than 60 requests/minute. Given I've got like 13k requests to do, that takes a while.

I'm (ab)using Elasticsearch to store the data from my script so that I can analyze it more easily--terms facet is pretty handy here.
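
The script itself is nothing fancy: do a request, index the result into Elasticsearch, sleep long enough to stay under the rate limit, repeat. Roughly this, where the API, index and doc type are made up for illustration; 13k requests at one per second works out to about three and a half hours, which is where the 4-hour estimate comes from:

import json
import time

import requests

# Made-up index/doc type for illustration.
ES_URL = 'http://localhost:9200/scriptdata/item/'

# Stand-in for the real list of ~13k ids I need to fetch.
ids_to_fetch = range(13000)

for item_id in ids_to_fetch:
    # Made-up API url; the real one is what's limiting me to 60
    # requests/minute.
    resp = requests.get('https://api.example.com/items/%s' % item_id)
    doc = resp.json()

    # Toss each response into Elasticsearch so I can use terms facets
    # on it later.
    requests.post(ES_URL, data=json.dumps(doc))

    # One request per second keeps it under 60 requests/minute.
    time.sleep(1.0)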

Since I've got some free time now, I spent 5 minutes setting up Kibana.

Steps:

  1. download the tarball
  2. untar it into a directory
  3. edit kibana-3.0.1/config.js to point to my local Elasticsearch cluster (the defaults were fine, so I could have skipped this step)
  4. cd kibana-3.0.1/ and run python -m SimpleHTTPServer 5000 (I'm using a Python-y thing here, but you can use any web server)
  5. point my browser to http://localhost:5000

Now I'm using Kibana.

Now that I've got it working, the first thing I do is click on the cog in the upper right-hand corner, click on the Index tab, and change the index to the one I want to look at. Now I'm looking at the data my script is producing.

The Kibana site says Kibana excels at timestamped data, but I think it's helpful for what I'm looking at now despite it not being timestamped. I get immediate terms facets on the fields for the doc type I'm looking at. I can run queries, pick specific columns, reorder, do graphs, save my dashboard to look at later, etc.

If you're doing Elasticsearch stuff, it's worth looking at if only to give you another tool to look at data with.

Tue, 13 May 2014

Input: changed query syntax across the site

Better search syntax is here!

Yesterday I landed the changes for bug 986589 which affect all the search boxes and search feeds on Input. Now they use the Elasticsearch simple_query_string query instead of the hand-rolled query parser I wrote.

This was only made possible in the last month after we were updated from Elasticsearch 0.20.6 (or whatever it was) to 0.90.10.

Tell me more about this ... syntax.

I'm pretty psyched! It's pretty much the minimum required syntax for useful searching. It's kind of lame it took a year to get to this point, but so it goes.

To quote the Elasticsearch 0.90 documentation:

+  signifies AND operation
|  signifies OR operation
-  negates a single token
"  wraps a number of tokens to signify a phrase for searching
*  at the end of a term signifies a prefix query
( and )  signify precedence

Negation and prefix were the two operators my hand-rolled query parser didn't have.
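
In case it helps to see it, a search box query basically gets dropped into a simple_query_string clause and sent to Elasticsearch. A rough sketch follows; the index, doc type and field names here are illustrative, not necessarily what Fjord uses:

import json

import requests

query = {
    'query': {
        'simple_query_string': {
            'query': 'crash -flash "slow startup"',
            'fields': ['description'],
            'default_operator': 'and',
        }
    }
}

r = requests.post(
    'http://localhost:9200/input/response/_search',
    data=json.dumps(query)
)
print r.json()['hits']['total']

That example query matches responses that mention "crash" and the phrase "slow startup" but not "flash".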

What does this mean for you?

It means that you need to use the new syntax for searches on the dashboard and other parts of the site.

Further, this affects feeds, so if you're using the Atom feed, you'll probably need to update the search query there, too.

Also, we added a ? next to search boxes which links to a wiki page that documents the syntax with examples. It's a wiki page, so if the documentation is subpar or it's missing examples, feel free to let me know or fix it yourself.

Thu, 01 May 2014

Dennis v0.4 released! Tweaks to Python 3 support, overhauled linter, string-by-string lint rules ignoring

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions
  • a statuser for seeing the high-level translation/error status of your .po files
  • a translator for strings in your .po files to make development easier

v0.4 released!

v0.4 sports an overhauled linter. Instead of two rules ("malformed" and whatever the other one was), it now has a bunch of much smaller and more specific rules! Also, I renamed the rules so they are all numbered!

See the table of error/warning rules and their numbers here: http://dennis.readthedocs.org/en/v0.4/linting.html#warnings-and-errors

Additionally, dennis hits false positives for a variety of reasons. If you're doing a "keep the errors out of production!" kind of thing, then false positives can prevent perfectly fine locale files from making it into production. That sucks!

To alleviate this, dennis now allows you to tell it what to ignore in the extracted comments. What's an extracted comment? It's a comment in the .po file that starts with "#.". You can specify the extracted comments with "context" or similar mechanisms depending on how you're extracting strings. You can tell dennis to skip specific rules or skip all the rules on a string-by-string basis.

Ignore everything:

#. dennis-ignore: *
msgid "German makes up 10% of our visitor base"
msgstr "A német a látogatóbázisunk 10%-át teszi ki"

Ignore specific rules (comma-separated):

#. dennis-ignore: E101,E102,E103
msgid "German makes up 10% of our visitor base"
msgstr "A német a látogatóbázisunk 10%-át teszi ki"

Ignore everything, but note that dennis ignores the beginning of the line, so you can tell localizers to ignore the ignore thing:

#. localizers--ignore this comment. dennis-ignore: *
msgid "German makes up 10% of our visitor base"
msgstr "A német a látogatóbázisunk 10%-át teszi ki"

I also tweaked some of the Python 3 support code because it looked at me funny.

Also, universal wheel!

For more specifics on this release, see here: http://dennis.readthedocs.org/en/v0.4/changelog.html#version-0-4-may-1st-2014

Documentation and quickstart here: http://dennis.readthedocs.org/en/v0.4/

Source code and issue tracker here: https://github.com/willkg/dennis

2 out of 10 people saw the Pirate translation on The Web We Want (Mozilla). Arrr!

Mon, 21 Apr 2014

Dennis v0.3.11 released! Fixes and Python 3 support

What is it?

Dennis is a Python command line utility (and library) for working with localization. It includes:

  • a linter for finding problems in strings in .po files like invalid Python variable syntax which leads to exceptions
  • a statuser for seeing the high-level translation/error status of your .po files
  • a translator for strings in your .po files to make development easier

v0.3.11 released!

v0.3.11 adds Python 3 support (there might be problems, but it's working for me) and adds error detection for the case where there's a } but no {.

Definitely worth updating!

8 out of 11 people who have heard of Dennis and continue to ignore its baby mews of wonderfulness also have a severe allergy to rainbows and kittens.

Fri, 18 Apr 2014

Django Eadred v0.3 released! Django app for generating sample data.

Django Eadred gives you some scaffolding for generating sample data: it makes it easier for new contributors to get up and running quickly, it helps bootstrap required database data, and it lets you generate large amounts of random data for testing graphs and things like that.
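
For flavor, here's roughly what a sample data module looks like. The app and model below are made up, and you should check the eadred docs for the exact hooks, but the gist is a per-app sampledata module that eadred's generatedata management command calls:

# myapp/sampledata.py -- app and model are made up for illustration;
# see the eadred docs for the exact hooks.
import random

from myapp.models import Widget


def generate_sampledata(options):
    """Create a pile of sample Widget rows for development and testing."""
    for i in range(100):
        Widget.objects.create(
            name='widget %d' % i,
            size=random.randint(1, 10),
        )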

The v0.3 release is a small one, but good:

There are no backwards-compatibility problems with previous versions.

To update, do:

pip install -U eadred