Will's blog

purpose: Will Kahn-Greene's blog of Miro, PyBlosxom, Python, GNU/Linux, random content, PyBlosxom, Miro, and other projects mixed in there ad hoc, half-baked, and with a twist of lemon


Wed, 20 Jun 2007

Using exiftool and Python to fix photos (edit: to order them)

S and I decided to get a wedding photographer in addition to allowing our guests to take as many photos as they wanted of all aspects of our wedding (except when we were getting dressed [and ... undressed]). There were a few reasons for this, one of which was the horror stories we'd heard about people's digital media dying and taking every picture of their wedding with it. Ick.

The problem is that there are a fajillion pictures and it's really hard to order them into a single consistent timeline. The wedding photographer we had[1] had four cameras and took some 800 pictures. My dad took another 100 or so. Other people took a bunch, too. Right now I'm working with 1200+ pictures, all of which are pretty big (between 5 MB and 10 MB each). It's not feasible to reorder them all by hand, and I didn't want to leave them unordered; my soul shudders at that thought. I needed a way to batch process the pictures from a bunch of cameras into a nice timeline.


The first thing I did was set aside all the pictures that had no EXIF information. There weren't many of them, and I don't see any way to batch process them.
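The post doesn't say how the EXIF-less pictures were found. Purely as an illustration, here is a stdlib-only Python sketch that walks a JPEG's segment markers looking for an APP1 segment that begins with the Exif signature. It then moves files without one into a separate directory. `has_exif`, `set_aside_without_exif`, and the `noexif` directory name are all hypothetical. The check only confirms that an EXIF segment exists; it doesn't confirm that DateTimeOriginal or SerialNumber are present.

```python
import os
import shutil
import struct

def has_exif(path):
    """Return True if the JPEG at path has an APP1 segment starting with the Exif signature."""
    with open(path, "rb") as f:
        if f.read(2) != b"\xff\xd8":          # SOI marker missing: not a JPEG
            return False
        while True:
            marker = f.read(2)
            if len(marker) < 2 or marker[0] != 0xFF:
                return False                  # truncated or malformed file
            if marker[1] in (0xD9, 0xDA):     # EOI or start of scan: no more metadata segments
                return False
            size = f.read(2)
            if len(size) < 2:
                return False
            (length,) = struct.unpack(">H", size)  # the length includes its own two bytes
            segment = f.read(length - 2)
            if marker[1] == 0xE1 and segment.startswith(b"Exif\x00\x00"):
                return True

def set_aside_without_exif(src_dir, dest_dir):
    """Move every .jpg in src_dir that has no EXIF segment into dest_dir (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if name.lower().endswith(".jpg") and not has_exif(path):
            shutil.move(path, os.path.join(dest_dir, name))
            moved.append(name)
    return moved
```

For example, `set_aside_without_exif(".", "noexif")` moves the offending files out of the working directory and returns their names.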

We're now left with 1000+ pictures, all of which have EXIF information that we can work with. Interesting properties of the problem:

  1. all of the pictures have DateTimeOriginal and SerialNumber tags in their EXIF data
  2. pictures from a single camera have a consistent timeline in the sense that if picture 2 from camera A comes before picture 3 from camera A by 14 seconds, then that's what happened
  3. the timestamps on all the pictures from camera A differ by a constant offset from the timestamps on all the pictures from camera B

This is all pretty obvious and there's nothing exciting here, but it does reduce the problem to picking a camera as a baseline and then figuring out how far off in seconds and minutes the other cameras are from the baseline. That's pretty easy.
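The offsets below were found by eye. Suppose you can match a few photos from another camera to baseline photos of the same moment. Then, because of property 3, the arithmetic is small enough to sketch. This is an illustration, not the method used here. `estimate_offset` is hypothetical, and taking the median of the differences is a choice made so that one badly matched pair doesn't skew the result. The example data is made up.

```python
import datetime
import statistics

def estimate_offset(pairs):
    """Estimate how far a camera's clock runs ahead of the baseline camera.

    pairs: (camera_time, baseline_time) datetimes for moments both cameras
    caught, e.g. the same kiss photographed from two angles.
    Returns (minutes, seconds) in the shape used by the OFFSETS table.
    """
    diffs = [(cam - base).total_seconds() for cam, base in pairs]
    # the median is not thrown off by a single pair matched a few seconds wrong
    ahead = round(statistics.median(diffs))
    # divmod keeps minutes * 60 + seconds == ahead, also for negative values
    return divmod(ahead, 60)

# Made-up example: three matched moments, each about a minute and a half apart.
base = datetime.datetime(2007, 5, 26, 15, 0, 0)
pairs = [(base + datetime.timedelta(seconds=s), base) for s in (92, 90, 91)]
print(estimate_offset(pairs))  # (1, 31)
```

A camera that runs behind the baseline comes back negative. For example, 37 seconds behind gives `(-1, 23)`, which `datetime.timedelta(minutes=-1, seconds=23)` turns back into -37 seconds.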

The first thing I did was get exiftool and run it like this to rename all the files according to their timestamp and serial number:

  exiftool '-FileName<${DateTimeOriginal}_${SerialNumber}.jpg' \
           -d "%Y%m%d_%H%M%S" .
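To see why this naming scheme pays off later, here is a small Python sketch of the name the command builds. `exif_style_name` is a hypothetical helper that mimics the pattern; exiftool does the real renaming. Every field is zero-padded and runs from year down to second. So, for cameras whose clocks agree, sorting the names alphabetically sorts the pictures by time.

```python
import datetime

def exif_style_name(taken, serial):
    """Mimic the rename pattern above: YYYYmmdd_HHMMSS_SERIAL.jpg (hypothetical helper)."""
    return "%s_%s.jpg" % (taken.strftime("%Y%m%d_%H%M%S"), serial)

shots = [datetime.datetime(2007, 5, 26, 16, 2, 9),
         datetime.datetime(2007, 5, 26, 9, 45, 0)]
names = sorted(exif_style_name(t, "620306618") for t in shots)
print(names)  # ['20070526_094500_620306618.jpg', '20070526_160209_620306618.jpg']
```

Two shots from one camera in the same second map to the same name, which this sketch doesn't handle.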

Then I copied all the files into a thumbs directory and, in that directory, used mogrify (which comes with ImageMagick) to create thumbnails. mogrify overwrites the files it works on, which is why it runs on copies:

  for i in *.jpg; do mogrify -quality 65 -geometry 150 "$i"; done
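Here is the same step as a Python sketch, in case you'd rather keep everything in one script. It copies each picture before shrinking it, so the originals are never touched, and passes the same mogrify options as the shell loop. `make_thumbnails` and its `run` parameter are hypothetical. `run` defaults to `subprocess.run` and exists so the mogrify call can be swapped out, for example for a dry run.

```python
import os
import shutil
import subprocess

def make_thumbnails(src_dir, thumbs_dir, run=subprocess.run):
    """Copy every .jpg from src_dir into thumbs_dir, then shrink the copies in place."""
    os.makedirs(thumbs_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".jpg"):
            continue
        copy = os.path.join(thumbs_dir, name)
        shutil.copy2(os.path.join(src_dir, name), copy)  # mogrify only ever sees the copy
        # same settings as the shell loop: JPEG quality 65, 150 pixels wide
        run(["mogrify", "-quality", "65", "-geometry", "150", copy], check=True)
```

Calling `make_thumbnails(".", "thumbs")` needs ImageMagick on the PATH.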

Then I did this to figure out all the serial numbers of the cameras that took these photos:

  exiftool -T -SerialNumber *.jpg | sort -u
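The renamed files already carry the serial number, so a Python alternative can read it straight from the filenames and count the photos per camera. The count might help when picking the baseline camera, though the post doesn't say how the baseline was chosen. `photos_per_camera` is a hypothetical helper, and it assumes every file follows the YYYYmmdd_HHMMSS_SERIAL.jpg pattern from the rename step.

```python
import os
from collections import Counter

def photos_per_camera(names):
    """Count files named YYYYmmdd_HHMMSS_SERIAL.jpg by camera serial number."""
    counts = Counter()
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".jpg":
            counts[stem.rsplit("_", 1)[-1]] += 1  # the serial is the last "_" field
    return counts

# Usage, in the directory of renamed files:
# for serial, n in photos_per_camera(os.listdir(".")).most_common():
#     print(serial, n)
```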

Then I wrote a Python script (any language will do) that builds an HTML index of the thumbnails, sorted by timestamp after applying each camera's offset:

  import datetime
  import os

  # serial number -> ((minutes, seconds) the camera's clock runs ahead of
  # the baseline camera, color used for that camera in index.html)
  OFFSETS = {"1020415017": ((0, 37), "#ff5555"),  # red
             "1420918126": ((1, 11), "#ff55ff"),  # purple
             "1621009923": ((1, 9), "#55ff55"),   # green
             "620306618": ((0, 0), "#5555ff")}    # blue (baseline)

  def getinfo(fn):
      """Parse YYYYmmdd_HHMMSS_SERIAL.jpg; return (corrected time, serial, fn)."""
      stamp, cam = os.path.splitext(fn)[0].rsplit("_", 1)
      t = datetime.datetime.strptime(stamp, "%Y%m%d_%H%M%S")

      # shift this camera's clock onto the baseline camera's timeline
      minutes, seconds = OFFSETS[cam][0]
      t -= datetime.timedelta(minutes=minutes, seconds=seconds)
      return (t, cam, fn)

  def build():
      files = [f for f in os.listdir(".") if f.endswith(".jpg")]
      pics = sorted(getinfo(fn) for fn in files)  # sorts by corrected time

      with open("index.html", "w") as out:
          out.write('<html><body><table border="1">\n')
          # one row per photo: name, color-coded camera and corrected time,
          # next to the thumbnail itself
          for t, cam, fn in pics:
              out.write("""
  <tr><td>
    <table>
      <tr><td bgcolor="#aaaaaa">name</td><td>%s</td></tr>
      <tr><td bgcolor="#aaaaaa">camera</td><td bgcolor="%s">%s</td></tr>
      <tr><td bgcolor="#aaaaaa">datestamp</td><td>%s</td></tr>
    </table>
  </td><td><img src="%s"></td></tr>""" % (fn, OFFSETS[cam][1], cam, t, fn))
          out.write("</table></body></html>\n")

  if __name__ == "__main__":
      build()

Note that I color-code the cameras. I find this makes it really easy to eyeball the timeline without trying to distinguish between similar-looking serial numbers.

I run that in my thumbs directory and it builds an index.html page that I can look at with a web browser. The index.html has the offsets factored in. I look through the pictures and tweak the offsets until all the cameras are consistent with one timeline. Once I have a final set of offsets, I go through the pictures for each camera and (very carefully) do this:

  exiftool "-AllDates-=0:0:0 0:M:S" *SN.jpg

replacing M and S with that camera's offset in minutes and seconds, and SN with its serial number.
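Typing one of these commands per camera by hand is where a slip is most likely. Here is a hedged sketch that prints the commands from an OFFSETS table shaped like the one in the script. `shift_commands` is hypothetical. It prints the commands rather than running them, so each one can be checked before it touches any files. The `+=` branch for a camera running behind the baseline is an assumption; all the cameras in the post ran ahead of the baseline.

```python
def shift_commands(offsets):
    """Build one exiftool command per camera from an OFFSETS-style table.

    offsets: serial -> ((minutes, seconds) ahead of the baseline, color)
    """
    commands = []
    for serial, ((minutes, seconds), _color) in sorted(offsets.items()):
        total = minutes * 60 + seconds
        if total == 0:
            continue  # the baseline camera needs no shift
        # a camera that ran ahead gets its dates pulled back (-=);
        # one that ran behind would need them pushed forward (+=)
        op = "-=" if total > 0 else "+="
        m, s = divmod(abs(total), 60)
        commands.append('exiftool "-AllDates%s0:0:0 0:%d:%d" *%s.jpg' % (op, m, s, serial))
    return commands

OFFSETS = {"1020415017": ((0, 37), "#ff5555"),
           "1420918126": ((1, 11), "#ff55ff"),
           "1621009923": ((1, 9), "#55ff55"),
           "620306618": ((0, 0), "#5555ff")}
for cmd in shift_commands(OFFSETS):
    print(cmd)
```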

Then you do another pass of the exiftool renaming command from above. The files should then all be on one consistent timeline, so alphabetical order by filename is chronological order.

After I worked out the process, it took a couple of hours. We had the advantage of a couple of points during the wedding where a lot of photographs were taken and it was obvious what order they needed to be in.

[1] - http://www.jillgoldman.com/ -- Jill is awesome.

06/21/2007 - Changed the title to something more appropriate. I was thinking "fix" because I was modifying the EXIF metadata for each photo to put them in the correct order, but Konquest makes a good point. I also fixed one of the command lines.

06/22/2007 - bockris said in the reddit.com comments:
This is a good idea. I've had to fix the EXIF data my photos before because I changed the batteries and didn't reset the date but I've never tried to sync up multiple cameras after the fact.

If I'm ever in a similar situation as the OP, I think I have everyone take a picture of a clock with a second hand at some period during the event. That would let you easily get the time difference among all cameras.
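The clock idea turns into offsets with a little arithmetic. Take each camera's error against the clock and subtract the baseline camera's error. This is a sketch under assumptions. Someone reads the time off the clock face by hand and records it next to that photo's EXIF time. `offsets_from_clock` is hypothetical, and the serials "A" and "B" are made up for illustration.

```python
import datetime

def offsets_from_clock(readings, baseline):
    """Turn clock-photo readings into an OFFSETS-style table.

    readings: serial -> (EXIF time of that camera's clock photo,
                         time shown on the clock face in that photo)
    Returns serial -> (minutes, seconds) that camera runs ahead of baseline.
    """
    # how far each camera's clock is ahead of the shared wall clock
    ahead_of_clock = {serial: (exif - shown).total_seconds()
                      for serial, (exif, shown) in readings.items()}
    ref = ahead_of_clock[baseline]
    return {serial: divmod(round(secs - ref), 60)
            for serial, secs in ahead_of_clock.items()}

day = datetime.datetime(2007, 5, 26)
readings = {
    "A": (day.replace(hour=15, second=50), day.replace(hour=15, second=20)),  # 30 s ahead
    "B": (day.replace(hour=15, minute=3, second=5),
          day.replace(hour=15, minute=3, second=5)),                          # matches the clock
}
print(offsets_from_clock(readings, "B"))  # {'A': (0, 30), 'B': (0, 0)}
```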

Comments:

Posted by Tom Ellis - Seattle Wedding Photographer on Mon Jul 26 01:01:14 2010
Will, I am extremely impressed with how you managed to deal with all of the photos.  I wish I had your level of computer and programming expertise!
Didn't you find that with all of those photos you had an awful lot of duplication, especially with 800+ by the photographer?  You didn't mention the time span over which these photos were taken, but assuming 8 hours that is still 100/hour, and my own experience is that when you shoot that many photos, an awful lot are very similar.
Were any/all of the photos in RAW?  While I'd hope this is the case for the better quality and ability to better manipulate the images, that would mean a hell of a lot of time in front of the computer doing the editing, especially if this isn't something you do all the time.



Copyright 1996 to 2012, Will Guaraldi Kahn-Greene, under the Creative Commons BY-SA 3.0 license
