Thursday, May 19, 2016

Ceviche’s Café

Not a blog post about food - though that sounds good right about now.

Instead, this is more about griping about technology, which is more in line with what I usually write.

The 2016 Seattle International Film Festival has just opened as I write this: 25 days and upwards of 300 movies. (I was told upwards of 400, but I'm not sure about that.) It's the biggest film festival in the United States.

In years past, I've got the big fancy catalog to flip through to decide what I wanted to see. There's a PDF of a smaller catalog, which is also helpful. And on SIFF's website, there's a "My SIFF" feature which allows you to keep track of the stuff you have tickets for. Or, for the folks who buy a series pass, you can still register your interest in specific showings.

It doesn't (as far as I can tell) export to useful things like Google Calendar for easy reference on the go.

So, in addition to cursing the darkness, I set about to put things into a more useful format. I didn't get explicit permission from SIFF for this, so I'm not advocating that you do what I have done. I wrote a Python script that uses urllib(2) and Beautiful Soup to pull down and then interpret web pages. With a little bit of poking around, I was able to interpret individual movie pages, to figure out where and when the showings are. There are a lot of venues and a lot of screenings going on.

After collecting that information, I proceeded to push the movies up onto Google Calendar, which you can see here: https://calendar.google.com/calendar/embed?src=tbbr77hpo9aqi5b98qdr2el2ls%40group.calendar.google.com&ctz=America/Los_Angeles

I actually did most of that last year for SIFF 2015. This year, I discovered that some of the movies some friends wanted to see weren't showing up on the Google Calendar. I poked through the debug log that was generated when I wrote the data to the calendar, and it turned out I was detecting some sort of error condition on certain movies, skipping them, and continuing on.

As I dug deeper, I determined that the error condition had something to do with Unicode encoding and/or decoding. Oh, joy.

One interesting issue is that some of the URLs that SIFF uses have non-ASCII characters in them. That's OK, as long as you encode those characters properly. For example, http://www.siff.net/festival-2016/ceviche%E2%80%99s-dna has the right single quote (not part of 7-bit ASCII) properly percent-encoded as its UTF-8 bytes. If you try to get urllib2 to download the URL as you see it in the browser address bar, with the raw quote character in it, it'll choke.

Maybe there's a better way to do this, but I ended up just finding the last slash and hitting the part of the URL after that with an encoding pass, because my efforts to encode the whole string led to the slashes being converted, which isn't any good.
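For anyone curious, here's a minimal sketch of the "encode only the bit after the last slash" trick. It's written for Python 3's urllib.parse rather than the Python 2 urllib2 the original script used, and the function name encode_last_segment is made up for illustration, not taken from the actual script.

```python
from urllib.parse import quote

def encode_last_segment(url):
    """Percent-encode only the part of the URL after the final slash."""
    head, slash, tail = url.rpartition("/")
    # quote() encodes text as UTF-8 by default, so U+2019 (the right
    # single quote) comes out as %E2%80%99. The scheme, host, and
    # earlier slashes in `head` are left untouched.
    return head + slash + quote(tail)

# \u2019 is the right single quote as it appears in the address bar
raw_url = "http://www.siff.net/festival-2016/ceviche\u2019s-dna"
print(encode_last_segment(raw_url))
```

One possible alternative, if splitting the string feels hacky: quote() takes a safe argument listing characters to leave alone, so quote(raw_url, safe=":/") should encode the whole URL in one go without mangling the slashes.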

Ok, so I can handle right quotes, maybe.

There are a bunch of other non-ASCII characters that show up, like the é in the opening movie, "Café Society". My tool took several passes, writing and rewriting title text, using the title as a key when storing the movie information in the Python "shelve" format, and then reloading it. Somewhere along the way, Unicode titles were getting mangled, and I was having a hard time making sure that they got re-encoded or re-decoded, or encoded and decoded, or something. I kept throwing more random layers at the problem, and it still wasn't really working for me.

In the end, I realized that I could grab the title text as ASCII for the purposes it already served (particularly being a key for shelve) and then also grab a Unicode version of it alongside for generating the calendar entries.
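Here's a rough sketch of that two-versions-of-the-title idea, in Python 3. The post doesn't say exactly how the ASCII version was produced, so the ascii_key helper below (decompose accents, then drop whatever still isn't ASCII) is just one plausible way to do it, and a plain dict stands in for the shelve file.

```python
import unicodedata

def ascii_key(title):
    """Build an ASCII-only lookup key from a Unicode title (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark,
    # so "é" becomes "e" followed by a combining accent.
    decomposed = unicodedata.normalize("NFKD", title)
    # Dropping non-ASCII bytes removes the combining marks, along with
    # characters that have no ASCII base, such as the curly quote.
    return decomposed.encode("ascii", "ignore").decode("ascii")

movies = {}  # stand-in for the shelve database

for title in ["Caf\u00e9 Society", "Ceviche\u2019s DNA"]:
    # ASCII for the key, the untouched Unicode title for the calendar entry
    movies[ascii_key(title)] = {"title": title}

for key, record in movies.items():
    print(repr(key), "->", record["title"])
```

The catch with this approach is that two different titles could, in principle, collapse to the same ASCII key, so it's worth checking for collisions if the list of movies is large.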

Which ended up working really well. I'm astonished I didn't think of it earlier.

I also added a pass where, if I found '&amp;' in the text, I'd convert it to '&'. Again, there were fancier, probably better ways, which I tried and had a hard time with. So I did a simple replacement specifically targeting that one entity.
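A quick sketch of both options, again in Python 3: the simple targeted replacement described above, and the standard library's html.unescape, which might be one of the "fancier" routes (the post doesn't say which ones were tried). The function name fix_ampersands and the sample strings are made up for illustration.

```python
import html

def fix_ampersands(text):
    """The simple approach: turn the one entity '&amp;' back into '&'."""
    return text.replace("&amp;", "&")

sample = "Q&amp;A with the director"
print(fix_ampersands(sample))

# html.unescape handles named and numeric entities in general,
# not just the ampersand.
print(html.unescape("Caf&eacute; Society &amp; more"))
```

The targeted replacement leaves every other entity alone, which is fine if '&amp;' is the only one that actually turns up in the scraped titles.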

In the end, it works pretty well - I've got several things that I might change about the script before 2017, but in a lot of cases, my life would be easier if SIFF exposed an API to pull movie information from. I know that SIFF isn't in the API business, so maybe if I contacted them, they'd refer me to the company that handles their web presence, and maybe that'd be a useful conversation.

Or, maybe what I've got is good enough for a while.