Bugs and Modeling

The web was all abuzz on December 31 as the 30Gig version of the Microsoft Zune players all stopped working.  What was up?  Was it a terrorist attack?  Solar flares?  A weird Y2K bug almost a decade later?

The truth is a bit prosaic:  there was simply a bug related to leap years.  Since the Zune was not around four years ago, 2008 was the first time for the bug to show itself.  There are descriptions of the bug in numerous places:  here is one if you haven’t seen it yet.  Bottom line:  a section of code for converting “days since January 1, 1980” (when the universe was created) to years, months, and days didn’t correctly handle a leap year, leading to an infinite loop.

It is easy to laugh at such a mistake:  why didn’t a code review or unit testing catch such a simple mistake?  But, it seems, such “simple” parts of the code seem the ones most likely not to get tested.  When you have to test all sorts of complicated things like checking authorization, playing music, handling the interface and so on, who expects problems in a date calculation?  And hence a zillion Zunes fail for a day.

Ooops!
Ooops!

I experienced something similar when I was reviewing some code I use to create a sports schedule.  Never mind the sport:  it doesn’t matter.  But the model I created aimed to have a large number of a particular type game on a particular week.  And, for the last few years, we didn’t get a lot of those games on that week.  This didn’t particularly worry me:  there are a lot of constraints that interact in complicated ways, so I assumed the optimization was right in claiming the best number was the one we got (and this was one of the less important parts of the objective).  But when I looked at the code recently, I realized there was an “off by one” error in my logic, and sure enough the previous week had a full slate of the preferred games.  Right optimization, wrong week.  Dang!

So one of my goals this week, before class starts is to relook at the code with fresh eyes and see what other errors I can find.    There are some things I can do to help find such bugs, like trying it on small instances and turning on and off various constraint types, but one difficult aspect of optimization is that knowing the optimal solution requires … optimization, making it very hard to find these sorts of bugs.