Careful with Wolfram|Alpha

Wolfram|Alpha is an interesting service. It is not a search engine per se. If you ask it “What is Operations Research”, it draws a blank (*) (mimicking most of the world), and if you ask it “Who is Michael Trick”, it returns information on two movies, “Michael” and “Trick” (*). But if you give it a date (say, April 15, 1960), it will return all sorts of information about the date:

Time difference from today (Friday, July 31, 2009):
49 years 3 months 15 days ago
2572 weeks ago
18 004 days ago
49.29 years ago

106th day
15th week

Observances for April 15, 1960 (United States):
Good Friday (religious day)
Orthodox Good Friday (religious day)

Notable events for April 15, 1960:
Birth of Dodi al-Fayed (businessperson) (1955): 5th anniversary
Birth of Josiane Balasko (actor) (1950): 10th anniversary
Birth of Charles Fried (government) (1935): 25th anniversary

Daylight information for April 15, 1960 in Pittsburgh, Pennsylvania:
sunrise | 5:41 am EST
sunset | 6:59 pm EST
duration of daylight | 13 hours 18 minutes

Phase of the Moon:
waning gibbous moon (*)

(Somehow it missed me in the famous birthdays: I guess their database is still incomplete)

It even does simple optimization:

min {5 x^2+3 x+12}  =  231/20   at   x = -3/10 (*)
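
For the record, the result is easy to verify yourself; here is a minimal Python sketch (mine, not Wolfram’s) using the vertex formula for a quadratic:

    # Minimum of a*x^2 + b*x + c (with a > 0) is at x = -b/(2a).
    a, b, c = 5, 3, 12
    x_min = -b / (2 * a)                    # -0.3 = -3/10
    f_min = a * x_min ** 2 + b * x_min + c  # 11.55 = 231/20
    print(x_min, f_min)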

And, in discrete mathematics, it does wonderful things like generate numbers (permutations, combinations, and much more) and even put out a few graphs (*).

This is all great stuff.

And it is all owned by Wolfram, which defines how you can use it. As Groklaw points out, the Wolfram Terms of Service are pretty clear:

If you make results from Wolfram|Alpha available to anyone else, or incorporate those results into your own documents or presentations, you must include attribution indicating that the results and/or the presentation of the results came from Wolfram|Alpha. Some Wolfram|Alpha results include copyright statements or attributions linking the results to us or to third-party data providers, and you may not remove or obscure those attributions or copyright statements. Whenever possible, such attribution should take the form of a link to Wolfram|Alpha, either to the front page of the website or, better yet, to the specific query that generated the results you used. (This is also the most useful form of attribution for your readers, and they will appreciate your using links whenever possible.)

And if you are not academic or not-for-profit, don’t think of using Wolfram|Alpha as a calculator to check your addition (“Hmmm… is 23+47 really equal to 70? Let me check with Wolfram|Alpha before I put this in my report”), at least not without some extra paperwork:

If you want to use copyrighted results returned by Wolfram|Alpha in a commercial or for-profit publication we will usually be happy to grant you a low- or no-cost license to do so.

“Why yes, it is. I’d better get to filling out that license request!  No wait, maybe addition isn’t a ‘copyrighted result’.  Maybe I’d better run this by legal.”

Groklaw has an interesting comparison to Google:

Google, in contrast, has no Terms of Use on its main page. You have to dig to find it at all, but here it is, and basically it says you agree you won’t violate any laws. You don’t have to credit Google for your search results. Again, this isn’t a criticism of Wolfram|Alpha, as they have every right to do whatever they wish. I’m highlighting it, though, because I just wouldn’t have expected to have to provide attribution, being so used to Google. And I’m highlighting it, because you probably don’t all read Terms of Use.

So if you use Wolfram|Alpha, be prepared to pepper your work with citations (I have done so, though the link on the Wolfram page says that the suggested citation style is “coming soon”: I hope I did it right and they do not get all lawyered up) and perhaps be prepared to fill out some licensing forms.  And it might be a good idea to read some of those “Terms of Service”.

——————————————–
(*) Results Computed by Wolfram Mathematica.

Google does Operations Research and Open Source

While Google is, of course, heavily active in analytics, the company has not been known for its operations research. The “ethos” of the company has been heavily computer-science-based. So, while I would count much of what they do as “operations research”, they probably would not use that label.

The line between operations research and computer science is not easy to draw, but linear programming falls pretty strongly on the operations research side: it is the sort of “heavy machinery” that occurs often in OR. Given the variety of problems a company like Google faces, it is not surprising that they would end up with some problems for which linear programming is the right approach. Or not. In a recent discussion, Vijay Gill, senior manager of engineering and architecture, was somewhat cagey about how computer load redistribution was being done:

Gill even seems to be saying that the company hopes to instantly redistribute workloads between data centers.

“How do you manage the system and optimize it on a global-level? That is the interesting part,” Gill continued. “What we’ve got here [with Google] is massive – like hundreds of thousands of variable linear programming problems that need to run in quasi-real-time. When the temperature starts to excurse in a data center, you don’t have the luxury to sitting around for a half an hour…You have on the order of seconds.”

Heiliger asked Gill if this was a technology Google is using today. “I could not possibly comment on that,” Gill replied, drawing more laughter from his audience.
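
Just to make “hundreds of thousands of variables” concrete, here is a toy load-redistribution LP in Python (entirely my own illustration, and scipy is an assumption; Google has said nothing about its actual formulation): route each region’s load to data centers at minimum cost without exceeding any center’s capacity.

    from scipy.optimize import linprog
    import numpy as np

    # Toy data: 3 demand regions, 2 data centers.
    load = [30.0, 50.0, 20.0]      # load originating in each region
    cap = [60.0, 45.0]             # capacity of each data center
    cost = np.array([[1.0, 4.0],   # cost[i][j]: serving region i from center j
                     [3.0, 1.0],
                     [2.0, 2.0]])

    n_r, n_c = cost.shape
    c = cost.flatten()             # variables x[i,j], flattened row-major

    # Each region's load must be fully served (equalities).
    A_eq = np.zeros((n_r, n_r * n_c))
    for i in range(n_r):
        A_eq[i, i * n_c:(i + 1) * n_c] = 1.0

    # Each center must stay within capacity (inequalities).
    A_ub = np.zeros((n_c, n_r * n_c))
    for j in range(n_c):
        A_ub[j, j::n_c] = 1.0

    res = linprog(c, A_ub=A_ub, b_ub=cap, A_eq=A_eq, b_eq=load)
    print(res.x.reshape(n_r, n_c))  # optimal routing
    print(res.fun)                  # total cost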

In possibly unrelated, but possibly related, news, the Google Open Source Blog has an announcement of an open-source simplex code:

SimplexSolver is an easy-to-use, object-oriented method of solving linear programming problems. We’re happy to announce today that we’ve Open Sourced the code that runs the newly released Google Spreadsheets Solve Feature and made it a part of Apache Commons Math 2.0.

Thanks to my former officemate in Auckland, Hamish Waterer, and to Andrew Mason for passing along these pointers.

Gurobi software now available for download

Erwin Kalvelagen, who writes an extremely useful blog where many challenging modeling problems are solved (one of my “must read” blogs), beat me to announcing that Gurobi’s standalone software is now available.  I particularly like that the trial version allows up to 500 variables and 500 constraints, which is large enough to see how the software really works, at least for MIP.  I had attended part of a software session at the recent INFORMS conference and left impressed both with the software and with Gurobi’s business plan.

Further workshops at INFORMS Practice

Over at the INFORMS Practice Conference Blog, I have entries on Gurobi and ILOG, an IBM Company. Both presentations were inspiring in their own ways.

Gurobi Post:

It goes without saying that these statements are my individual views of the workshops, and are not the official word from either the companies or INFORMS.

The world of optimization software has been turned upside down in the last year.  Dash Optimization (makers of XPRESS-MP) was bought by FairIsaac (or FICO, as it is now called). ILOG, makers of CPLEX, was bought by IBM.  And three key people from ILOG, Gu, Rothberg, and Bixby, split off to form Gurobi (no prizes for guessing how the name was formed).  Gurobi held its first (I believe) technical workshop at an INFORMS Practice Conference, and had tons of interesting news.  Since “Operation Clone Michael Trick so He Can Attend All Interesting Workshops” failed, I spent the first half hour of the 3pm workshop session at the Gurobi session before moving on to another session.  Here are a few things presented.

Bob Bixby presented an overview of the history of Gurobi.  Their main goal over the last year has been to create a top-notch linear and mixed integer programming code.  I was surprised that they were able to do this in the March 2008-November 2008 period.  Since then, the optimization code has been essentially static while the firm works on things like documentation, bug fixes, user interfaces and so on.

The business model of Gurobi has three main parts:

  1. Focus on math programming solvers
  2. Flexible partnerships
  3. Technology leadership

The partnership aspect was quite interesting.  They very much value the relationship they have with Microsoft Solver Foundation (whose presentation I attended this morning), along with the partnerships they have with AIMMS, Frontline, GAMS, Maximal, and other groups.

Ed Rothberg presented the stand-alone user interface (to be released May 6), which has been implemented as a customization of the Python shell.  Some of my colleagues (in particular those at the University of Auckland) have been pushing Python, but this is the first full-scale system I have seen, and it is very impressive.
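
As a rough sketch of the flavor, here is a tiny model written against Gurobi’s Python interface (the gurobipy module; the exact calls may differ from the shell demonstrated, and the knapsack example is my own, not from the presentation):

    import gurobipy as gp
    from gurobipy import GRB

    # A tiny knapsack model, just to show the interactive flavor.
    value = [10, 13, 7]
    weight = [4, 6, 3]

    m = gp.Model("knapsack")
    x = m.addVars(3, vtype=GRB.BINARY, name="x")
    m.setObjective(gp.quicksum(value[i] * x[i] for i in range(3)), GRB.MAXIMIZE)
    m.addConstr(gp.quicksum(weight[i] * x[i] for i in range(3)) <= 8, "capacity")
    m.optimize()
    print([x[i].X for i in range(3)], m.ObjVal)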

Beyond that, I can only go by the handouts, since I did some session jumping, but a few things are clear:

  1. As an optimization code, Gurobi is competitive with the best codes out there, being better than all on some instances, and worse than some on others.
  2. Gurobi is taking parallel optimization very seriously, stating that single-core optimization is nothing but a special case of its multi-core approach.
  3. Python is a powerful way of accessing more complicated features of the system.

Gurobi is already available as an add-in to other systems.  It will be available in a stand-alone system in a week or so. Further versions are planned to come out at six month intervals.

CPLEX/IBM Post:

Continuing my coverage of a few of the Technical Workshops, I reiterate that the views here are neither those of the companies nor of INFORMS.  They are mine!

Ducking out of one technical workshop, I moved on to the presentation by ILOG (now styled ILOG, an IBM Company, since IBM’s acquisition earlier this year).  It was great to see the mix of IBMers and ILOG people on the stage.  Like many (about 2/3 according to a later audience survey), I was worried about the effect of having IBM acquire ILOG, but the unity of the group on stage allayed many of those fears.  The workshop had two major focuses:  the business strategy of having IBM together with ILOG and, more technically, details on the new version of ILOG’s CPLEX, CPLEX 12.

When it comes to business strategy, IBMers Brenda Dietrich and Gary Cross put out a persuasive and inspiring story on how IBM is focusing on Business Analytics and Optimization.  How can you make an enterprise “intelligent”?  You can make it aware of the environment, linked internally and externally, anticipating future situations, and so on.  And that requires both data (as in business intelligence) and improved decision making (aka operations research).  As IBM tries to lead in this area, they see the strengths in research meshing well with their consulting activities and with their software/product acquisitions.  The presentation really was inspiring, and harkened back to the glory days of “e-business” circa 1995 with an operations research tilt (with the hopes of not having a corresponding bust a few years later).

When it comes to CPLEX 12.0, improvements continue.  These were given in three areas:

  1. improved MIP performance.
  2. parallel processing under the standard license.
  3. built-in connectors for Excel, Python, and Matlab (a sketch of the Python connector follows below).
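
As a rough sketch of what the Python connector looks like (based on the cplex module as I understand it; the two-variable LP is hypothetical):

    import cplex

    # A two-variable LP: maximize 3x + 2y s.t. x + y <= 12, 0 <= x, y <= 10.
    c = cplex.Cplex()
    c.objective.set_sense(c.objective.sense.maximize)
    c.variables.add(obj=[3.0, 2.0], ub=[10.0, 10.0], names=["x", "y"])
    c.linear_constraints.add(
        lin_expr=[cplex.SparsePair(ind=["x", "y"], val=[1.0, 1.0])],
        senses=["L"], rhs=[12.0])
    c.solve()
    print(c.solution.get_values(), c.solution.get_objective_value())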

The improved performance was characterized by two numbers.  For instances taking more than a second to solve, the improvement was about 30%;  for harder problems taking more than 1000 seconds, CPLEX 12 is about twice as fast as 11.2 (on the problems in the extensive testbed).  Strikingly, the CPLEX testbed still has 971 models that take at least 10,000 seconds to solve, so there is still lots of work to be done here.  The improvements came through some new cuts (multicommodity flow cuts) as well as general software engineering improvements.

I think the news on parallel (multicore) processing is particularly exciting.  If our field is to take advantage of modern multi-core systems, we can’t have our software systems charging us per core.  There are some issues to be handled:  the company doesn’t want people solving 30,000+ separate models on a cloud system simultaneously for the price of one license, but some system for exploiting the 2-8 cores on most current machines must be found.  I am really pleased that this will be available standard.

I was also very happy to see the Excel add-in.  As an academic, I know that my MBA students are most comfortable working within Excel, and I will be very happy to introduce them to top-notch optimization in that environment (once ILOG figures out its pricing, which was unclear in the presentation).

Overall, I found this an inspiring workshop on both the business strategy and the technical sides.  IBM should also be recognized for bringing in a clicker system to get audience feedback:  that made for a very entertaining and useful audience feedback session.

One final point: IBM claims to have 800 “OR Experts”, which is a pretty good number.  If all of them became members of INFORMS, we would gain about 650 members, by my calculation.

Operations Research is hot at IBM

IBM announced today that it is forming a new consulting group for business analytics and optimization, called Business Analytics and Optimization Services.  With 4000 people, this is a pretty serious operation!  You can check out the news release and the Business Week coverage.   I’ll pass over the fact that IBM doesn’t use the phrase “operations research” in this announcement, and note that this group combines its consulting arm (formed with the acquisition of PricewaterhouseCoopers a few years back) with the strengths in research (in Watson and other places), two groups with world-leading operations research skills.  From the press release:

Working with more than 4,000 consultants dedicated to this effort will be experts from IBM Research’s world renowned laboratories with more than 200 mathematicians and advanced analytics experts. The company also made significant investments in Services Research for the past 10 years to build technologies and intellectual property that optimize new services offerings — all culminating in this new consulting practice in support of IBM’s Smarter Planet strategy, which recognizes the need for improved business insight.

I did work with PwC before it was acquired, and continued for a period after they became part of IBM. There was a large group of people who really got operations research, and we did some great work for the Internal Revenue Service and the US Postal Service. I am excited that IBM sees the value of operations research (OK, business analytics and optimization) sufficiently to put together such a large group.

This move is very much in keeping with IBM’s previous acquisition of ILOG. From the Business Week article:

The consulting business may drive sales for a lot of IBM’s own technologies, as well. The company has built up a strong position in business analytics software in recent years, partly through acquisitions. In 2007 it paid $5 billion for Canada’s Cognos, a leader in business intelligence. Last year, IBM broke into business-process optimization with a $340 million acquisition of ILOG, a French company.

This is an exciting move, and I think it will have a significant effect on how the sort of mathematics we do is used in industry.

Added 3:15PM ET, April 14. Be sure to check out the IBM site for the new initiative. With people like Brenda Dietrich and Bill Pulleyblank involved, I think it is safe to assume that operations research is going to have a big role.

Time for Baseball

The baseball season started a few minutes ago with Atlanta playing Philadelphia.  I’ve been working with Major League Baseball for more than a dozen years, and my company (along with partners, of course), The Sports Scheduling Group, produces the schedules for MLB (our chief scheduler Kelly Easton does all the hard work, but I do the final day assignments), as well as for the umpires (which I do, based on some fantastic work done a few years ago in a Tepper School MBA project, further developed in Hakan Yildiz’s dissertation).  The start of the season is always a time of anxiety for me (not strong anxiety, but a gnawing fear):  what if I forgot to put in a game?  What if Philadelphia shows up tonight, but Atlanta’s schedule has them in Los Angeles?  It is a rather silly worry, since thousands of people have looked at the schedule at this point, so it is unlikely that anything particularly egregious is happening.
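
For what it is worth, the most basic of those fears is easy to check mechanically; here is a toy sanity check in Python (purely my own illustration with made-up games, nothing like The Sports Scheduling Group’s actual process):

    from collections import defaultdict

    # Toy schedule: (date, away_team, home_team). Hypothetical games.
    schedule = [
        ("2009-04-05", "ATL", "PHI"),
        ("2009-04-06", "ATL", "PHI"),
        ("2009-04-06", "NYM", "CIN"),
    ]

    # No team should appear in two games on the same date.
    teams_by_date = defaultdict(set)
    for date, away, home in schedule:
        for team in (away, home):
            if team in teams_by_date[date]:
                print(f"Conflict: {team} is double-booked on {date}")
            teams_by_date[date].add(team)

    # Every team should have the expected number of games.
    games = defaultdict(int)
    for _, away, home in schedule:
        games[away] += 1
        games[home] += 1
    print(dict(games))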

Still, I was happy tonight to see Brett Myers toss the first pitch to Kelly Johnson (a ball).

And know that he did so because of operations research.

Test Gurobi yourself!

In keeping with an unusually awesome collection of comments, Daniel Fylstra of Frontline Systems wrote regarding the new linear/integer programming software from Gurobi (apologies for the self-linking, but, hey!, it’s my blog!):

Although Gurobi Optimization plans to offer the Gurobi Solver directly (with its own APIs and tool set) starting in April, it’s available in a full commercial release this month (February) as embedded in several modeling systems (AIMMS, Frontline, GAMS, MPL). We have it available for download right now (15 day free trial with no other restrictions) at http://www.solver.com/gurobi. You can run the Mittelmann benchmarks yourself, or solve your own models in MPS, LP and OSIL format using a little program GurobiEval that we include. You can create models from scratch, and solve them with Gurobi, either in Excel with our modeling system, Risk Solver Platform, or in C, C++, C#, VB6, VB.NET, Java or MATLAB using our Solver Platform SDK.

This is excellent news, and I look forward to experimenting with the new software.

If this level of comments keeps up, I’ll have to initiate a “quote of the week”!

Check out ILOG’s DIALOG blog!

I am in Minneapolis, flying back to Pittsburgh after spending a couple days in Winnipeg for a memorial service for my Mom.  My Mom’s passing has a number of effects, large and small.  On the large side, my son Alexander has lost both his grandmothers in the past six months, which makes me sad:  every kid needs plenty of grandparents to spoil them!  Now my Dad will have to pick up all the slack.

On the small side, I had to cancel a trip I was looking forward to:  ILOG’s DIALOG conference in Orlando.  First, it is Orlando, which looks pretty sweet for a Pittsburgher in February.  Second, with all of the changes for ILOG in the past year, I was looking forward to meeting people and seeing how the optimization side of ILOG was making out.  No chance to do that, unfortunately, but ILOG has significant blog coverage of the conference, which I recommend.  Most of the guest bloggers are rules-oriented (my absence messes up the optimization coverage, I guess), but it was good to read that Tom Rosamilia (head of WebSphere), in his plenary, sees how optimization fits in (from James Taylor’s blog entry):

Tom identified four essentials for survival:

  • Adapt to embrace change
  • Streamline processes to make them more dynamic and manageable
  • Optimize to allocate resources efficiently
  • Visualize to transform insight into action for faster decisions

This is not quite the way I see optimization (either it is much more than resource allocation, or resource allocation has a much broader definition than I usually give it!), but at least we hit the main points.

American Express and Data Mining

I teach a data mining course here to our MBA students.  It is a popular course, with about 70% of the students taking it at some point during their two years with us.  Since I am an operations research guy, I concentrate on the algorithms, but we spend a lot of time talking about the use of data mining and its possible pitfalls.  The New York Times today has a wonderful story illustrating the pitfalls.  American Express has been using shopping patterns to reduce customers’ credit limits.  This, in itself, is not surprising, but a letter the company sent out implies that it was basing the evaluation on the stores and companies the customer used, rather than a more direct measure of the consumer’s ability to repay a debt.

“Other customers who have used their card at establishments where you recently shopped,” one of those letters said, “have a poor repayment history with American Express.”

Wow!  Shop at the Dollar Mart and you are not a careful shopper reacting to an uncertain financial world, but rather a poor credit risk who should be jettisoned before defaulting (I don’t know if Dollar Mart is one of the “bad” establishments: American Express has not released a list of companies that are signs of imminent financial doom).  That is, of course, what data mining results come down to, but it is rare for a company to admit it.  Not surprisingly, customers who received such a letter became a little irate.  Check out newcreditrules.com for one person’s story.
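
To see how such a conclusion might arise, here is a cartoon of the modeling in Python (the data, features, and library choice are entirely mine; nothing here comes from American Express): fit a classifier on merchant-category spending shares and read off which categories raise predicted risk.

    from sklearn.linear_model import LogisticRegression

    # One row per cardholder: share of spend at discount stores, travel,
    # and restaurants; label is whether the account later went delinquent.
    X = [[0.70, 0.10, 0.20],
         [0.10, 0.60, 0.30],
         [0.60, 0.15, 0.25],
         [0.05, 0.70, 0.25]]
    y = [1, 0, 1, 0]  # 1 = later delinquent

    model = LogisticRegression().fit(X, y)
    # A positive coefficient means spending in that category raises the
    # predicted risk -- exactly the inference cardholders objected to.
    print(model.coef_)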

American Express says it is no longer using store shopping information, but it will continue to use the results of data mining in its credit decisions.

In one presentation to analysts, it noted that people with multiple residences and multiple mortgages used to be a good bet. Now, the reverse is true.

In a good economy, lots of data mining was used to “help” customers by identifying new products or offers that might appeal to them.  Now, it seems that more data mining uses customers’ data against their own interests.  I suspect we will see more stories of this type.

Gurobi versus CPLEX benchmarks

Hans Mittelmann has released some benchmark results comparing CPLEX 11.2 with the first version of Gurobi‘s code (1.02) in both sequential and parallel (4-processor) modes.  Mosek’s sequential code is also included in the test.  Let me highlight some of the lines (times are in seconds; an “f” indicates a failure):

=================================================================
s problem     CPLEX1    GUROBI1     MOSEK1     CPLEX4    GUROBI4
-----------------------------------------------------------------
    air04          9         18         49          7         13
   mzzv11         89        116        434         80        116
      lrn         42        104          f         36        996
ns1671066       1828          1       2474         26          1
ns1688347       1531        315          f        716        258
ns1692855       1674        322          f        688        234
     acc5         25        247       4621         30         33

These are not chosen to be typical, so be sure to look at the full list of benchmark results.  Concentrating on the first two columns, Gurobi’s code seems to be holding its own against CPLEX, and sometimes has a surprising result.  The ns1671066 result certainly looks like Gurobi is doing something interesting relative to CPLEX.

It is also striking how often the speedup is more than a factor of 4.  That same ns1671066 instance that CPLEX had such trouble on sequentially certainly looks much more solvable with 4 threads.  But sometimes there is a significant slowdown (see Gurobi on lrn).  Perhaps someone more skilled in this area can explain how we should interpret these benchmarks.
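
The arithmetic behind that observation is simple enough to script (a quick Python sketch; numbers transcribed from the table above):

    # Speedup = sequential time / 4-thread time, from the table above.
    times = {  # name: (CPLEX1, CPLEX4, GUROBI1, GUROBI4)
        "air04":     (9, 7, 18, 13),
        "ns1671066": (1828, 26, 1, 1),
        "lrn":       (42, 36, 104, 996),
        "acc5":      (25, 30, 247, 33),
    }
    for name, (c1, c4, g1, g4) in times.items():
        print(f"{name}: CPLEX {c1 / c4:.1f}x, Gurobi {g1 / g4:.2f}x")
    # CPLEX's roughly 70x on ns1671066 is well beyond the factor of 4
    # the hardware provides; Gurobi actually slows down on lrn.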

The official release for Gurobi is still two months away:  it will be interesting to see how the released version works.