Statistics, Cell Phones, and Cancer

Today’s New York Times Magazine has a very nice article entitled “Do Cellphones Cause Brain Cancer”. The emphasis of the article is on the statistical and medical issues faced when trying to find such a link. On the surface, it seems unlikely that cellphones have a role here. There has been no increase in brain cancer rates in the US over the period in which you would expect one if cellphones were a problem:

From 1990 to 2002 — the 12-year period during which cellphone users grew to 135 million from 4 million — the age-adjusted incidence rate for overall brain cancer remained nearly flat. If anything, it decreased slightly, from 7 cases for every 100,000 persons to 6.5 cases (the reasons for the decrease are unknown).

If it wasn’t for the emotion involved, a more reasonable direction to take would be to study why cellphones protect against brain cancer (not that I believe that either!).

This “slight decrease” is then contrasted in a later study:

In 2010, a larger study updated these results, examining trends between 1992 and 2006. Once again, there was no increase in overall incidence in brain cancer. But if you subdivided the population into groups, an unusual pattern emerged: in females ages 20 to 29 (but not in males) the age-adjusted risk of cancer in the front of the brain grew slightly, from 2.5 cases per 100,000 to 2.6.

I am not sure why 7 down to 6.5 is “slight” but 2.5 to 2.6 is “unusual”. It does not take much experience in statistics to immediately dismiss this: the test divides people into males and females (2 cases), age in 10-year groupings (perhaps 8 cases), and area of brain (unclear, but perhaps 5 cases). That leads to 80 subgroups. It would be astonishing if some subgroup did not have some increase. If this is all too dry, might I suggest the incomparable xkcd: if you test lots of things, you will come up with “significant” results.
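The subgroup argument can be sketched numerically. This is only an illustration under assumptions that are mine, not the study's: 80 independent subgroups, no real trend in any of them, and a 5% chance that any one subgroup produces a falsely “significant” result.

```python
import random

def chance_of_at_least_one(p_single: float, n_groups: int) -> float:
    """Probability that at least one of n independent subgroups shows the event."""
    return 1.0 - (1.0 - p_single) ** n_groups

def simulate(p_single: float, n_groups: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of the closed-form answer above."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study": does any subgroup come up falsely significant?
        if any(rng.random() < p_single for _ in range(n_groups)):
            hits += 1
    return hits / trials

# Subgroup count guessed in the text: sex x age bands x brain regions.
N_SUBGROUPS = 2 * 8 * 5

p_sig = chance_of_at_least_one(0.05, N_SUBGROUPS)   # hypothetical 5% false-positive rate
p_sim = simulate(0.05, N_SUBGROUPS, trials=20000)
expected_false_positives = 0.05 * N_SUBGROUPS

print(f"P(at least one 'significant' subgroup) = {p_sig:.3f} (simulated {p_sim:.3f})")
print(f"Expected number of false positives     = {expected_false_positives:.1f}")
```

Under these assumptions, at least one “significant” subgroup turns up about 98% of the time, with around four expected by chance alone.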

Even if the 2.5 to 2.6 was “true”, consider its implications: among women in their 20s, about 4% of the occurrences of a rare cancer are associated with cell phone usage. This association would not hold among men, or among women 30 years or older or under 20. I am not sure who would change their actions based on this: probably not even those in the most at-risk group!

And there are still a large number of caveats: the causation might well be associated with something other than cell phone usage. While statistical tests attempt to correct for other causes, no test can correct for everything.

There are other biases that can also make it difficult to believe in tiny effects. The article talks about recall bias (“I have brain cancer, I used my phone a lot: that must be the issue!”):

some men and women with brain cancer recalled a disproportionately high use of cellphones, while others recalled disproportionately low exposure. Indeed, 10 men and women with brain tumors (but none of the “controls”) recalled 12 hours or more of use every day — a number that stretches credibility.

This issue is complicated by a confusion about what “causes” means. Here is a quick quiz: Two experiments A and B both show a significant increase in brain cancer due to a particular environmental factor. A had 1000 subjects, B had 10,000,000 subjects. Which do you find more compelling?

Assuming both tests were equally carefully done, test A is more alarming. With fewer subjects comes a need for a larger effect to be statistically significant. Test B, with a huge number of subjects, might find a very minor increase; Test A can only identify major increases.  The headline for each would be “X causes cancer”, but the impact would be much different if Test A shows a 1 in 50 chance and test B shows a 1 in a million chance.
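The link between sample size and detectable effect can be sketched with a crude normal approximation. The setup is my own assumption for illustration: a single group of n subjects compared against a known baseline risk, a two-sided 5% significance level, and a baseline of 7 per 100,000 borrowed from the incidence figure quoted earlier and treated as a per-person probability. The normal approximation is poor for events this rare, so only the scaling with n should be taken seriously.

```python
import math
from statistics import NormalDist

def min_detectable_increase(p0: float, n: int, alpha: float = 0.05) -> float:
    """Smallest increase over baseline risk p0 that reaches two-sided
    significance at level alpha with n subjects (normal approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return z * standard_error

BASELINE = 7 / 100_000        # assumed baseline risk (illustrative only)

small_study = min_detectable_increase(BASELINE, 1_000)        # experiment A
large_study = min_detectable_increase(BASELINE, 10_000_000)   # experiment B

print(f"A (n = 1,000):      needs an increase of {small_study * 100_000:.1f} per 100,000")
print(f"B (n = 10,000,000): needs an increase of {large_study * 100_000:.2f} per 100,000")
print(f"Ratio A / B: {small_study / large_study:.0f}")
```

With these numbers, experiment A needs an increase on the order of 50 per 100,000 to register as significant, while experiment B can flag one of about 0.5 per 100,000: a hundredfold difference, the square root of the ratio of the sample sizes.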

With no lower bound on the amount of increase that might be relevant, there is no hope for a definitive study: more and larger studies might identify an increasingly tiny risk, a risk that no one would change any decision about, except retrospectively (“I had a one in a zillion chance of getting cancer by walking to school on June 3 last year. I got cancer. I wish I didn’t walk to school.”). It certainly appears that the risks of cell phone usage are very low indeed, if they exist at all.

There is no doubt that environmental factors increase the incidence of cancer and other health problems. The key is to have research concentrate on those that are having a significant effect. I would be truly astonished if cell phones had a significant health impact. And I bet there are more promising things to study than the “linkage” between cell phones and cancer.

IBM, Ralph Gomory and Business Analytics

Had a post at the INFORMS Conference site on Ralph Gomory:

For those of us taking a break from the INFORMS conference, the Master’s golf tournament holds special attention. Not for the golf (though the golf is wonderful), but for the commercials. Practically every commercial break has an IBM commercial featuring some of its luminaries from the past. Prominent among them is Ralph Gomory. Everyone in operations research knows of Ralph. For the optimization-oriented types, he is the Gomory of Gomory cuts, a fundamental structure in integer programming. For the application-oriented types, he was the long-time head of research for IBM. For the funding and policy-oriented types, he was the long-time head of the Alfred P. Sloan Foundation supporting analysis on globalization, technology, and education. Great career, when you can be highly influential three different ways (so far)!

During his time at IBM, Ralph stressed the need for research and development to work together. This view that research should be grounded in real business needs is one that I think has greatly strengthened areas such as operations research and business analytics. While there is no dearth of theoretical underpinnings in these areas, the fundamental research is better by being guided by practical need. This has led to the insights that give us fast optimization codes, stronger approaches to risk and uncertainty, and the ability to handle huge amounts of data.

There is a full version of the IBM video that lasts about 30 minutes (currently on the front of their Smarter Planet page). Ralph shows up in the introduction, then around 24:43 in an extended discussion of the relationship between research and business need, and again near the end (30:08).

This conference would have been a lot different (and less interesting) without the career of Ralph as a researcher, executive and foundation leader. We are lucky he began in operations research.

INFORMS Sponsorship of OR-Exchange

OR-Exchange has been a question and answer site on operations research in existence for about two years. Over that time, there have been 290 questions, generating more than 1000 answers. You have a question? Chances are there is someone there to answer!

Coinciding with the newly revamped INFORMS Conference on Business Analytics and Operations Research is the INFORMS sponsorship of OR-Exchange. Conversion to new software and the INFORMS computing system has gone smoothly over the past few days (thanks David!), and we are excited about the new opportunities that come with INFORMS support.

In keeping with the renaming of the conference, we’ve also changed the tagline for OR-Exchange. We are now “Your place for questions and answers in operations research and analytics”.

Getting ready for INFORMS Business Analytics and OR conference

I’m getting ready for next week’s INFORMS Conference on Business Analytics and Operations Research. Looks like the renaming (from the INFORMS Practice Conference) has had an effect: the conference has gotten record registration (more than 600).

Getting ready for a conference is not just tossing some clothes in a suitcase. Keeping up my social networking responsibilities is a lot of work! I’ve changed my blog page to highlight the feed from the INFORMS Conference blog (where I will guest blog for a few days). We’ve started a discussion on the appropriate twitter-tag (I like #baor11). I’ve contacted some friends for suggestions of a brewpub to visit (Goose Island on Clybourn seems to be a good choice). Above all that, I have to read (thoroughly!) the papers associated with the Edelman competition, where I am a judge.

I have done my first post for the INFORMS Blog. Here is what I wrote:

I fly out to the Analytics conference in a few days. By some weird happenstance, I have never flown with Southwest before, but I am doing so on Saturday. In view of the issues Southwest is having, I need to do a bit of risk analysis. I really wish I could attend the risk analysis track before I get on the plane, instead of after I arrive.

Fortunately, Arnie Barnett (operations research go-to guy for aviation risk analysis) has provided insight into the risks. I think I’ll be OK with Southwest.

Puppetry, Turf Management, and Operations Research

CNN and careerbuilder.com have put out a list of six unusual college degrees. I checked it out, expecting to see Carnegie Mellon’s own offering in this area: bagpiping. But bagpiping was not unusual enough to make this list. After possibilities for racetrack management and packaging (“Don’t think outside the box: think about the box”), there was one appealing one nestled between puppetry and turfgrass management: “decision making”. At the Kelley School at Indiana University, you can get a doctorate in “help[ing] future business leaders analyze and make decisions.” Wow!

Of course, this is just our favorite field of Operations Research, weakly disguised with a fake mustache and beard:

According to the program’s website, “Decision Sciences is devoted to the study of quantitative methods used to aid decision making in business environments. Using mathematical models and analytical reasoning, students examine problems … and learn how to solve these problems by using a number of mathematical techniques, including optimization methods (linear, integer, nonlinear), computer simulation, decision analysis, artificial intelligence and more.”

In our never-ending quest to find the right name for our field, we are showing up on lists of wacky degrees, displacing bagpiping and cereal science (“Ingrain yourself to a great career”). Better that than being on no lists at all. Maybe a prospective puppeteer will see the list and decide to go into “decision making” instead. No strings attached.

Thanks to Kevin Furman for the pointer!

The Appeal of Operations Research and Sports

For a more recent comment on MLB scheduling and the Stephensons see my response to the 30 for 30 video.

The relationship between operations research and sports is one topic that I return to often on this site.    This is not surprising:  I am co-owner of a small sports scheduling company that provides schedules to Major League Baseball and their umpires, to many college-level conferences, and even to my local kids soccer league.  Sports has also been a big part of my research career.  Checking my vita, I see that about 30% of my journal or book chapter papers are on sports or games, and almost 50% of my competitive conference publications are in those fields.  Twenty years ago, my advisor, Don Ratliff, when looking over my somewhat eclectic vita at the time (everything from polymatroidal flows to voting systems to optimization implementation) told me that while it was great to work in lots of fields, it is important to be known for something.  To the extent that I am known for something at this point, it is either for online stuff like this blog and or-exchange, or for my part in the great increase of operations research in sports, and sports scheduling in particular.

This started, as most things in life often do, by accident.  I was talking to one of my MBA students after class (I was younger then, and childless, so I generally took my class out for drinks a couple times a semester after class) and it turned out he worked for the Pittsburgh Pirates (the local baseball team).  We started discussing how the baseball schedule was created, and I mentioned that I thought the operations research techniques I was teaching (like integer programming) might be useful in creating the schedule.  Next thing I know, I get a call from Doug Bureman, who had recently worked for the Pirates and was embarking on a consulting career.  Doug knew a lot about what Major League Baseball might look for in a schedule, and thought we would make a good team in putting together a schedule.  That was in 1996.  It took until 2005 for MLB to accept one of our schedules for play.  Why the wait?  It turned out that the incumbent schedulers, Henry and Holly Stephenson, were very good at what they did.  And, at the time, the understanding of how to create good schedules didn’t go much beyond work on how to minimize breaks (consecutive home games or away games) in schedules, work done by de Werra and a few others.  Over the decade from 1996-2005, we learned things about what does work and what doesn’t work in sports scheduling, so we got better on the algorithmic side.  But even more important was the vast increase in speed in solving linear and integer programs.  Between improvements in codes like CPLEX and increases in the speed of computers, my models were solving millions of times faster in 2005 than they did in 1996.  So finally we were able to create very good schedules quickly and predictably.
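The notion of a “break” mentioned above is easy to make concrete. Here is a minimal sketch on a made-up four-team single round robin (teams A to D and the pairings are hypothetical, chosen only for illustration): each team’s home/away pattern is read off the schedule, and a break is counted whenever a team is home twice in a row or away twice in a row.

```python
from collections import defaultdict

# Hypothetical schedule: each round is a list of (home, away) games.
rounds = [
    [("A", "B"), ("C", "D")],
    [("C", "A"), ("D", "B")],
    [("A", "D"), ("B", "C")],
]

def home_away_patterns(rounds):
    """Build each team's pattern, e.g. 'HAH', one letter per round."""
    patterns = defaultdict(str)
    for games in rounds:
        for home, away in games:
            patterns[home] += "H"
            patterns[away] += "A"
    return dict(patterns)

def count_breaks(pattern):
    """A break is two consecutive home games or two consecutive away games."""
    return sum(1 for prev, cur in zip(pattern, pattern[1:]) if prev == cur)

patterns = home_away_patterns(rounds)
breaks = {team: count_breaks(p) for team, p in patterns.items()}
total_breaks = sum(breaks.values())

for team in sorted(patterns):
    print(team, patterns[team], breaks[team])
print("total breaks:", total_breaks)
```

In this toy schedule, teams A and D alternate perfectly while B and C each have one break. A real scheduling model would treat the total as something to minimize (or constrain) alongside many other requirements.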

In those intervening years, I didn’t spend all of my time on Major League Baseball of course.  I hooked up with George Nemhauser, and we scheduled Atlantic Coast Conference basketball for years.  George and I co-advised a great doctoral student, Kelly Easton, who worked with us after graduation and began doing more and more scheduling, particularly after we combined the baseball activities (with Doug) and the college stuff (with George).

After fifteen years, I still find the area of sports scheduling fascinating.  Patricia Randall, in a recent blog post (part of the INFORMS Monthly Blog Challenge, as is this post) addressed the question of why sports is such a popular area of application.  She points to the way many of us know at least something about sports:

I think the answer lies in the accessibility of the data and results of a sports application of OR. Often only a handful of people know enough about an OR problem to be able to fully understand the problem’s data and judge the quality of potential solutions. For instance, in an airline’s crew scheduling problem, few people may be able to look at a sequence of flights and immediately realize the sequence won’t work because it exceeds the crew’s available duty hours or the plane’s fuel capacity. The group of people who do have this expertise are probably heavily involved in the airline industry. It’s unlikely that an outsider could come in and immediately understand the intricacies  of the problem and its solution.

But many people, of all ages and occupations, are sports fans. They are familiar with the rules of various sports, the teams that comprise a professional league, and the major players or superstars. This working knowledge of sports makes it easier to understand the data that would go into an optimization model as well as analyze the solutions it produces.

I agree that this is a big reason for popularity. When I give a sports scheduling talk, I know I can simply put up the schedule of the local team, and the audience will be immediately engaged and interested in how it was put together. In fact, the hard part is to get people to stop talking about the schedule so I can get on talking about Benders’ approaches or large scale local search or whatever is the real content of my talk.

But let me add to Patricia’s comments: there are lots of reasons why sports is so popular in OR (or at least for me).

First, we shouldn’t ignore the fact that sports is big business. Forbes puts the value of the teams of Major League Baseball to be over $15 billion, with the Yankees alone worth $1.7 billion. With values like that, it is not surprising that there is interest in using data to make better decisions. Lots of sports leagues around the world also have high economic effects, making the overall sports economy a significant part of the overall economy.

Second, there are a tremendous number of issues in sports, making it applicable and of interest to a wide variety of researchers. I do essentially all my work in scheduling, but there are lots of other areas of research. If you check out the MIT Sports Analytics conference, you can see the range of topics covered. By covering statistics, optimization, marketing, economics, competition and lots of other areas, sports can attract interest from a variety of perspectives, making it richer and more interesting.

A third reason that sports has a strong appeal, at least in my subarea of scheduling, is the close match between what can be solved and what needs to be solved. For some problems, we can solve far larger problems than would routinely occur in practice. An example of this might be the Traveling Salesman Problem. Are there real instances of the TSP that people want to solve to optimality that cannot be solved by Concorde? We have advanced so far in solving the problem that the vast majority of practical applications are now handled. Conversely, there are problems where our ability to solve problems is dwarfed by the size of problem that occurs in practice. We would like to understand, say, optimal poker play for Texas Hold’em (a game where each player works with seven cards, five of them in common with other players). Current research is on Rhode Island Hold’em, where there are three cards and strong limitations on betting strategy. We are a long way from optimal poker play.

Sports scheduling is right in the middle. A decade ago, my coauthors and I created a problem called the Traveling Tournament Problem. This problem abstracts out the key issues of baseball scheduling but provides instances of any size. The current state of the art can solve the 10 team instances to optimality, but cannot solve the 12 team instances. There are lots of sports scheduling problems where 10-20 teams is challenging. Many real sports leagues are, of course, also in the 10-20 team range. This confluence of theoretical challenge and practical interest clearly adds to the research enthusiasm in the area.

Finally, there is an immediacy and directness of sports scheduling that makes it personally rewarding. In much of what I do, waiting is a big aspect: I need to wait a year or two for a paper to be accepted, or for a research agenda to come to fruition. It is gratifying to see people play sports, whether it is my son in his kid’s soccer game, or Derek Jeter in Yankee Stadium, and know not only are they there because programs on my computer told them to be, but that the time from scheduling to play is measured in months or weeks.

This entry is part of the March INFORMS Blog Challenge on Operations Research and Sports.

Great Way to Get to The INFORMS Conference on Business Analytics and Operations Research

I am very much looking forward to attending this year’s INFORMS Conference on Business Analytics and Operations Research (formerly the INFORMS Practice Conference).  I am a judge for the Edelmans, so I will be spending Monday watching the presentations and asking tough questions (“Wow, did you really save $200 million?  That’s so cool!”).  I’ll also be attending some of the Technology Workshops on Sunday, and will attend other presentations on Tuesday.

Over the last few years, I have scrabbled together some funds to support sending some of the Tepper MBA students to the conference (thanks Tepper Administration for all of your support!), and they always come back raving about the conference and the field.  I expect this year to be no different:  we’ll have four students in our business analytics track (at least!) at the conference.

Two of the students will be attending the Professional Colloquium, a day-long program for Masters and PhD students who are transitioning into real-world careers.   I always worry when I suggest this to MBAs since the professional skills and insight into organizations that the day provides are the same skills an MBA provides (and which are more commonly lacking in normal masters programs in operations research).  Will they get enough out of the day? But every MBA who has attended the Colloquium has loved it:  the speakers provide insights into success from the perspective of operations research/business analytics professionals.  For many of the students who have attended, this is a life-changing experience.  I see that one of my students from a couple of years ago thinks enough of this to be part of this year’s organizing committee!

Whether you are a business analytics-oriented MBA, a Masters of OR or IE, or a Doctoral student (or a recent graduate in any of these areas), I can’t recommend the program highly enough.  And, while the registration fee of $375 might not seem cheap, it really is a steal, since it includes participation in the full conference as well as the Colloquium.  There are some limited support funds from the Colloquium committee, but this is the sort of activity that your school really should be supporting (and even if not, this is a great investment in your career).

Applications are due March 25, so get going if you want to be part of this!

Programming versus Optimization

Renaming is a powerful way to show change.  Recently, I came across two colleagues who changed names.  One did the equivalent of changing names from Michael to Michelle, signifying some very significant life changes.  The other went from a name like John Smith to Luigi Backtrend (real names changed to protect the innocent!), wanting to make himself more unique and visible to online searches.   Name changes like this can affect a life and career:  no one listened to the song stylings of Arnold Dorsey until he became Engelbert Humperdinck, and Marion Morrison could not possibly be the hero of a western, but he could, renamed as John Wayne.

Somehow I completely missed the changes that the Mathematical Programming Society has undergone in this regard.  Last year, the forty-year-old MPS became the Mathematical Optimization Society.  At an age when many men are buying sports cars and experimenting with extramarital affairs, the MPS decided a name change would scratch its mid-life crisis itch.

While I am not enthusiastic about the change, I am sympathetic to the need.  The word “programming” in mathematical programming (and linear programming, integer programming, dynamic programming and so on) does not match up with the current use of “programming” to mean, almost exclusively, the programming of computers.  Back in the 40s and 50s, “programming” could be used for any sort of planning, so “linear programming” made sense: it was a method for determining (planning or programming) the maximum of a linear function over linear constraints.  The word “program” is still used in many contexts (“television program”, “conference program”, and so on) but in much of our world, “program” now means one thing: a computer program.  So the meaning of “linear programming” is no longer self-evident. “Mathematical programming” could now be misinterpreted as creating computer programs for mathematics, which is not quite what the field defines it as.

The diagram shows how often the phrases “linear programming” and “computer program” appear in books, through the Google Books ngram system.  I am surprised that linear programming does that well (and it dominates “computer programming”):  it is a term with a great history.  But “computer program” is certainly more common these days, and I suspect there are many, many instances where program is used without a qualifier to mean “computer program”.

“Optimization” has, at least so far, kept a meaning of “finding the best value”, though I hear my students (and researchers in meta-heuristics) refer to “more optimal” solutions so often that I fear it too is losing its meaning.  So “Mathematical Optimization” is a bit more self-evident.   It is, however, not a term that has been used a lot.  The same ngram system does not show anywhere near as much use of “mathematical optimization” as “mathematical programming”.  And while the ngrams are only for books, Google search shows a five to one advantage for “mathematical programming” over “mathematical optimization”, while the ratio is twenty to one in Google Scholar.

But even our field did not uniformly adopt “programming” over “optimization”.  It is, after all “combinatorial optimization” not “combinatorial programming”.  So there is some historical precedent for the use of “optimization”.

While I hate the idea of downgrading the word “programming” in our field (it is not just “computer programming”!), I understand why MPS/MOS decided to be proactive on this front.  And I appreciate the way they did it quickly, seemingly without the endless hand-wringing of those of us in operations research/management science/decision optimization/prescriptive analytics/advanced business analytics.  There may come a day when the Institute for Operations Research and the Management Sciences (INFORMS) changes its name:  I can’t believe it will be done as easily as MPS/MOS has done it.

Finding Love Optimally

Like many in operations research, my research interests often creep over into my everyday life. Since I work on scheduling issues, I get particularly concerned with everyday scheduling, to the consternation of my friends and family (“We should have left six minutes ago: transportation is now on the critical path!”). This was particularly true when I was a doctoral student when, by academic design, I was living and breathing operations research 24 hours a day.

I was a doctoral student from ages 22 to 27 (age will be relevant shortly), and like many in that age group, I was quite concerned with finding a partner with whom to spend the rest of my life. Having decided on certain parameters for such a partner (female, breathing, etc.), I began to think about how I should optimally find a wife. In one of my classes, it hit me that the problem has been studied: it is the Secretary Problem! I had a position to fill (secretary, wife, what’s the difference?), a series of applicants, and my goal was to pick the best applicant for the position.

Fortunately, there is quite a literature on the Secretary Problem (for a very nice summary of results, see this site, from which I also got the background to the graphic for this entry), and there are a number of surprising results. The most surprising is that it is possible to find the best secretary with any reasonable probability at all. The hard part is that each candidate is considered one at a time, and an immediate decision must be made to accept or reject the candidate. You can’t go back and say “You know, I think you are the cat’s meow after all”. This matched up with my empirical experience in dating. Further, at each step, you only know if the current candidate is the best of the ones you have seen: candidates do not come either with objective values or with certifications of perfection, again matching empirical observations. You can only compare them with what you have sampled.

Despite these handicaps, if you know how many candidates there are, there is a simple rule to maximize the chance of finding the best mate: sample the first K candidates without selecting any of them, and then take the first subsequent candidate who is the best of all you have seen. K depends on N, the total number of candidates you will see. As N gets big, K moves toward 1/e times N, where e is 2.71…. So sample 36.8% of the candidates, then take the first candidate who is the best you have seen. This gives a 36.8% chance of ending up with Ms (in my case) Right.

One problem here: I didn’t know what N is. How many eligible women will I meet? Fortunately, the next class covered that topic. If you don’t know what N is but know that you will be doing this over a finite amount of time T, then it is OK to replace this with a time cutoff rule: simply take the first candidate after 36.8% of the time (technically, you use 36.8% of the cumulative distribution, but I assumed a uniform distribution of candidate arrivals). OK, I figured, people are generally useless at 40 (so I thought then: the 50-year-old-me would like to argue with that assumption), and start this matching process at about 18 (some seem to start earlier, but they may be playing a different game), so, taking 36.8% of the 22-year gap gives an age of about 26.1. That was my age! By a great coincidence, operations research had taught me what to do at exactly the time I needed to do that.

Redoubling my efforts, I proceeded to sample the candidate pool (recognizing the odds were against me: there is still only a 36.8% chance of finding Ms Right) when, lo and behold, I met Her: the woman who was better than every previous candidate. I didn’t know if she was Perfect (the assumptions of the model don’t allow me to determine that), but there was no doubt that she met the qualifications for this step of the algorithm. So I proposed.

And she turned me down.

And that is when I realized why it is called the Secretary Problem, and not the Fiancee Problem (though Merrill Flood proposed the problem under that name). Secretaries have applied for a job and, presumably, will take the job if offered. Potential mates, on the other hand, are also trying to determine their best match through their own Secretary Problem. In order for Ms Right to choose me, I had to be Mr. Right to her! And then things get much more complicated. What if I was meeting women in their sampling phase? It did seem that some people were very enthusiastic about having long sampling phases, and none of them would be accepting me, no matter how good a match they would be for me. And even the cutoff of 36.8% looks wrong in this case. In order to have a hope of matching up at all in this “Dual Secretary Problem”, it looked like I should have had a much earlier cutoff, and in fact, it seemed unlikely there was a good rule at all!
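For the classic one-sided version, where every offer is accepted, the cutoff rule is easy to check by simulation. This is a minimal sketch under assumptions of my own: a hypothetical pool of N = 100 candidates with distinct scores, arriving in uniformly random order, and a cutoff K rounded from N/e. It also redoes the age arithmetic with the 18-to-40 window from the story.

```python
import math
import random

def pick_with_cutoff(scores, k):
    """Skip the first k candidates, then take the first one who beats
    everyone seen so far. If nobody does, we are stuck with the last one."""
    best_seen = max(scores[:k]) if k > 0 else float("-inf")
    for score in scores[k:]:
        if score > best_seen:
            return score
    return scores[-1]

def success_rate(n, k, trials, seed=0):
    """Fraction of random arrival orders in which the rule picks the best."""
    rng = random.Random(seed)
    candidates = list(range(n))          # score n - 1 is the best candidate
    wins = 0
    for _ in range(trials):
        rng.shuffle(candidates)
        if pick_with_cutoff(candidates, k) == n - 1:
            wins += 1
    return wins / trials

n = 100                                  # hypothetical pool size
k = round(n / math.e)                    # sample this many before choosing
rate = success_rate(n, k, trials=20000)

# Time-based version: start at 18, stop at 40, uniform arrivals assumed.
cutoff_age = 18 + (40 - 18) / math.e

print(f"cutoff K = {k}, success rate = {rate:.3f}")
print(f"cutoff age = {cutoff_age:.2f}")
```

The simulated success rate lands near 37%, and the cutoff age comes out at about 26.1. Nothing in the sketch models a candidate who says no, which is exactly the assumption that failed.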

I was chagrined that operations research did not help me solve my matching problem. I had made one of the big mistakes of practical operations research: I did not carefully examine the assumptions of my model to determine applicability.

Downcast, I graduated with my doctorate, resolving to marry myself to integer programming. I embarked on a postdoc to Germany.

There, I walked into a bar, fell in love with a beautiful woman, moved in together three weeks later, invited her to live in the United States “for a while”, married her six years after that, and had a beautiful son with her six years ago. I am not sure what optimization model led me down that path, but I think I am very happy with the result.

Some details of this story have been changed to protect me from even more embarrassment. This post is part of the February INFORMS Blog Challenge.