The Perils of Search Engine Optimization

This blog has been around more than six years, making it ancient in the blogosphere.  And, while not popular like the big-boy blogs (I run about 125,000 hits a year with about 1500 RSS subscribers according to FeedBurner), I think I have a reasonable-sized audience for the specialized topic I cover (operations research, of course!).  People recognize me at conferences and I get the occasional email (or, more commonly, blog comment) that lets me know that I am appreciated.  So I like blogging.

The past few months, however, have been a chore due to the amount of comment spam I get.  Of course, I have software to get rid of most of the spam automatically (Akismet is what I have installed), since otherwise it would be unbearable.  Akismet stopped 5,373 spam comments in the last year.  This sounds like a lot, but it is way down from the heights of a few years ago: Akismet stopped 6,711 spams in the month of March, 2009 alone.  Unfortunately, it is letting a lot more spam come through for me to judge: in the past year, 619 entries were put through to moderation that I determined were spam.  This is a frustrating exercise since I like my readers: if they want to say something, I want them to say it!  But comment after comment from places like “Sacremento Cabs” or “Callaway Reviews” saying vaguely on-topic things is a bit hard to take.  Sometimes it seems that someone has taken the effort to read the blog post and comments:

“From the communication between the two of you I think I can say that I wish I had a teacher like Mr. X and I wish I had a student like Y.”

came in recently, where Mr. X and Y were previous (legit) commenters.  But the commenter’s URL pointed to some warez site, so I deleted it all.  Is this a human posting, or just some pattern matching?

Why the sudden influx?  Further checking of the logs showed that a number of people (a couple hundred per month) are getting to this blog by searching for something like ‘site:.edu inurl:blog “post a comment”‘.  Sure enough, if I do that search (logging out from Google and hoping to get something like a generic search result), I get the following:

Wow!  Of all the .edu blogs that include the phrase “Post a Comment”, I come in at number 3!  Of course, despite my efforts, Google may still be personalizing the search towards me, but clearly I am showing up pretty high to attract the attention of hundreds of people.  Through my diligence and efforts, I have made this blog attractive to the Google algorithms (I do seem to be number 1 for “operations research blog” and some other natural searches).  This is a great Search Engine Optimization success!

Or not.  Because clearly I am attracting lots of people who have no interest in what I have to say but are rather visiting to figure out how they can manipulate me and the blog for their own non-operations research purposes (I am perfectly happy to be manipulated for operations research purposes!).   The sponsored link in the search gives it away: there are companies working hard to get comments, any comments, on blogs (presumably any blogs).   How many of those 125,000 hits were really my audience (people in operations research or those who would like to know more about it)?  Do I really have an operations research audience at all (beyond Brian, Matt, Laura, and a few others who I know personally)?

I’ll spend time thinking about ways to avoid this aggravation.  I’ve already put in NOFOLLOW tags, so there is no SEO value to any URLs that get through.  I already warn that if the URL submitted is not on operations research, then the comment will be deleted.  I could get rid of URLs completely, but I do see legitimate comments as a way of leading people to interesting places.  I could add more CAPTCHAs and the like, though I am having trouble with some of those myself, particularly when surfing with a mobile device.  Or I can put up with deleting useless comments with inappropriate URLs and just relax about the whole thing.  But fundamentally: how can you do Search Engine Optimization to attract those you would like to attract without attracting the attention of the bottom-feeders?

On the off chance that one of the …. shall we say, enterprising souls, has made it through this post, perhaps you can explain in the comments why you add a comment when the topic is of no interest to you.  Do you get a dime for every one that makes it through?  Are you bored and find this a useful way to pass a quiet afternoon?  Are you a program, mindlessly grabbing parts of the post and comments to make yourself look more human?  Add a comment:  I probably won’t let your URL through but at least we can understand each other a bit better.


Cure for the Winter Blues

The weather here in Pittsburgh is finally feeling more like winter: it is a dreary, windy day, with intermittent snow. I suspect it will only get worse over the next two months.

Fortunately I know a cure for the winter blues: Optimization! Optimization with the INFORMS crowd, that is, at the Fourth INFORMS Optimization Society Conference, February 24-26, 2012. I’ll be there (I’m the co-chair, so attendance is kinda expected) and I think there is a great slate of plenary and special presentations:

  • Jerald Ault from the University of Miami on using analytics for managing fisheries,
  • Manoj Saxena from IBM on aspects of Watson,
  • Dimitris Bertsimas from MIT on analytics in sports,
  • Suvrajeet Sen from Ohio State on stochastic mixed integer programs,
  • Dave Alderson from NPS on attacker-defender modeling

(all these are my gloss on the topic: formal titles and abstracts are coming soon).

Nothing like a bit of optimization to make the world a bit brighter!

If you’d like to submit an abstract for this conference, get a move on: the due date is January 6, 2012. Abstracts are short (300 characters) so it shouldn’t take much time to put something together.

Oh, and the conference will be in Miami, which might also help many of us get away from the winter for a bit.

Operations Research Resolutions for 2012

It is the time of the year for resolutions.  Resolutions help define what we would like to be, but are damned hard things to follow through on.  As Paul Rubin points out, gyms will be overrun for a few weeks with “resolvers” (or, as my former personal trainer called them, “resolutionists”) before the weight room empties again.  I’ve been as guilty as any in this regard: it is no coincidence that my gym membership runs January 15 to January 15 and that I renew it each year with the best of intentions, and the worst of results.  Paul suggests an OR view of resolutions:

…the New Year’s resolution phenomenon offers a metaphor for O. R. practice. The “resolver” diets, exercises, stops smoking or whatever for a while because the “boss” (their conscience) is paying attention.  When the “boss” stops watching, the “resolver” makes excuses for why the new regime is too difficult, and reverts to previous behavior.  An O. R. solution to a business problem that is implemented top-down, without genuine commitment by the people who actually have to apply the solution (and change behaviors in doing so), is likely to end up a transient response leading to a return to the previous steady state.

So, in keeping with an operations research view of resolutions, I’ve been thinking about my resolutions with a particular focus on variables (what choices can I make), constraints (what are the limits on those choices) and objectives (what I am trying to accomplish).  It does no good to define objectives and go willy-nilly off in those directions without also defining the constraints that stop me from doing so.   But, of course, a creative re-definition or expansion of variables might let me find better solutions.

I have personal resolutions, and take inspiration from people around me who are able to transform themselves (yes, BB, I am talking about you!).  But I also have some professional resolutions.  So, here they are, and I hope they have been improved by an operations research view:

  1. Make time for research.  This might seem a funny resolution: isn’t that a big part of what professors do?  Unfortunately, I have taken on an administrative role, and there is a never-ending list of things to do.  Short term, it seems as if I will be a better Associate Dean if I get on with the business of Associate Deaning, but long term I know I will be a better Associate Dean if I keep active in research.  The decision model on time allocation has to be long term, not short term.
  2. Do what makes me happiest.  But where will the time come from?  I need to stop doing some things, and I have an idea of what those are.  I have been very fortunate in my career: I’ve been able to take part in the wide variety of activities of a well-rounded academic career.  Not all of this gives me equal pleasure.  Some aspects (*cough* journals *cough*) keep me up at night and are not my comparative advantage.  So now is the time to stop doing some things so I can concentrate on what I like (and what I am comparatively good at).  While many of the decisions in my life can be made independently, time is a major linking constraint.
  3. Work harder at getting the word out about operations research.  This has not been a great year for this blog, with just 51 posts.  I don’t want to post just for the sake of posting, but I had lots of other thoughts that just never made it to typing.  Some appeared as tweets, but that is unsatisfying.  Tweets are ephemeral, while blog entries continue to be useful long after their first appearance.  This has been a major part of my objective function, but I have been neglecting it.
  4. Truly embrace analytics and robustness.  While “business analytics” continues to be a hot term, I don’t think we as a field have truly internalized the effect of vast amounts of data on our operations research models.  There is still too much of a divide between predictive analytics and prescriptive analytics.  Data miners don’t really think about how their predictions will be used, while operations researchers still limit themselves to aggregate point estimates of values that are better modeled as distributions than as single, predictable values.  Further, operations research models often create fragile solutions: any deviation from the assumptions of the models can result in terrible outcomes.  A flight-crew schedule is cheap to run until a snowstorm shuts an airport in Chicago and flights are cancelled country-wide due to cascading effects.  How can we as a field avoid this “curse of fragility”?  And how does this affect my own research?  Perhaps this direction will loosen some constraints I have seen as I ponder the limits of my research agenda.
  5. Learn a new field.  While I have worked in a number of areas over the years, most of my recent work has been in sports scheduling.  I started in this area in the late 1990s, and it seems time to find a new one.  New variables for my decision models!
OK, five resolutions seems enough.  And I am not sure I have really embraced an operations research approach:  I think more constraints are needed to help define what I can do, my objective is ill-defined, and even my set of variables is too self-limited.  But if I can make a dent in these five (along with the eight or so on my personal list) then I will certainly be able to declare 2012 to be a successful year!

Happy New Year, everyone, and I wish you all an optimal year.

This entry is part of the current INFORMS Blog Challenge.

The Plight of the Traveling Politician

Bill Cook, Professor at Georgia Tech, has an article in the “Campaign Stops” blog of the New York Times outlining the plight of the candidates currently campaigning in Iowa for the Republican nomination for the U.S. presidency.  It is a quirk of American politics that a small state can take on outsized importance in the U.S. political process, but historically states such as Iowa and New Hampshire vote early on “their” nominees, while larger states such as California and New York come later.  The effect of this is that candidates spend a tremendous amount of time in relatively low-population states.  This is not necessarily a bad thing: the rest of the country gets to watch the candidates do practically one-on-one politics, and we get to see in great detail how they handle the varied interests in these states.

At least two of the candidates in Iowa (Rick Santorum and Michele Bachmann) will have visited all 99 counties in that state leading up to the January 3, 2012 caucus date (it is a further quirk of American politics that some states don’t hold elections as such: a caucus is a gathering of neighbors to discuss, debate, and choose their delegates to a state convention, which then chooses among the candidates).

The question Bill asked was “What is the quickest way to visit all 99 counties?”.  Actually, Bill asked the question “What is the quickest way to visit the county seats of all 99 counties?”, a somewhat easier problem.  This is an instance of the Traveling Salesman Problem, a problem Bill has studied for years.  Bill is the primary author of Concorde, clearly the best code for finding optimal Traveling Salesman tours.  As Bill explains in the New York Times article, the TSP is formally a hard problem, but particular instances can be solved to optimality:

Leaving Des Moines, we have 98 choices for our second stop, then, for each of these, 97 choices for the third stop, then 96 choices for the fourth stop, and so on until we hit our last county and drive back to Des Moines. Multiplying all of these choices gives the number of possible routes through Iowa. And it is a big number, roughly 9 followed by 153 zeros. No computer in the world could look through such an enormous collection of tours one by one.

The task of choosing the best of the tours is known as the traveling salesman problem, or T.S.P. I work in an applied mathematics field called operations research, which deals with the optimal use of scarce resources, and the T.S.P. is a specialty of mine.

This is a tough challenge, but to tour Iowa we do not need a general solution for the problem, we just have to handle our particular 99-city example. We were able to do this using a technique known as the cutting-plane method.
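The arithmetic in that passage is easy to check.  Here is a two-line aside of my own (not part of Bill’s article), in Python:

```python
import math

tours = math.factorial(98)   # 98 * 97 * ... * 1 choices after leaving Des Moines
print(len(str(tours)))       # 154 digits long
print(f"{tours:.2e}")        # roughly 9.43e+153 -- "9 followed by 153 zeros"
```

And, as Bill says, nobody looks through that list: the cutting-plane method proves a tour optimal while examining only a vanishing fraction of the possibilities.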

The Iowa instance is, presumably, not much of a challenge for Concorde: that code has found solutions to TSP instances with tens of thousands of cities.  I am sure that most of the work, which Bill did together with Alain Kornhauser and Robert Vanderbei of Princeton, was in determining the distances between the 99 county seats.  The actual route optimization would be a matter of seconds or less.  The result is a 55.5-hour tour, traveling at legal speed limits.

If you would like to experiment with TSPs of your own, the excellent NEOS solvers allow you to upload your own instances and run Concorde on them.

Stay tuned for more from Bill Cook as we count down to the publication of his book In Pursuit of the Traveling Salesman, to be published in January.  Trust me, you are going to like the book, so show some support for Bill by liking the Facebook page!


COIN-OR Cup, 2011 edition

COIN-OR is a project to spur open-source activities in operations research.  I am a big supporter of this activity, to the extent that I was part of its Strategic Leadership Board for a term until I did them an even bigger favor by stepping aside for people who could be even better at this than I (not that such people were exactly rare:  my time on SLB corresponded to a somewhat over-committed time for me).

Every year COIN-OR gives out an award called the COIN-OR INFORMS Cup.  This year’s winner has just been announced, and I think the committee has made an inspired choice:

The submission “OpenSolver: Open Source Optimisation for Excel using COIN-OR”, by Andrew Mason and Iain Dunning, has been selected as the winner of the 2011 edition of the COIN-OR INFORMS Cup. OpenSolver is an “Open Source linear and integer optimizer for Microsoft Excel. OpenSolver is an Excel VBA add-in that extends Excel’s built-in Solver with a more powerful Linear Programming solver.” (from http://opensolver.org)

This year’s award recognizes that lots and lots of people want to use top-notch optimization code, but would like to stay in the world of Excel.  The authors of this work (who, I am very proud to say, come from the University of Auckland (at least in Andrew’s case), where I was a visitor in 2007) have done a great job in integrating the optimization codes from COIN-OR into an easy-to-use interface in Excel.  It is a fantastic piece of work (that I blogged about previously) and one that I believe does a tremendous amount of good for the world of operations research.  If you can model in Excel’s Solver, then you can plug in OpenSolver and start using the COIN-OR solvers with no limits on problem size.  I am also delighted to see that they have moved to CPL licensing, rather than GPL, which was my only whine in my original post.

Congratulations Andrew and Iain.  If you would like to celebrate this award, there is a reception to attend, thanks to IBM:

All entrants and their supporters are welcome to join in the celebration and regale (rile) the prize winners.

Date: Sunday, November 13
Time: 8pm-10pm

Location: The Fox and Hound
330 North Tryon St.
Charlotte, NC 28202
(Directions: http://tinyurl.com/75zhm7k)

The celebration is sponsored by IBM.

Good work by the committee:

The COIN-OR INFORMS Cup committee:

Pietro Belotti
Matthew Galati
R. Kipp Martin
Stefan Vigerske

Kiwis and open source rule!

This entry also occurs in the INFORMS Conference blog.

Getting ready for Charlotte, and first blog entry there

I am getting ready for the INFORMS Conference coming up next week in Charlotte.  As I generally do, I will be guest blogging at the conference (along with more than a dozen others: great lineup this year!), so my blog entries will appear there (often with a copy showing up here).  I have put together my first entry, entitled “Hoisted on Operations Research’s Petard” (a petard is a small bomb;  if a military engineer had his bomb explode prematurely, he would be hoisted into the air):

For ’tis the sport to have the enginer
Hoist with his own petard, an’t shall go hard

Hamlet, Act 3, Scene 4

I am greatly looking forward to this year’s INFORMS Annual Conference in Charlotte.  There is nothing like getting together with 4000 of my closest friends, raising many a coffee (and other liquids) in toasting the successes of our field.

I could see the successes of operations research over the last couple of days as I tried to change my flights and hotel in reaction to some family issues.  I had booked everything months ago, paying a pittance for the flight and getting the conference rate for the hotel.  Of course, trying to rebook things three days in advance was a different story: $150 change fees, along with a quadrupling of airfare, was the opening bid, with the opportunity to pay about six times the airfare if I wanted to fly at a time when humans are normally awake.  And I’m not sure why this happened, but the hotel took the chance to increase my daily rate by $10, even though I just knocked a day off my reservation.  The conference suddenly became a lot more expensive, just because my wife pointed out that if I don’t rake the leaves on Saturday, when will it ever get done!

I know who to blame for all this:  operations research, of course.  The subfield of “Revenue Management” makes change fees and differential pricing a science.  And that field is one of the great success stories of operations research, as shown by such things as the string of Edelman finalists that focus on revenue management.  So, while I rue the extra expense that operations research has caused me, I can take solace in knowing that I will eventually gain far more due to the overall success of our field.

See you in Charlotte!


Can the New York Times Magazine Please Hire a Mathematically Literate Editor?

The New York Times Magazine provides almost inexhaustible fodder for this blog.  Recently I have written about prostates, cell phones, and ketogenic diets, based on articles in the Magazine.  Normally the articles are well researched and provocative.  But sometimes things seem to go a bit haywire in the editing process, and the Magazine lets through stuff that it really should catch (see, for instance, my post on nonsensical graphics).  This week, in the midst of a fascinating (and scary) article on the state of US bio-defense, came the following sentence:

In hundreds of experiments, scientists weaponized the bacteria to extraordinary potency and then proceeded to mix the slurry with another agent… which multiplied the effect logarithmically.

I suppose I should be happy that the Magazine did not go with the lazy “exponentially”, but really: what can it mean to multiply an effect “logarithmically”?  Take x and multiply by log x?  Base 10?  Base e?  This does not seem to be the case based on the following sentences, where the result “shatter[s] the human immune system”.  It seems more likely that the author was searching for something to add a mathematical veneer and grabbed the only function remembered from a long-ago high school math class.
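Just to drive home how little the phrase means even if you take it literally, here is one hypothetical reading in a few lines of Python (the numbers are invented, of course):

```python
import math

effect = 10**6                          # some measured potency, purely made up
boosted = effect * math.log10(effect)   # "multiplied logarithmically", read literally
print(boosted / effect)                 # 6.0 -- a six-fold bump, hardly immune-system-shattering
```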

“… which greatly increased the effect” might not sound so sophisticated, but I think would have been a better choice here.

P.S.  I now recall that I have seen this strange use of logarithmically before.  Perhaps I can have a category where I track this usage.


Get Money from Google for Operations Research, in Europe anyway

Google seems to be getting the idea that operations research has a lot to add to the company.  This is a long way from the time when I got the following response from a Google software engineer at an information systems conference: “Integer programming?  I tried it: it doesn’t work”.  Oh, really?

This change is great for our field:  the line between operations research and computer science is extremely fuzzy and is getting fuzzier as “business analytics” unites traditionally CS areas such as data mining with traditionally OR areas such as optimization.  Operations Research needs to be seen as a key reason for the continuing success for companies like Google.

If you are in Europe, you can even get some money for a post-doc or other research expenses from Google, as per this announcement from Laurent Perron:

Object: Call for Proposal for Google Focused Grant Program on Mathematical Optimization and Combinatorial Optimization in Europe.
Deadline: November 25th (strict).
Contact: proposal should be sent to both Laurent Perron (lperron@google.com) and Michel Benard (benard@google.com).
Format: A proposal is a 3 pages document following the format described in http://research.google.com/university/relations/research_awards.html

The purpose of this program is to facilitate more interaction between Google and academia and also nurture stronger relations and partnerships with universities. The intent of this focused awards program is to support academic research aimed at improving the theory and applications of mathematical and combinatorial optimization (Operations Research, Constraint Programming, Meta-Heuristics). Google funds Research Awards unrestricted and retains no intellectual property from the research. We prefer if the results from the research are open sourced and widely published. Awards through this program are for one year in the range of $10K-$150K.
A typical grant will cover 1 Post-Doctoral student for 1 year. This program is restricted to Europe.

Areas that are of particular interest include (but are not limited to):

* Inference and relaxation methods: constraint propagation, cutting planes, global constraints, graph algorithms, dynamic programming, Lagrangean and convex relaxations, counting based inferences and heuristics, constraint relaxation based heuristics.

* Search methods: branch and bound, intelligent backtracking, incomplete search, randomized search, column generation and other decomposition methods, local search, meta-heuristics, large scale parallelism.

* Integration methods: static/dynamic problem decomposition, solver communication, transformations between models and solvers, collaboration between concurrent methods, models, and solvers.

* Modeling methods: comparison of models, symmetry breaking, uncertainty, dominance relationships, model compilation into different technologies (CP, LP, etc.), learning (CP, LP) models from examples.

* Innovative applications derived from OR techniques.

As a point of comparison, last year, we funded 9 grants in the following fields: Explanations in Constraint Programming, SAT techniques in Scheduling, Just-In-Time scheduling, Complex Bin-Packing, Parallel Resources in Scheduling, Convex Integer Programming, Ride sharing, Large Scale Mixed Integer Programming, and Territory Design.

Google values well written proposals that fit into the 3 page format + references. These proposals should include the following sections:
– A clear description of the problem the authors are trying to solve, and the potential impact they could have if the research is successful.
– An overview of the state of the art in this field and an explanation of why the proposed research is innovative.
– A precise understanding on how the authors are going to solve this problem.
– A convincing experimental method that will prove the authors are solving the real problem on realistic data instances, and that will measure the actual gains.
– The biography and a link to recent publications from the P.I. related to the proposal.

Finally, receiving a Google Research Award does not grant access to Google data. In some instances Google researchers may however be interested in collaborating on the problem formulation and the algorithmic solution.

Benchmarks: Coloring, Sports and Umpires

I have always felt strongly that operations research needs more libraries of instances for various problem classes, along with listings of current best solutions.  By tracking how well we solve problems over time, we can show how we advance as a field.  It also makes new work easier to evaluate, simplifying the job of both authors and referees.

I began this direction almost two decades ago when I spent a year at DIMACS (a fantastic research institute on discrete mathematics and computer science based at Rutgers) and, together with David Johnson, ran their Computational Challenge, with an emphasis on solving graph coloring, clique, and satisfiability instances.  From that, I put together a page on graph coloring (which has to be one of the oldest pages on the internets!).  David, Anuj Mehrotra and I followed that up in 2003 with an updated challenge just on graph coloring.  It was great to see people re-use the same instances, so we could understand the advances in the field.  It is hard to tell exactly how many papers have used the various benchmark repositories, but they are clearly the basis for hundreds of papers, judging by the Google Scholar hits on the DIMACS paper referencing the instances.

I had this experience in mind ten years ago when Kelly Easton, George Nemhauser and I wanted to publish work we had done with Major League Baseball on their scheduling.  It made no sense to use MLB as a benchmark, since there is only one instance per year and much of the information about the needs of a schedule is confidential.  So we created the Traveling Tournament Problem, which abstracts two key issues in MLB scheduling: travel distance, and “flow” (the need to mix home and away games).  We created a set of instances, solved a few smaller ones, and let it loose on the world.  The result was fantastic: dozens of groups started working on the problem, and we could clearly see which techniques worked and which didn’t.
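For those who have not seen the problem, here is a rough sketch in Python of the two things a TTP schedule is judged on.  This is only my illustration of the flavor (the instances on the TTP site have their own format), and it assumes the usual rule of at most three consecutive home or away games:

```python
from itertools import groupby

def travel_distance(venues, dist, home):
    """Total distance one team travels over the season.

    venues: the team's schedule in order; "H" means a home game,
            anything else is the opponent's city for an away game.
    dist:   dict mapping (city_a, city_b) -> distance (symmetric).
    The team starts and ends the season at home.
    """
    route = [home] + [home if v == "H" else v for v in venues] + [home]
    return sum(dist[a, b] for a, b in zip(route, route[1:]) if a != b)

def flow_ok(venues, max_streak=3):
    """The "flow" side: no more than max_streak consecutive home games
    or consecutive away games."""
    pattern = ["H" if v == "H" else "A" for v in venues]
    return all(len(list(run)) <= max_streak for _, run in groupby(pattern))
```

The full problem asks for a double round robin in which every team satisfies the flow rule while the total travel over all teams is minimized; it is that combination that makes the instances so stubborn.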

I had made a terrible mistake when creating benchmarks for graph coloring.  I didn’t keep track of best results.  This led to a fair amount of chaos in the field, with contradictory results appearing (claimed coloring values better than claimed lower bounds), and no clear picture of where things are going.  I had thought at one time that I would try to clean things up with a “Repository of Optimization Instances and Solutions”, but too many other things have intruded for me to spend the time necessary on that.  Fortunately, Stefano Gualandi and Marco Chiarandini have put together a site for graph coloring solutions, and I hope they will be successful in their efforts to put a little structure in the field.

I learned from that mistake and was much more diligent about keeping track of solutions for the Traveling Tournament Problem.  The TTP site is always up to date (OK, almost always), so people can reasonably trust the results there.  I have recently extended the site to include instances for non-round-robin scheduling and for the Relaxed TTP (where there is an opportunity for off-days).

One relatively new problem I am excited about is scheduling umpires in sports.  Former doctoral students Hakan Yildiz (now at Michigan State) and Tallys Yunes (Miami) and I developed a problem called the Traveling Umpire Problem, which again tries to abstract out key issues in Major League Baseball scheduling.  In this case, the umpires want to travel relatively short distances (unlike the players, the umpires have no “home city”, so they are always traveling) but should not see the same teams too often.  This problem feels easier than the Traveling Tournament Problem, but we still cannot solve instances with 14 or more umpires to optimality.  This work received a fair amount of interest when the university PR people caught hold of our Interfaces paper.  Since that paper, Hakan and I have put together a couple of other papers, exploring optimization-based genetic algorithms and Benders-based local search approaches for this problem (to appear in Naval Research Logistics).  Both papers illustrate nice ways of using optimization together with heuristic approaches.  The website for the problem gives more information on the problem, along with instances and our best solutions.
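To give a feel for the bookkeeping involved (the constraints in the papers are parameterized much more carefully than this), here is a tiny sketch of the “don’t see the same teams too often” side for a single umpire:

```python
def team_revisit_gaps(assignments):
    """Gaps (in time slots) between successive games worked by one umpire
    that involve the same team.  assignments is the umpire's season:
    a list of (home_team, away_team) pairs, one per slot.  The Traveling
    Umpire Problem penalizes or forbids gaps that are too small."""
    last_seen = {}
    gaps = []
    for slot, (home, away) in enumerate(assignments):
        for team in (home, away):
            if team in last_seen:
                gaps.append(slot - last_seen[team])
            last_seen[team] = slot
    return gaps

# An umpire working (A,B), then (C,D), then (A,C) sees A again after 2 slots
# and C again after 1 slot:
print(team_revisit_gaps([("A", "B"), ("C", "D"), ("A", "C")]))  # [2, 1]
```

Balancing that against total travel distance is where the optimization gets interesting.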

I don’t think my repositories of benchmarks will be as influential as, say, MIPLIB, which focuses on mixed integer programs.  But I do like to think that they make the operations research world run a bit smoother.

Prostates and Probabilities

After a few years’ hiatus, I finally got back to seeing a doctor for an annual physical last week.  For a 51-year-old male with a fondness for beer, I am in pretty good shape.  Overweight (but weighing a bit less than six months ago), pretty good blood pressure (123/83), no cholesterol issues, all without the need for a drug regime.

Once you hit your mid-to-late 40s  (and if you are a male), doctors begin to obsess about your prostate.  There is a good reason for this:  prostate cancer is the second most common reason for cancer death among males (next to lung cancer).  So doctors try to figure out whether you have prostate cancer.

However, there is a downside to worrying about the prostate.  It turns out that lots of men have prostate cancer, and most of them will die of something else.  The cancer is often slow growing and localized, making it less of a worry.  Cancer treatment, on the other hand, is invasive and risky, causing not only death through the treatment (rare but not negligible) but also various annoying issues such as impotence, incontinence, and other nasty “in—“s.  But if the cancer is fast growing, then it is necessary to find it as early as possible and treat it aggressively.

So doctors want to check for prostate cancer.  Since the prostate is near the surface, the doctor can feel the prostate, and my doctor did so (under the watchful gaze of a medical student following her and, as far as I know, a YouTube channel someplace).  When I say “near the surface”, I did not mention which surface:  if you are a man of a certain age, you know the surface involved.  The rest of you can google “prostate exam” and spend the rest of the day trying to get those images out of your mind.

Before she did the exam, we did have a conversation about another test:  PSA (Prostate Specific Antigen) testing.  This is a blood test that can determine the amount of a particular antigen in the blood.  High levels are associated with prostate cancer.  My doctor wanted to know if I desired the PSA test.

Well, as I was recovering from the traditional test (she declared that my prostate felt wonderful: if it were a work of art, I would own the Mona Lisa of prostates, at least by feel), I decided to focus on the decision tree for PSA testing.  And here I was let down by a lack of data.  For instance, if I have a positive PSA test, what is the probability of my having prostate cancer?  More importantly, what is the probability that I have the sort of fast-growing cancer for which aggressive, timely treatment is needed?  That turns out to be quite a complicated question.  As the National Cancer Institute of the NIH reports, there is no clear cutoff for this test:

PSA test results show the level of PSA detected in the blood. These results are usually reported as nanograms of PSA per milliliter (ng/mL) of blood. In the past, most doctors considered a PSA level below 4.0 ng/mL as normal. In one large study, however, prostate cancer was diagnosed in 15.2 percent of men with a PSA level at or below 4.0 ng/mL (2). Fifteen percent of these men, or approximately 2.3 percent overall, had high-grade cancers (2). In another study, 25 to 35 percent of men who had a PSA level between 4.1 and 9.9 ng/mL and who underwent a prostate biopsy were found to have prostate cancer, meaning that 65 to 75 percent of the remaining men did not have prostate cancer (3).

In short, even those with low PSA values have a pretty good chance of having cancer.  There is the rub between having a test “associated with” a cancer, and having a test to determine a cancer.  Statistical association is easy: the correlation might be very weak, but as long as it is provably above zero, the test is correlated with the disease.  Is the correlation high enough?  That depends on a host of things, including an individual’s view of the relative risks involved.  But this test is clearly not a “bright line” sort of test neatly dividing the (male) population into those with cancers that will kill them and those without such cancers.
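To make the arithmetic in the quoted passage explicit (these are just the NIH numbers multiplied out, not new data):

```python
p_cancer_given_low_psa = 0.152     # cancer diagnosed despite PSA <= 4.0 ng/mL
p_high_grade_given_cancer = 0.15   # of those cancers, the fraction that were high grade
print(round(p_cancer_given_low_psa * p_high_grade_given_cancer, 3))  # 0.023 -- the "approximately 2.3 percent overall"
```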

In the days since my doctor’s appointment, there have been a slew of articles on PSA testing, due to the US Preventive Services Task Force moving towards declaring that PSA testing has no net benefit.  The Sunday New York Times Magazine has an article on prostate screening.  The article includes a wonderfully evocative illustration of the decision to be made:

David Newman, a director of clinical research at Mount Sinai School of Medicine in Manhattan, looks at it differently and offers a metaphor to illustrate the conundrum posed by P.S.A. screening.

“Imagine you are one of 100 men in a room,” he says. “Seventeen of you will be diagnosed with prostate cancer, and three are destined to die from it. But nobody knows which ones.” Now imagine there is a man wearing a white coat on the other side of the door. In his hand are 17 pills, one of which will save the life of one of the men with prostate cancer. “You’d probably want to invite him into the room to deliver the pill, wouldn’t you?” Newman says.

Statistics for the effects of P.S.A. testing are often represented this way — only in terms of possible benefit. But Newman says that to completely convey the P.S.A. screening story, you have to extend the metaphor. After handing out the pills, the man in the white coat randomly shoots one of the 17 men dead. Then he shoots 10 more in the groin, leaving them impotent or incontinent.

Newman pauses. “Now would you open that door?”

Is more information better?  To me, information matters only if it changes my actions.  Would a “positive” PSA test (whatever that means) lead me to different health-care decisions?  And is it really true that more information is always better?  Would my knowing that I, like many others, had cancerous prostate cells (without knowing if they will kill me at 54 or 104) really improve my life?

Perhaps in a few years, we’ll have some advances.  Ideal, of course, would be a test that can unmistakably determine whether a man has a prostate cancer that, untreated, will kill him.  Next best would be better, more individual models that would say, perhaps, “A 53-year-old male, with normal blood pressure and a fondness for beer, with a prostate shape that causes angels to sing, and a PSA value of 4.1 has an x% chance of having a prostate cancer that, untreated, will kill him within five years.”  Then I would have the data to make a truly informed decision.

This year, I opted against the PSA test, and everything I have read so far has made me more confident in my decision.  Of course, I did not necessarily opt out of PSA testing forever and ever:  I get to make the same decision next year, and the one after that, and the one after that….  But I will spend the time I save in not worrying about PSA testing by working out more at the gym (and maybe adding a bit of yoga to the regime).  That will, I think, do me much more good.