The Perils of Search Engine Optimization

This blog has been around more than six years, making it ancient in the blogosphere.  And, while not popular like the big-boy blogs (I run about 125,000 hits a year with about 1500 RSS subscribers according to FeedBurner), I think I have a reasonable-sized audience for the specialized topic I cover (operations research, of course!).  People recognize me at conferences and I get the occasional email (or, more commonly, blog comment) that lets me know that I am appreciated.  So I like blogging.

The past few months, however, have been a chore due to the amount of comment spam I get.  Of course, I have software to get rid of most of the spam automatically (Akismet is what I have installed), since otherwise it would be unbearable.  Akismet stopped 5,373 spam comments in the last year.  This sounds like a lot but that is way down from the heights a few years ago:  Akismet stopped 6,711 spams in the month of March, 2009 alone.  Unfortunately, it is letting a lot more spam come through for me to judge: in the past year 619 entries were put through to moderation that I determined were spam.  This is a frustrating exercise since I like my readers: if they want to say something, I want them to say it!  But comment after comment from places like “Sacremento Cabs” or “Callaway Reviews” saying vaguely on-topic things is a bit hard to take.   Sometimes it seems that someone has taken the effort to read the blog post and comments:

“From the communication between the two of you I think I can say that I wish I had a teacher like Mr. X and I wish I had a student like Y.”

came in recently, where Mr. X and Y were previous (legit) commentators.  But the URL of the commentator was to some warez site, so I deleted it all.  Is this a human posting, or just some pattern matching?

Why the sudden influx?  Further checking the logs showed that a number of people (a couple hundred per month) are getting to this blog by searching something like ‘site:.edu inurl:blog “post a comment”’.  Sure enough, if I do that search (logging out from Google and hoping I get something like a generic search result), I get the following:

Wow!  Of all the .edu blogs that include the phrase “Post a Comment”, I come in at number 3!  Of course, despite my efforts, Google may still be personalizing the search towards me, but clearly I am showing up pretty high to attract the attention of hundreds of people.  Through my diligence and efforts, I have made this blog attractive to the Google algorithms (I do seem to be number 1 for “operations research blog” and some other natural searches).  This is a great Search Engine Optimization success!

Or not.  Because clearly I am attracting lots of people who have no interest in what I have to say but are rather visiting to figure out how they can manipulate me and the blog for their own non-operations research purposes (I am perfectly happy to be manipulated for operations research purposes!).   The sponsored link in the search gives it away: there are companies working hard to get comments, any comments, on blogs (presumably any blogs).   How many of those 125,000 hits were really my audience (people in operations research or those who would like to know more about it)?  Do I really have an operations research audience at all (beyond Brian, Matt, Laura, and a few others who I know personally)?

I’ll spend time thinking about ways to avoid this aggravation.  I’ve already put in NOFOLLOW tags, so there is no SEO value to any URLs that get through. I already warn that if the URL submitted is not on operations research, then the comment will be deleted.  I could get rid of URLs completely, but I do see legitimate comments as a way of leading people to interesting places.  I could add more CAPTCHAs and the like, though I am having trouble with some of those myself, particularly when surfing with a mobile device.  Or I can put up with deleting useless comments with inappropriate URLs and just relax about the whole thing.  But fundamentally: how can you do Search Engine Optimization to attract those you would like to attract without attracting the attention of the bottom-feeders?

On the off chance that one of the …. shall we say, enterprising souls, has made it through this post, perhaps you can explain in the comments why you add a comment when the topic is of no interest to you.  Do you get a dime for every one that makes it through?  Are you bored and find this a useful way to pass a quiet afternoon?  Are you a program, mindlessly grabbing parts of the post and comments to make yourself look more human?  Add a comment:  I probably won’t let your URL through but at least we can understand each other a bit better.

 

Social Networks and Operations Research

Until recently, I pretty well had a handle on my use of social networks.   Rather than try to use a single social networking system in multiple ways, I have used different systems in different ways and for different networks.

  • I have a blog, of course, and I use that to pontificate on various aspects of operations research.  While the communication is primarily one-way, I see this as a network since 1) I have enough regular commentators that I feel there is some two-way conversation, and 2) there is a network of OR bloggers (the “blogORsphere”) and our various posts often riff off each other, particularly now that INFORMS provides a monthly topic for us to use in common (this post is part of the July challenge on OR and social networks).    You can get a feed of all the OR blogs either in the sidebar at my page or through a Google Reader site. If you have an operations research blog and are not included, please let me know!  Feedburner and my log files suggest that each post is read about 3000 times in the first week of posting (after that, each post gets a regular trickle of readers through search).
  • I have a twitter account (@miketrick) where 90% of my tweets have some operations research content (denoted by an “#orms” hash tag).  About 10% of the time, I am griping about some failure in customer service or something similar on non-operations research aspects.   When I post on my blog, a tweet automatically goes out through my twitter account. I follow 183 other twitter users and am followed by just over 700 others, most presumably for the #orms content.
  • I have a facebook account (michael.trick).  Again, a post on my blog generates a facebook entry, but I primarily use facebook for my real-life friends and family, and rarely post on operations research (except the blog entries).

And that seemed enough!  But recently, there have been more social networks that I have had to integrate in to my life, and the existing ones have changed.

  • LinkedIn remains a mystery to me.  I certainly have done a fair amount of linking, with 365 direct connections.  Many of these are former students who want to stay connected to me, and I am happy to be connected.  It has even been useful when getting a request like “Do you know anyone at X who can help me with Y”.  And somehow I am getting emails on conversations that are going on at LinkedIn that actually look pretty good.  But when I go to the site, I can never find where those conversations are coming from, and I am just generally overwhelmed with minutiae about who has changed their picture and commented on what.  My blog and twitter feed get mirrored at LinkedIn, but otherwise this is just not something I have been active in.
  • OR-Exchange is  a Question and Answer site that focuses on operations research and analytics.  In many ways, this was a response to the death (or near death: there are some diehards holding on) of the Usenet group sci.op-research.  That group died under the weight of “solution key sellers”, ersatz conference announcements, mean-spirited responses from curmudgeonly long-timers, and general lack of interest.  So I registered the site or-exchange.com at a Q&A site, and started things off.  Since then, the system has taken a life of its own.  INFORMS now hosts it, and there are a dozen or so very active participants along with a larger number of regulars.  I am not sure the system has really reached critical mass, but I am very hopeful for it.
  • Facebook is moving in a direction that might make it more relevant for my professional life.  Bjarni Kristjansson has put together a group “I like operations research” that is getting some traction.  I put together a page that provides the feed to all of the operations research blogs that I can find (this is the same group of entries that is in the sidebar of my blog).
  • Reddit.com is a very popular way to point out links, and “cavedave” has done a great job in putting together a “sysor” subreddit.  With a couple thousand readers, a post there gets a noticeable bump in readership.
  • Google Plus simply baffles me at the moment.  I have an account, but I don’t know how to treat it.  It seems silly to just recreate a twitter feed in plus, but there doesn’t seem to be a hole in my personal social networking activities  that requires plus.  I had already done the “circles” thing by my different uses of facebook, twitter, and my blog, so it is not a great addition.  But I hate to think I am missing out on something big.  On the other hand, I did spend a couple of days on Google Wave, so I am a little hesitant to simply leap on this bandwagon.

As I look through all of this, I can’t help but reflect on how fragmented this all is.  Wouldn’t it be great to have a real community site where all of us in operations research can get together to share thoughts, papers, links, and more?  Bjarni is working hard at pushing INFORMS in the direction of providing such a community site.  But the sad thing is that we had such a site more than ten years ago, when social networking was in its infancy.  ILOG, through people like Irv Lustig, created a site e-optimization.com.  It lasted a couple of years, but could not survive the pressures of the dot-com crash.  INFORMS keeps a snapshot of the site (with limited functionality), and it is still impressive long after it shuttered its doors.

And, as I look more closely at all of the activity, I am amazed that there is not more.  Why are there not hundreds of operations research blogs, instead of the couple of dozen that I list?  Why doesn’t every doctoral student in operations research have a twitter account?  Is there a social networking world I am missing?  If not, where is everybody?

Of course, if you are reading this, then you are in my social network, and I am very grateful that you are.

 

Hello Cousin!

My father has spent time over the last decade collecting pictures and documents related to our family tree.  I greatly appreciate him doing this, and the result is fascinating.  There is no one really famous in my tree, unless you are a follower of (Canadian) prairie socialism, since I think J.S. Woodsworth is in there, by marriage into the Staples family, my father having Staples as a middle name.  But the pictures of past generations are evocative and the stories of families moving from Europe to rural Canada are inspirational.  The bravery in pulling up roots in a world when communication times are measured in weeks or months is unbelievable.  It makes me realize how easy I have had it, and how my own choices are so relatively costless either way.

In addition to real ancestry, there is also academic ancestry, tracing descendants through academic advisement.  Within operations research (and mathematics more generally), we have a central collection point for academic ancestry:  the Mathematics Genealogy Project.   This site has collected the advisor/advisee relationships for more than 150,000 mathematicians, including many in operations research.  This site is certainly not new, dating back to 1997, and I have known about it practically since the start.  It is only now, however, that I sent in the information on my own advisors (Don Ratliff and John Bartholdi) and my seven advisees.  It will take a bit of time to update the site.  In the world of Web 2.+, it is strange to have such a delay, but it appears there is still a bit of hand editing.  Soon my tiny part of the tree will be accurate.

For finding my ancestry, the first part is easy:  John Bartholdi was a student of Don Ratliff (starting me on a non-branching family tree), and Don was the student of Manny Bellmore, who also advised now-billionaire John Malone.   Bellmore was the student of Frederick (Tom) Sparrow, a long time faculty member at Purdue.  Looking at Tom’s descendants, I see he was the advisor to Stella Dafermos who advised … fellow OR-blogger, and network guru, Anna Nagurney!  In fact, the picture of Dr. Sparrow comes from Anna’s Virtual Center for Supernetworks. So it turns out that Anna is my … umm… first cousin once removed?  Anyway, we are definitely related, as you can tell by the fact that she writes very well, and I … type things into my blog (generally with too many parentheses and exclamation points!).

I’m continuing to work my way back.  It seems that most people end up back at Gauss, but we’ll see where I end up.  I think I would be more delighted to see that most of the operations research blogORsphere comes from close academic relatives!

So What Correlates with Operations Research?

Google Labs has a new tool called Google Correlate. Google provided some early correlation results during the 2008 flu season when it showed that search count for certain terms (like “flu” presumably) could be used to estimate the prevalence of flu in an area.  This led to Google Flu Trends (it appears that currently only South Africa has many cases of the flu).

You can now play this game on your own data.  Have a time series over the last 9 years or so?  You can enter it into Google Correlate and see what search terms are correlated with the data.

Even easier is just entering a search term:  it will then return other correlated search terms.

If you are going to periodically write in something called “Michael Trick’s Operations Research Blog”, it is clear what to do next:  search on “Michael Trick” (it is required to egotistically search on your own name first, right?).  No dice:  I’m not popular enough to justify a search (sigh…).

But, of course, “operations research” works fine.  What correlates with that phrase?  Turns out lots of interesting things:  “signal processing”, “information systems”, and … “molecular biology”?  What are the common features of these terms?  Well, they were relatively more common search terms in 2004-2005, relatively flat in the past three years, and have a strong seasonality, corresponding to the start of the academic year (“Hey, I signed up for Operations Research:  what the heck is that?”).  Whether it is operations research, signal processing or molecular biology, it appears lots of academic departments begin September with students frantically searching on their subjects.

We can try another term with some currency:  “business analytics”.  The result is somewhat surprising.  “Thank you email”?  “Vendor portal”?  “Zoes Kitchen”?  It seems hard to make much sense of this.  As we know, “business analytics” is a relatively new term and the search quantity is less than that of “operations research” which perhaps explains the spurious correlations:  there are so many terms that are searched as often as “business analytics” that the highest correlations come more or less randomly.

To data people like us (me, anyway), the ability to search correlations is endlessly fascinating.  Shift the operations research time series by 13 weeks and what do you get:  things like “portable mp3” and “retriever pictures”:  clearly our students are bored with our course and are surfing around for something more entertaining.  What does “management science” search correlate with?  “introduction” and “social research”.  Is there anything interesting to be learned by the differences in correlates between operations research and management science?  Nothing springs to mind, but there might be a thesis or two there.
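
For anyone who wants to play the shifting game offline, here is a minimal sketch of a lagged correlation between two weekly series.  I don’t know how Google Correlate computes its scores or which direction it shifts, so the function names (`pearson`, `lagged_correlation`) and the convention that the second series is the one moved forward in time are my own assumptions, not Google’s method.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(reference, candidate, lag_weeks):
    """Correlate reference[t] with candidate[t + lag_weeks].

    Assumed convention: a positive lag asks whether the candidate series
    looks like the reference series did lag_weeks earlier.
    """
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative in this sketch")
    if lag_weeks == 0:
        return pearson(reference, candidate)
    # Drop the weeks that have no partner after shifting.
    return pearson(reference[:-lag_weeks], candidate[lag_weeks:])
```

With a made-up seasonal series (say, a yearly sine wave sampled weekly) and a copy delayed by 13 weeks, the correlation is near zero with no shift and essentially 1 at a shift of 13, which is the kind of pattern the “portable mp3” result hints at.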

I am not sure what any of this means, but it sure is a great way to spend an early summer afternoon!

 

 

That’s got to be true… doesn’t it?

Back in 1996, Harvey Greenberg, longtime faculty member at the University of Colorado at Denver, began putting together a collection of myths and counterexamples in mathematical programming.  While generally I find mathematical programming to be quite intuitive, there turn out to be lots of things that I think must be true that are not.  Consider the following:

  1. The duality theorem applies to infinite LPs.
  2. Simulated annealing converges more slowly than steepest descent when there is a unique optimum.
  3. In linear programming, a degenerate basis implies there is a (weakly) redundant constraint.
  4. If f has continuous nth-order derivatives, local behavior of f can be approximated by Taylor’s series.
  5. The problem of finding integer x such that Ax = b, where A is an m by n integer matrix and b a length m integer vector, is NP-complete.

Amazingly none of these are true!  Reading through the myths and counterexamples reminds me of how much I “know” is really false.
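
To get some intuition for item 5: as I understand it, with no bounds or sign restrictions on x, integer solutions of Ax = b can be found in polynomial time (via Hermite normal form), which is why the NP-completeness claim fails.  The sketch below covers only the simplest case, a single equation, where the extended Euclidean algorithm does the job; the function names are mine and this is an illustration, not the counterexample from Harvey’s document.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) >= 0 and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:  # normalize so the gcd is non-negative
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def solve_single_equation(coeffs, rhs):
    """Find integers x with sum(coeffs[i] * x[i]) == rhs, or return None.

    An integer solution exists exactly when gcd(coeffs) divides rhs.
    """
    g = 0
    mult = []  # invariant: sum(mult[i] * coeffs[i]) == g
    for c in coeffs:
        g_new, s, t = extended_gcd(g, c)
        mult = [m * s for m in mult] + [t]
        g = g_new
    if g == 0:  # every coefficient is zero
        return [0] * len(coeffs) if rhs == 0 else None
    if rhs % g != 0:
        return None
    scale = rhs // g
    return [m * scale for m in mult]
```

For example, 6x + 10y + 15z = 1 has an integer solution because the gcd of the coefficients is 1, while 4x + 6y = 7 has none because 2 does not divide 7.  Once you require x to be non-negative or 0/1, you are back in hard territory.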

The Myths and Counterexamples document is hosted by the INFORMS Computing Society as part of its Mathematical Programming Glossary, and Harvey periodically updates the Myths site (with the last update being in February 2010).  If you have shown that something that seems obvious is actually false, be sure to let Harvey know about it.  And the next time you are doing a proof and are tempted to make a claim because “it is well known that…” or “obviously…”, perhaps you should check out the site first.

Thanks to @fbahr and the folks at Reddit’s SYSOR for reminding me of the value of a project I have followed for about fifteen years so far!

Recently on OR-Exchange…

OR-Exchange is a question and answer site on operations research (and analytics).   The concept couldn’t be simpler.  People ask questions about operations research;  people answer questions about operations research.  Kinda like the usenet group sci.op-research without the spam.

I put together the site a couple of years ago when stack-exchange made it easy to put such sites on their server.  The idea was to mimic the popular mathoverflow site, but to specialize on operations research issues.  I had no idea of how well this would work, but it started off pretty active and continued to grow.

About a year ago, stackexchange decided on a different path, and they no longer wanted to host or support other groups.  Instead, groups of people could go through a process to become a true stackexchange site.  A number of groups have done that, and have done well with that path.  Unfortunately, our site was a little small for that direction.  Even now, with 381 “official users”, it would be the smallest of any stackexchange site (the current smallest is Jewish Life and Learning with 401 users).  The requirements to be a stackexchange site seemed insurmountable, so we needed another solution.

At this point, INFORMS stepped in and offered to sponsor the site.  After receiving confirmation that “sponsorship” did not mean “ownership” and that we could continue acting the way we were, we (i.e. me, along with a few of the very active participants) decided to move the site to INFORMS.  A big question was the software to use, since stackexchange software was no longer available.  Fortunately, there was an open source replacement from osqa.net, so it was just a matter of installing that….   Famous last words!  Installing the software and getting the current questions and answers from stackexchange was no easy feat.  Fortunately, David and Herman from INFORMS were up to the herculean task of getting things up and running smoothly.  The conversion happened on April 8, while I was sitting in a faculty meeting, doing the few minor things I needed to do, like pointing or-exchange.com from stackexchange to INFORMS (Here is some advice: when sitting in a faculty meeting, do not try to guess URLs;  godaddy and bigdaddy lead to radically different sorts of sites).  And things have worked great since then!

As I said, there are 381 registered users for the site, with about 40 being reasonably active.  But you don’t have to register to read the questions and answers, and there are about 300 unique visitors per day who do so, often due to hits at google.  This 300 is more than the background hits on this blog (when I post, hits spike up, but I run about 275 hits per day between posts).  There have been 316 questions asked, generating 1152 answers, along with at least that many comments.  At this point, there are eight moderators, though the moderation touch is extremely light.

Recently people have asked about

and much more!  It is a friendly group (except when it comes to answering homework problems!) so if you have a question in the area of operations research, broadly defined, don’t hesitate to check it out!  And thanks to INFORMS, and particularly Terry, David, and Herman, for the sponsorship and the outstanding technical support.

Algorithmic Pricing

The Twitterverse is giggling over some of the absurd pricing for some used books at Amazon (Panos Ipeirotis and Golan Levin were two who tweeted on the subject).  There are books at Amazon where the price is in the millions of dollars!  How can such a thing happen?

While I love the picture of an eager seller (“I have just two books for sale, but, man, if I sell one, I am set for life!”), the explanation is much more mundane, at least in some cases. As the “it is NOT junk” blog shows, it is clear that two sellers of the book The Making of a Fly (a snip at a peak price of a mere $23 million) are setting their prices relative to each other’s price.  Seller A sets its price equal to .99830 times that of seller B;  B sets its price equal to 1.27059 times that of A.  Every day the price of the book goes up by a factor of 1.26843.  Do this for a few months, and you’ll get prices in the millions.
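
The arithmetic of the spiral is easy to check.  Here is a minimal sketch: the two multipliers are the ones quoted above, but the one-update-each-per-day timing and the $35 starting price used in the note below are my assumptions for illustration, not numbers from the Amazon listings.

```python
A_FACTOR = 0.99830  # seller A prices at this multiple of seller B (from the post)
B_FACTOR = 1.27059  # seller B prices at this multiple of seller A (from the post)


def daily_growth(a_factor=A_FACTOR, b_factor=B_FACTOR):
    """Combined price multiplier after each seller has repriced once."""
    return a_factor * b_factor


def days_to_exceed(start_price, target_price, growth):
    """Count daily repricing rounds until the price first exceeds the target."""
    if start_price <= 0 or growth <= 1:
        raise ValueError("need a positive start price and growth > 1")
    price, days = start_price, 0
    while price <= target_price:
        price *= growth
        days += 1
    return days
```

Calling `daily_growth()` reproduces the factor of roughly 1.26843, and `days_to_exceed(35.0, 23_000_000, daily_growth())` says how long a hypothetical $35 book would take to pass $23 million.  Because the growth is exponential, the answer is not very sensitive to the starting price: a few dozen daily rounds more or less.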

This sort of market driven pricing is not unreasonable.  Sellers with good reputation are able to command higher prices (see, for instance, the paper by Ghose, Ipeirotis, and Sundararajan on “Reputation Premiums in Electronic Peer-to-Peer Markets” for results and references). A firm might well feel that its reputation is worth a premium of 27.059%. Another firm might adopt an alternative strategy of just undercutting the competition by, say, 0.17%. Everything works fine until they become the only two firms in a market. Then the exponential growth in prices appears since there is no real “market” to base their differential on.

Such an issue would be nothing more than an amusing sideline if it weren’t for the effect such algorithmic prices can have on more important issues than obscure used books. The “flash crash” of the stock market in 2010 appears to have been caused by the interaction between one large sale and automated systems that executed trades based on trading volumes, not price. As the SEC report states:

“… under stressed market conditions, the automated execution of a large sell order can trigger extreme price movements, especially if the automated execution algorithm does not take prices into account. Moreover, the interaction between automated execution programs and algorithmic trading strategies can quickly erode liquidity and result in disorderly markets.”

Pricing based on markets abounds.  At the recent Edelman competition, one of the groups (InterContinental Hotels Group) discussed a price-setting mechanism that had, as one of the inputs, the competing prices in the area. Fortunately, they had a “human in the loop” that prevents spiraling prices of the form seen at Amazon.

In a wish to be quick, there is great pressure to move to automated pricing. Until we create systems that are more robust to unforeseen situations, we risk having not just $900,000,000 books, but all sorts of “transient” effects when systems spin out of control.  And these effects can cause tremendous damage in a short period of time.

INFORMS Sponsorship of OR-Exchange

OR-Exchange has been a question and answer site on operations research in existence for about two years. Over that time, there have been 290 questions, generating more than 1000 answers. You have a question? Chances are there is someone there to answer!

Coinciding with the newly revamped INFORMS Conference on Business Analytics and Operations Research is the INFORMS sponsorship of OR-Exchange. Conversion to new software and the INFORMS computing system has gone smoothly over the past few days (thanks David!), and we are excited about the new opportunities that come with INFORMS support.

In keeping with the renaming of the conference, we’ve also changed the tagline for OR-Exchange. We are now “Your place for questions and answers in operations research and analytics”.

The Sexiness of Integer Programming

Suresh Venkatasubramanian, of the excellent Geomblog, is at SODA and covered some preconference conferences. When covering a paper that used integer programming (via CPLEX), his comment was:

It’s not the “sexiest” thing in the world to solve algorithms problems in practice by using large industrial strength packages.

Oh, he didn’t say that, did he?

I spend most of my life “solving algorithms problems in practice by using large industrial strength packages”. I solve sports scheduling problems, logistics problems, and even problems in social choice with large scale integer programming software as my primary tool. Who says I am not sexy? Or, worse, that my research is not sexy?

I can think of a few misconceptions that might lead one to doubt the sexiness of integer programming.

  • Integer Programming is uncreative. After all, you just formulate the problem in terms of variables, linear objective, and linear constraints, toss it in a code, and you are done. Where is the research?

    Such a view ignores the tremendous number of choices one must make in formulating integer programs. The choice of variables, constraints, and objective can have a tremendous impact on the effectiveness of the solver. While solvers continue to get better at automatically identifying formulation improvements, formulating integer programs is still an art. It is an art, however, enhanced by theory. As we better understand the polyhedral characteristics of integer programming formulations, we create better and better formulations.

    This creative effort is expanded when alternatives to “basic” formulations are included. Creating a branch-and-price, branch-and-cut, Benders, or other type of formulation can be the key to solving real problems, but may require significant research effort.

  • Integer programming is slow. This seems to be the most common response I get from those outside the integer programming subfield. One quote from a person at a well-known computer science-oriented firm: “I tried integer programming. It doesn’t work.” Not “It doesn’t work for this problem” or “I couldn’t make it work”. Just “It doesn’t work”. How can a topic be sexy if it doesn’t work?

    I think one of the great stories in all of mathematics and business is the incredible confluence between increased data, faster computers, algorithmic improvements, and implementation successes that have led to many orders of magnitude speed increases in integer programs. Over the last fifteen years, we have gotten much, much better at solving integer programming models. With the underlying linear programming solvers being more than a million times faster (no hyperbole: both computers and algorithms provide more than a 1000-fold speedup each), lots of instances formerly out of reach can now be solved routinely.

    Of course, integer programming is exponential time in the worst case. But I am not sure why a polynomial time algorithm that gets an approximate solution within a factor of, say, 42 is any “sexier” than an algorithm that finds the optimal solution in a reasonable amount of time for any instance of practical import.

  • Integer programming is expensive. Hey, if you have to spend tens of thousands of dollars for solutions, that really doesn’t count does it? It is like a troll dressed up in a $5,000 suit: it might look OK, but it’s not sexy!

    No one has to pay a lot for high quality integer programming software. Suresh refers to this in his post:

    This was surprising to me on two levels: firstly, that CPLEX is actually free for academic use (who knew!) and that such a simple approach is so effective.

    Actually, all the major programs offer a way for free academic use. I particularly like Gurobi’s approach, which is based on periodic validation through a “.edu” domain, but academics can also get CPLEX (as you would have read on my blog last year) and XPRESS.

    And if you are not an academic? COIN-OR offers “industrial strength” linear and integer programming in an open source approach.
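
To make the earlier point about formulation choices concrete, here is a small textbook-style sketch (my own illustration, not something from Suresh’s post or from any particular solver).  Take a 0/1 variable y for “facility is open” and 0/1 variables x_j for “customer j is served from it”.  The linking requirement can be written as one aggregated constraint or as one constraint per customer.  Both admit exactly the same 0/1 points, but the aggregated version lets through fractional points that the disaggregated version cuts off, which is what makes its linear relaxation weaker.

```python
from fractions import Fraction
from itertools import product


def aggregated_ok(x, y):
    """Weak form: a single constraint, sum_j x_j <= n * y."""
    return sum(x) <= len(x) * y


def disaggregated_ok(x, y):
    """Strong form: n constraints, x_j <= y for every customer j."""
    return all(xj <= y for xj in x)


def same_integer_points(n):
    """Check that both forms accept exactly the same 0/1 assignments."""
    for *x, y in product((0, 1), repeat=n + 1):
        if aggregated_ok(x, y) != disaggregated_ok(x, y):
            return False
    return True


# A fractional point that separates the two forms (exact arithmetic
# via Fraction avoids any floating-point doubt at the boundary).
fractional_x = (Fraction(1), Fraction(0), Fraction(0))
fractional_y = Fraction(1, 3)
```

Here `same_integer_points(3)` confirms the two forms agree on every 0/1 assignment, while the point with x = (1, 0, 0) and y = 1/3 passes `aggregated_ok` and fails `disaggregated_ok`.  A solver working from the second form starts from a tighter relaxation, at the price of more constraints; deciding on trade-offs like that is part of the art.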

In Suresh’s defense, here is the full quote, where he makes clear his support for this type of research:

It’s not the “sexiest” thing in the world to solve algorithms problems in practice by using large industrial strength packages. However, both CPLEX and SAT solvers are examples of tools that can be used in practice to solve fairly intractable problems. It still takes a lot of engineering and skill to make the heuristics work well, but it’s something that we should be using as a matter of course when designing heuristics before trying to invent an algorithm from scratch.

Suresh obviously gets it: let’s hope his post expands interest in integer programming methods for these problems.

Integer programming is useful, fast, and cheap. If that isn’t sexy (for an algorithm!), then I don’t know what is.

First INFORMS Blog Challenge

INFORMS has announced the results of the first Blog Challenge and it is a great success.  Fourteen bloggers had a post on the subject “OR and the Holidays” (including me!).    January’s Challenge moves into current events with the topic “OR and Politics”.  If you post on that subject, be sure to email graphics@mail.informs.org with the pointer.