Algorithmic Voting Theory, Venice, and a Talk on Old/New Papers

I just got back from Venice, where I attended a conference on Algorithmic Decision Theory.  This is a new conference series (optimistically numbered as the first conference, implying at least a second) revolving around uncertainty in decision making, preference elicitation, learning, and related issues.  From the conference web page:

A new unique event aiming to put together researchers and practitioners coming from different fields such as Decision Theory, Discrete Mathematics, Theoretical Computer Science and Artificial Intelligence in order to improve decision support in the presence of massive data bases, combinatorial structures, partial and/or uncertain information and distributed, possibly inter-operating decision makers. Such problems arise in several real-world decision making problems such as humanitarian logistics, epidemiology, risk assessment and management, e-government, electronic commerce, and the implementation of recommender systems.

This area has been very active, particularly in computer science, where there is a host of applications.

I was asked to give one of the invited talks, and spoke on “An Operations Research Look at Voting”.  I was a little worried about giving this talk, since my main qualifications come from papers published twenty years ago.  When I was a doctoral student at Georgia Tech, I worked with John Bartholdi and Craig Tovey on computational issues in voting.  Being the first to look at those issues, we got to prove some of the basic results in the area.  These include

  1. For some natural voting rules, it is NP-hard to determine who the winner is (see the sketch after this list).
  2. For some natural voting rules, it is NP-hard to determine how to manipulate the rule (where manipulation means misrepresenting your preferences so as to get a preferred outcome).
  3. For some natural voting rules, optimally using the powers of a chair to form subcommittees or otherwise manipulate the voting process is NP-hard.
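
To make the first of these concrete, here is a minimal brute-force sketch for Kemeny's rule, one rule for which determining the winner is NP-hard.  The three-voter profile below is made up; the point is that the obvious algorithm searches all m! rankings, which is exactly the kind of blow-up the hardness results warn about.

```python
# Brute-force Kemeny winner: a hedged illustration, not code from the papers.
from itertools import permutations

def kemeny_winner(candidates, voter_rankings):
    """Return a Kemeny-optimal ranking and its top candidate by brute force.

    A ranking's score is the number of (voter, pair) agreements it has with
    the voters' rankings; Kemeny's rule picks a score-maximizing ranking.
    """
    def agreements(ranking):
        pos = {c: i for i, c in enumerate(ranking)}
        total = 0
        for r in voter_rankings:
            vpos = {c: i for i, c in enumerate(r)}
            for i, a in enumerate(candidates):
                for b in candidates[i + 1:]:
                    if (pos[a] < pos[b]) == (vpos[a] < vpos[b]):
                        total += 1
        return total

    best = max(permutations(candidates), key=agreements)
    return best, best[0]

# Hypothetical three-voter profile over candidates a, b, c (a Condorcet cycle).
profile = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
ranking, winner = kemeny_winner(["a", "b", "c"], profile)
print(ranking, winner)
```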

We published this work in Social Choice and Welfare (after getting it rejected from more mainstream economics journals) where … it was soundly ignored for 15 years.  No one referred to the papers; no one followed up on the papers;  no one cared about the papers at all!

This work was my job talk in 1987/88, and it got me a wonderful job here at the Tepper School (then GSIA), Carnegie Mellon.  And, without Google Scholar, it was not obvious that the papers were being ignored, so they added to my vita, and made it a bit easier to pass through the various steps.

But I did like the work a lot, and regretted (and still regret) that my economist colleagues did not take computational limits more seriously in their models.

[Figure: Web of Science citation counts per year for the voting papers]

But then something amazing happened about 5 years ago:  people started referring to the papers!  The references were mainly in computer science, but at least the papers were being recognized.  The counts for these papers in the Web of Science (formerly the Science Citation Index) are particularly striking.  In the associated graph, the x-axis is years since publication;  the y-axis is the number of references in Web of Science in that year (Google Scholar numbers are higher of course, but there is a bias towards more recent papers there).  In my talk, I compare that graph to my “normal” papers, which reach a peak after 4 or 5 years and then decrease.   It is really gratifying to see the interest in these papers, along with all the really cool new results people are getting.

I closed off the talk with some work I have done recently on creating voting trees.  Suppose there are three candidates, “a”, “b”, and “c”, and you really like candidate “a”.  Many voting systems proceed as a series of comparisons between two alternatives (think of standard parliamentary procedure).  If you are the chairperson, you might try to bring the candidates forward so as to increase the chances of “a” winning.  In fact, if you set the agenda to be “b” against “c” and the winner against “a”, then “a” will win as long as “a” beats someone (and no one beats everyone).  In this problem, the goal is to do the manipulation without knowing other voters’ preferences.
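
Here is a small check of that three-candidate claim (my own illustration, not something from the talk): evaluate the voting tree bottom-up against every possible majority tournament on {a, b, c} and confirm that “a” wins whenever “a” beats someone and no one beats everyone.

```python
# Check the three-candidate agenda by brute force over all majority tournaments.
from itertools import product

def tree_winner(tree, beats):
    """Evaluate a voting tree: a leaf is a candidate name; an internal node is
    a pair (left, right) decided by the pairwise-majority relation `beats`."""
    if isinstance(tree, str):
        return tree
    left, right = (tree_winner(t, beats) for t in tree)
    return left if (left, right) in beats else right

agenda = (("b", "c"), "a")   # first b vs c, then the winner against a

pairs = [("a", "b"), ("a", "c"), ("b", "c")]
for choices in product([0, 1], repeat=3):        # all 8 tournaments on {a, b, c}
    beats = {p if c == 0 else (p[1], p[0]) for p, c in zip(pairs, choices)}
    a_beats_someone = ("a", "b") in beats or ("a", "c") in beats
    someone_beats_all = any(
        all((x, y) in beats for y in "abc" if y != x) for x in "abc"
    )
    if a_beats_someone and not someone_beats_all:
        assert tree_winner(agenda, beats) == "a"
print("a wins the agenda in every qualifying tournament")
```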

[Figure: a voting tree on four candidates]

Can you do this for four candidates?  If you want “a” to win, “a” must be in the top cycle: the group of candidates (perhaps the entire set of candidates) who all beat all the other candidates.  The “Condorcet winner” is the minimal top cycle:  if some candidate beats all the other candidates one-on-one, then that candidate must win any voting tree it occurs in.  So, assuming “a” is in the top cycle, can you create a voting tree so that “a” wins with four candidates?  The answer is yes, but it is a bit complicated:  first “a” goes against “c” with the winner against “d”, then the winner against “b”, who plays the winner of (“a” goes against “b” with the winner against “d” …) ….  Actually, the minimum tree has 14 leaves!  I am biased, but I think the tree is beautiful, and it goes to show how hard it is to manipulate agendas without knowledge of others’ preferences.  I am in the process of generating the trees on 4 candidates:  there is a very natural rule (“Copeland winner with Copeland loser tie break”:  see the presentation for definitions) that requires more than 32 leaves (if an implementation tree for it exists).
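
For readers who want the top cycle as something computable: in a tournament it is exactly the set of candidates that can reach every other candidate along “beats” arcs, and it is a single candidate precisely when a Condorcet winner exists.  Here is a short sketch (my illustration, with a made-up tournament):

```python
# Compute the top cycle of a tournament via reachability.
def top_cycle(candidates, beats):
    def reaches_all(start):
        seen, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for y in candidates:
                if (x, y) in beats and y not in seen:
                    seen.add(y)
                    frontier.append(y)
        return seen == set(candidates)
    return {c for c in candidates if reaches_all(c)}

# a beats b, b beats c, c beats a, and all three beat d: top cycle is {a, b, c}.
beats = {("a", "b"), ("b", "c"), ("c", "a"),
         ("a", "d"), ("b", "d"), ("c", "d")}
print(top_cycle(["a", "b", "c", "d"], beats))   # -> {'a', 'b', 'c'}
```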

Sanjay Srivastava and I made a conjecture almost 15 years ago that would imply that this sort of manipulation would be possible no matter how many candidates.  Little progress has been made but I think it is still a great problem (the economists know this as implementation by backwards induction and characterizing rules implementable on trees is an important problem in social choice/mechanism design).

If you want more details on all this, here are my slides.  The references are

  • M.A. Trick, “Small Binary Voting Trees”, First International Workshop on Computational Social Choice, Amsterdam, Netherlands, 500-511 (2006).
  • S. Srivastava and M.A. Trick, “Sophisticated voting rules: The two tournament case”, Social Choice and Welfare 13: 275-289 (1996).
  • J.J. Bartholdi, C.A. Tovey and M.A. Trick, “How hard is it to control an election?”, Mathematical and Computer Modelling (Special Issue on Formal Theories of Politics) 16(8/9): 27-40 (1992).
  • J.J. Bartholdi, C.A. Tovey and M.A. Trick, “The computational difficulty of manipulating an election”, Social Choice and Welfare 6: 227-241 (1989).
  • J.J. Bartholdi, C.A. Tovey and M.A. Trick, “Voting schemes for which it can be difficult to tell who won the election”, Social Choice and Welfare 6: 157-165 (1989).

Oh, and Venice is a beautiful place to visit.  But you might have heard that elsewhere.

Hey Buddy, Can I Sell You an NP Hard Problem?

In keeping with issues of economics and computational power, there is a very neat paper out of Princeton by Arora, Barak, Brunnermeier, and Ge entitled “Computational Complexity and Information Asymmetry in Financial Products”.  Can you embed an NP-hard problem into the pricing problem for a financial instrument?  As the authors point out, the answer is clearly “yes”, and the issue is really how naturally you can do it.  For instance, I could have an instrument that pays $1000 if the answer to a particular, large traveling salesman problem is less than 1 million miles, and $0 otherwise.  Pricing such an instrument involves solving an NP-complete problem, but no one would argue that this implies anything about real financial instruments.  The authors give another, similarly artificial, example:

Consider for example a derivative whose contract contains a 1000 digit integer n and has a nonzero payoff if the unemployment rate next January, when rounded to the nearest integer, is the last digit of a factor of n. A relatively unsophisticated seller can generate such a derivative together with a fairly accurate estimate of its yield (to the extent that unemployment rate is predictable), yet even Goldman Sachs would have no idea what to pay for it. This example shows both the difficulty of pricing arbitrary derivatives and the possible increase in asymmetry of information via derivatives.

Nobody is actually interested in embedding an NP-Hard problem in a financial instrument in that way.  If the pricing is not clear, and there is obvious information asymmetry, buyers will simply assume they are getting ripped off and walk away (see adverse selection, below).

This paper does something much more interesting.  It develops a system where the firm offering financial assets seems to divide multiple assets up fairly but actually does so in a biased way.  In order to distinguish this tampering from non-tampering, an outside analyst has to solve a densest subgraph problem (an NP-hard problem).
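
To give a feel for what that analyst faces (this is my generic illustration, not the paper's exact construction), picture a graph linking asset pools to the assets they contain; tampering shows up as a suspiciously dense cluster of pools and assets.  The size-constrained version of the problem, densest k-subgraph, is NP-hard, and the obvious algorithm checks every size-k subset:

```python
# Brute-force densest-k-subgraph: a toy stand-in for the analyst's problem.
from itertools import combinations

def densest_k_subgraph(nodes, edges, k):
    """Return the k-node subset with the most internal edges (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    def internal_edges(subset):
        return sum(1 for pair in combinations(subset, 2)
                   if frozenset(pair) in edge_set)
    return max(combinations(nodes, k), key=internal_edges)

# Hypothetical toy instance: pools p1..p3, assets a1..a4.
nodes = ["p1", "p2", "p3", "a1", "a2", "a3", "a4"]
edges = [("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a2"),
         ("p3", "a3"), ("p3", "a4")]
print(densest_k_subgraph(nodes, edges, 4))   # -> ('p1', 'p2', 'a1', 'a2')
```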

In finance and economics, there is an issue called adverse selection.  Why am I hesitant to buy a used car?  People will only sell their lemons.  Why should we be cautious in hiring faculty who work at other schools?  The other school knows more about them, and will choose not to compete if they want to get rid of them.  Why should I be cautious when a bank chooses 10 mortgages to resell out of their portfolio of 10,000?  They know where the lemons are and will choose to dump them given the chance.

What the authors are saying is that even when a company sells all of their assets in groups in an apparently random way, it is possible to hide manipulation.  So adverse selection can occur in a way that is hard to determine.  Maybe the buyers will assume no adverse selection and we get to dump our lemons!  You can read the paper (or this blog posting from Andrew Appel) for some more information.

I have somewhat mixed feelings about this result (though I have only lived with it for a few hours:  my views may take more time to settle).  On one hand, I really think that people in economics and finance need to be more aware of computational limits in their models.  On the other hand, it is not clear that NP-hardness is a particularly good defense here.  The authors have a somewhat breathless gloss on what NP-hardness means:

Computational complexity studies intractable problems, those that require more computational resources than can be provided by the fastest computers on earth put together.

Ummm… OK.  Sanjeev Arora is one of the world’s experts in complexity.  He is a Gödel Prize winner and a Fellow of the ACM.  He even, most tellingly, has a Wikipedia entry.  I still don’t think this is the world’s best definition of intractable in the NP-hardness sense.  In particular, if a group put out 1000 groupings of financial instruments, and I needed to solve the densest subgraph problem on the resulting instance, I would work very hard at getting an integer program, constraint program, dynamic program, or other program to actually solve the instance (particularly if someone is willing to pay me millions to do so).  If the group then responded with 10,000 groupings, I would simply declare that they are tampering and invoke whatever level of adverse selection correction you like (including just refusing to have anything to do with them).  Intractable does not mean unsolvable, and not every size of instance needs more computing than “the fastest computers on earth put together”.

NP-hardness talks about worst case behavior as problem size grows.  Those of us in operations research spend a lot of time solving NP-hard problems like routing, timetabling, and many other problems because we really want solutions to instances of a particular size and with particular structure.  Bill Cook will solve practically any instance of the NP-hard Traveling Salesman Problem that you like (particularly if the financial incentives are right) as long as you keep it to no more than 100,000 cities or so.  I’d be happy to help a financial firm solve densest subgraph instances if it really mattered to them, NP-hardness be damned!
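
Concretely, here is the kind of integer program I would start from for the densest-k-subgraph version of the problem, written with the open-source PuLP modeler and its bundled CBC solver (my choice of tools, not anything from the paper); real instances would call for a stronger solver and some care, but the model itself is short.

```python
# A sketch of a densest-k-subgraph integer program in PuLP (toy data assumed).
import pulp

def densest_k_subgraph_ip(nodes, edges, k):
    prob = pulp.LpProblem("densest_k_subgraph", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", nodes, cat="Binary")              # pick node?
    y = pulp.LpVariable.dicts("y", range(len(edges)), cat="Binary")  # edge inside?
    prob += pulp.lpSum(y[i] for i in range(len(edges)))  # maximize internal edges
    for i, (u, v) in enumerate(edges):
        prob += y[i] <= x[u]            # an edge counts only if
        prob += y[i] <= x[v]            # both endpoints are picked
    prob += pulp.lpSum(x[v] for v in nodes) == k          # choose exactly k nodes
    prob.solve()                        # uses the CBC solver bundled with PuLP
    return [v for v in nodes if x[v].value() > 0.5]

nodes = ["p1", "p2", "p3", "a1", "a2", "a3", "a4"]
edges = [("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a2"),
         ("p3", "a3"), ("p3", "a4")]
print(densest_k_subgraph_ip(nodes, edges, 4))   # -> ['p1', 'p2', 'a1', 'a2']
```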

Of course, if this paper takes off, there might be real demand for operations researchers to look at NP-hard problems for the financial industry.  And that would be great for our field!

Thanks to my finance colleague Bryan Routledge for pointing out the paper, and for providing the excellent (IMHO) title to this entry.

Models, Information, and Market Rationality

I have come across a couple of items recently involving market rationality and the ability of the market to reflect “unknown” information.  The first came in a conversation with my colleague Bryan Routledge.  Harkening back to the Challenger disaster, Bryan mentioned that “the market” quickly determined the company that caused the failure (all this is my paraphrasing of my understanding of what Bryan said:  it is my fault if it is wrong!).  Here is the graph of the stock prices of the main companies involved in building the space shuttle on the day of the disaster:

[Figure: shuttle company stock prices on the day of the disaster]

There are four companies shown:  Morton Thiokol, Lockheed, Martin Marietta, and Rockwell.  The stock prices of all four companies immediately dropped 7-8% after the disaster.  Within an hour, three of them went back up to being just 2-3% down, while one company decreased further:  Morton Thiokol.  The company responsible for the O-ring (of Richard Feynman and ice water fame):  Morton Thiokol.  It is certainly provocative that the market seemed to know something immediately that took an investigation months to determine.  It would have been even more impressive if the market had identified this an hour earlier (the explosion happened at 11:39 AM), but the results from the day are still pretty impressive.

But, as Bryan reminds me, this was not exactly a mystery to everyone at the time:  the engineers involved strongly suspected early on what the issue was and later fed that information to Feynman.  So the information was out there, and perhaps it leaked out to the market in the immediate aftermath of the explosion.  So perhaps it is not so mysterious after all.  And there may well be other explanations for the larger drop-off by Morton Thiokol.

Continuing the theme of markets seeming to have information that mere mortals do not, Panos Ipeirotis, with tongue in cheek, suggested a prediction market on the P=NP question, arguing

So, if P=NP is a decidable problem, it is either true or false. So, a fully rational agent, participating in the market, should know whether P=NP. It is not a matter of probabilities! All the information to make the decision is available. So, if the market has one or more rational players, the market should converge to a price of 0 or 1 immediately, depending on the state of the problem. Right?

He then offers a few choices when the market, presumably, does not give such a result:

  • There are no rational agents. So, all the analysis of prediction markets that assume rationality of traders is incomplete.
  • There are rational agents. The market does not converge to 0 or 1 because the P=?NP problem is undecidable.
  • There are rational agents but the return from the risk-free rate until reaching the time to settlement exceeds the return from the market. So, the market gives information on how long it will take for the problem to be officially solved.
  • If your laptop cannot find the solution, neither can the market.

These are not mutually exclusive:  it could be more than one.  But I think it is pretty clear that there are no rational agents by this definition, and I think most or all economists will agree with that.  The concept of having a rational agent is no different than the assumption of linearity and divisibility and so on in linear programming.  A model cannot include everything (a map that includes everything would be as large as the area mapped), so simplifications are made.  Some economic models include limits on rationality, or on information, or on timing, while others will give their agents more power/information/etc. than could occur in practice.  There is normally a tradeoff:  with more detail on rationality may come limits in other areas, such as number of agents, or complexity of markets.  Given some set of assumptions, it is pretty easy to come up with a situation that exploits weaknesses in the assumptions. If that is the case, those are the wrong assumptions to make!

So I would agree with Panos: “all the analysis of prediction markets that assume rationality of traders is incomplete”.  That “incomplete” analysis might still be useful.  To paraphrase the statistician George Box: all models are incomplete;  some are useful.

This concept was certainly seen by Feynman in the Challenger case when he questioned whether some of the mathematical models used were useful.  From Feynman’s appendix to the Challenger Report:

When using a mathematical model careful attention must be given to uncertainties in the model.