# Hey Buddy, Can I Sell You an NP Hard Problem?

In keeping with issues of economics and computational power, there is a very neat paper out of Princeton by Arora, Barak, Brunnermeier, and Ge entitled “Computational Complexity and Information Asymmetry in Financial Products“.  Can you embed an NP-hard problem into the pricing problem for a financial instrument?  As the authors point out, the answer is clearly “yes” and the issue is really how naturally you can do it.  For instance, I could have an instrument that pays \$1000 if the answer to a particular, large traveling salesman problem is less than 1 million miles, and \$0 otherwise.  Pricing such an instrument involves solving an NP-complete problem, but no one would argue that this implies anything about real financial instruments.  The authors give another, similarly artificial, example:

Consider for example a derivative whose contract contains a 1000 digit integer n and has a nonzero payoff if the unemployment rate next January, when rounded to the nearest integer, is the last digit of a factor of n. A relatively unsophisticated seller can generate such a derivative together with a fairly accurate estimate of its yield (to the extent that unemployment rate is predictable), yet even Goldman Sachs would have no idea what to pay for it. This example shows both the difficulty of pricing arbitrary derivatives and the possible increase in asymmetry of information via
derivatives.

Nobody is actually interested in embedding an NP-Hard problem in a financial instrument in that way.  If the pricing is not clear, and there is obvious information asymmetry, buyers will simply assume they are getting ripped off and walk away (see adverse selection, below).

This paper does something much more interesting.  It develops a system where the firm offering financial assets seems to divide multiple assets up fairly but actually does so in a biased way.  In order to distinguish this tampering from non-tampering, an outside analyst has to solve a densest subgraph problem (an NP-hard problem).

In finance and economics, there is a issue called adverse selection.  Why am I hesitant to buy a used car?  People will only sell their lemons.  Why should we be cautious in hiring faculty who work at other schools?  The other school knows more about them, and will choose not to compete if they want to get rid of them.  Why should I be cautious when a bank chooses 10 mortgages to resell out of their portfolio of 10,000?  They know where the lemons are and will choose to dump them given the chance.

What the authors are saying is that even when a company sells all of their assets in groups in an apparently random way, it is possible to hide manipulation.  So adverse selection can occur in a way that is hard to determine.  Maybe the buyers will assume no adverse selection and we get to dump our lemons!  You can read the paper (or this blog posting from Andrew Appel) for some more information.

I have somewhat mixed feelings about this result (though I have only lived with it for a few hours:  this may take more time to settle my views).  On one hand, I really think that people in economics and finance need to be more aware of computational limits in their models.  On the other hand, it is not clear that NP-hardness is a particularly good defense here.  The authors have a somewhat breathless gloss on what NP-hardness means:

Computational complexity studies intractable problems, those that require more computational resources than can be provided by the fastest computers on earth put together.

Ummm… OK.  Sanjeev Arora is one of the world’s experts in complexity.  He is a Godel Prize Winner and a Fellow of the ACM.  He even, most tellingly, has a wikipedia entry.  I still don’t think this is the world’s best definition of intractable in the NP-hardness sense.  In particular, if a group put out 1000 groupings of financial instruments, and I needed to solve the densest subgraph problem on the resulting instance, I would work very hard at getting an integer program, constraint program, dynamic program, or other program to actually solve the instance (particularly if someone is willing to pay me millions to do so).  If the group then responded with 10,000 groupings, I would then simply declare that they are tampering and invoke whatever level of adverse selection correction you like (including just refusing to have anything to do with them).  Intractable does not mean unsolvable, and not every size instance needs more computing than “the fastest computers on earth put together”.

NP-hardness talks about worst case behavior as problem size grows.  Those of us in operations research spend a lot of time solving NP-hard problems like routing, timetabling, and many other problems because we really want solutions to instances of a particular size and with particular structure.  Bill Cook will solve practically any instance of the NP-hard Traveling Salesman Problem that you like (particularly if the financial incentives are right) as long as you keep it to no more than 100,000 cities or so.  I’d be happy to help a financial firm solve densest subgraph instances if it really mattered to them, NP-hardness be damned!

Of course, if this paper takes off, there might be real demand for operations researchers in looking at NP-Hard problems for the financial industry.  And that would be great for our field!

Thanks to my finance colleague Bryan Routledge for pointing out the paper, and for providing the excellent (IMHO) title to this entry.

## 15 thoughts on “Hey Buddy, Can I Sell You an NP Hard Problem?”

1. David says:

Good points. Also, although solving a given problem may be impractical, finding a good bound may be easier and sufficient for the purpose of evaluating the instrument.

2. Will says:

Your example is not NP complete. The traveling salesman problem is polynomial time for checking any individual solution (in your example, the individual solution is 1 million miles).

3. Michael Trick says:

Will, you miss the point. If the seller doesn’t give the tour, then the buyer is faced with an NP-complete problem to do the valuation. The seller knows the value (by knowing whether there is a tour or not, though there is a co-NP issue here that I will gloss over) but the buyer does not.

4. The factorization example is interesting from a number-theory point of view; I may not be able to factor a 100-digit number but I can say things about the probability that it has a factor with a given last digit, so I would be able to put some sort of value on it. But I suspect the authors were not thinking in terms of probabilistic number theory.

5. Do you think firms would try to package their derivatives in order to prevent adverse selection? We can imagine constructing a problem that has lots of local optima near the true solution, where the optimum price is arbitrarily worse than any nearby local optima. A player with an approximate solution would believe they had a near-optimal approximate price and agree to trade.

We can even consider a market where all players are trading pricing-intractable derivatives instead of ordinary ones. It seems like there would be incentive to participate if players believed their approximation algorithm or embedding scheme could beat the next firm’s.

6. The hidden subset problem is not just any NP-hard problem, though. As the factorization example hints, the authors know that in practice some problems are NP-harder than others.

If anyone wants to “try their hand” on these problems, I’ve put some instance generation code and edge lists on healthyalgorithms.

7. Ivan says:

We already have a very NP-hard problem out there – it’s called the interest rate. Add a hefty dollop of leverage and you can feed a whole island of bankers. Oh, wait…

Jokes aside, I don’t think pricing is ever the real problem, if you have 100% complete snapshot of the universe – it’s usually the discontinuities in value/first/second derivative, combined with fuzzy knowledge of most inputs that make all the difference

8. Matthew Saltzman says:

Just a point of information: The factorization problem is not believed to be NP-complete. There are sub-exponential factorization algorithms and the problem is in both NP and Co-NP. Neither property holds for any problem known to be in NPC. Factorization is thought to be not in NPC or Co-NPC or P. The belief that it is not in P (not the belief that it is NP-hard) is what drives its use in security applications.

The point about different degrees of NP-hardness is well taken, though. For example, Knapsack is pseudopolynomial (that is, polynomial in the magnitudes of coefficients rather than their length) and admits a fully polynomial-time approximation scheme (that is, an algorithm that can get within a factor of epsilon of the optimum in time that is polynomial in input length for any fixed epsilon). Knapsack is said to be weakly NP-complete.

TSP is strongly NP-complete, and provably has neither of these properties.

All of the above, of course, holds unless P=NP.

9. Sanjeev Arora says:

Mike:

It is probably true that if you really really set your mind to it, you could solve densest subgraph on moderate size graphs. In this sense NP-hardness results are merely a first cut.

But that is not what current CDO pricing algorithms (or algorithms used by rating agencies) are doing. So they would definitely not find anything wrong with the CDOs we describe. Note also that finding the dense subgraph requires having the entire graph in one place, which also may not be possible to piece together given the lack of transparency in the market.

If our paper leads to more computational thinking in design and pricing of derivatives, we would be happy enough!

10. Will says:

Perhaps I don’t entirely understand. I had thought that checking any individual solution of the TSP was concretely in polynomial time. Thus whether or not the TSP has a solution less than a million miles should be a polynomial time problem to solve. If you can solve the problem quickly, then you can easily price the instrument. What am I missing?

11. Michael Trick says:

Will: Checking if a given solution satisfies the TSP constraints (is a cycle) and has value less than 1 million is in P. But if I don’t give you that solution, you are no closer in knowing whether there is a solution less than 1 million. Consider: I give you a distance matrix on, say, 1000 points and ask you: is there a tour of value less than 1 million? What would you do to answer this definitively?

12. Matthew Saltzman says:

In fact, this is one characterization of the difference between P and NP. If checking a proposed solution is polynomial, then the problem is in NP. If finding a solution is polynomial, the problem is in P.

Obviously, if finding a solution is polynomial, so is checking a proposed solution. But the converse is true if and only if P = NP.

13. Joey Auto says:

“…buyers will simply assume they are getting ripped off and walk away.”
Exactly. Beside all the math in this subject there is also some sort of psychological component -> How to communicate complex subjects in an easy-to-understand way.