A more interesting use of uncertainty occurs when the state that results from a decision is uncertain. For example, consider the following coin-tossing game: a coin will be tossed 4 times. Before each toss, you can wager $0, $1, or $2 (provided you have sufficient funds). You begin with $1, and your objective is to maximize the probability that you have $5 at the end of the coin tosses.
We can formulate this as a dynamic program as follows: create a stage
for the decision point before each flip of the coin, and a ``final''
stage, representing the result of the final coin flip. There is a
state in each stage for each possible amount of money you can have.
For stage 1, the only state is ``1''; for each of the others, the
states are ``0,1,2,3,4,5'' (of course, some of these states are not
reachable, but there is no sense in worrying too much about that).
Now, if we are in stage $i$ with $x$ dollars and bet $k$, then with
probability .5 we will have $x-k$ dollars next period, and with
probability .5 we will have $x+k$ dollars. Let $f_i(x)$ be the
probability of ending up with at least $5 given we have $x before the
$i$th coin flip. This gives us the following recursion:

\[ f_5(x) = \begin{cases} 1 & \text{if } x \geq 5,\\ 0 & \text{otherwise,} \end{cases} \]

\[ f_i(x) = \max_{k \in \{0,1,2\},\; k \leq x} \left\{ .5\,f_{i+1}(x+k) + .5\,f_{i+1}(x-k) \right\}, \qquad i = 1,2,3,4. \]
Note that the next state is not known for certain, but is a
probabilistic mixing of states. We can still easily determine $f_4$
from $f_5$, $f_3$ from $f_4$, and so on back to $f_1$. The answer we
want is $f_1(1)$, the best probability of reaching $5 starting with
$1 before the first flip.
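To make the backward recursion concrete, here is a minimal sketch in
Python (the function name and the capping of the state at 5 are my own
choices, not part of the formulation above):

\begin{verbatim}
# Stochastic DP for the coin-tossing game: 4 flips, start with $1,
# bet $0, $1, or $2 per flip (never more than you have); each bet is
# won or lost with probability .5; goal: hold at least $5 at the end.

def coin_game():
    # f[x] holds f_{i+1}(x) while stage i is being computed.
    # Stage 5 (after the last flip): success iff we hold at least $5.
    f = [1.0 if x >= 5 else 0.0 for x in range(6)]
    for i in range(4, 0, -1):              # stages 4, 3, 2, 1
        g = []
        for x in range(6):
            # Holding more than $5 is as good as holding $5 (just bet
            # $0 from then on), so winnings are capped at state 5.
            g.append(max(0.5 * f[min(x + k, 5)] + 0.5 * f[x - k]
                         for k in (0, 1, 2) if k <= x))
        f = g
    return f

print(coin_game()[1])   # f_1(1); prints 0.1875
\end{verbatim}

Each pass through the outer loop turns the stage-$(i+1)$ values into
the stage-$i$ values, exactly as the recursion prescribes.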
Another example comes from the pricing of stock options. Suppose we have the option to buy Netscape stock at $150. We can exercise this option at any time in the next 10 days (an American option, rather than a European option, which could only be exercised 10 days from now). The current price of Netscape is $140. We have a model of Netscape stock movement that predicts the following: on each day, the stock will go up by $2 with probability .4, stay the same with probability .1, and go down by $2 with probability .5. Note that the overall trend is downward (probably counterfactual, of course). The value of the option if we exercise it at price $x$ is $x-150$ (we will only exercise at prices above $150).
We can formulate this as a stochastic dynamic program as follows: we
will have a stage $i$ for each day $i$, just before the
exercise-or-keep decision. The state in each stage will be the stock
price of Netscape on that day. Let $f_i(x)$ be the expected value of
the option on day $i$ given that the stock price is $x$. Then the
optimal decision is given by

\[ f_{10}(x) = \max\{x - 150,\ 0\} \]

and

\[ f_i(x) = \max\left\{\, x - 150,\ \ .4\,f_{i+1}(x+2) + .1\,f_{i+1}(x) + .5\,f_{i+1}(x-2) \,\right\}, \qquad i < 10. \]

The first term in the outer maximization is the value of exercising
now; the second is the expected value of keeping the option one more
day.
Given the size of this problem, it is clear that we should use a spreadsheet to do the calculations.
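A spreadsheet works well; as an alternative, here is a short Python
sketch of the same backward recursion (treating today as day 0 with a
price of $140; the names are invented for illustration):

\begin{verbatim}
# Stochastic DP for the American option: strike $150, today's price
# $140, exercisable today (day 0) through day 10; daily move: up $2
# w.p. .4, unchanged w.p. .1, down $2 w.p. .5.

STRIKE, DAYS = 150, 10
UP, SAME, DOWN = 0.4, 0.1, 0.5

def option_value(today=140):
    # Price grid: every price reachable within DAYS moves of $2.
    # Prices off the grid edge are unreachable from today's price, so
    # the 0.0 defaults below never affect the answer.
    prices = range(today - 2 * DAYS, today + 2 * DAYS + 1, 2)
    # Day 10: exercise if in the money; otherwise the option expires
    # worthless.
    f = {x: max(x - STRIKE, 0.0) for x in prices}
    for i in range(DAYS - 1, -1, -1):      # days 9, 8, ..., 0
        g = {}
        for x in prices:
            keep = (UP * f.get(x + 2, 0.0) + SAME * f[x]
                    + DOWN * f.get(x - 2, 0.0))
            g[x] = max(x - STRIKE, keep)   # exercise now vs. keep
        f = g
    return f[today]

print(option_value())   # expected value of the option today
\end{verbatim}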
There is one major difference between stochastic dynamic programs and deterministic dynamic programs: in a deterministic dynamic program, the complete decision path is known in advance. In a stochastic dynamic program, the actual decision path will depend on how the random aspects play out. Because of this, ``solving'' a stochastic dynamic program means giving a decision rule for every possible state, not just along a single optimal path.
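For instance, in the coin-tossing game the solution is not one
sequence of bets but a table of bets, one for every stage and
bankroll. A sketch (again with invented names) that records a
maximizing bet $k$ for each state:

\begin{verbatim}
# Decision rule for the coin-tossing game: for every stage i and
# bankroll x, record a bet k attaining the maximum in the recursion.

def coin_policy():
    f = [1.0 if x >= 5 else 0.0 for x in range(6)]
    rule = {}
    for i in range(4, 0, -1):
        g = []
        for x in range(6):
            bets = {k: 0.5 * f[min(x + k, 5)] + 0.5 * f[x - k]
                    for k in (0, 1, 2) if k <= x}
            k = max(bets, key=bets.get)    # ties broken arbitrarily
            rule[i, x] = k
            g.append(bets[k])
        f = g
    return rule

for (i, x), k in sorted(coin_policy().items()):
    print(f"before flip {i} with ${x}: bet ${k}")
\end{verbatim}

Whatever bankroll the coin flips actually leave us with, the table
tells us what to bet next; that table, not any single path through it,
is the solution.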