Encoding Decision Trees
Decision trees are widely used to teach decision making with uncertain outcomes. Analytica does not directly support encoding and computing over decision trees -- it uses an influence diagram paradigm instead. But you can encode the logic of a decision tree within an influence diagram. Influence diagrams are often more efficient, and you can perform the same computations with an influence diagram that you might perform on a decision tree. This page describes how to transform a decision tree into an influence diagram representation. It then addresses more advanced topics related to dynamic programming and sequential decision making.
What Is a Decision Tree?
A decision tree is a branching depiction of a set of sequential decisions and chance outcomes, as in the following diagram:
Although the diagram shown was "drawn" using Analytica, this is not an Analytica model -- it is drawn only for illustration. Although the node shapes are the same as those in an influence diagram, the semantics of the nodes differ in the decision tree case. The green rectangles are decision nodes, the ovals are chance outcomes, and the hexagons at the far right are objective (utility) nodes -- the same shape distinctions used by Analytica's influence diagrams. But where an influence diagram would depict one decision node (e.g.,
D2), we see multiple instances of that decision in the decision tree -- one instance on each branch. In a sense, a decision tree expands the dimensionality out visually: the visual complexity increases multiplicatively with the number of decision and chance nodes, whereas an influence diagram depiction increases only additively as we add decisions and chance outcomes.
In the decision tree, there are two possible options for decision
D1. If we were to decide
D1 = 1, we'd take the top branch, otherwise if we were to decide
D1 = 2 we'd take the bottom branch. Suppose we take the top branch, then nature randomly selects one of two outcomes for
C1, leading either up or down the next branch. Through a set of sequential decisions and sequential chance outcomes, we eventually end up at a leaf, where a certain utility for that outcome is obtained. The probabilities at each chance outcome can vary with the decisions and chance outcomes that preceded it, and likewise, a decision maker can base his decision on the decisions and chance outcomes that preceded that decision (i.e., each preceding chance outcome is observable at the time the decision is made). Furthermore, the utility at each leaf can vary with all decisions and chance outcomes.
In general, decisions may have more than 2 options and chance nodes may have more than 2 possible outcomes. The number of arrows emanating to the right of a node is called its branching factor. The total number of leaf nodes is multiplicative in the branching factors of the nodes, and hence decision trees become infeasible in practice for all but the most simplistic examples. For this reason, they are often viewed primarily as a tool for teaching concepts of probability and sequential decision making under uncertainty. Building your models as influence diagrams, and learning to think in terms of influence diagrams, will generally lead to more scalable approaches to modeling.
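The multiplicative growth is easy to see numerically. A quick Python sketch (not Analytica; the branching factors here are hypothetical, not taken from the diagram):

```python
from math import prod

# Hypothetical branching factors for five sequential stages (D1, C1, D2, C2, D3),
# each with two branches.
branching = [2, 2, 2, 2, 2]

# The leaf count is the product of the branching factors along the path.
leaves = prod(branching)
print(leaves)  # 2*2*2*2*2 = 32 leaves for just five binary stages
```

Doubling the branching factor of any one stage doubles the leaf count, which is why even modest trees quickly become unwieldy.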
Solving a decision tree
Solving a decision tree involves computing, for each decision on each branch, what the optimal decision would be. The collective set of decisions (aka the policy) should be the one that maximizes expected utility. Notice in the diagram there are 4 instances of
D2 -- one on each of the four branches (contingent on
C1). A separate decision may be assigned to each of those four instances of
D2. In the diagram, there are 13 decision node instances, each having 2 possible decisions, so that the total number of possible policies is 2^13 = 8,192. If we added one more stage to the tree, this would increase to 2^29 = 536,870,912 possible policies.
One can assign a probability to every leaf node -- the probability of reaching that leaf node given that all preceding decisions are taken to guide you on the path to that leaf node. This probability is just the product of the probabilities of each outcome on the path from the root to the leaf node.
If you select a single policy, the decision tree can be collapsed to an event tree consisting only of chance nodes and utility nodes: all branches inconsistent with the chosen policy are pruned away. If you then multiply each utility by its probability and add them all up, you get the expected utility of following that policy. So one way to compute the optimal policy is brute force -- cycle through all policies, compute the expected utility of each, and keep the policy that scores the highest. This algorithm is, of course, horribly inefficient -- in fact, it is doubly exponential in complexity.
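The brute-force approach is easy to state in code. Here is a minimal Python sketch (not Analytica) for a hypothetical two-decision, two-chance-node tree with made-up probabilities and random leaf utilities:

```python
import itertools
import random

random.seed(0)

# A hypothetical toy tree: decision D1, chance C1, decision D2, chance C2,
# all binary, with a utility at each of the 16 leaves.
U = {k: random.uniform(0, 100)
     for k in itertools.product(range(2), repeat=4)}    # (d1, c1, d2, c2) -> utility
p_c1 = [0.3, 0.7]    # P(C1 outcome), kept the same on both D1 branches for brevity
p_c2 = [0.5, 0.5]    # P(C2 outcome), kept the same on every branch for brevity

def expected_utility(d1, d2_policy):
    """Expected utility of taking d1 and then d2_policy[(d1, c1)] after observing c1."""
    return sum(p_c1[c1] * p_c2[c2] * U[(d1, c1, d2_policy[(d1, c1)], c2)]
               for c1 in range(2) for c2 in range(2))

# Brute force: a policy is a D1 choice plus a D2 choice for each (d1, c1) branch,
# so there are 2 * 2**4 = 32 policies to score.
branches = list(itertools.product(range(2), repeat=2))  # the four (d1, c1) pairs
policies = [(d1, dict(zip(branches, d2s)))
            for d1 in range(2)
            for d2s in itertools.product(range(2), repeat=4)]
best = max(policies, key=lambda pol: expected_utility(*pol))
print(expected_utility(*best))
```

Even in this tiny example the policy space already has 32 entries; each added decision stage multiplies it further, which is what makes brute force doubly exponential.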
One can do much better, with a singly-exponential algorithm (singly exponential because the decision tree is already multiplicative in size relative to the number of decision and chance variables). The method starts at the right of the diagram and backs up the utilities one layer at a time, determining first the decisions at
D3, then the decisions at
D2, and lastly the decisions at
D1. For each
D3 node, you select the decision leading to the greater utility. That becomes the utility on the arrow leading into that decision. Using those, at each
C2 node you compute the expected utility based on the probabilities of outcomes at that
C2 instance, which becomes the utility on the arrow leading into each
C2. The arrows thus get labelled working your way back until the optimal decision at
D1 is finally determined to complete the computation.
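This backup procedure is only a few lines of code. A Python sketch (not Analytica) on the same kind of hypothetical binary tree; it computes the optimal expected utility, and extracting the decisions themselves is a small extension:

```python
import itertools
import random

random.seed(0)

# Hypothetical toy shape: D1 -> C1 -> D2 -> C2 -> utility, all binary,
# with made-up probabilities and random leaf utilities.
U = {k: random.uniform(0, 100) for k in itertools.product(range(2), repeat=4)}
p_c1 = [0.3, 0.7]
p_c2 = [0.5, 0.5]

def backup():
    """Back up utilities from the leaves: max at decisions, expectation at chance nodes."""
    eu_d1 = []
    for d1 in range(2):
        eu = 0.0
        for c1 in range(2):
            # Optimal D2 given everything observed so far (d1 and c1).
            best_d2 = max(sum(p_c2[c2] * U[(d1, c1, d2, c2)] for c2 in range(2))
                          for d2 in range(2))
            eu += p_c1[c1] * best_d2
        eu_d1.append(eu)
    return max(eu_d1)   # optimal expected utility, achieved by the best D1

print(backup())
```

Each node is visited once, so the work is proportional to the size of the tree rather than to the number of policies.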
Encoding a Decision Tree in Analytica
Let us now consider how to encode a decision tree, such as the one shown above, as an influence diagram. For this first take, we'll assume that probabilities are going to be encoded and used explicitly without any Monte Carlo simulation. (Using Monte Carlo to handle chance nodes is an alternative way to proceed, which will be considered later).
In the influence diagram approach, we are going to consider each stage to be a single variable of our domain. So there are only three decision variables, two chance variables, and one utility variable in our example. Thus the structural complexity of the influence diagram does not reflect the combinatoric complexity of the representation as it does in a decision tree; the combinatoric complexity is instead captured through the dimensionality of the variables involved.
For decision variables, there is an important distinction to make between the possible options and the optimal decision taken. I prefer using index nodes (parallelograms) for the possible options and decision nodes (rectangles) for the optimal decisions, but others prefer using decision nodes for the possible options since it is more similar to the original decision tree depiction. In either case, you will want to split these into separate Analytica nodes. For the chance variables, we also have a distinction between the possible outcomes and the probability of each outcome; however, it is possible to cover both of these using a Self-Indexed Table.
Our initial influence diagram encoding, which captures the information in the decision tree model but does not yet include the backup computation of the optimal policy, looks like this:
There is an index node for each decision, containing the possible options. Each chance node is a self-indexed table, with all preceding nodes as indexes as well as a self-index. The self-index values label the possible outcomes, and the cells of the table contain the outcome probabilities. The objective,
U, is a function of all preceding variables. In most models, this will be computed from the actual values using some sort of expression, although in a brute-force encoding a 5-dimensional edit table could be used.
A policy is one combination of I_D1, I_D2 and I_D3.
Computing the Optimal Policy
To perform the backup-policy computation, we add additional nodes to our influence diagram.
To compute the optimal decision policy at D3, define:
Decision D_3 := ArgMax(U, I_D3)
The result is a function of decisions made for
I_D1, I_D2, and a function of the observed outcomes for C1 and C2, and hence is 4-dimensional. This reflects the fact that the decision maker will be able to see all previous decisions and all previous chance outcomes when it comes time to take the decision. At that time, he can simply find his scenario in this 4-D table, which tells him what the optimal action is.
We next back up the utility to obtain the expected utility of each branch reached by
D2. That backed-up utility will then allow us to pick the optimal decision policy for
D2. Since we now know what the optimal decision at
D3 is, the utility on the arrows going into
D3 is given by
U[I_D3 = D_3]. Taking the expectation over
C2's outcomes, we obtain:
Variable U2 := Sum(U[I_D3 = D_3]*C2, C2)
And with this, we can select the optimal decision policy for D2:
Decision D_2 := ArgMax(U2, I_D2)
Finally, for the first decision:
Variable U1 := Sum(U2[I_D2 = D_2] * C1, C1)
Decision D_1 := ArgMax(U1, I_D1)
With these we've computed the optimal decision policy for this decision tree model. One final item that may be of interest is
Variable Leaf_probability := C1*C2
which computes the probability of arriving at each leaf for each possible policy. This is indexed by
I_D1, I_D2 (identifying a single policy, less
I_D3, which doesn't impact the probability), as well as by C1 and C2 (which split the result into the leaves reachable from a given policy). With nodes to compute the optimal policy, and the backed-up utilities, our decision model is now:
You can download this model from media:Decision tree translation.ana.
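For concreteness, the whole backup can be mirrored with plain multidimensional arrays. A Python/numpy sketch (not Analytica syntax; the sizes and probabilities are hypothetical, and the probabilities are held constant across branches for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two options per decision, two outcomes per chance variable.
# U is the 5-D utility array, indexed (I_D1, C1, I_D2, C2, I_D3).
U = rng.uniform(0, 100, size=(2, 2, 2, 2, 2))
pC1 = np.array([0.3, 0.7])            # P(C1), kept independent of I_D1 here
pC2 = np.array([0.6, 0.4])            # P(C2), kept independent of history here

# D_3 := ArgMax(U, I_D3): the optimal last decision for every scenario.
D3 = U.argmax(axis=4)                 # indexed by (I_D1, C1, I_D2, C2)
U_at_D3 = U.max(axis=4)               # U[I_D3 = D_3]

# U2 := Sum(U[I_D3 = D_3] * C2, C2): expectation over C2's outcomes.
U2 = (U_at_D3 * pC2).sum(axis=3)      # indexed by (I_D1, C1, I_D2)

# D_2 := ArgMax(U2, I_D2)
D2 = U2.argmax(axis=2)
U2_at_D2 = U2.max(axis=2)             # indexed by (I_D1, C1)

# U1 := Sum(U2[I_D2 = D_2] * C1, C1), then D_1 := ArgMax(U1, I_D1)
U1 = (U2_at_D2 * pC1).sum(axis=1)
D1 = int(U1.argmax())

# Leaf_probability := C1 * C2 (independent of the policy here by construction)
leaf_probability = np.outer(pC1, pC2)
print(D1, U1[D1])
```

Each numpy reduction over an axis plays the role of an Analytica array operation over the corresponding index, which is why the influence-diagram encoding stays compact while the arrays carry the combinatorics.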
On this page we have considered the direct translation of a decision tree into an Analytica influence diagram. From this, you should be able to grasp the basic concepts for how to map a decision tree into an influence diagram representation and how to compute the result. It should also be evident that the full representational power of the decision tree formalism can be harnessed from within an influence diagram model (the inverse is not the case, of course, since the influence diagram formalism is strictly more general), and all computations performed in the decision tree formalism remain available from within an influence diagram formalism.
Due largely to the inherent combinatorics of a decision-tree inspired model, there are many extensions to consider. I will mention some of these extensions here, which you may be interested in exploring yourself as exercises.
Selective Parametric Restriction to a Sub-Tree
Earlier I stated that I prefer to use index nodes to encode the possible decision options. However, one variation is to change these to decision nodes defined as Choice pulldowns. In this configuration, the domain of each variable contains the possible options, and you can restrict the nodes that follow to a subset of the full tree.
As an exercise, convert the model from above to this configuration. You'll need to change the definitions of
D_1, D_2 and
D_3 slightly to ensure that they only consider options that are selected. In other words, if you've restricted the model to the subtree in which
I_D3 = 2, then the optimal decision
D_3 should be 2 (i.e., no ArgMax is necessary in that case). But if
I_D2 = All, then
D_2 would need to compute the ArgMax.
This technique provides one way to deal with very large dimensionalities. In your full decision-tree model, it may be infeasible to analyze the full tree, but you might still be able to analyze selected sub-trees one at a time, and then easily explore alternative sub-trees. Note that this technique helps to cut down the excessive dimensionality arising from multiple decisions, but does not help with the multiplicity of chance outcomes.
Monte-Carlo treatment of Chance Variables
To cut down on the combinatoric dimensionality of multiple chance variables, we can encode a decision tree in an alternative fashion, using Monte Carlo to sample the chance variables. This removes all chance variable dimensions from downstream variables, but adds one more dimension to the result -- the Run dimension (for the Monte Carlo simulation). If the number of chance outcome combinations is much larger than the Monte Carlo sample size, this can result in a dramatic reduction of dimensionality. A Monte Carlo formulation also allows you to utilize continuous chance variables, which is not possible in a straight decision-tree formulation.
As an exercise, reform the model to use Monte Carlo rather than analytic probabilities for chance variables. You should probably come up with a utility that is expressed as an expression (of
I_D1, I_D2, I_D3, C1 and
C2) rather than as an explicit table, since an explicit table would carry through the dimensionality of
C1 and C2, which is what we are trying to avoid here. Initially, it helps to alter our assumption about observability slightly -- assume instead that the decision maker cannot observe the preceding chance outcomes at the time he makes each decision. This relaxation removes the need to include
C1 and C2 as dimensions of the computed policy
D_3, for example.
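To give a flavor of the Monte Carlo variant under that simplified no-observation assumption, here is a hedged Python sketch (not Analytica; the utility expression, distributions, and sample size are all invented for illustration):

```python
import random

random.seed(1)

N = 20000   # Monte Carlo sample size -- this becomes the Run dimension
# Hypothetical continuous chance variables; a pure decision tree cannot branch
# on a continuum, but a Monte Carlo formulation handles it naturally.
c1 = [random.gauss(0, 1) for _ in range(N)]
c2 = [random.gauss(0, 1) for _ in range(N)]

def utility(d1, d2, d3, x1, x2):
    # Made-up utility expression of the decisions and chance outcomes.
    return -(x1 - d1) ** 2 - (x2 - d2) ** 2 + 0.1 * d3

def eu(d1, d2, d3):
    # Estimate expected utility by averaging over the Monte Carlo sample.
    return sum(utility(d1, d2, d3, x1, x2) for x1, x2 in zip(c1, c2)) / N

# With chance outcomes unobservable at decision time, a policy is just one
# (d1, d2, d3) triple, so the chance dimensions never multiply together.
options = [0, 1]
best = max(((d1, d2, d3) for d1 in options for d2 in options for d3 in options),
           key=lambda p: eu(*p))
print(best)
```

Notice that the result arrays are indexed only by the decision options and the sample, regardless of how many chance variables the model contains.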
In the example worked out on this page, we had 3 sequential decision stages. If we were to expand our model to 4 sequential stages, we would have to surgically alter the influence diagram, adding several nodes and making adjustments to the nodes in the final stage.
In some cases, the chance outcome probabilities may be a function of only the preceding stage's variables, and hence can be expressed in a general way that applies at any stage, even if we expand the model out to 100 time steps (which, of course, would become entirely infeasible to solve, but perhaps not to represent). Also, if the utility function is expressed as a general expression, then we can conceptualize a Dynamic model where only a single stage is depicted on the diagram, with an influence loop indicating that each stage is a function of the previous stage.
This type of representation does not eliminate the inherent combinatorics of decision trees -- our arrays will continue to gain dimensionality as we proceed to higher time points -- but it can result in a model where you can simply specify the number of sequential stages, T, without having to surgically alter your model.
To implement a dynamically-structured decision tree model like this, you must represent the possible options for the decision at Time = t using a local index. In place of
I_D1, I_D2, and
I_D3, we'll create local indexes for all decisions using an expression such as the definition for Variable Options:
For t := Time do \Index I := [1, 2]
The index for
D_t is then referenced using:
(#Options[Time = t]).I. The LocalAlias declaration can be used to extract an index referenced in this fashion and then use it in an index context within an expression.
Dynamic Programming
The Dynamic Programming paradigm introduces a concept of state at each stage. Think of it this way -- at each arrow in the decision tree, the world is in a given "state". Taking a decision transitions to a new "state". Each arrow on the diagram corresponds to some world state.
A key reduction in complexity can be obtained if the number of possible states is bounded. Each time you add one additional stage to your decision tree, the number of arrows increases multiplicatively. But if these map to a finite set of states, then the number of states at each stage remains constant. A state description must be "sufficient" (or "Markov") -- meaning that knowing the state label on any given input arrow to a decision or chance node must give you enough information to predict the state labels on the output arrows of that node. With a state representation, chance probabilities depend only on the incoming state, and optimal decisions depend only on the incoming state, so dimensionality is constrained even as the number of sequential decision stages increases.
Dynamic programming is a distinct paradigm from the decision-tree paradigm, providing an alternative way to attack sequential stochastic decision problems. They share the similarity that computing an optimal policy is carried out by starting at the right and working backwards, propagating the utility backwards at each time step. Dynamic programming differs from decision-trees in that the number of states at each stage is bounded, while in decision trees you can view the number of states as multiplying without bound with each stage.
As an exercise, try transforming the example into a dynamic programming formulation (you won't use the same numbers as we did here). Start by assuming there are 5 distinct world states, and label the arrows in the original decision tree with state assignments, in such a way that the outcome states of every node are always determined by the input state. (For example, if two different
C2 instances both have their input state labelled as
state = 1, then their output states need to agree). Represent the model in influence diagram form, and then implement a utility backup to compute the optimal decision policy at each stage as a function only of state. (This is referred to as the value-iteration algorithm).
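A minimal value-iteration sketch in Python (the state count, transition probabilities, and rewards are all hypothetical; the point is that the arrays stay state-by-action sized no matter how many stages you back up):

```python
import random

random.seed(0)

S = 5     # hypothetical bounded number of world states per stage
A = 2     # options per decision
T = 100   # number of sequential stages -- feasible now that state is bounded

# Hypothetical Markov transition probabilities P[s][a][s2] = P(s2 | s, a)
# and per-stage rewards r[s][a], filled with random numbers for illustration.
P = [[[random.random() for _ in range(S)] for _ in range(A)] for _ in range(S)]
for s in range(S):
    for a in range(A):
        z = sum(P[s][a])
        P[s][a] = [p / z for p in P[s][a]]
r = [[random.uniform(0, 1) for _ in range(A)] for _ in range(S)]

# Finite-horizon value iteration: back the utility up one stage at a time,
# exactly as in the decision-tree backup, but over states instead of branches.
V = [0.0] * S
for t in range(T):
    V = [max(r[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(S))
             for a in range(A))
         for s in range(S)]

# The optimal policy at the first stage, as a function only of state.
policy = [max(range(A),
              key=lambda a: r[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(S)))
          for s in range(S)]
print(policy)
```

The backup cost per stage is fixed at S x A x S operations, so adding stages increases the work only linearly -- in contrast to the multiplicative growth of the raw decision tree.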
Approximate Dynamic Programming
Sometimes a dynamic programming formulation can be identified, but the number of states is extremely large or continuous. Like exact dynamic programming, Approximate Dynamic Programming (ADP) computes policies using the value-iteration algorithm. The difference is that it approximates the backed-up utility at each stage using some sort of parametric "fit" to samples taken from the utility.
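To make the idea concrete, here is a toy Python sketch of one ADP backup step: sample states, compute the backed-up utility at each sample, and fit a parametric form to those samples (everything here -- the reward, the zero terminal value, and the cubic fit -- is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x, a):
    # Made-up one-step reward for action a in {0, 1} at continuous state x;
    # the terminal value beyond this stage is taken to be zero.
    return -(x - a) ** 2

# Sample states instead of enumerating them (the state space is continuous).
xs = rng.uniform(-1, 2, size=200)

# Exact backup at each sampled state: the best action's value.
backed_up = np.array([max(reward(x, a) for a in (0, 1)) for x in xs])

# Parametric "fit" to the sampled backed-up utility -- here a cubic polynomial.
# In a multi-stage problem this fitted V would feed the next backup step.
V_approx = np.poly1d(np.polyfit(xs, backed_up, deg=3))
print(V_approx(0.5))
```

The fitted function stands in for a table that would otherwise need one entry per state, which is exactly how ADP sidesteps the dimensionality curse -- at the cost of approximation error from the chosen parametric family.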
ADP can overcome extremely formidable dimensionality "curses", and thus can find applicability to highly complex real-world sequential stochastic decision-making problems; however, it is almost always extremely challenging to apply. It requires expertise in advanced techniques from Monte Carlo sampling, function optimization, function fitting and approximation, and, of course, dynamic programming. We hope to explore this very interesting area further with follow-on articles and/or webinars in the future.