Writing Array-Abstractable Definitions
This article was written April 2003 by Lonnie Chrisman, Ph.D., of Lumina Decision Systems, updated Nov 2005, and originally in a PDF format. It was transferred to the Analytica Wiki Feb 2010.
The Benefits of Array Abstraction
One of the most powerful and useful features of Analytica is Intelligent Array Abstraction™. When we give the variable Profit the definition
Revenue-Expenses
Analytica automatically carries through any dimensions of Revenue and Expenses, seamlessly making the correspondence if the same dimension occurs in each. If the Revenue and Expenses variables are broken down by (i.e., indexed by) department and project, the resulting Profit will also be broken down across these two dimensions. A conventional programming language would force the modeler to explicitly loop over these dimensions when computing Profit, and a spreadsheet would require the modeler to explicitly copy the same formula to every cell in the two-dimensional grid. This manual overhead imposed on the modeler in those environments is distracting, and fundamentally unnecessary, since the relationship between Profit, Revenue and Expenses is entirely separate from whatever the dimensions happen to be in a particular model. This principle of separating the fundamental relationships between quantities (variables) and their dimensionality is what we call (Intelligent) Array Abstraction. (Intelligent Array Abstraction is a trademarked name of Lumina Decision Systems.)
Once one has mastered the basics of array abstraction, it is hard to imagine going back to a more primitive modeling environment (such as a spreadsheet or standard programming language) without such a capability for any non-trivial modeling application. Array abstraction provides a flexibility that goes hand-in-hand with the modeling process. When we begin prototyping a model, the optimum dimensions to include are seldom clear. The modeling process usually involves the elucidation of relevant variables, specification of fundamental relationships, and experimentation with tradeoffs between richness of knowledge to build off of, computational complexity, and levels of detail. The determination of the appropriate dimensionality for key variables is a task that usually falls most naturally after elucidation of relationships between variables. Array abstraction allows the modeler to do just this – once an abstractable model is in place, dimensions can be added and removed with little effort. This can be contrasted with spreadsheets and conventional programming languages where one must commit to the dimensionality of variables early in the modeling process, and because every computational step involves explicit defined iteration over the specified dimensions, altering the dimensionality of variables tends to be extremely tedious.
Array abstraction also promotes correctness. Opportunities to make errors when copying cell references in a spreadsheet abound, are quite common in practice and are hard to verify; likewise, it is quite easy to make mistakes when writing explicit looping constructs in conventional programming languages, such as accidentally confusing the correspondence between dimensions. Separated from the clutter of all these looping constructs, the clarity of variable definitions in Analytica is far more apparent. Furthermore, array abstraction leaves one with a sense of confidence that solid portions of the model are and will remain correct, regardless of other modifications or dimensional adjustments that may be made outside of that sub-model. In Analytica, array abstraction enables a number of other important capabilities. A major one of those is the uncertainty analysis that is built into Analytica. Uncertainty analysis in Analytica operates by adding a dimension to the uncertain variables in your model, named the Run dimension. Since your (presumably abstractable) model describes the relationships between variables, what is true without that dimension should also be true with that dimension, and thus Analytica can seamlessly propagate that dimension through the computations, even though as a modeler, you might not have conceptualized your variables up front as being dimensioned by Run.
Another important capability arising from array abstraction is What If analysis. Adding new dimensions to key input variables enables one to simultaneously explore and compare multiple scenarios, without having the logic of your model. “What-if” analyses are also often the bases decision making by exploring spaces of possible decisions.
Non-abstractable Definitions
Most Analytica expressions (relationships) that one would write in a variable’s definition field will be “abstractable”, that is, it will be possible to arbitrarily add and adjust dimensions to input variables without altering the correctness of the result. Analytica will propagate any of these dimensions through the variables without any extra thought on the part of the modeler. However, there are some notable exceptions, where it is possible to write a definition that is not abstractable. When such a definition is written, the flexibility and benefits of array abstraction may be lost. A key area of the more advanced Analytica modeling skill set is being able to recognize such cases and to know how to “correctly” express the desired relationships in alternative, abstractable ways.
Expressions that are limited in their ability to be array abstracted often fall into one of three groups: Expressions that introduce new dimensions, dimension-reducing expressions that that don’t name the relevant dimension, and convergence/iterative/recursive algorithms where the total number of iterations depends on the computation itself and cannot be known in advance. Relationships falling in these categories can almost always be written in an array abstractable fashion if the modeler pays appropriate attention to how the expression is written.
An example of a non-abstractable expression in the first group of functions that introduce a new dimension is the Sequence function, e.g., 1..N or Sequence(1,N). When N is “atomic” (i.e., not an array, containing no dimensions), this expression evaluates just fine, but N is changed to a list of numbers, e.g., [10,20,30], the result would be a ”non-rectangular” array, which is not allowed as an Analytica value. The presence of a Sequence function embedded in a more complex expression could make the entire expression non-abstractable if appropriate modeling style is not exercised, as in the following example of an expression to compute the factorial of a number N:
Product(1..N)
Although there is no inherent reason why the factorial of a number should not be abstractable, this particular way of writing the factorial would not be abstractable. Other functions also falling into the first group of expressions that (may) introduce dimensions include Subset, SplitText, SortIndex (if the second parameter is omitted), Concat, CopyIndex, IndexNames, and Unique.
An example from the second group of expressions, dimension-reducing expressions not naming the relevant dimension, is the expression: Sum(A), where the second index parameter of Sum is omitted. This form sums over the “outer” dimension of A. As a modeler, there are generally only a couple of instances where you know what the outer dimension is – the case where A is guaranteed to be only one-dimensional, and the case where A contains an implicit dimension (which will always be the outer dimension). If you assume any other dimension is the outer dimension, you may be unpleasantly surprised when new dimensions are introduced. Thus, as a matter of style, it is best to always include the relevant dimension as a second parameter to the array reducing functions in all other cases, and you’ll steer clear with abstractability problems from this class of expressions. The array reducing functions with optional index parameters include Sum, Product, Max, Min, Average, Rank, ArgMax, ArgMin, SubIndex, ChanceDist, CumDist, and ProbDist. Also, the function Size is essentially also in this category, although there is no concept of relevant index there – basically, Size should only be used on parameters restricted to be one-dimensional, index parameters being a very important special case.
Finally, the third class of expressions are iterative or recursive convergence algorithms with a dynamically determined number of iterations. This group almost always involves While or Iterate functions. For example, another non-abstractable expression for factorial of an integer N is:
Var a := 1; Var fact := 1; While a < N Do ( a := a + 1; fact := fact * a )
As written, this expression assumes N to be atomic, and will not evaluate if N is an array. This expression will be discussed further below when discussing Horizontal and Vertical Array Abstraction. The fundamental problem with abstracting over a While is that the number of iterations may differ from cell-to-cell, so a single evaluation of the While loop does not account for the cell-by-cell differences in evaluation flow. It is important to realize that if you are iterating over a dimension, either implicitly or via For or Using..In..Do, because the number of iterations required is fixed in advanced, abstraction limitations generally are not introduced.
Horizontal and Vertical Array Abstraction
There are two conceptual models for how Analytica actually carries out array abstraction when evaluating expressions, which we will (somewhat arbitrarily) call the horizontal and vertical conceptualizations. For illustrative purposes, suppose you have a model, call it F(X), that computes a result based on the input variable X, and suppose we add a new dimension I:=1..100 to X and compute our result. The horizontal conceptualization is that Analytica evaluates F(x) one hundred separate times, once for each value of X, and collects the result for display at the end. The vertical conceptualization is that the array X is passed through each level of evaluation as an array, vertically through the call stack, until the array-valued value bottoms out at a built-in operator or function. In most instances, either conceptualization is perfectly fine, and knowledge of which method Analytica actually implements is unnecessary. Indeed, the two conceptualizations are functionally equivalent (i.e., produce the same net result) for any fully abstractable expression. However, the expert Analytica modeler can make use of how Analytica abstracts to leverage horizontal or vertical evaluation as appropriate in the problem cases highlighted in the previous section in order to write abstractable variations of their expressions.
In the default case, Analytica uses vertical abstraction to evaluate an expression. Thus, when the user-defined function
MyFact(N) := Product(1..N)
is called with N:= [10,20,30]
, the array of values is passed into the definition, then into the
parameter of Product, and finally into the second parameter of Sequence, resulting in the
evaluation of Sequence(1,[10,20,30]), which as discussed in the previous section, presents a
problem. However, using the “atomic” qualifier on the parameter, e.g.,
MyFact( N : atomic ) := Product(1..N)
instructs Analytica to abstract over the parameter of MyFact horizontally, so instead the expression Product(1..10) is evaluated, followed by the expression Product(1..20), and finally the expression Product(1..30). Each of these returns a scalar, which is then collected up and returned as [3.6e6, 2.4e18, 2.7e32]. So, although the expression Product(1..N) is not vertically abstractable, it is horizontally abstractable across N. In fact, exerting control over when Analytica abstracts horizontally or vertically is a quite general tool for eliminating limitations on array abstractability.
Leveraging User-Defined Functions (UDFs) is usually the easiest and most efficient way to horizontally abstract. The qualifier atomic is generalized (in Analytica 3.1 and above) to any number of dimensions using the parameter syntax, e.g.
( x : Array[I,J,Time] ; I,J : Index ; y : atomic )
The list of indexes in square brackets indicates the “allowed” dimensions – any other dimension will be horizontally abstracted over before calling your function, so that inside your function body, you can assume that only these dimensions appear. The dimensions may be either global indexes (as Time is above), or other index parameters. The qualifier atomic is equivalent to Array[ ] – i.e., an array with zero dimensions.
If a parameter that is not indexed by I were passed to a parameter with the qualifier Array[I], the value will be passed in without the index. A value not indexed by I is generally treated as a value that is constant across dimension I by Analytica, so there are few instances where this will matter. One way this is often leveraged is by declaring a parameter as Array[Run], which allows the Run index, but doesn’t require it, for example, when the parameter is evaluated in Mid mode. In the case where you must have the dimension, the all qualifier can be added:
( x : Array All[I,J]
The atomic and Array qualifiers on user-defined functions demonstrated above is a very convenient method for specifying horizontal array abstraction when a UDF is used. Within the body of definition (whether a variable, user-defined function, or other object), horizontal array abstraction controlled by specifying the allowed dimensions in a variable declaration (Var..Do or Using..Do). Two variations on syntax are
Var X[I,J,…] := expr ; body1; body2; … bodyL
or
Using X := expr In Each I,J,… Do body
In the first syntax, the bracketed list of indexes specifies that array abstraction on the local variable X is to be done horizontally for any dimensions not listed in brackets. The local variable X has a lexical scope of body1 thru bodyL, and each of those expressions to the end of the current lexical scope can assume that X is only indexed by those dimensions named. The Using..In Each..Do syntax accomplishes the same with a slightly different syntax, the first syntax having a more procedural flavor, the second a more functional-programming flavor and more consistent with the syntactic style prevalent in earlier versions of Analytica. In either case, the list of dimensions can be empty, which states that X should be horizontally abstracted all the way down to a scalar (equivalent to the atomic qualifier on a UDF parameter). Thus, another way of writing the factorial expression in a fully abstractable way is:
Using A := N In Each Do Product(1..A)
or synonymously
Var A[ ] := N; Product(1..A)
As another variation, consider the non-abstractable expression7
Sum(0.5^(Min(B,I)..Max(B,I)))
Here the assumption is that B is not indexed by anything other than I, so an abstractable version of this expression would be
Var A[I] := B Do Sum(0.5^(Min(A,I)..Max(A,I)))
Since each declaration loops independently, multiple dimensionality declarations can sometimes be inefficient. For example, in:
Var A[] := X; Var B[] := Y; F(A,B)
if X and Y are both indexed by I, the first declaration will loop over I, and the second will loop over I again. The final result will utilize only the diagonal if this Size(I)^2 computation. There are two ways to avoid this inefficiency. The easiest is to use a User-Defined function with a declaration (A ,B : atomic), which would loop over I only once. A more complex method, not requiring a UDF, is to bundle the declaration using references, as in the following:
Index ABindex := [‘A’,’B’]; Var AB[] := Array(ABIndex,[ \[]X, \[]Y ] ); Var A := #AB[ABIndex=’A’]; Var B := #AB[ABIndex=’B’]; F(A,B)
Here the reference syntax, \[]X, indicates that no indexes are to be “swallowed” by the reference. You would place any desired dimensions for A in the square brackets of the reference operator if A is not just atomic. The single declaration, “Var AB[]”, ensures that the variables all have their desired dimensionality in the expression body.
The While Loop
When a While loop is evaluated with vertical array abstraction, the while-loop expression is evaluated only once (the body of that While is iterated 0 or more times, of course). This is generally problematic when abstractable dimensions are present, since each different column of the extra dimension may need to iterate the while-loop’s body a different number of times. Thus, forcing horizontal abstraction to atomic values for any variables appearing in the condition of the while loop is often the appropriate method for obtaining an abstractable while-loop expression. This method is illustrated by the following UDF:
Function MyFact( N : atomic ) := Var a := 1; Var fact := 1; While a < N Do ( a := a + 1; fact := fact * a ); fact
The reader should note that the only change between this and the originally expression in Section 2 is the addition of the atomic qualifier on N in the UDF’s parameter list.
If the same snippet were to appear within a larger expression, and not necessarily within a UDF, the horizontal array abstraction method would result in
Var m[ ] := N; Var a := 1; Var fact := 1; While ( a < m ) Do ( a := a + 1; fact := fact * a ); fact
Another approach to dealing with the abstractability of While is to continue using vertical array abstraction, but to adjust the body expression to ensure that the extra iterations do not functionally affect the outcome. This is demonstrated with this abstractable version of the same example
Var a := 1; Var fact := 1; While Any ( a < N ) Do ( a := a + 1; fact := If a<=N Then fact * a Else fact ); fact
The Any keyword following while allows the condition to be array-valued, but the while will continue until all elements are 0. The conditional update rule inside the While ensures that the extra iterations don’t impact the final result. The computational ramifications of the two methods may impact the modeler’s choice of which method to use. The core update operation (in this case, the multiplication) ends up being evaluated many extra times with vertical array abstraction, particularly if the number of iterations required differs wildly from element to element. However, this needs to be weighed against the greater efficiencies of vertical array abstraction and the fewer number of times the complete while loop expression needs to be evaluated. When the internal update operation is very expensive, horizontal array abstraction will usually be more efficient. There are also some cases where only one of these two techniques can be applied.
We should emphasize that the While loop is a very poor choice for computing a factorial, since the number of iterations is known in advance. It was chosen as a starting example solely for its ease in understanding. The more appropriate applications for While are iterative convergence or optimization algorithms where the number of iterations is determined dynamically. Nevertheless, the two methods outlined above are the general techniques as the modeler’s disposal for obtaining array abstractable iterative expressions based on while. The following is an example of an advanced iterative convergence algorithm that has been made fully abstractable using the horizontal abstraction method. The example appears in the example models and libraries shipped with Analytica 3.x. It uses an iteration known as the power method (from an elementary Linear Algebra textbook) to compute the dominant Eigen vector of a square matrix A, dimensioned by equal-sized dimensions I & J. If there are additional dimensions (i.e., A is a set of matrices, or A contains a Run dimension), the expression abstracts over them. The reader should not worry about the specifics of this particular algorithm, but should take note of the first line, declaring B to horizontally array abstracted should A contain extra dimensions. The take away message being that a very sophisticated iterative convergence algorithm such as this, which on the surface appears quite incompatible with (vertical) array abstraction, requires only a single line (with a few variable name substitutions within) to convert it into a fully abstractable expression. Thus is the benefit of having a mastery of horizontal array abstraction control. The reader might also examine the While criteria, which is typical of a convergence algorithm. It is a case where we must guarantee that B is 2-D (so that EigenVal is scalar, and thus the condition to While is scalar), which is accomplished by horizontal array abstraction. Some auxiliary UDFs called within are not included here, but can be found in the example models shipped with Analytica 3.x.
/* Ensure that it array abstracts if A has other dimensions */ var B[I,J] := A; /* First, generate a random first eigenvector */ var EigenVec := Random_vector(J); var EigenVal := Rayleigh_quotient(B,eigenVec,I,J); var prevEigenVal := EigenVal-1; var maxIter := 50; while (maxIter>0 and abs(prevEigenVal-EigenVal)>1e-10) Do ( var xi := subscript(sum(B*EigenVec,J),I,J); EigenVec := xi /max(xi,J); prevEigenVal := EigenVal; EigenVal := Rayleigh_quotient(B,EigenVec,I,J); maxIter := maxIter-1 ); EigenVec
Note: If you really want to compute the Eigen value/vector, use the built-in EigenDecomp function.
As one final and very advanced example of a very sophisticated UDF, I present the GoalSeek function. GoalSeek is essentially an equation solver based on a Newton-Raphson gradient descent method. It finds the (atomic) value for a variable X that makes the (atomic) variable Y equal to goal, where Y depends on X in the model. Extra dimensions may end up being introduced via array abstraction in a variety of ways, making a particularly challenging exercise to make this generalized library function fully array abstractable. For example, dimensions can be introduced via X (i.e., interpreted as different starting points for the search), into Y by other variables in the model (interpreted as different cases that must be solved), or into goal (interpreted as different cases that must be solved). Before examining the actual code for Goalseek, I will demonstrate an example of its usage.
Problem statement: b and c are uncertain quantities, and are related to R by the transcendental equation:
- R2 = c e-bR
The application requires us to treat the solution for R as an uncertain random variable. We would like to view the Mid-value of R, as well as the uncertainty graphs for R (e.g., PDF, CDF, Sample, Prob Bands, etc). C and b may or may not have the Run dimension or other dimensions.
We model this with GoalSeek by first dividing by e-bR so all instances of R are on one side of the equation
- R2 ebR = c
which is now in a form appropriate for the GoalSeek function. An example of this model is as follows:
b := Uniform(0,1) c := Uniform(0,5) r_guess := 5 tst := r_guess^2 * exp(b*r_guess) R := GoalSeek( r_guess, tst, c ) xs
The following graph is the CDF and PDF of R computed from an uncertainty sample size of 1,000 points. This took 4½ minutes on a 1.9GHz Pentium 4 processor with 512M RAM.
When R is evaluated in prob-mode (e.g., when its CDF is viewed), GoalSeek must deal with the Run dimension in b and c. For dealing with extra dimensions that might be introduced into Y from other components of the model that GoalSeek has no way of knowing or reasoning about, GoalSeek employs the vertical array abstraction method. Any dimensions of b are handled via this method. This is an example where the horizontal method could not be adapted, but the vertical method could. Extra iterations beyond that necessary for minimal convergence in this case are not harmful in terms of the functional result, so we just let them update. (Strictly speaking, this is not perfect array abstraction, since the value in the abstracted case may be a little more precise then the same value computed as a scalar, but since both will be within the convergence threshold, this is not particularly important). For iteration over dimensions introduced thru the goal (c in this case) or X (different initial guesses), GoalSeek leverages horizontal array abstraction.
Conclusion
One of the key skills of the advanced Analytica modeler is ability to write array abstractable functions and models, leading to maximum flexibility for later dimensional adjustments, what-if analyses, uncertainty computations, and decision making. This skill is developed by perfecting key elements in modeling style, learning to recognize the cases that may be subject to array abstraction limitations, and developing the skill to adjust expressions in ways that will abstract. The later requires developing deep understandings of features such as the While loop, horizontal array abstraction, etc. Many modelers using Analytica may find they seldom run into a situation that does not array abstract… these situations tend to arise in the more complex and algorithmically-oriented applications. But for those in the latter camp, the flexibility that comes from array abstractable models further increases the power of Analytica for exploring and prototyping new models.
Enable comment auto-refresher