Writing Array-Abstractable Definitions
Revision as of 00:05, 23 February 2010
This article was written April 2003 by Lonnie Chrisman, Ph.D., of Lumina Decision Systems, updated Nov 2005, and originally in a PDF format. It was transferred to the Analytica Wiki Feb 2010.
The Benefits of Array Abstraction
One of the most powerful and useful features of Analytica is Intelligent Array Abstraction™. When we give the variable Profit the definition Revenue-Expenses, Analytica automatically carries through any dimensions of Revenue and Expenses, seamlessly making the correspondence if the same dimension occurs in each. If the Revenue and Expenses variables are broken down by (i.e., indexed by) department and project, the resulting Profit will also be broken down across these two dimensions. A conventional programming language would force the modeler to explicitly loop over these dimensions when computing Profit, and a spreadsheet would require the modeler to explicitly copy the same formula to every cell in the two-dimensional grid. This manual overhead imposed on the modeler in those environments is distracting, and fundamentally unnecessary, since the relationship between Profit, Revenue and Expenses is entirely separate from whatever the dimensions happen to be in a particular model. This principle of separating the fundamental relationships between quantities (variables) and their dimensionality is what we call (Intelligent) Array Abstraction.
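For contrast, here is a rough Python sketch of the explicit looping that a conventional language requires for the same Profit relationship. The department and project names and all numbers are made up for illustration; this is not Analytica code.

```python
# Hypothetical data: revenue and expenses broken down by department and project.
departments = ["Sales", "R&D"]
projects = ["P1", "P2"]
revenue = {"Sales": {"P1": 120.0, "P2": 80.0}, "R&D": {"P1": 40.0, "P2": 60.0}}
expenses = {"Sales": {"P1": 100.0, "P2": 50.0}, "R&D": {"P1": 70.0, "P2": 20.0}}

# The relationship "Profit = Revenue - Expenses" is buried inside two loops
# that exist only because of the dimensions the data happens to have.
profit = {}
for d in departments:
    profit[d] = {}
    for p in projects:
        profit[d][p] = revenue[d][p] - expenses[d][p]
```

Adding a third dimension (say, a year) would mean adding a third loop and changing every access, even though the relationship itself has not changed.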
Once one has mastered the basics of array abstraction, it is hard to imagine going back to a more primitive modeling environment (such as a spreadsheet or standard programming language) without such a capability for any non-trivial modeling application. Array abstraction provides a flexibility that goes hand-in-hand with the modeling process. When we begin prototyping a model, the optimum dimensions to include are seldom clear. The modeling process usually involves the elucidation of relevant variables, specification of fundamental relationships, and experimentation with tradeoffs between richness of knowledge to build on, computational complexity, and levels of detail. The determination of the appropriate dimensionality for key variables is a task that usually falls most naturally after elucidation of relationships between variables. Array abstraction allows the modeler to do just this: once an abstractable model is in place, dimensions can be added and removed with little effort. This can be contrasted with spreadsheets and conventional programming languages, where one must commit to the dimensionality of variables early in the modeling process, and because every computational step involves explicitly defined iteration over the specified dimensions, altering the dimensionality of variables tends to be extremely tedious.
Array abstraction also promotes correctness. Opportunities to make errors when copying cell references in a spreadsheet abound; such errors are quite common in practice and are hard to detect. Likewise, it is quite easy to make mistakes when writing explicit looping constructs in conventional programming languages, such as accidentally confusing the correspondence between dimensions. Separated from the clutter of all these looping constructs, the clarity of variable definitions in Analytica is far more apparent. Furthermore, array abstraction leaves one with a sense of confidence that solid portions of the model are and will remain correct, regardless of other modifications or dimensional adjustments that may be made outside of that sub-model.
In Analytica, array abstraction enables a number of other important capabilities. A major one of those is the uncertainty analysis that is built into Analytica. Uncertainty analysis in Analytica operates by adding a dimension to the uncertain variables in your model, named the Run dimension. Since your (presumably abstractable) model describes the relationships between variables, what is true without that dimension should also be true with that dimension, and thus Analytica can seamlessly propagate that dimension through the computations, even though as a modeler, you might not have conceptualized your variables up front as being dimensioned by Run.
Another important capability arising from array abstraction is What-If analysis. Adding new dimensions to key input variables enables one to simultaneously explore and compare multiple scenarios, without having to change the logic of your model. “What-if” analyses are also often the basis of decision making, which proceeds by exploring spaces of possible decisions.
Non-abstractable Definitions
Most Analytica expressions (relationships) that one would write in a variable’s definition field will be “abstractable”, that is, it will be possible to arbitrarily add and adjust dimensions to input variables without altering the correctness of the result. Analytica will propagate any of these dimensions through the variables without any extra thought on the part of the modeler. However, there are some notable exceptions, where it is possible to write a definition that is not abstractable. When such a definition is written, the flexibility and benefits of array abstraction may be lost. A key area of the more advanced Analytica modeling skill set is being able to recognize such cases and to know how to “correctly” express the desired relationships in alternative, abstractable ways.
Expressions that are limited in their ability to be array abstracted often fall into one of three groups: expressions that introduce new dimensions, dimension-reducing expressions that don’t name the relevant dimension, and convergence/iterative/recursive algorithms where the total number of iterations depends on the computation itself and cannot be known in advance. Relationships falling in these categories can almost always be written in an array-abstractable fashion if the modeler pays appropriate attention to how the expression is written.
An example of a non-abstractable expression in the first group, functions that introduce a new dimension, is the Sequence function, e.g., 1..N or Sequence(1,N). When N is “atomic” (i.e., not an array, containing no dimensions), this expression evaluates just fine, but if N is changed to a list of numbers, e.g., [10,20,30], the result would be a “non-rectangular” array, which is not allowed as an Analytica value. The presence of a Sequence function embedded in a more complex expression could make the entire expression non-abstractable if appropriate modeling style is not exercised, as in the following example of an expression to compute the factorial of a number N:
Product(1..N)
Although there is no inherent reason why the factorial of a number should not be abstractable, this particular way of writing the factorial would not be abstractable. Other functions also falling into the first group of expressions that (may) introduce dimensions include Subset, SplitText, SortIndex (if the second parameter is omitted), Concat, CopyIndex, IndexNames, and Unique.
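The non-rectangular problem can be illustrated with a small Python analogy. The `sequence` helper below is a hypothetical stand-in for 1..N, not Analytica's implementation: building one sequence per element of N gives rows of different lengths, which cannot be stored as a rectangular array.

```python
def sequence(start, end):
    """Hypothetical stand-in for Analytica's start..end (inclusive)."""
    return list(range(start, end + 1))

n_values = [10, 20, 30]

# One sequence per element of N: the rows have lengths 10, 20 and 30.
rows = [sequence(1, n) for n in n_values]
row_lengths = [len(row) for row in rows]

# A rectangular array needs every row to have the same length.
is_rectangular = len(set(row_lengths)) == 1
```

Here `is_rectangular` comes out false, which is the situation the text describes for Sequence(1,[10,20,30]).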
An example from the second group of expressions, dimension-reducing expressions not naming the relevant dimension, is the expression Sum(A), where the second index parameter of Sum is omitted. This form sums over the “outer” dimension of A. As a modeler, there are generally only a couple of instances where you know what the outer dimension is: the case where A is guaranteed to be only one-dimensional, and the case where A contains an implicit dimension (which will always be the outer dimension). If you assume any other dimension is the outer dimension, you may be unpleasantly surprised when new dimensions are introduced. Thus, as a matter of style, it is best to always include the relevant dimension as a second parameter to the array-reducing functions in all other cases, and you’ll steer clear of abstractability problems from this class of expressions. The array-reducing functions with optional index parameters include Sum, Product, Max, Min, Average, Rank, ArgMax, ArgMin, SubIndex, ChanceDist, CumDist, and ProbDist. The function Size is essentially also in this category, although there is no concept of a relevant index there; Size should only be used on parameters restricted to be one-dimensional, index parameters being a very important special case.
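The following Python sketch illustrates why naming the dimension matters. It models an array as a dictionary keyed by coordinate tuples plus a tuple of dimension names; that representation, the `sum_over` helper, and the assumption that the first listed dimension plays the role of the “outer” one are all illustrative choices, not how Analytica stores arrays.

```python
def sum_over(dims, cells, index):
    """Sum `cells` (keyed by coordinate tuples ordered as `dims`) over `index`."""
    pos = dims.index(index)
    out = {}
    for coord, value in cells.items():
        rest = coord[:pos] + coord[pos + 1:]  # coordinate without the summed dimension
        out[rest] = out.get(rest, 0) + value
    return out

# A indexed only by I: summing the "outer" dimension and summing over I agree.
dims_1d = ("I",)
a_1d = {(1,): 1, (2,): 2, (3,): 3}
total_1d = sum_over(dims_1d, a_1d, dims_1d[0])

# Now a new dimension J is added and happens to become the outer dimension.
dims_2d = ("J", "I")
a_2d = {("a", 1): 1, ("a", 2): 2, ("a", 3): 3,
        ("b", 1): 10, ("b", 2): 20, ("b", 3): 30}

outer_sum = sum_over(dims_2d, a_2d, dims_2d[0])  # silently sums over J instead
named_sum = sum_over(dims_2d, a_2d, "I")         # still sums over I, once per J
```

Summing over the outer dimension now yields a result indexed by I, while naming I keeps the original meaning and simply carries J along.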
Finally, the third class of expressions are iterative or recursive convergence algorithms with a dynamically determined number of iterations. This group almost always involves While or Iterate functions. For example, another non-abstractable expression for factorial of an integer N is:
Var a := 1; Var fact := 1; While a < N Do ( a := a + 1; fact := fact * a )
As written, this expression assumes N to be atomic, and will not evaluate if N is an array. This expression will be discussed further below when discussing Horizontal and Vertical Array Abstraction. The fundamental problem with abstracting over a While is that the number of iterations may differ from cell to cell, so a single evaluation of the While loop does not account for the cell-by-cell differences in evaluation flow. It is important to realize that iterating over a dimension, either implicitly or via For or Using..In..Do, generally does not introduce abstraction limitations, because the number of iterations required is fixed in advance.
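A Python transcription of the While-based factorial makes the cell-to-cell difference concrete. The iteration counter is added purely for illustration, and the function name is hypothetical.

```python
def fact_while(n):
    """Factorial via a while loop; also reports how many iterations ran."""
    a, fact, iterations = 1, 1, 0
    while a < n:
        a += 1
        fact *= a
        iterations += 1
    return fact, iterations

# Evaluating cell by cell: each cell needs a different number of iterations,
# so no single pass through the loop could serve the whole array.
results = [fact_while(n) for n in [3, 5, 10]]
iteration_counts = [count for _, count in results]
```

With inputs 3, 5 and 10 the loop runs 2, 4 and 9 times respectively, which is why one shared evaluation of the loop cannot cover all three cells.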
Horizontal and Vertical Array Abstraction
There are two conceptual models for how Analytica actually carries out array abstraction when evaluating expressions, which we will (somewhat arbitrarily) call the horizontal and vertical conceptualizations. For illustrative purposes, suppose you have a model, call it F(X), that computes a result based on the input variable X, and suppose we add a new dimension I:=1..100 to X and compute our result. The horizontal conceptualization is that Analytica evaluates F(x) one hundred separate times, once for each value of X, and collects the result for display at the end. The vertical conceptualization is that the array X is passed through each level of evaluation as an array, vertically through the call stack, until the array-valued value bottoms out at a built-in operator or function. In most instances, either conceptualization is perfectly fine, and knowledge of which method Analytica actually implements is unnecessary. Indeed, the two conceptualizations are functionally equivalent (i.e., produce the same net result) for any fully abstractable expression. However, the expert Analytica modeler can make use of how Analytica abstracts to leverage horizontal or vertical evaluation as appropriate in the problem cases highlighted in the previous section in order to write abstractable variations of their expressions.
In the default case, Analytica uses vertical abstraction to evaluate an expression. Thus, when the user-defined function
MyFact(N) := Product(1..N)
is called with N:= [10,20,30], the array of values is passed into the definition, then into the parameter of Product, and finally into the second parameter of Sequence, resulting in the evaluation of Sequence(1,[10,20,30]), which as discussed in the previous section, presents a problem. However, using the “atomic” qualifier on the parameter, e.g.,
MyFact( N : atomic ) := Product(1..N)
instructs Analytica to abstract over the parameter of MyFact horizontally, so instead the expression Product(1..10) is evaluated, followed by the expression Product(1..20), and finally the expression Product(1..30). Each of these returns a scalar, which is then collected up and returned as [3.6e6, 2.4e18, 2.7e32]. So, although the expression Product(1..N) is not vertically abstractable, it is horizontally abstractable across N. In fact, exerting control over when Analytica abstracts horizontally or vertically is a quite general tool for eliminating limitations on array abstractability.
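As a loose Python analogy (not Analytica's actual mechanism), passing the whole list into the function body fails, while a hypothetical `atomic` wrapper that calls the body once per element and collects the results succeeds:

```python
import math

def my_fact(n):
    """Analogue of Product(1..N); only meaningful for a scalar n."""
    return math.prod(range(1, n + 1))

def atomic(func):
    """Hypothetical helper: apply `func` to each scalar and collect the results."""
    def wrapper(x):
        if isinstance(x, list):
            return [wrapper(item) for item in x]
        return func(x)
    return wrapper

# "Vertical": the list itself reaches the body and the sequence cannot be built.
try:
    my_fact([10, 20, 30])
    vertical_failed = False
except TypeError:
    vertical_failed = True

# "Horizontal": one evaluation per element, results collected afterwards.
my_fact_atomic = atomic(my_fact)
horizontal_result = my_fact_atomic([10, 20, 30])
```

The three collected values are 10!, 20! and 30!, matching the rounded figures quoted above.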
Leveraging User-Defined Functions (UDFs) is usually the easiest and most efficient way to horizontally abstract. The qualifier atomic is generalized (in Analytica 3.1 and above) to any number of dimensions using the parameter syntax, e.g.
( x : Array[I,J,Time] ; I,J : IndexType ; y : atomic )
The list of indexes in square brackets indicates the “allowed” dimensions – any other dimension will be horizontally abstracted over before calling your function, so that inside your function body, you can assume that only these dimensions appear. The dimensions may be either global indexes (as Time is above), or other index parameters. The qualifier atomic is equivalent to Array[ ] – i.e., an array with zero dimensions.
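The idea of “allowed” dimensions can be sketched in Python as follows. The dictionary-based array representation and the `call_with_allowed` helper are assumptions made for this illustration only; they mimic the described behaviour, in which every dimension not listed is looped over before the function body runs.

```python
from itertools import product

def call_with_allowed(func, dims, index_values, cells, allowed):
    """Call `func` once per combination of the non-allowed dimensions.

    `cells` maps coordinate tuples (ordered as `dims`) to values; each call
    receives a slice keyed only by the allowed dimensions.
    """
    extra = [d for d in dims if d not in allowed]
    kept = [d for d in dims if d in allowed]
    results = {}
    for combo in product(*(index_values[d] for d in extra)):
        fixed = dict(zip(extra, combo))
        slice_ = {}
        for coord, value in cells.items():
            named = dict(zip(dims, coord))
            if all(named[d] == v for d, v in fixed.items()):
                slice_[tuple(named[d] for d in kept)] = value
        results[combo] = func(slice_)
    return results

# x is indexed by I and J, but the function body only allows I.
dims = ("I", "J")
index_values = {"I": [1, 2, 3], "J": ["a", "b"]}
cells = {(i, j): i * (1 if j == "a" else 10)
         for i in index_values["I"] for j in index_values["J"]}

def spread(x_over_i):
    """Body that may assume its argument is indexed by I only."""
    return max(x_over_i.values()) - min(x_over_i.values())

spread_by_j = call_with_allowed(spread, dims, index_values, cells, allowed=("I",))
```

The body sees only one-dimensional slices, and the result ends up indexed by J, the dimension that was looped over.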
If a value that is not indexed by I is passed to a parameter with the qualifier Array[I], the value will be passed in without the index. A value not indexed by I is generally treated by Analytica as a value that is constant across dimension I, so there are few instances where this will matter. One way this is often leveraged is by declaring a parameter as Array[Run], which allows the Run index but doesn’t require it, for example, when the parameter is evaluated in Mid mode. In the case where you must have the dimension, the all qualifier can be added:
( x : Array All[I,J] )
The atomic and Array qualifiers on user-defined functions demonstrated above are a very convenient method for specifying horizontal array abstraction when a UDF is used. Within the body of a definition (whether of a variable, user-defined function, or other object), horizontal array abstraction is controlled by specifying the allowed dimensions in a variable declaration (Var..Do or Using..Do). Two variations on syntax are
Var X[I,J,…] := expr ; body1; body2; … bodyL
or
Using X := expr In Each I,J,… Do body
In the first syntax, the bracketed list of indexes specifies that array abstraction on the local variable X is to be done horizontally for any dimensions not listed in brackets. The local variable X has a lexical scope of body1 thru bodyL, and each of those expressions to the end of the current lexical scope can assume that X is only indexed by those dimensions named. The Using..In Each..Do syntax accomplishes the same with a slightly different syntax, the first syntax having a more procedural flavor, the second a more functional-programming flavor and more consistent with the syntactic style prevalent in earlier versions of Analytica. In either case, the list of dimensions can be empty, which states that X should be horizontally abstracted all the way down to a scalar (equivalent to the atomic qualifier on a UDF parameter). Thus, another way of writing the factorial expression in a fully abstractable way is:
Using A := N In Each Do Product(1..A)
or synonymously
Var A[ ] := N; Product(1..A)
As another variation, consider the non-abstractable expression
Sum(0.5^(Min(B,I)..Max(B,I)))
Here the assumption is that B is not indexed by anything other than I, so an abstractable version of this expression would be
Var A[I] := B Do Sum(0.5^(Min(A,I)..Max(A,I)))
Since each declaration loops independently, multiple dimensionality declarations can sometimes be inefficient. For example, in:
Var A[] := X; Var B[] := Y; F(A,B)
if X and Y are both indexed by I, the first declaration will loop over I, and the second will loop over I again. The final result will utilize only the diagonal of this Size(I)^2 computation. There are two ways to avoid this inefficiency. The easiest is to use a User-Defined Function with a declaration (A, B : atomic), which would loop over I only once. A more complex method, not requiring a UDF, is to bundle the declaration using references, as in the following:
Index ABIndex := ['A','B'];
Var AB[] := Array(ABIndex,[ \[]X, \[]Y ] );
Var A := #AB[ABIndex='A'];
Var B := #AB[ABIndex='B'];
F(A,B)
Here the reference syntax, \[]X, indicates that no indexes are to be “swallowed” by the reference. You would place any desired dimensions for A in the square brackets of the reference operator if A is not just atomic. The single declaration, “Var AB[]”, ensures that the variables all have their desired dimensionality in the expression body.
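The cost difference between two independent loops and one bundled loop can be sketched in Python. The function `f`, its call counter, and the data are hypothetical, and the nested loop only models the work described above rather than Analytica's internals.

```python
calls = 0

def f(a, b):
    """Stand-in for F(A,B); counts how often it is evaluated."""
    global calls
    calls += 1
    return a + b

x = [1, 2, 3]      # X indexed by I
y = [10, 20, 30]   # Y indexed by the same I

# Two independent atomic declarations: every A is paired with every B,
# but only the cells where both share the same position along I are wanted.
grid = [[f(a, b) for b in y] for a in x]
diagonal = [grid[i][i] for i in range(len(x))]
nested_calls = calls

# One bundled declaration: X and Y are stepped through I together.
calls = 0
bundled = [f(a, b) for a, b in zip(x, y)]
bundled_calls = calls
```

Both approaches give the same values, but the nested version evaluates `f` nine times for a three-element index while the bundled version evaluates it three times.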