Controlling When Result Values Are Cached

Revision as of 00:59, 11 February 2016 by Bbecane


Caching of Computed Results

When Analytica computes the result of a variable, it stores the computed value in memory. If that result is requested later, it simply returns the previously computed value without having to recompute it. This process is referred to as caching. The stored result is the cached result.

After Analytica has cached a computed result, it maintains information about what factors went into that computation. If any upstream value is changed, Analytica automatically detects that its cached value may no longer be valid. When this happens, Analytica invalidates the cached result. Only the variables that are or may be influenced by a changed parameter are invalidated. After a cached value is invalidated, that value is no longer stored in memory, and thus next time it is needed, Analytica must recompute it. This system of remembering what each result depends on is referred to as dependency maintenance. As a user of Analytica, you seldom need to worry about making sure values get recomputed when things change -- the dependency maintenance subsystem takes care of it for you automatically.
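The caching and invalidation behavior described above can be illustrated with a small Python sketch. This is not Analytica's implementation; the `Node` class and its methods are hypothetical names chosen only to show the idea that a changed input invalidates the variables downstream of it and nothing else.

```python
class Node:
    """Toy model variable: a definition plus a cached value (hypothetical)."""

    def __init__(self, name, definition, parents=()):
        self.name = name
        self.definition = definition      # function of the parents' values
        self.parents = list(parents)
        self.children = []
        self.cached = None
        self.valid = False
        self.eval_count = 0               # times the definition was evaluated
        for parent in self.parents:
            parent.children.append(self)

    def value(self):
        # Return the cached value when valid; otherwise recompute and cache.
        if not self.valid:
            self.eval_count += 1
            self.cached = self.definition(*(p.value() for p in self.parents))
            self.valid = True
        return self.cached

    def set_definition(self, definition):
        # Changing a variable invalidates it and everything downstream.
        self.definition = definition
        self.invalidate()

    def invalidate(self):
        self.valid = False
        self.cached = None                # the stored value is dropped
        for child in self.children:
            child.invalidate()


a = Node("A", lambda: 2)
b = Node("B", lambda: 10)
c = Node("C", lambda a_val: a_val * 3, [a])
d = Node("D", lambda b_val, c_val: b_val + c_val, [b, c])

d.value()                      # computes A, B, C and D once each
d.value()                      # served from the cache, nothing recomputed
a.set_definition(lambda: 5)    # invalidates A, C and D, but not B
d.value()                      # recomputes A, C and D; B's cache is reused
```

In this sketch B is evaluated only once, because it does not depend on A and so its cached value survives the change.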

Analytica actually caches up to three separate results for each variable: the mid value, the sample value, and the index value. Some of these may not apply to particular variables. In practice, it is the mid value and sample value that have the potential to consume substantial amounts of memory.

Many benefits emerge from the automatic caching of results. Some computations are automatically made dramatically more efficient, since intermediate variables don't need to be recomputed by every child variable that uses the result. In some dramatic cases this can actually convert an exponential algorithm into a polynomial one, with no effort from the modeler (in effect, it automatically converts to a dynamic programming algorithm). When you are examining or debugging the results of a model, the ability to see the values of intermediate variables can be extremely helpful. Not only are these readily accessible without additional recomputation, but you see exactly the values (including the precise Monte Carlo samples) that were used during the computation of downstream results.
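To make the exponential-to-polynomial claim concrete, here is a minimal Python sketch. It assumes a hypothetical chain of variables where V_k := V_(k-1) + V_(k-2) and V_0 := V_1 := 1, and counts how many times a definition is evaluated with and without per-variable caching. The function name `count_evaluations` is made up for this illustration.

```python
def count_evaluations(n, cache_results):
    """Evaluate V_n and return (value, number of definition evaluations)."""
    cache = {}
    evaluations = 0

    def value(k):
        nonlocal evaluations
        if cache_results and k in cache:
            return cache[k]               # reuse the stored result
        evaluations += 1                  # the definition of V_k is evaluated
        result = 1 if k < 2 else value(k - 1) + value(k - 2)
        if cache_results:
            cache[k] = result
        return result

    return value(n), evaluations


print(count_evaluations(20, cache_results=True))    # (10946, 21)
print(count_evaluations(20, cache_results=False))   # (10946, 21891)
```

With caching, each of the 21 variables is evaluated once. Without it, the number of evaluations grows exponentially with the length of the chain.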

Analytica caches results at the resolution of variables. Intermediate results used during the evaluation of a definition live only as long as their values are used, and then their memory is reclaimed. Only when the computation of a variable object's mid or sample value completes is that value cached. These values are stored in the midValue and probValue attributes of the object. Only variable objects, and objects of classes derived from Variable (Decision, Chance, Objective, Index, Constant), are cached. In particular, the values of User-Defined Functions are never cached, even if they have zero input parameters.

Controlling which results get cached

One aspect of automatic caching of results is that it essentially trades memory for speed. The downside arises when available memory is limited. Caching all intermediate results in your model consumes memory, which may limit the scale of computation you can complete within the available memory of your computer. By eliminating some cached results, it may be possible to free up memory for use elsewhere in your model. This is the motivation for controlling what gets cached.

You can affect how much data is cached by restructuring your model. You seldom want to do this if you can avoid it, since it may reduce the transparency of your model. But when memory is critically short, such restructurings are sometimes warranted.

Restructurings intended to reduce the amount of cached data are based on two principles. The first is that only final results of variables are cached -- not intermediate results. The second is that User-Defined Functions never cache their results. Thus, when you want to eliminate the memory caching overhead of an intermediate variable, you can fold it into the expression of its successor and eliminate the variable, or you can just convert it to a User-Defined Function.

Converting to a User-Defined function is easy in theory, since a UDF does not have to have any parameters. With a blank parameter list, it simply evaluates its definition, which can depend on other global input variables. To ensure equivalence with a variable node, your result should not have an implicit dimension (since a UDF cannot promote this to a self-index, given that it doesn't cache its results).

The mechanics of converting a variable to a UDF are a bit more problematic. There is no user-interface method for converting the class of an object from Variable to Function. In general, you need to do this by adding a function node, copying your definition to it, changing each child of your variable to use the function instead (which involves a syntactic change as well -- adding parentheses after the function name), and then deleting your original object.

Before going to the effort of enacting these restructurings, it is always wise to run the Performance Profiler to measure the amount of memory consumed by your intermediate variables' caches. There is no reason to eliminate variables that do not consume substantial amounts of memory.

Configuring the CachingMethod Attribute

This feature requires Analytica Enterprise.

Analytica 4.2 introduced a new attribute, CachingMethod, which you can use to specify how results are to be cached for individual variables. CachingMethod is an attribute of variable objects, which can be set to the following numeric values:

0 = Analytica's default (this is Always cache)
1 = Always cache result
2 = Never cache array results
3 = Release cached result after all children are fully computed
4 = Never cache results, even if scalar

These values apply to both cached mid-value and sample-value results. You can also separately control how mid and sample values are cached, if you have reason to do so, using the following values:

18 = Always cache sample, Never cache mid
19 = Always cache sample, release mid
33 = Never cache sample, always cache mid
35 = Never cache sample, release mid
49 = Release sample, always cache mid
50 = Release sample, never cache mid
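The six separate-control values listed above are all consistent with the pattern 16 × (sample code) + (mid code), where 1 = always cache, 2 = never cache and 3 = release. That pattern is an inference from the listed values, not something this page states, so treat the following Python sketch as a reading aid only. The helper `describe_caching_method` is hypothetical and is not part of Analytica.

```python
# Policy codes inferred from the listed values (an assumption, see above).
POLICY = {1: "always cache", 2: "never cache", 3: "release"}

# Values that apply to mid and sample results alike, as listed on this page.
COMBINED = {
    0: "default (always cache)",
    1: "always cache",
    2: "never cache array results",
    3: "release after all children are fully computed",
    4: "never cache, even if scalar",
}


def describe_caching_method(code):
    """Describe a CachingMethod value using only the values this page lists."""
    if code in COMBINED:
        return {"sample": COMBINED[code], "mid": COMBINED[code]}
    sample_code, mid_code = divmod(code, 16)
    # Reject anything the page does not list, including equal sample/mid codes.
    if sample_code not in POLICY or mid_code not in POLICY or sample_code == mid_code:
        raise ValueError(f"CachingMethod {code} is not listed")
    return {"sample": POLICY[sample_code], "mid": POLICY[mid_code]}


print(describe_caching_method(18))  # {'sample': 'always cache', 'mid': 'never cache'}
print(describe_caching_method(50))  # {'sample': 'release', 'mid': 'never cache'}
```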

Never Cache Settings

When a variable is set to Never Cache its result, its result must be recomputed every time it is requested. If its parents are cached, those cached values will be used, and only the variable's own definition will need to be re-evaluated.

Suppose a variable X is set to never cache its result, and X has two children, Y1 and Y2, both of which use X in their definition. When these are computed, X's definition will need to be re-evaluated at least twice -- once when Y1 is computed, and again when Y2 is computed. It may end up re-evaluating X many more than just two times. Suppose Y1 is defined as X + X. In this case, X will be re-evaluated twice during the evaluation of Y1. You can avoid that extra work using:

Var x:=x do x + x

which temporarily stores the value of x in a local variable with the same name.
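The cost difference between X + X and the local-variable form can be counted in a short Python sketch. It only mimics the behavior described above; `UncachedVariable`, `y1_naive` and `y1_with_local` are hypothetical names, and the value 21 is arbitrary.

```python
class UncachedVariable:
    """Stand-in for a never-cached variable: every request re-evaluates it."""

    def __init__(self, definition):
        self.definition = definition
        self.eval_count = 0

    def value(self):
        self.eval_count += 1
        return self.definition()


def y1_naive(x):
    # Analogous to Y1 := X + X, where X is requested twice.
    return x.value() + x.value()


def y1_with_local(x):
    # Analogous to: Var x := x do x + x, where X is requested once.
    local_x = x.value()
    return local_x + local_x


x = UncachedVariable(lambda: 21)
y1_naive(x)
print(x.eval_count)        # 2

x = UncachedVariable(lambda: 21)
y1_with_local(x)
print(x.eval_count)        # 1
```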

If you later attempt to view the result window or result graph for X, the user interface will have to re-evaluate X's definition again. While the result window is open, small changes may cause the UI to re-read the result, in which case it may be again re-evaluated. Every time you switch to a different view (e.g., from PDF to CDF), the sample value will be re-evaluated. In exchange for recovering the memory from X's cached value, quite a bit of convenience may be sacrificed. On the other hand, if the intermediate variable is not of primary interest, this may be an insignificant price to pay.

If you are using the cache control setting to save on memory, then you should use CachingMethod = 2 rather than CachingMethod = 4. Option 4 has no memory savings over option 2. Option 4 is there for the rare case where you intentionally wish to force a result to be recomputed every time it is requested, even though the result may be scalar.

Release Cache Setting

You can also configure a variable to release its cached value when all its children have been computed. This setting can help preserve the efficiency gained by avoiding the recomputation of intermediate variables, while still enabling that memory to be reclaimed.

Consider again the simple example where Y1 := X + X and Y2 := F(X), where F is some function (we'll assume the parameter is a Context mode parameter). Suppose that X is configured to release its cached value (CachingMethod = 3). When Y1 is evaluated, X gets evaluated and its value cached when the first X is evaluated. When the second X is requested, its value is immediately available without recomputation. Y1 can now be computed (as X + X) and its value cached (if it is configured to do so). Later, when Y2 is evaluated, the value of X is already cached and does not need to be recomputed. Y1 and Y2 are both computed with only one evaluation of X. Upon the completion of Y2's evaluation, the cached value for X is released, and that memory is reclaimed.

If you later attempt to display the value of X in a result window, Analytica must re-evaluate its definition from scratch, since the value is no longer cached. The newly recomputed value does not get re-cached as long as the results for Y1 and Y2 remain valid, so from this point it behaves similarly to a never-cached variable. Various minor GUI changes may require reevaluating X.

If you happen to view the result for X in a result window after Y1 had been computed, but before Y2 has been computed, the cached value is available for the user-interface to use, and in that case, no recomputation occurs.
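The release-cache walkthrough above can be traced with a toy Python model. It is a sketch under simplifying assumptions: the set of children is known up front, and each child reports when it is finished. Analytica's real bookkeeping is more conservative than this. `ReleasingVariable` and `child_done` are hypothetical names.

```python
class ReleasingVariable:
    """Toy variable that drops its cache once all children are computed."""

    def __init__(self, definition, children):
        self.definition = definition
        self.pending = set(children)      # children that are not yet computed
        self.cached = None
        self.has_cache = False
        self.eval_count = 0

    def value(self):
        if self.has_cache:
            return self.cached
        self.eval_count += 1
        result = self.definition()
        if self.pending:                  # cache only while a child needs it
            self.cached, self.has_cache = result, True
        return result

    def child_done(self, child):
        self.pending.discard(child)
        if not self.pending:              # last child finished: release
            self.cached, self.has_cache = None, False


x = ReleasingVariable(lambda: 3, children={"Y1", "Y2"})
y1 = x.value() + x.value()   # X is evaluated once; the second use hits the cache
x.child_done("Y1")           # Y2 is still pending, so X stays cached
y2 = x.value() ** 2          # stands in for F(X); no recomputation
x.child_done("Y2")           # all children computed: the cache is released
x.value()                    # like opening a result window: X is re-evaluated
print(x.eval_count)          # 2
```

After the release, each further request re-evaluates X without re-caching it, which matches the never-cached behavior described above.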

While the release-cache setting may sound good in theory, you may find it substantially less useful than you may have expected. When Analytica determines whether all children are computed, it does so in a very conservative fashion. If there is any possibility that a child may still need the value in the future, then it cannot release the cached value. In fact, if Analytica cannot prove that the child is done with the value, it will not release it. In some cases, the child's definition may be so complex that even though the child will not need the value again, Analytica is simply unable to prove that this is the case. The problematic cases often arise as the result of functions that change the evaluation context. Consider the following example, where Y is the only child of X.

Variable Y := Z + Mean(X)

Once Y's mid-value has been computed, you might hope that X's cached sample value would be released. However, Analytica cannot release X's sample value until Y's mid and sample values are both computed. As long as Y's sample value remains uncomputed, there is a possibility that it may still need to use X's sample value. Most likely, you'll compute your model's final results, and Y will end up being computed in just one evaluation mode. During that computation, X will never relinquish its cached value. Unfortunately, these cases where Analytica cannot prove that the child is done with the value are more common than you might think, and they limit the usefulness of this configuration substantially.

If your final result is a number or the value Null, then releasing the cached value does not save any memory. In these cases, Analytica does not release the cached value even if you have your CachingMethod set to 3.

Interaction with Monte Carlo Sampling

Setting the CachingMethod of a chance variable (or any variable that uses a distribution function or the Random function) to anything other than Always Cache can create havoc. Analytica does not prevent you from doing this, but you should understand the consequences before you do.

Let's look at an example:

Chance U := Uniform(0, 1)
Variable Z1 := U^2
Variable Z2 := U*2
Variable Z := Z2^2/Z1

If you do not change the CachingMethod of U, Z will evaluate to 4 for every sample value. But consider what happens if you set the CachingMethod of U to 2 (never cache). When Z1 is evaluated, U generates a random Monte Carlo sample, which is then used to compute U^2. U then discards that sample without caching it. Later, when Z2 is computed, U generates a new Monte Carlo sample, which is independent of the one used by Z1. Instead of getting the expected Z = 4 for every sample, a scatter of values emerges. This has the potential to distort your results and mislead you.
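The effect can be reproduced outside Analytica with a few lines of Python. This is only a simulation of the example above: `sample_z` is a hypothetical function, the sample size and seed are arbitrary, and the draws come from Python's `random` module rather than from Analytica's sampler.

```python
import random


def sample_z(cache_u, sample_size=5, seed=0):
    """Return a sample of Z := Z2^2 / Z1 with U either cached or re-drawn."""
    rng = random.Random(seed)

    def draw_u():
        # Uniform draws in (0, 1], so that Z1 is never zero.
        return [1.0 - rng.random() for _ in range(sample_size)]

    if cache_u:
        u_for_z1 = u_for_z2 = draw_u()    # one sample shared by Z1 and Z2
    else:
        u_for_z1 = draw_u()               # sample seen by Z1
        u_for_z2 = draw_u()               # independent sample seen by Z2

    z1 = [u ** 2 for u in u_for_z1]       # Z1 := U^2
    z2 = [u * 2 for u in u_for_z2]        # Z2 := U*2
    return [b ** 2 / a for a, b in zip(z1, z2)]


print(sample_z(cache_u=True))     # every element equals 4
print(sample_z(cache_u=False))    # a scatter of values
```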

The lesson here is that you must be very careful about which variables you configure to not cache, or to release, their results.

Limitations

The manually-controlled CachingMethod is non-robust when used in combination with some of Analytica's more esoteric features. You should use it only in fairly straightforward situations. Here we'll mention some of the more esoteric cases that could result in problematic interactions.

  • Can't release when used by UDF
    You cannot use the Release-when-children-computed (CachingMethod = 3) configuration when the variable is used directly by a User-Defined function.
  • You cannot use the never-cache method (CachingMethod = 2) for a variable defined by ComputedBy.
    • It is a bad idea to use the release-cache method (CachingMethod = 3) on a ComputedBy variable. This is allowed, but you may not be able to view the result in a result window later.
  • There may be complex interactions with the following functions. Thus, it is preferable if you avoid changing the CachingMethod on variables that interact with these functions (some of these interactions may be fine, but we're just warning you in advance):
  • When a child makes use of a function with any of the following qualifiers, your value will usually not be relinquished until both mid and sample values for the child are fully computed.
