Difference between revisions of "Aggregate"

Line 1: Line 1:
 
[[Category:Transforming functions]]
 
[[Category:Transforming functions]]
 
[[Category:Array Library]]
 
[[Category:Array Library]]
 +
 
== Aggregate(x, map, i, targetIndex) ==
 
== Aggregate(x, map, i, targetIndex) ==
Aggregates the values in array «x» over index «i» to produce a result indexed by «targetIndex», where map (indexed by «i») gives the value of «targetIndex» for each element of «i». It gives a warning if any value in map is not in index «targetIndex».
+
Aggregates the values in array «x» over index «i» to produce a result indexed by «targetIndex», where «map» (indexed by «i») gives the value of «targetIndex» for each element of «i». It gives a warning if any value in «map» is not in index «targetIndex».
  
 
== Examples ==
 
== Examples ==
 
 
If <code>Population_by_country</code> is indexed by <code>Country</code> and the map <code>Continent_by_country</code> (indexed by <code>Country</code>) gives the continent containing each country, where <code>Continent</code> is an index listing the <code>Continents</code>, then
 
If <code>Population_by_country</code> is indexed by <code>Country</code> and the map <code>Continent_by_country</code> (indexed by <code>Country</code>) gives the continent containing each country, where <code>Continent</code> is an index listing the <code>Continents</code>, then
  
Line 13: Line 13:
  
 
Similarly, given the following variables:
 
Similarly, given the following variables:
Index Months := [[Sequence]]([[MakeDate]](2009, 1, 1), [[MakeDate]](2019, 12, 1), dateUnit:"M")
+
:<code>Index Months := Sequence(MakeDate(2009, 1, 1), MakeDate(2019, 12, 1), dateUnit: "M")</code>
Index Years := 2009..2020
+
:<code>Index Years := 2009..2020</code>
Year_by_months := [[DatePart]](Months,"Y")
+
:<code>Year_by_months := DatePart(Months, "Y")</code>
Revenue_by_month := { an array indexed by Months }
+
:<code>Revenue_by_month := { an array indexed by Months }</code>
&nbsp;
+
:<code>Revenue_by_year := Aggregate(Revenue_by_month, Year_by_months, Months, Years)</code>
Revenue_by_year := [[Aggregate]](Revenue_by_month, Year_by_months, Months, Years)
 
  
 
==Optional Parameters==
 
==Optional Parameters==
=== ''type'' ===
+
=== Type ===
The aggregation method: Average, Max, Min, or Median, or Sum (which is the default). Any other array-reducing function -- i.e. a function reduces the number of dimensions by one -- will work. You can even provide a User-defined function that is an array-reducing function. You may specify the «type» parameter as a text value -- e.g. "Product" -- or a handle to the function -- e.g. Handle(Product).  
+
The aggregation method: <code>Average, Max, Min, Median</code>, or <code>Sum</code> (default). Any other array-reducing function -- i.e. a function reduces the number of dimensions by one -- will work. You can even provide a [[User-Defined function|user-defined fuction]] that is an [[Array-reducing functions|array-reducing function]]. You may specify the «type» parameter as a text value -- e.g. <code>"Product"</code> -- or a handle to the function -- e.g. <code>Handle(Product)</code>.  
  
=== ''positional'' ===
+
=== Position ===
False by default. If True, «map» must contain the integer positions of the corresponding element of «targetIndex» rather than their values.
+
<code>False</code> by default. If <code>True</code>, «map» must contain the integer positions of the corresponding element of «targetIndex» rather than their values.
  
=== ''defaultValue'' ===
+
===DefaultValue ===
 
The value in the result array when no value in «x» maps to that value of «targetIndex».  No warning is issued when there are result cells with no data. The default «defaultValue» is [[Null]]. If you'd like to use a different value, such as 0, you can specify the value like so:
 
The value in the result array when no value in «x» maps to that value of «targetIndex».  No warning is issued when there are result cells with no data. The default «defaultValue» is [[Null]]. If you'd like to use a different value, such as 0, you can specify the value like so:
  
<code>[[Aggregate]](x, map, i, targetIndex, type:"SDeviation", defaultValue:0)</code>
+
<code>Aggregate(x, map, i, targetIndex, type: "SDeviation", defaultValue: 0)</code>
  
 
<tip title="Note">Aggregate ignores [[Null]] values in «x».</tip>
 
<tip title="Note">Aggregate ignores [[Null]] values in «x».</tip>
Line 37: Line 36:
 
The optional «type» parameter specifies the aggregation type.  The parameter names an existing [[:Category:Array-reducing functions|array-reducing function]], either built-in or [[User-Defined Functions|user-defined]].  You can supply either the textual name of the function, or a handle to the function (or even an array to these).  Built-in functions that can be applied include [[Sum]] (default), [[Max]], [[Min]], [[Average]] or [[Mean]], [[Product]], [[Median]], [[SDeviation]], [[Variance]], [[Skewness]], [[Kurtosis]], and [[Join]].   
 
The optional «type» parameter specifies the aggregation type.  The parameter names an existing [[:Category:Array-reducing functions|array-reducing function]], either built-in or [[User-Defined Functions|user-defined]].  You can supply either the textual name of the function, or a handle to the function (or even an array to these).  Built-in functions that can be applied include [[Sum]] (default), [[Max]], [[Min]], [[Average]] or [[Mean]], [[Product]], [[Median]], [[SDeviation]], [[Variance]], [[Skewness]], [[Kurtosis]], and [[Join]].   
  
Revenue_dev := [[Aggregate]](Revenue_by_month,MonthToYear,Month,Year,Type:"SDeviation")
+
:<code>Revenue_dev := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "SDeviation")</code>
Peak_revenue:= [[Aggregate]](Revenue_by_month,MonthToYear,Month,Year,Type:[[Handle]](Max))
+
:<code>Peak_revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: Handle(Max))</code>
Ave_revenue := [[Aggregate]](Revenue_by_month,MonthToYear,Month,Year,Type:"Average")
+
:<code>Ave_revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "Average")</code>
  
 
You can also create your own custom aggregation method in the form of a [[User-Defined Function]].  The function should accept an array parameter and an index, and should [[:Category:Array-reducing functions|reduce]] the array so that the result eliminates the indicated index.   
 
You can also create your own custom aggregation method in the form of a [[User-Defined Function]].  The function should accept an array parameter and an index, and should [[:Category:Array-reducing functions|reduce]] the array so that the result eliminates the indicated index.   
  
 
The following would all be examples of legal aggregation functions:
 
The following would all be examples of legal aggregation functions:
Function First(A:Array[I] ; I:Index) := A[@I=1]
+
:<code>Function First(A: Array[I]; I: Index) := A[@I = 1]</code>
Function Last(A:Array[I] ; I:Index) := A[@I=[[Size]](I)]
+
:<code>Function Last(A: Array[I]; I: Index) := A[@I = Size(I)]</code>
Function Middle(A:Array[I] ; I:Index) := A[@I=[[Floor]]([[Size]](I)/2)+1]
+
:<code>Function Middle(A: Array[I]; I: Index) := A[@I = Floor(Size(I)/2) + 1]</code>
Function PosMax(A:Array[I] ; I:Index) := [[ArgMax]](A,I,position:true)
+
:<code>Function PosMax(A: Array[I]; I: Index) := ArgMax(A, I, position: true)</code>
  
These would be used in the same fasion as other types:
+
These would be used in the same fashion as other types:
Opening_Revenue := [[Aggregate]](Revenue_by_month,MonthToYear,Month,Year,Type:"First")
+
:<code>Opening_Revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "First")</code>
Closing_Revenue := [[Aggregate]](Revenue_by_month,MonthToYear,Month,Year,Type:"Last")
+
:<code>Closing_Revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "Last")</code>
Best_Month := [[Aggregate]](Revenue_by_month,MonthToYear,Month,Year,Type:Handle(PosMax))
+
:<code>Best_Month := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: Handle(PosMax))</code>
  
 
If you desire to use "Count" as an aggregation type, you simply need to use a boolean expression for the first parameter, and do not need to specify the «type» parameter.  For example, to count the number of months in each year with positive revenue:
 
If you desire to use "Count" as an aggregation type, you simply need to use a boolean expression for the first parameter, and do not need to specify the «type» parameter.  For example, to count the number of months in each year with positive revenue:
[[Aggregate]](Revenue_by_month>0,MonthToYear,Month,Year)
+
:<code>Aggregate(Revenue_by_month > 0, MonthToYear, Month, Year)</code>
  
 
Note: Aggregation (aka conglomeration) functions can also be utilized by the [[MdTable]] function.  The requirements are the same.
 
Note: Aggregation (aka conglomeration) functions can also be utilized by the [[MdTable]] function.  The requirements are the same.
  
 
== Positional Maps ==
 
== Positional Maps ==
 
 
The «map» parameters to specifies how each element in the original index «I» maps into the «targetIndex».  This is usually a many-to-one mapping.  Your «map» is necessarily indexed by «I», and each element of «map» specifies an element of «targetIndex».  When specifying the target values, you can do so either [[Associative vs. Positional_Indexing|by value or by position]].  The default assumes that map contains values from «targetIndex», but by specifying the optional «positional» parameter as true, «map» is interpreted as containing positions.
 
The «map» parameters to specifies how each element in the original index «I» maps into the «targetIndex».  This is usually a many-to-one mapping.  Your «map» is necessarily indexed by «I», and each element of «map» specifies an element of «targetIndex».  When specifying the target values, you can do so either [[Associative vs. Positional_Indexing|by value or by position]].  The default assumes that map contains values from «targetIndex», but by specifying the optional «positional» parameter as true, «map» is interpreted as containing positions.
  
Variable M_to_Y_pos := [[Floor]]( (@Month-1)/12 ) + 1
+
:<code>Variable M_to_Y_pos := Floor((@Month-1)/12) + 1</code>
Variable Annual_Rev := [[Aggregate]](Revenue_by_month,M_to_Y_pos,Month,Year,positional:true)
+
:<code>Variable Annual_Rev := Aggregate(Revenue_by_month, M_to_Y_pos, Month, Year, positional: true)</code>
  
 
The «map» may contain [[Null]] values.  With a positional mapping, any «I»-position in which «map» is [[Null]] does not get included in the aggregated result.  With a value-based mapping, these elements are included only if one of the elements of «targetIndex» is [[«null»]] (which is uncommon).
 
The «map» may contain [[Null]] values.  With a positional mapping, any «I»-position in which «map» is [[Null]] does not get included in the aggregated result.  With a value-based mapping, these elements are included only if one of the elements of «targetIndex» is [[«null»]] (which is uncommon).
Line 73: Line 71:
 
Just as with aggregate, there are many potential types of de-aggregation. For example, when de-aggregating annual data down to months, we might use the annual value for every month in that year, or split it uniformly among all the target months, or perhaps assign it to only one month, using zero for all other months in that year.
 
Just as with aggregate, there are many potential types of de-aggregation. For example, when de-aggregating annual data down to months, we might use the annual value for every month in that year, or split it uniformly among all the target months, or perhaps assign it to only one month, using zero for all other months in that year.
  
The following uses the annual value for every month in that year.  In the example, ''Annual_val'' is indexed by ''Year'', while the result is indexed by ''Month'':
+
The following uses the annual value for every month in that year.  In the example, <code>Annual_val</code> is indexed by<code>Year</code>, while the result is indexed by <code>Month</code>:
Annual_val[ Year = MonthToYear ]
+
:<code>Annual_val[Year = MonthToYear]</code>
  
 
To spread the value uniformly across all months, we can do this:
 
To spread the value uniformly across all months, we can do this:
[[Var..Do|Var]] MonthsPerYear := [[Aggregate]](1,MonthToYear,Month,Year);
+
:<code>Var MonthsPerYear := Aggregate(1, MonthToYear, Month, Year);</code>
(Annual_val / MonthsPerYear)[ Year = MonthToYear ]
+
:<code>(Annual_val/MonthsPerYear)[Year = MonthToYear]</code>
 +
 
 +
More generally, if we have a weighting, <code>w</code>, indexed by <code>Month</code>, which specifies the proportion of annual revenue each month receives, we can use:
 +
:<code>w*Annual_val[Year = MonthsToYear]</code>
  
More generally, if we have a weighting, ''w'', indexed by Month, which specifies the proportion of annual revenue each month receives, we can use:
+
If you need to ensure that <code>w</code> is normalized for each month, use as the weight:
w * Annual_val[ Year = MonthsToYear ]
+
:<code>w := (w0/Aggregate(1, MonthToYear, Month, Year)[Year = MonthToYear])</code>
  
If you need to ensure that ''w'' is normalized for each month, use as the weight:
+
where <code>w0</code> is the unnormalized weight.  Using <code>w0 := 1</code> yields the earlier example of spreading evenly across all months for the year.
w := (w0 / [[Aggregate]](1,MonthToYear,Month,Year)[Year=MonthToYear])
 
where ''w0'' is the unnormalized weight.  Using ''w0:=1'' yields the earlier example of spreading evenly across all months for the year.
 
  
Using an array ''w'' that has a 1 in the first month of each year, and 0 for all remaining months, yields a de-aggregation that assigns the full annual value to the first month of each year:
+
Using an array <code>w</code> that has a 1 in the first month of each year, and 0 for all remaining months, yields a de-aggregation that assigns the full annual value to the first month of each year:
Index Firsts := [[Unique]](MonthToYear,Month);
+
:<code>Index Firsts := Unique(MonthToYear, Month);</code>
(@[Firsts=Month]>0) * Annual_val[Year=MonthToYear]
+
:<code>(@[Firsts = Month] > 0)*Annual_val[Year = MonthToYear]</code>
  
 
== Other methods for Aggregation ==
 
== Other methods for Aggregation ==
 
In many existing Analytica models, one will see the following method used for aggregation:
 
In many existing Analytica models, one will see the following method used for aggregation:
[[Sum]]( (MonthToYear=Year) * Revenue, Month )
+
:<code>Sum((MonthToYear = Year) * Revenue, Month)</code>
  
 
This expression is functionally equivalent to:
 
This expression is functionally equivalent to:
[[Aggregate]](Revenue,MonthToYear,Month,Year)
+
:<code>Aggregate(Revenue, MonthToYear, Month, Year)</code>
  
However, the use of [[Aggregate]] will evaluate much faster.  When using the dot-product method with [[Sum]], the intermediate expression (MonthToYear=Year) expands to a full 2-D array (Month x Year), which even though it is very sparse, still consumes lots of memory and time to process.  In rough terms, the [[Sum]]-based method is O(n<sup>2</sup>) complexity, while [[Aggregate]] is only O(n).
+
However, the use of [[Aggregate]] will evaluate much faster.  When using the dot-product method with [[Sum]], the intermediate expression <code>(MonthToYear = Year</code>) expands to a full 2-D array (<code>Month x Year</code>), which even though it is very sparse, still consumes lots of memory and time to process.  In rough terms, the [[Sum]]-based method is ''O(n<sup>2</sup>)'' complexity, while [[Aggregate]] is only ''O(n)''.
  
 
The [[Frequency]] function can be used to aggregate, with either "sum" type aggregation (using the data itself as the weighting parameter) or "count" type aggregation.  [[Frequency]] is in fact quite efficient for this purpose, but the usage for the purpose is unintuitive, making [[Aggregate]] the preferred alternative.
 
The [[Frequency]] function can be used to aggregate, with either "sum" type aggregation (using the data itself as the weighting parameter) or "count" type aggregation.  [[Frequency]] is in fact quite efficient for this purpose, but the usage for the purpose is unintuitive, making [[Aggregate]] the preferred alternative.
  
 
[[Slice Assignment]] can be used in a procedural-programming style to aggregate.  This is a very general-purpose approach (can be employed for a variety of aggregation types) and can also be quite efficient. The general approach is demonstrated here (for [[Max]]-type aggregation):
 
[[Slice Assignment]] can be used in a procedural-programming style to aggregate.  This is a very general-purpose approach (can be employed for a variety of aggregation types) and can also be quite efficient. The general approach is demonstrated here (for [[Max]]-type aggregation):
[[Var..Do|Var]] res:=[[Null]];
+
:<code>Var res := Null;</code>
[[For]] m:=Month Do (
+
:<code>For m := Month Do (</code>
    res[Year=MonthToYear] := [[Max]]( [res[Year=MonthToYear],Revenue[Month=m] ] )
+
::<code>res[Year = MonthToYear] := Max([res[Year = MonthToYear], Revenue[Month = m]])</code>
);
+
:<code>);</code>
res
+
:<code>res</code>
  
 
==History==
 
==History==

Revision as of 20:30, 19 January 2016


Aggregate(x, map, i, targetIndex)

Aggregates the values in array «x» over index «i» to produce a result indexed by «targetIndex», where «map» (indexed by «i») gives the value of «targetIndex» for each element of «i». It gives a warning if any value in «map» is not in index «targetIndex».

Examples

If Population_by_country is indexed by Country and the map Continent_by_country (indexed by Country) gives the continent containing each country, where Continent is an index listing the Continents, then

Aggregate(Population_by_country, Continent_by_country, Country, Continent)

returns the population by continent.

Similarly, given the following variables:

Index Months := Sequence(MakeDate(2009, 1, 1), MakeDate(2019, 12, 1), dateUnit: "M")
Index Years := 2009..2020
Year_by_months := DatePart(Months, "Y")
Revenue_by_month := { an array indexed by Months }
Revenue_by_year := Aggregate(Revenue_by_month, Year_by_months, Months, Years)

Optional Parameters

Type

The aggregation method: Average, Max, Min, Median, or Sum (default). Any other array-reducing function -- i.e. a function reduces the number of dimensions by one -- will work. You can even provide a user-defined fuction that is an array-reducing function. You may specify the «type» parameter as a text value -- e.g. "Product" -- or a handle to the function -- e.g. Handle(Product).

Position

False by default. If True, «map» must contain the integer positions of the corresponding element of «targetIndex» rather than their values.

DefaultValue

The value in the result array when no value in «x» maps to that value of «targetIndex». No warning is issued when there are result cells with no data. The default «defaultValue» is Null. If you'd like to use a different value, such as 0, you can specify the value like so:

Aggregate(x, map, i, targetIndex, type: "SDeviation", defaultValue: 0)

Note
Aggregate ignores Null values in «x».

Aggregation types

The optional «type» parameter specifies the aggregation type. The parameter names an existing array-reducing function, either built-in or user-defined. You can supply either the textual name of the function, or a handle to the function (or even an array to these). Built-in functions that can be applied include Sum (default), Max, Min, Average or Mean, Product, Median, SDeviation, Variance, Skewness, Kurtosis, and Join.

Revenue_dev := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "SDeviation")
Peak_revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: Handle(Max))
Ave_revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "Average")

You can also create your own custom aggregation method in the form of a User-Defined Function. The function should accept an array parameter and an index, and should reduce the array so that the result eliminates the indicated index.

The following would all be examples of legal aggregation functions:

Function First(A: Array[I]; I: Index) := A[@I = 1]
Function Last(A: Array[I]; I: Index) := A[@I = Size(I)]
Function Middle(A: Array[I]; I: Index) := A[@I = Floor(Size(I)/2) + 1]
Function PosMax(A: Array[I]; I: Index) := ArgMax(A, I, position: true)

These would be used in the same fashion as other types:

Opening_Revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "First")
Closing_Revenue := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: "Last")
Best_Month := Aggregate(Revenue_by_month, MonthToYear, Month, Year, Type: Handle(PosMax))

If you desire to use "Count" as an aggregation type, you simply need to use a boolean expression for the first parameter, and do not need to specify the «type» parameter. For example, to count the number of months in each year with positive revenue:

Aggregate(Revenue_by_month > 0, MonthToYear, Month, Year)

Note: Aggregation (aka conglomeration) functions can also be utilized by the MdTable function. The requirements are the same.

Positional Maps

The «map» parameters to specifies how each element in the original index «I» maps into the «targetIndex». This is usually a many-to-one mapping. Your «map» is necessarily indexed by «I», and each element of «map» specifies an element of «targetIndex». When specifying the target values, you can do so either by value or by position. The default assumes that map contains values from «targetIndex», but by specifying the optional «positional» parameter as true, «map» is interpreted as containing positions.

Variable M_to_Y_pos := Floor((@Month-1)/12) + 1
Variable Annual_Rev := Aggregate(Revenue_by_month, M_to_Y_pos, Month, Year, positional: true)

The «map» may contain Null values. With a positional mapping, any «I»-position in which «map» is Null does not get included in the aggregated result. With a value-based mapping, these elements are included only if one of the elements of «targetIndex» is «null» (which is uncommon).

Inverses of Aggregate (De-aggregation)

Since Aggregate is used to map data from a finer-grained index to a coarser-grained index, it is natural to ask how the inverse is accomplished -- namely, mapping from a coarse-grained index to a fine-grain index.

Just as with aggregate, there are many potential types of de-aggregation. For example, when de-aggregating annual data down to months, we might use the annual value for every month in that year, or split it uniformly among all the target months, or perhaps assign it to only one month, using zero for all other months in that year.

The following uses the annual value for every month in that year. In the example, Annual_val is indexed byYear, while the result is indexed by Month:

Annual_val[Year = MonthToYear]

To spread the value uniformly across all months, we can do this:

Var MonthsPerYear := Aggregate(1, MonthToYear, Month, Year);
(Annual_val/MonthsPerYear)[Year = MonthToYear]

More generally, if we have a weighting, w, indexed by Month, which specifies the proportion of annual revenue each month receives, we can use:

w*Annual_val[Year = MonthsToYear]

If you need to ensure that w is normalized for each month, use as the weight:

w := (w0/Aggregate(1, MonthToYear, Month, Year)[Year = MonthToYear])

where w0 is the unnormalized weight. Using w0 := 1 yields the earlier example of spreading evenly across all months for the year.

Using an array w that has a 1 in the first month of each year, and 0 for all remaining months, yields a de-aggregation that assigns the full annual value to the first month of each year:

Index Firsts := Unique(MonthToYear, Month);
(@[Firsts = Month] > 0)*Annual_val[Year = MonthToYear]

Other methods for Aggregation

In many existing Analytica models, one will see the following method used for aggregation:

Sum((MonthToYear = Year) * Revenue, Month)

This expression is functionally equivalent to:

Aggregate(Revenue, MonthToYear, Month, Year)

However, the use of Aggregate will evaluate much faster. When using the dot-product method with Sum, the intermediate expression (MonthToYear = Year) expands to a full 2-D array (Month x Year), which even though it is very sparse, still consumes lots of memory and time to process. In rough terms, the Sum-based method is O(n2) complexity, while Aggregate is only O(n).

The Frequency function can be used to aggregate, with either "sum" type aggregation (using the data itself as the weighting parameter) or "count" type aggregation. Frequency is in fact quite efficient for this purpose, but the usage for the purpose is unintuitive, making Aggregate the preferred alternative.

Slice Assignment can be used in a procedural-programming style to aggregate. This is a very general-purpose approach (can be employed for a variety of aggregation types) and can also be quite efficient. The general approach is demonstrated here (for Max-type aggregation):

Var res := Null;
For m := Month Do (
res[Year = MonthToYear] := Max([res[Year = MonthToYear], Revenue[Month = m]])
);
res

History

See Also

Comments


You are not allowed to post comments.