Aggregate calculations

We've already considered aggregations such as sum, min, and max in Tableau. Often, you'll use fields as simple aggregations in the view. But sometimes, you'll want to use aggregations in more complex calculations.

For example, you might be curious to explore the percentage of the rent that was discounted. There is no such field in the data. It could not really be stored in the source, because the value changes based on the level of detail present in the view (for example, the percent discounted for an individual unit will be different to the percent discounted per floor or per building). Rather, it must be calculated as an aggregate and recalculated as the level of detail changes.

Let's create a calculation named Discount % with the following code:

SUM([Discount]) / SUM([Rent]) 

This code indicates that the sum of Discount should be divided by the sum of Rent. This means that all the values of Discount will be added, and all the values of Rent will be added. Only after the sums are calculated will the division occur.

Once you've created the calculation, you'll notice that Tableau treats the new field as a Measure in the data pane. Tableau will treat any calculation with a numeric result as a measure by default, but you can change row-level calculations to dimensions if desired. In this case, though, you are not even able to redefine the new field as a dimension. The reason for this is that Tableau will treat every aggregate calculation as a measure, no matter what data type is returned. This is because an aggregate calculation depends on dimensions to define the level of detail at which the calculation is performed. So, an aggregate calculation cannot be a dimension itself.

As the value of your calculation is a percentage, you will also likely want to define the format as a percentage. To do this, right-click the Discount % field, select Default Properties | Number Format, and select Percentage. You may adjust the number of decimal places that are displayed if desired.

Now, create a couple of views to see how the calculation returns different results, depending on the level of detail in the view. First, we'll build a view to look at each individual rental period:

  1. Place Building, Room, Full Name, Start, and End on Rows.
  2. In the data pane, under Measures, double-click each of the following fields: Rent, Discount, and Discount %. Tableau will place each of these measures in the view by using Measure Names and Measure Values.
  3. Rearrange the fields on the Measure Values shelf so that the order is Rent, Discount, and Discount %:

    Figure 4.7: Illustrates the Discount % calculated at the level of Building, Room, Full Name, Start, and End

You can see the percentage given by way of discount for each rental period. However, notice how the values change when you remove all fields except Building and Room:

Figure 4.8: Illustrates the Discount % calculated at the level of Building and Room

Why did the values change? Because aggregations depend on what dimensions are defining the level of detail of the view. In the first case, Building and Room, Full Name, Start, and End defined the level of detail in the view. So, the calculation added up all the rent for each rental period and all the discounts for the rental period and then divided them. In the second case, Building and Room redefine the level of detail. So, the calculation added up all the prices for each building and room and all the discounts for each building and room and then divided them.

You may have noticed that as you double-clicked each measure, it was added to the pane of the view in a special way. Measure Names and Measure Values are special fields that appear in every data connection (toward the bottom of the Data pane). These serve as placeholders for multiple measures that share the same space in the view.

In the view you just created, for example, three measures all shared space in the pane. Measure Values on Text indicated that all values of measures on the Measure Values shelf should be displayed as text. The Measure Names field on Columns created a column for each measure, with the value of the name of that measure.

Notice that the values change again, as expected, if you look at the overall dataset without slicing by any dimensions:

Figure 4.9: Illustrates the Discount % calculated at the highest possible level: the entire dataset

An easy way to get Tableau to implement Measure Names / Measure Values is to remember that they are used whenever you want to use two or more measures in the same space in a view. So, if you want to use two or more measures in the pane, drag the first to the pane and then the second. If you want to use two or more measures on the same axis, drag the first to the axis, and then drag and drop the second on the same spot.

Now that you have a basic understanding of row-level and aggregate calculations, let's consider why the distinction is important.

Why the row level versus aggregate difference matters

Let's say you created a Discount % (row level) calculation with the following code:

[Discount]/[Rent] 

The code differs from the aggregate calculation you created previously, which had the following code:

SUM([Discount])/SUM([Rent]) 

Here is the dramatic difference in results:

Figure 4.10: Illustrates the Discount % calculated as a row-level value and as an aggregate

Why is there such a difference in the results? It's a result of the way the calculations were performed.

Notice that Discount % (row level) appears on the Measure Values shelf as a SUM. That's because the calculation is a row-level calculation, so it gets calculated row by row and then aggregated as a measure after all row-level values have been determined. The 54.00% value you see is actually a sum of percentages that were calculated in each record of underlying data.

In fact, the row-level calculation and the final aggregation is performed like this:

Figure 4.11: If each Discount % result is calculated at a row level and then aggregated, the result is wrong

Contrast that with the way the aggregate calculation is performed. Notice that the aggregation that's listed in the active field on the Measure Values shelf in the view is AGG, and not SUM. This indicates that you have defined the aggregation in the calculation. Tableau is not aggregating the results further. Here is how the aggregate calculation is performed:

Figure 4.12: If the numerator and denominator are aggregated first, then the Discount % calculation is correct

It is vital to understand the difference between row-level and aggregate calculations to ensure you are getting the results you expect and need. In general, use row-level calculations when you are certain that you will use either the value as a dimension or that an aggregation of the row-level values will make sense. Use aggregate calculations if aggregations must be performed prior to other operations.

One of the most common error messages that's encountered while writing Tableau calculations is Cannot mix aggregate and non-aggregate arguments with this function. When you encounter this message, check your code to make sure you are not improperly mixing row-level fields and calculations with aggregate fields and calculations. For example, you cannot have something like [Discount] / SUM([Rent]).

This mixture of a row-level value (Discount) and the aggregation (SUM of Rent) is invalid.

With the distinction between row-level and aggregate calculations clear, let's take an interlude and discuss parameters before building additional examples.