Relating parts of the data to the whole

As you explore and analyze data, you'll often want to understand how various parts add up to a whole. For example, you'll ask questions such as the following:

  • How much does each electric generation method (wind, solar, coal, and nuclear) contribute to the total amount of energy produced?
  • What percentage of total profit is made in each state?
  • How much space does each file, subdirectory, and directory occupy on my hard disk?

These questions ask about the relationship between a part (production method, state, file or directory) and the whole (total energy, total profit, hard disk). Several types of visualizations, and variations on them, can aid you in your analysis.

Let's now look at some visualization examples that will aid us as we consider how to show part-to-whole relationships.

Stacked bars

We took a look at stacked bars in Chapter 1, Taking Off with Tableau, where we noted one significant drawback: it is difficult to compare values across most categories. Except for the leftmost (or bottom-most) segments, the bar segments have different starting points, so their lengths are much more difficult to compare. This doesn't mean stacked bars should never be used, but caution should be exercised to ensure clarity of communication.

Here, we are using stacked bars to visualize the makeup of the whole. We are less concerned with visually comparing across categories and more concerned with seeing the parts that make up a category.

For example, at the hospital, we might want to know what the patient population looks like within each type of department. Perhaps each patient was assigned a risk profile on admission.

We can visualize the number of visits broken down by risk profile as a stacked bar, like this:

Figure 3.31: A stacked bar chart showing the total number of patients per department and the breakdown of low and high risk

This gives a decent view of the visits for each department type. We can tell that more people visit the General departments and that the number of high-risk patients is about the same for Specialty and Labs. Intensive Care sees fewer high-risk patients and fewer patients overall. But this is only part of the story.

Consider a stacked bar that doesn't give the absolute value, but gives percentages for each type of department:

Figure 3.32: A stacked bar chart showing the relative number of high-risk and low-risk patients per department

Compare the previous two stacked bar charts. The fact that nearly 50% of patients in Intensive Care are considered High risk can be worked out from either chart, but only the second chart makes it immediately obvious.

None of the data has changed between the two charts, but the bars in the second chart represent the percent of the total for each type of department. You can no longer compare the absolute values, but comparing the relative breakdown of each department type is much easier. Although there are fewer patients in Intensive Care, a much higher percentage of them are in a high-risk category.
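The arithmetic behind the second chart can be sketched in a few lines of Python. This is only an illustration of the idea: the visit counts are invented and are not the values in the book's dataset, and the function name is hypothetical.

```python
# Hypothetical visit counts per department type and risk profile.
# These numbers are made up for illustration; they are not from the book's data.
visits = {
    "General": {"Low risk": 800, "High risk": 200},
    "Intensive Care": {"Low risk": 60, "High risk": 40},
}


def share_within_department(counts):
    """Return each risk profile's share of its department's total visits."""
    shares = {}
    for department, by_risk in counts.items():
        total = sum(by_risk.values())
        shares[department] = {risk: n / total for risk, n in by_risk.items()}
    return shares


shares = share_within_department(visits)

# Print the absolute count next to the relative share for each segment.
for department, by_risk in shares.items():
    for risk, share in by_risk.items():
        count = visits[department][risk]
        print(f"{department:15s} {risk:10s} {count:4d} {share:6.1%}")
```

With these invented numbers, Intensive Care has far fewer high-risk visits than General (40 versus 200) yet a higher high-risk share (40% versus 20%), which is the kind of contrast the percentage chart brings out.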

Let's consider how the preceding charts can be created and even combined into a single visualization in Tableau. We'll use a quick table calculation. We'll cover table calculations in much more detail in Chapter 6, Diving Deep with Table Calculations. Here, simply follow these steps:

  1. Create a stacked bar chart by placing Department Type on Rows, Number of Patient Visits on Columns, and Patient Risk Profile on Color. You'll now have a single stacked bar chart.
  2. Sort the bar chart in descending order.
  3. Duplicate the Number of Patient Visits field on Columns by holding down Ctrl while dragging the Number of Patient Visits field in the view to a new spot on Columns, immediately to the right of its current location. Alternatively, you can drag and drop the field from the data pane to Columns. At this point, you have two Number of Patient Visits axes which, in effect, duplicate the stacked bar chart:

    Figure 3.33: An interim step in creating the stacked bars

  4. Using the drop-down menu of the second Number of Patient Visits field, select Quick Table Calculation | Percent of Total. This table calculation runs a secondary calculation on the values that were returned from the data source to compute a percentage of the total. Here, you will need to further specify how that total should be computed.
  5. Using the same drop-down menu, select Compute Using | Patient Risk Profile. This tells Tableau to calculate the percent of each Patient Risk Profile within a given department. This means that the values will add up to 100% for each department.
  6. Turn on labels by clicking the T button on the top toolbar. This turns on default labels for each mark:

    Figure 3.34: This toolbar option toggles labels on/off

After following the preceding steps, your completed stacked bar charts should appear as follows:

Figure 3.35: The final stacked bar view with absolute and relative values

Using both the absolute values and percentages in a single view can reveal significant aspects and details that might be obscured with only one of the charts.
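To see why the Compute Using choice in step 5 matters, here is a minimal Python sketch of a percent-of-total calculation with and without a partition. It assumes the calculation runs over already aggregated rows, which is how the steps describe it. The row values, field names, and the percent_of_total helper are all hypothetical, and this is not how Tableau implements table calculations internally.

```python
from collections import defaultdict

# Hypothetical aggregated rows, as a data source might return them.
# The counts are invented for illustration.
rows = [
    {"department": "General", "risk": "Low risk", "visits": 800},
    {"department": "General", "risk": "High risk", "visits": 200},
    {"department": "Intensive Care", "risk": "Low risk", "visits": 60},
    {"department": "Intensive Care", "risk": "High risk", "visits": 40},
]


def percent_of_total(rows, partition_by=None):
    """Divide each row's visits by the total of the partition it belongs to.

    partition_by=None treats the whole table as a single partition.
    """
    totals = defaultdict(int)
    for row in rows:
        key = row[partition_by] if partition_by else None
        totals[key] += row["visits"]
    return [
        row["visits"] / totals[row[partition_by] if partition_by else None]
        for row in rows
    ]


# One total for the whole table: all four values add up to 100%.
whole_table = percent_of_total(rows)

# Computing along risk profile restarts the total for every department,
# so the values add up to 100% within each department.
per_department = percent_of_total(rows, partition_by="department")
```

In this sketch, per_department corresponds to selecting Compute Using | Patient Risk Profile: the calculation moves along the risk profiles and starts over for each department type.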

Treemaps

Treemaps use a series of nested rectangles to display parts of the whole, especially within hierarchical relationships. Treemaps are particularly useful when you have hierarchies and dimensions with high cardinality (a high number of distinct values).

Here is an example of a treemap that shows the number of days spent in the hospital by patients. The largest rectangles show Department Type. Nested within those are departments, doctors, and patients:

Figure 3.36: A treemap showing part-to-whole relationship of Department Types / Departments / Doctors / Patients

To create a treemap, you simply need to place a measure on the Size shelf and a dimension on the Detail shelf. You can add additional dimensions to the level of detail to increase the detail of the view. Tableau will add borders of varying thickness to separate the levels of detail that are created by multiple dimensions. Note that in the preceding view, you can easily see the division of department types, then departments, then doctors, and finally individual patients. You can adjust the border of the lowest level by clicking the Color shelf.
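To make the idea of nested rectangles concrete, here is a small Python sketch that lays out a two-level hierarchy so that each rectangle's area is proportional to its measure. It uses a simple slice-and-dice layout chosen for brevity; Tableau's own layout algorithm is not described here and may well differ. The department names, the numbers, and the slice_and_dice function are hypothetical.

```python
def slice_and_dice(items, x, y, width, height, vertical=True):
    """Split a rectangle among (label, size) pairs in proportion to size.

    Returns a list of (label, x, y, width, height) tuples.
    """
    total = sum(size for _, size in items)
    rects = []
    offset = 0.0
    for label, size in items:
        fraction = size / total
        if vertical:  # cut with vertical lines, left to right
            rects.append((label, x + offset, y, width * fraction, height))
            offset += width * fraction
        else:  # cut with horizontal lines, top to bottom
            rects.append((label, x, y + offset, width, height * fraction))
            offset += height * fraction
    return rects


# Hypothetical days in hospital per department, grouped by department type.
hierarchy = {
    "General": {"Family Medicine": 30, "Pediatrics": 10},
    "Specialty": {"Cardiology": 40, "Neurology": 20},
}

# Outer level: one rectangle per department type, sized by the sum of its children.
outer = slice_and_dice(
    [(dept_type, sum(children.values())) for dept_type, children in hierarchy.items()],
    x=0, y=0, width=100, height=100, vertical=True,
)

# Inner level: split each outer rectangle among its departments, cutting the other way.
layout = {}
for dept_type, rx, ry, rw, rh in outer:
    layout[dept_type] = slice_and_dice(
        list(hierarchy[dept_type].items()), rx, ry, rw, rh, vertical=False
    )
```

Adding another dimension to the level of detail corresponds to repeating the inner step once more inside each department's rectangle.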

The order of the dimensions on the Marks card defines the way the treemap groups the rectangles. Additionally, you can add dimensions to rows or columns to slice the treemap into multiple treemaps. The result is effectively a bar chart of treemaps:

Figure 3.37: Adding a dimension to Rows has effectively made a bar chart of treemaps

The preceding treemap not only demonstrates the ability to have multiple rows (or columns) of treemaps; it also demonstrates the technique of placing multiple fields on the Color shelf. This can only be done with discrete fields. You can place two or more fields on Color by holding down the Shift key while dropping the second field on Color. Alternatively, the icon or space to the left of each field on the Marks card can be clicked to change which shelf is used for the field:

Figure 3.38: Clicking the icon next to a field on the Marks card allows you to change which shelf is used

Treemaps, along with packed bubbles, word clouds, and a few other chart types, are called non-Cartesian chart types. This means that they are drawn without an x or y axis, and do not even require row or column headers. To create any of these chart types, do the following:

  • Make sure that no continuous fields are used on Rows or Columns.
  • Use any field as a measure on Size.
  • Change the mark type based on the desired chart type: square for treemap, circle for packed bubbles, or text for word cloud (with the desired field on Label).

Area charts

Take a line chart and then fill in the area beneath the line. If there are multiple lines, then stack the filled areas on top of each other. That's how you might think of an area chart.

In fact, in Tableau, you may find it easy to create a line chart, like you've done previously, and then change the mark type on the Marks card to Area. Any dimensions on the Color, Label, or Detail shelves will create slices of area that will be stacked on top of each other. The Size shelf is not applicable to an area chart.

As an example, consider a visualization of patient visits over time, segmented by hospital branch:

Figure 3.39: An area chart showing patient visits over time by hospital branch

Each band represents a different hospital branch location. In many ways, the view is aesthetically pleasing and it does highlight some patterns in the data. However, it suffers from some of the same weaknesses as the stacked bar chart. Only the bottom band (South) can be read in terms of the values on the axis.

The other bands are stacked on top and it becomes very difficult to compare. For example, it is obvious that there is a spike around February of each year. But is it at each branch? Or is one of the lower bands pushing the higher bands up? Which band has the most significant spike?
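The stacking itself explains the difficulty. In the following Python sketch, each band starts where the band below it ends, so a spike in a lower band lifts every band above it. The monthly numbers are invented, the North branch name is a placeholder for the top band, and the stack function is hypothetical.

```python
# Hypothetical monthly visits per branch, listed bottom band first.
# The numbers are invented for illustration.
visits = {
    "South": [100, 100, 100],
    "East": [50, 120, 50],
    "North": [80, 80, 80],
}


def stack(series):
    """Return {name: [(lower, upper), ...]} bounds for each stacked band."""
    n = len(next(iter(series.values())))
    baseline = [0] * n
    bands = {}
    for name, values in series.items():
        # Each band sits on top of the running baseline.
        upper = [b + v for b, v in zip(baseline, values)]
        bands[name] = list(zip(baseline, upper))
        baseline = upper
    return bands


bands = stack(visits)
for name, bounds in bands.items():
    print(name, bounds)
```

Here the top band's upper edge jumps in the second month even though its own value never changes; only East actually spiked. Reading a band's value means subtracting its lower edge from its upper edge, which is hard to do by eye.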

Now, consider the following view:

Figure 3.40: An area chart showing percentages instead of absolute values

This view uses a quick table calculation, like the stacked bars example. It is no longer possible to see the spikes, as in the first chart. However, it is much easier to see that there was a dramatic increase in the percentage of patients seen by the East branch (the middle band) around February 2019, and that the branch continued to see a significant share of patients through the end of the year.
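A short Python sketch shows why the percentage view hides spikes but reveals shifts between branches. It assumes each value is divided by the total across all branches for the same period. The numbers are invented, the North branch name is a placeholder, and the share_per_period function is hypothetical.

```python
# Hypothetical visits per branch over four periods (invented numbers).
# Period 2 is a spike shared by every branch; period 4 is a rise at East only.
visits = {
    "South": [100, 200, 100, 100],
    "East": [50, 100, 50, 150],
    "North": [50, 100, 50, 50],
}


def share_per_period(series):
    """Divide each value by the total across all series for the same period."""
    totals = [sum(column) for column in zip(*series.values())]
    return {
        name: [value / total for value, total in zip(values, totals)]
        for name, values in series.items()
    }


shares = share_per_period(visits)
for name, values in shares.items():
    print(name, [f"{v:.0%}" for v in values])
```

In this made-up data, every branch doubles in the second period, so no share changes and the spike disappears from the percentage view. In the fourth period only East grows, so its share rises while the others shrink.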

It is important to understand what facets of the data story are emphasized (or hidden) by selecting a different chart type. You might even experiment in the Chapter 3 workbook by changing the first area chart to a line chart. You may notice that you can see the spikes as well as the absolute increase and decrease in patient visits per branch. Each chart type contributes to a certain aspect of the data story.

You can define the order in which the areas are stacked by changing the sort order of the dimensions on the shelves of the Marks card. Additionally, you can drag and drop the items within the color legend to further adjust the order.

Pie charts

Pie charts can also be used to show part-to-whole relationships. To create a pie chart in Tableau, change the mark type to Pie. This will give you an Angle shelf, which you can use to encode a measure. Whatever dimension(s) you place on the Marks card (typically on the Color shelf) will define the slices of the pie:

Figure 3.41: A pie chart showing total revenue broken down by branch

Observe that the preceding pie chart uses the sum of revenue to define the angle of each slice; the higher the sum, the wider the slice. The Hospital Branch dimension slices the measure and so defines the slices of the pie. This view also demonstrates the ability to place multiple fields on the Label shelf. The second SUM(Revenue) field uses the percent of total quick table calculation you saw previously. This allows you to see the absolute values of revenue, as well as the percentage of the whole.

Pie charts can work well with a few slices. In most cases, more than two or three slices become very difficult to see and understand. Also, as a good practice, sort the slices by sorting the dimension that defines them. In the preceding example, the Hospital Branch dimension was sorted by the sum of revenue in descending order, using the sort option on the field's drop-down menu. This orders the slices from largest to smallest and lets anyone reading the chart easily see which slices are larger, even when their sizes and angles are nearly identical.
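The relationship between the measure, the slice angle, and the sort order can be sketched in Python as follows. The revenue figures are invented, the North branch name is a placeholder, and the pie_slices function is hypothetical; the sketch only illustrates the proportions, not how Tableau draws the marks.

```python
# Hypothetical revenue per hospital branch (invented numbers).
revenue = {"North": 2_000_000, "South": 5_000_000, "East": 3_000_000}


def pie_slices(values):
    """Return (label, value, share, angle_in_degrees) sorted largest first."""
    total = sum(values.values())
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [(label, v, v / total, 360 * v / total) for label, v in ordered]


slices = pie_slices(revenue)
for label, value, share, angle in slices:
    print(f"{label:6s} {value:>10,d} {share:6.1%} {angle:6.1f} degrees")
```

Each slice's angle is its share of the total multiplied by 360 degrees, and sorting by value in descending order puts the largest slice first, which mirrors sorting the dimension by the sum of revenue.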

With a good understanding of some techniques for visualizing part-to-whole relationships, let's move on to visualizing distributions.