Data Aggregation
Data aggregation ensures that charts, maps, and tables display accurate results when connected to filter elements.
Overview
In data visualization projects, charts, maps, and tables often link to filter elements, allowing users to explore data interactively. When a filter selects multiple values, multiple records may need to combine into single data points for accurate visualization. Aggregation handles this by summing or averaging relevant values.
Aggregation settings are configured in the dataset editor, where numeric fields (numbers, currency, percent, and measurement) can aggregate by Sum (the default) or Average.
Example: Aggregating Data
Consider a dataset tracking US trade flows under the
USMCA in 2022.
This dataset is plotted below in a bar chart with "Trade Flow" on the vertical axis and "Total ($ billions)" on the horizontal axis. A filter element (displayed as checkboxes) allows users to control the value of the "Category" field (setting it to either "goods" or "services" or both).
- Selecting only one checkbox (e.g., "goods") filters out records for the other category (e.g. "services"), leaving two values from the "Total" field to represent exports and imports respectively. In this case, no aggregation is required.
- Selecting both checkboxes retains all four records, requiring aggregation to either sum or average the values in the Total field.
This aggregation ensures bars represent the combined trade flow for both goods and services.
Aggregation is used to calculate the bar size when both checkboxes are selected.
Aggregation in Charts
Aggregation is necessary when multiple records share the same value in the chart’s
independent variable. In the example above, if several records share identical "Trade Flow" values, the "Total" field must aggregate to generate accurate visual elements.
Exceptions: Scatterplots and Histograms
Most charts in Mappica apply aggregation when the independent variable field contains multiple records with the same value. However, two chart types work differently because they rely on raw data and do not involve an independent variable in the traditional sense:
Scatterplots plot raw data points based on pairs of values, with each record representing a unique observation. For example, a scatterplot showing the relationship between height and weight treats each pair of values as a distinct data point. If two records share the same height and weight, they will overlap exactly in the scatterplot. To visually distinguish overlapping points, techniques like adjusting color opacity or using dynamic symbol sizing can be considered.
Histograms display the distribution of raw data values, and aggregation would defeat their purpose. Every data point represents an observation, and the goal is to show how frequently these observations occur across a range of values.
Aggregation in Maps
In maps, aggregation works in a similar way to aggregation in charts. In choropleth maps, it is triggered when there are multiple identical values in the Dataset Join Field, which is typically a geographic identifier such as an id or name value. In bar maps, spike maps, and symbol maps, aggregation happens when there are multiple records with identical Latitude and Longitude values.
Aggregation in Tables
Aggregation is an optional table feature that combines values within Series Fields when multiple records share identical values across their Category Fields. If multiple Category Fields are selected, aggregation applies only when the corresponding values match in all of those fields.
For example, consider a table built from the dataset shown above. The table is set up with "City" and "State" as Category Fields and "Population" as a Series Field. Without aggregation, all four records are displayed in the table. When aggregation is enabled, records with identical City and State values will be merged into a single record. In this case, only the two records for Springfield, Ohio meet this criterion, so they will be aggregated, and the population will either be summed to 116,000 (58,000*2) or averaged to 58,000 ((58,000+58,000)/2).
Aggregation and Empty Cells
When any value involved in an aggregation is empty, the result evaluates to null, and no value is displayed. Empty cells are not treated as 0 in aggregation computations. For example, if "Trade Flow" is used as the independent variable in a bar chart and "Total ($ billions)" is set as the dependent variable, the imports bar will display a value of 974.3 (calculated as 891.3 + 83), but no exports bar will appear because the sum includes a null value (680.8 + null = null).
Note: In area charts, null values are visually rendered as 0, but no tooltip is displayed. This behavior ensures the chart remains visually coherent, as area charts require continuous values to prevent gaps or distortions in the overall shape.