Overview
Wide data and long data are different formats used to store and organize data. Long data is sometimes called narrow data, stacked data, or (when formatted appropriately) tidy data.
To understand the structure of these data formats, start by considering a sample dataset that stores, for a given year, the GDP per capita for a country (in this case, Germany):
When we decide to add the GDP per capita of a second country, there are two strategies available to us.
In wide data format, the additional country is added as a new column:
In long data format, one row is added for each observation, and a column is added to identify the country:
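The relationship between the two layouts can be sketched in a few lines of Python. This is a minimal illustration, not how Mappica stores data: the numbers are placeholders rather than real GDP figures, and the helper name `wide_to_long` and its parameters are hypothetical. Data libraries such as pandas provide equivalent reshaping operations.

```python
# Placeholder values for illustration only; these are not real GDP figures.
wide_rows = [
    {"Date": 2020, "Germany": 100, "Sweden": 110},
    {"Date": 2021, "Germany": 105, "Sweden": 115},
]


def wide_to_long(rows, id_column, var_name, value_name):
    """Turn every non-id column of a wide table into its own row.

    Hypothetical helper: each (row, column) pair in the wide table
    becomes one observation in the long table.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every long row instead
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "Date", "Country", "GDP per capita")
for row in long_rows:
    print(row)
```

Two wide rows with two country columns become four long rows, one per observation. The country name moves from a column header into a value in the new "Country" column.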
Choosing a Format
Wide and long data formats cater to varying needs and scenarios:
Wide data is more intuitive for public sharing. When datasets are presented in public-facing contexts, for instance as tables in news articles or reports, wide data formats are often preferred. They display categories as separate columns, making it easier for readers to quickly grasp comparisons and relationships without requiring advanced knowledge of data structures.
Long data is usually better for statistical software and advanced analysis. Long data formats are highly compatible with statistical software and programming languages, such as R or Python, which often require data in this structure for functions like grouping, filtering, or summarizing. This format makes it easier to handle multiple variables, apply consistent transformations, and perform complex analyses across categories.
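As a rough sketch of why long data suits grouping and summarizing, the following Python snippet averages a value per country using only the standard library. The rows, column names, and numbers are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Placeholder long-format rows; the values are illustrative, not real data.
long_rows = [
    {"Date": 2020, "Country": "Germany", "GDP per capita": 100},
    {"Date": 2020, "Country": "Sweden", "GDP per capita": 110},
    {"Date": 2021, "Country": "Germany", "GDP per capita": 105},
    {"Date": 2021, "Country": "Sweden", "GDP per capita": 115},
]

# Group: collect the values that share the same category.
values_by_country = defaultdict(list)
for row in long_rows:
    values_by_country[row["Country"]].append(row["GDP per capita"])

# Summarize: apply one transformation consistently to every group.
mean_by_country = {
    country: mean(values) for country, values in values_by_country.items()
}
print(mean_by_country)
```

Because the category is an ordinary column value, adding a third country requires no change to the grouping code. In a wide layout, the same calculation would have to name each country column explicitly.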
In Mappica, you can build datasets using either wide or long data formats, though certain formats are better suited to specific situations. Here are several factors to consider:
1. The complexity of the data: Wide data is typically better suited to smaller datasets that contain only a few series (e.g., 2–5), since editing and managing data can be easier when viewing columns side by side, without repeating the independent variable (the "Date" column in the examples above).
2. Selection of visual elements: Many elements in Mappica are capable of using either wide or long format, but some require a particular data format. The available data formats for a particular element are displayed in the right panel, under the Dataset section.
3. Filtering needs: When you plan to build intricate filtering into your visualization and need multiple elements to connect to the same filter controls, long data is often the better choice. Consider an updated version of the sample dataset that stores both "GDP per capita" and "Population" data for Germany and Sweden. In long format, it might look like this:
We can use this dataset to easily create a chart for GDP and another for population. We can also add filters for any of the variables. For instance, we could create a filter element that is tied to the country column and connect this to both charts. This filter lets the user toggle the visibility of countries in both charts.
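The idea of one filter driving two charts can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not Mappica's implementation: the column layout (one row per date and country, with a column for each measure), the placeholder values, and the helper names `filter_by_country` and `chart_series` are all hypothetical.

```python
# Placeholder values and an assumed column layout, for illustration only.
long_rows = [
    {"Date": 2020, "Country": "Germany", "GDP per capita": 100, "Population": 10},
    {"Date": 2020, "Country": "Sweden", "GDP per capita": 110, "Population": 20},
    {"Date": 2021, "Country": "Germany", "GDP per capita": 105, "Population": 11},
    {"Date": 2021, "Country": "Sweden", "GDP per capita": 115, "Population": 21},
]


def filter_by_country(rows, visible_countries):
    """Keep only the rows whose country is currently toggled on."""
    return [row for row in rows if row["Country"] in visible_countries]


def chart_series(rows, value_column):
    """Extract the (date, country, value) points a single chart would plot."""
    return [(row["Date"], row["Country"], row[value_column]) for row in rows]


# One filter selection, applied once, feeds both charts.
visible_rows = filter_by_country(long_rows, {"Sweden"})
gdp_chart = chart_series(visible_rows, "GDP per capita")
population_chart = chart_series(visible_rows, "Population")
print(gdp_chart)
print(population_chart)
```

The filter never needs to know which measure a chart displays, because the country is stored as a value on every row.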
Now consider the wide data equivalent:
Once again we can create separate charts for both GDP and population. However, we can no longer filter both charts simultaneously using a single variable (e.g., country), because the country now appears only in the column headers rather than as a value in a column of its own. In wide data format, relationships that were explicitly represented in the long format have been lost, and as a result the format is more limiting in terms of functionality.
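To make the limitation concrete, here is a small Python sketch of what selecting a country looks like when the data is wide. The header style ("GDP per capita (Sweden)"), the placeholder values, and the helper `columns_for_country` are assumptions for illustration; a real wide dataset may name its columns differently.

```python
# Placeholder values and an assumed header naming convention.
wide_rows = [
    {
        "Date": 2020,
        "GDP per capita (Germany)": 100,
        "GDP per capita (Sweden)": 110,
        "Population (Germany)": 10,
        "Population (Sweden)": 20,
    },
    {
        "Date": 2021,
        "GDP per capita (Germany)": 105,
        "GDP per capita (Sweden)": 115,
        "Population (Germany)": 11,
        "Population (Sweden)": 21,
    },
]


def columns_for_country(rows, country):
    """Guess which columns belong to a country by inspecting header text.

    This only works if every header follows the "(Country)" convention;
    nothing in the data itself records the relationship.
    """
    suffix = f"({country})"
    return [column for column in rows[0] if column.endswith(suffix)]


sweden_columns = columns_for_country(wide_rows, "Sweden")
print(sweden_columns)
```

There is no country column to filter on, so the selection depends on string matching against header names. Any header that deviates from the assumed convention would be silently missed, which is one way to see why the wide layout is harder to filter consistently.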