How to mimic UNION ALL operator with Google Data Studio and combine sets of data
Last week, we enjoyed discovering the new possibilities of data manipulation inside Google Data Studio, thanks to the new data blending features.
Now, it is time for a first tip to leverage them and built an operator not yet available : UNION, and more specifically, UNION ALL operator.
UNION ALL operator combines several sets of data with the same sequence of fields and the same related data types. Data sets can be made from separate data sources or from the same one just for filtering purposes.
It is quite common to use UNION operator in SQL to merge several sub queries, doing what is necessary to normalize the data schema across sources.
If you are familiar with SQL, you will see that inside Google Data Studio, the difference between UNION (removing duplicate rows) and UNION ALL (keeping all rows) operators will not be really significant because of the way we use dimensions and metrics aggregation options. In other words, distinct dimension values and sum of metric values are everywhere in Data Studio charts and we need to do specific things to avoid it…so, the opposite comparing to SQL queries…
For this article, I will use two examples applying the same trick:
- Google Search Console with site data from web, image and video properties, combined in the same chart
- Google Sheets to reproduce the mechanism of configuration with a simple example
For this tutorial, we will use table chart as the final result for both examples, but of course, you will be able to reproduce it for any chart type.
To create our UNION operator and combine data with it on a chart, we will need to the following elements:
- From 2 to 5 data sources or instances of data sources (tables)
- For Google Search Console use case: I’ve connected 3 separate data sources to Web, Image and Video properties
- For Google Sheets theoretical example: 3 different data sources corresponding to 3 tabs with 1 field “Name”
- Blended data of mentioned data sources, with shared data set format, correctly joined
- Table charts to display final data
- Chart calculated fields to create the homemade union operator of each displayed field
1. Data sources
1.1 Google Search Console data sources (1st track)
Properties (web, image, video and news) from Google Search Console (GSC) connector are available through a parameter inside a Google Data Studio data source. Unfortunately, the value selection for this parameter is not useable for what we need to do.
So we create one dedicated data source for each GSC property (no data on “news” property for this example, so just 3 data sources to merge).
1.2. Google Sheets data sources (2nd track for a theoretical example)
Let’s fill 3 tabs of the same Google Sheets with 1 “Name” field and 3 different values for each.
2. Blended data
2.1. Google Search Console blended data (1st track)
Now, we configure blended data “BD – GSC properties” as described below for our example:
- 1 table by Google Search Console data source (x3)
- Table names: The unique value of Google property for each data source, so respectively “Video”, “Image” & “Web” regarding the screenshot below
- Dimensions: Google Property, Device category
- Metric: Impressions
- Full outer join type for both conditions
- 1 key: Google Property on each side without really taking care of the concerned tables
- Explanations: The point here is not to try to match rows but completely the reverse: be sure that any row will match with rows from other tables. Quite easy with Google Property, we are sure that each data source will provide a unique value comparing to values from others.
2.2. Google Sheets blended data (2nd track for a theoretical example)
For our Google Sheets example, here the data blending configuration:
- 1 table by tab (x3)
- Table name: Respectively “Source #1”, “Source #2” and “Source #3”
- A table calculated field providing the unique name of the table (same values)
- Metric: None
- Full outer join type for both conditions
- 1 key: Source – just created calculated field ensuring the non matching of rows across tables.
3. Joined raw data
Now, let’s make a break to get a view of our joined data and see that we are close to the goal.
Here 2 tables charts corresponding to our blended data sets. The rows get values only from one source and the other fields are null. As wished, There is no matching. we combine all data, but for the moment, in separate fields.
It is the first required step for the way to custom UNION operator.
3.1. Google Search Console properties (1st track)
3.2. Google Sheets tabs (2nd track)
4. Calculated fields
At last, to join values of similar fields across tables, we need simple calculated fields with a nice function for this kind of case: COALESCE
COALESCE function returns the first non-missing value found in a list of fields.
So, this COALESCE function will be complementary to full outer join. Their combination makes the trick for UNION ALL operator building.
The calculated fields below are created on the table chart level as chart dimensions and metric.
4.1. Calculated fields for Google Search Console use case
COALESCE(Google Property (Video),Google Property (Image),Google Property (Web))
COALESCE(Device Category (Video),Device Category (Image),Device Category (Web))
COALESCE(Impressions (Video),Impressions (Image),Impressions (Web))
4.2. Calculated fields for Google Sheets example
COALESCE(Name (Source #1),Name (Source #2),Name (Source #3))
5. Configuration review
And as usual, Nothing better than a video to confirm that everything is correctly configured.