Google Data Studio

How to split comma-separated values to single values with Google Data Studio

How to split comma-separated values to single values with Google Data Studio

Let’s describe step by step how to split delimiter-separated values – as example with comma – from a text dimension to get the same related metrics, drilled down by real single values.

If your system of storage or your data collection – like Google Analytics – doesn’t allow to manage list dimension, and you don’t wish to send several tracking hits – one by separate value – to optimize the payload of your measurement device, this trick offers a solution to do it quickly & without a layer of data preparation.

But be careful, it is not perfect, just quite nice for simple & small data sets.

Here some examples of use cases, regardless of the data source & related tools:

  • Blog: Article tags
  • SEO: List of targeted & optimized keywords
  • E-commerce: Personalized payment methods of a checkout funnel
  • Attribution: Channel touches for a linear attribution model of conversions

Ingredients

For this article, I apply the method with an hit-scoped custom dimension, collecting articles tags on Google Analytics, but as usual, you can do exactly the same for any other context with a dimension collecting delimiter-separated values.

To build an example of bar chart, let’s use the following elements:

  • A Google Sheets data source playing the role of list splitter
  • Blended data to mix data & metrics from Google Analytics (pageviews by article tags), with Sheets data source
  • A table chart with bars to visualize our data
  • A calculated dimension to split items to single values
  • A calculated metric to keep values only for the right rows

So, another new & simple roundabout way to use data blending, like in a previous article to customize labels for pie & donut charts.


0. Maximum number of values inside the list dimension

First, we need to know our data and especially the max number of values stored in our raw dimensions. It will be important for the consistency of your report in the future.

To get it on GA as example, you can use the following calculated field, applying as the metric of a scorecard on a wide period. In my example, values are delimited with comma separator. Feel free to adapt the formula with the delimiter character of your data.

For most of cases, each single value is unique as part of the limiter-separated values of the raw dimension, so this number cannot be higher than the number of distinct value.


Max number of tags
MAX(
     LENGTH(
         REGEXP_REPLACE(Article tags,"([^,])","")
     )
 )+1

Explanation of the formula:
We remove all characters except commas to count them, get the higher value with max function, and increase by one to get the max number of tags.
If you are not familiar with regular expressions, in the formula, the delimiter corresponds to the comma just after the ^.


For my tiny data set, the formula returns 4. So with a minimum of allowance, I consider that my list dimension cannot contain more that 10 values to separate.


1. Google Sheets data source

I create a simple Google Sheets with one tab & one column “index” of 10 values, from 1 to 10. Our static splitter: Inside Data Studio, we cannot generate lists except from a data source. That ‘s the big limit we have to deal with for this trick.

Then, we add the Google Sheets tab as a Data Studio data source.


Google sheets data source to split list values

2. Blended data

Now, we configure blended data “BD for list dimension” as described below for our example:

  • As left data source
    • Data source: Google Sheet data source with indexes
    • Join key: None
    • Dimension : Index
  • As right data source
    • Data source: Main data source, here Google Analytics data source
    • Join key: None
    • Dimensions: Article tags, Nb tags as a calculated dimension (so on blended data scope) with the formula below
    • Metric: Pageviews

Nb tags
LENGTH(REGEXP_REPLACE(Article tags,"[^,]",""))+1

Explanation of the formula: We remove all characters except commas to count them and increase by one to get the number of tags.
The delimiter corresponds to the comma just after the ^.


Blended data to separate comma-separated values to single values

3. Chart configuration

At last, we create the chart, here a table chart:

  • Data tab
    • Data source: Your blended data “BD for list dimension”
    • Dimension: A chart calculated field “Tag” with the formula below
    • Metric: A chart calculated field “Page views” with the formula below and default aggregation (SUM)
    • Sort by Page views (so the “New Field” metric) descending
  • Style tab
    • Show metric values as bars with numbers (metrics selection)

Chart with list dimension values

Calculated field to use as chart dimension

Tag
CASE 
 WHEN Nb tags >= Index AND Index = 1 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){1}'),",","")
 WHEN Nb tags >= Index AND Index = 2 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){2}'),",","")
 WHEN Nb tags >= Index AND Index = 3 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){3}'),",","")
 WHEN Nb tags >= Index AND Index = 4 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){4}'),",","")
 WHEN Nb tags >= Index AND Index = 5 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){5}'),",","")
 WHEN Nb tags >= Index AND Index = 6 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){6}'),",","")
 WHEN Nb tags >= Index AND Index = 7 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){7}'),",","")
 WHEN Nb tags >= Index AND Index = 8 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){8}'),",","")
 WHEN Nb tags >= Index AND Index = 9 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){9}'),",","")
 WHEN Nb tags >= Index AND Index = 10 THEN REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){10}'),",","")
 ELSE REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){1}'),",","")
 END

Explanation of the formula:
Keep in mind that data set rows have been replicated 10 times thanks to the left join with Google Sheets data source.
We extract the single value from the comma-separated values only if the list contains enough values and only for the specific index.

Here in red the three occurrences of delimiter to replace for each WHEN output, if you don’t use comma character:
REPLACE(REGEXP_EXTRACT(Article tags,'(,?[^,]+){10}’),”,“,””)


PS: Unfortunately, REGEXP_EXTRACT function accepts only literal expression (no possible dynamic injection of the index to simplify the formula)…


Calculated field to use as chart metric

Page views
CASE 
 WHEN Nb tags >= Index THEN Pageviews
 ELSE 0
END

Explanation of the formula:
We count pageviews only if the list contains enough values. Crossing this selection, with the dimension above, duplicates the metric value for each single split value. That’s exactly what we need.
It means also that any pre calculated or pre aggregated metric needs to be rebuilt with raw metrics after the split or needs the right aggregation.


Configuration review

At last, let’s see how everything is configured and allow us to do our nice chart, based on comma-separated values, becoming now a real list dimension. Voilà.



Eparpillé par petits bouts, façon puzzle

Share