How to reveal and leverage selected date range of Google Data Studio reports
Need to access to selected date range and related time repository
Most of data sources and related APIs provide rows of dated data only if the requested data are available (in other words, when the combination of all requested fields gets values), otherwise no row is returned. It is quite obvious but it also drives the way to manage data when temporality is key inside Google Data Studio reports.
On the other hand, Data Studio doesn’t provide for the moment a way to access to contextual information of the live state of the report, including which date range has been selected (or automatically applied). Only data sources can provide data to leverage for time data…but, as said before, all cases cannot be solved with data dates, especially when no data as been collected for all dates or other required time granularity.
So, we need sometimes a consistent time and calendar management, a dates repository without dependency with visualized data. An absolute reference of time.
Google BigQuery for a trivial small query
Google BigQuery can address nicely complex needs about cloud data storage, scalability, speed, everything catalyzed by complementary amazing tools from Google Cloud Platform and proposed with an attractive pricing plan based on queried and stored data.
But it is not the subject of this article.
Maybe you already know how I like finding solutions – out of the box – for data viz/reporting requirements not yet directly covered by Data Studio built-in features.
Let’s use BigQuery and its related implementation of Structured Query Language (SQL) as an easy and free way to get a service providing absolute time and calendar data. We will not store and query dynamic data, so we will not be charged for it. We will just take advantage of the SQL engine to query calculated rows about dates & time dimensions. Just that 😇
So, this time, we will hijack Google BigQuery to use 0,001% of its capabilities, just to get more context about dates and use it to improve displayed data. 😋
Example and Ingredients
Let’s assume that you’ve got already your main data source, providing dated data but not returning rows for every day. For this article, I use the example of daily sold quantity of a specific product category from Google Analytics. I would like to calculate the average of sold quantity by day, even if there is no sale (and any data) collected for some specific days.
So, of course, count distinct values on data date field or date difference between min and max data date field values are not reliable options to calculate the number of day, so the average.
In addition to our main data source, we will need very few elements:
- An BigQuery project (an empty project in BigQuery sandbox is enough)
- A BigQuery data source with a custom query
- Blended data joining BigQuery time data with your main data
- Some calculated fields on our charts powered by blended data
BigQuery (sandbox) project
First, we need a BigQuery project inside Google Cloud Platform (GCP).
Don’t worry, our GCP needs are quite simple and it doesn’t require any billing information. If you have never configured a BigQuery project, you can use the BigQuery sandbox. We will not be blocked by the sandbox limitations. You will find how to do in this quick start documentation.
So, create your project using the sandbox or use your existing environment, then go back to your Data Studio report. And of course, no need to create a BigQuery dataset and the related tables. We will not store data inside BigQuery.
BigQuery data source inside Google Data Studio
To configure our BigQuery data source, we will code a custom query, enabling date range parameters. This feature allows you to provide start and end date from report date range to built your custom query.
Here is the main trick: Data Studio cannot provide us directly the selected date range of the report to allow us to compute data regarding time context. So we will take advantage of the query language of BigQuery, not to use date range parameters to filter data stored on BigQuery table. Nope. But to build a simple query to get dates list of selected range 😈
So, add and save your new BigQuery data source as described below:
- Name in the example here: “BQ – Calendar”
- Connector: BigQuery from Google
- Type: Custom query
- Project: Your (sandbox) project (here “My First Project”)
- Custom query: Copy/paste the query below
- Check “Enable date range parameters”
Through this query, we request all the dates between the start and the end of the date range, named “calendar_date”.
SQL custom query
SELECT * FROM UNNEST( GENERATE_DATE_ARRAY( PARSE_DATE('%Y%m%d',@DS_START_DATE) ,PARSE_DATE('%Y%m%d',@DS_END_DATE) ,INTERVAL 1 DAY ) ) AS calendar_date
The screenshot below sums up the full configuration of the BigQuery data source.
2. Blended data
The data source corresponding to the date range repository is ready. Let’s blend it with the main data source.
- As left table (table 1)
- Data source: BigQuery data source (“BQ – Calendar”)
- Dimension : calendar_date, renamed as “Date”
- As right table (table 2)
- Data source: Main data source, here Google Analytics data source
- Dimension: Date, renamed as “Data date”
- Other options: to adapt, here quantity as metric
- Join configuration
- Operator: Left outer join
- Join keys: Date and Data date
3. Chart configuration
At last, adapt your reporting to use the blended data instead of your main data source and create chart calculated fields to take advantage of our dates repository.
Here 2 examples of calculated fields:
- Quantity: A simple nullity test to replace absent values by zero
- Average of daily quantity to divide the total quantity by the real number of days
At last, let’s see how everything is configured and let’s play with it. Voilà.
This is just a first example describing how we can use BigQuery as a service provider of absolute or contextual data and the possibilities are endless, and not limited to dates & time. I’ve got in mind a visualisation use case to work around some limitations of Data Studio, probably a topic for another blog post.
On the other hand, As an extension of this article, I’m wondering if delegating some additional calculation or formatting about dates would be advantageous for performance purposes, instead of doing it inside Data Studio with calculated fields engine.
At last, I let you imagine how it would be amazing to get dynamic parameter feature, and so to be able to provide to BigQuery, not just date range data, but also data from other data sources, and so use BigQuery as a computing engine regardless of the data source. Just saying…