BigQuery Integration with Google Tag Manager Streaming Events
BigQuery Integration with Google Tag Manager Streaming Events

BigQuery Integration with Google Tag Manager Streaming Events

BigQuery Integration with Google Tag manager
BigQuery

Basic Understanding when sending Data to BigQuery

BigQuery supports ingesting data directly to tables using the Streaming API. However, the API requires that the destination table is already configured and ready to go. In addition, the table must have a proper schema that matches the incoming event data. Any event that doesn’t match the schema will be dropped.

Fortunately, BigQuery supports nested fields. Nesting the event parameters into an array allows for a schema that doesn’t require an update every time a new parameter is added. We’ll do this by using a similar schema as what GA4 uses. The schema consists of an event_params array which contains one record per event parameter. The records have three fields for being able to store different data types:  string_valueint_value, and float_value.

Setting up the destination table in BigQuery

First We have to login in GCP console for the Biquery application and then create the table that will store all the events data transferred from server-side GTM. You can start by creating a new dataset. The dataset’s setup & configuration will define the data location for the tables inside it.

Dataset Bigquery

As you have created the dataset you can see it on the left side of Navigation. Click on the three dots next to the dataset and create a new table. Give your table a name and click to edit the schema as text.

Bigquery table

Copy Paste the schema editor shown

[
    {
        "description": "Event timestamp in milliseconds",
        "mode": "REQUIRED",
        "name": "timestamp",
        "type": "INTEGER"
    },
    {
        "description": "Name of the event",
        "mode": "NULLABLE",
        "name": "event_name",
        "type": "STRING"
    },
    {
        "description": "Event data parameters and their values",
        "mode": "REPEATED",
        "name": "event_params",
        "type": "RECORD",
        "fields": [
            {
                "description": "Parameter name",
                "mode": "NULLABLE",
                "name": "key",
                "type": "STRING"
            },
            {
                "description": "Value stored in one of the record's fields",
                "mode": "NULLABLE",
                "name": "value",
                "type": "RECORD",
                "fields": [
                    {
                        "name": "string_value",
                        "type": "STRING",
                        "mode": "NULLABLE"
                    },
                    {
                        "name": "int_value",
                        "type": "INTEGER",
                        "mode": "NULLABLE"
                    },
                    {
                        "name": "float_value",
                        "type": "FLOAT",
                        "mode": "NULLABLE"
                    }
                ]
            }

        ]
    }
]

Before creating the table, remember to configure the partitioning setting. Partitioning is an essential feature as it will reduce the amount of data that BigQuery will scan whenever you query the table. With large tables, partitioning will add a significant performance improvement and decrease query costs.

BigQuery Data Partitioning

Select partition by ingestion time. In the partitioning type, you can select the granularity of the partitions into which the data will be organized.

After that, you’re good to go and can create the table.

Creating the event tag

Create a new tag using the BigQuery Event tag template. You can find the tag template in the community template gallery.

The tag is quite simple to use: Each event will include the current time as a millisecond timestamp, the name of the event, and any parameters you decide to add.

This example setup collects the name of the event together with page_location:

Querying the event data

Because of the schema similarities and the data being event data, querying the data is very similar to querying GA4 data.

Below is an example query on extracting the individual parameters from the events using unnest.

Example Query

SELECT
  extract(date from timestamp_millis(timestamp)) as date,
  event_name,
  (select value.string_value from unnest(event_params) where key = 'page_location') as page_location
FROM
  `<project-id>.server_side_bigquery.events`
WHERE
  DATE(_PARTITIONTIME) = "2022-05-24"

Summary

Streaming events from server-side GTM to BigQuery is relatively straightforward. However, as the setup is just a simple event stream, it lacks the processing that analytics tools have. Because of this, things like user-level fields are missing. However, it’s always possible to do the processing later once the data hits BigQuery.

(Visited 118 times, 1 visits today)

One comment

  1. Pingback: Big Query Schema for GA4 events tracking - Analytics Smart

Leave a Reply

Your email address will not be published. Required fields are marked *