What is Data Sampling?
Data sampling means selecting a subset of the overall data available in a system. It is a statistical analysis technique for analyzing large data sets quickly and cost-effectively.
A sample should always be representative of all the data: analyzing the subset should give approximately the same result as analyzing everything.
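As a rough illustration of that idea, the sketch below compares a conversion rate computed on every session with the same rate computed on a 10% random sample. The data is made up (100,000 hypothetical sessions with a conversion rate of about 3%), and the helper name `conversion_rate` is only an assumption for this example, not anything Google Analytics exposes.

```python
import random

def conversion_rate(sessions):
    """Share of sessions that ended in a conversion."""
    return sum(sessions) / len(sessions)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: 100,000 sessions, each True if it converted (~3% do).
all_sessions = [rng.random() < 0.03 for _ in range(100_000)]

# Analyze only a 10% simple random sample instead of every session.
sample = rng.sample(all_sessions, k=10_000)

full_rate = conversion_rate(all_sessions)
sample_rate = conversion_rate(sample)

print(f"full data:  {full_rate:.4f}")
print(f"10% sample: {sample_rate:.4f}")
```

The two rates should land close to each other, which is what makes the sample a usable stand-in for the full data set.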
Why does Google Analytics sample data?
Google Analytics is designed to use sampling to save time and speed up query processing for users. With sampling, Google Analytics does not analyze the complete data set, only a subset of it.
GA looks only at a subset, or portion, of the data, which takes less time and gives you accurate information fast. You don’t have to wait for all the data to be processed before a report is generated for your site traffic, audiences, and conversions.
Google Analytics Data Sampling
Google Analytics samples data once a query exceeds an upper limit, which depends on the date range and the number of sessions queried. This limit is defined to save computation costs. Google Analytics uses session-based tracking, where a session covers the time a user is active on your website or app.
A session starts when a user visits your website. A user can have multiple sessions and is identified by cookies stored in the browser. For data sampling, Google Analytics analyzes a subset of all sessions across multiple users. Because on-site behavior is often similar from one user to the next, a sample can stand in for the full set of data points.
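A minimal sketch of how totals can be estimated from a subset of sessions: count what appears in the sample, then scale the counts by the inverse of the sampling rate. The session records, the `source` field, and the `estimate_totals` helper are all hypothetical and chosen for illustration; this is not Google's actual implementation.

```python
from collections import Counter

def estimate_totals(sampled_sessions, sampling_rate):
    """Scale per-source counts from a sample up to estimated totals."""
    counts = Counter(s["source"] for s in sampled_sessions)
    return {source: round(n / sampling_rate) for source, n in counts.items()}

# Hypothetical sample: 5 sessions drawn at a 25% sampling rate.
# User u1 appears twice because one user can have several sessions.
sampled = [
    {"user": "u1", "source": "google"},
    {"user": "u1", "source": "google"},
    {"user": "u2", "source": "newsletter"},
    {"user": "u3", "source": "google"},
    {"user": "u4", "source": "direct"},
]

print(estimate_totals(sampled, sampling_rate=0.25))
# {'google': 12, 'newsletter': 4, 'direct': 4}
```

The smaller the sampling rate, the more each sampled session is multiplied, so estimates for rare values become less reliable.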
The date range plays an important role in data sampling issues. If, for instance, you want to look at one month of website activity, Google Analytics will only sample data from that date range, not the entire year. With the launch of GA4 a few things have changed, but the concept remains the same.
Data sampling in GA4
GA4 reporting is divided into two categories: standard reports and advanced reports, the latter found in the Analysis tab.
Standard reports always show unsampled data within the selected date range. Advanced reports are sometimes sampled, depending on the combination of dimensions and metrics you select.
The image below shows the unsampled standard reporting options in GA4.
In GA4, a standard report stays unsampled when you apply secondary dimensions, whether to a segment you created or to all visitors’ data. Comparisons and filters likewise leave the report unsampled.
If you click on the green tick icon, you will see the message “This report is based on 100% available data”.
What is the hit limit in GA4?
GA4 is a free tool and has no hit limits, compared with the 10 million hits per month per account that Universal Analytics used to have. That is why organizations prefer it as a free analytics tool.
Threshold concept in GA4
GA4 applies thresholds to event data to prevent anyone viewing a report from inferring the demographics, interests, or PII of specific users on the website.
When a report contains information such as age, interests, search terms, or gender tied to a user ID, a threshold is applied and some information is hidden from the report. Thresholds are defined by Google; we can’t control or adjust them. If a threshold has been applied to a report, you will see unknown values in it: the values are replaced by unknowns to keep identities hidden.
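To make the idea concrete, here is an illustrative sketch of threshold-style masking. Google does not publish its actual thresholds or logic, so the cutoff `MIN_USERS`, the `apply_threshold` helper, and the report rows are all assumptions made up for this example.

```python
MIN_USERS = 10  # hypothetical cutoff; Google does not publish the real thresholds

def apply_threshold(rows, min_users=MIN_USERS):
    """Mask the dimension value of any row backed by too few users."""
    masked = []
    for row in rows:
        if row["users"] < min_users:
            # Copy the row so the original report data is left untouched.
            masked.append({**row, "age": "unknown"})
        else:
            masked.append(row)
    return masked

# Hypothetical demographic report rows.
report = [
    {"age": "18-24", "users": 420},
    {"age": "25-34", "users": 615},
    {"age": "65+", "users": 3},
]

for row in apply_threshold(report):
    print(row)
```

Only the row with very few users is masked, since a small group is where an individual could most easily be singled out.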
Every report in GA4 has dimensions assigned, and each dimension can have several values attached to it. For example, gender can have three values (male, female, and other), so the cardinality of that dimension is 3. The total number of unique values for a dimension is known as its cardinality.
Page dimensions have high cardinality because they can take as many values as there are URLs on your site. Any dimension with a large number of potential values is known as a high-cardinality dimension.
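Cardinality is simply a count of distinct values, which a short sketch can show. The event rows and the `cardinality` helper below are hypothetical and only meant to illustrate the definition.

```python
from collections import defaultdict

def cardinality(rows):
    """Count the unique values seen for each dimension."""
    values = defaultdict(set)
    for row in rows:
        for dimension, value in row.items():
            values[dimension].add(value)
    return {dimension: len(seen) for dimension, seen in values.items()}

# Hypothetical event rows with two dimensions each.
events = [
    {"gender": "female", "page": "/"},
    {"gender": "male", "page": "/pricing"},
    {"gender": "other", "page": "/blog/data-sampling"},
    {"gender": "female", "page": "/blog/ga4-thresholds"},
    {"gender": "male", "page": "/contact"},
]

print(cardinality(events))
# {'gender': 3, 'page': 5}
```

Gender stays at 3 no matter how many events arrive, while the page dimension keeps growing with every new URL.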
Google provides no concrete definition of when the cardinality limit is reached in a Google Analytics account.
GA4 shows unsampled data in standard reports. The advanced reporting options in the Analysis tab (comparison report, segment overlap, funnel analysis, exploration, user exploration, sales funnel) might be sampled.