BigQuery
Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. The Data Connect integration with BigQuery allows you to automatically sync your Contentsquare data to BigQuery for advanced analysis and joining with other business data.
Prerequisites
Before setting up the BigQuery integration, ensure you:
- Have a Google Cloud Platform (GCP) project ↗.
- Enable billing in the GCP project ↗.
- Enable the BigQuery API ↗.
- Know the region you want to use: `us`, `eu`, `europe-west1`, `europe-west2`, `europe-west4`, `europe-west6`, `us-central1`, `us-west1`, `us-west2`, `us-east4`, or `australia-southeast1`.
- (Optional) Decide on a name for your dataset (the default is `project_environment`).

These prerequisites are also outlined in the Google Cloud Platform quick-start guide ↗.
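If you want to sanity-check a region value before starting setup, the list above can be encoded in a small shell helper. This is a sketch of our own making: the `validate_region` function is hypothetical, not part of any Contentsquare or Google tooling, and the region list is copied from the prerequisites above.

```shell
# Regions supported by the Data Connect BigQuery integration (from the list above)
SUPPORTED_REGIONS="us eu europe-west1 europe-west2 europe-west4 europe-west6 us-central1 us-west1 us-west2 us-east4 australia-southeast1"

# Hypothetical helper: print "ok" if the region is supported, "unsupported" otherwise
validate_region() {
  for r in $SUPPORTED_REGIONS; do
    if [ "$r" = "$1" ]; then
      echo "ok"
      return 0
    fi
  done
  echo "unsupported"
}

validate_region eu           # prints "ok"
validate_region asia-east1   # prints "unsupported"
```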
BigQuery setup
Within the GCP dashboard ↗ for your selected project, visit IAM & admin settings and click + Add.
In the subsequent view, add heap-sql@heap-204122.iam.gserviceaccount.com with the BigQuery User role and save the new permission.
Contentsquare prefers to be added as a BigQuery User per the steps above. At minimum, the service account must be assigned a Data Editor (`dataEditor`) role and needs the following permissions ↗:

Project Permissions

- `bigquery.datasets.get`
- `bigquery.jobs.create`

Dataset Permissions

- `bigquery.routines.create`
- `bigquery.routines.delete`
- `bigquery.routines.get`
- `bigquery.routines.list`
- `bigquery.routines.update`
- `bigquery.tables.create`
- `bigquery.tables.delete`
- `bigquery.tables.get`
- `bigquery.tables.getData`
- `bigquery.tables.list`
- `bigquery.tables.update`
- `bigquery.tables.updateData`
See how to grant individual permissions to create a custom IAM role ↗ for Contentsquare.
External service account
If required, you can use your own GCP service account and have Data Connect write to a bucket in your Cloud Storage. Contentsquare will authenticate as your service account to deliver files into your bucket and, if configured, load them into BigQuery. In this case, provide the Contentsquare teams with the:

- Email associated with your GCP service account
- Private key for this service account
- Bucket name
Virtual Private Cloud (VPC)
If you have a Virtual Private Cloud:

- Create a GCS staging bucket that allows ingress from outside the perimeter and share the bucket name with Contentsquare
- Set up a perimeter bridge that includes this staging bucket and the target BigQuery dataset
Configure Data Connect
1. Log in to Contentsquare.
2. Navigate to Analysis setup > Data Connect.
3. Select Connect next to BigQuery.
4. Enter your BigQuery hostname and select Next.
5. Supply the required information:
   - Your Project ID, which you can find in the Project info section of your GCP project dashboard (make sure you're in the correct project).
   - Your region: `us`, `eu`, `europe-west1`, `europe-west2`, `us-central1`, `us-west1`, `us-west2`, `australia-southeast1`, `europe-west6`, or `us-east4`.
   - The dataset name override if you don't want the default. The default dataset name is `project_environment`; for example, the Main Production environment defaults to a dataset name of `main_production`.
6. Select Connect.
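The default dataset naming can be sketched as a tiny helper. This is our reading of the `project_environment` convention above (lowercase, words joined with underscores, inferred from the `main_production` example); the `default_dataset_name` function is hypothetical, so confirm the result against what Data Connect actually creates.

```shell
# Hypothetical helper: derive the default dataset name from project and
# environment, per the "<project>_<environment>" convention described above
default_dataset_name() {
  echo "${1}_${2}" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

default_dataset_name "Main" "Production"   # prints "main_production"
```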
Advanced Features

Partitioning

Data Connect automatically partitions BigQuery tables by date to optimize query performance and reduce costs. Understanding how partitioning works helps you write more efficient queries.
How Data Connect Partitions Tables:

- Event tables are partitioned by the `time` column
- Partitions are created daily based on event timestamps
- Each partition contains all events for that specific date
Benefits of Partitioning:
- Reduced query costs: Queries that filter by date only scan relevant partitions
- Improved performance: Less data to scan means faster query execution
- Automatic pruning: BigQuery automatically eliminates unnecessary partitions
Writing Partition-Aware Queries:
```sql
-- Good: Uses partition filter on time column
SELECT user_id, event_id, time
FROM `your-project.main_production.page_viewed`
WHERE DATE(time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
```

```sql
-- Bad: Scans all partitions (expensive)
SELECT user_id, event_id, time
FROM `your-project.main_production.page_viewed`
WHERE user_id = 12345
```

Partition Pruning Tips:
- Always include a `WHERE` clause on the `time` column when possible
- Use `DATE(time)` for date-based filters
- Combine date filters with other conditions for optimal performance
Check Partition Details:

```sql
-- View partition information
SELECT partition_id, total_rows, total_logical_bytes
FROM `your-project.main_production.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'page_viewed'
ORDER BY partition_id DESC
LIMIT 10
```

External Service Account Setup

In some cases, you may need to use your own Google Cloud Platform (GCP) service account for Data Connect instead of Contentsquare's default service account.
When to Use a Custom Service Account:
- Your organization requires all data access to use internal service accounts
- You need granular control over permissions and access logs
- You have VPC Service Controls that restrict external service accounts
- You want to route data through your own Cloud Storage bucket
Step-by-Step Configuration:

1. Create a Service Account in your GCP project:

   ```shell
   gcloud iam service-accounts create contentsquare-connect \
     --display-name="Contentsquare Data Connect" \
     --project=your-project-id
   ```

2. Grant Required Permissions to the service account:

   ```shell
   # BigQuery permissions
   gcloud projects add-iam-policy-binding your-project-id \
     --member="serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com" \
     --role="roles/bigquery.dataEditor"
   gcloud projects add-iam-policy-binding your-project-id \
     --member="serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com" \
     --role="roles/bigquery.jobUser"
   ```

3. Create a Cloud Storage Staging Bucket (if needed):

   ```shell
   gsutil mb -p your-project-id -l us gs://your-csq-staging-bucket
   # Grant service account access
   gsutil iam ch serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com:objectAdmin \
     gs://your-csq-staging-bucket
   ```

4. Generate a Service Account Key:

   ```shell
   gcloud iam service-accounts keys create contentsquare-key.json \
     --iam-account=contentsquare-connect@your-project-id.iam.gserviceaccount.com
   ```

5. Provide Credentials to Contentsquare:

   - Service account email: `contentsquare-connect@your-project-id.iam.gserviceaccount.com`
   - Private key (from `contentsquare-key.json`)
   - Bucket name (if using staging bucket)
Security Considerations:
- Rotate keys regularly: Generate new service account keys periodically
- Limit permissions: Only grant the minimum required permissions
- Monitor access: Enable audit logging for the service account
- Secure key storage: Never commit service account keys to version control
- Use IAM conditions: Apply conditional access policies where appropriate
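The "rotate keys regularly" advice can be turned into a simple check that flags keys older than a chosen threshold (90 days here, our choice). In this sketch both timestamps are hard-coded fixtures; in practice the creation time would come from `gcloud iam service-accounts keys list` and "now" from the clock.

```shell
# Fixtures: key creation time and the current time (hard-coded for illustration;
# real values come from `gcloud iam service-accounts keys list` and `date -u +%s`)
key_created="2024-01-01T00:00:00Z"
now_epoch=$(date -u -d "2025-01-01T00:00:00Z" +%s)   # assumes GNU date
created_epoch=$(date -u -d "$key_created" +%s)

# Age in whole days; 86400 seconds per day
age_days=$(( (now_epoch - created_epoch) / 86400 ))

if [ "$age_days" -gt 90 ]; then
  echo "rotate"   # key is older than the 90-day threshold
else
  echo "keep"
fi
```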
Required Permissions:

The service account needs these specific BigQuery permissions:

- `bigquery.datasets.get`
- `bigquery.jobs.create`
- `bigquery.routines.*` (create, delete, get, list, update)
- `bigquery.tables.*` (create, delete, get, getData, list, update, updateData)

If using a staging bucket, also grant:

- `storage.objects.create`
- `storage.objects.delete`
- `storage.objects.get`
- `storage.objects.list`
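When building a custom IAM role, it helps to diff its permission list against the required set. This is a minimal shell sketch of that comparison: both lists are hard-coded fixtures (the required list is abbreviated), and in practice the granted list would come from `gcloud iam roles describe`.

```shell
# Fixtures: a (shortened) required permission set and a role that is missing one
required="bigquery.datasets.get bigquery.jobs.create bigquery.tables.getData"
granted="bigquery.datasets.get bigquery.jobs.create"

# Collect every required permission not present in the granted list
missing=""
for p in $required; do
  case " $granted " in
    *" $p "*) ;;                       # found, nothing to do
    *) missing="$missing $p" ;;        # not granted, record it
  esac
done
echo "missing:$missing"   # prints "missing: bigquery.tables.getData"
```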
VPC Service Controls
VPC Service Controls create security perimeters around GCP resources to prevent data exfiltration and unauthorized access.

If your organization uses VPC Service Controls to secure GCP resources, you need to implement the following steps for Data Connect to work:
1. Create a GCS Staging Bucket outside the perimeter:

   ```shell
   # Create bucket in the same region as your BigQuery dataset
   gsutil mb -p your-project-id -l us gs://csq-staging-bucket
   # Grant Contentsquare's service account access
   gsutil iam ch serviceAccount:heap-sql@heap-204122.iam.gserviceaccount.com:objectAdmin \
     gs://csq-staging-bucket
   ```

   The staging bucket must be accessible from both Contentsquare (outside your perimeter) and your BigQuery dataset (inside your perimeter).

2. Create a perimeter bridge to allow data flow from the staging bucket to BigQuery:

   ```shell
   # Get your perimeter name
   gcloud access-context-manager perimeters list --policy=YOUR_POLICY_ID
   # Create bridge configuration
   gcloud access-context-manager perimeters update YOUR_PERIMETER_NAME \
     --set-egress-policies=egress-policy.yaml \
     --policy=YOUR_POLICY_ID
   ```

   Example `egress-policy.yaml`:

   ```yaml
   - egressFrom:
       identities:
       - serviceAccount:heap-sql@heap-204122.iam.gserviceaccount.com
     egressTo:
       resources:
       - projects/YOUR_PROJECT_NUMBER
       operations:
       - serviceName: storage.googleapis.com
         methodSelectors:
         - method: "*"
   ```

3. During Data Connect setup, provide Contentsquare with:

   - Project ID
   - BigQuery dataset name
   - Staging bucket name: `gs://csq-staging-bucket`
   - Region
   - VPC perimeter name (for support reference)

4. Once the setup is complete, verify data is flowing correctly:

   ```sql
   -- Check recent sync activity
   SELECT table_name, TIMESTAMP_MILLIS(creation_time) AS created
   FROM `your-project.main_production.__TABLES__`
   ORDER BY creation_time DESC
   LIMIT 10
   ```
If syncs fail with VPC-related errors, follow these steps to troubleshoot:
- Verify the staging bucket is not inside the VPC perimeter
- Check that the perimeter bridge includes the staging bucket
- Confirm egress policies allow Contentsquare's service account
- Review VPC Service Controls audit logs for denied requests
- Ensure the staging bucket and BigQuery dataset are in the same region
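The last check (matching regions) is easy to get wrong because locations are reported with inconsistent casing (for example `US` vs `us`). Here is a shell sketch of a case-insensitive comparison; the two location values are fixtures, and in practice they would come from `gsutil ls -L -b` for the bucket and `bq show` for the dataset.

```shell
# Fixtures: locations as the two tools might report them (real values come
# from `gsutil ls -L -b gs://bucket` and `bq show --format=prettyjson dataset`)
bucket_location="US"
dataset_location="us"

# Normalize casing before comparing
b=$(echo "$bucket_location" | tr '[:upper:]' '[:lower:]')
d=$(echo "$dataset_location" | tr '[:upper:]' '[:lower:]')

if [ "$b" = "$d" ]; then
  echo "regions match"
else
  echo "region mismatch"
fi
```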
Common error messages include:

- "Request violates VPC Service Controls": Perimeter bridge not configured correctly
- "Bucket is inside VPC perimeter": Staging bucket must be outside the perimeter
- "Permission denied": Service account lacks necessary IAM permissions