BigQuery

Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. The Data Connect integration with BigQuery allows you to automatically sync your Contentsquare data to BigQuery for advanced analysis and joining with other business data.

Before setting up the BigQuery integration, ensure you:

  1. Have a Google Cloud Platform (GCP) project.
  2. Enable billing in the GCP project.
  3. Enable the BigQuery API.
  4. Know which region you want to use: us, eu, europe-west1, europe-west2, europe-west4, europe-west6, us-central1, us-west1, us-west2, us-east4, or australia-southeast1.
  5. (Optional) Decide on a name for your dataset (default is project_environment).

These prerequisites are also outlined in the Google Cloud Platform quick-start guide.
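If the BigQuery API still needs enabling (step 3), it can be switched on from the command line; a minimal sketch, where `your-project-id` is a placeholder for your own project:

```shell
# Enable the BigQuery API for the project (project ID is a placeholder)
gcloud services enable bigquery.googleapis.com --project=your-project-id

# Confirm it now appears in the list of enabled services
gcloud services list --enabled --project=your-project-id | grep bigquery
```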

Within the GCP dashboard for your selected project, visit IAM & admin settings and click + Add.

In the subsequent view, add heap-sql@heap-204122.iam.gserviceaccount.com with the BigQuery User role and save the new permission.

We would prefer to be added as a BigQuery User per the steps above. At minimum, we need to be assigned the dataEditor role, and we need the following permissions:

Project Permissions

  • bigquery.datasets.get
  • bigquery.jobs.create

Dataset Permissions

  • bigquery.routines.create
  • bigquery.routines.delete
  • bigquery.routines.get
  • bigquery.routines.list
  • bigquery.routines.update
  • bigquery.tables.create
  • bigquery.tables.delete
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.list
  • bigquery.tables.update
  • bigquery.tables.updateData

See how to grant individual permissions to create a custom IAM role for Contentsquare.
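As a sketch, a custom role bundling these permissions could be created and granted from the command line; the role ID and project ID below are placeholders, and granting at the project level lets the dataset-level permissions inherit down:

```shell
# Hypothetical custom role bundling the permissions listed above
gcloud iam roles create contentsquareDataConnect \
  --project=your-project-id \
  --title="Contentsquare Data Connect" \
  --permissions="bigquery.datasets.get,bigquery.jobs.create,bigquery.routines.create,bigquery.routines.delete,bigquery.routines.get,bigquery.routines.list,bigquery.routines.update,bigquery.tables.create,bigquery.tables.delete,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.list,bigquery.tables.update,bigquery.tables.updateData"

# Grant the custom role to Contentsquare's service account
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:heap-sql@heap-204122.iam.gserviceaccount.com" \
  --role="projects/your-project-id/roles/contentsquareDataConnect"
```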

If required, you can use your own GCP service account and have Data Connect write to a bucket in your Cloud Storage. Contentsquare will authenticate as your service account to deliver files into your bucket and, if configured, load them into BigQuery. In this case, provide the Contentsquare team with the following:

  • Email associated with your GCP service account
  • Private key for this service account
  • Bucket name

If you have a Virtual Private Cloud:

  • Create a GCS staging bucket that allows ingress from outside the perimeter and share the bucket name with Contentsquare.
  • Set up a perimeter bridge that includes this staging bucket and the target BigQuery dataset.
  1. Log in to Contentsquare.

  2. Navigate to Analysis setup > Data Connect.

  3. Select Connect next to BigQuery.

  4. Enter your BigQuery hostname and select Next.

  5. Supply the required information:

    • Your Project ID, which you can find in the Project info section of your GCP project dashboard (make sure you're in the correct project).
    • Your region: us, eu, europe-west1, europe-west2, europe-west4, europe-west6, us-central1, us-west1, us-west2, us-east4, or australia-southeast1.
    • The dataset name override if you don't want the default. The default dataset name is project_environment. For example, the Main Production environment will default to a dataset name of main_production.
  6. Select Connect.
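After connecting, you can confirm the dataset was created and see which tables have synced; the project and dataset names below are placeholders following the default naming convention:

```sql
-- List synced tables in the Data Connect dataset
SELECT table_name, table_type
FROM `your-project.main_production.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name;
```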

Data Connect automatically partitions BigQuery tables by date to optimize query performance and reduce costs. Understanding how partitioning works helps you write more efficient queries.

How Data Connect Partitions Tables:

  • Event tables are partitioned by the time column
  • Partitions are created daily based on event timestamps
  • Each partition contains all events for that specific date

Benefits of Partitioning:

  • Reduced query costs: Queries that filter by date only scan relevant partitions
  • Improved performance: Less data to scan means faster query execution
  • Automatic pruning: BigQuery automatically eliminates unnecessary partitions

Writing Partition-Aware Queries:

-- Good: Uses partition filter on time column
SELECT user_id, event_id, time
FROM `your-project.main_production.page_viewed`
WHERE DATE(time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

-- Bad: Scans all partitions (expensive)
SELECT user_id, event_id, time
FROM `your-project.main_production.page_viewed`
WHERE user_id = 12345

Partition Pruning Tips:

  • Always include a WHERE clause on the time column when possible
  • Use DATE(time) for date-based filters
  • Combine date filters with other conditions for optimal performance
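Putting these tips together, a query that filters on both the partition column and another condition might look like the following; table and column names follow the examples above:

```sql
-- The date filter prunes partitions first; the user filter then
-- applies only to the data that remains
SELECT user_id, event_id, time
FROM `your-project.main_production.page_viewed`
WHERE DATE(time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  AND user_id = 12345;
```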

Check Partition Details:

-- View partition information
SELECT
  partition_id,
  total_rows,
  total_logical_bytes
FROM `your-project.main_production.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'page_viewed'
ORDER BY partition_id DESC
LIMIT 10

External Service Account Setup


In some cases, you may need to use your own Google Cloud Platform (GCP) service account for Data Connect instead of Contentsquare's default service account.

When to Use a Custom Service Account:

  • Your organization requires all data access to use internal service accounts
  • You need granular control over permissions and access logs
  • You have VPC Service Controls that restrict external service accounts
  • You want to route data through your own Cloud Storage bucket

Step-by-Step Configuration:

  1. Create a Service Account in your GCP project:

    gcloud iam service-accounts create contentsquare-connect \
      --display-name="Contentsquare Data Connect" \
      --project=your-project-id
  2. Grant Required Permissions to the service account:

    # BigQuery permissions
    gcloud projects add-iam-policy-binding your-project-id \
      --member="serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataEditor"

    gcloud projects add-iam-policy-binding your-project-id \
      --member="serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com" \
      --role="roles/bigquery.jobUser"
  3. Create a Cloud Storage Staging Bucket (if needed):

    gsutil mb -p your-project-id -l us gs://your-csq-staging-bucket

    # Grant service account access
    gsutil iam ch serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com:objectAdmin \
      gs://your-csq-staging-bucket
  4. Generate Service Account Key:

    gcloud iam service-accounts keys create contentsquare-key.json \
      --iam-account=contentsquare-connect@your-project-id.iam.gserviceaccount.com
  5. Provide Credentials to Contentsquare:

    • Service account email: contentsquare-connect@your-project-id.iam.gserviceaccount.com
    • Private key (from contentsquare-key.json)
    • Bucket name (if using staging bucket)

Security Considerations:

  • Rotate keys regularly: Generate new service account keys periodically
  • Limit permissions: Only grant the minimum required permissions
  • Monitor access: Enable audit logging for the service account
  • Secure key storage: Never commit service account keys to version control
  • Use IAM conditions: Apply conditional access policies where appropriate
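Key rotation can be scripted with gcloud; a sketch, using the service account created in the steps above (the key ID comes from the `keys list` output):

```shell
# List existing keys to find the old key's ID
gcloud iam service-accounts keys list \
  --iam-account=contentsquare-connect@your-project-id.iam.gserviceaccount.com

# Create a replacement key, share it with Contentsquare,
# then delete the old key by its ID
gcloud iam service-accounts keys create new-contentsquare-key.json \
  --iam-account=contentsquare-connect@your-project-id.iam.gserviceaccount.com
gcloud iam service-accounts keys delete OLD_KEY_ID \
  --iam-account=contentsquare-connect@your-project-id.iam.gserviceaccount.com
```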

Required Permissions:

The service account needs these specific BigQuery permissions:

  • bigquery.datasets.get
  • bigquery.jobs.create
  • bigquery.routines.* (create, delete, get, list, update)
  • bigquery.tables.* (create, delete, get, getData, list, update, updateData)

If using a staging bucket, also grant:

  • storage.objects.create
  • storage.objects.delete
  • storage.objects.get
  • storage.objects.list
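These object permissions are all included in the predefined `roles/storage.objectAdmin` role used earlier; granting it at the bucket level, rather than project-wide, keeps the scope minimal:

```shell
# Grant object admin on the staging bucket only (bucket name is a placeholder)
gsutil iam ch \
  serviceAccount:contentsquare-connect@your-project-id.iam.gserviceaccount.com:objectAdmin \
  gs://your-csq-staging-bucket
```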

VPC Service Controls create security perimeters around GCP resources to prevent data exfiltration and unauthorized access.

If your organization uses VPC Service Controls to secure GCP resources, you need to implement the following steps for Data Connect to work:

  1. Create a GCS Staging Bucket outside the perimeter:

    # Create bucket in the same region as your BigQuery dataset
    gsutil mb -p your-project-id -l us gs://csq-staging-bucket

    # Grant Contentsquare's service account access
    gsutil iam ch serviceAccount:heap-sql@heap-204122.iam.gserviceaccount.com:objectAdmin \
      gs://csq-staging-bucket

    The staging bucket must be accessible from both Contentsquare (outside your perimeter) and your BigQuery dataset (inside your perimeter).

  2. Create a perimeter bridge to allow data flow from the staging bucket to BigQuery:

    # Get your perimeter name
    gcloud access-context-manager perimeters list --policy=YOUR_POLICY_ID

    # Create bridge configuration
    gcloud access-context-manager perimeters update YOUR_PERIMETER_NAME \
      --set-egress-policies=egress-policy.yaml \
      --policy=YOUR_POLICY_ID

    Example egress-policy.yaml:

    - egressFrom:
        identities:
        - serviceAccount:heap-sql@heap-204122.iam.gserviceaccount.com
      egressTo:
        resources:
        - projects/YOUR_PROJECT_NUMBER
        operations:
        - serviceName: storage.googleapis.com
          methodSelectors:
          - method: "*"
  3. During Data Connect setup, provide Contentsquare with:

    • Project ID
    • BigQuery dataset name
    • Staging bucket name: gs://csq-staging-bucket
    • Region
    • VPC perimeter name (for support reference)
  4. Once the setup is complete, verify data is flowing correctly:

    -- Check recent sync activity
    SELECT table_name, TIMESTAMP_MILLIS(creation_time) as created
    FROM `your-project.main_production.__TABLES__`
    ORDER BY creation_time DESC
    LIMIT 10

If syncs fail with VPC-related errors, follow these steps to troubleshoot:

  1. Verify the staging bucket is not inside the VPC perimeter
  2. Check that the perimeter bridge includes the staging bucket
  3. Confirm egress policies allow Contentsquare's service account
  4. Review VPC Service Controls audit logs for denied requests
  5. Ensure the staging bucket and BigQuery dataset are in the same region
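For step 4, Cloud Audit Logs can be searched from the command line for recent denials; the log filter below is an approximation and may need adjusting for your environment:

```shell
# Look for recent VPC Service Controls denials (filter is an approximation)
gcloud logging read \
  'severity>=ERROR AND protoPayload.status.message:"VPC Service Controls"' \
  --project=your-project-id --limit=10 --format=json
```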

Common error messages include:

  • "Request violates VPC Service Controls": Perimeter bridge not configured correctly
  • "Bucket is inside VPC perimeter": Staging bucket must be outside the perimeter
  • "Permission denied": Service account lacks necessary IAM permissions