Databricks

Databricks is a unified analytics platform built on Apache Spark that combines data engineering, data science, and analytics workloads. The Data Connect integration allows you to automatically sync your Contentsquare data to Databricks for advanced analysis and machine learning.

Before setting up the Databricks integration, ensure you have:

  • Access to an AWS-hosted Databricks account that uses the Unity Catalog

To set up the integration:

  1. Log in to Contentsquare.

  2. Navigate to Analysis setup > Data Connect.

  3. Select Connect next to Databricks.

  4. Provide the following information:

    • Hostname: The ID of your Databricks account, which you can find in the account URL.
    • Path: The path of the SQL warehouse you are connecting to through this integration.
    • Catalog (optional): The catalog that this data should sync to; if left blank, the integration creates a new catalog.
    • Schema (optional): The schema that this data should sync to; if left blank, the integration creates a new schema.
    • Token: This is required to allow Data Connect to write to the schema. The token must be a Personal Access Token (PAT) rather than an OAuth Token.
  5. Select Next.
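
If you prefer to sync into an existing catalog and schema rather than letting the integration create them, you can create them ahead of time in Databricks SQL. This is a minimal sketch: the names `cs_data`, `data_connect`, and `data-connect-user` are placeholders for illustration, not values required by the integration, and the exact privileges your workspace needs may vary.

```sql
-- Placeholder catalog and schema for Data Connect to write into.
-- Replace the names with your own.
CREATE CATALOG IF NOT EXISTS cs_data;
CREATE SCHEMA IF NOT EXISTS cs_data.data_connect;

-- Allow the principal associated with your PAT to write to the schema
-- (principal name is a placeholder).
GRANT USE CATALOG ON CATALOG cs_data TO `data-connect-user`;
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA cs_data.data_connect TO `data-connect-user`;
```

The PAT you supply in step 4 must belong to a principal with write access to this schema.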

Once setup is complete, the first sync runs within 24 hours and creates the following built-in tables:

  • Pageviews
  • Sessions
  • Users
  • user_migrations
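
Once the tables have synced, you can query them from a Databricks notebook or the SQL editor. A minimal example, assuming the data synced into a catalog and schema named `cs_data.data_connect` and that the pageviews table exposes `time` and `session_id` columns (names may differ in your workspace):

```sql
-- Daily pageview and session counts for the last 7 days
-- (table and column names are assumptions).
SELECT
  DATE(time) AS day,
  COUNT(*) AS pageviews,
  COUNT(DISTINCT session_id) AS sessions
FROM cs_data.data_connect.pageviews
WHERE time >= CURRENT_DATE - INTERVAL 7 DAYS
GROUP BY DATE(time)
ORDER BY day;
```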

You can create an all_events view in Databricks with a query like this one:

CREATE OR REPLACE VIEW all_events AS
SELECT
  event_id,
  time,
  user_id,
  session_id,
  'test_event_table' AS event_table_name
FROM
  `TEST_DB`.`TEST_SCHEMA`.`TEST_EVENT_TABLE`
UNION ALL
SELECT
  event_id,
  time,
  user_id,
  session_id,
  'click_event_table' AS event_table_name
FROM
  `TEST_DB`.`TEST_SCHEMA`.`CLICK_EVENT_TABLE`
UNION ALL
SELECT
  event_id,
  time,
  user_id,
  session_id,
  'pageview_event_table' AS event_table_name
FROM
  `TEST_DB`.`TEST_SCHEMA`.`PAGEVIEW_EVENT_TABLE`
Keep the following beta limitations in mind:

  • The All Events table is not synced to Databricks. As a workaround, you can create your own all_events view.
  • Syncing defined properties is not supported during beta.
  • Syncing segments is not supported during beta.