Databricks
Databricks is a unified analytics platform built on Apache Spark that combines data engineering, data science, and analytics workloads. The Data Connect integration allows you to automatically sync your Contentsquare data to Databricks for advanced analysis and machine learning.
Prerequisites
Before setting up the Databricks integration, ensure you have:
- Access to an AWS-hosted Databricks account that uses Unity Catalog
Configure Data Connect
- Log in to Contentsquare.
- Navigate to Analysis setup > Data Connect.
- Select Connect next to Databricks.
- Provide the following information:
  - Hostname: The ID of your Databricks account, which you can find in the account URL.
  - Path: The path of the warehouse you are connecting via this integration.
  - Catalog: The catalog that this data should sync to; if left blank, this integration will create a new catalog.
  - Schema (optional): The schema that this data should sync to; if left blank, this integration will create a new schema.
  - Token: This is required to allow Data Connect to write to the schema. The token must be a Personal Access Token (PAT) rather than an OAuth Token.
- Select Next.
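Before entering these values in the form, it can help to sanity-check them locally. The following is a minimal sketch and not part of Data Connect: `DatabricksSettings` and `validate` are hypothetical names, and the OAuth check rests on the assumption that OAuth access tokens are usually JWTs (starting with `eyJ` and containing two dots), which is a heuristic rather than a documented rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatabricksSettings:
    """Hypothetical container mirroring the fields of the Data Connect form."""
    hostname: str
    path: str
    token: str
    catalog: str = ""  # blank: the integration creates a new catalog
    schema: str = ""   # optional; blank: the integration creates a new schema


def validate(settings: DatabricksSettings) -> List[str]:
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    for name in ("hostname", "path", "token"):
        if not getattr(settings, name).strip():
            problems.append(f"{name} is required")
    token = settings.token.strip()
    # Assumption: a JWT-shaped value is probably an OAuth token, not a PAT.
    if token.startswith("eyJ") and token.count(".") == 2:
        problems.append("token looks like an OAuth token; use a Personal Access Token")
    return problems
```

An empty result only means the values pass these basic checks; it does not confirm that the token can actually write to the schema.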
Once setup is complete, you’ll see a sync within 24 hours with the following built-in tables:
- Pageviews
- Sessions
- Users
- user_migrations
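To confirm that the first sync has landed, you could compare the table names in the target schema against the list above. This is a rough sketch: `missing_tables` is a hypothetical helper, the lowercase table names are an assumption about how the tables are named in Databricks, and how you obtain the list of existing tables (for example, from a `SHOW TABLES` query) is left to you.

```python
# Built-in tables listed above; lowercase spelling is an assumption.
EXPECTED_TABLES = ("pageviews", "sessions", "users", "user_migrations")


def missing_tables(found_tables):
    """Return the expected built-in tables absent from found_tables (case-insensitive)."""
    found = {name.lower() for name in found_tables}
    return [name for name in EXPECTED_TABLES if name not in found]
```

For example, `missing_tables(["Pageviews", "SESSIONS"])` returns `["users", "user_migrations"]`.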
You can create an all_events view in Databricks with a query like this one:
    -- Databricks SQL quotes identifiers with backticks; replace the placeholder names with your own.
    SELECT event_id, time, user_id, session_id, 'test_event_table' AS event_table_name
    FROM `TEST_DB`.`TEST_SCHEMA`.`TEST_EVENT_TABLE`
    UNION
    SELECT event_id, time, user_id, session_id, 'click_event_table' AS event_table_name
    FROM `SCHEMA`.`CLICK_EVENT_TABLE`
    UNION
    SELECT event_id, time, user_id, session_id, 'pageview_event_table' AS event_table_name
    FROM `SCHEMA`.`PAGEVIEW_EVENT_TABLE`
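If you have many event tables, writing the union by hand gets tedious. The sketch below generates the statement from a list of tables and wraps it in `CREATE OR REPLACE VIEW`. It is one possible approach, not an official tool: `build_all_events_view` is a hypothetical helper, the column list mirrors the query above, and identifiers are backtick-quoted on the assumption that the statement will run in Databricks SQL.

```python
import re

# Only plain identifiers are accepted, which sidesteps quoting and escaping issues.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def _quote(identifier):
    """Backtick-quote a plain identifier, rejecting anything unexpected."""
    if not _IDENTIFIER.match(identifier):
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return f"`{identifier}`"


def build_all_events_view(view_name, event_tables):
    """Build a CREATE OR REPLACE VIEW statement that unions the given event tables.

    event_tables is a list of (catalog, schema, table) tuples.
    """
    if not event_tables:
        raise ValueError("at least one event table is required")
    selects = []
    for catalog, schema, table in event_tables:
        full_name = ".".join(_quote(part) for part in (catalog, schema, table))
        selects.append(
            "SELECT event_id, time, user_id, session_id, "
            f"'{table.lower()}' AS event_table_name\nFROM {full_name}"
        )
    body = "\nUNION\n".join(selects)
    return f"CREATE OR REPLACE VIEW {_quote(view_name)} AS\n{body}"
```

Calling `build_all_events_view("all_events", [("my_catalog", "my_schema", "CLICK_EVENT_TABLE")])` returns a statement you can review and then run in a Databricks SQL editor; the catalog and schema names here are placeholders.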
Limitations
- The All Events table is not synced to Databricks. As a workaround, you can create your own all_events view.
- Defined properties syncing is not supported during beta.
- Segments syncing is not supported during beta.