Data Schema
This guide explains the schema structure of Contentsquare data in your data warehouse, helping you understand how to effectively query and analyze the data.
Data Connect creates several tables in your data warehouse, each representing a different aspect of user behavior data. Understanding this schema is essential for effective analysis and joining with other data sources.
Data Connect adopts a hierarchical schema. By default, Data Connect will sync the following tables:
- One users table (for all user information)
- One sessions table (for all session information)
- One pageviews table (for all pageview information)
Data Connect creates the following core tables in your data warehouse.
users
Section titled usersThe users
table contains a row for each unique user that has visited your site.
Column Name | Type | Description |
---|---|---|
user_id | BIGINT | Unique ID of user, randomly generated by Contentsquare (CSQ). |
identity | TEXT | User’s username or other unique token, passed identify. Must be unique. |
handle | TEXT | User’s username or other unique token, passed identify. |
email | TEXT | User’s email address, passed in via addUserProperties API. |
joindate | TIMESTAMP | Timestamp without time zone of when the user was first seen. In S3, this is in UNIX timestamp format. |
last_modified | TIMESTAMP | Timestamp without time zone of when the user’s data was last modified. In S3, this is in UNIX timestamp format. |
user properties | TEXT | There will be one column for every unique user property you’ve sent via the addUserProperties API (name, gender, account status, etc). The column type is automatically inferred from the underlying property values. |
sessions
Section titled sessionsThe sessions
table represents individual user sessions.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | All | ID of associated session event, randomly generated by CSQ. |
user_id | BIGINT | All | ID of associated user, randomly generated by CSQ. |
session_id | BIGINT | All | ID of associated session, randomly generated by CSQ. |
time | TIMESTAMP | All | Timestamp without time zone of when session started. |
library | TEXT | All | Version of Contentsquare library which began the session. Can be one of “web” or “iOS”. |
platform | TEXT | Web | User’s operating system. |
device_type | TEXT | Web | Device type, which can only be one “Desktop”. |
country | TEXT | All | Country in which user session occurred, based on IP. |
region | TEXT | All | Region in which user session occurred, based on IP. |
city | TEXT | All | City in which user session occurred, based on IP. |
referrer | TEXT | Web | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
landing_page | TEXT | Web | URL of the first pageview of the session. |
browser | TEXT | Web | User’s browser. |
utm_source | TEXT | Web | GA-based utm_source tag associated with the session’s initial pageview. |
utm_campaign | TEXT | Web | GA-based utm_campaign tag associated with the session’s initial pageview. |
utm_medium | TEXT | Web | GA-based utm_medium tag associated with the session’s initial pageview. |
utm_term | TEXT | Web | GA-based utm_term tag associated with the session’s initial pageview. |
utm_content | TEXT | Web | GA-based utm_content tag associated with the session’s initial pageview. |
custom_var | TEXT | Web | key-value pairs that define more information about the pageview. |
session_replay_link | TEXT | Web | Contentsquare Platform url link of the session replay. |
session_duration | TEXT | Web | Total duration of the session. |
session_language | TEXT | Web | Language displayed during the session. |
session_number_of_views | TEXT | Web | The number of page views in the session. |
frustration_score | TEXT | Web | Measure of user frustration during the session. |
looping_Index | TEXT | Web | The proportion of same urls visited during the session (meaning the user is doing back & forth). |
page_consumption | TEXT | Web | Proportion of pages where the user spent less than 2s and did less than 2 clicks. |
filtered_APV | TEXT | Web | Number of views in which APVs (Artificial Pageviews) are not counted / a boolean determining whether a given page view is an Artificial Page view. |
pageviews
Section titled pageviewsThe pageviews
table contains detailed information about each pageview.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | All | ID of associated pageview event. |
user_id | BIGINT | All | Unique ID of associated user, randomly generated by CSQ. |
session_id | BIGINT | All | Unique ID of associated session, randomly generated by CSQ. |
time | TIMESTAMP | All | Timestamp without time zone of when the pageview occurred. |
library | TEXT | All | Version of Contentsquare library which started the session. Can be one of “web” or “iOS”. |
platform | TEXT | Web | User’s operating system. |
device_type | TEXT | Web | Device type, which can only be one “Desktop”. |
country | TEXT | Web | Country in which user session occurred, based on IP. |
region | TEXT | Web | Region in which user session occurred, based on IP. |
city | TEXT | Web | City in which user session occurred, based on IP. |
referrer | TEXT | Web | URL that linked to your site and started the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
landing_page | TEXT | Web | URL of the first pageview of the session. |
browser | TEXT | Web | User’s browser. |
utm_source | TEXT | Web | GA-based utm_source tag associated with the session’s initial pageview. |
utm_campaign | TEXT | Web | GA-based utm_campaign tag associated with the session’s initial pageview. |
utm_medium | TEXT | Web | GA-based utm_medium tag associated with the session’s initial pageview. |
utm_term | TEXT | Web | GA-based utm_term tag associated with the session’s initial pageview. |
utm_content | TEXT | Web | GA-based utm_content tag associated with the session’s initial pageview. |
path | TEXT | Web | The path of the pageview. |
query | TEXT | Web | The query parameters associated with the pageview. |
hash | TEXT | Web | The hash parameters associated with the pageview. |
title | TEXT | Web | Title of the current page. |
custom_var | TEXT | Web | key-value pairs that define more information about the pageview. |
dynamic_var | TEXT | Web | key-value pairs that define more information about the session. |
view_number | TEXT | Web | The pageview number in the session. |
is_first | TEXT | Web | If the page is the entry page then value is 1 otherwise value is 0. |
is_last | TEXT | Web | If the page is the exit page then value is 1, otherwise value is 0. |
scroll_rate | TEXT | Web | The portion of the page which was displayed (in percentage points). |
view_duration_msec | TEXT | Web | Duration between the page view event time (view_time) and the timestamp of the last interaction event (in milliseconds). |
first_input_delay | TEXT | Web | Delay (in milliseconds) between the first user interaction and when the web browser can handle it (part of Core Web Vitals). |
interaction_to_next_paint | TEXT | Web | Longest visual delay (in milliseconds) experienced by a user when interacting with the page (clicks or keystrokes). |
time_to_first_byte | TEXT | Web | Delay (in milliseconds) between the request for the web page and when the web browser received the first piece of data. |
largest_contentful_paint | TEXT | Web | Delay (in milliseconds) to render the largest piece of content (part of Core Web Vitals). |
cumulative_layout_shift | TEXT | Web | Score about the layout stability of the page rendering (part of Core Web Vitals). |
dom_interactive_after_msec | TEXT | Web | The time in milliseconds required before the user is able to interact with the page (although all resources may not be fully loaded yet). |
fully_loaded | TEXT | Web | The time in milliseconds required to fully load all the resources of the page, from initiation of the pageview to load completion in the browser. |
first_contentful_paint | TEXT | Web | Delay (in milliseconds) to render the first piece of content. |
start_render | TEXT | Web | Timestamp sent by tag when the rendering of the page started. |
window_height | TEXT | Web | Height of the window in which the page is opened. |
window_width | TEXT | Web | Width of the window in which the page is opened. |
session_replay_link | TEXT | Web | Contentsquare Platform url link of the session replay. |
events
Section titled eventsContentsquare will create one table for every labeled event you’ve created within Contentsquare and synced downstream. You currently don’t have access to the UI to self-create your event.
Column Name | Type | Libraries | Description |
---|---|---|---|
event_id | BIGINT | All | ID of the associated event, randomly generated by CSQ. |
user_id | BIGINT | All | Unique ID of the associated user, randomly generated by CSQ. |
session_id | BIGINT | All | Unique ID of the associated session, randomly generated by CSQ. |
pageview_id | BIGINT | All | Unique ID of the associated page, randomly generated by CSQ. |
time | TIMESTAMP | All | Timestamp without time zone of when the event happened. |
type | TEXT | All | For web autocaptured events, can be any of view page, click, submit, change, with push state events registered as view page events. For custom events, this will be the custom event name. |
library | TEXT | All | Version of Contentsquare library on which event occurred. Can be one of “web”, “iOS”, or “server”. |
platform | TEXT | Web | User’s operating system. |
device_type | TEXT | Web | Device type, which can only be one “Desktop”. |
country | TEXT | Web | Country in which user session occurred, based on IP. |
region | TEXT | Web | Region in which user session occurred, based on IP. |
city | TEXT | Web | City in which user session occurred, based on IP. |
referrer | TEXT | Web | URL that linked to your site and started the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
landing_page | TEXT | Web | URL of the first pageview of the session. |
landing_page_query | TEXT | Web | The query parameters of the first page of the user’s session. |
landing_page_hash | TEXT | Web | The hash route of the first page of the user’s session. |
browser | TEXT | Web | User’s browser. |
utm_source | TEXT | Web | GA-based utm_source tag associated with the session’s initial pageview. |
utm_campaign | TEXT | Web | GA-based utm_campaign tag associated with the session’s initial pageview. |
utm_medium | TEXT | Web | GA-based utm_medium tag associated with the session’s initial pageview. |
utm_term | TEXT | Web | GA-based utm_term tag associated with the session’s initial pageview. |
utm_content | TEXT | Web | GA-based utm_content tag associated with the session’s initial pageview. |
domain | TEXT | Web | Domain including subdomain, e.g. blog.company.io. |
path | TEXT | Web | Portion of the current URL following your domain, e.g. /docs for company.io/docs. |
hash | TEXT | Web | Portion of the current URL following the hash sign, e.g. #install for company.io/docs#install. |
query | TEXT | Web | Query params of the page’s current URL, e.g. ?utm_id=1234 for company.io?utm_id=1234. |
title | TEXT | Web | Title of the current page. |
href | TEXT | Web | href property of link (used for clicks on anchor tags). |
target_text | TEXT | Web, iOS | Button text of the event target. |
api_errors
Section titled api_errorsThe api_errors
table contains information about API errors detected during user sessions.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | Errors | ID of associated event, randomly generated. |
user_id | BIGINT | Errors | ID of associated user, randomly generated. |
session_id | BIGINT | Errors | ID of associated session, randomly generated. |
pageview_id | BIGINT | Frustration | Unique ID of the associated page, randomly generated by CSQ. |
time | TIMESTAMP | Errors | Timestamp without time zone of when the event happened. |
library | TEXT | Errors | Version of Contentsquare library which began the session. “web” or “iOS”. |
platform | TEXT | Errors | User’s operating system. |
device_type | TEXT | Errors | Device type, which can only be one “Desktop”. |
country | TEXT | Errors | Country in which user session occurred, based on IP. |
region | TEXT | Errors | Region in which user session occurred, based on IP. |
city | TEXT | Errors | City in which user session occurred, based on IP. |
landing_page | TEXT | Errors | URL of the first pageview of the session. |
api_error_duration | TEXT | Errors | Time taken to resolve API error. |
api_error_endpoint | TEXT | Errors | URL where API errors are reported. |
api_error_method | TEXT | Errors | Standardized communication of API error responses. |
api_error_status_code | TEXT | Errors | Numerical code identifying API error. |
errors_after_clicks | TEXT | Errors | Issue occurring post-user interaction (boolean). |
error_group_id | TEXT | Errors | Identifier for categorizing related errors. |
error_source | TEXT | Errors | Origin point of the reported error. |
type | TEXT | Errors | API error. |
referrer | TEXT | Errors | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
js_errors
Section titled js_errorsThe js_errors
table contains information about JavaScript errors detected during user sessions.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | Errors | ID of associated session event, randomly generated. |
user_id | BIGINT | Errors | ID of associated user, randomly generated. |
session_id | BIGINT | Errors | ID of associated session, randomly generated. |
pageview_id | BIGINT | Frustration | Unique ID of the associated page, randomly generated by CSQ. |
time | TIMESTAMP | Errors | Timestamp without time zone of when the event happened. |
library | TEXT | Errors | Version of Contentsquare library which began the session. “web” or “iOS”. |
platform | TEXT | Errors | User’s operating system. |
device_type | TEXT | Errors | Device type, which can only be one “Desktop”. |
country | TEXT | Errors | Country in which user session occurred, based on IP. |
region | TEXT | Errors | Region in which user session occurred, based on IP. |
city | TEXT | Errors | City in which user session occurred, based on IP. |
landing_page | TEXT | Errors | URL of the first pageview of the session. |
error_message | TEXT | Errors | Description of the encountered error. |
error_line_number | TEXT | Errors | Line in the code where the error occurred. |
js_error_file_name | TEXT | Errors | File name containing the JavaScript error. |
js_error_column_number | TEXT | Errors | Column in the file where the error occurred. |
errors_after_clicks | TEXT | Errors | Issue occurring post-user interaction (boolean). |
error_group_id | TEXT | Errors | Identifier for categorizing related errors. |
error_source | TEXT | Errors | Origin point of the reported error. |
type | TEXT | Errors | JS error. |
referrer | TEXT | Errors | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
custom_errors
Section titled custom_errorsThe custom_errors
table contains information about custom errors detected during user sessions.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | Error | ID of associated session event, randomly generated. |
user_id | BIGINT | Error | ID of associated user, randomly generated. |
session_id | BIGINT | Error | ID of associated session, randomly generated. |
pageview_id | BIGINT | Error | Unique ID of the associated page, randomly generated by CSQ. |
time | TIMESTAMP | Error | Timestamp without time zone of when the event happened. |
library | TEXT | Error | Version of Contentsquare library which began the session. “web” or “iOS”. |
platform | TEXT | Error | User’s operating system. |
device_type | TEXT | Error | Device type, which can only be one “Desktop”. |
country | TEXT | Error | Country in which user session occurred, based on IP. |
region | TEXT | Error | Region in which user session occurred, based on IP. |
city | TEXT | Error | City in which user session occurred, based on IP. |
landing_page | TEXT | Error | URL of the first pageview of the session. |
error_message | TEXT | Error | Description of the encountered error. |
errors_after_clicks | TEXT | Error | Issue occurring post-user interaction (boolean). |
error_group_id | TEXT | Error | Identifier for categorizing related errors. |
error_source | TEXT | Error | Origin point of the reported error. |
type | TEXT | Error | Custom Error. |
referrer | TEXT | Error | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
frustration_factors
Section titled frustration_factorsThe following data schema table will be the same for every frustration factor (rage click, excessive hovering, multiple use element, multiple field interactions, multiple button interactions, loading time high and low activity).
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | Frustration | ID of the associated event, randomly generated by Contentsquare. |
user_id | BIGINT | Frustration | Unique ID of the associated user, randomly generated by Contentsquare. |
session_id | BIGINT | Frustration | Unique ID of the associated session, randomly generated by Contentsquare. |
pageview_id | BIGINT | Frustration | Unique ID of the associated page, randomly generated by Contentsquare. |
time | TIMESTAMP | Frustration | Timestamp without time zone of when the event happened. |
library | TEXT | Frustration | Frustration |
platform | TEXT | Frustration | User’s operating system. |
device_type | TEXT | Frustration | Device type, which can only be one “Desktop”. |
country | TEXT | Frustration | Country in which user session occurred, based on IP. |
region | TEXT | Frustration | Region in which user session occurred, based on IP. |
city | TEXT | Frustration | City in which user session occurred, based on IP. |
landing_page | TEXT | Frustration | URL of the first pageview of the session. |
type | TEXT | Frustration | rage click |
referrer | TEXT | Frustration | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
domain | TEXT | Frustration | Domain including subdomain, e.g. http://blog.company.io ↗. |
path | TEXT | Frustration | Portion of the current URL following your domain, e.g. /docs for company.io/docs. |
hash | TEXT | Frustration | Portion of the current URL following the hash sign, e.g. #install for company.io/docs#install. |
query | TEXT | Frustration | Query params of the page’s current URL, e.g. ?utm_id=1234 for http://company.io?utm_id=1234 ↗. |
target_text | TEXT | Frustration | The text displayed on the button clicked. |
frustration_score | TEXT | Frustration | Frustration score of the pageview. |
target_path | TEXT | Frustration | Target path of the HTML component where the event happens. |
relative_time | INT | Frustration | The point in time when the event happens. |
value | INT | Frustration | The number of clicks. |
ecommerce_transactions
Section titled ecommerce_transactionsThe ecommerce_transactions
table contains information about e-commerce transactions.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | Contentsquare | ID of associated session event, randomly generated. |
user_id | BIGINT | Contentsquare | ID of associated user, randomly generated. |
session_id | BIGINT | Contentsquare | ID of associated session, randomly generated. |
time | TIMESTAMP | Contentsquare | Timestamp without time zone of when the pageview occurred. |
library | TEXT | Contentsquare | Version of Contentsquare library which began the session. “web” or “iOS”. |
platform | TEXT | Contentsquare | User’s operating system. |
country | TEXT | Contentsquare | Country in which user session occurred, based on IP. |
region | TEXT | Contentsquare | Region in which user session occurred, based on IP. |
city | TEXT | Contentsquare | City in which user session occurred, based on IP. |
referrer | TEXT | Contentsquare | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as direct in the UI. |
type | TEXT | Contentsquare | E-commerce transaction. |
device_type | TEXT | Contentsquare | Device type, which can only be one “Desktop”. |
transaction_amount | TEXT | Contentsquare | The total monetary value of a purchase or payment made by a user during a session. |
transaction_currency | TEXT | Contentsquare | The currency in which the transaction amount was processed. |
all_events
Section titled all_eventsTable Does Not Sync For S3. The all_events
table does not sync for S3. You can create an all_events
view of all synced tables after the data synced to S3 has gone through ETL processing.
Column Name | Type | Description |
---|---|---|
event_id | BIGINT | ID of associated event, randomly generated by Contentsquare. |
user_id | BIGINT | Unique ID of associated user, randomly generated by Contentsquare. |
session_id | BIGINT | Unique ID of associated session, randomly generated by Contentsquare. |
time | TIMESTAMP | Timestamp without time zone of when event happened. |
event_table_name | TEXT | Name of the table where similar events can be found. |
_sync_history
Section titled _sync_historyContentsquare creates a table called _sync_history
that contains a record for every attempted sync. This table is helpful for monitoring sync performance and scheduling dependent jobs. All timestamps are in UTC for Redshift and BigQuery and in local time for Snowflake.
Column Name | Type | Description |
---|---|---|
status | STRING | Takes value running/succeeded/failed based on the outcome of the sync attempt. |
start_time | TIMESTAMP | Start time of the sync attempt. |
finish_time | TIMESTAMP | Finish time of the sync attempt. |
error | STRING | Error message if sync fails. |
next_scheduled_sync_at | TIMESTAMP | Time of the next scheduled sync (estimated at the time of sync based on sync frequency settings). |
_sync_info
Section titled _sync_infoContentsquare creates a table called _sync_info
that contains metadata around the sync process and last updated times.
Column Name | Type | Description |
---|---|---|
event_table_name | TEXT | The event name. |
sync_started | TIMESTAMP | A timestamp for when that table began syncing. |
sync_ended | TIMESTAMP | A timestamp for when that table completed syncing. |
synced_to_time | TIMESTAMP | A timestamp that reflects the most recent occurrence of the event. |
inserted_row_count | BIGINT | The number of rows inserted during the most recent sync. |
user_migrations
Section titled user_migrationsWhen a user is identified, a migration occurs to aggregate the data under the new identity. These migrations are recorded in this table. Time is a column for Redshift and S3 destinations only. Snowflake and BigQuery do not have a column for time.
Column Name | Type | Description |
---|---|---|
from_user_id | BIGINT | The migrating user’s ID. |
to_user_id | BIGINT | The destination user’s ID. |
time | TIMESTAMP | A timestamp for when the migration occurred. |
event_metadata
Section titled event_metadataThis table contains metadata about the names of events, which allows you to build automated jobs that use this metadata. This table is re-created on each sync to stay updated with the latest event metadata.
Column Name | Type | Description |
---|---|---|
table_name | TEXT | For each event, this row contains the sanitized name of the table containing the event’s data synced to your warehouse. |
ui_name | TEXT | For each event, this row contains the name of the event in Contentsquare. |
last_updated_at | TIMESTAMP | Time event label was last updated. |
last_updated_by | TEXT | User who last updated event label. |
notes | TEXT | Notes attached to the event. |
source | TEXT | Data capture source (ex: web, ios, android). |
verified_at | TIMESTAMP | Time event was verified. |
verified_by | TEXT | User who verified the event. |
The event_metadata
table shows up differently for S3. You can query it as follows:
{ "ui_name": "Stripe charge", "table_name": "stripe_charge", "source": "stripe", "last_updated_at": "1609459200000", "last_updated_by": "sales@contentsquare.com", "notes": "This is a note", "verified_at": "1640995200000", "verified_by": "admin@contentsquare.com"}