---
title: Troubleshooting - Data Connect
description: Troubleshooting Data Connect
lastUpdated: 07 April 2026
source_url:
  html: https://docs.contentsquare.com/en/connect/troubleshooting/
  md: https://docs.contentsquare.com/en/connect/troubleshooting/index.md
---

> Documentation index: https://docs.contentsquare.com/llms.txt
> Use this file to discover all available pages before exploring further.

Running into unexpected data or performance issues with Data Connect? Find answers to common data questions — from duplicate events and missing user properties to query optimization tips — so you can troubleshoot quickly and get back to your analysis.

## Why are there duplicate event IDs in the All Events view in my data warehouse?

Some duplicate pairs (`user_id`, `event_id`) can be expected in the `all_events` table, as long as the `event_table_name` value is different for the rows that share an event ID.

If you have two event definitions that overlap, it's possible for a single event with a unique event ID to qualify for multiple event definitions.

Let's say you have two click event definitions: one that captures every click on your platform and one that captures clicks on a specific element. A click event that qualifies for the first event will automatically qualify for the second event because the first event captures any click.

If both event definitions are synced to Data Connect, they will each have their own event table downstream and a single event that qualifies for both definitions will appear on both tables with the same event ID. An event will always have just one event ID, even if it qualifies for multiple event definitions (and by extension, is included on multiple event tables in your data warehouse).

The `all_events` table is a **UNION** of every individual event table that has been synced to your data warehouse. If both individual event tables include the same event, then the `all_events` table will show duplicate event IDs. However, these two rows will have different `event_table_name` values.

The primary key for `all_events` should be a composite of `user_id`, `event_id`, and `event_table_name`. In individual event tables, a composite of `user_id` and `event_id` should suffice.

## Why don't I see initial user properties in Data Connect?

We don't currently sync Initial User properties downstream in Data Connect (such as Initial Marketing Channel, or Initial Browser). Refer to [how to recreate those properties in your warehouse](https://docs.contentsquare.com/en/connect/common-queries/#creating-an-enhanced-users-table).

## Why is a given `event_id` included more than once on the `all_events` table?

This FAQ applies to using Data Connect for warehouses (Redshift, BigQuery, Snowflake) only.

A given `event_id` across all tables corresponds with a unique event recorded by Contentsquare. That said, the same `event_id` may exist in multiple event tables, as Contentsquare provides the flexibility to create multiple event definitions that may correspond to the same raw event. For instance, you may define the following two events in the product:

* **Click CTA** defined as Click on .cta
* **Click CTA - Homepage** defined as Click on .cta with a filter where Path equals /

If a user clicks the CTA on the homepage, a new event will be recorded on both event tables that correspond with both event definitions. As a result, two events with the same `event_id` will be included in the `all_events` table, as it contains every recorded instance of all defined and custom events.

Additionally, pageviews can share an `event_id` with a *defined* event. The defined event with a given `event_id` will have a unique time, after excluding where `event_name = 'pageviews'`.

## Table constraints are not enforced

Data Connect defines primary and foreign key constraints in the table schema, but no supported warehouse (Redshift, BigQuery, Snowflake) enforces them. Constraints exist for documentation and query optimization only — data integrity is your responsibility.

To avoid duplicate records:

* Use `DISTINCT` or window functions for deduplication in queries
* Implement deduplication logic in your ETL pipelines
* Consider creating views with built-in deduplication for commonly used datasets

## Query performance issues

If you're experiencing slow query performance when working with Data Connect data, consider these optimization strategies:

### General optimization tips

1. **Filter early**: Apply date range filters and other conditions as early as possible in your queries

   ```sql
   -- Good: Filter before joining
   SELECT *
   FROM sessions
   WHERE time >= DATEADD('day', -30, GETDATE())
   ```

2. **Select specific columns**: Avoid `SELECT *` and only query the columns you need

   ```sql
   -- Good: Select only needed columns
   SELECT user_id, session_id, time, device_type
   FROM sessions
   ```

3. **Use appropriate joins**: Choose the right join type (INNER, LEFT, RIGHT) based on your analysis needs

4. **Limit result sets**: Use `LIMIT` for exploratory queries

   ```sql
   SELECT * FROM all_events
   WHERE time >= DATEADD('day', -1, GETDATE())
   LIMIT 1000
   ```

5. **Use `EXPLAIN` to understand query plans**: Before optimizing, review the execution plan to identify bottlenecks like full table scans or expensive joins

6. **Create materialized views for frequent queries**: Cache results of commonly-run analytical queries to avoid recomputation

   ```sql
   CREATE MATERIALIZED VIEW daily_metrics AS
   SELECT DATE(time) as day, COUNT(*) as event_count
   FROM all_events
   GROUP BY DATE(time);
   ```

### Warehouse-specific optimization

**Redshift:**

* Use distribution keys and sort keys for frequently queried tables
* Run `VACUUM` and `ANALYZE` regularly on Connect tables
* Consider creating materialized views for complex, frequently-run queries

**BigQuery:**

* Take advantage of table partitioning (Data Connect tables are partitioned by date)
* Use approximate aggregation functions like `APPROX_COUNT_DISTINCT()` for large datasets
* Consider clustering keys for frequently filtered columns

**Snowflake:**

* Configure warehouse size appropriately for your workload
* Use Snowflake's automatic clustering or define clustering keys
* Set up automatic warehouse suspension to control costs
* Create materialized views for frequently-run analytical queries

**S3:**

* Optimize your ETL process to create partitioned Parquet or ORC files
* Use AWS Athena or EMR with proper partition pruning
* Consider using AWS Glue for schema management and cataloging

### Common query patterns to optimize

**Large table scans:**

```sql
-- Instead of scanning all events
SELECT COUNT(*) FROM all_events


-- Filter to specific time range and event types
SELECT COUNT(*)
FROM all_events
WHERE time >= DATEADD('month', -1, GETDATE())
  AND event_table_name IN ('page_view', 'button_click')
```

**Expensive joins:**

```sql
-- Instead of joining all sessions with all events
SELECT *
FROM sessions s
INNER JOIN all_events e ON s.session_id = e.session_id


-- Filter both tables first, then join
WITH recent_sessions AS (
  SELECT * FROM sessions
  WHERE time >= DATEADD('day', -7, GETDATE())
),
recent_events AS (
  SELECT * FROM all_events
  WHERE time >= DATEADD('day', -7, GETDATE())
)
SELECT *
FROM recent_sessions s
INNER JOIN recent_events e
  ON s.session_id = e.session_id
  AND s.user_id = e.user_id
```
