Managing Data Syncing

This guide explains how Data Connect syncs data to your data warehouse and how to effectively manage this process.

Data Connect uses a reliable ETL (Extract, Transform, Load) process to move data from Heap to your data warehouse. Understanding this process makes it easier to manage your data pipeline and troubleshoot issues.

Data Connect offers different sync frequency options depending on your plan:

  • Daily Sync: All plans include daily data syncs
  • Hourly Sync: Enterprise plans can enable hourly syncs for more frequent data updates
  • Custom Schedules: Enterprise plans can work with Heap to establish custom sync schedules

The sync frequency affects how current your data is in the warehouse. More frequent syncs provide fresher data but may increase warehouse compute costs.
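To make the freshness trade-off concrete, here is a minimal Python sketch that estimates the worst-case age of warehouse data for a given schedule. It assumes syncs start at fixed intervals and take a roughly constant time to finish. The function names and the 30-minute duration are hypothetical illustrations, not values reported by Heap.

```python
from datetime import timedelta


def worst_case_staleness(sync_interval: timedelta, sync_duration: timedelta) -> timedelta:
    """Upper bound on the age of the newest data in the warehouse.

    Assumes syncs start on a fixed schedule: an event captured just after one
    sync's cutoff waits a full interval for the next sync to start, plus the
    time that sync takes to finish loading.
    """
    return sync_interval + sync_duration


def syncs_per_day(sync_interval: timedelta) -> float:
    # Each run uses warehouse compute, so cost grows roughly with this number.
    return timedelta(days=1) / sync_interval


# The 30-minute sync duration is a made-up figure for illustration.
duration = timedelta(minutes=30)
print(worst_case_staleness(timedelta(days=1), duration))   # 1 day, 0:30:00
print(worst_case_staleness(timedelta(hours=1), duration))  # 1:30:00
print(syncs_per_day(timedelta(hours=1)))                   # 24.0
```

Moving from daily to hourly syncs cuts the worst-case staleness from about a day to under two hours in this model, at the cost of roughly 24 times as many sync runs.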

Initial Sync vs. Incremental Updates

Data Connect uses two types of data syncs: an initial sync and incremental updates.

When you first set up Data Connect or add a new table, an initial sync copies all historical data from Heap to your warehouse. This process:

  • Creates the necessary tables and views
  • Copies all historical data
  • May take several hours to days depending on data volume
  • Runs only once per table (unless a full resync is needed)

After the initial sync, Data Connect performs incremental updates during each sync window. These updates:

  • Transfer only data that is new or changed since the last sync
  • Are much faster than initial syncs
  • Maintain data consistency while minimizing warehouse load
  • Run on your configured schedule (daily, hourly, etc.)
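The difference between an initial sync and an incremental update can be illustrated with a common watermark pattern. The Python sketch below is a generic illustration, not Heap's actual implementation: it assumes each source row carries an `updated_at` timestamp, and `Row`, `SyncState`, and `sync` are hypothetical names. The first run copies everything; later runs copy only rows newer than the highest timestamp already copied and upsert them by ID.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Row:
    id: int
    value: str
    updated_at: datetime


@dataclass
class SyncState:
    # Highest updated_at copied so far; None means no sync has completed yet.
    watermark: Optional[datetime] = None
    # Stand-in for a warehouse table, keyed by row ID.
    warehouse: Dict[int, Row] = field(default_factory=dict)


def sync(source: List[Row], state: SyncState) -> int:
    """Copy new or changed rows into the warehouse; return how many were copied."""
    if state.watermark is None:
        batch = list(source)  # initial sync: all historical rows
    else:
        batch = [r for r in source if r.updated_at > state.watermark]  # incremental
    for row in batch:
        state.warehouse[row.id] = row  # upsert so changed rows replace old versions
    if batch:
        state.watermark = max(r.updated_at for r in batch)
    return len(batch)


t0 = datetime(2024, 1, 1)
source = [Row(1, "a", t0), Row(2, "b", t0.replace(hour=1))]
state = SyncState()
print(sync(source, state))  # 2: initial sync copies both rows

source.append(Row(3, "c", t0.replace(hour=2)))  # new row
source[0] = Row(1, "a2", t0.replace(hour=3))    # changed row
print(sync(source, state))  # 2: only the new and the changed row
print(sync(source, state))  # 0: nothing changed since the last run
```

A strict `>` comparison can miss rows that arrive late with a timestamp at or below the watermark, so real pipelines usually add an overlap window or read from a change log instead.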

You can monitor the status of your Data Connect syncs through:

  • The Data Connect dashboard in the Heap UI
  • Email notifications for failed syncs (if configured)
  • Warehouse query logs showing Data Connect activity

It’s good practice to regularly verify that syncs are completing successfully, especially after making changes to your Heap implementation.
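If you collect sync outcomes somewhere you can query (for example, from failure notifications or your warehouse's query logs), a small check like the following can flag problems automatically. This is a hypothetical Python sketch: the `SyncRun` record, its fields, the table names, and the 26-hour threshold are assumptions, not a Heap API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class SyncRun:
    table: str
    finished_at: datetime
    succeeded: bool


def check_syncs(runs: List[SyncRun], now: datetime, max_age: timedelta) -> List[str]:
    """Warn about tables whose latest run failed or finished too long ago."""
    latest: Dict[str, SyncRun] = {}
    for run in runs:
        prev = latest.get(run.table)
        if prev is None or run.finished_at > prev.finished_at:
            latest[run.table] = run

    warnings = []
    for table, run in sorted(latest.items()):
        if not run.succeeded:
            warnings.append(f"{table}: latest sync failed at {run.finished_at}")
        elif now - run.finished_at > max_age:
            warnings.append(f"{table}: last successful sync was at {run.finished_at}")
    return warnings


now = datetime(2024, 1, 2, 12, 0)
runs = [
    SyncRun("pageviews", datetime(2024, 1, 2, 6, 0), True),
    SyncRun("sessions", datetime(2024, 1, 2, 6, 0), False),
    SyncRun("users", datetime(2023, 12, 30, 6, 0), True),
]
# A daily schedule plus some slack: alert if the latest run is over 26 hours old.
for warning in check_syncs(runs, now, max_age=timedelta(hours=26)):
    print(warning)
# sessions: latest sync failed at 2024-01-02 06:00:00
# users: last successful sync was at 2023-12-30 06:00:00
```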

How Schema Changes Are Handled

As your Heap implementation evolves (for example, as you add new events or properties), Data Connect handles schema evolution automatically:

  • New Property Added: A new column is added to the appropriate table
  • New Event Defined: A new table is created for the event
  • Property Type Change: Handled according to warehouse-specific rules
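The additive cases above amount to comparing the incoming set of properties against the columns that already exist. The following Python sketch illustrates that comparison; it is not how Data Connect is implemented, and the table name, column names, and types are hypothetical. It emits generic `ALTER TABLE ... ADD COLUMN` statements for new properties and only flags type changes, since those follow warehouse-specific rules.

```python
from typing import Dict, List, Tuple


def plan_schema_changes(
    table: str, existing: Dict[str, str], incoming: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare column -> type maps.

    Returns DDL for columns that are new, plus the names of columns whose
    type changed (left for warehouse-specific handling rather than altered).
    """
    ddl: List[str] = []
    type_changes: List[str] = []
    for name in sorted(incoming):
        if name not in existing:
            ddl.append(f"ALTER TABLE {table} ADD COLUMN {name} {incoming[name]}")
        elif existing[name] != incoming[name]:
            type_changes.append(name)
    return ddl, type_changes


existing = {"user_id": "BIGINT", "plan": "VARCHAR"}
incoming = {"user_id": "BIGINT", "plan": "BIGINT", "signup_source": "VARCHAR"}
ddl, type_changes = plan_schema_changes("users", existing, incoming)
print(ddl)           # ['ALTER TABLE users ADD COLUMN signup_source VARCHAR']
print(type_changes)  # ['plan']
```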

Some schema changes may require special handling:

  • Column Name Conflicts: If you rename properties in Heap to match existing columns
  • Type Incompatibilities: If property values change in ways that conflict with existing column types
  • Reserved Words: If new properties use names that are reserved in your warehouse
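One common way to sidestep name conflicts and reserved words is to normalize property names into column names before they reach the warehouse. The Python sketch below shows that general technique under assumed rules (lowercase, underscores, a trailing `_` for reserved words, a numeric suffix for collisions). It does not reproduce Heap's actual naming rules, and `RESERVED` is a small illustrative subset.

```python
import re
from typing import Set

# Illustrative subset only; every warehouse publishes its own reserved-word list.
RESERVED = {"select", "from", "where", "order", "group", "table", "user"}


def to_column_name(prop: str, taken: Set[str]) -> str:
    """Map a property name to a safe, unique column name and record it in `taken`."""
    name = re.sub(r"[^a-z0-9_]", "_", prop.strip().lower())  # normalize characters
    if not name or name[0].isdigit():
        name = "_" + name  # identifiers usually cannot start with a digit
    if name in RESERVED:
        name += "_"  # avoid colliding with a reserved word
    base, suffix = name, 2
    while name in taken:  # resolve conflicts with existing columns
        name = f"{base}_{suffix}"
        suffix += 1
    taken.add(name)
    return name


taken = {"plan_type"}  # columns that already exist
print(to_column_name("Order", taken))        # order_
print(to_column_name("Plan Type", taken))    # plan_type_2
print(to_column_name("2FA enabled", taken))  # _2fa_enabled
```

Quoting identifiers is an alternative to renaming, but renaming keeps downstream queries free of warehouse-specific quoting rules.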

If a Data Connect sync fails:

  1. Heap’s system will automatically retry the sync
  2. If multiple retries fail, you’ll receive a notification
  3. Heap’s support team can help diagnose and resolve the issue
  4. Once resolved, syncs will resume from where they left off
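The retry-then-resume behavior described in these steps follows a common pattern, sketched below in Python for illustration. This is not Heap's retry logic: `Checkpoint`, `run_sync`, the batch labels, the attempt count, and the delays are all hypothetical. Each batch is retried with exponential backoff, and the checkpoint advances only after a batch succeeds, so a later run picks up where the failed one stopped.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checkpoint:
    # Index of the next batch to load; a real pipeline would persist this.
    next_batch: int = 0


def run_sync(
    batches: List[str],
    load: Callable[[str], None],
    cp: Checkpoint,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Load batches starting at cp.next_batch, retrying each with exponential backoff.

    Returns False (leaving cp at the failed batch) once a batch exhausts its attempts.
    """
    while cp.next_batch < len(batches):
        batch = batches[cp.next_batch]
        for attempt in range(max_attempts):
            try:
                load(batch)
                break
            except Exception:
                if attempt == max_attempts - 1:
                    return False  # retries exhausted: time to notify someone
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
        cp.next_batch += 1  # advance only after the batch loaded successfully
    return True


loaded: List[str] = []


def failing_load(batch: str) -> None:
    if batch == "b2":
        raise ConnectionError("warehouse unavailable")
    loaded.append(batch)


cp = Checkpoint()
print(run_sync(["b1", "b2", "b3"], failing_load, cp, sleep=lambda s: None))  # False
print(cp.next_batch)  # 1

# Once the problem is fixed, rerunning resumes at b2 instead of reloading b1.
print(run_sync(["b1", "b2", "b3"], loaded.append, cp, sleep=lambda s: None))  # True
print(loaded)  # ['b1', 'b2', 'b3']
```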

Data Retention and Historical Data

Data Connect syncs all data available in your Heap account based on your data retention settings:

  • Data retention periods are set at the Heap account level
  • Deleted data in Heap will not be removed from your warehouse automatically
  • If you need to remove data from your warehouse, you’ll need to do so manually
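Because deletions in Heap are not propagated, cleaning up the warehouse is a manual step. As a minimal sketch, assuming your synced table has a timestamp column, a retention-style cleanup could look like the following. It uses Python's built-in `sqlite3` only so the example runs anywhere; against a real warehouse you would use that warehouse's client library and SQL dialect. The table name, column names, and allow-list are hypothetical.

```python
import sqlite3
from datetime import datetime

# sqlite3 stands in for a warehouse connection; the table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (event_id INTEGER, user_id INTEGER, event_time TEXT)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?, ?)",
    [
        (1, 10, "2023-01-15T09:00:00"),
        (2, 11, "2023-06-01T12:00:00"),
        (3, 10, "2024-02-01T08:30:00"),
    ],
)

ALLOWED_TABLES = {"pageviews", "sessions"}


def delete_before(conn: sqlite3.Connection, table: str, cutoff: datetime) -> int:
    """Delete rows older than `cutoff` and return how many were removed."""
    # Table names cannot be bound as query parameters, so check against an allow-list.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    # ISO-8601 strings in one consistent format sort chronologically, so < works on TEXT.
    cur = conn.execute(f"DELETE FROM {table} WHERE event_time < ?", (cutoff.isoformat(),))
    conn.commit()
    return cur.rowcount


print(delete_before(conn, "pageviews", datetime(2023, 7, 1)))  # 2
print(conn.execute("SELECT COUNT(*) FROM pageviews").fetchone()[0])  # 1
```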

To ensure smooth operation of your Data Connect pipeline:

  1. Monitor sync status regularly: Check that syncs are completing successfully
  2. Plan for schema changes: Consider potential impacts when adding new properties
  3. Test new warehouse queries: Verify queries after schema changes
  4. Manage warehouse resources: Schedule heavy queries outside of sync windows
  5. Document custom tables: Maintain documentation for any views or derived tables you create
  6. Set up alerting: Configure monitoring for sync failures
  7. Manage data volume: Archive or partition historical data as needed
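For the "Manage warehouse resources" practice, checking a heavy job against known sync windows is a simple interval-overlap test. The Python sketch below assumes you have recorded when syncs actually run (for example, from your warehouse query logs); the window times and names are invented for illustration.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end), end exclusive


def overlaps(a: Window, b: Window) -> bool:
    # Two half-open intervals overlap iff each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def conflicting_syncs(job: Window, sync_windows: List[Window]) -> List[Window]:
    """Return the sync windows that a planned query job would overlap."""
    return [w for w in sync_windows if overlaps(job, w)]


# Hypothetical: hourly syncs observed to run from :00 to :20 each hour.
day = datetime(2024, 1, 1)
syncs = [(day.replace(hour=h), day.replace(hour=h, minute=20)) for h in range(24)]

job_a = (day.replace(hour=3, minute=10), day.replace(hour=3, minute=50))
job_b = (day.replace(hour=3, minute=25), day.replace(hour=3, minute=55))
print(len(conflicting_syncs(job_a, syncs)))  # 1: collides with the 03:00 sync
print(len(conflicting_syncs(job_b, syncs)))  # 0: fits between syncs
```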

If you encounter issues with your Data Connect syncs:

  1. Check for error messages in the Data Connect dashboard
  2. Verify warehouse permissions and quotas
  3. Look for schema conflicts
  4. Review recent changes to your Heap implementation
  5. Contact Heap support with specific error messages and timestamps

For detailed answers to common sync issues, see the Troubleshooting FAQs guide.