Managing Data Syncing
This guide explains how Data Connect syncs data to your data warehouse and how to effectively manage this process.
Sync Process Overview
Data Connect uses a reliable ETL (Extract, Transform, Load) process to move data from Heap to your data warehouse. Understanding this process helps you effectively manage your data pipeline and troubleshoot any issues.
Sync Frequency
Data Connect offers different sync frequency options depending on your plan:
- Daily Sync: All plans include daily data syncs
- Hourly Sync: Enterprise plans can enable hourly syncs for more frequent data updates
- Custom Schedules: Enterprise plans can work with Heap to establish custom sync schedules
The sync frequency affects how current your data is in the warehouse. More frequent syncs provide fresher data but may increase warehouse compute costs.
Initial Sync vs. Incremental Updates
Data Connect uses two types of data syncs:
Initial Sync
When you first set up Data Connect or add a new table, an initial sync copies all historical data from Heap to your warehouse. This process:
- Creates the necessary tables and views
- Copies all historical data
- May take anywhere from several hours to several days, depending on data volume
- Runs only once per table (unless a full resync is needed)
Incremental Updates
After the initial sync, Data Connect performs incremental updates during each sync window. These updates:
- Only transfer new or changed data since the last sync
- Are much faster than initial syncs
- Maintain data consistency while minimizing warehouse load
- Run on your configured schedule (daily, hourly, etc.)
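Incremental updates of this kind are commonly built around a high-water mark: each run records the newest change it has seen, and the next run only picks up rows past that point. The following is a minimal sketch of that general pattern, not Heap's actual implementation. The `updated_at` field and the `incremental_batch` function are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def incremental_batch(rows, last_synced_at):
    """Return (changed_rows, new_watermark) for one incremental run.

    `rows` is an iterable of dicts with a hypothetical `updated_at`
    datetime field; `last_synced_at` is the watermark from the previous run.
    """
    # Keep only rows modified strictly after the previous watermark.
    changed = [r for r in rows if r["updated_at"] > last_synced_at]
    # Advance the watermark to the newest change seen. If nothing changed,
    # keep the old watermark so the next run starts from the same point.
    new_watermark = max((r["updated_at"] for r in changed), default=last_synced_at)
    return changed, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"id": 2, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    ]
    watermark = datetime(2024, 1, 2, tzinfo=timezone.utc)
    batch, watermark = incremental_batch(source, watermark)
    print([r["id"] for r in batch], watermark.isoformat())
```

Because the watermark only moves forward when new changes are seen, rerunning the function with no new data transfers nothing, which is what keeps incremental runs cheap compared with an initial sync.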
Monitoring Sync Status
You can monitor the status of your Data Connect syncs through:
- The Data Connect dashboard in the Heap UI
- Email notifications for failed syncs (if configured)
- Warehouse query logs showing Data Connect activity
It’s good practice to regularly verify that syncs are completing successfully, especially after making changes to your Heap implementation.
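One way to automate this verification is a small freshness check that compares the time of the last successful sync against your configured schedule. The sketch below assumes you already have that timestamp from somewhere (for example, read manually from the dashboard or derived from your warehouse query logs). The `is_sync_stale` function and the two-hour grace period are illustrative assumptions, not part of Data Connect.

```python
from datetime import datetime, timedelta, timezone


def is_sync_stale(last_success, expected_interval, now=None, grace=timedelta(hours=2)):
    """Return True if the last successful sync is older than expected.

    `last_success` and `now` are timezone-aware datetimes;
    `expected_interval` is the configured sync cadence (e.g. one day).
    `grace` is an assumed tolerance for syncs that run slightly late.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_success > expected_interval + grace


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    # Daily schedule: 36 hours since the last success exceeds 24h + 2h grace.
    print(is_sync_stale(last, timedelta(days=1), now=check_time))
```

For an hourly schedule, pass `timedelta(hours=1)` and a shorter grace period so the check fires sooner.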
Managing Schema Changes
As your Heap implementation evolves (for example, when you add new events or properties), Data Connect handles most schema changes automatically.
How Schema Changes Are Handled
- New Property Added: A new column is added to the appropriate table
- New Event Defined: A new table is created for the event
- Property Type Change: Handled according to warehouse-specific rules
Potential Issues
Some schema changes may require special handling:
- Column Name Conflicts: Occur if you rename a property in Heap to a name that matches an existing column
- Type Incompatibilities: Occur if property values change in ways that conflict with the existing column type
- Reserved Words: Occur if a new property uses a name that is reserved in your warehouse
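If you keep a snapshot of a table's columns before and after a change, you can flag these situations yourself before they affect downstream queries. The following is a rough sketch under that assumption. The `diff_schema` function is hypothetical, the type names are placeholders, and `RESERVED_WORDS` is a small illustrative subset rather than any warehouse's real list.

```python
# Illustrative subset only; consult your warehouse's documentation for the
# full reserved-word list.
RESERVED_WORDS = {"select", "table", "order", "group", "user"}


def diff_schema(old_columns, new_columns):
    """Compare two {column_name: type_name} mappings.

    Returns a dict listing newly added columns, columns whose type
    changed, and added columns whose names collide with reserved words.
    """
    added = sorted(set(new_columns) - set(old_columns))
    type_changes = sorted(
        name
        for name in set(old_columns) & set(new_columns)
        if old_columns[name] != new_columns[name]
    )
    reserved = sorted(name for name in added if name.lower() in RESERVED_WORDS)
    return {"added": added, "type_changes": type_changes, "reserved": reserved}


if __name__ == "__main__":
    before = {"user_id": "bigint", "plan": "varchar"}
    after = {"user_id": "bigint", "plan": "integer", "order": "varchar"}
    print(diff_schema(before, after))
```

How you obtain the two column snapshots depends on your warehouse; most expose column metadata through a catalog or information schema.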
Sync Failure Handling
If a Data Connect sync fails:
- Heap’s system will automatically retry the sync
- If multiple retries fail, you’ll receive a notification
- Heap’s support team can help diagnose and resolve the issue
- Once resolved, syncs will resume from where they left off
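The retry-then-notify behavior described above is a common pattern, and you may want something similar for your own jobs that depend on synced data. The sketch below shows one generic way to do it with exponential backoff. It does not reflect Heap's internal retry policy, which this guide does not specify. The `run_with_retries` function and its parameters are hypothetical.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, notify=print, sleep=time.sleep):
    """Call `task` until it succeeds or `max_attempts` is reached.

    On the final failure, `notify` is called with a message and the
    original exception is re-raised. `sleep` is injectable for testing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                notify(f"giving up after {attempt} attempts: {exc}")
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    outcomes = iter([RuntimeError("transient"), "done"])

    def job():
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(run_with_retries(job, base_delay=0.1))
```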
Data Retention and Historical Data
Data Connect syncs all data available in your Heap account based on your data retention settings:
- Data retention periods are set at the Heap account level
- Data deleted in Heap is not automatically removed from your warehouse
- If you need to remove data from your warehouse, you’ll need to do so manually
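As a rough illustration of manual removal, the sketch below deletes rows for a set of user IDs using a parameterized query. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse connection. The `users` and `sessions` table names, the `user_id` column, and the `delete_user_rows` function are assumptions made for the example, so adjust them to match your actual schema and driver (other drivers may use a placeholder style other than `?`).

```python
import sqlite3

# Hypothetical table names; restrict deletes to tables you expect.
ALLOWED_TABLES = {"users", "sessions"}


def delete_user_rows(conn, table, user_ids):
    """Delete rows for the given user IDs and return the number removed."""
    # Table names cannot be bound as query parameters, so validate them
    # against an allowlist before building the statement.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    user_ids = list(user_ids)
    if not user_ids:
        return 0
    placeholders = ", ".join("?" for _ in user_ids)
    cur = conn.execute(
        f"DELETE FROM {table} WHERE user_id IN ({placeholders})", user_ids
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com")],
    )
    print(delete_user_rows(conn, "users", [1]))
```

If the same users appear in several tables, run the delete against each one, and remember that any derived tables or views you maintain may also hold copies of the data.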
Best Practices
To ensure smooth operation of your Data Connect pipeline:
- Monitor sync status regularly: Check that syncs are completing successfully
- Plan for schema changes: Consider potential impacts when adding new properties
- Test new warehouse queries: Verify queries after schema changes
- Manage warehouse resources: Schedule heavy queries outside of sync windows
- Document custom tables: Maintain documentation for any views or derived tables you create
- Set up alerting: Configure monitoring for sync failures
- Manage data volume: Archive or partition historical data as needed
Troubleshooting
If you encounter issues with your Data Connect syncs:
- Check for error messages in the Data Connect dashboard
- Verify warehouse permissions and quotas
- Look for schema conflicts
- Review recent changes to your Heap implementation
- Contact Heap support with specific error messages and timestamps
For detailed answers to common sync issues, see the Troubleshooting FAQs guide.