Data Synchronization Workflow
Keep your data sources in sync automatically
What you’ll build
A workflow that synchronizes data between systems — for example, keeping a Google Sheet in sync with BigQuery, or pushing CRM data to your data warehouse.
Time to build: 20 minutes
Nodes used: Google Sheets, Loop, BigQuery Writer, Conditional
The problem
Data lives in many places:
- Marketing data in Google Sheets
- Sales data in CRM
- Product data in databases
- Analytics in BigQuery
Manual syncing is error-prone and time-consuming.
Workflow overview
```mermaid
graph LR
    A[Source Data] --> B[Transform]
    B --> C[Validate]
    C --> D{Valid?}
    D -->|Yes| E[Write to Destination]
    D -->|No| F[Log Error]
    E --> G[Update Sync Log]
```
Common sync patterns
Sheet to BigQuery
Sync marketing spreadsheets to your data warehouse:
Google Sheets → Transform → BigQuery Writer
CRM to warehouse
Keep sales data centralized:
HubSpot → Transform → BigQuery Writer
Multi-source aggregation
Combine data from multiple sources:
```
Source A ─┐
Source B ─┼→ Merge → Destination
Source C ─┘
```
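The merge step above can be sketched in a few lines of Python. This is an illustrative helper (the name `merge_sources` and the per-source tagging are assumptions, not a built-in node): it flattens rows from several sources into one list, tagging each row with its origin so the destination can tell them apart.

```python
def merge_sources(**sources):
    """sources maps a source name to a list of row dicts."""
    merged = []
    for name, rows in sources.items():
        for row in rows:
            # Tag each row so downstream steps know where it came from.
            merged.append({**row, "source": name})
    return merged

rows = merge_sources(
    sheets=[{"date": "2024-01-15", "spend": 1000}],
    hubspot=[{"date": "2024-01-15", "deals": 3}],
)
```

Tagging rows at merge time also makes error logs easier to triage, since a failed record carries its source with it.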
Step-by-step guide: Sheet to BigQuery
Step 1: Set up your source
Create a Google Sheet with structured data:
| date | campaign | spend | impressions | clicks |
|---|---|---|---|---|
| 2024-01-15 | Brand | 1000 | 50000 | 2500 |
| 2024-01-15 | Search | 2000 | 30000 | 3000 |
Step 2: Create the workflow
- Click New Workflow
- Name it “Marketing Data Sync”
Step 3: Fetch source data
Add Google Sheets Reader
Configure:
- Spreadsheet ID: Your marketing sheet
- Range: Marketing!A:E
- Include Headers: Yes
Step 4: Transform data
Add Loop node
Process each row:
- Items: {{GoogleSheets_0.data}}
Add transformation logic
Use an LLM or utility nodes to:
- Convert date formats
- Calculate derived metrics
- Clean text fields
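As a concrete sketch of those cleanup steps, here is one way the row-level transform might look in Python, assuming the column names from the sample sheet (`date`, `campaign`, `spend`, `clicks`); the derived cost-per-click metric is illustrative.

```python
from datetime import datetime

def transform_row(row):
    """Normalize the date, clean text, and compute a derived metric."""
    clicks = int(row["clicks"])
    return {
        # Parse and re-emit the date to guarantee a consistent format.
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date().isoformat(),
        "campaign": row["campaign"].strip(),
        "spend": float(row["spend"]),
        # Derived metric: cost per click, guarded against division by zero.
        "cpc": round(float(row["spend"]) / clicks, 2) if clicks else None,
    }

transform_row({"date": "2024-01-15", "campaign": " Brand ",
               "spend": "1000", "clicks": "2500"})
# → {'date': '2024-01-15', 'campaign': 'Brand', 'spend': 1000.0, 'cpc': 0.4}
```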
Example transformation:
```json
{
  "date": "{{date}}",
  "campaign": "{{campaign}}",
  "spend": "{{spend}}"
}
```

Step 5: Validate data
Add Conditional node
Check for valid data:
Conditions:
- date is not empty
- spend is a number
- campaign is not empty
Route invalid records to error logging.
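Those three conditions translate directly into code. A minimal sketch (the function name `validate_row` is illustrative) that returns a list of problems, so an empty list means the row can proceed:

```python
def validate_row(row):
    """Return a list of problems; an empty list means the row is valid."""
    errors = []
    if not row.get("date"):
        errors.append("date is empty")
    try:
        float(row.get("spend", ""))
    except (TypeError, ValueError):
        errors.append("spend is not a number")
    if not row.get("campaign"):
        errors.append("campaign is empty")
    return errors
```

Rows with a non-empty error list go to the error-logging branch; the rest continue to the writer.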
Step 6: Write to destination
Add BigQuery Writer node
Configure:
- Project: Your GCP project
- Dataset: marketing_data
- Table: campaign_metrics
- Write Mode: Append (or Replace for full refresh)
- Data: Transformed row data
Step 7: Log sync results
Add Google Sheets Writer for logging
Configure:
- Spreadsheet ID: Your sync log
- Data:
```json
[[
  "{{timestamp}}",
  "{{totalItems}}",
  "{{successCount}}",
  "{{errorCount}}",
  "success"
]]
```

Handling sync modes
Full refresh
Replace all data on each sync:
1. Delete existing records
2. Insert all new records
Best for: Small datasets, daily snapshots
Incremental sync
Only sync new/changed records:
1. Track last sync timestamp
2. Fetch only records after timestamp
3. Insert/update as needed
Best for: Large datasets, real-time needs
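The filtering step of an incremental sync is small enough to show inline. A sketch assuming each record carries an ISO-8601 `updated_at` timestamp (ISO strings compare lexicographically in the same order as chronologically, so plain string comparison works):

```python
def records_since(records, last_sync):
    """Keep only records updated after the last successful sync."""
    return [r for r in records if r["updated_at"] > last_sync]

fresh = records_since(
    [{"id": 1, "updated_at": "2024-01-14"},
     {"id": 2, "updated_at": "2024-01-16"}],
    last_sync="2024-01-15",
)
```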
Upsert (Update or Insert)
Update existing records, insert new ones:
1. Check if record exists (by ID)
2. If exists: Update
3. If not: Insert
Best for: CRM data, master data
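The check/update/insert logic above can be sketched as a pure-Python upsert over an ID key (illustrative; in practice the destination database usually provides this, e.g. via a MERGE statement):

```python
def upsert(existing, incoming, key="id"):
    """Update rows that share a key with an incoming row, insert the rest."""
    by_key = {row[key]: row for row in existing}
    for row in incoming:
        # Merge onto any existing row with the same key, else insert fresh.
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    return list(by_key.values())

result = upsert(
    existing=[{"id": 1, "name": "Acme"}],
    incoming=[{"id": 1, "name": "Acme Corp"}, {"id": 2, "name": "Globex"}],
)
```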
Implementing incremental sync
Track last sync time
Store the last successful sync timestamp:
```mermaid
graph LR
    A[Get Last Sync Time] --> B[Fetch Records Since Then]
    B --> C[Process]
    C --> D[Update Last Sync Time]
```
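This tracking loop can be exercised end-to-end with the standard library's sqlite3 as a stand-in for the real tracking store. The schema is illustrative (note the column is named `table_name` because `table` is a reserved word in most SQL dialects, and SQLite uses `datetime('now')` rather than `NOW()`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sync_log (
    table_name TEXT, synced_at TEXT, record_count INTEGER)""")

def last_sync(table):
    """Return the most recent sync time, or None if never synced."""
    row = conn.execute(
        "SELECT MAX(synced_at) FROM sync_log WHERE table_name = ?", (table,)
    ).fetchone()
    return row[0]

def record_sync(table, count):
    """Append a row to the sync log after a successful run."""
    conn.execute(
        "INSERT INTO sync_log VALUES (?, datetime('now'), ?)", (table, count))
    conn.commit()

record_sync("campaigns", 42)
```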
Fetch last sync time
Read from a tracking sheet or database:
```sql
SELECT MAX(synced_at) FROM sync_log WHERE table_name = 'campaigns'
```

Filter source data
Only fetch new records:
```sql
WHERE updated_at > '{{lastSyncTime}}'
```

Update sync time on success
After successful sync:
```sql
INSERT INTO sync_log (table_name, synced_at, record_count)
VALUES ('campaigns', NOW(), {{recordCount}})
```

Error handling
Retry logic
For transient failures:
```
Loop:
  Try write to BigQuery
  If fails:
    Wait 5 seconds
    Retry (max 3 times)
  If still fails:
    Log error
    Continue with next record
```
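The retry loop above maps to a small Python helper. A sketch under the same parameters (3 attempts, 5-second wait); `write` is any callable that performs the destination write and raises on failure:

```python
import time

def write_with_retry(write, record, retries=3, delay=5):
    """Attempt a write; wait and retry on failure, giving up after `retries`."""
    for attempt in range(1, retries + 1):
        try:
            return write(record)
        except Exception as exc:
            if attempt == retries:
                # Out of attempts: log and move on to the next record.
                print(f"giving up on {record!r}: {exc}")
                return None
            time.sleep(delay)
```

Returning None on exhaustion lets the caller route the record to a dead letter queue instead of halting the whole sync.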
Dead letter queue
Store failed records for manual review:
```mermaid
graph LR
    A[Record] --> B{Valid?}
    B -->|Yes| C[Write to Destination]
    B -->|No| D[Dead Letter Queue]
    C -->|Fail| D
```
Alert on failures
Send notifications when sync fails:
```
Conditional:
  If error_count > threshold
    → Send alert email
    → Post to Slack
```
Scheduling sync jobs
| Sync Type | Frequency | Timing |
|---|---|---|
| Real-time | Every 5-15 min | Continuous |
| Near real-time | Every hour | On the hour |
| Daily | Once per day | Off-peak (2-4 AM) |
| Weekly | Once per week | Weekend |
Best practices
Idempotency
Make syncs safe to re-run:
- Use upsert instead of insert
- Include timestamps for deduplication
- Track sync batches with IDs
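Batch-ID tracking makes a re-run a no-op instead of a duplicate insert. A minimal sketch (the `batch_id` field and `seen_ids` store are illustrative; in production the seen set would live in the sync log, not in memory):

```python
def dedupe(rows, seen_ids):
    """Drop rows whose batch ID was already synced, making re-runs safe."""
    fresh = [r for r in rows if r["batch_id"] not in seen_ids]
    seen_ids.update(r["batch_id"] for r in fresh)
    return fresh

seen = set()
first_run = dedupe([{"batch_id": "b1", "spend": 1000}], seen)
second_run = dedupe([{"batch_id": "b1", "spend": 1000}], seen)  # re-run: empty
```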
Data validation
Validate before writing:
- Check required fields
- Verify data types
- Validate ranges
- Check referential integrity
Logging
Log everything:
- Start/end times
- Record counts
- Success/failure status
- Error details
Monitoring
Track sync health:
| Metric | Alert Threshold |
|---|---|
| Sync duration | > 2x normal |
| Error rate | > 5% |
| Missing syncs | > 1 missed |
| Data lag | > 1 hour |
Results you can expect
| Metric | Manual | Automated |
|---|---|---|
| Sync frequency | Daily (maybe) | Every 15 min |
| Error rate | 5-10% | < 1% |
| Time spent | 2-4 hours/week | 0 |
| Data freshness | Hours/days | Minutes |