Data Synchronization Workflow

Keep your data sources in sync automatically

What you’ll build

A workflow that synchronizes data between systems — for example, keeping a Google Sheet in sync with BigQuery, or pushing CRM data to your data warehouse.

Time to build: 20 minutes

Nodes used: Google Sheets, Loop, BigQuery Writer, Conditional

The problem

Data lives in many places:

  • Marketing data in Google Sheets
  • Sales data in CRM
  • Product data in databases
  • Analytics in BigQuery

Manual syncing is error-prone and time-consuming.

Workflow overview

graph LR
    A[Source Data] --> B[Transform]
    B --> C[Validate]
    C --> D{Valid?}
    D -->|Yes| E[Write to Destination]
    D -->|No| F[Log Error]
    E --> G[Update Sync Log]

Common sync patterns

Sheet to BigQuery

Sync marketing spreadsheets to your data warehouse:

Google Sheets → Transform → BigQuery Writer

CRM to warehouse

Keep sales data centralized:

HubSpot → Transform → BigQuery Writer

Multi-source aggregation

Combine data from multiple sources:

Source A ─┐
Source B ─┼→ Merge → Destination
Source C ─┘

Step-by-step guide: Sheet to BigQuery

Step 1: Set up your source

Create a Google Sheet with structured data:

| date | campaign | spend | impressions | clicks |
| --- | --- | --- | --- | --- |
| 2024-01-15 | Brand | 1000 | 50000 | 2500 |
| 2024-01-15 | Search | 2000 | 30000 | 3000 |

Step 2: Create the workflow

  1. Click New Workflow
  2. Name it “Marketing Data Sync”

Step 3: Fetch source data

Add Google Sheets Reader

Configure:

  • Spreadsheet ID: Your marketing sheet
  • Range: Marketing!A:E
  • Include Headers: Yes

Step 4: Transform data

Add Loop node

Process each row:

  • Items: {{GoogleSheets_0.data}}

Add transformation logic

Use an LLM or utility nodes to:

  • Convert date formats
  • Calculate derived metrics
  • Clean text fields

Example transformation:

{
  "date": "{{date}}",
  "campaign": "{{campaign}}",
  "spend": "{{spend}}"
}
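If you prefer scripting the transform instead of (or alongside) the template above, the same logic can be sketched in Python. The function name, the derived metrics (`cpc`, `ctr`), and the assumption that all source values arrive as strings are illustrative, not part of any built-in node:

```python
from datetime import datetime

def transform_row(row: dict) -> dict:
    """Normalize one spreadsheet row before writing to BigQuery.

    Assumes the source columns from Step 1: date, campaign, spend,
    impressions, clicks (all values arrive as strings).
    """
    spend = float(row["spend"])
    clicks = int(row["clicks"])
    impressions = int(row["impressions"])
    return {
        # Normalize to an ISO date regardless of source formatting.
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date().isoformat(),
        # Clean text fields: trim stray whitespace.
        "campaign": row["campaign"].strip(),
        "spend": spend,
        # Derived metrics; None when the denominator is zero.
        "cpc": round(spend / clicks, 4) if clicks else None,
        "ctr": round(clicks / impressions, 4) if impressions else None,
    }
```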

Step 5: Validate data

Add Conditional node

Check for valid data:

Conditions:

  • date is not empty
  • spend is a number
  • campaign is not empty

Route invalid records to error logging.
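The three conditions above can be expressed as a small validation function. Returning a list of error messages (rather than a bare boolean) makes the error-logging branch more useful; the function name and message strings here are illustrative:

```python
def validate_row(row: dict) -> list:
    """Return a list of validation errors; an empty list means the row is valid."""
    errors = []
    if not row.get("date"):
        errors.append("date is empty")
    if not row.get("campaign"):
        errors.append("campaign is empty")
    try:
        float(row.get("spend", ""))
    except (TypeError, ValueError):
        errors.append("spend is not a number")
    return errors
```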

Step 6: Write to destination

Add BigQuery Writer node

Configure:

  • Project: Your GCP project
  • Dataset: marketing_data
  • Table: campaign_metrics
  • Write Mode: Append (or Replace for full refresh)
  • Data: Transformed row data

Step 7: Log sync results

Add Google Sheets Writer for logging

Configure:

  • Spreadsheet ID: Your sync log
  • Data:
[[
  "{{timestamp}}",
  "{{totalItems}}",
  "{{successCount}}",
  "{{errorCount}}",
  "success"
]]

Handling sync modes

Full refresh

Replace all data on each sync:

1. Delete existing records
2. Insert all new records

Best for: Small datasets, daily snapshots

Incremental sync

Only sync new/changed records:

1. Track last sync timestamp
2. Fetch only records after timestamp
3. Insert/update as needed

Best for: Large datasets, real-time needs

Upsert (Update or Insert)

Update existing records, insert new ones:

1. Check if record exists (by ID)
2. If exists: Update
3. If not: Insert

Best for: CRM data, master data
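The check-then-update-or-insert logic can be sketched as a merge over records keyed by ID. This is a minimal in-memory illustration, assuming each record is a dict with an `id` field; a real destination would do this via the database's upsert/MERGE support:

```python
def upsert(existing: dict, incoming: list, key: str = "id") -> dict:
    """Merge incoming records into existing ones, keyed by `key`.

    Records with a matching key are updated (incoming fields win,
    untouched fields are preserved); unseen keys are inserted.
    """
    merged = dict(existing)
    for record in incoming:
        merged[record[key]] = {**merged.get(record[key], {}), **record}
    return merged
```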

Implementing incremental sync

Track last sync time

Store the last successful sync timestamp:

graph LR
    A[Get Last Sync Time] --> B[Fetch Records Since Then]
    B --> C[Process]
    C --> D[Update Last Sync Time]

Fetch last sync time

Read from a tracking sheet or database:

SELECT MAX(synced_at) FROM sync_log WHERE table_name = 'campaigns'

Filter source data

Only fetch new records:

WHERE updated_at > '{{lastSyncTime}}'

Update sync time on success

After successful sync:

INSERT INTO sync_log (table_name, synced_at, record_count)
VALUES ('campaigns', NOW(), {{recordCount}})
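Tying the three steps together, one incremental pass looks like this sketch, where the fetch, write, and watermark accessors are all hypothetical callables standing in for the nodes above:

```python
def incremental_sync(fetch_since, write, get_last_sync, set_last_sync) -> int:
    """One incremental pass: fetch records changed since the last run,
    write them, then advance the watermark. Returns the record count."""
    last = get_last_sync()
    records = fetch_since(last)  # i.e. WHERE updated_at > last
    if records:
        write(records)
        # Advance the watermark only after a successful write, so a
        # failed run is simply retried from the same point.
        set_last_sync(max(r["updated_at"] for r in records))
    return len(records)
```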

Error handling

Retry logic

For transient failures:

Loop:
  Try write to BigQuery
  If fails:
    Wait 5 seconds
    Retry (max 3 times)
  If still fails:
    Log error
    Continue with next record
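The retry loop above can be sketched as a wrapper around whatever callable performs the write (hypothetical here); the 5-second delay and 3-retry cap match the pseudocode:

```python
import time

def write_with_retry(write, record, max_retries: int = 3, delay: float = 5.0) -> bool:
    """Attempt a write, retrying transient failures up to `max_retries` times.

    Returns True on success, False when the caller should log the error
    and continue with the next record.
    """
    for attempt in range(max_retries + 1):
        try:
            write(record)
            return True
        except Exception:
            if attempt == max_retries:
                return False
            time.sleep(delay)
    return False
```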

Dead letter queue

Store failed records for manual review:

graph LR
    A[Record] --> B{Valid?}
    B -->|Yes| C[Write to Destination]
    B -->|No| D[Dead Letter Queue]
    C -->|Fail| D

Alert on failures

Send notifications when sync fails:

Conditional:
  If error_count > threshold
    → Send alert email
    → Post to Slack
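The threshold check itself is a one-liner; here it is expressed as an error rate, with a 5% default to match the monitoring table later in this guide (the function name is illustrative):

```python
def should_alert(error_count: int, total: int, threshold: float = 0.05) -> bool:
    """True when the batch's error rate exceeds the alert threshold."""
    return total > 0 and error_count / total > threshold
```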

Scheduling sync jobs

| Sync Type | Frequency | Timing |
| --- | --- | --- |
| Real-time | Every 5-15 min | Continuous |
| Near real-time | Every hour | On the hour |
| Daily | Once per day | Off-peak (2-4 AM) |
| Weekly | Once per week | Weekend |

Best practices

Idempotency

Make syncs safe to re-run:

  • Use upsert instead of insert
  • Include timestamps for deduplication
  • Track sync batches with IDs
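One way to make re-runs safe is a deterministic batch ID: hash the batch's contents, and skip the write if that ID has already been recorded. This is an illustrative sketch, not a built-in feature:

```python
import hashlib
import json

def batch_id(records: list) -> str:
    """Deterministic ID for a sync batch: a re-run of identical data
    produces the same ID and can be detected."""
    payload = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:16]

def run_once(records, seen_batches: set, write) -> bool:
    """Write the batch only if it has not already been synced."""
    bid = batch_id(records)
    if bid in seen_batches:
        return False  # idempotent re-run: nothing to do
    write(records)
    seen_batches.add(bid)
    return True
```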

Data validation

Validate before writing:

  • Check required fields
  • Verify data types
  • Validate ranges
  • Check referential integrity

Logging

Log everything:

  • Start/end times
  • Record counts
  • Success/failure status
  • Error details

Monitoring

Track sync health:

| Metric | Alert Threshold |
| --- | --- |
| Sync duration | > 2x normal |
| Error rate | > 5% |
| Missing syncs | > 1 missed |
| Data lag | > 1 hour |

Results you can expect

| Metric | Manual | Automated |
| --- | --- | --- |
| Sync frequency | Daily (maybe) | Every 15 min |
| Error rate | 5-10% | < 1% |
| Time spent | 2-4 hours/week | 0 |
| Data freshness | Hours/days | Minutes |

Next steps