Skip to main content

What you’ll build

A workflow that synchronizes data between systems — for example, keeping a Google Sheet in sync with BigQuery, or pushing CRM data to your data warehouse. Time to build: 20 minutes Nodes used: Google Sheets, Loop, BigQuery Writer, Conditional

The problem

Data lives in many places:
  • Marketing data in Google Sheets
  • Sales data in CRM
  • Product data in databases
  • Analytics in BigQuery
Manual syncing is error-prone and time-consuming.

Workflow overview

Common sync patterns

Sheet to BigQuery

Sync marketing spreadsheets to your data warehouse:
Google Sheets → Transform → BigQuery Writer

CRM to warehouse

Keep sales data centralized:
HubSpot → Transform → BigQuery Writer

Multi-source aggregation

Combine data from multiple sources:
Source A ─┐
Source B ─┼→ Merge → Destination
Source C ─┘

Step-by-step guide: Sheet to BigQuery

Step 1: Set up your source

Create a Google Sheet with structured data:
datecampaignspendimpressionsclicks
2024-01-15Brand1000500002500
2024-01-15Search2000300003000

Step 2: Create the workflow

  1. Click New Workflow
  2. Name it “Marketing Data Sync”

Step 3: Fetch source data

1

Add Google Sheets Reader

Configure:
  • Spreadsheet ID: Your marketing sheet
  • Range: Marketing!A:E
  • Include Headers: Yes

Step 4: Transform data

1

Add Loop node

Process each row:
  • Items: {{GoogleSheets_0.data}}
2

Add transformation logic

Use an LLM or utility nodes to:
  • Convert date formats
  • Calculate derived metrics
  • Clean text fields
Example transformation:
{
  "date": "{{date}}",
  "campaign": "{{Lcampaign}}",
  "spend": "{{spend}}"
}

Step 5: Validate data

1

Add Conditional node

Check for valid data:Conditions:
  • date is not empty
  • spend is a number
  • campaign is not empty
Route invalid records to error logging.

Step 6: Write to destination

1

Add BigQuery Writer node

Configure:
  • Project: Your GCP project
  • Dataset: marketing_data
  • Table: campaign_metrics
  • Write Mode: Append (or Replace for full refresh)
  • Data: Transformed row data

Step 7: Log sync results

1

Add Google Sheets Writer for logging

Configure:
  • Spreadsheet ID: Your sync log
  • Data:
[[
  "{{timestamp}}",
  "{{totalItems}}",
  "{{successCount}}",
  "{{errorCount}}",
  "success"
]]

Handling sync modes

Full refresh

Replace all data on each sync:
1. Delete existing records
2. Insert all new records
Best for: Small datasets, daily snapshots

Incremental sync

Only sync new/changed records:
1. Track last sync timestamp
2. Fetch only records after timestamp
3. Insert/update as needed
Best for: Large datasets, real-time needs

Upsert (Update or Insert)

Update existing records, insert new ones:
1. Check if record exists (by ID)
2. If exists: Update
3. If not: Insert
Best for: CRM data, master data

Implementing incremental sync

Track last sync time

Store the last successful sync timestamp:
1

Fetch last sync time

Read from a tracking sheet or database:
SELECT MAX(synced_at) FROM sync_log WHERE table = 'campaigns'
2

Filter source data

Only fetch new records:
WHERE updated_at > '{{lastSyncTime}}'
3

Update sync time on success

After successful sync:
INSERT INTO sync_log (table, synced_at, record_count)
VALUES ('campaigns', NOW(), {{recordCount}})

Error handling

Retry logic

For transient failures:
Loop:
  Try write to BigQuery
  If fails:
    Wait 5 seconds
    Retry (max 3 times)
  If still fails:
    Log error
    Continue with next record

Dead letter queue

Store failed records for manual review:

Alert on failures

Send notifications when sync fails:
Conditional:
  If error_count > threshold
    → Send alert email
    → Post to Slack

Scheduling sync jobs

Sync TypeFrequencyTiming
Real-timeEvery 5-15 minContinuous
Near real-timeEvery hourOn the hour
DailyOnce per dayOff-peak (2-4 AM)
WeeklyOnce per weekWeekend

Best practices

Idempotency

Make syncs safe to re-run:
  • Use upsert instead of insert
  • Include timestamps for deduplication
  • Track sync batches with IDs

Data validation

Validate before writing:
  • Check required fields
  • Verify data types
  • Validate ranges
  • Check referential integrity

Logging

Log everything:
  • Start/end times
  • Record counts
  • Success/failure status
  • Error details

Monitoring

Track sync health:
MetricAlert Threshold
Sync duration> 2x normal
Error rate> 5%
Missing syncs> 1 missed
Data lag> 1 hour

Results you can expect

MetricManualAutomated
Sync frequencyDaily (maybe)Every 15 min
Error rate5-10%< 1%
Time spent2-4 hours/week0
Data freshnessHours/daysMinutes

Next steps