Automated DNS Failover with Runlater

Detect server outages within about a minute and automatically update DNS records to point traffic at a backup server. No manual intervention, no 3 AM pages.

Why automate DNS failover?

When your primary server goes down, every minute of downtime costs you users and revenue. Manual failover means someone has to notice the outage, log in to your DNS provider, update the A record, and wait for propagation. That's 10-30 minutes on a good day.

With Runlater, you can detect a failure within about 70 seconds (a one-minute check interval plus a 10-second timeout) and trigger a DNS update through your provider's API automatically. With a low TTL, your backup server can start receiving traffic before most users notice something went wrong.

Architecture

The setup uses three Runlater features working together: a cron task for health checks, an endpoint for the DNS update, and the on_failure_url hook that connects them.

Cron task (every 1 min)
    |
    |  GET https://your-server.com/health
    |
    +-- 200 OK? --> do nothing, wait for next tick
    |
    +-- timeout/non-200? --> task fails
            |
            |  on_failure_url fires automatically
            v
    Runlater endpoint (ep_xxx)
            |
            |  forwards to DNS provider API
            v
    Cloudflare / Route53 / your DNS
            |
            |  A record updated: primary IP → backup IP
            v
    Traffic now goes to backup server

Step 1: Create the health check cron task

Create a cron task that pings your primary server's health endpoint every minute. If the server is down, the task will fail — which is exactly what triggers the failover.

Using fetch
const res = await fetch("https://runlater.eu/api/v1/tasks", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.RUNLATER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Health check: primary server",
    url: "https://your-server.com/health",
    method: "GET",
    cron: "* * * * *",              // Every minute
    timeout_ms: 10000,             // 10s timeout — fail fast
    expected_status_codes: [200],  // Anything else = failure
  }),
})

if (!res.ok) throw new Error(`Task creation failed: ${res.status}`)
const { data } = await res.json()
console.log("Task ID:", data.id)
// Save this ID; you'll need it in Step 3
Using curl
curl -X POST https://runlater.eu/api/v1/tasks \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Health check: primary server",
    "url": "https://your-server.com/health",
    "method": "GET",
    "cron": "* * * * *",
    "timeout_ms": 10000,
    "expected_status_codes": [200]
  }'
Pro tier required. Minute-interval cron tasks (* * * * *) require the Pro plan. Free tier tasks can run at most once per hour.

Step 2: Create the DNS failover endpoint

Create an inbound endpoint that, when triggered, calls your DNS provider's API to update the A record. The endpoint stores the API credentials and request body so the failover happens without any application code running.

Cloudflare example

Cloudflare's DNS API lets you update a record with a PUT request. You'll need your Zone ID, the DNS Record ID, and an API token with DNS edit permissions.

Using fetch
const ZONE_ID = "your-cloudflare-zone-id"
const RECORD_ID = "your-dns-record-id"
const BACKUP_IP = "203.0.113.50"

const res = await fetch("https://runlater.eu/api/v1/endpoints", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.RUNLATER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "DNS failover: switch to backup",
    forward_urls: [
      `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}`
    ],
    forward_headers: {
      "Authorization": `Bearer ${process.env.CLOUDFLARE_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    forward_body: JSON.stringify({
      type: "A",
      name: "your-domain.com",
      content: BACKUP_IP,
      ttl: 60,
      proxied: true,
    }),
    retry_attempts: 3,
  }),
})

if (!res.ok) throw new Error(`Endpoint creation failed: ${res.status}`)
const { data } = await res.json()
console.log("Inbound URL:", data.inbound_url)
// https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Using curl
curl -X POST https://runlater.eu/api/v1/endpoints \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "DNS failover: switch to backup",
    "forward_urls": [
      "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID"
    ],
    "forward_headers": {
      "Authorization": "Bearer cf-api-token-here",
      "Content-Type": "application/json"
    },
    "forward_body": "{\"type\":\"A\",\"name\":\"your-domain.com\",\"content\":\"203.0.113.50\",\"ttl\":60,\"proxied\":true}",
    "retry_attempts": 3
  }'
Credentials stay safe. Your DNS provider API token is stored in the endpoint's forward_headers on Runlater's servers. It's never exposed in logs or webhook payloads.

Step 3: Connect them with on_failure_url

Now update the health check task so that when it fails, it automatically triggers the failover endpoint. Set on_failure_url to the endpoint's inbound URL.

Using fetch
// Update the health check task with the failover URL
const TASK_ID = "task-id-from-step-1"
const FAILOVER_URL = "https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

const res = await fetch(`https://runlater.eu/api/v1/tasks/${TASK_ID}`, {
  method: "PUT",
  headers: {
    "Authorization": `Bearer ${process.env.RUNLATER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    on_failure_url: FAILOVER_URL,
  }),
})
if (!res.ok) throw new Error(`Task update failed: ${res.status}`)
Using curl
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "on_failure_url": "https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  }'

All in one request. If you create the endpoint (Step 2) before the task, you can include on_failure_url directly in the task creation call from Step 1. The steps are separated here for clarity, since the task needs the endpoint's inbound URL.
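Creating the endpoint first lets you wire everything up in two calls. The sketch below creates the endpoint, reads its inbound_url, and then creates the task with on_failure_url already set. The function name setupFailover and the injectable fetchFn parameter (used only for testing) are hypothetical. The API paths and fields are the ones used in Steps 1 to 3.

```javascript
// Hypothetical sketch: create the DNS endpoint first, then the health-check
// task with on_failure_url set, so no follow-up PUT is needed.
async function setupFailover({ apiKey, endpoint, task, fetchFn = fetch }) {
  const post = async (path, body) => {
    const res = await fetchFn(`https://runlater.eu/api/v1/${path}`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    })
    if (!res.ok) throw new Error(`POST /${path} failed with ${res.status}`)
    const { data } = await res.json()
    return data
  }

  // 1. Endpoint first, because the task needs its inbound URL.
  const ep = await post("endpoints", endpoint)
  // 2. Task with the failover hook included in the creation call.
  const created = await post("tasks", { ...task, on_failure_url: ep.inbound_url })
  return { taskId: created.id, failoverUrl: ep.inbound_url }
}
```

Pass the same endpoint body as in Step 2 and the same task body as in Step 1. The returned taskId is what you would use for later updates, such as adding on_recovery_url.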

Step 4: (Optional) Automatic recovery

When your primary server comes back up, you probably want DNS to switch back automatically. Create a second endpoint that points the A record back to your primary IP, and set it as the task's on_recovery_url.

Using curl
# 1. Create the recovery endpoint (same DNS API, but with PRIMARY_IP)
curl -X POST https://runlater.eu/api/v1/endpoints \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "DNS recovery: switch to primary",
    "forward_urls": [
      "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID"
    ],
    "forward_headers": {
      "Authorization": "Bearer cf-api-token-here",
      "Content-Type": "application/json"
    },
    "forward_body": "{\"type\":\"A\",\"name\":\"your-domain.com\",\"content\":\"198.51.100.10\",\"ttl\":60,\"proxied\":true}",
    "retry_attempts": 3
  }'

# 2. Update the health check task with both URLs
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "on_failure_url": "https://runlater.eu/in/ep_failover_slug_here",
    "on_recovery_url": "https://runlater.eu/in/ep_recovery_slug_here"
  }'
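The failover and recovery endpoints differ only in the IP address, so you may prefer to generate both payloads from one helper. This is a minimal JavaScript sketch. buildDnsEndpoint and its parameter names are hypothetical, while the Runlater fields (name, forward_urls, forward_headers, forward_body, retry_attempts) are the ones used in the examples above.

```javascript
// Hypothetical helper: builds a Runlater endpoint payload that updates a
// Cloudflare A record. Field names match the curl examples above.
function buildDnsEndpoint({ name, zoneId, recordId, cfToken, domain, ip }) {
  return {
    name,
    forward_urls: [
      `https://api.cloudflare.com/client/v4/zones/${zoneId}/dns_records/${recordId}`,
    ],
    forward_headers: {
      Authorization: `Bearer ${cfToken}`,
      "Content-Type": "application/json",
    },
    // forward_body is a JSON string, not a nested object.
    forward_body: JSON.stringify({
      type: "A",
      name: domain,
      content: ip,
      ttl: 60,
      proxied: true,
    }),
    retry_attempts: 3,
  }
}

const common = {
  zoneId: "ZONE_ID",
  recordId: "RECORD_ID",
  cfToken: process.env.CLOUDFLARE_API_TOKEN ?? "cf-api-token-here",
  domain: "your-domain.com",
}

// Same record, two targets: backup for failover, primary for recovery.
const failover = buildDnsEndpoint({ ...common, name: "DNS failover: switch to backup", ip: "203.0.113.50" })
const recovery = buildDnsEndpoint({ ...common, name: "DNS recovery: switch to primary", ip: "198.51.100.10" })
```

POST each payload to /api/v1/endpoints as in Step 2, then put the two inbound URLs in on_failure_url and on_recovery_url.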

Now you have a fully automated loop:

Primary goes down
    → health check fails
    → on_failure_url fires
    → DNS updated to backup IP

Primary comes back up
    → health check succeeds (after previous failure)
    → on_recovery_url fires
    → DNS updated back to primary IP
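The loop above can be modeled as a small state machine. The sketch below replays a sequence of health-check outcomes and lists which hook would fire on each tick. It assumes that on_failure_url fires on every failed run and that on_recovery_url fires only on the first success after a failure. Check the Runlater API docs for the exact semantics. The function name hooksFired is hypothetical.

```javascript
// Hypothetical model of the failover/recovery loop described above.
// Assumption: on_failure_url fires on every failed run, and
// on_recovery_url fires only on the first success after a failure.
function hooksFired(outcomes) {
  const events = []
  let previousFailed = false

  outcomes.forEach((ok, tick) => {
    if (!ok) {
      // Every failed check triggers the failover endpoint. The DNS PUT is
      // idempotent, so repeated calls just rewrite the same record.
      events.push({ tick, hook: "on_failure_url" })
      previousFailed = true
    } else if (previousFailed) {
      // The first success after a failure triggers the recovery endpoint.
      events.push({ tick, hook: "on_recovery_url" })
      previousFailed = false
    }
    // A success following a success fires nothing.
  })

  return events
}

// Example: healthy, two failed minutes, then healthy again.
console.log(hooksFired([true, false, false, true, true]))
```

Under this assumption, a long outage sends the failover request more than once. That is harmless as long as the DNS update is idempotent, as the PUT requests in this guide are.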

DNS provider examples

Cloudflare

Cloudflare is a common choice and the one used in the examples above. You need three values from your Cloudflare dashboard:

  • Zone ID — found on the Overview page of your domain
  • DNS Record ID — get it via the GET /zones/:zone_id/dns_records API
  • API Token — create one with Zone > DNS > Edit permissions
Find your DNS Record ID
curl https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records \
  -H "Authorization: Bearer cf-api-token" \
  | jq -r '.result[] | select(.name == "your-domain.com" and .type == "A") | .id'
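If you would rather look up the record ID from a script than with jq, the same filter is straightforward in JavaScript. This sketch assumes the Cloudflare list response shape used above, which is a result array whose entries carry id, name, and type. It also assumes Cloudflare's name and type query parameters for the list call, which you should confirm against its API reference. findRecordId and lookupRecordId are hypothetical helpers. Filtering on type as well as name avoids picking up an AAAA or TXT record with the same name.

```javascript
// Hypothetical helper: pick the record ID out of a Cloudflare
// "list DNS records" response ({ result: [{ id, name, type, ... }] }).
function findRecordId(listResponse, name, type = "A") {
  const match = listResponse.result.find((r) => r.name === name && r.type === type)
  if (!match) throw new Error(`No ${type} record named ${name}`)
  return match.id
}

// Fetch the records and extract the ID (Node 18+ for global fetch).
// The name/type query parameters ask Cloudflare to filter server-side,
// so pagination in a large zone doesn't hide the record.
async function lookupRecordId(zoneId, token, name, type = "A") {
  const query = new URLSearchParams({ name, type })
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/dns_records?${query}`,
    { headers: { Authorization: `Bearer ${token}` } },
  )
  if (!res.ok) throw new Error(`Cloudflare API returned ${res.status}`)
  return findRecordId(await res.json(), name, type)
}
```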

Generic REST API

Any DNS provider with a REST API works. The pattern is the same:

  1. Find the API endpoint that updates a DNS record
  2. Put the URL in forward_urls
  3. Put auth headers in forward_headers
  4. Put the update payload in forward_body

Provider       API endpoint                                       Auth
Cloudflare     PUT /client/v4/zones/:zone/dns_records/:id         Bearer token
DigitalOcean   PUT /v2/domains/:domain/records/:id                Bearer token
Hetzner        PUT /api/v1/records/:id                            Auth-API-Token header
Porkbun        POST /api/json/v3/dns/editByNameType/:domain/A     API key + secret in body
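The four-step pattern above maps directly onto the endpoint fields. This sketch shows it for Hetzner, whose table row uses an Auth-API-Token header instead of a bearer token. The host dns.hetzner.com and the body fields (zone_id, type, name, value, ttl) reflect our understanding of Hetzner's DNS API. Treat them as assumptions and confirm them against Hetzner's documentation. toEndpointPayload is a hypothetical helper. For providers such as Porkbun that take credentials in the request body, the keys go into body instead of headers.

```javascript
// Hypothetical helper: turns the generic pattern (URL, auth headers,
// update payload) into the Runlater endpoint fields used in this guide.
function toEndpointPayload({ name, url, headers = {}, body }) {
  return {
    name,
    forward_urls: [url],                 // step 2: the record-update URL
    forward_headers: {                   // step 3: provider auth
      "Content-Type": "application/json",
      ...headers,
    },
    forward_body: JSON.stringify(body),  // step 4: the update payload, as a string
    retry_attempts: 3,
  }
}

// Example for Hetzner DNS. Host and body fields are assumptions;
// check Hetzner's API docs for the exact required fields.
const hetzner = toEndpointPayload({
  name: "DNS failover (Hetzner)",
  url: "https://dns.hetzner.com/api/v1/records/RECORD_ID",
  headers: { "Auth-API-Token": "hetzner-token-here" },
  body: {
    zone_id: "ZONE_ID",
    type: "A",
    name: "@",                           // "@" denotes the zone apex
    value: "203.0.113.50",
    ttl: 60,
  },
})
```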

Testing the failover

Before relying on this in production, test the full flow:

1. Test the endpoint directly

Send a request to the endpoint's inbound URL to verify the DNS update works:

# Trigger the failover endpoint manually
curl -X POST https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

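# Check that DNS was updated (DNS-only record)
dig your-domain.com A +short
# Should show: 203.0.113.50 (backup IP)

# With proxied: true, dig returns Cloudflare edge IPs, not your origin.
# Read the record from the Cloudflare API instead:
curl https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID \
  -H "Authorization: Bearer cf-api-token" | jq -r '.result.content'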

2. Test the full chain

Temporarily change the health check URL to something that will fail — for example, a non-existent path on your server:

# Point health check at a URL that returns 404
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://your-server.com/this-will-404" }'

# Wait 1-2 minutes, then check DNS
dig your-domain.com A +short

# Switch back to the real health check URL when done
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
  -H "Authorization: Bearer pk_xxx.sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://your-server.com/health" }'
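Instead of waiting a fixed 1-2 minutes, a small script can poll until the record resolves to the expected IP. This sketch uses Node's built-in dns module. waitForIp and its options are hypothetical. The check is only meaningful for DNS-only records, because with proxied: true public resolvers return Cloudflare's edge IPs and never your origin. Your resolver may also cache the old answer for up to the record's TTL.

```javascript
const dns = require("node:dns").promises

// Hypothetical helper: poll until `hostname` resolves to `expectedIp`,
// or give up after `timeoutMs`. `resolve` is injectable for testing.
async function waitForIp(hostname, expectedIp, {
  timeoutMs = 180_000,
  intervalMs = 10_000,
  resolve = (h) => dns.resolve4(h),
} = {}) {
  const deadline = Date.now() + timeoutMs
  while (true) {
    try {
      const ips = await resolve(hostname)
      if (ips.includes(expectedIp)) return true
    } catch {
      // Transient resolver errors (e.g. SERVFAIL): keep polling.
    }
    if (Date.now() >= deadline) return false
    await new Promise((r) => setTimeout(r, intervalMs))
  }
}
```

For example, waitForIp("your-domain.com", "203.0.113.50") resolves to true once the backup IP is visible, or to false after three minutes.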

3. Verify recovery

If you set up the optional recovery endpoint from Step 4, the DNS should switch back to the primary IP after the health check succeeds again. Check with dig after a couple of minutes.

Tips

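  • Use a low TTL. Set your DNS record's TTL to 60 seconds so the failover takes effect quickly. High TTLs (3600+) mean clients can keep connecting to the dead server for up to an hour after the DNS change. For Cloudflare records with proxied: true, Cloudflare manages the TTL itself and clients connect to its edge, so the origin switch takes effect without waiting for resolver caches to expire.
  • Use expected_status_codes. Set expected_status_codes to [200] on your health check task so that any other response, such as a 500 or 503, counts as a failure, not just timeouts.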
  • Keep the timeout short. A 10-second timeout on the health check means failure detection within ~70 seconds (60s cron interval + 10s timeout). Don't set it to 30s or higher.
  • Set up org-level notifications too. Configure email or Slack notifications in your organization settings so you know when a failover happens, even if the recovery is automatic.
  • Health endpoint should check dependencies. Your /health endpoint should verify database connectivity, not just return 200. A server that can't reach its database is effectively down.
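A minimal /health handler along these lines returns 200 only when its dependencies respond. This sketch uses Node's built-in http module. The checks object stands in for whatever cheap probes your stack supports, such as a SELECT 1 against the database. checkHealth, createHealthServer, and the 2-second per-check limit are hypothetical choices, with the limit picked to stay well under the task's 10-second timeout_ms.

```javascript
const http = require("node:http")

// Reject if `promise` takes longer than `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Run every dependency check; healthy only if all of them succeed.
// `checks` maps a name to an async function, e.g. { database: pingDb }.
async function checkHealth(checks, perCheckMs = 2000) {
  const results = {}
  for (const [name, check] of Object.entries(checks)) {
    try {
      // Promise.resolve().then(...) turns a synchronous throw into a rejection.
      await withTimeout(Promise.resolve().then(check), perCheckMs)
      results[name] = "ok"
    } catch (err) {
      results[name] = `fail: ${err.message}`
    }
  }
  const healthy = Object.values(results).every((r) => r === "ok")
  // 503 is not in expected_status_codes: [200], so the task fails.
  return { status: healthy ? 200 : 503, body: results }
}

function createHealthServer(checks) {
  return http.createServer(async (req, res) => {
    if (req.url !== "/health") {
      res.writeHead(404).end()
      return
    }
    const { status, body } = await checkHealth(checks)
    res.writeHead(status, { "Content-Type": "application/json" })
    res.end(JSON.stringify(body))
  })
}
```

For example, createHealthServer({ database: () => db.query("SELECT 1") }).listen(8080), where db is your own database client.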
Ready to set up failover? Create a free account and set up automated DNS failover in under 5 minutes. Upgrade to Pro for minute-level health checks. See the API docs for the full task and endpoint reference.
