Automated DNS Failover with Runlater
Detect server outages within about a minute and automatically update DNS records to point traffic at a backup server. No manual intervention, no pager duty at 3 AM.
Why automate DNS failover?
When your primary server goes down, every minute of downtime costs you users and revenue. Manual failover means someone has to notice the outage, log in to your DNS provider, update the A record, and wait for propagation. That's 10-30 minutes on a good day.
With Runlater, you can detect failures within about a minute and trigger an automatic DNS update via your provider's API. With a low DNS TTL, your backup server can start receiving traffic within a couple of minutes, often before most users notice something went wrong.
Architecture
The setup uses three Runlater features working together: a cron task for health checks, an endpoint for the DNS update, and the on_failure_url hook that connects them.
Cron task (every 1 min)
|
| GET https://your-server.com/health
|
+-- 200 OK? --> do nothing, wait for next tick
|
+-- timeout/5xx? --> task fails
|
| on_failure_url fires automatically
v
Runlater endpoint (ep_xxx)
|
| forwards to DNS provider API
v
Cloudflare / Route53 / your DNS
|
| A record updated: primary IP → backup IP
v
Traffic now goes to backup server
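The decision at the top of the diagram is small enough to write as a pure function. The sketch below follows the rules this guide describes: a timeout, or any status outside expected_status_codes, fails the run. evaluateRun and its input shape are illustrative only. They are not part of Runlater's API, which makes this decision on its own servers.

```javascript
// Hypothetical model of a single health-check run, following the diagram above.
// `result` is { timedOut: boolean, status: number | null }.
function evaluateRun(result, expectedStatusCodes = [200]) {
  if (result.timedOut) {
    return { passed: false, reason: "timeout" };
  }
  if (!expectedStatusCodes.includes(result.status)) {
    return { passed: false, reason: `unexpected status ${result.status}` };
  }
  return { passed: true, reason: "ok" };
}

// A failed run is what fires on_failure_url; a passing run just waits for the next tick.
console.log(evaluateRun({ timedOut: false, status: 200 })); // { passed: true, reason: 'ok' }
console.log(evaluateRun({ timedOut: false, status: 503 })); // { passed: false, reason: 'unexpected status 503' }
console.log(evaluateRun({ timedOut: true, status: null }));  // { passed: false, reason: 'timeout' }
```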
Step 1: Create the health check cron task
Create a cron task that pings your primary server's health endpoint every minute. If the server is down, the task will fail — which is exactly what triggers the failover.
const res = await fetch("https://runlater.eu/api/v1/tasks", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.RUNLATER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Health check: primary server",
    url: "https://your-server.com/health",
    method: "GET",
    cron: "* * * * *",            // Every minute
    timeout_ms: 10000,            // 10s timeout: fail fast
    expected_status_codes: [200], // Anything else = failure
  }),
})
if (!res.ok) throw new Error(`Task creation failed: ${res.status}`)

const { data } = await res.json()
console.log("Task ID:", data.id) // Save this; you'll need it in Step 3
curl -X POST https://runlater.eu/api/v1/tasks \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{
"name": "Health check: primary server",
"url": "https://your-server.com/health",
"method": "GET",
"cron": "* * * * *",
"timeout_ms": 10000,
"expected_status_codes": [200]
}'
Per-minute schedules (* * * * *) require the Pro plan. Free tier tasks can run at most once per hour.
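To check whether a schedule exceeds the hourly limit, count how many minutes of the hour its minute field matches. A standard five-field cron expression fires more than once in some hour exactly when that field matches more than one value. The helper below is a sketch that assumes standard cron syntax. The helper names are hypothetical, and whether Runlater enforces its limit in exactly this way is also an assumption.

```javascript
// Count the distinct minutes (0-59) matched by a cron minute field.
// Assumes standard syntax: "*", "*/n", "a", "a-b", "a-b/n", "a/n", and comma-separated lists.
function minutesMatched(minuteField) {
  const minutes = new Set();
  for (const part of minuteField.split(",")) {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : Number(stepText);
    let start;
    let end;
    if (range === "*") {
      start = 0;
      end = 59;
    } else if (range.includes("-")) {
      const bounds = range.split("-").map(Number);
      start = bounds[0];
      end = bounds[1];
    } else {
      start = Number(range);
      // "a/n" means "every n minutes starting at a" in common cron dialects.
      end = stepText === undefined ? start : 59;
    }
    const valid = [start, end, step].every(Number.isInteger) && step >= 1 && start >= 0 && end <= 59 && start <= end;
    if (!valid) throw new Error(`Unsupported minute field: ${minuteField}`);
    for (let m = start; m <= end; m += step) minutes.add(m);
  }
  return minutes.size;
}

// A task can run more than once per hour exactly when its minute field matches more than one minute.
function runsMoreThanHourly(cron) {
  const minuteField = cron.trim().split(/\s+/)[0];
  return minutesMatched(minuteField) > 1;
}

console.log(runsMoreThanHourly("* * * * *"));    // true
console.log(runsMoreThanHourly("0 * * * *"));    // false
console.log(runsMoreThanHourly("*/15 * * * *")); // true
```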
Step 2: Create the DNS failover endpoint
Create an inbound endpoint that, when triggered, calls your DNS provider's API to update the A record. The endpoint stores the API credentials and request body so the failover happens without any application code running.
Cloudflare example
Cloudflare's DNS API lets you update a record with a PUT request. You'll need your Zone ID, the DNS Record ID, and an API token with DNS edit permissions.
const ZONE_ID = "your-cloudflare-zone-id"
const RECORD_ID = "your-dns-record-id"
const BACKUP_IP = "203.0.113.50"

const res = await fetch("https://runlater.eu/api/v1/endpoints", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.RUNLATER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "DNS failover: switch to backup",
    forward_urls: [
      `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}`,
    ],
    forward_headers: {
      "Authorization": `Bearer ${process.env.CLOUDFLARE_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    forward_body: JSON.stringify({
      type: "A",
      name: "your-domain.com",
      content: BACKUP_IP,
      ttl: 60,
      proxied: true,
    }),
    retry_attempts: 3,
  }),
})
if (!res.ok) throw new Error(`Endpoint creation failed: ${res.status}`)

const { data } = await res.json()
console.log("Inbound URL:", data.inbound_url)
// e.g. https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
curl -X POST https://runlater.eu/api/v1/endpoints \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{
"name": "DNS failover: switch to backup",
"forward_urls": [
"https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID"
],
"forward_headers": {
"Authorization": "Bearer cf-api-token-here",
"Content-Type": "application/json"
},
"forward_body": "{\"type\":\"A\",\"name\":\"your-domain.com\",\"content\":\"203.0.113.50\",\"ttl\":60,\"proxied\":true}",
"retry_attempts": 3
}'
The API token in forward_headers is stored on Runlater's servers. It's never exposed in logs or webhook payloads.
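Because forward_body is itself a string, the DNS payload gets JSON-encoded twice: once into forward_body, and again when the endpoint definition is sent to Runlater. A small builder keeps this straight and lets the failover and recovery endpoints differ only by IP. buildCloudflareEndpoint is a hypothetical helper, not part of any SDK. The field names are the ones used in the requests above.

```javascript
// Hypothetical helper: build the Runlater endpoint payload for a Cloudflare A-record update.
// Field names (forward_urls, forward_headers, forward_body, retry_attempts) follow the examples above.
function buildCloudflareEndpoint({ name, zoneId, recordId, apiToken, domain, ip, ttl = 60, proxied = true }) {
  return {
    name,
    forward_urls: [
      `https://api.cloudflare.com/client/v4/zones/${zoneId}/dns_records/${recordId}`,
    ],
    forward_headers: {
      "Authorization": `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    // First encoding: the DNS record becomes the forward_body string.
    forward_body: JSON.stringify({ type: "A", name: domain, content: ip, ttl, proxied }),
    retry_attempts: 3,
  };
}

const failover = buildCloudflareEndpoint({
  name: "DNS failover: switch to backup",
  zoneId: "ZONE_ID",
  recordId: "RECORD_ID",
  apiToken: "cf-api-token-here",
  domain: "your-domain.com",
  ip: "203.0.113.50",
});

// Second encoding: the whole payload is serialized as the body of POST /api/v1/endpoints.
console.log(JSON.stringify(failover));
```

With these inputs, forward_body is exactly the same string as in the curl example. For the recovery endpoint in Step 4, pass the primary IP instead.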
Step 3: Connect them with on_failure_url
Now update the health check task so that when it fails, it automatically triggers the failover endpoint. Set on_failure_url to the endpoint's inbound URL.
// Update the health check task with the failover URL
const TASK_ID = "task-id-from-step-1"
const FAILOVER_URL = "https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

const res = await fetch(`https://runlater.eu/api/v1/tasks/${TASK_ID}`, {
  method: "PUT",
  headers: {
    "Authorization": `Bearer ${process.env.RUNLATER_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    on_failure_url: FAILOVER_URL,
  }),
})
if (!res.ok) throw new Error(`Task update failed: ${res.status}`)
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{
"on_failure_url": "https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}'
You can also set on_failure_url when creating the task in Step 1: just add it to the request body. We separated the steps here for clarity, since you need the endpoint's inbound URL first.
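If you prefer a single script, create the endpoint first and pass its inbound URL as on_failure_url when you create the task. The sketch below assumes the response shapes shown earlier (data.inbound_url and data.id). setupFailover and its fetchImpl parameter are hypothetical names. Injecting fetch lets you dry-run the script without touching the API.

```javascript
// Hypothetical setup script: create the failover endpoint, then the health check task
// with on_failure_url already set. `fetchImpl` defaults to the global fetch (Node 18+).
async function setupFailover({ apiKey, endpointPayload, healthUrl, fetchImpl = fetch }) {
  const headers = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };

  async function post(path, body) {
    const res = await fetchImpl(`https://runlater.eu/api/v1/${path}`, {
      method: "POST",
      headers,
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`POST ${path} failed with status ${res.status}`);
    return (await res.json()).data;
  }

  // 1. The endpoint must exist first, because the task needs its inbound URL.
  const endpoint = await post("endpoints", endpointPayload);

  // 2. Create the task with on_failure_url in the same call.
  const task = await post("tasks", {
    name: "Health check: primary server",
    url: healthUrl,
    method: "GET",
    cron: "* * * * *",
    timeout_ms: 10000,
    expected_status_codes: [200],
    on_failure_url: endpoint.inbound_url,
  });

  return { taskId: task.id, inboundUrl: endpoint.inbound_url };
}
```

Passing a stub in place of fetch lets you inspect both request bodies before pointing the script at the real API.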
Step 4: (Optional) Automatic recovery
When your primary server comes back up, you probably want DNS to switch back automatically. Create a second endpoint that points the A record back to your primary IP, and set it as the task's on_recovery_url.
# 1. Create the recovery endpoint (same DNS API, but with PRIMARY_IP)
curl -X POST https://runlater.eu/api/v1/endpoints \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{
"name": "DNS recovery: switch to primary",
"forward_urls": [
"https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID"
],
"forward_headers": {
"Authorization": "Bearer cf-api-token-here",
"Content-Type": "application/json"
},
"forward_body": "{\"type\":\"A\",\"name\":\"your-domain.com\",\"content\":\"198.51.100.10\",\"ttl\":60,\"proxied\":true}",
"retry_attempts": 3
}'

# 2. Update the health check task with both URLs
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{
"on_failure_url": "https://runlater.eu/in/ep_failover_slug_here",
"on_recovery_url": "https://runlater.eu/in/ep_recovery_slug_here"
}'
Now you have a fully automated loop:
Primary goes down
→ health check fails
→ on_failure_url fires
→ DNS updated to backup IP
Primary comes back up
→ health check succeeds (after previous failure)
→ on_recovery_url fires
→ DNS updated back to primary IP
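The loop above behaves like a two-state machine: healthy, or failed over. The simulation below is a sketch built on one labeled assumption: on_failure_url fires on every failed run, while on_recovery_url fires only on the first passing run after a failure. Runlater's exact firing rules may differ. That is one reason an idempotent request such as Cloudflare's PUT suits this setup, since sending the same record update twice leaves the same end state. simulate is a hypothetical name.

```javascript
// Hypothetical simulation of the failover loop. Each entry in `runs` is true (check passed)
// or false (check failed). Assumption: on_failure_url fires on every failed run, and
// on_recovery_url fires on the first passing run after one or more failures.
function simulate(runs, primaryIp, backupIp) {
  let dnsIp = primaryIp;
  let failing = false;
  const hooks = [];
  for (const passed of runs) {
    if (!passed) {
      hooks.push("on_failure_url");
      dnsIp = backupIp; // Repeating the same PUT leaves the record unchanged.
      failing = true;
    } else if (failing) {
      hooks.push("on_recovery_url");
      dnsIp = primaryIp;
      failing = false;
    }
  }
  return { dnsIp, hooks };
}

const result = simulate([true, false, false, true, true], "198.51.100.10", "203.0.113.50");
console.log(result.hooks); // [ 'on_failure_url', 'on_failure_url', 'on_recovery_url' ]
console.log(result.dnsIp); // 198.51.100.10
```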
DNS provider examples
Cloudflare
Cloudflare is the most common choice. You need three values from your Cloudflare dashboard:
- Zone ID: found on the Overview page of your domain
- DNS Record ID: get it via the GET /zones/:zone_id/dns_records API
- API Token: create one with Zone > DNS > Edit permissions
curl https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records \
-H "Authorization: Bearer cf-api-token" \
| jq '.result[] | select(.name == "your-domain.com" and .type == "A") | .id'
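If you would rather not depend on jq, the same lookup is a filter over the result array in Cloudflare's list response. This sketch assumes the response shape { result: [{ id, name, type, content, ... }] }, and findRecordId is a hypothetical name. Filtering on type as well as name avoids picking up a TXT or AAAA record that shares the domain name.

```javascript
// Find the ID of the record for `domain` (A by default) in a Cloudflare "list DNS records" response.
function findRecordId(listResponse, domain, type = "A") {
  const record = listResponse.result.find((r) => r.name === domain && r.type === type);
  if (!record) throw new Error(`No ${type} record named ${domain}`);
  return record.id;
}

// Example with a trimmed-down response body (the IDs are made up).
const sample = {
  result: [
    { id: "txt-record-id", name: "your-domain.com", type: "TXT", content: "v=spf1 -all" },
    { id: "a-record-id", name: "your-domain.com", type: "A", content: "198.51.100.10" },
  ],
};
console.log(findRecordId(sample, "your-domain.com")); // a-record-id
```

Cloudflare paginates this list. On zones with many records, consider adding a name filter to the request (for example ?name=your-domain.com) rather than scanning every page.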
Generic REST API
Any DNS provider with a REST API works. The pattern is the same:
- Find the API endpoint that updates a DNS record
- Put the URL in forward_urls
- Put auth headers in forward_headers
- Put the update payload in forward_body
| Provider | API endpoint | Auth |
|---|---|---|
| Cloudflare | PUT /client/v4/zones/:zone/dns_records/:id | Bearer token |
| DigitalOcean | PUT /v2/domains/:domain/records/:id | Bearer token |
| Hetzner | PUT /api/v1/records/:id | Auth-API-Token header |
| Porkbun | POST /api/json/v3/dns/editByNameType/:domain/A | API key + secret in body |
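The Auth column is where providers differ most. Most take a token in a header, but some (such as Porkbun) expect credentials in the request body. The sketch below maps the three patterns onto Runlater's endpoint fields. buildEndpoint is a hypothetical helper. The DigitalOcean and Hetzner hosts and payload fields are assumptions to verify against each provider's API documentation.

```javascript
// Hypothetical helper mapping the table's "Auth" column onto Runlater's endpoint fields.
function buildEndpoint({ name, url, auth, payload }) {
  const headers = { "Content-Type": "application/json" };
  let body = { ...payload };
  if (auth.kind === "bearer") {
    headers["Authorization"] = `Bearer ${auth.token}`;
  } else if (auth.kind === "header") {
    headers[auth.header] = auth.token; // e.g. a custom header such as Auth-API-Token
  } else if (auth.kind === "body") {
    body = { ...auth.fields, ...payload }; // e.g. an API key and secret sent in the body
  } else {
    throw new Error(`Unknown auth kind: ${auth.kind}`);
  }
  return {
    name,
    forward_urls: [url],
    forward_headers: headers,
    forward_body: JSON.stringify(body),
    retry_attempts: 3,
  };
}

// DigitalOcean: bearer token (the { type, data, ttl } payload is an assumption to verify).
const doEndpoint = buildEndpoint({
  name: "DNS failover: DigitalOcean",
  url: "https://api.digitalocean.com/v2/domains/your-domain.com/records/RECORD_ID",
  auth: { kind: "bearer", token: "do-api-token-here" },
  payload: { type: "A", data: "203.0.113.50", ttl: 60 },
});

// Hetzner: custom Auth-API-Token header (host and payload fields are assumptions to verify).
const hetznerEndpoint = buildEndpoint({
  name: "DNS failover: Hetzner",
  url: "https://dns.hetzner.com/api/v1/records/RECORD_ID",
  auth: { kind: "header", header: "Auth-API-Token", token: "hetzner-token-here" },
  payload: { zone_id: "ZONE_ID", type: "A", name: "@", value: "203.0.113.50", ttl: 60 },
});

console.log(JSON.stringify(doEndpoint, null, 2));
```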
Testing the failover
Before relying on this in production, test the full flow:
1. Test the endpoint directly
Send a request to the endpoint's inbound URL to verify the DNS update works:
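# Trigger the failover endpoint manually
curl -X POST https://runlater.eu/in/ep_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Check that DNS was updated
dig your-domain.com A +short
# Should show: 203.0.113.50 (backup IP)
# If the record is proxied (proxied: true), dig returns Cloudflare's edge IPs instead,
# so read the record's content from the Cloudflare API:
curl https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID \
-H "Authorization: Bearer cf-api-token" \
| jq -r '.result.content'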
2. Test the full chain
Temporarily change the health check URL to something that will fail — for example, a non-existent path on your server:
# Point health check at a URL that returns 404
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{ "url": "https://your-server.com/this-will-404" }'

# Wait 1-2 minutes, then check DNS
dig your-domain.com A +short

# Switch back to the real health check URL when done
curl -X PUT https://runlater.eu/api/v1/tasks/TASK_ID \
-H "Authorization: Bearer pk_xxx.sk_xxx" \
-H "Content-Type: application/json" \
-d '{ "url": "https://your-server.com/health" }'
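Instead of guessing how long to wait, you can poll until the record shows the expected IP or a deadline passes. This is a sketch. waitForRecord and the readRecord callback are hypothetical names, and the defaults (check every 15 seconds for up to 3 minutes) are arbitrary choices sized to the one-minute cron interval plus the 10-second timeout.

```javascript
// Hypothetical helper: poll readRecord() until it returns expectedIp or timeoutMs elapses.
// `sleep` is injectable so the loop can be tested without real waiting.
const realSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForRecord(readRecord, expectedIp, { intervalMs = 15000, timeoutMs = 180000, sleep = realSleep } = {}) {
  let waited = 0;
  while (true) {
    const current = await readRecord();
    if (current === expectedIp) return { ok: true, current, waitedMs: waited };
    if (waited + intervalMs > timeoutMs) return { ok: false, current, waitedMs: waited };
    await sleep(intervalMs);
    waited += intervalMs;
  }
}

// Example readRecord for Cloudflare (ZONE_ID, RECORD_ID and CF_API_TOKEN are placeholders).
// It reads the record's configured content, i.e. the origin IP, which dig cannot show for proxied records.
// const readRecord = async () => {
//   const res = await fetch(
//     `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}`,
//     { headers: { Authorization: `Bearer ${CF_API_TOKEN}` } },
//   );
//   return (await res.json()).result.content;
// };
// waitForRecord(readRecord, "203.0.113.50").then(console.log);
```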
3. Verify recovery
If you set up the optional recovery endpoint from Step 4, the DNS should switch back to the primary IP after the health check succeeds again. Check with dig after a couple of minutes.
Tips
- Use a low TTL. Set your DNS record's TTL to 60 seconds so the failover takes effect quickly. High TTLs (3600+) mean clients will keep connecting to the dead server for up to an hour after the DNS change.
- Use expected_status_codes. Set expected_status_codes to [200] on your health check task. This way, a 500 or 503 response counts as a failure too, not just a timeout.
- Keep the timeout short. A 10-second timeout on the health check means failure detection within ~70 seconds in the worst case (60s cron interval + 10s timeout). Don't set it to 30s or higher.
- Set up org-level notifications too. Configure email or Slack notifications in your organization settings so you know when a failover happens, even if the recovery is automatic.
- Health endpoint should check dependencies. Your /health endpoint should verify database connectivity, not just return 200. A server that can't reach its database is effectively down.
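Here is a minimal sketch of a dependency-aware health check for Node. It assumes you can express each dependency as an async function that rejects when the dependency is unavailable (for example, a SELECT 1 against your database). checkHealth, withTimeout, and the check functions are hypothetical names. Each check gets its own time budget well under the task's 10-second timeout_ms, so a hung dependency produces a fast 503 instead of a timeout.

```javascript
// Race a promise against a timer; the timer is cleared whichever settles first.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run every dependency check in parallel; any failure or timeout yields a 503.
async function checkHealth(checks, perCheckMs = 2000) {
  const entries = Object.entries(checks);
  // Promise.resolve().then(check) also turns synchronous throws into rejections.
  const results = await Promise.allSettled(
    entries.map(([name, check]) => withTimeout(Promise.resolve().then(check), perCheckMs, name)),
  );
  const failures = {};
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      const reason = result.reason;
      failures[entries[i][0]] = reason instanceof Error ? reason.message : String(reason);
    }
  });
  const healthy = Object.keys(failures).length === 0;
  // A 503 is outside expected_status_codes: [200], so the run counts as a failure.
  return { status: healthy ? 200 : 503, body: healthy ? { ok: true } : { ok: false, failures } };
}

// Wiring it into Node's built-in http module (the database check is a placeholder):
// const http = require("http");
// http.createServer(async (req, res) => {
//   if (req.url !== "/health") { res.writeHead(404); res.end(); return; }
//   const { status, body } = await checkHealth({ database: () => db.query("SELECT 1") });
//   res.writeHead(status, { "Content-Type": "application/json" });
//   res.end(JSON.stringify(body));
// }).listen(8080);
```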