The Great DoorDash Routing Degradation of March 2026
For three hours on March 14, 2026, DoorDash Routing (the layer that powers our edge-to-burrito delivery network) experienced elevated latency in the DEN-CHUD-3 region. No bytes were lost. No bytes were even moving. But 14 burritos were delayed. This is their story.
Background
DoorDash Routing is one of Chudflare's most critical internal services. When a chud at one of our PoPs hits a configurable hunger threshold (set per-employee, defaults to 47/100), DoorDash Routing automatically dispatches a burrito via the optimal route from the nearest Mexican restaurant. Average delivery time: 23 minutes. SLA: 47 minutes. Burrito-to-byte ratio: 1 burrito per 100M requests served.
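For readers who have not seen the dispatch rule, here is a minimal sketch of the threshold check described above. The `Chud` class and `should_dispatch` function are hypothetical names invented for illustration, not the real DoorDash Routing API; only the 47/100 default comes from this post.

```python
from dataclasses import dataclass

# Default hunger threshold from the post (scale of 0-100).
DEFAULT_HUNGER_THRESHOLD = 47


@dataclass
class Chud:
    """Hypothetical employee record; field names are assumptions."""
    name: str
    hunger: int  # current hunger level, 0-100
    threshold: int = DEFAULT_HUNGER_THRESHOLD  # configurable per employee


def should_dispatch(chud: Chud) -> bool:
    """Return True once the chud's hunger reaches their own threshold."""
    return chud.hunger >= chud.threshold
```

Under these assumptions, `should_dispatch(Chud("theo", hunger=47))` is true, while a chud who raised their threshold to 80 would not trigger a burrito at hunger 50.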
Incident timeline (UTC)
- 15:47: First alert fires. Burrito P95 delivery time exceeds 47 minutes for the third consecutive measurement.
- 15:52: SRE on-call investigates. DoorDash Routing API is healthy. All upstream restaurants are reachable. The routes themselves look fine.
- 16:14: Escalated to me. I check the dashboard. Every burrito in DEN-CHUD-3 has been stuck in the "out for delivery" state for 90+ minutes.
- 16:23: I investigate the actual delivery driver layer and find three Dashers stuck in our office lobby, unable to reach the chud team because the elevator is broken.
- 16:31: I file a ticket with building management.
- 17:15: Elevator restored. Dashers ascended. Burritos delivered.
- 17:42: Backfill complete. All 14 delayed burritos delivered. Service restored.
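The alert condition from the 15:47 entry (P95 delivery time above the 47-minute SLA for three consecutive measurements) can be sketched roughly as below. This is an illustration, not our actual alerting config: the `BurritoLatencyAlert` class, the nearest-rank percentile method, and the idea that each measurement is a window of delivery times are all assumptions.

```python
from collections import deque

SLA_MINUTES = 47          # from the post
CONSECUTIVE_BREACHES = 3  # "third consecutive measurement"


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of minutes."""
    if not samples:
        raise ValueError("need at least one delivery time")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


class BurritoLatencyAlert:
    """Hypothetical alert: fires after N consecutive P95 breaches."""

    def __init__(self, sla=SLA_MINUTES, needed=CONSECUTIVE_BREACHES):
        self.sla = sla
        # Only the most recent `needed` breach flags are kept.
        self.recent = deque(maxlen=needed)

    def observe(self, delivery_minutes):
        """Record one measurement window; return True if the alert fires."""
        self.recent.append(p95(delivery_minutes) > self.sla)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

Feeding this three windows that each contain a 95-minute straggler, for example `[20] * 10 + [95]`, returns False, False, then True. A single healthy window afterwards clears the alert.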
Root cause
The incident was caused by a physical-layer failure in the building's elevator system. Our software was healthy throughout. Our infrastructure was healthy throughout. Our chuds were hungry throughout. The bottleneck was 4 floors of stairs that nobody was willing to climb.
This is a classic L0 issue. We do not have observability into L0 (the physical world). We are exploring options.
What we're doing
- Multi-elevator redundancy. Working with building management to ensure at least 2 elevators are operational at all times. Estimated cost: $0 (they are required to do this by building code).
- Stair-fallback policy. In the event of all elevators failing, an intern will be dispatched to descend the stairs and accept burrito delivery at ground level. Estimated burrito transit time: +8 minutes.
- Pre-emptive caching. We are exploring a "burrito CDN" architecture where 2-3 burritos are pre-positioned in the office mini-fridge at the start of each day. They will be served from cache during outages. This violates several food safety guidelines but is being evaluated for production.
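Taken together, the three mitigations above amount to a fallback chain. Here is one way it might look, as a sketch under stated assumptions: the `Handoff` enum and both functions are hypothetical, and the ordering (try the intern before the fridge) is my guess, since the cache is still under evaluation. Only the +8 minute stair penalty comes from the list above.

```python
from enum import Enum
from typing import Optional

STAIR_PENALTY_MINUTES = 8  # from the stair-fallback estimate above


class Handoff(Enum):
    ELEVATOR = "elevator"            # Dasher rides up as usual
    STAIRS_INTERN = "stairs_intern"  # intern descends to ground level
    FRIDGE_CACHE = "fridge_cache"    # pre-positioned burrito, served cold


def choose_handoff(working_elevators: int,
                   intern_available: bool,
                   cached_burritos: int) -> Optional[Handoff]:
    """Pick a delivery path; None means the chud stays hungry."""
    if working_elevators > 0:
        return Handoff.ELEVATOR
    if intern_available:  # assumed to be preferred over the cache
        return Handoff.STAIRS_INTERN
    if cached_burritos > 0:
        return Handoff.FRIDGE_CACHE
    return None


def estimated_minutes(base_minutes: int, handoff: Handoff) -> int:
    """Estimated transit time; a cache hit is assumed to take 0 minutes."""
    if handoff is Handoff.STAIRS_INTERN:
        return base_minutes + STAIR_PENALTY_MINUTES
    if handoff is Handoff.FRIDGE_CACHE:
        return 0
    return base_minutes
```

With the 23-minute average delivery time, a total elevator outage with an intern on hand comes out to an estimated 31 minutes, which is still inside the 47-minute SLA.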
What we're NOT doing
- Moving offices. The hunch angle in our current office is optimal.
- Cooking food in-house. We are a chud company. We do not cook.
- Telling the chuds to walk to the restaurant. This would be cruel.
Customer impact
No external customers were impacted. This incident was entirely internal: the only people affected were 14 Chudflare employees who had to wait an extra hour for lunch. They have been compensated with Monster Ultra Zero credits.
We take burrito-delivery reliability extremely seriously. We are sorry to the chuds we kept hungry. It will not happen again. At least not until the elevator breaks again. Which it will. Nothing ever happens.
Theo, Staff SRE (now keeping a granola bar in my desk drawer at all times)
