January 11, 2026
Unbounded Consumption — Protecting Against Resource Exhaustion and AI-DoS
As AI becomes core infrastructure in 2026, attackers have learned to weaponize model compute costs. LLM10: Unbounded Consumption describes how adversaries mount "Denial-of-Wallet" attacks by forcing models into expensive reasoning loops, overloading context windows, or causing agents to spam tool calls. This post outlines essential safeguards: hard token caps, token-based rate limiting, execution timeouts, recursion limits, and semantic caching. An emerging best practice, model cascading, routes each request to the cheapest model capable of handling it, cutting both risk and spend. The goal: keep your AI powerful, affordable, and resilient.
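A hard per-request token cap combined with token-based rate limiting can be sketched as a token bucket that meters tokens rather than requests. This is a minimal illustration; the capacity, refill rate, and `MAX_TOKENS_PER_REQUEST` values are assumptions, not recommendations:

```python
import time

MAX_TOKENS_PER_REQUEST = 4096  # hard cap per request (illustrative budget)

class TokenBucket:
    """Token-based rate limiter: capacity refills at a fixed rate,
    and each request spends its estimated token cost."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: int) -> bool:
        # Refill proportionally to elapsed time, never above capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if cost > MAX_TOKENS_PER_REQUEST:
            return False  # reject oversized requests outright (hard cap)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # bucket drained: caller should back off or queue
```

Because the bucket meters tokens instead of raw request counts, a single enormous prompt cannot slip through a per-request limit that a burst of small prompts would also satisfy.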
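Execution timeouts and recursion limits guard the agent side: a runaway loop stops when either the tool-call budget or the wall-clock deadline is exhausted. A sketch under assumed budgets (`MAX_TOOL_CALLS` and `MAX_WALL_SECONDS` are hypothetical values):

```python
import time

MAX_TOOL_CALLS = 8       # recursion/tool-call limit (illustrative)
MAX_WALL_SECONDS = 30.0  # execution timeout (illustrative)

def run_agent(step_fn, state):
    """Drive an agent loop, aborting when either the tool-call
    budget or the wall-clock deadline is exceeded."""
    deadline = time.monotonic() + MAX_WALL_SECONDS
    for _ in range(MAX_TOOL_CALLS):
        if time.monotonic() > deadline:
            raise TimeoutError("agent exceeded wall-clock budget")
        # step_fn performs one model/tool step and reports completion.
        state, done = step_fn(state)
        if done:
            return state
    raise RuntimeError("agent exceeded tool-call budget")
```

Capping both dimensions matters: a timeout alone lets an attacker burn the full window on every request, while a call limit alone does not bound a single slow tool invocation.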
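Semantic caching returns a stored answer for queries similar enough to one already served, skipping the model call entirely. A toy sketch, using a bag-of-words cosine similarity as a stand-in for a real embedding model, with an assumed similarity threshold:

```python
import math
from collections import Counter

SIM_THRESHOLD = 0.9  # assumed cutoff for treating two queries as equivalent

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production system would use a
    # sentence-embedding model and a vector index instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self):
        self._entries = []  # list of (embedding, answer) pairs

    def get(self, query: str):
        q = embed(query)
        for emb, answer in self._entries:
            if cosine(q, emb) >= SIM_THRESHOLD:
                return answer  # cache hit: no model call, no spend
        return None

    def put(self, query: str, answer: str):
        self._entries.append((embed(query), answer))
```

A cache hit costs a similarity lookup instead of an inference call, so repeated or near-duplicate attack traffic stops translating into compute spend.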
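Model cascading can be sketched as a router that defaults to the cheapest tier and escalates only when a difficulty signal fires. The tier names, relative costs, and the heuristic below are all illustrative assumptions; real cascades typically use a learned router or a confidence check on the small model's answer:

```python
# Illustrative tiers: (model name, relative cost). Not real models or pricing.
TIERS = [
    ("small-model", 0.10),
    ("medium-model", 1.00),
    ("large-model", 10.00),
]

def estimate_difficulty(prompt: str) -> int:
    """Crude difficulty heuristic: long prompts and reasoning keywords
    each escalate one tier. Purely illustrative."""
    score = 0
    if len(prompt.split()) > 100:
        score += 1
    if any(k in prompt.lower() for k in ("prove", "derive", "step by step")):
        score += 1
    return score

def route(prompt: str) -> str:
    """Pick the cheapest model deemed capable of handling the prompt."""
    tier = min(estimate_difficulty(prompt), len(TIERS) - 1)
    return TIERS[tier][0]
```

Routing the bulk of traffic to the cheapest capable tier shrinks the blast radius of a flood: an attacker must craft genuinely hard-looking prompts to reach the expensive models at all.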