The Hook: “It’s just a dev script.”
We’ve all heard it. “I just need an OpenAI key to test this cron job. It runs once a night.”
Six hours later, that “once a night” cron job was stuck in an infinite retry loop because of a malformed JSON payload. It fired 400,000 requests against GPT-4 before anyone woke up.
If that key went straight to OpenAI, you just burned $12,000 (about $0.03 per request, 400,000 times over).
If that key sat behind an AI Gateway, the budget cap would have tripped at $50.
The Problem: Distributed Keys
In most startups, API keys are treated like candy. They are:
- Hardcoded in .env files.
- Shared in Slack.
- Buried in CI/CD variables.
This creates a distributed vulnerability. You cannot rotate a key without breaking 10 unknown services. You cannot monitor who is spending what.
graph TD
    A[Service A] -->|Key 1| OpenAI
    B[Service B] -->|Key 2| OpenAI
    C[Dev Script] -->|Key 3| OpenAI
    D[Unknown Cron] -->|Key ?| OpenAI
    style D fill:#f00,stroke:#333
The Solution: A Centralized Gateway
We implemented a unified ingress for all LLM traffic. No service talks to OpenAI directly. They talk to llm.internal.deveez.com.
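For a calling service, the migration can be as small as swapping the base URL and the credential. Here is a minimal sketch, assuming the gateway mirrors OpenAI's `/v1/chat/completions` path and accepts a bearer token. The path, the `newChatRequest` helper, and the `vt-example` token are illustrative assumptions, not confirmed details of the gateway:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// gatewayBaseURL is the internal ingress named in the text. The request path
// used below is an assumption: it presumes the gateway mirrors OpenAI's layout.
const gatewayBaseURL = "https://llm.internal.deveez.com"

// newChatRequest builds a request aimed at the gateway instead of the provider.
// virtualToken is a gateway-issued credential (hypothetical), never a provider key.
func newChatRequest(virtualToken string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, gatewayBaseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+virtualToken)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("vt-example", []byte(`{"model":"gpt-4","messages":[]}`))
	if err != nil {
		panic(err)
	}
	// Only the destination is printed; nothing is sent over the network.
	fmt.Println(req.Method, req.URL.Host, req.URL.Path)
}
```

Because the service never holds a provider key, there is nothing for it to leak into a .env file or a Slack thread.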
The Architecture
Instead of managing 50 keys, we manage one master key at the Gateway level, and issue virtual tokens to internal teams.
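The token swap at the heart of this design can be sketched in a few lines. This is an illustration under stated assumptions, not our production code: the in-memory `virtualTokens` table, the token values, and the `resolveTeam` and `rewriteAuth` helpers are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// virtualTokens maps gateway-issued tokens to team IDs. The entries are
// made-up placeholders; a real gateway would load them from a secret store.
var virtualTokens = map[string]string{
	"vt-search-123":  "search",
	"vt-billing-456": "billing",
}

// resolveTeam extracts the bearer token from an Authorization header value
// and looks up the owning team. ok is false for unknown or malformed tokens.
func resolveTeam(authHeader string) (teamID string, ok bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(authHeader, prefix) {
		return "", false
	}
	teamID, ok = virtualTokens[strings.TrimPrefix(authHeader, prefix)]
	return teamID, ok
}

// rewriteAuth returns the Authorization value to send upstream (carrying the
// master key) plus the team ID for internal middleware such as rate limiting.
func rewriteAuth(authHeader, masterKey string) (upstreamAuth, teamID string, ok bool) {
	teamID, ok = resolveTeam(authHeader)
	if !ok {
		return "", "", false
	}
	return "Bearer " + masterKey, teamID, true
}

func main() {
	_, team, ok := rewriteAuth("Bearer vt-search-123", "sk-master-placeholder")
	fmt.Println(team, ok)

	// A token the gateway never issued is rejected before any upstream call.
	_, _, ok = rewriteAuth("Bearer not-a-real-token", "sk-master-placeholder")
	fmt.Println(ok)
}
```

Deriving the team ID from the token, rather than trusting a caller-supplied header, means revoking one team's access is a single map deletion and the master key never leaves the gateway.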
# gateway-config.yaml
policies:
  - name: "daily-budget-cap"
    type: "cost-limit"
    config:
      limit_usd: 50
      period: "daily"
      action: "reject"
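To make the policy concrete, here is a rough sketch of what a `cost-limit` check with `action: "reject"` could look like inside a Go proxy. Several things are assumptions: the `BudgetCap` type is hypothetical, the cap is applied per team, the day resets at UTC midnight, and the cost of each request is estimated up front (real token costs are only known after the provider responds).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// BudgetCap enforces a daily USD ceiling per team, resetting at UTC midnight.
type BudgetCap struct {
	mu       sync.Mutex
	limitUSD float64
	day      string             // UTC date the running totals belong to
	spent    map[string]float64 // team ID -> USD charged so far today
	now      func() time.Time   // clock, replaceable in tests
}

func NewBudgetCap(limitUSD float64) *BudgetCap {
	return &BudgetCap{limitUSD: limitUSD, spent: make(map[string]float64), now: time.Now}
}

// Charge reports whether teamID may spend costUSD. A request that would push
// the team past the cap is rejected and leaves the running total unchanged.
func (b *BudgetCap) Charge(teamID string, costUSD float64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if today := b.now().UTC().Format("2006-01-02"); today != b.day {
		b.day = today
		b.spent = make(map[string]float64) // new day: every team starts at zero
	}
	if b.spent[teamID]+costUSD > b.limitUSD {
		return false
	}
	b.spent[teamID] += costUSD
	return true
}

func main() {
	budget := NewBudgetCap(50) // mirrors limit_usd: 50
	fmt.Println(budget.Charge("dev-scripts", 49.99))
	fmt.Println(budget.Charge("dev-scripts", 0.03)) // would cross $50, so rejected
	fmt.Println(budget.Charge("search", 0.03))      // other teams are unaffected
}
```

With a check like this in the request path, the runaway cron job from the opening stops being served once its team's total reaches the cap, no matter how many retries it fires.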
The Code
Here is how we enforce rate limits in our Golang proxy middleware:
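import "net/http"

// RateLimitMiddleware rejects requests from teams that have exhausted their quota.
// limiter is assumed to be a package-level, per-team rate limiter defined elsewhere.
func RateLimitMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		teamID := r.Header.Get("X-Team-ID")
		if teamID == "" {
			// Otherwise every unidentified caller would share a single bucket.
			http.Error(w, "Missing X-Team-ID header", http.StatusBadRequest)
			return
		}
		if !limiter.Allow(teamID) {
			http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}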
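The middleware leans on a `limiter` with an `Allow(teamID)` method that is not shown here. One plausible shape for it is a per-team token bucket. The sketch below is an assumption about how such a limiter might be built with only the standard library; the `TeamLimiter` type and the rate and burst numbers are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket holds the remaining tokens for one team and when it was last refilled.
type bucket struct {
	tokens float64
	last   time.Time
}

// TeamLimiter is a per-team token bucket: each team may burst up to `burst`
// requests and regains `ratePerSec` tokens per second.
type TeamLimiter struct {
	mu         sync.Mutex
	ratePerSec float64
	burst      float64
	buckets    map[string]*bucket
	now        func() time.Time // clock, replaceable in tests
}

func NewTeamLimiter(ratePerSec, burst float64) *TeamLimiter {
	return &TeamLimiter{ratePerSec: ratePerSec, burst: burst, buckets: make(map[string]*bucket), now: time.Now}
}

// Allow consumes one token for teamID and reports whether the request may proceed.
func (l *TeamLimiter) Allow(teamID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := l.now()
	b, ok := l.buckets[teamID]
	if !ok {
		b = &bucket{tokens: l.burst, last: now} // new teams start with a full bucket
		l.buckets[teamID] = b
	}
	// Refill in proportion to elapsed time, capped at the burst size.
	b.tokens += now.Sub(b.last).Seconds() * l.ratePerSec
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// limiter matches the name the middleware expects; 5 req/s with a burst of 10
// is an arbitrary example, not a recommended setting.
var limiter = NewTeamLimiter(5, 10)

func main() {
	l := NewTeamLimiter(0, 2) // no refill, so the third call must fail
	fmt.Println(l.Allow("search"), l.Allow("search"), l.Allow("search"))
}
```

If pulling in a dependency is acceptable, `golang.org/x/time/rate` provides a maintained token bucket, and keeping one `rate.Limiter` per team in a map gives the same behavior with less code to own.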
The Deveez Stance
Security isn’t about trusting your developers; it’s about protecting them from their own scripts.
A Gateway isn’t “bureaucracy.” It’s a seatbelt. If you are running LLMs in production without a proxy layer, you are one while(true) loop away from bankruptcy.
The Bottom Line
- Revoke local keys immediately.
- Deploy a Gateway (Kong, LiteLLM, or custom).
- Set hard limits per team.
Worried about this exact security flaw? Run our AI Readiness Audit.