The Context (The “Nightmare” State)
For this Global Retail Leader, provisioning a new environment meant navigating a bureaucratic labyrinth.
- Organizational Silos: Separate infrastructure, security, and database teams each owned a piece of spinning up a new cluster.
- Competing Backlogs: Each silo had its own backlog and competing priorities. A single environment request required coordinating multiple tickets across disparate teams.
- Massive Delays: What should have been a straightforward provisioning process typically meant waiting weeks, sometimes months, for a fully fledged cluster.
Developers were stalled before they could even write their first line of code.
The Architecture
We replaced the fragmented, ticket-driven process with a centralized Internal Developer Platform (IDP) built on open-source, cloud-native tooling, with the CNCF projects Crossplane and Flux at its core.
Before:
- Manual tickets filed with separate infrastructure and security teams.
- Inconsistent cluster configurations.
- Slow provisioning for Day-2 components (databases, buckets, permissions).
After (The Golden Path):
- Centralized IDP Core: A unified infrastructure platform orchestrating components across both cloud and on-premises environments.
- GitOps Delivery: Flux keeps cluster state version-controlled, auditable, and automatically reconciled against what is declared in Git.
- Seamless Integration: Developers request resources through their existing company portal, while the infrastructure team manages the entire fleet via a custom-built operational UI.
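The reconciliation idea behind the GitOps bullet can be sketched in a few lines. This is a conceptual illustration only, not Flux's implementation: the `plan_reconcile` function, the `Action` type, and the resource names are all hypothetical. It compares the desired state (what Git declares) with the actual state (what the cluster reports) and lists the actions needed to converge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    name: str  # hypothetical resource name


def plan_reconcile(desired: dict, actual: dict) -> list:
    """Return the actions that bring `actual` in line with `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    # Anything running that Git no longer declares gets pruned.
    for name in sorted(actual):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


# Hypothetical example data: Git declares two resources, the cluster has drifted.
desired = {"team-a-db": {"size": "small"}, "team-a-bucket": {"versioning": True}}
actual = {"team-a-db": {"size": "large"}, "old-bucket": {"versioning": False}}

for action in plan_reconcile(desired, actual):
    print(action.verb, action.name)
```

Running the plan again once the actions have been applied yields an empty list, which is the property that makes a reconciled state auditable: Git and the cluster agree.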
The Implementation
We architected the solution on open-source, cloud-native tooling, anchored by CNCF projects, to provide scale and flexibility.
1. The Stack
- Orchestration: Crossplane for composing and provisioning cloud infrastructure through Kubernetes resources.
- GitOps: Flux for continuous delivery and state reconciliation.
- Cluster Provisioning: The platform provisions dedicated managed clusters in the flavor a team needs (OCP, EKS, AKS) directly, and offers vCluster for lightweight, isolated virtual Kubernetes clusters when a dedicated managed cluster isn’t necessary.
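To make the GitOps layer of this stack more concrete, here is a minimal sketch that builds a Flux `Kustomization` object for one environment. The `apiVersion`, `kind`, and `spec` fields follow Flux's Kustomization API; the helper name `flux_kustomization`, the repository name, the path, and the interval are assumptions for illustration and are not taken from the client's setup.

```python
import json


def flux_kustomization(name: str, path: str, repo: str, interval: str = "10m") -> dict:
    """Build a Flux Kustomization that reconciles one directory of a Git repository."""
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": interval,  # how often Flux re-checks the desired state
            "path": path,          # directory in the repo holding the manifests
            "prune": True,         # delete resources that were removed from Git
            "sourceRef": {"kind": "GitRepository", "name": repo},
        },
    }


# Hypothetical names: one environment directory in a platform config repo.
manifest = flux_kustomization("team-a-env", "./environments/team-a", "platform-config")
print(json.dumps(manifest, indent=2))
```

The output is JSON, which is also valid YAML, so it could be committed as a manifest or converted to the YAML style a team prefers.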
2. The Hack: Crossplane Sharding & vCluster
The biggest technical hurdle was managing the sheer volume of resources without overwhelming the control plane.
To solve this, we implemented Crossplane Sharding: we distributed the load across multiple Crossplane instances so that no single instance has to reconcile every resource. That let us provision and manage thousands of individual components quickly.
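One common way to spread resources across instances is a stable hash of the resource identifier. The sketch below illustrates that general technique; it is an assumption, since the text does not say how resources were assigned to Crossplane instances, and `shard_for`, the shard count of 4, and the resource IDs are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(resource_id: str, shard_count: int) -> int:
    """Map a resource ID to a shard index that stays the same across runs."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # hashlib is used instead of the built-in hash(), which is randomized per process.
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical workload: 3000 resources spread over 4 instances.
resource_ids = [f"env-{i}/database" for i in range(3000)]
load = Counter(shard_for(rid, 4) for rid in resource_ids)
print(sorted(load.items()))
```

A plain modulo scheme like this reassigns most resources when the shard count changes; consistent hashing or an explicit assignment label would avoid that, at the cost of more bookkeeping.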
Developers can still request a dedicated managed cluster (OCP, EKS, or AKS) when their workloads demand it. To scale beyond that, we paired the setup with vCluster: for teams that only need a quick, temporary environment, vCluster spins up cheap, isolated virtual clusters in a fraction of the time, without the overhead of provisioning a full cluster.
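The choice between the two paths can be expressed as a small routing rule. This is a sketch under assumptions: the `EnvironmentRequest` fields and the `choose_cluster` function are hypothetical, and the real portal presumably captures more than a single flag.

```python
from dataclasses import dataclass

MANAGED_FLAVORS = {"OCP", "EKS", "AKS"}  # the flavors named in the text


@dataclass(frozen=True)
class EnvironmentRequest:
    team: str
    needs_dedicated_cluster: bool = False  # hypothetical flag set by the requester
    flavor: str = "EKS"                    # only consulted for dedicated clusters


def choose_cluster(request: EnvironmentRequest) -> str:
    """Default to a virtual cluster; use a managed cluster only when asked for."""
    if not request.needs_dedicated_cluster:
        return "vcluster"
    if request.flavor not in MANAGED_FLAVORS:
        raise ValueError(f"unsupported flavor: {request.flavor}")
    return f"managed:{request.flavor}"


print(choose_cluster(EnvironmentRequest(team="checkout")))  # vcluster
print(choose_cluster(EnvironmentRequest(team="payments", needs_dedicated_cluster=True, flavor="OCP")))  # managed:OCP
```

Making the virtual cluster the default keeps the cheap, fast path as the one most requests take, with the heavier option available on demand.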
The IDP doesn’t just provision clusters. It also handles the smaller “Day-2” components automatically: spinning up databases, creating buckets, and wiring up permissions.
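The Day-2 components depend on each other, which is part of why automating them pays off. The sketch below models one plausible dependency graph and derives a valid provisioning order with Python's standard `graphlib`. The dependencies shown (permissions after database and bucket, both after the cluster) are an assumption for illustration, and this is not a description of how Crossplane itself resolves dependencies.

```python
from graphlib import TopologicalSorter


def day2_plan(env: str) -> list:
    """Return the components of one environment in a dependency-safe order."""
    cluster = f"{env}/cluster"
    database = f"{env}/database"
    bucket = f"{env}/bucket"
    permissions = f"{env}/permissions"
    # Each key maps to the set of components that must exist first (assumed).
    graph = {
        cluster: set(),
        database: {cluster},
        bucket: {cluster},
        permissions: {database, bucket},
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical environment name.
for step in day2_plan("team-a"):
    print(step)
```

The cluster always comes first and the permissions last; the database and bucket have no dependency on each other, so they could be provisioned in parallel.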
The Results
The platform now provisions 2 to 3 new clusters every week, and each environment supports dozens of active developers.
| Metric | Before | After |
|---|---|---|
| Provisioning Time | Weeks to Months | < 4 Hours |
| Developer Unblocking | Stalled in Backlogs | Self-service via existing portal |
| Infrastructure Management | Scattered Tickets | Unified Operational UI |
| Scale Mechanism | Manual tracking | Crossplane Sharding + vCluster |
Conclusion
By standardizing and automating the core infrastructure, we eliminated the silos that were throttling developer velocity. Managing infrastructure as code with CNCF tooling such as Crossplane and Flux turned a process that took weeks to months into an automated workflow that completes in under four hours.