DigitalOcean has announced that its Inference Cloud Platform, powered by AMD Instinct GPUs, is delivering double the production inference throughput Character.ai saw before migrating, while cutting cost per token by 50%. The milestone results from deep platform-level optimization, combining hardware-aware scheduling with tuned inference runtimes to handle the AI entertainment platform's massive, latency-sensitive workload of more than a billion daily queries.
Key highlights:
- DigitalOcean's Inference Cloud delivers 2X production inference throughput for Character.ai.
- The platform, powered by AMD Instinct GPUs, reduced cost per token by 50%.
- The gains were achieved through deep collaboration on hardware-aware scheduling and optimized inference runtimes.
- Character.ai handles over a billion queries per day with strict latency requirements.
- The optimization balances latency, throughput, and concurrency under real production constraints.
- The deployment reflects a shift toward prioritizing predictable performance and cost efficiency over raw hardware specs.
Character.ai operates one of the most demanding production inference workloads in the industry, requiring high throughput and low latency across more than a billion daily queries, an average of over 11,000 requests per second before peak-hour spikes. By migrating to DigitalOcean's Inference Cloud, the company achieved significantly higher sustained request throughput while meeting its rigorous latency targets. The result was a 50% reduction in cost per token and expanded usable capacity for its end users, directly supporting platform growth.
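The two headline numbers are arithmetically linked: at a fixed fleet cost, cost per token is inversely proportional to sustained throughput, so doubling throughput halves cost per token. A minimal back-of-envelope sketch follows; the dollar and throughput figures in it are illustrative assumptions, not DigitalOcean pricing or Character.ai's actual numbers.

```python
# Illustrative cost-per-token arithmetic. The node cost and throughput
# values below are assumptions for demonstration, not real pricing data.
def cost_per_million_tokens(hourly_node_cost_usd: float,
                            tokens_per_second: float) -> float:
    """Cost per million tokens for a node running at a sustained rate."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_node_cost_usd / tokens_per_hour * 1_000_000

# Same node cost, doubled sustained throughput:
baseline = cost_per_million_tokens(hourly_node_cost_usd=20.0, tokens_per_second=5_000)
doubled = cost_per_million_tokens(hourly_node_cost_usd=20.0, tokens_per_second=10_000)
print(f"baseline: ${baseline:.2f}/M tokens; after 2X throughput: ${doubled:.2f}/M tokens")
# Doubling throughput at a fixed node cost yields exactly the 50% cost-per-token reduction.
```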
The performance gains were achieved through a tight collaboration between DigitalOcean, Character.ai, and AMD. Rather than treating GPUs as generic infrastructure, DigitalOcean's platform integrates hardware-aware scheduling and optimized inference runtimes. The teams tuned AMD's ROCm software stack with vLLM and AITER (AMD's AI Tensor Engine for ROCm, a library of optimized inference kernels) for Character.ai's specific transformer workloads on AMD Instinct MI300X and MI325X GPUs, extracting higher sustained performance per node.
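For context on the software stack, vLLM is an open-source inference engine that runs on AMD Instinct GPUs through ROCm builds, with AITER supplying optimized kernels beneath it. The sketch below shows only generic vLLM usage; the model name, sampling settings, and parallelism degree are placeholders, not Character.ai's configuration.

```python
# Minimal vLLM serving sketch (requires a ROCm build of vLLM on AMD GPUs).
# The model and parameters below are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder, not Character.ai's model
    tensor_parallel_size=1,  # raise to shard across the GPUs in an MI300X/MI325X node
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Write a one-line greeting for a chat character."], params)
print(outputs[0].outputs[0].text)
```

In production, continuous batching in engines like vLLM is what lets operators trade a small amount of per-request latency for much higher aggregate token throughput, the latency-throughput-concurrency balance described above.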
This deployment underscores a broader shift in how scalable AI infrastructure is evaluated. As inference workloads grow, priorities are moving from raw hardware availability to predictable performance, operational simplicity, and total cost efficiency under real production constraints. DigitalOcean's platform is designed for operating AI applications in production, providing a unified hardware and software stack that delivers cost efficiency, observability, and operational simplicity at scale.
The results demonstrate the impact of deep technical collaboration, where platform and silicon teams work together to solve specific production challenges, enabling builders to run large-scale, latency-sensitive AI applications more economically and reliably.
About DigitalOcean
DigitalOcean is an inference cloud platform that helps AI and digital native businesses build, run, and scale intelligent applications with speed, simplicity, and predictable economics. The platform combines production-ready GPU infrastructure, a full-stack cloud, model-first inference workflows, and an agentic experience layer to reduce operational complexity and accelerate time to production. More than 640,000 customers trust DigitalOcean to deliver the cloud and AI infrastructure they need to build and grow.