Java at 1 Million TPS: What the Architecture Actually Looks Like

April 02, 2026 · By Satyendra Singh
[Hero graphic: terminal mockup — arch-analysis.sh]
sattu@prod-cluster:~/backend/scale $ ./analyze --target=java --tps=1000000
> loading architecture layers...
RUNTIME      JDK 21 + virtual threads
CACHE HIT    95.8%
FRAMEWORK    Spring Boot 3.x / Quarkus
INSTANCES    50 pods active
EDGE         Cloudflare absorbing 70%
QUEUE        Kafka lag: 0ms

How a million-TPS Java architecture actually looks. The framework is the least interesting part. What you build around it is everything.

1M TPS target · 16.6K req/sec · 8 min read · sattu.in / engineering

Between slide decks promising "infinite scale" and the quiet panic of a 3 AM PagerDuty alert, there is a question worth answering: what does a Java system handling one million transactions per second actually look like in production?

No single Java framework gets you there on its own. Spring Boot did not build Twitter. Quarkus did not keep Grab alive during peak monsoon hours. The layers above and below your JVM process did. And whether Java is even the right language for your hot path is a question worth asking before you have 50 pods and a runbook.

16.6K requests / second · 95%+ cache hit rate needed · 50+ service instances

The layered architecture

Every system at this scale is built in rings. Each ring absorbs what it can and passes only what it must to the next layer down.

01 · Edge — CDN absorbs ~70% of traffic. Static assets, cached API responses, geographic routing. This traffic never reaches your JVM; Cloudflare handles it at the edge, often within 30ms globally.
(Cloudflare · AWS CloudFront · Akamai)

02 · Ingress — load balancer + API gateway. L4 balancing distributes TCP connections. The gateway handles auth, rate limiting, and routing before a single byte touches service code.
(AWS ALB · NGINX · Spring Cloud Gateway · Kong)

03 · Compute — Spring Boot / Quarkus instances (30–50 pods). Reactive I/O (WebFlux + Netty) or JDK 21 virtual threads; both solve the one-thread-per-request bottleneck Tomcat hits.
(Spring WebFlux · Quarkus Reactive · JDK 21 Virtual Threads · HikariCP)

04 · Queue — async offload via message broker. Anything that does not need a synchronous response goes on a queue: orders, notifications, and audit logs drain asynchronously, protecting the database from write spikes.
(Apache Kafka · RabbitMQ · AWS SQS)

05 · Cache — Redis cluster, load-bearing at this scale. At 95%+ cache hit rates, your database handles only 5% of reads. Redis is not an optimisation here; it is structural.
(Redis Cluster · Memcached · Spring Cache)

06 · Storage — read replicas + write sharding. One primary for writes, three or more replicas for reads, and indexes on every queried column. This sounds obvious until you are staring at a full table scan at 3 AM.
(PostgreSQL · MySQL · Cassandra · PlanetScale)
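The cache and storage rings combine into the cache-aside pattern: read from the cache first, fall back to the database on a miss, and populate the cache on the way out. A minimal sketch of the read path, using an in-memory ConcurrentHashMap as a stand-in for Redis and a function as a stand-in for a replica read (all names here are illustrative, not a real client API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside read path: hit the cache first, fall back to storage on a miss,
// then populate the cache so the next read for this key stays off the database.
public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final Function<K, V> storage;                      // stand-in for a read replica

    public CacheAside(Function<K, V> storage) {
        this.storage = storage;
    }

    public V get(K key) {
        // computeIfAbsent only invokes storage on a miss
        return cache.computeIfAbsent(key, storage);
    }

    public boolean isCached(K key) {
        return cache.containsKey(key);
    }
}
```

At a 95% hit rate, only one read in twenty takes the storage branch. Against real Redis this becomes a GET, a fallback query, and a SET with a TTL so entries expire rather than accumulate.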

"The framework is your engine. The architecture around it determines whether you survive the load, and whether your on-call rotation survives the night."

A lesson learned in production, not in tutorials

The thread model question

Traditional Spring Boot on Tomcat gives you one OS thread per request. By Little's law (requests in flight = arrival rate × average latency), a sustained 1M requests per second at 60ms average latency means roughly 60,000 concurrent threads. The result: memory exhaustion, context-switching overhead, and a performance cliff you hit hard and without much warning.
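The arithmetic behind that thread count is Little's law: requests in flight equal the arrival rate times the average latency. A quick check:

```java
// Little's law: L = lambda * W
// (requests in flight = arrival rate * average latency)
public class LittleLaw {
    public static long inFlight(long requestsPerSecond, double avgLatencySeconds) {
        return Math.round(requestsPerSecond * avgLatencySeconds);
    }

    public static void main(String[] args) {
        // 1M req/sec at 60ms average latency: ~60,000 requests in flight,
        // which on Tomcat's model means ~60,000 OS threads.
        System.out.println(inFlight(1_000_000, 0.060)); // prints 60000
    }
}
```

The same formula explains why the options below work: neither reduces the requests in flight, they just stop binding each one to an OS thread.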

Option A: Reactive with WebFlux

Spring WebFlux on Netty uses a small fixed pool of event-loop threads, typically 2 x CPU cores, and handles concurrency through non-blocking I/O. The catch is the mental overhead — reactive code is hard to write, harder to debug, and harder to hire for.

@GetMapping("/product/{id}")
public Mono<Product> getProduct(@PathVariable Long id) {
    return cacheService.get(id)
        .switchIfEmpty(productRepo.findById(id)
            // assuming cacheService.set returns a Mono: chain it with
            // flatMap + thenReturn, since doOnNext would never subscribe it
            .flatMap(p -> cacheService.set(id, p).thenReturn(p)));
    // non-blocking all the way down
}

Option B: Virtual threads on JDK 21

Project Loom lets you write normal synchronous code and get non-blocking behaviour underneath. The JVM multiplexes millions of lightweight virtual threads onto a small pool of OS carrier threads. For most teams today this is the right call. Simpler code, comparable throughput, no reactive pyramid of doom to maintain.

# Spring Boot 3.2+ - one line to unlock it
spring.threads.virtual.enabled=true

# Tune HikariCP separately - virtual threads won't save a saturated pool
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.minimum-idle=10
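Outside Spring, the same model is available directly through the JDK. A small sketch using the standard java.util.concurrent API: thousands of tasks, each blocking on simulated I/O, multiplexed onto a handful of carrier threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Virtual threads (JDK 21): plain blocking code, one virtual thread per task,
// multiplexed onto a small pool of OS carrier threads.
public class VirtualThreadsDemo {
    public static int runBlockingTasks(int n) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // simulated blocking I/O call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // try-with-resources: close() waits for all submitted tasks
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runBlockingTasks(10_000)); // prints 10000
    }
}
```

The same code with platform threads would need 10,000 OS threads to run those sleeps concurrently; here they cost roughly a stack frame each.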

What actually breaks first

Connection pool exhaustion. HikariCP defaults are sized for modest load. A misconfigured pool becomes a synchronised chokepoint fast. Tune maximumPoolSize and watch pool metrics before go-live, not after.
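HikariCP's own pool-sizing guidance gives a starting point, not an answer: connections = (core count × 2) + effective spindle count, then adjust from pool metrics under real load. A sketch of that baseline:

```java
// Baseline formula from HikariCP's "About Pool Sizing" guidance:
// connections = (core_count * 2) + effective_spindle_count.
// A starting point to measure against, not a final answer — bigger is
// often worse, because excess connections just queue inside the database.
public class PoolSizing {
    public static int baseline(int coreCount, int effectiveSpindles) {
        return (coreCount * 2) + effectiveSpindles;
    }

    public static void main(String[] args) {
        // e.g. an 8-core database host backed by a single SSD volume
        System.out.println(baseline(8, 1)); // prints 17
    }
}
```

Note how far 17 is from the 50 in the properties above: at this traffic level you tune from measurement, and the metric to watch is threads waiting on the pool, not pool size itself.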

GC stop-the-world pauses. A 200ms pause at 16K req/sec means thousands of requests timing out at once. Use G1GC or ZGC. Java 21's generational ZGC is currently the best choice for latency-sensitive workloads.
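On JDK 21, generational ZGC sits behind a flag pair (on JDK 23+ the generational mode is the ZGC default; verify against your exact JDK version):

```
# JVM flags, JDK 21: enable ZGC in generational mode
-XX:+UseZGC -XX:+ZGenerational
```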

Slow queries without indexes. An un-indexed column that is fine in dev becomes a full table scan in production. Run EXPLAIN ANALYZE before you ship, not after your first incident.
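The check is cheap to run before shipping. A PostgreSQL sketch (table and column names are illustrative):

```sql
-- Illustrative: inspect the plan for a query the hot path will run
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- If the plan shows "Seq Scan on orders", add the missing index
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```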

Debug logs left on in prod. DEBUG logging at 16K req/sec generates gigabytes per minute. Your logging infrastructure falls over before your service does. Async appenders, structured JSON, log sampling. Non-negotiable at this traffic level.
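In Logback terms, that means an async appender wrapping the real one, set to drop rather than block under pressure. A minimal sketch (appender names and the JSON encoder choice are illustrative; the encoder shown is from the logstash-logback-encoder library):

```xml
<!-- Logback: async appender so logging never blocks a request thread -->
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>app.log</file>
    <!-- structured JSON output, one event per line -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <queueSize>8192</queueSize>
    <!-- drop events when the queue is full rather than block the caller -->
    <neverBlock>true</neverBlock>
    <appender-ref ref="FILE"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>
```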

What else can handle this scale, beyond Java

Java is excellent at this scale. It is also not the only answer. A few other languages are worth knowing about, and some of them beat Java in specific situations.

Go — Throughput comparable to Java; goroutines are lighter than virtual threads. Sweet spot: API gateways, proxies, infra tooling (Docker, Kubernetes, and Prometheus are all Go). Trade-off: the ecosystem is thinner than Java's for enterprise use cases.

Rust — Fastest in class: no GC, zero-cost abstractions; Actix-Web benchmarks are absurd. Sweet spot: latency-critical hot paths, game servers, anything where p99 is a contract. Trade-off: the borrow checker's learning curve is steep, and hire-ability is a genuine concern.

Node.js — Good at I/O-bound work, poor at CPU-bound; the event loop limits raw throughput. Sweet spot: real-time apps, BFF layers, streaming APIs. Trade-off: CPU-intensive work blocks the loop, and worker threads feel bolted on.

Python — FastAPI + uvicorn is capable for I/O-bound work, but the GIL is still the ceiling. Sweet spot: ML inference, data pipelines, internal tooling. Trade-off: nobody at this scale runs Python on the hot path; they put Java or Go in front.

Elixir — Millions of lightweight BEAM processes; WhatsApp handled 2M connections on a single server. Sweet spot: long-lived connections, chat, real-time multiplayer, telemetry pipelines. Trade-off: the hiring pool is small, and it is not the right ecosystem for standard CRUD at scale.

.NET — C# with ASP.NET Core is legitimately competitive, often faster than Spring Boot in benchmarks. Sweet spot: Windows-native shops, enterprise software, gaming backends. Trade-off: the "Java is better" crowd and the ".NET is better" crowd are both mostly wrong; pick what your team knows.

Which one to actually pick

If your team knows Java well, use Java with virtual threads. The ecosystem, tooling, and hiring pool are unmatched. If you are building a proxy, a CLI tool, or anything infrastructure-adjacent, Go is the pragmatic call. If you have a p99 latency SLA and a team with patience, Rust is the right answer. If you need millions of concurrent long-lived connections for chat, gaming, or real-time data, look at Elixir before anything else.

At genuine 1M TPS scale, the language at the hot path matters less than most engineers like to debate. The CDN absorption rate, the cache hit ratio, the Kafka consumer group lag, the HikariCP pool size — those are the numbers that decide whether you page or sleep. Pick a language your team is productive in, tune the layers above and below, and put your architectural energy where the load actually lives.

"Architecture scales. Clever framework configuration does not. And neither does arguing about languages at 2 AM."

Earned opinion

One last thing on observability

At this scale you cannot debug by reading logs. By the time you find the relevant entry, the incident is over and your P0 bridge is still open. You need Prometheus metrics, distributed traces via OpenTelemetry, and dashboards that exist before the problem arrives. Five numbers to watch: p99 latency per endpoint, error rate, HikariCP active connections, Redis hit ratio, GC pause duration. If those are green, everything else is usually fine. If one spikes, you know where to look.
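With Spring Boot's default Micrometer + Prometheus instrumentation, the first two of those five numbers map to queries along these lines (metric names assume the stock http_server_requests timer with percentile histograms enabled; adjust to your stack):

```
# p99 latency per endpoint
histogram_quantile(0.99,
  sum by (le, uri) (rate(http_server_requests_seconds_bucket[5m])))

# error rate: 5xx responses as a fraction of all responses
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
  / sum(rate(http_server_requests_seconds_count[5m]))
```

The point is less the exact PromQL than that these queries are saved in a dashboard before launch, not composed during an incident.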
