In Part 1, we measured what web backend frameworks cost before serving a single request — idle memory, startup time, and Docker image sizes. We found a 30-100x spread between the lightest (Rust at 3 MB) and heaviest (Spring Boot at 500 MB) frameworks. In Part 3, we'll cover how to choose the right backend for your workload, team, and scale.
But idle cost is only half the story. Some frameworks with high startup costs become remarkably efficient once they’re running. Others with tiny idle footprints hit scaling walls under load. The question every developer and architect needs answered is: after your application is warmed up and running, how does each language actually perform?
We compiled data from TechEmpower Rounds 22 and 23, the Sharkbench Web Framework Benchmark (August 2025), and 40+ independent benchmark studies to find out.
Warmed-Up Throughput: The Steady-State Rankings
First, the raw numbers. This table shows throughput after warmup, measured on the same hardware (Sharkbench: Ryzen 7 7800X3D, Docker, 1 CPU core equivalent) running concurrent HTTP requests with JSON serialization and I/O operations.
| Language/Framework | Requests/sec | Median Latency | Memory | Stability* |
|---|---|---|---|---|
| Java Vert.x (Temurin JVM) | 23,116 | 1.3 ms | 484 MB | — |
| Bun.serve | 22,303 | 1.2 ms | 24.5 MB | 10.4% |
| Rust Actix | 21,965 | 1.4 ms | 16.6 MB | 66.6% |
| Rust Axum | 21,030 | 1.6 ms | 8.5 MB | 72.0% |
| Java Vert.x (Semeru) | 19,917 | 1.5 ms | 137 MB | — |
| C# ASP.NET Core | 14,707 | 1.2 ms | 136.5 MB | 2.6% |
| Node.js Fastify | 9,340 | 3.4 ms | 57 MB | 63.2% |
| Java Spring WebFlux (Semeru) | 7,051 | 1.2 ms | 130 MB | — |
| Java Quarkus Reactive | 6,473 | 0.7 ms | 341 MB | — |
| Node.js Express | 5,766 | 5.5 ms | 82.5 MB | 64.5% |
| Go FastHTTP | 5,567 | 0.7 ms | 13.4 MB | 0.8% |
| Elixir Phoenix (Bandit) | 4,375 | 7.3 ms | 145.5 MB | 84.9% |
| Go Gin | 3,546 | 1.0 ms | 16.7 MB | 1.1% |
| Ruby Rails 8 (YJIT) | 2,340 | 1.2 ms | 125 MB | 1.0% |
| Java Spring MVC (Semeru) | 2,305 | 1.1 ms | 157.5 MB | — |
| Python FastAPI (Uvicorn) | 1,185 | 21.0 ms | 41.2 MB | 21.2% |
| Java Spring MVC (Temurin) | 1,105 | 1.7 ms | 597 MB | — |
| Python Flask (Gunicorn) | 1,092 | 7.7 ms | 90.3 MB | 9.2% |
| Python Django (Gunicorn) | 950 | 8.8 ms | 130 MB | 10.3% |
| PHP Symfony 6.4 | 941 | 8.7 ms | 55.4 MB | 10.3% |
| PHP Laravel 11 | 299 | 101.7 ms | 84.2 MB | 56.5% |
* Stability = the ratio of median latency to P99 latency, expressed as a percentage. Higher means more predictable response times.
Source: Sharkbench Web Framework Benchmark, August 2025
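To make the stability column concrete, here is a small Python sketch of how such a score can be computed from raw latency samples. The nearest-rank percentile used here is our assumption, not Sharkbench's published method:

```python
import statistics

def stability_score(latencies_ms):
    """Stability as median latency / P99 latency, as a percentage.

    A score near 100% means the tail looks like the median (predictable);
    a low score means rare requests are far slower than typical ones.
    """
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank P99 (an assumption; real benchmarks may interpolate)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return 100 * median / p99

# 100 samples each, identical medians (1.4 ms), different tails:
base = [1.2] * 20 + [1.3] * 20 + [1.4] * 38 + [1.5] * 20
tight = base + [2.0, 2.0]    # worst requests only ~1.4x the median
spiky = base + [12.0, 12.0]  # rare requests ~8.6x the median

print(round(stability_score(tight), 1))  # 70.0: fairly predictable
print(round(stability_score(spiky), 1))  # 11.7: a Bun-like long tail
```

Same median, wildly different user experience — which is exactly what the stability column captures.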
Several things jump out of this data.
Java Vert.x tops the chart — not Rust. At 23,116 req/s, Java’s reactive framework actually beats Rust Actix (21,965 req/s) on this benchmark. But it does so at a cost of 484 MB of memory, a 57x memory penalty compared to Rust Axum’s 8.5 MB for similar throughput.
Go looks surprisingly slow at 3,546-5,567 req/s. This is because Sharkbench limits each framework to 1 CPU core in Docker. Go’s goroutine model is designed for multi-core scaling — its per-core numbers are modest, but they multiply roughly linearly with available cores. On an 8-core machine, those numbers would be 3-6x higher.
Framework choice within a language matters enormously. Java Vert.x (23,116 req/s) versus Spring MVC (1,105-2,305 req/s) is a 10-20x difference within the same language. Node.js Fastify (9,340 req/s) versus Express (5,766 req/s) is a 1.6x gap. The framework you pick can matter as much as the language.
Bun has impressive raw speed but terrible stability — a 10.4% stability score means its P99 latency is nearly 10x its median latency. Fast on average, but unpredictable on the tail.
The JVM Warmup Question: Does It Pay Off?
This is the question Java developers ask most: after the JVM warms up and the JIT compiler kicks in, does Java actually become faster than Go, Rust, or C#?
Cold vs. Warmed Performance
The JVM doesn’t just run code — it learns from it. Through tiered JIT compilation, the C2 compiler identifies “hot” code paths and compiles them to highly optimized native machine code with aggressive inlining, escape analysis, and speculative optimizations based on runtime data.
| Metric | Cold JVM | Warmed JVM | Improvement |
|---|---|---|---|
| First request latency | 50-500 ms | 1-5 ms | 10-100x |
| Throughput (first 10s) | 20-40% of peak | 100% of peak | 2.5-5x |
| Time to peak | Baseline | 15-45 seconds typical | — |
| P99 latency | Highly variable | Stabilizes within 2-5x of median | Dramatic |
The warmup effect is real and substantial. A payment service documented by Azul reduced time-to-peak-performance from 45 seconds to 12 seconds by pre-compiling 20 key methods. Teads, an ad-tech company, implemented a 2 minute 40 second warmup period before serving live traffic, which eliminated timeout spikes entirely.
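Teads' approach generalizes: exercise your hot endpoints synthetically before the instance starts taking live traffic. A minimal Python sketch of the idea — the function and parameter names here are ours, not from Teads' implementation:

```python
import time

def warm_up(handler, hot_paths, iterations=1000, budget_s=30.0):
    """Exercise hot code paths before registering with the load balancer.

    `handler` is the request entry point; `hot_paths` are the routes whose
    code you want JIT-compiled and caches filled before live traffic.
    """
    deadline = time.monotonic() + budget_s
    rounds_done = 0
    for _ in range(iterations):
        if time.monotonic() > deadline:
            break  # never delay readiness past the warmup budget
        for path in hot_paths:
            handler(path)
        rounds_done += 1
    return rounds_done

# Toy handler standing in for a real application
calls = []
rounds = warm_up(calls.append, ["/checkout", "/cart"], iterations=5)
print(rounds, len(calls))  # 5 warmup rounds, 10 synthetic requests
```

The key design point is the budget: warmup trades startup delay for steady-state latency, so cap it and report how much you actually completed.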
HotSpot JVM vs. GraalVM Native Image
A Spring PetClinic benchmark by Vincenzo Racca measured the trade-off directly:
| Metric | HotSpot JVM (JIT) | GraalVM Native Image (AOT) | Difference |
|---|---|---|---|
| Startup time | 7.18 seconds | 0.22 seconds | Native wins by 33x |
| Memory (RSS) | 1,751 MB | 694 MB | Native uses 40% as much |
| Peak throughput | 12,800 req/s | 10,249 req/s | JVM wins by 25% |
The JIT advantage is genuine but modest: 25% more throughput in exchange for 2.5x more memory and 33x slower startup. Whether that trade-off makes sense depends entirely on your deployment model.
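One way to weigh that trade-off is throughput per megabyte of memory, computed from Racca's numbers above:

```python
# Peak throughput and resident memory from Racca's Spring PetClinic runs
jit = {"rps": 12_800, "mem_mb": 1_751}   # HotSpot JVM, after warmup
aot = {"rps": 10_249, "mem_mb": 694}     # GraalVM Native Image

jit_eff = jit["rps"] / jit["mem_mb"]     # req/s delivered per MB of RSS
aot_eff = aot["rps"] / aot["mem_mb"]
print(round(jit_eff, 1), round(aot_eff, 1))  # ~7.3 vs ~14.8 req/s per MB
```

By that measure Native Image is roughly twice as efficient per megabyte, even while losing on absolute throughput — which is why it tends to win in memory-constrained, horizontally scaled deployments.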
At higher concurrency levels, the picture gets more interesting. The same benchmark at 200-300 concurrent users showed GraalVM Native Image actually edging out HotSpot JVM on throughput, likely because its lower memory footprint reduced GC pressure under load.
Does Warmed Java Beat Go or Rust?
No. The JVM warmup investment narrows the gap substantially but does not close it.
| Scenario | Java (warmed) | Go | Rust | C# (.NET 8) |
|---|---|---|---|---|
| JSON serialization | 50,000-100,000 | 80,000-150,000 | 500,000-1,000,000+ | 100,000-300,000 |
| Simple REST endpoint | 50,000-100,000 | 80,000-150,000 | 150,000-500,000 | 100,000-250,000 |
| DB-backed API (1 query) | 30,000-60,000 | 40,000-80,000 | 60,000-120,000 | 40,000-80,000 |
| P50 latency | 1-3 ms | 1-3 ms | 1-3 ms | 1-2 ms |
| P99 latency | 5-50 ms | 5-15 ms | 5-15 ms | 5-30 ms |
Values are requests/second and represent ranges across multiple benchmark sources. All measurements after warmup on multi-core hardware. Sources: TechEmpower R23, Sharkbench, index.dev: Java vs Go vs Rust Comparison
After warmup, Java reaches 60-80% of Go’s throughput and 40-60% of Rust’s for HTTP workloads. Java Vert.x and other reactive frameworks can approach or match Go and C# on median latency, but P99 latency remains significantly worse due to garbage collection pauses.
Where Java is competitive after warmup: mature thread pool management, excellent connection pooling, and the JIT’s ability to optimize hot paths based on actual runtime behavior — optimizations that ahead-of-time compilers can’t make.
Where Java still loses: raw throughput, P99 tail latency, and memory consumption per unit of throughput (10-30x more than Rust/Go).
Performance at Scale: 100, 1,000, and 10,000 Connections
Abstract benchmarks are useful, but the question that matters in production is: how do these frameworks perform at different levels of real concurrency?
100 Concurrent Connections (Low Load)
At low concurrency, most languages perform well. The differences are smallest here.
| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 22,000-36,000 | 1-3 ms | 5-10 ms |
| Go net/http | 18,000-30,000 | 1-3 ms | 5-10 ms |
| C# ASP.NET Core | 16,000-27,000 | 1-3 ms | 5-15 ms |
| Java Vert.x (warmed) | 15,000-25,000 | 1-3 ms | 5-20 ms |
| Java Spring Boot (warmed) | 8,000-15,000 | 2-5 ms | 10-30 ms |
| Node.js Fastify | 8,000-13,000 | 3-5 ms | 10-30 ms |
| Elixir Phoenix | 4,000-8,000 | 5-10 ms | 15-40 ms |
| Ruby Rails (YJIT) | 2,000-4,500 | 5-15 ms | 30-80 ms |
| Python FastAPI | 1,000-3,000 | 15-30 ms | 50-150 ms |
| PHP Laravel | 300-1,000 | 50-100 ms | 200-400 ms |
Sources: TechEmpower R23, Sharkbench, Travis Luong: FastAPI vs Fastify vs Spring Boot vs Gin
1,000 Concurrent Connections (Medium Load)
Performance starts to differentiate. Languages with efficient concurrency models pull ahead.
| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 30,000-50,000 | 3-8 ms | 10-20 ms |
| Go net/http | 25,000-45,000 | 3-10 ms | 10-20 ms |
| C# ASP.NET Core | 20,000-40,000 | 5-12 ms | 15-40 ms |
| Java Vert.x (warmed) | 20,000-35,000 | 5-15 ms | 20-60 ms |
| Java Spring Boot (warmed) | 10,000-20,000 | 10-25 ms | 30-100 ms |
| Node.js Fastify | 8,000-12,000 | 10-25 ms | 30-80 ms |
| Go Gin | 8,000-15,000 | 5-15 ms | 15-30 ms |
| Elixir Phoenix | 4,000-7,000 | 10-20 ms | 30-60 ms |
| Python FastAPI (multi-worker) | 2,000-5,000 | 30-60 ms | 100-300 ms |
| Ruby Rails (YJIT) | 2,000-4,000 | 15-40 ms | 80-200 ms |
| PHP Laravel Octane | 1,500-4,000 | 20-50 ms | 100-250 ms |
Sources: TechEmpower R23, 2024 Fastest REST API Servers
10,000 Concurrent Connections (High Load)
This is where architecture matters more than micro-optimization. Languages without efficient concurrent connection handling hit walls.
| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 35,000-60,000 | 15-45 ms | 30-80 ms |
| Go net/http | 30,000-50,000 | 20-60 ms | 40-100 ms |
| C# ASP.NET Core | 15,000-35,000 | 20-60 ms | 50-150 ms |
| Java Vert.x (warmed) | 15,000-30,000 | 20-50 ms | 50-200 ms |
| Go Gin | 10,000-20,000 | 15-50 ms | 30-80 ms |
| Node.js Fastify (clustered) | 6,000-10,000 | 30-60 ms | 80-250 ms |
| Java Spring Boot (warmed) | 5,000-12,000 | 40-100 ms | 100-500 ms |
| Elixir Phoenix | 4,000-8,000 | 20-40 ms | 50-100 ms |
| Python FastAPI (multi-worker) | 1,500-4,000 | 60-150 ms | 200-500 ms |
| Ruby Rails (YJIT + Puma) | 1,500-3,000 | 50-150 ms | 200-600 ms |
| PHP Laravel Octane | 1,000-3,000 | 50-100 ms | 200-400 ms |
Sources: TechEmpower R23, Go: Managing 10K+ Concurrent Connections, How Fast Is ASP.NET Core? (dusted.codes)
The Scaling Gap Widens — Mostly
The ratio between the top and bottom tiers shifts with concurrency, narrowing at medium load and then widening again at high load:
| Concurrency | Top (Rust) | Bottom (Laravel) | Ratio |
|---|---|---|---|
| 100 | ~30,000 req/s | ~500 req/s | 60x |
| 1,000 | ~40,000 req/s | ~2,000 req/s | 20x |
| 10,000 | ~50,000 req/s | ~1,500 req/s | 33x |
But within the compiled tier (Rust, Go, C#, Java reactive), the gap actually narrows at high concurrency as the bottleneck shifts from CPU to I/O and connection management.
Elixir Phoenix deserves a special callout. Its raw throughput is moderate (4,000-8,000 req/s), but notice something remarkable: its P99 latency barely changes between 100 and 10,000 connections (15-40 ms vs 50-100 ms). Sharkbench measured Phoenix with the highest stability score of any framework at 84.9%. The BEAM VM’s preemptive scheduler ensures no single request can monopolize a CPU core, providing the most predictable latency profile of any runtime tested. If your requirement is “no request ever takes more than X milliseconds,” Phoenix is worth a serious look regardless of its moderate peak throughput.
Tail Latency: Where the Real Differences Live
Average response time is what you measure. Tail latency is what your users experience. P99 latency — the response time that 99% of requests beat — is where garbage collection, thread contention, and memory management show their true cost.
P99/P50 Ratio by Language
This ratio measures how consistent a language is. A ratio of 3x means the worst 1% of requests are 3 times slower than the median. A ratio of 20x means occasional requests are 20 times slower — visible as UI lag, timeout errors, and frustrated users.
| Language | P50 | P99 | P99/P50 Ratio | Why |
|---|---|---|---|---|
| Rust | 1-3 ms | 5-15 ms | 3-5x | No GC, no runtime pauses |
| Go | 1-3 ms | 5-15 ms | 3-5x | GC pauses <1 ms |
| Elixir | 5-10 ms | 15-40 ms | 3-4x | Per-process GC, preemptive scheduling |
| PHP (Laravel) | 50-100 ms | 200-500 ms | 4-5x | Per-request model is consistent (but slow) |
| C# | 1-2 ms | 5-30 ms | 5-15x | Generally good; occasional GC spikes |
| Node.js | 3-5 ms | 15-50 ms | 5-10x | Event loop blocking causes spikes |
| Python | 15-25 ms | 80-200 ms | 5-8x | GIL contention; worker-dependent |
| Ruby | 5-15 ms | 30-100 ms | 5-7x | GVL contention; YJIT helps |
| Java (G1GC) | 1-5 ms | 10-100 ms | 10-20x | GC stop-the-world pauses |
Sources: Sharkbench stability scores, Why Tail Latency Matters (Medium), More Evidence for Problems in VM Warmup (Laurence Tratt)
Why This Matters More at Scale
The P99/P50 ratio worsens under higher load for garbage-collected languages. More allocations per second means more frequent GC cycles, and more concurrent requests means each GC pause affects more in-flight requests.
At 10,000 concurrent connections:
- Rust: P99/P50 stays at ~3-5x (barely changes from 100 connections)
- Go: P99/P50 stays at ~3-5x (goroutine scheduler handles load gracefully)
- Java (G1GC): P99/P50 grows to 20-50x (heap pressure triggers more aggressive GC)
- Node.js: P99/P50 grows to 10-20x (event loop congestion, backpressure)
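The arithmetic behind that last point is Little's law: the number of in-flight requests frozen by a stop-the-world pause is roughly the arrival rate times the pause duration. A quick Python illustration:

```python
def requests_hit_by_pause(throughput_rps, pause_ms):
    """Approximate requests in flight (and therefore stalled) during one
    stop-the-world GC pause, via Little's law: rate * duration."""
    return throughput_rps * (pause_ms / 1000)

# The same 50 ms G1GC-style pause at two load levels:
light = requests_hit_by_pause(1_000, 50)   # ~50 requests stalled
heavy = requests_hit_by_pause(20_000, 50)  # ~1,000 requests stalled
print(light, heavy)
```

Identical pause, 20x more victims — which is why a GC that looks harmless in a low-traffic staging environment can dominate P99 in production.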
Java’s answer is ZGC — production-ready since JDK 15, with a generational mode arriving in JDK 21 — which brings GC pause times under 1 ms regardless of heap size. Enabling it is a single flag (`-XX:+UseZGC`). The trade-off is 5-10% lower throughput compared to G1GC. Most production Java deployments still use G1GC, but if P99 latency matters to your application, ZGC is the single most impactful configuration change you can make.
The GC Landscape
| Language | GC Type | Typical Pause | P99 Impact |
|---|---|---|---|
| Rust | None | 0 ms | None |
| Go | Concurrent, low-pause | <1 ms | Minimal |
| Java (ZGC) | Concurrent, sub-ms | <1 ms | Minimal |
| C# (.NET 8) | Generational, background | 1-10 ms | Moderate |
| Java (G1GC) | Generational, concurrent | 5-50 ms | Significant |
| Java (ParallelGC) | Stop-the-world | 50-500 ms | Severe |
Sources: Baeldung: JVM Warm-Up, JVM Warmup Optimization (Java Code Geeks)
The Database Equalizer
Everything above measures compute-bound performance — how fast the language itself processes requests. But most real web applications don’t just process requests. They query databases. And that changes the picture dramatically.
How the Gap Compresses
TechEmpower runs six test types on the same hardware with the same PostgreSQL database. Watch how the performance spread changes as database involvement increases:
| Test Type | What It Measures | Performance Spread |
|---|---|---|
| Plaintext | Raw HTTP throughput | 100x+ |
| JSON | Serialization + HTTP | 50-80x |
| Single Query | 1 DB SELECT | 20-40x |
| Fortunes | SELECT + template rendering | 15-30x |
| Multiple Queries (20) | 20 DB SELECTs | 8-15x |
| Data Updates (20) | 20 SELECT + 20 UPDATE | 5-10x |
Source: TechEmpower Framework Benchmarks Round 23, GoFrame TechEmpower R23 Analysis
The performance gap between the fastest and slowest frameworks narrows from over 100x in plaintext to under 10x with database writes. This is the most important finding in this entire analysis for anyone building a typical web application.
Why? With 20 database queries per request, each taking 0.5-2 ms of round-trip time, the minimum request time is 10-40 ms regardless of language. A framework that processes its part in 0.1 ms (Rust) versus 2 ms (Python) only changes total response time from 40.1 ms to 42 ms — a 5% difference rather than the 20x difference you see in pure compute benchmarks.
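You can reproduce that arithmetic directly. The round-trip and per-request compute figures below are the illustrative values from the paragraph above, not measurements:

```python
def total_response_ms(n_queries, query_rt_ms, compute_ms):
    """Request time when DB round trips dominate. Queries are assumed
    serial here, which maximizes how little the runtime speed matters."""
    return n_queries * query_rt_ms + compute_ms

rust_ms = total_response_ms(20, 2.0, 0.1)    # fast runtime: 40.1 ms total
python_ms = total_response_ms(20, 2.0, 2.0)  # 20x slower compute: 42.0 ms
slowdown = (python_ms - rust_ms) / rust_ms
print(f"{slowdown:.1%}")  # roughly 4.7% end to end, not 20x
```

The compute gap doesn't disappear — it just gets diluted by I/O the language cannot speed up.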
The Fortunes Test: The Most Realistic Benchmark
TechEmpower’s Fortunes test is the closest to a real web application: fetch rows from PostgreSQL, add a row in memory, sort, HTML-escape, and render via a template engine. Here’s how the major frameworks performed on 56-core hardware in Round 23:
| Language/Framework | Approx Requests/sec | Notes |
|---|---|---|
| C++ drogon | ~616,000 | Full MVC with templating |
| Rust xitca-web | ~588,000 | Proper MVC implementation |
| Java Jooby | ~404,000 | Lightweight Java framework (not Spring) |
| Rust Axum | ~400,000 | Full stack with PostgreSQL |
| Go atreugo | ~381,000 | Complete implementation |
| PHP mixphp | ~309,000 | Optimized PHP framework |
| C# ASP.NET Core (platform) | ~300,000+ | Stripped-down platform benchmark |
| C# ASP.NET Core MVC | ~184,000 | Realistic with templating engine |
| Node.js polkadot | ~125,000 | Optimized implementation |
| Java Spring Boot | ~60,000-80,000 | Full framework with templates |
| Elixir Phoenix | ~25,000-40,000 | BEAM VM |
| Python FastAPI | ~16,000-20,000 | Estimated from composite data |
| PHP Laravel | ~16,657 | Full framework |
| Ruby Rails | ~12,000-18,000 | With YJIT enabled |
Source: TechEmpower Framework Benchmarks Round 23, TechEmpower R23 Announcement
Important caveats: Many top TechEmpower entries are micro-optimized beyond production reality — custom allocators, raw SQL, SIMD parsing. The ASP.NET “platform” entry at 300K+ excludes standard framework features; the realistic MVC version scores ~184K. Java Jooby at 404K is not Spring Boot; Spring Boot with Thymeleaf scores roughly 5-7x lower. Use these numbers for relative rankings, not absolute expectations.
What This Means for Real Applications
If your application makes 5-20 database calls per request (which describes most CRUD applications, admin panels, e-commerce sites, and content management systems), the language performance difference drops to 2-5x between the compiled and interpreted tiers.
At that point, query optimization, indexing, connection pooling, and caching strategy matter more than language choice. A poorly-indexed Django application with N+1 queries will be slower than a well-optimized Laravel application with proper eager loading, regardless of Python being “faster” than PHP in compute benchmarks.
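The N+1 effect is easy to see with a toy query counter. This is not a real ORM — just Python standing in for one to show why eager loading wins:

```python
class FakeDB:
    """Counts queries so the N+1 pattern is visible. Not a real ORM."""
    def __init__(self):
        self.queries = 0
    def fetch_orders(self, n):
        self.queries += 1              # SELECT * FROM orders
        return list(range(n))
    def fetch_customer(self, order_id):
        self.queries += 1              # SELECT * FROM customers WHERE id = ?
        return f"customer-{order_id}"
    def fetch_customers_bulk(self, order_ids):
        self.queries += 1              # SELECT ... WHERE id IN (...)
        return {i: f"customer-{i}" for i in order_ids}

def render_naive(db, n):
    # N+1: one customer lookup per order
    return [db.fetch_customer(o) for o in db.fetch_orders(n)]

def render_eager(db, n):
    # 2 queries total, regardless of how many orders there are
    orders = db.fetch_orders(n)
    customers = db.fetch_customers_bulk(orders)
    return [customers[o] for o in orders]

naive_db, eager_db = FakeDB(), FakeDB()
render_naive(naive_db, 100)
render_eager(eager_db, 100)
print(naive_db.queries, eager_db.queries)  # 101 vs 2
```

At 1 ms per round trip, that is the difference between a ~101 ms page and a ~2 ms page — in the same language, with the same framework.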
Memory Efficiency Under Load
One final dimension: how much throughput do you get per megabyte of memory consumed?
| Language/Framework | Memory under sustained load | Req/s per MB | Efficiency Rating |
|---|---|---|---|
| Rust Axum | ~10 MB | ~2,100 | Exceptional |
| Go Gin | ~20 MB | ~500 | Excellent |
| Node.js Fastify | ~60 MB | ~155 | Good |
| C# ASP.NET Core | ~140 MB | ~105 | Good |
| Java Vert.x | ~500 MB | ~46 | Moderate |
| Python FastAPI | ~45 MB | ~26 | Moderate |
| Elixir Phoenix | ~150 MB | ~29 | Moderate |
| Ruby Rails | ~130 MB | ~18 | Low |
| Java Spring Boot | ~600 MB | ~17 | Low |
| PHP Laravel | ~85 MB | ~4 | Very Low |
Source: Sharkbench Web Framework Benchmark, memory measurements during sustained load
Rust delivers 2,100 requests per second per megabyte of memory. Laravel delivers 4. That’s a 500x difference in memory efficiency — which translates directly to infrastructure costs when you’re scaling horizontally.
The Bottom Line
Here’s how to think about all of this data when making real technology decisions:
If latency consistency matters (fintech, gaming, real-time): Rust or Go. Their P99/P50 ratios stay stable under any load. If you need the JVM ecosystem, use ZGC.
If peak throughput matters (high-traffic APIs): Rust for absolute maximum. Go for 60-80% of Rust’s throughput with dramatically simpler code. Java Vert.x or C# ASP.NET Core if you need those ecosystems.
If your application is database-heavy (most web apps): The language matters 3-5x less than benchmarks suggest. Pick the language that makes your team most productive and invest in query optimization, indexing, and caching. Even Python and Ruby are adequate when the database dominates response time.
If you need massive concurrent connections (chat, IoT, WebSockets): Elixir Phoenix (proven at 2M+ concurrent connections) or Go (goroutines scale to 100K+ connections trivially). Java with Virtual Threads (JDK 21+) is now competitive here.
If infrastructure cost matters (microservices at scale): Go or Rust. The 10-30x memory efficiency advantage compounds across dozens of services. But consider whether a single Java monolith would actually use less total memory than 50 Go microservices.
If developer productivity matters most (startups, small teams): Pick the language and framework your team knows best. The 5-20x performance difference between frameworks matters far less than the 2-5x productivity difference between a team writing idiomatic code in their preferred language versus wrestling with an unfamiliar one.
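That monolith-versus-microservices memory question is worth doing explicitly. The footprints below are rough idle figures in the spirit of the tables above, and the two-replicas-per-service assumption is ours:

```python
def fleet_memory_mb(services, per_service_mb, replicas=2):
    """Total resident memory for a fleet running `replicas` copies
    of each service (for rolling deploys and failover)."""
    return services * per_service_mb * replicas

# Assumed footprints: ~15 MB per idle Go service, ~600 MB for Spring Boot
go_microservices = fleet_memory_mb(50, 15)  # 1,500 MB across the fleet
java_monolith = fleet_memory_mb(1, 600)     # 1,200 MB for one replicated app
print(go_microservices, java_monolith)
```

Even with Go's per-process efficiency, fifty small services can out-consume one large JVM once you multiply by service count and replica count. The per-language numbers only matter after you've settled the architecture.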
The benchmark data is clear. The right choice isn’t.
Want to try any of these languages? Every language linked in this article has a dedicated page on CodeArchaeology with Hello World tutorials and Docker images to get you running in minutes. Browse our complete collection of 70+ languages.
Sources
Primary Benchmarks
- TechEmpower Framework Benchmarks Round 22/23
- TechEmpower Round 23 Announcement
- Sharkbench Web Framework Benchmark
- Travis Luong: FastAPI vs Fastify vs Spring Boot vs Gin
- 2024 Fastest REST API Servers
Java / JVM Performance
- Vincenzo Racca: Spring Boot vs GraalVM Performance
- Baeldung: JVM Warm-Up
- JVM Warmup Optimization (Java Code Geeks)
- JVM and Cache Warm-Up Strategy (Teads Engineering)
- Analyzing and Tuning Warmup (Azul)
- GraalVM Native Image vs Traditional JVM (Java Code Geeks)
- GraalVM Performance Boost (InfoQ)
- Java vs Go vs Rust Comparison (index.dev)
ASP.NET Core / .NET
- How Fast Is ASP.NET Core? (dusted.codes)
- .NET vs Node.js vs Spring Boot vs Django vs Go (BeyondTheSemicolon)
Tail Latency and GC
- Why Tail Latency Matters (Medium)
- More Evidence for Problems in VM Warmup (Laurence Tratt)
- Don’t Get Caught in the Cold (USENIX OSDI'16)
Framework-Specific
- YJIT 3.4: Even Faster (Rails at Scale)
- Ruby Application Servers 2025 (DeployHQ)
- Laravel Octane: Drivers, Benchmarks & Safe Adoption
- Phoenix: Road to 2 Million WebSocket Connections
- Go: Managing 10K+ Concurrent Connections