The Weight of Your Web Stack, Part 2: What Your Backend Costs Under Load

In Part 1, we measured what web backend frameworks cost before serving a single request — idle memory, startup time, and Docker image sizes. We found a 30-100x spread between the lightest (Rust at 3 MB) and heaviest (Spring Boot at 500 MB) frameworks. In Part 3 we cover how to actually choose the right backend for your workload, team, and scale.

But idle cost is only half the story. Some frameworks with high startup costs become remarkably efficient once they’re running. Others with tiny idle footprints hit scaling walls under load. The question every developer and architect needs answered is: after your application is warmed up and running, how does each language actually perform?

We compiled data from TechEmpower Rounds 22 and 23, the Sharkbench Web Framework Benchmark (August 2025), and 40+ independent benchmark studies to find out.

Warmed-Up Throughput: The Steady-State Rankings

First, the raw numbers. This table shows throughput after warmup, measured on the same hardware (Sharkbench: Ryzen 7 7800X3D, Docker, 1 CPU core equivalent) running concurrent HTTP requests with JSON serialization and I/O operations.

| Language/Framework | Requests/sec | Median Latency | Memory | Stability* |
|---|---|---|---|---|
| Java Vert.x (Temurin JVM) | 23,116 | 1.3 ms | 484 MB | — |
| Bun.serve | 22,303 | 1.2 ms | 24.5 MB | 10.4% |
| Rust Actix | 21,965 | 1.4 ms | 16.6 MB | 66.6% |
| Rust Axum | 21,030 | 1.6 ms | 8.5 MB | 72.0% |
| Java Vert.x (Semeru) | 19,917 | 1.5 ms | 137 MB | — |
| C# ASP.NET Core | 14,707 | 1.2 ms | 136.5 MB | 2.6% |
| Node.js Fastify | 9,340 | 3.4 ms | 57 MB | 63.2% |
| Java Spring WebFlux (Semeru) | 7,051 | 1.2 ms | 130 MB | — |
| Java Quarkus Reactive | 6,473 | 0.7 ms | 341 MB | — |
| Node.js Express | 5,766 | 5.5 ms | 82.5 MB | 64.5% |
| Go FastHTTP | 5,567 | 0.7 ms | 13.4 MB | 0.8% |
| Elixir Phoenix (Bandit) | 4,375 | 7.3 ms | 145.5 MB | 84.9% |
| Go Gin | 3,546 | 1.0 ms | 16.7 MB | 1.1% |
| Ruby Rails 8 (YJIT) | 2,340 | 1.2 ms | 125 MB | 1.0% |
| Java Spring MVC (Semeru) | 2,305 | 1.1 ms | 157.5 MB | — |
| Python FastAPI (Uvicorn) | 1,185 | 21.0 ms | 41.2 MB | 21.2% |
| Java Spring MVC (Temurin) | 1,105 | 1.7 ms | 597 MB | — |
| Python Flask (Gunicorn) | 1,092 | 7.7 ms | 90.3 MB | 9.2% |
| Python Django (Gunicorn) | 950 | 8.8 ms | 130 MB | 10.3% |
| PHP Symfony 6.4 | 941 | 8.7 ms | 55.4 MB | 10.3% |
| PHP Laravel 11 | 299 | 101.7 ms | 84.2 MB | 56.5% |

* Stability = median latency as a percentage of P99 latency. Higher means more predictable response times.

Source: Sharkbench Web Framework Benchmark, August 2025
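The stability column is easy to recompute if you have raw latency samples. Here is a minimal sketch; the sample distribution below is synthetic, purely to illustrate the arithmetic, not Sharkbench's actual data.

```python
# Reproduce a stability score (median latency / P99 latency) from raw
# latency samples. The sample data here is synthetic and only illustrates
# how the ratio is derived.
import statistics


def stability_score(latencies_ms: list[float]) -> float:
    """Return median/P99 as a percentage; higher = more predictable tails."""
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank P99: the response time that 99% of requests beat.
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return 100.0 * median / p99


# 990 fast requests at 1.2 ms plus 10 slow outliers at 12 ms:
samples = [1.2] * 990 + [12.0] * 10
print(f"{stability_score(samples):.1f}%")  # a Bun-like ~10% tail profile
```

Read the table through this lens: Rust Axum's 72.0% means its P99 is only about 1.4x its median, while Bun's 10.4% means a roughly 10x tail.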

Several things jump out from this data.

Java Vert.x tops the chart — not Rust. At 23,116 req/s, Java’s reactive framework actually beats Rust Actix (21,965 req/s) on this benchmark. But it does so at a cost of 484 MB of memory, a 29x memory penalty compared to Rust Axum’s 8.5 MB for similar throughput.

Go looks surprisingly slow at 3,546-5,567 req/s. This is because Sharkbench limits each framework to 1 CPU core in Docker. Go’s goroutine model is designed for multi-core scaling — its per-core numbers are modest, but they multiply roughly linearly with available cores. On an 8-core machine, those numbers would be 3-6x higher.

Framework choice within a language matters enormously. Java Vert.x (23,116 req/s) versus Spring MVC (1,105-2,305 req/s) is a 10-20x difference within the same language. Node.js Fastify (9,340 req/s) versus Express (5,766 req/s) is a 1.6x gap. The framework you pick can matter as much as the language.

Bun has impressive raw speed but terrible stability — a 10.4% stability score means its P99 latency is nearly 10x its median latency. Fast on average, but unpredictable on the tail.

The JVM Warmup Question: Does It Pay Off?

This is the question Java developers ask most: after the JVM warms up and the JIT compiler kicks in, does Java actually become faster than Go, Rust, or C#?

Cold vs. Warmed Performance

The JVM doesn’t just run code — it learns from it. Through tiered JIT compilation, the C2 compiler identifies “hot” code paths and compiles them to highly optimized native machine code with aggressive inlining, escape analysis, and speculative optimizations based on runtime data.

| Metric | Cold JVM | Warmed JVM | Improvement |
|---|---|---|---|
| First request latency | 50-500 ms | 1-5 ms | 10-100x |
| Throughput (first 10 s) | 20-40% of peak | 100% of peak | 2.5-5x |
| Time to peak | Baseline | 15-45 seconds typical | — |
| P99 latency | Highly variable | Stabilizes within 2-5x of median | Dramatic |

The warmup effect is real and substantial. A payment service documented by Azul reduced time-to-peak-performance from 45 seconds to 12 seconds by pre-compiling 20 key methods. Teads, an ad-tech company, implemented a 2 minute 40 second warmup period before serving live traffic, which eliminated timeout spikes entirely.
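The warm-up-before-traffic pattern can be sketched as a readiness gate: exercise the hot code paths until latency settles, then report ready to the load balancer. Everything below (the window size, the tolerance, the `hit` callable) is an illustrative assumption, not Teads' or Azul's actual implementation.

```python
# Sketch of a JIT warmup gate: call a hot code path repeatedly until the
# rolling median latency stabilizes, then flip the readiness probe.
# Thresholds are illustrative assumptions, not any vendor's real logic.
import statistics
import time
from collections import deque
from typing import Callable


def warm_up(hit: Callable[[], None],
            window: int = 50,
            tolerance: float = 0.10,
            max_requests: int = 5000) -> int:
    """Call `hit` until the median latency of the last `window` calls is
    within `tolerance` of the previous window's median (i.e. the JIT has
    settled). Returns the number of warmup requests issued."""
    recent = deque(maxlen=window)
    previous_median = None
    for n in range(1, max_requests + 1):
        start = time.perf_counter()
        hit()
        recent.append(time.perf_counter() - start)
        if len(recent) == window and n % window == 0:
            median = statistics.median(recent)
            if previous_median is not None and \
               abs(median - previous_median) <= tolerance * previous_median:
                return n  # latency has stabilized; safe to serve traffic
            previous_median = median
    return max_requests  # give up and go live anyway
```

In a real deployment, `hit` would issue HTTP requests against the service's own hottest endpoints before the health check is allowed to pass.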

HotSpot JVM vs. GraalVM Native Image

A Spring PetClinic benchmark by Vincenzo Racca measured the trade-off directly:

| Metric | HotSpot JVM (JIT) | GraalVM Native Image (AOT) | Difference |
|---|---|---|---|
| Startup time | 7.18 seconds | 0.22 seconds | Native wins by 33x |
| Memory (RSS) | 1,751 MB | 694 MB | Native uses 40% as much |
| Peak throughput | 12,800 req/s | 10,249 req/s | JVM wins by 25% |

The JIT advantage is genuine but modest: 25% more throughput in exchange for 2.5x more memory and 33x slower startup. Whether that trade-off makes sense depends entirely on your deployment model.

At higher concurrency levels, the picture gets more interesting. The same benchmark at 200-300 concurrent users showed GraalVM Native Image actually edging out HotSpot JVM on throughput, likely because its lower memory footprint reduced GC pressure under load.

Does Warmed Java Beat Go or Rust?

No. The JVM warmup investment narrows the gap substantially but does not close it.

| Scenario | Java (warmed) | Go | Rust | C# (.NET 8) |
|---|---|---|---|---|
| JSON serialization | 50,000-100,000 | 80,000-150,000 | 500,000-1,000,000+ | 100,000-300,000 |
| Simple REST endpoint | 50,000-100,000 | 80,000-150,000 | 150,000-500,000 | 100,000-250,000 |
| DB-backed API (1 query) | 30,000-60,000 | 40,000-80,000 | 60,000-120,000 | 40,000-80,000 |
| P50 latency | 1-3 ms | 1-3 ms | 1-3 ms | 1-2 ms |
| P99 latency | 5-50 ms | 5-15 ms | 5-15 ms | 5-30 ms |

Throughput rows are requests/second; latency rows are milliseconds. Ranges span multiple benchmark sources, all measured after warmup on multi-core hardware. Sources: TechEmpower R23, Sharkbench, index.dev: Java vs Go vs Rust Comparison

After warmup, Java reaches 60-80% of Go’s throughput and 40-60% of Rust’s for HTTP workloads. Java Vert.x and other reactive frameworks can approach or match Go and C# on median latency, but P99 latency remains significantly worse due to garbage collection pauses.

Where Java is competitive after warmup: mature thread pool management, excellent connection pooling, and the JIT’s ability to optimize hot paths based on actual runtime behavior — optimizations that ahead-of-time compilers can’t make.

Where Java still loses: raw throughput, P99 tail latency, and memory consumption per unit of throughput (10-30x more than Rust/Go).

Performance at Scale: 100, 1,000, and 10,000 Connections

Abstract benchmarks are useful, but the question that matters in production is: how do these frameworks perform at different levels of real concurrency?

100 Concurrent Connections (Low Load)

At low concurrency, most languages perform well. The differences are smallest here.

| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 22,000-36,000 | 1-3 ms | 5-10 ms |
| Go net/http | 18,000-30,000 | 1-3 ms | 5-10 ms |
| C# ASP.NET Core | 16,000-27,000 | 1-3 ms | 5-15 ms |
| Java Vert.x (warmed) | 15,000-25,000 | 1-3 ms | 5-20 ms |
| Java Spring Boot (warmed) | 8,000-15,000 | 2-5 ms | 10-30 ms |
| Node.js Fastify | 8,000-13,000 | 3-5 ms | 10-30 ms |
| Elixir Phoenix | 4,000-8,000 | 5-10 ms | 15-40 ms |
| Ruby Rails (YJIT) | 2,000-4,500 | 5-15 ms | 30-80 ms |
| Python FastAPI | 1,000-3,000 | 15-30 ms | 50-150 ms |
| PHP Laravel | 300-1,000 | 50-100 ms | 200-400 ms |

Sources: TechEmpower R23, Sharkbench, Travis Luong: FastAPI vs Fastify vs Spring Boot vs Gin

1,000 Concurrent Connections (Medium Load)

Performance starts to differentiate. Languages with efficient concurrency models pull ahead.

| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 30,000-50,000 | 3-8 ms | 10-20 ms |
| Go net/http | 25,000-45,000 | 3-10 ms | 10-20 ms |
| C# ASP.NET Core | 20,000-40,000 | 5-12 ms | 15-40 ms |
| Java Vert.x (warmed) | 20,000-35,000 | 5-15 ms | 20-60 ms |
| Java Spring Boot (warmed) | 10,000-20,000 | 10-25 ms | 30-100 ms |
| Node.js Fastify | 8,000-12,000 | 10-25 ms | 30-80 ms |
| Go Gin | 8,000-15,000 | 5-15 ms | 15-30 ms |
| Elixir Phoenix | 4,000-7,000 | 10-20 ms | 30-60 ms |
| Python FastAPI (multi-worker) | 2,000-5,000 | 30-60 ms | 100-300 ms |
| Ruby Rails (YJIT) | 2,000-4,000 | 15-40 ms | 80-200 ms |
| PHP Laravel Octane | 1,500-4,000 | 20-50 ms | 100-250 ms |

Sources: TechEmpower R23, 2024 Fastest REST API Servers

10,000 Concurrent Connections (High Load)

This is where architecture matters more than micro-optimization. Languages without efficient concurrent connection handling hit walls.

| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 35,000-60,000 | 15-45 ms | 30-80 ms |
| Go net/http | 30,000-50,000 | 20-60 ms | 40-100 ms |
| C# ASP.NET Core | 15,000-35,000 | 20-60 ms | 50-150 ms |
| Java Vert.x (warmed) | 15,000-30,000 | 20-50 ms | 50-200 ms |
| Go Gin | 10,000-20,000 | 15-50 ms | 30-80 ms |
| Node.js Fastify (clustered) | 6,000-10,000 | 30-60 ms | 80-250 ms |
| Java Spring Boot (warmed) | 5,000-12,000 | 40-100 ms | 100-500 ms |
| Elixir Phoenix | 4,000-8,000 | 20-40 ms | 50-100 ms |
| Python FastAPI (multi-worker) | 1,500-4,000 | 60-150 ms | 200-500 ms |
| Ruby Rails (YJIT + Puma) | 1,500-3,000 | 50-150 ms | 200-600 ms |
| PHP Laravel Octane | 1,000-3,000 | 50-100 ms | 200-400 ms |

Sources: TechEmpower R23, Go: Managing 10K+ Concurrent Connections, How Fast Is ASP.NET Core? (dusted.codes)

The Scaling Gap Widens — Mostly

The ratio between the top and bottom tiers grows with concurrency:

| Concurrency | Top (Rust) | Bottom (Laravel) | Ratio |
|---|---|---|---|
| 100 | ~30,000 req/s | ~500 req/s | 60x |
| 1,000 | ~40,000 req/s | ~2,000 req/s | 20x |
| 10,000 | ~50,000 req/s | ~1,500 req/s | 33x |

But within the compiled tier (Rust, Go, C#, Java reactive), the gap actually narrows at high concurrency as the bottleneck shifts from CPU to I/O and connection management.

Elixir Phoenix deserves a special callout. Its raw throughput is moderate (4,000-8,000 req/s), but notice something remarkable: its P99 latency barely changes between 100 and 10,000 connections (15-40 ms vs 50-100 ms). Sharkbench measured Phoenix with the highest stability score of any framework at 84.9%. The BEAM VM’s preemptive scheduler ensures no single request can monopolize a CPU core, providing the most predictable latency profile of any runtime tested. If your requirement is “no request ever takes more than X milliseconds,” Phoenix is worth a serious look regardless of its moderate peak throughput.

Tail Latency: Where the Real Differences Live

Average response time is what you measure. Tail latency is what your users experience. P99 latency — the response time that 99% of requests beat — is where garbage collection, thread contention, and memory management show their true cost.

P99/P50 Ratio by Language

This ratio measures how consistent a language is. A ratio of 3x means the worst 1% of requests are 3 times slower than the median. A ratio of 20x means occasional requests are 20 times slower — visible as UI lag, timeout errors, and frustrated users.

| Language | P50 | P99 | P99/P50 Ratio | Why |
|---|---|---|---|---|
| Rust | 1-3 ms | 5-15 ms | 3-5x | No GC, no runtime pauses |
| Go | 1-3 ms | 5-15 ms | 3-5x | GC pauses <1 ms |
| Elixir | 5-10 ms | 15-40 ms | 3-4x | Per-process GC, preemptive scheduling |
| PHP (Laravel) | 50-100 ms | 200-500 ms | 4-5x | Per-request model is consistent (but slow) |
| C# | 1-2 ms | 5-30 ms | 5-15x | Generally good; occasional GC spikes |
| Node.js | 3-5 ms | 15-50 ms | 5-10x | Event loop blocking causes spikes |
| Python | 15-25 ms | 80-200 ms | 5-8x | GIL contention; worker-dependent |
| Ruby | 5-15 ms | 30-100 ms | 5-7x | GVL contention; YJIT helps |
| Java (G1GC) | 1-5 ms | 10-100 ms | 10-20x | GC stop-the-world pauses |

Sources: Sharkbench stability scores, Why Tail Latency Matters (Medium), More Evidence for Problems in VM Warmup (Laurence Tratt)

Why This Matters More at Scale

The P99/P50 ratio worsens under higher load for garbage-collected languages. More allocations per second means more frequent GC cycles, and more concurrent requests means each GC pause affects more in-flight requests.

At 10,000 concurrent connections:

  • Rust: P99/P50 stays at ~3-5x (barely changes from 100 connections)
  • Go: P99/P50 stays at ~3-5x (goroutine scheduler handles load gracefully)
  • Java (G1GC): P99/P50 grows to 20-50x (heap pressure triggers more aggressive GC)
  • Node.js: P99/P50 grows to 10-20x (event loop congestion, backpressure)

Java’s answer is ZGC (production-ready since JDK 15, with a generational mode added in JDK 21), which keeps GC pause times under 1 ms regardless of heap size. The trade-off is 5-10% lower throughput compared to G1GC. Most production Java deployments still use G1GC, but if P99 latency matters to your application, switching to ZGC is the single most impactful configuration change you can make.
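Enabling ZGC is a one-flag change (plus, on JDK 21, opting into the generational mode). A typical invocation looks like this; the heap size and `app.jar` name are illustrative placeholders, not a recommendation:

```shell
# JDK 21+: generational ZGC, sub-millisecond pauses regardless of heap size
java -XX:+UseZGC -XX:+ZGenerational -Xms2g -Xmx2g -jar app.jar

# JDK 15-20: non-generational ZGC (production-ready since JDK 15)
java -XX:+UseZGC -Xmx2g -jar app.jar
```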

The GC Landscape

| Language | GC Type | Typical Pause | P99 Impact |
|---|---|---|---|
| Rust | None | 0 ms | None |
| Go | Concurrent, low-pause | <1 ms | Minimal |
| Java (ZGC) | Concurrent, sub-ms | <1 ms | Minimal |
| C# (.NET 8) | Generational, background | 1-10 ms | Moderate |
| Java (G1GC) | Generational, concurrent | 5-50 ms | Significant |
| Java (ParallelGC) | Stop-the-world | 50-500 ms | Severe |

Sources: Baeldung: JVM Warm-Up, JVM Warmup Optimization (Java Code Geeks)

The Database Equalizer

Everything above measures compute-bound performance — how fast the language itself processes requests. But most real web applications don’t just process requests. They query databases. And that changes the picture dramatically.

How the Gap Compresses

TechEmpower runs six test types on the same hardware with the same PostgreSQL database. Watch how the performance spread changes as database involvement increases:

| Test Type | What It Measures | Performance Spread |
|---|---|---|
| Plaintext | Raw HTTP throughput | 100x+ |
| JSON | Serialization + HTTP | 50-80x |
| Single Query | 1 DB SELECT | 20-40x |
| Fortunes | SELECT + template rendering | 15-30x |
| Multiple Queries (20) | 20 DB SELECTs | 8-15x |
| Data Updates (20) | 20 SELECT + 20 UPDATE | 5-10x |

Source: TechEmpower Framework Benchmarks Round 23, GoFrame TechEmpower R23 Analysis

The performance gap between the fastest and slowest frameworks narrows from over 100x in plaintext to under 10x with database writes. This is the most important finding in this entire analysis for anyone building a typical web application.

Why? With 20 database queries per request, each taking 0.5-2 ms of round-trip time, the minimum request time is 10-40 ms regardless of language. A framework that processes its part in 0.1 ms (Rust) versus 2 ms (Python) only changes total response time from 40.1 ms to 42 ms — a 5% difference rather than the 20x difference you see in pure compute benchmarks.
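That arithmetic is worth pinning down. The sketch below uses the same illustrative numbers as the paragraph (0.1 ms vs 2 ms of framework time, 20 queries at 2 ms each); it is a back-of-the-envelope model, not a benchmark.

```python
# How much does framework speed matter once the database dominates?
# 20 queries at 2 ms each = 40 ms of unavoidable I/O per request.
def total_latency_ms(framework_ms: float, queries: int, per_query_ms: float) -> float:
    """Total request time = framework compute time + sequential DB time."""
    return framework_ms + queries * per_query_ms


rust = total_latency_ms(0.1, 20, 2.0)    # 40.1 ms
python = total_latency_ms(2.0, 20, 2.0)  # 42.0 ms
print(f"Rust: {rust} ms, Python: {python} ms, "
      f"difference: {100 * (python - rust) / rust:.1f}%")  # ~4.7%
```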

The Fortunes Test: The Most Realistic Benchmark

TechEmpower’s Fortunes test is the closest to a real web application: fetch rows from PostgreSQL, add a row in memory, sort, HTML-escape, and render via a template engine. Here’s how the major frameworks performed on 56-core hardware in Round 23:

| Language/Framework | Approx Requests/sec | Notes |
|---|---|---|
| C++ drogon | ~616,000 | Full MVC with templating |
| Rust xitca-web | ~588,000 | Proper MVC implementation |
| Java Jooby | ~404,000 | Lightweight Java framework (not Spring) |
| Rust Axum | ~400,000 | Full stack with PostgreSQL |
| Go atreugo | ~381,000 | Complete implementation |
| PHP mixphp | ~309,000 | Optimized PHP framework |
| C# ASP.NET Core (platform) | ~300,000+ | Stripped-down platform benchmark |
| C# ASP.NET Core MVC | ~184,000 | Realistic with templating engine |
| Node.js polkadot | ~125,000 | Optimized implementation |
| Java Spring Boot | ~60,000-80,000 | Full framework with templates |
| Elixir Phoenix | ~25,000-40,000 | BEAM VM |
| Python FastAPI | ~16,000-20,000 | Estimated from composite data |
| PHP Laravel | ~16,657 | Full framework |
| Ruby Rails | ~12,000-18,000 | With YJIT enabled |

Source: TechEmpower Framework Benchmarks Round 23, TechEmpower R23 Announcement

Important caveats: Many top TechEmpower entries are micro-optimized beyond production reality — custom allocators, raw SQL, SIMD parsing. The ASP.NET “platform” entry at 300K+ excludes standard framework features; the realistic MVC version scores ~184K. Java Jooby at 404K is not Spring Boot; Spring Boot with Thymeleaf scores 3-5x lower. Use these numbers for relative rankings, not absolute expectations.

What This Means for Real Applications

If your application makes 5-20 database calls per request (which describes most CRUD applications, admin panels, e-commerce sites, and content management systems), the language performance difference drops to 2-5x between the compiled and interpreted tiers.

At that point, query optimization, indexing, connection pooling, and caching strategy matter more than language choice. A poorly-indexed Django application with N+1 queries will be slower than a well-optimized Laravel application with proper eager loading, regardless of Python being “faster” than PHP in compute benchmarks.
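The N+1 pattern is worth making concrete. The sketch below fakes a database that counts round-trips; the queries and schema are hypothetical stand-ins for ORM calls (in Django the fix is `select_related`/`prefetch_related`, in Laravel it's eager loading with `with()`).

```python
# N+1 queries vs. eager loading, demonstrated with a fake DB that counts
# round-trips. The SQL is never parsed; every call just costs one trip.
class CountingDB:
    def __init__(self):
        self.queries = 0

    def run(self, sql: str) -> list[dict]:
        self.queries += 1  # every call is one network round-trip
        return [{"id": i, "author_id": i} for i in range(100)]


db = CountingDB()

# N+1: one query for the posts, then one more per post for its author.
posts = db.run("SELECT * FROM posts")
for post in posts:
    db.run(f"SELECT * FROM authors WHERE id = {post['author_id']}")
print(db.queries)  # 101 round-trips

db.queries = 0
# Eager loading: one query for posts, one batched query for all authors.
posts = db.run("SELECT * FROM posts")
author_ids = {p["author_id"] for p in posts}
db.run(f"SELECT * FROM authors WHERE id IN ({','.join(map(str, author_ids))})")
print(db.queries)  # 2 round-trips
```

At 0.5-2 ms per round-trip, the difference between 101 queries and 2 dwarfs any language-level speedup.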

Memory Efficiency Under Load

One final dimension: how much throughput do you get per megabyte of memory consumed?

| Language/Framework | Memory at 10K req/s | Req/s per MB | Efficiency Rating |
|---|---|---|---|
| Rust Axum | ~10 MB | ~2,100 | Exceptional |
| Go Gin | ~20 MB | ~500 | Excellent |
| Node.js Fastify | ~60 MB | ~155 | Good |
| C# ASP.NET Core | ~140 MB | ~105 | Good |
| Java Vert.x | ~500 MB | ~46 | Moderate |
| Python FastAPI | ~45 MB | ~26 | Moderate |
| Elixir Phoenix | ~150 MB | ~29 | Moderate |
| Ruby Rails | ~130 MB | ~18 | Low |
| Java Spring Boot | ~600 MB | ~17 | Low |
| PHP Laravel | ~85 MB | ~4 | Very Low |

Source: Sharkbench Web Framework Benchmark, memory measurements during sustained load

Rust delivers 2,100 requests per second per megabyte of memory. Laravel delivers 4. That’s a 500x difference in memory efficiency — which translates directly to infrastructure costs when you’re scaling horizontally.

The Bottom Line

Here’s how to think about all of this data when making real technology decisions:

If latency consistency matters (fintech, gaming, real-time): Rust or Go. Their P99/P50 ratios stay stable under any load. If you need the JVM ecosystem, use ZGC.

If peak throughput matters (high-traffic APIs): Rust for absolute maximum. Go for 60-80% of Rust’s throughput with dramatically simpler code. Java Vert.x or C# ASP.NET Core if you need those ecosystems.

If your application is database-heavy (most web apps): The language matters 3-5x less than benchmarks suggest. Pick the language that makes your team most productive and invest in query optimization, indexing, and caching. Even Python and Ruby are adequate when the database dominates response time.

If you need massive concurrent connections (chat, IoT, WebSockets): Elixir Phoenix (proven at 2M+ concurrent connections) or Go (goroutines scale to 100K+ connections trivially). Java with Virtual Threads (JDK 21+) is now competitive here.

If infrastructure cost matters (microservices at scale): Go or Rust. The 10-30x memory efficiency advantage compounds across dozens of services. But consider whether a single Java monolith would actually use less total memory than 50 Go microservices.

If developer productivity matters most (startups, small teams): Pick the language and framework your team knows best. The 5-20x performance difference between frameworks matters far less than the 2-5x productivity difference between a team writing idiomatic code in their preferred language versus wrestling with an unfamiliar one.

The benchmark data is clear. The right choice isn’t.


Want to try any of these languages? Every language linked in this article has a dedicated page on CodeArchaeology with Hello World tutorials and Docker images to get you running in minutes. Browse our complete collection of 70+ languages.

Sources

Primary Benchmarks

Java / JVM Performance

ASP.NET Core / .NET

Tail Latency and GC

Framework-Specific

TechEmpower Analysis

Concurrency Models
