The Weight of Your Web Stack, Part 2: What Your Backend Costs Under Load

In Part 1, we measured what web backend frameworks cost before serving a single request — idle memory, startup time, and Docker image sizes. We found a 30-100x spread between the lightest (Rust at 3 MB) and heaviest (Spring Boot at 500 MB) frameworks. In Part 3 we cover how to actually choose the right backend for your workload, team, and scale.

But idle cost is only half the story. Some frameworks with high startup costs become remarkably efficient once they’re running. Others with tiny idle footprints hit scaling walls under load. The question every developer and architect needs answered is: after your application is warmed up and running, how does each language actually perform?

We compiled data from TechEmpower Rounds 22 and 23, the Sharkbench Web Framework Benchmark (August 2025), and 40+ independent benchmark studies to find out.

Warmed-Up Throughput: The Steady-State Rankings

First, the raw numbers. This table shows throughput after warmup, measured on the same hardware (Sharkbench: Ryzen 7 7800X3D, Docker, 1 CPU core equivalent) running concurrent HTTP requests with JSON serialization and I/O operations.

| Language/Framework | Requests/sec | Median Latency | Memory | Stability* |
|---|---|---|---|---|
| Java Vert.x (Temurin JVM) | 23,116 | 1.3 ms | 484 MB | — |
| Bun.serve | 22,303 | 1.2 ms | 24.5 MB | 10.4% |
| Rust Actix | 21,965 | 1.4 ms | 16.6 MB | 66.6% |
| Rust Axum | 21,030 | 1.6 ms | 8.5 MB | 72.0% |
| Java Vert.x (Semeru) | 19,917 | 1.5 ms | 137 MB | — |
| C# ASP.NET Core | 14,707 | 1.2 ms | 136.5 MB | 2.6% |
| Node.js Fastify | 9,340 | 3.4 ms | 57 MB | 63.2% |
| Java Spring WebFlux (Semeru) | 7,051 | 1.2 ms | 130 MB | — |
| Java Quarkus Reactive | 6,473 | 0.7 ms | 341 MB | — |
| Node.js Express | 5,766 | 5.5 ms | 82.5 MB | 64.5% |
| Go FastHTTP | 5,567 | 0.7 ms | 13.4 MB | 0.8% |
| Elixir Phoenix (Bandit) | 4,375 | 7.3 ms | 145.5 MB | 84.9% |
| Go Gin | 3,546 | 1.0 ms | 16.7 MB | 1.1% |
| Ruby Rails 8 (YJIT) | 2,340 | 1.2 ms | 125 MB | 1.0% |
| Java Spring MVC (Semeru) | 2,305 | 1.1 ms | 157.5 MB | — |
| Python FastAPI (Uvicorn) | 1,185 | 21.0 ms | 41.2 MB | 21.2% |
| Java Spring MVC (Temurin) | 1,105 | 1.7 ms | 597 MB | — |
| Python Flask (Gunicorn) | 1,092 | 7.7 ms | 90.3 MB | 9.2% |
| Python Django (Gunicorn) | 950 | 8.8 ms | 130 MB | 10.3% |
| PHP Symfony 6.4 | 941 | 8.7 ms | 55.4 MB | 10.3% |
| PHP Laravel 11 | 299 | 101.7 ms | 84.2 MB | 56.5% |

* Stability = median latency as a percentage of P99 latency. Higher means more predictable response times.

Source: Sharkbench Web Framework Benchmark, August 2025
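The stability column is easy to recompute if you have raw latency samples. Here is a minimal sketch; the sample distribution below is synthetic, purely to illustrate the arithmetic, not Sharkbench's actual data.

```python
# Reproduce a stability score (median latency / P99 latency) from raw
# latency samples. The sample data here is synthetic and only illustrates
# how the ratio is derived.
import statistics


def stability_score(latencies_ms: list[float]) -> float:
    """Return median/P99 as a percentage; higher = more predictable tails."""
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank P99: the response time that 99% of requests beat.
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return 100.0 * median / p99


# 990 fast requests at 1.2 ms plus 10 slow outliers at 12 ms:
samples = [1.2] * 990 + [12.0] * 10
print(f"{stability_score(samples):.1f}%")  # a Bun-like ~10% tail profile
```

Read the table through this lens: Rust Axum's 72.0% means its P99 is only about 1.4x its median, while Bun's 10.4% means a roughly 10x tail.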

Several things jump out from this data.

Java Vert.x tops the chart — not Rust. At 23,116 req/s, Java’s reactive framework actually beats Rust Actix (21,965 req/s) on this benchmark. But it does so at a cost of 484 MB of memory, a 29x memory penalty compared to Rust Axum’s 8.5 MB for similar throughput.

Go looks surprisingly slow at 3,546-5,567 req/s. This is because Sharkbench limits each framework to 1 CPU core in Docker. Go’s goroutine model is designed for multi-core scaling — its per-core numbers are modest, but they multiply roughly linearly with available cores. On an 8-core machine, those numbers would be 3-6x higher.

Framework choice within a language matters enormously. Java Vert.x (23,116 req/s) versus Spring MVC (1,105-2,305 req/s) is a 10-20x difference within the same language. Node.js Fastify (9,340 req/s) versus Express (5,766 req/s) is a 1.6x gap. The framework you pick can matter as much as the language.

Bun has impressive raw speed but terrible stability — a 10.4% stability score means its P99 latency is nearly 10x its median latency. Fast on average, but unpredictable on the tail.

The JVM Warmup Question: Does It Pay Off?

This is the question Java developers ask most: after the JVM warms up and the JIT compiler kicks in, does Java actually become faster than Go, Rust, or C#?

Cold vs. Warmed Performance

The JVM doesn’t just run code — it learns from it. Through tiered JIT compilation, the C2 compiler identifies “hot” code paths and compiles them to highly optimized native machine code with aggressive inlining, escape analysis, and speculative optimizations based on runtime data.

| Metric | Cold JVM | Warmed JVM | Improvement |
|---|---|---|---|
| First request latency | 50-500 ms | 1-5 ms | 10-100x |
| Throughput (first 10 s) | 20-40% of peak | 100% of peak | 2.5-5x |
| Time to peak | Baseline | 15-45 seconds typical | — |
| P99 latency | Highly variable | Stabilizes within 2-5x of median | Dramatic |

The warmup effect is real and substantial. A payment service documented by Azul reduced time-to-peak-performance from 45 seconds to 12 seconds by pre-compiling 20 key methods. Teads, an ad-tech company, implemented a 2 minute 40 second warmup period before serving live traffic, which eliminated timeout spikes entirely.
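The warm-up-before-traffic pattern can be sketched as a readiness gate: exercise the hot code paths until latency settles, then report ready to the load balancer. Everything below (the window size, the tolerance, the `hit` callable) is an illustrative assumption, not Teads' or Azul's actual implementation.

```python
# Sketch of a JIT warmup gate: call a hot code path repeatedly until the
# rolling median latency stabilizes, then flip the readiness probe.
# Thresholds are illustrative assumptions, not any vendor's real logic.
import statistics
import time
from collections import deque
from typing import Callable


def warm_up(hit: Callable[[], None],
            window: int = 50,
            tolerance: float = 0.10,
            max_requests: int = 5000) -> int:
    """Call `hit` until the median latency of the last `window` calls is
    within `tolerance` of the previous window's median (i.e. the JIT has
    settled). Returns the number of warmup requests issued."""
    recent = deque(maxlen=window)
    previous_median = None
    for n in range(1, max_requests + 1):
        start = time.perf_counter()
        hit()
        recent.append(time.perf_counter() - start)
        if len(recent) == window and n % window == 0:
            median = statistics.median(recent)
            if previous_median is not None and \
               abs(median - previous_median) <= tolerance * previous_median:
                return n  # latency has stabilized; safe to serve traffic
            previous_median = median
    return max_requests  # give up and go live anyway
```

In a real deployment, `hit` would issue HTTP requests against the service's own hottest endpoints before the health check is allowed to pass.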

HotSpot JVM vs. GraalVM Native Image

A Spring PetClinic benchmark by Vincenzo Racca measured the trade-off directly:

| Metric | HotSpot JVM (JIT) | GraalVM Native Image (AOT) | Difference |
|---|---|---|---|
| Startup time | 7.18 seconds | 0.22 seconds | Native wins by 33x |
| Memory (RSS) | 1,751 MB | 694 MB | Native uses 40% as much |
| Peak throughput | 12,800 req/s | 10,249 req/s | JVM wins by 25% |

The JIT advantage is genuine but modest: 25% more throughput in exchange for 2.5x more memory and 33x slower startup. Whether that trade-off makes sense depends entirely on your deployment model.

At higher concurrency levels, the picture gets more interesting. The same benchmark at 200-300 concurrent users showed GraalVM Native Image actually edging out HotSpot JVM on throughput, likely because its lower memory footprint reduced GC pressure under load.

Does Warmed Java Beat Go or Rust?

No. The JVM warmup investment narrows the gap substantially but does not close it.

| Scenario | Java (warmed) | Go | Rust | C# (.NET 8) |
|---|---|---|---|---|
| JSON serialization | 50,000-100,000 | 80,000-150,000 | 500,000-1,000,000+ | 100,000-300,000 |
| Simple REST endpoint | 50,000-100,000 | 80,000-150,000 | 150,000-500,000 | 100,000-250,000 |
| DB-backed API (1 query) | 30,000-60,000 | 40,000-80,000 | 60,000-120,000 | 40,000-80,000 |
| P50 latency | 1-3 ms | 1-3 ms | 1-3 ms | 1-2 ms |
| P99 latency | 5-50 ms | 5-15 ms | 5-15 ms | 5-30 ms |

Throughput rows are requests/second; latency rows are milliseconds. Ranges span multiple benchmark sources, all measured after warmup on multi-core hardware. Sources: TechEmpower R23, Sharkbench, index.dev: Java vs Go vs Rust Comparison

After warmup, Java reaches 60-80% of Go’s throughput and 40-60% of Rust’s for HTTP workloads. Java Vert.x and other reactive frameworks can approach or match Go and C# on median latency, but P99 latency remains significantly worse due to garbage collection pauses.

Where Java is competitive after warmup: mature thread pool management, excellent connection pooling, and the JIT’s ability to optimize hot paths based on actual runtime behavior — optimizations that ahead-of-time compilers can’t make.

Where Java still loses: raw throughput, P99 tail latency, and memory consumption per unit of throughput (10-30x more than Rust/Go).

Performance at Scale: 100, 1,000, and 10,000 Connections

Abstract benchmarks are useful, but the question that matters in production is: how do these frameworks perform at different levels of real concurrency?

100 Concurrent Connections (Low Load)

At low concurrency, most languages perform well. The differences are smallest here.

| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 22,000-36,000 | 1-3 ms | 5-10 ms |
| Go net/http | 18,000-30,000 | 1-3 ms | 5-10 ms |
| C# ASP.NET Core | 16,000-27,000 | 1-3 ms | 5-15 ms |
| Java Vert.x (warmed) | 15,000-25,000 | 1-3 ms | 5-20 ms |
| Java Spring Boot (warmed) | 8,000-15,000 | 2-5 ms | 10-30 ms |
| Node.js Fastify | 8,000-13,000 | 3-5 ms | 10-30 ms |
| Elixir Phoenix | 4,000-8,000 | 5-10 ms | 15-40 ms |
| Ruby Rails (YJIT) | 2,000-4,500 | 5-15 ms | 30-80 ms |
| Python FastAPI | 1,000-3,000 | 15-30 ms | 50-150 ms |
| PHP Laravel | 300-1,000 | 50-100 ms | 200-400 ms |

Sources: TechEmpower R23, Sharkbench, Travis Luong: FastAPI vs Fastify vs Spring Boot vs Gin

1,000 Concurrent Connections (Medium Load)

Performance starts to differentiate. Languages with efficient concurrency models pull ahead.

| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 30,000-50,000 | 3-8 ms | 10-20 ms |
| Go net/http | 25,000-45,000 | 3-10 ms | 10-20 ms |
| C# ASP.NET Core | 20,000-40,000 | 5-12 ms | 15-40 ms |
| Java Vert.x (warmed) | 20,000-35,000 | 5-15 ms | 20-60 ms |
| Java Spring Boot (warmed) | 10,000-20,000 | 10-25 ms | 30-100 ms |
| Node.js Fastify | 8,000-12,000 | 10-25 ms | 30-80 ms |
| Go Gin | 8,000-15,000 | 5-15 ms | 15-30 ms |
| Elixir Phoenix | 4,000-7,000 | 10-20 ms | 30-60 ms |
| Python FastAPI (multi-worker) | 2,000-5,000 | 30-60 ms | 100-300 ms |
| Ruby Rails (YJIT) | 2,000-4,000 | 15-40 ms | 80-200 ms |
| PHP Laravel Octane | 1,500-4,000 | 20-50 ms | 100-250 ms |

Sources: TechEmpower R23, 2024 Fastest REST API Servers

10,000 Concurrent Connections (High Load)

This is where architecture matters more than micro-optimization. Languages without efficient concurrent connection handling hit walls.

| Language/Framework | Throughput | Avg Latency | P99 Latency |
|---|---|---|---|
| Rust Actix | 35,000-60,000 | 15-45 ms | 30-80 ms |
| Go net/http | 30,000-50,000 | 20-60 ms | 40-100 ms |
| C# ASP.NET Core | 15,000-35,000 | 20-60 ms | 50-150 ms |
| Java Vert.x (warmed) | 15,000-30,000 | 20-50 ms | 50-200 ms |
| Go Gin | 10,000-20,000 | 15-50 ms | 30-80 ms |
| Node.js Fastify (clustered) | 6,000-10,000 | 30-60 ms | 80-250 ms |
| Java Spring Boot (warmed) | 5,000-12,000 | 40-100 ms | 100-500 ms |
| Elixir Phoenix | 4,000-8,000 | 20-40 ms | 50-100 ms |
| Python FastAPI (multi-worker) | 1,500-4,000 | 60-150 ms | 200-500 ms |
| Ruby Rails (YJIT + Puma) | 1,500-3,000 | 50-150 ms | 200-600 ms |
| PHP Laravel Octane | 1,000-3,000 | 50-100 ms | 200-400 ms |

Sources: TechEmpower R23, Go: Managing 10K+ Concurrent Connections, How Fast Is ASP.NET Core? (dusted.codes)

The Scaling Gap Widens — Mostly

The ratio between the top and bottom tiers grows with concurrency:

| Concurrency | Top (Rust) | Bottom (Laravel) | Ratio |
|---|---|---|---|
| 100 | ~30,000 req/s | ~500 req/s | 60x |
| 1,000 | ~40,000 req/s | ~2,000 req/s | 20x |
| 10,000 | ~50,000 req/s | ~1,500 req/s | 33x |

But within the compiled tier (Rust, Go, C#, Java reactive), the gap actually narrows at high concurrency as the bottleneck shifts from CPU to I/O and connection management.

Elixir Phoenix deserves a special callout. Its raw throughput is moderate (4,000-8,000 req/s), but notice something remarkable: its P99 latency barely changes between 100 and 10,000 connections (15-40 ms vs 50-100 ms). Sharkbench measured Phoenix with the highest stability score of any framework at 84.9%. The BEAM VM’s preemptive scheduler ensures no single request can monopolize a CPU core, providing the most predictable latency profile of any runtime tested. If your requirement is “no request ever takes more than X milliseconds,” Phoenix is worth a serious look regardless of its moderate peak throughput.

Tail Latency: Where the Real Differences Live

Average response time is what you measure. Tail latency is what your users experience. P99 latency — the response time that 99% of requests beat — is where garbage collection, thread contention, and memory management show their true cost.

P99/P50 Ratio by Language

This ratio measures how consistent a language is. A ratio of 3x means the worst 1% of requests are 3 times slower than the median. A ratio of 20x means occasional requests are 20 times slower — visible as UI lag, timeout errors, and frustrated users.

| Language | P50 | P99 | P99/P50 Ratio | Why |
|---|---|---|---|---|
| Rust | 1-3 ms | 5-15 ms | 3-5x | No GC, no runtime pauses |
| Go | 1-3 ms | 5-15 ms | 3-5x | GC pauses <1 ms |
| Elixir | 5-10 ms | 15-40 ms | 3-4x | Per-process GC, preemptive scheduling |
| PHP (Laravel) | 50-100 ms | 200-500 ms | 4-5x | Per-request model is consistent (but slow) |
| C# | 1-2 ms | 5-30 ms | 5-15x | Generally good; occasional GC spikes |
| Node.js | 3-5 ms | 15-50 ms | 5-10x | Event loop blocking causes spikes |
| Python | 15-25 ms | 80-200 ms | 5-8x | GIL contention; worker-dependent |
| Ruby | 5-15 ms | 30-100 ms | 5-7x | GVL contention; YJIT helps |
| Java (G1GC) | 1-5 ms | 10-100 ms | 10-20x | GC stop-the-world pauses |

Sources: Sharkbench stability scores, Why Tail Latency Matters (Medium), More Evidence for Problems in VM Warmup (Laurence Tratt)

Why This Matters More at Scale

The P99/P50 ratio worsens under higher load for garbage-collected languages. More allocations per second means more frequent GC cycles, and more concurrent requests means each GC pause affects more in-flight requests.

At 10,000 concurrent connections:

  • Rust: P99/P50 stays at ~3-5x (barely changes from 100 connections)
  • Go: P99/P50 stays at ~3-5x (goroutine scheduler handles load gracefully)
  • Java (G1GC): P99/P50 grows to 20-50x (heap pressure triggers more aggressive GC)
  • Node.js: P99/P50 grows to 10-20x (event loop congestion, backpressure)

Java’s answer is ZGC (production-ready since JDK 15, with a generational mode added in JDK 21), which keeps GC pause times under 1 ms regardless of heap size. The trade-off is 5-10% lower throughput compared to G1GC. Most production Java deployments still use G1GC, but if P99 latency matters to your application, switching to ZGC is the single most impactful configuration change you can make.
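Enabling ZGC is a one-flag change (plus, on JDK 21, opting into the generational mode). A typical invocation looks like this; the heap size and `app.jar` name are illustrative placeholders, not a recommendation:

```shell
# JDK 21+: generational ZGC, sub-millisecond pauses regardless of heap size
java -XX:+UseZGC -XX:+ZGenerational -Xms2g -Xmx2g -jar app.jar

# JDK 15-20: non-generational ZGC (production-ready since JDK 15)
java -XX:+UseZGC -Xmx2g -jar app.jar
```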

The GC Landscape

| Language | GC Type | Typical Pause | P99 Impact |
|---|---|---|---|
| Rust | None | 0 ms | None |
| Go | Concurrent, low-pause | <1 ms | Minimal |
| Java (ZGC) | Concurrent, sub-ms | <1 ms | Minimal |
| C# (.NET 8) | Generational, background | 1-10 ms | Moderate |
| Java (G1GC) | Generational, concurrent | 5-50 ms | Significant |
| Java (ParallelGC) | Stop-the-world | 50-500 ms | Severe |

Sources: Baeldung: JVM Warm-Up, JVM Warmup Optimization (Java Code Geeks)

The Database Equalizer

Everything above measures compute-bound performance — how fast the language itself processes requests. But most real web applications don’t just process requests. They query databases. And that changes the picture dramatically.

How the Gap Compresses

TechEmpower runs six test types on the same hardware with the same PostgreSQL database. Watch how the performance spread changes as database involvement increases:

| Test Type | What It Measures | Performance Spread |
|---|---|---|
| Plaintext | Raw HTTP throughput | 100x+ |
| JSON | Serialization + HTTP | 50-80x |
| Single Query | 1 DB SELECT | 20-40x |
| Fortunes | SELECT + template rendering | 15-30x |
| Multiple Queries (20) | 20 DB SELECTs | 8-15x |
| Data Updates (20) | 20 SELECT + 20 UPDATE | 5-10x |

Source: TechEmpower Framework Benchmarks Round 23, GoFrame TechEmpower R23 Analysis

The performance gap between the fastest and slowest frameworks narrows from over 100x in plaintext to under 10x with database writes. This is the most important finding in this entire analysis for anyone building a typical web application.

Why? With 20 database queries per request, each taking 0.5-2 ms of round-trip time, the minimum request time is 10-40 ms regardless of language. A framework that processes its part in 0.1 ms (Rust) versus 2 ms (Python) only changes total response time from 40.1 ms to 42 ms — a 5% difference rather than the 20x difference you see in pure compute benchmarks.
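That arithmetic is worth pinning down. The sketch below uses the same illustrative numbers as the paragraph (0.1 ms vs 2 ms of framework time, 20 queries at 2 ms each); it is a back-of-the-envelope model, not a benchmark.

```python
# How much does framework speed matter once the database dominates?
# 20 queries at 2 ms each = 40 ms of unavoidable I/O per request.
def total_latency_ms(framework_ms: float, queries: int, per_query_ms: float) -> float:
    """Total request time = framework compute time + sequential DB time."""
    return framework_ms + queries * per_query_ms


rust = total_latency_ms(0.1, 20, 2.0)    # 40.1 ms
python = total_latency_ms(2.0, 20, 2.0)  # 42.0 ms
print(f"Rust: {rust} ms, Python: {python} ms, "
      f"difference: {100 * (python - rust) / rust:.1f}%")  # ~4.7%
```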

The Fortunes Test: The Most Realistic Benchmark

TechEmpower’s Fortunes test is the closest to a real web application: fetch rows from PostgreSQL, add a row in memory, sort, HTML-escape, and render via a template engine. Here’s how the major frameworks performed on 56-core hardware in Round 23:

| Language/Framework | Approx Requests/sec | Notes |
|---|---|---|
| C++ drogon | ~616,000 | Full MVC with templating |
| Rust xitca-web | ~588,000 | Proper MVC implementation |
| Java Jooby | ~404,000 | Lightweight Java framework (not Spring) |
| Rust Axum | ~400,000 | Full stack with PostgreSQL |
| Go atreugo | ~381,000 | Complete implementation |
| PHP mixphp | ~309,000 | Optimized PHP framework |
| C# ASP.NET Core (platform) | ~300,000+ | Stripped-down platform benchmark |
| C# ASP.NET Core MVC | ~184,000 | Realistic with templating engine |
| Node.js polkadot | ~125,000 | Optimized implementation |
| Java Spring Boot | ~60,000-80,000 | Full framework with templates |
| Elixir Phoenix | ~25,000-40,000 | BEAM VM |
| Python FastAPI | ~16,000-20,000 | Estimated from composite data |
| PHP Laravel | ~16,657 | Full framework |
| Ruby Rails | ~12,000-18,000 | With YJIT enabled |

Source: TechEmpower Framework Benchmarks Round 23, TechEmpower R23 Announcement

Important caveats: Many top TechEmpower entries are micro-optimized beyond production reality — custom allocators, raw SQL, SIMD parsing. The ASP.NET “platform” entry at 300K+ excludes standard framework features; the realistic MVC version scores ~184K. Java Jooby at 404K is not Spring Boot; Spring Boot with Thymeleaf scores 3-5x lower. Use these numbers for relative rankings, not absolute expectations.

What This Means for Real Applications

If your application makes 5-20 database calls per request (which describes most CRUD applications, admin panels, e-commerce sites, and content management systems), the language performance difference drops to 2-5x between the compiled and interpreted tiers.

At that point, query optimization, indexing, connection pooling, and caching strategy matter more than language choice. A poorly-indexed Django application with N+1 queries will be slower than a well-optimized Laravel application with proper eager loading, regardless of Python being “faster” than PHP in compute benchmarks.
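The N+1 pattern is worth making concrete. The sketch below fakes a database that counts round-trips; the queries and schema are hypothetical stand-ins for ORM calls (in Django the fix is `select_related`/`prefetch_related`, in Laravel it's eager loading with `with()`).

```python
# N+1 queries vs. eager loading, demonstrated with a fake DB that counts
# round-trips. The SQL is never parsed; every call just costs one trip.
class CountingDB:
    def __init__(self):
        self.queries = 0

    def run(self, sql: str) -> list[dict]:
        self.queries += 1  # every call is one network round-trip
        return [{"id": i, "author_id": i} for i in range(100)]


db = CountingDB()

# N+1: one query for the posts, then one more per post for its author.
posts = db.run("SELECT * FROM posts")
for post in posts:
    db.run(f"SELECT * FROM authors WHERE id = {post['author_id']}")
print(db.queries)  # 101 round-trips

db.queries = 0
# Eager loading: one query for posts, one batched query for all authors.
posts = db.run("SELECT * FROM posts")
author_ids = {p["author_id"] for p in posts}
db.run(f"SELECT * FROM authors WHERE id IN ({','.join(map(str, author_ids))})")
print(db.queries)  # 2 round-trips
```

At 0.5-2 ms per round-trip, the difference between 101 queries and 2 dwarfs any language-level speedup.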

Memory Efficiency Under Load

One final dimension: how much throughput do you get per megabyte of memory consumed?

| Language/Framework | Memory at 10K req/s | Req/s per MB | Efficiency Rating |
|---|---|---|---|
| Rust Axum | ~10 MB | ~2,100 | Exceptional |
| Go Gin | ~20 MB | ~500 | Excellent |
| Node.js Fastify | ~60 MB | ~155 | Good |
| C# ASP.NET Core | ~140 MB | ~105 | Good |
| Java Vert.x | ~500 MB | ~46 | Moderate |
| Python FastAPI | ~45 MB | ~26 | Moderate |
| Elixir Phoenix | ~150 MB | ~29 | Moderate |
| Ruby Rails | ~130 MB | ~18 | Low |
| Java Spring Boot | ~600 MB | ~17 | Low |
| PHP Laravel | ~85 MB | ~4 | Very Low |

Source: Sharkbench Web Framework Benchmark, memory measurements during sustained load

Rust delivers 2,100 requests per second per megabyte of memory. Laravel delivers 4. That’s a 500x difference in memory efficiency — which translates directly to infrastructure costs when you’re scaling horizontally.

The Bottom Line

Here’s how to think about all of this data when making real technology decisions:

If latency consistency matters (fintech, gaming, real-time): Rust or Go. Their P99/P50 ratios stay stable under any load. If you need the JVM ecosystem, use ZGC.

If peak throughput matters (high-traffic APIs): Rust for absolute maximum. Go for 60-80% of Rust’s throughput with dramatically simpler code. Java Vert.x or C# ASP.NET Core if you need those ecosystems.

If your application is database-heavy (most web apps): The language matters 3-5x less than benchmarks suggest. Pick the language that makes your team most productive and invest in query optimization, indexing, and caching. Even Python and Ruby are adequate when the database dominates response time.

If you need massive concurrent connections (chat, IoT, WebSockets): Elixir Phoenix (proven at 2M+ concurrent connections) or Go (goroutines scale to 100K+ connections trivially). Java with Virtual Threads (JDK 21+) is now competitive here.

If infrastructure cost matters (microservices at scale): Go or Rust. The 10-30x memory efficiency advantage compounds across dozens of services. But consider whether a single Java monolith would actually use less total memory than 50 Go microservices.

If developer productivity matters most (startups, small teams): Pick the language and framework your team knows best. The 5-20x performance difference between frameworks matters far less than the 2-5x productivity difference between a team writing idiomatic code in their preferred language versus wrestling with an unfamiliar one.

The benchmark data is clear. The right choice isn’t.


Want to try any of these languages? Every language linked in this article has a dedicated page on CodeArchaeology with Hello World tutorials and Docker images to get you running in minutes. Browse our complete collection of 70+ languages.

Sources

Primary Benchmarks

Java / JVM Performance

ASP.NET Core / .NET

Tail Latency and GC

Framework-Specific

TechEmpower Analysis

Concurrency Models
