Rust Engineering Practices
Release Profiles and Binary Size 🟡
What you'll learn:
- Release profile anatomy: LTO, codegen-units, panic strategy, strip, opt-level
- Thin vs Fat vs Cross-Language LTO trade-offs
- Binary size analysis with
cargo-bloat- Dependency trimming with
cargo-udepsandcargo-macheteCross-references: Compile-Time Tools — the other half of optimization · Benchmarking — measure runtime before you optimize · Dependencies — trimming deps reduces both size and compile time
The default cargo build --release is already good. But for production
deployment — especially single-binary tools deployed to thousands of servers —
there's a significant gap between "good" and "optimized." This chapter covers
the profile knobs and the tools to measure binary size.
Release Profile Anatomy
Cargo profiles control how rustc compiles your code. The defaults are
conservative — designed for broad compatibility, not maximum performance:
# Cargo.toml — Cargo's built-in defaults (what you get if you specify nothing)[profile.release]opt-level = 3 # Optimization level (0=none, 1=basic, 2=good, 3=aggressive)lto = false # Link-time optimization OFFcodegen-units = 16 # Parallel compilation units (faster compile, less optimization)panic = "unwind" # Stack unwinding on panic (larger binary, catch_unwind works)strip = "none" # Keep all symbols and debug infooverflow-checks = false # No integer overflow checks in releasedebug = false # No debug info in releaseProduction-optimized profile (what the project already uses):
[profile.release]lto = true # Full cross-crate optimizationcodegen-units = 1 # Single codegen unit — maximum optimization opportunitypanic = "abort" # No unwinding overhead — smaller, fasterstrip = true # Remove all symbols — smaller binaryThe impact of each setting:
| Setting | Default → Optimized | Binary Size | Runtime Speed | Compile Time |
|---|---|---|---|---|
lto = false → true | — | -10 to -20% | +5 to +20% | 2-5× slower |
codegen-units = 16 → 1 | — | -5 to -10% | +5 to +10% | 1.5-2× slower |
panic = "unwind" → "abort" | — | -5 to -10% | Negligible | Negligible |
strip = "none" → true | — | -50 to -70% | None | None |
opt-level = 3 → "s" | — | -10 to -30% | -5 to -10% | Similar |
opt-level = 3 → "z" | — | -15 to -40% | -10 to -20% | Similar |
Additional profile tweaks:
[profile.release]# All of the above, plus:overflow-checks = true # Keep overflow checks even in release (safety > speed)debug = "line-tables-only" # Minimal debug info for backtraces without full DWARFrpath = false # Don't embed runtime library pathsincremental = false # Disable incremental compilation (cleaner builds)# For size-optimized builds (embedded, WASM):# opt-level = "z" # Optimize for size aggressively# strip = "symbols" # Strip symbols but keep debug sectionsPer-crate profile overrides — optimize hot crates, leave others alone:
# Dev builds: optimize dependencies but not your code (fast recompile)[profile.dev.package."*"]opt-level = 2 # Optimize all dependencies in dev mode# Release builds: override specific crate optimization[profile.release.package.serde_json]opt-level = 3 # Maximum optimization for JSON parsingcodegen-units = 1# Test profile: match release behavior for accurate integration tests[profile.test]opt-level = 1 # Some optimization to avoid timeout in slow testsLTO in Depth — Thin vs Fat vs Cross-Language
Link-Time Optimization lets LLVM optimize across crate boundaries — inlining
functions from serde_json into your parsing code, removing dead code from
regex, etc. Without LTO, each crate is a separate optimization island.
[profile.release]# Option 1: Fat LTO (default when lto = true)lto = true# All code merged into one LLVM module → maximum optimization# Slowest compile, smallest/fastest binary# Option 2: Thin LTOlto = "thin"# Each crate stays separate but LLVM does cross-module optimization# Faster compile than fat LTO, nearly as good optimization# Best trade-off for most projects# Option 3: No LTOlto = false# Only intra-crate optimization# Fastest compile, larger binary# Option 4: Off (explicit)lto = "off"# Same as falseFat LTO vs Thin LTO:
| Aspect | Fat LTO (true) | Thin LTO ("thin") |
|---|---|---|
| Optimization quality | Best | ~95% of fat |
| Compile time | Slow (all code in one module) | Moderate (parallel modules) |
| Memory usage | High (all LLVM IR in memory) | Lower (streaming) |
| Parallelism | None (single module) | Good (per-module) |
| Recommended for | Final release builds | CI builds, development |
Cross-language LTO — optimize across Rust and C boundaries:
[profile.release]lto = true# Cargo.toml — for crates using the cc crate[build-dependencies]cc = "1.0"// build.rs — enable cross-language (linker-plugin) LTO
fn main() {
// The cc crate respects CFLAGS from the environment.
// For cross-language LTO, compile C code with:
// -flto=thin -O2
cc::Build::new()
.file("csrc/fast_parser.c")
.flag("-flto=thin")
.opt_level(2)
.compile("fast_parser");
}# Enable linker-plugin LTO (requires compatible LLD or gold linker)RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" \ cargo build --releaseCross-language LTO allows LLVM to inline C functions into Rust callers and vice versa. This is most impactful for FFI-heavy code where small C functions are called frequently (e.g., IPMI ioctl wrappers).
Binary Size Analysis with cargo-bloat
cargo-bloat answers:
"What functions and crates are taking up the most space in my binary?"
# Installcargo install cargo-bloat# Show largest functionscargo bloat --release -n 20# Output:# File .text Size Crate Name# 2.8% 5.1% 78.5KiB serde_json serde_json::de::Deserializer::parse_...# 2.1% 3.8% 58.2KiB regex_syntax regex_syntax::ast::parse::ParserI::p...# 1.5% 2.7% 42.1KiB accel_diag accel_diag::vendor::parse_smi_output# ...# Show by crate (which dependencies are biggest)cargo bloat --release --crates# Output:# File .text Size Crate# 12.3% 22.1% 340KiB serde_json# 8.7% 15.6% 240KiB regex# 6.2% 11.1% 170KiB std# 5.1% 9.2% 141KiB accel_diag# ...# Compare two builds (before/after optimization)cargo bloat --release --crates > before.txt# ... make changes ...cargo bloat --release --crates > after.txtdiff before.txt after.txtCommon bloat sources and fixes:
| Bloat Source | Typical Size | Fix |
|---|---|---|
regex (full engine) | 200-400 KB | Use regex-lite if you don't need Unicode |
serde_json (full) | 200-350 KB | Consider simd-json or sonic-rs if perf matters |
| Generics monomorphization | Varies | Use dyn Trait at API boundaries |
Formatting machinery (Display, Debug) | 50-150 KB | #[derive(Debug)] on large enums adds up |
| Panic message strings | 20-80 KB | panic = "abort" removes unwinding, strip removes strings |
| Unused features | Varies | Disable default features: serde = { version = "1", default-features = false } |
Trimming Dependencies with cargo-udeps
cargo-udeps finds dependencies
declared in Cargo.toml that your code doesn't actually use:
# Install (requires nightly)cargo install cargo-udeps# Find unused dependenciescargo +nightly udeps --workspace# Output:# unused dependencies:# `diag_tool v0.1.0`# └── "tempfile" (dev-dependency)## `accel_diag v0.1.0` # └── "once_cell" ← was needed before LazyLock, now deadEvery unused dependency:
- Increases compile time
- Increases binary size
- Adds supply chain risk
- Adds potential license complications
Alternative: cargo-machete — faster, heuristic-based approach:
cargo install cargo-machetecargo machete# Faster but may have false positives (heuristic, not compilation-based)Size Optimization Decision Tree
🏋️ Exercises
🟢 Exercise 1: Measure LTO Impact
Build a project with default release settings, then with lto = true + codegen-units = 1 + strip = true. Compare binary size and compile time.
Solution
# Default releasecargo build --releasels -lh target/release/my-binarytime cargo build --release # Note time# Optimized release — add to Cargo.toml:# [profile.release]# lto = true# codegen-units = 1# strip = true# panic = "abort"cargo cleancargo build --releasels -lh target/release/my-binary # Typically 30-50% smallertime cargo build --release # Typically 2-3× slower to compile🟡 Exercise 2: Find Your Biggest Crate
Run cargo bloat --release --crates on a project. Identify the largest dependency. Can you reduce it by disabling default features or switching to a lighter alternative?
Solution
cargo install cargo-bloatcargo bloat --release --crates# Output:# File .text Size Crate# 12.3% 22.1% 340KiB serde_json# 8.7% 15.6% 240KiB regex# For regex — try regex-lite if you don't need Unicode:# regex-lite = "0.1" # ~10× smaller than full regex# For serde — disable default features if you don't need std:# serde = { version = "1", default-features = false, features = ["derive"] }cargo bloat --release --crates # Compare after changesKey Takeaways
lto = true+codegen-units = 1+strip = true+panic = "abort"is the production release profile- Thin LTO (
lto = "thin") gives 80% of Fat LTO's benefit at a fraction of the compile cost cargo-bloat --cratestells you exactly which dependencies are eating binary spacecargo-udepsandcargo-machetefind dead dependencies that waste compile time and binary size- Per-crate profile overrides let you optimize hot crates without slowing the whole build