Rust Engineering Practices

Miri, Valgrind, and Sanitizers — Verifying Unsafe Code 🔴

What you'll learn:

  • Miri as a MIR interpreter — what it catches (aliasing, UB, leaks) and what it can't (FFI, syscalls)
  • Valgrind memcheck, Helgrind (data races), Callgrind (profiling), and Massif (heap)
  • LLVM sanitizers: ASan, MSan, TSan, LSan with nightly -Zbuild-std
  • cargo-fuzz for crash discovery and loom for concurrency model checking
  • A decision tree for choosing the right verification tool

Cross-references: Code Coverage (coverage finds untested paths, Miri verifies the tested ones) · no_std & Features (no_std code often requires unsafe that Miri can verify) · CI/CD Pipeline (Miri job in the pipeline)

Safe Rust guarantees memory safety and data-race freedom at compile time. But the moment you write unsafe — for FFI, hand-rolled data structures, or performance tricks — those guarantees become your responsibility. This chapter covers the tools that verify your unsafe code actually upholds the safety contracts it claims.

Miri — An Interpreter for Unsafe Rust

Miri is an interpreter for Rust's Mid-level Intermediate Representation (MIR). Instead of compiling to machine code, Miri executes your program step-by-step with exhaustive checks for undefined behavior at every operation.

# Install Miri (nightly-only component)
rustup +nightly component add miri

# Run your test suite under Miri
cargo +nightly miri test

# Run a specific binary under Miri
cargo +nightly miri run

# Run a specific test
cargo +nightly miri test -- test_name

How Miri works:

Source → rustc → MIR → Miri interprets MIR
                        │
                        ├─ Tracks every pointer's provenance
                        ├─ Validates every memory access
                        ├─ Checks alignment at every deref
                        ├─ Detects use-after-free
                        ├─ Detects data races (with threads)
                        └─ Enforces Stacked Borrows / Tree Borrows rules

What Miri Catches (and What It Cannot)

Miri detects:

| Category | Example | Would Crash at Runtime? |
|---|---|---|
| Out-of-bounds access | ptr.add(100).read() past allocation | Sometimes (depends on page layout) |
| Use after free | Reading a dropped Box through raw pointer | Sometimes (depends on allocator) |
| Double free | Calling drop_in_place twice | Usually |
| Unaligned access | (ptr as *const u32).read() on odd address | On some architectures |
| Invalid values | transmute::<u8, bool>(2) | Silently wrong |
| Dangling references | &*ptr where ptr is freed | No (silent corruption) |
| Data races | Two threads, one writing, no synchronization | Intermittent, hard to reproduce |
| Stacked Borrows violation | Aliasing &mut references | No (silent corruption) |
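Two rows of the table (unaligned access and invalid values) have simple sound counterparts. The sketch below is illustrative only: `read_u32_at` and `bool_from_byte` are hypothetical helper names, not part of any crate discussed here. It shows the shape of code that Miri should accept where the table's examples would be flagged.

```rust
/// Reads a little-endian u32 starting at `offset`, which may be misaligned.
/// Returns None if fewer than 4 bytes are available.
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let chunk = bytes.get(offset..end)?;
    // SAFETY: `chunk` is exactly 4 readable bytes, and `read_unaligned` has no
    // alignment requirement (unlike `(ptr as *const u32).read()`).
    let raw = unsafe { (chunk.as_ptr() as *const u32).read_unaligned() };
    Some(u32::from_le(raw))
}

/// Converts a byte to a bool without transmute, rejecting invalid bit patterns.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        // transmute::<u8, bool>(2) would be undefined behavior here.
        _ => None,
    }
}
```

In fully safe code the first helper could instead use `u32::from_le_bytes` on a 4-byte array; the raw-pointer version is kept only to mirror the table's example.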

Miri does NOT detect:

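| Limitation | Why |
|---|---|
| Logic bugs | Miri checks memory safety, not correctness |
| Most deadlocks and livelocks | Miri explores one thread schedule per run; it reports a deadlock only if that schedule actually hits it |
| Performance issues | Interpretation is 10-100× slower than native |
| OS/hardware interaction | Miri emulates only a limited set of OS APIs (some require -Zmiri-disable-isolation) and no device I/O |
| All FFI calls | Can't interpret C code (only Rust MIR) |
| Exhaustive path coverage | Only tests the paths your test suite reaches |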

A concrete example — catching unsound code that "works" in practice:

#[cfg(test)]
mod tests {
    #[test]
    fn test_miri_catches_ub() {
        // This "works" in release builds but is undefined behavior
        let mut v = vec![1, 2, 3];
        let ptr = v.as_ptr();

        // Push may reallocate, invalidating ptr
        v.push(4);

        // ❌ UB: ptr may be dangling after reallocation
        // Miri will catch this even if the allocator happens to
        // not move the buffer.
        // let _val = unsafe { *ptr };
        // Miri reports the read as a use of a dangling pointer, e.g.:
        //   "pointer to alloc1234 was dereferenced after this
        //    allocation got freed" (exact wording varies by Miri version)
        
        // ✅ Correct: get a fresh pointer after mutation
        let ptr = v.as_ptr();
        let val = unsafe { *ptr };
        assert_eq!(val, 1);
    }
}

Running Miri on a Real Crate

Practical Miri workflow for a crate with unsafe:

# Step 1: Run all tests under Miri
cargo +nightly miri test 2>&1 | tee miri_output.txt

# Step 2: If Miri reports errors, isolate them
cargo +nightly miri test -- failing_test_name

# Step 3: Use Miri's backtrace for diagnosis
MIRIFLAGS="-Zmiri-backtrace=full" cargo +nightly miri test

# Step 4: Choose a borrow model
# Stacked Borrows (default, stricter):
cargo +nightly miri test
# Tree Borrows (experimental, more permissive):
MIRIFLAGS="-Zmiri-tree-borrows" cargo +nightly miri test

Miri flags for common scenarios:

# Disable isolation (allow file system access, env vars)
MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test

# Memory leak detection is ON by default in Miri.
# To suppress leak errors (e.g., for intentional leaks):
# MIRIFLAGS="-Zmiri-ignore-leaks" cargo +nightly miri test

# Seed the RNG for reproducible results with randomized tests
MIRIFLAGS="-Zmiri-seed=42" cargo +nightly miri test

# Enable strict provenance checking
MIRIFLAGS="-Zmiri-strict-provenance" cargo +nightly miri test

# Multiple flags
MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-backtrace=full -Zmiri-strict-provenance" \
    cargo +nightly miri test

Miri in CI:

# .github/workflows/miri.yml
name: Miri
on: [push, pull_request]

jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with:
          components: miri
      - name: Run Miri
        run: cargo miri test --workspace
        env:
          MIRIFLAGS: "-Zmiri-backtrace=full"
          # Leak checking is on by default.
          # Skip tests that use system calls Miri can't handle
          # (file I/O, networking, etc.)

Performance note: Miri is 10-100× slower than native execution. A test suite that runs in 5 seconds natively may take 5 minutes under Miri. In CI, run Miri on a focused subset: crates with unsafe code only.
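One way to keep Miri runs focused is to shrink or skip expensive tests only when the interpreter is active. The sketch below assumes a hypothetical function `sum_raw` standing in for real unsafe code. The `cfg!(miri)` check and `#[cfg_attr(miri, ignore)]` are the actual mechanisms; the workload sizes are arbitrary.

```rust
/// Hypothetical function under test: sums a slice through a raw pointer.
fn sum_raw(values: &[u64]) -> u64 {
    let mut total = 0u64;
    let ptr = values.as_ptr();
    for i in 0..values.len() {
        // SAFETY: `i < values.len()`, so `ptr.add(i)` stays inside the slice.
        total += unsafe { *ptr.add(i) };
    }
    total
}

/// Picks a smaller workload when the code is being interpreted by Miri.
fn workload_size() -> usize {
    if cfg!(miri) { 100 } else { 100_000 }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs everywhere, but with far fewer elements under Miri.
    #[test]
    fn sum_matches_iterator() {
        let data: Vec<u64> = (0..workload_size() as u64).collect();
        assert_eq!(sum_raw(&data), data.iter().sum::<u64>());
    }

    // Skipped under Miri; still runs under plain `cargo test`.
    #[test]
    #[cfg_attr(miri, ignore)]
    fn large_input_stress() {
        let data = vec![1u64; 5_000_000];
        assert_eq!(sum_raw(&data), 5_000_000);
    }
}
```

The unsafe path is identical in both configurations, so the smaller Miri workload still exercises the same pointer arithmetic.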

Valgrind and Its Rust Integration

Valgrind is the classic C/C++ memory checker. It works on compiled Rust binaries too, checking for memory errors at the machine-code level.

# Install Valgrind
sudo apt install valgrind  # Debian/Ubuntu
sudo dnf install valgrind  # Fedora

# Build with debug info (Valgrind needs symbols)
cargo build --tests
# or for release with debug info:
# cargo build --release
# [profile.release]
# debug = true

# Run a specific test binary under Valgrind
valgrind --tool=memcheck \
    --leak-check=full \
    --show-leak-kinds=all \
    --track-origins=yes \
    ./target/debug/deps/my_crate-abc123 --test-threads=1

# Run the main binary
valgrind --tool=memcheck \
    --leak-check=full \
    --error-exitcode=1 \
    ./target/debug/diag_tool --run-diagnostics

Valgrind tools beyond memcheck:

| Tool | Command | What It Detects |
|---|---|---|
| Memcheck | --tool=memcheck | Memory leaks, use-after-free, buffer overflows |
| Helgrind | --tool=helgrind | Data races and lock-order violations |
| DRD | --tool=drd | Data races (different detection algorithm) |
| Callgrind | --tool=callgrind | CPU instruction profiling (call-graph level) |
| Massif | --tool=massif | Heap memory profiling over time |
| Cachegrind | --tool=cachegrind | Cache miss analysis |

Using Callgrind for instruction-level profiling:

# Record instruction counts (more stable than wall-clock time)
valgrind --tool=callgrind \
    --callgrind-out-file=callgrind.out \
    ./target/release/diag_tool --run-diagnostics

# Visualize with KCachegrind
kcachegrind callgrind.out
# or the text-based alternative:
callgrind_annotate callgrind.out | head -100

Miri vs Valgrind — when to use which:

| Aspect | Miri | Valgrind |
|---|---|---|
| Checks Rust-specific UB | ✅ Stacked/Tree Borrows | ❌ Not aware of Rust rules |
| Checks C FFI code | ❌ Can't interpret C | ✅ Checks all machine code |
| Needs nightly | ✅ Yes | ❌ No |
| Speed | 10-100× slower | 10-50× slower |
| Platform | Any (interprets MIR) | Linux, macOS (runs native code) |
| Data race detection | ✅ Yes | ✅ Yes (Helgrind/DRD) |
| Leak detection | ✅ Yes | ✅ Yes (more thorough) |
| False positives | Very rare | Occasional (especially with allocators) |

Use both:

  • Miri for pure-Rust unsafe code (Stacked Borrows, provenance)
  • Valgrind for FFI-heavy code and whole-program leak analysis
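A typical place where Valgrind's leak check pays off is ownership handed across an FFI boundary. The following is a minimal sketch, not the project's code: `export_label`, `free_label`, and `round_trip` are hypothetical names, and no real C library is involved. If a caller forgot `free_label`, a run under `valgrind --leak-check=full` would be expected to report the lost allocation.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands ownership of a NUL-terminated string to the caller (e.g. C code).
fn export_label(label: &str) -> *mut c_char {
    CString::new(label)
        .expect("label must not contain interior NUL bytes")
        .into_raw()
}

/// Takes ownership back and frees the string.
///
/// # Safety
/// `ptr` must be null or a pointer returned by `export_label` that has not
/// been freed yet.
unsafe fn free_label(ptr: *mut c_char) {
    if !ptr.is_null() {
        // SAFETY: guaranteed by this function's contract.
        drop(unsafe { CString::from_raw(ptr) });
    }
}

/// Simulates a foreign caller: borrow the string, copy it, then release it.
fn round_trip(label: &str) -> String {
    let ptr = export_label(label);
    // SAFETY: `ptr` points to a valid NUL-terminated string until freed below.
    let copy = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    // SAFETY: `ptr` came from `export_label` and is freed exactly once.
    unsafe { free_label(ptr) };
    copy
}
```

Because this particular sketch is pure Rust, Miri can check it too; Valgrind becomes the only option once the pointer actually passes through C code.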

AddressSanitizer, MemorySanitizer, ThreadSanitizer

LLVM sanitizers are compile-time instrumentation passes that insert runtime checks. They're faster than Valgrind (2-5× overhead vs 10-50×) and catch different classes of bugs.

# Required: install Rust source for rebuilding std with sanitizer instrumentation
rustup component add rust-src --toolchain nightly

# AddressSanitizer (ASan): heap and stack buffer overflows, use-after-free
RUSTFLAGS="-Zsanitizer=address" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# MemorySanitizer (MSan): uninitialized memory reads
RUSTFLAGS="-Zsanitizer=memory" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# ThreadSanitizer (TSan): data races
RUSTFLAGS="-Zsanitizer=thread" \
    cargo +nightly test -Zbuild-std --target x86_64-unknown-linux-gnu

# LeakSanitizer (LSan): memory leaks (included in ASan by default)
RUSTFLAGS="-Zsanitizer=leak" \
    cargo +nightly test --target x86_64-unknown-linux-gnu

Note: ASan, MSan, and TSan require -Zbuild-std to rebuild the standard library with sanitizer instrumentation. LSan does not.

Sanitizer comparison:

| Sanitizer | Overhead | Catches | Nightly? | -Zbuild-std? |
|---|---|---|---|---|
| ASan | 2× memory, 2× CPU | Heap and stack buffer overflows, use-after-free | Yes | Yes |
| MSan | 3× memory, 3× CPU | Uninitialized reads | Yes | Yes |
| TSan | 5-10× memory, 5× CPU | Data races | Yes | Yes |
| LSan | Minimal | Memory leaks | Yes | No |

Practical example — catching a data race with TSan:

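use std::cell::UnsafeCell;
use std::sync::Arc;
use std::thread;

// UnsafeCell<u64> is !Sync, so the compiler would normally reject sharing it
// across threads. This wrapper deliberately (and unsoundly) opts out.
struct RacyCell(UnsafeCell<u64>);

// SAFETY: UNSOUND on purpose. Nothing synchronizes access to the inner value.
unsafe impl Sync for RacyCell {}

fn racy_counter() -> u64 {
    // ❌ UB: unsynchronized shared mutable state
    let data = Arc::new(RacyCell(UnsafeCell::new(0u64)));
    let mut handles = vec![];

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // SAFETY: UNSOUND: data race!
                unsafe {
                    *data.0.get() += 1;
                }
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // Value should be 4000 but may be anything due to race
    unsafe { *data.0.get() }
}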

// Both Miri and TSan catch this:
// Miri:  "Data race detected between (1) write and (2) write"
// TSan:  "WARNING: ThreadSanitizer: data race"
//
// Fix: use AtomicU64 or Mutex<u64>
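The fix named in the comment above can be sketched as follows. This is one possible repair of `racy_counter`, using `AtomicU64`; the function name `atomic_counter` is introduced here for illustration. `Ordering::Relaxed` is assumed to be sufficient because only the final count matters and `join` already orders the threads' writes before the final load.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn atomic_counter() -> u64 {
    // AtomicU64 is Sync, so no unsafe code or wrapper type is needed.
    let data = Arc::new(AtomicU64::new(0));
    let mut handles = Vec::new();

    for _ in 0..4 {
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // Each increment is a single atomic read-modify-write.
                data.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // All threads have been joined, so every increment is visible here.
    data.load(Ordering::Relaxed)
}
```

Neither Miri nor TSan should report a race for this version, since every access to the shared value goes through an atomic operation.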

cargo-fuzz — Coverage-Guided Fuzzing (finds crashes in parsers and decoders):

# Install
cargo install cargo-fuzz

# Initialize a fuzz target
cargo fuzz init
cargo fuzz add parse_gpu_csv
// fuzz/fuzz_targets/parse_gpu_csv.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    if let Ok(s) = std::str::from_utf8(data) {
        // The fuzzer generates millions of inputs looking for panics/crashes.
        let _ = diag_tool::parse_gpu_csv(s);
    }
});
# Run the fuzzer (runs until interrupted or crash found)
cargo +nightly fuzz run parse_gpu_csv -- -max_total_time=300  # 5 minutes

# Minimize a crash
cargo +nightly fuzz tmin parse_gpu_csv artifacts/parse_gpu_csv/crash-...

When to fuzz: Any function that parses untrusted or semi-trusted input (sensor output, config files, network data, JSON/CSV). Fuzzing has found real bugs in many widely used Rust crates, including parsers and decoders such as regex and image.
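A fuzz target is most useful when the parser reports bad input through `Result` rather than panicking, so that any panic the fuzzer finds is a genuine bug. The sketch below shows what such a parser might look like. The `name,temp_c` line format, the `GpuReading` type, and the `ParseError` variants are assumptions made for illustration; the real `parse_gpu_csv` is not shown in this chapter.

```rust
#[derive(Debug, PartialEq)]
struct GpuReading {
    name: String,
    temp_c: f32,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// A required field was missing or empty on the given 1-based line.
    MissingField(usize),
    /// The temperature field on the given 1-based line was not a number.
    BadTemperature(usize),
}

/// Parses lines of the assumed form `name,temp_c`, skipping blank lines.
/// Every malformed input maps to an error value; nothing indexes or unwraps.
fn parse_gpu_csv(input: &str) -> Result<Vec<GpuReading>, ParseError> {
    let mut readings = Vec::new();
    for (idx, line) in input.lines().enumerate() {
        let line_no = idx + 1;
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let mut fields = line.split(',');
        let name = fields
            .next()
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .ok_or(ParseError::MissingField(line_no))?;
        let temp = fields.next().ok_or(ParseError::MissingField(line_no))?;
        let temp_c = temp
            .trim()
            .parse::<f32>()
            .map_err(|_| ParseError::BadTemperature(line_no))?;
        readings.push(GpuReading { name: name.to_owned(), temp_c });
    }
    Ok(readings)
}
```

With this shape, the fuzz harness can discard the `Result` as shown earlier and treat only panics, hangs, and sanitizer reports as findings.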

loom — Concurrency Model Checker (exhaustively tests atomic orderings):

[dev-dependencies]
loom = "0.7"
#[cfg(loom)]
mod tests {
    use loom::sync::atomic::{AtomicUsize, Ordering};
    use loom::thread;

    #[test]
    fn test_counter_is_atomic() {
        loom::model(|| {
            let counter = loom::sync::Arc::new(AtomicUsize::new(0));
            let c1 = counter.clone();
            let c2 = counter.clone();

            let t1 = thread::spawn(move || { c1.fetch_add(1, Ordering::SeqCst); });
            let t2 = thread::spawn(move || { c2.fetch_add(1, Ordering::SeqCst); });

            t1.join().unwrap();
            t2.join().unwrap();

            // loom explores ALL possible thread interleavings
            assert_eq!(counter.load(Ordering::SeqCst), 2);
        });
    }
}

When to use loom: When you have lock-free data structures or custom synchronization primitives. Loom exhaustively explores thread interleavings — it's a model checker, not a stress test. Not needed for Mutex/RwLock-based code.
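For loom to explore a data structure, the structure itself has to use loom's atomics while the model runs. A common arrangement, sketched here with a hypothetical `IdAllocator` type, is to switch the import on the `loom` cfg so that normal builds keep using `std`. The `loom` cfg is typically enabled with `RUSTFLAGS="--cfg loom" cargo test`.

```rust
// Under `--cfg loom`, use loom's instrumented atomics; otherwise use std.
#[cfg(loom)]
use loom::sync::atomic::{AtomicUsize, Ordering};
#[cfg(not(loom))]
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical lock-free ID allocator used only for illustration.
pub struct IdAllocator {
    next: AtomicUsize,
}

impl IdAllocator {
    // Not a `const fn`: loom's `AtomicUsize::new` is not const.
    pub fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Returns a unique ID; `fetch_add` makes the increment atomic.
    pub fn allocate(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

A `loom::model` test can then spawn two loom threads that each call `allocate` and assert that the returned IDs differ in every explored interleaving.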

When to Use Which Tool

Decision tree for unsafe verification:

Is the code pure Rust (no FFI)?
├─ Yes → Use Miri (catches Rust-specific UB, Stacked Borrows)
│        Also run ASan in CI for defense-in-depth
└─ No (calls C/C++ code via FFI)
   ├─ Memory safety concerns?
   │  └─ Yes → Use Valgrind memcheck AND ASan
   ├─ Concurrency concerns?
   │  └─ Yes → Use TSan (faster) or Helgrind (more thorough)
   └─ Memory leak concerns?
      └─ Yes → Use Valgrind --leak-check=full

Recommended CI matrix:

# Run all tools in parallel for fast feedback
jobs:
  miri:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: miri }
      - run: cargo miri test --workspace

  asan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with: { components: rust-src }  # needed by -Zbuild-std
      - run: |
          RUSTFLAGS="-Zsanitizer=address" \
          cargo test -Zbuild-std --target x86_64-unknown-linux-gnu

  valgrind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: sudo apt-get install -y valgrind
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo build --tests
      - run: |
          for test_bin in $(find target/debug/deps -maxdepth 1 -executable -type f ! -name '*.d'); do
            valgrind --error-exitcode=1 --leak-check=full "$test_bin" --test-threads=1
          done

Application: Zero Unsafe — and When You'll Need It

The project contains zero unsafe blocks across 90K+ lines of Rust. This is a remarkable achievement for a systems-level diagnostics tool and demonstrates that safe Rust is sufficient for:

  • IPMI communication (via std::process::Command to ipmitool)
  • GPU queries (via std::process::Command to accel-query)
  • PCIe topology parsing (pure JSON/text parsing)
  • SEL record management (pure data structures)
  • DER report generation (JSON serialization)

When will the project need unsafe?

The likely triggers for introducing unsafe:

| Scenario | Why unsafe | Recommended Verification |
|---|---|---|
| Direct ioctl-based IPMI | libc::ioctl() bypasses ipmitool subprocess | Miri (where possible) + Valgrind |
| Direct GPU driver queries | accel-mgmt FFI instead of accel-query parsing | Valgrind (C library) |
| Memory-mapped PCIe config | mmap for direct config-space reads | ASan + Valgrind |
| Lock-free SEL buffer | AtomicPtr for concurrent event collection | Miri + TSan |
| Embedded/no_std variant | Raw pointer manipulation for bare-metal | Miri |

Preparation: Before introducing unsafe, add the verification tools to CI:

# Cargo.toml: add a feature flag for unsafe optimizations
[features]
default = []
direct-ipmi = []       # Enable direct ioctl IPMI instead of ipmitool subprocess
direct-accel-api = []  # Enable accel-mgmt FFI instead of accel-query parsing
// src/ipmi.rs — gated behind a feature flag
#[cfg(feature = "direct-ipmi")]
mod direct {
    //! Direct IPMI device access via /dev/ipmi0 ioctl.
    //!
    //! # Safety
    //! This module uses `unsafe` for ioctl system calls.
    //! Verified with: Miri (where possible), Valgrind memcheck, ASan.

    use std::os::unix::io::RawFd;

    // ... unsafe ioctl implementation ...
}

#[cfg(not(feature = "direct-ipmi"))]
mod subprocess {
    //! IPMI via ipmitool subprocess (default, fully safe).
    // ... current implementation ...
}

Key insight: Keep unsafe behind feature flags so it can be verified independently. Run cargo +nightly miri test --features direct-ipmi in CI to continuously verify the unsafe paths without affecting the safe default build.
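One way to make the feature-gated split easier to verify is to give both backends the same safe signature and to forbid unsafe code outright in the default one. The sketch below is an assumption-laden illustration: `SensorReading`, `read_sensor`, and `tool_output` are hypothetical names, and the default backend fakes the subprocess output instead of spawning ipmitool.

```rust
/// Hypothetical reading type shared by both backends.
#[derive(Debug, PartialEq)]
pub struct SensorReading {
    pub sensor_id: u8,
    pub raw: u8,
}

#[cfg(feature = "direct-ipmi")]
mod backend {
    use super::SensorReading;

    /// Placeholder: the ioctl-based path would live here, with its `unsafe`
    /// blocks kept private behind this safe signature.
    pub fn read_sensor(_sensor_id: u8) -> Option<SensorReading> {
        None
    }
}

#[cfg(not(feature = "direct-ipmi"))]
#[forbid(unsafe_code)] // the default backend cannot grow `unsafe` by accident
mod backend {
    use super::SensorReading;

    /// Stand-in for running a subprocess; returns text of the form "<id> <raw>".
    fn tool_output(sensor_id: u8) -> String {
        format!("{sensor_id} 42")
    }

    pub fn read_sensor(sensor_id: u8) -> Option<SensorReading> {
        let text = tool_output(sensor_id);
        let mut parts = text.split_whitespace();
        let sensor_id = parts.next()?.parse().ok()?;
        let raw = parts.next()?.parse().ok()?;
        Some(SensorReading { sensor_id, raw })
    }
}

// Callers use one name regardless of which backend was compiled in.
pub use backend::read_sensor;
```

Because callers only see `read_sensor`, the same tests can run against either backend, and only the `direct-ipmi` build needs the heavier verification tools.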

cargo-careful for Extra UB Checks at Near-Native Speed

cargo-careful runs your code with extra standard library checks enabled, catching some undefined behavior that normal builds ignore. It still needs a nightly toolchain, but it avoids Miri's 10-100× slowdown:

# Install (running it requires nightly, but your code executes at near-native speed)
cargo install cargo-careful

# Run tests with extra UB checks (catches uninitialized memory, invalid values)
cargo +nightly careful test

# Run a binary with extra checks
cargo +nightly careful run -- --run-diagnostics

What cargo-careful catches that normal builds don't:

  • Reads of uninitialized memory in MaybeUninit and zeroed()
  • Creating invalid bool, char, or enum values via transmute
  • Unaligned pointer reads/writes
  • copy_nonoverlapping with overlapping ranges
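For two of the items above, the sound pattern is short enough to show. This is a sketch with hypothetical helper names (`fill_buffer`, `shift_left`): the first initializes every byte of a `MaybeUninit` buffer before calling `assume_init`, and the second uses `ptr::copy`, which permits overlapping ranges, where `copy_nonoverlapping` would be undefined behavior.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Fills an 8-byte buffer without ever reading uninitialized memory.
fn fill_buffer(seed: u8) -> [u8; 8] {
    let mut buf = MaybeUninit::<[u8; 8]>::uninit();
    let base = buf.as_mut_ptr() as *mut u8;
    for i in 0..8u8 {
        // SAFETY: `i < 8`, so the write is in bounds; `write` does not read
        // the previous (uninitialized) contents.
        unsafe { base.add(i as usize).write(seed.wrapping_add(i)) };
    }
    // SAFETY: all 8 bytes were written in the loop above.
    unsafe { buf.assume_init() }
}

/// Moves `buf[by..]` to the front of `buf`; source and destination overlap.
fn shift_left(buf: &mut [u8], by: usize) {
    if by == 0 || by >= buf.len() {
        return;
    }
    let len = buf.len() - by;
    let p = buf.as_mut_ptr();
    // SAFETY: both ranges lie inside `buf`. `ptr::copy` has memmove semantics,
    // so the overlap is allowed.
    unsafe { ptr::copy(p.add(by), p, len) };
}
```

In safe code, `slice::copy_within` does the same job as `shift_left`; the raw-pointer form is shown only because it is the one these checks are aimed at.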

Where it fits in the verification ladder:

Least overhead                                               Most thorough
├─ cargo test ──► cargo careful test ──► ASan ──► Valgrind ──► Miri ─────┤
│  (baseline)     (~1.5× overhead)       (2-5×)   (10-50×)     (10-100×) │
│  Safe Rust only Catches some UB        FFI+Rust FFI+Rust     Pure Rust │

Recommendation: Add cargo +nightly careful test to CI as a fast safety check. It runs at near-native speed (unlike Miri) and catches real bugs that safe Rust abstractions mask.

Troubleshooting Miri and Sanitizers

| Symptom | Cause | Fix |
|---|---|---|
| Miri does not support FFI | Miri is a Rust interpreter; it can't execute C code | Use Valgrind or ASan for FFI code instead |
| error: unsupported operation: can't call foreign function | Miri hit an extern "C" call | Mock the FFI boundary under #[cfg(miri)], or skip the test with #[cfg_attr(miri, ignore)] |
| Stacked Borrows violation | Aliasing rule violation, even if code "works" | Miri is correct; refactor to avoid aliasing &mut with & |
| Sanitizer says DEADLYSIGNAL | The process received a fatal signal (usually SIGSEGV) from an invalid memory access | Check array indexing, slice operations, and pointer arithmetic |
| LeakSanitizer: detected memory leaks | Box::leak(), forget(), or missing drop() | Intentional: suppress with __lsan_disable(); unintentional: fix the leak |
| Miri is extremely slow | Miri interprets instead of compiling, so it is 10-100× slower | Run only on --lib tests or tag specific tests with #[cfg_attr(miri, ignore)] for slow ones |
| TSan: false positive with atomics | TSan doesn't understand Rust's atomic ordering model perfectly | Add TSAN_OPTIONS=suppressions=tsan.supp with specific suppressions |
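The `#[cfg(miri)]` gating mentioned in the table applies to any operation Miri cannot perform, not only FFI. The sketch below uses host file access, which Miri rejects under its default isolation, as the unsupported boundary. `load_sel_text` and `count_warnings` are hypothetical names, and the fixture text is invented for illustration.

```rust
use std::io;
use std::path::Path;

/// Real implementation: reads event text from disk.
#[cfg(not(miri))]
fn load_sel_text(path: &Path) -> io::Result<String> {
    std::fs::read_to_string(path)
}

/// Miri build: return a fixed fixture instead of touching the host file system.
#[cfg(miri)]
fn load_sel_text(_path: &Path) -> io::Result<String> {
    Ok("1,temp,ok\n2,fan,warn\n".to_owned())
}

/// Pure logic that Miri can check in full, whichever loader was compiled in.
fn count_warnings(text: &str) -> usize {
    text.lines().filter(|line| line.ends_with(",warn")).count()
}
```

The boundary function stays as small as possible so that most of the code, here `count_warnings`, runs identically under Miri and natively.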

Try It Yourself

  1. Trigger a Miri UB detection: Write an unsafe function that creates two &mut references to the same i32 (aliasing violation). Run cargo +nightly miri test and observe the "Stacked Borrows" error. Fix it with UnsafeCell or separate allocations.

  2. Run ASan on a deliberate bug: Create a test that does unsafe out-of-bounds array access. Build with RUSTFLAGS="-Zsanitizer=address" and observe ASan's report. Note how it pinpoints the exact line.

  3. Benchmark Miri overhead: Time cargo test --lib vs cargo +nightly miri test --lib on the same test suite. Calculate the slowdown factor. Based on this, decide which tests to run under Miri in CI and which to skip with #[cfg_attr(miri, ignore)].

[Figure: Safety Verification Decision Tree]

🏋️ Exercises

🟡 Exercise 1: Trigger a Miri UB Detection

Write an unsafe function that creates two &mut references to the same i32 (aliasing violation). Run cargo +nightly miri test and observe the Stacked Borrows error. Fix it.

Solution
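#[cfg(test)]
mod tests {
    #[test]
    fn aliasing_ub() {
        let mut x: i32 = 42;
        let ptr = &mut x as *mut i32;
        unsafe {
            // BUG: Two &mut references to the same location, both used
            let a = &mut *ptr;
            let b = &mut *ptr; // creating `b` invalidates `a`
            *b = 1;
            // Miri reports the violation only when the invalidated
            // reference is actually used:
            *a = 2; // Miri: Stacked Borrows violation!
        }
    }
}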

Fix: use separate allocations or UnsafeCell:

use std::cell::UnsafeCell;

#[test]
fn no_aliasing_ub() {
    let x = UnsafeCell::new(42);
    unsafe {
        let a = &mut *x.get();
        *a = 100;
    }
}
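When the goal is two simultaneous mutable references into the same buffer, the standard library's `split_at_mut` gives them without any unsafe code. This is a small sketch with a hypothetical function name, `swap_ends`; it is one alternative to the approaches named above, not the only fix.

```rust
/// Swaps the first and last elements using two disjoint `&mut` borrows.
fn swap_ends(values: &mut [i32]) {
    if values.len() < 2 {
        return;
    }
    // `head` and `tail` cover non-overlapping parts of the slice, so holding
    // a mutable reference into each at the same time is sound.
    let (head, tail) = values.split_at_mut(1);
    let first = &mut head[0];
    let last = tail.last_mut().expect("tail has at least one element");
    std::mem::swap(first, last);
}
```

Miri accepts this because the borrow checker has already proven the two references never alias.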

🔴 Exercise 2: ASan Out-of-Bounds Detection

Create a test with unsafe out-of-bounds array access. Build with RUSTFLAGS="-Zsanitizer=address" on nightly and observe ASan's report.

Solution
#[test]
fn oob_access() {
    let arr = [1u8, 2, 3, 4, 5];
    let ptr = arr.as_ptr();
    unsafe {
        let _val = *ptr.add(10); // Out of bounds!
    }
}
RUSTFLAGS="-Zsanitizer=address" cargo +nightly test -Zbuild-std \
  --target x86_64-unknown-linux-gnu -- oob_access
# ASan report: stack-buffer-overflow at <exact address>

Key Takeaways

  • Miri is the tool for pure-Rust unsafe — it catches aliasing violations, use-after-free, and leaks that compile and pass tests
  • Valgrind is the tool for FFI/C interop — it works on the final binary without recompilation
  • Sanitizers (ASan, TSan, MSan) require nightly but add only about 2-5× overhead, far less than Valgrind or Miri, which makes them practical for large test suites
  • loom is purpose-built for verifying lock-free concurrent data structures
  • Run Miri in CI on every push; run sanitizers on a nightly schedule to avoid slowing the main pipeline