Rust AI Inference Debug Checklist

Use this checklist systematically when diagnosing any issue with a Rust-based AI inference service.

Step 1: Confirm the build profile

bash

# Always profile and debug performance issues in release mode
cargo build --release

# Debug builds can be 10–50x slower, so never benchmark them
cargo run --release -- --port 8080

[ ] Running --release build, not debug.
[ ] Confirm [profile.release] in Cargo.toml does not lower opt-level below 3 (3 is Cargo's default for release builds).
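
One way to make the build-profile check harder to forget is to have the binary report its own profile at startup. The sketch below uses `cfg!(debug_assertions)` as a proxy: Cargo enables debug assertions in the dev profile and disables them in the release profile by default, so the result is only a heuristic if Cargo.toml overrides `debug-assertions`. The helper name `build_profile` is hypothetical.

```rust
/// Best-effort guess at the build profile.
/// Assumption: `debug-assertions` has not been overridden in Cargo.toml.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    let profile = build_profile();
    if profile == "debug" {
        // Make it loud: latency numbers from this binary are not meaningful.
        eprintln!("warning: debug build detected; do not benchmark this binary");
    }
    println!("build profile: {profile}");
}
```

Logging this once at startup puts the profile next to every latency figure in the same log, which makes the first checkbox verifiable after the fact.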

Step 2: Check request lifecycle

rust

use std::time::Instant;
use tracing::{info, instrument};

#[instrument(skip(input))]
async fn traced_infer(request_id: u64, input: Vec<f32>) -> Vec<f32> {
    let t0 = Instant::now();
    info!(request_id, input_len = input.len(), "inference started");

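    // Run the CPU-bound work on the blocking pool so it cannot stall the async executor.
    let result = tokio::task::spawn_blocking(move || {
        input.iter().map(|x| x * 2.0).collect::<Vec<_>>()
    })
    .await
    .expect("inference task panicked");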

    info!(
        request_id,
        duration_ms = t0.elapsed().as_millis(),
        output_len = result.len(),
        "inference complete"
    );
    result
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();
    let r = traced_infer(1, vec![1.0, 2.0, 3.0]).await;
    println!("{:?}", r);
}

[ ] Every request has a unique ID that appears in all log lines.
[ ] Log timestamps at: enqueue, dequeue, inference start, inference end, response sent.
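
The five lifecycle timestamps can be kept together in one per-request record, which makes queue wait and inference time easy to separate. This is a minimal std-only sketch: `RequestTimeline` and `NEXT_REQUEST_ID` are hypothetical names, and the counter only guarantees IDs that are unique within a single process.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

// Process-wide counter; IDs are unique within one process only.
static NEXT_REQUEST_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical per-request record of lifecycle timestamps.
struct RequestTimeline {
    request_id: u64,
    enqueued: Instant,
    dequeued: Option<Instant>,
    infer_start: Option<Instant>,
    infer_end: Option<Instant>,
    responded: Option<Instant>,
}

impl RequestTimeline {
    /// Create the record at enqueue time and assign the next request ID.
    fn new() -> Self {
        Self {
            request_id: NEXT_REQUEST_ID.fetch_add(1, Ordering::Relaxed),
            enqueued: Instant::now(),
            dequeued: None,
            infer_start: None,
            infer_end: None,
            responded: None,
        }
    }

    /// Time spent waiting in the queue, once the request has been dequeued.
    fn queue_wait(&self) -> Option<Duration> {
        self.dequeued.map(|t| t.duration_since(self.enqueued))
    }

    /// Time spent in inference, once both endpoints are recorded.
    fn inference_time(&self) -> Option<Duration> {
        match (self.infer_start, self.infer_end) {
            (Some(start), Some(end)) => Some(end.duration_since(start)),
            _ => None,
        }
    }

    /// Time from enqueue to response sent.
    fn total(&self) -> Option<Duration> {
        self.responded.map(|t| t.duration_since(self.enqueued))
    }
}

fn main() {
    let mut tl = RequestTimeline::new();
    tl.dequeued = Some(Instant::now());
    tl.infer_start = Some(Instant::now());
    // Stand-in for the real model call.
    let _output: Vec<f32> = [1.0f32, 2.0, 3.0].iter().map(|x| x * 2.0).collect();
    tl.infer_end = Some(Instant::now());
    tl.responded = Some(Instant::now());

    println!(
        "request_id={} queue_wait={:?} inference={:?} total={:?}",
        tl.request_id,
        tl.queue_wait(),
        tl.inference_time(),
        tl.total()
    );
}
```

When total latency is high but inference time is not, the gap is usually queue wait, which points at Step 5 rather than at the model.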

Step 3: Validate input before inference

rust

fn check_input(input: &[f32]) -> Result<(), String> {
    if input.is_empty() { return Err("empty input".into()); }
    if input.len() > 4096 { return Err(format!("too large: {}", input.len())); }
    if input.iter().any(|v| !v.is_finite()) { return Err("contains NaN or Inf".into()); }
    Ok(())
}

fn main() {
    let cases = [
        vec![1.0f32, 2.0, 3.0],
        vec![f32::NAN],
        vec![],
    ];
    for c in &cases {
        println!("{:?} → {:?}", &c[..c.len().min(2)], check_input(c));
    }
}

[ ] Input is not empty.
[ ] Input length is within expected bounds.
[ ] No NaN or Inf values in input.
[ ] Input dtype matches model expectation (f32 vs f16 vs i64).
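
The dtype item is not covered by `check_input`, which only ever sees `f32`. When inputs arrive as raw bytes plus a type tag, the check might look like the sketch below. The `DType` enum and `check_dtype` function are hypothetical; real runtimes expose their own type tags.

```rust
/// Element types a model input might use (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    F32,
    F16,
    I64,
}

impl DType {
    /// Size of one element in bytes.
    fn size_bytes(self) -> usize {
        match self {
            DType::F32 => 4,
            DType::F16 => 2,
            DType::I64 => 8,
        }
    }
}

/// Check that a raw byte buffer is consistent with the dtype and element
/// count the model expects.
fn check_dtype(
    actual: DType,
    expected: DType,
    byte_len: usize,
    expected_elems: usize,
) -> Result<(), String> {
    if actual != expected {
        return Err(format!(
            "dtype mismatch: got {actual:?}, model expects {expected:?}"
        ));
    }
    let want = expected_elems * expected.size_bytes();
    if byte_len != want {
        return Err(format!("byte length {byte_len} != expected {want}"));
    }
    Ok(())
}

fn main() {
    // Three f32 values occupy 12 bytes.
    println!("{:?}", check_dtype(DType::F32, DType::F32, 12, 3));
    // Wrong element type for the model.
    println!("{:?}", check_dtype(DType::F16, DType::F32, 6, 3));
    // Right type, but the buffer is too short for three i64 values.
    println!("{:?}", check_dtype(DType::I64, DType::I64, 12, 3));
}
```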

Step 4: Check for blocking calls in async context

rust

// Find any of these patterns in your async code:
// std::thread::sleep(...)  → replace with tokio::time::sleep(...).await
// std::sync::Mutex guard held across an .await → drop the guard first, or use tokio::sync::Mutex / RwLock
// Heavy CPU loops           → wrap in spawn_blocking

// Quick audit: search your codebase for these
// grep -rn "std::thread::sleep" src/
// grep -rn "\.lock()" src/ | grep -v "spawn_blocking"

[ ] No std::thread::sleep inside async fn.
[ ] No std::sync::Mutex guard inside an async fn is held for long or across an .await.
[ ] All heavy compute is in spawn_blocking.
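
The grep audit can also be expressed as a small Rust helper, for example to run it from a test or a CI step. This is a text-based heuristic sketch, not a real analysis: `find_suspect_calls` and `SUSPECT_PATTERNS` are hypothetical names, and a match is a prompt to read the code, not proof of a bug.

```rust
/// Substrings that often indicate a blocking call in async code.
const SUSPECT_PATTERNS: [&str; 2] = ["thread::sleep(", ".lock()"];

/// Return (1-based line number, matched pattern) for each suspect line.
/// Lines that start with `//` are skipped as comments.
fn find_suspect_calls(source: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("//") {
            continue;
        }
        for pat in SUSPECT_PATTERNS {
            if trimmed.contains(pat) {
                hits.push((idx + 1, pat));
            }
        }
    }
    hits
}

fn main() {
    // Sample source text to scan; in practice this would be read from src/.
    let source = r#"
async fn handler(state: &State) {
    // std::thread::sleep is mentioned here only in a comment
    std::thread::sleep(std::time::Duration::from_millis(10));
    let guard = state.cache.lock().unwrap();
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
}
"#;
    for (line, pat) in find_suspect_calls(source) {
        println!("line {line}: found `{pat}`");
    }
}
```

Like the grep version, this flags every `.lock()` call, including short critical sections that are harmless, so each hit still needs a manual look.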

Step 5: Inspect queue and concurrency metrics

rust

use std::sync::atomic::{AtomicUsize, Ordering};

static QUEUE_DEPTH: AtomicUsize = AtomicUsize::new(0);
static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

fn report_metrics() {
    println!(
        "queue_depth={} in_flight={}",
        QUEUE_DEPTH.load(Ordering::Relaxed),
        IN_FLIGHT.load(Ordering::Relaxed),
    );
}

fn main() { report_metrics(); }

[ ] Queue depth stays below 100 under normal load.
[ ] In-flight requests never exceed the max_concurrent setting.
[ ] No requests stuck in queue for > 5 seconds.
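
A counter such as IN_FLIGHT is only trustworthy if every exit path decrements it. One common way to get that is an RAII guard, sketched below with std only. `InFlightGuard` and `MAX_CONCURRENT` are hypothetical names, and the limit of 2 is chosen purely to keep the demo short.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical limit; a real service would read this from configuration.
const MAX_CONCURRENT: usize = 2;

/// RAII guard: holding one means a request is counted as in flight.
struct InFlightGuard<'a> {
    counter: &'a AtomicUsize,
}

impl<'a> InFlightGuard<'a> {
    /// Try to claim a slot. Returns None when the limit is already reached.
    fn try_acquire(counter: &'a AtomicUsize, max: usize) -> Option<Self> {
        // fetch_update retries the closure until the compare-and-swap
        // succeeds or the closure returns None.
        counter
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < max {
                    Some(n + 1)
                } else {
                    None
                }
            })
            .ok()
            .map(|_| InFlightGuard { counter })
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns and unwinding panics.
        self.counter.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let in_flight = AtomicUsize::new(0);
    let a = InFlightGuard::try_acquire(&in_flight, MAX_CONCURRENT);
    let b = InFlightGuard::try_acquire(&in_flight, MAX_CONCURRENT);
    let c = InFlightGuard::try_acquire(&in_flight, MAX_CONCURRENT);
    println!("a={} b={} c={}", a.is_some(), b.is_some(), c.is_some());

    drop(a); // one request finishes
    println!(
        "in_flight after one finishes: {}",
        in_flight.load(Ordering::Relaxed)
    );
}
```

Because the increment and the limit check happen in one atomic update, the count cannot overshoot the limit even when many tasks try to acquire at once.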

Step 6: Verify output sanity

rust

fn output_ok(output: &[f32]) -> bool {
    !output.is_empty()
        && output.iter().all(|v| v.is_finite())
        && output.iter().any(|&v| v != 0.0) // not all zeros
}

fn main() {
    let good = vec![0.1f32, 0.9, 0.5];
    let bad = vec![0.0f32; 8];
    println!("good: {}", output_ok(&good));
    println!("bad (all zeros): {}", output_ok(&bad));
}

[ ] Output is not all-zeros.
[ ] Output contains no NaN/Inf values.
[ ] Output shape matches expected shape.
[ ] Softmax outputs sum to ~1.0 (classification models).
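
The shape and softmax items are not covered by `output_ok`. A minimal sketch of both checks follows; the function names are hypothetical, the tolerance of 1e-4 is an assumption that should be tuned to the model's precision, and `softmax` is included only to produce a realistic test vector.

```rust
/// Numerically stable softmax: subtract the max before exponentiating.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// True when every value is in [0, 1] and the values sum to 1 within `tol`.
/// NaN fails the range test, so it is rejected as well.
fn is_probability_vector(output: &[f32], tol: f32) -> bool {
    let sum: f32 = output.iter().sum();
    output.iter().all(|&p| (0.0..=1.0).contains(&p)) && (sum - 1.0).abs() <= tol
}

/// True when the flat output length equals the product of the expected dims.
fn shape_matches(output_len: usize, expected_shape: &[usize]) -> bool {
    output_len == expected_shape.iter().product::<usize>()
}

fn main() {
    let logits = [2.0f32, 1.0, 0.1];
    let probs = softmax(&logits);
    println!("probs = {:?}", probs);
    println!("softmax output ok: {}", is_probability_vector(&probs, 1e-4));
    // Raw logits are a common mistake: the softmax layer was skipped.
    println!("raw logits ok: {}", is_probability_vector(&logits, 1e-4));
    println!("shape [1, 3] ok: {}", shape_matches(probs.len(), &[1, 3]));
}
```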

Step 7: Memory health check

bash

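# Watch RSS over time
# pgrep compares against a process name truncated to 15 characters, so match on a short binary name
watch -n 5 'grep VmRSS /proc/$(pgrep -x my-server)/status'

# Check for leaks with valgrind (much slower, but thorough); the debug build gives readable stack traces
valgrind --leak-check=full --track-origins=yes ./target/debug/my-server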

[ ] RSS is stable after warm-up period.
[ ] No allocation spike on batch boundaries.
[ ] Buffer pool is returning buffers after each request.
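
The same VmRSS figure can be sampled from inside the service and logged next to the request metrics. The sketch below is Linux-only, since it reads /proc/self/status. `parse_vm_rss_kb` and `current_rss_kb` are hypothetical helper names.

```rust
use std::fs;

/// Extract the VmRSS value in kB from the text of a /proc/<pid>/status file.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}

/// Resident set size of the current process in kB.
/// Returns None where /proc/self/status is unavailable (non-Linux systems).
fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status)
}

fn main() {
    let before = current_rss_kb();
    // Allocate 64 MiB filled with a non-zero byte so the pages are written to.
    let buf = vec![1u8; 64 * 1024 * 1024];
    let after = current_rss_kb();
    println!(
        "rss before={before:?} kB, after={after:?} kB (buffer len {})",
        buf.len()
    );
}
```

Sampling this at a fixed interval after warm-up gives a series that can be checked for steady growth, which is what the first checkbox asks for.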

