Rust By Example

Rust AI Inference Debug Checklist

Step-by-step debug checklist for AI inference issues in Rust. Use this checklist when your inference server is slow, crashing, or producing incorrect results.

Work through the steps in order when diagnosing an issue with a Rust-based AI inference service.

Step 1: Confirm the build profile

bash
# Profile and benchmark release builds only
cargo build --release

# Debug builds skip optimizations and can be 10–50x slower, so never benchmark them
cargo run --release -- --port 8080
  • [ ] Running a --release build, not a debug build.
  • [ ] [profile.release] in Cargo.toml does not lower opt-level below its default of 3.
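
A startup log line can make the build profile visible without inspecting the build command. The sketch below assumes the default profiles, where `debug_assertions` is on for dev builds and off for release builds; `build_profile` is a hypothetical helper name, and a custom profile that toggles `debug-assertions` would make it report the wrong label.

```rust
/// Reports "debug" when debug assertions are compiled in, "release" otherwise.
/// Assumption: the default dev and release profiles are in use.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    let profile = build_profile();
    if profile == "debug" {
        // Latency measured in this build says little about production behavior.
        eprintln!("warning: debug build detected; do not benchmark this binary");
    }
    println!("build_profile={profile}");
}
```

Logging this once at startup lets you confirm the first checklist item from the server logs alone.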

Step 2: Check request lifecycle

rust
use std::time::Instant;
use tracing::{info, instrument};

#[instrument(skip(input))]
async fn traced_infer(request_id: u64, input: Vec<f32>) -> Vec<f32> {
    let t0 = Instant::now();
    info!(request_id, input_len = input.len(), "inference started");

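    // Run the CPU-bound work off the async worker threads.
    let result = tokio::task::spawn_blocking(move || {
        // Placeholder for the real model call.
        input.iter().map(|x| x * 2.0).collect::<Vec<_>>()
    })
    .await
    .expect("inference task panicked");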

    info!(
        request_id,
        duration_ms = t0.elapsed().as_millis(),
        output_len = result.len(),
        "inference complete"
    );
    result
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();
    let r = traced_infer(1, vec![1.0, 2.0, 3.0]).await;
    println!("{:?}", r);
}
  • [ ] Every request has a unique ID that appears in all log lines.
  • [ ] Log timestamps at: enqueue, dequeue, inference start, inference end, response sent.
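
The two items above can be sketched with the standard library alone. This is a minimal illustration, not a prescribed design: `Lifecycle`, `next_request_id`, and the field names are hypothetical, and the generated IDs are unique only within a single process.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

// Hypothetical process-wide ID source; IDs are unique within one process only.
static NEXT_REQUEST_ID: AtomicU64 = AtomicU64::new(1);

fn next_request_id() -> u64 {
    NEXT_REQUEST_ID.fetch_add(1, Ordering::Relaxed)
}

/// One timestamp per lifecycle stage named in the checklist.
struct Lifecycle {
    request_id: u64,
    enqueued: Instant,
    dequeued: Option<Instant>,
    infer_start: Option<Instant>,
    infer_end: Option<Instant>,
    responded: Option<Instant>,
}

impl Lifecycle {
    fn new() -> Self {
        Lifecycle {
            request_id: next_request_id(),
            enqueued: Instant::now(),
            dequeued: None,
            infer_start: None,
            infer_end: None,
            responded: None,
        }
    }

    /// Time spent waiting in the queue, once the request has been dequeued.
    fn queue_wait(&self) -> Option<Duration> {
        self.dequeued.map(|t| t.duration_since(self.enqueued))
    }

    /// Time spent in the model call, once both ends are stamped.
    fn infer_time(&self) -> Option<Duration> {
        self.infer_start
            .zip(self.infer_end)
            .map(|(start, end)| end.duration_since(start))
    }

    /// End-to-end time from enqueue to response.
    fn total(&self) -> Option<Duration> {
        self.responded.map(|t| t.duration_since(self.enqueued))
    }
}

fn main() {
    let mut lc = Lifecycle::new();
    lc.dequeued = Some(Instant::now());
    lc.infer_start = Some(Instant::now());
    lc.infer_end = Some(Instant::now());
    lc.responded = Some(Instant::now());
    println!(
        "request_id={} queue_wait={:?} infer_time={:?} total={:?}",
        lc.request_id,
        lc.queue_wait(),
        lc.infer_time(),
        lc.total()
    );
}
```

Comparing `queue_wait` with `infer_time` shows whether a slow request spent its time waiting or computing, which decides whether Step 5 or the model itself deserves attention next.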

Step 3: Validate input before inference

rust
fn check_input(input: &[f32]) -> Result<(), String> {
    if input.is_empty() { return Err("empty input".into()); }
    if input.len() > 4096 { return Err(format!("too large: {}", input.len())); }
    if input.iter().any(|v| !v.is_finite()) { return Err("contains NaN or Inf".into()); }
    Ok(())
}

fn main() {
    let cases = [
        vec![1.0f32, 2.0, 3.0],
        vec![f32::NAN],
        vec![],
    ];
    for c in &cases {
        println!("{:?} → {:?}", &c[..c.len().min(2)], check_input(c));
    }
}
  • [ ] Input is not empty.
  • [ ] Input length is within expected bounds.
  • [ ] No NaN or Inf values in input.
  • [ ] Input dtype matches model expectation (f32 vs f16 vs i64).
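
The dtype item is not covered by `check_input`, so here is one way it might look. The `DType` enum and `check_dtype` function are hypothetical; a real service would take the expected dtype from its model metadata. The sketch also rejects payloads whose byte length is not a multiple of the element size, a common symptom of a client sending the wrong type.

```rust
/// Hypothetical element types; a real service would read these from model metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DType {
    F32,
    F16,
    I64,
}

impl DType {
    /// Size of one element in bytes.
    fn size_bytes(self) -> usize {
        match self {
            DType::F32 => 4,
            DType::F16 => 2,
            DType::I64 => 8,
        }
    }
}

/// Checks the declared dtype against the model's and returns the element count.
fn check_dtype(declared: DType, expected: DType, payload_bytes: usize) -> Result<usize, String> {
    if declared != expected {
        return Err(format!(
            "dtype mismatch: got {:?}, model expects {:?}",
            declared, expected
        ));
    }
    let size = expected.size_bytes();
    if payload_bytes % size != 0 {
        return Err(format!(
            "payload of {} bytes is not a multiple of {}",
            payload_bytes, size
        ));
    }
    Ok(payload_bytes / size)
}

fn main() {
    let cases = [(DType::F32, 12), (DType::F16, 12), (DType::I64, 12), (DType::F32, 10)];
    for (declared, bytes) in cases {
        println!(
            "{:?}, {} bytes -> {:?}",
            declared,
            bytes,
            check_dtype(declared, DType::F32, bytes)
        );
    }
}
```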

Step 4: Check for blocking calls in async context

rust
// Look for these patterns inside async code:
// std::thread::sleep(...)                      → use tokio::time::sleep(...).await
// std::sync::Mutex guard held across an .await → drop the guard first, or use tokio::sync::Mutex
// Heavy CPU loops                              → wrap in tokio::task::spawn_blocking

// Quick audit from the shell (a heuristic; review each hit by hand):
// grep -r "std::thread::sleep" src/
// grep -r "\.lock()" src/ | grep -v "spawn_blocking"
  • [ ] No std::thread::sleep inside async fn.
  • [ ] No std::sync::Mutex guard is held across an .await or for a long critical section inside an async fn.
  • [ ] All heavy compute is in spawn_blocking.
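
A text search cannot tell how long a synchronous section actually runs. As a runtime complement, a suspect section can be wrapped in a timer and flagged when it exceeds a budget. The helper names `timed` and `exceeds_budget` are hypothetical, and the 1 ms budget is an assumption to tune for your service, not a rule from any library.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// True when a synchronous section ran longer than the chosen budget.
fn exceeds_budget(elapsed: Duration, budget: Duration) -> bool {
    elapsed > budget
}

fn main() {
    // Assumed budget for code that runs between two .await points.
    let budget = Duration::from_millis(1);

    // Stand-in for a CPU-heavy section found during the audit.
    let (sum, elapsed) = timed(|| (0..5_000_000u64).map(|x| x % 7).sum::<u64>());

    if exceeds_budget(elapsed, budget) {
        eprintln!("section took {elapsed:?}; consider moving it into spawn_blocking");
    }
    println!("sum={sum} elapsed={elapsed:?}");
}
```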

Step 5: Inspect queue and concurrency metrics

rust
use std::sync::atomic::{AtomicUsize, Ordering};

static QUEUE_DEPTH: AtomicUsize = AtomicUsize::new(0);
static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

fn report_metrics() {
    println!(
        "queue_depth={} in_flight={}",
        QUEUE_DEPTH.load(Ordering::Relaxed),
        IN_FLIGHT.load(Ordering::Relaxed),
    );
}

fn main() { report_metrics(); }
  • [ ] Queue depth stays below 100 at normal load.
  • [ ] In-flight requests don't exceed max_concurrent setting.
  • [ ] No requests stuck in queue for > 5 seconds.
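
The counters are only useful if they are updated on every code path. One common approach is an RAII guard that increments on entry and decrements in `Drop`, so early returns and unwinding panics cannot leave the count too high. `InFlightGuard` and `handle_request` are hypothetical names for this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

/// Increments IN_FLIGHT when created and decrements it when dropped.
struct InFlightGuard;

impl InFlightGuard {
    fn enter() -> Self {
        IN_FLIGHT.fetch_add(1, Ordering::Relaxed);
        InFlightGuard
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        IN_FLIGHT.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Stand-in request handler; the guard covers every exit path.
fn handle_request(input: &[f32]) -> Result<f32, String> {
    let _guard = InFlightGuard::enter();
    if input.is_empty() {
        // Early return: the guard is still dropped here.
        return Err("empty input".into());
    }
    Ok(input.iter().sum())
}

fn main() {
    println!("ok: {:?}", handle_request(&[1.0, 2.0]));
    println!("err: {:?}", handle_request(&[]));
    println!("in_flight after both calls: {}", IN_FLIGHT.load(Ordering::Relaxed));
}
```

The same pattern applies to `QUEUE_DEPTH`: create the guard on enqueue and drop it on dequeue.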

Step 6: Verify output sanity

rust
fn output_ok(output: &[f32]) -> bool {
    !output.is_empty()
        && output.iter().all(|v| v.is_finite())
        && output.iter().any(|&v| v != 0.0) // not all zeros
}

fn main() {
    let good = vec![0.1f32, 0.9, 0.5];
    let bad = vec![0.0f32; 8];
    println!("good: {}", output_ok(&good));
    println!("bad (all zeros): {}", output_ok(&bad));
}
  • [ ] Output is not all-zeros.
  • [ ] Output contains no NaN/Inf values.
  • [ ] Output shape matches expected shape.
  • [ ] Softmax outputs sum to ~1.0 (classification models).
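
The shape and softmax items can be checked in a few lines. Both function names and the tolerance are assumptions for illustration; choose a tolerance that suits your model's numeric precision.

```rust
/// True when every value is a finite probability and the values sum to ~1.0.
fn looks_like_probabilities(output: &[f32], tol: f32) -> bool {
    if output.is_empty() {
        return false;
    }
    let in_range = output
        .iter()
        .all(|&p| p.is_finite() && (0.0..=1.0).contains(&p));
    let sum: f32 = output.iter().sum();
    in_range && (sum - 1.0).abs() <= tol
}

/// True when the flat output length equals the product of the expected dimensions.
fn shape_ok(output_len: usize, expected_shape: &[usize]) -> bool {
    expected_shape.iter().product::<usize>() == output_len
}

fn main() {
    let probs = [0.1f32, 0.7, 0.2];
    let logits = [2.0f32, -1.0, 0.5]; // raw scores, not probabilities
    println!("probs ok: {}", looks_like_probabilities(&probs, 1e-3));
    println!("logits ok: {}", looks_like_probabilities(&logits, 1e-3));
    println!("shape [2, 3] vs len 6: {}", shape_ok(6, &[2, 3]));
}
```

A failed probability check on a classification model often means the softmax layer was dropped during export and the server is returning raw logits.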

Step 7: Memory health check

bash
# Watch RSS over time (pgrep matches at most 15 characters of the process name; use -f for longer names)
watch -n 5 'grep VmRSS /proc/$(pgrep -x my-server)/status'

# Check for leaks with valgrind (much slower; a debug binary gives readable stack traces)
valgrind --leak-check=full --track-origins=yes ./target/debug/my-server
  • [ ] RSS is stable after warm-up period.
  • [ ] No allocation spike on batch boundaries.
  • [ ] Buffer pool is returning buffers after each request.
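
The server can also report its own RSS, which puts memory readings in the same log stream as request timings. This sketch is Linux-only because it reads `/proc/self/status`; `parse_vm_rss_kb` is a hypothetical helper.

```rust
use std::fs;

/// Extracts the VmRSS value in kB from the text of /proc/<pid>/status.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse().ok())
}

fn main() {
    // Linux-only: /proc/self/status does not exist on other platforms.
    match fs::read_to_string("/proc/self/status") {
        Ok(text) => match parse_vm_rss_kb(&text) {
            Some(kb) => println!("rss_kb={kb}"),
            None => println!("VmRSS line not found"),
        },
        Err(e) => println!("could not read /proc/self/status: {e}"),
    }
}
```

Logging this value every few hundred requests makes it easy to see whether RSS levels off after warm-up or keeps climbing.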
