Rust By Example

Rust AI Inference Pitfalls

Common pitfalls when building AI inference in Rust: silent precision loss, wrong memory ordering, batch deadlocks, mutable model state, ONNX input name mismatches, and tokenizer state shared across threads.


Pitfall 1: Silent f32 precision loss from f64 conversion

Rust will happily convert f64 to f32 with an as cast. The value is rounded to the nearest representable f32, and the lost precision produces no warning or error.

rust
fn main() {
    // Pitfall: f64 model input converted to f32 silently
    let precise_input: f64 = 3.141592653589793;
    let model_input = precise_input as f32; // Silent precision loss!
    println!("f64: {:.15}", precise_input); // 3.141592653589793
    println!("f32: {:.15}", model_input);   // 3.141592741012573 ← different!

    // Fix: be explicit about the conversion and document the precision budget
    let input_f32: f32 = precise_input as f32;
    let roundtrip_f64 = input_f32 as f64;
    let error = (precise_input - roundtrip_f64).abs();
    println!("Conversion error: {:.2e}", error); // prints 8.74e-8, acceptable for most models
}

Fix: Decide on the canonical precision early. For most neural networks, f32 is correct. Only use f64 if the model explicitly requires it.
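
One way to make the precision budget explicit is to wrap the cast in a helper that rejects values outside it. The sketch below is only an illustration: the name to_f32_checked and the max_rel_err parameter are hypothetical, and the 1e-6 tolerance is an assumed budget rather than a recommendation.

```rust
/// Converts an f64 to f32, returning an error if the value overflows f32
/// or if the relative rounding error exceeds `max_rel_err`.
/// (Hypothetical helper for illustration; not part of any library.)
fn to_f32_checked(x: f64, max_rel_err: f64) -> Result<f32, String> {
    let y = x as f32; // rounds to the nearest representable f32
    if x.is_finite() && !y.is_finite() {
        return Err(format!("{x} overflows f32"));
    }
    let err = (x - y as f64).abs();
    // Avoid dividing by zero when x is exactly 0.0.
    let scale = x.abs().max(f64::MIN_POSITIVE);
    if err / scale > max_rel_err {
        return Err(format!("{x} loses too much precision (relative error {:.2e})", err / scale));
    }
    Ok(y)
}

fn main() {
    // An ordinary feature value fits within f32's roughly 7 significant digits.
    println!("{:?}", to_f32_checked(0.1, 1e-6));
    // 1e40 is finite as f64 but larger than f32::MAX, so the cast yields infinity.
    println!("{:?}", to_f32_checked(1e40, 1e-6));
    // 1e-46 is below the smallest positive f32 and rounds to zero.
    println!("{:?}", to_f32_checked(1e-46, 1e-6));
}
```

Running the check once at the input boundary keeps the inference path itself free of per-element validation.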

---

Pitfall 2: Atomics with wrong memory ordering

Relaxed ordering is fine for metrics, but on a model readiness flag it causes subtle bugs: another thread can observe the flag as true before the model data written earlier is visible to it.

rust
use std::sync::atomic::{AtomicBool, Ordering};

static MODEL_READY: AtomicBool = AtomicBool::new(false);
static mut MODEL_PTR: usize = 0; // Simulated model pointer

fn load_model() {
    // ❌ PITFALL: Using Relaxed doesn't guarantee the model data
    // is visible to other threads before MODEL_READY is set
    unsafe { MODEL_PTR = 0xDEAD_BEEF; } // Write model data
    MODEL_READY.store(true, Ordering::Relaxed); // NOT safe!
}

fn load_model_correct() {
    // ✅ FIX: Release ordering ensures all preceding writes are visible
    // before any thread reads MODEL_READY as true
    unsafe { MODEL_PTR = 0xDEAD_BEEF; }
    MODEL_READY.store(true, Ordering::Release);
}

fn use_model() {
    // ✅ FIX: Acquire ordering pairs with Release to see all prior writes
    if MODEL_READY.load(Ordering::Acquire) {
        println!("Model is ready and data is visible");
    }
}

fn main() {
    load_model_correct();
    use_model();
}

Rule: Use Acquire/Release for any flag that guards access to other data. Reserve Relaxed for pure counters (metrics, stats).
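
The rule can be exercised with two real threads. This is a minimal sketch, assuming the model data can be represented by a single atomic value; READY, WEIGHT_COUNT, publish, and wait_and_read are illustrative names, and the atomic stand-in avoids static mut entirely.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static READY: AtomicBool = AtomicBool::new(false);
// Stand-in for the model data guarded by the flag.
static WEIGHT_COUNT: AtomicUsize = AtomicUsize::new(0);

fn publish(n: usize) {
    WEIGHT_COUNT.store(n, Ordering::Relaxed); // write the data first
    READY.store(true, Ordering::Release);     // then publish the flag
}

fn wait_and_read() -> usize {
    // This Acquire load pairs with the Release store in `publish`, so once
    // the flag reads true, the data write before it is visible as well.
    while !READY.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    WEIGHT_COUNT.load(Ordering::Relaxed)
}

fn main() {
    let reader = thread::spawn(wait_and_read);
    publish(1024);
    let seen = reader.join().unwrap();
    println!("reader saw {seen} weights"); // prints "reader saw 1024 weights"
}
```

For one-time initialization in application code, std::sync::OnceLock offers the same publish-then-read guarantee behind a safe API.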

---

Pitfall 3: Batch accumulation deadlock

A channel-based batcher deadlocks when the sender waits for a response while the batcher waits for the batch to fill: neither side can make progress.

rust
use tokio::sync::{mpsc, oneshot};

// ❌ PITFALL: Single-threaded scenario where batcher and client
// are on the same task — the .await on response blocks the batcher
// from processing its own queue.

// This is why the batcher MUST run on a SEPARATE spawned task:
async fn correct_setup() {
    let (tx, mut rx) = mpsc::channel::<(Vec<f32>, oneshot::Sender<Vec<f32>>)>(64);

    // ✅ FIX: Batcher runs independently
    tokio::spawn(async move {
        while let Some((input, resp_tx)) = rx.recv().await {
            let result = input.iter().map(|x| x * 2.0).collect();
            let _ = resp_tx.send(result);
        }
    });

    // Client code
    let (resp_tx, resp_rx) = oneshot::channel();
    tx.send((vec![1.0, 2.0], resp_tx)).await.unwrap();
    let result = resp_rx.await.unwrap();
    println!("Result: {:?}", result);
}

#[tokio::main]
async fn main() { correct_setup().await; }
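
Running the batcher on its own task removes one cause of the stall. A batcher that waits for a full batch can still hang when traffic is light, so a flush deadline is a common complement. The sketch below uses std::sync::mpsc instead of tokio so that it runs without external crates; next_batch, max_batch, and max_wait are hypothetical names, and the same idea would need a timeout primitive from the async runtime in the tokio version.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collects up to `max_batch` items, but never waits longer than `max_wait`
/// after the first item arrives. Returns None once every sender is dropped
/// and nothing is left to flush.
fn next_batch<T>(rx: &Receiver<T>, max_batch: usize, max_wait: Duration) -> Option<Vec<T>> {
    // Block for the first item; an error here means the channel is closed.
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    let deadline = Instant::now() + max_wait;
    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Deadline reached or senders gone: flush the partial batch.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}

fn main() {
    let (tx, rx) = mpsc::channel::<f32>();
    // Only three requests arrive, fewer than max_batch = 8.
    for x in [1.0, 2.0, 3.0] {
        tx.send(x).unwrap();
    }
    let batch = next_batch(&rx, 8, Duration::from_millis(20)).unwrap();
    println!("flushed partial batch of {} items", batch.len());
    drop(tx);
    println!("channel closed: {}", next_batch(&rx, 8, Duration::from_millis(20)).is_none());
}
```

The deadline bounds the latency any single request can pay for batching, at the cost of occasionally running a smaller batch.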

---

Pitfall 4: Mutable model state during inference

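Neural networks are typically stateless at inference time: the output is a pure function of the weights and the input. Adding mutable state to the model forces every call through a lock, which causes contention, and without a lock it would be a data race.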

rust
use std::sync::{Arc, RwLock};

// ❌ PITFALL: Mutable counter inside model causes contention
struct BadModel {
    weights: Vec<f32>,
    call_count: RwLock<u64>, // Unnecessary contention on every call!
}

// ✅ FIX: Keep inference function pure; track metrics separately
struct GoodModel {
    weights: Arc<Vec<f32>>, // Immutable during inference
}

impl GoodModel {
    fn infer(&self, input: &[f32]) -> Vec<f32> {
        // Pure function — no mutable state
        input.iter().zip(self.weights.iter()).map(|(x, w)| x * w).collect()
    }
}

// Track call count outside the model
use std::sync::atomic::{AtomicU64, Ordering};
static CALL_COUNT: AtomicU64 = AtomicU64::new(0);

fn main() {
    let model = Arc::new(GoodModel { weights: Arc::new(vec![0.5; 4]) });
    let input = [1.0f32, 2.0, 3.0, 4.0];
    let output = model.infer(&input);
    CALL_COUNT.fetch_add(1, Ordering::Relaxed);
    println!("Output: {:?}, calls: {}", output, CALL_COUNT.load(Ordering::Relaxed));
}
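
Because infer takes &self and touches no mutable state, one model can be shared across threads with no lock at all. The following sketch illustrates this under the assumption that the weights are never modified after construction; Model and run_concurrently are illustrative names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

struct Model {
    weights: Vec<f32>, // never mutated after construction
}

impl Model {
    fn infer(&self, input: &[f32]) -> Vec<f32> {
        input.iter().zip(&self.weights).map(|(x, w)| x * w).collect()
    }
}

/// Runs `infer` on `n_threads` threads and returns the outputs in thread
/// order together with the total number of calls.
fn run_concurrently(model: Arc<Model>, n_threads: usize) -> (Vec<Vec<f32>>, u64) {
    let calls = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..n_threads)
        .map(|i| {
            let model = Arc::clone(&model);
            let calls = Arc::clone(&calls);
            thread::spawn(move || {
                let input = vec![i as f32; 4];
                let out = model.infer(&input);
                calls.fetch_add(1, Ordering::Relaxed); // metrics only, so Relaxed is enough
                out
            })
        })
        .collect();
    let outputs = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (outputs, calls.load(Ordering::Relaxed))
}

fn main() {
    let model = Arc::new(Model { weights: vec![0.5; 4] });
    let (outputs, calls) = run_concurrently(model, 4);
    println!("outputs: {:?}, calls: {}", outputs, calls);
}
```

The only shared mutable value is the counter, and it lives outside the model, so inference threads never wait on each other.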

---

Pitfall 5: ONNX model input name mismatch

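When using ort (ONNX Runtime), input names must match the model's declared names exactly. The compiler cannot check them, so a wrong name only surfaces when the session runs, as an error or as a panic if the result is unwrapped.

rust
// ❌ PITFALL:
// session.run(inputs![
//     "input_ids" => array  // ← if the model expects "input", this fails only at run time
// ])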

// ✅ FIX: Always print model input names at startup
// let inputs = session.inputs.iter().map(|i| &i.name).collect::<Vec<_>>();
// println!("Model inputs: {:?}", inputs);
// assert!(inputs.contains(&&"input_ids".to_string()), "Wrong input name!");

fn main() {
    println!("Always verify model input names at startup with session.inputs");
}
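
The startup check itself does not depend on ort and can be written against plain strings. This is a sketch under assumptions: missing_inputs is a hypothetical helper, and the declared names are hard-coded here where a real service would collect them from the loaded session.

```rust
/// Returns every expected input name that the model does not declare.
/// (Hypothetical helper; the declared names would come from the session.)
fn missing_inputs<'a>(declared: &[String], expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|name| !declared.iter().any(|d| d.as_str() == *name))
        .collect()
}

fn main() {
    // Assumed names standing in for what a loaded model reports.
    let declared = vec!["input".to_string(), "attention_mask".to_string()];
    let missing = missing_inputs(&declared, &["input_ids", "attention_mask"]);
    if missing.is_empty() {
        println!("all expected inputs present");
    } else {
        // Report both sides so the mismatch is obvious in the startup log.
        println!("model does not declare {:?}; it declares {:?}", missing, declared);
    }
}
```

Failing at startup with both lists in the message is usually easier to debug than an error on the first request.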

---

Pitfall 6: Reusing tokenizer state across threads

Some tokenizers are not safe to share across threads. The tokenizers crate's Tokenizer, for example, may not be Send + Sync in every version, and calls such as encode_batch can have side effects on shared state.

rust
use std::sync::Arc;

// ❌ PITFALL: Sharing mutable tokenizer state
// let tokenizer = Tokenizer::new(...);  // Not Send+Sync in some versions
// let shared = Arc::new(tokenizer);

// ✅ FIX: Clone tokenizer per thread (they're cheap to clone after init)
// let tokenizer = Arc::new(tokenizer);
// thread::spawn(move || {
//     let local_tokenizer = tokenizer.as_ref().clone(); // Clone for this thread
//     local_tokenizer.encode("text", true)
// });

fn main() {
    println!("Clone tokenizer per thread if it's not Sync");
}
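
The clone-per-thread pattern can be shown end to end with a stand-in type. ToyTokenizer below is hypothetical and is not the tokenizers crate API; it only assumes that the real tokenizer implements Clone and that encoding needs nothing beyond a shared reference.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a real tokenizer.
#[derive(Clone)]
struct ToyTokenizer {
    vocab: HashMap<String, u32>,
    unk_id: u32,
}

impl ToyTokenizer {
    /// Maps whitespace-separated words to ids, using `unk_id` for unknown words.
    fn encode(&self, text: &str) -> Vec<u32> {
        text.split_whitespace()
            .map(|w| *self.vocab.get(w).unwrap_or(&self.unk_id))
            .collect()
    }
}

/// Encodes each text on its own thread, giving every thread a private clone.
fn encode_on_threads(base: &ToyTokenizer, texts: Vec<String>) -> Vec<Vec<u32>> {
    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            let local = base.clone(); // each thread owns its copy
            thread::spawn(move || local.encode(&text))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let vocab = HashMap::from([("hello".to_string(), 1), ("world".to_string(), 2)]);
    let base = ToyTokenizer { vocab, unk_id: 0 };
    let ids = encode_on_threads(&base, vec!["hello world".into(), "hello rust".into()]);
    println!("{:?}", ids); // prints [[1, 2], [1, 0]]
}
```

Cloning copies the vocabulary, so for a large tokenizer it may be worth cloning once per worker thread rather than once per request.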
