Rust By Example

Rust AI Inference Anti-Patterns

Common mistakes and anti-patterns when building AI inference services in Rust. Learn what to avoid: blocking the async runtime, cloning tensors unnecessarily, missing backpressure, and more.

Topic: AI Inference

Overview

These are the most common mistakes engineers make when building AI inference services in Rust. Each anti-pattern shows the problematic code, explains why it's wrong, and shows the correct approach.

---

Anti-pattern 1: Blocking the async runtime with inference

rust
// ❌ WRONG: Running CPU-intensive inference inside an async task
// This starves other tasks on the Tokio thread pool
async fn bad_infer(input: Vec<f32>) -> Vec<f32> {
    // Heavy matrix operations here block the executor thread
    input.iter().map(|x| {
        // Simulate heavy computation
        let mut acc = *x;
        for _ in 0..1_000_000 { acc = acc.sin().cos(); }
        acc
    }).collect()
}
rust
// ✅ CORRECT: Offload to a blocking thread pool
async fn good_infer(input: Vec<f32>) -> Vec<f32> {
    tokio::task::spawn_blocking(move || {
        input.iter().map(|x| {
            let mut acc = *x;
            for _ in 0..1_000_000 { acc = acc.sin().cos(); }
            acc
        }).collect()
    })
    .await
    .expect("inference task panicked")
}

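Why it matters: Tokio's multi-threaded runtime runs async tasks on a small pool of worker threads (by default, one per CPU core). A task that computes for a long time without reaching an `.await` ties up its worker, so every other task scheduled on that thread stalls and latency spikes cascade across unrelated requests. `spawn_blocking` moves the work to Tokio's separate blocking pool, which leaves the async workers free to keep polling.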
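`spawn_blocking` is not the only way to keep compute off the executor. As a rough sketch using only the standard library, the same isolation can be had with one dedicated inference thread fed by a bounded queue. `Job`, `run_model`, `spawn_inference_worker`, and `infer_on_worker` are hypothetical names introduced for illustration, and `run_model` simply doubles its input in place of a real model.

```rust
use std::sync::mpsc;
use std::thread;

/// One unit of work: the input plus a channel on which to send the result back.
struct Job {
    input: Vec<f32>,
    reply: mpsc::Sender<Vec<f32>>,
}

/// Hypothetical stand-in for a real model: doubles every element.
fn run_model(input: &[f32]) -> Vec<f32> {
    input.iter().map(|x| x * 2.0).collect()
}

/// Spawns one dedicated inference thread and returns the handle used to submit jobs.
/// The queue is bounded, so `send` blocks once `capacity` jobs are waiting.
fn spawn_inference_worker(capacity: usize) -> mpsc::SyncSender<Job> {
    let (tx, rx) = mpsc::sync_channel::<Job>(capacity);
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for job in rx {
            let output = run_model(&job.input);
            // Ignore a send error: the caller may have stopped waiting.
            let _ = job.reply.send(output);
        }
    });
    tx
}

/// Submits one input and waits for its result; returns None if the worker is gone.
fn infer_on_worker(worker: &mpsc::SyncSender<Job>, input: Vec<f32>) -> Option<Vec<f32>> {
    // The reply channel carries exactly one message per job.
    let (reply_tx, reply_rx) = mpsc::channel();
    worker.send(Job { input, reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}
```

In an async handler the reply would normally travel over an async-aware channel (for example `tokio::sync::oneshot`), so that waiting for the result does not itself block a worker thread.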

---

Anti-pattern 2: Cloning tensors on every request

rust
// ❌ WRONG: Cloning large model weights for every inference call
struct InferenceService {
    weights: Vec<f32>, // 100MB+ model weights
}

impl InferenceService {
    fn infer(&self, input: Vec<f32>) -> Vec<f32> {
        let local_weights = self.weights.clone(); // 100MB allocation every request!
        local_weights.iter().zip(&input).map(|(w, x)| w * x).collect()
    }
}
rust
// ✅ CORRECT: Share weights via Arc, never clone them
use std::sync::Arc;

struct InferenceService {
    weights: Arc<Vec<f32>>,
}

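impl InferenceService {
    fn infer(&self, input: &[f32]) -> Vec<f32> {
        // Borrow the shared weights in place; the buffer is never copied
        self.weights.iter().zip(input).map(|(w, x)| w * x).collect()
    }

    // Handing the weights to another task or thread is O(1):
    // Arc::clone only increments a reference count
    fn share_weights(&self) -> Arc<Vec<f32>> {
        Arc::clone(&self.weights)
    }
}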
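To make the sharing concrete, here is a small sketch in which several threads run inference against a single weight buffer. It uses only the standard library; `infer` and `infer_on_threads` are illustrative names, and the element-wise product stands in for a real model.

```rust
use std::sync::Arc;
use std::thread;

/// Element-wise product of the weights and one input; borrows both.
fn infer(weights: &[f32], input: &[f32]) -> Vec<f32> {
    weights.iter().zip(input).map(|(w, x)| w * x).collect()
}

/// Runs one inference per input on its own thread, all sharing one weight buffer.
fn infer_on_threads(weights: Arc<Vec<f32>>, inputs: Vec<Vec<f32>>) -> Vec<Vec<f32>> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            // Cloning the Arc copies a pointer and bumps a counter; the Vec is not copied.
            let weights = Arc::clone(&weights);
            thread::spawn(move || infer(&weights, &input))
        })
        .collect();

    // Collect results in submission order.
    handles
        .into_iter()
        .map(|h| h.join().expect("inference thread panicked"))
        .collect()
}
```

`Arc::ptr_eq` can confirm that two handles point at the same allocation, which is a cheap way to check in a test that no deep copy slipped in.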

---

Anti-pattern 3: No backpressure on the request queue

rust
// ❌ WRONG: Unbounded channel lets queue grow without limit
use tokio::sync::mpsc;

async fn start_server_bad() {
    let (tx, mut rx) = mpsc::unbounded_channel::<Vec<f32>>();
    // Under load, this channel grows to millions of items → OOM
    tokio::spawn(async move {
        while let Some(req) = rx.recv().await {
            process(req).await;
        }
    });
}
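rust
// ✅ CORRECT: Bounded channel with explicit backpressure
use tokio::sync::mpsc;

async fn start_server_good() -> mpsc::Sender<Vec<f32>> {
    let (tx, mut rx) = mpsc::channel::<Vec<f32>>(128); // max 128 pending requests

    tokio::spawn(async move {
        while let Some(req) = rx.recv().await {
            process(req).await;
        }
    });

    // Return the sender: if `tx` were dropped here, `rx.recv()` would yield
    // None and the worker loop would exit immediately.
    // Sender side: reject when the queue is full instead of waiting, e.g.
    // tx.try_send(req).map_err(|_| Error::QueueFull)
    tx
}

async fn process(_req: Vec<f32>) {}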
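The sender-side rejection can be sketched without an async runtime by using the standard library's bounded channel, which behaves similarly for this purpose. `SubmitError`, `bounded_queue`, and `submit` are hypothetical names; a real service would likely map `QueueFull` to an overload response such as HTTP 503.

```rust
use std::sync::mpsc::{self, Receiver, SyncSender, TrySendError};

/// Hypothetical error a request handler could translate into an overload response.
#[derive(Debug, PartialEq)]
enum SubmitError {
    QueueFull,
    WorkerGone,
}

/// Creates a queue that holds at most `capacity` pending requests.
fn bounded_queue(capacity: usize) -> (SyncSender<Vec<f32>>, Receiver<Vec<f32>>) {
    mpsc::sync_channel(capacity)
}

/// Tries to enqueue a request without blocking; fails fast when the queue is full.
fn submit(tx: &SyncSender<Vec<f32>>, req: Vec<f32>) -> Result<(), SubmitError> {
    tx.try_send(req).map_err(|e| match e {
        TrySendError::Full(_) => SubmitError::QueueFull,
        TrySendError::Disconnected(_) => SubmitError::WorkerGone,
    })
}
```

Failing fast keeps memory bounded and tells the caller to retry or shed load. The alternative, awaiting `send` on a bounded Tokio channel, slows the producer down instead of rejecting the request; which one fits depends on whether callers can tolerate waiting.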

---

Anti-pattern 4: Using `unwrap()` in inference hot paths

rust
// ❌ WRONG: unwrap in hot path panics on any model error, failing the request (and aborting the process under panic = "abort")
fn infer_bad(input: &[f32]) -> f32 {
    let result = run_model(input).unwrap(); // panics if model returns None
    result
}
fn run_model(_: &[f32]) -> Option<f32> { None }
rust
// ✅ CORRECT: Propagate errors explicitly; never unwrap in hot path
fn infer_good(input: &[f32]) -> Result<f32, String> {
    run_model_result(input).ok_or_else(|| "model returned no output".to_string())
}
fn run_model_result(_: &[f32]) -> Option<f32> { Some(1.0) }
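A `String` error is enough for a snippet, but callers usually want to match on the failure. The sketch below shows one way to do that with a small error enum. `InferError`, `run_model`, `infer`, and `infer_batch` are illustrative names, and the "model" is just the mean of its inputs.

```rust
use std::fmt;

/// Hypothetical error type for the inference path.
#[derive(Debug, PartialEq)]
enum InferError {
    EmptyInput,
    NoOutput,
}

impl fmt::Display for InferError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InferError::EmptyInput => write!(f, "input was empty"),
            InferError::NoOutput => write!(f, "model returned no output"),
        }
    }
}

impl std::error::Error for InferError {}

/// Stand-in model: the mean of the inputs, or None if the result is not finite.
fn run_model(input: &[f32]) -> Option<f32> {
    let mean = input.iter().sum::<f32>() / input.len() as f32;
    if mean.is_finite() { Some(mean) } else { None }
}

/// Validates the input and converts a missing output into a typed error.
fn infer(input: &[f32]) -> Result<f32, InferError> {
    if input.is_empty() {
        return Err(InferError::EmptyInput);
    }
    run_model(input).ok_or(InferError::NoOutput)
}

/// Runs a batch; collecting into Result stops at the first error.
fn infer_batch(batch: &[Vec<f32>]) -> Result<Vec<f32>, InferError> {
    batch.iter().map(|input| infer(input)).collect()
}
```

With a typed error, an HTTP layer can return a client error for `EmptyInput` and a server error for `NoOutput` rather than treating every failure the same way.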

---

Anti-pattern 5: Serializing inside the concurrency limit

rust
// ❌ WRONG: Holding the concurrency permit during JSON serialization
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn handle_bad(semaphore: Arc<Semaphore>, input: Vec<f32>) -> String {
    let _permit = semaphore.acquire().await.unwrap(); // permit held until the function returns
    let output = run_inference(&input); // inference
    serde_json::to_string(&output).unwrap() // serialization INSIDE permit!
}
fn run_inference(v: &[f32]) -> Vec<f32> { v.to_vec() }
rust
// ✅ CORRECT: Release concurrency permit before serialization
async fn handle_good(semaphore: Arc<Semaphore>, input: Vec<f32>) -> Result<String, serde_json::Error> {
    let output = {
        // acquire() fails only if the semaphore has been closed
        let _permit = semaphore.acquire().await.expect("semaphore closed");
        run_inference(&input) // permit released here when _permit drops
    };
    // Serialization happens outside the critical section
    serde_json::to_string(&output)
}

---

Anti-pattern 6: Ignoring model warm-up

rust
// ❌ WRONG: First real user request pays the lazy-initialization and cache warm-up cost
async fn serve_immediately(model_path: &str) {
    let model = load_model(model_path);
    start_http_server(model).await; // First request will be slow!
}

fn load_model(_: &str) {}
async fn start_http_server(_: ()) {}
rust
// ✅ CORRECT: Run warm-up inference before accepting traffic
async fn serve_with_warmup(model_path: &str) {
    let model = load_model_v2(model_path);
    // Run a few dummy inferences to warm caches and trigger any lazy initialization
    for _ in 0..10 {
        warmup_inference(&model);
    }
    println!("Warm-up complete. Accepting traffic.");
    start_http_server_v2(model).await;
}

fn load_model_v2(_: &str) -> Vec<f32> { vec![1.0] }
fn warmup_inference(_: &[f32]) {}
async fn start_http_server_v2(_: Vec<f32>) {}
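Warm-up is easier to trust when it is measured and tied to a readiness signal. The following is a minimal sketch, assuming a readiness flag that a health-check endpoint would read. `Model`, `warm_up_then_mark_ready`, and the `calls` counter are hypothetical, and the dot product stands in for a real forward pass.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical model handle; `calls` counts how many inferences have run.
struct Model {
    weights: Vec<f32>,
    calls: usize,
}

impl Model {
    /// Stand-in forward pass: dot product of the weights and the input.
    fn infer(&mut self, input: &[f32]) -> f32 {
        self.calls += 1;
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

/// Runs `rounds` dummy inferences, records each latency, then flips the readiness flag.
fn warm_up_then_mark_ready(model: &mut Model, rounds: usize, ready: &AtomicBool) -> Vec<Duration> {
    // A zero-filled input with the shape the model expects.
    let dummy = vec![0.0_f32; model.weights.len()];
    let latencies: Vec<Duration> = (0..rounds)
        .map(|_| {
            let start = Instant::now();
            // black_box keeps the compiler from optimizing the unused result away.
            std::hint::black_box(model.infer(&dummy));
            start.elapsed()
        })
        .collect();
    // Only now should a readiness probe start reporting success.
    ready.store(true, Ordering::Release);
    latencies
}
```

Logging the returned latencies shows whether the later rounds are actually faster than the first, which helps decide how many warm-up rounds a given model needs.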

---

Summary table

| Anti-pattern | Impact | Fix |
|---|---|---|
| Blocking the async runtime | Cascading latency | `spawn_blocking` |
| Cloning weights per request | Memory spikes, allocator pressure, possible OOM | Share via `Arc` |
| Unbounded queue | OOM under load | Bounded `mpsc::channel` |
| `unwrap()` in hot path | Panics, possible server crash | `Result` propagation |
| Serializing while holding the permit | Reduced throughput | Release permit early |
| No model warm-up | Slow first requests | Warm up before serving |
