LLM Rust Review Checklist

Use this checklist when reviewing Rust code that integrates with LLM APIs.

✅ Security

rust

// CHECK: API keys come from environment, not hardcoded
fn check_api_key_source() {
    // ✅ Good
    let key = std::env::var("OPENAI_API_KEY").expect("Set OPENAI_API_KEY");

    // ❌ Never do this
    // let key = "sk-prod-abc123...";

    // ❌ Never log the full key
    // println!("Using key: {}", key);

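    // ✅ Safe masking for debugging: show at most the first 7 characters.
    // `chars().take(..)` avoids the panic that byte slicing (`&key[..7]`)
    // can hit when the cut falls inside a multi-byte character.
    let masked: String = key.chars().take(7).collect();
    println!("Using key: {}...", masked);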
}

fn main() { check_api_key_source(); }

[ ] No API keys hardcoded in source code or committed to git.
[ ] Keys are loaded from environment variables or a secrets manager.
[ ] API keys are never included in error messages or logs.
[ ] Input is validated for prompt injection patterns before it is sent to the LLM.
[ ] PII is scrubbed from prompts sent to external APIs.
[ ] Response content is filtered before returning to end users.
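
Two of the items above (keeping keys out of error messages, and screening input for injection patterns) can be sketched with the standard library alone. The helper names `redact_secret` and `looks_like_prompt_injection` and the phrase list are hypothetical, chosen for illustration. A fixed phrase list is only a first-pass heuristic and will miss many real attacks.

```rust
/// Hypothetical helper: replace every occurrence of `secret` in `message`
/// so the key cannot leak through logs or error strings.
fn redact_secret(message: &str, secret: &str) -> String {
    // An empty pattern would match between every character, so skip it.
    if secret.is_empty() {
        return message.to_string();
    }
    message.replace(secret, "[REDACTED]")
}

/// Hypothetical heuristic: flag input containing well-known injection phrases.
/// The list is illustrative, not exhaustive.
fn looks_like_prompt_injection(input: &str) -> bool {
    const SUSPICIOUS: [&str; 3] = [
        "ignore previous instructions",
        "disregard the system prompt",
        "reveal your system prompt",
    ];
    let lowered = input.to_lowercase();
    SUSPICIOUS.iter().any(|phrase| lowered.contains(phrase))
}
```

A reviewer might expect `redact_secret` to be applied wherever a provider error body is converted into a log line, since some HTTP clients echo request headers in their error output.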

✅ Cost control

rust

fn check_cost_guards(
    model: &str,
    input_tokens: u32,
    requested_max_output: u32,
) -> Result<(), String> {
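    // Illustrative USD price per 1K tokens; verify against current provider pricing.
    // A single rate is applied to input and output tokens alike, so this
    // underestimates when output tokens are priced higher than input tokens.
    let cost_per_1k = match model {
        "gpt-4o" => 0.005,
        "gpt-4o-mini" => 0.00015,
        "claude-3-5-sonnet" => 0.003,
        // Unknown models fall back to the highest listed price
        _ => 0.005,
    };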

    let max_single_request_usd = 1.0;
    let estimated = (input_tokens + requested_max_output) as f64 / 1000.0 * cost_per_1k;

    if estimated > max_single_request_usd {
        return Err(format!(
            "Estimated cost ${:.4} exceeds limit ${:.2}",
            estimated, max_single_request_usd
        ));
    }
    Ok(())
}

#[cfg(test)]
mod cost_tests {
    use super::*;
    #[test]
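    fn test_blocks_expensive_request() {
        // 260K tokens * $0.005 per 1K = $1.30, which exceeds the $1.00 limit.
        // (The original 110K-token input costs only $0.55 and would pass the guard.)
        assert!(check_cost_guards("gpt-4o", 250_000, 10_000).is_err());
    }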
    #[test]
    fn test_allows_normal_request() {
        assert!(check_cost_guards("gpt-4o", 1000, 500).is_ok());
    }
}

[ ] max_tokens is always set (never unlimited).
[ ] Per-request cost estimate checked before calling API.
[ ] Per-tenant token budget enforced.
[ ] Token usage logged for cost attribution after each call.
[ ] Alert configured for anomalous cost spikes.
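
The per-tenant budget item can be illustrated with a small in-memory tracker. This is a minimal sketch: the type `TenantBudgets` and its method `try_reserve` are hypothetical names, and a real service would likely persist usage and reset it per billing period.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory tracker: one shared token limit applied per tenant.
struct TenantBudgets {
    limit_per_tenant: u64,
    used: HashMap<String, u64>,
}

impl TenantBudgets {
    fn new(limit_per_tenant: u64) -> Self {
        Self { limit_per_tenant, used: HashMap::new() }
    }

    /// Reserve `tokens` before calling the provider.
    /// Returns the remaining budget, or an error if the limit would be exceeded.
    fn try_reserve(&mut self, tenant: &str, tokens: u64) -> Result<u64, String> {
        let limit = self.limit_per_tenant;
        let used = self.used.entry(tenant.to_string()).or_insert(0);
        let new_total = used.saturating_add(tokens);
        if new_total > limit {
            return Err(format!(
                "Tenant {} would use {} tokens, over the limit of {}",
                tenant, new_total, limit
            ));
        }
        // Only record usage when the reservation succeeds.
        *used = new_total;
        Ok(limit - new_total)
    }
}
```

Reserving before the call (using `input_tokens + max_tokens` as the upper bound) and correcting with actual usage afterwards keeps a burst of concurrent requests from overshooting the budget.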

✅ Error handling

rust

#[derive(Debug)]
enum LlmError {
    RateLimit { retry_after_secs: u64 },
    AuthError,
    InvalidRequest(String),
    ProviderError(u16),
    Timeout,
    ParseError(String),
}

impl LlmError {
    fn is_retryable(&self) -> bool {
        matches!(self, Self::RateLimit { .. } | Self::ProviderError(500..=504) | Self::Timeout)
    }

    fn http_status_for_client(&self) -> u16 {
        match self {
            Self::RateLimit { .. } => 429,
            Self::AuthError => 503, // Don't expose auth issues to end users
            Self::InvalidRequest(_) => 400,
            Self::ProviderError(_) => 503,
            Self::Timeout => 504,
            Self::ParseError(_) => 500,
        }
    }
}

[ ] Errors are typed (enum), not String.
[ ] Retryable vs non-retryable errors are distinguished.
[ ] Retry uses exponential backoff with jitter.
[ ] finish_reason: "length" is detected and handled (truncated response).
[ ] Provider errors are mapped to appropriate HTTP status codes.
[ ] No unwrap() or expect() in request-handling paths.
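
For the backoff item, one common variant is "full jitter": the delay is drawn uniformly between zero and an exponentially growing ceiling. The sketch below assumes the caller supplies a random value in `[0, 1]` (for example from a random-number crate), which keeps the example free of dependencies and easy to test. The function names are hypothetical.

```rust
use std::time::Duration;

/// Exponential ceiling: base * 2^attempt, clamped to `cap`.
fn backoff_ceiling(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // `checked_mul` returns None on overflow; fall back to the cap in that case.
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Full jitter: scale the ceiling by a caller-supplied random value in [0, 1].
fn backoff_with_jitter(attempt: u32, base: Duration, cap: Duration, unit_random: f64) -> Duration {
    let ceiling = backoff_ceiling(attempt, base, cap);
    // Guard against NaN or out-of-range input, which would make `mul_f64` panic.
    let scale = if unit_random.is_finite() {
        unit_random.clamp(0.0, 1.0)
    } else {
        1.0
    };
    ceiling.mul_f64(scale)
}
```

When the error is `LlmError::RateLimit`, a reviewer would typically expect the provider's `retry_after_secs` to take precedence over the computed delay.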

✅ Observability

[ ] Every LLM call logged with: model, prompt_tokens, completion_tokens, latency_ms.
[ ] TTFT (time-to-first-token) measured separately for streaming calls.
[ ] Cache hit/miss tracked.
[ ] Provider name included in all metrics labels.
[ ] Error rate by provider and error type exported to Prometheus.
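
The logging item can be checked against a record like the one below. This is a sketch only: the struct `LlmCallRecord`, the `timed` helper, and the key=value log format are assumptions made for illustration. A production service would more likely emit these fields through its existing logging and metrics stack.

```rust
use std::time::Instant;

/// Hypothetical record holding the fields the checklist asks for.
struct LlmCallRecord<'a> {
    provider: &'a str,
    model: &'a str,
    prompt_tokens: u32,
    completion_tokens: u32,
    latency_ms: u128,
    cache_hit: bool,
}

impl LlmCallRecord<'_> {
    /// Render a single key=value line that log pipelines can parse.
    fn to_log_line(&self) -> String {
        format!(
            "llm_call provider={} model={} prompt_tokens={} completion_tokens={} latency_ms={} cache_hit={}",
            self.provider,
            self.model,
            self.prompt_tokens,
            self.completion_tokens,
            self.latency_ms,
            self.cache_hit
        )
    }
}

/// Run `f` and return its result together with the elapsed wall-clock milliseconds.
fn timed<T>(f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_millis())
}
```

For streaming calls, the same `Instant` can be read once when the first chunk arrives (TTFT) and again when the stream ends, giving the two separate measurements the checklist asks for.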

✅ Prompt quality

rust

fn validate_prompt_quality(system: &str, messages: &[(&str, &str)]) -> Vec<String> {
    let mut warnings = Vec::new();

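    // Report only one of the two system-prompt warnings: an empty prompt
    // would otherwise also trigger the "very short" warning.
    if system.is_empty() {
        warnings.push("System prompt is empty; model has no persona".to_string());
    } else if system.len() < 20 {
        warnings.push("System prompt is very short; consider adding more context".to_string());
    }

    // Count characters rather than bytes so the warning text matches the measurement.
    let total_chars: usize = messages.iter().map(|(_, c)| c.chars().count()).sum::<usize>()
        + system.chars().count();
    if total_chars > 32_000 {
        warnings.push(format!("Prompt is {} chars; may exceed context window", total_chars));
    }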

    let has_user_turn = messages.iter().any(|(r, _)| *r == "user");
    if !has_user_turn {
        warnings.push("No user turn in messages".to_string());
    }

    warnings
}

fn main() {
    let warnings = validate_prompt_quality(
        "You are a helpful assistant.",
        &[("user", "Explain async Rust")],
    );
    if warnings.is_empty() {
        println!("✅ Prompt looks good");
    } else {
        for w in warnings { println!("⚠️  {}", w); }
    }
}

[ ] System prompt is non-empty and provides clear persona/instructions.
[ ] Context window usage estimated; alert if > 80% of limit.
[ ] Few-shot examples are high quality and representative.
[ ] Output format instructions are explicit when structured output is expected.
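
The 80% context-window item can be approximated without a tokenizer. The sketch below assumes roughly four characters per token, a rough rule of thumb for English text that does not replace the model's real tokenizer. The function names and the limit passed in are hypothetical.

```rust
/// Rough token estimate: about 4 characters per token, rounded up.
/// This ratio is an assumption; use the provider's tokenizer for exact counts.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// True when the estimated prompt size exceeds 80% of the context limit.
fn exceeds_alert_threshold(prompt: &str, context_limit_tokens: usize) -> bool {
    let estimated = estimate_tokens(prompt);
    // estimated / limit > 0.8, rewritten in integers to avoid float rounding.
    estimated * 10 > context_limit_tokens * 8
}
```

Leaving headroom matters because `max_tokens` for the completion also counts against the same window on most providers.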
