Rust By Example

LLM Rust Review Checklist

Code review checklist for Rust LLM applications. Covers API security, prompt engineering quality, error handling, cost controls, observability, and production readiness.

Use this checklist when reviewing Rust code that integrates with LLM APIs.

✅ Security

rust
// CHECK: API keys come from environment, not hardcoded
fn check_api_key_source() {
    // ✅ Good
    let key = std::env::var("OPENAI_API_KEY").expect("Set OPENAI_API_KEY");

    // ❌ Never do this
    // let key = "sk-prod-abc123...";

    // ❌ Never log the full key
    // println!("Using key: {}", key);

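    // ✅ Safe masking for debugging. Taking chars (not a byte slice) means
    // this cannot panic if the value contains multi-byte UTF-8.
    let prefix: String = key.chars().take(7).collect();
    let masked = format!("{}...", prefix);
    println!("Using key: {}", masked);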
}

fn main() { check_api_key_source(); }
  • [ ] No API keys hardcoded in source code or committed to git.
  • [ ] Keys are loaded from environment variables or a secrets manager.
  • [ ] API keys are never included in error messages or logs.
  • [ ] Input is validated for prompt injection patterns before it is sent to the LLM.
  • [ ] PII is scrubbed from prompts sent to external APIs.
  • [ ] Response content is filtered before returning to end users.
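
The "never in error messages or logs" items above can be partly enforced by the type system. The following is a minimal sketch, assuming a hypothetical `ApiKey` newtype (not part of the original checklist or of any library) whose `Debug` and `Display` implementations always print a placeholder, so an accidental `{:?}` in a log line cannot leak the secret.

```rust
use std::fmt;

/// Hypothetical wrapper (illustrative only): holds a secret and redacts
/// it whenever it is formatted.
pub struct ApiKey(String);

impl ApiKey {
    /// Reads the key from the named environment variable.
    pub fn from_env(var: &str) -> Result<Self, std::env::VarError> {
        std::env::var(var).map(ApiKey)
    }

    /// Wraps a secret that was loaded elsewhere, e.g. from a secrets manager.
    pub fn new(raw: impl Into<String>) -> Self {
        ApiKey(raw.into())
    }

    /// Explicit accessor: the only way to read the raw value, which makes
    /// every use of the secret easy to find in review.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("ApiKey(<redacted>)")
    }
}

impl fmt::Display for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("<redacted>")
    }
}
```

With this in place, a reviewer only needs to audit calls to `expose()`; any struct that derives `Debug` and contains an `ApiKey` field prints the placeholder instead of the key.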

✅ Cost control

rust
fn check_cost_guards(
    model: &str,
    input_tokens: u32,
    requested_max_output: u32,
) -> Result<(), String> {
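    // Illustrative blended USD rates per 1K tokens. Real pricing differs for
    // input vs. output tokens and changes over time, so load it from config.
    let cost_per_1k = match model {
        "gpt-4o" => 0.005,
        "gpt-4o-mini" => 0.00015,
        "claude-3-5-sonnet" => 0.003,
        _ => 0.005, // Unknown model: assume the highest rate listed here
    };

    // At $0.50, the 110K-token gpt-4o request in the test below ($0.55) is blocked.
    let max_single_request_usd = 0.50;
    // Widen to u64 before adding so large token counts cannot overflow u32.
    let total_tokens = u64::from(input_tokens) + u64::from(requested_max_output);
    let estimated = total_tokens as f64 / 1000.0 * cost_per_1k;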

    if estimated > max_single_request_usd {
        return Err(format!(
            "Estimated cost ${:.4} exceeds limit ${:.2}",
            estimated, max_single_request_usd
        ));
    }
    Ok(())
}

#[cfg(test)]
mod cost_tests {
    use super::*;
    #[test]
    fn test_blocks_expensive_request() {
        assert!(check_cost_guards("gpt-4o", 100_000, 10_000).is_err());
    }
    #[test]
    fn test_allows_normal_request() {
        assert!(check_cost_guards("gpt-4o", 1000, 500).is_ok());
    }
}
  • [ ] max_tokens is always set (never unlimited).
  • [ ] Per-request cost estimate checked before calling API.
  • [ ] Per-tenant token budget enforced.
  • [ ] Token usage logged for cost attribution after each call.
  • [ ] Alert configured for anomalous cost spikes.
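
The per-tenant budget item above can be sketched as a small reservation check that runs before each API call. This is a minimal in-memory illustration; `TenantBudgets` and `try_reserve` are hypothetical names, and a real service would likely persist usage and reset it on a billing schedule.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory budget tracker (illustrative only).
pub struct TenantBudgets {
    limit_per_tenant: u64,
    used: HashMap<String, u64>,
}

impl TenantBudgets {
    /// Creates a tracker where every tenant gets the same token limit.
    pub fn new(limit_per_tenant: u64) -> Self {
        Self { limit_per_tenant, used: HashMap::new() }
    }

    /// Reserves `tokens` for `tenant`. If the reservation would exceed the
    /// limit, nothing is recorded and the remaining budget is returned as
    /// the error value.
    pub fn try_reserve(&mut self, tenant: &str, tokens: u64) -> Result<(), u64> {
        let used = self.used.entry(tenant.to_string()).or_insert(0);
        let remaining = self.limit_per_tenant.saturating_sub(*used);
        if tokens > remaining {
            return Err(remaining);
        }
        *used += tokens;
        Ok(())
    }

    /// Tokens consumed so far by `tenant`.
    pub fn used(&self, tenant: &str) -> u64 {
        self.used.get(tenant).copied().unwrap_or(0)
    }
}
```

Reserving `input_tokens + max_tokens` up front and reconciling against the provider's reported usage afterwards is one possible design; it errs on the side of blocking a request rather than overspending.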

✅ Error handling

rust
#[derive(Debug)]
enum LlmError {
    RateLimit { retry_after_secs: u64 },
    AuthError,
    InvalidRequest(String),
    ProviderError(u16),
    Timeout,
    ParseError(String),
}

impl LlmError {
    fn is_retryable(&self) -> bool {
        matches!(self, Self::RateLimit { .. } | Self::ProviderError(500..=504) | Self::Timeout)
    }

    fn http_status_for_client(&self) -> u16 {
        match self {
            Self::RateLimit { .. } => 429,
            Self::AuthError => 503, // Don't expose auth issues to end users
            Self::InvalidRequest(_) => 400,
            Self::ProviderError(_) => 503,
            Self::Timeout => 504,
            Self::ParseError(_) => 500,
        }
    }
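}

// Implementing Display and Error lets LlmError work with `?` and `Box<dyn Error>`.
impl std::fmt::Display for LlmError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::RateLimit { retry_after_secs } => {
                write!(f, "rate limited; retry after {}s", retry_after_secs)
            }
            Self::AuthError => write!(f, "authentication with provider failed"),
            Self::InvalidRequest(msg) => write!(f, "invalid request: {}", msg),
            Self::ProviderError(status) => write!(f, "provider returned HTTP {}", status),
            Self::Timeout => write!(f, "request timed out"),
            Self::ParseError(msg) => write!(f, "failed to parse response: {}", msg),
        }
    }
}

impl std::error::Error for LlmError {}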
  • [ ] Errors are typed (enum), not String.
  • [ ] Retryable vs non-retryable errors are distinguished.
  • [ ] Retry uses exponential backoff with jitter.
  • [ ] finish_reason: "length" is detected and handled (truncated response).
  • [ ] Provider errors are mapped to appropriate HTTP status codes.
  • [ ] No unwrap() or expect() in request-handling paths.
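
Two of the items above, backoff with jitter and truncation detection, are small enough to sketch with the standard library alone. The function names are hypothetical. The jitter value is passed in by the caller (for example from the `rand` crate) so the delay calculation stays deterministic and testable. The `"length"` string follows the `finish_reason` value named in the checklist; other providers may report truncation differently.

```rust
use std::time::Duration;

/// Sketch of "full jitter" backoff: a random fraction of
/// min(cap, base * 2^attempt).
/// `jitter` should be a random value in [0.0, 1.0] supplied by the caller.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // 2^attempt, falling back to u32::MAX so large attempt counts cannot overflow.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let ceiling = base.saturating_mul(factor).min(cap);
    // Treat NaN or infinite jitter as 0 and clamp everything else into range.
    let j = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 0.0 };
    // Work in whole milliseconds; sub-millisecond precision is not needed here.
    let millis = (ceiling.as_millis() as f64 * j) as u64;
    Duration::from_millis(millis)
}

/// Returns true when the provider reports that the response was cut off
/// by the token limit.
pub fn is_truncated(finish_reason: Option<&str>) -> bool {
    finish_reason == Some("length")
}
```

A retry loop would call `backoff_delay` only when `is_retryable()` returns true, and would prefer the provider's `retry_after_secs` over the computed delay when a rate-limit response supplies one.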

✅ Observability

  • [ ] Every LLM call logged with: model, prompt_tokens, completion_tokens, latency_ms.
  • [ ] TTFT (time-to-first-token) measured separately for streaming calls.
  • [ ] Cache hit/miss tracked.
  • [ ] Provider name included in all metrics labels.
  • [ ] Error rate by provider and error type exported to Prometheus.
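
As a rough illustration of the first item, the sketch below times a call and collects the listed fields into one record. `LlmCallRecord` and `timed_call` are hypothetical names, the closure stands in for the real client call, and the key=value log format is just one option; a production setup might emit structured events or Prometheus metrics instead.

```rust
use std::time::Instant;

/// Hypothetical per-call record; field names mirror the checklist above.
#[derive(Debug, Clone, PartialEq)]
pub struct LlmCallRecord {
    pub provider: String,
    pub model: String,
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub latency_ms: u128,
    pub cache_hit: bool,
}

impl LlmCallRecord {
    /// Renders a single key=value line that a log pipeline can parse.
    pub fn to_log_line(&self) -> String {
        format!(
            "provider={} model={} prompt_tokens={} completion_tokens={} latency_ms={} cache_hit={}",
            self.provider,
            self.model,
            self.prompt_tokens,
            self.completion_tokens,
            self.latency_ms,
            self.cache_hit
        )
    }
}

/// Runs `call`, which returns (prompt_tokens, completion_tokens) as reported
/// by the provider, and records how long it took.
pub fn timed_call<F>(provider: &str, model: &str, call: F) -> LlmCallRecord
where
    F: FnOnce() -> (u32, u32),
{
    let start = Instant::now();
    let (prompt_tokens, completion_tokens) = call();
    LlmCallRecord {
        provider: provider.to_string(),
        model: model.to_string(),
        prompt_tokens,
        completion_tokens,
        latency_ms: start.elapsed().as_millis(),
        cache_hit: false, // This sketch always calls the provider.
    }
}
```

For streaming calls, the same pattern applies with a second `Instant` reading taken when the first token arrives, which gives TTFT separately from total latency.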

✅ Prompt quality

rust
fn validate_prompt_quality(system: &str, messages: &[(&str, &str)]) -> Vec<String> {
    let mut warnings = Vec::new();

    if system.is_empty() {
        warnings.push("System prompt is empty: model has no persona".to_string());
    } else if system.chars().count() < 20 {
        // `else if` keeps an empty prompt from also being reported as "very short".
        warnings.push("System prompt is very short; consider adding more context".to_string());
    }

    // Count chars rather than bytes so non-ASCII text is not over-counted.
    let total_chars: usize = messages.iter().map(|(_, c)| c.chars().count()).sum::<usize>()
        + system.chars().count();
    if total_chars > 32_000 {
        warnings.push(format!("Prompt is {} chars; may exceed context window", total_chars));
    }

    let has_user_turn = messages.iter().any(|(r, _)| *r == "user");
    if !has_user_turn {
        warnings.push("No user turn in messages".to_string());
    }

    warnings
}

fn main() {
    let warnings = validate_prompt_quality(
        "You are a helpful assistant.",
        &[("user", "Explain async Rust")],
    );
    if warnings.is_empty() {
        println!("✅ Prompt looks good");
    } else {
        for w in warnings { println!("⚠️  {}", w); }
    }
}
  • [ ] System prompt is non-empty and provides clear persona/instructions.
  • [ ] Context window usage estimated; alert if > 80% of limit.
  • [ ] Few-shot examples are high quality and representative.
  • [ ] Output format instructions are explicit when structured output is expected.
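
The 80% context-window item above can be approximated without a tokenizer. The sketch below assumes roughly 4 characters per token, a common rule of thumb for English text rather than a property of any particular model. The function names and the caller-supplied `context_limit` are hypothetical, and a real check should use the provider's tokenizer where one is available.

```rust
/// Rough token estimate, ASSUMING about 4 characters per token.
/// This is a heuristic, not a real tokenizer.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4 // Round up.
}

/// Estimated fraction of the context window used by the system prompt
/// plus all messages. `context_limit` is the model's limit in tokens.
pub fn context_usage(system: &str, messages: &[(&str, &str)], context_limit: usize) -> f64 {
    if context_limit == 0 {
        return f64::INFINITY; // No room at all: always over the threshold.
    }
    let total: usize = estimate_tokens(system)
        + messages.iter().map(|(_, content)| estimate_tokens(content)).sum::<usize>();
    total as f64 / context_limit as f64
}

/// True when usage is above the 80% alert threshold from the checklist.
pub fn exceeds_alert_threshold(usage: f64) -> bool {
    usage > 0.8
}
```

Because the estimate is coarse, treating the 80% line as a warning rather than a hard rejection leaves room for the error in the heuristic.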
