LLM Rust Review Checklist
Code review checklist for Rust LLM applications. Covers API security, prompt engineering quality, error handling, cost controls, observability, and production readiness.
Use this checklist when reviewing Rust code that integrates with LLM APIs.
✅ Security
// CHECK: API keys come from the environment, not hardcoded
fn check_api_key_source() {
    // ✅ Good (a startup-time expect is acceptable; avoid it in request handlers)
    let key = std::env::var("OPENAI_API_KEY").expect("Set OPENAI_API_KEY");
    // ❌ Never do this
    // let key = "sk-prod-abc123...";
    // ❌ Never log the full key
    // println!("Using key: {}", key);
    // ✅ Safe masking for debugging (take chars, not a byte slice, so odd input cannot panic)
    let masked: String = key.chars().take(7).collect();
    println!("Using key: {}...", masked);
}
fn main() { check_api_key_source(); }
- [ ] No API keys hardcoded in source code or committed to git.
- [ ] Keys are loaded from environment variables or a secrets manager.
- [ ] API keys are never included in error messages or logs.
- [ ] Input is validated for prompt injection patterns before sending to LLM.
- [ ] PII is scrubbed from prompts sent to external APIs.
- [ ] Response content is filtered before returning to end users.
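One way to enforce the "never in error messages or logs" item at the type level is to wrap the key in a newtype whose `Debug` and `Display` output is a fixed placeholder. The sketch below is a minimal illustration using only the standard library; `ApiKey` and `expose` are hypothetical names, not part of any provider SDK.

```rust
use std::fmt;

/// Hypothetical wrapper that keeps a secret out of logs and error text.
pub struct ApiKey(String);

impl ApiKey {
    pub fn new(raw: impl Into<String>) -> Self {
        ApiKey(raw.into())
    }

    /// Explicit accessor: the only way to read the raw value,
    /// for example when building an Authorization header.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

// Debug and Display both print a fixed placeholder,
// so `{:?}` and `{}` never reveal the key.
impl fmt::Debug for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("ApiKey(<redacted>)")
    }
}

impl fmt::Display for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("<redacted>")
    }
}
```

With this in place, a struct that derives `Debug` and holds an `ApiKey` field can be logged without leaking the secret, and reviewers only need to audit call sites of `expose()`.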
✅ Cost control
fn check_cost_guards(
    model: &str,
    input_tokens: u32,
    requested_max_output: u32,
) -> Result<(), String> {
    // Illustrative flat USD price per 1K tokens. Real input and output rates
    // differ and change over time, so load them from configuration.
    let cost_per_1k = match model {
        "gpt-4o" => 0.005,
        "gpt-4o-mini" => 0.00015,
        "claude-3-5-sonnet" => 0.003,
        _ => 0.005, // Unknown model: assume the most expensive rate
    };
    let max_single_request_usd = 1.0;
    let estimated = (input_tokens + requested_max_output) as f64 / 1000.0 * cost_per_1k;
    if estimated > max_single_request_usd {
        return Err(format!(
            "Estimated cost ${:.4} exceeds limit ${:.2}",
            estimated, max_single_request_usd
        ));
    }
    Ok(())
}
#[cfg(test)]
mod cost_tests {
    use super::*;
    #[test]
    fn test_blocks_expensive_request() {
        // 210K tokens at $0.005 per 1K is $1.05, just over the $1.00 limit.
        // (110K tokens would cost only $0.55 and would be allowed.)
        assert!(check_cost_guards("gpt-4o", 200_000, 10_000).is_err());
    }
    #[test]
    fn test_allows_normal_request() {
        assert!(check_cost_guards("gpt-4o", 1000, 500).is_ok());
    }
}
- [ ] `max_tokens` is always set (never unlimited).
- [ ] Per-request cost estimate checked before calling API.
- [ ] Per-tenant token budget enforced.
- [ ] Token usage logged for cost attribution after each call.
- [ ] Alert configured for anomalous cost spikes.
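The per-tenant budget item can be sketched as a counter that is checked before each call. This is a minimal in-memory illustration; `TenantBudgets` and `try_reserve` are hypothetical names, and a real service would likely persist usage, reset it per billing window, and guard it for concurrent access.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory budget tracker keyed by tenant ID.
pub struct TenantBudgets {
    limit_per_tenant: u64,
    used: HashMap<String, u64>,
}

impl TenantBudgets {
    pub fn new(limit_per_tenant: u64) -> Self {
        Self { limit_per_tenant, used: HashMap::new() }
    }

    /// Reserve `tokens` for `tenant` and return the remaining budget.
    /// On failure the counter is left unchanged.
    pub fn try_reserve(&mut self, tenant: &str, tokens: u64) -> Result<u64, String> {
        let limit = self.limit_per_tenant;
        let used = self.used.entry(tenant.to_string()).or_insert(0);
        let new_total = used.saturating_add(tokens);
        if new_total > limit {
            return Err(format!(
                "tenant {} would use {} tokens, over the budget of {}",
                tenant, new_total, limit
            ));
        }
        *used = new_total;
        Ok(limit - new_total)
    }
}
```

Reserving `input_tokens + max_tokens` before the call and reconciling with the actual usage afterwards keeps the budget conservative even when the response is shorter than requested.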
✅ Error handling
#[derive(Debug)]
enum LlmError {
    RateLimit { retry_after_secs: u64 },
    AuthError,
    InvalidRequest(String),
    ProviderError(u16),
    Timeout,
    ParseError(String),
}
impl LlmError {
    fn is_retryable(&self) -> bool {
        matches!(self, Self::RateLimit { .. } | Self::ProviderError(500..=504) | Self::Timeout)
    }
    fn http_status_for_client(&self) -> u16 {
        match self {
            Self::RateLimit { .. } => 429,
            Self::AuthError => 503, // Don't expose auth issues to end users
            Self::InvalidRequest(_) => 400,
            Self::ProviderError(_) => 503,
            Self::Timeout => 504,
            Self::ParseError(_) => 500,
        }
    }
}
- [ ] Errors are typed (`enum`), not `String`.
- [ ] Retryable vs non-retryable errors are distinguished.
- [ ] Retry uses exponential backoff with jitter.
- [ ] `finish_reason: "length"` is detected and handled (truncated response).
- [ ] Provider errors are mapped to appropriate HTTP status codes.
- [ ] No `unwrap()` or `expect()` in request-handling paths.
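For the backoff item, a reviewer can look for something like the "full jitter" scheme below, where each delay is a random fraction of an exponentially growing, capped ceiling. This is a sketch under assumptions: `backoff_delay` is a hypothetical helper, and the random value is passed in by the caller so the function stays deterministic and does not assume any particular RNG crate.

```rust
use std::time::Duration;

/// Full-jitter backoff: returns `jitter * min(cap, base * 2^attempt)`.
/// `jitter` is a caller-supplied random value in [0.0, 1.0].
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // Saturate instead of overflowing when the attempt count is large.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let ceiling = base.checked_mul(factor).unwrap_or(cap).min(cap);
    // Guard against NaN or out-of-range input, which would make mul_f64 panic.
    let j = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 1.0 };
    ceiling.mul_f64(j)
}
```

A retry loop would call this only when `is_retryable()` returns true, and for `RateLimit` it would typically wait at least `retry_after_secs` regardless of the computed delay.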
✅ Observability
- [ ] Every LLM call logged with: model, prompt_tokens, completion_tokens, latency_ms.
- [ ] TTFT (time-to-first-token) measured separately for streaming calls.
- [ ] Cache hit/miss tracked.
- [ ] Provider name included in all metrics labels.
- [ ] Error rate by provider and error type exported to Prometheus.
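The fields listed above can be gathered into a single record per call so that no log line omits one of them. The sketch below uses only the standard library; `LlmCallRecord` and `to_log_line` are hypothetical names, and the key=value format is an assumption rather than a required schema.

```rust
use std::time::Duration;

/// Hypothetical per-call record; fields mirror the checklist items.
pub struct LlmCallRecord<'a> {
    pub provider: &'a str,
    pub model: &'a str,
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub latency: Duration,
    /// Time to first token; `None` for non-streaming calls.
    pub ttft: Option<Duration>,
    pub cache_hit: bool,
}

impl LlmCallRecord<'_> {
    /// Render one key=value line suitable for a plain-text logger.
    pub fn to_log_line(&self) -> String {
        let ttft = match self.ttft {
            Some(d) => d.as_millis().to_string(),
            None => "-".to_string(),
        };
        format!(
            "provider={} model={} prompt_tokens={} completion_tokens={} latency_ms={} ttft_ms={} cache_hit={}",
            self.provider,
            self.model,
            self.prompt_tokens,
            self.completion_tokens,
            self.latency.as_millis(),
            ttft,
            self.cache_hit
        )
    }
}
```

Latency and TTFT would normally come from `std::time::Instant::now()` taken before the request and `elapsed()` read at the first streamed chunk and at completion.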
✅ Prompt quality
fn validate_prompt_quality(system: &str, messages: &[(&str, &str)]) -> Vec<String> {
    let mut warnings = Vec::new();
    // Emit only one length warning so an empty prompt is not flagged twice.
    if system.is_empty() {
        warnings.push("System prompt is empty: model has no persona".to_string());
    } else if system.len() < 20 {
        warnings.push("System prompt is very short: consider adding more context".to_string());
    }
    // `len()` counts bytes, which is only a rough proxy for token count.
    let total_bytes: usize = messages.iter().map(|(_, c)| c.len()).sum::<usize>() + system.len();
    if total_bytes > 32_000 {
        warnings.push(format!("Prompt is {} bytes: may exceed context window", total_bytes));
    }
    let has_user_turn = messages.iter().any(|(r, _)| *r == "user");
    if !has_user_turn {
        warnings.push("No user turn in messages".to_string());
    }
    warnings
}
fn main() {
    let warnings = validate_prompt_quality(
        "You are a helpful assistant.",
        &[("user", "Explain async Rust")],
    );
    if warnings.is_empty() {
        println!("✅ Prompt looks good");
    } else {
        for w in warnings { println!("⚠️ {}", w); }
    }
}
- [ ] System prompt is non-empty and provides clear persona/instructions.
- [ ] Context window usage estimated; alert if > 80% of limit.
- [ ] Few-shot examples are high quality and representative.
- [ ] Output format instructions are explicit when structured output is expected.
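The 80% context-window item can be checked with a rough estimate like the one below. This is a sketch, not a tokenizer: the 4-characters-per-token ratio is an assumption that holds only loosely for English text, the function names are hypothetical, and the context limit must come from the provider's documentation for the model in use.

```rust
/// Rough token estimate: about 4 characters per token, rounded up.
/// Use the provider's tokenizer when accuracy matters.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Fraction of the context window consumed by the prompt plus the
/// output tokens reserved via `max_tokens`. Expects `context_limit > 0`.
pub fn context_usage(prompt_tokens: usize, max_output_tokens: usize, context_limit: usize) -> f64 {
    (prompt_tokens + max_output_tokens) as f64 / context_limit as f64
}

/// True when usage is above the 80% threshold from the checklist.
pub fn exceeds_alert_threshold(usage: f64) -> bool {
    usage > 0.80
}
```

Counting the reserved output tokens alongside the prompt matters because the completion shares the same window; a prompt that fits on its own can still be truncated once the response is generated.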