LLM Rust Decision Matrix

Provider comparison

|---|---|---|---|---|---|
| Cost | $$$ | $$$ | $$ | Free | Infra cost only |
| Latency | 200ms–5s | 300ms–8s | 200ms–6s | 50ms–3s | 20ms–2s (depends on hw) |
| Streaming | SSE | SSE | SSE | SSE | Custom |
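Four of the five columns list Server-Sent Events (SSE) as the streaming transport: the response body is a text stream of `data:` lines, with a blank line ending each event. The following is a minimal std-only sketch of pulling the `data:` payloads out of an already-buffered SSE body. The function name `sse_data_payloads` is hypothetical, the sample body is made up, and a production client would parse incrementally as bytes arrive and JSON-decode each payload.

```rust
/// Extract `data:` payloads from an already-buffered SSE body.
/// Simplifications (assumptions of this sketch): one data line per event,
/// `event:`/`id:`/`retry:` fields and `:` comment lines are ignored, and
/// parsing is not incremental.
fn sse_data_payloads(body: &str) -> Vec<String> {
    body.lines() // splits on "\n" and also strips a trailing "\r"
        .filter_map(|line| line.strip_prefix("data:"))
        // SSE drops one optional space after the colon
        .map(|rest| rest.strip_prefix(' ').unwrap_or(rest))
        // OpenAI-style streams end with a literal `[DONE]` sentinel
        .take_while(|payload| *payload != "[DONE]")
        .map(str::to_string)
        .collect()
}

fn main() {
    // Made-up stream body; real chunk JSON differs per provider.
    let body = "data: {\"delta\":\"Hel\"}\n\n: keep-alive\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n";
    for payload in sse_data_payloads(body) {
        println!("{payload}");
    }
}
```

Other providers signal the end of a stream differently, so the `[DONE]` check would need to be made provider-specific behind the same abstraction.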

Decision flowchart

```text
Data privacy requirement (GDPR, HIPAA)?
└── Yes → Local/self-hosted only (Ollama or candle)

Need best quality for reasoning tasks?
└── Claude 3.5 Sonnet or GPT-4o

High volume, cost-sensitive?
└── GPT-4o-mini or Groq (fastest inference)

Long context (>100k tokens)?
└── Gemini 1.5 Pro or Claude 3.5 Sonnet

Offline capability required?
└── candle (compile into binary) or Ollama

Fine-tuned domain model?
└── Self-hosted via ort/candle

Prototyping quickly?
└── OpenAI (most mature Rust tooling)
```
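The flowchart can be encoded as an ordered rule list so the choice is explicit and testable. This is a sketch: the `Requirements` struct and `recommend` function are hypothetical, and evaluating the questions strictly top to bottom (first "yes" wins) is an assumption, since the flowchart does not say how to resolve several "yes" answers. Putting data privacy first does reflect that it is a hard constraint rather than a preference.

```rust
/// Answers to the flowchart questions (hypothetical struct for illustration).
#[derive(Debug, Default)]
struct Requirements {
    data_privacy: bool, // GDPR, HIPAA
    best_reasoning_quality: bool,
    high_volume_cost_sensitive: bool,
    long_context: bool, // > 100k tokens
    offline: bool,
    fine_tuned_domain_model: bool,
    prototyping: bool,
}

/// Walk the questions top to bottom and return the first matching branch.
/// The top-to-bottom precedence is an assumption of this sketch.
fn recommend(req: &Requirements) -> Option<&'static str> {
    let rules: [(bool, &'static str); 7] = [
        (req.data_privacy, "Local/self-hosted only (Ollama or candle)"),
        (req.best_reasoning_quality, "Claude 3.5 Sonnet or GPT-4o"),
        (req.high_volume_cost_sensitive, "GPT-4o-mini or Groq (fastest inference)"),
        (req.long_context, "Gemini 1.5 Pro or Claude 3.5 Sonnet"),
        (req.offline, "candle (compile into binary) or Ollama"),
        (req.fine_tuned_domain_model, "Self-hosted via ort/candle"),
        (req.prototyping, "OpenAI (most mature Rust tooling)"),
    ];
    // First rule whose question was answered "yes"; None if no rule matched
    rules.iter().find(|(hit, _)| *hit).map(|(_, rec)| *rec)
}

fn main() {
    let req = Requirements {
        high_volume_cost_sensitive: true,
        long_context: true,
        ..Default::default()
    };
    // High volume is checked before long context, so that branch wins here.
    println!("{:?}", recommend(&req));
}
```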

Runnable example — provider-agnostic abstraction

```rust
// Assumed Cargo.toml dependencies:
//   serde = { version = "1", features = ["derive"] }
//   tokio = { version = "1", features = ["full"] }
use serde::{Deserialize, Serialize};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Serialize, Deserialize)]
struct Message {
    role: String,
    content: String,
}

#[derive(Debug)]
struct CompletionResult {
    content: String,
    provider: String,
    model: String,
    input_tokens: u32,
    output_tokens: u32,
    latency_ms: u64,
}

/// Unified interface for all LLM providers
trait LlmProvider: Send + Sync {
    fn provider_name(&self) -> &str;
    fn default_model(&self) -> &str;
}

// Credentials and URLs are stored but not yet read: the simulated `complete`
// below makes no network calls, hence `allow(dead_code)`.

/// OpenAI provider configuration
#[allow(dead_code)]
struct OpenAiProvider { api_key: String }
impl LlmProvider for OpenAiProvider {
    fn provider_name(&self) -> &str { "openai" }
    fn default_model(&self) -> &str { "gpt-4o-mini" }
}

/// Anthropic provider configuration
#[allow(dead_code)]
struct AnthropicProvider { api_key: String }
impl LlmProvider for AnthropicProvider {
    fn provider_name(&self) -> &str { "anthropic" }
    fn default_model(&self) -> &str { "claude-3-haiku-20240307" }
}

/// Local Ollama provider
#[allow(dead_code)]
struct OllamaProvider { base_url: String }
impl LlmProvider for OllamaProvider {
    fn provider_name(&self) -> &str { "ollama" }
    fn default_model(&self) -> &str { "llama3.2" }
}

/// Simulated completion (replace the body with real HTTP calls per provider)
async fn complete(
    provider: &dyn LlmProvider,
    messages: &[Message],
    max_tokens: u32,
) -> Result<CompletionResult, String> {
    let start = Instant::now();

    // Simulated per-provider latency in ms (illustrative, not measured)
    let latency = match provider.provider_name() {
        "openai" => 300,
        "anthropic" => 400,
        "ollama" => 80,
        _ => 200,
    };
    tokio::time::sleep(Duration::from_millis(latency)).await;

    // Echo the first 20 characters of the last message. Iterating over chars
    // instead of byte-slicing avoids a panic on multi-byte UTF-8 text.
    let preview: String = messages
        .last()
        .map(|m| m.content.chars().take(20).collect::<String>())
        .unwrap_or_default();
    let content = format!("[{}] Response: {}", provider.provider_name(), preview);

    Ok(CompletionResult {
        content,
        provider: provider.provider_name().to_string(),
        model: provider.default_model().to_string(),
        // Rough heuristic: ~4 characters per token for English text
        input_tokens: messages.iter().map(|m| m.content.len() as u32 / 4).sum(),
        // Placeholder; a real response reports actual token usage
        output_tokens: max_tokens / 4,
        latency_ms: start.elapsed().as_millis() as u64,
    })
}

#[tokio::main]
async fn main() {
    let providers: Vec<Box<dyn LlmProvider>> = vec![
        Box::new(OpenAiProvider { api_key: "sk-test".to_string() }),
        Box::new(AnthropicProvider { api_key: "ant-test".to_string() }),
        Box::new(OllamaProvider { base_url: "http://localhost:11434".to_string() }),
    ];

    let messages = vec![Message {
        role: "user".to_string(),
        content: "Explain Rust async in one sentence.".to_string(),
    }];

    for provider in &providers {
        match complete(provider.as_ref(), &messages, 100).await {
            Ok(result) => println!(
                "[{:10}] {}ms | {} | {}/{} tokens | {}",
                result.provider, result.latency_ms, result.model,
                result.input_tokens, result.output_tokens,
                result.content
            ),
            Err(e) => println!("[{}] Error: {}", provider.provider_name(), e),
        }
    }
}
```
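The payoff of the trait-object design is that the backend can be chosen at runtime, for example from configuration, without touching call sites. The following is a minimal sketch that repeats the trait and structs from the example so the block compiles on its own. `make_provider` and the `LLM_PROVIDER`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `OLLAMA_BASE_URL` environment variable names are hypothetical choices for illustration.

```rust
use std::env;

/// Same trait as in the example above, repeated so this block compiles alone.
trait LlmProvider: Send + Sync {
    fn provider_name(&self) -> &str;
    fn default_model(&self) -> &str;
}

#[allow(dead_code)] // credentials are unused in this sketch
struct OpenAiProvider { api_key: String }
impl LlmProvider for OpenAiProvider {
    fn provider_name(&self) -> &str { "openai" }
    fn default_model(&self) -> &str { "gpt-4o-mini" }
}

#[allow(dead_code)]
struct AnthropicProvider { api_key: String }
impl LlmProvider for AnthropicProvider {
    fn provider_name(&self) -> &str { "anthropic" }
    fn default_model(&self) -> &str { "claude-3-haiku-20240307" }
}

#[allow(dead_code)]
struct OllamaProvider { base_url: String }
impl LlmProvider for OllamaProvider {
    fn provider_name(&self) -> &str { "ollama" }
    fn default_model(&self) -> &str { "llama3.2" }
}

/// Hypothetical factory: map a config string to a boxed provider.
/// The environment variable names are illustrative assumptions.
fn make_provider(kind: &str) -> Result<Box<dyn LlmProvider>, String> {
    let provider: Box<dyn LlmProvider> = match kind {
        "openai" => Box::new(OpenAiProvider {
            api_key: env::var("OPENAI_API_KEY").unwrap_or_default(),
        }),
        "anthropic" => Box::new(AnthropicProvider {
            api_key: env::var("ANTHROPIC_API_KEY").unwrap_or_default(),
        }),
        "ollama" => Box::new(OllamaProvider {
            base_url: env::var("OLLAMA_BASE_URL")
                .unwrap_or_else(|_| "http://localhost:11434".to_string()),
        }),
        other => return Err(format!("unknown provider: {other}")),
    };
    Ok(provider)
}

fn main() {
    // LLM_PROVIDER is a hypothetical variable; default to the local backend.
    let kind = env::var("LLM_PROVIDER").unwrap_or_else(|_| "ollama".to_string());
    match make_provider(&kind) {
        Ok(p) => println!("using {} ({})", p.provider_name(), p.default_model()),
        Err(e) => eprintln!("{e}"),
    }
}
```

Returning a `Result` rather than panicking on an unknown name keeps misconfiguration a recoverable error at startup.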

Cost comparison for 1M daily requests (100 input + 100 output tokens per request)

| Provider | Model | Daily cost estimate |
|---|---|---|
| OpenAI | gpt-4o | ~$1,500 |
| OpenAI | gpt-4o-mini | ~$60 |
| Anthropic | claude-3-haiku | ~$50 |
| Groq | llama3-8b | ~$10 |
| Self-hosted | Llama3 8B (1 GPU) | ~$5 infra |
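The hosted rows follow from one formula: 1M requests of 100 input and 100 output tokens consume 100M input and 100M output tokens per day, so the daily cost is 100 × (price per 1M input tokens) + 100 × (price per 1M output tokens). A small helper makes it easy to redo the estimate when prices or request shapes change. `daily_cost_usd` is a hypothetical name, and the prices in `main` are placeholders, not any provider's actual rates.

```rust
/// Daily spend for a fixed request shape, given per-million-token prices.
fn daily_cost_usd(
    requests_per_day: u64,
    input_tokens_per_request: u64,
    output_tokens_per_request: u64,
    input_usd_per_m: f64,  // price per 1M input tokens
    output_usd_per_m: f64, // price per 1M output tokens
) -> f64 {
    // Total tokens per day, expressed in millions
    let input_m = (requests_per_day * input_tokens_per_request) as f64 / 1e6;
    let output_m = (requests_per_day * output_tokens_per_request) as f64 / 1e6;
    input_m * input_usd_per_m + output_m * output_usd_per_m
}

fn main() {
    // Placeholder prices (USD per 1M tokens), not any provider's actual rates.
    let cost = daily_cost_usd(1_000_000, 100, 100, 0.50, 1.50);
    println!("~${cost:.0}/day");
}
```

Because input and output tokens are usually billed at different rates, output-heavy workloads can cost several times more than the same token count skewed toward input. The self-hosted row is priced by GPU time rather than tokens, so this formula does not apply to it.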

