RRust By Example

Rust AI Inference Team Workflow

Team practices for developing and maintaining AI inference services in Rust. Covers branching strategy, model versioning, deployment pipelines, on-call runbooks, and ML-Ops workflows.

Topic: Ai Inference

Search intent: High-intent search: "rust ai inference mlops workflow team"

Rust AI Inference Team Workflow

Repository structure

rust
ml-inference-service/
├── Cargo.toml
├── src/Rust inference server
├── models/Model artifacts (tracked via DVC, not git)
│   ├── sentiment-v1.onnx
│   └── embed-v2.onnx
├── proto/                  — gRPC protobuf definitions
├── scripts/
│   ├── benchmark.sh        — Perf benchmark automation
│   ├── validate_model.sh   — Model output validation
│   └── deploy.sh           — Deployment script
├── tests/
│   ├── integration/Integration test suite
│   └── golden/Golden output files for regression tests
└── .github/workflows/
    ├── ci.yml              — Build, test, lint
    └── release.yml         — Docker build, deploy

Model versioning workflow

rust
/// Semantic versioning for model artifacts
#[derive(Debug, Clone, PartialEq)]
struct ModelVersion {
    major: u32,  // Breaking change in API or input format
    minor: u32,  // New capability, backward compatible
    patch: u32,  // Weight update only, same architecture
}

impl ModelVersion {
    fn parse(s: &str) -> Option<Self> {
        let parts: Vec<u32> = s.split('.').filter_map(|p| p.parse().ok()).collect();
        if parts.len() == 3 {
            Some(Self { major: parts[0], minor: parts[1], patch: parts[2] })
        } else {
            None
        }
    }

    fn is_compatible_with(&self, other: &ModelVersion) -> bool {
        // Major version must match for compatibility
        self.major == other.major
    }
}

impl std::fmt::Display for ModelVersion {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

fn main() {
    let server_expects = ModelVersion::parse("2.0.0").unwrap();
    let model_file_v = ModelVersion::parse("2.3.1").unwrap();
    let breaking_v = ModelVersion::parse("3.0.0").unwrap();

    println!("v{} compatible with server: {}",
        model_file_v, model_file_v.is_compatible_with(&server_expects));
    println!("v{} compatible with server: {}",
        breaking_v, breaking_v.is_compatible_with(&server_expects));
}

CI pipeline checklist

yaml
# .github/workflows/ci.yml example (illustrative)
# on: [push, pull_request]
# jobs:
#   test:
#     runs-on: ubuntu-latest
#     steps:
#       - cargo fmt --check
#       - cargo clippy -- -D warnings
#       - cargo test --release
#       - cargo bench --no-run  (compile benchmarks to catch breakage)
#       - Run golden output tests against reference inputs
#       - Run load test at 50% production traffic

The key gates before merging:

  • [ ] All unit and integration tests pass.
  • [ ] No Clippy warnings (enforced with -D warnings).
  • [ ] Golden output regression tests pass.
  • [ ] Benchmark performance within 10% of baseline.
  • [ ] Memory usage in load test within 5% of baseline.

Model promotion workflow

rust
/// Three environments with clear promotion gates
#[derive(Debug, Clone, Copy, PartialEq)]
enum Environment {
    Dev,     // Automatic: any passing commit
    Staging, // Manual: after load test passes
    Prod,    // Manual: after 24h staging soak test
}

#[derive(Debug)]
struct ModelDeployment {
    model_id: String,
    version: String,
    environment: Environment,
    validated: bool,
}

impl ModelDeployment {
    fn can_promote_to(&self, target: Environment) -> bool {
        match (self.environment, target) {
            (Environment::Dev, Environment::Staging) => self.validated,
            (Environment::Staging, Environment::Prod) => self.validated,
            _ => false,
        }
    }
}

fn main() {
    let deployment = ModelDeployment {
        model_id: "sentiment-bert".to_string(),
        version: "2.1.0".to_string(),
        environment: Environment::Dev,
        validated: true,
    };

    println!(
        "Can promote to staging: {}",
        deployment.can_promote_to(Environment::Staging)
    );
    println!(
        "Can promote to prod: {}",
        deployment.can_promote_to(Environment::Prod)
    );
}

On-call runbook template

rust
// Document these operational procedures as comments in your code
// or in a separate RUNBOOK.md

/// # On-Call Runbook: High Latency
///
/// ## Symptoms
/// - p99 latency > 200ms
/// - Alert: `ai_inference_latency_p99_ms > 200`
///
/// ## Investigation steps
/// 1. Check queue depth: `kubectl exec -it pod -- curl localhost:9090/metrics | grep queue`
/// 2. Check GPU utilization: `nvidia-smi`
/// 3. Check for request spikes: `rate(ai_inference_requests_total[1m])`
///
/// ## Mitigation
/// - If queue > 500: Scale out workers (`kubectl scale deployment inference --replicas=+2`)
/// - If GPU OOM: Restart pod (`kubectl rollout restart deployment inference`)
/// - If bad deploy: Rollback (`kubectl rollout undo deployment inference`)
struct OnCallRunbook;

fn main() {
    println!("See RUNBOOK.md for operational procedures");
}

Code review SLA

| Review type | Turnaround |

|---|---|

| Bug fix | 4 hours |

| Model update (weights only) | 24 hours |

| Architecture change | 48 hours |

| Performance optimization | 48 hours |

Related reading

Related Guides

Continue in This Topic

More Rust Guides