kinfer-runtime

K-Scale kinfer model inference engine for deploying trained RL policies to real robots via ONNX Runtime in Rust

16 stars

by plurigrid

Best use case

kinfer-runtime is best used when you need a repeatable AI agent workflow for deploying trained RL policies to real robots with K-Scale's kinfer, rather than a one-off prompt.

Teams using kinfer-runtime should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/kinfer-runtime/SKILL.md --create-dirs "https://raw.githubusercontent.com/plurigrid/asi/main/plugins/asi/skills/kinfer-runtime/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/kinfer-runtime/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How kinfer-runtime Compares

Feature / Agent	kinfer-runtime	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

K-Scale kinfer model inference engine for deploying trained RL policies to real robots via ONNX Runtime in Rust

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# K-Scale kinfer Skill

> *"The K-Scale model export and inference tool"*

## Trigger Conditions

- User asks about deploying RL policies to real robots
- User asks about ONNX model inference or Rust ML runtimes
- User asks about policy execution on embedded systems
- User asks about real-time neural network inference

## Overview

**kinfer** is K-Scale's model inference engine for deploying trained policies:

1. **Model Loading**: ONNX format support via `ort` (ONNX Runtime)
2. **Real-time Execution**: Rust implementation for low latency
3. **Logging**: NDJSON telemetry for debugging
4. **Integration**: Seamless connection with KOS firmware

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│  kinfer Inference Pipeline                                               │
│                                                                          │
│  ┌──────────────┐      load      ┌──────────────┐                       │
│  │  ONNX Model  │───────────────▶│   Runtime    │                       │
│  │  (.onnx)     │                │  (ort-sys)   │                       │
│  └──────────────┘                └──────┬───────┘                       │
│                                         │                                │
│  ┌──────────────┐      step      ┌──────┴───────┐      output           │
│  │ Observation  │───────────────▶│   Inference  │───────────────▶Action │
│  │  (sensors)   │                │    Engine    │                       │
│  └──────────────┘                └──────────────┘                       │
│                                         │                                │
│                                         ▼                                │
│                                  ┌──────────────┐                       │
│                                  │   Logger     │                       │
│                                  │  (NDJSON)    │                       │
│                                  └──────────────┘                       │
└─────────────────────────────────────────────────────────────────────────┘
```

## Key Features

### 1. Single Tokio Runtime

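```rust
use lazy_static::lazy_static;
use tokio::runtime::Runtime;

// One shared Tokio runtime for the whole process, built lazily on first use.
// (GIL handling for the Python bindings is not shown in this excerpt.)
lazy_static! {
    static ref RUNTIME: Runtime = Runtime::new().expect("failed to build Tokio runtime");
}
```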

### 2. Pre-fetch Inputs

```rust
// Minimize latency by preparing inputs ahead of time
fn step_and_take_action(&mut self, observation: &[f32]) -> Vec<f32> {
    // Pre-fetch next input while processing current
    ...
}
```
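
The elided body above does not show how the pre-fetch works, so the following is only a minimal sketch of one way to keep the next input ready: a background thread reads observations into a one-slot bounded channel while the consumer runs the policy on the current one. The names `spawn_prefetcher`, `run_policy`, and `read_sensors` are hypothetical and are not part of the kinfer API; the sketch uses only the Rust standard library.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Start a background producer that keeps an observation ready ahead of the consumer.
/// `read_sensors` is a hypothetical stand-in for the robot's sensor read;
/// returning `None` ends the stream.
pub fn spawn_prefetcher<F>(mut read_sensors: F) -> Receiver<Vec<f32>>
where
    F: FnMut() -> Option<Vec<f32>> + Send + 'static,
{
    // Bound of 1: the producer can queue one observation, then blocks in `send`
    // (holding the next one) until the consumer catches up.
    let (tx, rx) = sync_channel::<Vec<f32>>(1);
    thread::spawn(move || {
        while let Some(obs) = read_sensors() {
            if tx.send(obs).is_err() {
                break; // the consumer dropped the receiver
            }
        }
    });
    rx
}

/// Run `policy` over every prefetched observation and collect the actions.
pub fn run_policy<P>(observations: Receiver<Vec<f32>>, mut policy: P) -> Vec<Vec<f32>>
where
    P: FnMut(&[f32]) -> Vec<f32>,
{
    let mut actions = Vec::new();
    for obs in observations {
        actions.push(policy(&obs));
    }
    actions
}
```

The trade-off in this design is freshness: a buffered observation may be slightly older than one read on demand, so the channel bound is kept at 1 rather than allowing a deeper queue.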

### 3. NDJSON Logging

```rust
// Async logging thread for telemetry
struct Logger {
    file: File,
    tx: Sender<LogEntry>,
}
```
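
To illustrate how a logging thread can keep file I/O off the inference path, here is a minimal sketch using only the standard library. The `LogEntry` fields (`step`, `t_us`, `action`) and the `spawn`, `log`, and `finish` methods are assumptions made for this example; the source does not show kinfer's actual record layout or logger API. Each entry is written as one JSON object per line, which is what NDJSON requires.

```rust
use std::io::Write;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical telemetry record; the real kinfer `LogEntry` fields are not shown in the source.
pub struct LogEntry {
    pub step: u64,
    pub t_us: u64,
    pub action: Vec<f32>,
}

impl LogEntry {
    /// Serialize as one JSON object on a single line (hand-rolled to avoid dependencies).
    pub fn to_ndjson_line(&self) -> String {
        let action: Vec<String> = self
            .action
            .iter()
            // JSON has no NaN or infinity, so non-finite values are written as null.
            .map(|a| if a.is_finite() { format!("{a}") } else { "null".to_string() })
            .collect();
        format!(
            "{{\"step\":{},\"t_us\":{},\"action\":[{}]}}\n",
            self.step,
            self.t_us,
            action.join(",")
        )
    }
}

/// Handle owned by the control loop; queuing an entry never touches the file system.
pub struct Logger {
    tx: Option<Sender<LogEntry>>,
    worker: Option<JoinHandle<std::io::Result<()>>>,
}

impl Logger {
    /// Spawn a background thread that drains the channel into `sink`.
    pub fn spawn<W: Write + Send + 'static>(mut sink: W) -> Self {
        let (tx, rx) = mpsc::channel::<LogEntry>();
        let worker = thread::spawn(move || -> std::io::Result<()> {
            for entry in rx {
                sink.write_all(entry.to_ndjson_line().as_bytes())?;
            }
            sink.flush()
        });
        Logger { tx: Some(tx), worker: Some(worker) }
    }

    /// Queue an entry; returns false if the worker has already exited.
    pub fn log(&self, entry: LogEntry) -> bool {
        match &self.tx {
            Some(tx) => tx.send(entry).is_ok(),
            None => false,
        }
    }

    /// Close the channel and wait until every queued entry has been written.
    pub fn finish(mut self) -> std::io::Result<()> {
        drop(self.tx.take());
        match self.worker.take() {
            Some(handle) => handle.join().expect("logger thread panicked"),
            None => Ok(()),
        }
    }
}
```

A caller would pass any writer, for example `Logger::spawn(std::fs::File::create("run.ndjson")?)`, call `log` once per control step, and call `finish` at shutdown so the tail of the log is flushed.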

## Language & Stack

- **Primary**: Rust (performance-critical)
- **ML Runtime**: ONNX Runtime (`ort`, `ort-sys`)
- **Async**: Tokio for non-blocking I/O
- **Bindings**: Python via PyO3

## GF(3) Trit Assignment

```
Trit: -1 (MINUS)
Role: Verification/Validation (inference must be correct)
Color: #6E5FE4
URI: skill://kscale-kinfer#6E5FE4
```

### Balanced Triads

```
kscale-kinfer (-1) ⊗ kscale-ksim (0) ⊗ onnx-export (+1) = 0 ✓
kscale-kinfer (-1) ⊗ rust-ml (0) ⊗ policy-training (+1) = 0 ✓
```
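
The source does not define the `⊗` operation, so the sketch below assumes the simplest reading of "balanced": the trit values of the three skills sum to 0 modulo 3. The `Trit` enum and `is_balanced` function are hypothetical names used only to make that arithmetic checkable.

```rust
/// A trit in balanced ternary: -1, 0, or +1.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Trit {
    Minus = -1,
    Zero = 0,
    Plus = 1,
}

/// True when the trits sum to 0 modulo 3 (the meaning of "balanced" assumed here).
pub fn is_balanced(trits: &[Trit]) -> bool {
    let sum: i32 = trits.iter().map(|t| *t as i32).sum();
    sum.rem_euclid(3) == 0
}
```

Under this reading, both triads listed above are balanced because (-1) + 0 + (+1) = 0.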

## Key Contributors

| Contributor | Focus Areas |
|------------|-------------|
| **b-vm** | Step function, command names |
| **codekansas** | Performance, refactoring |
| **WT-MM** | Logging, env variables |
| **alik-git** | NDJSON logging, plotting |
| **nfreq** | Tokio runtime, GIL management |

## Example Usage

```python
import kinfer

# Load model
model = kinfer.load_model("walking_policy.onnx")

# Get observation from sensors
obs = get_sensor_data()

# Run inference
action = model.step(obs)

# Apply to actuators
apply_action(action)
```

### Rust API

```rust
use kinfer::InferenceEngine;

let mut engine = InferenceEngine::load("policy.onnx")?;

loop {
    let obs = get_observation();
    let action = engine.step_and_take_action(&obs);
    send_to_actuators(&action);
}
```
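
The loop above runs as fast as inference allows. On a real robot the policy usually has to run at a fixed control rate, so here is a minimal sketch of pacing that loop with a deadline. The `Policy` trait, `LoopStats`, and `run_fixed_rate` are hypothetical stand-ins written for this example and are not the kinfer API; the observation and actuator callbacks play the role of `get_observation` and `send_to_actuators`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical interface standing in for the inference engine; not the actual kinfer API.
pub trait Policy {
    fn step(&mut self, obs: &[f32]) -> Vec<f32>;
}

/// Counts gathered over one run of the loop.
pub struct LoopStats {
    pub steps: u32,
    pub overruns: u32,
}

/// Run `steps` control iterations, sleeping so each one starts about `period` apart.
pub fn run_fixed_rate<P, O, A>(
    policy: &mut P,
    mut get_obs: O,
    mut send: A,
    period: Duration,
    steps: u32,
) -> LoopStats
where
    P: Policy,
    O: FnMut() -> Vec<f32>,
    A: FnMut(&[f32]),
{
    let mut stats = LoopStats { steps: 0, overruns: 0 };
    let mut deadline = Instant::now() + period;
    for _ in 0..steps {
        let obs = get_obs();
        let action = policy.step(&obs);
        send(action.as_slice());
        stats.steps += 1;

        let now = Instant::now();
        if now < deadline {
            // Finished early: wait out the rest of the period.
            std::thread::sleep(deadline - now);
            deadline += period;
        } else {
            // Finished late: count it and re-anchor, so one slow step
            // does not trigger a burst of catch-up steps.
            stats.overruns += 1;
            deadline = now + period;
        }
    }
    stats
}
```

Tracking overruns separately makes it easy to see from a test run whether inference fits inside the control period on the target hardware.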

## References

- [kscalelabs/kinfer](https://github.com/kscalelabs/kinfer) - Main repository (17 stars)
- [kscalelabs/kinfer-sim](https://github.com/kscalelabs/kinfer-sim) - Simulation visualization
- [ONNX Runtime](https://onnxruntime.ai/) - Inference backend
