ollama-local-llm

Run local LLM inference for chat, text generation, and embeddings via the Ollama server at {{OLLAMA_HOST}}:{{OLLAMA_PORT}}.

54 stars

by bidewio

Best use case

ollama-local-llm is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ollama-local-llm can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ollama-local-llm/SKILL.md --create-dirs "https://raw.githubusercontent.com/bidewio/better-openclaw/main/skills/ollama-local-llm/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ollama-local-llm/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ollama-local-llm Compares

Feature / Agent	ollama-local-llm	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Run local LLM inference for chat, text generation, and embeddings via the Ollama server at {{OLLAMA_HOST}}:{{OLLAMA_PORT}}.

Where can I find the source code?

The source code is in the bidewio/better-openclaw repository on GitHub, under skills/ollama-local-llm.

SKILL.md Source

# Ollama Local LLM Skill

The Ollama local LLM server is available at `http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}` within the Docker network.

## Chat Completion

Send a multi-turn conversation and get a response:

```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "system", "content": "You are a helpful coding assistant." },
      { "role": "user", "content": "Write a Python function to reverse a string." }
    ],
    "stream": false
  }'
```
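
With `"stream": false`, the reply arrives as a single JSON object and the assistant's text sits in `message.content`. The following is a minimal sketch of wrapping the call so that only that text is printed. It assumes `jq` is installed; `OLLAMA_URL` and the `ask` function are hypothetical names introduced here, and the default address is only Ollama's usual local port, not something this skill defines.

```bash
# Assumption: OLLAMA_URL is a hypothetical variable for the server base URL;
# the default below is Ollama's standard local address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# ask MODEL PROMPT: send one user message and print only the assistant's text.
ask() {
  # jq -n builds the request body, so quotes in the prompt are escaped safely.
  jq -cn --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}], stream: false}' \
    | curl -fsS -X POST "${OLLAMA_URL}/api/chat" \
        -H "Content-Type: application/json" --data-binary @- \
    | jq -r '.message.content'
}
```

A call such as `ask llama3.2 "Write a Python function to reverse a string."` would then print just the generated answer.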

## Text Generation

Generate text from a single prompt:

```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Explain the concept of recursion in simple terms.",
    "stream": false
  }'
```

## Streaming Responses

For real-time token-by-token output, enable streaming:

```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Tell me a short story." }
    ],
    "stream": true
  }'
```
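
A streamed reply is a sequence of newline-delimited JSON objects, each carrying a fragment of the text in `message.content`, with the final object marked `"done": true`. One possible way to stitch the fragments together as they arrive is sketched below. It assumes `jq` is available, and `stream_text` is a hypothetical helper name.

```bash
# stream_text: read newline-delimited JSON chunks on stdin and print the
# text fragments back to back as they arrive.
stream_text() {
  # -j prints raw strings without a trailing newline after each fragment;
  # --unbuffered flushes output after every chunk instead of at the end.
  jq --unbuffered -j '.message.content // empty'
  echo  # finish with a newline once the stream closes
}
```

Piping the streaming request into it, for example `curl -sN ... | stream_text` (where `-N` turns off curl's own output buffering), should show the story growing token by token.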

## Generating Embeddings

Create vector embeddings for text (useful with Qdrant):

```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["This is a sentence to embed.", "Another sentence for comparison."]
  }'
```
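
The `/api/embed` response contains an `embeddings` array with one vector per input string, in input order. As a rough illustration of comparing the two sentences above, the sketch below computes the cosine similarity of the first two vectors directly in `jq`. The helper name `cosine_of_pair` is hypothetical, and a vector database such as Qdrant would normally do this comparison for you.

```bash
# cosine_of_pair: read an /api/embed response on stdin and print the cosine
# similarity of the first two vectors in its "embeddings" array.
cosine_of_pair() {
  jq '
    .embeddings as [$a, $b]
    # dot product of the two vectors, index by index
    | ([range(0; $a | length)] | map($a[.] * $b[.]) | add) as $dot
    # Euclidean norms of each vector
    | ($a | map(. * .) | add | sqrt) as $na
    | ($b | map(. * .) | add | sqrt) as $nb
    | $dot / ($na * $nb)
  '
}
```

Appending `| cosine_of_pair` to the embedding request prints a single number; values closer to 1 indicate more similar sentences.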

## Model Management

```bash
# List available models
curl "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/tags"

# Pull a new model
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/pull" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'

# Show model details
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/show" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'

# Delete a model
curl -X DELETE "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/delete" \
  -H "Content-Type: application/json" \
  -d '{"name": "old-model"}'
```
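
The listing from `/api/tags` can be used as a guard before inference. The sketch below is one way to do that with only `curl` and `grep`; `OLLAMA_URL` and `model_available` are hypothetical names, and the pattern assumes the listing reports models under a `"name"` key with a tag suffix such as `llama3.2:latest`.

```bash
# Assumption: OLLAMA_URL is a hypothetical variable for the server base URL.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# model_available NAME: succeed if NAME appears in the /api/tags listing.
model_available() {
  local model="$1"
  # Listed names carry a tag (for example "llama3.2:latest"), so accept
  # either the exact name or the name followed by ":<tag>".
  # Note: dots in NAME are treated as regex wildcards, which is loose but
  # good enough for a quick pre-flight check.
  curl -fsS "${OLLAMA_URL}/api/tags" \
    | grep -Eq "\"name\":[[:space:]]*\"${model}(:[^\"]*)?\""
}
```

For example, `model_available llama3.2 || echo "pull llama3.2 first"` reports a missing model before any chat request is sent.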

## Advanced Generation Options

Adjust sampling behavior and output length with the `options` object:

```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Write a creative poem about the ocean." }
    ],
    "stream": false,
    "options": {
      "temperature": 0.8,
      "top_p": 0.9,
      "top_k": 40,
      "num_predict": 512,
      "seed": 42
    }
  }'
```

## Using a Custom System Prompt

```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "system", "content": "You are an expert data analyst. Respond with structured JSON when possible." },
      { "role": "user", "content": "Analyze this data: [10, 25, 18, 42, 7, 33]" }
    ],
    "stream": false,
    "format": "json"
  }'
```
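
Even with `"format": "json"`, the model's answer is still delivered as a string inside `message.content`, so it has to be parsed a second time before its fields can be used. A small sketch of that step follows, assuming `jq` is installed; `json_reply` is a hypothetical helper name.

```bash
# json_reply: read a non-streaming /api/chat response on stdin and print the
# model's reply re-parsed as compact JSON. Exits non-zero if the reply text
# is not valid JSON.
json_reply() {
  jq -c '.message.content | fromjson'
}
```

Appending `| json_reply` to the request above yields a JSON value that can be filtered further, and the non-zero exit status gives a simple signal for retrying when the model returns malformed output.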

## Recommended Models

| Model | Use Case | Size |
|-------|----------|------|
| `llama3.2` | General chat and reasoning | 3B |
| `llama3.1:70b` | Complex reasoning tasks | 70B |
| `codellama` | Code generation and review | 7B |
| `nomic-embed-text` | Text embeddings for RAG | 137M |
| `mistral` | Fast general-purpose inference | 7B |
| `phi3` | Compact and efficient reasoning | 3.8B |

## Tips for AI Agents

- Always set `"stream": false` when you need to parse the complete response programmatically.
- Use `"format": "json"` when you need structured output that's easy to parse, and also ask for JSON in the prompt so the model does not pad the output with whitespace.
- Check available models with `/api/tags` before making inference calls to avoid 404 errors.
- For embedding tasks, use `nomic-embed-text` or similar dedicated embedding models, not chat models.
- Lower `temperature` (0.1-0.3) for factual/deterministic tasks; raise it (0.7-1.0) for creative tasks.
- Set `num_predict` to limit response length and prevent runaway generation.
- Use the embeddings endpoint with Qdrant for building RAG (Retrieval-Augmented Generation) pipelines.
- Check Ollama health at `http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/` — it returns "Ollama is running".
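
The health check in the last tip can be turned into a startup wait, which is handy when the agent and the Ollama container start at the same time. This is a sketch only: `OLLAMA_URL` and `wait_for_ollama` are hypothetical names, and the retry count and delay are arbitrary defaults.

```bash
# Assumption: OLLAMA_URL is a hypothetical variable for the server base URL.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# wait_for_ollama [ATTEMPTS] [DELAY_SECONDS]: poll the root endpoint until it
# answers "Ollama is running"; fail after ATTEMPTS unsuccessful tries.
wait_for_ollama() {
  local attempts="${1:-10}" delay="${2:-2}" i=1 body
  while [ "$i" -le "$attempts" ]; do
    body="$(curl -fsS --max-time 5 "${OLLAMA_URL}/" 2>/dev/null)"
    case "$body" in
      *"Ollama is running"*) return 0 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A script could begin with `wait_for_ollama 15 2 || exit 1` so that later inference calls are not attempted against a server that is still starting.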
