ollama-local-llm
Run local LLM inference for chat, text generation, and embeddings via the Ollama server at {{OLLAMA_HOST}}:{{OLLAMA_PORT}}.
Best use case
ollama-local-llm is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using ollama-local-llm should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/ollama-local-llm/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How ollama-local-llm Compares
| Feature / Agent | ollama-local-llm | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Run local LLM inference for chat, text generation, and embeddings via the Ollama server at {{OLLAMA_HOST}}:{{OLLAMA_PORT}}.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Ollama Local LLM Skill
The Ollama local LLM server is available at `http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}` within the Docker network.
## Chat Completion
Send a multi-turn conversation and get a response:
```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "system", "content": "You are a helpful coding assistant." },
      { "role": "user", "content": "Write a Python function to reverse a string." }
    ],
    "stream": false
  }'
```
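The chat endpoint keeps no state between requests, so a follow-up turn has to resend the earlier messages plus the assistant's previous reply. A minimal sketch of building the next `messages` array, assuming `jq` is installed (this skill does not mention it); `next_messages` is a hypothetical helper name:

```bash
# next_messages HISTORY_JSON RESPONSE_JSON USER_TEXT
# Prints a new messages array: the prior history, the assistant message taken
# from a non-streaming /api/chat response, and the next user turn.
# Assumption: jq is available on the PATH.
next_messages() {
  jq -c -n \
    --argjson history "$1" \
    --argjson response "$2" \
    --arg user "$3" \
    '$history + [$response.message, {role: "user", content: $user}]'
}

# Usage (the response variable would hold the JSON returned by /api/chat):
#   messages=$(next_messages "$messages" "$response" "Now add type hints.")
```

The resulting array can be placed in the `messages` field of the next request.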
## Text Generation
Generate text from a single prompt:
```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Explain the concept of recursion in simple terms.",
    "stream": false
  }'
```
## Streaming Responses
For real-time token-by-token output, enable streaming:
```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Tell me a short story." }
    ],
    "stream": true
  }'
```
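A streamed response arrives as newline-delimited JSON objects. Each chunk carries a text fragment in `message.content` (or in `response` for `/api/generate`), and the final chunk has `"done": true`. A rough sketch of reassembling the text, assuming `jq` is installed; `collect_stream` is a hypothetical helper name:

```bash
# collect_stream: read streamed Ollama chunks on stdin and print the
# concatenated text as it arrives.
# Assumption: jq is available on the PATH.
collect_stream() {
  # -j prints raw strings without newlines; --unbuffered flushes per chunk.
  jq -j --unbuffered '.message.content // .response // empty' && echo
}

# Usage (-N turns off curl's output buffering so chunks appear immediately):
#   curl -sN -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hi"}], "stream": true}' \
#     | collect_stream
```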
## Generating Embeddings
Create vector embeddings for text (useful with Qdrant):
```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["This is a sentence to embed.", "Another sentence for comparison."]
  }'
```
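The response contains an `embeddings` array with one vector per input, in input order. To compare two inputs without a vector database, cosine similarity can be computed directly. The following is an illustrative sketch, assuming `jq` and `awk` are available; `embedding_at` and `cosine_similarity` are hypothetical helper names:

```bash
# embedding_at INDEX: read an /api/embed response on stdin and print the
# vector at INDEX as space-separated numbers. Assumption: jq is installed.
embedding_at() {
  jq -r --argjson i "$1" '.embeddings[$i] | map(tostring) | join(" ")'
}

# cosine_similarity "VEC_A" "VEC_B": print the cosine similarity of two
# space-separated vectors, rounded to four decimal places.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n == 0) {
      print "error: vectors must be non-empty and of equal length" > "/dev/stderr"
      exit 1
    }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i]
    }
    if (na == 0 || nb == 0) {
      print "error: zero-length vector" > "/dev/stderr"
      exit 1
    }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Usage (resp would hold the JSON returned by /api/embed):
#   cosine_similarity "$(printf '%s' "$resp" | embedding_at 0)" \
#                     "$(printf '%s' "$resp" | embedding_at 1)"
```

Values near 1 indicate similar meaning; values near 0 indicate unrelated text.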
## Model Management
```bash
# List available models
curl "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/tags"

# Pull a new model
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/pull" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'

# Show model details
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/show" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'

# Delete a model
curl -X DELETE "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/delete" \
  -H "Content-Type: application/json" \
  -d '{"name": "old-model"}'
```
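The `/api/tags` response lists installed models under `models[].name`, usually with an explicit tag such as `llama3.2:latest`. A minimal sketch of checking for a model before an inference call, assuming `jq` is installed; `model_installed` is a hypothetical helper name:

```bash
# model_installed NAME: read an /api/tags response on stdin and succeed
# (exit status 0) if NAME is installed, matching either the exact name or
# NAME with the ":latest" tag. Assumption: jq is available on the PATH.
model_installed() {
  jq -e --arg m "$1" \
    '(.models // []) | any(.name == $m or .name == ($m + ":latest"))' \
    >/dev/null
}

# Usage:
#   curl -fsS "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/tags" \
#     | model_installed llama3.2 || echo "llama3.2 has not been pulled yet"
```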
## Advanced Generation Options
Fine-tune generation with parameters:
```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Write a creative poem about the ocean." }
    ],
    "stream": false,
    "options": {
      "temperature": 0.8,
      "top_p": 0.9,
      "top_k": 40,
      "num_predict": 512,
      "seed": 42
    }
  }'
```
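When the same agent handles both factual and creative requests, it can help to keep the option sets in one place. The sketch below is illustrative only: `options_for_task` is a hypothetical helper, the temperatures are picked from the ranges given in the tips in this skill, and the `num_predict` values are arbitrary placeholders.

```bash
# options_for_task TYPE: print a JSON "options" object for a task type.
# The specific values are illustrative assumptions, not Ollama defaults.
options_for_task() {
  case "$1" in
    factual)  printf '%s\n' '{"temperature": 0.2, "num_predict": 256}' ;;
    creative) printf '%s\n' '{"temperature": 0.9, "num_predict": 512}' ;;
    *) echo "unknown task type: $1" >&2; return 1 ;;
  esac
}

# Usage: splice the object into a request body.
#   curl -s -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/generate" \
#     -H "Content-Type: application/json" \
#     -d "{\"model\": \"llama3.2\", \"prompt\": \"Define recursion.\", \"stream\": false, \"options\": $(options_for_task factual)}"
```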
## Using a Custom System Prompt
```bash
curl -X POST "http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "system", "content": "You are an expert data analyst. Respond with structured JSON when possible." },
      { "role": "user", "content": "Analyze this data: [10, 25, 18, 42, 7, 33]" }
    ],
    "stream": false,
    "format": "json"
  }'
```
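With `"format": "json"`, the model's JSON arrives as a string inside `message.content`, so it needs a second decoding step before its fields can be read. A minimal sketch, assuming `jq` is installed; `extract_json_content` is a hypothetical helper name:

```bash
# extract_json_content: read a non-streaming /api/chat response on stdin and
# print the JSON document the model produced, in compact form.
# Fails with a non-zero status if the content is not valid JSON.
# Assumption: jq is available on the PATH.
extract_json_content() {
  jq -c '.message.content | fromjson'
}

# Usage (the field name "mean" depends entirely on what the model returns):
#   curl -s ... | extract_json_content | jq '.mean'
```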
## Recommended Models
| Model | Use Case | Size |
|-------|----------|------|
| `llama3.2` | General chat and reasoning | 3B |
| `llama3.1:70b` | Complex reasoning tasks | 70B |
| `codellama` | Code generation and review | 7B |
| `nomic-embed-text` | Text embeddings for RAG | 137M |
| `mistral` | Fast general-purpose inference | 7B |
| `phi3` | Compact and efficient reasoning | 3.8B |
## Tips for AI Agents
- Always set `"stream": false` when you need to parse the complete response programmatically.
- Use `format: "json"` when you need structured output that's easy to parse.
- Check available models with `/api/tags` before making inference calls to avoid 404 errors.
- For embedding tasks, use `nomic-embed-text` or similar dedicated embedding models, not chat models.
- Lower `temperature` (0.1-0.3) for factual/deterministic tasks; raise it (0.7-1.0) for creative tasks.
- Set `num_predict` to limit response length and prevent runaway generation.
- Use the embeddings endpoint with Qdrant for building RAG (Retrieval-Augmented Generation) pipelines.
- Check Ollama health at `http://{{OLLAMA_HOST}}:{{OLLAMA_PORT}}/`, which returns "Ollama is running".