llama

Meta Llama open-source LLM family. Use for local AI.

7 stars

byG1Joshi

View on GitHub Installation ↓

Best use case

llama is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Meta Llama open-source LLM family. Use for local AI.

Teams using llama should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/llama/SKILL.md --create-dirs "https://raw.githubusercontent.com/G1Joshi/Agent-Skills/main/skills/ai-ml/llama/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/llama/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How llama Compares

Feature / Agent	llama	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Meta Llama open-source LLM family. Use for local AI.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Llama

Meta Llama is the king of Open Weights models. Llama 4 (2025) pushes 405B+ parameters, rivaling closed models like GPT-5.

## When to Use

- **Privacy**: Run it on your own VPC (AWS Bedrock, Azure, or self-hosted).
- **Fine-Tuning**: It is the default base model for fine-tuning on domain data.
- **Cost**: Inference on Groq/Together AI is significantly cheaper than GPT.

## Core Concepts

### Models

- **405B**: Frontier intelligence. Requires massive GPU clusters (or API).
- **70B**: The workhorse. Smart enough for most tasks.
- **8B**: Runs on a laptop (MacBook M3).

### Quantization

Running models at 4-bit or 8-bit precision to fit in VRAM with minimal quality loss (GGUF, EXL2).

### Llama Stack

Standardized tooling for building agentic apps on Llama.

## Best Practices (2025)

**Do**:

- **Use via API**: Groq (LPU) runs Llama Instantaneously (>1000 tok/s).
- **Fine-Tune 8B**: For specific tasks (classification, SQL generation), a fine-tuned 8B beats a generic 70B.

**Don't**:

- **Don't self-host 405B**: Unless you have 8xH100s. Use an API provider.

## References

- [Llama Website](https://www.llama.com/)