gemini-codebase-onboard

Use Gemini CLI's 1M token context to understand entire codebases in one pass. Full architecture mapping, pattern discovery, and onboarding documentation.

28 stars

byDNYoussef

View on GitHub Installation ↓

Best use case

gemini-codebase-onboard is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Use Gemini CLI's 1M token context to understand entire codebases in one pass. Full architecture mapping, pattern discovery, and onboarding documentation.

Teams using gemini-codebase-onboard should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/gemini-codebase-onboard/SKILL.md --create-dirs "https://raw.githubusercontent.com/DNYoussef/context-cascade/main/skills/platforms/gemini-codebase-onboard/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/gemini-codebase-onboard/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How gemini-codebase-onboard Compares

Feature / Agent	gemini-codebase-onboard	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Use Gemini CLI's 1M token context to understand entire codebases in one pass. Full architecture mapping, pattern discovery, and onboarding documentation.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

<!-- S0 META-IDENTITY -->

# Gemini Codebase Onboard Skill



---

## LIBRARY-FIRST PROTOCOL (MANDATORY)

**Before writing ANY code, you MUST check:**

### Step 1: Library Catalog
- Location: `.claude/library/catalog.json`
- If match >70%: REUSE or ADAPT

### Step 2: Patterns Guide
- Location: `.claude/docs/inventories/LIBRARY-PATTERNS-GUIDE.md`
- If pattern exists: FOLLOW documented approach

### Step 3: Existing Projects
- Location: `D:\Projects\*`
- If found: EXTRACT and adapt

### Decision Matrix
| Match | Action |
|-------|--------|
| Library >90% | REUSE directly |
| Library 70-90% | ADAPT minimally |
| Pattern exists | FOLLOW pattern |
| In project | EXTRACT |
| No match | BUILD (add to library after) |

---

## Kanitsal Cerceve (Evidential Frame Activation)
Kaynak dogrulama modu etkin.

## Purpose

Use Gemini CLI's massive 1 million token context window to analyze entire codebases in a single pass. Generate architecture maps, pattern documentation, and onboarding guides.

## When to Use This Skill

- Onboarding to a new codebase
- Understanding full system architecture
- Mapping dependencies across all files
- Finding patterns and anti-patterns
- Migration planning (need full context)
- Security audits requiring full codebase awareness
- Refactoring impact analysis

## When NOT to Use This Skill

- Working with single file (use Claude directly)
- Complex reasoning tasks (Claude is better)
- Writing new features (Gemini gets stuck in loops)
- Iterative refinement (use Codex)

## Workflow

### Phase 1: Megacontext Load

```bash
# Load entire codebase
./scripts/multi-model/gemini-yolo.sh "Analyze architecture" task-id megacontext

# Via delegate.sh
./scripts/multi-model/delegate.sh gemini "Map full architecture" --all-files

# Direct Gemini
bash -lc "gemini --all-files 'Analyze entire codebase and document architecture'"
```

### Phase 2: Analysis Queries

Common analysis patterns:
- Architecture overview
- Dependency mapping
- Pattern identification
- Security scanning
- Migration assessment

### Phase 3: Documentation Generation

1. Gemini produces findings
2. Claude synthesizes into documentation
3. Store in project for future reference

## Context Window Specs

| Metric | Value |
|--------|-------|
| Capacity | 1 million tokens |
| Equivalent | ~1,500 pages |
| Lines of Code | ~30,000 LOC |
| Best for | Projects under 30K LOC |

## Success Criteria

- Full architecture understood
- Key patterns documented
- Dependencies mapped
- Onboarding guide created

## Example Usage

### Example 1: New Project Onboarding

```text
User: "I just joined this team, help me understand the codebase"

Gemini Analysis:
bash -lc "gemini --all-files 'Create comprehensive onboarding guide including:
1. High-level architecture
2. Key components and their responsibilities
3. Data flow between components
4. Configuration and environment setup
5. Common patterns used'"

Output:
- Architecture: Microservices with API gateway
- Key Services: auth, users, orders, notifications
- Patterns: Repository pattern, Event-driven
- Setup: Docker Compose with 4 services
```

### Example 2: Refactoring Impact Analysis

```text
User: "What would break if we rename User to Account?"

Gemini Analysis:
bash -lc "gemini --all-files 'Identify all files affected by renaming User to Account'"

Output:
- 47 files with direct User references
- 12 database migrations needed
- 8 API endpoints returning User data
- 15 frontend components
- 3 background jobs
```

### Example 3: Security Audit

```text
User: "Find all places handling sensitive data"

Gemini Analysis:
bash -lc "gemini --all-files 'Security audit: Find all PII handling, auth patterns, and potential vulnerabilities'"

Output:
- PII fields: 12 database columns
- Auth: JWT with refresh tokens
- Concerns: 3 endpoints missing auth middleware
- Logging: 2 instances of sensitive data in logs
```

## Query Patterns

### Architecture Documentation
```
gemini --all-files "Document the full system architecture with component interactions"
```

### Dependency Mapping
```
gemini --all-files "Create a dependency graph showing how all modules relate"
```

### Pattern Analysis
```
gemini --all-files "Identify all design patterns used and assess consistency"
```

### API Documentation
```
gemini --all-files "Document all API endpoints with their request/response formats"
```

## Integration with Meta-Loop

```
META-LOOP PROPOSE PHASE:
    |
    +---> gemini-codebase-onboard
    |         |
    |         +---> Gemini: Load entire codebase (--all-files)
    |         +---> Gemini: Analyze architecture
    |         +---> Gemini: Identify impact areas
    |
    +---> Claude: Synthesize into plan
    |
    +---> Continue to IMPLEMENT phase
```

## Memory Integration

Results stored at:
- Key: `multi-model/gemini/onboard/{project}/{task_id}`
- Tags: WHO=gemini-megacontext, WHY=codebase-analysis
- Contains: Architecture map, patterns, dependencies

## Limitations

Based on real developer feedback:
- May generate errors in analysis (missing XML tags)
- Can get stuck in loops trying to fix mistakes
- Switches to Flash model after 5 minutes (Flash is weaker)
- Slower than Claude for complex reasoning
- Not great for implementation tasks

## Strengths

- Breadth of analysis is excellent
- Can summarize entire folders effectively
- Great for onboarding and auditing
- Powerful for architectural understanding

## Related Skills

- `multi-model-discovery`: Find existing solutions
- `gemini-research`: Current info research
- `codex-iterative-fix`: After understanding, for implementation
- `documentation`: Generate docs from analysis

<!-- S4 SUCCESS CRITERIA -->

## Verification Checklist

- [ ] Entire codebase loaded
- [ ] Architecture documented
- [ ] Key patterns identified
- [ ] Dependencies mapped
- [ ] Onboarding guide created
- [ ] Memory-MCP updated

<!-- PROMISE -->

[commit|confident] <promise>GEMINI_CODEBASE_ONBOARD_COMPLETE</promise>

Related Skills

gemini-research

from DNYoussef/context-cascade

Use Gemini CLI for research with Google Search grounding and 1M token context

/============================================================================/

from DNYoussef/context-cascade

/* SKILL SKILL :: VERILINGUA x VERIX EDITION */

web-scraping

from DNYoussef/context-cascade

Structured data extraction from web pages using claude-in-chrome MCP with sequential-thinking planning. Focus on READ operations, data transformation, and pagination handling for multi-page extraction.

visual-testing

from DNYoussef/context-cascade

Screenshot-based visual comparison and regression testing using claude-in-chrome MCP. Captures, compares, and validates UI states to detect layout shifts, visual bugs, and design regressions across viewports.

reflect

from DNYoussef/context-cascade

Extract learnings from session corrections and patterns, update skill files with persistent memory. Implements Loop 1.5 - per-session micro-learning between execution and meta-optimization.

fix-bug

from DNYoussef/context-cascade

Fix bug command

e2e-test

from DNYoussef/context-cascade

End-to-end testing workflow for validating complete user journeys through web applications using claude-in-chrome MCP. Specializes in test assertions, suite organization, evidence collection, and pass/fail reporting.

build-feature

from DNYoussef/context-cascade

Build feature command

browser-automation

from DNYoussef/context-cascade

Complex browser automation workflow using claude-in-chrome MCP with mandatory sequential-thinking planning. Use when automating multi-step web interactions, form filling, navigation sequences, or web scraping.

image-gen

from DNYoussef/context-cascade

Modular image generation - supports local SDXL Lightning, OpenAI DALL-E, Replicate, or custom providers

reverse-engineering-quick-triage

from DNYoussef/context-cascade

Fast binary analysis with string reconnaissance and static disassembly\ \ (RE Levels 1-2). Use when triaging suspicious binaries, extracting IOCs quickly,\ \ or performing initial malware analysis. Completes in \u22642 hours with automated\ \ decision gates."

reverse-engineering-firmware-analysis

from DNYoussef/context-cascade

Firmware extraction and IoT security analysis (RE Level 5) for routers and embedded systems. Use when analyzing IoT firmware, extracting embedded filesystems (SquashFS/JFFS2/CramFS), finding hardcoded credentials, performing CVE scans, or auditing embedded system security. Handles encrypted firmware with known decryption schemes. Completes in 2-8 hours with binwalk+firmadyne+QEMU emulation.

gemini-codebase-onboard

Best use case

When to use this skill

When not to use this skill

Installation

How gemini-codebase-onboard Compares

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

SKILL.md Source

Related Skills

gemini-research

/*============================================================================*/

web-scraping

visual-testing

reflect

fix-bug

e2e-test

build-feature

browser-automation

image-gen

reverse-engineering-quick-triage

reverse-engineering-firmware-analysis

/============================================================================/