Codex

memory-ingest

Ingest a source into any consumer's semantic memory by reading the topology contract

104 stars

Best use case

memory-ingest is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using memory-ingest should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/memory-ingest/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/memory-ingest/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/memory-ingest/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How memory-ingest Compares

| Feature / Agent | memory-ingest | Standard Approach |
| --- | --- | --- |
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Ingest a source into any consumer's semantic memory by reading the topology contract

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# memory-ingest

Ingest an external source into a consumer framework's semantic memory. Reads the consumer's `memory.topology` contract to know where pages live, then extracts, summarizes, integrates, and cross-references — all topology-agnostic.

## When to Use

When new knowledge (a document, paper, URL, config file, or directory of files) needs to enter a consumer's semantic memory. This is the primary write path for external information.

## Parameters

### source (required)
Path to the source material. Supports: markdown (`.md`), PDF (`.pdf`), HTML (`.html`), YAML (`.yaml`/`.yml`), JSON (`.json`), a directory of files, or a URL.

### --consumer (optional)
Consumer ID to ingest into. Resolved via ADR-021 D4 precedence:
1. **Explicit** — `--consumer research-complete`
2. **Wrapper** — set by a calling skill or orchestrator
3. **Auto-detect** — cwd detection or active framework in `.aiwg/frameworks/registry.json`

### --dry-run (optional)
Preview what would be created/modified without writing any files. Outputs the planned page list, cross-references, and contradiction flags.

### --non-interactive (optional)
Skip the discussion step and proceed directly to extraction and page writing. Use for batch ingestion or CI pipelines.

## Operation

### 1. Resolve consumer

Determine which consumer's memory to target using ADR-021 D4 precedence. Fail with a clear error if no consumer can be resolved.
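
The precedence order can be sketched as a small function. This is a minimal illustration, not the skill's actual implementation: `resolve_consumer`, `ConsumerResolutionError`, and the `available` list are hypothetical names, and how the wrapper and auto-detected values are obtained is left to the caller.

```python
from typing import Optional, Sequence


class ConsumerResolutionError(Exception):
    """Raised when no consumer can be resolved (hypothetical error type)."""


def resolve_consumer(
    explicit: Optional[str],
    wrapper: Optional[str],
    detected: Optional[str],
    available: Sequence[str],
) -> str:
    """Return the first consumer ID found, checking explicit, wrapper, auto-detect."""
    for candidate in (explicit, wrapper, detected):
        if candidate:
            if candidate not in available:
                raise ConsumerResolutionError(
                    f"Consumer '{candidate}' not found. Available: {', '.join(available)}"
                )
            return candidate
    # Nothing was supplied at any level: fail with an actionable message.
    raise ConsumerResolutionError(
        "No consumer could be resolved. Pass --consumer with one of: "
        + ", ".join(available)
    )
```

An explicit value always wins, so a caller can override whatever a wrapper or the working directory would otherwise select.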

### 2. Load schema

Read `memory.topology` from the consumer's `manifest.json`. Extract:
- `rootDir` — base path for all memory pages
- `derivedPages.summary` — where summary pages are written
- `pageTemplate` — structure the summary must conform to
- `crossRefStyle` — how cross-references are formatted (e.g., wiki-links, markdown links)
- `indexPath` — location of the consumer's memory index
- `log` — path to `.log.jsonl`
- `ingestRequires` — optional list of required post-ingest actions (e.g., `"provenance"`)
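
A rough sketch of loading these fields follows. It assumes that `memory.topology` is a `topology` object nested under a `memory` key in `manifest.json`, which this document does not confirm. `load_topology`, `REQUIRED_FIELDS`, and the fallback values in `ASSUMED_DEFAULTS` are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path

# Fields listed above; ingestRequires is optional, so it is not required here.
REQUIRED_FIELDS = ("rootDir", "derivedPages", "pageTemplate", "crossRefStyle", "indexPath", "log")

# Hypothetical fallback values, chosen only for illustration.
ASSUMED_DEFAULTS = {"crossRefStyle": "markdown", "log": ".log.jsonl"}


def load_topology(manifest_path):
    """Return (topology, missing_fields) read from a consumer manifest."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    # Assumption: "memory.topology" means a "topology" object nested under "memory".
    topology = dict(manifest.get("memory", {}).get("topology", {}))
    missing = [name for name in REQUIRED_FIELDS if name not in topology]
    for name in missing:
        # Fill in a default where one is assumed; other gaps stay missing.
        if name in ASSUMED_DEFAULTS:
            topology[name] = ASSUMED_DEFAULTS[name]
    topology.setdefault("ingestRequires", [])
    return topology, missing
```

Returning the list of missing fields lets the caller warn about them and record the gap, rather than failing outright.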

### 3. Read source

Parse the source material based on type:
- **Markdown/HTML** — extract text, headings, and structure
- **PDF** — extract text content (use page ranges for large documents)
- **YAML/JSON** — parse structured data, identify key entities
- **Directory** — recursively read all supported files, treating each as a sub-source
- **URL** — fetch content, then parse based on content type
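
Choosing a parser by source type might look like the sketch below. `classify_source` and `EXTENSION_KINDS` are hypothetical names; the extension list mirrors the formats named in the Parameters section, and treating a trailing slash as a directory is an assumption made here for convenience.

```python
from pathlib import Path
from urllib.parse import urlparse

# File extensions named in the Parameters section, mapped to a parser kind.
EXTENSION_KINDS = {
    ".md": "markdown",
    ".pdf": "pdf",
    ".html": "html",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".json": "json",
}


def classify_source(source: str) -> str:
    """Return which kind of parsing a source string calls for."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    path = Path(source)
    # Assumption: a trailing slash marks a directory even if it does not exist yet.
    if source.endswith("/") or path.is_dir():
        return "directory"
    kind = EXTENSION_KINDS.get(path.suffix.lower())
    if kind is None:
        raise ValueError(f"Unsupported source type: {source}")
    return kind
```

A URL still needs a second dispatch on the fetched content type, which this sketch leaves out.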

### 4. Discuss (interactive default)

**Default behavior** (no `--non-interactive` flag):
1. Present a concise summary of the source to the user
2. Highlight key takeaways, entities, and concepts found
3. Ask the user what to emphasize, de-prioritize, or reframe
4. Incorporate user guidance into the extraction strategy

This discussion-first pattern ensures the memory reflects human judgment, not just mechanical extraction.

### 5. Extract and summarize

Use an LLM to produce a structured summary conforming to the consumer's `pageTemplate`. The summary captures:
- Key claims and findings
- Named entities (people, systems, concepts)
- Relationships between entities
- Source metadata (title, author, date, URI)

### 6. Integrate

- **Write summary page** to `derivedPages.summary` path
- **Update entity/concept pages** — for each entity or concept mentioned, update or create the relevant page under the consumer's entity directory, adding the new information with source attribution
- **Insert cross-references** — link the summary page to entity pages and vice versa, using the consumer's `crossRefStyle`
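
As an illustration of the cross-reference step, the sketch below renders links in two styles and pairs each outgoing link with a backlink. The style names `"wiki"` and `"markdown"` are assumed values for `crossRefStyle`, and both function names are hypothetical.

```python
def format_cross_ref(title, target_path, style):
    """Render one link in the given style ("wiki" and "markdown" are assumed names)."""
    if style == "wiki":
        return f"[[{title}]]"
    if style == "markdown":
        return f"[{title}]({target_path})"
    raise ValueError(f"Unknown crossRefStyle: {style!r}")


def bidirectional_links(summary, entities, style):
    """Map each page path to the links that should be added to it.

    `summary` and each item in `entities` are (title, path) tuples.
    """
    summary_title, summary_path = summary
    links = {summary_path: []}
    for entity_title, entity_path in entities:
        # The summary page links out to the entity page...
        links[summary_path].append(format_cross_ref(entity_title, entity_path, style))
        # ...and the entity page links back to the summary.
        links.setdefault(entity_path, []).append(
            format_cross_ref(summary_title, summary_path, style)
        )
    return links
```

Keeping the formatting in one function means the rest of the ingest logic never needs to know which link style a consumer uses.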

### 7. Contradiction detection

Compare new claims against existing pages. When a contradiction is found:
- **Flag inline** on the affected existing page using a callout:
  ```markdown
  > [!contradiction]
  > Source "paper.pdf" (2026-04-14) claims X, but this page states Y.
  > Ingested via memory-ingest — awaiting human resolution.
  ```
- **Log the contradiction** in `.log.jsonl` with `"contradictions"` count and details
- **Do not auto-resolve** — surface contradictions for human judgment
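
The callout shown above could be generated with a helper along these lines. `contradiction_callout` is a hypothetical name, and the sketch only formats text: it does not detect contradictions or decide which claim is correct.

```python
from datetime import date


def contradiction_callout(source: str, ingested_on: date, new_claim: str, existing_claim: str) -> str:
    """Build the inline callout that flags a contradiction on an existing page."""
    lines = [
        "> [!contradiction]",
        f'> Source "{source}" ({ingested_on.isoformat()}) claims {new_claim}, '
        f"but this page states {existing_claim}.",
        # \u2014 is the dash character used in the callout template above.
        "> Ingested via memory-ingest \u2014 awaiting human resolution.",
    ]
    return "\n".join(lines)
```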

### 8. Update index

Regenerate the entry for the new summary page in the consumer's index at `indexPath`. Include title, source reference, date, and cross-ref targets.

### 9. Append log

Call `memory-log-append` with:
```
--consumer <resolved> --op ingest --data '{"source":"<path>","pages_touched":[...],"contradictions":<n>,"cross_refs_added":<n>}'
```
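
One way to assemble those arguments without hand-escaping the JSON is sketched below. `build_log_append_args` is a hypothetical helper; how `memory-log-append` is actually invoked is outside what this document specifies.

```python
import json


def build_log_append_args(consumer, source, pages_touched, contradictions, cross_refs_added):
    """Return the argument list for a memory-log-append call with op=ingest."""
    payload = {
        "source": source,
        "pages_touched": list(pages_touched),
        "contradictions": contradictions,
        "cross_refs_added": cross_refs_added,
    }
    # json.dumps handles quoting and escaping inside the --data value.
    return ["--consumer", consumer, "--op", "ingest", "--data", json.dumps(payload)]
```

Passing the arguments as a list, rather than one interpolated string, avoids quoting problems when a source path contains spaces or quotes.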

### 10. Optional provenance

If `ingestRequires` includes `"provenance"`, create a W3C PROV record documenting:
- `prov:Entity` — the new summary page
- `prov:Activity` — the ingest operation
- `prov:wasDerivedFrom` — the source material
- `prov:wasGeneratedBy` — this skill invocation
- `prov:wasAttributedTo` — the actor (model + user)
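
The record's contents can be pictured as a plain dictionary whose keys mirror the five terms above. This is only a sketch: it is not claimed to be valid PROV-JSON, the format that `provenance-create` actually writes is not specified here, and `needs_provenance` and `build_prov_record` are hypothetical names.

```python
def needs_provenance(topology):
    """True when the topology lists "provenance" under ingestRequires."""
    return "provenance" in topology.get("ingestRequires", [])


def build_prov_record(summary_page, source, activity_id, model, user):
    """Collect the five provenance facts listed above into one dictionary."""
    return {
        "prov:Entity": summary_page,           # the new summary page
        "prov:Activity": activity_id,          # the ingest operation
        "prov:wasDerivedFrom": source,         # the source material
        "prov:wasGeneratedBy": activity_id,    # this skill invocation
        "prov:wasAttributedTo": {"model": model, "user": user},
    }
```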

### 11. Report

Output a summary:
- Pages created or updated (with paths)
- Contradictions flagged (count and locations)
- Cross-references added (count)
- Provenance record path (if created)

## Error Handling

- **Consumer not found** — fail with actionable message listing available consumers
- **Source unreadable** — fail with format-specific guidance (e.g., "PDF extraction requires the Read tool with page ranges")
- **Schema missing fields** — warn and use sensible defaults; log the gap for `memory-lint` to catch
- **Log write failure** — non-blocking; report primary operation result regardless
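
The non-blocking log rule can be sketched as follows. `run_ingest` and `append_log` are hypothetical callables standing in for the primary operation and the log write, and catching `OSError` is an assumption about how a log write would fail.

```python
def ingest_with_nonblocking_log(run_ingest, append_log):
    """Run the ingest, then try to log it; a log failure never hides the result."""
    result = run_ingest()
    try:
        append_log(result)
        log_error = None
    except OSError as exc:
        # Assumption: log write problems surface as OSError (disk, permissions).
        log_error = str(exc)
    return result, log_error
```

The caller reports `result` in every case and mentions `log_error` only when it is not `None`.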

## Examples

```
# Interactive ingest of a research paper
memory-ingest docs/papers/distributed-consensus.pdf --consumer research-complete

# Batch ingest a directory of meeting notes
memory-ingest .aiwg/working/meeting-notes/ --consumer sdlc-complete --non-interactive

# Dry run to preview what would change
memory-ingest https://example.com/api-spec.html --consumer sdlc-complete --dry-run

# Explicit consumer override
memory-ingest design-doc.md --consumer media-marketing-kit --non-interactive
```

## Related Skills

- `memory-log-append` — log write primitive (called in step 9)
- `memory-lint` — validates memory page structure and cross-ref integrity
- `memory-query-capture` — captures query patterns for memory optimization
- `provenance-create` — W3C PROV record creation (called in step 10 when required)

Related Skills

ralph-memory

104
from jmagly/aiwg

Manage AI semantic memory entries — list, query, and clear lessons learned across loop iterations

Codex

memory-query-capture

104
from jmagly/aiwg

Capture query synthesis as durable pages in semantic memory

Codex

memory-log-render

104
from jmagly/aiwg

Generate a human-readable Markdown view from a consumer's JSON Lines event log

Codex

memory-log-append

104
from jmagly/aiwg

Append a structured event to a consumer's semantic memory log

Codex

memory-forensics

104
from jmagly/aiwg

Volatility 3 memory forensics workflows covering acquisition with LiME and WinPmem, and structured analysis using Volatility 3 plugin reference

Codex

kb-ingest

104
from jmagly/aiwg

Ingest a source (URL, file, or freeform note) into the knowledge base. Creates a source summary and updates or creates relevant entity and concept pages.

Codex

grade-on-ingest

104
from jmagly/aiwg

Trigger GRADE quality assessment automatically when new research sources or findings enter the corpus

Codex

debug-memory

104
from jmagly/aiwg

Query and manage the executable feedback debug memory

Codex

aiwg-orchestrate

104
from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost

venv-manager

104
from jmagly/aiwg

Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.

pytest-runner

104
from jmagly/aiwg

Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.

vitest-runner

104
from jmagly/aiwg

Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.