
open source · local-first

Local code context for AI agents, priced honestly.

A local MCP server and CLI that indexes a repo, stores team memory locally, and gives coding agents bounded files, symbols, and notes before they edit.

Claude Code · one command

claude mcp add codebase-context -- npx -y codebase-context
20 frozen tasks
100 retrieval rows
99 officially scored
66× indexing cost delta

narrow case study · not a leaderboard

problem

Agents can drown in context

  • Agents often start by opening half the repo before they know what matters.
  • Memory turns into noise when every question gets the same notes.
  • Search tools can sound confident even when they did not retrieve enough code to edit safely.
  • Most benchmark writeups bury the install pain, indexing time, hardware assumptions, token bill, and failed setup paths.

product

What the tool gives the agent

  • Local search that combines text matches with semantic retrieval
  • AST-aligned chunks tied to real source structure
  • Symbol references, imports, and project metadata
  • Team memory and conventions stored locally
  • CLI workflows for indexing, maps, and memory
  • MCP server interface for coding agents
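One common way to combine text matches with semantic retrieval is reciprocal rank fusion (RRF). The sketch below illustrates that general technique only; it is not taken from codebase-context, and the function name, the chunk ids, and the constant `k = 60` are all assumptions made for illustration.

```typescript
// Hypothetical sketch: reciprocal rank fusion (RRF) over two ranked lists.
// Not the tool's actual ranking code; names and the constant k are assumptions.

type Ranked = string[]; // chunk ids, best match first

function reciprocalRankFusion(lists: Ranked[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      // A chunk earns 1 / (k + rank) from each list it appears in (rank starts at 1).
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative inputs: one list from text search, one from semantic search.
const textHits: Ranked = ["auth.ts#login", "session.ts#create", "db.ts#connect"];
const semanticHits: Ranked = ["auth.ts#login", "token.ts#verify", "session.ts#create"];

const fused = reciprocalRankFusion([textHits, semanticHits]);
console.log(fused.map(([id]) => id));
```

Chunks found by both retrievers rise to the top, while a chunk found by only one still stays in the list, which is the property a hybrid search needs.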

engineering

The tool had to fit normal dev machines

Local-first default

Code stays on the machine unless the user explicitly opts into a cloud provider.

CPU by default

The tool has to run on a normal dev machine before any optional cloud profile is considered.

Framework-neutral core

Specialized analyzers self-register instead of leaking framework rules into shared code.
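A minimal sketch of what self-registration can look like, assuming a simple in-memory registry. The `Analyzer` interface, `registerAnalyzer`, `analyzeFile`, and the React example are hypothetical names chosen for illustration, not the project's actual API.

```typescript
// Hypothetical analyzer registry; the interface and names are illustrative only.
interface Analyzer {
  name: string;
  // Returns true when this analyzer should handle the file.
  matches(filePath: string): boolean;
  // Returns framework-specific metadata for the file.
  analyze(source: string): Record<string, string>;
}

const registry: Analyzer[] = [];

function registerAnalyzer(analyzer: Analyzer): void {
  registry.push(analyzer);
}

// The shared core only consults the registry; it never names a framework.
function analyzeFile(filePath: string, source: string): Record<string, string> {
  const analyzer = registry.find((a) => a.matches(filePath));
  return analyzer ? analyzer.analyze(source) : { kind: "generic" };
}

// A specialized analyzer registers itself from its own module.
registerAnalyzer({
  name: "react-component",
  matches: (filePath) => filePath.endsWith(".tsx"),
  analyze: (source) => ({
    kind: "react-component",
    usesState: String(source.includes("useState")),
  }),
});

console.log(analyzeFile("Button.tsx", "const [open, setOpen] = useState(false);"));
console.log(analyzeFile("util.py", "def helper(): pass"));
```

The design choice is that adding a framework means adding a module that calls `registerAnalyzer`, with no edits to `analyzeFile`.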

Claim discipline

the write-up says what the run showed and blocks claims the evidence cannot support.

cost

Context is not free

A higher retrieval score is not enough if the tool takes forever to install, needs a hosted service, burns the token budget, or fails before the evaluator can judge the result.

  • How long does install and first setup take?
  • How long does the first index take?
  • How fast is a normal query after indexing?
  • How much disk and cache footprint remains?
  • How many tokens reach the model?
  • What did the benchmark orchestration itself cost?
  • What failed before quality scoring?
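The questions above can be sketched as one record per tool, so that a missing measurement stays visible instead of being silently skipped. The field names below are hypothetical and do not reflect the benchmark's real schema; the only figure reused from this page is the 750 s median index time reported for Codebase Context.

```typescript
// Hypothetical cost record; field names are illustrative, not the project's schema.
interface CostReport {
  installSeconds?: number;        // install and first setup
  firstIndexSeconds?: number;     // first index
  medianQueryMs?: number;         // normal query after indexing
  diskBytes?: number;             // disk and cache footprint
  tokensToModel?: number;         // tokens that reach the model
  orchestrationCostUsd?: number;  // cost of the benchmark orchestration itself
  preScoringFailures?: string[];  // what failed before quality scoring
}

const REQUIRED: Array<keyof CostReport> = [
  "installSeconds",
  "firstIndexSeconds",
  "medianQueryMs",
  "diskBytes",
  "tokensToModel",
  "orchestrationCostUsd",
  "preScoringFailures",
];

// Lists the questions that have no recorded answer yet.
function missingMeasurements(report: CostReport): Array<keyof CostReport> {
  return REQUIRED.filter((key) => report[key] === undefined);
}

// Example: only two of the seven questions are answered so far.
const partial: CostReport = { firstIndexSeconds: 750, preScoringFailures: [] };
console.log(missingMeasurements(partial));
```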

evidence

One honest run

what the run showed

Close retrieval. Very different indexing cost.

Full benchmark

jCodeMunch

27.1% recall
11 s median index

Codebase Context

25.7% recall
750 s median index

1.4 pp recall gap. 66× indexing cost difference.

100 rows · 20 frozen open-source tasks · narrow case study, not a winner claim
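The two summary figures can be recomputed from the rounded values on the cards. A small sketch, with hypothetical variable names: the recall gap reproduces exactly, while the rounded medians give a ratio of about 68×, so the quoted 66× presumably comes from unrounded timings. That last point is an assumption, not something this page states.

```typescript
// Values copied from the cards; variable names are illustrative.
const jCodeMunch = { recallPct: 27.1, medianIndexSeconds: 11 };
const codebaseContext = { recallPct: 25.7, medianIndexSeconds: 750 };

// Recall gap in percentage points, rounded to one decimal to avoid float noise.
const recallGapPp =
  Math.round((jCodeMunch.recallPct - codebaseContext.recallPct) * 10) / 10;

// Ratio of median index times, computed from the rounded medians only.
const indexRatio =
  codebaseContext.medianIndexSeconds / jCodeMunch.medianIndexSeconds;

console.log(`${recallGapPp} pp recall gap`);
console.log(`${indexRatio.toFixed(1)}x index time ratio from rounded medians`);
```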

limits

What I will not claim

  • It does not show that codebase-context writes better patches.
  • It does not show that developers ship faster.
  • It does not prove codebase-context is generally superior.
  • It does not prove repeat stability because this run has one repeat per task and lane.
  • It does not prove provider-dollar or normalized clean-install cost.
  • The current evidence supports a narrow retrieval-overlap engineering note, not a winner table.

roadmap

Before stronger claims

  1. Normalize cold install, dependency cache, and index reuse before making cost-efficiency claims.
  2. Decide whether provider-dollar spend is out of scope or explicitly measured.
  3. Collect enough repeated judged runs per tool/task before stability or confidence claims.
  4. Improve Codebase Context cold indexing and exact code-range matching, then rerun the same frozen tasks.
  5. Keep GrepAI in the appendix unless the owner reopens a capped readiness run.

local-first · open source

MCP server and CLI · runs fully offline