I built a semantic search tool. Codex and Claude agree it’s effective.


Introduction

When an AI agent explores a codebase, it needs to acquire the right context to be effective. There are three complementary ways to do so:

  • Exact text with rg (ripgrep) — fastest for literals and identifiers.
  • Structure‑aware with ast-grep — precise for declarations and pattern audits.
  • Semantic search with embeddings — intent‑level queries like “user settings,” “retry logic,” or “transactional sync.”

Here is a side-by-side comparison, searching for user settings:

# EXACT SEARCH: look for settings and possible synonyms
rg -i "settings|preferences|defaults|configuration" .

# SYNTAX-AWARE SEARCH: look for function declarations that take a settings parameter
ast-grep -p "func $FUNC(settings)" --lang swift .

# SEMANTIC SEARCH
semly locate "user settings"

*I published semly last week; I’m not aware of a similar tool.

Note the string of synonyms in the rg command. Agents know that settings may appear under multiple names, so they search for alternatives. This is highly effective, but not as good as semantic search. Semantic search finds code by what it does, not just what it’s called: if you search for “retry logic,” it will find functions named attemptWithBackoff or handleFailureAndRetry, even though those names don’t contain the word “retry.”

A typical result:

  • rg finds 47 matches across 12 files (lots of noise)
  • ast-grep finds 3 function declarations (precise but misses conceptual matches)
  • semly finds 8 relevant results ranked by semantic similarity

Agent Evaluation

I wrote this tool specifically to speed up project exploration after spawning a new agent. Here is a quick test you can give your agent to confirm its utility:

compare the effectiveness of rg, ast-grep, and semly finding content in this swift project

Results on my machine:

  • Claude Opus executed 6 different tests. Full evaluation.
  • Codex (GPT-5 high) ran “a few representative searches”. Full evaluation.

Both reached this conclusion: ripgrep for exact matches, ast-grep for structure, Semly for meaning. Finally, I asked GPT-5 to summarize both evaluations in a table comparing all three tools:


ripgrep (rg)

Strengths:

  • ⚡ Extremely fast (~0.02–0.04s for most queries)
  • Excellent regex support
  • Found 122 class definitions accurately
  • Great for literal text searches and pattern matching
  • Low resource usage

Weaknesses:

  • No semantic understanding
  • Can't understand code structure or intent
  • Limited to text patterns

ast-grep

Strengths:

  • Structure-aware searching
  • Can find specific code patterns
  • Language-aware parsing

Weaknesses:

  • Slower than ripgrep (~0.05s+)
  • Limited Swift support
  • Pattern syntax errors with complex queries
  • Struggled with Swift-specific syntax like async functions

Semly

Strengths:

  • 🧠 Semantic understanding – finds conceptually related code
  • Excellent for high-level queries ("error handling", "UI components")
  • Returns contextual previews
  • Finds relevant documentation alongside code
  • Perfect for exploring unfamiliar codebases

Weaknesses:

  • Requires indexing upfront
  • Not suitable for exact text matching
  • Slower for simple searches

Verdict

Each tool excels in different scenarios:

  • Use ripgrep for fast text searches, refactoring, finding specific strings.
  • Use ast-grep for structural refactoring (better with languages other than Swift).
  • Use Semly for understanding code architecture, finding related functionality, and exploring concepts.

For this Swift project specifically, ripgrep + Semly is the optimal combination, with ripgrep handling precise searches and Semly providing semantic understanding.


Summary

  • rg: Fastest for literal/regex searches; great for exact identifiers, logs, TODOs, quick scoping.
  • ast-grep: Structure-aware (AST) precision; best for exact declarations, audits, and safe codemods.
  • semly: Intent-level (natural language) across code+docs; best for discovery/navigation and “what/where/how” questions.

Example 1 — scanForChanges

  • rg (decls+calls): quickly surfaces definitions and uses.
    rg -n "scanForChanges\(" Sources

    Examples: Sources/Semly/Services/FileTracking/FileTracker.swift:15, Sources/Semly/Services/CodeAnalysis/CodeAnalysisCoordinator.swift:107, call sites in coordinators/routers.

  • ast-grep (declaration, precise):
    ast-grep run -l swift -p 'func scanForChanges(' Sources

    Pinpoints declaration: Sources/Semly/Services/FileTracking/FileTracker.swift:15

  • semly (navigate by intent):
    semly query "scanForChanges FileTracker" --project Semly --ext swift --limit 5 --mode locate

    Returns FileTracker.swift plus relevant pipeline/coordinator contexts for quick jumping.

Example 2 — Long Identifier

  • Identifier: mergeLeadingImportChunkIfPresent
  • rg: finds both call and declaration fast.

    Sources/Semly/Threading/AnalysisPipeline/ProjectAnalysisPipelineV2.swift:803 (call)

    Sources/Semly/Threading/AnalysisPipeline/ProjectAnalysisPipelineV2.swift:836 (decl)

  • ast-grep (declaration, structural):
    ast-grep run -l swift -p 'func mergeLeadingImportChunkIfPresent(' Sources

    Pinpoints decl: Sources/Semly/Threading/AnalysisPipeline/ProjectAnalysisPipelineV2.swift:836

  • semly (deterministic locate by symbol):
    semly query "mergeLeadingImportChunkIfPresent" --project Semly --ext swift --limit 5 --mode locate

    Surfaces the declaration in ProjectAnalysisPipelineV2.swift at the top of the results.

Example 3 — Concept-Level “Where is transactional sync performed?”

  • Core idea: syncing chunk+embedding results in a DB transaction.
  • semly (NL → code):
    semly query "Where is the transactional sync performed for chunks and embeddings?" --project Semly --ext swift --limit 6

    Points to DatabaseProjectAdapter.syncFileAnalysisInTransaction and pipeline references.

    Key file: Sources/Semly/Services/ProjectRegistry/DatabaseProjectAdapter.swift:240

  • rg (once you know the API):
    rg -n "syncFileAnalysisInTransaction" Sources

    Example: Sources/Semly/Services/ProjectRegistry/DatabaseProjectAdapter.swift:240

  • ast-grep (calls, if you craft patterns): good for auditing member/free/await/try forms; requires precise patterns and/or rules.

Strengths & Limits

  • rg
    • Pros: lightning fast, zero setup, great for exact strings and scoping.
    • Cons: no structure; noisy for conceptual questions.
  • ast-grep
    • Pros: language-aware precision; excellent for declarations and targeted replacements/audits.
    • Cons: needs correct patterns/rules per language; not for vague/NL queries.
  • semly
    • Pros: semantic discovery; cross code+docs; strong for “what/where/how” and long identifiers.
    • Cons: needs indexing; broad NL may prioritize docs unless you steer to code (--ext swift helps).

Practical Workflow

  • Use semly first to discover and jump (intent-level).
  • Use rg to quickly scope and enumerate concrete hits.
  • Use ast-grep for exact structural matches and codemods.

GPT-5 summary (Tool / Strengths / Weaknesses / Verdict):

  • ripgrep. Strengths: fast (~0.02–0.04s), regex support, literal matches, low resource usage. Weaknesses: no awareness of code structure, intent, or semantics. Verdict: best for exact identifiers, logs, TODOs, and quick scoping.
  • ast-grep. Strengths: structure-aware (AST) searching, finds specific code patterns, language-aware parsing. Weaknesses: slower than rg (~0.05s+), limited Swift support, errors with complex patterns, struggles with Swift async syntax. Verdict: best for structural queries, declarations, and safe codemods.
  • Semly. Strengths: semantic understanding, excellent for high-level queries, returns contextual previews, surfaces relevant docs + code, great for unfamiliar codebases. Weaknesses: requires upfront indexing, not optimal for exact string matches, slower for trivial searches. Verdict: best for conceptual exploration, intent-level navigation, and discovery.

Embeddings Primer

When you search for “user preferences,” you want to find code about settings, configuration, and defaults—even if those exact words aren’t present. Traditional search finds only exact text matches. Semantic search finds meaning.

Here’s a quick mental model of semantic search. Imagine plotting words on a map where similar words sit close together. “duck” and “chicken” would be neighbors. “car” would be far away. If your query is “chicken,” nearby words (“duck,” “goose”) are retrieved.

Each word’s position on this map is stored as a pair of numbers: its x and y coordinates. These numbers are called embeddings. In reality, we use 300–1500 numbers instead of just 2, but the principle is the same: similar meanings = similar numbers.
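
To make the map concrete, here is a toy sketch in Swift. The words and coordinates are invented purely for illustration (real embeddings have hundreds of dimensions), but the arithmetic is the real one: cosine similarity between vectors.

// Toy 2D "embeddings"; the coordinates are made up for illustration.
let embeddings: [String: [Double]] = [
    "duck":    [0.90, 0.80],
    "chicken": [0.85, 0.75],
    "goose":   [0.80, 0.90],
    "car":     [-0.70, 0.10],
]

// Cosine similarity: close to 1.0 means the vectors point the same way, i.e. similar meaning.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    return dot / (normA * normB)
}

let query = embeddings["chicken"]!
for (word, vector) in embeddings where word != "chicken" {
    print(word, cosine(query, vector))   // duck and goose score near 1.0, car scores far lower
}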

The semantic search workflow is:

  1. Encode information into numbers
  2. Encode query into numbers
  3. Retrieve information similar to your query

The same principle applies to single words or to 200-line chunks of text. The codebase indexing flow (sketched in code after the list) would be:

  1. Split into chunks.
  2. Compute embeddings.
  3. Store chunks with metadata (file, line).
  4. Encode the query as an embedding.
  5. Compare via “cosine similarity”.
  6. Return ranked, relevant chunks.
  7. Optionally, hand results to an LLM to interpret the answer.
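
A minimal Swift sketch of that flow. The Chunk type and the embed() stub are hypothetical stand-ins for the real parser and model; only the shape of the pipeline is the point.

// Hypothetical chunk record: text plus the metadata needed to jump to it.
struct Chunk {
    let file: String
    let line: Int
    let text: String
    let embedding: [Double]
}

// Stand-in for the real embedding model (e.g. MiniLM running on-device).
func embed(_ text: String) -> [Double] {
    fatalError("run the embedding model here")
}

// Same cosine similarity as in the toy example above.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    return dot / (normA * normB)
}

// Steps 1-3: split files into chunks, embed them, keep file/line metadata.
var index: [Chunk] = []

// Steps 4-6: embed the query, compare, return the best-ranked chunks.
func search(_ query: String, topK: Int = 8) -> [Chunk] {
    let q = embed(query)
    let ranked = index.sorted { cosine($0.embedding, q) > cosine($1.embedding, q) }
    return Array(ranked.prefix(topK))
}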

Here is additional background.

Technical Details

Parsing Strategy

To parse Markdown and Swift I use swift-markdown and SwiftSyntax. I only support those two languages, but I think Tree-sitter would make multi-language support straightforward.

These libraries tell me the structure, so I can split a file while keeping related elements together: headings stay with their content, code examples remain intact, and so on. In other words, I split along semantic boundaries. The goal is fine-grained chunks that differentiate components while preserving local reasoning.
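
To illustrate what splitting along semantic boundaries can look like for Markdown, here is a simplified sketch using swift-markdown. It is not Semly’s actual chunker; it just starts a new chunk at every heading, which already keeps headings with their content and leaves code blocks whole because each is a single node.

import Markdown

// Simplified chunker: start a new chunk at every top-level heading.
func chunkMarkdown(_ source: String) -> [String] {
    let document = Document(parsing: source)
    var chunks: [String] = []
    var current: [String] = []

    for child in document.children {
        if child is Heading, !current.isEmpty {
            chunks.append(current.joined(separator: "\n\n"))
            current = []
        }
        current.append(child.format())   // re-serialize the node as Markdown
    }
    if !current.isEmpty {
        chunks.append(current.joined(separator: "\n\n"))
    }
    return chunks
}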

Embedding Model Choice

I use MiniLM L6 v2 on-device; I described how in Embedding with MiniLM (a sketch of the usual pooling step appears after the list below).

MiniLM outperforms OpenAI’s larger models for code because:

  • It is tuned for sentence-sized text. This plays well with one-liners like user queries, markdown headings, DocC comments, function signatures.
  • It tends to preserve the distinctiveness of rare tokens (like type and function names).
  • It runs fast on-device, so I can index at finer granularity and re-rank more candidates without significant latency.
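
For context on where the sentence-sized vectors come from: assuming the standard sentence-transformers recipe for this model (not necessarily Semly’s exact code), the encoder emits one vector per token, and the sentence embedding is the mask-weighted mean of those vectors, L2-normalized. A sketch of that pooling step, with the Core ML plumbing omitted:

// Mean pooling + L2 normalization over per-token vectors.
// tokenVectors: one vector per token from the encoder; mask: 1 = real token, 0 = padding.
func sentenceEmbedding(tokenVectors: [[Float]], mask: [Int]) -> [Float] {
    let dim = tokenVectors.first?.count ?? 0
    var sum = [Float](repeating: 0, count: dim)
    var count: Float = 0

    for (vector, m) in zip(tokenVectors, mask) where m == 1 {
        for i in 0..<dim { sum[i] += vector[i] }
        count += 1
    }
    guard count > 0 else { return sum }

    var mean = sum.map { $0 / count }
    let norm = mean.reduce(0) { $0 + $1 * $1 }.squareRoot()
    if norm > 0 { mean = mean.map { $0 / norm } }
    return mean
}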

Ranking & Relevance

  • Symbol pinning: ensure exact camelCase identifiers rank first; e.g. if the user queries scanForChanges and that function exists, it ranks at the top (see the sketch after this list).
  • Header-lex boosts: reward overlaps between query terms and signatures/headings.
  • Lexical prefilter: discard candidates without token overlap.
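
A minimal sketch of how these heuristics could be combined. The weights, field names, and boost values are invented for illustration; they are not Semly’s actual tuning.

// Hypothetical candidate coming out of the vector-search stage.
struct Candidate {
    let symbols: Set<String>        // identifiers declared in the chunk
    let headerTokens: Set<String>   // tokens from signatures and headings
    let bodyTokens: Set<String>
    let cosineScore: Double
}

func rank(_ candidates: [Candidate], query: String) -> [Candidate] {
    let terms = Set(query.lowercased().split(separator: " ").map(String.init))

    return candidates
        // Lexical prefilter: discard chunks with no token overlap at all.
        .filter { !$0.bodyTokens.union($0.headerTokens).isDisjoint(with: terms) }
        .sorted { score($0, terms) > score($1, terms) }
}

func score(_ c: Candidate, _ terms: Set<String>) -> Double {
    var s = c.cosineScore
    // Symbol pinning: an exact identifier match outranks everything else.
    if c.symbols.contains(where: { terms.contains($0.lowercased()) }) { s += 10 }
    // Header-lex boost: reward overlap between query terms and signatures/headings.
    s += 0.1 * Double(c.headerTokens.intersection(terms).count)
    return s
}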

In practice there are many more knobs to tune in A/B tests, and it’s impossible to predict success before running experiments. There are formal metrics for ranking quality, but without a labeled dataset I just eyeball the results.

Infrastructure

Storage

  • Plain SQLite with GRDB. Not even sqlite-vec.
  • CPU/GPU/ANE cost is negligible. No need for a vector DB unless you’re at GitHub scale; a brute-force sketch follows.
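
A sketch of what that can look like with GRDB: embeddings stored as raw Float blobs in an ordinary table, and retrieval done as a brute-force scan in memory. The table name and columns are assumptions for illustration, not Semly’s actual schema.

import Foundation
import GRDB

// Assumed schema: one row per chunk, embedding stored as a raw Float blob.
func setUpAndScan() throws {
    let dbQueue = try DatabaseQueue(path: "semly.sqlite")

    // One-time setup.
    try dbQueue.write { db in
        try db.create(table: "chunk") { t in
            t.autoIncrementedPrimaryKey("id")
            t.column("file", .text).notNull()
            t.column("line", .integer).notNull()
            t.column("text", .text).notNull()
            t.column("embedding", .blob).notNull()
        }
    }

    // Brute-force retrieval: read every embedding and rank in memory.
    // At a few thousand chunks this is fast enough without a vector index.
    let rows = try dbQueue.read { db in
        try Row.fetchAll(db, sql: "SELECT file, line, embedding FROM chunk")
    }
    for row in rows {
        let blob: Data = row["embedding"]
        let vector: [Float] = blob.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
        _ = vector   // compare against the query embedding with cosine similarity
    }
}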

Terminal/app communication

  • Automator remains the least hacky option for the Mac App Store (MAS) build.
  • Other options considered: XPC, app groups, defaults, URL + TCP port, tmp files.
  • XPC works for the notarized version, but it requires a global Mach service name, which is disallowed in MAS.

Why

While working on Semly I created 100+ markdown documents about the application itself. My workflow was:

“Claude, read x, y, z, then work on files a, b, c.”

Now I tell the agent “use semly”. That’s unfortunate, but I find agents don’t reach for external tools on their own.

I rarely read every line of code. Instead, I browse in Sourcetree (!), track architecture, write tests, log experiments, and step in when the agent goes astray. I’ve noticed agents struggle most with what they can’t see directly. Some examples I’ve suffered, from worst to best: graphics with multiple coordinate systems > generics > inheritance > composition > Entity Component System (ECS). I found ECS very effective: a giant state plus reducers is no problem for agents.

So far, Semly is a workflow enhancer, not a necessity. It will likely save some tokens without you even noticing, but I see no reason not to use it. I think in the near future agents will run semantic search quietly in the background. It’s also likely we will be able to generate complete technical documentation for projects; the technology for it is already here, but that’s another story.

Semly is available as a CLI and as an MCP server. MCP is like dropping the whole manual into the context, for better or worse. If you use Claude Code:

# register the Semly MCP server (user scope)
claude mcp add semly --scope user -- semly mcp
# remove it
claude mcp remove "semly" -s user