Semantic Search
I built a semantic search tool. Codex and Claude agree it’s effective.
Introduction
When an AI agent explores a codebase, it needs to acquire the right context to be effective. There are three complementary ways to do so:
- Exact text with `rg` (ripgrep) — fastest for literals and identifiers.
- Structure-aware with `ast-grep` — precise for declarations and pattern audits.
- Semantic search with embeddings — intent-level queries like “user settings,” “retry logic,” or “transactional sync.”

Here is a comparison searching for `user settings`:
```bash
# EXACT SEARCH: look for settings and possible synonyms
rg -i "settings|preferences|defaults|configuration" .

# SYNTAX-AWARE SEARCH: look for functions taking a settings argument
ast-grep -p 'func $FUNC(settings)' --lang swift .

# SEMANTIC SEARCH
semly locate "user settings"
```
*I published semly last week; I’m not aware of a similar tool.
Mind the `rg` string of synonyms. Agents know settings may appear under multiple names, so they search for alternatives. This is highly effective, but not as good as semantic search. Semantic search finds code by what it does, not just what it’s called. If you search for “retry logic,” it will find functions named `attemptWithBackoff` or `handleFailureAndRetry`, even though those names don’t contain the word “retry.”
A typical result:
- rg finds 47 matches across 12 files (lots of noise)
- ast-grep finds 3 function declarations (precise but misses conceptual matches)
- semly finds 8 relevant results ranked by semantic similarity
Agent Evaluation
I wrote this tool specifically to speed up project exploration after spawning a new agent. Here is a quick test you can drop on your agent to confirm its utility:
compare the effectiveness of rg, ast-grep, and semly finding content in this swift project
Results on my machine:
- Claude Opus executed 6 different tests. Full evaluation.
- Codex gpt-5 high ran “a few representative searches”. Full evaluation.
Both reached this conclusion: ripgrep for exact matches, ast-grep for structure, Semly for meaning. Finally, I asked GPT-5 to summarize both evaluations in a table comparing all three tools:
ripgrep (rg)
Strengths:
- ⚡ Extremely fast (~0.02–0.04s for most queries)
- Excellent regex support
- Found 122 class definitions accurately
- Great for literal text searches and pattern matching
- Low resource usage
Weaknesses:
- No semantic understanding
- Can't understand code structure or intent
- Limited to text patterns
ast-grep
Strengths:
- Structure-aware searching
- Can find specific code patterns
- Language-aware parsing
Weaknesses:
- Slower than ripgrep (~0.05s+)
- Limited Swift support
- Pattern syntax errors with complex queries
- Struggled with Swift-specific syntax like async functions
Semly
Strengths:
- 🧠 Semantic understanding – finds conceptually related code
- Excellent for high-level queries ("error handling", "UI components")
- Returns contextual previews
- Finds relevant documentation alongside code
- Perfect for exploring unfamiliar codebases
Weaknesses:
- Requires indexing upfront
- Not suitable for exact text matching
- Slower for simple searches
Verdict
Each tool excels in different scenarios:
- Use ripgrep for fast text searches, refactoring, finding specific strings.
- Use ast-grep for structural refactoring (better with languages other than Swift).
- Use Semly for understanding code architecture, finding related functionality, and exploring concepts.
For this Swift project specifically, ripgrep + Semly is the optimal combination, with ripgrep handling precise searches and Semly providing semantic understanding.
Summary
- rg: Fastest for literal/regex searches; great for exact identifiers, logs, TODOs, quick scoping.
- ast-grep: Structure-aware (AST) precision; best for exact declarations, audits, and safe codemods.
- semly: Intent-level (natural language) across code+docs; best for discovery/navigation and “what/where/how” questions.
Example 1 — scanForChanges

- rg (decls + calls): quickly surfaces definitions and uses.

  ```bash
  rg -n 'scanForChanges\(' Sources
  ```

  Examples: `Sources/Semly/Services/FileTracking/FileTracker.swift:15`, `Sources/Semly/Services/CodeAnalysis/CodeAnalysisCoordinator.swift:107`, plus call sites in coordinators/routers.
- ast-grep (declaration, precise):

  ```bash
  ast-grep run -l swift -p 'func scanForChanges($$$)' Sources
  ```

  Pinpoints the declaration: `Sources/Semly/Services/FileTracking/FileTracker.swift:15`
- semly (navigate by intent):

  ```bash
  semly query "scanForChanges FileTracker" --project Semly --ext swift --limit 5 --mode locate
  ```

  Returns `FileTracker.swift` plus relevant pipeline/coordinator contexts for quick jumping.
Example 2 — Long Identifier

- Identifier: `mergeLeadingImportChunkIfPresent`
- rg: finds both the call and the declaration fast.

  `Sources/Semly/Threading/AnalysisPipeline/ProjectAnalysisPipelineV2.swift:803` (call), `Sources/Semly/Threading/AnalysisPipeline/ProjectAnalysisPipelineV2.swift:836` (decl)
- ast-grep (declaration, structural):

  ```bash
  ast-grep run -l swift -p 'func mergeLeadingImportChunkIfPresent($$$)' Sources
  ```

  Pinpoints the declaration: `Sources/Semly/Threading/AnalysisPipeline/ProjectAnalysisPipelineV2.swift:836`
- semly (deterministic locate by symbol):

  ```bash
  semly query "mergeLeadingImportChunkIfPresent" --project Semly --ext swift --limit 5 --mode locate
  ```

  Surfaces the declaration in `ProjectAnalysisPipelineV2.swift` at the top.
Example 3 — Concept-Level: “Where is transactional sync performed?”

- Core idea: syncing chunk + embedding results in a DB transaction.
- semly (NL → code):

  ```bash
  semly query "Where is the transactional sync performed for chunks and embeddings?" --project Semly --ext swift --limit 6
  ```

  Points to `DatabaseProjectAdapter.syncFileAnalysisInTransaction` and pipeline references. Key file: `Sources/Semly/Services/ProjectRegistry/DatabaseProjectAdapter.swift:240`
- rg (once you know the API):

  ```bash
  rg -n "syncFileAnalysisInTransaction" Sources
  ```

  Example: `Sources/Semly/Services/ProjectRegistry/DatabaseProjectAdapter.swift:240`
- ast-grep (calls, if you craft patterns): good for auditing member/free/await/try forms; requires precise patterns and/or rules.
Strengths & Limits
- rg
- Pros: lightning fast, zero setup, great for exact strings and scoping.
- Cons: no structure; noisy for conceptual questions.
- ast-grep
- Pros: language-aware precision; excellent for declarations and targeted replacements/audits.
- Cons: needs correct patterns/rules per language; not for vague/NL queries.
- semly
- Pros: semantic discovery; cross code+docs; strong for “what/where/how” and long identifiers.
- Cons: needs indexing; broad NL may prioritize docs unless you steer to code (
--ext swift
helps).
Practical Workflow
- Use semly first to discover and jump (intent-level).
- Use rg to quickly scope and enumerate concrete hits.
- Use ast-grep for exact structural matches and codemods.
| Tool | Strengths | Weaknesses | Verdict |
|---|---|---|---|
| ripgrep | Extremely fast (~0.02–0.04s); excellent regex support; low resource usage | No awareness of code structure, intent, or semantics | Best for exact identifiers, logs, TODOs, and quick scoping. |
| ast-grep | Structure-aware, language-aware parsing; finds specific code patterns | Slower than ripgrep; limited Swift support; pattern syntax errors on complex queries | Best for structural queries, declarations, and safe codemods. |
| Semly | Semantic understanding; contextual previews; searches code and docs | Requires upfront indexing; unsuitable for exact matches; slower for simple searches | Best for conceptual exploration, intent-level navigation, and discovery. |
Embeddings Primer
When you search for “user preferences,” you want to find code about settings, configuration, and defaults—even if those exact words aren’t present. Traditional search finds only exact text matches. Semantic search finds meaning.
Here’s a quick mental model of semantic search. Imagine plotting words on a map where similar words sit close together. “duck” and “chicken” would be neighbors. “car” would be far away. If your query is “chicken,” nearby words (“duck,” “goose”) are retrieved.
Each word’s position on this map is stored as a pair of numbers: its x and y coordinates. These numbers are called embeddings. In reality, we use 300–1500 numbers instead of just 2, but the principle is the same: similar meanings = similar numbers.
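To make the map concrete, here is a toy Swift example. The coordinates are invented for illustration; real models use hundreds of dimensions:

```swift
// Toy 2D “map”: each word’s position is its embedding.
// The coordinates below are made up for illustration.
let embeddings: [String: (x: Double, y: Double)] = [
    "duck":    (0.90, 0.80),
    "chicken": (0.85, 0.82),
    "goose":   (0.88, 0.75),
    "car":     (0.10, 0.15),
]

// Euclidean distance between two points on the map.
func distance(_ a: (x: Double, y: Double), _ b: (x: Double, y: Double)) -> Double {
    ((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y)).squareRoot()
}

// Querying “chicken” puts “duck” (≈0.05) and “goose” (≈0.08) nearest,
// and “car” (≈1.0) farthest.
let query = embeddings["chicken"]!
let ranked = embeddings.sorted { distance(query, $0.value) < distance(query, $1.value) }
```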
The semantic search workflow is:
- Encode information into numbers
- Encode query into numbers
- Retrieve information similar to your query
The same principle applies to single words or to 200-line chunks of text. The codebase indexing flow would be (a Swift sketch follows the list):
- Split into chunks.
- Compute embeddings.
- Store chunks with metadata (file, line).
- Encode the query as an embedding.
- Compare via “cosine similarity”.
- Return ranked, relevant chunks.
- Optionally, hand results to an LLM to interpret the answer.
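Here is a minimal sketch of that loop, assuming a hypothetical `embed` function that stands in for a real model such as MiniLM. None of this is Semly’s actual code:

```swift
import Foundation

// One indexed chunk: source location, raw text, and its embedding.
struct IndexedChunk {
    let file: String
    let line: Int
    let text: String
    let embedding: [Float]
}

// Cosine similarity: 1.0 means “same direction”, i.e. similar meaning.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    return dot / (normA * normB)
}

// Steps 4–6: encode the query, compare against every chunk, return top hits.
func search(_ query: String,
            in index: [IndexedChunk],
            embed: (String) -> [Float],
            limit: Int = 8) -> [IndexedChunk] {
    let queryVector = embed(query)
    return index
        .map { chunk in (chunk, cosineSimilarity(queryVector, chunk.embedding)) }
        .sorted { $0.1 > $1.1 }
        .prefix(limit)
        .map { $0.0 }
}
```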
Here is additional background:
- LLM Primer if you know nothing about LLMs
- Embedding with MiniLM if you want to embed using Swift
Technical Details
Parsing Strategy
To parse Markdown and Swift I use swift-markdown and SwiftSyntax. I only support those two languages, but I think Tree-sitter would make multi-language support straightforward.

These libraries tell me the structure, so I can split each file while keeping related elements together. For instance, headings stay with their content, code examples remain intact, and so on. In other words, I split along semantic boundaries. The goal is fine-grained chunks that differentiate components while preserving local reasoning.
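As an illustration, here is a minimal heading-based splitter using swift-markdown’s `Document(parsing:)`. It only sketches the idea; Semly’s real splitter handles more cases:

```swift
import Markdown  // the swift-markdown package

// One chunk: a heading plus the content that belongs to it.
struct Chunk {
    var heading: String
    var text: String
}

func chunkMarkdown(_ source: String) -> [Chunk] {
    let document = Document(parsing: source)
    var chunks: [Chunk] = []
    var current = Chunk(heading: "", text: "")

    for block in document.children {
        if let heading = block as? Heading {
            // A heading closes the previous chunk and starts a new one,
            // so headings stay with their content.
            if !current.text.isEmpty { chunks.append(current) }
            current = Chunk(heading: heading.plainText, text: "")
        } else {
            // Paragraphs, code blocks, and lists remain intact inside the chunk.
            current.text += block.format() + "\n"
        }
    }
    if !current.text.isEmpty { chunks.append(current) }
    return chunks
}
```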
Embedding Model Choice
I use the on-device MiniLM-L6-v2 model. I described how in Embedding with MiniLM.
MiniLM outperforms OpenAI’s larger models for code because:
- It is tuned for sentence-sized text. This plays well with one-liners like user queries, markdown headings, DocC comments, and function signatures.
- It tends to preserve the distinctiveness of rare tokens (like type and function names).
- It runs fast on-device, so I can index at finer granularity and re-rank more candidates without significant latency.
Ranking & Relevance
- Symbol pinning: ensure exact camelCase identifiers rank at the top. E.g., if the user queries `scanForChanges`, that function, if found, ranks first.
- Header-lex boosts: reward overlaps between query terms and signatures/headings.
- Lexical prefilter: discard candidates without token overlap.
In practice there are many more knobs to tune in A/B tests, and it’s impossible to predict success before running experiments. There are formal metrics for ranking quality, but without a labeled dataset I just eyeball the results.
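Here is a toy version of those three steps in Swift. The weights and field names are illustrative assumptions, not Semly’s actual values:

```swift
import Foundation

// One retrieval candidate after the embedding stage.
struct Candidate {
    let symbolName: String   // e.g. "scanForChanges"
    let heading: String      // function signature or markdown heading
    var score: Double        // cosine similarity so far
}

func rerank(_ candidates: [Candidate], query: String) -> [Candidate] {
    let queryTokens = Set(query.lowercased().split(separator: " ").map(String.init))

    return candidates
        // Lexical prefilter: drop candidates with no token overlap at all.
        .filter { candidate in
            let headerTokens = Set(candidate.heading.lowercased().split(separator: " ").map(String.init))
            return !queryTokens.isDisjoint(with: headerTokens)
                || queryTokens.contains(candidate.symbolName.lowercased())
        }
        .map { candidate -> Candidate in
            var c = candidate
            // Symbol pinning: an exact camelCase identifier match wins outright.
            if query.contains(c.symbolName) { c.score += 10.0 }
            // Header-lex boost: reward query/heading token overlap.
            let headerTokens = Set(c.heading.lowercased().split(separator: " ").map(String.init))
            c.score += 0.1 * Double(queryTokens.intersection(headerTokens).count)
            return c
        }
        .sorted { $0.score > $1.score }
}
```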
Infrastructure
Storage
- Plain SQLite with GRDB; not even sqlite-vec (a minimal sketch follows this list).
- CPU/GPU/ANE cost is negligible. No need for a vector DB unless you’re at GitHub scale.
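Here is roughly what that looks like. The schema is an assumption for illustration, not Semly’s actual one:

```swift
import Foundation
import GRDB

// Plain SQLite: chunks and their embeddings live in one table.
let dbQueue = try DatabaseQueue(path: "index.sqlite")

try dbQueue.write { db in
    try db.execute(sql: """
        CREATE TABLE IF NOT EXISTS chunk (
            id INTEGER PRIMARY KEY,
            file TEXT NOT NULL,
            line INTEGER NOT NULL,
            text TEXT NOT NULL,
            embedding BLOB NOT NULL -- e.g. 384 floats packed as raw bytes
        )
        """)
}

// Pack a MiniLM vector into a BLOB column. Similarity is computed in Swift,
// not in SQL, which is why no vector extension is needed.
func pack(_ vector: [Float]) -> Data {
    vector.withUnsafeBufferPointer { Data(buffer: $0) }
}
```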
Terminal/app communication

- Automator remains the least hacky option for MAS builds.
- Other options considered: XPC, app groups, defaults, URL + TCP port, tmp files.
- XPC works for the notarized version, but it requires a global Mach name, which is disallowed in MAS.
Why
While working on Semly I created 100+ markdown documents about the application itself. My workflow was:

“Claude, read x, y, z, then work on files a, b, c.”

Now I tell the agent “use semly”, which is unfortunate, but I find that agents don’t automatically reach for external tools.
I rarely read every line of code. Instead, I browse commits in Sourcetree (!), track the architecture, write tests, log experiments, and step in when the agent goes astray. I’ve noticed agents struggle most with what they can’t see directly. Some examples I suffered, from worst to best: graphics with multiple coordinate systems > generics > inheritance > composition > Entity Component System (ECS). I found ECS very effective: giant state + reducers = no problem for agents.
So far, Semly is a workflow enhancer, not a necessity. It will likely save some tokens without you even noticing, but I see no reason not to use it. I think in the near future agents will run semantic search hidden in the background. It’s also likely we will be able to generate complete technical documentation for projects; the technology for it is already here. But that’s another story.
Semly is available as a CLI and as an MCP server. MCP is like dropping the whole manual into the context, for better or worse. If you use Claude Code:

```bash
claude mcp add semly --scope user -- semly mcp
claude mcp remove "semly" -s user
```