CORTEX v2.0 — SKILL CATALOG
Comprehensive AI Skill Catalog
Created: 03/03/2026
Total: 51 skills in 8 categories
Skill Interface Contract (Base)
Every skill in Cortex MUST implement the following interface:
type SkillCategory = 'rag' | 'memory' | 'agent' | 'code' | 'learning' | 'efficiency' | 'reasoning' | 'tool';
type Priority = 'P0' | 'P1' | 'P2';
interface SkillConfig {
  projectId: string;
  settings: Record<string, unknown>;
  llmClient: LLMClient;
  memoryManager: MemoryManager;
  brainEngine: BrainEngine;
}
interface SkillInput {
  query: string;
  projectId: string;
  conversationId: string;
  context: {
    coreMemory: string;
    retrievedChunks: CodeChunk[];
    conversationHistory: Message[];
    userPreferences: UserPreferences;
  };
  config: Record<string, unknown>;
}
interface SkillOutput {
  response: string;
  citations: Citation[];
  confidence: number; // 0-1
  tokensUsed: { input: number; output: number; cached: number };
  skillChain: string[]; // which skills were invoked
  metadata: Record<string, unknown>;
}
interface SkillMetrics {
  totalCalls: number;
  avgLatencyMs: number;
  successRate: number;
  avgTokensPerCall: number;
  cacheHitRate: number;
  lastUsed: Date;
}
interface HealthStatus {
  healthy: boolean;
  message: string;
  lastCheck: Date;
  dependencies: { name: string; healthy: boolean }[];
}
interface CortexSkill {
  readonly name: string;
  readonly version: string;
  readonly category: SkillCategory;
  readonly priority: Priority;
  readonly dependencies: string[];
  readonly description: string;
  initialize(config: SkillConfig): Promise<void>;
  canHandle(input: SkillInput): boolean | Promise<boolean>;
  execute(input: SkillInput): Promise<SkillOutput>;
  shutdown(): Promise<void>;
  healthCheck(): Promise<HealthStatus>;
  getMetrics(): SkillMetrics;
}
Category 1: Advanced RAG (8 skills)
1.1 GraphRAG — Knowledge Graph + Vector Search
Priority: P0
Effort: 2 weeks
File: electron/services/skills/rag/graphrag-skill.ts
Description: Combines a knowledge graph with vector search to answer multi-hop questions. Instead of only searching vectors (finding similar chunks), GraphRAG builds a relationship graph between code entities (files, functions, classes, modules) and traverses it to find answers.
How it works:
- Entity Extraction: Uses LLM + Tree-sitter to extract entities from code (functions, classes, imports, exports)
- Relationship Mapping: Builds edges: imports, calls, inherits, implements, uses
- Graph Storage: Stores graph in SQLite with tables: graph_nodes, graph_edges
- Embedding: Embeds each node with context (file path + code + relationships)
- Query: When the user asks a question, finds start nodes via vector search, then traverses the graph to expand context
- Fusion: Combines graph results + vector results via Reciprocal Rank Fusion
Integration:
class GraphRAGSkill implements CortexSkill {
  name = 'graphrag';
  category = 'rag' as const;
  priority = 'P0' as const;
  async buildGraph(projectId: string): Promise<void> {
    // 1. Scan all files with Tree-sitter
    // 2. Extract entities (functions, classes, imports)
    // 3. Build edges (calls, imports, inherits)
    // 4. Store in SQLite graph tables
    // 5. Embed nodes with context
  }
  async execute(input: SkillInput): Promise<SkillOutput> {
    // 1. Vector search for relevant nodes
    // 2. Graph traversal (BFS 2-3 hops)
    // 3. Collect context from traversed nodes
    // 4. Merge with vector search results
    // 5. Send to LLM with graph context
  }
}
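The graph-traversal step in execute (BFS over 2-3 hops) could look roughly like the sketch below. It is a minimal in-memory illustration, not the catalog's implementation: the GraphEdge shape and the expandContext helper are hypothetical names, edges are treated as undirected, and a real version would presumably read from the graph_edges table rather than an array.

```typescript
// Hypothetical in-memory edge shape; the catalog stores edges in the SQLite table graph_edges.
interface GraphEdge {
  from: string;
  to: string;
  kind: 'imports' | 'calls' | 'inherits' | 'implements' | 'uses';
}

// Breadth-first expansion from the seed nodes found by vector search.
// Returns every reached node ID mapped to its hop distance from the nearest seed.
function expandContext(
  seeds: string[],
  edges: GraphEdge[],
  maxHops = 2
): Map<string, number> {
  // Build an adjacency list that can be walked in both directions (assumption).
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a) ?? [];
    list.push(b);
    neighbors.set(a, list);
  };
  for (const edge of edges) {
    link(edge.from, edge.to);
    link(edge.to, edge.from);
  }

  const distance = new Map<string, number>();
  for (const seed of seeds) distance.set(seed, 0);
  let frontier = seeds.slice();

  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of neighbors.get(node) ?? []) {
        if (!distance.has(neighbor)) {
          distance.set(neighbor, hop);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return distance;
}
```

The hop distance can double as a ranking signal when the graph results are later fused with the plain vector results.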
Dependencies: better-sqlite3, web-tree-sitter, embedder
References:
- Microsoft GraphRAG: github.com/microsoft/graphrag
- Paper: From Local to Global (Microsoft Research 2024)
- Agentic GraphRAG: Combined with multi-agent (2025 trend)
1.2 Self-RAG — Self-Evaluating Retrieval Quality
Priority: P1
Effort: 1 week
File: electron/services/skills/rag/self-rag-skill.ts
Description: Instead of always trusting retrieval results, Self-RAG self-evaluates: (1) Is retrieval needed? (2) Are retrieved chunks relevant? (3) Is the response supported by evidence? (4) Is the response useful?
How it works:
- Retrieval Decision: LLM decides whether a search is needed (simple questions can be answered directly)
- Relevance Check: After retrieval, LLM rates each chunk: relevant / partially relevant / irrelevant
- Filter: Removes irrelevant chunks, keeps relevant ones
- Generate: Creates response from relevant chunks
- Support Check: LLM self-checks whether the response is supported by evidence
- Iterate: If support is low, goes back to step 1 with a refined query
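The six steps above can be sketched as a bounded loop. Everything here is illustrative: SelfRagDeps and selfRag are hypothetical names, the 0.7 support threshold and the 3-round cap are assumptions, and the dependencies are synchronous stubs where Cortex would make async LLM and vector-search calls. Chunks rated as partially relevant are kept, which is also an assumption.

```typescript
type Relevance = 'relevant' | 'partial' | 'irrelevant';

// Hypothetical dependency bundle, synchronous for brevity.
interface SelfRagDeps {
  needsRetrieval(query: string): boolean;
  retrieve(query: string): string[];
  rateChunk(query: string, chunk: string): Relevance;
  generate(query: string, evidence: string[]): string;
  supportScore(response: string, evidence: string[]): number; // 0-1
  refineQuery(query: string): string;
}

interface SelfRagResult {
  response: string;
  evidence: string[];
  rounds: number;
}

function selfRag(
  query: string,
  deps: SelfRagDeps,
  minSupport = 0.7, // assumed threshold, not from the catalog
  maxRounds = 3 // assumed cap so the loop always terminates
): SelfRagResult {
  let searchQuery = query;
  let response = '';
  let evidence: string[] = [];

  for (let round = 1; round <= maxRounds; round++) {
    // Step 1: skip retrieval for questions the LLM can answer directly.
    if (!deps.needsRetrieval(searchQuery)) {
      return { response: deps.generate(query, []), evidence: [], rounds: round };
    }
    // Steps 2-3: retrieve, then drop chunks rated irrelevant.
    evidence = deps
      .retrieve(searchQuery)
      .filter((chunk) => deps.rateChunk(searchQuery, chunk) !== 'irrelevant');
    // Step 4: generate from the surviving chunks.
    response = deps.generate(query, evidence);
    // Step 5: accept the response if the evidence supports it well enough.
    if (deps.supportScore(response, evidence) >= minSupport) {
      return { response, evidence, rounds: round };
    }
    // Step 6: otherwise refine the query and try again.
    searchQuery = deps.refineQuery(searchQuery);
  }
  // Out of rounds: return the last attempt.
  return { response, evidence, rounds: maxRounds };
}
```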
Dependencies: llm-client, vector-search
References: Paper: Self-RAG (arxiv 2310.11511)
1.3 Corrective RAG (CRAG) — Auto-Correcting Poor Retrieval
Priority: P1
Effort: 1 week
File: electron/services/skills/rag/crag-skill.ts
Description: When retrieval quality is low (confidence < threshold), CRAG automatically: (1) refines query, (2) re-searches with a different strategy, (3) falls back to web search if needed.
How it works:
- Retrieve: Performs normal vector search
- Evaluate: Uses cross-encoder to score relevance
- Decision:
- Score > 0.7: CORRECT → use directly
- Score 0.3–0.7: AMBIGUOUS → refine query and re-search
- Score < 0.3: INCORRECT → completely rewrite query, try different strategy
- Correction: Transforms retrieved docs (removes irrelevant parts, highlights key info)
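The decision step maps directly onto a small threshold function. In this sketch the 0.7 and 0.3 cut-offs come from the list above, while the type names, the action labels, and the choice to judge a retrieval by its best-scoring chunk are assumptions.

```typescript
type CragVerdict = 'CORRECT' | 'AMBIGUOUS' | 'INCORRECT';
type CragAction = 'use' | 'refine' | 'rewrite';

// Thresholds follow the catalog: > 0.7 correct, < 0.3 incorrect, otherwise ambiguous.
function classifyRetrieval(score: number, upper = 0.7, lower = 0.3): CragVerdict {
  if (score > upper) return 'CORRECT';
  if (score < lower) return 'INCORRECT';
  return 'AMBIGUOUS';
}

const ACTION_FOR_VERDICT: Record<CragVerdict, CragAction> = {
  CORRECT: 'use', // use the retrieved chunks directly
  AMBIGUOUS: 'refine', // refine the query and re-search
  INCORRECT: 'rewrite', // rewrite the query and try a different strategy
};

// Assumption: the retrieval as a whole is judged by its best cross-encoder score.
function decideAction(chunkScores: number[]): {
  verdict: CragVerdict;
  action: CragAction;
  bestScore: number;
} {
  const bestScore = chunkScores.length > 0 ? Math.max(...chunkScores) : 0;
  const verdict = classifyRetrieval(bestScore);
  return { verdict, action: ACTION_FOR_VERDICT[verdict], bestScore };
}
```

An empty result set scores 0 and is therefore treated as INCORRECT, which routes it to the rewrite path.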
Dependencies: llm-client, vector-search, cross-encoder model
References: Paper: CRAG (arxiv 2401.15884)
1.4 Adaptive RAG — Auto-Selecting Strategy
Priority: P1
Effort: 1 week
File: electron/services/skills/rag/adaptive-rag-skill.ts
Description: Analyzes query complexity to select the right strategy:
- No retrieval: Simple questions, LLM answers directly (saves tokens)
- Single-hop: Specific question about 1 file/function → simple vector search
- Multi-hop: Complex question requiring connecting multiple pieces → GraphRAG or iterative retrieval
How it works:
- Classify: LLM + heuristics classify query complexity (low/medium/high)
- Route: Selects corresponding RAG strategy
- Execute: Runs the selected strategy
- Learn: Saves classification results to improve the classifier over time (feedback loop)
Dependencies: llm-client, graphrag-skill, vector-search
References: Paper: Adaptive RAG (arxiv 2403.14403)
1.5 RAG Fusion — Multi-Query + Reciprocal Rank Fusion
Priority: P0
Effort: 3 days
File: electron/services/skills/rag/rag-fusion-skill.ts
Description: Instead of searching with 1 original query, generates 3–5 query variants from multiple perspectives, searches separately, then merges results via Reciprocal Rank Fusion (RRF).
How it works:
- Query Generation: LLM generates 3–5 variants of the original query
- Example: ‘How does auth work?’ →
- ‘authentication middleware implementation’
- ‘login signup handler code’
- ‘JWT token validation flow’
- ‘user session management’
- Parallel Search: Searches each variant independently
- RRF Merge: score(doc) = sum(1 / (k + rank_i)) for each query i
- Deduplicate: Removes duplicate chunks
- Return: Top-K merged results
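The RRF merge and deduplication steps fit in one short function. This is a sketch under assumptions: each ranking is a list of chunk IDs ordered best first, rank_i is 1-based, and k defaults to 60, a commonly used value that the catalog does not specify. The function name is illustrative.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// score(doc) = sum over queries of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
  topK = 10
): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    const seen = new Set<string>();
    ranking.forEach((id, index) => {
      // Count each chunk once per ranking, at its best position.
      if (seen.has(id)) return;
      seen.add(id);
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Keying by chunk ID deduplicates across rankings; then sort by fused score.
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

A chunk that appears near the top of several variant searches accumulates more score than one that ranks first in a single search, which is the property that makes the multi-query approach worthwhile.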
Dependencies: vector-search, llm-client
References: LangChain RAG Fusion, Paper: Reciprocal Rank Fusion
1.6 HyDE — Hypothetical Document Embedding
Priority: P1
Effort: 2 days
File: electron/services/skills/rag/hyde-skill.ts
Description: A short query often carries too little context to embed well. Instead of embedding the query directly, the LLM generates a ‘hypothetical document’ that WOULD answer the query, and that document is embedded and used for search. Because the hypothetical document resembles real code, vector search tends to be more accurate.
How it works:
- Generate: LLM creates a hypothetical code/document that would answer the query
- Embed: Embeds the hypothetical document (not the query)
- Search: Finds chunks similar to the hypothetical document
- Answer: Uses real chunks (not the hypothetical) to generate the response
Dependencies: llm-client, embedder, vector-search
References: Paper: HyDE (arxiv 2212.10496)
1.7 Contextual Retrieval — Anthropic Approach
Priority: P0
Effort: 3 days
File: electron/services/skills/rag/contextual-retrieval-skill.ts
Description: Problem with traditional chunking: each chunk loses context (which file? which function? which module?). Contextual Retrieval adds context to each chunk BEFORE embedding.
How it works:
- Chunk: Splits code into chunks (using Tree-sitter for code-aware chunking)
- Add Context: For each chunk, prepend:
- File path and language
- Parent function/class name
- Module description (from docstring or architecture info)
- Import/export relationships
- Embed: Embeds the chunk WITH context (more accurate embedding)
- Store: Saves contextual chunk to vector DB
Example:
// BEFORE (regular chunk):
function validateToken(token: string) { ... }
// AFTER (contextual chunk):
// File: src/auth/middleware.ts
// Module: Authentication
// Parent: AuthMiddleware class
// Imports: jsonwebtoken, config
// Description: Validates JWT token from request header
function validateToken(token: string) { ... }
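A helper that produces the AFTER form from structured metadata might look like the sketch below. The ChunkContext shape and the buildContextualChunk name are hypothetical; the header format mirrors the example above, and optional fields are skipped when absent.

```typescript
// Hypothetical metadata gathered during code-aware chunking.
interface ChunkContext {
  filePath: string;
  module?: string;
  parent?: string;
  imports?: string[];
  description?: string;
}

// Prepends a comment header to the chunk so the embedding sees its surroundings.
function buildContextualChunk(code: string, ctx: ChunkContext): string {
  const header: string[] = [`// File: ${ctx.filePath}`];
  if (ctx.module) header.push(`// Module: ${ctx.module}`);
  if (ctx.parent) header.push(`// Parent: ${ctx.parent}`);
  if (ctx.imports && ctx.imports.length > 0) {
    header.push(`// Imports: ${ctx.imports.join(', ')}`);
  }
  if (ctx.description) header.push(`// Description: ${ctx.description}`);
  return [...header, code].join('\n');
}
```

The returned string is what gets embedded and stored; the raw chunk can still be kept alongside it for display.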
Dependencies: code-chunker, embedder, tree-sitter
References: Anthropic Blog (Sep 2024): Contextual Retrieval
1.8 Parent-Child Chunking
Priority: P1
Effort: 2 days
File: electron/services/skills/rag/parent-child-chunk-skill.ts
Description: Creates 2 tiers of chunks: small child chunks (for precise search) and large parent chunks (for sufficient context). When searching, matches child chunks but returns parent chunks.
How it works:
- Parent Chunking: Splits code into large chunks (entire file or large section)
- Child Chunking: Splits each parent into small children (1 function, 1 class)
- Link: Each child keeps a reference to its parent
- Search: Searches on child chunks (small, precise)
- Return: Returns parent chunk (large, more context)
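The search-on-child, return-parent step can be sketched as follows. The chunk shapes and the resolveParents name are hypothetical, and ranking each parent by its best-matching child is an assumption; the hits are taken to come from an existing vector search over child chunks.

```typescript
interface ParentChunk {
  id: string;
  text: string;
}
interface ChildChunk {
  id: string;
  parentId: string; // the link back to the parent
  text: string;
}
interface ChildHit {
  child: ChildChunk;
  score: number; // similarity from the child-level vector search
}

// Maps child hits to their parents, keeping each parent once at its best child score.
function resolveParents(
  hits: ChildHit[],
  parents: Map<string, ParentChunk>,
  topK = 3
): ParentChunk[] {
  const bestScore = new Map<string, number>();
  for (const hit of hits) {
    const previous = bestScore.get(hit.child.parentId);
    if (previous === undefined || hit.score > previous) {
      bestScore.set(hit.child.parentId, hit.score);
    }
  }
  return Array.from(bestScore.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map(([parentId]) => parents.get(parentId))
    .filter((parent): parent is ParentChunk => parent !== undefined);
}
```

Deduplicating by parent matters because several children of the same file often match one query, and returning the same parent repeatedly would waste context tokens.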
Dependencies: code-chunker, vector-search
References: LlamaIndex Parent-Child Retriever
Category 2: Self-Learning (6 skills)
2.1 DSPy Prompt Optimization
Priority: P0 | Effort: 2 weeks | File: electron/services/skills/learning/dspy-skill.ts
Description: DSPy (Stanford) lets you define AI pipelines in code (not prompts). The optimizer then AUTOMATICALLY finds the best prompts based on metrics. Cortex uses DSPy to self-improve response quality over time.
How it works:
- Define Signatures: Defines input/output for each step (retrieve, reason, answer)
- Define Metrics: Creates a metric function (relevance score, user satisfaction proxy)
- Collect Examples: Collects 50–100+ query-response pairs with feedback
- Optimize: Runs DSPy optimizer (MIPROv2 or BootstrapFewShot) to find the best prompts
- Deploy: Updates prompts in Cortex, saves old version for rollback
- Monitor: Tracks metrics after optimization, compares to baseline
// DSPy-inspired TypeScript implementation
interface DSPySignature {
  name: string;
  inputs: { name: string; type: string; description: string }[];
  outputs: { name: string; type: string; description: string }[];
}
class DSPySkill implements CortexSkill {
  name = 'dspy-optimizer';
  async optimize(
    signature: DSPySignature,
    examples: TrainingExample[],
    metric: (pred: any, expected: any) => number
  ): Promise<OptimizedPrompt> {
    // 1. Generate prompt candidates
    // 2. Evaluate each on examples using metric
    // 3. Select best performing prompt
    // 4. Store in prompt registry
  }
}
Dependencies: llm-client (Python DSPy can be called via child_process or core logic re-implemented in TS)
References: dspy.ai, Paper: DSPy (Stanford 2023), MIPROv2 optimizer
2.2 Behavioral Analytics Engine
Priority: P0 | Effort: 1 week | File: electron/services/skills/learning/behavioral-analytics.ts
Description: Collects IMPLICIT feedback from user actions: accept/reject suggestion, edit after suggestion, time-to-accept, follow-up questions, copy-paste patterns.
Events tracked:
- response_accepted: User does not edit, continues conversation
- response_edited: User copies response but modifies before using
- response_rejected: User asks the same topic again (implicit rejection)
- code_applied: User copies code from response and pastes into editor
- follow_up_asked: User asks a follow-up (response was insufficient)
- time_to_action: Time from response to next user action
- session_length: Total session duration (engagement metric)
interface BehavioralEvent {
  id: string;
  type: 'accept' | 'reject' | 'edit' | 'ignore' | 'follow_up' | 'code_applied';
  queryId: string;
  responseId: string;
  skillUsed: string;
  timestamp: number;
  timeToAction: number;
  editDistance?: number;
  metadata: Record<string, unknown>;
}
Storage: SQLite table behavioral_events with indexes on type, timestamp, skillUsed
References: GitHub Copilot personalization (workspace indexing), Letta skill learning
2.3 Learned Reranking
Priority: P1 | Effort: 1 week | File: electron/services/skills/learning/learned-reranker.ts
Description: Improves search ranking based on user interactions. When a user accepts a response, the chunks used are boosted. When rejected, chunks are demoted.
How it works:
- Collect Signals: From behavioral events, maps response quality → retrieved chunks
- Feature Extraction: For each chunk: vector similarity, BM25 score, graph distance, past usefulness
- Train Ranker: Lightweight cross-encoder fine-tuned on feedback data
- Apply: Reranks search results before sending to LLM
- Update: Retrains periodically (every 100 new events)
Dependencies: behavioral-analytics, vector-search, cross-encoder model
References: Learning to Rank (LTR)
Note: The existing learned-reranker.ts needs to be upgraded.
2.4 Preference Learning
Priority: P1 | Effort: 1 week | File: electron/services/skills/learning/preference-learning.ts
Description: Learns the user’s coding style and preferences: naming conventions, architecture patterns, response format, level of detail.
Signals collected:
- Edits the user makes on suggestions (style corrections)
- Code patterns in the user’s repos (naming, structure)
- Response format preferences (detailed vs. summary)
- Language preferences (English vs. other)
Storage: Core Memory (always in system prompt) + Archival Memory (details)
2.5 Active Learning
Priority: P2 | Effort: 3 days | File: electron/services/skills/learning/active-learning.ts
Description: When Cortex is uncertain, it ASKS the user instead of guessing. To avoid over-asking, it does so only when the answer would significantly improve learning.
How it works:
- Uncertainty Detection: When confidence < 0.6, considers asking the user
- Value Estimation: How much is the user’s answer worth for training?
- Ask Strategy: Asks concisely, specifically, easy to answer (Yes/No or choose A/B)
- Learn: Updates memory and preferences from the answer
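The ask-or-guess decision can be sketched as a small gate. Only the 0.6 confidence threshold comes from the list above; the AskPolicy shape, the minimum value score, and the per-session question cap are assumptions added to illustrate the "not over-asking" constraint.

```typescript
interface AskPolicy {
  confidenceThreshold: number; // 0.6 in the catalog
  minValue: number; // assumed: minimum estimated learning value (0-1)
  maxQuestionsPerSession: number; // assumed: guard against over-asking
}

const DEFAULT_ASK_POLICY: AskPolicy = {
  confidenceThreshold: 0.6,
  minValue: 0.5,
  maxQuestionsPerSession: 3,
};

// Returns true only when Cortex is uncertain, the answer is worth learning from,
// and the user has not already been asked too often in this session.
function shouldAskUser(
  confidence: number,
  estimatedValue: number,
  questionsAskedThisSession: number,
  policy: AskPolicy = DEFAULT_ASK_POLICY
): boolean {
  if (confidence >= policy.confidenceThreshold) return false;
  if (questionsAskedThisSession >= policy.maxQuestionsPerSession) return false;
  return estimatedValue >= policy.minValue;
}
```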
2.6 RLAIF (RL from AI Feedback)
Priority: P2 | Effort: 2 weeks | File: electron/services/skills/learning/rlaif-skill.ts
Description: AI critiques itself. After generating a response, a ‘critic’ LLM evaluates and scores it. The score is used to improve prompt/retrieval.
How it works:
- Generate: Creates a normal response
- Critique: A different LLM (or same model with different prompt) evaluates:
- Accuracy: Is the response correct with respect to the code?
- Completeness: Is anything important missing?
- Relevance: Does it answer the actual question?
- Score: Aggregates score from all 3 criteria
- Learn: Uses scores to update DSPy optimization targets
Dependencies: llm-client, dspy-skill
References: Paper: RLAIF (Google 2023)
Category 3: Memory System (5 skills)
3.1 Tiered Memory (Letta/MemGPT Inspired)
Priority: P0 | Effort: 2 weeks | Files: electron/services/memory/
Description: 3-tier memory like an OS:
- Core Memory (~2000 tokens): Always in the system prompt. Contains user profile, project context, preferences. Agent can SELF-EDIT.
- Archival Memory (unlimited): Long-term storage, vector-searchable. Contains past decisions, patterns, lessons learned.
- Recall Memory (conversation): Conversation history, searchable by content and time.
interface MemoryManager {
  core: CoreMemory; // Always in context
  archival: ArchivalMemory; // Long-term, searchable
  recall: RecallMemory; // Conversation history
  loadContext(projectId: string): Promise<MemoryContext>;
  updateCore(key: string, value: string): Promise<void>;
  archiveMemory(content: string, metadata: any): Promise<void>;
  searchArchival(query: string, limit?: number): Promise<Memory[]>;
  searchRecall(query: string, limit?: number): Promise<Message[]>;
}
References: Letta (github.com/letta-ai/letta), MemGPT paper (UC Berkeley)
3.2 Nano-Brain Integration (Upgrade)
Priority: P0 | Effort: 3 days | File: electron/services/memory/nano-brain-bridge.ts
Description: nano-brain already exists. Upgrade it to serve as the backend for the Archival Memory tier. Maintain compatibility with existing data.
3.3 Cross-Session Learning
Priority: P0 | Effort: 1 week | File: electron/services/memory/cross-session.ts
Description: Agent remembers and improves across every session. At the start of a new session, loads relevant memories from archival. At the end, automatically summarizes and archives key insights.
How it works:
- Session Start: Queries archival memory with current project context
- During Session: Tracks important decisions, patterns discovered
- Session End: Summarizes session → stores in archival memory
- Next Session: Previous insights available automatically
3.4 Memory Compaction
Priority: P1 | Effort: 3 days | File: electron/services/memory/compaction.ts
Description: When archival memory grows too large, automatically summarizes and compacts it. Keeps important information, discards unnecessary details.
3.5 Memory Decay
Priority: P2 | Effort: 2 days | File: electron/services/memory/decay.ts
Description: Old and rarely-accessed information has its relevance score reduced over time. Outdated information (code that has since changed) is flagged for cleanup.
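One simple way to reduce a relevance score over time is exponential decay with a half-life. The catalog does not specify a decay function, so the formula, the 30-day half-life, and the decayedRelevance name below are all assumptions for illustration.

```typescript
const MS_PER_DAY = 86_400_000;

// Halves the score for every halfLifeDays that pass without the memory being accessed.
function decayedRelevance(
  baseScore: number,
  lastAccessedMs: number,
  nowMs: number,
  halfLifeDays = 30 // assumed default
): number {
  const ageDays = Math.max(0, nowMs - lastAccessedMs) / MS_PER_DAY;
  return baseScore * Math.pow(0.5, ageDays / halfLifeDays);
}
```

Resetting lastAccessedMs whenever a memory is retrieved keeps frequently used memories near their base score, while untouched ones fade gradually instead of being deleted at a hard cutoff.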
Category 4: Efficiency Engine (6 skills)
4.1 LLMLingua Context Compression
Priority: P0 | Effort: 1 week | File: electron/services/skills/efficiency/llmlingua-skill.ts
Description: Compresses context 3–6x before sending to LLM. LLMLingua-2 removes unnecessary tokens while preserving meaning. Reduces cost by 60–80%.
How it works:
- Input: Retrieved chunks + conversation history + system prompt
- Compress: LLMLingua-2 removes redundant tokens
- Validate: Checks that compressed context still contains sufficient information
- Send: Sends compressed context to LLM (fewer tokens = cheaper)
Integration: Called via Python child_process (LLMLingua is a Python library) or core logic ported to TS
References: github.com/microsoft/LLMLingua, Integrated in LangChain + LlamaIndex
4.2 Semantic Caching
Priority: P0 | Effort: 1 week | File: electron/services/skills/efficiency/semantic-cache.ts
Description: Caches responses based on SEMANTIC similarity (not exact match). If the user asks a similar query to a previous one, returns the cached response instead of calling the LLM again.
How it works:
- Query Embedding: Embeds the user query
- Cache Search: Finds cached queries with similarity > 0.92
- Hit: Returns cached response (0 tokens, instant)
- Miss: Calls LLM normally, caches the response
- Invalidation: Clears cache when the brain is re-synced
interface SemanticCache {
  get(queryEmbedding: number[], threshold?: number): Promise<CachedResponse | null>;
  set(queryEmbedding: number[], query: string, response: string, ttl?: number): Promise<void>;
  invalidate(projectId: string): Promise<void>;
  getStats(): { hits: number; misses: number; hitRate: number };
}
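A minimal in-memory version of the hit/miss logic is sketched below. It is deliberately simplified relative to the interface above: methods are synchronous, there is no TTL or per-project scoping, and lookup is a linear scan with cosine similarity. The InMemorySemanticCache and cosineSimilarity names are illustrative, and a production cache would more likely query the vector store.

```typescript
interface CacheEntry {
  embedding: number[];
  query: string;
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemorySemanticCache {
  private entries: CacheEntry[] = [];
  private hits = 0;
  private misses = 0;

  // Returns the most similar cached response whose similarity exceeds the threshold.
  get(queryEmbedding: number[], threshold = 0.92): string | null {
    let best: CacheEntry | null = null;
    let bestSimilarity = threshold;
    for (const entry of this.entries) {
      const similarity = cosineSimilarity(queryEmbedding, entry.embedding);
      if (similarity > bestSimilarity) {
        best = entry;
        bestSimilarity = similarity;
      }
    }
    if (best) {
      this.hits++;
      return best.response;
    }
    this.misses++;
    return null;
  }

  set(queryEmbedding: number[], query: string, response: string): void {
    this.entries.push({ embedding: queryEmbedding, query, response });
  }

  // Called when the brain is re-synced, since cached answers may be stale.
  invalidate(): void {
    this.entries = [];
  }

  getStats(): { hits: number; misses: number; hitRate: number } {
    const total = this.hits + this.misses;
    return { hits: this.hits, misses: this.misses, hitRate: total === 0 ? 0 : this.hits / total };
  }
}
```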
References: GPTCache (github.com/zilliztech/GPTCache)
4.3 Model Routing
Priority: P0 | Effort: 1 week | File: electron/services/skills/efficiency/model-router.ts
Description: Not every query needs the most expensive model. Model routing classifies query complexity and routes accordingly:
- Simple (greeting, clarification): GPT-4o-mini ($0.15/1M tokens)
- Medium (code explanation, single-file): GPT-4o ($2.50/1M tokens)
- Complex (architecture, multi-file, debugging): Claude Opus / o1 ($15/1M tokens)
interface ModelRoute {
  model: string;
  maxTokens: number;
  costPer1kInput: number;
  costPer1kOutput: number;
  qualityScore: number;
  avgLatencyMs: number;
}
interface ModelRouter {
  classifyComplexity(query: string, context: SkillInput['context']): 'low' | 'medium' | 'high';
  selectModel(complexity: 'low' | 'medium' | 'high'): ModelRoute;
  // Learns from feedback: if the cheap model fails, escalates
}
4.4 Prompt Caching
Priority: P1 | Effort: 3 days | File: electron/services/skills/efficiency/prompt-cache.ts
Description: System prompt + project context are usually the same across queries. Caches this prefix so it is not re-sent every time.
4.5 Adaptive Token Budget
Priority: P1 | Effort: 2 days | File: electron/services/skills/efficiency/token-budget.ts
Description: Allocates token budget based on query complexity. Simple query: 500 output tokens. Complex: 4000 output tokens. Prevents waste.
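As a configuration sketch, the budget can be a lookup keyed by complexity and clamped to the selected model's output limit. The 500 and 4000 values come from the description above; the medium value, the outputTokenBudget name, and the clamp parameter are assumptions.

```typescript
type Complexity = 'low' | 'medium' | 'high';

const OUTPUT_TOKEN_BUDGET: Record<Complexity, number> = {
  low: 500, // from the catalog
  medium: 2000, // assumed midpoint, not specified in the catalog
  high: 4000, // from the catalog
};

// Never request more output tokens than the selected model allows.
function outputTokenBudget(
  complexity: Complexity,
  modelMaxOutputTokens = Number.POSITIVE_INFINITY
): number {
  return Math.min(OUTPUT_TOKEN_BUDGET[complexity], modelMaxOutputTokens);
}
```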
4.6 ChunkKV (KV Cache Compression)
Priority: P2 | Effort: 2 weeks | File: electron/services/skills/efficiency/chunkkv-skill.ts
Description: Compresses the LLM’s KV cache by semantic chunks instead of individual tokens. Reduces memory by 70% for long-context inference.
References: Paper: ChunkKV (NeurIPS 2025)
Category 5: Agent/Tool Skills - MCP Based (9 skills)
5.1 MCP Protocol Core
Priority: P0 | Effort: 1 week | File: electron/services/skills/mcp/mcp-client.ts
Description: Implementation of the MCP client for connecting to any MCP server. This is the foundation for all tool integrations.
interface MCPClient {
  connect(serverConfig: MCPServerConfig): Promise<MCPConnection>;
  listTools(connection: MCPConnection): Promise<MCPTool[]>;
  callTool(connection: MCPConnection, tool: string, args: any): Promise<MCPResult>;
  listResources(connection: MCPConnection): Promise<MCPResource[]>;
  readResource(connection: MCPConnection, uri: string): Promise<any>;
}
References: modelcontextprotocol.io, 5800+ MCP servers (2025)
5.2–5.9: Playwright, GitHub, Jira, Confluence, Slack, Code Execution, Sequential Thinking, File System
Each tool is an MCP server adapter wrapped as a CortexSkill. Implementation details are the same for all:
- Connect to MCP server
- Wrap available tools as skill methods
- Handle errors and timeouts
- Log usage for cost tracking
Category 6: Reasoning Skills (6 skills)
6.1 ReAct (Reasoning + Acting)
Priority: P0 | Effort: 1 week | File: electron/services/skills/reasoning/react-skill.ts
Description: Loop: Thought → Action → Observation → repeat until answer is found.
Loop:
1. THOUGHT: Think about how to solve the problem
2. ACTION: Perform an action (search, read file, run code)
3. OBSERVATION: Observe the result
4. If enough information → ANSWER
5. If not → go back to step 1
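The loop above can be sketched with the LLM and the tools injected as functions. The ReActStep and ReActDeps shapes and the reactLoop name are hypothetical, the calls are synchronous for brevity where a real skill would await the LLM and tool calls, and the step cap is an assumption added so the loop cannot run forever.

```typescript
// One LLM turn: either request a tool call or give the final answer.
type ReActStep =
  | { kind: 'action'; thought: string; tool: string; input: string }
  | { kind: 'answer'; thought: string; answer: string };

interface ReActDeps {
  think(question: string, observations: string[]): ReActStep; // THOUGHT
  runTool(tool: string, input: string): string; // ACTION, returns the OBSERVATION
}

function reactLoop(
  question: string,
  deps: ReActDeps,
  maxSteps = 5 // assumed safety cap
): { answer: string | null; observations: string[] } {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = deps.think(question, observations);
    if (next.kind === 'answer') {
      // Enough information: stop and answer.
      return { answer: next.answer, observations };
    }
    // Not enough yet: act, record the observation, and think again.
    observations.push(deps.runTool(next.tool, next.input));
  }
  // Step budget exhausted without an answer.
  return { answer: null, observations };
}
```

Returning null when the budget runs out lets the caller decide whether to fall back to another reasoning skill or report partial findings.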
6.2–6.6: Plan-and-Execute, Reflexion, LATS, Chain of Thought, Tree of Thoughts
Details are similar. Each skill implements one specific reasoning pattern.
Category 7: Code Intelligence (6 skills)
7.1–7.6: Tree-sitter AST, AST-grep, LSP, Dependency Graph, Architecture Inference, Tech Debt
architecture-analyzer.ts and code-chunker.ts already exist and will be upgraded to the CortexSkill interface.
AST-grep will be added for pattern matching, and LSP for go-to-definition and find-references.
Category 8: Fine-tuning & Local AI (5 skills)
8.1 Custom Embedding Fine-tuning
Priority: P1 | Effort: 2 weeks | File: electron/services/skills/finetune/embedding-finetune.ts
Description: Trains a custom embedding model on your codebase. The embeddings will understand your code better than a generic model.
How it works:
- Generate Pairs: Creates positive pairs (related code chunks) and negative pairs (unrelated)
- Fine-tune: Uses sentence-transformers to fine-tune on the pairs
- Evaluate: Compares retrieval quality before/after fine-tuning
- Deploy: Replaces the generic embedder with the custom model
8.2–8.5: LoRA, Synthetic Data, DPO, Local Model Serving
Advanced skills for a later phase. LoRA requires a GPU, and DPO requires more data. Local Model Serving (Ollama) is P1 because it enables offline mode.
Summary
| Category | Skills | P0 | P1 | P2 |
|---|---|---|---|---|
| Advanced RAG | 8 | 3 | 5 | 0 |
| Self-Learning | 6 | 2 | 2 | 2 |
| Memory System | 5 | 3 | 1 | 1 |
| Efficiency Engine | 6 | 3 | 2 | 1 |
| Agent/Tool (MCP) | 9 | 3 | 4 | 2 |
| Reasoning | 6 | 2 | 2 | 2 |
| Code Intelligence | 6 | 3 | 2 | 1 |
| Fine-tuning | 5 | 0 | 3 | 2 |
| TOTAL | 51 | 19 | 21 | 11 |
Sprints 13–18 focus on the 19 P0 skills first, then P1, then P2.