# Docs RAG

A comprehensive TypeScript-based project for storing markdown documents in a Qdrant vector database with Ollama embeddings, plus a source code documentation parser. It includes both CLI tools and MCP server integration.

## Features

### Document Management

- Store markdown documents in Qdrant collections
- Vectorize documents using Ollama embeddings
- Split documents by paragraphs (one paragraph = one vector)
- MCP server for integration
- Recursive folder scanning
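
The following minimal TypeScript sketch shows one way to implement a paragraph splitter that follows the one-paragraph-one-vector rule. It assumes that blank lines separate paragraphs and that fenced code blocks should stay intact. `splitIntoParagraphs` is a hypothetical name, not a function from this project.

```typescript
// Hypothetical helper: one returned string = one paragraph = one vector.
// Assumptions: blank lines separate paragraphs; fenced code blocks stay whole.
const FENCE = "`".repeat(3);

function splitIntoParagraphs(markdown: string): string[] {
  const paragraphs: string[] = [];
  let current: string[] = [];
  let inFence = false;

  // Push the collected lines as one paragraph, skipping empty ones.
  const flush = (): void => {
    const text = current.join("\n").trim();
    if (text.length > 0) paragraphs.push(text);
    current = [];
  };

  for (const line of markdown.replace(/\r\n/g, "\n").split("\n")) {
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence; // opening or closing fence
      current.push(line);
    } else if (!inFence && line.trim() === "") {
      flush(); // a blank line outside a fence ends the paragraph
    } else {
      current.push(line);
    }
  }
  flush();
  return paragraphs;
}
```

In this sketch, headings become their own short paragraphs. The project's actual splitter may handle them differently.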

### 🚀 Source Code Parser (NEW!)

- Parse C++ source code, including its documentation comments
- Extract comments (`/** */`, `///`, `//!`) and tags (`@brief`, `@param`, `@return`, `@example`)
- Generate structured markdown documentation matching the Material theme
- Create class hierarchies and method documentation
- Modular architecture for adding new languages
- Integrated into the existing CLI alongside the document commands

## Installation

```bash
# ...
npm run build
```

## CLI Usage

Flags placed after `--` are forwarded to the CLI instead of being parsed by npm itself.

### Document Management

#### Add a document collection

```bash
npm run dev -- add --name "my-docs" --folder "./docs" --recursive
```
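
The `--recursive` flag requires walking the entire folder tree. The sketch below shows one way to collect markdown files using Node's built-in `fs` module. `findMarkdownFiles` is hypothetical and only approximates the CLI's behavior. For example, it assumes that only the `.md` extension counts as markdown.

```typescript
import { readdirSync } from "node:fs";
import { extname, join } from "node:path";

// Hypothetical sketch: collect markdown files under `folder`.
// Assumption: only the ".md" extension (any case) counts as markdown.
function findMarkdownFiles(folder: string, recursive = true): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(folder, { withFileTypes: true })) {
    const fullPath = join(folder, entry.name);
    if (entry.isDirectory()) {
      // Descend into subfolders only when recursion is enabled.
      if (recursive) found.push(...findMarkdownFiles(fullPath, recursive));
    } else if (entry.isFile() && extname(entry.name).toLowerCase() === ".md") {
      found.push(fullPath);
    }
  }
  return found.sort(); // stable order across runs
}
```

Sorting the result keeps the indexing order the same from one run to the next.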

#### Search within a document

```bash
npm run dev -- search --document "my-docs" --query "installation guide" --limit 5
```
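
Conceptually, `search` embeds the query and returns the `--limit` paragraphs whose vectors score highest. The in-memory sketch below illustrates that ranking using cosine similarity. It assumes that cosine is the distance metric configured for the collection. In the real tool, Qdrant performs the scoring server-side.

```typescript
interface StoredParagraph {
  text: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for a vector search: score every paragraph, keep the best `limit`.
function searchParagraphs(query: number[], paragraphs: StoredParagraph[], limit = 5) {
  return paragraphs
    .map((p) => ({ text: p.text, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A brute-force scan like this is only practical for small collections. Qdrant exists to avoid scoring every stored vector on each query.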

#### List all collections

```bash
npm run dev list
```

#### Get collection info

```bash
npm run dev -- info --document "my-docs"
```

### 🔥 Source Code Parsing (NEW!)

#### Parse C++ source files

```bash
npm run parse -- --input "./src" --output "./docs" --languages cpp
```

#### List supported languages

```bash
npm run parse-list-languages
```

#### Validate input directory

```bash
npm run parse-validate -- --input "./src"
```

#### Dry run to preview what will be processed

```bash
npm run parse -- --input "./src" --output "./docs" --dry-run
```

## Output Structure

### Document Management

Each collection is stored in Qdrant as one vector per paragraph, along with metadata used for retrieval.
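
The following minimal sketch shows how one file's paragraphs could map onto Qdrant points. Each point carries the metadata listed under How It Works (file hash, name, and last modified time). The payload field names and the choice of SHA-256 are assumptions, not the project's actual schema.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Hypothetical payload shape; field names are illustrative only.
interface ParagraphPayload {
  fileName: string;
  fileHash: string; // SHA-256 of the whole file (assumed algorithm)
  lastModified: string; // ISO timestamp
  paragraphIndex: number;
  text: string;
}

interface QdrantPoint {
  id: string; // Qdrant point IDs are unsigned integers or UUIDs
  vector: number[];
  payload: ParagraphPayload;
}

function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Build one point per paragraph; every point shares the file-level metadata.
function buildPoints(
  fileName: string,
  content: string,
  lastModified: Date,
  paragraphs: string[],
  vectors: number[][],
): QdrantPoint[] {
  if (paragraphs.length !== vectors.length) throw new Error("one vector per paragraph expected");
  const fileHash = sha256(content);
  return paragraphs.map((text, paragraphIndex) => ({
    id: randomUUID(),
    vector: vectors[paragraphIndex],
    payload: { fileName, fileHash, lastModified: lastModified.toISOString(), paragraphIndex, text },
  }));
}
```

Storing a per-file hash in every payload is one way to detect unchanged files on a later run and skip re-embedding them.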

### Source Code Parser

The parser generates structured documentation. For a `Calculator` class with `add` and `multiply` methods, the output looks like this:

```
docs/
├── index.md              # Main index with package references
└── Calculator/
    ├── index.md          # Module overview
    ├── Calculator.md     # Class documentation
    ├── add.md            # Method documentation
    └── multiply.md       # Method documentation
```
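
The layout above can be derived mechanically from the parsed classes. This sketch uses a simplified, hypothetical model (`ParsedModule`). It relies on `path.posix` so that the generated paths are identical on every platform.

```typescript
import { posix } from "node:path";

// Hypothetical, simplified model of one parsed module.
interface ParsedModule {
  name: string; // e.g. "Calculator"
  classes: { name: string; methods: string[] }[];
}

// Derive output file paths following the docs/ layout above.
function outputPaths(outputDir: string, modules: ParsedModule[]): string[] {
  const paths = [posix.join(outputDir, "index.md")]; // main index
  for (const mod of modules) {
    const moduleDir = posix.join(outputDir, mod.name);
    paths.push(posix.join(moduleDir, "index.md")); // module overview
    for (const cls of mod.classes) {
      paths.push(posix.join(moduleDir, `${cls.name}.md`)); // class page
      for (const method of cls.methods) {
        paths.push(posix.join(moduleDir, `${method}.md`)); // one page per method
      }
    }
  }
  return paths;
}
```

In this flat layout, two classes in the same module that share a method name would map to the same file. The real generator may resolve this differently.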

## MCP Server

### Quick Start

Start the MCP server:

```bash
npm run mcp:cli
```

Or use the dedicated CLI:

```bash
docs-rag-mcp
```

### Client Configuration

For MCP clients, use this configuration:

```json
{
  "mcpServers": {
    ...
  }
}
```

### Available MCP Tools

#### Document Management

1. **add_document** - Add a document collection from markdown files
   - Parameters: `name` (string), `folder` (string), `recursive` (boolean, default: true)
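
MCP clients invoke tools through the JSON-RPC `tools/call` method, so a call to `add_document` carries the parameters above inside `params.arguments`. The sketch below builds that message. The non-empty checks are an assumption. In practice, the MCP client library constructs and sends the request for you.

```typescript
// Arguments documented for the add_document tool.
interface AddDocumentArgs {
  name: string;
  folder: string;
  recursive?: boolean; // documented default: true
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Required<AddDocumentArgs> };
}

// Build the JSON-RPC message an MCP client would send to invoke add_document.
function addDocumentRequest(id: number, args: AddDocumentArgs): ToolCallRequest {
  // Assumed validation: reject blank names and folders before sending.
  if (!args.name.trim()) throw new Error("name must not be empty");
  if (!args.folder.trim()) throw new Error("folder must not be empty");
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "add_document",
      arguments: { name: args.name, folder: args.folder, recursive: args.recursive ?? true },
    },
  };
}
```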

…

```
src/
├── config/                # Configuration management
├── lib/
│   ├── parser/            # Source code parser library
│   │   ├── core/          # Core interfaces and generators
│   │   ├── parsers/       # Language-specific parsers
│   │   └── utils/         # File and markdown utilities
│   └── fileProcessor.ts   # Utility functions (file processing)
├── services/              # Core services
│   ├── documentService.ts # Document management
│   ├── parseService.ts    # Source code parser
│   ├── ollamaService.ts   # Embedding generation
│   └── qdrantService.ts   # Qdrant client
├── cli/                   # CLI tool implementation
│   ├── index.ts           # Main CLI with all commands
│   └── parser-commands.ts # Parser-specific commands
├── mcp/                   # MCP server implementation
│   ├── server.ts          # MCP server
│   ├── stdio.ts           # stdio transport
│   ├── agentic-stdio.ts   # Agentic stdio
│   └── cli.ts             # CLI interface
└── types/                 # Type definitions
    └── parser-types.ts    # Parser type definitions
```

## How It Works

### Document Management

1. **Document Processing**: Markdown files are scanned recursively and split into paragraphs
2. **Embedding Creation**: Each paragraph is converted to a vector using Ollama
3. **Storage**: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
4. **Search**: The query is embedded and semantic search returns the paragraphs whose vectors are closest to it
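
At its simplest, step 2 is one HTTP request per batch of paragraphs. The sketch below calls Ollama's `/api/embed` endpoint. The default URL `http://localhost:11434` and the model `nomic-embed-text` are assumptions, so substitute whatever embedding model your setup has pulled. `fetch` is injectable so the function can be exercised without a running server. The project's `ollamaService.ts` may use a different endpoint or batching strategy.

```typescript
// Minimal fetch signature so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumptions: default Ollama address and a locally pulled embedding model.
const OLLAMA_URL = "http://localhost:11434";
const EMBED_MODEL = "nomic-embed-text";

// Embed a batch of paragraphs; returns one vector per paragraph.
async function embedParagraphs(
  paragraphs: string[],
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<number[][]> {
  const response = await fetchImpl(`${OLLAMA_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: EMBED_MODEL, input: paragraphs }),
  });
  if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
  const data = (await response.json()) as { embeddings?: number[][] };
  if (!data.embeddings || data.embeddings.length !== paragraphs.length) {
    throw new Error("expected one embedding per paragraph");
  }
  return data.embeddings;
}
```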

### Source Code Parser

1. **File Scanning**: Recursively scans the input directory for files with supported extensions
2. **Comment Extraction**: Parses `/** */`, `///`, and `//!` documentation styles
3. **Tag Processing**: Extracts `@brief`, `@param`, `@return`, `@example` tags
4. **AST Building**: Builds an abstract syntax tree of classes and methods
5. **Markdown Generation**: Outputs structured documentation matching the Material theme
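
Steps 2 and 3 can be approximated with regular expressions. This sketch is not the project's parser. It assumes that line comments begin at the start of a line and handles only the four tags listed above. Everything after `@example` is treated as example code until the next tag.

```typescript
// Hypothetical sketch of comment extraction and tag processing.
interface DocComment {
  brief?: string;
  params: { name: string; description: string }[];
  returns?: string;
  examples: string[];
  description: string[];
}

// Step 2: find /** ... */ blocks and runs of consecutive /// or //! lines.
function extractDocComments(source: string): string[] {
  const pattern = /\/\*\*([\s\S]*?)\*\/|((?:^[ \t]*\/\/[\/!].*(?:\r?\n|$))+)/gm;
  const blocks: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const raw = match[1] ?? match[2] ?? "";
    const cleaned = raw
      .split(/\r?\n/)
      // Drop the leading "///", "//!" or "*" marker and one following space.
      .map((line) => line.replace(/^\s*(?:\/\/[\/!]|\*)?\s?/, "").trimEnd())
      .join("\n")
      .trim();
    blocks.push(cleaned);
  }
  return blocks;
}

// Step 3: split one cleaned block into @brief / @param / @return / @example.
function parseDocComment(block: string): DocComment {
  const doc: DocComment = { description: [], params: [], examples: [] };
  let exampleLines: string[] | null = null; // non-null while inside @example

  const closeExample = (): void => {
    if (exampleLines) doc.examples.push(exampleLines.join("\n").trim());
    exampleLines = null;
  };

  for (const line of block.split("\n")) {
    const tag = /^@(\w+)\s*(.*)$/.exec(line);
    if (!tag) {
      if (exampleLines) exampleLines.push(line);
      else if (line.trim()) doc.description.push(line.trim());
      continue;
    }
    closeExample(); // any new tag ends a running @example
    const [, name, rest] = tag;
    if (name === "brief") {
      doc.brief = rest;
    } else if (name === "param") {
      const [paramName = "", ...words] = rest.split(/\s+/);
      doc.params.push({ name: paramName, description: words.join(" ") });
    } else if (name === "return" || name === "returns") {
      doc.returns = rest;
    } else if (name === "example") {
      exampleLines = rest ? [rest] : [];
    }
    // Other tags are ignored in this sketch.
  }
  closeExample();
  return doc;
}
```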

## Dependencies

### Core Dependencies

- `@qdrant/qdrant-js` - Qdrant client
- `@modelcontextprotocol/sdk` - MCP server framework
- `commander` - CLI framework
- `dotenv` - Environment configuration
- `fs-extra` - Enhanced file system operations
- `crypto` - Hash generation
- `glob` - File pattern matching

### Parser Dependencies

- `typescript` - TypeScript support and compilation
- `tsx` - TypeScript execution runtime
- `zod` - Type validation and schemas

## Prerequisites

### Document Management

- Qdrant server running
- Ollama server running with an embedding model
- Node.js (TypeScript support)

### Source Code Parser

- No additional prerequisites; works with the existing Node.js setup
- Supports C++ files (`.cpp`, `.h`, `.hpp`, `.cxx`, `.cc`, `.c`)
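
The `--languages cpp` option suggests that input files are selected by language, which can be done with a simple extension lookup. The registry shape below is hypothetical. Only the extension list comes from this README.

```typescript
import { extname } from "node:path";

// Hypothetical registry; the cpp extensions are the ones listed above.
const EXTENSIONS_BY_LANGUAGE: Record<string, string[]> = {
  cpp: [".cpp", ".h", ".hpp", ".cxx", ".cc", ".c"],
};

// Keep only files whose extension belongs to one of the requested languages.
function selectSourceFiles(files: string[], languages: string[]): string[] {
  const wanted = new Set(
    languages.flatMap((lang) => {
      const exts = EXTENSIONS_BY_LANGUAGE[lang];
      if (!exts) throw new Error(`Unsupported language: ${lang}`);
      return exts;
    }),
  );
  // Compare case-insensitively so "Widget.HPP" is still picked up.
  return files.filter((file) => wanted.has(extname(file).toLowerCase()));
}
```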

## Use Cases

### Document Management

- **Technical Documentation**: Store API docs, user guides, and technical specs
- **Knowledge Base**: Build a searchable knowledge base from markdown documents
- **Research Assistant**: Create AI-powered search for technical content
- **API Documentation**: Store and search through API documentation

### Source Code Parser

- **API Documentation Generation**: Automatically generate docs from code comments
- **Developer Onboarding**: Create documentation for new team members
- **Code Review**: Ensure all components have proper documentation
- **Maintenance**: Keep documentation in sync with codebase changes
- **Multi-language Projects**: C++ is supported today; the architecture is designed to be extended to Python, Java, and other languages

## Extending the Parser

The parser is designed to be extensible:

1. **Add New Languages**: Implement the `ILanguageParser` interface
2. **Custom Templates**: Modify the output format of `DocumentationGenerator`
3. **Additional Tags**: Extend comment parsing to handle custom tags
4. **Output Formats**: Add support for HTML, PDF, or other formats

See `src/lib/parser/parsers/base-parser.ts` for the base class structure.
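
Here is a sketch of step 1. It assumes a much simpler `ILanguageParser` than the one in `base-parser.ts`. The interface members, the `ParsedClass` model, and the toy Python parser are all hypothetical and serve only to illustrate the plug-in idea.

```typescript
// Hypothetical shapes: the real ILanguageParser in base-parser.ts may differ.
interface ParsedMethod {
  name: string;
  brief?: string;
}
interface ParsedClass {
  name: string;
  methods: ParsedMethod[];
}
interface ILanguageParser {
  readonly language: string;
  readonly extensions: string[];
  parse(source: string, filePath: string): ParsedClass[];
}

// Toy Python parser: finds `class` lines, indented `def` lines, and a one-line docstring.
class PythonParser implements ILanguageParser {
  readonly language = "python";
  readonly extensions = [".py"];

  parse(source: string): ParsedClass[] {
    const classes: ParsedClass[] = [];
    const lines = source.split(/\r?\n/);
    lines.forEach((line, i) => {
      const cls = /^class\s+(\w+)/.exec(line);
      if (cls) {
        classes.push({ name: cls[1], methods: [] });
        return;
      }
      const def = /^\s+def\s+(\w+)\s*\(/.exec(line);
      const current = classes[classes.length - 1];
      if (def && current) {
        // Use the first line of a docstring on the following line as the brief.
        const doc = /^\s*(?:"""|''')(.*?)(?:"""|''')?\s*$/.exec(lines[i + 1] ?? "");
        current.methods.push({ name: def[1], brief: doc?.[1] || undefined });
      }
    });
    return classes;
  }
}

// Registry keyed by language name, so a --languages style option can pick parsers.
const registry = new Map<string, ILanguageParser>();
function registerParser(parser: ILanguageParser): void {
  registry.set(parser.language, parser);
}
registerParser(new PythonParser());
```

A real implementation would also need to handle nested classes, decorators, and multi-line docstrings. Keying the registry by language name is one way the CLI could map a language option to the right parser.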