# Docs RAG

A comprehensive TypeScript-based project for storing markdown documents in Qdrant vector database with Ollama embeddings, plus a powerful source code documentation parser. Includes both CLI tools and MCP server integration.

## Features

### Document Management

- Store markdown documents in Qdrant collections
- Vectorize documents using Ollama embeddings
- Split documents by paragraphs (one paragraph = one vector)
- Store file metadata (hash, filename, last modified)
- CLI tool for document management
- MCP server for integration
- Recursive folder scanning

### 🚀 Source Code Parser (NEW!)

- Parse C++ source code with full documentation support
- Extract comments (`/** */`, `///`, `//!`) and tags (`@brief`, `@param`, `@return`, `@example`)
- Generate structured markdown documentation matching Material theme
- Create class hierarchies and method documentation
- Modular architecture for adding new languages
- Integrated CLI with existing commands

## Installation

```bash
npm install
```

## Configuration

Copy `.env.example` to `.env` and configure your settings:

```env
# Qdrant URL Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

## Building

```bash
npm run build
```

## CLI Usage

### Document Management

#### Add a document collection

```bash
npm run dev add --name "my-docs" --folder "./docs" --recursive
```

#### Search within a document

```bash
npm run dev search --document "my-docs" --query "installation guide" --limit 5
```

#### List all collections

```bash
npm run dev list
```

#### Get collection info

```bash
npm run dev info --document "my-docs"
```

### 🔥 Source Code Parsing (NEW!)

#### Parse C++ source files

```bash
npm run parse --input "./src" --output "./docs" --languages cpp
```

#### List supported languages

```bash
npm run parse-list-languages
```

#### Validate input directory

```bash
npm run parse-validate --input "./src"
```

#### Dry run to preview what will be processed

```bash
npm run parse --input "./src" --output "./docs" --dry-run
```

## Output Structure

### Document Management

Collections are stored as vectors in Qdrant with metadata for retrieval.

### Source Code Parser

The parser generates structured documentation:

```
docs/
├── index.md              # Main index with package references
└── Calculator/
    ├── index.md          # Module overview
    ├── Calculator.md     # Class documentation
    ├── add.md            # Method documentation
    └── multiply.md       # Method documentation
```

## MCP Server

### Quick Start

Start the MCP server:

```bash
npm run mcp:cli
```

Or use the dedicated CLI:

```bash
docs-rag-mcp
```

### Client Configuration

For MCP clients, use this configuration:

```json
{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": ["-y", "docs-rag-mcp"],
      "cwd": "/path/to/docs-rag"
    }
  }
}
```

### Available MCP Tools

#### Document Management

1. **add_document** - Add a document collection from markdown files
   - Parameters: `name` (string), `folder` (string), `recursive` (boolean, default: true)
2. **search_documents** - Search within a document collection
   - Parameters: `documentName` (string), `query` (string), `limit` (number, default: 10)
3. **list_collections** - List all document collections
   - Parameters: none
4. **get_document_info** - Get information about a document collection
   - Parameters: `documentName` (string)

## Project Structure

```
src/
├── config/                 # Configuration management
├── lib/
│   ├── parser/             # Source code parser library
│   │   ├── core/           # Core interfaces and generators
│   │   ├── parsers/        # Language-specific parsers
│   │   └── utils/          # File and markdown utilities
│   └── fileProcessor.ts    # Utility functions (file processing)
├── services/               # Core services
│   ├── documentService.ts  # Document management
│   ├── parseService.ts     # Source code parser
│   ├── ollamaService.ts    # Embedding generation
│   └── qdrantService.ts    # Qdrant client
├── cli/                    # CLI tool implementation
│   ├── index.ts            # Main CLI with all commands
│   └── parser-commands.ts  # Parser-specific commands
├── mcp/                    # MCP server implementation
│   ├── server.ts           # MCP server
│   ├── stdio.ts            # stdio transport
│   ├── agentic-stdio.ts    # Agentic stdio
│   └── cli.ts              # CLI interface
└── types/                  # Type definitions
    └── parser-types.ts     # Parser type definitions
```

## How It Works

### Document Management

1. **Document Processing**: Markdown files are scanned recursively and split into paragraphs
2. **Embedding Creation**: Each paragraph is converted to a vector using Ollama
3. **Storage**: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
4. **Search**: Semantic search finds relevant paragraphs based on query embeddings

### Source Code Parser

1. **File Scanning**: Recursively scans source files by extensions
2. **Comment Extraction**: Parses `/** */`, `///`, and `//!` documentation styles
3. **Tag Processing**: Extracts `@brief`, `@param`, `@return`, `@example` tags
4. **AST Building**: Creates abstract syntax tree with classes and methods
5. **Markdown Generation**: Outputs structured documentation with Material theme

## Dependencies

### Core Dependencies

- `@qdrant/qdrant-js` - Qdrant client
- `@modelcontextprotocol/sdk` - MCP server framework
- `commander` - CLI framework
- `ollama` - Embedding generation
- `dotenv` - Environment configuration
- `fs-extra` - Enhanced file system operations
- `crypto` - Hash generation
- `glob` - File pattern matching

### Parser Dependencies

- `typescript` - TypeScript support and compilation
- `tsx` - TypeScript execution runtime
- `zod` - Type validation and schemas

## Prerequisites

### Document Management

- Qdrant server running
- Ollama server running with embedding model
- Node.js (TypeScript support)

### Source Code Parser

- No additional prerequisites - works with existing Node.js setup
- Supports C++ files (.cpp, .h, .hpp, .cxx, .cc, .c)

## Use Cases

### Document Management

- **Technical Documentation**: Store API docs, user guides, and technical specs
- **Knowledge Base**: Build searchable knowledge base from markdown documents
- **Research Assistant**: Create AI-powered search for technical content
- **API Documentation**: Store and search through API documentation

### Source Code Parser

- **API Documentation Generation**: Automatically generate docs from code comments
- **Developer Onboarding**: Create documentation for new team members
- **Code Review**: Ensure all components have proper documentation
- **Maintenance**: Keep documentation in sync with codebase changes
- **Multi-language Projects**: Expandable to Python, Java, and other languages

## Extending the Parser

The parser is designed to be extensible:

1. **Add New Languages**: Implement `ILanguageParser` interface
2. **Custom Templates**: Modify `DocumentationGenerator` output format
3. **Additional Tags**: Extend comment parsing for custom tags
4. **Output Formats**: Add support for HTML, PDF, or other formats

See `src/lib/parser/parsers/base-parser.ts` for the base class structure.
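
As a rough illustration of adding a new language, the sketch below shows what a small parser for Java-style `/** */` comments might look like. The members of `ILanguageParser`, the `DocEntry` shape, and the `JavaDocParser` class are all assumptions made for this example; the real interface in `src/lib/parser` may declare different members, so treat this as a starting point and not as the project's actual API.

```typescript
// Hypothetical result shape for illustration; the project's real types live in
// src/types/parser-types.ts and may differ.
interface DocEntry {
  name: string;
  brief: string;
  params: Record<string, string>;
  returns?: string;
}

// Stand-in for the project's ILanguageParser interface (members are assumed).
interface ILanguageParser {
  readonly language: string;
  readonly extensions: string[];
  parse(source: string): DocEntry[];
}

class JavaDocParser implements ILanguageParser {
  readonly language = "java";
  readonly extensions = [".java"];

  parse(source: string): DocEntry[] {
    const entries: DocEntry[] = [];
    // Match a /** ... */ block plus the declaration text that follows it,
    // up to the first "{", ";" or line break.
    const blockPattern = /\/\*\*([\s\S]*?)\*\/\s*([^\n{;]*)/g;
    let match: RegExpExecArray | null;
    while ((match = blockPattern.exec(source)) !== null) {
      const body = match[1] ?? "";
      const declaration = match[2] ?? "";
      // Treat the identifier directly before "(" as the method name.
      const name = /(\w+)\s*\(/.exec(declaration)?.[1];
      if (!name) continue; // not a method declaration, skip it
      const entry: DocEntry = { name, brief: "", params: {} };
      for (const rawLine of body.split("\n")) {
        // Drop the leading " * " decoration of each comment line.
        const line = rawLine.replace(/^\s*\*?\s?/, "").trim();
        const tag = /^@(\w+)\s+(.*)$/.exec(line);
        if (!tag) continue; // untagged lines are ignored in this sketch
        const kind = tag[1] ?? "";
        const text = tag[2] ?? "";
        if (kind === "brief") {
          entry.brief = text;
        } else if (kind === "return") {
          entry.returns = text;
        } else if (kind === "param") {
          // First word is the parameter name, the rest is its description.
          const param = /^(\w+)\s*(.*)$/.exec(text);
          if (param) entry.params[param[1] ?? ""] = param[2] ?? "";
        }
      }
      entries.push(entry);
    }
    return entries;
  }
}

const sample = `
/**
 * @brief Adds two numbers.
 * @param a First operand.
 * @param b Second operand.
 * @return The sum of a and b.
 */
public int add(int a, int b) {
  return a + b;
}
`;

const parsed = new JavaDocParser().parse(sample);
console.log(JSON.stringify(parsed, null, 2));
```

A regex-based approach like this is only meant to show the moving parts (comment extraction, tag processing, one entry per method). A production parser would more likely extend the base class in `base-parser.ts` and build on whatever AST structures the existing C++ parser uses.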