# Docs RAG

A TypeScript-based project for storing markdown documents in Qdrant vector database with Ollama embeddings. Includes both CLI tools and MCP server integration.

## Features

- Store markdown documents in Qdrant collections
- Vectorize documents using Ollama embeddings
- Split documents by paragraphs (one paragraph = one vector)
- Store file metadata (hash, filename, last modified)
- CLI tool for document management
- MCP server for integration
- Recursive folder scanning

## Installation

```bash
npm install
```

## Configuration

Copy `.env.example` to `.env` and configure your settings:

```env
# Qdrant URL Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```

## Building

```bash
npm run build
```

## CLI Usage

### Add a document collection

```bash
npm run dev add --name "my-docs" --folder "./docs" --recursive
```

### Search within a document

```bash
npm run dev search --document "my-docs" --query "installation guide" --limit 5
```

### List all collections

```bash
npm run dev list
```

### Get collection info

```bash
npm run dev info --document "my-docs"
```

## MCP Server

### Quick Start

Start the MCP server:

```bash
npm run mcp:cli
```

Or use the dedicated CLI:

```bash
docs-rag-mcp
```

### Client Configuration

For MCP clients, use this simple configuration:

```json
{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": ["-y", "docs-rag-mcp"],
      "cwd": "/path/to/docs-rag"
    }
  }
}
```

See [MCP_CLIENTS.md](./MCP_CLIENTS.md) for detailed configuration examples for:

- Claude Desktop
- VS Code
- Cursor
- Claude Code CLI
- Windsurf

### Available MCP Tools

1. **add_document** - Add a document collection from markdown files
   - Parameters: `name` (string), `folder` (string), `recursive` (boolean, default: true)
2. **search_documents** - Search within a document collection
   - Parameters: `documentName` (string), `query` (string), `limit` (number, default: 10)
3. **list_collections** - List all document collections
   - Parameters: none
4. **get_document_info** - Get information about a document collection
   - Parameters: `documentName` (string)

## Project Structure

```
src/
├── config/     # Configuration management
├── lib/        # Utility functions (file processing)
├── services/   # Core services (Ollama, Qdrant, Document)
├── cli/        # CLI tool implementation
└── mcp/        # MCP server implementation
```

## How It Works

1. **Document Processing**: Markdown files are scanned recursively and split into paragraphs
2. **Embedding Creation**: Each paragraph is converted to a vector using Ollama
3. **Storage**: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
4. **Search**: Semantic search finds relevant paragraphs based on query embeddings

## Dependencies

- `@qdrant/qdrant-js` - Qdrant client
- `@modelcontextprotocol/sdk` - MCP server framework
- `commander` - CLI framework
- `ollama` - Embedding generation
- `dotenv` - Environment configuration
- `fs-extra` - Enhanced file system operations
- `crypto` - Hash generation

## Prerequisites

- Qdrant server running
- Ollama server running with embedding model
- Node.js (TypeScript support)
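
## Example: Paragraph Splitting Sketch

The document processing step described under "How It Works" (one paragraph = one vector, stored with file hash, name, and last-modified metadata) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ParagraphChunk` type and the `splitIntoParagraphs`, `hashContent`, and `buildChunks` functions are hypothetical names, and SHA-256 is an assumed hash algorithm. Embedding with Ollama and upserting into Qdrant are left out so the sketch runs without any external services.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one paragraph-level record; the real project's types may differ.
interface ParagraphChunk {
  text: string;         // paragraph content that would be embedded
  index: number;        // position of the paragraph within the file
  fileName: string;     // metadata: source file name
  fileHash: string;     // metadata: hash of the whole file
  lastModified: string; // metadata: ISO 8601 timestamp
}

// Split on blank lines and drop empty paragraphs.
// Naive on purpose: a fenced code block containing blank lines would be split too.
function splitIntoParagraphs(markdown: string): string[] {
  return markdown
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
}

// Assumption: SHA-256 over the file content; the source only says "hash".
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Build one chunk per paragraph, each carrying the same file-level metadata.
function buildChunks(
  fileName: string,
  markdown: string,
  lastModified: Date
): ParagraphChunk[] {
  const fileHash = hashContent(markdown);
  return splitIntoParagraphs(markdown).map((text, index) => ({
    text,
    index,
    fileName,
    fileHash,
    lastModified: lastModified.toISOString(),
  }));
}
```

Calling `buildChunks("guide.md", content, mtime)` returns one record per paragraph. In a full pipeline each `text` would then be embedded and stored as a point in the target collection, and a stored file hash is one plausible way to detect whether a file has changed since it was last indexed.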