# Docs RAG

A TypeScript project for storing markdown documents in a Qdrant vector database using Ollama embeddings. It includes both a CLI tool and an MCP server.

## Features

- Store markdown documents in Qdrant collections
- Vectorize documents using Ollama embeddings
- Split documents by paragraphs (one paragraph = one vector)
- Store file metadata (hash, filename, last modified)
- CLI tool for document management
- MCP server for integration
- Recursive folder scanning

## Installation

```bash
npm install
```

## Configuration

Copy `.env.example` to `.env` and configure your settings:

```env
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```
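
To make the role of these variables concrete, here is a minimal sketch of a typed loader. The `AppConfig` shape and the `loadConfig` helper are hypothetical names, not taken from the project source, and falling back to the example values when a variable is unset is an assumption.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface AppConfig {
  qdrantUrl: string;
  qdrantApiKey: string | undefined;
  ollamaUrl: string;
  ollamaEmbeddingModel: string;
}

type Env = Record<string, string | undefined>;

// Read the four variables from an env-like object. Unset or empty values
// fall back to the ones shown in the example above (assumed behaviour).
function loadConfig(env: Env): AppConfig {
  const pick = (key: string, fallback: string): string => {
    const value = env[key]?.trim();
    return value ? value : fallback;
  };
  const apiKey = env["QDRANT_API_KEY"]?.trim();
  return {
    qdrantUrl: pick("QDRANT_URL", "http://localhost:6333"),
    qdrantApiKey: apiKey ? apiKey : undefined, // empty string means "no key"
    ollamaUrl: pick("OLLAMA_URL", "http://localhost:11434"),
    ollamaEmbeddingModel: pick("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
  };
}

// Typical call site in a Node.js process: loadConfig(process.env)
```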

## Building

```bash
npm run build
```

## CLI Usage

### Add a document collection

```bash
npm run dev -- add --name "my-docs" --folder "./docs" --recursive
```

### Search within a document collection

```bash
npm run dev -- search --document "my-docs" --query "installation guide" --limit 5
```

### List all collections

```bash
npm run dev list
```

### Get collection info

```bash
npm run dev -- info --document "my-docs"
```

## MCP Server

### Quick Start

Start the MCP server:

```bash
npm run mcp:cli
```

Or use the dedicated CLI:

```bash
docs-rag-mcp
```

### Client Configuration

For MCP clients, use this simple configuration:

```json
{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": ["-y", "docs-rag-mcp"],
      "cwd": "/path/to/docs-rag"
    }
  }
}
```

See [MCP_CLIENTS.md](./MCP_CLIENTS.md) for detailed configuration examples for:
- Claude Desktop
- VS Code
- Cursor
- Claude Code CLI
- Windsurf

### Available MCP Tools

1. **add_document** - Add a document collection from markdown files
   - Parameters: `name` (string), `folder` (string), `recursive` (boolean, default: true)
   
2. **search_documents** - Search within a document collection
   - Parameters: `documentName` (string), `query` (string), `limit` (number, default: 10)
   
3. **list_collections** - List all document collections
   - Parameters: none
   
4. **get_document_info** - Get information about a document collection
   - Parameters: `documentName` (string)
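
The parameter lists above can be written down as TypeScript types. This is only a sketch: the interface and helper names are hypothetical, and the server's actual schema definitions may look different.

```typescript
// Parameter shapes transcribed from the tool list above.
interface AddDocumentParams {
  name: string;
  folder: string;
  recursive?: boolean; // default: true
}

interface SearchDocumentsParams {
  documentName: string;
  query: string;
  limit?: number; // default: 10
}

interface GetDocumentInfoParams {
  documentName: string;
}

// Hypothetical helpers that fill in the documented defaults.
function withAddDefaults(p: AddDocumentParams): Required<AddDocumentParams> {
  return { ...p, recursive: p.recursive ?? true };
}

function withSearchDefaults(p: SearchDocumentsParams): Required<SearchDocumentsParams> {
  return { ...p, limit: p.limit ?? 10 };
}
```

`list_collections` takes no parameters, so it needs no type of its own.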

## Project Structure

```
src/
├── config/         # Configuration management
├── lib/            # Utility functions (file processing)
├── services/       # Core services (Ollama, Qdrant, Document)
├── cli/            # CLI tool implementation
└── mcp/            # MCP server implementation
```

## How It Works

1. **Document Processing**: Markdown files are scanned (recursively when enabled) and split into paragraphs
2. **Embedding Creation**: Each paragraph is converted to a vector using Ollama
3. **Storage**: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
4. **Search**: Semantic search finds relevant paragraphs based on query embeddings
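
The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: the embedder is passed in as a function (Ollama plays that role in the project), a plain array stands in for a Qdrant collection, cosine similarity is assumed as the distance metric, and all names are hypothetical. The splitter is naive and would also split fenced code blocks that contain blank lines.

```typescript
// One stored paragraph: its text, its vector, and the file it came from.
interface Chunk {
  text: string;
  vector: number[];
  fileName: string;
}

// Hypothetical embedder signature; any text-to-vector function fits.
type Embedder = (text: string) => Promise<number[]>;

// Step 1: split markdown into paragraphs on blank lines, dropping empty ones.
function splitParagraphs(markdown: string): string[] {
  return markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Cosine similarity; assumes equal-length vectors, returns 0 for a zero vector.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Steps 2-3: embed every paragraph and keep it in an in-memory array.
async function indexDocument(
  fileName: string,
  markdown: string,
  embed: Embedder,
): Promise<Chunk[]> {
  const chunks: Chunk[] = [];
  for (const text of splitParagraphs(markdown)) {
    chunks.push({ text, vector: await embed(text), fileName });
  }
  return chunks;
}

// Step 4: embed the query and return the `limit` most similar paragraphs.
async function search(
  chunks: Chunk[],
  query: string,
  embed: Embedder,
  limit = 10,
): Promise<Chunk[]> {
  const queryVector = await embed(query);
  return chunks
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((scored) => scored.chunk);
}
```

Passing the embedder as a parameter keeps the sketch independent of any running server; a real setup would call the Ollama and Qdrant clients at those two points instead.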

## Dependencies

- `@qdrant/qdrant-js` - Qdrant client
- `@modelcontextprotocol/sdk` - MCP server framework
- `commander` - CLI framework
- `ollama` - Embedding generation
- `dotenv` - Environment configuration
- `fs-extra` - Enhanced file system operations
- `crypto` - Hash generation (built into Node.js)
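
As an illustration of the hash generation mentioned above, here is a minimal sketch of how the stored file metadata might be computed with Node's built-in `crypto` module. SHA-256 is an assumption (the algorithm is not stated here), the `FileMetadata` shape and function names are hypothetical, and using the hash to detect changed files is one plausible use rather than confirmed behaviour.

```typescript
import { createHash } from "node:crypto";

// Hypothetical metadata shape mirroring the fields named in this README.
interface FileMetadata {
  fileName: string;
  hash: string;
  lastModified: string; // ISO 8601 timestamp
}

// SHA-256 hex digest of the file content (algorithm choice is an assumption).
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

function buildMetadata(fileName: string, content: string, modifiedAt: Date): FileMetadata {
  return {
    fileName,
    hash: hashContent(content),
    lastModified: modifiedAt.toISOString(),
  };
}

// A file would need re-indexing only when its content hash differs
// from the one stored earlier (or when nothing was stored yet).
function hasChanged(stored: FileMetadata | undefined, content: string): boolean {
  return stored === undefined || stored.hash !== hashContent(content);
}
```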

## Prerequisites

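- A running Qdrant server, reachable at `QDRANT_URL`
- A running Ollama server, reachable at `OLLAMA_URL`, with the embedding model pulled (for example, `ollama pull nomic-embed-text`)
- Node.js with npm (the TypeScript sources are compiled with `npm run build`)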