Docs RAG

A TypeScript project that stores Markdown documents in a Qdrant vector database using Ollama embeddings, plus a source code documentation parser. It includes both CLI tools and an MCP server.

Features

Document Management

  • Store markdown documents in Qdrant collections
  • Vectorize documents using Ollama embeddings
  • Split documents by paragraphs (one paragraph = one vector)
  • Store file metadata (hash, filename, last modified)
  • CLI tool for document management
  • MCP server for integration
  • Recursive folder scanning
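
The paragraph splitting and metadata described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ParagraphChunk` shape, the function names, and the choice of SHA-256 are assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one stored paragraph; field names are assumptions.
interface ParagraphChunk {
  text: string;
  filename: string;
  fileHash: string;     // hex digest of the whole file (SHA-256 assumed here)
  lastModified: string; // ISO 8601 timestamp
  index: number;        // position of the paragraph within the file
}

// Split on blank lines and drop empty paragraphs.
function splitIntoParagraphs(markdown: string): string[] {
  return markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Attach the same file-level metadata to every paragraph of a file.
function toChunks(filename: string, content: string, lastModified: Date): ParagraphChunk[] {
  const fileHash = createHash("sha256").update(content).digest("hex");
  return splitIntoParagraphs(content).map((text, index) => ({
    text,
    filename,
    fileHash,
    lastModified: lastModified.toISOString(),
    index,
  }));
}

const chunks = toChunks(
  "guide.md",
  "# Title\n\nFirst paragraph.\n\n\nSecond paragraph.\n",
  new Date(0)
);
console.log(chunks.map((c) => c.text));
```

Each chunk would then be embedded and stored as one vector. Keeping the file hash on every chunk makes it possible to detect unchanged files, assuming the project uses the hash for that purpose.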

🚀 Source Code Parser (NEW!)

  • Parse C++ source code with full documentation support
  • Extract comments (/** */, ///, //!) and tags (@brief, @param, @return, @example)
  • Generate structured markdown documentation matching Material theme
  • Create class hierarchies and method documentation
  • Modular architecture for adding new languages
  • Integrated CLI with existing commands

Installation

npm install

Configuration

Copy .env.example to .env and configure your settings:

# Qdrant URL Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
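
As an illustration of how these variables might be consumed, here is a small loader sketch. The `AppConfig` interface and `loadConfig` function are hypothetical names, and the fallback values simply mirror `.env.example`; the project's real defaults may differ.

```typescript
// Hypothetical config shape; not taken from the project source.
interface AppConfig {
  qdrantUrl: string;
  qdrantApiKey?: string; // left undefined when the variable is empty
  ollamaUrl: string;
  ollamaEmbeddingModel: string;
}

// Read settings from an environment-like object, falling back to the
// values shown in .env.example when a variable is missing or empty.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    qdrantUrl: env.QDRANT_URL || "http://localhost:6333",
    qdrantApiKey: env.QDRANT_API_KEY || undefined,
    ollamaUrl: env.OLLAMA_URL || "http://localhost:11434",
    ollamaEmbeddingModel: env.OLLAMA_EMBEDDING_MODEL || "nomic-embed-text",
  };
}

// Example with an explicit object; an application would typically
// pass process.env after dotenv has loaded the .env file.
const config = loadConfig({ QDRANT_URL: "http://qdrant.internal:6333" });
console.log(config);
```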

Building

npm run build

CLI Usage

Document Management

Add a document collection

npm run dev -- add --name "my-docs" --folder "./docs" --recursive

Search within a document

npm run dev -- search --document "my-docs" --query "installation guide" --limit 5

List all collections

npm run dev list

Get collection info

npm run dev -- info --document "my-docs"

🔥 Source Code Parsing (NEW!)

Parse C++ source files

npm run parse -- --input "./src" --output "./docs" --languages cpp

List supported languages

npm run parse-list-languages

Validate input directory

npm run parse-validate -- --input "./src"

Dry run to preview what will be processed

npm run parse -- --input "./src" --output "./docs" --dry-run

Output Structure

Document Management

Each collection is stored in Qdrant as one vector per paragraph, together with file metadata (hash, filename, last modified) used for retrieval.

Source Code Parser

The parser generates structured documentation:

docs/
├── index.md                    # Main index with package references
└── Calculator/
    ├── index.md                # Module overview
    ├── Calculator.md           # Class documentation
    ├── add.md                  # Method documentation
    └── multiply.md             # Method documentation
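
To make the layout concrete, the sketch below computes the files that would be written for a set of parsed classes. It assumes one directory per class, named after the class, as the tree above suggests. `ParsedClass` and `plannedOutputFiles` are hypothetical names, and overloaded methods are not handled.

```typescript
// Hypothetical summary of a parsed class; not the project's real type.
interface ParsedClass {
  name: string;
  methods: string[];
}

// List the Markdown files for the layout shown above:
// a top-level index, then per class an index, a class page, and one page per method.
function plannedOutputFiles(outputDir: string, classes: ParsedClass[]): string[] {
  const files = [`${outputDir}/index.md`];
  for (const cls of classes) {
    const moduleDir = `${outputDir}/${cls.name}`;
    files.push(`${moduleDir}/index.md`);
    files.push(`${moduleDir}/${cls.name}.md`);
    for (const method of cls.methods) {
      files.push(`${moduleDir}/${method}.md`);
    }
  }
  return files;
}

const planned = plannedOutputFiles("docs", [
  { name: "Calculator", methods: ["add", "multiply"] },
]);
console.log(planned.join("\n"));
```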

MCP Server

Quick Start

Start the MCP server:

npm run mcp:cli

Or use the dedicated CLI:

docs-rag-mcp

Client Configuration

For MCP clients, use this configuration:

{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": ["-y", "docs-rag-mcp"],
      "cwd": "/path/to/docs-rag"
    }
  }
}

Available MCP Tools

Document Management

  1. add_document - Add a document collection from Markdown files
    • Parameters: name (string), folder (string), recursive (boolean, default: true)
  2. search_documents - Search within a document collection
    • Parameters: documentName (string), query (string), limit (number, default: 10)
  3. list_collections - List all document collections
    • Parameters: none
  4. get_document_info - Get information about a document collection
    • Parameters: documentName (string)
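
The parameter lists above can be written down as TypeScript types, and a tool invocation can be expressed as an MCP `tools/call` JSON-RPC request. The sketch below is illustrative only: the interface and helper names are assumptions, and an MCP client library would normally build the request for you.

```typescript
// Parameter shapes transcribed from the tool list; type names are assumptions.
interface AddDocumentParams {
  name: string;
  folder: string;
  recursive?: boolean; // default: true
}

interface SearchDocumentsParams {
  documentName: string;
  query: string;
  limit?: number; // default: 10
}

// Fill in the documented defaults for omitted optional parameters.
function withAddDefaults(p: AddDocumentParams): Required<AddDocumentParams> {
  return { ...p, recursive: p.recursive ?? true };
}

function withSearchDefaults(p: SearchDocumentsParams): Required<SearchDocumentsParams> {
  return { ...p, limit: p.limit ?? 10 };
}

// Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
function buildToolCall(id: number, name: string, args: object) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const request = buildToolCall(
  1,
  "search_documents",
  withSearchDefaults({ documentName: "my-docs", query: "installation guide" })
);
console.log(JSON.stringify(request, null, 2));
```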

Project Structure

src/
├── config/              # Configuration management
├── lib/
│   ├── parser/          # Source code parser library
│   │   ├── core/       # Core interfaces and generators
│   │   ├── parsers/    # Language-specific parsers
│   │   └── utils/      # File and markdown utilities
│   └── fileProcessor.ts # Utility functions (file processing)
├── services/           # Core services
│   ├── documentService.ts    # Document management
│   ├── parseService.ts      # Source code parser
│   ├── ollamaService.ts     # Embedding generation
│   └── qdrantService.ts     # Qdrant client
├── cli/               # CLI tool implementation
│   ├── index.ts            # Main CLI with all commands
│   └── parser-commands.ts  # Parser-specific commands
├── mcp/               # MCP server implementation
│   ├── server.ts          # MCP server
│   ├── stdio.ts           # stdio transport
│   ├── agentic-stdio.ts   # Agentic stdio
│   └── cli.ts             # CLI interface
└── types/             # Type definitions
    └── parser-types.ts    # Parser type definitions

How It Works

Document Management

  1. Document Processing: Markdown files are scanned recursively and split into paragraphs
  2. Embedding Creation: Each paragraph is converted to a vector using Ollama
  3. Storage: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
  4. Search: Semantic search finds relevant paragraphs based on query embeddings
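
The search step can be illustrated with a small in-memory ranking. In the real project, Qdrant performs the similarity search and Ollama produces the vectors. The sketch below uses hand-written toy vectors and assumes cosine similarity as the distance metric, which the README does not state.

```typescript
// Toy stand-in for a stored paragraph and its embedding.
interface StoredParagraph {
  text: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored paragraph against the query and keep the best matches.
function search(query: number[], stored: StoredParagraph[], limit: number) {
  return stored
    .map((p) => ({ text: p.text, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const results = search(
  [1, 0],
  [
    { text: "Installation", vector: [1, 0] },
    { text: "License", vector: [0, 1] },
    { text: "Setup", vector: [0.9, 0.1] },
  ],
  2
);
console.log(results);
```

A vector database avoids this linear scan by using an index, which is why the vectors are stored in Qdrant rather than compared one by one.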

Source Code Parser

  1. File Scanning: Recursively scans the input directory for source files with supported extensions
  2. Comment Extraction: Parses the /** */, ///, and //! documentation comment styles
  3. Tag Processing: Extracts the @brief, @param, @return, and @example tags
  4. AST Building: Builds an abstract syntax tree of classes and methods
  5. Markdown Generation: Outputs structured documentation matching the Material theme
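
The comment extraction and tag processing steps can be sketched with a few regular expressions. This is a simplified illustration, not the project's parser: `DocTags`, `stripCommentMarkers`, and `parseTags` are hypothetical names, untagged text is ignored, and multi-line tags such as `@example` blocks are not handled.

```typescript
// Hypothetical result shape for one documentation comment.
interface DocTags {
  brief?: string;
  params: { name: string; description: string }[];
  returns?: string;
}

// Remove /** */, ///, //! and leading * markers, returning non-empty lines.
function stripCommentMarkers(comment: string): string[] {
  return comment
    .split(/\r?\n/)
    .map((line) =>
      line
        .trim()
        .replace(/^\/\*\*+/, "")        // opening /**
        .replace(/\*+\/$/, "")          // closing */
        .replace(/^(\/\/[\/!]|\*)/, "") // ///, //! or a leading *
        .trim()
    )
    .filter((line) => line.length > 0);
}

// Collect @brief, @param and @return tags from a comment.
function parseTags(comment: string): DocTags {
  const tags: DocTags = { params: [] };
  for (const line of stripCommentMarkers(comment)) {
    const match = /^@(\w+)\s+(.*)$/.exec(line);
    if (!match) continue; // untagged text is skipped in this sketch
    const [, tag, rest] = match;
    if (tag === "brief") {
      tags.brief = rest;
    } else if (tag === "return") {
      tags.returns = rest;
    } else if (tag === "param") {
      const [name, ...words] = rest.split(/\s+/);
      tags.params.push({ name, description: words.join(" ") });
    }
  }
  return tags;
}

const sample = [
  "/**",
  " * @brief Adds two numbers.",
  " * @param a First operand",
  " * @param b Second operand",
  " * @return The sum of a and b",
  " */",
].join("\n");

const parsed = parseTags(sample);
console.log(parsed);
```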

Dependencies

Core Dependencies

  • @qdrant/qdrant-js - Qdrant client
  • @modelcontextprotocol/sdk - MCP server framework
  • commander - CLI framework
  • ollama - Embedding generation
  • dotenv - Environment configuration
  • fs-extra - Enhanced file system operations
  • crypto - Hash generation
  • glob - File pattern matching

Parser Dependencies

  • typescript - TypeScript support and compilation
  • tsx - TypeScript execution runtime
  • zod - Type validation and schemas

Prerequisites

Document Management

  • Qdrant server running
  • Ollama server running with embedding model
  • Node.js (TypeScript support)

Source Code Parser

  • No additional prerequisites - works with existing Node.js setup
  • Supports C++ files (.cpp, .h, .hpp, .cxx, .cc, .c)

Use Cases

Document Management

  • Technical Documentation: Store API docs, user guides, and technical specs
  • Knowledge Base: Build searchable knowledge base from markdown documents
  • Research Assistant: Create AI-powered search for technical content
  • API Documentation: Store and search through API documentation

Source Code Parser

  • API Documentation Generation: Automatically generate docs from code comments
  • Developer Onboarding: Create documentation for new team members
  • Code Review: Ensure all components have proper documentation
  • Maintenance: Keep documentation in sync with codebase changes
  • Multi-language Projects: Expandable to Python, Java, and other languages

Extending the Parser

The parser is designed to be extensible:

  1. Add New Languages: Implement the ILanguageParser interface
  2. Custom Templates: Modify the DocumentationGenerator output format
  3. Additional Tags: Extend comment parsing to support custom tags
  4. Output Formats: Add support for HTML, PDF, or other formats

See src/lib/parser/parsers/base-parser.ts for the base class structure.
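
As a rough idea of what adding a language could look like, here is a sketch of a parser for Python. The README names `ILanguageParser` but does not list its members, so the interface shape below (`language`, `extensions`, `canParse`, `parse`) and the `ParsedSymbol` type are assumptions made for illustration. Check the base class for the real contract.

```typescript
// Hypothetical symbol type; the project's real types live in parser-types.ts.
interface ParsedSymbol {
  kind: "class" | "function";
  name: string;
}

// Assumed interface shape; the actual ILanguageParser may differ.
interface ILanguageParser {
  readonly language: string;
  readonly extensions: string[];
  canParse(filename: string): boolean;
  parse(source: string): ParsedSymbol[];
}

// Minimal example parser: finds top-level and nested class/def declarations.
class PythonParserSketch implements ILanguageParser {
  readonly language = "python";
  readonly extensions = [".py"];

  canParse(filename: string): boolean {
    const lower = filename.toLowerCase();
    return this.extensions.some((ext) => lower.endsWith(ext));
  }

  parse(source: string): ParsedSymbol[] {
    const symbols: ParsedSymbol[] = [];
    const pattern = /^\s*(class|def)\s+([A-Za-z_]\w*)/gm;
    for (const m of source.matchAll(pattern)) {
      symbols.push({ kind: m[1] === "class" ? "class" : "function", name: m[2] });
    }
    return symbols;
  }
}

const pythonParser = new PythonParserSketch();
const symbols = pythonParser.parse("class Foo:\n    def bar(self):\n        pass\n");
console.log(symbols);
```

A real implementation would also extract documentation comments and feed them into the Markdown generator; this sketch only shows the plug-in shape.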