fszontagh 8ae697f134 Update README.md with comprehensive source code parser documentation		3 months ago
example-docs	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
src	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
.env.example	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
.gitignore	877579caf2 Update .gitignore with comprehensive exclusions	3 months ago
AGENTIC_CODER_CONFIG.md	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
IMPLEMENTATION_COMPLETE.md	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
MCP_CLIENTS.md	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
README.md	8ae697f134 Update README.md with comprehensive source code parser documentation	3 months ago
package-lock.json	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
package.json	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago
tsconfig.json	fb80726d5a Initial commit: Complete TypeScript source code parser implementation	3 months ago

Docs RAG

A TypeScript project that stores Markdown documents in a Qdrant vector database using Ollama embeddings, and includes a source code documentation parser. It ships with CLI tools and an MCP server.

Features

Document Management

Store markdown documents in Qdrant collections
Vectorize documents using Ollama embeddings
Split documents by paragraphs (one paragraph = one vector)
Store file metadata (hash, filename, last modified)
CLI tool for document management
MCP server for integration
Recursive folder scanning
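
The paragraph-per-vector idea and the file metadata can be sketched as follows. This is a minimal illustration, not the project's actual code: the `ParagraphChunk` shape, the function names, the blank-line splitting rule, and the choice of SHA-256 are all assumptions. The last-modified timestamp is left out because it would come from the file system.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one stored chunk; field names are illustrative only.
interface ParagraphChunk {
  text: string;     // paragraph text that would be embedded as one vector
  index: number;    // position of the paragraph within the file
  fileHash: string; // hash of the whole file, useful for change detection
  filename: string;
}

// Split Markdown on blank lines; each non-empty paragraph becomes one chunk.
function splitIntoParagraphs(markdown: string): string[] {
  return markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Attach file-level metadata to every paragraph of a document.
function chunkDocument(filename: string, content: string): ParagraphChunk[] {
  // SHA-256 is an assumption; the README only says a hash is stored.
  const fileHash = createHash("sha256").update(content).digest("hex");
  return splitIntoParagraphs(content).map((text, index) => ({
    text,
    index,
    fileHash,
    filename,
  }));
}
```

Storing the same file hash on every chunk would let a re-import skip files whose content has not changed, assuming the importer compares hashes before re-embedding.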

🚀 Source Code Parser (NEW!)

Parse C++ source code with full documentation support
Extract comments (/** */, ///, //!) and tags (@brief, @param, @return, @example)
Generate structured Markdown documentation matching the Material theme
Create class hierarchies and method documentation
Modular architecture for adding new languages
Integrated CLI with existing commands

Installation

npm install

Configuration

Copy .env.example to .env and configure your settings:

# Qdrant URL Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Ollama Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
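
One way these variables might be read into a typed object is sketched below. The `AppConfig` interface and `loadConfig` function are hypothetical names, and falling back to the `.env.example` values when a variable is unset is an assumption rather than documented behavior.

```typescript
// Hypothetical typed view of the settings listed in .env.example.
interface AppConfig {
  qdrantUrl: string;
  qdrantApiKey: string | undefined; // empty string is treated as "no key"
  ollamaUrl: string;
  ollamaEmbeddingModel: string;
}

// Build the config from an environment map (for example process.env after dotenv has run).
// The fallback values mirror .env.example; using them as defaults is an assumption.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    qdrantUrl: env["QDRANT_URL"] || "http://localhost:6333",
    qdrantApiKey: env["QDRANT_API_KEY"] || undefined,
    ollamaUrl: env["OLLAMA_URL"] || "http://localhost:11434",
    ollamaEmbeddingModel: env["OLLAMA_EMBEDDING_MODEL"] || "nomic-embed-text",
  };
}
```

Passing the environment in as an argument keeps the function easy to test without touching the real process environment.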

Building

npm run build

CLI Usage

Document Management

Add a document collection

npm run dev add --name "my-docs" --folder "./docs" --recursive

Search within a document

npm run dev search --document "my-docs" --query "installation guide" --limit 5

List all collections

npm run dev list

Get collection info

npm run dev info --document "my-docs"

🔥 Source Code Parsing (NEW!)

Parse C++ source files

npm run parse --input "./src" --output "./docs" --languages cpp

List supported languages

npm run parse-list-languages

Validate input directory

npm run parse-validate --input "./src"

Dry run to preview what will be processed

npm run parse --input "./src" --output "./docs" --dry-run

Output Structure

Document Management

Collections are stored as vectors in Qdrant with metadata for retrieval.

Source Code Parser

The parser generates structured documentation:

docs/
├── index.md                   # Main index with package references
└── Calculator/
    ├── index.md               # Module overview
    ├── Calculator.md          # Class documentation
    ├── add.md                 # Method documentation
    └── multiply.md            # Method documentation
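
The layout above suggests a simple mapping from a parsed class to output files. The sketch below is inferred from the example tree only: `ParsedClass` and `outputPaths` are hypothetical names, and using the class name as the module directory is an assumption.

```typescript
// Hypothetical minimal description of a parsed class.
interface ParsedClass {
  name: string;
  methods: string[];
}

// Derive the Markdown files that would be written for one class,
// following the docs/Calculator/ example layout.
function outputPaths(outputDir: string, cls: ParsedClass): string[] {
  const moduleDir = `${outputDir}/${cls.name}`;
  return [
    `${moduleDir}/index.md`,       // module overview
    `${moduleDir}/${cls.name}.md`, // class documentation
    ...cls.methods.map((m) => `${moduleDir}/${m}.md`), // one file per method
  ];
}
```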

MCP Server

Quick Start

Start the MCP server:

npm run mcp:cli

Or use the dedicated CLI:

docs-rag-mcp

Client Configuration

For MCP clients, use this configuration:

{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": ["-y", "docs-rag-mcp"],
      "cwd": "/path/to/docs-rag"
    }
  }
}

Available MCP Tools

Document Management

add_document - Add a document collection from markdown files
- Parameters: name (string), folder (string), recursive (boolean, default: true)
search_documents - Search within a document collection
- Parameters: documentName (string), query (string), limit (number, default: 10)
list_collections - List all document collections
- Parameters: none
get_document_info - Get information about a document collection
- Parameters: documentName (string)
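
The parameter lists above, including the documented defaults (`recursive: true`, `limit: 10`), can be expressed as plain TypeScript validation. This is a sketch under assumptions: the interface and function names are hypothetical, and the real server may validate its inputs differently (for instance with `zod`, which is listed among the dependencies).

```typescript
// Argument shapes taken from the tool list; the type names are hypothetical.
interface AddDocumentArgs {
  name: string;
  folder: string;
  recursive: boolean;
}

interface SearchDocumentsArgs {
  documentName: string;
  query: string;
  limit: number;
}

// Validate raw add_document arguments and apply the documented default.
function parseAddDocumentArgs(raw: Record<string, unknown>): AddDocumentArgs {
  const { name, folder, recursive } = raw;
  if (typeof name !== "string" || typeof folder !== "string") {
    throw new Error("add_document requires string 'name' and 'folder'");
  }
  return { name, folder, recursive: typeof recursive === "boolean" ? recursive : true };
}

// Validate raw search_documents arguments and apply the documented default.
function parseSearchDocumentsArgs(raw: Record<string, unknown>): SearchDocumentsArgs {
  const { documentName, query, limit } = raw;
  if (typeof documentName !== "string" || typeof query !== "string") {
    throw new Error("search_documents requires string 'documentName' and 'query'");
  }
  return { documentName, query, limit: typeof limit === "number" ? limit : 10 };
}
```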

Project Structure

src/
├── config/              # Configuration management
├── lib/
│   ├── parser/          # Source code parser library
│   │   ├── core/       # Core interfaces and generators
│   │   ├── parsers/    # Language-specific parsers
│   │   └── utils/      # File and markdown utilities
│   └── fileProcessor.ts # Utility functions (file processing)
├── services/           # Core services
│   ├── documentService.ts    # Document management
│   ├── parseService.ts      # Source code parser
│   ├── ollamaService.ts     # Embedding generation
│   └── qdrantService.ts     # Qdrant client
├── cli/               # CLI tool implementation
│   ├── index.ts            # Main CLI with all commands
│   └── parser-commands.ts  # Parser-specific commands
├── mcp/               # MCP server implementation
│   ├── server.ts          # MCP server
│   ├── stdio.ts           # stdio transport
│   ├── agentic-stdio.ts   # Agentic stdio
│   └── cli.ts             # CLI interface
└── types/             # Type definitions
    └── parser-types.ts    # Parser type definitions

How It Works

Document Management

Document Processing: Markdown files are scanned recursively and split into paragraphs
Embedding Creation: Each paragraph is converted to a vector using Ollama
Storage: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
Search: Semantic search finds relevant paragraphs based on query embeddings
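
To make the search step concrete, here is an in-memory sketch of ranking stored paragraphs against a query embedding. In the project, Qdrant performs the ranking itself; the cosine metric, the `StoredParagraph` type, and the function names here are assumptions for illustration only.

```typescript
// Hypothetical in-memory stand-in for a stored paragraph and its embedding.
interface StoredParagraph {
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / Math.sqrt(normA * normB);
}

// Return the `limit` paragraphs most similar to the query embedding.
function topMatches(query: number[], paragraphs: StoredParagraph[], limit: number): StoredParagraph[] {
  return paragraphs
    .map((p) => ({ p, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((s) => s.p);
}
```

A vector database replaces this linear scan with an index, but the ranking idea is the same: the query is embedded with the same model as the paragraphs, and the closest vectors are returned.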

Source Code Parser

File Scanning: Recursively scans source files by file extension
Comment Extraction: Parses /** */, ///, and //! documentation styles
Tag Processing: Extracts @brief, @param, @return, @example tags
AST Building: Builds an abstract syntax tree of classes and methods
Markdown Generation: Outputs structured documentation matching the Material theme
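
The comment extraction and tag processing steps can be illustrated with a small regex-based sketch. It is not the project's parser: `DocComment`, `DocParam`, and `parseDocComment` are hypothetical names, only `@brief`, `@param`, and `@return` are handled (`@example` is ignored for brevity), and untagged text before the first tag is treated as the brief, which is an assumption.

```typescript
interface DocParam {
  name: string;
  description: string;
}

// Hypothetical result of parsing one documentation comment.
interface DocComment {
  brief: string;
  params: DocParam[];
  returns: string;
}

// Parse a /** */, /// or //! comment block into its tags.
function parseDocComment(raw: string): DocComment {
  const doc: DocComment = { brief: "", params: [], returns: "" };
  for (const rawLine of raw.split("\n")) {
    // Strip the comment markers at the start of the line and a trailing "*/".
    const line = rawLine
      .replace(/^\s*(\/\*\*|\*\/|\*|\/\/\/|\/\/!)\s?/, "")
      .replace(/\s*\*\/\s*$/, "")
      .trim();
    if (line.length === 0) continue;

    const tag = /^@(\w+)\s*(.*)$/.exec(line);
    if (!tag) {
      // Untagged text before any tag is used as the brief (assumption).
      if (doc.brief === "" && doc.params.length === 0 && doc.returns === "") doc.brief = line;
      continue;
    }
    const name = tag[1] ?? "";
    const rest = tag[2] ?? "";
    if (name === "brief") {
      doc.brief = rest;
    } else if (name === "param") {
      const parts = /^(\S+)\s*(.*)$/.exec(rest);
      if (parts) doc.params.push({ name: parts[1] ?? "", description: parts[2] ?? "" });
    } else if (name === "return" || name === "returns") {
      doc.returns = rest;
    }
  }
  return doc;
}
```

A line-oriented approach like this is easy to follow but does not handle multi-line tag descriptions; a fuller parser would accumulate continuation lines under the most recent tag.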

Dependencies

Core Dependencies

@qdrant/qdrant-js - Qdrant client
@modelcontextprotocol/sdk - MCP server framework
commander - CLI framework
ollama - Embedding generation
dotenv - Environment configuration
fs-extra - Enhanced file system operations
crypto - Hash generation
glob - File pattern matching

Parser Dependencies

typescript - TypeScript support and compilation
tsx - TypeScript execution runtime
zod - Type validation and schemas

Prerequisites

Document Management

Qdrant server running
Ollama server running with embedding model
Node.js (TypeScript support)

Source Code Parser

No additional prerequisites - works with existing Node.js setup
Supports C++ files (.cpp, .h, .hpp, .cxx, .cc, .c)

Use Cases

Document Management

Technical Documentation: Store API docs, user guides, and technical specs
Knowledge Base: Build searchable knowledge base from markdown documents
Research Assistant: Create AI-powered search for technical content
API Documentation: Store and search through API documentation

Source Code Parser

API Documentation Generation: Automatically generate docs from code comments
Developer Onboarding: Create documentation for new team members
Code Review: Ensure all components have proper documentation
Maintenance: Keep documentation in sync with codebase changes
Multi-language Projects: Expandable to Python, Java, and other languages

Extending the Parser

The parser is designed to be extensible:

Add New Languages: Implement ILanguageParser interface
Custom Templates: Modify DocumentationGenerator output format
Additional Tags: Extend comment parsing for custom tags
Output Formats: Add support for HTML, PDF, or other formats

See src/lib/parser/parsers/base-parser.ts for the base class structure.
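
As a rough picture of what adding a language could involve, the sketch below defines a parser for Python, one of the languages named under Use Cases. Everything about the interface shape is hypothetical: the README names `ILanguageParser` but does not show its members, so `language`, `extensions`, `parse`, `ParsedSymbol`, and `pickParser` are assumptions, and the real interface in `src/lib/parser` may look quite different.

```typescript
// Hypothetical symbol record; the real parser types may differ.
interface ParsedSymbol {
  kind: "class" | "function";
  name: string;
}

// Hypothetical shape of the interface; check the actual source before relying on it.
interface ILanguageParser {
  readonly language: string;
  readonly extensions: string[];
  parse(source: string): ParsedSymbol[];
}

// Toy Python parser that only recognizes `class` and `def` declarations.
class PythonParser implements ILanguageParser {
  readonly language = "python";
  readonly extensions = [".py"];

  parse(source: string): ParsedSymbol[] {
    const symbols: ParsedSymbol[] = [];
    const pattern = /^\s*(class|def)\s+([A-Za-z_]\w*)/gm;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      symbols.push({
        kind: match[1] === "class" ? "class" : "function",
        name: match[2] ?? "",
      });
    }
    return symbols;
  }
}

// Choose the first registered parser whose extensions match the file name.
function pickParser(parsers: ILanguageParser[], filename: string): ILanguageParser | undefined {
  return parsers.find((p) => p.extensions.some((ext) => filename.endsWith(ext)));
}
```

Keeping the extension list on the parser itself would let the file scanner stay language-agnostic, assuming parsers are registered in a single list.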
