
Update README.md with comprehensive source code parser documentation

- Added new Source Code Parser section with all features
- Documented parser CLI commands (parse, parse-list-languages, parse-validate)
- Included installation and configuration instructions
- Added usage examples with dry-run and language selection
- Updated project structure to reflect parser library
- Documented output structure for generated documentation
- Added dependencies and prerequisites for parser functionality
- Included use cases for both document management and code parsing
- Added extensibility guide for adding new languages
- Comprehensive documentation for the complete dual-purpose tool

The README now reflects the full capabilities of the docs-rag project:
1. Document management with Qdrant and Ollama
2. Source code parsing with C++ support
3. CLI tools for both functionalities
4. MCP server integration
5. Extensible architecture for future languages
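
The commit describes a parser that extracts `/** */`, `///` and `//!` comments together with `@brief`, `@param`, `@return` and `@example` tags. It also says that new languages are added by implementing an `ILanguageParser` interface. Neither that interface nor the parser code appears in this README diff, so the TypeScript below is only a minimal, self-contained sketch built on assumptions. `DocComment`, `LanguageParser`, `CppDocCommentParser` and `parseTags` are hypothetical names, and the regex-based extraction is a simplification, not the project's implementation in `src/lib/parser`.

```typescript
// Hypothetical sketch: DocComment, LanguageParser, CppDocCommentParser and
// parseTags are illustrative names, not the actual docs-rag API.

interface DocComment {
  brief: string;
  params: { name: string; description: string }[];
  returns?: string;
  examples: string[];
}

// Assumed shape of a pluggable per-language parser.
interface LanguageParser {
  readonly language: string;
  readonly extensions: string[];
  extractDocComments(source: string): DocComment[];
}

// Splits a cleaned comment body into @brief/@param/@return/@example tags.
function parseTags(body: string): DocComment {
  const doc: DocComment = { brief: "", params: [], examples: [] };
  let current: { tag: string; text: string[] } | null = null;

  // Stores the tag collected so far in `doc`.
  const flush = (): void => {
    const c = current;
    if (!c) return;
    const text = c.text.join("\n").trim();
    if (c.tag === "brief") doc.brief = text;
    else if (c.tag === "return") doc.returns = text;
    else if (c.tag === "example") doc.examples.push(text);
    else if (c.tag === "param") {
      // First word is the parameter name, the rest is its description.
      const [name, ...rest] = text.split(/\s+/);
      doc.params.push({ name, description: rest.join(" ") });
    } else if (!doc.brief) {
      doc.brief = text; // untagged leading text acts as the brief
    }
  };

  for (const line of body.split("\n")) {
    const m = line.match(/^@(brief|param|returns?|example)\b\s*(.*)$/);
    if (m) {
      flush();
      current = { tag: m[1].startsWith("return") ? "return" : m[1], text: [m[2]] };
    } else if (current) {
      current.text.push(line); // continuation of the current tag
    } else {
      current = { tag: "", text: [line] };
    }
  }
  flush();
  return doc;
}

class CppDocCommentParser implements LanguageParser {
  readonly language = "cpp";
  readonly extensions = [".cpp", ".h", ".hpp", ".cxx", ".cc", ".c"];

  extractDocComments(source: string): DocComment[] {
    const bodies: string[] = [];
    // /** ... */ blocks, stripping the leading "*" from each line.
    for (const m of source.matchAll(/\/\*\*([\s\S]*?)\*\//g)) {
      const lines = m[1].split("\n").map((l) => l.replace(/^\s*\*\s?/, "").trim());
      bodies.push(lines.join("\n"));
    }
    // Consecutive /// or //! lines form one comment body.
    let run: string[] = [];
    for (const line of source.split("\n")) {
      const m = line.match(/^\s*\/\/[\/!]\s?(.*)$/);
      if (m) run.push(m[1].trim());
      else if (run.length > 0) { bodies.push(run.join("\n")); run = []; }
    }
    if (run.length > 0) bodies.push(run.join("\n"));
    // Note: block comments are returned before line-comment runs.
    return bodies.map(parseTags);
  }
}

const sample = `
/**
 * @brief Adds two integers.
 * @param a First operand.
 * @param b Second operand.
 * @return The sum of a and b.
 */
int add(int a, int b);

/// Multiplies two integers.
/// @param a First factor.
/// @param b Second factor.
int multiply(int a, int b);
`;

const docs = new CppDocCommentParser().extractDocComments(sample);
console.log(JSON.stringify(docs, null, 2));
```

A regex pass like this does not build the class and method AST that the README mentions, and it returns comments out of source order. A fuller parser would also attach each comment to the declaration that follows it.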
fszontagh committed 3 months ago
commit 8ae697f134
1 changed file with 126 additions and 25 deletions

README.md (+126 -25)

@@ -1,9 +1,10 @@
 # Docs RAG
 
-A TypeScript-based project for storing markdown documents in Qdrant vector database with Ollama embeddings. Includes both CLI tools and MCP server integration.
+A comprehensive TypeScript-based project for storing markdown documents in Qdrant vector database with Ollama embeddings, plus a powerful source code documentation parser. Includes both CLI tools and MCP server integration.
 
 ## Features
 
+### Document Management
 - Store markdown documents in Qdrant collections
 - Vectorize documents using Ollama embeddings
 - Split documents by paragraphs (one paragraph = one vector)
@@ -12,6 +13,14 @@ A TypeScript-based project for storing markdown documents in Qdrant vector datab
 - MCP server for integration
 - Recursive folder scanning
 
+### 🚀 Source Code Parser (NEW!)
+- Parse C++ source code with full documentation support
+- Extract comments (`/** */`, `///`, `//!`) and tags (`@brief`, `@param`, `@return`, `@example`)
+- Generate structured markdown documentation matching Material theme
+- Create class hierarchies and method documentation
+- Modular architecture for adding new languages
+- Integrated CLI with existing commands
+
 ## Installation
 
 ```bash
@@ -40,50 +49,85 @@ npm run build
 
 ## CLI Usage
 
-### Add a document collection
+### Document Management
 
+#### Add a document collection
 ```bash
 npm run dev add --name "my-docs" --folder "./docs" --recursive
 ```
 
-### Search within a document
-
+#### Search within a document
 ```bash
 npm run dev search --document "my-docs" --query "installation guide" --limit 5
 ```
 
-### List all collections
-
+#### List all collections
 ```bash
 npm run dev list
 ```
 
-### Get collection info
-
+#### Get collection info
 ```bash
 npm run dev info --document "my-docs"
 ```
 
+### 🔥 Source Code Parsing (NEW!)
+
+#### Parse C++ source files
+```bash
+npm run parse --input "./src" --output "./docs" --languages cpp
+```
+
+#### List supported languages
+```bash
+npm run parse-list-languages
+```
+
+#### Validate input directory
+```bash
+npm run parse-validate --input "./src"
+```
+
+#### Dry run to preview what will be processed
+```bash
+npm run parse --input "./src" --output "./docs" --dry-run
+```
+
+## Output Structure
+
+### Document Management
+Collections are stored as vectors in Qdrant with metadata for retrieval.
+
+### Source Code Parser
+The parser generates structured documentation:
+
+```
+docs/
+├── index.md                # Main index with package references
+└── Calculator/
+    ├── index.md            # Module overview
+    ├── Calculator.md       # Class documentation
+    ├── add.md              # Method documentation
+    └── multiply.md         # Method documentation
+```
+
 ## MCP Server
 
 ### Quick Start
 
 Start the MCP server:
-
 ```bash
 npm run mcp:cli
 ```
 
 Or use the dedicated CLI:
-
 ```bash
 docs-rag-mcp
 ```
 
 ### Client Configuration
 
-For MCP clients, use this simple configuration:
-
+For MCP clients, use this configuration:
 ```json
 {
   "mcpServers": {
@@ -96,15 +140,9 @@ For MCP clients, use this simple configuration:
 }
 ```
 
-See [MCP_CLIENTS.md](./MCP_CLIENTS.md) for detailed configuration examples for:
-- Claude Desktop
-- VS Code  
-- Cursor
-- Claude Code CLI
-- Windsurf
-
 ### Available MCP Tools
 
+#### Document Management
 1. **add_document** - Add a document collection from markdown files
    - Parameters: `name` (string), `folder` (string), `recursive` (boolean, default: true)
    
@@ -121,22 +159,48 @@ See [MCP_CLIENTS.md](./MCP_CLIENTS.md) for detailed configuration examples for:
 
 ```
 src/
-├── config/          # Configuration management
-├── lib/            # Utility functions (file processing)
-├── services/       # Core services (Ollama, Qdrant, Document)
-├── cli/            # CLI tool implementation
-└── mcp/            # MCP server implementation
+├── config/                 # Configuration management
+├── lib/
+│   ├── parser/             # Source code parser library
+│   │   ├── core/           # Core interfaces and generators
+│   │   ├── parsers/        # Language-specific parsers
+│   │   └── utils/          # File and markdown utilities
+│   └── fileProcessor.ts    # Utility functions (file processing)
+├── services/               # Core services
+│   ├── documentService.ts  # Document management
+│   ├── parseService.ts     # Source code parser
+│   ├── ollamaService.ts    # Embedding generation
+│   └── qdrantService.ts    # Qdrant client
+├── cli/                    # CLI tool implementation
+│   ├── index.ts            # Main CLI with all commands
+│   └── parser-commands.ts  # Parser-specific commands
+├── mcp/                    # MCP server implementation
+│   ├── server.ts           # MCP server
+│   ├── stdio.ts            # stdio transport
+│   ├── agentic-stdio.ts    # Agentic stdio
+│   └── cli.ts              # CLI interface
+└── types/                  # Type definitions
+    └── parser-types.ts     # Parser type definitions
 ```
 
 ## How It Works
 
+### Document Management
 1. **Document Processing**: Markdown files are scanned recursively and split into paragraphs
 2. **Embedding Creation**: Each paragraph is converted to a vector using Ollama
 3. **Storage**: Vectors are stored in Qdrant with metadata (file hash, name, last modified)
 4. **Search**: Semantic search finds relevant paragraphs based on query embeddings
 
+### Source Code Parser
+1. **File Scanning**: Recursively scans source files by extensions
+2. **Comment Extraction**: Parses `/** */`, `///`, and `//!` documentation styles
+3. **Tag Processing**: Extracts `@brief`, `@param`, `@return`, `@example` tags
+4. **AST Building**: Creates abstract syntax tree with classes and methods
+5. **Markdown Generation**: Outputs structured documentation with Material theme
+
 ## Dependencies
 
+### Core Dependencies
 - `@qdrant/qdrant-js` - Qdrant client
 - `@modelcontextprotocol/sdk` - MCP server framework
 - `commander` - CLI framework
@@ -144,9 +208,46 @@ src/
 - `dotenv` - Environment configuration
 - `fs-extra` - Enhanced file system operations
 - `crypto` - Hash generation
+- `glob` - File pattern matching
+
+### Parser Dependencies
+- `typescript` - TypeScript support and compilation
+- `tsx` - TypeScript execution runtime
+- `zod` - Type validation and schemas
 
 ## Prerequisites
 
+### Document Management
 - Qdrant server running
 - Ollama server running with embedding model
-- Node.js (TypeScript support)
+- Node.js (TypeScript support)
+
+### Source Code Parser
+- No additional prerequisites - works with existing Node.js setup
+- Supports C++ files (.cpp, .h, .hpp, .cxx, .cc, .c)
+
+## Use Cases
+
+### Document Management
+- **Technical Documentation**: Store API docs, user guides, and technical specs
+- **Knowledge Base**: Build searchable knowledge base from markdown documents
+- **Research Assistant**: Create AI-powered search for technical content
+- **API Documentation**: Store and search through API documentation
+
+### Source Code Parser
+- **API Documentation Generation**: Automatically generate docs from code comments
+- **Developer Onboarding**: Create documentation for new team members
+- **Code Review**: Ensure all components have proper documentation
+- **Maintenance**: Keep documentation in sync with codebase changes
+- **Multi-language Projects**: Expandable to Python, Java, and other languages
+
+## Extending the Parser
+
+The parser is designed to be extensible:
+
+1. **Add New Languages**: Implement `ILanguageParser` interface
+2. **Custom Templates**: Modify `DocumentationGenerator` output format
+3. **Additional Tags**: Extend comment parsing for custom tags
+4. **Output Formats**: Add support for HTML, PDF, or other formats
+
+See `src/lib/parser/parsers/base-parser.ts` for the base class structure.