# Model Architecture Detection System

This project includes a custom model detection system that analyzes model files to determine their architecture without modifying the stable-diffusion.cpp library.

## Features

### Supported Model Formats

- ✅ **Safetensors** (.safetensors) - Fully supported
- ✅ **GGUF** (.gguf) - Quantized models, fully supported
- ❌ **PyTorch Checkpoint** (.ckpt, .pt) - NOT supported (requires the PyTorch library to parse safely)

**Note**: .ckpt and .pt files are Python pickle formats that cannot be safely parsed in C++ without the full PyTorch library. These files will return "Unknown" architecture. Consider converting them to safetensors format for detection support.

### Detected Architectures

- **SD 1.5** - Stable Diffusion 1.x (768-dim text encoder)
- **SD 2.1** - Stable Diffusion 2.x (1024-dim text encoder)
- **SDXL Base** - Stable Diffusion XL Base
- **SDXL Refiner** - Stable Diffusion XL Refiner
- **Flux Schnell** - Flux fast generation (4 steps)
- **Flux Dev** - Flux development (20 steps)
- **SD3** - Stable Diffusion 3
- **Qwen2-VL** - Vision-language model

## How It Works

### 1. File Header Parsing

The system reads only the **file headers** (not the entire model) to extract:

- **Safetensors**: JSON header containing tensor names, shapes, and metadata
- **GGUF**: Binary header with metadata key-value pairs and tensor information

### 2. Architecture Analysis

Model architecture is determined by analyzing:

- **Tensor names and patterns**
  - SDXL: Has `conditioner` and `text_encoder_2` tensors
  - Flux: Contains `double_blocks` and `single_blocks`
  - SD3: Contains `joint_blocks`
- **Tensor dimensions**
  - Text encoder output: 768 (SD1.x), 1024 (SD2.x), 1280 (SDXL)
  - UNet channels: ≤1280 (SD1.x/2.x), ≥2048 (SDXL)
- **Metadata hints**
  - `modelspec.architecture` field
  - `diffusion_steps` for Flux variants
### 3. Configuration Recommendations

For each detected architecture, the system provides:

#### SD 1.5

```
Recommended VAE: vae-ft-mse-840000-ema-pruned.safetensors
Resolution: 512x512
Steps: 20
CFG Scale: 7.5
Sampler: euler_a
TAESD: Supported
```

#### SD 2.1

```
Recommended VAE: vae-ft-ema-560000.safetensors
Resolution: 768x768
Steps: 25
CFG Scale: 7.0
Sampler: euler_a
TAESD: Supported
```

#### SDXL Base/Refiner

```
Recommended VAE: sdxl_vae.safetensors
Resolution: 1024x1024
Steps: 30
CFG Scale: 7.0
Sampler: dpm++2m
TAESD: Supported
Has Conditioner: true
```

#### Flux Schnell

```
Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 4
CFG Scale: 1.0
Sampler: euler
```

#### Flux Dev

```
Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 20
CFG Scale: 1.0
Sampler: euler
```

## Usage Example

```cpp
#include <iostream>
#include <string>

#include "model_detector.h"

// Detect model architecture
std::string modelPath = "/data/SD_MODELS/checkpoints/SDXL/myModel.safetensors";
ModelDetectionResult result = ModelDetector::detectModel(modelPath);

// Check detected architecture
std::cout << "Architecture: " << result.architectureName << std::endl;
std::cout << "Text Encoder Dim: " << result.textEncoderDim << std::endl;
std::cout << "UNet Channels: " << result.unetChannels << std::endl;

// Get VAE recommendation
if (result.needsVAE) {
    std::cout << "Recommended VAE: " << result.recommendedVAE << std::endl;
}

// Get loading parameters
for (const auto& [param, value] : result.suggestedParams) {
    std::cout << param << ": " << value << std::endl;
}

// Access metadata
for (const auto& [key, value] : result.metadata) {
    std::cout << "Metadata " << key << ": " << value << std::endl;
}
```

## Integration with Model Manager

You can integrate this into the ModelManager to:

1. **Auto-detect model types during scanning**

   ```cpp
   auto detection = ModelDetector::detectModel(filePath);
   modelInfo.architecture = detection.architectureName;
   modelInfo.recommendedVAE = detection.recommendedVAE;
   ```

2.
**Validate model-VAE compatibility**

   ```cpp
   if (checkpoint.architecture == "SDXL" && vae.name != "sdxl_vae") {
       // Warn the user about potential issues
   }
   ```

3. **Auto-configure generation parameters**

   ```cpp
   auto params = ModelDetector::getRecommendedParams(architecture);
   request.width  = std::stoi(params["width"]);
   request.height = std::stoi(params["height"]);
   request.steps  = std::stoi(params["steps"]);
   ```

4. **Provide UI hints**
   - Show recommended settings in the WebUI
   - Display model architecture badges
   - Suggest appropriate VAE models

## Performance

- **Fast**: Only reads file headers (typically < 1 MB)
- **Safe**: No model loading or GPU usage
- **Reliable**: Works with both quantized (GGUF) and full-precision models

## Limitations

1. **PyTorch pickle files (.ckpt, .pt)**: Cannot be parsed without the PyTorch library; these will return "Unknown" architecture.
   - **Solution**: Convert to safetensors using Python: `from safetensors.torch import save_file`
2. **Detection accuracy**: Some custom fine-tunes may be misidentified if they have unusual tensor structures.
3. **Custom architectures**: New or experimental architectures will be marked as "Unknown".

## Future Enhancements

- Support for LoRA architecture detection
- ControlNet variant detection (canny, depth, openpose, etc.)
- Embedding type classification
- Model quality/version detection from metadata
- Custom VAE detection and matching

## Technical Details

### File Format Specifications

**Safetensors Format:**

```
[8 bytes: header length (uint64 LE)]
[N bytes: JSON header with tensor info]
[M bytes: tensor data (not read)]
```

**GGUF Format:**

```
[4 bytes: magic "GGUF"]
[4 bytes: version (uint32)]
[8 bytes: tensor count (uint64)]
[8 bytes: metadata KV count (uint64)]
[metadata key-value pairs]
[tensor information]
[tensor data (not read)]
```

### Detection Heuristics

The system uses multiple heuristics for robust detection:

1. **Primary**: Explicit metadata fields
2. **Secondary**: Tensor naming patterns
3.
**Tertiary**: Tensor dimension analysis
4. **Fallback**: File size and structure

This multi-layered approach keeps detection accurate even for models with incomplete metadata.