MODEL_DETECTION.md

Model Architecture Detection System

This project includes a custom model detection system that analyzes model files to determine their architecture without modifying the stable-diffusion.cpp library.

Features

Supported Model Formats

  • Safetensors (.safetensors) - Fully supported
  • GGUF (.gguf) - Quantized models, fully supported
  • PyTorch Checkpoint (.ckpt, .pt) - NOT supported (requires PyTorch library to parse safely)

Note: .ckpt and .pt files are Python pickle formats that cannot be safely parsed in C++ without the full PyTorch library. These files will return "Unknown" architecture. Consider converting them to safetensors format for detection support.

Detected Architectures

  • SD 1.5 - Stable Diffusion 1.x (768-dim text encoder)
  • SD 2.1 - Stable Diffusion 2.x (1024-dim text encoder)
  • SDXL Base - Stable Diffusion XL Base
  • SDXL Refiner - Stable Diffusion XL Refiner
  • Flux Schnell - Flux fast generation (4 steps)
  • Flux Dev - Flux development (20 steps)
  • SD3 - Stable Diffusion 3
  • Qwen2-VL - Vision-language model

How It Works

1. File Header Parsing

The system reads only the file headers (not the entire model) to extract:

  • Safetensors: JSON header containing tensor names, shapes, and metadata
  • GGUF: Binary header with metadata key-value pairs and tensor information
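
As a minimal sketch (not the project's actual ModelDetector internals), reading a safetensors header amounts to reading an 8-byte little-endian length and then that many bytes of JSON:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Sketch: read only the safetensors header from a stream -- an 8-byte
// little-endian length followed by a JSON blob describing every tensor.
// Assumes a little-endian host; returns "" on malformed input.
std::string readSafetensorsHeader(std::istream& in) {
    uint64_t headerLen = 0;
    in.read(reinterpret_cast<char*>(&headerLen), sizeof(headerLen));
    // Sanity cap (100 MB) guards against corrupt or hostile files.
    if (!in || headerLen == 0 || headerLen > 100ull * 1024 * 1024) return "";

    std::string json(headerLen, '\0');
    in.read(&json[0], static_cast<std::streamsize>(headerLen));
    return in ? json : "";
}
```

A std::ifstream opened in binary mode can be passed directly; the tensor data after the header is never touched.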

2. Architecture Analysis

Model architecture is determined by analyzing:

  • Tensor names and patterns
    • SDXL: Has conditioner and text_encoder_2 tensors
    • Flux: Contains double_blocks and single_blocks
    • SD3: Contains joint_blocks
  • Tensor dimensions
    • Text encoder output: 768 (SD1.x), 1024 (SD2.x), 1280 (SDXL)
    • UNet channels: ≤1280 (SD1.x/2.x), ≥2048 (SDXL)
  • Metadata hints
    • modelspec.architecture field
    • diffusion_steps for Flux variants
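
The naming and dimension rules above can be sketched as a small classifier (illustrative only; the real detector works on the parsed header, and the exact patterns may differ):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the layered name/dimension heuristics described above.
std::string classifyByTensors(const std::vector<std::string>& tensorNames,
                              int textEncoderDim) {
    auto contains = [&](const char* needle) {
        for (const auto& name : tensorNames)
            if (name.find(needle) != std::string::npos) return true;
        return false;
    };

    // Name patterns are the strongest signal.
    if (contains("double_blocks") || contains("single_blocks")) return "Flux";
    if (contains("joint_blocks"))                               return "SD3";
    if (contains("conditioner") || contains("text_encoder_2"))  return "SDXL";

    // Otherwise fall back to the text-encoder width.
    if (textEncoderDim == 768)  return "SD 1.5";
    if (textEncoderDim == 1024) return "SD 2.1";
    return "Unknown";
}
```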

3. Configuration Recommendations

For each detected architecture, the system provides:

SD 1.5

Recommended VAE: vae-ft-mse-840000-ema-pruned.safetensors
Resolution: 512x512
Steps: 20
CFG Scale: 7.5
Sampler: euler_a
TAESD: Supported

SD 2.1

Recommended VAE: vae-ft-ema-560000.safetensors
Resolution: 768x768
Steps: 25
CFG Scale: 7.0
Sampler: euler_a
TAESD: Supported

SDXL Base/Refiner

Recommended VAE: sdxl_vae.safetensors
Resolution: 1024x1024
Steps: 30
CFG Scale: 7.0
Sampler: dpm++2m
TAESD: Supported
Has Conditioner: true

Flux Schnell

Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 4
CFG Scale: 1.0
Sampler: euler

Flux Dev

Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 20
CFG Scale: 1.0
Sampler: euler
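
One way to store these tables is a simple architecture-keyed map; the struct, field, and function names below are hypothetical, not the actual ModelDetector API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical defaults table mirroring the recommendations listed above.
struct GenDefaults {
    int width;
    int height;
    int steps;
    double cfgScale;
    std::string sampler;
};

const std::map<std::string, GenDefaults>& recommendedDefaults() {
    static const std::map<std::string, GenDefaults> table = {
        {"SD 1.5",       {512,  512,  20, 7.5, "euler_a"}},
        {"SD 2.1",       {768,  768,  25, 7.0, "euler_a"}},
        {"SDXL",         {1024, 1024, 30, 7.0, "dpm++2m"}},
        {"Flux Schnell", {1024, 1024, 4,  1.0, "euler"}},
        {"Flux Dev",     {1024, 1024, 20, 1.0, "euler"}},
    };
    return table;
}
```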

Usage Example

#include <iostream>
#include <string>

#include "model_detector.h"

// Detect model architecture
std::string modelPath = "/data/SD_MODELS/checkpoints/SDXL/myModel.safetensors";
ModelDetectionResult result = ModelDetector::detectModel(modelPath);

// Check detected architecture
std::cout << "Architecture: " << result.architectureName << std::endl;
std::cout << "Text Encoder Dim: " << result.textEncoderDim << std::endl;
std::cout << "UNet Channels: " << result.unetChannels << std::endl;

// Get VAE recommendation
if (result.needsVAE) {
    std::cout << "Recommended VAE: " << result.recommendedVAE << std::endl;
}

// Get loading parameters
for (const auto& [param, value] : result.suggestedParams) {
    std::cout << param << ": " << value << std::endl;
}

// Access metadata
for (const auto& [key, value] : result.metadata) {
    std::cout << "Metadata " << key << ": " << value << std::endl;
}

Integration with Model Manager

You can integrate this into the ModelManager to:

  1. Auto-detect model types during scanning

    auto detection = ModelDetector::detectModel(filePath);
    modelInfo.architecture = detection.architectureName;
    modelInfo.recommendedVAE = detection.recommendedVAE;
    
  2. Validate model-VAE compatibility

    if (checkpoint.architecture == "SDXL" && vae.name != "sdxl_vae") {
       // Warn user about potential issues
    }
    
  3. Auto-configure generation parameters

    auto params = ModelDetector::getRecommendedParams(architecture);
    request.width = std::stoi(params["width"]);
    request.height = std::stoi(params["height"]);
    request.steps = std::stoi(params["steps"]);
    
  4. Provide UI hints

    • Show recommended settings in WebUI
    • Display model architecture badges
    • Suggest appropriate VAE models

Performance

  • Fast: Only reads file headers (typically < 1MB)
  • Safe: No model loading or GPU usage
  • Reliable: Works with both quantized (GGUF) and full-precision models

Limitations

  1. PyTorch pickle files (.ckpt, .pt): Cannot be parsed without PyTorch library. These will return "Unknown" architecture.
    • Solution: Convert to safetensors in Python: load the checkpoint with torch.load, then write it out with safetensors.torch.save_file
  2. Detection accuracy: Some custom fine-tunes may be misidentified if they have unusual tensor structures
  3. Custom architectures: New or experimental architectures will be marked as "Unknown"

Future Enhancements

  • Support for LoRA architecture detection
  • ControlNet variant detection (canny, depth, openpose, etc.)
  • Embedding type classification
  • Model quality/version detection from metadata
  • Custom VAE detection and matching

Technical Details

File Format Specifications

Safetensors Format:

[8 bytes: header length (uint64 LE)]
[N bytes: JSON header with tensor info]
[M bytes: tensor data (not read)]

GGUF Format:

[4 bytes: magic "GGUF"]
[4 bytes: version (uint32)]
[8 bytes: tensor count (uint64)]
[8 bytes: metadata KV count (uint64)]
[metadata key-value pairs]
[tensor information]
[tensor data (not read)]
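
A sketch of parsing just the fixed-size preamble laid out above (the variable-length metadata KV pairs that follow are more involved and omitted here):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>

// Sketch: parse only the fixed-size GGUF preamble. All integers are
// little-endian; this sketch assumes a little-endian host.
struct GgufPreamble {
    uint32_t version     = 0;
    uint64_t tensorCount = 0;
    uint64_t kvCount     = 0;
    bool     valid       = false;
};

GgufPreamble readGgufPreamble(std::istream& in) {
    GgufPreamble p;
    char magic[4] = {};
    in.read(magic, sizeof(magic));
    if (!in || std::memcmp(magic, "GGUF", 4) != 0) return p;  // wrong magic

    in.read(reinterpret_cast<char*>(&p.version), sizeof(p.version));
    in.read(reinterpret_cast<char*>(&p.tensorCount), sizeof(p.tensorCount));
    in.read(reinterpret_cast<char*>(&p.kvCount), sizeof(p.kvCount));
    p.valid = static_cast<bool>(in);
    return p;
}
```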

Detection Heuristics

The system uses multiple heuristics for robust detection:

  1. Primary: Explicit metadata fields
  2. Secondary: Tensor naming patterns
  3. Tertiary: Tensor dimension analysis
  4. Fallback: File size and structure
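
The layering can be expressed as trying each heuristic in priority order and keeping the first confident answer (a sketch; the real control flow may differ):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Run heuristics from strongest to weakest; the first non-"Unknown"
// result wins, so explicit metadata always overrides weaker signals.
std::string detectWithFallbacks(
        const std::vector<std::function<std::string()>>& heuristics) {
    for (const auto& heuristic : heuristics) {
        std::string arch = heuristic();
        if (arch != "Unknown") return arch;
    }
    return "Unknown";
}
```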

This multi-layered approach keeps detection accurate even for models with incomplete metadata, since weaker signals only apply when stronger ones are absent.