MODEL_DETECTION.md

Model Architecture Detection System

This project includes a custom model detection system that analyzes model files to determine their architecture without modifying the stable-diffusion.cpp library.

Features

Supported Model Formats

  • Safetensors (.safetensors) - Fully supported
  • GGUF (.gguf) - Quantized models, fully supported
  • PyTorch Checkpoint (.ckpt, .pt) - NOT supported (requires PyTorch library to parse safely)

Note: .ckpt and .pt files are Python pickle formats that cannot be safely parsed in C++ without the full PyTorch library. These files will return "Unknown" architecture. Consider converting them to safetensors format for detection support.

Detected Architectures

  • SD 1.5 - Stable Diffusion 1.x (768-dim text encoder)
  • SD 2.1 - Stable Diffusion 2.x (1024-dim text encoder)
  • SDXL Base - Stable Diffusion XL Base
  • SDXL Refiner - Stable Diffusion XL Refiner
  • Flux Schnell - Flux fast generation (4 steps)
  • Flux Dev - Flux development (20 steps)
  • SD3 - Stable Diffusion 3
  • Qwen2-VL - Vision-language model

How It Works

1. File Header Parsing

The system reads only the file headers (not the entire model) to extract:

  • Safetensors: JSON header containing tensor names, shapes, and metadata
  • GGUF: Binary header with metadata key-value pairs and tensor information
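
As a minimal sketch (not the project's actual ModelDetector internals), reading a safetensors header amounts to reading an 8-byte little-endian length and then that many bytes of JSON:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Sketch: read only the safetensors header from a stream -- an 8-byte
// little-endian length followed by a JSON blob describing every tensor.
// Assumes a little-endian host; returns "" on malformed input.
std::string readSafetensorsHeader(std::istream& in) {
    uint64_t headerLen = 0;
    in.read(reinterpret_cast<char*>(&headerLen), sizeof(headerLen));
    // Sanity cap (100 MB) guards against corrupt or hostile files.
    if (!in || headerLen == 0 || headerLen > 100ull * 1024 * 1024) return "";

    std::string json(headerLen, '\0');
    in.read(&json[0], static_cast<std::streamsize>(headerLen));
    return in ? json : "";
}
```

A std::ifstream opened in binary mode can be passed directly; the tensor data after the header is never touched.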

2. Architecture Analysis

Model architecture is determined by analyzing:

  • Tensor names and patterns
    • SDXL: Has conditioner and text_encoder_2 tensors
    • Flux: Contains double_blocks and single_blocks
    • SD3: Contains joint_blocks
  • Tensor dimensions
    • Text encoder output: 768 (SD1.x), 1024 (SD2.x), 1280 (SDXL)
    • UNet channels: ≤1280 (SD1.x/2.x), ≥2048 (SDXL)
  • Metadata hints
    • modelspec.architecture field
    • diffusion_steps for Flux variants
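
The naming and dimension rules above can be sketched as a small classifier (illustrative only; the real detector works on the parsed header, and the exact patterns may differ):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of the layered name/dimension heuristics described above.
std::string classifyByTensors(const std::vector<std::string>& tensorNames,
                              int textEncoderDim) {
    auto contains = [&](const char* needle) {
        for (const auto& name : tensorNames)
            if (name.find(needle) != std::string::npos) return true;
        return false;
    };

    // Name patterns are the strongest signal.
    if (contains("double_blocks") || contains("single_blocks")) return "Flux";
    if (contains("joint_blocks"))                               return "SD3";
    if (contains("conditioner") || contains("text_encoder_2"))  return "SDXL";

    // Otherwise fall back to the text-encoder width.
    if (textEncoderDim == 768)  return "SD 1.5";
    if (textEncoderDim == 1024) return "SD 2.1";
    return "Unknown";
}
```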

3. Configuration Recommendations

For each detected architecture, the system provides:

SD 1.5

Recommended VAE: vae-ft-mse-840000-ema-pruned.safetensors
Resolution: 512x512
Steps: 20
CFG Scale: 7.5
Sampler: euler_a
TAESD: Supported

SD 2.1

Recommended VAE: vae-ft-ema-560000.safetensors
Resolution: 768x768
Steps: 25
CFG Scale: 7.0
Sampler: euler_a
TAESD: Supported

SDXL Base/Refiner

Recommended VAE: sdxl_vae.safetensors
Resolution: 1024x1024
Steps: 30
CFG Scale: 7.0
Sampler: dpm++2m
TAESD: Supported
Has Conditioner: true

Flux Schnell

Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 4
CFG Scale: 1.0
Sampler: euler

Flux Dev

Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 20
CFG Scale: 1.0
Sampler: euler
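
One way to store these tables is a simple architecture-keyed map; the struct, field, and function names below are hypothetical, not the actual ModelDetector API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical defaults table mirroring the recommendations listed above.
struct GenDefaults {
    int width;
    int height;
    int steps;
    double cfgScale;
    std::string sampler;
};

const std::map<std::string, GenDefaults>& recommendedDefaults() {
    static const std::map<std::string, GenDefaults> table = {
        {"SD 1.5",       {512,  512,  20, 7.5, "euler_a"}},
        {"SD 2.1",       {768,  768,  25, 7.0, "euler_a"}},
        {"SDXL",         {1024, 1024, 30, 7.0, "dpm++2m"}},
        {"Flux Schnell", {1024, 1024, 4,  1.0, "euler"}},
        {"Flux Dev",     {1024, 1024, 20, 1.0, "euler"}},
    };
    return table;
}
```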

Usage Example

#include <iostream>
#include <string>

#include "model_detector.h"

// Detect model architecture
std::string modelPath = "/data/SD_MODELS/checkpoints/SDXL/myModel.safetensors";
ModelDetectionResult result = ModelDetector::detectModel(modelPath);

// Check detected architecture
std::cout << "Architecture: " << result.architectureName << std::endl;
std::cout << "Text Encoder Dim: " << result.textEncoderDim << std::endl;
std::cout << "UNet Channels: " << result.unetChannels << std::endl;

// Get VAE recommendation
if (result.needsVAE) {
    std::cout << "Recommended VAE: " << result.recommendedVAE << std::endl;
}

// Get loading parameters
for (const auto& [param, value] : result.suggestedParams) {
    std::cout << param << ": " << value << std::endl;
}

// Access metadata
for (const auto& [key, value] : result.metadata) {
    std::cout << "Metadata " << key << ": " << value << std::endl;
}

Integration with Model Manager

You can integrate this into the ModelManager to:

  1. Auto-detect model types during scanning

    auto detection = ModelDetector::detectModel(filePath);
    modelInfo.architecture = detection.architectureName;
    modelInfo.recommendedVAE = detection.recommendedVAE;
    
  2. Validate model-VAE compatibility

    if (checkpoint.architecture == "SDXL" && vae.name != "sdxl_vae") {
       // Warn user about potential issues
    }
    
  3. Auto-configure generation parameters

    auto params = ModelDetector::getRecommendedParams(architecture);
    request.width = std::stoi(params["width"]);
    request.height = std::stoi(params["height"]);
    request.steps = std::stoi(params["steps"]);
    
  4. Provide UI hints

    • Show recommended settings in WebUI
    • Display model architecture badges
    • Suggest appropriate VAE models

Performance

  • Fast: Only reads file headers (typically < 1MB)
  • Safe: No model loading or GPU usage
  • Reliable: Works with both quantized (GGUF) and full-precision models

Limitations

  1. PyTorch pickle files (.ckpt, .pt): Cannot be parsed without PyTorch library. These will return "Unknown" architecture.
    • Solution: Convert to safetensors in Python: load the checkpoint with torch.load, then write it out with safetensors.torch.save_file
  2. Detection accuracy: Some custom fine-tunes may be misidentified if they have unusual tensor structures
  3. Custom architectures: New or experimental architectures will be marked as "Unknown"

Future Enhancements

  • Support for LoRA architecture detection
  • ControlNet variant detection (canny, depth, openpose, etc.)
  • Embedding type classification
  • Model quality/version detection from metadata
  • Custom VAE detection and matching

Technical Details

File Format Specifications

Safetensors Format:

[8 bytes: header length (uint64 LE)]
[N bytes: JSON header with tensor info]
[M bytes: tensor data (not read)]

GGUF Format:

[4 bytes: magic "GGUF"]
[4 bytes: version (uint32)]
[8 bytes: tensor count (uint64)]
[8 bytes: metadata KV count (uint64)]
[metadata key-value pairs]
[tensor information]
[tensor data (not read)]
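
A sketch of parsing just the fixed-size preamble laid out above (the variable-length metadata KV pairs that follow are more involved and omitted here):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>

// Sketch: parse only the fixed-size GGUF preamble. All integers are
// little-endian; this sketch assumes a little-endian host.
struct GgufPreamble {
    uint32_t version     = 0;
    uint64_t tensorCount = 0;
    uint64_t kvCount     = 0;
    bool     valid       = false;
};

GgufPreamble readGgufPreamble(std::istream& in) {
    GgufPreamble p;
    char magic[4] = {};
    in.read(magic, sizeof(magic));
    if (!in || std::memcmp(magic, "GGUF", 4) != 0) return p;  // wrong magic

    in.read(reinterpret_cast<char*>(&p.version), sizeof(p.version));
    in.read(reinterpret_cast<char*>(&p.tensorCount), sizeof(p.tensorCount));
    in.read(reinterpret_cast<char*>(&p.kvCount), sizeof(p.kvCount));
    p.valid = static_cast<bool>(in);
    return p;
}
```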

Detection Heuristics

The system uses multiple heuristics for robust detection:

  1. Primary: Explicit metadata fields
  2. Secondary: Tensor naming patterns
  3. Tertiary: Tensor dimension analysis
  4. Fallback: File size and structure
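
The layering can be expressed as trying each heuristic in priority order and keeping the first confident answer (a sketch; the real control flow may differ):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Run heuristics from strongest to weakest; the first non-"Unknown"
// result wins, so explicit metadata always overrides weaker signals.
std::string detectWithFallbacks(
        const std::vector<std::function<std::string()>>& heuristics) {
    for (const auto& heuristic : heuristics) {
        std::string arch = heuristic();
        if (arch != "Unknown") return arch;
    }
    return "Unknown";
}
```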

This multi-layered approach keeps detection accurate even for models with incomplete metadata, since weaker signals only apply when stronger ones are absent.