# Model Architecture Detection System

This project includes a custom model detection system that analyzes model files to determine their architecture without modifying the stable-diffusion.cpp library.

## Features

### Supported Model Formats

- ✅ **Safetensors** (.safetensors) - Fully supported
- ✅ **GGUF** (.gguf) - Quantized models, fully supported
- ❌ **PyTorch Checkpoint** (.ckpt, .pt) - NOT supported (requires the PyTorch library to parse safely)

**Note**: .ckpt and .pt files are Python pickle formats that cannot be safely parsed in C++ without the full PyTorch library. These files will return "Unknown" architecture. Consider converting them to safetensors format for detection support.

### Detected Architectures

- **SD 1.5** - Stable Diffusion 1.x (768-dim text encoder)
- **SD 2.1** - Stable Diffusion 2.x (1024-dim text encoder)
- **SDXL Base** - Stable Diffusion XL Base
- **SDXL Refiner** - Stable Diffusion XL Refiner
- **Flux Schnell** - Flux fast generation (4 steps)
- **Flux Dev** - Flux development (20 steps)
- **SD3** - Stable Diffusion 3
- **Qwen2-VL** - Vision-language model

## How It Works

### 1. File Header Parsing

The system reads only the **file headers** (not the entire model) to extract:

- **Safetensors**: JSON header containing tensor names, shapes, and metadata
- **GGUF**: Binary header with metadata key-value pairs and tensor information

### 2. Architecture Analysis

Model architecture is determined by analyzing:

- **Tensor names and patterns**
  - SDXL: Has `conditioner` and `text_encoder_2` tensors
  - Flux: Contains `double_blocks` and `single_blocks`
  - SD3: Contains `joint_blocks`
- **Tensor dimensions**
  - Text encoder output: 768 (SD1.x), 1024 (SD2.x), 1280 (SDXL)
  - UNet channels: ≤1280 (SD1.x/2.x), ≥2048 (SDXL)
- **Metadata hints**
  - `modelspec.architecture` field
  - `diffusion_steps` for Flux variants
### 3. Configuration Recommendations

For each detected architecture, the system provides:

#### SD 1.5

```
Recommended VAE: vae-ft-mse-840000-ema-pruned.safetensors
Resolution: 512x512
Steps: 20
CFG Scale: 7.5
Sampler: euler_a
TAESD: Supported
```

#### SD 2.1

```
Recommended VAE: vae-ft-ema-560000.safetensors
Resolution: 768x768
Steps: 25
CFG Scale: 7.0
Sampler: euler_a
TAESD: Supported
```

#### SDXL Base/Refiner

```
Recommended VAE: sdxl_vae.safetensors
Resolution: 1024x1024
Steps: 30
CFG Scale: 7.0
Sampler: dpm++2m
TAESD: Supported
Has Conditioner: true
```

#### Flux Schnell

```
Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 4
CFG Scale: 1.0
Sampler: euler
```

#### Flux Dev

```
Recommended VAE: ae.safetensors
Resolution: 1024x1024
Steps: 20
CFG Scale: 1.0
Sampler: euler
```

## Usage Example

```cpp
#include <iostream>
#include <string>

#include "model_detector.h"

// Detect model architecture
std::string modelPath = "/data/SD_MODELS/checkpoints/SDXL/myModel.safetensors";
ModelDetectionResult result = ModelDetector::detectModel(modelPath);

// Check detected architecture
std::cout << "Architecture: " << result.architectureName << std::endl;
std::cout << "Text Encoder Dim: " << result.textEncoderDim << std::endl;
std::cout << "UNet Channels: " << result.unetChannels << std::endl;

// Get VAE recommendation
if (result.needsVAE) {
    std::cout << "Recommended VAE: " << result.recommendedVAE << std::endl;
}

// Get loading parameters
for (const auto& [param, value] : result.suggestedParams) {
    std::cout << param << ": " << value << std::endl;
}

// Access metadata
for (const auto& [key, value] : result.metadata) {
    std::cout << "Metadata " << key << ": " << value << std::endl;
}
```

## Integration with Model Manager

You can integrate this into the ModelManager to:

1. **Auto-detect model types during scanning**

   ```cpp
   auto detection = ModelDetector::detectModel(filePath);
   modelInfo.architecture = detection.architectureName;
   modelInfo.recommendedVAE = detection.recommendedVAE;
   ```

2.
**Validate model-VAE compatibility**

   ```cpp
   if (checkpoint.architecture == "SDXL" && vae.name != "sdxl_vae") {
       // Warn the user about potential issues
   }
   ```

3. **Auto-configure generation parameters**

   ```cpp
   auto params = ModelDetector::getRecommendedParams(architecture);
   request.width  = std::stoi(params["width"]);
   request.height = std::stoi(params["height"]);
   request.steps  = std::stoi(params["steps"]);
   ```

4. **Provide UI hints**
   - Show recommended settings in the WebUI
   - Display model architecture badges
   - Suggest appropriate VAE models

## Performance

- **Fast**: Only reads file headers (typically < 1 MB)
- **Safe**: No model loading or GPU usage
- **Reliable**: Works with both quantized (GGUF) and full-precision models

## Limitations

1. **PyTorch pickle files (.ckpt, .pt)**: Cannot be parsed without the PyTorch library; these will return "Unknown" architecture.
   - **Solution**: Convert to safetensors using Python: `from safetensors.torch import save_file`
2. **Detection accuracy**: Some custom fine-tunes may be misidentified if they have unusual tensor structures.
3. **Custom architectures**: New or experimental architectures will be marked as "Unknown".

## Future Enhancements

- Support for LoRA architecture detection
- ControlNet variant detection (canny, depth, openpose, etc.)
- Embedding type classification
- Model quality/version detection from metadata
- Custom VAE detection and matching

## Technical Details

### File Format Specifications

**Safetensors Format:**

```
[8 bytes: header length (uint64 LE)]
[N bytes: JSON header with tensor info]
[M bytes: tensor data (not read)]
```

**GGUF Format:**

```
[4 bytes: magic "GGUF"]
[4 bytes: version (uint32)]
[8 bytes: tensor count (uint64)]
[8 bytes: metadata KV count (uint64)]
[metadata key-value pairs]
[tensor information]
[tensor data (not read)]
```

### Detection Heuristics

The system uses multiple heuristics for robust detection:

1. **Primary**: Explicit metadata fields
2. **Secondary**: Tensor naming patterns
3.
**Tertiary**: Tensor dimension analysis
4. **Fallback**: File size and structure

This multi-layered approach keeps detection accurate even for models with incomplete metadata.