# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a C++ REST API server that wraps the [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) library, providing HTTP endpoints for Stable Diffusion image generation. The server is built with a modular architecture featuring three main components: HTTP Server, Generation Queue, and Model Manager.

## Build Commands

### Initial Setup and Build
```bash
# Create build directory and configure
mkdir build && cd build
cmake ..

# Build the project (parallel build)
cmake --build . --parallel

# Install (optional)
cmake --install .
```

### Build Configuration Options
```bash
# Build with CUDA support (default: ON)
cmake -DSD_CUDA_SUPPORT=ON ..

# Build without CUDA
cmake -DSD_CUDA_SUPPORT=OFF ..

# Debug build
cmake -DCMAKE_BUILD_TYPE=Debug ..

# Release build (default)
cmake -DCMAKE_BUILD_TYPE=Release ..
```

### Clean and Rebuild
```bash
# Clean build artifacts
cd build
cmake --build . --target clean

# Or, from the repository root, delete the build directory entirely
rm -rf build
mkdir build && cd build
cmake ..
cmake --build . --parallel
```

### Running the Server

**Required Parameters:**
Both `--models-dir` and `--checkpoints` are required.

```bash
# Basic usage with required parameters
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints

# The above resolves checkpoints to: /path/to/models/checkpoints

# Using absolute path for checkpoints
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints /absolute/path/to/checkpoints

# With custom port and host
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --host 0.0.0.0 --port 8080

# With verbose logging
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --verbose

# With optional model directories (relative paths)
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --lora-dir lora --vae-dir vae

# With optional model directories (absolute paths)
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --lora-dir /other/lora --vae-dir /other/vae
```

**Path Resolution Logic:**
- If a directory parameter is an absolute path, it's used as-is
- If a directory parameter is a relative path, it's resolved relative to `--models-dir`
- Example: `--models-dir /data/models --checkpoints checkpoints` → `/data/models/checkpoints`
- Example: `--models-dir /data/models --checkpoints /other/checkpoints` → `/other/checkpoints`
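
The resolution rule above can be sketched with `std::filesystem`. This is a minimal illustration on a POSIX system; the function name `resolveDirectorySketch` and its signature are assumptions for this example and may not match the repository's actual helper.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper illustrating the rule; the real implementation may differ.
std::string resolveDirectorySketch(const std::string& modelsDir,
                                   const std::string& dir) {
    const fs::path candidate(dir);
    // Absolute paths are used as-is.
    if (candidate.is_absolute()) {
        return candidate.string();
    }
    // Relative paths are resolved against the base models directory.
    return (fs::path(modelsDir) / candidate).string();
}
```

With these assumptions, `resolveDirectorySketch("/data/models", "checkpoints")` yields `/data/models/checkpoints`, while an absolute second argument is returned unchanged.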

## Architecture

### Three-Component Design

1. **HTTP Server** (`src/server.cpp`, `include/server.h`)
   - Uses cpp-httplib for HTTP handling
   - Runs in a separate thread from generation
   - Handles request validation and response formatting
   - All endpoints are registered in `registerEndpoints()`
   - CORS is configured in `setupCORS()`

2. **Generation Queue** (`src/generation_queue.cpp`, `include/generation_queue.h`)
   - Thread-safe queue for managing generation requests
   - Uses Pimpl idiom (implementation hidden in .cpp)
   - Processes jobs sequentially (one at a time by default)
   - Provides job tracking via `JobInfo` structures
   - Main methods: `enqueueRequest()`, `getQueueStatus()`, `cancelJob()`

3. **Model Manager** (`src/model_manager.cpp`, `include/model_manager.h`)
   - Handles loading/unloading of different model types
   - Uses Pimpl idiom for implementation hiding
   - All model directories are explicitly configured
   - Supports path resolution: absolute paths used as-is, relative paths resolved from base models directory
   - Thread-safe, using `std::shared_mutex` to allow concurrent reads
   - Model scanning is cancellable via `cancelScan()`

### Threading Architecture

- **Main thread**: Initialization, signal handling, coordination
- **Server thread**: HTTP request handling (in `Server::serverThreadFunction()`)
- **Queue worker threads**: Generation processing (managed by GenerationQueue)
- Signal handler sets global `g_running` flag for graceful shutdown

### Model Type System

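Model types are bit flags, so several values can be combined into one mask (with an `enum class`, combining requires explicit casts or overloaded bitwise operators):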
```cpp
enum class ModelType {
    LORA = 1, CHECKPOINT = 2, VAE = 4, PRESETS = 8,
    PROMPTS = 16, NEG_PROMPTS = 32, TAESD = 64,
    ESRGAN = 128, CONTROLNET = 256, UPSCALER = 512,
    EMBEDDING = 1024
};
```
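
Because an `enum class` does not provide bitwise operators by default, combining flags needs a small amount of glue. The sketch below shows one common way to do it; the reduced enum and the `hasType()` helper are assumptions for illustration, and the repository may handle this differently.

```cpp
#include <type_traits>

// Reduced copy of the enum so the sketch compiles on its own.
enum class ModelType { LORA = 1, CHECKPOINT = 2, VAE = 4 };

// Combine two flags into one mask.
constexpr ModelType operator|(ModelType a, ModelType b) {
    using U = std::underlying_type_t<ModelType>;
    return static_cast<ModelType>(static_cast<U>(a) | static_cast<U>(b));
}

// Hypothetical helper: true if `mask` contains `flag`.
constexpr bool hasType(ModelType mask, ModelType flag) {
    using U = std::underlying_type_t<ModelType>;
    return (static_cast<U>(mask) & static_cast<U>(flag)) != 0;
}
```

A mask such as `ModelType::LORA | ModelType::VAE` can then be tested with `hasType(mask, ModelType::VAE)`.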

Supported file extensions by type:
- LORA, CHECKPOINT, VAE, TAESD, CONTROLNET, EMBEDDING: `.safetensors`, `.pt`, `.ckpt`
- PRESETS: `.json`, `.yaml`, `.yml`
- PROMPTS, NEG_PROMPTS: `.txt`, `.json`
- ESRGAN, UPSCALER: `.pth`, `.pt`
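
An extension check along these lines could be used during scanning. This is only a sketch: the helper name `hasAnyExtension` is hypothetical, and the case-insensitive comparison is an assumption rather than documented behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: true if fileName ends in one of the allowed extensions.
// Extensions in `allowed` are expected in lowercase with a leading dot.
bool hasAnyExtension(const std::string& fileName,
                     const std::vector<std::string>& allowed) {
    std::string ext = std::filesystem::path(fileName).extension().string();
    // Assumption: matching ignores case, so ".SafeTensors" also matches.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return std::find(allowed.begin(), allowed.end(), ext) != allowed.end();
}
```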

## Key API Endpoints

### Generation Endpoints
- `POST /api/v1/generate` - General image generation
- `POST /api/v1/text2img` - Text-to-image generation
- `POST /api/v1/img2img` - Image-to-image generation
- `POST /api/v1/controlnet` - ControlNet generation

### Model Management
- `GET /api/v1/models` - List all available models
- `GET /api/v1/models/{type}` - List models by type
- `GET /api/v1/models/{id}` - Get model details
- `POST /api/v1/models/load` - Load a model
- `POST /api/v1/models/unload` - Unload a model
- `POST /api/v1/models/scan` - Rescan models directory

### Queue Management
- `GET /api/v1/queue` - Get queue status
- `GET /api/v1/jobs/{id}` - Get job status
- `DELETE /api/v1/jobs/{id}` - Cancel a job
- `DELETE /api/v1/queue` - Clear queue

### System Information
- `GET /api/v1/health` - Health check
- `GET /api/v1/status` - API status
- `GET /api/v1/system` - System capabilities

## Dependencies Management

Dependencies are managed in `cmake/FindDependencies.cmake` using CMake's FetchContent:

- **nlohmann/json** (v3.11.2) - JSON parsing/serialization
- **cpp-httplib** (v0.14.1) - HTTP server library
- **stable-diffusion.cpp** - Core SD library (via ExternalProject)
- **Threads** - POSIX threads
- **OpenMP** (optional) - Parallel processing
- **CUDA** (optional) - GPU acceleration

The stable-diffusion.cpp library is downloaded and built automatically via `ExternalProject_Add()` in the root CMakeLists.txt. The specific git tag is pinned to `master-334-d05e46c`.

## Important Implementation Details

### Pimpl Idiom Usage
Both `GenerationQueue` and `ModelManager` use the Pimpl (Pointer to Implementation) idiom:
```cpp
class GenerationQueue {
private:
    class Impl;
    std::unique_ptr<Impl> pImpl;
};
```
All implementation details are in the `.cpp` files, not headers. When modifying these classes, update the inner `Impl` class definition in the `.cpp` file.
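
For reference, a complete Pimpl class looks roughly like the sketch below. `QueueSketch` and its members are hypothetical and only illustrate the pattern; note that the destructor must be defined where `Impl` is a complete type, otherwise `std::unique_ptr<Impl>` will not compile.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// --- Header part: only the public interface is visible ---
class QueueSketch {
public:
    QueueSketch();
    ~QueueSketch();  // declared here, defined after Impl is complete
    void enqueue(const std::string& prompt);
    std::size_t size() const;

private:
    class Impl;
    std::unique_ptr<Impl> pImpl;
};

// --- .cpp part: implementation details can change without touching the header ---
class QueueSketch::Impl {
public:
    std::vector<std::string> pending;
};

QueueSketch::QueueSketch() : pImpl(std::make_unique<Impl>()) {}
QueueSketch::~QueueSketch() = default;

void QueueSketch::enqueue(const std::string& prompt) {
    pImpl->pending.push_back(prompt);
}

std::size_t QueueSketch::size() const {
    return pImpl->pending.size();
}
```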

### Signal Handling
- Global pointer `g_server` allows signal handler to trigger graceful shutdown
- The signal handler sets the atomic `g_running` flag and calls `g_server->stop()`
- Shutdown sequence: stop server → stop queue → wait for threads → cleanup
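
A minimal sketch of the flag-based part of this pattern is shown below. It is a conservative variation, not necessarily what the repository does: the handler only flips the atomic flag, and the main loop would be responsible for noticing the change and calling `stop()`. The names `handleShutdownSignal` and `installSignalHandlers` are assumptions for this example.

```cpp
#include <atomic>
#include <csignal>

// Global flag polled by the main loop; true while the server should keep running.
std::atomic<bool> g_running{true};

// Keep the handler tiny: storing to an atomic flag is all it does here.
extern "C" void handleShutdownSignal(int /*signum*/) {
    g_running.store(false);
}

// Hypothetical setup function: route SIGINT and SIGTERM to the handler.
void installSignalHandlers() {
    std::signal(SIGINT, handleShutdownSignal);
    std::signal(SIGTERM, handleShutdownSignal);
}
```

The main thread can then loop while `g_running.load()` is true and run the shutdown sequence once it becomes false.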

### Directory Configuration
The server requires explicit configuration of model directories:

**Required Parameters:**
- `--models-dir`: Base directory for models
- `--checkpoints`: Checkpoints directory

**Optional Parameters:**
- `--lora-dir`, `--vae-dir`, `--controlnet-dir`, etc.

**Path Resolution:**
- Absolute paths (e.g., `/absolute/path/to/models`) are used as-is
- Relative paths (e.g., `checkpoints`) are resolved relative to `--models-dir`
- The `resolveDirectoryPath()` function in `main.cpp` handles this logic

### Generation Parameters
The `GenerationRequest` structure in `generation_queue.h` mirrors the parameters of the stable-diffusion.cpp CLI, including:
- Basic: prompt, negative_prompt, width, height, steps, cfg_scale
- Sampling: sampling_method (EULER, EULER_A, HEUN, etc.), scheduler (DISCRETE, KARRAS, etc.)
- Advanced: clip_skip, strength, control_strength, skip_layers
- Performance: n_threads, offload_params_to_cpu, clip_on_cpu, vae_on_cpu, diffusion_flash_attn
- Model paths: vae_path, taesd_path, controlnet_path, lora_model_dir, embedding_dir

## Development Notes

### When Adding New Endpoints
1. Add handler method declaration in `include/server.h`
2. Implement handler in `src/server.cpp`
3. Register endpoint in `Server::registerEndpoints()`
4. Use helper methods `sendJsonResponse()` and `sendErrorResponse()` for consistent responses

### When Adding New Model Types
1. Add enum value to `ModelType` in `model_manager.h`
2. Update `modelTypeToString()` and `stringToModelType()` in `model_manager.cpp`
3. Add supported file extensions to model scanning logic
4. Update `ServerConfig` struct if a new directory parameter is needed

### When Modifying Generation Parameters
1. Update `GenerationRequest` struct in `generation_queue.h`
2. Update parameter validation in `Server::validateGenerationParameters()`
3. Update request parsing in generation endpoint handlers
4. Update `StableDiffusionWrapper` to pass new parameters to underlying library

### Thread Safety Considerations
- Model Manager uses `std::shared_mutex` - multiple readers OR single writer
- Generation Queue uses `std::mutex` and `std::condition_variable`
- Always use RAII locks (`std::lock_guard`, `std::unique_lock`, `std::shared_lock`)
- Atomic types are used for flags: `std::atomic<bool>` for `g_running` and `m_isRunning`
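
The reader/writer locking described above can be sketched as follows. `ModelRegistrySketch` is a hypothetical class for illustration only; it shows `std::shared_lock` for reads and `std::unique_lock` for writes on a `std::shared_mutex` (C++17).

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical registry: many concurrent readers OR one writer at a time.
class ModelRegistrySketch {
public:
    void set(const std::string& name, const std::string& path) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: writer
        models_[name] = path;
    }

    std::optional<std::string> find(const std::string& name) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: reader
        const auto it = models_.find(name);
        if (it == models_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
    std::map<std::string, std::string> models_;
};
```

Both locks are released automatically when they go out of scope, which is the RAII behavior recommended above.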

### External Project Integration
The stable-diffusion.cpp library is built as an external project. Include directories and libraries are configured via the `sd-cpp` interface target:
```cmake
target_link_libraries(stable-diffusion-rest-server PRIVATE
    sd-cpp
    ggml
    ggml-base
    ggml-cpu
    ${DEPENDENCY_LIBRARIES}
)
```

When accessing stable-diffusion.cpp APIs, include from the installed headers:
- `#include <stable-diffusion.h>`
- `#include <ggml.h>`

The wrapper class `StableDiffusionWrapper` (`stable_diffusion_wrapper.cpp/h`) encapsulates all interactions with the stable-diffusion.cpp library.

## Model Architecture Detection System

The server includes an automatic model architecture detection system that analyzes checkpoint files to determine their type and required auxiliary models.

### Supported Architectures

The system can detect the following architectures:

| Architecture | Required Files | Command-Line Flags |
|-------------|----------------|-------------------|
| **SD 1.5** | VAE (optional) | `--vae vae-ft-mse-840000-ema-pruned.safetensors` |
| **SD 2.1** | VAE (optional) | `--vae vae-ft-ema-560000.safetensors` |
| **SDXL Base/Refiner** | VAE (optional) | `--vae sdxl_vae.safetensors` |
| **Flux Schnell** | VAE, CLIP-L, T5XXL | `--vae ae.safetensors --clip-l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors` |
| **Flux Dev** | VAE, CLIP-L, T5XXL | `--vae ae.safetensors --clip-l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors` |
| **Flux Chroma** | VAE, T5XXL | `--vae ae.safetensors --t5xxl t5xxl_fp16.safetensors` |
| **SD3** | VAE, CLIP-L, CLIP-G, T5XXL | `--vae sd3_vae.safetensors --clip-l clip_l.safetensors --clip-g clip_g.safetensors --t5xxl t5xxl_fp16.safetensors` |
| **Qwen2-VL** | Qwen2VL, Qwen2VL-Vision | `--qwen2vl qwen2vl.safetensors --qwen2vl-vision qwen2vl_vision.safetensors` |

### How Detection Works

1. **File Format Support**:
   - **Safetensors** (.safetensors): Fully supported
   - **GGUF** (.gguf): Fully supported (quantized models)
   - **PyTorch** (.ckpt, .pt): Assumed to be SD 1.5, because the pickle format cannot be parsed safely

2. **Detection Method**:
   - Reads only file headers (~1MB) for fast detection
   - Analyzes tensor names and shapes
   - Checks for architecture-specific patterns:
     - Flux: `double_blocks`, `single_blocks` tensors
     - SD3: `joint_blocks` tensors
     - SDXL: `conditioner`, `text_encoder_2` tensors
     - Chroma: Flux structure + "chroma" in filename
   - Returns recommended settings (resolution, steps, sampler)

3. **API Integration**:
   - Architecture info is returned by the `/api/v1/models` endpoint
   - Includes `required_models` array listing needed auxiliary files
   - Includes `missing_models` array if dependencies are not found
   - Frontend can display warnings for missing dependencies
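
The pattern checks listed under the detection method can be sketched as a simple classification over tensor names. This is an illustration under assumptions, not the repository's `ModelDetector` API: the function name, the returned labels, the precedence (Flux, then SD3, then SDXL), and the case-sensitive filename match are all hypothetical, and shape analysis is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: substring test.
static bool containsText(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}

// Hypothetical classifier based only on tensor-name patterns and the filename.
std::string classifyArchitectureSketch(const std::vector<std::string>& tensorNames,
                                       const std::string& fileName) {
    bool flux = false, sd3 = false, sdxl = false;
    for (const std::string& name : tensorNames) {
        flux = flux || containsText(name, "double_blocks") || containsText(name, "single_blocks");
        sd3  = sd3  || containsText(name, "joint_blocks");
        sdxl = sdxl || containsText(name, "conditioner") || containsText(name, "text_encoder_2");
    }
    // Chroma: Flux structure plus "chroma" in the filename.
    if (flux) return containsText(fileName, "chroma") ? "Flux Chroma" : "Flux";
    if (sd3) return "SD3";
    if (sdxl) return "SDXL";
    return "Unknown";
}
```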

### Usage in Model Manager

During model scanning (`model_manager.cpp`):

```cpp
if (detectedType == ModelType::CHECKPOINT) {
    ModelDetectionResult detection = ModelDetector::detectModel(info.fullPath);
    info.architecture = detection.architectureName;
    info.recommendedVAE = detection.recommendedVAE;
    // Note: std::stoi throws if the value is missing or not numeric
    info.recommendedWidth = std::stoi(detection.suggestedParams["width"]);
    // ... parse other recommended parameters

    // Build required models list
    if (detection.needsVAE) {
        info.requiredModels.push_back("VAE: " + detection.recommendedVAE);
    }
}
```

### API Response Example

```json
{
  "name": "chroma-unlocked-v50-Q8_0.gguf",
  "type": "checkpoint",
  "architecture": "Flux Chroma (Unlocked)",
  "recommended_vae": "ae.safetensors",
  "recommended_width": 1024,
  "recommended_height": 1024,
  "recommended_steps": 20,
  "recommended_sampler": "euler",
  "required_models": [
    "VAE: ae.safetensors",
    "T5XXL: t5xxl_fp16.safetensors"
  ],
  "missing_models": [],
  "has_missing_dependencies": false
}
```

### Testing the Detection System

A standalone test binary can be built to test model detection:

```bash
cd build
cmake -DBUILD_MODEL_DETECTOR_TEST=ON ..
cmake --build . --target test_model_detector

# Run tests
./src/test_model_detector /data/SD_MODELS/checkpoints
```

### Architecture-Specific Loading

The server automatically uses the correct parameters when loading a model, based on its detected architecture. For architectures that require multiple auxiliary models (Flux, SD3, Qwen2-VL), the server will:

1. Check if all required models are available
2. Return warnings via API if models are missing
3. Display warnings in WebUI with instructions to load missing models
4. Provide correct command-line flags for manual loading
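
The availability check in step 1 could look roughly like the sketch below. The helper name `findMissingModels` is hypothetical, and it assumes all required files live in a single directory, which may not hold when VAE and text-encoder directories are configured separately.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: returns the required file names not found in `dir`.
std::vector<std::string> findMissingModels(const std::string& dir,
                                           const std::vector<std::string>& requiredFiles) {
    std::vector<std::string> missing;
    for (const std::string& file : requiredFiles) {
        std::error_code ec;  // non-throwing overload: errors count as "missing"
        if (!std::filesystem::exists(std::filesystem::path(dir) / file, ec)) {
            missing.push_back(file);
        }
    }
    return missing;
}
```

An empty result would correspond to `"has_missing_dependencies": false` in the API response.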

See `MODEL_DETECTION.md` for complete documentation on the detection system.