CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a C++ REST API server that wraps the stable-diffusion.cpp library, providing HTTP endpoints for Stable Diffusion image generation. The server is built with a modular architecture featuring three main components: HTTP Server, Generation Queue, and Model Manager.

Build Commands

Initial Setup and Build

# Create build directory and configure
mkdir build && cd build
cmake ..

# Build the project (parallel build)
cmake --build . --parallel

# Install (optional)
cmake --install .

Build Configuration Options

# Build with CUDA support (default: ON)
cmake -DSD_CUDA_SUPPORT=ON ..

# Build without CUDA
cmake -DSD_CUDA_SUPPORT=OFF ..

# Debug build
cmake -DCMAKE_BUILD_TYPE=Debug ..

# Release build (default)
cmake -DCMAKE_BUILD_TYPE=Release ..

Clean and Rebuild

# Clean build artifacts
cd build
cmake --build . --target clean

# Or delete build directory entirely
rm -rf build
mkdir build && cd build
cmake ..
cmake --build . --parallel

Running the Server

Required parameters: --models-dir and --checkpoints must both be provided.

# Basic usage with required parameters
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints

# The above resolves checkpoints to: /path/to/models/checkpoints

# Using absolute path for checkpoints
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints /absolute/path/to/checkpoints

# With custom port and host
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --host 0.0.0.0 --port 8080

# With verbose logging
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --verbose

# With optional model directories (relative paths)
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --lora-dir lora --vae-dir vae

# With optional model directories (absolute paths)
./stable-diffusion-rest-server --models-dir /path/to/models --checkpoints checkpoints --lora-dir /other/lora --vae-dir /other/vae

Path Resolution Logic:

  • If a directory parameter is an absolute path, it's used as-is
  • If a directory parameter is a relative path, it's resolved relative to --models-dir
  • Example: --models-dir /data/models --checkpoints checkpoints → /data/models/checkpoints
  • Example: --models-dir /data/models --checkpoints /other/checkpoints → /other/checkpoints

Architecture

Three-Component Design

  1. HTTP Server (src/server.cpp, include/server.h)

    • Uses cpp-httplib for HTTP handling
    • Runs in separate thread from generation
    • Handles request validation and response formatting
    • All endpoints are registered in registerEndpoints()
    • CORS is configured in setupCORS()
  2. Generation Queue (src/generation_queue.cpp, include/generation_queue.h)

    • Thread-safe queue for managing generation requests
    • Uses Pimpl idiom (implementation hidden in .cpp)
    • Processes jobs sequentially (one at a time by default)
    • Provides job tracking via JobInfo structures
    • Main methods: enqueueRequest(), getQueueStatus(), cancelJob()
  3. Model Manager (src/model_manager.cpp, include/model_manager.h)

    • Handles loading/unloading of different model types
    • Uses Pimpl idiom for implementation hiding
    • All model directories are explicitly configured
    • Supports path resolution: absolute paths used as-is, relative paths resolved from base models directory
    • Thread-safe with shared_mutex for concurrent reads
    • Model scanning is cancellable via cancelScan()

Threading Architecture

  • Main thread: Initialization, signal handling, coordination
  • Server thread: HTTP request handling (in Server::serverThreadFunction())
  • Queue worker threads: Generation processing (managed by GenerationQueue)
  • Signal handler sets global g_running flag for graceful shutdown
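
The hand-off between the server thread and a queue worker can be sketched as below. This is a minimal illustration under stated assumptions, not the actual GenerationQueue: the class name WorkerQueueSketch and its enqueue() and stop() methods are hypothetical, and it uses a single worker so that jobs run one at a time.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-worker job queue; not the real GenerationQueue API.
class WorkerQueueSketch {
public:
    WorkerQueueSketch() : worker_([this] { run(); }) {}
    ~WorkerQueueSketch() { stop(); }

    // Called from the server thread: hand a job to the worker and return at once.
    void enqueue(std::function<void()> job) {
        assert(job && "job must be callable");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Let the worker drain the jobs already queued, then join it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so enqueue() is never blocked by a job
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: the members above must exist before it starts
};
```

Running the job outside the lock keeps a long generation from blocking new requests; whether the real queue drains pending jobs on shutdown or discards them is not stated here, so the draining behaviour is an assumption of this sketch.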

Model Type System

Model types are bit flags (can be combined):

enum class ModelType {
    LORA = 1, CHECKPOINT = 2, VAE = 4, PRESETS = 8,
    PROMPTS = 16, NEG_PROMPTS = 32, TAESD = 64,
    ESRGAN = 128, CONTROLNET = 256, UPSCALER = 512,
    EMBEDDING = 1024
};
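
Because ModelType is a scoped enum, its values cannot be combined with | or tested with & unless those operators are defined. The sketch below shows one way to do that; the operator overloads and the hasType() helper are assumptions for illustration, and the repository may define its own equivalents.

```cpp
#include <cassert>
#include <type_traits>

// Shortened copy of the enum above so the sketch compiles on its own.
enum class ModelType { LORA = 1, CHECKPOINT = 2, VAE = 4 };

using ModelTypeBits = std::underlying_type_t<ModelType>;

// Union of two flag sets.
constexpr ModelType operator|(ModelType a, ModelType b) {
    return static_cast<ModelType>(static_cast<ModelTypeBits>(a) |
                                  static_cast<ModelTypeBits>(b));
}

// Intersection of two flag sets.
constexpr ModelType operator&(ModelType a, ModelType b) {
    return static_cast<ModelType>(static_cast<ModelTypeBits>(a) &
                                  static_cast<ModelTypeBits>(b));
}

// True if `set` contains every bit of `flag` (hypothetical helper).
inline bool hasType(ModelType set, ModelType flag) {
    assert(static_cast<ModelTypeBits>(flag) != 0 && "an empty flag matches everything");
    return (set & flag) == flag;
}
```

With these in place, a scan for two kinds of model can be requested as ModelType::LORA | ModelType::VAE and checked with hasType(mask, ModelType::LORA).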

Supported file extensions by type:

  • LORA, CHECKPOINT, VAE, TAESD, CONTROLNET, EMBEDDING: .safetensors, .pt, .ckpt
  • PRESETS: .json, .yaml, .yml
  • PROMPTS, NEG_PROMPTS: .txt, .json
  • ESRGAN, UPSCALER: .pth, .pt
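
The extension table above can be expressed as a lookup. This is a sketch only: extensionsFor() and hasSupportedExtension() are hypothetical names, and the case-insensitive comparison is an assumption rather than documented behaviour of the scanner.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class ModelType {
    LORA = 1, CHECKPOINT = 2, VAE = 4, PRESETS = 8,
    PROMPTS = 16, NEG_PROMPTS = 32, TAESD = 64,
    ESRGAN = 128, CONTROLNET = 256, UPSCALER = 512,
    EMBEDDING = 1024
};

// Extensions accepted for one model type, copied from the list above.
// Combined flag values are not handled and yield an empty list.
std::vector<std::string> extensionsFor(ModelType type) {
    switch (type) {
        case ModelType::LORA:
        case ModelType::CHECKPOINT:
        case ModelType::VAE:
        case ModelType::TAESD:
        case ModelType::CONTROLNET:
        case ModelType::EMBEDDING:
            return {".safetensors", ".pt", ".ckpt"};
        case ModelType::PRESETS:
            return {".json", ".yaml", ".yml"};
        case ModelType::PROMPTS:
        case ModelType::NEG_PROMPTS:
            return {".txt", ".json"};
        case ModelType::ESRGAN:
        case ModelType::UPSCALER:
            return {".pth", ".pt"};
    }
    return {};
}

// Assumption: the comparison ignores case, so "model.CKPT" is accepted.
bool hasSupportedExtension(ModelType type, const std::string& filename) {
    const std::string::size_type dot = filename.find_last_of('.');
    if (dot == std::string::npos) return false;
    std::string ext = filename.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::vector<std::string> allowed = extensionsFor(type);
    return std::find(allowed.begin(), allowed.end(), ext) != allowed.end();
}

// Usage example with illustrative file names.
void extensionExamples() {
    assert(hasSupportedExtension(ModelType::LORA, "style.safetensors"));
    assert(hasSupportedExtension(ModelType::UPSCALER, "RealESRGAN_x4.pth"));
    assert(!hasSupportedExtension(ModelType::PRESETS, "preset.txt"));
}
```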

Key API Endpoints

Generation Endpoints

  • POST /api/v1/generate - General image generation
  • POST /api/v1/text2img - Text-to-image generation
  • POST /api/v1/img2img - Image-to-image generation
  • POST /api/v1/controlnet - ControlNet generation

Model Management

  • GET /api/v1/models - List all available models
  • GET /api/v1/models/{type} - List models by type
  • GET /api/v1/models/{id} - Get model details
  • POST /api/v1/models/load - Load a model
  • POST /api/v1/models/unload - Unload a model
  • POST /api/v1/models/scan - Rescan models directory

Queue Management

  • GET /api/v1/queue - Get queue status
  • GET /api/v1/jobs/{id} - Get job status
  • DELETE /api/v1/jobs/{id} - Cancel a job
  • DELETE /api/v1/queue - Clear queue

System Information

  • GET /api/v1/health - Health check
  • GET /api/v1/status - API status
  • GET /api/v1/system - System capabilities

Dependencies Management

Dependencies are managed in cmake/FindDependencies.cmake using CMake's FetchContent:

  • nlohmann/json (v3.11.2) - JSON parsing/serialization
  • cpp-httplib (v0.14.1) - HTTP server library
  • stable-diffusion.cpp - Core SD library (via ExternalProject)
  • Threads - POSIX threads
  • OpenMP (optional) - Parallel processing
  • CUDA (optional) - GPU acceleration

The stable-diffusion.cpp library is downloaded and built automatically via ExternalProject_Add() in the root CMakeLists.txt, pinned to the git tag master-334-d05e46c.

Important Implementation Details

Pimpl Idiom Usage

Both GenerationQueue and ModelManager use the Pimpl (Pointer to Implementation) idiom:

class GenerationQueue {
private:
    class Impl;
    std::unique_ptr<Impl> pImpl;
};

All implementation details are in the .cpp files, not headers. When modifying these classes, update the inner Impl class definition in the .cpp file.
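
A fuller picture of the idiom, collapsed into one file for illustration. JobListSketch and its members are hypothetical; only the class Impl / std::unique_ptr<Impl> pImpl shape is taken from the declaration above. The point to note is that the destructor is declared in the header part and defined only where Impl is a complete type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// --- what would live in the header ---
class JobListSketch {
public:
    JobListSketch();
    ~JobListSketch();  // declared here, defined below where Impl is complete
    void add(std::string id);
    std::size_t size() const;

private:
    class Impl;
    std::unique_ptr<Impl> pImpl;
};

// --- what would live in the .cpp file ---
class JobListSketch::Impl {
public:
    std::vector<std::string> ids;  // private state, invisible to header users
};

JobListSketch::JobListSketch() : pImpl(std::make_unique<Impl>()) {}
JobListSketch::~JobListSketch() = default;  // unique_ptr can delete Impl here

void JobListSketch::add(std::string id) { pImpl->ids.push_back(std::move(id)); }
std::size_t JobListSketch::size() const { return pImpl->ids.size(); }

// Usage example: callers only ever see the public interface.
void pimplExample() {
    JobListSketch jobs;
    jobs.add("job-1");
    assert(jobs.size() == 1);
}
```

Adding a field to Impl changes only the .cpp file, so code that includes the header does not need to be recompiled.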

Signal Handling

  • Global pointer g_server allows signal handler to trigger graceful shutdown
  • Signal handler sets g_running atomic flag and calls server->stop()
  • Shutdown sequence: stop server → stop queue → wait for threads → cleanup
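
The flag half of this mechanism can be sketched as follows. It is a deliberately reduced variant of what is described above: the handler only stores to the atomic flag and leaves the call to stop() to the main thread, because very few operations are safe inside a signal handler. The names handleShutdownSignal() and installSignalHandlers() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Polled by the main loop; cleared by the handler.
std::atomic<bool> g_running{true};
static_assert(std::atomic<bool>::is_always_lock_free,
              "the handler relies on a lock-free flag");

// The handler does nothing except a lock-free atomic store.
extern "C" void handleShutdownSignal(int /*signum*/) {
    g_running.store(false);
}

void installSignalHandlers() {
    std::signal(SIGINT, handleShutdownSignal);
    std::signal(SIGTERM, handleShutdownSignal);
}

// Usage example: deliver SIGINT to this process and observe the flag flip.
void signalExample() {
    installSignalHandlers();
    assert(g_running.load());
    std::raise(SIGINT);
    assert(!g_running.load());
}
```

In this variant the main thread would loop while g_running is true and then run the shutdown sequence listed above itself.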

Directory Configuration

The server requires explicit configuration of model directories:

Required Parameters:

  • --models-dir: Base directory for models (required)
  • --checkpoints: Checkpoints directory (required)

Optional Parameters:

  • --lora-dir, --vae-dir, --controlnet-dir, etc.

Path Resolution:

  • Absolute paths (e.g., /absolute/path/to/models) are used as-is
  • Relative paths (e.g., checkpoints) are resolved relative to --models-dir
  • The resolveDirectoryPath() function in main.cpp handles this logic
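
The rule can be sketched with std::filesystem. resolveDirectorySketch() is a hypothetical stand-in; the real resolveDirectoryPath() may have a different signature and may normalise or validate paths. The examples assume POSIX-style paths.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical stand-in for resolveDirectoryPath().
// Absolute paths pass through; relative ones are joined onto the base directory.
fs::path resolveDirectorySketch(const fs::path& modelsDir, const fs::path& dir) {
    if (dir.is_absolute()) return dir;
    return modelsDir / dir;
}

// Usage example mirroring the two cases above.
void pathExamples() {
    assert(resolveDirectorySketch("/data/models", "checkpoints") ==
           fs::path("/data/models/checkpoints"));
    assert(resolveDirectorySketch("/data/models", "/other/checkpoints") ==
           fs::path("/other/checkpoints"));
}
```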

Generation Parameters

The GenerationRequest structure in generation_queue.h contains all parameters from stable-diffusion.cpp's CLI, including:

  • Basic: prompt, negative_prompt, width, height, steps, cfg_scale
  • Sampling: sampling_method (EULER, EULER_A, HEUN, etc.), scheduler (DISCRETE, KARRAS, etc.)
  • Advanced: clip_skip, strength, control_strength, skip_layers
  • Performance: n_threads, offload_params_to_cpu, clip_on_cpu, vae_on_cpu, diffusion_flash_attn
  • Model paths: vae_path, taesd_path, controlnet_path, lora_model_dir, embedding_dir

Development Notes

When Adding New Endpoints

  1. Add handler method declaration in include/server.h
  2. Implement handler in src/server.cpp
  3. Register endpoint in Server::registerEndpoints()
  4. Use helper methods sendJsonResponse() and sendErrorResponse() for consistent responses

When Adding New Model Types

  1. Add enum value to ModelType in model_manager.h
  2. Update modelTypeToString() and stringToModelType() in model_manager.cpp
  3. Add supported file extensions to model scanning logic
  4. Update ServerConfig struct if a new directory parameter is needed

When Modifying Generation Parameters

  1. Update GenerationRequest struct in generation_queue.h
  2. Update parameter validation in Server::validateGenerationParameters()
  3. Update request parsing in generation endpoint handlers
  4. Update StableDiffusionWrapper to pass new parameters to underlying library

Thread Safety Considerations

  • Model Manager uses std::shared_mutex - multiple readers OR single writer
  • Generation Queue uses std::mutex and std::condition_variable
  • Always use RAII locks (std::lock_guard, std::unique_lock, std::shared_lock)
  • Atomic types used for flags: std::atomic<bool> for g_running, m_isRunning

External Project Integration

The stable-diffusion.cpp library is built as an external project. Include directories and libraries are configured via the sd-cpp interface target:

target_link_libraries(stable-diffusion-rest-server PRIVATE
    sd-cpp
    ggml
    ggml-base
    ggml-cpu
    ${DEPENDENCY_LIBRARIES}
)

When accessing stable-diffusion.cpp APIs, include from the installed headers:

  • #include <stable-diffusion.h>
  • #include <ggml.h>

The wrapper class StableDiffusionWrapper (stable_diffusion_wrapper.cpp/h) encapsulates all interactions with the stable-diffusion.cpp library.

Model Architecture Detection System

The server includes an automatic model architecture detection system that analyzes checkpoint files to determine their type and required auxiliary models.

Supported Architectures

The system can detect the following architectures:

Architecture | Required Files | Command-Line Flags
SD 1.5 | VAE (optional) | --vae vae-ft-mse-840000-ema-pruned.safetensors
SD 2.1 | VAE (optional) | --vae vae-ft-ema-560000.safetensors
SDXL Base/Refiner | VAE (optional) | --vae sdxl_vae.safetensors
Flux Schnell | VAE, CLIP-L, T5XXL | --vae ae.safetensors --clip-l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors
Flux Dev | VAE, CLIP-L, T5XXL | --vae ae.safetensors --clip-l clip_l.safetensors --t5xxl t5xxl_fp16.safetensors
Flux Chroma | VAE, T5XXL | --vae ae.safetensors --t5xxl t5xxl_fp16.safetensors
SD3 | VAE, CLIP-L, CLIP-G, T5XXL | --vae sd3_vae.safetensors --clip-l clip_l.safetensors --clip-g clip_g.safetensors --t5xxl t5xxl_fp16.safetensors
Qwen2-VL | Qwen2VL, Qwen2VL-Vision | --qwen2vl qwen2vl.safetensors --qwen2vl-vision qwen2vl_vision.safetensors

How Detection Works

  1. File Format Support:

    • Safetensors (.safetensors): Fully supported
    • GGUF (.gguf): Fully supported (quantized models)
    • PyTorch (.ckpt, .pt): Assumed to be SD1.5 (cannot parse pickle format safely)
  2. Detection Method:

    • Reads only file headers (~1MB) for fast detection
    • Analyzes tensor names and shapes
    • Checks for architecture-specific patterns:
      • Flux: double_blocks, single_blocks tensors
      • SD3: joint_blocks tensors
      • SDXL: conditioner, text_encoder_2 tensors
      • Chroma: Flux structure + "chroma" in filename
    • Returns recommended settings (resolution, steps, sampler)
  3. API Integration:

    • Architecture info is returned by the /api/v1/models endpoint
    • Includes required_models array listing needed auxiliary files
    • Includes missing_models array if dependencies are not found
    • Frontend can display warnings for missing dependencies
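
The pattern checks in step 2 can be sketched as plain substring matching over tensor names. This is a simplified illustration, not the ModelDetector implementation: classifySketch() is a hypothetical function, it ignores tensor shapes, it treats either Flux pattern as sufficient, and it returns "Unknown" when nothing matches, all of which are assumptions. The tensor names in the usage example are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if any tensor name contains `needle`.
static bool anyContains(const std::vector<std::string>& names, const std::string& needle) {
    return std::any_of(names.begin(), names.end(), [&](const std::string& n) {
        return n.find(needle) != std::string::npos;
    });
}

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical classifier built from the patterns listed above.
std::string classifySketch(const std::vector<std::string>& tensorNames,
                           const std::string& filename) {
    const bool fluxLike = anyContains(tensorNames, "double_blocks") ||
                          anyContains(tensorNames, "single_blocks");
    if (fluxLike) {
        // Chroma: Flux structure plus "chroma" in the file name.
        return toLower(filename).find("chroma") != std::string::npos ? "Flux Chroma" : "Flux";
    }
    if (anyContains(tensorNames, "joint_blocks")) return "SD3";
    if (anyContains(tensorNames, "conditioner") ||
        anyContains(tensorNames, "text_encoder_2")) return "SDXL";
    return "Unknown";
}

// Usage example.
void detectionExample() {
    assert(classifySketch({"double_blocks.0.img_attn.qkv.weight"},
                          "chroma-unlocked-v50-Q8_0.gguf") == "Flux Chroma");
    assert(classifySketch({"joint_blocks.0.x_block.attn.qkv.weight"},
                          "sd3_medium.safetensors") == "SD3");
}
```

The Flux check runs first so that a Chroma file, which also has the Flux structure, is not reported as plain Flux.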

Usage in Model Manager

During model scanning (model_manager.cpp):

if (detectedType == ModelType::CHECKPOINT) {
    ModelDetectionResult detection = ModelDetector::detectModel(info.fullPath);
    info.architecture = detection.architectureName;
    info.recommendedVAE = detection.recommendedVAE;
    info.recommendedWidth = std::stoi(detection.suggestedParams["width"]);
    // ... parse other recommended parameters

    // Build required models list
    if (detection.needsVAE) {
        info.requiredModels.push_back("VAE: " + detection.recommendedVAE);
    }
}

API Response Example

{
  "name": "chroma-unlocked-v50-Q8_0.gguf",
  "type": "checkpoint",
  "architecture": "Flux Chroma (Unlocked)",
  "recommended_vae": "ae.safetensors",
  "recommended_width": 1024,
  "recommended_height": 1024,
  "recommended_steps": 20,
  "recommended_sampler": "euler",
  "required_models": [
    "VAE: ae.safetensors",
    "T5XXL: t5xxl_fp16.safetensors"
  ],
  "missing_models": [],
  "has_missing_dependencies": false
}

Testing the Detection System

A standalone test binary can be built to test model detection:

cd build
cmake -DBUILD_MODEL_DETECTOR_TEST=ON ..
cmake --build . --target test_model_detector

# Run tests
./src/test_model_detector /data/SD_MODELS/checkpoints

Architecture-Specific Loading

The server will automatically use the correct parameters when loading models based on detected architecture. For architectures requiring multiple auxiliary models (Flux, SD3, Qwen), the server will:

  1. Check if all required models are available
  2. Return warnings via API if models are missing
  3. Display warnings in WebUI with instructions to load missing models
  4. Provide correct command-line flags for manual loading
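
Step 1 amounts to a set difference between the files an architecture requires and the files that were found. A minimal sketch, assuming the available files are known by name; findMissing() is a hypothetical helper, and the file names are taken from the Flux Chroma row of the table above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the required files that are absent from `available`, in their original order.
std::vector<std::string> findMissing(const std::vector<std::string>& required,
                                     const std::set<std::string>& available) {
    std::vector<std::string> missing;
    for (const std::string& name : required) {
        if (available.count(name) == 0) missing.push_back(name);
    }
    return missing;
}

// Usage example: only the VAE is present, so the text encoder is reported missing.
void missingExample() {
    const std::vector<std::string> required = {"ae.safetensors", "t5xxl_fp16.safetensors"};
    const std::set<std::string> available = {"ae.safetensors"};
    const std::vector<std::string> missing = findMissing(required, available);
    assert(missing.size() == 1);
    assert(missing[0] == "t5xxl_fp16.safetensors");
}
```

An empty result would correspond to an empty missing_models array and has_missing_dependencies set to false in the API response.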

See MODEL_DETECTION.md for complete documentation on the detection system.