#8 bug: stale "processing" jobs after server restart

Closed
opened 3 months ago by fszontagh · 1 comment

After a server crash or restart, queue items that were in "processing" state remain stuck in that state instead of being marked as failed or cancelled.

Steps to reproduce:

  1. Start a generation job
  2. Server crashes (e.g., segfault)
  3. Restart server
  4. Check queue view - job still shows as "processing"

Expected behavior: On server startup, any jobs in "processing" state should be automatically marked as "failed" or "cancelled" since they cannot continue.

Impact:

  • Misleading UI showing jobs as active when they're not
  • Queue view shows incorrect state
  • Users don't know the job failed
fszontagh closed this issue 3 months ago
Szontágh Ferenc commented 3 months ago
Owner

Fixed in commit eac0d80

Root Cause: When the server crashed or restarted, jobs that were in "processing" state were loaded from disk with their original status preserved. Since the generation worker threads were terminated, these jobs could never complete but remained stuck as "processing".

Solution: Added cleanup logic in loadJobsFromDisk() method to detect and mark stale jobs:

  1. After loading each job from disk, check if status is PROCESSING
  2. If true, mark the job as FAILED
  3. Set error message: "Server restarted while job was processing"
  4. Set endTime to current timestamp
  5. Persist updated job status back to disk using saveJobToFile()
  6. Log cleanup action to console

Changes: Modified src/generation_queue.cpp (lines 554-562):

// Clean up stale processing jobs from server restart
if (job.status == GenerationStatus::PROCESSING) {
    job.status = GenerationStatus::FAILED;
    job.errorMessage = "Server restarted while job was processing";
    job.endTime = std::chrono::steady_clock::now();
    std::cout << "Marked stale job as failed: " << job.id << std::endl;
    // Persist updated status to disk
    saveJobToFile(job);
}

Result:

  • Queue view shows accurate job states after server restart
  • No more misleading "processing" indicators
  • Users can see which jobs were interrupted by restart
  • Console logs confirm cleanup: "Marked stale job as failed: {job_id}"

The fix runs automatically on every server startup when loading persisted jobs.
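
For readers without access to the repository, the cleanup pass described above can be sketched in isolation. This is a minimal illustration, not the code from commit eac0d80: the `Job` struct, the `failStaleJobs` name, and the `persist` callback (standing in for `saveJobToFile()`) are hypothetical. The type of `endTime` is not shown in the issue; the sketch assumes a wall-clock `std::chrono::system_clock` time point, since a `steady_clock` value has no defined relation to calendar time and is not meaningful across process restarts.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the project's real types.
enum class GenerationStatus { QUEUED, PROCESSING, COMPLETED, FAILED };

struct Job {
    std::string id;
    GenerationStatus status = GenerationStatus::QUEUED;
    std::string errorMessage;
    // Assumption: a wall-clock timestamp, because the value is persisted.
    std::chrono::system_clock::time_point endTime{};
};

// Marks every job still in PROCESSING as FAILED and passes it to `persist`
// (a stand-in for saveJobToFile). Returns the number of jobs cleaned up.
std::size_t failStaleJobs(std::vector<Job>& jobs,
                          const std::function<void(const Job&)>& persist) {
    std::size_t cleaned = 0;
    for (Job& job : jobs) {
        if (job.status != GenerationStatus::PROCESSING) {
            continue;  // queued, completed, and failed jobs are left as-is
        }
        job.status = GenerationStatus::FAILED;
        job.errorMessage = "Server restarted while job was processing";
        job.endTime = std::chrono::system_clock::now();
        persist(job);  // write the new status back before logging
        std::cout << "Marked stale job as failed: " << job.id << '\n';
        ++cleaned;
    }
    return cleaned;
}
```

Because the pass only touches jobs in `PROCESSING` and moves them to `FAILED`, it is idempotent: running it again on the same jobs finds nothing to clean, which is what makes it safe to run on every startup.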

Labels: bug, ui