#8 bug: stale "processing" jobs after server restart

Closed
fszontagh opened 3 months ago · 1 comment

After a server crash or restart, queue items that were in "processing" state remain stuck in that state instead of being marked as failed or cancelled.

Steps to reproduce:

  1. Start a generation job
  2. Server crashes (e.g., segfault)
  3. Restart server
  4. Check queue view - job still shows as "processing"

Expected behavior: On server startup, any jobs in "processing" state should be automatically marked as "failed" or "cancelled" since they cannot continue.

Impact:

  • Misleading UI showing jobs as active when they're not
  • Queue view shows incorrect state
  • Users don't know the job failed

Fixed in commit eac0d80

Root Cause: When the server crashed or restarted, jobs that were in "processing" state were loaded from disk with their original status preserved. Since the generation worker threads were terminated, these jobs could never complete but remained stuck as "processing".

Solution: Added cleanup logic to the loadJobsFromDisk() method to detect stale jobs and mark them as failed:

  1. After loading each job from disk, check if status is PROCESSING
  2. If true, mark the job as FAILED
  3. Set error message: "Server restarted while job was processing"
  4. Set endTime to current timestamp
  5. Persist updated job status back to disk using saveJobToFile()
  6. Log cleanup action to console

Changes: Modified src/generation_queue.cpp (lines 554-562):

// Clean up stale processing jobs from server restart
if (job.status == GenerationStatus::PROCESSING) {
    job.status = GenerationStatus::FAILED;
    job.errorMessage = "Server restarted while job was processing";
    job.endTime = std::chrono::steady_clock::now();
    std::cout << "Marked stale job as failed: " << job.id << std::endl;
    // Persist updated status to disk
    saveJobToFile(job);
}

Result:

  • Queue view shows accurate job states after server restart
  • No more misleading "processing" indicators
  • Users can see which jobs were interrupted by restart
  • Console logs confirm cleanup: "Marked stale job as failed: {job_id}"

The fix runs automatically on every server startup when loading persisted jobs.
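For readers without the repository at hand, the cleanup steps above can be sketched as a standalone function. This is only an illustration under stated assumptions: the `Job` struct, the enum values other than `PROCESSING` and `FAILED`, the `failStaleJobs` name, and the `persist` callback (standing in for `saveJobToFile()`) are hypothetical and not taken from `src/generation_queue.cpp`. The sketch also stores a `system_clock` timestamp rather than the `steady_clock` call shown in the commit, because `steady_clock` time points are not comparable across process restarts. Whether the project persists `endTime` at all is not stated in this issue.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the project's types; the real definitions
// are not shown in this issue.
enum class GenerationStatus { QUEUED, PROCESSING, COMPLETED, FAILED };

struct Job {
    std::string id;
    GenerationStatus status = GenerationStatus::QUEUED;
    std::string errorMessage;
    // Wall-clock seconds since the epoch (assumption: chosen so the value
    // stays meaningful after a restart).
    std::int64_t endTimeEpochSec = 0;
};

// Marks every job still in PROCESSING as FAILED and returns how many jobs
// were changed. `persist` stands in for saveJobToFile() and is called once
// per repaired job, after its fields have been updated.
template <typename Persist>
std::size_t failStaleJobs(std::vector<Job>& jobs, Persist persist) {
    const auto now = std::chrono::duration_cast<std::chrono::seconds>(
                         std::chrono::system_clock::now().time_since_epoch())
                         .count();
    std::size_t repaired = 0;
    for (Job& job : jobs) {
        if (job.status != GenerationStatus::PROCESSING) {
            continue;  // Only jobs interrupted mid-run are considered stale.
        }
        job.status = GenerationStatus::FAILED;
        job.errorMessage = "Server restarted while job was processing";
        job.endTimeEpochSec = now;
        std::cout << "Marked stale job as failed: " << job.id << '\n';
        persist(job);
        ++repaired;
    }
    return repaired;
}
```

Passing the persistence step as a callback keeps the sketch testable without touching the filesystem. In the actual fix the check sits inside the load loop and calls `saveJobToFile(job)` directly.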

Labels: bug, ui