#8 bug: stale "processing" jobs after server restart

Closed
Opened 3 months ago by fszontagh · 1 comment
Szontágh Ferenc commented 3 months ago

After a server crash or restart, queue items that were in "processing" state remain stuck in that state instead of being marked as failed or cancelled.

Steps to reproduce:

  1. Start a generation job
  2. Server crashes (e.g., segfault)
  3. Restart server
  4. Check queue view - job still shows as "processing"

Expected behavior: On server startup, any jobs in "processing" state should be automatically marked as "failed" or "cancelled" since they cannot continue.

Impact:

  • Misleading UI showing jobs as active when they're not
  • Queue view shows incorrect state
  • Users don't know the job failed
Szontágh Ferenc referenced this issue from a commit 3 months ago
fszontagh closed this issue 3 months ago
Szontágh Ferenc commented 3 months ago
Owner

Fixed in commit eac0d80

Root Cause: When the server crashed or restarted, jobs that were in "processing" state were loaded from disk with their original status preserved. Since the generation worker threads were terminated, these jobs could never complete but remained stuck as "processing".

Solution: Added cleanup logic to the loadJobsFromDisk() method to detect and mark stale jobs:

  1. After loading each job from disk, check if status is PROCESSING
  2. If true, mark the job as FAILED
  3. Set error message: "Server restarted while job was processing"
  4. Set endTime to current timestamp
  5. Persist updated job status back to disk using saveJobToFile()
  6. Log cleanup action to console

Changes: Modified src/generation_queue.cpp (lines 554-562):

// Clean up stale processing jobs from server restart
if (job.status == GenerationStatus::PROCESSING) {
    job.status = GenerationStatus::FAILED;
    job.errorMessage = "Server restarted while job was processing";
    job.endTime = std::chrono::steady_clock::now();
    std::cout << "Marked stale job as failed: " << job.id << std::endl;
    // Persist updated status to disk
    saveJobToFile(job);
}
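
For readers who want to try the idea outside the project, here is a minimal self-contained sketch of the same check. The `Job` struct, the enum values other than PROCESSING and FAILED, and the helper name `markStaleJobFailed` are hypothetical stand-ins, not the project's actual definitions. The sketch also assumes a wall-clock (`system_clock`) timestamp rather than the `steady_clock` used above, because `steady_clock` values have an unspecified epoch and are not meaningful across process restarts; whether that matters depends on how endTime is stored and displayed in the real code.

```cpp
#include <chrono>
#include <string>

// Hypothetical stand-ins for the project's types; the real definitions
// used by src/generation_queue.cpp may differ.
enum class GenerationStatus { QUEUED, PROCESSING, COMPLETED, FAILED };

struct Job {
    std::string id;
    GenerationStatus status = GenerationStatus::QUEUED;
    std::string errorMessage;
    // Assumption: wall-clock time, so the value stays meaningful after a restart.
    std::chrono::system_clock::time_point endTime{};
};

// Marks a job as failed if it was still PROCESSING when it was loaded.
// Returns true when the job was changed and should be written back to disk.
bool markStaleJobFailed(Job& job, std::chrono::system_clock::time_point now) {
    if (job.status != GenerationStatus::PROCESSING) {
        return false;  // nothing to clean up
    }
    job.status = GenerationStatus::FAILED;
    job.errorMessage = "Server restarted while job was processing";
    job.endTime = now;
    return true;
}
```

Passing the timestamp in as a parameter keeps the helper easy to test; a caller would typically pass `std::chrono::system_clock::now()` and persist the job only when the function returns true.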

Result:

  • Queue view shows accurate job states after server restart
  • No more misleading "processing" indicators
  • Users can see which jobs were interrupted by restart
  • Console logs confirm cleanup: "Marked stale job as failed: {job_id}"

The fix runs automatically on every server startup when loading persisted jobs.
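
Because the cleanup runs on every startup, it helps that the pass is idempotent: once a job is FAILED, later startups leave it alone. The sketch below illustrates that property over a whole list of loaded jobs. It is not the project's code: `Job`, `failStaleJobs`, and the `persist` callback (standing in for saveJobToFile()) are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the project's types.
enum class GenerationStatus { QUEUED, PROCESSING, COMPLETED, FAILED };

struct Job {
    std::string id;
    GenerationStatus status = GenerationStatus::QUEUED;
    std::string errorMessage;
};

// Fails every job still marked PROCESSING and hands each changed job to
// `persist` (a stand-in for saveJobToFile). Returns the number of jobs changed.
std::size_t failStaleJobs(std::vector<Job>& jobs,
                          const std::function<void(const Job&)>& persist) {
    std::size_t changed = 0;
    for (Job& job : jobs) {
        if (job.status == GenerationStatus::PROCESSING) {
            job.status = GenerationStatus::FAILED;
            job.errorMessage = "Server restarted while job was processing";
            persist(job);  // write the corrected status back
            ++changed;
        }
    }
    return changed;
}
```

Calling `failStaleJobs` a second time on the same list changes nothing and persists nothing, which mirrors what a second restart would do to jobs that were already cleaned up.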

Labels: bug, ui
Milestone: none
Assignees: none
1 participant