SUMMARY
After a task failed, the workflow got stuck and didn't execute the next task. The workflow was still in the running state, but no task was running anymore (all tasks were either `succeeded` or `failed`; I checked with `st2 execution get <ID>`). The workflow stayed in this state for more than 10 hours before I manually canceled it.
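The stuck condition described above (workflow still running, every task already finished) can be written as a small check. This is only a sketch: the function name `looks_stuck` and the idea of passing plain status strings are assumptions for illustration, not part of the st2 API, and the set of terminal statuses contains only the two seen in this report.

```python
# Hypothetical helper; names and input shape are assumptions, not st2's API.
# Only the two terminal statuses seen in this report are listed; extend as needed.
TERMINAL_STATUSES = frozenset(["succeeded", "failed"])


def looks_stuck(workflow_status, task_statuses, terminal=TERMINAL_STATUSES):
    """Return True if the workflow is running but every task has finished."""
    if workflow_status != "running":
        return False
    # An empty task list is not treated as stuck: nothing was scheduled yet.
    return bool(task_statuses) and all(s in terminal for s in task_statuses)
```

Fed with the statuses read from `st2 execution get <ID>`, a check like this could flag such executions long before the 10 hours it took me to notice.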
When checking st2workflowengine.log, I discovered this (the important line is the last one):
2019-06-04 15:58:28,387 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Publish task "comment_ticket_processing_failure", route "0", with status "succeeded" to conductor.
2019-06-04 15:58:33,330 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Identifying next set (0) of tasks after completion of task "comment_ticket_processing_failure", route "0".
2019-06-04 15:58:33,343 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Identified the following set of tasks to execute next: process_order (route 0)
2019-06-04 15:58:33,343 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Mark task "process_order", route "0", in conductor as running.
2019-06-04 15:58:36,275 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Requesting execution for task "process_order", route "0".
2019-06-04 15:58:36,276 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Processing task execution request for task "process_order", route "0".
2019-06-04 15:58:52,827 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Task execution "5cf69534ba584539159a5e31" created for task "process_order", route "0".
2019-06-04 15:58:56,105 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Action execution "5cf69540ba584539159a5e33" requested for task "process_order", route "0".
2019-06-04 15:58:59,145 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] Identifying next set (1) of tasks for workflow execution in status "running".
2019-06-04 15:58:59,151 140343041628752 INFO workflows [-] [5cf66aa2ba5845383247454d] No tasks identified to execute next.
2019-06-04 15:58:59,562 140343037586640 INFO workflows [-] [5cf66aa2ba5845383247454d] Action execution "5cf69540ba584539159a5e33" for task "process_order" is updated and in "scheduled" state.
2019-06-04 15:59:38,901 140343039628944 INFO workflows [-] [5cf66aa2ba5845383247454d] Action execution "5cf69540ba584539159a5e33" for task "process_order" is updated and in "failed" state.
2019-06-04 15:59:40,684 140343039628944 INFO workflows [-] [5cf66aa2ba5845383247454d] Handling completion of action execution "5cf69540ba584539159a5e33" in status "failed" for task "process_order", route "0".
2019-06-04 15:59:46,846 140343039628944 INFO workflows [-] [5cf66aa2ba5845383247454d] Publish task "process_order", route "0", with status "failed" to conductor.
2019-06-04 15:59:47,796 140343039628944 ERROR consumers [-] VariableMessageQueueConsumer failed to process message: ActionExecutionDB(action={...
After this, there is nothing in the logs for this workflow anymore.
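Note that the ERROR line does not carry the workflow execution ID, so grepping the log for the ID alone misses it. In the excerpt above, the second column (which looks like a thread or greenlet identifier) is the same for the ERROR line and the preceding `process_order` lines. A minimal sketch of a parser that uses this to attribute such errors to a workflow follows; the regular expression is inferred from the lines above only, and the function name `attribute_errors` is hypothetical.

```python
import re

# Line layout inferred from the excerpt above; an assumption, not a documented format.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<thread>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[-\] "
    r"(?:\[(?P<wf_id>[0-9a-f]{24})\] )?(?P<msg>.*)$"
)


def attribute_errors(lines):
    """Yield (workflow_id, timestamp, message) for each ERROR line.

    Heuristic: an ERROR line without a workflow ID is attributed to the
    last workflow ID that was logged with the same second-column value.
    """
    last_wf_by_thread = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip continuation lines such as tracebacks
        if m.group("wf_id"):
            last_wf_by_thread[m.group("thread")] = m.group("wf_id")
        if m.group("level") == "ERROR":
            wf_id = m.group("wf_id") or last_wf_by_thread.get(m.group("thread"))
            yield wf_id, m.group("ts"), m.group("msg")
```

This is a heuristic: if the same identifier is reused for another workflow between the two lines, the attribution would be wrong.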
Do you have any idea what the root cause is, or how we can prevent this from happening?
ISSUE TYPE
Bug Report
STACKSTORM VERSION
st2 3.0.1, on Python 2.7.12
OS / ENVIRONMENT / INSTALL METHOD
Custom HA install on Ubuntu 16.04
STEPS TO REPRODUCE
It's hard to reproduce because it doesn't happen every time. However, it has already happened to me twice. I think it can happen with any workflow.
EXPECTED RESULTS
If this error happens, processing of the message should be retried, or at the very least the workflow shouldn't get stuck.
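To illustrate the kind of behavior I have in mind, here is a minimal sketch of retrying a message handler with exponential backoff. It is not how the st2 consumer is implemented; `process_with_retry`, `handler`, and the default values are hypothetical names and numbers chosen for the example.

```python
import time


def process_with_retry(handler, message, attempts=3, delay=1.0, sleep=time.sleep):
    """Call handler(message), retrying on any exception with exponential backoff.

    Hypothetical sketch: re-raises after the last attempt so the caller can
    mark the workflow as failed instead of leaving it in a running state.
    """
    for attempt in range(1, attempts + 1):
        try:
            return handler(message)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait delay, 2*delay, 4*delay, ... between attempts.
            sleep(delay * 2 ** (attempt - 1))
```

The important part is the final `raise`: even when retries are exhausted, the failure is propagated so the workflow can be moved to a terminal state rather than hanging.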
ACTUAL RESULTS
When this error happens, the workflow gets stuck and we have to manually cancel it.