
Workflow error handling

Raydocs workflows handle errors in three layers:
  1. Retry locally
  2. Handle locally
  3. Catch globally
The goal is simple: recover from transient errors when possible, handle expected exceptions close to the node that raised them, and keep a single workflow-level cleanup path for everything else.

1. Retry locally

Each non-trigger node can enable Retry on fail. Use it when a node may fail for temporary reasons:
  • flaky HTTP requests
  • timeouts
  • brief upstream outages
  • rate limits
If a retry succeeds, the node is treated as successful and the workflow does not fail.
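The retry behavior described above can be modeled as a small loop. This is not Raydocs code: Retry on fail is a node setting, and the function name, the attempt limit, and the fixed wait between attempts below are assumptions for illustration. It shows the key property: a later success hides earlier failures, and only exhausting every attempt lets the error escape.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(step: Callable[[], T], max_attempts: int = 3, wait_seconds: float = 0.0) -> T:
    """Hypothetical model of Retry on fail (not a Raydocs API).

    Calls `step` up to `max_attempts` times in total. The first successful
    result is returned, so earlier failures never surface to the caller.
    If every attempt fails, the last exception is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:  # this sketch retries on any exception
            last_error = exc
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # fixed wait between attempts (an assumption)
    assert last_error is not None  # set by the final failed attempt
    raise last_error


# Usage: an upstream call that times out twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


print(run_with_retry(flaky_fetch, max_attempts=3))  # ok
```

Whether Raydocs retries on every error type or only on some is not stated here; this sketch retries on any exception.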

2. Handle locally

Each non-trigger node also has an On error policy:
  • Fail workflow
  • Continue
  • Route to error output

Continue

Use Continue when the failure is acceptable and the workflow should move on. The node emits a structured error payload on its main output, and the workflow keeps running. This option is available only on nodes that have a normal main output.
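A minimal sketch of what Continue means for downstream nodes, assuming a simple dictionary item format. The `ok`/`data`/`error` keys and the payload fields are hypothetical, not the actual Raydocs error schema. Success and failure both produce an item on main, so the next node has to check which one it received.

```python
from typing import Any, Callable, Dict


def run_with_continue(node_id: str, step: Callable[[], Any]) -> Dict[str, Any]:
    """Hypothetical model of the Continue policy.

    Success and failure both produce an item on the main output; on failure
    the item carries a structured error payload instead of data, and no
    exception escapes, so the workflow keeps running.
    """
    try:
        return {"ok": True, "data": step()}
    except Exception as exc:
        return {
            "ok": False,
            "error": {  # payload shape is an assumption, not Raydocs' schema
                "node_id": node_id,
                "type": type(exc).__name__,
                "message": str(exc),
            },
        }


# Usage: an optional enrichment step fails, and the downstream node branches on the payload.
def enrich_metadata() -> Dict[str, Any]:
    raise ValueError("field missing")


item = run_with_continue("enrich_metadata", enrich_metadata)
if item["ok"]:
    print("enriched:", item["data"])
else:
    print("skipped:", item["error"]["message"])  # skipped: field missing
```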

Route to error output

Use Route to error output when you want a node-specific error branch. This is useful for local recovery logic such as:
  • fallback parsing
  • alternate API call
  • local notification
  • skipping one item in a loop
When a node routes to its error output, the failure is considered handled and does not trigger workflow-level failure handling. This holds only if the node's error output is connected to a downstream node. If no error path is connected, the failure is treated as unhandled and falls back to workflow-level failure handling.
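The routing rule above, including the fallback when nothing is connected, can be sketched as follows. The function, the `UnhandledNodeError` class, and the idea of passing the connected error branch as a callable are all hypothetical modeling choices, not part of Raydocs.

```python
import json
from typing import Any, Callable, Optional


class UnhandledNodeError(Exception):
    """Hypothetical signal that a failure escaped local handling."""


def run_with_error_route(
    step: Callable[[], Any],
    on_main: Callable[[Any], Any],
    on_error: Optional[Callable[[Exception], Any]] = None,
) -> Any:
    """Hypothetical model of Route to error output.

    `on_error` stands for whatever is connected to the node's error output.
    If it is None (nothing connected), the failure is re-raised so that
    workflow-level handling takes over.
    """
    try:
        result = step()
    except Exception as exc:
        if on_error is None:
            raise UnhandledNodeError(str(exc)) from exc
        return on_error(exc)  # handled locally: no workflow-level failure
    # Only the node's own failure is routed; errors raised by on_main propagate normally.
    return on_main(result)


# Usage: strict JSON parsing with a fallback branch on the error output.
def parse_strict() -> Any:
    return json.loads("{not valid json")


result = run_with_error_route(
    parse_strict,
    on_main=lambda data: data,
    on_error=lambda exc: {"fallback": True, "reason": type(exc).__name__},
)
print(result)  # {'fallback': True, 'reason': 'JSONDecodeError'}
```

Calling `run_with_error_route(parse_strict, on_main=...)` without `on_error` raises `UnhandledNodeError` instead, mirroring the unconnected case.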

3. Catch globally with On Workflow Error

Add an On Workflow Error trigger when you want a single cleanup path inside the same workflow. This trigger runs only when a failure remains unhandled, which means all of the following are true:
  • retries are exhausted
  • the node's On error policy still resolves to Fail workflow
  • no local Continue or Route to error output absorbed the failure
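The conditions above can be read as a fixed decision order across the three layers. The sketch below encodes that order as a pure function; the enum, parameter names, and outcome strings are hypothetical labels chosen for illustration.

```python
from enum import Enum


class OnError(Enum):
    FAIL_WORKFLOW = "Fail workflow"
    CONTINUE = "Continue"
    ROUTE_TO_ERROR_OUTPUT = "Route to error output"


def resolve_node_failure(
    retry_succeeded: bool,
    policy: OnError,
    error_output_connected: bool,
    has_on_workflow_error: bool,
) -> str:
    """Hypothetical decision order for a failing node, following the three layers."""
    # Layer 1: a successful retry means the node did not fail.
    if retry_succeeded:
        return "node succeeded"
    # Layer 2: local handling absorbs the failure.
    if policy is OnError.CONTINUE:
        return "continued with error payload on main"
    if policy is OnError.ROUTE_TO_ERROR_OUTPUT and error_output_connected:
        return "handled on error branch"
    # Layer 3: the failure is unhandled (Fail workflow, or an unconnected error output).
    if has_on_workflow_error:
        return "workflow fails; On Workflow Error runs"
    return "workflow fails"


print(resolve_node_failure(False, OnError.ROUTE_TO_ERROR_OUTPUT, False, True))
# workflow fails; On Workflow Error runs
```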
Typical uses:
  • delete temporary files
  • rollback partial side effects
  • notify Slack or email
  • create an incident log
In short:
  • Use Retry on fail for transient errors.
  • Use Route to error output for node-specific recovery.
  • Use On Workflow Error for global cleanup.

What data does On Workflow Error receive?

The trigger receives structured information about the failure, including:
  • failed node id
  • failed step run id
  • failed node type
  • error message
  • error details
  • original run input
  • workflow run context
  • retry attempt information
This makes it possible to build cleanup logic without manually wiring every possible node failure.
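As a sketch of how a cleanup path might consume this data, the dataclass below mirrors the listed fields. The field names, the `attempts_made` counter, and the convention of storing temp file paths under `run_context["temp_files"]` are assumptions; check the actual trigger output in Raydocs before relying on any of them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WorkflowErrorEvent:
    """Hypothetical shape of the On Workflow Error payload; field names are assumptions."""
    failed_node_id: str
    failed_step_run_id: str
    failed_node_type: str
    error_message: str
    error_details: Dict[str, Any] = field(default_factory=dict)
    run_input: Dict[str, Any] = field(default_factory=dict)
    run_context: Dict[str, Any] = field(default_factory=dict)
    attempts_made: int = 0


def build_alert(event: WorkflowErrorEvent) -> str:
    """One alert line summarizing the failure, e.g. for Slack or email."""
    return (
        f"Node {event.failed_node_id} ({event.failed_node_type}) failed "
        f"after {event.attempts_made} attempt(s): {event.error_message} "
        f"[step run {event.failed_step_run_id}]"
    )


def temp_files_to_delete(event: WorkflowErrorEvent) -> List[str]:
    """Assumes earlier nodes recorded their temp files under run_context['temp_files']."""
    return list(event.run_context.get("temp_files", []))


event = WorkflowErrorEvent(
    failed_node_id="fetch_invoice",
    failed_step_run_id="step-42",
    failed_node_type="http_request",
    error_message="503 Service Unavailable",
    run_context={"temp_files": ["/tmp/invoice.pdf"]},
    attempts_made=3,
)
print(build_alert(event))
# Node fetch_invoice (http_request) failed after 3 attempt(s): 503 Service Unavailable [step run step-42]
print(temp_files_to_delete(event))  # ['/tmp/invoice.pdf']
```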

Example strategy

For an HTTP-heavy workflow:
  • enable retries on the HTTP node
  • use a local error branch only if one request has a special fallback
  • add On Workflow Error to clean up temp files and send one final alert if the workflow still fails
That gives you resilience without losing a clear global failure path.
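The strategy above can be simulated end to end in a few lines. This is a hypothetical model, not Raydocs configuration: each step gets a retry loop in which, in this sketch, only `ConnectionError` counts as transient, and the outer `except` stands in for the On Workflow Error trigger by deleting temp files and recording exactly one alert.

```python
import os
import tempfile
from typing import Callable, List


def run_workflow(
    steps: List[Callable[[], None]],
    temp_files: List[str],
    alerts: List[str],
    max_attempts: int = 3,
) -> bool:
    """Hypothetical HTTP-heavy workflow: per-step retries plus one global cleanup path."""
    try:
        for step in steps:
            for attempt in range(1, max_attempts + 1):
                try:
                    step()
                    break  # step succeeded, move to the next one
                except ConnectionError:  # treated as transient in this sketch
                    if attempt == max_attempts:
                        raise  # retries exhausted: failure stays unhandled
        return True
    except Exception as exc:
        # Stand-in for On Workflow Error: clean up, then send a single final alert.
        for path in temp_files:
            if os.path.exists(path):
                os.remove(path)
        alerts.append(f"workflow failed: {exc}")
        return False


# Usage: a download that never recovers.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
alerts: List[str] = []


def download() -> None:
    raise ConnectionError("upstream unavailable")


ok = run_workflow([download], [tmp_path], alerts)
print(ok, os.path.exists(tmp_path), alerts)
# False False ['workflow failed: upstream unavailable']
```

Non-transient errors skip the retry loop and go straight to the cleanup path, which matches the idea of saving retries for temporary failures.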