Workflow error handling

Raydocs workflows handle errors in three layers:

Retry locally
Handle locally
Catch globally

The goal is simple: recover from transient errors when possible, handle expected exceptions close to the node that raised them, and keep a single workflow-level cleanup path for everything else.

1. Retry locally

Each non-trigger node can enable Retry on fail. Use it when a node may fail for temporary reasons:

flaky HTTP requests
timeouts
brief upstream outages
rate limits

If a retry succeeds, the node is treated as successful and the workflow does not fail.
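
The retry behavior can be pictured as a loop around the node's work. The following is a minimal Python sketch, not Raydocs code: `run_with_retry`, `max_attempts`, `base_delay`, and the exponential backoff are assumptions made for illustration, since this page does not say how Raydocs spaces its retries.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_retry(
    work: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, int]:
    """Run `work` until it succeeds or attempts run out.

    Returns (result, attempts_used). Re-raises the last error once
    `max_attempts` is reached. All names here are hypothetical.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return work(), attempt
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: the failure moves to the next layer
            # Assumed backoff: double the wait after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

A call that fails twice and then succeeds returns normally, which mirrors the rule that a successful retry counts as a successful node.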

2. Handle locally

Each non-trigger node also has an On error policy:

Fail workflow
Continue
Route to error output

Continue

Use Continue when the failure is acceptable and the workflow should move on. The node emits a structured error payload on main, and the workflow keeps running. This option is available only on nodes that already have a normal main output.

Route to error output

Use Route to error output when you want a node-specific error branch. This is useful for local recovery logic such as:

fallback parsing
alternate API call
local notification
skipping one item in a loop

When a node routes a failure to a connected error output, the failure is considered handled and does not trigger workflow-level failure handling. To use this policy safely, connect the node’s error output to a downstream node. If no error path is connected, the failure is treated as unhandled and falls back to workflow-level failure handling.
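
One way to read the three policies is as a small decision function. The sketch below is an illustrative Python model, not the Raydocs implementation: the `OnError` and `Outcome` names and the `error_output_connected` flag are hypothetical, chosen only to mirror the behavior described in this section.

```python
from enum import Enum


class OnError(Enum):
    FAIL_WORKFLOW = "fail_workflow"
    CONTINUE = "continue"
    ROUTE_TO_ERROR_OUTPUT = "route_to_error_output"


class Outcome(Enum):
    CONTINUE_ON_MAIN = "error payload emitted on main"
    ROUTED_TO_ERROR_BRANCH = "handled by the node's error branch"
    UNHANDLED = "falls back to workflow-level handling"


def resolve_failure(policy: OnError, error_output_connected: bool = False) -> Outcome:
    """Map a node failure (after any retries) to what happens next."""
    if policy is OnError.CONTINUE:
        return Outcome.CONTINUE_ON_MAIN
    if policy is OnError.ROUTE_TO_ERROR_OUTPUT and error_output_connected:
        return Outcome.ROUTED_TO_ERROR_BRANCH
    # Fail workflow, or a route with nothing connected to the error output.
    return Outcome.UNHANDLED
```

In this model, only `Outcome.UNHANDLED` leaves the failure for workflow-level handling.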

3. Catch globally with On Workflow Error

Add an On Workflow Error trigger when you want a single cleanup path inside the same workflow. This trigger runs only when a failure remains unhandled, which means all of the following are true:

retries are exhausted
the node still resolves to Fail workflow
no local continue or route absorbed the failure

Typical uses:

delete temporary files
roll back partial side effects
notify Slack or email
create an incident log

Recommended mental model

Use Retry on fail for transient errors.
Use Route to error output for node-specific recovery.
Use On Workflow Error for global cleanup.

What data does On Workflow Error receive?

The trigger receives structured information about the failure, including:

failed node ID
failed step run ID
failed node type
error message
error details
original run input
workflow run context
retry attempt information

This makes it possible to build cleanup logic without manually wiring every possible node failure.
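
To make the list above concrete, here is one hypothetical shape for that payload, with two small helpers that read it. The field names, types, and the `temp_files` context key are assumptions for illustration; the actual Raydocs payload keys may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WorkflowErrorEvent:
    # Hypothetical field names mirroring the list above.
    failed_node_id: str
    failed_step_run_id: str
    failed_node_type: str
    error_message: str
    error_details: Dict[str, Any] = field(default_factory=dict)
    run_input: Dict[str, Any] = field(default_factory=dict)
    run_context: Dict[str, Any] = field(default_factory=dict)
    retry_attempts: int = 0


def build_alert(event: WorkflowErrorEvent) -> str:
    """Format one final alert line from the failure data."""
    return (
        f"Node {event.failed_node_id} ({event.failed_node_type}) failed "
        f"after {event.retry_attempts} retries: {event.error_message}"
    )


def files_to_clean(event: WorkflowErrorEvent) -> List[str]:
    """Read temp file paths from the run context (assumed key: 'temp_files')."""
    return list(event.run_context.get("temp_files", []))
```

Because every failure arrives in the same shape, a single handler like this can serve any node in the workflow.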

Example strategy

For an HTTP-heavy workflow:

enable retries on the HTTP node
use a local error branch only if one request has a special fallback
add On Workflow Error to clean up temp files and send one final alert if the workflow still fails

That gives you resilience without losing a clear global failure path.
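
As a rough analogy in plain Python (not Raydocs configuration), the three layers compose like nested error handling. Every name below (`run_http_workflow`, `fetch`, `fallback`, `on_workflow_error`) is hypothetical, and retry delays are left out to keep the sketch short.

```python
from typing import Callable, Optional


def run_http_workflow(
    fetch: Callable[[], str],
    on_workflow_error: Callable[[Exception], None],
    fallback: Optional[Callable[[], str]] = None,
    max_attempts: int = 3,
) -> Optional[str]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    # Layer 1: retry the HTTP step locally.
    for _ in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
    # Layer 2: a node-specific error branch, only if one is wired up.
    if fallback is not None:
        try:
            return fallback()
        except Exception as exc:
            last_error = exc
    # Layer 3: nothing absorbed the failure, so run the global cleanup path.
    assert last_error is not None
    on_workflow_error(last_error)
    return None
```

The global handler runs only on the last path, after both the retries and the optional fallback have failed.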
