Routing and Correctness

  • Always define a Default route in Switch nodes
  • Prefer mutually exclusive branch conditions
  • If conditions overlap, explicitly order branches to reflect intended priority
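
The priority rule above can be sketched as a first-match evaluator. This is a minimal illustration in Python, not the platform's actual Switch implementation; the route function, the branch names, and the attempts variable are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical model: a branch is (name, predicate). List order encodes priority.
Branch = Tuple[str, Callable[[Dict[str, Any]], bool]]


def route(variables: Dict[str, Any], branches: List[Branch], default: str = "default") -> str:
    """Return the first branch whose predicate matches, else the default route."""
    for name, predicate in branches:
        if predicate(variables):
            return name
    return default


# These two conditions overlap when a call is busy on the third attempt,
# so "give_up" is listed first to make the intended priority explicit.
branches: List[Branch] = [
    ("give_up", lambda v: v.get("attempts", 0) >= 3),
    ("retry_later", lambda v: v.get("connectivity_status") == "busy"),
]
```

With this ordering, a run with attempts = 3 and a busy status takes give_up rather than retry_later, and anything that matches no branch lands on the default route instead of being dropped.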

Rate Limiting and User Experience

  • Configure retries carefully to avoid excessive contact attempts
  • Add a Wait node between telephony calls so that a misconfiguration cannot trigger multiple calls within a short interval

Error Handling

  • Review and configure retry settings for every node
  • Prefer Continue on retry exhaust and implement explicit stop logic via workflow routing where required

Types and Timestamps

  • Keep workflow variable types aligned with the values being produced (notably agent outputs)
  • For timestamp comparisons, enforce the UTC format: %Y-%m-%dT%H:%M:%S.%fZ
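
As an illustration of the timestamp rule, the sketch below formats and parses values in the required UTC format using Python's standard datetime module. The helper names to_utc_string and parse_utc_string are hypothetical and not part of the workflow platform.

```python
from datetime import datetime, timezone

UTC_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def to_utc_string(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC string in the required format."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess.
        raise ValueError("naive datetime: attach a timezone before formatting")
    return dt.astimezone(timezone.utc).strftime(UTC_FORMAT)


def parse_utc_string(value: str) -> datetime:
    """Parse a string in the required format into a timezone-aware UTC datetime."""
    return datetime.strptime(value, UTC_FORMAT).replace(tzinfo=timezone.utc)
```

Comparing the parsed datetime values, rather than raw strings in mixed formats, avoids ordering mistakes when one side carries a local offset or omits fractional seconds.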

Infinite Loop Prevention

An infinite loop occurs when a workflow routes back to a previous node and can repeat indefinitely. In the worst case, this results in high-frequency repeated telephony calls (e.g., calls firing every second).

High-Risk Loop Configurations

The following patterns must be avoided to prevent runaway call loops.

Routing the Default path (or any branch) back to a Telephony node without a Wait is the most common cause of high-frequency repeated calls. If the Default path (or any branch) routes back to a Telephony node, a Wait node is required before the Telephony node.

Switch nodes must not contain incomplete conditions such as:
  • Variable not selected / missing variable input
  • Comparisons where the left-hand side becomes null (e.g., null == busy, null == no_answer)
This class of misconfiguration can prevent intended branches from matching and cause execution to fall through to Default, which may route back into a retry loop.

Observed incident: A Switch node condition was saved without selecting connectivity_status, creating an invalid comparison that did not correctly match busy/no-answer cases. Execution fell to the Default branch, which routed back to Telephony without a Wait.
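
A pre-save check for incomplete conditions might look like the following sketch. The condition layout (a dict with variable, operator, and value keys) is an assumption made for illustration; the actual stored format of Switch conditions is not described here.

```python
from typing import Any, Dict, List, Tuple


def find_incomplete_conditions(conditions: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (index, problem) pairs for conditions that cannot match reliably."""
    problems = []
    for index, cond in enumerate(conditions):
        if not cond.get("variable"):
            # Without a variable, the left-hand side evaluates to null at run time.
            problems.append((index, "no variable selected"))
        elif cond.get("value") in (None, ""):
            problems.append((index, "no comparison value"))
    return problems
```

Running a check like this before a workflow version is committed would have flagged the missing connectivity_status selection in the incident described above.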

Workflow Loop Alerts (Slack)

Automated loop-risk notifications are posted in #workflow-loop-alerts.

What Triggers an Alert

Alerts fire when the system detects loop-like patterns such as high call frequency for a workflow run (e.g., "≥ 3 calls in 20 minutes"). Alerts typically include:
  • workflow_run_id
  • org_id
  • count
  • reason
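
To make the example threshold concrete, the sketch below checks whether any 20-minute window contains at least 3 calls for a single workflow run. It illustrates the rule only and is not the alerting system's actual logic; the function name and defaults are assumptions taken from the example above.

```python
from datetime import datetime, timedelta
from typing import Iterable


def exceeds_call_rate(
    call_times: Iterable[datetime],
    max_calls: int = 3,
    window: timedelta = timedelta(minutes=20),
) -> bool:
    """Return True if any interval of length `window` (inclusive) holds max_calls or more calls."""
    ordered = sorted(call_times)
    start = 0
    for end, ts in enumerate(ordered):
        # Shrink the window from the left until it spans at most `window`.
        while ts - ordered[start] > window:
            start += 1
        if end - start + 1 >= max_calls:
            return True
    return False
```

The window boundary is treated as inclusive here, so calls at 0, 10, and 20 minutes count as three calls in 20 minutes.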

Response Procedure

When an alert appears in #workflow-loop-alerts:

1. Containment

  • If repeated calls are ongoing: Pause the workflow campaign immediately
  • If impact is severe or continuing: Cancel the campaign

2. Identify the Loop Path

Find the repeating node sequence. Look specifically for:
  • Missing Wait nodes in retry loops
  • Always-true or invalid conditions (including null comparisons)
  • Empty or incomplete branches
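
The first check in this list can be automated with a simple graph search. The sketch below assumes a hypothetical export of the workflow as a node-type map and an adjacency list; it reports Telephony nodes that can reach themselves without passing through a Wait node.

```python
from typing import Dict, List


def telephony_loops_without_wait(
    node_types: Dict[str, str], edges: Dict[str, List[str]]
) -> List[str]:
    """Return telephony nodes that can route back to themselves with no Wait in between."""
    risky = []
    for start, kind in node_types.items():
        if kind != "telephony":
            continue
        stack = list(edges.get(start, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                risky.append(start)  # found a cycle that contains no Wait node
                break
            if node in seen or node_types.get(node) == "wait":
                continue  # do not search past a Wait node
            seen.add(node)
            stack.extend(edges.get(node, []))
    return risky
```

Any node name this returns marks a loop path worth inspecting first; an empty result means every route back to a Telephony node passes through at least one Wait.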

3. Prevent Repeat Impact

  • Use the recovery feature (launching soon) to filter out affected users and restart with a corrected workflow version
  • Commit a new workflow version and run a new campaign against the committed version

Common Bugs and How to Fix Them

1. High-frequency repeated calls caused by Default routing back to Telephony

Symptom: Multiple calls to the same user in seconds or minutes.
Root causes:
  • Default branch (or a fallback branch) routes back to Telephony
  • No Wait node exists between Switch and Telephony
Prevention: Default must not route back to Telephony without a Wait.

2. Invalid/incomplete conditions in Switch nodes (null comparisons)

Symptom: Expected branch does not match; execution falls to Default unexpectedly. A loop occurs if Default routes back to Telephony.
Root causes:
  • A condition is saved without selecting the variable (e.g., missing connectivity_status)
  • The system evaluates comparisons like null == busy or null == no_answer
Prevention:
  • Verify every condition has a selected variable and valid comparison value
  • Do not use Default as a retry mechanism

3. Telephony outputs become null/empty and overwrite workflow variables

Symptom: After a call is not picked up (or retries exhaust), workflow variables unexpectedly become empty/null. Subsequent nodes fail or route incorrectly due to missing data.
Root cause: Telephony webhook/final agent variable outputs were null in non-connected cases, and output mapping overwrote existing workflow variables with null/empty values.
Fix pattern: Prevent workflow variables from being overwritten when the final agent variable output is null.
Prevention:
  • Only map back outputs that should change
  • Avoid mapping static identifiers (loan number, username) in output mapping
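
The fix pattern can be sketched as a merge that skips null or empty outputs and never touches protected identifiers. The apply_outputs helper and the variable names below are hypothetical; the platform's real output mapping may be configured differently.

```python
from typing import Any, Dict, FrozenSet


def apply_outputs(
    variables: Dict[str, Any],
    outputs: Dict[str, Any],
    protected: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Return a copy of variables updated only with non-empty, non-protected outputs."""
    updated = dict(variables)
    for key, value in outputs.items():
        if key in protected:
            continue  # static identifiers are never overwritten
        if value is None or value == "":
            continue  # keep the existing value when the agent returned nothing
        updated[key] = value
    return updated
```

For a call that was not picked up, an output of {"summary": "", "connectivity_status": "no_answer"} would update connectivity_status while leaving the previous summary intact.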