# Fallbacks
**All robots can fail, but smart robots recover.**
Fallbacks are the **Self-Healing Mechanism** of a Sugarcoat component. They define the specific set of [Actions](actions.md) to execute automatically when a failure is detected in the component's [Health Status](status.md).
Instead of crashing or freezing when an error occurs, a Component can be configured to attempt intelligent recovery strategies:
* *Algorithm stuck?* $\rightarrow$ **Switch** to a simpler backup.
* *Driver disconnected?* $\rightarrow$ **Re-initialize** the hardware.
* *Sensor timeout?* $\rightarrow$ **Restart** the node.
```{figure} /_static/images/diagrams/fallbacks_dark.png
:class: dark-only
:alt: fig-fallbacks
:align: center
The Self-Healing Loop
```
```{figure} /_static/images/diagrams/fallbacks_light.png
:class: light-only
:alt: fig-fallbacks
:align: center
The Self-Healing Loop
```
## The Recovery Hierarchy
When a component reports a failure, Sugarcoat doesn't just panic. It looks for a registered fallback strategy in a fixed order of priority, from the handler specific to that failure type down to a generic catch-all.
This allows you to define granular responses for different types of errors.
- {material-regular}`link_off;1.5em;sd-text-primary` 1. System Failure `on_system_fail`

  **The Context is Broken.** External failures such as missing input topics or a full disk.

  *Example Strategy:* Wait for data, or restart the data pipeline.

- {material-regular}`error;1.5em;sd-text-danger` 2. Component Failure `on_component_fail`

  **The Node is Broken.** Internal crashes or hardware disconnects.

  *Example Strategy:* Restart the component lifecycle or re-initialize drivers.

- {material-regular}`warning;1.5em;sd-text-warning` 3. Algorithm Failure `on_algorithm_fail`

  **The Logic is Broken.** The code ran but couldn't solve the problem (e.g., no path was found).

  *Example Strategy:* Reconfigure parameters (looser tolerance) or switch algorithms.

- {material-regular}`help_center;1.5em;sd-text-secondary` 4. Catch-All `on_fail`

  **Generic Safety Net.** If no specific handler is registered for the failure, this fallback is executed.

  *Example Strategy:* Log an error or stop the robot.
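As a rough mental model of this lookup (a simplified sketch, not Sugarcoat's internal code), the handler registered for the reported failure type wins, and `on_fail` only runs when no specific handler exists. The failure-type constants and the `select_fallback` function below are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical failure categories (illustrative names, not Sugarcoat constants).
SYSTEM = "system"
COMPONENT = "component"
ALGORITHM = "algorithm"

# Each failure category maps to the name of its specific handler.
SPECIFIC_HANDLER: Dict[str, str] = {
    SYSTEM: "on_system_fail",
    COMPONENT: "on_component_fail",
    ALGORITHM: "on_algorithm_fail",
}


def select_fallback(
    failure: str, registered: Dict[str, Callable[[], bool]]
) -> Optional[str]:
    """Return the name of the handler to run for `failure`, or None if nothing applies."""
    specific = SPECIFIC_HANDLER.get(failure)
    if specific is not None and specific in registered:
        return specific  # a specific handler always wins
    if "on_fail" in registered:
        return "on_fail"  # generic safety net
    return None  # no fallback registered at all


# Only a component-level handler and the catch-all are registered here.
handlers = {"on_component_fail": lambda: True, "on_fail": lambda: False}
print(select_fallback(COMPONENT, handlers))  # on_component_fail
print(select_fallback(ALGORITHM, handlers))  # on_fail
```

Registering only `on_fail` is therefore enough to cover every failure type, while specific handlers let you refine the response per category.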
## Recovery Strategies
A Fallback isn't just a single function call. It is a robust policy defined by **Actions** and **Retries**.
### 1. The Persistent Retry (Single Action)
*Try, try again.*
The system executes the action repeatedly until it returns `True` (success) or `max_retries` is reached.
```python
from ros_sugar.actions import restart

# Try to restart the driver up to 3 times (`driver` is an existing component)
driver.on_component_fail(fallback=restart(component=driver), max_retries=3)
```
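The retry semantics can be sketched in plain Python. This is a simplified model, not the `ros_sugar` implementation: `run_with_retries` and `flaky_restart` are hypothetical, and the sketch assumes that `max_retries` counts total attempts and that `None` means "retry forever".

```python
from typing import Callable, Optional


def run_with_retries(action: Callable[[], bool], max_retries: Optional[int] = None) -> bool:
    """Call `action` until it returns True or the attempt budget is spent.

    Assumption: `max_retries` counts total attempts; None means unlimited.
    """
    attempts = 0
    while max_retries is None or attempts < max_retries:
        attempts += 1
        if action():
            return True  # recovery succeeded, stop retrying
    return False  # budget exhausted without success


calls = {"n": 0}


def flaky_restart() -> bool:
    # Hypothetical stand-in for a restart action: fails twice, then succeeds.
    calls["n"] += 1
    return calls["n"] >= 3


print(run_with_retries(flaky_restart, max_retries=3))  # True
print(calls["n"])  # 3
```

Note that with an unlimited budget, an action that never succeeds keeps the loop running indefinitely, which is worth keeping in mind when leaving `max_retries` unset.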
### 2. The Escalation Ladder (List of Actions)
*If at first you don't succeed, try something stronger.*
You can define a sequence of actions. If the first one fails (after its retries), the system moves on to the next. For example, a navigation planner might escalate like this:
1. **Clear Costmaps** (Low cost, fast)
2. **Reconfigure Planner** (Medium cost)
3. **Restart Planner Node** (High cost, slow)
```python
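from ros_sugar.actions import restart
from ros_sugar.core import Action

# Tiered Recovery for a Navigation Planner
# (clear_costmaps and switch_to_fallback stand for methods of your own planner component)
planner.on_algorithm_fail(
    fallback=[
        Action(method=planner.clear_costmaps),      # Step 1: cheap and fast
        Action(method=planner.switch_to_fallback),  # Step 2: reconfigure / switch algorithm
        restart(component=planner),                 # Step 3: expensive and slow
    ],
    max_retries=1,  # Try each step once before escalating
)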
```
### 3. The "Give Up" State
If all strategies fail (all retries of all actions exhausted), the component enters the **Give Up** state and executes the `on_giveup` action. This is the "End of Line", usually used to park the robot safely or alert a human.
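The escalation ladder and the Give Up state together form a small control loop: walk the list in order, give each action its retry budget, and run the give-up handler only once every attempt has failed. The sketch below models that loop in plain Python. It is not the `ros_sugar` implementation; `escalate` and the three planner functions are hypothetical stand-ins, and it assumes `max_retries` counts attempts per action.

```python
from typing import Callable, List, Optional


def escalate(
    actions: List[Callable[[], bool]],
    max_retries: int,
    on_giveup: Optional[Callable[[], None]] = None,
) -> bool:
    """Try each action in order, up to `max_retries` attempts each.

    Returns True as soon as one attempt succeeds; otherwise runs
    `on_giveup` (the Give Up state) and returns False.
    """
    for action in actions:
        for _ in range(max_retries):
            if action():
                return True  # recovered: no need to escalate further
    # Every attempt of every action failed: enter the Give Up state.
    if on_giveup is not None:
        on_giveup()
    return False


trace: List[str] = []


def clear_costmaps() -> bool:
    trace.append("clear_costmaps")
    return False  # hypothetical: clearing did not help


def switch_to_fallback() -> bool:
    trace.append("switch_to_fallback")
    return True  # hypothetical: the backup planner found a path


def restart_planner() -> bool:
    trace.append("restart_planner")
    return True


ok = escalate(
    [clear_costmaps, switch_to_fallback, restart_planner],
    max_retries=1,
    on_giveup=lambda: trace.append("giveup"),
)
print(ok, trace)  # True ['clear_costmaps', 'switch_to_fallback']
```

Because the loop stops at the first success, the expensive restart never runs here, which is the point of ordering actions from cheapest to most disruptive.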
## How to Implement Fallbacks
### Method A: In Your Recipe (Recommended)
You can configure fallbacks externally without touching the component code. This makes your system modular and reusable.
```python
from ros_sugar.actions import restart, log
from ros_sugar.core import BaseComponent

# 1. Define component
lidar = BaseComponent(component_name='lidar_driver')
# 2. Attach Fallbacks
# If it crashes, restart it (Unlimited retries)
lidar.on_component_fail(fallback=restart(component=lidar))
# If data is missing (System), just log it and wait
lidar.on_system_fail(fallback=log(msg="Waiting for Lidar data..."))
# If all else fails, scream
lidar.on_giveup(fallback=log(msg="LIDAR IS DEAD. STOPPING ROBOT."))
```
### Method B: In Component Class (Advanced)
For tightly coupled recovery logic (like re-handshaking a specific serial protocol), you can define custom fallback methods inside your class.
:::{tip}
Use the `@component_fallback` decorator. It ensures the method is only called when the component is in a valid state to handle it.
:::
```python
from ros_sugar.core import BaseComponent, component_fallback
from ros_sugar.core import Action


class MyDriver(BaseComponent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # self.hw is a placeholder for your own hardware interface (not part of Sugarcoat)
        # Register the custom fallback internally
        self.on_system_fail(
            fallback=Action(self.try_reconnect),
            max_retries=3
        )

    def _execution_step(self):
        try:
            self.hw.read()
            self.health_status.set_healthy()
        except ConnectionError:
            # This trigger starts the fallback loop!
            self.health_status.set_fail_system()

    @component_fallback
    def try_reconnect(self) -> bool:
        """Custom recovery logic"""
        self.get_logger().info("Attempting handshake...")
        if self.hw.connect():
            return True  # Recovery Succeeded!
        return False  # Recovery Failed, will retry...
```
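The tip above describes what `@component_fallback` guarantees, not how it does it. Purely to illustrate the idea, a decorator that gates a method on component state could look like the sketch below. `fallback_guard`, the `is_active` attribute and `FakeDriver` are hypothetical names, and this is not how `ros_sugar` implements its decorator.

```python
import functools
from typing import Callable


def fallback_guard(method: Callable[..., bool]) -> Callable[..., bool]:
    """Hypothetical guard: only run the fallback if the component is active."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs) -> bool:
        if not getattr(self, "is_active", False):
            return False  # not in a state where recovery makes sense
        return method(self, *args, **kwargs)

    return wrapper


class FakeDriver:
    def __init__(self, is_active: bool) -> None:
        self.is_active = is_active
        self.connect_calls = 0

    @fallback_guard
    def try_reconnect(self) -> bool:
        self.connect_calls += 1
        return True  # pretend the handshake worked


print(FakeDriver(is_active=True).try_reconnect())   # True
print(FakeDriver(is_active=False).try_reconnect())  # False
```

Returning `False` from a skipped call keeps the guard compatible with the retry contract, since the caller simply treats it as an unsuccessful attempt.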