Health Status
The Health Status is the heartbeat of a Sugarcoat component. It lets every part of your system explicitly declare its operational state: not just whether it is “Alive” or “Dead,” but how well it is functioning.
Unlike standard ROS2 nodes, Sugarcoat components are Self-Aware. They distinguish between a logic failure (Algorithm Failure), an internal crash or hardware fault (Component Failure), and a missing input (System Failure).
These reports are broadcast back to the system to trigger:
Alerts: Notify the operator of specific issues.
Reflexes: Trigger Events to handle the situation.
Self-Healing: Execute automatic Fallbacks to recover the node.
Status Hierarchy
The status is broadcast using the automatika_ros_sugar/msg/ComponentStatus message. Sugarcoat defines distinct failure levels to help you pinpoint the root cause of an issue.
HEALTHY “Everything is awesome.” The component executed its main loop successfully and produced valid output.
ALGORITHM_FAILURE “I ran, but I couldn’t solve it.” The node is healthy, but the logic failed. Examples: Path planner couldn’t find a path; Object detector found nothing; Optimization solver did not converge.
COMPONENT_FAILURE “I am broken.” An internal crash or hardware issue occurred within this specific node. Examples: Memory leak; Exception raised in a callback; Division by zero.
SYSTEM_FAILURE “I am fine, but my inputs are broken.” The failure is caused by an external dependency. Examples: Input topic is empty or stale; Network is down; Disk is full.
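To make the ordering of these levels concrete, here is a minimal sketch that classifies one execution step by asking three questions in turn: were the inputs usable, did the code raise, and did the logic produce a result? The `HealthLevel` enum and the `classify` helper are hypothetical illustrations, not part of Sugarcoat; the real levels are defined by the `ComponentStatus` message, whose constant values may differ from the ones used here.

```python
from enum import Enum


class HealthLevel(Enum):
    """Hypothetical mirror of the four ComponentStatus levels (not the real message constants)."""
    HEALTHY = "healthy"
    ALGORITHM_FAILURE = "algorithm_failure"
    COMPONENT_FAILURE = "component_failure"
    SYSTEM_FAILURE = "system_failure"


def classify(inputs_ok: bool, raised: bool, produced_result: bool) -> HealthLevel:
    """Hypothetical helper: check inputs first, then crashes, then the logic output."""
    if not inputs_ok:
        return HealthLevel.SYSTEM_FAILURE     # "I am fine, but my inputs are broken."
    if raised:
        return HealthLevel.COMPONENT_FAILURE  # "I am broken."
    if not produced_result:
        return HealthLevel.ALGORITHM_FAILURE  # "I ran, but I couldn't solve it."
    return HealthLevel.HEALTHY                # "Everything is awesome."


# A planner that ran cleanly on fresh data but found no path:
print(classify(inputs_ok=True, raised=False, produced_result=False).name)  # ALGORITHM_FAILURE
```

Checking inputs first matters: a detector fed a stale topic will also "find nothing," and reporting that as an Algorithm Failure would point at the wrong root cause.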
Reporting Status
Every BaseComponent has an internal self.health_status object. You interact with this object inside your _execution_step or callbacks to declare the current state.
1. The Happy Path
Always mark the component as healthy at the end of a successful execution. This resets any previous error counters.
self.health_status.set_healthy()
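The sketch below uses a toy stand-in for `self.health_status` to illustrate what "resets any previous error counters" means in practice. `ToyHealthStatus` and its `consecutive_failures` counter are hypothetical; only the method names and keyword arguments come from this page, and the real Sugarcoat object may track failures differently.

```python
class ToyHealthStatus:
    """Hypothetical stand-in for self.health_status; NOT the Sugarcoat implementation."""

    def __init__(self):
        self.level = "HEALTHY"
        self.names = []
        self.consecutive_failures = 0  # hypothetical error counter

    def _fail(self, level, names):
        self.level = level
        self.names = list(names or [])
        self.consecutive_failures += 1

    def set_healthy(self):
        self.level = "HEALTHY"
        self.names = []
        self.consecutive_failures = 0  # a successful step clears the failure history

    def set_fail_algorithm(self, algorithm_names=None):
        self._fail("ALGORITHM_FAILURE", algorithm_names)

    def set_fail_component(self, component_names=None):
        self._fail("COMPONENT_FAILURE", component_names)

    def set_fail_system(self, topic_names=None):
        self._fail("SYSTEM_FAILURE", topic_names)


status = ToyHealthStatus()
status.set_fail_algorithm(algorithm_names=["A_Star_Planner"])
status.set_fail_algorithm(algorithm_names=["A_Star_Planner"])
print(status.consecutive_failures)  # 2
status.set_healthy()
print(status.level, status.consecutive_failures)  # HEALTHY 0
```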
2. Declaring Failures
When things go wrong, be specific. This helps the Fallback System decide whether to Retry (Algorithm), Restart (Component), or Wait (System).
Algorithm Failure:
# Optional: List the specific algorithm that failed
self.health_status.set_fail_algorithm(algorithm_names=["A_Star_Planner"])
Component Failure:
# Report that this component crashed
self.health_status.set_fail_component()
# Or blame a sub-module
self.health_status.set_fail_component(component_names=["Camera_Driver_API"])
System Failure:
# Report missing data on specific topics
self.health_status.set_fail_system(topic_names=["/camera/rgb", "/odom"])
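The Retry / Restart / Wait rule of thumb can be sketched as a lookup table. The `Level` enum, the `POLICY` table, and `choose_action` are hypothetical; this section does not show how Sugarcoat's Fallback System is configured, so treat this only as an illustration of why declaring the right level matters.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical mirror of the four status levels."""
    HEALTHY = "HEALTHY"
    ALGORITHM_FAILURE = "ALGORITHM_FAILURE"
    COMPONENT_FAILURE = "COMPONENT_FAILURE"
    SYSTEM_FAILURE = "SYSTEM_FAILURE"


# Hypothetical policy table: failure level -> recovery action
POLICY = {
    Level.HEALTHY: "none",
    Level.ALGORITHM_FAILURE: "retry",    # the logic may succeed on the next input
    Level.COMPONENT_FAILURE: "restart",  # the fault is inside this node
    Level.SYSTEM_FAILURE: "wait",        # the fault is upstream; restarting will not help
}


def choose_action(level: Level) -> str:
    """Return the recovery action for a declared failure level."""
    return POLICY[level]


for level in Level:
    print(f"{level.name:<18} -> {choose_action(level)}")
```

The design point is that a restart only helps when the fault is inside the node: restarting a component whose input topic has gone silent does not bring the topic back, so waiting is the cheaper response.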
Automatic Broadcasting
You do not need to manually publish the status message.
Sugarcoat automatically broadcasts the status at the start of every execution step.
This ensures a consistent “Heartbeat” frequency, even if your algorithm blocks or hangs (up to the threading limits).
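Because the broadcast happens at the start of each step, each message carries whatever status was set before that step ran, typically during the previous step. The toy loop below models that timing; `ToyComponent`, `run_step`, and the `published` list are hypothetical stand-ins for the framework's internals, not Sugarcoat code.

```python
class ToyComponent:
    """Hypothetical model of the broadcast timing described above (not Sugarcoat code)."""

    def __init__(self, results):
        self.status = "HEALTHY"
        self.published = []           # stands in for the status topic
        self._results = iter(results)

    def _execution_step(self):
        # User logic: each step either produces a result or does not.
        if next(self._results):
            self.status = "HEALTHY"
        else:
            self.status = "ALGORITHM_FAILURE"

    def run_step(self):
        # The framework publishes FIRST, then runs the user's step.
        self.published.append(self.status)
        self._execution_step()


comp = ToyComponent(results=[True, False, True])
for _ in range(3):
    comp.run_step()
print(comp.published)  # ['HEALTHY', 'HEALTHY', 'ALGORITHM_FAILURE']
```

In this model the failure from step 2 only reaches the topic at the start of step 3, a lag of roughly one loop period. If that is too slow for an alert, the forced publish described in the tip below sends the current status immediately.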
Tip
If you need to trigger an immediate alert from a deeply nested callback or a separate thread, you can force a publish:
self.health_status_publisher.publish(self.health_status())
Implementation Pattern
Here is a robust pattern for writing an execution step that reports its Health Status. This pattern enables the Self-Healing capabilities of Sugarcoat.
def _execution_step(self):
    try:
        # 1. Check Pre-conditions (System Level)
        if self.input_image is None:
            self.get_logger().warning("Waiting for video stream...")
            # self.input_image is None here, so name the topic explicitly;
            # "/camera/rgb" is a placeholder for your input topic's name
            self.health_status.set_fail_system(topic_names=["/camera/rgb"])
            return
        # 2. Run Logic
        result = self.ai_model.detect(self.input_image)
        # 3. Check Logic Output (Algorithm Level)
        if result is None or len(result.detections) == 0:
            self.health_status.set_fail_algorithm(algorithm_names=["yolo_detector"])
            return
        # 4. Success!
        self.publish_result(result)
        self.health_status.set_healthy()
    except ConnectionError:
        # 5. Handle Crashes (Component Level)
        # This will trigger the 'on_component_fail' fallback (e.g., Restart)
        self.get_logger().error("Camera hardware disconnected!")
        self.health_status.set_fail_component(component_names=["hardware_interface"])
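To check that each branch of this pattern reports the intended level, the sketch below runs the same logic against injected fakes. `ToyHealthStatus`, `ToyDetector`, the fake `detect` callables, and the `/camera/rgb` topic name are hypothetical, and logging is omitted so the example runs without ROS2 or Sugarcoat installed.

```python
from types import SimpleNamespace


class ToyHealthStatus:
    """Hypothetical stand-in that records the last declared level; not the Sugarcoat class."""

    def __init__(self):
        self.level, self.names = "HEALTHY", []

    def set_healthy(self):
        self.level, self.names = "HEALTHY", []

    def set_fail_algorithm(self, algorithm_names=None):
        self.level, self.names = "ALGORITHM_FAILURE", list(algorithm_names or [])

    def set_fail_component(self, component_names=None):
        self.level, self.names = "COMPONENT_FAILURE", list(component_names or [])

    def set_fail_system(self, topic_names=None):
        self.level, self.names = "SYSTEM_FAILURE", list(topic_names or [])


class ToyDetector:
    """Hypothetical component running the execution-step pattern with injected fakes."""

    def __init__(self, input_image, detect):
        self.input_image = input_image
        self.ai_model = SimpleNamespace(detect=detect)
        self.health_status = ToyHealthStatus()
        self.results = []

    def publish_result(self, result):
        self.results.append(result)

    def _execution_step(self):
        try:
            if self.input_image is None:  # System Level
                self.health_status.set_fail_system(topic_names=["/camera/rgb"])
                return
            result = self.ai_model.detect(self.input_image)
            if result is None or len(result.detections) == 0:  # Algorithm Level
                self.health_status.set_fail_algorithm(algorithm_names=["yolo_detector"])
                return
            self.publish_result(result)
            self.health_status.set_healthy()
        except ConnectionError:  # Component Level
            self.health_status.set_fail_component(component_names=["hardware_interface"])


def disconnected(_image):
    """Fake detector that simulates losing the camera mid-call."""
    raise ConnectionError("camera unplugged")


cases = {
    "no input": ToyDetector(None, lambda img: None),
    "nothing found": ToyDetector("frame", lambda img: SimpleNamespace(detections=[])),
    "one hit": ToyDetector("frame", lambda img: SimpleNamespace(detections=["person"])),
    "hardware lost": ToyDetector("frame", disconnected),
}
for label, node in cases.items():
    node._execution_step()
    print(label, "->", node.health_status.level, node.health_status.names)
```

Injecting the model as a plain callable keeps each failure mode reproducible in a unit test, without real hardware or a running ROS2 graph.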