Error Handling¶
flaude raises specific exception types for different failure modes. Understanding them helps you handle failures precisely and avoid masking bugs.
MachineExitError¶
MachineExitError is raised when a Fly machine exits with a non-zero exit code or reaches
the failed state (OOM kill, entrypoint crash, etc.). It carries the captured log tail so
you can diagnose failures without a separate log retrieval step.
from flaude import MachineConfig, MachineExitError, run_and_destroy
async def run_with_error_handling(app_name: str) -> None:
config = MachineConfig(
claude_code_oauth_token="sk-ant-oat-...",
prompt="Run the full test suite and fix any failures",
repos=["https://github.com/your-org/your-repo"],
)
try:
result = await run_and_destroy(app_name, config)
except MachineExitError as exc:
print(f"Machine {exc.machine_id} failed")
print(f" exit_code: {exc.exit_code}")
print(f" state: {exc.state}")
print(f" log lines: {len(exc.logs)}")
print("Last 5 lines:")
for line in exc.logs[-5:]:
print(f" {line}")
Attributes¶
| Attribute | Type | Description |
|---|---|---|
machine_id |
str |
The Fly machine ID |
exit_code |
int \| None |
Process exit code (non-zero, or None if unavailable) |
state |
str |
Final machine state (stopped, failed) |
logs |
list[str] |
Captured log lines — may be empty if no log drain was configured |
The exception message includes the last 20 log lines for quick debugging when printed or logged.
raise_on_failure=False¶
Both run_and_destroy and StreamingRun.result() accept raise_on_failure=False to
suppress MachineExitError and return RunResult directly:
result = await run_and_destroy(app_name, config, raise_on_failure=False)
if result.exit_code != 0 or result.state == "failed":
print(f"Run failed: exit_code={result.exit_code}, state={result.state}")
This is useful when you want to inspect the result yourself rather than catching an exception.
FlyAPIError¶
FlyAPIError is raised when the Fly.io Machines API returns an error response — for
example, when machine creation fails due to invalid configuration or resource limits:
from flaude.fly_client import FlyAPIError
try:
result = await run_and_destroy(app_name, config)
except FlyAPIError as exc:
print(f"Fly API error: {exc.status_code}")
print(f"Detail: {exc.detail}")
| Attribute | Type | Description |
|---|---|---|
status_code |
int |
HTTP status code from the Fly API |
detail |
str |
Error message from the API response |
Tip
A 404 error during wait_for_machine_exit is treated as a successful destroyed state
— flaude handles this transparently since Fly auto-destroys machines when auto_destroy=True.
ImageBuildError¶
ImageBuildError is raised when docker_build, docker_push, or ensure_image fails:
from flaude import ImageBuildError, ensure_image
try:
image = await ensure_image("my-flaude-app")
except ImageBuildError as exc:
print(f"Build failed (rc={exc.returncode})")
print(f"stderr: {exc.stderr}")
| Attribute | Type | Description |
|---|---|---|
returncode |
int \| None |
Exit code of the failed docker command |
stderr |
str |
Captured stderr output from the failed command |
asyncio.TimeoutError¶
asyncio.TimeoutError is raised when a machine does not exit within wait_timeout
(default: 3600 seconds). This can happen if Claude Code hangs or the machine gets stuck:
import asyncio
from flaude import run_and_destroy, MachineConfig
try:
result = await run_and_destroy(app_name, config, wait_timeout=600.0)
except asyncio.TimeoutError:
print("Machine did not exit within 600 seconds")
# Note: the machine may still be running on Fly.io — clean it up manually
# or rely on Fly's own machine lifecycle management.
Warning
When a TimeoutError is raised from run_and_destroy, flaude still attempts cleanup
in the finally block. However, if the machine is stuck, the stop/destroy calls may
also fail. Check the Fly dashboard if you see orphaned machines.
Exit code fallback chain¶
The Fly Machines API does not always populate the exit code in its response — this can
happen when machines are force-destroyed or reach the failed state without a clean
process exit.
When the API returns None for the exit code, flaude falls back to scanning the collected
log lines for a [flaude:exit:N] marker written by the container's entrypoint.sh before
the Claude Code process exits:
The fallback chain is:
- Fly Machines API
events[].status.exit_code - Fly Machines API
status.exit_code [flaude:exit:N]log marker (viaextract_exit_code_from_logs)Noneif none of the above is available