Release highlights (2026.3.4)
- Lifecycle reliability: suspend, resume, and reboot operations now return
202for temporary
provider pending/locked states instead of surfacing false 500 errors.
- Retry metadata: lifecycle pending responses now include
pending,retryable,provider,
and operation fields so clients can handle retries cleanly.
- Reconcile hardening: docker host reconcile now defers fresh
destroyinghosts, retries stale
ones, and reports provider-pending issue counts.
- Ops visibility: provider-pending lifecycle events are now recorded as deployment events for
faster incident triage.
Why this matters
Provider control planes are eventually consistent. Returning 500 on transient lock states makes healthy systems look broken and causes unnecessary support load. This release converts those transients into explicit retryable states and improves operational diagnostics.
Operator notes
- Treat
202on lifecycle actions as expected retryable behavior. - Use deployment events where
type=provisioningstepandmetadata.step=providerpendingto
track impacted hosts.
- Monitor reconcile counters:
- deferredDestroyingHosts - retriedDestroyingHosts - providerPendingIssues
Runbook links
- Admin runbook:
docs/admin-runbook.md - Provider transient lifecycle runbook:
docs/ops/provider-transient-lifecycle.md