Docker Retention Cleanup
Cessy uses retention labels to make Docker cleanup safe on shared hosts. Broad Docker prune commands can delete runtime or workspace state, so automated cleanup only removes objects that explicitly opt in as disposable CI or test artifacts.
Safety model
Cleanup may delete an object only when all retention labels agree that it is disposable:
Runtime and workspace objects should be visible in reports but protected by default:
Unlabeled ces-ws-* volumes are hard-blocked: cleanup never deletes them. Treat
any volume with that name prefix as a possible workspace volume until its
creation path has been audited and labeled.
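The gating described in this section can be sketched as a single decision function. This is a minimal illustration, not Cessy's implementation: the three label keys and the reason strings `not_opted_in` and `ttl_missing_or_invalid` are hypothetical, and the order of the checks is an assumption. The other reason strings are the ones that appear in cleanup reports.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical label keys; the real Cessy label names are not shown here.
SAFE_LABEL = "example.retention.safe-to-prune"
PRESERVE_LABEL = "example.retention.preserve"
TTL_LABEL = "example.retention.ttl-hours"


def decide(kind, name, labels, created_at, now):
    """Return (deletable, reason) for one Docker object."""
    opted_in = labels.get(SAFE_LABEL) == "true"

    # Hard block: an unlabeled ces-ws-* volume may hold workspace state.
    if kind == "volume" and name.startswith("ces-ws-") and not opted_in:
        return False, "ces_workspace_volume_without_explicit_safe_to_prune"

    # An explicit preserve label always protects the object.
    if labels.get(PRESERVE_LABEL) == "true":
        return False, "preserve_true"

    # Protected by default: no opt-in means no deletion.
    if not opted_in:
        return False, "not_opted_in"  # hypothetical reason string

    # A missing or unparsable TTL is treated as "do not delete".
    try:
        ttl = timedelta(hours=float(labels[TTL_LABEL]))
    except (KeyError, ValueError, OverflowError):
        return False, "ttl_missing_or_invalid"  # hypothetical reason string

    if now - created_at >= ttl:
        return True, "ttl_expired"
    return False, "ttl_not_expired"
```

Every branch except the last `ttl_expired` case returns `False`, so an object with missing, malformed, or conflicting labels stays protected.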
Cleanup workflow
Run dry-run first whenever you inspect or operate the cleanup manually:
Apply mode deletes only expired objects that pass the label and TTL rules:
Reports group objects by owner, environment, app id, and workspace id. Each
object includes a reason such as ttl_expired, ttl_not_expired,
preserve_true, or ces_workspace_volume_without_explicit_safe_to_prune.
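The report grouping could look roughly like the sketch below. It assumes each object record is a dict with `name`, `labels`, and `reason` keys, and that the four grouping dimensions are read from labels. The label keys and the `unknown` fallback are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical label keys for owner, environment, app id, and workspace id.
GROUP_LABELS = (
    "example.owner",
    "example.environment",
    "example.app-id",
    "example.workspace-id",
)


def group_report(objects):
    """Group object records by (owner, environment, app id, workspace id)."""
    groups = defaultdict(list)
    for obj in objects:
        # Objects missing a grouping label fall into an "unknown" bucket
        # so they still show up in the report.
        key = tuple(obj["labels"].get(label, "unknown") for label in GROUP_LABELS)
        groups[key].append({"name": obj["name"], "reason": obj["reason"]})
    return dict(groups)
```

Keeping unlabeled objects in an `unknown` bucket, rather than dropping them, matches the goal of making protected objects visible in reports.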
Scheduled cleanup
The Docker retention cleanup GitHub workflow runs daily. It produces a dry-run
report, applies label-gated cleanup on scheduled runs, and checks Docker disk
usage after cleanup.
DOCKER_DISK_USAGE_ALERT_PERCENT controls the alert threshold and defaults to
80. When Docker disk usage remains at or above the threshold after cleanup,
the workflow fails so the red run becomes the alert.
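A threshold check of this kind can be sketched as follows. The sketch assumes that "Docker disk usage" means the used share of the filesystem that holds Docker's data directory; the workflow may measure it differently. The function names are hypothetical.

```python
import os
import shutil


def threshold_percent(env=os.environ):
    """Read the alert threshold, falling back to 80 when unset or empty."""
    return float(env.get("DOCKER_DISK_USAGE_ALERT_PERCENT") or "80")


def usage_percent(path):
    """Used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_alert(percent, threshold):
    """Alert when usage is at or above the threshold."""
    return percent >= threshold
```

A workflow step could call `should_alert(usage_percent("/var/lib/docker"), threshold_percent())` and exit with a non-zero status when it returns `True`, which fails the run. The path is Docker's default data root on Linux and is an assumption about the host.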
Runner pool isolation
CI workflows can use repository variables to separate low-Docker jobs from Docker-heavy jobs:
Low-Docker jobs include typecheck, lint, static checks, affected unit tests, and post-merge unit coverage. Docker-heavy jobs include image builds, integration tests, and retention cleanup. When variables are not configured, workflows fall back to the shared self-hosted Hetzner runner labels.
When the alert fires
- Open the failed workflow run and inspect the dry-run and apply reports.
- Confirm whether large blocked objects are preserved runtime objects or unlabeled ces-ws-* volumes.
- Do not delete unlabeled workspace volumes from the host shell.
- Fix labels at the object creation path when an object is disposable.
- Re-run dry-run, then apply only after the safe deletion set is limited to disposable CI or test objects.
- If build cache dominates usage, schedule a maintenance window instead of pruning while image builds are active.