Commit 7d4d26e

Gate .sentinel removal on graceful shutdown behind env flag
Default behavior unchanged: .sentinel is removed on graceful shutdown (preserves existing semantics for the 99% of deployments that don't need the kubectl-delete-pod recovery path).

With LIBSQL_PRESERVE_SENTINEL_ON_SHUTDOWN=1 set, the sentinel survives graceful shutdown. This re-enables the documented operator recovery procedure:

1. kubectl exec <pod> -- touch /data/dbs/<ns>/.sentinel
2. kubectl delete pod <pod>   # SIGTERM → graceful shutdown
3. Kubernetes recreates pod
4. Next namespace access triggers dirty-recovery on the preserved .sentinel, rebuilding wallog/snapshots from the live data file

Without this flag, step 2's graceful shutdown removes the sentinel BEFORE the pod stops, so step 4 doesn't find a sentinel and skips the dirty-recovery path.

Now that POST /v1/namespaces/:ns/reset-replication is the primary recovery primitive, this flag is a low-priority belt-and-suspenders for emergency ops workflows (e.g. when the admin API is unavailable).

Verified end-to-end with /tmp/run_sentinel_preserve_simple.sh: sentinel preserved with flag, dirty-recovery fires on next access, data preserved through the cycle.
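The interaction between steps 1, 2 and 4 can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the libsql-server code: `graceful_shutdown`, `needs_dirty_recovery` and `simulate_cycle` are hypothetical names, it uses synchronous `std::fs` on a scratch directory instead of the server's async path, and it treats a missing sentinel as a non-error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the tail of the namespace shutdown path:
/// remove `.sentinel` unless the operator asked to preserve it.
pub fn graceful_shutdown(ns_dir: &Path, preserve_sentinel: bool) -> io::Result<()> {
    if preserve_sentinel {
        return Ok(());
    }
    match fs::remove_file(ns_dir.join(".sentinel")) {
        Ok(()) => Ok(()),
        // Assumption for this model only: a missing sentinel is not an error.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

/// Hypothetical stand-in for the check made on the next namespace access.
pub fn needs_dirty_recovery(ns_dir: &Path) -> bool {
    ns_dir.join(".sentinel").exists()
}

/// Runs steps 1, 2 and 4 against `ns_dir` and reports whether
/// dirty-recovery would fire afterwards.
pub fn simulate_cycle(ns_dir: &Path, preserve_sentinel: bool) -> io::Result<bool> {
    fs::create_dir_all(ns_dir)?;
    fs::write(ns_dir.join(".sentinel"), b"")?; // step 1: touch .sentinel
    graceful_shutdown(ns_dir, preserve_sentinel)?; // step 2: SIGTERM path
    Ok(needs_dirty_recovery(ns_dir)) // step 4: next access
}
```

In this model, `simulate_cycle(dir, false)` reports that recovery is skipped because shutdown already deleted the sentinel, while `simulate_cycle(dir, true)` reports that recovery fires.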
1 parent 486d78f commit 7d4d26e

1 file changed

Lines changed: 25 additions & 2 deletions

libsql-server/src/namespace/mod.rs

@@ -139,8 +139,31 @@ impl Namespace {
             self.checkpoint().await?;
         }
         self.db.shutdown().await?;
-        if let Err(e) = tokio::fs::remove_file(self.path.join(".sentinel")).await {
-            tracing::error!("unable to remove .sentinel file: {}", e);
+        // Historically `.sentinel` was removed unconditionally on graceful
+        // shutdown. This makes the documented `touch .sentinel + kubectl
+        // delete pod` operator recovery path silently ineffective, because
+        // kubectl sends SIGTERM first which invokes this graceful shutdown
+        // and removes the sentinel before the pod actually stops.
+        //
+        // Guard the removal behind `LIBSQL_PRESERVE_SENTINEL_ON_SHUTDOWN`.
+        // When set, the sentinel survives graceful shutdown, so the next
+        // namespace init will correctly trigger dirty-recovery from the
+        // live `data` file.
+        //
+        // Default remains: remove (preserves existing behavior for the
+        // 99% of deployments that don't need this recovery path, now that
+        // `POST /v1/namespaces/:ns/reset-replication` is the primary
+        // recovery primitive).
+        let preserve_sentinel =
+            std::env::var("LIBSQL_PRESERVE_SENTINEL_ON_SHUTDOWN").is_ok();
+        if !preserve_sentinel {
+            if let Err(e) = tokio::fs::remove_file(self.path.join(".sentinel")).await {
+                tracing::error!("unable to remove .sentinel file: {}", e);
+            }
+        } else {
+            tracing::info!(
+                "LIBSQL_PRESERVE_SENTINEL_ON_SHUTDOWN set; keeping .sentinel for recovery"
+            );
         }
         Ok(())
     }
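One detail of the check above: `std::env::var(...).is_ok()` is true for any Unicode value, so `LIBSQL_PRESERVE_SENTINEL_ON_SHUTDOWN=0` or an empty value also preserves the sentinel, although the commit message describes the flag as `=1`. If value-sensitive behavior were wanted, a stricter parse might look like the sketch below. The helper names `parse_preserve_flag` and `preserve_sentinel_from_env`, and the set of accepted spellings, are assumptions for illustration and are not part of this commit.

```rust
use std::env;

/// Hypothetical helper: interpret an optional raw value as a boolean flag.
/// Only a small set of truthy spellings enables it; anything else,
/// including "0", the empty string, or an unset variable, leaves it off.
pub fn parse_preserve_flag(raw: Option<&str>) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) => matches!(v.as_str(), "1" | "true" | "yes" | "on"),
        None => false,
    }
}

/// Hypothetical helper: read the flag from the process environment.
/// A non-Unicode value makes `env::var` return `Err`, which maps to `None`
/// here and therefore to `false`.
pub fn preserve_sentinel_from_env() -> bool {
    parse_preserve_flag(
        env::var("LIBSQL_PRESERVE_SENTINEL_ON_SHUTDOWN")
            .ok()
            .as_deref(),
    )
}
```

Keeping the parsing in a pure function that takes `Option<&str>` means it can be checked without mutating the process environment.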
