Commit e157523

reset-replication: refuse Mode B (live data-file corruption)
If the pre-teardown checkpoint fails with a DatabaseCorrupt / malformed error, the live data file itself is corrupt. Rebuilding the wallog from corrupt data would just propagate the corruption AND leave the namespace in a broken state (the destroy-then-make sequence fails halfway, leaving NamespaceDoesntExist).

Now the endpoint returns 500 with an explicit error message pointing the operator to a restore-from-backup path, without destroying the in-memory namespace first. The namespace stays loaded and returns the underlying corruption error to subsequent reads, which is a true observability signal.

Verified with /tmp/test_mode_b.sh: before the fix, the namespace went to 404 after reset; after the fix, the namespace stays loaded with a 'malformed database schema' error.

Mode A happy path (wallog corruption, live data OK) unchanged: 1135ms p95 over 3 reps, 100% data preserved.
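The ordering argument in the message above (refuse before teardown so the namespace stays loaded) can be illustrated with a small model. This is only a sketch: `Slot`, `reset`, and the `rebuild_ok` flag are hypothetical stand-ins, not types or functions from libsql-server, and the corruption check is reduced to a single substring.

```rust
/// Hypothetical stand-in for the in-memory namespace slot.
#[derive(Debug, PartialEq)]
enum Slot {
    Loaded,
    Missing,
}

/// Simplified model of the reset flow: checkpoint, destroy, then make.
/// `checkpoint` is the result of the pre-teardown checkpoint and
/// `rebuild_ok` says whether the "make" step would succeed.
fn reset(slot: &mut Slot, checkpoint: Result<(), String>, rebuild_ok: bool) -> Result<(), String> {
    if let Err(e) = &checkpoint {
        if e.contains("malformed") {
            // Refuse BEFORE teardown: the slot is left untouched.
            return Err(format!("refusing to rebuild from corrupt data: {e}"));
        }
        // Any other checkpoint failure is tolerated, as in the commit.
    }
    // Destroy the in-memory namespace.
    *slot = Slot::Missing;
    if !rebuild_ok {
        // The destroy-then-make sequence failed halfway.
        return Err("rebuild failed after teardown".to_string());
    }
    // Make the namespace again.
    *slot = Slot::Loaded;
    Ok(())
}
```

In this model, a corrupt checkpoint returns an error while the slot is still `Loaded`. A failure after teardown leaves it `Missing`, which corresponds to the 404 seen before the fix.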
1 parent 84b4b1c commit e157523

1 file changed

Lines changed: 27 additions & 2 deletions

libsql-server/src/namespace/store.rs

@@ -256,9 +256,34 @@ impl NamespaceStore {

         // (1) Checkpoint before we tear down in-memory state so the data
         // file has everything the WAL was holding.
+        //
+        // If checkpoint fails with a `DatabaseCorrupt` / `malformed`
+        // error, the live data file itself is corrupt. Rebuilding the
+        // wallog from a corrupt data file would just propagate the
+        // corruption — so bail out BEFORE we destroy the in-memory
+        // namespace. The caller should fall back to a restore-from-
+        // backup path (Mode B), not this endpoint.
         if let Some(ns) = lock.as_ref() {
-            if let Err(e) = ns.checkpoint().await {
-                tracing::warn!("reset_replication: checkpoint failed: {e}; proceeding anyway");
+            match ns.checkpoint().await {
+                Ok(()) => {}
+                Err(e) => {
+                    let msg = e.to_string();
+                    let is_live_db_corrupt = msg.contains("malformed")
+                        || msg.contains("DatabaseCorrupt")
+                        || msg.contains("database disk image")
+                        || msg.contains("file is not a database");
+                    if is_live_db_corrupt {
+                        return Err(Error::Internal(format!(
+                            "reset_replication: live data file appears corrupt \
+                             (checkpoint failed: {e}); refusing to rebuild \
+                             replication log from corrupt data. Use a \
+                             restore-from-backup procedure instead."
+                        )));
+                    }
+                    tracing::warn!(
+                        "reset_replication: checkpoint failed: {e}; proceeding anyway"
+                    );
+                }
             }
         }
