message: account for duplicates across a recovered checkpoint boundary #406

jgraettinger · 2024-10-17T03:12:22Z

Sequencer's runtime assertion checks were slightly too strong. A scenario we encountered which tripped a runtime panic was:

- CONTINUE_TXN messages were read from a journal, and were then part of a checkpoint. The checkpoint included the producer's earlier last ACK clock and a begin offset at the first of these messages.
- The messages were then duplicated within the journal, and ACKed.
- A new Sequencer recovers from the checkpoint.
- It reads the later, duplicated messages and adds them to the ring, followed by an ACK which begins a dequeue with a replay-read of the earlier messages.
- The earlier messages are dequeued first, and then the replay ends and hands off to the ring.
- Sequencer blows up because the first ring message is not strictly larger than the largest Clock dequeued from the replay read.

The fix is simply to remove the runtime assertion and discard duplicates, as this is a valid thing that can happen. This change also adds a new test case covering the scenario.
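
To illustrate the idea, here is a minimal sketch in Go of a dequeue that skips duplicates at the hand-off from the replay read to the ring instead of asserting. The `Clock` and `Message` types and the `dequeueAll` function are hypothetical simplifications for illustration and are not the actual Sequencer types or API; the real implementation also tracks ACKs, offsets, and per-producer state.

```go
package main

import "fmt"

// Clock is a hypothetical stand-in for a producer's monotonic message clock.
type Clock uint64

// Message is a hypothetical, simplified message: a clock and a label.
type Message struct {
	Clock Clock
	Body  string
}

// dequeueAll emits the replay-read messages first, then hands off to the ring.
// A ring message whose Clock is not strictly larger than the largest Clock
// already emitted is treated as a duplicate and skipped, rather than
// tripping a panic.
func dequeueAll(replay, ring []Message) []Message {
	var out []Message
	var maxClock Clock

	for _, m := range replay {
		out = append(out, m)
		if m.Clock > maxClock {
			maxClock = m.Clock
		}
	}
	for _, m := range ring {
		if m.Clock <= maxClock {
			continue // Duplicate across the checkpoint boundary: discard.
		}
		out = append(out, m)
		maxClock = m.Clock
	}
	return out
}

func main() {
	// The replay read covers the earlier messages; the ring holds their
	// later duplicates followed by one genuinely new message.
	replay := []Message{{1, "a"}, {2, "b"}}
	ring := []Message{{1, "a (dup)"}, {2, "b (dup)"}, {3, "c"}}

	for _, m := range dequeueAll(replay, ring) {
		fmt.Println(m.Clock, m.Body)
	}
}
```

In this sketch the duplicated messages in the ring are dropped silently, and only the message with a strictly larger Clock is emitted after the replay.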

psFried

LGTM

jgraettinger requested a review from psFried October 17, 2024 03:12

psFried approved these changes Oct 17, 2024

jgraettinger merged commit 719adef into master Oct 17, 2024
1 check passed

jgraettinger deleted the johnny/dups-across-checkpoints branch October 17, 2024 14:46
