message: account for duplicates across a recovered checkpoint boundary #406
Sequencer's runtime assertion checks were slightly too strong. A scenario we encountered which tripped a runtime panic was:
CONTINUE_TXN messages were read from a journal, and were then part of a checkpoint. The checkpoint included the producer's earlier last ACK clock and a begin offset at the first of these messages.
The messages were then duplicated within the journal, and ACKed.
A new Sequencer recovers from the checkpoint.
It reads the later, duplicated messages and adds them to the ring, followed by an ACK which begins a dequeue with a replay-read of the earlier messages.
The earlier messages are dequeued first, and then the replay ends and hands off to the ring.
Sequencer blows up because the first ring message is not strictly larger than the largest Clock dequeued from the replay read.
The fix is simply to remove the runtime assertion and discard duplicates, as this is a valid thing that can happen. Also add a new test case covering this scenario.
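As a rough illustration of the discard-instead-of-panic behavior, here is a minimal sketch. It is not the actual Sequencer code: `Clock`, `Message`, and `dequeue` are hypothetical, simplified stand-ins, and the replay-read and ring are modeled as plain slices. It assumes a message counts as a duplicate when its Clock is not strictly greater than the largest Clock already dequeued.

```go
package main

// Clock is a hypothetical stand-in for a producer's monotonic message clock.
type Clock uint64

// Message is a hypothetical, simplified message: a Clock plus a payload.
type Message struct {
	Clock   Clock
	Payload string
}

// dequeue emits messages from the replay-read first and then from the ring.
// A message whose Clock is not strictly greater than the largest Clock seen
// so far (starting from lastACK) is treated as a duplicate and skipped,
// rather than tripping an assertion.
func dequeue(replay, ring []Message, lastACK Clock) []Message {
	var out []Message
	maxClock := lastACK

	for _, src := range [][]Message{replay, ring} {
		for _, m := range src {
			if m.Clock <= maxClock {
				continue // Duplicate across the replay/ring boundary: discard.
			}
			maxClock = m.Clock
			out = append(out, m)
		}
	}
	return out
}
```

In the scenario above, the replay-read yields the earlier messages and the ring holds their later duplicates, so the first ring message is not larger than the last replayed Clock. With this sketch those ring entries are skipped and each message is emitted only once.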