-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: write retry delay #335
fix: write retry delay #335
Conversation
Codecov Report
@@ Coverage Diff @@
## master #335 +/- ##
=======================================
Coverage 90.48% 90.48%
=======================================
Files 23 23
Lines 2490 2490
=======================================
Hits 2253 2253
Misses 178 178
Partials 59 59
Continue to review full report at Codecov.
|
e821867
to
adfb995
Compare
…carded Design error, RetryDelay should be tracked in the service/queue and not in the batch. Similarly, a service/queue-level RetryAttempts should be used by computeRetryDelay() instead of a batch-level value.
…t.go min() function isn't used in service.go, only in service_test.go
adfb995
to
29369c7
Compare
I've added a test case that explicitly illustrates the design flaw fixed in this PR. In this scenario, MaxRetryInterval is capped at 300 ms and a client is attempting to write every 20 ms to an unresponsive server. Prior to this PR's fix, the behavior would be as follows...
... which I claim is incorrect. If server has not started responding, we do not suddenly want to start attempting to write to it again every time the client adds a new batch to the queue just because the oldest batch has expired. We should instead continue to wait MaxRetryInterval ms until the server begins responding again. This improved behavior is demonstrated as follows in the PR, where RetryDelay is now tracked at the Write Queue level instead of the Batch level:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging more into the issue. I agree that this is the better way to handle retry time.
Adding a test is a good practice! 👍
Just a small change.
internal/write/service.go
Outdated
RetryDelay uint | ||
RetryAttempts uint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those fields don't need to be public
#332 deadlock fix exposes design flaw in retry queue logic. Accumulation of delay should be tracked at the queue level, not at the batch level. Old design can result in repeated minimum-delay retries once batches start getting discarded due to age (oldest batch has never attempted write due to prior blockage, and prior batches expired due to age, however queue has not successfully written yet so should be at maximum-delay retry interval).