Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]Fix the collect distinct bug #5000

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

neuyilan
Copy link
Member

Purpose

Fix #4999

Tests

testAggWithDistinctFirstDuplicate

API and Format

no

Documentation

no

@neuyilan neuyilan force-pushed the dev/qhl/collect_distinct_bug_fix branch from a2df38d to 25df878 Compare January 26, 2025 08:24
@neuyilan neuyilan closed this Jan 26, 2025
@neuyilan neuyilan reopened this Jan 26, 2025
@yuzelin yuzelin self-requested a review February 11, 2025 03:09
Arrays.stream(aggregators).forEach(FieldAggregator::reset);
}

@Override
public void add(KeyValue kv) {
if (initialKv == null) {
initialKv = kv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason for extracting initialKv separately to handle the case where there is only one kv and it is a retract? What if there are multiple kvs, but all of them are retracts? It seems like it will still return an INSERT record? It looks like this situation existed before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right.
If there are multiple kvs, but all of them are retracts, we still return as INSERT record now.
Do we need to deal with this situation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While, this looks like a bug to me, I remember that I previously submitted a related issue #4670.

I'm fine with not including the fix for this issue in this PR since it is beyond the scope of #4999.

@yuzelin What do you think? Can you help review? Thanks!

@JingsongLi
Copy link
Contributor

Hi @neuyilan and @zhongyujiang , I understand this bug, and currently we haven't considered the input situation. Generally speaking, the input is a single piece of data, and we even use the ARRAY function to assemble an array of individual element.

But I feel that the changes are too big, and in order to fix this bug, some difficult to understand code has been introduced.

This may be a current limitation, you can consider changing the document to clarify it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Collect agg merge engine return duplicate list even we set filed distinct
3 participants