Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of References for AVRO Schemas #926

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

libretto
Copy link
Contributor

@libretto libretto commented Jul 27, 2024

This change adds support for references for AVRO schemas

We aim to add support for references for AVRO schemas to Karapace using a straightforward approach by leveraging the same API used in Protobuf references. Although Avro does not natively support references, we can implement this functionality by utilizing the property of Avro schemas that allows combining types. We utilize Names from avro.name and make_avsc_object to create an Avro schema that includes referenced schemas within as explained in #987.

@libretto libretto requested review from a team as code owners July 27, 2024 13:50
Copy link
Contributor

@jjaakola-aiven jjaakola-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup commits to single/coherent commits. Add tests for better coverage.

@libretto
Copy link
Contributor Author

libretto commented Jul 29, 2024

Cleanup commits to single/coherent commits. Add tests for better coverage.

@jjaakola-aiven Could you suggest the tests that should be added?

@libretto libretto force-pushed the avro_references2 branch 2 times, most recently from f0f9450 to 0167a75 Compare July 29, 2024 09:46
@libretto
Copy link
Contributor Author

@jjaakola-aiven I'm still waiting for your suggestions regarding additional tests for the new functionality. Please keep in mind that we already have integration tests covering various aspects, including:

Basic Avro references
Avro references compatibility
Avro union references
Avro incompatible names in references

Additionally, the new code has successfully passed all existing tests related to Avro functionality. I also added a few unit tests for the AvroMerge class today.

@nosahama
Copy link
Contributor

@libretto shall we rebase this work so we get upstream changes and then let's discuss around the comments if you can tell me when you'd like to, we can spare sometime and discuss in one thread here.

@nosahama
Copy link
Contributor

@libretto The unit test for AvroMerge is gone and it seems that you have reverted some of your changes with your recent push.

@libretto
Copy link
Contributor Author

@nosahama

@libretto The unit test for AvroMerge is gone and it seems that you have reverted some of your changes with your recent push.

Yes it was an issue with rebase I just restored banch, synced it with main and try rebase again...

@libretto libretto force-pushed the avro_references2 branch 2 times, most recently from 9cac0ba to f4346ea Compare September 26, 2024 16:57
@libretto
Copy link
Contributor Author

@libretto shall we rebase this work so we get upstream changes and then let's discuss around the comments if you can tell me when you'd like to, we can spare sometime and discuss in one thread here.

rebased

@davide-armand
Copy link
Contributor

@libretto Hi, sorry for the delay, this took some time to properly review.

I think that the implementation using Avro unions is risky and might not work in all cases (hard to tell).
A more "standard" solution using the Avro Python library instead of unions seems a better choice.

I have opened a draft PR implementing the merging using Avro lib here:
#987

Please have a look and consider that approach instead.

@libretto
Copy link
Contributor Author

libretto commented Nov 6, 2024

@libretto Hi, sorry for the delay, this took some time to properly review.

I think that the implementation using Avro unions is risky and might not work in all cases (hard to tell). A more "standard" solution using the Avro Python library instead of unions seems a better choice.

I have opened a draft PR implementing the merging using Avro lib here: #987

Please have a look and consider that approach instead.

@davide-armand Hi,

As I understand, the union approach concept has been confirmed by our teams and has passed all existing Karapace tests, including the references tests I provided. This demonstrates that the concept works well. I will try using Your approach with the Avro library to eliminate the need for the union method. I expect the results to be consistent.

Regarding tests, I noticed that You added a few tests for recursion. However, the direct recursion test You included is on the Avro-level parser and should be tested within the Avro tests, outside of the Karapace project. Indirect recursion (where Schema A references Schema B, which in turn references Schema A) is not possible in Karapace's design, as all schemas are added simultaneously. A schema cannot be added if any of its references are unresolved, which effectively prevents indirect recursion between schemas.

I will include the Your code for the test_avro_reference() test in the my branch tests.

@libretto
Copy link
Contributor Author

libretto commented Nov 9, 2024

@davide-armand I utilized the Avro library functionality as you suggested. References now work in the native Avro way.

@libretto libretto requested a review from nosahama November 9, 2024 20:36
@libretto
Copy link
Contributor Author

libretto commented Nov 9, 2024

#987 implemented there

Copy link
Contributor

@davide-armand davide-armand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@libretto Hi, I left a few comments, please have a look


return merge

def resolve(self) -> list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve() and build() can be merged in a single method, since builder() is only called by resolve().
Maybe the whole AvroResolver class can be just a static method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davide-armand After this change, we have a pylint issue but all tests are passed. Please check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@libretto Do you mean the error No overload variant of "reversed"?
It can be fixed by passing a list to reversed

@cjrolo
Copy link

cjrolo commented Feb 4, 2025

Hello,
I would like to know if it is possible to have this re-reviewed? It seems that feedback was incorported into the lastest changes.

Thanks,

Carlos

merged_schema = None
for schema in schemas_list:
# Merge dep with all previously merged ones
merged_schema = make_avsc_object(json.loads(schema), names)
Copy link
Contributor

@davide-armand davide-armand Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_avro_enum_symbols and validate_avro_names should be passed to make_avsc_object().
I missed that in my PR

{
"type": "record",
"name": "Country",
"namespace": "com.netapp",
Copy link
Contributor

@davide-armand davide-armand Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use reference to real companies (com.netapp).
Applies to other fixtures as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will look into this!

@cjrolo
Copy link

cjrolo commented Mar 4, 2025

Fixed accordingly to the review. Hope I didn't miss anything, let me know!

Thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants