Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try normalizing the dbcore String type #3774

Merged
merged 6 commits into from
Oct 28, 2020
Merged

Try normalizing the dbcore String type #3774

merged 6 commits into from
Oct 28, 2020

Conversation

sampsyo
Copy link
Member

@sampsyo sampsyo commented Oct 18, 2020

This is an attempt to fix #3773 by more aggressively normalizing values for dbcore's String type.

Strings were not being normalized, unlike some other types, leading to
downstream problems like #3773.
@wisp3rwind
Copy link
Member

That looks a lot like

# TODO This should eventually be replaced by
# `self.model_type(value)`
return value

should be adressed, right? Does anyone (in particular @geigerzaehler according to git blame) remember why that comment was added in the first place? Maybe, given that this PR has no CI failures, there's no issue with coercing all values to their model_type anymore?

In fact, maybe String.normalize() should ultimately be more strict, and assert that its values are actually six.text_type, rather than just converting them. Pretty much any value can successfully be converted to a string, so there's no real type checking otherwise.

@sampsyo
Copy link
Member Author

sampsyo commented Oct 19, 2020

Absolutely, yeah, that would be the thing to fix. Here's the context as best as I can remember. Before we added the type system, lots of parts of the codebase were very loosey-goosey about how they used various fields. This was a growing problem, of course, which is why we added the types in the first place. But we were nervous about all that existing code being that was so loosey-goosey about the types. The idea was to proceed in two phases:

  • Get the type system into place, but don't disrupt existing code—behave the same way as the type-free old system.
  • Flip the switch to turn on automatic conversions.

You could imagine adding a third phase to this, as @wisp3rwind suggested: instead of silently converting things like item.title = 123, start throwing errors. But it seems like we stalled out after the first phase. Getting more aggressive about this would be the right thing to do, however.

Copy link
Member

@wisp3rwind wisp3rwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to run the tests with the change mentioned in the TODO note above. There's one test failure, namely test_format_flex_field_bytes in test_dbcore. The issue is that types.Default then forces all flexattrs to unicode, but the test expects to be able to write utf8 bytes to the database. I'm pretty sure that forcing unicode here would break flexattr usage by some plugins. And since the database actually stores bytes rather than unicode objects (right?), it seems sensible to be able to write raw bytes.

So it might indeed by best to only coerce Strings to their model_type, not all fields.

@@ -207,6 +207,12 @@ class String(Type):
sql = u'TEXT'
query = query.SubstringQuery

def normalize(self, value):
if value is None:
return self.model_type()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner to return self.null here (even though that is just self.model_type() for String.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're absolutely right about that.

@sampsyo
Copy link
Member Author

sampsyo commented Oct 23, 2020

Intriguing! Thanks for trying that out. Let's keep the dream alive—maybe we can work around that limitation by, for example, allowing fields without a declared type to remain dynamically typed and not coerced. But for now, we can make this more conservative change for strings only.

@sampsyo sampsyo merged commit 627005d into master Oct 28, 2020
@snejus snejus deleted the normalize-str branch June 15, 2024 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FetchArt crashes with TypeError for particular album
2 participants