-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't escape space characters in strings #525
Conversation
Could you provide an example of the full workflow where the problem occurs and for what reason? I just don't quite understand what do you mean |
Ok, thank you for explanation! So my question is, are there any issues with other control characters? I saw |
…-space-characters
Now it is checks "SPACE" group of unicode symbols and ignores other control symbols. And now it checks for utf8 strings, spaces and |
don't encode string values
…-space-characters
return "str" values as is without encoding
so, I debugged how postgres sends string and binary values. And yes, if data has Postgres knows types in the database but doesn't know type on application side. For example if app operates with Also I met a problem with encoding binary data when we don't know real type. When we received data that not related to any encryptor_config setting. Acra decrypts acrastructs/acrablocks transparently without any encryptor_config because data can be stored directly from app to database with acrawriter. Or data can be encrypted with AcraTranslator and also saved directly to database. So on Acra side we try all data to decode, match to known signatures, try to decrypt and then return encoded decrypted data or as is. So I added saving value |
case sqlparser.StrVal: | ||
// if it is valid string, return as is. otherwise there is binary data, and we should encode it | ||
if utils.IsPrintablePostgresqlString(data) { | ||
return data, nil | ||
} | ||
val.Type = sqlparser.PgEscapeString | ||
fallthrough | ||
case sqlparser.PgEscapeString: | ||
// valid strings we pass as is without extra encoding | ||
if utils.IsPrintablePostgresqlString(data) { | ||
return data, nil | ||
} | ||
return PgEncodeToHexString(data), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two cases look almost the same, maybe we can combine them back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are not the same. in case with StrVal
we check that it is printable string. If so, we return it as string literal as is. If it came as select 'some string'
, then we return 'some string'
. Otherwise we change type to PgEscapeString
that will return as E'some string'
. We return as PgEscapeString because with such format we can encode not printable characters as hex or oct literals.
When we fall below - yes, we check second time with IsPrintable. But it will fail and encode string.
Or we can return PgEncodeToHexString in case with StrVal too. But I prefer to change type and use fallthrough for a future if case with PgEscapeString will process it in more complicated or different way. Then we should not to maintain copy-paste
// string values can contain hex values and should be returned as is | ||
{input: []byte("\\xTT"), decodedData: []byte("\\xTT"), encodedData: []byte("\\xTT"), decodeErr: hex.InvalidByteError('T'), encodeErr: nil, | ||
setting: &config.BasicColumnEncryptionSetting{DataType: "str"}}, | ||
|
||
// invalid binary hex value that should be returned as is. Also encoded into hex due to invalid hex value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover confusing comment detected :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, thanks. will update it in the second copy-pasted row too
decryptor/postgresql/data_encoder.go
Outdated
if !base.IsDecryptedFromContext(ctx) { | ||
encodedValue, ok := getEncodedValueFromContext(ctx) | ||
if ok { | ||
return ctx, &identityValue{encodedValue}, nil | ||
} | ||
} else { | ||
return ctx, &byteSequenceValue{seq: data}, nil | ||
} | ||
return ctx, &identityValue{data}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought that 96th line was unreachable, but it isn't. Though, can we swap branches in this if
statement to simplify it (I'm talking about checking if base.IsDecryptedFromContext(ctx)
first). I think It would be easier to perceive and it would make branches more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, will reverse it
decryptor/postgresql/data_encoder.go
Outdated
if !base.IsDecryptedFromContext(ctx) { | ||
encodedValue, ok := getEncodedValueFromContext(ctx) | ||
if ok { | ||
return ctx, &identityValue{encodedValue}, nil | ||
} | ||
} else { | ||
return ctx, &byteSequenceValue{seq: data}, nil | ||
} | ||
return ctx, &identityValue{data}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, correct me if I am wrong.
If the decoding was successful in decodeText
we save the result into the context. Then here, in the encoding function, if the data type is not known, we check if data was decrypted. And if it wasn't, we try to return that saved decoded value.
But, as I understand, if the decryptor handlers cannot decrypt the ciphertext, they will return it as is. So, theoretically, the decoded and saved value should not change. It means that in any case encodedValue
must always be equal to data
. So, my question is, in what cases that isn't true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how acra processes data from database: it has 3 handlers.
- decoder
- decryptor
- encoder
When it receive data from database, we should decode it from encoded format into raw binary. For example binary string 'hello' database will send as bytea data:\x68656c6c6f
(in hex format). On our side we should decode it into raw binary value: 'hello' and pass it to our handlers that decrypt or detokenize data.
Okay, we had decrypted or did nothing with this data. Anyway to pass forward we should encode it in correct format to application. If we didn't do anything, we should pass it as is in hex format\x68656c6c6f
. If we decrypted data, we look on data_type. If it is "str" then we don't need to encode. If we don't have data_type then pass it as binary as it came to us and encode to hex format.
In function above we describe 3 step: encoder
. Decide what to do: encode or don't encode.
utils/escape_format.go
Outdated
// IsPrintablePostgresqlString returns true if it's valid ASCII or utf8 string except '\' character that used as escape | ||
// character in strings | ||
// IsPrintablePostgresqlString returns true if it's valid utf8 string | ||
// except '\' character that used as escape character in strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another confusing comment detected :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, thanks. removed second check at end after extra checks that it doesn't need anymore
Co-authored-by: Nazar Serhiichuk <[email protected]>
* zhars/fix_deserializon_old_container_on_fail Added deserialization of old container on decryption fail
* zhars/update_python_examples Update python examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, cool!
Found that valid strings with
\n
characters for string types we encode as hex values and then it rendered incorrectly in web apps. In this PR update our check of printable characters that skipped for encoding. Also found that postgresql sends unicode values in another way then simple latin strings (with encoding into hex). But I didn't find how to fix it correctly, so I created T2531 for futher investigations. And this PR just short fix that should fix obvious and frequent cases.Checklist
[ ] Public API has proper documentation in the Acra documentation site or has PR on documentation repositorywith new changes
[ ] CHANGELOG.md is updated (in case of notable or breaking changes)[ ] Benchmark results are attached (if applicable)[ ] Example projects and code samples are up-to-date (in case of API changes)