Don't escape space characters in strings #525

Lagovas · 2022-04-13T21:05:53Z

Found that valid strings with \n characters for string types we encode as hex values and then it rendered incorrectly in web apps. In this PR update our check of printable characters that skipped for encoding. Also found that postgresql sends unicode values in another way then simple latin strings (with encoding into hex). But I didn't find how to fix it correctly, so I created T2531 for futher investigations. And this PR just short fix that should fix obvious and frequent cases.

Checklist

Change is covered by automated tests
The coding guidelines are followed
[ ] Public API has proper documentation in the Acra documentation site or has PR on documentation repository
with new changes
~~[ ] CHANGELOG.md is updated (in case of notable or breaking changes)~~
CHANGELOG_DEV.md is updated
~~[ ] Benchmark results are attached (if applicable)~~
~~[ ] Example projects and code samples are up-to-date (in case of API changes)~~

G1gg1L3s · 2022-04-15T12:57:32Z

Found that valid strings with \n characters for string types we encode as hex values and then it rendered incorrectly in web apps.

Could you provide an example of the full workflow where the problem occurs and for what reason? I just don't quite understand what do you mean

Lagovas · 2022-04-15T13:53:15Z

Could you provide an example of the full workflow where the problem occurs and for what reason? I just don't quite understand what do you mean

first time I found it in our rubygems example:

There is hex value:

>>> unhexlify(b'2d2d2d205b5d0a')
b'--- []\n'

And that is how it looks like with correct encoding:

Empty value filtered on the app's side.
There is Acra encoded in such way due to \n symbol which is not printable

G1gg1L3s · 2022-04-15T14:29:02Z

Ok, thank you for explanation! So my question is, are there any issues with other control characters? I saw \n and \t in tests. But what about for example \r? Maybe it's better to just check that slice is valid utf8, and if not, use hex encoding? At least this complies with postgres' varchar type.

…-space-characters

Lagovas · 2022-04-15T14:40:00Z

Ok, thank you for explanation! So my question is, are there any issues with other control characters? I saw \n and \t in tests. But what about for example \r? Maybe it's better to just check that slice is valid utf8, and if not, use hex encoding? At least this complies with postgres' varchar type.

Now it is checks "SPACE" group of unicode symbols and ignores other control symbols.

And now it checks for utf8 strings, spaces and \\. Otherwise encodes into the hex.

G1gg1L3s · 2022-04-18T11:45:07Z

I am not sure if it work correctly. For example, I've configured two tables: encrypted and plain with one text field. Then I insert data with one backslash. Here are the results:

On the left there is a connection to acra, and on the right direct connection to postgres. As you can see, the Acra encoded plain results into hex, and encrypted removes \ from decrypted results entirely (checked in wireshark).

Maybe we should revise and somehow document the encoding/decoding strategy?

don't encode string values

…-space-characters

return "str" values as is without encoding

Lagovas · 2022-04-27T23:28:29Z

so, I debugged how postgres sends string and binary values. And yes, if data has text type then postgres doesn't encode it and sens as is. Even if it is not unprintable control characters but valid utf8.
But if its binary data, then it sends in the escaped format (hex/octal)

Postgres knows types in the database but doesn't know type on application side. For example if app operates with str types, in postgres it is stored as binary. So anyway it comes to acra-server as binary values. But after decryption we should send forwared it as string values without any encodings, otherwise encode as binary data.

Also I met a problem with encoding binary data when we don't know real type. When we received data that not related to any encryptor_config setting. Acra decrypts acrastructs/acrablocks transparently without any encryptor_config because data can be stored directly from app to database with acrawriter. Or data can be encrypted with AcraTranslator and also saved directly to database. So on Acra side we try all data to decode, match to known signatures, try to decrypt and then return encoded decrypted data or as is. So I added saving value as is to return in same representation in case if it was already decoded but not decrypted, and acra doesn't have info about type.

G1gg1L3s · 2022-04-28T09:16:01Z

encryptor/dbDataCoder.go

+		case sqlparser.StrVal:
+			// if it is valid string, return as is. otherwise there is binary data, and we should encode it
+			if utils.IsPrintablePostgresqlString(data) {
+				return data, nil
+			}
+			val.Type = sqlparser.PgEscapeString
+			fallthrough
+		case sqlparser.PgEscapeString:
+			// valid strings we pass as is without extra encoding
+			if utils.IsPrintablePostgresqlString(data) {
+				return data, nil
+			}
 			return PgEncodeToHexString(data), nil


These two cases look almost the same, maybe we can combine them back?

they are not the same. in case with StrVal we check that it is printable string. If so, we return it as string literal as is. If it came as select 'some string', then we return 'some string'. Otherwise we change type to PgEscapeString that will return as E'some string'. We return as PgEscapeString because with such format we can encode not printable characters as hex or oct literals.

When we fall below - yes, we check second time with IsPrintable. But it will fail and encode string.
Or we can return PgEncodeToHexString in case with StrVal too. But I prefer to change type and use fallthrough for a future if case with PgEscapeString will process it in more complicated or different way. Then we should not to maintain copy-paste

G1gg1L3s · 2022-04-28T09:22:42Z

decryptor/postgresql/data_encoder_test.go

+		// string values can contain hex values and should be returned as is
+		{input: []byte("\\xTT"), decodedData: []byte("\\xTT"), encodedData: []byte("\\xTT"), decodeErr: hex.InvalidByteError('T'), encodeErr: nil,
+			setting: &config.BasicColumnEncryptionSetting{DataType: "str"}},
+
 		// invalid binary hex value that should be returned as is. Also encoded into hex due to invalid hex value


Leftover confusing comment detected :)

nice catch, thanks. will update it in the second copy-pasted row too

G1gg1L3s · 2022-04-28T09:44:32Z

decryptor/postgresql/data_encoder.go

+	if !base.IsDecryptedFromContext(ctx) {
+		encodedValue, ok := getEncodedValueFromContext(ctx)
+		if ok {
+			return ctx, &identityValue{encodedValue}, nil
+		}
+	} else {
+		return ctx, &byteSequenceValue{seq: data}, nil
+	}
+	return ctx, &identityValue{data}, nil


At first I thought that 96th line was unreachable, but it isn't. Though, can we swap branches in this if statement to simplify it (I'm talking about checking if base.IsDecryptedFromContext(ctx) first). I think It would be easier to perceive and it would make branches more explicit.

agree, will reverse it

utils/escape_format.go

G1gg1L3s · 2022-04-28T10:42:15Z

decryptor/postgresql/data_encoder.go

+	if !base.IsDecryptedFromContext(ctx) {
+		encodedValue, ok := getEncodedValueFromContext(ctx)
+		if ok {
+			return ctx, &identityValue{encodedValue}, nil
+		}
+	} else {
+		return ctx, &byteSequenceValue{seq: data}, nil
+	}
+	return ctx, &identityValue{data}, nil


Also, correct me if I am wrong.
If the decoding was successful in decodeText we save the result into the context. Then here, in the encoding function, if the data type is not known, we check if data was decrypted. And if it wasn't, we try to return that saved decoded value.

But, as I understand, if the decryptor handlers cannot decrypt the ciphertext, they will return it as is. So, theoretically, the decoded and saved value should not change. It means that in any case encodedValue must always be equal to data. So, my question is, in what cases that isn't true?

how acra processes data from database: it has 3 handlers.

decoder

decryptor

encoder
When it receive data from database, we should decode it from encoded format into raw binary. For example binary string 'hello' database will send as bytea data: \x68656c6c6f (in hex format). On our side we should decode it into raw binary value: 'hello' and pass it to our handlers that decrypt or detokenize data.
Okay, we had decrypted or did nothing with this data. Anyway to pass forward we should encode it in correct format to application. If we didn't do anything, we should pass it as is in hex format \x68656c6c6f. If we decrypted data, we look on data_type. If it is "str" then we don't need to encode. If we don't have data_type then pass it as binary as it came to us and encode to hex format.

In function above we describe 3 step: encoder. Decide what to do: encode or don't encode.

G1gg1L3s · 2022-04-28T10:52:32Z

utils/escape_format.go

-// IsPrintablePostgresqlString returns true if it's valid ASCII or utf8 string except '\' character that used as escape
-// character in strings
+// IsPrintablePostgresqlString returns true if it's valid utf8 string
+// except '\' character that used as escape character in strings


Another confusing comment detected :)

yeah, thanks. removed second check at end after extra checks that it doesn't need anymore

Co-authored-by: Nazar Serhiichuk <[email protected]>

* zhars/fix_deserializon_old_container_on_fail Added deserialization of old container on decryption fail

* zhars/update_python_examples Update python examples

G1gg1L3s

Alright, cool!

Lagovas added 2 commits April 13, 2022 23:31

don't escape space characters in strings

12aeaab

update changelog_dev.md

ba17536

Lagovas requested review from iamnotacake, G1gg1L3s and Zhaars April 13, 2022 21:05

allow unicode strings except control symbols and escape symbols

82bff7a

Lagovas mentioned this pull request Apr 15, 2022

Use type awareness in rubygems example cossacklabs/acra-engineering-demo#49

Merged

Merge remote-tracking branch 'origin/master' into lagovas/dont-escape…

785e89e

…-space-characters

Lagovas added 4 commits April 26, 2022 02:23

allow utf8 characters

27dc868

don't encode string values

Merge remote-tracking branch 'origin/master' into lagovas/dont-escape…

d15be89

…-space-characters

always encode "bytes" values as valid bytea

2ae550d

return "str" values as is without encoding

remove redundant check

aae93e1

G1gg1L3s reviewed Apr 28, 2022

View reviewed changes

Lagovas and others added 6 commits April 28, 2022 16:29

Update utils/escape_format.go [skip ci]

7f81eb0

Co-authored-by: Nazar Serhiichuk <[email protected]>

reverse if condition to simplify reading

8e8a347

Deserialization old containers on decryption fail (#528)

79dcf71

* zhars/fix_deserializon_old_container_on_fail Added deserialization of old container on decryption fail

Update python examples (#529)

f1409c0

* zhars/update_python_examples Update python examples

update comments according to described rows/functions

31c1e0e

reformat "if" statement due to linter suggestions and updated comment

5f381cc

Lagovas requested a review from G1gg1L3s April 28, 2022 13:43

G1gg1L3s approved these changes Apr 28, 2022

View reviewed changes

Lagovas merged commit f1cb21a into master Apr 29, 2022

Lagovas deleted the lagovas/dont-escape-space-characters branch April 29, 2022 12:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't escape space characters in strings #525

Don't escape space characters in strings #525

Lagovas commented Apr 13, 2022

G1gg1L3s commented Apr 15, 2022

Lagovas commented Apr 15, 2022

G1gg1L3s commented Apr 15, 2022

Lagovas commented Apr 15, 2022

G1gg1L3s commented Apr 18, 2022

Lagovas commented Apr 27, 2022

G1gg1L3s Apr 28, 2022

Lagovas Apr 28, 2022

G1gg1L3s Apr 28, 2022

Lagovas Apr 28, 2022

G1gg1L3s Apr 28, 2022

Lagovas Apr 28, 2022

G1gg1L3s Apr 28, 2022

Lagovas Apr 28, 2022

G1gg1L3s Apr 28, 2022

Lagovas Apr 28, 2022

G1gg1L3s left a comment

Don't escape space characters in strings #525

Don't escape space characters in strings #525

Conversation

Lagovas commented Apr 13, 2022

Checklist

G1gg1L3s commented Apr 15, 2022

Lagovas commented Apr 15, 2022

G1gg1L3s commented Apr 15, 2022

Lagovas commented Apr 15, 2022

G1gg1L3s commented Apr 18, 2022

Lagovas commented Apr 27, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

G1gg1L3s left a comment

Choose a reason for hiding this comment