Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL encoding data types #517

Merged
merged 50 commits into from
Apr 7, 2022
Merged

MySQL encoding data types #517

merged 50 commits into from
Apr 7, 2022

Conversation

Zhaars
Copy link
Contributor

@Zhaars Zhaars commented Mar 29, 2022

Basic implementation and code refactoring regarding data type encoding in MySQL based on @Lagovas PR for PostgreSQL.

Checklist

wrap encoding functions with wrapper that do nothing with valid strings
user struct instead of interface for QueryDataItem to avoid extra nil checks with reflection
weren't used by any component and just produce unused code and complicate
reading
…ing/decoding

operations over interesting data for decryption/detokenization received
from database
extend encryptor, now find ColumnEncryptionSettings for queries from Parse
Packet with placeholders and save it in ClientSession to use it in
ParameterDescription packet followed by Parse packet (and also for
Bind packet but there we already have query to re-parse it)
update OID values according to ColumnEncryptionSetting
change value encoding flow for encrypted integers
Remove encoding/decoding logic from Column struct, use only
from PostgresqlEncoderDecoder and BoundValue encodings
- don't use DecodedData struct
- store raw data on protocol handler level
- encode/decode data in OnColumn handler on higher level
fix parameter description updates
Zhaars added 4 commits March 29, 2022 13:20
basic implementation of encoding data types for MySQL
updated columnInfo type with origin type tracking
# Conflicts:
#	decryptor/postgresql/data_encoder.go
#	decryptor/postgresql/data_encoder_test.go
#	decryptor/postgresql/pg_decryptor.go
#	decryptor/postgresql/proxy.go
#	encryptor/config/common/encryptedTypes.go
#	encryptor/config/encryptionSettings.go
#	encryptor/config/schemaStore_test.go
#	encryptor/dbDataCoder.go
#	encryptor/queryDataEncryptor.go
#	tests/test.py
#	utils/escape_format.go
#	utils/utils.go
@Zhaars Zhaars requested a review from Lagovas March 29, 2022 17:10
restore deleted test file
intValue, err := strconv.ParseInt(string(data), 10, 32)
if err != nil {
if newVal := setting.GetDefaultDataValue(); newVal != nil {
intValue, err = strconv.ParseInt(*newVal, 10, 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
intValue, err = strconv.ParseInt(*newVal, 10, 64)
intValue, err = strconv.ParseInt(*newVal, 10, 32)


case common.EncryptedType_Int64:
encoded := make([]byte, 8)
intValue, err := strconv.ParseInt(string(data), 10, 32)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
intValue, err := strconv.ParseInt(string(data), 10, 32)
intValue, err := strconv.ParseInt(string(data), 10, 64)

return data, false, err
}
}
// TODO: ask Lagovas what to do in case of error casting to defined type.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do nothing. return value inappropriate to type as it would be with encrypted binary data in case of expected integer.
at the future we will return DB specific error using wire protocol error notifications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed to implement rollaback type logic

var n int
var err error
//var n int
//var err error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont' forget to remove

// means that we set encrypted data
if !bytes.Equal(m.textData, newData) {
m.paramType = TypeBlob
m.textData = newData
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, so now it's not only textData but binary too. maybe we should rename textData to data?

@@ -187,7 +187,11 @@ func (m *mysqlBoundValue) Copy() base.BoundValue {

// SetData set new value to BoundValue using ColumnEncryptionSetting if provided
func (m *mysqlBoundValue) SetData(newData []byte, setting config.ColumnEncryptionSetting) error {
m.textData = newData
// means that we set encrypted data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... for now it is used by QueryEncryptor uses it after encryption/tokenization. but what about if some else component will use it and there will be not only encrypted? can we somehow check that it is encrypted and we should change type?

WithError(err).WithField("field_index", columnIndex).
Errorln("Failed to process column data")
return nil, err
func (handler *Handler) extractProcessData(pos int, rowData []byte, field *ColumnDescription) ([]byte, int, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe name it just extractData? because two verbs a bit discourage

Zhaars added 3 commits April 5, 2022 17:18
Added intergration tests
Delete redundant changes
var ErrConvertToDataType = errors.New("error on converting to data type")

// BaseMySQLDataEncoderProcessor implements processor and encode/decode binary intX values to text format which acceptable by Tokenizer
type BaseMySQLDataEncoderProcessor struct{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about BaseMySQLDataEncoderProcessor rename to BaseMySQLDataProcessor, EncodeMySQLDataEncoderProcessor -> MySQLDataEncoderProcessor and DecodeMySQLDataEncoderProcessor -> MySQLDataDecoderProcessor?
because Encode...EncoderProcessor and Decode...EncodeProcessor looks strange)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree!

return data, false, err
}
return PutLengthEncodedString(binValue), true, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we dont return here ErrConvertToDataType in case when it wasn't decrypted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Currently we dont rollback fields type for text protocol. Do we have to?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we discussed, we decided to rollback too. so we should return here same error?

@@ -515,13 +502,13 @@ func (handler *Handler) processTextDataRow(ctx context.Context, rowData []byte,
if err != nil {
return nil, err
}
value, err = handler.onColumnDecryption(ctx, i, value)
_, value, err = handler.onColumnDecryption(ctx, i, value, false, fields[i])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we ignore here ctx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question - we are not rolling back types in case of conversion failure for text protocol

@@ -8596,6 +8715,7 @@ def testClientIDRead(self):
self.assertEqual(data[column], row[column])
else:
self.assertNotEqual(data[column], row[column])
self.assertEqual(row[column], default_expected_values[column])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks)

break
}
}
handler.logger.WithField("column_index", i).Debugln("Parse field")
field, err := ParseResultField(fieldPacket.GetData())
field, err := ParseResultField(fieldPacket, handler.setting.TableSchemaStore())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about to split Parsing and updating type of field? or rename to something obvious that show the fact that field's type will be changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to rename. It looks like confusion

Zhaars added 3 commits April 6, 2022 13:49
# Conflicts:
#	encryptor/config/encryptionSettings.go
Merge master changes
@Zhaars Zhaars marked this pull request as ready for review April 6, 2022 17:42
updated CHANGELOG_DEV.md file
@Zhaars Zhaars requested a review from Lagovas April 7, 2022 08:27
Added addinonal logging/comments
return ctx, PutLengthEncodedString(data), nil
}

var columnType = columnInfo.DataBinaryType()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's assign this variable a bit below close to the usage block? because read this row and tried to find where it used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, makes sense.


case TypeTiny:
encoded = make([]byte, 1)
intValue, err := strconv.ParseInt(string(data), 10, 8)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about to use BytesToString?)
looks like safe place for accessing data without concurrent access during parsing
same below for other int types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea!

return data, false, err
}
return PutLengthEncodedString(binValue), true, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we discussed, we decided to rollback too. so we should return here same error?

if err != nil {
break
}
return ctx, []byte(strconv.FormatInt(int64(numericValue), 10)), nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about to use strconv.AppendInt? It helps avoid extra conversion from string to bytes:
strconv.FormatInt(nil, int64(numericValue), 10)
same below

Copy link
Contributor Author

@Zhaars Zhaars Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, didnt know about these methods before

return data, nil
case sqlparser.IntVal:
// if data was just tokenized, so we return it as is because it is valid int literal
if _, err := strconv.Atoi(string(data)); err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like BytesToString

@Lagovas
Copy link
Collaborator

Lagovas commented Apr 7, 2022

in general looks great except question about returning error in binary encoding. improvements related to type conversions are questionable)

if err != nil {
handler.logger.WithField(logging.FieldKeyEventCode, logging.EventCodeErrorProtocolProcessing).WithError(err).Errorln("Can't parse result field")
return err
}
// updating filed type according to DataType provided in schemaStore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// updating filed type according to DataType provided in schemaStore
// updating field type according to DataType provided in schemaStore

Zhaars added 3 commits April 7, 2022 14:18
Fixed failing tests
replace string() convertion to more efficient utils.BytesToString()
@Zhaars Zhaars requested a review from Lagovas April 7, 2022 14:13
Copy link
Collaborator

@Lagovas Lagovas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

replace string() convertion to more efficient utils.BytesToString()
@Zhaars Zhaars merged commit 9a71cd1 into master Apr 7, 2022
@Lagovas Lagovas deleted the zhars/mysql-encoding-data-types branch May 19, 2022 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants