
Thrift using different serialization protocols #19

Open
ivanprado opened this issue Jan 8, 2013 · 3 comments

Comments

@ivanprado
Contributor

Right now Pangool serializes Thrift objects using TBinaryProtocol. It could be interesting to use TCompactProtocol instead, which produces smaller output. The idea is to make the choice of protocol configurable.
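For illustration, a minimal sketch of why TCompactProtocol is usually smaller: TBinaryProtocol writes an i32 as a fixed 4-byte big-endian value, while TCompactProtocol zigzag-encodes it and then writes a base-128 varint, so small magnitudes take only 1 or 2 bytes. The class `I32EncodingDemo` is hypothetical and re-implements only the integer encoding with plain JDK code; it is not Thrift's implementation and ignores field headers and other types.

```java
import java.io.ByteArrayOutputStream;

public class I32EncodingDemo {
    // TBinaryProtocol style: always 4 bytes, big-endian.
    static byte[] binaryI32(int value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16),
            (byte) (value >>> 8), (byte) value
        };
    }

    // Zigzag maps signed to unsigned so small negatives stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // TCompactProtocol style: zigzag, then a varint with 7 payload bits per
    // byte; the high bit means "more bytes follow".
    static byte[] compactI32(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = zigzag(value);
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 300, 1_000_000, Integer.MIN_VALUE};
        for (int v : samples) {
            System.out.printf("%11d: binary=%d bytes, compact=%d bytes%n",
                v, binaryI32(v).length, compactI32(v).length);
        }
    }
}
```

Values near zero shrink from 4 bytes to 1, while very large magnitudes can grow to 5 bytes, so the space saving depends on the data.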

@epalace
Member

epalace commented Jan 8, 2013

In the case of map output this is easy: the protocol can be specified in the Configuration and read by ThriftSerialization. Sequence files containing Thrift objects in either the key or the value are different, because that can't be managed directly by SequenceFileInputFormat/SequenceFileOutputFormat. New {Input/Output}Formats must be created, and the expected protocol would be specified via the Configuration or via the SequenceFile header. In this case ThriftSerialization couldn't be used, since without object wrappers à la Avro (AvroKey, AvroValue) it can't distinguish whether it is serializing input, map output or output.
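A minimal sketch of the map-output case, assuming a hypothetical configuration key `pangool.thrift.protocol` (not an existing Pangool or Hadoop setting). A plain `Map<String, String>` stands in for Hadoop's `Configuration` so the example runs with only the JDK; in a real ThriftSerialization the resolved value would presumably be used to choose between `new TBinaryProtocol.Factory()` and `new TCompactProtocol.Factory()`.

```java
import java.util.Locale;
import java.util.Map;

public class ThriftProtocolConfig {
    // Hypothetical key; not an existing Pangool or Hadoop property.
    static final String PROTOCOL_KEY = "pangool.thrift.protocol";

    enum Protocol { BINARY, COMPACT }

    // Reads the protocol from the job configuration, defaulting to BINARY
    // so that jobs which set nothing keep the current behaviour.
    static Protocol fromConf(Map<String, String> conf) {
        String name = conf.get(PROTOCOL_KEY);
        if (name == null || name.isBlank()) {
            return Protocol.BINARY;
        }
        try {
            return Protocol.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Fail fast on typos instead of silently falling back.
            throw new IllegalArgumentException(
                "Unknown Thrift protocol '" + name + "' for " + PROTOCOL_KEY, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromConf(Map.of()));                        // BINARY
        System.out.println(fromConf(Map.of(PROTOCOL_KEY, "compact"))); // COMPACT
    }
}
```

Defaulting to binary keeps existing jobs unchanged; only jobs that opt in switch to the compact encoding.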

@ivanprado
Contributor Author

It sounds reasonable.

Iván


Iván de Prado
CEO & Co-founder
www.datasalt.com

@ivanprado
Contributor Author

This could be solved properly by implementing a custom field serializer for Thrift (http://pangool.net/userguide/custom_serialization.html). The field metadata would store the protocol used to serialize that field, and this information would also be carried in the header of the TupleFile.
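A rough sketch of the field-metadata idea, under heavy assumptions: the metadata key `thrift.protocol`, the class `FieldProtocolMetadata`, and the use of a single `int` in place of a whole Thrift object are all hypothetical, and the two encodings are hand-written stand-ins for the protocols (fixed 4-byte vs. zigzag varint), not Thrift or Pangool code. The point is that the writer records the protocol next to the data, and the reader resolves it from that stored metadata rather than from the current job configuration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class FieldProtocolMetadata {
    // Hypothetical metadata key; not part of Pangool's API.
    static final String PROTOCOL_KEY = "thrift.protocol";

    // Writer side: encode the field with the chosen protocol and record that
    // choice in the field's metadata (which would travel in the file header).
    static byte[] write(int value, String protocol, Map<String, String> fieldMeta) {
        fieldMeta.put(PROTOCOL_KEY, protocol);
        if (protocol.equals("compact")) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int n = (value << 1) ^ (value >> 31);       // zigzag
            while ((n & ~0x7F) != 0) {                   // varint, 7 bits per byte
                out.write((n & 0x7F) | 0x80);
                n >>>= 7;
            }
            out.write(n);
            return out.toByteArray();
        }
        return ByteBuffer.allocate(4).putInt(value).array(); // fixed 4-byte big-endian
    }

    // Reader side: trust the metadata stored with the data; data without the
    // key is assumed to have been written with the binary encoding.
    static int read(byte[] bytes, Map<String, String> fieldMeta) {
        String protocol = fieldMeta.getOrDefault(PROTOCOL_KEY, "binary");
        if (protocol.equals("compact")) {
            int n = 0;
            int shift = 0;
            for (byte b : bytes) {
                n |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    break;
                }
                shift += 7;
            }
            return (n >>> 1) ^ -(n & 1);                 // undo zigzag
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        byte[] data = write(-300, "compact", meta);
        System.out.println(meta + " " + data.length + " bytes -> " + read(data, meta));
    }
}
```

Falling back to binary when the key is absent keeps data written before such metadata existed readable.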
