Pangool Streaming? #6
Comments
The current parameters of Hadoop Streaming are:
The reduce script receives all the data ungrouped, so the script is responsible for detecting changes in the key and building the groups manually. It seems we could configure the streaming job to let the user define the group-by and sort-by options. The reduce and combiner scripts would then be called once per group. That could be inefficient, as the startup and shutdown times of the scripts can be significant. On the other hand, it may be useful. We could also allow the user to provide an intermediate schema, so that text is translated into Tuples after the mapper. That allows:
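To make the manual grouping described above concrete, here is a minimal sketch of a plain Hadoop Streaming reducer in Python. It assumes the default streaming format of one `key<TAB>value` pair per line, already sorted by key; the function names `parse_line`, `reduce_stream` and `format_counts` are hypothetical and not part of Pangool or Hadoop.

```python
from itertools import groupby


def parse_line(line):
    """Split one streaming line into (key, value) at the first tab."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def reduce_stream(lines):
    """Yield (key, values) for each run of consecutive lines sharing a key.

    Assumption: the input is sorted by key, as Hadoop Streaming delivers it
    to the reducer, so a change of key marks the start of a new group.
    """
    pairs = (parse_line(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def format_counts(lines):
    """Example reduce logic: emit 'key<TAB>count' for every group."""
    for key, values in reduce_stream(lines):
        yield f"{key}\t{len(values)}"
```

In a real reducer script one would feed `sys.stdin` to `format_counts` and print each result. The grouping boilerplate in `reduce_stream` is what a configurable group-by option would make unnecessary.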
Sorry, I don't get what "Hadoop Streaming" has to do with Pangool. In my mind, one uses Hadoop Streaming for reasons orthogonal to those for using Pangool or Java MapRed. Unless you can elaborate on why this is useful, I don't see it. There are already very good APIs on top of Hadoop Streaming, such as the Python MapRed APIs.
It could make sense at some point to build some kind of "Hadoop Streaming". Anyway, I don't see that as a big priority, so I would close the ticket. 2013/10/1 Pere Ferrera [email protected]
Iván de Prado
Is Pangool able to work with Hadoop Streaming?