Monday, November 17, 2014

Stream Processing and Mining just got more interesting - O'Reilly Radar



A general-purpose stream processing framework from the team behind Kafka, and new techniques for computing approximate quantiles

Apache Samza: a distributed stream processing framework

Behind Kafka's success as an open source project is a team of savvy engineers who have spent the last three years making it a rock-solid system. The developers behind Kafka realized early on that it was best to place the bulk of data processing (i.e., stream processing) in another system. Armed with specific use cases, the team began work on Samza in earnest about a year ago. While they examined existing streaming frameworks (such as Storm, S4, and Spark Streaming), LinkedIn engineers wanted a system that better fit their needs and requirements.

Just as MapReduce requires a data source (in most cases HDFS), a general-purpose data processing system like Samza requires a source of streams. Out of the box, Samza uses YARN as its fault-tolerant,

Read full article from Stream Processing and Mining just got more interesting - O'Reilly Radar
