Apache Storm, Apache Kafka, and Apache Spark (Streaming) as parts of a real-time processing engine; they work together very well and make for an easy development experience while remaining performant.

About Spark

Apache Spark is a general-purpose, large-scale data processing engine. It was recently promoted to a full (top-level) Apache project and is under very active development. As of this writing, Spark is at version 1.0.2, and 1.1 is expected soon. Spark is intended as a drop-in replacement for Hadoop MapReduce that offers improved performance. Combine Spark with its related projects and libraries, among them Spark SQL (the successor to Shark), Spark Streaming, Spark MLlib, and GraphX, and a very capable and promising processing stack emerges.

Spark can read from HBase, Hive, Cassandra, and any HDFS data source, and many external libraries enable consuming data from still more sources, e.g.,
Read full article from Hadoop Real Time Analytics: Twitter Stream Sentiment Analysis with Apache Storm and Apache Kafka | Analytcz