Open Source Apache Spark is fast becoming the de facto standard for Big Data processing and analytics. It’s an ‘in-memory’ data processing engine, utilising the distributed computing power of 10’s or even 1000’s of logical linked host machines (cluster). It’s able to crunch through vast quantities of both structured and unstructured data. You can easily scale out your cluster as your data appetite grows.
In addition to this it can also be used as a data federation layer spanning both traditional databases as well as other popular big data platforms, such as Hadoop HDFS, Hadoop Hbase, Cassandra, Amazon Redshift and S3, to name a few.
In addition to this it can also be used as a data federation layer spanning both traditional databases as well as other popular big data platforms, such as Hadoop HDFS, Hadoop Hbase, Cassandra, Amazon Redshift and S3, to name a few.