Apache Spark is a fast, flexible, open-source distributed engine for large-scale data processing. Originally developed at UC Berkeley's AMPLab, it is now maintained by the Apache Software Foundation. Spark can read data from sources such as HDFS, HBase, Cassandra, and Hive, and it can run in its own standalone cluster mode or on platforms such as Hadoop YARN, Apache Mesos, and the Amazon EC2 cloud.

Step-1 (Add the Java PPA)
# apt-add-repository ppa:webupd8team/java
Step-2 (Install Java)

Update the apt package index

# apt-get update

Install Oracle Java 8 via the installer package

# apt-get install oracle-java8-installer
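
After the installer finishes, it helps to point JAVA_HOME at the new JDK. The path below is the installer's usual default, but treat it as an assumption and adjust it if your JDK lives elsewhere:

```shell
# JAVA_HOME path below assumes the Oracle Java 8 installer's default
# install location; change it if your JDK is elsewhere.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH="$JAVA_HOME/bin:$PATH"
echo "JAVA_HOME is $JAVA_HOME"
```

Append the two export lines to ~/.bashrc if you want them to persist across sessions.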
Step-3 (Install Scala)

Create the Scala directory

# mkdir /opt/scala

Download the Scala package

# wget http://downloads.lightbend.com/scala/2.12.1/scala-2.12.1.deb

Install the Scala package

# dpkg -i scala-2.12.1.deb
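
A quick way to confirm the install worked is to check that the scala binary is now on the PATH. This is just a sanity-check sketch:

```shell
# Confirm the scala binary landed on the PATH after dpkg -i.
if command -v scala >/dev/null 2>&1; then
  echo "scala found at $(command -v scala)"
else
  echo "scala not found on PATH"
fi
```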
Step-4 (Install Apache Spark)

Download the Apache Spark Tarball

# wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz

Extract the Apache Spark Tarball

# tar -xvf spark-2.0.2-bin-hadoop2.7.tgz

Create the /opt/spark directory and copy the extracted files into it

# mkdir /opt/spark
# cp -rv spark-2.0.2-bin-hadoop2.7/* /opt/spark
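
Optionally, set SPARK_HOME and put Spark's bin directory on the PATH so the shell can be launched from anywhere. A minimal sketch, assuming Spark was copied to /opt/spark as above:

```shell
# Convenience environment variables; append these lines to ~/.bashrc
# to make them permanent.
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"
echo "SPARK_HOME is $SPARK_HOME"
```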

Change the directory to access Spark Shell

# cd /opt/spark

Finally, launch the Spark shell (an interactive Scala REPL) from Bash

# ./bin/spark-shell --master local[2]
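
Once the shell comes up, you can also run a quick non-interactive smoke test by piping a one-line job into it. This is a sketch that assumes Spark was copied to /opt/spark as above, with a fallback message if the binary is missing:

```shell
# Pipe a one-line Scala job into spark-shell instead of typing it in.
# sc is the SparkContext the shell creates; summing 1..10 should print 55.0.
SPARK_SHELL=/opt/spark/bin/spark-shell
if [ -x "$SPARK_SHELL" ]; then
  echo 'println(sc.parallelize(1 to 10).sum)' | "$SPARK_SHELL" --master local[2]
else
  echo "spark-shell not found at $SPARK_SHELL"
fi
```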

That’s all for now.
