Spark2, PySpark and Jupyter installation and configuration

Steps to be followed for enabling Spark 2, PySpark and Jupyter on Cloudera clusters.


  1. Download and install Java. It should be JDK 1.8+.

# cd /usr/java/
# wget --no-cookies --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" ""

# tar xzf jdk-8u144-linux-x64.tar.gz


2. Install Java with alternatives

# cd /usr/java
# alternatives --install /usr/bin/java java /usr/java/jdk1.8.0_144/bin/java 2
# alternatives --config java
There are 3 programs which provide 'java'.

  Selection    Command
* 1            /opt/jdk1.7.0_60/bin/java
+ 2            /opt/jdk1.7.0_72/bin/java
  3            /usr/java/jdk1.8.0_144/bin/java

Enter to keep the current selection[+], or type selection number: 3 [Press Enter]

3. Check the Java version

# java -version

java version "1.8.0_144"

Java(TM) SE Runtime Environment (build 1.8.0_144-b01)

Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)



Download the Anaconda Python installer and give it execute permission as below.


# chmod 755 <anaconda-installer>.sh

Run the installer; it will ask you to accept the license and specify the installation location. Take note of the installation location.


  1. Make sure that the requirements of Spark2 (CDH, Cloudera Manager and JDK versions) are met.
  2. Download the Spark2 CSD and place it in the configured CSD location.
  3. Set the file ownership to cloudera-scm:cloudera-scm with permission 644.
  4. Restart the Cloudera Manager Server:

           # service cloudera-scm-server restart

  5. Log into the Cloudera Manager Admin Console and restart the Cloudera Management Service.
  6. Download the Spark2 parcel, distribute the parcel to the hosts in your cluster, and activate the parcel.
  7. Add the Spark 2 service to your cluster.
  8. Restart the stale services in the cluster.
  9. Test spark and pyspark.
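The ownership and permission requirement for the CSD file can be sanity-checked with a short script. This is an illustrative sketch only: it demonstrates the mode check on a temporary file, since the real CSD path varies per cluster (checking the cloudera-scm owner would additionally need the `pwd`/`grp` modules).

```python
import os
import stat
import tempfile

def check_mode(path, expected=0o644):
    """Return True if the file's permission bits match the expected mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected

# Demonstrate on a temporary file; substitute the real CSD jar path
# on your Cloudera Manager host.
with tempfile.NamedTemporaryFile(delete=False) as f:
    tmp = f.name
os.chmod(tmp, 0o644)
print(check_mode(tmp))   # True
os.remove(tmp)
```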


# spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster \
  --master yarn \
  $SPARK_HOME/examples/lib/spark-examples_version.jar 10
$ hdfs dfs -mkdir /user/systest/spark
$ pyspark2
SparkSession available as 'spark'.
>>> strings = ["one", "two", "three"]
>>> s2 = sc.parallelize(strings)
>>> s3 = s2.map(lambda word: word.upper())
>>> s3.collect()
['ONE', 'TWO', 'THREE']
>>> s3.saveAsTextFile('hdfs:///user/systest/spark/canary_test')
>>> quit()
$ hdfs dfs -ls /user/systest/spark
Found 1 items
drwxr-xr-x   – systest supergroup          0 2016-08-26 14:41 /user/systest/spark/canary_test
$ hdfs dfs -ls /user/systest/spark/canary_test
Found 3 items
-rw-r--r--   3 systest supergroup          0 2016-08-26 14:41 /user/systest/spark/canary_test/_SUCCESS
-rw-r--r--   3 systest supergroup          4 2016-08-26 14:41 /user/systest/spark/canary_test/part-00000
-rw-r--r--   3 systest supergroup         10 2016-08-26 14:41 /user/systest/spark/canary_test/part-00001
$ hdfs dfs -cat /user/systest/spark/canary_test/part-00000
$ hdfs dfs -cat /user/systest/spark/canary_test/part-00001
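The part file sizes in the listing above line up with the uppercased strings plus a trailing newline per record; a quick local sketch of the same transformation (no cluster needed) makes that concrete, assuming the two partitions shown above split the three words as one and two records:

```python
# Local sketch of the canary test's map step: uppercase each word.
strings = ["one", "two", "three"]
upper = [w.upper() for w in strings]
print(upper)  # ['ONE', 'TWO', 'THREE']

# saveAsTextFile writes one record per line, so the first part file
# holds "ONE\n" (4 bytes) and the second "TWO\nTHREE\n" (10 bytes) --
# matching the sizes in the hdfs dfs -ls output above.
print(len("ONE\n"), len("TWO\nTHREE\n"))  # 4 10
```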



1. Enable Python 3.6.

2. Open your .bash_profile:  # vim .bash_profile

3. Add the line PATH=/data/anaconda3/bin:$PATH:$HOME/bin

4. Comment out the existing PATH line.

5. The final entries will be as below.

       # PATH=$PATH:$HOME/bin
       PATH=/data/anaconda3/bin:$PATH:$HOME/bin
       export PATH
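Prepending the Anaconda bin directory is what makes the shell resolve `python` to the Anaconda interpreter instead of the system one. A small sketch of that lookup order (the system PATH shown is just an example; /data/anaconda3 is the install prefix used in this guide):

```python
import os

# Simulate the .bash_profile change: prepending the Anaconda bin
# directory means the first PATH entry wins the lookup for "python".
old_path = "/usr/local/bin:/usr/bin"                       # example system PATH
new_path = "/data/anaconda3/bin" + os.pathsep + old_path

entries = new_path.split(os.pathsep)
print(entries[0])   # /data/anaconda3/bin -- searched first
```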


6.  Generate jupyter config


# jupyter notebook --generate-config

7.   Open the Jupyter config, add the IP, port and password, and set it not to open a browser by default.
c.NotebookApp.open_browser = False
c.NotebookApp.password = u'sha1:b590ff3593c9:c469e487d6d4e4650677b318t8dedffec7be35db'
c.NotebookApp.ip = ''
c.NotebookApp.port = 6090


The password is a hashed one and can be generated as below.

from IPython.lib import passwd
password = passwd("secret")
password
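For reference, the `sha1:salt:digest` string that `passwd()` returns can be reproduced with the standard library alone. This is a minimal sketch of the same scheme (sha1 over passphrase + salt); the salt here is fixed for illustration, whereas `passwd()` draws a random one each time:

```python
import hashlib

def hash_password(passphrase, salt):
    # Same serialization passwd() produces: "sha1:<salt>:<hexdigest>",
    # where the digest is sha1 over the passphrase concatenated with the salt.
    digest = hashlib.sha1(passphrase.encode("utf-8") + salt.encode("ascii")).hexdigest()
    return ":".join(("sha1", salt, digest))

hashed = hash_password("secret", "b590ff3593c9")  # fixed salt, for illustration only
print(hashed)
```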
8.   Launch jupyter

# jupyter notebook --config .jupyter/ &

6. PySpark from Jupyter

Log into Jupyter, create a normal Python 3 notebook, and issue the commands below.

import os

import sys

os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2"

os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"

sys.path.insert(0, os.environ["PYLIB"] + "/")

sys.path.insert(0, os.environ["PYLIB"] + "/")
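In a typical Cloudera Spark2 install, the two `sys.path.insert` lines point at the archives under `$SPARK_HOME/python/lib` (a `pyspark.zip` and a py4j source zip whose exact version depends on the parcel). A sketch of the same setup with placeholder names, so the path arithmetic can be checked outside the cluster:

```python
import os
import sys

# Placeholder SPARK_HOME; on a real cluster use the parcel path above.
spark_home = "/opt/cloudera/parcels/SPARK2/lib/spark2"
pylib = spark_home + "/python/lib"

# Archive names are illustrative -- the py4j zip carries a version suffix
# that varies with the Spark2 parcel installed.
for archive in ("pyspark.zip", "py4j-src.zip"):
    sys.path.insert(0, os.path.join(pylib, archive))

print(sys.path[0])   # the py4j zip path, inserted last so it is first
print(sys.path[1])   # the pyspark.zip path
```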


Successful execution of these commands confirms that PySpark is usable from Jupyter.

