Spark2, PySpark and Jupyter installation and configuration

Steps for enabling Spark 2, PySpark, and Jupyter on Cloudera clusters.

1. INSTALL ORACLE JDK ON ALL NODES

Download and install Java. It should be JDK 1.8 or later.

# cd /usr/java/
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u144-b15/jdk-8u144-linux-x64.tar.gz"
# tar xzf jdk-8u144-linux-x64.tar.gz

2. INSTALL JAVA WITH ALTERNATIVES

# cd /usr/java
# alternatives …

Creating multiple spark sessions in kerberos enabled cluster throws error

ISSUE: Creating multiple Spark sessions in a Kerberos-enabled cluster throws the error below.

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7519)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:548)

Solution: Use the keytab and principal within the Spark code, as below.

spark = SparkSession\
    .builder\
    .appName('asdf')\
    …
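
The post's own snippet is cut off, so the following is only a minimal sketch of one way a keytab and principal might be handed to the session builder, not the author's code. It assumes Spark 2.x on YARN, where the settings are named spark.yarn.principal and spark.yarn.keytab (later Spark releases use spark.kerberos.principal and spark.kerberos.keytab). The helper names kerberos_conf and build_session, the principal, and the keytab path are hypothetical.

```python
def kerberos_conf(principal, keytab):
    """Return the keytab login settings as a dict.

    Assumption: Spark 2.x on YARN key names. Adjust the keys for other
    Spark versions or cluster managers.
    """
    return {
        "spark.yarn.principal": principal,
        "spark.yarn.keytab": keytab,
    }


def build_session(app_name, principal, keytab):
    """Create (or reuse) a SparkSession configured with the keytab settings."""
    # Imported here so kerberos_conf can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in kerberos_conf(principal, keytab).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Hypothetical values; replace with a real principal and keytab path.
    print(kerberos_conf("user@EXAMPLE.COM", "/path/to/user.keytab"))
```

On a cluster, build_session('asdf', principal, keytab) would be called in place of the plain builder chain. Whether these settings alone resolve the delegation token error depends on the deployment, so treat this as a starting point.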