How it works...

This section explains how the SparkSession works as the entry point for developing in Spark.

  1. Starting with Spark 2.0, it is no longer necessary to explicitly create a SparkConf and SparkContext to begin development in Spark; creating a SparkSession handles that initialization for us. Additionally, it is important to note that SparkSession is part of the pyspark.sql module.
  2. We can assign properties to our SparkSession through its builder (see the sketch after this list):
    1. master: sets the Spark master URL; here we run on our local machine with the maximum available number of cores.
    2. appName: assigns a name to the application.
    3. config: sets spark.executor.memory to 6gb.
    4. getOrCreate: retrieves an existing SparkSession if one is available; otherwise, it creates a new one.
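
Putting these properties together, a minimal sketch of the builder pattern might look like the following (the application name MySparkApp is illustrative; the master URL and memory setting follow the properties described above):

    from pyspark.sql import SparkSession

    # Building the session also initializes the underlying SparkContext
    # and SparkConf, so we never have to create them ourselves.
    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("MySparkApp") \
        .config("spark.executor.memory", "6gb") \
        .getOrCreate()

Because of getOrCreate, running this code a second time in the same process returns the existing session rather than creating a duplicate, which makes the pattern safe to use in both scripts and notebooks.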