How it works...

This section explains how each transformation is applied to the dataframe.

  1. The functions module in pyspark.sql provides conditional functions, such as when(), that can be used to apply if-then transformations to columns in a Spark dataframe. In our case, we are converting Female to 0 and Male to 1.
  2. The numeric conversion is applied to the Spark dataframe using the .withColumn() transformation, which adds a new column or replaces an existing column of the same name.
  3. The .select() method of a Spark dataframe works like a traditional SQL SELECT, returning the requested columns in the order specified.
  4. A final preview of the dataframe will display the updated dataset, as seen in the following screenshot: