- Apache Spark Deep Learning Cookbook
- Ahmed Sherif, Amrith Ravindra
- 2025-02-26 11:49:44
Getting ready
There are several ways to create a dataframe in Spark. One common way is to import a .txt, .csv, or .json file. Another is to enter the fields and rows of data manually into a PySpark dataframe. The manual process can be tedious, but it is helpful when dealing with a small dataset. To predict gender based on height and weight, this chapter builds a dataframe manually in PySpark. The dataset used is as follows:

While the dataset will be manually added to PySpark in this chapter, the dataset can also be viewed and downloaded from the following link:
Finally, this chapter and the chapters that follow begin by starting the Spark environment, configured with a Jupyter notebook, that was set up in Chapter 1, Setting up your Spark Environment for Deep Learning. Start it with the following terminal command:
sparknotebook