Dealing with nulls in Apache Spark
1 min read · Sep 12, 2019
As a best practice, always use null to represent missing or empty data in a DataFrame. The main reason is that Spark can optimize operations on null values better than it can on placeholder values such as empty strings.
The primary way of interacting with null values in a DataFrame is the .na subpackage, which exposes methods for dropping or filling null values.
By default, the Spark CSV reader parses blank fields and empty strings into a DataFrame as null.