Learn Databricks PySpark & Spark SQL with our Notebook

 


In this post, we share our Databricks Learning Notebook, which covers syntax and best practices for the PySpark DataFrame API and Spark SQL.


   This notebook teaches the basics and shows how to handle common data analyst tasks such as missing data, duplicate data and column transformations.

   Although this notebook was created for data analyst use cases, anyone who performs data transformations on Databricks will benefit from it.


   You can download the notebook via the link below. The zipped file contains the same notebook in DBC, HTML and Python file formats. Import the DBC file into your workspace to view the notebook.

Click here to Download the Notebook


We divided the Databricks Learning Notebook into three main sections:

  1. BASIC DEFINITIONS & LINKS
  2. DATAFRAMES
  3. SPARK SQL

   You can navigate between sections using the side navigator. You can also collapse headers so that sections you are not using do not take up space.



















   We also included useful links and courses for learning more about PySpark and Spark SQL. Keep in mind that some of the courses require a membership.

















   To avoid using lots of cells, we combined similar code snippets in the same cell and added an empty cell below it for running a particular command. Copy the command you want to run (including any dependent code snippets, import statements, etc.), paste it into the empty cell below, and run it.
























If you liked this post, please don't forget to share and leave a comment below! 



