Learn Databricks PySpark & Spark SQL with our Notebook

 


In this post, we will share our Databricks Learning Notebook, which contains syntax examples and best practices for the PySpark DataFrame API and Spark SQL.


   This notebook will teach you the basics and how to handle common Data Analyst tasks like missing and duplicate data, column transformations, etc.

   Even though this notebook was created for Data Analyst use cases, anyone who performs data transformations on Databricks will benefit from it.


   You can download the notebook via the link below. The zipped file contains DBC, HTML and Python versions of the same notebook. You can import the DBC file into your workspace to view the notebook.

Click here to Download the Notebook


We divided the Databricks Learning Notebook into 3 main categories:

  1. BASIC DEFINITIONS & LINKS
  2. DATAFRAMES
  3. SPARK SQL

   You can navigate between sections using the side navigator. You can also collapse the headers to keep unused sections from taking up space.


   We also included useful links and courses to learn more about PySpark and Spark SQL. Keep in mind that some of the courses require membership.

















   To avoid using lots of cells, we combined similar code snippets into the same cell and added an empty cell below for running a particular command. Copy the command you want to run (including any dependent code snippets, import statements, etc.), paste it into the empty cell below, and run it.


If you liked this post, please don't forget to share and leave a comment below! 





Use Our Quote Library to Enrich your Presentations

  


In this post, we will share our Quotation Library that we gathered from all around the Internet.

  We also labeled each quotation with its related categories, so every time you need a quotation on a subject, you can easily filter the library and find a suitable one.

These are the categories we have for a total of 727 quotes:

  • Attitude
  • Best Quotes of All Time
  • Brainy
  • Business
  • Change
  • Communication
  • Inspirational Quotes
  • Movies
  • Quotes About Life
  • Road to Success Quotes
  • Teamwork







You can download this quote library via the link below. (You can use slicers if you open it with Excel.)




If you liked this post, please don't forget to share and leave a comment below! 


Get the Latest News with our Knime Workflow

 

 In this post, we will take a look at our Knime project, where we get the latest news from major media outlets using their RSS feeds and combine them in a user-friendly UI.





Click here to Download this project from Official Knime Page



First of all, we connect to the media outlets' RSS feeds using the RSS Feed Reader node.

Then we concatenate all the results into a single table and pass it to the dashboard component.







Inside the dashboard component, we first show the results as a table with a clickable hyperlink to each news item's details. We also provide the media outlet as a slicer for the user.





   The second part of the dashboard component creates the word cloud containing the most common words from the news. In this step, we first preprocess the data and then create a bag of words from the news items. Then we calculate the term frequencies to show in the word cloud visual.
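Outside Knime, the same bag-of-words and term-frequency idea can be sketched in a few lines of plain Python. The feed snippet below is a made-up stand-in for real RSS data:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hypothetical RSS snippet standing in for a real feed download
rss = """<rss><channel>
  <item><title>Markets rally as rates hold</title><link>https://example.com/1</link></item>
  <item><title>Rates hold steady, markets react</title><link>https://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(rss)
titles = [item.findtext("title") for item in root.iter("item")]

# "Bag of words": lowercase, keep letters only, drop very short words
words = [w for t in titles for w in re.findall(r"[a-z]+", t.lower()) if len(w) > 3]
freq = Counter(words)  # term frequencies feeding the word cloud
print(freq.most_common(3))
```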


Here is what the end result looks like:



With just one click, you can get the breaking news and read it from the Knime browser.


If you like this project, please share and comment below. Also, don't forget to check our Knime segment for more projects like this.






Scrape your Competitor's Websites with Advanced Web Scraper







 In this post, we will go over the details of our latest project, the Advanced Web Scraper for H&M Germany Knime Workflow.


   This workflow connects to the H&M Germany website and gets the product category and subcategory information, as well as all the product page URLs and price information, to aggregate at certain hierarchies.

   This template can be used for other retailers. However, since each website's design will be different, some changes will be required.









   For each category on the H&M website, we replicate the same steps. However, when the website gets UI updates, some code changes might be required as well.















   First, we will use the Webpage Retriever and XPath nodes to connect to the website and get the product categories. Then, we will do some transformations to get the price and format the data for our reports.






   We will also calculate the product count at each price level per category to see how the prices are distributed across categories. After these transformations, the data will be ready to be sent to Power BI for further reporting.
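As a rough illustration of that aggregation step, here is a Python sketch that buckets prices into bands and counts products per category and band. The categories and prices are invented, since the real rows come from the XPath nodes:

```python
from collections import Counter

# Hypothetical scraped rows (category, price in EUR), stand-ins for XPath output
products = [
    ("Damen", 9.99), ("Damen", 19.99), ("Damen", 24.99),
    ("Herren", 9.99), ("Herren", 49.99), ("Kinder", 14.99),
]

def band(price):
    # Bucket a price into a 10-EUR band, e.g. 19.99 -> "10-20 EUR"
    low = int(price // 10) * 10
    return f"{low}-{low + 10} EUR"

counts = Counter((cat, band(p)) for cat, p in products)
for (cat, b), n in sorted(counts.items()):
    print(cat, b, n)
```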






By using metanodes, we can wrap and run all of these steps with just one click, and the data will be ready for our Power BI dataset.







We can also export the results to Excel, as shown below.


















If you liked this project, please don't forget to share and leave a comment below!






One Click to get Weather Forecast of your favorite cities


 In this post, we'll go over the Knime Workflow that gets the weather forecast information for the cities you've selected.


 


(This workflow is available for download at BI-FI Business Projects Knime Hub page.)

Click here to visit the download page


   The first step in this workflow is to connect to the weather forecast website and extract the details we are interested in.

(https://www.weather-forecast.com/countries)

We will use Webpage Retriever and XPath nodes in Knime to achieve this.
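The equivalent retrieve-and-XPath step can be sketched in Python's standard library, using ElementTree's limited XPath support. The page fragment below is invented for illustration, not the real markup of the site:

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment standing in for the retrieved forecast page
page = """<div>
  <span class="city">Berlin</span><span class="high">18</span>
  <span class="city">Tokyo</span><span class="high">24</span>
</div>"""

root = ET.fromstring(page)
# XPath-style queries: all spans with a given class attribute
cities = [e.text for e in root.findall(".//span[@class='city']")]
highs = [int(e.text) for e in root.findall(".//span[@class='high']")]
print(dict(zip(cities, highs)))
```

Real HTML is rarely well-formed XML, which is why Knime's Webpage Retriever (or a lenient HTML parser) does the tidying first.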

Knime Workflow







   Then, in the STEP 1 component, we let the user select cities using the Nominal Row Filter Widget. The city list includes all the major cities around the world.








   After the user is done selecting cities, we do the required transformations in the STEP 3 metanode and feed the final data to the dashboard to show the forecast for the selected cities.




Just click the Execute All button in Knime and let the workflow do its magic.


   For more workflows like this, check out the Knime section on our website and also our Knime Hub Profile.

  BI FI Business Knime Hub Profile









Free Courses List For Becoming a Data Analyst - 190 HOURS

    In this post, we will share a 190-hour complete list of free courses that are necessary to become a Data Analyst.

The list consists of courses for:
  • Excel
  • SQL Server
  • Power BI, Power BI - DAX
  • Python, Python - NumPy, Python - Pandas
  • Storytelling
  • Communication Skills

















    We have included Power BI and SQL Server since they are more common. However, if your desired company uses another combination, like MySQL + Tableau, you might want to learn those instead.

    We have also included some soft skills courses, like Storytelling and Communication Skills, to improve these broad skills as well. In most cases, you will need them as much as your software skills.

    You should keep learning about these soft skills to improve your presentation abilities, which will greatly impact your career success.


    While mastering these skills, you should also focus on increasing your Business Domain Knowledge to increase your value in the organization. Business Domain Knowledge includes:
  • Understanding how your organization works in detail
  • Your organization's current problems
  • The important problems you can solve within your area of expertise




    If you are a complete beginner, we recommend starting with Excel and following the pathway on the chart.



 

If you ever feel overwhelmed by how long it will take to learn all these skills, just remember this quote from Tony Robbins:

The only impossible journey is the one you never begin.


 



Data Cleaning Checklist

    In this post, we will explore the details of our Data Cleaning Checklist.


    This checklist will guide you through your data analysis projects by reminding you which actions to do next. 

Our Data Cleaning Checklist consists of 3 main sections: 

Extraction, Cleaning and Profiling

    We have also added a Reminders section to remind you of certain things throughout your analysis.

Click here to download the checklist in Excel File
















Let's break the checklist down.

    Before starting an analysis, it is always recommended to check the personal data guidelines to avoid any potential data privacy problems.

    The next thing you should do is discuss the business needs with your stakeholders to chunk down the problem as much as possible before you start coding. Otherwise, you will spend hours on data profiling, trying to come up with a hypothesis.

    After chunking down the problem and identifying the business needs, you can proceed to the Extraction section. 

    In the Extraction section, you have to find the optimal way to get the data you need from the right sources. Query only the columns you need to avoid messy data.

    After you get your data ready in your preferred platform, whether it is a BI tool or a language like Python, you should focus on:

  • Data Integrity
  • Handling null and duplicate values
  • Formatting columns
  • Handling outliers 
  • Naming conventions
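As a tiny illustration of the duplicate, null, and formatting steps in plain Python (the rows are made up; in practice this would be your BI tool or a library like pandas):

```python
# Hypothetical raw rows illustrating the cleaning checklist steps
rows = [
    {"id": 1, "name": " Alice ", "amount": "10.5"},
    {"id": 2, "name": None, "amount": "7"},
    {"id": 1, "name": " Alice ", "amount": "10.5"},  # exact duplicate
]

# Drop duplicate rows while preserving order
seen, deduped = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Handle nulls and format columns: trim names, cast amounts to float
cleaned = [
    {"id": r["id"], "name": (r["name"] or "unknown").strip(), "amount": float(r["amount"])}
    for r in deduped
]
print(cleaned)
```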

   After the cleaning part of the project, you should proceed with Data Profiling, where you explore the relationships and correlations between columns to create hypotheses to solve your business problems.

   Use statistics and pivot tables, drill down into different hierarchies, explore the outliers, and try to find the root cause of the problem.

   After all this analysis, you can create a Data Story with your findings to present to your stakeholders and explore your projects further with them.



If you like this content, please don't forget to share and leave a comment below!









Ace your Next Job Interview - A Complete Guide for Job Interviews

  
    In our previous post, we shared a Decision Tree that guides you on whether you should apply for a job or not. Click here to read it.

    In this post, we will give you a complete tutorial for each step of a typical interview with checklists and helpful tips.

     We have created this tutorial based on data-related positions; however, the best practices apply to almost any job interview. So stick with us; you might pick up some useful tips as well.

    Usually, most job interview processes for data-related roles start with a Human Resources Specialist interview, where you meet and learn about the role and the process, showcase your personal brand, and leave an impression.

    If you are successful in the previous step, you will most likely be invited to the second step, which is typically with your potential manager and some stakeholders. In this step, you should go into more detail about your experience, your skillset, and how you can bring value to their organization.

    If you are successful at the second step as well, there might be a Case Study step related to your role, where you have to showcase some technical skills as well as your presentation abilities.

    And if everything goes according to plan, you will get an offer for your dream job. 

Don't worry, we are here to help you achieve this with our checklists, to make sure you are well prepared for every step of the process:

  • Before Applying
  • Before the HR Interview
  • Before the Manager Interview
  • During the Interview
  • After the Interview

You can download these checklists in Excel format via the link below:

Click here to Download the Checklist


Please share the content and leave a comment if you find this useful.







If you liked this post, please don't forget to share and leave a comment below!









BEFORE YOU APPLY FOR THE NEXT JOB

     So you have decided to look for job posts, but there are maybe thousands of them that could be a good fit for you, right? And most of them ask you to go to their web page, create a profile, and fill out pages of forms. And you wonder if it's worth all that effort.


    Or you have applied for hundreds of jobs, but they haven't even replied with a rejection e-mail. Well, the chances of your application being filtered out in the process are pretty high.


Knime Date Difference Calculator

   In this post, we will take a look at the Knime Date Difference Calculator created by BI-FI Blogs and explore how to use it. 

   This project is useful for calculating due dates, managing shipment dates, or simply calculating date differences to see:

  • How many days and working days have passed since the Start Date
  • How many weeks have passed since the Start Date
  • How many days and working days are left until the End Date
  • How many weeks are left until the End Date


   This workflow contains flow variable usage, date-time extraction nodes, and advanced syntax for date calculations. You can copy these node configurations into your own projects as well.

   You can download the workflow via the link below. All workflows created by BI-FI Blogs will be available on Knime Hub as well.

Click to Check the Knime Date Difference Calculator















The workflow should look like this. Now let's break it down to see how it works.



First, execute and view STEP 1 and enter the Start Date and End Date for the calculations.




Now, just run the STEP 2 metanode. This metanode creates a date range from your Start and End Dates and calculates the durations for the calculations.








   After that, STEP 3.1 calculates how many days, working days, and weeks have passed. To calculate the working days, we first extract the day number of each date; if the day number is greater than 5 (Friday), that day is labeled as a weekend.
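The same day-number logic can be sketched in Python with `datetime`, where Monday is ISO day 1 and anything above 5 is a weekend. The dates below are hypothetical placeholders for the widget inputs:

```python
from datetime import date, timedelta

start, end = date(2024, 1, 1), date(2024, 1, 15)  # hypothetical Start/End Dates
today = date(2024, 1, 8)                          # stand-in for the execution date

def workdays(a, b):
    # Count days in [a, b) whose ISO day number is 5 or less (Mon=1 .. Fri=5)
    return sum(
        1 for i in range((b - a).days)
        if (a + timedelta(days=i)).isoweekday() <= 5
    )

days_passed = (today - start).days
print(days_passed, workdays(start, today), workdays(today, end))
```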







For STEP 3.2, to calculate how many days are left until the End Date, we need to know the execution date. We use the nodes on the right to find the execution date and pass it as a flow variable for use in the calculations.

   Finally, we can run the dashboard node to see all the calculations at once in the tables below.

   We have explained the workflow details, but the only manual job is entering the dates once. After that, you can run the workflow and it will create the tables for you.



If you liked this project, please don't forget to share and leave a comment below! 




Twitter Scraper Project


Designed by pikisuperstar / Freepik



In this project, we will create a Knime workflow to get the latest tweets from Twitter that contain the keywords we want.



The result will be a table containing the tweets and other information like user name, follower count, and retweet count.


To run the workflow, you will need a Twitter developer account, which is free to get. You can easily apply for it via the link below:

Apply for Twitter Developer Account


After getting the API keys from our Twitter developer account, we will start by downloading the workflow created by BI-FI Blogs (link below).

Download the Workflow Created by BI-FI Blogs


The first thing you should do is open the Twitter API Connector and enter your Twitter developer account details.








Then enter the keyword you would like to search for and click "Apply as New Default".
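Under the hood, such a keyword search boils down to a request against Twitter's search API. Here is a rough sketch of how a v2 recent-search URL could be assembled; the token and keyword are placeholders, and the request is only built, not actually sent:

```python
import urllib.parse

bearer_token = "YOUR_BEARER_TOKEN"  # placeholder: your developer account token
keyword = "knime"                   # placeholder: the keyword from the connector

# Build a Twitter API v2 recent-search request (constructed, not executed here)
params = urllib.parse.urlencode({
    "query": keyword,
    "tweet.fields": "public_metrics,created_at",
    "max_results": 10,
})
url = "https://api.twitter.com/2/tweets/search/recent?" + params
headers = {"Authorization": f"Bearer {bearer_token}"}
print(url)
```

The Knime nodes hide this plumbing; you only supply the credentials and the keyword.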




After that, we will just run the rest of the workflow and it'll do its magic and bring us the latest tweets.

We can also export the results to an Excel file using Step 5.









If you liked this project, please don't forget to share and leave a comment below! 




How To Read and Automate any RSS with Knime

 


   In this post, we will learn how to read an RSS feed and automate this process by using Knime.

Before we dive into this, let's remember what RSS actually means :


RSS is a web feed that allows users and applications to access updates to websites in a standardized, computer-readable format. (Wikipedia)



  In our example, using the RSS feed of our blog, you will be able to see our blog posts with their links and creation dates any time you run it. So you will not miss any new content.


  First, we need the URL of the RSS feed. In our example, we can click the icon in the screenshot below and copy the URL from the tab that opens:

https://bifiblogs.blogspot.com/feeds/posts/default
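If you wanted to read such a feed without Knime, a Python sketch using only the standard library might look like this. Blogspot serves Atom feeds; the entry below is made up for illustration rather than fetched from the live feed:

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom snippet standing in for the downloaded Blogspot feed
atom = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Data Cleaning Checklist</title>
    <published>2022-01-05T10:00:00Z</published>
    <link rel="alternate" href="https://example.com/post-1"/>
  </entry>
</feed>"""

ns = {"a": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(atom)
for entry in root.findall("a:entry", ns):
    title = entry.findtext("a:title", namespaces=ns)
    published = entry.findtext("a:published", namespaces=ns)
    link = entry.find("a:link[@rel='alternate']", ns).get("href")
    print(title, published, link)
```

In a real run, the `atom` string would come from downloading the feed URL above.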










   Then download the Knime workflow created by BI-FI Blogs from the link below, or drag it into your Knime Workflow Editor.

Click to Download the Knime Workflow



   After you open the workflow, it should look like the image above. Now, if you want to use another RSS feed, you can paste that URL into the "Table Creator" node and run the workflow.

   We will continue with the RSS feed of our blog from the first step. From now on, all you have to do is run the entire workflow and read the table view, like the image below.

   You can see all the latest blog posts and their links every time you run the workflow:





If you like this post, please don't forget to share and comment below!






