Data Import and Basic Cleaning With Google Sheets
: Entry Level Task

Data Import and Basic Cleaning With Google Sheets : Entry Level Task

The steps to import and clean data using google sheets

Hello Hashnoders! Today we would be looking at how to import and clean data using google sheets. The test data we would use is gotten from Entry Level - A site that helps provide Data Analysts with real-world experience and assistance in building a portfolio.

ACCOUNT CREATION

The first thing you want to do is create an account with Google drive

Screenshot (198)_LI.jpg

  1. Visit the link to create an account

  2. Click "Go to Drive"

  3. Fill in necessary details and voila account creation successful

Skip all steps and simply login to google drive if you already have an account.

CREATE GOOGLE SHEET

After successfully entering the google drive home page, click on New

Screenshot (199)_LI.jpg

Then, Go down and select Google Sheets

Screenshot (200)_LI.jpg

You should have an untitled spreadsheet open before you, Now it's time to import our CSV files.

You can download the files to your local system or add them to your drive.

IMPORT FILES TO SPREADSHEET

To import, click on the File tab then select Import

Screenshot (201).png

If the file is stored in your drive select the location from My Drive. if it's in your local system then select the Upload tab and provide the location.

Screenshot (202).png

NOTE: YOU CAN ONLY UPLOAD ONE FILE AT A TIME

If you have provided the correct location and selected a file, you should see this now:

Screenshot (203).png

We would go with these default choices for the first upload. Click Import data

If everything goes well, our data should be imported successfully.

Screenshot (204).png

Now unto the next upload, we would repeat the same process up until after file location selection. Here, we would be selecting something different:

Screenshot (206).png

This time instead of replacing the current sheet or creating a new one, we would append it to the current sheet since we want all our data in one file/ spreadsheet.

Repeat the same process for the remaining two files.

With all this being imported successfully, there should be a total of 3685 rows

Screenshot (207).png

DATA CLEANING

What is Data Cleaning?

As defined by Tableau:

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.

When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process because the processes will vary from dataset to dataset. But it is crucial to establish a template for your data cleaning process so you know you are doing it the right way every time.

Removing Duplicates

  • Select All columns and rows

Ctrl + A to select the entire sheet should be what most people would attempt to do but it wouldn't work in this case, why? because we have blank rows meaning the selection would stop prematurely and not capture all rows in the sheet. Instead, we would use something called named range Click on Data > Named Ranges

Screenshot (217).png

There should be an opened tab on the right called Named Ranges, select Add a Range and give it a name. I will call mine select. We would also edit the next input box which is for the range statement. We know that we have 3685 rows and 12 columns so in order to select all, we would need to go from A1 - L3685 looking like so: 'Data_Sheet_Udemy_Courses_-_Business_Courses'!A1:L3685

The first part is provided in the input already for whatever cell you are currently on, you just need to edit the letters at the end to look like that

Screenshot (218).png

Click Done. Now you have a range, to use it just click on it

Screenshot (219).png

All data has been successfully selected, we can now move on to the next step

  • Click on Data > Remove Duplicates

Screenshot (208).png

You would get a pop-up with all columns selected, if not, select all checkbox should be marked.

Screenshot (222).png

After this click Remove duplicates. A success message specifying the process complete should show up

Screenshot (223).png

Removing Blank Cells

  • Click on our select named range to select all rows and columns

  • Click on Data > Create a Filter

Screenshot (224).png

  • Click on the filter icon on any of the column heads

Screenshot (225)_LI.jpg

  • Click on **Filter by Condition **and select Is empty

Screenshot (226).png

Click Ok

After processing, all blank rows would be moved up

Screenshot (227).png

You can now delete each row. Select all the rows in one cell, right-click and select delete selected rows

Screenshot (229).png

  • Go to Data > Turn off filter

Find and Replace

One of the imported files, the web development file, has a different format for its subject column which is not consistent with the rest. We would find it and replace it with a consistent format

  • Go to Edit > Find and Replace

Screenshot (231).png

  • Find:- Subject: Web Development and Replace with: Web Development

Screenshot (235).png

Select Replace all to replace every instance it finds. Click "Done"

You should have a total of 3677 columns

Screenshot (236).png

You can also get other cleanup suggestions from google sheets

Screenshot (237).png

CONCLUSION

With all this, the very basics of cleaning data have been explained. We covered removing duplicate rows which were the headers that followed each file after merging, we removed blank rows and we corrected the format in one of the columns. This is only the tip of the iceberg as google sheets offer many other tools like data validation, analysis, among others.

Till Next Time

African Clothing Store