Data analysts spend most of their time cleaning data. In real life, data looks like this:
Bad data includes inconsistent text and formatting, missing values, strange characters, etc. This article will discuss general Excel data cleaning functions and showcase examples for three common data types: Text, Number, Date.
1. Use filters to scan each column for inconsistent values.
Excel How-To: Data → Filter
2. Use Pivot Tables to understand more about the data values.
Now we want to know how many “blank” and “NA” values there are. This count is important for deciding how to deal with the missing data.
Excel How-To: Insert → Pivot Table
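For readers who want to check the same counts outside Excel, here is a minimal Python sketch of what the pivot table reports. The `ratings` column and the `count_values` helper are hypothetical and made up for illustration; they are not part of the original workbook.

```python
from collections import Counter

# Hypothetical sample column; the values are made up for illustration.
ratings = ["5", "", "NA", "3", "NA", "4", "", "5"]

def count_values(column):
    """Count each distinct value, labelling empty cells as 'blank'."""
    return Counter(value.strip() or "blank" for value in column)

counts = count_values(ratings)
# Treat both empty cells and the literal text "NA" as missing.
missing = counts["blank"] + counts["NA"]
print(counts)
print(f"missing: {missing} of {len(ratings)}")
```

A high share of missing values usually rules out simply deleting the affected rows.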
3. Handle Missing Values
Handling missing values requires judgment. Here are three options, ordered from the least recommended to the most recommended.
3.1 If there are relatively few rows/columns with missing values, delete them.
3.2 Replace missing values with the mean
3.3 Replace missing values with a constant like -999 to group them for further analysis
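The three options above can be compared side by side in a short Python sketch. It assumes that missing cells appear as empty text or the literal "NA"; the `ages` column and the function names are hypothetical.

```python
from statistics import mean

MISSING = {"", "NA"}  # assumed markers for missing cells

def drop_missing(values):
    """Option 3.1: delete the missing entries."""
    return [float(v) for v in values if v not in MISSING]

def fill_with_mean(values):
    """Option 3.2: replace missing entries with the mean of the rest."""
    average = mean(drop_missing(values))
    return [average if v in MISSING else float(v) for v in values]

def fill_with_constant(values, sentinel=-999.0):
    """Option 3.3: replace missing entries with a sentinel constant."""
    return [sentinel if v in MISSING else float(v) for v in values]

# Hypothetical column, made up for illustration.
ages = ["20", "", "40", "NA", "30"]
print(drop_missing(ages))
print(fill_with_mean(ages))
print(fill_with_constant(ages))
```

Note that a sentinel such as -999 has to be filtered out again before computing averages, otherwise it distorts them.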
The key to data cleaning is consistency. Once the data is consistent across rows, that is a good indication it is ready for analysis.
1. Cleaning Text Data
1.1 The location column is inconsistent. We are interested in the first word only. To extract it, we will use the Text to Columns feature.
Excel How-To: Data → Text to Columns.
1.2 The country column has inconsistent capitalization. We will standardize it by converting every value to upper case, lower case, or proper case using the =upper(), =lower(), or =proper() function, respectively.
Excel How-To: =upper(), =lower(), =proper()
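As a rough equivalent of the two text steps above, the following Python sketch extracts the first word and standardizes case. It assumes the words are separated by spaces; the sample values and the helper names are hypothetical.

```python
def first_word(text):
    """Keep only the first space-separated word, like splitting on spaces."""
    parts = text.split()
    return parts[0] if parts else ""

def standardize_case(text, style="upper"):
    """Apply one casing convention, similar to =upper(), =lower(), =proper()."""
    text = text.strip()
    if style == "upper":
        return text.upper()
    if style == "lower":
        return text.lower()
    if style == "proper":
        return text.title()
    raise ValueError(f"unknown style: {style}")

# Hypothetical values, made up for illustration.
locations = ["Toronto ON", "  Austin TX", "Boston"]
countries = ["canada", "UNITED STATES", "United states"]

print([first_word(value) for value in locations])
print([standardize_case(value, "proper") for value in countries])
```

Whichever style you pick, apply it to the whole column so that identical countries group together.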
2. Cleaning Number Data
Numbers are usually the easiest to deal with. First, get an idea of the distribution so you can identify any outliers. A common rule of thumb treats values more than three standard deviations away from the mean as outliers. In this case, the mean is 49.46 and the standard deviation is 18.27, so find any number greater than 49.46+(3*18.27) = 104.27 or less than 49.46-(3*18.27) = -5.35 and deal with it using the same techniques you used for missing values.
Excel How-To: Install the Analysis ToolPak add-in. Then, Data → Data Analysis → Descriptive Statistics
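The three-standard-deviation check can be sketched in Python as follows. The mean and standard deviation passed to the first call are the ones quoted in the article; the `scores` list and the function names are hypothetical, and the sketch assumes the sample standard deviation is the right measure.

```python
from statistics import mean, stdev

def three_sigma_bounds(average, deviation, k=3):
    """Return the (low, high) cut-offs that are k standard deviations from the mean."""
    return average - k * deviation, average + k * deviation

def find_outliers(values, k=3):
    """Return the values that fall outside the k-sigma bounds of the list itself."""
    low, high = three_sigma_bounds(mean(values), stdev(values), k)
    return [v for v in values if v < low or v > high]

# Bounds for the figures quoted in the article.
low, high = three_sigma_bounds(49.46, 18.27)
print(round(low, 2), round(high, 2))

# Hypothetical column with one extreme value, made up for illustration.
scores = [48, 52, 50, 47, 53] * 4 + [500]
outliers = find_outliers(scores)
print(outliers)
```

In very small columns no value can be three sample standard deviations from the mean, so the rule is only useful once there are a reasonable number of rows.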
3. Cleaning Date Data
Dates can be the trickiest to deal with. Let’s look at this example:
Use a pivot table to understand the different values available and their distribution. It seems that the correct data follows this format: Y-M-D
The remaining bad data looks like this: M/D/Y
We will need to use Text to Columns to split the bad dates into month, day, and year, and then concatenate the parts in the same Y-M-D order as the correct data.
Keep on examining the data using the filter function until it looks good.
Finally, make sure you change the format of the column to Date.
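The split-and-reorder approach to the M/D/Y values can also be sketched in Python. It assumes that every value containing a slash is M/D/Y with a four-digit year and that everything else is already Y-M-D; the sample values and the `to_ymd` name are hypothetical.

```python
from datetime import datetime

def to_ymd(value):
    """Normalize a date string to Y-M-D, reordering M/D/Y values."""
    value = value.strip()
    if "/" in value:
        # Split like Text to Columns, then rejoin in year-month-day order.
        month, day, year = value.split("/")
        value = f"{year}-{month}-{day}"
    # Parsing validates the date and pads month and day to two digits.
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()

# Hypothetical values, made up for illustration.
raw_dates = ["2019-03-07", "3/7/2019", "12/25/2018"]
clean_dates = [to_ymd(value) for value in raw_dates]
print(clean_dates)
```

An impossible date such as 2/30/2019 raises a `ValueError`, which flags the row for manual review.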