How to find duplicates in Excel? Today we will explore this question in and out. We will start first by understanding How to Highlight Duplicates in Excel. Our second objective will be learning How to Remove Duplicates. I will explore several approaches to this task – using the Data ribbons Remove Duplicates feature, but also showing how we can use MS Queries to remove duplicates and also do additional filtering/sanitizing of our data set.
Find Duplicates in Excel
Example Table with DuplicatesThere are several ways to go around Finding or Highlighting duplicates in Excel:
Following today’s post we will use the example Excel Table on the right.
1. Find Duplicates using Excel Formulas
To Find Duplicates in Excel using Formulas we will use the COUNTIF Excel formula to verify if a certain row as been repeated above. This will work only on a single column (our unique column identifier). Scroll down to the end to learn how to check for duplicates based on all three columns.
Add Is Duplicate? Column to your Worksheet
Add a Is Duplicate? Column to your Worksheet. In our example assuming that the Last column uniquely identifies records in my table input the following formula in the first cell of your new Is Duplicate? Column: Add COUNTIF Excel Formula to Find Duplicates Excel Formula:
=IF(COUNTIF(B$1:B1;"="&B2);"DUPLICATE";"")
Drag the Formula down to all Cells
Drag the COUNTIF Formula from the First Cell down to the Last.
Instead of dragging the Formula down manually you can:
- Select the First Cell with the Formula
- Copy by Formula by hitting CTRL+C
- Select all cells down by hitting CTRL+â–¼
- Paste the Formula by hitting CTRL+P
Find Entire Row Duplicates
Did you notice that in the previous example 2 rows were in fact not duplicates? See below: False Duplicates as Last name is not unique! What should you do if there is no Column that Uniquely identifies your rows? We need to use a Helper Column! See Column D below: Add a Helper Column that Concatenates All Columns Simply add the Helper Column and Concatenate all other Columns like so:
=A2&"_"&B2&"_"&C2
Next you need to use the COUNTIF Formula on the Helper Column instead.
2. Highlight Duplicates using Conditional Formatting
Conditional Formatting in HOME Ribbon TabTo Highlight Duplicates in Excel we need to use the Conditional Formatting feature in the HOME Ribbon Tab in Excel. Follow the steps below: How to Highlight Duplicates in Excel using Conditional Formatting
Select the Column to Highlight Duplicates
Select the Column in which you want to Highlight your Duplicates. As a Tip you can Select the First Cell and hit CTRL+END.
Open Conditional Formatting
Go to the HOME Excel Ribbon Tab and select Conditional Formatting.
Create New Rule
As we want to Highlight Duplicate Cell Values in Excel we need to create a new Conditional Formatting Rule. Click New Rule.
Select Duplicate formatting
Select the Format only unique or duplicate values option to define the Formatting of Duplicate Values.
Select Duplicate formatting
In the new window make sure Format All is set to duplicate. Next set the formatting for your Duplicates.
Once completed you should see your Duplicate Cells Highlighted in your Excel spreadsheet.
Remove Duplicates in Excel
Having answered the question of How to Find Duplicates in Excel, but what you often want to do is How to Delete Duplicates in Excel. So lets now tackle the issue of How to Remove Duplicates in Excel.
Similarly as with Finding Duplicates, there several ways to go around Removing Duplicates:
Remove Duplicates with Data Ribbon
Fortunately Removing Duplicates is a typical scenario in Excel supported by the Data Ribbon Tab.
Select Excel Table
Select your Entire Excel Table by hitting CTRL+A.
Remove Duplicates
Go to the Data Ribbon Tab in Excel and select Remove Duplicates.
Select Columns with Duplicates
In this last step you need to Select All Columns in which there are Duplicates. If you just want to remove rows in which there are duplicates in a single column – remember to select only that Column.
Once you finish hit OK and done!
Remove Duplicates with MS Query
Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.
If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.
The DISTINCT SQL statement filters only the DISTINCT rows within a TABLE. You can use it on any SQL SELECT Query. See my example below:
The table on the Left is the Source Table the table on Right is the Result Table. This is the Microsoft Query you can use to filter only DISTINCT rows:
SELECT DISTINCT * FROM [Sheet1$]
Where replace Sheet1 with the name of your Worksheet.
Count Duplicates in Excel
Similarly as with the previous case, there several ways to go around Counting Duplicates:
Count Duplicates using Excel Formulas
To Count Duplicates in Excel follow the same steps as in Find Duplicates using Excel Formulas except you need to use the following Excel Formula using COUNTIF for the example in this post:
=COUNTIF(B$1:B1;"="&B2)
Similarly as in Find Entire Row Duplicates using Excel Formulas to Count Entire Rows that are duplicates, follow the above steps and create a similar Helper Column as such:
=A2&"_"&B2&"_"&C2
Count Duplicates using MS Query
Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.
If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.
We can easily use a Microsoft Query to count only the duplicate records within our Data Set:
Here is the MS Query for the above Table:
SELECT First,Last,Age, (COUNT(*)-1) As [Duplicates] FROM [Sheet1$] GROUP BY First,Last,Age HAVING COUNT(*) > 1 ORDER BY COUNT(*) DESC