Tag Archives: distinct

how to find duplicates in excel

How to Find Duplicates in Excel. Remove Duplicates in Excel

1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 4.00 out of 5)
Loading...

How to find duplicates in Excel? Today we will explore this question in and out. We will start first by understanding How to Highlight Duplicates in Excel. Our second objective will be learning How to Remove Duplicates. I will explore several approaches to this task – using the Data ribbons Remove Duplicates feature, but also showing how we can use MS Queries to remove duplicates and also do additional filtering/sanitizing of our data set.

CONTENTS

Find Duplicates in Excel

find duplicates source table
Example Table with Duplicates
There are several ways to go around Finding or Highlighting duplicates in Excel:

Following today’s post we will use the example Excel Table on the right.

1. Find Duplicates using Excel Formulas

To Find Duplicates in Excel using Formulas we will use the COUNTIF Excel formula to verify if a certain row as been repeated above. This will work only on a single column (our unique column identifier). Scroll down to the end to learn how to check for duplicates based on all three columns.

Add Is Duplicate? Column to your Worksheet

Add a Is Duplicate? Column to your Worksheet. In our example assuming that the Last column uniquely identifies records in my table input the following formula in the first cell of your new Is Duplicate? Column:

find duplicates in excel using formulas
Add COUNTIF Excel Formula to Find Duplicates

Excel Formula:

=IF(COUNTIF(B$1:B1;"="&B2);"DUPLICATE";"")

Drag the Formula down to all Cells

Drag the COUNTIF Formula from the First Cell down to the Last.
find duplicates in excel the steps

Instead of dragging the Formula down manually you can:

  1. Select the First Cell with the Formula
  2. Copy by Formula by hitting CTRL+C
  3. Select all cells down by hitting CTRL+
  4. Paste the Formula by hitting CTRL+P

Find Entire Row Duplicates

Did you notice that in the previous example 2 rows were in fact not duplicates? See below:

find duplicates in excel for an entire row
False Duplicates as Last name is not unique!

What should you do if there is no Column that Uniquely identifies your rows? We need to use a Helper Column! See Column D below:
find duplicates in excel with a helper column
Add a Helper Column that Concatenates All Columns

Simply add the Helper Column and Concatenate all other Columns like so:

=A2&"_"&B2&"_"&C2

Next you need to use the COUNTIF Formula on the Helper Column instead.

I recommend separating columns with some Separator (such as the _) as otherwise you may again get False Positive Duplicates

2. Highlight Duplicates using Conditional Formatting

highlight duplicates conditional formatting
Conditional Formatting in HOME Ribbon Tab
To Highlight Duplicates in Excel we need to use the Conditional Formatting feature in the HOME Ribbon Tab in Excel. Follow the steps below:
highlight duplicates conditional formatting
How to Highlight Duplicates in Excel using Conditional Formatting

Select the Column to Highlight Duplicates

highlight duplicates select columnSelect the Column in which you want to Highlight your Duplicates. As a Tip you can Select the First Cell and hit CTRL+END.

Open Conditional Formatting

highlight duplicates conditional formattingGo to the HOME Excel Ribbon Tab and select Conditional Formatting.

Create New Rule

highlight duplicates create new ruleAs we want to Highlight Duplicate Cell Values in Excel we need to create a new Conditional Formatting Rule. Click New Rule.

Select Duplicate formatting

highlight duplicates formatting
Select the Format only unique or duplicate values option to define the Formatting of Duplicate Values.

Select Duplicate formatting

In the new window make sure Format All is set to duplicate. Next set the formatting for your Duplicates.
highlight duplicates format
Once completed you should see your Duplicate Cells Highlighted in your Excel spreadsheet.
highlight duplicates in excel

Remove Duplicates in Excel

Having answered the question of How to Find Duplicates in Excel, but what you often want to do is Remove Duplicates. So lets now tackle the issue of How to Remove Duplicates in Excel.

Similarly as with Finding Duplicates, there several ways to go around Removing Duplicates:

Remove Duplicates with Data Ribbon

Fortunately Removing Duplicates is a typical scenario in Excel supported by the Data Ribbon Tab.
remove duplicates in excel

Select Excel Table

remove duplicates select tableSelect your Entire Excel Table by hitting CTRL+A.

Remove Duplicates

remove duplicates data ribbonGo to the Data Ribbon Tab in Excel and select Remove Duplicates.

Select Columns with Duplicates

remove duplicates select columns with duplicatesIn this last step you need to Select All Columns in which there are Duplicates. If you just want to remove rows in which there are duplicates in a single column – remember to select only that Column.
Once you finish hit OK and done!

Remove Duplicates with MS Query

Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.

If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.

The DISTINCT SQL statement filters only the DISTINCT rows within a TABLE. You can use it on any SQL SELECT Query. See my example below:
remove duplicates ms query
The table on the Left is the Source Table the table on Right is the Result Table. This is the Microsoft Query you can use to filter only DISTINCT rows:

Where replace Sheet1 with the name of your Worksheet.

Count Duplicates in Excel

Similarly as with the previous case, there several ways to go around Counting Duplicates:

Count Duplicates using Excel Formulas

count duplicates in excelTo Count Duplicates in Excel follow the same steps as in Find Duplicates using Excel Formulas except you need to use the following Excel Formula using COUNTIF for the example in this post:

Similarly as in Find Entire Row Duplicates using Excel Formulas to Count Entire Rows that are duplicates, follow the above steps and create a similar Helper Column as such:

Count Duplicates using MS Query

Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.

If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.

We can easily use a Microsoft Query to count only the duplicate records within our Data Set:
count duplicates in excel ms query
Here is the MS Query for the above Table:

distinct list

Dynamic Distinct Column in Excel using Array Formulas

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 4.00 out of 5)
Loading...

I see that often many users have issues with managing lists of values and translating them to unique/distinct lists where values do not repeat themselves. There are many way to tackle this problem and you would be surprised that there even is a formula to handle this task. Let’s dive into this subject.

What we have (unsorted list) Vs. What we want (distinct list of sorted values)

Let’s assume a simple table of data. Where column A is full of repeating Names and column B is a list of corresponding values. We will be working on this data set through out this post. This data set is a typical example of the issue we often have with repeating rows of indistinct data. What we will want to do is to somehow summarize this table with a list of distinct Names and aggregated Values.

What we have

Let’s assume we have a simple table of Names and Values. What we will want to do is to somehow summarize this table with a list of distinct Names and aggregated Values.

Name (A column) Value (column B)
Tom 60
Matthew 98
James 19
John 16
Matthew 45
John 26
John 70
James 60

What we want

The result of this operation should look somewhat like this:

Name Value
James 79
John 112
Matthew 143
Tom 60

Now let’s familiarize with 2 approaches to this issue.

Method 1: Distinct column using a PivotTable

The most obvious solution is of course to create a PivotTable from this data set. All we need to do is add the Names as ROWS and Values as VALUES to the Pivot to get a simple summary with the exact data we need. See below:

Distinct list: Pivot Table to aggregate Values
Pivot Table to aggregate Values

Method 2: Distinct column using Array Formulas

Now putting aside the obvious, in some cases Pivots are not the answer. Especially if we don’t want to use VBA Macros and want to create a dynamic table which will simply update itself with the latest Values and Names.

Provide list of distinct Names

The first issue which we stumble upon is to somehow produce a list of distinct Names. At first this may seem impossible to be done by a Excel function. Fortunately again Array Formulas can come in handy in this task. This site features the elegant solution to this problem which I will try to explain in much detail.

To provide a list of distinct Nameswe must use the following formula:

=INDEX(A$1:A$9;MATCH(0;COUNTIF(E$1:E1;A$1:A$9);0))

This is the final outcome when we hit CTRL+SHIFT+ENTER to make the formula an Array Formula and drag it down:

Distinct list: Distinct column of Names
Distinct column of Names

Now let’s ponder for a second on how the formula works as it might not be so straightforward as it seems:

'Gets an item from A$1:A$9 with an index provided by the MATCH function
=INDEX(
       A$1:A$9;
'Find value "0" in the column of value provided by the COUNTIF function
       MATCH(
            0;
'Returns an array. Count items from array A$1:A$9 if they are provided
'on the E$1:E1 list
            COUNTIF(
'Notice that this list has only 1 static item. The first item is the 
'anchor, however, the second item will move to include all rows above
'the current one
                   E$1:E1;
                   A$1:A$9
                   );
            0)
      )

The exciting thing is how the MATCH function is used above. Usually the MATCH function is provided with a range of cells. This time, however, we are providing it with an array being the result of the COUNTIF function. What the MATCH function does is search for the first item resulting from the COUNTIF that has 0 counts i.e. it has not yet been provided in the list.

Now the formula will start producing #N/A errors when dragged beyond the number of distinct elements in the Names columns. You can correct this by wrapping it with a IFERROR function:

=IFERROR(
        INDEX(A$1:A$9;MATCH(0;COUNTIF(E$1:E1;A$1:A$9);0))
        ;"")

Sort the list of distinct Names

Now that we know how to provide a list of distinct names we still need to make sure that the list is sorted alphabetically else it will be provided in the same order as the items on the initial list. Here we need again to resort to Array Formulas to help us with this tasks.

Let’s start with explaining how to get the index of each name (concept from Chandoo.org):

Get index of elements in our unsorted distinct list
Get index of elements in our unsorted distinct list

Notice the code in column E:

=COUNTIF(D$2:D$5;"<"&D2)

What we are doing is counting, for each distinct name, the number of items alphabetically of lower lexicographical value than our the current Name (D2).

Let’s now consider an Array Formula based on this:

=COUNTIF(D$2:D$5;"<"&D$2:D$5)

If we hit F9 we will see that an Array Formula would evaluate to:

={3;2;0;1}

This is the correct sequence of our list of distinct items.

If we combine this with an INDEX-MATCH combo we can iterate through this sequence:

Sorted list of distinct Names
Sorted list of distinct Names

'Gets an item from D$2:D$5 with an index provided by the MATCH function
=INDEX(D$2:D$5;
'Find value of the current ROW-2 (this is simply to sequence through 
'the arrary) in the column of value provided by the COUNTIF function
               MATCH(
                    ROW()-2;
'Evaluates to {3;2;0;1} - this is the correct sequence of our distinct
'array of Names
                    COUNTIF(D$2:D$5;"<"&D$2:D$5);
                    0)
      )

Summary

Now to summarize what we have.

  • First, we provided the distinct list of Names in column D
  • Secondly, we provided a separate column which sorts the distinct Names in column D
  • Lastly, if we add a simple SUMIF function as shown below we can sum all values for each distinct Name in the Names column
Final data table
Final data table

Next Steps

Check out other similar posts:
EXCEL: Dynamic row numbers
EXCEL: 10 Top Excel features
EXCEL: Split columns on any pattern