Tag Archives: Excel formula

how to find duplicates in excel

How to Find Duplicates in Excel. Remove Duplicates in Excel

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 5.00 out of 5)
Loading...

How to find duplicates in Excel? Today we will explore this question in and out. We will start first by understanding How to Highlight Duplicates in Excel. Our second objective will be learning How to Remove Duplicates. I will explore several approaches to this task – using the Data ribbons Remove Duplicates feature, but also showing how we can use MS Queries to remove duplicates and also do additional filtering/sanitizing of our data set.

CONTENTS

Find Duplicates in Excel

find duplicates source table
Example Table with Duplicates
There are several ways to go around Finding or Highlighting duplicates in Excel:

Following today’s post we will use the example Excel Table on the right.

1. Find Duplicates using Excel Formulas

To Find Duplicates in Excel using Formulas we will use the COUNTIF Excel formula to verify if a certain row as been repeated above. This will work only on a single column (our unique column identifier). Scroll down to the end to learn how to check for duplicates based on all three columns.

Add Is Duplicate? Column to your Worksheet

Add a Is Duplicate? Column to your Worksheet. In our example assuming that the Last column uniquely identifies records in my table input the following formula in the first cell of your new Is Duplicate? Column:

find duplicates in excel using formulas
Add COUNTIF Excel Formula to Find Duplicates

Excel Formula:

=IF(COUNTIF(B$1:B1;"="&B2);"DUPLICATE";"")

Drag the Formula down to all Cells

Drag the COUNTIF Formula from the First Cell down to the Last.
find duplicates in excel the steps

Instead of dragging the Formula down manually you can:

  1. Select the First Cell with the Formula
  2. Copy by Formula by hitting CTRL+C
  3. Select all cells down by hitting CTRL+
  4. Paste the Formula by hitting CTRL+P

Find Entire Row Duplicates

Did you notice that in the previous example 2 rows were in fact not duplicates? See below:

find duplicates in excel for an entire row
False Duplicates as Last name is not unique!

What should you do if there is no Column that Uniquely identifies your rows? We need to use a Helper Column! See Column D below:
find duplicates in excel with a helper column
Add a Helper Column that Concatenates All Columns

Simply add the Helper Column and Concatenate all other Columns like so:

=A2&"_"&B2&"_"&C2

Next you need to use the COUNTIF Formula on the Helper Column instead.

I recommend separating columns with some Separator (such as the _) as otherwise you may again get False Positive Duplicates

2. Highlight Duplicates using Conditional Formatting

highlight duplicates conditional formatting
Conditional Formatting in HOME Ribbon Tab
To Highlight Duplicates in Excel we need to use the Conditional Formatting feature in the HOME Ribbon Tab in Excel. Follow the steps below:
highlight duplicates conditional formatting
How to Highlight Duplicates in Excel using Conditional Formatting

Select the Column to Highlight Duplicates

highlight duplicates select columnSelect the Column in which you want to Highlight your Duplicates. As a Tip you can Select the First Cell and hit CTRL+END.

Open Conditional Formatting

highlight duplicates conditional formattingGo to the HOME Excel Ribbon Tab and select Conditional Formatting.

Create New Rule

highlight duplicates create new ruleAs we want to Highlight Duplicate Cell Values in Excel we need to create a new Conditional Formatting Rule. Click New Rule.

Select Duplicate formatting

highlight duplicates formatting
Select the Format only unique or duplicate values option to define the Formatting of Duplicate Values.

Select Duplicate formatting

In the new window make sure Format All is set to duplicate. Next set the formatting for your Duplicates.
highlight duplicates format
Once completed you should see your Duplicate Cells Highlighted in your Excel spreadsheet.
highlight duplicates in excel

Remove Duplicates in Excel

Having answered the question of How to Find Duplicates in Excel, but what you often want to do is Remove Duplicates. So lets now tackle the issue of How to Remove Duplicates in Excel.

Similarly as with Finding Duplicates, there several ways to go around Removing Duplicates:

Remove Duplicates with Data Ribbon

Fortunately Removing Duplicates is a typical scenario in Excel supported by the Data Ribbon Tab.
remove duplicates in excel

Select Excel Table

remove duplicates select tableSelect your Entire Excel Table by hitting CTRL+A.

Remove Duplicates

remove duplicates data ribbonGo to the Data Ribbon Tab in Excel and select Remove Duplicates.

Select Columns with Duplicates

remove duplicates select columns with duplicatesIn this last step you need to Select All Columns in which there are Duplicates. If you just want to remove rows in which there are duplicates in a single column – remember to select only that Column.
Once you finish hit OK and done!

Remove Duplicates with MS Query

Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.

If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.

The DISTINCT SQL statement filters only the DISTINCT rows within a TABLE. You can use it on any SQL SELECT Query. See my example below:
remove duplicates ms query
The table on the Left is the Source Table the table on Right is the Result Table. This is the Microsoft Query you can use to filter only DISTINCT rows:

Where replace Sheet1 with the name of your Worksheet.

Count Duplicates in Excel

Similarly as with the previous case, there several ways to go around Counting Duplicates:

Count Duplicates using Excel Formulas

count duplicates in excelTo Count Duplicates in Excel follow the same steps as in Find Duplicates using Excel Formulas except you need to use the following Excel Formula using COUNTIF for the example in this post:

Similarly as in Find Entire Row Duplicates using Excel Formulas to Count Entire Rows that are duplicates, follow the above steps and create a similar Helper Column as such:

Count Duplicates using MS Query

Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.

If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.

We can easily use a Microsoft Query to count only the duplicate records within our Data Set:
count duplicates in excel ms query
Here is the MS Query for the above Table:

vlookup

How to use VLOOKUP in Excel!

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 5.00 out of 5)
Loading...

The Excel VLOOKUP function (Vertical Lookup) is one of need-to-know features in Excel! I can’t stress enough how many articles/posts out there have been published on using VLOOKUP. Probably rightly so. I couldn’t however resist to make an my attempt at consolidating all the knowledge around how to use VLOOKUP and all the bits and pieces of information you need to know to be able to fully leverage the vertical lookup function in Excel! Let’s start with a VLOOKUP example…

The VLOOKUP Phonebook analogy

To quickly explain what VLOOKUP does I usually like to use the common Phonebook analogy. In the example below imagine we have a friend called John White. Let’s say we want to call John. What we need to do is to locate John in our Phonebook by his Name and Surname and call him on his corresponding phone number located within the same row.

The VLOOKUP Phonebook analogy
The VLOOKUP Phonebook analogy

VLOOKUP returns a cell located at a row corresponding to your lookup value. In a phonebook analogy you would be looking-up a person’s phonenumber by his name and surname

Everyday we deal with similar lookup problems e.g. restaurant menu (looking up the dish price), grocery lists (how many bottles of milk did I need to buy?) etc. Hence the usefulness of using VLOOKUP. Let us move on to learn how to use VLOOKUP.

How to use VLOOKUP?

Considering the above VLOOKUP example let’s take a closer look at the parameters of the VLOOKUP function:

Excel VLOOKUP Function Parameters
Excel VLOOKUP Function Parameters

Let’s look at the arguments of VLOOKUP in more detail:

  • lookup_value – value you want to look up. Must be in the first column of the range of cells you specify in table-array
  • table_array – range of cells VLOOKUP will
    1. Search FIRST column to find the ROW containing the lookup_value
    2. Search and return corresponding value in column col_index_num of the table_array
  • col_index_num – column number within the table_array that contains return values. The first left-most column in the table_array starts numbering with 1
  • range_lookup – an OPTIONAL boolean (True/False) parameter that defines how the VLOOKUP function will behave:
    1. FALSE – search for the first EXACT MATCH in the first column of the table_array
    2. TRUE – Default option. assume the FIRST COLUMN in table_array is sorted either numerically or alphabetically and return an APPROXIMATE MATCH. This parameter works correctly ONLY IF THE FIRST COLUMN OF THE table_array IS SORTED!

FALSE VLOOKUP Example

Nothing beats a hands-on VLOOKUP example! See the below animation to witness for yourself how VLOOKUP works!

VLOOKUP Example
VLOOKUP Example

Notice that I am using the range_lookup set to FALSE. This means that the VLOOKUP function will always look for an EXACT MATCH.

TRUE VLOOKUP Example

In the previous example the range_lookup parameter was set to FALSE. This example will equally work with range_lookup equal to TRUE as long as we sort our FIRST COLUMN alphabetically (or numerically in other cases)!

TRUE VLOOKUP
TRUE VLOOKUP

So what’s the difference you will ask? PERFORMANCE. The TRUE (approximated match) VLOOKUP is significantly faster than the FALSE (exact match) VLOOKUP. Read here more on VLOOKUP Performance.

Important! If you got hooked up already on the performance bit of the TRUE VLOOKUP beware! The TRUE VLOOKUP will always return a result… even if it didn’t find an exact match. That is a setback. Luckily Excel experts have found a way around this with a DOUBLE TRUE VLOOKUP. Read on.

DOUBLE TRUE VLOOKUP

To exemplify the DOUBLE TRUE VLOOKUP let’s consider this table:
table
We want to match the Animal Monkey against a Category. If we use a standard FALSE VLOOKUP like this:

=VLOOKUP("Monkey";A1:B5;2;FALSE)

We will get the following results:

"Mammal" if found
"#N/A" if not found

Great! But what happens if we use a simple TRUE VLOOKUP?:

=VLOOKUP("Monkey";A1:B5;2;TRUE)

We will get the following results:

"Mammal" if found
"Amphibian" (or similar) if not found

Well that’s not very appropriate huh? How can we know for sure if the TRUE VLOOKUP cross-referenced the Animal correctly? Well.. why not use a second VLOOKUP for that? Consider the DOUBLE VLOOKUP below:

=IF(VLOOKUP("Monkey";A1:B5;1;TRUE)="Monkey";VLOOKUP("Monkey";A1:B5;2;TRUE);)

We will get the following results:

"Mammal" if found
"" if not found (replace "" with #N/A if needed)

See what happens here? The first VLOOKUP validates whether there is an EXACT MATCH in the first column. If so the second VLOOKUP will return the corresponding result. See below:

DOUBLE TRUE VLOOKUP Example
DOUBLE TRUE VLOOKUP Example

Are 2 DOUBLE TRUE VLOOKUPs really worth the trouble? If you are looking for performance than YES! Read more here.

INDEX MATCH vs VLOOKUP – Do’s and Don’t

If you’ve been using VLOOKUP long enough you probably stumbled across MANY articles/posts on why many consider a certain combination of two Excel functions i.e. INDEX MATCH, better than using a regular VLOOKUP. In fact MOST Google results for VLOOKUP vs INDEX MATCH won’t mention the significant advantage VLOOKUP has over INDEX MATCH which I hinted in the previous section.

Let’s compare the pro’s and con’s of using INDEX MATCH (instead of VLOOKUP):

  • More flexible – allows you to match both against rows and columns
  • Less error prone – adding/removing columns/rows from the lookup table should not crash the INDEX-MATCH combo
  • Both vertical and horizontal lookups – VLOOKUP and HLOOKUP address either only vertial or horizontal lookups, whereas with the INDEX MATCH you can easily do both
  • Harder to use – VLOOKUP is a little easier to understand than INDEX-MATCH and I know some people have difficulty with this two step approach
  • Slower (at least in newer versions of Excel like 2013) – opposed to the broadly shared myth INDEX MATCH is no longer faster than a simple VLOOKUP. Although, I still think INDEX-MATCH is better in most cases. Read my post here for how VLOOKUP compares in terms of performance against INDEX MATCH.

VLOOKUP with Multiple criteria

In some cases you would like to run a VLOOKUP against MORE THAN ONE COLUMN i.e. lookup a certain set of values against several columns instead of just one column. There is a simple way to achieve this by introducing a HELPER COLUMN with a concatenation of the lookup columns. Let’s consider the example below:

VLOOKUP Multiple Criteria: Example
VLOOKUP Multiple Criteria: Example

As you can see each month is specified in two separate columns. A simple one-column VLOOKUP will not do. So how to use VLOOKUP to get the result for multiple criteria? Using a HELPER COLUMN! See the solution below:
VLOOKUP Multiple Criteria: Solution
VLOOKUP Multiple Criteria: Solution

The above is the simplest approach to a multiple criteria VLOOKUP. There are more elegant approaches that need not a Helper Column e.g. check-out Chandoo’s example here.

VLOOKUP Performance

If you are impatient for the answer – for best performance always use DOUBLE TRUE VLOOKUPS. Now for a more thorough explanation let’s start from the beginning and summarize what we know in bullet points:

  • The VLOOKUP function can be well replaced with other functions/features in Excel
  • A common practice it to replace a VLOOKUP with a INDEX MATCH function combo. This does not affect performance (much) but solve a lot of typical usability issues
  • I have hinted above that a TRUE (approximated) VLOOKUP on a sorted table_array will have better performance that a regular FALSE (exact match) VLOOKUP

The TRUE (approximate) VLOOKUP function seems to be best candidate in terms of performance. However, approximating the result may render certain issues – a TRUE VLOOKUP will ALWAYS return a result! Even if there is no exact match! This creates a certain issue for us. Luckily there is a certain trick to use 2 TRUE VLOOKUPs in a combo to replace a regular FALSE VLOOKUP and expect similar results. This is commonly called a DOUBLE TRUE VLOOKUP.

Performance: VLOOKUPs VS INDEX MATCH VS SQL

The below performance comparison was carried out by me in my separate post on VLOOKUP vs INDEX-MATCH vs SQL. Follow to my post to read more. In the meantime let’s look at the comparison:

VLOOKUP Performance Comparison
VLOOKUP Performance Comparison

What do the various categories mean?:

  • VLOOKUP (sorted) – a regular FALSE (exact match) VLOOKUP against a sorted table_array
  • DOUBLE TRUE VLOOKUP (sorted) – 2 combined TRUE (approximate match) VLOOKUPs against a sorted table_array
  • INDEX-MATCH (sorted) – a combination of the INDEX and MATCH functions against a sorted table_array
  • SQL (Sorted) – an Excel MS Query (SQL) executed against a sorted table_array

How to use VLOOKUP recap

Well I do hope this exhausts the subject of using VLOOKUP in Excel. VLOOKUP is a common used function in Excel sometimes wrongly which, I hope from my performance stats above, can seriously tamper with your Workbook performance causing Excel Workbooks to recalculate in matters of minutes instead of seconds. Feel free to share your comments/thoughts below or on the AnalystCave forum.

dynamic row numbers

Dynamic Row Numbers in Excel

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 5.00 out of 5)
Loading...

Today I want to elaborate shortly on how to correctly and easily number rows in Excel by adding dynamic row numbers using simple formulas. Every neat data table in Excel should have a numbering column in place so that every row can be easily reference at least by the item number. One way of numbering rows is to simply input the numbers and drag them down. However, this is a static manner and the numbering will not refresh automatically if you change the places of any rows or add/delete rows. Here I want to introduce several easy ways to achieve nice and neat dynamic row numbering in Excel. Ok so let’s dive right into 3 methods for achieving nice dynamic row numbers in Excel.

Method 1 – dynamic row numbering with w/o skipping empty rows

Let’s say we want to dynamically number rows counting also every empty row in between i.e. if there is an empty row between the numbers we want the numbering to account for that an increase the index. I use this approach most often due to it’s simplicity. See the example below on how this works.

Method 1 - Dynamic row numbers w/o skipping rows
Method 1 – Dynamic numbering w/o skipping rows

The formula is very simple:

=ROW()-ROW(A$1)

A$1 is simply the header of the column to guarantee we start numbering from 1. Easy and neat right?

Method 2 – Dynamic numbering skipping empty rows

This time let’s account for every empty row in between. We want to continue the numbering from the last index. This comes in handy when you have sections of data or if you group the rows into different headers but want to retain the right numbering. See the example below:

Method 2 - Dynamic row numbers with skipping empty rows
Method 2 – Dynamic numbering skipping empty rows

Again the formula is dead simple:

=COUNTA(C$1:C1)

We use the static $ marker to make the range start from always the first cell in the column. The formula will count all non-blank rows so will skip any blanks we leave in between.

Dynamic nested numbering

Now for a bonus – dynamic nested numbering. Sometimes we need to add numbering with nested indices. I certainly encourage using nested numbering as it makes many tables more clear to read and the grouping more obvious. The approach/formula below can easily be reused to support nesting of additional levels. Numbering nested indices manually is often a nightmare if we need to frequently rearrange rows or add/delete some rows in between. Unfortunately, no formula will know for us when to automatically increment the first few nested indices (e.g. 1.2.1 to 1.3.1), but we can automatically increment the last index in the nested numbering index. See below.

Dynamic nested row numbering
Dynamic nested numbering

We need to input manually add the first index e.g.

'Cell F2
1.1

Then below we can now use an automatic formula that we do the increment for us:

'Cell F3
=LEFT(F2;FIND(".";F2))&RIGHT(F2;LEN(F2)-FIND(".";F2))+1

We can repeat this process with the next indices e.g. 2.1, 3.1 etc. We will always need to type the first one manually but the formula can help renumber the subsequent indices automatically e.g. 2.2,2.3,2.4 etc.

Hope this helps you with generating those dynamic row numbers!