Tag Archives: excel function

how to find duplicates in excel

How to Find Duplicates in Excel. Remove Duplicates in Excel

1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 4.00 out of 5)
Loading...

How to find duplicates in Excel? Today we will explore this question in and out. We will start first by understanding How to Highlight Duplicates in Excel. Our second objective will be learning How to Remove Duplicates. I will explore several approaches to this task – using the Data ribbons Remove Duplicates feature, but also showing how we can use MS Queries to remove duplicates and also do additional filtering/sanitizing of our data set.

CONTENTS

Find Duplicates in Excel

find duplicates source table
Example Table with Duplicates
There are several ways to go around Finding or Highlighting duplicates in Excel:

Following today’s post we will use the example Excel Table on the right.

1. Find Duplicates using Excel Formulas

To Find Duplicates in Excel using Formulas we will use the COUNTIF Excel formula to verify if a certain row as been repeated above. This will work only on a single column (our unique column identifier). Scroll down to the end to learn how to check for duplicates based on all three columns.

Add Is Duplicate? Column to your Worksheet

Add a Is Duplicate? Column to your Worksheet. In our example assuming that the Last column uniquely identifies records in my table input the following formula in the first cell of your new Is Duplicate? Column:

find duplicates in excel using formulas
Add COUNTIF Excel Formula to Find Duplicates

Excel Formula:

=IF(COUNTIF(B$1:B1;"="&B2);"DUPLICATE";"")

Drag the Formula down to all Cells

Drag the COUNTIF Formula from the First Cell down to the Last.
find duplicates in excel the steps

Instead of dragging the Formula down manually you can:

  1. Select the First Cell with the Formula
  2. Copy by Formula by hitting CTRL+C
  3. Select all cells down by hitting CTRL+
  4. Paste the Formula by hitting CTRL+P

Find Entire Row Duplicates

Did you notice that in the previous example 2 rows were in fact not duplicates? See below:

find duplicates in excel for an entire row
False Duplicates as Last name is not unique!

What should you do if there is no Column that Uniquely identifies your rows? We need to use a Helper Column! See Column D below:
find duplicates in excel with a helper column
Add a Helper Column that Concatenates All Columns

Simply add the Helper Column and Concatenate all other Columns like so:

=A2&"_"&B2&"_"&C2

Next you need to use the COUNTIF Formula on the Helper Column instead.

I recommend separating columns with some Separator (such as the _) as otherwise you may again get False Positive Duplicates

2. Highlight Duplicates using Conditional Formatting

highlight duplicates conditional formatting
Conditional Formatting in HOME Ribbon Tab
To Highlight Duplicates in Excel we need to use the Conditional Formatting feature in the HOME Ribbon Tab in Excel. Follow the steps below:
highlight duplicates conditional formatting
How to Highlight Duplicates in Excel using Conditional Formatting

Select the Column to Highlight Duplicates

highlight duplicates select columnSelect the Column in which you want to Highlight your Duplicates. As a Tip you can Select the First Cell and hit CTRL+END.

Open Conditional Formatting

highlight duplicates conditional formattingGo to the HOME Excel Ribbon Tab and select Conditional Formatting.

Create New Rule

highlight duplicates create new ruleAs we want to Highlight Duplicate Cell Values in Excel we need to create a new Conditional Formatting Rule. Click New Rule.

Select Duplicate formatting

highlight duplicates formatting
Select the Format only unique or duplicate values option to define the Formatting of Duplicate Values.

Select Duplicate formatting

In the new window make sure Format All is set to duplicate. Next set the formatting for your Duplicates.
highlight duplicates format
Once completed you should see your Duplicate Cells Highlighted in your Excel spreadsheet.
highlight duplicates in excel

Remove Duplicates in Excel

Having answered the question of How to Find Duplicates in Excel, but what you often want to do is Remove Duplicates. So lets now tackle the issue of How to Remove Duplicates in Excel.

Similarly as with Finding Duplicates, there several ways to go around Removing Duplicates:

Remove Duplicates with Data Ribbon

Fortunately Removing Duplicates is a typical scenario in Excel supported by the Data Ribbon Tab.
remove duplicates in excel

Select Excel Table

remove duplicates select tableSelect your Entire Excel Table by hitting CTRL+A.

Remove Duplicates

remove duplicates data ribbonGo to the Data Ribbon Tab in Excel and select Remove Duplicates.

Select Columns with Duplicates

remove duplicates select columns with duplicatesIn this last step you need to Select All Columns in which there are Duplicates. If you just want to remove rows in which there are duplicates in a single column – remember to select only that Column.
Once you finish hit OK and done!

Remove Duplicates with MS Query

Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.

If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.

The DISTINCT SQL statement filters only the DISTINCT rows within a TABLE. You can use it on any SQL SELECT Query. See my example below:
remove duplicates ms query
The table on the Left is the Source Table the table on Right is the Result Table. This is the Microsoft Query you can use to filter only DISTINCT rows:

Where replace Sheet1 with the name of your Worksheet.

Count Duplicates in Excel

Similarly as with the previous case, there several ways to go around Counting Duplicates:

Count Duplicates using Excel Formulas

count duplicates in excelTo Count Duplicates in Excel follow the same steps as in Find Duplicates using Excel Formulas except you need to use the following Excel Formula using COUNTIF for the example in this post:

Similarly as in Find Entire Row Duplicates using Excel Formulas to Count Entire Rows that are duplicates, follow the above steps and create a similar Helper Column as such:

Count Duplicates using MS Query

Microsoft Query is a great tool to run MS Queries (SQL) on Excel data.

If you haven’t use MS Queries read my post on How to Create an MS Query in Excel or check-out my Excel SQL AddIn.

We can easily use a Microsoft Query to count only the duplicate records within our Data Set:
count duplicates in excel ms query
Here is the MS Query for the above Table:

excel webservice function

Excel WEBSERVICE and FILTERXML functions explained

1 Star2 Stars3 Stars4 Stars5 Stars (1 votes, average: 5.00 out of 5)
Loading...

One of the incredible new features coming to Excel 2013 are the Excel WEBSERVICE and Excel FILTERXML Worksheet functions. Excel 2013 introduces over 50 new Worksheet functions but I will tell you why the 2 above-mentioned rock my world ever since I learn of them. It is incredible that the introduction of this awesome function was almost unnoticed by most Excel gurus out there…

Excel up till now has been mostly an offline application. Although, you can use VBA, PowerQuery or other similar data tools in Excel to gain access to Internet and Online data sets these could not have been easily used with the help of Third Party AddIns such as my Scrape HTML AddIn. With the introduction of the WEBSERVICE function we gain immediate and easy access to any REST WebAPI.

In today’s post I will show you several examples of how to use these functions in practice. And how to extract data from an XML REST WebApi.

How to use the Excel WEBSERVICE function

The WEBSERVICE function downloads the HTTP response of the provided URL.

excel webservice function
The Excel WEBSERVICE Worksheet function

Try running the Excel WEBSERVICE function on Google:

=WEBSERVICE("https://www.google.com")

What do you get? If all goes well you should get the HTML response for the Google Web Page.
Now past the above formula to cell A1 and the below to cell A2:

=MID(A1;FIND("<title>";A1)+LEN("<title>");FIND("</title>";A1)-FIND("<title>";A1)-LEN("<title>"))

The result of this Worksheet Excel Formula should be:

Google

Using FILTERXML in tandem with WEBSERVICE

Another function introduced in Excel 2013 is the FILTERXML function. It is designed to work in tandem with the Excel WEBSERVICE function.

What does the FILTERXML function do?

The Excel FILTERXML Worksheet function parses a XML string (string containing XML document) and returns a single element (node or attribute) provided by a XPath.

excel filterxml
The Excel FILTERXML Worksheet Function

Lots of odd words there right? So let us start breaking it down. XML is markup language for encoding documents. As a matter of fact HTML is based on XML and share a lot of similarities. Unfortunately for us HTML does not need often have to be as strictly parsed/validated as an XML does. Although some HTML could pass as XML files – in fact many Websites don’t validate as XML.

Want to learn how to manage XML documents in VBA instead? Read my VBA XML Tutorial

Now what is XPath? XPath is a query language for selecting XML elements such as nodes and attributes. XPath works for XML and HTML.

How to use the FILTERXML function

Now that we know what the FILTERXML functions let’s use it on a simple XML file. W3Schools fortunately has a lot of simple XML examples – let us use a simple XML Food Menu:

Example: WEBSERVICE and FILTERXML function

filterxml and webservice example 1
Using WEBSERVICE and FILTERXML in tandem – Formulas

Example: WEBSERVICE and FILTERXML result

filterxml and webservice example 2
Using WEBSERVICE and FILTERXML in tandem – Result

Explanation

xml example
W3Schools XML Example Food Menu
Now to explain what happens above. The XML file includes a couple of nodes – each one nested within the previous one. We start with node which hosts nodes. Each food node represents a single item in the menu. A food node contains , and node – which describe each menu item.

Now the FILTERXML functions used the following XPath: //food/name. This translates to: take the first food tag and return the contents of it name node.

XPath is an easy query language to learn. I personally recommend the WSchools XPath tutorial

Scraping a whole XML document

Now the example above is fine when you need just a single node from your XML document URL. What if you want to scrape the whole contents of that XML? Thankfully we can combine the WEBSERVICE and FILTERXML functions with Array Formulas.

In the example below I will show you how to acquire all the names of the food items in the menu. You can use a similar technique to get other items.

Input the FILTERXML formula

Input the FILTERXML formula as shown below:

=FILTERXML(B2;"//food/name")

filterxml and webservice example 4

Drag the formula down

Drag the formula down to row 8:
filterxml and webservice example 5

Hit CTRL+SHIFT+ENTER

Hit the following key combo to create and Array Formula: CTRL+SHIFT+ENTER.
That is it. Now in each row you should see the name of a food item from the menu.

Don’t like using Array Functions? You can also use the XPath node index instead:

=FILTERXML(B2;"//food[2]/name")

will return the name of the second food menu item. To replicate this across all items use this example:

=FILTERXML(B$2;"//food[" & (ROW()-ROW(B$4)+1) &"]/name")

Provide you entered this function in cell B2, simply drag it down – the items should automatically complete.

WEBSERVICE functions Restrictions

Now the WEBSERVICE function unfortunately has several restrictions that will cause the function to return a #VALUE! error instead of the string:

  • If you don’t have a working Internet connection (or you are working with a proxy server)
  • Incorrect arguments or URL address
  • If HTTP result is not valid or contains more than the cell limit of 32767 characters
  • URL is a string that contains more than the 2048 characters that are allowed for a GET HTTP request
  • Protocols that aren’t supported, such as ftp:// or file://

Excel WEBSERVICE summary

The WEBSERVICE and FILTERXML functions are a great step forward to enabling access to Internet resources. These still have unfortunately a lot of limits. Especially when most websites have HTML files that exceed the 32727 character limit and often don’t parse as XML files.

This is where my VBA Web Scraping Kit fills the gap together with my Scrape HTML AddIn. The Kit has all the Web Scraping scenarios I consider possible in Excel where as the Scrape HTML AddIn extends a little the constrains of the WEBSERVICE and the FILTERXML functions

Want to learn Web Scraping?

Not satisfied? Want to know more about Web Scraping in Excel using VBA? Read my zero-to-hero Web Scraping Tutorial.

excel substrings

Excel Substring and VBA Substring – Left, Right, Mid, Split etc.

1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 5.00 out of 5)
Loading...

Strings are not easy to manipulate and often we need to create a Excel Substring or VBA Substring. Excel and VBA in fact have multiple functions that support obtaining substrings.

Excel / VBA Substring functions

In the table below I compared the basic Excel and VBA Substring functions.

Description Excel Function VBA Function
Get # characters from the left
=LEFT("Hello";2) 
'Equals "He"
Left("Hello", 2) 
'Result: "He"

Read more here.

Get # characters from the right
=RIGHT("Hello";2) 
'Equals "lo"
Right("Hello", 2) 
'Result: "lo"

Read more here.

Get # characters from the middle MID(text_string, start_number, char_numbers)

=MID("Hello"; 2;3) 
'Result: "ell"
Mid(text_string, start_number, char_numbers)

Mid("Hello", 2, 3)
'Result: "ell"

Read more here.

Split string with delimiter
No function available in Excel
Split ( expression [,delimiter] [,limit] [,compare] )

Split("Hello World", " ")(1)
'Result: "World"

Read more here.

Substring examples

Excel Substring examples

The Left, Right and Mid functions are the basic Excel Substring functions. Below are some examples of Excel substrings using these:

Excel Substring Examples
Excel Substring Examples

VBA Substring examples

Below similar examples using a VBA macro. Additionally in VBA you can use the VBA Split function.

Dim str As String, s As Variant
str = "Hello there John!"
    
Debug.Print Left(str, 5) 'Result: "Hello"
    
Debug.Print Right(str, 5) 'Result: "John!"
   
Debug.Print Mid(str, 7, 5) 'Result: "there"
    
For Each s In Split(str, " ")
   Debug.Print s
Next
'Result: "Hello", "there", "John!"

Substrings using FIND

In some cases the substring you want to find is dependent on the placement of a certain character or another substring within your text. To extract the data you need you will need to use either the Excel Find function or the VBA InStr function. In the example below I am extracting the Year and Month using the LEFT, RIGHT and FIND functions.

Excel Substring Find examples
Excel Substring Find examples

What does Find do? If return the index position of a certain character or substring within a given string. Using this I can leverage the LEFT and RIGHT function to extract the data around the hyphen character - wherever it will be placed within a string.

Substrings using InStr and InStrRev

Substrings using Regex

By using Regular Expressions in VBA you can also split strings or get substrings based on virtually any defined pattern. In this post here I show how you can define a new User Defined Function which you can use as an Excel function to get a substring based on a Regular Expression pattern. See example below:

Get substring using Regex in Excel
Get substring using Regex in Excel

VBA String Functions reference

When obtaining substrings it is worth learning other useful VBA String Functions such as InStr. Do see my VBA String Functions reference to learn more!

VBA String Functions
VBA String Functions