Tag Archives: index-match


How to use VLOOKUP in Excel!


The Excel VLOOKUP function (Vertical Lookup) is one of the need-to-know features in Excel! I can’t stress enough how many articles and posts have already been published on using VLOOKUP, and probably rightly so. Still, I couldn’t resist making my own attempt at consolidating everything you need to know to fully leverage the vertical lookup function in Excel. Let’s start with a VLOOKUP example…

The VLOOKUP Phonebook analogy

To quickly explain what VLOOKUP does I usually like to use the common Phonebook analogy. In the example below imagine we have a friend called John White. Let’s say we want to call John. What we need to do is to locate John in our Phonebook by his Name and Surname and call him on his corresponding phone number located within the same row.

The VLOOKUP Phonebook analogy

VLOOKUP returns the cell located in the row that corresponds to your lookup value. In the phonebook analogy you would be looking up a person’s phone number by their name and surname.

Every day we deal with similar lookup problems, e.g. a restaurant menu (looking up the price of a dish) or a grocery list (how many bottles of milk did I need to buy?). Hence the usefulness of VLOOKUP. Let us move on to learn how to use it.

How to use VLOOKUP?

Considering the above VLOOKUP example let’s take a closer look at the parameters of the VLOOKUP function:

Excel VLOOKUP Function Parameters

Let’s look at the arguments of VLOOKUP in more detail:

  • lookup_value – the value you want to look up. VLOOKUP searches for it in the first column of the range of cells you specify in table_array
  • table_array – the range of cells in which VLOOKUP will
    1. Search the FIRST column to find the ROW containing the lookup_value
    2. Return the corresponding value from column col_index_num of that row
  • col_index_num – the column number within the table_array that contains the return values. Numbering starts with 1 at the left-most column of the table_array
  • range_lookup – an OPTIONAL boolean (True/False) parameter that defines how the VLOOKUP function will behave:
    1. FALSE – search for the first EXACT MATCH in the first column of the table_array
    2. TRUE – Default option. Assumes the FIRST COLUMN in table_array is sorted in ascending order (numerically or alphabetically) and returns an APPROXIMATE MATCH: the largest value that is less than or equal to lookup_value. This works correctly ONLY IF THE FIRST COLUMN OF THE table_array IS SORTED!
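To make the two range_lookup modes concrete, here is a minimal Python sketch that models the behaviour described above. It is an illustration under assumptions, not how Excel is implemented: the function name vlookup, the phonebook data and the "#N/A" string are all made up for this example, and unlike Excel the comparison here is case-sensitive.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index_num, range_lookup=True):
    """Rough model of VLOOKUP. `table` is a list of rows (tuples)."""
    if not range_lookup:
        # FALSE: scan from the top and return the first exact match.
        for row in table:
            if row[0] == lookup_value:
                return row[col_index_num - 1]
        return "#N/A"
    # TRUE: binary search that assumes the first column is sorted ascending.
    # It picks the last row whose key is less than or equal to lookup_value.
    keys = [row[0] for row in table]
    pos = bisect_right(keys, lookup_value)
    if pos == 0:
        return "#N/A"  # lookup_value is smaller than every key
    return table[pos - 1][col_index_num - 1]


# Hypothetical phonebook, sorted by surname.
phonebook = [
    ("Adams", "555-0101"),
    ("Brown", "555-0102"),
    ("White", "555-0103"),
]

print(vlookup("White", phonebook, 2, False))  # exact match found
print(vlookup("Clark", phonebook, 2, False))  # no exact match
print(vlookup("Clark", phonebook, 2, True))   # approximate: falls back to Brown
```

Note how the approximate lookup for "Clark" quietly returns Brown's number. That is exactly the behaviour to watch out for when range_lookup is left at its default.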

FALSE VLOOKUP Example

Nothing beats a hands-on VLOOKUP example! See the below animation to witness for yourself how VLOOKUP works!

VLOOKUP Example

Notice that I am using the range_lookup set to FALSE. This means that the VLOOKUP function will always look for an EXACT MATCH.

TRUE VLOOKUP Example

In the previous example the range_lookup parameter was set to FALSE. This example will equally work with range_lookup equal to TRUE as long as we sort our FIRST COLUMN alphabetically (or numerically in other cases)!

TRUE VLOOKUP

So what’s the difference, you will ask? PERFORMANCE. The TRUE (approximate match) VLOOKUP is significantly faster than the FALSE (exact match) VLOOKUP. Read more on VLOOKUP Performance here.

Important! If you are already hooked on the performance of the TRUE VLOOKUP, beware! The TRUE VLOOKUP will return a result even if it didn’t find an exact match. That is a drawback. Luckily Excel experts have found a way around this with a DOUBLE TRUE VLOOKUP. Read on.

DOUBLE TRUE VLOOKUP

To exemplify the DOUBLE TRUE VLOOKUP let’s consider this table:
table
We want to match the Animal Monkey against a Category. If we use a standard FALSE VLOOKUP like this:

=VLOOKUP("Monkey";A1:B5;2;FALSE)

We will get the following results:

"Mammal" if found
"#N/A" if not found

Great! But what happens if we use a simple TRUE VLOOKUP?:

=VLOOKUP("Monkey";A1:B5;2;TRUE)

We will get the following results:

"Mammal" if found
"Amphibian" (or similar) if not found

Well, that’s not very appropriate, is it? How can we know for sure that the TRUE VLOOKUP cross-referenced the Animal correctly? Well… why not use a second VLOOKUP for that? Consider the DOUBLE TRUE VLOOKUP below:

=IF(VLOOKUP("Monkey";A1:B5;1;TRUE)="Monkey";VLOOKUP("Monkey";A1:B5;2;TRUE);"")

We will get the following results:

"Mammal" if found
"" if not found (replace "" with #N/A if needed)

See what happens here? The first VLOOKUP validates whether there is an EXACT MATCH in the first column. If so the second VLOOKUP will return the corresponding result. See below:

DOUBLE TRUE VLOOKUP Example
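The same check can be sketched in Python to show why two approximate lookups give an exact-match result. This is only a model of the idea: the animals table below is hypothetical (the original table is an image), and approx_row and double_true_lookup are names invented for this sketch.

```python
from bisect import bisect_right


def approx_row(keys, lookup_value):
    """Index of the last key <= lookup_value in sorted keys, or None."""
    pos = bisect_right(keys, lookup_value)
    return pos - 1 if pos else None


def double_true_lookup(lookup_value, table, default=""):
    """Model of IF(VLOOKUP(key;..;1;TRUE)=key; VLOOKUP(key;..;2;TRUE); default)."""
    keys = [row[0] for row in table]
    # First approximate lookup: which key did we land on?
    row = approx_row(keys, lookup_value)
    if row is not None and keys[row] == lookup_value:
        # Second approximate lookup: fetch the value from column 2.
        return table[approx_row(keys, lookup_value)][1]
    return default


# Hypothetical table, sorted by the first column.
animals = [
    ("Cat", "Mammal"),
    ("Frog", "Amphibian"),
    ("Monkey", "Mammal"),
    ("Snake", "Reptile"),
]

print(double_true_lookup("Monkey", animals))  # key exists
print(double_true_lookup("Lizard", animals))  # key missing, default returned
```

A plain approximate lookup for "Lizard" would land on the "Frog" row and report "Amphibian"; the extra comparison against the key is what filters that out.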

Are 2 TRUE VLOOKUPs really worth the trouble? If you are looking for performance then YES! Read more here.

INDEX MATCH vs VLOOKUP – Do’s and Don’ts

If you’ve been using VLOOKUP long enough you have probably stumbled across MANY articles and posts explaining why a certain combination of two Excel functions, INDEX and MATCH, is considered better than a regular VLOOKUP. In fact MOST Google results for VLOOKUP vs INDEX MATCH won’t mention the significant performance advantage of the TRUE VLOOKUP, which I hinted at in the previous section.

Let’s compare the pros and cons of using INDEX MATCH (instead of VLOOKUP):

  • More flexible – allows you to match both against rows and columns
  • Less error prone – adding/removing columns/rows from the lookup table should not crash the INDEX-MATCH combo
  • Both vertical and horizontal lookups – VLOOKUP and HLOOKUP each address only vertical or only horizontal lookups, whereas with INDEX MATCH you can easily do both
  • Harder to use – VLOOKUP is a little easier to understand than INDEX-MATCH and I know some people have difficulty with this two-step approach
  • Slower (at least in newer versions of Excel like 2013) – contrary to the widely shared myth, INDEX MATCH is no longer faster than a simple VLOOKUP. Even so, I still think INDEX-MATCH is better in most cases. Read my post here for how VLOOKUP compares in terms of performance against INDEX MATCH.

VLOOKUP with Multiple criteria

In some cases you would like to run a VLOOKUP against MORE THAN ONE COLUMN i.e. lookup a certain set of values against several columns instead of just one column. There is a simple way to achieve this by introducing a HELPER COLUMN with a concatenation of the lookup columns. Let’s consider the example below:

VLOOKUP Multiple Criteria: Example

As you can see each month is specified in two separate columns. A simple one-column VLOOKUP will not do. So how to use VLOOKUP to get the result for multiple criteria? Using a HELPER COLUMN! See the solution below:
VLOOKUP Multiple Criteria: Solution

The above is the simplest approach to a multiple criteria VLOOKUP. There are more elegant approaches that do not need a Helper Column, e.g. check out Chandoo’s example here.
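The helper column idea can be sketched in Python as building one combined key per row and looking up against that. The data, the column meanings (year, month, sales) and the names helper_key and lookup are assumptions made for this sketch only, since the original example is an image.

```python
# Hypothetical rows: (year, month, sales).
rows = [
    (2014, "Jan", 120),
    (2014, "Feb", 135),
    (2015, "Jan", 150),
]


def helper_key(*parts):
    """Concatenate the lookup columns into one key, like a helper column."""
    # A separator avoids ambiguous keys, e.g. ("ab", "c") vs ("a", "bc").
    return "|".join(str(part) for part in parts)


# Build the "helper column" once. The first occurrence wins, as with VLOOKUP.
helper_index = {}
for year, month, sales in rows:
    helper_index.setdefault(helper_key(year, month), sales)


def lookup(year, month):
    """Look up by two criteria through the combined key."""
    return helper_index.get(helper_key(year, month), "#N/A")


print(lookup(2015, "Jan"))
print(lookup(2016, "Jan"))
```

In a worksheet the same precaution applies: if you concatenate the columns, consider putting a separator character between them so that different combinations cannot collapse into the same helper value.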

VLOOKUP Performance

If you are impatient for the answer – for best performance always use DOUBLE TRUE VLOOKUPS. Now for a more thorough explanation let’s start from the beginning and summarize what we know in bullet points:

  • The VLOOKUP function can be well replaced with other functions/features in Excel
  • A common practice is to replace a VLOOKUP with an INDEX MATCH function combo. This does not affect performance (much) but solves a lot of typical usability issues
  • I have hinted above that a TRUE (approximate match) VLOOKUP on a sorted table_array will have better performance than a regular FALSE (exact match) VLOOKUP

The TRUE (approximate match) VLOOKUP function seems to be the best candidate in terms of performance. However, approximating the result causes a problem: a TRUE VLOOKUP will return a result even if there is no exact match! Luckily there is a trick: use 2 TRUE VLOOKUPs in a combo to replace a regular FALSE VLOOKUP and expect similar results. This is commonly called a DOUBLE TRUE VLOOKUP.
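One way to see where the speed difference comes from is to count comparisons. The usual explanation is that an exact match scans the column from the top, while an approximate match on sorted data can use a binary search. The sketch below assumes exactly that; it counts steps in Python and does not measure Excel itself. The table size is an arbitrary choice for illustration.

```python
def linear_probes(keys, target):
    """Comparisons made by a top-to-bottom scan for an exact match."""
    for count, key in enumerate(keys, start=1):
        if key == target:
            return count
    return len(keys)


def binary_probes(keys, target):
    """Comparisons made by a binary search on sorted keys."""
    lo, hi, count = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1
        if keys[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return count


keys = list(range(200_000))  # sorted lookup column, size chosen arbitrarily
worst_case = keys[-1]        # last row: the worst case for a linear scan

print(linear_probes(keys, worst_case))      # one comparison per row
print(2 * binary_probes(keys, worst_case))  # two binary searches, still tiny
```

Even when the binary search is run twice, as in the DOUBLE TRUE VLOOKUP, it needs only a few dozen comparisons where the scan may need one per row.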

Performance: VLOOKUPs VS INDEX MATCH VS SQL

The performance comparison below comes from my separate post on VLOOKUP vs INDEX-MATCH vs SQL. Follow the link to that post to read more. In the meantime let’s look at the comparison:

VLOOKUP Performance Comparison

What do the various categories mean?:

  • VLOOKUP (sorted) – a regular FALSE (exact match) VLOOKUP against a sorted table_array
  • DOUBLE TRUE VLOOKUP (sorted) – 2 combined TRUE (approximate match) VLOOKUPs against a sorted table_array
  • INDEX-MATCH (sorted) – a combination of the INDEX and MATCH functions against a sorted table_array
  • SQL (Sorted) – an Excel MS Query (SQL) executed against a sorted table_array

How to use VLOOKUP recap

Well, I do hope this exhausts the subject of using VLOOKUP in Excel. VLOOKUP is a commonly used function in Excel, but it is sometimes used wrongly, which, as I hope my performance stats above show, can seriously hamper your Workbook performance, causing Excel Workbooks to recalculate in a matter of minutes instead of seconds. Feel free to share your comments/thoughts below or on the AnalystCave forum.


Excel VLOOKUP vs INDEX MATCH vs SQL vs VBA


VLOOKUP vs INDEX MATCH vs SQL vs VBA – today you are in for the ultimate Excel Showdown. The VLOOKUP Excel function is one of the most popular functions, around which there has always been much debate. You will most definitely find an article about this function on almost every Excel blog that matters. Similarly there has been much argument about how efficient this function is compared to other combos like INDEX MATCH or DOUBLE TRUE VLOOKUPS. I have always wanted to dot the “i” in at least the discussion around the performance of VLOOKUP and the INDEX MATCH combo (VLOOKUP vs INDEX MATCH).

VLOOKUP vs INDEX MATCH
What should you do with your VLOOKUPs to gain significantly in performance, and what should you replace them with if you want to make the workbook more maintainable? How much performance will you actually gain? Hopefully here you will find answers to these questions.

One additional thing I always wanted to bring into this discussion is MS Query. Many Excel experts forget to mention that when you need to look up a lot of values within a certain table there is an approach almost as effective as any Excel trick out there: Microsoft Query (SQL). Excel features so-called Query Tables, which can execute OLEDB SQL queries on Excel data (worksheets are treated like separate SQL Tables).

This means that instead of looking up a value cell-by-cell you can do all the lookups within a single query. This query can be refreshed at the push of a button (or by macro), instead of dealing with uncontrollable automatic recalculations. I felt the urge to include this approach in this post as it can challenge all the other approaches out there head-on. But let us start from the beginning…

VLOOKUP Example

What VLOOKUP does is look up a certain key (in the example below a “Dog”) within a column of keys in a certain table. Then it takes the row in which the key was found and returns the corresponding value from another column.

Let’s see this in the below VLOOKUP example:

VLOOKUP Example

It is one of the most often used formulas and simple enough. However, the VLOOKUP function has several drawbacks:

  • Hard to maintain when columns are added/removed to/from the lookup table
  • Key column needs to be first in the lookup table
  • Little flexibility – cannot be used to match against both rows and columns of a lookup table. Although it can be replaced with HLOOKUP (the forgotten twin brother of VLOOKUP), this can be a nuisance if you want to create a table looking up both columns and rows

Why INDEX MATCH?

There are many decent posts on why to consider using INDEX MATCH against the common VLOOKUP. But before we go into the pros and cons let’s understand how the INDEX MATCH combo works.

In short we can replace a VLOOKUP with a combo consisting of 2 functions:

  • INDEX – returning the value of an element in a table or an array, selected by the row and/or column number indexes
  • MATCH – returning the relative position of an item in a specified range

In the example below the MATCH function will first return the relative position (the row number) of the Dog in the A column. Next the INDEX function will return a corresponding value from the same row in column B.

INDEX MATCH Combo
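The two-step mechanics can be modelled in a few lines of Python. This is a sketch of the idea only: match_exact and index_at are made-up names standing in for MATCH (with exact matching) and INDEX, and the two columns are hypothetical data.

```python
def match_exact(lookup_value, lookup_range):
    """Like an exact MATCH: 1-based position of the first equal item."""
    for position, value in enumerate(lookup_range, start=1):
        if value == lookup_value:
            return position
    return None  # Excel would show #N/A here


def index_at(array, row_num):
    """Like INDEX on a single column: value at a 1-based row number."""
    return array[row_num - 1]


# Hypothetical columns A and B of the lookup table.
column_a = ["Cat", "Dog", "Frog"]
column_b = ["Meow", "Woof", "Ribbit"]

row = match_exact("Dog", column_a)  # step 1: find the row
print(index_at(column_b, row))      # step 2: read the result column
```

Because the key column and the result column are passed separately, the result column can sit anywhere, including to the left of the key column, which VLOOKUP cannot handle.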

It is not much more complicated than the VLOOKUP, but the INDEX MATCH combo certainly handles all of the drawbacks of the VLOOKUP pretty well (see section above).

But why use INDEX MATCH instead of VLOOKUP, especially if we get exactly the same result? Here is a short summary of the pros and cons of INDEX MATCH vs. VLOOKUP:

  • More flexible – allows you to match both against rows and columns
  • Less error prone – adding/removing columns/rows from the lookup table should not crash the INDEX MATCH combo
  • Can be split to match multiple columns – by splitting the INDEX from the MATCH we can run a single MATCH to find the result row and then reuse it in several INDEX formulas, each returning a different column
  • Both vertical and horizontal lookups – VLOOKUP and HLOOKUP each address only vertical or only horizontal lookups, whereas with INDEX MATCH you can easily do both
  • Harder to use – VLOOKUP is a little easier to understand than INDEX MATCH and I know some people have difficulty with this two-step approach

What about SQL?

As promised a quick example as to how SQL (MS Query) can fit in. The query below will do the same lookup operation as the VLOOKUP and the INDEX-MATCH.

Returning the value

A Query to lookup a whole column of values would look more like this:

How to create an SQL query in Excel? Just go to: DATA > From Other Sources > From Microsoft Query, or check out my Excel SQL Add-In.

Now SQL will not prove very useful for just a single lookup operation. Its benefits appear when you need to carry out A LOT of lookup operations. As you will see there are certain tricks you can use in Excel to get better performance than SQL can provide. However, I would encourage learning SQL, as in most cases it can easily replace complex formulas or Array Formulas and still provide awesome performance.
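To give a feel for what a lookup done as a single query looks like, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MS Query. Everything in it is an assumption made for illustration: the table names lookup_table and lookups, the column names and the data are invented, and the SQL dialect Excel accepts over OLEDB differs in details such as how worksheets are referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_table (animal TEXT, category TEXT)")
conn.execute("CREATE TABLE lookups (animal TEXT)")

# Hypothetical lookup table (keys assumed unique) and the keys to look up.
conn.executemany(
    "INSERT INTO lookup_table VALUES (?, ?)",
    [("Cat", "Mammal"), ("Dog", "Mammal"), ("Frog", "Amphibian")],
)
conn.executemany(
    "INSERT INTO lookups VALUES (?)",
    [("Dog",), ("Frog",), ("Unicorn",)],
)

# One LEFT JOIN resolves every lookup at once. Keys with no match come
# back with NULL (None in Python), the counterpart of #N/A.
results = conn.execute(
    """
    SELECT l.animal, t.category
    FROM lookups AS l
    LEFT JOIN lookup_table AS t ON t.animal = l.animal
    ORDER BY l.rowid
    """
).fetchall()

for animal, category in results:
    print(animal, category)
```

The point of the design is that the whole column of lookups is answered by one statement, which the database engine can optimize, rather than by one formula per cell.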

What about VBA?

I couldn’t miss the opportunity to check how VBA compares to the other approaches, although it seems pretty obvious it wouldn’t be a fair match, as VBA is single-threaded whereas Excel’s native formulas are recalculated concurrently in multiple threads. To challenge the other approaches I devised a simple VBA procedure using the VBA Dictionary object.
vlookup vba macro
How does the VBA lookup procedure work? It loads the entire lookup table into a VBA Dictionary object and then looks up all the lookup values against the VBA Dictionary.
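The macro itself is not reproduced here, but the approach it describes maps directly onto a hash map. Below is a minimal Python analogue, with a dict playing the role of the VBA Dictionary. The function name dictionary_lookup and the data are assumptions for this sketch; it is not the original VBA code.

```python
def dictionary_lookup(lookup_table, lookup_values, default=None):
    """Load the lookup table into a dict, then resolve every lookup."""
    index = {}
    for key, value in lookup_table:
        # Keep the first occurrence of a key, as VLOOKUP would.
        index.setdefault(key, value)
    # Each lookup is now a single hash probe, not a scan of the table.
    return [index.get(value, default) for value in lookup_values]


# Hypothetical data. The table does not need to be sorted.
table = [("Dog", "Mammal"), ("Frog", "Amphibian"), ("Cat", "Mammal")]
wanted = ["Cat", "Unicorn", "Dog"]

print(dictionary_lookup(table, wanted, default="#N/A"))
```

Building the dictionary costs one pass over the table, after which the number of lookups barely matters, which fits the observation that sorting is irrelevant for this approach.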

DOUBLE TRUE VLOOKUP

The VLOOKUP allows you to either approximate a match (by setting range_lookup to TRUE) or select an exact match. Strangely enough Excel defaults to the approximate option, which is obviously a nuisance for most of us… but maybe that was actually a hint from Microsoft that this is the option to go with? It turns out that the TRUE option, which approximates the lookup result, provides a significant performance boost (scroll down to see how significant).

However, a VLOOKUP using the TRUE option will always return a result, not necessarily the one you were looking for. The trick popularized by Charles Williams here shows how, using 2 TRUE VLOOKUPS, you can get accurate results with great performance. How does it work? In short, Charles suggests using an IF function whose condition is the FIRST TRUE VLOOKUP, which looks up the key itself. If its result equals the original lookup value (the key), the IF function returns the SECOND TRUE VLOOKUP, which returns the matched value from the right column. If not, it returns whatever default value you set. See below:

DOUBLE TRUE VLOOKUP

The formula below:

Notice that the condition in the formula verifies if the VLOOKUP located the right lookup value. If so, the second VLOOKUP will return the associated lookup result. Might seem strange at first that 2 VLOOKUPs are better than one. Let’s look at the statistics to see how it will compare against the other approaches…

There is one CONSIDERABLE drawback to keep in mind. The lookup table has to be SORTED by the lookup column (key column)! Otherwise the formula will return unreliable results.

Let’s now dive right into the performance stats for these different approaches.

Performance comparison

Before we start I want to set a couple of expectations. I am running these tests with Excel 2013, so keep in mind that if you are using a different version you may see slightly different results (even INDEX MATCH being quicker than the simple VLOOKUP, or so I am told). As all things change, so the discussion around VLOOKUP needs updating, especially performance-wise. Apart from that, some additional things to mention:

  • Tests run on an Intel i5-4300U processor (2 physical cores)
  • Tests run on a large dataset (200k lookup table) and assuming a large number of lookups (>25k) to diffuse the problem of accurate performance measurements (25k – 200k lookup operations). Results should give a good approximation of the actual metrics

Lookups on UNSORTED data

In most common cases you are carrying out lookup operations on an UNSORTED lookup table. The chart below presents results of the following alternatives:

  1. VLOOKUP (UnSorted) – a simple VLOOKUP on an unsorted lookup table
  2. INDEX MATCH (UnSorted) – a simple INDEX MATCH on an unsorted lookup table
  3. SQL (UnSorted) – a simple SELECT query matching against the lookup values (keys) of the VLOOKUP (returns same results in same order)
  4. VBA (Sorted) – a VBA procedure that creates a dictionary of the lookup table and matches the lookups using the VBA Dictionary. You can find the source code here: source code
Execution time of lookup operations on UNSORTED data

So what do we see? The INDEX MATCH combo performs consistently slightly worse than the simple VLOOKUP. The gap does not grow much as the number of operations increases, so there is no reason to jump to an INDEX MATCH combo for performance alone. SQL (and VBA) on the other hand wiped out the competition, being almost 19x faster when executed against 200k lookups. The execution time of the MS Query did not increase considerably, which suggests that it may perform equally well on much larger lookup tables. VBA was also quite efficient when executed on fewer than 100k lookup operations. Nevertheless, I consider SQL the winner as it clearly performed better for more operations.

Lookups on SORTED data

Let’s now consider the ideal situation where we have a SORTED lookup table. The chart below will now present results of the following alternatives:

  1. VLOOKUP (Sorted) – a simple VLOOKUP on a sorted lookup table
  2. DOUBLE TRUE VLOOKUP (Sorted) – a DOUBLE TRUE VLOOKUP on a sorted lookup table
  3. INDEX MATCH (Sorted) – a simple INDEX MATCH on a sorted lookup table
  4. SQL (Sorted) – a simple SELECT query matching against the lookup values (keys) of the VLOOKUP (returns same results in same order)
  5. VBA (Sorted) – a VBA procedure that creates a dictionary of the lookup table and matches the lookups using the VBA Dictionary, you can find the source code here: source code
Execution time of lookup operations on SORTED data

As we can see the DOUBLE TRUE VLOOKUP rules the stage with an astonishing 0.22 seconds vs. the awful 110 seconds of a regular VLOOKUP. A remarkable improvement! SQL comes second with a score just slightly above 5 seconds, which seems reasonable (especially as it is single-threaded and uses only 1 core). What comes as a surprise is that both the VLOOKUP and the INDEX MATCH actually performed worse when executed against a sorted lookup table. Not something you might expect, but it is explained at length by Excel guru Bill Jelen in this podcast, which is really worth watching if you want to know more on the subject.

The RIGHT approach

Although the DOUBLE TRUE VLOOKUP proved superior to any other method, VLOOKUP is a function that is far from perfect (read the section on INDEX MATCH above). In short, VLOOKUP is less resilient to changes than INDEX MATCH. Ideally we would like to have an INDEX MATCH formula that is just as efficient as the DOUBLE TRUE VLOOKUP… Well, in fact there is a way, at least if we mix a little of both worlds by combining a TRUE VLOOKUP with an APPROXIMATE INDEX MATCH.

So what’s happening here? We are using the first TRUE VLOOKUP to check whether the lookup_value is present in the lookup table. Then once we have that confirmed we can do an APPROXIMATE INDEX MATCH (less than or equal in this case) to efficiently search for our corresponding value in the result_column.

See below how the MATCH function is defined in Excel:

Excel MATCH function definition

A simple example below:
TRUE VLOOKUP with APPROXIMATE INDEX MATCH

The APPROXIMATE INDEX MATCH is about as efficient as the TRUE VLOOKUP, hence both approaches are equivalent in terms of performance! Fantastic, right!?

Keep in mind that the lookup table has to be SORTED by the lookup column (key column)! Otherwise the formula will return unreliable results.
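The combined approach can be sketched in Python as well. As before this is only a model: match_approx stands in for MATCH with the "less than or equal" match type, checked_index_match is a name invented for this sketch, and the two columns are hypothetical data that must be sorted by key.

```python
from bisect import bisect_right


def match_approx(lookup_value, sorted_keys):
    """Like an approximate MATCH: 1-based position of the largest
    key that is less than or equal to lookup_value, or None."""
    pos = bisect_right(sorted_keys, lookup_value)
    return pos if pos else None


def checked_index_match(lookup_value, key_column, result_column, default=""):
    """Verify the key exists first, then fetch via an approximate match."""
    # Step 1: approximate lookup on the key column (the TRUE VLOOKUP check).
    pos = match_approx(lookup_value, key_column)
    if pos is None or key_column[pos - 1] != lookup_value:
        return default
    # Step 2: approximate MATCH again, then INDEX into the result column.
    return result_column[match_approx(lookup_value, key_column) - 1]


# Hypothetical sorted key column and a separate result column.
key_column = ["Cat", "Dog", "Frog", "Monkey"]
result_column = ["Mammal", "Mammal", "Amphibian", "Mammal"]

print(checked_index_match("Frog", key_column, result_column))
print(checked_index_match("Lizard", key_column, result_column, "#N/A"))
```

Both steps are binary searches, so the cost stays low, while the result column is referenced on its own and is therefore unaffected by columns being inserted into the table.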

Conclusions on VLOOKUP vs INDEX MATCH

For me these results mean at least 5 things:

  1. On a daily basis swapping your VLOOKUPs for INDEX-MATCH combos will not affect your Excel workbook performance, although it may provide you with more flexibility and reduce the number of errors when working on the lookup table, which I personally appreciate
  2. If performance is key, sort your lookup table and swap those VLOOKUPS for DOUBLE TRUE VLOOKUPS or, better, the TRUE VLOOKUP APPROXIMATE INDEX MATCH combo. Alternatively, swap the VLOOKUPs for a Microsoft Query (SQL) for more simplicity and control
  3. Unless you are using the DOUBLE TRUE VLOOKUP (or the TRUE VLOOKUP APPROXIMATE INDEX MATCH combo) don’t sort your lookup table if you want to gain performance – this will have the opposite effect
  4. When working with very large datasets (>100k rows) consider MS Query. Although it may be a little slower (consistently a couple of seconds) than the DOUBLE TRUE VLOOKUP for lookup operations, it is more flexible (performs well against other Excel functions) and gives you more control over your query. It also frees you from the mercy of the Excel Automatic Calculation feature, which may cause cells to recalculate when source data is modified (you can refresh the query via macro or by clicking refresh)
  5. Don’t resort to VBA for performance. It is overkill for this exercise: although it performed almost as well as SQL, it introduces unnecessary complexity and requires saving files in the XLSM/XLSB/XLS file format. Other than for amusement, VBA has no justification in this scenario

The VLOOKUP vs INDEX MATCH topic is one of the most popular Excel debates, but I hope that this post will shed more light and provide a measurable comparison of the options out there. I also feel I need to encourage the use of MS Queries as having their own place in this debate.

Do you agree? What do you think about the comparison of these approaches?