VLOOKUP vs INDEX MATCH vs SQL vs VBA – today you are in for the ultimate Excel Showdown. The VLOOKUP Excel function is one of the most popular functions, around which there has always been much debate. You will most definitely find an article about this function on almost every Excel blog site out there that matters. Similarly there has been much argue about how efficient this function is, when compared to other combos like INDEX MATCH or DOUBLE TRUE VLOOKUPS. I have always wanted to put the dot over the “i” in at least the discussion around performance when using the VLOOKUP and the INDEX MATCH combo (VLOOKUP vs INDEX MATCH).
What to do with your VLOOKUPs to significantly gain performance, and what to replace them with if you are looking to make your workbook more maintainable? How much performance will you actually gain? Hopefully here you will find answers to these questions.
One additional thing I always wanted to bring into this discussion was the MS Query. Many Excel experts often forget to mention that when you need to lookup a lot of values within a certain table there is an approach almost as effective as any Excel trick out there – Microsoft Query (SQL). Excel features so call Query Tables which can execute OLEDB SQL queries on Excel data (worksheets treated like separate SQL Tables).
This means that instead of doing a lookup of a certain value cell-by-cell you can do it within a single query. This query can always be refreshed at the push of a button (or macro), instead of dealing with uncontrollable automatic recalculations. I felt the urge to include this approach in this post as it can challenge face on all the other approaches out there. But let us start from the beginning…
What VLOOKUP does is lookup a certain key (in the example below a “Dog”) within a column of keys in a certain table. Then it takes a value corresponding to the row in which the key was located and returns a corresponding value from another column.
Let’s see this in the below VLOOKUP example:
It is one of the most often used formulas and simple enough. However, the VLOOKUP function has several setbacks:
- Hard to maintain when columns are added/removed to/from the lookup table
- Key column needs to be first in the lookup table
- Little flexibility – cannot be used to match against both rows and columns of a lookup table. Although it can be replaced with HLOOKUP (the forgotten twin brother of VLOOKUP) this can be a nuisance if you want to create a table lookuping up both columns and rows
Why INDEX MATCH?
There are many decent posts on why to consider using INDEX MATCH against the common VLOOKUP. But before we go into the pros and cons let’s understand how the INDEX MATCH combo works.
In short we can replace a VLOOKUP with a combo consisting of 2 functions:
- INDEX – returning the value of an element in a table or an array, selected by the row and/or column number indexes)
- MATCH – returning the relative position of an item in a specified range
In the example below the MATCH function will first return the relative position (the row number) of the Dog in the A column. Next the INDEX function will return a corresponding value from the same row in column B.
Not really much complicated than the VLOOKUP but the INDEX MATCH combo certainly handles all of the setbacks of the VLOOKUP pretty well (see section above).
But why use INDEX MATCH instead of VLOOKUP, especially if we get the exactly same result? Short summary of the pros and cons of the INDEX MATCH vs. the VLOOKUP:
- More flexible – allows you to match both against rows and columns
- Less error prone – adding/removing columns/rows from the lookup table should not crash the INDEX MATCH combo
- Can be split to match multiple columns – by splitting the INDEX from the MATCH we can in fact match several column off a single INDEX column which points us to the result row
- Both vertical and horizontal lookups – VLOOKUP and HLOOKUP address either only vertial or horizontal lookups, whereas with the INDEX MATCH you can easily do both
- Harder to use – VLOOKUP is a little easier to understand than INDEX MATCH and I know some people have difficulty with this two step approach
What about SQL?
As promised a quick example as to how SQL (MS Query) can fit in. The query below will do the same lookup operation as the VLOOKUP and the INDEX-MATCH.
SELECT Category FROM [Sheet1$] WHERE Animal = "Dog"
Returning the value
A Query to lookup a whole column of values would look more like this:
SELECT LVS.lookup_value, LT.lookup_result FROM [LOOKUP_VALUES_SHEET$] AS LVS INNER JOIN [LOOKUP_TABLE$] as LT ON LVS.lookup_value = LT.lookup_value
Now SQL will not prove much useful for just a single lookup operation. Its benefits appear when needing to carry out A LOT of lookup operations. As you will see there are certain tricks you can use in Excel to get better performance than SQL can provide you, however, I would encourage learning SQL as in most cases it can easily replace the need to create complex queries or Array Formulas and still provide awesome performance.
What about VBA?
Couldn’t miss out on an opportunity to check how VBA compares to the other approaches, although it seems pretty obvious it wouldn’t be a fair match – as VBA is singlethreaded as opposed to Excel’s native formulas, which are recalculated concurrently in multiple threads. To challenge the other approaches I devised a simple VBA procedure using the VBA Dictionary object.
How does the VBA lookup procedure work? It loads the entire lookup table to a VBA Dictionary object and then looks-up the entire lookup values against the VBA Dictionary.
Sub TestVBA() OptimizeVBA True Dim startTime As Single, endTime As Single startTime = Timer Dim names As Range, ages As Range Dim lookupNames As Range, lookupAges As Range Dim vlookupCol As Object Set names = Worksheets("Sheet1").Range("A2:A7") Set ages = Worksheets("Sheet1").Range("B2:B7") Set lookupNames = Worksheets("Sheet1").Range("D2:D5") Set lookupAges = Worksheets("Sheet1").Range("E2:E5") 'Build Collection Set vlookupCol = BuildLookupCollection(names, ages) 'Lookup the values VLookupValues lookupNames, lookupAges, vlookupCol endTime = Timer Debug.Print (endTime - startTime) & " seconds have passed [VBA]" OptimizeVBA False Set vlookupCol = Nothing End Sub Function BuildLookupCollection(categories As Range, values As Range) Dim vlookupCol As Object, i As Long Set vlookupCol = CreateObject("Scripting.Dictionary") For i = 1 To categories.Rows.Count Call vlookupCol.Add(CStr(categories(i)), values(i)) Next i Set BuildLookupCollection = vlookupCol End Function Sub VLookupValues(lookupCategory As Range, lookupValues As Range, vlookupCol As Object) Dim i As Long, resArr() As Variant ReDim resArr(lookupCategory.Rows.Count, 1) For i = 1 To lookupCategory.Rows.Count resArr(i - 1, 0) = vlookupCol.Item(CStr(lookupCategory(i))) Next i lookupValues = resArr End Sub Sub OptimizeVBA(isOn As Boolean) Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic) Application.EnableEvents = Not (isOn) Application.ScreenUpdating = Not (isOn) ActiveSheet.DisplayPageBreaks = Not (isOn) End Sub
DOUBLE TRUE VLOOKUP
The VLOOKUP allows you to either approximate a match (by providing range_lookup value to TRUE) or select an exact match. Strangely enough Excel defaults to the approximate options which is obviously a nuisance for most of us… but maybe that was actually a hint from Microsoft that this is the option to go with? It turn out that using the TRUE option that approximates the lookup result will return the lookup value providing a significant performance boost (scroll down to see how significant).
However, a VLOOKUP using the TRUE option will always return a result – not necessarily the one you were looking for. The trick therefore socialized by Charles Williams here shows how using 2 TRUE VLOOKUPS you can get accurate results with great performance. How does it work? In short what Charles suggests is using an IF function with the condition being the FIRST TRUE VLOOKUP to match for the lookup value (the key). If the result matches against the original lookup value (the key) the IF functions returns the SECOND TRUE VLOOKUP which returns the matched value in the right column. If not it returns any default value set. See below:
The formula below:
Notice that the condition in the formula verifies if the VLOOKUP located the right lookup value. If so, the second VLOOKUP will return the associated lookup result. Might seem strange at first that 2 VLOOKUPs are better than one. Let’s look at the statistics to see how it will compare against the other approaches…
Let’s now dive right into the performances stats around these different approach.
Before we start I want to level set a couple of things. I am running these tests with Excel 2013 installed so keep in mind that if you are using a different version you may see slightly different results (even the INDEX MATCH being quicker then the simple VLOOKUP as I am told to believe). As all things changes, so should the discussion around VLOOKUP need updating – especially performance-wise. Apart from that some additional things to mention:
- Tests run on an Intel i5-4300U processor (2 physical cores)
- Tests run on a large dataset (200k lookup table) and assuming a large number of lookups (>25k) to diffuse the problem of accurate performance measurements (25k – 200k lookup operations). Results should give a good approximation of the actual metrics
Lookups on UNSORTED data
In most common cases you are carrying out lookup operations on a UNSORTED lookup table. The chart below present results of the following alternatives:
- VLOOKUP (UnSorted) – a simple VLOOKUP on an unsorted lookup table
- INDEX MATCH (UnSorted) – a simple INDEX MATCH on an unsorted lookup table
- SQL (UnSorted) – an simple SELECT query matching against the lookup values (keys) of the VLOOKUP (returns same results in same order)
- VBA (Sorted) – a VBA procedure that creates a dictionary of the lookup table and matches the lookups using the VBA Dictionary. You can find the source code here: source code
So what do we see? It seems that the INDEX MATCH combo performs consistently slightly worse then the simple VLOOKUP. There are no large differences when increasing the amounts of operations performed which seems like there is no reason to jump to an INDEX MATCH combo in the need for just performance. SQL (and VBA) on the other hand wiped out the competition being almost 19x faster when executed against 200k lookups. It seems like the MS Query did not increase it’s execution time considerably probably leading to think that it may perform equally well on much larger lookup tables. VBA was also quite efficient when executed on less than 100k lookup operations. Nevertheless, I consider SQL the winner as clearly it performed better for more operations.
Lookups on SORTED data
Let’s now consider the ideal situation where we have a SORTED lookup table. The chart below will now present results of the following alternatives:
- VLOOKUP (Sorted) – a simple VLOOKUP on a sorted lookup table
- DOUBLE TRUE VLOOKUP (Sorted) – a DOUBLE TRUE VLOOKUP on a sorted lookup table
- INDEX MATCH (Sorted) – a simple INDEX MATCH on a sorted lookup table
- SQL (Sorted) – an simple SELECT query matching against the lookup values (keys) of the
VLOOKUP (returns same results in same order)
- VBA (Sorted) – a VBA procedure that creates a dictionary of the lookup table and matches the lookups using the VBA Dictionary, you can find the source code here: source code
As we can see the DOUBLE TRUE VLOOKUP rules the stage with an astonishing 0.22 seconds vs. the aweful 110 seconds of a regular VLOOKUP. An astonishing improvement! SQL comes second with a score just slightly above 5 seconds which seems reasonable (although using only 1 core due to being single threaded). What comes as strange is that both the VLOOKUP and the INDEX MATCH actually performed worse when executed against a sorted lookup table. Not something you might expect, but broadly explained by Excel-guru Bill Jelen in this podcast really worth watching if you want to know more on the subject.
The RIGHT approach
Although the DOUBLE TRUE VLOOKUP proved superior to any other method, VLOOKUP is a function that is far from perfect (read section on INDEX MATCH above). In short VLOOKUP is less immune to changes than the INDEX MATCH. Ideally we would like to have an INDEX MATCH formula that is just as efficient as the DOUBLE TRUE VLOOKUP… Well in fact there is a way. At least if we mix a little of both worlds – by combining a TRUE VLOOKUP with a APPROXIMATE INDEX MATCH.
=IF( VLOOKUP(lookup_value; lookup_column; 1; TRUE) = lookup_value; INDEX(result_column; MATCH(lookup_value; lookup_column; 1)); NA())
So what’s happening here? We are using the first TRUE VLOOKUP to check whether the lookup_value is present in the lookup table. Then once we have that confirmed we can do an APPROXIMATE INDEX MATCH (less than or equal in this case) to efficiently search for our corresponding value in the result_column.
The APPROXIMATE INDEX MATCH is similarly as efficient as the TRUE VLOOKUP hence both approaches are equivalent in terms of performance! Fantastic right!?
Conclusions on VLOOKUP vs INDEX MATCH
For me these results mean at least 4 things:
- On a daily basis swapping your VLOOKUPs for INDEX-MATCH combos will not affect your Excel workbook performance, although may provide you with more flexibility and reduce the number of errors when working on the lookup table, which I personally appreciate
- If performance is key sort your lookup table and swap those VLOOKUPS with DOUBLE TRUE VLOOKUPS or rather the TRUE VLOOKUP APPROXIMATE INDEX MATCH combo. Alternatively, swap the VLOOKUPs with a Microsoft Query (SQL) for more simplicity and control
- Unless you are using the DOUBLE TRUE VLOOKUP (or TRUE VLOOKUP APPROXIMATE INDEX MATCH combo) don’t sort your lookup data table if you want to gain performance – this will have the opposite effect
- When working with very large datasets (>100k rows) consider MS Query. Although it may be just a little slower (consistently a couple of sec) than the DOUBLE TRUE VLOOKUP, for lookup operations, it is more flexible (performs well against other Excel functions) and provides you more control over your query. This also releases you out of the mercy of the Excel Automatic Calculation feature which may cause cells to recalculate when source data is modified (you can refresh the query via macro or by clicking refresh)
- Don’t resort to VBA for performance. It is an overkill for this exercise, although it performed almost as well as SQL, it introduces unnecessary complexity and requires saving files in XLSM/XLSB/XLS file format. Other than for amusement, VBA has no justification for this scenario
The VLOOKUP vs INDEX MATCH topic is one of the most popular Excel debates, but I hope that this post will shed more light and provide a measurable comparison of the options out there. I also feel I need to encourage the use of MS Queries as having their own place in this debate.
Do you agree? What do you think about the comparison of these approaches?