Fuzzy text matching is very useful when you want to compare a text string against other strings that don’t have to be identical. You still, however, want to find the one that is closest in terms of words. Fuzzy matching is very useful when e.g. you want to compare a user question against a database of solutions/answers. That is basically what Google does everyday. Rarely will your question ideally match the title of a blog or news article so Google tries to rank pages using fuzzy matching to find ones that are closest to your query.
Microsoft Fuzzy Lookup AddIn
As Fuzzy matching / lookup is a frequent feature Excel users required Microsoft decide to create their own Fuzzy Lookup AddIn. The AddIn basically allows you to lookup columns from another table using Fuzzy Matching and copy them to your source table.
When playing with the Fuzzy Lookup AddIn I wasn’t happy with it. I had an enormous database of user Questions and Answers and I wanted to have a choice of which items it will match for each of my records basis the user query. Often the fuzzy match algorithm would suggest a match that wasn’t perfect given context. Hence I decide to create my own Fuzzy Match VBA UserForm….
Custom Fuzzy Match using VBA UserForm
What I wanted was an easy way to lookup my query against a Knowledge Base of questions that are ranked in terms of their match against the query. The closer the query is to the question the higher the match. That way I can decide whether indeed I want to take the answer from the question with the highest match rate or maybe one of the below. The additional requirement was to ignore so called “stop words” in my query (the, a, it etc.) as these could generate a lot of false matches, while I wanted fuzzy matching only focus on the keywords.
To show you how the Fuzzy Match VBA UserForm works I created a simple Knowledge Base of Excel/VBA related questions. The KB consists of 3 basis columns: Questions, Answer and Category. The Category column is especially useful as looking at the query you might want to limit the Fuzzy algorithm to run on only a subset of items in your database.
Designing the Fuzzy Match UserForm
Below a quick overview of the Fuzzy Match VBA UserForm if you want to see it in action:
Below you can find the design of the VBA Fuzzy Match UserForm.
Each field is explained below:
- Text Selected – text from the Excel cell you selected in your workbook before running the VBA macro
- Search Question – if you want to override the “Text Selected” field simply type your query here and hit the Search button again. A typical scenario for this field is when you see that the result for your “Text Selected” don’t return satisfactory results and you would like to adjust the query
- Category(s) – if you want to improve the algorithm performance or simply reduce the categories to be searched then select the ones you are interested in
- Search – the Search button will run the algorithm and return results to the “Results” table
- Selected Question / Selected Answer – when you click on one of the results these fields will show the full text of the Question and Answer column. This makes it easier to copy the results where needed
The VBA Code
Initializing the UserForm
Firstly we will initialize our UserForm and assure it is showing us the current list of categories in our “KnowledgeBase” worksheet:
Private Sub UserForm_Initialize() Set catDict = GetListofCategories() For Each it In catDict.keys lbCategory.AddItem it Next it lbCategory.AddItem "Any" For i = 0 To lbCategory.ListCount - 1 lbCategory.Selected(i) = True Next i tbSelectedQuestion.Text = Selection.Value End Sub Function GetListofCategories() Dim question As Range Set dict = CreateObject("Scripting.Dictionary") Set wsQnA = GetQnAWorksheet For Each question In wsQnA.Range("C:C").SpecialCells(xlCellTypeConstants) If question.Row > 1 Then If Not dict.Exists(question.Value) Then dict.Add question.Value, 1 End If End If Next question Set GetListofCategories = dict End Function Function GetQnAWorksheet() As Worksheet Dim ws As Worksheet For Each ws In ActiveWorkbook.Sheets If ws.Name Like "KnowledgeBase*" And ws.Visible Then Set GetQnAWorksheet = ws Exit Function End If Next ws MsgBox "KnowledgeBase worksheet not found!~", vbCritical + vbOKOnly, "Error" Set GetQnAWorksheet = Nothing End Function
Function for removing Stop Words
Before we focus on generating matches let us focus for a sec on creating a function that will remove stop words from a given sentence. This will help us get rid of all those unnecessary words like “the, it, this” etc. that will dilute the fuzzy algorithm.
Function RemoveStopWords(sentence As String) As Collection If IsEmpty(stopWords) Then stopWords = Split("a;about;above;after;again;against;all;am;an;and;any;are;aren't;as;at;be;because;been;before;being;below;between;both;but;by;can't;cannot;chat;could;couldn't;did;didn't;do;does;doesn't;doing;don't;down;during;each;few;for;from;further;had;hadn't;has;hasn't;have;haven't;having;he;he'd;he'll;he's;her;here;here's;hers;herself;hi;him;himself;his;how;how's;i;i'd;i'll;i'm;i've;if;in;into;is;isn't;it;it's;its;itself;let's;me;more;most;mustn't;my;myself;need;needs;no;nor;not;of;off;on;once;only;or;other;ought;our;ours;out;over;own;same;shan't;she;she'd;she'll;she's;should;shouldn't;so;some;such;than;that;that's;the;their;theirs;them;themselves;then;there;there's;these;they;they'd;they'll;they're;they've;this;those;through;to;too;under;until;up;very;was;wasn't;we;we'd;we'll;we're;we've;were;weren't;what;what's;when;when's;where;where's;which;while;who;who's;whom;why;why's;with;won't;would;wouldn't;you;you'd;you'll;you're;you've;your;yours;yourself;yourselves;?;!;" & _ "-;,;", ";") Dim stopR As Range, r As Variant, col As Collection Set col = New Collection For Each w In Split(sentence, " ") If Not (IsNumeric(Trim(w))) Then w = Trim(w) If Len(w) > 0 Then col.Add w End If Next w For Each r In stopWords For i = col.Count To 1 Step -1 If UCase(col(i)) = UCase(r) Then col.Remove i End If Next i Next r Set RemoveStopWords = col End Function
You probably noticed I embedded a lot of words directly in the function. This makes it easy to add or remove stop words quickly.
Fuzzy Matching algorithm
Now for the meaty part :). Below the key logic for generating the results:
Public stopWords As Variant Dim selectedChat As Range Private Sub cbSearch_Click() 'Clear Search worksheet Set selectedChat = Selection Dim wsRes As Worksheet: Set wsRes = GetSearchWorksheet selectedChat.Worksheet.Activate 'Remove stop words from question Dim qCol As Collection Set qCol = RemoveStopWords(Selection.Value) 'Search Dim wsQnA As Worksheet, r As Range, saCol As Collection, sa() As String, saIndex As Long Dim dict As Object, dstR As Range, startCount As Long, pMax As Long, currProgress As Long Set dict = CreateObject("Scripting.Dictionary") For i = 0 To lbCategory.ListCount - 1 If lbCategory.Selected(i) Then dict.Add lbCategory.List(i), lbCategory.List(i) End If Next i 'Search question matching Service Area, search query and calculate match Set wsQnA = GetQnAWorksheet If wsQnA Is Nothing Then Exit Sub startCount = wsRes.Range("A:A").SpecialCells(xlCellTypeConstants).Count pMax = wsQnA.Range("A:A").SpecialCells(xlCellTypeConstants).Count For Each r In wsQnA.Range("A:A").SpecialCells(xlCellTypeConstants) If dict.Exists(r.Offset(0, 4).Value) Or IsEmpty(r.Offset(0, 4).Value) And r.Row > 1 Then If tbSearch.Value = vbNullString Or InStr(1, r.Value, tbSearch.Value, vbTextCompare) > 0 Then Set dstR = wsRes.Range("A1").Offset(startCount): startCount = startCount + 1 dstR.Value = r.Value dstR.Offset(0, 1).Value = r.Offset(0, 1).Value dstR.Offset(0, 2).Value = r.Offset(0, 2).Value dstR.Offset(0, 3).Value = CalculateMatch(qCol, r.Value) dstR.Offset(0, 3).NumberFormat = "0%" End If End If currProgress = currProgress + 1 If currProgress Mod 100 = 0 Then lStatus.Caption = "Searching " & Format(CDbl(currProgress) / pMax, "0%") DoEvents End If Next r 'Display Search sorted by Match If wsRes.UsedRange.Rows.Count > 0 Then With wsRes.Sort .SortFields.Clear .SortFields.Add2 Key:=GetSearchLastColumn(wsRes), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal .SetRange wsRes.UsedRange .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With End If lbResult.RowSource = GetSearchRangeAddress(wsRes) lStatus.Caption = "" & lbResult.ListCount & " Questions found" End Sub Function GetSearchRangeAddress(ws As Worksheet) GetSearchRangeAddress = "'" & ws.Name & "'!" & Range(ws.Range("A2"), ws.Cells(ws.UsedRange.Rows.Count, ws.UsedRange.Columns.Count)).AddressLocal End Function Function GetSearchLastColumn(ws As Worksheet) As Range Set GetSearchLastColumn = Range(ws.Range("D2"), ws.Cells(ws.UsedRange.Rows.Count, ws.UsedRange.Columns.Count)) End Function Function CalculateMatch(sentCol As Collection, sentence As String) As Double Dim m As Long, s() As String s = Split(sentence, " ") For Each w In sentCol For Each ws In s If UCase(ws) = UCase(w) Then m = m + 1 Exit For End If Next ws Next w CalculateMatch = m / sentCol.Count End Function Sub CreateSearchResultsHeader(ws As Worksheet) ws.Range("A1").Value = "Questions" ws.Range("B1").Value = "Answer" ws.Range("C1").Value = "Category" ws.Range("D1").Value = "Match" End Sub Sub AddSearchResultsRow(ws As Worksheet, rowNum As Long, question As String, answer As String, sa As String, match As String) ws.Range("A" & rowNum).Value = question ws.Range("B" & rowNum).Value = answer ws.Range("C" & rowNum).Value = sa ws.Range("D" & rowNum).Value = match End Sub Function GetSearchWorksheet() Dim ws As Worksheet For Each ws In ActiveWorkbook.Sheets If ws.Name = "SearchResults" Then ws.UsedRange.Clear CreateSearchResultsHeader ws Set GetSearchWorksheet = ws Exit Function End If Next ws Set ws = ActiveWorkbook.Sheets.Add ws.Name = "SearchResults" CreateSearchResultsHeader ws Set GetSearchWorksheet = ws End Function
The above code will do the following – compare the query to each question from the selected categories, calculate the match and add it to a temporary worksheet. Once done the table in the form will be connected to the range in the temporary worksheet and displayed.
Below a few other pieces of code that help display the Q/A in the text boxes and help us clean-up:
Private Sub lbResult_Click() 'Display question and answer in textbox below For i = 0 To lbResult.ListCount - 1 If lbResult.Selected(i) Then tbQ.Value = lbResult.List(i, 0) tbA.Value = lbResult.List(i, 1) End If Next i End Sub Private Sub UserForm_Terminate() On Error Resume Next Application.DisplayAlerts = False ActiveWorkbook.Sheets("SearchResults").Delete Application.DisplayAlerts = True End Sub
Download the entire VBA Code Module
If you want to download the entire VBA Code Module for the Excel VBA Fuzzy Match UserForm click the download button below:
Download
Hey Tom – this is really great! I am a vascular surgeon trying to develop semantic matching methods for medical concepts. I look forward to trying to implement this as part of the solution.
Hi Tom, your code is fascinating! I’ve been working on a project to fuzzy match my product list to competitors. Any chance you can share your sample excel file to me?