Excel VBA Fuzzy Match

Excel VBA Fuzzy Match text against a table

Fuzzy text matching is very useful when you want to compare a text string against other strings that don’t have to be identical. You still, however, want to find the one that is closest in terms of words. Fuzzy matching is very useful when e.g. you want to compare a user question against a database of solutions/answers. That is basically what Google does everyday. Rarely will your question ideally match the title of a blog or news article so Google tries to rank pages using fuzzy matching to find ones that are closest to your query.

Microsoft Fuzzy Lookup AddIn

As Fuzzy matching / lookup is a frequent feature Excel users required Microsoft decide to create their own Fuzzy Lookup AddIn. The AddIn basically allows you to lookup columns from another table using Fuzzy Matching and copy them to your source table.

Go here to find Microsoft Official Fuzzy Lookup AddIn for Excel

When playing with the Fuzzy Lookup AddIn I wasn’t happy with it. I had an enormous database of user Questions and Answers and I wanted to have a choice of which items it will match for each of my records basis the user query. Often the fuzzy match algorithm would suggest a match that wasn’t perfect given context. Hence I decide to create my own Fuzzy Match VBA UserForm….

Custom Fuzzy Match using VBA UserForm

What I wanted was an easy way to lookup my query against a Knowledge Base of questions that are ranked in terms of their match against the query. The closer the query is to the question the higher the match. That way I can decide whether indeed I want to take the answer from the question with the highest match rate or maybe one of the below. The additional requirement was to ignore so called “stop words” in my query (the, a, it etc.) as these could generate a lot of false matches, while I wanted fuzzy matching only focus on the keywords.
Example Fuzzy Match
To show you how the Fuzzy Match VBA UserForm works I created a simple Knowledge Base of Excel/VBA related questions. The KB consists of 3 basis columns: Questions, Answer and Category. The Category column is especially useful as looking at the query you might want to limit the Fuzzy algorithm to run on only a subset of items in your database.

Example Knowledge Base of Questions and Answers
An example Knowledge Base

Designing the Fuzzy Match UserForm

Below a quick overview of the Fuzzy Match VBA UserForm if you want to see it in action:

Below you can find the design of the VBA Fuzzy Match UserForm.

Fuzzy Match UserForm
The design of the VBA Fuzzy Match UserForm

Each field is explained below:

  • Text Selected – text from the Excel cell you selected in your workbook before running the VBA macro
  • Search Question – if you want to override the “Text Selected” field simply type your query here and hit the Search button again. A typical scenario for this field is when you see that the result for your “Text Selected” don’t return satisfactory results and you would like to adjust the query
  • Category(s) – if you want to improve the algorithm performance or simply reduce the categories to be searched then select the ones you are interested in
  • Search – the Search button will run the algorithm and return results to the “Results” table
  • Selected Question / Selected Answer – when you click on one of the results these fields will show the full text of the Question and Answer column. This makes it easier to copy the results where needed

The VBA Code

Initializing the UserForm

Firstly we will initialize our UserForm and assure it is showing us the current list of categories in our “KnowledgeBase” worksheet:

Private Sub UserForm_Initialize()
    Set catDict = GetListofCategories()
    For Each it In catDict.keys
        lbCategory.AddItem it
    Next it
    lbCategory.AddItem "Any"
    For i = 0 To lbCategory.ListCount - 1
        lbCategory.Selected(i) = True
    Next i
    tbSelectedQuestion.Text = Selection.Value
End Sub

Function GetListofCategories()
    Dim question As Range
    Set dict = CreateObject("Scripting.Dictionary")
    Set wsQnA = GetQnAWorksheet
    For Each question In wsQnA.Range("C:C").SpecialCells(xlCellTypeConstants)
        If question.Row > 1 Then
            If Not dict.Exists(question.Value) Then
                dict.Add question.Value, 1
            End If
        End If
    Next question
    Set GetListofCategories = dict
End Function

Function GetQnAWorksheet() As Worksheet
    Dim ws As Worksheet
    For Each ws In ActiveWorkbook.Sheets
        If ws.Name Like "KnowledgeBase*" And ws.Visible Then
            Set GetQnAWorksheet = ws
            Exit Function
        End If
    Next ws
    MsgBox "KnowledgeBase worksheet not found!~", vbCritical + vbOKOnly, "Error"
    Set GetQnAWorksheet = Nothing
End Function

Function for removing Stop Words

Before we focus on generating matches let us focus for a sec on creating a function that will remove stop words from a given sentence. This will help us get rid of all those unnecessary words like “the, it, this” etc. that will dilute the fuzzy algorithm.

Function RemoveStopWords(sentence As String) As Collection
    If IsEmpty(stopWords) Then stopWords = Split("a;about;above;after;again;against;all;am;an;and;any;are;aren't;as;at;be;because;been;before;being;below;between;both;but;by;can't;cannot;chat;could;couldn't;did;didn't;do;does;doesn't;doing;don't;down;during;each;few;for;from;further;had;hadn't;has;hasn't;have;haven't;having;he;he'd;he'll;he's;her;here;here's;hers;herself;hi;him;himself;his;how;how's;i;i'd;i'll;i'm;i've;if;in;into;is;isn't;it;it's;its;itself;let's;me;more;most;mustn't;my;myself;need;needs;no;nor;not;of;off;on;once;only;or;other;ought;our;ours;out;over;own;same;shan't;she;she'd;she'll;she's;should;shouldn't;so;some;such;than;that;that's;the;their;theirs;them;themselves;then;there;there's;these;they;they'd;they'll;they're;they've;this;those;through;to;too;under;until;up;very;was;wasn't;we;we'd;we'll;we're;we've;were;weren't;what;what's;when;when's;where;where's;which;while;who;who's;whom;why;why's;with;won't;would;wouldn't;you;you'd;you'll;you're;you've;your;yours;yourself;yourselves;?;!;" & _
             "-;,;", ";")
    Dim stopR As Range, r As Variant, col As Collection
    Set col = New Collection
    For Each w In Split(sentence, " ")
        If Not (IsNumeric(Trim(w))) Then
            w = Trim(w)
            If Len(w) > 0 Then col.Add w
        End If
    Next w
    For Each r In stopWords
        For i = col.Count To 1 Step -1
            If UCase(col(i)) = UCase(r) Then
                col.Remove i
            End If
        Next i
    Next r
    Set RemoveStopWords = col
End Function

You probably noticed I embedded a lot of words directly in the function. This makes it easy to add or remove stop words quickly.

See also  Excel MEDIAN IF - Median with criteria in Excel

Fuzzy Matching algorithm

Now for the meaty part :). Below the key logic for generating the results:

Public stopWords As Variant
Dim selectedChat As Range
Private Sub cbSearch_Click()
    'Clear Search worksheet
    Set selectedChat = Selection
    Dim wsRes As Worksheet: Set wsRes = GetSearchWorksheet
    selectedChat.Worksheet.Activate
    'Remove stop words from question
    Dim qCol As Collection
    Set qCol = RemoveStopWords(Selection.Value)
    'Search
    Dim wsQnA As Worksheet, r As Range, saCol As Collection, sa() As String, saIndex As Long
    Dim dict As Object, dstR As Range, startCount As Long, pMax As Long, currProgress As Long
    Set dict = CreateObject("Scripting.Dictionary")
    For i = 0 To lbCategory.ListCount - 1
        If lbCategory.Selected(i) Then
            dict.Add lbCategory.List(i), lbCategory.List(i)
        End If
    Next i
    'Search question matching Service Area, search query and calculate match
    Set wsQnA = GetQnAWorksheet
    If wsQnA Is Nothing Then Exit Sub
    startCount = wsRes.Range("A:A").SpecialCells(xlCellTypeConstants).Count
    pMax = wsQnA.Range("A:A").SpecialCells(xlCellTypeConstants).Count
    For Each r In wsQnA.Range("A:A").SpecialCells(xlCellTypeConstants)
        If dict.Exists(r.Offset(0, 4).Value) Or IsEmpty(r.Offset(0, 4).Value) And r.Row > 1 Then
            If tbSearch.Value = vbNullString Or InStr(1, r.Value, tbSearch.Value, vbTextCompare) > 0 Then
                Set dstR = wsRes.Range("A1").Offset(startCount):  startCount = startCount + 1
                dstR.Value = r.Value
                dstR.Offset(0, 1).Value = r.Offset(0, 1).Value
                dstR.Offset(0, 2).Value = r.Offset(0, 2).Value
                dstR.Offset(0, 3).Value = CalculateMatch(qCol, r.Value)
                dstR.Offset(0, 3).NumberFormat = "0%"
            End If
        End If
        currProgress = currProgress + 1
        If currProgress Mod 100 = 0 Then
            lStatus.Caption = "Searching " & Format(CDbl(currProgress) / pMax, "0%")
            DoEvents
        End If
    Next r
    'Display Search sorted by Match
    If wsRes.UsedRange.Rows.Count > 0 Then
        With wsRes.Sort
            .SortFields.Clear
            .SortFields.Add2 Key:=GetSearchLastColumn(wsRes), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
            .SetRange wsRes.UsedRange
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With
    End If
    
    lbResult.RowSource = GetSearchRangeAddress(wsRes)
    lStatus.Caption = "" & lbResult.ListCount & " Questions found"
End Sub
Function GetSearchRangeAddress(ws As Worksheet)
    GetSearchRangeAddress = "'" & ws.Name & "'!" & Range(ws.Range("A2"), ws.Cells(ws.UsedRange.Rows.Count, ws.UsedRange.Columns.Count)).AddressLocal
End Function
Function GetSearchLastColumn(ws As Worksheet) As Range
    Set GetSearchLastColumn = Range(ws.Range("D2"), ws.Cells(ws.UsedRange.Rows.Count, ws.UsedRange.Columns.Count))
End Function

Function CalculateMatch(sentCol As Collection, sentence As String) As Double
    Dim m As Long, s() As String
    s = Split(sentence, " ")
    For Each w In sentCol
        For Each ws In s
            If UCase(ws) = UCase(w) Then
                m = m + 1
                Exit For
            End If
        Next ws
    Next w
    CalculateMatch = m / sentCol.Count
End Function
Sub CreateSearchResultsHeader(ws As Worksheet)
    ws.Range("A1").Value = "Questions"
    ws.Range("B1").Value = "Answer"
    ws.Range("C1").Value = "Category"
    ws.Range("D1").Value = "Match"
End Sub
Sub AddSearchResultsRow(ws As Worksheet, rowNum As Long, question As String, answer As String, sa As String, match As String)
    ws.Range("A" & rowNum).Value = question
    ws.Range("B" & rowNum).Value = answer
    ws.Range("C" & rowNum).Value = sa
    ws.Range("D" & rowNum).Value = match
End Sub
Function GetSearchWorksheet()
    Dim ws As Worksheet
    For Each ws In ActiveWorkbook.Sheets
        If ws.Name = "SearchResults" Then
            ws.UsedRange.Clear
            CreateSearchResultsHeader ws
            Set GetSearchWorksheet = ws
            Exit Function
        End If
    Next ws
    Set ws = ActiveWorkbook.Sheets.Add
    ws.Name = "SearchResults"
    CreateSearchResultsHeader ws
    Set GetSearchWorksheet = ws
End Function

The above code will do the following – compare the query to each question from the selected categories, calculate the match and add it to a temporary worksheet. Once done the table in the form will be connected to the range in the temporary worksheet and displayed.

Below a few other pieces of code that help display the Q/A in the text boxes and help us clean-up:

Private Sub lbResult_Click()
    'Display question and answer in textbox below
    For i = 0 To lbResult.ListCount - 1
        If lbResult.Selected(i) Then
            tbQ.Value = lbResult.List(i, 0)
            tbA.Value = lbResult.List(i, 1)
        End If
    Next i
End Sub

Private Sub UserForm_Terminate()
    On Error Resume Next
    Application.DisplayAlerts = False
    ActiveWorkbook.Sheets("SearchResults").Delete
    Application.DisplayAlerts = True
End Sub

Download the entire VBA Code Module

If you want to download the entire VBA Code Module for the Excel VBA Fuzzy Match UserForm click the download button below:
Download

2 Comments

  1. Hey Tom – this is really great! I am a vascular surgeon trying to develop semantic matching methods for medical concepts. I look forward to trying to implement this as part of the solution.

  2. Hi Tom, your code is fascinating! I’ve been working on a project to fuzzy match my product list to competitors. Any chance you can share your sample excel file to me?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.