Regular expressions are ubiquitous in the developer world. They are used to validate website input, search for word patterns in large texts, and much more. In Excel, Regular Expressions (VBA RegEx, or simply VBA Regex) are not widely advertised. Most users get by with the simple LEFT, RIGHT, MID and FIND functions for their string manipulation. These functions are, however, quite primitive and limited. Knowing how to use Regular Expressions (Regex) in Excel will save you a lot of time. This Excel Regex Tutorial covers using Regex both in VBA and in worksheet formulas. Keep in mind that VBA also has the Like operator, which can sometimes handle tasks otherwise reserved for Regular Expressions.
Excel Regex VBA example
A Regex (Regular Expression) is basically a pattern that is matched against strings. Let’s demonstrate this with a simple Regex example. Assume we have the text “Hello 123 World!” and want to capture just the numbers. Without knowing in advance what the text looks like, it would be hard to extract these numbers using either Excel functions or plain VBA string functions. Below is a simple example that checks whether the pattern exists in the string.

    'RegExp object initiation without referencing library
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "[0-9]+" 'Pattern - look for sequence of numbers

    'String to check for pattern
    Dim str As String
    str = "Hello 123 World!"

    'Check if pattern exists
    Debug.Print regex.Test(str)

This is the output. It is True because the pattern was found in the string:
True
What does the [0-9]+ pattern represent? It translates to the following: match any character in the range given between the brackets ([ ]), here the digits 0-9, repeated one or more times (+). As you can see, a Regex uses a compact notation to describe your pattern.
Regular Expression Language
The Regular Expression language (Regex) is quite elaborate, but it allows you to match virtually any regular language. Below is a quick reference:
Matching characters
Syntax | Description | Example | Example match |
---|---|---|---|
. | Any character except a line feed (\n) | d.g | “dog” in “My dog is named dingo” |
[characters] | Matches any one of the characters provided between brackets [ ] | [af] | “a”, “f” and “a” in “alfa” |
[^characters] | Matches any character that is not one of those provided between brackets [ ] | [^af] | “l” in “alfa” |
[start-end] | Matches any character in the range specified between brackets [ ] | [0-9] | “1” and “2” in “12” |
\w | Any word character (letters, digits and the underscore) | \w | “I”, “_”, “a”, “m”, “J”, “o”, “h”, “n” in “I_am.John” |
\W | Any non-word character | \W | “.” in “I_am.John” |
\s | Any white space character | \s | “ ” in “Hi There!” |
\S | Any non-white space character | \S | “M” and “e” in “M e” |
\d | Any decimal digit | \d | “1” and “2” in “12” |
\D | Any character that is not a decimal digit | \D | “d”, “_”, “.” in “d_.” |
\ | Followed by a special character – escapes it so that it is matched literally | \. | “.” in “d.g” |
\t | Tab (vbTab) | \t | |
\r | Carriage return (vbCr) | \r | |
\n | Line feed (vbLf); vbNewLine on Windows is the two-character sequence \r\n | \n | |
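
To see these character classes in action, here is a minimal sketch that counts how many matches each pattern produces. CountMatches is a hypothetical helper written for this illustration, not part of the RegExp object.

```vba
'Hypothetical helper: counts the non-overlapping matches of pat in txt
Function CountMatches(ByVal pat As String, ByVal txt As String) As Long
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = pat
    regex.Global = True 'Count every match, not just the first
    CountMatches = regex.Execute(txt).Count
End Function

Sub CharacterClassDemo()
    Debug.Print CountMatches("\w", "I_am.John")  '8: the underscore is a word character
    Debug.Print CountMatches("\W", "I_am.John")  '1: only the "."
    Debug.Print CountMatches("\d", "12")         '2
    Debug.Print CountMatches("[^af]", "alfa")    '1: only the "l"
    Debug.Print CountMatches("\.", "d.g")        '1: the escaped dot matches a literal "."
End Sub
```

Run CharacterClassDemo from the VBA editor and read the counts in the Immediate window (Ctrl+G).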
Quantifiers
Quantifiers allow you to specify the number of times a certain pattern is supposed to be matched against a string. It is important to understand the difference between GREEDY and non-GREEDY quantifiers:
Syntax | Description | Example | Example match |
---|---|---|---|
* | Zero or more of (GREEDY). Matches as many as possible | \W.*\W | “-dogs-cats-” in “-dogs-cats-” |
+ | One or more of (GREEDY). Matches as many as possible | \W.+\W | “-dogs-cats-” in “-dogs-cats-” |
? | Zero or once (GREEDY). Matches as many as possible | \d? | “1” in “Live1” (plus empty matches at the non-digit positions) |
{n} | Exactly “n” times | \d{2} | “21” in “212” (the trailing “2” is left over) |
{n,} | At least “n” times (GREEDY) | \d{2,} | “12” and “123” in “1_12_123” |
{n,m} | Between “n” and “m” times (GREEDY) | \d{3,4} | “123” and “1234” in “1_12_123_1234” |
*? | Zero or more of (non-GREEDY). Matches as few as possible | \W.*?\W | “-dogs-” in “-dogs-cats-” (the closing “-” is consumed, so “cats-” has no opening delimiter left) |
+? | One or more of (non-GREEDY). Matches as few as possible | \W.+?\W | “-dogs-” in “-dogs-cats-” |
?? | Zero or once (non-GREEDY). Matches as few as possible | \d?? | Only empty matches in “Live1”; the “1” is consumed only if the rest of the pattern requires it |
{n,}? | At least “n” times (non-GREEDY). Matches as few as possible | \d{2,}? | “12” and “12” in “1_12_123” |
{n,m}? | Between “n” and “m” times (non-GREEDY). Matches as few as possible | \d{3,4}? | “123” and “123” in “1_12_123_1234” |
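
The difference between GREEDY and non-GREEDY quantifiers is easiest to see side by side. The sketch below runs both variants against the same text; FirstMatch is a hypothetical helper and the sample strings are made up for this illustration.

```vba
'Hypothetical helper: returns the first match of pat in txt ("" if none)
Function FirstMatch(ByVal pat As String, ByVal txt As String) As String
    Dim regex As Object, matches As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = pat
    Set matches = regex.Execute(txt)
    If matches.Count > 0 Then FirstMatch = matches(0).Value
End Function

Sub GreedyVersusLazyDemo()
    Const SAMPLE As String = "<b>one</b><b>two</b>"
    'Greedy: .* runs to the LAST closing tag
    Debug.Print FirstMatch("<b>.*</b>", SAMPLE)   '<b>one</b><b>two</b>
    'Non-greedy: .*? stops at the FIRST closing tag
    Debug.Print FirstMatch("<b>.*?</b>", SAMPLE)  '<b>one</b>
    'The same applies to counted quantifiers
    Debug.Print FirstMatch("\d{2,}", "12345")     '12345
    Debug.Print FirstMatch("\d{2,}?", "12345")    '12
End Sub
```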
Grouping
Below are the basic grouping expressions:
Syntax | Description | Example | Example match |
---|---|---|---|
(expression) | Group and capture the expression within the parentheses ( ) | ([0-9]+) | Captures “123”, “456” and “789” within “123-456-789” |
(?:expression) | Group BUT DON’T CAPTURE the expression within the parentheses ( ) | (?:[0-9]*)([a-z]*)(?:[0-9]*) | Captures only “hello” in “123hello456” |
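
As a rough illustration of capturing versus non-capturing groups, the sketch below runs two patterns against the same text and inspects the SubMatches collection of the first match. The patterns and the sample string are chosen for this example only.

```vba
Sub GroupingDemo()
    Dim regex As Object, matches As Object
    Set regex = CreateObject("VBScript.RegExp")

    'Three capturing groups: each one becomes a submatch
    regex.Pattern = "([0-9]+)([a-z]+)([0-9]+)"
    Set matches = regex.Execute("123hello456")
    Debug.Print matches(0).SubMatches.Count   '3
    Debug.Print matches(0).SubMatches(1)      'hello

    'The digit groups still take part in matching but are no longer captured
    regex.Pattern = "(?:[0-9]+)([a-z]+)(?:[0-9]+)"
    Set matches = regex.Execute("123hello456")
    Debug.Print matches(0).Value              '123hello456
    Debug.Print matches(0).SubMatches.Count   '1
    Debug.Print matches(0).SubMatches(0)      'hello
End Sub
```

Note that the overall match (Value) is the same in both cases; only the list of submatches changes.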
Using Regex in VBA
To use Regex in VBA you need the RegExp object, which is defined in the Microsoft VBScript Regular Expressions library. To start using this object, add the following reference to your VBA Project: Tools->References->Microsoft VBScript Regular Expressions 5.5. Alternatively, if you don’t want to add this reference, you can create the RegExp object using the CreateObject function.
Option 1: Referencing the library Microsoft VBScript Regular Expressions

    Dim regex As RegExp
    Set regex = New RegExp

Option 2: Using the CreateObject function

    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")

I personally prefer using the CreateObject function as it does not require referencing the library every time the Workbook is opened on a new workstation.
The RegExp object has the following properties:
- Pattern – The pattern (written in Regex) which you want to match against (e.g. “(.*)”)
- IgnoreCase – Ignore letter case (capital/non-capital letters). Defaults to False
- Global – Whether to find all matches in the input string. If False (the default), only the first match is found. Leave it False for performance if you need just the first match
- MultiLine – If True, ^ and $ also match at the start and end of each line, rather than only at the start and end of the whole string. Defaults to False
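A short sketch of how IgnoreCase and MultiLine change the outcome of the same pattern. The sample strings are made up for this illustration, and the lines are joined with vbLf to keep the example simple.

```vba
Sub PropertiesDemo()
    Dim regex As Object, txt As String
    Set regex = CreateObject("VBScript.RegExp")

    'IgnoreCase: matching is case-sensitive unless you switch it on
    regex.Pattern = "hello"
    Debug.Print regex.Test("HELLO world")    'False
    regex.IgnoreCase = True
    Debug.Print regex.Test("HELLO world")    'True

    'MultiLine: controls whether ^ and $ also match at line boundaries
    txt = "abc" & vbLf & "123" & vbLf & "456"
    regex.Pattern = "^[0-9]+$"
    regex.Global = True
    regex.MultiLine = False
    Debug.Print regex.Execute(txt).Count     '0: the string as a whole is not all digits
    regex.MultiLine = True
    Debug.Print regex.Execute(txt).Count     '2: the lines "123" and "456"
End Sub
```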
The RegExp object facilitates the following 3 operations:
- Test (string) – returns True if the pattern can be matched against the provided string
- Replace (search-string, replace-string) – replaces occurrences of the pattern in search-string with replace-string
- Execute (search-string) – returns a collection of the matches of the pattern against the search-string (only the first match unless Global = True)
Regex: Test pattern against a string
The Test function allows you to test whether the selected Pattern provides any match against the string.

    Dim regex As Object, str As String
    Set regex = CreateObject("VBScript.RegExp")

    With regex
        .Pattern = "[0-9]+"
    End With

    str = "Hello 123 World!"
    Debug.Print regex.Test(str) 'Result: True

    str = "Hello World!"
    Debug.Print regex.Test(str) 'Result: False

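Test is also a natural fit for input validation. Without anchors the pattern may match anywhere in the string, so a validation pattern usually needs ^ and $ around it. A minimal sketch, where IsFiveDigitCode is a hypothetical function invented for this example:

```vba
'Hypothetical validator: True only if txt consists of exactly five digits
Function IsFiveDigitCode(ByVal txt As String) As Boolean
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "^[0-9]{5}$" 'Anchored: the whole string must match
    IsFiveDigitCode = regex.Test(txt)
End Function

Sub ValidationDemo()
    Debug.Print IsFiveDigitCode("12345")    'True
    Debug.Print IsFiveDigitCode("123456")   'False: one digit too many
    Debug.Print IsFiveDigitCode("12a45")    'False: contains a letter
End Sub
```

Without the anchors, "[0-9]{5}" would also accept "123456", because five of its six digits satisfy the pattern.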
Regex: Replace pattern in a string
The Replace function will replace the first (if Global = False) or all matching patterns (if Global = True) within a certain string with another string of your choosing.

    Dim regex As Object, str As String
    Set regex = CreateObject("VBScript.RegExp")

    With regex
        .Pattern = "123-[0-9]+-123"
        .Global = True 'If False, would replace only first
    End With

    str = "321-123-000-123-643-123-888-123"
    Debug.Print regex.Replace(str, "<Replace>")
    'Result: 321-<Replace>-643-<Replace>

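The replacement string can also refer back to captured groups using $1, $2 and so on. The sketch below reorders a date this way; the date format and the sample text are assumptions made for this illustration.

```vba
Sub ReplaceWithBackreferenceDemo()
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")

    With regex
        .Pattern = "([0-9]{4})-([0-9]{2})-([0-9]{2})" 'year, month, day
        .Global = True
    End With

    '$1, $2 and $3 stand for the first, second and third captured group
    Debug.Print regex.Replace("Due 2024-03-15 and 2025-01-02", "$3/$2/$1")
    'Result: Due 15/03/2024 and 02/01/2025
End Sub
```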
Regex: Match pattern in a string
The Execute function will match the first (if Global = False) or all instances (if Global = True) of a certain pattern within a certain string. You can also “capture” parts of the pattern as so-called “SubMatches”.

    Dim regex As Object, str As String
    Dim matches As Object, match As Object
    Set regex = CreateObject("VBScript.RegExp")

    With regex
        .Pattern = "123-[0-9]+"
        .Global = True
    End With

    str = "321-123-000-123-643-123-888-123"

    Set matches = regex.Execute(str)

    For Each match In matches
        Debug.Print match.Value 'Result: 123-000, 123-643, 123-888
    Next match

As you can see, we have managed to capture 3 instances of the 123-[0-9]+ pattern in the string. We can also define a “capture” within our pattern to extract just part of each match by enclosing that part in parentheses “()”. See the example below:

    Dim regex As Object, str As String
    Dim matches As Object, match As Object, subMatch As Variant
    Set regex = CreateObject("VBScript.RegExp")

    With regex
        .Pattern = "123-([0-9]+)" 'Notice the () around the second sequence
        .Global = True
    End With

    str = "321-123-000-123-643-123-888-123"

    Set matches = regex.Execute(str)

    For Each match In matches
        Debug.Print match.Value 'Result: 123-000, 123-643, 123-888
        If match.SubMatches.Count > 0 Then
            For Each subMatch In match.SubMatches
                Debug.Print subMatch 'Result: 000, 643, 888
            Next subMatch
        End If
    Next match

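Besides Value and SubMatches, each match also exposes FirstIndex (a zero-based position) and Length, which is handy when you need to know where in the string a match was found. A small sketch with a made-up sample string:

```vba
Sub MatchPositionDemo()
    Dim regex As Object, matches As Object, m As Object
    Dim str As String
    Set regex = CreateObject("VBScript.RegExp")

    regex.Pattern = "[0-9]+"
    regex.Global = True
    str = "Hello 123 World 45"

    Set matches = regex.Execute(str)
    For Each m In matches
        'FirstIndex is zero-based, while Mid is one-based, hence the + 1
        Debug.Print m.Value, m.FirstIndex, m.Length, Mid(str, m.FirstIndex + 1, m.Length)
    Next m
    'First match:  123 at FirstIndex 6, Length 3
    'Second match: 45 at FirstIndex 16, Length 2
End Sub
```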
Regex: Using Regex as an Excel Formula
Most Excel versions do not natively provide any Regex functions, which often forces you to build complex formulas to extract pieces of strings that would be easy to extract with Regular Expressions. To use Regex in Excel formulas you therefore need User Defined Functions (UDFs): functions defined in VBA but accessible as regular worksheet functions. Below are 2 basic UDFs created just for this purpose:

    'Returns the number of matches found for a given regex
    'str - string to test the regex on
    'reg - the regular expression
    'Declared As Variant so that it can return either a count or an Excel error
    Public Function RegexCountMatches(str As String, reg As String) As Variant
        Dim regex As Object
        On Error GoTo ErrHandl
        Set regex = CreateObject("VBScript.RegExp")
        regex.Pattern = reg
        regex.Global = True
        RegexCountMatches = regex.Execute(str).Count 'Returns 0 when nothing matches
        Exit Function
    ErrHandl:
        RegexCountMatches = CVErr(xlErrValue)
    End Function

    'Executes a Regular Expression on a provided string and returns a selected submatch
    'str - string to execute the regex on
    'reg - the regular expression with at least 1 capture '()'
    'matchIndex - the index of the match you want to return (default: 0)
    'subMatchIndex - the index of the submatch you want to return (default: 0)
    Public Function RegexExecute(str As String, reg As String, _
                                 Optional matchIndex As Long, _
                                 Optional subMatchIndex As Long) As Variant
        Dim regex As Object, matches As Object
        On Error GoTo ErrHandl
        Set regex = CreateObject("VBScript.RegExp")
        regex.Pattern = reg
        regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
        If regex.Test(str) Then
            Set matches = regex.Execute(str)
            RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
            Exit Function
        End If
    ErrHandl:
        'No match, invalid index or invalid pattern: return #VALUE!
        RegexExecute = CVErr(xlErrValue)
    End Function

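In the same spirit, a replace UDF could be sketched as follows. RegexReplaceAll is a hypothetical name chosen for this illustration (it is not one of the functions above), and it assumes you always want every match replaced.

```vba
'Hypothetical companion UDF: replaces every match of reg in str with replacement
'str - string to run the replacement on
'reg - the regular expression
'replacement - the replacement text ($1, $2... refer to captured groups)
Public Function RegexReplaceAll(str As String, reg As String, _
                                replacement As String) As Variant
    Dim regex As Object
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = reg
    regex.Global = True 'Replace all matches, not just the first
    RegexReplaceAll = regex.Replace(str, replacement)
    Exit Function
ErrHandl:
    'Invalid pattern or other failure: return #VALUE!
    RegexReplaceAll = CVErr(xlErrValue)
End Function
```

Placed in a standard module, it can be called from a cell, for instance =RegexReplaceAll(A1, "[0-9]+", "#"), which would turn "Hello 123 World 45" into "Hello # World #".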
ReFiddle – Online testing your Regex!
Want to quickly test a Regular Expression (Regex)? Use ReFiddle. It is a great tool for quickly checking that a Regex works and for sharing your Regex with others!
Keep in mind, however, that the VBA Regular Expression language (supported by the RegExp object) does not support every Regular Expression that is valid in ReFiddle.
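One commonly cited gap is lookbehind: the VBScript RegExp object handles lookahead, (?=...), but not lookbehind, (?<=...). The sketch below probes both and simply reports whatever error the object raises; the sample strings are made up for this illustration.

```vba
Sub UnsupportedSyntaxDemo()
    Dim regex As Object, matches As Object, found As Boolean
    Set regex = CreateObject("VBScript.RegExp")

    'Lookahead is supported: digits that are followed by " USD"
    regex.Pattern = "[0-9]+(?= USD)"
    Set matches = regex.Execute("Price: 100 USD")
    Debug.Print matches(0).Value   '100

    'Lookbehind is not supported, so using the pattern raises a run-time error
    On Error Resume Next
    regex.Pattern = "(?<=USD )[0-9]+"
    found = regex.Test("USD 100")
    If Err.Number <> 0 Then
        Debug.Print "Pattern rejected: " & Err.Description
    End If
    On Error GoTo 0
End Sub
```

A pattern that relies on lookbehind can often be rewritten with a capturing group instead, e.g. "USD ([0-9]+)" and then reading SubMatches(0).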
Learn Regex (Regular Expression) the Fun way
Want to learn to build Regex (Regular Expressions) and have some fun at the same time?
Try Regex Golf.