Tag Archives: RegExp

regex

Excel Regex Tutorial (Regular Expressions)

1 Star2 Stars3 Stars4 Stars5 Stars (11 votes, average: 4.45 out of 5)
Loading...

Regular expressions are ubiquitous in the developer world. They are used to validate website input, search for word patterns in large strings/texts and for many other uses. In Excel, Regular Expressions (RegEx or simply Regex) are not much advertised. Most users are good with using simple LEFT, RIGHT, MID and FIND functions for their string manipulation. These functions are, however, quite primitive and limited. Knowing how to use Regular Expressions (Regex) in Excel will save you a lot of time. This Excel Regex Tutorial focuses both on using Regex functions and in VBA. Let’s, however, not forget that VBA has also adopted the Like operator which sometimes allows you to achieve some tasks reserved for Regular Expressions.

excel regex level
Using the right tools

What I aim with this tutorial is to make sure you learn to use the right tools at the right time!

LVL 1: Manipulating Strings

Before jumping into Regular Expressions I encourage you to spend a moment on learning the basic string manipulation functions available both as Excel functions and in VBA. Click here to go to the VBA String Functions reference:

VBA String Functions
VBA String Functions

These functions, when used in combination, allow you to achieve most often required string manipulation tasks.

Read more on the VBA String Functions here

LVL 2: Like Operator

Before we move on to Regular Expressions let’s focus on a bit on the Like Operator which often go unmentioned as some of us are so at home with writing Regular Expressions that we don’t want to make the extra effort to learn this.

If "Animal" Like "[A-Z]*" then 
   Debug.Print "Match: String starting with Capital letter!"
End If
Read more on the VBA Like Operator here

LVL 3: Regular Expressions (Excel Regex – RegExp)

A Regex (Regular Expression) is basically a pattern matching strings within other strings. Let’s demonstrate this with a simple Regex example. Let us assume we have the text below. And we want to capture just the numbers.

Hello 134, World 3424, 04234 Some Text

Without knowing ahead how the text looks like it would be hard to extract these numbers both using Excel Functions and VBA.
But with a Regex we can extract the numbers with a simple pattern:

[0-9]+

The captured numbers:

134, 3424, 04234

What does the [0-9]+ pattern represent? It translates to the following: capture any pattern matching the following range of characters ([ ]), being numbers from 0-9, in a sequence of at least 1 or more (+). As you can see a Regex uses a certain code to translate your pattern.

Regular Expression Language

The Regular Expression language (Regex) is quite elaborate but allows you to match virtually any regular language. Below a quick reference:

Matching characters

Syntax Description Example Example match
. Any character except newline (vbNewLine) d.g “dog” in “My dog is named dingo”
[characters] Matches any provided character between brackets [ ] [af] “a” , “f” in “alfa”
[^characters] Matches any character not being one of the provided between brackets [ ] [af] “a” , “f” in “alfa”
[startend] Matches any character belonging to the character range specified between brackets [ ] [0-9] “1” and “2” in “12”
w Any word character (letters, modifiers, digits, punctuation and connectors) w “I”, “a” “m” “J” “o” “h” “n” in “I_am.John”
W Any non-word character w “_” and “.” in “I_am.John”
s Any white space character s ” ” in “Hi There!”
S Any non-white space character S “M” and “e” in “M e”
d Any decimal digit d “1” and “2” in “12”
D Any non-decimal digit D “d”, “_”, “.” in “d_.”
Followed by any special character – escapes special characters . “.” im “d.g”
r Tab (vbTab) r
n Carriage return / new line (vbNewLine)

Quantifiers

Quantifiers allow you to specify the amount of times a certain pattern is supposed to matched against a string. It is important to understand the difference between GREEDY and non-GREEDY quantifiers:

Greedy vs Non-Greedy Regular Expression
Greedy vs Non-Greedy Regular Expression

Syntax Description Example Example match
* Zero or more of (GREEDY). Matches as many as possible W.*W “_dogs_cats_” in “_dogs_cats_”
+ One or more of (GREEDY). Matches as many as possible Ww+W “_dogs_cats_” in “_dogs_cats_”
? Zero or once (GREEDY). Matches as many as possible d? “1” in “Live1”
{n} “n” many times d{2} “21” and “12” in “212”
{n,} At least “n” times (GREEDY) d{2,} “12” and “123” in “1_12_123”
{n,m} Between “n” and “m” times (GREEDY) d{3,4} “123” and “1234” in “1_12_123_1234”
*? Zero or more of (non-GREEDY). Matches as few as possible W.*?W “_dogs_” and “_cats_” in “_dogs_cats_”
+? One or more of (non-GREEDY). Matches as few as possible W.+?W “_dogs_” and “_cats_” in “_dogs_cats_”
?? Zero or once (non-GREEDY). Matches as few as possible d?? “1” in “Live1”
{n,}? At least “n” times (non-GREEDY). Matches as few as possible d{2,} “12” and “123” in “1_12_123”
{n,m}? Between “n” and “m” times (non-GREEDY). Matches as few as possible d{3,4} “123” and “1234” in “1_12_123_1234”

Grouping

Below the basic grouping expressions:

Syntax Description Example Example match
(expression) Group and capture the expression within the parenthesis ( ) ([0-9]*) Captures “123, “345” and “789” within “123-456-789”
(?:expression) Group BUT DON’T CAPTURE the expression within the parenthesis ( ) (?:[0-9]*)([A-Z]*)(?:[0-9]*) Captures only “hello” in “123hello456”

Using Regex in VBA

To use Regex in VBA you need to use the RegExp object which is defined in the Microsoft VBScript Regular Expressions library. To start using this object add the following reference to your VBA Project: Tools->References->Microsoft VBScript Regular Expressions. Otherwise, if you don’t want to reference this library every time you can also create the RegExp object using the CreateObject function.

Option 1: Referencing the library Microsoft VBScript Regular Expressions

Dim regex as RegExp
Set regex = New RegExp

Option 2: Using the CreateObject function

Dim regex as Object
Set regex = CreateObject("VBScript.RegExp")

I personally prefer using the CreateObject function as it does not require referencing the library every time the Workbook is opened on a new workstation.

The RegExp object has the following properties:

  • Pattern – The pattern (written in Regex) which you want to match against (e.g. “(.*)”)
  • IgnoreCase – Ignore letter case (captial/non-capital letters)
  • Global – Do you want to find all possible matches in the input string? If false, only match the first found pattern. Set false if you need just the first match for performance
  • MultiLine – Do you want to match the pattern across line breaks?

The RegExp object facilitates the following 3 operations:

  • Test (string) – returns True if the pattern can be matched agaist the provided string
  • Replace (search-string, replace-string) – replaces occurrences of the pattern in search-string with replace-string
  • Execute (search-string) – returns all matches of the pattern against the search-string

Regex: Test pattern against a string

The Test function allows you to test whether the selected Pattern provides any match against the string.

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "[0-9]+"
End With
    
str = "Hello 123 World!"
Debug.Print regex.Test(str) 'Result: True

str = "Hello World!"
Debug.Print regex.Test(str) 'Result: False    

Regex: Replace pattern in a string

The Replace function will replace the first (if Global = False) or all matching patterns (if Global = True) within a certain string with another string of your choosing.

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "123-[0-9]+-123"
  .Global = True 'If False, would replace only first
End With
    
str = "321-123-000-123-643-123-888-123"
Debug.Print regex.Replace(str, "<Replace>") 
'Result: 321-<Replace>-643-<Replace>

Regex: Match pattern in a string

The Execute function will match the first or all instances of a certain pattern within a certain string. You can also “capture” parts of the patterns as so called “Submatches”.

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "123-[0-9]+"
  .Global = True
End With
    
str = "321-123-000-123-643-123-888-123"

Set matches = regex.Execute(str)
    
For Each match In matches
  Debug.Print match.Value 'Result: 123-000, 123-643, 123-888
Next match

As you can see we have managed to capture 3 instances of the 123-[0-9]+ pattern in the string. We can also define a “capture” within our pattern to capture parts of the pattern by embracing them with brackets “()”. See the example below:

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "123-([0-9]+)" 'Notice the () around the second sequence
  .Global = True
End With
    
str = "321-123-000-123-643-123-888-123"

Set matches = regex.Execute(str)
  
For Each match In matches
  Debug.Print match.Value 'Result: 123-000, 123-643, 123-888
  If match.SubMatches.Count > 0 Then
    For Each subMatch In Match.SubMatches
      Debug.Print subMatch 'Result: 000, 643, 888
      Next subMatch
  End If
Next match

Regex: Using Regex as an Excel Formula

Excel does not natively provide any Regex functions which often requires creating complex formulas for extracting pieces of strings otherwise easy to extract using Regular Expressions. Hence, to facilitate Regex in Excel you need to use User Defined Functions – functions defined in VBA but accessible as regular functions in Excel. Below find 2 basic UDF functions created just for this use:

'Returns the number of matches found for a given regex
'str - string to test the regex on
'reg - the regular expression
Public Function RegexCountMatches(str As String, reg As String) As String
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp"): regex.Pattern = reg: regex.Global = True
    If regex.test(str) Then
        Set matches = regex.Execute(str)
        RegexCountMatches = matches.Count
        Exit Function
    End If
ErrHandl:
    RegexCountMatches = CVErr(xlErrValue)
End Function

'Executes a Regular Expression on a provided string and returns a selected submatch
'str - string to execute the regex on
'reg - the regular expression with at least 1 capture '()'
'matchIndex - the index of the match you want to return (default: 0)
'subMatchIndex - the index of the submatch you want to return (default: 0)
Public Function RegexExecute(str As String, reg As String, _
                             Optional matchIndex As Long, _
                             Optional subMatchIndex As Long) As String
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp"): regex.Pattern = reg
    regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
    If regex.test(str) Then
        Set matches = regex.Execute(str)
        RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
        Exit Function
    End If
ErrHandl:
    RegexExecute = CVErr(xlErrValue)
End Function

Now for an example:

Regex Excel: Regex UDF functions example
Regex UDF functions example

…and the result:
Regex Excel: Regex UDF functions result
Regex UDF functions result

Download Excel Regex example

You can download a working example with the Regex UDF function here:

ReFiddle – Online testing your Regex!

Want to test quickly a Regular Expression (Regex)? Use ReFiddle. It is a great tool to quickly validate if a Regex works and to be able to quickly share your regex with others!

Keep in mind, however, that the VBA Regular Expression language (supported by RegExp object) does not support all Regular Expressions which are valid in ReFiddle.

Learn Regex (Regular Expression) the Fun way

Want to learn building Regex (Regular Expressions) and have some fun at the same time?

Try Regex Golf:
Regex Golf

regex golf