Excel Regex Tutorial (Regular Expressions)

Regular expressions are ubiquitous in the developer world. They are used to validate website input, to search for word patterns in large strings/texts, and for many other purposes. In Excel, Regular Expressions (VBA RegEx or simply VBA Regex) are not much advertised. Most users get by with the simple LEFT, RIGHT, MID and FIND functions for their string manipulation. These functions are, however, quite primitive and limited. Knowing how to use Regular Expressions (Regex) in Excel will save you a lot of time. This Excel Regex Tutorial covers using Regex both in VBA and in worksheet formulas. Let’s not forget, however, that VBA also has the Like operator, which can sometimes handle tasks otherwise reserved for Regular Expressions.

Excel Regex example

A Regex (Regular Expression) is basically a pattern used to match strings within other strings. Let’s demonstrate this with a simple Regex example. Let us assume we have a text that mixes words and numbers, and we want to capture just the numbers. Without knowing in advance what the text looks like, it would be hard to extract these numbers using either Excel functions or plain VBA.
But with a Regex we can extract the numbers with a simple pattern: [0-9]+

What does the [0-9]+ pattern represent? It reads as follows: match any character from the range given in brackets ([ ]), here the digits 0 to 9, repeated one or more times (+). As you can see, a Regex uses a compact notation to describe your pattern.
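
The extraction described above can be sketched in VBA as follows. This is a minimal sketch, not the original example: the sample text and the procedure name ExtractNumbers are made up for illustration, and the RegExp object is created with CreateObject so that no library reference is needed.

```vba
Sub ExtractNumbers()
    Dim regex As Object, matches As Object, m As Object

    ' Late-bound RegExp object; no library reference required
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "[0-9]+"   ' one or more consecutive digits
    regex.Global = True        ' find every match, not just the first

    ' Hypothetical sample text (an assumption, not the article's original)
    Set matches = regex.Execute("Order 66 shipped 12 boxes on day 7")

    For Each m In matches
        Debug.Print m.Value    ' prints 66, then 12, then 7
    Next m
End Sub
```

Run it from the VBA editor and check the Immediate window (Ctrl+G) for the output.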

Regular Expression Language

The Regular Expression language (Regex) is quite elaborate but allows you to match virtually any regular language. Below is a quick reference:

Matching characters

Syntax | Description | Example | Example match
. | Any character except newline | d.g | “dog” in “My dog is named dingo”
[characters] | Any one of the characters listed between the brackets [ ] | [af] | “a” and “f” in “alfa”
[^characters] | Any character that is not one of those listed between the brackets [ ] | [^af] | “l” in “alfa”
[start-end] | Any character within the range specified between the brackets [ ] | [0-9] | “1” and “2” in “12”
\w | Any word character (letters, digits and the underscore) | \w | “I”, “_”, “a”, “m”, “J”, “o”, “h”, “n” in “I_am.John”
\W | Any non-word character | \W | “.” in “I_am.John”
\s | Any white space character | \s | “ ” in “Hi There!”
\S | Any non-white space character | \S | “M” and “e” in “M e”
\d | Any decimal digit | \d | “1” and “2” in “12”
\D | Any character that is not a decimal digit | \D | “d”, “_”, “.” in “d_.”
\ | Escapes the special character that follows it | \. | “.” in “d.g”
\t | Tab (vbTab) | \t |
\r | Carriage return (vbCr) | \r |
\n | New line / line feed (vbLf) | \n |

Quantifiers

Quantifiers allow you to specify the number of times a certain pattern is supposed to be matched against a string. It is important to understand the difference between GREEDY and non-GREEDY quantifiers:

Greedy vs Non-Greedy Regular Expression

Syntax | Description | Example | Example match
* | Zero or more of (GREEDY). Matches as many as possible | _.*_ | “_dogs_cats_” in “_dogs_cats_”
+ | One or more of (GREEDY). Matches as many as possible | _\w+_ | “_dogs_cats_” in “_dogs_cats_”
? | Zero or once (GREEDY). Matches once if possible | Live\d? | “Live1” in “Live1”
{n} | Exactly “n” times | \d{2} | “21” in “212” (matches do not overlap)
{n,} | At least “n” times (GREEDY) | \d{2,} | “12” and “123” in “1_12_123”
{n,m} | Between “n” and “m” times (GREEDY) | \d{3,4} | “123” and “1234” in “1_12_123_1234”
*? | Zero or more of (non-GREEDY). Matches as few as possible | _.*?_ | “_dogs_” in “_dogs_cats_”
+? | One or more of (non-GREEDY). Matches as few as possible | _.+?_ | “_dogs_” in “_dogs_cats_”
?? | Zero or once (non-GREEDY). Matches zero times if possible | Live\d?? | “Live” in “Live1”
{n,}? | At least “n” times (non-GREEDY). Matches as few as possible | \d{2,}? | “12” and “12” (the first two digits of “123”) in “1_12_123”
{n,m}? | Between “n” and “m” times (non-GREEDY). Matches as few as possible | \d{3,4}? | “123” and “123” (the first three digits of “1234”) in “1_12_123_1234”
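
To see the difference between greedy and non-greedy matching in practice, here is a small sketch that runs both variants against the same string. The procedure name and the sample string are assumptions for illustration; a literal underscore is used as the delimiter.

```vba
Sub GreedyVsNonGreedy()
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")

    ' Hypothetical sample string
    Const SAMPLE As String = "_dogs_cats_"

    ' Greedy: .* grabs as much as possible before the closing underscore
    regex.Pattern = "_.*_"
    Debug.Print regex.Execute(SAMPLE)(0).Value   ' _dogs_cats_

    ' Non-greedy: .*? stops at the first closing underscore
    regex.Pattern = "_.*?_"
    Debug.Print regex.Execute(SAMPLE)(0).Value   ' _dogs_
End Sub
```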

Grouping

Below are the basic grouping expressions:

Syntax | Description | Example | Example match
(expression) | Group and capture the expression within the parentheses ( ) | ([0-9]+) | Captures “123”, “456” and “789” within “123-456-789”
(?:expression) | Group BUT DON’T CAPTURE the expression within the parentheses ( ) | (?:[0-9]*)([a-z]*)(?:[0-9]*) | Captures only “hello” in “123hello456”
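
The following sketch shows how capturing and non-capturing groups differ when read back through the SubMatches collection of a match. The procedure name is hypothetical, and the pattern uses lowercase [a-z] so that it matches “hello” without setting IgnoreCase.

```vba
Sub CaptureGroups()
    Dim regex As Object, m As Object
    Set regex = CreateObject("VBScript.RegExp")

    ' Only the middle group captures; the (?: ) groups do not
    regex.Pattern = "(?:[0-9]+)([a-z]+)(?:[0-9]+)"
    Set m = regex.Execute("123hello456")(0)

    Debug.Print m.Value               ' 123hello456 (the whole match)
    Debug.Print m.SubMatches.Count    ' 1
    Debug.Print m.SubMatches(0)       ' hello
End Sub
```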

Using Regex in VBA

To use Regex in VBA you need the RegExp object, which is defined in the Microsoft VBScript Regular Expressions library. To start using this object, add the following reference to your VBA Project: Tools->References->Microsoft VBScript Regular Expressions 5.5. Alternatively, if you don’t want to set this reference in every project, you can create the RegExp object using the CreateObject function.

Option 1: Referencing the library Microsoft VBScript Regular Expressions
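
A minimal sketch of this early-bound approach, assuming the reference has already been ticked under Tools->References (the procedure name is made up):

```vba
' Requires the reference: Microsoft VBScript Regular Expressions 5.5
Sub EarlyBindingExample()
    Dim regex As RegExp          ' type comes from the referenced library
    Set regex = New RegExp

    regex.Pattern = "[0-9]+"
    Debug.Print regex.Test("abc123")   ' True
End Sub
```

With the reference set, the VBA editor also offers IntelliSense for the RegExp members.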

Option 2: Using the CreateObject function
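
A minimal sketch of the late-bound approach, which needs no reference at all (the procedure name is made up):

```vba
Sub LateBindingExample()
    Dim regex As Object          ' generic Object, resolved at run time
    Set regex = CreateObject("VBScript.RegExp")

    regex.Pattern = "[0-9]+"
    Debug.Print regex.Test("abc123")   ' True
End Sub
```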

I personally prefer using the CreateObject function as it does not require referencing the library every time the Workbook is opened on a new workstation.

The RegExp object has the following properties:

  • Pattern – The pattern (written in Regex) which you want to match against (e.g. “(.*)”)
  • IgnoreCase – Ignore letter case (capital/non-capital letters)
  • Global – Do you want to find all possible matches in the input string? If False, only the first match is returned. Set it to False for better performance if you need just the first match
  • MultiLine – If True, ^ and $ match at the start and end of every line in the input string rather than only at the start and end of the whole string
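
The effect of IgnoreCase and MultiLine can be checked with a short sketch like the one below. The procedure name and the test strings are assumptions chosen for illustration; MultiLine here controls whether ^ and $ also match at line breaks.

```vba
Sub RegExpProperties()
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")

    regex.Pattern = "^excel$"

    ' IgnoreCase: letter case matters only when it is False
    regex.IgnoreCase = False
    Debug.Print regex.Test("Excel")                  ' False
    regex.IgnoreCase = True
    Debug.Print regex.Test("Excel")                  ' True

    ' MultiLine: lets ^ and $ match at line breaks inside the string
    regex.MultiLine = False
    Debug.Print regex.Test("VBA" & vbLf & "Excel")   ' False
    regex.MultiLine = True
    Debug.Print regex.Test("VBA" & vbLf & "Excel")   ' True
End Sub
```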

The RegExp object facilitates the following 3 operations:

  • Test (string) – returns True if the pattern can be matched against the provided string
  • Replace (search-string, replace-string) – replaces occurrences of the pattern in search-string with replace-string
  • Execute (search-string) – returns all matches of the pattern against the search-string

Regex: Test pattern against a string

The Test function allows you to test whether the selected Pattern provides any match against the string.
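
As an illustration, the sketch below uses Test to validate input. The five-digit pattern and the procedure name are hypothetical examples.

```vba
Sub TestExample()
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")

    ' Hypothetical rule: the whole string must be exactly five digits
    regex.Pattern = "^[0-9]{5}$"

    Debug.Print regex.Test("90210")    ' True
    Debug.Print regex.Test("9021a")    ' False
End Sub
```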

Regex: Replace pattern in a string

The Replace function will replace the first (if Global = False) or all matching patterns (if Global = True) within a certain string with another string of your choosing.
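
A short sketch of how the Global property changes the outcome of Replace, using a made-up input string and procedure name:

```vba
Sub ReplaceExample()
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "[0-9]+"

    ' Global = False: only the first run of digits is replaced
    regex.Global = False
    Debug.Print regex.Replace("a1b22c333", "#")   ' a#b22c333

    ' Global = True: every run of digits is replaced
    regex.Global = True
    Debug.Print regex.Replace("a1b22c333", "#")   ' a#b#c#
End Sub
```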

Regex: Match pattern in a string

The Execute function will match the first instance (if Global = False) or all instances (if Global = True) of a certain pattern within a certain string. You can also “capture” parts of the pattern as so-called “Submatches”.
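
A minimal sketch of Execute, assuming a made-up input string that contains three instances of the 123-[0-9]+ pattern (the procedure name is hypothetical as well):

```vba
Sub ExecuteExample()
    Dim regex As Object, matches As Object, m As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "123-[0-9]+"
    regex.Global = True            ' return all matches

    ' Hypothetical input string
    Set matches = regex.Execute("123-1 x 123-22 y 123-333")

    Debug.Print matches.Count      ' 3
    For Each m In matches
        ' FirstIndex is the zero-based position of the match
        Debug.Print m.FirstIndex, m.Value
    Next m
End Sub
```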

As you can see, we have managed to capture 3 instances of the 123-[0-9]+ pattern in the string. We can also define a “capture” within our pattern to capture parts of the pattern by enclosing them in parentheses “()”. See the example below:
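
A sketch of such a capture, assuming the same made-up input string with three instances of the pattern. Each Match object exposes its captured groups through the SubMatches collection.

```vba
Sub SubMatchesExample()
    Dim regex As Object, m As Object
    Set regex = CreateObject("VBScript.RegExp")

    ' The parentheses capture only the digits after "123-"
    regex.Pattern = "123-([0-9]+)"
    regex.Global = True

    ' Hypothetical input string
    For Each m In regex.Execute("123-1 x 123-22 y 123-333")
        ' Whole match, then the first captured group
        Debug.Print m.Value, m.SubMatches(0)
    Next m
End Sub
```

The second column printed is 1, then 22, then 333.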

Regex: Using Regex as an Excel Formula

Most Excel versions do not natively provide any Regex functions, which often means building complex formulas to extract pieces of strings that would be easy to extract using Regular Expressions. Hence, to use Regex in Excel formulas you need User Defined Functions (UDFs), which are functions defined in VBA but accessible as regular functions in Excel. Below find 2 basic UDF functions created just for this use:
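
The original listing is not reproduced here, so the following is only a sketch of what two such UDFs could look like. The function names RegexFirstMatch and RegexSubstitute and their parameters are hypothetical, not the author's. Place them in a standard module so they are visible from the worksheet.

```vba
' Hypothetical UDF: returns the first match of regexPattern in inputText,
' or an empty string when nothing matches
Public Function RegexFirstMatch(ByVal inputText As String, _
                                ByVal regexPattern As String) As String
    Dim regex As Object, matches As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = regexPattern
    regex.Global = False
    Set matches = regex.Execute(inputText)
    If matches.Count > 0 Then RegexFirstMatch = matches(0).Value
End Function

' Hypothetical UDF: replaces every match of regexPattern with replacement
Public Function RegexSubstitute(ByVal inputText As String, _
                                ByVal regexPattern As String, _
                                ByVal replacement As String) As String
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = regexPattern
    regex.Global = True
    RegexSubstitute = regex.Replace(inputText, replacement)
End Function
```

In a cell these could then be called as =RegexFirstMatch(A1, "[0-9]+") or =RegexSubstitute(A1, "[0-9]+", "#").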

Now for an example:

Regex UDF functions example

…and the result:
Regex UDF functions result

Download Excel Regex example

You can download a working example with the Regex UDF function here:

ReFiddle – Online testing your Regex!

Want to quickly test a Regular Expression (Regex)? Use ReFiddle. It is a great tool for quickly validating whether a Regex works and for sharing your Regex with others!

Keep in mind, however, that the VBA Regular Expression language (supported by the RegExp object) does not support every Regular Expression that is valid in ReFiddle.

Learn Regex (Regular Expression) the Fun way

Want to learn how to build Regex (Regular Expressions) and have some fun at the same time?

Try Regex Golf.