Tag Archives: regex

VBA Like operator – using wildcards in conditional statements

1 Star2 Stars3 Stars4 Stars5 Stars (6 votes, average: 5.00 out of 5)
Loading...

The VBA Like operator is something so useful I am often surprised how rarely it is used in Excel and Access VBA. I often tend to see the Like operator as the last resort before using Regular Expressions in VBA. It replaces greatly the VBA InStr function when needing to check if a certain substring is present within a certain string. So let’s get right to it!

Here is a simple example of using the VBA Like operator:

The VBA Like operator is an easier alternative to VBA Regex which allows you to search for almost any pattern within a string and find matches.

In case you want to not only find check if the string contains a substring but also return the position of the match – best try using the VBA InStr function.

VBA Like operator

The VBA Like operator is a boolean operator that return True if a string is matched against a certain string pattern.

VBA Like allows you also to use the following wildcards to replace certain strings or characters:

Wildcard Description
* matches any number of characters
? matches any 1 character
[ ] matches any 1 character specified between the brackets
matches any range of characters e.g. [a-z] matches any non-capital 1 letter of the alphabet
# matches any digit character
The VBA Like operator is not letter case sensitive! In case you need upper/lower case matching use VBA InStr function instead

And that is basically it. Easy right? Let’s see some examples…

VBA Like operator examples

Let us now look at a couple of examples that greatly show all the possibilities of the VBA Like operator:

Matching against letters

Matching against numbers

Matching a phone number with either dashes or dots

Matching a certain string within another string

As you can see in the first row we are using the Like Operator similarly as we use other compare operators (=, >, <, <>). The Like operator let’s you validate if a string on the left side of the operator matches the Like expression on the right.

Conculsions on the VBA Like operator

My main take-aways are:

Excel Regex Tutorial (Regular Expressions)

1 Star2 Stars3 Stars4 Stars5 Stars (16 votes, average: 4.25 out of 5)
Loading...

Regular expressions are ubiquitous in the developer world. They are used to validate website input, search for word patterns in large strings/texts and for many other uses. In Excel, Regular Expressions (VBA RegEx or simply VBA Regex) are not much advertised. Most users are good with using simple LEFT, RIGHT, MID and FIND functions for their string manipulation. These functions are, however, quite primitive and limited. Knowing how to use Regular Expressions (Regex) in Excel will save you a lot of time. This Excel Regex Tutorial focuses both on using Regex functions and in VBA. Let’s, however, not forget that VBA has also adopted the VBA Like operator which sometimes allows you to achieve some tasks reserved for Regular Expressions.

Excel Regex example

A Regex (Regular Expression) is basically a pattern matching strings within other strings. Let’s demonstrate this with a simple Regex example. Let us assume we have the text below. And we want to capture just the numbers. Without knowing ahead how the text looks like it would be hard to extract these numbers both using Excel Functions and VBA.
But with a Regex we can extract the numbers with a simple pattern:

What does the [0-9]+ pattern represent? It translates to the following: capture any pattern matching the following range of characters ([ ]), being numbers from 0-9, in a sequence of at least 1 or more (+). As you can see a Regex uses a certain code to translate your pattern.

Regular Expression Language

The Regular Expression language (Regex) is quite elaborate but allows you to match virtually any regular language. Below a quick reference:

Matching characters

Syntax Description Example Example match
. Any character except newline (vbNewLine) d.g “dog” in “My dog is named dingo”
[characters] Matches any provided character between brackets [ ] [af] “a” , “f” in “alfa”
[^characters] Matches any character not being one of the provided between brackets [ ] [^af] “a” , “f” in “alfa”
[startend] Matches any character belonging to the character range specified between brackets [ ] [0-9] “1” and “2” in “12”
\w Any word character (letters, modifiers, digits, punctuation and connectors) w “I”, “a” “m” “J” “o” “h” “n” in “I_am.John”
\W Any non-word character w “_” and “.” in “I_am.John”
\s Any white space character s ” ” in “Hi There!”
\S Any non-white space character S “M” and “e” in “M e”
\d Any decimal digit d “1” and “2” in “12”
\D Any non-decimal digit D “d”, “_”, “.” in “d_.”
\ Followed by any special character – escapes special characters . “.” im “d.g”
\r Tab (vbTab) r
\n Carriage return / new line (vbNewLine)

Quantifiers

Quantifiers allow you to specify the amount of times a certain pattern is supposed to matched against a string. It is important to understand the difference between GREEDY and non-GREEDY quantifiers:

Greedy vs Non-Greedy Regular Expression
Greedy vs Non-Greedy Regular Expression

Syntax Description Example Example match
* Zero or more of (GREEDY). Matches as many as possible W.*W “_dogs_cats_” in “_dogs_cats_”
+ One or more of (GREEDY). Matches as many as possible Ww+W “_dogs_cats_” in “_dogs_cats_”
? Zero or once (GREEDY). Matches as many as possible d? “1” in “Live1”
{n} “n” many times d{2} “21” and “12” in “212”
{n,} At least “n” times (GREEDY) d{2,} “12” and “123” in “1_12_123”
{n,m} Between “n” and “m” times (GREEDY) d{3,4} “123” and “1234” in “1_12_123_1234”
*? Zero or more of (non-GREEDY). Matches as few as possible W.*?W “_dogs_” and “_cats_” in “_dogs_cats_”
+? One or more of (non-GREEDY). Matches as few as possible W.+?W “_dogs_” and “_cats_” in “_dogs_cats_”
?? Zero or once (non-GREEDY). Matches as few as possible d?? “1” in “Live1”
{n,}? At least “n” times (non-GREEDY). Matches as few as possible d{2,} “12” and “123” in “1_12_123”
{n,m}? Between “n” and “m” times (non-GREEDY). Matches as few as possible d{3,4} “123” and “1234” in “1_12_123_1234”

Grouping

Below the basic grouping expressions:

Syntax Description Example Example match
(expression) Group and capture the expression within the parenthesis ( ) ([0-9]*) Captures “123, “345” and “789” within “123-456-789”
(?:expression) Group BUT DON’T CAPTURE the expression within the parenthesis ( ) (?:[0-9]*)([A-Z]*)(?:[0-9]*) Captures only “hello” in “123hello456”

Using Regex in VBA

To use Regex in VBA you need to use the RegExp object which is defined in the Microsoft VBScript Regular Expressions library. To start using this object add the following reference to your VBA Project: Tools->References->Microsoft VBScript Regular Expressions. Otherwise, if you don’t want to reference this library every time you can also create the RegExp object using the CreateObject function.

Option 1: Referencing the library Microsoft VBScript Regular Expressions

Option 2: Using the CreateObject function

I personally prefer using the CreateObject function as it does not require referencing the library every time the Workbook is opened on a new workstation.

The RegExp object has the following properties:

  • Pattern – The pattern (written in Regex) which you want to match against (e.g. “(.*)”)
  • IgnoreCase – Ignore letter case (captial/non-capital letters)
  • Global – Do you want to find all possible matches in the input string? If false, only match the first found pattern. Set false if you need just the first match for performance
  • MultiLine – Do you want to match the pattern across line breaks?

The RegExp object facilitates the following 3 operations:

  • Test (string) – returns True if the pattern can be matched agaist the provided string
  • Replace (search-string, replace-string) – replaces occurrences of the pattern in search-string with replace-string
  • Execute (search-string) – returns all matches of the pattern against the search-string

Regex: Test pattern against a string

The Test function allows you to test whether the selected Pattern provides any match against the string.

Regex: Replace pattern in a string

The Replace function will replace the first (if Global = False) or all matching patterns (if Global = True) within a certain string with another string of your choosing.

Regex: Match pattern in a string

The Execute function will match the first or all instances of a certain pattern within a certain string. You can also “capture” parts of the patterns as so called “Submatches”.

As you can see we have managed to capture 3 instances of the 123-[0-9]+ pattern in the string. We can also define a “capture” within our pattern to capture parts of the pattern by embracing them with brackets “()”. See the example below:

Regex: Using Regex as an Excel Formula

Excel does not natively provide any Regex functions which often requires creating complex formulas for extracting pieces of strings otherwise easy to extract using Regular Expressions. Hence, to facilitate Regex in Excel you need to use User Defined Functions – functions defined in VBA but accessible as regular functions in Excel. Below find 2 basic UDF functions created just for this use:

Now for an example:

Regex Excel: Regex UDF functions example
Regex UDF functions example

…and the result:
Regex Excel: Regex UDF functions result
Regex UDF functions result

Download Excel Regex example

You can download VBA Regex and other great snippets as part of the VBA Time Saver Kit. Feel free to download the full kit using the link below:

ReFiddle – Online testing your Regex!

Want to test quickly a Regular Expression (Regex)? Use ReFiddle. It is a great tool to quickly validate if a Regex works and to be able to quickly share your regex with others!

Keep in mind, however, that the VBA Regular Expression language (supported by RegExp object) does not support all Regular Expressions which are valid in ReFiddle.

Learn Regex (Regular Expression) the Fun way

Want to learn building Regex (Regular Expressions) and have some fun at the same time?

Try Regex Golf:
Regex Golf

regex golf

How to Split Cells in Excel. How to Split Excel Columns.

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 5.00 out of 5)
Loading...

The easiest way on how to split Cells in Excel or split Columns in Excel, is to select the column you want to split. Next go to the Data ribbon and hover to the Data Tools group. Next Select Text to Columns and proceed according to the instructions.

The above works for simple splits on delimiters such as Commas, Semicolons, Tabs etc. However, for Non-Standard patterns such as Capital letters, specific word you might need to revert to more elaborate solutions such using VBA String functions or the VBA Regex. Read on to learn more.

CONTENTS

Splitting Cells using Text to Columns

A Delimiter is a sequence of 1 or more characters to separate columns within a Text String. An example of a Delimiter is the Comma in the following Text String Columns1,Column2 which separates the String Column1 from Column2. Popular Delimiters often used are: Commas (,), Semicolons (;), Dots (.), Tabs (\t), Spaces (\s). A Delimiter can be just as well any Sequence of Characters.

How to Split Cells in Excel using Text to Columns

The most obvious choice when wanting to Split Cells in Excel is to use the DATA Ribbon Text to Columns feature.

Select the Column

Select the Column with Cells you want to Split in Excel:

how to split cells in excel select column
Select Column

Select first column and proceed to Text to Columns

Select the entire first column where all your data should be located. Next click on the Text to Columns button in the DATA ribbon tab:

Data ribbon: Text to Columns
Data ribbon: Text to Columns

Proceed according to Wizard instructions

This is the hard part. Text to Columns need additional information on the delimiter and format of your columns.

Delimited or Fixed width?

Delimiters are any specific Sequence of Characters (like a comma or semicolon). Fixed Width means that each column in the Cell is separated by a Fixed Width of Whitespace Characters. In this case we select Delimited. Next click Next to proceed.

Text to Columns Wizard: Step 1
Text to Columns Wizard: Step 1

Select delimiter

Assuming your columns are separated with a specific Delimiter you need to provide this delimiter in the Wizard. Look at the Data preview to make sure your columns will be separated correctly. When finished proceed with Next

Text to Columns Wizard: Step 2
Text to Columns Wizard: Step 2

Format your columns (optional)

The last step is to format your columns if needed. If your columns represent Dates or you want to pull a column containing numbers/dates as text instead – be sure to format it appropriately. Usually, however, you are fine with hitting Finish:

Text to Columns Wizard: Step 3
Text to Columns Wizard: Step 3

The Resulting Split Columns

If you have proceeded according to the steps above you should have a neatly formatted spreadsheet like the one below.

how to split cells in excel the result
Splitted Columns in Excel

Splitting Cells using Formulas

Another way of how to Split Cells in Excel is using the LEFT, RIGHT and LEN functions. See examples below:

Splitting against a Delimiter:

The formula:

Split Cells on Patterns

Sometimes instead of Delimiters you want to Split your Excel Cells on Patterns that are dynamic and may be different for each cell in a certain column. Fortunately Excel supports Regular Expressions, which allow you to Define Patterns on which your Cells are to be Split.

Let us first introduce my often used GetRegex UDF function:

To install it – open your DEVELOPER Excel Tab, click Visual Basic and add the code above to any new VBA Module.

If you are not familiar with Regular Expressions read the VBA Regex Tutorial.

What does this UDF function do? It extracts any text that matches a certain pattern. Let’s see it in action:

Split columns: Split the first column
Split first column

Example 1: Splitting Cells on Capital Letters

Let us take a common example where we have 1 Column of Cells that have 2 merged Columns inside. We want to split them on the second capital letter:

Split columns on regular expressions
Splitting columns on regular expressions

Now splitting this on the second capital letter using the FIND, LEFT, RIGHT and LEN functions will be a nightmare.

Let’s decipher the regular expression now:

Pattern Description
[A-Z].+? This will catch all words and whitespaces starting with a Capital letter the ? sign means that this is a non-greedy regular expression
([A-Z].+?) The () brackets will capture inside any pattern matching this regular expression
([A-Z].+?)[A-Z].+ The final regex this will capture only the first words and whitespaces starting with a Capital letter. Notice that the capture will end at the next Capital letter

Now for the second column:

Split columns: Split second column
Split second column

Great right? Now just drag the formula across the rows – and you are done!

Example 2: Splitting Cells on Whitespaces

A simple example – let us Split an Excel Cell on a Variable number of Whitespace characters. Let us say the Words in our String can have 1 or more Spaces in between.
split columns on pattern non whitespace
The Formula for the above is:

Again let us break it down:

Pattern Description
[^\s] This specifies a non-whitespace character
[^\s]+ This specifies at least 1 non-whitespace character
([^\s]+) The () brackets will capture inside any pattern matching this regular expression – hence all sequences of non-whitespace characters

Split Excel Function

If you just need an Excel Split function and you can introduce the following UDF Function (copy to VBA Module):

It uses the VBA Split function which is available in VBA only.

Parameters

str
A String.

delimiter
The delimiter on which the Split operation is to be done.

index
The index of the substring resulting from the Split.

How to use the Excel Split Function

The Function will be available as an Excel Function:
excel split string function
The above Formula looks like this (return second substring from Split):

Simple class for browser automation in VBA

1 Star2 Stars3 Stars4 Stars5 Stars (1 votes, average: 5.00 out of 5)
Loading...

Web browser automation (using Microsoft’s Web Browser) is not an easy task in VBA when doing some Web Scraping. Excel is certainly a great tool for building your web automation scripts but at every corner you find some obstacles. For me the most challenging was always the nondeterministic state of the IE browser control whenever waiting for it to finish loading/refreshing a website.

Be sure to checkout my VBA Web Scraping Kit which includes a whole worth of VBA scripts for Web Scraping

The problem

Most browser automation code out there includes the following lines of code whenever waiting for the Internet Explorer browser control to finish loading or refreshing:

While (objIE.Busy Or objIE.READYSTATE <> READYSTATE.READYSTATE_COMPLETE)
        DoEvents
Wend
'The row below may sometimes raise an error!
Set element = objIE.Document.GetElementById(id) 

The problem is, however, that although IE might state that it has fully loaded the web page this in fact does not always have to be the case and some code might generate errors at random lines of code.

The solution

One way of going around this problem is using arbitrary delays – leveraging Application.OnTime / Application.Wait / Sleep. This workaround, although somewhat useful, in most cases will be unsatisfactory as still does not guarantee that you will actually be able to access a web element or click a button that simply might not be loaded yet due to some JavaScript running in the background. So how to solve this issue?

My answer? Wait until the element appears accessible. See an example function below that will wait until the element is accessible on the web page and only then will it return it.

So will the original code above change? Not too much really when we use this new function:

The code above waits for the element with id to appear accessible on the web page. The 5 millisecond Sleep interval is simply there so we are able to specify an upper threshold for an error to appear on a web page (10 seconds). This threshold is there to prevent us from waiting for an infinite period of time in case the web page crashed etc. This I believe is an honest way out of this conundrum.

A simple class for deterministic browser automation

Browser automation requires usually a lot of coding and in order to provide the deterministic automation we need we would need to encapsulate more procedures for getting elements by id, name, tagname, classname and even regular expressions.


The Google web page is also not an easy task for browser automation, assuming you would not embed your query in the GET params but treat it like a regular web page where all data is to be input and handled by forms.

The IE class I wrote handles queries to the Google Web Page like this:

That’s 10 lines of code and not hassle with handling the accessibility of any web elements. All the blocking/waiting is handled in the Get* methods which will simply wait until the control is available in the Web Browser or raise an error otherwise if the maxTimeout threshold is breached. You might have noticed the there is still a method called WaitForIE which actually still waits for the browser to confirm that it is not Busy anymore. Why is it there? Because we need to be sure that the browser has at least mostly loaded the new content and that the element we are looking for is not found, by mistake, in the previous page content. This code has proven deterministic for me in over 1’000 tested queries!

One other interesting thing is the GetRegex method which I basically scrapped of my Excel Scrape HTML Add-In. I prefer regular expressions over any other methods for extracting contents from strings/web pages and I encourage you to do so too. It is much easier and once you get the hand of it, it really makes life easier.

The code of the IE class (click on the arrow to expand):

Deterministic browser automation class

I hope this VBA class becomes the cornerstone of every browser automation script you write in VBA!

Download

Feel free to download the whole class file here:

Next steps

Check out my Scrape HTML Add-In for easy HTML scraping without any VBA:
Excel Scrape HTML Add-In

Excel Scrape HTML Add-In now with HTML caching

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading...

I am constantly extending the Scrape HTML Add-In when I have some spare time. Today’s update adds Caching to the Get* functions.

See the latest version of the Excel Scrape HTML Add-In here:
Excel Scrape HTML Add-In

Why caching?

The primary concept of the Add-In is to reduce any need for writing VBA code. However, the problem with the Get* functions may be that if you specify multiple functions with the same URL there might be some overhead due to having to download the same HTML content just as many times – which might heavily impact performance. In order to solve this issue I have introduced the Cache functionality. If you specify the same URL in more than 1 Get* function then provided that the cache timeout has not passed the HTML content will not be refreshed from the Web but from the Cache. E.g. if you have 2 Get* functions with the same URL and a 60 sec timeout, then only the first call to the Get* function will download the content from the Web, whereas the second one will not – provided that the second call happens before the 60 sec timeout ends. You can change the setting at any time in the “Cache settings”.

Cache Settings
Cache Settings