Tag Archives: regular expression

regex

Excel Regex Tutorial (Regular Expressions)

1 Star2 Stars3 Stars4 Stars5 Stars (11 votes, average: 4.45 out of 5)
Loading...

Regular expressions are ubiquitous in the developer world. They are used to validate website input, search for word patterns in large strings/texts and for many other uses. In Excel, Regular Expressions (RegEx or simply Regex) are not much advertised. Most users are good with using simple LEFT, RIGHT, MID and FIND functions for their string manipulation. These functions are, however, quite primitive and limited. Knowing how to use Regular Expressions (Regex) in Excel will save you a lot of time. This Excel Regex Tutorial focuses both on using Regex functions and in VBA. Let’s, however, not forget that VBA has also adopted the Like operator which sometimes allows you to achieve some tasks reserved for Regular Expressions.

excel regex level
Using the right tools

What I aim with this tutorial is to make sure you learn to use the right tools at the right time!

LVL 1: Manipulating Strings

Before jumping into Regular Expressions I encourage you to spend a moment on learning the basic string manipulation functions available both as Excel functions and in VBA. Click here to go to the VBA String Functions reference:

VBA String Functions
VBA String Functions

These functions, when used in combination, allow you to achieve most often required string manipulation tasks.

Read more on the VBA String Functions here

LVL 2: Like Operator

Before we move on to Regular Expressions let’s focus on a bit on the Like Operator which often go unmentioned as some of us are so at home with writing Regular Expressions that we don’t want to make the extra effort to learn this.

If "Animal" Like "[A-Z]*" then 
   Debug.Print "Match: String starting with Capital letter!"
End If
Read more on the VBA Like Operator here

LVL 3: Regular Expressions (Excel Regex – RegExp)

A Regex (Regular Expression) is basically a pattern matching strings within other strings. Let’s demonstrate this with a simple Regex example. Let us assume we have the text below. And we want to capture just the numbers.

Hello 134, World 3424, 04234 Some Text

Without knowing ahead how the text looks like it would be hard to extract these numbers both using Excel Functions and VBA.
But with a Regex we can extract the numbers with a simple pattern:

[0-9]+

The captured numbers:

134, 3424, 04234

What does the [0-9]+ pattern represent? It translates to the following: capture any pattern matching the following range of characters ([ ]), being numbers from 0-9, in a sequence of at least 1 or more (+). As you can see a Regex uses a certain code to translate your pattern.

Regular Expression Language

The Regular Expression language (Regex) is quite elaborate but allows you to match virtually any regular language. Below a quick reference:

Matching characters

Syntax Description Example Example match
. Any character except newline (vbNewLine) d.g “dog” in “My dog is named dingo”
[characters] Matches any provided character between brackets [ ] [af] “a” , “f” in “alfa”
[^characters] Matches any character not being one of the provided between brackets [ ] [af] “a” , “f” in “alfa”
[startend] Matches any character belonging to the character range specified between brackets [ ] [0-9] “1” and “2” in “12”
w Any word character (letters, modifiers, digits, punctuation and connectors) w “I”, “a” “m” “J” “o” “h” “n” in “I_am.John”
W Any non-word character w “_” and “.” in “I_am.John”
s Any white space character s ” ” in “Hi There!”
S Any non-white space character S “M” and “e” in “M e”
d Any decimal digit d “1” and “2” in “12”
D Any non-decimal digit D “d”, “_”, “.” in “d_.”
Followed by any special character – escapes special characters . “.” im “d.g”
r Tab (vbTab) r
n Carriage return / new line (vbNewLine)

Quantifiers

Quantifiers allow you to specify the amount of times a certain pattern is supposed to matched against a string. It is important to understand the difference between GREEDY and non-GREEDY quantifiers:

Greedy vs Non-Greedy Regular Expression
Greedy vs Non-Greedy Regular Expression

Syntax Description Example Example match
* Zero or more of (GREEDY). Matches as many as possible W.*W “_dogs_cats_” in “_dogs_cats_”
+ One or more of (GREEDY). Matches as many as possible Ww+W “_dogs_cats_” in “_dogs_cats_”
? Zero or once (GREEDY). Matches as many as possible d? “1” in “Live1”
{n} “n” many times d{2} “21” and “12” in “212”
{n,} At least “n” times (GREEDY) d{2,} “12” and “123” in “1_12_123”
{n,m} Between “n” and “m” times (GREEDY) d{3,4} “123” and “1234” in “1_12_123_1234”
*? Zero or more of (non-GREEDY). Matches as few as possible W.*?W “_dogs_” and “_cats_” in “_dogs_cats_”
+? One or more of (non-GREEDY). Matches as few as possible W.+?W “_dogs_” and “_cats_” in “_dogs_cats_”
?? Zero or once (non-GREEDY). Matches as few as possible d?? “1” in “Live1”
{n,}? At least “n” times (non-GREEDY). Matches as few as possible d{2,} “12” and “123” in “1_12_123”
{n,m}? Between “n” and “m” times (non-GREEDY). Matches as few as possible d{3,4} “123” and “1234” in “1_12_123_1234”

Grouping

Below the basic grouping expressions:

Syntax Description Example Example match
(expression) Group and capture the expression within the parenthesis ( ) ([0-9]*) Captures “123, “345” and “789” within “123-456-789”
(?:expression) Group BUT DON’T CAPTURE the expression within the parenthesis ( ) (?:[0-9]*)([A-Z]*)(?:[0-9]*) Captures only “hello” in “123hello456”

Using Regex in VBA

To use Regex in VBA you need to use the RegExp object which is defined in the Microsoft VBScript Regular Expressions library. To start using this object add the following reference to your VBA Project: Tools->References->Microsoft VBScript Regular Expressions. Otherwise, if you don’t want to reference this library every time you can also create the RegExp object using the CreateObject function.

Option 1: Referencing the library Microsoft VBScript Regular Expressions

Dim regex as RegExp
Set regex = New RegExp

Option 2: Using the CreateObject function

Dim regex as Object
Set regex = CreateObject("VBScript.RegExp")

I personally prefer using the CreateObject function as it does not require referencing the library every time the Workbook is opened on a new workstation.

The RegExp object has the following properties:

  • Pattern – The pattern (written in Regex) which you want to match against (e.g. “(.*)”)
  • IgnoreCase – Ignore letter case (captial/non-capital letters)
  • Global – Do you want to find all possible matches in the input string? If false, only match the first found pattern. Set false if you need just the first match for performance
  • MultiLine – Do you want to match the pattern across line breaks?

The RegExp object facilitates the following 3 operations:

  • Test (string) – returns True if the pattern can be matched agaist the provided string
  • Replace (search-string, replace-string) – replaces occurrences of the pattern in search-string with replace-string
  • Execute (search-string) – returns all matches of the pattern against the search-string

Regex: Test pattern against a string

The Test function allows you to test whether the selected Pattern provides any match against the string.

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "[0-9]+"
End With
    
str = "Hello 123 World!"
Debug.Print regex.Test(str) 'Result: True

str = "Hello World!"
Debug.Print regex.Test(str) 'Result: False    

Regex: Replace pattern in a string

The Replace function will replace the first (if Global = False) or all matching patterns (if Global = True) within a certain string with another string of your choosing.

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "123-[0-9]+-123"
  .Global = True 'If False, would replace only first
End With
    
str = "321-123-000-123-643-123-888-123"
Debug.Print regex.Replace(str, "<Replace>") 
'Result: 321-<Replace>-643-<Replace>

Regex: Match pattern in a string

The Execute function will match the first or all instances of a certain pattern within a certain string. You can also “capture” parts of the patterns as so called “Submatches”.

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "123-[0-9]+"
  .Global = True
End With
    
str = "321-123-000-123-643-123-888-123"

Set matches = regex.Execute(str)
    
For Each match In matches
  Debug.Print match.Value 'Result: 123-000, 123-643, 123-888
Next match

As you can see we have managed to capture 3 instances of the 123-[0-9]+ pattern in the string. We can also define a “capture” within our pattern to capture parts of the pattern by embracing them with brackets “()”. See the example below:

Dim regex As Object, str As String
Set regex = CreateObject("VBScript.RegExp")

With regex
  .Pattern = "123-([0-9]+)" 'Notice the () around the second sequence
  .Global = True
End With
    
str = "321-123-000-123-643-123-888-123"

Set matches = regex.Execute(str)
  
For Each match In matches
  Debug.Print match.Value 'Result: 123-000, 123-643, 123-888
  If match.SubMatches.Count > 0 Then
    For Each subMatch In Match.SubMatches
      Debug.Print subMatch 'Result: 000, 643, 888
      Next subMatch
  End If
Next match

Regex: Using Regex as an Excel Formula

Excel does not natively provide any Regex functions which often requires creating complex formulas for extracting pieces of strings otherwise easy to extract using Regular Expressions. Hence, to facilitate Regex in Excel you need to use User Defined Functions – functions defined in VBA but accessible as regular functions in Excel. Below find 2 basic UDF functions created just for this use:

'Returns the number of matches found for a given regex
'str - string to test the regex on
'reg - the regular expression
Public Function RegexCountMatches(str As String, reg As String) As String
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp"): regex.Pattern = reg: regex.Global = True
    If regex.test(str) Then
        Set matches = regex.Execute(str)
        RegexCountMatches = matches.Count
        Exit Function
    End If
ErrHandl:
    RegexCountMatches = CVErr(xlErrValue)
End Function

'Executes a Regular Expression on a provided string and returns a selected submatch
'str - string to execute the regex on
'reg - the regular expression with at least 1 capture '()'
'matchIndex - the index of the match you want to return (default: 0)
'subMatchIndex - the index of the submatch you want to return (default: 0)
Public Function RegexExecute(str As String, reg As String, _
                             Optional matchIndex As Long, _
                             Optional subMatchIndex As Long) As String
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp"): regex.Pattern = reg
    regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
    If regex.test(str) Then
        Set matches = regex.Execute(str)
        RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
        Exit Function
    End If
ErrHandl:
    RegexExecute = CVErr(xlErrValue)
End Function

Now for an example:

Regex Excel: Regex UDF functions example
Regex UDF functions example

…and the result:
Regex Excel: Regex UDF functions result
Regex UDF functions result

Download Excel Regex example

You can download a working example with the Regex UDF function here:

ReFiddle – Online testing your Regex!

Want to test quickly a Regular Expression (Regex)? Use ReFiddle. It is a great tool to quickly validate if a Regex works and to be able to quickly share your regex with others!

Keep in mind, however, that the VBA Regular Expression language (supported by RegExp object) does not support all Regular Expressions which are valid in ReFiddle.

Learn Regex (Regular Expression) the Fun way

Want to learn building Regex (Regular Expressions) and have some fun at the same time?

Try Regex Golf:
Regex Golf

regex golf

split cells in excel

How to Split Cells in Excel – Split Excel Columns

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 5.00 out of 5)
Loading...

One common issue that many people have is to How to Split Columns in Excel upon non common delimiters e.g. Commas ,, Semicolons ;, or Non-Standard patterns such asCapital letters, a specific word – patterns which are not supported by the Text to Columns feature in the DATA ribbon tab.

How to deal with this? Usually you would need to resort to a VBA macro or really loooooong formula using the FIND, LEFT, RIGHT and LEN functions. That’s not an effective way of tackling this issue. What is the alternative? A simply Regex UDF function.

CONTENTS

Splitting Cells on Delimiter

A Delimiter is a sequence of 1 or more characters to separate columns within a Text String. An example of a Delimiter is the comma , in the following Text String Columns1,Column2 which separates the String Column1 from Column2. Popular Delimiters often used are:

  • Commas ,
  • Semicolons ;
  • Dots .
  • Tabs Tab
  • Spaces

A Delimiter can be any Sequence of Characters.

How to Split Cells in Excel using Text to Columns

The most obvious choice when wanting to Split Cells in Excel is to use the DATA Ribbon Text to Columns feature.

Select the Column

Select the Column with Cells you want to Split in Excel:

how to split cells in excel select column
Select Column

Select first column and proceed to Text to Columns

Select the entire first column where all your data should be located. Next click on the Text to Columns button in the DATA ribbon tab:

Data ribbon: Text to Columns
Data ribbon: Text to Columns

Proceed according to Wizard instructions

This is the hard part. Text to Columns need additional information on the delimiter and format of your columns.

Delimited or Fixed width?

Delimiters are any specific Sequence of Characters (like a comma or semicolon). Fixed Width means that each column in the Cell is separated by a Fixed Width of Whitespace Characters. In this case we select Delimited. Next click Next to proceed.

Text to Columns Wizard: Step 1
Text to Columns Wizard: Step 1

Select delimiter

Assuming your columns are separated with a specific Delimiter you need to provide this delimiter in the Wizard. Look at the Data preview to make sure your columns will be separated correctly. When finished proceed with Next

Text to Columns Wizard: Step 2
Text to Columns Wizard: Step 2

Format your columns (optional)

The last step is to format your columns if needed. If your columns represent Dates or you want to pull a column containing numbers/dates as text instead – be sure to format it appropriately. Usually, however, you are fine with hitting Finish:

Text to Columns Wizard: Step 3
Text to Columns Wizard: Step 3

The Resulting Split Columns

If you have proceeded according to the steps above you should have a neatly formatted spreadsheet like the one below.

how to split cells in excel the result
Splitted Columns in Excel

Split Cells on Patterns

Sometimes instead of Delimiters you want to Split your Excel Cells on Patterns that are dynamic and may be different for each cell in a certain column. Fortunately Excel supports Regular Expressions, which allow you to Define Patterns on which your Cells are to be Split.

Let us first introduce my often used GetRegex UDF function:

To install it – open your DEVELOPER Excel Tab, click Visual Basic and add the code above to any new VBA Module.
Its parameters are:

str
Any String.

reg
The Regular Expression including 1 Capture ().

index
The index of the capture e.g. if your Regex captures 3 matches then index=0 returns the first one and index=2 returns the 3rd one.

What does this UDF function do? If extracts any text via a Regular Expression. Let’s see it in action:

Split columns: Split the first column
Split first column

Example 1: Splitting Cells on Capital Letters

Let us take a common example where we have 1 Column of Cells that have 2 merged Columns inside. We want to split them on the second capital letter:

Split columns on regular expressions
Splitting columns on regular expressions

Now splitting this on the second capital letter using the FIND, LEFT, RIGHT and LEN functions will be a nightmare.

Let’s decipher the regular expression now:

Pattern Description
[A-Z].+? This will catch all words and whitespaces starting with a Capital letter the ? sign means that this is a non-greedy regular expression
([A-Z].+?) The () brackets will capture inside any pattern matching this regular expression
([A-Z].+?)[A-Z].+ The final regex this will capture only the first words and whitespaces starting with a Capital letter. Notice that the capture will end at the next Capital letter

Now for the second column:

Split columns: Split second column
Split second column

Great right? Now just drag the formula across the rows – and you are done!

Example 2: Splitting Cells on Whitespaces

A simple example – let us Split an Excel Cell on a Variable number of Whitespace characters. Let us say the Words in our String can have 1 or more Spaces in between.
split columns on pattern non whitespace
The Formula for the above is:

Again let us break it down:

Pattern Description
[^\s] This specifies a non-whitespace character
[^\s]+ This specifies at least 1 non-whitespace character
([^\s]+) The () brackets will capture inside any pattern matching this regular expression – hence all sequences of non-whitespace characters

Split Excel Function

If you just need an Excel Split function and you can introduce the following UDF Function (copy to VBA Module):

It uses the VBA Split function which is available in VBA only.

Parameters

str
A String.

delimiter
The delimiter on which the Split operation is to be done.

index
The index of the substring resulting from the Split.

How to use the Excel Split Function

The Function will be available as an Excel Function:
excel split string function
The above Formula looks like this (return second substring from Split):

hta example

Simple class for browser automation in VBA

1 Star2 Stars3 Stars4 Stars5 Stars (1 votes, average: 5.00 out of 5)
Loading...

Web browser automation (using Microsoft’s Web Browser) is not an easy task in VBA when doing some Web Scraping. Excel is certainly a great tool for building your web automation scripts but at every corner you find some obstacles. For me the most challenging was always the nondeterministic state of the IE browser control whenever waiting for it to finish loading/refreshing a website.

Be sure to checkout my VBA Web Scraping Kit which includes a whole worth of VBA scripts for Web Scraping

The problem

Most browser automation code out there includes the following lines of code whenever waiting for the Internet Explorer browser control to finish loading or refreshing:

While (objIE.Busy Or objIE.READYSTATE <> READYSTATE.READYSTATE_COMPLETE)
        DoEvents
Wend
'The row below may sometimes raise an error!
Set element = objIE.Document.GetElementById(id) 

The problem is, however, that although IE might state that it has fully loaded the web page this in fact does not always have to be the case and some code might generate errors at random lines of code.

The solution

One way of going around this problem is using arbitrary delays – leveraging Application.OnTime / Application.Wait / Sleep. This workaround, although somewhat useful, in most cases will be unsatisfactory as still does not guarantee that you will actually be able to access a web element or click a button that simply might not be loaded yet due to some JavaScript running in the background. So how to solve this issue?

My answer? Wait until the element appears accessible. See an example function below that will wait until the element is accessible on the web page and only then will it return it.

Public Function GetElementById(id As String, Optional isBlocking As Boolean)
'id: id of the html element; isBlocking: is the code to be blocked until the element is found
    Dim timeout As Long
    On Error Resume Next
TryAgain:
    Set GetElementById = objIE.Document.GetElementById(id)
    If IIf(IsMissing(isBlocking), True, isBlocking) And _
     (Err.Number <> 0 Or (GetElementById Is Nothing)) And _ 
     timeout < maxTimeout Then
        Err.Clear: DoEvents: Sleep 5: timeout = timeout + 5
        GoTo TryAgain
    End If
    If (GetElementById Is Nothing) Then Err.Raise vbObjectError + ElementNotFoundError, "IE.GetElementById"
End Function

So will the original code above change? Not too much really when we use this new function:

While (objIE.Busy Or objIE.READYSTATE <> READYSTATE.READYSTATE_COMPLETE)
        DoEvents
Wend
'This will wait maxTimeout miliseconds until raising an error or return 
Set element = GetElementById(id)

The code above waits for the element with id to appear accessible on the web page. The 5 millisecond Sleep interval is simply there so we are able to specify an upper threshold for an error to appear on a web page (10 seconds). This threshold is there to prevent us from waiting for an infinite period of time in case the web page crashed etc. This I believe is an honest way out of this conundrum.

A simple class for deterministic browser automation

Browser automation requires usually a lot of coding and in order to provide the deterministic automation we need we would need to encapsulate more procedures for getting elements by id, name, tagname, classname and even regular expressions.


The Google web page is also not an easy task for browser automation, assuming you would not embed your query in the GET params but treat it like a regular web page where all data is to be input and handled by forms.

The IE class I wrote handles queries to the Google Web Page like this:

 Dim linkText as String, query as String: query = "Dog Wikipedia"
 On Error GoTo CleanIE
 Set ieClass = New IE
 ieClass.Navigate "https://www.google.pl/", False
 ieClass.GetElementByName("q").Value = query
 ieClass.GetElementByTagName("form").Submit
 ieClass.WaitForIE 'First wait for the page to mostly load
 linkText = ieClass.GetRegex("<h3(?:.|/n)*?<a onmousedown=""return(?:.|/n)*?"" href=""(?:.|/n)*?"">((?:.|/n)*?)</a>")
CleanIE:
 ieClass.Quit

That’s 10 lines of code and not hassle with handling the accessibility of any web elements. All the blocking/waiting is handled in the Get* methods which will simply wait until the control is available in the Web Browser or raise an error otherwise if the maxTimeout threshold is breached. You might have noticed the there is still a method called WaitForIE which actually still waits for the browser to confirm that it is not Busy anymore. Why is it there? Because we need to be sure that the browser has at least mostly loaded the new content and that the element we are looking for is not found, by mistake, in the previous page content. This code has proven deterministic for me in over 1’000 tested queries!

One other interesting thing is the GetRegex method which I basically scrapped of my Excel Scrape HTML Add-In. I prefer regular expressions over any other methods for extracting contents from strings/web pages and I encourage you to do so too. It is much easier and once you get the hand of it, it really makes life easier.

The code of the IE class (click on the arrow to expand):

Deterministic browser automation class

'!!!Please reference in Tools->References "Microsoft Internet Controls library"!!!
Option Explicit
Public Enum READYSTATE
    READYSTATE_UNINITIALIZED = 0
    READYSTATE_LOADING = 1
    READYSTATE_LOADED = 2
    READYSTATE_INTERACTIVE = 3
    READYSTATE_COMPLETE = 4
End Enum
Private objIE As Object
Const maxTimeout As Long = 10000 'Max time in milliseconds to wait until a control is found before raising error
Public ElementNotFoundError As Long
Public Function GetIE()
    Set GetIE = objIE
End Function
Private Sub Class_Initialize()
    ElementNotFoundError = 1
End Sub
Public Sub Navigate(urlAddress As String, isVisible As Boolean)
'urlAddress: destination url; isVisible: should the IE window be visible
    Set objIE = New InternetExplorer: objIE.Visible = isVisible: objIE.Navigate urlAddress
    WaitForIE
End Sub
Public Sub WaitForIE()
    While (objIE.Busy Or objIE.READYSTATE <> READYSTATE.READYSTATE_COMPLETE)
        DoEvents
    Wend
End Sub
'----Get elements----
Public Function GetElementByName(name As String, Optional isBlocking As Boolean, Optional index As Long)
'name: name of the html element; isBlocking: is the code to be blocked until the element is found; index: index of the element in the array of elements with this name
    Dim elems As Object, timeout As Long
    On Error Resume Next
TryAgain:
    Set elems = objIE.Document.GetElementsByName(name): Set GetElementByName = elems(IIf(IsMissing(index), 0, index))
    If IIf(IsMissing(isBlocking), True, isBlocking) And (Err.Number <> 0 Or (GetElementByName Is Nothing)) And timeout < maxTimeout Then
        Err.Clear: DoEvents: Sleep 5: timeout = timeout + 5
        GoTo TryAgain
    End If
    If (GetElementByName Is Nothing) Then Err.Raise vbObjectError + ElementNotFoundError, "IE.GetElementByName"
End Function
Public Function GetElementById(id As String, Optional isBlocking As Boolean)
'id: id of the html element; isBlocking: is the code to be blocked until the element is found
    Dim timeout As Long
    On Error Resume Next
TryAgain:
    Set GetElementById = objIE.Document.GetElementById(id)
    If IIf(IsMissing(isBlocking), True, isBlocking) And (Err.Number <> 0 Or (GetElementById Is Nothing)) And timeout < maxTimeout Then
        Err.Clear: DoEvents: Sleep 5: timeout = timeout + 5
        GoTo TryAgain
    End If
    If (GetElementById Is Nothing) Then Err.Raise vbObjectError + ElementNotFoundError, "IE.GetElementById"
End Function
Public Function GetElementByTagName(tagName As String, Optional isBlocking As Boolean, Optional index As Long)
'tagName: tagname of the html element; isBlocking: is the code to be blocked until the element is found; index: index of the element in the array of elements with this name
    Dim elems As Object, timeout As Long
    On Error Resume Next
TryAgain:
    Set elems = objIE.Document.GetElementsByTagName(tagName): Set GetElementByTagName = elems(IIf(IsMissing(index), 0, index))
    If IIf(IsMissing(isBlocking), True, isBlocking) And (Err.Number <> 0 Or (GetElementByTagName Is Nothing)) And timeout < maxTimeout Then
        Err.Clear: DoEvents: Sleep 5: timeout = timeout + 5
        GoTo TryAgain
    End If
    If (GetElementByTagName Is Nothing) Then Err.Raise vbObjectError + ElementNotFoundError, "IE.GetElementByTagName"
End Function
Public Function GetElementByClassName(className As String, Optional isBlocking As Boolean, Optional index As Long)
'className: classname of the html element; isBlocking: is the code to be blocked until the element is found; index: index of the element in the array of elements with this name
    Dim elems As Object, timeout As Long
    On Error Resume Next
TryAgain:
    Set elems = objIE.Document.GetElementsByClassName(tagName): Set GetElementByClassName = elems(IIf(IsMissing(index), 0, index))
    If IIf(IsMissing(isBlocking), True, isBlocking) And (Err.Number <> 0 Or (GetElementByClassName Is Nothing)) And timeout < maxTimeout Then
        Err.Clear: DoEvents: Sleep 5: timeout = timeout + 5
        GoTo TryAgain
    End If
    If (GetElementByClassName Is Nothing) Then Err.Raise vbObjectError + ElementNotFoundError, "IE.GetElementByClassName"
End Function
'----Get HTML by regex----
Public Function GetRegex(reg As String, Optional isBlocking, Optional index As Integer) As String
'reg: regular expression with 1 capture "()"; isBlocking: is the code to be blocked until the element is found; index: index of the element in the array of elements fulfilling this regular expression
    On Error Resume Next
    Dim regex, matches, timeout As Long
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = reg
    regex.Global = True
    If index < 0 Then index = 0
TryAgain:
    If regex.Test(objIE.Document.body.innerHtml) Then
        Set matches = regex.Execute(objIE.Document.body.innerHtml)
        GetRegex = matches(index).SubMatches(0)
        Exit Function
    End If
    If IIf(IsMissing(isBlocking), True, isBlocking) And (Err.Number <> 0 Or GetRegex = vbNullString) And timeout < maxTimeout Then
        Err.Clear: DoEvents: Sleep 5: timeout = timeout + 5
        GoTo TryAgain
    End If
    GetRegex = ""
End Function
Public Function GetMatchCount(reg As String) As Long
'reg: regular expression with 1 capture "()"
    On Error Resume Next
    Dim regex, matches
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = reg
    regex.Global = True
    If regex.Test(objIE.Document.body.innerHtml) Then
        Set matches = regex.Execute(objIE.Document.body.innerHtml)
         GetMatchCount = matches.Count
        Exit Function
    End If
    GetMatchCount = 0
End Function
'----Quit and terminate----
Public Sub Quit()
    If Not (objIE Is Nothing) Then objIE.Quit
    Set objIE = Nothing
End Sub
Private Sub Class_Terminate()
    Quit
End Sub

I hope this VBA class becomes the cornerstone of every browser automation script you write in VBA!

Download

Feel free to download the whole class file here:

Next steps

Check out my Scrape HTML Add-In for easy HTML scraping without any VBA:
Excel Scrape HTML Add-In

scrape html add-in

Excel Scrape HTML Add-In now with HTML caching

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading...

I am constantly extending the Scrape HTML Add-In when I have some spare time. Today’s update adds Caching to the Get* functions.

See the latest version of the Excel Scrape HTML Add-In here:
Excel Scrape HTML Add-In

Why caching?

The primary concept of the Add-In is to reduce any need for writing VBA code. However, the problem with the Get* functions may be that if you specify multiple functions with the same URL there might be some overhead due to having to download the same HTML content just as many times – which might heavily impact performance. In order to solve this issue I have introduced the Cache functionality. If you specify the same URL in more than 1 Get* function then provided that the cache timeout has not passed the HTML content will not be refreshed from the Web but from the Cache. E.g. if you have 2 Get* functions with the same URL and a 60 sec timeout, then only the first call to the Get* function will download the content from the Web, whereas the second one will not – provided that the second call happens before the 60 sec timeout ends. You can change the setting at any time in the “Cache settings”.

Cache Settings
Cache Settings

regex

Excel Regex Tester Tool

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading...

I have recently added the Regex Tester Tool to the Scrape HTML Excel Add-In. Also as some might have noticed I decided to consolidate my HTML scraping UDF functions into the new Scrape HTML Excel Add-In.

What I was missing when using the UDF functions to scrape HTML content was some tool to test my regular expressions. Sometimes even a single character can render the whole regex to crash.

In the Scrape HTML Excel Add-In toolbox you will now find a very nifty tool called the Regex Tester.

Regex Tester Tool

Regex Tester Tool
Regex Tester Tool

The tool allows online evaulation of regular expressions – when any part of the expression is change the tool with evaluation your expression and returning the output. It comes useful when working with the GetElementByRegex and GetRegex functions but just as well can help you with any other regular expressions tasks!

Check out also this example video:

Download the tool together with the: Scrape HTML Excel Add-In.

I hope this will prove useful to some. Comment to let me know how this can be improved or what else should be in the Scraping HTML Excel Add-In toolbox!