As I mentioned I am planning to extend the Scrape HTML Add-In with additional tools and functionalities. Scraping HTML content should not require any VBA coding – this rule is guiding the further development of this tool.
Adding to the toolbox I would like to introduce you to the Scrape HTML Tool. The Get* functions (e.g. GetElementByRegex) which I posted earlier are really useful when you are making a solution which can be reused e.g. scraping regularly posted online data etc. I suppose, however, that sometimes there is only a need to scrape some content once but preferably in a structured manner or only some subsets of the content.
The Scrape HTML Tool
The tool comes in handy when you want to quickly scrape all items of a certain type (matching a certain regular expression). It comes with several predefined scraping regular expressions e.g. scraping URLs and img src properties. However, these examples are just to start you off with building your own patterns/expressions.
See this video on how the Scrape HTML Tool can help you:
Let me know if this tool is useful to you and if you see any need of extending it!
I am also planning to post some a simple tutorial or something to show more elaborate examples of scraping/downloading HTML content from the web. Information is power – it is time to make usage of data more simple.
I have recently added the Regex Tester Tool to the Scrape HTML Excel Add-In. Also as some might have noticed I decided to consolidate my HTML scraping UDF functions into the new Scrape HTML Excel Add-In.
What I was missing when using the UDF functions to scrape HTML content was some tool to test my regular expressions. Sometimes even a single character can render the whole regex to crash.
The tool allows online evaulation of regular expressions – when any part of the expression is change the tool with evaluation your expression and returning the output. It comes useful when working with the GetElementByRegex and GetRegex functions but just as well can help you with any other regular expressions tasks!
After my post on “SCRAPE HTML BY ELEMENT ID, NAME OR… ANY REGEX!” I have been thinking about tinkering the macros a little bit more to make scraping HTML content even easier and reducing any additional needs for writing VBA code. What was missing in the puzzle was additional parsing of the scraped content i.e. let us say you want to download a HTML table row-by-row and cell-by-cell. Well the regex will probably capture your first row and cell… or the whole table leaving you with the dirty work of extracting the data you need for each row.
I therefore redefined the GetElementByRegex function and added an additional supporting function GetRegex:
'GetElementByRegex - capture HTML content by regular expression
Public Function GetElementByRegex(url As String, reg As String, Optional index As Integer)
Dim XMLHTTP As Object, html As Object, objResult As Object
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = reg
regEx.Global = True
If regEx.Test(XMLHTTP.ResponseText) Then
Set matches = regEx.Execute(XMLHTTP.ResponseText)
If IsMissing(index) Then
GetElementByRegex = matches(0).SubMatches(0)
GetElementByRegex = matches(index).SubMatches(0)
GetElementByRegex = ""
'GetRegex - capture any regex from a string
Public Function GetRegex(str As String, reg As String, Optional index As Integer)
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = reg
regEx.Global = True
If regEx.Test(str) Then
Set matches = regEx.Execute(str)
If IsMissing(index) Then
GetRegex = matches(0).SubMatches(0)
GetRegex = matches(index).SubMatches(0)
GetRegex = ""
This may seem like a small change but see this example to appreciate how flexible and easy scraping HTML is now:
I did this in a separate cell to optimize the workbook (so that in case of a recalculation of the worksheet the site content does not have to be downloaded separately for each cell). Notice the regex ([^”]*?). This is a non-greedy capture of ALL characters (non-“). This guarantees that only this table is captured in the expression and not all tables. Using (.*)? would not be enough as the dot character does not match newlines.
Next getting the th header cells (next headers by changing the last index in the range 0-3):
This captures the second row and then extracts the first cell.
Download the Scrape HTML example
Download the full example:
This is in my opinion a very powerful set of tools for every analyst working daily on Internet based content. There is no need for writing any additional VBA as the GetRegex function can be nested any number of times to allow you to extract the data you need. Use the index parameter in these functions to capture cells in structured tables or repeating patterns to reduce the amount of code you need to write.
Usually when developing Excel solutions you want to version you file often to prevent data loss due to the application crashing etc. You will probably also want to keep the older versions of you files to be able to go back and recover any previously working code. When you do this once or twice in a while this is no issue. But when you are making significant changes in a short amount of time saving new versions is a time-consuming task. Excel versioning is therefore something many deem useful.
That’s why I made myself a very simple VBA method for automatically saving the current Excel xlsm file as a new version while keeping the previous versions of the file. The macro increments the current file version. It is best to set a keyboard shortcut to the macro to save time.
Versioning Excel Code
The following code will save the ActiveWorkbook as a new Workbook while appending the version number by 1 in the format “_vXXX” where XXX is the version number. The versioning macro will maintain the file extension.
Dim fileName As String, index As Long, ext As String
arr = Split(ActiveWorkbook.Name, ".")
ext = arr(UBound(arr))
If InStr(ActiveWorkbook.Name, "_v") = 0 Then
fileName = ActiveWorkbook.Path & "" & Left(ActiveWorkbook.Name, InStr(ActiveWorkbook.Name, ".") - 1) & "_v1." & ext
index = CInt(Split(Right(ActiveWorkbook.Name, Len(ActiveWorkbook.Name) - InStr(ActiveWorkbook.Name, "_v") - 1), ".")(0))
index = index + 1
fileName = ActiveWorkbook.Path & "" & Left(ActiveWorkbook.Name, InStr(ActiveWorkbook.Name, "_v") - 1) & "_v" & index & "." & ext
You can also download the file as a bas file:
The module sets the macro as a CTRL+SHIFT+S shortcut as having this line of code:
As mentioned above the macro in the download section is setup by default for CTRL+SHIFT+S shortcut. However, in case you want to change the shortcut. Simply go to the DEVELOPER ribbon and select Macros. Next select the SaveNewVersion macro and click Options.... This will prompt you for a new keyboard shortcut.
Quick Access Toolbar
Why remember a keyboard shortcut when you can add a neat icon to your Quick Access Toolbar in Excel.
Open the Quick Access Toolbar
Go to File, Options and open the Quick Access Toolbar.
Add versioning VBA macro to the Quick Access Toolbar
Proceed as shown below to Add the SaveNewVersion macro to the Quick Access Toolbar:
Optional: Modify the icon
Why stick to a default macro icon when we can make it more pleasant to the eye? Click on the SaveNewVersion macro and hit Modify. Next select a new icon from this window:
This is the final effect:
Simply hit the icon to save a new version of your Excel file! Remember to save the file in XLS/XLSM/ or XLSB format.
Installing as an Excel AddIn
The above will add the versioning feature to all your Workbooks as long as your Excel file with the SaveNewVersion macro is not moved or deleted!. I strongly recommend that instead you include this file into the AddIns folder before configuring this shortcut.
Save the XLSM file as AddIn
First save the file in XLA or XLAM format, as an Excel AddIn.
Save the file in Microsoft Excel AddIn directory
Save the file in the following directory for it to open automatically on startup:
A VBA Progress Bar can be used to manage your users anxiety about the execution time of your VBA Macro. Sometimes there are very large and complex solutions built in Excel (which is a mistake mind you), where calculations or macro executions can take minutes or even hours. This causes many issues, especially for the end users who usually do not know how long processing the calculations/macros will take. In such cases it is important to notify the end users of the progress of your macros/calculations so they can switch to other activities. This is where the VBA Progress bar can aid you.
For one of my older projects I needed a VBA Progress Bar that would show:
The current progress of the computations
How much execution time was left (estimation)
Users especially wanted to know how much execution time was left – whether they should grab a coffee or stay and wait for the macro to finish. Calculating this is best done by approximating the time it took to run
Animated VBA Progress Bar Example
The easiest approach to animating an Excel is to create a simple VBA UserForm with the use of a label control which width you can manipulate to show the current progress. Easy and straightforward.
'Add 10% progress
'Hide and remove the Progress Bar
Create your VBA Progress Bar
Below find a quick tutorial of how to create your own VBA Progress Bar User Form!
Create a customer User Form
First you need to create a customer VBA User Form. Click on the link in case you want a tutorial on how to create these.
It is best to rename your UserForm e.g. to ProgressBar. Next add 3 VBA Labels to the User Form and change some selected properties per the image below:
Copy the VBA Progress Bar Source Code
Now we need the logic that will allow you to configure and run the Progress Bar. Right click on your new User Form and click View Code. Next copy past the code below:
Initialize (title As String, Optional max As Long = 100)– needs to be run first. Initializes the variables needed to run the Progress Bar and allows you to set the title of the User Form and the max i.e. maximum % progress. By default max is set to 100 but you can change this e.g. you want to process 200 files then set it to 200
AddProgress (Optional inc As Long = 1) – add inc incrementally to the amount of total progress. Assuming if you want to process 200 files and set the max value in Initialize and you processed 3 files then run AddProgress with inc equal to 3. When you process another 4 files then run AddProgress with inc equal to 4.
Now that we know the functions needed to run the Progress Bar let us configure and run your Animated Progress Bar in Excel. I added step by step comments to the example code snippet above.
'Declare and Initialize the ProgressBar UserForm
'Set the Title for the ProgressBar and the Maximum Value making the UserForm visible
'Use the ProgressBar to track macro execution by running the For loop 100 times