Category Archives: MS Office

Excel Scraping HTML by Regular expression continued…

After my post on “SCRAPE HTML BY ELEMENT ID, NAME OR… ANY REGEX!” I kept thinking about tinkering with the macros a little more to make scraping HTML content even easier and to reduce the need for writing additional VBA code. The missing piece of the puzzle was further parsing of the scraped content. Say you want to download an HTML table row by row and cell by cell: the regex will probably capture only your first row and cell, or the whole table, leaving you with the dirty work of extracting the data you need for each row.

UDF VBA functions for scraping HTML

I therefore redefined the GetElementByRegex function and added an additional supporting function GetRegex:

'GetElementByRegex - capture HTML content by regular expression
'index selects which match to return (0 = first match)
Public Function GetElementByRegex(url As String, reg As String, Optional index As Long = 0) As String
    Dim XMLHTTP As Object, regEx As Object, matches As Object
    'Download the page source
    Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send
    'Run the regex against the raw response text
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = reg
    regEx.Global = True
    Set matches = regEx.Execute(XMLHTTP.ResponseText)
    'Return the first capture group of the requested match, or "" if there is none
    'IsMissing only works for Variant arguments, hence the default value of 0 for index
    If index >= 0 And index < matches.Count Then
        If matches(index).SubMatches.Count > 0 Then GetElementByRegex = matches(index).SubMatches(0)
    End If
End Function

'GetRegex - capture any regex from a string
'index selects which match to return (0 = first match)
Public Function GetRegex(str As String, reg As String, Optional index As Long = 0) As String
    Dim regEx As Object, matches As Object
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = reg
    regEx.Global = True
    Set matches = regEx.Execute(str)
    'Return the first capture group of the requested match, or "" if there is none
    If index >= 0 And index < matches.Count Then
        If matches(index).SubMatches.Count > 0 Then GetRegex = matches(index).SubMatches(0)
    End If
End Function

This may seem like a small change but see this example to appreciate how flexible and easy scraping HTML is now:

Example of Scraping HTML table

Let us use this example HTML table on w3schools.

Let us scrape each cell into a separate Excel cell. It took me only a couple minutes to get this done:

Scraped HTML table in Excel

Now step by step:

First I scraped the whole table into cell B1 using the GetElementByRegex function:

=GetElementByRegex("http://www.w3schools.com/html/html_tables.asp";"<table class=""reference"" style=""width:100%"">([^""]*?)</table>")

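I did this in a separate cell to optimize the workbook (so that when the worksheet recalculates, the site content does not have to be downloaded separately for each cell). Notice the regex ([^"]*?), written as ([^""]*?) inside the Excel formula. This is a non-greedy capture of any characters other than a double quote, newlines included. Being non-greedy, it stops at the first closing </table> tag, so only this table is captured and not all tables on the page. Using (.*?) would not be enough, as the dot character does not match newlines. Note that the pattern fails if the table content itself contains a double quote; in that case ([\s\S]*?) is the safer choice.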
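The difference between the dot and a character class is easy to check in the VBA Immediate window. The sketch below is only an illustration: the sub name and the HTML fragment are made up, and it assumes the VBScript.RegExp engine used by the functions above.

```vba
'Illustration only: the sub name and the HTML fragment are hypothetical.
Sub DemoDotVersusNewline()
    Dim regEx As Object, html As String
    'A tiny table whose rows sit on separate lines
    html = "<table>" & vbLf & "<tr><td>A</td></tr>" & vbLf & "</table>"
    Set regEx = CreateObject("VBScript.RegExp")

    'The dot does not match a line feed, so Test returns False here
    regEx.Pattern = "<table>(.*?)</table>"
    Debug.Print regEx.Test(html)

    'A negated class also matches line feeds, so Test returns True here
    regEx.Pattern = "<table>([^""]*?)</table>"
    Debug.Print regEx.Test(html)

    '[\s\S] matches any character, double quotes included: Test returns True
    regEx.Pattern = "<table>([\s\S]*?)</table>"
    Debug.Print regEx.Test(html)
End Sub
```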

Next, get the th header cells (change the last index within the range 0-3 to get the other headers):

=GetRegex(GetRegex($B$1;"<tr>([^""]*?)</tr>";0);"<th>([^""]*?)</th>";0)

This captures the first row and then extracts the first header.

Similarly the td cells (columns and rows depending on the indices):

=GetRegex(GetRegex($B$1;"<tr>([^""]*?)</tr>";1);"<td>([^""]*?)</td>";0)

This captures the second row and then extracts the first cell.
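The same row-then-cell nesting can also be written as a loop in VBA, which may help when the number of rows is not known in advance. This is a minimal sketch rather than part of the add-in: ParseTable, DemoParseTable and the sample HTML are hypothetical, and [\s\S] is used in place of [^"] so that double quotes in the content do not break the match.

```vba
'Sketch: split a small HTML table into rows and cells with nested regex matches.
'ParseTable and the sample HTML are hypothetical, not part of the original add-in.
Function ParseTable(html As String) As Collection
    Dim rowRegEx As Object, cellRegEx As Object
    Dim rowMatch As Object, cellMatch As Object
    Dim tableRows As New Collection, rowCells As Collection

    Set rowRegEx = CreateObject("VBScript.RegExp")
    rowRegEx.Pattern = "<tr>([\s\S]*?)</tr>"
    rowRegEx.Global = True

    Set cellRegEx = CreateObject("VBScript.RegExp")
    cellRegEx.Pattern = "<t[hd]>([\s\S]*?)</t[hd]>"
    cellRegEx.Global = True

    'Outer loop: one match per table row
    For Each rowMatch In rowRegEx.Execute(html)
        Set rowCells = New Collection
        'Inner loop: one match per header or data cell inside that row
        For Each cellMatch In cellRegEx.Execute(rowMatch.SubMatches(0))
            rowCells.Add cellMatch.SubMatches(0)
        Next cellMatch
        tableRows.Add rowCells
    Next rowMatch
    Set ParseTable = tableRows
End Function

Sub DemoParseTable()
    Dim html As String, tableRows As Collection
    html = "<table><tr><th>Name</th><th>Age</th></tr>" & _
           "<tr><td>Eve</td><td>94</td></tr></table>"
    Set tableRows = ParseTable(html)
    Debug.Print tableRows.Count                 'two rows in the sample
    Debug.Print tableRows.Item(2).Item(1)       'first cell of the second row: Eve
End Sub
```

Collections are 1-based, unlike the 0-based index parameter of GetRegex.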

Download the Scrape HTML example

Download the full example:

Summary

This is, in my opinion, a very powerful set of tools for every analyst working daily with Internet-based content. There is no need to write any additional VBA, as the GetRegex function can be nested any number of times to extract the data you need. Use the index parameter of these functions to capture cells in structured tables or repeating patterns and reduce the amount of code you need to write.

I appreciate your comments!

Versioning Excel files with Excel VBA

Usually when developing Excel solutions you want to version your file often to prevent data loss due to the application crashing etc. You will probably also want to keep the older versions of your files so you can go back and recover any previously working code. When you do this only once in a while, this is no issue. But when you are making significant changes in a short amount of time, saving new versions becomes a time-consuming task. Excel versioning is therefore something many deem useful.

That’s why I made myself a very simple VBA method for automatically saving the current Excel xlsm file as a new version while keeping the previous versions of the file. The macro increments the current file version. It is best to set a keyboard shortcut to the macro to save time.

Excel VBA Versioning

Versioning Excel Code

The following code will save the ActiveWorkbook as a new Workbook while incrementing the version number by 1, in the format “_vXXX” where XXX is the version number. A file without a version suffix is saved as version 1. The versioning macro will maintain the file extension.

Sub SaveNewVersion()
    Dim wbName As String, baseName As String, ext As String
    Dim fileName As String, index As Long, dotPos As Long, verPos As Long
    wbName = ActiveWorkbook.Name
    'Split the workbook name at the last dot into base name and extension
    dotPos = InStrRev(wbName, ".")
    If dotPos = 0 Then Exit Sub 'workbook was never saved, so there is no extension to keep
    baseName = Left(wbName, dotPos - 1)
    ext = Mid(wbName, dotPos + 1)
    verPos = InStrRev(baseName, "_v")
    If verPos = 0 Then
        'No version suffix yet - start with _v1
        index = 1
    Else
        'Strip the existing suffix and increment its number
        index = CLng(Mid(baseName, verPos + 2)) + 1
        baseName = Left(baseName, verPos - 1)
    End If
    'Save once, next to the current file
    fileName = ActiveWorkbook.Path & "\" & baseName & "_v" & index & "." & ext
    ActiveWorkbook.SaveAs fileName
End Sub
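The name handling can also be isolated in a small helper that is easy to check in the Immediate window without saving anything. This is a sketch under assumptions: NextVersionName is a hypothetical function, not part of the downloadable module, and it treats a "_v" that is not followed by a number (as in my_values.xlsx) as part of the name.

```vba
'Hypothetical helper: returns the next versioned file name for a given file name.
Function NextVersionName(wbName As String) As String
    Dim dotPos As Long, verPos As Long, index As Long
    Dim baseName As String, ext As String

    'Separate the extension (including its dot) at the last dot, if there is one
    dotPos = InStrRev(wbName, ".")
    If dotPos = 0 Then
        baseName = wbName
    Else
        baseName = Left(wbName, dotPos - 1)
        ext = Mid(wbName, dotPos)
    End If

    'Only treat "_v" as a version suffix when a number follows it
    index = 1
    verPos = InStrRev(baseName, "_v")
    If verPos > 0 Then
        If IsNumeric(Mid(baseName, verPos + 2)) Then
            index = CLng(Mid(baseName, verPos + 2)) + 1
            baseName = Left(baseName, verPos - 1)
        End If
    End If
    NextVersionName = baseName & "_v" & index & ext
End Function

Sub DemoNextVersionName()
    Debug.Print NextVersionName("Report.xlsm")      'Report_v1.xlsm
    Debug.Print NextVersionName("Report_v7.xlsm")   'Report_v8.xlsm
    Debug.Print NextVersionName("my_values.xlsx")   'my_values_v1.xlsx
End Sub
```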

Download

You can also download the file as a bas file:


The module assigns the CTRL+SHIFT+S shortcut to the macro through this line of code:

Attribute SaveNewVersion.VB_ProcData.VB_Invoke_Func = "S\n14"

Setting up the Versioning Excel Macro

Keyboard shortcut

As mentioned above, the macro in the download section is set up by default with the CTRL+SHIFT+S shortcut. In case you want to change the shortcut, simply go to the DEVELOPER ribbon and select Macros. Next select the SaveNewVersion macro and click Options.... This will prompt you for a new keyboard shortcut.
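If you would rather set the shortcut from code than through the Options dialog, Application.OnKey is one way to do it. The sketch below is an alternative to the Attribute line, not what the downloadable module does; the two sub names are hypothetical, and it assumes a macro named SaveNewVersion is available in an open workbook or add-in.

```vba
'Sketch: assign and clear the CTRL+SHIFT+S shortcut from code.
'Assumes a macro named SaveNewVersion exists in an open workbook or add-in.
Sub EnableVersionShortcut()
    'In OnKey notation ^ stands for CTRL and + stands for SHIFT
    Application.OnKey "^+s", "SaveNewVersion"
End Sub

Sub DisableVersionShortcut()
    'Omitting the procedure name restores the key combination's normal behaviour
    Application.OnKey "^+s"
End Sub
```

An OnKey assignment lasts only for the current Excel session, so EnableVersionShortcut would typically be called from a Workbook_Open event handler.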

Set Excel Macro Shortcut

Quick Access Toolbar

Why remember a keyboard shortcut when you can add a neat icon to your Quick Access Toolbar in Excel?

Open the Quick Access Toolbar

Go to File, Options and open the Quick Access Toolbar.

Add versioning VBA macro to the Quick Access Toolbar

Proceed as shown below to Add the SaveNewVersion macro to the Quick Access Toolbar:

Quick Access Toolbar: Add macro

Optional: Modify the icon

Why stick to a default macro icon when we can make it more pleasant to the eye? Click on the SaveNewVersion macro and hit Modify. Next select a new icon from this window:

Save New Version: Select a new icon

This is the final effect:
Quick Access Toolbar: Versioning Excel

Simply hit the icon to save a new version of your Excel file! Remember to save the file in XLS, XLSM or XLSB format.

Installing as an Excel AddIn

The above will add the versioning feature to all your Workbooks as long as your Excel file with the SaveNewVersion macro is not moved or deleted! I strongly recommend that you instead place this file in the AddIns folder before configuring this shortcut.

Save the XLSM file as AddIn

First save the file in XLA or XLAM format, as an Excel AddIn.

Save the file in Microsoft Excel AddIn directory

Save the file in the following directory so that Excel can find it as an add-in (you may still need to tick it once under File, Options, Add-ins for it to load on startup):

C:\Users\USERNAME\AppData\Roaming\Microsoft\AddIns

Animated VBA progress bar for Excel and Access

Sometimes there are very large and complex solutions built in Excel (which is a mistake mind you), where calculations or macro executions can take minutes or even hours. This causes many issues, especially for the end users who usually do not know how long processing the calculations/macros will take. In such cases it is important to notify the end users of the progress of your macros/calculations so they can switch to other activities. This is where the VBA Progress bar can aid you.

For one of my older projects I needed a VBA Progress Bar that would show:

  1. The current progress of the computations
  2. How much execution time was left (estimation)

Users especially wanted to know how much execution time was left – whether they should grab a coffee or stay and wait for the macro to finish.

Option 1: Animated VBA Progress Bar UserForm

The easiest approach to animating a progress bar in Excel is to create a simple UserForm with a label control whose width you manipulate to show the current progress. Easy and straightforward.
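The width calculation itself is a simple proportion. The helper below is a sketch with hypothetical names (BarWidth and its parameters are not taken from the downloadable UserForm); it assumes the label should grow from 0 to a fixed maximum width.

```vba
'Hypothetical helper: width of the progress label for a given amount of progress.
Function BarWidth(stepsDone As Long, stepsTotal As Long, maxWidth As Double) As Double
    If stepsTotal <= 0 Or stepsDone <= 0 Then
        BarWidth = 0                    'nothing done yet (or nothing to do)
    ElseIf stepsDone >= stepsTotal Then
        BarWidth = maxWidth             'never grow beyond the full bar
    Else
        BarWidth = maxWidth * stepsDone / stepsTotal
    End If
End Function
```

Inside a UserForm this could be used as lblBar.Width = BarWidth(done, total, 200), where lblBar is an assumed label name and 200 an assumed full width in points.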

Example of how to use the VBA Progress Bar in Excel:

Sub ExampleProgressBar()
    Dim pb As ProgressBar
    Set pb = New ProgressBar
    pb.Initialize "My title", 100
    'Add 10% progress
    pb.AddProgress 10
    '...
    'Hide and remove the Progress Bar
    pb.Hide
    Set pb = Nothing
End Sub

The result:

Advanced VBA Progress Bar

Download the Progress UserForm complete with sourcecode here:

Option 2: Animated Worksheet VBA Progress Bar

The UserForm progress bar works well when you don’t want to show too much content or use advanced formatting. However, if you want the Progress Bar GUI to be more attractive, I would suggest going with a Worksheet Progress Bar. This gives you far more freedom to make your Progress Bar visually attractive and allows you to use Charts, Conditional formatting etc. My example below calculates the expected time left to complete a given task based on historical progress (forecasting from how much time it took to make the progress so far). Pretty cool and useful in my opinion when dealing with long-running VBA Macros.

Excel VBA Progress Bar

Below feel free to download a Workbook with the Excel Worksheet ProgressBar VBA:


Now how does it work? The progress bar is located on a separate hidden worksheet which appears only when the progress bar is activated. The estimate of the time left is extrapolated from the time elapsed up to the current progress and will become more accurate with time, depending on how comparable each increment is.
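The extrapolation can be sketched as a linear estimate: time left = elapsed time / steps done × steps remaining. The code below is an assumption about how such an estimate might be written, not the code from the downloadable workbook; EstimateSecondsLeft and DemoEstimate are hypothetical names.

```vba
'Hypothetical helper: linear estimate of the seconds left, based on progress so far.
Function EstimateSecondsLeft(elapsedSeconds As Double, stepsDone As Long, stepsTotal As Long) As Double
    If stepsDone <= 0 Then
        EstimateSecondsLeft = -1        'no progress yet, so no estimate is possible
    Else
        'Average time per finished step multiplied by the steps still to do
        EstimateSecondsLeft = elapsedSeconds / stepsDone * (stepsTotal - stepsDone)
    End If
End Function

Sub DemoEstimate()
    Dim startTime As Double, i As Long
    startTime = Timer
    For i = 1 To 5
        'The real work for step i would go here
        DoEvents
        Debug.Print i, EstimateSecondsLeft(Timer - startTime, i, 5)
    Next i
End Sub
```

For example, 30 seconds elapsed after 25 of 100 steps gives an estimate of 90 seconds left. The estimate is only as good as the assumption that the remaining steps take about as long as the finished ones.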

See the example below of how the progress bar is activated and incremented.

Example of how to use the VBA Progress Bar in Excel:

Sub TestProgressBar()
    Dim i As Long
    'Activate the progress bar - switch to the progress bar worksheet
    Call ActivateProgressBar(ActiveWorkbook, ActiveSheet, 100)
    For i = 1 To 100
        'Advance the progress bar by one step
        Call AddOneProgress
        'Let Excel process pending events so the window keeps refreshing
        DoEvents
    Next i
    'Deactivate the progress bar - switch to your original worksheet
    Call DeactivateProgressBar(ActiveWorkbook)
End Sub

So simple! Notice also when opening the example file that the update procedure contains the DoEvents command. This lets Excel process pending window messages during long computations, so the screen keeps updating and stays responsive instead of freezing. DoEvents is preferable to a Sleep call here: Sleep blocks Excel for the whole pause (sometimes making it appear frozen) and unnecessarily extends the time needed to execute the macro, whereas DoEvents only hands over control long enough to carry out pending Excel events (usually just refreshing the window).

Excel Scrape html by element id, name or… any regex!

Sometimes I need to quickly scrape some data from a website to work on it and update the values when needed, e.g. stock prices, temperature, search results, statistics etc. Below find 2 quick UDFs (user-defined functions) that you can use to scrape HTML items by id and name, followed by a third that scrapes by regex.

Get element by id

An example of getting an element by ID:

Public Function GetElementById(url As String, id As String) As String
    Dim XMLHTTP As Object, html As Object, objResult As Object
    'Download the page source
    Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send
    'Load the response into an HTML document and look up the element
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText
    Set objResult = html.GetElementById(id)
    'Return "" instead of raising an error when no element has this id
    If Not objResult Is Nothing Then GetElementById = objResult.innerHTML
End Function

Get element by name

An example of getting an element by name.

Public Function GetElementByName(url As String, name As String) As String
    Dim XMLHTTP As Object, html As Object, objResult As Object
    'Download the page source
    Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send
    'Load the response into an HTML document
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText
    'The DOM method is getElementsByName and returns a collection; take the first element
    Set objResult = html.getElementsByName(name)
    If objResult.Length > 0 Then GetElementByName = objResult.Item(0).innerHTML
End Function

Examples

You can call these functions directly from a worksheet cell, just like any other Excel function:

But these functions are often not enough to scrape more complicated structured data, or the results require additional cleansing before the data can be used. I therefore thought of using XPath at first, but regular expressions seemed the more obvious solution. So I knocked up this more flexible alternative to the above functions, which allows you to use any regex to scrape data off a website:

Get element by regex

An example of scraping an element by regex (regular expression).

Public Function GetElementByRegex(url As String, reg As String) As String
    Dim XMLHTTP As Object, regEx As Object, matches As Object
    'Download the page source
    Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send
    'Run the regex against the raw response text
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = reg
    regEx.Global = True
    Set matches = regEx.Execute(XMLHTTP.ResponseText)
    'Return the first capture group of the first match, or "" if nothing matched
    If matches.Count > 0 Then
        If matches(0).SubMatches.Count > 0 Then GetElementByRegex = matches(0).SubMatches(0)
    End If
End Function

Need an example? Here is one.

Let us say we want to scrape the latest headline off the cnn website:

scrape: cnn
CNN

Try this regex then:

=GetElementByRegex("http://edition.cnn.com/";"<h2 data-analytics=""_list-hierarchical-xs_article_"" class=""banner-text banner-text--natural""><strong>(.*?)</strong></h2>")

The result:

scrape
Regex

What did I do? I looked at the HTML of the CNN website and noticed the following HTML:

<h2 data-analytics="_list-hierarchical-xs_article_" class="banner-text banner-text--natural"><strong>
Why these girls fear the summer
</strong></h2>

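As this piece of HTML is quite unique in the whole HTML content, we can simply replace the headline between the strong tags with the regex (.*?). This will extract any string of characters between these tags, as long as it contains no line feed. If the headline sits on its own line in the page source, as in the snippet above, use ([\s\S]*?) instead, which also matches line breaks.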

Simple? Yes. If you want to play around with this, I recommend you read more on using regular expressions, e.g. on Stack Overflow.

Excel Google Charts Tool

I always wanted to use the beautiful and interactive Google Charts in Excel. The Google Charts repository is constantly growing, and Excel sometimes lacks those features. Hence I introduce the Excel Google Charts Tool to show a way to leverage some of those Google Charts directly in Excel.

The WebBrowser control is no longer supported by Office 2013 and above, hence this functionality might not work by default in those versions of Office.

The Excel Google Charts Tool contains example Google Charts embedded inside an Excel xlsm file allowing you to visualize data in a more attractive way and enabling more user interaction.

Excel Google Charts: Gauge Chart

Gauge charts are extremely useful for highlighting important values in reports. You can also visualize the good and bad ranges of values, e.g. orange and red for values that are too high. These ranges can be easily configured.

Excel VBA Gauge Chart

How to configure a Gauge Chart?

Google Chart: Gauge Chart

Excel Google Charts: Treemap Chart

Treemaps can be particularly useful when you want to drill down into data values, e.g. used disk space broken down across folders. Google Treemaps let you visualize 2 values: one through the area of each treemap cell and one through its color.

Excel VBA Treemap Chart

How to configure a Treemap Chart?

Google Charts: Treemap Chart

Excel Google Charts: Org Chart

Excel VBA Orgchart

How to configure an Org Chart?

Org Charts come in handy when you want to visualize the tree/organisational structure.
Google Charts: Org Chart

Excel Google Charts: Geo Chart

I would say this is one of the most useful charts when playing with geo-data. Using the Geo Chart you can easily visualize how your data is broken down across countries. You can zoom the Geo Chart in to show just a single continent, country or region.

Excel VBA Geochart

How to configure a Geo Chart?

Google Charts: Geo Chart

Download

The file below contains all examples of Google Charts used in the Excel Google Charts Tool.


Currently the Google Chart Tool contains examples of the following Google Charts:

Issues and errors

One issue you might stumble on when using the above Google Charts is due to scriptable control restrictions imposed by Microsoft. Because of these, Excel 2013 and above restricts (by default) the use of some controls, e.g. the Microsoft Web Browser control, which is required to run the above Google Charts. There is a way around that, so use the link above to read more.