Tag Archives: data

VBA Collection Tutorial

1 Star2 Stars3 Stars4 Stars5 Stars (11 votes, average: 4.18 out of 5)

The VBA Collection is a simple native data structure available in VBA to store (collect as you wish) objects. VBA Collections are more flexible than VBA Arrays as they are not limited in their size at any point in time and don’t require manual re-sizing. Collections are also useful when you don’t want to leverage there more complex (but quite similar) Data Structures like the VBA ArrayList or even a VBA Dictionary.

Adding items to a VBA Collection

Let’s start by creating a New VBA Collection and adding some items to it.

Notice that the VBA Collection is not typed in any way as we didn’t have to declare what types of objects it stores. This is because a VBA Collection stores object of type Variant.

By default the Add procedure will push items to the end of the VBA Collection. However, you can also choose to insert items before or after any index in the Collection like so:


If you want to be able to reference a particular item in your VBA Collection by a certain string/name you can also define a key for object added to your VBA Collection like so:

Removing items from a VBA Collection

Removing items from a VBA Collection is equally easy. However, items are removed by specifying their index.

Be careful when removing items from a VBA Collection – items start indexing at 1 (not 0 as in most common programming languages.

When removing items in a loop do remember that the index of the remaining items in the VBA Collection will decrease by 1.

Clearing a VBA Collection

To Clear all items (remove them) from a VBA Collection use the Clear function.

Clearing all items from a Collection is similar to recreating a VBA Collection:

Counting items in a VBA Collection

Similarly as with other VBA Data Structures the VBA Collection facilitates the Count function which returns the amount of items currently stored.

Getting items in a VBA Collection

To get a specific item in your VBA Collection you need to either specify it’s index or optionally it’s key (if it was defined).

Traversing a VBA Collection

As with VBA Arrays you can similarly traverse a VBA Collection using loops such as For, While etc. by indexing it’s items, or by traversing its items using the For Each loop (preferred).

Check if VBA Collection contains item

Unfortunately, the VBA Collection object does not facilitate a Contains function. We can however easily write a simple Function that will extend this functionality. Feel free to use the Function below:

Usage example:

Convert VBA Collection to VBA Array

In some case you may want to convert your VBA Collection to a VBA Array e.g. if you would want to paste items from your Collection to a VBA Range. The Code below does that exactly that:

Below a test of how it works:


The VBA Collection is an extremely useful data type in VBA. It is more versatile than the VBA Array allowing you to add and remove items more freely. The Key-Value store works also similarly as for the VBA Dictionary making it a useful alternative.

As an exercise – next time you will consider using a VBA Array consider replacing it with the Collection object. Hopefully it will guest in your VBA Code more often.

Data Analysis Excel Tools

1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 4.67 out of 5)

Excel is most appreciated for it’s ease of use as a Data Analysis Tool. I mean to explore the basics, as well as the more advanced Data Analysis Excel Tools. Be sure to read through the Other Tools section below for other honorable mentions.

Data Analysis Excel Tools
Sort & Filter PivotTable What If Analysis Data Analysis Toolpak Solver MS Query (SQL)

The DATA ribbon tab

As some of you already have noticed MOST (except for the PivotTable for no good reason) of the tools described in this post focus on features available BY DEFAULT within the Excel DATA ribbon tab (see image below). Some of them are AddIn that are installed by default along with MS Office, however, need to be activated manually. Skip to the relevant section of this post to read more.

excel data ribbon tab
Excel DATA ribbon tab (click to expand)
In this post I am focusing just on features that allow you to explore data or solve certain analytical problems. I am not focusing on simple data transformation features like removing duplicates, however, I do encourage that you learn more on these non-mentioned features.

Sort & Filter

Let’s kick off with the Sort & Filter group. See below for a quick look at how to use the Sort and Filter feature in Excel on an example data set:


PivotTables are so versatile a tool that most often no other Data Analysis Tool is needed. PivotTables are great for exploring data sets and for visualizing data with the help of automatically generated PivotCharts. See a simple PivotTable example below:

What If Analysis

The What If Analysis button in the DATA ribbon tab facilitates generally two useful features:

  • Scenario Manager – the scenario manager allows you to add several scenarios to a certain problem and then generate a summary comparing how the scenario values impact a selected cell value
  • Goal Seek
  • Data Tables – creating scenarios

Let’s explore these tools (apart from the last one) on a simple problem case example.
The problem:
what if the problem

Our example assumes there is a 500k$ Marketing Budget to spend on a Marketing Campaign which can consist of spending on Internet and/or TV ads. What is more, for the sake of the example let’s assume:

  • TV ads have a parabolic impact (see below) on Product demand i.e. at first the more you spend the more Product demand you generate, until you reach a peak where Product demand will actually fall (maybe viewers are tired of their favorite programs being bombarded by your ads).
  • Internet ads have a linear impact on Product demand i.e. the more you spend the more Product demand you generate

internet and tv spend
The Product demand cell formula is:


Scenario Manager

Using the example mentioned above let’s use the Scenario Manager to create 3 separate scenarios:

  1. 100% TV spend – 500k$ spent on TV ads
  2. 50% TV and Internet spend – 250k$ spent both on TV and Internet ads
  3. 100% Internet spend – 500k$ spent on Internet ads

Open the Scenario Manager

To add our scenarios go to the DATA ribbon tab, click What If Analysis and then select Scenario Manager....

Click Add to add a new Scenario

Specify the Scenario details – especially the range of Changing Cells that are to be specified by the scenario:

Scenario Manager: Adding a Scenario
Scenario Manager: Adding a Scenario

Next provide values as appropriate to your Scenario:
Scenario Manager: Setting the Scenario Values
Scenario Manager: Setting the Scenario Values

Repeat for each Scenario

You need to Name and specify the Changing Values for each Scenario separately.

Generate a Summary

Hit Summary... to generate a neat worksheet with a summary of your Scenarios. Specify the Result Cells – cells which will be modified in result to your Scenario. In my case it is just the Product demand cell:

Scenario Manager: Generate Summary
Scenario Manager: Generate Summary

Now admire the results:
Scenario Manager: Results
Scenario Manager: Results

Notice that in this example on purpose I created the formula as such so that it is not directly obvious (unless you work out the formula maximum) what is the best mix of TV and Internet ads. We will work through this example using different tools below.

Goal Seek

Goal Seek calculates the values necessary to achieve a specific goal. The “goal” is an amount returned by a formula. So in short – you should use Goal Seek if you have a function which characteristics are unknown or hard to figure out. Goal Seek is Solver’s younger (less developed) brother.

Some examples of when to use Goal Seek:

  • Solving single (changing) parameter mathematical functions (seeking a specific value e.g. 0)
  • Doing a quick check if a single parameter can help achieve a specific value for a formula (e.g. financial statements)

Using Goal Seek

Ok let’s learn how to apply Goal Seek. Let’s reuse our above example – let’s say we are aware that our TV ads function is a parabolic or a polynomial at least and we want to quickly locate some of it’s 0 values.

Our Product demand (just for TV ads) looks as so if we ignore the Internet ads:


Where C3 is Marketing Spend on TV ads.

Open Goal Seek

Go to the DATA ribbon tab, click What If Analysis and then select Goal Seek.

Provide the Goal Cell, Value and the Changing Cell

For our example the Goal is our Product demand, and our Changing Cell will be the Marketing Spend on TV ads:

Goal Seek: Setting up
Goal Seek: Setting up

Run Goal Seek and BEWARE!

This is the result I received. Notice that the value I provided into the TV ads cell was 0 to start with.

Goal Seek: Results
Goal Seek: Results

This time Goal Seek came up with the result: 0. Ok now let’s rerun this example. But this time I will type a higher starting value for TV ads spend… say 150k$. Do you think it will provide the same result? See below:
Goal Seek: Results with different starting point
Goal Seek: Results with different starting point

The computed result is different! This time Goal Seek came up with the result: 200. No why is that? Goal Seek is an iterative algorithm that in an iterative manner seeks to reach the Goal Value. However, if a function (cell formula) has several solutions expect that Goal Seek may land at different values.

This is very important to keep in mind!

Goal Seek in VBA

Not many know that you can use Goal Seek easily in VBA too! The parameters are as such:
Range.GoalSeek(Goal, ChangingCell)

  • Range – single cell with Goal function (formula)
  • Goal – the Goal Value to achieve
  • ChangingCell – single cell which will can be modified to achieve the Goal

MSDN provides a sexy example below:

Worksheets("Sheet1").Range("Polynomial").GoalSeek Goal:=15, ChangingCell:=Worksheets("Sheet1").Range("X")

Data Analysis Toolpak

Also know as Analysis Toolpak is a Microsoft AddIn delivered with Excel (usually disabled by default) that provides data analysis capabilities to Excel. Let’s start by enabling this AddIn:

Enabling the Analysis Toolpak

Find Add-Ins in Excel Options

To open the Window go to FILE, select Options and then pick AddIns.

Analysis Toolpak: Excel Options
Analysis Toolpak: Excel Options

Next from the above window hit Go to progress to Step 2.

Open the Add-Ins window and enable the Add-In

From the Window below select Analysis Toolpak and hit OK:

Analysis Toolpak: Enabling the Add-In
Analysis Toolpak: Enabling the Add-In

Using the Analysis Toolpak

To open the Analysis Toolpak window simply go to the DATA ribbon and you should find at your far right in a new group (Analysis) the Data Analysis button. Hit it to open the whole range of Data Analysis Excel tools.

Analysis Toolpak: Opening
Analysis Toolpak: Opening

Click on the button to see the whole list of different tools:
Analysis Toolpak Tools
Analysis Toolpak Tools

I won’t go through all these tools as they are pretty straightforward. Personally I tend to use only the Correlation feature which creates a nice correlation matrix out of a series of numerical columns.


Solver is a Microsoft AddIn to Excel that allows you to find optimal solutions for certain decision problems defined by a certain Objective (cell value). It’s like the older brother of the previously mentioned Goal Seek as it allows for more complicated computations and optimization making. Let’s start by enabling it in Excel.

Enabling the Solver AddIn

Find Add-Ins in Excel Options

To open the Window go to FILE, select Options and then pick AddIns.

Analysis Toolpak: Excel Options
Solver: Excel Options

Next from the above window hit Go to progress to Step 2.

Open the Add-Ins window and enable the Solver Add-In

From the Window below select Solver Add-In and hit OK:

Solver: Enabling the AddIn
Solver: Enabling the AddIn

Using the Solver AddIn

Now similarly as above let’s again consider our Problem from above.
what if the problem
As previously let’s exemplify this tool with our Marketing Budget example. Quick reminder – we have 500k$ to spend on TV and Internet ads and we have a formula that calculates our spend to the resulting Product demand.

Now we want to know what is the optimal value of our spend on TV and Internet ads to maximize Product demand. This is exactly what we can calculate with Solver:

To open Solver window simply go to the DATA ribbon and you should find at your far right in a new group (Analysis) the Solver button. Hit it to open the Solver wizard:


Solver requires that you:

  • Specify the Objective – set a single cell that measures your Objective. Next select whether you want to Maximize, Minimize or set this value by changing certain variables (specified later)
  • Specify the Changing Variables – specify the variables (cells) which should be modified by Solver to optimize your Objective function (formula)
  • Specify your Constraints – specify cells which should evaluate to certain constraints limiting the values of the Changing Variables

I used the settings above to run Solver on my Problem. See the results below:

Solver: Problem results
Solver: Problem results

The results are perfect (99 for TV ads and 401 for Internet ads)! Solver managed to find the optimal solution to my problem. Some of you probably thought the answer to be 100 and 400 :) – that was the hinted answer but not the optimal one! This is the power of Solver.

MS Query (SQL)

If you have been around my blog long enough you might have noticed my affection towards SQL in Excel manifested by my SQL Excel AddIn. I consider SQL a significant tool in the Data Analysis Excel toolbox. There are simply some things you won’t do easily with a PivotTable that can be easily written in a few SQL lines of code.

Let’s demonstrate a simple example of how to create a MS Query using the DATA ribbon tab using a simple source worksheet – Sheet1:
SQL Example data set

Open the MS Query window

Go to the DATA ribbon tab, click From Other Sources and then select From Microsoft Query.

Add MS Query (SQL)
Add MS Query (SQL)

Follow the MS Query window wizard to create your MS Query

This is the hard part. In my example I want to run a query against my working Workbook (Sheet1). See screens below:

Selecting the Data Source
Selecting the Data Source

Selecting the Excel file (can be your current Workbook)
Selecting the Excel file (can be your current Workbook)

Selecting the Worksheet and columns for your Query
Selecting the Worksheet and columns for your Query

Return data to Worksheet or Edit Query
Return data to Worksheet or Edit Query

Edit the Query (optional)
Edit the Query (optional)

Return Data to Worksheet (as Table/PivotTable/PivotChart)
Return Data to Worksheet (as Table/PivotTable/PivotChart)

Indeed that’s a lot of steps right? Hence, if you consider using MS Query (SQL) I strongly recommend my SQL Excel AddIn. It does the same in just 1 step. It does not add any new features to Excel but simply makes it so easy to create MS Queries in Excel – so you can just use it to create the queries using the AddIn. The AddIn is not required to use them.

Refresh or Edit your MS Query

This is the wonderful part – your MS Query is represented in Excel as a QueryTable. A queryable Excel Table that can be Refreshed or modified. See the options below available for QueryTables:

Refreshing/Editing the MS Query
Refreshing/Editing the MS Query

Summary: Data Analysis Excel

It is worth at least being aware of the wide range of Excel Data Analysis tools that are available by DEFAULT in Excel. Most Excel users are limited to the PivotTable and maybe Solver (2 common Excel questions asked for Analyst job interviews). However, Excel contains much more Data Analysis Excel tools than just those two.

I personally see much value in the MS Query (SQL) feature which has been so successfully hidden in Excel and made so hard to setup that it is simply neglected. Fortunately, there is a way around that with my SQL AddIn which I myself use on a regular basis. SQL however is a powerful tool (and language) and often allows you to create queries that can replace VBA macros and in most cases will run faster than VBA.

Working with XML files in VBA (VBA XML)

1 Star2 Stars3 Stars4 Stars5 Stars (13 votes, average: 4.92 out of 5)

XML files are one of the most common type of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (as XML files or JSON) is relatively easy. You can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy as the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary as opposed to tabular data which usually has a constant number of columns separated with a specific delimiter.

Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let’s however, as always, start with a short introduction as to how XML files a structure before we dive into the examples.

Loading XML document in VBA

The MSXML2.DOMDocument object allows you to easily traverse through an XML structure an extract any XML node and/or attribute needed. Let’s look at the example below.

Below we start by loading the XML document. Notice that I am selecting the load to be performed synchronously and not validation be carried out on parsing the document. Feel free to change these options if needed.

Alternatively load an XML from a string:

That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.

XML DOM nodes in VBA

For the below I will use the following examples XML:

The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:

Node Reference Type Description
parentNode [XDoc.DocumentElement] The parent node, one node higher in the DOM hierarchy
firstChild [XDoc.DocumentElement] The first child node, first node lower in the DOM hierarchy
lastChild [XDoc.DocumentElement] The last child node, last node lower in the DOM hierarchy
childNodes [Array of type XDoc.DocumentElement] All child nodes of the current node, all nodes lower in the DOM hierarchy
nextSibling [XDoc.DocumentElement] Next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node
previousSibling [XDoc.DocumentElement] Previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node

All the above references allow you to free move within the XML DOM.


Let’s start by extracting the first list and printing it’s XML and text contents. The basics to moving around the XML DOM is using ChildNodes.

This is the result

Traversing through the whole XML in VBA

Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).

This is the result:

Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.

XML Document example node references

Now that we have a hang of our XML document, based on the example XML I provided above I mapped a reference to how to obtain various elements of our XML file by using node references:

  • DistributionLists [FirstChild]
    • List [ChildNodes(0)]
      • Name: Recon [ChildNodes(0).ChildNodes(0).innerText]
      • TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).innerText]
      • CC: Jane;Ashley
      • BCC: Brent
    • (…)

    • List [ChildNodes(1)]
      • Name: Performance Report [ChildNodes(1).ChildNodes(0).innerText]
      • TO: Huck;Ashley
      • CC: Tom;Andrew
      • BCC: John;Seema

XPath in VBA

Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/NextChild properties we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. It is represented by a single string. It allows you to extract any number of nodes (0 or more) which match the specified XPath query.

If you want to learn XPath I can recommend this overview:

Now let’s jump into an example:

Example 1: Extract all Lists

Example 2: Extracting all TO fields

Example 3: Extracting the first and last Name field

Example 3: Extracting all child List nodes (Name, TO, CC, BCC)

XML Attributes in VBA

Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.

Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.

Creating XML documents

Creating documents is also quite straight forward in VBA.

Reverse engineering an Excel PivotTable / Flatten Excel PivotTable

1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 4.33 out of 5)

Today a quick tip that helped me out of a tight spot during one of my projects. The issue was that I received an Excel file with multiple PivotTables, but no source tables on which these PivotTables were built (these were in separate worksheets that were not shared). I needed the source data to produce my own custom complex reports (for which I use MS Queries instead of PivotTables), and I needed it now. Fortunately, PivotTables contain embedded source tables and there is a quick and easy way to recover Pivot Table data. Read on to learn how… maybe it will come useful.

How to reverse engineer a PivotTable

Let’s assume we have the following Excel PivotTable below.

Pivot Table Data
The PivotTable

Scroll to botton right cell of the Pivot

In order to recover the Pivot Table data associated with the PivotTable you need to scroll down to the right-most, bottom-most Grand Total value in the PivotTable.

Pivot Table data: Reverse engineering a PivotTable
Double Click on the Grand Total in the bottom right-most corner

Double click on the Pivot cell

Now simply double-click on the Grand Total and the source table will appear on a separate Excel Worksheet as show below:

Pivot Table data
The resulting source Pivot Table data

Easy right? Hope this helps you in similar situations – if yes, I appreciate a comment below!

Saving your VBA data (VBA data dump)

1 Star2 Stars3 Stars4 Stars5 Stars (5 votes, average: 4.00 out of 5)

Sometimes VBA algorithms take a long time to execute and you would want to be able to maintain those calculations in memory when you close and reopen you Excel Workbook or Access Database. Why not save your VBA project data and recover it when reopening your VBA project? Actually you can do just that…

Saving VBA data to binary files

save vba dataThe trick to saving your VBA workspace variables is dump them into a binary file. This is not as complex as it seems, and if you are reading this post you probably already have mastered reading/writing data to text files. Binary files as opposed to text files can store any data type(s), while text files can only contain strings (you might try dumping data as strings and then parsing them, but its not exactly a pleasant job).

If you stumbled across my post on Writing files in VBA (the proper way!) you might be aware of how simple saving data to a binary file is (in this case the “testVar” Integer variable):

Dim fileName As String, fileNo As Integer, testVar As Integer
testVar = 4
fileNo = FreeFile
Open fileName For Binary Lock Read Write As #fileNo
Put #fileNo, , testVar
Close #fileNo

Reading the variable back to your VBA project is equally simple:

Dim fileName As String, fileNo As Integer, testVar As Integer
fileNo = FreeFile
Open fileName For Binary Lock Read As #fileNo
Get #fileNo, , testVar 
Close #fileNo

I hope this demonstrates how easily you can dump your VBA project variables to a binary file and then recover them.

The proper way to dump and recover your VBA data

Now the example above will work fairly well if your variables are of constant byte-length. For example an Integer will require 2 bytes, a Double 8 bytes… No problems yet right? Well issues start when you start dumping Objects and/or Strings. Why? Strings for example are simply a sequence of Characters (bytes). Therefore the string “Hello” will require 5 bytes of space, while “Hello World” will use 11 bytes of space. If you think you can manage this by preceding your strings with an Integer header – you are right. But you will have much more issues with dynamic arrays, Classes and other objects which will require additional headers.

How to manage your VBA project variables then? Encapsulate them using the Type statement. When dumping Types to binary files, the object will include a header with all the information needed to recover the whole Type back to your VBA project. Making a second attempt at the code above, your final methods should look like this:

Type TestType
    intVar As Integer
    strVar As String
End Type
Sub SaveVBAVariables()
  Dim fileName As String, fileNo As Integer, testVar As TestType
  '...Some code here...
  testVar.intVar = 4
  testVar.strVar = "Hello!"
  'Now save all testVar variables to a binary file
  fileName = "C:test.bin"
  fileNo = FreeFile
  Open fileName For Binary Lock Read Write As #fileNo
  Put #fileNo, , testVar
  Close #fileNo
End Sub

Sub ReadVBAVariables()
    'Read the testVar variables back to your VBA project
    Dim fileName As String, fileNo As Integer, testVar As TestType
    fileName = "C:test.bin"
    fileNo = FreeFile
    Open fileName For Binary Lock Read As #fileNo
    Get #fileNo, , testVar
    'Recovered the TestType to testVar
    Close #fileNo
End Sub

The SaveVBAVariables procedure demonstrates how you can create and work with your variables, and
save your results to a binary file.
The ReadVBAVariables procedure can be executed even after you close and reopen the workbook – recovering your Test data type along with all enclosed variables.