Tag Archives: xml

Working with XML files in VBA (VBA XML)

1 Star2 Stars3 Stars4 Stars5 Stars (13 votes, average: 4.92 out of 5)
Loading...

XML files are one of the most common type of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (as XML files or JSON) is relatively easy. You can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy as the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary as opposed to tabular data which usually has a constant number of columns separated with a specific delimiter.

Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let’s however, as always, start with a short introduction as to how XML files a structure before we dive into the examples.

Loading XML document in VBA

The MSXML2.DOMDocument object allows you to easily traverse through an XML structure an extract any XML node and/or attribute needed. Let’s look at the example below.

Below we start by loading the XML document. Notice that I am selecting the load to be performed synchronously and not validation be carried out on parsing the document. Feel free to change these options if needed.

Alternatively load an XML from a string:

That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.

XML DOM nodes in VBA

For the below I will use the following examples XML:

The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:

Node Reference Type Description
parentNode [XDoc.DocumentElement] The parent node, one node higher in the DOM hierarchy
firstChild [XDoc.DocumentElement] The first child node, first node lower in the DOM hierarchy
lastChild [XDoc.DocumentElement] The last child node, last node lower in the DOM hierarchy
childNodes [Array of type XDoc.DocumentElement] All child nodes of the current node, all nodes lower in the DOM hierarchy
nextSibling [XDoc.DocumentElement] Next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node
previousSibling [XDoc.DocumentElement] Previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node

All the above references allow you to free move within the XML DOM.

ChildNodes

Let’s start by extracting the first list and printing it’s XML and text contents. The basics to moving around the XML DOM is using ChildNodes.

This is the result

Traversing through the whole XML in VBA

Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).

This is the result:

Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.

XML Document example node references

Now that we have a hang of our XML document, based on the example XML I provided above I mapped a reference to how to obtain various elements of our XML file by using node references:

  • DistributionLists [FirstChild]
    • List [ChildNodes(0)]
      • Name: Recon [ChildNodes(0).ChildNodes(0).innerText]
      • TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).innerText]
      • CC: Jane;Ashley
      • BCC: Brent
    • (…)

    • List [ChildNodes(1)]
      • Name: Performance Report [ChildNodes(1).ChildNodes(0).innerText]
      • TO: Huck;Ashley
      • CC: Tom;Andrew
      • BCC: John;Seema

XPath in VBA

Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/NextChild properties we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. It is represented by a single string. It allows you to extract any number of nodes (0 or more) which match the specified XPath query.

If you want to learn XPath I can recommend this overview:
https://www.w3schools.com/xml/xpath_syntax.asp

Now let’s jump into an example:

Example 1: Extract all Lists

Example 2: Extracting all TO fields

Example 3: Extracting the first and last Name field

Example 3: Extracting all child List nodes (Name, TO, CC, BCC)

XML Attributes in VBA

Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.

Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.

Creating XML documents

Creating documents is also quite straight forward in VBA.

XLSB vs XLSX. The Pros and Cons of XLSB Files

1 Star2 Stars3 Stars4 Stars5 Stars (8 votes, average: 4.63 out of 5)
Loading...

An XLSB file is a Microsoft Office Excel Binary file. Information inside XLSB file is stored as binary data compared, while in usual XSLX files data is stored as XML files which are then compressed.

Working with large Excel files is often a drag. They open slower, they take an eternity to save and they often need to be uploaded to the Cloud to be shared with your coworkers or family. Why not explore the benefits of the XSLB file format then (sometimes mixed up with XLSB)?

What is an XLSB file format?

This is what is inside an XLSX (or XLSM) file (image on the right). What do you mean inside? – you ask.

Inside an XLSX file
Inside an XLSX file

XLSX and XLSM files are in fact compressed archives with XML files inside. That is because Microsoft has opened the Excel file format and decided to break the insides into XML files. When an XLSX or XLSM file is saved Excel needs to break it down into separate XML files, compress it and finally save it as XLSX or XLSM.

The XLSB file format on the other hand is a binary Excel file. It resembles the old XLS file format which was also a binary file.
What happens when Excel wants to save an XSLB file? Not much really. It mostly dumps the binary data into a binary file and saves it as XLSB.

XLSB advantages (XLSB vs. XLSX)

  • XLSB files are smallerSmaller file size – the Excel binary file uses noticeably less space. This is more noticeable especially when working with large Excel files. In some cases I heard of there being XLSB files that required 10% of the original file size – this is more visible with VERY large files
  • XLSB files save or open fasterOpens/saves more quickly – loading binary data is faster than parsing text (XML) files – similarly as you would compare opening a book in Spanish and having to translate every sentence to English as opposed to picking up a ready translated copy. Similarly, when saving the file – the binary format is more efficient than dumping the data back into the XML and then compressing it. From my experience XLSB files open and get saved 2x faster
  • XLSB files support longer formulasSupports formulas above the 8192 character limit. In other file formats they don’t save properly

Yahoo! Well it would seem there is nothing less obvious to do then to start working only on the XLSB file format. However, it’s not a straightforward decision as there are some minor setbacks. Here are some that come to mind…

XLSB disadvantages (XLSB vs. XLSX)

  • Compatibility – the XLSB Excel format is not supported by Excel 2003 and earlier versions, which frankly is less of a problem nowadays
  • Security (VBA) – with the distinction between the XLSM and XLSX format you know which files may or not contain VBA macros. With XLSB you won’t know for sure. So beware when opening XLSB files from unknown sources or from people/websites you don’t trust
  • You can’t make changes to the Excel Ribbon when working on an XLSB. You must temporarily save your file as XLSX or XLSM, makes changes and save back as XLSB.
  • Lack of interoperability with third-party tools. XLSB is a binary file format unlike the open XML XLSX and XLSM files. Hence you often won’t see your XLSB files working everywhere – like in OpenOffice

Other suggestions when working with large Excel files or datasets

I usually start with the quick wins suggestions and leave the more complex for desert. The XLSB file format is a good start and often won’t require that you meddle with the data/formatting/content of your workbook, rightfully as you shouldn’t have too. However, from time to time there will be those moments when that won’t be enough and you just won’t be able to work any longer with a slow and large Excel file. Here are some useful tips:

  • Reduce the file size by deleting unused cells – as stupid as it sounds this is often the reason for your Excel files mysteriously growing in size over a short period of time. Here is the solution:
    1. Find the last used row in your worksheets

    2. Delete all rows below

    3. Find the last used column in your worksheets

    4. Delete all columns forward

    5. Save the file and close Excel

    6. Reopen the file

  • Save data files without formatting – formatting may account for a lot of storage space and if you are simply working with a dataset and don’t need formatting save your file in .xml format
  • Save data files without formatting – formatting may account for a lot of storage space and if you are simply working with a dataset and don’t need formatting save your file in .xml format or as a .csv
  • Turn automatic calculations off – often even not so large Excel files cause Excel to freeze or crash. That may be because of an abundance of Excel formulas having to recalculate each time a change is made etc. Try to turn Calculation Options to Manual to get rid of this problem
  • PowerQuery Add-In – Excel has its limitations when working with large datasets. The Excel PowerQuery Add-In by Microsoft has been designed to handle Big Data and complex data queries to external databases or datasources. PowerQuery will allow you to work more efficiently with these data sets and will not limit the amount of records you can handle within a single Excel Worksheet

Common myths about XLSB

Do XLSB runs formulas faster than XLSX?

  • Not true

XLSB file are only loaded and unloaded faster (saved and closed) than XLSX files. Afterwards both formats run in RAM memory with similar performance on the same Excel engine. Hence, you won’t see your Excel formula’s running significantly faster. However, if you do save your file frequently you will definitely notice that the XLSB file format saves faster.

Do XLSB files crash more often?

  • Not true

The XLSB file format does not in any way increase the probability of crashing. However, if the file does crash it may be harder to recover. XLSB are binary files, where XLSX and XLSM files are in fact compressed XML files – text files in XML format. Therefore, in a critical situation you have definitely a better chance of reading a text file than a binary file. Then again I wouldn’t worry about this too much.

Conclusions

On a daily basis I would recommend sticking to XLSX and XSLM as standard Excel file formats. It is worth reaching out to the XLSB file format whenever you file starts running slow or uses an enormous amount of space.

XLSB will actually not benefit small Excel files and you might even see small XLSB files taking more space than small XLSX/XSLM files. Your clients / coworkers may also have doubts when opening XLSB files as Excel treats these file formats with an extra dose of caution.

Use the XLSB file format MAINLY with very LARGE Excel files.