Tag Archives: xml

VBA XML

Working with XML files in VBA (VBA XML)

1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 5.00 out of 5)
Loading...

XML files are one of the most common type of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (as XML files or JSON) is relatively easy. You can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy as the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary as opposed to tabular data which usually has a constant number of columns separated with a specific delimiter.

Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let’s however, as always, start with a short introduction as to how XML files a structure before we dive into the examples.

Loading XML document in VBA

The MSXML2.DOMDocument object allows you to easily traverse through an XML structure an extract any XML node and/or attribute needed. Let’s look at the example below.

Below we start by loading the XML document. Notice that I am selecting the load to be performed synchronously and not validation be carried out on parsing the document. Feel free to change these options if needed.

Sub TestXML()
    Dim XDoc As Object, root as Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    Set root = XDoc.DocumentElement
    '... 
End Sub

Alternatively load an XML from a string:

Sub TestXML()
    Dim XDoc As Object, root as Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.LoadXML ("<root><child></child></root>")
    Set root = XDoc.DocumentElement
    '... 
End Sub

That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.

XML DOM nodes in VBA

The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:

  • parentNode [XDoc.DocumentElement] – the parent node, one node higher in the DOM hierarchy
  • firstChild [XDoc.DocumentElement] – the first child node, first node lower in the DOM hierarchy
  • lastChild [XDoc.DocumentElement] – the last child node, last node lower in the DOM hierarchy
  • childNodes [Array of type XDoc.DocumentElement] – all child nodes of the current node, all nodes lower in the DOM hierarchy
  • nextSibling [XDoc.DocumentElement] – next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node
  • previousSibling [XDoc.DocumentElement]– previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node

All the above references allow you to free move within the XML DOM.

ChildNodes

Let’s start by extracting the first list and printing it’s XML and text contents. The basics to moving around the XML DOM is using ChildNodes.

Sub TestXML()
    Dim XDoc As Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    
    'Get Document Elements
    Set lists = XDoc.DocumentElement
    
    'Get first child ( same as ChildNodes(0) )
    Set getFirstChild = lists.FirstChild
    'Print first child XML
    Debug.Print getFirstChild.XML
    'Print first child Text
    Debug.Print getFirstChild.Text

    Set XDoc = Nothing
End Sub

This is the result

'Print first child XML
<List>
    <Name>Recon</Name>
    <TO>John;Bob;Rob;Chris</TO>
    <CC>Jane;Ashley</CC>
    <BCC>Brent</BCC>
</List>
'Print first child Text
Recon John;Bob;Rob;Chris Jane;Ashley Brent

Traversing through the whole XML in VBA

Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).

Sub TestXML()
    Dim XDoc As Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    
    'Get Document Elements
    Set lists = XDoc.DocumentElement
    
    'Traverse all elements 2 branches deep
    For Each listNode In lists.ChildNodes
        Debug.Print "---Email---"
        For Each fieldNode In listNode.ChildNodes
            Debug.Print "[" & fieldNode.BaseName & "] = [" & fieldNode.Text & "]"
        Next fieldNode
    Next listNode
    
    Set XDoc = Nothing
End Sub

This is the result:

---Email---
[Name] = [Recon]
[TO] = [John;Bob;Rob;Chris]
[CC] = [Jane;Ashley]
[BCC] = [Brent]
---Email---
[Name] = [Safety Metrics]
[TO] = [Tom;Casper]
[CC] = [Ashley]
[BCC] = [John]
---Email---
[Name] = [Performance Report]
[TO] = [Huck;Ashley]
[CC] = [Tom;Andrew]
[BCC] = [John;Seema]

Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.

XML Document example node references

Now that we have a hang of our XML document. Let’s work on a simple example. Below an XML document assigned with references to each node/nodes:

Example XML Document

<?xml version="1.0" encoding="utf-8"?>
<DistributionLists>
    <List>
        <Name>Recon</Name>
        <TO>John;Bob;Rob;Chris</TO>
        <CC>Jane;Ashley</CC>
        <BCC>Brent</BCC>
    </List>
    <List>
        <Name>Safety Metrics</Name>
        <TO>Tom;Casper</TO>
        <CC>Ashley</CC>
        <BCC>John</BCC>
    </List>
    <List>
        <Name>Performance Report</Name>
        <TO>Huck;Ashley</TO>
        <CC>Tom;Andrew</CC>
        <BCC>John;Seema</BCC>
    </List>
</DistributionLists>

 
 
 

XML Document nodes reference examples

  • DistributionLists [FirstChild]
    • List [ChildNodes(0)]
      • Name: Recon [ChildNodes(0).ChildNodes(0).innerText]
      • TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).innerText]
      • CC: Jane;Ashley
      • BCC: Brent
    • List
      • Name: Safety Metrics
      • TO: Tom;Casper
      • CC: Ashley
      • BCC: John
    • List [ChildNodes(1)]
      • Name: Performance Report [ChildNodes(1).ChildNodes(0).innerText]
      • TO: Huck;Ashley
      • CC: Tom;Andrew
      • BCC: John;Seema

 
 
 

XPath in VBA

Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/NextChild properties we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. It is represented by a single string. It allows you to extract any number of nodes (0 or more) which match the specified XPath query.

If you want to learn XPath I can recommend this overview:
http://www.w3schools.com/xpath/xpath_syntax.asp

Now let’s jump into an example:

Example 1: Extract all Lists

Sub TestXML()
    Dim XDoc As Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    
    Set lists = XDoc.SelectNodes("//DistributionLists/List")

    Set XDoc = Nothing
End Sub

Example 2: Extracting all TO fields

Set toFields = XDoc.SelectNodes("//DistributionLists/List/TO")
End Sub

Example 3: Extracting the first and last Name field

Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name")

Set lastNameField = XDoc.SelectNodes("//DistributionLists/List[2]/Name")

Example 3: Extracting all child List nodes (Name, TO, CC, BCC)

Set listChildrenField = XDoc.SelectNodes("//DistributionLists/List/*")

XML Attributes in VBA

Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.

<?xml version="1.0" encoding="utf-8"?>
<DistributionLists>
    <List>
        <Name attribute="some">Recon</Name>

Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.

Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name")
Debug.Print firstNameField(0).Attributes(0).Text
'Result: "some"

Creating XML documents

Creating documents is also quite straight forward in VBA.

Dim XDoc As Object, root As Object, elem As Object
Set XDoc = CreateObject("MSXML2.DOMDocument")
Set root = XDoc.createElement("Root")
XDoc.appendChild root
 
'Add child to root
Set elem = XDoc.createElement("Child")
root.appendChild elem
    
'Add Attribute to the child
Dim rel As Object
Set rel = XDoc.createAttribute("Attrib")
rel.NodeValue = "Attrib value"
elem.setAttributeNode rel
    
'Save the XML file
XDoc.Save "C:\my_file.xml"
xlsb vs xlsx

XLSB vs XLSX. The Pros and Cons of XLSB Files

1 Star2 Stars3 Stars4 Stars5 Stars (5 votes, average: 4.40 out of 5)
Loading...

Working with large Excel files is often a drag. They open slower, they take an eternity to save and they often need to be uploaded to the Cloud to be shared with your coworkers or family. Why not explore the benefits of the XSLB file format then?

What is a XLSB file format?

Inside an XLSX file
Inside an XLSX file
This is what is inside an XLSX (or XLSM) file (image on the right). What do you mean inside? – you ask.

XLSX and XLSM files are in fact compressed archives with XML files inside. That is because Microsoft has opened the Excel file format and decided to break the insides into XML files. When an XLSX or XLSM file is saved Excel needs to break it down into separate XML files, compress it and finally save it as XLSX or XLSM.

large excelThe XLSB file format on the other hand is a binary Excel file. It resembles the old XLS file format which was also a binary file.
What happens when when Excel wants to save an XSLB file? Not much really. It mostly dumps the binary data into a binary file and saves it as XLSB.

XLSB advantages (XLSB vs. XLSX)

  • XLSB files are smallerSmaller file size – the Excel binary file uses noticeably less space. This is more noticeable especially when working with large Excel files. In some cases I heard of there being XLSB files that required 10% of the original file size – this is more visible with VERY large files
  • XLSB files save or open fasterOpens/saves more quickly – loading binary data is faster than parsing text (XML) files – similarly as you would compare opening a book in Spanish and having to translate every sentence to English as opposed to picking up a ready translated copy. Similarly, when saving the file – the binary format is more efficient than dumping the data back into the XML and then compressing it. From my experience XLSB files open and get saved 2x faster
  • XLSB files support longer formulasSupports formulas above the 8192 character limit. In other file formats they don’t save properly

Yahoo! Well it would seem there is nothing less obvious to do then to start working only on the XLSB file format. However, it’s not a straightforward decision as there are some minor setbacks. Here are some that come to mind…

XLSB disadvantages (XLSB vs. XLSX)

  • Compatibility – the XLSB Excel format is not supported by Excel 2003 and earlier versions, which frankly is less of a problem nowadays
  • Security (VBA) – with the distinction between the XLSM and XLSX format you know which files may or not contain VBA macros. With XLSB you won’t know for sure. So beware when opening XLSB files from unknown sources or from people/websites you don’t trust
  • You can’t make changes to the Excel Ribbon when working on an XLSB. You must temporarily save your file as XLSX or XLSM, makes changes and save back as XLSB.
  • Lack of interoperability with third-party tools. XLSB is a binary file format unlike the open XML XLSX and XLSM files. Hence you often won’t see your XLSB files working everywhere – like in OpenOffice

Other suggestions when working with large Excel files or datasets

I usually start with the quick wins suggestions and leave the more complex for desert. The XLSB file format is a good start and often won’t require that you meddle with the data/formatting/content of your workbook, rightfully as you shouldn’t have too. However, from time to time there will be those moments when that won’t be enough and you just won’t be able to work any longer with a slow and large Excel file. Here are some useful tips:

  • Reduce the file size by deleting unused cells – as stupid as it sounds this is often the reason for your Excel files mysteriously growing in size over a short period of time. Here is the solution:
    1. Find the last used row in your worksheets

    2. Delete all rows below

    3. Find the last used column in your worksheets

    4. Delete all columns forward

    5. Save the file and close Excel

    6. Reopen the file

  • Save data files without formatting – formatting may account for a lot of storage space and if you are simply working with a dataset and don’t need formatting save your file in .xml format
  • Save data files without formatting – formatting may account for a lot of storage space and if you are simply working with a dataset and don’t need formatting save your file in .xml format or as a .csv
  • Turn automatic calculations off – often even not so large Excel files cause Excel to freeze or crash. That may be because of an abundance of Excel formulas having to recalculate each time a change is made etc. Try to turn Calculation Options to Manual to get rid of this problem
  • PowerQuery Add-In – Excel has its limitations when working with large datasets. The Excel PowerQuery Add-In by Microsoft has been designed to handle Big Data and complex data queries to external databases or datasources. PowerQuery will allow you to work more efficiently with these data sets and will not limit the amount of records you can handle within a single Excel Worksheet

Common myths about XLSB

Do XLSB runs formulas faster than XLSX?

  • Not true

XLSB file are only loaded and unloaded faster (saved and closed) than XLSX files. Afterwards both formats run in RAM memory with similar performance on the same Excel engine. Hence, you won’t see your Excel formula’s running significantly faster. However, if you do save your file frequently you will definitely notice that the XLSB file format saves faster.

Do XLSB files crash more often?

  • Not true

The XLSB file format does not in any way increase the probability of crashing. However, if the file does crash it may be harder to recover. XLSB are binary files, where XLSX and XLSM files are in fact compressed XML files – text files in XML format. Therefore, in a critical situation you have definitely a better chance of reading a text file than a binary file. Then again I wouldn’t worry about this too much.

Conclusions

On a daily basis I would recommend sticking to XLSX and XSLM as standard Excel file formats. It is worth reaching out to the XLSB file format whenever you file starts running slow or uses an enormous amount of space.

XLSB will actually not benefit small Excel files and you might even see small XLSB files taking more space than small XLSX/XSLM files. Your clients / coworkers may also have doubts when opening XLSB files as Excel treats these file formats with an extra dose of caution.

Use the XLSB file format MAINLY with very LARGE Excel files.