VBA XML

Working with XML files in VBA (VBA XML)

XML files are one of the most common type of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (as XML files or JSON) is relatively easy. You can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy as the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary as opposed to tabular data which usually has a constant number of columns separated with a specific delimiter.

Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let’s however, as always, start with a short introduction as to how XML files a structure before we dive into the examples.

Loading XML document in VBA

The MSXML2.DOMDocument object allows you to easily traverse through an XML structure an extract any XML node and/or attribute needed. Let’s look at the example below.

Below we start by loading the XML document. Notice that I am selecting the load to be performed synchronously and not validation be carried out on parsing the document. Feel free to change these options if needed.

Sub TestXML()
    Dim XDoc As Object, root as Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    Set root = XDoc.DocumentElement
    '... 
End Sub

Alternatively load an XML from a string:

Sub TestXML()
    Dim XDoc As Object, root as Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.LoadXML ("<root><child></child></root>")
    Set root = XDoc.DocumentElement
    '... 
End Sub

That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.

XML DOM nodes in VBA

For the below I will use the following examples XML:

<?xml version="1.0" encoding="utf-8"?>
<DistributionLists>
    <List>
        <Name>Recon</Name>
        <TO>John;Bob;Rob;Chris</TO>
        <CC>Jane;Ashley</CC>
        <BCC>Brent</BCC>
    </List>
    <List>
        <Name>Safety Metrics</Name>
        <TO>Tom;Casper</TO>
        <CC>Ashley</CC>
        <BCC>John</BCC>
    </List>
    <List>
        <Name>Performance Report</Name>
        <TO>Huck;Ashley</TO>
        <CC>Tom;Andrew</CC>
        <BCC>John;Seema</BCC>
    </List>
</DistributionLists>

The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:

Node Reference Type Description
parentNode [XDoc.DocumentElement] The parent node, one node higher in the DOM hierarchy
firstChild [XDoc.DocumentElement] The first child node, first node lower in the DOM hierarchy
lastChild [XDoc.DocumentElement] The last child node, last node lower in the DOM hierarchy
childNodes [Array of type XDoc.DocumentElement] All child nodes of the current node, all nodes lower in the DOM hierarchy
nextSibling [XDoc.DocumentElement] Next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node
previousSibling [XDoc.DocumentElement] Previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node

All the above references allow you to free move within the XML DOM.

ChildNodes

Let’s start by extracting the first list and printing it’s XML and text contents. The basics to moving around the XML DOM is using ChildNodes.

Sub TestXML()
    Dim XDoc As Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    
    'Get Document Elements
    Set lists = XDoc.DocumentElement
    
    'Get first child ( same as ChildNodes(0) )
    Set getFirstChild = lists.FirstChild
    'Print first child XML
    Debug.Print getFirstChild.XML
    'Print first child Text
    Debug.Print getFirstChild.Text

    Set XDoc = Nothing
End Sub

This is the result

'Print first child XML
<List>
    <Name>Recon</Name>
    <TO>John;Bob;Rob;Chris</TO>
    <CC>Jane;Ashley</CC>
    <BCC>Brent</BCC>
</List>
'Print first child Text
Recon John;Bob;Rob;Chris Jane;Ashley Brent

Traversing through the whole XML in VBA

Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).

Sub TestXML()
    Dim XDoc As Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    
    'Get Document Elements
    Set lists = XDoc.DocumentElement
    
    'Traverse all elements 2 branches deep
    For Each listNode In lists.ChildNodes
        Debug.Print "---Email---"
        For Each fieldNode In listNode.ChildNodes
            Debug.Print "[" & fieldNode.BaseName & "] = [" & fieldNode.Text & "]"
        Next fieldNode
    Next listNode
    
    Set XDoc = Nothing
End Sub

This is the result:

---Email---
[Name] = [Recon]
[TO] = [John;Bob;Rob;Chris]
[CC] = [Jane;Ashley]
[BCC] = [Brent]
---Email---
[Name] = [Safety Metrics]
[TO] = [Tom;Casper]
[CC] = [Ashley]
[BCC] = [John]
---Email---
[Name] = [Performance Report]
[TO] = [Huck;Ashley]
[CC] = [Tom;Andrew]
[BCC] = [John;Seema]

Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.

See also  The VBA Type - VBA Custom Type alternative to Class

XML Document example node references

Now that we have a hang of our XML document, based on the example XML I provided above I mapped a reference to how to obtain various elements of our XML file by using node references:

  • DistributionLists [FirstChild]
    • List [ChildNodes(0)]
      • Name: Recon [ChildNodes(0).ChildNodes(0).innerText]
      • TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).innerText]
      • CC: Jane;Ashley
      • BCC: Brent
    • (…)

    • List [ChildNodes(1)]
      • Name: Performance Report [ChildNodes(1).ChildNodes(0).innerText]
      • TO: Huck;Ashley
      • CC: Tom;Andrew
      • BCC: John;Seema

XPath in VBA

Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/NextChild properties we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. It is represented by a single string. It allows you to extract any number of nodes (0 or more) which match the specified XPath query.

If you want to learn XPath I can recommend this overview:
https://www.w3schools.com/xml/xpath_syntax.asp

Now let’s jump into an example:

Example 1: Extract all Lists

Sub TestXML()
    Dim XDoc As Object
    
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False: XDoc.validateOnParse = False
    XDoc.Load (ThisWorkbook.Path & "\test.xml")
    
    Set lists = XDoc.SelectNodes("//DistributionLists/List")

    Set XDoc = Nothing
End Sub

Example 2: Extracting all TO fields

Set toFields = XDoc.SelectNodes("//DistributionLists/List/TO")
End Sub

Example 3: Extracting the first and last Name field

Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name")

Set lastNameField = XDoc.SelectNodes("//DistributionLists/List[2]/Name")

Example 3: Extracting all child List nodes (Name, TO, CC, BCC)

Set listChildrenField = XDoc.SelectNodes("//DistributionLists/List/*")

XML Attributes in VBA

Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.

<?xml version="1.0" encoding="utf-8"?>
<DistributionLists>
    <List>
        <Name attribute="some">Recon</Name>

Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.

Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name")
Debug.Print firstNameField(0).Attributes(0).Text
'Result: "some"

Creating XML documents

Creating documents is also quite straight forward in VBA.

Dim XDoc As Object, root As Object, elem As Object
Set XDoc = CreateObject("MSXML2.DOMDocument")
Set root = XDoc.createElement("Root")
XDoc.appendChild root
 
'Add child to root
Set elem = XDoc.createElement("Child")
root.appendChild elem
    
'Add Attribute to the child
Dim rel As Object
Set rel = XDoc.createAttribute("Attrib")
rel.NodeValue = "Attrib value"
elem.setAttributeNode rel
    
'Save the XML file
XDoc.Save "C:\my_file.xml"

Comments are closed.