XML files are one of the most common type of data files apart from text and CSV (comma-separated values) files. Reading data files which are not hierarchical (as XML files or JSON) is relatively easy. You can read in the data row by row and process columns separately. With XML (and JSON) the task is not as easy as the data is hierarchical (parent-child relationships exist between records in the schema) and the number of underlying nodes may vary as opposed to tabular data which usually has a constant number of columns separated with a specific delimiter.
Fortunately, we can use the MSXML2.DOMDocument object in VBA. Let’s however, as always, start with a short introduction as to how XML files a structure before we dive into the examples.
Loading XML document in VBA
The MSXML2.DOMDocument object allows you to easily traverse through an XML structure an extract any XML node and/or attribute needed. Let’s look at the example below.
Below we start by loading the XML document. Notice that I am selecting the load to be performed synchronously and not validation be carried out on parsing the document. Feel free to change these options if needed.
Sub TestXML() Dim XDoc As Object, root as Object Set XDoc = CreateObject("MSXML2.DOMDocument") XDoc.async = False: XDoc.validateOnParse = False XDoc.Load (ThisWorkbook.Path & "\test.xml") Set root = XDoc.DocumentElement '... End Sub
Alternatively load an XML from a string:
Sub TestXML() Dim XDoc As Object, root as Object Set XDoc = CreateObject("MSXML2.DOMDocument") XDoc.async = False: XDoc.validateOnParse = False XDoc.LoadXML ("<root><child></child></root>") Set root = XDoc.DocumentElement '... End Sub
That’s it. You have loaded the XML document into memory into the DOMDocument object. The document has been parsed and you can easily traverse the enclosed elements. See next section.
XML DOM nodes in VBA
For the below I will use the following examples XML:
<?xml version="1.0" encoding="utf-8"?> <DistributionLists> <List> <Name>Recon</Name> <TO>John;Bob;Rob;Chris</TO> <CC>Jane;Ashley</CC> <BCC>Brent</BCC> </List> <List> <Name>Safety Metrics</Name> <TO>Tom;Casper</TO> <CC>Ashley</CC> <BCC>John</BCC> </List> <List> <Name>Performance Report</Name> <TO>Huck;Ashley</TO> <CC>Tom;Andrew</CC> <BCC>John;Seema</BCC> </List> </DistributionLists>
The XML document will provide you with the root of the entire DOM (of type XDoc.DocumentElement). Each DocumentElement (XML DOM node) facilitates the following node references:
Node Reference | Type | Description |
---|---|---|
parentNode | [XDoc.DocumentElement] | The parent node, one node higher in the DOM hierarchy |
firstChild | [XDoc.DocumentElement] | The first child node, first node lower in the DOM hierarchy |
lastChild | [XDoc.DocumentElement] | The last child node, last node lower in the DOM hierarchy |
childNodes | [Array of type XDoc.DocumentElement] | All child nodes of the current node, all nodes lower in the DOM hierarchy |
nextSibling | [XDoc.DocumentElement] | Next sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node |
previousSibling | [XDoc.DocumentElement] | Previous sibling node i.e. node on the same level in the DOM hierarchy, having the same parent node |
All the above references allow you to free move within the XML DOM.
ChildNodes
Let’s start by extracting the first list and printing it’s XML and text contents. The basics to moving around the XML DOM is using ChildNodes.
Sub TestXML() Dim XDoc As Object Set XDoc = CreateObject("MSXML2.DOMDocument") XDoc.async = False: XDoc.validateOnParse = False XDoc.Load (ThisWorkbook.Path & "\test.xml") 'Get Document Elements Set lists = XDoc.DocumentElement 'Get first child ( same as ChildNodes(0) ) Set getFirstChild = lists.FirstChild 'Print first child XML Debug.Print getFirstChild.XML 'Print first child Text Debug.Print getFirstChild.Text Set XDoc = Nothing End Sub
This is the result
'Print first child XML <List> <Name>Recon</Name> <TO>John;Bob;Rob;Chris</TO> <CC>Jane;Ashley</CC> <BCC>Brent</BCC> </List> 'Print first child Text Recon John;Bob;Rob;Chris Jane;Ashley Brent
Traversing through the whole XML in VBA
Now that we got the basics let’s print out the whole contents of the XML DOM including the basenames (node names).
Sub TestXML() Dim XDoc As Object Set XDoc = CreateObject("MSXML2.DOMDocument") XDoc.async = False: XDoc.validateOnParse = False XDoc.Load (ThisWorkbook.Path & "\test.xml") 'Get Document Elements Set lists = XDoc.DocumentElement 'Traverse all elements 2 branches deep For Each listNode In lists.ChildNodes Debug.Print "---Email---" For Each fieldNode In listNode.ChildNodes Debug.Print "[" & fieldNode.BaseName & "] = [" & fieldNode.Text & "]" Next fieldNode Next listNode Set XDoc = Nothing End Sub
This is the result:
---Email--- [Name] = [Recon] [TO] = [John;Bob;Rob;Chris] [CC] = [Jane;Ashley] [BCC] = [Brent] ---Email--- [Name] = [Safety Metrics] [TO] = [Tom;Casper] [CC] = [Ashley] [BCC] = [John] ---Email--- [Name] = [Performance Report] [TO] = [Huck;Ashley] [CC] = [Tom;Andrew] [BCC] = [John;Seema]
Easy right? Using the basics above we can easily move around the document. But this still seems like a lot of coding right? Well there is an easier way of moving / extracting items using the DOMDocument object – called XPath.
XML Document example node references
Now that we have a hang of our XML document, based on the example XML I provided above I mapped a reference to how to obtain various elements of our XML file by using node references:
- DistributionLists [FirstChild]
- List [ChildNodes(0)]
- Name: Recon [ChildNodes(0).ChildNodes(0).innerText]
- TO: John;Bob;Rob;Chris [ChildNodes(0).ChildNodes(1).innerText]
- CC: Jane;Ashley
- BCC: Brent
- List [ChildNodes(1)]
- Name: Performance Report [ChildNodes(1).ChildNodes(0).innerText]
- TO: Huck;Ashley
- CC: Tom;Andrew
- BCC: John;Seema
(…)
- List [ChildNodes(0)]
XPath in VBA
Instead of traversing the elements/nodes in your XML using the .ChildNodes/.FirstChild/NextChild properties we can also use XPath. XPath is a query language used for selecting XML nodes in an XML document. It is represented by a single string. It allows you to extract any number of nodes (0 or more) which match the specified XPath query.
If you want to learn XPath I can recommend this overview:
https://www.w3schools.com/xml/xpath_syntax.asp
Now let’s jump into an example:
Example 1: Extract all Lists
Sub TestXML() Dim XDoc As Object Set XDoc = CreateObject("MSXML2.DOMDocument") XDoc.async = False: XDoc.validateOnParse = False XDoc.Load (ThisWorkbook.Path & "\test.xml") Set lists = XDoc.SelectNodes("//DistributionLists/List") Set XDoc = Nothing End Sub
Example 2: Extracting all TO fields
Set toFields = XDoc.SelectNodes("//DistributionLists/List/TO") End Sub
Example 3: Extracting the first and last Name field
Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name") Set lastNameField = XDoc.SelectNodes("//DistributionLists/List[2]/Name")
Example 3: Extracting all child List nodes (Name, TO, CC, BCC)
Set listChildrenField = XDoc.SelectNodes("//DistributionLists/List/*")
XML Attributes in VBA
Let’s tackle one last example – attributes. Let’s slightly modify the XML above and include an example attribute named attribute.
<?xml version="1.0" encoding="utf-8"?> <DistributionLists> <List> <Name attribute="some">Recon</Name>
Using XPath (or traversing the DOM) we can easily extract the attribute as shown below.
Set firstNameField = XDoc.SelectNodes("//DistributionLists/List[0]/Name") Debug.Print firstNameField(0).Attributes(0).Text 'Result: "some"
Creating XML documents
Creating documents is also quite straight forward in VBA.
Dim XDoc As Object, root As Object, elem As Object Set XDoc = CreateObject("MSXML2.DOMDocument") Set root = XDoc.createElement("Root") XDoc.appendChild root 'Add child to root Set elem = XDoc.createElement("Child") root.appendChild elem 'Add Attribute to the child Dim rel As Object Set rel = XDoc.createAttribute("Attrib") rel.NodeValue = "Attrib value" elem.setAttributeNode rel 'Save the XML file XDoc.Save "C:\my_file.xml"