The Excel WEBSERVICE and Excel FILTERXML Worksheet functions can be used to pull Internet data from a URL into a cell in a Excel spreadsheet. While the first pulls raw HTML data the second allows you to filter XML formats.
Excel up till now has been mostly an offline application. Although, you can use VBA, PowerQuery or other similar data tools in Excel to gain access to Internet and Online data sets these could not have been easily used with the help of Third Party AddIns such as my Scrape HTML AddIn. With the introduction of the WEBSERVICE function we gain immediate and easy access to any REST WebAPI.
In today’s post I will show you several examples of how to use these functions in practice. And how to extract data from an XML REST WebApi.
How to use the Excel WEBSERVICE function
The WEBSERVICE function downloads the HTTP response of the provided URL.
Try running the Excel WEBSERVICE function on Google:
=WEBSERVICE("https://www.google.com")
What do you get? If all goes well you should get the HTML response for the Google Web Page.
Now past the above formula to cell A1 and the below to cell A2:
=MID(A1;FIND("<title>";A1)+LEN("<title>");FIND("</title>";A1)-FIND("<title>";A1)-LEN("<title>"))
The result of this Worksheet Excel Formula should be:
Using FILTERXML in tandem with WEBSERVICE
Another function introduced in Excel 2013 is the FILTERXML function. It is designed to work in tandem with the Excel WEBSERVICE function.
What does the FILTERXML function do?
The Excel FILTERXML Worksheet function parses a XML string (string containing XML document) and returns a single element (node or attribute) provided by a XPath.
Lots of odd words there right? So let us start breaking it down. XML is markup language for encoding documents. As a matter of fact HTML is based on XML and share a lot of similarities. Unfortunately for us HTML does not need often have to be as strictly parsed/validated as an XML does. Although some HTML could pass as XML files – in fact many Websites don’t validate as XML.
Now what is XPath? XPath is a query language for selecting XML elements such as nodes and attributes. XPath works for XML and HTML.
How to use the FILTERXML function
Now that we know what the FILTERXML functions let’s use it on a simple XML file. W3Schools fortunately has a lot of simple XML examples – let us use a simple XML Food Menu:
Example: WEBSERVICE and FILTERXML function
Example: WEBSERVICE and FILTERXML result
Explanation
Now to explain what happens above. The XML file includes a couple of nodes – each one nested within the previous one. We start with
Now the FILTERXML functions used the following XPath: //food/name. This translates to: take the first food tag and return the contents of it name node.
Scraping a whole XML document
Now the example above is fine when you need just a single node from your XML document URL. What if you want to scrape the whole contents of that XML? Thankfully we can combine the WEBSERVICE and FILTERXML functions with Array Formulas.
In the example below I will show you how to acquire all the names of the food items in the menu. You can use a similar technique to get other items.
Input the FILTERXML formula
Input the FILTERXML formula as shown below:
=FILTERXML(B2;"//food/name")
Drag the formula down
Drag the formula down to row 8:
Hit CTRL+SHIFT+ENTER
Hit the following key combo to create and Array Formula: CTRL+SHIFT+ENTER.
That is it. Now in each row you should see the name of a food item from the menu.
=FILTERXML(B2;"//food[2]/name")
will return the name of the second food menu item. To replicate this across all items use this example:
=FILTERXML(B$2;"//food[" & (ROW()-ROW(B$4)+1) &"]/name")
Provide you entered this function in cell B2, simply drag it down – the items should automatically complete.
WEBSERVICE functions Restrictions
Now the WEBSERVICE function unfortunately has several restrictions that will cause the function to return a #VALUE! error instead of the string:
- If you don’t have a working Internet connection (or you are working with a proxy server)
- Incorrect arguments or URL address
- If HTTP result is not valid or contains more than the cell limit of 32767 characters
- URL is a string that contains more than the 2048 characters that are allowed for a GET HTTP request
- Protocols that aren’t supported, such as ftp:// or file://
Excel WEBSERVICE summary
The WEBSERVICE and FILTERXML functions are a great step forward to enabling access to Internet resources. These still have unfortunately a lot of limits. Especially when most websites have HTML files that exceed the 32727 character limit and often don’t parse as XML files.
This is where my VBA Web Scraping Kit fills the gap together with my Scrape HTML AddIn. The Kit has all the Web Scraping scenarios I consider possible in Excel where as the Scrape HTML AddIn extends a little the constrains of the WEBSERVICE and the FILTERXML functions
Want to learn Web Scraping?
Not satisfied? Want to know more about Web Scraping in Excel using VBA? Read my zero-to-hero Web Scraping Tutorial.
In the WEBSERVICE() section you provide examples, eg, =MID(A1;FIND(“”;A1)+LEN(“”);FIND(“”;A1)-FIND(“”;A1)-LEN(“”))
Can I point out that some of us use commas instead of semi colons: it would be helpful if you could flag that in some way!
Otherwise, I got the formula to work without a hitch, thank you.
Hi Duncan – , vs ; is a strictly regional setting in Windows (probably Mac as well). If you have a US regional setting in Windows you will probably have to have commas “,” to separate ranges or lists. In Europe we have the semicolon “;”