excel webservice function

Excel WEBSERVICE and FILTERXML functions explained

The Excel WEBSERVICE and Excel FILTERXML Worksheet functions can be used to pull Internet data from a URL into a cell in a Excel spreadsheet. While the first pulls raw HTML data the second allows you to filter XML formats.

Excel up till now has been mostly an offline application. Although, you can use VBA, PowerQuery or other similar data tools in Excel to gain access to Internet and Online data sets these could not have been easily used with the help of Third Party AddIns such as my Scrape HTML AddIn. With the introduction of the WEBSERVICE function we gain immediate and easy access to any REST WebAPI.

In today’s post I will show you several examples of how to use these functions in practice. And how to extract data from an XML REST WebApi.

How to use the Excel WEBSERVICE function

The WEBSERVICE function downloads the HTTP response of the provided URL.
excel webservice function
Try running the Excel WEBSERVICE function on Google:

=WEBSERVICE("https://www.google.com")

What do you get? If all goes well you should get the HTML response for the Google Web Page.
Now past the above formula to cell A1 and the below to cell A2:

=MID(A1;FIND("<title>";A1)+LEN("<title>");FIND("</title>";A1)-FIND("<title>";A1)-LEN("<title>"))

The result of this Worksheet Excel Formula should be:

Google

Using FILTERXML in tandem with WEBSERVICE

Another function introduced in Excel 2013 is the FILTERXML function. It is designed to work in tandem with the Excel WEBSERVICE function.

What does the FILTERXML function do?

The Excel FILTERXML Worksheet function parses a XML string (string containing XML document) and returns a single element (node or attribute) provided by a XPath.
excel filterxml
Lots of odd words there right? So let us start breaking it down. XML is markup language for encoding documents. As a matter of fact HTML is based on XML and share a lot of similarities. Unfortunately for us HTML does not need often have to be as strictly parsed/validated as an XML does. Although some HTML could pass as XML files – in fact many Websites don’t validate as XML.

Want to learn how to manage XML documents in VBA instead? Read my VBA XML Tutorial

Now what is XPath? XPath is a query language for selecting XML elements such as nodes and attributes. XPath works for XML and HTML.

How to use the FILTERXML function

Now that we know what the FILTERXML functions let’s use it on a simple XML file. W3Schools fortunately has a lot of simple XML examples – let us use a simple XML Food Menu:

Example: WEBSERVICE and FILTERXML function

filterxml and webservice example 1

Example: WEBSERVICE and FILTERXML result

filterxml and webservice example 2

Explanation

xml exampleNow to explain what happens above. The XML file includes a couple of nodes – each one nested within the previous one. We start with node which hosts nodes. Each food node represents a single item in the menu. A food node contains , and node – which describe each menu item.

Now the FILTERXML functions used the following XPath: //food/name. This translates to: take the first food tag and return the contents of it name node.

XPath is an easy query language to learn. I personally recommend the WSchools XPath tutorial

Scraping a whole XML document

Now the example above is fine when you need just a single node from your XML document URL. What if you want to scrape the whole contents of that XML? Thankfully we can combine the WEBSERVICE and FILTERXML functions with Array Formulas.

In the example below I will show you how to acquire all the names of the food items in the menu. You can use a similar technique to get other items.

Input the FILTERXML formula

Input the FILTERXML formula as shown below:

=FILTERXML(B2;"//food/name")

filterxml and webservice example 4

Drag the formula down

Drag the formula down to row 8:
filterxml and webservice example 5

See also  How to unhide sheets in Excel? Unhide all Sheets in Excel VBA

Hit CTRL+SHIFT+ENTER

Hit the following key combo to create and Array Formula: CTRL+SHIFT+ENTER.
That is it. Now in each row you should see the name of a food item from the menu.

Don’t like using Array Functions? You can also use the XPath node index instead:

=FILTERXML(B2;"//food[2]/name")

will return the name of the second food menu item. To replicate this across all items use this example:

=FILTERXML(B$2;"//food[" & (ROW()-ROW(B$4)+1) &"]/name")

Provide you entered this function in cell B2, simply drag it down – the items should automatically complete.

WEBSERVICE functions Restrictions

Now the WEBSERVICE function unfortunately has several restrictions that will cause the function to return a #VALUE! error instead of the string:

  • If you don’t have a working Internet connection (or you are working with a proxy server)
  • Incorrect arguments or URL address
  • If HTTP result is not valid or contains more than the cell limit of 32767 characters
  • URL is a string that contains more than the 2048 characters that are allowed for a GET HTTP request
  • Protocols that aren’t supported, such as ftp:// or file://

Excel WEBSERVICE summary

The WEBSERVICE and FILTERXML functions are a great step forward to enabling access to Internet resources. These still have unfortunately a lot of limits. Especially when most websites have HTML files that exceed the 32727 character limit and often don’t parse as XML files.

This is where my VBA Web Scraping Kit fills the gap together with my Scrape HTML AddIn. The Kit has all the Web Scraping scenarios I consider possible in Excel where as the Scrape HTML AddIn extends a little the constrains of the WEBSERVICE and the FILTERXML functions

Want to learn Web Scraping?

Not satisfied? Want to know more about Web Scraping in Excel using VBA? Read my zero-to-hero Web Scraping Tutorial.

2 Comments

  1. In the WEBSERVICE() section you provide examples, eg, =MID(A1;FIND(“”;A1)+LEN(“”);FIND(“”;A1)-FIND(“”;A1)-LEN(“”))

    Can I point out that some of us use commas instead of semi colons: it would be helpful if you could flag that in some way!

    Otherwise, I got the formula to work without a hitch, thank you.

    • Hi Duncan – , vs ; is a strictly regional setting in Windows (probably Mac as well). If you have a US regional setting in Windows you will probably have to have commas “,” to separate ranges or lists. In Europe we have the semicolon “;”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.