Visual Basic for Application (VBA) is great for making your first steps in Web Scraping as Excel is ubiquitous and a great training arena to learn Web Scraping. Web Scraping comes down to making HTTP requests to websites. At some point however you will find that some websites will cut you off or prevent multiple HTTP requests. This is were you need to use proxy servers to hide your HTTP requests behind multiple IPs. Using a proxy server is easy in VBA so let’s study a couple of real life examples.
What is a proxy server?
A proxy server is a server that acts as an intermediary for Internet/Intranet HTTP requests. A client machine connects to the proxy server, requesting a service, file, connection, web page, or other resource available from a different server and the proxy server serves that request. Proxy servers often provide anonymity.
VBA HTTP Request using ServerXMLHttp
Before we do a proxy HTTP request using VBA let’s look a how a regular HTTP request looks like. Typically I tend to use the ServerXMLHTTP request object, as I will show in the next section, it can be easily extended to run HTTP requests via a proxy IP.
Function GetResult(url As String) As String Dim XMLHTTP As Object, ret As String Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.4.0") XMLHTTP.Open "GET", url, False XMLHTTP.send ret = XMLHTTP.ResponseText GetResult = ret End Function Sub Example() 'Print the HTML of Google.com Debug.Print GetResult "http://www.google.com" End Sub
VBA HTTP Request via proxy
Let us now extend the above example to run the same HTTP request via a proxy server. Notice the new line highlighted below:
Function GetResult(url As String) As String Dim XMLHTTP As Object, ret As String Set XMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.4.0") XMLHTTP.Open "GET", url, False XMLHTTP.setProxy 2, "xxx.xxx.xxx:80" XMLHTTP.send ret = XMLHTTP.ResponseText GetResult = ret End Function
The xxx.xxx.xxx is the IP of your proxy server and 80 is the default HTTP port. You can easily google free online proxy IPs which can easily be used to run proxy requests.
Last thoughts on VBA Proxy HTTP requests
Right.. so do play a little bit with this new piece of knowledge as this is one of the most important aspects of Web Scraping (unless you want to get your IP blocked). Here are some other tips to keep in mind:
- Use at least 5-10 proxy servers or else your proxies will quickly be blocked too
- Be sure to handle proxy server errors e.g. 500, 403. A good approach is to retest any server error with an additional proxy server
- Test your proxies before use. Especially free proxy servers have a lot of down time. Although 1 proxy IP may have worked yesterday.. it may not be so today. I tend to run a simple HTTP request on all proxies and remove any raising server errors before use
- Don’t cross the line: Web Scraping is useful when you got a lot of queries you want to run automatically instead of running them manually. Don’t cross the line by content-stealing or abusing websites TOS!