Tag Archives: web

scrape html add-in

Excel Scrape HTML Tool added to the Scrape HTML Add-In

1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 4.00 out of 5)
Loading...

As I mentioned I am planning to extend the Scrape HTML Add-In with additional tools and functionalities. Scraping HTML content should not require any VBA coding – this rule is guiding the further development of this tool.

See the latest version of the Excel Scrape HTML Add-In here:
Excel Scrape HTML Add-In

Adding to the toolbox I would like to introduce you to the Scrape HTML Tool. The Get* functions (e.g. GetElementByRegex) which I posted earlier are really useful when you are making a solution which can be reused e.g. scraping regularly posted online data etc. I suppose, however, that sometimes there is only a need to scrape some content once but preferably in a structured manner or only some subsets of the content.

The Scrape HTML Tool

The Scrape HTML Tool
The Scrape HTML Tool

The tool comes in handy when you want to quickly scrape all items of a certain type (matching a certain regular expression). It comes with several predefined scraping regular expressions e.g. scraping URLs and img src properties. However, these examples are just to start you off with building your own patterns/expressions.

See this video on how the Scrape HTML Tool can help you:

Let me know if this tool is useful to you and if you see any need of extending it!

I am also planning to post some a simple tutorial or something to show more elaborate examples of scraping/downloading HTML content from the web. Information is power – it is time to make usage of data more simple.

google translate

Excel Google Translate functionality

1 Star2 Stars3 Stars4 Stars5 Stars (11 votes, average: 4.36 out of 5)
Loading...

Translations are not a fascinating task for me. I once received a request to help out a colleague in translating an Excel file to English. Copy to Google Translate, translate, copy back to Excel and again… After going through part of the file we quickly extrapolated that the whole translation would take at least a whole day if done in this manner. I had to optimize this process as much as possible.

That’s when I thought of leveraging Google Translate to help with my localization efforts. Why translate stuff manually when we can put Excel to do the job for you.

Excel Google Translate Code

Find below a quickly written VBA procedure that adds a shortcut to Excel to quickly translate any cell from one language to another. The below is a simple procedure you can easily pin to your Excel shortcuts (e.g. CTRL+K or similar). Simply select a certain Excel range and execute the TranslateCell procedure below.

Looking to translate an Excel or Word document? Why not use my Word / Excel Translator AddIns?
Sub TranslateCell()
    Dim getParam As String, trans As String, translateFrom As String, translateTo As String
    translateFrom = "fr"
    translateTo = "en"
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(ActiveCell.Value)
    URL = "https://translate.google.pl/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
        ActiveCell.Value = Clean(trans)
    Else
        MsgBox ("Error")
    End If
End Sub

'----Used functions----
Function ConvertToGet(val As String)
    val = Replace(val, " ", "+")
    val = Replace(val, vbNewLine, "+")
    val = Replace(val, "(", "%28")
    val = Replace(val, ")", "%29")
    ConvertToGet = val
End Function
Function Clean(val As String)
    val = Replace(val, "&quot;", """")
    val = Replace(val, "%2C", ",")
    val = Replace(val, "&#39;", "'")
    Clean = val
End Function
Public Function RegexExecute(str As String, reg As String, _
                             Optional matchIndex As Long, _
                             Optional subMatchIndex As Long) As String
    On Error GoTo ErrHandl
    Set regex = CreateObject("VBScript.RegExp"): regex.Pattern = reg
    regex.Global = Not (matchIndex = 0 And subMatchIndex = 0) 'For efficiency
    If regex.Test(str) Then
        Set matches = regex.Execute(str)
        RegexExecute = matches(matchIndex).SubMatches(subMatchIndex)
        Exit Function
    End If
ErrHandl:
    RegexExecute = CVErr(xlErrValue)
End Function

How to setup Google Translate code in Excel?

    • Go to the DEVELOPER ribbon and add select Visual Basic a and new Module to your VBA Project
      insert macro
    • Insert the code from sections above (notice that the function is “Public” – therefore will be visible in your workbook as a so called UDF (User Defined Function) add macro
    • Go to the worksheet and input the function as shown above
    • Modify the translateFrom and translateTo languages in the function to change the translation e.g. change “pl” to “de” to change the translation from Polish to German.

Here you can find the full list of language 2 letter codes.

  • In the Excel Developer ribbon open Macros
  • Select the TranslateCell sub and go to Options
  • Add a shortcut: e.g. CTRL+SHIFT+T

How to use it?

  • Select any cell in Excel and click the configured shortcut e.g. CTRL+SHIFT+T

What if I want an UDF?

The procedure above can easily be converted to a User Defined Function (UDF) instead, allowing you to use it as a formula. Be sure to copy the required supporting functions (from the sections above) ConvertToGet, Clean and RegexExecute.

VBA UDF Google Translate

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en")
    Dim getParam As String, trans As String, objHTTP As Object, URL As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(rng.Value)
    URL = "https://translate.google.pl/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>

")
        Translate = Clean(trans)
    Else
        Translate = CVErr(xlErrValue)
    End If
End Function

 

 

 

How to use the Translate UDF?

See an example below of how to translate text in cell A1 from French (fr) to English (en):

=TRANSLATE("A1","fr","en")

What if I want it to run on all selected cells?

Here is a modification of the function above to run on ALL selected cells:

Sub TranslateCell()
    Dim getParam As String, trans As String, translateFrom As String, translateTo As String
    translateFrom = "pl"
    translateTo = "en"
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    Dim r As Range, cell As Range
    Set cell = Selection
    For Each cell In Selection.Cells
        getParam = ConvertToGet(cell.Value)
        URL = "https://translate.google.pl/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
        objHTTP.Open "GET", URL, False
        objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
        objHTTP.send ("")
        If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
            trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
            cell.Value = Clean(trans)
        Else
            MsgBox ("Error")
        End If
    Next cell
End Sub