VBA to Python

VBA to Python – 10 Simple Python vs VBA Examples

The way Microsoft is moving with introduction of Office Scripts it does seem like the Redmond giant will not be planning to invest anymore in expanding VBA, however, will leave it as the designated offline scripting language. The challenge there is that while Office Scripts will become more mature we still need to wait a while for them to be fully supported for the desktop version of Excel. Meanwhile VBA is not getting any younger and is not evolving step by step with leading data science languages like Python. While I don’t think VBA will go away in the next 10 years, Microsoft will definitely be trying to get more users onto Office Scripts. But why limit yourself? I personally think today’s analysts should look towards languages and tools that are more versatile to processing data. While Office Scripts will integrate better with Excel and with Microsoft’s Power Suite of tools like Power Automate, I think today’s analysts should look towards languages used more broadly across fields of data science and data engineering.

Python has in the last few years become the language of choice for an overwhelming community of data professionals. If you are looking to migrate your VBA scripts to Python you don’t have to wait! You can start developing your scripts for Excel in Python right away, with only a few extra steps. Let me explain why would you want to consider doing so…

Why switch VBA to Python?

Don’t get me wrong I have created my fair share of VBA tools and automations. If you are creating Access based applications this might not be the alternative for you. However, if you use VBA mainly for creating automations, data cleaning, reports then I think you might appreciate the swap.

Python is the most popular programming language according to the TIOBE index. It is loved by data scientists and data engineers alike. Most machine learning algorithms on Kaggle are being created in Python. Similarly Python is the primary language used to extract, transform and load data used is managed cloud services like Amazon Glue or Azure Databricks. Invest in learning a language that can be more versatile than just for Excel.

Python supports many data formats today you might be processing Excel files. However, whether it is CSV, Excel, text, avro, json or any other type of data file – it shouldn’t really make a difference to you as a developer. Why limit yourself to processing Excel spreadsheets when instead you can learn a language that will help you process any data format instead.

Python is a more mature programming language. While VBA hasn’t seen updates since last 20 years, Python is in it’s 3rd major release, with it’s latest minor 3.10 version released just last autumn. With each release there are new capabilities and improvements to Python.

My journey of switching from VBA to Python

In this post I will try give VBA developers a quick comparison of some example scripts for common scenarios and how they can be recreated in Python. I personally started my programming journey back in the days from C/C++/C#. I started to do more Excel data wrangling and modeling to find I couldn’t do it without scripting. VBA was really the only alternative, as doing it in C# took much more time and languages like Python weren’t really then an alternative.

However, over the years as I moved more into data engineering I appreciated the ease with which you can process all kinds of data files using Databricks or Python Pandas. Over the past years Python has seen the Pandas library evolve and many other Excel dedicated libraries appear such as openpyxl. As of today I think the question of whether to use VBA or Python is not a clear choice anymore and requires considering each use case.

Anaconda for Python development

Albeit there are many IDEs (Integrated Development Environments) available for Python developers out there I suggest installing Anaconda. Anaconda is currently the most popular distribution platform for Python developers and comes with a Python distribution, the Spyder IDE used for data science and Jupyter Notebooks – the leading notebook platform for Python development.

Download the distribution package from the Anaconda website for your operating system (Windows, Apple, Linux) and start the installation.

Install Anaconda

Once installed you can decide if you want to continue coding in Spyder IDE or Jupyter Notebooks. For beginners I recommend Jupyter as it is more verbose and you can run bits and pieces of your code while seeing their outputs.

To get started open Jupyter Notebooks. After a moment you should be welcome with a browser window in which you will be welcome by Jupyter:Jupyter Screen
Now let us create our new Script file. Click on the New button and select Python 3 (ipykernel). That’s it you should see an browser tab open with an unnamed python file ready for action.
Jupyter Create New Python script
You should be all set now to play around with our scripts. If you want to get more familiar with Jupyter I recommend you read this tutorial.

Example 1: Hello World

Let’s start with a very simple example all the while getting used to using Jupyter Notebooks.

Hello World in VBA

With VBA you always need to declare a Sub procedure that runs your code. Hence, we will start by creating our Main Sub and inside we will include a simple Debug.Print statement to output our Hellos on to the Immediate console window

Example 1 - VBA - Hello World

Hello World in Python

In Python if you are running code from an IDE you can simply run the following print statement. Simply type your code into the first cell and hit SHIFT+ENTER to execute it:

Example 1 - Python - Hello World

In case you will be compiling your files you will need to specify the main procedure by defining it and running it like so:

def main():
  print("Hello World!")
  
main()

It is a good practice to always define a main procedure as your get used to creating more complex python executables and packages. However, in my examples below I want to keep the code simple and avoid wrapping it in main.

Example 2: Write to a cell

In our next example we will need to create a blank Excel file and write a single cell and close the file.

Write to a cell in VBA

In VBA we will need to create a new Excel Application (or use the existing one), create a new workbook, open the first worksheet and write text to our first worksheet cell. Lastly we will save our file and close the Excel application.

Sub Main()
    Dim app As Excel.Application
    Dim wb As Excel.Workbook
    Dim ws As Excel.Worksheet
    Dim fileName As String
    
    fileName = "C:\...\Test.xlsx" 'Replace ... with your file location
    Set app = New Excel.Application
    Set wb = app.Workbooks.Add
    Set ws = wb.Sheets(1)
    
    ws.Cells(1, 1) = "Hello there!"
    
    wb.SaveAs fileName
    app.Quit
End Sub

This code is actually very heavy as you will notice it will take a moment to see the new Excel application being launched.

Write to a cell in Python

In Python we usually fallback on certain libraries that allow us to run certain operations. In the case of Excel, if we want to have a very similar experience to using VBA we can leverage the openpyxl library. It comes with all frequently needed capabilities you might expect from VBA. Additionally it is very lightweight as it operates on the XML structure of Excel files (yep – Excel files are actually packages XML files) instead of having to launch Excel to run the operations for us. What happens below is we create a new Workbook, get the first active worksheet, save the string and save our new file or overwrite the old one.

import openpyxl as op
wb = op.Workbook()
ws = wb.active
ws.cell(1,1).value = "Hello there!"
wb.save(r"C:\... \Test.xlsx") #Replace ... with your file location

Appreciate how much shorter the code is! Because there is not interoperability with Excel we don’t need to remember about closing the application itself – as it was never open in the first place. In Python and in VBA we don’t really have to declare our variables. However, in VBA we won’t be able to see any hints on the variables methods and attributes if we don’t declare them. In Python, however, if you define a variable, you can create a new row in Jupyter and hit TAB after the dot “.” to see the available methods/attributes or hit SHIFT+TAB to get the documentation of the method. Much more convenient!

Example 3: String manipulation

In many scripting scenarios we often use VBA to do operations on Strings that otherwise would be very cumbersome by using LEFT, RIGHT, FIND or other Excel Formulas. Let’s compare multiple string manipulation examples in VBA vs Python.

String Manipulation in VBA

Usually using Excel formulas it is easy to make changes using standard LEFT, RIGHT functions. What is harder in Excel Formulas is splitting text. Below a simple example in VBA:

Dim full_name As String
full_name = "tom_hardY@google.com"

If InStr(full_name, "@") Then full_name = Split(full_name, "@")(0)
    
first_name = Split(full_name, "_")(0)
last_name = Split(full_name, "_")(1)
    
first_name = StrConv(Trim(first_name), vbProperCase) 'Tom
last_name = StrConv(Trim(last_name), vbProperCase)   'Hardy

String Manipulation in Python

The code in python will look similar, but notice that instead of running functions on text our strings actually ready methods for splitting, capitalizing and trimming text.

full_name = "tom_hardY@google.com"
if full_name.find("@") > 0 : full_name = full_name.split(sep="@")[0]
first_name = full_name.split("_")[0]
last_name = full_name.split("_")[1]
first_name = first_name.strip().capitalize() #Tom
last_name = last_name.strip().capitalize()   #Hardy

Example 4: Iterating over rows

A common task you perform in VBA is iterating over a set of rows in order to run tasks on individuals rows that you would have a hard time processing using Excel Formulas. For this example we will use the following simple Excel table. What I want is to iterate through the rows and print all first and last names of ppl that have an income above 60k.
Iterating Rows - VBA

Iterating over rows in VBA

In VBA we often don’t iterate over entire rows but more often over rows of a specific column (like an id column) in order to filter for patterns and only then do we run tasks on other columns of that particular row.

Below I need to identify all income cells in the G column and then filter only for ones that have values. Next in the loop I am checking whether the value is above 60k and ignoring the header row. To print the first and last name I am offsetting the cell by -5 and -4 columns respectively to capture the first and last names.

Sub IterateRows()
   Dim cell As Range
   Dim ws As Worksheet
   Set ws = Sheets("Sheet1")

   For Each cell In ws.Range("G:G").SpecialCells(xlCellTypeConstants)
        If cell.Value > 60000 And cell.Row > 1 Then
            Debug.Print "First name: " & cell.Offset(, -5) & ", Last name: " & cell.Offset(, -4)
        End If
   Next cell    
End Sub

I would say this is where VBA is a little clumsy as there are a lot of rooms for errors here. You could argue that we could have applied a Table on top of the data set or applied a filter. However, all of these require additional steps.

See also  How to Split Cells in Excel. How to Split Excel Columns.

Iterating over rows in Python

In Python we will fallback on one of the most popular data analysis libraries – Pandas. Now Pandas is loved by data scientists and very versatile due to the fact is can be applied across all kinds of file types and schemas. The key data structure in Pandas is a DataFrame – which basically represents a table with headers.

In the code below, we start by importing Pandas and loading the file and worksheet into df which is a variable with our DataFrame. The DataFrame by default detects the header row (by default the first row). We can use iterrows to iterate through each of the rows of our DataFrame. We can build our filtering directly into the DataFrame by simply df[ df[“income”]>60000 which indicates that we want to filter on all income values above 60k.

import pandas as pd

file_name = r"C:\...\Test.xlsx" #Replace ... with your path
df = pd.read_excel(file_name, sheet_name="Sheet1")

for r in df[ df["income"]>60000 ].iterrows():
    first_name = r[1]["first_name"]
    last_name = r[1]["last_name"]
    print(f"First Name:{first_name} Last Name:{last_name}")

What is great about DataFrames is you can add them very easily to simplify this task even more by created a series containing only filtered first and last names like below. This is a very elegant solution and more simple that using VBA.

import pandas as pd

file_name = r"C:\...\Test.xlsx" #Replace ... with your path
df = pd.read_excel(file_name, sheet_name="Sheet1")
df = df[df["income"] > 60000]

names = "First name:" + df["first_name"]+", Last name: " + df["last_name"]
for n in names:
    print(n)

Example 5: Inserting and Deleting Rows

Typically you might use VBA to insert new data rows into existing worksheets or remove existing rows. The problem using VBA is that often this comes at a large computational expense when compared to Python.

Inserting and Deleting Rows in VBA

As aluided above inserts/deletes can be time consuming in VBA – depending on how many rows of data you will have beneath the inserted/deleted rows. This is often because Excel will recompile a lot of calculations or for other reasons.

Sub InsertDeleteRows()

    'Delete row 10
    Range("10:10").EntireRow.Delete
    
    'Insert a row above row 10
    Range("10:10").EntireRow.Insert xlUp
    
End Sub

Inserting and Deleting Rows in Python

As we might not be looking at a structured worksheet we will again fallback on the openpyxl library. As opposed to VBA we need to be working on a closed document. If we want to save our changes we need to finish with save.

import openpyxl as op
file_name = r"C:\...\Test.xlsx" #Replace ... with your path
wb = op.load_workbook(file_name)

ws = wb["Sheet1"]
ws.delete_rows(10)
ws.insert_rows(10)

wb.save(file_name)

Example 6: Adding and Removing Worksheets

Adding and Removing Worksheets in VBA

In VBA we will need to add a blank worksheet to our ActiveWorkbook (the one open and visible), only then can we name our worksheet. When deleting the worksheet we might run into an issue that we will receive a pop-up window asking us if we want to delete the worksheet. That can be turned off using the DisplayAlerts option but requires an extra step.

Sub AddRemoveWorksheets()

    Dim wb As Workbook
    Dim ws As Worksheet
    Set wb = ActiveWorkbook
    
    'Add worksheet
    Set ws = wb.Worksheets.Add
    ws.Name = "NewWorksheet"
    
    'Find and remove worksheet
    Set ws = wb.Sheets("NewWorksheet")
    ws.Delete
    
End Sub

Adding and Removing Worksheets in Python

In Python both operations are oneliners.

import openpyxl as op
file_name = r"C:\...\Test.xlsx" #Replace ... with your path
wb = op.load_workbook(file_name)

wb.create_sheet("NewWorksheet")
wb.remove_sheet("NewWorksheet")

wb.save(file_name)

Example 7: Formatting cells

Formatting cells in VBA

Sub FormatCell()

    Dim cell As Range
    Set a1= Range("A1")
    
    a1.Font.Bold = True
    a1.Font.Color = RGB(255, 0, 0)
    a1.Borders.LineStyle = xlDot
    a1.Interior.Color = RGB(0, 255, 0)

End Sub

Formatting cells in Python

In Python to do Excel cell formatting we will need a few additional modules – Font, Color, Border and Side. Once imported the process is pretty similar to how you would do it in VBA.

from openpyxl.styles import colors
from openpyxl.styles import Font, Color, Border, Side
from openpyxl import Workbook
file_name = r"C:\...\Test.xlsx" #Replace ... with your path
wb = op.load_workbook(file_name)

ws = wb["Sheet1"]
a1 = ws.cell(1,1)
a1.font = Font(bold = True, color="FF0000")
bd = Side(style='thick', color="000000")
a1.border = Border(left=bd, top=bd, right=bd, bottom=bd)

wb.save(file_name)

Example 8: Conditional Statements

Contiational Statements in VBA

Sub ConditionalStatements()

    Dim num As Long
    num = 2
   
    If num > 10 Then
        Debug.Print "Value greater than 10"
    
    ElseIf num > 5 Then
        Debug.Print "Value greater than 5"
    Else
        Debug.Print "Value less or equal to 5"
    End If

End Sub

Conditional Statements in Python

num = 2
if num > 10:
    print("Value greater than 10")
elif num > 5:
    print("Value greater than 5")
else: print("Value less or equal to 5")

Example 9: Message Boxes

Message boxes are an easy way to notify your users of certain events such as finishing running a macro or asking them simple Yes / No questions.

Message Boxes in VBA

In VBA we use the native MsgBox function. They issue is that for Yes/No boxes we need to remember that the function will return a VBA code that we need to compare using one of the built in variables vbYes or vbNo.

Sub MessageBox()

    'Information box
    MsgBox "Hello", vbInformation, "Information"
    
    
    'Yes / No question
    If MsgBox("Do you like this tutorial?", vbYesNo, "Question") = vbYes Then
        Debug.Print "They like it!"
    Else
        Debug.Print "They don't like it!"
    End If
    
End Sub

Excel MessageBox

Message Boxes in Python

Python does not come with a native Message Box function. Instead you need to import a library that has this feature. I recommend using tkinter as it very similar in usage to the MsgBox VBA function.

from tkinter import messagebox

#Information box
messagebox.showinfo("Hello","Information")

#Yes / No question
if messagebox.askyesno("Question", "Do you like this tutorial?"):
    print("They like it!")
else:
    print("They don't like it!")

Python MessageBox

Example 10: Functions

As my last example let us compare how you can create custom functions in VBA vs Python.

Functions in VBA

In VBA we always need to be clear whether we are defining a subroutine or a function. For Functions if you want to specify an undefined number of arguments you can use the ParamArray.

'A Function with 2 arguments
Function AddNumbers(num1 as Long, num2 as Long) as Long
  ret = num1+num2
  AddNumbers = ret
End Function

'A Function with undefined number of arguments
Function AddManyNumbers(ParamArray values() as Variant) as Long
   ret = 0
   For Each v in values
      ret = ret + v
   next v
   AddManyNumbers = ret
End Function

Sub Test()
   Debug.Print AddNumbers 10, 20
   'Result 30

   Debug.Print AddManyNumbers 10, 20, 30
   'Result 60
End Sub

Functions in Python

In Python we do not have to be explicit about subroutines vs functions. You main code will run only when it skips the functions defined from top to bottom as shown below. Additionally if you want to create functions with undefined amounts of arguments you can precede your argument with * (or ** if you want a dictionary instead of a list).

#Function with 2 arguments
def add_numbers(num1, num2):
    ret = num1 + num2
    return ret

#Function with many arguments
def add_many_numbers(*args):
    ret = 0
    for arg in args:
        ret = ret + arg
    return ret

#Test functions
print(add_numbers(10,20))
'Result 30

print(add_many_numbers(10,20,30))
'Result 60

Python vs VBA: Conclusions

While I am not arguing to stop using VBA entirely in you Excel scripts, I think it makes sense to start considering Python for more complex automations/data processing tasks. Conversely when I get asked about what language is a good starter for beginner analysts – I still recommend they start with VBA. With Excel you get an IDE built in, your VBA scripts can get easily shared and appreciated by other folk at your workplace. Let me expand a bit more why VBA might still be your default choice some times:

Why you might still prefer to use VBA?

You should still stick with VBA if you are beginner developer and primarily an Excel user. If you haven’t got a strong grasp of programming and on daily basis you create only simple, usually pre-recorded, scripts I don’t think the switch will provide you a lot of benefit.
What is Python missing?

  • Python isn’t integrated into Excel. You always need to send your scripts separately and assure your users have Python installed on their workstations
  • Python doesn’t run on open documents – in VBA you might be used to seeing your code live on your open documents. Python usually manipulates closed documents, although you might try using interop libraries, however, that defeats the purpose of using Python
  • There is no Excel Macro recorder for Python scripts – if you are a beginner I would still stick with VBA, until you get a good grasp of it

When is the right moment to go Python?

If you are data engineer/data scientist Python makes more sense from the start. You don’t really care if you need to process Excel, csv or json files, data is data. In such cases all the benefits of packaging VBA with Excel files etc. don’t really matter.

  • Python has many data libraries such as numpy and pandas for data processing and mathematics. This includes data science ML/AI libraries such as scikit-learn allowing you to build neural networks, decision trees etc.
  • Python libraries, such as pandas, usually provide similar support for all kinds of data formats. This means you don’t need to learn different approaches for processing csv vs Excel files, instead you just need to get familiar with DataFrames
  • Support for parallel processing – you can parallelize your loops in just a few lines an increase many times over the performance of your scripts. VBA whereas is be default single-threaded and running large scripts can sometimes take hours
  • Processing of large datasets. VBA wasn’t built for processing large datasets and often even storing data in arrays of thousands can cause memory issues or exceptions. Python can easily process millions of rows of data. If you need to process terabytes of data you can use frameworks such as Spark
  • Working with APIs and web scraping. You can run web calls using the XMLHttpObject, however, the experience won’t be as easy as using the Python requests library, Scrapy, BeautifulSoup etc. Especially when you need to use authentication tokens. For precisely this reason Microsoft recommends Office Scripts

Any thoughts?

7 Comments

  1. Thank you for all of your efforts in creating this website as a wonderful resource for the community. I am a seasoned VBA developer (15+ years) and have many, many lines of code that practically automate my day job. I have looked at many other languages (python included) but the big deal breakers for me were: interacting with the Excel Object Model – which is fluent and easy on VBA and not being able to operate on open workbooks – I often run routines on the active workbook (from an addin) and can then see the results immediately. So what I have ended up doing is using VBA 95% of the time and for the remaining 5% (where VBA would be too slow) starting a shell (from vba) to run a .Net exe (which will be much faster than python) then pulling the results back in to Excel through VBA. But please don’t misunderstand me – I think Python is a great language with some really cool features and I continue to read your posts with interest.

    • Hi Alan, appreciate the feedback from an experienced VBA developer. I can’t disagree with your point – Python isn’t that good for running routines on open workbooks (using Interop), and creating Python Excel AddIns requires a lot of extra steps and $$$ (e.g. PyXLL). In terms of performance it all depends on the type of libraries used – if you are using .NET interoperability you won’t see that many performance benefits compared to VBA in a single thread. Obviously that will change when using the OpenXML SDK. C#.NET obviously will be faster than a Python interpreter. I spent the first few years of my career as a Microsoft .NET developer and loved it as a platform.

      The benefits of Python vs C#.NET are IMHO – the speed of writing code (C#.NET will have a lot of overhead), portability (Linux, Mac), ETL /Numeric / ML & AI libraries. I noticed that when I want to run a quick script – it just takes me a minute to spin-up Jupyter / Spyder and another minute to write a few lines of code (and pip any missing libraries). I can see why so many developers appreciate the pandas lib and how Python’s syntax allows you to easily filter DataFrames, run multirow computations etc.

      Stay tuned for more articles soon on Python for Excel! While I will try to keep the VBA posts going ;).

      • If you are thinking of a topic for a future post – I would be very interested to see a worked example of taking data from an Excel table and manipulating it with pandas dataframes etc then writing back to the workbook. ๐Ÿ˜€

      • Thanks Alan. I was planning exactly that! I think Python is a natural client and server side language for data analysts nowadays and definitely worth learning. Look out for more posts on the subject coming soon!

  2. Thank you for your reply. I just want to clarify one of the points that I made. I don’t use XLL addins (as I don’t need UDFs in my worksheets) and I avoid interop (at all costs) because it is messy and (in my experience) unreliable – memory issues and crashes.

    Almost all of my work is done in VBA – because it is perfectly adequate for the job. But when it’s not – I delegate to a compiled language.

    I export my data from Excel to a txt file then launch a .Net exe from a VBA shell (so VBA is still ‘in control’). The .Net exe outputs to one or more txt files (with the desired calculations done) and VBA the brings that ‘new’ data back in to the Excel Workbook. From there, I can update pivot tables, charts etc and report accordingly.

    This approach (using an external coding language for data processing – while avoiding all direct interop) does not seem to be widely used (I haven’t seen anyone in the Excel community write about it) but it is immensely powerful.

    My exe files are .Net (because that suits my purposes and I find it fairly easy to write). But it could easily be C++, Rust, Python, Javascript {insert any language here}.

    The big breakthrough came when I realised that the interface does not have to be in memory (as has been the traditional way of doing it). The interface can be txt files – which is the same principle you would use to call an external Web api and retrieve data.

    As a side issue, I was inspired by one of your previous articles on multi-threading, and Golang’s concept of channels to create a new dedicated folder in the temp directory for each process to communicate between VBA and the Exe. I find the whole setup works really well as I have intermediary files in this folder – which can be very handy for debugging and I regularly clean my temp foldrs with CCleaner.

    Anyway, I just enjoy chatting with someone who appreciates Excel and code – it doesn’t happen to me very often!

    • Interesting to hear someone doing such complex client side solutions. It does seem like whatever algorithms you are creating and running require a lot of computational power and they seem to need to run on the client side, if they necessitate the need to create .NET libraries.

      I am really curious now – what type of work are you performing? ๐Ÿ™‚

      • I work in the finance department for a large organisation in the UK. I have automated alot of the finance processes which includes gathering data from multiple sources and cross referencing (reconciling) / filling in blanks / correcting errors and amalgamating data for new reports (like a SQL Join but on steroids!).

        I have no control over the data input but I am responsible for the output so there is a lot of ‘process logic’ that goes on.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.