Microsoft Excel is widely used in almost every industry. Its intuitive interface and ease of use for organising data, performing calculations, and analysis of data sets has led to it being commonly used in countless different fields globally.
Whether you’re a fan of Excel or not, at some point you will have to deal with it! For many applications you won’t want to do complex calculations or manage large data sets in Excel itself, but you may need to take values from Excel as inputs, produce reports in an Excel format, or provide tools to Excel users. Python can be a better choice for complex tasks and fortunately there are many tools for the Python developer to work with so Excel and Python can be used together.
This post gives an overview of some of the most popular and useful tools out there to help you choose which is the right one for your specific application.
Below there’s a feature matrix outlining the different features of the packages for calling Python from Excel.
- Building Interactive Python tools with Excel as a front-end
- Reading and writing Excel workbooks
- Additional Resources
Building Interactive Python Tools with Excel as a Front-End
Excel is a well known and really good user interface for many tasks. When you get into more complex tasks and processing larger datasets however you can soon reach the limits of what can sensibly be achieved in Excel. Python is a popular choice for data science and other disciplines as it can handle these complex cases far better than Excel alone. By using both together and recognising the strengths of each, it’s possible for you to build really powerful interactive tools using Excel as a user-friendly front end, with all the heavy lifting done in Python.
Python is an extremely powerful language with an extensive ecosystem of 3rd party libraries. Leveraging Python in Excel spreadsheets can be a fantastic way to enhance your productivity and remove the need for importing and exporting data into and out of Excel. Interactive worksheets can be developed using Python code in the same way as you might use VBA, but with all of the advantages of Python.
There are a few tools available that can be used to bring Python to Excel and it can be difficult to know which one is right for different situations. Below is an overview of each, which I hope will highlight the differences between them and help you decide which ones are right for what you need to achieve.
See the table of features along with the packages that support them below.
PyXLL – The Python Excel Add-In
PyXLL is currently the only package that enables developers to write fully featured Excel addins in Python. It embeds the Python interpreter into Excel so that it can be used as a complete VBA replacement. You can think of it conceptually as being similar to something like Excel-DNA for C#, except that it is dynamic and imports your Python code while Excel is running – so there’s no add-in to build and no need to restart Excel when modifying your Python code.
See PyXLL's FeaturesUsing PyXLL, you can write Python code to create:
- Worksheet functions (user defined functions, called from Excel worksheet formulas)
- Macros
- Menus
- Custom Ribbon Bars
- Real Time Data feeds
Writing a user defined function with PyXLL requires the ‘xl_func’ decorator to be applied to a normal Python function:
from pyxll import xl_func
@xl_func
def py_test(a, b, c):
return (a + b) * c
PyXLL has a config file (pyxll.cfg) which contains a list of all the modules that will be imported when Excel starts. By adding the module above to the list in that file, PyXLL will expose the ‘py_test’ function to Excel as a user defined function to be called from a worksheet.
Some additional features of PyXLL are:
- Array functions
PyXLL can work with arrays of data and has support for NumPy and Pandas types. Functions returning arrays can automatically resize to avoid errors when the dimensions of a result change.
- Real Time Data
Stream real time data into Excel from Python with PyXLL’s Real Time Data feature.
- Object Cache
For functions that return Python objects, rather than simple types (strings, numbers etc) or arrays (NumPy arrays and Pandas DataFrames or Series) PyXLL has a clever ‘object cache’. Object identifiers are returned, and when passed into another function the identifier is used to find the original object. This allows objects to be passed between Python functions using Excel formulas. This can be very useful when dealing with large data sets where the whole data set doesn’t need to be visible in Excel all at once, but instead is passed between Python functions – for example, loading a large data set and performing some aggregation operations and presenting the aggregate results in Excel.
- Excel Object Model
PyXLL has integration with the main COM packages, pywin32 and comtypes, which allow the entire Excel Object Model to be used from Excel macros and functions written with PyXLL. This enables anything that could be done in VBA to be done in Python. It also integrates with xlwings so that the xlwings API can also be used to read and write from Excel.
For more features take a look at the feature matrix below.
Home Page | Download PyXLL | Documentation
Download PyXLLpywin32 / comtypes
The entire Excel API (or Object Model) is exposed via COM. Everything that can be written as a VBA macro can also be written using the Excel COM API in Python by using pywin32 or comtypes.
The Excel COM API can be used from outside of Excel (e.g. from a running Python prompt, script or Jupyter notebook). If you already know how to do something in VBA then doing the equivalent task in Python via the COM API is generally quite straightforward. Calling a routine using pywin32 or comtypes from Excel (e.g. from a button on the ribbon bar, menu item or macro) can be done using PyXLL.
The Excel Object Model is documented here https://docs.microsoft.com/en-gb/office/vba/api/overview/Excel/object-model and once you understand the basic differences between VBA and Python you will find it’s fairly simple to translate between the two.
To demonstrate let’s go though an example. Suppose you had the following VBA code and want to translate it into Python:
Sub Macro1() Range('B11:K11').Select Selection.AutoFill Destination:=Range('B11:K16'), Type:=xlFillDefault Columns('B:K').Select Selection.ColumnWidth = 4 End Sub
First of all we must get the Excel Application object in Python. This code can be run from an interactive Python prompt or a Jupyter notebook, or even run inside Excel itself using PyXLL.
from win32com.client.gencache import EnsureDispatch # Get the Excel Application COM object xl = EnsureDispatch('Excel.Application')
Now we have the Application object we can call the Range method in the same way as the VBA code above. The first important difference to notice is that in VBA simply calling ‘Range().Select’ calls the Select method, but in Python we need to use ‘()’ to call the method.
xl.Range('B11:K11').Select()
The next line requires a constant, ‘xlFillDefault’. To access the same constant in Python we use the ‘win32com.client.constants’ module. Also notice that in VBA no parentheses are used when calling an object method, but in Python there are.
from win32com.client import constants xl.Selection.AutoFill(Destination=xl.Range('B11:K16'), Type=constants.xlFillDefault)
The rest of the code is similar to those lines we’re just translated, and so the entire function looks like
from win32com.client.gencache import EnsureDispatch from win32com.client import constants def Macro1(): xl = EnsureDispatch('Excel.Application') xl.Range('B11:K11').Select() xl.Selection.AutoFill(Destination=xl.Range('B11:K16'), Type=constants.xlFillDefault) xl.Columns('B:K').Select() xl.Selection.ColumnWidth = 4
The translated Python code looks very similar to the original VBA code! Automating tasks in Excel, or just calling it interactively in this way from a Jupyter notebook can be very powerful.
This Python code could be called from Excel as a macro using PyXLL’s “@xl_macro” decorator. Instead of using EnsureDispatch, pyxll.xl_app() should be used to ensure that if there are multiple Excel processes running the correct one is returned.
xlwings
xlwings provides a wrapper around the Excel COM API described above for simplifying many common tasks, such as writing Pandas DataFrames to an open Excel workbook. It uses pywin32’s COM wrappers and gives you access to those, so you can always drop down to using the normal Excel API should you need to.
In the same way as pywin32 and comtypes, xlwings can talk to Excel from a normal Python prompt or Jupyter notebook. For calling code using xlwings from Excel itself, PyXLL provides a convenient way of getting the Excel Application object as an xlwings object. This allows you to script Excel in Python and trigger running your code from a ribbon button or menu item. An example use-case could be a ribbon button for fetching data from a database, building a report, and writing it straight into the running Excel.
The following shows how values can be read and written to a running Excel workbook, including a Pandas DataFrame.
import xlwings as xw wb = xw.Book('workbook.xlsx') # Open an existing Workbook sheet = wb.sheets['Sheet1'] # read and write values from the worksheet sheet.range('A1').value = 'Foo' print(sheet.range('A1').value) # Write a Pandas DataFrames directly to the Excel sheet import pandas as pd df = pd.DataFrame([[1,2], [3,4]], columns=['a', 'b']) sht.range('A1').value = df # Read the DataFrame back, using the 'expand' option to read the whole table sht.range('A1').options(pd.DataFrame, expand='table').value
xlwings includes a way of writing user defined functions (UDFs) or worksheet functions in Python that are called from a formula in Excel, similar to the user defined functions offered by PyXLL. These rely on a server process running outside of Excel and VBA wrappers to call into that server. It’s a simple solution with some drawbacks, such as poor performance and that those functions are only available from the workbook containing the VBA wrappers.
DataNitro
DataNitro is another API to control Excel from Python. It’s not clear what the advantage over its API and the existing and well understood Microsoft Excel COM API is, but it does allow you to write and run scripts without leaving Excel. It has rudimentary support for user defined functions (worksheet functions), but they run outside of the Excel process and only work if there is only one Excel process running.
DataNitro is no longer under active development and is not available to license any more, but it was included here for completeness.
Feature Matrix For Integrating Python and Excel
Feature | DataNitro | xlwings | PyXLL | Comments |
---|---|---|---|---|
Basic worksheet functions | ✔ | ✔ | ✔ | DataNitro and xlwings use an external Python process, xlwings requires VBA wrapper code |
Real time data | ✘ | ✘ | ✔ | Stream real time data into Excel worksheets |
Ribbon customisation | ✘ | ✘ | ✔ | Give users a rich user experience with custom ribbon menus |
Custom task panes | ✘ | ✘ | ✔ | Integrate Python UI components into Excel |
Menu functions | ✘ | ✘ | ✔ | Call Python code from the Excel menu |
Object Cache | ✘ | ✘ | ✔ | Pass Python objects between worksheet functions seamlessly via an object cache |
IntelliSense (tooltips) | ✘ | ✘ | ✔ | IntelliSense tooltip as you type – PyXLL integrates with the ExcelDNA Intellisense Addin |
Thread safe | ✘ | ✘ | ✔ | Improve worksheet responsiveness by using Excel’s own threadpool to run worksheet functions concurrently |
Asynchronous functions | ✘ | ✘ | ✔ | Don’t block Excel waiting for long running functions |
Macros | ✘ | ✔ | ✔ | Macros are functions that can be attached to UI elements like buttons or called from VBA |
Keyboard shortcuts | ✘ | ✘ | ✔ | Keyboard shortcuts can be assigned to macros with PyXLL |
Macro sheet equivalent functions | ✘ | ✘ | ✔ | Call back into Excel from a worksheet function |
Function documentation | ✘ | ✔ | ✔ | Include Python function docstrings in the Excel function wizard |
Automatically resize arrays | ✘ | ✔ | ✔ | Array functions can resize automatically, even in older version of Excel without Dynamic Arrays |
Volatile Functions | ✘ | ✔ | ✔ | Volatile functions are called every time a worksheet is recalculated |
Full Excel API exposed | ✘ | ✔ | ✔ | xlwings uses pywin32, PyXLL users can choose between pywin32, comtypes or xlwing |
Reload without restarting Excel | ✔ | ✔ | ✔ | Modules can be reloaded without restarting Excel. PyXLL also supports ‘deep reloading’ where all module dependencies are also reloaded |
Automatic reloading | ✘ | ✘ | Reload automatically whenever changes are made to your code |
See PyXLL's Features Download PyXLL
Reading and Writing Excel workbooks
For some tasks you may need to read or write an Excel file directly. For batch processing or tasks running on a server Excel may not be installed. The following packages allow you to read and write Excel files directly without needing to use Excel.
OpenPyXL
For working with Excel 2010 onwards, OpenPyXL is a great all round choice. Using OpenPyXL you can read and write xlsx, xlsm, xltx and xltm files. The following code shows how an Excel workbook can be written as an xlsx file with a few lines of Python.
from openpyxl import Workbook wb = Workbook() # grab the active worksheet ws = wb.active # Data can be assigned directly to cells ws['A1'] = 42 # Rows can also be appended ws.append([1, 2, 3]) # Save the file wb.save('sample.xlsx')
Don’t confuse OpenPyXL with PyXLL. The two are completely different and serve different purposes. OpenPyXL is a package for reading and writing Excel files, whereas PyXLL is a tool for building fully featured Excel Add-Ins for integrating Python code into Excel.
OpenPyXL covers more advanced features of Excel such as charts, styles, number formatting and conditional formatting. It even includes a tokeniser for parsing Excel formulas!
One really nice feature for writing reports is its built-in support for NumPy and Pandas data. To write a Pandas DataFrame all that’s required is the included ‘dataframe_to_rows’ function:
from openpyxl.utils.dataframe import dataframe_to_rows wb = Workbook() ws = wb.active for r in dataframe_to_rows(df, index=True, header=True): ws.append(r) wb.save('pandas_openpyxl.xlsx')
If you need to read Excel files to extract data then OpenPyXL can do that too. The Excel file types are incredibly complicated and openpyxl does an amazing job of reading them into a form that’s easy to access in Python. There are some things that openpyxl can’t load though, such as charts and images, so if you open a file and save it with the same name then some elements may be lost.
from openpyxl import load_workbook wb = load_workbook(filename = 'book.xlsx') sheet_ranges = wb['range names'] print(sheet_ranges['D18'].value)
A possible downside of OpenPyXL is that it can be quite slow for handling large files. If you have to write reports with thousands of rows and your application is time-sensitive then XlsxWriter or PyExcelerate may be better choices.
XlsxWriter
If you only need to write Excel workbooks and not read them then XlsxWriter is an easy to use package to use that works well. If you are working with large files or are particularly concerned about speed then you may find XlsxWriter a better choice than OpenPyXL.
XlsxWriter is a Python module that can be used to write text, numbers, formulas and hyperlinks to multiple worksheets in an Excel 2007+ XLSX file. It supports features such as formatting and many more, including:
- 100% compatible Excel XLSX files.
- Full formatting.
- Merged cells.
- Defined names.
- Charts.
- Autofilters.
- Data validation and drop down lists.
- Conditional formatting.
- Worksheet PNG/JPEG/BMP/WMF/EMF images.
- Rich multi-format strings.
- Cell comments.
- Textboxes.
- Integration with Pandas.
- Memory optimization mode for writing large files.
Writing Excel workbooks using XlsxWriter is simple enough. Cells can be written to using the Excel address notation (eg ‘A1’) or row and column numbers. Below is a basic example that shows creating a workbook, adding some data and saving it as an xlsx file.
import xlsxwriter workbook = xlsxwriter.Workbook('hello.xlsx') worksheet = workbook.add_worksheet() worksheet.write('A1', 'Hello world') workbook.close()
If you are using Pandas then you’ll want to use XlsxWriter’s Pandas integration. It takes the hard work out of writing Pandas DataFrames to Excel, and even creating charts.
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1'] # Create a chart object. chart = workbook.add_chart({'type': 'column'}) # Configure the series of the chart from the dataframe data. chart.add_series({'values': '=Sheet1!$B$2:$B$8'}) # Insert the chart into the worksheet. worksheet.insert_chart('D2', chart) # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Close the Pandas Excel writer and output the Excel file. writer.save()
When referencing the Pandas data in the worksheet (as the formula in the chart above does), you have to figure out where the data will be in the worksheet so that the formulas point to the correct cells. For reports involving a lot of formulas or charts this can become problematic as doing something as as simple as adding an extra row requires adjusting all affected formulas. For reports like that the package ‘xltable’ can help.
XLTable
XLTable is a higher level library for building Excel reports from pandas DataFrames. Rather than writing the workbook cell by cell or row by row, whole tables are added and can include formulas that reference other tables without having to know ahead of time where those tables will be. For more complex reports involving formulas xltable can be very useful.
The main feature that makes xltable more useful than just writing the Excel files directly is that it can handle tables with formulas that relate to cells in the workbook, without having to know in advance where those tables will be placed on a worksheet. Therefore only when all the tables have been added to the workbook and the workbook is being written are formulas resolved to their final cell addresses.
If you need to write a report that includes formulas rather than just data, XLTable makes it easier by tracking the cell references so you don’t have to construct the formulas by hand and worry about references changing when tables grow or new rows or columns are added.
from xltable import * import pandas as pd # create a dataframe with three columns where the last is the sum of the first two dataframe = pd.DataFrame({ 'col_1': [1, 2, 3], 'col_2': [4, 5, 6], 'col_3': Cell('col_1') + Cell('col_2'), }, columns=['col_1', 'col_2', 'col_3']) # create the named xltable Table instance table = Table('table', dataframe) # create the Workbook and Worksheet objects and add table to the sheet sheet = Worksheet('Sheet1') sheet.add_table(table) workbook = Workbook('example.xlsx') workbook.add_sheet(sheet) # write the workbook to the file using xlsxwriter workbook.to_xlsx()
XLTable can use either XlsxWriter to write an xlsx file, or it can use pywin32 (win32com) to write directly to an open Excel application (Windows only). Writing directly to Excel is good for interactive reports. For example, you could have a button in the Excel ribbon that a user could press to query some data and produce a report. By writing it directly to Excel they can get that report immediately in Excel without having it written to a file first. For details of how to customise the Excel ribbon in Excel see PyXLL: Customizing the Ribbon.
Pandas
For working with ranges of data and reading or writing them to Excel workbooks with no frills, using pandas can be a very quick and effective method. If you don’t need much in the way of formatting and just care about getting data into or out of Excel workbooks then the pandas functions “read_excel” and “to_excel” may be just what you need.
df = pd.DataFrame([ ('string1', 1), ('string2', 2), ('string3', 3) ], columns=['Name', 'Value']) # Write dataframe to an xlsx file df.to_excel('tmp.xlsx')
For more complex tasks because XlsxWriter, OpenPyXL and XLTable all have Pandas integration any of those can also be used to write Pandas DataFrames to Excel. But, for just getting data into Excel using Pandas directly as above is very convenient.
xlrd/xlwt
xlrd and xlwt read and write the old Excel .xls files respectively. These are included in this list for completeness, but are now really only used when you are forced to deal with the legacy xls file format. They are both extremely mature packages that are very capable and stable, but xlwt will never be extended to support the newer xlsx/xlsm file formats therefore for new code dealing with modern Excel file formats they are no longer the best choice.
pyexcel
If you need to deal with multiple file formats (eg xlsx, xls, ods or csv) then pyexcel can be used to handle all of them. It wraps some of the packages above (xlrd/xlwt, openpyxl and xlxswriter and others) to give a single consistent API regardless of the file format you are working with.
The pyexcel packages focuses on data rather than formatting, so if you are looking to produce high quality reports for Excel users then you should consider the alternatives, but if you need to extract data from a spreadsheet without worrying so much about the file type then this package will help you do that.
import pyexcel as pe records = pe.iget_records(file_name='your_file.xls') for record in records: print('%s is aged at %d' % (record['Name'], record['Age'])) # Prints... # Adam is aged at 28 # Beatrice is aged at 29 # Ceri is aged at 30 # Dean is aged at 26 pe.free_resources()
2 thoughts on “Tools for Working with Excel and Python”
Comments are closed.