PyXLL 3.0 introduced a new, simpler way of streaming real time data to Excel from Python.
Excel has had support for real time data (RTD) for a long time, but it requires a certain knowledge of COM to get it to work. With the new RTD features in PyXLL 3.0 it is now a lot simpler to get streaming data into Excel without having to write any COM code.
This post shows how to build a simple real time data feed from Twitter in Python using the tweepy package, and then how to stream that data into Excel using PyXLL.
As we are interested in real time data we will use tweepy’s streaming API. Details on this are available in the tweepy documentation here http://tweepy.readthedocs.org/en/v3.5.0/streaming_how_to.html.
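If you don’t already have tweepy installed, you can install it using pip. The listings in this post use the tweepy 3.x StreamListener API, which was removed in tweepy 4.0, and the is_async keyword, which needs tweepy 3.7 or later, so install a matching 3.x release:

    pip install "tweepy>=3.7,<4"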
The code from this blog is available on github https://github.com/pyxll/pyxll-examples/tree/master/twitter.
Getting Twitter API keys
To access the Twitter Streaming API you will need a Twitter API key, API secret, access token and access token secret. Follow the steps below to get your own access tokens.
- Create a twitter account if you do not already have one.
- Go to https://apps.twitter.com/ and log in with your twitter credentials.
- Click “Create New App”.
- Fill out the form, agree to the terms, and click “Create your Twitter application”.
- On the next page, click the “API keys” tab, and copy your “API key” and “API secret”.
- Scroll down and click “Create my access token”, and copy your “Access token” and “Access token secret”.
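Once you have the four values, it is usually safer to keep them out of the source file. The following is a minimal sketch, assuming the credentials are stored in environment variables; the variable names are hypothetical and not something Twitter or tweepy requires.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=None):
    """Return the four credentials as a dict, or raise if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_KEYS}
```

The returned values could then be used in place of the hard-coded placeholder strings in the listings that follow.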
Streaming Tweets in Python
To start with, we can create a simple listener class that just prints tweets as they arrive:

    from tweepy.streaming import StreamListener
    from tweepy import OAuthHandler
    from tweepy import Stream
    import logging

    _log = logging.getLogger(__name__)

    # User credentials to access Twitter API
    access_token = "YOUR ACCESS TOKEN"
    access_token_secret = "YOUR ACCESS TOKEN SECRET"
    consumer_key = "YOUR CONSUMER KEY"
    consumer_secret = "YOUR CONSUMER KEY SECRET"


    class TwitterListener(StreamListener):

        def __init__(self, phrases):
            auth = OAuthHandler(consumer_key, consumer_secret)
            auth.set_access_token(access_token, access_token_secret)
            self.__stream = Stream(auth, listener=self)
            # 'is_async' (named 'async' before tweepy 3.7) runs the
            # stream on a background thread
            self.__stream.filter(track=phrases, is_async=True)

        def disconnect(self):
            self.__stream.disconnect()

        def on_data(self, data):
            print(data)

        def on_error(self, status):
            print(status)


    if __name__ == '__main__':
        import time
        logging.basicConfig(level=logging.INFO)

        phrases = ["python", "excel", "pyxll"]
        listener = TwitterListener(phrases)

        # listen for 60 seconds then stop
        time.sleep(60)
        listener.disconnect()
If we run this code any tweets mentioning Python, Excel or PyXLL get printed:
    python twitterxl.py
    INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): stream.twitter.com
    {"text": "Excel keyboard shortcut - CTRL+1 to bring up Cell Formatting https://t.co/wvx634EpUy", "is...
    {"text": "Excel Tips - What If Analysis #DMZWorld #Feature #Bond #UMI https://t.co/lxzgZnIItu #UMI",...
    {"text": "How good are you at using #Excel? We're looking for South Africa's #ExcelChamp Ts & Cs...
    {"text": "The Best Data Scientists Run R and Python - insideBIGDATA https://t.co/rwty058dL2 #python ...
    {"text": "How to Create a Pivot Table in Excel: A Step-by-Step Tutorial (With Video) \u2013 https://...
    {"text": "Python eats Alligator 02, Time Lapse Speed x6 https://t.co/3km8I92zJo", "is_quote_status":...
    ...
    Process finished with exit code 0
In order to make this more suitable for getting these tweets into Excel we will now extend this TwitterListener class in the following ways:
- Broadcast updates to other subscribers instead of just printing tweets.
- Keep a buffer of the last few received tweets.
- Only ever create one listener for each unique set of phrases.
- Automatically disconnect listeners with no subscribers.
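The first, second and fourth points can be tried out without any Twitter connection. The following is a minimal sketch using a hypothetical Broadcaster class (it is not part of tweepy or PyXLL): it fans each item out to its subscribers, keeps the most recent items newest first, and reports when the last subscriber has gone. It uses collections.deque for the buffer, which is one possible choice rather than the approach taken in the listing in this post.

```python
import threading
from collections import deque


class Broadcaster(object):
    """Hypothetical stand-in for the listener: fans items out to
    subscribers and remembers the most recent ones, newest first."""

    def __init__(self, max_size=100):
        self._lock = threading.RLock()
        self._subscribers = set()
        # with maxlen set, appendleft() on a full deque drops the
        # item at the right-hand end, i.e. the oldest one
        self._recent = deque(maxlen=max_size)

    def subscribe(self, subscriber):
        with self._lock:
            self._subscribers.add(subscriber)

    def unsubscribe(self, subscriber):
        """Remove a subscriber; return True if none are left."""
        with self._lock:
            self._subscribers.discard(subscriber)
            return not self._subscribers

    def publish(self, item):
        with self._lock:
            self._recent.appendleft(item)
            # iterate over a copy so a callback may unsubscribe safely
            for subscriber in list(self._subscribers):
                subscriber(item)

    @property
    def recent(self):
        with self._lock:
            return list(self._recent)
```

Here subscribers are plain callables; a caller would disconnect the underlying stream when unsubscribe returns True.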
The updated TwitterListener class is as follows:
    # additional imports needed by this version
    import json
    import threading


    class TwitterListener(StreamListener):
        """tweepy.StreamListener that notifies multiple subscribers when
        new tweets are received and keeps a buffer of the last 100 tweets
        received.
        """
        __listeners = {}  # class level cache of listeners, keyed by phrases
        __lock = threading.RLock()
        __max_size = 100

        @classmethod
        def get_listener(cls, phrases, subscriber):
            """Fetch a TwitterListener listening to a set of phrases and subscribe to it"""
            with cls.__lock:
                # get the listener from the cache or create a new one
                phrases = frozenset(map(str, phrases))
                listener = cls.__listeners.get(phrases, None)
                if listener is None:
                    listener = cls(phrases)
                    cls.__listeners[phrases] = listener

                # add the subscription and return
                listener.subscribe(subscriber)
                return listener

        def __init__(self, phrases):
            """Use class method 'get_listener' instead of constructing directly"""
            _log.info("Creating listener for [%s]" % ", ".join(phrases))
            self.__phrases = phrases
            self.__subscriptions = set()
            self.__tweets = [None] * self.__max_size

            # listen for tweets in a background thread using the 'is_async' keyword
            auth = OAuthHandler(consumer_key, consumer_secret)
            auth.set_access_token(access_token, access_token_secret)
            self.__stream = Stream(auth, listener=self)
            self.__stream.filter(track=phrases, is_async=True)
            self.__connected = True

        @property
        def tweets(self):
            return list(self.__tweets)

        def subscribe(self, subscriber):
            """Add a subscriber that will be notified when new tweets are received"""
            with self.__lock:
                self.__subscriptions.add(subscriber)

        def unsubscribe(self, subscriber):
            """Remove subscriber added previously.
            When there are no more subscribers the listener is stopped.
            """
            with self.__lock:
                self.__subscriptions.remove(subscriber)
                if not self.__subscriptions:
                    self.disconnect()

        def disconnect(self):
            """Disconnect from the twitter stream and remove from the cache of listeners."""
            with self.__lock:
                if self.__connected:
                    _log.info("Disconnecting twitter stream for [%s]" % ", ".join(self.__phrases))
                    self.__listeners.pop(self.__phrases)
                    self.__stream.disconnect()
                    self.__connected = False

        @classmethod
        def disconnect_all(cls):
            """Disconnect all listeners."""
            with cls.__lock:
                for listener in list(cls.__listeners.values()):
                    listener.disconnect()

        def on_data(self, data):
            data = json.loads(data)
            with self.__lock:
                # newest tweet first, trimmed to the buffer size
                self.__tweets.insert(0, data)
                self.__tweets = self.__tweets[:self.__max_size]
                for subscriber in self.__subscriptions:
                    try:
                        subscriber.on_data(data)
                    except Exception:
                        _log.error("Error calling subscriber", exc_info=True)
            return True

        def on_error(self, status):
            with self.__lock:
                for subscriber in self.__subscriptions:
                    try:
                        subscriber.on_error(status)
                    except Exception:
                        _log.error("Error calling subscriber", exc_info=True)


    if __name__ == '__main__':
        import time
        logging.basicConfig(level=logging.INFO)

        class TestSubscriber(object):
            """simple subscriber that just prints tweets as they arrive"""

            def on_error(self, status):
                print("Error: %s" % status)

            def on_data(self, data):
                print(data.get("text"))

        subscriber = TestSubscriber()
        listener = TwitterListener.get_listener(["python", "excel", "pyxll"], subscriber)

        # listen for 60 seconds then stop
        time.sleep(60)
        listener.unsubscribe(subscriber)
When this is run the output is very similar to the last case, except that now only the text part of each tweet is printed. Also note that the listener is not explicitly disconnected; that happens automatically when the last subscriber unsubscribes.

    python twitterxl.py
    INFO:__main__:Creating listener for [python, excel, pyxll]
    INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): stream.twitter.com
    Linuxtoday Make a visual novel with Python: Linux User & Developer: Bridge the gap between books...
    How to create drop down list in excel https://t.co/Ii2hKRlRBe...
    RT @papisdotio: Flying dron with Python @theglamp #PAPIsConnect https://t.co/zzPNSFb66e...
    RT @saaid230: The reason I work hard and try to excel at everything I do so one day I can take care ...
    RT @javacodegeeks: I'm reading 10 Awesome #Python Tutorials to Kick-Start my Web #Programming https:...
    INFO:__main__:Disconnecting twitter stream for [python, excel, pyxll]
    Process finished with exit code 0
Getting the Data into Excel
Now that the hard part of getting the streaming twitter data into Python is taken care of, creating a real time data source in Excel using PyXLL is pretty straightforward.
PyXLL 3.0 has a new class, RTD. When a function decorated with the xl_func decorator returns an RTD instance, the value of the calling cell will be the value property of the RTD instance. If the value property of the returned RTD instance later changes, the cell value changes.
We will write a new class inheriting from RTD that acts as a subscriber to our twitter stream (in the same way as TestSubscriber in the code above). Whenever a new tweet is received it will update its value, and so the cell in Excel will update.
    from pyxll import RTD


    class TwitterRTD(RTD):
        """Twitter RTD class that notifies Excel whenever a new tweet is received."""

        def __init__(self, phrases):
            # call super class __init__ with an initial value
            super(TwitterRTD, self).__init__(value="Waiting for tweets...")

            # get the TwitterListener and subscribe to it
            self.__listener = TwitterListener.get_listener(phrases, self)

        def disconnect(self):
            # overridden from RTD base class. Called when Excel no longer
            # needs the RTD object (for example, when the cell formula
            # is changed).
            self.__listener.unsubscribe(self)

        def on_error(self, status):
            self.value = "#ERROR %s" % status

        def on_data(self, data):
            self.value = data.get("text")
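The subscriber logic can also be exercised outside Excel. The sketch below is only an imitation of the idea, not PyXLL's RTD class: ObservableValue is a hypothetical stand-in that calls a callback whenever its value property is assigned, where PyXLL would notify Excel instead.

```python
class ObservableValue(object):
    """Hypothetical stand-in (not PyXLL's RTD): calls `on_change`
    every time `value` is assigned."""

    def __init__(self, value=None, on_change=None):
        self._value = value
        self._on_change = on_change

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        if self._on_change is not None:
            self._on_change(new_value)


class TweetText(ObservableValue):
    """Subscriber exposing the on_data/on_error methods the listener calls."""

    def on_data(self, data):
        self.value = data.get("text")

    def on_error(self, status):
        self.value = "#ERROR %s" % status


updates = []
cell = TweetText(value="Waiting for tweets...", on_change=updates.append)
cell.on_data({"text": "hello"})
cell.on_error(420)
```

After these calls, updates holds each value the "cell" would have displayed, in order.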
To expose that to Excel all that’s needed is a function that returns an instance of our new TwitterRTD class:

    import itertools
    from pyxll import xl_func


    @xl_func("string[] phrases: rtd")
    def twitter_listen(phrases):
        """Listen for tweets containing certain phrases"""
        # flatten the 2d list of lists into a single list of phrases
        phrases = [str(x) for x in itertools.chain(*phrases) if x]
        assert len(phrases) > 0, "At least one phrase is required"

        # return our TwitterRTD object that will update when a tweet is received
        return TwitterRTD(phrases)
All that’s required now is to add the module to the pyxll.cfg file, and then the new function ‘twitter_listen’ will appear in Excel, and calling it will return a live stream of tweets.
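As a rough illustration, the relevant part of pyxll.cfg might look like the fragment below. The module name twitterxl matches the script name used earlier; the folder path is a hypothetical placeholder for wherever the file is saved.

```ini
[PYTHON]
; hypothetical folder containing twitterxl.py
pythonpath = C:\path\to\twitter

[PYXLL]
modules = twitterxl
```

With that in place, a formula such as =twitter_listen(A1:A3), where A1:A3 holds the phrases to track, should start showing tweets in the cell.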
A More Complete Twitter Feed
So far we’ve got live tweets streaming into Excel, which is pretty cool, but only one tweet is visible at a time and we can only see the tweet text. It would be even better to see a grid of data showing the most recent tweets with some metadata as well as the tweet itself.
RTD functions always return just a single cell of data, so what we need to do is write a slightly different function that takes a couple more arguments: a key for the part of the tweet we want (e.g. ‘text’ or ‘created_at’) and an index (e.g. 0 for the latest tweet, 1 for the second most recent tweet, etc.).
As some interesting bits of metadata are in nested dictionaries in the twitter data, the ‘key’ used to select the item from the data dictionary is a ‘/’ delimited list of keys used to drill into tweet data (for example, the name of the user is in the sub-dictionary ‘user’, so to retrieve it the key ‘user/name’ would be used).
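The drill-down itself can be sketched on its own. The helper name get_nested and the sample tweet are made up for illustration; only the ‘/’ delimited key convention comes from the text.

```python
def get_nested(data, key, default=""):
    """Walk a dict of dicts using a '/' delimited key such as 'user/name'."""
    value = data
    for part in key.split("/"):
        # stop as soon as the path leaves the nested dictionaries
        if not isinstance(value, dict) or part not in value:
            return default
        value = value[part]
    return value


# made-up sample data with the same shape as a tweet
tweet = {"text": "hello", "user": {"name": "Ada"}}
print(get_nested(tweet, "text"))       # hello
print(get_nested(tweet, "user/name"))  # Ada
```

A missing key, or a path that runs into a non-dictionary value, returns the default (an empty string here) rather than raising.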
The TwitterListener class we’ve written already keeps a limited history of the tweets it’s received so this isn’t too much more than we’ve already done.
    class TwitterRTD(RTD):
        """Twitter RTD class that notifies Excel whenever a new tweet is received."""

        def __init__(self, phrases, row=0, key="text"):
            super(TwitterRTD, self).__init__(value="Waiting for tweets...")
            # set these before subscribing, as on_data may be called from
            # the background thread as soon as the subscription is added
            self.__row = row
            self.__key = key
            self.__listener = TwitterListener.get_listener(phrases, self)

        def disconnect(self):
            self.__listener.unsubscribe(self)

        def on_error(self, status):
            self.value = "#ERROR %s" % status

        def on_data(self, data):
            # if there are no tweets for this row return an empty string
            tweets = self.__listener.tweets
            if len(tweets) <= self.__row or not tweets[self.__row]:
                self.value = ""
                return

            # get the value from the tweets
            value = tweets[self.__row]
            for key in self.__key.split("/"):
                if not isinstance(value, dict):
                    value = ""
                    break
                value = value.get(key, {})

            # set the value back in Excel
            self.value = str(value)
The worksheet function also has to be updated to take these extra arguments:

    @xl_func("string[] phrases, int row, string key: rtd")
    def twitter_listen(phrases, row=0, key="text"):
        """Listen for tweets containing certain phrases"""
        # flatten the 2d list of lists into a single list of phrases
        phrases = [str(x) for x in itertools.chain(*phrases) if x]
        assert len(phrases) > 0, "At least one phrase is required"

        # return our TwitterRTD object that will update when a tweet is received
        return TwitterRTD(phrases, row, key)
After reloading the PyXLL addin, or restarting Excel, we can now call this modified function with different values for row and key to build an updating grid of live tweets.
One final step is to make sure that any active streams are disconnected when Excel closes. Otherwise the tweepy background thread can stop Excel from exiting cleanly.

    from pyxll import xl_on_close


    @xl_on_close
    def disconnect_all_listeners():
        TwitterListener.disconnect_all()
Other Use Cases
Getting real time tweets into Excel is interesting, but it may not be that useful in practice. Real time data in Excel has many other real-world applications, however, for example:
- Streaming market data such as stock prices from an external data provider
- Retrieving data from a web application
- Returning results from a long running function on a background thread
- Live web traffic monitoring
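As an illustration of the background-thread case, here is a minimal sketch using a hypothetical BackgroundResult class. It is a plain Python object, not a PyXLL class; in a real add-in the same shape would presumably be a class derived from RTD, as TwitterRTD is above, so that assigning value updates the cell.

```python
import threading


class BackgroundResult(object):
    """Hypothetical holder: runs `func` on a thread and stores the
    outcome in `value` once it has finished."""

    def __init__(self, func, *args):
        # placeholder shown until the calculation completes
        self.value = "Calculating..."
        self._thread = threading.Thread(target=self._run, args=(func,) + args)
        self._thread.daemon = True
        self._thread.start()

    def _run(self, func, *args):
        try:
            self.value = func(*args)
        except Exception as exc:
            # report failures as a value rather than losing them on the thread
            self.value = "#ERROR %s" % exc

    def wait(self, timeout=None):
        """Block until the thread finishes (or times out) and return the value."""
        self._thread.join(timeout)
        return self.value


result = BackgroundResult(sum, [1, 2, 3])
print(result.wait(5))  # 6
```

The wait method is only there to make the sketch easy to try from a script; a worksheet function would return immediately and let the value update later.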