About this project

Questions? Email: stuart@stuartduncan.ca

This site does rather simplistic sentiment and toxicity analysis of the Canadian Parliament's Question Period. Each Question Period speech is analyzed for toxicity using Google's Perspective API, an AI-driven tool designed to measure the toxicity of online comments. Sentiment is measured using the TextBlob Python library, and the speech data itself is gathered via the incredibly useful Open Parliament API. Toxicity is scored on a scale of 0 to 1: the closer the score is to 1, the more toxic the speech is considered. Sentiment is scored as polarity between -1 and 1: a speech with polarity above 0 is considered positive (the closer to 1, the more positive), while a speech with polarity below 0 is considered negative (the closer to -1, the more negative). This project / site was created using Python, PHP, a bit of Bootstrap and MySQL. Please take the numbers around toxicity, negativity and positivity with a grain of salt, as this project uses imperfect, experimental tools in ways they weren't really intended.
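
For a sense of what the scoring looks like in practice, here is a minimal sketch of how a single speech might be scored. The Perspective request and response shapes follow Google's documented REST format, and TextBlob's polarity runs from -1 to 1, but the helper functions and the sample speech are just illustrations, not the site's actual code.

```python
# A rough illustration of scoring one speech. The helper names and sample
# text are made up; the Perspective request/response shape follows Google's
# documented REST format, and TextBlob polarity runs from -1 to 1.
import requests
from textblob import TextBlob

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def toxicity_score(text, api_key):
    """Return a toxicity score between 0 and 1 from the Perspective API."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
        "languages": ["en"],
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def sentiment_polarity(text):
    """Return TextBlob polarity: below 0 reads as negative, above 0 as positive."""
    return TextBlob(text).sentiment.polarity


speech = "The honourable member knows that is simply not the case."
print(sentiment_polarity(speech))
# print(toxicity_score(speech, "YOUR_PERSPECTIVE_API_KEY"))
```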

Lately I have been researching text analysis automation and have been particularly fascinated by automated sentiment analysis. One of the things I do in my role at the CBC is run live streams of Question Period. While watching one of these streams, I thought back to the Toronto Star's excellent interactive feature Parliament in Check and started thinking about ways I could use these text analysis tools on Question Period speeches.

Using Python and the Open Parliament API, I loaded all of the House of Commons speeches since the 2015 election (over 128,000 speeches and counting) into a MySQL database. Using the Perspective API and the TextBlob library, I then analyzed all of the Question Period speeches, roughly 40,000 of them, for toxicity and sentiment. I stuck with Question Period speeches because speeches outside of Question Period are pretty benign. Question Period speeches are also pretty short, which makes them much easier to analyze.
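
The collection step is essentially a paginated loop over the Open Parliament speeches endpoint that writes each speech into MySQL. The sketch below shows the general idea; the exact field names, table schema and connection details are my assumptions rather than the project's real code.

```python
# A sketch of the collection loop: page through the Open Parliament speeches
# endpoint and insert each speech into MySQL. The field names, table schema
# and credentials below are assumptions for illustration only.
import requests
import mysql.connector

API_BASE = "https://api.openparliament.ca"

conn = mysql.connector.connect(
    host="localhost", user="qp_user", password="secret", database="parliament"
)
cursor = conn.cursor()

url = API_BASE + "/speeches/?format=json&limit=100"
while url:
    page = requests.get(url).json()
    for speech in page.get("objects", []):
        cursor.execute(
            "INSERT IGNORE INTO speeches (source_url, politician, spoken_at, heading, body)"
            " VALUES (%s, %s, %s, %s, %s)",
            (
                speech.get("url"),
                speech.get("politician_url"),
                speech.get("time"),
                (speech.get("h2") or {}).get("en"),       # e.g. "Oral Questions"
                (speech.get("content") or {}).get("en"),  # the speech text
            ),
        )
    conn.commit()
    # follow pagination until there are no more pages
    next_path = (page.get("pagination") or {}).get("next_url")
    url = API_BASE + next_path if next_path else None
```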

My approach for this project is a bit problematic for a couple of reasons. The Perspective API is not designed to analyze human speech; it is built to analyze online comments, and the AI model it uses to decide whether a piece of text is toxic is trained on online comments. Text that might be considered toxic in an online space may not be considered toxic in the context of spoken debate. Automated sentiment analysis faces some of the same challenges. An eventual goal of the project is to highlight where these tools fail in this realm and to find tools that work better in this space.

In many ways this project was also a way for me to build back up my development skills, and it is very much done in my spare time, so it is all a bit rough. It is very much a work in progress, and there are a few things I would like to slowly pick away at: