About this project

Questions? Email: stuart@stuartduncan.ca

This site does rather simplistic sentiment and toxicity analysis of the Canadian Parliament's Question Period. Each Question Period speech is analyzed for toxicity using Google's Perspective API, an AI-driven tool designed to measure the toxicity of online comments. Sentiment is measured using the TextBlob Python library, and the speech data itself is gathered via the incredibly useful Open Parliament API. Toxicity is scored on a scale of 0 to 1: the closer the score is to 1, the more toxic the speech is considered. Sentiment is scored as polarity between -1 and 1: a speech with polarity above 0 is considered positive (the closer to 1, the more positive), while a speech with polarity below 0 is considered negative (the closer to -1, the more negative). This project / site was created using Python, PHP, a bit of Bootstrap and MySQL. Please take the numbers around toxicity, negativity and positivity with a grain of salt, as this project uses imperfect, experimental tools in ways they weren't really intended.
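
For a sense of what the scoring looks like in practice, here is a minimal sketch of how a single speech might be scored. The Perspective request and response shapes follow Google's documented REST format, and TextBlob's polarity runs from -1 to 1, but the helper functions and the sample speech are just illustrations, not the site's actual code.

```python
# A rough illustration of scoring one speech. The helper names and sample
# text are made up; the Perspective request/response shape follows Google's
# documented REST format, and TextBlob polarity runs from -1 to 1.
import requests
from textblob import TextBlob

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def toxicity_score(text, api_key):
    """Return a toxicity score between 0 and 1 from the Perspective API."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
        "languages": ["en"],
    }
    resp = requests.post(PERSPECTIVE_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def sentiment_polarity(text):
    """Return TextBlob polarity: below 0 reads as negative, above 0 as positive."""
    return TextBlob(text).sentiment.polarity


speech = "The honourable member knows that is simply not the case."
print(sentiment_polarity(speech))
# print(toxicity_score(speech, "YOUR_PERSPECTIVE_API_KEY"))
```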

Lately I have been researching text analysis automation and have been particularly fascinated by automated sentiment analysis. One of the things I do in my role at the CBC is run live streams of Question Period. While watching one of these streams, I thought back to the Toronto Star's excellent interactive feature Parliament in Check and started thinking about ways I could use these text analysis tools on Question Period speeches.

Using Python and the Open Parliament API, I loaded all of the House of Commons speeches since the 2015 election (over 128,000 speeches and counting) into a MySQL database. Using the Perspective API and the TextBlob library, I then analyzed all of the Question Period speeches, roughly 40,000 of them, for toxicity and sentiment. I stuck with Question Period speeches because speeches outside of Question Period are pretty benign. Question Period speeches are also pretty short, which makes them much easier to analyze.
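
The collection step is essentially a paginated loop over the Open Parliament speeches endpoint that writes each speech into MySQL. The sketch below shows the general idea; the exact field names, table schema and connection details are my assumptions rather than the project's real code.

```python
# A sketch of the collection loop: page through the Open Parliament speeches
# endpoint and insert each speech into MySQL. The field names, table schema
# and credentials below are assumptions for illustration only.
import requests
import mysql.connector

API_BASE = "https://api.openparliament.ca"

conn = mysql.connector.connect(
    host="localhost", user="qp_user", password="secret", database="parliament"
)
cursor = conn.cursor()

url = API_BASE + "/speeches/?format=json&limit=100"
while url:
    page = requests.get(url).json()
    for speech in page.get("objects", []):
        cursor.execute(
            "INSERT IGNORE INTO speeches (source_url, politician, spoken_at, heading, body)"
            " VALUES (%s, %s, %s, %s, %s)",
            (
                speech.get("url"),
                speech.get("politician_url"),
                speech.get("time"),
                (speech.get("h2") or {}).get("en"),       # e.g. "Oral Questions"
                (speech.get("content") or {}).get("en"),  # the speech text
            ),
        )
    conn.commit()
    # follow pagination until there are no more pages
    next_path = (page.get("pagination") or {}).get("next_url")
    url = API_BASE + next_path if next_path else None
```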

My approach for this project is a bit problematic for a couple of reasons. The Perspective API is not designed to analyze human speech; it is built to analyze online comments, and the AI model it uses to decide whether a piece of text is toxic is trained on online comments. Text that might be considered toxic in an online space may not be considered toxic in the context of spoken debate. Automated sentiment analysis faces some of the same challenges. An eventual goal of the project is to highlight where these tools fail in this realm and to find tools that work better in this space.

In many ways this project was also a way for me to build back up my development skills, and it is very much done in my spare time, so it is all a bit rough. It is very much a work in progress, and there are a few things I would like to slowly pick away at: