Language of Power

Project Info

Team Name


Team Members

Aqeel and 2 other members with unpublished profiles.

Project Description

Insights from the Thaum AI team.

We ran our NLP model on over 150,000 transcripts of Australian politicians and generated visualisations that display the sentiment (emotional tone) of public discourse on marginal topics. We hope that, with this information and an AI-driven knowledge base, we can be more mindful of our language, how it is perceived, and the effect it has as it co-evolves over time with culture, disasters, protest, and media.

Ultimately, we hope this allows us to better understand and communicate with each other, building empathy, respect, and self-awareness. I truly believe we're going to have a good old age.

#natural language processing #nlp #sentiment analysis #machine learning #artificial intelligence #social awareness #empathy #psychology

Data Story

Our primary focus was collecting and cleaning as much publicly available political text data as possible.
Our primary datasets were the following:

Australian Parliament - Record of Proceedings - Hansard API
We found the Australian Parliament Hansard API to be very complete and easy to use. However, the Hansard XML files contain a lot of extraneous data that needed to be cleansed before we could apply NLP to them.
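The cleansing step described above might look roughly like the following. This is a minimal sketch, not our actual pipeline: the element names (`speech`, `talker`, `name`, `party`, `para`) and the sample fragment are assumptions made for illustration, and the real Hansard schema is richer and varies between periods.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified Hansard-like fragment (element names are assumed).
SAMPLE_XML = """
<hansard>
  <speech>
    <talker><name>Jane Citizen</name><party>ALP</party></talker>
    <para>Mr Speaker,   I rise to support
    this bill.</para>
    <para>It is long overdue.</para>
  </speech>
</hansard>
"""

def extract_speeches(xml_text):
    """Return one dict per speech with speaker, party, and whitespace-normalised text."""
    root = ET.fromstring(xml_text.strip())
    speeches = []
    for speech in root.iter("speech"):
        name = speech.findtext("talker/name", default="").strip()
        party = speech.findtext("talker/party", default="").strip()
        # itertext() flattens any inline markup nested inside a paragraph.
        paras = ["".join(p.itertext()) for p in speech.iter("para")]
        # Collapse line breaks and repeated spaces left over from the XML layout.
        text = re.sub(r"\s+", " ", " ".join(paras)).strip()
        speeches.append({"speaker": name, "party": party, "text": text})
    return speeches

speeches = extract_speeches(SAMPLE_XML)
```

Keeping the party alongside each speech at this stage makes the later party-conditioned analyses a simple grouping problem.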
We had three use cases for this dataset:
(1) general sentiment analysis conditioned on party affiliation,
(2) linguistic and semantic choices conditioned on party affiliation,
(3) language and sentiment expression depending on debate topic.
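As an illustration of what "conditioned on" means in the use cases above, the sketch below averages a sentiment score per group, where the group can be party or debate topic. The tiny word lists and the scoring rule are placeholders assumed for this example; they stand in for a trained NLP model and are not the model we used.

```python
from collections import defaultdict

# Toy lexicon, assumed for illustration only.
POSITIVE = {"support", "good", "proud", "welcome"}
NEGATIVE = {"fail", "failure", "crisis", "shameful"}

def sentiment_score(text):
    """Return (positive - negative) word count divided by total tokens."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def mean_sentiment_by(speeches, key):
    """Average the sentiment score of speeches grouped by `key` (e.g. 'party' or 'topic')."""
    totals = defaultdict(lambda: [0.0, 0])
    for speech in speeches:
        bucket = totals[speech[key]]
        bucket[0] += sentiment_score(speech["text"])
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

example = [
    {"party": "A", "topic": "health", "text": "We support this good bill"},
    {"party": "B", "topic": "health", "text": "This is a shameful failure"},
]
by_party = mean_sentiment_by(example, "party")
by_topic = mean_sentiment_by(example, "topic")
```

The same grouping function serves use cases (1) and (3); only the key changes.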

Commonwealth Parliamentary Debates (Hansard), 1901-1980
We used this dataset to test our approach to the above while we obtained newer data.

PM Transcripts repository
This dataset had several significant issues that needed to be overcome before we could use it. In particular, the content is often in interview format, which means that any analysis of PM language and sentiment would be mixed with that of the interviewer. Parsing this data proved difficult because the transcription formats varied considerably, so we had to drop a lot of data. Furthermore, much of the data was transcribed using OCR, which did a poor job. Our workaround was to apply an automatic spellchecker, but this is far from a perfect solution.
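To give a sense of the spellchecking workaround and its limits, here is a minimal sketch using fuzzy matching from Python's standard library. It is not the spellchecker we used: the vocabulary list and the `cutoff` value are assumptions chosen for the example, and a real run would need a large dictionary.

```python
import difflib
import re

# Tiny illustrative vocabulary (assumed); a real one would hold many thousands of words.
VOCABULARY = ["government", "parliament", "minister", "australia", "policy", "the", "of"]

def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace a word with its closest vocabulary entry, if one is similar enough."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    # Corrected words come back lowercased; unknown words are left untouched.
    return matches[0] if matches else word

def correct_text(text):
    """Apply correct_word to every alphabetic token, leaving punctuation in place."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)

fixed = correct_text("the govemment of australia")
```

Names and rare words that are missing from the vocabulary pass through unchanged here, but with a larger dictionary and a looser cutoff they can be "corrected" into the wrong word, which is one reason this approach is imperfect.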
We used this for many different analyses: (1) PM sentiment, emotion and fine-emotion over time, (2) identifying PM rhetoric, (3) analysis of crisis response and language.
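For the PM-focused analyses, the interviewer's turns have to be separated out first. The sketch below assumes one common layout, where each turn starts with an upper-case speaker label followed by a colon. The labels `PRIME MINISTER` and `PM` are assumptions for illustration; transcripts in other layouts would not parse this way, which is the kind of data we ended up dropping.

```python
import re

# A turn starts with an upper-case label such as "PRIME MINISTER:" (assumed format).
SPEAKER_LINE = re.compile(r"^(?P<label>[A-Z][A-Z .'-]+):\s*(?P<text>.*)$")

def split_turns(transcript):
    """Split a transcript into (speaker_label, text) turns."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append([match.group("label").strip(), match.group("text")])
        elif turns:
            # Unlabelled lines continue the previous speaker's turn.
            turns[-1][1] += " " + line
    return [(label, text.strip()) for label, text in turns]

def pm_only(transcript, pm_labels=("PRIME MINISTER", "PM")):
    """Keep only the text spoken under one of the PM labels."""
    return " ".join(text for label, text in split_turns(transcript) if label in pm_labels)

sample = (
    "JOURNALIST: Prime Minister, what is your response?\n"
    "PRIME MINISTER: We are acting quickly.\n"
    "More support is coming.\n"
    "JOURNALIST: Thank you."
)
pm_text = pm_only(sample)
```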

Federal Election speeches

Parliamentary press releases relating to immigrants and refugees
We used this to determine keywords for searching for crisis response language.
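One simple way to derive such keywords is to rank terms that are over-represented in the press releases relative to a background corpus. The sketch below shows that idea under stated assumptions: the frequency-ratio score, the add-one smoothing, and the `min_count` threshold are illustrative choices, not a record of the method we applied.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def distinctive_terms(target_docs, background_docs, top_n=5, min_count=2):
    """Rank terms by how much more frequent they are in target_docs than in background_docs."""
    target = Counter(t for doc in target_docs for t in tokenize(doc))
    background = Counter(t for doc in background_docs for t in tokenize(doc))
    target_total = sum(target.values())
    background_total = sum(background.values())
    scores = {}
    for term, count in target.items():
        if count < min_count:
            continue  # skip terms too rare to trust
        target_rate = count / target_total
        # Add-one smoothing so terms absent from the background do not divide by zero.
        background_rate = (background[term] + 1) / (background_total + 1)
        scores[term] = target_rate / background_rate
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

press_releases = [
    "Refugee arrivals and border protection",
    "New refugee visa policy for border security",
]
other_speeches = [
    "The budget and the economy",
    "Policy on the economy and health",
]
keywords = distinctive_terms(press_releases, other_speeches)
```

The resulting terms can then be used as search queries against the larger transcript collections.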

Evidence of Work



Project Image

Team DataSets

Senate and House Hansards 1980-2020

Description of Use: Text aggregation for political sentiment and cultural bias analysis.

Data Set

Public Transcripts from the PM

Data Set

Hansard up to 1980s

Data Set


Challenge Entries

The language of leadership

In times of crisis words can inspire and unite us, but they can also provoke division and conflict. How has the language of Australia’s leaders changed over time? How can we represent these changes in public discourse within a historical timeline?

7 teams have entered this challenge.

Awareness, understanding and respect – How can Open Data help the #BLM movement?

The Black Lives Matter (#BLM) movement is not new, neither are racial injustices. However, in 2020 a series of racially motivated deaths, brutality and profiling in the US sent shockwaves around the world. Over 15 Million people took to the streets around the world to protest, and demand change. What can Open Government Data do to help the movement?

5 teams have entered this challenge.