This exam is motivated by the blog post by Peter Ellis on polls leading up to the Australian Federal election, and the most recent blog post from election day. A copy of the data can be downloaded from or read directly from here. Download and read the data into your R session.
   start_date
 Min.   :2007-11-20
 1st Qu.:2012-01-17
 Median :2013-09-03
 Mean   :2014-03-15
 3rd Qu.:2016-06-02
 Max.   :2019-05-15
# A tibble: 12 x 2
   firm                n
   <chr>           <int>
 1 Essential        1922
 2 Newspoll         1562
 3 Roy Morgan       1210
 4 ReachTEL          323
 5 Nielsen           234
 6 Ipsos             224
 7 Galaxy            192
 8 YouGov             63
 9 Election result    25
10 AMR                18
11 Lonergan            6
12 YouGov/Galaxy       6
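A count like the one above can be produced with `dplyr::count()`. The following is a minimal sketch on made-up data: only the column name `firm` is taken from the output, while the object name `polls` and all values are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for the poll data (values invented for illustration).
# Only the column name `firm` comes from the output shown above.
polls <- tibble(
  firm = c("Essential", "Newspoll", "Essential", "Ipsos", "Newspoll", "Essential"),
  intended_vote = c(48, 51, 47, 49, 52, 48.5)
)

# Count rows per firm and sort from most to fewest rows
firm_counts <- polls %>%
  count(firm, sort = TRUE)

firm_counts
```

Because the data is in long form, `n` counts rows rather than individual polls, so a firm that reports several parties or preference types per poll contributes several rows per poll.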
Newspoll is associated with The Australian newspaper, which is owned by the Murdoch media empire. However, according to Wikipedia (https://en.wikipedia.org/wiki/Newspoll), Newspoll is administered by Galaxy, and owned by the international market research and data analytics group YouGov. The latest polling information is displayed at http://www.newspoll.com.au, but the site does not give details on how the data is collected.
Essential is associated with the Guardian newspaper. They maintain a panel of 100,000 members, and draw about 1000 from this panel for interviews each week. They aim for a 50/50 male/female ratio among people aged over 18. Data is sourced from Your Source, another company.
Ipsos is a specialist polling organisation with no apparent affiliation with news organisations or political parties. In the most recent poll, they sampled 1,842 people, using random digit dialing of mobile phone numbers.
Roy Morgan is an Australian market research company. It is independent, and the company now operates globally. Their most recent polling data was collected by asking respondents “Regardless of who you have or will vote for who do you THINK will win the Federal Election?” Data was collected on 3,004 voters, by SMS.
Yes! It is in long form, where every measured value, intended_vote, is identified by several characteristics: dates, firm, preference type and party.
# A tibble: 12 x 3
   firm            first      last
   <chr>           <date>     <date>
 1 Newspoll        2007-11-20 2019-05-16
 2 Election result 2007-11-24 2016-07-02
 3 Roy Morgan      2010-07-17 2019-05-12
 4 Essential       2010-08-13 2019-05-14
 5 Nielsen         2010-08-21 2014-05-17
 6 Galaxy          2011-08-03 2019-04-25
 7 AMR             2013-03-22 2013-08-18
 8 ReachTEL        2013-05-02 2018-08-06
 9 Ipsos           2014-10-30 2019-05-15
10 Lonergan        2016-05-06 2016-05-08
11 YouGov          2017-06-22 2017-12-10
12 YouGov/Galaxy   2019-05-13 2019-05-15
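One way to obtain a table of this shape is to group by firm and take the earliest and latest dates. This sketch uses invented rows and assumes both columns are derived from `start_date`; the source does not show which date column was actually used for `last`.

```r
library(dplyr)

# Toy data (invented). `firm` and `start_date` are column names from the
# outputs above; using start_date for both ends is an assumption.
polls <- tibble(
  firm = c("Newspoll", "Newspoll", "AMR", "AMR", "Ipsos"),
  start_date = as.Date(c("2007-11-20", "2019-05-15",
                         "2013-03-22", "2013-08-18",
                         "2014-10-30"))
)

# Earliest and latest date per firm, ordered by when each firm first appears
firm_span <- polls %>%
  group_by(firm) %>%
  summarise(first = min(start_date), last = max(start_date)) %>%
  arrange(first)

firm_span
```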
The pollsters differ considerably in their operating time frames. The main ones (Newspoll, Roy Morgan and Essential) have been polling consistently for roughly a decade or more. Others have popped up and disappeared, e.g. AMR, which appears only in 2013. And Nielsen, which was a major operator, stopped conducting polls in 2014.
intended_vote, separately for each pollster, and sort from highest to lowest median value. (Be sure to drop the actual election results.) Write a few sentences explaining what you learn, particularly focusing on the initial question which relates to pollster bias.
The median intended vote varies among pollsters, with Nielsen generally providing much more favorable results for Lib/Nat, and Ipsos the least favorable. This suggests that the pollsters may be biased towards one political party or another, or that their collection methods sample different types of people.
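The comparison described above can be sketched as a per-pollster median, with the election results dropped and the firms sorted from highest to lowest. The data below is invented; only `firm`, `intended_vote` and the label "Election result" come from the source, and any filtering by party or preference type is omitted because those column names are not shown.

```r
library(dplyr)

# Toy data (invented values); real data would first be restricted to the
# Lib/Nat rows of the relevant preference type.
polls <- tibble(
  firm = c("Nielsen", "Nielsen", "Ipsos", "Ipsos",
           "Newspoll", "Newspoll", "Election result"),
  intended_vote = c(53, 52, 48, 49, 50, 51, 50.4)
)

# Drop the actual election results, then rank firms by median intended vote
firm_medians <- polls %>%
  filter(firm != "Election result") %>%
  group_by(firm) %>%
  summarise(med = median(intended_vote)) %>%
  arrange(desc(med))

firm_medians
```

The same ordering can be used to arrange side-by-side boxplots, so the firms read from most to least favorable.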
# A tibble: 4 x 2
  start_date     m
* <date>     <dbl>
1 2007-11-24  47.3
2 2010-08-21  49.9
3 2013-09-07  53.5
4 2016-07-02  50.4

# A tibble: 1 x 1
  `mean(m)`
      <dbl>
1      50.3
The percentage vote for Lib/Nat has varied at each election. In 2007 they lost to the ALP, and in 2010 they fell just short (49.9%) and the ALP formed a minority government; they attained government in 2013 and 2016. The average percentage across these four elections was 50.275. If the pollsters were unbiased, their polls would be expected to be centred around this value. That is clearly not the case for many pollsters, whose medians sit noticeably above or below this number.
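The baseline calculation can be sketched as follows. The four election values are those printed in the table above; the pollster medians (`Firm A`, `Firm B`) are hypothetical placeholders, included only to show how a deviation from the baseline would be computed.

```r
library(dplyr)

# Election results as printed above (Lib/Nat percentage at each election)
elections <- tibble(
  start_date = as.Date(c("2007-11-24", "2010-08-21",
                         "2013-09-07", "2016-07-02")),
  m = c(47.3, 49.9, 53.5, 50.4)
)

# Average election result, used as the reference point for the polls
baseline <- elections %>%
  summarise(mean_m = mean(m)) %>%
  pull(mean_m)

# Hypothetical pollster medians (invented) compared with the baseline
firm_medians <- tibble(firm = c("Firm A", "Firm B"), med = c(52.5, 48.5))
firm_bias <- firm_medians %>%
  mutate(diff = med - baseline)

firm_bias
```

A positive `diff` would indicate a pollster whose typical result favors Lib/Nat relative to the election average, and a negative one the reverse.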
loess smoother that will allow the reader to look at the rough average of the polls, and hence see how the voting public is trending over time. Overlay the actual election results (as points). Include a baseline at 50% that will show the critical juncture when the outcome would likely be a change in government. Coming into the election last Saturday (18/5/2019), what did it look like the result would be?
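A plot of the kind requested might be built along these lines with ggplot2. This is only a sketch on simulated data: the poll values are random numbers, not the real polls, so the picture says nothing about the actual 2019 trend. The single election point is the 2016 value from the table above, and reusing the column names `start_date` and `intended_vote` for it is an assumption.

```r
library(ggplot2)

# Simulated polls (NOT real data): 60 polls between the 2016 and 2019 elections
set.seed(1)
polls <- data.frame(
  start_date = seq(as.Date("2016-08-01"), as.Date("2019-05-15"), length.out = 60),
  firm = rep(c("Newspoll", "Essential", "Ipsos"), 20)
)
polls$intended_vote <- 50 - 2 * sin(seq(0, 3, length.out = 60)) + rnorm(60, sd = 1)

# Actual election result to overlay (2016 value from the table above)
elections <- data.frame(start_date = as.Date("2016-07-02"), intended_vote = 50.4)

p <- ggplot(polls, aes(x = start_date, y = intended_vote)) +
  geom_point(aes(colour = firm), alpha = 0.5) +               # individual polls
  geom_smooth(method = "loess", formula = y ~ x,
              se = FALSE, colour = "black") +                 # rough average of the polls
  geom_point(data = elections, colour = "red", size = 3) +    # election results as points
  geom_hline(yintercept = 50, linetype = "dashed")            # 50% baseline

p
```

Fitting one smoother across all firms, rather than one per firm, keeps the focus on the overall trend; mapping `colour = firm` only in the point layer is what achieves this.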