The project is designed to give you experience collecting or finding your own dataset, determining the appropriate questions to answer about the data, and planning how to execute analysis of the data. The project involves several parts. The project represents 30% of your final grade for ETC5510.

  1. Locate a suitable data source and determine appropriate questions that could be answered using this data. It cannot be a data set from Kaggle. It needs to be from an original source. If it is in CSV format, there needs to be more than one file or multiple sheets. Challenge yourself to work with data addressing a problem in today’s world!

  2. Clean your data so that it can be used to answer your questions. This is the most important part to illustrate in your project, because we expect you to demonstrate your ability to take a messy data set and organise it for later analysis.

  3. Simple analysis using methods covered in class: exploratory data analysis, numerical and visual summaries of the data, and the application of basic modeling strategies. The focus is on trying to answer some of the questions you posed. You are not expected to answer all of them if you have a long list of questions.

  4. Describe your cleaning procedures and analytics in a web story board, which can be a slideshow created with xaringan, a flexdashboard, or a simple shiny app. You should include why you chose the data and what you learned about the problem by completing this project. We can upload these to the departmental shiny server for everyone to see, and so that you can show it off to future employers or your family members.

  5. Present your data analysis in class, as a 5-10 minute oral presentation, pre-recorded with Zoom. You will upload your presentation to Moodle as a video assignment, and we will watch the presentations during the last class in week 12.

This project will be conducted collaboratively, with a team of your choice and a maximum team size of 4. To ensure correct marks are awarded, please carefully document, in detail, your individual contributions to the project. Each team member is expected to participate substantially in all aspects of the work, including the writing and oral presentation.

| Due date | Turn in | Points |
|---|---|---|
| Milestone 1: 13th May | Prospective team members and topics | 5 |
| Milestone 2: 20th May | Team members and team name, and paragraph describing possible data sets, with links to the data sources. | 5 |
| Milestone 3: 27th May | Electronic copy of your data, and a page of data description, and cleaning done, or needing to be done. | 10 |
| Milestone 4: 5th June | Final version of story board uploaded | 40 |
| Milestone 5: (Video submission 9th June, Presentation Class 10th June) | Project presentations during class periods. All students are expected to attend, and points will be deducted for non-attendance. | 30 (peer evaluation). 5 points will be deducted from your presentation score if you do not attend for the entire class, and 5 points if you skip the class where you did not present. |

Marking Guide for flexdashboard/xaringan/shiny

To help you complete the project, below is a rubric showing what we expect in your final dashboard:

| Content | Description | Excellent (HD) | Very good (D) | Good (C) | Satisfactory (P) | Unsatisfactory (F) |
|---|---|---|---|---|---|---|
| Landing page | Explanation of the problem of interest (10%) | Motivation and explanation of problem of interest, to communicate the scenario. Outline encouraging exploration of other sections. Data sources explained, including limitations that might affect possible analysis and conclusions. | Explanation of problem of interest is very clear and provides information about the scenario. | Explanation of problem of interest is clear and provides information about the scenario. | Explanation of problem of interest is rudimentary and lacks detail. | Explanation of the problem is unclear and/or not shown. There is no explanation of the problem to be solved. |
| Methodology tab | Rationale for data selection and learnings from analysis. Description of cleaning procedures and analytics (20%) | List of questions being addressed by the storyboard. Key parts of analysis clearly explained. Detailed and concise explanation of data being used and what was observed with a comprehensive explanation of the methods used to tidy and wrangle data including reasons. | A description of the chosen data and includes an informed rationale for its use. Learnings from analysis are clearly articulated. | Description and rationale is soundly presented and demonstrates some learnings from analysis. | Description and rationale is reasonably presented and lists basic learnings from analysis. | Description of chosen data is unclear and/or not shown and demonstrates no or little learning. |
| Analysis components | Components that communicate the analysis being carried out (30%) | Separate components or tabs, that clearly correspond to each of the main questions being addressed. Highly appropriate choice of data plots included, numerical summaries and application of models. Exceptional user interaction elements appropriate for helping to explore the data, matching different aspects of the questions of interest. | Separate components or tabs, that correspond to each of the main questions being addressed. Appropriate choice of data plots and user interaction elements for exploring data. | Separate components or tabs, that correspond to each of the main questions being addressed. Reasonable choice of data plots and user interaction elements. | Separate components or tabs correspond to main questions being addressed, but lacks detail. Rudimentary choice of data plots and user interaction elements. | Separate components or tabs not used and/or main questions being addressed is unclear. Inappropriate plot choice and/or not used. Inappropriate user interaction elements and/or not used or do not allow for exploration of data. |
| User interface and instruction | Component, features and instruction that allow the user to interact with dashboard (20%) | Functional components and user interface that is intuitive to use, and easily enables the exploration of the data. Clear information on how the user is to interact with the dashboard, and easily accessible. Excellent use of interactive elements, such as mouse over, or animation to help communicate additional information. | Functional components and commendable user interface that enables the exploration of the data. Detailed information on how the user is to interact with the dashboard, a disclaimer and a description is provided. | Effective components and effective user interface that enables the exploration of the data. Information on how the user is to interact with the dashboard, but lacks detail. | Rudimentary components and user interface that enables limited exploration of the data. Some user instruction provided. | Little or no information on how the user is to interact with the dashboard. Disclaimer or description is not available or accessible. |
| Visualisation | Graphical representation of data (20%) | Choice of plots match the required analysis and problem being studied. Appropriate mappings of variables to plot elements. Use of proximity and similarity and cognitive principles in plot design. Neatly labelled axes and legends. Annotations on plot as needed to indicate important features, e.g. outliers labelled. Excellent use of interactive elements, such as mouse over, or animation to help communicate additional information. | The dashboard presented incorporates commendable use of interactive visualisation. | The dashboard presented incorporates an effective use of visualisation. | The dashboard presented incorporates a basic use of visualisation. | The dashboard presented incorporates little or no use of visualisation. |
| Expression and grammar | Scholarly, succinct with correct spelling, grammar and punctuation (5%) | Writing style is exceptional, scholarly and succinct that’s free from spelling, grammar and punctuation errors. | Writing style is scholarly, free from spelling, grammar and punctuation errors. | Writing style is scholarly, but wordy. Free from spelling, grammar and punctuation errors. | Writing is scholarly and wordy. Contains some grammatical, punctuation and spelling errors. | Writing is unscholarly. Many grammatical, punctuation and spelling errors. |
| References | Application of accurate and consistent APA 6th style (5%) | The appropriate referencing style has been used consistently, with no errors. Includes citations for software used, and data sources. | The appropriate referencing style has been used consistently, with very few errors. | The appropriate referencing style has been used consistently, with few errors. | The appropriate referencing style has been used much of the time, but attention needs to be given to reducing the number of errors. | Material used from other sources without citation. |

No late turn-ins accepted