<!-- background-color: #006DAE --> <!-- class: middle center hide-slide-number --> <div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;"> <i class="fas fa-exclamation-circle"></i> These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See <a href=/>here for PDF <i class="fas fa-file-pdf"></i></a>. </div> <br> .white[Press the **right arrow** to progress to the next slide!] --- background-image: url(images/bg1.jpg) background-size: cover class: hide-slide-number split-70 title-slide count: false .column.shade_black[.content[ <br> # .monash-blue.outline-text[ETC5510: Introduction to Data Analysis] <h2 class="monash-blue2 outline-text" style="font-size: 30pt!important;">Week 3, part B</h2> <br> <h2 style="font-weight:900!important;">Dates and Times</h2> .bottom_abs.width100[ Lecturer: *Nicholas Tierney & Stuart Lee* Department of Econometrics and Business Statistics
<i class="fas fa-envelope faa-float animated "></i>
ETC5510.Clayton-x@monash.edu March 2020 <br> ] ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> --- class: transition # Overview - Working with dates - Constructing graphics --- # Reminder re the assignment: - Due 5pm **April 9th** - Submit by **one person** in the assignment group - ED > assessments > upload your `Rmd`, and `html`, files. - **One per group** - **Remember to name your files** - E.g., "ETC5510-assignment-1-group-name.Rmd" --- class: transition # How to submit on ED --- background-image: url(images/allison-horst-ggplot2-masterpiece.png) background-size: contain background-position: 50% 50% class: center, bottom, white .right.purple.small[ Art by Allison Horst ] --- class: refresher # Try drawing a mental model of last lecture's material on ggplot2 --- background-image: url(images/allison-horst-lubridate.png) background-size: contain background-position: 50% 50% class: center, bottom, white .right.purple.small[ Art by Allison Horst ] --- # The challenges of working with dates and times - Conventional order of day, month, year is different across location - Australia: DD-MM-YYYY - "21-02-2020" - America: MM-DD-YYYY - "02-21-2020" - [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601): YYYY-MM-DD - "2020-02-21" --- background-image: url(https://imgs.xkcd.com/comics/iso_8601.png) background-size: contain background-position: 50% 50% class: center, bottom, white --- # The challenges of working with dates and times - Number of units change: - Years do not have the same number of days 
(leap years)
  - Months have differing numbers of days. (January vs February vs September)
  - Not every minute has 60 seconds (leap seconds!)
- Times are local, for us. Where are you?
- Timezones!!!

--

- Representing time relative to its type:
  - What day of the week is it?
  - Day of the month?
  - Week in the year?
- Years start on different days (Monday, Sunday, ...)

---

# The challenges of working with dates and times

- Representing time relative to its type:
  - Months could be numbers or names. (1st month, January)
  - Days could be numbers or names. (1st day....Sunday? Monday?)
  - Days and months have abbreviations. (Mon, Tue, Jan, Feb)

--

- Time can be relative:
  - How many days until we go on holidays?
  - How many working days?

---
background-image: url(images/allison-horst-lubridate.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

.right.purple.small[
Art by Allison Horst
]

---

# Lubridate

.left-code[
- Simplifies date/time by helping you:
  - Parse values
  - Create new variables based on components like month, day, year
  - Do algebra on time
]

.right-plot[
<img src="images/lubridate.png" width="50%" style="display: block; margin: auto;" />
]

---
background-image: url(images/allison-horst-lubridate-ymd.png)
background-size: contain
background-position: 50% 50%
class: center, bottom, white

.right.purple.large[Art by Allison Horst]

---
class: transition

# Parsing dates & time zones using `ymd()`

---

# `ymd()` can take a character input

```r
ymd("20190810")
## [1] "2019-08-10"
```

---

# `ymd()` can also take other kinds of separators

```r
ymd("2020-03-31")
## [1] "2020-03-31"
ymd("2020/03/31")
## [1] "2020-03-31"
```

--

```r
ymd("??2020-.-03//31---")
## [1] "2020-03-31"
```

--

## ....yeah, wow, I was actually surprised this worked

---

# Change the letters, change the output

# `mdy()` expects month, day, year.

--

```r
mdy("03/31/2020")
## [1] "2020-03-31"
```

--

# `dmy()` expects day, month, year.
--

```r
# "03/31/2020" has no valid month 31, so day-month-year parsing fails
dmy("03/31/2020")
## [1] NA
```

---

# Add a timezone

If you add a time zone, what changes?

```r
ymd("2020-03-31", tz = "Australia/Melbourne")
## [1] "2020-03-31 AEDT"
```

---

# What happens if you try to specify different time zones?

.pull-left[
```r
ymd("2020-03-31", tz = "Africa/Abidjan")
## [1] "2020-03-31 GMT"
ymd("2020-03-31", tz = "America/Los_Angeles")
## [1] "2020-03-31 PDT"
```
]

.pull-right[
A list of acceptable time zones can be found [here](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) (or google "wiki timezone database")
]

---

# Timezones another way:

```r
today()
## [1] "2020-03-31"
```

--

```r
today(tz = "America/Los_Angeles")
## [1] "2020-03-30"
```

--

```r
now()
## [1] "2020-03-31 13:28:32 AEDT"
```

--

```r
now(tz = "America/Los_Angeles")
## [1] "2020-03-30 19:28:32 PDT"
```

---

# date and time: `ymd_hms()`

```r
ymd_hms("2020-03-31 10:05:30", tz = "Australia/Melbourne")
## [1] "2020-03-31 10:05:30 AEDT"
```

```r
ymd_hms("2020-03-31 10:05:30", tz = "America/Los_Angeles")
## [1] "2020-03-31 10:05:30 PDT"
```

---

# Extracting temporal elements

- Very often we want to know what day of the week it is
- Trends and patterns in data can be quite different depending on the type of day:
  - weekday vs. weekend
  - weekday vs. holiday
  - regular Saturday night vs. New Year's Eve

---

# Many ways of saying similar things

- Many ways to specify day of the week:
  - A number. Does 1 mean... Sunday, Monday or even Saturday???
  - Or text, or abbreviated text. (Mon vs. Monday)

--

- Talking with people we generally use day name:
  - Today is Friday, tomorrow is Saturday vs Today is 5 and tomorrow is 6.
- But for data analysis it can be useful to represent the day as a number:
  - e.g., Saturday minus Thursday is 2 days (6 - 4)

---

# The many ways to say Monday

```r
wday("2019-08-12")
## [1] 2
wday("2019-08-12", label = TRUE)
## [1] Mon
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
```

--

```r
wday("2019-08-12", label = TRUE, abbr = FALSE)
## [1] Monday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
wday("2019-08-12", label = TRUE, week_start = 1)
## [1] Mon
## Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun
```

---

# Similarly, we can extract what month the day is in.

```r
month("2020-03-31")
## [1] 3
month("2020-03-31", label = TRUE)
## [1] Mar
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
month("2020-03-31", label = TRUE, abbr = FALSE)
## [1] March
## 12 Levels: January < February < March < April < May < June < ... < December
```

---

# Fiscally, it is useful to know what quarter the day is in.

```r
quarter("2020-03-31")
## [1] 1
semester("2020-03-31")
## [1] 1
```

---

# Similarly, we can select days within a year.

```r
yday("2020-03-31")
## [1] 91
```

---
class: transition

# Your Turn: Download exercise 3B from the course site and answer the questions about dates
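---

# Sketch: converting time zones and doing date algebra

A minimal sketch, assuming only that lubridate is installed, of two ideas from this part. Passing `tz =` to a parser labels a clock time with a zone, whereas `with_tz()` (a lubridate function not shown on the earlier slides) re-expresses the same instant on another zone's clock. The holiday date below is hypothetical and only there to illustrate subtraction.

```r
library(lubridate)

# Parse a clock time and label it as Melbourne time.
melb <- ymd_hms("2020-03-31 10:05:30", tz = "Australia/Melbourne")

# Same instant, shown on a Los Angeles clock.
la <- with_tz(melb, tzone = "America/Los_Angeles")
la
## [1] "2020-03-30 16:05:30 PDT"

# Both objects describe the same moment in time.
melb == la
## [1] TRUE

# Date algebra: days from the lecture date to a hypothetical holiday date.
holiday_start <- ymd("2020-04-10")  # hypothetical, for illustration only
days_to_go <- as.numeric(holiday_start - ymd("2020-03-31"))
days_to_go
## [1] 10
```

Subtracting two dates gives a `difftime`; `as.numeric()` drops the units when a plain number is needed.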
---

# [Melbourne pedestrian sensor portal](http://www.pedestrian.melbourne.vic.gov.au/):

.pull-left[
<img src="images/sensors.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Contains hourly counts of people walking around the city.
- Extract records for 2018 for the sensor at Melbourne Central
- Use lubridate to extract different temporal components, so we can study the pedestrian patterns at this location.
]

---

# Getting pedestrian count data with rwalkr

```r
library(rwalkr)
library(dplyr)  # for %>% and filter()

# Download all sensors for 2018, then keep Melbourne Central only
walk_all <- melb_walk_fast(year = 2018)
walk <- walk_all %>%
  filter(Sensor == "Melbourne Central")
walk
```

```
## # A tibble: 8,760 x 5
##    Sensor            Date_Time           Date        Time Count
##    <chr>             <dttm>              <date>     <dbl> <dbl>
##  1 Melbourne Central 2017-12-31 13:00:00 2018-01-01     0  2996
##  2 Melbourne Central 2017-12-31 14:00:00 2018-01-01     1  3481
##  3 Melbourne Central 2017-12-31 15:00:00 2018-01-01     2  1721
##  4 Melbourne Central 2017-12-31 16:00:00 2018-01-01     3  1056
##  5 Melbourne Central 2017-12-31 17:00:00 2018-01-01     4   417
##  6 Melbourne Central 2017-12-31 18:00:00 2018-01-01     5   222
##  7 Melbourne Central 2017-12-31 19:00:00 2018-01-01     6   110
##  8 Melbourne Central 2017-12-31 20:00:00 2018-01-01     7   180
##  9 Melbourne Central 2017-12-31 21:00:00 2018-01-01     8   205
## 10 Melbourne Central 2017-12-31 22:00:00 2018-01-01     9   326
## # … with 8,750 more rows
```

---

# Let's think about the data structure.

.left-code[
- The basic time unit is hour of the day.
- Date can be decomposed into
  - month
  - weekday vs weekend
  - week of the year
  - day of the month
  - holiday or work day
]

.right-plot[
![](images/Time.png)
]

---

# What format is walk in?
```r walk ## # A tibble: 8,760 x 5 ## Sensor Date_Time Date Time Count ## <chr> <dttm> <date> <dbl> <dbl> ## 1 Melbourne Central 2017-12-31 13:00:00 2018-01-01 0 2996 ## 2 Melbourne Central 2017-12-31 14:00:00 2018-01-01 1 3481 ## 3 Melbourne Central 2017-12-31 15:00:00 2018-01-01 2 1721 ## 4 Melbourne Central 2017-12-31 16:00:00 2018-01-01 3 1056 ## 5 Melbourne Central 2017-12-31 17:00:00 2018-01-01 4 417 ## 6 Melbourne Central 2017-12-31 18:00:00 2018-01-01 5 222 ## 7 Melbourne Central 2017-12-31 19:00:00 2018-01-01 6 110 ## 8 Melbourne Central 2017-12-31 20:00:00 2018-01-01 7 180 ## 9 Melbourne Central 2017-12-31 21:00:00 2018-01-01 8 205 ## 10 Melbourne Central 2017-12-31 22:00:00 2018-01-01 9 326 ## # … with 8,750 more rows ``` --- # Add month and weekday information ```r walk_tidy <- walk %>% mutate(month = month(Date, label = TRUE, abbr = TRUE), wday = wday(Date, label = TRUE, abbr = TRUE, week_start = 1)) walk_tidy ## # A tibble: 8,760 x 7 ## Sensor Date_Time Date Time Count month wday ## <chr> <dttm> <date> <dbl> <dbl> <ord> <ord> ## 1 Melbourne Central 2017-12-31 13:00:00 2018-01-01 0 2996 Jan Mon ## 2 Melbourne Central 2017-12-31 14:00:00 2018-01-01 1 3481 Jan Mon ## 3 Melbourne Central 2017-12-31 15:00:00 2018-01-01 2 1721 Jan Mon ## 4 Melbourne Central 2017-12-31 16:00:00 2018-01-01 3 1056 Jan Mon ## 5 Melbourne Central 2017-12-31 17:00:00 2018-01-01 4 417 Jan Mon ## 6 Melbourne Central 2017-12-31 18:00:00 2018-01-01 5 222 Jan Mon ## 7 Melbourne Central 2017-12-31 19:00:00 2018-01-01 6 110 Jan Mon ## 8 Melbourne Central 2017-12-31 20:00:00 2018-01-01 7 180 Jan Mon ## 9 Melbourne Central 2017-12-31 21:00:00 2018-01-01 8 205 Jan Mon ## 10 Melbourne Central 2017-12-31 22:00:00 2018-01-01 9 326 Jan Mon ## # … with 8,750 more rows ``` --- # Pedestrian count per month .left-code[ ```r ggplot(walk_tidy, aes(x = month, y = Count)) + geom_col() ``` ] .right-plot[ <img src="lecture_3b_files/figure-html/gg-walk-month-count-out-1.png" width="100%" style="display: 
block; margin: auto;" /> ] ??? - January has a very low count relative to the other months. Something can't be right with this number, because it is much lower than expected. - The remaining months have roughly the same counts. --- # Pedestrian count per weekday .left-code[ ```r ggplot(walk_tidy, aes(x = wday, y = Count)) + geom_col() ``` ] .right-plot[ <img src="lecture_3b_files/figure-html/gg-wday-count-out-1.png" width="100%" style="display: block; margin: auto;" /> ] ??? How would you describe the pattern? - Friday and Saturday tend to have a few more people walking around than other days. --- # What might be wrong with these interpretations? - There might be a different number of days of the week over the year. - This means that simply summing the counts might lead to a misinterpretation of pedestrian patterns. - Similarly, months have different numbers of days. --- class: transition # Your Turn: Brainstorm to answer these questions: 1. Are pedestrian counts different depending on the month? 2. Are pedestrian counts different depending on the day of the week?
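---

# Sketch: how often does each weekday and month day occur?

The concern above, that weekdays and months do not all contribute the same number of days, can be checked directly. This is a minimal sketch assuming only lubridate and the calendar year 2018 shown in the `walk` output; the object names are made up for illustration.

```r
library(lubridate)

# Every calendar day in 2018, the year shown in the `walk` output.
days_2018 <- seq(ymd("2018-01-01"), ymd("2018-12-31"), by = "day")

# Occurrences of each weekday, with weeks starting on Monday.
wday_n <- table(wday(days_2018, label = TRUE, week_start = 1))
wday_n

# Number of days in each month.
month_n <- table(month(days_2018, label = TRUE))
month_n
```

2018 starts on a Monday and has 365 days, so Monday occurs 53 times and every other weekday 52 times, while months range from 28 to 31 days. Comparing averages per day, rather than raw totals, avoids this imbalance.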
--- # What are the number of pedestrians per day? ```r walk_tidy ## # A tibble: 8,760 x 7 ## Sensor Date_Time Date Time Count month wday ## <chr> <dttm> <date> <dbl> <dbl> <ord> <ord> ## 1 Melbourne Central 2017-12-31 13:00:00 2018-01-01 0 2996 Jan Mon ## 2 Melbourne Central 2017-12-31 14:00:00 2018-01-01 1 3481 Jan Mon ## 3 Melbourne Central 2017-12-31 15:00:00 2018-01-01 2 1721 Jan Mon ## 4 Melbourne Central 2017-12-31 16:00:00 2018-01-01 3 1056 Jan Mon ## 5 Melbourne Central 2017-12-31 17:00:00 2018-01-01 4 417 Jan Mon ## 6 Melbourne Central 2017-12-31 18:00:00 2018-01-01 5 222 Jan Mon ## 7 Melbourne Central 2017-12-31 19:00:00 2018-01-01 6 110 Jan Mon ## 8 Melbourne Central 2017-12-31 20:00:00 2018-01-01 7 180 Jan Mon ## 9 Melbourne Central 2017-12-31 21:00:00 2018-01-01 8 205 Jan Mon ## 10 Melbourne Central 2017-12-31 22:00:00 2018-01-01 9 326 Jan Mon ## # … with 8,750 more rows ``` --- # What are the number of pedestrians per day? ```r walk_day <- walk_tidy %>% group_by(Date) %>% summarise(day_count = sum(Count, na.rm = TRUE)) walk_day ## # A tibble: 365 x 2 ## Date day_count ## <date> <dbl> ## 1 2018-01-01 30832 ## 2 2018-01-02 26136 ## 3 2018-01-03 26567 ## 4 2018-01-04 26532 ## 5 2018-01-05 28203 ## 6 2018-01-06 20845 ## 7 2018-01-07 24052 ## 8 2018-01-08 26530 ## 9 2018-01-09 27116 ## 10 2018-01-10 28203 ## # … with 355 more rows ``` --- # What are the mean number of people per weekday? ```r walk_day %>% mutate(wday = wday(Date, label = TRUE, abbr = TRUE, week_start = 1)) ## # A tibble: 365 x 3 ## Date day_count wday ## <date> <dbl> <ord> ## 1 2018-01-01 30832 Mon ## 2 2018-01-02 26136 Tue ## 3 2018-01-03 26567 Wed ## 4 2018-01-04 26532 Thu ## 5 2018-01-05 28203 Fri ## 6 2018-01-06 20845 Sat ## 7 2018-01-07 24052 Sun ## 8 2018-01-08 26530 Mon ## 9 2018-01-09 27116 Tue ## 10 2018-01-10 28203 Wed ## # … with 355 more rows ``` --- # What are the mean number of people per weekday? 
```r
walk_day %>%
  mutate(wday = wday(Date, label = TRUE, abbr = TRUE, week_start = 1)) %>%
  group_by(wday)
## # A tibble: 365 x 3
## # Groups:   wday [7]
##    Date       day_count wday
##    <date>         <dbl> <ord>
##  1 2018-01-01     30832 Mon
##  2 2018-01-02     26136 Tue
##  3 2018-01-03     26567 Wed
##  4 2018-01-04     26532 Thu
##  5 2018-01-05     28203 Fri
##  6 2018-01-06     20845 Sat
##  7 2018-01-07     24052 Sun
##  8 2018-01-08     26530 Mon
##  9 2018-01-09     27116 Tue
## 10 2018-01-10     28203 Wed
## # … with 355 more rows
```

---

# What are the mean number of people per weekday?

```r
walk_day %>%
  mutate(wday = wday(Date, label = TRUE, abbr = TRUE, week_start = 1)) %>%
  group_by(wday) %>%
  summarise(m = mean(day_count, na.rm = TRUE),
            s = sd(day_count, na.rm = TRUE))
## # A tibble: 7 x 3
##   wday       m      s
##   <ord>  <dbl>  <dbl>
## 1 Mon   25590.  8995.
## 2 Tue   26242.  8989.
## 3 Wed   27627.  9535.
## 4 Thu   27887.  8744.
## 5 Fri   31544. 10239.
## 6 Sat   30470.  9823.
## 7 Sun   25296.  9024.
```

---

# What are the mean number of people per weekday?

```r
walk_week_day <- walk_day %>%
  mutate(wday = wday(Date, label = TRUE, abbr = TRUE, week_start = 1)) %>%
  group_by(wday) %>%
  summarise(m = mean(day_count, na.rm = TRUE),
            s = sd(day_count, na.rm = TRUE))
walk_week_day
## # A tibble: 7 x 3
##   wday       m      s
##   <ord>  <dbl>  <dbl>
## 1 Mon   25590.  8995.
## 2 Tue   26242.  8989.
## 3 Wed   27627.  9535.
## 4 Thu   27887.  8744.
## 5 Fri   31544. 10239.
## 6 Sat   30470.  9823.
## 7 Sun   25296.  9024.
```

---

# What are the mean number of people per weekday?

```r
# Error bars span one standard deviation either side of the mean
ggplot(walk_week_day) +
  geom_errorbar(aes(x = wday, ymin = m - s, ymax = m + s)) +
  ylim(c(0, 45000)) +
  labs(x = "Day of week",
       y = "Average number of pedestrians")
```

<img src="lecture_3b_files/figure-html/gg-walk-day-1.png" width="576" style="display: block; margin: auto;" />

---
class: transition

# Distribution of counts

Side-by-side boxplots show the distribution of counts over different temporal elements.
--- # Hour of the day ```r ggplot(walk_tidy, aes(x = as.factor(Time), y = Count)) + geom_boxplot() ``` <img src="lecture_3b_files/figure-html/gg-time-count-1.png" width="576" style="display: block; margin: auto;" /> --- # Day of the week ```r ggplot(walk_tidy, aes(x = wday, y = Count)) + geom_boxplot() ``` <img src="lecture_3b_files/figure-html/gg-walk-weekday-count-1.png" width="576" style="display: block; margin: auto;" /> --- # Month ```r ggplot(walk_tidy, aes(x = month, y = Count)) + geom_boxplot() ``` <img src="lecture_3b_files/figure-html/gg-month-count-boxplot-1.png" width="576" style="display: block; margin: auto;" /> --- # Time series plots ## Lines show consecutive hours of the day ```r ggplot(walk_tidy, aes(x = Time, y = Count, group = Date)) + geom_line() ``` <img src="lecture_3b_files/figure-html/gg-time-count-line-1.png" width="576" style="display: block; margin: auto;" /> --- # By month ```r ggplot(walk_tidy, aes(x = Time, y = Count, group = Date)) + geom_line() + facet_wrap( ~ month) ``` <img src="lecture_3b_files/figure-html/gg-time-count-by-date-1.png" width="576" style="display: block; margin: auto;" /> --- # By week day ```r ggplot(walk_tidy, aes(x = Time, y = Count, group = Date)) + geom_line() + facet_grid(month ~ wday) ``` <img src="lecture_3b_files/figure-html/gg-time-count-line-facet-grid-1.png" width="1008" style="display: block; margin: auto;" /> --- # Calendar plots .left-code[ ```r library(sugrrants) walk_tidy_calendar <- frame_calendar(walk_tidy, x = Time, y = Count, date = Date, nrow = 4) p1 <- ggplot(walk_tidy_calendar, aes(x = .Time, y = .Count, group = Date)) + geom_line() prettify(p1) ``` ] .right-plot[ <img src="lecture_3b_files/figure-html/calendar-plot-out-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Holidays ```r library(tsibble) library(sugrrants) library(timeDate) vic_holidays <- holiday_aus(2018, state = "VIC") vic_holidays ## # A tibble: 12 x 2 ## holiday date ## <chr> <date> ## 1 New Year's Day 
2018-01-01
##  2 Australia Day    2018-01-26
##  3 Labour Day       2018-03-12
##  4 Good Friday      2018-03-30
##  5 Easter Saturday  2018-03-31
##  6 Easter Sunday    2018-04-01
##  7 Easter Monday    2018-04-02
##  8 ANZAC Day        2018-04-25
##  9 Queen's Birthday 2018-06-11
## 10 Melbourne Cup    2018-11-06
## 11 Christmas Day    2018-12-25
## 12 Boxing Day       2018-12-26
```

---

# Holidays

```r
walk_holiday <- walk_tidy %>%
  # flag public holidays
  mutate(holiday = if_else(condition = Date %in% vic_holidays$date,
                           true = "yes",
                           false = "no")) %>%
  # also treat weekends as holidays
  mutate(holiday = if_else(condition = wday %in% c("Sat", "Sun"),
                           true = "yes",
                           false = holiday))
walk_holiday
## # A tibble: 8,760 x 8
##    Sensor         Date_Time           Date        Time Count month wday  holiday
##    <chr>          <dttm>              <date>     <dbl> <dbl> <ord> <ord> <chr>
##  1 Melbourne Cen… 2017-12-31 13:00:00 2018-01-01     0  2996 Jan   Mon   yes
##  2 Melbourne Cen… 2017-12-31 14:00:00 2018-01-01     1  3481 Jan   Mon   yes
##  3 Melbourne Cen… 2017-12-31 15:00:00 2018-01-01     2  1721 Jan   Mon   yes
##  4 Melbourne Cen… 2017-12-31 16:00:00 2018-01-01     3  1056 Jan   Mon   yes
##  5 Melbourne Cen… 2017-12-31 17:00:00 2018-01-01     4   417 Jan   Mon   yes
##  6 Melbourne Cen… 2017-12-31 18:00:00 2018-01-01     5   222 Jan   Mon   yes
##  7 Melbourne Cen… 2017-12-31 19:00:00 2018-01-01     6   110 Jan   Mon   yes
##  8 Melbourne Cen… 2017-12-31 20:00:00 2018-01-01     7   180 Jan   Mon   yes
##  9 Melbourne Cen… 2017-12-31 21:00:00 2018-01-01     8   205 Jan   Mon   yes
## 10 Melbourne Cen… 2017-12-31 22:00:00 2018-01-01     9   326 Jan   Mon   yes
## # … with 8,750 more rows
```

---

# Holidays

```r
walk_holiday_calendar <- frame_calendar(data = walk_holiday,
                                        x = Time,
                                        y = Count,
                                        date = Date,
                                        nrow = 6)

p2 <- ggplot(walk_holiday_calendar,
             aes(x = .Time,
                 y = .Count,
                 group = Date,
                 colour = holiday)) +
  geom_line() +
  scale_colour_brewer(palette = "Dark2")
```

---

# Holidays

<img src="lecture_3b_files/figure-html/show-calendar-plot-p2-1.png" width="1008" style="display: block; margin: auto;" />

---

# References

- sugrrants
- tsibble
- lubridate
- dplyr
- timeDate
- rwalkr

---

# Your Turn:

- Do the lab exercises
- Take the lab quiz
- Use the rest of the lab time to coordinate with your group on the first assignment.