<!-- background-color: #006DAE -->
<!-- class: middle center hide-slide-number -->

<div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;">
<i class="fas fa-exclamation-circle"></i> These slides are best viewed in Chrome and occasionally need to be refreshed if elements did not load properly. See <a href=/>here for PDF <i class="fas fa-file-pdf"></i></a>.
</div>

<br>

.white[Press the **right arrow** to progress to the next slide!]

---
background-image: url(images/bg1.jpg)
background-size: cover
class: hide-slide-number split-70 title-slide
count: false

.column.shade_black[.content[

<br>

# .monash-blue.outline-text[ETC5510: Introduction to Data Analysis]

<h2 class="monash-blue2 outline-text" style="font-size: 30pt!important;">Week 4, part B</h2>

<br>

<h2 style="font-weight:900!important;">Advanced topics in data visualisation</h2>

.bottom_abs.width100[

Lecturer: *Nicholas Tierney & Stuart Lee*

Department of Econometrics and Business Statistics
<i class="fas fa-envelope faa-float animated "></i>
ETC5510.Clayton-x@monash.edu

April 2020

<br>
]
]]

<div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);">
<div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;">
<svg class="clip-svg absolute">
<defs>
<clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox">
<polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" />
</clipPath>
</defs>
</svg>
</div>
</div>

---
class: transition

# While the song is playing...

Draw a mental model / concept map of last lecture's content on joins.

---
class: refresher

# Recap

- Joins
- Venn diagrams
- Feedback

---

# Joins with a person and a coat, by [Leigh Tami](https://twitter.com/leigh_tami18/status/1021471889309487105/photo/1)

<img src="images/joins_using_coat.jpg" width="100%" style="display: block; margin: auto;" />

---

# Upcoming Due Dates

- Assignment 1: ...
- Other due dates?
- Stay tuned on ED for the upcoming dates

---

# Making effective data plots

1. Principles / science of data visualisation
2. Features of graphics

---

# Principles / science of data visualisation

- Palettes and colour blindness
- Change blindness
- Using proximity
- Hierarchy of mappings

---

# Features of graphics

- Layering statistical summaries
- Themes
- Adding interactivity

---

# Palettes and colour blindness

There are three main types of colour palette:

- Qualitative: categorical variables
- Sequential: low to high numeric values
- Diverging: negative to positive values

---

# Qualitative: categorical variables

<img src="lecture_4b_files/figure-html/print-qual-pal-1.png" width="100%" style="display: block; margin: auto;" />

---

# Sequential: low to high numeric values

<img src="lecture_4b_files/figure-html/print-seq-pal-1.png" width="100%" style="display: block; margin: auto;" />

---

# Diverging: negative to positive values

<img src="lecture_4b_files/figure-html/print-div-pal-1.png" width="100%" style="display: block; margin: auto;" />

---

# Example: TB data

```
## # A tibble: 157,820 x 5
##    country      year count gender age
##    <chr>       <dbl> <dbl> <chr>  <chr>
##  1 Afghanistan  1980    NA m      04
##  2 Afghanistan  1981    NA m      04
##  3 Afghanistan  1982    NA m      04
##  4 Afghanistan  1983    NA m      04
##  5 Afghanistan  1984    NA m      04
##  6 Afghanistan  1985    NA m      04
##  7 Afghanistan  1986    NA m      04
##  8 Afghanistan  1987    NA m      04
##  9 Afghanistan  1988    NA m      04
## 10 Afghanistan  1989    NA m      04
## # … with 157,810 more rows
```

---

# Example: TB data: adding relative change

```
## # A tibble: 219 x 4
##    country             `2002` `2012`  reldif
##    <chr>                <dbl>  <dbl>   <dbl>
##  1 Afghanistan           6509  13907  1.14
##  2 Albania                225    185 -0.178
##  3 Algeria               8246   7510 -0.0893
##  4 American Samoa           1      0 -1
##  5 Andorra                  2      2  0
##  6 Angola               17988  22106  0.229
##  7 Anguilla                 0      0  0
##  8 Antigua and Barbuda      4      1 -0.75
##  9 Argentina             5383   4787 -0.111
## 10 Armenia                511    316 -0.382
## # … with 209 more rows
```

Here `reldif` is the relative change from 2002 to 2012, (`2012` - `2002`) / `2002`; for example, Albania: (185 - 225) / 225 ≈ -0.178.

---

# Example: Sequential colour with default palette

```r
library(ggplot2)
library(ggthemes) # provides theme_map()

ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map()
```

<img src="lecture_4b_files/figure-html/map-default-1.png" width="80%" style="display: block; margin: auto;" />

---

# Example: (improved) sequential colour with the viridis palette

```r
library(viridis)
ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map() +
  # countries with missing reldif are drawn in white
  scale_fill_viridis(na.value = "white")
```

<img src="lecture_4b_files/figure-html/viridis-plot-1.png" width="80%" style="display: block; margin: auto;" />

---

# Example: Diverging colour with better palette

```r
ggplot(tb_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = reldif)) +
  theme_map() +
  # symmetric limits keep a relative change of 0 at the palette midpoint
  scale_fill_distiller(palette = "PRGn",
                       na.value = "white",
                       limits = c(-7, 7))
```

<img src="lecture_4b_files/figure-html/map-distiller-1.png" width="80%" style="display: block; margin: auto;" />

---

# Summary on colour palettes

- Different ways to map colour to values:
    - Qualitative: categorical variables
    - Sequential: low to high numeric values
    - Diverging: negative to positive values

---

# Colour blindness

- About 8% of men (about 1 in 12) and 0.5% of women (about 1 in 200) have difficulty distinguishing between red and green.
- Several colour-blind-tested palettes exist: `RColorBrewer` has an associated website, [colorbrewer2.org](http://colorbrewer2.org), where palettes are labelled by whether they are colourblind safe. See also `viridis` and `scico`.
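---

# Sketch: the three palette types in base R

The slides above use `viridis`, `RColorBrewer` and `scico`, but base R also ships HCL-based palettes for all three palette types through `grDevices::hcl.colors()`. This is a minimal sketch, assuming R >= 3.6.0 (where `hcl.colors()` and `hcl.pals()` exist). The palette names ("Dark 3", "Viridis", "Purple-Green") are one illustrative pick per type, not the lecture's choice, and `n_cols` and `pal_rows` are hypothetical names.

```r
# Assumes R >= 3.6.0, where grDevices::hcl.colors() and hcl.pals() exist
n_cols <- 5  # hypothetical number of colours per palette

# One illustrative palette per type (an assumption, not the lecture's choice)
qual_cols <- hcl.colors(n_cols, palette = "Dark 3")        # qualitative: categories
seq_cols  <- hcl.colors(n_cols, palette = "Viridis")       # sequential: low to high
div_cols  <- hcl.colors(n_cols, palette = "Purple-Green")  # diverging: negative to positive

# hcl.pals() lists the palette names available for each type
stopifnot(
  "Dark 3" %in% hcl.pals(type = "qualitative"),
  "Viridis" %in% hcl.pals(type = "sequential"),
  "Purple-Green" %in% hcl.pals(type = "diverging")
)

# Draw each palette as a strip of adjacent bars for a quick visual check
pal_rows <- list(qualitative = qual_cols, sequential = seq_cols, diverging = div_cols)
old_par <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
for (type in names(pal_rows)) {
  barplot(rep(1, n_cols), col = pal_rows[[type]], border = NA, space = 0,
          axes = FALSE, main = type)
}
par(old_par)
```

In the diverging strip the middle colour is the neutral one, which is why a diverging palette should be centred on zero (as the symmetric `limits` do in the TB map).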
---

# Plot of two coloured points: normal mode

<img src="lecture_4b_files/figure-html/colour-blind-1.png" width="100%" style="display: block; margin: auto;" />

---

# Plot of two coloured points: dichromat mode

<img src="lecture_4b_files/figure-html/colour-blind-2-1.png" width="100%" style="display: block; margin: auto;" />

---

# Showing all types of colourblindness

<img src="lecture_4b_files/figure-html/colourblindr-show-1.png" width="100%" style="display: block; margin: auto;" />

---

# Impact of colourblind-safe palette

```r
p2 <- p + scale_colour_brewer(palette = "Dark2")
p2
```

<img src="lecture_4b_files/figure-html/colourblindr-brewer-1.png" width="100%" style="display: block; margin: auto;" />

---

# Impact of colourblind-safe palette

<img src="lecture_4b_files/figure-html/cb-grid-1.png" width="100%" style="display: block; margin: auto;" />

---

# Impact of colourblind-safe palette

```r
p3 <- p + scale_colour_viridis_d()
p3
```

<img src="lecture_4b_files/figure-html/colourblindr-viridis-1.png" width="100%" style="display: block; margin: auto;" />

---

# Impact of colourblind-safe palette

<img src="lecture_4b_files/figure-html/cb-grid-viridis-1.png" width="100%" style="display: block; margin: auto;" />

---

# Summary: colour blindness

- Apply colourblind-friendly colour scales:
    - `+ scale_colour_viridis_d()` (discrete) or `+ scale_colour_viridis_c()` (continuous)
    - `+ scale_colour_brewer(palette = "Dark2")`
    - `scico` R package

---

# Pre-attentiveness: Find the odd one out?

<img src="lecture_4b_files/figure-html/pre-attentiveness-1.png" width="100%" style="display: block; margin: auto;" />

---

# Pre-attentiveness: Find the odd one out?

<img src="lecture_4b_files/figure-html/pre-attentive-easier-1.png" width="100%" style="display: block; margin: auto;" />

---
class: idea

# Using proximity in your plots

Basic rule: place the groups that you want to compare close to each other.

---

# Which plot answers which question?

- "Is the incidence similar for males and females in 2012 across age groups?"
- "Is the incidence similar for age groups in 2012, across gender?"

---

# "Incidence similar for: (M and F) or (age, across gender)?"

<img src="lecture_4b_files/figure-html/print-many-tb-1.png" width="100%" style="display: block; margin: auto;" />
<img src="lecture_4b_files/figure-html/print-many-tb-2.png" width="100%" style="display: block; margin: auto;" />

???

Here are two different arrangements of the TB data. To answer the question "Is the incidence similar for males and females in 2012 across age groups?" the first arrangement is better. It puts males and females right beside each other, so the relative heights of the bars can be seen quickly. The answer to the question would be "No, the numbers were similar in youth, but males are more affected with increasing age."

The second arrangement puts the focus on age groups, and is better for answering the question "Is the incidence similar for age groups in 2012, across gender?" The answer would be "No, among females, the incidence is higher at early ages. For males, the incidence is much more uniform across age groups."

---

# "Incidence similar for M & F in 2012 across age?"

<img src="lecture_4b_files/figure-html/gg-fill-gender-print-1.png" width="100%" style="display: block; margin: auto;" />

- Males & females next to each other: relative heights of bars are seen quickly.
- Answer to the question: "No, the numbers were similar in youth, but males are more affected with increasing age."

---

# "Incidence similar for age in 2012, across gender?"

<img src="lecture_4b_files/figure-html/gg-fill-age-print-1.png" width="100%" style="display: block; margin: auto;" />

- Puts the focus on age groups.
- Answer to the question: "No, among females, the incidence is higher at early ages. For males, the incidence is much more uniform across age groups."

---

# Proximity wrap up

- Facetting and proximity in plots are related to change blindness, an area of study in cognitive psychology.
- Daniel Simons' lab has a series of excellent videos illustrating how a visual break affects what the mind processes.
- Here's one example: [The door study](https://www.youtube.com/watch?v=FWSxSQsspiQ)

---

# Layering

- *Statistical summaries:* It is common to layer plots, particularly by adding statistical summaries such as a model fit, or means and standard deviations. The purpose is to show the **trend** in relation to the **variation**.
- *Maps:* Maps commonly provide the framework for spatially collected data: one layer for the map, and another for the data.

---

# `geom_point()`

```r
ggplot(df, aes(x = x, y = y1)) +
  geom_point()
```

<img src="lecture_4b_files/figure-html/point-1-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_smooth(method = "lm", se = FALSE)`

```r
ggplot(df, aes(x = x, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

<img src="lecture_4b_files/figure-html/point-2-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_smooth(method = "lm")`

```r
ggplot(df, aes(x = x, y = y1)) +
  geom_point() +
  geom_smooth(method = "lm")
```

<img src="lecture_4b_files/figure-html/point-3-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_point()`

```r
ggplot(df, aes(x = x, y = y2)) +
  geom_point()
```

<img src="lecture_4b_files/figure-html/point-4-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_smooth(method = "lm", se = FALSE)`

```r
ggplot(df, aes(x = x, y = y2)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

<img src="lecture_4b_files/figure-html/point-5palette-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_smooth(se = FALSE)`

```r
ggplot(df, aes(x = x, y = y2)) +
  geom_point() +
  geom_smooth(se = FALSE)
```

<img src="lecture_4b_files/figure-html/point-6palette-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_smooth(se = FALSE, span = 0.05)`

```r
ggplot(df, aes(x = x, y = y2)) +
  geom_point() +
  geom_smooth(se = FALSE, span = 0.05)
```

<img src="lecture_4b_files/figure-html/point-7palette-1.png" width="100%" style="display: block; margin: auto;" />

---

# `geom_smooth(se = FALSE, span = 0.2)`

```r
p1 <- ggplot(df, aes(x = x, y = y2)) +
  geom_point() +
  geom_smooth(se = FALSE, span = 0.2)
p1
```

<img src="lecture_4b_files/figure-html/point-8palette-1.png" width="100%" style="display: block; margin: auto;" />

---

# Interactivity with magic plotly

```r
library(plotly)
ggplotly(p1)
```
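---

# Sketch: what the smooth layers are fitting

The `geom_smooth()` slides layer a fitted trend over the raw points. This is a minimal base-R sketch of comparable fits, assuming hypothetical simulated data in place of the lecture's `df` (whose construction is not shown). `lm()` with a 95% confidence interval mirrors `method = "lm"` with `se = TRUE`; `loess()` mirrors the smoother that recent ggplot2 versions pick by default for fewer than 1,000 points, where `span` controls how wiggly the curve is. The names `fit_lm`, `band`, `fit_wide`, `fit_narrow` and `rss` are illustrative.

```r
# Hypothetical simulated data standing in for the lecture's `df`:
# a noisy, non-linear relationship between x and y2
set.seed(2020)
x <- seq(0, 10, length.out = 200)
df <- data.frame(x = x, y2 = sin(x) + rnorm(length(x), sd = 0.3))

# Like geom_smooth(method = "lm"): straight-line fit plus a 95% confidence band
fit_lm <- lm(y2 ~ x, data = df)
band <- predict(fit_lm, newdata = df, interval = "confidence", level = 0.95)

# Like the default geom_smooth() for < 1,000 points: a loess curve.
# A smaller span means each local fit uses fewer neighbours, so the curve is wigglier
fit_wide   <- loess(y2 ~ x, data = df, span = 0.75)  # 0.75 is the default span
fit_narrow <- loess(y2 ~ x, data = df, span = 0.2)

# Residual sum of squares: how closely each layer follows the points
rss <- c(
  lm        = sum(residuals(fit_lm)^2),
  span_0.75 = sum(residuals(fit_wide)^2),
  span_0.2  = sum(residuals(fit_narrow)^2)
)
print(round(rss, 1))

# Layer the trends over the raw points
plot(y2 ~ x, data = df, pch = 16, col = "grey60")
lines(df$x, band[, "fit"], lwd = 2)
lines(df$x, band[, "lwr"], lty = 2)
lines(df$x, band[, "upr"], lty = 2)
lines(df$x, fitted(fit_wide), col = "blue", lwd = 2)
lines(df$x, fitted(fit_narrow), col = "red", lwd = 2)
```

A straight line cannot follow a curved trend, so its residuals stay large; shrinking `span` tracks the curve more closely, but pushed too far (as in the `span = 0.05` slide) the smoother starts chasing the noise rather than the trend.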
---

# Themes: Add some style to your plot

.left-code[
```r
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(gear))) +
  facet_wrap(~am)
p
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Theme: theme_minimal

.left-code[
```r
p + theme_minimal()
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-minimal-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Theme: ggthemes `theme_few()`

.left-code[
```r
library(ggthemes)
p + theme_few() + scale_colour_few()
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-theme-few-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Theme: ggthemes `theme_excel()` 🤧

.left-code[
```r
p + theme_excel() + scale_colour_excel()
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/mtcars-theme-excel-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Theme: for fun

.left-code[
```r
library(wesanderson)
p + scale_colour_manual(
  values = wes_palette("Royal1")
)
```
]

.right-plot[
<img src="lecture_4b_files/figure-html/theme-wes-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Summary: themes

- The `ggthemes` package has many different styles for plots.
- Other packages such as `xkcd`, `skittles`, `wesanderson`, `beyonce`, `ochre`, ...

---

# Hierarchy of mappings

1. Position - common scale (BEST): axis system
2. Position - nonaligned scale: boxes in a side-by-side boxplot
3. Length, direction, angle: pie charts, regression lines, wind maps
4. Area: bubble charts
5. Volume, curvature: 3D plots
6. Shading, colour (WORST): maps, points coloured by numeric variable

- [Di's crowd-sourcing experiment](http://visiphilia.org/2016/08/03/CM-hierarchy)
- Nice explanation by [Peter Aldhous](http://paldhous.github.io/ucb/2016/dataviz/week2.html)
- [General plotting advice and a book from Naomi Robbins](https://www.forbes.com/sites/naomirobbins/#2b1e20082a6a)

---

# Your Turn:

- Lab quiz open (requires answering questions from the lab exercise)
- Go to RStudio and check out exercise 4-B
- If you want to use R / RStudio on your laptop:
    - Install R + RStudio (see )
    - Open R
    - Type the following:

```r
# install.packages("usethis")
library(usethis)
use_course("mida.numbat.space/exercises/4b/mida-exercise-4b.zip")
```

---

# Resources

- Kieran Healy [Data Visualization](http://socviz.co/index.html)
- Winston Chang (2012) *R Graphics Cookbook*
- Antony Unwin (2014) [Graphical Data Analysis](http://www.gradaanwr.net)
- Naomi Robbins (2013) [Creating More Effective Charts](http://www.nbr-graphs.com)