<!-- background-color: #006DAE --> <!-- class: middle center hide-slide-number --> <div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;"> <i class="fas fa-exclamation-circle"></i> These slides are best viewed in Chrome and occasionally need to be refreshed if elements do not load properly. See <a href=/>here for PDF <i class="fas fa-file-pdf"></i></a>. </div> <br> .white[Press the **right arrow** to progress to the next slide!] --- background-image: url(images/bg1.jpg) background-size: cover class: hide-slide-number split-70 title-slide count: false .column.shade_black[.content[ <br> # .monash-blue.outline-text[ETC5510: Introduction to Data Analysis] <h2 class="monash-blue2 outline-text" style="font-size: 30pt!important;">Week 10, part B</h2> <br> <h2 style="font-weight:900!important;">Classification Trees</h2> .bottom_abs.width100[ Lecturers: *Professor Di Cook, Nicholas Tierney & Stuart Lee* Department of Econometrics and Business Statistics
<i class="fas fa-envelope faa-float animated "></i>
ETC5510.Clayton-X@monash.edu May 2020 <br> ] ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> --- # Recap: Decision Trees --- # Admin - Assignment 2 peer evaluation, to be completed on ED by next Monday - Project - Talk to us about your data in class and at consults; the next milestone is due on Friday and is available on the assessments tab on ED - Practical exam - Opens 6pm next Wednesday, closes 6pm Thursday --- # What is a decision tree? .pull-left[ Tree-based models consist of one or more nested `if-then` statements for the predictors that partition the data. Within these partitions, a model is used to predict the outcome. ] .pull-right[ <img src="images/tree.jpg" width="100%" style="display: block; margin: auto;" /> .small[Source: [Egor Dezhic](becominghuman.ai)] ] --- # Regression Tree .pull-left[ <img src="lecture_10b_files/figure-html/reg-tree-split-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="lecture_10b_files/figure-html/show-split-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Regression Tree .pull-left[ <img src="lecture_10b_files/figure-html/show-split-again-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="lecture_10b_files/figure-html/rpart-plot-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Regression tree - What if we want to predict membership of a particular group? Say, predicting whether someone passes a course based on two exam scores: - Moving from a continuous to a categorical response. <img src="lecture_10b_files/figure-html/unnamed-chunk-1-1.png" width="80%" style="display: block; margin: auto;" /> --- # Regression? Classification? - Regression trees give the predicted response for an observation by using the mean response of the observations that belong to the same terminal node: <img src="lecture_10b_files/figure-html/show-reg-pred-1.png" width="100%" style="display: block; margin: auto;" /> --- # Classification A classification tree predicts each observation as belonging to the most commonly occurring class among the observations in its region. However, when we interpret a classification tree, we are often interested not only in the class prediction (what is most common), but also in the proportion of correct classifications. --- # Building a classification tree - We take a similar approach to building a classification tree as for a regression tree - We use the same "recursive binary splitting" approach - But we don't use the residual sum of squares $$ SS_T = \sum (y_i-\bar{y})^2 $$ Since we now have a categorical response, we need some other way to measure how good a split is! --- # Classification tree - We can use the "classification error". - We count the number of misclassified observations, and choose the split with the fewest misclassifications. - We can represent this in an equation as the .orange[fraction of observations in a region which don't belong to the most common class].
`$$E = 1 - \text{max}_{k}(\hat{p}_{mk})$$` Here, `\(\hat{p}_{mk}\)` refers to the proportion of observations in the `\(m\)`th region that are from the `\(k\)`th class. --- # Understanding classification Another way to think about this is to understand when `\(E\)` is zero, and when `\(E\)` is large: `\(E = 1 - \text{max}_{k}(\hat{p}_{mk})\)` `\(E\)` is zero when `\(\text{max}_{k}(\hat{p}_{mk})\)` is 1, which happens when all observations in the region belong to the same class. --- # Classification trees - A classification tree is used to predict a .orange[categorical response] and a regression tree is used to predict a quantitative response - Use recursive binary splitting to grow a classification tree. That is, sequentially break the data into two subsets, typically using a single variable each time. - The predicted value for a new observation, `\(x_0\)`, will be the .orange[most commonly occurring class] of observations in the sub-region in which `\(x_0\)` falls --- # Predicting pass or fail? Consider the dataset `Exam`, where two exam scores are given for each student, and a class `Label` represents whether they passed or failed the course. .pull-left[ ``` ## Exam1 Exam2 Label ## 1 34.62366 78.02469 0 ## 2 30.28671 43.89500 0 ## 3 35.84741 72.90220 0 ## 4 60.18260 86.30855 1 ``` ] .pull-right[ <img src="lecture_10b_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Your turn: Open "10b-exercise-intro.Rmd" and let's decide on a point to split the data. --- # Calculate the number of misclassifications Along all splits for `Exam1`, classifying according to the majority class for the left and right splits: <img src="gifs/two_d_cart.gif" width="80%" style="display: block; margin: auto;" /> Red dots are .orange["fails"], blue dots are .green["passes"], and crosses indicate misclassifications. .small[Source: John Ormerod, U.Syd] --- # Calculate the number of misclassifications Along all splits for `Exam2`, classifying according to the majority class for the top and bottom splits: <img src="gifs/two_d_cart2.gif" width="80%" style="display: block; margin: auto;" /> Red dots are .orange["fails"], blue dots are .green["passes"], and crosses indicate misclassifications. .small[Source: John Ormerod, U.Syd] --- # Combining the results from `Exam1` and `Exam2` splits - The minimum number of misclassifications from using all possible splits of `Exam1` was 19, when the value of `Exam1` was **56.7** - The minimum number of misclassifications from using all possible splits of `Exam2` was 23, when the value of `Exam2` was .orange[52.5] -- So we split on the best of these, i.e., split the data on `Exam1` at 56.7. --- # Split criteria - purity/impurity metrics It turns out that classification error is not sufficiently sensitive for tree-growing. In practice, two other measures are preferable, as they are more sensitive: - the Gini Index, and - Information Entropy. They are both quite similar numerically. Small values mean that a node contains mostly observations of a single class, referred to as .orange[node purity].
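The Gini Index `\(G\)` and the Information Entropy `\(D\)` have simple forms in the notation used earlier. For the `\(m\)`th region:

`$$G = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}) \qquad D = -\sum_{k=1}^{K} \hat{p}_{mk}\log(\hat{p}_{mk})$$`

---

# Calculating impurity in R

As a small sketch of how the three criteria behave (the function name `impurity` and the example proportions are made up for illustration), they can be computed directly in R:

```r
# Impurity measures for a single node, given a vector of
# class proportions p_hat (assumed positive, summing to 1)
impurity <- function(p_hat) {
  c(
    error   = 1 - max(p_hat),           # classification error
    gini    = sum(p_hat * (1 - p_hat)), # Gini Index
    entropy = -sum(p_hat * log(p_hat))  # Information Entropy
  )
}

impurity(c(0.5, 0.5)) # maximally impure two-class node: all measures large
impurity(c(0.9, 0.1)) # nearly pure node: all measures small
```

---

# Growing a tree in R

Trees like the ones pictured in these slides are grown with the `rpart` package. A minimal sketch, assuming the `Exam` data frame from the earlier slides and the `rpart.plot` package for plotting:

```r
library(rpart)
library(rpart.plot)

# method = "class" requests a classification (not regression) tree;
# rpart grows it by recursive binary splitting, using the Gini Index
# by default to choose each split
exam_rpart <- rpart(factor(Label) ~ Exam1 + Exam2,
                    data = Exam,
                    method = "class")

rpart.plot(exam_rpart)
```

Because the default split criterion is the Gini Index rather than classification error, the first split may sit near, but not necessarily exactly at, the `Exam1` value of 56.7 found by hand above.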
--- # Example - predicting heart disease `\(Y\)`: presence of heart disease (Yes/No) `\(X\)`: heart and lung function measurements ``` ## [1] "Age" "Sex" "ChestPain" "RestBP" "Chol" "Fbs" ## [7] "RestECG" "MaxHR" "ExAng" "Oldpeak" "Slope" "Ca" ## [13] "Thal" "AHD" ``` <img src="lecture_10b_files/figure-html/rpart-heart-1.png" width="70%" style="display: block; margin: auto;" /> --- # Deeper trees Trees can be built deeper by: - decreasing the value of the complexity parameter `cp`, which sets the minimum improvement in impurity required to continue splitting. - reducing the `minsplit` and `minbucket` parameters, which control the number of observations below which splits are forbidden. <img src="lecture_10b_files/figure-html/deeper-trees-1.png" width="70%" style="display: block; margin: auto;" /> --- # Tabulate true vs predicted to make a .orange[confusion table]. <center> <table> <tr> <td> </td><td> </td> <td colspan="2" align="center" > true </td> </tr> <tr> <td> </td><td> </td> <td align="center" bgcolor="#daf2e9" width="80px"> C1 (positive) </td> <td align="center" bgcolor="#daf2e9" width="80px"> C2 (negative) </td> </tr> <tr height="50px"> <td> pred- </td><td bgcolor="#daf2e9"> C1 </td> <td align="center" bgcolor="#D3D3D3"> <em>a</em> </td> <td align="center" bgcolor="#D3D3D3"> <em>b</em> </td> </tr> <tr height="50px"> <td>icted </td><td bgcolor="#daf2e9"> C2</td> <td align="center" bgcolor="#D3D3D3"> <em>c</em> </td> <td align="center" bgcolor="#D3D3D3"> <em>d</em> </td> </tr> </table> </center> - .orange[Accuracy: *(a+d)/(a+b+c+d)*] - .orange[Error: *(b+c)/(a+b+c+d)*] - Sensitivity: *a/(a+c)* (true positive rate, recall) - Specificity: *d/(b+d)* (true negative rate) - .orange[Balanced accuracy: *(sensitivity+specificity)/2*] --- # Confusion and error ``` ## Reference ## Prediction No Yes ## No 75 5 ## Yes 11 58 ## Accuracy ## 0.8926174 ``` --- # Example - Crabs Physical measurements on WA crabs, males and females. .small[*Data source*: Campbell, N. A. & Mahon, R. J. (1974)] <img src="lecture_10b_files/figure-html/read-crabs-1.png" width="50%" style="display: block; margin: auto;" /> --- # Example - Crabs <img src="lecture_10b_files/figure-html/crabs-plot-1.png" width="80%" style="display: block; margin: auto;" /> --- # Comparing models .pull-left[ Classification tree <img src="lecture_10b_files/figure-html/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ Linear classifier <img src="lecture_10b_files/figure-html/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Strengths and Weaknesses Strengths: - The decision rules provided by trees are very easy to explain and follow: a simple classification model. - Trees can handle a mix of predictor types, categorical and quantitative. - Trees operate efficiently when there are missing values in the predictors. Weaknesses: - The algorithm is greedy; a better final solution might be obtained by taking a second-best split earlier. - When the separation is in linear combinations of variables, trees struggle to provide a good classification. --- # 👩‍💻 Made by a human with a computer - Slides inspired by [https://iml.numbat.space](https://iml.numbat.space), [https://github.com/numbats/iml](https://github.com/numbats/iml). - Created using [R Markdown](https://rmarkdown.rstudio.com) with flair by [**xaringan**](https://github.com/yihui/xaringan), and [**kunoichi** (female ninja) style](https://github.com/emitanaka/ninja-theme).
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.