class: center, middle, inverse, title-slide

# Training Workshop on Data Visualization using ggplot2 in R
## Session 2: Understanding grammar of graphics

---
layout: true

---
## Outline

+ The grammar of graphics
+ Datasets and mapping
+ Geometries
+ Statistical transformation and plotting distribution
+ Position adjustment and scales
+ Coordinates and themes
+ Facets and custom plots

---
class: middle center

# The grammar of graphics

----

---
## Why ggplot2?

.leftcol[
+ The most requested programming languages for data scientists are R and Python.
+ ggplot2, a visualization package for R, is becoming an industry standard for visualization.
]

.rightcol[
<img src="image/ggplot2_logo.png" width="40%" style="display: block; margin: auto;" />
]

---
## Why ggplot2?

.leftcol[
+ Relies on an underlying grammar, called the layered grammar of graphics
+ Grammars of statistical graphics:
    + Bertin, 1983
    + Wilkinson, 2005
    + Wickham, 2010
]

.rightcol[
<img src="image/gg_book_cover.jfif" width="40%" style="display: block; margin: auto;" />

.center[Source: [The grammar of graphics](https://link.springer.com/book/10.1007/0-387-28695-0)]
]

---
## Why a grammar?

.leftcol[
+ You can create new sentences if you know the grammar.
+ In the ggplot2 context, you can create new graphics or tailored plots that suit your needs or preferences.
]

.rightcol[
<img src="image/gg_book_cover.jfif" width="40%" style="display: block; margin: auto;" />

.center[Source: [The grammar of graphics](https://link.springer.com/book/10.1007/0-387-28695-0)]
]

---
## The idea of grammar of graphics

.left[
<br>
<img src="image/gg0.png" width="50%" style="display: block; margin: auto;" />
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_data.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1)
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_map.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
*      mapping = aes(x = price, y = carat, color = clarity))
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_geom.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
       mapping = aes(x = price, y = carat, color = clarity)) +
* geom_point()
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_stat.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
       mapping = aes(x = price, y = carat, color = clarity)) +
  geom_point() +
* stat_smooth()
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_scale.png"
width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
       mapping = aes(x = price, y = carat, color = clarity)) +
  geom_point() +
  stat_smooth() +
* scale_x_log10()
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_coord.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
       mapping = aes(x = price, y = carat, color = clarity)) +
  geom_point() +
  stat_smooth() +
  scale_x_log10() +
* coord_flip()
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_facet.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
       mapping = aes(x = price, y = carat, color = clarity)) +
  geom_point() +
  stat_smooth() +
  scale_x_log10() +
  coord_flip() +
* facet_wrap(~ cut)
```
]

---
## The idea of grammar of graphics

.leftcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
<img src="image/gg_theme.png" width="50%" style="display: block; margin: auto;" />

```r
ggplot(data = data1,
       mapping = aes(x = price, y = carat, color = clarity)) +
  geom_point() +
  stat_smooth() +
  scale_x_log10() +
  coord_flip() +
  facet_wrap(~ cut) +
* theme_light()
```
]

---
class: middle center

# Data and mapping

----

---
## Data

.leftcol60[
+ syntax

```r
ggplot(data = <dataset>)
```

+ For ggplot graphs, data are usually wrangled.
+ Tidy data
    + Each variable is a column
    + Each observation is a row
    + Each value is a cell
]

.rightcol40[
<img src="image/gg_data.png" width="70%" style="display: block; margin: auto;" />
]

---
## Mappings

.leftcol60[
+ syntax

```r
ggplot(data = <dataset>,
*      mapping = aes(x = <var1>, y = <var2>, ...))
```

+ Variables are mapped to the graphic's visual properties with **aesthetic mappings**
+ Usually we map:
    + one variable to the x axis
    + one variable to the y axis
    + other variables to color, shape, fill, group, etc.
]

.rightcol40[
<img src="image/gg_map.png" width="70%" style="display: block; margin: auto;" />
]

---
## Data and mappings

.leftcol60[
+ Data and mappings can be specified once for the whole plot, in the `ggplot()` call

```r
ggplot(data = mydata, mapping = aes(x = varX, y = varY))
```

+ Or specified separately for each layer

```r
ggplot() +
  geom_point(data = mydata, mapping = aes(x = varX, y = varY))
```
]

.rightcol40[
<img src="image/gg_data_map.png" width="70%" style="display: block; margin: auto;" />
]

---
## Data and mapping

.leftcol[
```r
ggplot(data = mpg,
*      mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(se = FALSE)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-38-1.png" width="100%" style="display: block; margin: auto;" />
]

.rightcol[
```r
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy)) +
* geom_point(aes(color = class)) +
  geom_smooth(se = FALSE)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-40-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Aesthetic mapping

.leftcol[
**Setting vs mapping**

Important difference:

+ **map** an aesthetic to a **variable**
+ **set** an aesthetic to a **constant** value
]

.rightcol[
<img src="image/aesthetics_list.jpg" width="100%" style="display: block; margin: auto;" />
]

---
## Aesthetic mapping

.leftcol[
**mapping**

```r
ggplot(data = mpg,
*      mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()
```

<img
src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-43-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.rightcol[
**setting**

```r
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = class)) +
* geom_point(color = "red")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-45-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: middle center

# Geometries

----

---
## Geometries

.leftcol60[
+ syntax

```r
ggplot(data = <dataset>,
       mapping = aes(x = <varX>, y = <varY>, ...)) +
* geom_<function>(...)
```

+ A geometry is added with a geom function
+ It tells R how to render each data point in the figure.
]

.rightcol40[
<img src="image/gg_geom.png" width="70%" style="display: block; margin: auto;" />
]

---
## Geometries

.center[
<img src="image/geom_collection.png" width="70%" style="display: block; margin: auto;" />

.fifty[
Source: [National Bioinformatics Infrastructure Sweden (NBIS), 2019](https://github.com/NBISweden/RaukR-2019/blob/master/docs/ggplot/presentation/ggplot_presentation_assets/geoms.png)
]
]

---
## Geometries

.leftcol40[
+ Each geom can display certain aesthetics.
+ Some of them are required.
]

.rightcol60[
<img src="image/geom_aes_summary.jpg" width="90%" style="display: block; margin: auto;" />
]

---
## Geometries

.leftcol40[
**Line plots**

Aesthetics of `geom_path`, `geom_line`, `geom_step`:

+ x
+ y
+ alpha
+ colour / color
+ linetype
+ size
+ group

We will use the `babynames` data from the `babynames` package for demonstration.
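
As a minimal sketch of how these aesthetics combine in a line plot (assuming the `babynames` package is installed; the name "Mary" and the object names `mary` and `p_mary` are arbitrary choices for illustration, not part of the workshop material):

```r
library(ggplot2)
library(babynames)

# Keep a single name so that each sex forms one time series.
# "Mary" is an illustrative choice only.
mary <- subset(babynames, name == "Mary")

# x and y are required; colour is optional and also defines the groups,
# so one line is drawn per sex.
p_mary <- ggplot(data = mary,
                 mapping = aes(x = year, y = prop, colour = sex)) +
  geom_line()
p_mary
```

Mapping `colour` to a discrete variable splits the data into groups, which is why two lines appear without setting `group` explicitly.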
]

.rightcol60[
```r
install.packages("babynames")
library(babynames)
library(dplyr)  # glimpse() comes from dplyr
glimpse(babynames)
```

```
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880,~
$ sex  <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", ~
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida",~
$ n    <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258,~
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.016~
```
]

---
## Geometries

.leftcol40[
**Line plots: practice exercise**

+ Recreate the plot shown on the right.
]

.rightcol60[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-52-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Geometries

.leftcol60[
**Scatterplots**

We can derive plots such as:

+ Connected scatter plot (if `geom_line` is added)
+ Bubble plot (mapping size to a variable)

To avoid overlapping points:

+ alpha aesthetic
+ "jitter" or `geom_jitter` for position
]

.rightcol40[
```r
geom_point(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
```
]

---
## Geometries

.leftcol60[
**Scatterplots**

#### Colors

+ continuous data
    + `scale_color_gradient`
    + `scale_fill_gradient`
+ discrete data
    + `scale_color_manual`
    + `scale_fill_manual`
]

.rightcol40[
```r
geom_point(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
```
]

---
## Geometries

.leftcol40[
**Scatter plot: practice exercise**

Use the `mpg` data to recreate the plot shown on the right.
]

.rightcol60[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-55-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: middle center

# Statistical transformation and plotting distribution

----

---
## Statistics

.leftcol60[
+ syntax

```r
ggplot(data = <dataset>,
       mapping = aes(x = <varX>, y = <varY>, ...)) +
* geom_<function>(..., stat = <stat>, position = <position>) +
* geom_<stat>(...)
```

+ Every layer has a statistical transformation associated with it.
    + **geoms** control the way the plot looks
    + **stats** control the way the data is transformed
]

.rightcol40[
<img src="image/gg_stat.png" width="70%" style="display: block; margin: auto;" />
]

---
## Statistics

**Geoms and stats**

.leftcol[
Every geometry has a default stat.

+ `geom_line` default stat is `stat_identity`
+ `geom_point` default stat is `stat_identity`
+ `geom_smooth` default stat is `stat_smooth`

Each stat has a default geom.

+ `stat_smooth` default geom is `geom_smooth`
+ `stat_count` default geom is `geom_bar`
+ `stat_sum` default geom is `geom_point`
]

.rightcol40[
<img src="image/gg_stat.png" width="70%" style="display: block; margin: auto;" />
]

---
## Statistics

.leftcol[
**Interesting stats**

+ `stat_smooth(geom_smooth)`
+ `stat_unique(geom_point)`
+ `stat_summary(geom_pointrange)`
+ `stat_count(geom_bar)`
+ `stat_bin(geom_histogram)`
+ `stat_density(geom_density)`
+ `stat_boxplot(geom_boxplot)`
+ `stat_ydensity(geom_violin)`
]

.rightcol40[
<img src="image/gg_stat.png" width="70%" style="display: block; margin: auto;" />
]

---
## Statistics

.leftcol[
**Computed aesthetics**

When a stat performs a transformation, new variables are created.

e.g., in `geom_histogram` the computed variables are:

+ **`count`** - number of points in bin
+ **`density`** - density of points in bins, scaled to integrate to 1
+ **`ncount`** - count, scaled to maximum of 1
+ **`ndensity`** - density, scaled to maximum of 1

To access:

+ old way: **`..<stat name>..`**
+ new way: **`stat(name)`**
+ ggplot2 3.3.0 and later: **`after_stat(name)`**, which supersedes both forms above
]

.rightcol[
```r
ggplot(data = mpg, mapping = aes(x = displ)) +
* geom_histogram(aes(y = ..count..))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-60-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Statistics

.leftcol[
**Computed aesthetics**

When a stat performs a transformation, new variables are created.

e.g., in `geom_histogram` the computed variables are:

+ **`count`** - number of points in bin
+ **`density`** - density of points in bins, scaled to integrate to 1
+ **`ncount`** - count, scaled to maximum of 1
+ **`ndensity`** - density, scaled to maximum of 1

To access:

+ old way: **`..<stat name>..`**
+ new way: **`stat(name)`**
+ ggplot2 3.3.0 and later: **`after_stat(name)`**, which supersedes both forms above
]

.rightcol[
```r
ggplot(data = mpg, mapping = aes(x = displ)) +
* geom_histogram(aes(y = stat(density)))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-61-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
Ways to look at distributions:

+ Histograms
+ Frequency polygons
+ Density plots
+ Boxplots
+ Violin plots
]

.rightcol60[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-62-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Histogram and freq polygon**

**`geom_histogram`**

+ displays counts with bars
+ requires continuous data
]

.rightcol60[
```r
ggplot(data = data1, mapping = aes(x = price)) +
* geom_histogram()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-63-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Histogram and freq polygon**

**`geom_histogram`**

+ displays counts with bars
+ requires continuous data

**`geom_freqpoly`**

+ uses lines instead of bars
+ same parameters can be
  applied
    + binwidth
    + bins
]

.rightcol60[
```r
ggplot(data = data1, mapping = aes(x = price)) +
* geom_freqpoly()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-64-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Histogram and freq polygon**

**`geom_histogram`**

+ displays counts with bars
+ requires continuous data

**`geom_freqpoly`**

+ uses lines instead of bars
+ same parameters can be applied
    + binwidth
    + bins
]

.rightcol60[
```r
ggplot(data = data1, mapping = aes(x = price)) +
* geom_freqpoly() +
* geom_histogram(alpha = 0.4)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-65-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**`Density plots`**

**`geom_density`**

+ a smoothed version of the frequency polygon
+ different from **`geom_area`**, where aesthetic y is needed
]

.rightcol60[
```r
ggplot(data = data1, mapping = aes(x = depth)) +
* geom_density()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-66-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**`Density plots`**

**`geom_density`**

+ a smoothed version of the frequency polygon
+ different from **`geom_area`**, where aesthetic y is needed
]

.rightcol60[
```r
ggplot(data = data1,
       mapping = aes(x = depth, color = cut, fill = cut)) +
* geom_density(alpha = 0.4)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-67-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**`Density plots`**

**`geom_density`**

+ a smoothed version of the frequency polygon
+ different from **`geom_area`**, where aesthetic y is needed

**`geom_density_ridges`**

+ available in the **ggridges** package
+ creates ridgeline plots
]

.rightcol60[
```r
install.packages("ggridges")
library(ggridges)
ggplot(data = data1,
       mapping = aes(x =
depth, color = cut, fill = cut)) +
* geom_density_ridges(aes(y = cut))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-69-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Practice exercise**

Use the `mpg` data to recreate the plot shown on the right.
]

.rightcol60[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-70-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Boxplot**

**`geom_boxplot`**

+ Interesting parameters
    + width and varwidth
    + show.legend
    + outlier.alpha
    + outlier.shape
]

.rightcol60[
```r
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
* geom_boxplot()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-71-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Violin plot**

**`geom_violin`**

+ Interesting parameters
    + trim
    + scale
    + draw_quantiles
]

.rightcol60[
```r
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
* geom_violin()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-72-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Displaying distribution

.leftcol40[
**Practice exercise**

Use the `mpg` data to recreate the plot shown on the right.
]

.rightcol60[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-73-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: middle center

# Scales and position adjustments

----

---
## Scales and position adjustments

.leftcol60[
+ syntax

```r
ggplot(data = <dataset>,
       mapping = aes(x = <varX>, y = <varY>, ...)) +
  geom_<function>(..., stat = <stat>, position = <position>) +
  geom_<stat>(...) +
* scale_<aesthetic>_<type>
```

+ **Scales** control how data values are translated to visual properties
+ They can override the defaults for axes, legends, and the transformation of data to aesthetics.
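
As a minimal sketch of overriding a default scale (assuming the `mpg` data that ships with ggplot2; the axis title, breaks, limits, and the object names `p_default` and `p_custom` are illustrative choices, not part of the workshop material):

```r
library(ggplot2)

# ggplot2 picks a default continuous x scale for `displ`.
p_default <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Override that default: custom axis title, breaks, and limits.
p_custom <- p_default +
  scale_x_continuous(name = "Engine displacement (litres)",
                     breaks = seq(2, 7, by = 1),
                     limits = c(1, 7.5))
p_custom
```

Only the x scale is replaced here; the y axis keeps its default scale.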
]

.rightcol40[
<img src="image/gg_scale.png" width="70%" style="display: block; margin: auto;" />
]

---
## Scales

.leftcol40[
Scales belong to one of these types:

+ continuous scale
+ discrete scale
+ binned scale

Naming scheme:

+ *scale + aesthetic + name of scale*
+ `scale_*_continuous()`
+ `scale_*_discrete()`
+ `scale_*_manual()`
]

--

.rightcol60[
<img src="image/scale_collection.jpg" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

.leftcol40[
]

.rightcol60[
<img src="image/scale_params.jpg" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

.leftcol[
```r
p <- ggplot(iris_data,
            aes(x = sepal_length, y = sepal_width, color = species)) +
  geom_point(size = 2)
p
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-79-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.rightcol[
```r
p + scale_color_manual(name = "Manual",
                       values = c("#5BC0EB", "#FDE74C", "#9BC53D"))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-80-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scale

.leftcol60[
**Position scales**

#### Continuous

+ scale_x_continuous | scale_y_continuous
+ scale_x_log10 | scale_y_log10
+ scale_x_reverse | scale_y_reverse
+ scale_x_sqrt | scale_y_sqrt
]

.rightcol40[
```
scale_x_continuous(
  name = waiver(),
  breaks = waiver(),
  minor_breaks = waiver(),
  n.breaks = NULL,
  labels = waiver(),
  limits = NULL,
  expand = waiver(),
  oob = censor,
  na.value = NA_real_,
  trans = "identity",
  position = "bottom",
  guide = waiver()
)
```
]

---
## Scale

.leftcol60[
**Position scales**

#### Continuous

+ scale_x_continuous | scale_y_continuous
+ scale_x_log10 | scale_y_log10
+ scale_x_reverse | scale_y_reverse
+ scale_x_sqrt | scale_y_sqrt

#### Binned

+ scale_x_binned | scale_y_binned
]

.rightcol40[
```
scale_x_binned(
  name = waiver(),
  breaks = waiver(),
  labels = waiver(),
  limits = NULL,
  expand = waiver(),
  oob = censor,
  na.value = NA_real_,
  trans = "identity",
  position = "bottom"
)
```
]

---
## Scale

.leftcol60[
**Position scales**

#### Continuous

+ scale_x_continuous | scale_y_continuous
+ scale_x_log10 | scale_y_log10
+ scale_x_reverse | scale_y_reverse
+ scale_x_sqrt | scale_y_sqrt

#### Binned

+ scale_x_binned | scale_y_binned

#### Discrete

+ scale_x_discrete | scale_y_discrete
]

.rightcol40[
```
scale_x_binned(
  name = waiver(),
  breaks = waiver(),
  labels = waiver(),
  limits = NULL,
  expand = waiver(),
  oob = censor,
  na.value = NA_real_,
  trans = "identity",
  position = "bottom"
)
```
]

---
## Scale

.leftcol[
**Color scales**

Continuous

+ scale_color_continuous | scale_fill_continuous
+ scale_color_gradient | scale_fill_gradient

Binned

+ scale_color_binned | scale_fill_binned
+ scale_color_steps | scale_fill_steps

Discrete

+ scale_color_discrete | scale_fill_discrete
+ scale_color_hue | scale_fill_hue
+ scale_color_grey | scale_fill_grey
]

---
## Scale

.leftcol[
**Viridis family**

Continuous

+ scale_color_viridis_c
+ scale_fill_viridis_c

Binned

+ scale_color_viridis_b
+ scale_fill_viridis_b

Discrete

+ scale_color_viridis_d
+ scale_fill_viridis_d
]

--

.rightcol[
```r
ggplot(mtcars, aes(mpg, wt, color = cyl)) +
  geom_point(size = 3) +
  scale_color_viridis_c()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-81-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scale

.leftcol[
**Colorbrewer family**

Continuous

+ scale_color_distiller | scale_fill_distiller

Binned

+ scale_color_fermenter | scale_fill_fermenter

Discrete

+ scale_color_brewer | scale_fill_brewer

```
# Useful parameters
type = "seq" (sequential, the default), "div" (diverging), "qual" (qualitative)
direction = 1 (default), -1 (reverse order)
palette = name of palette or index
```
]

--

.rightcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-82-1.png" width="80%" style="display: block; margin: auto;" /><img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-82-2.png" width="80%" style="display: block;
margin: auto;" /><img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-82-3.png" width="80%" style="display: block; margin: auto;" />
]

---
## Scales

**`scale_*_manual`**

.leftcol[
+ available only for discrete scales
+ useful if you want to specify your own set of mappings from levels in the data to aesthetic values. For example
    + choosing set of colors in discrete color scale
    + specifying your own set of alpha
    + specifying your own set of shapes
    + specifying your own set of linetypes
    + ...
]

--

.rightcol[
```r
ggplot(mtcars, aes(x = mpg, y = wt, color = factor(cyl))) +
  geom_point(size = 4) +
* scale_color_manual(values = c("blue", "black", "orange"))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-83-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

**`scale_*_manual`**

.leftcol[
+ available only for discrete scales
+ useful if you want to specify your own set of mappings from levels in the data to aesthetic values. For example
    + choosing set of colors in discrete color scale
    + specifying your own set of alpha
    + specifying your own set of shapes
    + specifying your own set of linetypes
    + ...
]

--

.rightcol[
```r
ggplot(mtcars, aes(x = mpg, y = wt, alpha = factor(cyl))) +
  geom_point(size = 4) +
* scale_alpha_manual(values = c(0.3, 0.6, 1))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-84-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

**`scale_*_manual`**

.leftcol[
+ available only for discrete scales
+ useful if you want to specify your own set of mappings from levels in the data to aesthetic values. For example
    + choosing set of colors in discrete color scale
    + specifying your own set of alpha
    + specifying your own set of shapes
    + specifying your own set of linetypes
    + ...
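
The "mappings from levels in the data to aesthetic values" can also be written out explicitly with a named vector. The following is a small sketch under that assumption (the legend title and the object names `cyl_colors` and `p_named` are hypothetical, chosen for illustration):

```r
library(ggplot2)

# Named vector: each name is a level of factor(cyl),
# each value is the colour assigned to that level.
cyl_colors <- c("4" = "blue", "6" = "black", "8" = "orange")

p_named <- ggplot(mtcars, aes(x = mpg, y = wt, color = factor(cyl))) +
  geom_point(size = 4) +
  scale_color_manual(name = "Cylinders", values = cyl_colors)
p_named
```

With names, the colour of each level no longer depends on the order of the factor levels.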
]

--

.rightcol[
```r
ggplot(mtcars, aes(x = mpg, y = wt, shape = factor(cyl))) +
  geom_point(size = 4) +
* scale_shape_manual(values = c(5, 10, 8))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-85-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

.leftcol[
**Shortcuts**

#### Labs

+ modify axis, legend, and plot labels
+ `xlab`, `ylab`: modify `x` and `y` axis label names
+ use `labs` with arguments:
    + title
    + x
    + subtitle
    + caption
]

--

.rightcol[
```r
ggplot(mtcars, aes(mpg, wt)) +
  geom_point(size = 3) +
* labs(title = "Title of my plot",
*      x = "Miles per gallon",
*      y = "Weight",
*      subtitle = "This would be a subtitle",
*      caption = "This is my caption")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-86-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

.leftcol[
**Shortcuts**

#### Lims

+ modify limits of the plot
+ use `xlim` and `ylim`

```
xlim(0, 50)
ylim(NA, 40)
```

+ or `lims`, specifying vectors

```
lims(x = c(0, 50))
lims(y = c(NA, 40))
```
]

--

.rightcol[
```r
ggplot(mtcars, aes(mpg, wt)) +
  geom_point(size = 3) +
  labs(title = "Title of my plot",
       x = "Miles per gallon",
       y = "Weight",
       subtitle = "This would be a subtitle",
       caption = "This is my caption") +
* xlim(10, 25)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-87-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scales

.leftcol[
**Shortcuts**

#### Lims

+ modify limits of the plot
+ use `xlim` and `ylim`

```
xlim(0, 50)
ylim(NA, 40)
```

+ or `lims`, specifying vectors

```
lims(x = c(0, 50))
lims(y = c(NA, 40))
```
]

--

.rightcol[
```r
ggplot(mtcars, aes(mpg, wt)) +
  geom_point(size = 3) +
  labs(title = "Title of my plot",
       x = "Miles per gallon",
       y = "Weight",
       subtitle = "This would be a subtitle",
       caption = "This is my caption") +
* lims(x = c(0, 20),
*      y = c(1, 4))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-88-1.png" width="100%"
style="display: block; margin: auto;" />
]

---
## Position adjustments

.leftcol[
+ All layers have a position adjustment that resolves overlapping `geoms`
+ Override the default using the `position` argument of a `geom_` or `stat_` function.
]

---
## Position adjustments

.leftcol[
**`position_identity`**

+ `?position_identity`
+ Leaves the position of each geom unchanged (no adjustment)
+ Overlapping points are drawn on top of each other

**Ways to call**

```
position = "identity"
position = position_identity()
```
]

--

.rightcol[
```r
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_point(size = 3,
*            position = "identity")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-89-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustments

.leftcol[
**`position_jitter`**

+ `?position_jitter`
+ Adds random noise to the data points to avoid overlaps
+ Useful for scatterplots
+ wrapper: `geom_jitter`

**Parameters**

```
seed = random seed to make the jitter reproducible
width = amount of jitter horizontally
height = amount of jitter vertically
```
]

--

.rightcol[
```r
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_point(size = 3,
*            position = position_jitter(width = 0.2,
*                                       seed = 143))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-90-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustments

.leftcol[
**`position_stack()`**

+ `?position_stack`
+ Stacks geoms on top of each other.

**`position_fill()`**

+ `?position_fill`
+ Stacks `geoms` on top of each other and standardizes the height.

**Parameters**

```
reverse - default = FALSE, if TRUE will reverse the default stacking order.
```
]

--

.rightcol[
```r
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(alpha = 0.5, position = "identity")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-91-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustments

.leftcol[
**`position_stack()`**

+ `?position_stack`
+ Stacks geoms on top of each other.

**`position_fill()`**

+ `?position_fill`
+ Stacks `geoms` on top of each other and standardizes the height.

**Parameters**

```
reverse - default = FALSE, if TRUE will reverse the default stacking order.
```
]

--

.rightcol[
```r
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(alpha = 0.5,
*          position = "stack")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-92-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustments

.leftcol[
**`position_stack()`**

+ `?position_stack`
+ Stacks geoms on top of each other.

**`position_fill()`**

+ `?position_fill`
+ Stacks `geoms` on top of each other and standardizes the height.

**Parameters**

```
reverse - default = FALSE, if TRUE will reverse the default stacking order.
```
]

--

.rightcol[
```r
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(alpha = 0.5,
*          position = "fill")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-93-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustment

.leftcol[
**`position_dodge`**

+ `?position_dodge`
+ preserves the vertical position of a geom while adjusting the horizontal position.
]

.rightcol[
```r
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(alpha = 0.5,
*          position = "dodge")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-94-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustment

.leftcol[
**`position_dodge`**

+ `?position_dodge`
+ preserves the vertical position of a geom while adjusting the horizontal position.

**Parameters**

```
width (default = 0.9) refers to dodging width
preserve = "single" / "total" (default = "total")
```
]

.rightcol[
```r
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(alpha = 0.5,
*          position = position_dodge(width = 1))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-95-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Position adjustment

.leftcol[
**`position_dodge`**

+ `?position_dodge`
+ preserves the vertical position of a geom while adjusting the horizontal position.

**Parameters**

```
width (default = 0.9) refers to dodging width
preserve = "single" / "total" (default = "total")
```
]

.rightcol[
```r
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
  geom_bar(alpha = 0.5,
           position = position_dodge(width = 1,
*                                    preserve = "single"))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-96-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Scales and position adjustments

.leftcol40[
**Practice exercise**

Use the code below to generate a hypothetical dataset and recreate the barplot on the right.
```r
# Create some data
df <- data.frame(supp = rep(c("VC", "OJ"), each = 3),
                 dose = rep(c("0.5", "1", "2"), 2),
                 len = c(6.8, 15, 33, 4.2, 10, 29.5))
```
]

.rightcol[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-98-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Coordinates

.leftcol[
syntax

```r
ggplot(data = <dataset>,
       mapping = aes(x = <varX>, y = <varY>, ...)) +
  geom_<function>(..., stat = <stat>, position = <position>) +
  geom_<stat>(...) +
  <scale function> +
  <coordinate function>
```

+ A coordinate system determines how points are located in space
+ `coord_cartesian()`
+ `coord_flip()`
+ `coord_polar()`
]

.rightcol[
<img src="image/gg_coord.png" width="70%" style="display: block; margin: auto;" />
]

---
## Coordinates

.leftcol[
**`coord_cartesian()`**

+ default coordinate system

**Zooming into plots**

+ setting limits using scale
    + eliminates data outside the specified range
+ setting limits using coordinate system
    + proper way to zoom
    + does not eliminate data outside the plot

**Parameters**

```
xlim, ylim, expand, clip
```
]

.rightcol[
```r
p <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(size = 3) +
  geom_smooth(size = 1.5)
print(p)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-101-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Coordinates

.leftcol[
**`coord_cartesian()`**

+ default coordinate system

**Zooming into plots**

+ setting limits using scale
    + eliminates data outside the specified range
+ setting limits using coordinate system
    + proper way to zoom
    + does not eliminate data outside the plot

**Parameters**

```
xlim, ylim, expand, clip
```
]

.rightcol[
```r
p +
* scale_x_continuous(limits = c(15, 20))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-102-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Coordinates

.leftcol[
**`coord_cartesian()`**

+ default coordinate system

**Zooming into plots**

+ setting limits using scale
    + eliminates
data outside the specified range
+ setting limits using coordinate system
    + proper way to zoom
    + does not eliminate data outside the plot

**Parameters**

```
xlim, ylim, expand, clip
```
]

.rightcol[
```r
p +
* coord_cartesian(xlim = c(15, 20))
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-103-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_cartesian()`**

+ default coordinate system

**Zooming into plots**

+ setting limits using scale
    + eliminates data outside the specified range
+ setting limits using coordinate system
    + proper way to zoom
    + does not eliminate data outside the plot

**Parameters**

```
xlim, ylim, expand, clip
```
]

.rightcol[
```r
p +
* coord_cartesian(expand = FALSE)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-104-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_cartesian()`**

+ default coordinate system

**Zooming into plots**

+ setting limits using scale
    + eliminates data outside the specified range
+ setting limits using coordinate system
    + proper way to zoom
    + does not eliminate data outside the plot

**Parameters**

```
xlim, ylim, expand, clip
```
]

.rightcol[
```r
p +
  coord_cartesian(expand = FALSE,
*                 clip = "off")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-105-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_flip()`**

+ Flips Cartesian coordinates (i.e., the horizontal axis becomes the vertical axis and vice versa).
+ Useful for drawing plots in horizontal mode without having to change the aesthetic mappings.
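
A minimal sketch of the second point, assuming the `mpg` data that ships with `ggplot2`. The object names `p_bar` and `p_bar_flipped` are illustrative and do not come from the original slides. The bars are drawn horizontally while `class` stays mapped to `x`:

```r
library(ggplot2)

# Vertical bar chart: vehicle class mapped to x, counts computed by geom_bar()
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Same data and same mapping, but drawn horizontally
p_bar_flipped <- p_bar + coord_flip()
```

Only the coordinate system differs between the two objects; the `aes()` call is untouched.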
]

.rightcol[
```r
p +
* coord_flip()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-106-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_polar()`**

+ Applies a polar coordinate system to the plot

**Parameters**

```
theta: map angle to x or y
direction: 1 (clockwise), -1 (anticlockwise)
start: offset of starting point in radians
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ)) +
  geom_bar()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-107-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_polar()`**

+ Applies a polar coordinate system to the plot

**Parameters**

```
theta: map angle to x or y
direction: 1 (clockwise), -1 (anticlockwise)
start: offset of starting point in radians
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ)) +
  geom_bar() +
* coord_polar()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-108-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_polar()`**

+ Applies a polar coordinate system to the plot

**Parameters**

```
theta: map angle to x or y
direction: 1 (clockwise), -1 (anticlockwise)
start: offset of starting point in radians
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ)) +
  geom_bar() +
* coord_polar(theta = "y")
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-109-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates

.leftcol[
**`coord_polar()`**

+ Applies a polar coordinate system to the plot

**Parameters**

```
theta: map angle to x or y
direction: 1 (clockwise), -1 (anticlockwise)
start: offset of starting point in radians
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ)) +
  geom_bar() +
  coord_polar(theta = "y",
*             direction = -1)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-110-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Coordinates
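The three `coord_polar()` parameters shown on these slides can also be set in a single call. The following is a minimal sketch, assuming the `mpg` data from `ggplot2`. The object name `p_polar` and the `pi / 2` offset are illustrative choices, not values from the original slides:

```r
library(ggplot2)

p_polar <- ggplot(mpg, aes(x = displ)) +
  geom_bar() +
  coord_polar(theta = "y",      # the angle encodes the y values (counts)
              direction = -1,   # draw anticlockwise
              start = pi / 2)   # offset the starting point by a quarter turn (radians)

print(p_polar)
```

Because `start` is given in radians, `pi / 2` corresponds to a 90 degree offset of the starting point.

---

## Coordinates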
.leftcol[
**`coord_polar()`**

+ Applies a polar coordinate system to the plot

**Parameters**

```
theta: map angle to x or y
direction: 1 (clockwise), -1 (anticlockwise)
start: offset of starting point in radians
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ)) +
  geom_bar() +
  coord_polar(theta = "y",
*             start = 0.5)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-111-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: middle center

# Facets and Themes

----

---

## Facets

.leftcol[
syntax

```r
ggplot(data = <dataset>,
       mapping = aes(x = <varX>, y = <varY>, ...)) +
  geom_<function>(..., stat = <stat>, position = <position>) +
  geom_<stat>(...) +
  <scale function> +
  <coordinate function> +
  facet_<function>
```

+ Facets divide a plot into subplots based on the values of one or more discrete variables.
    + `facet_wrap()`
    + `facet_grid()`
]

.rightcol[
<img src="image/gg_facet.png" width="70%" style="display: block; margin: auto;" />
]

---

## Facets

.leftcol[
**`facet_wrap()`**

+ "wraps" a 1d ribbon of panels into 2d
+ useful if you have a variable with many levels

**Parameters**

```
ncol
nrow
scales
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy)) +
  geom_blank() +
  xlab(NULL)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-115-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Facets

.leftcol[
**`facet_wrap()`**

+ "wraps" a 1d ribbon of panels into 2d
+ useful if you have a variable with many levels

**Parameters**

```
ncol
nrow
scales
```
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy)) +
  geom_blank() +
  xlab(NULL) +
* facet_wrap(~ class)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-116-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Facets

.leftcol[
**`facet_grid()`**

+ produces a 2d grid of panels defined by variables which form the rows and columns
    + `. ~ a` spreads the values of `a` across columns
    + `b ~ .` spreads the values of `b` down the rows
    + `b ~ a` spreads `a`
across columns and `b` down rows
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy)) +
  geom_blank() +
  xlab(NULL) +
* facet_grid(. ~ cyl)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-117-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Facets

.leftcol[
**`facet_grid()`**

+ produces a 2d grid of panels defined by variables which form the rows and columns
    + `. ~ a` spreads the values of `a` across columns
    + `b ~ .` spreads the values of `b` down the rows
    + `b ~ a` spreads `a` across columns and `b` down rows
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy)) +
  geom_blank() +
  xlab(NULL) +
* facet_grid(drv ~ .)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-118-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Facets

.leftcol[
**`facet_grid()`**

+ produces a 2d grid of panels defined by variables which form the rows and columns
    + `. ~ a` spreads the values of `a` across columns
    + `b ~ .` spreads the values of `b` down the rows
    + `b ~ a` spreads `a` across columns and `b` down rows
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy)) +
  geom_blank() +
  xlab(NULL) +
* facet_grid(drv ~ cyl)
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-119-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
syntax

```r
ggplot(data = <dataset>,
       mapping = aes(x = <varX>, y = <varY>, ...)) +
  geom_<function>(..., stat = <stat>, position = <position>) +
  geom_<stat>(...) +
  <scale function> +
  <coordinate function> +
  facet_<function> +
  theme_<function>
```

+ Controls all non-data elements
    + title appearance
    + axis labels
    + axis ticks
    + strips
    + ...
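
Individual non-data elements can also be adjusted with `theme()` on top of a complete theme. The following is a minimal sketch, assuming the `mpg` data from `ggplot2`. The object name `p_theme`, the title text, and the specific sizes and colours are illustrative choices, not taken from the original slides:

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv) +
  labs(title = "Engine displacement vs. highway mileage") +
  theme_bw() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),  # title appearance
    axis.title = element_text(size = 12),                   # axis labels
    axis.ticks = element_line(colour = "grey40"),           # axis ticks
    strip.text = element_text(face = "italic")              # facet strips
  )

print(p_theme)
```

The `theme()` call comes after `theme_bw()` so that the complete theme does not overwrite the individual settings.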
]

.rightcol[
<img src="image/gg_theme.png" width="70%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
    + `theme_gray()`
    + `theme_bw()`
    + `theme_light()`
    + `theme_classic()`
    + `...`
+ Using other packages, e.g., `ggthemes`
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_gray()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-122-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
    + `theme_gray()`
    + `theme_bw()`
    + `theme_light()`
    + `theme_classic()`
    + `...`
+ Using other packages, e.g., `ggthemes`
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_bw()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-123-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
    + `theme_gray()`
    + `theme_bw()`
    + `theme_light()`
    + `theme_classic()`
    + `...`
+ Using other packages, e.g., `ggthemes`
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3, show.legend = FALSE) +
  facet_wrap(~ cyl) +
* theme_bw()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-124-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
    + `theme_gray()`
    + `theme_bw()`
    + `theme_light()`
    + `theme_classic()`
    + `...`
+ Using other packages, e.g., `ggthemes`
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_classic()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-125-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the
built-in themes from the `ggplot2` library
+ Using other packages, e.g., `ggthemes`
    + `theme_economist_white()`
    + `theme_fivethirtyeight()`
    + `theme_stata()`
    + `theme_tufte()`
]

.rightcol[
```r
library(ggthemes)
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_economist_white()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-126-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
+ Using other packages, e.g., `ggthemes`
    + `theme_economist_white()`
    + `theme_fivethirtyeight()`
    + `theme_stata()`
    + `theme_tufte()`
]

.rightcol[
```r
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_fivethirtyeight()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-127-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
+ Using other packages, e.g., `ggthemes`
    + `theme_economist_white()`
    + `theme_fivethirtyeight()`
    + `theme_stata()`
    + `theme_tufte()`
]

.rightcol[
```r
library(ggthemes)
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_stata()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-128-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Themes

.leftcol[
**Options**

+ Using the built-in themes from the `ggplot2` library
+ Using other packages, e.g., `ggthemes`
    + `theme_economist_white()`
    + `theme_fivethirtyeight()`
    + `theme_stata()`
    + `theme_tufte()`
]

.rightcol[
```r
library(ggthemes)
ggplot(mpg, aes(x = displ, hwy, color = factor(cyl))) +
  geom_point(size = 3) +
* theme_tufte()
```

<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-129-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Practice exercise

.leftcol40[
**Let's apply what we have covered!**

+ Use the mpg dataset to
recreate the plot.
+ But first, we need to do some data wrangling!
+ Use the updated mpg data to mimic the plot.
]

.rightcol60[
<img src="02_understanding_ggplot2_files/figure-html/unnamed-chunk-131-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: center

## Session 2... DONE!

<img src="image/proud_of_you.gif" width="60%" style="display: block; margin: auto;" />

---
class: middle center

# Thank you!

#### Slides created via the R packages:

.leftcol[
<img src="image/xaringan.png" style="display:inline-block; margin: 0" width=20%/>

### xaringan by Yihui
]

.rightcol[
<img src="image/xaringanthemer.png" style="display:inline-block; margin: 0" width=25%/>

### xaringanthemer and xaringanExtra by Garrick
]