Data Wrangling with the tidyverse

Welcome

Welcome to the “Data Wrangling with the tidyverse” workshop!

About me

  • Postdoctoral researcher
  • Behavioral ecologist / Cognitive scientist
  • Also teach Quarto, PKM, and Open Science

What is Data Wrangling?

  • Cleaning, reshaping, transforming data
  • Examples:
    • Renaming columns
    • Filtering rows
    • Creating new variables
    • Summarizing or reshaping tables
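As a rough sketch of what these steps can look like in code, the example below applies them to three rows copied (rounded) from the gapminder preview shown later in these slides. The object names `afg`, `wrangled`, and `summary_tbl`, the new column names, and the year cutoff are illustrative choices, not part of the workshop script.

```r
library(dplyr)

# Three rows copied (rounded) from the gapminder preview in these slides
afg <- tibble(
  country = "Afghanistan",
  year    = c(1952L, 1957L, 1962L),
  lifeExp = c(28.8, 30.3, 32.0),
  pop     = c(8425333L, 9240934L, 10267083L)
)

wrangled <- afg %>%
  rename(life_exp = lifeExp) %>%       # renaming a column
  filter(year >= 1957) %>%             # filtering rows (cutoff is arbitrary)
  mutate(pop_millions = pop / 1e6)     # creating a new variable

# Summarizing: collapse the remaining rows into a single row
summary_tbl <- wrangled %>%
  summarise(mean_life_exp = mean(life_exp), n_years = n())

summary_tbl
```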

What is the tidyverse?

  • A collection of R packages for data science
  • Key packages:
    • dplyr – manipulate data
    • ggplot2 – visualize data
    • readr – import data
    • tidyr – reshape data
    • forcats – work with factors (categorical variables)
    • stringr – work with strings (text)
  • Unified grammar and syntax

Why use the tidyverse?

  • Human-readable syntax
  • Consistent verbs: filter(), select(), mutate()
  • Chain commands using pipes: %>%
  • Well-documented and community-supported
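To illustrate the readability point, here is a small sketch that writes the same two steps once as nested calls and once as a pipe chain. The tiny table `afg` is made up of rounded values from the gapminder preview in these slides, and the names `nested` and `piped` are only for illustration.

```r
library(dplyr)

# Small illustrative table (rounded values from the gapminder preview)
afg <- tibble(
  year    = c(1952L, 1957L, 1962L),
  lifeExp = c(28.8, 30.3, 32.0)
)

# Nested calls have to be read from the inside out
nested <- select(filter(afg, year > 1952), lifeExp)

# With the pipe, the same steps read top to bottom
piped <- afg %>%
  filter(year > 1952) %>%
  select(lifeExp)

# Both approaches should give the same table
all.equal(nested, piped)
```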

Meet our dataset: gapminder

  • Countries × years
  • Variables:
    • Country (country) and continent (continent)
    • Life expectancy (lifeExp)
    • GDP per capita (gdpPercap)
    • Population (pop)
  • Clean and well-structured: perfect for practice!

# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.
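
A preview like the one above can be produced with a couple of lines. This sketch assumes the data comes from the gapminder R package and that the package is already installed; the workshop script may load it differently.

```r
# Assumption: the dataset is provided by the gapminder package
library(gapminder)

head(gapminder)   # first six rows
dim(gapminder)    # number of rows and columns
```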

Tidy Data principles

  • Each variable = one column
  • Each observation = one row
  • Each type of observation = one table

Visual from tidyr documentation: tidy data
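
To make the principles concrete, the sketch below starts from a small untidy table in which the variable year is spread across column names, then reshapes it so that each variable has its own column. The table `wide` is a made-up illustration built from two rounded values in the gapminder preview; it is not part of the workshop data.

```r
library(tidyr)
library(tibble)

# Untidy: the years are stored as column names instead of in a "year" column
wide <- tibble(
  country = "Afghanistan",
  `1952`  = 28.8,
  `1957`  = 30.3
)

# Tidy: one row per country-year observation, one column per variable
tidy <- pivot_longer(
  wide,
  cols            = c(`1952`, `1957`),
  names_to        = "year",
  values_to       = "lifeExp",
  names_transform = list(year = as.integer)
)

tidy
```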

Workshop Goals

You’ll learn how to:

  • Explore: glimpse(), summary()
  • Filter/Select: filter(), select()
  • Transform: mutate(), case_when()
  • Summarize: group_by(), summarise()
  • Visualize: ggplot2
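
As a preview, here is a minimal sketch that touches each goal once. It assumes the gapminder package is installed; the year filter, the life expectancy cutoffs, and the object names (`recent`, `labelled`, `by_continent`, `p`) are arbitrary choices for illustration and may differ from the live script.

```r
library(dplyr)
library(ggplot2)
library(gapminder)  # assumption: dataset comes from the gapminder package

# Explore: column types and basic summaries
glimpse(gapminder)
summary(gapminder)

# Filter/Select: keep one year and a few columns
recent <- gapminder %>%
  filter(year == 2007) %>%
  select(country, continent, lifeExp, gdpPercap)

# Transform: label each country with a life expectancy group
# (cutoffs are hypothetical, chosen only for this example)
labelled <- recent %>%
  mutate(life_group = case_when(
    lifeExp < 60 ~ "under 60",
    lifeExp < 75 ~ "60 to under 75",
    TRUE         ~ "75 and over"
  ))

# Summarize: one row per continent
by_continent <- labelled %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp), n_countries = n())

# Visualize: life expectancy against GDP per capita (log scale)
p <- ggplot(recent, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  scale_x_log10()
p
```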

What You’ll Need

  • RStudio or Posit Cloud (formerly RStudio Cloud)
  • tidyverse package installed
  • A copy of the .R script we’ll use live
  • An .Rmd file for follow-up practice

Everything can be downloaded from the workshop's GitHub repo.
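
If you are unsure whether the packages are installed, a check along these lines can be run once in the R console. Including gapminder here is an assumption based on the dataset used in this workshop; adjust the list to match the script.

```r
# Install each package only if it is missing
# (gapminder is assumed to be the source of the workshop dataset)
for (pkg in c("tidyverse", "gapminder")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

library(tidyverse)  # attaches dplyr, ggplot2, readr, tidyr, forcats, stringr, and more
```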

Let’s Begin!

  • Open your .R script now
  • I’ll walk you through each chunk step by step
  • Ask questions anytime in the chat or unmute

Ready? Let’s wrangle!