Tidy Data
→ What is it?
Tidy data are data that are in a pleasant format for the computer - and in turn you - to work with. This format can be summarized as follows:
Each column is a variable, each row is a 'case'.
It is important to hone two skills:
- Recognizing when data are - or are not - tidy
- Knowing how to get from untidy to tidy
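As a small illustration of the second skill, here is a sketch in plain Python. The plant table and the `pivot_longer` helper are made up for this example; the helper borrows its name from tidyr's `pivot_longer()` but is only a toy stand-in for it. It assumes a table where the column headers (`week_1`, `week_2`) are really values of a variable, which is one common kind of untidiness.

```python
# Hypothetical untidy table: one row per plant, one column per week.
# The headers "week_1" and "week_2" are values of a variable (week),
# so a single row holds several cases at once.
untidy = [
    {"plant": "fern", "week_1": 10, "week_2": 14},
    {"plant": "ivy", "week_1": 7, "week_2": 9},
]


def pivot_longer(rows, id_col, names_to, values_to):
    """Turn every non-id column into its own (name, value) row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue  # the id column is repeated on each new row instead
            tidy_rows.append(
                {id_col: row[id_col], names_to: column, values_to: value}
            )
    return tidy_rows


# Tidy version: each column is a variable, each row is one case
# (one plant measured in one week).
tidy = pivot_longer(untidy, id_col="plant", names_to="week", values_to="height_cm")

for row in tidy:
    print(row)
```

The two-row, three-column table becomes four rows with the columns `plant`, `week`, and `height_cm`: nothing is lost, but every measurement now sits on its own row.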
Much (e-)ink has been spilt writing about tidy data by teachers much more adept than I am, though I may eventually give it a spin myself. For now, here are some links:
- Hadley Wickham's seminal 'tidy data' paper (PDF). The packages used to get from messy data to tidy data are now long out of date, but the descriptions of common 'messy' data are still incredibly valuable. It's an easy-to-read paper and genuinely delightful.
- If you want a version where the code is more up to date, this tidyr vignette is a good resource: (link).
- This chapter in R for Data Science (link).
- The tidyr cheatsheet is a great visual resource for when you know what your data looks like and what you want it to look like, but not how to get there (PDF).
→ Exercises
In early 2024, I made a tidy data exercise sheet for my friends. They were, at best, ambivalent, at worst actively resentful.
Anyway.
It takes place on a dark and stormy night.
→ SCENE
You pry back the wooden boards covering the windows of an old, vine-encrusted Victorian home, and pull your tired and sodden body through the narrow window, thankful for a short respite from the torrential rain. Lifting yourself from the floor, your eyes slowly adjust to the room: so long has it gone without sunlight that darkness seems to have seeped into the very walls. Fortunately for you, however, the house seems long abandoned. You begin to make camp for the night, risking a small fire in the fireplace to warm your bones and dry your clothes. Too restless for sleep, you pluck a mostly unburnt candle from a nearby candelabra, pass it briefly through the fire to collect a flame, and begin to explore the home.
→ ROOM 1
The room immediately to the right of the living room is circular and without any furniture to grace the creaking floorboards. You turn your attention upwards and note that the room extends - somewhat impossibly - upwards beyond the length of your candlelight. Furthermore, the wall is covered with portraits, not all of which are human nor any kind of animal you can discern: beastly things with talons and eyes that seem to glimmer. In fact, upon closer inspection you see the eyes glimmer because each eye in every portrait is a gemstone. You pluck out each gem (aided by a sliding ladder which protests underfoot at each rung) and record your spoils in a table:
| title | gem | n |
|---|---|---|
| A Night Scene | ruby | 10 |
| Reginald Foursight | opal | 4 |
| Harry L. Christopher | diamond | 2 |
| Lucy | red beryl | 74 |
Are these data tidy?
→ ROOM 2
Your plundering is cut short by the unexpected, rapid, and catastrophic disassembly of the ladder. You feed the pieces to the fire and amble to another room.
You're in luck - it's the larder, and you're starving. Despite the house appearing to have been abandoned for several centuries, all the food is incredibly fresh. You find what appears to be an inventory list. Despite its weathered appearance - indeed, most of it crumbled at your touch - it seems to be up to date.
| item | amount |
|---|---|
| caviar | 10 jars |
| bread | 5 loaves |
| lettuce | 3 heads |
| coffee | 5 lbs |
| flour | 10 lbs |
| brandy | 2 liters |
Are these data tidy?
→ ROOM 3
Having eaten a questionable but filling meal in the larder, you head upstairs. In the hallway you find a brass telescope on a tripod. Curiously, the feet of the tripod are screwed into the floor, but the telescope can move freely on the mount. You determine the reason: on the window, in unsteady black marker, regions are circled (and named) such that when you peer through them with the telescope, you peer directly into each of the neighboring homes' windows. Your unease and puzzlement are resolved when a young tabby hops into the region encircled by 'skipper'. Indeed, taped on the wall is a small chart:
| name | age | orange | tuxedo | hairless |
|---|---|---|---|---|
| rex | 3 | x | | |
| peridot | 7 | x | | |
| skipper | 2 | x | | |
| her majesty | 10 | x | | |
Are these data tidy?
→ ROOM 4
Having made your acquaintance with each of the neighborhood cats (giving - and sometimes receiving - a slow blink somewhat analogous to a handshake), you make your way to the room at the end of the hallway, taking a small detour only to steal some pillows from the bedroom along the way.
You make your way into an office. A solid, ornately carved wooden desk guards a plush leather chair. Atop the desk, papers still lie strewn, as though the person who put them there expected to come back only moments later. A fountain pen lies across the pages, its ink having long since dried.
After briefly flicking through the papers on the desk, you find they include doctor's notes for several patients. You piece together some data from the patients into a more cohesive table:
| name | dob | rx 1 date | rx 1 name | rx 2 date | treat 2 name |
|---|---|---|---|---|---|
| Dudley, Hanch | 1000-12-40 | 2010-03-09 | metalinoclax | 2010-12-19 | halibraethe |
| Truck, Mike | 2004-10-10 | 2016-07-19 | glibosimet | NA | |
| Dugnut, Bobson | 1445-09-13 | 2018-09-03 | NA | NA | |
| McDichael, Sleve | 2007-01-13 | 2022-10-30 | quxitomab | 2024-01-13 | halibraethe |
Are these data tidy?
→ ROOM 5
You make your way back down to the fireplace, the air warm and admixed with the crackle of firewood and a recently deceased ladder. As you drift to sleep in a nest made of stolen bedding (steeling your mind for tomorrow's journey), you ponder a final question:
If any of these data were not tidy, what would they look like tidy?