4 Bar charts

For this section, we’ll be using a CSV dump of the 2016 GSS (General Social Survey) from its respective R library, a dataset that sociologists continue to squeeze new insights out of. The Gapminder dataset from the previous section has plenty of continuous variables (such as GDP per capita and life expectancy, which we worked with), but few categorical ones. The GSS, by contrast, has a wide variety of categorical variables, making it ideal for bar charts and histograms.

Similarly to last time, we load it up and take a gander:

> (define gss (df-read/csv "data/gss_sm.csv"))
> (show gss)
data-frame: 2867 rows x 33 columns
┌─────────┬─────────────┬────┬────────┬────────┬──────┐
│grass    │marital      │kids│siblings│relig   │ballot│
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│NA       │Married      │3   │2       │None    │1     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Legal    │Never Married│0   │3       │None    │2     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Not Legal│Married      │2   │3       │Catholic│3     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│NA       │Married      │4+  │3       │Catholic│1     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Legal    │Married      │2   │2       │None    │3     │
├─────────┼─────────────┼────┼────────┼────────┼──────┤
│Legal    │Married      │2   │2       │None    │2     │
└─────────┴─────────────┴────┴────────┴────────┴──────┘
2861 rows, 27 cols elided
(use (show df everything #:n-rows 'all) for full frame)

Clearly, we have a lot of data to work with here, and much of it is categorical, meaning we can make some bar charts!

We start off by taking a look at the variable religion, which contains a condensed version of the religions in the GSS. (The variable relig is more descriptive, but has too many categories for simple examples.) We use the bar renderer, with no arguments, to plot the number of respondents in each category:

> (graph #:data gss
         #:title "Religious preferences, GSS 2016"
         #:mapping (aes #:x "religion")
         (bar))

Suppose that, instead, we wanted to look at each religion's proportion of the whole rather than its raw count. We can specify this with the #:mode argument of bar, which can be either 'count or 'prop, with 'count being the default behavior we saw before.

> (graph #:data gss
         #:title "Religious preferences, GSS 2016"
         #:mapping (aes #:x "religion")
         (bar #:mode 'prop))
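
To make the difference between the two modes concrete, here is a minimal plain-Racket sketch of the arithmetic involved. It is not Graphite's implementation: the sample list and the helper names category-counts and category-props are made up for illustration.

```racket
#lang racket/base

;; Hypothetical toy sample standing in for the "religion" column.
(define religions
  '("Protestant" "Catholic" "None" "Protestant"
    "None" "Protestant" "Jewish" "Catholic"))

;; What 'count conveys: how many times each category occurs.
(define (category-counts xs)
  (for/fold ([counts (hash)]) ([x (in-list xs)])
    (hash-update counts x add1 0)))

;; What 'prop conveys: each count divided by the total number
;; of observations, so the values sum to 1.
(define (category-props xs)
  (define total (length xs))
  (for/hash ([(category n) (in-hash (category-counts xs))])
    (values category (/ n total))))

(hash-ref (category-counts religions) "Protestant") ; => 3
(hash-ref (category-props religions) "Protestant")  ; => 3/8
```

The bars keep the same relative heights in both modes; only the scale of the y-axis changes.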

With the y-axis representing proportions from 0 to 1, we now have a good idea of what’s going on here. As in the last example with Gapminder, suppose we wanted to split the data by region, cross-classifying the categorical variables religion and bigregion (Northeast/Midwest/South/West, in the US). We can put bigregion on the x-axis and group by religion, which effectively gives each region its own bar chart. To do this, we use the #:group aesthetic and adjust the plot size:

> (graph #:data gss
         #:title "Religious preferences among regions, GSS 2016"
         #:mapping (aes #:x "bigregion" #:group "religion")
         #:width 600 #:height 400
         (bar #:mode 'prop))
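
Cross-classifying two categorical variables amounts to counting every (region, religion) combination. The sketch below shows that tabulation in plain Racket. The sample rows and the helper name cross-counts are hypothetical, and it makes no claim about how Graphite normalizes 'prop when a group is present.

```racket
#lang racket/base

;; Hypothetical (bigregion . religion) pairs standing in for GSS rows.
(define rows
  '(("South" . "Protestant") ("South" . "Protestant") ("South" . "None")
    ("West" . "None") ("West" . "Catholic") ("Northeast" . "Catholic")))

;; Count the rows for every (region . religion) combination.
;; The hash is equal?-based, so the pair itself can serve as the key.
(define (cross-counts rows)
  (for/fold ([table (hash)]) ([row (in-list rows)])
    (hash-update table row add1 0)))

(define table (cross-counts rows))

(hash-ref table '("South" . "Protestant")) ; => 2
(hash-ref table '("West" . "Protestant") 0) ; => 0, combination never occurs
```

Each key in the table corresponds to one bar in the grouped chart: the first element picks the cluster on the x-axis, the second picks the bar within it.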

Another way of laying out this data is to use a stacked bar chart, in which each bar is divided into segments, one per group. Graphite supports this by simply changing bar to stacked-bar:

> (graph #:data gss
         #:title "Religious preferences among regions, GSS 2016"
         #:mapping (aes #:x "bigregion" #:group "religion")
         #:width 600 #:height 400
         (stacked-bar #:mode 'prop))

But both of these presentations, while they have their uses, are still difficult to read. Both require consulting the legend to tell which bar or segment belongs to which religion, and the stacked bar also makes it hard to compare categories within each region. To remedy this, we need to introduce another concept...
