Happy new year!
The aim of this workshop is to review the code and analysis I showed you last year.
Last term there were three workshops:
We can group the code from last term into three categories: loading or defining data, exploring the data, and fitting a simple statistical model to make an inference. In other words, we found and answered a question with some data.
In this workshop, we are going to consider the British Social Attitudes survey. We will start by covering old ground and then move into more small-group work. The goal here is to help you think about how you can use R to fit, examine and evaluate trends in data. If you feel a little lost, have a look at my previous workshop materials.
Our data for this workshop is the British Social Attitudes survey (see above). Those of you who attended my workshops last term will be familiar with this data set.
The data is a table. Each row is a person and each column is a response. In the file, each row is on a new line and the columns are separated by tabs.
Loading the data is done using a function called read.table. The arguments we give to the function are the filename, the column separator (a tab, written "\t") and header = TRUE, which tells R that the first row of the file contains the names of our variables. We use the assignment operator <- to save the output of this function (the data R reads in) in the environment as a variable called d.
# this is a comment, just so you know. R will not run this line.
# Make sure R is looking at the directory containing the data file
# You can set the working directory by going to Session > Set Working Directory > Choose Directory
# read in our data.
d <- read.table(file = 'bsa16_to_ukda.tab', sep = "\t", header = TRUE)
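After reading a file it is worth checking that R parsed it as expected. The sketch below is a minimal, self-contained illustration: it writes a tiny made-up tab-separated file (the values are invented for the example and are not BSA data) and reads it back with the same arguments used above.

```r
# Write a tiny tab-separated file to a temporary location.
# The values here are made up for illustration; they are not BSA data.
tmp <- tempfile(fileext = ".tab")
writeLines(c("Rsex\tCountry",
             "1\t1",
             "2\t3",
             "2\t2"), tmp)

# Read it back with the same arguments we used for the survey file
toy <- read.table(file = tmp, sep = "\t", header = TRUE)

# Quick checks: number of rows and columns, column types, first rows
dim(toy)
str(toy)
head(toy)
```

The same three checks (dim, str, head) can be run on d once the survey file has been loaded.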
If we make changes to d then we can write a file - saving the data on the computer.
# We can write the content of d into a file laid out the same way as our original data
# (row.names = FALSE stops R adding an extra column of row numbers)
write.table(x = d, file = 'our_data.tab', sep = '\t', row.names = FALSE)
# Or we can save the data in an R format
# The R format will often be smaller but must be loaded in R
save(d, file = 'our_data.RData')
# We can clear our environment (this removes every object, including d)
rm(list=ls())
# And load the R file back in
load(file = 'our_data.RData')
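A quick way to convince yourself that saving and loading loses nothing is to compare the reloaded object with a copy. The following is a minimal sketch using a made-up data frame called toy (a stand-in for d) and a temporary file, so it does not touch your real data.

```r
# A made-up data frame standing in for d
toy <- data.frame(id = 1:3, answer = c("yes", "no", "yes"))
backup <- toy  # keep a copy to compare against

# Save to a temporary .RData file, remove the object, then load it back
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)
rm(toy)
loaded_names <- load(file = tmp)  # load() returns the names of the objects it restored

loaded_names            # which objects came back
identical(toy, backup)  # TRUE if the round trip preserved the data
```

Note that load() restores objects under the names they had when they were saved, so it can overwrite an existing object with the same name.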
The BSA data has lots of variables. We are going to pick a few variables to explore from the documentation. RStudio has a cool autocomplete feature which might help. If you type in d$ into a code area (within the shaded area of this notebook or the console) RStudio will list all the columns in d.
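Autocomplete is handy, but you can also list or search column names in code. A minimal sketch, assuming a made-up data frame (only the names Rsex, Country and Married come from this workshop; other_var is hypothetical):

```r
# A made-up data frame; other_var is a hypothetical column name
toy <- data.frame(Rsex = c(1, 2), Country = c(1, 3),
                  Married = c(1, 4), other_var = c(0, 1))

# List every column name
names(toy)

# Search the column names for a pattern, ignoring upper/lower case
sex_cols <- grep("sex", names(toy), ignore.case = TRUE, value = TRUE)
sex_cols
```

With the survey data you would replace toy with d, for example names(d).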
Below we explore the data by looking at sex, location and marital status. All of these variables are stored as numbers and need to be recoded (have labels attached to them).
# sex
d$Rsex.recode <- factor(x = d$Rsex, labels = c('Male', 'Female'))
table(d$Rsex.recode)
##
## Male Female
## 1291 1651
plot(d$Rsex.recode)
# location
d$Country.recode <- factor(x = d$Country, labels = c('England', 'Scotland', 'Wales'))
table(d$Country.recode)
##
## England Scotland Wales
## 2525 252 165
plot(d$Country.recode)
# marriage status
d$Married.recode <- factor(x = d$Married, labels = c('Married/living as married', 'Seperated/divorced', 'Widowed', 'Never married'))
table(d$Married.recode)
##
## Married/living as married Seperated/divorced
## 1618 426
## Widowed Never married
## 289 609
plot(d$Married.recode, las = 2)
# other ways to summarise and inspect a factor
summary(d$Married.recode)
## Married/living as married Seperated/divorced
## 1618 426
## Widowed Never married
## 289 609
head(d$Married.recode)
## [1] Married/living as married Married/living as married
## [3] Married/living as married Married/living as married
## [5] Widowed Never married
## 4 Levels: Married/living as married Seperated/divorced ... Never married
str(d$Married.recode)
## Factor w/ 4 levels "Married/living as married",..: 1 1 1 1 3 4 2 2 1 3 ...
# A side note on finding out data types
# d is a data frame
class(d)
## [1] "data.frame"
# number variables
class(d$Rsex)
## [1] "integer"
# we turned some variables into categories (factors)
class(d$Rsex.recode)
## [1] "factor"
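When factor() is given labels but no levels, it attaches the labels to the sorted unique values found in the data, so the result depends on which codes happen to be present. A sketch with explicit levels makes the mapping visible. It assumes the coding 1 = Male, 2 = Female implied by the recode above; check the BSA documentation for the actual codebook. The codes below are made up.

```r
# Made-up codes, including one unexpected value (9)
codes <- c(1, 2, 2, 1, 9)

# Explicit levels: 1 -> Male, 2 -> Female; any other code becomes NA
sex <- factor(x = codes, levels = c(1, 2), labels = c("Male", "Female"))

table(sex, useNA = "ifany")  # counts per label, plus a count of NAs
levels(sex)
```

Spelling out the levels means an unexpected code shows up as a missing value rather than being given someone else's label.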
We can count data across categories.
table(d$Married.recode, d$Country.recode)
##
## England Scotland Wales
## Married/living as married 1383 147 88
## Seperated/divorced 381 27 18
## Widowed 245 21 23
## Never married 516 57 36
# plotting one factor against another shows the proportion of each sex within each country
plot(d$Country.recode, d$Rsex.recode)
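Proportions can also be calculated as numbers rather than read off a plot. The sketch below rebuilds the marital status by country table from the counts printed above (so it runs without the data file; row labels are copied as printed) and uses prop.table to turn counts into within-country proportions.

```r
# Rebuild the marital status by country table from the counts printed above
counts <- matrix(c(1383, 147, 88,
                    381,  27, 18,
                    245,  21, 23,
                    516,  57, 36),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("Married/living as married", "Seperated/divorced",
                                   "Widowed", "Never married"),
                                 c("England", "Scotland", "Wales")))
tab <- as.table(counts)

# Proportion of each marital status within each country (each column sums to 1)
col_props <- prop.table(tab, margin = 2)
round(col_props, 2)

# The original counts with row and column totals added
addmargins(tab)
```

With the survey data loaded, the same idea would be prop.table(table(d$Married.recode, d$Country.recode), margin = 2). Use margin = 1 instead to get proportions within each marital status.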
Group task
In your groups, visualise some of the variables. Look at their frequencies. Can you think of any interesting questions you would like to address?
Something new: Lattice.
# a short introduction is here
# https://www.statmethods.net/advgraphs/trellis.html
library(lattice)
histogram(~ Rsex.recode | Country.recode, data = d, type = 'count')
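Lattice formulas read as "plot this, split into panels by that", with the panel variable after the | symbol. As a minimal sketch on made-up data (the toy data frame below is invented, not BSA data), here is the same idea using barchart on pre-counted categories:

```r
library(lattice)

# Made-up data with the same kind of recoded factors used in this workshop
toy <- data.frame(
  sex     = factor(c("Male", "Female", "Female", "Male", "Female", "Male")),
  country = factor(c("England", "England", "Scotland", "Wales", "Wales", "Scotland"))
)

# Count each combination of sex and country
counts <- as.data.frame(table(sex = toy$sex, country = toy$country))

# One panel of bars per country; origin = 0 makes the bars start at zero
p <- barchart(Freq ~ sex | country, data = counts, origin = 0)
print(p)  # lattice plots are objects; print() draws them
```

Storing the plot in a variable and printing it is useful inside functions and loops, where lattice plots are not drawn automatically.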