Creating a dataframe from raw participant data - Multiple files
One of the most useful functions I’ve found in R is the ability to read individual data files and combine them into a dataframe. The power of this functionality marks a massive improvement from the days of copy and paste from excel into SPSS (IBM Corp, 2015). This post outline how to create a dataframe in R from a large number of individual data files. Ideally this dataframe will then be analysed in R. However, these instructions can also be helpful in efficiently creating a data file for analysis in SPSS.
Imagine you are conducting a survey or an experiment using a software package like OpenSesame (Mathôt, Schreij, & Theeuwes, 2012) or Superlab (Corporation, 2012). Unlike tools such as Surveymonkey or Questback (Unipark, 2013), which collate all participant data into a single file for analysis, OpenSesame and Superlab generally generate an individual (CSV) data file for each participant.
The traditional method of analysing these data would require each participant file to be opened in Excel. The relevant cells can then be copied and pasted into an SPSS file. This process is time consuming, tedious, and may lead to errors in transfer.
The R solution
R (R Core Team, 2017) provides a much simpler solution to this, using primarily the functions
read.csv() is a function for reading data from a csv file into the R environment.
rbind() can be used to combine similar data sets by row: r (for row) bind.
The basic (still tedious) solution
One possibility for combining all the participant data files would be to read each file to an object individually and use
rbind() to combine the objects.
# create an object for each participant a <- read.csv("/path to participant_1_data") b <- read.csv("/path to participant_2_data") c <- read.csv("/path to participant_3_data") d <- ... # combine these objects to create a data frame df <- rbind(a,b,c,d...) # alternatively df <- rbind.data.frame(a,b,c...)
df is then your complete data set that you can analyse using R.
A more efficient solution
The above solution requires that you create an object for each participant file, and that each object is explicitly listed in
rbind() in order to be included. This leads to typing a lot of repetitive code. Fortunately, there is a much more efficient solution available.
- Save all the participant files into a single folder.
- In your R session, use
setwd()to set your working directory as the folder containing all the participant files.
- Create and save a list of all the files in the working directory using
- Finally use
lapply()to read all the files in the list consecutively. This can be nested alongside
do.call()in order to combine and create a dataframe.
# set working directory to location of data files setwd("path to directory containing all data files") # create list all files in an object file_list <- list.files() # pass this object created through the code below to read all the data files into a dataframe df <- do.call("rbind", lapply(file_list, read.csv))
The above code will read each data file in the directory, and combine them into a single dataframe:
df. This dataframe can then now be analysed using R.
The data can now be analysed in R.
Alternatively, you can use
write.foreign() from the package
foreign to write the data frame to an SPSS file for analysis. This can occasionally lead to errors (depending on a number of things, including the length of variable names). However, the initial solution (copy and paste from Excel into SPSS) can also be used. Create a csv file using the
write.csv, this file can then be opened using Excel, and copied and pasted into SPSS for analysis.
Corporation, C. (2012, March). SuperLab Pro (4.5.3). San Pedro CA. Retrieved from http://www.superlab.com
IBM Corp. (2015). SPSS. Armonk, NY: IBM Corp.
Mathôt, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324. https://doi.org/10.3758/s13428-011-0168-7
R Core Team. (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Unipark, Q. (2013). QuestBack Unipark.(2013).