Set environment
env <- c("plyr","dplyr","stringr")
lapply(env, library, character.only = TRUE)
Read a file containing students’ names, age, sex, and grade.
* The file name “Assignment 6 Dataset-1.txt” is hardcoded. The output of str and head follows.
autoData <- read.table("Assignment 6 Dataset-1.txt", header = TRUE, sep = ",")
str(autoData)
## 'data.frame': 20 obs. of 4 variables:
## $ Name : chr "Raul" "Booker" "Lauri" "Leonie" ...
## $ Age : int 25 18 21 21 22 20 23 24 21 23 ...
## $ Sex : chr "Male" "Male" "Female" "Female" ...
## $ Grade: int 80 83 90 91 85 69 91 97 78 81 ...
head(autoData)
## Name Age Sex Grade
## 1 Raul 25 Male 80
## 2 Booker 18 Male 83
## 3 Lauri 21 Female 90
## 4 Leonie 21 Female 91
## 5 Sherlyn 22 Female 85
## 6 Mikaela 20 Female 69
* The file name is provided by the user.
In this case readline is used, which requires an interactive session. From the R documentation:
“An interactive R session is one in which it is assumed that there is a human operator to interact with”
So the code will not run or show its output in a “Knit to HTML” document. Knitting throws an error unless the code first checks the session type. Here we check for interactivity with: if ( interactive() )
if ( interactive() ){
  fileInput <- readline("Enter filename to read: ")
  # sep = "," assumes a comma-separated file, like the dataset read above
  manualData <- read.table(fileInput, header = TRUE, sep = ",")
  head(manualData)
} else {
  cat( "This code block must be run in an interactive session." )
}
## This code block must be run in an interactive session.
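One way to let the same chunk work when knitting is to fall back to a command-line argument or a default when no human is present. The following is a minimal sketch, not the method used in this post: the helper name get_file_name and its arguments are hypothetical, and the default file name is simply the one hardcoded earlier.

```r
# Hypothetical helper: pick a file name in interactive and
# non-interactive sessions alike.
get_file_name <- function(default,
                          args = commandArgs(trailingOnly = TRUE),
                          is_interactive = interactive()) {
  if (is_interactive) {
    # A human is present: ask, and fall back to the default on empty input.
    answer <- readline("Enter filename to read: ")
    if (nzchar(answer)) answer else default
  } else if (length(args) >= 1) {
    # Non-interactive: use the first trailing command-line argument, if any.
    args[1]
  } else {
    # Nothing supplied: use the default.
    default
  }
}

fileInput <- get_file_name("Assignment 6 Dataset-1.txt")
```

The args and is_interactive parameters exist only so the function can be exercised without a real prompt; in normal use they keep their defaults.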
Get the grouped mean using package plyr
groupedAverage <- ddply(autoData, "Sex", summarise,
                        Average = mean(Grade)
)
groupedAverage
## Sex Average
## 1 Female 86.9375
## 2 Male 80.2500
So, how many males and females?
autoData %>%
  group_by(Sex) %>%
  summarise(Count = n())
## # A tibble: 2 x 2
## Sex Count
## * <chr> <int>
## 1 Female 16
## 2 Male 4
Filtering
Filter the dataframe for names that contain the letter “i”. Then create a new data set with those names.
Let’s use package stringr to make things easy and clear.
df <- autoData %>%
  filter( str_detect(Name, "i") )
str(df)
## 'data.frame': 14 obs. of 4 variables:
## $ Name : chr "Lauri" "Leonie" "Mikaela" "Aiko" ...
## $ Age : int 21 21 20 24 21 23 23 20 23 21 ...
## $ Sex : chr "Female" "Female" "Female" "Female" ...
## $ Grade: int 90 91 69 97 78 81 98 87 97 67 ...
head(df)
## Name Age Sex Grade
## 1 Lauri 21 Female 90
## 2 Leonie 21 Female 91
## 3 Mikaela 20 Female 69
## 4 Aiko 24 Female 97
## 5 Tiffaney 21 Female 78
## 6 Corina 23 Female 81
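Note that the pattern "i" is case-sensitive, so a name that only contains a capital “I” would not be matched. The sketch below illustrates the difference using base R's grepl so that it runs without extra packages; with stringr, wrapping the pattern as regex("i", ignore_case = TRUE) inside str_detect should behave the same way. The names are made up for illustration and are not the assignment data.

```r
# Made-up names, for illustration only (not the assignment data).
students <- data.frame(Name = c("Ivan", "Lauri", "Raul"))

# Case-sensitive match: "Ivan" has no lowercase "i", so it is missed.
sensitive <- grepl("i", students$Name)

# Case-insensitive match: both "i" and "I" count.
insensitive <- grepl("i", students$Name, ignore.case = TRUE)

# Keep only the rows whose name contains the letter in either case.
with_i <- students[insensitive, , drop = FALSE]
with_i
```

Whether a capital “I” should count depends on how the assignment is read; the filter above keeps the stricter lowercase-only interpretation.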
Write to a file.
Write the new dataframe to a CSV file using write.csv.
write.csv(df, "i-students.csv")
Opening the CSV file in Excel shows us: (screenshot not reproduced here)
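By default write.csv also writes the row names as an unnamed first column, which then shows up as an extra column in a spreadsheet. If that column is not wanted, row.names = FALSE drops it. The following sketch uses a small made-up data frame and a temporary file (both assumptions for illustration, not the i-students.csv file above) and reads the result back to check the columns.

```r
# Small sample data frame, for illustration only.
df_demo <- data.frame(Name = c("Lauri", "Aiko"), Grade = c(90L, 97L))

# Write to a temporary file so nothing in the working directory is touched.
out_file <- tempfile(fileext = ".csv")
write.csv(df_demo, out_file, row.names = FALSE)

# Read the file back to confirm that only the data columns were written.
round_trip <- read.csv(out_file)
names(round_trip)
```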
GitHub
Related file(s) can be found at Git Me