Wednesday, March 3, 2021

Mod 8: I/O



Set environment

env <- c("plyr","dplyr","stringr")
lapply(env, library, character.only = TRUE)


Read a file containing students’ names, age, sex, and grade.


* The file name “Assignment 6 Dataset-1.txt” is hardcoded. The output of str and head follows.
autoData <- read.table("Assignment 6 Dataset-1.txt", header = TRUE, sep = ",")
str(autoData)
## 'data.frame':    20 obs. of  4 variables:
##  $ Name : chr  "Raul" "Booker" "Lauri" "Leonie" ...
##  $ Age  : int  25 18 21 21 22 20 23 24 21 23 ...
##  $ Sex  : chr  "Male" "Male" "Female" "Female" ...
##  $ Grade: int  80 83 90 91 85 69 91 97 78 81 ...
head(autoData)
##      Name Age    Sex Grade
## 1    Raul  25   Male    80
## 2  Booker  18   Male    83
## 3   Lauri  21 Female    90
## 4  Leonie  21 Female    91
## 5 Sherlyn  22 Female    85
## 6 Mikaela  20 Female    69
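
For readers who do not have the assignment file, the same call can be tried on inline text. This is only a sketch: it uses just the six rows printed by head() above, and the names sampleText and sampleData are made up for illustration.

```r
# The six rows printed by head(autoData), supplied as inline text.
# sampleText and sampleData are illustrative names, not part of the assignment.
sampleText <- "Name,Age,Sex,Grade
Raul,25,Male,80
Booker,18,Male,83
Lauri,21,Female,90
Leonie,21,Female,91
Sherlyn,22,Female,85
Mikaela,20,Female,69"

# text= stands in for the file path; header and sep match the call above.
sampleData <- read.table(text = sampleText, header = TRUE, sep = ",")
str(sampleData)
```

The header = TRUE argument turns the first line into column names, and sep = "," splits each line on commas.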


* File name is user provided.

In this case readline is used, which requires an interactive session. From the R documentation:

“An interactive R session is one in which it is assumed that there is a human operator to interact with”

So the code will not run or show its output in a “Knit to HTML” document. Knitting throws an error unless the code first checks the session type. Here we check for interactivity with: if ( interactive() )

if ( interactive() ){
  fileInput <- readline("Enter filename to read: ")
  manualData <- read.table(fileInput, header = TRUE)
  head(manualData)
} else {
  cat( "This code block must be run in an interactive session."  )
}
## This code block must be run in an interactive session.
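
One way to keep a knitted document working is to fall back to a default file name when nobody is there to answer the prompt. The sketch below is an assumption about how that could look, not part of the assignment; getFileName is a hypothetical helper.

```r
# Hypothetical helper: prompt when a person is present, otherwise use a default.
getFileName <- function(default) {
  if (interactive()) {
    answer <- readline("Enter filename to read: ")
    # An empty answer (just pressing Enter) also falls back to the default.
    if (nzchar(answer)) return(answer)
  }
  default
}

fileInput <- getFileName("Assignment 6 Dataset-1.txt")
fileInput
```

When knitting, interactive() is FALSE, so the helper returns the default without prompting.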


Get the grouped mean using package plyr

groupedAverage = ddply(autoData, "Sex", summarise,
                       Average = mean(Grade)
                       )
groupedAverage
##      Sex Average
## 1 Female 86.9375
## 2   Male 80.2500
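
The same grouped mean can be had without plyr. Here is a sketch using base R's aggregate on just the six rows shown by head() earlier, so the averages will differ from the 20-row result above. sampleData and baseAverage are illustrative names.

```r
# Sex and Grade for the six rows printed by head(autoData).
sampleData <- data.frame(
  Sex   = c("Male", "Male", "Female", "Female", "Female", "Female"),
  Grade = c(80, 83, 90, 91, 85, 69)
)

# Formula interface: mean of Grade within each level of Sex.
baseAverage <- aggregate(Grade ~ Sex, data = sampleData, FUN = mean)

# Rename the result column to match the ddply output.
names(baseAverage)[2] <- "Average"
baseAverage
```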


So, how many males and females?

autoData %>% group_by(Sex) %>%
  summarise(Count = n())
## # A tibble: 2 x 2
##   Sex    Count
## * <chr>  <int>
## 1 Female    16
## 2 Male       4
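
For a quick count, base R's table gives the same tally without dplyr. A sketch on the six head() rows only (so the counts are smaller than above); sexes and counts are illustrative names.

```r
# Sex column for the six rows printed by head(autoData).
sexes <- c("Male", "Male", "Female", "Female", "Female", "Female")

# table() tallies each value; as.data.frame() turns the tally into
# a two-column data frame, with the count column named "Count".
counts <- as.data.frame(table(Sex = sexes), responseName = "Count")
counts
```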


Filtering

Filter the dataframe for names that contain the letter “i”. Then create a new data set with those names.

Let’s use package stringr to make things easy and clear.

df <- autoData %>%
  filter( str_detect(Name, "i")  )
str(df)
## 'data.frame':    14 obs. of  4 variables:
##  $ Name : chr  "Lauri" "Leonie" "Mikaela" "Aiko" ...
##  $ Age  : int  21 21 20 24 21 23 23 20 23 21 ...
##  $ Sex  : chr  "Female" "Female" "Female" "Female" ...
##  $ Grade: int  90 91 69 97 78 81 98 87 97 67 ...
head(df)
##       Name Age    Sex Grade
## 1    Lauri  21 Female    90
## 2   Leonie  21 Female    91
## 3  Mikaela  20 Female    69
## 4     Aiko  24 Female    97
## 5 Tiffaney  21 Female    78
## 6   Corina  23 Female    81
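
Note that the match is case-sensitive: a lowercase "i" does not match a capital "I". The sketch below shows the difference with base R's grepl rather than stringr. The first six names come from head(autoData); "Ivan" is a made-up name added only to show the effect.

```r
# Six names from head(autoData), plus "Ivan", which is NOT in the dataset
# and is added here only to illustrate case sensitivity.
studentNames <- c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela", "Ivan")

# Case-sensitive: only a lowercase "i" matches.
hasLowerI <- grepl("i", studentNames, fixed = TRUE)
studentNames[hasLowerI]

# Case-insensitive: "i" or "I" matches.
hasAnyI <- grepl("i", studentNames, ignore.case = TRUE)
studentNames[hasAnyI]
```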


Write to a file.

Write the new dataframe to a CSV file using write.csv.

write.csv(df,  "i-students.csv")
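
By default write.csv also writes the row names, which appear as an extra unnamed first column in the file. If that column is not wanted, row.names = FALSE leaves it out. A sketch with three of the filtered rows shown above, written to a temporary file so nothing is overwritten (outFile and roundTrip are illustrative names):

```r
# Three rows from the filtered data shown above.
df <- data.frame(
  Name  = c("Lauri", "Leonie", "Mikaela"),
  Age   = c(21L, 21L, 20L),
  Sex   = "Female",
  Grade = c(90L, 91L, 69L)
)

# Write to a temporary file, leaving out the row-name column.
outFile <- tempfile(fileext = ".csv")
write.csv(df, outFile, row.names = FALSE)

# Read it back to confirm that only the four data columns were written.
roundTrip <- read.csv(outFile)
names(roundTrip)
```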


Opening the CSV file in Excel shows us:

pic of i-students

GitHub

Related file(s) can be found at Git Me
