The provided dataset - raw
freqRaw <- c( ".6",".3",".4",".4",".2",".6",".3",".4",".9",".2" )
bpRaw <- c( "103","87","32","42","59","109","78","205","135","176" )
visit_1Raw <- c( "bad","bad","bad","bad","good","good","good","good","NA","bad")
visit_2Raw <- c( "low","low","high","high","low","low","high","high","high","high" )
visit_3Raw <- c( "low","high","low","high","low","high","low","high","high","high" )
Notes:
- freq is the visit frequency over the last 12 months, e.g. a freq of .3 equals .3*12=3.6, or roughly 4 visits during the last 12 months
- bp is blood pressure
- visit_x indicates assessment number
- visits need to be re-encoded with numeric designations:
  - 0: good or low
  - 1: bad or high
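As a quick check of the arithmetic in the notes, here is a minimal sketch that turns one raw frequency string into a visit count. The variable names are made up for illustration, and the rounding step is an assumption; the notes only say that .3 works out to about 4 visits.

```r
# Convert one raw frequency string into an approximate yearly visit count.
freq_example   <- as.numeric(".3")        # string to number: 0.3
visits_exact   <- freq_example * 12       # about 3.6 visits in 12 months
visits_rounded <- round(visits_exact)     # assumed convention: round to whole visits
visits_rounded
```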
Transform
Let’s transform the data from strings to numbers and re-encode the visits using conditional statements and a loop within a function.
freqRaw <- c( ".6",".3",".4",".4",".2",".6",".3",".4",".9",".2" )
bpRaw <- c( "103","87","32","42","59","109","78","205","135","176" )
visit_1Raw <- c( "bad","bad","bad","bad","good","good","good","good","NA","bad")
visit_2Raw <- c( "low","low","high","high","low","low","high","high","high","high" )
visit_3Raw <- c( "low","high","low","high","low","high","low","high","high","high" )
freq <- as.numeric( freqRaw )
bp <- as.integer( bpRaw )
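# Replace old1 with new1 and old2 with new2, then convert the vector to numeric.
# The string "NA" matches neither label and becomes NA in as.numeric().
replaceOldNew <- function(vect,old1,new1,old2,new2){
  for (i in seq_along(vect)){
    if (is.na(vect[i])) next                  # skip true missing values
    if (vect[i]==old1) vect[i] <- new1
    else if (vect[i]==old2) vect[i] <- new2   # else-if avoids replacing a value twice
  }
  return ( as.numeric(vect) )
}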
visit_1 <- replaceOldNew(visit_1Raw,"bad",1,"good",0)
visit_2 <- replaceOldNew(visit_2Raw,"low",0,"high",1)
visit_3 <- replaceOldNew(visit_3Raw,"low",0,"high",1)
df <- data.frame( freq,bp,visit_1,visit_2,visit_3 )
df
##    freq  bp visit_1 visit_2 visit_3
## 1   0.6 103       1       0       0
## 2   0.3  87       1       0       1
## 3   0.4  32       1       1       0
## 4   0.4  42       1       1       1
## 5   0.2  59       0       0       0
## 6   0.6 109       0       0       1
## 7   0.3  78       0       1       0
## 8   0.4 205       0       1       1
## 9   0.9 135      NA       1       1
## 10  0.2 176       1       1       1
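The same re-encoding can also be done without a loop. The sketch below is an alternative, not the approach used in this post: `recode_visits` and its `mapping` argument are hypothetical names, and the idea is simply to index a named numeric vector by the raw labels.

```r
# Hypothetical vectorized alternative to replaceOldNew().
# mapping is a named numeric vector, e.g. c(bad = 1, good = 0).
recode_visits <- function(vect, mapping) {
  # Labels that are not names in mapping (such as "NA") index to NA.
  unname(mapping[vect])
}

visit_1Raw <- c("bad","bad","bad","bad","good","good","good","good","NA","bad")
visit_1_alt <- recode_visits(visit_1Raw, c(bad = 1, good = 0))
visit_1_alt
```

Indexing by name keeps the label-to-number mapping in one place, which makes it easier to add a third label later.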
Quick EDA
A boxplot of blood pressure and a histogram of patient visits over the last 12 months.
par(mfrow=c(1,2))
boxplot( df$bp , main="Blood Pressure" )
hist( df$freq*12 , main="Visits" , xlab="Visits in last 12 months" , ylab="Patients" )
A summary of blood pressure.
summary( df$bp )
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   32.00   63.75   95.00  102.60  128.50  205.00
BPs & MDs Ratings
The ratings appear to agree more often when blood pressure readings are in the mid-range: per the histogram, somewhere between 50 and 150, allowing for the rounding of the bin edges.
total <- rowSums( cbind(df$visit_1,df$visit_2,df$visit_3), na.rm=TRUE )
df <- cbind(df,total)
hist( df$bp , main="BP Reading Congruence" , xlab="Blood Pressure" , ylab="Doctors" )
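One way to look at agreement more directly is to flag the rows where all three ratings match and list the blood pressure readings for each group. This is only a sketch: the definition of agreement (all non-missing ratings equal) is my assumption, and `visits` and `agree` are names introduced for illustration.

```r
# Values copied from the data frame printed earlier in the post.
bp      <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
visit_1 <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
visit_2 <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
visit_3 <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)

visits <- cbind(visit_1, visit_2, visit_3)

# Assumed definition: a row agrees when its non-missing ratings are all equal.
agree <- apply(visits, 1, function(r) length(unique(r[!is.na(r)])) == 1)

# Blood pressure readings grouped by whether the ratings agree.
split(bp, agree)
```

With only ten patients, any pattern in the two groups should be read as tentative.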
Setting breaks=10 asks hist() for roughly 10 bins (a suggested count, not a bin width) and creates the following histogram.
hist( df$bp , breaks=10, main="BP Reading Congruence" , xlab="Blood Pressure" , ylab="Doctors" )
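To see what `breaks` actually does, the sketch below compares the suggested count with explicit break points. The limits 30 and 210 are an assumption chosen to cover this data, and `plot=FALSE` is used so that only the computed bins are returned.

```r
bp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

# A single number for breaks is only a suggested number of cells.
h_suggested <- hist(bp, breaks = 10, plot = FALSE)
diff(h_suggested$breaks)    # the bin widths hist() actually chose

# Explicit break points force bins that are exactly 10 wide.
# 30 and 210 are assumed limits that cover the observed range (32 to 205).
h_fixed <- hist(bp, breaks = seq(30, 210, by = 10), plot = FALSE)
diff(h_fixed$breaks)
```

Passing a vector of break points is the reliable route when a specific bin width matters.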
GitHub
Related file(s) can be found at Git Me