Tuesday, February 2, 2021

Mod 4: Programming Structure


The provided dataset - raw

freqRaw <- c( ".6",".3",".4",".4",".2",".6",".3",".4",".9",".2" )
bpRaw <- c( "103","87","32","42","59","109","78","205","135","176" )
visit_1Raw <- c( "bad","bad","bad","bad","good","good","good","good","NA","bad")
visit_2Raw <- c( "low","low","high","high","low","low","high","high","high","high" )
visit_3Raw <- c( "low","high","low","high","low","high","low","high","high","high" )

Notes:

  • freq is the visit frequency over the last 12 months, e.g. a freq of .3 equals .3*12=3.6, or roughly 4 visits during the last 12 months
  • bp is blood pressure
  • visit_x indicates assessment number
  • visits need to be re-encoded as numeric codes:
    • 0: good or low
    • 1: bad or high
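
The freq arithmetic in the notes can be checked directly. The following is a minimal sketch, assuming that rounding to the nearest whole visit is the intended reading of "visits"; the name `visits` is introduced here for illustration only.

```r
# freq values exactly as given in the raw dataset (strings)
freqRaw <- c(".6", ".3", ".4", ".4", ".2", ".6", ".3", ".4", ".9", ".2")

# Convert to numeric, scale to a 12-month count, and round to whole visits
# (rounding is an assumption; the raw product is fractional, e.g. .3*12 = 3.6)
visits <- round(as.numeric(freqRaw) * 12)
visits
```

For example, the second patient's freq of .3 gives 3.6, which rounds to 4 visits.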


Transform

Let’s transform the data from strings to numbers and re-encode the visits using conditional statements and a loop within a function.

freq <- as.numeric( freqRaw )
bp <- as.integer( bpRaw )

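replaceOldNew <- function(vect,old1,new1,old2,new2){
    # seq_along() is safe for empty vectors, unlike 1:length(vect)
    for (i in seq_along(vect)){
        if (is.na(vect[i])) next   # skip true missing values
        if (vect[i]==old1) vect[i] <- new1
        else if (vect[i]==old2) vect[i] <- new2
    }
    # anything that is neither old1 nor old2 (e.g. the string "NA") becomes NA
    return ( as.numeric(vect) )
}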

visit_1 <- replaceOldNew(visit_1Raw,"bad",1,"good",0)
visit_2 <- replaceOldNew(visit_2Raw,"low",0,"high",1)
visit_3 <- replaceOldNew(visit_3Raw,"low",0,"high",1)

df <- data.frame( freq,bp,visit_1,visit_2,visit_3 )
df
##    freq  bp visit_1 visit_2 visit_3
## 1   0.6 103       1       0       0
## 2   0.3  87       1       0       1
## 3   0.4  32       1       1       0
## 4   0.4  42       1       1       1
## 5   0.2  59       0       0       0
## 6   0.6 109       0       0       1
## 7   0.3  78       0       1       0
## 8   0.4 205       0       1       1
## 9   0.9 135      NA       1       1
## 10  0.2 176       1       1       1
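
The same re-encoding can also be written without an explicit loop. The following is a sketch of one alternative, not a replacement for the function above: it uses a named lookup vector, and the helper name `recode01` and its arguments `zero` and `one` are hypothetical names chosen for this example.

```r
# Hypothetical helper: map one label to 0 and another to 1 via a named lookup.
# Labels not present in the lookup (including the string "NA") come back as NA.
recode01 <- function(vect, zero, one) {
  lookup <- c(0, 1)
  names(lookup) <- c(zero, one)
  unname(lookup[vect])   # index by name, then drop the names
}

visit_1Raw <- c("bad","bad","bad","bad","good","good","good","good","NA","bad")
visit_1_alt <- recode01(visit_1Raw, zero = "good", one = "bad")
visit_1_alt
```

Indexing by name handles the whole vector in one step, so no conditional statements are needed; the trade-off is that it no longer demonstrates the loop and if statements that this module is about.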


Quick EDA

A boxplot of blood pressure and a histogram of patient visits over the last 12 months.

par(mfrow=c(1,2))
boxplot( df$bp , main="Blood Pressure" )
hist( df$freq*12 , main="Visits" , xlab="Visits in last 12 months" , ylab="Patients" )


A summary of blood pressure.

summary( df$bp )
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   32.00   63.75   95.00  102.60  128.50  205.00
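
The quartiles reported by summary() come from quantile(), while the box drawn by boxplot() uses Tukey's hinges, so the two can differ slightly on a sample this small. A minimal sketch comparing them on the same bp values (the names `q` and `hinges` are introduced here for illustration):

```r
bp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

# Quartiles as reported by summary(): linear interpolation between order statistics
q <- quantile(bp, probs = c(0.25, 0.75))
q

# Five-number summary (min, lower hinge, median, upper hinge, max)
# that boxplot() uses to draw the box
hinges <- fivenum(bp)
hinges
```

Here the quartiles are 63.75 and 128.5, while the hinges are 59 and 135, which is why the edges of the box do not line up exactly with the summary table.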


BPs & MDs Ratings

Agreement among the three ratings appears to be higher when blood pressure readings are in the mid-range, which per the histogram is somewhere between 50 and 150, allowing for rounding.

total <- rowSums( cbind(df$visit_1,df$visit_2,df$visit_3), na.rm=TRUE )
df <- cbind(df,total)
hist( df$bp , main="BP Reading Congruence" , xlab="Blood Pressure" , ylab="Doctors"  )
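
One way to look at agreement row by row, rather than through the histogram alone, is to flag the patients whose non-missing ratings are all the same. This is only a sketch under that assumed definition of "agreement"; the names `ratings` and `agree` are introduced for this example, and the values are the re-encoded visits from the data frame above.

```r
bp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
ratings <- cbind(
  visit_1 = c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1),
  visit_2 = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1),
  visit_3 = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
)

# TRUE when every non-missing rating in the row has the same value
agree <- apply(ratings, 1, function(r) length(unique(r[!is.na(r)])) == 1)

# Sort by blood pressure to see where the agreeing rows fall
result <- data.frame(bp, total = rowSums(ratings, na.rm = TRUE), agree)
result[order(result$bp), ]
```

Sorting by bp makes it easier to compare the agreeing rows against the blood pressure ranges shown in the histogram.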


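Asking for about 10 bins creates the following histogram. Note that a single number passed to breaks is a suggested number of bins, not a bin width, and hist() may adjust it to get round break points.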

hist( df$bp , breaks=10, main="BP Reading Congruence" , xlab="Blood Pressure" , ylab="Doctors"  )
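
To pin the bin width at exactly 10, explicit break points can be supplied instead of a single number. A minimal sketch, assuming break points from 30 to 210 (limits chosen here only because they cover the observed bp values) and using plot=FALSE so the bins can be inspected:

```r
bp <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

# Explicit break points every 10 units; the range must cover all of the data
h <- hist(bp, breaks = seq(30, 210, by = 10), plot = FALSE)

h$breaks   # the bin edges actually used
h$counts   # number of readings falling in each bin
```

Dropping plot=FALSE and adding the main, xlab and ylab arguments from the call above draws the same histogram with these fixed-width bins.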




GitHub

Related file(s) can be found at Git Me
