The provided data
names <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
abcPoll <- c(4, 62, 51, 21, 2, 14, 15)
cbsPoll <- c(12, 75, 43, 19, 1, 21, 19)
The data.frame
pollResult <- data.frame(names,abcPoll,cbsPoll)
str(pollResult)
## 'data.frame': 7 obs. of 3 variables:
## $ names : chr "Jeb" "Donald" "Ted" "Marco" ...
## $ abcPoll: num 4 62 51 21 2 14 15
## $ cbsPoll: num 12 75 43 19 1 21 19
pollResult
##     names abcPoll cbsPoll
## 1     Jeb       4      12
## 2  Donald      62      75
## 3     Ted      51      43
## 4   Marco      21      19
## 5   Carly       2       1
## 6 Hillary      14      21
## 7  Bernie      15      19
Let’s determine the rank order for names. There are two possibilities.
(1) Ranking: simple average
Caveat: this approach has a potential issue, explained after the output.
avgPolls <- (abcPoll+cbsPoll)*.5
pollResult <- cbind(pollResult,avgPolls)
pollResult[ order(-avgPolls), ]
##     names abcPoll cbsPoll avgPolls
## 2  Donald      62      75     68.5
## 3     Ted      51      43     47.0
## 4   Marco      21      19     20.0
## 6 Hillary      14      21     17.5
## 7  Bernie      15      19     17.0
## 1     Jeb       4      12      8.0
## 5   Carly       2       1      1.5
colSums( pollResult[2:3] )
## abcPoll cbsPoll
##     169     190
Here we see the ranking sorted by avgPolls.
What is the issue? The abcPoll and the cbsPoll have different raw totals: 169 vs. 190, probably due to different polling methodologies. That means a simple average gives the poll with the larger total a bit more weight. With a dataset this small it is probably not a big deal, but it could pose problems with larger datasets.
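To make the weighting concrete, here is a minimal sketch (the variable names pollTotals, pollWeights, rawShare and weightedProps are mine, not part of the original script). It suggests that the raw average, once rescaled to sum to 1, is the same as a weighted blend of each poll's proportions, with weights of roughly 47% for abcPoll and 53% for cbsPoll.

```r
abcPoll <- c(4, 62, 51, 21, 2, 14, 15)
cbsPoll <- c(12, 75, 43, 19, 1, 21, 19)

# Raw total of each poll (169 and 190) and its share of the combined total
pollTotals  <- c(abc = sum(abcPoll), cbs = sum(cbsPoll))
pollWeights <- pollTotals / sum(pollTotals)

# Raw averages rescaled so they sum to 1
rawShare <- (abcPoll + cbsPoll) / sum(abcPoll + cbsPoll)

# Each poll's proportions, blended using the implicit weights above
weightedProps <- pollWeights[["abc"]] * (abcPoll / sum(abcPoll)) +
                 pollWeights[["cbs"]] * (cbsPoll / sum(cbsPoll))

print(round(pollWeights, 3))
print(all.equal(rawShare, weightedProps))  # TRUE
```

So the simple average is not quite an even split between the two polls; the one with the larger raw total counts for slightly more.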
Let’s consider a proportional perspective.
(2) Ranking: proportional
Same as (1) above, but applying prop.table first. This normalizes each poll so that its values sum to 1.
prop_abc <- prop.table(abcPoll)
prop_cbs <- prop.table(cbsPoll)
propFrame <- data.frame(names,prop_abc,prop_cbs)
avgProps <- (prop_abc+prop_cbs)*.5
propResult <- cbind(propFrame,avgProps)
propResult[ order(-avgProps), ]
##     names   prop_abc    prop_cbs    avgProps
## 2  Donald 0.36686391 0.394736842 0.380800374
## 3     Ted 0.30177515 0.226315789 0.264045469
## 4   Marco 0.12426036 0.100000000 0.112130178
## 6 Hillary 0.08284024 0.110526316 0.096683276
## 7  Bernie 0.08875740 0.100000000 0.094378698
## 1     Jeb 0.02366864 0.063157895 0.043413267
## 5   Carly 0.01183432 0.005263158 0.008548739
colSums( propFrame[2:3] )
## prop_abc prop_cbs
##        1        1
The prop_abc and prop_cbs columns should each add to 1…good.
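For anyone wondering what prop.table is doing here: on a plain numeric vector it divides each value by the vector's total. A quick sketch to check that (manualProps and builtinProps are illustrative names of mine):

```r
abcPoll <- c(4, 62, 51, 21, 2, 14, 15)

# Divide each value by the total by hand...
manualProps <- abcPoll / sum(abcPoll)

# ...and compare with prop.table() applied to the same vector
builtinProps <- prop.table(abcPoll)

print(all.equal(manualProps, builtinProps))  # TRUE
print(sum(builtinProps))
```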
In this case, the simple average and the proportional average produce the same ranking. Still, it is something to watch for next time, since polls with very different totals could end up ranked differently.
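Rather than comparing the two printouts by eye, the row orders can be compared directly. Below is a rough sketch: rankingOrders is a hypothetical helper of mine, and smallPoll / largePoll are made-up numbers (not real polls) chosen only to show a case where the two methods disagree.

```r
# Hypothetical helper: row order under each averaging method
rankingOrders <- function(pollA, pollB) {
  list(
    raw  = order((pollA + pollB) / 2, decreasing = TRUE),
    prop = order((prop.table(pollA) + prop.table(pollB)) / 2,
                 decreasing = TRUE)
  )
}

# The data from this post: both methods give the same order
abcPoll <- c(4, 62, 51, 21, 2, 14, 15)
cbsPoll <- c(12, 75, 43, 19, 1, 21, 19)
original <- rankingOrders(abcPoll, cbsPoll)
print(identical(original$raw, original$prop))  # TRUE

# Made-up polls with very different totals (100 vs 1000)
smallPoll <- c(35, 15, 50)
largePoll <- c(300, 400, 300)
madeUp <- rankingOrders(smallPoll, largePoll)
print(madeUp$raw)   # 2 3 1
print(madeUp$prop)  # 3 1 2
```

In the made-up case the larger poll dominates the raw average, so the simple ranking mostly follows largePoll, while the proportional ranking treats both polls evenly.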
GitHub
Related file(s) can be found at Git Me