Computes the point-biserial correlation between a dichotomous and a continuous variable.
biserial.cor(x, y, use = c("all.obs", "complete.obs"), level = 1)
x: a numeric vector representing the continuous variable.
y: a factor or a numeric vector (that will be converted to a factor) representing the dichotomous variable.
use: if use is "all.obs", the presence of missing observations produces an error. If use is "complete.obs", missing values are handled by casewise deletion.
level: which level of y to use as the reference level.
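Casewise deletion, as described for use = "complete.obs", means that an observation is dropped whenever either variable is missing. The following is a minimal base R sketch of that idea on made-up data; it illustrates the concept with complete.cases() and is not taken from the biserial.cor() source.

```r
# Hypothetical data with one missing value in each variable.
x <- c(10, 12, NA, 15, 9, 14)
y <- factor(c("no", "yes", "yes", NA, "no", "yes"))

# Casewise deletion: keep only the positions where both x and y are observed.
keep <- complete.cases(x, y)
x_complete <- x[keep]
y_complete <- y[keep]

# Observations 3 and 4 are removed; the remaining pairs stay aligned.
data.frame(x = x_complete, y = y_complete)
```

Under use = "all.obs", the same input would be expected to stop with an error instead of silently dropping rows.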
The point-biserial correlation computed by biserial.cor() is defined as
r = \frac{(\overline{X}_1 - \overline{X}_0) \sqrt{\pi (1 - \pi)}}{S_x},
where \overline{X}_1 and \overline{X}_0 denote the sample means of the X-values corresponding to the first and second level of Y, respectively, S_x is the sample standard deviation of X, and \pi is the sample proportion for Y = 1. The first level of Y is defined by the level argument; see Examples.
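The definition can be evaluated by hand in base R. The sketch below assumes the usual point-biserial form (difference of the two group means, times the square root of \pi (1 - \pi), divided by S_x) and assumes that S_x is the n - 1 standard deviation returned by sd(). The helper name point_biserial_sketch and the data are hypothetical; this is not the package's own code.

```r
# Illustrative helper (hypothetical, not part of any package).
point_biserial_sketch <- function(x, y, level = 1) {
  y <- as.factor(y)
  stopifnot(is.numeric(x), nlevels(y) == 2, length(x) == length(y))
  in_ref <- y == levels(y)[level]                  # TRUE for the reference level
  mean_diff <- mean(x[in_ref]) - mean(x[!in_ref])  # difference of group means
  prop <- mean(in_ref)                             # proportion in reference level
  mean_diff * sqrt(prop * (1 - prop)) / sd(x)      # assumes S_x = sd(x)
}

# Made-up scores (x) and a 0/1 item (y).
x <- c(2, 4, 3, 7, 8, 9)
y <- c(0, 0, 0, 1, 1, 1)

r_first  <- point_biserial_sketch(x, y, level = 1)  # reference level "0"
r_second <- point_biserial_sketch(x, y, level = 2)  # reference level "1"

# Under the sd() assumption, the value equals the Pearson correlation
# with the 0/1 indicator, scaled by sqrt((n - 1) / n).
n <- length(x)
r_pearson_scaled <- cor(x, y) * sqrt((n - 1) / n)
```

With these data the two calls return values of equal magnitude and opposite sign, and r_second agrees with r_pearson_scaled up to floating-point error.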
the (numeric) value of the point-biserial correlation.
Changing the order of the levels of y swaps the two group means in the definition and therefore reverses the sign of the result. By default, the first level is used as the reference level.
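Because the result depends on which level comes first, it can help to check the level order before computing the correlation. The base R sketch below uses a made-up answers vector to show how the order of factor levels is determined and how it can be changed; it does not call biserial.cor().

```r
# Hypothetical dichotomous responses.
answers <- c("yes", "no", "yes", "yes", "no")

# By default, factor() sorts the unique values, so "no" is the first level.
y_default <- factor(answers)
levels(y_default)

# Listing the levels explicitly makes "yes" the first level.
y_reordered <- factor(answers, levels = c("yes", "no"))
levels(y_reordered)

# relevel() moves a chosen level to the front of an existing factor.
y_releveled <- relevel(y_default, ref = "yes")
levels(y_releveled)
```

Reordering the levels in this way should have the same effect as leaving the factor unchanged and passing level = 2.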
# the point-biserial correlation between
# the total score and the first item, using
# '0' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]])

# and using '1' as the reference level
biserial.cor(rowSums(LSAT), LSAT[[1]], level = 2)