BayesFactor and simplebf provide functions and automations for categorical count or frequency data.

The BayesFactor package has some functions for performing other types of tests and returning a Bayes Factor. In this section we will briefly look at these.
A common question is whether the proportions (or frequencies) of counted things differ between samples. The example we typically learn first as biologists is Mendel’s pea data, which led to his genetic insights, like this 2x2 table for flower colour (purple or white). Note that the table holds both the counts of flower colour that were observed and the expected counts that would come from a 3:1 Mendelian segregating cross.
mendel_data
P W
observed 459 141
expected 450 150
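The expected row follows directly from the 3:1 ratio: with 459 + 141 = 600 plants in total, three quarters (450) should be purple and one quarter (150) white. A small base R sketch of that arithmetic, rebuilding the table under the hypothetical name mendel_sketch (how the original mendel_data object was created is not shown here):

```r
# Observed flower colour counts from the table above
observed <- c(P = 459, W = 141)

# Expected counts if colour segregates 3:1 (purple:white)
ratio <- c(P = 3, W = 1)
expected <- sum(observed) * ratio / sum(ratio)

# Stack into a 2 x 2 table with the same layout as mendel_data
mendel_sketch <- rbind(observed = observed, expected = expected)
print(mendel_sketch)
```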
The \(\chi^2\) test is the classical frequentist test performed to determine differences in proportions in a contingency table, and there is an equivalent Bayesian method in BayesFactor. We can run our data through the function contingencyTableBF() very easily, but it does need the data to be an R matrix object, not the more typical dataframe. We can change that easily with as.matrix(), then run the function.
The arguments are important. fixedMargin says which margin of the table (rows or columns) has totals that are treated as fixed; we set it to the margin that holds the variable of interest, which here is the columns, so we use "cols". sampleType describes the sampling plan assumed to have produced the counts (for example, whether the totals of one margin were fixed in advance), and this determines the model the Bayes factor is computed under. This is highly technical and out of scope for what we want to discuss, so I’m going to gloss over it. The function documentation has more information if you want it (?contingencyTableBF); the option used here, indepMulti, is a good one to start with.
library(BayesFactor)
# contingencyTableBF() needs a matrix, so convert the dataframe first
mendel_matrix <- as.matrix(mendel_data)
contingencyTableBF(mendel_matrix, sampleType = "indepMulti", fixedMargin = "cols")
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 0.1011097 ±0%
Against denominator:
Null, independence, a = 1
---
Bayes factor type: BFcontingencyTable, independent multinomial
The hypotheses tested in this example are fixed and simple ones. Strictly, \(H_0\) is that the proportions in the table are equal (rows and columns are independent) and \(H_1\) is that the proportions are not equal. So in effect the whole table is tested to see whether the observed counts are different to the expected counts. Here the Bayes factor of 0.101 means the data are only about 0.1 times as likely under \(H_1\) as under \(H_0\): odds of 1:0.101, or roughly 10:1 in favour of \(H_0\). So the conclusion is that the proportions are equal, that is, our observed flower colour proportions match the expected ones.
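Turning the printed Bayes factor into odds and probabilities is simple arithmetic. The reciprocal gives the evidence for \(H_0\) over \(H_1\), and posterior odds are the Bayes factor multiplied by the prior odds. The sketch below assumes both hypotheses were equally likely beforehand (prior odds of 1), which is an assumption added here, not something the analysis above specifies:

```r
bf_10 <- 0.1011097  # Bayes factor for H1 (non-independence) over H0, from the output above

# Evidence for H0 over H1 is the reciprocal
bf_01 <- 1 / bf_10
round(bf_01, 2)

# Posterior probability of H1, assuming equal prior odds (prior odds = 1)
prior_odds <- 1
posterior_odds <- bf_10 * prior_odds
p_h1 <- posterior_odds / (1 + posterior_odds)
round(p_h1, 3)
```

This gives a Bayes factor of about 9.89 for \(H_0\) over \(H_1\) and, under equal prior odds, a posterior probability for \(H_1\) of about 0.09.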
There isn’t a way to use different \(H_1\)s in the way that we did with the Bayes Factor \(t\)-test, so we can’t test the explicit hypothesis that one proportion is bigger (or smaller) than the other.
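For comparison with the classical \(\chi^2\) test mentioned at the start of this section, base R’s chisq.test() can be run on the same data. This sketch assumes mendel_data is a data frame with the row names observed and expected (it is rebuilt here so the snippet runs on its own). Because the expected row is really a model prediction rather than a second sample, a goodness-of-fit test of the observed counts against the 3:1 proportions is also shown as the more conventional classical alternative:

```r
# Mendel counts as a data frame (assumed layout: rows "observed" and "expected")
mendel_data <- data.frame(P = c(459, 450), W = c(141, 150),
                          row.names = c("observed", "expected"))
mendel_matrix <- as.matrix(mendel_data)

# Pearson's chi-squared test on the 2 x 2 table
# (Yates' continuity correction is applied by default for 2 x 2 tables)
table_test <- chisq.test(mendel_matrix)
table_test$p.value

# Goodness-of-fit alternative: observed counts against 3:1 proportions directly
gof_test <- chisq.test(mendel_matrix["observed", ], p = c(3, 1) / 4)
gof_test$statistic
gof_test$p.value
```

Both p-values are well above 0.05, which agrees with the Bayes factor’s support for \(H_0\).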
In most of our work we’ve used tidy data (or case-based data) in dataframes. The function we just learned uses a contingency table in a matrix, not a dataframe. Sometimes, too, we will want to make a contingency table just to look at it. We can make a contingency table out of a dataframe with the table() function; we just have to select the columns we want using the $ notation.
hr_df
# A tibble: 9 × 3
strain replicate score
<chr> <dbl> <dbl>
1 control 1 1
2 mild 1 3
3 deadly 1 4
4 control 2 2
5 mild 2 3
6 deadly 2 4
7 control 3 1
8 mild 3 3
9 deadly 3 3
hr_cont_table <- table(hr_df$score, hr_df$strain)
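A self-contained version of that step, rebuilding hr_df from the printout above as a plain data.frame (the original is a tibble, which table() treats the same way here). The base R xtabs() function is an alternative that takes a formula instead of $ columns and produces the same counts:

```r
# hr_df as printed above
hr_df <- data.frame(
  strain    = rep(c("control", "mild", "deadly"), times = 3),
  replicate = rep(1:3, each = 3),
  score     = c(1, 3, 4, 2, 3, 4, 1, 3, 3)
)

# Cross-tabulate score (rows) against strain (columns, sorted alphabetically)
hr_cont_table <- table(hr_df$score, hr_df$strain)
hr_cont_table

# Same counts via a formula; the dimensions are also labelled "score" and "strain"
hr_xtabs <- xtabs(~ score + strain, data = hr_df)
hr_xtabs
```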
Sometimes we’ll have a contingency table of counts that is larger than 2 x 2, i.e. we have more than two samples and more than two levels of a variable. For example, we might have this HR scoring table.
hr_table
control deadly mild
1 2 0 0
2 1 0 0
3 0 1 3
4 0 2 0
As we can see, it shows an HR score in the rows and the different strains in the columns. The numbers represent the count of times each score was seen across three replicated experiments. Because it’s a contingency table, the replicates are merged together, so it is important that the same amount of sampling was done in each strain.
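One quick way to check the equal-sampling assumption is to look at the column totals, and prop.table() with margin = 2 turns the counts into within-strain proportions, which are the quantities the hypotheses compare. A sketch with the table typed in by hand from the printout above:

```r
# HR score counts (rows = score 1-4, columns = strain), copied from hr_table above
hr_table <- matrix(c(2, 1, 0, 0,   # control
                     0, 0, 1, 2,   # deadly
                     0, 0, 3, 0),  # mild
                   nrow = 4,
                   dimnames = list(score = as.character(1:4),
                                   strain = c("control", "deadly", "mild")))

# Equal column totals suggest each strain was sampled to the same extent
colSums(hr_table)

# Proportion of each score within each strain (each column sums to 1)
hr_props <- prop.table(hr_table, margin = 2)
round(hr_props, 2)
```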
Here we want to compare the two basic hypotheses: that the proportions of observed scores are the same in every strain, or that they are not. Let’s go ahead and do that with contingencyTableBF().
contingencyTableBF(hr_table, sampleType = "indepMulti", fixedMargin = "cols")
Bayes factor analysis
--------------
[1] Non-indep. (a=1) : 11.55 ±0%
Against denominator:
Null, independence, a = 1
---
Bayes factor type: BFcontingencyTable, independent multinomial
We get a clear answer: the Bayes Factor strongly favours the hypothesis that the proportions of scores across strains are not equal. That is nice, but it doesn’t go far enough. It doesn’t tell us which are bigger than others or whether the conclusion applies to all the possible pairings of strains. This is the same problem we had with the Bayes Factor \(t\)-test, and the solution is the same: we pull out each pair of strains and compare them one pair at a time. All we need is a book-keeping method to do this. The library simplebf contains one, so let’s use that.
We can use the allpairs_proportionbf() function to get a data frame of Bayes Factors. If you pass this function a dataframe, it will make the contingency table for you. You must specify which columns to use for the group and the counts. For easy reading we’ll send the output to the knitr::kable() function.
library(simplebf)
library(magrittr)  # provides the %>% pipe
# Compare every pair of strains and tabulate the results
allpairs_proportionbf(hr_df,
                      group_col = "strain", count_col = "score",
                      sample_type = "indepMulti") %>%
  knitr::kable()
| control_group | test_group | h_0 | h_1 | BayesFactor | odds_h_1 | summary |
|---|---|---|---|---|---|---|
| control | mild | mild proportions equal to control proportions | mild proportions not equal to control proportions | 5.6000 | 1:5.6 | Substantial evidence for H_1 compared to H_0 |
| control | deadly | deadly proportions equal to control proportions | deadly proportions not equal to control proportions | 4.2000 | 1:4.2 | Substantial evidence for H_1 compared to H_0 |
| mild | deadly | deadly proportions equal to mild proportions | deadly proportions not equal to mild proportions | 2.1875 | 1:2.1875 | Anecdotal evidence for H_1 compared to H_0 |
So we get a nice set of Bayesian hypothesis tests for the proportion or contingency table data from our HR experiment.
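To see what that book-keeping involves, here is a minimal base R sketch of the pairwise idea. pairwise_subtables() is a hypothetical helper written for illustration, not simplebf’s implementation. It splits the table into every pair of strain columns and drops scores never seen in either strain of a pair, which mirrors what re-tabulating a filtered dataframe with table() would give. The Bayes factor step only runs if BayesFactor is installed:

```r
# Hypothetical helper (not part of simplebf): split a contingency table
# into every pair of columns (strains)
pairwise_subtables <- function(tab) {
  pairs <- combn(colnames(tab), 2, simplify = FALSE)
  subs <- lapply(pairs, function(p) {
    sub <- tab[, p, drop = FALSE]
    # Drop scores never seen in either strain of this pair
    sub[rowSums(sub) > 0, , drop = FALSE]
  })
  names(subs) <- vapply(pairs, paste, character(1), collapse = "_vs_")
  subs
}

# HR counts typed in from hr_table (columns: control, deadly, mild)
hr_table <- matrix(c(2, 1, 0, 0, 0, 0, 1, 2, 0, 0, 3, 0), nrow = 4,
                   dimnames = list(score = as.character(1:4),
                                   strain = c("control", "deadly", "mild")))
subs <- pairwise_subtables(hr_table)
names(subs)

# Run the same Bayes factor test on each pair, if BayesFactor is available
if (requireNamespace("BayesFactor", quietly = TRUE)) {
  bfs <- vapply(subs, function(sub) {
    bf <- BayesFactor::contingencyTableBF(sub, sampleType = "indepMulti",
                                          fixedMargin = "cols")
    BayesFactor::extractBF(bf, onlybf = TRUE)
  }, numeric(1))
  print(round(bfs, 3))
}
```

The numbers from this sketch need not match the simplebf table exactly, since the pair ordering, the handling of empty rows and any other defaults in simplebf may differ.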
The BayesFactor and simplebf packages are useful tools implementing these.