ENH: inter_rater.fleiss_kappa p-values and confidence interval #9207

Open
josef-pkt opened this issue Apr 14, 2024 · 1 comment · May be fixed by #9241

@josef-pkt
Member

https://stackoverflow.com/questions/78323943/statistic-values-of-fleiss-kappa-using-statsmodels-stats-inter-rater/78324041#78324041

Note that our fleiss_kappa also includes Randolph's kappa, i.e. we would need p-values for that variant as well.

(This needs a reference; I have not looked at it in a long time.)
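
For reference, a minimal sketch of the two variants exposed by the existing method argument of fleiss_kappa (the count table here is made up for illustration; rows are subjects, columns are categories):

import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# made-up table: 4 subjects, 6 raters each, 3 categories
table = np.array([
    [4, 1, 1],
    [0, 6, 0],
    [2, 2, 2],
    [1, 0, 5],
])

# same table, two definitions of chance agreement
kappa_fleiss = fleiss_kappa(table, method="fleiss")      # marginal category frequencies
kappa_randolph = fleiss_kappa(table, method="randolph")  # uniform category distribution
print(kappa_fleiss, kappa_randolph)

Presumably any p-value machinery would need a variance matched to each definition of chance agreement.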

Copied from the answer:

import numpy as np
import pandas as pd
from statsmodels.stats.inter_rater import fleiss_kappa
from scipy.stats import norm

np.random.seed(42)

# simulate 15 items, each rated by 30 raters into one of 3 categories
data = {
    f'Item{i+1}': np.random.choice([0, 1, 2], size=30, p=[0.33, 0.33, 0.34]) for i in range(15)
}
df = pd.DataFrame(data)

# count table: one row per item, one column per category; each row sums to 30
formatted_data = {
    f"Category {cat}": [(df[item] == cat).sum() for item in df] for cat in range(3)
}
formatted_df = pd.DataFrame(formatted_data)

kappa = fleiss_kappa(formatted_df.values)

# note: axis=1 gives per-item totals (each equal to 30); category totals would be axis=0
category_totals = formatted_df.sum(axis=1)
p = np.sum((category_totals / (30 * 15))**2)  # sum of squared proportions

n = 15      # number of items
k = 3       # number of categories
N = n * 30  # total number of ratings

# ad hoc large-sample variance from the answer (unreferenced, see the note above)
variance = (1 / (N * (n - 1))) * (N * p * (1 - p) + (n * (k - 1) * (p - (1 / k)**2)))
if variance > 0:
    z_value = kappa / np.sqrt(variance)            # Wald-type z statistic
    p_value = 2 * (1 - norm.cdf(np.abs(z_value)))  # two-sided p-value
    z_critical = norm.ppf(0.975)
    margin_of_error = z_critical * np.sqrt(variance)
    lower_bound = kappa - margin_of_error
    upper_bound = kappa + margin_of_error

    print("Fleiss' kappa:", kappa)
    print("Z-value:", z_value)
    print("P-value:", p_value)
    print("Confidence interval (95%):", (lower_bound, upper_bound))
else:
    print("Variance calculation error: Non-positive variance", variance)
Output:

Fleiss' kappa: -0.008536683290635389
Z-value: -0.1312124600755962
P-value: 0.8956072394628303
Confidence interval (95%): (-0.13605194965657783, 0.11897858307530704)
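
Not part of the answer above, but a resampling alternative that sidesteps the unreferenced variance formula: bootstrap over subjects, i.e. resample rows of the count table with replacement and take percentile bounds of the recomputed kappas. A sketch, with bootstrap_kappa_ci a hypothetical helper name:

import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

def bootstrap_kappa_ci(table, n_boot=2000, alpha=0.05, seed=0):
    # percentile bootstrap CI: resample subjects (rows) with replacement
    rng = np.random.default_rng(seed)
    n = table.shape[0]
    kappas = np.array([
        fleiss_kappa(table[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    return np.percentile(kappas, [100 * alpha / 2, 100 * (1 - alpha / 2)])

print(bootstrap_kappa_ci(formatted_df.values))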
@jseabold
Member

jseabold commented May 8, 2024

I needed this today as well, coincidentally, so I coded something up based on Equation 3 of Fleiss, Nee, and Landis (1979), "Large sample variance of kappa in the case of different sets of raters" (a paper that itself says don't do it). This is what Stata uses. If the number of raters is not the same for each subject, Stata does not produce anything for inference.

import numpy as np

def fleiss_standard_error(table):
    # large-sample standard error of Fleiss' kappa under H0 (no agreement),
    # Equation 3 of Fleiss, Nee, and Landis (1979); requires the same
    # number of ratings for every subject
    n, k = table.shape  # n_subjects, n_choices
    m = table.sum(axis=1)[0]  # assume all subjects have the same ratings count
    p_bar = table.sum(axis=0) / (n * m)  # overall category proportions
    q_bar = 1 - p_bar

    return (
        (2 ** .5 / (p_bar.dot(q_bar) * np.sqrt(n * m * (m - 1))))
        * (
            (p_bar.dot(q_bar) ** 2) - np.sum(p_bar * q_bar * (q_bar - p_bar))
        ) ** .5
    )

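For usage, a sketch of the normal-theory test this standard error supports. Since the SE is derived under H0: kappa = 0, it is better suited to a significance test than to a confidence interval; the count table here is made up:

import numpy as np
from scipy.stats import norm
from statsmodels.stats.inter_rater import fleiss_kappa

table = np.array([[4, 1, 1], [0, 6, 0], [2, 2, 2], [1, 0, 5]])
kappa = fleiss_kappa(table)
se = fleiss_standard_error(table)
z = kappa / se                 # Wald z statistic under H0: kappa = 0
p_value = 2 * norm.sf(abs(z))  # two-sided; Stata's kap reports a one-sided Prob > Z
print(kappa, se, z, p_value)
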
jseabold added a commit to jseabold/statsmodels that referenced this issue on May 9, 2024.

jseabold linked pull request #9241 on May 9, 2024 that will close this issue.