Module stikpetP.effect_sizes.eff_size_odds_ratio
import math
from statistics import NormalDist
import pandas as pd
from ..other.table_cross import tab_cross
def es_odds_ratio(field1, field2, categories1=None, categories2=None):
    '''
    Odds Ratio
    ----------
    Determines the odds ratio from a 2x2 table.

    Odds are sometimes reported as 'a one in five odds', and sometimes as 1 : 4. The latter notation is seen less often; it means that for every one event on the left side, there will be four on the right side.

    The odds are the ratio of the probability that something will happen to the probability that it will not. For the Odds Ratio, we compare the odds of the first category with the odds of the second category.

    If the result is 1, it indicates that one variable has no influence on the other. A result higher than 1 indicates the odds are higher for the first category. A result lower than 1 indicates the odds are lower for the first category.

    Parameters
    ----------
    field1 : pandas series
        data with categories for the rows
    field2 : pandas series
        data with categories for the columns
    categories1 : list or dictionary, optional
        the two categories to use from field1. If not set, the first two found will be used
    categories2 : list or dictionary, optional
        the two categories to use from field2. If not set, the first two found will be used

    Returns
    -------
    A dataframe with:

    * *OR*, the odds ratio
    * *n*, the sample size
    * *statistic*, the test statistic (z-value)
    * *p-value*, the significance (p-value)

    Notes
    -----
    The formula used is (Fisher, 1935, p. 50):
    $$OR = \\frac{a/c}{b/d} = \\frac{a\\times d}{b\\times c}$$

    As for the test (McHugh, 2009, p. 123):
    $$sig. = 2\\times\\left(1 - \\Phi\\left(\\left|z\\right|\\right)\\right)$$

    With:
    $$SE = \\sqrt{\\frac{1}{a} + \\frac{1}{b} + \\frac{1}{c} + \\frac{1}{d}}$$
    $$z = \\frac{\\ln{\\left(OR\\right)}}{SE}$$

    *Symbols used:*

    * \\(a\\) the count in the top-left cell of the cross table
    * \\(b\\) the count in the top-right cell of the cross table
    * \\(c\\) the count in the bottom-left cell of the cross table
    * \\(d\\) the count in the bottom-right cell of the cross table
    * \\(\\Phi\\left(\\dots\\right)\\) the cumulative distribution function of the standard normal distribution

    The p-value is for the null-hypothesis that the population OR is 1.

    The term Odds Ratio can for example be found in Cox (1958, p. 222).

    See Also
    --------
    stikpetP.other.thumb_odds_ratio.th_odds_ratio : rules of thumb for odds ratio
    stikpetP.other.convert_es.es_convert : to convert an odds ratio to Yule Q, Yule Y, or Cohen d.

    References
    ----------
    Cox, D. R. (1958). The regression analysis of binary sequences. *Journal of the Royal Statistical Society: Series B (Methodological), 20*(2), 215–232. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x

    Fisher, R. A. (1935). The logic of inductive inference. *Journal of the Royal Statistical Society, 98*(1), 39–82. https://doi.org/10.2307/2342435

    McHugh, M. (2009). The odds ratio: Calculation, usage, and interpretation. *Biochemia Medica, 19*(2), 120–126. https://doi.org/10.11613/BM.2009.011

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> es_odds_ratio(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"])
             OR    n  statistic   p-value
    0  1.750802  495    2.86455  0.004176

    '''
    # determine sample cross table
    tab = tab_cross(field1, field2, order1=categories1, order2=categories2, percent=None, totals="exclude")

    # cell values of sample cross table
    a = tab.iloc[0,0]
    b = tab.iloc[0,1]
    c = tab.iloc[1,0]
    d = tab.iloc[1,1]

    # odds ratio
    oddsRatio = a*d/(b*c)

    # significance: z-test on ln(OR), two-sided p-value
    se = (1/a + 1/b + 1/c + 1/d)**0.5
    z = math.log(oddsRatio)/se
    pValue = 2 * (1 - NormalDist().cdf(abs(z)))

    # sample size is the sum of the four cells
    n = a + b + c + d

    # the results
    colNames = ["OR", "n", "statistic", "p-value"]
    results = pd.DataFrame([[oddsRatio, n, z, pValue]], columns=colNames)

    return results
Functions
def es_odds_ratio(field1, field2, categories1=None, categories2=None)
Odds Ratio
Determines the odds ratio from a 2x2 table.
Odds are sometimes reported as 'a one in five odds', and sometimes as 1 : 4. The latter notation is seen less often; it means that for every one event on the left side, there will be four on the right side.
The odds are the ratio of the probability that something will happen to the probability that it will not. For the Odds Ratio, we compare the odds of the first category with the odds of the second category.
If the result is 1, it indicates that one variable has no influence on the other. A result higher than 1 indicates the odds are higher for the first category. A result lower than 1 indicates the odds are lower for the first category.
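As a small illustration of the definitions above, the sketch below computes the odds for each row of a 2x2 table and then their ratio. The counts and the helper name `odds` are made up for this example and are not part of stikpetP.

```python
def odds(events, non_events):
    """Odds of an event: number of events per non-event."""
    return events / non_events

# 'a one in five odds' means 1 event against 4 non-events, i.e. 1 : 4
one_in_five = odds(1, 4)

# hypothetical 2x2 counts (assumed for illustration only)
a, b = 20, 10   # first row category
c, d = 15, 30   # second row category

odds_first = odds(a, b)                   # odds within the first category
odds_second = odds(c, d)                  # odds within the second category
odds_ratio = odds_first / odds_second     # same value as a*d / (b*c)

print(one_in_five, odds_first, odds_second, odds_ratio)
```

With these counts the ratio is above 1, so the odds are higher for the first category.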
Parameters
field1 : pandas series
- data with categories for the rows
field2 : pandas series
- data with categories for the columns
categories1 : list or dictionary, optional
- the two categories to use from field1. If not set, the first two found will be used
categories2 : list or dictionary, optional
- the two categories to use from field2. If not set, the first two found will be used
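To make the role of `categories1` and `categories2` more concrete, here is a rough sketch of how two chosen categories per variable select the four cells of the table. It uses only the standard library; the helper `two_by_two` and the data are hypothetical, and this is not how the package's `tab_cross` is implemented.

```python
from collections import Counter

def two_by_two(rows, cols, row_cats, col_cats):
    """Count the four cells of a 2x2 table for the chosen categories.

    Observations in any other category are simply ignored.
    """
    counts = Counter(zip(rows, cols))
    return [[counts[(r, c)] for c in col_cats] for r in row_cats]

# hypothetical data (assumed for illustration only)
marital = ["WIDOWED", "DIVORCED", "MARRIED", "WIDOWED", "DIVORCED", "WIDOWED"]
sex = ["FEMALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE"]

table = two_by_two(marital, sex, ["WIDOWED", "DIVORCED"], ["FEMALE", "MALE"])
(a, b), (c, d) = table   # a = top-left, b = top-right, c = bottom-left, d = bottom-right
print(table)
```

The order of the categories matters: swapping the two row categories (or the two column categories) inverts the odds ratio.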
Returns
A dataframe with:
- OR, the odds ratio
- n, the sample size
- statistic, the test statistic (z-value)
- p-value, the significance (p-value)
Notes
The formula used is (Fisher, 1935, p. 50):
$$OR = \frac{a/c}{b/d} = \frac{a\times d}{b\times c}$$
Symbols used:
- a the count in the top-left cell of the cross table
- b the count in the top-right cell of the cross table
- c the count in the bottom-left cell of the cross table
- d the count in the bottom-right cell of the cross table
- \(\Phi\left(\dots\right)\) the cumulative distribution function of the standard normal distribution
As for the test (McHugh, 2009, p. 123):
$$sig. = 2\times\left(1 - \Phi\left(\left|z\right|\right)\right)$$
With:
$$SE = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
$$z = \frac{\ln{\left(OR\right)}}{SE}$$
The p-value is for the null-hypothesis that the population OR is 1.
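The test formulas above can be followed step by step with the standard library alone. The sketch below mirrors them for four cell counts given directly; the function name `odds_ratio_test` and the counts are assumptions for illustration, and no zero cells are handled.

```python
import math
from statistics import NormalDist

def odds_ratio_test(a, b, c, d):
    """Return (OR, z, two-sided p-value) for a 2x2 table with cells a, b, c, d."""
    odds_ratio = a * d / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE as in the formula above
    z = math.log(odds_ratio) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return odds_ratio, z, p_value

# hypothetical counts (assumed for illustration only)
or_, z, p = odds_ratio_test(20, 10, 15, 30)
print(or_, z, p)

# when all odds are equal, OR = 1, so z = 0 and the p-value is 1
or_null, z_null, p_null = odds_ratio_test(10, 10, 10, 10)
print(or_null, z_null, p_null)
```

Because the test works on the natural logarithm of the odds ratio, an odds ratio of 4 and one of 1/4 give the same absolute z-value and therefore the same p-value.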
The term Odds Ratio can for example be found in Cox (1958, p. 222).
See Also
th_odds_ratio()
- rules of thumb for odds ratio
stikpetP.other.convert_es.es_convert
- to convert an odds ratio to Yule Q, Yule Y, or Cohen d.
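For orientation, the two Yule measures mentioned here can be written directly in terms of the odds ratio using their usual textbook definitions. The sketch below does not call `es_convert` (its signature is not shown on this page); the function names `yule_q` and `yule_y` are hypothetical, and the package may offer additional options or variants.

```python
import math

def yule_q(odds_ratio):
    """Yule Q = (OR - 1) / (OR + 1), equivalently (ad - bc) / (ad + bc)."""
    return (odds_ratio - 1) / (odds_ratio + 1)

def yule_y(odds_ratio):
    """Yule Y = (sqrt(OR) - 1) / (sqrt(OR) + 1)."""
    root = math.sqrt(odds_ratio)
    return (root - 1) / (root + 1)

# hypothetical odds ratio (assumed for illustration only)
q = yule_q(4.0)
y = yule_y(4.0)
print(q, y)
```

Both measures are 0 when the odds ratio is 1 and stay between -1 and 1, which can make them easier to read than the odds ratio itself.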
References
Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2), 215–232. https://doi.org/10.1111/j.2517-6161.1958.tb00292.x
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39–82. https://doi.org/10.2307/2342435
McHugh, M. (2009). The odds ratio: Calculation, usage, and interpretation. Biochemia Medica, 19(2), 120–126. https://doi.org/10.11613/BM.2009.011
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> es_odds_ratio(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"])
         OR    n  statistic   p-value
0  1.750802  495    2.86455  0.004176