Module stikpetP.effect_sizes.eff_size_phi
import pandas as pd
from ..other.table_cross import tab_cross
def es_phi(field1, field2, categories1=None, categories2=None):
    '''
    Pearson/Yule Phi Coefficient / Cole C2 / Mean Square Contingency
    -----------------------------
    After performing a chi-square test, the question of the effect size comes up. An obvious candidate for a measure of effect size is the test statistic, the \\(\\chi^2\\). One of the earliest and most often mentioned measures uses this: the phi coefficient (or mean square contingency). Both Yule (1912, p. 596) and Pearson (1900, p. 12) mention this measure, and Cole (1949, p. 415) refers to it as Cole C2. It is also the same as Cohen's w (Cohen, 1988, p. 216), although Cohen does not restrict it to 2x2 tables.

    Interestingly, it gives the same result as coding the two categories of each variable as 0 and 1 and then calculating the regular (Pearson) correlation coefficient.

    Pearson (1904, p. 6) calls the squared value (i.e. without taking the square root) the Mean Square Contingency.

    Parameters
    ----------
    field1 : pandas series
        data with categories for the rows
    field2 : pandas series
        data with categories for the columns
    categories1 : list or dictionary, optional
        the two categories to use from field1. If not set, the first two categories found are used
    categories2 : list or dictionary, optional
        the two categories to use from field2. If not set, the first two categories found are used

    Returns
    -------
    phi coefficient

    Notes
    -----
    The formula used is (Pearson, 1900, p. 12):

    $$\\phi = \\frac{a\\times d - b\\times c}{\\sqrt{R_1\\times R_2 \\times C_1 \\times C_2}}$$

    *Symbols used:*

    * \\(a\\) the count in the top-left cell of the cross table
    * \\(b\\) the count in the top-right cell of the cross table
    * \\(c\\) the count in the bottom-left cell of the cross table
    * \\(d\\) the count in the bottom-right cell of the cross table
    * \\(R_i\\) the sum of counts in the i-th row
    * \\(C_i\\) the sum of counts in the i-th column

    The formula is also sometimes expressed with a \\(\\chi^2\\) value (Pearson, 1904, p. 6; Cohen, 1988, p. 216):

    $$\\phi = \\sqrt{\\frac{\\chi^2}{n}}$$

    where \\(n\\) is the total sample size. Written this way, the result is the absolute value of \\(\\phi\\), since a square root cannot be negative.

    Note that Cohen's w does not restrict the size of the table, but uses the same formula.

    See Also
    --------
    stikpetP.other.thumb_cohen_w.th_cohen_w : rules of thumb for Cohen's w

    References
    ----------
    Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). L. Erlbaum Associates.

    Cole, L. C. (1949). The measurement of interspecific association. *Ecology, 30*(4), 411–424. https://doi.org/10.2307/1932444

    Pearson, K. (1900). Mathematical Contributions to the Theory of Evolution. VII. On the Correlation of Characters not Quantitatively Measurable. *Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character*, 195, 1–405.

    Pearson, K. (1904). *Contributions to the Mathematical Theory of Evolution. XIII. On the theory of contingency and its relation to association and normal correlation*. Dulau and Co.

    Yule, G. U. (1912). On the methods of measuring association between two attributes. *Journal of the Royal Statistical Society, 75*(6), 579–652. https://doi.org/10.2307/2340126

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> pd.set_option('display.width',1000)
    >>> pd.set_option('display.max_columns', 1000)
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> es_phi(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"])
    np.float64(0.1293456121124377)
    '''
    # determine sample cross table
    tab = tab_cross(field1, field2, order1=categories1, order2=categories2, percent=None, totals="exclude")

    # cell values of sample cross table
    a = tab.iloc[0,0]
    b = tab.iloc[0,1]
    c = tab.iloc[1,0]
    d = tab.iloc[1,1]

    # row and column totals
    R1 = a+b
    R2 = c+d
    C1 = a+c
    C2 = b+d

    phi = (a*d - b*c)/(R1*R2*C1*C2)**0.5

    return phi
Functions
def es_phi(field1, field2, categories1=None, categories2=None)
Pearson/Yule Phi Coefficient / Cole C2 / Mean Square Contingency
After performing a chi-square test, the question of the effect size comes up. An obvious candidate for a measure of effect size is the test statistic, the \(\chi^2\). One of the earliest and most often mentioned measures uses this: the phi coefficient (or mean square contingency). Both Yule (1912, p. 596) and Pearson (1900, p. 12) mention this measure, and Cole (1949, p. 415) refers to it as Cole C2. It is also the same as Cohen's w (Cohen, 1988, p. 216), although Cohen does not restrict it to 2x2 tables.
Interestingly, it gives the same result as coding the two categories of each variable as 0 and 1 and then calculating the regular (Pearson) correlation coefficient.
Pearson (1904, p. 6) calls the squared value (i.e. without taking the square root) the Mean Square Contingency.
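The equivalence with the correlation coefficient can be checked numerically. The sketch below uses made-up counts (not the GSS data) and only NumPy: it expands a 2x2 table into one pair of 0/1 codes per case and compares `np.corrcoef` with the cell-count formula given in the Notes.

```python
import numpy as np

# Hypothetical 2x2 counts (made up for illustration)
a, b, c, d = 20, 10, 5, 15

# Phi from the cell counts
R1, R2, C1, C2 = a + b, c + d, a + c, b + d
phi = (a * d - b * c) / np.sqrt(R1 * R2 * C1 * C2)

# One (x, y) pair per case: first row / first column coded 0, second coded 1
x = np.repeat([0, 0, 1, 1], [a, b, c, d])  # row membership
y = np.repeat([0, 1, 0, 1], [a, b, c, d])  # column membership

# Regular Pearson correlation of the 0/1 codes
r = np.corrcoef(x, y)[0, 1]
print(phi, r)
```

For these counts both values are about 0.408.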
Parameters
field1 : pandas series
    data with categories for the rows
field2 : pandas series
    data with categories for the columns
categories1 : list or dictionary, optional
    the two categories to use from field1. If not set, the first two categories found are used
categories2 : list or dictionary, optional
    the two categories to use from field2. If not set, the first two categories found are used
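To illustrate what the category parameters do, here is a rough, self-contained stand-in. It assumes that `tab_cross` with `order1`/`order2` keeps and orders only the listed categories. The helper `phi_from_series` is hypothetical: it uses `pd.crosstab` instead of the package's `tab_cross`, takes the first two categories in order of appearance when none are given (the package may order them differently), and accepts lists only, not dictionaries.

```python
import pandas as pd

def phi_from_series(field1, field2, categories1=None, categories2=None):
    """Hypothetical stand-in for es_phi built on pd.crosstab (lists only)."""
    data = pd.DataFrame({"f1": field1, "f2": field2}).dropna()
    # assumption: default to the first two categories in order of appearance
    if categories1 is None:
        categories1 = list(data["f1"].unique()[:2])
    if categories2 is None:
        categories2 = list(data["f2"].unique()[:2])
    # keep only cases that fall in the selected categories
    data = data[data["f1"].isin(categories1) & data["f2"].isin(categories2)]
    # 2x2 table in the requested order, empty cells filled with 0
    tab = pd.crosstab(data["f1"], data["f2"]).reindex(
        index=categories1, columns=categories2, fill_value=0)
    # Python ints avoid integer overflow in the product of the totals
    a, b, c, d = (int(v) for v in tab.to_numpy().ravel())
    return (a * d - b * c) / ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5

# Usage with made-up data
rows = pd.Series(["x"] * 30 + ["y"] * 20)
cols = pd.Series(["p"] * 20 + ["q"] * 10 + ["p"] * 5 + ["q"] * 15)
print(phi_from_series(rows, cols))
print(phi_from_series(rows, cols, categories1=["y", "x"]))
```

Reversing the order in `categories1` swaps the rows of the table, so only the sign of the result changes.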
Returns
phi coefficient
Notes
The formula used is (Pearson, 1900, p. 12):
$$\phi = \frac{a\times d - b\times c}{\sqrt{R_1\times R_2 \times C_1 \times C_2}}$$
Symbols used:
- \(a\): the count in the top-left cell of the cross table
- \(b\): the count in the top-right cell of the cross table
- \(c\): the count in the bottom-left cell of the cross table
- \(d\): the count in the bottom-right cell of the cross table
- \(R_i\): the sum of counts in the i-th row
- \(C_i\): the sum of counts in the i-th column
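Plugging hypothetical counts into this formula also shows that the sign of \(\phi\) depends on which category is placed in the first row. Swapping the two rows leaves \(R_1 \times R_2 \times C_1 \times C_2\) unchanged but negates \(a \times d - b \times c\). A minimal sketch with a hypothetical helper `phi_2x2`:

```python
def phi_2x2(a, b, c, d):
    """Phi for a 2x2 table with cells a (top-left), b (top-right),
    c (bottom-left) and d (bottom-right)."""
    R1, R2 = a + b, c + d  # row totals
    C1, C2 = a + c, b + d  # column totals
    return (a * d - b * c) / (R1 * R2 * C1 * C2) ** 0.5

# Made-up counts
phi = phi_2x2(20, 10, 5, 15)
# Same data with the two rows swapped (a <-> c, b <-> d)
phi_swapped = phi_2x2(5, 15, 20, 10)
print(phi, phi_swapped)
```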
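The formula is also sometimes expressed with a \(\chi^2\) value (Pearson, 1904, p. 6; Cohen, 1988, p. 216):
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
where \(n\) is the total sample size. Written this way, the result is the absolute value of \(\phi\), since a square root cannot be negative.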
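A small check of the \(\chi^2\) form, with the Pearson chi-square (without continuity correction) computed by hand for made-up counts. It also shows that \(\chi^2 / n\) equals \(\phi^2\), the mean square contingency. The helper `chi2_2x2` is hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, n

a, b, c, d = 20, 10, 5, 15
chi2, n = chi2_2x2(a, b, c, d)

phi_from_chi2 = (chi2 / n) ** 0.5  # always >= 0
phi_from_cells = (a * d - b * c) / ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5
mean_square_contingency = chi2 / n  # equals phi squared
print(chi2, phi_from_chi2, phi_from_cells)
```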
Note that Cohen's w does not restrict the size of the table, but uses the same formula.
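For tables larger than 2x2, the same \(\sqrt{\chi^2/n}\) formula gives Cohen's w. A minimal sketch with a hypothetical helper `cohen_w` that takes a list of rows of counts; it uses the chi-square without continuity correction and assumes all row and column totals are positive.

```python
def cohen_w(table):
    """Cohen's w = sqrt(chi2 / n) for an r x c table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    return (chi2 / n) ** 0.5

# Made-up 2x3 table
w = cohen_w([[10, 20, 30], [20, 20, 20]])
# For a 2x2 table the result coincides with the absolute value of phi
w22 = cohen_w([[20, 10], [5, 15]])
print(w, w22)
```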
See Also
th_cohen_w() : rules of thumb for Cohen's w
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.
Cole, L. C. (1949). The measurement of interspecific association. Ecology, 30(4), 411–424. https://doi.org/10.2307/1932444
Pearson, K. (1900). Mathematical Contributions to the Theory of Evolution. VII. On the Correlation of Characters not Quantitatively Measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1–405.
Pearson, K. (1904). Contributions to the Mathematical Theory of Evolution. XIII. On the theory of contingency and its relation to association and normal correlation. Dulau and Co.
Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75(6), 579–652. https://doi.org/10.2307/2340126
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> es_phi(df1['mar1'], df1['sex'], categories1=["WIDOWED", "DIVORCED"])
np.float64(0.1293456121124377)