Module stikpetP.effect_sizes.eff_size_cohen_h_os
import math
import pandas as pd
def es_cohen_h_os(data, p0=0.5, p0Cat=None, codes=None):
    '''
Cohen's h'
----------
An adaptation of Cohen h for the one-sample case. It is an effect size measure that can accompany a one-sample binomial, score or Wald test.
See also https://youtu.be/ddWe94VKX_8, a video on Cohen h'.
This function is shown in this [YouTube video](https://youtu.be/CHFPfThJ4aY) and the effect size is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/CohenH.html)
Parameters
----------
data : list or pandas data series
the data
p0 : float, optional
hypothesized proportion for the first category (default is 0.5)
p0Cat : optional
the category for which p0 was used
codes : list, optional
the two codes to use
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
- *Cohen h'* : the value of Cohen h'
- *comment* : description of which category p0 was applied to.
Notes
-----
To decide which category is associated with p0, the following rules are used:
* If codes are provided, the first code is assumed to be the category for p0.
* If p0Cat is specified, that category is used for p0 and all other categories are treated as category 2. This means that if there are more than two categories, the remaining ones (besides p0Cat) are merged into one large category.
* If neither codes nor p0Cat is specified and the data do not contain exactly two categories, a warning is printed and nothing is returned.
* If neither codes nor p0Cat is specified and there are two categories, p0 is assumed to be for the category closest matching the p0 value (i.e. if p0 is above 0.5 the category with the highest count is assumed to be used for p0).
Formula used (Cohen, 1988, p. 202):
$$h'=\\phi_{1}-\\phi_{h_0}$$
With:
$$\\phi_{i}=2\\times\\text{arcsin}\\sqrt{p_{i}}$$
$$p_i = \\frac{F_i}{n}$$
$$n = \\sum_{i=1}^k F_i$$
*Symbols used*:
* $F_i$, is the (absolute) frequency (count) of category i
* $n$, is the sample size, i.e. the sum of all frequencies
* $p_i$, the proportion of cases in category i
* $p_{h_0}$, the expected proportion (i.e. the proportion according to the null hypothesis)
Before, After and Alternatives
------------------------------
Before this effect size you might first want to perform a test:
* [ts_binomial_os](../tests/test_binomial_os.html#ts_binomial_os) for a One-Sample Binomial Test
* [ts_score_os](../tests/test_score_os.html#ts_score_os) for One-Sample Score Test
* [ts_wald_os](../tests/test_wald_os.html#ts_wald_os) for One-Sample Wald Test
After this, you might want a rule-of-thumb or first convert this to a 'regular' Cohen h:
* [es_convert](../effect_sizes/convert_es.html#es_convert) to convert Cohen h' to Cohen h, use fr="cohenhos" and to="cohenh"
* [th_cohen_h](../other.thumb/cohen_h.html#th_cohen_h) for rules-of-thumb for Cohen h
Alternatives could be:
* [es_cohen_g](../effect_sizes/eff_size_cohen_g.html#es_cohen_g) for Cohen g
* [es_alt_ratio](../effect_sizes/eff_size_alt_ratio.html#es_alt_ratio) for Alternative Ratio
* [r_rosenthal](../correlations/cor_rosenthal.html#r_rosenthal) for Rosenthal Correlation if a z-value is available
References
----------
Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). L. Erlbaum Associates.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: Numeric list
>>> ex1 = [1, 1, 2, 1, 2, 1, 2, 1]
>>> es_cohen_h_os(ex1)
Cohen h' comment
0 0.25268 assuming p0 for 1
>>> es_cohen_h_os(ex1, p0=0.3)
Cohen h' comment
0 0.664197 assuming p0 for 1
Example 2: pandas Series
>>> import pandas as pd
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> es_cohen_h_os(df1['sex'])
Cohen h' comment
0 0.10251 assuming p0 for FEMALE
>>> es_cohen_h_os(df1['mar1'], codes=["DIVORCED", "NEVER MARRIED"])
Cohen h' comment
0 -0.114495 with p0 for DIVORCED
    '''
    if isinstance(data, list):
        data = pd.Series(data)

    # remove missing values
    data = data.dropna()

    # determine the count for the p0 category (n1), the other count (n2), and the total sample size (n)
    if codes is None:
        # create a frequency table
        freq = data.value_counts()

        if p0Cat is None:
            # check if there were exactly two categories or not
            if len(freq) != 2:
                # unable to determine which category p0 would belong to, so print warning and end
                print("WARNING: data does not have two unique categories, please specify two categories using codes parameter")
                return
            else:
                # simply select the two categories as cat1 and cat2
                n1 = freq.values[0]
                n2 = freq.values[1]
                n = n1 + n2

                # determine which category p0 was for
                p0_cat = freq.index[0]
                if p0 > 0.5 and n1 < n2:
                    # swap the two counts
                    n1, n2 = n2, n1
                    p0_cat = freq.index[1]

                cat_used = "assuming p0 for " + str(p0_cat)
        else:
            n = sum(freq.values)
            n1 = sum(data == p0Cat)
            n2 = n - n1
            p0_cat = p0Cat
            cat_used = "with p0 for " + str(p0Cat)
    else:
        n1 = sum(data == codes[0])
        n2 = sum(data == codes[1])
        n = n1 + n2
        cat_used = "with p0 for " + str(codes[0])

    # observed proportion and the arcsine-transformed difference
    p1 = n1 / n
    phi1 = 2 * math.asin(p1**0.5)
    phic = 2 * math.asin(p0**0.5)
    h2 = phi1 - phic

    results = pd.DataFrame([[h2, cat_used]], columns=["Cohen h'", "comment"])
    return results
Functions
def es_cohen_h_os(data, p0=0.5, p0Cat=None, codes=None)
Cohen's h'
An adaptation of Cohen h for the one-sample case. It is an effect size measure that can accompany a one-sample binomial, score or Wald test.
See also https://youtu.be/ddWe94VKX_8, a video on Cohen h'.
This function is shown in this YouTube video (https://youtu.be/CHFPfThJ4aY) and the effect size is also described at PeterStatistics.com (https://peterstatistics.com/Terms/EffectSizes/CohenH.html).
Parameters
data : list or pandas data series
- the data
p0 : float, optional
- hypothesized proportion for the first category (default is 0.5)
p0Cat : optional
- the category for which p0 was used
codes : list, optional
- the two codes to use
Returns
pandas.DataFrame
A dataframe with the following columns:
- Cohen h' : the value of Cohen h'
- comment : description of which category p0 was applied to.
Notes
To decide which category is associated with p0, the following rules are used:
- If codes are provided, the first code is assumed to be the category for p0.
- If p0Cat is specified, that category is used for p0 and all other categories are treated as category 2. This means that if there are more than two categories, the remaining ones (besides p0Cat) are merged into one large category.
- If neither codes nor p0Cat is specified and the data do not contain exactly two categories, a warning is printed and nothing is returned.
- If neither codes nor p0Cat is specified and there are two categories, p0 is assumed to be for the category closest matching the p0 value (i.e. if p0 is above 0.5 the category with the highest count is assumed to be used for p0).
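The rules above can be sketched with the standard library only. This is an illustrative restatement, not the package's code: `pick_counts` and its snake_case parameter `p0_cat` are hypothetical names, missing values are assumed to be `None`, and the two-category case assumes the frequency table is sorted by count with the largest first (as `pandas.Series.value_counts` does by default).

```python
from collections import Counter


def pick_counts(data, p0_cat=None, codes=None):
    """Return (n1, n, comment) for the p0 category, or None if ambiguous.

    Hypothetical helper for illustration; not part of stikpetP.
    """
    values = [v for v in data if v is not None]  # drop missing values

    if codes is not None:
        # first code is the p0 category; only the two codes are counted
        n1 = sum(v == codes[0] for v in values)
        n2 = sum(v == codes[1] for v in values)
        return n1, n1 + n2, f"with p0 for {codes[0]}"

    if p0_cat is not None:
        # everything that is not p0_cat is merged into one category
        n1 = sum(v == p0_cat for v in values)
        return n1, len(values), f"with p0 for {p0_cat}"

    freq = Counter(values).most_common()  # sorted by count, largest first
    if len(freq) != 2:
        # cannot tell which category p0 belongs to
        return None

    # two categories: the first (most frequent) one is taken as the p0 category
    (cat, n1), (_, n2) = freq
    return n1, n1 + n2, f"assuming p0 for {cat}"


print(pick_counts([1, 1, 2, 1, 2, 1, 2, 1]))
print(pick_counts(["a", "b", "c", "a"], p0_cat="a"))
print(pick_counts(["a", "b", "c", "a"], codes=["b", "c"]))
```

Note that with `codes` the sample size only includes the two listed categories, whereas with `p0_cat` it includes every non-missing value.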
Formula used (Cohen, 1988, p. 202):
$$h'=\phi_{1}-\phi_{h_0}$$
With:
$$\phi_{i}=2\times\text{arcsin}\sqrt{p_{i}}$$
$$p_i = \frac{F_i}{n}$$
$$n = \sum_{i=1}^k F_i$$
Symbols used:
- $F_i$, is the (absolute) frequency (count) of category i
- $n$, is the sample size, i.e. the sum of all frequencies
- $p_i$, the proportion of cases in category i
- $p_{h_0}$, the expected proportion (i.e. the proportion according to the null hypothesis)
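As a quick check of the formula, the calculation can be done by hand from two counts. The sketch below uses only the `math` module; `cohen_h_os_from_counts` is a hypothetical helper name, and the counts (5 out of 8) are those of Example 1 in this documentation.

```python
import math


def cohen_h_os_from_counts(n1, n, p0=0.5):
    """Cohen h' from the count in the p0 category (n1) and the sample size (n).

    Hypothetical helper for illustration; not part of stikpetP.
    """
    p1 = n1 / n                           # observed proportion
    phi1 = 2 * math.asin(math.sqrt(p1))   # transformed observed proportion
    phi0 = 2 * math.asin(math.sqrt(p0))   # transformed hypothesized proportion
    return phi1 - phi0


# Example 1 data: category 1 occurs 5 times out of 8
print(round(cohen_h_os_from_counts(5, 8), 5))          # 0.25268
print(round(cohen_h_os_from_counts(5, 8, p0=0.3), 6))  # 0.664197
```

Both values agree with the Example 1 output of `es_cohen_h_os`.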
Before, After and Alternatives
Before this effect size you might first want to perform a test:
- ts_binomial_os for a One-Sample Binomial Test
- ts_score_os for One-Sample Score Test
- ts_wald_os for One-Sample Wald Test
After this, you might want a rule-of-thumb or first convert this to a 'regular' Cohen h:
- es_convert to convert Cohen h' to Cohen h, use fr="cohenhos" and to="cohenh"
- th_cohen_h for rules-of-thumb for Cohen h
Alternatives could be:
- es_cohen_g for Cohen g
- es_alt_ratio for Alternative Ratio
- r_rosenthal for Rosenthal Correlation if a z-value is available
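To give a feel for the "convert, then interpret" step, here is a rough sketch that does not call `es_convert` or `th_cohen_h`. It rests on two assumptions that are not confirmed by this page: that the one-sample value is rescaled to a regular Cohen h by multiplying with the square root of 2 (the rescaling Cohen, 1988, describes for the one-sample case), and that the commonly quoted benchmarks of 0.2, 0.5 and 0.8 are used as cut-offs. Both function names and the "negligible" label are hypothetical.

```python
import math


def h_os_to_h(h_os):
    # Assumption: one-sample rescaling h = h' * sqrt(2) (Cohen, 1988);
    # es_convert may or may not implement exactly this.
    return h_os * math.sqrt(2)


def classify_h(h):
    # Assumption: benchmarks of 0.2 (small), 0.5 (medium), 0.8 (large);
    # the "negligible" label below 0.2 is this sketch's own choice.
    size = abs(h)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


h = h_os_to_h(0.25268)  # h' from Example 1
print(round(h, 5), classify_h(h))
```

The sign of h' only indicates direction, so the classification uses the absolute value.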
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: Numeric list
>>> ex1 = [1, 1, 2, 1, 2, 1, 2, 1]
>>> es_cohen_h_os(ex1)
Cohen h' comment
0 0.25268 assuming p0 for 1
>>> es_cohen_h_os(ex1, p0=0.3)
Cohen h' comment
0 0.664197 assuming p0 for 1
Example 2: pandas Series
>>> import pandas as pd
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> es_cohen_h_os(df1['sex'])
Cohen h' comment
0 0.10251 assuming p0 for FEMALE
>>> es_cohen_h_os(df1['mar1'], codes=["DIVORCED", "NEVER MARRIED"])
Cohen h' comment
0 -0.114495 with p0 for DIVORCED