Module `stikpetP.effect_sizes.eff_size_dominance`


import pandas as pd

def es_dominance(data, levels=None, mu=None, out="dominance"):
    '''
    Dominance and a Vargha-Delaney A like effect size measure
    ---------------------------------------------------------
    
    This measure can be used with a one-sample sign test, since it does not rely on a z-value.

    This function is shown in this [YouTube video](https://youtu.be/UN5MEkH_KzM), and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/DominanceScore.html).
    
    Parameters
    ----------
    data : list or pandas Series
        the data
    levels : dictionary, optional
        mapping of each category to the numeric value to use
    mu : float, optional
        the hypothesized median. If not provided, the midrange of the data is used
    out : {"dominance", "vda"}, optional
        "dominance" (default) to show the dominance score, or "vda" to show a VDA-like measure
        
    Returns
    -------
    testResults : pandas dataframe
        with mu and the requested measure (column "dominance" or "VDA-like")
   
    Notes
    -----
    The formula used is (Mangiafico, 2016, pp. 223-224):
    $$D = p_{pos} - p_{neg}$$
    
    Where:
    $$p_i = \\frac{n_i}{n}$$
    
    *Symbols used:*
    
    * $p_{pos}$ the proportion of cases above the hypothesized median
    * $p_{neg}$ the proportion of cases below the hypothesized median
    * $n_{pos}$ the number of cases above the hypothesized median
    * $n_{neg}$ the number of cases below the hypothesized median
    * $n$ the total number of cases
    
    The dominance score ranges from -1 (all cases below the hypothesized median) to 1 (all cases above it).
    
    A Vargha-Delaney A (VDA) style effect size is calculated as (Mangiafico, 2016, pp. 223-224):
    $$VDA_{like} = \\frac{D + 1}{2}$$
    
    This ranges from 0 to 1, with 0.5 corresponding to a dominance score of 0.

    Before, After and Alternatives
    ------------------------------
    Before this measure you might want to perform one of these tests:
    
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for the One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for the One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for the Wilcoxon Signed Rank Test (One-Sample)
    
    Alternative effect size measures:
    
    * [es_common_language_os](../effect_sizes/eff_size_common_language_os.html#es_common_language_os) for the Common Language Effect Size
    * [r_rank_biserial_os](../correlations/cor_rank_biserial_os.html#r_rank_biserial_os) for the Rank-Biserial Correlation
    * [r_rosenthal](../correlations/cor_rosenthal.html#r_rosenthal) for the Rosenthal Correlation if a z-value is available
    
    References 
    ----------
    Mangiafico, S. S. (2016). Summary and analysis of extension program evaluation in R (1.20.01). Rutgers Cooperative Extension.

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df2['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> es_dominance(ex1, levels=order)
        mu  dominance
    0  3.0  -0.296296
    
    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> es_dominance(ex2)
        mu  dominance
    0  3.0   0.222222
    
    
    '''
    # convert a list to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)
        
    if levels is not None:
        pd.set_option('future.no_silent_downcasting', True)
        # replace each category by its numeric value;
        # categories not found in levels become missing
        data = data.map(levels).astype('Int8')
    else:
        data = pd.to_numeric(data)
    
    # remove missing values (including unmapped categories)
    data = data.dropna()
    
    # set hypothesized median to the midrange if not provided
    # (converted to float to avoid overflow with small integer types)
    if mu is None:
        mu = (float(data.min()) + float(data.max())) / 2
        
    # total sample size; scores equal to mu stay in n
    n = len(data)
    
    # proportions of scores above and below the hypothesized median
    pPlus = (data > mu).sum() / n
    pMin = (data < mu).sum() / n
    
    res = pPlus - pMin
    title = "dominance"
    
    if out == "vda":
        res = (res + 1) / 2
        title = "VDA-like"
    
    # prepare results
    results = pd.DataFrame([[mu, res]], columns=["mu", title])
    pd.set_option('display.max_colwidth', None)
    
    return results

Functions

def es_dominance(data, levels=None, mu=None, out='dominance')

Dominance and a Vargha-Delaney A like effect size measure

This measure can be used with a one-sample sign test, since it does not rely on a z-value.

This function is shown in this YouTube video, and the measure is also described at PeterStatistics.com.

Parameters

data : list or pandas Series: the data
levels : dictionary, optional: mapping of each category to the numeric value to use
mu : float, optional: the hypothesized median. If not provided, the midrange of the data is used
out : {"dominance", "vda"}, optional: "dominance" (default) to show the dominance score, or "vda" to show a VDA-like measure
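The effect of `levels` and the default `mu` can be illustrated with a minimal sketch, assuming plain pandas and a small hypothetical set of responses (not the StudentStatistics data). It mirrors the preparation step described above: map each category to its number, drop anything missing or not found in `levels`, and fall back to the midrange when no `mu` is given.

```python
import pandas as pd

# hypothetical ordinal responses; None and "Unsure" should end up removed
responses = pd.Series(["Disagree", "Agree", "Fully agree", None, "Agree", "Unsure"])
order = {"Fully disagree": 1, "Disagree": 2, "Neither disagree nor agree": 3,
         "Agree": 4, "Fully agree": 5}

# categories not in `order` (and missing values) become NaN, then are dropped
scores = responses.map(order).dropna()

# default hypothesized median: midrange of the observed scores
mu = (scores.min() + scores.max()) / 2

print(scores.tolist())  # [2.0, 4.0, 5.0, 4.0]
print(mu)               # 3.5
```

Note that the default is the midrange of the observed scores, not of the full scale defined in `levels`, so here it is 3.5 rather than 3.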

Returns

testResults : pandas dataframe: with mu and the requested measure (column "dominance" or "VDA-like")

Notes

The formula used is (Mangiafico, 2016, pp. 223-224): $D = p_{pos} - p_{neg}$

Where: $p_i = \frac{n_i}{n}$

Symbols used:

$p_{pos}$ the proportion of cases above the hypothesized median
$p_{neg}$ the proportion of cases below the hypothesized median
$n_{pos}$ the number of cases above the hypothesized median
$n_{neg}$ the number of cases below the hypothesized median
$n$ the total number of cases

The dominance score ranges from -1 (all cases below the hypothesized median) to 1 (all cases above it).
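The formula can be checked by hand with a plain-Python sketch (no pandas), using the numeric data from Example 2 below. `dominance_score` is a hypothetical helper name, not part of the package. Cases equal to the hypothesized median count towards $n$ but towards neither $n_{pos}$ nor $n_{neg}$.

```python
def dominance_score(scores, mu):
    """Dominance score D = p_pos - p_neg for a hypothesized median mu."""
    n = len(scores)                              # all cases, including ties with mu
    n_pos = sum(1 for x in scores if x > mu)     # cases above mu
    n_neg = sum(1 for x in scores if x < mu)     # cases below mu
    return n_pos / n - n_neg / n

ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
mu = (min(ex2) + max(ex2)) / 2   # midrange: 3.0
d = dominance_score(ex2, mu)     # (10 - 6) / 18
print(round(d, 6))               # 0.222222
```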

A Vargha-Delaney A (VDA) style effect size is calculated as (Mangiafico, 2016, pp. 223-224): $VDA_{like} = \frac{D + 1}{2}$

This ranges from 0 to 1, with 0.5 corresponding to a dominance score of 0.
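Because every case is above, below, or equal to the hypothesized median, $p_{pos} + p_{neg} + p_{tie} = 1$, where $p_{tie}$ (a symbol introduced here) is the proportion of cases equal to it. Substituting into the formula gives $VDA_{like} = p_{pos} + \frac{p_{tie}}{2}$: the proportion of cases above the median, with ties counted as half. A short sketch checking this identity with exact fractions on the Example 2 data:

```python
from fractions import Fraction

ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
mu = 3
n = len(ex2)

# exact proportions above, below and equal to mu
p_pos = Fraction(sum(x > mu for x in ex2), n)
p_neg = Fraction(sum(x < mu for x in ex2), n)
p_tie = Fraction(sum(x == mu for x in ex2), n)

d = p_pos - p_neg
vda_like = (d + 1) / 2

# both routes to the VDA-like measure agree exactly
assert vda_like == p_pos + p_tie / 2
print(round(float(vda_like), 6))  # 0.611111
```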

Before, After and Alternatives

Before this measure you might want to perform one of these tests:

ts_sign_os for the One-Sample Sign Test
ts_trinomial_os for the One-Sample Trinomial Test
ts_wilcoxon_os for the Wilcoxon Signed Rank Test (One-Sample)

Alternative effect size measures:

es_common_language_os for the Common Language Effect Size
r_rank_biserial_os for the Rank-Biserial Correlation
r_rosenthal for the Rosenthal Correlation if a z-value is available

References

Mangiafico, S. S. (2016). Summary and analysis of extension program evaluation in R (1.20.01). Rutgers Cooperative Extension.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Text Pandas Series

>>> import pandas as pd
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> es_dominance(ex1, levels=order)
    mu  dominance
0  3.0  -0.296296

Example 2: Numeric data

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> es_dominance(ex2)
    mu  dominance
0  3.0   0.222222
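Since the default `mu` is the midrange rather than a substantively chosen value, the score can shift noticeably with the choice of `mu`. A minimal sketch, reusing the Example 2 data and the same counting rule as the formula above; the `mu` values are arbitrary illustrations, not recommendations:

```python
ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
n = len(ex2)

results = {}
for mu in (2.5, 3.0, 3.5, 4.0):
    n_pos = sum(x > mu for x in ex2)   # cases above mu
    n_neg = sum(x < mu for x in ex2)   # cases below mu
    d = (n_pos - n_neg) / n            # dominance score
    results[mu] = d
    print(f"mu={mu}: D={d:.6f}, VDA-like={(d + 1) / 2:.6f}")
```

Raising `mu` can only move cases from "above" to "equal" or from "equal" to "below", so D never increases as `mu` grows; reporting the chosen `mu` next to D, as the returned dataframe does, keeps the result interpretable.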
