Module stikpetP.effect_sizes.eff_size_freeman_theta

import pandas as pd
from ..other.table_cross import tab_cross

def es_freeman_theta(catField, ordField, categories=None, levels=None):
    '''
    Freeman Theta
    -------------
    According to Jacobson (1972, p. 42), this is the only measure for nominal-ordinal data, and it is a modification of Somers' d.
    
    It can range from 0 to 1, where 0 indicates that catField has no influence on the scores of ordField, and 1 indicates a perfect relationship.
    
    Alternatives could be eta-squared and epsilon-squared.
    
    Parameters
    ----------
    catField : pandas Series
        data with the categories
    ordField : pandas Series
        data with the scores
    categories : list or dictionary, optional
        the categories to use from catField
    levels : list or dictionary, optional
        the levels, or their order, used in ordField
        
    Returns
    -------
    theta : float
        the Freeman Theta value
        
    Notes
    -----
    The formula used is (Freeman, 1965, p. 116):
    $$\\theta = \\frac{D}{T}$$
    
    With:
    $$D = \\sum_{x=1}^{k-1}\\sum_{y=x+1}^{k} D_{x,y}$$
    $$T = \\sum_{x=1}^{k-1}\\sum_{y=x+1}^{k} n_x\\times n_y$$
    $$D_{x,y} = \\left|f_a - f_b\\right|$$
    $$f_a = \\sum_{i=1}^{n_{lvl} - 1}\\left(F_{x,i}\\times\\sum_{j=i+1}^{n_{lvl}} F_{y,j}\\right)$$
    $$f_b = \\sum_{i=2}^{n_{lvl}}\\left(F_{x,i}\\times\\sum_{j=1}^{i-1} F_{y,j}\\right)$$
    
    *Symbols used:*
    
    * \\(F_{x,i}\\), from category x, the number of cases with level i.
    * \\(n_{lvl}\\), the number of levels.
    * \\(k\\), the number of categories.
    * \\(n_x\\), the total number of cases from category x.
    
    References
    ----------
    
    Freeman, L. C. (1965). *Elementary applied statistics: For students in behavioral science*. Wiley.
    
    Jacobson, P. E. (1972). Applying measures of association to nominal-ordinal data. *The Pacific Sociological Review, 15*(1), 41–60. doi:10.2307/1388286

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    
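    # create the cross table, with the row and column totals included
    ct = tab_cross(catField, ordField, order1=categories, order2=levels, totals="include")
    
    # basic counts (the last row and the last column hold the totals)
    k = ct.shape[0] - 1     # number of categories
    nlvl = ct.shape[1] - 1  # number of ordinal levels
    
    d = 0  # running sum of |fa - fb| over all pairs of categories
    t = 0  # running sum of n_x * n_y over all pairs of categories
    for x in range(0, k - 1):
        for y in range(x + 1, k):
            # fb: pairs in which the case from x has a higher level than the case from y
            fb = 0
            for i in range(1, nlvl):
                fs = 0
                for j in range(0, i):
                    fs = fs + ct.iloc[y, j]
                fb = fb + ct.iloc[x, i] * fs
            
            # fa: pairs in which the case from x has a lower level than the case from y
            fa = 0
            for i in range(0, nlvl - 1):
                fs = 0
                for j in range(i + 1, nlvl):
                    fs = fs + ct.iloc[y, j]
                fa = fa + ct.iloc[x, i] * fs
            
            d = d + abs(fa - fb)
            # column nlvl holds the row totals, so this adds n_x * n_y
            t = t + ct.iloc[x, nlvl] * ct.iloc[y, nlvl]

    theta = d / t
    
    return theta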

Functions

def es_freeman_theta(catField, ordField, categories=None, levels=None)

Freeman Theta

According to Jacobson (1972, p. 42), this is the only measure for nominal-ordinal data, and it is a modification of Somers' d.

It can range from 0 to 1, where 0 indicates that catField has no influence on the scores of ordField, and 1 indicates a perfect relationship.

Alternatives could be eta-squared and epsilon-squared.
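
The two ends of this range can be illustrated with a minimal sketch. It does not call the library: `theta_two_groups` is a hypothetical helper, written only for this example, that applies the formula from the Notes to two lists of counts over the same ordered levels.

```python
def theta_two_groups(fx, fy):
    """Freeman's theta for two categories, given counts per ordered level.

    Hypothetical helper for illustration; not part of stikpetP.
    """
    n_lvl = len(fx)
    # pairs in which the case from x has a lower level than the case from y
    fa = sum(fx[i] * sum(fy[i + 1:]) for i in range(n_lvl - 1))
    # pairs in which the case from x has a higher level than the case from y
    fb = sum(fx[i] * sum(fy[:i]) for i in range(1, n_lvl))
    # with a single pair of categories, T is the product of the two totals
    return abs(fa - fb) / (sum(fx) * sum(fy))


# identical distributions: the category says nothing about the level
same = theta_two_groups([5, 10, 5], [5, 10, 5])

# no overlap at all: every case from x ranks below every case from y
separated = theta_two_groups([10, 10, 0, 0], [0, 0, 10, 10])

print(same, separated)
```

With identical distributions the two pair counts cancel and theta is 0; with fully separated distributions every pair is ordered the same way and theta is 1.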

Parameters

catField : pandas Series
    data with the categories
ordField : pandas Series
    data with the scores
categories : list or dictionary, optional
    the categories to use from catField
levels : list or dictionary, optional
    the levels, or their order, used in ordField

Returns

theta : float
the Freeman Theta value
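
To show how the four parameters relate to the returned value, here is a rough pure-Python sketch that goes from raw data to theta. `freeman_theta_raw` is a hypothetical stand-in, not the library function: it takes plain sequences instead of pandas Series, requires `categories` and `levels` as lists, and assumes that cases whose value is not listed are simply left out, which may differ from what `tab_cross` does.

```python
from itertools import combinations


def freeman_theta_raw(cat_values, ord_values, categories, levels):
    """Hypothetical pure-Python counterpart, for illustration only."""
    n_lvl = len(levels)
    rank = {lvl: r for r, lvl in enumerate(levels)}

    # counts[c][r] = number of cases in category c with the level of rank r
    counts = {c: [0] * n_lvl for c in categories}
    for c, v in zip(cat_values, ord_values):
        if c in counts and v in rank:  # assumption: unlisted values are skipped
            counts[c][rank[v]] += 1

    d = 0  # sum of |fa - fb| over all pairs of categories
    t = 0  # sum of n_x * n_y over all pairs of categories
    for x, y in combinations(categories, 2):
        fx, fy = counts[x], counts[y]
        fa = sum(fx[i] * sum(fy[i + 1:]) for i in range(n_lvl - 1))
        fb = sum(fx[i] * sum(fy[:i]) for i in range(1, n_lvl))
        d += abs(fa - fb)
        t += sum(fx) * sum(fy)
    return d / t


# made-up example data
group = ["a", "a", "a", "b", "b", "b", "c", "c"]
score = ["low", "low", "mid", "mid", "high", "high", "low", "high"]

theta = freeman_theta_raw(group, score,
                          categories=["a", "b", "c"],
                          levels=["low", "mid", "high"])
print(round(theta, 4))
```

The order given in `levels` matters, since it decides which score counts as higher; the order in `categories` does not change the result, because each pair contributes an absolute difference.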

Notes

The formula used is (Freeman, 1965, p. 116):
$$\theta = \frac{D}{T}$$

With:
$$D = \sum_{x=1}^{k-1}\sum_{y=x+1}^{k} D_{x,y}$$
$$T = \sum_{x=1}^{k-1}\sum_{y=x+1}^{k} n_x\times n_y$$
$$D_{x,y} = \left|f_a - f_b\right|$$
$$f_a = \sum_{i=1}^{n_{lvl} - 1}\left(F_{x,i}\times\sum_{j=i+1}^{n_{lvl}} F_{y,j}\right)$$
$$f_b = \sum_{i=2}^{n_{lvl}}\left(F_{x,i}\times\sum_{j=1}^{i-1} F_{y,j}\right)$$

Symbols used:

  • F_{x,i}, from category x, the number of cases with level i.
  • n_{lvl}, the number of levels.
  • k, the number of categories.
  • n_x, the total number of cases from category x.
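
As a worked illustration with made-up counts, take two categories x and y over three ordered levels, with counts (3, 2, 1) for x and (1, 2, 3) for y. With only one pair of categories, T is the product of the two category totals, which is what the source code accumulates:

$$f_a = 3\times(2 + 3) + 2\times 3 = 21$$
$$f_b = 2\times 1 + 1\times(1 + 2) = 5$$
$$D = \left|21 - 5\right| = 16$$
$$T = 6\times 6 = 36$$
$$\theta = \frac{16}{36} \approx 0.44$$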

References

Freeman, L. C. (1965). Elementary applied statistics: For students in behavioral science. Wiley.

Jacobson, P. E. (1972). Applying measures of association to nominal-ordinal data. The Pacific Sociological Review, 15(1), 41–60. doi:10.2307/1388286

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
