Module stikpetP.effect_sizes.eff_size_freeman_theta
import pandas as pd
from ..other.table_cross import tab_cross

def es_freeman_theta(catField, ordField, categories=None, levels=None):
    '''
    Freeman Theta
    -------------
    According to Jacobson (1972, p. 42), this is the only measure for nominal-ordinal data, and is a modification of Somers' d.

    It can range from 0 to 1, with 0 indicating no influence of the catField on the scores of the ordField, and 1 indicating a perfect relationship.

    Alternatives could be eta-squared and epsilon-squared.

    Parameters
    ----------
    catField : pandas series
        data with categories
    ordField : pandas series
        data with the scores
    categories : list or dictionary, optional
        the categories to use from catField
    levels : list or dictionary, optional
        the levels or order used in ordField.

    Returns
    -------
    theta : float
        the Freeman Theta value

    Notes
    -----
    The formula used is (Freeman, 1965, p. 116):
    $$\\theta = \\frac{D}{T}$$

    With:
    $$D = \\sum_{x=1}^{k-1}\\sum_{y=x+1}^{k} D_{x,y}$$
    $$D_{x,y} = \\left|f_a - f_b\\right|$$
    $$f_a = \\sum_{i=1}^{n_{lvl} - 1}\\left(F_{x,i}\\times\\sum_{j=i+1}^{n_{lvl}} F_{y,j}\\right)$$
    $$f_b = \\sum_{i=2}^{n_{lvl}}\\left(F_{x,i}\\times\\sum_{j=1}^{i-1} F_{y,j}\\right)$$
    $$T = \\sum_{x=1}^{k-1}\\sum_{y=x+1}^{k} n_x \\times n_y$$

    *Symbols used:*

    * \\(F_{x,i}\\), from category x, the number of cases with level i.
    * \\(n_{lvl}\\), the number of levels.
    * \\(k\\), the number of categories.
    * \\(n_x\\), the total number of cases from category x.

    References
    ----------
    Freeman, L. C. (1965). *Elementary applied statistics: For students in behavioral science*. Wiley.

    Jacobson, P. E. (1972). Applying measures of association to nominal-ordinal data. *The Pacific Sociological Review, 15*(1), 41–60. doi:10.2307/1388286

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # create the cross table (the last row and last column hold the totals)
    ct = tab_cross(catField, ordField, order1=categories, order2=levels, totals="include")

    # basic counts: number of categories and number of levels, excluding totals
    k = ct.shape[0]-1
    nlvl = ct.shape[1]-1

    d = 0
    t = 0
    # loop over every pair of categories (x, y) with x < y
    for x in range(0, k - 1):
        for y in range(x + 1, k):
            # f_b: cases in x paired with cases in y that have a lower level
            fb = 0
            for i in range(1, nlvl):
                fs = 0
                for j in range(0, i):
                    fs = fs + ct.iloc[y, j]
                fb = fb + ct.iloc[x, i] * fs

            # f_a: cases in x paired with cases in y that have a higher level
            fa = 0
            for i in range(0, nlvl - 1):
                fs = 0
                for j in range(i + 1, nlvl):
                    fs = fs + ct.iloc[y, j]
                fa = fa + ct.iloc[x, i] * fs

            d = d + abs(fa - fb)
            # product of the two row totals
            t = t + ct.iloc[x, nlvl] * ct.iloc[y, nlvl]

    theta = d / t

    return theta
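The nested loops over levels can also be expressed with cumulative sums. The following is a minimal sketch of that idea, not the library's implementation: `freeman_theta_sketch` is a hypothetical helper, it builds the table with `pandas.crosstab` instead of `tab_cross`, and it assumes `levels` lists the ordinal levels from low to high.

```python
import numpy as np
import pandas as pd


def freeman_theta_sketch(cat, ord_, levels):
    """Freeman theta from two pandas Series (hypothetical helper, not part of stikpetP)."""
    # Rows are categories, columns are the ordinal levels in the given order.
    ct = pd.crosstab(cat, ord_).reindex(columns=levels, fill_value=0)
    counts = ct.to_numpy()
    totals = counts.sum(axis=1)

    cum = np.cumsum(counts, axis=1)
    # below[y, i]: cases in category y with a level lower than i.
    below = cum - counts
    # above[y, i]: cases in category y with a level higher than i.
    above = totals[:, None] - cum

    d = 0
    t = 0
    k = counts.shape[0]
    # Loop over every pair of categories (x, y) with x < y.
    for x in range(k - 1):
        for y in range(x + 1, k):
            fa = (counts[x] * above[y]).sum()
            fb = (counts[x] * below[y]).sum()
            d += abs(fa - fb)
            t += totals[x] * totals[y]
    return d / t


# Made-up illustration data: two categories, three ordered levels.
cat = pd.Series(["a", "a", "a", "b", "b", "b"])
ord_ = pd.Series(["low", "low", "mid", "mid", "high", "high"])
theta = freeman_theta_sketch(cat, ord_, ["low", "mid", "high"])
print(theta)  # 8/9, about 0.889
```

The first and last levels contribute zero to `below` and `above` respectively, so summing over all levels gives the same `fa` and `fb` as the restricted loop ranges in the source.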
Functions
def es_freeman_theta(catField, ordField, categories=None, levels=None)
Freeman Theta
According to Jacobson (1972, p. 42), this is the only measure for nominal-ordinal data, and is a modification of Somers' d.
It can range from 0 to 1, with 0 indicating no influence of the catField on the scores of the ordField, and 1 indicating a perfect relationship.
Alternatives could be eta-squared and epsilon-squared.
Parameters
catField : pandas series
- data with categories
ordField : pandas series
- data with the scores
categories : list or dictionary, optional
- the categories to use from catField
levels : list or dictionary, optional
- the levels or order used in ordField.
Returns
theta : float
- the Freeman Theta value
Notes
The formula used is (Freeman, 1965, p. 116):
$$\theta = \frac{D}{T}$$
With:
$$D = \sum_{x=1}^{k-1}\sum_{y=x+1}^{k} D_{x,y}$$
$$D_{x,y} = \left|f_a - f_b\right|$$
$$f_a = \sum_{i=1}^{n_{lvl} - 1}\left(F_{x,i}\times\sum_{j=i+1}^{n_{lvl}} F_{y,j}\right)$$
$$f_b = \sum_{i=2}^{n_{lvl}}\left(F_{x,i}\times\sum_{j=1}^{i-1} F_{y,j}\right)$$
$$T = \sum_{x=1}^{k-1}\sum_{y=x+1}^{k} n_x \times n_y$$
Symbols used:
- F_{x,i}, from category x, the number of cases with level i.
- n_{lvl}, the number of levels.
- k, the number of categories.
- n_x, the total number of cases from category x.
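As a worked illustration of these formulas, consider a made-up table with two categories a and b and three ordered levels, with counts (2, 1, 0) for a and (0, 1, 2) for b. The calculation takes T as the product of the two category totals, which is how the source code accumulates it for each pair of categories.

```latex
\begin{aligned}
F_{a} &= (2, 1, 0), \quad F_{b} = (0, 1, 2), \quad n_a = n_b = 3 \\
f_a &= 2 \times (1 + 2) + 1 \times 2 = 8 \\
f_b &= 1 \times 0 + 0 \times (0 + 1) = 0 \\
D &= \left|8 - 0\right| = 8, \quad T = n_a \times n_b = 9 \\
\theta &= \frac{8}{9} \approx 0.889
\end{aligned}
```

With only two categories there is a single pair, so D and T each consist of one term.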
References
Freeman, L. C. (1965). Elementary applied statistics: For students in behavioral science. Wiley.
Jacobson, P. E. (1972). Applying measures of association to nominal-ordinal data. The Pacific Sociological Review, 15(1), 41–60. doi:10.2307/1388286
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076