Module stikpetP.correlations.cor_kendall_tau

Source code
import pandas as pd
from statistics import NormalDist
from ..other.table_cross import tab_cross
from ..distributions.dist_kendall_tau import di_kendall_tau

def r_kendall_tau(ordField1, ordField2, levels1=None, levels2=None, version="b", test="kendall-appr", cc=False, useRanks=False):
    '''
    Kendall Tau
    -----------
    A rank correlation coefficient. It ranges from -1 (perfect negative association) to 1 (perfect positive association). A zero would indicate no correlation at all.
    
    A positive correlation indicates that if someone scored high on the first field, they also likely score high on the second, while a negative correlation would indicate a high score on the first would give a low score on the second.
    
    Alternatives to Kendall Tau are Goodman-Kruskal Gamma, Stuart-Kendall Tau c and Somers D; Spearman rho could also be considered.
    
    Kendall Tau b looks at so-called concordant and discordant pairs but, unlike Gamma, does not ignore tied pairs. Stuart-Kendall Tau c does the same, but also takes the size of the table into consideration. Somers d only corrects for tied pairs in one of the two directions. Spearman rho is more of a variation on the Pearson correlation, applied to ranks. See Göktaş and İşçi (2011) for a comparison of these measures.
    
    In the absence of ties, Kendall Tau a gives the same result as Goodman-Kruskal Gamma. See r_stuart_tau() for Stuart-Kendall Tau c.
    
    Parameters
    ----------
    ordField1 : pandas series
        the ordinal or scale scores of the first variable
    ordField2 : pandas series
        the ordinal or scale scores of the second variable
    levels1 : list or dictionary, optional
        the categories to use from ordField1
    levels2 : list or dictionary, optional
        the categories to use from ordField2
    version : {"b", "a"}, optional
        which Tau to determine
    test : {"bb", "kendall-appr", "as71", "kendall-exact"}, optional
        which test to use. Only applies if version="b"
    cc : boolean, optional
        use of continuity correction. Default is False
    useRanks : boolean, optional
        rank the data first or not. Default is False
        
    Returns
    -------
    A dataframe with:
    
    * *tau*, the tau value
    * *statistic*, the test statistic (z-value)
    * *p-value*, the p-value (significance)
    * *test*, description of the test used
    
    Notes
    -----
    
    The formula used for tau-a (Kendall, 1938, p. 82):
    $$\\tau_a = \\frac{P-Q}{n\\times\\left(n - 1\\right)}$$
    
    The formula for tau-b (Kendall, 1945, p. 243):
    $$\\tau_b = \\frac{P-Q}{\\sqrt{D_r\\times D_c}}$$
    
    An alternative formula for tau-b is actually used in the function (Kendall, 1962, p. 35; Kendall & Gibbons, 1990, p. 41; Schaeffer & Levitt, 1956, p. 339):
    $$\\tau_b = \\frac{n_c - n_d}{\\sqrt{\\left(n_0 - n_1\\right)\\times\\left(n_0 - n_2\\right)}}$$
    
    With:
    $$P = \\sum_{i,j} P_{i,j}$$
    $$Q = \\sum_{i,j} Q_{i,j}$$
    $$P_{i,j} = F_{i,j}\\times C_{i,j}$$
    $$Q_{i,j} = F_{i,j}\\times D_{i,j}$$
    $$C_{i,j} = \\sum_{h<i}\\sum_{k<j} F_{h,k} + \\sum_{h>i}\\sum_{k>j} F_{h,k}$$
    $$D_{i,j} = \\sum_{h<i}\\sum_{k>j} F_{h,k} + \\sum_{h>i}\\sum_{k<j} F_{h,k}$$
    $$D_r = n^2 - \\sum_{i=1}^r RS_i^2$$
    $$D_c = n^2 - \\sum_{j=1}^c CS_j^2$$
    $$RS_i = \\sum_{j=1}^c F_{i,j}$$
    $$CS_j = \\sum_{i=1}^r F_{i,j}$$
    $$n = \\sum_{i=1}^r \\sum_{j=1}^c F_{i,j} = \\sum_{j=1}^c CS_j = \\sum_{i=1}^r RS_i$$
    $$n_c = \\frac{P}{2}$$
    $$n_d = \\frac{Q}{2}$$
    $$n_0 = \\frac{n\\times\\left(n - 1\\right)}{2}$$
    $$n_1 = \\frac{\\sum_{i=1}^r RS_i\\times\\left(RS_i -1\\right)}{2}$$
    $$n_2 = \\frac{\\sum_{j=1}^c CS_j\\times\\left(CS_j -1\\right)}{2}$$
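
    The pair counting these formulas rely on can be illustrated with a short, standalone sketch that works directly on two score lists (tau_b_pairs is an illustrative helper, not part of this module):

```python
from math import sqrt

def tau_b_pairs(x, y):
    # n_c / n_d: concordant and discordant pairs; tied pairs count for neither
    n = len(x)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[j] - x[i]) * (y[j] - y[i])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    n0 = n * (n - 1) / 2
    # n_1 / n_2: number of pairs tied on x, resp. on y
    def tie_pairs(v):
        counts = {}
        for val in v:
            counts[val] = counts.get(val, 0) + 1
        return sum(t * (t - 1) for t in counts.values()) / 2
    return (nc - nd) / sqrt((n0 - tie_pairs(x)) * (n0 - tie_pairs(y)))

# with ties: n = 5, n_c = 5, n_d = 1, n_1 = n_2 = 2
print(tau_b_pairs([1, 1, 2, 2, 3], [1, 2, 1, 2, 3]))  # 0.5
```

    Without any ties the tie corrections vanish and the result coincides with tau a.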
    
    The significance (p-value) is calculated for tau a with (Kendall, 1938, p. 82; Kendall, 1962, p. 51; Schaeffer & Levitt, 1956, p. 341):
    $$z_{\\tau_a} = \\frac{\\tau_a}{\\sqrt{\\frac{\\sigma_{\\tau_a}^2}{n}}}$$
    $$\\sigma_{\\tau_a}^2 = \\frac{4\\times n + 10}{9\\times n\\times\\left(n-1\\right)}$$
    $$sig. = 2\\times\\left(1 - \\Phi\\left(\\left|z_{\\tau_a}\\right|\\right)\\right)$$
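
    As a standalone sketch of this documented approximation (z_tau_a is an illustrative name, not a package function):

```python
from math import sqrt
from statistics import NormalDist

def z_tau_a(tau_a, n):
    # sigma^2 per the formula above, then the two-sided p-value
    var = (4 * n + 10) / (9 * n * (n - 1))
    z = tau_a / sqrt(var / n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_tau_a(0.6, 5)  # z ≈ 3.29, p ≈ 0.001
```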
    
    The significance (p-value) for tau b, when test="bb", uses the approximation from Brown and Benedetti (1977, p. 311):
    $$z_{\\tau_b} = \\frac{\\tau_b}{ASE_0}$$
    $$ASE_0 = 2\\times\\sqrt{\\frac{\\sum_{i=1}^r \\sum_{j=1}^c F_{i,j}\\times\\left(C_{i,j} - D_{i,j}\\right)^2 - \\frac{\\left(P-Q\\right)^2}{n}}{\\left(n^2-\\sum_{i=1}^r RS_i^2\\right)\\times\\left(n^2-\\sum_{j=1}^c CS_j^2\\right)}}$$
    $$sig. = 2\\times\\left(1 - \\Phi\\left(\\left|z_{\\tau_b}\\right|\\right)\\right)$$
    
    The continuity correction can then also be applied using Schaeffer and Levitt (1956, p. 342):
    $$\\tau_b^* = \\left|\\tau_b\\right| - \\frac{2}{n\\times\\left(n - 1\\right)}$$
    
    Kendall (Kendall, 1962, p. 55; Kendall & Gibbons, 1990, p. 66) also gives an approximation:
    $$z_{\\tau_b} = \\frac{n_c - n_d}{\\sqrt{s^2}}$$
    $$s^2 = \\frac{\\alpha_1 - \\alpha_2 - \\alpha_3}{18} + \\frac{\\alpha_5 \\times \\alpha_6}{\\alpha_4}+\\frac{\\alpha_8 \\times \\alpha_9}{\\alpha_7}$$
    $$\\alpha_1 = n\\times\\left(n-1\\right)\\times\\left(2\\times n+5\\right)$$
    $$\\alpha_2 = \\sum_{i=1}^r RS_i\\times\\left(RS_i-1\\right)\\times\\left(2\\times RS_i+5\\right)$$
    $$\\alpha_3 = \\sum_{j=1}^c CS_j\\times\\left(CS_j-1\\right)\\times\\left(2\\times CS_j+5\\right)$$
    $$\\alpha_4 = 9\\times n\\times\\left(n-1\\right)\\times\\left(n-2\\right)$$
    $$\\alpha_5 = \\sum_{i=1}^r RS_i\\times\\left(RS_i-1\\right)\\times\\left(RS_i-2\\right)$$ 
    $$\\alpha_6 = \\sum_{j=1}^c CS_j\\times\\left(CS_j-1\\right)\\times\\left(CS_j-2\\right)$$
    $$\\alpha_7 = 2\\times n\\times\\left(n-1\\right)$$
    $$\\alpha_8 = \\sum_{i=1}^r RS_i\\times\\left(RS_i-1\\right)$$ 
    $$\\alpha_9 = \\sum_{j=1}^c CS_j\\times\\left(CS_j-1\\right)$$
    
    If the continuity correction is used, 1 is subtracted from \\(\\left|n_c - n_d\\right|\\) before dividing by \\(\\sqrt{s^2}\\).
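
    Since the \\(\\alpha\\) terms depend only on \\(n\\) and the marginal totals, \\(s^2\\) can be sketched as (var_S is an illustrative name):

```python
def var_S(n, row_totals, col_totals):
    # alpha_1 ... alpha_9 as defined in the Notes; s^2 is the variance of n_c - n_d
    a1 = n * (n - 1) * (2 * n + 5)
    a2 = sum(t * (t - 1) * (2 * t + 5) for t in row_totals)
    a3 = sum(t * (t - 1) * (2 * t + 5) for t in col_totals)
    a4 = 9 * n * (n - 1) * (n - 2)
    a5 = sum(t * (t - 1) * (t - 2) for t in row_totals)
    a6 = sum(t * (t - 1) * (t - 2) for t in col_totals)
    a7 = 2 * n * (n - 1)
    a8 = sum(t * (t - 1) for t in row_totals)
    a9 = sum(t * (t - 1) for t in col_totals)
    return (a1 - a2 - a3) / 18 + a5 * a6 / a4 + a8 * a9 / a7

# without any ties all correction terms vanish: s^2 = n(n-1)(2n+5)/18
print(var_S(10, [1] * 10, [1] * 10))  # 125.0
```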
    
    If test="as71" the exact Kendall Tau b distribution is used, via Algorithm AS71 from Best and Gipps (1974), adapted to Python by Stikker. If test="kendall-exact" the exact distribution as described by Kendall is used.
    
    *Symbols Used*
    
    * \\(F_{i,j}\\), the number of cases in row i, column j.
    * \\(n\\), the total sample size
    * \\(r\\), the number of rows
    * \\(c\\), the number of columns
    * \\(\\Phi\\left(\\dots\\right)\\), the cumulative distribution function of the standard normal distribution.
    
    References
    ----------
    Best, D. J., & Gipps, P. G. (1974). Algorithm AS 71: The upper tail probabilities of Kendall’s tau. *Applied Statistics, 23*(1), 98–100. doi:10.2307/2347062
    
    Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of test for correlation in two-way contingency tables. *Journal of the American Statistical Association, 72*(358), 309–315. doi:10.2307/2286793
    
    Göktaş, A., & İşçi, Ö. (2011). A comparison of the most commonly used measures of association for doubly ordered square contingency tables via simulation. *Advances in Methodology and Statistics, 8*(1). doi:10.51936/milh5641
    
    Kendall, M. G. (1938). A new measure of rank correlation. *Biometrika, 30*(1–2), 81–93. doi:10.1093/biomet/30.1-2.81
    
    Kendall, M. G. (1945). The treatment of ties in ranking problems. *Biometrika, 33*(3), 239–251. doi:10.1093/biomet/33.3.239
    
    Kendall, M. G. (1962). *Rank correlation methods* (3rd ed.). Charles Griffin.
    
    Kendall, M., & Gibbons, J. D. (1990). *Rank correlation methods* (5th ed.). Oxford University Press.
    
    Schaeffer, M. S., & Levitt, E. E. (1956). Concerning Kendall’s tau, a nonparametric correlation coefficient. *Psychological Bulletin, 53*(4), 338–346. doi:10.1037/h0045013
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076   
    
    '''
    
    ct = tab_cross(ordField1, ordField2, order1=levels1, order2=levels2)
    k1 = ct.shape[0]
    k2 = ct.shape[1]
    
    if not useRanks:
        if levels1 is not None:
            #replace row labels with numeric score
            ct = ct.reset_index(drop=True)
            
        if levels2 is not None:            
            ct.columns = [i for i in range(0, k2)]
    
    n = 0
    #per-cell concordant and discordant counts (k1 rows by k2 columns)
    conc = pd.DataFrame(0, index=range(k1), columns=range(k2))
    disc = pd.DataFrame(0, index=range(k1), columns=range(k2))
    
    for i in range(0, k1):
        for j in range(0, k2):
            for h in range(0, k1):
                for k in range(0, k2):
                    
                    if useRanks:
                        if h > i and k > j:
                            conc.iloc[i,j] = conc.iloc[i,j] + ct.iloc[h,k]
                        elif h<i and k<j:
                            conc.iloc[i,j] = conc.iloc[i,j] + ct.iloc[h,k]
                        elif h>i and k<j:
                            disc.iloc[i,j] = disc.iloc[i,j] + ct.iloc[h,k]
                        elif h<i and k>j:
                            disc.iloc[i,j] = disc.iloc[i,j] + ct.iloc[h,k]                        
                    else:
                        if ct.index[h] > ct.index[i] and ct.columns[k] > ct.columns[j]:
                            conc.iloc[i,j] = conc.iloc[i,j] + ct.iloc[h,k]
                        elif ct.index[h] < ct.index[i] and ct.columns[k] < ct.columns[j]:
                            conc.iloc[i,j] = conc.iloc[i,j] + ct.iloc[h,k]
                        elif ct.index[h] > ct.index[i] and ct.columns[k] < ct.columns[j]:
                            disc.iloc[i,j] = disc.iloc[i,j] + ct.iloc[h,k]
                        elif ct.index[h] < ct.index[i] and ct.columns[k] > ct.columns[j]:
                            disc.iloc[i,j] = disc.iloc[i,j] + ct.iloc[h,k]
            n = n + ct.iloc[i,j]
    
    ct = ct.reset_index(drop=True)
    ct.columns = [i for i in range(0, k2)]
    
    p = (ct*conc).sum().sum()
    q = (ct*disc).sum().sum()
    
    if version=="a":
        #Kendall Tau a
        tauUsed = "Kendall Tau-a"
        tau = (p - q) / (n * (n - 1))
    
        varA = (4 * n + 10) / (9 * n * (n - 1))
        seA = (varA / n)**0.5
        z = tau / seA
        pVal = 2 * (1 - NormalDist().cdf(abs(z))) 
        testUsed = "Kendall approximation"
        
    elif version=="b":
        #Kendall Tau b
        tauUsed = "Kendall Tau-b"
        n0 = n * (n - 1) / 2
        
        rs = ct.sum(axis=1)
        n1 = (rs*(rs-1)).sum()/2
        
        cs = ct.sum()
        n2 = (cs*(cs-1)).sum()/2
        
        nc = p / 2
        nd = q / 2
        
        tau = (nc - nd) / ((n0 - n1) * (n0 - n2))**0.5
        
        if test=="bb":
            #Brown and Benedetti (1977, p. 311) = version from SPSS
            ase0 = (ct*(conc-disc)**2).sum().sum()           
            dr = n**2 - (rs**2).sum()
            dc = n**2 - (cs**2).sum()            
            ase0 = 2 * ((ase0 - (p - q)**2 / n) / (dr * dc))**0.5
            if cc:
                ## Schaeffer and Levitt (1956, p. 342):
                tauTest = abs(tau) - 2 / (n * (n - 1))
            else:
                tauTest = tau
                        
            z = tauTest / ase0
            
            pVal = 2 * (1 - NormalDist().cdf(abs(z)))
            testUsed = "Brown and Benedetti approximation"
            
        elif test=="kendall-appr":
            #Kendall (1962, p. 55)
            v2P1 = (rs*(rs - 1)*(rs - 2)).sum()
            v1P1 = (rs*(rs - 1)).sum()
            vr = (rs*(rs - 1)*(2*rs + 5)).sum()
            
            v2P2 = (cs*(cs - 1)*(cs - 2)).sum()
            v1P2 = (cs*(cs - 1)).sum()
            vc = (cs*(cs - 1)*(2*cs + 5)).sum()
            
            v0 = n * (n - 1) * (2 * n + 5)
            v1 = v1P1 * v1P2 / (2 * n * (n - 1))
            v2 = v2P1 * v2P2 / (9 * n * (n - 1) * (n - 2))
            v = (v0 - vr - vc) / 18 + v1 + v2
            
            if cc:
                z = (abs(nc - nd) - 1) / (v)**0.5
            else:
                z = (nc - nd) / (v)**0.5            
            
            pVal = 2 * (1 - NormalDist().cdf(abs(z)))
            if tau < 0:
                z = -abs(z)
                        
            testUsed = "Kendall approximation"
            
        elif test== "as71":
            pVal = di_kendall_tau(n, tau, "as71", cc)
            z = None
            testUsed = "exact with AS71 algorithm"
            
        elif test== "kendall-exact":
            pVal = di_kendall_tau(n, tau, "kendall", cc)
            z = None
            testUsed = "Kendall exact"
        
    res = pd.DataFrame([[tau, z, pVal, testUsed]])
    res.columns = [tauUsed, "statistic", "p-value", "test"]
    
    return res
