R多重検定スティール・ドゥワス（Steel-Dwass）法

スティール・ドゥワス（Steel-Dwass）法とは、テューキー（Tukey）法の多重比較に対応するノンパラメトリックな多重比較です。
スティール・ドゥワス法を簡単に言うと、正規分布を仮定しない各群間を順位を用いて多重比較で調べる方法です。

Rで、スティール・ドゥワス法を使う場合は、「スティール・ドゥワス(Steel-Dwass)の方法による多重比較」のページにソースコードが紹介されています。

ここでは、このソースコードを少しばかり改変したものをご紹介します。主な対応は次のとおりです。

引数をformula形式に対応（response ~ group）
群を表すベクトルを1からの数値ベクトル以外のものに対応
返り値をデータフレームに変更

次が改変したソースコードです。標準パッケージstats内のソースコードも参考にさせていただきました。


steel.dwass.test <- function(x, ...) UseMethod("steel.dwass.test")

steel.dwass.test.default <- 
function(x, g, ...)
{
    if (is.list(x)) {
        if (length(x) < 2L)
            stop("'x' must be a list with at least 2 elements")
        if (!missing(g))
            warning("'x' is a list, so ignoring argument 'g'")
        DNAME <- deparse(substitute(x))
        x <- lapply(x, function(u) u <- u[complete.cases(u)])
        if (!all(sapply(x, is.numeric)))
            warning("some elements of 'x' are not numeric and will be coerced to numeric")
        k <- length(x)
        l <- sapply(x, "length")
        if (any(l == 0L))
            stop("all groups must contain data")
        g <- factor(rep.int(seq_len(k), l))
        x <- unlist(x)
    }
    else {
        if (length(x) != length(g))
            stop("'x' and 'g' must have the same length")
        DNAME <- paste(deparse(substitute(x)), "and",
                       deparse(substitute(g)))
        OK <- complete.cases(x, g)
        x <- x[OK]
        g <- g[OK]
        if (!all(is.finite(g)))
            stop("all group levels must be finite")
        g <- factor(g)
        k <- nlevels(g)
        if (k < 2L)
            stop("all observations are in the same group")
    }

    ## number of data in each group
    n.i <- table(g)

    ## calculate t-value
    t <- combn(levels(g), 2, function(ij) {
        i <- ij[1]
        j <- ij[2]
        ## ranking group i,j
        r <- rank(c(x[g == i], x[g == j]))                              
        ## test statistic
        R <- sum(r[1 : length(x[g == i])])                              
        ## total number of the 2 group data
        N <- n.i[i] + n.i[j]                                            
        ## expectation of test statistic
        E <- n.i[i] * (N + 1) / 2                                       
        ## variance of test statistic
        V <- n.i[i] * n.i[j] / (N * (N - 1)) * (sum(r^2) - N * (N + 1)^2 / 4) 
        ## return t-value
        return(abs(R - E) / sqrt(V))                                    
        })

    ## calculate p-value
    p <- ptukey(t * sqrt(2), length(n.i), Inf, lower.tail = FALSE)      

    df <- data.frame(
            comparison = combn(levels(g), 2, paste, collapse = ":"),
            t.value = t,
            p.value = p)
    return(df)
}

steel.dwass.test.formula <-
function(formula, data, subset, na.action, ...)
{
    if(missing(formula) || (length(formula) != 3L))
        stop("'formula' missing or incorrect")
    m <- match.call(expand.dots = FALSE)
    if(is.matrix(eval(m$data, parent.frame())))
        m$data <- as.data.frame(data)
    ## need stats:: for non-standard evaluation
    m[[1L]] <- quote(stats::model.frame)
    mf <- eval(m, parent.frame())
    if (!is.factor(mf[[2]])) {
      mf[[2]] <- factor(mf[[2]])
    }
    if(length(mf) > 2L)
        stop("'formula' should be of the form response ~ group")
    DNAME <- paste(names(mf), collapse = " by ")
    names(mf) <- NULL
    y <- do.call("steel.dwass.test", as.list(mf))
    y
}

元のソースコードと結果が一致するかを確認します。次が元のソースコードの結果です。


data <- c(
(con)    6.9, 7.5, 8.5, 8.4, 8.1, 8.7, 8.9, 8.2, 7.8, 7.3, 6.8, # 第 1 群のデータ，11 例
(con)    9.6, 9.4, 9.5, 8.5, 9.4, 9.9, 8.7, 8.1, 7.8, 8.8,      # 第 2 群のデータ，10 例
(con)    5.7, 6.4, 6.8, 7.8, 7.6, 7.0, 7.7, 7.5, 6.8, 5.9,      # 第 3 群のデータ，10 例
(con)    7.6, 8.7, 8.5, 8.5, 9.0, 9.2, 9.3, 8.0, 7.2, 7.9, 7.8  # 第 4 群のデータ，11 例
(out)    )
group <- rep(1:4, c(11, 10, 10, 11))                     # 群の識別変数
Steel.Dwass(data, group)
(out)
(out)           t           p
(out)1:2 2.680234 0.036960431
(out)1:3 2.539997 0.053980573
(out)1:4 1.282642 0.574011771
(out)2:3 3.746076 0.001031145
(out)2:4 2.046776 0.170965537
(out)3:4 3.384456 0.003976894

次が改変したソースコードの結果です。意図的に群を表すベクトルを文字列にして、formulaを用いています。


df <- data.frame(response = data, group = rep(letters[1:4], c(11, 10, 10, 11)))
steel.dwass.test(response ~ group, data = df)
(out)
(out)  comparison  t.value     p.value
(out)1        a:b 2.680234 0.036960431
(out)2        a:c 2.539997 0.053980573
(out)3        a:d 1.282642 0.574011771
(out)4        b:c 3.746076 0.001031145
(out)5        b:d 2.046776 0.170965537
(out)6        c:d 3.384456 0.003976894

結果が同様になることが確認できました。

追記

Rのバージョンが3.6.xでは動作していたが、4.1.0では次のエラーが出て、動作しない事例をご報告いただきました。
上記のコードは以下のエラーが出ないように変更したコードになります。
Rのバージョン3.6.3と4.1.0で調べてみたところ、eval()関数またはparent.frame()関数の挙動がバージョン間で異なっておりました。
ご報告いただいた方へ、この場を借りてお礼申し上げます。


teel.dwass.test.default(c(6.9, 7.5, 8.5, 8.4, 8.1, 8.7, 8.9,  でエラー: 
  all group levels must be finite

追記

関連記事