Journal of Statistical Software: Articles in Volume 111

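This page lists the contents of Journal of Statistical Software Volume 111. Each entry gives the article title and its abstract; the Japanese renderings in the source listing were produced with the help of machine translation.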

1 Articles
2 References
3 Related information

Articles

clinicalsignificance: Clinical Significance Analyses of Intervention Studies in R

The analysis of clinical significance helps to decide whether an intervention leads to practically relevant or meaningful changes for individual patients, which is clearly different from the analysis of statistical significance. However, the framework is used rarely and inconsistently. We introduce the R package clinicalsignificance to facilitate the use of clinical significance analysis of intervention trials in clinical research. This package provides all relevant methods to calculate and present analyses of clinical significance in a consistent form and an easy-to-use implementation. Despite their shortcomings, clinical significance analyses are a valuable tool to gain more insight into intended and potential unintended intervention effects, and they may improve the interpretation and comparability of intervention trial results. Lastly, analyses of clinical significance may guide researchers and policy makers in determining which interventions are clinically effective.

jti and sparta: Time and Space Efficient Packages for Model-Based Prediction in Large Bayesian Networks

A Bayesian network is a multivariate (potentially very high dimensional) probabilistic model formed by combining lower-dimensional components. In Bayesian networks, the computation of conditional probabilities is fundamental for model-based predictions. This is usually done based on message passing algorithms that utilize conditional independence structures. In this paper, we deal with a specific message passing algorithm that exploits a second structure called a junction tree and hence is known as the junction tree algorithm (JTA). In Bayesian networks for discrete variables with finite state spaces, there is a fundamental problem in high dimensions: a discrete distribution is represented by a table of values, and in high dimensions, such tables can become prohibitively large. In JTA, such tables must be multiplied, which can lead to even larger tables. The jti package meets this challenge by building on the package sparta, implementing methods that efficiently handle multiplication and marginalization of sparse tables through JTA. The two packages are written in the R programming language and are freely available from the Comprehensive R Archive Network.
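
To make the idea of multiplying and marginalizing sparse tables concrete, here is a minimal Python sketch. It is not the sparta or jti API: the dictionary representation, the function names `multiply` and `marginalize`, and the toy probabilities are all assumptions made for illustration. Cells with value zero are simply not stored, so the product never materializes the full dense table.

```python
def multiply(vars_a, table_a, vars_b, table_b):
    """Multiply two sparse tables; cells missing from a table count as zero."""
    shared = [v for v in vars_a if v in vars_b]
    idx_a = [vars_a.index(v) for v in shared]
    idx_b = [vars_b.index(v) for v in shared]
    extra_b = [i for i, v in enumerate(vars_b) if v not in vars_a]
    out_vars = list(vars_a) + [vars_b[i] for i in extra_b]

    # Group the cells of table_b by the states of the shared variables.
    by_shared = {}
    for cell_b, val_b in table_b.items():
        key = tuple(cell_b[i] for i in idx_b)
        by_shared.setdefault(key, []).append((cell_b, val_b))

    # Only pairs of stored (non-zero) cells that agree on shared variables
    # can contribute to the product.
    product = {}
    for cell_a, val_a in table_a.items():
        key = tuple(cell_a[i] for i in idx_a)
        for cell_b, val_b in by_shared.get(key, []):
            cell = cell_a + tuple(cell_b[i] for i in extra_b)
            product[cell] = val_a * val_b
    return out_vars, product


def marginalize(variables, table, drop):
    """Sum a sparse table over the variables listed in `drop`."""
    keep = [i for i, v in enumerate(variables) if v not in drop]
    out = {}
    for cell, val in table.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0.0) + val
    return [variables[i] for i in keep], out


# Hypothetical toy tables: P(A) and P(B | A), with the zero cell (a1, b2) omitted.
p_a = {("a1",): 0.6, ("a2",): 0.4}
p_b_given_a = {("a1", "b1"): 1.0, ("a2", "b1"): 0.5, ("a2", "b2"): 0.5}

joint_vars, joint = multiply(["A"], p_a, ["A", "B"], p_b_given_a)
marg_vars, p_b = marginalize(joint_vars, joint, drop={"A"})
print(marg_vars, p_b)
```

The joint table stores three cells instead of four; in high dimensions with many structural zeros, this kind of saving is what keeps the tables manageable.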

GET: Global Envelopes in R

This work describes the R package GET that implements global envelopes for a general set of d-dimensional vectors T in various applications. A 100(1 – α)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to α. The term ‘global’ means that this probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be employed for central regions of functional or multivariate data, for graphical Monte Carlo and permutation tests where the test statistic is multivariate or functional, and for global confidence and prediction bands. An intrinsic graphical interpretation property is introduced for global envelopes. The global envelopes included in the GET package have this property, which particularly helps to interpret test results by providing a graphical interpretation that shows the reasons for rejection of the tested hypothesis. Examples of different uses of global envelopes and their implementation in the GET package are presented, including global envelopes for single and several one- or two-dimensional functions, Monte Carlo goodness-of-fit tests for simple and composite hypotheses, comparison of distributions, functional analysis of variance, functional linear model, and confidence bands in polynomial regression.
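
The definition of a global envelope can be illustrated with a small Python sketch. It uses one simple rank-based construction chosen here for illustration; the abstract does not say which constructions GET implements, and the function names and the simulated data are assumptions. Each vector is scored by its most extreme pointwise rank, and the envelope is drawn at the k-th smallest and k-th largest values, with k as large as possible while at most a share α of the vectors leaves the band at any point.

```python
import random


def extreme_ranks(vectors):
    """Most extreme pointwise rank of each vector (1 = most extreme)."""
    s, d = len(vectors), len(vectors[0])
    ranks = [s] * s
    for j in range(d):
        order = sorted(range(s), key=lambda i: vectors[i][j])
        for pos, i in enumerate(order):
            from_below = pos + 1   # rank counted from the smallest value
            from_above = s - pos   # rank counted from the largest value
            ranks[i] = min(ranks[i], from_below, from_above)
    return ranks


def global_rank_envelope(vectors, alpha=0.05):
    """Band such that at most alpha * s vectors leave it at any of the d points."""
    s, d = len(vectors), len(vectors[0])
    ranks = extreme_ranks(vectors)
    k = 1
    # Increase k while the share of vectors with extreme rank below k stays <= alpha.
    while k < s // 2 and sum(r < k + 1 for r in ranks) <= alpha * s:
        k += 1
    columns = [sorted(v[j] for v in vectors) for j in range(d)]
    lower = [col[k - 1] for col in columns]  # k-th smallest value at each point
    upper = [col[s - k] for col in columns]  # k-th largest value at each point
    return lower, upper, k


def falls_outside(vector, lower, upper):
    """True if the vector leaves the band at one or more points."""
    return any(x < lo or x > hi for x, lo, hi in zip(vector, lower, upper))


# Hypothetical data: 200 simulated vectors with d = 10 independent normal entries.
rng = random.Random(1)
sims = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
lower, upper, k = global_rank_envelope(sims, alpha=0.05)
share_outside = sum(falls_outside(v, lower, upper) for v in sims) / len(sims)
print(k, share_outside)
```

Because ranks are discrete, this sketch is conservative: the share of vectors outside the band is at most α rather than exactly α. A pointwise 95% band built separately at each of the 10 points would typically exclude a much larger share of whole vectors, which is the distinction the term 'global' draws.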

BEKKs: An R Package for Estimation of Conditional Volatility of Multivariate Time Series

We describe the R package BEKKs, which implements the estimation and diagnostic analysis of a prominent family of multivariate generalized autoregressive conditionally heteroskedastic (MGARCH) processes, the so-called BEKK models. Unlike existing software packages, we make use of analytical derivatives implemented in efficient C++ code for nonlinear log-likelihood optimization. This allows fast parameter estimation even in higher model dimensions N > 3. The baseline BEKK model is complemented with an asymmetric parameterization that allows for a flexible modeling of conditional (co)variances. Furthermore, we provide the user with the simplified scalar and diagonal BEKK models to deal with high dimensionality of heteroskedastic time series. The package is designed in an object-oriented way featuring a comprehensive toolbox of methods to investigate and interpret, for instance, volatility impulse response functions, risk estimation and forecasting (VaR) and a backtesting algorithm to compare the forecasting performance of alternative BEKK models. For illustrative purposes, we analyze a bivariate ETF return series (S&P, US treasury bonds) and a four-dimensional system comprising, in addition, a gold ETF and changes of a log oil price by means of the suggested package. We find that the BEKKs package is more than 100 times faster for time series systems of dimension N > 3 than other existing packages.
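
For readers unfamiliar with BEKK models, the following Python sketch shows the standard BEKK(1,1) covariance recursion, H_t = C C' + A' r_{t-1} r_{t-1}' A + G' H_{t-1} G. It is not the BEKKs package interface, and it assumes the textbook parameterization; the parameter matrices, starting covariance, and returns below are made-up values for illustration only.

```python
def matmul(X, Y):
    """Plain-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def outer(r):
    """Outer product r r' of a return vector."""
    return [[a * b for b in r] for a in r]


def bekk_step(H_prev, r_prev, C, A, G):
    """One step of the BEKK(1,1) recursion for the conditional covariance."""
    intercept = matmul(C, transpose(C))                       # C C'
    arch = matmul(matmul(transpose(A), outer(r_prev)), A)     # A' r r' A
    garch = matmul(matmul(transpose(G), H_prev), G)           # G' H G
    return add(add(intercept, arch), garch)


# Hypothetical bivariate parameters (C is lower triangular).
C = [[0.10, 0.00], [0.02, 0.10]]
A = [[0.30, 0.05], [0.00, 0.25]]
G = [[0.90, 0.00], [0.02, 0.92]]

H = [[0.05, 0.01], [0.01, 0.04]]                        # assumed starting covariance
returns = [[0.2, -0.1], [-0.3, 0.05], [0.1, 0.15]]      # made-up return vectors
for r in returns:
    H = bekk_step(H, r, C, A, G)
print(H)
```

Each term in the recursion is a quadratic form, so every H_t stays symmetric and positive definite as long as C C' is, which is the main appeal of the BEKK parameterization.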

Birth-and-Death Processes in Python: The BirDePy Package

Birth-and-death processes (BDPs) form a class of continuous-time Markov chains that are particularly suited to describing the changes in the size of a population over time. Population-size-dependent BDPs (PSDBDPs) allow the rate at which a population grows to depend on the current population size. The main purpose of our new Python package BirDePy is to provide easy-to-use functions that allow the parameters of discretely-observed PSDBDPs to be estimated. The package can also be used to estimate parameters of continuously-observed PSDBDPs, simulate sample paths, approximate transition probabilities, and generate forecasts. We describe in detail several methods which have been incorporated into BirDePy to achieve each of these tasks. The usage and effectiveness of the package are demonstrated through a variety of examples of PSDBDPs, as well as case studies involving annual population count data of two endangered bird species.
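
As a rough illustration of two of the tasks named above, simulating a sample path and estimating parameters from a continuously observed path, here is a standard-library Python sketch. It does not use BirDePy and makes no claim about its API. It assumes the simplest case, a linear BDP in which each individual gives birth at rate `birth_rate` and dies at rate `death_rate`; all function names and numbers are hypothetical.

```python
import random


def simulate_linear_bdp(z0, birth_rate, death_rate, t_end, rng):
    """Simulate one path of a linear BDP up to time t_end (exact event-by-event)."""
    t, z = 0.0, z0
    times, sizes = [0.0], [z0]
    while z > 0:
        total_rate = (birth_rate + death_rate) * z
        t += rng.expovariate(total_rate)        # waiting time to the next event
        if t >= t_end:
            break
        if rng.random() < birth_rate / (birth_rate + death_rate):
            z += 1                              # birth
        else:
            z -= 1                              # death
        times.append(t)
        sizes.append(z)
    return times, sizes


def estimate_rates(times, sizes, t_end):
    """Estimate per-individual rates as event counts divided by accumulated exposure."""
    births = deaths = 0
    exposure = 0.0                              # integral of population size over time
    for k in range(len(times)):
        t_next = times[k + 1] if k + 1 < len(times) else t_end
        exposure += sizes[k] * (t_next - times[k])
        if k + 1 < len(sizes):
            if sizes[k + 1] > sizes[k]:
                births += 1
            else:
                deaths += 1
    return births / exposure, deaths / exposure


rng = random.Random(42)
times, sizes = simulate_linear_bdp(z0=50, birth_rate=0.5, death_rate=0.3,
                                   t_end=5.0, rng=rng)
birth_hat, death_hat = estimate_rates(times, sizes, t_end=5.0)
print(len(times) - 1, "events; estimates:", birth_hat, death_hat)
```

With continuous observation the estimates reduce to simple ratios. The discretely observed case that the package focuses on is harder because the events between observation times are not seen.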

pyStoNED: A Python Package for Convex Regression and Frontier Estimation

Shape-constrained nonparametric regression is a growing area in econometrics, statistics, operations research, machine learning, and related fields. In the field of productivity and efficiency analysis, recent developments in multivariate convex regression and related techniques such as convex quantile regression and convex expectile regression have bridged the long-standing gap between the conventional deterministic-nonparametric and stochastic-parametric methods. Unfortunately, the heavy computational burden and the lack of a powerful, reliable, and fully open-access computational package have slowed down the diffusion of these advanced estimation techniques into empirical practice. The purpose of the Python package pyStoNED is to address this challenge by providing a freely available and user-friendly tool for multivariate convex regression, convex quantile regression, convex expectile regression, isotonic regression, stochastic nonparametric envelopment of data, and related methods. This paper presents a tutorial of the pyStoNED package and illustrates its application, focusing on estimating frontier cost and production functions.
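
Of the shape-constrained methods listed, isotonic regression is the easiest to show in a few lines. The sketch below is a plain Python version of the pool-adjacent-violators algorithm for a least-squares non-decreasing fit. It is not pyStoNED code and does not reflect its interface; the function name and data are assumptions for illustration.

```python
def isotonic_regression(y):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # every point gets its block mean
    return fitted


print(isotonic_regression([1, 3, 2, 4]))  # → [1.0, 2.5, 2.5, 4.0]
```

The only constraint imposed is monotonicity; convex regression adds a curvature constraint and, in the multivariate case, is usually solved as a constrained optimization problem rather than by a simple pooling pass.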

mlr3spatiotempcv: Spatiotemporal Resampling Methods for Machine Learning in R

Spatial and spatiotemporal machine-learning models require a suitable framework for their model assessment, model selection, and hyperparameter tuning, in order to avoid error estimation bias and over-fitting. This contribution provides an overview of the state-of-the-art in spatial and spatiotemporal cross-validation techniques and their implementations in R while introducing the R package mlr3spatiotempcv as an extension package of the machine-learning framework mlr3. Currently, various R packages implementing different spatiotemporal partitioning strategies exist: blockCV, CAST, skmeans and sperrorest. The goal of mlr3spatiotempcv is to gather the available spatiotemporal resampling methods in R and make them available to users through a simple and common interface. This is made possible by integrating the package directly into the mlr3 machine-learning framework, which already has support for generic non-spatiotemporal resampling methods such as random partitioning. One advantage is the use of a consistent nomenclature in an overarching machine-learning toolkit instead of a varying package-specific syntax, making it easier for users to choose from a variety of spatiotemporal resampling methods. This package avoids giving recommendations on which method to use in practice, as this decision depends on the predictive task at hand, the autocorrelation within the data, and the spatial structure of the sampling design or geographic objects being studied.
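
To show how a spatial partitioning strategy differs from random partitioning, here is a minimal Python sketch of block cross-validation. It is a generic illustration, not the interface of mlr3spatiotempcv or of any package named above; the function name, the square grid, and the random coordinates are assumptions. Points are grouped into grid cells, and whole cells are assigned to folds, so that nearby points end up in the same fold.

```python
import random


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign points to folds block by block, so that no block is split across folds."""
    blocks = {}
    for idx, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))  # grid cell of the point
        blocks.setdefault(key, []).append(idx)

    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)       # random assignment of blocks to folds

    folds = [[] for _ in range(n_folds)]
    for pos, key in enumerate(keys):
        folds[pos % n_folds].extend(blocks[key])
    return folds, blocks


# Hypothetical data: 100 points scattered over a 10 x 10 study area.
rng = random.Random(7)
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(100)]
folds, blocks = spatial_block_folds(coords, block_size=2.5, n_folds=4)

for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(f"fold {k}: {len(train_idx)} train, {len(test_idx)} test")
```

With random partitioning, a test point often has a training neighbor a short distance away, which can make error estimates look better than they would be for prediction at new locations. Keeping blocks intact is one way to reduce that leakage.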

Interpreting Deep Neural Networks with the Package innsight

The R package innsight offers a general toolbox for revealing variable-wise interpretations of deep neural networks’ predictions with so-called feature attribution methods. Aside from the unified and user-friendly framework, the package stands out in three ways. First, it is generally the first R package implementing feature attribution methods for neural networks. Second, it operates independently of the deep learning library, allowing the interpretation of neural networks from any R package, including keras, torch, neuralnet, and even custom models. Despite its flexibility, innsight benefits internally from the fast and efficient array calculations of the torch package, which builds on LibTorch (PyTorch’s C++ backend) without a Python dependency. Third, it offers a variety of visualization tools for tabular, signal, and image data, or a combination of these. Additionally, the plots can be rendered interactively using the plotly package.

How to Interpret Statistical Models Using marginaleffects for R and Python

The parameters of a statistical model can sometimes be difficult to interpret substantively, especially when that model includes nonlinear components, interactions, or transformations. Analysts who fit such complex models often seek to transform raw parameter estimates into quantities that are easier for domain experts and stakeholders to understand. This article presents a simple conceptual framework to describe a vast array of such quantities of interest, which are reported under imprecise and inconsistent terminology across disciplines: predictions, marginal predictions, marginal means, marginal effects, conditional effects, slopes, contrasts, risk ratios, etc. We introduce marginaleffects, a package for R and Python which offers a simple and powerful interface to compute all of those quantities, and to conduct (non-)linear hypothesis and equivalence tests on them. marginaleffects is lightweight; extensible; it works well in combination with other R and Python packages; and it supports over 100 classes of models, including linear, generalized linear, generalized additive, mixed effects, Bayesian, and several machine learning models.
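
One of the quantities named above, the slope (often called a marginal effect), can be illustrated without any package. The Python sketch below takes a logistic model with made-up coefficients and computes the average slope of the predicted probability with respect to x in two ways: by numerical differentiation and by the analytic derivative b1 · p · (1 − p). It does not use or mimic the marginaleffects interface; all names and values are assumptions.

```python
import math


def predict(b0, b1, x):
    """Predicted probability from a logistic model with intercept b0 and slope b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def average_slope_numeric(b0, b1, xs, eps=1e-6):
    """Average over observations of a central finite-difference derivative."""
    slopes = [(predict(b0, b1, x + eps) - predict(b0, b1, x - eps)) / (2 * eps)
              for x in xs]
    return sum(slopes) / len(slopes)


def average_slope_analytic(b0, b1, xs):
    """Average of the exact derivative b1 * p * (1 - p)."""
    slopes = [b1 * predict(b0, b1, x) * (1.0 - predict(b0, b1, x)) for x in xs]
    return sum(slopes) / len(slopes)


# Hypothetical coefficients and observed covariate values.
b0, b1 = -1.0, 0.8
xs = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0]

numeric = average_slope_numeric(b0, b1, xs)
analytic = average_slope_analytic(b0, b1, xs)
print(numeric, analytic)
```

The coefficient b1 = 0.8 is on the log-odds scale, whereas the average slope is on the probability scale and can never exceed b1 / 4. This gap between raw parameters and interpretable quantities is the motivation the abstract describes.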

Estimating Conditional Distributions with Neural Networks Using R Package deeptrafo

Contemporary empirical applications frequently require flexible regression models for complex response types and large tabular or non-tabular data, including images or text. Classical regression models either break down under the computational load of processing such data or require additional manual feature extraction to make these problems tractable. Here, we present deeptrafo, a package for fitting flexible regression models for conditional distributions using a tensorflow back end with numerous additional processors, such as neural networks, penalties, and smoothing splines. Package deeptrafo implements deep conditional transformation models (DCTMs) for binary, ordinal, count, survival, continuous, and time series responses, potentially with uninformative censoring. Unlike other available methods, DCTMs do not assume a parametric family of distributions for the response. Further, the data analyst may trade off interpretability and flexibility by supplying custom neural network architectures and smoothers for each term in an intuitive formula interface. We demonstrate how to set up, fit, and work with DCTMs for several response types. We further showcase how to construct ensembles of these models, evaluate models using inbuilt cross-validation, and use other convenience functions for DCTMs in several applications. Lastly, we discuss DCTMs in light of other approaches to regression with non-tabular data.

References

Journal of Statistical Software Volume 111
