Journal of Artificial Intelligence Resarch: Volume 17の論文一覧

Journal of Artificial Intelligence Resarch Vol. 17 (2002)に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
Monitoring Teams by Overhearing: A Multi-Agent Plan-Recognition Approach
傍聴によるチーム監視：マルチエージェント計画認識アプローチ

Recent years are seeing an increasing need for on-line monitoring of teams of cooperating agents, e.g., for visualization, or performance tracking. However, in monitoring deployed teams, we often cannot rely on the agents to always communicate their state to the monitoring system. This paper presents a non-intrusive approach to monitoring by ‘overhearing’, where the monitored team’s state is inferred (via plan-recognition) from team-members’ *routine* communications, exchanged as part of their coordinated task execution, and observed (overheard) by the monitoring system. Key challenges in this approach include the demanding run-time requirements of monitoring, the scarceness of observations (increasing monitoring uncertainty), and the need to scale-up monitoring to address potentially large teams. To address these, we present a set of complementary novel techniques, exploiting knowledge of the social structures and procedures in the monitored team: (i) an efficient probabilistic plan-recognition algorithm, well-suited for processing communications as observations; (ii) an approach to exploiting knowledge of the team’s social behavior to predict future observations during execution (reducing monitoring uncertainty); and (iii) monitoring algorithms that trade expressivity for scalability, representing only certain useful monitoring hypotheses, but allowing for any number of agents and their different activities to be represented in a single coherent entity. We present an empirical evaluation of these techniques, in combination and apart, in monitoring a deployed team of agents, running on machines physically distributed across the country, and engaged in complex, dynamic task execution. We also compare the performance of these techniques to human expert and novice monitors, and show that the techniques presented are capable of monitoring at human-expert levels, despite the difficulty of the task.

近年、視覚化やパフォーマンス追跡などのために、協力するエージェントのチームをオンラインで監視する必要性が高まっています。しかし、展開されたチームを監視する場合、エージェントが常に監視システムに状態を伝えるとは限らない。本稿では、「傍受」による非侵入的な監視手法を提示します。この手法では、監視対象チームの状態は、チームメンバーの日常的なコミュニケーションから（計画認識を介して）推論され、協調的なタスク実行の一環として交換され、監視システムによって観察（傍受）されます。このアプローチにおける主な課題としては、監視の厳しい実行時間要件、観測データの不足（監視の不確実性の増大）、そして潜在的に大規模なチームに対応するために監視をスケールアップする必要性などが挙げられます。これらの課題に対処するため、監視対象チームの社会的構造と手順に関する知識を活用する、補完的な新技術群を紹介します。(i)コミュニケーションを観測データとして処理するのに適した、効率的な確率的計画認識アルゴリズム、(ii)実行中にチームの社会的行動に関する知識を活用して将来の観測データを予測するアプローチ（監視の不確実性の低減）、(iii)表現力を犠牲にしてスケーラビリティを確保した監視アルゴリズム。このアルゴリズムは、特定の有用な監視仮説のみを表現し、任意の数のエージェントとその異なる活動を単一の首尾一貫したエンティティで表現することを可能にします。これらの技術を、物理的に全国に分散されたマシン上で実行され、複雑で動的なタスク実行に従事するエージェントの展開チームを監視する際に、組み合わせて、または個別に使用した際の実証的評価を示します。また、これらの手法のパフォーマンスを人間の専門家および初心者の監視者と比較し、提示された手法はタスクの難しさにもかかわらず、人間の専門家レベルで監視できることを示します。

A Logic for Reasoning about Upper Probabilities
上側確率に関する推論ロジック

We present a propositional logic to reason about the uncertainty of events, where the uncertainty is modeled by a set of probability measures assigning an interval of probability to each event. We give a sound and complete axiomatization for the logic, and show that the satisfiability problem is NP-complete, no harder than satisfiability for propositional logic.

我々は、事象の不確実性について推論するための命題論理を提示します。ここで、不確実性は、各事象に確率の区間を割り当てる一連の確率測度によってモデル化されます。我々は、この論理の健全かつ完全な公理化を与え、充足可能性問題がNP完全であり、命題論理の充足可能性問題よりも困難ではないことを示す。

Expert-Guided Subgroup Discovery: Methodology and Application
専門家によるサブグループ発見：方法論と応用

This paper presents an approach to expert-guided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically significant properties of selected subgroups and subgroup visualization: statistically significant properties are used to enrich the descriptions of induced subgroups, while the visualization shows subgroup properties in the form of distributions of the numbers of examples in the subgroups. The approach is illustrated by the results obtained for a medical problem of early detection of patient risk groups.

本論文では、専門家によるサブグループ発見へのアプローチを提示します。サブグループ発見プロセスの主要なステップであるサブグループ記述の誘導は、詳細に分析されたルール品質の新たなパラメータ化された定義を用いたヒューリスティックビームサーチアルゴリズムによって実行されます。提案されたサブグループ発見プロセスのその他の重要なステップは、選択されたサブグループの統計的に有意な特性の検出とサブグループの可視化です。統計的に有意な特性は、誘導されたサブグループの説明を充実させるために使用され、可視化では、サブグループ内の例の数の分布の形でサブグループの特性が示されます。このアプローチは、患者のリスクグループの早期発見という医療問題で得られた結果によって説明されます。

Policy Recognition in the Abstract Hidden Markov Model
抽象隠れマルコフモデルにおける方策認識

In this paper, we present a method for recognising an agent’s behaviour in dynamic, noisy, uncertain domains, and across multiple levels of abstraction. We term this problem on-line plan recognition under uncertainty and view it generally as probabilistic inference on the stochastic process representing the execution of the agent’s plan. Our contributions in this paper are twofold. In terms of probabilistic inference, we introduce the Abstract Hidden Markov Model (AHMM), a novel type of stochastic processes, provide its dynamic Bayesian network (DBN) structure and analyse the properties of this network. We then describe an application of the Rao-Blackwellised Particle Filter to the AHMM which allows us to construct an efficient, hybrid inference method for this model. In terms of plan recognition, we propose a novel plan recognition framework based on the AHMM as the plan execution model. The Rao-Blackwellised hybrid inference for AHMM can take advantage of the independence properties inherent in a model of plan execution, leading to an algorithm for online probabilistic plan recognition that scales well with the number of levels in the plan hierarchy. This illustrates that while stochastic models for plan execution can be complex, they exhibit special structures which, if exploited, can lead to efficient plan recognition algorithms. We demonstrate the usefulness of the AHMM framework via a behaviour recognition system in a complex spatial environment using distributed video surveillance data.

本稿では、動的、ノイズ、不確実性の領域で、複数の抽象化レベルにわたってエージェントの行動を認識する方法を紹介します。この問題を不確実性下でのオンライン計画認識と呼び、一般的にはエージェントの計画の実行を表す確率過程に対する確率推論として捉えます。本稿での貢献は2つあります。確率推論の点では、新しいタイプの確率過程である抽象隠れマルコフモデル(AHMM)を紹介し、その動的ベイジアンネットワーク(DBN)構造を示し、このネットワークの特性を分析します。次に、Rao-Blackwellised Particle FilterをAHMMに適用する方法について説明します。これにより、このモデルに対する効率的なハイブリッド推論方法を構築できます。計画認識の点では、計画実行モデルとしてAHMMに基づく新しい計画認識フレームワークを提案します。AHMMのためのRao-Blackwell化ハイブリッド推論は、プラン実行モデルに固有の独立性特性を活用し、プラン階層のレベル数に応じて適切にスケーリング可能なオンライン確率プラン認識アルゴリズムを実現します。これは、プラン実行の確率モデルは複雑になり得るものの、特殊な構造を示しており、これを活用すれば効率的なプラン認識アルゴリズムにつながることを示しています。本稿では、分散型ビデオ監視データを用いた複雑な空間環境における行動認識システムを通して、AHMMフレームワークの有用性を実証します。

Specific-to-General Learning for Temporal Events with Application to Learning Event Definitions from Video
時系列イベントの特定から一般への学習と、ビデオからのイベント定義学習への応用

We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples–only specific-to-general learning method based on these algorithms. We also present a polynomial-time–computable “syntactic” subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. Finally, we apply this algorithm to the task of learning relational event definitions from video and show that it yields definitions that are competitive with hand-coded ones.

我々は、単純な時相論理のための、新しい教師あり学習器を開発、分析、評価し、得られたアルゴリズムを用いてビデオシーケンスから視覚イベント定義を学習します。まず、多くのイベントを表現するのに十分な表現力を持ちながら、学習をサポートするのに十分な制限を持つ、AMAと呼ばれる単純な命題的時相イベント記述言語を導入します。次に、AMA式の包摂および一般化問題に対するアルゴリズムと、その計算量の下限と上限を示します。これらのアルゴリズムに基づく、正例のみを用いた、特殊から一般への学習法を提示します。また、意味的包摂を示唆するが、それと同値ではない、多項式時間で計算可能な「構文的」包摂テストを提示します。意味的一般化の代わりに統語的包摂に基づく一般化アルゴリズムを用いることで、得られる学習アルゴリズムの漸近的複雑性を改善できます。最後に、このアルゴリズムをビデオから関係イベント定義を学習するタスクに適用し、手作業でコーディングされたものと同等の定義が得られることを示す。

Competitive Safety Analysis: Robust Decision-Makingin Multi-Agent Systems
競争安全性分析：マルチエージェントシステムにおけるロバストな意思決定

Much work in AI deals with the selection of proper actions in a given (known or unknown) environment. However, the way to select a proper action when facing other agents is quite unclear. Most work in AI adopts classical game-theoretic equilibrium analysis to predict agent behavior in such settings. This approach however does not provide us with any guarantee for the agent. In this paper we introduce competitive safety analysis. This approach bridges the gap between the desired normative AI approach, where a strategy should be selected in order to guarantee a desired payoff, and equilibrium analysis. We show that a safety level strategy is able to guarantee the value obtained in a Nash equilibrium, in several classical computer science settings. Then, we discuss the concept of competitive safety strategies, and illustrate its use in a decentralized load balancing setting, typical to network problems. In particular, we show that when we have many agents, it is possible to guarantee an expected payoff which is a factor of 8/9 of the payoff obtained in a Nash equilibrium. Our discussion of competitive safety analysis for decentralized load balancing is further developed to deal with many communication links and arbitrary speeds. Finally, we discuss the extension of the above concepts to Bayesian games, and illustrate their use in a basic auctions setup.

AI研究の多くは、与えられた（既知または未知の）環境における適切な行動の選択を扱っています。しかし、他のエージェントと対峙した際に適切な行動を選択する方法は非常に不明確です。AI研究の多くは、このような状況におけるエージェントの行動を予測するために、古典的なゲーム理論的均衡分析を採用しています。しかし、このアプローチはエージェントの行動を保証するものではありません。本稿では、競争的安全性分析を導入します。このアプローチは、望ましい利得を保証するために戦略を選択すべきという、望ましい規範的なAIアプローチと均衡分析との間のギャップを埋めるものです。いくつかの古典的なコンピュータサイエンスの設定において、安全レベル戦略はナッシュ均衡で得られる値を保証できることを示します。次に、競争的安全性戦略の概念について議論し、ネットワーク問題に典型的な分散型負荷分散設定におけるその適用例を示します。特に、多数のエージェントが存在する場合、ナッシュ均衡で得られる利得の8/9倍の期待利得を保証できることを示します。分散型負荷分散における競争的安全性分析に関する議論は、多数の通信リンクと任意の速度に対応できるようさらに発展させています。最後に、上記の概念のベイズゲームへの拡張について議論し、基本的なオークション設定におけるその使用例を示します。

Inferring Strategies for Sentence Ordering in Multidocument News Summarization
複数文書ニュース要約における文順序付け戦略の推論

The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies.

複数文書の要約において、生成された要約が首尾一貫したものとなるように情報を整理するという問題は、これまであまり注目されてきませんでした。単一文書の要約では、入力記事内の文章の順序から文章の順序を決定できますが、複数文書の要約では、要約文章が異なる入力記事から抽出される可能性があるため、これは当てはまりません。本論文では、ニュースというジャンルにおける順序情報の特性を研究するための方法論を提案し、このタスクのために開発した複数の許容可能な順序付けのコーパスに対して行った実験について説明します。これらの実験に基づいて、出来事の時系列順序とトピックの関連性からの制約を組み合わせた情報順序付け戦略を実装しました。拡張アルゴリズムの評価では、2つのベースライン戦略と比較して順序付けが大幅に改善されることが示されました。

A New Technique for Combining Multiple Classifiers using The Dempster-Shafer Theory of Evidence
デンプスター・シェーファー証拠理論を用いた複数の分類器を組み合わせる新しい手法

This paper presents a new classifier combination technique based on the Dempster-Shafer theory of evidence. The Dempster-Shafer theory of evidence is a powerful method for combining measures of evidence from different classifiers. However, since each of the available methods that estimates the evidence of classifiers has its own limitations, we propose here a new implementation which adapts to training data so that the overall mean square error is minimized. The proposed technique is shown to outperform most available classifier combination methods when tested on three different classification problems.

本稿では、デンプスター・シェーファーの証拠理論に基づく新しい分類器組み合わせ手法を紹介します。デンプスター・シェーファーの証拠理論は、異なる分類器からの証拠の尺度を組み合わせるための強力な手法です。しかし、分類器の証拠を推定する既存の手法にはそれぞれ限界があるため、ここでは学習データに適応し、全体的な平均二乗誤差を最小化する新しい実装を提案します。提案手法は、3つの異なる分類問題でテストした結果、既存の分類器の組み合わせ手法のほとんどよりも優れた性能を示すことが示されました。

An Analysis of Phase Transition in NK Landscapes
NKランドスケープにおける相転移の分析

In this paper, we analyze the decision version of the NK landscape model from the perspective of threshold phenomena and phase transitions under two random distributions, the uniform probability model and the fixed ratio model. For the uniform probability model, we prove that the phase transition is easy in the sense that there is a polynomial algorithm that can solve a random instance of the problem with the probability asymptotic to 1 as the problem size tends to infinity. For the fixed ratio model, we establish several upper bounds for the solubility threshold, and prove that random instances with parameters above these upper bounds can be solved polynomially. This, together with our empirical study for random instances generated below and in the phase transition region, suggests that the phase transition of the fixed ratio model is also easy.

本稿では、NKランドスケープモデルの決定バージョンを、閾値現象と、一様確率モデルと固定比率モデルの2つのランダム分布下における相転移の観点から分析します。一様確率モデルの場合、問題のサイズが無限大に近づくにつれて確率が1に漸近する問題のランダムインスタンスを解くことができる多項式アルゴリズムが存在するという意味で、相転移が容易であることを証明します。固定比率モデルの場合、溶解閾値にいくつかの上限を設定し、これらの上限を超えるパラメータを持つランダムインスタンスは多項式で解けることを証明します。これは、相転移領域の下および相転移領域内で生成されたランダムインスタンスに関する経験的研究と合わせて、固定比率モデルの相転移も容易であることを示唆しています。

A Unified Model of Structural Organization in Language and Music
言語と音楽における構造的組織化の統一モデル

Is there a general model that can predict the perceived phrase structure in language and music? While it is usually assumed that humans have separate faculties for language and music, this work focuses on the commonalities rather than on the differences between these modalities, aiming at finding a deeper ‘faculty’. Our key idea is that the perceptual system strives for the simplest structure (the ‘simplicity principle’), but in doing so it is biased by the likelihood of previous structures (the ‘likelihood principle’). We present a series of data-oriented parsing (DOP) models that combine these two principles and that are tested on the Penn Treebank and the Essen Folksong Collection. Our experiments show that (1) a combination of the two principles outperforms the use of either of them, and (2) exactly the same model with the same parameter setting achieves maximum accuracy for both language and music. We argue that our results suggest an interesting parallel between linguistic and musical structuring.

言語と音楽における知覚されるフレーズ構造を予測できる一般的なモデルは存在するのだろうか？人間は言語と音楽においてそれぞれ異なる能力を持っていると一般的に考えられているが、本研究ではこれらの様相の相違点ではなく共通点に焦点を当て、より深い「能力」の発見を目指す。我々の核となる考え方は、知覚システムは最も単純な構造（「単純性原理」）を目指すが、その際に以前の構造の尤度（「尤度原理」）によってバイアスを受けるというものです。我々は、これら2つの原理を組み合わせた一連のデータ指向構文解析（DOP）モデルを提示し、ペン・ツリーバンクとエッセン・フォークソング・コレクションでテストした。実験の結果、(1) 2つの原理を組み合わせると、どちらか一方を使用するよりも優れた性能を発揮すること、(2)全く同じモデルを同じパラメータ設定で用いることで、言語と音楽の両方において最大の精度を達成することが示されました。我々は、この結果が言語と音楽の構造化の間に興味深い類似点があることを示唆していると主張します。

When do Numbers Really Matter?
数字が本当に重要になるのはいつなのか？

Common wisdom has it that small distinctions in the probabilities (parameters) quantifying a belief network do not matter much for the results of probabilistic queries. Yet, one can develop realistic scenarios under which small variations in network parameters can lead to significant changes in computed queries. A pending theoretical question is then to analytically characterize parameter changes that do or do not matter. In this paper, we study the sensitivity of probabilistic queries to changes in network parameters and prove some tight bounds on the impact that such parameters can have on queries. Our analytic results pinpoint some interesting situations under which parameter changes do or do not matter. These results are important for knowledge engineers as they help them identify influential network parameters. They also help explain some of the previous experimental results and observations with regards to network robustness against parameter changes.

一般的に、信念ネットワークを定量化する確率（パラメータ）の小さな違いは、確率的クエリの結果にはあまり影響しないと考えられています。しかし、ネットワークパラメータの小さな変化が計算されたクエリに大きな変化をもたらす現実的なシナリオを構築することは可能です。そこで、重要なパラメータの変化と重要でないパラメータの変化を解析的に特徴付けることが、理論的な未解決の問題となります。本稿では、ネットワークパラメータの変化に対する確率的クエリの感度を研究し、そのようなパラメータがクエリに与える影響に関する厳密な境界を証明します。私たちの分析結果は、パラメータの変化が重要になる、あるいは重要にならない興味深い状況をいくつか特定しています。これらの結果は、影響力のあるネットワークパラメータを特定するのに役立つため、知識エンジニアにとって重要です。また、パラメータの変化に対するネットワークの堅牢性に関する過去の実験結果や観察結果の一部を説明するのにも役立ちます。

A Knowledge Compilation Map
知識蓄積マップ

We propose a perspective on knowledge compilation which calls for analyzing different compilation approaches according to two key dimensions: the succinctness of the target compilation language, and the class of queries and transformations that the language supports in polytime. We then provide a knowledge compilation map, which analyzes a large number of existing target compilation languages according to their succinctness and their polytime transformations and queries. We argue that such analysis is necessary for placing new compilation approaches within the context of existing ones. We also go beyond classical, flat target compilation languages based on CNF and DNF, and consider a richer, nested class based on directed acyclic graphs (such as OBDDs), which we show to include a relatively large number of target compilation languages.

本稿では、知識コンパイルに関する視点を提案します。この視点では、対象となるコンパイル言語の簡潔さと、その言語が多時間でサポートするクエリと変換のクラスという2つの主要な次元に基づいて、さまざまなコンパイル手法を分析する必要があります。次に、既存の多数の対象コンパイル言語を、その簡潔さ、多時間変換、クエリに基づいて分析する知識コンパイルマップを提供します。このような分析は、新しいコンパイル手法を既存の手法の文脈に位置付けるために必要であると主張します。また、CNFとDNFに基づく従来のフラットな対象コンパイル言語を超えて、有向非巡回グラフ（OBDDなど）に基づくより豊富なネストされたクラスを検討し、比較的多くの対象コンパイル言語を含むことを示す。

Towards Adjustable Autonomy for the Real World
現実世界における調整可能な自律性に向けて

Adjustable autonomy refers to entities dynamically varying their own autonomy, transferring decision-making control to other entities (typically agents transferring control to human users) in key situations. Determining whether and when such transfers-of-control should occur is arguably the fundamental research problem in adjustable autonomy. Previous work has investigated various approaches to addressing this problem but has often focused on individual agent-human interactions. Unfortunately, domains requiring collaboration between teams of agents and humans reveal two key shortcomings of these previous approaches. First, these approaches use rigid one-shot transfers of control that can result in unacceptable coordination failures in multiagent settings. Second, they ignore costs (e.g., in terms of time delays or effects on actions) to an agent’s team due to such transfers-of-control. To remedy these problems, this article presents a novel approach to adjustable autonomy, based on the notion of a transfer-of-control strategy. A transfer-of-control strategy consists of a conditional sequence of two types of actions: (i) actions to transfer decision-making control (e.g., from an agent to a user or vice versa) and (ii) actions to change an agent’s pre-specified coordination constraints with team members, aimed at minimizing miscoordination costs. The goal is for high-quality individual decisions to be made with minimal disruption to the coordination of the team. We present a mathematical model of transfer-of-control strategies. The model guides and informs the operationalization of the strategies using Markov Decision Processes, which select an optimal strategy, given an uncertain environment and costs to the individuals and teams. The approach has been carefully evaluated, including via its use in a real-world, deployed multi-agent system that assists a research group in its daily activities.

調整可能な自律性とは、主体が自身の自律性を動的に変化させ、重要な状況において意思決定の制御権を他の主体（典型的には、制御権を人間ユーザに移譲するエージェント）に移譲することを指す。このような制御権移譲を行うべきかどうか、またいつ行うべきかを決定することは、調整可能な自律性における基本的な研究課題と言えるだろう。これまでの研究では、この問題に対処するための様々なアプローチが検討されてきたが、多くの場合、個々のエージェントと人間の相互作用に焦点が当てられてきた。しかしながら、エージェントと人間のチーム間の協働を必要とする領域においては、これらの従来のアプローチには2つの重要な欠点があります。第一に、これらのアプローチは、マルチエージェント環境において許容できない調整の失敗につながる可能性のある、硬直した一回限りの制御権移譲を用いています。第二に、これらのアプローチは、制御権移譲によってエージェントのチームに生じるコスト（例えば、時間遅延や行動への影響など）を無視しています。これらの問題を解決するため、本稿では、制御権移譲戦略の概念に基づく、調整可能な自律性への新たなアプローチを提示します。制御権移転戦略は、2種類のアクションの条件付きシーケンスで構成されます。(i)意思決定制御権を移転するアクション（例：エージェントからユーザーへ、またはその逆）、および(ii)ミスコーディネーションコストを最小化することを目的とした、エージェントとチームメンバー間の事前指定された調整制約を変更するアクションです。目標は、チームの調整への混乱を最小限に抑えながら、質の高い個々の意思決定を行うことです。本稿では、制御権移転戦略の数理モデルを提示します。このモデルは、不確実な環境と個人およびチームにかかるコストを前提として最適な戦略を選択するマルコフ決定過程を用いた戦略の運用化を導き、情報を提供します。このアプローチは、研究グループの日常業務を支援する実世界でのマルチエージェントシステムでの使用を含め、慎重に評価されています。

Inducing Interpretable Voting Classifiers without Trading Accuracy for Simplicity: Theoretical Results, Approximation Algorithms
正確性を犠牲にすることなく解釈可能な投票分類器を誘導する：理論的結果、近似アルゴリズム

Recent advances in the study of voting classification algorithms have brought empirical and theoretical results clearly showing the discrimination power of ensemble classifiers. It has been previously argued that the search of this classification power in the design of the algorithms has marginalized the need to obtain interpretable classifiers. Therefore, the question of whether one might have to dispense with interpretability in order to keep classification strength is being raised in a growing number of machine learning or data mining papers. The purpose of this paper is to study both theoretically and empirically the problem. First, we provide numerous results giving insight into the hardness of the simplicity-accuracy tradeoff for voting classifiers. Then we provide an efficient “top-down and prune” induction heuristic, WIDC, mainly derived from recent results on the weak learning and boosting frameworks. It is to our knowledge the first attempt to build a voting classifier as a base formula using the weak learning framework (the one which was previously highly successful for decision tree induction), and not the strong learning framework (as usual for such classifiers with boosting-like approaches). While it uses a well-known induction scheme previously successful in other classes of concept representations, thus making it easy to implement and compare, WIDC also relies on recent or new results we give about particular cases of boosting known as partition boosting and ranking loss boosting. Experimental results on thirty-one domains, most of which readily available, tend to display the ability of WIDC to produce small, accurate, and interpretable decision committees.

投票分類アルゴリズムの研究における最近の進歩は、アンサンブル分類器の識別力を明確に示す経験的および理論的な結果をもたらした。アルゴリズムの設計においてこの分類力の探索が、解釈可能な分類器を取得する必要性を軽視してきたと以前に主張されてきました。そのため、分類の強度を維持するために解釈可能性を放棄する必要があるかどうかという疑問が、ますます多くの機械学習やデータマイニングの論文で提起されています。本論文の目的は、この問題を理論的および経験的に研究することです。まず、投票分類器における簡潔性と精度のトレードオフの難しさに関する洞察を与える多数の結果を示します。次に、主に弱学習およびブースティングフレームワークに関する最近の結果から導き出された、効率的な「トップダウンおよびプルーン」型帰納的ヒューリスティックであるWIDCを示します。我々の知る限り、これは、強学習フレームワーク（ブースティングのようなアプローチを用いた分類器で一般的に使用される）ではなく、弱学習フレームワーク（決定木帰納においてこれまで非常に成功を収めてきたフレームワーク）を用いて投票分類器を基本式として構築する初の試みです。WIDCは、他の概念表現のクラスで既に成功を収めた既知の帰納的スキームを使用しているため、実装と比較が容易ですが、パーティションブースティングおよびランキングロスブースティングと呼ばれるブースティングの特定のケースに関する、我々が示す最近または新しい結果も利用しています。31のドメイン（そのほとんどは既に入手可能）における実験結果は、WIDCが小型で正確かつ解釈可能な決定委員会を生成できることを示している傾向があります。

A Critical Assessment of Benchmark Comparison in Planning
計画におけるベンチマーク比較の批判的評価

Recent trends in planning research have led to empirical comparison becoming commonplace. The field has started to settle into a methodology for such comparisons, which for obvious practical reasons requires running a subset of planners on a subset of problems. In this paper, we characterize the methodology and examine eight implicit assumptions about the problems, planners and metrics used in many of these comparisons. The problem assumptions are: PR1) the performance of a general purpose planner should not be penalized/biased if executed on a sampling of problems and domains, PR2) minor syntactic differences in representation do not affect performance, and PR3) problems should be solvable by STRIPS capable planners unless they require ADL. The planner assumptions are: PL1) the latest version of a planner is the best one to use, PL2) default parameter settings approximate good performance, and PL3) time cut-offs do not unduly bias outcome. The metrics assumptions are: M1) performance degrades similarly for each planner when run on degraded runtime environments (e.g., machine platform) and M2) the number of plan steps distinguishes performance. We find that most of these assumptions are not supported empirically; in particular, that planners are affected differently by these assumptions. We conclude with a call to the community to devote research resources to improving the state of the practice and especially to enhancing the available benchmark problems.

計画研究における最近の傾向により、経験的な比較が一般的になりつつあります。この分野では、このような比較のための方法論が定着し始めていますが、明白な実際的な理由から、問題のサブセットに対してプランナーのサブセットを実行することが必要となります。本稿では、この方法論を特徴づけ、これらの比較の多くで使用されている問題、プランナー、およびメトリクスに関する8つの暗黙の仮定を検証します。問題の仮定は、PR1)汎用プランナーをサンプル問題とドメインで実行した場合、そのパフォーマンスにペナルティやバイアスがかかってはならない、PR2)表現におけるわずかな構文の違いはパフォーマンスに影響しない、PR3) ADLを必要としない限り、問題はSTRIPS対応プランナーで解決できる、というものです。プランナーの仮定は、PL1)最新バージョンのプランナーが使用するのに最も適している、PL2)デフォルトのパラメーター設定は良好なパフォーマンスに近い、PL3)時間制限は結果に過度のバイアスをかけない、というものです。メトリクスの仮定は、M1)性能が低下した実行環境(マシンプラットフォームなど)で実行した場合、各プランナーのパフォーマンスが同様に低下する、およびM2)プランステップの数によってパフォーマンスが区別される、というものです。これらの仮定のほとんどは経験的に裏付けられていないことがわかりました。特に、プランナーはこれらの仮定によって異なる影響を受けます。結論として、コミュニティに対し、研究リソースを投入して実践状態を改善し、特に利用可能なベンチマーク問題を強化するよう呼びかけます。

関連記事