Journal of Artificial Intelligence Resarch: Volume 12の論文一覧

Journal of Artificial Intelligence Resarch Vol. 12 (2000)に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
Exact Phase Transitions in Random Constraint Satisfaction Problems
ランダム制約充足問題における正確な相転移

In this paper we propose a new type of random CSP model, called Model RB, which is a revision to the standard Model B. It is proved that phase transitions from a region where almost all problems are satisfiable to a region where almost all problems are unsatisfiable do exist for Model RB as the number of variables approaches infinity. Moreover, the critical values at which the phase transitions occur are also known exactly. By relating the hardness of Model RB to Model B, it is shown that there exist a lot of hard instances in Model RB.

本稿では、標準的なモデルBを改訂したモデルRBと呼ばれる新しいタイプのランダムCSPモデルを提案します。モデルRBでは、変数の数が無限大に近づくにつれて、ほぼすべての問題が充足可能な領域からほぼすべての問題が充足不可能な領域への相転移が存在することが証明されています。さらに、相転移が発生する臨界値も正確に分かっています。モデルRBの困難性をモデルBに関連付けることで、モデルRBには多くの困難な事例が存在することが示されます。

On Deducing Conditional Independence from d-Separation in Causal Graphs with Feedback (Research Note)
フィードバック付き因果グラフにおけるd分離からの条件付き独立性の推論について（研究ノート）

Pearl and Dechter (1996) claimed that the d-separation criterion for conditional independence in acyclic causal networks also applies to networks of discrete variables that have feedback cycles, provided that the variables of the system are uniquely determined by the random disturbances. I show by example that this is not true in general. Some condition stronger than uniqueness is needed, such as the existence of a causal dynamics guaranteed to lead to the unique solution.

Pearl and Dechter (1996)は、システムの変数がランダム外乱によって一意に決定されるという条件のもと、非巡回因果ネットワークにおける条件付き独立性に対するd分離基準は、フィードバックサイクルを持つ離散変数ネットワークにも適用できると主張した。本稿では、例を挙げてこれが一般には当てはまらないことを示す。一意性よりも強い条件、例えば、唯一の解を導くことが保証された因果ダイナミクスの存在などが必要となります。

An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email
電子メール向け音声対話システムにおける対話戦略選択への強化学習の応用

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.

本論文では、音声対話システムが人間のユーザーとの対話経験から最適な対話戦略を選択することを学習できる、新たな手法について述べる。この手法は、強化学習と音声対話システムのパフォーマンスモデリングを組み合わせたものです。強化学習コンポーネントではQ学習（Watkins, 1989）を適用し、パフォーマンスモデリングコンポーネントではPARADISE評価フレームワーク（Walkerら, 1997）を適用して、強化学習で使用されるパフォーマンス関数（報酬）を学習します。本稿では、電話による電子メールへのアクセスをサポートする音声対話システムELVIS（EmaiL Voice Interactive System）を用いて、この手法を説明します。人間ユーザーが電話でELVISと対話する219の対話コーパスを用いて、最適な対話戦略を学習するための一連の実験を行う。次に、その戦略を18の対話コーパスでテストします。その結果、ELVISはエージェント主導、メッセージの読み上げ、電子メールフォルダの要約について、戦略選択を最適化できることが示されます。

Reasoning on Interval and Point-based Disjunctive Metric Constraints in Temporal Contexts
時間的コンテキストにおける区間ベースおよび点ベースの選言的メトリック制約の推論

We introduce a temporal model for reasoning on disjunctive metric constraints on intervals and time points in temporal contexts. This temporal model is composed of a labeled temporal algebra and its reasoning algorithms. The labeled temporal algebra defines labeled disjunctive metric point-based constraints, where each disjunct in each input disjunctive constraint is univocally associated to a label. Reasoning algorithms manage labeled constraints, associated label lists, and sets of mutually inconsistent disjuncts. These algorithms guarantee consistency and obtain a minimal network. Additionally, constraints can be organized in a hierarchy of alternative temporal contexts. Therefore, we can reason on context-dependent disjunctive metric constraints on intervals and points. Moreover, the model is able to represent non-binary constraints, such that logical dependencies on disjuncts in constraints can be handled. The computational cost of reasoning algorithms is exponential in accordance with the underlying problem complexity, although some improvements are proposed.

時間的コンテキストにおける区間と時点に対する選言的メトリック制約を推論するための時間的モデルを導入します。この時間的モデルは、ラベル付き時間的代数とその推論アルゴリズムで構成されます。ラベル付き時間的代数は、ラベル付き選言的メトリックポイントベースの制約を定義します。この制約では、各入力選言制約の各選言がラベルに一意に関連付けられます。推論アルゴリズムは、ラベル付き制約、関連付けられたラベルリスト、および相互に矛盾する選言のセットを管理します。これらのアルゴリズムは一貫性を保証し、最小限のネットワークを実現します。さらに、制約は代替的な時間的コンテキストの階層に編成できます。したがって、区間と点に対するコンテキスト依存の選言的メトリック制約を推論できます。さらに、このモデルは非二項制約を表現できるため、制約内の論理和の依存関係を処理できます。推論アルゴリズムの計算コストは、基盤となる問題の複雑さに応じて指数関数的に増加しますが、いくつかの改善が提案されています。

On Reasonable and Forced Goal Orderings and their Use in an Agenda-Driven Planning Algorithm
アジェンダ駆動型プランニングアルゴリズムにおける合理的および強制的な目標順序付けとその利用について

The paper addresses the problem of computing goal orderings, which is one of the longstanding issues in AI planning. It makes two new contributions. First, it formally defines and discusses two different goal orderings, which are called the reasonable and the forced ordering. Both orderings are defined for simple STRIPS operators as well as for more complex ADL operators supporting negation and conditional effects. The complexity of these orderings is investigated and their practical relevance is discussed. Secondly, two different methods to compute reasonable goal orderings are developed. One of them is based on planning graphs, while the other investigates the set of actions directly. Finally, it is shown how the ordering relations, which have been derived for a given set of goals G, can be used to compute a so-called goal agenda that divides G into an ordered set of subgoals. Any planner can then, in principle, use the goal agenda to plan for increasing sets of subgoals. This can lead to an exponential complexity reduction, as the solution to a complex planning problem is found by solving easier subproblems. Since only a polynomial overhead is caused by the goal agenda computation, a potential exists to dramatically speed up planning algorithms as we demonstrate in the empirical evaluation, where we use this method in theIPP planner.

本論文は、AIプランニングにおける長年の課題の一つであるゴール順序付けの計算問題を取り上げます。本論文は2つの新たな貢献をしています。まず、合理的順序付けと強制順序付けと呼ばれる2つの異なるゴール順序付けを正式に定義し、考察します。どちらの順序付けも、単純なSTRIPS演算子だけでなく、否定や条件付き効果をサポートするより複雑なADL演算子に対しても定義されています。これらの順序付けの複雑さを調査し、その実用的意義について考察します。次に、合理的ゴール順序付けを計算する2つの異なる手法を開発します。1つはプランニンググラフに基づき、もう1つはアクションの集合を直接調査します。最後に、与えられたゴール集合Gに対して導出された順序付け関係を用いて、Gを順序付けられたサブゴール集合に分割する、いわゆるゴールアジェンダを計算する方法を示します。これにより、原理的には、あらゆるプランナーがゴールアジェンダを用いて、増加するサブゴール集合を計画することができます。複雑なプランニング問題の解は、より簡単なサブ問題を解くことで得られるため、これは指数関数的な計算量削減につながります。目標アジェンダの計算によって生じるオーバーヘッドは多項式のオーバーヘッドのみであるため、IPPプランナーでこの手法を用いた実証評価で示すように、プランニングアルゴリズムを劇的に高速化する可能性があります。

Axiomatizing Causal Reasoning
因果推論の公理化

Causal models defined in terms of a collection of equations, as defined by Pearl, are axiomatized here. Axiomatizations are provided for three successively more general classes of causal models: (1) the class of recursive theories (those without feedback), (2) the class of theories where the solutions to the equations are unique, (3) arbitrary theories (where the equations may not have solutions and, if they do, they are not necessarily unique). It is shown that to reason about causality in the most general third class, we must extend the language used by Galles and Pearl. In addition, the complexity of the decision procedures is characterized for all the languages and classes of models considered.

パールによって定義された方程式の集合として定義される因果モデルは、本稿で公理化されます。公理化は、因果モデルの3つのより一般的なクラス、すなわち(1)再帰理論のクラス（フィードバックを持たないもの）、(2)方程式の解が一意である理論のクラス、(3)任意理論のクラス（方程式が解を持たない可能性があり、また、解を持つ場合でも必ずしも一意ではないもの）について提供されます。最も一般的な3番目のクラスにおける因果関係について推論するには、ガレスとパールが用いた言語を拡張する必要があることが示されます。さらに、検討対象のすべての言語とモデルクラスについて、決定手続きの複雑さが特徴付けられます。

On the Compilability and Expressive Power of Propositional Planning Formalisms
命題的計画形式主義のコンパイル可能性と表現力について

The recent approaches of extending the GRAPHPLAN algorithm to handle more expressive planning formalisms raise the question of what the formal meaning of “expressive power” is. We formalize the intuition that expressive power is a measure of how concisely planning domains and plans can be expressed in a particular formalism by introducing the notion of “compilation schemes” between planning formalisms. Using this notion, we analyze the expressiveness of a large family of propositional planning formalisms, ranging from basic STRIPS to a formalism with conditional effects, partial state specifications, and propositional formulae in the preconditions. One of the results is that conditional effects cannot be compiled away if plan size should grow only linearly but can be compiled away if we allow for polynomial growth of the resulting plans. This result confirms that the recently proposed extensions to the GRAPHPLAN algorithm concerning conditional effects are optimal with respect to the “compilability” framework. Another result is that general propositional formulae cannot be compiled into conditional effects if the plan size should be preserved linearly. This implies that allowing general propositional formulae in preconditions and effect conditions adds another level of difficulty in generating a plan.

GRAPHPLANアルゴリズムをより表現力豊かな計画形式論を扱うために拡張する最近のアプローチは、「表現力」の正式な意味とは何かという疑問を提起します。表現力とは、特定の形式主義において計画領域と計画をどれだけ簡潔に表現できるかを示す尺度であるという直感を、計画形式主義間の「コンパイルスキーム」という概念を導入することで形式化します。この概念を用いて、基本的なSTRIPSから、条件付き効果、部分的な状態仕様、そして前提条件に命題式を含む形式主義に至るまで、幅広い命題計画形式主義の表現力を分析します。結果の一つは、プランのサイズが線形にしか増加しない場合は条件付き効果をコンパイルして除去できないが、結果として得られるプランの多項式増加を許容する場合は条件付き効果をコンパイルして除去できることです。この結果は、最近提案されたGRAPHPLANアルゴリズムの条件付き効果に関する拡張が、「コンパイル可能性」フレームワークに関して最適であることを裏付けるものです。もう一つの結果は、プランのサイズが線形に維持される場合は、一般的な命題式を条件付き効果にコンパイルできないことです。これは、前提条件と効果の条件に一般的な命題式を含めることを許容すると、プラン生成の難易度がさらに高まることを意味します。

Backbone Fragility and the Local Search Cost Peak
バックボーンの脆弱性と局所探索コストのピーク

The local search algorithm WSat is one of the most successful algorithms for solving the satisfiability (SAT) problem. It is notably effective at solving hard Random 3-SAT instances near the so-called `satisfiability threshold’, but still shows a peak in search cost near the threshold and large variations in cost over different instances. We make a number of significant contributions to the analysis of WSat on high-cost random instances, using the recently-introduced concept of the backbone of a SAT instance. The backbone is the set of literals which are entailed by an instance. We find that the number of solutions predicts the cost well for small-backbone instances but is much less relevant for the large-backbone instances which appear near the threshold and dominate in the overconstrained region. We show a very strong correlation between search cost and the Hamming distance to the nearest solution early in WSat’s search. This pattern leads us to introduce a measure of the backbone fragility of an instance, which indicates how persistent the backbone is as clauses are removed. We propose that high-cost random instances for local search are those with very large backbones which are also backbone-fragile. We suggest that the decay in cost beyond the satisfiability threshold is due to increasing backbone robustness (the opposite of backbone fragility). Our hypothesis makes three correct predictions. First, that the backbone robustness of an instance is negatively correlated with the local search cost when other factors are controlled for. Second, that backbone-minimal instances (which are 3-SAT instances altered so as to be more backbone-fragile) are unusually hard for WSat. Third, that the clauses most often unsatisfied during search are those whose deletion has the most effect on the backbone. In understanding the pathologies of local search methods, we hope to contribute to the development of new and better techniques.

局所探索アルゴリズムWSatは、充足可能性(SAT)問題を解決するための最も成功したアルゴリズムの1つです。これは、いわゆる「充足可能性閾値」付近の難しいランダム3-SATインスタンスを解決するのに特に効果的ですが、閾値付近で探索コストがピークに達し、インスタンス間でコストが大きく変動します。最近導入されたSATインスタンスのバックボーンの概念を用いて、高コストのランダムインスタンスにおけるWSatの解析に多くの重要な貢献をしています。バックボーンとは、インスタンスに内包されるリテラルの集合です。解の数は、バックボーンが小さいインスタンスの場合はコストをうまく予測しますが、閾値付近に現れ、過剰制約領域を支配するバックボーンが大きいインスタンスの場合は、それほど重要ではないことがわかりました。WSatの探索初期において、探索コストと最も近い解までのハミング距離の間には非常に強い相関関係があることがわかりました。このパターンから、インスタンスのバックボーンの脆弱性の尺度を導入することができます。これは、節が削除されたときにバックボーンがどれだけ持続するかを示します。局所探索において高コストとなるランダムインスタンスとは、バックボーンが非常に大きく、かつバックボーンが脆弱なインスタンスであると我々は提案します。充足可能性閾値を超えるコストの減少は、バックボーンの堅牢性（バックボーンの脆弱性の反対）の増加に起因すると我々は示唆します。我々の仮説は3つの正しい予測を立てる。第一に、他の要因を考慮に入れた場合、インスタンスのバックボーンの堅牢性は局所探索コストと負の相関関係にあります。第二に、バックボーン最小インスタンス（よりバックボーンが脆弱になるように変更された3-SATインスタンス）は、WSatにとって非常に困難です。第三に、探索中に最も頻繁に充足されない節は、その削除がバックボーンに最も影響を与える節です。局所探索法の病理を理解することで、我々は新しくより優れた手法の開発に貢献したいと考えています。

Randomized Algorithms for the Loop Cutset Problem
ループカットセット問題のためのランダム化アルゴリズム

We show how to find a minimum weight loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in the method of conditioning for inference. Our randomized algorithm for finding a loop cutset outputs a minimum loop cutset after O(c 6^k kn) steps with probability at least 1 – (1 – 1/(6^k))^c6^k, where c > 1 is a constant specified by the user, k is the minimal size of a minimum weight loop cutset, and n is the number of vertices. We also show empirically that a variant of this algorithm often finds a loop cutset that is closer to the minimum weight loop cutset than the ones found by the best deterministic algorithms known.

ベイジアンネットワークで最小重量のループカットセットを高確率で見つける方法を示します。このようなループカットセットを見つけることは、推論の条件付け法の最初のステップです。ループカットセットを見つけるためのランダム化アルゴリズムは、少なくとも1 – (1 – 1/(6^k))^c6^kの確率でO(c 6^k kn)ステップ後に最小ループカットセットを出力します。ここで、c > 1はユーザーが指定した定数、kは最小重量のループカットセットの最小サイズ、nは頂点の数です。また、このアルゴリズムの変形では、既知の最良の決定論的アルゴリズムで見つかったループカットセットよりも最小重量のループカットセットに近いループカットセットが見つかることが多いことも経験的に示します。

The Complexity of Reasoning with Cardinality Restrictions and Nominals in Expressive Description Logics
表現的記述論理におけるカーディナリティ制約と名詞を用いた推論の複雑性

We study the complexity of the combination of the Description Logics ALCQ and ALCQI with a terminological formalism based on cardinality restrictions on concepts. These combinations can naturally be embedded into C^2, the two variable fragment of predicate logic with counting quantifiers, which yields decidability in NExpTime. We show that this approach leads to an optimal solution for ALCQI, as ALCQI with cardinality restrictions has the same complexity as C^2 (NExpTime-complete). In contrast, we show that for ALCQ, the problem can be solved in ExpTime. This result is obtained by a reduction of reasoning with cardinality restrictions to reasoning with the (in general weaker) terminological formalism of general axioms for ALCQ extended with nominals. Using the same reduction, we show that, for the extension of ALCQI with nominals, reasoning with general axioms is a NExpTime-complete problem. Finally, we sharpen this result and show that pure concept satisfiability for ALCQI with nominals is NExpTime-complete. Without nominals, this problem is known to be PSpace-complete.

記述論理ALCQとALCQIを、概念のカーディナリティ制約に基づく用語論的形式主義と組み合わせた場合の複雑さを調査します。これらの組み合わせは、計数量化子を持つ述語論理の2変数フラグメントであるC^2に自然に埋め込むことができ、NExpTimeで決定可能性が得られます。カーディナリティ制約付きのALCQIはC^2（NExpTime完全）と同じ複雑さを持つため、このアプローチはALCQIの最適解につながることを示す。対照的に、ALCQの場合、問題はExpTimeで解決できることを示す。この結果は、カーディナリティ制約付きの推論を、名詞で拡張されたALCQの一般公理の（一般に弱い）用語論的形式主義による推論に縮減することによって得られます。同じ縮約を用いて、名詞を含むALCQIの拡張において、一般公理を用いた推論がNExpTime完全問題であることを示す。最後に、この結果を明確化し、名詞を含むALCQIの純粋概念充足可能性はNExpTime完全であることを示す。名詞を含まない場合、この問題はPSpace完全であることが知られています。

A Model of Inductive Bias Learning
帰納的バイアス学習モデル

A major problem in machine learning is that of inductive bias: how to choose a learner’s hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.

機械学習における主要な問題は、帰納的バイアスです。これは、学習対象の問題の解を含むのに十分な大きさでありながら、適切なサイズのトレーニングセットから信頼性の高い一般化を保証するのに十分な小ささの、学習者の仮説空間をどのように選択するかという問題です。通常、このようなバイアスは、専門家のスキルと洞察力によって手動で提供されます。本論文では、バイアスを自動的に学習するモデルを検討します。このモデルの中心的な仮定は、学習者が関連する学習タスクの環境に埋め込まれているというものです。このような環境内では、学習者は複数のタスクからサンプルを取得できるため、環境内の多くの問題に対する優れた解を含む仮説空間を探索することができます。学習者が利用可能なすべての仮説空間の集合に一定の制約を課す場合、十分な数のトレーニングタスクで良好なパフォーマンスを示す仮説空間は、同じ環境で新しいタスクを学習する際にも良好なパフォーマンスを示すことを示す。明示的な境界も導出され、関連するタスクの環境内で複数のタスクを学習すると、単一のタスクを学習するよりもはるかに優れた一般化が得られる可能性があることが示されます。

Robust Agent Teams via Socially-Attentive Monitoring
社会的注意モニタリングによるロバストなエージェントチーム

Agents in dynamic multi-agent environments must monitor their peers to execute individual and group plans. A key open question is how much monitoring of other agents’ states is required to be effective: The Monitoring Selectivity Problem. We investigate this question in the context of detecting failures in teams of cooperating agents, via Socially-Attentive Monitoring, which focuses on monitoring for failures in the social relationships between the agents. We empirically and analytically explore a family of socially-attentive teamwork monitoring algorithms in two dynamic, complex, multi-agent domains, under varying conditions of task distribution and uncertainty. We show that a centralized scheme using a complex algorithm trades correctness for completeness and requires monitoring all teammates. In contrast, a simple distributed teamwork monitoring algorithm results in correct and complete detection of teamwork failures, despite relying on limited, uncertain knowledge, and monitoring only key agents in a team. In addition, we report on the design of a socially-attentive monitoring system and demonstrate its generality in monitoring several coordination relationships, diagnosing detected failures, and both on-line and off-line applications.

動的なマルチエージェント環境のエージェントは、個人およびグループの計画を実行するためにピアを監視する必要があります。重要な未解決の問題は、他のエージェントの状態をどの程度監視すれば効果的か、つまり監視選択性問題です。私たちはこの問題を、エージェント間の社会的関係の失敗の監視に焦点を当てたSocially-Attentive Monitoringを介して、協力するエージェントのチームにおける失敗を検出するという文脈で調査します。2つの動的で複雑なマルチエージェント領域で、タスク配分と不確実性のさまざまな条件下で、社会的に注意を払うチームワーク監視アルゴリズムのファミリーを経験的かつ分析的に調査します。複雑なアルゴリズムを使用する集中型のスキームでは、正確さと完全性がトレードオフされ、すべてのチームメイトを監視する必要があることを示します。対照的に、単純な分散型チームワーク監視アルゴリズムは、限られた不確実な知識に依存し、チーム内の主要なエージェントのみを監視しているにもかかわらず、チームワークの失敗を正しく完全に検出します。さらに、社会的注意力を備えたモニタリングシステムの設計について報告し、複数の調整関係の監視、検出された障害の診断、そしてオンラインおよびオフラインの両方のアプリケーションにおけるその汎用性を実証します。

Planning Graph as a (Dynamic) CSP: Exploiting EBL, DDB and other CSP Search Techniques in Graphplan
（動的）CSPとしての計画グラフ：GraphplanにおけるEBL、DDB、その他のCSP検索手法の活用

This paper reviews the connections between Graphplan’s planning-graph and the dynamic constraint satisfaction problem and motivates the need for adapting CSP search techniques to the Graphplan algorithm. It then describes how explanation based learning, dependency directed backtracking, dynamic variable ordering, forward checking, sticky values and random-restart search strategies can be adapted to Graphplan. Empirical results are provided to demonstrate that these augmentations improve Graphplan’s performance significantly (up to 1000x speedups) on several benchmark problems. Special attention is paid to the explanation-based learning and dependency directed backtracking techniques as they are empirically found to be most useful in improving the performance of Graphplan.

本稿では、Graphplanのプランニンググラフと動的制約充足問題との関連性を考察し、CSP探索手法をGraphplanアルゴリズムに適用する必要性を説明します。さらに、説明に基づく学習、依存性に基づくバックトラッキング、動的変数順序付け、フォワードチェック、スティッキー値、ランダムリスタート探索戦略をGraphplanにどのように適用できるかを説明します。これらの拡張機能により、Graphplanのパフォーマンスがいくつかのベンチマーク問題において大幅に向上する（最大1000倍高速化する）ことを実証する実験結果が示されています。特に、Graphplanのパフォーマンス向上に最も効果的であることが実証されている説明ベース学習と依存性指向バックトラッキング手法に注目しています。

関連記事