Journal of Artificial Intelligence Resarch: Volume 15の論文一覧

Journal of Artificial Intelligence Resarch Vol. 15 (2001)に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
Mean Field Methods for a Special Class of Belief Networks
特定のクラスの信念ネットワークに対する平均場法

The chief aim of this paper is to propose mean-field approximations for a broad class of Belief networks, of which sigmoid and noisy-or networks can be seen as special cases. The approximations are based on a powerful mean-field theory suggested by Plefka. We show that Saul, Jaakkola and Jordan’ s approach is the first order approximation in Plefka’s approach, via a variational derivation. The application of Plefka’s theory to belief networks is not computationally tractable. To tackle this problem we propose new approximations based on Taylor series. Small scale experiments show that the proposed schemes are attractive.

本論文の主な目的は、シグモイドネットワークやノイズ入りORネットワークを特別なケースと見なせる、幅広いBeliefネットワークのクラスに対する平均場近似を提案することです。この近似は、Plefkaによって示唆された強力な平均場理論に基づいています。変分微分により、Saul、Jaakkola、およびJordanのアプローチがPlefkaのアプローチにおける一次近似であることを示す。プレフカ理論をビリーフネットワークに適用することは、計算的に扱いにくい。この問題に対処するため、テイラー級数に基づく新しい近似法を提案します。小規模実験により、提案手法が魅力的であることが示されました。

Parameter Learning of Logic Programs for Symbolic-Statistical Modeling
記号統計モデリングのための論理プログラムのパラメータ学習

We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.

パラメータ化論理プログラム（パラメータ化された分布を持つ確率的事実を含む確定節プログラム）の統計的パラメータ学習のための論理的/数学的フレームワークを提案します。これは、論理プログラミングにおける従来の最小エルブランモデル意味論を、HMM、PCFG、ベイジアンネットワークを含む任意の論理プログラムに無条件に適用可能な確率分布を持つ可能世界意味論である分布意味論へと拡張します。また、各決定が排他的かつ独立である逐次決定プロセスを表すパラメータ化論理プログラムのクラスに対して実行される新しいEMアルゴリズム、グラフィカルEMアルゴリズムも提案します。これは、観測とその説明の論理関係を記述するサポートグラフと呼ばれる新しいデータ構造上で実行され、論理プログラムに一般化された内部確率と外部確率を計算することでパラメータを学習します。計算量分析の結果、観測値に対するすべての説明を探索するOLDT法と組み合わせた場合、グラフィカルEMアルゴリズムは、その汎用性にもかかわらず、既存のEMアルゴリズム、すなわちHMMのBaum-Welchアルゴリズム、PCFGのInside-Outsideアルゴリズム、および各研究分野で独立に開発されてきた単結合ベイジアンネットワークのアルゴリズムと同じ計算量を持つことがわかった。中規模の2つのコーパスを用いたPCFGの学習実験では、グラフィカルEMアルゴリズムがInside-Outsideアルゴリズムを大幅に上回る性能を示すことがわかった。

Finding a Path is Harder than Finding a Tree
パスの探索はツリーの探索よりも難しい

I consider the problem of learning an optimal path graphical model from data and show the problem to be NP-hard for the maximum likelihood and minimum description length approaches and a Bayesian approach. This hardness result holds despite the fact that the problem is a restriction of the polynomially solvable problem of finding the optimal tree graphical model.

本稿では、データから最適パスのグラフィカルモデルを学習する問題を考察し、最大尤度法と最小記述長法、そしてベイズ法のいずれにおいても、この問題がNP困難であることを示す。この困難性は、この問題が最適なツリーグラフィカルモデルを求める多項式的に解ける問題の制約であるにもかかわらず成立します。

Experiments with Infinite-Horizon, Policy-Gradient Estimation
無限時間ポリシー勾配推定の実験

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm’s chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.

本論文では、部分観測マルコフ決定過程(POMDP)における平均報酬の勾配上昇法を実行するアルゴリズムを紹介します。これらのアルゴリズムは、関連論文(Baxter & Bartlett、本書)で紹介されたGPOMDPに基づいています。GPOMDPは、POMDPにおける性能勾配の偏りのある推定値を計算し、その性能勾配を推定するアルゴリズムです。このアルゴリズムの主な利点は、バイアスと分散のトレードオフの観点から自然な解釈が可能な自由パラメータβを1つだけ使用すること、基礎状態に関する知識を必要とせず、無限状態空間、制御空間、観測空間に適用できることです。本稿では、GPOMDPによって生成された勾配推定値を用いて、従来の確率勾配アルゴリズムと、勾配情報を用いて直線探索における最大値を括弧で囲む共役勾配に基づくアルゴリズムの両方を用いて、勾配上昇法を実行する方法を示す。実験結果では、(Baxter & Bartlett,本巻)の理論的な結果を玩具問題で示し、より現実的な問題に対するアルゴリズムの実際的な側面も示す。

Infinite-Horizon Policy-Gradient Estimation
無限時間ポリシー勾配推定

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes POMDPs controlled by parameterized stochastic policies. A similar algorithm was proposed by (Kimura et al. 1995). The algorithm’s chief advantages are that it requires storage of only twice the number of policy parameters, uses one free beta (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter beta is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter et al., this volume) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.

強化学習における直接的な方策探索への勾配ベースのアプローチは、部分的観測性の問題を解決し、価値関数法における方策の劣化に関連する問題の一部を回避する手段として、最近大きな注目を集めています。本稿では、パラメータ化された確率的方策によって制御される部分観測マルコフ決定過程(POMDP)の平均報酬の勾配のバイアス推定値を生成するシミュレーションベースのアルゴリズムGPOMDPを紹介します。同様のアルゴリズムがKimuraら(1995)によって提案されています。このアルゴリズムの主な利点は、方策パラメータの2倍の数の記憶のみを必要とし、1つの自由ベータ(バイアスと分散のトレードオフに関して自然な解釈が可能)を使用し、基礎となる状態に関する知識を必要としないことです。本稿ではGPOMDPの収束を証明し、パラメータベータの正しい選択が制御されたPOMDPの混合時間とどのように関係するかを示す。GPOMDPを制御マルコフ連鎖、連続状態、観測空間と制御空間、マルチエージェント、高階微分、そして内部状態を持つ確率的方策を訓練するためのバージョンに拡張した手法について簡単に説明します。関連論文(Baxterら、本書)では、GPOMDPによって生成された勾配推定値を、従来の確率的勾配アルゴリズムと共役勾配法の両方で使用して、平均報酬の局所最適値を見つける方法を示します。

Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic
力のダイナミクスとイベントロジックを用いた視覚知覚に基づく動詞の語彙意味論の基盤構築

This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.

本論文では、短い画像シーケンスにおいて、単純な空間運動動詞で記述されるイベントの発生を認識するための実装システムを紹介します。これらの動詞の意味は、イベントの参加者間の力学関係の状態変化を記述するイベントロジック式で指定されます。液体および半液体のイベントを記述する際に発生する無限の間隔集合に対して、効率的な有限表現が導入されています。さらに、この表現を用いた効率的な手順を提示し、イベントロジック式で記述された複合イベントの発生を、プリミティブイベントの発生から推論します。力のダイナミクスとイベントロジックを用いてイベントの語彙的意味を規定することで、このシステムは従来の動作プロファイルに基づくシステムよりも堅牢になります。

Efficient Methods for Qualitative Spatial Reasoning
質的空間推論のための効率的な手法

The theoretical properties of qualitative spatial reasoning in the RCC8 framework have been analyzed extensively. However, no empirical investigation has been made yet. Our experiments show that the adaption of the algorithms used for qualitative temporal reasoning can solve large RCC8 instances, even if they are in the phase transition region — provided that one uses the maximal tractable subsets of RCC8 that have been identified by us. In particular, we demonstrate that the orthogonal combination of heuristic methods is successful in solving almost all apparently hard instances in the phase transition region up to a certain size in reasonable time.

RCC8フレームワークにおける質的空間推論の理論的特性は、これまで広範囲に分析されてきました。しかし、実証的な調査はまだ行われていません。私たちの実験は、質的時間推論に用いられるアルゴリズムを適応させることで、たとえ相転移領域にあるとしても、大規模なRCC8インスタンスを解くことができることを示しています。ただし、私たちが特定したRCC8の最大限に扱いやすいサブセットを使用することが条件です。特に、ヒューリスティック手法の直交的な組み合わせにより、相転移領域にある一見困難なインスタンスのほとんどすべてを、ある程度の大きさまで妥当な時間で解くことができることを実証します。

Computational Approach to Anaphora Resolution in Spanish Dialogues
スペイン語対話における照応解決への計算論的アプローチ

This paper presents an algorithm for identifying noun-phrase antecedents of pronouns and adjectival anaphors in Spanish dialogues. We believe that anaphora resolution requires numerous sources of information in order to find the correct antecedent of the anaphor. These sources can be of different kinds, e.g., linguistic information, discourse/dialogue structure information, or topic information. For this reason, our algorithm uses various different kinds of information (hybrid information). The algorithm is based on linguistic constraints and preferences and uses an anaphoric accessibility space within which the algorithm finds the noun phrase. We present some experiments related to this algorithm and this space using a corpus of 204 dialogues. The algorithm is implemented in Prolog. According to this study, 95.9% of antecedents were located in the proposed space, a precision of 81.3% was obtained for pronominal anaphora resolution, and 81.5% for adjectival anaphora.

本論文では、スペイン語の対話における代名詞および形容詞の名詞句先行詞を識別するアルゴリズムを提示します。我々は、アナフォラ解決には、正しい先行詞を見つけるために多数の情報源が必要であると考えています。これらの情報源は、言語情報、談話／対話構造情報、話題情報など、様々な種類があり得る。そのため、本アルゴリズムでは、様々な種類の情報（ハイブリッド情報）を使用します。このアルゴリズムは、言語的制約と好みに基づいており、名詞句を見つけるためのアナフォリックアクセシビリティ空間を使用します。本稿では、204の対話コーパスを用いて、このアルゴリズムとこの空間に関するいくつかの実験を紹介します。このアルゴリズムはPrologで実装されています。この研究によると、提案された空間内に先行詞の95.9%が位置し、代名詞の照応解決では81.3%、形容詞の照応解決では81.5%の精度が得られました。

Planning by Rewriting
書き換えによるプランニング

Domain-independent planning is a hard combinatorial problem. Taking into account plan quality makes the task even more difficult. This article introduces Planning by Rewriting (PbR), a new paradigm for efficient high-quality domain-independent planning. PbR exploits declarative plan-rewriting rules and efficient local search techniques to transform an easy-to-generate, but possibly suboptimal, initial plan into a high-quality plan. In addition to addressing the issues of planning efficiency and plan quality, this framework offers a new anytime planning algorithm. We have implemented this planner and applied it to several existing domains. The experimental results show that the PbR approach provides significant savings in planning effort while generating high-quality plans.

ドメイン非依存プランニングは難しい組み合わせ問題です。プランの品質を考慮すると、作業はさらに困難になります。本稿では、効率的で高品質なドメイン非依存プランニングのための新しいパラダイムである、Planning by Rewriting (PbR)を紹介します。PbRは、宣言的なプラン書き換えルールと効率的な局所探索技術を活用して、生成は容易だが最適ではない可能性のある初期プランを高品質なプランに変換します。プランニングの効率性とプランの品質という問題に対処するだけでなく、このフレームワークは新しいいつでもプランニング可能なアルゴリズムを提供します。私たちはこのプランナーを実装し、既存の複数のドメインに適用しました。実験結果から、PbRアプローチはプランニングの労力を大幅に削減しながら、高品質なプランを生成することが示されました。

ATTac-2000: An Adaptive Autonomous Bidding Agent
ATTac-2000：適応型自律入札エージェント

The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding strategy that includes several elements of adaptivity. In addition to the success at the competition, isolated empirical results are presented indicating the robustness and effectiveness of ATTac-2000’s adaptive strategy.

第1回トレーディングエージェントコンペティション（TAC）は、2000年6月22日から7月8日まで開催されました。TACは、電子マーケットプレイスという複雑な領域におけるベンチマーク問題を作成し、研究者が共通の課題に独自のアプローチを適用するよう促すために設計されました。本論文では、TACで優勝したATTac-2000について説明します。ATTac-2000は、適応性の要素を複数含む、原理に基づいた入札戦略を採用しています。コンペティションでの成功に加え、ATTac-2000の適応戦略の堅牢性と有効性を示す個別の実証結果も示します。

An Analysis of Reduced Error Pruning
エラー削減枝刈りの分析

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration to be replaced by a leaf node, one in which the class label and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the pruning probability of a node fitting pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly to the tree. This assumption lets us approximate the number of subtrees that are pruned because they do not receive any pruning examples. This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight to its algorithmic properties, analyses the algorithm with less imposed assumptions than before, and includes the previously overlooked empty subtrees to the analysis.

決定木のトップダウン誘導法は、枝刈り段階の機能不全に悩まされていることが観察されています。特に、得られる木のサイズはサンプルサイズに比例して増加するが、木の精度は向上しないことが知られています。Reduced Error Pruningは、決定木学習の問題を説明する際に代表的な手法として用いられてきたアルゴリズムです。本稿では、3つの異なる設定におけるReduced Error Pruningの分析を示す。まず、入力決定木や枝刈り例とは独立して成立する、この手法の基本的なアルゴリズム特性を検討します。次に、枝刈り例のクラスラベルと属性値が互いに独立している葉ノードに、直観的に置換されるべきであろう状況を検討します。この分析は、2つの異なる仮定の下で行われます。一般的な分析では、純粋なノイズに適合するノードの枝刈り確率は、木のサイズが大きくなるにつれて指数関数的に減少する関数によって制限されることが示されます。具体的な分析では、例がツリーに均一に分布していると仮定します。この仮定により、枝刈り例が与えられないために枝刈りされるサブツリーの数を概算できます。本論文では、Reduced Error Pruningアルゴリズムの様々なバリエーションを明らかにし、そのアルゴリズム特性への新たな知見をもたらし、従来よりも少ない仮定の下でアルゴリズムを分析し、これまで見落とされていた空のサブツリーを分析に含めます。

The GRT Planning System: Backward Heuristic Construction in Forward State-Space Planning
GRTプランニングシステム：順方向状態空間プランニングにおける逆方向ヒューリスティック構築

This paper presents GRT, a domain-independent heuristic planning system for STRIPS worlds. GRT solves problems in two phases. In the pre-processing phase, it estimates the distance between each fact and the goals of the problem, in a backward direction. Then, in the search phase, these estimates are used in order to further estimate the distance between each intermediate state and the goals, guiding so the search process in a forward direction and on a best-first basis. The paper presents the benefits from the adoption of opposite directions between the preprocessing and the search phases, discusses some difficulties that arise in the pre-processing phase and introduces techniques to cope with them. Moreover, it presents several methods of improving the efficiency of the heuristic, by enriching the representation and by reducing the size of the problem. Finally, a method of overcoming local optimal states, based on domain axioms, is proposed. According to it, difficult problems are decomposed into easier sub-problems that have to be solved sequentially. The performance results from various domains, including those of the recent planning competitions, show that GRT is among the fastest planners.

本稿では、STRIPS世界用のドメイン非依存のヒューリスティックプランニングシステムであるGRTについて説明します。GRTは2つのフェーズで問題を解決します。前処理フェーズでは、各事実と問題の目標との間の距離を後方に推定します。次に、探索フェーズでは、これらの推定値を用いて、各中間状態と目標との間の距離をさらに推定し、探索プロセスを前方方向かつ最良優先で進めます。本論文では、前処理フェーズと探索フェーズで逆方向を採用することの利点を示し、前処理フェーズで生じるいくつかの困難について議論し、それらに対処するための手法を紹介します。さらに、表現を豊かにし、問題の規模を縮小することで、ヒューリスティックの効率を向上させるいくつかの手法を提示します。最後に、ドメイン公理に基づく局所最適状態を克服する方法を提案します。この方法によれば、困難な問題は、順次解決すべきより簡単な部分問題に分解されます。最近のプランニングコンペティションを含む様々なドメインにおけるパフォーマンス結果は、GRTが最も高速なプランナーの1つであることを示しています。

Goal Recognition through Goal Graph Analysis
ゴールグラフ分析によるゴール認識

We present a novel approach to goal recognition based on a two-stage paradigm of graph construction and analysis. First, a graph structure called a Goal Graph is constructed to represent the observed actions, the state of the world, and the achieved goals as well as various connections between these nodes at consecutive time steps. Then, the Goal Graph is analysed at each time step to recognise those partially or fully achieved goals that are consistent with the actions observed so far. The Goal Graph analysis also reveals valid plans for the recognised goals or part of these goals. Our approach to goal recognition does not need a plan library. It does not suffer from the problems in the acquisition and hand-coding of large plan libraries, neither does it have the problems in searching the plan space of exponential size. We describe two algorithms for Goal Graph construction and analysis in this paradigm. These algorithms are both provably sound, polynomial-time, and polynomial-space. The number of goals recognised by our algorithms is usually very small after a sequence of observed actions has been processed. Thus the sequence of observed actions is well explained by the recognised goals with little ambiguity. We have evaluated these algorithms in the UNIX domain, in which excellent performance has been achieved in terms of accuracy, efficiency, and scalability.

我々は、グラフ構築と分析という2段階パラダイムに基づく、目標認識への新しいアプローチを提示します。まず、観測された行動、世界の状態、達成された目標、そして連続する時間ステップにおけるこれらのノード間の様々な接続を表すために、目標グラフと呼ばれるグラフ構造を構築します。次に、各時間ステップで目標グラフを分析し、それまでに観測された行動と一致する、部分的にまたは完全に達成された目標を認識します。ゴールグラフ解析では、認識されたゴール全体、あるいはその一部に対する有効なプランも明らかになります。ゴール認識に対する我々のアプローチは、プランライブラリを必要としない。大規模なプランライブラリの取得や手作業によるコーディングの問題に悩まされることもなく、指数関数的なサイズのプラン空間の検索にも問題がない。このパラダイムにおいて、ゴールグラフの構築と解析のための2つのアルゴリズムについて説明します。これらのアルゴリズムは、いずれも証明可能に健全であり、多項式時間、多項式空間で実行されます。我々のアルゴリズムによって認識されるゴールの数は、観測された一連のアクションが処理された後では、通常非常に少なくなります。したがって、観測された一連のアクションは、認識されたゴールによって曖昧さをほとんど伴わずにうまく説明されます。我々はこれらのアルゴリズムをUNIXドメインで評価し、精度、効率、スケーラビリティの点で優れたパフォーマンスが達成されています。

関連記事