Journal of Artificial Intelligence Resarch: Volume 25の論文一覧

Journal of Artificial Intelligence Resarch Vol. 25 (2006)に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。

1 論文
2 参考文献
3 関連情報

論文

Engineering a Conformant Probabilistic Planner

Engineering a Conformant Probabilistic Planner / 適合型確率プランナーの設計

We present a partial-order, conformant, probabilistic planner, Probapop which competed in the blind track of the Probabilistic Planning Competition in IPC-4. We explain how we adapt distance based heuristics for use with probabilistic domains. Probapop also incorporates heuristics based on probability of success. We explain the successes and difficulties encountered during the design and implementation of Probapop.

IPC-4の確率的プランニングコンペティションのブラインドトラックに出場した、半順序型で適合性のある確率的プランナーProbapopを紹介します。確率的領域での使用にあたり、距離ベースのヒューリスティックスをどのように適応させたかを説明します。Probapopは成功確率に基づくヒューリスティックスも組み込んでいます。Probapopの設計と実装において得られた成果と、その過程で遭遇した困難についても説明します。

Learning in Real-Time Search: A Unifying Framework

Learning in Real-Time Search: A Unifying Framework / リアルタイム探索における学習：統合フレームワーク

Real-time search methods are suited for tasks in which the agent is interacting with an initially unknown environment in real time. In such simultaneous planning and learning problems, the agent has to select its actions in a limited amount of time, while sensing only a local part of the environment centered at the agent’s current location. Real-time heuristic search agents select actions using a limited lookahead search and evaluating the frontier states with a heuristic function. Over repeated experiences, they refine heuristic values of states to avoid infinite loops and to converge to better solutions. The wide spread of such settings in autonomous software and hardware agents has led to an explosion of real-time search algorithms over the last two decades. Not only is a potential user confronted with a hodgepodge of algorithms, but he also faces the choice of control parameters they use. In this paper we address both problems. The first contribution is an introduction of a simple three-parameter framework (named LRTS) which extracts the core ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*, SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they are unified and extended with additional features. Second, we prove completeness and convergence of any algorithm covered by the LRTS framework. Third, we prove several upper-bounds relating the control parameters and solution quality. Finally, we analyze the influence of the three control parameters empirically in the realistic scalable domains of real-time navigation on initially unknown maps from a commercial role-playing game as well as routing in ad hoc sensor networks.

リアルタイム探索手法は、エージェントが当初未知の環境とリアルタイムで相互作用するタスクに適しています。このような同時計画・学習問題では、エージェントは限られた時間内で行動を選択し、エージェントの現在位置を中心とした環境の局所的な部分のみを感知する必要があります。リアルタイムヒューリスティック探索エージェントは、限定的な先読み探索とヒューリスティック関数を用いたフロンティア状態の評価を用いて行動を選択します。エージェントは、繰り返しの経験を通して状態のヒューリスティック値を洗練させ、無限ループを回避し、より良い解へと収束していきます。自律ソフトウェアエージェントおよびハードウェアエージェントにおけるこのような設定の普及により、過去20年間でリアルタイム探索アルゴリズムが爆発的に増加しました。潜在的なユーザーは、寄せ集めのアルゴリズムに直面するだけでなく、それらが使用する制御パラメータの選択にも直面します。本稿では、この両方の問題に取り組みます。最初の貢献は、多くの既存アルゴリズムの背後にある中核的なアイデアを抽出する、シンプルな3パラメータフレームワーク（LRTS）の紹介です。次に、LRTA*、epsilon-LRTA*、SLA*、およびgamma-Trapアルゴリズムが本フレームワークの特殊なケースであることを証明します。したがって、これらのアルゴリズムは統合され、追加機能によって拡張されています。次に、LRTSフレームワークでカバーされるすべてのアルゴリズムの完全性と収束性を証明します。3つ目に、制御パラメータと解の品質に関連するいくつかの上限を証明します。最後に、商用ロールプレイングゲームの初期未知のマップでのリアルタイムナビゲーション、およびアドホックセンサーネットワークでのルーティングという現実的でスケーラブルな領域において、3つの制御パラメータの影響を経験的に分析します。

Dynamic Local Search for the Maximum Clique Problem

Dynamic Local Search for the Maximum Clique Problem / 最大クリーク問題のための動的局所探索

In this paper, we introduce DLS-MC, a new stochastic local search algorithm for the maximum clique problem. DLS-MC alternates between phases of iterative improvement, during which suitable vertices are added to the current clique, and plateau search, during which vertices of the current clique are swapped with vertices not contained in the current clique. The selection of vertices is solely based on vertex penalties that are dynamically adjusted during the search, and a perturbation mechanism is used to overcome search stagnation. The behaviour of DLS-MC is controlled by a single parameter, penalty delay, which controls the frequency at which vertex penalties are reduced. We show empirically that DLS-MC achieves substantial performance improvements over state-of-the-art algorithms for the maximum clique problem over a large range of the commonly used DIMACS benchmark instances.

本稿では、最大クリーク問題に対する新しい確率的局所探索アルゴリズムであるDLS-MCを紹介します。DLS-MCは、適切な頂点を現在のクリークに追加する反復改善フェーズと、現在のクリークの頂点を現在のクリークに含まれない頂点と交換するプラトー探索フェーズを交互に繰り返します。頂点の選択は、探索中に動的に調整される頂点ペナルティのみに基づいて行われ、探索の停滞を克服するために摂動メカニズムが使用されます。DLS-MCの挙動は、頂点ペナルティが低減される頻度を制御するペナルティ遅延という単一のパラメータによって制御されます。我々は、DLS-MCが、一般的に使用されるDIMACSベンチマークインスタンスの広範な範囲において、最大クリーク問題に対する最先端のアルゴリズムよりも大幅な性能向上を達成することを経験的に示しています。

Decision-Theoretic Planning with non-Markovian Rewards

Decision-Theoretic Planning with non-Markovian Rewards / 非マルコフ報酬を用いた決定理論的プランニング

A decision process in which rewards depend on history rather than merely on the current state is called a decision process with non-Markovian rewards (NMRDP). In decision-theoretic planning, where many desirable behaviours are more naturally expressed as properties of execution sequences rather than as properties of states, NMRDPs form a more natural model than the commonly adopted fully Markovian decision process (MDP) model. While the more tractable solution methods developed for MDPs do not directly apply in the presence of non-Markovian rewards, a number of solution methods for NMRDPs have been proposed in the literature. These all exploit a compact specification of the non-Markovian reward function in temporal logic, to automatically translate the NMRDP into an equivalent MDP which is solved using efficient MDP solution methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process Planner), a software platform for the development and experimentation of methods for decision-theoretic planning with non-Markovian rewards. The current version of NMRDPP implements, under a single interface, a family of methods based on existing as well as new approaches which we describe in detail. These include dynamic programming, heuristic search, and structured methods. Using NMRDPP, we compare the methods and identify certain problem features that affect their performance. NMRDPP’s treatment of non-Markovian rewards is inspired by the treatment of domain-specific search control knowledge in the TLPlan planner, which it incorporates as a special case. In the First International Probabilistic Planning Competition, NMRDPP was able to compete and perform well in both the domain-independent and hand-coded tracks, using search control knowledge in the latter.

報酬が現在の状態だけでなく履歴にも依存する意思決定プロセスは、非マルコフ報酬を伴う意思決定プロセス（NMRDP）と呼ばれます。意思決定理論的計画では、多くの望ましい動作が状態の特性ではなく実行シーケンスの特性としてより自然に表現されるため、NMRDPは、一般的に採用されている完全マルコフ意思決定プロセス（MDP）モデルよりも自然なモデルを形成します。MDP用に開発された、より扱いやすい解決法は、非マルコフ報酬がある場合には直接適用できませんが、文献ではNMRDP用の解決法が数多く提案されています。これらはすべて、時相論理における非マルコフ報酬関数のコンパクトな仕様を利用して、NMRDPを、効率的なMDP解決法を使用して解決される同等のMDPに自動的に変換します。この論文では、非マルコフ報酬を使用した意思決定理論的計画方法の開発と実験のためのソフトウェアプラットフォームであるNMRDPP (非マルコフ報酬決定プロセスプランナー)を紹介します。NMRDPPの現在のバージョンでは、単一のインターフェイスで、ここで詳細に説明する既存および新しいアプローチに基づく一連の手法を実装しています。これらには、動的計画法、ヒューリスティック検索、構造化手法が含まれます。NMRDPPを使用して、これらの手法を比較し、パフォーマンスに影響を与える特定の問題の特徴を特定します。NMRDPPの非マルコフ報酬の扱いは、TLPlanプランナーにおけるドメイン固有の探索制御知識の扱いに着想を得ており、特別なケースとして組み込まれています。第1回国際確率プランニングコンペティションにおいて、NMRDPPはドメイン独立型トラックとハンドコーディング型トラックの両方で、後者の探索制御知識を用いて、優れた成績を収めました。

An Approach to Temporal Planning and Scheduling in Domains with Predictable Exogenous Events

An Approach to Temporal Planning and Scheduling in Domains with Predictable Exogenous Events / 予測可能な外生イベントを伴う領域における時間的プランニングとスケジューリングへのアプローチ

The treatment of exogenous events in planning is practically important in many real-world domains where the preconditions of certain plan actions are affected by such events. In this paper we focus on planning in temporal domains with exogenous events that happen at known times, imposing the constraint that certain actions in the plan must be executed during some predefined time windows. When actions have durations, handling such temporal constraints adds an extra difficulty to planning. We propose an approach to planning in these domains which integrates constraint-based temporal reasoning into a graph-based planning framework using local search. Our techniques are implemented in a planner that took part in the 4th International Planning Competition (IPC-4). A statistical analysis of the results of IPC-4 demonstrates the effectiveness of our approach in terms of both CPU-time and plan quality. Additional experiments show the good performance of the temporal reasoning techniques integrated into our planner.

計画における外生的イベントの扱いは、多くの現実世界の領域において実際上重要です。そのような領域では、特定の計画アクションの前提条件がそのようなイベントによって影響を受けます。本稿では、既知の時間に発生する外生的イベントを伴う時間領域での計画に焦点を当て、計画内の特定のアクションは事前に定義された時間ウィンドウ内に実行されなければならないという制約を課します。アクションに継続時間がある場合、そのような時間的制約の処理は計画にさらなる困難をもたらします。本稿では、制約ベースの時間的推論を、局所探索を用いたグラフベースの計画フレームワークに統合する、これらの領域での計画手法を提案します。本手法は、第4回国際計画コンペティション(IPC-4)に参加したプランナーに実装されています。IPC-4の結果の統計的分析により、CPU時間と計画品質の両方の観点から本手法の有効性が実証されています。追加の実験により、時間的推論手法をプランナーに統合することで優れたパフォーマンスが得られることが示されています。

Improving Heuristics Through Relaxed Search – An Analysis of TP4 and HSP*a in the 2004 Planning Competition

Improving Heuristics Through Relaxed Search – An Analysis of TP4 and HSP*a in the 2004 Planning Competition / 緩和探索によるヒューリスティックの改善 – 2004年プランニングコンペティションにおけるTP4とHSP*aの分析

The hm admissible heuristics for (sequential and temporal) regression planning are defined by a parameterized relaxation of the optimal cost function in the regression search space, where the parameter m offers a trade-off between the accuracy and computational cost of theheuristic. Existing methods for computing the hm heuristic require time exponential in m, limiting them to small values (m <= 2). The hm heuristic can also be viewed as the optimal cost function in a relaxation of the search space: this paper presents relaxed search, a method for computing this function partially by searching in the relaxed space. The relaxed search method, because it computes hm only partially, is computationally cheaper and therefore usable for higher values of m. The (complete) hm heuristic is combined with partial hm heuristics, for m = 3,..., computed by relaxed search, resulting in a more accurate heuristic.This use of the relaxed search method to improve on the hm heuristic is evaluated by comparing two optimal temporal planners: TP4, which does not use it, and HSP*a, which uses it but is otherwise identical to TP4. The comparison is made on the domains used in the 2004 International Planning Competition, in which both planners participated. Relaxed search is found to be cost effective in some of these domains, but not all. Analysis reveals a characterization of the domains in which relaxed search can be expected to be cost effective, in terms of two measures on the original and relaxed search spaces. In the domains where relaxed search is cost effective, expanding small states is computationally cheaper than expanding large states and small states tend to have small successor states.

（シーケンシャルおよびテンポラル）回帰計画におけるhm許容ヒューリスティックは、回帰探索空間における最適コスト関数のパラメータ化された緩和によって定義されます。ここで、パラメータmはヒューリスティックの精度と計算コストのトレードオフとなります。hmヒューリスティックを計算する既存の手法は、mの指数関数の時間を必要とするため、小さな値（m <= 2）に限定されます。hmヒューリスティックは、探索空間の緩和における最適コスト関数と見なすこともできます。本稿では、緩和空間内で探索を行うことでこの関数を部分的に計算する手法である緩和探索を紹介します。緩和探索法はhmを部分的にしか計算しないため、計算コストが低く、したがってmの値が大きい場合にも使用できます。(完全な) hmヒューリスティックは、緩和検索で計算されるm = 3、...の部分hmヒューリスティックと組み合わされ、より正確なヒューリスティックになります。この緩和検索法を使用したhmヒューリスティックの改良は、それを使用しないTP4と、それを使用する以外はTP4と同一のHSP*aという2つの最適時間プランナーを比較することによって評価されます。この比較は、両方のプランナーが参加した2004年の国際計画コンペティションで使用されたドメインで行われます。緩和検索は、これらのドメインの一部では費用対効果が高いことがわかりましたが、すべてではありません。分析により、元の検索空間と緩和検索空間に関する2つの尺度で、緩和検索が費用対効果が高いと期待できるドメインの特性が明らかになりました。緩和検索が費用対効果が高いドメインでは、小さな状態を拡張する方が大きな状態を拡張するよりも計算コストが安く、小さな状態は小さな後継状態を持つ傾向があります。

Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web

Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web / ピアツーピア環境における分散推論：セマンティックウェブへの応用

In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The main contribution of this paper is to provide the first consequence finding algorithm in a peer-to-peer setting: DeCA. It is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Another important contribution is to apply this general distributed reasoning setting to the setting of the Semantic Web through the Somewhere semantic peer-to-peer data management system. The last contribution of this paper is to provide an experimental analysis of the scalability of the peer-to-peer infrastructure that we propose, on large networks of 1000 peers.

ピアツーピア推論システムでは、各ピアは局所的に推論できるだけでなく、一部の語彙を共有する知り合いピアを勧誘することもできます。この論文では、各ピアのローカル理論がローカル語彙に基づいて定義された命題節の集合であるピアツーピア推論システムについて考察します。ピアツーピア推論システムの重要な特徴は、グローバル理論（すべてのピア理論の和集合）が不明なことです（パーティションベースの推論システムとは対照的です）。この論文の主な貢献は、ピアツーピア設定で最初の結果発見アルゴリズムであるDeCAを提供することです。これはいつでも実行でき、勧誘されたピアから徐々に遠くのピアへと結果を計算します。このアルゴリズムの完全性を保証するために、ピアツーピア推論システムの知り合いグラフに関する十分な条件を示します。もう一つの重要な貢献は、Somewhereセマンティックピアツーピアデータ管理システムを通じて、この一般的な分散推論環境をセマンティックウェブ環境に適用することです。本論文の最後の貢献は、1000ピアの大規模ネットワーク上で、提案するピアツーピアインフラストラクチャのスケーラビリティを実験的に分析することです。

Negotiating Socially Optimal Allocations of Resources

Negotiating Socially Optimal Allocations of Resources / 社会的に最適な資源配分の交渉

A multiagent system may be thought of as an artificial society of autonomous software agents and we can apply concepts borrowed from welfare economics and social choice theory to assess the social welfare of such an agent society. In this paper, we study an abstract negotiation framework where agents can agree on multilateral deals to exchange bundles of indivisible resources. We then analyse how these deals affect social welfare for different instances of the basic framework and different interpretations of the concept of social welfare itself. In particular, we show how certain classes of deals are both sufficient and necessary to guarantee that a socially optimal allocation of resources will be reached eventually.

マルチエージェントシステムは、自律ソフトウェアエージェントの人工社会と考えることができ、厚生経済学と社会選択理論から借用した概念を適用することで、このようなエージェント社会の社会福祉を評価することができます。本論文では、エージェントが不可分な資源の束を交換するための多国間取引に合意できる抽象的な交渉フレームワークを研究します。次に、基本フレームワークの異なるインスタンスと社会福祉の概念自体の異なる解釈において、これらの取引が社会福祉にどのように影響するかを分析します。特に、特定の種類の取引が、最終的に社会的に最適な資源配分に到達することを保証するために十分かつ必要であることを示します。

Representing Conversations for Scalable Overhearing

Representing Conversations for Scalable Overhearing / スケーラブルな立ち聞きのための会話表現

Open distributed multi-agent systems are gaining interest in the academic community and in industry. In such open settings, agents are often coordinated using standardized agent conversation protocols. The representation of such protocols (for analysis, validation, monitoring, etc) is an important aspect of multi-agent applications. Recently, Petri nets have been shown to be an interesting approach to such representation, and radically different approaches using Petri nets have been proposed. However, their relative strengths and weaknesses have not been examined. Moreover, their scalability and suitability for different tasks have not been addressed. This paper addresses both these challenges. First, we analyze existing Petri net representations in terms of their scalability and appropriateness for overhearing, an important task in monitoring open multi-agent systems. Then, building on the insights gained, we introduce a novel representation using Colored Petri nets that explicitly represent legal joint conversation states and messages. This representation approach offers significant improvements in scalability and is particularly suitable for overhearing. Furthermore, we show that this new representation offers a comprehensive coverage of all conversation features of FIPA conversation standards. We also present a procedure for transforming AUML conversation protocol diagrams (a standard human-readable representation), to our Colored Petri net representation.

オープン分散マルチエージェントシステムは、学術界と産業界で関心を集めています。このようなオープンな設定では、エージェントは標準化されたエージェント会話プロトコルを用いて調整されることが多い。このようなプロトコル（分析、検証、監視など）の表現は、マルチエージェントアプリケーションの重要な側面です。近年、ペトリネットはそのような表現に対する興味深いアプローチであることが示され、ペトリネットを用いた根本的に異なるアプローチが提案されています。しかし、それらの相対的な長所と短所は検証されていない。さらに、スケーラビリティや様々なタスクへの適合性についても検討されていない。本稿では、これらの課題の両方に対処します。まず、既存のペトリネット表現を、スケーラビリティと、オープンマルチエージェントシステムの監視における重要なタスクである傍受への適合性の観点から分析します。次に、得られた知見に基づき、合法的な共同会話状態とメッセージを明示的に表現するカラーペトリネットを用いた新しい表現を導入します。この表現アプローチはスケーラビリティを大幅に向上させ、特に傍受に適しています。さらに、この新しい表現がFIPA会話標準のすべての会話機能を包括的にカバーすることを示す。また、AUML会話プロトコル図(標準的な人間が判読できる表現)をカラーペトリネット表現に変換する手順も紹介します。

On Graphical Modeling of Preference and Importance

On Graphical Modeling of Preference and Importance / 選好と重要度のグラフィカルモデリングについて

In recent years, CP-nets have emerged as a useful tool for supporting preference elicitation, reasoning, and representation. CP-nets capture and support reasoning with qualitative conditional preference statements, statements that are relatively natural for users to express. In this paper, we extend the CP-nets formalism to handle another class of very natural qualitative statements one often uses in expressing preferences in daily life – statements of relative importance of attributes. The resulting formalism, TCP-nets, maintains the spirit of CP-nets, in that it remains focused on using only simple and natural preference statements, uses the ceteris paribus semantics, and utilizes a graphical representation of this information to reason about its consistency and to perform, possibly constrained, optimization using it. The extra expressiveness it provides allows us to better model tradeoffs users would like to make, more faithfully representing their preferences.

近年、CPネットは選好の引き出し、推論、および表現を支援するための有用なツールとして登場しました。CPネットは、ユーザーが比較的自然に表現できる質的な条件付き選好ステートメントによる推論を捕捉し、サポートします。本稿では、CPネットの形式主義を拡張し、日常生活で選好を表現する際によく使用される、属性の相対的な重要度のステートメントという、非常に自然な質的ステートメントの別のクラスを処理します。結果として得られる形式主義であるTCPネットは、単純で自然な選好ステートメントのみを使用することに重点を置き、ceteris paribusセマンティクスを使用し、この情報のグラフィカル表現を使用してその一貫性について推論し、場合によっては制約付きの最適化を実行するという点で、CPネットの精神を維持しています。それが提供する追加の表現力により、ユーザーが行いたいトレードオフをより適切にモデル化し、ユーザーの選好をより忠実に表現することができます。

Logical Hidden Markov Models

Logical Hidden Markov Models / 論理隠れマルコフモデル

Logical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters.This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evaluation, most likely hidden state sequence and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics.

論理隠れマルコフモデル（LOHMM）は、従来の隠れマルコフモデルを改良し、構造化されたシンボルのシーケンスを、平坦な文字ではなく論理アトムの形で処理します。本稿では、LOHMMを正式に紹介し、LOHMMの3つの主要な推論問題（評価、最も尤度の高い隠れ状態シーケンス、パラメータ推定）に対する解決策を提示します。得られた表現とアルゴリズムは、バイオインフォマティクス分野の問題を用いて実験的に評価されます。

A Continuation Method for Nash Equilibria in Structured Games

A Continuation Method for Nash Equilibria in Structured Games / 構造化ゲームにおけるナッシュ均衡の継続法

Structured game representations have recently attracted interest as models for multi-agent artificial intelligence scenarios, with rational behavior most commonly characterized by Nash equilibria. This paper presents efficient, exact algorithms for computing Nash equilibria in structured game representations, including both graphical games and multi-agent influence diagrams (MAIDs). The algorithms are derived from a continuation method for normal-form and extensive-form games due to Govindan and Wilson; they follow a trajectory through a space of perturbed games and their equilibria, exploiting game structure through fast computation of the Jacobian of the payoff function. They are theoretically guaranteed to find at least one equilibrium of the game, and may find more. Our approach provides the first efficient algorithm for computing exact equilibria in graphical games with arbitrary topology, and the first algorithm to exploit fine-grained structural properties of MAIDs. Experimental results are presented demonstrating the effectiveness of the algorithms and comparing them to predecessors. The running time of the graphical game algorithm is similar to, and often better than, the running time of previous approximate algorithms. The algorithm for MAIDs can effectively solve games that are much larger than those solvable by previous methods.

構造化ゲーム表現は、マルチエージェント人工知能シナリオのモデルとして近年注目を集めており、合理的行動は一般的にナッシュ均衡によって特徴付けられます。本稿では、グラフィカルゲームとマルチエージェント影響ダイアグラム（MAID）の両方を含む構造化ゲーム表現において、ナッシュ均衡を計算するための効率的かつ正確なアルゴリズムを提示します。これらのアルゴリズムは、GovindanとWilsonによる正規形および展開形ゲームのための継続法から派生したものです。これらのアルゴリズムは、摂動を受けたゲームとその均衡の空間を通る軌跡を辿り、利得関数のヤコビアンを高速に計算することでゲーム構造を活用します。これらのアルゴリズムは、理論上、ゲームの均衡点を少なくとも1つ見つけることが保証されており、さらに複数の均衡点を見つける可能性もあります。本アプローチは、任意の位相を持つグラフィカルゲームにおける正確な均衡点を計算するための初の効率的なアルゴリズムであり、MAIDの細粒度構造特性を活用する初のアルゴリズムでもあります。本アルゴリズムの有効性を実証し、先行アルゴリズムと比較した実験結果も提示します。このグラフィカルゲームアルゴリズムの実行時間は、従来の近似アルゴリズムと同等であり、場合によってはそれよりも高速です。MAIDアルゴリズムは、従来の手法で解けるゲームよりもはるかに大規模なゲームを効果的に解くことができます。

Fault Tolerant Boolean Satisfiability

Fault Tolerant Boolean Satisfiability / フォールトトレラントなブール充足可能性

A delta-model is a satisfying assignment of a Boolean formula for which any small alteration, such as a single bit flip, can be repaired by flips to some small number of other bits, yielding a new satisfying assignment. These satisfying assignments represent robust solutions to optimization problems (e.g., scheduling) where it is possible to recover from unforeseen events (e.g., a resource becoming unavailable). The concept of delta-models was introduced by Ginsberg, Parkes and Roy (AAAI 1998) , where it was proved that finding delta-models for general Boolean formulas is NP-complete. In this paper, we extend that result by studying the complexity of finding delta-models for classes of Boolean formulas which are known to have polynomial time satisfiability solvers. In particular, we examine 2-SAT, Horn-SAT, Affine-SAT, dual-Horn-SAT, 0-valid and 1-valid SAT. We see a wide variation in the complexity of finding delta-models, e.g., while 2-SAT and Affine-SAT have polynomial time tests for delta-models, testing whether a Horn-SAT formula has one is NP-complete.

デルタモデルとは、ブール式の満足できる割り当てであり、1ビットの反転などの小さな変更は、他の少数のビットの反転によって修復でき、新しい満足できる割り当てが得られます。これらの満足できる割り当ては、予期しないイベント（リソースが利用できなくなるなど）から回復できる最適化問題（スケジューリングなど）に対する堅牢なソリューションとなります。デルタモデルの概念は、Ginsberg、Parkes、Roy（AAAI 1998）によって導入され、一般的なブール式のデルタモデルを見つけることはNP完全であることが証明されました。本稿では、その結果を拡張し、多項式時間の満足可能性ソルバーを持つことが知られているブール式のクラスについてデルタモデルを見つける複雑さを検討します。特に、2-SAT、Horn-SAT、Affine-SAT、dual-Horn-SAT、0-valid、および1-valid SATを検討します。デルタモデルを見つける複雑さには大きなばらつきがあります。例えば、2-SATとAffine-SATはデルタモデルに対して多項式時間テストを実行できますが、Horn-SAT式にデルタモデルが存在するかどうかをテストすることはNP完全です。

Asynchronous Partial Overlay: A New Algorithm for Solving Distributed Constraint Satisfaction Problems

Asynchronous Partial Overlay: A New Algorithm for Solving Distributed Constraint Satisfaction Problems / 非同期部分オーバーレイ：分散制約充足問題を解くための新しいアルゴリズム

Distributed Constraint Satisfaction (DCSP) has long been considered an important problem in multi-agent systems research. This is because many real-world problems can be represented as constraint satisfaction and these problems often present themselves in a distributed form. In this article, we present a new complete, distributed algorithm called Asynchronous Partial Overlay (APO) for solving DCSPs that is based on a cooperative mediation process. The primary ideas behind this algorithm are that agents, when acting as a mediator, centralize small, relevant portions of the DCSP, that these centralized subproblems overlap, and that agents increase the size of their subproblems along critical paths within the DCSP as the problem solving unfolds. We present empirical evidence that shows that APO outperforms other known, complete DCSP techniques.

分散制約充足（DCSP）は、マルチエージェントシステムの研究において長年重要な問題と考えられてきました。これは、多くの現実世界の問題を制約充足として表現でき、これらの問題はしばしば分散形式で現れるためです。本稿では、協調的調停プロセスに基づく、DCSPを解決するための新しい完全分散アルゴリズム、非同期部分オーバーレイ（APO）を紹介します。このアルゴリズムの背後にある基本的な考え方は、エージェントが調停者として行動する際に、DCSPの関連性の高い小さな部分を集中管理すること、これらの集中管理された部分問題が重なり合うこと、そして問題解決が進むにつれて、エージェントがDCSP内のクリティカルパスに沿って部分問題のサイズを拡大していくことです。APOが他の既知の完全なDCSP手法よりも優れていることを示す実証的証拠を提示します。

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes / 政策言語バイアスを用いた近似政策反復：関係マルコフ決定過程の解法

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.

大規模な関係マルコフ決定過程（MDP）に対する方策選択のアプローチを検討します。通常の価値関数学習ステップを方策空間での学習ステップに置き換える近似方策反復（API）の変種を検討します。これは、対応する価値関数よりも優れた方策の表現と学習が容易な領域で有利であり、我々が関心を持つ関係MDPでは多くの場合これが当てはまる。このような問題にAPIを適用するために、関係方策言語と対応する学習器を導入します。さらに、ランダムウォークに基づくゴールベースプランニング領域用の新しいブートストラッピングルーチンを導入します。このようなブートストラッピングは、報酬が非常にスパースである多くの大規模関係MDPに必要であり、APIは情報のない方策で初期化された場合、そのような領域では効果がない。実験の結果、得られたシステムは、多数の古典的な計画領域とその確率的変種に対し、非常に大規模な関係MDPとして解くことで、優れた方策を見つけることができることが示されました。また、実験は私たちのアプローチのいくつかの限界を示唆しており、今後の課題を示唆しています。

参考文献

Journal of Artificial Intelligence Resarch Vol. 25 (2006)

論文