Journal of Artificial Intelligence Resarch: Volume 19の論文一覧

Journal of Artificial Intelligence Resarch Vol. 19 (2003)に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。
Optimal Schedules for Parallelizing Anytime Algorithms: The Case of Shared Resources
Anytimeアルゴリズムの並列化のための最適スケジュール：共有リソースの場合

The performance of anytime algorithms can be improved by simultaneously solving several instances of algorithm-problem pairs. These pairs may include different instances of a problem (such as starting from a different initial state), different algorithms (if several alternatives exist), or several runs of the same algorithm (for non-deterministic algorithms). In this paper we present a methodology for designing an optimal scheduling policy based on the statistical characteristics of the algorithms involved. We formally analyze the case where the processes share resources (a single-processor model), and provide an algorithm for optimal scheduling. We analyze, theoretically and empirically, the behavior of our scheduling algorithm for various distribution types. Finally, we present empirical results of applying our scheduling algorithm to the Latin Square problem.

いつでもアルゴリズムのパフォーマンスは、アルゴリズムと問題のペアの複数のインスタンスを同時に解くことによって向上できます。これらのペアには、問題の異なるインスタンス(異なる初期状態から開始するなど)、異なるアルゴリズム(複数の選択肢がある場合)、または同じアルゴリズムの複数回の実行(非決定性アルゴリズムの場合)が含まれる場合があります。この論文では、関連するアルゴリズムの統計的特性に基づいて、最適なスケジューリングポリシーを設計する方法を示します。プロセスがリソースを共有するケース(シングルプロセッサモデル)を正式に分析し、最適なスケジューリングのためのアルゴリズムを提供します。さまざまな分布タイプに対するスケジューリングアルゴリズムの動作を理論的かつ経験的に分析します。最後に、このスケジューリングアルゴリズムをラテン方格問題に適用した経験的結果を示します。

AltAltp: Online Parallelization of Plans with Heuristic State Search
AltAltp：ヒューリスティック状態探索を用いた計画のオンライン並列化

Despite their near dominance, heuristic state search planners still lag behind disjunctive planners in the generation of parallel plans in classical planning. The reason is that directly searching for parallel solutions in state space planners would require the planners to branch on all possible subsets of parallel actions, thus increasing the branching factor exponentially. We present a variant of our heuristic state search planner AltAlt, called AltAltp which generates parallel plans by using greedy online parallelization of partial plans. The greedy approach is significantly informed by the use of novel distance heuristics that AltAltp derives from a graphplan-style planning graph for the problem. While this approach is not guaranteed to provide optimal parallel plans, empirical results show that AltAltp is capable of generating good quality parallel plans at a fraction of the cost incurred by the disjunctive planners.

ヒューリスティックな状態探索プランナーは、ほぼ優位に立っているにもかかわらず、古典的な計画における並列計画の生成において、依然として選言プランナーに遅れをとっています。その理由は、状態空間プランナーで並列ソリューションを直接探索すると、プランナーが並列アクションのすべての可能なサブセットに分岐する必要があり、分岐係数が指数関数的に増加するためです。私たちは、ヒューリスティックな状態探索プランナーAltAltの亜種であるAltAltpを紹介します。これは、部分計画の貪欲なオンライン並列化を使用して並列計画を生成します。貪欲なアプローチは、AltAltpが問題のグラフプランスタイルの計画グラフから導き出す新しい距離ヒューリスティックの使用によって大きく情報化されています。このアプローチは最適な並列計画を提供することが保証されているわけではありませんが、実験結果では、AltAltpは選言プランナーで発生するコストのほんの一部で高品質の並列計画を生成できることが示されています。

Accelerating Reinforcement Learning through Implicit Imitation
暗黙的模倣による強化学習の加速

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent’s ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restricitions.

模倣は、マルチエージェント環境における学習を強化する手段と見なすことができます。模倣は、協力的な教師や他の経験豊富なエージェントの行動に暗黙的に含まれる知識をインテリジェントに利用することで、エージェントが有用な行動を学習する能力を強化します。我々は、特定のケースにおいて強化学習を劇的に加速できる暗黙的模倣の形式モデルを提案し、その有効性を検証します。大まかに言えば、強化学習エージェントはメンターを観察することで、状態空間の未訪問部分における自身の能力とその相対的な価値に関する情報を抽出できます。本研究では、このモデルの2つの具体的な例を検討します。1つは学習エージェントとメンターが同一の能力を持つ場合、もう1つは異なる行動セットを持つエージェントとメンターに対処するように設計された場合です。暗黙的模倣の利点を、優先順位付きスイープと統合することで示し、単一および複数のメンターの観察を通じてパフォーマンスと収束性が向上することを実証します。観測可能性と起こり得る相互作用に関していくつかの厳格な仮定を置くものの、これらの制約を緩和するモデルの拡張についても簡単に触れます。

Decentralized Supply Chain Formation: A Market Protocol and Competitive Equilibrium Analysis
分散型サプライチェーン形成：市場プロトコルと競争均衡分析

Supply chain formation is the process of determining the structure and terms of exchange relationships to enable a multilevel, multiagent production activity. We present a simple model of supply chains, highlighting two characteristic features: hierarchical subtask decomposition, and resource contention. To decentralize the formation process, we introduce a market price system over the resources produced along the chain. In a competitive equilibrium for this system, agents choose locally optimal allocations with respect to prices, and outcomes are optimal overall. To determine prices, we define a market protocol based on distributed, progressive auctions, and myopic, non-strategic agent bidding policies. In the presence of resource contention, this protocol produces better solutions than the greedy protocols common in the artificial intelligence and multiagent systems literature. The protocol often converges to high-value supply chains, and when competitive equilibria exist, typically to approximate competitive equilibria. However, complementarities in agent production technologies can cause the protocol to wastefully allocate inputs to agents that do not produce their outputs. A subsequent decommitment phase recovers a significant fraction of the lost surplus.

サプライチェーン形成とは、マルチレベルのマルチエージェント生産活動を可能にするために、交換関係の構造と条件を決定するプロセスです。本稿では、階層的なサブタスク分解とリソース競合という2つの特徴に焦点を当てた、サプライチェーンのシンプルなモデルを提示します。形成プロセスを分散化するために、チェーンに沿って生産されるリソースに市場価格システムを導入します。このシステムの競争均衡において、エージェントは価格に関して局所的に最適な割り当てを選択し、全体的な結果は最適となります。価格を決定するために、分散型の漸進的オークションと、近視眼的で非戦略的なエージェント入札ポリシーに基づく市場プロトコルを定義します。資源競合が存在する場合、このプロトコルは、人工知能やマルチエージェントシステムの文献で一般的に用いられる貪欲プロトコルよりも優れた解を生成します。このプロトコルは多くの場合、高価値サプライチェーンに収束し、競争均衡が存在する場合は通常、競争均衡に近づく。しかし、エージェント生産技術の相補性により、プロトコルは出力を生産しないエージェントに無駄に入力を割り当てる可能性があります。その後のデコミットメントフェーズで、失われた余剰の大部分が回復されます。

Temporal Decision Trees: Model-based Diagnosis of Dynamic Systems On-Board
時系列決定木：動的システムのオンボードモデルベース診断

The automatic generation of decision trees based on off-line reasoning on models of a domain is a reasonable compromise between the advantages of using a model-based approach in technical domains and the constraints imposed by embedded applications. In this paper we extend the approach to deal with temporal information. We introduce a notion of temporal decision tree, which is designed to make use of relevant information as long as it is acquired, and we present an algorithm for compiling such trees from a model-based reasoning system.

ドメインモデルに基づくオフライン推論に基づく決定木の自動生成は、技術分野におけるモデルベースアプローチの利点と組み込みアプリケーションによって課される制約との間の妥当な妥協点です。本稿では、このアプローチを時間情報への対応に拡張します。取得された関連情報を活用するように設計された時間決定木の概念を導入し、モデルベース推論システムからそのような決定木をコンパイルするためのアルゴリズムを提示します。

Efficient Solution Algorithms for Factored MDPs
因数分解MDPの効率的な解アルゴリズム

This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.

本論文は、大規模なマルコフ決定過程（MDP）における不確実性下における計画問題を取り扱う。因子化MDPは、状態変数を用いて複雑な状態空間を表現し、動的ベイジアンネットワークを用いて遷移モデルを表現します。この表現は、構造化MDPの表現サイズを指数関数的に削減できることが多いが、そのようなMDPの厳密解アルゴリズムの複雑さは、表現サイズが大きくなるにつれて指数関数的に増大する可能性があります。本稿では、因子分解されたMDPの構造を利用する2つの近似解アルゴリズムを紹介します。どちらも、基底関数の線形結合として表される近似値関数を使用します。ここで、各基底関数はドメイン変数の小さなサブセットのみを含みます。本稿の主な貢献は、因子分解されたMDPの加法的な構造とコンテキスト固有の構造の両方を利用することで、両方のアルゴリズムの基本操作を閉形式で効率的に実行する方法を示していることです。私たちのアルゴリズムの中心的要素は、ベイジアンネットワークの変数除去に類似した、指数的に大きなLPを証明可能に同等な多項式サイズのLPに削減する新しい線形計画法分解手法です。1つのアルゴリズムは近似線形計画法を使用し、もう1つは近似動的計画法を使用します。私たちの動的計画法アルゴリズムは、近似MDPアルゴリズムの誤差境界に現れる項をより直接的に最小化する手法である最大ノルムに基づく近似を使用するという点で新しいものです。10^40を超える状態の問題に対する実験結果を示し、私たちのアプローチのスケーラビリティの有望な兆候を実証し、私たちのアルゴリズムを既存の最先端のアプローチと比較し、いくつかの問題で計算時間が指数関数的に向上することを示します。

An Architectural Approach to Ensuring Consistency in Hierarchical Execution
階層的実行における一貫性を保証するアーキテクチャ的アプローチ

Hierarchical task decomposition is a method used in many agent systems to organize agent knowledge. This work shows how the combination of a hierarchy and persistent assertions of knowledge can lead to difficulty in maintaining logical consistency in asserted knowledge. We explore the problematic consequences of persistent assumptions in the reasoning process and introduce novel potential solutions. Having implemented one of the possible solutions, Dynamic Hierarchical Justification, its effectiveness is demonstrated with an empirical analysis.

階層的タスク分解は、多くのエージェントシステムでエージェントの知識を整理するために用いられる手法です。本研究では、階層構造と知識の永続的な主張の組み合わせが、主張された知識の論理的一貫性の維持を困難にする可能性があることを示す。推論プロセスにおける永続的な仮定の問題点を探り、新たな解決策を提示します。解決策の一つである動的階層的正当化を実装し、その有効性を実証分析によって実証します。

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
学習データが高コストな場合の学習：クラス分布が木導出に与える影響

For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a budget-sensitive progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.

大規模な実世界の帰納学習問題では、訓練例の入手、準備、保管にかかるコストや、それらからの学習にかかる計算コストのために、訓練例の数を制限しなければならない場合が多い。このような状況において、実用上重要な疑問の一つは、n個の訓練例しか選択できない場合、クラスをどの程度の割合で表現すべきか、ということです。本稿では、訓練セットのサイズを固定した場合、訓練データのクラス分布と、これらのデータから導出される分類木の性能との関係を分析することで、この疑問への答えを見つける。26個のデータセットを研究し、それぞれについて、学習に最適なクラス分布を決定します。分類器の性能を非分化エラー率（0/1損失）を用いて評価した場合、自然発生的なクラス分布は一般的に良好な性能を示す。しかし、ROC曲線下面積を用いて分類器の性能を評価する場合、バランスの取れた分布が良好な性能を示す。これらのクラス分布の選択肢はいずれも、必ずしも最良の性能を示す分類器を生成するわけではないため、各例に関連付けられたクラスに基づいて訓練例を選択する、予算を考慮した漸進的サンプリングアルゴリズムを導入します。このアルゴリズムの実証分析により、得られたトレーニングセットのクラス分布から、良好な（ほぼ最適な）分類性能を持つ分類器が得られることが示されました。

Compiling Causal Theories to Successor State Axioms and STRIPS-Like Systems
後続状態公理とSTRIPS型システムへの因果理論のコンパイル

We describe a system for specifying the effects of actions. Unlike those commonly used in AI planning, our system uses an action description language that allows one to specify the effects of actions using domain rules, which are state constraints that can entail new action effects from old ones. Declaratively, an action domain in our language corresponds to a nonmonotonic causal theory in the situation calculus. Procedurally, such an action domain is compiled into a set of logical theories, one for each action in the domain, from which fully instantiated successor state-like axioms and STRIPS-like systems are then generated. We expect the system to be a useful tool for knowledge engineers writing action specifications for classical AI planning systems, GOLOG systems, and other systems where formal specifications of actions are needed.

アクションの効果を定義するシステムについて説明します。AIプランニングで一般的に用いられるものとは異なり、本システムは、ドメインルールを用いてアクションの効果を指定できるアクション記述言語を採用しています。ドメインルールとは、既存のアクション効果から新たなアクション効果を導き出すことができる状態制約です。宣言的に、本言語におけるアクションドメインは、状況計算における非単調因果理論に対応します。手続き的に、このようなアクションドメインは、ドメイン内の各アクションに対応する論理理論の集合にコンパイルされ、そこから完全にインスタンス化された後続状態のような公理とSTRIPSのようなシステムが生成されます。本システムは、古典的なAIプランニングシステム、GOLOGシステム、その他アクションの形式的な仕様記述が求められるシステムのアクション仕様を記述する知識エンジニアにとって有用なツールとなることが期待されます。

Answer Set Planning Under Action Costs
行動コストを考慮した解答集合計画

Recently, planning based on answer set programming has been proposed as an approach towards realizing declarative planning systems. In this paper, we present the language Kc, which extends the declarative planning language K by action costs. Kc provides the notion of admissible and optimal plans, which are plans whose overall action costs are within a given limit resp. minimum over all plans (i.e., cheapest plans). As we demonstrate, this novel language allows for expressing some nontrivial planning tasks in a declarative way. Furthermore, it can be utilized for representing planning problems under other optimality criteria, such as computing “shortest” plans (with the least number of steps), and refinement combinations of cheapest and fastest plans. We study complexity aspects of the language Kc and provide a transformation to logic programs, such that planning problems are solved via answer set programming. Furthermore, we report experimental results on selected problems. Our experience is encouraging that answer set planning may be a valuable approach to expressive planning systems in which intricate planning problems can be naturally specified and solved.

最近、宣言型プランニングシステムを実現するためのアプローチとして、アンサーセットプログラミングに基づくプランニングが提案されています。本稿では、宣言型プランニング言語Kをアクションコストによって拡張した言語Kcを紹介します。Kcは、許容プランと最適プランの概念を提供します。これらは、全体のアクションコストが所定の制限値内、またはすべてのプラン（すなわち、最安プラン）の中で最小となるプランです。本稿で示すように、この新しい言語は、いくつかの非自明なプランニングタスクを宣言的に表現することを可能にします。さらに、この言語は、他の最適性基準に基づくプランニング問題、例えば「最短」プラン（ステップ数が最小）の計算や、最安プランと最速プランの改良組み合わせの計算にも利用できます。本稿では、言語Kcの複雑性の側面を研究し、プランニング問題をアンサーセットプログラミングによって解決できるように論理プログラムへの変換を提供します。さらに、選択された問題における実験結果も報告します。本稿の経験から、アンサーセットプログラミングは、複雑なプランニング問題を自然に定義し、解決できる表現力豊かなプランニングシステムへの有用なアプローチとなる可能性が示唆されています。

Updating Probabilities
確率の更新

As examples such as the Monty Hall puzzle show, applying conditioning to update a probability distribution on a “naive space”, which does not take into account the protocol used, can often lead to counterintuitive results. Here we examine why. A criterion known as CAR (“coarsening at random”) in the statistical literature characterizes when “naive” conditioning in a naive space works. We show that the CAR condition holds rather infrequently, and we provide a procedural characterization of it, by giving a randomized algorithm that generates all and only distributions for which CAR holds. This substantially extends previous characterizations of CAR. We also consider more generalized notions of update such as Jeffrey conditioning and minimizing relative entropy (MRE). We give a generalization of the CAR condition that characterizes when Jeffrey conditioning leads to appropriate answers, and show that there exist some very simple settings in which MRE essentially never gives the right results. This generalizes and interconnects previous results obtained in the literature on CAR and MRE.

モンティ・ホール・パズルの例が示すように、「ナイーブ空間」上の確率分布を更新するために条件付けを適用すると、使用されるプロトコルを考慮に入れずに、しばしば直感に反する結果につながる可能性があります。ここではその理由を検証します。統計学の文献ではCAR（「ランダムな粗化」）として知られる基準は、ナイーブ空間における「ナイーブ」条件付けが機能する場合を特徴付けます。CAR条件が成立する頻度は比較的低いことを示し、CARが成立するすべての分布のみを生成するランダム化アルゴリズムを与えることで、手続き的な特徴付けを提供します。これは、CARのこれまでの特徴付けを大幅に拡張するものです。また、ジェフリー条件付けや相対エントロピー最小化（MRE）といった、より一般化された更新の概念も考察します。ジェフリー条件付けが適切な解を導く場合を特徴付けるCAR条件の一般化を示し、MREが本質的に正しい結果を与えない非常に単純な設定が存在することを示します。これは、CARとMREに関する文献で得られたこれまでの結果を一般化し、相互に関連付けるものです。

Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions
同時相互作用オークションにおける学習済み密度モデルに基づく決定理論的入札

Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.

オークションは、特にインターネット上で、ビジネスを取引する方法としてますます普及しつつあります。この記事では、相互作用する商品の複数の同時オークションに入札する自律入札エージェントを構築するための一般的なアプローチを紹介します。私たちのアプローチの中核となる要素は、過去のデータに基づいて経験的な価格動向のモデルを学習し、そのモデルを使用して、可能な限り最適な入札を解析的に計算することです。私たちは、この種の条件付き密度推定問題、つまり実数値ラベルの条件付き分布全体を推定することを目的とする教師あり学習問題に対する、新しい一般的なブースティングベースのアルゴリズムを導入します。このアプローチは、第2回トレーディングエージェントコンペティション(TAC-01)で最高得点を獲得したエージェントであるATTac-2001として完全に実装されています。私たちは、いくつかの合理的な代替手段と比較して、ブースティングベースの価格予測器の有効性を示す実験を紹介します。

Potential-Based Shaping and Q-Value Initialization are Equivalent
ポテンシャルベースのシェーピングとQ値初期化は等価

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for several reinforcement learning algorithms. More specifically, we prove that a reinforcement learner with initial Q-values based on the shaping algorithm’s potential function make the same updates throughout learning as a learner receiving potential-based shaping rewards. We further prove that under a broad category of policies, the behavior of these two learners are indistinguishable. The comparison provides intuition on the theoretical properties of the shaping algorithm as well as a suggestion for a simpler method for capturing the algorithm’s benefit. In addition, the equivalence raises previously unaddressed issues concerning the efficiency of learning with potential-based shaping.

シェーピングは、強化学習の性能を向上させる強力だが不安定な手段であることが証明されています。Ng、Harada、およびRussell (1999)は、学習者が最適な行動を学習することを保証する方法でシェーピング報酬を追加する、ポテンシャルベースのシェーピングアルゴリズムを提案した。本稿では、このシェーピングアルゴリズムと、いくつかの強化学習アルゴリズムに必要な初期化ステップとの間の類似性を証明します。より具体的には、シェーピングアルゴリズムのポテンシャル関数に基づく初期Q値を持つ強化学習者は、ポテンシャルベースのシェーピング報酬を受け取る学習者と学習を通して同じ更新を行うことを証明した。さらに、広範なポリシーカテゴリの下では、これら2つの学習者の行動は区別がつかないことを証明した。この比較は、シェーピングアルゴリズムの理論的特性に関する直観的な理解と、アルゴリズムの利点を捉えるためのより簡単な方法の提案を提供します。さらに、この同等性は、ポテンシャルベースのシェーピングを用いた学習の効率性に関して、これまで未解決であった問題を提起します。

Representing and Aggregating Conflicting Beliefs
相反する信念の表現と集約

We consider the two-fold problem of representing collective beliefs and aggregating these beliefs. We propose a novel representation for collective beliefs that uses modular, transitive relations over possible worlds. They allow us to represent conflicting opinions and they have a clear semantics, thus improving upon the quasi-transitive relations often used in social choice. We then describe a way to construct the belief state of an agent informed by a set of sources of varying degrees of reliability. This construction circumvents Arrow’s Impossibility Theorem in a satisfactory manner by accounting for the explicitly encoded conflicts. We give a simple set-theory-based operator for combining the information of multiple agents. We show that this operator satisfies the desirable invariants of idempotence, commutativity, and associativity, and, thus, is well-behaved when iterated, and we describe a computationally effective way of computing the resulting belief state. Finally, we extend our framework to incorporate voting.

我々は、集団的信念の表現と、それらの信念の集約という二重の問題を考察します。可能世界におけるモジュール型の推移的関係を用いた、集団的信念の新たな表現法を提案します。この表現法は、相反する意見を表現することを可能にし、明確な意味論を持つため、社会選択においてしばしば用いられる準推移的関係を改良します。次に、信頼性の異なる情報源の集合から得られるエージェントの信念状態を構築する方法を説明します。この構築法は、明示的に符号化された矛盾を考慮することで、アローの不可能性定理を満足のいく方法で回避します。我々は、複数のエージェントの情報を統合するための、集合論に基づく単純な演算子を提示します。この演算子は、冪等性、可換性、結合性という望ましい不変条件を満たし、したがって反復処理においても良好な振る舞いを示すことを示す。そして、結果として得られる信念状態を計算するための計算効率の高い方法を説明します。最後に、投票を組み込むために我々の枠組みを拡張します。

Bound Propagation
境界伝播法

In this article we present an algorithm to compute bounds on the marginals of a graphical model. For several small clusters of nodes upper and lower bounds on the marginal values are computed independently of the rest of the network. The range of allowed probability distributions over the surrounding nodes is restricted using earlier computed bounds. As we will show, this can be considered as a set of constraints in a linear programming problem of which the objective function is the marginal probability of the center nodes. In this way knowledge about the maginals of neighbouring clusters is passed to other clusters thereby tightening the bounds on their marginals. We show that sharp bounds can be obtained for undirected and directed graphs that are used for practical applications, but for which exact computations are infeasible.

本稿では、グラフィカルモデルの周辺値の境界を計算するアルゴリズムを紹介します。複数の小さなノードクラスターについて、ネットワークの残りの部分とは独立して、周辺値の上限と下限を計算します。周囲のノードにおける許容確率分布の範囲は、以前に計算された境界を使用して制限されます。後ほど示すように、これは、中心ノードの周辺確率を目的関数とする線形計画問題における一連の制約として考えることができます。このようにして、近隣クラスターの周辺値に関する知識が他のクラスターに渡され、それらのクラスターの周辺値の境界が狭まります。実用的なアプリケーションで使用されるものの、正確な計算が不可能な無向グラフと有向グラフに対して、明確な境界が得られることを示します。

Learning to Coordinate Efficiently: A Model-based Approach
効率的な調整の学習：モデルベースアプローチ

In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.

共通利益確率ゲームでは、すべてのプレイヤーは同一の利得を得る。このようなゲームに参加するプレイヤーは、可能な限り高い価値を得るために、互いに協調することを学習しなければならない。この問題に対して、多くの強化学習アルゴリズムが提案されており、そのうちのいくつかは極限において良好な解に収束することが示されています。本論文では、非常に単純なモデルベースアルゴリズムを用いることで、はるかに優れた（すなわち多項式的な）収束率を達成できることを示す。さらに、我々のモデルベースアルゴリズムは、既存の多くのアルゴリズムとは異なり、最適値への収束が保証されています。

New Polynomial Classes for Logic-Based Abduction
論理ベースアブダクションのための新しい多項式クラス

We address the problem of propositional logic-based abduction, i.e., the problem of searching for a best explanation for a given propositional observation according to a given propositional knowledge base. We give a general algorithm, based on the notion of projection; then we study restrictions over the representations of the knowledge base and of the query, and find new polynomial classes of abduction problems.

我々は命題論理に基づくアブダクション問題、すなわち、与えられた命題知識ベースに基づいて、与えられた命題的観測に対する最良の説明を探索する問題に取り組む。射影の概念に基づく一般的なアルゴリズムを提示し、次に知識ベースとクエリの表現に関する制約を研究し、アブダクション問題の新しい多項式クラスを見出す。

関連記事