Journal of Artificial Intelligence Resarch: Volume 39の論文一覧

Journal of Artificial Intelligence Resarch Vol. 39 (2010)に記載されている内容を一覧にまとめ、機械翻訳を交えて日本語化し掲載します。

1 論文
2 参考文献
3 関連情報

論文

Planning with Noisy Probabilistic Relational Rules

Planning with Noisy Probabilistic Relational Rules / ノイズの多い確率的関係ルールを用いたプランニング

Noisy probabilistic relational rules are a promising world model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable and they can be learned effectively from the action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. Our algorithms exploit the compactness of rules for efficient and flexible decision-theoretic planning. As a first approach, we combine these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm based on look-ahead trees. Our second approach converts these rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference and beliefs over world states. We evaluate the effectiveness of our approaches for planning in a simulated complex 3D robot manipulation scenario with an articulated manipulator and realistic physics and in domains of the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail.

ノイズを含む確率的関係ルールは、いくつかの理由から有望な世界モデル表現です。それらはコンパクトで、世界のインスタンス化に対して一般化できます。それらは通常解釈可能であり、複雑な世界における行動経験から効果的に学習できます。私たちは、グラウンデッド関係領域におけるそのようなルールを用いた推論を調査します。私たちのアルゴリズムは、効率的で柔軟な意思決定理論的プランニングのために、ルールのコンパクトさを活用します。最初のアプローチとして、私たちはこれらのルールを、先読み木に基づく木に適用される上側信頼境界（UCT）アルゴリズムと組み合わせます。2番目のアプローチは、これらのルールを構造化された動的ベイジアンネットワーク表現に変換し、近似推論と世界状態に対する信念を用いて行動シーケンスの効果を予測します。私たちは、多関節マニピュレータと現実的な物理特性を持つシミュレートされた複雑な3Dロボット操作シナリオ、および確率的プランニング競争の領域におけるプランニングに対する私たちのアプローチの有効性を評価します。実験結果は、私たちの方法が既存の方法が解決できない問題を解決できることを示しています。

The LAMA Planner: Guiding Cost-Based Anytime Planning with Landmarks

The LAMA Planner: Guiding Cost-Based Anytime Planning with Landmarks / LAMAプランナー：ランドマークを用いたコストベースのいつでもプランニングのガイド

LAMA is a classical planning system based on heuristic forward search. Its core feature is the use of a pseudo-heuristic derived from landmarks, propositional formulas that must be true in every solution of a planning task. LAMA builds on the Fast Downward planning system, using finite-domain rather than binary state variables and multi-heuristic search. The latter is employed to combine the landmark heuristic with a variant of the well-known FF heuristic. Both heuristics are cost-sensitive, focusing on high-quality solutions in the case where actions have non-uniform cost. A weighted A* search is used with iteratively decreasing weights, so that the planner continues to search for plans of better quality until the search is terminated.LAMA showed best performance among all planners in the sequential satisficing track of the International Planning Competition 2008. In this paper we present the system in detail and investigate which features of LAMA are crucial for its performance. We present individual results for some of the domains used at the competition, demonstrating good and bad cases for the techniques implemented in LAMA. Overall, we find that using landmarks improves performance, whereas the incorporation of action costs into the heuristic estimators proves not to be beneficial. We show that in some domains a search that ignores cost solves far more problems, raising the question of how to deal with action costs more effectively in the future. The iterated weighted A* search greatly improves results, and shows synergy effects with the use of landmarks.

LAMAは、ヒューリスティックフォワードサーチに基づく古典的なプランニングシステムです。LAMAの核となる特徴は、ランドマーク（計画タスクのあらゆる解において必ず成立する命題式）から導出される疑似ヒューリスティックの使用です。LAMAは、高速下方計画システムに基づいて構築されており、バイナリ状態変数ではなく有限領域とマルチヒューリスティック探索を使用します。後者は、ランドマークヒューリスティックと、よく知られたFFヒューリスティックの変種を組み合わせるために用いられます。どちらのヒューリスティックもコストに敏感で、アクションのコストが均一でない場合の高品質な解に重点を置きます。重み付きA*探索が反復的に重みを減少させるために使用されるため、プランナーは探索が終了するまで、より質の高い計画を探索し続けます。LAMAは、2008年国際計画コンペティションの逐次満足度トラックにおいて、すべてのプランナーの中で最高のパフォーマンスを示しました。本稿では、このシステムを詳細に紹介し、LAMAのどの機能がそのパフォーマンスに不可欠であるかを調査します。コンペティションで使用されたいくつかのドメインについて、個々の結果を示し、LAMAに実装された手法の良い例と悪い例を示します。全体として、ランドマークの使用は性能を向上させる一方で、行動コストをヒューリスティック推定に組み込むことは有益ではないことがわかった。一部の領域では、コストを無視した探索によってはるかに多くの問題が解決されることを示し、将来的に行動コストをより効果的に処理する方法という問題を提起します。反復重み付きA*探索は結果を大幅に改善し、ランドマークの使用による相乗効果を示す。

Cooperative Games with Overlapping Coalitions

Cooperative Games with Overlapping Coalitions / 重複する提携を持つ協力ゲーム

In the usual models of cooperative game theory, the outcome of a coalition formation process is either the grand coalition or a coalition structure that consists of disjoint coalitions. However, in many domains where coalitions are associated with tasks, an agent may be involved in executing more than one task, and thus may distribute his resources among several coalitions. To tackle such scenarios, we introduce a model for cooperative games with overlapping coalitionsor overlapping coalition formation (OCF) games. We then explore the issue of stability in this setting. In particular, we introduce a notion of the core, which generalizes the corresponding notion in the traditional (non-overlapping) scenario. Then, under some quite general conditions, we characterize the elements of the core, and show that any element of the core maximizes the social welfare. We also introduce a concept of balancedness for overlapping coalitional games, and use it to characterize coalition structures that can be extended to elements of the core. Finally, we generalize the notion of convexity to our setting, and show that under some natural assumptions convex games have a non-empty core. Moreover, we introduce two alternative notions of stability in OCF that allow a wider range of deviations, and explore the relationships among the corresponding definitions of the core, as well as the classic (non-overlapping) core and the Aubin core. We illustrate the general properties of the three cores, and also study them from a computational perspective, thus obtaining additional insights into their fundamental structure.

協力ゲーム理論の通常のモデルでは、連合形成プロセスの結果は、大連合、または互いに素な連合からなる連合構造のいずれかです。しかし、連合がタスクに関連付けられている多くの領域では、エージェントは複数のタスクの実行に関与する可能性があり、したがって、複数の連合にリソースを配分する可能性があります。このようなシナリオに対処するために、重複連合を含む協力ゲーム、または重複連合形成（OCF）ゲームのモデルを導入します。そして、この設定における安定性の問題を調査します。特に、従来の（重複しない）シナリオにおける対応する概念を一般化したコアの概念を導入します。次に、いくつかの極めて一般的な条件下で、コアの要素を特徴付け、コアの任意の要素が社会福祉を最大化することを示す。また、重複する提携ゲームにおける均衡の概念を導入し、それを用いてコアの要素に拡張可能な提携構造を特徴付ける。最後に、凸性の概念を我々の設定に一般化し、いくつかの自然な仮定の下で、凸ゲームが空でないコアを持つことを示す。さらに、OCFにおける安定性の2つの代替概念を導入し、より広い範囲の偏差を許容します。そして、対応するコアの定義、および従来の（重複しない）コアとオービンコアとの関係を考察します。3つのコアの一般的な特性を示すとともに、計算論的観点からそれらを研究することで、それらの基本構造に関する更なる知見を得る。

Narrative Planning: Balancing Plot and Character

Narrative Planning: Balancing Plot and Character / 物語プランニング：プロットとキャラクターのバランス

Narrative, and in particular storytelling, is an important part of the human experience. Consequently, computational systems that can reason about narrative can be more effective communicators, entertainers, educators, and trainers. One of the central challenges in computational narrative reasoning is narrative generation, the automated creation of meaningful event sequences. There are many factors — logical and aesthetic — that contribute to the success of a narrative artifact. Central to this success is its understandability. We argue that the following two attributes of narratives are universal: (a) the logical causal progression of plot, and (b) character believability. Character believability is the perception by the audience that the actions performed by characters do not negatively impact the audience’s suspension of disbelief. Specifically, characters must be perceived by the audience to be intentional agents. In this article, we explore the use of refinement search as a technique for solving the narrative generation problem — to find a sound and believable sequence of character actions that transforms an initial world state into a world state in which goal propositions hold. We describe a novel refinement search planning algorithm — the Intent-based Partial Order Causal Link (IPOCL) planner — that, in addition to creating causally sound plot progression, reasons about character intentionality by identifying possible character goals that explain their actions and creating plan structures that explain why those characters commit to their goals. We present the results of an empirical evaluation that demonstrates that narrative plans generated by the IPOCL algorithm support audience comprehension of character intentions better than plans generated by conventional partial-order planners.

物語、特にストーリーテリングは、人間の経験において重要な部分を占めています。そのため、物語について推論できる計算システムは、より効果的なコミュニケーター、エンターテイナー、教育者、トレーナーとなることができます。計算による物語推論における中心的な課題の1つは、物語生成、つまり意味のあるイベントシーケンスの自動作成です。物語作品の成功には、論理的要素と美的要素の両方を含む多くの要素が寄与します。この成功の中核を成すのは、その理解しやすさです。私たちは、物語の次の2つの属性が普遍的であると主張します。(a)プロットの論理的な因果関係の進行、および(b)キャラクターの信憑性です。キャラクターの信憑性とは、登場人物の行動が観客の疑念の停止に悪影響を与えないという観客の認識です。具体的には、登場人物は観客から意図的な行為者として認識される必要があります。本稿では、物語生成問題を解決する手法として、洗練探索の利用について検討します。この問題は、初期の世界状態を目標命題が成り立つ世界状態に変換する、健全で説得力のある登場人物の行動シーケンスを見つけることです。我々は、因果的に健全なプロット進行を作成することに加えて、登場人物の行動を説明する可能性のある目標を特定し、登場人物が目標にコミットする理由を説明するプラン構造を作成することで、登場人物の意図について推論する、新しい洗練探索プランニングアルゴリズム、意図に基づく半順序因果リンク(IPOCL)プランナーについて説明します。我々は、IPOCLアルゴリズムによって生成された物語プランが、従来の半順序プランナーによって生成されたプランよりも、観客による登場人物の意図の理解をサポートすることを示す実証的評価の結果を提示します。

Case-Based Subgoaling in Real-Time Heuristic Search for Video Game Pathfinding

Case-Based Subgoaling in Real-Time Heuristic Search for Video Game Pathfinding / ビデオゲームのパスファインディングのためのリアルタイムヒューリスティックサーチにおける事例ベースのサブゴール設定

Real-time heuristic search algorithms satisfy a constant bound on the amount of planning per action, independent of problem size. As a result, they scale up well as problems become larger. This property would make them well suited for video games where Artificial Intelligence controlled agents must react quickly to user commands and to other agents’ actions. On the downside, real-time search algorithms employ learning methods that frequently lead to poor solution quality and cause the agent to appear irrational by re-visiting the same problem states repeatedly. The situation changed recently with a new algorithm, D LRTA*, which attempted to eliminate learning by automatically selecting subgoals. D LRTA* is well poised for video games, except it has a complex and memory-demanding pre-computation phase during which it builds a database of subgoals. In this paper, we propose a simpler and more memory-efficient way of pre-computing subgoals thereby eliminating the main obstacle to applying state-of-the-art real-time search methods in video games. The new algorithm solves a number of randomly chosen problems off-line, compresses the solutions into a series of subgoals and stores them in a database. When presented with a novel problem on-line, it queries the database for the most similar previously solved case and uses its subgoals to solve the problem. In the domain of pathfinding on four large video game maps, the new algorithm delivers solutions eight times better while using 57 times less memory and requiring 14% less pre-computation time.

リアルタイムヒューリスティック検索アルゴリズムは、問題のサイズに関係なく、行動あたりのプランニング量に関する一定の上限を満たす。その結果、問題が大規模になっても適切にスケールアップできます。この特性により、リアルタイム探索アルゴリズムは、ユーザーのコマンドや他のエージェントの行動に人工知能制御エージェントがすばやく反応しなければならないビデオゲームに適しています。欠点としては、リアルタイム探索アルゴリズムで採用されている学習方法によって、ソリューションの品質が低下することが多く、同じ問題の状態を繰り返し再訪することでエージェントが不合理に見えることがあります。最近、新しいアルゴリズムD LRTA*の登場により状況は変わりました。このアルゴリズムは、サブゴールを自動的に選択することで学習を排除しようとしました。D LRTA*はビデオゲームに適していますが、サブゴールのデータベースを構築する複雑でメモリを大量に消費する事前計算フェーズがあります。本稿では、サブゴールを事前計算するより単純でメモリ効率の高い方法を提案し、ビデオゲームに最先端のリアルタイム探索方法を適用する際の主な障害を取り除きます。新しいアルゴリズムは、ランダムに選択された多数の問題をオフラインで解き、そのソリューションを一連のサブゴールに圧縮してデータベースに格納します。このアルゴリズムは、オンラインで新しい問題が提示されると、データベースを参照して過去に解決された最も類似した事例を検索し、そのサブゴールを用いて問題を解決します。4つの大規模なビデオゲームマップにおける経路探索の領域において、この新しいアルゴリズムは、メモリ使用量を57分の1に削減し、事前計算時間を14%短縮しながら、8倍の精度でソリューションを提供します。

A Model-Based Active Testing Approach to Sequential Diagnosis

A Model-Based Active Testing Approach to Sequential Diagnosis / モデルベースのアクティブテストアプローチによる逐次診断

Model-based diagnostic reasoning often leads to a large number of diagnostic hypotheses. The set of diagnoses can be reduced by taking into account extra observations (passive monitoring), measuring additional variables (probing) or executing additional tests (sequential diagnosis/test sequencing). In this paper we combine the above approaches with techniques from Automated Test Pattern Generation (ATPG) and Model-Based Diagnosis (MBD) into a framework called FRACTAL (FRamework for ACtive Testing ALgorithms). Apart from the inputs and outputs that connect a system to its environment, in active testing we consider additional input variables to which a sequence of test vectors can be supplied. We address the computationally hard problem of computing optimal control assignments (as defined in FRACTAL) in terms of a greedy approximation algorithm called FRACTAL-G. We compare the decrease in the number of remaining minimal cardinality diagnoses of FRACTAL-G to that of two more FRACTAL algorithms: FRACTAL-ATPG and FRACTAL-P. FRACTAL-ATPG is based on ATPG and sequential diagnosis while FRACTAL-P is based on probing and, although not an active testing algorithm, provides a baseline for comparing the lower bound on the number of reachable diagnoses for the FRACTAL algorithms. We empirically evaluate the trade-offs of the three FRACTAL algorithms by performing extensive experimentation on the ISCAS85/74XXX benchmark of combinational circuits.

モデルベース診断推論は、多くの場合、多数の診断仮説を生み出します。診断セットは、追加の観測（パッシブモニタリング）、追加変数の測定（プロービング）、または追加テストの実行（シーケンシャル診断／テストシーケンス）を考慮することで削減できます。本稿では、上記のアプローチと自動テストパターン生成（ATPG）およびモデルベース診断（MBD）の手法を組み合わせ、FRACTAL（FRamework for ACtive Testing ALgorithms）と呼ばれるフレームワークを構築します。アクティブテストでは、システムとその環境を接続する入出力に加えて、一連のテストベクトルを供給できる追加の入力変数を考慮します。我々は、FRACTALで定義されている最適制御割り当てを計算するという計算的に困難な問題に、貪欲近似アルゴリズムであるFRACTAL-Gを用いて取り組む。FRACTAL-Gの残存最小基数診断数の減少を、さらに2つのFRACTALアルゴリズム（FRACTAL-ATPGおよびFRACTAL-P）の減少と比較します。FRACTAL-ATPGはATPGとシーケンシャル診断に基づいているのに対し、FRACTAL-Pはプロービングに基づいており、アクティブテストアルゴリズムではないものの、FRACTALアルゴリズムの到達可能な診断数の下限を比較するための基準を提供します。我々は、組み合わせ回路のベンチマークであるISCAS85/74XXXを用いて広範な実験を行い、3つのFRACTALアルゴリズムのトレードオフを経験的に評価します。

Active Tuples-based Scheme for Bounding Posterior Beliefs

Active Tuples-based Scheme for Bounding Posterior Beliefs / 事後確率の境界設定のためのアクティブタプルベースのスキーム

The paper presents a scheme for computing lower and upper bounds on the posterior marginals in Bayesian networks with discrete variables. Its power lies in its ability to use any available scheme that bounds the probability of evidence or posterior marginals and enhance its performance in an anytime manner. The scheme uses the cutset conditioning principle to tighten existing bounding schemes and to facilitate anytime behavior, utilizing a fixed number of cutset tuples. The accuracy of the bounds improves as the number of used cutset tuples increases and so does the computation time. We demonstrate empirically the value of our scheme for bounding posterior marginals and probability of evidence using a variant of the bound propagation algorithm as a plug-in scheme.

本論文では、離散変数ベイジアンネットワークにおける事後周辺分布の下限値と上限値を計算する手法を提示します。その強みは、証拠確率または事後周辺分布を境界付ける任意の既存の手法を利用可能であり、いつでもパフォーマンスを向上できることにあります。本手法は、カットセット条件付け原理を用いて既存の境界設定手法を強化し、固定数のカットセットタプルを用いていつでも動作を可能にします。境界値の精度は、使用されるカットセットタプル数の増加と計算時間の増加に伴い向上します。本手法の事後周辺分布と証拠確率の境界設定における価値を、境界伝播アルゴリズムの変種をプラグイン手法として用いて実証的に証明します。

A Constraint Satisfaction Framework for Executing Perceptions and Actions in Diagrammatic Reasoning

A Constraint Satisfaction Framework for Executing Perceptions and Actions in Diagrammatic Reasoning / 図式的推論における知覚と行動の実行のための制約充足フレームワーク

Diagrammatic reasoning (DR) is pervasive in human problem solving as a powerful adjunct to symbolic reasoning based on language-like representations. The research reported in this paper is a contribution to building a general purpose DR system as an extension to a SOAR-like problem solving architecture. The work is in a framework in which DR is modeled as a process where subtasks are solved, as appropriate, either by inference from symbolic representations or by interaction with a diagram, i.e., perceiving specified information from a diagram or modifying/creating objects in a diagram in specified ways according to problem solving needs. The perceptions and actions in most DR systems built so far are hand-coded for the specific application, even when the rest of the system is built using the general architecture. The absence of a general framework for executing perceptions/actions poses as a major hindrance to using them opportunistically — the essence of open-ended search in problem solving.Our goal is to develop a framework for executing a wide variety of specified perceptions and actions across tasks/domains without human intervention. We observe that the domain/task-specific visual perceptions/actions can be transformed into domain/task-independent spatial problems. We specify a spatial problem as a quantified constraint satisfaction problem in the real domain using an open-ended vocabulary of properties, relations and actions involving three kinds of diagrammatic objects — points, curves, regions. Solving a spatial problem from this specification requires computing the equivalent simplified quantifier-free expression, the complexity of which is inherently doubly exponential. We represent objects as configuration of simple elements to facilitate decomposition of complex problems into simpler and similar subproblems. We show that, if the symbolic solution to a subproblem can be expressed concisely, quantifiers can be eliminated from spatial problems in low-order polynomial time using similar previously solved subproblems. This requires determining the similarity of two problems, the existence of a mapping between them computable in polynomial time, and designing a memory for storing previously solved problems so as to facilitate search. The efficacy of the idea is shown by time complexity analysis. We demonstrate the proposed approach by executing perceptions and actions involved in DR tasks in two army applications.

図式的推論（DR）は、言語的表現に基づく記号的推論の強力な補助として、人間の問題解決に広く浸透しています。本論文で報告する研究は、SOARのような問題解決アーキテクチャの拡張として、汎用DRシステムの構築に貢献するものです。この研究は、DRを、記号表現からの推論またはダイアグラムとのインタラクションによって、サブタスクが適切に解決されるプロセスとしてモデル化するフレームワークに基づいています。つまり、ダイアグラムから特定の情報を知覚するか、問題解決のニーズに応じて特定の方法でダイアグラム内のオブジェクトを変更/作成します。これまでに構築されたほとんどのDRシステムにおける知覚とアクションは、システムの残りの部分が一般的なアーキテクチャを使用して構築されている場合でも、特定のアプリケーション用に手動でコーディングされています。知覚/アクションを実行するための一般的なフレームワークがないと、それらを機会主義的に使用すること、つまり問題解決におけるオープンエンドの探索の本質を阻害する大きな障害となります。私たちの目標は、人間の介入なしに、タスク/ドメイン間でさまざまな特定の知覚とアクションを実行するためのフレームワークを開発することです。ドメイン/タスク固有の視覚知覚/アクションは、ドメイン/タスクに依存しない空間問題に変換できることがわかっています。我々は、点、曲線、領域という3種類の図式的オブジェクトに関する特性、関係、および動作というオープンエンドな語彙を用いて、空間問題を実領域における定量化された制約充足問題として規定します。この規定から空間問題を解くには、等価な簡略化された量指定子のない表現を計算する必要があり、その計算量は本質的に二重指数関数的です。我々は、複雑な問題をより単純で類似した部分問題に分解しやすくするために、オブジェクトを単純な要素の集合として表現します。部分問題の記号解が簡潔に表現できる場合、既に解決済みの類似の部分問題を用いて、空間問題から量指定子を低次の多項式時間で除去できることを示す。そのためには、2つの問題の類似性、それらの問題間に多項式時間で計算可能な写像が存在するかどうか、そして検索を容易にするために既に解決済みの問題を格納するメモリを設計する必要があります。このアイデアの有効性は、時間計算量解析によって示されます。我々は、2つの軍事アプリケーションにおけるDRタスクに関連する知覚と動作を実行することで、提案されたアプローチを実証します。

Nominals, Inverses, Counting, and Conjunctive Queries or: Why Infinity is your Friend!

Nominals, Inverses, Counting, and Conjunctive Queries or: Why Infinity is your Friend! / 名義式、逆数、計数、および論理積クエリ、あるいは：なぜ無限はあなたの味方なのか！

Description Logics are knowledge representation formalisms that provide, for example, the logical underpinning of the W3C OWL standards. Conjunctive queries, the standard query language in databases, have recently gained significant attention as an expressive formalism for querying Description Logic knowledge bases. Several different techniques for deciding conjunctive query entailment are available for a wide range of DLs. Nevertheless, the combination of nominals, inverse roles, and number restrictions in OWL 1 and OWL 2 DL causes unsolvable problems for the techniques hitherto available. We tackle this problem and present a decidability result for entailment of unions of conjunctive queries in the DL ALCHOIQb that contains all three problematic constructors simultaneously. Provided that queries contain only simple roles, our result also shows decidability of entailment of (unions of) conjunctive queries in the logic that underpins OWL 1 DL and we believe that the presented results will pave the way for further progress towards conjunctive query entailment decision procedures for the Description Logics underlying the OWL standards.

記述論理は、例えばW3C OWL標準の論理的基盤を提供する知識表現形式です。データベースの標準クエリ言語である連言クエリは、記述論理知識ベースへのクエリ実行のための表現形式として近年大きな注目を集めています。連言クエリの含意を決定するための様々な手法が、様々な記述論理（DL）で利用可能です。しかしながら、OWL 1およびOWL 2 DLにおける名詞、逆役割、および数値制限の組み合わせは、これまで利用可能な手法では解決できない問題を引き起こします。我々はこの問題に取り組み、3つの問題のある構成子を同時に含む記述論理（DL）ALCHOIQbにおける連言クエリの和集合の含意に関する決定可能性の結果を提示します。クエリが単純な役割のみを含む場合、我々の結果は、OWL 1 DLを支える論理における連言クエリ（の和集合）の含意の決定可能性も示しており、提示された結果は、OWL標準の基礎となる記述論理における連言クエリ含意決定手続きのさらなる進歩への道を開くものと考えています。

Kalman Temporal Differences

Kalman Temporal Differences / カルマン時間差分

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDP) which produces biased estimates in the case of stochastic transitions. Than the eXtended KTD framework (XKTD), solving stochastic MDP, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. Related algorithms are experimented on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.

強化学習はスケーラビリティに欠けるため、オンライン値（およびQ）関数近似は、この10年間でますます注目を集めています。本稿では、サンプル効率、非線形近似、非定常性処理、不確実性管理といった特徴を持つ、カルマン時間差分（KTD）フレームワークという新しい近似スキームを紹介します。確率的遷移の場合に偏りのある推定値を生成する、決定論的マルコフ決定過程（MDP）用の最初のKTDベースのアルゴリズムを提供します。確率的MDPを解く拡張KTDフレームワーク（XKTD）について述べる。決定論的遷移と確率的遷移の両方の特殊なケースについて収束性を解析します。関連アルゴリズムを従来のベンチマークで実験します。これらは、発表された特徴を示しつつ、最先端技術と比較して遜色ない。

Implicit Abstraction Heuristics

Implicit Abstraction Heuristics / 暗黙的抽象化ヒューリスティック

State-space search with explicit abstraction heuristics is at the state of the art of cost-optimal planning. These heuristics are inherently limited, nonetheless, because the size of the abstract space must be bounded by some, even if a very large, constant. Targeting this shortcoming, we introduce the notion of (additive) implicit abstractions, in which the planning task is abstracted by instances of tractable fragments of optimal planning. We then introduce a concrete setting of this framework, called fork-decomposition, that is based on two novel fragments of tractable cost-optimal planning. The induced admissible heuristics are then studied formally and empirically. This study testifies for the accuracy of the fork decomposition heuristics, yet our empirical evaluation also stresses the tradeoff between their accuracy and the runtime complexity of computing them. Indeed, some of the power of the explicit abstraction heuristics comes from precomputing the heuristic function offline and then determining h(s) for each evaluated state s by a very fast lookup in a “database.” By contrast, while fork-decomposition heuristics can be calculated in polynomial time, computing them is far from being fast. To address this problem, we show that the time-per-node complexity bottleneck of the fork-decomposition heuristics can be successfully overcome. We demonstrate that an equivalent of the explicit abstraction notion of a “database” exists for the fork-decomposition abstractions as well, despite their exponential-size abstract spaces. We then verify empirically that heuristic search with the “databased” fork-decomposition heuristics favorably competes with the state of the art of cost-optimal planning.

明示的抽象化ヒューリスティックを用いた状態空間探索は、コスト最適計画の最先端の手法です。しかしながら、これらのヒューリスティックは、抽象空間のサイズが、たとえ非常に大きな定数であっても、何らかの値で制限されなければならないため、本質的に限界があります。この欠点を解消するため、我々は（加法的な）暗黙的抽象化の概念を導入します。この概念では、計画タスクは、最適計画の扱いやすい断片のインスタンスによって抽象化されます。次に、このフレームワークの具体的な設定として、扱いやすいコスト最適計画の2つの新しい断片に基づく、フォーク分解と呼ばれる設定を導入します。そして、誘導された許容可能なヒューリスティックを形式的かつ経験的に研究します。本研究はフォーク分解ヒューリスティックの精度を証明するものであるが、同時に、我々の実証的評価は、その精度と計算にかかる実行時複雑度とのトレードオフも強調しています。実際、明示的抽象化ヒューリスティックの威力の一部は、オフラインでヒューリスティック関数を事前計算し、評価された各状態sのh(s)を「データベース」を非常に高速に検索することで決定することから得られます。対照的に、フォーク分解ヒューリスティックは多項式時間で計算できるものの、その計算は高速とは程遠い。この問題に対処するため、我々はフォーク分解ヒューリスティックにおけるノードあたりの時間複雑度のボトルネックをうまく克服できることを示す。我々は、その抽象空間が指数関数的であるにもかかわらず、明示的抽象化概念の「データベース」に相当するものがフォーク分解抽象にも存在することを示す。次に、「データベース」のフォーク分解ヒューリスティックスを使用したヒューリスティック検索が、最先端のコスト最適計画と有利に競合することを経験的に検証します。

Theta*: Any-Angle Path Planning on Grids

Theta*: Any-Angle Path Planning on Grids / Theta*：グリッド上の任意角度パスプランニング

Grids with blocked and unblocked cells are often used to represent terrain in robotics and video games. However, paths formed by grid edges can be longer than true shortest paths in the terrain since their headings are artificially constrained. We present two new correct and complete any-angle path-planning algorithms that avoid this shortcoming. Basic Theta* and Angle-Propagation Theta* are both variants of A* that propagate information along grid edges without constraining paths to grid edges. Basic Theta* is simple to understand and implement, fast and finds short paths. However, it is not guaranteed to find true shortest paths. Angle-Propagation Theta* achieves a better worst-case complexity per vertex expansion than Basic Theta* by propagating angle ranges when it expands vertices, but is more complex, not as fast and finds slightly longer paths. We refer to Basic Theta* and Angle-Propagation Theta* collectively as Theta*. Theta* has unique properties, which we analyze in detail. We show experimentally that it finds shorter paths than both A* with post-smoothed paths and Field D* (the only other version of A* we know of that propagates information along grid edges without constraining paths to grid edges) with a runtime comparable to that of A* on grids. Finally, we extend Theta* to grids that contain unblocked cells with non-uniform traversal costs and introduce variants of Theta* which provide different tradeoffs between path length and runtime.

ブロックされたセルとブロックされていないセルを持つグリッドは、ロボット工学やビデオゲームで地形を表すためによく使用されます。ただし、グリッドエッジによって形成されるパスは、その方向が人工的に制約されているため、地形の真の最短パスよりも長くなる可能性があります。この欠点を回避する2つの新しい正確で完全な任意角度のパスプランニングアルゴリズムを紹介します。Basic Theta*とAngle-Propagation Theta*はどちらもA*のバリエーションであり、パスをグリッドエッジに制約することなく、グリッドエッジに沿って情報を伝搬します。Basic Theta*は理解と実装が簡単で、高速で、短いパスを見つけます。ただし、真の最短パスが見つかる保証はありません。Angle-Propagation Theta*は、頂点を拡張するときに角度の範囲を伝搬することにより、頂点拡張あたりの最悪ケースの複雑さをBasic Theta*よりも改善しますが、より複雑で、それほど高速ではなく、わずかに長いパスを見つけます。Basic Theta*とAngle-Propagation Theta*を総称してTheta*と呼びます。Theta*には独自の特性があり、これを詳細に分析します。実験的に、Theta*は、ポストスムージングされたパスを持つA*やField D* (パスをグリッドのエッジに制限することなくグリッドのエッジに沿って情報を伝播する、我々が知る唯一のA*のバージョン)よりも短いパスを、グリッド上のA*と同等の実行時間で見つけることを示します。最後に、Theta*を、非均一なトラバーサルコストを持つブロックされていないセルを含むグリッドに拡張し、パス長と実行時間の間に異なるトレードオフを提供するTheta*の亜種を導入します。

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback / どのクラスタリングが必要ですか？最小限のフィードバックで理想的なクラスタリングを誘導する

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author’s mood, gender, age, or sentiment. Without knowing the user’s intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user’s intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly-used sentiment datasets.

テキストクラスタリングに関する従来の研究は、主にトピックによるドキュメントのグループ化に焦点を当ててきましたが、ユーザーが作成者の気分、性別、年齢、感情など、他の次元でドキュメントをクラスタリングしたいと考えることも考えられます。ユーザーの意図を知らなければ、クラスタリングアルゴリズムは最も重要な次元でのみドキュメントをグループ化し、それがユーザーが望む次元ではない可能性があります。ユーザーが望む次元に沿って文書をクラスタリングする問題に対処するため、これまでの研究では、ユーザーの意図を手動で注釈付けしたデータから類似度指標を学習するか、クラスタリングプロセス中に人間がインタラクティブな方法で特徴空間を構築することに焦点を当ててきました。これらのアプローチに必要な類似度関数の微調整や関連特徴の選択における人間の知識への依存を軽減することを目指し、本稿では、ユーザーが少数の単語を調べるだけで、文書をクラスタリングする次元を容易に選択できる、新しいアクティブクラスタリングアルゴリズムを提案します。本アルゴリズムの有効性を、一般的に使用されるさまざまな感情データセットで実証します。

A Utility-Theoretic Approach to Privacy in Online Services

A Utility-Theoretic Approach to Privacy in Online Services / オンラインサービスにおけるプライバシーへの効用理論的アプローチ

Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by introducing methods to personalize services based on special knowledge about users and their context. For example, a user’s demographics, location, and past search and browsing may be useful in enhancing the results offered in response to web search queries. However, reasonable concerns about privacy by both users, providers, and government agencies acting on behalf of citizens, may limit access by services to such information. We introduce and explore an economics of privacy in personalization, where people can opt to share personal information, in a standing or on-demand manner, in return for expected enhancements in the quality of an online service. We focus on the example of web search and formulate realistic objective functions for search efficacy and privacy. We demonstrate how we can find a provably near-optimal optimization of the utility-privacy tradeoff in an efficient manner. We evaluate our methodology on data drawn from a log of the search activity of volunteer participants. We separately assess users preferences about privacy and utility via a large-scale survey, aimed at eliciting preferences about peoples willingness to trade the sharing of personal data in returns for gains in search efficiency. We show that a significant level of personalization can be achieved using a relatively small amount of information about users.

Web検索、ニュースポータル、eコマースアプリケーションなどのオンラインサービスは、大規模で異質なユーザーベースに高品質のサービスを提供するという課題に直面しています。最近の取り組みでは、ユーザーとそのコンテキストに関する専門知識に基づいてサービスをパーソナライズする方法を導入することで、パフォーマンスを向上させる可能性が浮き彫りになっています。例えば、ユーザーの人口統計、場所、過去の検索や閲覧履歴は、Web検索クエリへの応答として提供される結果を強化するのに役立つ可能性があります。しかし、ユーザー、プロバイダー、そして市民を代表して活動する政府機関の双方がプライバシーについて合理的な懸念を抱くことで、サービスによるそのような情報へのアクセスが制限される可能性があります。本稿では、パーソナライゼーションにおけるプライバシーの経済学を紹介し、考察します。これは、オンラインサービスの品質向上が期待される見返りとして、ユーザーが個人情報を常時またはオンデマンドで共有することを選択できるというものです。Web検索を例に挙げ、検索効率とプライバシーに関する現実的な目的関数を定式化します。そして、効用とプライバシーのトレードオフを、証明可能な形で効率的に最適化する方法を示します。ボランティア参加者の検索活動ログから得られたデータを用いて、本手法を評価します。大規模調査を実施し、プライバシーと効用に関するユーザーの選好を個別に評価します。この調査は、検索効率の向上と引き換えに個人データの共有を許容する意思に関する選好を明らかにすることを目的としています。比較的少量のユーザー情報を用いて、高いレベルのパーソナライゼーションを実現できることを示します。

An Effective Algorithm for and Phase Transitions of the Directed Hamiltonian Cycle Problem

An Effective Algorithm for and Phase Transitions of the Directed Hamiltonian Cycle Problem / 有向ハミルトン閉路問題の効果的なアルゴリズムと相転移

The Hamiltonian cycle problem (HCP) is an important combinatorial problem with applications in many areas. It is among the first problems used for studying intrinsic properties, including phase transitions, of combinatorial problems. While thorough theoretical and experimental analyses have been made on the HCP in undirected graphs, a limited amount of work has been done for the HCP in directed graphs (DHCP). The main contribution of this work is an effective algorithm for the DHCP. Our algorithm explores and exploits the close relationship between the DHCP and the Assignment Problem (AP) and utilizes a technique based on Boolean satisfiability (SAT). By combining effective algorithms for the AP and SAT, our algorithm significantly outperforms previous exact DHCP algorithms, including an algorithm based on the award-winning Concorde TSP algorithm. The second result of the current study is an experimental analysis of phase transitions of the DHCP, verifying and refining a known phase transition of the DHCP.

ハミルトン閉路問題（HCP）は、多くの分野に応用されている重要な組合せ問題です。これは、組合せ問題の本質的な特性（相転移など）を研究するために最初に用いられた問題の一つです。無向グラフのHCPについては、徹底した理論的・実験的解析が行われてきましたが、有向グラフのHCP（DHCP）については、限られた研究しか行われていません。本研究の主な貢献は、DHCPのための効果的なアルゴリズムの開発です。私たちのアルゴリズムは、DHCPと割り当て問題（AP）の密接な関係を探求・活用し、ブール充足可能性（SAT）に基づく手法を活用しています。APとSATのための効果的なアルゴリズムを組み合わせることで、私たちのアルゴリズムは、受賞歴のあるコンコルドTSPアルゴリズムに基づくアルゴリズムを含む、これまでの正確なDHCPアルゴリズムを大幅に上回ります。本研究の2つ目の成果は、DHCPの相転移の実験的解析であり、DHCPの既知の相転移を検証・改良しています。

Best-First Heuristic Search for Multicore Machines

Best-First Heuristic Search for Multicore Machines / マルチコアマシンのための最良優先ヒューリスティック探索

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.

現代のマルチコアプロセッサを活用するには、基本アルゴリズムの並列バージョンの開発が不可欠です。本稿では、共有メモリ環境における並列最良優先探索への様々なアプローチを比較します。本稿では、抽象化を用いて状態空間を分割し、頻繁なロックを必要とせずに重複状態を検出する新しい手法PBNFを提示します。PBNFは、スレッドをビジー状態に保つために必要に応じて投機的な拡張を可能にします。本稿では、本手法における潜在的なライブロック状態を特定して修正し、時相論理を用いてその正しさを証明した。本手法は汎用性が高く、準最適探索やいつでも実行可能なヒューリスティック探索にも容易に拡張可能です。8コアマシンを用いたSTRIPS計画、グリッドパスファインディング、スライディングタイルパズル問題における実験的比較において、PBNFを用いて実装されたA*、重み付きA*、およびいつでも実行可能な重み付きA*は、従来の並列探索提案の改良版よりも高速な探索を実現することを示す。

Intrusion Detection using Continuous Time Bayesian Networks

Intrusion Detection using Continuous Time Bayesian Networks / 連続時間ベイジアンネットワークを用いた侵入検知

Intrusion detection systems (IDSs) fall into two high-level categories: network-based systems (NIDS) that monitor network behaviors, and host-based systems (HIDS) that monitor system calls. In this work, we present a general technique for both systems. We use anomaly detection, which identifies patterns not conforming to a historic norm. In both types of systems, the rates of change vary dramatically over time (due to burstiness) and over components (due to service difference). To efficiently model such systems, we use continuous time Bayesian networks (CTBNs) and avoid specifying a fixed update interval common to discrete-time models. We build generative models from the normal training data, and abnormal behaviors are flagged based on their likelihood under this norm. For NIDS, we construct a hierarchical CTBN model for the network packet traces and use Rao-Blackwellized particle filtering to learn the parameters. We illustrate the power of our method through experiments on detecting real worms and identifying hosts on two publicly available network traces, the MAWI dataset and the LBNL dataset. For HIDS, we develop a novel learning method to deal with the finite resolution of system log file time stamps, without losing the benefits of our continuous time model. We demonstrate the method by detecting intrusions in the DARPA 1998 BSM dataset.

侵入検知システム（IDS）は、ネットワークの挙動を監視するネットワークベースシステム（NIDS）と、システムコールを監視するホストベースシステム（HIDS）という2つの高レベルカテゴリに分類されます。本稿では、両システムに適用可能な一般的な手法を提示します。我々は、過去の規範に適合しないパターンを識別する異常検出を使用します。どちらのタイプのシステムでも、変化率は時間の経過とともに（バースト性のため）そしてコンポーネントごとに（サービスの違いのため）劇的に変化します。このようなシステムを効率的にモデル化するために、連続時間ベイジアンネットワーク（CTBN）を使用し、離散時間モデルに共通する固定更新間隔の指定を避けます。通常のトレーニングデータから生成モデルを構築し、この規範の下での尤度に基づいて異常な動作にフラグを付けます。NIDSについては、ネットワークパケットトレースの階層型CTBNモデルを構築し、Rao-Blackwellized粒子フィルタリングを使用してパラメータを学習します。公開されている2つのネットワークトレース（MAWIデータセットとLBNLデータセット）で実際のワームを検出し、ホストを識別する実験を通じて、この手法の威力を示します。HIDSについては、連続時間モデルの利点を失うことなく、システムログファイルのタイムスタンプの有限解像度を処理するための新しい学習方法を開発します。DARPA 1998 BSMデータセットへの侵入を検出することでこの方法を実証します。

参考文献

Journal of Artificial Intelligence Resarch Vol. 39 (2010)

論文