IJCAI 2018, Computer Games Workshop

IJCAI 2018 is being held in Stockholm from July the 13th to July the 19th as a part of the Federated AI Meeting. In this and other posts I will give some short comments on interesting things I’ve heard. If anything is wrong or misrepresented, please don’t hesitate to get in touch with me.

The first day I attended the Computer Games Workshop.

Revisiting the Arcade Learning Environment

Marlos Machado had an invited talk on the Arcade Learning Environment (ALE). ALE is currently a collection of Atari games that agents can interact with, but more generally it could be any type of arcade game.

Why Atari 2600 games? They are simple to simulate fast, varied in what to do, still interesting for humans, and very importantly it is fully free from experimenter bias since they were all done in the 80s. One especially interesting aspect of Atari games is that they are deterministic. Future dream is naturally to expand to other types of consoles.

One problem with research using ALE is that the evaluation approach is not standardized. For example: play one life or multiple lives; optimize per game or in general; how to count the amount of training data used; how to add stochasticity such as random frame skips for countering determinism. Without standardization it is impossible to compare results. More generally, how should results for learning be summarized. Best policy, training performance, area under graph, and so on.

Even though the Atari 2600 games are deterministic, this is not very good for general learning. In IJCAI 2015 an algorithm called The Brute was presented, that uses random actions and then changes choices in the paths using a dynamic programming algorithm. To make the platform more realistic ALE has changed to inject stochasticity by default using random input delays. This breaks The Brute algorithm.

We are scientists. If you give us a number to optimize, we are very happy to optimize it

Standardized set of benchmarks results to compare against is published, using all the agreed upon choices for the above mentioned questions.

The Atari 2600 had levers that could be used to change what game is played. For example, for Freeway (similar to Frogger) one can get versions with faster or longer cars. For Breakout a version gives partial observability by only flashing the blocks when something is hit. A new feature in ALE is that these levers are now also available, increasing the number and variety of games available.

Experimenting with the different modes and difficulties in Freeway, standard DQN learning performs much worse in more difficult versions. Transferring models between modes is even worse, if the learning has over-fitted to the specific mode. Combining learning on different modes is actually better. First training for half the frame training budget, and then continuing training in the more difficult mode. More advanced versions using transferring of parts of the learned network is even more effective.

Go

Golois

Golois is a Go-program that uses an architecture inspired by the original AlphaGo.

The policy network alone reaches 4d on the KGS server. The main difference for the policy network from AlphGo is that is is a residual network, meaning that the input is added to the layers after the convolutional layers. This type of architecture is inspired by computer vision research.

A value network for estimating the end game score for the black player is also constructed. It has 9 binary outputs for scores <= 180, 181, 182, …, >=189. The details for the architecture are in the paper.

The policy and value networks are then placed into a Monte-Carlo tree search without playouts, using only a value network for end node evaluations. The MCTS is guided by PUCT, which is a variant of UCT when a priors are available. This combination reaches 8d on KGS.

The main contribution of the paper was an improvement of how to construct the value network using spatial average pooling, which reduced the training time and size significantly, without lowering quality.

Impact of knowledge and search in MCTS for Go

This paper interestingly doe not use deep neural nets, contrary to most modern papers inspired by AlphaGo. The presenter jokes that it is also probably the last such paper.

The motivation is to find out why MCTS works so well with Go, and in particular how knowledge (priors) and search interact in the results for MCTS for Go.

Knowledge in Go comes in several forms

Playout policies (!M moves per second)
Simple features
Small and large-scale patterns
Neural networks

By taking an existing engine (Fuego) and using different parts of it (just playout policy, base MCTS, MCTS with some knowledge, and so on), and comparing the engines with professional games. The measure is if the variant can predict the professional move. The playout policy has some basic knowledge, and can predict 1 out of five moves, typically simple forced moves. Interestingly, the MCTS with no knowledge becomes better with more search. The MCTS versions with knowledge increases initially, and then drops off.

However, that does not mean that the MCTS version with knowledge were worse players. As expected, the knowledge based search are actually better players. Some interesting diagrams showing the differences can be seen in the paper.

One explanation of the interesting difference in move prediction is that the engines optimize for winning, not for final score. This means that in the end game, the engine plays more different from humans the better it gets.

One interesting observation is that feature knowledge makes it more probable that the engine plays local moves, and avoids making so-called tenuki moves (moves that are in a different area).

After the paper was written, the authors made some experiments with Leela Zero, a very strong go program that uses deep neural networks. Comparing the prediction accuracy, Leela Zero has 50% base accuracy of prediction vs. 30% accuracy for Fuego. Stronger trend of less prediction accuracy with more search. Also, Leela Zero makes tenuki moves about the same amount as professional players.

Quotes:

Neural networks, they are not missing much

and:

With more search, Leela Zero disagrees more with humans. Since Leela Zero is better than humans, this is probably not good news for us.

General Game Playing

General game playing (GGP) is quite interesting, with a language for describing games. Two presentations on GGP were given.

GGP uses the Game Description Language (GDL). There is no canonical format for a game, the language is a limited programming language similar to Prolog.

Incomplete information

Recently incomplete information was added, which makes the problem a lot more complex. I did not know that GGP had been extended for this, so that was quite interesting. The first paper shows a new method to solve these types of games using something called iterative tree search. One interesting feature (among several) is that the new algorithm could reasonably model opponents and their strategy. the draw-back is that the algorithm only works for very small games with small trees.

Statistical GGP Game Decomposition

Games can be compound, meaning that they are composed of several parts. For example, serial/sequential games, asynchronous game, playing multiple games, and so on. The paper is about decomposing compound games based. Naturally, a decomposition is very desirable since it lowers the complexity.

The results are interesting, with most games in a large test-set being decomposable in less than a minute. Furthermore, it is possible to decompose games dynamically during game-play.

Textworld

Textworld is a framework for experimenting with text based games. Some challenges in general for AI:

Language understanding
Common sense
Memory
Planning
Experimentation

From Hitch hikers guide to the galaxy a common sense interaction at the start of the game is:

Game: It is Pitch black
User: Turn on light

Reinforcement learning challenges

Compositional actions
Partial observability
Sparse rewards
Exploration (trial and error)

The paper shows a generator for games is produced that can produce many different games with varying parameters. Objects are generated as the elements in the world. The generator uses linear logic action rules to model that game state changes, since linear logic can “consume” resources. The game state is then simply a collection of predicates. Given the game state, a context-free grammar generator to generate the game text description.

Generating Stable, Building Block Structures from Sketches

The paper is about taking a sketch and creating stable structures for use in Angry Birds. The main process has a number of steps

Polygon splitting
Stability analysis / structure adjustment
Grouping Block Sets (height and shape)
Create Composite Block Shapes
Block Scaling
Select Block Shapes
Structure Evaluation

The criteria for polygon splitting is to be stable on flat ground, which is uncommon. Splitting is done by finding corners in the sketch and snapping the found corners to similar values. Corners that are convex are connected horizontally.

For stability a custom approximate algorithm is used instead of expensive simulation. The algorithm has 98% accuracy compared with the simulation. The algorithm can also identify specific unstable blocks, so that extra support can be added.

Composite blocks is about combining the 8 basic block shapes from Angry Birds to create larger structures. However, combining parts is hard (knapsack) and splitting larger blocks into smaller may also affect stability.

One interesting feature is that sketches are un-scaled. To solve this, the system gives five different solutions on sizes.

To evaluate the system, actual Angry Birds structures were shown to participants who sketched the structures. Then structures were generated. Quality depended a lot on how good the sketches were, the result generally looks very much like the sketches.

Towards Embodied StarCraft II Winner Prediction

Is it possible to determine the winner in StarCraft II (SCII) 8 minutes in? 20 minutes in? SCII is a very complex RTS that recently got an official interface from DeepMind in collaboration with Blizzard. Standard bots are very bad at the moment. A method that can predict the winner can be useful in creating bots, for example when implementing MCTS for SCII.

The title of the paper contains the term embodied, which in this context means that the prediction is done based on the observations of one player, not total game state observation. Fortunately, it was not about putting the classification on a computer inside a physical robot :-)

The data-set uses around 5k games of Terran vs. Terran. No spatial data is used. To extract the data, the games are re-played and values are extracted during the games. This is quite a time-consuming task, taking several minutes per game. The methods are “standard” AI method, such as decision trees, K-NN neighbor classification, random forests, and very simple shallow neural networks.

Deep neural nets were tried, but did not give any good results. They did however perform very well (better than the other methods) on deciding winner in the end-state.

In the end, it is possible to predict the winner with between 65% and 70% accuracy between 10 and 20 minutes in. The best methods used random forests, but decision trees are very good and are also understandable for a SCII expert.

What’s In A Game?

The paper uses logs/graphs of learning for different atari games as a measure of the games. These logs are then clustered hierarchically, and the clustering distance shows that similar games have similar learning.