IJCAI 2018, first day

IJCAI 2018 is being held in Stockholm from July the 13th to July the 19th as a part of the Federated AI Meeting. In this and other posts I will give some short comments on interesting things I’ve heard. If anything is wrong or misrepresented, please don’t hesitate to get in touch with me.

On the fourth day the main IJCAI conference started. The conference has many parallel tracks, comments here are from sessions that I thought might be interesting for me.


In the opening remarks, we were told that in total the FAIM conferences had more than 7500 registrants, including the more than 2500 IJCAI attendees. For IJCAI, there were 3470 submissions (a record number for IJCAI), and 710 (21%) were accepted. 20 area chairs, 500 senior program members, and more than 2k program committee members.

A historic context from the opening remarks is that in a panel around 70 years ago, Alan Turing said that it is certainly possible to make a true artificial intelligence, but that it would take at least 100 years of dedicated work.

Learning World Models: the Next Step Towards AI

Video is available

Yann le Cun is a very well-known researcher in machine learning, and is currently at Facebook AI.

Most modern cases of machine learning in AI is currently done using supervised learning, where lots and lots of examples are used. Deep learning has improved this by allowing training both feature extraction and the classifier in conjunction, instead of hand-coding feature extractors. The ideas are old and from the 50s.

The original idea was to use gradient descent/hill climbing, but originally the problem of local minima was would be a problem. In high-dimensional spaces, this turns out not to be a problem. Neural nets have strange behavior where they can be quite a bit easier to train when they are (much) larger than they need to be. While it is possible to use full layers, it is most common to use mostly convolutional layers, and also to re-use weights.

Neural nets were successful for some applications in the mid 90’s (AT&T check reader), but then fell out of favor in the research community. One potential reason is lack of software and data-sets, and also very expensive training.

Cool application in mid 00s was using neural nets for training steering of vehicles. In particular, the robot wanted to classify an image into drivable ground. In order to get training data, more expensive stereo-vision data was collected to classify pixels which could then be used for training.

In 2010 they submitted a paper using convolutional nets for image segmentation to a image recognition conference. The paper was rejected because they could not believe that a method they had never heard of could giving such good results. This has rapidly changed. A few years later, Tesla licensed the technology for use in their cars. A few years later, the first results that used deep neural nets for object classification was published. The error rates have gone down from around 25% using classical techniques to 16% using the first neural nets and today just a couple of percent.

Todays neural networks use around 50-100 layers. For example, at Facebook all the images uploaded are processed using neural nets to tag events, to tag friends, and to generate summaries for visually impaired, and to find objectional content.

So, why do we need so many layers? The world is compositional and has many details. This compositionality is naturally represented by many layers.

A recent experiment using 3.5G images from Instagram and training a neural network against the hashtags used. Removing the tag classification from the network and then re-using the “feature extraction” part of the network for a classical image classification case improved results. This implies that it is possible to actually learn in some sense models of the world.

Modern image segmentation can classify images, separate different similar objects from each other, and also to find key points on humans. This made it possible to on a mobile device real time detect skeleton outline movements on human.

It used to be that speech recognition, translation, and image recognition used completely different techniques. This is changing to everyone using neural nets and people can use ideas from one field in another more easily.

All of this is fantastic, but what is missing? Neural nets and deep learning is only perception, it does not include reasoning. Many interesting research directions for combinating neural nets and reasoning.

Another problem is that reinforcement learning for deep neural nets uses a lot of data. This means that it works well for games, but it is hard to use in the real worlds. For example, using RL to train a real car would not work, since the car would have to run of a cliff thousands of times before it learns not to do that. Humans can learn to drive in a few hours without ever crashing.

We need to have smarter learning that has “common sense”.

This means that we need true world models. As a simple example, we can not today make a robot that loads or empties a dishwasher.

Babies learn by observation, essentially without interaction. Studies are made at which age humans learn different concepts. For example, 9 month olds understand concepts such as gravity and inertia.

A potential concept is to use self-supervised learning. For example, predict the future from the past, or to “forget” the past and try to predict it from the present. This allows learning from observations. Comparison

  • Reinforcement learning has low feedback content (was this good or bad)
  • Supervised learning has medium feedback content (whatever has been labeled for each instance)
  • Self-supervised learning has high feedback content (all of the data)

So, could self-supervised learning lead to common sense. Perhaps the possibility to learn from observation is that is common sense.

The next step is to also make predictions. Predictions are needed to make plans. Since the world is uncertain, we also need to make predictions with uncertainty. One potential avenue for this is to use adversarial training.   Combining semantic segmentation of images with prediction models for dashboard cameras in cars has many interesting applications. For example, it is possible to predict that a pedestrian crossing the street will continue crossing the street. Integrating this with planning allows for very interesting possibilities.


  • Science drives technology, but also vice versa
  • A lot of scientific domains have been born from the study of technological artifacts
    • Telescope -> optics
    • Airplane -> aerodynamics
    • Steam engine -> thermodynamics
    • Calculators -> computer science
    • Telecommunications -> information theory
  • If AI and neural nets are the technology, what is the science?

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Instead of using reinforcement learning (RL) for playing Atari games, OpenAI published research using another technique called evolution strategies (ES).

The neural nets for playing contain 1.7M edges. Instead of using gradients as in RL, ES treats the edge weights as a black box. The idea is to just create versions of current state, find the best versions, and create a weighted version. This is then iterated, as a normal local search algorithm.

The paper simplified the method used from the original version (avoiding all assumptions on the network), and showed that it still worked ok, surprisingly.

An interesting difference is that the system only optimizes for total reward, which means that the behaviors are not as “natural”. One really interesting case was that an infinite score bug was found in Q*bert that could be replicated on a real Atari.

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

This is the same as the presentation in the first days workshop.

MCTS-Minimax Hybrids with State Evaluations

MCTS is a kind of best-first search algorithm, typically guided by simulations playouts. Minimax on the other hand try to explore the whole tree.

Minimax does not think too deeply on promising things, while MCTS might miss traps that are close to the root. Classically (pre AlphaZero), MCTS has been better for games like Go and minimax style algorithms for games like Chess with many catastrophic traps.

Is it possible to combine the strengths? The idea is to integrate shallow minimax searches into MCTS instead of evaluation function calls.

MCTS variants with informed rollouts, informed rollout cutoffs, and informed node priors. The evaluation function for the three variants was changed to a reasonable minimax version.

The variants were tested on breakout and Reversi. The best version used informed priors, and triggering a minimax search when visiting a node a certain number of times. The idea is that the minimax trees protected well against traps.

The paper is quite long (39 pages), so lots of details skipped. Seems very interesting and worth studying.

Knowledge Representation and Agents: Games, Decision, Social Choice

Ceteris paribus majority for social ranking

When ranking groups of individuals with a power relation, some groups may not be similar according to the relation. The paper proposes that group rankings where the groups differ only in one member of the group can be used to extract an ordering for the group members. Blindly using this for ranking groups that are otherwise similar according to the power relation might lead to cycles, so some restrictions are needed.

Apart from the technical contributions made (axioms, formal definitions, theorems, and so on), the idea seems interesting to explore.

Related: much research on extending rankings of individuals to rankings of teams, while this essentially does the inverse.

An Efficient Algorithm To Compute Distance Between Lexicographic Preference Trees

Lexicographic preference tree (LP-Tree) are used to represent preferences. Can be used for many things, such as market research, planning/strategy, behavior analysis. LP-Trees is a way to have hierarchical choices on preferences that is ordered with the most important choice at the root. Note that the left and right sub-trees of a node may have different order of importance for the variables ranked on.

Unfortunately, computing the induced rankings from LP-Trees is difficult since the number of outcomes to compare is exponential. The article gives a way to compute the distance between two trees using a tree traversals.

My reflection knowing nothing about LP-Trees before, is that it would be interesting to turn an LP-Tree into a Reduced Order Binary Decision Diagram. Extracting the order might explode the size of the diagram, but could give nice other properties and be useful for implementation.

Game Description Language and Dynamic Epistemic Logic Compared

GDL-III is used for general game playing. It has intuitive modeling but hard semantics. Dynamic epistemig logic (DEL) is used in epistemic planning. It has intuitive semantics, but hard to model. The article presents ways to translate between the two. Unfortunately, the translation from GDL-III to DEL required extending and also getting an exponential blow-up.

Goal-Based Collective Decisions: Axiomatics and Computational Complexity

Goal based collective decision is about formalizing the problem of finding a common ground among agents that have different goals. This is modeled as propositional formulas over binary decision variables.

The base case is just to check if the goals have any common model. In cases where no common model exists, the paper uses approval voting or the agents (agents get to vote on the model) using different voting methods.

The computational complexity is quite high (P^NP an PP-hard for two different methods). Given the high complexity, strategic behavior by the agents may be interesting to study.


Interactive, Collaborative Robots: Challenges and Opportunities

Danica Kragic from KTH gave the second invited talk.

Current state of the art is for robots to pick parcels as is done in the Amazon pick challenge. The future goal is for computers that can understand their environment and the needs of humans. For example, a robot assistant that can help in a kitchen based on visual cues. Just at simple task like cutting bread is very hard to do currently. Amount of force to use, how to cut, how to hold, and so on are very hard. Given a robotic assistant, it is also important to make the robot interact with humans in a natural manner. This requires predicting human behavior, interacting physically, communication between robot and human, and so on.

Replicating the ability of a human to grasp and handle objects is very challenging. One issue is just mechanical, it is hard to build good fine-motion robotic hands that can also apply enough force. Some examples using three-finger hands shown where humans push/pull on objects the hand is holding and the hand re-adjusts its grasp. Extracting 6-dof tracking of hands as they interact with object. This data is then extracted and used in various ways. One approach is modeling different hand designs and checking which parts of human hand motion they can perform.

Early Career 2

Formal Analysis of Deep Binarized Neural Networks

Adversarial input to neural networks can make a neural network give a completely wrong answer, without a human being able to see the difference. Given this problem, nina Narodytska presented her research direction into analyzing neural networks with the goal of actually verifying that a network for example does not have adversarial inputs.

The main idea is to translate a binarized neural network into a SAT formula. This can then be combined with input values and a “perturbation”, and requiring that that the output should flip based on the perturbation. If the formula is satisfiable, then that represents an adversarial example, and if the formula is unsatisfiable then there is no such input.

While the formula can be dumped into a black-box SAT solver, exploiting the structure of the network for a smarter search is very beneficial.

Reasoning about NP-complete Constraints

Three friends are playing footballtennis, where two play and the third waits and then plays the winner.

  • Hazard played 10 games
  • Modric played 14 games
  • Mbappé played 18 games

Who lost the second game?

When solving constraint programming problems global constraints are crucial. For the above riddle, the sequence constraint can be used to model the above riddle. The constraint can be used to limit the sum of variables on a sliding window along a sequence of variables.

The model for the above riddle uses variables saying Hazard skipped the game. There are (10+14+18)/2 games. Hazard must therefore have skipped 11 games. He can not skip more than two games in a row, which is efficiently modeled by the sequence constraint with a window size of two.

Another example of global constraints was the Philae lander that landed on a comet. In order to solve the problem, a specialized global constraint that helped model the amount of data storage available. This gave both a better solution and also changed the solution time from hours to seconds.

In his presentation, Emmanuel Hebrard argued for the need to also focus on global constraints for NP-complete global constraints. For such constraints, one needs to focus on incomplete propagation algorithms; domain consistency is not possible in reasonable time.

Both interesting generalizations of the sequence constraints and the special constraint for Philae are NP-hard.

A nice thing about handling such hard problems as global constraints is that it is possible to combine the constraint with other constraints.

A concept called kernels can be used similarly to domain consistency for characterizing the propagation of NP-hard constraints.

Demo: Solving Sudoku with Consistency: A Visual and Interactive Approach

A nice web interface showing how propagation for Sudoku proceeds. The site uses the AC/GAC/SGAC/… terminology and solver styles. The propagation is visualized nicely grouped along the 27 different all_different constraints a 9x9 Sudoku is normally modeled with.

The interface is done in React and Redux, and the solver is implemented in TypeScript so the solution runs in the web-browser.