Knowledge base completion 101

Knowledge base completion (KBC) is not a standard task in natural language processing nor in machine learning. A search on Google scholar results in only over 100 article containing this phrase. Although it is similar to link prediction, “a long-standing challenge in modern information science” (Lü & Zhou, 2011), it has received much less attention.

However KBC is potentially an important step towards natural language understanding and recent advances in representation learning have enabled researchers to learn larger datasets with improved precision. Actually, a half of KBC articles were published in or after 2010.

There isn’t a common definition of the task. In Socher et al. (2013), the machine was asked to classify triples as correct or corrupted (i.e. one of two entities was replaced by a random entity). In this article we will use the same formulation but instead of predicting hundreds of thousands triples extracted from WordNet and FreeBase, we will try to predict the relations between people in two small families.

The dataset

family trees - hinton 1986

Two isomorphic trees, the symbol “=” means “married to”. Adopted from Hinton (1986).

The data was taken from the classical paper Hinton (1986). 2 families (see figure) were described by binary relations between members: father (of), mother, husband, wife, son, daughter, uncle, aunt, niece and nephew. That makes 336 triples in total (I used a Prolog program to generate them).

To get you started more easily, I already divided them into training, validate and test set with a ratio of 8:1:1. In the training set, each positive example was repeatedly corrupted to create 120 negative examples. To maintain the balance, there were 120 replicas of each positive example. In other sets, only one negative was generated for each positive example. All files were encoded in compressed numpy format so to examine them closely, you can run:

import numpy as np
arr = np.load('train.npz')['arr_0']
print arr[:5,:]

The results should be like:

[[ 3  5  7  1]
 [ 4  6  7  0]
 [22 18  0  1]
 [ 7  5 30  1]
 [24 17 28  1]]

which means (entity #3, relation #5, entity #7, positive) and so on…

The model

When working in academia, it’s easy to come up with overly complicated models. People have been working on tensor multiplication in the hope for more interaction between entities but I’m not persuaded. The interaction between two entity vectors \vec{e_1} and \vec{e_2} is stronger in the sense that the product \vec{p}=\vec{e_1}W\vec{e_2} response to a change of \vec{e_2} in way that is strictly depended on \vec{e_1}. Say, p_i might increase or decrease when \vec{e_2} changes by the same amount depending on \vec{e_1}. However such interaction is very hard to interpret and might entangle different factors of the data instead of disentangle them.

Therefore, in this experiment I used a very simple model: a feedforward neural network with only one rectified linear hidden layer. A rectified linear unit can be thought of as a feature detector whose activation depends on how much the underlying state matches its favorite pattern. Different from sigmoid units, it doesn’t restrict itself to a valid probability.

The output layer was softmax and for the ease of implementation I used two output units. One of them emits the probability of a correct relation while the other corresponds to a corrupted one. The network learns to minimize the negative log likelihood:

\ell(x;\theta) = - \log (y_0 \hat{y}_0 + y_1 \hat{y}_1)

The implementation

Pylearn2 is an extremely useful toolkit to play around with neural network. Tutorials are available and there is also Vincent’s blog post which will help you dive deeper with less pain.

First thing first, we need a class representing our dataset. DenseDesignMatrix is, well, a dataset design based on dense matrices. The use of x_labels and y_labels here is crucial – it tell Pylearn2 to interpret matrix elements as labels instead of real values.

class KBCDataset(DenseDesignMatrix):
...
def __init__(self, which_set, home_dir, max_labels):
...
super(KBCDataset, self).__init__(
X=self._data[:, :-1], X_labels=max_labels,
y=self._data[:, -1:], y_labels=2
)

Model building in Pylearn2 is declarative and straightforward. One would specify a network layer by layer in YAML as below:

model: !obj:pylearn2.models.mlp.MLP {
layers: [
!obj:pylearn2.sandbox.nlp.models.mlp.ProjectionLayer {
layer_name: 'projection',
dim: 10,
irange: 0.3
},
!obj:pylearn2.models.mlp.RectifiedLinear {
layer_name: 'h',
dim: 10,
irange: 0.3,
},
!obj:pylearn2.models.mlp.Softmax {
layer_name: 'y',
n_classes: 2,
irange: 0.05,
binary_target_dim: 1,
},
],
...
},

The first layer doesn’t do any computation apart from replacing each label in input triples by a 10-dimensional vector. 3 vectors will be concatenated to make one vector of 30 dimensions. Our neural network actually works on this input instead of labels. irange tells Pylearn2 how to initialize parameters, for example, an irange of 0.3 means the parameters will be generated from a uniform distribution in the range [-0.3, 0.3] before the first training epoch. binary_target_dim makes training slightly more efficient by keeping expected output as labels instead of converting them into values.

We specify the training dataset by simply calling the dataset class constructor and passing appropriate parameters:

dataset: &train !obj:dataset.KBCDataset {
which_set: 'train',
home_dir: &data_dir 'dataset',
max_labels: &max_labels 40,
},

The rest of the configuration file is also very readable and can be found here.

To train the model, one simply runs train.py. To visualize the vector space, tsne.py is available.

The results

After about 50 epochs, the model reached 100% training accuracy and 75-78% accuracy on validate and test set. It is not bad at all because I didn’t try different values for hyper-parameters.

The visualization reveals some aspects of the vector space. First, the universe of the problem is divided into relations and persons (notice that nothing told the model about that, it learned from data). Second, two families are clearly separated. With some exceptions, females are to the right and males are to the left in each family. It seems that family and gender are the most informative dimensions in this problem.

Conclusions

In this article we learned how to perform knowledge base completion with Pylearn2. It is amazing that we can use distributed representation to perform a symbolic task and it is even more exciting to observe it by our own eyes. The very same principles apply to both this small experiment and full-blown, real-world application. And as the saying goes, a journey of a thousand miles begins with a single step.

All source code is available at Github.

References

Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society (Vol. 1, p. 12).

Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6), 1150-1170.

Socher, R., Chen, D., Manning, C. D., & Ng, A. Y. (2013). Reasoning With Neural Tensor Networks for Knowledge Base Completion. Advances in Neural Information Processing Systems, 926–934.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s