On Link Prediction in Knowledge Bases: Max-K Criterion and Prediction Protocols

Published in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Errata

The Theorem 1 in the paper is false! We are sorry for this! We only realized the mistake after submitting the camera-ready version. In the conference when I gave the talk, I explicitly pointed this out.

Reason: Suppose for a link prediction task, there are $T$ correct answers and $1<T<k$. When we sample $k$ answers from the oracle distribution, there may be duplicate answers. The worst case is all these sampled answers are the same. This would result in the expected recall being less than the optimal value in Lemma 1. So does the expected $F1$.

Hence the sampling protocol is not optimal in an average sense and cannot guarantee that when a model is optimal, it will achieve the theoretical max-k precision, recall and F1 using this protocol. However, the greedy protocol, derived from the sampling protocol, is optimal. To see this, a proof is given below.

Although the theorem 1 is false, the main conclusion in the paper still holds true.

Abstract

Building knowledge base embedding models for link prediction has achieved great success. We however argue that the conventional top-k criterion used for evaluating the model performance is inappropriate. This paper introduces a new criterion, referred to as max-k. Through theoretical analysis and experimental study, we show that the top-k criterion is fundamentally inferior to max-k. We also introduce two prediction protocols for the max-k criterion. These protocols are strongly justified theoretically. Various insights concerning the max-k criterion and the two protocols are obtained through extensive experiments.

Jiajie Mei, Richong Zhang, Yongyi Mao, and Ting Deng. 2018. On Link Prediction in Knowledge Bases: Max-K Criterion and Prediction Protocols. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). ACM, New York, NY, USA, 755-764. DOI: https://doi.org/10.1145/3209978.3210029