Why is inference better/easier in a knowledge graph?



I’m completely new to knowledge graphs, so apologies for the noob question.

I’m trying to understand how inference works in practice and why inference is easier/better in a knowledge graph than in a traditional relational database. I’ve read that inference can be based on type (e.g. a car is defined as a vehicle, and a vehicle can be driven, so a car inherits the “can be driven” capability) or on defined rules (if x is in y and y is in z, then x is in z). Why can’t this be achieved just as easily in a relational database? Is it because organising the data as a graph makes these queries easier?

Again sorry for the basic question, I hope it makes sense. I’ve been looking around but can’t get my head around how inference works.



This is a very good and open-ended question.

If we are interested purely in performance, your question basically boils down to a relational vs. graph database comparison, which is a widely discussed topic. That gives a short answer, but not much insight. The key point is how inference and reasoning processes work and how they can be realised in a database context.

Let’s focus on logical deduction: a process that derives a logically certain conclusion from a set of premises. It can be viewed as an attempt to find new relations (in the graph case, more generally, connections) between entities already present in the database. This process typically involves many-to-many or otherwise complex relationships, because the sought relation between two entities can easily span multiple attributes (the entities can be several hops apart).

In a relational database, querying these kinds of relations typically requires introducing multiple join tables that hold the foreign keys of the participating tables. For complex or numerous relations this is often overkill. Moreover, in an inference context, even for moderate volumes of data a single input query may require executing thousands of subqueries to check that specific premises are satisfied.
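To make that concrete, here is a minimal sketch (a hypothetical located_in table, SQLite just for convenience) of how even the simple transitivity rule from the original question turns into a recursive self-join; with several relation types involved, every extra hop means yet another join:

```python
import sqlite3

# Hypothetical schema: a single located_in(child, parent) table encoding "x is in y".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE located_in (child TEXT, parent TEXT);
    INSERT INTO located_in VALUES
        ('desk', 'office'), ('office', 'floor3'), ('floor3', 'building');
""")

# "What is the desk in?" needs a recursive CTE that keeps joining the table onto itself.
rows = conn.execute("""
    WITH RECURSIVE ancestors(place) AS (
        SELECT parent FROM located_in WHERE child = 'desk'
        UNION
        SELECT l.parent FROM located_in l JOIN ancestors a ON l.child = a.place
    )
    SELECT place FROM ancestors;
""").fetchall()

print([r[0] for r in rows])   # ['office', 'floor3', 'building']
```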

The problem of high-cost joins largely disappears in graph databases: encoding the data as a graph lets you exploit the multi-connectedness required during reasoning, because complex and many-to-many relationships can be queried naturally by traversing the structure.
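The same question asked of a graph is just a walk along edges. A minimal sketch, holding the graph as an adjacency list (made-up data mirroring the table above):

```python
from collections import deque

# Directed edges meaning "is in": the same facts as the located_in table above.
graph = {"desk": ["office"], "office": ["floor3"], "floor3": ["building"]}

def reachable(graph, start):
    """Everything transitively reachable from `start`, i.e. every place it is in."""
    seen, queue = set(), deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen

print(reachable(graph, "desk"))   # {'office', 'floor3', 'building'}
```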

When it comes to reasoning, though, a plain graph database is not enough. For a generic inference engine, one capable of answering queries automatically, the data must be interpretable unambiguously: a higher-level structure has to be imposed on the graph, and its semantics (the meaning of that structure) has to be defined.
A structured graph database with explicit semantics gives rise to what is referred to as a knowledge graph, which combines a knowledge base in graph form with a reasoning engine. Failing to provide an explicit structure to the graph either limits the expressive power or leads to hand-wavy, hard-coded inference procedures.
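As a toy illustration of what “explicit semantics plus a reasoning engine” buys you (made-up triples and a single hand-written inheritance rule): once the structure and the rule are explicit, the engine derives new facts instead of someone hard-coding them.

```python
# Toy knowledge graph as (subject, predicate, object) triples -- made-up data.
facts = {
    ("car", "is_a", "vehicle"),
    ("vehicle", "capable_of", "being_driven"),
}

def infer(facts):
    """Forward chaining with one rule: a type inherits the capabilities of its supertypes."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {
            (sub, "capable_of", cap)
            for (sub, p1, sup) in derived if p1 == "is_a"
            for (s2, p2, cap) in derived if p2 == "capable_of" and s2 == sup
        }
        if not new <= derived:
            derived |= new
            changed = True
    return derived

print(("car", "capable_of", "being_driven") in infer(facts))   # True
```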

This is only a very basic top-level explanation. Shoot us more questions if anything is unclear.


Hi! This is a great question. I can give you my own take on this but ultimately it’s up for debate.
The short version is that graphs are a closer semantic match for deduction, and that brings some computational benefits.

Historically, inference has been performed on data represented relationally and in predicate form. If you have “parent(adam, bob)” and “likes(adam, pizza)” in your relational knowledge base, you don’t have a quick way to infer things about adam: you 1) need to know about all the tables involved and 2) need to make some kind of join over those tables. If these facts are in a graph, you can easily navigate from adam with a simple lookup. Graphs also make it fairly easy to implement an algorithm for deductive inference called resolution.
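A rough sketch of that difference (hypothetical data): with one table per predicate you have to know about and scan every table that might mention adam, whereas an entity-centric store hands you all of adam’s edges in a single lookup.

```python
# Predicate-centric ("one table per relation"): collecting everything about adam
# means knowing all the tables and scanning each of them.
parent = [("adam", "bob")]
likes = [("adam", "pizza")]
about_adam = [("parent", b) for a, b in parent if a == "adam"] + \
             [("likes", x) for a, x in likes if a == "adam"]

# Entity-centric (graph-style): adam's outgoing edges live under the adam node.
nodes = {"adam": {"parent": ["bob"], "likes": ["pizza"]}}

print(about_adam)      # [('parent', 'bob'), ('likes', 'pizza')]
print(nodes["adam"])   # {'parent': ['bob'], 'likes': ['pizza']}
```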

That said, inference can be performed in many ways, and in some cases a relational representation might be more appropriate. A very generic (and incomplete) way to summarise it is: knowledge graphs give you entity-centric inference while relational is relationship-centric.

As a side note, graphs help with other kinds of queries that you might loosely call inference. For example: how many clusters of people do we have in our data? What’s the average fan-out for certain entities? This is not deductive inference, but it gives us useful information. On top of that, we can use this extra information to enhance deduction (if our data is distributed, we can keep data that is likely to be accessed in the same inference on the same node; or we can use information about the fan-out to optimise the way we do deduction).
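For instance (a sketch using networkx on toy data), counting clusters and computing the average fan-out is a couple of lines:

```python
import networkx as nx

# Toy social graph -- made-up data, just to illustrate the kind of query.
G = nx.Graph()
G.add_edges_from([("adam", "bob"), ("bob", "carol"), ("dave", "erin")])

clusters = list(nx.connected_components(G))                         # groups of connected people
avg_fan_out = sum(dict(G.degree()).values()) / G.number_of_nodes()  # average connections per entity

print(len(clusters))   # 2
print(avg_fan_out)     # 1.2
```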

Someone else from the team will chip in with their opinion and we are probably going to write a blog post about the topic!


Thanks @kasper and @domenico, these are great answers to what was a very open-ended question. I’m happy you did not answer in terms of technical performance only.

It clarifies a lot to explicitly define a knowledge graph as
= graph database + explicit semantics
= knowledge base + reasoning engine

Also, summarising graph vs. relational inference as ‘knowledge graphs give you entity-centric inference while relational is relationship-centric’ actually makes it clearer: I can understand why inference would be easier when navigating between individual entities (near one another) in a graph, rather than trying to aggregate all the attributes related to a given entity from different tables (as in a relational database). I suppose it would also scale up better?

Thanks again for the time you took to answer the question. Looking forward to reading the blog post.


And what about the speed and effectiveness of relational inference in the case of traditional Prolog?

For example, the parent(adam, bob) and likes(adam, pizza) predicates you used as an illustration are native Prolog clauses, and modern systems (like XSB), including compiling ones, give very notable inference speeds.

PS: but I have never heard of a Prolog system able to run in a cluster on huge data