Load relationships while loading the graph data


#1

Hi All,
I am able to migrate a CSV file into Grakn using templates. I would like to define relationships based on logic applied to the CSV data. For example, each row of the CSV file shows an employee and their supervisor, and I am trying to define an employment relationship with employee and supervisor as roles. I would like to do this while loading the CSV data with templates. Can anyone help me with this?

I tried loading the relationships after I had loaded the entity data, by running this Graql query:

match isa emp_table has ID $e isa emp_table has SUPERVISOR_ID $s; offset 0; limit 50;
insert (employee: $e, supervisor: $s) isa employment;

However, I am getting the following errors:
The type [ID] of role player [12476574] is not allowed to play RoleType [employee]
The type [SUPERVISOR_ID] of role player [4527834] is not allowed to play RoleType [supervisor]

I also declared the roles that the entity needs to play:

emp_table sub entity
plays employee
plays supervisor
has ID
has SUPERVISOR_ID;

employment sub relation
relates employee
relates supervisor;

employee sub role;
supervisor sub role;

Please let me know what needs to change.
Thank you


#2

Hi Sudharshan,

I believe the issue is with your ontology. You currently model each row of your CSV file as an entity (emp_table sub entity) with two resources attached (has ID and has SUPERVISOR_ID). Your query then tries to add the employment relationship between the resources ID and SUPERVISOR_ID, rather than between entities.

Your ontology, however, specifies that only the entity emp_table can play the roles in the employment relationship. You actually need to specify that the resources themselves can play the roles in the relationship. If you use the ontology below then your query should work.

insert 
emp_table sub entity 
has ID 
has SUPERVISOR_ID; 
 
ID sub resource datatype string 
plays employee; 
 
SUPERVISOR_ID sub resource datatype string 
plays supervisor; 
 
employment sub relation 
relates employee 
relates supervisor; 
 
employee sub role; 
supervisor sub role;
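To make the rule concrete: the errors in post #1 arise because each role player's type does not declare plays for the role it is given. The Java below is a deliberately simplified, hypothetical model of that check (the class and method names are made up, it ignores type inheritance, and it is not Grakn's implementation), comparing the original ontology with the one above.

```java
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the type check behind the errors in post #1:
// a concept may only fill a role if its type declares "plays <role>".
// Illustration only; not Grakn's actual implementation.
class RoleCheck {
    // Ontology from post #1: only the entity emp_table declares the roles.
    static final Map<String, Set<String>> ORIGINAL = Map.of(
            "emp_table", Set.of("employee", "supervisor"));

    // Revised ontology: the resources themselves declare the roles.
    static final Map<String, Set<String>> REVISED = Map.of(
            "ID", Set.of("employee"),
            "SUPERVISOR_ID", Set.of("supervisor"));

    // True if a role player of type `type` may fill `role` under `schema`.
    static boolean canPlay(Map<String, Set<String>> schema, String type, String role) {
        return schema.getOrDefault(type, Set.of()).contains(role);
    }

    public static void main(String[] args) {
        // In the failing query, $e is an ID resource and $s is a SUPERVISOR_ID resource.
        System.out.println(canPlay(ORIGINAL, "ID", "employee"));              // false
        System.out.println(canPlay(ORIGINAL, "SUPERVISOR_ID", "supervisor")); // false
        System.out.println(canPlay(REVISED, "ID", "employee"));               // true
        System.out.println(canPlay(REVISED, "SUPERVISOR_ID", "supervisor"));  // true
    }
}
```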

Good luck!

Sheldon


#3

Hi Sheldon,
Thank you very much for the quick response.
Let me explain what I am trying to do. I am trying to implement an ontology in which ‘ID’ and ‘SUPERVISOR_ID’ are resources of ‘emp_table’.

I want to load the data in such a way that I can traverse from an employee instance to their supervisor instance.
Please let me know what needs to change.

Thank You


#4

Hi Sudharshan,

If I understand correctly, your end goal is to recursively determine a person's chain of supervisors. I think the problem you are having at the moment is with your data model. The entity you have defined, emp_table, actually represents a row in your CSV file, with the information from each column held in the resources ID and SUPERVISOR_ID.

I would suggest a model more like this:

insert

# the main entity is now a person instead of a row in the input data
person sub entity
    key ID
    plays employee
    plays supervisor;

# people have a unique identifier
ID sub resource datatype string;

# people can take part in the employment relationship
employment sub relation
    relates employee
    relates supervisor;

employee sub role;
supervisor sub role;

Consider the example data:

insert

# people
$a isa person has ID "1";
$b isa person has ID "2";
$c isa person has ID "3";
$d isa person has ID "4";

# supervisors
(employee: $a, supervisor: $b) isa employment;
(employee: $b, supervisor: $c) isa employment;
(employee: $c, supervisor: $d) isa employment;

Now you could execute the query:

match
    $a isa person has ID "1";
    (employee: $a, supervisor: $b);
    (employee: $b, supervisor: $c);
    $b isa person has ID $d;
    $c isa person has ID $e;
    select $d, $e;

and you would get this answer:

$d value "2" isa ID; $e value "3" isa ID;

With respect to importing the data, the format of the CSV file unfortunately means that you will have to deal with duplicates: the same person can appear as an employee in one row and as a supervisor in several others. It would help a lot if you could load all of your people in advance and then migrate the CSV file with the supervisor information. You could then even write a reasoner rule to infer the information about supervisors.
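The "load all of your people in advance" step can be sketched as a pre-processing pass. This is a hypothetical Java sketch, assuming each CSV row has the form ID,SUPERVISOR_ID with an empty supervisor at the top of the hierarchy; it collects every distinct ID from both columns and emits one Graql insert per person, in the style of the example data above. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-processing step: collect every distinct person ID from the
// CSV (both the ID and SUPERVISOR_ID columns) so each person is inserted once.
// Assumes rows of the form ID,SUPERVISOR_ID; an empty supervisor means none.
class PeopleLoader {
    static Set<String> distinctPeople(List<String[]> rows) {
        Set<String> ids = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String[] row : rows) {
            ids.add(row[0].trim());
            if (row.length > 1 && !row[1].isBlank()) {
                ids.add(row[1].trim());
            }
        }
        return ids;
    }

    // One Graql insert per person, following the example data in this thread.
    static List<String> personInserts(Set<String> ids) {
        List<String> queries = new ArrayList<>();
        for (String id : ids) {
            queries.add("insert $p isa person has ID \"" + id + "\";");
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"1", "2"},
                new String[] {"2", "3"},
                new String[] {"3", "4"},
                new String[] {"4", ""});
        personInserts(distinctPeople(rows)).forEach(System.out::println);
    }
}
```

Person 2, for example, appears in two rows but is inserted only once, so the supervisor relations can then be migrated against a single node per person.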

Hope this helps,

Sheldon


#5

Hi Sheldon,
Thanks a ton for the detailed explanation.
Can you please help with the following too?

  1. Do I need to write a program to define the data in all these ways, or is there a more efficient way to do the following using a Graql template?

#people
$a isa person has ID "1";
$b isa person has ID "2";
$c isa person has ID "3";
$d isa person has ID "4";

Please let me know.

Thank You


#6

Hi Sheldon,
Please ignore my previous message. I was able to use migration.sh csv to load the relations to each employee's immediate supervisor only, based on a relation template in the template file.
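For reference, the per-row logic of such a relation template can be sketched in Java. This is a hypothetical illustration, not the migration tool's template language: it assumes rows of the form ID,SUPERVISOR_ID, assumes all people were inserted beforehand, and builds one match-insert query per row using the Graql syntax seen earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of what a per-row relation template produces: for each
// CSV row (ID, SUPERVISOR_ID), look up both people by ID and insert one
// employment relation. Assumes all people were loaded in an earlier pass.
class RelationTemplate {
    static String employmentQuery(String employeeId, String supervisorId) {
        return "match $e isa person has ID \"" + employeeId + "\"; "
             + "$s isa person has ID \"" + supervisorId + "\"; "
             + "insert (employee: $e, supervisor: $s) isa employment;";
    }

    static List<String> employmentQueries(List<String[]> rows) {
        List<String> queries = new ArrayList<>();
        for (String[] row : rows) {
            // Skip rows without a supervisor (e.g. the top of the hierarchy).
            if (row.length < 2 || row[1].isBlank()) continue;
            queries.add(employmentQuery(row[0].trim(), row[1].trim()));
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(new String[] {"1", "2"}, new String[] {"4", ""});
        employmentQueries(rows).forEach(System.out::println);
    }
}
```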
I am now looking to find the shortest path, or common relationships, between employees based on the supervisor relationships defined between them. I will keep you posted on this.

Thank You


#7

Let me know if it works!

Sheldon


#8

Hi Sheldon,
I am trying to use ‘shortest path’ or ‘explore relations’ on implicit relationships inferred by the following rule:

insert
$supervisorOfSupervisorAreManagers isa inference-rule
lhs
{(employee:$e, supervisor: $s) isa employment;
(employee: $s, supervisor: $m) isa employment;
}
rhs
{(employee: $e, manager: $m) isa employment;};

The query match (employee: $e, manager: $m) isa employment; works, but I would like to use ‘shortest path’ or ‘explore relations’ without explicitly defining the path or relation. Can you please help with this?

Thank You


#9

For the reasoner, you can try this rule (very slightly modified from yours: the conclusion uses the supervisor role, since manager is not declared in the ontology):

insert
$supervisorOfSupervisorAreManagers isa inference-rule
lhs
{(employee:$e, supervisor: $s) isa employment;
(employee: $s, supervisor: $m) isa employment;
}
rhs
{(employee: $e, supervisor: $m) isa employment;};

Then in the web dashboard you can run this query:

match
$x isa person has ID "1";
($x, supervisor: $y);
$y isa person;

and it will show you the three people who are supervisors of person 1. The inferred relationships are not materialised at this point; explore relations should still work, but the shortest path results will not include the inferred information.

If you want to use the rules with shortest path, the information they infer needs to be materialised. To do this, you can run the Graql shell with the -n -m arguments to infer and materialise the extra relations. If you run the above query again and commit, the relationships will be permanently added. Then you can try shortest path and explore relations.
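Conceptually, materialising the rule means applying it repeatedly until no new relations appear (a fixpoint) and then storing the results. The Java sketch below illustrates that idea on the four-person example; it is a naive illustration under that assumption, not how Grakn's reasoner is implemented. Each pair is (employee, supervisor), and the record and method names are made up.

```java
import java.util.HashSet;
import java.util.Set;

// Conceptual sketch of materialising the rule above: keep applying
// (employee: e, supervisor: s) + (employee: s, supervisor: m) => (employee: e, supervisor: m)
// until no new pair appears (a fixpoint). Illustration only; not Grakn's reasoner.
class Materialise {
    record Employment(String employee, String supervisor) {}

    static Set<Employment> closure(Set<Employment> explicit) {
        Set<Employment> all = new HashSet<>(explicit);
        boolean changed = true;
        while (changed) {
            Set<Employment> inferred = new HashSet<>();
            for (Employment a : all) {
                for (Employment b : all) {
                    // a's supervisor is b's employee, so b's supervisor also supervises a's employee.
                    if (a.supervisor().equals(b.employee())) {
                        inferred.add(new Employment(a.employee(), b.supervisor()));
                    }
                }
            }
            // addAll returns true only if at least one new relation was added.
            changed = all.addAll(inferred);
        }
        return all;
    }

    public static void main(String[] args) {
        Set<Employment> explicit = Set.of(
                new Employment("1", "2"), new Employment("2", "3"), new Employment("3", "4"));
        // Supervisors of person 1 after materialisation.
        closure(explicit).stream()
                .filter(e -> e.employee().equals("1"))
                .map(Employment::supervisor)
                .sorted()
                .forEach(System.out::println);
    }
}
```

For person 1 it prints 2, 3 and 4, the same three supervisors the dashboard query shows; once such pairs are stored as real relations, path algorithms can traverse them.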

Sheldon


#10

Hi Sheldon,
Thanks! I need to find the shortest path between any two nodes in the graph, like the result of Dijkstra’s algorithm. I was under the impression that ‘shortest path’ or ‘explore relations’ would do that, e.g. find the shortest way in which any two employees are related through the hierarchy of supervisors they report to.
Can you please advise on this?

Thank You


#11

Using graql.sh -f filename, load this file (just the example above, combined into a single file):

insert

# the main entity is now a person instead of a row in the input data
person sub entity
    key ID
    plays employee
    plays supervisor;

# people have a unique identifier
ID sub resource datatype string;

# people can take part in the employment relationship
employment sub relation
    relates employee
    relates supervisor;

employee sub role;
supervisor sub role;

# people
$a isa person has ID "1";
$b isa person has ID "2";
$c isa person has ID "3";
$d isa person has ID "4";

# supervisors
(employee: $a, supervisor: $b) isa employment;
(employee: $b, supervisor: $c) isa employment;
(employee: $c, supervisor: $d) isa employment;

# rule
insert
$supervisorOfSupervisorAreManagers isa inference-rule
lhs
{(employee:$e, supervisor: $s) isa employment;
(employee: $s, supervisor: $m) isa employment;
}
rhs
{(employee: $e, supervisor: $m) isa employment;};

You can then run graql.sh and execute this query in the terminal:

match $x isa person has ID $y;

and you will get results something like this (your ids will be different):

$x id "8192" isa person; $y value "1" isa ID; 
$x id "20720" isa person; $y value "3" isa ID; 
$x id "24616" isa person; $y value "4" isa ID; 
$x id "24624" isa person; $y value "2" isa ID;

Using these IDs, we can look for the shortest path between person 1 and person 4:

compute path from "8192" to "24616";

which is:

id "8192" isa person
id "20480" (supervisor: id "24624", employee: id "8192") isa employment
id "24624" isa person
id "24728" (supervisor: id "20720", employee: id "24624") isa employment
id "20720" isa person
id "16440" (supervisor: id "24616", employee: id "20720") isa employment
id "24616" isa person
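The path above alternates between people and employment relations because relations are themselves nodes in the graph. As a rough sketch of what an unweighted shortest path computation does, the Java below models each relation as a node linked to its role players and runs breadth-first search; on an unweighted graph, BFS finds the same shortest paths as Dijkstra's algorithm. It is an illustrative stand-in with made-up names, not Grakn's compute path implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Conceptual sketch of an unweighted shortest path: relations are nodes of their
// own, linked to each role player, so a path alternates person -> employment -> person.
class ShortestPath {
    private final Map<String, List<String>> adj = new HashMap<>();

    private void link(String a, String b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Adds an employment relation node connected to both of its role players.
    void addEmployment(String employee, String supervisor) {
        String rel = "employment(" + employee + "->" + supervisor + ")";
        link(rel, "person " + employee);
        link(rel, "person " + supervisor);
    }

    // Breadth-first search; returns one shortest path, or an empty list if none exists.
    List<String> path(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, from);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(to)) break;
            for (String next : adj.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        if (!parent.containsKey(to)) return List.of();
        // Walk parent links back from the target to rebuild the path.
        LinkedList<String> path = new LinkedList<>();
        for (String n = to; !n.equals(from); n = parent.get(n)) path.addFirst(n);
        path.addFirst(from);
        return path;
    }

    public static void main(String[] args) {
        ShortestPath g = new ShortestPath();
        g.addEmployment("1", "2");
        g.addEmployment("2", "3");
        g.addEmployment("3", "4");
        g.path("person 1", "person 4").forEach(System.out::println);
    }
}
```

Run on the example data, it prints the same seven-step alternation of person and employment nodes as the output above, with readable labels instead of concept IDs.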

Try it out and let me know how it goes.

Sheldon


#12

Hi Sheldon,
Thanks. I guess that, one way or another, we need to store the raw data in every relation we are interested in for Grakn to know how the data is related. I was under the impression that, given the abstract relationships, the engine would find relations in the data based on the schema of the CSV file.

Also, I need to define relationships on relational data over large datasets, in the range of 700,000 to a million records. Can you please advise on this?

Thank You


#13

We successfully load graphs with millions of relations in the office, so your data size won’t be a problem. In terms of the data model, I am not completely sure I understand your question. However, I will have a go.

When we migrate a dataset into Grakn, we spend some time at the beginning creating a schema that models the data in as natural a way as possible, like the ontology I recommended above. Then we map the entities and relations in the raw data to the ontology in the migration scripts (sometimes a little post-processing, or multiple migrations of a single file, is needed for entity de-duplication). The aim of the mapping is to get as much of the explicit information as possible into the graph.

Once the explicit information is in the graph, we start working on the rules, which allow us to infer the higher-level relationships that are not explicit. A good example is the one we have been looking at above. The explicit relationships are those telling us which person is the supervisor of whom. We then specify a rule saying that the supervisor of my supervisor is also my supervisor (this is the inferred relationship).

Now that the explicit data and rules are in place you can then query across all of the information including the relationships that weren’t explicitly in your original dataset.

Perhaps you can provide a little more information about the types of data you want to migrate into Grakn so I can make my answer a little more specific.

Sheldon


#14

Hi Sheldon,
Thanks for offering to help. I am trying to build a knowledge graph from a company’s LDAP directory data. I am trying to do two things:

  1. I have data in a CSV file and would like to represent it as a graph (i.e. in a hierarchical pattern). This is done. Also, with your help I was able to define the supervisor and employee relationship up to one level above the employee in the hierarchy.
  2. Automated reasoning, where I am trying to deduce facts such as: in what possible ways can two random employees be related (based on the explicitly defined relations)?

I hope this explains my requirement. Please let me know if you need anything else.

Thank You


#15

Glad to hear you have got something working already. With respect to your second point, the shortest path algorithm is an option for you. Given two employees, it will find a path between them if one exists. You can also specify a subgraph so that you only find shortest paths over a specific set of relationship and entity types. There are two issues to be aware of: first, the shortest path algorithm currently returns a single shortest path, not all possible paths; second, determining all the ways two employees are related with the shortest path algorithm can be very expensive.
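Both points can be illustrated with a small Java sketch (hypothetical, not Grakn's API): a breadth-first search that only walks nodes whose type is in an allowed set, mirroring the subgraph restriction, and that records every shortest predecessor so it can return all shortest paths rather than one. The "team" relation in the example is made up to show the type filter at work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: all shortest paths between two nodes, walking only nodes
// whose type is in an allowed set (a stand-in for restricting the subgraph).
// Relations are modelled as nodes linked to each of their role players.
class AllShortestPaths {
    private final Map<String, String> typeOf = new HashMap<>();
    private final Map<String, List<String>> adj = new HashMap<>();

    void addNode(String id, String type) {
        typeOf.put(id, type);
    }

    // Adds a relation node of the given type and links it to its role players.
    void addRelation(String id, String type, String... players) {
        addNode(id, type);
        for (String p : players) {
            adj.computeIfAbsent(id, k -> new ArrayList<>()).add(p);
            adj.computeIfAbsent(p, k -> new ArrayList<>()).add(id);
        }
    }

    List<List<String>> paths(String from, String to, Set<String> allowedTypes) {
        // Breadth-first search that keeps every predecessor lying on a shortest path.
        Map<String, Integer> dist = new HashMap<>();
        Map<String, List<String>> preds = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int step = dist.get(node) + 1;
            for (String next : adj.getOrDefault(node, List.of())) {
                if (!allowedTypes.contains(typeOf.getOrDefault(next, ""))) continue; // outside the subgraph
                if (!dist.containsKey(next)) {
                    dist.put(next, step);
                    queue.add(next);
                }
                if (dist.get(next) == step) {
                    preds.computeIfAbsent(next, k -> new ArrayList<>()).add(node);
                }
            }
        }
        List<List<String>> result = new ArrayList<>();
        if (dist.containsKey(to)) collect(to, from, preds, new LinkedList<>(), result);
        return result;
    }

    // Follows predecessor lists back from node to the start, emitting each full path.
    private static void collect(String node, String from, Map<String, List<String>> preds,
                                LinkedList<String> path, List<List<String>> out) {
        path.addFirst(node);
        if (node.equals(from)) {
            out.add(new ArrayList<>(path));
        } else {
            for (String p : preds.getOrDefault(node, List.of())) {
                collect(p, from, preds, path, out);
            }
        }
        path.removeFirst();
    }

    // Example: persons 1 and 2 share two supervisors (3 and 4) and, hypothetically, a team.
    static AllShortestPaths example() {
        AllShortestPaths g = new AllShortestPaths();
        for (String p : List.of("p1", "p2", "p3", "p4")) g.addNode(p, "person");
        g.addRelation("e1", "employment", "p1", "p3");
        g.addRelation("e2", "employment", "p2", "p3");
        g.addRelation("e3", "employment", "p1", "p4");
        g.addRelation("e4", "employment", "p2", "p4");
        g.addRelation("t1", "team", "p1", "p2");
        return g;
    }

    public static void main(String[] args) {
        AllShortestPaths g = example();
        System.out.println(g.paths("p1", "p2", Set.of("person", "employment")));
        System.out.println(g.paths("p1", "p2", Set.of("person", "employment", "team")));
    }
}
```

With only person and employment allowed, persons 1 and 2 are connected by two shortest paths, one through each shared supervisor; allowing the hypothetical team relation as well yields a single, shorter path. The number of shortest paths can grow very quickly in a densely connected hierarchy, which is one reason enumerating them is expensive.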

How often do you need to compute this information, and does it need to be done in real time? If so, the algorithm may need to be customised to your application.


#16

Hi Sheldon,
Yes, I understand that scanning through all relationships is expensive. Finding the shortest path over a restricted set of relationship and entity types would be helpful.
Also, I am now starting to use the Java API to stream data from an inbound stream into Grakn. I would appreciate any sample programs to start from; that would be faster for me than going through the documentation for each requirement and writing code from scratch. I am currently working on Grakn part time, so I am looking for ways to concentrate on my own logic and build something quickly.

Thank You


#17

OK, the syntax for shortest path in Java is:

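try (GraknGraph graph = session.open(GraknTxType.READ)) {
    // Shortest path between concept1 and concept2, using only person entities and employment relations
    graph.graql().compute().path().from(concept1).to(concept2).in("person", "employment").execute();
}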

Here, in(...) restricts the computation to the employment relation and person entities. As for an example project, I had a quick look at our sample projects on GitHub; the analytics genealogy, Graph API genealogy and Simpsons examples are probably the closest.

I have added a request for us to create an example project for streaming data to help future users, since it seems you will not be the only person it would help.
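Until such an example exists, the usual batching pattern can be sketched in plain Java. This is a hypothetical sketch: BatchSink is a made-up interface standing in for the code that would open a Grakn transaction, execute the queries and commit; the queue stands in for the inbound stream; the sentinel value and the batch size are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of batching a stream of Graql queries into fixed-size commits.
// BatchSink is made up; a real loader would run each batch inside one transaction.
class StreamLoader {
    interface BatchSink {
        void commit(List<String> queries);
    }

    static final String END_OF_STREAM = "__END__"; // sentinel marking the end of input

    // Consumes until the sentinel arrives; returns the number of batches committed.
    static int drain(BlockingQueue<String> in, int batchSize, BatchSink sink)
            throws InterruptedException {
        List<String> batch = new ArrayList<>();
        int batches = 0;
        while (true) {
            String query = in.take(); // blocks until the producer supplies a record
            if (query.equals(END_OF_STREAM)) break;
            batch.add(query);
            if (batch.size() == batchSize) {
                sink.commit(List.copyOf(batch));
                batch.clear();
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            sink.commit(List.copyOf(batch));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Producer thread standing in for the inbound stream.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) {
                    queue.put("insert $p isa person has ID \"" + i + "\";");
                }
                queue.put(END_OF_STREAM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int batches = drain(queue, 2, b -> System.out.println("commit " + b.size() + " queries"));
        producer.join();
        System.out.println(batches + " batches");
    }
}
```

With five records and a batch size of 2, the consumer commits batches of 2, 2 and 1; in a real loader the batch size would be tuned to the data.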


#18

Hi Sheldon,
I believe the Java API examples for genealogy and the Simpsons should help me get started. Also, I second your point that a streaming example would benefit many people, as a lot of applications revolve around real-time data, so thanks for submitting the request.

Thank You


#19

We have someone starting soon who will hopefully start with this task. Thanks for the feedback!