[Solved] Relation Creation Takes Forever (most likely wrong query)


#1

Hello,

I have successfully created a course ontology and filled it with instances of courses, course offerings and other entities. However, when I try to fill the graph with relations my command takes forever to compute. I use this:

match
   $c isa COURSE has COURSE_ID $same;
   $co isa COURSE_OFFERING has COURSE_ID $same;
insert
   $relation (offered-course: $c, course-offering: $co) isa is-offered;

The idea is that multiple course offerings (in different semesters) are mapped to one course (description) and I try to match them by their common ID and then insert a relation is-offered between their nodes. There are about 10000 courses and 160000 course offerings, I think. GRAKN has been working for half a day now :confused:

Is my attempt wrong or does this kind of import just take a while?

Thanks in advance.

Best wishes,
FP


#2

I find you query weird match @c isa COURSE as from what I read in the doc it should read match $c isa COURSE @ is reserved for macros: https://grakn.ai/pages/documentation/graql/graql-templating.html#macros

Are COURSE and COURSE_ID all caps in your ontology?


#3

Typo. I’m running this in a strictly isolated VM and thus had to manually type it. It is $ in the original and still running. I’m pretty sure that this is wrong code, but I’m too unfamiliar with GRAKN to know how to optimize it. 160k offerings times 10k courses means it is most likely checking 1.6 billion combinations, so no wonder it’s still working. And yeah - they are all caps, since I copied them over from an RDBMS. I will try to post the ontology later. Thank you for your answer.


#4

Sorry, I do not have much wisdom on this yet. Still learning grakn.


#5

OK, here is the ontology:


insert

# Entities

COURSE sub entity
  has COURSE_ID
  has NAME
  has DEPARTMENT
  has NUMBER
  has CREDITS
  has ADVISORY_REQUIREMENT
  has ENFORCED_REQUIREMENT
  has DESCRIPTION
  has NUM_SEMESTERS
  has NUM_ENROLLED
  has HAS_DISCUSSION
  has HAS_LAB
  has HAS_PROJECTS
  has HAS_EXAMS
  has NUM_REVIEWS
  has CLARITY_SCORE
  has EASINESS_SCORE
  has HELPFULNESS_SCORE;

INSTRUCTOR sub entity
  has INSTRUCTOR_ID
  has NAME
  has UNIQNAME;

OFFERING_INSTRUCTOR sub entity
  has OFFERING_INSTRUCTOR_ID
  has OFFERING_ID
  has INSTRUCTOR_ID;

COURSE_OFFERING sub entity
  has OFFERING_ID
  has COURSE_ID
  has SEMESTER
  has SECTION_NUMBER
  has MONDAY
  has TUESDAY
  has WEDNESDAY
  has THURSDAY
  has FRIDAY
  has SATURDAY
  has SUNDAY
  has HAS_FINAL_PROJECT
  has HAS_FINAL_EXAM
  has TEXTBOOK
  has CLASS_ADDRESS
  has ALLOW_AUDIT;

# Resources

COURSE_ID sub resource datatype long;
NAME sub resource datatype string;
DEPARTMENT sub resource datatype string;
NUMBER sub resource datatype string;
CREDITS sub resource datatype string;
ADVISORY_REQUIREMENT sub resource datatype string;
ENFORCED_REQUIREMENT sub resource datatype string;
DESCRIPTION sub resource datatype string;
NUM_SEMESTERS sub resource datatype long;
NUM_ENROLLED sub resource datatype long;
HAS_DISCUSSION sub resource datatype string;
HAS_LAB sub resource datatype string;
HAS_PROJECTS sub resource datatype string;
HAS_EXAMS sub resource datatype string;
NUM_REVIEWS sub resource datatype long;
CLARITY_SCORE sub resource datatype long;
EASINESS_SCORE sub resource datatype long;
HELPFULNESS_SCORE sub resource datatype long;

OFFERING_INSTRUCTOR_ID sub resource datatype long;

UNIQNAME sub resource datatype string;
INSTRUCTOR_ID sub resource datatype long;

OFFERING_ID sub resource datatype long;
SEMESTER sub resource datatype long;
SECTION_NUMBER sub resource datatype long;
MONDAY sub resource datatype string;
TUESDAY sub resource datatype string;
WEDNESDAY sub resource datatype string;
THURSDAY sub resource datatype string;
FRIDAY sub resource datatype string;
SATURDAY sub resource datatype string;
SUNDAY sub resource datatype string;
HAS_FINAL_PROJECT sub resource datatype string;
HAS_FINAL_EXAM sub resource datatype string;
TEXTBOOK sub resource datatype string;
CLASS_ADDRESS sub resource datatype string;
ALLOW_AUDIT sub resource datatype string;

# Roles and Relations

offered-course sub role;
course-offering sub role;
COURSE plays offered-course;
COURSE_OFFERING plays course-offering;
is-offered sub relation;
is-offered relates offered-course;
is-offered relates course-offering;

instructor sub role;
INSTRUCTOR plays instructor;
offering-instructor sub relation;
offering-instructor relates instructor, relates course-offering;

The relevant conversions are

insert $course isa COURSE
  has COURSE_ID <COURSE_ID>
  if(<NAME> != null) do { has NAME <NAME> }
  if(<DEPARTMENT> != null) do { has DEPARTMENT <DEPARTMENT> }
  if(<NUMBER> != null) do { has NUMBER <NUMBER> }
  if(<CREDITS> != null) do { has CREDITS <CREDITS> }
  if(<ADVISORY_REQUIREMENT> != null) do { has ADVISORY_REQUIREMENT <ADVISORY_REQUIREMENT> }
  if(<ENFORCED_REQUIREMENT> != null) do { has ENFORCED_REQUIREMENT <ENFORCED_REQUIREMENT> }
  if(<DESCRIPTION> != null) do { has DESCRIPTION <DESCRIPTION> }
  if(<NUM_SEMESTERS> != null) do { has NUM_SEMESTERS <NUM_SEMESTERS> }
  if(<NUM_ENROLLED> != null) do { has NUM_ENROLLED <NUM_ENROLLED> }
  if(<HAS_DISCUSSION> != null) do { has HAS_DISCUSSION <HAS_DISCUSSION> }
  if(<HAS_LAB> != null) do { has HAS_LAB <HAS_LAB> }
  if(<HAS_PROJECTS> != null) do { has HAS_PROJECTS <HAS_PROJECTS> }
  if(<HAS_EXAMS> != null) do { has HAS_EXAMS <HAS_EXAMS> }
  if(<NUM_REVIEWS> != null) do { has NUM_REVIEWS <NUM_REVIEWS> }
  if(<CLARITY_SCORE> != null) do { has CLARITY_SCORE <CLARITY_SCORE> }
  if(<EASINESS_SCORE> != null) do { has EASINESS_SCORE <EASINESS_SCORE> }
  if(<HELPFULNESS_SCORE> != null) do { has HELPFULNESS_SCORE <HELPFULNESS_SCORE> };

and

insert $course-offering isa COURSE_OFFERING
  has OFFERING_ID <OFFERING_ID>
  has COURSE_ID <COURSE_ID>
  if(<SEMESTER> != null) do { has SEMESTER <SEMESTER> }
  if(<SECTION_NUMBER> != null) do { has SECTION_NUMBER <SECTION_NUMBER> }
  if(<MONDAY> != null) do { has MONDAY <MONDAY> }
  if(<TUESDAY> != null) do { has TUESDAY <TUESDAY> }
  if(<WEDNESDAY> != null) do { has WEDNESDAY <WEDNESDAY> }
  if(<THURSDAY> != null) do { has THURSDAY <THURSDAY> }
  if(<FRIDAY> != null) do { has FRIDAY <FRIDAY> }
  if(<SATURDAY> != null) do { has SATURDAY <SATURDAY> }
  if(<SUNDAY> != null) do { has SUNDAY <SUNDAY> }
  if(<HAS_FINAL_PROJECT> != null) do { has HAS_FINAL_PROJECT <HAS_FINAL_PROJECT> }
  if(<HAS_FINAL_EXAM> != null) do { has HAS_FINAL_EXAM <HAS_FINAL_EXAM> }
  if(<TEXTBOOK> != null) do { has TEXTBOOK <TEXTBOOK> }
  if(<CLASS_ADDRESS> != null) do { has CLASS_ADDRESS <CLASS_ADDRESS> }
  if(<ALLOW_AUDIT> != null) do { has ALLOW_AUDIT <ALLOW_AUDIT> };

The sql-migrators.yaml file I use to populate them looks like this and works well:

- query: SELECT * FROM COURSE;
  template: templates/course-template.gql
  
- query: SELECT * FROM COURSE_OFFERING;
  template: templates/course-offering-template.gql

And when I create manual relations they work just fine. Here is a manual example:

$c0 isa COURSE, has COURSE_ID = 0, has NAME = "Intro to AI", has DEPARTMENT = "AI Lab", has NUMBER = "EECS 421", has CREDITS = "6", has ADVISORY_REQUIREMENT = "no", has ENFORCED_REQUIREMENT = "no", has DESCRIPTION = "Some long description", has NUM_SEMESTERS = 1, has NUM_ENROLLED = 203, has HAS_DISCUSSION = "yes", has HAS_LAB = "yes", has HAS_PROJECTS = "yes", has HAS_EXAMS = "yes", has NUM_REVIEWS = 8, has CLARITY_SCORE = 5, has EASINESS_SCORE = 2, has HELPFULNESS_SCORE = 5;

$co0 isa COURSE_OFFERING has OFFERING_ID = 0, has COURSE_ID = 0, has SEMESTER = 1, has SECTION_NUMBER = 2, has MONDAY = "yes", has TUESDAY = "no", has WEDNESDAY = "no", has THURSDAY = "no", has FRIDAY = "yes", has SATURDAY = "no", has SUNDAY = "no", has HAS_FINAL_PROJECT = "no", has HAS_FINAL_EXAM = "no", has TEXTBOOK = "Great Book of AI", has CLASS_ADDRESS = "4824", has ALLOW_AUDIT = "no";

$i0 isa INSTRUCTOR has NAME "Instructor name", has UNIQNAME "instructor";

(offered-course: $c0, course-offering: $co0) isa is-offered;
(instructor: $i0, course-offering: $co0) isa offering-instructor;

So now the only question is how I can connect my 160k course offering to the 10k courses in a way that does not take days to run. Is the code from my first posting even correct? It seems suspicious to me - I’m not convinced it does the right thing and it hasn’t terminated, yet.


#6

I think this query:

match
   $c isa COURSE has COURSE_ID $same;
   $co isa COURSE_OFFERING has COURSE_ID $same;
insert
   $relation (offered-course: $c, course-offering: $co) isa is-offered;

Is bound to be slow because of the $same variable. Since you don’t set a value for that it means we have to scan ALL course ids and this will be time consuming. In addition to this we have to check the links for every course ID.

Is there any chance you can set the value of $same ?

Barring that their might be a better way to model this issue.


#7

Good point. Yeah, I can probably use a SQL query to get it, i.e. SELECT * FROM COURSE; and then <COURSE_ID> in the query, i.e.

match
   $c isa COURSE has COURSE_ID <COURSE_ID>;
   $co isa COURSE_OFFERING has COURSE_ID <COURSE_ID>;
insert
   $relation (offered-course: $c, course-offering: $co) isa is-offered;

Unless you post any objections, I’ll retry now. Thanks, Filipe - I highly appreciate your great support.


#8

That was exactly what I was about to suggest. You can also use a rule instead if you prefer, but in this case it makes more sense to have the relationship in the data


#9

Just for the record: I started the job 2.5 hours ago and it’s still running. Apparently, still quite a heavyweight task. If there are any other suggestions, please feel free to let me know.


#10

I just noticed that there is a problem with your migration script I think:

the “COURSE_ID” is supposed to be a long, so you need @long(<COURSE_ID>)


#11

Interesting, thank you. But why? In my previous inserts I had stuff like insert $course isa COURSE has COURSE_ID <COURSE_ID> and that worked fine. Why do I have to cast it all of a sudden? Thanks in advance.


#12

Are you sure it worked fine? As I did that and it inserted a string instead of an int/long


#13

No, I’m not sure at all. I’m still waiting for the import to finish and essentially just tried to understand the implications of Miko’s advice. Because if I need to cast the course id I will most likely need to cast any resource with datatype long like NUM_SEMESTERS, CLARITY_SCORE etc. Good advice, though - would have expected GRAKN to pick this up from the column types in MySQL, but apparently I have to cast explicitly…


#14

Yes I think you might have to do this…


#15

OK, I stopped my migration attempt after running for a day, changed the long thing, cleared the DB and started from scratch. Thanks to anyone who helped. I’ll report back.


#16

And it seems to work. About 2 hours - great, the long made a major difference in speed. Thanks again. I’ll now move on to more complicated relations. This is getting interesting…


#17

That’s very good to hear. What was happening is that things were failing at validation time (you were trying to string into a long attribute) and that takes long, because the migrator really tries hard to fit the square peg into the round hole :slight_smile: