Use specific ID in insert query


#1

@haikal, based on your comment in "Loading long files into Grakn [solved]", a valid way of batch inserting data is to split the data into entities and relationships. I’ve done that like so:

entities - https://gist.github.com/BFergerson/3603ac636d640f52056fa0270b604d77
relationships - https://gist.github.com/BFergerson/569bb00fd13028eaccb48f683a4eec67

The above files were created with UUIDs that are unique to each entity (same as Grakn’s id). The UUID has no special meaning and should be considered temporary. It is only created to link entities together via relationships. I understand that I could store this value as a key and use that in the match part of a match-insert query for the relationships file to achieve the desired results. This seems like extra work, though, if the UUID is really the same as Grakn’s id. Also, if I’m not mistaken, using Grakn’s internal id is the quickest way to get an entity out of the database, so I would prefer to use the id for batch processing.

I partly remember reading an old discussion topic about removing the ability to specify the internal id Grakn uses. If it’s not possible to set the id of the entity that goes in the database, is it possible to get some kind of response from the batch insert that relates the UUID to the internal id? I know that via the console an insert returns the identifier and the id of anything inserted. I don’t believe this is implemented for batch processes, though.

Open to any ideas. Thanks


#2

This sounds like the same thing being asked by @attodorov in his thread "How can I batch-insert many statements that are dependent on each other?".

In my case, for example, let’s say I have this entity:

insert $10bcf92e-c8fe-4d29-bb57-7f7be0add0be isa ReturnStmt;

It’s a return statement in Java. It doesn’t have any unique identifiers, just like it doesn’t in real Java source code. What makes it special is its place in the source code. So it has a relationship to other code, like so:

insert isa statement has order_index 0 (has_statement: $a3dfcbfd-4af6-4949-830e-3165c2a38256, is_statement: $10bcf92e-c8fe-4d29-bb57-7f7be0add0be);

The above means that the return entity has an is_statement relationship with another entity, and that other entity has a has_statement relationship to it.

What I believe @attodorov and I are asking is: how can we link these entities to each other via relationships using identifiers given during batch inserts? In my case I could of course write the query necessary to match the return entity in the database, but that query would be unique for each insert, since the entity doesn’t have an identifier and the match would have to be based on its exact location in the graph. Instead, there should be a way to link it via a simple identifier, the best in this case being Grakn’s internal one. From what I understand, though, this can’t be set manually and isn’t returned by batch inserts.


#3

Hi @BFergerson, unfortunately it’s not possible to return the Concept (internal) IDs after batch loading. This is not a Grakn-specific limitation: just as with any database, returning them would produce a massive dump of data back to the user, which makes no sense.

Your way of loading the data with your own generated UUID is not a bad idea. In the near future, this should be done by using keys. For now, storing them as attributes is okay. Attributes are actually indexed too.
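
To make the attribute approach concrete, here is a rough sketch of how the two batch files could be generated so that relationships find their entities by UUID. It is only an illustration under assumptions: `temp_uuid` is a hypothetical attribute name that would have to be defined in your schema, the helper functions are made up for this example, and the Graql text simply follows the style of the queries earlier in this thread, so the exact syntax may differ between Grakn versions.

```python
# Sketch only: builds Graql query strings; it does not talk to Grakn.
# "temp_uuid" is a hypothetical attribute name, not something Grakn provides.


def entity_insert(entity_type: str, temp_uuid: str) -> str:
    """Insert query for one entity, storing the generated UUID as an attribute."""
    return f'insert $e isa {entity_type} has temp_uuid "{temp_uuid}";'


def statement_insert(parent_uuid: str, child_uuid: str, order_index: int) -> str:
    """Match both entities by their stored UUID, then insert the relationship."""
    return (
        f'match $parent has temp_uuid "{parent_uuid}"; '
        f'$child has temp_uuid "{child_uuid}"; '
        f"insert (has_statement: $parent, is_statement: $child) "
        f"isa statement has order_index {order_index};"
    )


if __name__ == "__main__":
    # UUIDs taken from the example queries in post #2.
    parent = "a3dfcbfd-4af6-4949-830e-3165c2a38256"
    child = "10bcf92e-c8fe-4d29-bb57-7f7be0add0be"
    print(entity_insert("ReturnStmt", child))      # line for the entities file
    print(statement_insert(parent, child, 0))      # line for the relationships file
```

The entities file would need to be loaded completely before the relationships file, since each match depends on both entities already being in the database.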