Apache Spark Integration with Grakn


#1

Hi Team,
I have data in a Hive database with Apache Spark on top of it. I like the idea of defining an ontology, and I am looking for ways to define one over the data in the Hive database. I saw something in the Grakn architecture that connects to Apache Spark.
Can anyone guide me on the process, or on Grakn's capabilities, for connecting to data in a Hive database through Apache Spark?
Otherwise, I assume that I need to write pipelines to load the Hive data into Grakn's Cassandra database, as I am under the impression that Grakn needs the data to be loaded into its Cassandra database.

Kindly help me with this migration.

Thank You


#2

Hi @Sudharshan,

So at this stage you would need to pipe the Hive data into Grakn.

However, we are currently trying to identify other tools to integrate with more tightly. We already use Apache Spark for our analytics platform, but nothing we have “out of the box” can automatically migrate your existing Hive data.

If you are able to convert your data into CSV or JSON, then you could migrate it using one of our existing tools.
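
As a rough illustration of the CSV route, here is a minimal PySpark sketch, assuming Spark was built with Hive support and can see your metastore. The database name, table name, and output path are hypothetical placeholders, and the helper names are mine, not part of any Grakn tool.

```python
def build_select(database, table, columns=None):
    """Build a SELECT statement for one Hive table (all columns by default)."""
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {database}.{table}"


def export_table_to_csv(spark, database, table, output_dir, columns=None):
    """Write one Hive table to a directory of CSV part files with a header row.

    `spark` is expected to be a SparkSession created with Hive support.
    Returns the query that was run, which is handy for logging.
    """
    query = build_select(database, table, columns)
    df = spark.sql(query)
    df.write.mode("overwrite").option("header", True).csv(output_dir)
    return query


def main():
    # Imported here so the helpers above can be read and tested without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-to-csv")
        .enableHiveSupport()
        .getOrCreate()
    )
    # Hypothetical database, table, and output location.
    export_table_to_csv(spark, "sales_db", "customers", "/tmp/grakn_export/customers")
    spark.stop()
```

You would call `main()` from a script submitted with `spark-submit`. Spark writes a directory of part files rather than a single CSV, so the files may need to be merged or fed one by one to whichever migration tool you use.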

If you can’t do that, then you would need to create a custom pipeline, as you mentioned. For this we have a batch loading client which may be able to help.
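
To give a feel for what such a custom pipeline might look like, here is a small Python sketch that groups rows into fixed-size batches and hands each batch to a loader. The `send_batch` callback is a hypothetical stand-in for whatever the batch loading client actually exposes; I am not showing its real API here. The rows could come from, for example, PySpark's `DataFrame.toLocalIterator()`.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield lists of up to `batch_size` items from any iterable of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, send_batch, batch_size=500):
    """Send rows to `send_batch` one batch at a time; return the row count.

    `send_batch` is a hypothetical callable that accepts a list of rows,
    e.g. a thin wrapper around a batch loading client.
    """
    total = 0
    for batch in batches(rows, batch_size):
        send_batch(batch)
        total += len(batch)
    return total
```

Batching this way keeps memory use bounded, since only one batch is held at a time, and the batch size of 500 is just an arbitrary starting point to tune.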

Hope that’s a start at least.

Regards,

Filipe


#3

Hi @filipe,
Thanks for your update.