RocksDBException: unknown WriteBatch tag


#1

I’m using the Python client. After about 5 hours of running a batch process, most Grakn queries started failing with this error:

grakn.exception.GraknError.GraknError: Server/network error: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "org.rocksdb.RocksDBException: unknown WriteBatch tag. Please check server logs for the stack trace."
	debug_error_string = "{"created":"@1549570257.028775382","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"org.rocksdb.RocksDBException: unknown WriteBatch tag. Please check server logs for the stack trace.","grpc_status":2}"

I was using the same Grakn session the whole time. Is that a problem?

As soon as I stopped and restarted the batch job, the errors went away. I didn’t stop or restart the Grakn server.

I’m using Grakn Core 1.4.2.

grakn.log just has this over and over, for 170,000 lines:

171031 2019-02-07 14:44:43,983 [ForkJoinPool.commonPool-worker-9] ERROR a.g.e.a.d.AttributeDeduplicatorDaemon - An exception has occurred in the attribute de-duplicator daemon.
171032 ai.grakn.engine.attribute.deduplicator.queue.RocksDbQueueException: org.rocksdb.RocksDBException: unknown WriteBatch tag
171033         at ai.grakn.engine.attribute.deduplicator.queue.RocksDbQueue.ack(RocksDbQueue.java:136)
171034         at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicatorDaemon.lambda$startDeduplicationDaemon$1(AttributeDeduplicatorDaemon.java:122)
171035         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
171036         at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
171037         at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
171038         at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
171039         at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
171040         at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
171041 Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag
171042         at org.rocksdb.RocksDB.write0(Native Method)
171043         at org.rocksdb.RocksDB.write(RocksDB.java:602)
171044         at ai.grakn.engine.attribute.deduplicator.queue.RocksDbQueue.ack(RocksDbQueue.java:133)
171045         ... 7 common frames omitted

#2

Hi Tapple, is there a reproducible example which we can access to troubleshoot it?


#3

I’m just doing a data migration.

I ran the same migration that gave the first error without restarting the database, and it gave the error again. I’ve now restarted the Grakn database and am waiting to see if it gives the same error again.


#4

It’s running on an Ubuntu 16.04 LTS machine. I’ve not seen the same error running Grakn on Mac.


#5

We needed to do maintenance on that server. I’ll let you know if the issue happens again.


#6

I just realized that there are different versions of Grakn for Mac and Linux, and I’m using the Mac version on Linux. I’m going to switch to the Linux version and see if this problem reappears within a few days.


#7

I’m now getting this client-side error again, but now the logs are filled with a different error. I’m still running 1.4.2:

2019-03-12 12:11:44,131 [ForkJoinPool.commonPool-worker-9] ERROR a.g.e.a.d.AttributeDeduplicatorDaemon - An exception has occurred in the attribute de-duplicator daemon.
java.util.NoSuchElementException: null
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
	at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicator.deduplicate(AttributeDeduplicator.java:58)
	at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicatorDaemon.lambda$startDeduplicationDaemon$1(AttributeDeduplicatorDaemon.java:117)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2019-03-12 12:11:44,143 [ForkJoinPool.commonPool-worker-9] ERROR a.g.e.a.d.AttributeDeduplicatorDaemon - An exception has occurred in the attribute de-duplicator daemon.
java.util.NoSuchElementException: null
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
	at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicator.deduplicate(AttributeDeduplicator.java:58)
	at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicatorDaemon.lambda$startDeduplicationDaemon$1(AttributeDeduplicatorDaemon.java:117)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

#8

My script runs this batch process in 12 Python processes:

  1. Given a directory of files:
  2. Delete the Grakn keyspace for that directory
  3. Load the schema
  4. Do a bunch of file and SQLite processing (1-10 minutes)
  5. Migrate a few thousand nodes from SQLite to Grakn in one transaction (1-3 minutes)
  6. Run a few dozen more single-node insertions into Grakn over the next few minutes
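
In case it helps with reproducing, the steps above can be sketched roughly like this. This is only an outline, not my actual script: `GraphStore` and its methods are hypothetical stand-ins for the Grakn client calls (they are not the real client API), and `keyspace_for` and `preprocess` are made-up names for illustration.

```python
import os
import re
from multiprocessing import Pool
from typing import Callable, List, Protocol, Tuple


class GraphStore(Protocol):
    """Hypothetical wrapper around the Grakn client; NOT the real client API."""

    def delete_keyspace(self, keyspace: str) -> None: ...
    def load_schema(self, keyspace: str) -> None: ...
    def insert_many(self, keyspace: str, nodes: List[dict]) -> None: ...
    def insert_one(self, keyspace: str, node: dict) -> None: ...


def keyspace_for(directory: str) -> str:
    """Derive a keyspace name from the directory name (assumed naming scheme)."""
    name = os.path.basename(os.path.normpath(directory))
    return re.sub(r"\W", "_", name)


def migrate_directory(
    directory: str,
    store: GraphStore,
    preprocess: Callable[[str], Tuple[List[dict], List[dict]]],
) -> str:
    """Run steps 2-6 for one directory and return the keyspace used."""
    keyspace = keyspace_for(directory)
    store.delete_keyspace(keyspace)          # step 2: start from a clean keyspace
    store.load_schema(keyspace)              # step 3
    bulk, singles = preprocess(directory)    # step 4: file and SQLite processing
    store.insert_many(keyspace, bulk)        # step 5: one large transaction
    for node in singles:                     # step 6: single-node insertions
        store.insert_one(keyspace, node)
    return keyspace


def run_batch(directories: List[str], worker: Callable[[str], str],
              processes: int = 12) -> List[str]:
    """Fan the directories out over a pool of worker processes.

    `worker` must be a picklable top-level function that builds its own
    store and calls migrate_directory for the directory it is given.
    """
    with Pool(processes=processes) as pool:
        return pool.map(worker, directories)
```

Each worker would build its own store, so no client object is shared between processes.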

#9

Hi @Tapple,

This issue should be fixed in Grakn 1.5, which is coming out in a few days!