1.4.2 errors and delete after migrate issue


#1

Hi

Whilst updating to 1.4.2 I notice that I’m seeing all sorts of errors pop up in the server logs. I’m running a simple test (on ubuntu bionic) which creates a keyspace (using the console), migrates some data (using the console) and then deletes the keyspace using the driver.

After the first test run completed successfully I am seeing errors being logged continually i.e. over and over again:

2018-10-09 13:51:25,003 [Thread-12] ERROR a.g.e.a.d.AttributeDeduplicatorDaemon - An exception has occurred in the attribute de-duplicator daemon.

org.janusgraph.core.JanusGraphException: Could not call index

    at org.janusgraph.graphdb.util.SubqueryIterator.<init>(SubqueryIterator.java:68)

    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.execute(StandardJanusGraphTx.java:1295)

    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.execute(StandardJanusGraphTx.java:1190)

    at org.janusgraph.graphdb.query.QueryProcessor$LimitAdjustingIterator.getNewIterator(QueryProcessor.java:194)

    at org.janusgraph.graphdb.query.LimitAdjustingIterator.hasNext(LimitAdjustingIterator.java:68)

    at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:54)

    at org.janusgraph.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:44)

    at org.janusgraph.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:66)

    at [com.google.common.collect.Iterables$4.iterator(Iterables.java:578](http://com.google.common.collect.iterables$4.iterator(iterables.java:578/))

    at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphStep.executeGraphCentryQuery(JanusGraphStep.java:156)

    at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphStep.lambda$null$1(JanusGraphStep.java:95)

    at java.lang.Iterable.forEach(Iterable.java:75)

    at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphStep.lambda$new$2(JanusGraphStep.java:95)

    at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:142)

    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)

    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)

    at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:200)

    at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicator.deduplicate(AttributeDeduplicator.java:58)

    at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicatorDaemon.lambda$startDeduplicationDaemon$1(AttributeDeduplicatorDaemon.java:117)

    at [java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590](http://java.util.concurrent.completablefuture$asyncsupply.run(completablefuture.java:1590/))

    at java.lang.Thread.run(Thread.java:748)

Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT10S

    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:100)

    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)

    at org.janusgraph.diskstorage.BackendTransaction.executeRead(BackendTransaction.java:470)

    at org.janusgraph.diskstorage.BackendTransaction.indexQuery(BackendTransaction.java:396)

    at org.janusgraph.graphdb.query.graph.MultiKeySliceQuery.execute(MultiKeySliceQuery.java:51)

    at org.janusgraph.graphdb.database.IndexSerializer.query(IndexSerializer.java:512)

    at org.janusgraph.graphdb.util.SubqueryIterator.<init>(SubqueryIterator.java:66)

    ... 20 common frames omitted

Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend

    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:159)

    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:113)

    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getSlice(AstyanaxKeyColumnValueStore.java:102)

    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)

    at org.janusgraph.diskstorage.keycolumnvalue.KCVSProxy.getSlice(KCVSProxy.java:77)

    at org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:399)

    at org.janusgraph.diskstorage.BackendTransaction$5.call(BackendTransaction.java:396)

    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)

    ... 26 common frames omitted

Caused by: [com.netflix.astyanax.connectionpool.exceptions.BadRequestException](http://com.netflix.astyanax.connectionpool.exceptions.badrequestexception/): BadRequestException: [host=127.0.0.1(127.0.0.1):9160, latency=0(0), attempts=1]InvalidRequestException(why:Keyspace test_keyspace_321_1 does not exist)

    at [com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:159](http://com.netflix.astyanax.thrift.thriftconverter.toconnectionpoolexception(thriftconverter.java:159/))

    at [com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65](http://com.netflix.astyanax.thrift.abstractoperationimpl.execute(abstractoperationimpl.java:65/))

    at [com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28](http://com.netflix.astyanax.thrift.abstractoperationimpl.execute(abstractoperationimpl.java:28/))

    at [com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153](http://com.netflix.astyanax.thrift.thriftsyncconnectionfactoryimpl$thriftconnection.execute(thriftsyncconnectionfactoryimpl.java:153/))

    at [com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119](http://com.netflix.astyanax.connectionpool.impl.abstractexecutewithfailoverimpl.tryoperation(abstractexecutewithfailoverimpl.java:119/))

    at [com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352](http://com.netflix.astyanax.connectionpool.impl.abstracthostpartitionconnectionpool.executewithfailover(abstracthostpartitionconnectionpool.java:352/))

    at [com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538](http://com.netflix.astyanax.thrift.thriftcolumnfamilyqueryimpl$4.execute(thriftcolumnfamilyqueryimpl.java:538/))

    at org.janusgraph.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:157)

    ... 33 common frames omitted

Caused by: org.apache.cassandra.thrift.InvalidRequestException: null

    at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14678)

    at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.read(Cassandra.java:14633)

    at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:14559)

    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)

    at org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:741)

    at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:725)

    at [com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:544](http://com.netflix.astyanax.thrift.thriftcolumnfamilyqueryimpl$4$1.internalexecute(thriftcolumnfamilyqueryimpl.java:544/))

    at [com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4$1.internalExecute(ThriftColumnFamilyQueryImpl.java:541](http://com.netflix.astyanax.thrift.thriftcolumnfamilyqueryimpl$4$1.internalexecute(thriftcolumnfamilyqueryimpl.java:541/))

    at [com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60](http://com.netflix.astyanax.thrift.abstractoperationimpl.execute(abstractoperationimpl.java:60/))

    ... 39 common frames omitted

However, the server status shows both Storage and Engine as running.

After a 2nd successful run of the test this changes to a different error, again continually, over and over again:

2018-10-09 13:53:54,285 [Thread-12] ERROR a.g.e.a.d.AttributeDeduplicatorDaemon - An exception has occurred in the attribute de-duplicator daemon.

java.util.NoSuchElementException: null

    at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)

    at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicator.deduplicate(AttributeDeduplicator.java:58)

    at ai.grakn.engine.attribute.deduplicator.AttributeDeduplicatorDaemon.lambda$startDeduplicationDaemon$1(AttributeDeduplicatorDaemon.java:117)

    at [java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590](http://java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590))

    at java.lang.Thread.run(Thread.java:748)

I am also getting intermittent delete failures on this test, which seem to go away if I time.sleep(30) before the delete.:

args = namespace(keyspace='test_keyspace_308_5')
 def delete_keyspace(args):
 try:
> grakn.Grakn('localhost:48555').keyspaces().delete(args.keyspace)
graql_service/graql_operations.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <grakn.service.Keyspace.KeyspaceService.KeyspaceService object at 0x7f1b8c45c400>
keyspace = 'test_keyspace_308_5'
 def delete(self, keyspace):
 delete_request = keyspace_messages.Keyspace.Delete.Req()
 delete_request.name = keyspace
> self.stub.delete(delete_request)
/root/.local/share/virtualenvs/graql-service-Q4DLt09h/lib/python3.6/site-packages/grakn/service/Keyspace/KeyspaceService.py:41:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f1b8c47a390>
request = name: "test_keyspace_308_5"
, timeout = None, metadata = None
credentials = None
 def __call__(self, request, timeout=None, metadata=None, credentials=None):
 state, call, = self._blocking(request, timeout, metadata, credentials)
> return _end_unary_response_blocking(state, call, False, None)
/root/.local/share/virtualenvs/graql-service-Q4DLt09h/lib/python3.6/site-packages/grpc/_channel.py:532:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
state = <grpc._channel._RPCState object at 0x7f1b8c47a400>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f1b8c480608>
with_call = False, deadline = None
 def _end_unary_response_blocking(state, call, with_call, deadline):
 if state.code is grpc.StatusCode.OK:
 if with_call:
 rendezvous = _Rendezvous(state, call, None, deadline)
 return state.response, rendezvous
 else:
 return state.response
 else:
> raise _Rendezvous(state, None, None, deadline)
E grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
E status = StatusCode.INVALID_ARGUMENT
E details = "GraknTxOperationException-Unable to close graph [test_keyspace_308_5]. Please check server logs for the stack trace."
E debug_error_string = "{"created":"@1539087142.206763532","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"GraknTxOperationException-Unable to close graph [test_keyspace_308_5]. Please check server logs for the stack trace.","grpc_status":3}"
E >
/root/.local/share/virtualenvs/graql-service-Q4DLt09h/lib/python3.6/site-packages/grpc/_channel.py:466:

#2

Hi Nathan, is it possibble to create a Github issue with a reproducible example?


#3

Hi Ganesh

I will try and come back to this, reproducibility was tricky and this will also require crafting some ‘de-sensitized’ test data sets, hence not a 5 min job.


#4

Thank you @nathan.johnson! Absolutely, there’s no rush.