Efficient way to query with Node.js


#1

Hi,
Following the Grakn meeting in Paris, where I talked with some Grakn team members, I learned that the priority is on the gRPC driver rather than the REST API. Unfortunately, I had chosen the REST API because of its performance. So I tried the driver again, and this post aims to give you my feedback and hopefully get some tips to improve my tests with the Node.js driver.

Context: I want to fetch the first 25 instances of an entity type, along with all attribute values for each instance.

My query:

match $x isa Malware; offset 0; limit 25; get;

My results:

loadRestMalwares: Done in 365 ms
loadRestMalwares: Done in 366 ms
loadRestMalwares: Done in 367 ms

loadMalwaresNoAwait: Done in 1115 ms
loadMalwaresNoAwait: Done in 1125 ms
loadMalwaresNoAwait: Done in 1126 ms

loadMalwaresPartialAwait: Done in 2356 ms
loadMalwaresPartialAwait: Done in 2376 ms
loadMalwaresPartialAwait: Done in 2380 ms

loadMalwaresFullAwait: Done in 5518 ms
loadMalwaresFullAwait: Done in 5519 ms
loadMalwaresFullAwait: Done in 5519 ms

My code:
loadRestMalwares:
I can't post the code here because it relies on some specific Ramda mappers and other code. The idea is to get the result of the query and then call the /attributes API for every result in parallel to get the needed information.

loadMalwaresFullAwait:
Await everything, as in the documentation or https://github.com/graknlabs/grakn/issues/4664

const loadMalwaresFullAwait = async function getMalwaresSync() {
  const start = new Date().getTime();
  const session = await grakn.session('grakn');
  const transaction = await session.transaction(Grakn.txType.READ);
  const answerIterator = await transaction.query(
    'match $x isa Malware; offset 0; limit 25; get;'
  );
  const instances = [];
  let answer = await answerIterator.next();
  while (answer != null) {
    const attrIterator = await answer
      .map()
      .get('x')
      .attributes();
    let attr = await attrIterator.next();
    const attributes = [];
    while (attr != null) {
      const t = await attr.type();
      const label = await t.label();
      const value = await attr.value();
      attributes.push([label, value]);
      attr = await attrIterator.next();
    }
    instances.push(attributes);
    answer = await answerIterator.next();
  }
  transaction.close();
  session.close();
  const end = new Date().getTime();
  console.log(`loadMalwaresFullAwait: Done in ${end - start} ms`);
  const queryResult = { data: instances };
  return map(l => fromPairs(l))(queryResult.data);
};

loadMalwaresPartialAwait:
Use await only on the session, the transaction and the iterators

const loadMalwaresPartialAwait = async function getMalwares() {
  const start = new Date().getTime();
  const session = await grakn.session('grakn');
  const transaction = await session.transaction(Grakn.txType.READ);
  const answerIterator = await transaction.query(
    'match $x isa Malware; offset 0; limit 25; get;'
  );
  const instances = [];
  let answer = await answerIterator.next();
  while (answer != null) {
    const attrIterator = await answer
      .map()
      .get('x')
      .attributes();
    let attr = await attrIterator.next();
    const attributes = [];
    while (attr != null) {
      const label = attr.type().then(t => t.label());
      const value = attr.value();
      attributes.push([label, value]);
      attr = await attrIterator.next();
    }
    instances.push(attributes);
    answer = await answerIterator.next();
  }
  transaction.close();
  session.close();
  const queryResult = await Promise.resolveDeep({ data: instances });
  const end = new Date().getTime();
  console.log(`loadMalwaresPartialAwait: Done in ${end - start} ms`);
  return map(l => fromPairs(l))(queryResult.data);
};
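
Note for readers: Promise.resolveDeep is not part of standard JavaScript, so it has to come from a helper or a library. A minimal sketch of what such a helper could look like, assuming it simply resolves every promise nested inside arrays and plain objects (the standalone name resolveDeep is hypothetical, and this is not the actual implementation used in my tests):

```javascript
// Hypothetical helper: recursively resolves every promise nested inside
// arrays and plain objects. This is an assumption about what
// Promise.resolveDeep does in the code above, not its actual source.
async function resolveDeep(input) {
  const value = await input; // unwrap the value if it is a promise
  if (Array.isArray(value)) {
    // Resolve all array elements concurrently.
    return Promise.all(value.map(element => resolveDeep(element)));
  }
  if (
    value !== null &&
    typeof value === 'object' &&
    Object.getPrototypeOf(value) === Object.prototype
  ) {
    // Resolve all property values concurrently, then rebuild the object.
    const entries = await Promise.all(
      Object.entries(value).map(async ([key, v]) => [key, await resolveDeep(v)])
    );
    return Object.fromEntries(entries);
  }
  return value; // primitives and non-plain objects are returned as-is
}
```

With a helper like this, `await resolveDeep({ data: instances })` would turn the nested label/value promises into plain values in one step.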

loadMalwaresNoAwait:
Only await the session and the transaction, nothing more. I didn't find any way to iterate asynchronously over the driver's iterator, so I use the collect call instead of iterator.next

const loadMalwaresNoAwait = async function getMalwares() {
  const start = new Date().getTime();
  const session = await grakn.session('grakn');
  const transaction = await session.transaction(Grakn.txType.READ);
  const dataPromise = transaction
    .query('match $x isa Malware; offset 0; limit 25; get;')
    .then(result =>
      result
        .collect() // Blocking call
        .then(concepts => {
          // For each concept, we need to get all attributes label/value.
          const conceptsPromise = [];
          concepts.forEach(concept => {
            const conceptWithAttribute = concept
              .map()
              .get('x')
              .attributes()
              .then(attributesDefIterator => {
                const attributesDefinitionPromise = attributesDefIterator.collect(); // Blocking call
                return attributesDefinitionPromise.then(
                  attributesDefinition => {
                    // For each attribute reference found, we need to fetch label and value.
                    const attributesOfConcept = [];
                    attributesDefinition.forEach(attr => {
                      const labelPromise = attr.type().then(t => t.label());
                      const valuePromise = attr.value();
                      attributesOfConcept.push([labelPromise, valuePromise]);
                    });
                    return attributesOfConcept;
                  }
                );
              });
            conceptsPromise.push(conceptWithAttribute);
          });
          return conceptsPromise;
        })
    );
  // Resolve every promise in the result
  const queryResult = await Promise.resolveDeep({ data: dataPromise });
  transaction.close();
  session.close();
  const end = new Date().getTime();
  console.log(`loadMalwaresNoAwait: Done in ${end - start} ms`);
  return map(l => fromPairs(l))(queryResult.data);
};
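
On the iteration point above: one way to loop over a next()-style iterator with for await is to wrap it in an async generator. This is only a sketch, assuming the iterator's next() returns a promise that resolves to null once it is exhausted (which is what the while loops above rely on). The names iterate and fakeIterator are hypothetical, and fakeIterator is a stand-in so the example runs without a Grakn server.

```javascript
// Wraps any iterator whose next() returns a promise resolving to the next
// item, or to null when there is nothing left.
async function* iterate(iterator) {
  let item = await iterator.next();
  while (item != null) {
    yield item;
    item = await iterator.next();
  }
}

// Stand-in for a driver iterator, for illustration only.
function fakeIterator(items) {
  let index = 0;
  return {
    next: async () => (index < items.length ? items[index++] : null),
  };
}

// Collects every item produced by the wrapped iterator.
async function demo() {
  const seen = [];
  for await (const item of iterate(fakeIterator(['a', 'b', 'c']))) {
    seen.push(item);
  }
  return seen;
}
```

This only tidies the loop: each next() is still awaited in turn, so it would not by itself reduce the number of round trips.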

Conclusion:
As you can see, I can't find a way to get the same performance with the Node.js driver.
Can you help me with that? What am I doing wrong?

Thanks !


#2

Hi @julien,

can you try with the following and let us know the execution time:

const loadMalwaresParallelAwait = async function getMalwaresParallel() {
  const start = new Date().getTime();
  const session = await grakn.session('grakn');
  const transaction = await session.transaction(Grakn.txType.READ);
  const concepts = await (await transaction.query('match $x isa Malware; offset 0; limit 25; get;')).collectConcepts();

  // One list of [label, value] pairs per concept, all fetched in parallel.
  const instances = await Promise.all(concepts.map(async (concept) => {
    const attrs = await (await concept.attributes()).collect();
    return Promise.all(attrs.map(async (attr) => {
      const label = await (await attr.type()).label();
      const value = await attr.value();
      return [label, value];
    }));
  }));

  transaction.close();
  session.close();
  const end = new Date().getTime();
  console.log(`loadMalwaresParallelAwait: Done in ${end - start} ms`);
  const queryResult = { data: instances };
  return map(l => fromPairs(l))(queryResult.data);
};

I haven't tested the code, but it should work; you may need to make small adjustments.

Basically here I am trying to collect attributes and their values in parallel, blocking on Promise.all so that once the code is done you should have all the attributes inside the instances array.
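
To illustrate why that matters, here is a small sketch that runs without Grakn. It assumes every driver call costs one round trip and replaces it with a timer: fakeCall and its 20 ms latency are made up for the example, not measured on the real driver.

```javascript
// Simulated remote call: resolves with `value` after `ms` milliseconds.
// It stands in for one driver round trip; the latency figure is made up.
const fakeCall = (value, ms = 20) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

// Awaits each call before starting the next one.
async function sequential(values) {
  const out = [];
  for (const v of values) {
    out.push(await fakeCall(v));
  }
  return out;
}

// Starts every call first, then waits for all of them together.
async function parallel(values) {
  return Promise.all(values.map(v => fakeCall(v)));
}

// Times both strategies on the same input.
async function compare() {
  const values = [1, 2, 3, 4, 5];
  let start = Date.now();
  const a = await sequential(values);
  const sequentialMs = Date.now() - start;
  start = Date.now();
  const b = await parallel(values);
  const parallelMs = Date.now() - start;
  return { a, b, sequentialMs, parallelMs };
}
```

Both versions return the same values in the same order, but the sequential one pays the latency once per call while the parallel one pays it roughly once overall.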

This code should theoretically run a bit faster, please do let us know!

Thank you for asking,

Marco


#3

Thanks for your answer @marco
Your implementation takes approximately the same time as my noAwait implementation.

For other people who may have the same question: for this use case, the Node.js driver adds extra network overhead that makes it slower than REST … for now :wink: