Hacker News
Running Large-Scale Graph Analytics with Memgraph and Nvidia CuGraph Algorithms (nvidia.com)
59 points by taubek on Aug 17, 2022 | hide | past | favorite | 7 comments


We've been enjoying cudf+cugraph for millions/billions of nodes & edges!

For notebook/Python users, cugraph is set up well to work with non-graph DBs too. The blog post makes sense for graph DB users, but the PyData ecosystem has moved far enough that we've been able to recreate the blog post in a few lines working directly on the data (compute tier), as is typical with our security/fraud/social/supply-chain/etc. analysts, using non-graph DBs (Spark, SQL, pandas, cudf, ...):

```
# conda install rapids ...
# pip install --user graphistry==0.27.1

import graphistry, pandas as pd  # or import pyspark, cudf, sqlalchemy, ...

g1 = graphistry.edges(pd.read_csv('logs.csv'), 'src_ip', 'user')
g2 = g1.compute_cugraph('pagerank').encode_point_color('pagerank', ['blue', 'red', 'yellow'], as_continuous=True)
g2.plot()  # or url = g2.plot(render=False)
```

Adding a graph DB or KV store to your infra can still make sense, but that's a much more nuanced conversation. Ex: it can help when gathering a lot of data to do real-time inferencing for a graph neural net during heavy website load. It's also interesting to consider adding a GPU to your DB storage nodes instead of keeping them separate; $-wise, T4 GPUs are making that a bit more approachable!


It's always a question of how to separate concerns in an ecosystem consisting of a DB and an engine running some kind of algorithms on top of it.

The advantage of using the same system is a similar API for users, and the opportunity to use an identical query language to run both matching queries and analyses like this one in cuGraph. For anyone using graph DBs, this might be relevant, while on the other side, achieving this in a few lines of Python seems like a good deal for the typical analysts you mentioned.


Correct, but with a DB involved there are additional benefits. This implementation is still rudimentary, but there are many options for tighter integration, e.g., a specialized DB index that continuously copies data to a GPU, significantly reducing latency in a production environment!


Yes, that's what motivated my comment on GNN inferencing, or more generally, graph-informed inferencing servers:

We generally find it's better to separate DB nodes from pricey GPU parts: always-on $$$ on the DB vs. during-use GPU (daytime, ...), elastic scaling to multi-GPU, not having bursts of heavy GPU analytics jobs break your DB server SLAs (!!!), etc.

But not always. Most top of mind, we're seeing:

- Internal data teams: when a box has ~no one using it on average, it's OK for it to be taken over. Like a data scientist's personal dev box, or a DS team server. In these cases, everything is on the ~same box: notebooks, RAPIDS containers, Graphistry runtimes, whatever DB, esp. with bulk Apache Arrow in/out like the new Neo4j support here or RAPIDS zero-copy GPU dataframe pointers; Spark has had this for years. (I'm guessing Memgraph does/will support this too if they're already integrating cugraph.)

- Production: the GNN inferencing case is interesting because it is likely much more tightly scoped and steady, so it can be safe in production DBs with basic delivery SLAs. We aren't really seeing production users doing GPU + DB on the same node (except ~small-scale GPU DBs like, say, OmniSci/heavy.ai) due to the above issues. Inferencing is inching closer: graph "entity 360" context today looks more like an HBase/Mongo/S3 system <> CPU/GPU inferencer. An interesting thing with Memgraph's architecture is we're getting into customer conversations where it may make sense to move the inferencer compute closer to the DB node. I probably still wouldn't put a big A100 on a production DB node, but attaching some T4s vs. running them separately is interesting!


Very good points!


This is the first time I've heard about Memgraph. It's really interesting because I'm beginning a project that involves analyzing some large graphs. Sometimes, before using a lib, I dig into the GitHub repo. So here, I viewed their implementation of a ring buffer (memgraph/src/data_structures/ring_buffer.hpp).

I was surprised by what I found, especially the sleep inside the insertion loop and the spin-lock it wraps. I'm not sure if this is a good implementation; I hope a more knowledgeable person could explain this piece of code to me.

```
  template <typename... TArgs>
  void emplace(TArgs &&...args) {
    while (true) {
      {
        std::lock_guard<memgraph::utils::SpinLock> guard(lock_);
        if (size_ < capacity_) {
          buffer_[write_pos_++] = TElement(std::forward<TArgs>(args)...);
          write_pos_ %= capacity_;
          size_++;
          return;
        }
      }

      SPDLOG_WARN("RingBuffer full: worker waiting");
      // Sleep time determined using tests/benchmark/ring_buffer.cpp
      std::this_thread::sleep_for(std::chrono::microseconds(250));
    }
  }
```
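
For context, here is a minimal self-contained sketch of the same pattern with a matching pop side, so the retry loop is easier to follow. This is illustrative, not Memgraph's actual code: `std::mutex` stands in for `memgraph::utils::SpinLock`, and the `pop()` API is assumed, not taken from the repo.

```cpp
#include <array>
#include <chrono>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Illustrative bounded ring buffer in the style of the snippet above.
// std::mutex stands in for memgraph::utils::SpinLock.
template <typename T, std::size_t Capacity>
class RingBuffer {
 public:
  // Retries with a fixed sleep until a slot frees up, like emplace() above.
  void emplace(T value) {
    while (true) {
      {
        std::lock_guard<std::mutex> guard(lock_);
        if (size_ < Capacity) {
          buffer_[write_pos_++] = std::move(value);
          write_pos_ %= Capacity;
          ++size_;
          return;
        }
      }
      // Buffer full: back off briefly before retrying.
      std::this_thread::sleep_for(std::chrono::microseconds(250));
    }
  }

  // Non-blocking pop; empty optional when there is nothing to consume.
  std::optional<T> pop() {
    std::lock_guard<std::mutex> guard(lock_);
    if (size_ == 0) return std::nullopt;
    T value = std::move(buffer_[read_pos_++]);
    read_pos_ %= Capacity;
    --size_;
    return value;
  }

 private:
  std::mutex lock_;
  std::array<T, Capacity> buffer_{};
  std::size_t write_pos_ = 0, read_pos_ = 0, size_ = 0;
};
```

The key point the original snippet shows: the lock is released before sleeping (the inner braces scope the `lock_guard`), so a consumer can drain the buffer while the producer waits.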


Nice observation! That sleep is there for the case where the queue is full, which means the caller can't put an additional element into it. The only option is to wait and see if the queue becomes less full... The hard question is: for how long? Here we just used a fixed amount that was/is reasonable for our use cases. A better approach would probably be exponential backoff (https://en.wikipedia.org/wiki/Exponential_backoff), but that's a bit more complex (not too much). We've never hit a limitation with the current implementation; if we do, we'll improve it further!
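
For the curious, a sketch of what that exponential backoff could look like in the retry loop. The class name and the delay values (250 µs start, 4 ms cap) are illustrative assumptions, unlike the fixed 250 µs in the real code, which was benchmarked:

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Illustrative exponential backoff for a full-buffer retry loop:
// start small, double after each failed attempt, and cap the delay
// to bound worst-case latency once the consumer catches up.
class Backoff {
 public:
  void wait() {
    std::this_thread::sleep_for(delay_);
    delay_ = std::min(delay_ * 2, kMaxDelay);  // double, but never exceed cap
  }
  std::chrono::microseconds current() const { return delay_; }

 private:
  static constexpr std::chrono::microseconds kMaxDelay{4000};
  std::chrono::microseconds delay_{250};
};
```

In `emplace()`, the fixed `sleep_for(250us)` would become `backoff.wait()` with a fresh `Backoff` per call, so each insertion starts from the short delay again.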



