I'm trying to design custom graph partitioning for my use case.

According to the JanusGraph documentation, the number of partitions can be configured.

The configuration option max-partitions controls how many virtual partitions JanusGraph creates. This number should be roughly twice the number of storage backend instances.

The ID placement can be controlled by a custom IDPlacementStrategy implementation.

The user can provide a use-case-specific vertex placement strategy by implementing the IDPlacementStrategy interface and registering it in the configuration through the ids.placement option.

The docs also mention that partitions are identified by an integer.

When implementing IDPlacementStrategy, note that partitions are identified by an integer id in the range from 0 to the number of configured virtual partitions minus 1. For our example configuration, there are partitions 0, 1, 2, 3, ..., 31.

My questions are:

  1. Is a virtual partition in JanusGraph the same as a Cassandra partition?
  2. Can partitioning be configured so that vertices with the same value for a given property are stored in a single Cassandra partition, or in a group of them?

1 Answer

JanusGraph uses virtual partitioning to split the graph across storage backend instances so that vertices are distributed across machines in a balanced way.

When Cassandra is the storage backend for JanusGraph, the physical partitioning is handled by Cassandra, and the graph data is stored in CQL tables. Vertices, their properties, and their edges are stored in the edgestore table, which is partitioned by key (see JanusGraph Data Model for details).
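To make the "same property value, same partition" idea from the question concrete, here is a minimal sketch of the mapping step only. It assumes you want to derive a virtual partition id in the range 0 to max-partitions minus 1 from a property value. The class PropertyPartitioner and the method partitionFor are hypothetical names, not part of JanusGraph, and the sketch deliberately does not implement the IDPlacementStrategy interface itself. Check the interface in your JanusGraph version before wiring something like this in. Whether vertices that share a virtual partition id also end up in the same place in Cassandra depends on how the storage layer distributes row keys, which this helper does not control.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not a JanusGraph class: maps a property value to a
// virtual partition id in the range [0, maxPartitions).
final class PropertyPartitioner {
    private PropertyPartitioner() {}

    static int partitionFor(String propertyValue, int maxPartitions) {
        if (maxPartitions <= 0) {
            throw new IllegalArgumentException("maxPartitions must be positive");
        }
        // CRC32 is deterministic across JVMs and returns a non-negative long,
        // so the modulo result is always a valid partition id.
        CRC32 crc = new CRC32();
        crc.update(propertyValue.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % maxPartitions);
    }
}
```

With 32 configured virtual partitions, PropertyPartitioner.partitionFor("tenant-a", 32) always returns the same id between 0 and 31 for the same input, so a custom placement strategy could use it to group vertices by that property.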

If you are interested, Boxuan Li wrote a blog post with details of how the graph data is stored in Cassandra -- JanusGraph Deep Dive: Data layout in JanusGraph. Cheers!
