
Commit a75802b

Merge pull request JanusGraph#1187 from GDATASoftwareAG/architecture-docs
Document Deployment Scenarios JanusGraph#119 [skip ci]
2 parents 0181f44 + 1fe43bf commit a75802b

9 files changed (+58, -2 lines)

docs/basics.adoc

Lines changed: 2 additions & 0 deletions
@@ -814,6 +814,8 @@ By adding the `JanusGraphIoRegistry` to the `org.apache.tinkerpop.gremlin.driver
It is possible to extend Gremlin Server with other means of communication by implementing the interfaces that it provides and leverage this with JanusGraph. See more details in the appropriate TinkerPop documentation.

+ include::deploymentscenarios.adoc[]
+
include::configuredgraphfactory.adoc[]

[[indexes]]

docs/deploymentscenarios.adoc

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
[[deployment-scenarios]]
== Deployment Scenarios

JanusGraph offers a wide choice of storage and index backends, which results in great flexibility in how it can be deployed. This chapter presents a few possible deployment scenarios to help navigate the complexity that comes with this flexibility.

Before discussing the different deployment scenarios, it is important to understand the roles of JanusGraph itself and of the backends. First of all, applications only communicate directly with JanusGraph, mostly by sending Gremlin traversals for execution. JanusGraph then communicates with the configured backends to execute the received traversal. When JanusGraph is used in the form of JanusGraph Server, there is no such thing as a _master_ JanusGraph Server. Applications can therefore connect to any JanusGraph Server instance. They can also use a load balancer to distribute requests across the different instances. The JanusGraph Server instances themselves do not communicate with each other directly, which makes it easy to scale them when more traversals need to be processed.
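
To sketch what this looks like from the application's side, the following Java example uses the TinkerPop driver to open a connection against several JanusGraph Server instances at once. The class name, hostnames and port are placeholders; a single load-balancer address could be used instead of listing the instances individually.

[source, java]
----
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

public class ConnectToAnyInstance {
    public static void main(String[] args) throws Exception {
        // Placeholder contact points: any subset of the JanusGraph Server
        // instances (or a load balancer in front of them) can be listed here.
        Cluster cluster = Cluster.build()
                .addContactPoints("janusgraph-server-1", "janusgraph-server-2", "janusgraph-server-3")
                .port(8182)
                .create();
        Client client = cluster.connect();

        // The driver sends each request to one of the available hosts; no
        // instance plays a special role. This assumes the server exposes a
        // traversal source named "g", as in the packaged sample configuration.
        Result result = client.submit("g.V().count()").one();
        System.out.println("vertex count: " + result.getLong());

        client.close();
        cluster.close();
    }
}
----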

NOTE: The scenarios presented in this chapter are only examples of how JanusGraph can be deployed. Each deployment needs to take into account the concrete use cases and production needs.

[[getting-started-scenario]]
=== Getting Started Scenario

This is the scenario most users probably want to choose when they are just getting started with JanusGraph. It offers scalability and fault tolerance with a minimal number of servers. JanusGraph Server runs together with an instance of the storage backend, and optionally also an instance of the index backend, on every server.

image:getting-started-scenario.svg[Getting started deployment scenario diagram, 650]

A setup like this can be extended by simply adding more servers of the same kind or by moving one of the components onto dedicated servers. The latter describes a growth path to transform the deployment into the <<advanced-scenario,Advanced Scenario>>.

Any of the scalable storage backends can be used with this scenario. Note, however, that Scylla requires http://docs.scylladb.com/getting-started/scylla_in_a_shared_environment/[some configuration when it is co-located with other services], as it is in this scenario. If an index backend is used in this scenario, it also needs to be one that is scalable.
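
As an illustration, the following Java sketch shows how the graph configuration on each server in this scenario could simply point at the backend instances running on the same machine, assuming a CQL compatible storage backend and Elasticsearch as the index backend; the class name is illustrative. In a JanusGraph Server deployment, the same settings would typically be placed in the graph's properties file instead of being set programmatically.

[source, java]
----
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class GettingStartedScenarioConfig {
    public static void main(String[] args) {
        // Hypothetical settings for one server: the storage and index backend
        // instances are co-located on the same machine, so both point at localhost.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")                 // e.g. Cassandra or Scylla
                .set("storage.hostname", "127.0.0.1")          // co-located storage instance
                .set("index.search.backend", "elasticsearch")
                .set("index.search.hostname", "127.0.0.1")     // co-located index instance
                .open();
        graph.close();
    }
}
----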

[[advanced-scenario]]
=== Advanced Scenario

The advanced scenario is an evolution of the <<getting-started-scenario>>. Instead of hosting the JanusGraph Server instances together with the storage backend and optionally also the index backend, they are now separated onto different servers. The advantage of hosting the different components (JanusGraph Server, storage/index backend) on different servers is that they can be scaled and managed independently of each other. This offers greater flexibility at the cost of having to maintain more servers.

image:advanced-scenario.svg[Advanced deployment scenario diagram, 800]

Since this scenario offers independent scalability of the different components, it naturally makes the most sense to also use scalable backends.
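
The following sketch shows how a JanusGraph Server instance in this scenario could be pointed at dedicated backend servers. The class name and hostnames are placeholders and, as before, the same settings would normally be placed in the graph's properties file of the JanusGraph Server installation.

[source, java]
----
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class AdvancedScenarioConfig {
    public static void main(String[] args) {
        // Hypothetical settings: the storage and index clusters run on their own
        // dedicated servers, so the hostnames refer to remote machines. In a
        // properties file, several hosts can be given as a comma-separated list.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")
                .set("storage.hostname", "storage-host-1")     // dedicated storage server
                .set("index.search.backend", "elasticsearch")
                .set("index.search.hostname", "index-host-1")  // dedicated index server
                .open();
        graph.close();
    }
}
----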

[[minimalist-scenario]]
=== Minimalist Scenario

It is also possible to host JanusGraph Server together with the backend(s) on just one server. This is especially attractive for testing purposes or when JanusGraph only supports a single application, which can then also run on the same server.

image:minimalist-scenario.svg[Minimalist deployment scenario diagram, 650]

In contrast to the previous scenarios, it makes the most sense to use backends that are not scalable for this scenario. The in-memory backend can be used for testing purposes, Berkeley DB for production, and Lucene as the optional index backend.
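
A possible single-server configuration, assuming Berkeley DB as the storage backend and Lucene as the index backend with local directories (the class name and paths are placeholders), could look like the following sketch.

[source, java]
----
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class MinimalistScenarioConfig {
    public static void main(String[] args) {
        // Hypothetical settings: both backends store their data in local
        // directories on the single server. For pure testing, the storage
        // backend could be set to "inmemory" instead of "berkeleyje".
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "berkeleyje")
                .set("storage.directory", "/var/lib/janusgraph/data")
                .set("index.search.backend", "lucene")
                .set("index.search.directory", "/var/lib/janusgraph/index")
                .open();
        graph.close();
    }
}
----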

[[embedded-janusgraph]]
=== Embedded JanusGraph

Instead of connecting to JanusGraph Server from an application, it is also possible to embed JanusGraph as a library inside a JVM based application. While this reduces the administrative overhead, it makes it impossible to scale JanusGraph independently of the application. Embedded JanusGraph can be deployed as a variation of any of the other scenarios. JanusGraph just moves from the server(s) directly into the application, as it is now used as a library instead of an independent service.
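
A minimal sketch of an application embedding JanusGraph could look as follows. The class name and the path of the properties file are placeholders; the file would contain backend settings like the ones shown in the previous scenarios.

[source, java]
----
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class EmbeddedJanusGraphApp {
    public static void main(String[] args) {
        // Open the graph directly inside the application instead of
        // connecting to a JanusGraph Server (the path is a placeholder).
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph.properties");

        // Traversals are executed through the embedded instance.
        GraphTraversalSource g = graph.traversal();
        long vertexCount = g.V().count().next();
        System.out.println("vertex count: " + vertexCount);

        graph.close();
    }
}
----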

docs/partitioning.adoc

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
[[graph-partitioning]]
== Graph Partitioning

- When the JanusGraph cluster consists of multiple storage backend instances, the graph is partitioned across those machines. Since JanusGraph stores the graph in an adjacency list representation the assignment of vertices to machines determines the partitioning. By default, JanusGraph uses a random partitioning strategy that randomly assigns vertices to machines. Random partitioning is very efficient, requires no configuration, and results in balanced partitions. However, random partitioning results in less efficient query processing as the JanusGraph cluster grows to accommodate more graph data because of the increasing cross-instance communication required to retrieve the query's result set. Explicit graph partitioning can ensure that strongly connected and frequently traversed subgraphs are stored on the same instance thereby reducing the communication overhead significantly.
+ When JanusGraph is deployed on a cluster of multiple storage backend instances, the graph is partitioned across those machines. Since JanusGraph stores the graph in an adjacency list representation the assignment of vertices to machines determines the partitioning. By default, JanusGraph uses a random partitioning strategy that randomly assigns vertices to machines. Random partitioning is very efficient, requires no configuration, and results in balanced partitions. However, random partitioning results in less efficient query processing as the JanusGraph cluster grows to accommodate more graph data because of the increasing cross-instance communication required to retrieve the query's result set. Explicit graph partitioning can ensure that strongly connected and frequently traversed subgraphs are stored on the same instance thereby reducing the communication overhead significantly.

To enable explicit graph partitioning in JanusGraph, the following configuration options must be set when the JanusGraph cluster is initialized.
@@ -10,7 +10,7 @@ cluster.partition = true
cluster.max-partitions = 32
ids.flush = false

- The configuration option `max-partitions` controls how many virtual partitions JanusGraph creates. This number should be roughly twice the number of storage backend instances. If the JanusGraph cluster is expected to grow, estimate the size of the cluster in the foreseeable future and take this number as the baseline. Setting this number too large will unnecessarily fragment the cluster which can lead to poor performance.
+ The configuration option `max-partitions` controls how many virtual partitions JanusGraph creates. This number should be roughly twice the number of storage backend instances. If the cluster of storage backend instances is expected to grow, estimate the size of the cluster in the foreseeable future and take this number as the baseline. Setting this number too large will unnecessarily fragment the cluster which can lead to poor performance.

Because explicit graph partitioning controls the assignment of vertices to storage instances it cannot be enabled once a JanusGraph cluster is initialized. Likewise, the number of virtual partitions cannot be changed without reloading the graph.
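
As a sketch, the options above could also be set programmatically when the graph is first created, assuming a hypothetical deployment of roughly 16 storage backend instances (hence 32 virtual partitions); the class name, backend choice, and hostname are placeholders.

[source, java]
----
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class ExplicitPartitioningSetup {
    public static void main(String[] args) {
        // Hypothetical sizing: cluster.max-partitions is roughly twice the
        // expected number of storage backend instances (here 16 -> 32).
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")               // placeholder storage backend
                .set("storage.hostname", "storage-host-1")   // placeholder hostname
                .set("cluster.partition", true)
                .set("cluster.max-partitions", 32)
                .set("ids.flush", false)
                .open();
        graph.close();
    }
}
----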

docs/static/images/advanced-scenario.svg

Lines changed: 2 additions & 0 deletions

docs/static/images/advanced-scenario.xml

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

docs/static/images/getting-started-scenario.svg

Lines changed: 2 additions & 0 deletions

docs/static/images/getting-started-scenario.xml

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

docs/static/images/minimalist-scenario.svg

Lines changed: 2 additions & 0 deletions

docs/static/images/minimalist-scenario.xml

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
