
Commit db58340

[GITFLOW]merging 'release/0.5.0' into 'master'
2 parents: ab3d050 + c7c63e2


66 files changed: +4186 −5794 lines

CHANGES.txt

Lines changed: 8 additions & 0 deletions

@@ -1,3 +1,11 @@
+0.5.0
+=====
+
+#30 fix hash table size check for extreme hash table sizes
+#27 replace ReentrantLock with CAS
+#26 add chunked implementation
+#13 remove tables implementation
+
 0.4.5
 =====
 
README.rst

Lines changed: 113 additions & 28 deletions

@@ -1,45 +1,99 @@
 OHC - An off-heap-cache
 =======================
 
-Status
-------
-
-This library should be considered as stable.
-
 Features
---------
+========
 
 - asynchronous cache loader support
 - optional per entry or default TTL/expireAt
 - entry eviction and expiration without a separate thread
 - capable of maintaining huge amounts of cache memory
+- suitable for tiny/small entries with low overhead using the chunked implementation
 
 Performance
-----------
+===========
 
 OHC shall provide a good performance on both commodity hardware and big systems using non-uniform-memory-architectures.
 
 No performance test results available yet - you may try the ohc-benchmark tool. See instructions below.
 A very basic impression on the speed is in the _Benchmarking_ section.
 
 Requirements
------------
+============
 
 Java7 VM that support 64bit and has ``sun.misc.Unsafe`` (Oracle JVMs on x64 Intel CPUs).
 
-An extension jar that makes use of new ``sun.misc.Unsafe`` methods in Java 8 exists.
+OHC is targeted for Linux and OSX. It *should* work on Windows and other Unix OSs.
 
 Architecture
-------------
+============
+
+OHC provides two implementations for different cache entry characteristics:
+- The _linked_ implementation allocates off-heap memory for each entry individually and works best for medium and big entries.
+- The _chunked_ implementation allocates off-heap memory for each hash segment as a whole and is intended for small entries.
+
+Linked implementation
+---------------------
+
+The number of segments is configured via ``org.caffinitas.ohc.OHCacheBuilder``, defaults to ``# of cpus * 2`` and must
+be a power of 2. Entries are distributed over the segments using the most significant bits of the 64 bit hash code.
+Accesses on each segment are synchronized.
+
+Each hash-map entry is allocated individually. Entries are freed (deallocated) when they are no longer referenced by
+the off-heap map itself or any external reference like ``org.caffinitas.ohc.DirectValueAccess`` or a
+``org.caffinitas.ohc.CacheSerializer``.
+
+The design of this implementation reduces the locked time of a segment to a very short time. Put/replace operations
+allocate memory first, call the ``org.caffinitas.ohc.CacheSerializer`` to serialize the key and value and then put the
+fully prepared entry into the segment.
+
+Eviction is performed using an LRU algorithm. A linked list through all cached elements per segment is used to keep
+track of the eldest entries.
+
+Since this implementation performs alloc/free operations for each individual entry, take care of memory fragmentation.
+We recommend using jemalloc to keep fragmentation low. On Unix operating systems, preload jemalloc. OSX usually does
+not require jemalloc for performance reasons.
+
+The extension jar ``ohc-core-j8``, which makes use of new ``sun.misc.Unsafe`` methods introduced in Java 8, is
+recommended for the linked implementation.
+
+Chunked implementation
+----------------------
+
+Chunked memory allocation off-heap implementation.
+
+Purpose of this implementation is to reduce the overhead for relatively small cache entries compared to the linked
+implementation since the memory for the whole segment is pre-allocated. This implementation is suitable for small
+entries with fast (de)serialization implementations of ``org.caffinitas.ohc.CacheSerializer``.
+
+Segmentation is the same as in the linked implementation. The number of segments is configured via
+``org.caffinitas.ohc.OHCacheBuilder``, defaults to ``# of cpus * 2`` and must be a power of 2. Entries are distributed
+over the segments using the most significant bits of the 64 bit hash code. Accesses on each segment are synchronized.
+
+Each segment is divided into multiple chunks. Each segment is responsible for a portion of the total capacity
+``(capacity / segmentCount)``. This amount of memory is allocated once up-front during initialization and logically
+divided into a configurable number of chunks. The size of each chunk is configured using the ``chunkSize`` option in
+``org.caffinitas.ohc.OHCacheBuilder``.
 
-OHC uses multiple segments. Each segment contains its own independent off-heap hash map. Synchronization occurs
-on critical sections that access a off-heap hash map. Necessary serialization and deserialization is performed
-outside of these critical sections.
-Eviction is performed using LRU strategy when adding entries.
-Rehashing is performed in each individual off-heap map when necessary.
+Like the linked implementation, hash entries are serialized into a temporary buffer first, before the actual put
+into a segment occurs (segment operations are synchronized).
+
+New entries are placed into the current write chunk. When that chunk is full, the next empty chunk will become the new
+write chunk. When all chunks are full, the least recently used chunk, including all the entries it contains, is evicted.
+
+Specifying the ``fixedKeyLength`` and ``fixedValueLength`` builder properties reduces the memory footprint by
+8 bytes per entry.
+
+Serialization, direct access and get-with-loader functions are not supported in this implementation.
+
+NOTE: The CRC hash algorithm requires JRE 8 or newer.
+
+The extension jar ``ohc-core-j8`` is not required for the chunked implementation.
+
+To enable the chunked implementation, specify the ``chunkSize`` in ``org.caffinitas.ohc.OHCacheBuilder``.
 
 Configuration
--------------
+=============
 
 Use the class ``OHCacheBuilder`` to configure all necessary parameter like
 
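The segment-selection rule described above (a power-of-2 segment count, with the most significant bits of the 64-bit hash choosing the segment) can be sketched as a tiny self-contained Java class. The class and method names below are invented for illustration and are not part of the OHC API:

```java
// Sketch of the documented segment-selection rule: OHC requires a
// power-of-2 segment count and picks a segment using the most
// significant bits of the 64-bit hash code. Names are invented for
// this illustration; this is not OHC's actual code.
public final class SegmentSelectionSketch {

    /** Returns the segment index for a 64-bit hash, given a power-of-2 segment count. */
    public static int segmentForHash(long hash, int segmentCount) {
        if (segmentCount <= 0 || Integer.bitCount(segmentCount) != 1)
            throw new IllegalArgumentException("segment count must be a power of 2");
        if (segmentCount == 1)
            return 0; // avoid shifting by 64, which Java treats as a shift by 0
        int bits = Integer.numberOfTrailingZeros(segmentCount); // log2(segmentCount)
        return (int) (hash >>> (64 - bits)); // top 'bits' bits select the segment
    }

    public static void main(String[] args) {
        // With 8 segments the top 3 bits decide; 0xE0... has top bits 111 = segment 7.
        System.out.println(segmentForHash(0xE000000000000000L, 8)); // 7
        System.out.println(segmentForHash(1L, 8));                  // 0
    }
}
```

Using the top bits (rather than the low bits, which typically pick the bucket inside a segment) keeps the two selections independent.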
@@ -57,8 +111,11 @@ hash partition - that means less linked-link walks and increased performance.
 The total amount of required off heap memory is the *total capacity* plus *hash table*. Each hash bucket (currently)
 requires 8 bytes - so the formula is ``capacity + segment_count * hash_table_size * 8``.
 
+OHC allocates off-heap memory directly, bypassing Java's off-heap memory limitation. This means that all
+memory allocated by OHC is not counted towards ``-XX:MaxDirectMemorySize``.
+
 Usage
------
+=====
 
 Quickstart::
 
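The sizing formula above can be checked with a small worked example. This is a sketch with invented names, not OHC code:

```java
// Worked example of the documented sizing formula:
//   off-heap bytes = capacity + segment_count * hash_table_size * 8
// (each hash bucket currently costs 8 bytes). The class and method
// names are invented for this sketch and are not part of the OHC API.
public final class OffHeapSizingSketch {

    public static long requiredOffHeapBytes(long capacity, int segmentCount, long hashTableSize) {
        return capacity + (long) segmentCount * hashTableSize * 8;
    }

    public static void main(String[] args) {
        // 1 GiB capacity, 16 segments, 8192 buckets per segment:
        // 1073741824 + 16 * 8192 * 8 = 1074790400 bytes (~1 MiB of table overhead)
        System.out.println(requiredOffHeapBytes(1L << 30, 16, 8192));
    }
}
```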
@@ -85,8 +142,16 @@ Key and value serializers need to implement the ``CacheSerializer`` interface. T
 - ``void serialize(Object obj, DataOutput out)`` to serialize the given object to the data output
 - ``T deserialize(DataInput in)`` to deserialize an object from the data input
 
+Java 9
+------
+
+Java 9 support is still *experimental*!
+
+OHC has been tested with some early access releases of Java 9 and the unit and JMH tests pass. However,
+it requires access to ``sun.misc.Unsafe`` via the JVM option ``-XaddExports:java.base/sun.nio.ch=ALL-UNNAMED``.
+
 Building from source
---------------------
+====================
 
 Clone the git repo to your local machine. Either use the stable master branch or a release tag.
 
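As a sketch of what a serializer with the two documented methods looks like, here is a self-contained String example. It mirrors the shape of ``org.caffinitas.ohc.CacheSerializer`` but deliberately does not implement the real interface, which may declare additional methods (e.g. for computing the serialized size):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a String serializer following the two methods the README
// documents. It mirrors the shape of org.caffinitas.ohc.CacheSerializer
// but is a standalone illustration, not the real interface.
public final class StringSerializerSketch {

    public static void serialize(String obj, DataOutput out) throws IOException {
        out.writeUTF(obj); // length-prefixed modified UTF-8
    }

    public static String deserialize(DataInput in) throws IOException {
        return in.readUTF();
    }

    /** Serialize then deserialize, as a quick self-check. */
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            serialize(s, new DataOutputStream(buf));
            return deserialize(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello off-heap")); // prints: hello off-heap
    }
}
```

Since OHC calls the serializer outside its critical sections, (de)serialization cost mostly affects caller latency rather than segment lock contention.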
@@ -98,11 +163,12 @@ Just execute
 ``mvn clean install``
 
 Benchmarking
------------
+============
 
 You need to build OHC from source because the big benchmark artifacts are not uploaded to Maven Central.
 
-Execute ``java -jar ohc-benchmark/target/ohc-benchmark-0.3-SNAPSHOT.jar -h`` to get some help information.
+Execute ``java -jar ohc-benchmark/target/ohc-benchmark-0.5.0-SNAPSHOT.jar -h`` (when building from source)
+to get some help information.
 
 Generally the benchmark tool starts a bunch of threads and performs _get_ and _put_ operations concurrently
 using configurable key distributions for _get_ and _put_ operations. Value size distribution also needs to be configured.
@@ -117,11 +183,16 @@ Available command line options::
   -rkd <arg>    hot key use distribution - default: uniform(1..10000)
   -sc <arg>     number of segments (number of individual off-heap-maps)
   -t <arg>      threads for execution
-  -type <arg>   implementation type - default: linked - option: tables
   -vs <arg>     value sizes - default: fixed(512)
   -wkd <arg>    hot key use distribution - default: uniform(1..10000)
   -wu <arg>     warm up - <work-secs>,<sleep-secs>
   -z <arg>      hash table size
+  -cs <arg>     chunk size - if specified it will use the "chunked" implementation
+  -fks <arg>    fixed key size in bytes
+  -fvs <arg>    fixed value size in bytes
+  -mes <arg>    max entry size in bytes
+  -unl          do not use locking - only appropriate for single-threaded mode
+  -hm <arg>     hash algorithm to use - MURMUR3, XX, CRC32
   -bh           show bucket histogram in stats
   -kl <arg>     enable bucket histogram. Default: false
 
@@ -141,7 +212,7 @@ Distributions for read keys, write keys and value sizes can be configured using
 
 Quick example with a read/write ratio of ``.9``, approx 1.5GB max capacity, 16 threads that runs for 30 seconds::
 
-   java -jar ohc-benchmark/target/ohc-benchmark-0.3-SNAPSHOT.jar \
+   java -jar ohc-benchmark/target/ohc-benchmark-0.5.0-SNAPSHOT.jar \
    -rkd 'gaussian(1..20000000,2)' \
    -wkd 'gaussian(1..20000000,2)' \
    -vs 'gaussian(1024..32768,2)' \
@@ -158,7 +229,7 @@ On a 2.6GHz Core i7 system (OSX) the following numbers are typical running the a
 - # of puts per second: 270000
 
 Why off-heap memory
--------------------
+===================
 
 When using a very huge number of objects in a very large heap, virtual machines will suffer from increased GC
 pressure since the GC basically has to inspect each and every object to decide whether it can be collected, and has to access all
@@ -178,25 +249,39 @@ But off heap memory is great when you have to deal with a huge amount of several
 that does not put any pressure on the Java garbage collector. Let the Java GC do its job for the application where
 this library does its job for the cached data.
 
+Why *not* use ByteBuffer.allocateDirect()?
+==========================================
+
+TL;DR allocating off-heap memory directly and bypassing ``ByteBuffer.allocateDirect`` is very gentle to the
+GC, and we gain explicit control over memory allocation and, more importantly, over freeing it. The stock implementation
+in Java frees off-heap memory during a garbage collection - also: if no more off-heap memory is available, it
+likely triggers a Full-GC, which is problematic if multiple threads run into that situation concurrently since
+it means lots of Full-GCs sequentially. Further, the stock implementation uses a global, synchronized linked
+list to track off-heap memory allocations.
+
+This is why OHC allocates off-heap memory directly and recommends preloading jemalloc on Linux systems to
+improve memory management performance.
+
 History
--------
+=======
+
+OHC was developed in 2014/15 for `Apache Cassandra <http://cassandra.apache.org/>`_ 2.2 and 3.0 to be used as the
+`new row-cache backend <https://issues.apache.org/jira/browse/CASSANDRA-7438>`_.
 
-OHC was developed in 2014/15 for `Apache Cassandra <http://cassandra.apache.org/>`_ 3.0 to be used as the `new
-row-cache backend <https://issues.apache.org/jira/browse/CASSANDRA-7438>`_.
 Since there were no suitable fully off-heap cache implementations available, it has been decided to
 build a completely new one - and that's OHC. But it turned out that OHC alone might also be usable for
 other projects - that's why OHC is a separate library.
 
 Contributors
------------
+============
 
 A big 'thank you' has to go to `Benedict Elliott Smith <https://twitter.com/_belliottsmith>`_ and
 `Ariel Weisberg <https://twitter.com/ArielWeisberg>`_ from DataStax for their very useful input to OHC!
 
 Developer: `Robert Stupp <https://twitter.com/snazy>`_
 
 License
--------
+=======
 
 Copyright (C) 2014 Robert Stupp, Koeln, Germany, robert-stupp.de
 
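The kind of raw allocation the section above contrasts with ``ByteBuffer.allocateDirect`` can be sketched as follows. This is a minimal illustration, not OHC's allocator (which is more involved, and for which the project recommends preloading jemalloc); it assumes the JVM still exposes ``sun.misc.Unsafe`` via the ``jdk.unsupported`` module:

```java
import java.lang.reflect.Field;

// Sketch of direct off-heap allocation via sun.misc.Unsafe. Memory
// obtained this way is not counted against -XX:MaxDirectMemorySize and
// must be freed explicitly - the GC is never involved. Requires a JVM
// that still exposes sun.misc.Unsafe (the jdk.unsupported module).
public final class UnsafeAllocSketch {

    private static sun.misc.Unsafe unsafe() {
        try {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (sun.misc.Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("sun.misc.Unsafe not accessible", e);
        }
    }

    /** Allocates 8 off-heap bytes, writes a long, reads it back, frees the memory. */
    public static long roundTrip(long value) {
        sun.misc.Unsafe u = unsafe();
        long address = u.allocateMemory(8); // raw allocation, invisible to the GC
        try {
            u.putLong(address, value);
            return u.getLong(address);
        } finally {
            u.freeMemory(address); // explicit free - forgetting this leaks
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L)); // prints: 42
    }
}
```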
notes-todos.txt

Lines changed: 0 additions & 13 deletions

@@ -2,26 +2,13 @@
 THIS IS JUST AN UNORDERED LIST OF NOTES AND TODOS AND HINTS
 
 
-Let only one implementation (linked / tables) survive, test on NUMA machine.
-
 Add burn test.
 
 Cleanup (or better: combination of entry size and trigger/target) may expose a bug. Definitely worth keeping an eye on it.
 It will just "do nothing" if not enough off-heap memory could be allocated.
 But if too many threads allocate new entries and the machine is near-to-full, the whole machine may fail (due to OOM).
 
 
-// Marker item (RowCacheSentinel) added to row cache before actual load of row.
---
-Honestly I cannot recall. I think it's simply to avoid duplicated work, but I think both end up performing the read.
-It would make sense if one simply waited on the result of the other, but I don't think that's what happens.
---
-It's nice - but we should not do more than adding a "marker" to the cache and regularly poll if it has been processed.
-But it will degrade performance for rows that are bigger than "maxEntrySize".
---> use CacheLoader since OHC 0.3
-
-
 jemalloc + Unsafe allocation on Linux + Windows regarding fragmentation on long running systems.
 
 
ohc-benchmark/pom.xml

Lines changed: 4 additions & 4 deletions

@@ -5,12 +5,12 @@
   <parent>
     <groupId>org.caffinitas.ohc</groupId>
     <artifactId>ohc-parent</artifactId>
-    <version>0.4.5</version>
+    <version>0.5.0</version>
     <relativePath>..</relativePath>
   </parent>
 
   <artifactId>ohc-benchmark</artifactId>
-  <version>0.4.5</version>
+  <version>0.5.0</version>
 
   <name>OHC benchmark executable</name>
   <description>Off-Heap concurrent hash map intended to store GBs of serialized data</description>
@@ -76,12 +76,12 @@
     <dependency>
       <groupId>org.caffinitas.ohc</groupId>
       <artifactId>ohc-core</artifactId>
-      <version>0.4.5</version>
+      <version>0.5.0</version>
     </dependency>
     <dependency>
       <groupId>org.caffinitas.ohc</groupId>
       <artifactId>ohc-core-j8</artifactId>
-      <version>0.4.5</version>
+      <version>0.5.0</version>
     </dependency>
     <dependency>
       <groupId>commons-cli</groupId>
