Oopsla08 Memory-Efficient Java Slides
Quiz
Small boxes?
Q: What is the size ratio of Integer to int? a. 1 : 1 b. 1.33 : 1 c. 2 : 1 d. ?
Assume 32-bit platform
Small boxes?
Q: What is the size ratio of Integer to int? a. 1 : 1 b. 1.33 : 1 c. 2 : 1 d. 4 : 1
Assume 32-bit platform
Small things?
Q: How many bytes in an 8-character String? a. 8 b. 16 c. 28 d. ?
Assume 32-bit platform
Small things?
Q: How many bytes in an 8-character String? a. 8 b. 16 c. 28 d. 64
Assume 32-bit platform
Bigger? Better?
Q: Which of the following is true about HashSet relative to HashMap a. does less, smaller b. does more, smaller c. similar amount of functionality, same size d. ?
Bigger? Better?
Q: Which of the following is true about HashSet relative to HashMap a. does less, smaller b. does more, smaller c. similar amount of functionality, same size d. does less, larger
Small collections?
Q: Put the following 2-element collections in size order: ArrayList, HashSet, LinkedList, HashMap
Collections?
Q: How many live collections in a typical heap? a. between five and ten b. tens c. hundreds d. ?
Collections?
Q: How many live collections in a typical heap? a. between five and ten b. tens c. hundreds d. tens/hundreds of thousands, even millions
Roadmap
- Quiz
- Background & myths
- Memory health
- Patterns of memory usage
- Process
Background
Our group has been diagnosing memory and performance problems in large Java systems for more than 8 years
Worked with dozens of applications: open source, large commercial applications, software products
Common thread: it is easy to build systems that consume large memory resources compared to the work accomplished
Heaps have grown from 500MB to 2-3GB or more in the past few years, but are not necessarily supporting more users or functions
Surprisingly common:
- requiring 1GB of memory to support a few hundred users
- saving 500KB of session state per user
- requiring 2MB for a text index per simple document
- creating 100KB of temporary objects per web hit
It is easy for costs to pile up, just piecing together building blocks
Myths
- "Frameworks are written by experts, so they've been optimized (for my use case!)"
- "I knew foo was expensive; I didn't know it was this expensive!"
- "It's no use: O-O plus Java is always expensive"
- "Efficiency is incompatible with good design"
Goals
It's not a lost cause!
Pattern taxonomy:
- Collections: many, high overhead; empty; small; special purpose
- High data: duplication; fields
- Lifetime: short (complex temps); long (in-memory designs; space vs. time; correlated lifetime)
Roadmap: Patterns of memory usage
- Modeling your data types
- Modeling relationships
(break)
- More relationships
- More data type modeling
- Object lifetime
Measurements shown are estimates obtained from experiments on a sampling of different JVMs. In practice costs will vary across JVMs. Measures we report may be subject to errors due to data collection, analysis tools, or interpretation of results. They are intended only to illustrate the nature of memory problems.
Memory health
Example: a TreeMap mapping 100 Doubles to 100 Doubles.

[Schematic of the data structure: TreeMap x1 = 3.9KB; Double keys x100 = 2.3KB; Double values x100 = 2.3KB. Distinguish data types from collections; a region includes all of its implementation classes; edge labels show average fanout.]

Total cost: 8.6KB. What are the bytes accomplishing? How much is actual data vs. overhead?

A Double is 24 bytes: 33% is actual data, 67% is representation overhead. (From one 32-bit JVM; varies with JVM and architecture.)
An 8-character String costs 64 bytes: only 25% is the actual data, 75% is overhead of representation. You would need 96 characters for the overhead to drop to 20% or less.

[Layout: String object = JVM overhead 16 bytes + bookkeeping fields 12 bytes + pointer 4 bytes; char[] = JVM overhead 16 bytes + character data 16 bytes.]
Collection health

How does a TreeMap spend its bytes? Collections have fixed and variable costs.

[A 100-entry TreeMap: the TreeMap region (x1 = 3.9KB, fixed overhead 48 bytes) is 100% overhead; the Double keys (x100 = 2.3KB) and Double values (x100 = 2.3KB) are each 67% overhead.]

Overall: 82% overhead. The design enables updates while maintaining sort order. Is it worth the price?

Alternative: two parallel double[] arrays (x1 = 816 bytes each), with binary search against the sorted key array. Less functionality, suitable for a load-then-use scenario, and only 2% overhead.
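The load-then-use alternative can be sketched as a pair of parallel sorted arrays queried with binary search. The class and method names here are illustrative, not from the slides; the point is that each entry costs 16 bytes of pure data.

```java
import java.util.Arrays;

// Sketch of the load-then-use alternative to TreeMap<Double, Double>:
// two parallel scalar arrays, sorted once by key, then queried with
// Arrays.binarySearch. No per-entry objects, no per-entry overhead.
final class SortedDoubleMap {
    private final double[] keys;
    private final double[] values;

    SortedDoubleMap(double[] keys, double[] values) {
        // Assumes keys are already sorted and parallel to values.
        this.keys = keys;
        this.values = values;
    }

    /** Returns the value for key, or Double.NaN if absent. */
    double get(double key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : Double.NaN;
    }
}
```

Unlike TreeMap, this structure cannot accept updates after loading, which is exactly the trade the slides describe.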
At larger scale, a 10,000-entry TreeMap (x1 = 391KB; Double keys x10000 = 234KB; Double values x10000 = 234KB): overhead is still 82% of cost. Overhead is not amortized in this design; there is a high constant cost per element: 88 bytes.

[Chart: data vs. overhead as the structure grows from 100K to 400K bytes; the TreeMap design stays at 82-88% overhead.]

In the double[] design, overhead starts out low and quickly approaches 0 (about 2%). Cost per element is 16 bytes, pure data.
Summary: Health

Distinguish actual data from the overhead of representation:
- overhead from your data types
- overhead from your collection choices, fixed vs. variable
- many other ways to break down overhead costs: JVM object overhead, delegation costs, empty array slots, unused fields, duplicate data, ...

Ask: How much room for improvement? Is the functionality worth the price? Which overhead will be amortized? If constant, how large?
Pattern taxonomy:
- Collections: many, high overhead; empty; small; special purpose
- High data: duplication; fields
- Lifetime: short (complex temps); long (in-memory designs; space vs. time; correlated lifetime)
The JVM & hardware impose costs on objects; these can be substantial for small objects.

[A Double is 24 bytes: header 12 bytes + double 8 bytes + alignment 4 bytes. A Boolean is 16 bytes: header 12 bytes + boolean 1 byte + alignment 3 bytes.]
8-char String: 64 bytes. 31% of that is overhead due to modeling as two objects (a String delegating to a char[]); the effect varies with the size of the String.
Fine-grained modeling

Case study: a server framework, part of a connection. Request info: Request x46K = 67MB.

[One Request delegates to an Entry, a To, a Contact, three NameAddress objects, three Urls, and five Params objects, among others.] 34 instances to represent one request. Cost: 1.5KB per request. Will not scale.
Using 64-bit addressing to solve memory problems can cause new ones: it increases object header, alignment, and pointer overhead. One study shows a 40-50% average increase in heap sizes for benchmarks. Some JVMs have options for extended 32-bit addressing, allowing access to larger heaps without the footprint cost.

On a 64-bit JVM an 8-char String costs 96 bytes, 50% larger; the delegated design is responsible for the extra object header, pointer, and alignment costs.
Bookkeeping fields

Simple example: an 8-character String (64 bytes). String users pay a 12-byte tax to store the offset, length, and hashcode fields, which are unnecessary for the most common use cases.

[Layout: String = JVM overhead 16 bytes + bookkeeping fields 12 bytes + pointer 4 bytes; char[] = JVM overhead 16 bytes + data 16 bytes.]
Problem 1: a common base class multiplies its cost across every subclass.

[Each instance pays the Object header (12 bytes) plus the ContactInfo base fields (40 bytes: createDate, enteredBy, updateDate, updateBy, primary, typeId, type). ElectronicAddress adds 48 bytes, total 100; PhysicalAddress adds 100 bytes, total 152; PhoneNumber adds 60 bytes, total 112.]
Problem: fields allocated for features that are not used in many models.

[ModelObjectImpl carries 20 bytes of bookkeeping fields plus a 4-byte pointer to a PropertiesHolder. Remedies shown: make the data fixed, or recompute it as needed.]

Example: these fields are used for cross-model references. Memory costs force models to be broken into fragments, which in turn increases the need for these references.
Solution: refactoring

- Some fields are not needed in the general case, or are for rarely-used features. Fine-grained designs using a common base class will multiply the cost of the base class design.
- Candidates: data fields, semi-constant fields, sparse fields, and recomputable data saved unnecessarily (often the result of premature optimization). Both scalar and reference fields apply.
- Typically, many different cases occur together in the same data model.
- Verify actual costs and benefits: moving rarely-used fields to side objects can incur delegation costs; moving sparse fields to a map can incur high map-entry costs.
- Refactoring is not easy late in the cycle; using interfaces and factories up front can help.
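One of the refactorings above, moving a sparse field into a side table, can be sketched as follows. All names here are hypothetical, and as the slides warn, each side-table entry has its own map-entry cost, so this only wins when the field really is sparse.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch: a rarely-populated field moved out of the data type and
// into a side table, so instances that never set it pay no per-instance cost.
// Caveat from the slides: verify actual costs and benefits, since each
// side-table entry carries its own map-entry overhead.
final class ModelObject {
    // common, densely-used fields stay inline here
}

final class SparseAttributes {
    private final Map<ModelObject, String> comments = new IdentityHashMap<>();

    void setComment(ModelObject o, String comment) {
        comments.put(o, comment);
    }

    String getComment(ModelObject o) {
        return comments.get(o);  // null for the common case: never set
    }
}
```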
Representing relationships
Pattern taxonomy:
- Collections: many, high overhead; empty; small; special purpose
- High data: duplication; fields
- Lifetime: short (complex temps); long (in-memory designs; space vs. time; correlated lifetime)
Representing relationships: small collections
Example: a graph. [A HashMap (x1 = 1.8MB) maps 65K Vertex keys to values; each Vertex (with Level and Data fields) has an ArrayList (x65K = 3.1MB) and a HashSet (x65K = 16MB) of its edges, average fanout 4.5; Integer x65K = 1MB; Edge x297K = 9MB.]

HashMap's default capacity is 16; for a 5-entry set that means 44+ bytes of empty slots. The design assumes the entry, key, and value sets are all commonly used; cost: 12 bytes per HashMap$Entry.
Remedy: [replace each per-vertex HashSet with an ArrayList, x65K = 3.7MB, down from 16MB; the rest of the structure is unchanged.] The HashSet functionality was not worth the cost: uniqueness was already guaranteed elsewhere.
Gracefully-growing collections

[Same structure after the first remedy: HashMap x1 = 1.8MB; ArrayList x65K = 3.1MB; ArrayList x65K = 3.7MB; Integer x65K = 1MB; Edge x297K = 9MB.]

An ArrayList stores its entries in an Object[]; the default size and growth policy can mean overhead from empty slots.
Remedy: [replace the small per-vertex ArrayLists (x65K = 3.1MB) with Pair objects, x65K = 1.3MB; the ArrayList x65K = 3.7MB and Edge x297K = 9MB are unchanged.]
Multipart key

Case study: Apache Commons MultiKeyMap. [The map is keyed by MultiKey objects (KeyPart1, KeyPart2), each of which delegates to an Object[] of key parts.] Specialized MultiKey2, MultiKey3, etc. classes could easily have been created to avoid the delegation cost.
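The specialized-key idea can be sketched like this (the class is hypothetical, not Apache Commons code): the two key parts live in fields rather than in a delegated Object[], saving one object header, one array header, and a pointer per key.

```java
import java.util.Objects;

// Sketch of a specialized two-part key: key parts stored as fields instead of
// in a delegated Object[]. Assumes the parts are effectively immutable, so the
// hash can be cached at construction time.
final class MultiKey2<A, B> {
    private final A key1;
    private final B key2;
    private final int hash;

    MultiKey2(A key1, B key2) {
        this.key1 = key1;
        this.key2 = key2;
        this.hash = Objects.hash(key1, key2);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MultiKey2)) return false;
        MultiKey2<?, ?> other = (MultiKey2<?, ?>) o;
        return Objects.equals(key1, other.key1) && Objects.equals(key2, other.key2);
    }

    @Override public int hashCode() { return hash; }
}
```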
Growth policies

Example: creating default-size ArrayLists. [Index: HashMap x1 = 1.8MB; ArrayList x65K = 5.2MB, would be 3.7MB with optimal sizing; Pair x65K = 1.3MB; Edge x297K = 9MB.]

28% overhead in the ArrayLists comes just from empty slots. Collections are optimized for growth: large defaults and jumps, doubling, and a 10% tax on some copies.

Remedies: choose initial capacities to fit the expected size, as above (3.7MB with optimal sizing).
Case study: session data. [SessionData x330 = under 1MB; Person and other structures, 15MB; Profile x1.95K = 4.6MB; ArrayList x101K = 7.9MB; plus HashSets, LinkedLists, HashMaps, LinkedList$Entry objects, and their backing arrays.]

A small run had 26MB of session data. Will not scale: 210 empty collections per session = 28% of the session cost.

Remedies: allocate collections lazily, and size them to fit.
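The empty-collections problem (210 empty collections per session) suggests lazy allocation: do not create the collection until the first element arrives. A sketch, with illustrative names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of lazy collection allocation: sessions that never gain an attachment
// never pay for an ArrayList (or its backing Object[]). Names are illustrative.
final class SessionData {
    private List<String> attachments;  // null until first use

    void addAttachment(String name) {
        if (attachments == null) {
            // Small initial capacity instead of the growth-optimized default.
            attachments = new ArrayList<>(4);
        }
        attachments.add(name);
    }

    List<String> getAttachments() {
        return attachments == null ? Collections.emptyList() : attachments;
    }
}
```

Collections.emptyList() returns a shared immutable instance, so the empty case costs nothing per session.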
Representing relationships: special-purpose collections
Case study: chat sessions. [A ConcurrentHashMap (x1 = 4MB) holds 110K Sessions; each Chat session (x110K = 10MB) holds its subscribers in a nested ConcurrentHashMap (x110K = 173MB); Subscriber x110K = 4.2MB.]

The nested CHMs cost > 1,600 bytes each! This was 90% of the cost of the structure, and 10-20% of the total heap. The library was not intended for use at this scale, and the concurrency requirements were different at fine vs. coarse grain.
Remedies: [the nested maps replaced by Hashtables, x110K = 17MB; the top-level ConcurrentHashMap x1 = 4MB, Chat session x110K = 10MB, and Subscriber x110K = 4.2MB unchanged.] Used Hashtable, since a high level of concurrency was not needed at this grain. Savings: 90+%.

Note: [HashMap, 28 bytes per entry.]
Case study: a cache of elements, on a 64-bit JVM. [A ConcurrentHashMap (x1 = 3.8MB) holds 64K CachedElements; Titles x63K = 137MB; UnmodifiableMap x1.9M = 106MB wrappers, 465MB including the wrapped Title maps; HashMap x1.9M = 359MB; String x2M = 156MB.]

108MB went to UnmodifiableMap wrappers, 56 bytes each, twice the cost as on a 32-bit JVM. The functionality was not worth the cost at this scale; unmodifiable wrappers serve a development-time purpose.
Maps of maps: three designs for a two-key (vertex, level) lookup.

- Design I: an outer HashMap (x1 = 0.3MB) keyed by Vertex, with 10K inner HashMaps (x10K = 2.4MB) keyed by Level.
- Design II: an outer HashMap (x1 = under 1KB) keyed by Level, with 5 inner HashMaps (x5 = 1.4MB) keyed by Vertex, 10K entries each.
- Design III: a single HashMap (x1 = 1.4MB) with 50K Pair keys (Pair x50K = 1MB). This trades the fixed costs of many small collections for a per-element cost in one large collection: a 28-byte HashMap entry plus a 20-byte Pair per mapping. Total: 1 * HM fixed overhead + 50K * HM per-entry overhead + 50K * Pair overhead = 2.4MB.

Assuming the number of levels is much smaller than the number of vertices, II is consistently better than I. The difference between III and the others is sensitive to the number of levels, even within a small range.
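Design III, a single flat map keyed by a (vertex, level) pair, can be sketched as below. Class and method names are illustrative; the per-mapping cost is one map entry plus one small key object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of a flat two-key map: one HashMap keyed by a (vertex, level) pair,
// instead of nested maps. Names are illustrative.
final class VertexLevelMap<V> {
    private static final class Key {
        final Object vertex;
        final int level;
        Key(Object vertex, int level) { this.vertex = vertex; this.level = level; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return level == k.level && Objects.equals(vertex, k.vertex);
        }
        @Override public int hashCode() { return Objects.hashCode(vertex) * 31 + level; }
    }

    private final Map<Key, V> map = new HashMap<>();

    void put(Object vertex, int level, V value) { map.put(new Key(vertex, level), value); }
    V get(Object vertex, int level) { return map.get(new Key(vertex, level)); }
}
```

Note that each get allocates a short-lived probe key; that temporary is cheap, but as the slides say, verify the cost/benefit at your scale.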
Break
Representing relationships (continued)

[Table of per-entry collection costs, from experiments with a few different JVMs, all 32-bit. Excludes amortized per-collection costs such as empty array slots. Includes the pointer to the entry.]
Case study: session properties. The application stores 7 properties per subscription via the session API. [Session x20K; Hashtable x20K = 7MB, with 7 attribute/value entries each; the value Strings are shared across sessions.] Hashtable per-entry and boxing costs add 350 bytes of overhead per session, impeding scalability.

Remedy: [store a single SubscriptionProperty object (x20K = 1.2MB) holding the 7 values in fields; Hashtable x20K = 2.6MB with one attribute entry; Strings still shared across sessions.]
Representing relationships: special-purpose collections (continued)

[Example: TreeMap x52 = 537MB, 265K entries each; Double keys x13.4M = 342MB; Double values x13.4M = 342MB.]
Identity maps

Comparison: HashMap (x1 = 298KB) vs. IdentityHashMap (x1 = 128KB), each holding 10,000 key/value pairs.

For maintaining a map of unique objects, where the reference is the key: equality is based on ==, and the open-addressing implementation avoids the cost of Entry objects. Cost was reduced by 59% in this experiment.
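The identity-map behavior can be demonstrated with a small sketch: two equal but distinct String objects occupy two entries, because the map distinguishes keys by reference.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// IdentityHashMap keys on reference identity (==) rather than equals(), and its
// open-addressing implementation allocates no Entry objects. Suitable only when
// the reference itself is the key.
final class IdentityMapDemo {
    static int countDistinct() {
        Map<String, Integer> m = new IdentityHashMap<>();
        String a = new String("key");
        String b = new String("key");  // equals(a), but a different object
        m.put(a, 1);
        m.put(b, 2);
        return m.size();  // 2: identity, not equals(), distinguishes the keys
    }
}
```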
- Overhead of small, nested collections: the total cost of each collection is what matters, both fixed and variable parts
- Per-element overhead of large collections: collection per-entry and data delegation overheads
Collections alternatives

- Apache Commons collections
- GNU Trove: many space-efficient implementations, e.g. scalar collections, and list entries without delegation cost
- Cliff Click's nonblocking collections; Javolution; Amino
- Specialized collections within the frameworks you already use

Important: check your corporate policy regarding specific open source frameworks. Choosing and configuring carefully really matters, since collection implementations are often expensive.
Case study: sessions retaining StringBuffers. [Session x111K = 42MB; StringBuffer x334K = 187MB.] Empty-space overhead in the StringBuffers; the space cost was not worth the time savings.

Remedies: [SessionImpl, SessionBase, Profile]. The data type had been split in three, and the same coding pattern had been copied to each part.
Case study: saving formatted data. Some values were constants (10%); some had few distinct values ("Y" or "N"). Storing a boolean as a String gives a health ratio of 48 : 1. Duplicating data in a high-overhead representation: the space cost was not worth the time savings.
Case study: ConcordanceEntry. 17% of its cost was due to duplication of Type and its String data, yet there are only a small number of immutable Types. [Each ConcordanceEntry holds an Annotation with its own Type and String.] The interface design did not provide for sharing, and the full cost of the duplication was hidden.
Remedy: sharing Strings

[The String object and its char[] are shared.] You specify which Strings to share; sharing covers both the String object and the character array. Make sure it's worth it, since there is a space cost. It is a myth that it causes memory leaks.
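One way to share selected Strings is String.intern(); another is an application-level canonicalizing map, sketched below (the class name is illustrative), which gives you control over which Strings are shared and lets the pool itself be discarded.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of an application-level string pool: returns one shared instance per
// distinct value, so duplicates can be garbage collected. Unlike the JVM's
// intern table, the pool is an ordinary object you can bound or drop.
final class StringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns a single shared instance for each distinct value. */
    String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```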
Boxed scalars

Integer.valueOf(), Boolean.valueOf(), etc. share some common values (not all). Make sure you don't rely on ==.
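The "not all" caveat is easy to demonstrate: Integer.valueOf() is guaranteed to share values only in the range -128..127, so == works there but must not be relied on in general.

```java
// Boxed scalar sharing: Integer.valueOf() is guaranteed to cache -128..127;
// values outside that range are usually distinct objects, so correctness must
// never depend on ==.
final class BoxingDemo {
    static boolean smallValuesShared() {
        return Integer.valueOf(100) == Integer.valueOf(100);  // true: guaranteed cache
    }

    static boolean largeValuesEqual() {
        // == on Integer.valueOf(10_000) is implementation-dependent; use equals()
        return Integer.valueOf(10_000).equals(Integer.valueOf(10_000));
    }
}
```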
Common-prefix data

Case study: an application server's class cache. The class loader keeps a map of class names to JAR files: over 120MB of Strings, mostly duplicate prefix information. [HashMap from class name (String) to class info.]

Remedy: share the duplicated prefix data.
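A minimal sketch of prefix sharing, with hypothetical names: split each qualified class name into a canonicalized package prefix plus a short simple name, so entries in the same package share one prefix String.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the common-prefix remedy: "com.example.app.FooImpl"
// and "com.example.app.BarImpl" share one copy of "com.example.app".
// The static map is not thread-safe; this is illustration only.
final class ClassName {
    private static final Map<String, String> prefixes = new HashMap<>();

    final String packagePrefix;  // canonicalized, shared across entries
    final String simpleName;

    ClassName(String qualified) {
        int dot = qualified.lastIndexOf('.');
        String prefix = dot < 0 ? "" : qualified.substring(0, dot);
        this.packagePrefix = prefixes.computeIfAbsent(prefix, p -> p);
        this.simpleName = qualified.substring(dot + 1);
    }

    String qualified() {
        return packagePrefix.isEmpty() ? simpleName : packagePrefix + '.' + simpleName;
    }
}
```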
Pattern taxonomy:
- Collections: many, high overhead; empty; small; special purpose
- High data: duplication; fields
- Lifetime: short (complex temps); long (in-memory designs; space vs. time; correlated lifetime)
short-lived data
Temporaries
Expensive temporaries

Example: SimpleDateFormat. It has a costly construction process: each call to the default constructor creates a DecimalFormat, a GregorianCalendar, String[] arrays, and a TimeZone.
Tradeoffs

- Converters, formatters, factories, schemas, connections, etc. may be good candidates for reuse: they can be expensive to create, and are often designed for reuse.
- Sometimes local variables are good enough. Avoid rolling your own resource pools.
- Some temporaries are inexpensive to create (e.g. Integer, many iterators), and ThreadLocal access is usually a hash lookup. Verify the cost/benefit.
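A common reuse pattern for SimpleDateFormat keeps one instance per thread, since the class is both expensive to construct and not thread-safe. A sketch (ThreadLocal.withInitial is Java 8+, later than the original talk; the class name is illustrative):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Reuse sketch for an expensive temporary: one SimpleDateFormat per thread.
final class DateFormats {
    private static final ThreadLocal<SimpleDateFormat> ISO_DATE =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    static String format(Date d) {
        return ISO_DATE.get().format(d);
    }

    static Date parse(String s) {
        try {
            return ISO_DATE.get().parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad date: " + s, e);
        }
    }
}
```

As the slides note, verify the cost/benefit: the ThreadLocal lookup is itself roughly a hash lookup.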
long-lived data

Three categories:
1. In-memory design: data is in memory forever.
2. Space vs. time: data may be discarded and recomputed.
3. Correlated lifetime: data is alive only during the lifetime of other objects, or during specific phases.

Each has its own best practices and pitfalls; many problems stem from misunderstanding requirements. If not careful, extending the lifetime of objects can introduce concurrency problems, leaks, and additional memory overhead from structures that manage lifetime.
long-lived data: in-memory designs

Case study: a heap analysis tool. Requirement: analyze an 80-million-object heap. Design: one object per target application object. Hypothetical minimum: if each object needed just 4 fields (type, id, pointer to references, flags), that is 80M x 32 bytes = 2.5GB just to model application objects! To model references (2-3 per object) and leave scratch space for algorithms, the design would require at least 10GB.

Note: an earlier design used a modeling framework with high overhead costs; just optimizing those costs would not have been sufficient.

Solution: a backing store using memory-mapped files (java.nio), plus a column-based storage infrastructure with scalar arrays, to reduce the working set and avoid object header costs. Specialized for this application's access patterns. Don't try this at home! The column-based approach is a last resort, for optimization of highly specialized and protected components.
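The column-based idea can be sketched as parallel scalar arrays: N modeled objects become rows across a few arrays instead of N small Java objects, eliminating per-object headers. The field layout below is the hypothetical 4-field minimum from the slides; all names are illustrative.

```java
// Sketch of column-based storage: one row per modeled object, stored across
// parallel scalar arrays. No per-object headers, and the working set for a
// single field scan is one contiguous array.
final class ObjectTable {
    private final int[] typeIds;
    private final long[] ids;
    private final int[] refsOffsets;  // index into a separate references table
    private final byte[] flags;

    ObjectTable(int capacity) {
        typeIds = new int[capacity];
        ids = new long[capacity];
        refsOffsets = new int[capacity];
        flags = new byte[capacity];
    }

    void set(int row, int typeId, long id, int refsOffset, byte flag) {
        typeIds[row] = typeId;
        ids[row] = id;
        refsOffsets[row] = refsOffset;
        flags[row] = flag;
    }

    long id(int row) { return ids[row]; }
    int typeId(int row) { return typeIds[row]; }
}
```

As the slides warn, this trades away object-oriented design and is a last resort for specialized, protected components.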
long-lived data: space vs. time

Soft references:
- Tell the GC to reclaim these objects only when the space is really needed.
- Keep an object alive after it is no longer strongly referenced, just in case it is needed again.
- Used mostly to avoid recomputation: e.g. for caches and resource pools, or for side objects (cached fields) which can be recreated if lost.

Case study: > 100MB of class-name strings. An in-memory design was implemented for performance, but it should have been a small, bounded cache; the cache itself was only needed during startup.

Case study: unbounded growth (a leak). An object pool framework was used for 20 different purposes to improve performance, with unbounded size and strong references. Solution: soft references.

Case study: a financial web application. The cache was sized too large, aiming for a 95% hit rate; the result was performance problems due to excessive GC.

Relying solely on soft references gives up control over policy, and may not leave enough headroom for temporary objects, causing the GC to run more often. Caches and pools should in general be bounded in size; soft references can be used as an additional failsafe mechanism. Many implementations of caches and resource pools are available; avoid writing your own if possible.
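A size-bounded cache can be illustrated with LinkedHashMap's access-order mode, which gives a minimal LRU policy. This is a sketch for illustration only; per the guidance above, prefer an existing cache implementation in production, with soft references at most as a failsafe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU sketch: LinkedHashMap in access-order mode, evicting the
// least-recently-used entry once the bound is exceeded.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true);  // true = iterate in access order (LRU)
        this.maxEntries = maxEntries;
    }

    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```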
long-lived data: correlated lifetime

Objects needed only while other objects are alive.

Case study: a sharing pool. [A HashMap maps keys to shared subexpression values, used by an algorithm.] Problem: the pool keeps subexpressions (and map entries) around forever.
Weak references: useful for preventing leaks, by tying the lifetime of objects to the lifetime of other objects.
Remedy: [Apache Commons ReferenceMap (strong keys, weak values) replaces the HashMap; the pool holds weak references to the subexpressions, while the algorithm holds transient references for one iteration.] A pool entry is removed when its value is no longer needed.

Note: soft references were also considered, but each iteration used different expressions, so there was no need to prolong lifetime. The goal was space, not time.
The standard Java WeakHashMap uses weak keys. Example usage: key = the object to be annotated, value = the annotation. Caution: the entry cannot be collected if the key is the same as, or strongly reachable from, the value.
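The annotation usage can be sketched as follows (the wrapper class is illustrative). The entry becomes eligible for collection once the annotated object is no longer strongly reachable, which is exactly the correlated-lifetime behavior wanted here.

```java
import java.util.Map;
import java.util.WeakHashMap;

// WeakHashMap sketch: key = the object being annotated, value = the annotation.
// Caution from the slides: a value that strongly references its key pins the
// entry forever, defeating the weak key.
final class AnnotationStore {
    private final Map<Object, String> annotations = new WeakHashMap<>();

    void annotate(Object subject, String note) {
        annotations.put(subject, note);
    }

    String annotationFor(Object subject) {
        return annotations.get(subject);
    }
}
```

When the last strong reference to a key is dropped, the GC clears the entry at a time of its choosing; the timing is not deterministic.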
Weak and soft references are Objects themselves, and so incur footprint costs. In some weak/soft maps the entries extend Weak/SoftReference; others add yet another level of delegation.
Case study: a leak. A bug in end-of-request processing failed to remove an object from a listener queue. Immediate fix: fix the bug in the request processing. For robustness: have the listener queue hold weak references.

Case study: a large index needed only during load time. Easy solution: null out the pointer once loading completes.
Process
Measurement
Many surprises
- Small, synthetic experiments are extremely valuable, to test out frameworks and design patterns before they are adopted. Of course, incorporate measurement into unit and system tests.
- Use detailed diagnostic tools to periodically check for scale, and look for surprises.
- Be mindful of normalization units when designing tests: how many concurrent users? active sessions? Understand the costs of the major units used in lower layers.
- Run experiments at different scales early on. Are costs amortized as expected?
- Cardinality of relationships: state it as part of the design; verify it periodically; then use it, in combination with measurement, as the basis for estimation.
- Caches and pools: verify that they are working, and that they are worth it.
- For analyzing the sources of memory bloat and verifying assumptions, tools that rely on heap snapshots are the most valuable. Some free tools are available from IBM and Sun.
- Commercial and open source tools: SAP MAT (now open source via Eclipse), YourKit, JProfiler, etc. Note on formats: many read only hprof, not phd.
- The IBM and Sun diagnostic guides have information on gathering and analyzing heap snapshots, and pointers to free tools.

When to snapshot: for footprint, at steady state with a known load; for the footprint of a single feature, or for suspected growth, take before/after snapshots around a fixed number of operations, starting after the system is warmed up.
Additional resources

- JDK library source code is freely available, and can be very worthwhile to consult.
- Many valuable articles are on the web: IBM developerWorks and the Sun Developer Network are good starting points, along with best practices and tuning guides for specific frameworks. Some misinformation is occasionally found even on reputable sites.
- Garbage collection and overall heap usage: the IBM and Sun diagnosis sites have GC tuning guides and free tools.
- Object allocation: most Java performance profilers can show allocation information with calling context, e.g. hprof (free).
Conclusions
Distributed development, layers of frameworks, and Java's modeling limitations make it easy to create bloated data designs.

The concept of data structure health, the ratio of actual data to its representation, can illuminate where there is room for improvement, and point out aspects of a design that will not scale.
Acknowledgments
Thanks to:
Matthew Arnold Dave Grove Tim Klinger Trevor Parsons Peter Santhanam Edith Schonberg Yeti
See also: N. Mitchell, G. Sevitsky, The Causes of Bloat, the Limits of Health, in Proceedings of Object Oriented Programming Systems Languages and Applications (OOPSLA) 2007, Montreal, Canada.