Sousa: CryENGINE 3 Graphics Gems
AGENDA
Anti-aliasing
Camera Post-Processing
DX 10.1 introduced the SV_SampleIndex / SV_Coverage system-value semantics, which allow solving deferred MSAA via multipass pixel/sample frequency passes [Thibieroz11].
SV_SampleIndex forces pixel shader execution for each sub-sample and provides the index of the sub-sample currently being executed. The index can be used to fetch a sub-sample from a multisampled RT, e.g. FooMS.Load(UnnormScreenCoord, nSampleIndex).
SV_Coverage indicates to the pixel shader which sub-samples were covered during the raster stage; the pixel shader can also write it to output a custom coverage mask.
Loop through MSAA tagged sub-samples
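A minimal sketch of a per-sample frequency pass (resource names hypothetical): SV_SampleIndex forces per-sample pixel shader execution, and the index selects the matching fragment from a multisampled SRV.

// Minimal sketch (hypothetical names): per-sample frequency pixel shader.
Texture2DMS<float4> FooMS : register(t0);

float4 PerSamplePS(float4 vPos : SV_Position,
                   uint nSampleIndex : SV_SampleIndex) : SV_Target
{
    // Unnormalized screen coordinates index the MSAA surface directly
    int2 vUnnormCoord = int2(vPos.xy);
    return FooMS.Load(vUnnormCoord, nSampleIndex);
}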
DEFERRED MSAA\HEADS UP!
Simple theory, troublesome practice
Pre-resolve sample 0 for pixel frequency passes, such as lighting and other MSAA-dependent passes.
In the same pass, create the sub-sample mask (compare samples for similarity; mark the pixel if they mismatch).
Avoid the default SV_Coverage, since it results in redundant processing on regions not requiring MSAA.
Reserve 1 bit of the stencil buffer and update it with the sub-sample mask.
Tag the entire pixel-quad instead of just the single pixel; this improves stencil culling efficiency.
Use the stencil read/write bitmasks to avoid overriding the per-sample bit in regular stencil passes, e.g.:
StencilWriteMask = 0x7F
Use clip/discard to tag the stencil; extra overhead also comes from the additional texture read for the per-sample mask. A sketch of the tagging pass follows.
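A minimal sketch of the stencil-tagging pass (resource names and stencil state hypothetical, assuming the per-sample mask was written during pre-resolve): pixels that do not need per-sample shading are discarded, so the stencil tag (write mask 0x80, op REPLACE) only lands on mismatching regions.

// Minimal sketch (hypothetical names): stencil-tagging pass reading back the
// per-sample mask. Discarded pixels skip the stencil write, so only regions
// needing per-sample shading get the reserved 0x80 bit.
Texture2D<float> PerSampleMask : register(t0);

float4 StencilTagPS(float4 vPos : SV_Position) : SV_Target
{
    // Per-pixel frequency regions: discard to skip the stencil write
    if (PerSampleMask.Load(int3(vPos.xy, 0)) == 0)
        discard;
    return 0;
}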
Per-pixel frequency passes: set the stencil read mask to the reserved bits for per-pixel regions (~0x80), bind the pre-resolved (non-multisampled) target SRVs, and render the pass as usual.
Per-sample frequency passes: set the stencil read mask to the reserved bit for per-sample regions (0x80), bind the multisampled target SRVs, index the current sub-sample via SV_SampleIndex, and render the pass as usual.
The default SV_Coverage only applies to triangle edges. For alpha-tested geometry, check whether each sub-sample passes the alpha test and set its coverage bit manually, as in the snippet below.
Alpha Test SSAA Disabled
// Supersampled alpha test: evaluate the alpha test at each sub-sample
// position and build a custom coverage mask (output via SV_Coverage).
static const float2 vMSAAOffsets[2] = { float2(0.25, 0.25), float2(-0.25, -0.25) };

const float2 vDDX = ddx(vTexCoord.xy);
const float2 vDDY = ddy(vTexCoord.xy);

uint uCoverageMask = 0;
[unroll]
for (int s = 0; s < nSampleCount; ++s)
{
    // Shift the texture coordinate to the current sub-sample position
    float2 vTexOffset = vMSAAOffsets[s].x * vDDX + vMSAAOffsets[s].y * vDDY;
    float fAlpha = tex2D(DiffuseSmp, vTexCoord + vTexOffset).w;
    // Set the coverage bit if this sub-sample passes the alpha test
    uCoverageMask |= ((fAlpha - fAlphaRef) >= 0) ? (uint(0x1) << s) : 0;
}
Render shadows as usual at pixel frequency; bilateral upscale during the deferred shading composite pass.
The common recommendation to tackle this at per-sample frequency is fairly slow in real-world scenarios. Using max depth works well for most cases and is N times faster.
Use alpha-to-coverage instead, or even no alpha-test AA at all (let morphological AA tackle it).
Incorrect
Fixed
DEFERRED MSAA\RECAP
Accessing and/or rendering to Multisampled RTs?
Then you need to care about accessing and outputting the correct sub-sample. Avoid vanilla deferred lighting.
Prefer fully deferred, hybrids, or just skip deferred altogether.
Highly Compressed 4k
Source 4k
FXAA, MLAA, SMAA, SRAA, DEAA, GBAA, DLAA, etc. See Filtering Approaches for Real-Time Anti-Aliasing [Jimenez et al. 11].
Shading Anti-aliasing
Mip-mapping normal maps [Toksvig04]; Spectacular Specular: LEAN and CLEAN Specular Highlights [Baker11]; Rock-Solid Shading: Image Stability without Sacrificing Detail [Hill12]. A sketch of the Toksvig approach follows.
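A minimal sketch of the Toksvig idea referenced above (resource names hypothetical; not necessarily the CryENGINE implementation): the length of the mip-averaged normal measures normal variance and dampens the specular power, reducing specular shimmer.

// Minimal sketch (hypothetical names) of Toksvig specular anti-aliasing.
Texture2D<float4> NormalMap : register(t0);
SamplerState LinearWrap : register(s0);

float3 SampleAANormal(float2 vUV, inout float fSpecPower)
{
    float3 vN = NormalMap.Sample(LinearWrap, vUV).xyz * 2.0f - 1.0f;
    float fLen = max(length(vN), 1e-4);   // < 1 where mip-averaged normals diverge
    // Toksvig factor: dampens specular power with normal variance
    float fToksvig = fLen / (fLen + fSpecPower * (1.0f - fLen));
    fSpecPower *= fToksvig;
    return vN / fLen;   // renormalized shading normal
}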
Reprojection
|V| weighting
Can't handle signal (color) changes or transparency. For a correct result, all opaque geometry must output velocity.
Alpha-blended surfaces (e.g. particles), lighting/shadow/reflection/UV animation, etc. Any scatter-like post-process running before the AA resolve. E.g. ghosting on transparency, lighting, shadows and such; silhouettes might appear from scatter-like post-processes (e.g. bloom).
Incorrect blending. Simplest solution: force resource sync. NVIDIA exposes a driver hint to force-sync a resource via NVAPI; this is the solution used by NVIDIA's TXAA.
Note to hardware vendors: it would be great if all vendors exposed such a hint (even better if generalized into multi-GPU API functionality).
Pathological cases
Multi-GPU
For higher temporal stability: accumulate multiple frames in an accumulation buffer, similar to TXAA [Lottes12]. Re-project the accumulation buffer. Weighting: map the accumulation-buffer colors into the range of the current frame's neighborhood color extents [Malan2012], with different weights for high/low frequency regions (for sharpness preservation).
[Figure: 5-tap neighborhood (TL, TR, BL, BR around center M) from the current frame (t0); M re-projected from the accumulation buffer (tN)]
Regular TAA
SMAA 1TX
// Neighborhood clamp + frequency-dependent weighting of the accumulation
// buffer sample (kl/kh: low/high frequency blend factors).
float3 cMax = max(cTL, max(cTR, max(cBL, cBR)));
float3 cMin = min(cTL, min(cTR, min(cBL, cBR)));
// Local contrast estimate: distance of the center from the neighborhood average
float3 wk = abs((cTL + cTR + cBL + cBR) * 0.25 - cM);
// Clamp history to the neighborhood extents, blend by local frequency
return lerp(cM, clamp(cAcc, cMin, cMax), saturate(rcp(lerp(kl, kh, wk))));
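A minimal end-to-end sketch of the resolve (resource and constant names hypothetical), showing how the current-frame neighborhood and the re-projected accumulation sample feed the weighting above.

// Minimal sketch (hypothetical names): temporal AA resolve pass.
Texture2D<float4> CurrFrame   : register(t0);
Texture2D<float4> AccBuffer   : register(t1);
Texture2D<float2> VelocityTex : register(t2);
SamplerState LinearClamp : register(s0);

cbuffer TAAParams { float kl; float kh; };  // low/high frequency weights (assumed)

float3 TemporalAAPS(float2 vUV : TEXCOORD0) : SV_Target
{
    // 5-tap neighborhood from the current frame (t0)
    float3 cM  = CurrFrame.Sample(LinearClamp, vUV).rgb;
    float3 cTL = CurrFrame.Sample(LinearClamp, vUV, int2(-1, -1)).rgb;
    float3 cTR = CurrFrame.Sample(LinearClamp, vUV, int2( 1, -1)).rgb;
    float3 cBL = CurrFrame.Sample(LinearClamp, vUV, int2(-1,  1)).rgb;
    float3 cBR = CurrFrame.Sample(LinearClamp, vUV, int2( 1,  1)).rgb;

    // Re-project the accumulation buffer (tN) with per-pixel velocity
    float2 vVel = VelocityTex.Sample(LinearClamp, vUV).xy;
    float3 cAcc = AccBuffer.Sample(LinearClamp, vUV - vVel).rgb;

    // Neighborhood clamp + weighting, as in the snippet above
    float3 cMax = max(cTL, max(cTR, max(cBL, cBR)));
    float3 cMin = min(cTL, min(cTR, min(cBL, cBR)));
    float3 wk   = abs((cTL + cTR + cBL + cBR) * 0.25 - cM);
    return lerp(cM, clamp(cAcc, cMin, cMax), saturate(rcp(lerp(kl, kh, wk))));
}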
DEPTH OF FIELD
Typical controls such as focus range + blur amount have little physical meaning. The CoC depends mainly on f-stops, focal length and focal distance; the last two directly affect the FOV. If you want more bokeh, you need to max out your focal length and widen the aperture, which also means moving closer to or further from the subject for proper framing.
Not the typical way a game artist/programmer thinks about DOF.
50 mm
200 mm
DEPTH OF FIELD\F-STOPS
2 f-stops
8 f-stops
22 f-stops
2 f-stops
2.8 f-stops
4 f-stops
5.6 f-stops
0.5 m
0.75 m
1.0 m
Bigger aperture = more circular bokeh, smaller aperture = more polygonal bokeh
The polygonal bokeh look depends on the diaphragm blade count, which varies with the lens characteristics.
Almost the same amount of light enters the camera iris from all directions.
Edges might be in shadow; this is commonly known as vignetting. Poor lens manufacturing may introduce a vast array of optical aberrations [Wiki01].
This is the main reason why Gaussian blur, diffusion DOF, and derivative techniques look wrong/visually unpleasant.
Simple implementation and nice results. Downside: performance, particularly for shallow DOF.
Variable/inconsistent fillrate hit: depending on near/far layer resolution and aperture size, it might exceed 5 ms. The quad generation phase also has a fixed cost attached.
[Kawase09]
[Gotanda09]
[White11]
[Andreev12]
[McIntosh12]
1st pass: N^2 taps (e.g. 7x7). 2nd pass: N^2 taps (e.g. 3x3) for flood-filling the bokeh shape.
Storage: R11G11B10F for the downscaled HDR scene; R8G8 for the CoC.
Done at half resolution; far/near fields processed in the same pass.
Limit the offset range to minimize undersampling. Higher-spec hardware can use a higher tap count.
Pinhole camera: a camera without a lens; light has to pass through a single small aperture before hitting the image plane. This is the typical real-time rendering model.
Thin lens
Camera lenses have finite dimensions; light refracts through the lens until hitting the image plane. F = focal length, P = plane in focus, I = image distance.
[Figure: thin-lens diagram with the plane in focus P, image plane, image distance I, and the circle of confusion]
F = focal length (where light starts coming into focus); P = plane in focus (camera focal distance); I = image distance (where the image is projected in focus).
f = f-stops (aka the f-number or focal ratio); D = object distance; A = aperture diameter.
Simplifies to the standard thin-lens CoC: c = (F^2 / (f * (P - F))) * |1 - P / D|.
Note: f and F are known variables from the camera setup, so this folds down into a single mad in the shader; a sketch follows.
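A minimal sketch under the standard thin-lens CoC above (constant names hypothetical): precomputing the aperture-dependent scale on the CPU leaves a single mad on 1/D per pixel.

// Minimal sketch (hypothetical names): CoC from the thin-lens model.
// Precompute on CPU: fScaleK = (F * F) / (f * (P - F)); fScaleKP = fScaleK * P.
// Result is in sensor units; scale by (imageWidthPx / sensorWidth) for pixels.
float ComputeCoC(float fDist /* object distance D */, float fScaleK, float fScaleKP)
{
    return abs(fScaleK - fScaleKP * rcp(fDist));   // = k * |1 - P/D|, one mad on 1/D
}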
Typical film (sensor) formats: 35mm/70mm. Can alternatively derive the focal length from the FOV.
Camera FOV: fov = 2 * atan(filmWidth / (2 * F)), hence F = filmWidth / (2 * tan(fov / 2)).
DEPTH OF FIELD\SAMPLING
Concentric Mapping [Shirley97] used for uniform sample distribution
Maps the unit square to the unit circle. The square is mapped to (a,b) in [-1,1]^2 and divided into 4 regions by the lines a = b and a = -b. The first region is given by: r = a, phi = (pi/4) * (b/a). A sketch of the full mapping follows.
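A minimal sketch of the Shirley-Chiu concentric mapping (compact |a| > |b| form; function name hypothetical), used here to distribute DOF kernel samples uniformly on the disk.

// Minimal sketch: concentric mapping of [-1,1]^2 to the unit disk [Shirley97].
static const float PI = 3.14159265f;

float2 ConcentricMap(float2 vAB)   // vAB in [-1,1]^2
{
    if (all(vAB == 0.0f))
        return float2(0, 0);       // avoid division by zero at the origin

    float r, phi;
    if (abs(vAB.x) > abs(vAB.y))   // regions split by the lines a = b, a = -b
    {
        r = vAB.x;
        phi = (PI / 4.0f) * (vAB.y / vAB.x);           // first region: r = a, phi = (pi/4)(b/a)
    }
    else
    {
        r = vAB.y;
        phi = (PI / 2.0f) - (PI / 4.0f) * (vAB.x / vAB.y);
    }
    return r * float2(cos(phi), sin(phi));
}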
DEPTH OF FIELD\SAMPLING: 2ND ITERATION
[Figure: sample patterns from the two iterations summed/unioned into the final kernel]
58 taps (0.52 ms)
2 f-stops
4 f-stops
Downscale the CoC target k times (k = tile count), taking the min fragment for the far field and the max fragment for the near field. R8G8 storage.
Dynamic branching using the tile min/max CoC for both fields balances the cost between far/near. Also used for the scatter-as-gather approximation for the near field. A downscale sketch follows.
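A minimal sketch of one CoC tile-downscale step (resource names hypothetical), repeated k times as described above: the far field keeps the min CoC, the near field the max.

// Minimal sketch (hypothetical names): one 2x2 step of the CoC tile downscale.
Texture2D<float2> CoCSrc : register(t0);   // R8G8: x = far CoC, y = near CoC

float2 CoCTileDownscalePS(float4 vPos : SV_Position) : SV_Target
{
    int2 vBase = int2(vPos.xy) * 2;
    float2 vCoC = CoCSrc.Load(int3(vBase, 0));
    [unroll]
    for (int i = 1; i < 4; ++i)
    {
        float2 vTap = CoCSrc.Load(int3(vBase + int2(i & 1, i >> 1), 0));
        vCoC.x = min(vCoC.x, vTap.x);   // far field: min fragment
        vCoC.y = max(vCoC.y, vTap.y);   // near field: max fragment
    }
    return vCoC;
}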
Careful: the downscale is a source of error due to bilinear filtering; use a custom bilinear (bilateral) filter for downscaling.
Scale the kernel size and weight samples with the far CoC [Scheuermann05]; pre-multiply the layer with the far CoC [Gotanda09].
This prevents bleeding artifacts from the bilinear/separable filter.
Far Field
No weighting
CoC weighting
Near field: scatter-as-gather approximation. Scale the kernel size and weight fragments with the tile max CoC against the near CoC; pre-multiply with the near CoC.
We only want to blur near-field fragments (a cheap partial-occlusion approximation); a gather sketch follows the comparison below.
Far Field
Near Field
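A minimal sketch of the near-field scatter-as-gather (resource, kernel and function names hypothetical): the kernel is scaled by the tile max CoC, and each tap is weighted and pre-multiplied by its own near CoC so only blurred fragments bleed.

// Minimal sketch (hypothetical names): near-field scatter-as-gather.
Texture2D<float3> SceneHalfRes : register(t0);
Texture2D<float2> CoCHalfRes   : register(t1);  // x = far CoC, y = near CoC
SamplerState LinearClamp : register(s0);

static const int nTaps = 8;
cbuffer DOFKernel { float2 vKernel[nTaps]; };   // precomputed disk offsets (assumed)

float4 GatherNearField(float2 vUV, float fTileMaxCoC)
{
    float4 cAcc = 0;
    [unroll]
    for (int i = 0; i < nTaps; ++i)
    {
        float2 vTapUV = vUV + vKernel[i] * fTileMaxCoC;  // kernel scaled by tile max CoC
        float3 cTap   = SceneHalfRes.Sample(LinearClamp, vTapUV);
        float fTapCoC = CoCHalfRes.Sample(LinearClamp, vTapUV).y;

        // Pre-multiply with the near CoC: in-focus taps contribute nothing,
        // blurred taps bleed over their neighbors (scatter as gather)
        cAcc += float4(cTap * fTapCoC, fTapCoC);
    }
    cAcc.rgb /= max(cAcc.w, 1e-4);   // un-premultiply
    cAcc.w   /= nTaps;               // average coverage, used for compositing
    return cAcc;
}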
Take 4 taps from the half-res CoC and compare against the full-res CoC, weighted using bicubic filtering for quality [Sigg05]. The far-field CoC is used for blending.
The half-resolution near-field CoC is used for blending the near field, which is allowed to bleed as much as possible, also using bicubic filtering. Linear blending doesn't look good (signal frequency soup).
This can be seen in many games, including the entire Crysis series (puts on hat of shame).
Linear blend
MOTION BLUR
The longer the exposure (slower shutter), the more light is received (and the larger the amount of motion blur), and vice versa. The lower the f-stop, the faster the exposure can be (and the less motion blur), and vice versa.
E.g. velocity dilation, velocity blur, tile max velocity; single- vs. multiple-pass composite; depth/velocity/object ID masking; single-pass DOF+MB.
Downscale the velocity buffer k times (k = tile count), taking the max-length velocity at each step; a sketch follows.
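A minimal sketch of one velocity tile-downscale step (resource names hypothetical), keeping the dominant (longest) velocity per tile.

// Minimal sketch (hypothetical names): one 2x2 step of the velocity tile max.
Texture2D<float2> VelocitySrc : register(t0);

float2 VelocityTileMaxPS(float4 vPos : SV_Position) : SV_Target
{
    int2 vBase = int2(vPos.xy) * 2;
    float2 vMaxVel = 0;
    [unroll]
    for (int i = 0; i < 4; ++i)
    {
        float2 vVel = VelocitySrc.Load(int3(vBase + int2(i & 1, i >> 1), 0));
        if (dot(vVel, vVel) > dot(vMaxVel, vMaxVel))
            vMaxVel = vVel;   // keep the longest velocity
    }
    return vMaxVel;
}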
Simplify and vectorize the inner-loop weight computation (it ends up as a couple of mads).
Fat-buffer sampling is half rate on GCN hardware with bilinear filtering (point filtering is full rate, but doesn't look good due to aliasing).
Inputs: R11G11B10F for the scene; bake ||V|| and 8-bit depth into an R8G8 target.
Make it separable, 2 passes [Sousa08].
// Soft depth comparison: classifies a tap as in front of / behind the center
// sample, with a soft transition zone controlled by fSoftZ.
float2 DepthCmp(float2 z0, float2 z1, float2 fSoftZ)
{
    return saturate((1.0f + z0 * fSoftZ) - z1 * fSoftZ);
}

// Velocity comparison: xy = cone-style falloff against the tap/center blur
// extents, zw = cylinder-style terms (the 0.95 bias keeps a minimum mutual
// contribution when both samples are strongly blurred).
float4 VelCmp(float lensq_xy, float2 vxy)
{
    return saturate((1.0f - lensq_xy.xxxx * rcp(vxy.xyxy)) + float4(0.0f, 0.0f, 0.95f, 0.95f));
}
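A minimal usage sketch of the helpers above (assumed wiring, following the cone/cylinder weighting of [McGuire12]; all parameter names hypothetical).

// Minimal sketch (assumed wiring): per-tap weight from the helpers above.
// zTap/zCenter: depths; fLenSq: squared tap-to-center distance;
// fVelTap/fVelCenter: blur extents; fSoftZ: soft-Z comparison scale.
float TapWeight(float zTap, float zCenter, float fLenSq,
                float fVelTap, float fVelCenter, float2 fSoftZ)
{
    float2 f = DepthCmp(float2(zTap, zCenter), float2(zCenter, zTap), fSoftZ);
    float4 c = VelCmp(fLenSq, float2(fVelTap, fVelCenter));
    return f.x * c.x          // foreground tap blurring over the center
         + f.y * c.y          // background tap revealed by the center's blur
         + c.z * c.w * 2.0f;  // both blurred: mutual contribution
}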
Avoids separate geometry passes. Rigid geometry: tagged if the object distance < distance threshold. Deformable geometry: tagged if the amount of movement > movement threshold. Moving geometry is rendered last. R8G8 format.
Velocity is encoded in gamma 2.0 space. Precision is still insufficient, but it is not very noticeable in practice.
[Equations: object velocity encode/decode in gamma 2.0 space; a sketch follows]
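A minimal sketch of the gamma 2.0 velocity encoding (the range constant is hypothetical): square-rooting spends the R8G8 precision on small velocities, where errors are most visible.

// Minimal sketch (range constant hypothetical): velocity in gamma 2.0 space.
static const float fMaxVelocity = 64.0f;   // assumed max pixel velocity

float2 EncodeVelocity(float2 vVel)         // to [0,1] for R8G8 storage
{
    float2 v = sqrt(abs(vVel) / fMaxVelocity) * sign(vVel);  // gamma 2.0
    return v * 0.5f + 0.5f;
}

float2 DecodeVelocity(float2 vEnc)
{
    float2 v = vEnc * 2.0f - 1.0f;
    return (v * abs(v)) * fMaxVelocity;    // undo gamma 2.0, restore sign
}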
MB before DOF
MB after DOF
FINAL REMARKS
Practical MSAA details
Separable, 1st pass: 0.236 ms, 2nd pass: 0.236 ms. Sum: 0.472 ms for the reconstruction filter *
SPECIAL THANKS
Natalya Tatarchuk, Michael Kopietz, Nicolas Schulz, Christopher Evans, Carsten Wenzel, Christopher Raine, Nick Kasyan, Magnus Larbrant, Pierre Donzallaz
WE ARE HIRING!
QUESTIONS?
REFERENCES
Potmesil M., Chakravarty I., Synthetic Image Generation with a Lens and Aperture Camera Model, 1981
Shirley P., Chiu K., A Low Distortion Map Between Disk and Square, 1997
Green S., Stupid OpenGL Tricks, 2003
Toksvig M., Mipmapping Normal Maps, 2004
Scheuermann T., Tatarchuk N., Improved Depth of Field Rendering, ShaderX3, 2005
Sigg C., Hadwiger M., Fast Third-Order Texture Filtering, 2005
Cyril P. et al., Photographic Depth of Field Rendering, 2005
Gritz L., d'Eon E., The Importance of Being Linear, 2007
Sawada Y., Talk at Game Developers Conference, http://www.beyond3d.com/content/news/499 , 2007
Sousa T., Crysis Next Gen Effects, 2008
Gotanda Y., Star Ocean 4: Flexible Shader Management and Post Processing, 2009
Yang L. et al., Amortized Supersampling, 2009
Kawase M., Anti-Downsized Buffer Artifacts, 2011
Sousa T., CryENGINE 3 Rendering Techniques, 2011
Binks D., Dynamic Resolution Rendering, 2011
Sousa T., Schulz N., Kasyan N., Secrets of CryENGINE 3 Graphics Technology, 2011
Sousa T., Anti-Aliasing Methods in CryENGINE 3, 2011
Jimenez J. et al., Filtering Approaches for Real-Time Anti-Aliasing, 2011
Thibieroz N., Deferred Shading Optimizations, 2011
Cloward B., Otstott A., Cinematic Character Lighting in Star Wars: The Old Republic, 2011
Mittring M., Dudash B., The Technology Behind the DirectX 11 Unreal Engine Samaritan Demo, 2011
Futuremark, 3DMark11 White Paper, 2011
Baker D., Spectacular Specular: LEAN and CLEAN Specular Highlights, 2011
White J., Barré-Brisebois C., More Performance! Five Rendering Ideas from Battlefield 3 and Need for Speed: The Run, 2011
Lottes T., NVIDIA TXAA, http://www.nvidia.in/object/txaa-anti-aliasing-technology-in.html#gameContent=1 , 2012
McGuire M. et al., A Reconstruction Filter for Plausible Motion Blur, 2012
McIntosh L. et al., Efficiently Simulating Bokeh, 2012
Malan H., Real-Time Global Illumination and Reflections in Dust 514, 2012
Hill S., Baker D., Rock-Solid Shading: Image Stability without Sacrificing Detail, 2012
Sousa T., Wenzel C., Raine C., Rendering Technologies of Crysis 3, 2013
Thibieroz N., Gruen H., DirectX 11 Performance Reloaded, 2013
Andreev D., Rendering Tricks in Dead Space 3, 2013
http://en.wikipedia.org/wiki/Lens_%28optics%29
http://en.wikipedia.org/wiki/Pinhole_camera
http://en.wikipedia.org/wiki/F-number
http://en.wikipedia.org/wiki/Angle_of_view