Debugging and profiling
=======================
How to use the Apple OpenGL Profiler, very handy:
https://developer.apple.com/library/mac/#technotes/tn2178/_index.html
Allocators
==========
Zmalloc features native support for jemalloc and tcmalloc, but even easier would be to try the LD_PRELOAD
trick later and see if it helps.
Alternatively or additionally, we could consider an arena allocator (e.g. libarena): http://25thandclement.com/~william/projects/libarena.html
Stack allocators, FIFO allocators. Custom GCC cleanup extension:
https://techtalk.intersec.com/2013/10/memory-part-4-intersecs-custom-allocators/
Tutorials
=========
The 5 best:
http://www.opengl-tutorial.org/
http://ogldev.atspace.co.uk/
http://arcsynthesis.org/gltut/
http://www.antongerdelan.net/opengl - Anton's OpenGL4 tutorials
http://github.prideout.net/modern-opengl-prezo/ - Modern OpenGL Prezzo
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch25.html - GPU Gems 3 (free!)
http://svenandersson.se/2014/realtime-rendering-blogs.html - Collects all the blogs, basically, for when you've caught up...
glsl:
https://github.com/daw42/glslcookbook
Tools
=====
- Blending function and equation tester - http://www.andersriggelsen.dk/glblendfunc.php
Features implemented
====================
- MSAA (done, was actually just GL_MULTISAMPLE)
Features to implement
=====================
- Clang static analyzer Makefile target: make analyze
- Regular lighting - http://tomdalling.com/blog/modern-opengl/07-more-lighting-ambient-specular-attenuation-gamma/
- sRGB framebuffers (https://github.com/g-truc/ogl-samples/blob/master/samples/gl-320-fbo-srgb.cpp)
- sRGB textures
- Premultiplied alpha (http://blogs.msdn.com/b/shawnhar/archive/2009/11/06/premultiplied-alpha.aspx?Redirected=true)
- Premultiplied alpha that interacts correctly with sRGB everything - http://ssp.impulsetrain.com/2011-08-10_Gamma_Correction_vs__Premultiplied_Pixels.html
- File notification (inotify/kqueue/... or a cross-platform lib, maybe based on https://code.google.com/p/simplefilewatcher/ - or even better: https://github.com/alandipert/fswatch)
- Basic lighting (+ shadow mapping) (http://devmaster.net/posts/2974/the-basics-of-3d-lighting)
- Shadow mapping (http://the-witness.net/news/2013/09/shadow-mapping-summary-part-1/ - very GOOD article)
- Font rendering
- Font rendering with distance fields (Valve tech)
- Cartoon rendering (http://pymolwiki.org/index.php/GLSL_Shaders) - no geometry shaders
- Silhouette rendering (http://prideout.net/blog/?p=54 - includes simple pure C source for geometry shaders as well)
- Geometry shaders
- Model loading (assimp)
- Compressed textures (http://www.g-truc.net/project-0024.html) - (MUCH BETTER CODE: https://github.com/ccxvii/mio/blob/master/image.c)
- Memory tracking, per-frame and linear allocators: http://www.swedishcoding.com/2008/08/31/are-we-out-of-memory/
- Particles (https://github.com/daw42/glslcookbook/blob/master/chapter09/shader/particleinstanced.vs)
- CSAA (Coverage Sampling Anti-Aliasing) - http://www.opengl.org/registry/specs/NV/multisample_coverage.txt and illustrated in GL3HelloWorld
- SSAO
- Quaternion nlerp instead of slerp (equally useful and cheaper to calculate: http://number-none.com/product/Understanding%20Slerp,%20Then%20Not%20Using%20It/)
- Volumetric clouds
- Volumetric lighting
- Global illumination
- Bumpmapping (http://www.crazybump.com/, http://quixel.se/ndo/)
- Parallax mapping
- Tone mapping (http://www.rockpapershotgun.com/2013/09/20/nowhere-might-be-the-most-ambitious-pitch/)
- Shadertoy implementation of many tone mapping operators: https://www.shadertoy.com/view/lslGzl
- HDR lighting (RGBA16F, LogLuv, http://mynameismjp.wordpress.com/2008/12/12/logluv-encoding-for-hdr/)
- Tessellation
- GPU skinning
- GPU animation
- Instancing
- Pare down renderstate changes - (http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D)
- Cache oblivious vertex indexing
- Vertex cache optimization (Tom Forsyth) - http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html - https://github.com/dv1/vcache_optimizer - http://www.opengl.org/wiki/Post_Transform_Cache#Forsyth
- Virtual texture terrain (http://devmaster.net/posts/3258/virtual-texture-terrain)
- Terrain LOD
- Water shader (http://www.youtube.com/watch?annotation_id=annotation_790483&feature=iv&src_vid=OiGADgezjC8&v=oUSdSjnDB_E - http://www.gamedev.net/blog/1302/entry-2250847-bow-shock-a-summary-of-work-done-so-far/) - http://http.developer.nvidia.com/GPUGems/gpugems_ch01.html - http://www.jayconrod.com/posts/34/water-simulation-in-glsl
- Vegetation (Jungle - OpenGL 3D engine/viewer project - UTBM - http://www.youtube.com/watch?v=4M-wfzuCgvs)
- Clip planes ahead of sight - http://github.prideout.net/clip-planes/
- Deferred shading (and why deferred shading is better than deferred lighting: http://gameangst.com/?p=141 - example: https://github.com/ccxvii/mio/blob/master/shader.c and http://www.youtube.com/watch?v=ELeGC-x1hO8&feature=youtu.be)
- Gamma correct rendering
- Get a real assert in there: http://cnicholson.net/2009/02/stupid-c-tricks-adventures-in-assert/
- verify your binary searches are generating CMOVs where it counts... (drawlist sorting)
- use some open-source textures to make the demos look better and test bump/normal-mapping: http://opengameart.org/content/new-england-textures-i
Data oriented design
====================
http://www.asawicki.info/news_1422_data-oriented_design_-_links_and_thoughts.html - Links to slides and articles (BitSquid, etc...)
OpenGL wiki interesting articles
================================
http://www.opengl.org/wiki/Vertex_Specification
Links to be reviewed for implementation
=======================================
Techniques from "crazy fractal worlds":
http://www.youtube.com/watch?v=8-5Z_031x00
http://www.iquilezles.org/www/material/function2009/function2009.pdf
Distance functions - iñigo quilez:
http://www.iquilezles.org/www/articles/distfunctions/distfunctions.htm
Awesome articles - iñigo quilez:
http://iquilezles.org/www/
The care and feeding of an OpenGL 3.2 core context on OSX:
http://hci.rwth-aachen.de/opengl_3_2_tutorial
Fully self-contained OpenGL 3.2 + GLSL 1.5 example, with matrix functions in C:
http://www.lighthouse3d.com/cg-topics/code-samples/opengl-3-3-glsl-1-5-sample/
Indexed drawing:
http://www.arcsynthesis.org/gltut/Positioning/Tutorial%2005.html
Kazmath, a C math library for games:
https://github.com/Kazade/kazmath/tree/master/kazmath
Apple, best practices for working with vertex data:
http://developer.apple.com/library/ios/#documentation/3ddrawing/conceptual/opengles_programmingguide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html
[GEOMETRY SHADERS]
http://gamedev.stackexchange.com/questions/16454/glsl-rewriting-shaders-from-330-to-130
http://gamedev.stackexchange.com/questions/33190/glsl-rewriting-geometry-shader-from-330-to-130-version?rq=1
http://github.prideout.net/modern-opengl-prezo/
Create a galaxy with geometry shaders - https://raw.github.com/progschj/OpenGL-Examples/master/07geometry_shader_blending.cpp
[INSTANCING] Drawing instanced geometry with the help of glVertexAttribDivisor:
http://programming4.us/multimedia/8306.aspx
[INSTANCING] Alternative way to instance, pass a buffer and index it with gl_InstanceID:
http://gamedev.stackexchange.com/questions/60203/most-efficient-way-to-draw-large-number-of-the-same-objects-but-with-different
[INSTANCING] Wolfire Lugaru: pseudo-instancing and discussion:
http://blog.wolfire.com/2009/11/Fast-object-instancing
[INSTANCING] iki.fi, pass a (transformation) matrix (mat4) with 4 glVertexAttribPointer()-calls and glVertexAttribDivisor, very cool:
http://sol.gfxile.net/instancing.html
[FILESYSTEM] efsw, file system watching (C++ but used by C projects as well):
https://bitbucket.org/SpartanJ/efsw
[ENGINES] DarkHammer, C, Lua:
https://bitbucket.org/sepul/dark-hammer
[ENGINES] Urho3D, C++, Lua, Task-based multithreading:
https://code.google.com/p/urho3d/
[SHADER EFFECTS] Glow and Bloom:
http://devmaster.net/posts/3100/shader-effects-glow-and-bloom
[SHADER EFFECTS]: Depth of field (DOF):
http://devmaster.net/posts/3021/shader-effects-depth-of-field
[SHADER EFFECTS]: The egg:
http://pixelshaders.com/editor/#5d000001002e0700000000000000381c88cdaf8125d4569ed1e6e6c09c2fe72b7d489ad9d27ce026c85da1432f9b0a56cbf82b16d7f997abaffdd294666ceebb4d1566bfeab205986af7591b004d02668635f6bfb645f3d18f1536b6d917ec60480c76e3602b9cc9ca742cd07c3c448b52255ce638c24ebbeadda97525371d914a2d66906a6414d62b4767a6a8a7752dcc2a488d629e87a4c9cdd7f6f3eefb93c5779c8f1d893756d5e1049f04fa0fddabb44cffbae6ee0452d206faa0878e1a3b62f404f85321be1d8413f2b99b58afec5c70d891897aa69526a12659130d8ea95742e6bf4c88b83e7aae47c94134a1c62cd63d672904adc0bdfcbe58c223348742f69a106a723127d77c7cc9c0dc4b9e32c703840301aeb7ef177e071501faa8a290824345db2b1f07e8a58fe9700503683d64aa79c11353f322628da8abb88905b71eab3a890411fc60c8024e01395fbc2de7561c60666a089df33f5143ba88ba740396b2b917962d8b20270250c757b391103c86aa946689d54561ee8f61934ca2382c712a8f3d243926468108d8a2f1c1c6a2ea9db98efa0bec579c0a47717e3735c74c4320b6a6803aa1c0a44fcbc2dd02a68c6ec1be68fcfac70a12c4660d8b7a0bd158060960b5ff0a16dfe04703d80b0465758896e09613eef025677d11d3bb5abeec62cfabea1e65138c51a13dae171ea7498d853f17e9f5c051b6666ce7966ff0bf63ddd94a904c9df419dd8ef43839923a9e364cc5d002ca2f7553eec21aadafa890bbe66a69fd05833106af40baa5433407f5b0dce9c1f8d56ae264f8136faf10c1a54f4dd49d6d850fb79f27fb35f71d9d2f1263decb08ddbad3f3ade9a36eab876bae76f844e3a1258d8b7276041561515378427c58ce8be0df826bbdb84dddbdb69420d2c0e3b577af9982f46b49de2a426af9de743cb1a55d2e8a52dee6b60eb84ff0594f684a3115b9bf0c4bbcd6f1e79c1b0360eac704a85435c39dceec3fc2a166db96d0d6dd0c86413ae97b2e3a46bfc024ba1c6b2c2d2c2a90422a245a38d33581cfc6e4a181b18fe25a679f6dc64e5171925908515a45d0b5f040081e0e047533bffff9a5184f
[TESSELLATION] Tessellation does work on OSX:
http://jonmacey.blogspot.de/2013/05/glsl-tessellation-shaders-under-mac-osx.html
[GRAPHICS] Screen space ambient occlusion (SSAO), a step for deferred rendering:
http://devmaster.net/posts/3095/shader-effects-screen-space-ambient-occlusion
[GRAPHICS] Article full of links for deferred rendering:
http://www.gamerendering.com/2008/11/01/deferred-lightning/
Killzone 2 presentation on deferred rendering - http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf
Deferred rendering demystified, gamedev.net - http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/deferred-rendering-demystified-r2746
Intel research, deferred rendering - http://visual-computing.intel-research.net/art/publications/deferred_rendering/
[GRAPHICS] Best tutorial on bump mapping I've seen yet:
http://ogldev.atspace.co.uk/www/tutorial26/tutorial26.html
http://stackoverflow.com/questions/15324357/normal-mapping-tbn-matrix-calculation?rq=1
[LUA] How to export symbols from an executable:
http://stackoverflow.com/questions/3954137/mac-how-to-export-symbols-from-an-executable
[LUA] How to include an extra module at compile time (such as YAML):
http://stackoverflow.com/questions/11485911/lua-5-2-c-api-and-require
[LUA] use FFI module:
http://stackoverflow.com/questions/5919743/how-to-use-luajits-ffi-module-when-embedding?rq=1
[LUA] Simple C to Lua example, setup environment with variables and call lua script, read result:
http://lua-users.org/wiki/SimpleLuaApiExample
[LUA] The lua faq:
http://www.luafaq.org/
[LUA] Support both 5.1 and 5.2 with a tiny bit of IFDEF:
http://lua.2524044.n2.nabble.com/Standard-Lua-modules-in-5-2-1-td7581168.html
[PHYSICS] How to create a custom physics engine:
http://gamedev.tutsplus.com/tutorials/implementation/how-to-create-a-custom-2d-physics-engine-the-core-engine/
[TERRAIN] The seminal simplex noise paper (look for JS implementations as well):
http://webstaff.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf
[TERRAIN] Terrain generation tutorial:
http://www.shamusyoung.com/twentysidedtale/?p=147
[PATHFINDING] Avoiding collisions with pathfinding:
http://cleverpuntitle.blogspot.be/2013/08/avoiding-pathfinding-caused-collisions.html
[COMPILERS] Things DEFINE'd by specific compilers:
http://sourceforge.net/p/predef/wiki/Compilers/
[BENCHMARK] Create code to benchmark pieces, some inspiration:
http://preshing.com/20111203/a-c-profiling-module-for-multithreaded-apis
[GRAPHICS - RAYTRACING]
Writing a small ray-tracer from scratch (series) - http://www.scratchapixel.com/lessons/3d-basic-lessons/lesson-1-writing-a-simple-raytracer/
Creating a small path-tracer - http://geon.github.io/programming/2013/08/22/gloss-my-bidirectional-path-tracer/
Brigade, advanced (realtime!) path tracer - http://raytracey.blogspot.co.nz/
Sanfilippo Salvatore - tiny JS raytracer - https://github.com/antirez/jsrt/blob/master/rt.html
[LUA] Lulip, line-level profiler for Lua and LuaJIT (actually just LuaJIT for now, needs FFI):
http://blog.jgc.org/2013/04/lulip-line-level-profiler-for-code.html
[PROGRAMMING] The nature of code (processing), L-systems, fractals, physics, attractors, ...:
http://natureofcode.com/book/chapter-6-autonomous-agents/
http://acko.net/files/fullfrontal/fullfrontal/wdcode/online.html
http://acko.net/files/fullfrontal/fullfrontal/webglmath/online.html
https://www.udacity.com/course/viewer#!/c-cs291/l-68866048/m-96403513 (udacity url)
[IDEA] Make mathbox shapes as objects in prototype (interpolate all the things):
http://acko.net/blog/making-mathbox/
[NETWORKING] Halo's networking architecture and Gaffer on Games
http://www.gdcvault.com/play/1014345/I-Shot-You-First-Networking
http://gafferongames.com/networking-for-game-programmers/
http://www.reddit.com/r/gamedev/comments/1l3q5a/netcode/
[PHYSICS-OPENCL] Instanced drawing with OpenGL, physics with OpenCL for absolutely awesome speed:
http://www.altdevblogaday.com/2011/06/11/a-short-opencl-tutorial-updating-world-transforms-of-instanced-meshes-on-the-gpu/
[OPENGL] Tips and tricks for modern OpenGL:
http://openglinsights.com/tips.html
[OPENGL] Why have two matrices in GLSL instead of one pre-multiplied one?
http://stackoverflow.com/questions/10617589/why-would-it-be-beneficial-to-have-a-separate-projection-matrix-yet-combine-mod
[OPENGL] glPrimitiveRestartIndex():
The restart index is the index that, when fetched from the index buffer (hint hint) doesn't actually specify a vertex. It instead means that the primitive should be restarted. So if you set the restart index to 65535, and draw {0, 1, 2, 3, 4, 65535, 5, 6, 7, 8} as a GL_TRIANGLE_STRIP, you will get 5 triangles. The first three from the first 5 indices, and the next two from the last 4. 65535 is what says to restart.
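To check the arithmetic in the restart example, here is a small CPU-side sketch in C that counts the triangles a strip would produce. The function name is made up for illustration. In real GL code the restart index would be set with glEnable(GL_PRIMITIVE_RESTART) and glPrimitiveRestartIndex(65535); this only mimics the counting.

```c
#include <stddef.h>

/* Counts the triangles a GL_TRIANGLE_STRIP draw would produce from
 * `indices` when primitive restart is enabled with `restart_index`. */
size_t count_strip_triangles(const unsigned short *indices, size_t count,
                             unsigned short restart_index) {
    size_t triangles = 0;
    size_t run = 0; /* vertices seen since the last restart */
    for (size_t i = 0; i < count; i++) {
        if (indices[i] == restart_index) {
            run = 0; /* start a fresh strip, this index is not a vertex */
            continue;
        }
        run++;
        if (run >= 3)
            triangles++; /* every vertex after the second closes a triangle */
    }
    return triangles;
}
```

Feeding it {0, 1, 2, 3, 4, 65535, 5, 6, 7, 8} with restart index 65535 gives 3 + 2 = 5, matching the text above.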
[OPENGL] Buffer Object Streaming:
http://www.opengl.org/wiki/Buffer_Object_Streaming
[ENGINE DESIGN] Doom III BFG design document:
http://fabiensanglard.net/doom3_documentation/index.php
[OPENGL] All drawing methods: glDrawArraysInstanced(), glDrawElementsBaseVertex(), ...:
http://www.opengl.org/wiki/Vertex_Rendering
[PERFORMANCE] Demystifying the restrict keyword:
http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html
[SIMD/SSE/AVX]
List of functions with small description, really handy - http://softpixel.com/~cwright/programming/simd/sse4.php
Check CPU support at runtime - http://stackoverflow.com/questions/6121792/how-to-check-if-a-cpu-supports-the-sse3-instruction-set/7495023#7495023
http://fastcpp.blogspot.be/2011/04/vector-cross-product-using-sse-code.html
[OPENGL] How does the OpenGL coordinate system work:
http://stackoverflow.com/questions/7769371/how-does-opengls-coordinate-system-work
[PERFORMANCE] Best practices for working with vertex data (interleave, ...):
https://developer.apple.com/library/ios/documentation/3ddrawing/conceptual/opengles_programmingguide/TechniquesforWorkingwithVertexData/TechniquesforWorkingwithVertexData.html
[PERFORMANCE] Optimize vertex data automatically (pre-bake):
http://www.lighthouse3d.com/tutorials/glsl-core-tutorial/vertex-shader/
[PERFORMANCE] use Intel's ispc for highly optimized SIMD code:
https://github.com/ispc/ispc
[PERFORMANCE] mat to quat + angular velocity based on quaternions (real awesome!):
http://www.gamasutra.com/view/feature/131686/rotating_objects_using_quaternions.php?print=1
[LIGHTING, MATH, VIDEO] John Carmack explains lighting in games:
http://www.youtube.com/watch?v=jA2ZWYZSBYA
[FONT] must-see projects and sites for font rendering
https://github.com/erwincoumans/experiments/tree/master/rendering/OpenGLTrueTypeFont
http://jonmacey.blogspot.be/2011/10/text-rendering-using-opengl-32.html
http://digestingduck.blogspot.be/2009/08/font-stash.html and https://github.com/apetrone/font-stash
http://stackoverflow.com/questions/2071621/opengl-live-text-rendering
Using stb_truetype with SDL - http://eatenbyagrue.org/using_stb_truetype_with_sdl.html
freetype-gl, 2013, modern high-quality font rendering with single vertex buffer! (includes distance map code) - https://code.google.com/p/freetype-gl/ (examples for console input and distance fields)
Gist, minimal font rasterizer and blitter example, modern OpenGL + freetype - https://github.com/ccxvii/snippets/blob/master/glfont.c
==>
glGenTextures(1, &g_cache_tex);
glBindTexture(GL_TEXTURE_2D, g_cache_tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, g_cache_w, g_cache_h, 0, GL_ALPHA, GL_UNSIGNED_BYTE, NULL);
[GEOMETRY] implicit surfaces:
http://paulbourke.net/geometry/implicitsurf/
http://www.iquilezles.org/
[GPU GEMS] Efficient distance field generation:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch34.html
[PERFORMANCE] Intel intrinsics guide, great reference app:
http://software.intel.com/en-us/articles/intel-intrinsics-guide
[MEMORY PERFORMANCE] naive vs. store vs. stream (for data that is known to be larger than L2 cache, mm_stream is the clear winner, otherwise mm_store wins):
http://www.gamedev.net/topic/532112-fast-memset/
[PERFORMANCE, GEOMETRY] Inigo Quilez - avoiding trigonometry:
http://www.iquilezles.org/www/articles/noacos/noacos.htm
[PERFORMANCE, THREEDEE] Picking orthogonal vectors:
http://lolengine.net/blog/2013/09/21/picking-orthogonal-vector-combing-coconuts
/* Always works if the input is non-zero.
* Doesn’t require the input to be normalised.
* Doesn’t normalise the output. */
vec3 orthogonal(vec3 v) {
return abs(v.x) > abs(v.z) ? vec3(-v.y, v.x, 0.0)
: vec3(0.0, -v.z, v.y);
}
/* or... branchless but probably performs worse */
/* Requires the input to be normalised.
* Doesn’t normalise the output. */
vec3 orthogonal(vec3 v) {
float k = fract(abs(v.x) + 0.5);
return vec3(-v.y, v.x - k * v.z, k * v.y);
}
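For use on the CPU side, the first (branching) variant ports directly to C. A sketch, assuming a hypothetical `vec3` struct; the names are not from kazmath or any other library in these notes.

```c
#include <math.h>

/* Hypothetical 3-component vector, for illustration only. */
typedef struct { float x, y, z; } vec3;

float vec3_dot(vec3 a, vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns some vector orthogonal to v.
 * Works for any non-zero input; neither input nor output is normalized. */
vec3 vec3_orthogonal(vec3 v) {
    if (fabsf(v.x) > fabsf(v.z)) {
        vec3 r = { -v.y, v.x, 0.0f };  /* dot = -x*y + y*x = 0 */
        return r;
    } else {
        vec3 r = { 0.0f, -v.z, v.y };  /* dot = -y*z + z*y = 0 */
        return r;
    }
}
```

The comparison picks the branch whose result cannot collapse to the zero vector for non-zero input.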
[OPENGL] gamma correct rendering:
http://www.g-truc.net/post-0263.html
http://hacksoflife.blogspot.de/2010/11/gamma-and-lighting-part-3-errata.html
http://stackoverflow.com/questions/9049687/using-srgb-colour-in-qglframebufferobject-with-multisampling
http://filmicgames.com/archives/299
http://filmicgames.com/archives/327
http://filmicgames.com/archives/14
http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BPremultiplied%20alpha%5D%5D
Use a FP16 framebuffer for even more accuracy and better HDR: http://www.horde3d.org/forums/viewtopic.php?f=8&t=1097
sRGB texture:
[INPUT] = sRGB data (most pictures, for example, usually have sRGB data, because they were made to look good on the artist's monitor)
[OUTPUT] = RGB data (when sampled, sRGB textures output RGB data)
sRGB framebuffer:
[INPUT] = RGB data (you need to input linear colors into the framebuffer)
[OUTPUT] = sRGB data
An alternative to using an sRGB framebuffer is to correct the gamma yourself in the output shader with
pow(color, 1.0 / 2.2), though this is less efficient (even if you can cheat and use 2.0 for gamma, making the operation cheaper).
Usually, we don't notice something is off in the beginning (simple pipeline with no gamma correction). That's because the
textures are pre-gamma corrected. They were made to look good with gamma 2.2 (sRGB). And when you blit that, it looks good.
But it messes with light calculations, because the sRGB space is nonlinear. So just use both sRGB textures and an sRGB framebuffer
and you should be fine. An illustration:
// RGB texture pass-through
// No correction needed, the texture is in inverse gamma by definition!
// That's the reason most simple sprite engines look fine.
out = texture(tex, coord);
// sRGB texture pass-through
// The GPU has converted the sample from inverse gamma to linear for us, we need to correct.
out = pow(texture(tex, coord), 1.0 / 2.2);
// RGB + lighting
// We're adding something, need to go linear, then gamma correct.
out = pow(pow(texture(tex, coord), 2.2) * diffuse + specular, 1.0 / 2.2);
// sRGB + lighting
// The sample is already linear (and even has correct linear/mipmap filtering).
out = pow(texture(tex, coord) * diffuse + specular, 1.0 / 2.2);
The final observation that should also be interesting to you is that I said "no matter what". This means that even if you use floating point textures exclusively and you perform linear HDR rendering all the way, you STILL have to do gamma-correction during tone-mapping. Your tone-mapping operator needs to be gamma-aware or you simply go from linear to gamma-space as the last step in the tone-mapping shader.
The reason why we don't pre-process images to be linearized is that we would lose precision. Going from sRGB to RGB
loses precision. It's 8-bit to 8-bit and the human eye has less resolution for color changes than for lightness
changes. But we can linearize in the shader, no? or an sRGB texture does it for us, no? True, but both the shader
and the sRGB texture output FLOATS, which have far more precision than the 8-bit RGB you were going to preprocess
to. So the conversion is lossless when you go from 8-bit sRGB to 32-bit floats, but not from 8-bit sRGB to 8-bit RGB.
Should you want to switch to true HDR lighting later down the road, you might need to use a high-precision floating point framebuffer,
which I haven't read up on enough yet. I assume that means we don't get sRGB for free anymore and we'll have to correct in the
output shader...
[SHADERTOY]
Wolfenstein 3D: https://www.shadertoy.com/view/4sfGWX
Venice: https://www.shadertoy.com/view/MdXGW2
[OPTIMIZATION] Tom Forsyth - Vertex Cache Optimization
Well, it never rains but it pours. OK, not a true reinvention this time, but a sort of co-invention. Sony just announced their EDGE libraries recently and in them was a snazzy new vertex-cache optimiser, and some people at GDC asked me whether it was my algorithm. I had no idea - haven't written anything for the PS3 for ages, but I knew Dave Etherton of Rockstar ported my algorithm (Vertex Cache Optimisation) to the PS3 with some really good results (he says 20% faster, but I think that's compared to whatever random ordering you get out of Max/Maya).
Anyway, the EDGE one is actually a version of the K-Cache algorithm (http://www.ecse.rpi.edu/~lin/K-Cache-Reorder/) tweaked and refined by Jason Hughes (http://www.dirtybitsoftware.com/) for the particular quirky features of the post-transform cache of the RSX (the PS3's graphics chip). Turns out we do very similar broad-scale algorithms, but with slightly different focus - they tune for a particular cache, I try for generality across a range of hardware. It would be academically interesting to compare and contrast the two. However, they both probably achieve close enough to the ideal ACMR to ensure that vertex transformation is not going to be the bottleneck, which is all you really need.
Remember - after ordering your triangles with vertex-cache optimisation, you then want to reorder your vertex buffer so that you're using vertices in a roughly linear order. This is extremely simple - it's exactly the algorithm you'd expect (scan through the tris, first time you use a vert, add it to the VB) but it's as much of a win in some cases as doing the triangle reordering.
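The vertex buffer reordering described in that last paragraph, sketched in C. The function name and signature are made up for illustration; it assumes 32-bit indices that are all smaller than vertex_count, and an output buffer at least as large as the input.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copies vertices into out_vertices in order of first use by the index
 * buffer and rewrites the indices in place to match.
 * Returns the number of vertices written (unreferenced vertices are
 * dropped), or (size_t)-1 if the remap table could not be allocated. */
size_t reorder_vertices_by_first_use(unsigned int *indices, size_t index_count,
                                     const void *vertices, size_t vertex_count,
                                     size_t vertex_size, void *out_vertices) {
    const unsigned int UNUSED = 0xFFFFFFFFu;
    if (vertex_count == 0)
        return 0;
    unsigned int *remap = malloc(vertex_count * sizeof *remap);
    if (!remap)
        return (size_t)-1;
    for (size_t i = 0; i < vertex_count; i++)
        remap[i] = UNUSED;

    size_t next = 0; /* next free slot in the new vertex buffer */
    for (size_t i = 0; i < index_count; i++) {
        unsigned int old = indices[i];
        if (remap[old] == UNUSED) {
            /* First time this vertex is used: append it to the new VB. */
            memcpy((char *)out_vertices + next * vertex_size,
                   (const char *)vertices + (size_t)old * vertex_size,
                   vertex_size);
            remap[old] = (unsigned int)next++;
        }
        indices[i] = remap[old];
    }
    free(remap);
    return next;
}
```

Run it after the triangle-order optimisation, not before, since the vertex order depends on the final index order.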
[OPTIMIZATION]
Fast multiplication of normalized 16 bit numbers with SSE2 - http://ssp.impulsetrain.com/2011-07-03_Fast_Multiplication_of_Normalized_16_bit_Numbers_with_SSE2.html
Fast quaternion-vector - http://molecularmusings.wordpress.com/2013/05/24/a-faster-quaternion-vector-multiplication/
Write combining is not your friend - http://fgiesen.wordpress.com/2013/01/29/write-combining-is-not-your-friend/
Optimizing a skinning pipeline - http://molecularmusings.wordpress.com/2013/08/22/adventures-in-data-oriented-design-part-4-skinning-it-to-11/#more-458
Optimizing software occlusion culling (series), features much more - http://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/
Storing normals efficiently - http://aras-p.info/texts/CompactNormalStorage.html
[ENGINE DESIGN] How to reference objects owned by other systems in the engine (pretty clever):
http://molecularmusings.wordpress.com/2013/07/24/adventures-in-data-oriented-design-part-3c-external-references/
[SCRIPTING] Racket on the playstation 3 (Naughty Dog):
http://www.youtube.com/watch?v=Gomcqdi6kKE&t=70m
[OPENGL OPTIMIZATION]
Efficient draw calls - glMultiDrawElements vs primitive restart - http://programming4.us/multimedia/8302.aspx
Conclusion: primitive restart wins hard. Also, plain GL_TRIANGLES is
mostly preferred, except for regular grids: http://www.opengl.org/discussion_boards/showthread.php/174329-glMultiDrawElements-or-glPrimitiveRestartindex
also see http://www.ludicon.com/castano/blog/2009/02/optimal-grid-rendering/
One million draws per frame, notes from an AMD driver developer:
http://www.openglsuperbible.com/2013/10/16/the-road-to-one-million-draws/#return-note-403-1
[CPU OPTIMIZATION]
Create different function versions for different ISAs with gcc 4.8: http://gcc.gnu.org/wiki/FunctionMultiVersioning
[OPENGL CONCEPT DEVELOPMENT]
The GLSL hacker, prototype in Lua - http://www.geeks3d.com/glslhacker/
[ENGINE DESIGN]
Parallel rendering and sorting your drawlists (see comments) - http://bitsquid.blogspot.be/2009/10/parallel-rendering.html