Commit ce659ee

a few extra comments for MSVC 2008 (or prior) projects.
updated fail guard explanation and performance tests included an additional usage hint (regarding yielding the rendering thread after each frame draw). Signed-off-by: Marcos Paulo Berteli Slomp <[email protected]>
1 parent 29a7c33 commit ce659ee

platform/windows/README.TXT

Lines changed: 86 additions & 76 deletions
@@ -5,7 +5,7 @@ SUMMARY
55

66
1) libfreenect: incompatibilities with Visual C++
77
This section is here merely for historical reasons. All of the issues documented
8-
there were already fixed in the libfreenect repository and CMake should be able to
8+
here were already fixed in the libfreenect repository and CMake should be able to
99
produce a project that is ready to build libfreenect in Windows with Visual Studio.
1010
Consider browsing this section if experiencing compilation issues under different
1111
platforms, compilers and/or IDEs.
@@ -15,7 +15,7 @@ SUMMARY
1515
a proper port of libusb-1.0 for Windows is not yet available. Such emulation layer
1616
allows Windows development to keep in sync with the official development branch of
1717
libfreenect, without the need of dedicated drivers/implementations. This section
18-
discusses how and why the current libfreenect Windows port moved in this direction.
18+
discusses why and how the current libfreenect Windows port moved in this direction.
1919

2020
3) libusbemu: Tips, Hints and Best Practices
2121
The current status of libusbemu is quite reliable under normal usage circumstances,
@@ -40,8 +40,8 @@ Language issues: The Microsoft C compiler does not implement all the C99 standar
4040
----------------------------------------------------------------------------------
4141

4242
An attempt to compile the current libfreenect with Visual C++ will trigger a lot
43-
of errors. A simple workaround is to tell Visual Studio to force compilation all .c
44-
files within the project using the C++ compiler:
43+
of errors. A simple workaround is to tell Visual Studio to force compilation all the
44+
".c" files within the project using the C++ compiler:
4545
Project >> Properties >> C/C++ >> Advanced >> Compile As: Compile as C++ Code (/TP)
4646

4747
This will get rid of most errors, except those regarding implicit pointer casts.
@@ -168,36 +168,22 @@ libusbemu exists.
168168
3) libusbemu: Tips, Hints and Best Practices
169169
*****************************************
170170

171-
----------------------------------------------------------------------------
172-
TIP: Configure your project to build your application with a console window!
173-
----------------------------------------------------------------------------
171+
---------------------------------------------------------------
172+
TIP: Trigger the Fail Guard if the system becomes unresponsive.
173+
---------------------------------------------------------------
174174

175-
Since libusbemu is quite experimental, There is a fail guard within it. If for some
176-
reason your system renders unresponsive, try to focus the console window of your
177-
application and press [ESC].
175+
Since libusbemu is quite experimental, there is a fail guard within it. If for some
176+
reason your system renders unresponsive, try focusing any window of the application
177+
and hold [CTRL] + [ALT] for a while.
178178

179179
This will trigger a special synchronization event within the libusbemu which will
180180
interrupt the execution of any internal thread of libusbemu and prompt a message box
181181
to the user asking for action. You can either resume execution (if you unintentionally
182-
pressed [ESC] in the console window) or abort the libusbemu. The recent fixes in the
183-
libusbemu don not seem to be causing any unresponsiveness, but is hard to assert...
182+
pressed [CTRL] + [ALT] in the console window) or abort libusbemu. The fail guard will
183+
never trigger if there is no incoming video or depth streams.
184184

185-
Note that this key check will only have effect if the console window is focused. The
186-
application is still free to capture the [ESC] key normally within the application
187-
window (unless the application window happens to be the console, but then it is very
188-
likely that no stream is being captured and no call to freenect_process_events() is
189-
being made, thus preventing the fail guard to be ever trigged).
190-
191-
--------------------------------------------------
192-
TIP: Do not change the default state of libusbemu.
193-
--------------------------------------------------
194-
195-
The default libusbemu setup embraces the best performance currently found so far:
196-
multi-threaded stream reaping.
197-
198-
If one feels curious/adventurous, the USB isochronous reap strategy can be modified
199-
within the emulator code but don't expect good results; the default ReapThreaded()
200-
strategy is the most reliable and stable, while the others are very experimental.
185+
The Fail Guard is important since in such situations one would most likely be forced
186+
to shutdown the computer (in a not so graceful way, by holding the power button!).
201187

202188
----------------------------------------
203189
TIP: Only use a single freenect context.
@@ -206,6 +192,23 @@ TIP: Only use a single freenect context.
206192
Multiple freenect contexts should be no problem in the future. Having more than one
207193
device attached to the same context should work fine, but no tests were made so far.
208194

195+
-----------------------------------------------------
196+
TIP: Yield the rendering thread after each iteration.
197+
-----------------------------------------------------
198+
199+
In case your application performs direct rendering (OpenGL, Direct3D), it is highly
200+
recommended to yield the rendering thread after finishing rendering each frame. This
201+
can alleviate a lot of CPU usage. The ideal place for this yield is right after the
202+
swap-buffers call (SwapBuffers(), glutSwapBuffers(), Present(), etc). The best way to
203+
yield is by calling Sleep(1). Note that Sleep(0) is also possible, but the former is
204+
more "democratic".
205+
206+
Note that some graphics drivers or platforms may already yield after swap-buffers.
207+
This seems to be the case with OpenGL NVIDIA drivers for Linux. In Windows, however,
208+
the same driver (version) does not yield. Maybe in Linux the "yielder" is not the
209+
driver itself, but the underlying implementation of glXSwapBuffers()... Anyway, when
210+
in doubt, it will not hurt to explicitly yield again.
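
A minimal sketch of the yield placement described in this tip. The names
draw_frame(), swap_buffers() and render_loop() are hypothetical placeholders for
the application's own code, and the nanosleep() branch is only an assumed stand-in
for Sleep(1) on non-Windows systems:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes nanosleep() on strict POSIX builds */
#endif
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
/* Win32: Sleep(1) gives up the CPU for roughly one millisecond. */
static void yield_render_thread(void) { Sleep(1); }
#else
#include <time.h>
/* Assumption: a 1 ms nanosleep() plays the role of Sleep(1) outside Windows. */
static void yield_render_thread(void)
{
    struct timespec ts = { 0, 1000000L };
    nanosleep(&ts, NULL);
}
#endif

/* Hypothetical placeholders for the application's drawing and swap calls. */
static void draw_frame(void)   { /* OpenGL / Direct3D draw calls go here */ }
static void swap_buffers(void) { /* SwapBuffers(), glutSwapBuffers(), Present(), ... */ }

/* Renders `frames` frames and yields once after each swap.
   Returns how many times the thread yielded. */
static int render_loop(int frames)
{
    int yields = 0;
    int i;
    assert(frames >= 0);
    for (i = 0; i < frames; ++i) {
        draw_frame();
        swap_buffers();
        yield_render_thread();  /* yield right after the swap-buffers call */
        ++yields;
    }
    return yields;
}
```

In a real application the loop would run until the window closes; the fixed frame
count here only keeps the sketch easy to exercise.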
211+
209212
--------------------------------------------------------------------------------------
210213
TIP: Perform stream operations only in the thread that calls freenect_process_events()
211214
--------------------------------------------------------------------------------------
@@ -242,77 +245,84 @@ and if such behavior is really to be expected is a hard task to determine.
242245
4) Overall performance of libfreenect in Windows
243246
*********************************************
244247

248+
---------------------------------------------
249+
THIS SECTION IS OUTDATED, GOTTA REDO IT SOON!
250+
---------------------------------------------
251+
245252
Hardware:
246253
* Notebook
247254
* CPU: Intel Core2 Duo 32bit [T7250] @ 2.0GHz
248255
* RAM: 4GB RAM
249256
* GPU: GeForce 8600M GT 256MB VRAM
250257

258+
259+
251260
Task: display of simultaneous RGB (Bayer-to-RGB) and depth streams (16bit unpadded) on
252261
the screen through OpenGL textures. Application source code is identical for all tests
253262
(except for the Zephod's version that required some interface adaptation).
254263

255-
The performance results below refer to the average frame time (one loop iteration).
256264

257-
Linux: Ubuntu Notebook 10.10
258-
* compiler: gcc 4.4.5
259-
* time measured via POSIX gettime()
260-
* Debug: 1.22ms (libfreenect also built in debug mode)
261-
* Release: 1.15ms
262265

263-
Win32: Windows 7 Enterprise 32bit
266+
Results: performance measurements refer to the average frame time (one loop iteration).
267+
268+
Linux: Ubuntu Notebook 10.10 -- gcc 4.4.5
269+
* Debug: 1.22ms | CPU @ 77% | video @ 30Hz | depth @ 30Hz
270+
* Release: 1.15ms | CPU @ 72% | video @ 30Hz | depth @ 30Hz
271+
272+
Win32: Windows 7 Enterprise 32bit -- VC++ (Professional) 2010
264273
1) libfreenect with libusbemu:
265-
* compiler: VC++ 2010
266-
* time measured via QueryPerformanceFrequency() / QueryPerformanceCounter()
267-
* Debug: 4.01ms (libfreenect also built in debug mode)
268-
* Release: 2.75ms (run without debug)
269-
2) Zephod's dedicated driver V16:
270-
* compiler: VC++ 2010
271-
* time measured via QueryPerformanceFrequency() / QueryPerformanceCounter()
272-
* Debug: 2.91ms (Zephod's driver also built in debug mode)
273-
* Release: 2.57ms (run without debug)
274-
275-
A Win32 MinGW-based (gcc/g++ 3.4.5) build of libfreenect with libusbemu using POSIX
274+
* Debug: 2.44ms | CPU @ 75% | video @ 30Hz | depth @ 30Hz
275+
* Release: 1.93ms | CPU @ 63% | video @ 30Hz | depth @ 30Hz
276+
2) Zephod's dedicated driver (V16):
277+
* Debug: 2.87ms | CPU @ 82% | video @ 30Hz | depth @ 30Hz
278+
* Release: 2.55ms | CPU @ 77% | video @ 30Hz | depth @ 30Hz
279+
280+
281+
282+
Remarks:
283+
284+
Debug builds account for the library itself being built in Debug mode (libfreenect or
285+
Zaphod's driver, whichever applies). Release builds also imply that the program was
286+
started without any debug information embedded.
287+
288+
In Windows, time was measured through the Win32 exclusive QueryPerformanceFrequency()
289+
and QueryPerformanceCounter() routines. In Linux, the measurement was performed via
290+
POSIX gettime() function.
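
The measurement described above could look roughly like the sketch below. It is
not the benchmark code used for these results: now_ms(), dummy_frame() and
average_frame_time_ms() are hypothetical names, and the POSIX branch assumes that
"gettime()" refers to clock_gettime() with CLOCK_MONOTONIC:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime() on strict POSIX builds */
#endif
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
/* Win32: counter ticks divided by the counter frequency give seconds. */
static double now_ms(void)
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return 1000.0 * (double)count.QuadPart / (double)freq.QuadPart;
}
#else
#include <time.h>
/* POSIX (assumed): monotonic clock converted to milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return 1000.0 * (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e6;
}
#endif

/* Hypothetical stand-in for one loop iteration (process events, draw, swap). */
static void dummy_frame(void)
{
    volatile int sink = 0;
    int i;
    for (i = 0; i < 1000; ++i)
        sink += i;
}

/* Average time of one iteration, in milliseconds, over `frames` iterations. */
static double average_frame_time_ms(void (*frame)(void), int frames)
{
    double start;
    int i;
    assert(frame != NULL && frames > 0);
    start = now_ms();
    for (i = 0; i < frames; ++i)
        frame();
    return (now_ms() - start) / (double)frames;
}
```

Averaging over many iterations, rather than timing a single frame, smooths out
scheduler noise on either platform.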
291+
292+
A Win32 MinGW-based (gcc/g++ 4.5.0) build of libfreenect with libusbemu using POSIX
276293
gettime() also yielded to nearly identical performance results than VC++ 2010 with
277-
Performance Counters. All of the Win32 performance results were also double-checked
278-
with Fraps.
294+
Performance Counters.
295+
296+
All of the Win32 performance results were also double-checked with Fraps.
297+
298+
In Windows, a single thread yield call was placed after rendering each screen frame
299+
(as recommended in Item 3-3). In Linux, the graphics driver - possibly glXSwapBuffers
300+
itself - seems to be already yielding the rendering thread, and forcing it in the code
301+
did not incur into any extra impact on performance.
302+
279303

280-
In all platforms and build configurations, video and depth were streamed at 30Hz,
281-
which seems to be the maximum throughput available from Kinect.
282304

283305
Discussion:
284306

285307
Even though there are no streaming frequency discrepancies between platforms, one may
286308
infer, just from the frame times, that Windows clearly has an overhead disadvantage.
287309
However, this does not hold true: there is still plenty of time for the application
288-
logic to run. For a steady 60FPS (16.66ms per frame) real-time performance, a Release
289-
build in Windows would still have about 14ms per frame available for the application,
290-
while in Linux the available time would be around 15.5ms, a 1.5ms overhead difference
310+
logic to run.
311+
312+
For a steady 60FPS (~16.66ms per frame) real-time performance, a Release build using
313+
libusbemu in Windows would still have about 14.70ms per frame available for the client
314+
code, while in Linux the available time would be around 15.50ms, a 0.8ms overhead
291315
between the platforms. Such a small difference should not impose any special design
292316
considerations for the client code.
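
The arithmetic behind these figures can be reproduced with a small helper. The
function name frame_budget_ms() is hypothetical; the inputs are the Release frame
times listed in the results above:

```c
#include <assert.h>

/* Time left per frame for client code, in milliseconds, given a target
   frame rate and the measured average frame time. */
static double frame_budget_ms(double target_fps, double frame_time_ms)
{
    assert(target_fps > 0.0);
    return 1000.0 / target_fps - frame_time_ms;
}
```

With a 60FPS target, frame_budget_ms(60.0, 1.93) gives roughly 14.74ms for the
libusbemu Release build in Windows and frame_budget_ms(60.0, 1.15) gives roughly
15.52ms in Linux, a difference of about 0.78ms.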
293317

294-
The overhead between libusbemu and Zephod's code is due the fact that libusbemu has
295-
to inspect the usb packages before forwarding them to libfreenect which will then be
296-
checked again within libfreenect. Another performance consideration is the fact that
297-
Zephod's Bayer-to-RGB and bit-unpacking seem to be faster than the stream conversion
298-
procedures provided by libfreenect.
299-
300-
Another performance impact between libusbemu and Zephod's code, is due the fact that
301-
libusbemu makes heavy use of STL containers which in Debug mode tend to introduce a
302-
significative overhead; such overhead disappears in Release mode since STL containers
303-
are prone to be "inlined" and further optimized, besides the fact that many security
304-
checks of the STL are disabled in Release builds. Moreover, libusbemu also uses lots
305-
of synchronization directives, such as mutexes and conditional variables (events),
306-
that impose extra overhead. Anyway, the overall overhead between the libusbemu and the
307-
Zephod's dedicated Win32 driver is negligible (1.1ms for Debug builds and 0.2ms for
308-
Release builds).
309-
310-
TODO:
311-
* CPU consumption: CPU should be at 100% in either Linux or Windows
312-
But in Windows, there are two critical-time threads inside the libusbemu which will
313-
compete for CPU resources with the main program (and the freenect_process_events()
314-
thread) which may also explain the overhead.
315-
* Memory consumption: no known leaks in the libusbemu; double checked with VC++ memory
316-
leak detection tools.
318+
Furthermore, note that the CPU usage in Windows tend to be lower than in Linux. The
319+
reason behind this difference is quite difficult to summarize here, but it is probably
320+
related to the way Windows and Linux perform asynchronous I/O in USB-mapped files.
321+
The interested reader is encouraged to refer to the following links:
322+
> http://www.unwesen.de/articles/waitformultipleobjects_considered_expensive
323+
> http://softwarecommunity.intel.com/articles/eng/2807.htm
324+
> http://software.intel.com/en-us/blogs/2006/10/19/why-windows-threads-are-better-than-posix-threads/
325+
The memory consumption overhead of libusbemu in Windows is negligible, and as far as
326+
the VC++ memory leak detection goes, libusbemu has no memory leaks.
317327

318328
======================================================================================
