Introduction
During the testing and optimization of my latest setup, I started looking at the TouchLib code repository.
I found some issues in the current code that can easily be fixed. The fix improves not only the CPU usage, capture rate and jitter, but also the responsiveness of the whole setup. Some of you were complaining that TouchLib cannot run at 60fps. Well, with this fix it will run at the frame rate of the camera.
The starting point is the file CTouchScreen.cpp and its function process(). This function is executed on a separate thread created by calling the beginProcessing() function.
bool CTouchScreen::process()
{
    while(1) {
        if(filterChain.size() == 0)
            return false;
        //printf("Process chain\n");
        filterChain[0]->process(NULL);
        IplImage *output = filterChain.back()->getOutput();
        if(output != NULL) {
            //printf("Process chain complete\n");
            frame = output;
            if(bTracking == true)
            {
                //printf("Tracking 1\n");
                tracker->findBlobs(frame);
                tracker->trackBlobs();
#ifdef WIN32
                DWORD dw = WaitForSingleObject(eventListMutex, INFINITE);
                //dw == WAIT_OBJECT_0
                if(dw == WAIT_TIMEOUT || dw == WAIT_FAILED) {
                    // handle time-out error
                    //throw TimeoutExcp();
                    printf("Failed %d", dw);
                }
                else
                {
                    //printf("Tracking 2\n");
                    tracker->gatherEvents();
                    ReleaseMutex(eventListMutex);
                }
#else
                int err;
                if((err = pthread_mutex_lock(&eventListMutex)) != 0){
                    // some error occured
                    fprintf(stderr,"locking of mutex failed\n");
                }else{
                    tracker->gatherEvents();
                    pthread_mutex_unlock(&eventListMutex);
                }
#endif
            }
            //return true;
        }
        SLEEP(32);
    }
}
The second point of interest is the call to
filterChain[0]->process(NULL);
The filterChain[0] is the first filter in our graph, which is by default (and in my setup) the DSVLCaptureFilter.
This filter’s process() function internally calls the kernel() function. If we look at kernel() in the DSVLCaptureFilter.cpp file, we find the following:
void DSVLCaptureFilter::kernel()
{
    DWORD wait_result = dsvl_vs->WaitForNextSample(100/60);//);
    //if(wait_result == WAIT_OBJECT_0)
    {
        //dsvl_vs->Lock();
        if(SUCCEEDED(dsvl_vs->CheckoutMemoryBuffer(&g_mbHandle, &g_pPixelBuffer)))
        {
            acquired->imageData = (char *)g_pPixelBuffer;
            cvCopy(acquired, destination);
            g_Timestamp = dsvl_vs->GetCurrentTimestamp();
            dsvl_vs->CheckinMemoryBuffer(g_mbHandle);
        }
    }
}
Analysis
1. As the process() function in CTouchScreen.cpp shows, the thread goes to sleep for 32ms after processing each camera frame. This is a quick and dirty way to reduce the CPU usage. The side effect is that it directly caps the maximum processing frame rate of the TouchLib code at about 31 fps (1000ms / 32ms), before any processing time is even counted. Another issue is that, because of the 32ms sleep, the whole blob processing loop runs asynchronously to the camera capture, which results in increased lag and jitter.
2. I’m not sure whether the following line in the DSVLCaptureFilter::kernel() function is a feature or a bug:
DWORD wait_result = dsvl_vs->WaitForNextSample(100/60);
but because 100/60 is an integer division that evaluates to 1, this call waits for a new frame for only 1ms.
This short timeout has wider implications. The call into the DSVL library blocks for up to the specified duration, so the library may or may not capture a new frame during that time. It is therefore common sense for the timeout to be at least the duration of one sample.
With a timeout of 1ms and no check of the result, CheckoutMemoryBuffer() will often return the same buffer that was already processed (depending on the camera capture rate), wasting CPU cycles and increasing the latency.
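Both points can be checked with a small toy model. The sketch below is not TouchLib or DSVL code: the names maxLoopRateHz, LoopStats and simulate are made up for illustration, time is virtual (plain integers in ms), and the wait step is only my assumption of how a wait-with-timeout behaves (return at once if a new frame is pending, otherwise block until the next frame or the timeout, whichever comes first).

```cpp
#include <algorithm>
#include <cassert>

// The timeout expression from kernel(): both operands are ints, so the
// division truncates to 1 instead of yielding 1.67.
constexpr int kOriginalWaitMs = 100 / 60;
static_assert(kOriginalWaitMs == 1, "100/60 truncates to 1");

// Upper bound on loop iterations per second when each iteration works for
// processMs and then sleeps for sleepMs (hypothetical helper).
double maxLoopRateHz(double processMs, double sleepMs) {
    assert(processMs >= 0.0 && sleepMs >= 0.0 && processMs + sleepMs > 0.0);
    return 1000.0 / (processMs + sleepMs);
}

struct LoopStats {
    int fresh = 0;      // iterations that handled a frame not seen before
    int duplicate = 0;  // iterations that re-handled the previous frame
};

// Toy virtual-time model of the processing loop. Frame k is assumed to
// arrive at time k * framePeriodMs; all arguments are in milliseconds.
LoopStats simulate(int framePeriodMs, int waitTimeoutMs, int processMs,
                   int sleepMs, int durationMs) {
    assert(framePeriodMs > 0 && waitTimeoutMs > 0);
    assert(processMs >= 0 && sleepMs >= 0);
    LoopStats stats;
    int now = 0;
    int lastProcessed = -1;
    while (now < durationMs) {
        // Wait step: skip the wait if a new frame is already pending,
        // otherwise block until the next arrival or the timeout.
        if (now / framePeriodMs <= lastProcessed) {
            int nextArrival = (now / framePeriodMs + 1) * framePeriodMs;
            now = std::min(nextArrival, now + waitTimeoutMs);
        }
        // Checkout step: take the newest frame without checking whether
        // the wait succeeded, as the original kernel() does.
        int frame = now / framePeriodMs;
        if (frame == lastProcessed)
            ++stats.duplicate;
        else
            ++stats.fresh;
        lastProcessed = frame;
        now += processMs + sleepMs;
    }
    return stats;
}
```

In this model maxLoopRateHz(0.0, 32.0) gives 31.25, so the sleep alone keeps the loop below 32 iterations per second. Assuming a camera at roughly 15 fps (a 66ms period) and 2ms of processing per frame, simulate(66, kOriginalWaitMs, 2, 32, 1000) reports duplicate frames, while simulate(66, 200, 2, 0, 1000), with a long timeout and no sleep, reports none.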
Because of all of this I suggest a simple fix.
Fix
1. In the CTouchScreen.cpp file, in the process() function remove or comment out:
SLEEP(32);
since our call to
filterChain[0]->process(NULL);
will block until a new frame is captured, removing the sleep is safe and will not increase the CPU usage. Moreover, the CPU usage should decrease, since every captured frame is processed only once.
2. In the DSVLCaptureFilter.cpp file, in the kernel() function change:
DWORD wait_result = dsvl_vs->WaitForNextSample(100/60);
to
DWORD wait_result = dsvl_vs->WaitForNextSample(200);
This makes the DSVL library wait for up to 200ms, which is plenty of time and should guarantee a brand new captured frame.
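The idea behind both changes, pacing the loop with a blocking wait instead of a fixed sleep, can be sketched with standard C++ threading. This is only an illustration under my own assumptions: FrameSource, publish, waitForNewer and runCapture are hypothetical names, not TouchLib or DSVL functions, and a std::condition_variable stands in for whatever the capture library uses internally. Unlike the current kernel(), the sketch also checks whether the wait really produced a new frame before processing it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the capture filter; not TouchLib or DSVL code.
class FrameSource {
public:
    // Called by the camera thread each time a new frame is ready.
    void publish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++latest_;
        }
        cv_.notify_one();
    }

    // Blocks until a frame newer than lastSeen exists or the timeout expires.
    // Returns the newest frame number, which is not newer than lastSeen
    // after a timeout.
    int waitForNewer(int lastSeen, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait_for(lock, timeout, [&] { return latest_ > lastSeen; });
        return latest_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int latest_ = 0;
};

// Runs a fake camera for frameCount frames and returns the frame numbers
// handled by the processing loop, in order.
std::vector<int> runCapture(int frameCount, std::chrono::milliseconds period) {
    assert(frameCount > 0);
    FrameSource source;
    std::thread camera([&] {
        for (int i = 0; i < frameCount; ++i) {
            std::this_thread::sleep_for(period);
            source.publish();
        }
    });

    std::vector<int> processed;
    int lastSeen = 0;
    while (lastSeen < frameCount) {
        // No fixed sleep: the loop is paced by the blocking wait alone.
        int frame = source.waitForNewer(lastSeen, std::chrono::milliseconds(200));
        if (frame <= lastSeen)
            continue;  // timed out with nothing new, so do not reprocess
        processed.push_back(frame);  // blob detection would run here
        lastSeen = frame;
    }
    camera.join();
    return processed;
}
```

Calling runCapture(5, std::chrono::milliseconds(5)) should return frame numbers in strictly increasing order ending at 5. A frame may be skipped if the loop falls behind the camera, but none is handled twice, and the thread spends its idle time blocked in the wait rather than spinning or sleeping for a fixed period.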
Conclusion
By making the modifications described above, we are in effect synchronizing our blob detection/processing to the incoming frame rate of the camera. We are also minimizing the delay between capture and processing, thus eliminating the jitter caused by the unsynchronized loop and reducing the lag to a minimum.
As the saying goes, “the proof is in the pudding”: after some testing, these fixes reduced my CPU usage while also reducing the blob recognition lag and jitter.
~Alex
