TouchLib Analysis and Fix!!! 
Posted: 09 June 2008 06:46 PM   [ Ignore ]
Administrator
Total Posts:  201
Joined  2008-05-08

Introduction

During the testing and optimization of my latest setup, I started looking at the TouchLib code repository.

I found that there are some issues with the current code that could easily be fixed. This fix will not only improve the CPU usage, capture rate and jitter, but also the responsiveness of the whole setup. Some of you were complaining that TouchLib cannot run at 60fps. Well, with this fix it will run at the frame rate of the camera.

The starting point is the file CTouchScreen.cpp and its function process(). This function is executed on a separate thread created by calling the beginProcessing() function.

bool CTouchScreen::process()
{
    while(1)
    {
        if(filterChain.size() == 0)
            return false;

        //printf("Process chain\n");
        filterChain[0]->process(NULL);
        IplImage *output = filterChain.back()->getOutput();

        if(output != NULL)
        {
            //printf("Process chain complete\n");
            frame = output;

            if(bTracking == true)
            {
                //printf("Tracking 1\n");
                tracker->findBlobs(frame);
                tracker->trackBlobs();

#ifdef WIN32
                DWORD dw = WaitForSingleObject(eventListMutex, INFINITE);
                //dw == WAIT_OBJECT_0
                if(dw == WAIT_TIMEOUT || dw == WAIT_FAILED)
                {
                    // handle time-out error
                    //throw TimeoutExcp();
                    printf("Failed %d", dw);
                }
                else
                {
                    //printf("Tracking 2\n");
                    tracker->gatherEvents();
                    ReleaseMutex(eventListMutex);
                }
#else
                int err;
                if((err = pthread_mutex_lock(&eventListMutex)) != 0){
                    // some error occurred
                    fprintf(stderr,"locking of mutex failed\n");
                }else{
                    tracker->gatherEvents();
                    pthread_mutex_unlock(&eventListMutex);
                }
#endif
            }
            //return true;
        }
        SLEEP(32);
    }
}

The second point of interest is the call to

filterChain[0]->process(NULL);

filterChain[0] is the first filter in our graph, which is by default (and in my setup) the DSVLCaptureFilter.
This filter’s process() function internally calls the kernel() function. If we look at kernel() in DSVLCaptureFilter.cpp, we find the following:

void DSVLCaptureFilter::kernel()
{
    DWORD wait_result = dsvl_vs->WaitForNextSample(100/60);//);
    //if(wait_result == WAIT_OBJECT_0)
    {
        //dsvl_vs->Lock();
        if(SUCCEEDED(dsvl_vs->CheckoutMemoryBuffer(&g_mbHandle, &g_pPixelBuffer)))
        {
            acquired->imageData = (char *)g_pPixelBuffer;
            cvCopy(acquired, destination);
            g_Timestamp = dsvl_vs->GetCurrentTimestamp();
            dsvl_vs->CheckinMemoryBuffer(g_mbHandle);
        }
    }
}

Analysis

1. As we can see from the code of the process() function in CTouchScreen.cpp, after we process one camera frame the thread goes to sleep for 32ms. This is a quick and dirty way to reduce the CPU usage. The side effect is that it directly limits the maximum processing frame rate of the TouchLib code. Another side effect is that after sleeping for 32ms, the whole blob processing loop runs asynchronously to the camera capture, which results in increased time lag and jitter.
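To put a number on that limit, here is a small sketch of the arithmetic. It is not part of TouchLib; the function name and the processing-time parameter are made up for illustration, and it assumes the sleep lasts exactly as long as requested.

```cpp
#include <cassert>

// Hypothetical helper (not TouchLib code): upper bound on loop iterations
// per second when each iteration does processMs of work and then sleeps
// for sleepMs. Both values are in milliseconds.
double maxLoopRateHz(double processMs, double sleepMs) {
    assert(processMs >= 0.0 && sleepMs >= 0.0 && processMs + sleepMs > 0.0);
    return 1000.0 / (processMs + sleepMs);
}
```

Even with zero processing time, maxLoopRateHz(0.0, 32.0) is 31.25, so a loop that sleeps 32ms per iteration cannot keep up with a 60fps camera. Any real processing time pushes the ceiling lower still.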

2. I’m not sure if the following line in the DSVLCaptureFilter::kernel() function is a feature or a bug:

DWORD wait_result = dsvl_vs->WaitForNextSample(100/60);

but since 100/60 is an integer division that evaluates to 1, this call actually results in a wait for a new frame that lasts only 1ms.
This has greater implications. Since this call into the DSVL library blocks for up to the specified duration, the DSVL library may or may not capture a new frame during this time. Therefore it is common sense for this delay to be at least the duration of one sample.
With the delay set to 1ms and its result not checked, CheckoutMemoryBuffer() will often return the same buffer that was already processed (depending on the camera capture rate), thus wasting CPU cycles and increasing the latency.
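The timeout arithmetic can be checked in isolation. The sketch below is not TouchLib code, and both function names are hypothetical. It shows what 100/60 evaluates to and, for comparison, the length of one frame period rounded up to whole milliseconds.

```cpp
#include <cassert>

// The expression from kernel(): both operands are ints, so the division
// truncates toward zero.
constexpr int originalTimeoutMs() { return 100 / 60; }

// Hypothetical helper: duration of one frame in milliseconds, rounded up
// so that a wait of this length spans at least one full frame period.
// Assumes fps > 0.
constexpr int framePeriodMs(int fps) { return (1000 + fps - 1) / fps; }

static_assert(originalTimeoutMs() == 1, "100/60 truncates to 1");
static_assert(framePeriodMs(60) == 17, "one 60fps frame is about 16.7ms");
static_assert(framePeriodMs(30) == 34, "one 30fps frame is about 33.3ms");
```

So a 1ms wait covers only a small fraction of a frame period, even at 60fps.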

Because of all of this I suggest a simple fix.

Fix

1. In the CTouchScreen.cpp file, in the process() function remove or comment out:

SLEEP(32);

since our call to

filterChain[0]->process(NULL);


will block until a new frame is captured (once the second change below is also applied), this is OK and will not increase the CPU usage. Moreover, the CPU usage will decrease, since we will process every captured frame only once.

2. In the DSVLCaptureFilter.cpp file, in the kernel() function change:

DWORD wait_result = dsvl_vs->WaitForNextSample(100/60);

to

DWORD wait_result = dsvl_vs->WaitForNextSample(200);

This makes the DSVL library wait up to 200ms for the next frame, which is plenty of time and should ensure that each call delivers a brand new captured frame.
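The effect of the two changes can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions, not TouchLib or DSVL code: frames are assumed to arrive at perfectly regular intervals, the wait is assumed to return as soon as a new frame arrives, and the 2ms processing cost used in the note below is invented. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct LoopStats {
    int iterations = 0;    // how many times the filter chain ran
    int uniqueFrames = 0;  // runs that saw a frame not processed before
    int duplicates = 0;    // runs that re-processed an already seen frame
};

// Simulated capture loop. Frame k is assumed to arrive at k * periodUs.
// Each iteration waits up to waitUs for a frame newer than the last one
// processed, takes whatever frame is newest (without checking the wait
// result, as in the original kernel()), spends processUs on it and then
// sleeps for sleepUs. All times are in microseconds.
LoopStats simulate(int64_t periodUs, int64_t waitUs, int64_t sleepUs,
                   int64_t processUs, int64_t durationUs) {
    assert(periodUs > 0 && waitUs >= 0 && sleepUs >= 0 && processUs > 0);
    LoopStats s;
    int64_t t = 0;
    int64_t lastProcessed = -1;
    while (t < durationUs) {
        const int64_t nextArrival = (lastProcessed + 1) * periodUs;
        if (nextArrival > t) {
            t = std::min(nextArrival, t + waitUs);  // wake on frame or timeout
        }
        const int64_t newest = t / periodUs;
        ++s.iterations;
        if (newest > lastProcessed) {
            ++s.uniqueFrames;
            lastProcessed = newest;
        } else {
            ++s.duplicates;
        }
        t += processUs + sleepUs;
    }
    return s;
}
```

For one simulated second, simulate(16667, 1000, 32000, 2000, 1000000) models the original loop on a 60fps camera and processes only about half of the frames. simulate(33333, 1000, 0, 2000, 1000000) models removing the sleep alone at 30fps and mostly re-processes frames it has already seen. simulate(33333, 200000, 0, 2000, 1000000) models both changes and processes each frame exactly once.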

Conclusion

By making the modifications described above, we are in effect synchronizing our blob detection/processing to the incoming frame rate of the camera. We are also minimizing the delay between capture and processing, thus eliminating the jitter caused by the sleep and reducing the lag to the minimum.
As the saying goes, “The proof is in the pudding”: after some testing, these fixes reduced my CPU usage while at the same time reducing the blob recognition lag and jitter.

~Alex

 Signature 

Computing is not about computers any more.  It is about living!

~ Send me a PM about high quality laser modules for LLP ~

Posted: 09 June 2008 08:52 PM   [ Ignore ]   [ # 1 ]
New Member
Total Posts:  11
Joined  2008-05-29

kudos to you Alex! Keep up the great work!

Posted: 10 June 2008 01:19 AM   [ Ignore ]   [ # 2 ]
Administrator
Total Posts:  599
Joined  2008-02-12

great work Alex…

Taha

 Signature 

My MultiTouch Blog

Posted: 10 June 2008 03:37 AM   [ Ignore ]   [ # 3 ]
New Member
Total Posts:  24
Joined  2008-04-01

Hi AlexP

Nice work! It’s good to see people that really dig into touchlib.

I would like to have your opinion on the Permute2() function in CBlobTracker. This is a recursive function and also seems to cause slowdowns.

E.

 Signature 

canTouch tangible interfaces - Amsterdam - http://www.cantouch.nl

Posted: 10 June 2008 05:24 AM   [ Ignore ]   [ # 4 ]
Administrator
Total Posts:  204
Joined  2007-04-03

Hey AlexP,

very interesting analysis. During testing I also found the Sleep(32) which was stalling the processing loop ( http://www.multigesture.net/2008/05/01/touchlib-speedfix-and-mma-pro-update/ ). However, when removing the sleep in the processing loop, video playback with the opencv plugin is no longer possible.

About the dsvl fix, I should really check that out on my own system :)

 Signature 

My multitouch blog: http://www.multigesture.net
Howto: Compile touchlib on windows XP/Vista
Howto: Compile touchlib on Ubuntu Linux
Downloads: Touchlib SVN builds

Posted: 10 June 2008 09:07 AM   [ Ignore ]   [ # 5 ]
Jr. Member
Total Posts:  188
Joined  2007-09-13

Good stuff, AlexP!

I’m using the videowrapper instead of DSVL, and from a look at the code it seems alright. But I need you guys to verify that for me, as I’m no C++ guru.

void VideoWrapperFilter::kernel()
{
    timeval t;
    if(!g_hVideo)
        return;

    VIDEO_getFrame(g_hVideo, (unsigned char**) (&acquired->imageData), &t);

    if(acquired->imageData)
    {
        cvCopy(acquired, destination);
    }

    VIDEO_releaseFrame( g_hVideo );
}

Thanks.

Posted: 10 June 2008 11:32 AM   [ Ignore ]   [ # 6 ]
Administrator
Total Posts:  201
Joined  2008-05-08
Falcon4ever - 10 June 2008 05:24 AM

Hey AlexP,

very interesting analysis. During testing I also found the Sleep(32) which was stalling the processing loop ( http://www.multigesture.net/2008/05/01/touchlib-speedfix-and-mma-pro-update/ ). However, when removing the sleep in the processing loop, video playback with the opencv plugin is no longer possible.

About the dsvl fix, I should really check that out on my own system :)

You would have to implement both changes in order for video playback to work. A DSVL wait of only 1ms is too short and will eat up a lot of CPU (i.e. the blob detection/tracking would run every 1ms, and you don’t want that). In my version video playback works fine.

 
Posted: 10 June 2008 11:55 AM   [ Ignore ]   [ # 7 ]
Sr. Member
Total Posts:  285
Joined  2008-06-01

Just to add something to this… it would be great if TouchLib would run without lag on older processors/machines.

Just for fun, and as a test or benchmark as it were, I connected an old eMachines computer (1.2 GHz, 384 MB RAM, brand new Nvidia PCI video card) to my table and fired up TouchLib.
It worked, but there was a full half-second delay before each tracked blob appeared! That pretty much made it worthless to work with.

If what you are doing to optimize the code helps TouchLib run on older, slower machines, it just means that more DIY folks can find equipment to play with more easily and cheaply.
Kind of the same way that a Linux distro will run on an old 300 MHz machine. ( :

just some thoughts..
Thanks for your work Alex!

 Signature 

Blobs the likes of which even the Gods have not seen!

Posted: 10 June 2008 12:46 PM   [ Ignore ]   [ # 8 ]
Sr. Member
Total Posts:  265
Joined  2007-09-22

The combination of image filtering and processing of live video frames is really CPU-intensive; it’s a miracle that we can get this to work under 40-50% CPU load.
It runs decently on a P4 3GHz with 1GB RAM and a crapptastic FX5200. This machine is about 3 years old and I’m currently using it until my C2Q parts get here.
But it gets the job done, and a decent C2D machine would be around $500 if you build it yourself.

Posted: 10 June 2008 04:08 PM   [ Ignore ]   [ # 9 ]
Administrator
Total Posts:  201
Joined  2008-05-08
Vlado - 10 June 2008 12:46 PM

The combination of image filtering and processing of live video frames is really CPU-intensive; it’s a miracle that we can get this to work under 40-50% CPU load.
It runs decently on a P4 3GHz with 1GB RAM and a crapptastic FX5200. This machine is about 3 years old and I’m currently using it until my C2Q parts get here.
But it gets the job done, and a decent C2D machine would be around $500 if you build it yourself.

Well, on my C2D E6000 @ 2.4GHz system, OSC.exe with the above fixes runs at ~5% CPU.
I implemented an FPS counter in OSC.exe to display the current blob processing frame rate.
The above numbers are for 320x240 at 30fps. There are no dropped video frames and every frame is processed only once. You should try this on your machine and see if there is any difference.

UPDATE: On the same system I get ~12% CPU usage for 640x480 @ 30fps. I get these numbers using my USB iBot camera.
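For anyone who wants to measure the processing rate on their own build, a frame-rate counter along these lines could be called once per processed frame. This is a minimal sketch and not the counter actually added to OSC.exe, whose code is not shown in this thread. The class name, its interface and the one-second window are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of TouchLib: measures how many frames are
// processed per second, updated roughly once per second.
class FpsCounter {
public:
    // Call once per processed frame with a monotonic timestamp in ms.
    void onFrame(int64_t nowMs) {
        if (windowStartMs_ < 0) {       // first frame only starts the window
            windowStartMs_ = nowMs;
            return;
        }
        ++framesInWindow_;              // one more frame interval completed
        const int64_t elapsed = nowMs - windowStartMs_;
        if (elapsed >= 1000) {          // window full: publish and restart
            fps_ = framesInWindow_ * 1000.0 / elapsed;
            framesInWindow_ = 0;
            windowStartMs_ = nowMs;
        }
    }

    // Rate over the last completed window; 0 until one has completed.
    double fps() const { return fps_; }

private:
    int64_t windowStartMs_ = -1;
    int framesInWindow_ = 0;
    double fps_ = 0.0;
};
```

The timestamp would come from whatever monotonic clock the build already uses. With the sleep still in place, the reported value should stay below the camera rate; with both fixes applied it should match it.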

~Alex

 
Posted: 12 June 2008 04:09 AM   [ Ignore ]   [ # 10 ]
New Member
Total Posts:  26
Joined  2008-02-06

Hej Alex,

I appreciate the work you do, with both software (here) and laser (in your other topic) stuff

The lasers are a bit expensive for me to get right now, but I will definitely do that later.

The software fixes are great. I had already commented out the sleep before, but that gave me 50% CPU usage. With your other fix the CPU usage drops to about 15% (640x480, 30fps)!

keep it up!

cheers,

Jarno

Posted: 12 June 2008 06:00 AM   [ Ignore ]   [ # 11 ]
Sr. Member
Total Posts:  265
Joined  2007-09-22

Works great for me too, just got my new machine today. On my Q6600 I get around 8% CPU with your fix at 640x480.

Edit: That’s only on one core, not the whole CPU, really impressive.

Posted: 13 June 2008 08:29 PM   [ Ignore ]   [ # 12 ]
New Member
Total Posts:  87
Joined  2007-10-23

Is there a possibility for us poor chaps who can’t compile touchlib to see the improvements? ;)

Posted: 13 June 2008 08:33 PM   [ Ignore ]   [ # 13 ]
Jr. Member
Total Posts:  188
Joined  2007-09-13

EDIT: simultaneous post… Glad to know it’s not just me. ;)

Would anyone be kind enough to post the compiled binaries with the aforementioned fix applied?

Thanks in advance.

Posted: 16 June 2008 07:24 AM   [ Ignore ]   [ # 14 ]
New Member
Total Posts:  50
Joined  2008-03-17

I’m humbly begging for a compiled download too.

Posted: 16 June 2008 11:15 AM   [ Ignore ]   [ # 15 ]
Jr. Member
Total Posts:  198
Joined  2007-05-05

Hi Alex.. yeah, one big fat “binaries please” from me as well... :)

Just to check… with these modifications you get 100fps + from the firefly MV?

thanks..

 Signature 

Blog: http://iad.projects.zhdk.ch/multitouch/
180 Project: http://www.timroth.de/180/
