I want to share a "basic dual camera framework" setup and how it could be used. I drew the sketch on paper with a pen and took a snapshot of it with the PS3Eye camera.
What you should be able to see in this picture are:
1. cam1 with a static (x,y,z) location in 3D space; e.g. we can define the lens position of cam1 as (0,0,0), i.e. as the reference point of the 3D space. cam1 would be placed on the right side of the human, on a long(er) table.
2. cam2 with a static (x,y,z) location in 3D space, placed just above the human's head or somewhere between the computer display and the chair where the human normally sits.
3. the four corners of the computer display, which are also static in 3D space.
4. a single human hand (in the future maybe even two human hands simultaneously) with six (x,y,z) locations as a function of time. For the numbering of the fingers I used the piano finger numbering style, ranging from 1 to 5, with 0 used for the root of the hand.
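To make the data model above concrete, here is a minimal sketch of the six static reference points and one time-stamped hand sample. All class and field names are my own invention, not part of any existing framework; the coordinate values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]

@dataclass
class StaticSetup:
    """The six fixed reference locations: cam1 (the origin),
    cam2, and the four corners of the computer display."""
    cam1: Point3D
    cam2: Point3D
    screen_corners: Tuple[Point3D, Point3D, Point3D, Point3D]

@dataclass
class HandSample:
    """One time-stamped reading of the six hand points.
    Keys follow the piano-style numbering from the text:
    0 = root of the hand, 1..5 = thumb through little finger."""
    t: float                    # timestamp in seconds
    points: Dict[int, Point3D]  # {0: (x, y, z), ..., 5: (x, y, z)}

# Placeholder geometry, in metres, with cam1 as the (0,0,0) reference.
setup = StaticSetup(
    cam1=(0.0, 0.0, 0.0),
    cam2=(0.2, 0.5, -0.4),
    screen_corners=((-0.3, 0.4, -0.6), (0.3, 0.4, -0.6),
                    (0.3, 0.1, -0.6), (-0.3, 0.1, -0.6)),
)
sample = HandSample(t=0.0, points={i: (0.0, 0.0, 0.0) for i in range(6)})
```

A real system would fill `sample.points` once per camera frame; everything in `setup` is measured once during calibration.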
If a system were able to take those six static locations (cam1, cam2, and the four corners of the screen) and, in real time, the six dynamic locations of a single human hand, we should be able to develop many input possibilities.
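Getting a 3D hand point from the two cameras boils down to triangulation: each camera sees the same fingertip at some pixel position, and with both camera poses known (that is why cam1 and cam2 must be static), the 3D location can be recovered. A minimal linear (DLT) triangulation sketch, assuming ideal pinhole cameras and made-up projection matrices:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 projection matrices of cam1 and cam2.
    uv1, uv2: (u, v) image coordinates of the same fingertip."""
    u1, v1 = uv1
    u2, v2 = uv2
    # Each observation contributes two linear constraints on X.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null-space vector = homogeneous point
    return X[:3] / X[3]        # homogeneous -> Euclidean

# Toy example: both cameras look along +z, cam2 shifted 0.5 m on x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
X_true = np.array([0.1, 0.2, 1.0])
uv1 = X_true[:2] / X_true[2]                       # ideal projection in cam1
uv2 = (X_true[:2] + np.array([-0.5, 0.0])) / X_true[2]  # and in cam2
print(triangulate(P1, P2, uv1, uv2))
```

With exact synthetic data this recovers (0.1, 0.2, 1.0); a real setup would first need intrinsic and extrinsic calibration of both PS3Eye cameras to obtain P1 and P2.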
Additionally, even periodic movements of the human head and upper body could be used for synchronization purposes, e.g. to control the tempo (bpm) of a song, rhythm, or beat.
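One simple way to turn such a periodic movement into a tempo is to track one coordinate over time (say, the vertical head position per frame) and read the dominant frequency off its spectrum. A sketch under those assumptions, with a synthetic signal standing in for real tracking data:

```python
import numpy as np

def bpm_from_motion(y, fps):
    """Estimate a beat rate from a periodic 1-D motion trace
    (e.g. vertical head position per frame) via the dominant
    frequency in its FFT spectrum."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                        # remove the DC offset
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fps)
    peak = freqs[1:][np.argmax(spectrum[1:])]  # skip the 0 Hz bin
    return peak * 60.0                      # Hz -> beats per minute

# Synthetic head bob at 2 Hz (= 120 bpm), sampled at 60 fps for 4 s.
t = np.arange(0, 4, 1 / 60)
head_y = 0.05 * np.sin(2 * np.pi * 2.0 * t)
print(round(bpm_from_motion(head_y, fps=60)))   # -> 120
```

Real tracking data would be noisier, so some smoothing or a minimum-amplitude threshold would likely be needed before the estimate is stable enough to drive a beat.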