[GSoC 2012 Proposal] Real-Time Depth Analysis and Tracking Via Dual Webcams
Posted: 26 March 2012 11:09 PM
Joined  2008-11-28
Total Posts:  122
Member

About Me:
Name: Eric Hill
NUIGroup Username: hillbilly
Age: 18
Geographical Information: Cloverdale, Ohio
Timezone: Eastern Time Zone (North America)
Brief Biography: Although I have never been formally trained in computing or software engineering, I am a capable and persistent computer enthusiast whose hobbies range from writing software to fixing software- and computer-related problems. I started these hobbies in junior high and have furthered my knowledge through various Internet sources and books. My experience encompasses languages and frameworks such as C, C++, BlitzBasic, JustBasic, Enyo, HTML, PHP, and CSS. My work can be found here, on the NUI Group forums, as well as elsewhere on the Internet.

Proposal:
For around two years now, I have been fascinated by the functions, abilities, and practicality of Microsoft’s Kinect sensor for Xbox 360. It wasn’t just that the Kinect was revolutionizing the way humans interact with software and hardware; it was the potential the Kinect had for further applications that really sparked my creativity and, subsequently, my interest. Keep in mind, though, that I am a poor high school senior with little to no spare cash for the sensor that captivated me. With no money in hand, I set out to find a way of replicating and enhancing the Kinect’s technology for my own use and development. I came up with a single-camera system that used changes in gray tones in the image to differentiate pixel depth. It worked for the most part, but the setup had obvious limitations. Eventually, I experimented with two PS3Eye cameras to enhance the technique I call ‘Depth Tracking’, but my attempts were unsuccessful. My proposal, then, is to integrate a depth-sensing rig built from dual webcams into the next version of CCV. This would open up a wide range of possibilities for software built on top of it.

“So What,” You say?
Many might say, “Big deal. You’re trying to re-invent the wheel! Why not just buy a Kinect and develop for it instead?” To that I say, “Touché”. But there are reasons why a custom rig is needed and more beneficial. Allow me to elaborate…

1. Cost - As of the date of this post (Monday, March 26, 2012), a Kinect costs $100.00 according to a Google Shopping result [0]. With a dual-webcam setup, the cost would be nearly cut in half, as the price for two PS3Eye Cameras (recommended for their high frame rate and notable quality) is only around $24.96 ($28.95 with est. tax & shipping) [1].
2. Range and Functionality - A Kinect has a limited depth-sensing range. After around 5ft or so, the sensor loses the ability to determine a person’s distance from the camera. It is also important to note that the Kinect performs its depth analysis by projecting an infrared laser through a diffraction grating, scattering a pattern of small dots across the room. A dual-camera setup such as the one proposed needs no infrared laser, which reduces the risk of eye exposure to infrared.
3. Improved Quality - In theory, a dual-camera setup would provide higher-quality Depth Tracking and therefore a richer experience for the user.
4. Open Source - I count this as an advantage because I firmly believe that, for the software industry to advance and flourish, software needs to be open source so that anyone can contribute to a better solution or expand the software for his or her own use. With the Kinect this is not an option, for obvious reasons of copyrights, patents, etc.
5. Outdoor Use - With infrared eliminated as the depth-sensing agent, this setup could also be used outdoors.

Building the Software/Detection Implementation
My idea is simple in theory but complex in code. Plainly put, it is comparable to the effect you see when you compare the view from your left eye with the view from your right without moving your head. Any object in view shifts its X-Position between the two eyes, and the greater the shift, the closer the object is. The proposed software would exploit the same effect. Imagine that each eye is a separate camera in the proposed dual-camera setup, and that the software plays the role of the brain in turning the shift in X-Position into the depth of an object. The code would detect objects in each image, something CCV is already a great solution for, and then compare the objects found in the two images for matches in shape, position, color, texture, or anything else that helps create a match. Once this is done, the software would measure the shift in X-Position of each matched object between the two images, giving a real-time 3D Depth Tracking effect.
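
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the attached algorithm: it assumes the two cameras are mounted side by side and aligned so that a matched object sits on roughly the same image row in both views, and it uses the standard pinhole stereo relation depth = focal length × baseline / X-shift. The `Blob` class, the `match_blobs` and `depth_from_shift` helpers, and the focal length and baseline values are all hypothetical, chosen for the example rather than measured from a PS3Eye or taken from CCV.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Blob:
    """A detected object in one camera image (hypothetical structure)."""
    x: float     # horizontal centre, in pixels
    y: float     # vertical centre, in pixels
    area: float  # blob size, in square pixels


def match_blobs(left: List[Blob], right: List[Blob],
                max_dy: float = 10.0,
                max_area_ratio: float = 1.5) -> List[Tuple[Blob, Blob]]:
    """Greedily pair each left-image blob with the most similar unused right-image blob."""
    pairs: List[Tuple[Blob, Blob]] = []
    unused = list(right)
    for lb in left:
        best: Optional[Blob] = None
        best_cost = float("inf")
        for rb in unused:
            ratio = max(lb.area, rb.area) / max(min(lb.area, rb.area), 1e-9)
            # A plausible partner lies on roughly the same row, has a similar
            # size, and appears further to the left in the right-hand image.
            if abs(lb.y - rb.y) > max_dy or ratio > max_area_ratio or rb.x >= lb.x:
                continue
            cost = abs(lb.y - rb.y) + abs(lb.area - rb.area) / max(lb.area, 1e-9)
            if cost < best_cost:
                best, best_cost = rb, cost
        if best is not None:
            unused.remove(best)
            pairs.append((lb, best))
    return pairs


def depth_from_shift(x_left: float, x_right: float,
                     focal_px: float, baseline_m: float) -> float:
    """Depth in metres from the X-shift (disparity) of one matched object."""
    shift = x_left - x_right
    if shift <= 0:
        raise ValueError("X-shift must be positive for an object in front of the rig")
    return focal_px * baseline_m / shift


if __name__ == "__main__":
    # Assumed values for illustration only: 600 px focal length, 10 cm baseline.
    left_blobs = [Blob(400, 200, 900), Blob(150, 100, 400)]
    right_blobs = [Blob(140, 102, 410), Blob(360, 201, 880)]
    for lb, rb in match_blobs(left_blobs, right_blobs):
        print(lb, "->", depth_from_shift(lb.x, rb.x, focal_px=600.0, baseline_m=0.1), "m")
```

Matching on row and size alone is the simplest possible stand-in for the "shape, position, color, texture" comparison; a real implementation would need calibrated cameras and a richer similarity measure.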

The Algorithm
A copy of my proposed algorithm can be downloaded here: http://nuigroup.com/?ACT=28&fid=112&aid=7803_SXIoCt6QRMlD0gOFydkZ
Or, you can scroll to the bottom of this post for an attached algorithm.

What Could be Achieved in the Future
With the implementation of the dual-camera setup and custom software solution in CCV, I would envision future applications and games such as the ones outlined below.

  • Protocol bridge implementation (possibly capable of integrating with the Kinect protocol?). This would mean access to the software already written for the Kinect.
  • MOCAP in real-time with mesh/model-to-body capabilities. Imagine the time saved with such software.
  • A room monitor that uses the object-tracking system already built into CCV to track objects introduced into an environment and record where each was last placed or last seen. This would be extremely useful when a person has forgotten where he or she put his or her keys, phone, remote, glasses, etc.
  • 3D Scan of objects just by rotating any object in front of the camera.
  • Enhanced Augmented Reality games where the virtual world takes on a real, depth-full feel.
  • Gesture-based recognition systems such as the ones found in CCV, Kinect, and others.
The possibilities are literally endless.
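
The room-monitor idea in the list above boils down to keeping, for every tracked object, the most recent position at which it was seen. A minimal sketch, assuming the tracker can hand over a label and an (x, y, depth) position for each object; the `LastSeenLog` class and its methods are hypothetical names, not part of CCV:

```python
import time
from typing import Dict, Optional, Tuple

Position = Tuple[float, float, float]  # (x, y, depth); units are up to the caller


class LastSeenLog:
    """Remembers where each labelled object was most recently observed."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[Position, float]] = {}

    def update(self, label: str, position: Position,
               timestamp: Optional[float] = None) -> None:
        """Overwrite the stored record for `label` with a new sighting."""
        if timestamp is None:
            timestamp = time.time()
        self._records[label] = (position, timestamp)

    def last_seen(self, label: str) -> Optional[Tuple[Position, float]]:
        """Return (position, timestamp) of the latest sighting, or None if never seen."""
        return self._records.get(label)


if __name__ == "__main__":
    log = LastSeenLog()
    log.update("keys", (120.0, 340.0, 1.8))
    print(log.last_seen("keys"))
```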

Background on this Topic
I have gained experience in what I call ‘Depth Tracking’ from my previous project, called ‘[Vision]DepthTracker’ [2, 3]. In coding this application I gained expertise in 2D-to-3D image-to-space conversion, and I understand how to achieve a similar result with the newly proposed setup. I have also read various articles on the Internet, all of which are publications of other teams’ work and findings in related fields of study [4, 5, 6]. With this in mind, I think I am more than able to take on a project such as this.

Conclusion
With my skill set and experience in Depth Tracking, I believe I am well suited to a project of this magnitude. Creating software is no easy task, and I have had a long time to experience the hardships that come with it. But, with the help of a mentor, I envision a bright future for CCV and related software in which Depth Tracking is a common form of human-computer interaction.

FULL PROPOSAL (PDF)
http://nuigroup.com/?ACT=28&fid=112&aid=7820_qw052HfcHqbqYGQTUlgB
or see the bottom of this post…

Citations:
[0] http://www.bestbuy.com/site/Microsoft+-+Kinect+for+Xbox+360/1036858.p?id=1218212157998&skuId=1036858&cmp=RMX&ref=06&loc=01&ci_src=14110944&ci_sku=1036858
[1] http://www.amazon.com/PlayStation-Eye-3/dp/B000VTQ3LU
[2] http://nuigroup.com/forums/viewthread/12451/
[3] http://nuigroup.com/forums/viewthread/12439/
[4] http://www.cs.cornell.edu/~rdz/papers/kz-eccv02-recon.pdf
[5] http://ia.cs.colorado.edu/~jane/pdf/mulligan_wsmb01.pdf
[6] http://robotics.pme.duth.gr/research/topics/stereo-vision/

  • File Attachments
    Algorithm.pdf  (File Size: 91KB - Downloads: 178)
    GSOC 2012 Proposal.pdf  (File Size: 510KB - Downloads: 141)

    Posted: 27 March 2012 05:26 AM   [ # 1 ]
    Joined  2009-09-20
    Total Posts:  263
    Sr. Member

    Hi - could you tell us about your background in this area? Have you read any papers in this field? If yes, don’t forget to include them in your proposal.

     Signature 

    Microsoft Applied Sciences Group Intern

    Posted: 27 March 2012 10:43 AM   [ # 2 ]
    Joined  2010-02-13
    Total Posts:  2
    New Member
    hillbilly - 26 March 2012 11:09 PM

    ...

    5. Might work outside (i.e. no sunlight interference)

    hillbilly - 26 March 2012 11:09 PM

    What is planned

    Don’t you mean “What could be achieved in the future”?
    I think you need to add which techniques/algorithms you are going to use to implement this dual-camera system. Read the related work and show that you understand it and have the potential to implement it.

    Posted: 27 March 2012 04:11 PM   [ # 3 ]
    Joined  2008-11-28
    Total Posts:  122
    Member
    Zillode - 27 March 2012 10:43 AM

    Yes, thank you for this. Curse my own overlooking of extra features. It would work outside because of the independence from infrared. And, as to the second part of your post, I completely agree. I’m going to add the algorithms here in a minute along with some of the other works I’ve read thus far.
