I had been racking my brain for a couple of weeks trying to think of a great application to make when, all of a sudden, it hit me. I was marveling over the Barcode Scanner application when the idea came to me: what if I combined the device's camera with OCR?
OCR (optical character recognition) is the process of turning pictures of text into machine-readable text, and once you have raw text, the possibilities are endless. As a simple example, imagine being able to take a picture of an address on a brochure and having it immediately displayed on a map. The same idea could also be applied to dialing phone numbers, or even translating text!
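As a concrete illustration of the phone-number idea, here is a minimal sketch (in Python, purely for illustration) that scans raw OCR output for phone-number-like strings and turns each one into a `tel:` URI that a dialer could open. The regular expression and the `phone_links` helper are my own assumptions, not part of any OCR library, and the pattern only covers a few common North American formats.

```python
import re

# Assumption: North American numbers written like "(555) 123-4567",
# "555-123-4567" or "555.123.4567". Real-world formats vary far more.
PHONE_PATTERN = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-](\d{4})\b")


def phone_links(ocr_text: str) -> list[str]:
    """Return a tel: URI (digits only) for every phone-number-like match in the OCR text."""
    return ["tel:" + "".join(match.groups()) for match in PHONE_PATTERN.finditer(ocr_text)]
```

For example, `phone_links("Call us at (555) 123-4567 today")` yields `["tel:5551234567"]`. A real app would also need to cope with OCR mix-ups such as `O` read in place of `0`.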
Another technology that could come into play with this concept is the newly added text-to-speech (TTS) feature. Now you could have things read out loud to you. Whether you can't take your eyes off the road, can't read because you forgot your glasses, or want to turn every book into a self-reading children's book, it's no problem: just take a quick picture, and the text will be read out loud.
The next step is to do layout analysis on the picture. This tells you where each piece of text sits: for example, that a given letter, word, or block of text is located at (x, y) in the picture. So now you could just hold your phone over a page of text and have every instance of the word "the" highlighted in yellow: finally, that real-life Ctrl+F (find) you've always wanted!
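Once layout analysis gives you words with positions, the "real-life Ctrl+F" reduces to a simple filter over those results. The sketch below assumes a hypothetical `Word` record (text plus a bounding box in image pixels); Ocrad's actual layout output has its own format, so treat this as an illustration of the idea rather than its API.

```python
import string
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and its bounding box in image pixels (hypothetical layout format)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def find_word(words: list[Word], target: str) -> list[Word]:
    """Return every recognized word matching target, ignoring case and surrounding punctuation."""
    wanted = target.casefold()
    return [w for w in words if w.text.strip(string.punctuation).casefold() == wanted]
```

The boxes of the returned words are exactly the rectangles you would paint yellow. Stripping punctuation means "mat." still matches a search for "mat", while "Then" does not match "the".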
Well, friends, after about two weeks of development, I finally have a working prototype! I stole about 98% of the code from the ZXing, Ocrad, and STLport (Gears's fork) projects, but hey, this is open source, and that's how we roll. Here's how it works:
- Hold your device over a document. The camera will auto-focus and send the image to Ocrad for processing.
- Once Ocrad has identified some text, it will be returned and displayed on the screen.
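The two steps above amount to one short focus, capture, recognize, display cycle. Here is a platform-neutral sketch of that cycle: `autofocus`, `capture`, `recognize` and `display` are hypothetical stand-ins for the camera's auto-focus call, the image grab, the call into Ocrad and the UI update, not real Android or Ocrad functions. It also times the recognition step so its latency can be compared against other apps.

```python
import time
from typing import Callable, Optional


def scan_once(
    autofocus: Callable[[], bool],
    capture: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    display: Callable[[str], None],
) -> Optional[float]:
    """Run one focus -> capture -> OCR -> display cycle.

    Returns the OCR time in milliseconds, or None if focusing failed
    and no recognition was attempted.
    """
    if not autofocus():
        return None  # out of focus: skip this frame and try again later
    image = capture()
    start = time.perf_counter()
    text = recognize(image)  # hypothetical hook where the native OCR engine would be called
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if text.strip():
        display(text)  # only update the screen when some text was actually found
    return elapsed_ms
```

Passing the steps in as functions keeps the cycle testable with fakes and lets the slow native call live behind a single hook.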
Surprisingly, it only took Ocrad about 200ms to process the entire image on the device, while Barcode Scanner takes about twice as long to process a 2D barcode. To be fair, I did cheat: Ocrad runs as native code (Barcode Scanner has the overhead of Java), and my app only has to look for one kind of content (Barcode Scanner tries many different barcode formats on each picture).
Next, I'm going to get it processing the layout and somehow overlaying the results on the screen. I have a feeling, though, that this is going to be a pretty big performance hit, but as long as I can keep it at least as fast as Barcode Scanner, I think it will be acceptable.
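One detail any overlay has to handle is that layout coordinates come back in camera-image pixels, while highlights are drawn in screen pixels. A minimal sketch of that mapping follows, assuming the preview fills the screen with no cropping or rotation (a real camera preview often breaks both assumptions). The `image_to_screen` name is hypothetical.

```python
def image_to_screen(
    box: tuple[int, int, int, int],
    image_size: tuple[int, int],
    screen_size: tuple[int, int],
) -> tuple[int, int, int, int]:
    """Scale an (x, y, width, height) box from camera-image pixels to screen pixels.

    Assumes the image is stretched to fill the screen with no cropping or
    rotation, so each axis is scaled independently.
    """
    x, y, w, h = box
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    sx = scr_w / img_w  # horizontal scale factor
    sy = scr_h / img_h  # vertical scale factor
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))
```

The scaling itself is cheap; the real cost of an overlay is more likely to be the extra layout analysis and redrawing on every frame.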
When I have time, I'll also try to get the code posted on CodePlex for those interested in seeing the exact details of how it was all done.