ATI FW: [VICUG-L] Optical character recognition (OCR) in Google Docs
paltschul at centurytel.net
Wed Jun 23 13:46:19 CDT 2010
Optical character recognition (OCR) in Google Docs.
Tuesday, June 22, 2010.
A couple of months ago, my co-worker, Mike, showed up at my desk with a pile of
paper, each of the yellowed sheets densely covered with an ancient-looking
typewriter font. His wife had recently discovered parts of her family chronicles
in the attic, typed up by her grandmother many years ago! Now he was wondering
if there was a way for her to continue writing the chronicles in Google Docs.
The papers sat on my desk for a while, but recently, I returned them to Mike
with a smile, cheerfully telling him that what started as my 20% project is
ready for everyone to use - Google Docs now officially supports importing
scanned documents. What we launched as an experimental feature for the Documents
List Data API last year is now available on the upload page: check the "Convert
text from PDF or image files to Google Docs documents", upload your scanned
images (JPEG, GIF, PNG) or PDFs, and Google Docs will extract text and
formatting from the scans for you to edit away.
For the technically curious: we're using Optical Character Recognition (OCR)
that our friends from Google Books helped us set up. OCR works best with
high-resolution images, and not all formatting may be preserved. The original
images will be included in the new document to make it easier for you to correct
mistakes. Supported languages include English, French, Italian, German and
Spanish, with more languages and character sets on their way. We're looking
forward to getting feedback from you while we keep improving the feature over
time.
And Mike's scanned family chronicles have even been extended by an additional
chapter in Google Docs: his wife recently had a baby boy named James!
Posted by: Jaron Schaeffer, Software Engineer, Google Docs.
VICUG-L is the Visually Impaired Computer User Group List.
Signoff: vicug-l-unsubscribe-request at listserv.icors.org
Subscribe: vicug-l-subscribe-request at listserv.icors.org