The challenge of digitizing printed books - Book Scanning

jtd · June 17, 2019, 5:42pm

Books are treasure troves of knowledge. As we have moved into the digital age, reading online has become the norm. This leaves a huge number of pre-digital-age books, in libraries and especially in personal collections, unavailable to a larger audience. Digitizing books also makes it possible to analyse them and produce very useful metadata.

However, book scanners are expensive, so a DIY scanner would be ideal.

We can simultaneously learn a whole lot about imagers, imaging software, lighting and colours, rendition, mechanics etc.

We also have plenty of prior art to fall back on and adapt to our local environments and needs.

A very good site is
http://diybookscanner.org/archivist/index.html

GN · June 18, 2019, 4:39am

Thanks for initiating this topic. I think it will make a wonderful project for the tinkering participants. Would you like to start writing about the project, what it takes to complete it, etc.?

Let us do this. There are a number of ancient libraries in India holding books (most of which are completely out of print) that need a scanner to be digitized. The scanners that we produce can be used for this kind of project.

jtd · June 18, 2019, 7:20pm

Yes.
I will draw up a spec list, which will also become the challenge list.

yatheendra.m · May 30, 2020, 9:14pm

I am wondering why you need dedicated hardware.

A plain copier-style scanner could be a good enough starting point, given the existence of tesseract-ocr (which also supports a few Indian languages). In the interests of modularity, the scanned images could be stored in the CCITT Group 4 (G4) format used by fax machines, both for reference and for processing with OCR software.
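One practical detail: G4 is a bilevel (1 bit per pixel) format, so a grayscale scan has to be reduced to black and white before it can be stored that way. Here is a minimal sketch of that step in plain Python. It uses Otsu's method to pick the threshold, which is my own choice for illustration and not something settled in this thread; the function names and the tiny sample "page" are hypothetical. A real pipeline would use an imaging library rather than nested lists.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their grey values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue          # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break             # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray_rows):
    """Map a grayscale page (rows of 0-255 ints) to 1 = paper, 0 = ink."""
    flat = [p for row in gray_rows for p in row]
    t = otsu_threshold(flat)
    return [[1 if p > t else 0 for p in row] for row in gray_rows]


# Hypothetical 3x4 "scan": light paper (200-220) with a dark stroke (30-40).
page = [
    [220, 220, 30, 220],
    [220, 40, 40, 220],
    [200, 220, 30, 200],
]
bilevel = binarize(page)
```

The resulting rows of 0s and 1s are what a G4 encoder (or an OCR engine that expects clean black-on-white input) would consume. Uneven lighting across a page usually calls for a local, adaptive threshold instead of a single global one.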

jtd · May 31, 2020, 5:32am

A flatbed scanner distorts the spine region of a book.
The book has to be placed face down, making the process a lot harder.
Many simple DIY scanners also scan face down, but with those you have the advantage of seeing the image on your screen.
Lastly, flatbed scanners are line scanners, hence far slower than area scanners.

yatheendra.m · May 31, 2020, 1:20pm

Scanning is likely to be a transitional activity for digitizing existing printed content. It makes sense to take advantage of the demographic dividend and get it done as visual transcription jobs, with crowd-sourcing to vet the entered text.
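As a rough illustration of what the vetting step could look like, here is a sketch that compares two independent transcriptions of the same page and flags only the lines where they disagree, so a reviewer looks at those instead of the whole page. It uses Python's standard `difflib`; the function name and the sample text are hypothetical, and double entry is just one possible vetting scheme.

```python
import difflib


def disagreements(entry_a, entry_b):
    """Return (line_number_in_a, lines_from_a, lines_from_b) for each
    stretch where two transcriptions of the same page differ."""
    lines_a = entry_a.splitlines()
    lines_b = entry_b.splitlines()
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)

    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Line numbers are reported 1-based for human reviewers.
            flagged.append((i1 + 1, lines_a[i1:i2], lines_b[j1:j2]))
    return flagged


# Two hypothetical typists entering the same three-line page.
typist_1 = "Books are treasure troves\nof knowledge.\nThe end"
typist_2 = "Books are treasure troves\nof knowlege.\nThe end"

for line_no, a, b in disagreements(typist_1, typist_2):
    print(f"line {line_no}: {a} vs {b}")
```

Lines that both typists entered identically are accepted as-is; only the flagged ones go to a third person or back to the page image.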

If we are thinking post-apocalyptic, digitization isn’t enough in the long term. Maybe some physical but readable archival media (say, microfilm on metal?) are required.

yatheendra.m · August 15, 2020, 6:46am

Assuming flatbed scanners, is book spine distortion the only problem?
Maybe something like a “flattened prism” placed on the scanning area, with the book spine placed on the prism’s edge, would be enough to scan one page properly at a time?

ravi312 · February 13, 2021, 7:44am

Found this interesting image of Studio CAMP

Image: CAMP studio (handmade book scanner, optical scanner, four computers, NVR recorder, joystick, microphones, salad box, water, biscuits, coffee on the folding table. Bookshelves made of paper rolls, books, routers, awards, air conditioning and fan above. Inventory of electronics + museum of Jurassic technology below the tables, flooring replaced from wear. Some persons on a break, a person taking the picture.)

GN · February 17, 2021, 3:00am

Taking a cue from this, we can use a sheet of glass and an appropriately positioned mirror instead of a prism. I am assuming that a prism will be more expensive than a mirror+glass combination. Placing a thin sheet of transparent glass on the page can also help flatten it nicely.