Book Scanning: The challenge of digitizing printed books

Introduction

Books are treasure troves of knowledge. As we have moved into the digital age, reading online has become the norm. This leaves a huge number of pre-digital-age books, in libraries and especially in personal collections, unavailable to a wider audience. Digitized books can also be analyzed to produce very useful metadata.
For people whose visual impairment limits their ability to read, digitized books can be converted to speech.

However, commercial book scanners are expensive, so a DIY scanner is an attractive alternative.

While building one, we can also learn a great deal about imagers, imaging software, lighting and colour rendition, mechanics, and more.

There is also plenty of prior art to draw on and adapt to our local environment and needs.

A very good site is
http://diybookscanner.org/archivist/index.html

Desirable design criteria

Here are some design criteria and the rationale behind them.

  1. Scan quality

    • Resolution - 600 dpi or higher. With a suitable camera mount, one can achieve good resolution commensurate with one’s budget. Since the text will be extracted with OCR (optical character recognition), the higher the resolution, the better.

    • Dynamic range - 24-bit color. The overhead camera mount should allow a suitable camera to be used. Full color depth is needed mainly to render illustrations; OCR, however, works best on monochrome images.

    • Book curvature distortion - use a suitable platen to flatten the page, and correct any remaining curvature in software.

  2. Size

    • The vast majority of printed books are smaller than Quarto size: 9 1⁄2" × 12" (240mm × 305mm)

    • Thickness - up to 3" (76.2mm), approximately 800 pages of 75gsm paper

    • Should be able to accommodate spiral-bound books

  3. Build complexity and cost

    • Build complexity and scanning speed go hand in hand. Simple builds require one to lift the book off the platen to turn each page, and capture only one page at a time. The most complex builds turn pages automatically and capture both facing pages.

    • Cost rises with complexity. A more complex build needs more components, greater precision, and reasonable skill to put together.
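
To get a feel for what "600 dpi or higher" asks of a camera, here is a rough sketch of the arithmetic for a Quarto-size page. It assumes the page exactly fills the frame with no margin and that the sensor has a 3:2 aspect ratio, as on most DSLR and mirrorless cameras; the function names are illustrative, not taken from any particular project.

```python
import math


def page_pixels(page_w_in: float, page_h_in: float, dpi: int) -> tuple[int, int]:
    """Pixels needed along each edge of the page to reach the target dpi."""
    return math.ceil(page_w_in * dpi), math.ceil(page_h_in * dpi)


def sensor_megapixels(page_w_in: float, page_h_in: float, dpi: int,
                      sensor_aspect: float = 3 / 2) -> float:
    """Smallest sensor (in megapixels) that still reaches `dpi` on the page.

    Assumes the page's long edge lies along the sensor's long side and the
    page exactly fills the frame; real setups need a margin, hence more pixels.
    """
    w_px, h_px = page_pixels(page_w_in, page_h_in, dpi)
    long_px, short_px = max(w_px, h_px), min(w_px, h_px)
    # The sensor's long side must cover the page's long edge, and its short
    # side (long side / aspect) must cover the page's short edge.
    sensor_long = max(long_px, math.ceil(short_px * sensor_aspect))
    sensor_short = math.ceil(sensor_long / sensor_aspect)
    return sensor_long * sensor_short / 1e6


w, h = page_pixels(9.5, 12, 600)
print(w, h, w * h / 1e6)                          # 5700 7200 41.04
print(round(sensor_megapixels(9.5, 12, 600), 1))  # 48.7
```

By the same arithmetic, a 6000 × 4000 (24 MP) sensor covering a full Quarto page would give only about 420 dpi across its 9 1⁄2" width; smaller books fare correspondingly better.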

Daniel Reetz, a visual neuroscientist, has built a book scanner through several iterations. He and others have documented their designs on http://diybookscanner.org/. We will use these designs as the basis for ours.

In a discussion some years ago, the idea of crowdsourcing came up: several people, anywhere, would each use their own phones to capture part of the contents of a given title.
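
One simple way to divide a title among volunteers is sketched below, assuming pages are numbered 1 to N and each person photographs one contiguous block. The function `split_pages` and the volunteer names are hypothetical; contiguous blocks keep each person's captures in page order, which should make reassembly easier.

```python
def split_pages(total_pages: int, volunteers: list[str]) -> dict[str, range]:
    """Assign pages 1..total_pages to volunteers in contiguous, near-equal blocks."""
    if not volunteers:
        raise ValueError("need at least one volunteer")
    base, extra = divmod(total_pages, len(volunteers))
    assignments = {}
    start = 1
    for i, name in enumerate(volunteers):
        # The first `extra` volunteers each take one additional page.
        count = base + (1 if i < extra else 0)
        assignments[name] = range(start, start + count)
        start += count
    return assignments


for name, pages in split_pages(250, ["asha", "ravi", "meena"]).items():
    print(name, pages.start, pages.stop - 1)
# asha 1 84
# ravi 85 167
# meena 168 250
```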

This approach has its own challenges, whose solutions lie largely in software rather than hardware. For instance, if colour illustrations are involved, images captured on different phones need some post-capture harmonisation. This could be handled by the shared app, which would call a server-based tool with presets for the particular title.
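
As an illustration of what "harmonisation" can mean at its simplest, the sketch below applies grey-world white balance, a standard technique that scales each colour channel so the channel averages become equal. It is a stand-in, not the per-title presets envisioned above, and its assumption that a page averages out to neutral grey is crude, though often serviceable for dark text on light paper. Pixels are plain `(R, G, B)` tuples to keep it self-contained; real images would more likely go through Pillow or NumPy.

```python
def grey_world_balance(pixels: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Grey-world white balance: scale R, G and B so their means become equal."""
    if not pixels:
        return []
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    grey = sum(means) / 3
    # A channel with zero mean carries no information, so leave it unscaled.
    gains = [grey / m if m else 1.0 for m in means]
    # Scale each channel, round to an integer and clip at 8-bit white.
    return [
        tuple(min(255, round(p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]


# A warm (yellowish) cast, as one phone under household lighting might capture it.
print(grey_world_balance([(200, 180, 160), (100, 90, 80)]))
# [(180, 180, 180), (90, 90, 90)]
```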

Then there is OCR, with its inevitable errors. Here too, the app should enable and encourage multiple users to share the work of fixing all reported errors, keeping each person's workload small.
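
A minimal sketch of how such an app might reconcile corrections when several people fix the same page: a per-line majority vote, with lines lacking a clear majority flagged for another look. This is one possible design, assumed for illustration and not a description of any existing app; it presumes the versions are already aligned line by line, which real OCR output would need extra work to guarantee.

```python
from collections import Counter


def merge_corrections(transcriptions: list[list[str]]) -> tuple[list[str], list[int]]:
    """Merge several users' corrected versions of a page by majority vote per line.

    Returns the merged lines and the indices of lines that lack a strict
    majority, so they can be flagged for another reviewer.
    """
    if not transcriptions:
        return [], []
    n_lines = len(transcriptions[0])
    if any(len(t) != n_lines for t in transcriptions):
        raise ValueError("all versions must have the same number of lines")
    merged, disputed = [], []
    for i in range(n_lines):
        # most_common breaks ties in favour of the version seen first.
        text, votes = Counter(t[i] for t in transcriptions).most_common(1)[0]
        merged.append(text)
        if votes * 2 <= len(transcriptions):
            disputed.append(i)
    return merged, disputed


versions = [
    ["The quick brown fox", "jumps over the lazy dog."],
    ["The quick brown fox", "jumps ovcr the lazy dog."],
    ["The qu1ck brown fox", "jumps over the lazy dog."],
]
print(merge_corrections(versions))
# (['The quick brown fox', 'jumps over the lazy dog.'], [])
```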

Since then, digital technology has moved ahead rapidly. Today a number of smartphone apps, on both iOS and Android, support scanning with OCR, and some are remarkably good at it.

This makes the collaborative ‘finishing’ of texts for readability considerably easier, and is all the more reason to get more people involved and actually kick it off.

Perhaps this is a good subject for a chatShaala or a Project. @Nagarjuna @Ashish_Pardeshi, let’s get one going. Maybe a book per week?