OS/2 Warp Compatible Hardware List Web site: Introduction
OS/2 Scanning Solutions Overview, advantages of TAME/2.
USB and SCSI Scanning with Tame/2: Introduction.
The one-and-only OS/2 Warp Imaging and Scanning Solution for LPT, SCSI and USB!
© 2004, Goran Ivankovic, Klaus Staedtler, os2warp.be
Last updated: 2004/01/07.3
OS/2 and hardware... For more than a decade, OS/2 has been misjudged as having
very limited hardware support. In fact, if you dig a little further, you'll
notice that OS/2 Warp and eComStation offer extensive and powerful hardware
support. Not so long ago, there was no support whatsoever for USB scanners;
only SCSI models were supported. Not only are SCSI scanners a lot more expensive,
but newer scanners ship with a USB interface only, and the parallel port
interface has disappeared.
With TAME/2, times are changing. It offers an easy-to-use GUI for your
USB and SCSI scanners, and it is still being developed, in contrast to the older
(though also very capable) drivers and scanning utilities.
In this article, we will have a brief look at all solutions available for scanning
on the OS/2 Warp platform, and we'll focus especially on TAME/2, and its advantages
compared to other solutions.
This part of os2warp.be is currently being developed. We are doing our
very best to get this section online and operational as soon as
possible. We thank you for your continued confidence in os2warp.be and hope
to see you back soon.
- General: Introduction
- How a Flatbed Scanner Works
- How Optical Character Recognition Works
- Addendum: TWAIN Technology
- OS/2 Scanning Solutions
- CFM Twain Scanner Drivers
- Solution Technology Scanner Drivers
- The Pros 'n Cons
- The Guts
- OCR on OS/2 (note: mention FineReader in the OCR review; also look into
spell check and/or second pass)
- OS/2 Image Manipulation Programs
- And what about Linux?
1. General: Introduction.
Scanners are the eyes of your personal computer. They allow a PC to convert
a photograph or drawing into code that a graphics or desktop publishing program
can use to display the image onscreen and produce the image with a graphics
printer. A scanner also can let you convert printed type into editable text
using OCR technology.
The two basic types of scanners differ primarily in the way that the page containing
the image and the scan head that reads the image move past each other. In a
sheet-fed scanner, mechanical rollers move the paper past the scan head. In
a flatbed scanner, the page is stationary behind a glass window while the head
moves past the page, similar to the way a copying machine works. In effect,
all your scanner does is take a photograph of a sheet of paper
and return it to the computer. The flatbed scanner requires a number of mirrors
to keep the image that is picked up by the moving scan head focused on the lens
that feeds the image to a bank of sensors. Because no mirror is perfect, the
image undergoes some degradation each time it is reflected. The advantage
of a flatbed scanner is that it can scan oversized or thick documents, such
as a volume of an encyclopedia. With a sheet-fed scanner, you're limited to scanning
a single, ordinary-size sheet of paper at a time. Today, you're most likely
to find only flatbed scanners in computer stores. These scanners offer acceptable
performance for home use and are available at an acceptable price. In general,
flatbed scanners come with a parallel printer port or USB interface by default.
Professional sheet-fed scanners offer better performance, but they are not widely
available and are often very expensive. These scanners are still often shipped with
a SCSI interface.
A scanner's sophistication lies in its capability to translate a continuous
range of analog voltage levels into digital values. Some scanners can distinguish
between only black and white and are useful just for text. More precise models
can differentiate shades of grey. Colour scanners use red, blue, and green filters
to detect the colours in the reflected light.
Regardless of a scanner's sensitivity to light or how the head and paper move,
the operations of all scanners are basically simple and similar. We'll take
you on a tour that is representative of the technologies involved - a flatbed
scanner. We'll also examine one of the most important reasons for scanning a
document - to convert its image into editable text by using OCR software. Finally,
we introduce you to some general theoretical concepts, and afterwards we'll discuss
the programs and drivers available for OS/2 in light of these standards.
1.1. How a Flatbed Scanner Works
Since almost all modern consumer scanners are flatbed scanners, we'll take a brief
look at how such a scanner accomplishes its task.
A light source (a strong lamp) illuminates a piece of paper placed face down
against a glass window right above the scanning mechanism. Blank or white spaces
reflect more light than inked or colored letters and images do. Based on
the amount of reflected lamp light that a sensor picks up, a digital value is
created and sent to the computer, so that the computer ends up with a raster
(matrix) of the colour values of every pixel of the entire image. A motor moves
the scan head (and thus the lamp) beneath the page, and the scan head captures
light bounced off individual areas of the page, each no larger than 1/90000 of
a square inch. The light from the page is reflected through a system of mirrors that must
continually pivot to keep the light beams aligned with a lens (which can magnify
or reduce, so that a small or large image can be enlarged or shrunk). The lens
focuses the beams of light onto light-sensitive diodes (the sensors mentioned
above) that translate the amount of light into an electrical current. The more
light that's reflected, the greater the voltage. If the scanner works with
colored images, the reflected light is directed through red, green, or blue
filters in front of separate diodes. In the last phase of the scanning process,
an ADC (Analog-to-Digital Converter) stores each analog voltage reading as
a digital pixel representing the light's intensity for a spot along a line that
contains from 300 up to 1200 pixels to the inch. Finally, this digital information
(bits and bytes) is sent to the PC via a parallel printer port, USB port or
SCSI interface. There, the data is stored in a particular file format, so that graphics
programs can open and edit the scan later.
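The two numeric ideas in this description can be sketched in a few lines of
Python. This is only an illustration, not how any particular scanner works:
the 5 volt full-scale value and the 8-bit depth are assumptions chosen for the
example, and the function names are hypothetical. The second function shows
where the figure of 1/90000 of a square inch comes from: it is the area of one
sample at 300 pixels to the inch.

```python
def quantize(voltage, v_max=5.0, bits=8):
    """Map an analog voltage in [0, v_max] to an integer code of `bits` bits.

    v_max and bits are illustrative assumptions, not values from a real scanner.
    """
    levels = 2 ** bits
    # Clamp readings that fall outside the converter's input range.
    clamped = min(max(voltage, 0.0), v_max)
    # Scale to the highest code and round to the nearest integer.
    return min(int(clamped / v_max * (levels - 1) + 0.5), levels - 1)


def spot_area_sq_inch(dpi):
    """Area in square inches covered by one sample at a given resolution."""
    return 1.0 / (dpi * dpi)


if __name__ == "__main__":
    print(quantize(0.0), quantize(2.5), quantize(5.0))
    print(round(1 / spot_area_sq_inch(300)))
```

More light means a higher voltage and therefore a higher code, so in this
sketch white paper ends up near the top of the range and black ink near zero.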
1.2. How Optical Character Recognition Works
Optical Character Recognition (OCR) is the recognition of printed or written
text characters by a computer. This involves scanning the text character by character,
analysing the scanned-in image, and then translating each character image
into a character code, such as ASCII, commonly used in data processing. In OCR
processing, the scanned-in image or bitmap is analyzed for light and dark areas
in order to identify each alphabetic letter or numeric digit. When a character
is recognized, it is converted into an ASCII code. OCR is used, for example,
by libraries to digitize and preserve their holdings. OCR is also used to process
checks and credit card slips and to sort the mail. Billions of magazines and letters
are sorted every day by OCR machines, considerably speeding up mail delivery.
There are several OCR solutions available for OS/2, and we'll take a brief look
at them later on in this article.
But how does it work? When a scanner "reads in the image of a document"
(in fact making a photograph of it), the scanner converts the dark elements
(text and graphics) on the page to a bitmap (a raster or matrix
that holds colour and intensity data for each square pixel, each "dot",
of the photograph taken). For OCR, these pixels are either on (black) or off (white):
a black and white scan is all we need, since the text we are interested in
is mostly black. Because the pixels are larger than the details
of most text, this procedure degrades the sharp edges of characters, much
as a fax machine blurs the sharpness of characters. This degradation creates
most of the problems for OCR systems.
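As a rough illustration of how a scan is reduced to on and off pixels, the
following Python sketch applies a fixed threshold to a grayscale raster. The
threshold of 128 and the function name are assumptions made for this example;
real OCR packages may choose the threshold in more elaborate ways.

```python
def to_bilevel(gray, threshold=128):
    """Convert a grayscale raster to on/off pixels.

    gray is a list of rows holding values from 0 (black) to 255 (white).
    The result uses 1 for "on" (black ink) and 0 for "off" (white paper).
    The threshold value is an assumption chosen for illustration.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    sample = [
        [250, 30, 240],
        [245, 120, 130],
    ]
    for row in to_bilevel(sample):
        print(row)
```

Pixels near the threshold, such as the values 120 and 130 in the sample, land
on different sides even though they look almost the same to the eye. This is
the kind of edge degradation described above.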
The OCR software reads in the bitmap captured via the scanner and averages out
the zones of on and off pixels on the page, in effect mapping the white space
on the page. This enables the software to block off paragraphs, columns,
headlines and graphics. The white space between lines of text within a block
defines each line's baseline, which is an essential detail for recognizing the
characters in the text. Based on these baselines, the OCR application can distinguish
the different visual parts of the document.
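One simple way to map the white space between lines is to look for rows of the
bitmap that contain no "on" pixels at all. The Python sketch below does only
that; it is a simplified stand-in for the layout analysis described above, and
the function name is hypothetical. The bottom row of each band it returns is a
crude approximation of the baseline (it ignores descenders).

```python
def find_text_lines(bitmap):
    """Return (top, bottom) row indexes for each band of rows containing ink.

    bitmap is a list of rows of 0/1 pixels. Both indexes are inclusive.
    Rows without any "on" pixel are treated as white space between lines.
    """
    lines = []
    start = None
    for y, row in enumerate(bitmap):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the row above
            start = None
    if start is not None:                  # ink runs to the last row
        lines.append((start, len(bitmap) - 1))
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
    ]
    print(find_text_lines(page))
```

The same idea applied to columns instead of rows gives a first guess at the
gaps between columns of text.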
In its first pass converting images to text, the OCR software tries to match
each character through a pixel-by-pixel comparison to character templates that
the program holds in memory (a technique known as matrix matching). These
templates can cover complete fonts, including numbers, punctuation and extended
characters. Because this technique demands
a very close match, the character attributes, such as bold and italic, must
be identical to qualify as a match. Poor quality scans can easily trip up matrix
matching. Therefore, if you want to get the full benefit of OCR, scan
your base image at a rather high resolution.
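The pixel-by-pixel comparison can be sketched as follows in Python. The tiny
3x3 templates, the 95% acceptance threshold and all names are made up for this
example; a real OCR program holds much larger templates and first scales each
character to the template size.

```python
# Hypothetical 3x3 templates; "1" is an on pixel, "0" an off pixel.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Fraction of pixels on which glyph and template agree (same size assumed)."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            if g == t:
                same += 1
    return same / total


def recognize(glyph, templates=TEMPLATES, min_score=0.95):
    """Return the best matching character, or None if no match is close enough."""
    best_char, best_score = None, 0.0
    for char, template in templates.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= min_score else None


if __name__ == "__main__":
    print(recognize(["100", "100", "111"]))   # a clean "L"
    print(recognize(["110", "100", "111"]))   # one stray pixel
```

With such a strict threshold, a single stray pixel in a nine-pixel character
is enough to reject the match, which is why poor scans trip up this first pass
and leave characters for the second pass.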
The characters that remain unrecognized go through a more intensive and time-consuming
process called feature extraction (consider this the second pass of
the OCR process). The OCR software calculates the text's x-height (the height
of a font's lowercase x) and analyses each character's combination of straight
lines, curves, and other aspects such as bowls (hollow areas within loops,
as in o or b). The OCR program knows, for example, that a character with a curved
descender below the baseline and a bowl above it is most likely a lowercase
g. As the software builds its working alphabet from each new character it encounters,
recognition speed accelerates. Because these two processes don't decipher every
character, OCR programs take different approaches to the remaining hieroglyphics.
Some OCR programs tag unrecognized characters with a distinctive character,
such as @, # or ~, and then simply stop. You must use the search capability
of a word processor to find where the distinctive character has been inserted
and correct the word manually. Some OCR programs (like ABBYY's FineReader
for Windows) also display a magnified bitmap onscreen and ask you to supply
the correct character for the placeholder. Still other OCR programs invoke a specialized
spelling checker to search for obvious errors and locate possible alternatives
for words that contain tagged unrecognized characters. For example, to OCR applications,
the number 1 and the letter l look very similar, as do 5 and S, or cl and d.
A word such as downturn might be rendered as clownturn. A
spelling checker recognizes some typical OCR errors and corrects them.
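A spelling checker of this kind can be approximated with a short Python
sketch. The confusion table contains only the three pairs named in the text,
the word list stands in for a real dictionary, and the function names are
hypothetical; no actual OCR product is claimed to work exactly this way.

```python
# (what the OCR pass produced, what was probably printed)
CONFUSIONS = [("cl", "d"), ("1", "l"), ("5", "S")]


def candidates(word):
    """All words obtained by undoing one typical OCR confusion in `word`."""
    found = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            found.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    return found


def suggest(word, dictionary):
    """Return the word itself if it is known, else known corrections of it."""
    if word in dictionary:
        return [word]
    return sorted(c for c in candidates(word) if c in dictionary)


if __name__ == "__main__":
    words = {"downturn", "clown", "hello"}
    print(suggest("clownturn", words))
    print(suggest("he1lo", words))
    print(suggest("clown", words))
```

Because "clown" is itself a valid word, it is left alone, while "clownturn" is
not in the word list and gets corrected. A misread that happens to produce
another valid word cannot be caught by this approach.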
Finally, we conclude this section by noting that most OCR programs give you
the option of saving the converted document to an ASCII file or in a file format
recognized by popular word processors or spreadsheets.
1.3. Addendum: TWAIN Technology
As an OS/2 user, you have most likely installed a version of PMView
for OS/2. From within this program you can acquire an image by scanning once
you have installed, for example, the CFM TWAIN drivers. This link between
the application and the scanner is accomplished by an implementation of
TWAIN. TWAIN is a widely used standard that is implemented in the form of a
kind of device driver that allows you to scan an image directly
into the application (such as PMView) where you want to edit the image. Without
TWAIN, you would have to close the application that was open, open a special
application to receive the image, and then move the image to the application
where you wanted to work with it. TWAIN usually comes as part of the software package
you get when you buy a scanner. It's also integrated into Photoshop and similar
image manipulation programs, like PMView. In short, TWAIN sits between
the application and the scanner driver(s).
TWAIN was developed from 1992 on by a working group of major scanner
manufacturers and scanning software developers (Adobe, Xerox, Kodak, Fujitsu,
DigiMark, AnyDoc, and some other companies) and is now an industry standard.
The name is not intended to be an acronym. According to several accounts, however,
it has been playfully expanded to Technology Without An Important Name.
2. OS/2 Scanning Solutions
A device listing of all scanners supported by the scanning solutions discussed
below can be found at www.os2warp.be.
2.1. CFM Twain Scanner Drivers
The commercial CFM Twain Scanner Drivers from CFM
(Computer für Menschen) GmbH are a classic driver suite that has been
around for several years now. With this robust driver suite, CFM offers a TWAIN-compliant
scanning interface that supports a lot of SCSI scanners. Because of its modular
TWAIN design, you can acquire scanned images from any application
that supports a TWAIN subsystem. An example of such a program is PMView, which is
officially supported. Most modern desktop publishing (like Maul
Publisher for OS/2), optical character recognition and document imaging applications
come with built-in support for TWAIN.
Unfortunately, this product was discontinued some years ago, and no updates
or bug fixes are released anymore, so the product is aging rapidly. The
product is available for both the Windows and OS/2 platforms. The version for
Windows is release 7.0, which supports more SCSI scanners
than version 5.2, which was the latest version for OS/2 Warp. However, if you
purchase CFM Twain Scanner Drivers v5.2 for OS/2 via Mensys or BMT Micro, you
also get the 2 disk set upgrade package. The product is still sold for a price
of approximately € 55,-. You can download a trial of the driver suite from
Hobbes or directly from BMT Micro. There are no disabled functions or time
limits imposed when you use an unregistered demo / trial version, but all scanned
images carry a CFM DEMO stamp. The package comes with an installer, which is
very easy and intuitive to use. There are several NLVs (National Language Versions):
Dutch, English, French, German, Italian, Japanese, Korean, Portuguese and Spanish.
Above, you can see a screenshot of the window that appears when you want to
acquire (scan) an image from PMView. As you can see, you have an easy-to-use
graphical user interface (GUI) that offers all necessary settings conveniently
in one place.
If CFM supported more scanners, this would have been a good solution. However,
support and development have been discontinued and the product has grown old.
2.2. Solution Technology Scanner Drivers
Solution Technology (STi) offers
the TWAIN for OS/2 Driver Packs. Driver Packs add scanning functionality to
any TWAIN-enabled application, just as is the case with CFM Twain. There are
three different levels of Driver Packs:
- The Consumer Driver Pack supports desktop scanners and digital cameras like
Hewlett-Packard, Epson, Fujitsu, etc.
- The Medium Speed Pack supports production scanners like Bell & Howell
and select Fujitsu series.
- The High Speed Pack supports high end production scanners as well as some
Kodak scanner series.
The drivers support black and white, gray scale, and color (not on all supported
scanners) as well as sheet feeders and transparency units to the TWAIN 1.8 specification
level. It is very clear that these drivers were developed for a business audience,
because everything is so neat and well-designed: they offer extreme robustness and
stability, and support for a lot of high-end scanners. All TWAIN for OS/2 drivers are multi-threaded
to take advantage of OS/2, maximizing data transfer throughput and minimizing
system resource hogging.
It appears that development of these products has also been discontinued. Even
though they're pricey when you buy them directly from STi, they do support
a great many older professional scanners (SCSI only). The High Speed Driver
Pack can, however, be purchased for a small amount of money as part of the eComStation
Application Pak, a product of Serenity Systems Intl.
2.3. ImpOS/2
The commercial ImpOS/2 has been around for several years now, and is somewhat of a
special case among the others. Not only does Compart
GmbH include OS/2 scanner drivers in the suite, but also a rather extensive,
powerful and easy-to-use set of image manipulation utilities. Unfortunately, no matter
how powerful the suite may be, development has been discontinued. Version 2.1
dates from April 1998, and this is still the latest. However, the package includes
drivers for a lot of SCSI scanners, and there are two diskettes with updates
available, which you also get when you purchase the product. The latest drivers
were released in April 2001, and they are available from http://www.compart.net.
Compared to the other image editing programs available, ImpOS/2 looks a bit outdated.
On OS/2, Embellish and Gimp are interesting alternatives. Something very interesting
for power users is that most functions in ImpOS/2 can be controlled via the
REXX scripting language. This may not sound very useful in an image manipulation
program, but it means that the possibilities of ImpOS/2 can be easily expanded
and combined with other programs.
The latest version comes on a CD-ROM that includes both an English and a German
version, but it would easily fit on a couple of floppy disks. Despite its
small footprint (both in terms of memory and disk space usage), ImpOS/2 is quite
flexible in its appearance. It supports a great many SCSI scanners, and
can likewise be used as an alternative to the rather expensive STi Driver Packs.
The estimated retail price of ImpOS/2 is about € 160 for the full package,
and € 95 for an upgrade.
2.4. SANE/2
In contrast to the previous solutions, SANE/2 supports LPT and USB scanners in
addition to SCSI. It is the basis of TAME/2, but the two differ.
This section is a slightly modified version of the SANE Introduction on the
www.sane-project.org web site.
SANE stands for Scanner Access Now Easy and is an application programming
interface (API) that provides standardized access to any raster image scanner
hardware (flatbed scanner, hand-held scanner, video- and still-cameras, frame-grabbers,
etc.). The SANE API is public domain and its discussion and development are open
to everybody. The current source code is written for UNIX (including GNU/Linux)
and is available under the GNU General Public License (the SANE API is available
to proprietary applications and backends as well, however). Ports to MacOS X,
OS/2 and Microsoft Windows are either already done or in progress.
SANE is a universal scanner interface. The value of such a universal interface
is that it allows writing just one driver per image acquisition device rather
than one driver for each device and application. So, if you have three applications
and four devices, traditionally you'd have had to write 12 different programs.
With SANE, this number is reduced to seven: the three applications plus the
four drivers. Of course, the savings get even bigger as more and more drivers
and/or applications are added.
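The arithmetic behind this argument is simply a product versus a sum. The
small Python sketch below, with hypothetical function names, reproduces the
numbers from the example and shows how the gap widens as more applications and
devices are added.

```python
def programs_without_common_api(applications, devices):
    """One dedicated driver for every (application, device) pair."""
    return applications * devices


def programs_with_common_api(applications, devices):
    """Each application and each driver is written once against the shared API."""
    return applications + devices


if __name__ == "__main__":
    for apps, devs in [(3, 4), (10, 50)]:
        print(apps, devs,
              programs_without_common_api(apps, devs),
              programs_with_common_api(apps, devs))
```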
Not only does SANE reduce development time and code duplication, it also raises
the level at which applications can work. As such, it will enable applications
that were previously unheard of in the UNIX world. While SANE is primarily targeted
at a UNIX environment, the standard has been carefully designed to make it possible
to implement the API on virtually any hardware or operating system.
While SANE is an acronym for Scanner Access Now Easy, the hope is of
course that SANE is indeed sane in the sense that it will allow easy implementation
of the API while accommodating all features required by today's scanner hardware
and applications. Specifically, SANE should be broad enough to accommodate devices
such as scanners, digital still and video cameras, as well as virtual devices
like image file filters.
If you're familiar with TWAIN, you may wonder why there is a need for SANE.
Simply put, TWAIN does not separate the user-interface from the driver of a
device. This, unfortunately, makes it difficult, if not impossible, to provide
network transparent access to image acquisition devices (which is useful if
you have a LAN full of machines, but scanners connected to only one or two machines;
it's obviously also useful for remote-controlled cameras and such). It also
means that any particular TWAIN driver is pretty much married to a particular
GUI API (be it Win32 or the Mac API). In contrast, SANE cleanly separates device
controls from their representation in a user-interface. As a result, SANE has
no difficulty supporting command-line driven interfaces or network-transparent
scanning. For these reasons, it is unlikely that there will ever be a SANE backend
that can talk to a TWAIN driver. The converse is no problem though: it is pretty
straightforward to access SANE devices through a TWAIN source. In summary,
if TWAIN had been just a little better designed, there would have been no reason
for SANE to exist, but things being the way they are, TWAIN simply isn't SANE.
Several front-ends for the OS/2 SANE port exist, which comes in very handy, since
SANE on its own is rather difficult to use for daily scanning. One of them is
TAME/2, discussed in a separate section later on in this article. Another one is ScanIt/2,
but this one was discontinued during 2000. The homepage of the SANE/2 Project
is located at http://home.tiscalinet.de/fbakan/sane-os2.htm.
The SANE/2 port is available for free, but it's hard to cope with its difficult
command-line options and settings.
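To give an impression of what such a command line looks like, the Python sketch
below assembles an argument list for scanimage, the command-line front-end that
is part of SANE. It only builds the list and does not run anything. The options
--device-name and --format belong to scanimage itself, whereas --resolution and
--mode are provided by the individual backend, so their availability and the
accepted mode names are assumptions that vary per scanner. The function name
and the device string in the example are hypothetical.

```python
def scanimage_command(device=None, resolution=300, mode="Gray",
                      image_format="pnm"):
    """Build (but do not run) a scanimage argument list.

    --resolution and --mode are backend-specific options; whether a given
    scanner accepts them, and with which values, is an assumption here.
    """
    args = ["scanimage"]
    if device is not None:
        # Device names have the form reported by "scanimage --list-devices".
        args += ["--device-name", device]
    args += ["--format=" + image_format,
             "--resolution", str(resolution),
             "--mode", mode]
    return args


if __name__ == "__main__":
    # "mybackend:0" is a placeholder, not a real device name.
    print(" ".join(scanimage_command("mybackend:0", resolution=600)))
```

scanimage writes the image data to standard output, so on a real command line
the result is normally redirected to a file. A graphical front-end hides all of
this behind sliders and buttons.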
2.5. TAME/2
Based on SANE/2, TAME/2 is in fact a REXX graphical user interface (GUI) that
makes the difficult SANE easy to use. Besides offering an easy-to-use and flexible
interface, TAME/2 also comes with a long list of extra goodies that make it very
interesting for most people.
Tame/2 keeps the promise of easy scanner access. You no longer have to fight with
non-mnemonic command lines to produce scans or to get the most out of your scanner;
Tame/2 offers a complete set of scanning utilities which you can install via
a simple installer.
To illustrate TAME/2's easy interface, we've provided some screenshots below.
Tame/2 is nothing more and nothing less than a bridge between your scanner, SANE,
MMOS/2 and your preferred painting, drawing or publishing application.
Beyond simply controlling the scan process, Tame/2 supports printing and faxing
from scans, uses an OS/2 "light table" to facilitate handling your
scanned images, scanning from a queue, and scanning over a network.
Tame/2 is not a drawing or picture painting tool, therefore no picture
editing capabilities are built in. For that, you'll still need to use PMView,
Embellish, or another graphics suite.
Tame/2 should work with every SANE release and with every scanner supported
by an OS/2 port of SANE. This is done using a database of the specific settings
for the scanners supported by SANE.
Some major features of Tame/2:
- Included Sane versions updated to 1.0.12
- Easy-to-use installation program
- Updated and corrected Scanner.dat
- Fixed sliders
- Fixed preview
- Network scanning configuration added to INF
- Skinned wait window with abort
- Calculation of image size
- Selectable ruler units (mm,cm,inches, pixel, point, pica)
- Reduced main GUI for 640x480
- Skinned About screen
- Activate SCSI-Bus in setup
- Printing/Faxing enabled
- Queued Scanning
- Support for scanners requiring units other than mm
- Selectable preview quality (required for slide-scanners)
- On-the-fly switching for adapters
- Resolve dependencies
- Displaying PNM in preview reestablished
- Limit printable scansize to 40MB (so prtgraph.dll doesn't crash)
- Check free diskspace
- Extended bubblehelp
- Update of objecthints.ini if WPSwizard is installed
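Two of these features, the calculation of image size and the 40MB limit on
printable scans, can be illustrated with a little arithmetic. The Python sketch
below is not Tame/2's own code (Tame/2 is written in REXX and its exact formula
is not documented here); it assumes an uncompressed raster, 72 points and 6
picas to the inch, and reads 40MB as 40 x 1024 x 1024 bytes. All names are
hypothetical.

```python
# Ruler units per inch; the point and pica values are assumptions.
UNITS_PER_INCH = {"inches": 1.0, "cm": 2.54, "mm": 25.4,
                  "point": 72.0, "pica": 6.0}

# Assumed reading of the "40MB" print limit.
PRINT_LIMIT_BYTES = 40 * 1024 * 1024


def to_pixels(length, unit, dpi):
    """Convert a length in a ruler unit to a whole number of pixels."""
    if unit == "pixel":
        return int(round(length))
    return int(round(length / UNITS_PER_INCH[unit] * dpi))


def raw_size_bytes(width, height, unit, dpi, bits_per_pixel=24):
    """Size of the uncompressed image; rows are padded to whole bytes."""
    w = to_pixels(width, unit, dpi)
    h = to_pixels(height, unit, dpi)
    bytes_per_row = (w * bits_per_pixel + 7) // 8
    return bytes_per_row * h


def fits_print_limit(width, height, unit, dpi, bits_per_pixel=24):
    """True if the raw image stays within the assumed print limit."""
    size = raw_size_bytes(width, height, unit, dpi, bits_per_pixel)
    return size <= PRINT_LIMIT_BYTES


if __name__ == "__main__":
    for dpi in (300, 600):
        size = raw_size_bytes(210, 297, "mm", dpi)   # an A4 page in colour
        print(dpi, size, fits_print_limit(210, 297, "mm", dpi))
```

Under these assumptions, a full A4 colour page stays below the limit at 300
pixels to the inch but exceeds it at 600, because doubling the resolution
quadruples the amount of data.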
And what about Linux?
At this moment, only one of these scanning solutions is still actively
developed: TAME/2. Not only does the suite offer an easy-to-use graphical
user interface (GUI), it also offers considerably more interesting and powerful
features than SANE/2 alone. Though the other solutions still work fine, they are no longer
being developed, and these commercial solutions are aging rapidly. All in all,
TAME/2 is a must if you want to use modern scanners on OS/2 Warp or eComStation.