Optical Character Recognition (OCR) is a technology that enables the digitization of scanned images with printed or handwritten text into machine-readable data that can later be used for electronic editing. Image sources fed to OCR software include image-only PDFs, scanned documents, handwritten manuscripts or camera images, among others.
Applications of OCR technology are wide and varied and include automatic data entry, passport recognition in airports, digitizing old newspapers, automatic number-plate recognition and assistive technology for the visually impaired. Digitizing text with OCR has clear advantages: it saves a great deal of storage space by converting paper documents into electronic documents; it makes printed texts far easier to search; it lets a document be revised in a standard word processor once the text has been converted into machine-encoded form; and it allows printed text (e.g., legal paperwork or newspapers) to be backed up digitally more often and more securely than documents kept in printed form.
Squish includes OCR as a complement to its already powerful Object-based and Image-based recognition methods. Variability in a component's visual appearance is particularly prominent for onscreen text when trying to create platform-independent tests, due to a wide assortment of fonts, font sizes, decorations and rendering modes. Thus, Image-based recognition methods, including Fuzzy Image Search, are generally unsuitable for locating text onscreen. OCR allows for efficient text handling in those scenarios where the same text is rendered with different parameters, making it look largely dissimilar in a pixel-to-pixel comparison (for example, due to varying letter widths, different kerning or shifting line break positions).
Squish uses the free Tesseract OCR library as its primary engine to facilitate text recognition. In order to use the Tesseract OCR engine, the package, including all of the language files, needs to be installed independently of Squish. Any other OCR engine can potentially be substituted for use with Squish.
Tesseract for Squish is supplied as a single, easy-to-install binary package that contains the engine libraries and the full set of language files. The packages for all supported platforms can be found in the download portal.
Please download the Tesseract for Squish package for your operating system from your customer area and run it.
![]() | On Linux |
---|---|
On Linux, you first need to make the
|
The installation program will guide you through the configuration process by presenting multiple pages.
![]() | Changing Configuration Settings |
---|---|
Once you start the installer, you can go back to change a configuration setting using the Back button and proceed to the following pages using the Next button. |
This step determines the location on your system in which Tesseract for Squish will be installed.
After selecting the installation folder, you will be presented with the license under which you are permitted to use your copy of Tesseract for Squish. Please read the entire license text carefully before proceeding. Select one of the two radio buttons that appear below the license text to indicate whether you agree or disagree with the terms. If you disagree, then you cannot install or use Tesseract for Squish. To terminate the installation, close the installer.
If you accept the license, the Next button will become enabled, and you can proceed to the next step of the configuration process.
In order to use Tesseract with Squish, its installation path needs to be registered with Squish. The Tesseract for Squish package installer will perform the registration during the installation if the Register the Tesseract installation with Squish option is selected.
![]() | Note |
---|---|
If you choose not to register the Tesseract installation with Squish, you can do it at a later time by entering the chosen installation path on the Squish IDE OCR Preferences pane or by editing the ocr.ini (Section 7.6.1.2) file manually. |
At this point all the configuration options have been set and the installation is ready to launch. A page is shown which displays the disk space required by the Tesseract for Squish installation.
The installation program now commences installing Tesseract for Squish on your system. You can click the Show Details button to get a detailed list of actions performed as part of the installation.
You can close the installer at any time, e.g. by closing the window or by pressing the Cancel button (only visible on platforms other than macOS). All changes made so far will be rolled back.
It is possible to perform the installation of Tesseract for Squish completely unattended, passing any required values up front. Unattended installation requires no user interactions whatsoever and is equivalent to manually interacting with the installer interface. To perform an unattended installation, invoke the Tesseract for Squish installation program from the command line, passing at least the argument unattended=1:
$ ./tesseract-4.0.0-for-squish.x64.run unattended=1 <more options...>
That argument will launch the installation without any graphical user interface. Instead, progress information and potential error messages are written to the console.
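Because an unattended run reports only to the console, it can be convenient to keep a copy of that output. The following is a minimal sketch, not part of the Tesseract for Squish package: the helper name run_logged and the log file locations are hypothetical, and it assumes that the installer signals failure through a non-zero exit status, which is not confirmed here.

```shell
#!/bin/sh
# Sketch: run a command, save everything it prints, then replay it.
# run_logged is a hypothetical helper name chosen for this example.
run_logged() {
    log_file=$1
    shift
    # Send stderr to the same file as stdout so that error messages
    # and progress information end up in one log.
    "$@" > "$log_file" 2>&1
    status=$?
    # Replay the captured output on the console.
    cat "$log_file"
    # Hand the command's own exit status back to the caller.
    return "$status"
}

# Stand-in demonstration that does not need the installer:
run_logged "${TMPDIR:-/tmp}/run-logged-demo.log" echo "stand-in for the installer run"

# With the real installer, the call might look like this (not executed here):
#   run_logged install.log ./tesseract-4.0.0-for-squish.x64.run unattended=1
```

The output is written to the file first and replayed afterwards, so progress is not shown live. This keeps the exit status of the command intact in a plain POSIX shell, where a pipeline through tee would report the status of tee instead.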
In addition to the unattended=1 argument, you may want to pass the targetdir=<PATH> argument to specify the target installation directory, or register=0 to disable the automatic registration of the engine with Squish:
$ ./tesseract-4.0.0-for-squish.x64.run unattended=1 targetdir=/opt/tesseract register=0
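When the unattended installation is scripted, for instance on several test machines, the argument list can be composed in one place. The sketch below only assembles and prints the command line from the three arguments described above; the function name build_install_args and the INSTALLER variable are hypothetical names used for illustration, and nothing is installed when it runs.

```shell
#!/bin/sh
# Sketch: compose the arguments for an unattended installation.
# build_install_args is a hypothetical helper, not part of the installer.
#   $1: target installation directory (optional)
#   $2: value for register, e.g. 0 to skip registration (optional)
build_install_args() {
    # unattended=1 is always required for an unattended run.
    args="unattended=1"
    if [ -n "$1" ]; then
        args="$args targetdir=$1"
    fi
    if [ -n "$2" ]; then
        args="$args register=$2"
    fi
    printf '%s\n' "$args"
}

# Dry run: print the command line instead of executing it.
INSTALLER=./tesseract-4.0.0-for-squish.x64.run
echo "$INSTALLER $(build_install_args /opt/tesseract 0)"
```

The helper returns a single space-separated string, so it is only suitable for target directories without spaces; a path containing spaces would need to be passed to the installer as its own quoted argument.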