Advanced Indexing: OCR Assisted Indexing

Overview

 

To turn on OCR assisted indexing, check the "Enable OCR indexing in Index module" checkbox. When it is enabled, users can select an index field and then hover the mouse over the relevant text or numbers, which are highlighted in blue. A left click populates the selected index field with that data.

 

 

Settings Tab

  • Documents to OCR - Select which processed documents to OCR in the Index module. The options are:
    • All Documents
    • Unclassified Documents Only
    • Only Documents with no Triggered Zone Profile
  • Pages to OCR - Select which pages of the processed documents to OCR in the Index module. The options are:
    • First Page of Document Only
    • All Pages of Document (Default)
    • Custom Page List of Document - When selected, a box for entering the custom page(s) appears. NOTE: Enter pages and page ranges separated by commas (e.g. 1, 3, 5-7, 9-END).
  • Zone to OCR - Choose whether to OCR the entire page or only a user-defined zone.
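
The custom page list format can be illustrated with a short Python sketch. This is not PSIcapture's own parser; the function name parse_page_list and the page_count parameter are hypothetical, and the handling of pages beyond the end of the document (silently dropped here) is an assumption.

```python
def parse_page_list(spec: str, page_count: int) -> list[int]:
    """Expand a page list such as "1, 3, 5-7, 9-END" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip().upper()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # END stands for the last page of the document.
            end = page_count if end_text.strip() == "END" else int(end_text)
        else:
            start = end = int(part)
        # Assumption: pages that do not exist in the document are dropped.
        pages.update(p for p in range(start, end + 1) if 1 <= p <= page_count)
    return sorted(pages)


print(parse_page_list("1, 3, 5-7, 9-END", page_count=12))
# → [1, 3, 5, 6, 7, 9, 10, 11, 12]
```

For a 12-page document, the example list selects pages 1, 3, 5 through 7, and 9 through 12.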


 

Define Zones

 

 

Zone Configuration Menu

The icons in the ribbon menu of the Zone Configuration screen allow the user to:

 

  Save the zone settings created.
  Load a template image via Windows Explorer.
  Capture a template image via capture device.
  Rerun any auto zone creation if edits were made.

 

Zone Configuration - Viewer Pane

 

 

Viewer Controls

The viewer window toolbar at the top of this window has the following tools:

 

  Pointer - A pointer tool allowing the user to navigate the screen is selected by default.
  Draw Zone (CTRL+1) - Manually draw a zone on the template.
  Draw OMR Zone (CTRL+2) - Manually draw an OMR zone on the template.
  Draw OCR Smart Zone (CTRL+3) - Manually draw an OCR smart zone on the template.
  Draw OMR Smart Zone (CTRL+4) - Manually draw an OMR smart zone on the template.
  Precision OMR Zone (CTRL+5) - Manually draw a Precision OMR zone on the template.
  Select from the list to create an auto zone on the template. Example: Auto create a 1/4 page zone or 1/2 page zone.
  Create a copy of a selected zone on the template.
  Delete any selected zones on the template.
  Group multiple zones for OMR purposes.
  Select to display a single selected zone or all zones on the template.
  Select an area of the template to zoom in on.
  Zoom in / out on the template.
  Rotate the template left / right in 90 degree increments.

 

Zones Name 

Once a zone has been drawn using the tools above, the next step is to name the zone and select the page of the document on which it is found. The name usually describes the highlighted area. For example, a zone drawn around the mailing address would be named "mailing address".

Zone names are kept in a list for use anywhere in the program where the user can define zones. NOTE: The template page and its resolution are displayed at the bottom of the screen, and they MUST match the page and resolution at capture time.

 

 

Zones Buttons

Tools available in the zone naming area are:

 

  Delete the highlighted zone. NOTE: Users can also press the Delete key on the keyboard to delete the selected zone.
  Zone preview allows the user to see a preview of what PSIcapture would see during indexing for any selected zone.
  Ungroup a cluster of child zones contained within the selected zone (child zones are used for OMR purposes).
  Open additional OCR options, such as enabling OCR logical context filtering or OCR Trigram mode.

 


 

OCR Indexing Field Options

The index fields configured earlier in Capture Profile - Index Data Fields are listed here.

 

 

Auto Highlighting

 

 

OCR text is highlighted when the appropriate index field is in focus. Regular expressions can be used to select which types of text strings are highlighted. Click “Add” or “Edit” to bring up the Regular Expression Editor.

Type a new expression in the space provided. Alternatively, type the phrase or series of numbers to be matched and click "Generate" to have the expression generated automatically. A message in red denotes an invalid expression; a message in green denotes a valid one. Click "Select" to view commonly used expressions in the Regular Expressions Manager.

Other options available in the auto highlighting area include:

  • Highlight Words in Zone - Highlights the OCR result words for user review without populating the index field. Users can select the entire page as a zone or define a specific zone to highlight.
  • Highlight Expression Processing - Highlights an entire word, only matching words, or only matching words with a custom format. The custom format applies to word matches; click the "Help" button to the right of the Custom box to see its options.
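
As a rough illustration of expression-based highlighting, the Python sketch below picks out the OCR words that match a regular expression. It is only an approximation of the idea: the word list, the patterns, and the function name words_to_highlight are all hypothetical, and the regular expression syntax accepted by the Regular Expression Editor may differ from Python's.

```python
import re

# Hypothetical OCR output: one string per recognized word.
ocr_words = ["Invoice", "INV-10442", "dated", "03/15/2024", "Total:", "$1,250.00"]

# Example expressions for two index fields (assumptions, not built-in expressions).
invoice_pattern = re.compile(r"INV-\d{5}")
date_pattern = re.compile(r"\d{2}/\d{2}/\d{4}")


def words_to_highlight(words, pattern):
    """Return the words that contain a match for the expression."""
    return [word for word in words if pattern.search(word)]


print(words_to_highlight(ocr_words, invoice_pattern))  # → ['INV-10442']
print(words_to_highlight(ocr_words, date_pattern))     # → ['03/15/2024']
```

With an expression like these attached to a field, only the candidate values for that field stand out on the page when the field is in focus.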

Character Filtering

 

 

OCR results can be filtered to facilitate indexing. The options available for character filtering are listed below. NOTE: The "Extended Characters" dialog is only available with the All Characters or Custom filter.

  • All Characters
  • Alpha Only (a-z, A-Z)
  • Numeric Only (0-9)
  • Numeric Extended (0-9, $%#+-.,)
  • Date (0-9, /-)
  • Extended Characters Only
  • Standard Printable Characters
  • Custom - The user may create custom filter expressions using regular expressions. NOTE: This is an inverse expression, i.e. one that removes what the user does not want rather than one that selects what they do want.
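
The inverse nature of a Custom filter can be sketched in Python as follows. The helper name apply_inverse_filter and the sample text are hypothetical, and Python's re module stands in for whatever expression engine the product uses; the point is only that the expression describes the characters to remove.

```python
import re


def apply_inverse_filter(text: str, remove_pattern: str) -> str:
    """Delete every character sequence that matches remove_pattern."""
    return re.sub(remove_pattern, "", text)


# To keep only digits and dashes, the expression matches everything else.
print(apply_inverse_filter("Acct# 12-3456 (A)", r"[^0-9\-]"))  # → 12-3456
```

Note that the character class is negated: to keep digits and dashes, the expression matches (and removes) anything that is not a digit or a dash.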

 

Validation

 

 

Validation controls the action taken when the OCR results contain an invalid character. The available options are described below.

  • Do Not Correct - Leaves the invalid character unchanged, as in the raw OCR results.
  • Remove - Removes the invalid character.
  • Auto Correct - PSIcapture attempts to correct the character based on the character set selected in Character Filtering. For example, if the OCR engine returns the letter O and the Character Filtering and Extended Characters settings expect 0-9 and a, b, or c, the letter is replaced with a zero (0) in the field.
  • Replace with an invalid character marker - Marks every invalid character with a character of the user's choice. NOTE: Replacing invalid characters with a character that is invalid for the field's data type will cause either no data to be returned or errors to occur.
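
The four invalid character actions can be compared side by side in a small Python sketch. This is an assumption-laden model, not PSIcapture's implementation: the action names, the validate function, and especially the LOOKALIKES table used for auto correction are hypothetical.

```python
# Hypothetical lookalike table; the substitutions the product actually makes are not documented here.
LOOKALIKES = {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}


def validate(text, allowed, action, marker="?"):
    """Apply one invalid character action to text, given the set of allowed characters."""
    result = []
    for ch in text:
        if ch in allowed:
            result.append(ch)
        elif action == "do_not_correct":
            result.append(ch)  # keep the raw OCR character
        elif action == "remove":
            continue  # drop the invalid character
        elif action == "auto_correct":
            # Substitute a lookalike only when the substitute is itself allowed.
            substitute = LOOKALIKES.get(ch)
            result.append(substitute if substitute in allowed else ch)
        elif action == "replace":
            result.append(marker)
        else:
            raise ValueError(f"unknown action: {action}")
    return "".join(result)


allowed = set("0123456789abc")  # 0-9 plus extended characters a, b, c
for action in ("do_not_correct", "remove", "auto_correct", "replace"):
    print(action, validate("1O4b", allowed, action))
```

For the raw OCR result "1O4b", the four actions yield "1O4b", "14b", "104b", and "1?4b" respectively.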

 

Customize Auto Highlighting Colors

The Customize tab allows the user to choose the highlighting colors for Auto Highlighting.

 

 


 

Customize

 

 

Customize how the tool tip floating dialog box will appear during operation.

 

Tool Tip Title

  • Font - choose one of the system fonts
  • Font Size - choose the desired font size
  • Font Color - choose the desired font color
  • Background Color - choose the desired background color

Tool Tip Text

  • Font - choose one of the system fonts
  • Font Size - choose the desired font size
  • Font Color - choose the desired font color
  • Background Color - choose the desired background color

 

Tool Tip Preview

 

 


 

Advanced Options

 

 

The Advanced Options available are listed below.

  • Word Order - Choose between Selection Order and Page Order. This affects how multiple words are compiled into one field when the user holds down the left mouse button and drags across them.
  • Line Tolerance - Set the distance, in inches, within which words are considered to be on the same line (a lower number is less tolerant).
  • Shift Key Appends Data - When enabled, holding down the ‘Shift’ key while clicking on the left mouse button will append data to the existing data in the field.
  • Enable preparation for OCR assisted indexing during automation - When enabled, this option allows auto-indexing steps to prepare documents for OAI indexing in order to allow for faster OCR preparation for profiles with more than one Indexing phase.
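
One way to picture how Word Order and Line Tolerance interact is the Python sketch below. It is an interpretation rather than the product's algorithm: the Word class, the coordinate convention, and the grouping rule (a word joins the current line when its vertical position is within the tolerance of the first word on that line) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, in inches from the left of the page
    y: float  # vertical position, in inches from the top of the page


def page_order(words, line_tolerance):
    """Sort words top to bottom by line, then left to right within each line."""
    lines = []  # each entry is the list of words on one line
    for word in sorted(words, key=lambda w: w.y):
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [w for line in lines for w in sorted(line, key=lambda w: w.x)]


def compile_field(words, order="selection", line_tolerance=0.1):
    """Join the selected words into one field value."""
    ordered = words if order == "selection" else page_order(words, line_tolerance)
    return " ".join(w.text for w in ordered)


# Words in the order the user swept over them.
selected = [Word("Springfield", 3.0, 2.02), Word("123", 1.0, 2.00), Word("Main", 1.5, 2.03)]
print(compile_field(selected, "selection"))           # → Springfield 123 Main
print(compile_field(selected, "page", 0.1))           # → 123 Main Springfield
print(compile_field(selected, "page", 0.005))         # → 123 Springfield Main
```

With a tolerance of 0.1 inch the three words count as one line and are read left to right. With a much lower tolerance, slight vertical jitter splits them into separate lines, which changes the compiled value.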

 

Preview

After configuring all the desired settings, select the "Preview" button in the upper-right corner to see the likely results.

 


Preview Toolbar

 

  Select the preview image via file folder.
  Capture the preview image via capture device.
  Perform OCR on the loaded page.
  Zoom in / out on the preview image.

 

For more information on the user experience during an indexing step, see:

PSIcapture User Guide: Index

 
