Indexing: OCR Smart Zone Configuration

Version 7.9.x

 Note

This article includes advanced Administrator areas for PSIcapture.

 Audience

This article is meant for PSIcapture Administrators.

 

Overview

A Smart Zone uses OCR to search for information relative to a common anchor point across documents of varying format. For example, users may have a mixture of invoices in different layouts. Each invoice carries the label “Invoice Number”, but the placement of the invoice number itself varies from form to form: it could be next to or underneath the label, and the label may even sit in a different location on each document (left side vs. right side). Instead of defining multiple zones or multiple zone definition profiles, users can define one OCR smart zone. It finds the common anchor point (i.e., the label “Invoice Number”) and then searches predefined areas around the anchor for the actual value needed (i.e., the invoice number itself).
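
The anchor-then-search idea can be sketched in a few lines of Python. This is only an illustration of the concept, not PSIcapture's implementation: the `Fragment` type, the `CHILD_ZONES` offsets, the coordinates, and the `find_value` helper are all hypothetical, and the label is assumed to arrive from OCR as a single text fragment.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of OCR text with the position of its top-left corner."""
    text: str
    x: float
    y: float


# Hypothetical child zones, each given as (dx, dy, width, height) relative to
# the anchor's top-left corner. They are checked in the order listed.
CHILD_ZONES = [
    (100, -5, 200, 20),  # area to the right of the label
    (-5, 15, 200, 25),   # area underneath the label
]


def find_value(fragments, anchor_pattern, child_zones=CHILD_ZONES):
    """Return the first text found in a child zone around the anchor, or None."""
    anchor = next(
        (f for f in fragments
         if re.fullmatch(anchor_pattern, f.text, re.IGNORECASE)),
        None,
    )
    if anchor is None:
        return None  # no anchor on this page, so nothing to search around
    for dx, dy, width, height in child_zones:
        left, top = anchor.x + dx, anchor.y + dy
        for f in fragments:
            inside = left <= f.x <= left + width and top <= f.y <= top + height
            if f is not anchor and inside:
                return f.text  # stop at the first child zone that yields a value
    return None


# Two made-up layouts: value beside the label, and value under the label.
layout_a = [Fragment("Invoice Number", 50, 40), Fragment("INV-1001", 180, 40)]
layout_b = [Fragment("Invoice Number", 400, 40), Fragment("INV-2002", 400, 60)]

print(find_value(layout_a, r"invoice\s+number"))
print(find_value(layout_b, r"invoice\s+number"))
```

The same pattern and the same two child zones handle both layouts, which is the reason a single smart zone can replace several fixed zones.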

This article covers configuration of a basic OCR smart zone. It also covers Grouped OCR Smart Zones (which can populate multiple index fields) and Multi-Record OCR Smart Zones (which are useful for extracting oddly formatted tables).

In addition to creating a new OCR smart zone, the user can create OCR smart zone profiles associated with a Capture Profile or page that needs its own OCR smart zone configuration, whether single or multiple. The ability to create, edit, or copy an OCR smart zone profile gives the user more flexibility. Profiles can be configured while creating an OCR smart zone or in Configuration-Global Lists-User Defined Form Field Types.

 

Basic OCR Smart Zone

In this example, we’ll create an OCR smart zone to locate the invoice number and store it in an index field.

  1. Create a new Capture Profile in PSIcapture. In the Index Data Fields configuration, create an index field called Invoice Number. Then in the Zone tab, press the Define Zones button:

    cap_root_indexing_zone1.png

  2. Press the Select Template Image button to load a document into the viewer. To draw an OCR smart zone on the image, press the Draw OCR Smart Zone button. NOTE: Alternatively, the user can select an existing OCR smart zone profile by clicking the Select Profile button. This opens a list of existing profiles to choose from.

    Zone_Configuration_1.png

  3. Draw an OCR Smart Zone over a large area of the image where the Invoice Number label is sure to be. This will display the OCR Smart Zone Configuration screen:

    OCR_Smart_Zone_Configuration_1.png

  4. Press the anchor expression Add button to create an OCR Smart Zone Anchor Expression. Enter a regular expression to locate the words “INVOICE NUMBER” on the document page. Users can use the built-in regular expression builder by pressing the button. This will allow the user to highlight one or more words and automatically build a regular expression to match them. Press the Locate button to locate the anchor on the page:



    Defining Type
    1. Custom - User can manually enter the expression or use one of the Regex buttons to create the expression.

    2. System - User selects from the dropdown list of System form fields:

      systemtype.png

    3. User defined - User selects from the dropdown list of User Defined form fields:

      userdefined.png

      NOTE: User Defined Form Fields can be managed via the button at the top of the bar:

      mudff.png

      This opens the Manage User Defined Form Fields dialog box, allowing quick selection of commonly used form fields. Add, Copy, Edit, and Delete your Defined Form Fields as necessary.

      mudff2.png

      When Adding a new User Defined Form Field, the Regex editor appears:

      mudff3.png

      Add Regex definitions as needed to fine-tune your recognition process.

  5. In this example, we have two layouts for the invoice number: one is directly below the anchor, while the other is to the right side of the anchor. Define Child Zones for each of these areas by pressing the child zone Add button:

    1.png

  6. Press the Save icon to save the changes and return to the Zone Configuration screen:

    2.png

  7. Supply a name for the new OCR smart zone (InvoiceNumberZone), then press the Save icon.

    3.png

  8. Select OCR as the Zone Definition Action and assign the zone to the index field using the Zone drop-box. Also, check the Don’t run Zone OCR if field is already populated box to stop processing the child zones once a value has been found.

    4.png
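
An anchor expression like the one entered in step 4 can be tried against sample text before it is typed into the configuration screen. The sketch below uses Python's `re` module purely for illustration. PSIcapture's regular expression dialect and its case-sensitivity settings may differ, and the pattern, the `locate_anchor` helper, and the sample strings are assumptions rather than values taken from the product.

```python
import re

# Assumed anchor expression: the two label words separated by any run of
# whitespace, so a line break or extra spaces between them still match.
ANCHOR_EXPRESSION = r"INVOICE\s+NUMBER"


def locate_anchor(page_text: str):
    """Return the (start, end) span of the first anchor match, or None."""
    # Case-insensitive matching is an assumption made for this sketch.
    match = re.search(ANCHOR_EXPRESSION, page_text, re.IGNORECASE)
    return match.span() if match else None


# Made-up OCR text samples.
print(locate_anchor("INVOICE NUMBER: 10045"))
print(locate_anchor("Invoice   Number\n10045"))
print(locate_anchor("Invoice Date: 03/15/2024"))
```

Keeping the expression loose about whitespace but strict about the words themselves reduces the chance of anchoring on a similar label such as “Invoice Date”.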

When we run a batch, the OCR smart zone will pick up the invoice number from either of the two child positions defined. 

Beside the label:

 

 

Under the label:

 


 

Grouped Smart Zones

The basic smart zone can only populate one index field. However, there are often multiple values users wish to capture into different index fields that are all anchored by a common anchor point. In these cases, use a grouped smart zone. A grouped smart zone has group names associated with each child zone. Users can then assign individual child zone groups directly to an index field.

In the example below, we have one anchor that looks for the text “Invoice”. This anchor has four child zones. From this single anchor the Invoice Number, Invoice Date, Company, and Phone Number will be located and captured.

 

5.png

 

On the Index Data Fields configuration screen, the user can now select each of the individual child zone groups for assignment to an index field:

 

6.png

 

In the example above, the Invoice Number index field is populated by the Invoice Data:Invoice child zone. Invoice Date receives its value from the Invoice Data:Date child zone, Phone from the Invoice Data:Phone child zone, and Company Name from the Invoice Data:Company child zone.
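
The relationship between child zone groups and index fields amounts to a lookup table. The following sketch models it with plain Python dictionaries. The zone group names follow the example above, while the OCR values, the exact index field names, and the `populate_index_fields` helper are hypothetical.

```python
# Hypothetical OCR results from one grouped smart zone named "Invoice Data",
# keyed by "zone name:group name".
zone_results = {
    "Invoice Data:Invoice": "INV-1001",
    "Invoice Data:Date": "03/15/2024",
    "Invoice Data:Phone": "555-0100",
    "Invoice Data:Company": "Example Supply Co.",
}

# Which child zone group feeds which index field.
field_assignments = {
    "Invoice Number": "Invoice Data:Invoice",
    "Invoice Date": "Invoice Data:Date",
    "Phone": "Invoice Data:Phone",
    "Company Name": "Invoice Data:Company",
}


def populate_index_fields(assignments, results):
    """Build index field values; a group with no OCR result yields an empty string."""
    return {field: results.get(group, "") for field, group in assignments.items()}


index_fields = populate_index_fields(field_assignments, zone_results)
for name, value in index_fields.items():
    print(f"{name}: {value}")
```

One anchor search therefore serves all four fields, instead of each field needing its own zone and its own anchor.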


 

Multi-Record Smart Zones

The multi-record OCR smart zone is used for table extraction when a table is oddly formatted. For example, if a page is made up of multiple tables, it can be difficult to set up a set of standard multi-record zones to extract the line items correctly. Instead, create a single OCR smart zone that covers just one column, enter a regular expression to detect that column’s values, and then set up grouped child zones to extract each line item’s details.

In the example below, the OCR smart zone anchors on the numeric values in the quantity column. Whenever one is found, the child zones gather the data from the line item:

 

 

In order to generate the multi-record smart zone, users must check the box labeled Create a record for each OCR Smart Zone anchor found on the page in the Options area.
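
The effect of creating one record per anchor can be sketched as follows. This is a conceptual model only, not the product's algorithm: the `Fragment` type, the column bounds, the `GROUPS` offsets, the row tolerance, and the sample line items are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of OCR text with the position of its top-left corner."""
    text: str
    x: float
    y: float


# Hypothetical grouped child zones: horizontal offset range from the anchor.
GROUPS = {"Description": (50, 250), "Amount": (280, 400)}


def extract_records(fragments, column=(0, 60), pattern=r"\d+",
                    groups=GROUPS, row_tolerance=5):
    """Create one record for every anchor found inside the quantity column."""
    anchors = [
        f for f in fragments
        if column[0] <= f.x <= column[1] and re.fullmatch(pattern, f.text)
    ]
    records = []
    for anchor in sorted(anchors, key=lambda f: f.y):  # top to bottom
        record = {"Quantity": anchor.text}
        for name, (dx_min, dx_max) in groups.items():
            hit = next(
                (f for f in fragments
                 if f is not anchor
                 and abs(f.y - anchor.y) <= row_tolerance
                 and dx_min <= f.x - anchor.x <= dx_max),
                None,
            )
            record[name] = hit.text if hit else ""
        records.append(record)
    return records


# Made-up page content: a header row followed by two line items.
page = [
    Fragment("Qty", 10, 100),
    Fragment("2", 10, 120), Fragment("Widget", 80, 120), Fragment("19.98", 320, 121),
    Fragment("5", 12, 160), Fragment("Gasket", 80, 160), Fragment("7.50", 320, 160),
]

for row in extract_records(page):
    print(row)
```

Because the header text “Qty” does not match the numeric pattern, it produces no record. Only rows whose quantity value matches the anchor expression become line items.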

The OCR Smart Zone configuration screen accepts a hierarchy level number. These level numbers define the parent/child relationships that exist between multiple smart zones that are used for table extraction.

 

 

After saving the zone configuration, the child zones can be selected and assigned to each individual index field:

 
