Version 7.7.x
This article is intended for PSIcapture Administrators.
PSIcapture's Classification Workflow step enables users to validate and match specific data to incoming documents in a variety of ways. To better understand Classification, we've broken Classification into three major focus areas:
- Page Validation – When examining forms, users need to decide the type of page validation required when processing forms. Page validation in the Classification engine defines separation and page merging functionality.
- Forms Identification – Currently in PSIcapture, users can define and classify forms based on OCR match criteria or barcode recognition. This is the most critical planning step, and will ultimately define how pages are classified and documents are created.
- Data Extraction – The ultimate goal in classification is to identify the correct Form ID, and then extract data based on the assigned Record Type.
For more information on the configuration of the Classification Workflow Step, see:
Classification Form Definitions
The Form Definition in Classification allows users to define all the characteristics of a form and how to identify or classify it, and provides key methods for how PSIcapture will behave when a classification occurs. In the Classification Module, users can Add, Edit, Copy and Delete form definitions, as well as change their order. As of PSIcapture 7.7, users can select "Move To..." to quickly specify the exact position at which to place the Classification Profile. Users can also import and export Classification Form Definitions, also known as a taxonomy, through the Import/Export options in the lower-right.
Global Classification Forms
NOTE: Global Classification forms are not restricted to a single Capture Profile, and can be used across any capture profile or configuration.
- Form ID - This is the name of the form.
- Record Type - A record type could be something like an invoice, quote, purchase order, etc. This is another way to separate your forms.
- Group Type - A group type could be something like manufacturing, tax, HR, construction, etc. This allows the user to group forms together per industry for instance.
- Validated - This shows whether a Form has been validated or not.
- Zone Profile Triggered - This shows which Zone Profile is triggered when the application recognizes a form. The gear icon allows users to edit the Zone Profile from Classification Settings.
- Zone Profile Editing Options - When a user clicks the gear icon, three options are available:
- Edit the Zone Profile associated with the Form. If multiple profiles are associated with the current Form, the user can choose which profile to edit.
- Create a new Zone Profile for the Form.
- Select a different Zone Profile not associated with the Form.
At the bottom of the forms list there is an area where the user can see a few statistics. These statistics tell the user how many total forms there are, how many validated forms there are and what percentage of them are actually validated, how many record types there are, and how many groups of forms there are. Users can also choose whether to show if a form has been validated via a checkbox in the far right column of the forms list.
NOTE: The View Usage button allows users to view global usage of each form. Clicking View Usage opens a window in which users can run a usage query by timeframe.
Adding Form Definitions
Clicking the Add button will open the Form Definition dialog. As mentioned, this provides an interface for defining all the characteristics of a form. Within this configuration interface, users have the standard template toolbar which allows them to load or scan a template image, as well as a set of zooming tools.
- Form ID – The Form ID is the name of the form these characteristics define. Note: This name will be available as a variable, and be placed in a linked index field.
- Group – The Group allows users to create subsets of forms and currently is purely for organization within the configuration.
- Record Type – This dropdown will link to the configured Record Types on step 3 of the configuration wizard, and allows the linking of the Form Definition to the chosen Record.
- Description – Allows a user defined description of the form.
- Page Count – For forms of specified page lengths, this count will be utilized in page validation.
- Usage Ranking Behavior - This option allows users to keep the form's current usage-ranked position or override the usage ranking settings so that the selected form is processed at the beginning or end of the queue. The options are:
  - Use Ranked position
  - Override Ranking and process Form at the beginning of the Form list
  - Override Ranking and process Form at the end of the Form list
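The three Usage Ranking Behavior options can be pictured as an ordering over the form list. The sketch below is illustrative only: the article does not describe how PSIcapture computes usage ranking, so sorting by a descending `usage` count is an assumption, and the dictionary keys and the `processing_order` function are hypothetical names.

```python
def processing_order(forms):
    """Return form IDs in the order they would be tried.

    Each form is a dict with hypothetical keys:
      form_id - name of the form
      usage   - how often the form has matched (assumed ranking metric)
      ranking - "ranked", "first" (override to beginning) or "last" (override to end)
    """
    first = [f for f in forms if f["ranking"] == "first"]
    last = [f for f in forms if f["ranking"] == "last"]
    # Assumption: ranked forms are ordered by descending usage count.
    ranked = sorted(
        (f for f in forms if f["ranking"] == "ranked"),
        key=lambda f: f["usage"],
        reverse=True,
    )
    return [f["form_id"] for f in first + ranked + last]


forms = [
    {"form_id": "A", "usage": 5, "ranking": "ranked"},
    {"form_id": "B", "usage": 50, "ranking": "last"},
    {"form_id": "C", "usage": 20, "ranking": "ranked"},
    {"form_id": "D", "usage": 1, "ranking": "first"},
]
order = processing_order(forms)
```

In this sketch, form B is tried last despite its high usage count, and form D is tried first despite its low one.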
The Classification Rules section of the module provides the ability to input one or more rules that will define the form. Below are the options:
- Match – Users can choose a positive or negative match for the rule, and combine them to build a series of rules that will define the form. For instance, users may have a form that has “Form OFS 2” on the top, but there are two versions, with different locations for the required data. One form has “Version 2” on the bottom, one does not. Users can use a negative rule to make sure the form without Version 2 is properly identified.
- Rule Type – Currently there are two types of rules, OCR Text and Barcode.
- Search Region - This allows the user to select where on the page the OCR text is searched for.
- Index Value - This allows the user to select the index field whose value is set by the classification rule.
- Rule Value – The Rule Value provides an entry point for a regular expression to match either the barcode value or an OCR expression. This will trigger the classification and setting of Record Type.
- Rule Match Behavior – If users have multiple rules, this drop down will provide a means to logically combine them to define the overall match. Users can either choose to match on the first rule matched, or make the combination of all the rules required.
Note: The order of rules can be used to the user's advantage as rules are processed in the order of entry.
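The way positive and negative rules combine can be sketched in a few lines of Python. This is a minimal illustration of the logic described above, not PSIcapture's implementation: the `Rule` class, `rule_passes`, and `form_matches` are hypothetical names, and the page text stands in for OCR output.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str           # regular expression entered as the Rule Value
    positive: bool = True  # True = positive match, False = negative match


def rule_passes(rule: Rule, page_text: str) -> bool:
    found = re.search(rule.pattern, page_text) is not None
    # A negative rule passes only when its pattern is absent.
    return found if rule.positive else not found


def form_matches(rules, page_text: str, require_all: bool) -> bool:
    """require_all=True  -> every rule must pass.
    require_all=False -> the first passing rule, in entry order, is enough."""
    if not rules:
        return False
    results = (rule_passes(r, page_text) for r in rules)  # evaluated lazily, in order
    return all(results) if require_all else any(results)


# The "Form OFS 2" example: two versions told apart by "Version 2".
ofs2_v2 = [Rule(r"Form OFS 2"), Rule(r"Version 2")]
ofs2_original = [Rule(r"Form OFS 2"), Rule(r"Version 2", positive=False)]

page_v2 = "Form OFS 2  Applicant details ...  Version 2"
page_v1 = "Form OFS 2  Applicant details ..."
```

Note that a negative rule is only meaningful when all rules are required. Under first-match behavior, the negative rule alone would pass on any page that lacks "Version 2".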
|When clicked, a pop-up window opens in which the user chooses the text used to identify the Form.|
|When clicked, the Barcode Recognition window opens, allowing the user to choose the barcode used to identify the Form.|
|When clicked, the application verifies that it recognizes the defined text or barcode.|
|When clicked, the Edit Regex window opens, allowing the user to edit the regular expression for the rule.|
|Deletes the rule.|
Last Page Classification Rules
If Last Page Rule processing is enabled and a Form Definition contains Last Page Rules, then when that Form is classified, all other Page Validation and classification is disabled, and classification searches only for a matching last page for that Form. Once it is found, all pages up to that page are added to that Form, and classification switches back to normal processing, looking for matches for all defined forms. The special case where the first page of a Form is also its last page is handled as well.
If a Form Definition does not contain Last Page Rules, then the selected option under Page Validation will be used (Loose, Strict, None). This allows users to mix both types of validation in case they aren't able to use Last Page Rules for all of their forms.
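The Last Page Rule behavior described above amounts to a small state machine. The following sketch shows one way to picture it and makes no claim about PSIcapture's internals. `FormDef`, `Document`, and `separate` are hypothetical names, pages are represented by their OCR text, and the matching last page is assumed to be included in the document. For forms without Last Page Rules, the sketch appends unmatched pages to the open document as a stand-in for the configured Page Validation mode.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FormDef:
    form_id: str
    first_page: str                  # regex that classifies the form
    last_page: Optional[str] = None  # Last Page Rule regex, if defined


@dataclass
class Document:
    form_id: str
    pages: list = field(default_factory=list)


def separate(pages, forms):
    docs = []
    current = None  # document currently being built
    locked = None   # form whose last page is being searched for
    for text in pages:
        if locked is not None:
            # Normal classification is suspended: only look for the last page.
            current.pages.append(text)
            if re.search(locked.last_page, text):
                locked, current = None, None  # back to normal processing
            continue
        match = next((f for f in forms if re.search(f.first_page, text)), None)
        if match is not None:
            current = Document(match.form_id, [text])
            docs.append(current)
            if match.last_page:
                if re.search(match.last_page, text):
                    current = None  # first page is also the last page
                else:
                    locked = match
        else:
            if current is None:
                current = Document("UNCLASSIFIED")
                docs.append(current)
            current.pages.append(text)
    return docs
```

While a form is locked, a page that would otherwise start a new form (for example, an invoice copy inside a claim packet) stays inside the current document.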
Table Extraction-Line Items
This allows classification based on the page orientation or the size of the form. This can be useful as an additional criterion for defining a form, or can be used by itself with no rules to define a form. For example, when scanning checks and check stubs, users can assign a record type of Check when certain page size criteria are met.
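As a rough illustration of size- and orientation-based criteria, the sketch below tests a page's dimensions against a set of limits. The `PageCriteria` class, the function names, and the threshold values are all assumptions chosen for the check example; they are not PSIcapture settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageCriteria:
    orientation: Optional[str] = None      # "portrait", "landscape", or None for any
    max_width_in: Optional[float] = None   # upper bound on width, in inches
    max_height_in: Optional[float] = None  # upper bound on height, in inches


def page_orientation(width_in: float, height_in: float) -> str:
    return "landscape" if width_in > height_in else "portrait"


def page_matches(criteria: PageCriteria, width_in: float, height_in: float) -> bool:
    if criteria.orientation and page_orientation(width_in, height_in) != criteria.orientation:
        return False
    if criteria.max_width_in is not None and width_in > criteria.max_width_in:
        return False
    if criteria.max_height_in is not None and height_in > criteria.max_height_in:
        return False
    return True


# Illustrative limits for a "Check" record type: wide, short pages only.
CHECK = PageCriteria(orientation="landscape", max_width_in=9.0, max_height_in=4.0)
```

With these illustrative limits, a 6 x 2.75 inch page qualifies as a Check, while a letter-size page in either orientation does not.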
Clicking the Import button in the Classification Module settings displays a dialog allowing users to choose which type of import to perform:
How to import via database:
- Set up the Database Connection. This uses the standard dialogs used throughout the product.
- Import Definition
  - Form ID is required.
  - Form ID, Description and Rules all use the standard Build Custom Value dialog to build their values from database fields and constants.
  - All other fields, including Rules, are optional. Rules set up during this step apply to every imported form. Because Rules are optional, users can come back later and add rules to individual forms.
  - When defining Rules, users can either use the values from the table as is or run them through the Regex Builder to generate the necessary expressions. This behavior is controlled separately for each rule using the “Convert to Regular Expression” option. The global Regex Options can be accessed using the Regular Expression Options button.
- Import Options
  - Duplicate Form ID Behavior – Users can either skip creation of a form if a duplicate is found or add the rules to the existing form.
  - “Mark Imported Classification Form Definitions as Not Validated….” – If selected, imported forms are marked Not Validated. If the corresponding option on the Classification Definition settings is selected (see below), documents that match these Not Validated Forms are treated as Exceptions to be processed on the Classification Validation dialog. To validate a Form, the user opens it in the ACE dialog. On saving out of ACE, the Form is validated for that document, for any other documents in the batch of that Form type, and for all future documents classified as that Form type.
  - "Do not create Classification Form Definitions that have no rules" - If selected, a form definition that would have no rules is not created. The system warns the user and lists the form definitions that were not created.
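The import options interact in a way that is easier to see in code. The sketch below models them over plain Python data and is not the product's import routine. `import_forms` and its parameters are hypothetical names, and `re.escape` merely stands in for the Regex Builder, whose actual output is not described here.

```python
import re


def import_forms(rows, existing, on_duplicate="skip", convert_to_regex=False,
                 skip_without_rules=True, mark_not_validated=True):
    """Import (form_id, rule_value) rows into `existing`, updated in place.

    existing maps form_id -> {"rules": [...], "validated": bool}.
    Returns the form IDs that were not created because they had no rules.
    """
    not_created = []
    for form_id, rule_value in rows:
        if not form_id:
            raise ValueError("Form ID is required")
        rules = []
        if rule_value:
            # "Convert to Regular Expression": here the literal is simply escaped.
            rules.append(re.escape(rule_value) if convert_to_regex else rule_value)
        if form_id in existing:
            # Duplicate Form ID Behavior: skip, or add the rules to the existing form.
            if on_duplicate == "add_rules":
                existing[form_id]["rules"].extend(rules)
            continue
        if not rules and skip_without_rules:
            not_created.append(form_id)  # reported back to the user
            continue
        existing[form_id] = {"rules": rules, "validated": not mark_not_validated}
    return not_created


existing = {"Invoice": {"rules": ["INVOICE"], "validated": True}}
rows = [("Invoice", "Tax Invoice"), ("Quote", "Quote (v2)"), ("Blank", None)]
missing = import_forms(rows, existing, on_duplicate="add_rules", convert_to_regex=True)
```

In this run, the existing Invoice form gains a second rule, Quote is created as Not Validated, and Blank is reported as not created because it has no rules.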
Sample Database Import
Custom Text File Import
Users only need to browse to the location of the text file and click the Import button.
This allows users to select an XML file that they have exported previously from the Form Definitions export option. NOTE: In versions 6.0.2.x and below this import option is only available in the Classification Configuration settings of the main configuration.
This allows users to export an XML file from Classification Workflow Settings.
Data Extraction and Classification
Once a document is classified, and a Record Type assigned, custom data extraction rules can be applied for that particular type of document. Through the use of shared and unique fields tied to Record Types, all the different methods of data population are available. There are several key features that leverage Record Type focused extraction:
- Dynamic Regular Expressions – Advanced Data Extraction (ADE) now allows specific regular expressions to be configured based on the Record Type.
- Zone Profiles – Allow zone OCR-based templates that are linked to specific Record Types.
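Record Type focused extraction can be pictured as a lookup from Record Type to an extraction pattern. The sketch below is a simplified illustration, not ADE itself: the `PATTERNS` table, its regular expressions, and `extract_number` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical per-Record-Type patterns; each captures a document number.
PATTERNS = {
    "Invoice": r"Invoice\s*(?:No\.?|#)\s*(\w+)",
    "Purchase Order": r"PO\s*(?:No\.?|#)\s*(\w+)",
}


def extract_number(record_type: str, text: str) -> Optional[str]:
    """Apply the pattern configured for the Record Type assigned by classification."""
    pattern = PATTERNS.get(record_type)
    if pattern is None:
        return None  # no extraction rule configured for this Record Type
    m = re.search(pattern, text, flags=re.IGNORECASE)
    return m.group(1) if m else None
```

Because the pattern is chosen by Record Type, the same OCR text yields different results depending on how the document was classified.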