Overview
Document Profiles represent a PSIcapture Profile that has been imported, or a PSIcapture Mailroom Profile that has been manually configured to define the index fields, content import methods, and associated database lookups. Each document profile contains:
- Indexes - A list of one or more fields that define which metadata attributes should be extracted from the document content.
- General Configuration - A series of rules applied to the document profile or the document profile fields that further define the content the document profile expects.
- Database Lookups - Configuration options for optional features that work to improve user throughput and accuracy when processing documents.
The remainder of this article will cover features for managing the list of document profiles in the organization.
Adding or Editing a Document Profile
To add a new document profile, click the Add button in the upper right corner of the document profile table. To edit an existing document profile, select the document profile you wish to edit and click the pencil icon to the right of the profile row. Both adding and editing a document profile share the same dialog.
Editing a document profile involves multiple distinct sections: The General and Fields articles cover mandatory configuration steps for every document profile. The Lookups article covers an optional feature that can help improve data accuracy and indexing throughput. This section has been split into multiple articles to improve readability and ease of use.
General - Attributes & Options
The general tab defines attributes that describe the document profile, as well as a number of options that control the workflow and features that the document profile will employ. Options in this section are documented in the General Tab section below.
Fields - Defining Document Metadata
After configuring the general options for the document profile, you must add one or more fields. The Fields Tab section below documents data profiles and general field options, as well as validation and usability features that help improve the quality of your captured data without sacrificing indexing efficiency.
Lookups - Integrating External Data
When indexing documents in your organization, there are often third party data sources available that contain critical metadata and can be leveraged to reduce the amount of manual keying necessary for each document. PSIcapture Mailroom provides an optional feature called Lookups to allow you to tap into this data and integrate it into your thin client indexing solution.
Copying a Document profile
Copying a document profile allows you to take a completed document profile, make modifications to it and save the modified document profile using a new distinct name. To copy a document profile:
- Select the document profile you wish to use as a starting point for the new document profile
- Click the Copy button
- Edit the name of the document profile to be a new, unique name
- Make any additional changes to the document profile attributes, options, fields or lookups, as documented throughout this section
- Save the completed document profile
Removing a Document profile
To remove a document profile from the system, select the document profile you wish to delete from the list of document profiles and click the Remove button.
Removed document profiles are not backed up automatically or recoverable in any way. It is recommended that you first export a backup copy of a document profile before removing it in case you later need to restore it.
Importing & Exporting Document profiles
PSIcapture Mailroom allows administrators to import document profiles from external sources to facilitate sharing configurations across organizations or deployments and to ease backup and restore of critical document profile configurations. Exported document profiles are stored in an industry standard, open Extensible Markup Language (XML) format.
Importing a Document profile
To import a document profile into an organization, follow these steps:
- Click the Import button
- Locate the XML file for the document profile you wish to import
- PSIcapture Mailroom will create a new document profile from the imported configuration file and open the new document profile for editing
- Follow the procedures documented above for Adding or Editing a Document profile
Exporting a Document profile
To export a document profile to XML, select the document profile you wish to export and click the Export button. You will be prompted for a location and file name to use when the document profile is exported. PSIcapture Mailroom will then export a complete copy of the document profile, including all attributes, options settings, fields and lookups as well as any linked Lists or Connections.
The exported document profile is not version dependent and should be usable with any version of PSIcapture Mailroom.
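Because removed profiles cannot be recovered, it can be worth confirming that an exported file is at least well-formed XML before relying on it as a backup. The sketch below is a minimal check using Python's standard library. It assumes nothing about the export schema, which is not documented here, and the function name and file path are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_exported_profile(path):
    """Return the root element tag if the file parses as XML, else None."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        # Malformed XML, or the file could not be opened.
        return None
    return root.tag
```

For example, `check_exported_profile("invoices-profile.xml")` returns `None` for a missing or truncated file. A well-formed file is not necessarily a valid profile, so importing it into a test organization remains the more thorough check.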
Configuring Document Profiles
Follow the steps below as needed to configure your PSIcapture Mailroom Document Profile:
Step 1: General Tab
Document Profile Name
The document profile name is used to uniquely identify the document profile in system lists and is displayed to users when interacting with their document queue.
Document Profile Output Folder Name
As documents in PSIcapture Mailroom are indexed and completed, they are moved out of the user's queue and into a completed folder where they await pick-up by PSIcapture. The path documents are exported to consists of multiple sections, including a folder that indicates which document profile was used to process the document. The output folder name allows the administrator to specify what name should be used when outputting completed documents.
This field is optional. When it is not specified, the document profile name will be used.
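The fallback rule can be sketched in a few lines of Python. The function name is hypothetical, and treating a whitespace-only value as "not specified" is an assumption rather than documented behavior.

```python
def resolve_output_folder(profile_name, output_folder_name=None):
    """Pick the folder name used for completed documents."""
    # Assumption: None, empty and whitespace-only values all count as
    # "not specified".
    if output_folder_name and output_folder_name.strip():
        return output_folder_name.strip()
    # Fall back to the document profile name.
    return profile_name
```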
Description
The description is an informational field where administrators can provide a brief summary, notes or instructions to future administrators regarding the current document profile. The description is not displayed to end users, nor used by the application.
Multiple Records per Document
This option enables the collection of multiple data records for each document. By default, each document is associated with a single set of metadata fields. When multiple records are enabled, users are presented with interface controls that enable the creation and management of additional records for each document, where each record contains a distinct set of the defined document profile fields.
For example, with invoices, users could have multiple records that include quantity, description, cost, etc. for multiple line items.
Copying Field Values When Adding a Record to a Document
This option enables users to copy any field data from one record to another. For example, if a user is indexing multi-line invoices whose line items share the same quantity or item, the index information is carried over to the next record.
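One way to picture the two options together is a document holding a list of records, where each record is a set of field values. The sketch below is illustrative only: the function name and data layout are hypothetical, and copying from the most recent record is an assumption about which record acts as the source.

```python
from copy import deepcopy


def add_record(records, copy_values=True):
    """Append a new record to a document's record list and return it."""
    if records and copy_values:
        # Start from a copy of the most recent record's field values.
        new_record = deepcopy(records[-1])
    elif records:
        # Same fields as the existing records, but all values blank.
        new_record = {name: "" for name in records[-1]}
    else:
        new_record = {}
    records.append(new_record)
    return new_record


line_items = [{"Quantity": "2", "Description": "Widget", "Cost": "9.99"}]
copied = add_record(line_items)
copied["Description"] = "Gadget"  # only the changed value needs re-keying
```

With copying enabled, the operator re-keys only the values that differ between line items.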
Summary
After configuring the attributes and options on the document profile, you must define one or more metadata fields. Please proceed to the next step of this guide where we will introduce metadata fields, validation and automation.
Step 2: Fields Tab
Fields represent the largest single configuration area when setting up a new document type, and as such understanding the information in this article is a critically important aspect to mastering document profile creation.
Adding a Field
Clicking the Add button will add a new field to the field list, using default attributes, validation and automation options. Users must populate the field name before the document type will validate and allow saving.
Removing a Field
Select a field from the field list and click on the "x" icon to remove it from the list. If the field is mapped to a list the list definition is left unchanged.
This action is not reversible, and no backup of the associated field validation and automation settings is performed. To restore the removed field you will have to add a new field and recreate the settings manually.
Field Order
The order in which fields are displayed in the administration interface is also the order in which the fields will be rendered in the index application interface. To reorder the fields for the document profile, select a field and use the Up and Down buttons at the top right of the table to adjust the field position.
Repeat this for each field to be moved.
Index Field Options
The options outlined below describe the field options available.
Field Name
Fields in each document profile must be identified by a unique name. Names are only required to be unique within the current document profile, and can be reused within other document profiles in the application. It is recommended that you use field names that are concise and accurately describe the data you wish to collect in that field.
Data Type
Data types define the type and format of data the field should accept. Each field is required to define exactly one data type. The data types supported by PSIcapture Mailroom are as follows:
This list includes documentation of some characters that have different meanings across different cultures and locales. When specifying supported characters by field data type, these character lists have no bearing on how data will be parsed with respect to the current system's culture and locale settings. The currently active culture and locale on the client system will always be used to parse text based numeric and date/time data, and PSIcapture Mailroom will fully support data operations on systems where the current locale is not set to US English.
Text
Text fields can accept any printable characters and are suitable for containing any type and format of incoming data. This is the default data type applied to newly added fields.
Text fields are rendered in the indexing interface as a single line text box. If users wish to have a multiple line text field, we recommend using the Memo field. Hitting the enter key while a text field is active in the queue will advance the interface to the next available field in the document profile. This differs from the Memo data type where hitting the enter key will progress to the next line.
Text data is stored in double-byte compatible Unicode Transformation Format (UTF), where the UTF variant is automatically selected from UTF-8, UTF-16 and UTF-32 based on the text content.
Number
Number fields are suitable for capturing and displaying any type of numeric data, including whole numbers, fractional numbers, and negative numbers.
Number fields are rendered in the indexing interface as a single line text box. Number fields are filtered to accept only the following characters:
- Digits (0 to 9)
- Sign indicators (+ and -)
- Decimal separator (.)
- Thousands separator (,)
Number data is stored as a double width (64 bit) floating point value.
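The character filter and storage type can be illustrated with a short Python sketch. The function names are hypothetical. The separators default to the characters listed above, whereas the product parses using the client system's locale, so the separator arguments are there only to show how a different locale might be handled.

```python
ALLOWED_NUMBER_CHARS = set("0123456789+-.,")


def accepts_number_input(text):
    """Character filter only: True if every character is permitted."""
    return all(ch in ALLOWED_NUMBER_CHARS for ch in text)


def parse_number(text, decimal_sep=".", thousands_sep=","):
    """Parse filtered text into a Python float (a 64-bit double)."""
    if not accepts_number_input(text):
        raise ValueError(f"unsupported character in {text!r}")
    # Drop grouping separators, then normalize the decimal separator.
    normalized = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(normalized)
```

Passing the character filter does not guarantee a valid number: a value such as `1-2` contains only permitted characters but still fails to parse.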
Currency
Currency fields are suitable for expressing monetary values, and are parsed, filtered and stored using the same rules documented for the Number data type.
Whole Number
Whole number fields are limited to displaying digits only and will not accept any form of punctuation or separator characters. Use whole number fields when you wish to capture numeric field data but prevent the entry of separator or sign characters.
Whole number fields are rendered in the user interface as a single line text box, with filtering options set to accept only digit characters.
Whole number data is stored as a 32 bit integer value.
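A practical consequence of 32 bit storage is an upper limit on the value a whole number field can hold. The sketch below assumes a signed integer (maximum 2,147,483,647) and assumes that larger values are rejected. Neither detail is stated above, and the function name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # assumes a signed 32-bit integer


def parse_whole_number(text):
    """Accept digits only and reject values that overflow 32 bits."""
    if not text or not all(ch in "0123456789" for ch in text):
        raise ValueError("whole number fields accept digits 0-9 only")
    value = int(text)
    if value > INT32_MAX:
        raise ValueError(f"{text} does not fit in a signed 32-bit integer")
    return value
```

Long identifiers such as 12-digit account numbers exceed this range, so a Text field is likely a better fit for them.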
Decimal
Decimal fields are used for collecting numeric data that should always contain a decimal separator value. Parsing, filtering and storage of decimal fields follows the same rules documented for the Number field type.
Date/Time
Date/Time fields are used for collecting date values, with an optional time component. Date/Time values are always parsed and manipulated using culture-aware processes that respect the current culture and locale settings from the client application.
Date/Time fields are rendered in the user interface using a date picker control that also supports masked and validated text entry.
Storage of date/time values is done using Coordinated Universal Time (UTC) values coupled with a culture-agnostic text format that is capable of being persisted and retrieved multiple times with no loss of fidelity. This ensures that captured date/time data can be moved between systems with different culture and locale settings, operating in different time zones without any negative effect on date/time data.
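The round trip described above can be sketched with Python's `datetime` module. The exact text format PSIcapture Mailroom uses is not documented here, so ISO 8601 stands in as an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_storage(local_value):
    """Convert a timezone-aware datetime to a UTC ISO 8601 string."""
    if local_value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return local_value.astimezone(timezone.utc).isoformat()


def from_storage(stored):
    """Parse the stored string back into a UTC datetime."""
    return datetime.fromisoformat(stored)


eastern = timezone(timedelta(hours=-5))
captured = datetime(2024, 3, 1, 17, 30, tzinfo=eastern)
stored = to_storage(captured)        # '2024-03-01T22:30:00+00:00'
restored = from_storage(stored)
```

The restored value refers to the same instant as the captured one, and storing it again produces an identical string, which is the "no loss of fidelity" property.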
Boolean
Fields assigned the boolean data type can be used to capture true/false, on/off, set/unset data, and are represented in the user interface as a check box.
Boolean values are stored using a boolean data type and will always have a value of either true or false.
Memo
Memo fields function identically to Text fields, with the exception of the indexing interface, where they are rendered using a multiple line enabled text box control that is configured to accept line feeds. This changes the user experience from that of the text data type since the enter key will no longer advance to the next field, but instead progress to the next line of the memo box.
Data storage and filtering are handled exactly as documented for the text field type.
General
Validation options are divided into general options and advanced options.
Field Description
The field description is used to provide explanatory information or instructions regarding the intended use of the field. The field description is displayed to the indexing operators via the user interface, so this is an ideal place to specify instructions relating to the validation requirements for the field. This will assist index operators in entering valid data quicker by ensuring that the data requirements for each field are clearly documented and accessible.
The description is limited to 200 characters.
Default Value
PSIcapture Mailroom provides a number of preset default values that users can select. Alternatively, users can enter a custom value.
- None
- Custom Value
- Today's Date
- Today's Date/Time
- Document Profile Name
- User ID
- User Name
- User Display Name
- User First Name
- User Last Name
- User Email Address
- Document Timestamp
- Document Page Count
Automation Options
Other general options include the following as described. They are in order from left to right by row.
- Required - Marking a field as required forces index operators to provide a value for the field.
- Skip - Skip fields will not be treated as a tab stop when the index operator is using either the tab or enter keys to navigate the list of fields. The field will be skipped and focus will stop on the next non-skipped field, or the save button, if there are no non-skipped fields remaining. NOTE: The only exception to this rule is if the skip field is marked with one or more validation options and does not pass validation. In this case focus will stop on the field as if the skip option were not set.
- Sticky - Fields with the sticky attribute set will default to the value entered in the same field on the previous document, carrying forward the same value until it is changed. Sticky fields are a good choice when a metadata field is read from one of the first documents in a set, and then remains static for the next several documents before changing again.
Combining the sticky setting with the Required and Skip options will enable users to ignore the field unless it is blank, and the user may click into the field manually when the value needs to be changed.
In addition to holding values manually entered by the user, sticky fields will also maintain any default value for the field or any value imported from the document metadata.
- Read Only - Read Only fields are displayed in the user interface during index processing, however they are not editable. This is useful for when an external application populates and exports data to PSIcapture Mailroom that should be left intact and unmodified by the user. If the user should not see the information, you should use the Hidden option instead.
- Hidden - Hidden fields will not be displayed in the user interface during index processing. This is useful for when an external application populates and exports data to PSIcapture Mailroom that should be left intact and unmodified by the user. This field data can then be exported from PSIcapture Mailroom upon document completion and sent back with the document to PSIcapture.
Use hidden fields for this purpose when the index operator does not need, or should not be allowed, to view the data in the field. If the user should see the data, but be restricted from editing it, you should use the Read Only option instead.
- Protected - Protected fields will not be cleared if the Clear button is clicked while indexing. A protected field can still be edited at any time; use Read Only if the field should not be edited.
- Synchronized - Synchronized fields work similarly to Sticky fields, but instead of copying the value from the previous document, the value is copied across all data records in the current document. This option is only effective when Indexing Documents with Multiple Records is enabled, and will have no effect in other scenarios.
- Minimum Length - Defining a minimum length for a field forces index operators to ensure that the length of the data in the field is not less than the specified minimum value. NOTE: Minimum length can only be set on text or memo fields.
- Maximum Length - Defining a maximum length for a field forces index operators to ensure that the length of the data in the field does not exceed the specified maximum value. NOTE: Maximum length can only be set on text or memo fields.
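The Required, Minimum Length and Maximum Length rules combine into a simple check. The sketch below is illustrative rather than the product's implementation: the function name and messages are hypothetical, and skipping the length rules for a blank optional field is an assumption.

```python
def validate_field(value, required=False, min_length=None, max_length=None):
    """Return a list of validation messages; an empty list means valid."""
    errors = []
    if value == "":
        if required:
            errors.append("A value is required.")
        # Assumption: length rules are not applied to blank optional fields.
        return errors
    if min_length is not None and len(value) < min_length:
        errors.append(f"Enter at least {min_length} characters.")
    if max_length is not None and len(value) > max_length:
        errors.append(f"Enter no more than {max_length} characters.")
    return errors
```

Messages of this kind are also good candidates for the Field Description, so that operators see the requirements before they start keying.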
Advanced
Map to List
Selecting the map to list option enables the current field to be linked to a list defined within the PSIcapture Mailroom organization. Lists are an easy way to provide a list of suggested values for the field to the index operator.
Default values and imported metadata values will be matched to the list, and the appropriate list value will be selected, if it exists.
The available lists for the current field will be restricted to those lists that conform to the data type for the current field. If users have defined a list that is not showing up in the Map to List drop down control, ensure that the data type for the list and the data type for the field match.
To learn about defining lists and optionally connecting them to external data sources, please refer to the Lists article.
Limit Field Value to List Entries
This option is only enabled after the Map to List option has been enabled. Limiting a field value to the available list entries converts the list backed field into a Constrained Value List (CVL). Fields with this option set will fail validation if the field value does not correspond to one of the available list values.
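The difference between a suggestion list and a constrained value list can be sketched as follows. The function name is hypothetical, and exact, case-sensitive matching is an assumption, since the matching rules are not described here.

```python
def validate_against_list(value, list_entries, limit_to_list=False):
    """Return True if the value is acceptable for a list-backed field."""
    if not limit_to_list:
        # The list only suggests values; any entry is accepted.
        return True
    # Assumption: matching is exact and case-sensitive.
    return value in list_entries
```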
Enable Regular Expression Validation
Regular Expression Validation enables complex text based matching rules to be defined against each field. When a regular expression validation pattern is applied to a field, the data in the field must result in a positive match against the regular expression pattern in order for validation to succeed. Authoring regular expressions is a complex topic that we cannot fully cover here, however there are excellent resources available should users wish to learn more.
A good tutorial and general reference site, maintained by the author of the excellent Regex Buddy tool: http://www.regular-expressions.info/
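As a small illustration, the Python sketch below validates a hypothetical invoice number format. It uses Python's `re` module purely for demonstration. Regular expression dialects differ in detail, and whether the product requires the whole value to match is not stated here, so anchor patterns with `^` and `$` and test them in PSIcapture Mailroom before relying on them.

```python
import re

# Hypothetical pattern: "INV-" followed by exactly six digits.
INVOICE_PATTERN = re.compile(r"INV-\d{6}")


def passes_regex_validation(value, pattern=INVOICE_PATTERN):
    """True when the entire field value matches the pattern."""
    return pattern.fullmatch(value) is not None
```

Here `INV-004217` passes, while `INV-4217` and `XINV-004217` fail because the whole value must match.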
Input Mask
When an input mask is defined, placeholders are defined by the PromptChar property. When inputting data, the user can only replace a placeholder with a character that is of the same type as the one specified in the input mask. If the user enters an invalid character, the control rejects the character. The control can distinguish between numeric and alphabetic characters for validation, but cannot validate for valid content, such as the correct month or time of day.
The input mask can consist of the following characters:
Character | Description |
---|---|
# | Digit placeholder. Character must be numeric (0-9) and entry is required. |
. | Decimal placeholder. The actual character used is the one specified as the decimal placeholder by the system's international settings. This character is treated as a literal for masking purposes. |
, | Thousands separator. The actual character used is the one specified as the thousands separator by the system's international settings. This character is treated as a literal for masking purposes. |
: | Time separator. The actual character used is the one specified as the time separator by the system's international settings. This character is treated as a literal for masking purposes. |
/ | Date separator. The actual character used is the one specified as the date separator by the system's international settings. This character is treated as a literal for masking purposes. |
\ | Treat the next character in the mask string as a literal. This allows you to include the '#', '&', 'A', and '?' as well as other characters with special meanings in the mask. This character is treated as a literal for masking purposes. |
& | Character placeholder |
> | Convert all the characters that follow to uppercase. |
< | Convert all the characters that follow to lowercase. |
A | Alphanumeric character placeholder. For example: a-z, A-Z, or 0-9. Character entry is required. |
a | Alphanumeric character placeholder. For example: a-z, A-Z, or 0-9. Character entry is not required. |
9 | Digit placeholder. Character must be numeric (0-9) but entry is not required. |
- | Minus sign when followed by a number section defined by series of 'n's (like in "-nn,nnn.nn") indicates that negative numbers are allowed. When not followed by a series of 'n's, it's taken as a literal. Minus sign will only be shown when the number is actually negative. |
+ | Plus sign when followed by a number section defined by series of 'n's (like in "-nn,nnn.nn") indicates that negative numbers are allowed. However, it differs from '-' in the respect that it will always show a '+' or a '-' sign depending on whether the number is positive or negative. |
C | Character or space placeholder. Character entry is not required. This operates exactly like the '&' placeholder, and ensures compatibility with Microsoft Access. |
? | Letter placeholder. For example: a-z or A-Z. Character entry is not required. |
n | Digit placeholder |
mm,dd,yy | Combination of these three special tokens can be used to define a date mask. mm for month, dd for day, yy for two digit year and yyyy for four digit year. Examples: mm/dd/yyyy, yyyy/mm/dd, mm/yy. |
hh,mm,ss,tt | Combination of these four special tokens can be used to define a time mask. hh for hour, mm for minute, ss for second, and tt for AM/PM. Examples: hh:mm, hh:mm tt, hh:mm:ss. |
{date} | {date} token is a place holder for short date input. The date mask is derived using the underlying culture settings. |
{time} | {time} token is a place holder for short time input. Short time typically does not include the seconds portion. The time mask is derived using the underlying culture settings. |
{longtime} | {longtime} token is a place holder for long time input. Long time typically includes the seconds portion. The long time mask is derived using the underlying culture settings. |
{double:i.f:c} | {double:i.f:c} is a place holder for a mask that allows floating point input where i and f in i.f specify the number of digits in the integer and fraction portions respectively. The :c portion of the mask is optional and it specifies that the inputting of the value should be done continuously across fraction and integer portions. For example, with :c in the mask, in order to enter 12.34 the user types in "1234". Notice that the decimal separator character is missing. This alleviates the user from having to type in the decimal separator. |
{double:-i.f:c} | Same as {double:i.f:c} except this allows negative numbers. |
{currency:i.f:c} | Same as {double:i.f:c} except the mask is constructed based on currency formatting information of the underlying format provider or the culture. It typically has the currency symbol and also displays the group characters. |
{currency:-i.f:c} | Same as {currency:i.f:c} except this allows negative numbers. |
Literal | All other symbols are displayed as literals; that is, they appear as themselves. |
NOTE: You can also escape the mask with {LOC} character sequence to indicate that symbols in the following table should be mapped to the associated symbols in the underlying culture settings.
Character | Description |
---|---|
$ | Currency symbol |
/ | Date separator |
: | Time separator |
, | Thousands separator |
. | Decimal separator |
+ | Positive sign |
- | Negative sign |
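To make the mask rules more concrete, the sketch below translates a small subset of mask characters into a regular expression and checks a value against it. It is an illustration only, not the masked edit control's implementation. It covers the required-entry placeholders `#`, `A` and `&` plus the `\` escape. Separators are treated as fixed characters rather than being mapped to the system's culture settings. Optional placeholders, case conversion, the `n` number sections, brace tokens and date/time tokens such as `mm` or `hh` are not modeled. All names in the code are hypothetical.

```python
import re

# Required-entry placeholders from the table, mapped to regex fragments.
PLACEHOLDERS = {"#": "[0-9]", "A": "[A-Za-z0-9]", "&": "."}
# Mask characters with special meaning that this sketch does not model.
UNSUPPORTED = set("9aC?<>n{}")


def mask_to_regex(mask):
    """Translate a simple input mask into a compiled regular expression."""
    parts = []
    chars = iter(mask)
    for ch in chars:
        if ch == "\\":
            # Backslash: the next mask character is a literal.
            literal = next(chars, None)
            if literal is None:
                raise ValueError("mask ends with a dangling backslash")
            parts.append(re.escape(literal))
        elif ch in PLACEHOLDERS:
            parts.append(PLACEHOLDERS[ch])
        elif ch in UNSUPPORTED:
            raise ValueError(f"mask character {ch!r} is not handled here")
        else:
            # Everything else appears as itself.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_mask(mask, value):
    """True when the whole value fits the mask."""
    return mask_to_regex(mask).fullmatch(value) is not None
```

For example, the mask `###-##-####` accepts `123-45-6789` and rejects `123-45-678`. As noted above, a mask checks character types only, so a date-shaped mask would still accept an impossible month.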
Summary
After configuring the fields for the document type, you may optionally define one or more Lookups. Please proceed to the next step of this guide where we will demonstrate how to configure PSIcapture Mailroom to make use of external data sources.
Step 3: Lookups Tab
Typical businesses collect, produce and maintain data across various systems that can often be leveraged to reduce or eliminate the manual keying necessary when indexing documents. Lookups are an optional feature within each document type that allow users to tap into this data and integrate it into this thin client indexing solution.
Lookups use shared connections and queries from the PSIcapture Mailroom organization to connect to external data sources. If you have not yet configured any connections, or are unfamiliar with connections and queries within PSIcapture Mailroom, please refer to the Connections article.
Each lookup uses key fields, return fields and lookup options to shape the query that is built, as well as how the results are handled by the indexing client application. During indexing, index operators provide metadata values for lookup key fields by using imported values, default values or manually keyed values. The system then builds and executes the lookup query and automatically populates mapped return fields with the query results, using the defined lookup options to determine which records to apply.
Adding or Editing a Lookup
To add a new lookup, click the Add button at the top right corner of the lookup table. To edit an existing lookup, select the lookup you wish to edit and click the pencil icon. Both adding and editing a lookup share the same dialog, which contains the following settings:
- Lookup Name - Enter a unique name for the lookup. The lookup name is only required to be unique within the current document profile, and is only used for display and identification purposes.
- Connection - Select an available connection from the list to use for querying external data. After selecting a connection, the list of queries available will be populated.
- Query - Select an available query to use as the key field return. NOTE: The field mapping portion of the dialog will only be displayed after selecting a valid connection and query.
- Max Records - The maximum records option specifies the maximum number of result records the lookup query execution will allow. Where applicable, data source specific extensions will be used to limit the query results. For data sources that do not support a limit extension, the results will be queried at the PSIcapture Mailroom server and limited manually before transporting the result set to the client application.
- Record Selection Options - The record selection options setting determines how PSIcapture Mailroom should handle lookup queries that result in more than one record being returned from the external data source. The following options are available:
- Apply the first returned record - The first returned record from the query results will be used to update the metadata. This is the default behavior.
- Select the record to apply - If the query results contain more than one record, a dialog will be shown to the index operator where they must select the record that should be applied. When the query returns only one record, the data from the single result record will be applied.
- Create an index record for each returned record - Creates an index record on the document corresponding to each returned result record. Any existing records on the document will be reused before adding new records, and the result records' metadata will be applied to the document records in the order the results are returned. This option will be ignored when multiple records are not enabled on the document type.
- Lookup Execution Options - The lookup execution options let users modify the conditions under which the lookup query will be run. This option is useful for controlling the execution of multiple lookups that share one or more document index fields. Users can set up multiple lookups that share the same or overlapping field sets for their key and mapped value fields, with each lookup pointing to a different external data source. The lookups will be run in the order they are defined in the field mappings table. Using the options defined here, users can ensure that the lookups will only be run until one of them provides a valid result record.
- Execute Always - The lookup will run every time it is triggered, regardless of the metadata content of the return fields. This is the default behavior.
- Execute if all returned fields are blank - The lookup will run when all return field values are blank.
- Execute if one or more return fields are blank - The lookup will run when any of the return field values are blank.
- Wildcard Handling Options - This option controls the query that the lookup will build before executing against the external data source. The available options and their effect are listed below.
Wildcard characters supported by the lookup query builder include:
- Asterisk ( * )
- Percent ( % )
Wildcard characters in key metadata fields will be converted into the appropriate database-specific character and syntax by the lookup query builder.
- None - Wildcard characters included in key field metadata will not be processed. This is the default behavior.
- Allowed - Wildcard characters included in key field metadata will be included in the resulting query and processed against the external data source. Users may execute explicit searches by entering key values into fields, or execute a wildcard enabled search by entering a partial value with a wildcard character.
- Implied - All key field metadata values will be modified to be queried as if a wildcard character was included. This enables users to always execute wildcard enabled searches by simply entering a partial value into any key field.
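The execution and wildcard options above can be sketched in Python as two small functions. This is a hedged illustration, not the lookup query builder itself. The function names and option strings are hypothetical, SQL `LIKE` syntax with `?` parameters stands in for whatever syntax the target data source uses, and placing the implied wildcard at the end of the value (a prefix search) is an assumption.

```python
def should_execute(option, return_values):
    """option: "always", "all_blank" or "any_blank"."""
    blanks = [v in (None, "") for v in return_values]
    if option == "all_blank":
        return all(blanks)
    if option == "any_blank":
        return any(blanks)
    return True  # "always" is the default behavior


def build_key_condition(column, value, mode="none"):
    """Return (sql_fragment, parameter) for one key field.

    mode is "none", "allowed" or "implied"; SQL LIKE syntax is assumed.
    """
    has_wildcard = "*" in value or "%" in value
    if mode == "allowed" and has_wildcard:
        return f"{column} LIKE ?", value.replace("*", "%")
    if mode == "implied":
        # Assumption: the implied wildcard is appended (prefix search).
        pattern = value.replace("*", "%")
        if not pattern.endswith("%"):
            pattern += "%"
        return f"{column} LIKE ?", pattern
    # "none", or "allowed" without wildcards: exact comparison.
    return f"{column} = ?", value
```

Chaining several lookups with the "all blank" option means a later lookup runs only when the earlier ones left the return fields empty, which gives the fallback behavior described above. The column name here comes from configuration rather than operator input, and the keyed value is always passed as a query parameter.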
Field Mappings
Field mappings take a field defined by the document type and map it to a field defined by the Connections. Multiple fields from the document type can be mapped, but each field can only be mapped to a single index field. Each field mapping also has several options available that control how the metadata is used to build a lookup query, and upon receiving a result set, how the result metadata is applied to the document type fields.
Field mappings in PSIcapture Mailroom may be defined as either a key field mapping, a return field mapping, or both.
In cases where a field mapping is both a key field and a return field, the value in the linked index field will be used to build the lookup query, and any values returned for the linked query field will be used to overwrite the original key field value once a result record is chosen.
- Mapping Type - This option allows users to map the field to the document profile or to a specific Index field value.
- Index Field - Select the index field to map the query to. How the field is used within the lookup operation is dependent on the key and return field options selected below.
- Query Field - Select the query field to map the index field to. How the field is used within the lookup operation is dependent on the key and return field options selected below.
- Key - Key fields are used to build the query portion of the lookup, that will be executed against the external data source. Key fields may be defined using the following options:
- None - The index field should not be used as a filtering key field in the lookup query. This is the default behavior.
- Required - The index field is a required part of the lookup query and will always be included.
- Optional - The index field should be used as a filtering key field in the lookup query, if a value is supplied; otherwise the field will not be included in the lookup query.
Required vs. Optional Key Fields
A lookup will not be triggered unless one or more key field values have been supplied by the index fields via a field mapping on the lookup.
When a lookup consists of all required key field mappings, all key fields must have a value assigned before the lookup will be triggered.
When a lookup consists of mixed required and optional key field mappings, all required fields must have a value assigned before the lookup will be triggered, and any optional fields that include a value will be included in the query.
When a lookup contains only optional key field mappings, the lookup will be triggered anytime at least one of the mapped key fields has a value. At the time the query is run, any optional key fields with values will be included in the query, and those optional key fields without values will be ignored.
- Return - Return fields are used to receive metadata queried from the records returned by the external data source. Return fields may be defined using the following options:
- None - The field should not be used as a return field in the lookup query. This is the default behavior.
- Map to Field - The field should be used as a return field, with the value from the result field specified by the Query Field setting used to populate the document metadata field.
- Map to Field, if Field is Blank - The field should be used as a return field, just as described for the Map to Field setting, however the metadata field will only be populated if it contains no value.
- Display Only - The field will be queried by the lookup query and returned to the client application, but it will not be mapped to any document index fields, and will only be used for display when record selection is required.
- Trim - The trim field options settings enables removal of white space from query return fields, before the return field data is applied to the document index field. The field data is trimmed after the query is run and before the data is sent from the server to the client application. The data in the external data source is never modified. Available trim field options are described below:
- None - Use the return field data as it is returned, with no white space trimming enabled. This is the default behavior.
- Trim Start - Remove all white space from the beginning of the returned field value.
- Trim End - Remove all white space from the end of the returned field value.
- Trim Both - Remove all white space from both the beginning and the end of the returned field value.
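The required and optional key field rules, together with the trim options, can be expressed compactly. The Python sketch below is one reading of those rules under stated assumptions: the function names, mode strings and dictionary layout are hypothetical, and a blank value is taken to mean `None` or an empty string.

```python
def lookup_triggered(key_modes, values):
    """Decide whether a lookup should run.

    key_modes maps index field name -> "required" or "optional".
    values maps index field name -> current field value.
    """
    def has_value(field):
        return values.get(field) not in (None, "")

    required = [f for f, m in key_modes.items() if m == "required"]
    optional = [f for f, m in key_modes.items() if m == "optional"]
    if required:
        # Every required key must be filled; optional keys never block.
        return all(has_value(f) for f in required)
    # Only optional keys: at least one of them needs a value.
    return any(has_value(f) for f in optional)


def query_keys(key_modes, values):
    """Key fields that take part in the query: those that have a value."""
    return [f for f in key_modes if values.get(f) not in (None, "")]


# Trim options applied to a returned value before it reaches the field.
TRIMMERS = {
    "none": lambda s: s,
    "start": str.lstrip,
    "end": str.rstrip,
    "both": str.strip,
}
```

With `{"Vendor": "required", "PO": "optional"}`, the lookup waits until Vendor has a value, and PO narrows the query only when the operator has keyed it.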
Removing a Lookup
To remove a lookup from the document profile, select the lookup to delete and click on the "x" icon to the right of the lookup name.
Removing a lookup is a permanent and unrecoverable operation. If a user wishes to restore the lookup to the document profile, it will need to be recreated. NOTE: The Connection used for the lookup will not be deleted, and can be reused.
Lookup Order
The order lookups are displayed on the lookups tab is also the order the lookups will be executed as the index operator moves through the fields providing key field values. If a user wishes to reorder the lookups for the document profile, select a lookup to move and use the Up and Down buttons in the upper right corner of the table.
Repeat this for each lookup that needs to be reordered, and save the document profile.