Indexing considerations

The index object file contains Index Element (IEL) structured fields that identify the location of the tagged groups in the print file. The tags are contained in the Tag Logical Element (TLE) structured fields.

The structured field offset and byte offset values are accurate at the time ACIF creates the output document file. However, if you extract pages or page groups for viewing or printing, you must dynamically create a temporary index object file, derived from the original, that contains the correct offset information for the new file. For example:

  • ACIF processed all the bank statements for six branches by using the account number, statement date, and branch number.
  • The resultant output files were archived by using a system that lets these statements be retrieved based on any combination of these three indexing values.

If you want to view all the bank statements from Branch 1, your retrieval system must be able to extract those statements from the print file that ACIF created (possibly by using the IELs and TLEs in the index object file) and create another document for viewing. This new document needs its own index object file that contains the correct offset information.
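The offset recalculation described above can be sketched in Python. This is a simplified illustration, not the actual IEL or index object file layout: the `GroupEntry` class, its fields, and the `extract_groups` function are hypothetical names introduced here, and the model assumes each indexed group is a contiguous byte range in the document file.

```python
from dataclasses import dataclass


@dataclass
class GroupEntry:
    """Hypothetical stand-in for one indexed group (not the real IEL layout)."""
    branch: str
    byte_offset: int   # where the group starts in the document file
    byte_length: int   # size of the group in bytes


def extract_groups(document, index, branch):
    """Copy the groups for one branch into a new document and rebuild offsets."""
    new_document = bytearray()
    new_index = []
    for entry in index:
        if entry.branch != branch:
            continue
        start = entry.byte_offset
        chunk = document[start:start + entry.byte_length]
        # The group now begins wherever the new document currently ends,
        # so the original offset is no longer valid and must be replaced.
        new_index.append(GroupEntry(entry.branch, len(new_document), len(chunk)))
        new_document += chunk
    return bytes(new_document), new_index


# Three groups: Branch 1, Branch 2, Branch 1 (placeholder data).
document = b"AAAA" + b"BBBBBB" + b"CCC"
index = [GroupEntry("1", 0, 4), GroupEntry("2", 4, 6), GroupEntry("1", 10, 3)]

new_document, new_index = extract_groups(document, index, "1")
for entry in new_index:
    print(entry)
```

In this sketch the second Branch 1 group moves from offset 10 in the original file to offset 4 in the extracted file, which is why the original index object file cannot be reused for the new document.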

Under some circumstances, the indexing that ACIF produces might not be what you expect. For example:

  • If your page definition produces multiple-up output, and the data values you are using for your indexing attributes appear on more than one of the multiple-up subpages, ACIF might produce two indexing tags for the same physical page of output. In this situation, only the first index attribute name appears as a group name when you use AFP Workbench Viewer. To avoid this situation, specify a page definition that formats your data without multiple-up when you submit the indexing job to ACIF.
  • If your input file contains machine carriage control characters, and you use the new page carriage control character as a TRIGGER, the indexing tag created points to the page on which the carriage control character was found, not to the new page created by the carriage control character. This situation happens because machine controls write before they process any action and are, therefore, associated with the page or line on which they appear. Using machine carriage control characters for triggers is not a recommended practice.
  • If your input file contains application-generated separator pages (for example, banner pages), and you want to use data values for your indexing attributes, you can write an Input Data exit program to remove the separator pages. Otherwise, the presence of those pages in the file makes the input data too unpredictable for ACIF to reliably locate the data values. As alternatives to writing an exit program, you can also change your application program to remove the separator pages from its output, or you can use the INDEXSTARTBY parameter to instruct ACIF to start indexing on the first page after the header pages.
  • If you want to use data values for your indexing attributes, but none of the values appear on the first page of each logical document, you can cause ACIF to place an indexing tag on the first page by defining a FIELD parameter with a large enough negative relative record number from the anchor record to page backward to the first page. You do not need to reference this FIELD parameter in an INDEX parameter; once it is defined, the tag that is generated by any INDEX parameter is placed on the first page.
  • If your input file contains Unicode data and you specify EXTENSIONS=IDXCPGID to process the code page identifiers, you must ensure that:
    • The CPGID parameter indicates the code page of the document and the extracted index values, which must be in the same code page.
    • The TRIGGER parameter value and INDEX parameter name are expressed in big endian format in the code page that is specified by the CPGID parameter.
    • The FIELD parameter values are extracted from the document in big endian format.
    • The mask field is not specified on the FIELD parameter unless you are using code page 1208 and only indexing single-byte characters. MASK does not support the multiple-byte characters of code page 1208 (UTF-8).

    The following example shows the ACIF parameters for a document with a code page of 1200.

    Example of ACIF parameters for processing documents with Unicode data

    CC=YES  
    CCTYPE=A  
    CPGID=1200  
    FILEFORMAT=RECORD,401  
    TRIGGER1=*,228,X'0050004100470045',(TYPE=GROUP)   /* P A G E */  
    FIELD1=0,246,10,(TRIGGER=1,BASE=0)  
    FIELD2=0,-76,16,(TRIGGER=1,BASE=TRIGGER)  
    INDEX1=X'0070006100670065',FIELD1,(TYPE=GROUP,BREAK=YES)   /* page */  
    INDEX2=X'006E0061006D0065',FIELD2,(TYPE=GROUP,BREAK=YES)   /* name */  
    EXTENSIONS=IDXCPGID  
    FORMDEF=F1IBMTU3  
    PAGEDEF=P1IBMTU3  
    RESLIB=\acif\reslib2 

    In the example, on the first page, these 10 bytes are extracted in big endian format for FIELD1:

    X'00200020002000200031'  /* 1 */

    and these 16 bytes are extracted for FIELD2:

    X'002000500045004C0053004800320032'  /* PELSH22 */
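The hexadecimal values in the example can be checked with a short Python sketch. It assumes that code page 1200 data in these values is UTF-16 big endian with two bytes per character, which matches the values shown above. The helper names `acif_hex_to_text` and `text_to_acif_hex` are hypothetical and are not part of ACIF.

```python
def acif_hex_to_text(literal):
    """Decode an X'...' hexadecimal literal as UTF-16 big endian text."""
    hex_digits = literal.strip()[2:-1]   # drop the leading X' and trailing '
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def text_to_acif_hex(text):
    """Encode text as a UTF-16 big endian X'...' hexadecimal literal."""
    return "X'" + text.encode("utf-16-be").hex().upper() + "'"


# Values taken from the example parameters and extracted fields.
trigger = acif_hex_to_text("X'0050004100470045'")
field1 = acif_hex_to_text("X'00200020002000200031'")
field2 = acif_hex_to_text("X'002000500045004C0053004800320032'")

# Build the literal for an index name from readable text.
index_name = text_to_acif_hex("page")

print(repr(trigger), repr(field1), repr(field2), index_name)
```

Decoding shows that the extracted field values include leading blanks: FIELD1 is four blanks followed by the digit 1 (5 characters, 10 bytes), and FIELD2 is one blank followed by PELSH22 (8 characters, 16 bytes).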