Cloud NiFi Processors

Block Types

Part of the AWS Textract processor family

Each block returned from the Textract API has a specific type that conveys what the block is or represents.

For text detection operations (performed by the DetectDocumentText processor), a block can take one of the following types:

| Identifier | Description |
| --- | --- |
| PAGE | A block whose relationships contain a list of blocks (of type LINE) that are detected on a document page. |
| WORD | A word detected on a document page. A word is one or more ISO basic Latin script characters that are not separated by spaces. |
| LINE | A string of tab-delimited, contiguous words that are detected on a document page. |
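
The PAGE, LINE, and WORD relationships described above can be walked with a short script. The following is a minimal sketch, assuming the processor's output is JSON shaped like a Textract response: a `Blocks` array whose entries carry `Id`, `BlockType`, `Text`, and `Relationships` fields. The `sample` data and the helper names `child_ids` and `lines_by_page` are hypothetical and exist only for illustration.

```python
# Hypothetical sample shaped like a Textract text-detection response.
sample = {
    "Blocks": [
        {"Id": "p1", "BlockType": "PAGE",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1", "l2"]}]},
        {"Id": "l1", "BlockType": "LINE", "Text": "Hello world",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "l2", "BlockType": "LINE", "Text": "Goodbye",
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Hello"},
        {"Id": "w2", "BlockType": "WORD", "Text": "world"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Goodbye"},
    ]
}


def child_ids(block):
    """Return the Ids listed in the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids", []))
    return ids


def lines_by_page(response):
    """For each PAGE block, list (line text, [word texts]) for its LINE children."""
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}
    pages = []
    for block in blocks:
        if block.get("BlockType") != "PAGE":
            continue
        lines = []
        for line_id in child_ids(block):
            line = by_id.get(line_id)
            if line is None or line.get("BlockType") != "LINE":
                continue
            words = [
                by_id[w].get("Text", "")
                for w in child_ids(line)
                if w in by_id and by_id[w].get("BlockType") == "WORD"
            ]
            lines.append((line.get("Text", ""), words))
        pages.append(lines)
    return pages


pages = lines_by_page(sample)
for number, lines in enumerate(pages, start=1):
    for text, words in lines:
        print(f"page {number}: {text!r} -> {words}")
```

Indexing the blocks by `Id` first keeps each relationship lookup a dictionary access, which matters because relationships refer to other blocks only by identifier.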

For text analysis operations (performed by the AnalyzeDocument processor), a block can take one of the following types:

| Identifier | Description |
| --- | --- |
| PAGE | A block whose relationships contain a list of blocks (of type LINE) that are detected on a document page. |
| WORD | A word detected on a document page. A word is one or more ISO basic Latin script characters that are not separated by spaces. |
| LINE | A string of tab-delimited, contiguous words that are detected on a document page. |
| TABLE | A table that is detected on a document page. A table is a grid-based information structure with two or more rows or columns, with a cell span of one row and one column each. |
| CELL | A cell within a detected table. The cell is the parent of the block that contains the text in the cell. |
| SELECTION_ELEMENT | A selection element such as a radio button or a check box that is detected on a document page. |
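
The TABLE, CELL, and SELECTION_ELEMENT types above can be combined to rebuild a table as rows of text. The following is a minimal sketch, assuming the processor's output is JSON shaped like a Textract response, where CELL blocks carry 1-based `RowIndex` and `ColumnIndex` fields and SELECTION_ELEMENT blocks carry a `SelectionStatus` of `SELECTED` or `NOT_SELECTED`. The `sample` data, the helper names, and the `[x]` / `[ ]` rendering are hypothetical choices made for illustration.

```python
# Hypothetical sample shaped like a Textract text-analysis response.
sample = {
    "Blocks": [
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Approved"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
        {"Id": "s1", "BlockType": "SELECTION_ELEMENT",
         "SelectionStatus": "SELECTED"},
    ]
}


def child_ids(block):
    """Return the Ids listed in the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids", []))
    return ids


def cell_text(cell, by_id):
    """Join a cell's WORD children; render check boxes as [x] or [ ]."""
    parts = []
    for cid in child_ids(cell):
        child = by_id.get(cid)
        if child is None:
            continue
        if child.get("BlockType") == "WORD":
            parts.append(child.get("Text", ""))
        elif child.get("BlockType") == "SELECTION_ELEMENT":
            selected = child.get("SelectionStatus") == "SELECTED"
            parts.append("[x]" if selected else "[ ]")
    return " ".join(parts)


def tables_as_rows(response):
    """Return each TABLE block as a list of rows, each row a list of cell strings."""
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}
    result = []
    for block in blocks:
        if block.get("BlockType") != "TABLE":
            continue
        cells = [
            by_id[c] for c in child_ids(block)
            if c in by_id and by_id[c].get("BlockType") == "CELL"
        ]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for c in cells:
            # RowIndex and ColumnIndex are 1-based in the response.
            grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c, by_id)
        result.append(grid)
    return result


rows = tables_as_rows(sample)
for table in rows:
    for row in table:
        print(" | ".join(row))
```

Cells with no text children stay as empty strings, so the grid keeps its rectangular shape even when a table has blank cells.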


Last updated 5 years ago
