Block Types

Part of the AWS Textract processor family

Each block returned from the Textract API has a specific type that conveys what the block is or represents.

For text detection operations (done by the DetectDocumentText processors), a block can take one of the following types:

| Identifier | Description |
| --- | --- |
| PAGE | A block whose relationships contain a list of blocks (of type LINE) that are detected on a document page. |
| WORD | A word detected on a document page. A word is one or more ISO basic Latin script characters that are not separated by spaces. |
| LINE | A string of tab-delimited, contiguous words that are detected on a document page. |
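
As a minimal sketch of how these three types relate, the snippet below groups blocks by type and follows the relationships of a PAGE block to its LINE blocks. It assumes the processor output keeps the Textract response shape (a `Blocks` list whose items carry `Id`, `BlockType`, `Text`, and `Relationships` entries of type `CHILD`). The sample data is hand-written for illustration, and the helper names `child_ids`, `group_by_type`, and `lines_per_page` are hypothetical.

```python
from collections import defaultdict

# Hand-written sample imitating the shape of a text detection response.
# Illustrative data only, not real Textract output.
sample_response = {
    "Blocks": [
        {"Id": "p1", "BlockType": "PAGE",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
        {"Id": "l1", "BlockType": "LINE", "Text": "Hello world",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Hello"},
        {"Id": "w2", "BlockType": "WORD", "Text": "world"},
    ]
}


def child_ids(block):
    """Return the IDs listed in the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships") or []:
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def group_by_type(blocks):
    """Map each block type (PAGE, LINE, WORD, ...) to its blocks."""
    grouped = defaultdict(list)
    for block in blocks:
        grouped[block["BlockType"]].append(block)
    return dict(grouped)


def lines_per_page(blocks):
    """Map each PAGE block ID to the text of its child LINE blocks."""
    by_id = {block["Id"]: block for block in blocks}
    result = {}
    for page in blocks:
        if page["BlockType"] != "PAGE":
            continue
        result[page["Id"]] = [
            by_id[cid]["Text"]
            for cid in child_ids(page)
            if by_id[cid]["BlockType"] == "LINE"
        ]
    return result


grouped = group_by_type(sample_response["Blocks"])
page_lines = lines_per_page(sample_response["Blocks"])
```

Indexing the blocks by `Id` first keeps each relationship lookup cheap, which matters on pages with many words.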

For text analysis operations (done by the AnalyzeDocumentText processors), a block can take one of the following types:

| Identifier | Description |
| --- | --- |
| PAGE | A block whose relationships contain a list of the child blocks that are detected on a document page. |
| WORD | A word detected on a document page. A word is one or more ISO basic Latin script characters that are not separated by spaces. |
| LINE | A string of tab-delimited, contiguous words that are detected on a document page. |
| TABLE | A table that is detected on a document page. A table is a grid-based information structure with two or more rows or columns, in which each cell spans one row and one column. |
| CELL | A cell within a detected table. The cell is the parent of the block that contains the text in the cell. |
| SELECTION_ELEMENT | A selection element, such as a radio button or a check box, that is detected on a document page. |
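
The parent and child links between TABLE, CELL, and the blocks inside a cell can be sketched as follows. This is an illustration under assumptions: it presumes the processor output keeps the Textract block fields `RowIndex` and `ColumnIndex` on CELL blocks and `SelectionStatus` on SELECTION_ELEMENT blocks. The sample blocks are hand-written, and the helpers `child_ids`, `cell_text`, and `table_to_rows` are hypothetical names.

```python
# Hand-written sample imitating the shape of text analysis blocks.
# Illustrative data only, not real Textract output.
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Approved"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT",
     "SelectionStatus": "SELECTED"},
]


def child_ids(block):
    """Return the IDs listed in the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships") or []:
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def cell_text(cell, by_id):
    """Join the words in a cell; render selection elements as [x] or [ ]."""
    parts = []
    for cid in child_ids(cell):
        child = by_id[cid]
        if child["BlockType"] == "WORD":
            parts.append(child["Text"])
        elif child["BlockType"] == "SELECTION_ELEMENT":
            selected = child["SelectionStatus"] == "SELECTED"
            parts.append("[x]" if selected else "[ ]")
    return " ".join(parts)


def table_to_rows(table, by_id):
    """Rebuild a TABLE block as a list of rows of cell text."""
    cells = {}
    for cid in child_ids(table):
        cell = by_id[cid]
        if cell["BlockType"] == "CELL":
            key = (cell["RowIndex"], cell["ColumnIndex"])
            cells[key] = cell_text(cell, by_id)
    if not cells:
        return []
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    # Row and column indexes are 1-based; missing cells become empty strings.
    return [
        [cells.get((row, col), "") for col in range(1, n_cols + 1)]
        for row in range(1, n_rows + 1)
    ]


by_id = {block["Id"]: block for block in sample_blocks}
rows = table_to_rows(by_id["t1"], by_id)
```

Because a CELL only points to its content through relationships, the cell text has to be assembled from the child blocks rather than read from the CELL itself.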
