Block Types
Part of the AWS Textract processor family
Each block returned by the Textract API has a specific type that conveys what the block represents.
For text detection operations (done by the DetectDocumentText processors), a block can take one of the following types:
Identifier
Description
PAGE
A block whose relationships contain a list of blocks (of type LINE) that are detected on a document page.
WORD
A word detected on a document page. A word is one or more ISO basic Latin script characters that are not separated by spaces.
LINE
A string of tab-delimited, contiguous words that are detected on a document page.
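To make the relationships between these types concrete, here is a minimal Python sketch that groups blocks by type and follows a PAGE block's CHILD relationships to its LINE blocks. It assumes the response shape of the Textract DetectDocumentText API (a Blocks list whose entries carry Id, BlockType, Text and Relationships). The sample response is hand-written and trimmed (real blocks also carry Geometry, Confidence and other fields), and the helpers child_ids, group_by_type and page_lines are hypothetical, not part of the processors or of any AWS SDK.

```python
from collections import defaultdict

# Hand-written, trimmed response in the shape of a DetectDocumentText result.
# A real one would come from boto3.client("textract").detect_document_text(...).
response = {
    "Blocks": [
        {"Id": "p1", "BlockType": "PAGE",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
        {"Id": "l1", "BlockType": "LINE", "Text": "Hello world",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Hello"},
        {"Id": "w2", "BlockType": "WORD", "Text": "world"},
    ]
}


def child_ids(block):
    """Return the Ids listed in a block's CHILD relationships (hypothetical helper)."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def group_by_type(blocks):
    """Map each BlockType (PAGE, LINE, WORD) to the blocks of that type."""
    groups = defaultdict(list)
    for block in blocks:
        groups[block["BlockType"]].append(block)
    return groups


def page_lines(blocks):
    """For each PAGE block, collect the text of the LINE blocks it references."""
    by_id = {b["Id"]: b for b in blocks}
    pages = []
    for page in group_by_type(blocks)["PAGE"]:
        lines = [by_id[i]["Text"] for i in child_ids(page)
                 if by_id[i]["BlockType"] == "LINE"]
        pages.append(lines)
    return pages


blocks = response["Blocks"]
print({t: len(bs) for t, bs in group_by_type(blocks).items()})
print(page_lines(blocks))
```

Running it prints {'PAGE': 1, 'LINE': 1, 'WORD': 2} followed by [['Hello world']]. Indexing blocks by Id once keeps each relationship lookup constant-time, which matters on multi-page documents with thousands of WORD blocks.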
For text analysis operations (done by the AnalyzeDocumentText processors), a block can take one of the following types:
Identifier
Description
PAGE
A block whose relationships contain a list of blocks (of type LINE) that are detected on a document page.
WORD
A word detected on a document page. A word is one or more ISO basic Latin script characters that are not separated by spaces.
LINE
A string of tab-delimited, contiguous words that are detected on a document page.
TABLE
A table detected on a document page. A table is grid-based information with two or more rows or columns, where each cell spans one row and one column.
CELL
A cell within a detected table. The cell is the parent of the block that contains the text in the cell.
SELECTION_ELEMENT
A selection element, such as a radio button or a check box, detected on a document page.
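The analysis types nest further than the detection types: a TABLE block's CHILD relationships point at CELL blocks, and each CELL points at the WORD (and possibly SELECTION_ELEMENT) blocks inside it. The following minimal sketch rebuilds a table as a row-major grid. It assumes, as in the Textract AnalyzeDocument response, that cells are positioned by 1-based RowIndex and ColumnIndex fields and that selection elements carry a SelectionStatus of SELECTED or NOT_SELECTED. The sample blocks are hand-written, the helper names are hypothetical, the [x]/[ ] rendering is an arbitrary choice, and merged cells (RowSpan or ColumnSpan greater than one) are not handled.

```python
# Hand-written, trimmed AnalyzeDocument-style blocks: a 2x2 table whose last
# cell holds a check box. Real responses also carry Geometry, Confidence, etc.
blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Paid"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Rent"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]


def child_ids(block):
    """Return the Ids listed in a block's CHILD relationships (hypothetical helper)."""
    return [i for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD" for i in rel["Ids"]]


def cell_text(cell, by_id):
    """Join WORD children; render a SELECTION_ELEMENT child as [x] or [ ]."""
    parts = []
    for child in (by_id[i] for i in child_ids(cell)):
        if child["BlockType"] == "WORD":
            parts.append(child["Text"])
        elif child["BlockType"] == "SELECTION_ELEMENT":
            parts.append("[x]" if child["SelectionStatus"] == "SELECTED" else "[ ]")
    return " ".join(parts)


def table_grid(table, by_id):
    """Rebuild a row-major grid of strings from a TABLE block's CELL children."""
    cells = [by_id[i] for i in child_ids(table) if by_id[i]["BlockType"] == "CELL"]
    rows = max(c["RowIndex"] for c in cells)
    cols = max(c["ColumnIndex"] for c in cells)
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for c in cells:
        # RowIndex and ColumnIndex are 1-based, so shift to 0-based list indices.
        grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c, by_id)
    return grid


by_id = {b["Id"]: b for b in blocks}
for table in (b for b in blocks if b["BlockType"] == "TABLE"):
    for row in table_grid(table, by_id):
        print(row)
```

Running it prints ['Item', 'Paid'] followed by ['Rent', '[x]']. Sizing the grid from the largest RowIndex and ColumnIndex, rather than from the number of cells, keeps each cell in its reported position even if the CHILD list is not in row-major order.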