DetectDocumentText Processor

Part of the AWS Textract processor family.
The DetectDocumentText processor extracts text from a document, which can be either an image or a PDF.

Properties

All of our Textract processors also include these common properties.
This processor does not have any unique properties outside of the common ones.

Data Output

If the Destination property is set to flowfile-attribute, the output of this processor is written to the FlowFile's ocr.DetectedText attribute, which is created if it is not already present.
Output Structure

| Field Name | Data Type | Description |
|---|---|---|
| blocks | array of Block | The list of blocks returned from the API |

Relevant Data Structures

Block

| Field Name | Data Type | Description |
|---|---|---|
| text | string | The text in this block |
| confidence | float | How confident the API is in its response |
| id | string | The UUID of this block. Can be used to cross-reference relationships between blocks |
| page | int | The page of the document on which this block resides |
| columnIndex | int | |
| columnSpan | int | |
| rowIndex | int | |
| rowSpan | int | |
| type | string (BlockType) | The kind of block |
| geometry | Geometry | The position and size of the block |
| relationships | array of Relationship | The relationships this block has to others |
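
The confidence and type fields are commonly used together to discard low-quality results. The following is a minimal sketch, assuming the blocks have already been parsed into Python dictionaries and that confidence is reported on a 0 to 100 scale, as the values in the example output suggest. The function name confident_text and the default threshold of 90.0 are illustrative choices, not part of the processor.

```python
from typing import Any


def confident_text(
    blocks: list[dict[str, Any]],
    min_confidence: float = 90.0,  # assumed 0-100 scale, as in the example output
    block_type: str = "LINE",
) -> list[str]:
    """Return the text of blocks of one type whose confidence meets the threshold."""
    return [
        block["text"]
        for block in blocks
        if block.get("type") == block_type
        and block.get("confidence", 0.0) >= min_confidence
        and block.get("text")  # skip blocks that carry no text
    ]
```

A suitable threshold depends on the quality of the scanned documents, so it is worth tuning against a sample of real output.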

Geometry

| Field Name | Data Type | Description |
|---|---|---|
| x | float | The X position of the block on the page |
| y | float | The Y position of the block on the page |
| width | float | The width of the block |
| height | float | The height of the block |
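
To draw or crop a block on a rendered page, the geometry usually has to be converted to pixels. The sketch below assumes that x, y, width and height are fractions of the page dimensions (the example output only contains values between 0 and 1) and that x and y mark the top-left corner of the block. Neither assumption is stated in this document, and the helper name to_pixel_box is hypothetical.

```python
def to_pixel_box(
    geometry: dict[str, float],
    page_width: int,
    page_height: int,
) -> tuple[int, int, int, int]:
    """Convert a Geometry dict to a (left, top, right, bottom) pixel box.

    Assumes normalized coordinates with the origin at the top-left of the page.
    """
    left = round(geometry["x"] * page_width)
    top = round(geometry["y"] * page_height)
    right = round((geometry["x"] + geometry["width"]) * page_width)
    bottom = round((geometry["y"] + geometry["height"]) * page_height)
    return left, top, right, bottom
```

For a block covering the middle half of the page width, starting halfway down a 1000 by 800 pixel page:

```python
box = to_pixel_box({"x": 0.25, "y": 0.5, "width": 0.5, "height": 0.25}, 1000, 800)
# box == (250, 400, 750, 600)
```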

Relationship

| Field Name | Data Type | Description |
|---|---|---|
| type | string (either VALUE or CHILD) | The kind of relationship |
| ids | array of strings | The list of block UUIDs that are connected via this relationship |
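
Because a relationship only stores UUIDs, following one means looking each id up among the returned blocks. The sketch below builds an index by id and then resolves the relationships of a given type. It reads the relationships key as it appears in the example output. The function names index_by_id and related_blocks are illustrative, and ids that do not match any returned block are skipped rather than treated as errors.

```python
from typing import Any

Block = dict[str, Any]


def index_by_id(blocks: list[Block]) -> dict[str, Block]:
    """Map each block's UUID to the block itself for constant-time lookups."""
    return {block["id"]: block for block in blocks}


def related_blocks(
    block: Block,
    by_id: dict[str, Block],
    relationship_type: str = "CHILD",
) -> list[Block]:
    """Return the blocks referenced by this block's relationships of one type."""
    found = []
    for relationship in block.get("relationships", []):
        if relationship.get("type") != relationship_type:
            continue
        for block_id in relationship.get("ids", []):
            if block_id in by_id:  # ignore ids missing from the output
                found.append(by_id[block_id])
    return found
```

Building the index once and reusing it avoids rescanning the full block list for every relationship.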
Example Output

    {
      "output": {
        "blocks": [
          {
            "relationships": [],
            "confidence": 99.35694,
            "geometry": {
              "width": 0.2716,
              "height": 0.02702,
              "x": 0.36377,
              "y": 0.07574
            },
            "text": "Spirit Game Script",
            "id": "e63c08cf-0bef-4c6d-ac04-0e250e254229",
            "page": 1,
            "type": "LINE"
          },
          {
            "relationships": [
              {
                "ids": [
                  "a5ade6e3-a368-49fe-b338-7a8a4fbe9058",
                  "10e1b516-02c3-4eaf-a216-65efc12af6a7",
                  "2f44eedd-651b-4ac9-b42a-744bd0dfcbe1",
                  "fbb0d176-f9fb-4c48-8a5a-4d09252f6d17"
                ],
                "type": "CHILD"
              }
            ],
            "confidence": 99.51574,
            "geometry": {
              "width": 0.37119,
              "height": 0.18922,
              "x": 0.321415,
              "y": 0.251234
            },
            "text": "Courageous, pleading and clear.",
            "id": "7396c015-e0aa-49e5-8d05-3cf9a7430539",
            "page": 1,
            "type": "LINE"
          },
          // ... plus potentially many more entries!
        ]
      }
    }
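
Output shaped like the example above can be reduced to plain text by keeping only the LINE blocks and ordering them by page and position. This is a rough sketch, assuming the output is available as a JSON string with the same top-level output wrapper and without the elision comment shown in the example. The function name document_text is hypothetical, and sorting by y then x is a simple reading-order heuristic that will not suit multi-column layouts.

```python
import json


def document_text(raw_json: str) -> str:
    """Join the text of all LINE blocks in approximate reading order."""
    blocks = json.loads(raw_json)["output"]["blocks"]
    lines = [block for block in blocks if block.get("type") == "LINE"]
    # Order by page first, then top-to-bottom, then left-to-right.
    lines.sort(
        key=lambda block: (
            block.get("page", 1),
            block["geometry"]["y"],
            block["geometry"]["x"],
        )
    )
    return "\n".join(block["text"] for block in lines)
```

Applied to the two blocks in the example, this would yield "Spirit Game Script" followed by "Courageous, pleading and clear." on separate lines, since the first block has the smaller y value.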