DetectDocumentText Processor

Part of the AWS Textract processor family

Last updated 5 years ago

The DetectDocumentText processor extracts text from a given document, which can be either an image or a PDF.

Properties

All of our Textract processors also include these common properties.

This processor does not have any unique properties outside of the common ones.

Data Output

If the Destination property is set to flowfile-attribute, the processor writes its output to the FlowFile's ocr.DetectedText attribute, creating the attribute if it isn't present.

| Field Name | Data Type | Description |
|---|---|---|
| blocks | array of Block | The list of blocks returned from the API |

Block

| Field Name | Data Type | Description |
|---|---|---|
| text | string | The text in this block |
| confidence | float | How confident the API is in its response |
| id | string | The UUID of this block. Can be used to cross-reference relationships between blocks |
| page | int | The page of the document on which this block resides |
| columnIndex | int | |
| columnSpan | int | |
| rowIndex | int | |
| rowSpan | int | |
| type | string (BlockType) | The kind of block |
| geometry | Geometry | The position and size of the block |
| relationships | array of Relationship | The relationships this block has to others |

Geometry

| Field Name | Data Type | Description |
|---|---|---|
| x | float | The X position of the block on the page |
| y | float | The Y position of the block on the page |
| width | float | The width of the block |
| height | float | The height of the block |
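
The Geometry values in the sample output on this page all fall between 0 and 1, which matches the Textract API's convention of expressing bounding boxes as ratios of the page dimensions. The following is a minimal sketch of converting one Geometry object to pixel coordinates under that assumption. It also assumes that x and y mark the top-left corner of the block. The function name `to_pixels` and the page size used in the example are hypothetical.

```python
def to_pixels(geometry: dict, page_width: int, page_height: int) -> tuple:
    """Convert a Geometry object to a (left, top, right, bottom) pixel box.

    Assumes x, y, width and height are ratios of the page size (0.0 to 1.0)
    and that (x, y) is the top-left corner of the block.
    """
    left = round(geometry["x"] * page_width)
    top = round(geometry["y"] * page_height)
    right = round((geometry["x"] + geometry["width"]) * page_width)
    bottom = round((geometry["y"] + geometry["height"]) * page_height)
    return (left, top, right, bottom)


# Hypothetical block covering the middle half of a 1000 x 800 pixel page image.
box = to_pixels({"x": 0.25, "y": 0.5, "width": 0.5, "height": 0.25}, 1000, 800)
print(box)
```

A box in this form can be passed to most image libraries for cropping or drawing, provided the page image dimensions are known.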

Relationship

| Field Name | Data Type | Description |
|---|---|---|
| type | string (either VALUE or CHILD) | The kind of relationship |
| ids | array of strings | The list of block UUIDs that are connected via this relationship |
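
Because relationships refer to other blocks only by UUID, following one usually means building a lookup table keyed by id first. The sketch below shows one way to collect the children of a block. The key name `relationships` is taken from the sample output on this page. The helper name `children_of`, the shortened ids, and the `WORD` block type (borrowed from the Textract API) are assumptions for illustration.

```python
def children_of(block: dict, blocks: list) -> list:
    """Return the blocks referenced by the CHILD relationships of `block`."""
    # Index every block by id so each lookup is a dictionary access.
    by_id = {b["id"]: b for b in blocks}
    children = []
    for relationship in block.get("relationships", []):
        if relationship.get("type") != "CHILD":
            continue  # skip VALUE relationships
        # Ignore ids that are not present in this result set.
        children.extend(by_id[i] for i in relationship.get("ids", []) if i in by_id)
    return children


# Hypothetical blocks with shortened ids; real ids are UUIDs.
blocks = [
    {"id": "line-1", "type": "LINE", "text": "Spirit Game",
     "relationships": [{"type": "CHILD", "ids": ["word-1", "word-2"]}]},
    {"id": "word-1", "type": "WORD", "text": "Spirit", "relationships": []},
    {"id": "word-2", "type": "WORD", "text": "Game", "relationships": []},
]

print([child["text"] for child in children_of(blocks[0], blocks)])
```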

{
	"output": {
		"blocks": [
			{
				"relationships": [],
				"confidence": 99.35694,
				"geometry": {
					"width": 0.2716,
					"height": 0.02702,
					"x": 0.36377,
					"y": 0.07574
				},
				"text": "Spirit Game Script",
				"id": "e63c08cf-0bef-4c6d-ac04-0e250e254229",
				"page": 1,
				"type": "LINE"
			},
			{
				"relationships": [
					{
						"ids": [
							"a5ade6e3-a368-49fe-b338-7a8a4fbe9058",
							"10e1b516-02c3-4eaf-a216-65efc12af6a7",
							"2f44eedd-651b-4ac9-b42a-744bd0dfcbe1",
							"fbb0d176-f9fb-4c48-8a5a-4d09252f6d17"
						],
						"type": "CHILD"
					}
				],
				"confidence": 99.51574,
				"geometry": {
					"width": 0.37119,
					"height": 0.18922,
					"x": 0.321415,
					"y": 0.251234
				},
				"text": "Courageous, pleading and clear.",
				"id": "7396c015-e0aa-49e5-8d05-3cf9a7430539",
				"page": 1,
				"type": "LINE"
			},
			// ... plus potentially many more entries!
		]
	}
}
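
A downstream consumer will often want just the recognized text in reading order. The following is a minimal sketch, assuming the ocr.DetectedText attribute value has already been read into a string, that it is valid JSON, and that it carries the `output` wrapper shown above. The function name `extract_lines` and the `min_confidence` parameter are hypothetical, and sorting by page, then y, then x is only a rough approximation of reading order.

```python
import json


def extract_lines(payload: str, min_confidence: float = 0.0) -> list:
    """Return LINE block text ordered by page, then top-to-bottom, then left-to-right."""
    blocks = json.loads(payload)["output"]["blocks"]
    lines = [
        block for block in blocks
        if block.get("type") == "LINE"
        and block.get("confidence", 0.0) >= min_confidence
    ]
    lines.sort(key=lambda b: (b["page"], b["geometry"]["y"], b["geometry"]["x"]))
    return [block["text"] for block in lines]


# Trimmed copy of the sample output above (ids and relationships omitted),
# listed out of order to show the sort.
sample = json.dumps({
    "output": {
        "blocks": [
            {"type": "LINE", "page": 1, "confidence": 99.51574,
             "text": "Courageous, pleading and clear.",
             "geometry": {"x": 0.321415, "y": 0.251234,
                          "width": 0.37119, "height": 0.18922}},
            {"type": "LINE", "page": 1, "confidence": 99.35694,
             "text": "Spirit Game Script",
             "geometry": {"x": 0.36377, "y": 0.07574,
                          "width": 0.2716, "height": 0.02702}},
        ]
    }
})

print(extract_lines(sample))
```

Raising `min_confidence` drops low-confidence lines before they reach later steps in the flow. The sample confidence values suggest a 0 to 100 scale.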
