
AnalyzeDocument Processor

Part of the AWS Textract processor family


Last updated 5 years ago


The AnalyzeDocument processor searches a document for text, forms, and tables. The Feature Types property controls which of these the processor looks for in addition to text.

Properties

All of our Textract processors also include these common properties.

Properties whose names are in bold and italic are required.

  • Feature Types - a dropdown list that controls what the API looks for. It can be set to one of the following:

    • tables - the API will search for text, as well as tables

    • forms - the API will search for text, as well as forms, which are areas where a user would be expected to input information

    • tables-and-forms - the API will search for text, tables, and forms
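The relationship between the three dropdown values can be sketched as a small lookup. This is an illustration only: `FEATURES_BY_SETTING` and `features_for` are hypothetical names, not part of the processor, and the uppercase names `TABLES` and `FORMS` are assumed to match the feature type names of the underlying Textract AnalyzeDocument API.

```python
# Hypothetical lookup: which features each Feature Types dropdown value requests.
# Text is searched for in every case, so it is not listed separately.
FEATURES_BY_SETTING = {
    "tables": ["TABLES"],
    "forms": ["FORMS"],
    "tables-and-forms": ["TABLES", "FORMS"],
}


def features_for(setting):
    """Return the feature list for a dropdown value; raise on an unknown value."""
    try:
        # Return a copy so callers cannot mutate the lookup table.
        return list(FEATURES_BY_SETTING[setting])
    except KeyError:
        raise ValueError(f"unknown Feature Types value: {setting!r}") from None
```

For example, `features_for("tables-and-forms")` yields both features, while any value outside the three listed above raises a `ValueError`.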

Data Output

If the Destination property is set to flowfile-attribute, then the output of this processor will be routed to the FlowFile's ocr.AnalyzedDocument attribute, which will be created if it isn't present.

| Field Name | Data Type | Description |
| --- | --- | --- |
| blocks | array of Block | The list of blocks returned from the API |

Block

| Field Name | Data Type | Description |
| --- | --- | --- |
| text | string | The text in this block |
| confidence | float | How confident the API is in its response |
| id | string | The UUID of this block. Can be used to cross-reference relationships between blocks |
| page | int | The page of the document on which this block resides |
| columnIndex | int | |
| columnSpan | int | |
| rowIndex | int | |
| rowSpan | int | |
| type | string (BlockType) | The kind of block |
| geometry | Geometry | The position and size of the block |
| relationship | array of Relationship | The relationships this block has to others |
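The fields above include columnIndex, columnSpan, rowIndex, and rowSpan without a description. Assuming they give the position of a table cell, as the names suggest, and that indices start at 1 (as they do in the underlying Textract API), a rough sketch of arranging such blocks into rows might look like the following. `cells_to_grid` is a hypothetical helper, and it also assumes a cell's `text` field holds the cell contents.

```python
def cells_to_grid(blocks):
    """Arrange blocks that carry rowIndex/columnIndex into a list of rows.

    Assumptions (not confirmed by the processor docs): indices are 1-based,
    and a spanning cell is stored only at its top-left position.
    """
    cells = [
        b for b in blocks
        if b.get("rowIndex") is not None and b.get("columnIndex") is not None
    ]
    if not cells:
        return []
    # Size the grid so that spanning cells still fit inside it.
    n_rows = max(c["rowIndex"] + (c.get("rowSpan") or 1) - 1 for c in cells)
    n_cols = max(c["columnIndex"] + (c.get("columnSpan") or 1) - 1 for c in cells)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        grid[c["rowIndex"] - 1][c["columnIndex"] - 1] = c.get("text")
    return grid
```

Blocks without row and column indices (such as ordinary lines of text) are ignored, and positions with no matching cell stay `None`.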

Geometry

| Field Name | Data Type | Description |
| --- | --- | --- |
| x | float | The X position of the block on the page |
| y | float | The Y position of the block on the page |
| width | float | The width of the block |
| height | float | The height of the block |
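The geometry values in the processor's sample output all fall between 0 and 1, which suggests they are fractions of the page width and height rather than pixel counts. Under that assumption, and assuming x and y mark the top-left corner of the block, a minimal sketch of converting a Geometry to pixel coordinates could be written as follows. `to_pixel_box` is a hypothetical helper, not part of the processor.

```python
def to_pixel_box(geometry, page_width, page_height):
    """Convert a Geometry dict to (left, top, right, bottom) in pixels.

    Assumes x, y, width, and height are ratios of the page dimensions
    and that (x, y) is the block's top-left corner.
    """
    left = round(geometry["x"] * page_width)
    top = round(geometry["y"] * page_height)
    right = round((geometry["x"] + geometry["width"]) * page_width)
    bottom = round((geometry["y"] + geometry["height"]) * page_height)
    return left, top, right, bottom
```

The page dimensions are not part of the processor output, so they have to come from wherever the page image was rendered.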

Relationship

| Field Name | Data Type | Description |
| --- | --- | --- |
| type | string (equal to either VALUE or CHILD) | The kind of relationship |
| ids | array of strings | The list of block UUIDs that are connected via this relationship |
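Because relationships refer to other blocks by UUID, following one usually means building an index of blocks by id first. The sketch below shows one way to do that for CHILD relationships. `child_blocks` is a hypothetical helper; since the field is listed as `relationship` in the table but appears as `relationships` in the sample output, the code accepts either spelling.

```python
def child_blocks(block, blocks_by_id):
    """Return the blocks referenced by this block's CHILD relationships."""
    # Accept both spellings of the field name (see the note above).
    relationships = block.get("relationships") or block.get("relationship") or []
    children = []
    for rel in relationships:
        if rel.get("type") != "CHILD":
            continue
        for block_id in rel.get("ids", []):
            # Skip ids that are not present in the index instead of failing.
            if block_id in blocks_by_id:
                children.append(blocks_by_id[block_id])
    return children


def index_by_id(blocks):
    """Build the id -> block lookup used by child_blocks."""
    return {b["id"]: b for b in blocks}
```

A typical use is `child_blocks(line, index_by_id(blocks))` to list the blocks a line is made of. The same pattern would apply to VALUE relationships by changing the type being matched.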

Example output (truncated; the blocks array may contain many more entries):

{
	"output": {
		"blocks": [
			{
				"relationships": [],
				"confidence": 99.35694,
				"geometry": {
					"width": 0.2716,
					"height": 0.02702,
					"x": 0.36377,
					"y": 0.07574
				},
				"text": "Spirit Game Script",
				"id": "e63c08cf-0bef-4c6d-ac04-0e250e254229",
				"page": 1,
				"type": "LINE"
			},
			{
				"relationships": [
					{
						"ids": [
							"a5ade6e3-a368-49fe-b338-7a8a4fbe9058",
							"10e1b516-02c3-4eaf-a216-65efc12af6a7",
							"2f44eedd-651b-4ac9-b42a-744bd0dfcbe1",
							"fbb0d176-f9fb-4c48-8a5a-4d09252f6d17"
						],
						"type": "CHILD"
					}
				],
				"confidence": 99.51574,
				"geometry": {
					"width": 0.37119,
					"height": 0.18922,
					"x": 0.321415,
					"y": 0.251234
				},
				"text": "Courageous, pleading and clear.",
				"id": "7396c015-e0aa-49e5-8d05-3cf9a7430539",
				"page": 1,
				"type": "LINE"
			}
		]
	}
}
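Downstream of the processor, the value of the ocr.AnalyzedDocument attribute can be parsed like any other JSON string. The following sketch collects the text of LINE blocks per page and optionally drops low-confidence results. `lines_by_page` is a hypothetical helper, and it assumes the attribute value is valid JSON shaped like the sample above, with confidence expressed on a 0 to 100 scale.

```python
import json
from collections import defaultdict


def lines_by_page(attribute_value, min_confidence=0.0):
    """Group the text of LINE blocks by page number.

    attribute_value is the JSON string stored in ocr.AnalyzedDocument.
    """
    payload = json.loads(attribute_value)
    # The sample nests blocks under "output"; fall back to a top-level "blocks".
    blocks = payload.get("output", payload).get("blocks", [])
    pages = defaultdict(list)
    for block in blocks:
        is_line = block.get("type") == "LINE"
        confident = block.get("confidence", 0.0) >= min_confidence
        if is_line and confident:
            pages[block.get("page")].append(block.get("text"))
    return dict(pages)
```

Raising `min_confidence` is a simple way to keep only the results the API is most sure about; the right threshold depends on the documents being processed.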
