Textract API

Home of the AWS Textract processor family

Our AWS Textract processor family brings the functionality of Amazon's flagship OCR API to your NiFi flow. The processors can analyze .png, .jpg/.jpeg, and .pdf files stored in an Amazon S3 bucket.

Here is a list of the processors in the AWS Textract processor family:

Common Properties

Common properties are shared by all Textract processors. Every Textract processor includes these properties, plus any additional properties that the individual processor adds.

Properties whose names are in bold and italics are required.

  • Textract Region - A dropdown list of AWS regions. Set this to the region you have configured for Textract.

  • S3 Region - A dropdown list of AWS regions. Set this to the region of the S3 bucket(s) from which documents will be pulled for analysis.

  • Bucket - An input supporting Expression Language that holds the name of the S3 bucket containing the file to be analyzed.

  • Object Key - An input supporting Expression Language that holds the key (name) of the S3 object to be analyzed.

  • Destination - A dropdown input that determines what part of the outgoing FlowFile will contain the output information. The value can be set to one of the following:

    • flowfile-body: the data is written to the FlowFile body. Additionally, the FlowFile's mime.type attribute is set to application/json.

    • flowfile-attribute: the data is written to an attribute whose name depends on the processor. This name is listed on the processor's documentation page.

  • Communications Timeout - How long the processor waits for an API response before routing the FlowFile to failure.

  • AWS Credentials Provider Service - A reusable controller service that stores AWS credentials. If this is not set, you will need to enter the relevant credentials manually in their respective properties.

  • Access Key ID - The access key ID of an AWS credential.

  • Secret Access Key - The secret access key of an AWS credential.

  • Credentials File - The path to a properties file (on your instance) containing an AWS access key and secret key.

  • SSL Context Service - An optional reusable SSL context service that, if provided, is used to create connections.
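
To make the Bucket, Object Key, and Destination properties more concrete, here is a minimal Python sketch. It is an illustration, not the processors' actual implementation. The helper names (build_textract_request, place_output), the bucket and key values, and the attribute name textract.output are all hypothetical. The Document / S3Object shape mirrors what Textract's synchronous operations accept for an S3-hosted file, but whether the processors build their requests this way is an assumption, as is leaving the FlowFile body untouched in flowfile-attribute mode.

```python
import json


def build_textract_request(bucket, object_key):
    """Build a Document parameter pointing at an S3 object (hypothetical helper)."""
    return {"Document": {"S3Object": {"Bucket": bucket, "Name": object_key}}}


def place_output(content, attributes, result, destination,
                 attribute_name="textract.output"):
    """Mimic the Destination property on a simplified FlowFile.

    A FlowFile is modelled as (content, attributes). The default
    attribute_name is a placeholder; the real name depends on the processor.
    """
    payload = json.dumps(result)
    new_attributes = dict(attributes)
    if destination == "flowfile-body":
        # Result replaces the body and mime.type is set to application/json.
        new_attributes["mime.type"] = "application/json"
        return payload, new_attributes
    if destination == "flowfile-attribute":
        # Assumption: the original body is kept and only an attribute is added.
        new_attributes[attribute_name] = payload
        return content, new_attributes
    raise ValueError(f"unknown destination: {destination!r}")


# Hypothetical bucket and key, as they might resolve from Expression Language.
request = build_textract_request("my-documents", "invoices/2023-001.pdf")

# Stand-in for an API response; a real response carries many more fields.
fake_result = {"Blocks": []}

body, attrs = place_output("original", {}, fake_result, "flowfile-body")
kept_body, attr_attrs = place_output("original", {}, fake_result,
                                     "flowfile-attribute")
```

With flowfile-body, downstream processors read the JSON from the FlowFile content. With flowfile-attribute, they read it from the attribute named on the processor's documentation page.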
