# DetectDocumentText Processor

The DetectDocumentText processor extracts text from a document, which can be either an image or a PDF.

## Properties

{% hint style="info" %}
All of our Textract processors also include these [common properties](https://calculatedsystems.gitbook.io/cloud-nifi-processors/amazon-web-services/textract-api/..#common-properties).
{% endhint %}

This processor has no properties of its own beyond the common ones.

## Data Output

If the *Destination* property is set to `flowfile-attribute`, the processor writes its output to the FlowFile's `ocr.DetectedText` attribute, creating the attribute if it is not already present.

{% tabs %}
{% tab title="Output Structure" %}

| Field Name   | Data Type      | Description                              |
| ------------ | -------------- | ---------------------------------------- |
| blocks       | array of Block | The list of blocks returned from the API |

{% endtab %}

{% tab title="Relevant Data Structures" %}

#### `Block`

| Field Name   | Data Type                                                                                                                     | Description                                                                                    |
| ------------ | ----------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| text         | string                                                                                                                        | The text in this block                                                                         |
| confidence   | float                                                                                                                         | How confident the API is in its response                                                       |
| id           | string                                                                                                                        | The UUID pertaining to this block. Can be used to cross-reference relationships between blocks |
| page         | int                                                                                                                           | The page of the document in which this block resides                                           |
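| columnIndex  | int                                                                                                                           | The column in which a table cell is located (table cells only)                                 |
| columnSpan   | int                                                                                                                           | The number of columns a table cell spans (table cells only)                                    |
| rowIndex     | int                                                                                                                           | The row in which a table cell is located (table cells only)                                    |
| rowSpan      | int                                                                                                                           | The number of rows a table cell spans (table cells only)                                       |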
| type         | string ([BlockType](https://calculatedsystems.gitbook.io/cloud-nifi-processors/amazon-web-services/textract-api/block-types)) | The kind of block                                                                              |
| geometry     | Geometry                                                                                                                      | The position and size of the block                                                             |
| relationship | array of Relationship                                                                                                         | The relationships this block has to others                                                     |

#### `Geometry`

| Field Name | Data Type | Description                             |
| ---------- | --------- | --------------------------------------- |
| x          | float     | The X position of the block on the page |
| y          | float     | The Y position of the block on the page |
| width      | float     | The width of the block                  |
| height     | float     | The height of the block                 |

#### `Relationship`

| Field Name   | Data Type                                  | Description                                                       |
| ------------ | ------------------------------------------ | ----------------------------------------------------------------- |
| type         | string (either `VALUE` or `CHILD`)         | The kind of relationship                                          |
| ids          | array of strings                           | The list of block UUIDs that are connected via this relationship  |

{% endtab %}

{% tab title="Example Output" %}

```javascript
{
	"output": {
		"blocks": [
			{
				"relationships": [],
				"confidence": 99.35694,
				"geometry": {
					"width": 0.2716,
					"height": 0.02702,
					"x": 0.36377,
					"y": 0.07574
				},
				"text": "Spirit Game Script",
				"id": "e63c08cf-0bef-4c6d-ac04-0e250e254229",
				"page": 1,
				"type": "LINE"
			},
			{
				"relationships": [
					{
						"ids": [
							"a5ade6e3-a368-49fe-b338-7a8a4fbe9058",
							"10e1b516-02c3-4eaf-a216-65efc12af6a7",
							"2f44eedd-651b-4ac9-b42a-744bd0dfcbe1",
							"fbb0d176-f9fb-4c48-8a5a-4d09252f6d17"
						],
						"type": "CHILD"
					}
				],
				"confidence": 99.51574,
				"geometry": {
					"width": 0.37119,
					"height": 0.18922,
					"x": 0.321415,
					"y": 0.251234
				},
				"text": "Courageous, pleading and clear.",
				"id": "7396c015-e0aa-49e5-8d05-3cf9a7430539",
				"page": 1,
				"type": "LINE"
			},
			// ... plus potentially many more entries!
		]
	}
}
```

{% endtab %}
{% endtabs %}
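
As a rough illustration of consuming this output downstream, the sketch below parses a result and pulls out the text of `LINE` blocks, then resolves a block's `CHILD` relationships by ID. The helper names (`getBlocks`, `extractLines`, `childrenOf`), the sample IDs, the `WORD` child block, and the confidence threshold of 90 are all assumptions made for the example and are not part of the processor. Because the example output wraps `blocks` in an `output` object and names the relationship list `relationships`, while the tables list `blocks` and `relationship`, the sketch accepts both shapes.

```javascript
// Sketch only: helper names, sample data, and the threshold are illustrative.

// Accept `blocks` either at the top level or nested under `output`.
function getBlocks(result) {
  const root = result.output ?? result;
  return Array.isArray(root.blocks) ? root.blocks : [];
}

// Return the text of every LINE block at or above a minimum confidence,
// in the order the blocks appear in the output.
function extractLines(result, minConfidence = 0) {
  return getBlocks(result)
    .filter((b) => b.type === "LINE" && b.confidence >= minConfidence)
    .map((b) => b.text);
}

// Look up the blocks referenced by a block's CHILD relationships.
// IDs that do not match any block in the output are skipped.
function childrenOf(result, block) {
  const byId = new Map(getBlocks(result).map((b) => [b.id, b]));
  // Accept both `relationships` and `relationship` as the field name.
  const relationships = block.relationships ?? block.relationship ?? [];
  return relationships
    .filter((r) => r.type === "CHILD")
    .flatMap((r) => r.ids)
    .map((id) => byId.get(id))
    .filter((b) => b !== undefined);
}

// Trimmed, made-up result. In a flow, this JSON string would be read from
// the `ocr.DetectedText` attribute.
const result = JSON.parse(`{
  "output": {
    "blocks": [
      { "id": "line-1", "type": "LINE", "text": "Spirit Game Script",
        "confidence": 99.35694, "page": 1,
        "relationships": [{ "type": "CHILD", "ids": ["word-1", "word-missing"] }] },
      { "id": "word-1", "type": "WORD", "text": "Spirit",
        "confidence": 99.1, "page": 1, "relationships": [] },
      { "id": "line-2", "type": "LINE", "text": "smudged text",
        "confidence": 41.2, "page": 1, "relationships": [] }
    ]
  }
}`);

console.log(extractLines(result, 90.0));
console.log(childrenOf(result, getBlocks(result)[0]).map((b) => b.text));
```

Building the ID lookup with a `Map` keeps relationship resolution to a single pass over the blocks. For very large documents it may be worth building that map once and reusing it, rather than rebuilding it on every `childrenOf` call as this sketch does.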

