Recommended reading

Introduction to Line and Word blocks in DetectDocumentText, including a relationship diagram: https://docs.aws.amazon.com/zh_tw/textract/latest/dg/how-it-works-lines-words.html
How to read the location of a detected item on a document: https://docs.aws.amazon.com/zh_tw/textract/latest/dg/text-location.html
Request format: start_document_text_detection - Boto3 1.34.68 documentation (amazonaws.com)

Installing packages

ref: Textract/Quickstart

# I prefer to create a dedicated environment
conda create -n textract python=3.12
conda activate textract

# Install boto3 so we can call AWS services
pip install boto3

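Local credential setup

According to the official documentation, set up the following first:

Create an IAM user with these four permissions:
- AmazonTextractFullAccess: allows calling every Textract API
- AmazonS3FullAccess: the documents to analyze are stored in S3, so access to S3 is required
- AmazonSQSFullAccess + AmazonSNSFullAccess: required for asynchronous detection, where SNS forwards the job-completion notification to SQS
Create an access key

Put the key in ~/.aws/credentials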

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

Set the region in ~/.aws/config
[default]
region=us-east-1

With this in place, boto3.client('textract') picks up the configured key and region when it runs.
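Both files use INI syntax, so their contents can be sanity-checked with the standard library before boto3 is involved. The following is only a sketch: the helper name read_profile is hypothetical, and it parses text passed to it instead of opening the real files under ~/.aws.

```python
import configparser


def read_profile(ini_text, profile="default"):
    """Return the key/value pairs of one profile from AWS-style INI text."""
    # interpolation=None keeps any '%' in a value from being treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    return dict(parser[profile])


# Same placeholder contents as the two files shown above
credentials_text = """
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
"""

config_text = """
[default]
region=us-east-1
"""

creds = read_profile(credentials_text)
config = read_profile(config_text)
print(sorted(creds))
print(config["region"])
```

If a profile is missing, the helper raises KeyError, which is usually easier to diagnose than a credentials error raised later by an AWS call.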

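Textract asynchronous processing

The overall flow is described in Detecting or Analyzing Text in a Multipage Document - Amazon Textract. As the diagram above shows, when our client application calls Textract's StartDocumentTextDetection:

1. Textract receives the analysis request and locates the file in S3 from the request contents.
2. Textract starts the analysis and returns a JobId, which the caller later uses to look in SQS for a completion-status message carrying the same JobId.
- For Textract to publish SNS notifications, permissions have to be configured: we need an IAM role (passed to Textract as RoleArn) that allows Textract to publish to the SNS topic.
- See Configuring Amazon Textract for Asynchronous Operations - Amazon Textract. A role has to be created first in order to obtain the RoleArn.
3. SNS notifies SQS, and the message is placed in the queue.
- An SNS topic and an SQS queue have to be created first, and the queue subscribed to the topic.
- See Getting started with Amazon SNS - Amazon Simple Notification Service for how SNS is used.
- See also Subscribing an Amazon SQS queue to an Amazon SNS topic - Amazon Simple Notification Service. For SNS to deliver messages to SQS, the topic and the queue must exist and the queue's policy must allow the topic to perform sqs:SendMessage. (When subscribing in the SNS console you choose the target SQS queue, and the related policy is usually set at that point.)
4. The client application keeps polling SQS for a message carrying the JobId returned in step 2. Once it appears, the job has finished and the result can be fetched with GetDocumentTextDetection.
- For the request and response formats, see Calling Amazon Textract Asynchronous Operations - Amazon Textract, which explains how to set up the request.
5. Once the result has been fetched with GetDocumentTextDetection, any follow-up processing can begin.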
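The polling step comes down to unwrapping each SQS message and comparing its JobId with the one returned when the job was started. A minimal sketch of that unwrapping follows. It assumes the default SNS delivery (raw message delivery turned off), so the SQS body is an SNS envelope whose Message field is itself a JSON string. The helper names and the sample body are hypothetical.

```python
import json


def parse_completion_message(sqs_body):
    """Unwrap an SNS notification that was delivered through SQS.

    The SQS body is a JSON envelope written by SNS; its "Message" field is
    a JSON string that holds the Textract job fields.
    """
    envelope = json.loads(sqs_body)
    payload = json.loads(envelope["Message"])
    return payload["JobId"], payload["Status"]


def is_finished_job(sqs_body, expected_job_id):
    """True when the message reports success for the job we started."""
    job_id, status = parse_completion_message(sqs_body)
    return job_id == expected_job_id and status == "SUCCEEDED"


# Hand-written sample body (assumed shape, trimmed to the fields used here)
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({"JobId": "example-job-id", "Status": "SUCCEEDED"}),
})
print(parse_completion_message(sample_body))
print(is_finished_job(sample_body, "example-job-id"))
```

Checking Status as well as JobId matters because a completion message is also published when a job fails. A matching JobId alone does not guarantee there is a result to fetch.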

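Step 2: Get the RoleArn

Ref: Configuring Amazon Textract for Asynchronous Operations - Amazon Textract

When an asynchronous operation completes, Amazon Textract needs permission to publish a message to your Amazon SNS topic. You use an IAM service role to give Amazon Textract access to the topic. When you create the Amazon SNS topic, you must prefix the topic name with AmazonTextract, for example AmazonTextractMyTopicName.

Sign in to the IAM console (https://console.aws.amazon.com/iam).
In the navigation pane, choose Roles.
Choose Create Role.
For Select type of trusted entity, choose AWS service.
For Choose the service that will use this role, choose Textract.
Choose Next: Permissions.
Verify that the AmazonTextractServiceRole policy is included in the list of attached policies. To display the policy in the list, use the Filter policies box.
Choose Next: Tags.
You do not need to add tags, so choose Next: Review.
In the Review section, for Role name, enter a name for the role (for example, TextractRole). In Role description, update the description of the role, and then choose Create role.
Choose the new role to open its details page.
In Summary, copy the Role ARN value and save it.
Choose Trust relationships.
Choose Edit trust relationship and make sure the trust policy looks like the one below. To prevent the confused deputy problem, make sure the trust policy includes conditions that limit the scope of the permissions. For more details on this potential security issue, see Cross-service confused deputy prevention. In the example below, replace 123456789012 with your AWS account ID.
Choose Update Trust Policy.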

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "ConfusedDeputyPreventionExamplePolicy",
    "Effect": "Allow",
    "Principal": {
      "Service": "textract.amazonaws.com"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "ArnLike": {
        "aws:SourceArn":"arn:aws:textract:*:123456789012:*"
      },
      "StringEquals": {
        "aws:SourceAccount": "123456789012"
      }
    }
  }
}

The result looks like this:

Step 3: SQS and SNS

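Give the Amazon SNS topic permission to send messages to the Amazon SQS queue. For an Amazon SNS topic to send messages to a queue, you must set a policy on the queue that allows the topic to perform the sqs:SendMessage action. Before you subscribe a queue to a topic, you need both a topic and a queue. If you have not created them yet, create them now. For details, see Creating a topic, and Creating a queue in the Amazon Simple Queue Service Developer Guide.

Setting the queue's SendMessage policy with the Amazon SQS console

Sign in to the AWS Management Console and open the Amazon SQS console at https://console.aws.amazon.com/sqs/.
Select the box for the queue whose policy you want to set, choose the Access policy tab, and then choose Edit.
In the Access policy section, define who can access your queue.
- Add a condition that allows the action for the topic.
- Set Principal to the Amazon SNS service, as shown in the example below.
- Use the aws:SourceArn or aws:SourceAccount global condition key to guard against the confused deputy scenario. To use these condition keys, set the value to the ARN of the topic. If your queue is subscribed to multiple topics, you can instead use aws:SourceAccou

For example, the following policy allows MyTopic to send messages to MyQueue. Replace 123456789012 with your account ID.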

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sns.amazonaws.com"
      },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-2:123456789012:MyQueue",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:sns:us-east-2:123456789012:MyTopic"
        }
      }
    }
  ]
}
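When the queue and topic are created from code, the same policy can be assembled as a dictionary and serialized, which avoids escaping braces by hand in a format string. This is a sketch, not the official sample: build_queue_policy is a hypothetical helper, and the Version field is an addition to the policy shown above.

```python
import json


def build_queue_policy(queue_arn, topic_arn):
    """Return a JSON policy that lets one SNS topic send to one SQS queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the SNS service may send, and only on behalf of topic_arn
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


policy_json = build_queue_policy(
    "arn:aws:sqs:us-east-2:123456789012:MyQueue",
    "arn:aws:sns:us-east-2:123456789012:MyTopic",
)
print(policy_json)
```

The returned string is the kind of value that set_queue_attributes takes under the 'Policy' key.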

Python code

See Detecting or Analyzing Text in a Multipage Document - Amazon Textract, which includes complete Python code and explains in detail how to implement every step described above in Python.

import boto3
import json
import sys
import time


class ProcessType:
    DETECTION = 1
    ANALYSIS = 2


class DocumentProcessor:
    jobId = ''
    region_name = ''

    roleArn = ''
    bucket = ''
    document = ''

    sqsQueueUrl = ''
    snsTopicArn = ''
    processType = ''

    def __init__(self, role, bucket, document, region):
        self.roleArn = role
        self.bucket = bucket
        self.document = document
        self.region_name = region

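        # Create all three clients in the same region so the topic, queue and job line up
        self.textract = boto3.client('textract', region_name=self.region_name)
        self.sqs = boto3.client('sqs', region_name=self.region_name)
        self.sns = boto3.client('sns', region_name=self.region_name)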

    def ProcessDocument(self, type):
        jobFound = False

        self.processType = type
        validType = False

        # Determine which type of processing to perform
        if self.processType == ProcessType.DETECTION:
            response = self.textract.start_document_text_detection(
                DocumentLocation={'S3Object': {'Bucket': self.bucket, 'Name': self.document}},
                NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})
            print('Processing type: Detection')
            validType = True

        if self.processType == ProcessType.ANALYSIS:
            response = self.textract.start_document_analysis(
                DocumentLocation={'S3Object': {'Bucket': self.bucket, 'Name': self.document}},
                FeatureTypes=["TABLES", "FORMS"],
                NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})
            print('Processing type: Analysis')
            validType = True

        if validType == False:
            print("Invalid processing type. Choose Detection or Analysis.")
            return

        print('Start Job Id: ' + response['JobId'])
        dotLine = 0
        while jobFound == False:
            sqsResponse = self.sqs.receive_message(QueueUrl=self.sqsQueueUrl, MessageAttributeNames=['All'],
                                                   MaxNumberOfMessages=10)

            if sqsResponse:

                if 'Messages' not in sqsResponse:
                    if dotLine < 40:
                        print('.', end='')
                        dotLine = dotLine + 1
                    else:
                        print()
                        dotLine = 0
                    sys.stdout.flush()
                    time.sleep(5)
                    continue

                for message in sqsResponse['Messages']:
                    notification = json.loads(message['Body'])
                    textMessage = json.loads(notification['Message'])
                    print(textMessage['JobId'])
                    print(textMessage['Status'])
                    if str(textMessage['JobId']) == response['JobId']:
                        print('Matching Job Found:' + textMessage['JobId'])
                        jobFound = True
                        self.GetResults(textMessage['JobId'])
                        self.sqs.delete_message(QueueUrl=self.sqsQueueUrl,
                                                ReceiptHandle=message['ReceiptHandle'])
                    else:
                        print("Job didn't match:" +
                              str(textMessage['JobId']) + ' : ' + str(response['JobId']))
                        # Delete the unknown message. Consider sending to dead letter queue
                        self.sqs.delete_message(QueueUrl=self.sqsQueueUrl,
                                                ReceiptHandle=message['ReceiptHandle'])

        print('Done!')

    def CreateTopicandQueue(self):

        millis = str(int(round(time.time() * 1000)))

        # Create SNS topic
        snsTopicName = "AmazonTextractTopic" + millis

        topicResponse = self.sns.create_topic(Name=snsTopicName)
        self.snsTopicArn = topicResponse['TopicArn']

        # create SQS queue
        sqsQueueName = "AmazonTextractQueue" + millis
        self.sqs.create_queue(QueueName=sqsQueueName)
        self.sqsQueueUrl = self.sqs.get_queue_url(QueueName=sqsQueueName)['QueueUrl']

        attribs = self.sqs.get_queue_attributes(QueueUrl=self.sqsQueueUrl,
                                                AttributeNames=['QueueArn'])['Attributes']

        sqsQueueArn = attribs['QueueArn']

        # Subscribe SQS queue to SNS topic
        self.sns.subscribe(
            TopicArn=self.snsTopicArn,
            Protocol='sqs',
            Endpoint=sqsQueueArn)

        # Authorize SNS to write SQS queue
        policy = """{{
  "Version":"2012-10-17",
  "Statement":[
    {{
      "Sid":"MyPolicy",
      "Effect":"Allow",
      "Principal" : {{"AWS" : "*"}},
      "Action":"SQS:SendMessage",
      "Resource": "{}",
      "Condition":{{
        "ArnEquals":{{
          "aws:SourceArn": "{}"
        }}
      }}
    }}
  ]
}}""".format(sqsQueueArn, self.snsTopicArn)

        response = self.sqs.set_queue_attributes(
            QueueUrl=self.sqsQueueUrl,
            Attributes={
                'Policy': policy
            })

    def DeleteTopicandQueue(self):
        self.sqs.delete_queue(QueueUrl=self.sqsQueueUrl)
        self.sns.delete_topic(TopicArn=self.snsTopicArn)

    # Display information about a block
    def DisplayBlockInfo(self, block):

        print("Block Id: " + block['Id'])
        print("Type: " + block['BlockType'])
        if 'EntityTypes' in block:
            print('EntityTypes: {}'.format(block['EntityTypes']))

        if 'Text' in block:
            print("Text: " + block['Text'])

        if block['BlockType'] != 'PAGE':
            print("Confidence: " + "{:.2f}".format(block['Confidence']) + "%")

        print('Page: {}'.format(block['Page']))

        if block['BlockType'] == 'CELL':
            print('Cell Information')
            print('\tColumn: {} '.format(block['ColumnIndex']))
            print('\tRow: {}'.format(block['RowIndex']))
            print('\tColumn span: {} '.format(block['ColumnSpan']))
            print('\tRow span: {}'.format(block['RowSpan']))

            if 'Relationships' in block:
                print('\tRelationships: {}'.format(block['Relationships']))

        print('Geometry')
        print('\tBounding Box: {}'.format(block['Geometry']['BoundingBox']))
        print('\tPolygon: {}'.format(block['Geometry']['Polygon']))

        if block['BlockType'] == 'SELECTION_ELEMENT':
            print('    Selection element detected: ', end='')
            if block['SelectionStatus'] == 'SELECTED':
                print('Selected')
            else:
                print('Not selected')

    def GetResults(self, jobId):
        maxResults = 1000
        paginationToken = None
        finished = False

        while finished == False:

            response = None

            if self.processType == ProcessType.ANALYSIS:
                if paginationToken == None:
                    response = self.textract.get_document_analysis(JobId=jobId,
                                                                   MaxResults=maxResults)
                else:
                    response = self.textract.get_document_analysis(JobId=jobId,
                                                                   MaxResults=maxResults,
                                                                   NextToken=paginationToken)

            if self.processType == ProcessType.DETECTION:
                if paginationToken == None:
                    response = self.textract.get_document_text_detection(JobId=jobId,
                                                                         MaxResults=maxResults)
                else:
                    response = self.textract.get_document_text_detection(JobId=jobId,
                                                                         MaxResults=maxResults,
                                                                         NextToken=paginationToken)

            blocks = response['Blocks']
            print('Detected Document Text')
            print('Pages: {}'.format(response['DocumentMetadata']['Pages']))

            # Display block information
            for block in blocks:
                self.DisplayBlockInfo(block)
                print()
                print()

            if 'NextToken' in response:
                paginationToken = response['NextToken']
            else:
                finished = True

    def GetResultsDocumentAnalysis(self, jobId):
        maxResults = 1000
        paginationToken = None
        finished = False

        while finished == False:

            response = None
            if paginationToken == None:
                response = self.textract.get_document_analysis(JobId=jobId,
                                                               MaxResults=maxResults)
            else:
                response = self.textract.get_document_analysis(JobId=jobId,
                                                               MaxResults=maxResults,
                                                               NextToken=paginationToken)

            # Get the text blocks
            blocks = response['Blocks']
            print('Analyzed Document Text')
            print('Pages: {}'.format(response['DocumentMetadata']['Pages']))
            # Display block information
            for block in blocks:
                self.DisplayBlockInfo(block)
                print()
                print()

            # Advance pagination once per page of results, not once per block
            if 'NextToken' in response:
                paginationToken = response['NextToken']
            else:
                finished = True


def main():
    roleArn = ''
    bucket = ''
    document = ''
    region_name = ''

    analyzer = DocumentProcessor(roleArn, bucket, document, region_name)
    analyzer.CreateTopicandQueue()
    analyzer.ProcessDocument(ProcessType.DETECTION)
    analyzer.DeleteTopicandQueue()


if __name__ == "__main__":
    main()
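The NextToken loop in GetResults can be separated from the Textract client, which makes it easy to try without an AWS account. The sketch below is an illustration, not part of the official sample: collect_blocks and the fake pages are hypothetical, and the page fetcher is passed in as a function.

```python
def collect_blocks(fetch_page):
    """Gather Blocks from every page of a paginated Textract result.

    fetch_page(token) must return one response dict; token is None for the
    first page and the previous response's NextToken afterwards.
    """
    blocks = []
    token = None
    while True:
        response = fetch_page(token)
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return blocks


# Stand-in for the service: two fake pages keyed by the token that requests them
fake_pages = {
    None: {"Blocks": [{"Id": "a"}, {"Id": "b"}], "NextToken": "t1"},
    "t1": {"Blocks": [{"Id": "c"}]},
}
all_blocks = collect_blocks(lambda token: fake_pages[token])
print([block["Id"] for block in all_blocks])
```

With a real client, fetch_page would wrap get_document_text_detection or get_document_analysis and pass NextToken only when the token is not None, as the script above does.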

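Textract synchronous processing

Ref: AWS Textract API reference: https://docs.aws.amazon.com/zh_tw/textract/latest/dg/API_DetectDocumentText.html

DetectDocumentText detects the lines and words in a document together with their positions. The document can be supplied either as an S3 object or as base64-encoded bytes. According to the official description of text detection, the response returns the following for the document:

The lines and words of the detected text
The relationships between the lines and the words they contain
Where the detected text appears on the page

Here we use synchronous detection (each call finishes before the next one starts). For asynchronous detection, start the job with StartDocumentTextDetection and fetch the result with GetDocumentTextDetection. For more information and an example, see Processing Documents with Asynchronous Operations.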
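The two ways of supplying the document map to two shapes of the Document argument. The helper below is a hypothetical sketch that builds whichever shape applies. When boto3 is used, raw bytes are passed as they are and the SDK takes care of the encoding on the wire.

```python
def build_document(data=None, bucket=None, name=None):
    """Build the Document argument for detect_document_text.

    Pass either raw bytes (data) or an S3 location (bucket and name).
    """
    has_bytes = data is not None
    has_s3 = bucket is not None and name is not None
    if has_bytes == has_s3:
        # Neither source or both sources were given
        raise ValueError("provide either data, or bucket and name")
    if has_bytes:
        return {"Bytes": data}
    return {"S3Object": {"Bucket": bucket, "Name": name}}


print(build_document(data=b"%PDF-"))
print(build_document(bucket="my-bucket", name="test.pdf"))
```

The bucket and object names above are placeholders. The returned dictionary would be passed as Document=... in the call.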

Creating the client

First create a client, then call detect_document_text to run the analysis.

# AWS
import boto3

# Image handling
import io
from io import BytesIO
from PIL import Image, ImageDraw, ImageFont
from pdf2image import convert_from_bytes



# Read the PDF
pdf_file = 'test.pdf'  # the file to process
with open(pdf_file, 'rb') as file:
    pdf_binary = file.read()

# AWS client
client = boto3.client('textract', region_name='us-east-1')

# Textract
response = client.detect_document_text(Document={'Bytes': pdf_binary})

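Handling the response

The code below is the official sample. It shows how to process the returned data and draw the position of each piece of text on the image. The bounding box is expressed as ratios of the page size, so each value has to be multiplied by the image width or height to get the actual position.

Detected text is returned in the Text field of a Block object. The BlockType field tells you whether the text is a line (LINE) or a word (WORD). A word is one or more ISO basic Latin script characters that are not separated by spaces. A line is a string of tab-delimited, contiguous words.

Finally, the bounding box passed to the drawing call is given as two corner points (four numbers in total): a sequence of either [(x0, y0), (x1, y1)] or [x0, y0, x1, y1].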

def process_text_detection():
    images = convert_from_bytes(pdf_binary)
    image = images[0]

    #Get the text blocks
    blocks=response['Blocks']
    width, height =image.size  
    draw = ImageDraw.Draw(image)  
    print ('Detected Document Text')
   
    # Create image showing bounding box/polygon the detected lines/text
    for block in blocks:
            print('Type: ' + block['BlockType'])
            if block['BlockType'] != 'PAGE':
                print('Detected: ' + block['Text'])
                print('Confidence: ' + "{:.2f}".format(block['Confidence']) + "%")

            print('Id: {}'.format(block['Id']))
            if 'Relationships' in block:
                print('Relationships: {}'.format(block['Relationships']))
            print('Bounding Box: {}'.format(block['Geometry']['BoundingBox']))
            print('Polygon: {}'.format(block['Geometry']['Polygon']))
            print()
            draw=ImageDraw.Draw(image)
            # Draw WORD - Green -  start of word, red - end of word
            if block['BlockType'] == "WORD":
                draw.line([(width * block['Geometry']['Polygon'][0]['X'],
                height * block['Geometry']['Polygon'][0]['Y']),
                (width * block['Geometry']['Polygon'][3]['X'],
                height * block['Geometry']['Polygon'][3]['Y'])],fill='green',
                width=2)
            
                draw.line([(width * block['Geometry']['Polygon'][1]['X'],
                height * block['Geometry']['Polygon'][1]['Y']),
                (width * block['Geometry']['Polygon'][2]['X'],
                height * block['Geometry']['Polygon'][2]['Y'])],
                fill='red',
                width=2)    

                 
            # Draw box around entire LINE  
            if block['BlockType'] == "LINE":
                points=[]

                for polygon in block['Geometry']['Polygon']:
                    points.append((width * polygon['X'], height * polygon['Y']))

                draw.polygon((points), outline='black')    
  
                # Draw the axis-aligned bounding box in yellow
                box=block['Geometry']['BoundingBox']                    
                left = width * box['Left']
                top = height * box['Top']           
                draw.rectangle([left,top, left + (width * box['Width']), top +(height * box['Height'])],outline='yellow') 


    # Display the image
    image.show()

    
    return len(blocks)
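The ratio-to-pixel conversion that the sample performs inline for draw.rectangle can be isolated into a small function. This is a sketch with a hypothetical helper name and made-up numbers, chosen so the arithmetic is easy to follow.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a ratio-based BoundingBox into (x0, y0, x1, y1) pixels."""
    left = image_width * box["Left"]
    top = image_height * box["Top"]
    right = left + image_width * box["Width"]
    bottom = top + image_height * box["Height"]
    return (left, top, right, bottom)


# A box covering the middle half of the width, starting halfway down the page
box = {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}
pixels = box_to_pixels(box, 800, 600)
print(pixels)
```

The returned tuple is in the [x0, y0, x1, y1] order described above, so it can be passed to draw.rectangle as it is.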

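Result

For the detailed specification, see Text Detection and Document Analysis Response Objects.

Notice the Confidence field. In the image above and the JSON below, the line #Get x the * document IN from IN S3 contains x, * and IN, none of which should be there, and their confidence is correspondingly low, around 25 each. Content with low confidence like this should be filtered out. Depending on the scenario, low-confidence detections may need visual confirmation by a person.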

{
  "DocumentMetadata": { "Pages": 1 },
  "Blocks": [
    {
      "BlockType": "PAGE",
      "Geometry": {
        "BoundingBox": {
          "Width": 1.0,
          "Height": 0.9976544976234436,
          "Left": 0.0,
          "Top": 0.0
        },
        "Polygon": [
          { "X": 0.0, "Y": 0.0 },
          { "X": 1.0, "Y": 2.8319884677330265e-6 },
          { "X": 1.0, "Y": 0.9966249465942383 },
          { "X": 0.0, "Y": 0.9976544976234436 }
        ]
      },
      "Id": "964301ea-606d-4842-b329-e935cdd1ccac",
      "Relationships": [
        {
          "Type": "CHILD",
          "Ids": [
            "f4e39047-43f0-468b-a10b-d8718af58a9a",
            "d4d91acf-da43-4c51-828d-a99528296d76",
            "e02398d3-94f9-4dc6-9d5a-9259fa6fbe1d",
            "d331852e-b1e7-474a-9906-200fea91a887",
            "6299c37e-3586-49bf-8804-a7660e998acc",
            "cc4dda56-bff8-4c37-962b-6f9112680dc7"
          ]
        }
      ]
    },
    {
      "BlockType": "LINE",
      "Confidence": 70.04537200927734,
      "Text": "#Get x the * document IN from IN S3",
      "Geometry": {
        "BoundingBox": {
          "Width": 0.30295300483703613,
          "Height": 0.014886284247040749,
          "Left": 0.15722371637821198,
          "Top": 0.11024140566587448
        },
        "Polygon": [
          { "X": 0.15722371637821198, "Y": 0.11024140566587448 },
          { "X": 0.4601767361164093, "Y": 0.11063025891780853 },
          { "X": 0.46017175912857056, "Y": 0.12512768805027008 },
          { "X": 0.15722690522670746, "Y": 0.12475030869245529 }
        ]
      },
      "Id": "f4e39047-43f0-468b-a10b-d8718af58a9a",
      "Relationships": [
        {
          "Type": "CHILD",
          "Ids": [
            "005d8051-a3e8-414f-b509-3e3f70dce56c",
            "2e14dc51-9d9d-4740-82fb-2f6710febacf", # note this Id: it is the WORD block shown further down
            "d6ef827c-d9fa-488c-b295-8d8a0b20ee86",
            "5224fd43-c247-4f96-aa67-ae4e9d63b3f1",
            "c78aca25-1f8d-4d29-9b27-5f06fb32561e",
            "c0bdce47-96af-4b3e-aacd-672deec8c2d2",
            "e1f3f661-3f2a-4ad6-9875-835d05e18132",
            "88b396a3-1f7c-455b-b73f-4ed5e2ef300a",
            "6d9b7d1d-0a22-4a53-a141-12ce50157b64"
          ]
        }
      ]
    },
    ...
        },
    {
      "BlockType": "WORD",
      "Confidence": 25.193822860717773,
      "Text": "x", # the image has no x at this spot; the confidence is correspondingly low, so this word should be filtered out
      "TextType": "PRINTED",
      "Geometry": {
        "BoundingBox": {
          "Width": 0.004908399190753698,
          "Height": 0.0033209826797246933,
          "Left": 0.20951908826828003,
          "Top": 0.1167941763997078
        },
        "Polygon": [
          { "X": 0.20951908826828003, "Y": 0.1167941763997078 },
          { "X": 0.21442709863185883, "Y": 0.11680039763450623 },
          { "X": 0.21442748606204987, "Y": 0.12011516094207764 },
          { "X": 0.20951949059963226, "Y": 0.12010898441076279 }
        ]
      },
      "Id": "2e14dc51-9d9d-4740-82fb-2f6710febacf"
    },
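One way to act on this is to rebuild each LINE from its CHILD WORD blocks and skip the words whose confidence falls below a threshold. The sketch below uses a tiny hand-made fragment in place of a real response. The helper name, the Ids and the threshold of 50 are assumptions for illustration, not values recommended by AWS.

```python
def rebuild_line_text(line_block, blocks_by_id, min_confidence=50.0):
    """Join a LINE's child WORD texts, skipping low-confidence words."""
    words = []
    for relationship in line_block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = blocks_by_id.get(child_id)
            if child is None or child["BlockType"] != "WORD":
                continue
            if child["Confidence"] >= min_confidence:
                words.append(child["Text"])
    return " ".join(words)


# Hand-made fragment shaped like the response; Ids and scores are illustrative
blocks = [
    {"Id": "line-1", "BlockType": "LINE", "Confidence": 70.0,
     "Text": "#Get x the",
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Confidence": 99.0, "Text": "#Get"},
    {"Id": "w2", "BlockType": "WORD", "Confidence": 25.2, "Text": "x"},
    {"Id": "w3", "BlockType": "WORD", "Confidence": 98.5, "Text": "the"},
]
blocks_by_id = {block["Id"]: block for block in blocks}
cleaned = rebuild_line_text(blocks[0], blocks_by_id)
print(cleaned)
```

The right threshold depends on the documents being processed. Words that are dropped could also be collected separately for human review instead of being discarded.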

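Additional notes

About text location

ref: https://docs.aws.amazon.com/zh_tw/textract/latest/dg/text-location.html

To find where an item is on a document page, use the bounding information (Geometry) that Amazon Textract operations return in the Block object. The Geometry object contains two kinds of location and geometric information for a detected item:

An axis-aligned BoundingBox object, which contains the top-left coordinate plus the width and height of the item.
A polygon describing the outline of the item, given as an array of Point objects, each containing the X (horizontal axis) and Y (vertical axis) document page coordinates of one point.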

{
    "Geometry": {
        "BoundingBox": {
            "Width": 0.053907789289951324, 
            "Top": 0.08913730084896088, 
            "Left": 0.11085548996925354, 
            "Height": 0.013171200640499592
        }, 
        "Polygon": [
            {# start point of the line
                "Y": 0.08985357731580734, 
                "X": 0.11085548996925354
            }, 
            {
                "Y": 0.08913730084896088, 
                "X": 0.16447919607162476
            }, 
            {
                "Y": 0.10159222036600113, 
                "X": 0.16476328670978546
            }, 
            {# end point of the line
                "Y": 0.10230850428342819, 
                "X": 0.11113958805799484
            }
        ]
    }, 
    "Text": "Name:", 
    "TextType": "PRINTED",
    "BlockType": "WORD", 
    "Confidence": 99.56285858154297, 
    "Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee"
},
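The two representations are related: the axis-aligned box is the smallest rectangle that encloses the polygon. The sketch below uses a hypothetical helper to derive a box from the polygon of the "Name:" block above. The result should agree closely with the BoundingBox in the JSON, with only tiny differences from floating-point rounding.

```python
def polygon_to_bounding_box(polygon):
    """Return the axis-aligned box that encloses a list of X/Y points."""
    xs = [point["X"] for point in polygon]
    ys = [point["Y"] for point in polygon]
    left, top = min(xs), min(ys)
    return {"Left": left, "Top": top,
            "Width": max(xs) - left, "Height": max(ys) - top}


# Polygon copied from the "Name:" WORD block above
polygon = [
    {"X": 0.11085548996925354, "Y": 0.08985357731580734},
    {"X": 0.16447919607162476, "Y": 0.08913730084896088},
    {"X": 0.16476328670978546, "Y": 0.10159222036600113},
    {"X": 0.11113958805799484, "Y": 0.10230850428342819},
]
derived_box = polygon_to_bounding_box(polygon)
print(derived_box)
```

For slightly rotated text the polygon follows the slant of the word while the box stays aligned to the page, which is why the sample draws both.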