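Preface

AWS Textract is an AWS tool for extracting text from PDFs (or images). The best case is a source document with a single column, such as a book. With multiple columns (for example, a newspaper article), things get more complicated. This post shares how to use Amazon Textract to sort the text of a multi-column document. It builds on the article "AWS Textract: how to detect and sort text from a multi-column document" and adds some improvements.

My source is a newspaper article with the following layout: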

Textract Response format

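Textract output is a JSON document made of blocks arranged in a hierarchy, each tagged with a BlockType. A block of type "PAGE" is made up of several "LINE" blocks, and each "LINE" is made up of several "WORD" blocks. The response carries no structural information about columns, so you cannot directly sort multi-column text into a single column. What we do know is that Textract parses the text from top to bottom, one row at a time. See lines 27 to 48 in the figure below: even though they sit in different columns, Textract parses them from left to right and then moves down.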
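To make the hierarchy concrete, here is a minimal sketch that walks a hand-written stand-in for a Textract response. The sample dictionary is invented for illustration and is not real Textract output; `child_ids` and `words_of_line` are hypothetical helper names. It assumes that parent blocks point to their children through `Relationships` entries of type `CHILD`.

```python
from collections import Counter

# A tiny hand-written stand-in for a Textract response (not real output).
response = {
    "Blocks": [
        {"Id": "p1", "BlockType": "PAGE",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1", "l2"]}]},
        {"Id": "l1", "BlockType": "LINE", "Text": "Hello world",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "l2", "BlockType": "LINE", "Text": "Bye",
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Hello"},
        {"Id": "w2", "BlockType": "WORD", "Text": "world"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Bye"},
    ]
}


def child_ids(block):
    """Collect the Ids of all CHILD relationships of a block."""
    ids = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            ids.extend(relationship["Ids"])
    return ids


def words_of_line(response, line_id):
    """Return the WORD texts that belong to one LINE block."""
    by_id = {block["Id"]: block for block in response["Blocks"]}
    return [by_id[i]["Text"] for i in child_ids(by_id[line_id])]


# How many blocks of each type the response contains.
type_counts = Counter(block["BlockType"] for block in response["Blocks"])
print(type_counts["PAGE"], type_counts["LINE"], type_counts["WORD"])
print(words_of_line(response, "l1"))
```

Nothing in this structure says which column a LINE belongs to, which is why the rest of the post falls back on geometry.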

Solution

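Our idea is to use the bounding box coordinates that Textract provides. Lines that are close to each other are grouped into a Block, which identifies the Lines that belong to the same column. The final result looks like the figure below:

From the figure above we can see that:

  • The longest Block is probably the title
  • The other Blocks can be sorted top to bottom and left to right, which gives the reading order and reveals how the columns are laid out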
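The two observations above can be sketched as a small sorting routine. This is only one possible reading of them: treating the widest box as the title, and grouping the remaining boxes into columns by their left edge before sorting each column by top, are my assumptions. `Box`, `reading_order`, and `column_tolerance` are hypothetical names, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    left: float
    top: float
    width: float


def reading_order(boxes, column_tolerance=0.05):
    """Title first, then each column from left to right, top to bottom."""
    # Assumption: the widest box is the title.
    title = max(boxes, key=lambda b: b.width)
    rest = sorted((b for b in boxes if b is not title), key=lambda b: b.left)

    # Boxes whose left edges are within the tolerance share a column.
    columns = []
    for box in rest:
        if columns and abs(box.left - columns[-1][0].left) < column_tolerance:
            columns[-1].append(box)
        else:
            columns.append([box])

    ordered = [title]
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.top))
    return ordered


# Boxes listed in the row-by-row order Textract tends to return them.
boxes = [
    Box("Title", 0.10, 0.05, 0.80),
    Box("A1", 0.10, 0.20, 0.35),
    Box("B1", 0.55, 0.20, 0.35),
    Box("A2", 0.10, 0.50, 0.35),
    Box("B2", 0.55, 0.50, 0.35),
]
print([b.text for b in reading_order(boxes)])
```

A fixed `column_tolerance` is a simplification; real layouts with indented paragraphs or uneven column widths would likely need a more careful column assignment.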

Step 1: Define the Class

The official documentation explains how the location of an item on a document page is expressed. In short, it looks like this:

"BoundingBox": {
    "Width": 0.007353090215474367,
    "Height": 0.0288887619972229,
    "Left": 0.08638829737901688,
    "Top": 0.03477252274751663
}

Taken from the official website
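The values in a BoundingBox are ratios of the page width and height, not pixels. A minimal sketch of converting one to pixel corners follows; `bbox_to_pixels` is a hypothetical helper name, and the box and image size are made-up round numbers.

```python
def bbox_to_pixels(bbox, image_width, image_height):
    """Convert a ratio-based bounding box to (left, top, right, bottom) pixels."""
    left = bbox["Left"] * image_width
    top = bbox["Top"] * image_height
    right = (bbox["Left"] + bbox["Width"]) * image_width
    bottom = (bbox["Top"] + bbox["Height"]) * image_height
    return round(left), round(top), round(right), round(bottom)


# A made-up box covering the middle half of a 1000 x 800 pixel page image.
sample_bbox = {"Width": 0.5, "Height": 0.25, "Left": 0.25, "Top": 0.5}
print(bbox_to_pixels(sample_bbox, 1000, 800))
```

The same scaling is what the drawing code later in this post does when it multiplies `block.left` and `block.top` by the image size.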

Now that we understand the format, we can define classes to handle this data:

from typing import List


class Line:
    """One Textract LINE block; coordinates are ratios of the page size."""

    def __init__(self, id_: str, page: int, text: str, top: float, left: float,
                 width: float, height: float) -> None:
        self.top: float = top
        self.left: float = left
        self.width: float = width
        self.height: float = height

        self.page: int = page
        self.id_: str = id_
        self.text: str = text
        self.center: list = self.get_center()

    def __str__(self) -> str:
        return (f"Line: \t page={self.page}, "
                f"Id={self.id_}, "
                f"Text={self.text}, \n"
                f"(left={self.left}, top={self.top}); "
                f"width={self.width}, height={self.height} \n")

    def get_center(self) -> list:
        # Midpoint between the top-left and bottom-right corners.
        x = self.left
        y = self.top
        x1 = self.left + self.width
        y1 = self.top + self.height
        x_center = (x + x1) / 2
        y_center = (y + y1) / 2
        return [x_center, y_center]


class Block:
    """A group of Lines that are close to each other, plus its bounding box."""

    def __init__(self) -> None:
        self.lines: List[Line] = []
        self.page: int = 0
        self.reason: str = ""
        self.id: str = ""

        self.left: float = 0
        self.top: float = 0
        self.height: float = 0
        self.width: float = 0

    def __str__(self) -> str:
        return (f"Block: page={self.page}, id={self.id}, "
                f"(x1,y1)=({self.left}, {self.top}), "
                f"(x2,y2)=({self.left + self.width},{self.top + self.height})")

    def add_line(self, line):
        self.lines.append(line)


class Page:
    def __init__(self, page_number, lines):
        self.lines = lines
        self.page = page_number

    def __str__(self) -> str:
        # Build the text instead of printing inside __str__.
        lines_text = "".join(f"line: {line}\n" for line in self.lines)
        return f"{lines_text}Page: {self.page}"

Step 2: General Function

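Next we need some general-purpose functions to handle the data. We need the following:

  • read_json_file: reads the JSON file returned by Textract
  • read_file_to_bytes: reads the PDF file, mainly used to display the result
  • print_blocks: prints Block information
  • get_lines_from_json: extracts the Line information from the JSON returned by Textract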

# Line and Block are the classes defined in Step 1.
import json
from typing import List

from PIL import Image
from pdf2image import convert_from_bytes


def read_json_file(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        return json.load(file)


def read_file_to_bytes(file_path: str) -> List[Image.Image]:
    # Reads the PDF as bytes and returns one PIL image per page.
    with open(file_path, 'rb') as file:
        pdf_binary = file.read()
    return convert_from_bytes(pdf_binary)


def get_lines_from_json(file_path: str) -> List[Line]:
    """
    Get all type "LINE" from json file generated by textract.
    :param file_path: json file generated by textract.
    :return: list of Line.
    """
    lines: List[Line] = []
    json_res = read_json_file(file_path)
    for item in json_res["Blocks"]:
        if item["BlockType"] == "LINE":
            box = item["Geometry"]["BoundingBox"]
            lines.append(
                Line(
                    item["Id"],
                    item["Page"],
                    item["Text"],
                    box["Top"],
                    box["Left"],
                    box["Width"],
                    box["Height"]))
    return lines


def print_blocks(blocks: List[Block]) -> None:
    """
    print block and line information
    :param blocks: blocks to print
    """
    for block in blocks:
        print(block)
        for line in block.lines:
            print(line)
        print("\n")

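In addition, the main factor in deciding whether two Lines should be merged into one Block is whether they are close to each other, so we can define a class that is responsible for this check:

  • two_point_distance computes the distance between two points and is used to check whether the centers of two Lines are too far apart: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$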

import math


class ColumnType:
    """
    Column type, based on the column type of pdf.
    (Listed again in bbox_merger.py further down.)
    """
    SINGLE = "SINGLE"
    MULTI = "MULTI"


class LineSimilarityChecker:
    """
    This class is used to check the similarity between two lines.
    """

    def __init__(self, column_type: ColumnType,
                 distance_tolerance: float = 0.03,
                 width_tolerance: float = 0.01,
                 left_tolerance: float = 0.02,
                 height_tolerance: float = 0.02,
                 same_line_tolerance: float = 0.005
                 ) -> None:
        self.column_type = column_type

        self.distance_tolerance = distance_tolerance
        self.width_tolerance = width_tolerance
        self.left_tolerance = left_tolerance
        self.height_tolerance = height_tolerance
        self.same_line_tolerance = same_line_tolerance

    # "is None" instead of "or", so an explicit tolerance of 0 is respected.
    def is_left_similar(self, line1, line2, tolerance=None):
        tolerance = self.left_tolerance if tolerance is None else tolerance
        return self.pretty_similar(line1.left, line2.left, tolerance)

    def is_width_similar(self, line1, line2, tolerance=None):
        tolerance = self.width_tolerance if tolerance is None else tolerance
        return self.pretty_similar(line1.width, line2.width, tolerance)

    def is_height_similar(self, line1, line2, tolerance=None):
        # Compares the vertical positions (top) of the two lines.
        tolerance = self.height_tolerance if tolerance is None else tolerance
        return self.two_point_height(line1.top, line2.top) < tolerance

    def is_center_close(self, line1: Line, line2: Line) -> bool:
        return self.two_point_distance(
            line1.center[0],
            line1.center[1],
            line2.center[0],
            line2.center[1]) < self.distance_tolerance

    @staticmethod
    def pretty_similar(x: float, x1: float, tolerance: float):
        return abs(x - x1) < tolerance

    @staticmethod
    def two_point_distance(x: float, y: float, x1: float, y1: float):
        # Euclidean distance between (x, y) and (x1, y1).
        distance = math.sqrt((x - x1) ** 2 + (y - y1) ** 2)
        return distance

    @staticmethod
    def two_point_height(y: float, y1: float):
        return abs(y - y1)
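As a quick worked example of the center-distance check, the sketch below applies the distance formula to three made-up boxes. `center` and `centers_are_close` are hypothetical stand-ins for `Line.get_center` and `is_center_close`, and the threshold reuses the default `distance_tolerance` of 0.03.

```python
import math

DISTANCE_TOLERANCE = 0.03  # same default as distance_tolerance above


def center(left, top, width, height):
    """Center of a bounding box given in page-relative ratios."""
    return (left + width / 2, top + height / 2)


def centers_are_close(box_a, box_b, tolerance=DISTANCE_TOLERANCE):
    """True when the Euclidean distance between the centers is under the tolerance."""
    (xa, ya), (xb, yb) = center(*box_a), center(*box_b)
    return math.hypot(xb - xa, yb - ya) < tolerance


# Hypothetical boxes as (left, top, width, height).
heading = (0.30, 0.10, 0.40, 0.02)       # center near (0.50, 0.11)
subheading = (0.35, 0.12, 0.30, 0.02)    # center near (0.50, 0.13)
other_column = (0.60, 0.10, 0.30, 0.02)  # center near (0.75, 0.11)

print(centers_are_close(heading, subheading))    # centers about 0.02 apart
print(centers_are_close(heading, other_column))  # centers about 0.25 apart
```

Two centered lines of different widths share almost the same center even though their left edges differ, which is why the center check complements the left-edge check.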

Step 3: Define Rule of Block

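Now we design the conditions under which Lines form a Block. The rules are as follows:

  • The two Lines must be on the same page, because a Block cannot span pages
  • Same paragraph: the left edges are similar and the Lines are vertically close
  • Centered text: the center points are close and the Lines are vertically close

The code that defines the rules is as follows: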

def is_two_line_close(block, target_line, cur_line):
    """
    Check if two lines are close enough, so we can merge them into a block
    """
    left_tolerance = 0.02
    # Not used by the rules below. An alternative idea would be:
    # max(left_tolerance, abs(target_line.width - cur_line.width) / 2)
    width_tolerance = 0.01
    distance_tolerance = 0.04
    height_tolerance = 0.03

    # The static helpers come from LineSimilarityChecker in Step 2.
    def is_left_similar(line1, line2, tolerance=left_tolerance):
        return LineSimilarityChecker.pretty_similar(line1.left, line2.left, tolerance)

    def is_width_similar(line1, line2, tolerance=width_tolerance):
        return LineSimilarityChecker.pretty_similar(line1.width, line2.width, tolerance)

    def is_height_similar(line1, line2, tolerance=height_tolerance):
        return LineSimilarityChecker.two_point_height(line1.top, line2.top) < tolerance

    def is_on_same_page(line1, line2):
        return line1.page == line2.page

    def is_center_close(line1, line2):
        return LineSimilarityChecker.two_point_distance(
            line1.center[0], line1.center[1],
            line2.center[0], line2.center[1]) < distance_tolerance

    def is_same_paragraph():
        """
        Handles Lines of the same paragraph.
        If the left edges match and the Lines are vertically close,
        they belong to the same Block.
        """
        return (is_left_similar(target_line, cur_line) and
                is_height_similar(block.lines[-1], cur_line))

    def is_text_center_context():
        """
        Handles centered text.
        If the center points are close and the Lines are vertically close,
        they belong to the same Block.
        """
        return (is_center_close(target_line, cur_line) and
                is_height_similar(block.lines[-1], cur_line))

    # First check the page, because a Block must not span pages
    if is_on_same_page(target_line, cur_line):
        # Group Lines of the same paragraph, or centered text
        if is_same_paragraph() or is_text_center_context():
            return True
    return False
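One detail in the rules above is that the vertical check compares the candidate with `block.lines[-1]`, the last Line already in the Block, rather than with the first one. A small sketch with made-up `top` values shows why this matters for paragraphs longer than two lines; `within` is a hypothetical helper and the tolerance matches `height_tolerance` above.

```python
HEIGHT_TOLERANCE = 0.03  # same value as height_tolerance above

# Made-up top coordinates of four consecutive lines in one paragraph.
tops = [0.10, 0.12, 0.14, 0.16]


def within(a, b, tolerance=HEIGHT_TOLERANCE):
    """True when two vertical positions differ by less than the tolerance."""
    return abs(a - b) < tolerance


# Comparing every line with the FIRST line of the paragraph.
against_first = [within(tops[0], top) for top in tops[1:]]

# Comparing every line with the line just BEFORE it.
against_previous = [within(prev, cur) for prev, cur in zip(tops, tops[1:])]

print(against_first)
print(against_previous)
```

Measured against the first line, only the second line passes, so the paragraph would be cut short. Measured against the previous line, every step stays under the tolerance and the whole paragraph chains into one Block.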

Step 4: Iterate Line to Form Block

Next we can iterate over all the Lines to find the Blocks

def merge_lines_to_block(lines):
    """
    Merge Lines into Blocks
    """
    ready_blocks = []  # finished Blocks are collected here

    # Keep building Blocks as long as there are Lines left
    while lines:
        block = Block()
        target_line = lines[0]  # the first Line is the reference the others are compared with
        block.add_line(lines[0])  # add target_line to the Block
        block.page = target_line.page  # set the Block's page
        lines.pop(0)  # remove target_line from lines
        index = 0  # reset index to 0, because pop shifts the positions
        # Walk through the remaining Lines until none is left to compare
        while index < len(lines):
            cur_line = lines[index]
            if target_line.page == cur_line.page:
                # Merge when the two Lines are close enough
                if is_two_line_close(block, target_line, cur_line):
                    block.add_line(cur_line)
                    lines.pop(index)  # after the pop, scanning restarts at index 0
                    index = 0  # reset index to 0
                    continue  # go on with the next round
            index += 1  # check the next element

        ready_blocks.append(block)  # add the finished Block to the list
    return ready_blocks
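To see the loop at work without any Textract output, here is a stripped-down sketch on six made-up lines from a two-column layout, listed row by row the way Textract returns them. `SimpleLine` and `group_lines` are hypothetical names, and only the same-paragraph rule is applied (left edge against the first line of the group, vertical position against the last one).

```python
from dataclasses import dataclass


@dataclass
class SimpleLine:
    text: str
    left: float
    top: float


def group_lines(lines, left_tolerance=0.02, top_tolerance=0.03):
    """Greedily group lines that share a left edge and follow each other vertically."""
    remaining = list(lines)  # work on a copy so the caller's list is untouched
    groups = []
    while remaining:
        group = [remaining.pop(0)]
        index = 0
        while index < len(remaining):
            cur = remaining[index]
            same_left = abs(group[0].left - cur.left) < left_tolerance
            next_row = abs(group[-1].top - cur.top) < top_tolerance
            if same_left and next_row:
                group.append(remaining.pop(index))
                index = 0  # rescan, earlier lines may now be reachable
                continue
            index += 1
        groups.append(group)
    return groups


# Row-by-row order: left column and right column interleaved.
sample = [
    SimpleLine("L1", 0.10, 0.10), SimpleLine("R1", 0.55, 0.10),
    SimpleLine("L2", 0.10, 0.12), SimpleLine("R2", 0.55, 0.12),
    SimpleLine("L3", 0.10, 0.14), SimpleLine("R3", 0.55, 0.14),
]
grouped = [[line.text for line in group] for group in group_lines(sample)]
print(grouped)
```

Unlike `merge_lines_to_block`, this sketch copies its input first; the original empties the list it is given, which is worth keeping in mind if the Lines are needed afterwards.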

Step 5: Execute

Finally, we can run the code

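# find_block_corners and show_image_bbox are listed in bbox_merger.py further down.
json_path = "./result/test.json"  # JSON file returned by Textract
pdf_path = "../../src/test.pdf"

lines = get_lines_from_json(json_path)  # extract the Line information from the Textract JSON
blocks = merge_lines_to_block(lines)  # merge Lines into Blocks
blocks = find_block_corners(blocks)  # find each Block's corners so that it encloses all of its Lines
show_image_bbox(pdf_path, blocks)
# print_blocks(blocks)  # optionally print the Block information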
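`find_block_corners` is called above but only listed later, in bbox_merger.py. Its core idea is the union of the Line bounding boxes, which the following sketch isolates. `union_bbox` is a hypothetical name, boxes are plain `(left, top, width, height)` tuples, and the values are made up.

```python
def union_bbox(boxes):
    """Smallest (left, top, width, height) box that encloses all given boxes."""
    min_left = min(left for left, _, _, _ in boxes)
    min_top = min(top for _, top, _, _ in boxes)
    max_right = max(left + width for left, _, width, _ in boxes)
    max_bottom = max(top + height for _, top, _, height in boxes)
    return (min_left, min_top, max_right - min_left, max_bottom - min_top)


# Two made-up line boxes: a full-width line and a shorter line below it.
line_boxes = [
    (0.25, 0.25, 0.5, 0.125),
    (0.25, 0.5, 0.25, 0.125),
]
print(union_bbox(line_boxes))
```

The enclosing box starts at the smallest left and top values and extends to the largest right and bottom edges, so a short last line of a paragraph does not shrink the Block.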

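The result is shown in the figure below:

Advanced: Clean Code + Single Page

However, the result above is not suitable for Single Page (single-column) documents, so we can add some extra settings. To make things easier to manage, I split the services into modules. The file structure is as follows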

File structure
.
├── resource
│   ├── pdf  # PDF files
│   │   ├── single-column.pdf
│   │   └── multi-column.pdf
│   └── result  # JSON files produced by Textract
│       ├── single-column
│       │   └── final-result.json
│       └── multi-column
│           ├── final-result.json
│           ├── result_0.json
│           └── result_1.json
├── src  # source code
│   ├── models  # classes
│   │   ├── __init__.py
│   │   ├── block.py
│   │   ├── line.py
│   │   ├── page.py
│   │   └── process_type.py
│   ├── ocr  # OCR-related services
│   │   ├── util
│   │   │   ├── __init__.py
│   │   │   ├── bbox_merger.py
│   │   │   └── functions.py
│   │   └── __init__.py
│   └── __init__.py
├── tests
│   ├── __init__.py
│   └── test_block_merge.ipynb
└── README.md

The cleaned-up code is as follows:

Class files
│   ├── models
│   │   ├── __pycache__
│   │   ├── __init__.py
│   │   ├── block.py
│   │   ├── line.py
│   │   ├── page.py
│   │   └── process_type.py

# src/models/block.py
from typing import List
from src.models.line import Line


class Block:
    def __init__(self) -> None:
        self.lines: List[Line] = []
        self.page: int = 0
        self.reason: str = ""
        self.id: str = ""

        self.left: float = 0
        self.top: float = 0
        self.height: float = 0
        self.width: float = 0

    def __str__(self) -> str:
        return (f"Block: page={self.page}, id={self.id}, "
                f"(x1,y1)=({self.left}, {self.top}), "
                f"(x2,y2)=({self.left + self.width},{self.top + self.height})")

    def add_line(self, line):
        self.lines.append(line)


# src/models/line.py
class Line:
    def __init__(self, id_: str, page: int, text: str, top: float, left: float,
                 width: float, height: float) -> None:
        self.top: float = top
        self.left: float = left
        self.width: float = width
        self.height: float = height

        self.page: int = page
        self.id_: str = id_
        self.text: str = text
        self.center: list = self.get_center()

    def __str__(self) -> str:
        return (f"Line: \t page={self.page}, "
                f"Id={self.id_}, "
                f"Text={self.text}, \n"
                f"(left={self.left}, top={self.top}); "
                f"width={self.width}, height={self.height} \n")

    def get_center(self) -> list:
        # Midpoint between the top-left and bottom-right corners.
        x = self.left
        y = self.top
        x1 = self.left + self.width
        y1 = self.top + self.height
        x_center = (x + x1) / 2
        y_center = (y + y1) / 2
        return [x_center, y_center]


# src/models/page.py
class Page:
    def __init__(self, page_number, lines):
        self.lines = lines
        self.page = page_number

    def __str__(self) -> str:
        # Build the text instead of printing inside __str__.
        lines_text = "".join(f"line: {line}\n" for line in self.lines)
        return f"{lines_text}Page: {self.page}"


# src/models/process_type.py
class ProcessType:
    LINE = "LINE"
    WORD = "WORD"

General functions

src/ocr/util/functions.py

import json
from typing import List

from PIL import Image
from pdf2image import convert_from_bytes


def read_json_file(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        return json.load(file)


def read_file_to_bytes(file_path: str) -> List[Image.Image]:
    # Reads the PDF as bytes and returns one PIL image per page.
    with open(file_path, 'rb') as file:
        pdf_binary = file.read()
    return convert_from_bytes(pdf_binary)

bbox_merger.py is responsible for turning lines into blocks

src/ocr/util/bbox_merger.py

import math
import re
from typing import List

from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
from pdf2image import convert_from_bytes

from src.models.block import Block
from src.models.line import Line
from src.ocr.util.functions import read_json_file


class ColumnType:
    """
    Column type, based on the column type of pdf.
    """
    SINGLE = "SINGLE"
    MULTI = "MULTI"


def get_lines_from_json(file_path: str) -> List[Line]:
    """
    Get all type "LINE" from json file generated by textract.
    :param file_path: json file generated by textract.
    :return: list of Line.
    """
    lines: List[Line] = []
    json_res = read_json_file(file_path)
    for item in json_res["Blocks"]:
        if item["BlockType"] == "LINE":
            box = item["Geometry"]["BoundingBox"]
            lines.append(
                Line(
                    item["Id"],
                    item["Page"],
                    item["Text"],
                    box["Top"],
                    box["Left"],
                    box["Width"],
                    box["Height"]))
    return lines


def print_blocks(blocks: List[Block]) -> None:
    """
    print block and line information
    :param blocks: blocks to print
    """
    for block in blocks:
        print(block)
        for line in block.lines:
            print(line)
        print("\n")


class LineSimilarityChecker:
    """
    This class is used to check the similarity between two lines.
    """

    def __init__(self, column_type: ColumnType,
                 distance_tolerance: float = 0.03,
                 width_tolerance: float = 0.01,
                 left_tolerance: float = 0.02,
                 height_tolerance: float = 0.02,
                 same_line_tolerance: float = 0.005
                 ) -> None:
        self.column_type = column_type

        self.distance_tolerance = distance_tolerance
        self.width_tolerance = width_tolerance
        self.left_tolerance = left_tolerance
        self.height_tolerance = height_tolerance
        self.same_line_tolerance = same_line_tolerance

    # "is None" instead of "or", so an explicit tolerance of 0 is respected.
    def is_left_similar(self, line1, line2, tolerance=None):
        tolerance = self.left_tolerance if tolerance is None else tolerance
        return self.pretty_similar(line1.left, line2.left, tolerance)

    def is_width_similar(self, line1, line2, tolerance=None):
        tolerance = self.width_tolerance if tolerance is None else tolerance
        return self.pretty_similar(line1.width, line2.width, tolerance)

    def is_height_similar(self, line1, line2, tolerance=None):
        # Compares the vertical positions (top) of the two lines.
        tolerance = self.height_tolerance if tolerance is None else tolerance
        return self.two_point_height(line1.top, line2.top) < tolerance

    def is_center_close(self, line1: Line, line2: Line) -> bool:
        return self.two_point_distance(
            line1.center[0],
            line1.center[1],
            line2.center[0],
            line2.center[1]) < self.distance_tolerance

    @staticmethod
    def pretty_similar(x: float, x1: float, tolerance: float):
        return abs(x - x1) < tolerance

    @staticmethod
    def two_point_distance(x: float, y: float, x1: float, y1: float):
        # Euclidean distance between (x, y) and (x1, y1).
        distance = math.sqrt((x - x1) ** 2 + (y - y1) ** 2)
        return distance

    @staticmethod
    def two_point_height(y: float, y1: float):
        return abs(y - y1)


class LineMerger:
    """
    This class is used to turn lines to blocks by compare each line's similarity.
    """

    def __init__(self, lines, column_type: ColumnType = ColumnType.SINGLE):
        self.column_type = column_type
        self.line_check = LineSimilarityChecker(self.column_type)
        self.lines: List[Line] = lines

    def get_blocks(self) -> List[Block]:
        """
        Get all blocks after turning lines into blocks.
        The column type (SINGLE or MULTI, default SINGLE) is set in the constructor.
        :return: blocks
        """
        blocks = self.merge_lines_to_block(self.lines)
        return self.find_block_corners(blocks)

    def merge_lines_to_block(self, lines) -> List[Block]:
        blocks: List[Block] = []
        while lines:
            block = Block()
            block.add_line(lines.pop(0))
            block.page = block.lines[0].page
            target_line = block.lines[0]
            index = 0
            while index < len(lines):
                cur_line = lines[index]
                if target_line.page == cur_line.page:
                    # for single column, when encounter number point, make a
                    # new a block
                    if (self.column_type == ColumnType.SINGLE
                            and self.is_start_special_word(cur_line)):
                        print("---Found special word---")
                        print(cur_line.text)
                        print("---End special word: Jump to next block---")
                        break
                    # other case, all need to compare the lines are close or
                    # not.
                    else:
                        if self.is_two_line_close(block, cur_line):
                            block.add_line(cur_line)
                            lines.pop(index)
                            index = 0
                            continue
                index += 1
            blocks.append(block)
        return blocks

    def is_start_special_word(self, cur_line: Line) -> bool:
        # Strip surrounding whitespace, split on spaces, and take the first token
        cur_start = cur_line.text.strip().split(" ")[0]
        pattern = self._regex_pattern()
        return re.match(pattern, cur_start) is not None

    @staticmethod
    def _regex_pattern() -> str:
        # general word or number + "." + any words (e.g. 1.Hello my friend)
        GENERAL_WORD_DOT_PATTERN = r'^[a-zA-Z0-9]\..*'
        # non-general one word or number + general one word or num + any word (e.g (1) This is ...)
        NON_ALPHANUMERIC_WORD_PATTERN = r'[^a-zA-Z0-9][a-zA-Z0-9][^a-zA-Z0-9].*'

        return '{}|{}'.format(
            GENERAL_WORD_DOT_PATTERN,
            NON_ALPHANUMERIC_WORD_PATTERN)

    def is_two_line_close(self, block, cur_line) -> bool:
        last_line: Line = block.lines[-1]
        target_line: Line = block.lines[0]

        if self.is_on_same_page(target_line, cur_line):
            # multi column: center text & paragraph block
            if self.column_type == ColumnType.MULTI:
                if (self.is_same_paragraph(last_line, cur_line) or
                        self.is_text_center_context(last_line, cur_line)):
                    return True

            # single column: same line
            elif self.column_type == ColumnType.SINGLE:
                if (self.is_on_same_line(last_line, cur_line) or
                        self.is_same_paragraph(last_line, cur_line)):
                    return True

        return False

    @staticmethod
    def is_on_same_page(line1, line2) -> bool:
        return line1.page == line2.page

    def is_on_same_line(self, last_line, cur_line) -> bool:
        return self.line_check.is_height_similar(last_line, cur_line)

    def is_same_paragraph(
            self,
            last_line: Line,
            cur_line: Line) -> bool:
        return (self.line_check.is_left_similar(last_line, cur_line)
                and self.line_check.is_height_similar(last_line, cur_line))

    def is_text_center_context(self, last_line: Line, cur_line: Line) -> bool:
        return (self.line_check.is_center_close(last_line, cur_line) and
                self.line_check.is_height_similar(last_line, cur_line))

    @staticmethod
    def find_block_corners(blocks: List[Block]) -> List[Block]:
        # The Block's box is the union of the boxes of all its Lines.
        for index, block in enumerate(blocks):
            min_top = min(line.top for line in block.lines)
            min_left = min(line.left for line in block.lines)
            max_bottom = max(line.top + line.height for line in block.lines)
            max_right = max(line.left + line.width for line in block.lines)

            block.height = max_bottom - min_top
            block.width = max_right - min_left
            block.top = min_top
            block.left = min_left
            block.id = index

        return blocks


def show_image_bbox(pdf_file, blocks) -> None:
    """
    show image bounding box
    :param pdf_file: the pdf file location
    :param blocks: the list of blocks we want to draw
    """
    with open(pdf_file, 'rb') as file:
        images = convert_from_bytes(file.read())

    for index, image in enumerate(images):
        width, height = image.size
        page = index + 1
        print(f"Process Page Index: {page}")

        plt.figure(figsize=(20, 16))
        plt.imshow(image)

        # iterate over the blocks
        for block in blocks:
            if block.page == page:
                # Scale the ratio-based box to the pixel size of the page image.
                rect = Rectangle(
                    (width * block.left,
                     height * block.top),
                    block.width * width,
                    block.height * height,
                    edgecolor='r',
                    facecolor='none')
                plt.text(
                    width * block.left,
                    height * block.top,
                    str(block.id),
                    fontsize=12,
                    color='red')
                plt.gca().add_patch(rect)
        plt.show()
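The special-word check used for single-column documents is easy to probe in isolation. The sketch below copies the two patterns from `_regex_pattern`; `starts_with_marker` is a hypothetical helper that mirrors `is_start_special_word` on plain strings, and the sample texts are made up.

```python
import re

# The two patterns from _regex_pattern, joined with "|".
PATTERN = r'^[a-zA-Z0-9]\..*|[^a-zA-Z0-9][a-zA-Z0-9][^a-zA-Z0-9].*'


def starts_with_marker(text):
    """True when the first whitespace-separated token looks like a list marker."""
    tokens = text.split()
    first = tokens[0] if tokens else ""
    return re.match(PATTERN, first) is not None


samples = [
    "1. Introduction",   # single character followed by a dot
    "(1) This is ...",   # single character wrapped in non-alphanumerics
    "Hello my friend",   # ordinary sentence
    "10. Tenth item",    # two-digit number
    "e.g. this one",     # abbreviation
]
results = {text: starts_with_marker(text) for text in samples}
for text, matched in results.items():
    print(f"{matched!s:5} {text}")
```

Two limits show up with these samples: a two-digit marker such as "10." is not recognised because the pattern allows a single character before the dot, and an abbreviation such as "e.g." is treated as a marker. Whether that matters depends on the documents being processed.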

Now we can call it directly
tests/test_block_merge.ipynb

from src.ocr.util.bbox_merger import (
    show_image_bbox,
    get_lines_from_json,
    LineMerger,
    print_blocks,
    ColumnType
)


class Value:
    def __init__(self, is_multi_column: bool):
        if is_multi_column:
            self.topic = "multi-column"
            self.column_type = ColumnType.MULTI
        else:
            self.topic = "single-column"
            self.column_type = ColumnType.SINGLE

        self.json_path = '../resource/result/{}/final-result.json'.format(self.topic)
        self.pdf_path = '../resource/pdf/{}.pdf'.format(self.topic)


def main(is_multi_column: bool):
    v = Value(is_multi_column=is_multi_column)
    lines = get_lines_from_json(v.json_path)
    blocks = LineMerger(lines, v.column_type).get_blocks()
    print_blocks(blocks)
    show_image_bbox(pdf_file=v.pdf_path, blocks=blocks)


if __name__ == "__main__":
    main(is_multi_column=False)