I understand the following:
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output:
(2, 4)
So I would like to know why I get the following:
import numpy
import pytesseract
import logging
# Raw string literal, so the Windows path needs no backslash escaping
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
logging.basicConfig(level=logging.WARNING)
logging.getLogger('pytesseract').setLevel(logging.DEBUG)
image = r'C:\ocr\target\31832_226140__0001-00002b.jpg'
target = numpy.asarray(pytesseract.image_to_string(image, config='--dpi 96 --psm 6 -c preserve_interword_spaces=1 -c tessedit_char_whitelist="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,- \'" '))
print("target type is:",type(target))
print("target array shape is:",target.shape)
Output:
DEBUG:pytesseract:['C:\\Program Files\\Tesseract-OCR\\tesseract', 'C:\\ocr\\target\\31832_226140__0001-00002b.jpg', 'C:\\Users\\david\\AppData\\Local\\Temp\\tess_p68ogbz9', '--dpi', '96', '--psm', '6', '-c', 'preserve_interword_spaces=1', '-c', "tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,- '", 'txt']
target type is: <class 'numpy.ndarray'>
target array shape is: ()
OK, my array is text. But I still thought I would get a shape with actual dimensions, something like
(1,999)
for my shape?
Using the line
print(target)
gives the following kind of output.
-------->snip<----------
196 ANGUS, Lynne Manon ........................128 Wellington Rd, Wemuomata Recepnonst
197 ANGUS, Mane Joan .........00... ......129 Wellington Road, Weinumomata, Married
198 ANGUS, Manon Jean .........................173 Wellington Road, Weinuiomata,Texi Driver
199 ANGUS. Noel Fulton ........................127 Weinuomats Road, Weinuomate, Carpenter
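
For what it is worth, the empty shape can be reproduced without Tesseract. The sketch below assumes the OCR result is an ordinary Python str (which is what pytesseract.image_to_string returns by default); the `text` value is a hypothetical stand-in, not my real output. It suggests that numpy.asarray wraps a single string as a zero-dimensional array, and that a shape with dimensions only appears once the string is split into a sequence first.

```python
import numpy as np

# Hypothetical stand-in for the string returned by image_to_string
text = "196 ANGUS, Lynne\n197 ANGUS, Mane\n198 ANGUS, Manon"

# A single str is treated as one scalar element: a 0-d array
scalar = np.asarray(text)
print(scalar.shape)  # ()
print(scalar.ndim)   # 0

# Splitting into lines first gives a 1-d array, one element per line
lines = np.asarray(text.splitlines())
print(lines.shape)   # (3,)

# Splitting into characters gives one element per character
chars = np.asarray(list(text))
print(chars.shape == (len(text),))  # True
```

So whether the shape counts lines or characters seems to depend on how the string is split before it is handed to NumPy.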