
Connect the nearest points in segments and label the segments

  •  11
  •  Nithin Varghese  ·  asked 7 years ago

    I am doing document analysis of a datasheet using OpenCV and skimage. I am trying to segment out the shaded parts.

    I am now able to segment the parts and the numbers into different clusters.

    I segment the image using felzenszwalb() from skimage:

    import matplotlib.pyplot as plt
    import numpy as np     
    from skimage.segmentation import felzenszwalb
    from skimage.io import imread
    
    img = imread('test.jpg')
    
    segments_fz = felzenszwalb(img, scale=100, sigma=0.2, min_size=50)
    
    print("Felzenszwalb number of segments {}".format(len(np.unique(segments_fz))))
    
    plt.imshow(segments_fz)
    plt.tight_layout()
    plt.show()
    

    But I am unable to connect them. Any ideas on how to methodically connect the segments and label each with its corresponding part number would be of great help. If I have missed anything, whether over- or under-emphasized, please let me know in the comments.

    1 Answer  |  last active 6 years ago
  •  3
  •  Richard  ·  answered 6 years ago

    Preliminaries

    Some preliminary code:

    %matplotlib inline
    %load_ext Cython
    import numpy as np
    import cv2
    from matplotlib import pyplot as plt
    import skimage as sk
    import skimage.morphology as skm
    import itertools
    
    def ShowImage(title,img,ctype):
      plt.figure(figsize=(20, 20))
      if ctype=='bgr':
        b,g,r = cv2.split(img)       # get b,g,r
        rgb_img = cv2.merge([r,g,b])     # switch it to rgb
        plt.imshow(rgb_img)
      elif ctype=='hsv':
        rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
        plt.imshow(rgb)
      elif ctype=='gray':
        plt.imshow(img,cmap='gray')
      elif ctype=='rgb':
        plt.imshow(img)
      else:
        raise Exception("Unknown colour type")
      plt.axis('off')
      plt.title(title)
      plt.show()
    

    For reference, here is your original image:

    #Read in image
    img         = cv2.imread('part.jpg')
    ShowImage('Original',img,'bgr')
    

    Original Image

    Identifying the Numbers

    To simplify things, we want to classify pixels as being either on or off. We can do that with thresholding. Since our image contains two clear classes of pixels (black and white), we can use Otsu's method. We'll invert the colour scheme, since the libraries we're using consider black pixels boring and white pixels interesting.

    #Convert image to grayscale
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    
    #Apply Otsu's method to eliminate pixels of intermediate colour
    ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
    
    ShowImage('Applying Otsu',thresh,'gray')
    
    #Verify that pixels are either black or white and nothing in between
    np.unique(thresh)
    

    Otsu transformed

    Our strategy will be to locate the numbers, follow the lines near the numbers to the parts, and then label those parts. Since, conveniently, all of the Arabic numerals are formed from contiguous pixels, we can start by finding the connected components.

    ret, components = cv2.connectedComponents(thresh)
    #Each component is a different colour
    ShowImage('Connected Components', components, 'rgb')
    

    Connected components

    We can then filter the connected components to find the numbers by filtering on their dimensions. Note that this is not a super-robust approach. A better option would be to use character recognition, but that is left as an exercise for the reader :-) (a minimal OCR sketch is given after the next figure).

    class Box:
        def __init__(self,x0,x1,y0,y1):
            self.x0, self.x1, self.y0, self.y1 = x0,x1,y0,y1
        def overlaps(self,box2,tol):
            if self.x0 is None or box2.x0 is None:
                return False
            return not (self.x1+tol<=box2.x0 or self.x0-tol>=box2.x1 or self.y1+tol<=box2.y0 or self.y0-tol>=box2.y1)
        def merge(self,box2):
            self.x0 = min(self.x0,box2.x0)
            self.x1 = max(self.x1,box2.x1)
            self.y0 = min(self.y0,box2.y0)
            self.y1 = max(self.y1,box2.y1)
            box2.x0 = None #Used to mark `box2` as being no longer valid. It can be removed later
        def dist(self,x,y):
            #Get center point
            ax = (self.x0+self.x1)/2
            ay = (self.y0+self.y1)/2
            #Get distance to center point
            return np.sqrt((ax-x)**2+(ay-y)**2)
        def good(self):
            return not (self.x0 is None)
    
    def ExtractComponent(original_image, component_matrix, component_number):
        """Extracts a component from a ConnectedComponents matrix"""
        #Create a true-false matrix indicating if a pixel is part of a particular component
        is_component = component_matrix==component_number
        #Find the coordinates of those pixels
        coords = np.argwhere(is_component)
    
        # Bounding box of non-black pixels.
        y0, x0 = coords.min(axis=0)
        y1, x1 = coords.max(axis=0) + 1   # slices are exclusive at the top
    
        # Get the contents of the bounding box.
        return x0,x1,y0,y1,original_image[y0:y1, x0:x1]
    
    numbers_img = thresh.copy() #This is used purely to show that we can identify numbers
    numbers = []
    for component in range(1, components.max()+1): #Component 0 is the background, so skip it
        tx0,tx1,ty0,ty1,this_component = ExtractComponent(thresh, components, component)
        #ShowImage('Component #{0}'.format(component), this_component, 'gray')
        cheight, cwidth = this_component.shape
        #print(cwidth,cheight) #Enable this to see dimensions
        #Identify numbers by their expected pixel dimensions
        if (abs(cwidth-14)<3 or abs(cwidth-7)<3) and abs(cheight-24)<3:
            numbers_img[ty0:ty1,tx0:tx1] = 128
            numbers.append(Box(tx0,tx1,ty0,ty1))
    ShowImage('Numbers', numbers_img, 'gray')
    

    Numbers with Separated Boxes
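
    As an aside: if you do want actual character recognition instead of the dimension filter above, a minimal sketch might look like the following. It assumes pytesseract (and the Tesseract binary) is installed; ReadNumber is a hypothetical helper and its accuracy is untested here.

    import pytesseract  #Assumed available; wraps the Tesseract OCR engine

    def ReadNumber(thresh, box):
        """Hypothetical helper: OCR the digits inside one detected number box"""
        #Tesseract expects dark text on a light background, so invert our crop
        crop = 255 - thresh[box.y0:box.y1, box.x0:box.x1]
        #PSM 7 treats the crop as a single text line; the whitelist restricts output to digits
        return pytesseract.image_to_string(
            crop, config='--psm 7 -c tessedit_char_whitelist=0123456789'
        ).strip()

    #Hypothetical usage: part_numbers = [ReadNumber(thresh, n) for n in numbers]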

    We now connect the digits into contiguous blocks of numbers by expanding their bounding boxes slightly and looking for overlaps.

    #This is kind of a silly way to do this, but it will work fine for small quantities (hundreds)
    merged=True                                       #If true, then a merge happened this round
    while merged:                                     #Continue until there are no more mergers
        merged=False                                  #Reset merge indicator
        for a,b in itertools.combinations(numbers,2): #Consider all pairs of numbers
            if a.overlaps(b,10):                      #If this pair overlaps
                a.merge(b)                            #Merge it
                merged=True                           #Make a note that we've merged
    numbers = [x for x in numbers if x.good()]        #Eliminate those boxes that were gobbled by the mergers
    
    #This is used purely to show that we can identify numbers
    numbers_img = thresh.copy() 
    for n in numbers:
        numbers_img[n.y0:n.y1,n.x0:n.x1] = 128
        thresh[n.y0:n.y1,n.x0:n.x1] = 0 #Drop numbers from thresholded image
    ShowImage('Numbers', numbers_img, 'gray')
    

    Numbers connected

    Alright, so now we've identified the numbers! We'll use these later to identify the parts.

    Identifying the Arrows

    Next, we want to figure out which part each number is pointing at. To do this, we'll detect lines; the Hough transform is good for that. To reduce the number of false positives it produces, we first skeletonize the data, transforming it into a representation that is at most one pixel wide.

    skel = sk.img_as_ubyte(skm.skeletonize(thresh>0))
    ShowImage('Skeleton', skel, 'gray')
    

    Skeleton

    Now we perform the Hough transform. We're looking for one that identifies all of the lines going from the numbers to the parts. Getting this right may take some fiddling with the parameters.

    lines = cv2.HoughLinesP(
        skel,
        1,           #Resolution of r in pixels
        np.pi / 180, #Resolution of theta in radians
        30,          #Minimum number of intersections to detect a line
        None,
        80,          #Min line length
        10           #Max line gap
    )
    lines = [x[0] for x in lines]
    
    line_img = thresh.copy()
    line_img = cv2.cvtColor(line_img, cv2.COLOR_GRAY2BGR)
    for l in lines:
        color = tuple(map(int, np.random.randint(low=0, high=255, size=3)))
        cv2.line(line_img, (l[0], l[1]), (l[2], l[3]), color, 3, cv2.LINE_AA)
    ShowImage('Lines', line_img, 'bgr')
    

    Lines Identified

    We now want to find the line (or lines) closest to each number and keep only those, essentially filtering out all of the lines that are not arrows. To do this, we compare the endpoints of each line against the center point of each number box.

    comp_labels = np.zeros(img.shape[0:2], dtype=np.uint8)
    
    for n_idx,n in enumerate(numbers):
        distvals = []
        for i,l in enumerate(lines):
            #Distances from each point of line to midpoint of rectangle
            dists    = [n.dist(l[0],l[1]),n.dist(l[2],l[3])] 
            #Minimum distance and the end point (0 or 1) of the line associated with that point
            #Tuples of (Line Number, Line Point, Dist to Line Point) are produced
            distvals.append( (i,np.argmin(dists),np.min(dists)) )
        #Sort by distance between the number box and the line
        distvals = sorted(distvals, key=lambda x: x[2])
        #Include nearby lines, not just the closest one. This accounts for forking.
        distvals = [x for x in distvals if x[2]<1.5*distvals[0][2]]
    
        #Draw a white rectangle where the number box was
        cv2.rectangle(comp_labels, (n.x0,n.y0), (n.x1,n.y1), 1, cv2.FILLED)
    
        #Draw white lines where the arrows are
        for dv in distvals:
            l = lines[dv[0]]
            lp = (l[0],l[1]) if dv[1]==0 else (l[2],l[3])
            cv2.line(comp_labels, (l[0], l[1]), (l[2], l[3]), 1, 3, cv2.LINE_AA)
            cv2.line(comp_labels, (lp[0], lp[1]), ((n.x0+n.x1)//2, (n.y0+n.y1)//2), 1, 3, cv2.LINE_AA)
    ShowImage('Lines', comp_labels, 'gray')
    

    The Arrows

    Finding the Parts

    This part is hard! We now want to segment the parts in the image. If there were some way to break the lines connecting the subparts, this would be easy. Unfortunately, the lines connecting the subparts are the same width as many of the lines that make up the parts.

    To work around this, we would need a lot of logic. It would be painful and error-prone.

    Alternatively, we could assume you have an expert on hand. This expert's sole job is to cut the lines connecting the subparts. That should be both easy and fast for them. Labeling everything would be slow and sad for a human, but it is fast for a computer. Separating things is easy for a human, but hard for a computer. So we let each do what they do best.

    In this case, you could probably train someone to do this job in a few minutes, so a true "expert" isn't really necessary. Just a mildly competent human.

    If you pursue this, you'll need to write the expert-in-the-loop tooling. To do so, save the skeleton image, have your expert modify it, and read the modified skeleton back in. Like so (a sketch of a minimal in-notebook eraser follows the next figure):

    #Save the image, or display it in a GUI
    #cv2.imwrite("/z/skel.png", skel)
    #EXPERT DOES THEIR THING HERE
    #Read the expert-mediated image back in
    skelhuman = cv2.imread('/z/skel.png')
    #Convert back to the form we need
    skelhuman = cv2.cvtColor(skelhuman,cv2.COLOR_BGR2GRAY)
    ret, skelhuman = cv2.threshold(skelhuman,0,255,cv2.THRESH_OTSU)
    ShowImage('SkelHuman', skelhuman, 'gray')
    

    Skeleton with human modification
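
    If you'd rather keep the expert step inside the notebook than round-trip through an external image editor, a bare-bones eraser built on OpenCV's mouse callbacks could work. This is a sketch of my own assuming a desktop GUI is available, not part of the pipeline above: drag with the left button to erase, press q when done.

    def ExpertCut(skel):
        """Hypothetical in-the-loop tool: a human erases pixels by dragging the mouse"""
        img     = skel.copy()
        drawing = [False]                    #Mutable flag shared with the callback
        def on_mouse(event, x, y, flags, param):
            if event==cv2.EVENT_LBUTTONDOWN:
                drawing[0] = True
            elif event==cv2.EVENT_LBUTTONUP:
                drawing[0] = False
            elif event==cv2.EVENT_MOUSEMOVE and drawing[0]:
                cv2.circle(img, (x,y), 3, 0, -1) #Erase a small disc under the cursor
        cv2.namedWindow('expert')
        cv2.setMouseCallback('expert', on_mouse)
        while True:
            cv2.imshow('expert', img)
            if cv2.waitKey(20) & 0xFF == ord('q'): #Press q to finish
                break
        cv2.destroyAllWindows()
        return img

    #Hypothetical usage: skelhuman = ExpertCut(skel)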

    Now that we have the parts separated, we'll eliminate as much of the arrows as possible. We've already extracted them above, so we can add them back later if we need to.

    To eliminate the arrows, we'll find all of the lines that terminate somewhere other than at another line. That is, we'll locate pixels that have only one neighbouring pixel, eliminate such a pixel, and then look at its neighbour. Doing this iteratively eats the arrows up. Since I don't know another term for it, I'll call this the Fuse Transform. Since it requires manipulating individual pixels, which would be super slow in Python, we'll write the transform in Cython. (A SciPy-based alternative is sketched after the Fuse Transform figure below.)

    %%cython -a --cplus
    import cython
    
    from libcpp.queue cimport queue
    import numpy as np
    cimport numpy as np
    
    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.nonecheck(False)
    @cython.cdivision(True) 
    cpdef void FuseTransform(unsigned char [:, :] image):
        # set the variable extension types
        cdef int c, x, y, nx, ny, width, height, neighbours, neighbour, n
        cdef queue[int] q
    
        # grab the image dimensions
        height = image.shape[0]
        width  = image.shape[1]
    
        cdef int dx[8]
        cdef int dy[8]
    
        #Offsets to neighbouring cells
        dx[:] = [-1,-1,0,1,1,1,0,-1]
        dy[:] = [0,-1,-1,-1,0,1,1,1]
    
        #Find seed cells: those with only one neighbour
        for y in range(1, height-1):
            for x in range(1, width-1):
                if image[y,x]==0: #Seed cells cannot be blank cells
                    continue
                neighbours = 0
                for n in range(0,8):   #Looks at all neighbours
                    nx = x+dx[n]
                    ny = y+dy[n]
                    if image[ny,nx]>0: #This neighbour has a value
                        neighbours += 1
                if neighbours==1:      #Was there only one neighbour?
                    q.push(y*width+x)  #If so, this is a seed cell
    
        #Starting with the seed cells, gobble up the lines
        while not q.empty():
            c = q.front()
            q.pop()
            y = c//width         #Convert flat index into 2D x-y index
            x = c%width
            image[y,x] = 0       #Gobble up this part of the fuse
            neighbour  = -1      #No neighbours yet
            for n in range(0,8): #Look at all neighbours
                nx = x+dx[n]     #Find coordinates of neighbour cells
                ny = y+dy[n]
                #If the neighbour would be off the side of the matrix, ignore it
                if nx<0 or ny<0 or nx==width or ny==height:
                    continue
                if image[ny,nx]>0:      #Is the neighbouring cell active?
                    if neighbour!=-1:   #If we've already found an active neighbour
                        neighbour=-1    #Then pretend we found no neighbours
                        break           #And stop looking. This is the end of the fuse.
                    else:               #Otherwise, make a note of the neighbour's index.
                        neighbour = ny*width+nx
            if neighbour!=-1:           #If there was only one neighbour
                q.push(neighbour)       #Continue burning the fuse
    

    Back in standard Python:

    #Apply the Fuse Transform
    skh_dilated=skelhuman.copy()
    FuseTransform(skh_dilated)
    ShowImage('Fuse Transform', skh_dilated, 'gray')
    

    Fuse Transformed
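
    If you want to avoid the Cython dependency, roughly the same endpoint-peeling can be sketched with a SciPy convolution: count each on-pixel's 8 neighbours, delete the pixels with exactly one neighbour, and repeat. This is an alternative of my own, not the answer's method; it costs one full-image pass per peeled layer and may behave slightly differently right at junctions.

    from scipy.ndimage import convolve

    def FuseTransformNumpy(image):
        """Alternative sketch: repeatedly delete pixels with exactly one neighbour"""
        kernel      = np.ones((3,3), dtype=np.uint8)
        kernel[1,1] = 0                       #Count the 8 neighbours, not the pixel itself
        on = image>0
        while True:
            neighbours = convolve(on.astype(np.uint8), kernel, mode='constant')
            endpoints  = on & (neighbours==1) #Free line ends
            if not endpoints.any():
                break
            on[endpoints] = False             #Burn one pixel off the end of every fuse
        image[~on] = 0                        #Write the result back in place

    #Hypothetical usage (in place, like the Cython version): FuseTransformNumpy(skh_dilated)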

    Now that we've eliminated all of the arrows and the lines connecting the parts, we dilate the remaining pixels a lot.

    kernel = np.ones((3,3),np.uint8)
    dilated  = cv2.dilate(skh_dilated, kernel, iterations=6)
    ShowImage('Dilation', dilated, 'gray')
    

    Dilated parts

    Putting It All Together

    We overlay the labels and arrows we segmented out earlier...

    comp_labels_dilated  = cv2.dilate(comp_labels, kernel, iterations=5)
    labels_combined = np.uint8(np.logical_or(comp_labels_dilated,dilated))
    ShowImage('Comp Labels', labels_combined, 'gray')
    

    Combined arrows and parts

    Finally, we take the merged number boxes, the component arrows, and the parts, and colour each of them with pretty colours from Color Brewer. We then overlay this on the original image to get the desired highlighting.

    ret, labels = cv2.connectedComponents(labels_combined)
    colormask = np.zeros(img.shape, dtype=np.uint8)
    #Colors from Color Brewer
    colors = [(228,26,28),(55,126,184),(77,175,74),(152,78,163),(255,127,0),(255,255,51),(166,86,40),(247,129,191),(153,153,153)]
    for l in range(labels.max()+1): #Include the highest-numbered component
        if l==0: #Background component
            colormask[labels==0] = (255,255,255)
        else:
            colormask[labels==l] = colors[(l-1)%len(colors)] #Cycle if there are more parts than colours
    ShowImage('Comp Labels', colormask, 'bgr')
    blended = cv2.addWeighted(img,0.7,colormask,0.3,0)
    ShowImage('Blended', blended, 'bgr')
    

    Colored parts

    Final Image

    Final image

    So, to recap: we identified the numbers, the arrows, and the parts. In some cases we were able to separate them automatically; in others, we used an expert in the loop. Where we had to manipulate pixels individually, we used Cython for speed.

    Of course, the danger with this kind of answer is that some other image will break the (many) assumptions I've made here. But that's a risk you take when you try to present a problem with a single image.