Automatic ID Card Information Acquisition

Introduction

The ID recognition system recognizes the ID number on a Chinese ID card. The card can come from either a video or a still image. For video input, the system first tracks the ID card across frames and then recognizes the number in the clearest frame. For accurate tracking, the card should not move too fast, and the video resolution should be at least 1000*600 when the ID card occupies about one third of a frame. To realize these functions, we use three image processing methods: intelligent scissors, SURF feature tracking, and ID recognition.

Background

Effective and instant ID card identification is a critical issue for public security, especially on campus. Nowadays RF-based technologies take up a major share of this application, but electromagnetic interference limits their reliability. Object detection and tracking based on digital image processing is considered a reliable complement, made practical by the availability of high-definition video, fast computers, and a rapidly growing body of reliable automated video analysis algorithms.

We have learned a lot of image computing methodology this semester in the course Biomedical Image Processing, and we wanted to put some of it to use. Following the idea of ID recognition, we implement segmentation, tracking, and recognition to transform the ID number from pixels into integers in the computer.

Algorithms

  • Intelligent Scissors for image segmentation
    The Intelligent Scissors of our final project follows the paper ("Intelligent Scissors for Image Composition", Eric N. Mortensen and William A. Barrett, Brigham Young University). We use this method to extract a template in which we can find features for the subsequent tracking step. To implement it, we first compute the local cost everywhere in the image with formulation (1):
    $$
    l(p,q) = \omega_Z f_Z(q) + \omega_G f_G(q), \qquad \omega_Z = 0.8,\ \omega_G = 0.2 \tag{1}
    $$

Here $f_Z(q)$ is the zero-crossing term and $f_G(q)$ is the gradient-magnitude term. The second step is to calculate the shortest path between two points with the revised Dijkstra algorithm, and the last step is to draw this path on the image as a brightly colored line.
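The shortest-path step can be sketched as follows. This is a simplified Python illustration (the project itself is implemented in MATLAB): it runs a minimal Dijkstra search over a grid of precomputed per-pixel local costs; the function and argument names are ours, not the paper's.

```python
import heapq

def min_cost_path(cost, start, goal):
    """Dijkstra shortest path on a 2-D grid of per-pixel local costs.

    cost  : 2-D list, cost[r][c] = local cost l(q) of entering pixel (r, c)
    start : (row, col) seed point
    goal  : (row, col) free point
    Returns the list of pixels on the minimum-cost path, start..goal.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, p = heapq.heappop(heap)
        if p == goal:
            break
        if d > dist.get(p, float("inf")):
            continue  # stale queue entry
        r, c = p
        for dr in (-1, 0, 1):            # 8-connected neighbourhood
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                q = (r + dr, c + dc)
                if 0 <= q[0] < rows and 0 <= q[1] < cols:
                    nd = d + cost[q[0]][q[1]]
                    if nd < dist.get(q, float("inf")):
                        dist[q] = nd
                        prev[q] = p
                        heapq.heappush(heap, (nd, q))
    # backtrack from the goal to the seed
    path, p = [goal], goal
    while p != start:
        p = prev[p]
        path.append(p)
    return path[::-1]
```

On a toy cost map with an expensive middle column, the returned path routes around the high-cost pixels, which is exactly the "snap to low-cost edges" behaviour intelligent scissors rely on.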

  • Feature Tracking
    To realize feature tracking, we need to detect features first. In the segmented image obtained from the previous step, our program finds SURF features as follows:
  1. Smooth the image with approximate Gaussian filters
    First, apply approximate Gaussian smoothing filters of different sizes (9×9, 15×15, 21×21, and 27×27 respectively).

    Then calculate the determinant of the Hessian matrix; formulation (2) gives the approximated determinant:

    $$
    \det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2 \tag{2}
    $$

    Here $D_{xx}$, $D_{yy}$, and $D_{xy}$ are the convolutions of the corresponding approximate Gaussian second-derivative filters with the image at each point. The value 0.9 is a parameter obtained from experiments. We set a threshold and compare it with the determinant of the Hessian matrix to detect interest points. This method realizes scale invariance, and the orientation assignment that follows realizes orientation invariance.
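SURF evaluates these box-filter responses in constant time per pixel using an integral image (summed-area table). A minimal Python sketch of that machinery and of formulation (2) follows; the project itself is implemented in MATLAB, and the function names here are ours.

```python
def integral_image(img):
    """Summed-area table: ii[r][c] = sum of img[0..r-1][0..c-1]."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum over img[r0..r1][c0..c1] in O(1) via four table lookups."""
    return ii[r1 + 1][c1 + 1] - ii[r0][c1 + 1] - ii[r1 + 1][c0] + ii[r0][c0]

def hessian_det(dxx, dyy, dxy):
    """Approximated Hessian determinant of formulation (2)."""
    return dxx * dyy - (0.9 * dxy) ** 2
```

The box-filter responses $D_{xx}$, $D_{yy}$, $D_{xy}$ are each assembled from a handful of `box_sum` calls, so the cost is independent of the filter size, which is why the 9×9 to 27×27 pyramid is cheap to compute.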
  2. Orientation assignment
    First, calculate the Haar-wavelet responses in the x and y directions in a circular neighborhood of radius 6s around the interest point, where s is the scale of the image. Then sum all wavelet responses within a sliding orientation window covering an angle of π/3 to determine the orientation.
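The sliding-window vote can be sketched in Python, assuming the wavelet responses are supplied as (angle, dx, dy) triples; the spacing of window start positions is our assumption, not something the report specifies.

```python
import math

def dominant_orientation(responses):
    """responses: list of (angle, dx, dy) Haar-wavelet samples around an
    interest point.  A pi/3 window slides over all angles; the window whose
    summed response vector is longest defines the dominant orientation."""
    best_len, best_angle = -1.0, 0.0
    step = math.pi / 18              # assumed spacing of window positions
    for k in range(36):
        start = -math.pi + k * step
        sx = sy = 0.0
        for ang, dx, dy in responses:
            # keep only samples whose angle falls inside [start, start + pi/3)
            if (ang - start) % (2 * math.pi) < math.pi / 3:
                sx += dx
                sy += dy
        length = math.hypot(sx, sy)
        if length > best_len:
            best_len, best_angle = length, math.atan2(sy, sx)
    return best_angle
```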
  3. Get the SURF descriptor

    Construct a square region centred on the interest point and oriented along the selected orientation. Split the region into 4×4 sub-regions and compute Haar-wavelet responses (dx and dy) at 5×5 regularly spaced sample points in each. Following the paper, which brings in information about the polarity of the intensity changes, we also extract the sums of the absolute values of the responses. The sums of dx and dy over each sub-region form the first set of entries of the feature vector. Descriptors are then compared across all interest points, and features with similar descriptors are matched.
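Each of the 4×4 sub-regions contributes the four sums named above, giving the usual 64-dimensional SURF descriptor. A small sketch of one sub-region's contribution:

```python
def subregion_descriptor(samples):
    """samples: list of (dx, dy) Haar responses inside one sub-region.
    Returns the four SURF entries [sum dx, sum dy, sum |dx|, sum |dy|]."""
    sdx = sum(dx for dx, _ in samples)
    sdy = sum(dy for _, dy in samples)
    adx = sum(abs(dx) for dx, _ in samples)
    ady = sum(abs(dy) for _, dy in samples)
    return [sdx, sdy, adx, ady]
```

Concatenating these four values over the 16 sub-regions yields the full descriptor; matching then compares descriptors by distance.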
  4. Use Ransac algorithm to wipe off the outliers
    A simple example is fitting a line in two dimensions to a set of observations. Assume the set contains both inliers, i.e. points that can approximately be fitted to a line, and outliers, points that cannot; a simple least-squares fit will generally produce a line that fits the inliers badly. RANSAC instead looks for the line that fits the most inliers: an iterative process repeatedly picks a candidate line at random and counts its inliers, until a line is found that fits almost all of them. In this project, I use the SURF feature detecting and matching functions in the Computer Vision Toolbox.
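The line-fitting example above can be sketched directly in Python; the iteration count, tolerance, and parameter names are illustrative choices, not values from the project.

```python
import random

def ransac_line(points, iters=200, tol=1.0, seed=0):
    """Fit a line to 2-D points by RANSAC: repeatedly pick two random
    points, count how many others lie within `tol` of the implied line,
    and keep the candidate with the most inliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # line through the two samples in implicit form a*x + b*y + c = 0
        a, b = y2 - y1, x1 - x2
        c = -(a * x1 + b * y1)
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # degenerate sample (same point twice)
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

On ten collinear points plus two outliers, the surviving inlier set is exactly the collinear points, which is how the wrong feature matches get discarded.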
  5. Find the clearest frame
    The clearest frame is defined as the frame with the most matched features. I gather feature statistics over all frames, find the frame with the most matches, and store it for the recognition part.
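Under this definition the selection is just an arg-max over per-frame match counts:

```python
def clearest_frame(match_counts):
    """Index of the frame with the most matched SURF features."""
    return max(range(len(match_counts)), key=lambda i: match_counts[i])
```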
  • Morphological opening operation
    This method is employed to segment the 'number area' from the background and the photo. Morphological image processing is a classic methodology that removes imperfections by accounting for the form and structure of the image. These techniques extend to greyscale images and are especially suited to processing binary images.
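A pure-Python sketch of opening on a binary image with a square structuring element (the project uses MATLAB's built-in morphology functions; out-of-bounds pixels are treated as background here):

```python
def erode(img, k):
    """Binary erosion with a k x k square structuring element."""
    rows, cols, r2 = len(img), len(img[0]), k // 2
    return [[int(all(0 <= r + dr < rows and 0 <= c + dc < cols
                     and img[r + dr][c + dc]
                     for dr in range(-r2, r2 + 1)
                     for dc in range(-r2, r2 + 1)))
             for c in range(cols)] for r in range(rows)]

def dilate(img, k):
    """Binary dilation with a k x k square structuring element."""
    rows, cols, r2 = len(img), len(img[0]), k // 2
    return [[int(any(0 <= r + dr < rows and 0 <= c + dc < cols
                     and img[r + dr][c + dc]
                     for dr in range(-r2, r2 + 1)
                     for dc in range(-r2, r2 + 1)))
             for c in range(cols)] for r in range(rows)]

def opening(img, k):
    """Opening = erosion then dilation; removes specks smaller than the element."""
    return dilate(erode(img, k), k)
```

Opening a small test image removes an isolated speck while restoring a solid block, which is the behaviour used to clean up the binarized number area.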

  • Neural network fitting
    Here I employ a back-propagation (BP) neural network to build the correspondence between character images and numbers. Backpropagation is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. It requires a known, desired output for each input value in order to calculate the gradient of the loss function, and is therefore usually considered a supervised learning method, although it is also used in some unsupervised networks such as autoencoders. It is a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer. The schematic of a BP neural network is shown below.
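A minimal BP network with one hidden layer, sigmoid units, and squared error can be sketched in Python. The project itself trains with MATLAB's nftool; the layer size, learning rate, and scalar output here are illustrative only.

```python
import math
import random

def train_bp(samples, n_hidden=8, epochs=2000, lr=0.5, seed=1):
    """Minimal one-hidden-layer BP network (sigmoid units, squared error).
    samples: list of (input_vector, target) with scalar target in [0, 1].
    Returns (predict_function, final_mean_squared_error)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # small random initial weights; each row of w1 and w2 carries a bias term
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))

    def forward(x):
        xb = x + [1.0]
        h = [sig(sum(w * v for w, v in zip(row, xb))) for row in w1]
        y = sig(sum(w * v for w, v in zip(w2, h + [1.0])))
        return h, y

    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            # output delta, then hidden deltas via the chain rule
            d_out = (y - t) * y * (1 - y)
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            hb, xb = h + [1.0], x + [1.0]
            for j in range(n_hidden + 1):
                w2[j] -= lr * d_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w1[j][i] -= lr * d_hid[j] * xb[i]
    mse = sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)
    return (lambda x: forward(x)[1]), mse
```

Trained on a simple separable problem such as logical AND, the loss drops well below that of a constant predictor, which is all the character classifier needs at much larger scale.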

Implementation and Experimental Results

  • Result of intelligent scissors

  • Result of the feature matching
    The matching result, containing both inliers and outliers, is shown below.

    After applying the RANSAC algorithm to these matches, it looks like:

    The flowchart can be summarized as

  • Locating the region of interest and segmentations
    The very first step is to locate the area of interest, namely the 'number area' in this project. Since we use a uniform ID card layout, the location of the region of interest is the same across samples. Based on that, we roughly truncate a rectangle positioned at [x/3.5 y/2 2x/3 y] with respect to a full frame of the ID card image.
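A sketch of that crop, reading the rectangle as MATLAB's [xmin ymin width height] convention (our assumption) with x the frame width and y the height, clipped to the image bounds:

```python
def crop_number_area(frame):
    """Rough 'number area' crop [x/3.5, y/2, 2x/3, y] from a full ID-card
    frame, interpreted as [xmin ymin width height] and clipped to the
    image.  frame: 2-D list of pixel rows."""
    h, w = len(frame), len(frame[0])
    c0 = int(w / 3.5)                       # left edge at x/3.5
    c1 = min(w, int(w / 3.5 + 2 * w / 3))   # width 2x/3, clipped
    r0, r1 = h // 2, h                      # lower half vertically
    return [row[c0:c1] for row in frame[r0:r1]]
```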

  • Morphological operations
    Erosion and dilation are the two typical operations. Dilation is clearly what we need here to generate a connected-region mask. MATLAB provides a good implementation of morphological operations with structuring elements of user-defined shape and size; here we choose a square structuring element of size 60.

  • Degree of Matching
    With connected regions generated, the prior task is to identify the ‘Number region’. Here we introduce a factor Degree of Matching (DOM) to label the region of interest.
    $$
    DOM = 75 \times area/perimeter^2
    $$
    where area and perimeter are properties of the connected region generated in the last step. It should be noted that the labeling is basically empirical; we therefore experimented on many samples to find a statistical interval estimate for DOM, finally set to [0.8, 1.2].
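The DOM test reduces to two small functions; a long, thin strip of the right proportions falls inside the empirical interval, while a compact square does not.

```python
def degree_of_matching(area, perimeter):
    """DOM = 75 * area / perimeter^2 for one connected region."""
    return 75.0 * area / perimeter ** 2

def is_number_region(area, perimeter, lo=0.8, hi=1.2):
    """Empirical interval [0.8, 1.2] used to label the number region."""
    return lo <= degree_of_matching(area, perimeter) <= hi
```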

  • Character segmentation and Background compression
    There are 18 characters (digits 0-9 and X) on each ID card, so we divide the region evenly into 18 sub-areas. To allow for a certain angle of inclination, we extend the mask of the region of interest by 10 pixels on both sides. For convenience and accuracy in the subsequent neural network training, it is necessary to compress the background of each individual character image. The idea is straightforward: the original image is cropped at the first foreground pixel from the left, right, top, and bottom. A comparison of images without and with background compression is shown below.
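Both steps can be sketched in Python (function names are ours; the project performs the equivalent operations in MATLAB):

```python
def split_characters(region, n=18):
    """Split the number region evenly into n character sub-images.
    region: 2-D list of pixel rows."""
    w = len(region[0])
    bounds = [round(i * w / n) for i in range(n + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in region]
            for i in range(n)]

def compress_background(char_img):
    """Crop a binary character image to the bounding box of its
    foreground (non-zero) pixels."""
    rows = [r for r, row in enumerate(char_img) if any(row)]
    cols = [c for c in range(len(char_img[0]))
            if any(row[c] for row in char_img)]
    if not rows:
        return char_img  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1]
            for row in char_img[rows[0]:rows[-1] + 1]]
```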

  • Neural network model building
    The input matrix is built from the character images cropped previously. The image of a character such as '3' is an m-by-n binary matrix, so the total input matrix, a two-dimensional one, is (m×n)×6 in size. The matrices for the other characters are produced in the same way. In practice, m = 23 and n = 35.

The target matrix is generated as follows. For the sake of automatic processing, we name each ID card image after its ID number, e.g. 12345678901234567X.jpg, and the target matrix is generated from the file names. Obviously, we need to guarantee that each character from 0 to 9 and X is trained with similar representation, and a relatively large sample volume is a trustworthy way to achieve that: here we use 30 ID cards containing 540 characters in total. The number of neurons in the hidden layer is set to 50 to decrease the training error. The weights and thresholds of the neural network are stored in a model.mat file.
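Generating targets from file names can be sketched as follows. The one-hot encoding over the 11 character classes is our assumption, since the report does not state the exact target format.

```python
CHARS = "0123456789X"  # the 11 character classes on a Chinese ID card

def target_matrix(filename):
    """Build one target column per character of an ID image's file name,
    e.g. '12345678901234567X.jpg' -> 18 one-hot columns of length 11."""
    digits = filename.split(".")[0]
    cols = []
    for ch in digits:
        col = [0] * len(CHARS)
        col[CHARS.index(ch)] = 1
        cols.append(col)
    return cols
```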

The BP neural network fitting is conducted with the nftool toolbox and its user interface; the recognition result is shown here.

Summary

  • Implemented an intelligent scissors tool.
  • Implemented SURF feature tracking.
  • Achieved a reasonable character segmentation.
  • Neural network fitting helps to recognize characters efficiently.

References

  • “Distinctive Image Features from Scale-Invariant Keypoints”, David G. Lowe. Computer Science Department, University of British Columbia
  • “Intelligent Scissors for Image Composition”, Eric N. Mortensen, William A. Barrett. Brigham Young University
  • “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Martin A. Fischler and Robert C. Bolles (June 1981).
  • R. Brunelli, Template Matching Techniques in Computer Vision: Theory and Practice, Wiley, ISBN 978-0-470-51706-2, 2009.
  • Aksoy, M. S., O. Torkul, and I. H. Cedimoglu. "An industrial visual inspection system that uses inductive learning." Journal of Intelligent Manufacturing 15.4 (August 2004): 569.
  • Kyriacou, Theocharis, Guido Bugmann, and Stanislao Lauria. "Vision-based urban navigation procedures for verbally instructed robots." Robotics and Autonomous Systems 51.1 (April 30, 2005): 69-80.
  • Copyright is owned by the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
  • Copyright © 2021 Shysie
