Xiaohua Zhang
Hiroshima Institute of Technology, Hiroshima, Japan
Ning Xie
Tongji University, Shanghai, China
Masayuki Nakajima
Kanagawa Institute of Technology, Atsugi, Japan / Uppsala University, Visby, Gotland, Sweden
Masaki Hayashi
Uppsala University, Visby, Gotland, Sweden
Steven Bachelder
Uppsala University, Visby, Gotland, Sweden
Published in: Proceedings of SIGRAD 2016, May 23rd and 24th, Visby, Sweden
Linköping Electronic Conference Proceedings 127:11, p. 55-56
Published: 2016-05-30
ISBN: 978-91-7685-731-1
ISSN: 1650-3686 (print), 1650-3740 (online)
A clean document image is necessary for many tasks, such as page layout analysis and OCR. When the document image is of poor quality, these tasks may not produce satisfactory results. This paper proposes a simple
approach for extracting text lines and segmenting image regions from a color document image that mixes textual and non-textual regions and has uneven shading. The algorithm first removes the uneven shading and then computes
Canny edges. By analyzing connected components heuristically, the text regions and image regions contained in the document image are extracted and segmented. Finally, a clean document image is obtained using a trivial
synthesis method. Our experimental results demonstrate that the proposed approach performs plausibly.
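The connected-component step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual heuristics, so everything here is an assumption made for illustration: the function names `connected_components` and `classify`, the 4-connectivity, the `max_text_height` threshold, and the toy binary map standing in for a Canny edge image. It is not the authors' implementation.

```python
from collections import deque


def connected_components(mask):
    """Find 4-connected foreground regions in a binary grid.

    Returns bounding boxes (top, left, bottom, right), inclusive,
    in row-major discovery order.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify(box, max_text_height=3):
    """Hypothetical rule: short components are text, tall ones are image."""
    top, _left, bottom, _right = box
    return "text" if (bottom - top + 1) <= max_text_height else "image"


# Toy binary map: two small blobs (text-like) and one tall blob (image-like).
mask = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
]
boxes = connected_components(mask)
labels = [classify(b) for b in boxes]
```

On this toy input the two small blobs are labeled as text and the tall blob as image. A real document would need a threshold tied to the expected character height, and likely further cues such as aspect ratio or component density.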