
Enhancing Optical Character Recognition with TensorFlow

Kuzushiji strips of handwritten texts in Japanese

Last time, SOLVVE talked about the business applications of TensorFlow, one of the most popular machine learning tools these days. There you can find example use cases for each of its functions. This time, SOLVVE would like to talk about cases that stand at the intersection of different TensorFlow capabilities: computer vision in the form of optical character recognition, natural language processing, and how TensorFlow has pushed their productivity to a new level.

How does optical character recognition technology work?

This technology, called OCR for short, recognizes printed or handwritten symbols and transforms them into machine-readable text for digitization purposes. It has been in use for quite a long time now, and many of us benefit from these systems every day without realising what kind of technology they are using.

Code scanners, file converters, and other processing tools are based on the ability to make sense of text and images through a variety of approaches. The simplest way to describe the main principle of OCR is as follows.

First comes image acquisition. At this stage, you feed your OCR system a document by scanning it, taking a picture of it, or in some other way. Initially, the software sees your document as an image and cannot tell what it depicts.

The next step is preprocessing. At this stage, the software tries to reduce the noise and distortions in the image. For example, some areas can be too bright or too dark. Or, if you took a picture, it might be blurred or tilted. The better the preprocessing, the higher the chances of successful recognition.
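As a minimal sketch of one preprocessing step, the snippet below stretches the contrast of a washed-out grayscale image so that its darkest pixel becomes 0 and its brightest becomes 255. The function name and the tiny nested-list image are illustrative assumptions. Real systems typically combine several steps, such as denoising and deskewing, using an imaging library.

```python
def stretch_contrast(image):
    """Linearly rescale grayscale values (0-255) so the darkest pixel
    becomes 0 and the brightest becomes 255."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    return [[round((value - lo) * 255 / (hi - lo)) for value in row]
            for row in image]


# Hypothetical 2x2 scan whose values only span 50..200.
washed_out = [
    [50, 100],
    [150, 200],
]
print(stretch_contrast(washed_out))  # [[0, 85], [170, 255]]
```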

Preprocessing goes hand in hand with segmentation and feature recognition. Segmentation helps to break down the layout of the image into different areas where we can expect to find certain types of information. For example, in eSports this technique is used to recognise scores on screenshots or even on live video during streaming.
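One simple way to picture segmentation is a horizontal projection profile: rows that contain ink are grouped into bands, and each band is treated as a candidate text line. The sketch below assumes an already binarised image (1 for ink, 0 for background). The function name and sample page are hypothetical, and real layouts usually need more elaborate analysis.

```python
def segment_rows(binary):
    """Return (start, end) row ranges, end exclusive, for each band of
    consecutive rows that contain at least one ink pixel."""
    bands = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(binary)))  # band runs to the bottom edge
    return bands


# Hypothetical page: two text lines separated by an empty row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(segment_rows(page))  # [(1, 3), (4, 5)]
```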

Each image and each segment contains far too much information. Processing all of it is a waste of time and resources. That is why it is important to work only with crucial data. The OCR system focuses on enhancing the features that carry such data and discards everything else. For example, the system might be looking for particular shapes or a particular color saturation level.
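To illustrate how a segment can be reduced to a handful of informative numbers, the sketch below uses zoning: the glyph is divided into a grid and only the share of ink in each zone is kept. Zoning is just one possible feature, chosen here as an assumption. The function name and sample glyph are hypothetical, and the code assumes the glyph dimensions divide evenly by the grid size.

```python
def zone_densities(glyph, zones=2):
    """Split a binary glyph into a zones x zones grid and return the
    fraction of ink pixels in each zone, in row-major order."""
    height, width = len(glyph), len(glyph[0])
    zone_h, zone_w = height // zones, width // zones
    features = []
    for zy in range(zones):
        for zx in range(zones):
            cells = [glyph[y][x]
                     for y in range(zy * zone_h, (zy + 1) * zone_h)
                     for x in range(zx * zone_w, (zx + 1) * zone_w)]
            features.append(sum(cells) / len(cells))
    return features


# Hypothetical 4x4 glyph shaped like the letter L.
glyph_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(zone_densities(glyph_l))  # [0.5, 0.0, 0.75, 0.5]
```

Sixteen pixels are reduced to four numbers that still capture the rough shape of the character.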

When the OCR system finds prominent features, it classifies them in a predefined manner. For example, after identifying lighter and darker areas, the system marks the former as the background and the latter as the text. Depending on what your OCR system can do, it will mark and classify not only text and background, but also pictures, tables, signatures, etc.
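The light-versus-dark rule can be sketched as a simple threshold. In the example below, pixels darker than the threshold are labelled as text (1) and the rest as background (0). Using the mean brightness as the default threshold is an assumption made for brevity, and production systems often rely on adaptive thresholding methods instead.

```python
def classify_pixels(image, threshold=None):
    """Label each grayscale pixel as text (1) if it is darker than the
    threshold, otherwise as background (0)."""
    flat = [value for row in image for value in row]
    if threshold is None:
        # Assumed default: the mean brightness of the whole image.
        threshold = sum(flat) / len(flat)
    return [[1 if value < threshold else 0 for value in row]
            for row in image]


# Hypothetical 2x2 patch: two dark pixels on a light background.
patch = [
    [10, 200],
    [220, 30],
]
print(classify_pixels(patch))  # [[1, 0], [0, 1]]
```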

Afterwards, it works with the designated classes. This is where the actual recognition happens. In the case of text, the system splits the segment into lines, words, and letters or other characters. Since their number is limited, the system can match what it “sees” against a database of valid options and decide how to interpret the result.

Later, the system moves to post-processing. At this stage, the system removes possible errors and noise from the digitised or converted document. This may include scanning documents for specific terms or symbols.
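The recognition and post-processing stages can be sketched together. In the example below, a glyph is matched against a tiny database of valid characters by counting differing pixels, and a recognised word is then checked against a list of expected terms. The three 3x3 templates, the lexicon, and the function names are all hypothetical. The standard-library difflib module stands in for whatever correction logic a real system would use.

```python
import difflib

# Hypothetical database of valid characters as 3x3 binary templates.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognise_glyph(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))


def correct_word(word, lexicon, cutoff=0.6):
    """Post-processing: replace a word with its closest lexicon entry,
    or keep it unchanged when nothing is similar enough."""
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


noisy_l = ((1, 0, 0), (1, 0, 0), (1, 1, 0))   # an L with one pixel missing
print(recognise_glyph(noisy_l))                # L

expected_terms = ["INVOICE", "TOTAL", "DATE"]
print(correct_word("INV0ICE", expected_terms))  # INVOICE
```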

Benefits and applications of OCR systems

OCR systems bring everything that is good about digital systems closer to businesses. Recognized data is easy to search if you need to find a particular document in your archive or a specific clause in a long contract. You can easily change and update recognized documents in a convenient format as soon as your business needs it. Digital archives are easy to store, access, transfer or back up.

All of these benefits are actively used across industries, and you have probably encountered one or several applications relying on OCR in your daily life. Here are some popular examples of how OCR systems make our lives more comfortable.

Parking surveillance and licence plate validation. Government institutions can use simple mobile devices or surveillance systems to capture and process the licence plates of cars. This information can help to identify drivers who speed, run red lights, park in prohibited areas, or break the law in any other way.

Document management. OCR systems help to optimize and automate document workflow in a paper-intensive organisation. For example, electronic health record systems can benefit from OCR when it is necessary to digitize old handwritten clinical records or prescriptions. Or you can boost your enterprise resource planning system with OCR to process labels and invoices. The same applies to legal, insurance, or banking institutions that rely heavily on an accurate and timely document flow.

Code scanning. Besides barcodes and QR codes, which are in fact also part of the optical recognition process, OCR technologies make it possible to work with items encoded with letters and numbers, like IBAN for international banking or ISBN for books and other printed products.
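Codes such as ISBN and IBAN carry built-in check digits, so a code read by OCR can be verified before it is trusted. The sketch below applies the ISBN-13 weighted sum and the IBAN mod-97 check. The function names are hypothetical, and the IBAN check deliberately omits country-specific length rules.

```python
def is_valid_isbn13(text):
    """ISBN-13 check: digits weighted 1, 3, 1, 3, ... must sum to a
    multiple of 10. Hyphens and spaces are ignored."""
    digits = text.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_iban(text):
    """IBAN check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1.
    Country-specific lengths are not checked in this sketch."""
    iban = "".join(text.split()).upper()
    if len(iban) < 5 or not (iban.isascii() and iban.isalnum()):
        return False
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(is_valid_isbn13("978-3-16-148410-0"))           # True
print(is_valid_isbn13("978-3-16-148410-1"))           # False
print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))   # True
print(is_valid_iban("GB82 WEST 1234 5698 7654 33"))   # False
```

A failed check is a cheap signal that at least one character was misread and the code should be rescanned or reviewed.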

However, OCR systems have their shortcomings. Most of the benefits apply to the processed and recognized information while most of the inconveniences happen before or during the recognition process.

Limitations of the OCR systems

There are several, but all of them fall under the umbrella of templates. The first OCR systems were made to work with predefined types of documents that had a set template layout. They were very inflexible and expected consistency and conformity. If you move something around in your template, the accuracy of the OCR system drops instantly because things are no longer where it expects them. It is also necessary to manually check or correct the results after recognition.
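The brittleness of templates can be shown with a toy example. Below, a template says on which line and between which columns each field is expected. Extraction works on a conforming document and silently returns wrong values as soon as an extra line shifts the layout. The template, field names, and sample documents are all hypothetical.

```python
LABEL_WIDTH = 12  # assumed fixed width of the label column


def make_line(label, value):
    """Build a fixed-width line: label padded to LABEL_WIDTH, then value."""
    return label.ljust(LABEL_WIDTH) + value


# Hypothetical template: field name -> (line index, start column, end column).
TEMPLATE = {
    "invoice_no": (0, LABEL_WIDTH, LABEL_WIDTH + 6),
    "total": (2, LABEL_WIDTH, LABEL_WIDTH + 6),
}


def extract_fields(lines, template):
    """Read each field from the fixed position the template prescribes."""
    fields = {}
    for name, (row, start, end) in template.items():
        line = lines[row] if row < len(lines) else ""
        fields[name] = line[start:end].strip()
    return fields


conforming = [
    make_line("Invoice no:", "004217"),
    make_line("Date:", "2020-01-15"),
    make_line("Total:", "199.00"),
]
# The same document with a company name added as a new first line.
shifted = [make_line("ACME Ltd.", "")] + conforming

print(extract_fields(conforming, TEMPLATE))
# {'invoice_no': '004217', 'total': '199.00'}
print(extract_fields(shifted, TEMPLATE))
# {'invoice_no': '', 'total': '2020-0'}
```

Nothing in the second result signals an error, which is why template-based output needs a manual check.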

Nevertheless, changes are inevitable. You might get some handwritten notes on a bill or images in your report. You might get differently structured invoices from different partners. Today, businesses expect their OCR systems to adapt to variations in document structure and to deliver a high level of precision.

TensorFlow for Intelligent Character Recognition

These requirements can be met by applying machine learning to OCR systems. TensorFlow came in handy due to its image processing and natural language processing capabilities. It has united the OCR approach with neural networks. The combination of the two helped to deliver intelligent character recognition (ICR) that does not rely on templates. It is completely AI-driven and can work with a vast spectrum of images. Thus, its greatest value lies in its ability to recognize handwriting.
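To give a feel for what a neural-network recogniser looks like in TensorFlow, here is a minimal sketch of a small convolutional classifier built with the Keras API. It is not the architecture of any particular product or research project. The 28x28 grayscale input, the 10 output classes, and the layer sizes are assumptions chosen only to keep the example short, and the model is untrained.

```python
import tensorflow as tf


def build_character_classifier(image_size=28, num_classes=10):
    """Small convolutional network that maps a grayscale character image
    to a probability for each character class."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(image_size, image_size, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_character_classifier()
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer class labels
    metrics=["accuracy"],
)

# Two blank placeholder images, just to show the input and output shapes.
batch = tf.zeros((2, 28, 28, 1))
probabilities = model(batch)
print(probabilities.shape)  # (2, 10)
```

Unlike a template, such a model learns the shapes of characters from labelled examples, so after training with model.fit on real images it can cope with variation in position and writing style.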

A recent push in handwriting recognition was made by the Center for Open Data in the Humanities. Tarin Clanuwat, one of the researchers, applied TensorFlow to Japanese texts written between the 8th and the early 20th century in the Kuzushiji style in an attempt to decode these writings.

The difficulty with these texts lies in the number and variety of handwritten characters used in them. As the writing style and the set of characters in use changed over the centuries, reading and understanding the texts became more and more challenging. Sometimes characters overlap or are connected in an unusual way. While some 300 core characters are used more often than the others, the dictionaries contain thousands of them in total. Currently, only scholars specially trained in reading the script are able to decode ancient texts. That is less than 0.01% of Japan’s population.

The TensorFlow-driven algorithm helps to bring classical literature, historical documents, and other records closer to the general public and assists scholars in interpreting these texts and past events. The approach applies not only to Japanese script, but to any writing system that is challenging to read nowadays, like German Fraktur.

To Conclude

Optical character recognition has become an integral part of our lives. Many businesses are either solely based on this technology or use it as a part of their business model. OCR systems have helped to take a huge step forward in optimizing and streamlining business operations.

The combination of OCR and machine learning with TensorFlow made it possible to take a step further and revive old texts through intelligent character recognition. This breakthrough sparked both the hopes of researchers for new discoveries and the interest of the general audience. Languages and texts that were considered unreadable or illegible are now available to readers once again.

However, you do not have to deal with complicated issues like decoding ancient writings to use these technologies. OCR is available for everyday business use and makes working with template-free documents and handwriting as easy as possible today. Thus, if you have any ideas or questions about the use of OCR or TensorFlow for your projects, do not hesitate to contact us. Let us make it happen!