Text has been vitally important in human life from the very beginning. In
vision-based applications, the precise and rich information embodied in text is
often preferred over other visual cues. For this reason, scene text detection
and recognition are important problems, and they are prime research topics in
document analysis and computer vision. A number of recent studies have
addressed this topic, and they account for the substantial progress made in the
field. The main purpose of this study is to survey the topic and examine its
different directions. The survey focuses on identifying state-of-the-art
algorithms and highlighting up-to-date research work. It also discusses
potential future directions for scene text detection and recognition and, where
available, points to publicly available resources, online demos, and source
code. In short, the present work is intended to give future researchers a basis
for more advanced studies.
Scene Text Detection and Recognition
Introduction of Scene Text Detection and Recognition
Vision-based applications are a great source of information, and that
information becomes more useful to people when it is embodied in text. Textual
information in images and videos can be extracted by detecting and recognizing
the text. However, localizing and reading text in natural scenes is not an easy
task, since it relies on a variety of mechanisms and algorithms. When the
process succeeds, it yields very useful results. Some common challenges
associated with scene text detection and recognition are presented below [1]:
· Complexity of Scene Text Detection and Recognition
Text on complex backgrounds is difficult to detect and recognize. Sometimes
clearly written text in a regular font becomes unreadable on a complex
background. Examples are natural scenes with bricks or grass in the background.
· Interference of Scene Text Detection and Recognition
Interference factors such as non-uniform illumination, noise, low-resolution
images, blurred backgrounds or text, and distortion create challenges for scene
text detection and recognition.
· Diversity of Scene Text Detection and Recognition
Scene text detection and recognition is not difficult when font styles and
sizes are the same throughout the document. However, when fonts, font sizes,
scales, and colors vary, the process takes more time and sometimes produces
wrong results.
· Noise & Distortion of Scene Text Detection and Recognition
Images are sometimes blurred or contain too much noise and distortion, which
makes it hard for the detection technology to detect and recognize the correct
text. Methods can be extended to handle blurred words in noisy images, but this
remains a big challenge.
Recently conducted research studies have introduced several algorithms that can
be useful in the field of scene text detection and recognition [2]. Algorithms
for detecting multi-oriented and diverse text can be evaluated on the
MSRA-TD500 dataset. Another dataset that contains images with multi-oriented
text and complex backgrounds is NEOCR. The poor capability of scene text
detection and recognition on multi-oriented text and complex backgrounds is a
key limitation that this work examines. The present work uses a survey
methodology to study the algorithms and limitations of scene text detection and
recognition.
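To illustrate why multi-oriented text is harder to handle than horizontal text,
the following minimal sketch represents a text line as a rotated rectangle and
compares it with its axis-aligned bounding box. The parameterization (center,
width, height, angle in radians) and the function names are assumptions chosen
for illustration; they are not taken from any particular dataset or algorithm.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a w x h rectangle centered at (cx, cy)
    and rotated by theta radians (illustrative parameterization)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Corner offsets relative to the center, before rotation.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


def axis_aligned_bounds(points):
    """Smallest axis-aligned box (x0, y0, x1, y1) enclosing the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# A 100 x 20 text line rotated by 45 degrees.
corners = rotated_box_corners(0, 0, 100, 20, math.pi / 4)
x0, y0, x1, y1 = axis_aligned_bounds(corners)
oriented_area = 100 * 20
enclosing_area = (x1 - x0) * (y1 - y0)
print(oriented_area, round(enclosing_area))
```

In this example the axis-aligned box is several times larger than the text line
itself, so a detector limited to horizontal boxes would include a large amount
of background around rotated text.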
Methodology on Scene Text Detection and Recognition
This section describes the methodology used for this survey paper on scene text
detection and recognition. The methodology is qualitative: secondary research
data and resources are analyzed to collect the relevant information. Existing
research literature is reviewed in the discussion section to see what kind of
research has been done on the topic and which areas previous studies have
covered. Reviewing previous research makes it possible to identify its
limitations and what can be done about them in the future. No primary research
method is used to collect data; the paper depends only on existing secondary
resources such as journal papers and peer-reviewed research articles written by
researchers in this field. The reviewed material is used to draw conclusions
and to recommend what future research on scene text detection and recognition
should examine [3].
Discussion on Scene Text Detection and Recognition
Methods for detecting and recognizing text vary, but the major purpose of all
of them is to ensure that the text in an image is read correctly. If a method
is not good enough, it will detect the wrong words, and the whole process of
text detection and recognition will produce incorrect results. It is therefore
important to review the literature to see how methods have developed in the
past and which methods produced which results, and so to judge whether the
field is moving in the right direction. The next part of this brief discussion
reviews various secondary resources [4].
Literature Review of Scene Text Detection and Recognition
It is important to see what recent advances have been made in scene text
detection and recognition, so that future trends can be estimated. One study
explored how the field has developed over time. It found that the process faces
various challenges, such as variation, occlusion, distortion, complexity, blur,
and noise. The study reviewed publicly available literature from various
sources to conduct a detailed survey of recent advances in the field. The
survey had three aims: first, to introduce up-to-date work; second, to identify
state-of-the-art algorithms; and third, to predict future research directions.
It covered various approaches and concluded that considerable progress has been
made in this area of research and that the outlook for the future is bright
[3].
Two example images of text detection are given below:
Fig.1 - Yao et al. Algorithm Text Detection [3]
Fig.2 - End-to-End Text Detection [3]
One research study explained that scene text carries rich semantic information
that vision-based applications can use in different ways. There have been
various conventional methods for scene text detection and recognition, each
with its own advantages and disadvantages. After analyzing these methods, the
study found that the key issues they faced included multi-orientation, the loss
function, sequence labeling, and the language model. Various benchmark datasets
and evaluation protocols can be used to analyze the performance of the overall
detection and recognition process [5].
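As a rough illustration of how a detection evaluation protocol can work, the
sketch below matches detected boxes to ground-truth boxes by intersection over
union (IoU) and reports precision, recall, and F-measure. The 0.5 threshold,
the greedy one-to-one matching, and the axis-aligned box format are assumptions
made for this example; actual benchmark protocols differ in their matching
rules and box formats.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching; returns (precision, recall, f_measure)."""
    matched = set()
    true_pos = 0
    for det in detections:
        best_j, best_score = None, threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # each ground-truth box may be matched only once
            score = iou(det, gt)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


# Hypothetical boxes: two ground-truth words, three detections (one spurious).
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_boxes = [(1, 1, 10, 10), (50, 50, 60, 60), (21, 20, 30, 30)]
print(evaluate(det_boxes, gt_boxes))
```

Here two of the three detections overlap a ground-truth box strongly enough to
count as correct, so recall is perfect while precision is reduced by the
spurious detection.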
A research study conducted in 2019 addressed curved scene text detection using
transverse and longitudinal sequence connection and examined the accuracy of
this approach. The study noted that curved text is always hard to detect. It
therefore built a curved text dataset named CTW1500, which contains more than
10,000 text annotations in 1,500 images. A polygon-based text detector was used
to detect the text. The method produced good results and even outperformed some
advanced methods. The study thus contributed a new detection method that makes
it much easier to detect curved text, which other methods in this field
generally find hard to detect and recognize [6].
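The benefit of polygon annotations for curved text can be seen with a small
geometric sketch. The code below computes the area of a polygon with the
shoelace formula and compares it with the area of its axis-aligned bounding
box. The chevron-shaped polygon is a made-up example, not an annotation from
CTW1500, and the function names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices in order,
    computed with the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def bounding_box_area(points):
    """Area of the axis-aligned rectangle enclosing all vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Made-up bent text region: upper boundary left to right, then lower
# boundary right to left.
bent_text = [(0, 20), (50, 0), (100, 20), (100, 40), (50, 20), (0, 40)]
print(polygon_area(bent_text), bounding_box_area(bent_text))
```

For this shape the polygon covers only half of its bounding rectangle, so a
rectangle-based annotation would label a large share of background pixels as
text.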
Another study presented an end-to-end trainable neural network that solves the
detection and recognition tasks in a novel integrated framework. By sharing the
convolutional feature maps, the two models can be trained simultaneously in a
single unified pipeline. The authors also designed a fully [...] recognition
[...], which was shown to be effective and efficient. Within this method, given
the computation of [...], [...] is able to attain [...] at around [...]
computational cost. The work mainly focuses on horizontal or near-horizontal
text; for future work, more attention needs to be paid to handling text at
multiple angles [7].
Resource: https://arxiv.org/pdf/1811.08611.pdf
The remaining problems
There is still a gap between practical significance and actual performance,
which indicates that [...] remain unsolved problems. Although considerable
progress has been made, there are still many research opportunities. Compared
with the performance of [...] on [...], [...] is still far behind. Progress
will come not only from robust features and recognition models, but also from
well-designed information sharing, feedback, and optimization strategies. The
recently developed [...] has significantly improved feature classification
performance by learning hierarchical multi-scale representations. The
combination of deep learning with improved segmentation, [...], and high-order
language models could further increase performance [8].
Resource:
Conclusion/Future Work on Scene Text Detection and Recognition
After analyzing various secondary resources for this paper, it can be concluded
that scene text detection and recognition has been one of the most significant
technologies of the recent era, and that the need for vision-based applications
is increasing with time. The process has developed over time and shows various
advanced trends, along with some limitations. The field has been moving in the
right direction, but future research needs to work further on the challenges
described above in order to mitigate the remaining issues in text detection and
recognition. Future researchers can alter existing methods and develop new ones
that deal with these challenges, so that the technology becomes advanced enough
to detect and recognize any kind of text in images.
References of Scene Text Detection and Recognition
[1] C. Yao and X. Bai, "Scene text detection and recognition: recent advances
and future trends," Frontiers of Computer Science, vol. 10, no. 1, pp. 19-36,
2016.
[2] S. Long, X. He and C. Yao, "Scene Text Detection and Recognition: The Deep
Learning Era," 2018.
[3] Y. Zhu, C. Yao and X. Bai, "Scene text detection and recognition: recent
advances and future trends," Frontiers of Computer Science (print), vol. 10,
no. 1, 2015.
[4] L. Neumann and J. Matas, "Real-time lexicon-free scene text localization
and recognition," IEEE Transactions on Pattern Analysis and Machine
Intelligence, pp. 1872-1885, 2015.
[5] H. Lin, P. Yang and F. Zhang, "Review of Scene Text Detection and
Recognition," Archives of Computational Methods in Engineering, pp. 1-22, 2019.
[6] Y. Liu, L. Jin, S. Zhang, C. Luo and S. Zhang, "Curved scene text detection
via transverse and longitudinal sequence connection," Pattern Recognition,
vol. 90, pp. 337-345, 2019.
[7] W. Sui, Q. Zhang, J. Yang and W. Chu, "A Novel Integrated Framework for
Learning both Text Detection and Recognition," 2018 24th International
Conference on Pattern Recognition (ICPR), pp. 2233-2238, 2018.
[8] Q. Ye and D. Doermann, "Text detection and recognition in imagery: A
survey," IEEE Transactions on Pattern Analysis and Machine Intelligence,
pp. 1480-1500, 2014.