This is a summary of an article on facial emotion recognition (FER) based on
visual information. FER is an important topic in the fields of computer vision
and artificial intelligence because of its significant commercial and academic
potential. FER can be conducted using multiple sensors, but this summary focuses
on studies that exclusively use facial images, because visual expressions are one
of the main information channels in interpersonal communication. The article
reviews FER research conducted in the past. Conventional FER approaches are
described together with a summary of representative FER systems and their main
algorithms. Deep-learning-based FER approaches, which use deep networks to enable
“end-to-end” learning, are then presented. The article also focuses on an
up-to-date hybrid deep learning (HDL) approach that combines a convolutional
neural network (CNN) for the spatial features of an individual frame with long
short-term memory (LSTM) for the temporal features of consecutive frames.
The introduction of the article covers the following topics:
The terminology of FER
Contributions of the review
Organization of the review
In the terminology of FER, FER systems are founded on the variations in the
facial muscles, which characterize the facial actions that express a person's
emotions. FACS stands for “facial action coding system”, which encodes the
movements of specific facial muscles, known as action units (AUs). Facial
landmarks (FLs) are visually salient points in facial regions such as the end of
the nose, the ends of the eyebrows, and the mouth. Seven basic human emotions are
considered: surprise, sadness, happiness, anger, fear, disgust, and neutral. The
contribution of the review is to provide a general understanding of the state of
the art in FER approaches and to help new researchers understand the essential
components and trends of the FER field. Standard databases that contain still
images and video sequences for FER are introduced along with their
characteristics and purposes. Deep-learning-based FER and conventional FER are
compared in terms of resource requirements and accuracy. The review itself is
organized into the following sections:
Conventional FER approaches
Advanced FER approaches
A brief review of publicly available FER databases
Concluding remarks and discussion of FER
Conventional FER approaches detect the face region and then extract geometric
features, appearance features, or a hybrid of geometric and appearance features
from the target face. For geometric features, the relationships between facial
components are used to construct a feature vector for training. Two types of
geometric features are calculated, based on the positions and the angles of the
landmarks in a frame. Appearance features are usually extracted from the global
face region or from different face regions that contain different types of
information. Hybrid features, which combine geometric and appearance features,
are used in various approaches.
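The position-based and angle-based geometric features described above can be
sketched as follows. This is only an illustration under stated assumptions: the
function name geometric_features, the three landmark coordinates, and the choice
of pairwise distances and segment angles are hypothetical and are not taken from
the reviewed article.

```python
import math

def geometric_features(landmarks):
    """Build a feature vector from pairwise landmark relationships.

    `landmarks` is a list of (x, y) points. For every pair of points the
    vector stores the distance between them (position-based) and the angle
    of the segment joining them (angle-based).
    """
    features = []
    for i in range(len(landmarks)):
        for j in range(i + 1, len(landmarks)):
            dx = landmarks[j][0] - landmarks[i][0]
            dy = landmarks[j][1] - landmarks[i][1]
            features.append(math.hypot(dx, dy))  # distance between the two landmarks
            features.append(math.atan2(dy, dx))  # angle of the segment, in radians
    return features

# Three made-up landmarks (assumed layout: eyebrow end, nose tip, mouth corner).
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
vector = geometric_features(points)
```

With three landmarks there are three pairs, so the vector holds six values. A
vector like this would then be passed to a classifier for training.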
Deep-learning-based approaches are widely used nowadays. Deep learning
algorithms, including the CNN and the recurrent neural network (RNN), have been a
breakthrough when applied in the field of computer vision. These algorithms are
used for feature extraction, classification, and recognition tasks. The main
benefit of the CNN is that it removes or greatly reduces the dependence on
physics-based models and other pre-processing methods by enabling “end-to-end”
learning directly from the input images.
A CNN contains three types of heterogeneous layers:
Convolution Layer
Max pooling layer
Fully connected layer
The convolution layer takes an image or feature maps as input and convolves them
with a set of filter banks in a sliding-window manner to output feature maps that
represent the spatial arrangement of the facial image. The max-pooling
(subsampling) layers lower the spatial resolution of the representation by
averaging or max-pooling the input feature maps. Deep learning approaches did not
adopt the CNN directly for AU detection. A recurrent neural network has a
chain-like form of repeating neural network modules.
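The sliding-window convolution and the max-pooling step described above can be
sketched in plain Python. This is a minimal illustration, not the article's
implementation: the function names convolve2d and max_pool2d, the 5 x 5 input,
and the 2 x 2 filter are assumptions, and a real CNN would learn its filter
values during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in most deep learning libraries, the kernel is not flipped, so this
    is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled

# Made-up 5 x 5 "image" and a 2 x 2 filter that compares diagonal neighbours.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]
feature_map = convolve2d(image, kernel)  # 4 x 4 feature map
pooled = max_pool2d(feature_map)         # 2 x 2 map after pooling
```

The pooled map has half the resolution of the feature map in each direction,
which is the reduction in spatial resolution that the subsampling layer
provides.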
The remaining discussion concerns FER databases. In the FER field, a number of
databases are frequently used for extensive and comparative experiments. Human
facial expressions have been studied using 2D static images or 2D video
sequences. However, 2D-based analysis has difficulty handling large pose
variations.
Finally, the article concludes with a brief review of the FER approaches, which
are divided into main streams. Conventional FER approaches consist of three
steps:
Facial component detection
Feature extraction
Expression classification
In conventional FER, the classification algorithms used include AdaBoost, SVM,
and random forest. By contrast, deep-learning-based FER methods reduce the
dependence on face-physics-based models and other pre-processing techniques by
enabling “end-to-end” learning in the pipeline directly from the input images.
The CNN is a specific type of deep learning. Visualizing a CNN on input images
helps to understand a model learned from different FER datasets and demonstrates
the capability of networks trained on emotion detection across FER datasets and
tasks. Deep-learning-based FER and conventional FER are compared in terms of
resource requirements and accuracy.
Because CNN-based FER methods cannot reflect the temporal variations in the
facial components, hybrid approaches have been proposed that combine a CNN for
the spatial features of individual frames with long short-term memory (LSTM) for
the temporal features of consecutive frames. Different studies that analyze
facial expressions with a hybrid CNN-LSTM architecture show that it outperforms
CNN methods that use temporal averaging for aggregation. Deep-learning-based FER
approaches still have a number of limitations. Evaluation metrics for FER
approaches are crucial because they provide a basis for quantitative comparison.
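The temporal-averaging baseline mentioned above can be sketched as follows. This
is a hedged illustration only: the per-frame scores are invented, the label
order in EMOTIONS is an assumption, and the functions temporal_average and
predict are hypothetical names. In a real system the per-frame scores would come
from a CNN.

```python
# Assumed label order for the per-frame score vectors.
EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "neutral", "sadness", "surprise"]

def temporal_average(frame_scores):
    """Average the per-frame class scores over all frames of a sequence."""
    n_frames = len(frame_scores)
    n_classes = len(frame_scores[0])
    return [sum(frame[k] for frame in frame_scores) / n_frames
            for k in range(n_classes)]

def predict(frame_scores):
    """Return the emotion whose averaged score is highest."""
    averaged = temporal_average(frame_scores)
    best = max(range(len(averaged)), key=lambda k: averaged[k])
    return EMOTIONS[best]

# Invented scores for three consecutive frames (one row per frame).
frame_scores = [
    [0.1, 0.0, 0.1, 0.2, 0.5, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.5, 0.3, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.8, 0.1, 0.0, 0.1],
]
label = predict(frame_scores)
```

Averaging gives the same result whatever the order of the frames, so it cannot
capture how an expression unfolds over time. An LSTM processes the frames in
sequence and keeps an internal state, which is one plausible reason the hybrid
approach is reported to perform better.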
The article reports that the hybrid architecture shows superior performance, but
micro-expressions remain a challenging task to solve because they are subtle and
spontaneous facial movements that occur involuntarily. The article also briefly
introduces FER databases that consist of still images and video sequences. In
traditional datasets, human facial expressions are studied using 2D video
sequences or static 2D images. Because 2D-based analysis has difficulty handling
large pose changes and subtle, spontaneous facial behavior, recent datasets have
considered 3D facial expressions, which better enable the examination of the fine
structural changes that are characteristic of spontaneous expressions.
In summary, the article also introduces the evaluation metrics of FER approaches,
which provide a standard basis for comparison. Among the evaluation metrics used
in the recognition field, precision and recall are mainly used. A new evaluation
method is needed for recognizing consecutive facial expressions and for applying
micro-expression recognition to moving images. FER studies have been conducted in
the past, and FER performance has improved significantly through combinations of
deep learning algorithms; in the future, FER is expected to be developed further
by combining it with additional Internet of Things (IoT) sensors. FER is expected
to improve its current recognition rate, including for spontaneous
micro-expressions, to the same level as human beings (Ko).
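The precision and recall metrics mentioned in the summary above can be computed
per emotion class as in the following sketch. The labels and predictions are
invented for illustration, and the function name precision_recall is a
hypothetical helper rather than anything from the reviewed article.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Compute precision and recall for one emotion class.

    precision = TP / (TP + FP): how many predictions of `target` were right.
    recall    = TP / (TP + FN): how many true `target` samples were found.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented ground truth and predictions for six face images.
true_labels = ["happiness", "happiness", "sadness",
               "anger", "happiness", "sadness"]
predicted_labels = ["happiness", "sadness", "sadness",
                    "happiness", "happiness", "sadness"]

precision, recall = precision_recall(true_labels, predicted_labels, "happiness")
```

Here two of the three "happiness" predictions are correct and two of the three
true "happiness" images are found, so both values are 2/3.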
Ko, Byoung Chul. "A Brief Review of Facial Emotion Recognition Based on Visual
Information." Sensors 18.2 (2018): 401.