Monday, April 1, 2019

Smart Music Player Integrating Facial Emotion Recognition

Smart Music Player Integrating Facial Emotion Recognition and Music Mood Classification

Shlok Gilda, Husain Zafar, Chintan Soni, Kshitija Waghurdekar
Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India

Abstract: Songs, as a medium, have always been a popular choice to depict human emotions. Reliable emotion-based classification systems can go a long way in facilitating this. However, research in the field of emotion-based music classification has not yielded optimal results. In this paper, we present an affective cross-platform music player, EMP, which recommends music based on the real-time mood of the user. EMP provides smart mood-based music recommendation by incorporating the capabilities of emotion context reasoning within our adaptive music recommendation system. Our music player contains three modules: the Emotion Module, the Music Classification Module and the Recommendation Module. The Emotion Module takes an image of the user as input and makes use of deep learning algorithms to identify the mood of the user with an accuracy of 90.23%. The Music Classification Module makes use of audio features to achieve a remarkable result of 97.69% while classifying songs into 4 different mood classes. The Recommendation Module suggests songs to the user by mapping the emotion of the user to the mood of the song, taking into consideration the preferences of the user.

Keywords: Recommender systems, Emotion recognition, Music information retrieval, Artificial neural networks, Multi-class neural network.

I. Introduction

Recent research in the field of music psychology has shown that music induces a clear emotional response in its listeners [1]. Musical preferences have been demonstrated to be highly correlated with personality traits and moods. The meter, timbre, melody and delivery of music are handled in areas of the brain that deal with emotions and mood [2].

Undoubtedly, a user's affective response to a music fragment depends on a large set of external factors, such as gender, age [3], culture [4], preferences, emotion and context [5] (e.g. time of day or location). However, these external variables set aside, humans are able to consistently categorize songs as being happy, sad, enthusiastic or relaxed.

Current research in emotion-based recommender systems focuses on two main aspects: lyrics [6], [12] and audio features [7]. Acknowledging the language barrier, we focus on audio feature extraction and analysis in order to map those features to four basic moods. Automatic music classification using these mood categories yields promising results.

Expressions are the most ancient and natural way of conveying emotions, moods and feelings. Facial expressions are categorized into 4 different emotions, viz. happy, sad, angry and neutral.

The main objective of this paper is to design a cost-effective music player which automatically generates a sentiment-aware playlist based on the emotional state of the user. The application designed requires less memory and less computational time. The emotion module determines the emotion of the user. Relevant and critical audio information from a song is extracted by the music classification module.
The recommendation module combines the results of the emotion module and the music classification module to recommend songs to the user. This system provides significantly better accuracy and performance than existing systems.

II. Related Works

Various methodologies have been proposed to classify the behaviour and emotional state of the user. Mase et al. focused on using movements of facial muscles [8], while Tian et al. [9] attempted to recognize Action Units (AU), developed by Ekman and Friesen in 1978 [10], using permanent and transient facial features. With evolving methodologies, the use of Convolutional Neural Networks (CNNs) for emotion recognition has become increasingly popular [11].

Music has been classified using lyrical analysis [6], [12]. While this tokenized method is relatively easier to implement, on its own it is not suitable to classify songs accurately. Another obvious concern with this method is the language barrier, which restricts classification to a single language.

Another method for music mood classification is using acoustic features like tempo, pitch and rhythm to identify the sentiment conveyed by the song. This method involves extracting a set of features and using those feature vectors to find patterns characteristic of a specific mood.

III. Emotion Module

In this section, we study the application of convolutional neural networks (CNNs) to emotion recognition [13], [14]. CNNs are known to simulate the human brain when analyzing visuals; however, given the computational requirements and complexity of a CNN, optimizing a network for efficient computation is necessary. Thus, a CNN is used to construct a computational model which successfully classifies emotion into 4 moods, namely happy, sad, angry and neutral, with an accuracy of 90.23%.

A. Dataset Description

The dataset we used for training the model is from the Kaggle Facial Expression Recognition Challenge, FER2013 [15]. The data consists of 48x48 pixel grayscale images of faces. Each of the faces is organized into one of the 7 emotion classes: angry, disgust, fear, happy, sad, surprise and neutral. For this research, we have made use of 4 emotions: angry, happy, sad and neutral. There is a total of 26,217 images corresponding to these emotions. The breakdown of the images is as follows: happy with 8989 samples, sad with 6077 samples, neutral with 6198 samples, and angry with 4953 samples.

B. Model Description

A multi-layered convolutional neural network is programmed to evaluate the features of the user image [16], [17]. The convolutional neural network contains an input layer, some convolutional layers, ReLU layers, pooling layers, some dense layers (i.e., fully-connected layers), and an output layer. These layers are linearly stacked in sequence.

1) Input Layer: The input layer has fixed and predetermined dimensions. So, for pre-processing the image, we used OpenCV for face detection in the image before feeding the image into the layer. Pre-trained filters from Haar Cascades along with AdaBoost are used to quickly find and crop the face. The cropped face is then converted into grayscale and resized to 48-by-48 pixels. This step greatly reduces the dimensions from (3, 48, 48) (RGB) to (1, 48, 48) (grayscale), which can easily be fed into the input layer as a numpy array.
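To make this pre-processing step concrete, the following is a minimal sketch of Haar-cascade face detection, cropping, grayscale conversion and resizing with OpenCV. It assumes the opencv-python distribution, which ships the frontal-face cascade under cv2.data.haarcascades; the helper name and the choice of keeping the largest detected face are illustrative, not the authors' exact code.

```python
# A minimal sketch of the input-layer pre-processing described above.
# Assumptions: OpenCV's bundled frontal-face Haar cascade is used, and the
# largest detected face is kept; the helper name is hypothetical.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_face(image_path):
    """Detect a face, crop it, convert to grayscale and resize to 48x48."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
    # (1, 48, 48) grayscale array, scaled to [0, 1], ready for the input layer
    return face.reshape(1, 48, 48).astype("float32") / 255.0
```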
2) Convolutional Layers: A set of unique kernels (or feature detectors), with randomly generated weights, is specified as one of the hyperparameters of the Convolution2D layer. Each feature detector is a (3, 3) receptive field which slides across the original image and computes a feature map. Convolution generates different feature maps for the same input image. Distinct filters are used to perform operations that represent how pixel values are enhanced, for example blur and edge detection. Filters are applied successively over the entire image, creating a set of feature maps. In our neural network, each convolutional layer generates 128 feature maps. Rectified Linear Unit (ReLU) has been used after each convolution operation. After a set of convolutional layers, a popular pooling method, MaxPooling, was used to reduce the dimensionality of each feature map while retaining the critical information. We used (2, 2) windows which consider only the maximum pixel values within the window from the feature map. The pooled pixels form an image with dimensions reduced by a factor of 4.

3) Dense Layers: The output from the convolutional and pooling layers represents high-level features of the input image. The dense layers use these features for classifying the input image into different classes. The features are transformed through the layers, which are connected with trainable weights. The network is trained by forward propagation of training data and then backward propagation of its errors. Our model uses 2 sequential fully connected layers. The network generalizes well to new images and is able to gradually make adjustments until the errors are minimized. A dropout of 20% was applied in order to prevent overfitting of the training data. This helped us control the model's sensitivity to noise during training while maintaining the necessary complexity of the architecture.

4) Output Layer: We used softmax as the activation function at the output layer of the dense layers. Thus, the output is represented as a probability distribution over the emotion classes. Models with various combinations of hyperparameters were trained and evaluated utilizing a 4 GiB DDR3 NVIDIA 840M graphics card and the NVIDIA CUDA Deep Neural Network library (cuDNN). This greatly reduced training time and increased efficiency in tuning the model. Ultimately, our network architecture consisted of 9 convolutional layers, with one max-pooling layer after every three convolutional layers, followed by 2 dense layers, as seen in Figure 1.

C. Results

The final network was trained on 20,973 images and tested on 5,244 images. At the end, the model achieved an accuracy of 90.23%. Table 1 displays the confusion matrix for the module. Evidently, the system performs very well in classifying images belonging to the angry category. We also note interesting results under the happy and sad categories, owing to the remarkable differences in Action Units as mentioned by Ekman [11]. The F-measure of this system comes out to be 90.12%.
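For concreteness, the following is a minimal sketch of such an architecture in the Keras API (nine 3x3 convolutional layers of 128 feature maps with ReLU, max-pooling after every third convolution, two dense layers with 20% dropout and a 4-way softmax). The dense-layer width, the "same" padding, the optimizer and the channels-last input ordering are assumptions; the paper does not specify them.

```python
# A minimal Keras sketch of the described architecture; widths, padding and
# optimizer are assumptions not stated in the paper.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(128, (3, 3), activation="relu", padding="same",
                 input_shape=(48, 48, 1)))        # channels-last (48, 48, 1)
for i in range(2, 10):                            # convolutional layers 2..9
    model.add(Conv2D(128, (3, 3), activation="relu", padding="same"))
    if i % 3 == 0:                                # pool after every 3rd conv layer
        model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation="relu"))          # assumed width
model.add(Dropout(0.2))                           # 20% dropout as described
model.add(Dense(4, activation="softmax"))         # happy, sad, angry, neutral

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```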
IV. Music Classification Module

In this section, we describe the procedure that was used to associate each song with its mood. We extracted the acoustic features of the songs using LibROSA [18], aubiopitch [19] and other state-of-the-art audio extraction algorithms. Based on these features, we trained an artificial neural network which successfully classifies the songs into 4 classes with an accuracy of 92.05%. The classification process is depicted in Figure 2.

A. Dataset Description

The dataset comprises 390 songs spread across four moods. The distribution of the songs is as follows: class A with 100 songs, class B with 93 songs, class C with 100 songs and class D with 97 songs. The songs were manually labelled, and the class labels were verified by 10 paid subjects. Class A comprises exciting and energetic songs, class B has happy and cheerful songs, class C consists of sad and melancholy songs, and class D has calm and relaxed songs.

1) Preprocessing: All the songs were downsampled to a uniform bit-rate of 128 kbps, converted to a mono audio channel, and resampled at a sampling frequency of 44100 Hz. We further split each song to obtain clips that contained the most meaningful parts of the song. The feature vectors were then standardized so that they had zero mean and unit variance.

2) Feature Description: We identified several mood-sensitive audio features by reading current works [20] and the results of the 2007 MIREX Audio Mood Classification task [21], [22]. The candidate features for the extraction process belonged to different classes: spectral (RMSE, centroid, rolloff, MFCC, kurtosis, etc.), rhythmic (tempo, beat spectrum, etc.), and tonal (mode and pitch). All these descriptors are standard. All the features were extracted using Python 2.7 and relevant packages [18], [19].

After identifying all the features, we used Recursive Feature Elimination (RFE) to select those features that contribute most to the accuracy of the model. RFE works by recursively removing attributes and building a model on those attributes that remain. It uses the model accuracy to identify which attributes (and combinations of attributes) contribute the most to predicting the target attribute. The selected features were pitch, spectral rolloff, mel-frequency cepstral coefficients, tempo, root mean square energy, spectral centroid, beat spectrum, zero-crossing rate, short-time Fourier transform and kurtosis of the songs.

B. Model Description

A multi-layered neural network was trained to evaluate the mood associated with the song. The network contains an input layer, multiple hidden layers and a dense output layer.

The input layer has fixed and predetermined dimensions. It takes the 10 selected features as input and uses the ReLU operation to provide non-linearity. This ensured that the model performs well in real-world scenarios as well.

The hidden layer is a traditional multi-layer perceptron, which allowed us to form combinations of features, leading to better classification accuracy. The output layer used a softmax activation function, which produces the output as a probability for each mood class.

C. Results

We achieved an overall classification accuracy of 97.69% and an F1 score of 97.692% after 10-fold cross-validation using our neural network. Table 2 displays the confusion matrix. Undoubtedly, the level of performance of the music classification module is exceptionally high.
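As an illustration of the feature-extraction step described above, the sketch below computes several of the selected features with librosa and aggregates frame-level values by their mean. The aggregation scheme is an assumption, pitch (extracted with aubiopitch in the paper) and the beat spectrum are omitted, and the function names follow the current librosa API (the cited librosa 0.4.1 names some functions differently, e.g. rmse instead of rms).

```python
# Illustrative sketch: extract a subset of the selected per-song features with
# librosa; frame-level values are averaged into one vector (an assumption).
import numpy as np
import librosa
from scipy.stats import kurtosis

def extract_features(path):
    y, sr = librosa.load(path, sr=44100, mono=True)   # resample as described
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)     # rhythmic feature
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    stft_mag = np.abs(librosa.stft(y))
    return np.hstack([
        tempo,
        np.mean(librosa.feature.rms(y=y)),             # root mean square energy
        np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
        np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr)),
        np.mean(librosa.feature.zero_crossing_rate(y)),
        np.mean(mfcc, axis=1),                         # 13 averaged MFCCs
        kurtosis(stft_mag.ravel()),                    # simple stand-in for the
                                                       # kurtosis feature (assumption)
    ])
```

Feature selection over such vectors can then be performed with a standard RFE implementation, as described above, before training the classifier.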
V. Recommendation Module

This module is responsible for generating a playlist of relevant songs for the user. It allows the user to modify the playlist based on her/his preferences and to modify the class labels of the songs as well. The working of the recommendation module is explained in Figure 3.

A. Mapping and Playlist Generation

Classified songs are mapped to the user's mood; this mapping was developed after referring to the Russell 2-D Valence-Arousal Model and the Geneva Emotion Wheel. After the mapping procedure is complete, a playlist of relevant songs is generated. Similar songs are grouped together while generating the playlist. Similarity between songs was calculated by comparing songs over 50 ms intervals, centered on each 10 ms time window. After experimental observations, we found that the duration of these intervals is of the order of magnitude of a characteristic song note. A cosine distance function was used to determine the similarity between audio files. Feature values corresponding to an audio file were compared to the values (for the same features) corresponding to audio files belonging to the same class label.

The recommendation engine has a twofold mechanism; it recommends songs based on:
1. The user's perceived mood.
2. The user's preferences.

Initially, a playlist of all songs belonging to the particular class is generated. The user can mark a song as a favorite depending on her/his choice. A favorite song will be assigned a higher priority value in the playlist. Also, the interpretation of the mood of a song can vary from person to person. Understanding this, the user is allowed to change the class label of a song according to their taste in music.

B. Adaptive Music Player

We were able to implement an adaptive music player by the use of a very popular online machine learning algorithm, Stochastic Gradient Descent (SGD) [23]. If the user wants to change the class of a particular song, SGD is applied, considering the new label for that specific user only.

Multiple single-pass algorithms were analyzed for their performance with our system, but SGD performed most efficiently considering the real-time nature of the music player. Parameter updates in SGD occur after the processing of every training example from the dataset. This approach yields two advantages over the batch gradient descent algorithm. Firstly, the time required for calculating the cost and gradient for large datasets is reduced. Secondly, integration of new data or amendment of existing data is easier. The frequent, highly variant updates demand that the learning rate be smaller as compared to that of batch gradient descent [23].
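As a sketch of how such a single-example update could look, the snippet below uses scikit-learn's SGDClassifier with partial_fit to fold a relabelled song into a per-user model. The paper only states that SGD is used; the choice of scikit-learn, the hyperparameters and the names here are assumptions.

```python
# Hypothetical sketch of the adaptive relabelling step with scikit-learn's
# SGDClassifier; library choice, hyperparameters and names are assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

MOOD_CLASSES = np.array(["A", "B", "C", "D"])   # the four song mood classes

# One lightweight model per user, updated online as that user relabels songs.
user_model = SGDClassifier(learning_rate="constant", eta0=0.01)

def on_user_relabel(song_features, new_label):
    """Apply one SGD update for a single (features, label) example."""
    X = np.asarray(song_features, dtype=float).reshape(1, -1)
    y = np.array([new_label])
    # partial_fit processes one example at a time; the full class list must be
    # supplied so unseen labels are handled consistently across calls.
    user_model.partial_fit(X, y, classes=MOOD_CLASSES)
```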
VI. Conclusion

The results obtained above are very promising. The high accuracy of the application and its quick response time make it suitable for most practical purposes. The music classification module, in particular, performs significantly well. Remarkably, it achieves high accuracy in the angry category; it also performs especially well for the happy and calm categories. Thus, EMP reduces user effort in generating playlists. It efficiently maps the user's emotion to the song class with an excellent overall accuracy, thus achieving promising results for 4 moods.

References

[1] Swathi Swaminathan and E. Glenn Schellenberg, "Current Emotion Research in Music Psychology," Emotion Review, vol. 7, no. 2, pp. 189-197, April 2015.
[2] "How music changes your mood," Examined Existence. [Online]. Available: http://examinedexistence.com/how-music-changes-your-mood/. [Accessed: Jan. 13, 2017].
[3] Kyogu Lee and Minsu Cho, "Mood Classification from Musical Audio Using User Group-dependent Models."
[4] Daniel Wolff, Tillman Weyde and Andrew MacFarlane, "Culture-aware Music Recommendation."
[5] Mirim Lee and Jun-Dong Cho, "Logmusic: Context-Based Social Music Recommendation Service on Mobile Device," UbiComp '14 Adjunct, September 13-17, 2014, Seattle, WA, USA.
[6] D. Gossi and M. H. Gunes, "Lyric-based music recommendation," in Studies in Computational Intelligence. Springer Nature, 2016, pp. 301-310.
[7] Bo Shao, Dingding Wang, Tao Li and Mitsunori Ogihara, "Music Recommendation Based on Acoustic Features and User Access Patterns," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 8, November 2009.
[8] K. Mase, "Recognition of facial expression from optical flow," IEICE Transactions, E74(10):3474-3483, October 1991.
[9] Y. Tian, T. Kanade and J. Cohn, "Recognizing Lower Face Action Units for Facial Expression Analysis," Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), March 2000, pp. 484-490.
[10] P. Ekman and W. V. Friesen, "Facial Action Coding System: A Technique for the Measurement of Facial Movement," Consulting Psychologists Press, Palo Alto, California, 1978.
[11] Gil Levi and Tal Hassner, "Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns."
[12] E. E. P. Myint and M. Pwint, "An approach for multi-label music mood classification," 2010 2nd International Conference on Signal Processing Systems, Dalian, 2010, pp. V1-290-V1-294.
[13] Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel and Marcus Liwicki, "DeXpression: Deep Convolutional Neural Network for Expression Recognition."
[14] Ujjwalkarn, "An Intuitive Explanation of Convolutional Neural Networks," the data science blog, 2016. [Online]. Available: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/. [Accessed: Jan. 13, 2017].
[15] Ian J. Goodfellow et al., "Challenges in Representation Learning: A report on three machine learning contests."
[16] S. Lawrence, C. L. Giles, Ah Chung Tsoi and A. D. Back, "Face recognition: a convolutional neural-network approach," IEEE Transactions on Neural Networks, vol. 8, no. 1, pp. 98-113, Jan 1997.
[17] A. Kołakowska, A. Landowska, M. Szwoch, W. Szwoch and M. R. Wrobel, "Human-Computer Systems Interaction: Backgrounds and Applications 3," ch. Emotion Recognition and Its Applications, pp. 51-62. Cham: Springer International Publishing, 2014.
[18] Brian McFee, Matt McVicar, Colin Raffel, Dawen Liang, Oriol Nieto, Eric Battenberg, Adrian Holovaty et al. (2015). librosa: 0.4.1 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.32193
[19] The aubio team, "aubio, a library for audio labelling," 2003. [Online]. Available: http://aubio.org/. [Accessed: Jan. 13, 2017].
[20] E. E. P. Myint and M. Pwint, "An approach for multi-label music mood classification," 2010 2nd International Conference on Signal Processing Systems, Dalian, 2010, pp. V1-290-V1-294.
[21] J. S. Downie, "The music information retrieval evaluation exchange (MIREX)," D-Lib Magazine, 12(12), 2006.
[22] Cyril Laurier, Perfecto Herrera, M. Mandel and D. Ellis, "Audio music mood classification using support vector machine."
[23] "Unsupervised Feature Learning and Deep Learning Tutorial: Optimization - Stochastic Gradient Descent." [Online]. Available: http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/. [Accessed: Jan. 13, 2017].
