Information on the doctoral thesis of Fellow Nguyen Quang Trung

1. Full name: Nguyen Quang Trung                                 2. Sex: Male

3. Date of birth: 10th November, 1978                            4. Place of birth: Ha Noi

5. Admission decision number: 3451/SĐH Dated: 26th November 2010 by the President of VNU, Ha Noi.

6. Changes in academic process: No

7. Official thesis title: A spectrogram based approach for speech perception

8. Major: Information Technology                                               9. Code: 62 48 01 01

10. Supervisors: Associate Professor Dr. Bui The Duy

11. Summary of the new findings of the thesis:

- Proposed an approach that combines the LNBNN classifier with SIFT-SPEECH features for the speech perception problem. The approach allows new training samples to be added after the training phase without retraining, which saves training time and suits large data sets. Another advantage is that the feature vectors do not need to be quantized, so the quality of the input features is not diminished, which helps improve classification quality. The experiments show that the proposed approach performs well in a speech perception system. Classifying speech signals with the combination of LNBNN and SIFT features gives better results than combining LNBNN with other features.

- Proposed an approach for speech perception that uses a deep network model, a convolutional neural network operating on the power spectrogram and frequency spectrogram of the speech signal, to map speech signals to given concepts.

- Proposed an approach that directly learns the relationship between a speech signal and an image without knowing the meaning of either, and proposed a model of speech perception that simulates the speech perception process in the associative cortex by learning the mapping from a speech signal to an image signal, using a convolutional neural network to map the frequency spectrogram of the speech to the image. After being trained, the model can recall an image for a new input speech signal.

- Proposed a method to reduce the size of SIFT features before LNBNN classification while preserving most of the accuracy. Each component of a SIFT feature is quantized to a binary value based on its median. This reduces the memory needed to store a SIFT feature from 128 bytes to 16 bytes, an 8-fold reduction in storage, and significantly reduces running time in the classification phase.

- Proposed an implementation of LNBNN on the Hadoop framework. This removes the limit on computational capability by having a cluster of machines work together, and it also shortens processing time because more than one networked computer works on the given task.
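
The median-based quantization of SIFT features mentioned in the list above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the medians are computed per dimension over a set of training descriptors, that a component maps to 1 when it is strictly above its median, and that descriptors are stored as 128 unsigned bytes. The function names (fit_medians, binarize, hamming) and the random data are hypothetical and serve only to show the 128-byte to 16-byte arithmetic.

```python
import numpy as np


def fit_medians(descriptors: np.ndarray) -> np.ndarray:
    """Per-dimension medians over the training descriptors, shape (128,).

    Assumption: medians are taken per dimension over the training set.
    """
    return np.median(descriptors, axis=0)


def binarize(descriptors: np.ndarray, medians: np.ndarray) -> np.ndarray:
    """Map each component to 1 if it is above its median, else 0,
    then pack 8 bits into each byte: 128 components -> 16 bytes."""
    bits = (descriptors > medians).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


# Hypothetical stand-in for real SIFT descriptors: 1000 vectors of 128 bytes.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(1000, 128), dtype=np.uint8)

medians = fit_medians(train)
codes = binarize(train, medians)

print(train[0].nbytes, codes[0].nbytes)  # 128 16
```

With packed codes, a nearest-neighbour search can compare descriptors by Hamming distance instead of Euclidean distance, which is one plausible source of the shorter classification time reported above; whether the thesis uses this distance is an assumption here.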

12. Practical applicability, if any: The main findings of this thesis may serve as important references for robot communication as well as robot control.

13. Further research directions, if any:

- Building a relevant data corpus that is large enough to enhance the accuracy of the speech perception model through learning the relationship between the speech signal and other signals.

- Research to improve the speech perception model so that, after training, it can recall (synthesize) the image corresponding to an input speech signal, as well as recall (synthesize) the speech signal from an image signal.

- Adding information that simulates other human sensory organs to the speech perception model.

- Research on improving speech feature extraction to improve the quality of the speech perception model.

- Research on applying the speech perception model to the field of robot control.

14. Thesis-related publications:

[1] Quang Trung, Nguyễn; Thế Duy, Bùi; Thị Châu, Ma; (2015), An Image based approach for speech perception, 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science, Springer, 208-213.

[2] Quang Trung, Nguyen; The Duy, Bui; (2016), Speech classification using SIFT features on spectrogram images, Vietnam Journal of Computer Science, 3(4), 247-257.

[3] The Duy, Bui; Quang Trung, Nguyen; (2016), Speech classification by using binary quantized SIFT features of signal spectrogram images, 3rd National Foundation for Science and Technology Development Conference on Information and Computer Science, IEEE.

[4] Quang Trung, Nguyen; The Duy, Bui; (2016), MapReduce based for speech classification, SoICT '16: Proceedings of the Seventh Symposium on Information and Communication Technology, ACM.

[5] The Duy, Bui; Quang Trung, Nguyen; (2016), Learning relationship between speech and image, The 8th International Conference on Knowledge and Systems Engineering (KSE) 2016, IEEE, 103-108.

[6] Quang Trung, Nguyen; The Duy, Bui; (2018), Speech perception based on mapping speech to image by using convolution neural network, The 5th NAFOSTED Conference on Information and Computer Science, NICS 2018, IEEE.

 

 VNU UET