SCIENTIFIC ARTICLES
Recognizing Vietnamese online handwritten separated characters

The Vietnamese alphabet is based on the Latin alphabet with the addition of nine accent marks, or diacritics: four of them create additional sounds, and the other five indicate the tone of each word. Because Vietnamese is a tonal language that uses tone to distinguish words, recognizing diacritics is an important part of recognizing Vietnamese words. However, in written form, diacritics are much smaller than the characters, which makes them very hard to recognize. Previous work on Vietnamese character recognition often pre-processes the input with a graph-based approach that tries to separate the main characters from their diacritics by determining connected regions at the pixel level. This approach, however, works well only when the input contains only characters with separable diacritics, for example, scanned images of printed documents. We propose in this paper a robust method to recognize online Vietnamese characters with diacritics. Using the cosine transformation with appropriate sampling algorithms, we represent the multiple strokes of a character together in a single set of features. This set of features is then used as the input to a well-designed machine-learning-based system. We have tested our system on a combination of Vietnamese characters with diacritics and Section 1c (isolated characters) of the Unipen data set, and have obtained very competitive results. © 2008 IEEE.
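
The abstract does not give the details of the sampling algorithm or the exact transform, so the following is only a minimal sketch of the general idea, under stated assumptions: each stroke is resampled to a fixed number of points by arc length, the strokes are concatenated in writing order, and the first few DCT-II coefficients of the x and y sequences form one fixed-length feature vector. The function names (`resample_stroke`, `dct2`, `character_features`) and the parameters `points_per_stroke` and `n_coeffs` are hypothetical and are not taken from the paper.

```python
import numpy as np


def resample_stroke(points, n):
    """Resample one stroke to n points equally spaced along its arc length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate(([0.0], np.cumsum(seg)))
    if cum[-1] == 0.0:
        # Degenerate stroke (a single dot): repeat the point.
        return np.repeat(pts[:1], n, axis=0)
    targets = np.linspace(0.0, cum[-1], n)
    x = np.interp(targets, cum, pts[:, 0])
    y = np.interp(targets, cum, pts[:, 1])
    return np.stack([x, y], axis=1)


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    x = np.asarray(signal, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    basis = np.cos(np.pi / N * (n + 0.5) * k)
    return basis @ x


def character_features(strokes, points_per_stroke=32, n_coeffs=10):
    """Map all strokes of one character to a single fixed-length vector.

    Assumption (not from the paper): strokes are concatenated in writing
    order, scaled to a unit box, and only low-order coefficients are kept.
    """
    sampled = [resample_stroke(s, points_per_stroke) for s in strokes]
    seq = np.concatenate(sampled, axis=0)
    seq = seq - seq.min(axis=0)          # translate to the origin
    scale = seq.max()
    if scale > 0:
        seq = seq / scale                # uniform scale, keeps aspect ratio
    fx = dct2(seq[:, 0])[:n_coeffs]
    fy = dct2(seq[:, 1])[:n_coeffs]
    return np.concatenate([fx, fy])


# Usage: a made-up base stroke plus a small accent stroke above it.
body = [(0, 0), (1, 2), (2, 0), (2, 2)]
accent = [(1.0, 3.0), (1.5, 3.5)]
features = character_features([body, accent])
```

Because the diacritic stroke is transformed together with the base stroke, no pixel-level separation of the two is required; the resulting vector could be passed to any standard classifier.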


 Duy K.N., The D.B.
  Keywords: Artificial intelligence; Information technology; Learning systems; Linguistics; Mathematical transformations; Connected regions; Data sets; Graph-based; International conferences; Language processing; Machine-learning; Pixel level; Printed documents; Process inputs; Sampling algorithms; Tonal languages; Web information; Technology