Identifying the Dominant Language of Web Page Using Supervised N-grams

Bibliographic Details
Main Authors: Ng, Choon-Ching; Siau-Chuin, Liew; Wan Muhammad Syahrir, Wan Hussin; Tutut, Herawan
Format: Article
Published: Conference Publishing Services (CPS) 2013
Online Access: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6516378
http://umpir.ump.edu.my/6869/1/dentifying_the_Dominant_Language_of_Web_Page_Using_Supervised_N-grams.pdf
Description
Summary: Natural language processing is an emerging technology in the linguistics industry and an aid to human-computer interaction in computer science. Language identification, in turn, is a form of pattern recognition that helps identify the predefined language of a web page and predict the unknown language of a given text. Written texts are built from common features such as characters, words, and n-grams, and these features are distinctive to each language. The experimental results show that the supervised n-gram approach identifies languages accurately and outperforms the distance measurement on Arabic-script web pages.
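The summary contrasts a supervised n-gram classifier with a distance measurement but does not specify either. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes character trigrams as features, uses add-one smoothed log-probability scoring as a stand-in for the "supervised n-gram" classifier, and uses the Cavnar and Trenkle out-of-place rank distance as a stand-in for the distance measurement. All function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    # Pad with spaces so word starts and ends yield their own n-grams.
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples: dict[str, str], n: int = 3) -> dict[str, Counter]:
    # Hypothetical training step: one n-gram frequency profile per labelled language.
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def supervised_score(profile: Counter, text: str, n: int = 3) -> float:
    # Log-probability of the text's n-grams under one language profile.
    # Add-one smoothing (one extra vocabulary slot for unseen n-grams)
    # keeps a single unseen n-gram from driving the score to -infinity.
    total = sum(profile.values())
    vocab = len(profile) + 1
    return sum(math.log((profile[g] + 1) / (total + vocab)) for g in char_ngrams(text, n))


def out_of_place_distance(profile: Counter, text: str, n: int = 3, top: int = 300) -> int:
    # Cavnar-Trenkle style distance: compare frequency ranks of n-grams;
    # an n-gram missing from the language profile costs the maximum penalty `top`.
    rank = {g: i for i, (g, _) in enumerate(profile.most_common(top))}
    doc_ranked = [g for g, _ in Counter(char_ngrams(text, n)).most_common(top)]
    return sum(abs(rank[g] - i) if g in rank else top for i, g in enumerate(doc_ranked))


def identify(profiles: dict[str, Counter], text: str, n: int = 3) -> str:
    # Supervised decision: pick the language with the highest log-probability.
    return max(profiles, key=lambda lang: supervised_score(profiles[lang], text, n))


def identify_by_distance(profiles: dict[str, Counter], text: str, n: int = 3) -> str:
    # Distance-based decision: pick the language whose profile is closest.
    return min(profiles, key=lambda lang: out_of_place_distance(profiles[lang], text, n))


if __name__ == "__main__":
    # Toy training strings for illustration only; not data from the paper.
    profiles = train_profiles({
        "en": "the quick brown fox jumps over the lazy dog and the cat",
        "ms": "saya suka makan nasi dan ikan di rumah bersama keluarga saya",
    })
    print(identify(profiles, "the dog and the fox"))
    print(identify_by_distance(profiles, "saya makan ikan"))
```

With such tiny profiles both decision rules agree. Differences between a probabilistic score and a rank distance tend to show up only on short or mixed-language inputs, which is the kind of comparison the summary reports for Arabic-script pages.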