A hybrid semantic search technique for web information retrieval
| Main Author: | |
|---|---|
| Format: | Thesis |
| Published: | 2015 |
| Subjects: | |
| Online Access: | http://eprints.uthm.edu.my/7898/ http://eprints.uthm.edu.my/7898/1/noryusliza_abdullah.pdf |
| Summary: | The vast amount of data emerging on the web is an advantage in terms of availability. However,
the ever-increasing growth of data and information makes finding the right
information a challenge and an urgent task. This creates a need to improve
information retrieval (IR). In Web Information Retrieval (WIR), the
search engine has become the main resource. Current WIR techniques
help in many ways, such as results ranking, categorization, and semantic
searching. Nevertheless, these techniques need to be improved to make the
information retrieved more relevant to the user's expectations. To achieve this
goal, this research proposes a hybrid technique that combines Categorization, Ontology, and User Profiling
concepts through the use of Semantic Web (SW)
technologies. The objectives of this research were to design, implement and compare
an alternative semantic search IR technique and to test its effectiveness in a Cloud Computing
(CC) environment. WordNet, a lexical ontology resource, was used for keyword
categorization because it contains a large body of English-language data, while the UTHM
Ontology (UTHM Onto) supported User Profiling. The similarity between WordNet
and UTHM Onto was computed using a semantic similarity measure. The
proposed Hybrid Search Engine (Hysse) was compared with other
techniques using the Precision effectiveness metric. The ambiguous term Java
(referring to a programming language, a beverage or an island) was used to measure
precision. The MAP of Java Object Oriented Programming Language for Hysse is
93%, WSP 89%, Doctopush 7%, Carrot2 73% and Google 93%. The
MAP of Java Beverage for Hysse is 81%, WSP 76%, Doctopush 9%, Carrot2 4%
and Google 6%. Lastly, the MAP of Java Island for Hysse is 85%, WSP 82%, Doctopush
83%, Carrot2 3% and Google 11%. Hysse was tested in CC using MYRENCloud
and Amazon Elastic Compute Cloud (EC2). A comparison of Hysse with another
technique, Doctopush, in the cloud shows good results, with a difference
of only 14 ms between them. |
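
The summary reports MAP percentages without expanding the abbreviation. Assuming it stands for mean average precision, as is usual in information retrieval, the following minimal sketch shows how such a figure could be computed from ranked result lists. The function names and the relevance judgments are hypothetical and are not taken from the thesis, and the normalisation (dividing by the number of relevant results retrieved) is one common convention that the thesis may not share.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked result list.

    relevance[i] is True when the result at rank i + 1 is relevant.
    Assumption: normalise by the number of relevant results retrieved;
    other definitions divide by all relevant documents in the collection.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank = relevant results so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the average precision over several queries."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


# Hypothetical relevance judgments for two queries about one sense of "Java".
runs = [
    [True, True, False, True],   # relevant results at ranks 1, 2 and 4
    [False, True, True, False],  # relevant results at ranks 2 and 3
]

print(f"MAP = {mean_average_precision(runs):.2%}")
```

With judgments like these, each sense of Java (programming language, beverage, island) would get its own set of queries and its own MAP value per search engine.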