Volume 11 Number 12 (Dec. 2016)
JSW 2016 Vol.11(12): 1207-1223 ISSN: 1796-217X
doi: 10.17706/jsw.11.12.1207-1223

Automatic Linking of Short Arabic Texts to Wikipedia Articles

Fatoom Fayad¹*, Iyad AlAgha²
¹ Computer Center, Palestine Technical College-Deir El-Balah, Gaza Strip, Palestine.
² Faculty of Information Technology, The Islamic University of Gaza, Gaza Strip, Palestine.

Abstract—Given the enormous amount of unstructured text available on the Web, there is a growing need to make these texts more discoverable and accessible. One proposed solution is to annotate texts with information extracted from background knowledge. Wikipedia, the free encyclopedia, has recently been exploited as background knowledge for annotating text with complementary information. Given any piece of text, the main challenge is to identify the most relevant information in Wikipedia with minimal effort and time. While Wikipedia-based annotation has mainly targeted the English and Latin versions of Wikipedia, little effort has been devoted to annotating Arabic text using the Arabic version of Wikipedia. In addition, short texts pose further challenges because the statistical and machine learning techniques commonly used with long texts cannot be applied to them. This work proposes an approach for automatically linking short Arabic texts to Wikipedia articles. It reports on several challenges associated with designing and implementing the linking approach, including processing Wikipedia's enormous content, mapping texts to Wikipedia articles, disambiguating articles, and achieving time efficiency. The proposed approach was tested on a dataset of 100 short texts gathered from online Arabic articles, and the annotations it generated were compared with those produced by two human subjects. The approach achieved 71.79% accuracy, 74.70% average precision, and 82.63% average recall. A thorough analysis and discussion of the evaluation results are also presented, covering the approach's limitations and strengths as well as recommendations for future improvements.
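The abstract reports average precision and recall against human annotations but does not state here how the averages are taken. The sketch below assumes a per-text (macro) average, in which each text's predicted set of Wikipedia article titles is compared with a gold set derived from the human annotations. The function names, the empty-set convention, and the sample article titles are hypothetical, and the sketch does not reproduce the paper's accuracy measure or its linking method.

```python
from typing import Dict, Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one text's predicted article links against its gold links."""
    hits = len(predicted & gold)
    # Assumed convention: an empty predicted (or gold) set scores 0.0 rather than raising.
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def macro_average(
    predictions: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Average per-text precision and recall over every text in the gold set."""
    if not gold:
        raise ValueError("gold set is empty")
    total_p = total_r = 0.0
    for text_id, gold_links in gold.items():
        # A text with no predictions is scored against an empty predicted set.
        p, r = precision_recall(predictions.get(text_id, set()), gold_links)
        total_p += p
        total_r += r
    return total_p / len(gold), total_r / len(gold)


if __name__ == "__main__":
    # Hypothetical Arabic Wikipedia article titles for two short texts.
    gold = {"t1": {"غزة", "فلسطين"}, "t2": {"القدس"}}
    predicted = {"t1": {"غزة", "مصر"}, "t2": {"القدس", "الأردن"}}
    p, r = macro_average(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.75
```

Averaging per text weights every short text equally regardless of how many links it contains; pooling all links into a single count (a micro average) would instead favor texts with more links, so the two figures can differ.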

Index Terms—Arabic, annotation, entity linking, short text, Wikipedia.


Cite: Fatoom Fayad, Iyad AlAgha, "Automatic Linking of Short Arabic Texts to Wikipedia Articles," Journal of Software vol. 11, no. 12, pp. 1207-1223, 2016.
