Volume 11 Number 12 (Dec. 2016)
JSW 2016 Vol.11(12): 1207-1223 ISSN: 1796-217X
doi: 10.17706/jsw.11.12.1207-1223

Automatic Linking of Short Arabic Texts to Wikipedia Articles

Fatoom Fayad1*, Iyad AlAgha2

1Computer Center, Palestine Technical College-Deir El-Balah, Gaza Strip, Palestine.
2Faculty of Information Technology, The Islamic University of Gaza, Gaza Strip, Palestine.

Abstract—Given the enormous amount of unstructured text available on the Web, there is a growing need to make this text easier to discover and access. One proposed solution is to annotate texts with information extracted from background knowledge. Wikipedia, the free encyclopedia, has recently been exploited as background knowledge for annotating text with complementary information. Given any piece of text, the main challenge is to determine the most relevant information from Wikipedia with the least effort and time. While Wikipedia-based annotation has mainly targeted the English and Latin versions of Wikipedia, little effort has been devoted to annotating Arabic text using the Arabic version of Wikipedia. In addition, the annotation of short text presents further challenges because the statistical and machine learning techniques commonly used with long text cannot be applied. This work proposes an approach for automatically linking short Arabic texts to Wikipedia articles. It reports on several challenges associated with the design and implementation of the linking approach, including processing Wikipedia's enormous content, mapping texts to Wikipedia articles, article disambiguation, and time efficiency. The proposed approach was tested on a dataset of 100 short texts gathered from online Arabic articles. The annotations generated by the approach were compared with annotations produced by two human subjects. The approach achieved 71.79% accuracy, 74.70% average precision, and 82.63% average recall. A thorough analysis and discussion of the evaluation results is also presented, addressing the limitations and strengths of the approach as well as recommendations for future improvements.

Index Terms—Arabic, annotation, entity linking, short text, Wikipedia.


Cite: Fatoom Fayad, Iyad AlAgha, "Automatic Linking of Short Arabic Texts to Wikipedia Articles," Journal of Software, vol. 11, no. 12, pp. 1207-1223, 2016.

General Information

ISSN: 1796-217X (Online)
Frequency:  Quarterly
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Yoyo Y. Zhou
Abstracting/Indexing: DBLP, EBSCO, CNKI, Google Scholar, ProQuest, INSPEC (IET), ULRICH's Periodicals Directory, WorldCat, etc.
E-mail: jsweditorialoffice@gmail.com