Volume 9 Number 10 (Oct. 2014)
JSW 2014 Vol.9(10): 2721-2731 ISSN: 1796-217X
doi: 10.4304/jsw.9.10.2721-2731

Approximate String Similarity Join using Hashing Techniques under Edit Distance Constraints

Peisen Yuan¹, Haoyun Wang¹, Jianghua Che¹, Shougang Ren¹, Huanliang Xu¹, Dechang Pi²

¹College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
²College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China


Abstract—The string similarity join, which finds similar string pairs across string sets, has received extensive attention in the database and information retrieval fields. Existing work on this problem usually adopts a filter-and-refine framework, and various filtering methods have been proposed within it. Recently, tree-based index techniques have been employed effectively to evaluate string similarity joins under edit distance constraints; however, they do not scale well to large distance thresholds. In this paper, we propose an efficient framework for approximate string similarity join under string edit distance constraints, based on Min-Hashing locality-sensitive hashing and trie-based index techniques. We show that our framework offers a flexible trade-off between efficiency and performance. An empirical study using real datasets demonstrates that our framework is more efficient and scales better.

Index Terms—Approximate String Similarity Join, Locality Sensitive Hashing, Min-Hashing, String Edit Distance, Trie Join
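To make the filter-and-refine idea concrete, here is a minimal sketch of a MinHash LSH similarity join under an edit distance threshold. It is not the authors' framework. It assumes padded q-gram sets as the MinHash input, a banded LSH index whose `bands` and `rows` parameters are hypothetical, and a length filter followed by exact Levenshtein verification as the refine step. The trie-based index is omitted. All function names (`qgrams`, `minhash`, `lsh_join`) are illustrative. Like any LSH scheme, the filter can miss true pairs, but verification guarantees that every reported pair satisfies the threshold.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

_PRIME = (1 << 61) - 1  # Mersenne prime modulus for universal hashing

def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution / match
        prev = cur
    return prev[-1]

def qgrams(s: str, q: int = 2) -> set[str]:
    """Overlapping q-grams; padding gives even short strings some grams."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def make_hashes(k: int, seed: int = 42):
    """k random (a, b) pairs for hash functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(k)]

def minhash(grams: set[str], hashes) -> tuple:
    """MinHash signature: per hash function, the minimum over all grams."""
    codes = [zlib.crc32(g.encode("utf-8")) for g in grams] or [0]
    return tuple(min((a * c + b) % _PRIME for c in codes) for a, b in hashes)

def lsh_join(strings, tau, q=2, bands=8, rows=2, seed=42):
    """Filter with banded MinHash LSH, then refine with exact edit distance."""
    hashes = make_hashes(bands * rows, seed)
    buckets = defaultdict(set)
    for idx, s in enumerate(strings):
        sig = minhash(qgrams(s, q), hashes)
        for b in range(bands):
            # Strings sharing any whole band of the signature become candidates.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))
    result = []
    for i, j in sorted(candidates):
        s, t = strings[i], strings[j]
        # Length filter: edit distance is at least the length difference.
        if abs(len(s) - len(t)) <= tau and edit_distance(s, t) <= tau:
            result.append((s, t))
    return result
```

A pair whose q-gram sets have Jaccard similarity J becomes a candidate with probability 1 - (1 - J^rows)^bands. Raising `bands` or lowering `rows` therefore finds more true pairs at the cost of verifying more candidates. This is the kind of efficiency trade-off the abstract refers to, although the paper's own tuning may differ.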


Cite: Peisen Yuan, Haoyun Wang, Jianghua Che, Shougang Ren, Huanliang Xu, Dechang Pi, "Approximate String Similarity Join using Hashing Techniques under Edit Distance Constraints," Journal of Software, vol. 9, no. 10, pp. 2721-2731, 2014.
