Volume 7 Number 11 (Nov. 2012)
JSW 2012 Vol.7(11): 2424-2436 ISSN: 1796-217X
doi: 10.4304/jsw.7.11.2424-2436

Efficient and Effective Filtering of Duplication Detection in Large Database Applications

Ji Zhang

Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, QLD 4350, Australia

Abstract: This paper proposes a robust filtering technique, called PC-Filter (PC stands for partition comparison), for effective and efficient duplicate record detection in large databases. PC-Filter differs from all existing methods in that it uses record partitions for duplicate detection. It operates in three steps. First, it sorts the whole database and splits the sorted database into a number of record partitions. Second, it generates the Partition Comparison Graph (PCG) by performing fast partition pruning. Finally, it detects duplicate records through internal and external partition comparison based on the PCG. Four closure properties, derived from the triangle inequality of record similarity and used as heuristics, make the filter highly efficient. The partition size is chosen so that the time complexity of PC-Filter is optimized. Equipping existing detection methods with PC-Filter addresses the major problems from which those methods suffer.
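The three steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how the PCG is built or what the four closure properties are, so the sketch swaps in a generic anchor-and-radius pruning rule based on the triangle inequality. The names `edit_distance` and `find_duplicates`, the use of Levenshtein distance as the record metric, and the `partition_size` and `threshold` parameters are all assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; a metric, so the triangle inequality holds."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def find_duplicates(records, partition_size=3, threshold=1):
    """Hypothetical sketch: sort, partition, prune partition pairs, compare."""
    # Step 1: sort the records and split them into fixed-size partitions.
    ordered = sorted(records)
    parts = [ordered[i:i + partition_size]
             for i in range(0, len(ordered), partition_size)]

    # Each partition gets an anchor (its first record) and a radius
    # (the largest distance from the anchor to any member).
    anchors = [p[0] for p in parts]
    radii = [max(edit_distance(p[0], r) for r in p) for p in parts]

    # Step 2: build a comparison graph over partitions. By the triangle
    # inequality, d(x, y) >= d(a_i, a_j) - r_i - r_j for x in partition i
    # and y in partition j, so a pair whose lower bound exceeds the
    # threshold cannot contain a duplicate and is pruned (assumed rule).
    edges = [(i, j) for i, j in combinations(range(len(parts)), 2)
             if edit_distance(anchors[i], anchors[j]) - radii[i] - radii[j]
             <= threshold]

    duplicates = set()
    # Step 3a: internal comparison within each partition.
    for p in parts:
        for x, y in combinations(p, 2):
            if edit_distance(x, y) <= threshold:
                duplicates.add((x, y))
    # Step 3b: external comparison only along surviving graph edges.
    for i, j in edges:
        for x in parts[i]:
            for y in parts[j]:
                if edit_distance(x, y) <= threshold:
                    duplicates.add((x, y))
    return duplicates
```

Because the pruning rule only discards partition pairs whose lower bound already exceeds the threshold, this sketch returns the same pairs as an exhaustive comparison while skipping some of the cross-partition work. How much is skipped depends on the data and the chosen partition size.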

Cite: Ji Zhang, "Efficient and Effective Filtering of Duplication Detection in Large Database Applications," Journal of Software, vol. 7, no. 11, pp. 2424-2436, 2012.
