doi: 10.4304/jsw.7.11.2424-2436
Efficient and Effective Filtering of Duplication Detection in Large Database Applications
Abstract: This paper proposes PC-Filter (PC stands for partition comparison), a robust filtering technique for effective and efficient duplicate record detection in large databases. PC-Filter differs from existing methods in that it uses record partitions for duplicate detection. It operates in three steps. First, it sorts the whole database and splits the sorted database into a number of record partitions. Second, it generates the Partition Comparison Graph (PCG) through fast partition pruning. Finally, it detects duplicate records through internal and external partition comparison based on the PCG. Four closure properties, derived from the triangle inequality of record similarity and used as heuristics, make the filter highly efficient. The partition size is chosen so that the time complexity of PC-Filter is optimized. Equipping existing detection methods with PC-Filter addresses the major problems from which those methods suffer.
Cite: Ji Zhang, "Efficient and Effective Filtering of Duplication Detection in Large Database Applications," Journal of Software, vol. 7, no. 11, pp. 2424-2436, 2012.
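The three steps named in the abstract (sort and partition, build a partition comparison graph, then compare within and across partitions) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the pruning rule, the similarity measure, or the four closure properties, so the sketch assumes token-set Jaccard similarity and a simple pruning rule that drops partition pairs with no shared token. The names `tokens`, `jaccard`, and `pc_filter_sketch`, along with the default partition size and threshold, are hypothetical.

```python
from itertools import combinations


def tokens(record):
    """Lower-cased word set of a record (an assumed comparison key)."""
    return frozenset(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pc_filter_sketch(records, partition_size=3, threshold=0.6):
    """Return candidate duplicate pairs; threshold must be > 0."""
    # Step 1: sort the records and split them into fixed-size partitions.
    ordered = sorted(records)
    parts = [ordered[i:i + partition_size]
             for i in range(0, len(ordered), partition_size)]

    # Step 2: build a partition comparison graph (PCG). Assumed pruning
    # rule: keep an edge only if two partitions share at least one token.
    # With threshold > 0, partitions with disjoint vocabularies cannot
    # contain a pair whose Jaccard similarity reaches the threshold.
    vocab = [frozenset().union(*(tokens(r) for r in p)) for p in parts]
    pcg = [(i, j) for i, j in combinations(range(len(parts)), 2)
           if vocab[i] & vocab[j]]

    pairs = set()
    # Step 3a: internal comparison inside each partition.
    for p in parts:
        for a, b in combinations(p, 2):
            if jaccard(tokens(a), tokens(b)) >= threshold:
                pairs.add((a, b))
    # Step 3b: external comparison, only along the edges kept in the PCG.
    for i, j in pcg:
        for a in parts[i]:
            for b in parts[j]:
                if jaccard(tokens(a), tokens(b)) >= threshold:
                    pairs.add((a, b))
    return pairs
```

Calling `pc_filter_sketch(records, partition_size=2)` on a small list of strings returns a set of `(record, record)` tuples. Under the stated assumptions the pruning step only skips partition pairs that cannot match, so the result equals that of an all-pairs comparison while performing fewer similarity computations.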