Extended aggregations for databases with referential integrity issues

dc.contributor.author	Ordonez, C
dc.contributor.author	García-García, J
dc.date.accessioned	2011-01-22T10:25:45Z
dc.date.available	2011-01-22T10:25:45Z
dc.date.issued	2010
dc.identifier.issn	0169-023X
dc.identifier.uri	http://hdl.handle.net/11154/3149
dc.description.abstract	Querying inconsistent databases remains a broad and difficult problem. In this work, we study how to improve aggregations computed on databases with referential errors in the context of database integration, where each source database has different tables, columns with similar content across multiple databases, but different referential integrity constraints. Thus, a query in an integrated database may involve tables and columns with referential integrity errors. In a data warehouse, even though the ETL processes fix referential integrity errors, this is generally done by inserting "dummy" records into the dimension tables corresponding to such invalid foreign keys, thereby artificially enforcing referential integrity. When two tables are joined and aggregations are computed, rows with an invalid or null foreign key value are skipped, effectively eliminating potentially valuable information. With that motivation in mind, we extend SQL aggregate functions computed over tables with referential integrity issues to return complete answer sets in the sense that no row is excluded. We associate with each referenced key in the dimension table a probability that invalid or null foreign keys refer to it. Our main idea is to compute aggregations over joined tables including rows with invalid or null references by distributing their contribution to aggregation totals, based on probabilities computed over correct foreign keys. Experiments with real and synthetic databases evaluate the usefulness, accuracy and performance of our extended aggregations. (C) 2009 Elsevier B.V. All rights reserved.	en_US
dc.language.iso	en	en_US
dc.title	Extended aggregations for databases with referential integrity issues	en_US
dc.type	Article	en_US
dc.identifier.idprometeo	324
dc.identifier.doi	10.1016/j.datak.2009.08.008
dc.source.novolpages	69(1):73-95
dc.subject.wos	Computer Science, Artificial Intelligence
dc.subject.wos	Computer Science, Information Systems
dc.description.index	WoS: SCI, SSCI or AHCI
dc.subject.keywords	Aggregate functions
dc.subject.keywords	Data quality
dc.subject.keywords	Inconsistent databases
dc.subject.keywords	Imprecision
dc.subject.keywords	SQL
dc.relation.journal	Data & Knowledge Engineering
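
The main idea stated in the abstract (distributing the contribution of rows with invalid or null foreign keys across the referenced keys, using probabilities computed over correct foreign keys) can be illustrated with a minimal Python sketch. This is only one possible reading of the abstract, not the paper's actual definitions or SQL implementation: the function name `extended_sum`, the in-memory row representation, and the choice of relative frequency among valid references as the probability are all assumptions made for illustration.

```python
from collections import defaultdict


def extended_sum(fact_rows, valid_keys):
    """Sum a measure per referenced key without dropping orphaned rows.

    fact_rows:  iterable of (foreign_key, measure); the key may be None
                or a value missing from valid_keys (an invalid reference).
    valid_keys: set of keys present in the referenced (dimension) table.

    Hypothetical helper: the probability for each key is assumed to be its
    share of the rows that carry a valid foreign key.
    """
    valid_totals = defaultdict(float)  # regular SUM per valid key
    valid_counts = defaultdict(int)    # number of correct references per key
    orphan_total = 0.0                 # measure carried by invalid/null keys

    for fk, measure in fact_rows:
        if fk in valid_keys:
            valid_totals[fk] += measure
            valid_counts[fk] += 1
        else:
            orphan_total += measure

    n_valid = sum(valid_counts.values())
    result = {}
    for key in valid_keys:
        if n_valid > 0:
            # Share proportional to the key's frequency among valid rows.
            share = orphan_total * valid_counts[key] / n_valid
        else:
            # Assumption: with no valid references, fall back to uniform shares.
            share = orphan_total / len(valid_keys)
        result[key] = valid_totals[key] + share
    return result
```

For instance, with fact rows `("A", 10)`, `("A", 20)`, `("B", 30)` and two orphaned rows `(None, 8)` and `("Z", 4)`, key A holds two thirds of the valid references and key B one third, so the orphaned total of 12 is split as 8 and 4, giving 38 for A and 34 for B. The grand total of 72 matches the sum over all rows, which is the sense in which no row is excluded.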