Ciencias,UNAM

Extended aggregations for databases with referential integrity issues

dc.contributor.author Ordonez, C
dc.contributor.author García-García, J
dc.date.accessioned 2011-01-22T10:25:45Z
dc.date.available 2011-01-22T10:25:45Z
dc.date.issued 2010
dc.identifier.issn 0169-023X
dc.identifier.uri http://hdl.handle.net/11154/3149
dc.description.abstract Querying inconsistent databases remains a broad and difficult problem. In this work, we study how to improve aggregations computed on databases with referential errors in the context of database integration, where the source databases have different tables, columns with similar content across multiple databases, but different referential integrity constraints. Thus, a query in an integrated database may involve tables and columns with referential integrity errors. In a data warehouse, even though the ETL processes fix referential integrity errors, this is generally done by inserting "dummy" records into the dimension tables corresponding to such invalid foreign keys, thereby artificially enforcing referential integrity. When two tables are joined and aggregations are computed, rows with an invalid or null foreign key value are skipped, effectively eliminating potentially valuable information. With that motivation in mind, we extend SQL aggregate functions computed over tables with referential integrity issues to return complete answer sets, in the sense that no row is excluded. We associate with each referenced key in the dimension table a probability that invalid or null foreign keys refer to it. Our main idea is to compute aggregations over joined tables, including rows with invalid or null references, by distributing their contribution to aggregation totals based on probabilities computed over correct foreign keys. Experiments with real and synthetic databases evaluate the usefulness, accuracy and performance of our extended aggregations. (C) 2009 Elsevier B.V. All rights reserved. en_US
dc.language.iso en en_US
dc.title Extended aggregations for databases with referential integrity issues en_US
dc.type Article en_US
dc.identifier.idprometeo 324
dc.identifier.doi 10.1016/j.datak.2009.08.008
dc.source.novolpages 69(1):73-95
dc.subject.wos Computer Science, Artificial Intelligence
dc.subject.wos Computer Science, Information Systems
dc.description.index WoS: SCI, SSCI or AHCI
dc.subject.keywords Aggregate functions
dc.subject.keywords Data quality
dc.subject.keywords Inconsistent databases
dc.subject.keywords Imprecision
dc.subject.keywords SQL
dc.relation.journal Data & Knowledge Engineering
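
The idea described in the abstract, distributing the contribution of rows with invalid or null foreign keys across the referenced keys, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `extended_sum`, the input layout, and in particular the choice to estimate each key's probability as its relative frequency among the valid foreign keys are assumptions made only for illustration; the record itself does not state how the probabilities are computed.

```python
from collections import defaultdict


def extended_sum(fact_rows, dimension_keys):
    """Sum a measure per dimension key without dropping orphan rows.

    fact_rows: iterable of (foreign_key, measure) pairs; foreign_key may be
        None or a value missing from dimension_keys (an invalid reference).
    dimension_keys: set of keys that exist in the referenced table.

    Assumption (hypothetical, for illustration): the probability that an
    invalid or null foreign key refers to a given key is that key's share
    of the rows holding a valid foreign key.
    """
    valid_totals = defaultdict(float)
    valid_counts = defaultdict(int)
    orphan_total = 0.0

    for fk, value in fact_rows:
        if fk is not None and fk in dimension_keys:
            valid_totals[fk] += value
            valid_counts[fk] += 1
        else:
            # An ordinary join would skip this row entirely.
            orphan_total += value

    n_valid = sum(valid_counts.values())
    result = {}
    for key in dimension_keys:
        # With no valid rows there is nothing to estimate from, so the
        # orphan total is left undistributed in this sketch.
        share = orphan_total * valid_counts[key] / n_valid if n_valid else 0.0
        result[key] = valid_totals[key] + share
    return result
```

Called with the rows `("A", 10)`, `("A", 20)`, `("B", 30)`, `(None, 9)`, `("Z", 6)` and the dimension keys `{"A", "B"}`, the two orphan rows are split between A and B in proportion to their valid row counts, so the per-key totals add up to the sum of all measures rather than only those of the joinable rows.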
