PODS Invited Talks
Keynote
Incomplete Data: What Went Wrong, and How to Fix It
Leonid Libkin (University of Edinburgh)
Abstract
Incomplete data is ubiquitous and poses more problems than ever before. The more data we accumulate, and the more widespread tools for integrating and exchanging data become, the more instances of incompleteness we face. And yet the subject is poorly handled by both practice and theory. Many queries for which students get full marks in their undergraduate courses will not work correctly in the presence of incomplete data, yet these ways of evaluating queries are cast in stone by the SQL standard. We have many theoretical results on handling incomplete data, but they are, by and large, about establishing high complexity bounds, and thus are often dismissed by practitioners. Even worse, we have a basic theoretical notion of what it means to answer queries over incomplete data, and yet this is not at all what practical systems do.
Is there a way out of this predicament? Can we have a theory of incompleteness that will appeal to theoreticians and make practitioners realize that commercial DBMSs often produce paradoxical answers? Can we make such a theory applicable, i.e., implementable on top of existing DBMSs that are very good at fast query evaluation? And can we make it useful for applications such as data integration and handling inconsistency? The talk is about raising these issues, providing some answers, and outlining problems that still need to be solved.
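One concrete instance of the paradoxical answers mentioned above: under the SQL standard's three-valued logic, a row with a NULL fails even a tautological condition. A minimal sketch using Python's built-in sqlite3 module (the table, column, and values are invented for illustration):

```python
import sqlite3

# A table with one unknown (NULL) value standing in for incomplete data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# In two-valued logic, "x = 1 OR x <> 1" is a tautology, so every row
# should qualify.  Under SQL's three-valued logic, both comparisons
# evaluate to UNKNOWN for the NULL row, which is silently dropped.
rows = conn.execute("SELECT x FROM t WHERE x = 1 OR x <> 1").fetchall()
print(rows)  # the row with NULL x is missing from the answer
```

The certain-answer semantics studied in the theory literature would, by contrast, treat the NULL row as satisfying this condition under every possible value of x.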
Bio
Leonid Libkin is Professor of Foundations of Data Management in the
School of Informatics at the University of Edinburgh. He was
previously professor at the University of Toronto and a member of
research staff at Bell Laboratories. He received his PhD from the
University of Pennsylvania in 1994. His main research interests are
in the areas of data management and applications of logic in computer
science. He has written five books and over 180 technical papers. He
was the recipient of a Marie Curie Chair Award from the EU in 2006 and
four best paper awards. He has chaired several program committees,
including PODS and ICDT, and was the conference chair of the 2010
Federated Logic Conference. He is an ACM Fellow and a Fellow of the
Royal Society of Edinburgh.
Tutorial 1
Model-Data Ecosystems: Challenges, Tools, and Trends
Peter J. Haas (IBM Almaden Research Center)
Abstract
In the past few years, research around (big) data management has begun to intertwine with research around deep predictive modeling and simulation. There is an increasing recognition that observed data must be combined with simulated data to support the deep what-if analysis that is needed for robust decision making under uncertainty. Simulation models of large, complex systems (traffic, biology, population health and safety) both consume and produce massive amounts of data, compounding the challenges of traditional information management. This talk will survey some interesting new problems, mathematical tools, and future directions in this emerging research area. Tentative topics include (i) pushing stochastic simulation into the database, (ii) simulation as a tool for data integration, (iii) new methods for massive-scale time-series transformations between models, (iv) moving from query optimization to simulation-run optimization, and (v) exploiting user control of simulated data.
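A toy sketch of the kind of what-if analysis the abstract alludes to, combining observed data with simulated draws (the demand model, numbers, and cost parameters are all invented for illustration; the tutorial covers far richer settings):

```python
import random
import statistics

random.seed(42)

# Observed historical daily demand (hypothetical data).
observed = [102, 95, 110, 98, 105, 99, 103]
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

def simulated_profit(capacity, n_runs=10_000):
    """What-if: estimated expected profit at a given capacity, assuming
    demand is roughly normal with parameters fit to the observed data."""
    profits = []
    for _ in range(n_runs):
        demand = random.gauss(mu, sigma)
        sold = min(demand, capacity)
        profits.append(5.0 * sold - 1.0 * capacity)  # unit margin, unit cost
    return statistics.mean(profits)

# Compare candidate capacities under uncertainty, not a single point forecast.
best = max([90, 100, 110, 120], key=simulated_profit)
```

Even this trivial example produces (and consumes) far more simulated data than observed data, hinting at why simulation-run optimization and database support for stochastic simulation matter at scale.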
Bio
Peter J. Haas has been a Research Staff Member at the IBM Almaden
Research Center since 1987, where he has pursued research at the
interface of information management, applied probability, statistics,
and computer simulation. He has contributed to IBM products such as DB2
UDB and Netezza, as well as to the ISO SQL standards for database
sampling and analytics. He is also a Consulting Professor in the
Department of Management Science and Engineering at Stanford University,
teaching and pursuing research in stochastic modeling and simulation. He
is an IBM Master Inventor, an ACM Fellow, and a past president of the
INFORMS Simulation Society (I-Sim). He has received a number of awards,
including an ACM SIGMOD 10-year Best Paper award, an I-Sim Outstanding
Simulation Publication Award, and an IBM Research Outstanding Technical
Achievement Award. He has served on the editorial boards of the VLDB
Journal, Operations Research, and ACM Transactions on Modeling and
Computer Simulation.
Tutorial 2
Database Principles in Information Extraction
Benny Kimelfeld (LogicBlox)
Abstract
Populating a relational schema from textual content, a problem commonly known as Information Extraction, is pervasive in contemporary computational challenges associated with Big Data. In this tutorial, I will give an overview of the algorithmic concepts and techniques used for solving Information Extraction tasks. I will also describe some of the declarative frameworks that provide abstractions and infrastructure for programming extractors. Finally, I will highlight opportunities for impact through principles of data management, illustrate these opportunities through recent work, and propose directions for future research.
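A minimal sketch of the core task, populating a relation from text with a rule-based extractor (the schema, input text, and pattern are invented for illustration; real extraction systems are far more sophisticated):

```python
import re

# Hypothetical input text and a target relation Person(name, email).
text = """You can reach Alice Smith at alice@example.org for PODS questions.
Bob Jones (bob@example.com) chairs the tutorial sessions."""

# A simple rule: a capitalized first/last name followed, within a short
# window of non-digit characters, by an e-mail address.
pattern = re.compile(
    r"([A-Z][a-z]+ [A-Z][a-z]+)\D{0,15}?([\w.+-]+@[\w-]+\.[\w.]+)"
)

# Each match becomes a tuple of the Person relation.
person = [m.groups() for m in pattern.finditer(text)]
print(person)
# → [('Alice Smith', 'alice@example.org'), ('Bob Jones', 'bob@example.com')]
```

Declarative frameworks of the kind surveyed in the tutorial let programmers state such rules at a higher level and leave their evaluation over large corpora to the system.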
Bio
After receiving his Ph.D. in Computer Science from The Hebrew
University of Jerusalem, Benny spent five years at IBM Research –
Almaden, first as a postdoctoral scholar in the Computer Science
Principles and Methodologies (Theory) Department, and then as a
research staff member in the Search and Analytics Department. Since
2014, Benny has been a Computer Scientist at LogicBlox. Benny’s
research spans a spectrum of both foundational and systems aspects of
data management, such as uncertain (probabilistic) databases,
information retrieval over data with structure, view updates,
semistructured data, graph mining, and infrastructure for text
analytics.