DanaC: Workshop on Data analytics in the Cloud
Data nowadays comes from various sources including log files, transactional applications, the Web, social media and many others. A large part of this data is generated and transmitted in real time and at a large scale. To create value out of these data sets, business analysts and scientists employ advanced data analytics techniques combining, among others, traditional BI, text analytics, machine learning, data mining, and natural language processing. Tackling the complexity of both the data itself and its analysis remains an open challenge.
Cloud computing has emerged as a cost-effective and elastic computing paradigm that facilitates large-scale data storage and analysis. Cloud infrastructures can provide adaptive resource provisioning with very little initial investment while scaling to massive numbers of commodity computing nodes. Data analytics, being very resource intensive, has the potential to be a significant cloud application, and to constitute a large fraction of the workload of modern data centers. Designing the infrastructures, systems and data analytics techniques in the new cloud computing environments remains an open challenge.
DanaC 2014 at a Glance
The workshop will take place in Snowbird, Utah, USA on Sunday 22nd June 2014 and is colocated with SIGMOD/PODS 2014.
The workshop is going to feature:
- A Keynote speech by Chris Johnson titled "Large-Scale Visual Data Analysis"
- Presentations of 6 accepted papers
- Demonstrations of 6 major data analysis systems
Check the workshop's program for details.
"Large-Scale Visual Data Analysis"
Chris Johnson, Director of the Scientific Computing and Imaging Institute (SCI) at the University of Utah
Modern high performance computers have speeds measured in petaflops and handle data set sizes measured in terabytes and petabytes. Although these machines offer enormous potential for solving very large-scale realistic computational problems, their effectiveness will hinge upon the ability of human experts to interact with their simulation results and extract useful information. One of the greatest scientific challenges of the 21st century is to effectively understand and make use of the vast amount of information being produced. Visual data analysis will be among our most important tools in helping to understand such large-scale information.
Our research at the Scientific Computing and Imaging (SCI) Institute at the University of Utah has focused on innovative, scalable techniques for large-scale visual data analysis. In this talk, I will present state-of-the-art visualization techniques, including scalable visualization algorithms and software, data management, cluster/cloud-based visualization methods and innovative visualization techniques applied to problems in computational science, engineering, and medicine.
Chris Johnson is the founding director of the Scientific Computing and Imaging (SCI) Institute at the University of Utah where he is a Distinguished Professor of Computer Science and holds faculty appointments in the Departments of Physics and Bioengineering. His research interests are in the areas of scientific computing and scientific visualization. Dr. Johnson founded the SCI research group in 1992, which has since grown to become the SCI Institute employing over 200 faculty, staff and students. Professor Johnson serves on several international journal editorial boards, as well as on advisory boards to several national and international research centers. Professor Johnson was awarded a Young Investigator's (FIRST) Award from the NIH in 1992, the NSF National Young Investigator (NYI) Award in 1994, and the NSF Presidential Faculty Fellow (PFF) award from President Clinton in 1995. In 1996 he received a DOE Computational Science Award and in 1997 received the Par Excellence Award from the University of Utah Alumni Association and the Presidential Teaching Scholar Award. In 1999, Professor Johnson was awarded the Governor's Medal for Science and Technology from Governor Michael Leavitt. In 2003 he received the Distinguished Professor Award from the University of Utah. In 2004 he was elected a Fellow of the American Institute for Medical and Biological Engineering, in 2005 he was elected a Fellow of the American Association for the Advancement of Science, and in 2009 he was elected a Fellow of the Society for Industrial and Applied Mathematics (SIAM) and received the Utah Cyber Pioneer Award. In 2010 Professor Johnson received the Rosenblatt Award from the University of Utah and the IEEE Visualization Career Award. In 2012 Professor Johnson received the IEEE IPDPS Charles Babbage Award and in 2013 Professor Johnson received the IEEE Sidney Fernbach Award. In 2014, Professor Johnson was elected an IEEE Fellow.
- Parasol: An Architecture for Cross-Cloud Federated Graph Querying
Michael D. Lieberman-The Johns Hopkins University Applied Physics Laboratory; Sutanay Choudhury-Pacific Northwest National Laboratory; Marisa Hughes-The Johns Hopkins University Applied Physics Laboratory; Dennis Patrone-The Johns Hopkins University Applied Physics Laboratory; Robert T. Hider, Jr.-The Johns Hopkins University Applied Physics Laboratory; Christine D. Piatko-The Johns Hopkins University Applied Physics Laboratory; Matthew Chapman-The Johns Hopkins University Applied Physics Laboratory; J. P. Marple-The Johns Hopkins University Applied Physics Laboratory; David Silberberg-The Johns Hopkins University Applied Physics Laboratory
- PAXQuery: A Massively Parallel XQuery Processor
Jesus Camacho-Rodriguez-Universite Paris-Sud and Inria; Dario Colazzo-Universite Paris-Dauphine; Ioana Manolescu-Inria & Universite Paris-Sud, France
- Introducing Data Connectivity in a Big Data Web
Damianos Chatziantoniou-Athens University of Economics; Florents Tselai-Intelen, Inc.
- Big-Data Management Use-Case: A Cloud Service for Creating and Analyzing Galactic Merger Trees
Sarah Loebman-University of Michigan; Jennifer Ortiz-University of Washington; Lee Lee Choo-University of Washington; Laurel Orr-University of Washington; Lauren Anderson-University of Washington; Dan Halperin-University of Washington; Magdalena Balazinska-University of Washington; Thomas Quinn-University of Washington; Fabio Governato-University of Washington
- Exploring Cloud Opportunities from an Array Database Perspective
Alex Dumitru-Jacobs University Bremen; Vlad Merticariu-Jacobs University Bremen; Peter Baumann-Jacobs University Bremen
- "Big Metadata": The Need for Principled Metadata Management in Big Data Ecosystems
Ken Smith-MITRE; Len Seligman-MITRE; Arnon Rosenthal-MITRE; Chris Kurcz-MITRE; Mary Greer-MITRE; Catherine Macheret-MITRE; Michael Sexton-MITRE; Adric Eckstein-MITRE
Data Analysis Systems Demonstrations
In addition, the workshop will feature presentations and demos of six leading systems for large-scale data analysis developed by members of the database community:
- Myria - Big Data as a Service
University of Washington, USA
- Hyper - A Hybrid OLTP & OLAP High Performance DBMS
TU Munich, Germany
- Stratosphere - Next generation Big Data Analytics Platform
TU Berlin, Germany
- Spark/BDAS - The Berkeley Data Analytics Stack
University of California, Berkeley, USA
- AsterixDB - Big Data Management System
UC Irvine, UC Riverside, and UC San Diego - USA
- REEF - The Retainable Evaluator Execution Framework
08:30 - 10:00 Session 1: Keynote Speech & "Gong show"
- 08:30 - 10:00 Keynote Talk: Large-Scale Visual Data Analysis, Speaker: Chris Johnson, Director of the Scientific Computing and Imaging Institute and Distinguished Professor at School of Computing, University of Utah (www.sci.utah.edu)
- 10:00 - 10:30 "Gong show": 5-minute teasers for all papers accepted at the workshop (in the same order as they appear in the program)
10:00 - 10:30 Coffee break
10:30 - 12:00 Session 2: Research Paper presentations
- Research Paper: "Parasol: An Architecture for Cross-Cloud Federated Graph Querying", Michael D. Lieberman, Sutanay Choudhury, Marisa Hughes, Dennis Patrone, Robert T. Hider, Jr., Christine D. Piatko, Matthew Chapman, J. P. Marple, David Silberberg
- Research Paper: "Introducing Data Connectivity in a Big Data Web", Damianos Chatziantoniou, Florents Tselai
- Research Paper: "Big Metadata: The Need for Principled Metadata Management in Big Data Ecosystems", Ken Smith, Len Seligman, Arnon Rosenthal, Chris Kurcz, Mary Greer, Catherine Macheret, Michael Sexton, Adric Eckstein
- Research Paper: "Exploring Cloud Opportunities from an Array Database Perspective", Alex Dumitru, Vlad Merticariu, Peter Baumann
12:00 - 13:30 Lunch Break
13:30 - 15:00 Session 3: Research Paper & Systems Presentations
- Research Paper: "PAXQuery: A Massively Parallel XQuery Processor", Jesus Camacho-Rodriguez, Dario Colazzo, Ioana Manolescu
- Research Paper: "Big-Data Management Use-Case: A Cloud Service for Creating and Analyzing Galactic Merger Trees", Sarah Loebman, Jennifer Ortiz, Lee Lee Choo, Laurel Orr, Lauren Anderson, Daniel Halperin, Magdalena Balazinska, Thomas Quinn, Fabio Governato
- System Presentation: Myria (http://myria.cs.washington.edu/), presented by Daniel Halperin (University of Washington, USA)
- System Presentation: Hyper (http://hyper-db.de/), presented by Tobias Muehlbauer (TU Munich, Germany)
15:00 - 15:30 Coffee Break
15:30 - 17:00 Session 4: Systems Presentations
- System Presentation: Stratosphere (http://stratosphere.eu/), presented by Sebastian Schelter (TU Berlin, Germany)
- System Presentation: Spark/BDAS (https://amplab.cs.berkeley.edu/software/), presented by Evan Sparks (AMPLab, University of California, Berkeley, USA)
- System Presentation: AsterixDB (http://asterixdb.ics.uci.edu/), presented by Till Westmann (Oracle Labs, USA)
- System Presentation: REEF (http://www.reef-project.org/), presented by Konstantinos Karanasos (Microsoft Research - CISL)
Topics of Interest
Areas of particular interest for the workshop include (but are not limited to):
- Parallel execution and optimization
- Scalable storage and indexing
- Workload management
- Infrastructures for cloud computing
- Scalable machine learning
- Frameworks for parallel computing
- Industrial experiences and use cases
- Benchmarking, tuning, and testing
- Data science and analytics
- Privacy and security in the cloud
- Economic models for data
- Data management and analytics as a service
All papers should be submitted as PDF files and formatted using the double-column ACM format (templates are available from the ACM website).
The workshop solicits:
- research papers
- vision papers
- use cases
- controversial topics
- industrial experience
All papers should clearly mark their type (research/vision/industrial, etc.) in the paper title and should not exceed 4 pages.
Papers should be submitted using the conference management system: https://cmt.research.microsoft.com/DANAC2014
- Notification of acceptance: May 5, 2014
- Final papers due: May 27, 2014
- Workshop: June 22, 2014
- Till Westmann (Oracle Labs, USA)
- Jens Dittrich (Saarland University, Germany)
- Jorge-Arnulfo Quiané-Ruiz (QCRI, Qatar)
- Eric Sedlar (Oracle Labs, USA)
- Russell Sears (Microsoft Research, USA)
- Donald Kossmann (ETH Zurich, Switzerland)
- Frank McSherry (Microsoft Research, USA)
- Alkis Polyzotis (University of California - Santa Cruz, USA)
- Hakan Hacigumus (NEC Labs, USA)
- Sihem Amer-Yahia (CNRS, France)
- Chris Re (Stanford University, USA)
- Ant Rowstron (Microsoft Research, Cambridge, UK)
- Konstantinos Karanasos (IBM Almaden, USA)
- Stratos Idreos (Harvard University, USA)
- Ioana Manolescu (INRIA Saclay & Université Paris-Sud, France)
- Spyros Blanas (Ohio State University, USA)