- Keynote 1: Microsoft Azure SQL Database Service (Tomas Talius and Surajit Chaudhuri)
- Keynote 2: An Introduction to Amazon Aurora (Sailesh Krishnamurthy)
- Keynote 3: Democratizing Data Science in the Cloud (Bill Howe)
Abstract: Azure SQL Database is a Microsoft Azure platform service hosting SQL databases for external and internal customers. It addresses infrastructure, database system provisioning and database monitoring needs while allowing customers to focus on logical-level database programming. We have recently made significant changes to the service to improve its compatibility, functionality, scale and reliability by introducing Azure SQL DB v12. In this talk we will give a short introduction to the architecture of Azure SQL Database and describe how we changed the service in-flight. We will describe our experiences from running a large-scale cloud database service, including the importance of providing predictable performance levels backed by CPU and IO governance, developed in collaboration with Microsoft Research. We will also briefly touch on how we are taking the service forward, including ongoing joint work with Microsoft Research.
Biography: Tomas Talius is one of the founding members of the Azure database team at Microsoft. He has been working on SQL Server engine extensions to adapt it to data-center environments for the last ten years. Tomas has co-authored multiple patents on various database and cloud technologies. He holds a B.Sc. degree in Computer Science from Vilnius University, Lithuania.
Biography: Surajit Chaudhuri is a Distinguished Scientist at Microsoft Research and leads the Data Management, Exploration and Mining group. As a Deputy Managing Director of MSR Redmond Lab, he also has oversight of the Distributed Systems, Networking, Security, Programming Languages and Software Engineering groups. He serves on the Senior Leadership Team of the Executive Vice President of Microsoft's Cloud and Enterprise division. His current areas of interest are enterprise data analytics, data discovery, self-manageability and cloud database services. Working with his colleagues in Microsoft Research, he helped incorporate the Index Tuning Wizard (and subsequently Database Engine Tuning Advisor) and data cleaning technology into Microsoft SQL Server. Surajit is an ACM Fellow and a recipient of the ACM SIGMOD Edgar F. Codd Innovations Award, the ACM SIGMOD Contributions Award, a VLDB 10-Year Best Paper Award, and an IEEE Data Engineering Influential Paper Award. Surajit received his Ph.D. from Stanford University in 1992.
Abstract: In this talk I will provide an architectural overview of Amazon Aurora, a new cloud-native relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Aurora delivers up to five times the throughput of standard MySQL running on the same hardware. Aurora is delivered as a managed service (via Amazon RDS) that handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection, and repair. Aurora increases MySQL performance and availability by tightly integrating the database engine with an SSD-backed virtualized storage layer purpose-built for database workloads. Aurora's storage is fault-tolerant and self-healing. Disk failures are repaired in the background without loss of database availability. Aurora is designed to automatically detect database crashes and restart without the need for crash recovery or to rebuild the database cache. In addition, Aurora will automatically fail over to one of up to 15 read replicas.
Biography: Sailesh Krishnamurthy is a Senior Engineering Manager at Amazon Web Services (AWS) where he leads engineering for the Amazon Aurora database kernel. Prior to AWS, Sailesh was at Cisco Systems via the acquisition of Truviso, a real-time streaming data analytics software company that he co-founded to commercialize his prior academic research. At Truviso, he built and managed the initial engineering, services and support teams and was also the original developer of the core platform. Sailesh is an authority in the field of data management with over a dozen published academic papers and several issued U.S. patents. He investigated the technical ideas at the heart of Truviso's products as part of his doctoral research on stream query processing, earning a Ph.D. in Computer Science from UC Berkeley in 2006. Prior to graduate work at Berkeley he worked at IBM on core database products and at Netscape on a Java virtual machine implementation. Sailesh has a Master's degree in Computer Science from Purdue University and a Bachelor's degree in Electrical Engineering from the Birla Institute of Technology and Science, Pilani, India.
Abstract: Data science remains a high-touch, human-intensive exercise in all contexts. There has been a "Cambrian explosion" of data systems proposed and evaluated in the last decade aimed at improving productivity, performance, or both, but the perception that an elite team of data scientists is required to operate them persists. At the UW eScience Institute and in the UW Database Group, we're studying ways to reduce the complexity of this landscape through cloud services, aiming to democratize access to advanced data management, analytics, and visualization capabilities across all fields and across all levels of expertise. I'll present some recent findings from a multi-year deployment of a database-as-a-service system called SQLShare, a system designed to eliminate prohibitive setup and usage costs associated with databases (configuration, schema design, ingest, etc.) and also serve as an "instrument" to help us better understand new workloads and design new services to accommodate them. We find that complex queries over weakly structured, short-lived datasets are the norm, challenging the design of both conventional database systems and large-scale dataflow systems. I'll describe VizDeck, a system that recommends visualizations based on statistical properties of the dataset, and organizes the candidates using a card game metaphor to afford dashboard creation and also help us collect interaction data to drive the models. I'll also describe some ongoing research in the context of the Myria project, which aims to provide common interfaces across "polystore" environments consisting of multiple systems with diverse data models and capabilities. I'll show some of the common services we are building over polystore environments, including query and optimization services, as well as a visualization and query monitoring system we built called Perfopticon that allows non-experts to debug performance problems in scale-out dataflow systems.
Biography: Bill Howe is the Associate Director of the UW eScience Institute and an Affiliate Associate Professor in Computer Science & Engineering. His research interests are in data management, curation, analytics, and visualization in the sciences. Howe played a leadership role in the Data Science Environment program at UW through a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley. With support from the MacArthur Foundation and Microsoft, Howe leads UW's participation in the national MetroLab Network focused on smart cities and data-intensive urban science. He also led the creation of the UW Data Science Masters Degree and serves as its inaugural Program Director and Faculty Chair. He has received two Jim Gray Seed Grant awards from Microsoft Research for work on managing environmental data, has had two papers selected for VLDB Journal's "Best of Conference" issues (2004 and 2010), and co-authored what are currently the most-cited papers from both VLDB 2010 and SIGMOD 2012. Howe serves on the program and organizing committees for a number of conferences in the area of databases and scientific data management, developed a first MOOC on data science that attracted over 200,000 students across two offerings, and founded UW's Data Science for Social Good program. He has a Ph.D. in Computer Science from Portland State University and a Bachelor's degree in Industrial & Systems Engineering from Georgia Tech.