Source: School of Software Engineering

East China Normal University Trustworthy Computing Forum Lecture Series

Source: School of Software Engineering, East China Normal University. Published: 2013-06-05.

【Talk 1】

Title: Scalable, Low-Latency Data Analytics and its Applications

Speaker: Prof. Yanlei Diao

Time: Friday, June 7, 13:30 - 15:00

Venue: Room 201, Mathematics Building, Zhongbei Campus

Abstract:
An integral part of many data-intensive applications is the need to collect and analyze enormous data sets, such as click streams, search logs, and sensor streams, to derive answers and insights with low latency. Concurrently, new programming models and architectures have been developed for large-scale cluster computing, exemplified by recent MapReduce systems. However, these systems are designed for batch processing and require the data set to be fully loaded into the cluster before analytical queries can run, which significantly delays query answers.
In this talk, I present the design of a scalable, low-latency analytics platform, called Scalla, that fundamentally transforms the existing cluster computing paradigm into an incremental parallel processing paradigm, providing the combined benefits of massive parallelism, incremental answers, and I/O efficiency. Our technical contributions include replacing an existing, popular mechanism for partitioned parallelism with a purely hash-based mechanism, and using dynamic frequency analysis to enable in-memory processing for most of the data. I will also examine two application scenarios: click stream analysis, which was used in our evaluation, and genomic data analysis, a new project that leverages Scalla for massive-scale genomic data processing and analysis.
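As a rough illustration of the general idea of hash-based, incremental aggregation (not Scalla's actual design), the sketch below routes each key to a partition by hashing it, with no sorting step, and keeps a running in-memory count per key so that a partial answer can be read after any batch rather than only after all data is loaded. The class `HashPartitionedCounter` and its methods are hypothetical names, and the dynamic frequency analysis mentioned in the abstract is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class HashPartitionedCounter:
    """Hypothetical sketch: hash-partitioned, incremental per-key counting."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # One in-memory hash table of running counts per partition (reducer).
        self.partitions: List[Dict[str, int]] = [
            defaultdict(int) for _ in range(num_partitions)
        ]

    def partition_of(self, key: str) -> int:
        # Route a key to a partition by hashing it; no sort-merge is needed.
        return hash(key) % self.num_partitions

    def consume(self, batch: Iterable[Tuple[str, int]]) -> None:
        # Update running aggregates as each batch arrives, instead of
        # waiting for the whole data set to be loaded.
        for key, value in batch:
            self.partitions[self.partition_of(key)][key] += value

    def snapshot(self) -> Dict[str, int]:
        # Incremental (early) answer: merge the current per-partition state.
        # Each key lives in exactly one partition, so update() cannot clash.
        result: Dict[str, int] = {}
        for table in self.partitions:
            result.update(table)
        return result


if __name__ == "__main__":
    counter = HashPartitionedCounter(num_partitions=4)
    counter.consume([("/home", 1), ("/search", 1)])
    print(counter.snapshot())  # partial answer after the first batch
    counter.consume([("/home", 1)])
    print(counter.snapshot()["/home"])  # 2
```

Hashing keeps each key's state in a single partition, so answers can be refreshed at any time by reading the tables; a sort-based shuffle, by contrast, must see a partition's input before it can emit sorted groups.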

Speaker Bio:
Yanlei Diao is currently a faculty member in Computer Science at the University of Massachusetts Amherst. Her research interests are in information architectures and data management systems, with a focus on large-scale data analysis, data streams, uncertain data management, and flash memory databases. She received her PhD in Computer Science from the University of California, Berkeley in 2005, her M.S. in Computer Science from the Hong Kong University of Science and Technology in 2000, and her B.S. in Computer Science from Fudan University in 1998.
Yanlei Diao was a recipient of the CRA-W Borg Early Career Award, the NSF CAREER Award, and the IBM Scalable Innovation Faculty Award, as well as a finalist for the Microsoft Research New Faculty Fellowship. She spoke at the Distinguished Faculty Lecture Series at the University of Texas at Austin. Her PhD dissertation, “Query Processing for Large-Scale XML Message Brokering,” won the 2006 ACM SIGMOD Dissertation Award Honorable Mention. She is an associate editor of PVLDB 2013 and has served on the organizing committees of SIGMOD, CIDR, DMSN, the New Researcher Symposium, and the New England Database Summit. She has served on the program committees of numerous international conferences and workshops.

【Talk 2】

Title: Teaching the Data Processing Elephant to Dance using Stream Processing

Speaker: Prof. Michael Franklin

Time: Friday, June 7, 15:00 - 16:30

Venue: Room 201, Mathematics Building, Zhongbei Campus

Abstract:
Data stream query processing has long been proposed as a way to perform real-time and event-driven processing. It is less appreciated, however, that it can also be used in emerging "Big Data" analysis scenarios where massive amounts of data are continually appended to the existing database. In this talk I'll give an overview of the "Stream-Relational" processing system built by Truviso (now part of Cisco Prime Analytics), and then focus on how it handles a key problem in real-world deployments: namely, late-arriving and out-of-order data. The trick is to exploit data-parallel execution and views, two things that database systems have supported for many years.
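To make the abstract's idea more concrete, here is a loose, hypothetical sketch (not Truviso's implementation) of combining data-parallel partitions with a view for windowed counts. Tuples whose window is still open go to an "on-time" partition; tuples that arrive after their window has been passed go to a separate "late" partition with its own partial aggregates, and a view-like `view()` method combines the partials on read. The class name, the integer event times, the fixed window size, and the watermark rule are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict


class WindowedCounter:
    """Hypothetical sketch: order-independent windowed counts using separate
    partitions for on-time and late tuples, merged by a view on read."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.watermark = 0  # highest event time seen so far (assumed >= 0)
        # Two data-parallel partitions, each holding partial counts per window.
        self.on_time: Dict[int, int] = defaultdict(int)
        self.late: Dict[int, int] = defaultdict(int)

    def window_of(self, event_time: int) -> int:
        return event_time // self.window_size

    def insert(self, event_time: int) -> None:
        window = self.window_of(event_time)
        if window < self.window_of(self.watermark):
            # The stream has already moved past this window: record the tuple
            # in the late partition instead of reopening on-time state.
            self.late[window] += 1
        else:
            self.on_time[window] += 1
            self.watermark = max(self.watermark, event_time)

    def view(self) -> Dict[int, int]:
        # Analogous to a SQL view over a UNION ALL of both partitions,
        # followed by SUM(count) GROUP BY window.
        merged: Dict[int, int] = defaultdict(int)
        for partial in (self.on_time, self.late):
            for window, count in partial.items():
                merged[window] += count
        return dict(merged)


if __name__ == "__main__":
    wc = WindowedCounter(window_size=10)
    for t in [1, 5, 12, 25, 3, 14]:  # 3 and 14 arrive late
        wc.insert(t)
    print(wc.view())  # {0: 3, 1: 2, 2: 1}
```

Because partial counts are additive, the view returns the same totals regardless of arrival order; only the split between the two partitions changes.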

Speaker Bio:

Michael Franklin is the Thomas M. Siebel Professor of Computer Science and Director of the Algorithms, Machines, and People Laboratory (AMPLab) at UC Berkeley. His research focuses on new approaches for data management and data analysis, including data stream processing and continuous analytics. AMPLab is a cross-disciplinary collaboration addressing the Big Data analytics challenge through the development of a new software stack integrating Machine Learning, Cloud and Cluster Computing, and Crowdsourcing. AMPLab is sponsored in part by 21 leading global technology companies, including founding sponsors Amazon Web Services, Google, and SAP, and by a 5-year NSF CISE Expeditions in Computing award, which was announced as part of the White House Big Data Research initiative in 2012. Previously, he was a Founder and CTO of Truviso, Inc., a real-time data analytics company acquired by Cisco Systems. He is an ACM Fellow and a winner of the ACM SIGMOD Test of Time Award. Recent recognition includes the Best Paper Awards at ICDE 2013 and NSDI 2012, a "Best of VLDB 2012" selection, Best Demo awards at SIGMOD 2012 and VLDB 2011, and an Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He is serving as a committee member on the U.S. National Academy of Sciences study on Analysis of Massive Data and on an NRC/TRB Committee on the long-term stewardship of driver safety data, as well as on a number of advisory boards for startup companies and research centers. He received his Ph.D. in Computer Science from the University of Wisconsin-Madison in 1993.