报告题目:Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal.
报告人:黄棱潇报告时间:06月28日 周一 13:30
报告地点:理科大楼B1002
报告摘要:Given a collection of ��� points in R��� , the goal of the (���, ���)-Clustering problem is to find a subset of ��� “centers” that minimizes the sum of the ���-th powers of the Euclidean distance of each point to the closest center. Special cases of the (���, ���)-Clustering problem include the ���-Median and ���-Means problems. Our main result is a unified two-stage importance sampling framework that constructs an ���-coreset for the (���, ���)-Clustering problem. Compared to the results for (���, ���)-Clustering in [Feldman and Langberg, STOC 2011], our framework saves a ���2��� factor in the coreset size. Compared to the results for (���, ���)-Clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a poly(���) factor in the coreset size and avoids the exp(���/���) term in the construction time. Specifically, our coreset for ���-Median (��� = 1) has smaller size compared to the result in [Sohler and Woodruff, FOCS 2018], saves a ��� factor in the coreset size. Our algorithmic results rely on a new dimension reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. We also provide a size lower bound for a 0.01-coreset for (���, ���)-Clustering, which has a linear dependence of size on ��� and an exponential dependence on ��� that matches our algorithmic results.
报告人简介:Lingxiao Huang is a researcher of computer science at Huawei TCS Lab. Before he was a postdoc of computer science at Yale University from 2019 to 2020 and was a postdoc of computer science at EPFL from 2017 to 2019, after received his Ph.D. in IIIS, Tsinghua University. His current research interest is algorithm design and computational social choice. He is passionate about creating novel algorithms that are motivated by existing practical challenges.