Time: Wednesday, October 29, 2025, 14:00 – 15:00
Venue: Room A1514, Science Building, Zhongbei Campus
Speaker: Yijuan Hu (胡懿娟), Professor, Peking University
Host: Huijuan Ma (马慧娟), Associate Professor, East China Normal University
Abstract:
In the era of big data, many sequencing-based molecular datasets are compositional, meaning they are expressed as percentages. Microbiome data is the best-known example, but single-cell subtype abundance data is also compositional in nature. These datasets are often sparse, containing many zeros because of the large number of features and limited sequencing depth. Compositional analysis typically assumes that only a small proportion of taxa are differentially abundant, while the ratios of relative abundances among the remaining taxa stay stable. Most existing methods rely on log-transformed data; however, log transformation becomes problematic when zero counts are pervasive, often resulting in poor control of the false discovery rate (FDR). To address these challenges, we propose Logistic Compositional Analysis (LOCOM), a robust logistic-regression-based approach for compositional data analysis that eliminates the need for pseudocounts. LOCOM uses permutation-based inference to account for overdispersion and small sample sizes. It also offers an asymptotic approach that improves computational efficiency for large-sample datasets. To mitigate batch effects, which commonly arise from systematic differences in sequencing depth in large-sample studies, LOCOM appropriately weights samples. Our simulations demonstrate that LOCOM consistently maintains FDR control while achieving significantly improved sensitivity compared to existing methods.
About the speaker:
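Yijuan Hu is a Boya Distinguished Professor at Peking University and has been selected for a national-level talent program. She holds a joint appointment at the Beijing International Center for Mathematical Research and the Department of Biostatistics at the Peking University Health Science Center. She received her bachelor's degree from the School of Mathematical Sciences at Peking University (2005) and her Ph.D. in biostatistics from the University of North Carolina at Chapel Hill (2011). At Emory University she served successively as assistant professor, associate professor, and professor, and she returned to China full-time in July 2024. Her research develops statistical theory and methods for high-dimensional, high-noise omics data in biostatistics, with a particular focus on high-dimensional hypothesis testing, robust inference, and missing or biased data in microbiome and genetic studies. Her representative work has been published in journals including the Journal of the American Statistical Association (JASA), Proceedings of the National Academy of Sciences (PNAS), Microbiome, and the American Journal of Human Genetics (AJHG).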