Lincen Yang
Postdoc Researcher in Interpretable Machine Learning & Data Mining (Email: ✉️)
Leiden University, The Netherlands
Welcome to my personal website!
I am interested in fundamental methodology research that aims to develop human-understandable & trustworthy machine learning models, with a focus on decision trees, probabilistic rules, histogram density estimation, and the Minimum Description Length (MDL) principle.
I collaborate with scientists and domain experts from various application fields, including health care, chip making, and environmental science, to apply the developed methods & algorithms to use cases and to answer real-world 'why' questions by discovering joint effects of feature variables.
During my PhD in Leiden I was supervised by Dr. Matthijs van Leeuwen.
(2025.03) I will organize the Human-Centered Data Mining (HuMine 2025) workshop at ECML-PKDD in Porto as one of the workshop chairs! Thanks to all our co-organizers for their efforts in getting the proposal accepted!
(2025.03) Our survey paper about Diffusion Models for Tabular Data is on arXiv! Thanks to our first author Zhong Li and the other co-authors!
(2024.12) I presented my work on CDTree at NeurIPS 2024 in Vancouver! Glad to have met so many nice researchers!
(2024.11) I visited the group of Prof. Jilles Vreeken at CISPA, Saarbrücken, Germany as a visiting Postdoc for seven weeks!
Yang, L & van Leeuwen, M, Conditional Density Estimation with Histogram Trees. NeurIPS 2024.
Yang, L, Baratchi, M, & van Leeuwen, M, Unsupervised Discretization by Two-dimensional MDL-based Histogram, Machine Learning, Springer, 2023
Yang, L & van Leeuwen, M, Truly Unordered Probabilistic Rule Sets for Multi-class Classification, ECML-PKDD 2022
*Marx, A, *Yang, L & van Leeuwen, M, Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multi-Dimensional Adaptive Histograms. In: Proceedings of the SIAM Conference on Data Mining 2021, SDM'21 (*contributed equally).
Yang, L & van Leeuwen, M, Human-guided Rule Learning for ICU Readmission Risk Analysis. In: Proceedings of the Workshop on AI and Data Science for Healthcare (AIDSH) at KDD 2024, 2024.
Yang, L, Opdam, T & van Leeuwen, M, Histogram-based Probabilistic Rule Lists for Numeric Targets. In: Proceedings of the International Workshop on Knowledge Discovery in Inductive Databases (KDID 2022) at ECML-PKDD 2022, 2022.
Yang, L & van Leeuwen, M, Probabilistic Rule Sets Ready for Interactive Machine Learning. In: AAAI22-Workshop on Interactive Machine Learning, 2022
ECML-PKDD Program Committee 2024
Invited talk on 23 February 2023 at the ECDA Insights Social AI seminar, Erasmus Centre for Data Analytics, Erasmus University Rotterdam, The Netherlands
Invited journal reviewer: Data Mining and Knowledge Discovery Journal, Intelligent Data Analysis Journal
Invited conference reviewer: KDD 2021, ICLR 2023
The first dedicated decision tree method for non-parametric conditional density estimation.
Enhancing the explainability of (probabilistic) rule set methods by making them truly unordered: a principled way of handling 'overlaps' of rules.
A measure-theoretic approach for defining entropy for continuous-discrete mixtures and a theoretically consistent estimator for CMI based on multi-dimensional adaptive histograms.
Student Thesis Projects:
Interactive Feature Engineering for Rule-based Models with the Minimum Description Length Principle, Tim Opdam (daily supervisor, 2024 - now), Master’s Thesis.
Discount Factors for MDL-based Model Selection for RIPPER, Niccolo Maria (daily supervisor, 2023 - 2024), Master’s Thesis.
Evaluating Time Cost of Sequence Patterns in Industrial Process (in collaboration with ASML), Chang Liu (daily advisor, 2021 - 2022), Master’s Thesis.
Subgroup Discovery for Numeric Targets with MDL-based Histograms, Tim Opdam (daily advisor, 2022), Bachelor’s Thesis.
Can a human-computer hybrid outperform the LTM tiling algorithm? Floyd Remmerswaal (advisor, 2020 - 2021), Bachelor’s Thesis.
Teaching:
Information-theoretic data mining, Tutorial session & Seminar, Fall 2019, 2021
Statistics for Computer Scientists, Tutorial session, Spring 2020, 2021, 2022 and Fall 2022