Lincen Yang
Postdoc Researcher in Interpretable Machine Learning & Data Mining (Email: ✉️)
Leiden University, The Netherlands
Welcome to my personal website!
I am interested in fundamental methodology research that aims to develop human-understandable & trustworthy machine learning models, with a focus on decision trees, probabilistic rules, histogram density estimation, and the Minimum Description Length (MDL) principle.
I collaborate with scientists and domain experts from various application fields, including health care, chip making, and environmental science, to apply the developed methods & algorithms to use cases and to answer real-world 'why' questions by discovering joint effects of feature variables.
During my PhD in Leiden I was supervised by Dr. Matthijs van Leeuwen.
(2025.09) I am looking for a Master's student to work on a thesis project "Human-guided Subgroup Discovery for Chip Microarchitecture Design", in collaboration with CNRS (Montpellier, France). We aim for a KDD / ECML-PKDD (ADS track) submission next year.
(2025.09) Our paper titled "Interpretable Machine Learning for Identifying ICU Readmission Risk in Subgroups with Probabilistic Rules" has been accepted to the Journal of the American Medical Informatics Association (JAMIA), a medical informatics journal. Thanks to our collaborators Siri van der Meijden (Healthplus.ai) & Sesmu Arbous (Leiden University Medical Center); we look forward to further collaborations on trustworthy AI in healthcare!
(2025.09) Our paper titled "Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching" has been accepted to NeurIPS 2025! Thanks to all our collaborators Zhong Li, Qi Huang, Yuxuan Zhu (and more)!
(2025.09) I will attend the farewell lecture of my academic grandfather Prof. Arno Siebes in Utrecht.
(2025.09) Our Human-Centered Data Mining (HuMine 2025) workshop at ECML-PKDD has finished. Thanks to all our participants for the excellent presentations and interesting discussions!
(2025.06) I will attend SMiLe 2025 in Sint-Michielsgestel (NL). Looking forward to meeting a lot of people there!
(2025.03) I will organize the Human-Centered Data Mining (HuMine 2025) workshop at ECML-PKDD in Porto as one of the workshop chairs! Thanks to all our co-organizers for their efforts in getting the proposal accepted!
(2025.03) Our survey paper about Diffusion Models for Tabular Data is on arXiv! Thanks to our first author Zhong Li and the other co-authors!
(2024.12) I presented my work about CDTree at NeurIPS 2024 in Vancouver! Glad to have met so many nice researchers!
(2024.11) I visited the group of Prof. Jilles Vreeken at CISPA, Saarbrücken, Germany as a visiting Postdoc for seven weeks!
Yang, L & van Leeuwen, M, Conditional Density Estimation with Histogram Trees. NeurIPS 2024.
Yang, L, Baratchi, M, & van Leeuwen, M, Unsupervised Discretization by Two-dimensional MDL-based Histogram. Machine Learning, Springer, 2023.
Yang, L & van Leeuwen, M, Truly Unordered Probabilistic Rule Sets for Multi-class Classification. ECML-PKDD 2022.
*Marx, A, *Yang, L & van Leeuwen, M, Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multi-Dimensional Adaptive Histograms. In: Proceedings of the SIAM Conference on Data Mining 2021, SDM'21 (*contributed equally).
Yang, L & van Leeuwen, M, Human-guided Rule Learning for ICU Readmission Risk Analysis. In: Proceedings of the Workshop on AI and Data Science for Healthcare (AIDSH) at KDD 2024, 2024.
Yang, L, Opdam, T & van Leeuwen, M, Histogram-based Probabilistic Rule Lists for Numeric Targets. In: Proceedings of the International Workshop on Knowledge Discovery in Inductive Databases (KDID 2022) at ECML PKDD 2022, 2022.
Yang, L & van Leeuwen, M, Probabilistic Rule Sets Ready for Interactive Machine Learning. In: AAAI22-Workshop on Interactive Machine Learning, 2022.
ECML-PKDD Program Committee 2024
Invited talk on 23 February 2023 at the ECDA Insights Social AI seminar, Erasmus Centre for Data Analytics, Erasmus University Rotterdam, The Netherlands
Invited journal reviewer: Data Mining and Knowledge Discovery Journal, Intelligent Data Analysis Journal
Invited conference reviewer: KDD 2021, ICLR 2023
The first dedicated decision tree method for non-parametric conditional density estimation.
Enhancing the explainability of (probabilistic) rule set methods by making them truly unordered: a principled way of handling 'overlaps' of rules.
A measure-theoretic approach for defining entropy for continuous-discrete mixtures and a theoretically consistent estimator for CMI based on multi-dimensional adaptive histograms.
Student Thesis Projects:
Interactive Feature Engineering for Rule-based Models with the Minimum Description Length Principle, Tim Opdam (daily supervisor, 2024 - now), Master’s Thesis.
Discount Factors for MDL-based Model Selection for RIPPER, Niccolo Maria (daily supervisor, 2023 - 2024), Master’s Thesis.
Evaluating Time Cost of Sequence Patterns in Industrial Process (in collaboration with ASML), Chang Liu (daily advisor, 2021 - 2022), Master’s Thesis.
Subgroup Discovery for Numeric Targets with MDL-based Histograms, Tim Opdam (daily advisor, 2022 - 2022), Bachelor’s Thesis.
Can a human-computer hybrid outperform the LTM tiling algorithm? Floyd Remmerswaal (advisor, 2020 - 2021), Bachelor’s Thesis.
Teaching:
Information-theoretic data mining, Tutorial session & Seminar, Fall 2019, 2021
Statistics for Computer Scientists, Tutorial session, Spring 2020, 2021, 2022 and Fall 2022