MBZUAI Nexus Speaker Series
Hosted by: Prof. Mladen Kolar
Tool-using LLM agents can be best understood as resource-constrained decision systems. Each run implicitly solves an operations problem: how to allocate scarce budget (tokens, latency, tool-call limits, and verification/judging compute) across planning, execution, recovery, and checking—under uncertainty about tool reliability, user intent, and when to stop. In this talk, I’ll connect modern agent design to classic OR ideas—sequential decision-making, budgeted optimization, scheduling, and robust objectives—and show how this framing leads to systems that are measurably more reliable, not just larger. I’ll walk through a unified set of results across three themes: (1) tool orchestration in realistic multi-tool environments, with evaluation designed to be diagnostic and trajectory-agnostic; (2) open-ended research agents evaluated via structured rubrics that surface systematic failure modes and make iteration scientific; and (3) cost-aware evaluation protocols, where debate/deliberation and budgeted stopping explicitly trade off accuracy against compute to trace a cost–accuracy frontier. Finally, I’ll discuss why small-model proxies (“analogs”) are a practical accelerator for this agenda: they enable faster experimentation on orchestration policies and evaluation designs at a fraction of the cost, while preserving the failure modes that matter. I’ll close with how these ideas translate into ongoing research collaborations with startups, developing deployable agents with explicit budgets, measurable guarantees, and clear reliability trade-offs.
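To make the budgeted-stopping idea above concrete, here is a minimal sketch of an agent loop that keeps spending verification compute only while the expected gain justifies the marginal token cost; the budget constants and the agent/verifier interface (solve, refine, expected_improvement) are hypothetical placeholders, not the system presented in the talk.

# Minimal sketch of budgeted stopping for a tool-using agent (all names and
# constants are hypothetical placeholders, not the system from the talk).
TOKEN_BUDGET = 20_000          # total token budget for the run
VERIFY_COST = 1_500            # rough token cost of one verification pass
MIN_GAIN_PER_TOKEN = 1e-5      # stop once expected gain per token falls below this

def run_with_budget(task, agent, verifier):
    answer, spent = agent.solve(task)                        # initial attempt
    while spent + VERIFY_COST <= TOKEN_BUDGET:
        gain = verifier.expected_improvement(answer)          # e.g., estimated from past runs
        if gain / VERIFY_COST < MIN_GAIN_PER_TOKEN:
            break                                             # another check is not worth its cost
        answer, used = verifier.refine(answer)                # one more deliberation/check pass
        spent += used
    return answer, spent                                      # one point on the cost-accuracy frontier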
Hosted by: Prof. Aziz Khan
Many tasks in biological and medical science can be modeled as pattern recognition tasks, and AI is playing an increasingly important role in them. With the enrichment of single-cell, high-throughput omics data, it is now even possible to build digital virtual cells with advanced AI foundation models. Prof. Xuegong Zhang has been one of the leading researchers in applying AI to cutting-edge pattern recognition tasks in biology and medicine, and in promoting the concept and practice of developing AI virtual cell models. In this seminar, he will provide an overview of both fields based on his own work over the past two decades, and discuss future trends of AI in biology and medicine.
Hosted by: Prof. Marcos Matabuena
Modern digital devices continuously record physiological signals such as heart rate and physical activity, generating rich but complex data that evolve over time and across individuals. This talk introduces flexible statistical frameworks that move beyond modeling averages to capture full outcome distributions and dynamic time patterns. By representing responses through quantile functions and allowing data‐driven transformations of time, the proposed methods provide a unified way to study how entire distributions change with covariates and over the course of daily life. These approaches enable more nuanced questions: not only how a typical heart rate responds to activity, but how variability, extremes, and temporal dynamics differ across individuals and contexts. Applications to continuously monitored wearable data demonstrate how the methods reveal interpretable features of human behavior and physiology, offering powerful tools for digital health research and personalized monitoring.
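As a toy illustration of the quantile-function representation mentioned above (not the speaker's implementation), each subject-day of heart-rate readings can be summarized by its empirical quantile function on a fixed probability grid, so that whole distributions, rather than means, become the regression response:

# Toy example: summarize a subject-day of heart-rate readings by its empirical
# quantile function on a fixed probability grid (data simulated for illustration).
import numpy as np

def quantile_profile(samples, n_grid=101):
    """Empirical quantile function Q(p) evaluated on an equally spaced grid of p."""
    probs = np.linspace(0.0, 1.0, n_grid)
    return np.quantile(np.asarray(samples), probs)

rng = np.random.default_rng(0)
calm_day = rng.normal(loc=70, scale=5, size=1440)      # minute-level readings
active_day = rng.normal(loc=70, scale=15, size=1440)   # same mean, higher variability

Q_calm = quantile_profile(calm_day)
Q_active = quantile_profile(active_day)
# The two profiles agree near the median but diverge in the tails -- the kind of
# distributional feature that mean-based models would miss.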
Hosted by: Eduardo Beltrame
"Cognitive impairment is increasingly recognized as a systemic phenomenon rather than a purely brain-restricted disorder. Across neurodevelopmental conditions, psychiatric disorders, post-infectious syndromes such as long COVID, cancer-related cognitive impairment, and neurodegenerative diseases, peripheral inflammation emerges as a shared and biologically meaningful contributor to cognitive vulnerability. This convergence across diagnostic categories suggests that inflammatory processes act as cross-cutting modifiers of brain function rather than disease-specific epiphenomena. Our results show that inflammatory burden outside the central nervous system is consistently associated with selective cognitive deficits. Importantly, these associations are detectable before overt neurological or psychiatric deterioration, indicating a role in shaping cognitive trajectories rather than merely reflecting established disease. Rather than acting as a nonspecific background factor, peripheral inflammation appears to organize distinct and clinically relevant cognitive phenotypes, with implications for risk stratification, prognosis, and early intervention. This perspective reframes cognitive impairment as a dynamic outcome of systemic brain–body interactions, opening new avenues for prevention-oriented approaches to brain health."
Hosted by: Prof. Chih-Jen Lin
"In physics, phenomena such as light propagation and Newtonian mechanics obey the principle of least action: the true trajectory is a stationary point of the Lagrangian. In our recent work [1], we investigated whether learning, too, follows a least-action principle. We model learning as stationary-action dynamics on information fields. Concretely, we derive classical learning algorithms as stationary points of information-field Lagrangians, recovering Bellman optimality from a reward-based Hamiltonian and Fisher-information–aware updates for estimation. This potentially yields a unifying variational view across reinforcement learning and supervised learning, and suggests optimisers with testable properties. Conceptually, it treats the training of a learning system as the dynamical evolution of a physical system in an abstract information space. Structure is also central to learning, enabling interventional reasoning and scientific understanding. Causality provides a framework for discovering structure from data under the hypothesis that causal mechanisms are independent. In earlier work [2], we formalise independent mechanisms as independent latent variables controlling each mechanism, and show how this perspective extends across effect estimation, counterfactual reasoning, representation learning, and reinforcement learning. Methodologically, in collaboration with Prior Labs, we developed Do-PFN [3], a pre-trained foundation model that performs in-context causal inference. This serves as a promising out-of-the-box tool for practitioners working across diverse scientific domains. References [1] Siyuan Guo and Bernhard Schölkopf. Physics of Learning: A Lagrangian Perspective to Different Learning Paradigms. arXiv preprint arXiv:2509.21049, 2025. [2] Siyuan Guo*, Viktor Tóth*, Bernhard Schölkopf, and Ferenc Huszár. Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data. Advances in Neural Information Processing Systems (NeurIPS), 2023. [3] Jake Robertson*, Arik Reuter*, Siyuan Guo, Noah Hollmann, Frank Hutter, and Bernhard Schölkopf. Do-PFN: In-Context Learning for Causal Effect Estimation. Advances in Neural Information Processing Systems (NeurIPS), 2025. (Spotlight; acceptance rate 3.19%.)"
Hosted by: Prof. Kun Zhang
The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within a predetermined iteration budget. While learning rate schedules fundamentally govern performance across different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic and lacks theoretical foundations. In addition, finding the optimal learning rate schedule requires extensive trial and error, making training inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge this gap by constructing a novel training-budget-aware optimization framework that explicitly accounts for robustness to variations in landscape curvature. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter φ that trades off flexibility against simplicity and eliminates the need for per-network numerical optimization. Moreover, we establish a theoretical connection between φ and the condition number, adding interpretation and justification to our approach, and we prove convergence for different values of φ. We offer practical guidelines for selecting φ based on theoretical analysis and empirical results. Extensive experiments show that UBA consistently surpasses commonly used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.
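The abstract does not give the UBA functional form, so the snippet below is only a hypothetical illustration of what a budget-aware schedule with a single shape parameter phi looks like as an interface; the polynomial decay used here is a placeholder, not the UBA schedule derived in the work.

# Hypothetical sketch of a budget-aware learning-rate schedule with one shape
# parameter phi. The polynomial form below is a placeholder for illustration
# only; it is NOT the UBA schedule derived in the talk.
def budget_aware_lr(step, total_steps, base_lr=1e-3, phi=1.0):
    """Learning rate at `step` within a fixed budget of `total_steps` iterations.
    Smaller phi keeps the rate high for longer; larger phi decays it sooner."""
    progress = min(step / max(total_steps, 1), 1.0)   # fraction of budget consumed
    return base_lr * (1.0 - progress) ** phi          # decays to 0 exactly at the budget

# Compare two settings of phi under the same 10k-iteration budget.
for phi in (0.5, 2.0):
    print(phi, budget_aware_lr(step=5_000, total_steps=10_000, phi=phi))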
Hosted by: Prof. Natasa Przulj
Understanding the deep human past requires analytical frameworks capable of integrating diverse datasets and tracing long-term trajectories of cultural and environmental change. Archaeology—uniquely positioned at the intersection of material culture, ecology, and human behaviour—holds unparalleled potential to address these challenges. This talk presents a suite of pioneering studies in which artificial intelligence, network science, and complexity theory are applied to Eurasian archaeological datasets, offering the most robust quantitative framework to date for modelling cooperation, exchange, and cultural co-evolution. The first part of the talk focuses on the origins of metallurgy in the Balkans between the 6th and 3rd millennia BC, where copper production and circulation first took recognisable regional form. Using trace element and lead isotope analyses from 410 artefacts across c. 80 sites (6200–3200 BC), we apply seven community detection algorithms—including Louvain, Leiden, Spinglass, and Eigenvector methods—to reconstruct prehistoric copper-supply networks. These models reveal stable and meaningful supply communities that correlate strikingly with regional archaeological cultures such as Vinča, KGK VI and Bodrogkeresztúr. By critically evaluating algorithm performance on archaeological compositional data, this case study not only demonstrates the power of network science for reconstructing prehistoric exchange but also challenges the traditional, typology-based concept of “archaeological culture.” It exemplifies how AI and complexity science can rigorously decode patterns of cooperation, resource movement, and social boundaries in the deep past.
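As a schematic example of the kind of analysis described above (with fabricated data rather than the actual artefact dataset), community detection on a weighted artefact-similarity network can be run with off-the-shelf tools such as networkx; the study itself compares seven algorithms, of which Louvain is shown here.

# Schematic example: Louvain community detection on a (fabricated) weighted
# artefact-similarity network. The real study builds such networks from trace
# element and lead isotope data for 410 artefacts and compares seven algorithms.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("artefact_A", "artefact_B", 0.9),   # weights = hypothetical geochemical similarity
    ("artefact_B", "artefact_C", 0.8),
    ("artefact_C", "artefact_A", 0.7),
    ("artefact_D", "artefact_E", 0.9),
    ("artefact_E", "artefact_F", 0.8),
    ("artefact_C", "artefact_D", 0.1),   # weak link between two candidate communities
])

communities = nx.community.louvain_communities(G, weight="weight", seed=42)
print(communities)   # e.g., [{'artefact_A', 'artefact_B', 'artefact_C'}, {'artefact_D', ...}]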
Hosted by: Prof. Zhiqiang Xu
In this talk, I will discuss the development of machine learning for combinatorial optimization, covering general methodology and, in particular, generative models for AI4Opt. I will show how the ideas behind diffusion models can be brought to bear on notoriously hard combinatorial problems. I will also share some forward-looking ideas on future research directions.
Hosted by: Muhammad Haris Khan
We spend a lot of time training a network to recognize a fixed set of object types in a scene. If new object classes must later be added to the recognition engine, should we retrain the network from scratch? Can we instead tweak the network so that it learns the new classes incrementally? Unfortunately, any attempt to learn new concepts incrementally may also lead to forgetting, often catastrophic, of previously learnt concepts. Conversely, can we selectively forget a few concepts when socio-technical considerations require it? In this talk, we shall discuss how some of these objectives can be achieved.
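One widely used recipe for mitigating catastrophic forgetting, shown here only as background and not necessarily the approach covered in the talk, is to distill the frozen old model's predictions while fitting the new classes; a minimal PyTorch-style sketch:

# Distillation-based incremental learning in the spirit of Learning without
# Forgetting: cross-entropy on the new data plus a KL term that keeps the new
# model close to the frozen old model on the previously learnt classes.
# This is a common recipe, not necessarily the method presented in the talk.
import torch.nn.functional as F

def incremental_loss(new_logits, old_logits, targets, n_old_classes,
                     temperature=2.0, distill_weight=1.0):
    ce = F.cross_entropy(new_logits, targets)                       # fit old + new classes
    log_p_new = F.log_softmax(new_logits[:, :n_old_classes] / temperature, dim=1)
    p_old = F.softmax(old_logits[:, :n_old_classes] / temperature, dim=1)
    distill = F.kl_div(log_p_new, p_old, reduction="batchmean") * temperature ** 2
    return ce + distill_weight * distill                            # trade plasticity vs. stability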
Hosted by: Prof. Marcos Matabuena
In recent years, Reinforcement Learning (RL) has gained a prominent position in addressing health-related sequential decision-making problems. In this talk, we will discuss two such sequential decision-making problems: (1) dynamic treatment regimes (DTRs), i.e., clinical decision rules for adapting the type, dosage and timing of treatment according to an individual patient’s characteristics and evolving health status; and (2) just-in-time adaptive interventions (JITAIs), i.e., mobile app-based behavioral nudges in population health. Specifically, we will illustrate the similarities and differences between these two types of RL problems (e.g., offline vs. online RL), common algorithms used in these two settings (e.g., Q-learning vs. Thompson sampling), and real-life case studies.
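To make the Thompson-sampling side concrete, here is a toy Bernoulli-bandit sketch of choosing which mobile nudge to send at each decision point; the setup and response rates are invented for illustration and are not the case studies from the talk.

# Toy Thompson sampling for a JITAI-style problem: at each decision point,
# choose which nudge to send by sampling from Beta posteriors over each
# nudge's (unknown) probability of eliciting the desired behavior.
import numpy as np

rng = np.random.default_rng(0)
true_response_rates = [0.10, 0.25, 0.15]          # unknown to the algorithm
successes = np.ones(3)                            # Beta(1, 1) priors
failures = np.ones(3)

for t in range(1_000):
    sampled = rng.beta(successes, failures)       # one posterior draw per nudge
    arm = int(np.argmax(sampled))                 # send the nudge with the best draw
    reward = rng.random() < true_response_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

print("posterior means:", successes / (successes + failures))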
Hosted by: Prof. Laura Koesten
Machine learning classifiers are increasingly applied to complex tasks such as audio tagging, image labeling, and text classification -- many of which require multi-label classification. Traditional evaluation tools, often limited to single metrics such as accuracy, fall short of providing insight into classifier behavior across multiple labels. To address this, we present MLMC, an interactive visualization tool for evaluating and comparing multi-label classifiers. Based on expert interviews, MLMC supports analysis through instance-, label-, and classifier-level views, offering a scalable, more interpretable alternative. We demonstrate its use across three different domains and describe its core algorithms and user interface. Two pilot studies (N=6 each) provided insight into MLMC's usability and showed improved task accuracy, consistency, and user confidence compared to confusion matrices. Results highlight MLMC's potential as a practical tool for intuitive evaluation of multi-label classifiers, with implications for a broad range of machine learning applications. Our approach follows the Design Study Methodology, which is rooted in Human-Centered Design.
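As a small, purely illustrative example of the label-level statistics a tool like MLMC visualizes (MLMC itself is an interactive system, not this script), scikit-learn can compute one confusion matrix per label:

# Toy illustration of per-label confusion matrices for a multi-label classifier,
# the kind of label-level statistics an interactive tool like MLMC visualizes.
# The data below are fabricated for illustration.
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1]])

# One 2x2 confusion matrix per label: [[TN, FP], [FN, TP]].
per_label = multilabel_confusion_matrix(y_true, y_pred)
for label_idx, cm in enumerate(per_label):
    print(f"label {label_idx}:\n{cm}")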
Hosted by: Prof. Eric Moulines
"Stochastic differential equations (SDEs) provide a flexible framework for modeling time series, dynamical systems, and sequential data. However, learning SDEs from data typically relies on adjoint sensitivity methods, which require repeated simulation, time discretization, and backpropagation through approximate SDE solvers, leading to significant computational overhead and limited scalability. We introduce SDE Matching, a simulation- and discretization-free approach for learning stochastic dynamics directly from data. Building on recent advances in score matching and flow matching for generative modeling, we extend these ideas to the dynamical setting, enabling direct learning of SDE drift and diffusion terms without numerical simulation. SDE Matching replaces solver-based training with a regression-like objective defined on transformed data samples, eliminating the need for backpropagation through stochastic trajectories. Empirically, SDE Matching achieves accuracy comparable to adjoint sensitivity-based methods while substantially reducing computational cost, offering a scalable alternative for learning stochastic dynamical systems. We demonstrate these results across a range of synthetic and real-world dynamical modeling tasks."
Hosted by: Prof. Eric Moulines
Information design is a seminal concept in economics wherein a party with an information advantage can strategically reveal information to influence the actions of a rational decision-maker. This talk centers on my efforts to bridge this model to emerging computational and machine learning paradigms. While the classic model assumes that only the quantitative structure of information matters, behavioral economics and psychology emphasize that the framing of information also plays a key role. My recent work formalizes a language-based notion of framing for information design and combines analytical methods for designing information structures with LLMs for optimizing the language/framing. I explore, both theoretically and empirically, when this LLM-augmented approach is tractable. I will also discuss a second work that uses information design as a lightweight approach to content moderation on social media. Doing so requires a new framework in which the information advantage originates from a machine learning model and the interaction is dynamic, with long-term intervention effects. I will conclude by connecting these threads to my broader research agenda on strategic decision-making in multi-agent systems.
Hosted by: Prof. Yoshihiko Nakamura
As robotic systems grow more capable and ubiquitous, their increasing scale and complexity necessitate a shift toward robust, scalable controllers and automated synthesis methods. My group has approached this challenge by turning to distributed (multi-agent) reinforcement learning (MARL) approaches, with an emphasis on understanding and eliciting emergent coordination/cooperation in multi-robot systems and articulated robots (where agents are individual joints). There, our focus lies in improving information representations and neural architectures, as well as devising learning techniques that can help them explore their high-dimensional joint policy space, to identify and reinforce high-quality policies that naturally fit together towards team-level cooperation. In this talk, I will discuss the three main areas my group has been investigating: imitation learning, modularized/hierarchical neural structures, and learning scaffolding. I will describe these techniques within a wide variety of robotic applications, such as multi-agent pathfinding, autonomous exploration/search, traffic signal control, collaborative manipulation, and legged loco-manipulation. Finally, I will also briefly touch on some of our ongoing and future work. Throughout this journey, my goal will be to highlight the key challenges surrounding learning representation, policy space exploration, and scalability/robustness of learned policies, and outline some of the open avenues for research in this exciting area of robotics.
Hosted by: Hongyuan Cao
This talk introduces a novel nonparametric inference framework for functional data with sample paths of bounded variation, with applications to a variety of complex statistical settings. The main application will be to wearable device data collected in a Columbia-based study of an experimental therapy for mitochondrial disease, a group of disorders that affect the body's ability to produce energy. Specifically, we provide the first clinical application of a novel, bias-adjusted outcome measure of acceleration across a range of subjects' activities to assess nucleoside therapy for thymidine kinase 2 deficiency, an ultra-rare autosomal recessive mitochondrial disease.


Seville, Spain
Hangzhou, China
Sharm El Sheikh, Egypt
Rotterdam, Netherlands
Vienna, Austria
United Kingdom
Vancouver, Canada 





