MBZUAI Nexus Speaker Series
Hosted by: Prof. Kun Zhang
Expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic and lacks theoretical foundations. In addition, finding the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework, which explicitly accounts for robustness to landscape curvature variations. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter φ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between φ and the condition number, adding interpretation and justification to our approach. We also prove convergence for different values of φ and offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experimental results show that UBA consistently surpasses the commonly-used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.
Hosted by: Prof. Eduardo Beltrame
Recent research has imported tools from network science and control theory to study the controllability properties of brain circuits and to investigate the possibility of restoring or enhancing brain activity using brain stimulation. However, a fundamental challenge is that current notions of controllability based on the structural connections of the human brain may be inadequate for the study of human brain function. We use system identification, network science, stability analysis, and control theory to probe functional circuit dynamics during working memory task performance. Our main finding is that network controllability decreases with working memory load and that SN nodes show the highest functional controllability. Our findings reveal dissociable roles of the SN and FPN in systems control and provide novel insights into the dynamic circuit mechanisms by which cognitive control circuits operate asymmetrically during cognition.
Hosted by: Prof. Natasa Przulj
Understanding the deep human past requires analytical frameworks capable of integrating diverse datasets and tracing long-term trajectories of cultural and environmental change. Archaeology—uniquely positioned at the intersection of material culture, ecology, and human behaviour—holds unparalleled potential to address these challenges. This talk presents a suite of pioneering studies in which artificial intelligence, network science, and complexity theory are applied to Eurasian archaeological datasets, offering the most robust quantitative framework to date for modelling cooperation, exchange, and cultural co-evolution. The first part of the talk focuses on the origins of metallurgy in the Balkans between the 6th and 3rd millennia BC, where copper production and circulation first took recognisable regional form. Using trace element and lead isotope analyses from 410 artefacts across c. 80 sites (6200–3200 BC), we apply seven community detection algorithms—including Louvain, Leiden, Spinglass, and Eigenvector methods—to reconstruct prehistoric copper-supply networks. These models reveal stable and meaningful supply communities that correlate strikingly with regional archaeological cultures such as Vinča, KGK VI and Bodrogkeresztúr. By critically evaluating algorithm performance on archaeological compositional data, this case study not only demonstrates the power of network science for reconstructing prehistoric exchange but also challenges the traditional, typology-based concept of “archaeological culture.” It exemplifies how AI and complexity science can rigorously decode patterns of cooperation, resource movement, and social boundaries in the deep past.
Hosted by: Prof. Zhiqiang Xu
In this talk, I will discuss the development of machine learning for combinatorial optimization, covering general methodology and especially generative models for AI4Opt. I will show how ideas from diffusion models can be used to solve notoriously hard combinatorial problems. I will also share some forward-looking ideas on future research directions.
Hosted by: Muhammad Haris Khan
We spend a lot of time training a network to recognize a fixed number of object types in a scene. If we later need to add new object classes to the recognition engine, should we retrain the network from scratch? Can we tweak the network so that it incrementally learns new classes of objects? Unfortunately, any attempt to incrementally learn new concepts may also lead to forgetting, often catastrophic, of previously learnt concepts. Similarly, can we selectively forget a few concepts when this is required for socio-technical reasons? In this talk, we shall discuss how some of these objectives can be achieved.
Hosted by: Prof. Marcos Matabuena
In recent years, Reinforcement Learning (RL) has gained a prominent position in addressing health-related sequential decision-making problems. In this talk, we will discuss two such sequential decision-making problems: (1) dynamic treatment regimes (DTRs), i.e., clinical decision rules for adapting the type, dosage and timing of treatment according to an individual patient’s characteristics and evolving health status; and (2) just-in-time adaptive interventions (JITAIs) in mobile app-based behavioral nudges in population health. Specifically, we will illustrate the similarities and differences between these two types of RL problems (e.g., offline vs. online RL), common algorithms used in these two settings (e.g., Q-learning vs. Thompson sampling), and real-life case studies.
Hosted by: Prof. Laura Koesten
Machine learning classifiers are increasingly applied to complex tasks such as audio tagging, image labeling, and text classification -- many of which require multi-label classification. Traditional evaluation tools, often limited to single metrics such as accuracy, fall short of providing insight into classifier behavior across multiple labels. To address this, we present MLMC, an interactive visualization tool for evaluating and comparing multi-label classifiers. Based on expert interviews, MLMC supports analysis at instance-, label-, and classifier-level views, offering a scalable, more interpretable alternative. We demonstrate its use across three different domains and describe its core algorithms and user interface. Two pilot studies (N=6 each) provided insight into MLMC's usability and showed improved task accuracy, consistency, and user confidence compared to confusion matrices. Results highlight MLMC's potential as a practical tool for intuitive evaluation of multi-label classifiers, with implications for a broad range of machine learning applications. Our approach follows the Design Study Methodology, which is rooted in Human-Centered Design.
Hosted by: Prof. Eric Moulines
Stochastic differential equations (SDEs) provide a flexible framework for modeling time series, dynamical systems, and sequential data. However, learning SDEs from data typically relies on adjoint sensitivity methods, which require repeated simulation, time discretization, and backpropagation through approximate SDE solvers, leading to significant computational overhead and limited scalability. We introduce SDE Matching, a simulation- and discretization-free approach for learning stochastic dynamics directly from data. Building on recent advances in score matching and flow matching for generative modeling, we extend these ideas to the dynamical setting, enabling direct learning of SDE drift and diffusion terms without numerical simulation. SDE Matching replaces solver-based training with a regression-like objective defined on transformed data samples, eliminating the need for backpropagation through stochastic trajectories. Empirically, SDE Matching achieves accuracy comparable to adjoint sensitivity-based methods while substantially reducing computational cost, offering a scalable alternative for learning stochastic dynamical systems. We demonstrate these results across a range of synthetic and real-world dynamical modeling tasks.
Hosted by: Prof. Eric Moulines
Information design is a seminal concept in economics wherein a party with an information advantage can strategically reveal it to influence the actions of a rational decision-maker. This talk centers on my efforts to bridge this model to emerging computational and machine learning paradigms. While the classic model assumes that only the quantitative structure of information matters, behavioral economics and psychology emphasize that the framing of information also plays a key role. My recent work formalizes a language-based notion of framing for information design and combines analytical methods for designing information structures with LLMs that optimize the language/framing. I explore, both theoretically and empirically, when this LLM-augmented approach is tractable. I will also discuss a second work that uses information design as a lightweight approach to content moderation on social media. Doing so requires a new framework where the information advantage originates from a machine learning model and the interaction is dynamic with long-term intervention effects. I will conclude by connecting these threads to my broader research agenda on strategic decision-making in multi-agent systems.
Hosted by: Prof. Eric Moulines
A stochastic optimal control problem with a final constraint provides a natural way to construct a Schrödinger bridge between two distributions, making it well‑suited for generative modelling. In this problem, the optimal control can be expressed through the Schrödinger potential, which depends on the target distribution, typically unknown in practice. We address the problem of estimating this potential from finite samples. Focusing on estimators that minimize the empirical Kullback-Leibler (KL) divergence, we study their generalization abilities. Despite the loss function’s unusual structure, we show that it exhibits favourable geometric properties under mild assumptions that hold for a broad class of target distributions. We derive non‑asymptotic, high‑probability upper bounds for the potential estimation accuracy, measured in terms of excess KL‑risk. In the second part of the talk, we show that the Schrödinger system can be rewritten in terms of a single positive transformed potential that satisfies a nonlinear fixed-point equation, and we estimate this potential by empirical risk minimization over a function class. The talk is based on joint work with D. Belomestny, N. Puchkin and D. Suchkov.
Hosted by: Prof. Yoshihiko Nakamura
As robotic systems grow more capable and ubiquitous, their increasing scale and complexity necessitate a shift toward robust, scalable controllers and automated synthesis methods. My group has approached this challenge by turning to distributed (multi-agent) reinforcement learning (MARL) approaches, with an emphasis on understanding and eliciting emergent coordination/cooperation in multi-robot systems and articulated robots (where agents are individual joints). There, our focus lies in improving information representations and neural architectures, as well as devising learning techniques that can help them explore their high-dimensional joint policy space, to identify and reinforce high-quality policies that naturally fit together towards team-level cooperation. In this talk, I will discuss the three main areas my group has been investigating: imitation learning, modularized/hierarchical neural structures, and learning scaffolding. I will describe these techniques within a wide variety of robotic applications, such as multi-agent pathfinding, autonomous exploration/search, traffic signal control, collaborative manipulation, and legged loco-manipulation. Finally, I will also briefly touch on some of our ongoing and future work. Throughout this journey, my goal will be to highlight the key challenges surrounding learning representation, policy space exploration, and scalability/robustness of learned policies, and outline some of the open avenues for research in this exciting area of robotics.
Hosted by: Hongyuan Cao
This talk introduces a novel nonparametric inference framework for functional data having sample paths of bounded variation, with applications in a variety of complex statistical settings. The main application will be to wearable device data collected in a Columbia-based study of an experimental therapy for mitochondrial disease, a group of disorders that affect the body's ability to produce energy. Specifically, we provide the first clinical application of a novel, bias-adjusted outcome measure of acceleration across a range of subjects' activities to assess nucleoside therapy for thymidine kinase 2 deficiency, an ultra-rare autosomal recessive mitochondrial disease.







