2025 Fall
This semester's seminar is organized by Shibo Zeng and Yongle Xie, and co-organized by the graduate student union of the School of Mathematical Sciences at Fudan University. The seminar is partially sponsored by the Shanghai Key Laboratory for Contemporary Applied Mathematics.
2025-11-13 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Explicit Non-asymptotic Global Convergence of BFGS with an Adaptive Step Size
- Speaker: Jianjiang Yu (Fudan University)
- Advisor: Weiguo Gao, Luo Luo (Fudan University)
Abstract:
We present new theoretical results for the BFGS method with an adaptive step size [Gao and Goldfarb, Optimization Methods and Software, 34(1):194-217, 2019], showing explicit two-phase global convergence: a linear phase at rate $\mathcal{O}((1 - 1/\varkappa)^{k})$ and a superlinear phase at $\mathcal{O}((\varkappa/k)^{k})$, where $k$ is the iteration counter and $\varkappa$ is the condition number.
In contrast, classical analyses establish only asymptotic convergence, and recent non-asymptotic results mainly address either local convergence under the unit step size or global guarantees with line search.
We further propose a smoothness-aided variant that takes a larger adaptive step by leveraging the gradient Lipschitz continuity, thereby accelerating early convergence.
These results provide the first explicit non-asymptotic global characterization of BFGS without line search.
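For readers unfamiliar with the setup, below is a minimal Python sketch of a line-search-free BFGS iteration on a strongly convex quadratic. The fixed step t = mu/L is a hypothetical stand-in for illustration; it is not the adaptive rule of Gao and Goldfarb analyzed in the talk.

```python
import numpy as np

def bfgs_adaptive(grad, x0, L, mu, iters=200):
    """BFGS without line search on a mu-strongly convex, L-smooth function.
    The fixed step t = mu/L is a hypothetical placeholder for the
    adaptive rule analyzed in the talk."""
    n = x0.size
    H = np.eye(n) / L                  # initial inverse-Hessian estimate
    x, g = x0.astype(float), grad(x0)
    for _ in range(iters):
        d = -H @ g                     # quasi-Newton direction
        x_new = x + (mu / L) * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:                 # standard BFGS inverse update
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

A = np.diag([1.0, 10.0])               # condition number varkappa = 10
x = bfgs_adaptive(lambda z: A @ z, np.array([1.0, 1.0]), L=10.0, mu=1.0)
print(np.linalg.norm(x))               # driven toward the minimizer 0
```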
Past Presentations
2025-09-18 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Sharp Estimates for Optimal Multistage Group Partition Testing
- Speaker: Guojiang Shao (Fudan University)
- Advisor: Qi Zhang (Fudan University)
Abstract:
In multistage group testing, tests within the same stage are nonadaptive, while tests conducted across different stages are adaptive. In particular, when the pools within the same stage are disjoint, i.e., the entire set is divided into several disjoint subgroups, the problem is called multistage group partition testing, denoted the $(n, d, s)$ problem, where $n$, $d$, and $s$ denote the total number of items, defectives, and stages, respectively. This paper presents the first exact solutions for the $(n,1,s)$ and $(n,d,2)$ problems. Furthermore, we develop a general dynamic programming framework for the $(n,d,s)$ problem that yields sharp upper and lower bounds.
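To make the $(n, d, s)$ notation concrete, here is a toy computation for the two-stage, single-defective case $(n, 1, 2)$ under a simplified worst-case count: stage 1 tests $g$ disjoint, nearly equal groups, and stage 2 tests the unique positive group item by item, inferring the last item. This brute force is illustrative only and need not match the paper's exact optimum.

```python
import math

# Toy model of the (n, 1, 2) problem: exactly one defective, two stages.
# Worst case: g stage-1 group tests plus ceil(n/g) - 1 stage-2 item tests
# (the last item of the positive group can be inferred).
def worst_case_tests(n: int) -> tuple[int, int]:
    best = (n, n)  # (tests, groups): g = n means stage 1 alone suffices
    for g in range(1, n + 1):
        cost = g + math.ceil(n / g) - 1
        if cost < best[0]:
            best = (cost, g)
    return best

for n in [10, 100, 1000]:
    t, g = worst_case_tests(n)
    print(f"n={n}: ~{t} tests with g={g} groups (roughly 2*sqrt(n))")
```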
2025-09-25 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Reinforcement Learning for Heterogeneous DAG Scheduling with Weighted Cross-Attention
- Speaker: Ruisong Zhou (Peking University)
- Advisor: Zaiwen Wen (Peking University)
Abstract:
Efficient scheduling of directed acyclic graphs (DAGs) in heterogeneous environments is challenging due to diverse resource capacities and intricate dependencies. In practice, the need to scale across environments with varying resource pools, task types, and other settings, together with the demand for rapid schedule generation, compounds these challenges. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling with task-resource compatibility. WeCAN generates schedules rapidly through single-pass network inference. Leveraging a weighted cross-attention layer, WeCAN utilizes all available environment information while remaining scalable across diverse heterogeneous environments. Moreover, we introduce a criterion to analyze the optimality gap inherent in list-scheduling-based methods, revealing barriers that prevent these methods from consistently finding optimal solutions; the skip action introduced in our framework addresses this gap. Our approach delivers robust performance and scalability, outperforming state-of-the-art methods across diverse datasets.
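As a sketch of the kind of layer the abstract names, the following numpy snippet implements a generic weighted cross-attention between task and resource embeddings. The log-compatibility weighting and all dimensions here are assumptions for illustration, not WeCAN's actual design.

```python
import numpy as np

def weighted_cross_attention(task_feats, res_feats, compat, d_k=16, rng=None):
    """Tasks attend to resources; `compat` (tasks x resources) weights the
    attention scores, with zero entries masking incompatible pairs.
    A hypothetical sketch, not the exact WeCAN layer."""
    rng = rng or np.random.default_rng(0)
    d_t, d_r = task_feats.shape[1], res_feats.shape[1]
    Wq = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    Wk = rng.standard_normal((d_r, d_k)) / np.sqrt(d_r)
    Wv = rng.standard_normal((d_r, d_k)) / np.sqrt(d_r)
    Q, K, V = task_feats @ Wq, res_feats @ Wk, res_feats @ Wv
    scores = Q @ K.T / np.sqrt(d_k)
    # add log-compatibility: weight 0 -> large negative, masking the pair
    scores = scores + np.log(np.maximum(compat, 1e-30))
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ V  # task embeddings enriched with resource context

tasks = np.random.default_rng(1).standard_normal((5, 8))      # 5 tasks
resources = np.random.default_rng(2).standard_normal((3, 6))  # 3 resources
compat = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1],
                   [1, 1, 1], [0, 0, 1]], float)
print(weighted_cross_attention(tasks, resources, compat).shape)  # (5, 16)
```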
2025-10-09 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Utilizing Causal Network Markers to Identify Tipping Points ahead of Critical Transition
- Speaker: Shirui Bian (Fudan University)
- Advisor: Wei Lin (Fudan University)
Abstract:
Carefully designed early-warning signals are widely used to predict critical transitions in complex systems, making it possible to steer a system away from catastrophic states through timely interventions. Traditional signals, including the dynamical network biomarker (DNB), are based on statistical properties of nodal dynamics such as variance and autocorrelation; they overlook directional interactions and thus have limited ability to capture the underlying mechanisms while remaining robust against noise perturbations. This paper therefore introduces a framework of causal network markers (CNMs) that incorporates causality indicators reflecting the directional influence between variables. To detect and identify tipping points ahead of critical transitions, two markers are designed: CNM-GC for linear causality and CNM-TE for nonlinear causality, together with a functional representation of different causality indicators and a clustering technique to verify the system's dominant group. In demonstrations on benchmark models and real-world epileptic-seizure datasets, the CNM framework shows higher predictive power and accuracy than the traditional DNB indicator. Owing to their versatility and scalability, CNMs are suitable for comprehensive system evaluation; a particularly promising application is the identification of tipping points in clinical diseases.
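To illustrate one ingredient a marker like CNM-GC could build on, here is a minimal pairwise Granger-type causality index in numpy: an order-1 least-squares comparison of a restricted and a full autoregressive model. The paper's actual estimator may differ in detail.

```python
import numpy as np

def granger_index(x, y, lag=1):
    """Granger-type causality from y to x: log variance reduction when
    lagged y is added to an AR model of x.  Order-1 least-squares sketch."""
    xt, xlag, ylag = x[lag:], x[:-lag], y[:-lag]
    # restricted model: x_t ~ x_{t-1}
    A = np.column_stack([xlag, np.ones_like(xlag)])
    r_res = xt - A @ np.linalg.lstsq(A, xt, rcond=None)[0]
    # full model: x_t ~ x_{t-1} + y_{t-1}
    B = np.column_stack([xlag, ylag, np.ones_like(xlag)])
    r_full = xt - B @ np.linalg.lstsq(B, xt, rcond=None)[0]
    return np.log(np.var(r_res) / np.var(r_full))  # > 0: y helps predict x

rng = np.random.default_rng(0)
y = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(1, 2000):                 # x is driven by lagged y
    x[t] = 0.5 * x[t-1] + 0.8 * y[t-1] + 0.1 * rng.standard_normal()
print(granger_index(x, y))  # clearly positive
print(granger_index(y, x))  # near zero
```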
2025-10-16 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Phenomenon-Driven Deep Learning Theory: From Implicit Regularization in Matrix Factorization to Loss Spike Mechanisms in Adam
- Speaker: Zhiwei Bai (Shanghai Jiao Tong University)
- Advisor: Yaoyu Zhang, Zhiqin Xu (Shanghai Jiao Tong University)
Abstract:
Deep neural networks, as highly nonlinear complex systems, present formidable theoretical challenges. Phenomenon-driven research—grounded in meticulous observation and carefully
designed experiments to discover intrinsic system patterns—offers a crucial gateway to
understanding these complex systems. This talk presents our recent advances in deep
learning generalization and optimization theory through a phenomenon-driven approach.
One of the most counterintuitive phenomena in modern machine learning is that neural networks maintain excellent generalization despite overparameterization. Understanding
implicit regularization mechanisms in overparameterized models has become essential to
deep learning theory. Matrix factorization models, as an important subclass, provide an
ideal testbed for studying implicit regularization. This talk first reviews the
generalization puzzle, and introduces our discovery of a fundamental structural property
of loss landscapes: the Embedding Principle, which reveals an elegant inheritance
relationship between critical points across networks of different scales. Building on
this, we analyze matrix factorization training dynamics from a model-data decoupling
perspective, elucidating when, how, and why different implicit regularization effects (low
rank, low nuclear norm) emerge, providing a unified understanding of this system.
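As a self-contained illustration of implicit regularization in matrix factorization (a generic toy, not the talk's analysis), gradient descent on an overparameterized factorization from small initialization recovers a low-rank solution with no explicit rank penalty:

```python
import numpy as np

# Fit observed entries of a rank-1 matrix with a full-width product U V^T
# from small initialization; gradient descent finds a near-rank-1 solution
# without any explicit rank or nuclear-norm penalty.
rng = np.random.default_rng(0)
n = 20
M = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # rank 1
mask = rng.random((n, n)) < 0.5          # observe about half the entries

U = 1e-3 * rng.standard_normal((n, n))   # overparameterized factors
V = 1e-3 * rng.standard_normal((n, n))
lr = 0.01
for _ in range(5000):
    R = (U @ V.T - M) * mask             # residual on observed entries
    U, V = U - lr * R @ V, V - lr * R.T @ U
sv = np.linalg.svd(U @ V.T, compute_uv=False)
print("top-5 singular values:", np.round(sv[:5], 3))  # near rank 1
```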
This talk also presents another phenomenon-driven study: loss spike—a sudden and sharp
surge in the loss function that subsequently subsides. These spikes are observed across a
wide range of network architectures and datasets, yet their underlying mechanisms remain
elusive. While previous studies attributed loss spikes to complex loss landscape geometry,
we find they originate from Adam's adaptive preconditioning mechanism. Specifically, when
gradients in certain layers gradually diminish during training, the adaptive mechanism
persistently pushes the maximum eigenvalue of the preconditioned Hessian above the
stability threshold, triggering sustained instability. This result provides a novel
theoretical perspective for understanding and controlling loss spike behavior.
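The mechanism can be caricatured in a few lines of numpy: feed a diminishing gradient stream into Adam's second-moment accumulator and watch the preconditioned curvature climb past the classical stability threshold of 2. This is a schematic with assumed constants and no momentum, not the talk's experimental setting.

```python
import numpy as np

# Schematic (not a full training run): for a layer with fixed Hessian
# eigenvalue h, an exogenously decaying gradient stream shrinks Adam's
# second moment v, so the preconditioned curvature lr*h/(sqrt(v_hat)+eps)
# eventually exceeds the classical stability threshold of 2: a loss spike.
h, lr, beta2, eps = 10.0, 1e-3, 0.999, 1e-8
v = 0.0
for k in range(1, 12001):
    g = 1.0 * 0.999 ** k                  # gradients gradually diminishing
    v = beta2 * v + (1 - beta2) * g * g
    v_hat = v / (1 - beta2 ** k)          # bias-corrected second moment
    eff = lr * h / (np.sqrt(v_hat) + eps)
    if k % 2000 == 0:
        flag = "  <-- above threshold 2: instability" if eff > 2 else ""
        print(f"k={k:5d}  sqrt(v_hat)={np.sqrt(v_hat):.2e}"
              f"  lr*h/sqrt(v)={eff:.2f}{flag}")
```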
2025-10-23 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Adaptive-Growth Randomized Neural Networks for PDEs: Algorithms and Numerical Analysis
- Speaker: Haoning Dang (Xi'an Jiaotong University)
- Advisor: Fei Wang (Xi'an Jiaotong University)
Abstract:
Randomized neural network (RaNN) methods have been proposed for solving various partial
differential equations (PDEs), demonstrating high accuracy and efficiency. However,
initializing the fixed parameters remains a challenging issue. Additionally, RaNNs often
struggle to solve PDEs with sharp or discontinuous solutions. In this talk, we propose a
novel approach called Adaptive-Growth Randomized Neural Network (AG-RaNN) to address these
challenges. We introduce growth strategies that expand the neural network, making it wider
and deeper to improve the accuracy of the numerical solution. A key feature of AG-RaNN is
its adaptive strategy for determining the weights and biases of newly added neurons,
enabling the network to expand in both width and depth without requiring additional
training. Instead, all weights and biases are generated constructively, significantly
enhancing the network's approximation capabilities compared to conventional randomized
neural network methods. In addition, a domain splitting strategy is introduced to handle
the case of discontinuous solutions. A comprehensive theoretical analysis of RaNN methods
is also presented, covering approximation, statistical, and optimization errors.
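For context, a plain (non-adaptive) randomized neural network solver can be written in a few lines: random tanh features are fixed, and only the linear output coefficients are trained by least squares on collocation points. The sketch below solves $-u'' = f$ on $[0,1]$ and is illustrative only; the growth and domain-splitting strategies of AG-RaNN are not shown.

```python
import numpy as np

# Baseline RaNN sketch: solve -u'' = f on [0,1], u(0) = u(1) = 0, with
# random tanh features phi_j(x) = tanh(w_j x + b_j); only the output
# coefficients are trained, by least squares over collocation points.
rng = np.random.default_rng(0)
m = 200                                  # number of random neurons
w = rng.uniform(-20, 20, m)              # fixed (random) inner weights
b = rng.uniform(-20, 20, m)              # fixed (random) biases

def feats(x):
    t = np.tanh(np.outer(x, w) + b)
    return t, (-2 * w**2) * t * (1 - t**2)   # phi and phi''

f = lambda x: (np.pi**2) * np.sin(np.pi * x) # exact solution: sin(pi x)
xs = np.linspace(0, 1, 400)
phi, phi_xx = feats(xs)
phi_bc, _ = feats(np.array([0.0, 1.0]))

# stack PDE residual rows (-u'' = f) and boundary rows (u = 0)
A = np.vstack([-phi_xx, 50.0 * phi_bc])      # weight BCs more heavily
rhs = np.concatenate([f(xs), np.zeros(2)])
c = np.linalg.lstsq(A, rhs, rcond=None)[0]

u = phi @ c
print("max error:", np.abs(u - np.sin(np.pi * xs)).max())
```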
2025-10-30 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
Convection-Diffusion Equation: A Theoretically Certified Framework for Neural Networks
- Speaker: Tangjun Wang (Tsinghua University)
- Advisor: Zuoqiang Shi (Tsinghua University)
Abstract:
Differential equations have demonstrated intrinsic connections to network structures,
linking discrete network layers through continuous equations. Most existing approaches
focus on the interaction between ordinary differential equations (ODEs) and feature
transformations, primarily working on input signals. In this paper, we study the partial
differential equation (PDE) model of neural networks, viewing the neural network as a
functional operating on a base model provided by the last layer of the classifier.
Inspired by scale-space theory, we theoretically prove that this mapping can be formulated
by a convection-diffusion equation, under interpretable and intuitive assumptions from
both neural network and PDE perspectives. This theoretically certified framework covers
various existing network structures and training techniques, offering a mathematical
foundation and new insights into neural networks. Moreover, based on the
convection-diffusion equation model, we design a new network structure that incorporates a
diffusion mechanism into the network architecture from a PDE perspective. Extensive
experiments confirm the effectiveness of the proposed model.
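A toy version of such a diffusion mechanism (hypothetical, not the paper's architecture) augments a residual-style update with graph-Laplacian smoothing of features across neighboring samples:

```python
import numpy as np

def diffusion_residual_step(X, W_layer, adj, dt=0.1, eps=0.2):
    """One toy layer: convection (residual feature transform) plus
    diffusion (graph-Laplacian smoothing over an affinity graph `adj`)."""
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj                  # combinatorial graph Laplacian
    convection = np.maximum(X @ W_layer, 0) # standard ReLU transform
    diffusion = -eps * L @ X                # smooths X across neighbors
    return X + dt * (convection + diffusion)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))             # 6 samples, 4 features
W = rng.standard_normal((4, 4)) / 2
adj = (rng.random((6, 6)) > 0.5).astype(float)
adj = (adj + adj.T) / 2                     # symmetric affinity
np.fill_diagonal(adj, 0.0)
print(diffusion_residual_step(X, W, adj).shape)  # (6, 4)
```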
2025-11-06 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower
[poster]
- Title:
An adaptive Hermite spectral method for the Boltzmann equation
- Speaker: Jie Wu (Peking University)
- Advisor: Sihong Shao (Peking University)
Abstract:
In this talk, we propose an adaptive Hermite spectral method for the three-dimensional velocity space of the Boltzmann equation, guided by a newly developed frequency indicator. For
the homogeneous problem, the indicator is defined by the contribution of high-order
coefficients in the spectral expansion. For the non-homogeneous problem, a Fourier-Hermite
scheme is employed, with the corresponding frequency indicator formulated based on
distributions across the entire spatial domain. The adaptive Hermite method includes scaling
and $p$-adaptive techniques to dynamically adjust the scaling factor and expansion order
according to the indicator. Numerical experiments cover both homogeneous and non-homogeneous
problems in up to three spatial dimensions. Results demonstrate that the adaptive method
substantially reduces $L^2$ errors at negligible computational cost, and the $p$-adaptive
method achieves time savings of up to 74%.
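One plausible form of such a frequency indicator (an assumption for illustration, not the paper's exact definition) is the relative energy of the highest-order Hermite coefficients, with the expansion order raised while the tail is too heavy:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval
from math import factorial, pi, sqrt

def hermite_coeffs(f, p):
    """First p+1 coefficients of f in physicists' Hermite polynomials,
    orthogonal under weight exp(-v^2), via Gauss-Hermite quadrature."""
    v, w = hermgauss(2 * p + 2)
    fv = f(v)
    coefs = []
    for k in range(p + 1):
        e = np.zeros(k + 1)
        e[k] = 1.0
        Hk = hermval(v, e)                 # H_k evaluated at the nodes
        norm = sqrt(pi) * 2**k * factorial(k)
        coefs.append((w * fv * Hk).sum() / norm)
    return np.array(coefs)

def indicator(c, tail=4):
    """Relative energy of the top `tail` coefficients (assumed form)."""
    return np.linalg.norm(c[-tail:]) / np.linalg.norm(c)

f = lambda v: np.cos(3 * v)                # expanded against weight e^{-v^2}
p, tol = 8, 1e-8
while True:
    c = hermite_coeffs(f, p)
    print(f"p={p:2d}  tail indicator={indicator(c):.2e}")
    if indicator(c) < tol or p >= 40:
        break
    p += 8                                 # p-adaptive: raise the order
```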