2025 Fall

This semester's seminar is organized by Shibo Zeng and Yongle Xie, and co-organized by the graduate student union of the School of Mathematical Sciences at Fudan. The seminar is partially sponsored by the Shanghai Key Laboratory for Contemporary Applied Mathematics.

2025-11-13 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: We present new theoretical results for the BFGS method with an adaptive step size [Gao and Goldfarb, Optimization Methods and Software, 34(1):194-217, 2019], showing explicit two-phase global convergence: a linear phase at rate $\mathcal{O}((1 - 1/\varkappa)^{k})$ and a superlinear phase at $\mathcal{O}((\varkappa/k)^{k})$, where $k$ is the iteration counter and $\varkappa$ is the condition number. In contrast, classical analyses establish only asymptotic convergence, and recent non-asymptotic results mainly address local convergence under the unit step size or global guarantees with line search. We further propose a smoothness-aided variant that takes a larger adaptive step by leveraging the gradient Lipschitz continuity, thereby accelerating early convergence. These results provide the first explicit non-asymptotic global characterization of BFGS without line search.
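
As a concrete frame of reference, here is a minimal sketch of a BFGS iteration that uses no line search. The inverse-Hessian update is the standard BFGS formula; the step-size rule below is a hedged placeholder (the actual adaptive rule of Gao and Goldfarb is more refined), and all names are illustrative.

```python
import numpy as np

def bfgs_no_linesearch(grad, x0, L, n_iter=200, tol=1e-10):
    """Sketch of BFGS with a line-search-free step size.

    The step rule is a placeholder standing in for the adaptive rule
    analyzed in the talk; only the BFGS structure is the point here.
    L is a gradient Lipschitz constant, assumed known.
    """
    n = x0.size
    H = np.eye(n)                        # inverse Hessian approximation
    x, g = x0.copy(), grad(x0)
    for _ in range(n_iter):
        d = -H @ g                       # quasi-Newton direction
        eta = 1.0 / (1.0 + L * np.linalg.norm(d))  # placeholder adaptive step
        x_new = x + eta * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:                   # curvature condition holds
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)  # BFGS update of H
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x
```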

Past Presentations

2025-09-18 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In multistage group testing, tests within the same stage are nonadaptive, while tests conducted across different stages are adaptive. In particular, when the pools within the same stage are disjoint, so that the entire set is divided into several disjoint subgroups, the problem is referred to as multistage group partition testing, denoted the $(n, d, s)$ problem, where $n$, $d$, and $s$ represent the total number of items, defectives, and stages, respectively. This paper presents exact solutions for the $(n,1,s)$ and $(n,d,2)$ problems for the first time. Furthermore, we develop a general dynamic programming framework for the $(n,d,s)$ problem, which yields sharp upper and lower bounds.
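
For intuition about the dynamic-programming viewpoint, the toy recursion below bounds the worst-case number of tests in a simplified $(n,1,s)$ setting; it is a sketch under stated assumptions, not the paper's exact solutions or its general framework.

```python
from functools import lru_cache
from math import ceil

@lru_cache(maxsize=None)
def worst_case_tests(n, s):
    """Toy worst-case test count for a simplified (n, 1, s) setting.

    Assumes exactly one defective, so in each stage one subgroup of the
    partition can be left untested (it is positive iff all tested
    subgroups are negative). Illustrative only.
    """
    if n <= 1:
        return 0
    if s == 1:
        return n - 1                 # test items individually, skip one
    best = n - 1                     # fallback: finish in this stage
    for g in range(2, n + 1):        # partition into g disjoint subgroups
        largest = ceil(n / g)        # worst case: defective in a largest group
        best = min(best, (g - 1) + worst_case_tests(largest, s - 1))
    return best

print(worst_case_tests(100, 3))      # worst-case tests for n=100, s=3
```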

2025-09-25 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Efficient scheduling of directed acyclic graphs (DAGs) in heterogeneous environments is challenging due to diverse resource capacities and intricate dependencies. In practice, the need for scalability across environments with varying resource pools, task types, and other settings, together with the need for rapid schedule generation, compounds these challenges. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling with task-resource compatibility. WeCAN rapidly generates schedules through single-pass network inference. Leveraging a weighted cross-attention layer, WeCAN utilizes all available environment information while preserving scalability across diverse heterogeneous environments. Moreover, we introduce a criterion for analyzing the optimality gap inherent in list-scheduling-based methods, revealing barriers that prevent these methods from consistently finding optimal solutions; the skip action introduced in our framework addresses this gap. Our approach delivers robust performance and scalability, outperforming state-of-the-art methods across diverse datasets.
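
For reference, a greedy earliest-finish-time list-scheduling baseline, the class of methods whose optimality gap the talk analyzes, might look like the sketch below. The dict-based `exec_time` and `compatible` tables are assumed inputs; the WeCAN policy itself is a learned network with a skip action, not this heuristic.

```python
def list_schedule(tasks, deps, resources, exec_time, compatible):
    """Greedy earliest-finish-time list scheduling on a heterogeneous DAG.

    tasks, resources: lists of name strings; deps: list of (u, v) edges;
    exec_time[t][r]: runtime of task t on resource r;
    compatible[t][r]: task-resource compatibility flag (each task is
    assumed compatible with at least one resource).
    """
    preds = {t: [u for u, v in deps if v == t] for t in tasks}
    succs = {t: [v for u, v in deps if u == t] for t in tasks}
    indeg = {t: len(preds[t]) for t in tasks}
    ready = [t for t in tasks if indeg[t] == 0]
    free_at = {r: 0.0 for r in resources}    # when each resource frees up
    finish = {}
    while ready:
        # choose the compatible (task, resource) pair finishing earliest
        ft, t, r = min(
            (max(free_at[r], max((finish[u] for u in preds[t]), default=0.0))
             + exec_time[t][r], t, r)
            for t in ready for r in resources if compatible[t][r])
        finish[t], free_at[r] = ft, ft
        ready.remove(t)
        for v in succs[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return finish   # per-task finish times; makespan = max(finish.values())
```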

2025-10-09 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Carefully designed early-warning signals are widely used to predict critical transitions in complex systems, making it possible to steer a system away from a catastrophic state through timely intervention. Traditional signals, including the dynamical network biomarker (DNB), are based on statistical properties such as the variance and autocorrelation of nodal dynamics; they overlook directional interactions and thus have limited ability to capture underlying mechanisms while remaining robust to noise perturbations. This paper therefore introduces a framework of causal network markers (CNMs) that incorporates causality indicators reflecting the directional influence between variables. To detect and identify tipping points ahead of a critical transition, two markers are designed: CNM-GC for linear causality and CNM-TE for non-linear causality, together with a functional representation of the different causality indicators and a clustering technique to identify the system's dominant group. In demonstrations on benchmark models and real-world datasets of epileptic seizure, the CNM framework shows higher predictive power and accuracy than the traditional DNB indicator. Owing to their versatility and scalability, CNMs are suitable for comprehensive system evaluation; a particularly promising application is the identification of tipping points in clinical disease.
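
As a rough illustration of the linear-causality ingredient, the sketch below scores a directed influence from series x to series y with a lag-1 Granger-style regression; the actual CNM-GC marker aggregates such indicators over the whole network and is more elaborate.

```python
import numpy as np

def granger_strength(x, y, lag=1):
    """Lag-1 Granger-style causality score x -> y (illustrative).

    Compares the residual variance of predicting y from its own past
    against predicting it from (y's past, x's past); a larger gain
    means a stronger directed influence. A simplified stand-in for
    the CNM-GC indicator, not the paper's exact construction.
    """
    yt = y[lag:]
    Y_past, X_past = y[:-lag], x[:-lag]
    # restricted model: y_t ~ y_{t-1}
    A = np.column_stack([np.ones_like(Y_past), Y_past])
    r1 = yt - A @ np.linalg.lstsq(A, yt, rcond=None)[0]
    # full model: y_t ~ y_{t-1} + x_{t-1}
    B = np.column_stack([A, X_past])
    r2 = yt - B @ np.linalg.lstsq(B, yt, rcond=None)[0]
    return np.log(np.var(r1) / np.var(r2))  # > 0: x helps predict y
```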

2025-10-16 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Deep neural networks, as highly nonlinear complex systems, present formidable theoretical challenges. Phenomenon-driven research—grounded in meticulous observation and carefully designed experiments to discover intrinsic system patterns—offers a crucial gateway to understanding these complex systems. This talk presents our recent advances in deep learning generalization and optimization theory through a phenomenon-driven approach. One of the most counterintuitive phenomena in modern machine learning is that neural networks maintain excellent generalization despite overparameterization. Understanding implicit regularization mechanisms in overparameterized models has become essential to deep learning theory. Matrix factorization models, as an important subclass, provide an ideal testbed for studying implicit regularization. This talk first reviews the generalization puzzle, and introduces our discovery of a fundamental structural property of loss landscapes: the Embedding Principle, which reveals an elegant inheritance relationship between critical points across networks of different scales. Building on this, we analyze matrix factorization training dynamics from a model-data decoupling perspective, elucidating when, how, and why different implicit regularization effects (low rank, low nuclear norm) emerge, providing a unified understanding of this system. This talk also presents another phenomenon-driven study: loss spike—a sudden and sharp surge in the loss function that subsequently subsides. These spikes are observed across a wide range of network architectures and datasets, yet their underlying mechanisms remain elusive. While previous studies attributed loss spikes to complex loss landscape geometry, we find they originate from Adam's adaptive preconditioning mechanism. Specifically, when gradients in certain layers gradually diminish during training, the adaptive mechanism persistently pushes the maximum eigenvalue of the preconditioned Hessian above the stability threshold, triggering sustained instability. This result provides a novel theoretical perspective for understanding and controlling loss spike behavior.
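
The Adam mechanism described above can be caricatured in one dimension. On a quadratic with curvature h, the preconditioned sharpness is roughly h / (sqrt(v) + eps), which grows as the gradient and its second-moment estimate v decay; the toy loop below (all constants hypothetical) reports when it crosses the 2/lr stability threshold.

```python
import numpy as np

# Toy illustration of the loss-spike mechanism (all constants
# hypothetical): feed Adam's second-moment estimate a gradient stream
# that gradually diminishes, and watch the preconditioned sharpness
# h / (sqrt(v) + eps) climb past the stability threshold 2 / lr.
# On a real network, h would be the largest Hessian eigenvalue of the
# affected layer.
h, lr, beta2, eps = 1.0, 1e-2, 0.999, 1e-8
v = 0.0
for step in range(1, 10001):
    g = 0.99 ** step                       # diminishing gradient
    v = beta2 * v + (1 - beta2) * g * g    # Adam second-moment estimate
    sharpness = h / (np.sqrt(v) + eps)     # preconditioned curvature
    if sharpness > 2.0 / lr:               # edge-of-stability threshold
        print(f"threshold 2/lr = {2/lr:.0f} crossed at step {step}")
        break
```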

2025-10-23 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Randomized neural network (RaNN) methods have been proposed for solving various partial differential equations (PDEs), demonstrating high accuracy and efficiency. However, initializing the fixed parameters remains a challenging issue. Additionally, RaNNs often struggle to solve PDEs with sharp or discontinuous solutions. In this talk, we propose a novel approach called Adaptive-Growth Randomized Neural Network (AG-RaNN) to address these challenges. We introduce growth strategies that expand the neural network, making it wider and deeper to improve the accuracy of the numerical solution. A key feature of AG-RaNN is its adaptive strategy for determining the weights and biases of newly added neurons, enabling the network to expand in both width and depth without requiring additional training. Instead, all weights and biases are generated constructively, significantly enhancing the network's approximation capabilities compared to conventional randomized neural network methods. In addition, a domain splitting strategy is introduced to handle the case of discontinuous solutions. A comprehensive theoretical analysis of RaNN methods is also presented, covering approximation, statistical, and optimization errors.
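
To ground the idea, here is a minimal plain-RaNN sketch for a 1-D Poisson problem: random hidden weights are fixed and only the linear output layer is solved by least squares. The sampling ranges and penalty weight are arbitrary choices, and AG-RaNN's growth and domain-splitting strategies are not shown.

```python
import numpy as np

# Minimal plain-RaNN sketch: solve -u'' = f on (0, 1), u(0) = u(1) = 0,
# with f = pi^2 sin(pi x), whose exact solution is u = sin(pi x).
rng = np.random.default_rng(0)
M = 200                                       # number of random neurons
w = rng.uniform(-10, 10, M)                   # fixed random weights
b = rng.uniform(-10, 10, M)                   # fixed random biases
x = np.linspace(0, 1, 400)[:, None]           # collocation points
t = np.tanh(w * x + b)                        # hidden-layer outputs
A_pde = 2 * w**2 * t * (1 - t**2)             # columns of -(tanh(wx+b))''
f = np.pi**2 * np.sin(np.pi * x).ravel()
lam = 100.0                                   # boundary penalty weight
A_bc = lam * np.tanh(np.array([[0.0], [1.0]]) * w + b)
A = np.vstack([A_pde, A_bc])
rhs = np.concatenate([f, [0.0, 0.0]])
c = np.linalg.lstsq(A, rhs, rcond=None)[0]    # train only the output layer
u = t @ c
print(f"max error: {np.abs(u - np.sin(np.pi * x).ravel()).max():.2e}")
```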

2025-10-30 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Differential equations have demonstrated intrinsic connections to network structures, linking discrete network layers through continuous equations. Most existing approaches focus on the interaction between ordinary differential equations (ODEs) and feature transformations, primarily working on input signals. In this paper, we study the partial differential equation (PDE) model of neural networks, viewing the neural network as a functional operating on a base model provided by the last layer of the classifier. Inspired by scale-space theory, we theoretically prove that this mapping can be formulated by a convection-diffusion equation, under interpretable and intuitive assumptions from both neural network and PDE perspectives. This theoretically certified framework covers various existing network structures and training techniques, offering a mathematical foundation and new insights into neural networks. Moreover, based on the convection-diffusion equation model, we design a new network structure that incorporates a diffusion mechanism into the network architecture from a PDE perspective. Extensive experiments confirm the effectiveness of the proposed model.
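
A minimal sketch of the kind of diffusion-augmented residual step this view suggests is given below; the layer shapes, the periodic 1-D Laplacian over feature channels, and the coefficient sigma are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def diffusion_residual_step(x, W, sigma=0.1):
    """One forward step of a residual block with an added diffusion term.

    x: (batch, d) features; W: (d, d) weights. The residual branch plays
    the role of convection; the periodic discrete Laplacian over feature
    channels adds diffusion, mirroring the convection-diffusion view.
    Illustrative sketch only.
    """
    convection = np.tanh(x @ W)                # standard residual branch
    # periodic 1-D discrete Laplacian along the feature dimension
    lap = np.roll(x, 1, axis=1) - 2 * x + np.roll(x, -1, axis=1)
    return x + convection + sigma * lap        # x_{k+1} = x_k + f(x_k) + sigma * Lap(x_k)
```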

2025-11-06 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we will propose an adaptive Hermite spectral method for the three-dimensional velocity space of the Boltzmann equation guided by a newly developed frequency indicator. For the homogeneous problem, the indicator is defined by the contribution of high-order coefficients in the spectral expansion. For the non-homogeneous problem, a Fourier-Hermite scheme is employed, with the corresponding frequency indicator formulated based on distributions across the entire spatial domain. The adaptive Hermite method includes scaling and $p$-adaptive techniques to dynamically adjust the scaling factor and expansion order according to the indicator. Numerical experiments cover both homogeneous and non-homogeneous problems in up to three spatial dimensions. Results demonstrate that the adaptive method substantially reduces $L^2$ errors at negligible computational cost, and the $p$-adaptive method achieves time savings of up to 74%.
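
To illustrate the flavor of such an indicator in the simplest setting, the toy below measures the fraction of spectral energy carried by the highest-order coefficients of a 1-D Hermite expansion; the talk's indicator is defined differently (and for the 3-D velocity space), so this is only schematic.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coeffs(f, N):
    """Coefficients of f in probabilists' Hermite polynomials He_n."""
    x, w = hermegauss(2 * N)                  # Gauss-Hermite quadrature
    vals = f(x)
    return np.array([
        np.sum(w * vals * hermeval(x, [0] * n + [1])) / (sqrt(2 * pi) * factorial(n))
        for n in range(N + 1)
    ])

def frequency_indicator(c, tail=4):
    """Energy fraction in the `tail` highest-order coefficients.

    A toy stand-in for the talk's indicator: a large value suggests the
    expansion order (or scaling factor) should be increased.
    """
    e = c**2
    return e[-tail:].sum() / e.sum()

c = hermite_coeffs(lambda x: np.exp(-x**2), N=16)  # hypothetical test function
print(f"tail energy fraction: {frequency_indicator(c):.2e}")
```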