2025 Fall

This semester's seminar is organized by Shibo Zeng and Yongle Xie, and co-organized by the graduate student union of the School of Mathematical Sciences at Fudan. The series is partially sponsored by the Shanghai Key Laboratory for Contemporary Applied Mathematics.

2025-12-25 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: We present an efficient numerical method for solving backward stochastic partial differential equations (BSPDEs), a class of problems that is notoriously challenging because spatial discretization incurs the curse of dimensionality. Our approach leverages a splitting technique to decompose the original BSPDE into simpler subproblems, effectively alleviating the curse of dimensionality. We derive the optimal first-order strong convergence rate for a general class of nonlinear BSPDEs; the rate holds for the solution $u$ itself, its spatial gradient $\nabla u$, and the process $q$. A robustness result is also established, and numerical experiments are provided to validate the theoretical results.
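For orientation, a schematic of the setting (standard notation, not necessarily the speaker's exact formulation): a semilinear BSPDE seeks a pair $(u, q)$ satisfying

$$
-\,du(t,x) = \big[\mathcal{L}u(t,x) + f\big(t,x,u,\nabla u,q\big)\big]\,dt - q(t,x)\,dW_t, \qquad u(T,x) = g(x),
$$

where $\mathcal{L}$ is a second-order elliptic operator and $W$ is a Brownian motion. A splitting scheme in this spirit would alternate, on each backward time step, a deterministic substep for $\mathcal{L}$ with a backward stochastic substep handling $f$ and $q$.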

Past Presentations

2025-09-18 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In multistage group testing, tests within the same stage are nonadaptive, while tests conducted across different stages are adaptive. In particular, when the pools within each stage are disjoint, i.e., the entire set is divided into several disjoint subgroups, the problem is called multistage group partition testing, denoted the $(n, d, s)$ problem, where $n$, $d$, and $s$ are the numbers of items, defectives, and stages, respectively. This paper presents exact solutions for the $(n,1,s)$ and $(n,d,2)$ problems for the first time. Furthermore, we develop a general dynamic programming framework for the $(n,d,s)$ problem, from which we derive sharp upper and lower bounds.
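To convey the dynamic-programming flavor, here is a toy recursion (my simplification, not the paper's actual framework): assume exactly one defective, and in every stage partition the surviving group into equal-sized pools and test all of them.

```python
from functools import lru_cache
from math import ceil

@lru_cache(maxsize=None)
def min_tests(n: int, s: int) -> int:
    """Worst-case number of tests to isolate 1 defective among n items
    in s partition stages, testing every pool in each stage (toy model)."""
    if n == 1:
        return 0
    if s == 1:
        return n  # final stage: test each item individually
    best = n      # fallback: spend the whole budget now
    for g in range(2, n + 1):  # number of disjoint pools this stage
        # the positive pool has at most ceil(n/g) items; recurse on it
        best = min(best, g + min_tests(ceil(n / g), s - 1))
    return best

print(min_tests(100, 2))  # 20: ten pools of ten, then ten individual tests
```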

2025-09-25 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Efficient scheduling of directed acyclic graphs (DAGs) in heterogeneous environments is challenging due to diverse resource capacities and intricate dependencies. In practice, these difficulties are compounded by the need for scalability across environments with varying resource pools, task types, and other settings, and for rapid schedule generation. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling with task-resource compatibility. WeCAN generates schedules rapidly through single-pass network inference. Leveraging a weighted cross-attention layer, it utilizes all available environment information while preserving scalability across diverse heterogeneous environments. Moreover, we introduce a criterion for analyzing the optimality gap inherent in list-scheduling-based methods, revealing the barriers that prevent these methods from consistently finding optimal solutions; the skip action introduced in our framework addresses this gap. Our approach delivers robust performance and scalability, outperforming state-of-the-art methods across diverse datasets.
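A rough sketch of what a compatibility-weighted cross-attention score might look like; the layer shape and the way compatibility enters the logits are my assumptions, not WeCAN's published design.

```python
import numpy as np

def weighted_cross_attention(tasks, resources, compat, seed=0):
    """tasks: (T, d) task embeddings; resources: (R, d) resource embeddings;
    compat: (T, R) nonnegative task-resource compatibility weights."""
    rng = np.random.default_rng(seed)
    d = tasks.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tasks @ Wq, resources @ Wk, resources @ Wv
    logits = q @ k.T / np.sqrt(d) + np.log(compat + 1e-12)  # bias by compatibility
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # each task attends over resources
    return w @ v                       # resource-aware task representations

out = weighted_cross_attention(np.ones((4, 8)), np.ones((3, 8)), np.ones((4, 3)))
print(out.shape)  # (4, 8)
```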

2025-10-09 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Carefully designed early-warning signals are widely used to predict critical transitions in complex systems, making it possible to steer a system away from a catastrophic state through timely interventions. Traditional signals, including the dynamical network biomarker (DNB), are based on statistical properties of nodal dynamics such as variance and autocorrelation; they overlook directional interactions and are thus limited in capturing underlying mechanisms while remaining robust to noise perturbations. This paper therefore introduces a framework of causal network markers (CNMs) that incorporates causality indicators reflecting the directional influence between variables. To detect and identify tipping points ahead of critical transitions, two markers are designed: CNM-GC for linear causality and CNM-TE for nonlinear causality, together with a functional representation of different causality indicators and a clustering technique to verify the system's dominant group. In demonstrations on benchmark models and real-world epileptic-seizure datasets, the CNM framework shows higher predictive power and accuracy than the traditional DNB indicator. Owing to their versatility and scalability, CNMs are well suited to comprehensively evaluating complex systems; a particularly promising application is the identification of tipping points in clinical disease.
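As a flavor of the linear-causality ingredient (CNM-GC), here is a minimal lag-1 pairwise Granger-causality score; the actual marker construction (functional representation, clustering, dominant-group detection) is more involved.

```python
import numpy as np

def granger_score(x, y, eps=1e-12):
    """How much does past x help predict y?  Compare residual variances of
    y_t ~ y_{t-1}  versus  y_t ~ y_{t-1} + x_{t-1}  (lag-1 least squares)."""
    Y = y[1:]
    A1 = np.column_stack([y[:-1], np.ones(len(Y))])          # restricted model
    A2 = np.column_stack([y[:-1], x[:-1], np.ones(len(Y))])  # full model
    r1 = Y - A1 @ np.linalg.lstsq(A1, Y, rcond=None)[0]
    r2 = Y - A2 @ np.linalg.lstsq(A2, Y, rcond=None)[0]
    return np.log((r1 @ r1 + eps) / (r2 @ r2 + eps))  # > 0: x Granger-causes y

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = 0.8 * np.roll(x, 1) + 0.2 * rng.standard_normal(500)  # y driven by past x
print(granger_score(x, y), granger_score(y, x))  # large vs. near zero
```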

2025-10-16 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Deep neural networks, as highly nonlinear complex systems, present formidable theoretical challenges. Phenomenon-driven research, grounded in meticulous observation and carefully designed experiments to discover intrinsic system patterns, offers a crucial gateway to understanding these complex systems. This talk presents our recent advances in deep learning generalization and optimization theory through a phenomenon-driven approach. One of the most counterintuitive phenomena in modern machine learning is that neural networks maintain excellent generalization despite overparameterization. Understanding implicit regularization mechanisms in overparameterized models has become essential to deep learning theory. Matrix factorization models, as an important subclass, provide an ideal testbed for studying implicit regularization. This talk first reviews the generalization puzzle, and introduces our discovery of a fundamental structural property of loss landscapes: the Embedding Principle, which reveals an elegant inheritance relationship between critical points across networks of different scales. Building on this, we analyze matrix factorization training dynamics from a model-data decoupling perspective, elucidating when, how, and why different implicit regularization effects (low rank, low nuclear norm) emerge, providing a unified understanding of this system. This talk also presents another phenomenon-driven study: the loss spike, a sudden and sharp surge in the loss function that subsequently subsides. These spikes are observed across a wide range of network architectures and datasets, yet their underlying mechanisms remain elusive. While previous studies attributed loss spikes to complex loss landscape geometry, we find they originate from Adam's adaptive preconditioning mechanism. Specifically, when gradients in certain layers gradually diminish during training, the adaptive mechanism persistently pushes the maximum eigenvalue of the preconditioned Hessian above the stability threshold, triggering sustained instability. This result provides a novel theoretical perspective for understanding and controlling loss spike behavior.
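Schematically (my gradient-descent analogy of the mechanism, not the talk's precise statement): Adam updates with a diagonal preconditioner,

$$
\theta_{t+1} = \theta_t - \eta\, P_t^{-1} \hat m_t, \qquad P_t = \operatorname{diag}\!\big(\sqrt{\hat v_t} + \epsilon\big),
$$

so local stability is governed by the eigenvalues of the preconditioned Hessian $P_t^{-1}H$ rather than of $H$ itself. When the gradients, and hence $\hat v_t$, in some layer decay, $P_t^{-1}$ grows, and $\lambda_{\max}(P_t^{-1}H)$ can be pushed past the stability threshold (of order $2/\eta$ in the quadratic model), triggering the spike.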

2025-10-23 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Randomized neural network (RaNN) methods have been proposed for solving various partial differential equations (PDEs), demonstrating high accuracy and efficiency. However, initializing the fixed parameters remains a challenging issue, and RaNNs often struggle to solve PDEs with sharp or discontinuous solutions. In this talk, we propose a novel approach called the Adaptive-Growth Randomized Neural Network (AG-RaNN) to address these challenges. We introduce growth strategies that expand the neural network, making it wider and deeper, to improve the accuracy of the numerical solution. A key feature of AG-RaNN is its adaptive strategy for determining the weights and biases of newly added neurons, enabling the network to expand in both width and depth without additional training: all weights and biases are generated constructively, significantly enhancing the network's approximation capability compared with conventional randomized neural network methods. In addition, a domain-splitting strategy is introduced to handle discontinuous solutions. A comprehensive theoretical analysis of RaNN methods is also presented, covering approximation, statistical, and optimization errors.
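For background, a plain RaNN solver (before any adaptive growth) is easy to sketch: fix random inner weights, then solve a linear least-squares problem for the output weights only. Below is a minimal 1D Poisson example with tanh features, as an illustration rather than the talk's method.

```python
import numpy as np

# Solve -u'' = f on (0,1), u(0)=u(1)=0, with f chosen so that u(x) = sin(pi x).
rng = np.random.default_rng(0)
M = 200                                     # number of random neurons
w, b = rng.uniform(-10, 10, M), rng.uniform(-10, 10, M)

def feats(x):      # phi_j(x) = tanh(w_j x + b_j), shape (len(x), M)
    return np.tanh(np.outer(x, w) + b)

def feats_xx(x):   # second x-derivative of the tanh features
    t = np.tanh(np.outer(x, w) + b)
    return (-2.0 * t * (1 - t**2)) * w**2

x = np.linspace(0, 1, 101)
f = np.pi**2 * np.sin(np.pi * x)
A = np.vstack([-feats_xx(x), feats([0.0, 1.0])])   # PDE rows + boundary rows
rhs = np.concatenate([f, [0.0, 0.0]])
c = np.linalg.lstsq(A, rhs, rcond=None)[0]         # train output weights only
print(np.abs(feats(x) @ c - np.sin(np.pi * x)).max())  # max error on the grid
```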

2025-10-30 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Differential equations have demonstrated intrinsic connections to network structures, linking discrete network layers through continuous equations. Most existing approaches focus on the interaction between ordinary differential equations (ODEs) and feature transformations, primarily acting on input signals. In this paper, we study a partial differential equation (PDE) model of neural networks, viewing the neural network as a functional operating on a base model provided by the last layer of the classifier. Inspired by scale-space theory, we prove that this mapping can be formulated as a convection-diffusion equation, under interpretable and intuitive assumptions from both the neural network and PDE perspectives. This theoretically certified framework covers various existing network structures and training techniques, offering a mathematical foundation and new insights into neural networks. Moreover, based on the convection-diffusion model, we design a new network structure that incorporates a diffusion mechanism into the architecture from a PDE perspective. Extensive experiments confirm the effectiveness of the proposed model.
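In symbols, the claim is that the evolution of the learned function $u(x,t)$ can be modeled by a convection-diffusion equation of the generic form (my rendering of the standard equation; the paper's coefficients and assumptions are more specific):

$$
\frac{\partial u}{\partial t} + v(x,t)\cdot\nabla u = \sigma\,\Delta u,
$$

where the convection field $v$ captures the feature transformations and the diffusion term $\sigma\,\Delta u$ smooths the decision function, which is the mechanism the proposed architecture injects explicitly.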

2025-11-06 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we propose an adaptive Hermite spectral method for the three-dimensional velocity space of the Boltzmann equation, guided by a newly developed frequency indicator. For the homogeneous problem, the indicator is defined by the contribution of high-order coefficients in the spectral expansion. For the non-homogeneous problem, a Fourier-Hermite scheme is employed, with the corresponding frequency indicator formulated from distributions across the entire spatial domain. The adaptive Hermite method includes scaling and $p$-adaptive techniques to dynamically adjust the scaling factor and expansion order according to the indicator. Numerical experiments cover both homogeneous and non-homogeneous problems in up to three spatial dimensions. The results demonstrate that the adaptive method substantially reduces $L^2$ errors at negligible computational cost, and the $p$-adaptive method achieves time savings of up to 74%.
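A natural prototype for such a frequency indicator (my guess at the generic form; the talk's definition is tailored to the Boltzmann setting) is the relative energy carried by the highest-order Hermite coefficients: for an expansion $f = \sum_{|\alpha| \le N} f_\alpha H_\alpha$,

$$
\eta_N(f) = \left( \frac{\sum_{N-M < |\alpha| \le N} |f_\alpha|^2}{\sum_{|\alpha| \le N} |f_\alpha|^2} \right)^{1/2},
$$

so that a large $\eta_N$ triggers an increase in the expansion order $N$ or an adjustment of the scaling factor, while a small one allows coarsening.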

2025-11-13 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: We present new theoretical results for the BFGS method with an adaptive step size [Gao and Goldfarb, Optimization Methods and Software, 34(1):194-217, 2019], showing explicit two-phase global convergence: a linear phase at rate $\mathcal{O}((1 - 1/\varkappa)^{k})$ and a superlinear phase at $\mathcal{O}((\varkappa/k)^{k})$, where $k$ is the iteration counter and $\varkappa$ is the condition number. In contrast, classical analyses establish only asymptotic convergence, and recent non-asymptotic results mainly address local convergence under the unit step size or global guarantees with line search. We further propose a smoothness-aided variant that takes a larger adaptive step by leveraging the gradient's Lipschitz continuity, thereby accelerating early convergence. These results provide the first explicit non-asymptotic global characterization of BFGS without line search.

2025-11-20 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Reconstructing the evolutionary relationships among species, i.e., phylogenetic inference, is one of the central problems in computational biology. Given a phylogenetic prior and an evolutionary substitution likelihood model, the problem is formulated as Bayesian inference of the posterior distribution over phylogenetic trees. Previous approaches often leverage Monte Carlo methods, e.g., MCMC, which can suffer from slow convergence and local-mode trapping in practice. In this talk, we discuss how to integrate variational inference with deep learning as a powerful solution to Bayesian phylogenetic inference. Specifically, we develop an autoregressive probabilistic model called ARTree, together with an accelerated version, for modeling tree topologies, and a semi-implicit hierarchical construction for the branch lengths. We also introduce representation learning for phylogenetic trees to provide high-resolution representations that are ready to use for downstream tasks. These deep learning approaches to Bayesian phylogenetic inference achieve state-of-the-art inference accuracy and inspire broader follow-up innovations.
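The autoregressive idea can be conveyed in a few lines: grow the topology leaf by leaf, each time choosing an existing edge to subdivide, with the choice probabilities supplied by a neural network. In the sketch below the network is replaced by uniform sampling as a stand-in.

```python
import random

def sample_topology(taxa, rng=random.Random(0)):
    """Grow an unrooted binary tree over `taxa` autoregressively: attach each
    new leaf to an existing edge. In ARTree-style models the edge probabilities
    come from a learned network; here they are uniform placeholders."""
    a, b, c = taxa[0], taxa[1], taxa[2]
    edges = [(a, "r"), (b, "r"), (c, "r")]           # star tree on first 3 taxa
    for i, leaf in enumerate(taxa[3:]):
        u, v = edges.pop(rng.randrange(len(edges)))  # edge to subdivide
        mid = f"internal{i}"
        edges += [(u, mid), (v, mid), (leaf, mid)]   # re-wire through new node
    return edges  # 2n - 3 edges for n taxa

print(sample_topology(list("ABCDE")))
```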

2025-12-04 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: This talk introduces a machine learning method that combines tensor neural networks (TNNs) with homogenization theory for solving elliptic multiscale equations. The core advantage of a TNN lies in its tensor structure, which reduces the computation of high-dimensional integrals of neural network functions to one-dimensional integrals. This enables the design of highly accurate high-dimensional integration methods whose computational complexity scales only polynomially with the dimension. Leveraging this feature, we design a high-precision solver for multiscale problems. Specifically, the original problem is first transformed via homogenization into a series of cell problems and a homogenized equation, which are then solved separately using TNN-based methods. Unlike conventional machine learning methods that rely on Monte Carlo sampling, our approach employs deterministic numerical integration and thus achieves high computational accuracy. In particular, when the multiscale coefficients depend on both fast and slow variables, the corresponding cell problems are defined on high-dimensional domains; the TNN-based approach enables efficient and accurate computation in such cases compared with traditional methods, thereby extending the applicability of homogenization techniques. We also generalize the approach to elliptic multiscale eigenvalue problems.
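The tensor structure behind this reduction is simple to state: in its basic form, a TNN represents a $d$-variate function as a sum of rank-one separable terms,

$$
u(x_1,\dots,x_d) = \sum_{j=1}^{r} \prod_{i=1}^{d} \phi_{i,j}(x_i), \qquad \int_{[0,1]^d} u \,dx = \sum_{j=1}^{r} \prod_{i=1}^{d} \int_0^1 \phi_{i,j}(x_i)\,dx_i,
$$

where each $\phi_{i,j}$ is a one-dimensional subnetwork, so a $d$-dimensional integral becomes $rd$ one-dimensional quadratures: cost polynomial in $d$ rather than exponential.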

2025-12-11 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this lecture, we empirically investigate the distinct long-term behaviors of prenorm and postnorm attention-based graph neural networks. We observe that prenorm models, while free of oversmoothing, are prone to the curse of depth, whereas postnorm models exhibit the opposite behavior. To mitigate oversmoothing, we propose a simple and efficient approach that applies a Laplacian-energy multiplication prior to each diffusion step. Both theoretical analysis and empirical results demonstrate that our method effectively alleviates oversmoothing.
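Here oversmoothing is conveniently measured by the Laplacian (Dirichlet) energy of the node features $X$,

$$
E(X) = \operatorname{tr}\!\big(X^{\top} L X\big) = \tfrac{1}{2}\sum_{i,j} w_{ij}\,\lVert x_i - x_j \rVert^2,
$$

which graph diffusion monotonically shrinks toward zero; as I read the abstract, multiplying up this energy (equivalently, re-amplifying the high-frequency component of $X$) before each diffusion step is what keeps $E(X)$ bounded away from zero.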

2025-12-18 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The primary objective of seismic-petrophysical inversion is to predict reservoir rock and fluid properties from observed data, most notably elastic parameters, a task fundamental to exploration geophysics. This prediction task reformulates the inverse problem as a conditional (posterior) probability model conditioned on the observed data. Within the Bayesian framework, physical constraints inform the construction of the likelihood function, while data constraints guide the development of the prior model. A central challenge lies in building high-fidelity mathematical models grounded in both physics and data; an additional challenge is the algorithmic innovation needed to solve these complex models efficiently. For instance, although common probabilistic inversion algorithms offer global convergence, their computational cost is often orders of magnitude higher (typically thousands of times) than that of lower-precision, locally convergent seismic inversion methods. This substantial computational burden has severely limited the industrial deployment of high-precision seismic inversion. In this talk, I will present our latest research on both modeling strategies and algorithmic developments for seismic-petrophysical probabilistic inversion, and demonstrate the effectiveness of these approaches in real industrial applications. Moreover, the rapid rise of deep learning has, to some extent, reshaped the research paradigm of traditional seismology; accordingly, neural network models and deep learning-based inversion algorithms will also be featured in this talk.
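The underlying structure is the standard Bayesian inverse problem: with petrophysical parameters $m$, a forward operator $F$ encoding the physics, and observed data $d = F(m) + \varepsilon$,

$$
p(m \mid d) \;\propto\; p(d \mid m)\, p(m),
$$

where the likelihood $p(d \mid m)$ carries the physical (rock-physics and seismic) constraints and the prior $p(m)$ carries the data-driven geological constraints; the algorithmic question is how to explore this posterior at a cost closer to that of local deterministic inversion.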