2025 Spring

The seminar this semester is organized by Qiang Wu and Ming Li, and co-organized by the graduate student union of the School of Mathematical Sciences at Fudan. The seminar is partially sponsored by the Shanghai Key Laboratory for Contemporary Applied Mathematics.

2025-06-05 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In symmetric block eigenvalue algorithms, such as the subspace iteration algorithm and the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm, a large block size is often employed to achieve robustness and rapid convergence. However, a large block size also increases the computational cost. Traditionally, the block size is reduced only after some eigenpairs have converged, a practice known as deflation. In this work, we propose a more aggressive, non-deflation-based technique in which the block size is adjusted dynamically during the algorithm. This technique can be applied to a wide range of block eigensolvers, reducing computational cost without compromising convergence speed. We present three adaptive strategies for adjusting the block size and apply them to four well-known eigensolvers as examples. Detailed theoretical analysis and numerical experiments illustrate the efficiency of the proposed technique. In practice, an overall acceleration of 20% to 30% is observed.
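
As a concrete illustration of the general idea (not the strategies proposed in the talk), here is a minimal NumPy sketch of subspace iteration in which a hypothetical rule shrinks the extra "guard" block as the wanted residuals decay:

```python
import numpy as np

def subspace_iteration_adaptive(A, k, b0, tol=1e-8, maxit=500, seed=0):
    """Toy subspace iteration for the k dominant eigenpairs of a symmetric
    positive definite A, starting with block size b0 >= k. The adaptive
    shrinking rule below is hypothetical and for illustration only."""
    n = A.shape[0]
    b = b0                                            # current block size
    X = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, b)))[0]
    res0 = None
    for it in range(maxit):
        X, _ = np.linalg.qr(A @ X)                    # power step + re-orthonormalization
        w, V = np.linalg.eigh(X.T @ (A @ X))          # Rayleigh-Ritz, ascending order
        X, w = X @ V[:, ::-1], w[::-1]                # Ritz vectors, descending values
        res = np.linalg.norm(A @ X[:, :k] - X[:, :k] * w[:k], axis=0)
        if res.max() < tol:
            return w[:k], X[:, :k]
        res0 = res.max() if res0 is None else res0
        # adaptive step: trim guard vectors as the wanted residuals decay,
        # so later (cheaper) iterations use a smaller block
        b_new = k + int(np.ceil((b0 - k) * min(1.0, res.max() / res0)))
        if b_new < b:
            X, b = X[:, :b_new], b_new
    return w[:k], X[:, :k]
```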

2025-06-12 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: This article addresses the challenge of parameter calibration in stochastic models whose likelihood function is not analytically available. We propose a gradient-based simulated parameter estimation framework, leveraging a multi-time-scale algorithm that tackles the ratio bias arising in both maximum likelihood estimation and posterior density estimation. A nested simulation optimization structure is introduced, accompanied by comprehensive theoretical analyses, including strong convergence, asymptotic normality, convergence rates, and budget allocation strategies. These theoretical results provide crucial insights for algorithm design and hyperparameter selection. The framework is further extended to neural network training, offering a novel perspective on stochastic approximation in machine learning. Numerical experiments show that our algorithm improves estimation accuracy and saves computational cost, making it effective for parameter estimation in stochastic systems.
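
To see where ratio bias comes from and how multiple time scales help: a gradient of the form $\mathbb{E}[Y_\theta]/\mathbb{E}[Z_\theta]$ is biased if estimated by the per-sample ratio $Y/Z$. A two-time-scale recursion tracks the numerator and denominator on a fast scale and moves the parameter on a slow scale. The following toy Python sketch (the simulator and step sizes are illustrative assumptions, not the authors' framework) ascends $\log \mathbb{E}[f(X_\theta)]$ for $X_\theta \sim N(\theta, 1)$ using the score-function gradient, whose exact form is a ratio:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta):
    """Hypothetical simulator: one sample of the numerator/denominator
    pair whose means give grad log E[f(X)] = E[f(X)(X-theta)] / E[f(X)]."""
    x = rng.normal(theta, 1.0)
    f = np.exp(-0.5 * (x - 1.5) ** 2)   # toy payoff peaked at 1.5
    return (x - theta) * f, f           # score-weighted numerator, denominator

theta, y_bar, z_bar = 0.0, 0.0, 1.0
for n in range(1, 20001):
    a_n = 1.0 / n ** 0.6                # fast step size (ratio trackers)
    b_n = 1.0 / n                       # slow step size (parameter)
    y, z = simulate(theta)
    y_bar += a_n * (y - y_bar)          # fast scale: track E[Y]
    z_bar += a_n * (z - z_bar)          # fast scale: track E[Z]
    theta += b_n * y_bar / z_bar        # slow scale: ascend E[Y]/E[Z]
print(theta)                            # approaches the maximizer 1.5
```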

Past Presentations

2025-02-20 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, I present an asymptotic stability result concerning the self-similar Blasius profiles $[\bar{u}, \bar{v}]$ of the stationary Prandtl boundary layer equation. Serrin (1967, Proc. R. Soc. Lond.) first showed that the profiles $[\bar{u}, \bar{v}]$ act as a self-similar attractor of solutions $[u, v]$ to the Prandtl equation, using the von Mises transform and maximum principle techniques; specifically, $\|u - \bar{u}\|_{L^{\infty}_{y}} \to 0$ as $x \to \infty$. Iyer (2020, ARMA) employed refined energy methods to derive an explicit convergence rate for initial data close to Blasius. Wang and Zhang (2023, Math. Ann.) utilized barrier function methods, removing the smallness assumption but imposing stronger asymptotic conditions on the initial data. It was suggested that the optimal convergence rate should be $\|u-\bar{u}\|_{L^{\infty}_{y}}\lesssim (x+1)^{-\frac{1}{2}}$, treating the stationary Prandtl equation as a 1-D parabolic equation on the whole space. In our work, we establish that $\|u - \bar{u}\|_{L^{\infty}_{y}} \lesssim (x+1)^{-1}$. Our proof relies on discovering nearly conserved low-frequency quantities and inherent degenerate structures at the boundary, which enhance the convergence rate through iteration techniques. Notably, the rate we obtain is optimal: there exist special solutions of the Prandtl equation whose convergence to the Blasius profile is exactly of order $(x+1)^{-1}$. This is joint work with Prof. Hao Jia and Prof. Zhen Lei.
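
For orientation, a standard form of the stationary Prandtl system and the Blasius self-similar profiles (as commonly stated in the literature; normalizations may differ from the talk) is

$$ u\,\partial_x u + v\,\partial_y u = \partial_y^2 u, \qquad \partial_x u + \partial_y v = 0, \qquad u|_{y=0}=v|_{y=0}=0, \qquad u \to 1 \ \text{as } y\to\infty, $$

with

$$ \bar{u}(x,y) = f'(\eta), \qquad \bar{v}(x,y) = \frac{\eta f'(\eta) - f(\eta)}{2\sqrt{x+1}}, \qquad \eta = \frac{y}{\sqrt{x+1}}, $$

where $f$ solves the Blasius ODE $f''' + \tfrac{1}{2} f f'' = 0$ with $f(0)=f'(0)=0$ and $f'(\infty)=1$.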

2025-02-27 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Solving the time-independent Schrödinger equation gives us full access to the chemical properties of molecules. Among ab-initio methods, full configuration interaction (FCI) provides the numerically exact solution under a predefined basis set. However, the FCI problem scales exponentially in the number of basis functions and electrons, and thus suffers from the curse of dimensionality. We develop a multi-threaded parallel coordinate descent full configuration interaction algorithm (CDFCI) for electronic structure ground-state calculations in the configuration interaction framework. The algorithm solves an unconstrained nonconvex optimization problem via a modified block coordinate descent method with a deterministic compression strategy. CDFCI captures and updates appreciable determinants with different frequencies proportional to their importance. We demonstrate the efficiency of the algorithm on practical systems.
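
The core mechanism can be illustrated on a small dense matrix: coordinate descent on a nonconvex surrogate of the ground-state problem, updating one "important" coordinate at a time. This is a generic sketch only; the actual CDFCI works in the exponentially large determinant space, maintains sparse vectors, and compresses updates:

```python
import numpy as np
from scipy.linalg import eigh

def cd_ground_state(A, iters=2000, seed=0):
    """Toy coordinate-descent ground-state solver for symmetric A.
    Each step exactly minimizes the Rayleigh quotient over span{c, e_i},
    choosing the coordinate i with the largest residual entry."""
    n = A.shape[0]
    c = np.random.default_rng(seed).standard_normal(n)
    Ac = A @ c                               # maintained incrementally below
    for _ in range(iters):
        lam = (c @ Ac) / (c @ c)             # current Rayleigh quotient
        i = int(np.argmax(np.abs(Ac - lam * c)))   # most important coordinate
        # exact minimization over span{c, e_i}: 2x2 generalized eigenproblem
        H = np.array([[c @ Ac, Ac[i]], [Ac[i], A[i, i]]])
        S = np.array([[c @ c, c[i]], [c[i], 1.0]])
        _, V = eigh(H, S)
        alpha, beta = V[:, 0]                # lowest-eigenvalue combination
        c = alpha * c
        c[i] += beta
        Ac = alpha * Ac + beta * A[:, i]     # cheap update of A @ c
    return (c @ (A @ c)) / (c @ c), c / np.linalg.norm(c)
```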

2025-03-06 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, a numerical method for a class of nonlocal PDEs with long time delay is designed. The system involves an unknown defined on $\Omega\times\mathbb{R}\times\mathbb{R}^{+}$, so that for $\Omega\subset\mathbb{R}^{d}$ a $(d+2)$-dimensional problem must be solved numerically, which is challenging, especially for $d=2$ or $d=3$. We propose an effective numerical method: BDF schemes and the Fourier spectral method are applied for time and space discretization, respectively, and the long-time-delay term is treated by a Laguerre spectral method. The unique solvability of the numerical schemes is proved, and a long-time energy upper bound for the numerical solution is obtained via energy estimates. By applying the generalized Laguerre orthogonal projection, we obtain an error estimate for the full discretization up to a finite final time. We present numerical experiments to verify the energy bound and the convergence order, along with examples showing how the solutions evolve and approach the global attractor.
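
The BDF-in-time plus Fourier-spectral-in-space pattern can be shown on a toy 1-D semilinear problem (the talk's nonlocal and long-time-delay terms, and the Laguerre treatment, are omitted in this sketch):

```python
import numpy as np

# BDF2 in time + Fourier spectral in space for u_t = nu*u_xx + f(u) on a
# periodic interval: diffusion treated implicitly (diagonal in Fourier),
# nonlinearity extrapolated explicitly.
N, L, nu, dt = 128, 2 * np.pi, 0.1, 1e-2
x = L * np.arange(N) / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)       # spectral wavenumbers
f = lambda u: u - u ** 3                          # toy nonlinearity
u0 = np.sin(x)
# startup with one backward-Euler (BDF1) step
u1 = np.real(np.fft.ifft(np.fft.fft(u0 + dt * f(u0)) / (1 + dt * nu * k ** 2)))
for _ in range(1000):
    # (3u^{n+1} - 4u^n + u^{n-1})/(2dt) = nu*u^{n+1}_xx + 2f(u^n) - f(u^{n-1})
    rhs = np.fft.fft((4 * u1 - u0) / 3 + (2 * dt / 3) * (2 * f(u1) - f(u0)))
    u2 = np.real(np.fft.ifft(rhs / (1 + (2 * dt / 3) * nu * k ** 2)))
    u0, u1 = u1, u2
```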

2025-03-13 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Interactive Theorem Provers (ITPs), often referred to as formal languages, offer a reliable way to eliminate errors in mathematical reasoning. Meanwhile, Large Language Models (LLMs) have shown great potential to accelerate, and even automate, the formalization process. In this talk, we will explore how LLMs are applied in key areas such as premise selection, tactic suggestion, auto-formalization, and automated theorem proving. Additionally, we will discuss how training datasets for these tasks are constructed, highlighting the impact of structural information on improving LLMs' performance in Lean-related tasks, particularly in LeanSearch and our statement formalizer.
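
As a toy example of what a statement formalizer produces (Lean 4 with Mathlib; the theorem name and proof are illustrative, and a formalizer would typically emit only the statement, leaving the proof to a prover):

```lean
import Mathlib

-- Informal input: "The sum of two even natural numbers is even."
-- A statement formalizer maps it to a formal theorem header like this:
theorem even_add_even (m n : ℕ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  exact hm.add hn  -- discharged here via Mathlib's Even.add
```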

2025-03-20 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we consider the computation of the matrix pair describing the first passage time of a Markov additive process. This pair of matrices is characterized as a solution to an integral matrix equation, for which we develop an iterative method. Each step requires computing the extremal solution of a mixed linear-quadratic matrix equation, which is accomplished by a quadratically convergent algorithm. When all jumps follow phase-type distributions, the integral matrix equation can be transformed into a single mixed linear-quadratic matrix equation, so the pair of matrices can be computed with quadratic convergence.
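
To illustrate the kind of inner solver involved, here is a Newton iteration for the plain quadratic matrix equation $A X^2 + B X + C = 0$ (a hedged stand-in for the mixed linear-quadratic equations in the talk); Newton's method for such equations is quadratically convergent near a solution:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def newton_quadratic(A, B, C, X0=None, tol=1e-12, maxit=50):
    """Newton's method for Q(X) = A X^2 + B X + C = 0.
    The correction H solves Q'(X)[H] = A(XH + HX) + BH = -Q(X); left-
    multiplying by A^{-1} gives the Sylvester equation
        (X + A^{-1}B) H + H X = -A^{-1} Q(X).
    Assumes A invertible and the Sylvester operators nonsingular."""
    n = A.shape[0]
    X = np.zeros((n, n)) if X0 is None else X0.copy()
    Ainv = np.linalg.inv(A)
    for _ in range(maxit):
        Q = A @ X @ X + B @ X + C
        if np.linalg.norm(Q) < tol:
            break
        H = solve_sylvester(X + Ainv @ B, X, -Ainv @ Q)  # Newton correction
        X = X + H
    return X
```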

2025-03-27 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Accurately identifying and predicting dynamics from observational data with noise perturbations or missing entries is a significant challenge in dynamical systems. In this talk, I will introduce the Hamiltonian Neural Koopman Operator (HNKO), a novel approach that combines principles from Hamiltonian mechanics with learning of the Koopman operator. The framework not only sustains but also automatically discovers conservation laws, leveraging prior knowledge from mathematical physics. The effectiveness of the HNKO and its extensions is demonstrated on various representative physical systems, including those with hundreds or thousands of degrees of freedom. The findings indicate that incorporating prior knowledge of the underlying system and relevant mathematical theories into the learning framework significantly enhances the ability of machine learning to address complex physical problems.
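
For background, plain extended DMD (EDMD) fits a finite-dimensional Koopman matrix by least squares; HNKO additionally constrains the learned operator (e.g., to be orthogonal) so that conserved quantities are preserved. That constraint is omitted in this generic sketch:

```python
import numpy as np

def edmd(X, Y, dictionary):
    """Fit K so that psi(y) ~ K psi(x) for snapshot pairs (x, y),
    where psi = dictionary lifts the state to feature space."""
    PX = np.stack([dictionary(x) for x in X])     # features at time t
    PY = np.stack([dictionary(y) for y in Y])     # features at time t + dt
    W, *_ = np.linalg.lstsq(PX, PY, rcond=None)   # least-squares Koopman fit
    return W.T

# toy usage: harmonic-oscillator flow map (rotation), dt = 0.1
dt = 0.1
R = np.array([[np.cos(dt), np.sin(dt)], [-np.sin(dt), np.cos(dt)]])
X = np.random.default_rng(0).standard_normal((500, 2))
Y = X @ R.T
# dictionary includes the energy x^2 + p^2, which the rotation conserves,
# so the fitted K maps that coordinate to itself
K = edmd(X, Y, lambda s: np.array([s[0], s[1], s[0] ** 2 + s[1] ** 2]))
```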

2025-04-03 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we present a neural network approach to the dynamic unbalanced optimal transport problem on surfaces represented by point clouds. For such surfaces, traditional methods are difficult to apply because mesh generation is hard, whereas neural networks are easy to implement even for complicated geometries. Moreover, instead of solving the original dynamic formulation, we consider the Hamiltonian flow approach, i.e., the Karush-Kuhn-Tucker (KKT) system. This approach lets us exploit the mathematical structure of optimal transport to construct the neural network, and the loss function is thereby simplified. Extensive numerical experiments are conducted on surfaces with different geometries. We also test the method on noisy point clouds, demonstrating its stability. The method also generalizes readily to a diverse range of problems.

2025-04-10 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The nonlinear Schrödinger equation (NLSE) arises in various applications in quantum physics and chemistry, nonlinear optics, plasma physics, Bose-Einstein condensates, etc. In these applications, it is necessary to incorporate low-regularity or singular potentials and nonlinearities into the NLSE. Typical examples include the discontinuous square-well potential, the singular Coulomb potential, non-integer power nonlinearities, and the logarithmic nonlinearity. Such low regularity and singularity pose significant challenges for the analysis of standard numerical methods and for the development of novel accurate, efficient, and structure-preserving schemes. In this talk, I will introduce several new analysis techniques to establish optimal error bounds for some widely used numerical methods under optimally weak regularity assumptions. Based on the analysis, we also propose novel temporal and spatial discretizations to handle the low regularity and singularity more effectively.
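
One of the widely used methods in this context is time-splitting with a Fourier spectral discretization. A minimal sketch for the 1-D NLSE $i\,\partial_t u = -\partial_x^2 u + V(x)u + \beta |u|^2 u$ with a smooth example potential (the talk's interest is precisely what happens when $V$ or the nonlinearity is singular or of low regularity):

```python
import numpy as np

N, L, dt, beta = 256, 16 * np.pi, 1e-3, 1.0
x = L * (np.arange(N) / N - 0.5)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
V = 0.5 * x ** 2                                   # smooth example potential
u = np.exp(-x ** 2) * (1 + 0j)

def strang_step(u):
    """One Strang splitting step: half nonlinear/potential flow (exact
    phase rotation, since |u| is conserved there), full free flow in
    Fourier space, then another half nonlinear/potential flow."""
    u = u * np.exp(-0.5j * dt * (V + beta * np.abs(u) ** 2))
    u = np.fft.ifft(np.exp(-1j * dt * k ** 2) * np.fft.fft(u))
    u = u * np.exp(-0.5j * dt * (V + beta * np.abs(u) ** 2))
    return u

for _ in range(1000):
    u = strang_step(u)
```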

2025-04-17 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Policy optimization refers to a family of effective algorithms that search the policy space, based on a policy parameterization, to solve reinforcement learning problems. Inspired by the similar update patterns of the softmax natural policy gradient and the Hadamard policy gradient, we study a general policy update rule called the $\phi$-update, where $\phi$ is a scaling function applied to advantage functions. Under very mild conditions on $\phi$, we first establish the global asymptotic convergence of the state values under the $\phi$-update. We then show that the policy produced by the $\phi$-update itself converges, even when there are multiple optimal policies; this is in stark contrast to existing results, where explicit regularization is required to guarantee policy convergence. The exact asymptotic convergence rate of the state values is further established based on the policy convergence, and we lastly establish the global linear convergence of the $\phi$-update.
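
A tabular sketch of a multiplicative update of this flavor, $\pi_{t+1}(a|s) \propto \pi_t(a|s)\,\phi(\eta A^{\pi_t}(s,a))$, where $\phi = \exp$ recovers the softmax natural policy gradient (the MDP, step size, and exact form of the rule are illustrative assumptions, not the talk's formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A_n, gamma, eta = 4, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A_n))     # P[s, a] = next-state dist
r = rng.random((S, A_n))

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma P_pi) v = r_pi."""
    P_pi = np.einsum('sa,sat->st', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, r)
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return v, r + gamma * P @ v                   # state and Q values

phi = np.exp                                      # any positive increasing phi
pi = np.full((S, A_n), 1 / A_n)
for _ in range(200):
    v, q = evaluate(pi)
    pi = pi * phi(eta * (q - v[:, None]))         # scale by phi(eta * advantage)
    pi /= pi.sum(axis=1, keepdims=True)           # renormalize per state
print(v)                                          # approaches optimal values
```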

2025-04-24 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: This work considers the simple bilevel optimization problem of minimizing a composite convex function over the optimal solution set of another composite convex minimization problem. By reformulating the bilevel problem as finding the left-most root of a nonlinear equation and introducing a novel dual approach for the subproblems, we efficiently obtain an $(\epsilon, \epsilon)$-optimal solution. The proposed methods achieve near-optimal complexity of $\tilde{\mathcal{O}}(1/\sqrt{\epsilon})$ for both the upper- and lower-level objectives under mild assumptions, matching (up to logarithmic terms) the optimal complexity bounds of first-order methods for unconstrained smooth or composite convex optimization.

2025-05-08 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: For any weakly interacting particle system with bounded kernel, we give uniform-in-time estimates of the $L^2$ norm of correlation functions, provided that the diffusion coefficient is sufficiently large. Under a more restrictive condition on the kernels, we can remove the dependence of the lower bound for the diffusion coefficient on the initial data and estimate the size of chaos in a weaker sense. Based on these estimates, we can study fluctuations around the mean-field limit.
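
For orientation, a standard setting for such systems (the talk's precise assumptions may differ) is

$$ dX^{i}_{t} \;=\; \frac{1}{N}\sum_{j=1}^{N} K\!\left(X^{i}_{t}-X^{j}_{t}\right) dt \;+\; \sqrt{2\sigma}\, dB^{i}_{t}, \qquad i=1,\dots,N, $$

with bounded kernel $K$ and independent Brownian motions $B^{i}$; as $N \to \infty$ each particle's law approaches the mean-field (McKean-Vlasov) limit, and the correlation functions quantify the residual dependence between particles at finite $N$.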

2025-05-15 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The nonuniform discrete Fourier transform (NUDFT) and its inverse are widely used across scientific computing. In this talk, we introduce a novel fast direct inversion method for the type 3 NUDFT. The proposed method approximates the type 3 NUDFT matrix as the product of a type 2 NUDFT matrix and an HSS matrix, where the type 2 NUDFT matrix is further decomposed into the product of an HSS matrix and a uniform DFT matrix. Based on this decomposition, both forward application and backward inversion can be accomplished in quasi-linear complexity. The fast backward inversion can serve as a fast direct solver or as an efficient preconditioner. Additionally, we provide an error bound for the approximation under specific sample distributions. Numerical results verify the relevant theoretical properties and demonstrate the efficiency of the proposed methods.
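
For reference, the type 3 NUDFT is $f_k = \sum_j c_j e^{i \omega_k x_j}$ with both the points $x_j$ and the frequencies $\omega_k$ nonuniform. The dense construction below is the $\mathcal{O}(NM)$ (and $\mathcal{O}(N^3)$ for inversion) baseline that the talk's HSS-based decomposition accelerates to quasi-linear complexity:

```python
import numpy as np

rng = np.random.default_rng(0)
N = M = 400
x = np.sort(rng.uniform(-np.pi, np.pi, N))     # nonuniform sample points
w = np.sort(rng.uniform(-N / 2, N / 2, M))     # nonuniform frequencies
A = np.exp(1j * np.outer(w, x))                # type 3 NUDFT matrix, M x N
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
f = A @ c                                      # forward application, O(N*M)
c_rec = np.linalg.lstsq(A, f, rcond=None)[0]   # dense inversion baseline
print(np.linalg.norm(c_rec - c) / np.linalg.norm(c))
```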

2025-05-22 15:00:00 - 16:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Neural networks have become powerful tools for solving partial differential equations (PDEs), with wide-ranging applications in engineering, physics, and biology. In this talk, we explore the performance of deep neural networks in solving PDEs, focusing on two primary sources of error: approximation error and generalization error. The approximation error captures the gap between the exact PDE solution and the neural network's hypothesis space; the generalization error arises from learning with finite samples. We begin by analyzing the approximation capabilities of deep neural networks, particularly under Sobolev norms, and discuss strategies to overcome the curse of dimensionality. We then present generalization error bounds, offering insight into when and why deep networks can outperform shallow ones in solving PDEs.
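
The two error sources fit into the standard risk decomposition (stated here in a generic form; the talk's norms and losses may differ):

$$ \mathcal{R}(u_{\hat\theta}) - \mathcal{R}(u^{*}) \;\le\; \underbrace{\inf_{\theta}\bigl(\mathcal{R}(u_{\theta}) - \mathcal{R}(u^{*})\bigr)}_{\text{approximation error}} \;+\; \underbrace{2\,\sup_{\theta}\bigl|\mathcal{R}(u_{\theta}) - \mathcal{R}_{n}(u_{\theta})\bigr|}_{\text{generalization error}}, $$

where $\mathcal{R}$ is the population risk (e.g., a PDE residual energy), $\mathcal{R}_{n}$ its $n$-sample empirical counterpart, $u_{\hat\theta}$ the empirical risk minimizer, and $u^{*}$ the exact solution.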

2025-05-22 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The high-dimensional Schrödinger eigenvalue problem plays a crucial role in fields such as computational chemistry, condensed matter physics, and quantum computing. Although classical numerical methods have achieved great success for low-dimensional PDEs and eigenvalue problems, a major challenge persists: the curse of dimensionality. Recently, significant progress has been made in applying deep neural networks to PDEs and Schrödinger eigenvalue problems. In this talk, we introduce a machine learning method for computing eigenvalues and eigenfunctions of the Schrödinger operator with Dirichlet boundary conditions, targeting eigenvalues deep in the spectrum. A cut-off function technique is employed to construct trial functions that satisfy the Dirichlet boundary conditions exactly, which outperforms the standard boundary penalty method, as demonstrated by numerical tests. Under the assumption that the eigenfunctions belong to a spectral Barron space, we derive a dimension-free convergence rate for the generalization error bound of the method, with all constants in the error bounds growing at most polynomially. This assumption is verified by proving a new regularity result for the eigenfunctions when the potential lies in an appropriate spectral Barron space. Moreover, we prove a sharp accumulation rate of the generalization error and extend the generalization bound to the normalized penalty method, which is widely used in practice.
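
A minimal PyTorch sketch of the cut-off trick on the unit box $\Omega = (0,1)^d$: the trial function $u(x) = g(x)\,N_\theta(x)$ with $g(x) = \prod_i x_i(1-x_i)$ vanishes on the boundary exactly, so the Dirichlet condition holds by construction. The network size, potential, and loss here are illustrative, and minimizing the plain Rayleigh quotient only targets the smallest eigenvalue; reaching eigenvalues deep in the spectrum needs more machinery:

```python
import torch

d = 2
net = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
V = lambda x: torch.zeros(x.shape[0])            # example potential V = 0

def trial(x):
    g = torch.prod(x * (1 - x), dim=1)           # cut-off: zero on boundary
    return g * net(x).squeeze(-1)

for step in range(2000):
    x = torch.rand(1024, d, requires_grad=True)  # Monte Carlo samples
    u = trial(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    num = (grad_u.pow(2).sum(dim=1) + V(x) * u ** 2).mean()   # energy term
    loss = num / (u ** 2).mean()                 # Rayleigh quotient
    opt.zero_grad()
    loss.backward()
    opt.step()
```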

2025-05-29 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Transformer-based Large Language Models (LLMs) have revolutionized natural language processing by demonstrating exceptional performance across diverse tasks. This study investigates the impact of the parameter initialization scale on the training behavior and task preferences of LLMs. We discover that smaller initialization scales encourage models to favor reasoning tasks, whereas larger initialization scales lead to a preference for memorization tasks. We validate this reasoning bias on real datasets and with meticulously designed anchor functions. Further analysis of early training dynamics suggests that specific model components, particularly the embedding space and self-attention mechanisms, play pivotal roles in shaping these learning biases. We provide a theoretical framework, from the perspective of model training dynamics, to explain these phenomena, and experiments on real-world language tasks corroborate the theoretical insights. This work enhances our understanding of how initialization strategies influence LLM performance on reasoning tasks and offers valuable guidelines for training.
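
To make the knob concrete, here is a hypothetical demo of scaled initialization, drawing weights as $N(0, (\gamma/\sqrt{\text{fan-in}})^2)$ with $\gamma$ the initialization scale; this is a simplified stand-in, not the paper's exact parameterization:

```python
import torch

def init_with_scale(model: torch.nn.Module, gamma: float) -> None:
    """Re-initialize weight matrices with std = gamma / sqrt(fan_in);
    small gamma corresponds to the "small initialization" regime."""
    for p in model.parameters():
        if p.dim() >= 2:                     # weight matrices / embeddings only
            torch.nn.init.normal_(p, mean=0.0, std=gamma / p.shape[-1] ** 0.5)

model = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
init_with_scale(model, gamma=0.1)            # compare e.g. gamma = 0.1 vs 2.0
```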