2024 Fall

This semester's seminar is organized by Jingyu Liu and Yujie Chen and co-organized by the graduate student union of the School of Mathematical Sciences at Fudan University. The seminar is partially sponsored by the Shanghai Key Laboratory for Contemporary Applied Mathematics.

Upcoming Presentations

2024-11-21 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Neurons are the fundamental computational units of the brain. Specifically, the complex dendritic structures within neurons are crucial components for information integration and computation. Recently, quantitatively characterizing dendritic integration rules and understanding their implications for brain-inspired algorithms has become a key research topic. In this talk, we will first introduce the classical dendritic cable model and our recent theoretical and experimental results on a quantitative bilinear integration rule for dendrites. Next, we will discuss how the bilinear integration rule can be used to design artificial neural network models equivalent to biological neurons with complex dendritic structures. Finally, we will explore the implications of dendritic computation for designing artificial neural networks, examining how the information-processing mechanisms of neurons can be leveraged to enhance artificial neural network performance. Our work may help deepen the understanding of neuronal computation and provide valuable insights for the field of brain-inspired artificial intelligence.
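
To make the bilinear rule concrete, here is a toy numerical sketch assuming the commonly reported form V_AB ≈ V_A + V_B + k·V_A·V_B with a single fitted shunting coefficient k; the data and coefficient below are invented for illustration and are not from the talk.

```python
import numpy as np

# Toy data (hypothetical numbers): responses to input A alone, B alone,
# and both together, e.g. somatic depolarizations in mV.
v_a = np.array([2.0, 3.5, 1.2, 4.0])
v_b = np.array([1.5, 2.0, 3.0, 2.5])
v_ab = np.array([3.2, 4.9, 3.7, 5.6])   # measured combined responses

# Bilinear integration rule: V_AB ≈ V_A + V_B + k * V_A * V_B,
# with a single shunting coefficient k fitted from the data.
interaction = v_a * v_b
k = np.linalg.lstsq(interaction[:, None], v_ab - (v_a + v_b), rcond=None)[0][0]

prediction = v_a + v_b + k * interaction
print(f"fitted k = {k:.3f}")
print("predicted combined responses:", np.round(prediction, 2))
```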

2024-11-28 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Benefiting from their ability to reduce communication overhead and preserve the privacy of local agents, decentralized optimization methods have come to the forefront of distributed learning, especially in the decentralized training of deep neural networks. However, the widely employed ReLU activation function makes the training loss nonsmooth and not Clarke regular, leaving a gap between existing theoretical analyses and real-world training tasks. In this talk, we concentrate on decentralized optimization problems with nonsmooth nonconvex objective functions. We introduce a unified framework, named DSM, to analyze the global convergence of decentralized stochastic subgradient-based methods, by establishing that the generated sequence asymptotically approximates the trajectories of its associated differential inclusion. We show that the proposed framework covers a wide range of existing efficient decentralized subgradient-based methods, such as DSGD, DSGD with gradient tracking (DSGD-T), and DSGD with momentum (DSGD-M). In addition, we introduce the sign map to regularize the update directions in DSGD-M, and show that the resulting method is also covered by our framework. Preliminary numerical experiments verify the validity of the developed theoretical results and exhibit the framework's flexibility and potential for developing new decentralized subgradient-based algorithms.
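
As a rough illustration of the simplest method covered by such a framework, the following sketch runs plain decentralized stochastic subgradient descent (DSGD) on a toy nonsmooth problem; the ring mixing matrix, step size, and L1 objective are assumptions chosen for illustration, not the setting of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, n_iters, alpha = 4, 5, 200, 0.05

# Doubly stochastic mixing matrix for a ring of 4 agents (illustrative choice).
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

targets = rng.normal(size=(n_agents, dim))   # each agent holds a local target
x = rng.normal(size=(n_agents, dim))         # local iterates

def local_subgradient(xi, ti):
    # A valid subgradient of the nonsmooth local loss ||x - t||_1.
    return np.sign(xi - ti)

for _ in range(n_iters):
    grads = np.stack([local_subgradient(x[i], targets[i]) for i in range(n_agents)])
    x = W @ x - alpha * grads                # consensus step + subgradient step

print("disagreement across agents:", np.linalg.norm(x - x.mean(axis=0)))
```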

Past Presentations

2024-09-05 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Learning long-term dependencies remains a significant challenge in sequence modelling. Despite extensive empirical evidence showing the difficulties recurrent models face in capturing such dependencies, the underlying theoretical reasons are not fully understood. In this talk, we present inverse approximation theorems for nonlinear recurrent neural networks and state-space models. Our analysis reveals that appropriate reparameterizations of recurrent weights are crucial for stably approximating targets with long-term memory. We demonstrate that a broad class of stable reparameterizations allows state-space models to consistently approximate any target functional sequence with decaying memory. Additionally, these reparameterizations mitigate the vanishing and exploding gradient problems commonly encountered in training recurrent models.
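
A minimal sketch of the idea of a stable reparameterization, under assumptions made for illustration: the recurrent weights of a diagonal state-space model are produced through an exponential map, so the recurrence stays stable for every value of the trainable parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d_state = 50, 8

# Free parameters w can be any real numbers; the reparameterization
# lambda = -exp(w) guarantees negative (stable) continuous-time eigenvalues.
w = rng.normal(size=d_state)
lam = -np.exp(w)
dt = 0.1
A_discrete = np.exp(lam * dt)   # |A_discrete| < 1 for every w: stable recurrence

u = rng.normal(size=T)          # scalar input sequence
B = rng.normal(size=d_state)
h = np.zeros(d_state)
for t in range(T):
    h = A_discrete * h + dt * B * u[t]   # stable diagonal SSM update

print("hidden-state norm after T steps:", np.linalg.norm(h))
```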

2024-09-12 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we introduce a martingale-based neural network, SOC-MartNet, for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations where no explicit expression is needed for the infimum of the Hamiltonian, $\inf_u H(t, x, u, z, p)$, and stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equation for the value function by training two neural networks, one for the value function and one for the optimal control, with the help of two stochastic processes: a Hamiltonian process and a cost process. The control and value networks are trained so that the associated Hamiltonian process is minimized to satisfy the minimum principle of a feedback SOCP and the cost process becomes a martingale, thus ensuring that the value function network solves the corresponding HJB equation. Moreover, to enforce the martingale property of the cost process, we employ an adversarial network and construct a loss function characterizing the projection property of conditional expectations that defines a martingale. Numerical results show that the proposed SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs in dimensions up to 2,000, within a small number of training iterations (fewer than 2,000).
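
The martingale requirement on the cost process is the key training signal. The toy sketch below is not the SOC-MartNet algorithm itself; it only illustrates the projection property being exploited: for a martingale, increments are uncorrelated with any test function of the current state, whereas a drifting process violates this. In the actual method the test function is an adversarial network and the processes are built from the value and control networks.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 10000, 20

def simulate(kappa):
    # kappa = 0 gives a driftless random walk (a martingale);
    # kappa > 0 gives a mean-reverting process, which is not a martingale.
    M = np.zeros((n_paths, n_steps))
    for t in range(1, n_steps):
        M[:, t] = (1 - kappa) * M[:, t - 1] + rng.normal(size=n_paths)
    return M

def martingale_violation(M, phi=lambda x: x):
    # Projection property: for a martingale, increments are uncorrelated
    # with any test function phi of the current state.
    inc = M[:, 1:] - M[:, :-1]
    return abs(np.mean(inc * phi(M[:, :-1])))

print("violation, martingale     :", martingale_violation(simulate(0.0)))
print("violation, mean-reverting :", martingale_violation(simulate(0.2)))
```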

2024-09-19 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we introduce a mathematically rigorous control strategy for the concurrent modulation of amplitude and frequency in nonlinear dynamic systems, with a focus on oscillatory signals. The central challenge is the independent adjustment of one parameter while constraining the other, a problem of theoretical importance across various complex systems. By leveraging system nonlinearity, we decouple these parameters using a noncomputational approach. This method, supported by a robust mathematical framework, has been validated in representative biophysical systems, demonstrating its potential for future applications in controlling oscillatory dynamics across a wider range of complex systems.

2024-09-26 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Recent years have seen a growing interest in understanding acceleration methods through the lens of ordinary differential equations (ODEs). Despite the theoretical advancements, translating the rapid convergence observed in continuous-time models to discrete-time iterative methods poses significant challenges. In this talk, we present a comprehensive framework integrating inertial systems with Hessian-driven damping (ISHD) and learning-based approaches for developing optimization methods. We first establish a convergence condition ensuring that the solution trajectory of ISHD converges. Then we show that, provided the stability condition holds, the sequence generated by the explicit Euler discretization of ISHD converges, which yields a large family of practical optimization methods. To select the best optimization method in this family, we introduce the stopping time, the time required for a method derived from ISHD to achieve a predefined level of suboptimality, and formulate a novel learning-to-optimize (L2O) problem that minimizes the stopping time subject to the convergence and stability conditions. Empirical validation of our framework is conducted through extensive numerical experiments, which showcase the superior performance of the learned optimization methods.
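
For concreteness, here is a minimal sketch of an explicit Euler discretization of an ISHD-type system on a toy quadratic, with the Hessian-driven term realized through a gradient difference; the damping coefficients and step size are assumed values chosen for illustration, whereas the talk's framework learns such choices.

```python
import numpy as np

# Toy quadratic objective (illustrative only).
A = np.diag([1.0, 10.0])
b = np.array([1.0, -2.0])
f_grad = lambda x: A @ x - b

h, gamma, beta = 0.05, 3.0, 0.5   # step size and damping (assumed values)
x = np.zeros(2)
v = np.zeros(2)
g_prev = f_grad(x)

# Explicit Euler discretization of the inertial system with Hessian-driven damping,
#   x'' + gamma x' + beta (d/dt) grad f(x) + grad f(x) = 0,
# where the Hessian-driven term is realized through a gradient difference.
for k in range(500):
    g = f_grad(x)
    x_new = x + h * v
    v = v - h * (gamma * v + g) - beta * (g - g_prev)
    x, g_prev = x_new, g

print("final gradient norm:", np.linalg.norm(f_grad(x)))
```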

2024-10-10 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms by which transformers behave on unseen compositional tasks. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solutions, which capture the underlying compositional primitives, or symmetric solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential solutions exhibit a low-complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors. Our findings provide valuable insights into the role of initialization scale in shaping the type of solution learned by transformers and their ability to learn and generalize on compositional tasks.
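
A minimal sketch of the experimental knob being studied, assuming a standard PyTorch setup (not the authors' code): the initialization of every weight matrix is rescaled by a factor gamma, and small versus large gamma is what distinguishes the two solution regimes discussed in the talk.

```python
import torch
import torch.nn as nn

def rescale_init(model, gamma):
    # Multiply the standard initialization of every weight matrix by gamma
    # (biases left untouched); gamma is the initialization scale being varied.
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() >= 2:
                p.mul_(gamma)

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
rescale_init(layer, gamma=0.1)   # e.g. a small initialization scale
```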

2024-10-17 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Backward stochastic partial differential equations (BSPDEs), as an infinite-dimensional extension of backward stochastic differential equations, are crucial in stochastic optimal control theory. This study explores the Cauchy problem for a class of linear BSPDEs. For deterministic leading coefficients, we establish the solvability and Hölder regularity of well-posed strong solutions to fractional BSPDEs using potential theory of the fractional heat kernel. Additionally, we apply fractional adjoint BSPDEs to analyze stochastic optimal control in partially observed systems driven by $\alpha$-stable Lévy processes. In the case of stochastic leading coefficients, we propose a perturbation method to derive Schauder estimates.
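
For orientation, a commonly studied model form of a linear BSPDE Cauchy problem with a fractional leading operator looks as follows; this generic display is only illustrative, and the exact class of equations and coefficients treated in the talk may differ.

```latex
% Unknowns are the pair (u, v); W is a Wiener process; alpha is in (0, 2).
\begin{equation*}
\begin{cases}
-\,\mathrm{d}u(t,x) = \bigl[-(-\Delta)^{\alpha/2} u(t,x) + f(t,x)\bigr]\,\mathrm{d}t
                      - v(t,x)\,\mathrm{d}W_t, & (t,x)\in[0,T]\times\mathbb{R}^d,\\[2pt]
u(T,x) = g(x), & x\in\mathbb{R}^d.
\end{cases}
\end{equation*}
```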

2024-10-24 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool for solving mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new particle method (DeepSPoC) that combines the interacting particle system of SPoC with deep learning. In the algorithm, two classes of frequently used deep models, namely fully connected neural networks and normalizing flows, are considered. For high-dimensional problems, a spatially adaptive method is designed to further improve the accuracy and efficiency of DeepSPoC. We analyze the convergence of the DeepSPoC framework under some simplified conditions and also provide an a posteriori error estimate for the algorithm. Finally, we test our methods on a wide range of different types of mean-field equations.
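
For orientation, the sketch below simulates the kind of interacting particle system that SPoC-type methods build on, for an invented one-dimensional McKean-Vlasov SDE whose mean-field term is the expectation of the state; DeepSPoC additionally fits a deep density model (a fully connected network or a normalizing flow) to the particles at each stage, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_particles, n_steps, dt = 2000, 200, 0.01

# Illustrative McKean-Vlasov SDE (not the talk's equations):
#   dX_t = -(X_t - E[X_t]) dt + sigma dB_t,
# approximated by an interacting particle system in which the expectation
# is replaced by the empirical mean of the particles.
sigma = 0.5
X = rng.normal(loc=2.0, scale=1.0, size=n_particles)
for _ in range(n_steps):
    drift = -(X - X.mean())
    X = X + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_particles)

print("empirical mean:", X.mean(), " empirical std:", X.std())
```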

2024-10-31 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Recent developments in mixed precision techniques have largely enhanced the performance of various linear algebra solvers, one of which is the least squares problem $\min_{x}\| b-Ax \|$. By transforming the least squares problem into an augmented linear system, mixed precision techniques are able to refine the lower precision solution to the working precision. In this talk, we propose mixed precision iterative refinement algorithms for two variants of the least squares problem: the least squares problem with linear equality constraints (LSE) and the generalized least squares problem (GLS). Both classical and GMRES-based iterative refinement can be applied to the augmented systems of these two problems to improve the accuracy of the solution. For reasonably well-conditioned problems, our algorithms reduce the execution time by 40% on average compared with the fixed precision routines from LAPACK on the x86-64 architecture.
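
As a sketch of the underlying idea, the code below applies classical iterative refinement to the augmented system of a plain least-squares problem, simulating low precision with float32 and working precision with float64; the talk's algorithms target the LSE and GLS variants, and the LAPACK-level implementation details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 50
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Augmented system for min_x ||b - A x||_2:
#   [ I   A ] [r]   [b]
#   [ A^T 0 ] [x] = [0]
K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
rhs = np.concatenate([b, np.zeros(n)])

# "Low precision" solves simulated in float32, residuals in float64.
K32 = K.astype(np.float32)
z = np.linalg.solve(K32, rhs.astype(np.float32)).astype(np.float64)
for _ in range(5):
    residual = rhs - K @ z                                    # working precision
    correction = np.linalg.solve(K32, residual.astype(np.float32))
    z = z + correction.astype(np.float64)                     # refinement update

x = z[m:]
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print("relative error vs. lstsq:", np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))
```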

2024-11-07 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Concentration inequalities describe the probability that the average of a sequence of random variables is close to its expected value. In particular, the concentration of Markov chains has been an extensively studied topic. In this talk, we first introduce a new Bernstein-type inequality for general Markov chains via an elementary approach. Compared with existing works, our result only requires the Markov chain to satisfy an iterated Poincaré inequality. In the second part of the talk, we study the inverse problem of estimating the transition matrix of a given Markov chain. We show the robustness of our Bernstein-type inequality by establishing non-asymptotic error bounds for the classical MLE method. Meanwhile, in the reversible case, we propose a new reversibility-preserving online SCE method with non-asymptotic deviation bounds.
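
The estimation problem in the second part concerns bounds for estimators such as the classical MLE, which simply counts observed transitions and normalizes each row; here is a toy sketch on an invented 3-state chain (the numbers are for illustration only).

```python
import numpy as np

rng = np.random.default_rng(5)

# True transition matrix of a 3-state chain (illustrative example).
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# Simulate a single trajectory of the chain.
T = 50_000
states = np.zeros(T, dtype=int)
for t in range(1, T):
    states[t] = rng.choice(3, p=P[states[t - 1]])

# Classical MLE: count observed transitions and normalize each row.
counts = np.zeros((3, 3))
np.add.at(counts, (states[:-1], states[1:]), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)

print("max-entry estimation error:", np.abs(P_hat - P).max())
```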

2024-11-14 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In batched stochastic bandits, an agent collects noisy rewards or losses in batches, aiming to identify the optimal option while efficiently exploring the decision space. Beyond merely minimizing regret, which measures policy effectiveness, the agent also seeks to minimize the number of batches used. In this talk, we examine batched stochastic bandits for a significant class of functions over a compact doubling metric space, called “nondegenerate functions”, which may exhibit nonconvexity, nonsmoothness, or discontinuities. To address this problem, we introduce an algorithm called Geometric Narrowing (GN). The regret bound for GN is of order $ \widetilde{\mathcal{O}}(A_+^d \sqrt{T}) $, where $ d $ is the doubling dimension of the metric space and $ A_+ $ is a constant independent of both $ d $ and the time horizon $T$. Notably, GN requires only $ \mathcal{O}(\log \log T) $ batches to achieve this regret bound. We further present a lower bound analysis, demonstrating that the GN algorithm attains near-optimal regret with a minimal number of batches.
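
An $\mathcal{O}(\log \log T)$ batch count typically comes from a doubling-exponent grid of batch endpoints; the sketch below computes one such standard grid. Whether GN uses exactly this grid is an assumption made for illustration, since the abstract states only the batch count.

```python
import math

def batch_schedule(T):
    # A standard grid achieving O(log log T) batches: endpoints t_k = T^(1 - 2^-k)
    # for k = 1, ..., B-1 and t_B = T, with B roughly log2(log2(T)).
    B = max(1, math.ceil(math.log2(math.log2(T))))
    ends = [math.ceil(T ** (1 - 2.0 ** (-k))) for k in range(1, B)]
    return ends + [T]

for T in (10**4, 10**6, 10**9):
    sched = batch_schedule(T)
    print(f"T={T}: {len(sched)} batches -> {sched}")
```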