2024 Fall

This semester's seminar is organized by Jingyu Liu and Yujie Chen and co-organized by the graduate student union of the School of Mathematical Sciences at Fudan. The seminar is partially sponsored by the Shanghai Key Laboratory for Contemporary Applied Mathematics.

Past Presentations

2024-09-05 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Learning long-term dependencies remains a significant challenge in sequence modelling. Despite extensive empirical evidence showing the difficulties recurrent models face in capturing such dependencies, the underlying theoretical reasons are not fully understood. In this talk, we present inverse approximation theorems for nonlinear recurrent neural networks and state-space models. Our analysis reveals that appropriate reparameterizations of recurrent weights are crucial for stably approximating targets with long-term memory. We demonstrate that a broad class of stable reparameterizations allows state-space models to consistently approximate any target functional sequence with decaying memory. Additionally, these reparameterizations mitigate the vanishing and exploding gradient problems commonly encountered in training recurrent models.
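
As a rough illustration of the idea of reparameterizing recurrent weights, the sketch below (not taken from the talk) uses a diagonal linear state-space model whose recurrent weights are parameterized as $A = -\exp(w)$, so the hidden dynamics decay for every value of the trainable parameter $w$; the model, the map, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def ssm_scan(w, b, c, inputs, dt=0.1):
    """Run a diagonal linear state-space model  h' = A h + b u,  y = c^T h,
    discretized by explicit Euler with step dt.

    The recurrent weights are reparameterized as A = -exp(w) (elementwise),
    so every mode stays decaying no matter what real value the trainable
    parameter w takes; this is an illustrative stable reparameterization,
    not necessarily the one analyzed in the talk.
    """
    A = -np.exp(w)                     # stable reparameterization: A < 0 always
    h = np.zeros_like(w)
    outputs = []
    for u in inputs:                   # one Euler step per input element
        h = h + dt * (A * h + b * u)
        outputs.append(c @ h)
    return np.array(outputs)

# Toy usage: a 4-dimensional hidden state driven by a scalar input sequence.
rng = np.random.default_rng(0)
w, b, c = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
y = ssm_scan(w, b, c, inputs=np.sin(np.linspace(0, 10, 200)))
print(y.shape)                         # (200,)
```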

2024-09-12 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we introduce a martingale-based neural network, SOC-MartNet, for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) equations, where no explicit expression is needed for the infimum of the Hamiltonian $\inf_u H(t, x, u, z, p)$, and stochastic optimal control problems (SOCPs) with controls on both drift and volatility. We reformulate the HJB equation for the value function and train two neural networks, one for the value function and one for the optimal control, with the help of two stochastic processes: a Hamiltonian process and a cost process. The control and value networks are trained so that the associated Hamiltonian process is minimized, satisfying the minimum principle of a feedback SOCP, and the cost process becomes a martingale, thus ensuring that the value function network solves the corresponding HJB equation. Moreover, to enforce the martingale property of the cost process, we employ an adversarial network and construct a loss function characterizing the projection property of the conditional expectation that defines a martingale. Numerical results show that the proposed SOC-MartNet is effective and efficient for solving HJB-type equations and SOCPs with dimensions up to 2000 within a small number of training iterations (fewer than 2000).

2024-09-19 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we introduce a mathematically rigorous control strategy for the concurrent modulation of amplitude and frequency in nonlinear dynamic systems, with a focus on oscillatory signals. The central challenge is the independent adjustment of one parameter while constraining the other, a problem of theoretical importance across various complex systems. By leveraging system nonlinearity, we decouple these parameters using a noncomputational approach. This method, supported by a robust mathematical framework, has been validated in representative biophysical systems, demonstrating its potential for future applications in controlling oscillatory dynamics across a wider range of complex systems.

2024-09-26 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Recent years have seen growing interest in understanding acceleration methods through the lens of ordinary differential equations (ODEs). Despite these theoretical advances, translating the rapid convergence observed in continuous-time models into discrete-time iterative methods poses significant challenges. In this talk, we present a comprehensive framework that integrates inertial systems with Hessian-driven damping (ISHD) and learning-based approaches for developing optimization methods. We first establish a convergence condition ensuring the convergence of the solution trajectory of ISHD. Then, we show that, provided the stability condition holds, the sequence generated by the explicit Euler discretization of ISHD converges, which yields a large family of practical optimization methods. To select the best optimization method in this family, we introduce the stopping time, the time required for an optimization method derived from ISHD to achieve a predefined level of suboptimality. We then formulate a novel learning to optimize (L2O) problem aimed at minimizing the stopping time subject to the convergence and stability conditions. Empirical validation of our framework is conducted through extensive numerical experiments, which showcase the superior performance of the learned optimization methods.
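
To make the discretization step concrete, here is a minimal sketch assuming the usual form of an inertial system with Hessian-driven damping and an explicit Euler scheme in which the Hessian-driven term is realized through differences of gradients; the ODE form, the scheme, and all coefficients are illustrative assumptions, not the learned methods from the talk.

```python
import numpy as np

def ishd_euler(grad_f, x0, steps=500, h=0.05, gamma=2.0, beta=0.1):
    """Explicit Euler discretization of an inertial system with
    Hessian-driven damping (ISHD), here taken in the common form

        x'' + gamma * x' + beta * d/dt grad f(x) + grad f(x) = 0,

    written as a first-order system in (x, v).  The Hessian-driven term
    beta * d/dt grad f(x(t)) is approximated by a finite difference of
    gradients, so no Hessian is ever formed.  Coefficients are illustrative.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    g_prev = grad_f(x)
    for _ in range(steps):
        g = grad_f(x)
        dg = (g - g_prev) / h          # finite-difference surrogate for d/dt grad f
        v = v + h * (-gamma * v - beta * dg - g)
        x = x + h * v
        g_prev = g
    return x

# Toy usage on a convex quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 10.0])
x_star = ishd_euler(lambda x: A @ x, x0=[5.0, -3.0])
print(x_star)                          # close to the minimizer [0, 0]
```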

2024-10-10 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate how transformers behave on unseen compositional tasks. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solutions, which capture the underlying compositional primitives, or symmetric solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential solutions exhibit a low-complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors. Our findings provide valuable insights into the role of initialization scale in shaping the type of solution learned by transformers and their ability to learn and generalize on compositional tasks.

2024-10-17 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Backward stochastic partial differential equations (BSPDEs), as an infinite-dimensional extension of backward stochastic differential equations, are crucial in stochastic optimal control theory. This study explores the Cauchy problem for a class of linear BSPDEs. For deterministic leading coefficients, we establish the solvability and Hölder regularity of well-posed strong solutions to fractional BSPDEs using potential theory of the fractional heat kernel. Additionally, we apply fractional adjoint BSPDEs to analyze stochastic optimal control in partially observed systems driven by $\alpha$-stable Lévy processes. In the case of stochastic leading coefficients, we propose a perturbation method to derive Schauder estimates.

2024-10-24 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool for solving mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new particle method (DeepSPoC) that combines the interacting particle system of SPoC with deep learning. In the algorithm, two classes of frequently used deep models are considered: fully connected neural networks and normalizing flows. For high-dimensional problems, a spatially adaptive method is designed to further improve the accuracy and efficiency of DeepSPoC. We analyze the convergence of the DeepSPoC framework under some simplified conditions and also provide an a posteriori error estimate for the algorithm. Finally, we test our method on a wide range of mean-field equations.

2024-10-31 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Recent developments in mixed precision techniques have largely enhanced the performance of solvers for various linear algebra problems, one of which is the least squares problem $\min_{x}\| b-Ax \|$. By transforming the least squares problem into an augmented linear system, mixed precision techniques are able to refine the lower precision solution to the working precision. In this talk, we propose mixed precision iterative refinement algorithms for two variants of the least squares problem: the least squares problem with linear equality constraints (LSE) and the generalized least squares problem (GLS). Both classical and GMRES-based iterative refinement can be applied to the augmented systems of these two problems to improve the accuracy of the solution. For reasonably well-conditioned problems, our algorithms reduce the execution time by 40\% on average compared to the fixed precision algorithms from LAPACK on the x86-64 architecture.
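
A generic sketch of the augmented-system idea for the plain least squares problem (not the LSE/GLS algorithms from the talk): the low-precision factorization is mimicked by a single-precision QR, residuals of the augmented system are formed in double precision, and corrections are recovered from the cheap factorization.

```python
import numpy as np

def lsq_iterative_refinement(A, b, iters=5):
    """Iterative refinement for min ||b - A x|| via the augmented system
    [[I, A], [A^T, 0]] [r; x] = [b; 0].

    A QR factorization computed in single precision plays the role of the
    low-precision solver; residuals of the augmented system are formed in
    double (working) precision, and corrections are obtained from the cheap
    factorization through semi-normal equations.  This is an illustration of
    the general idea, not the algorithms from the talk.
    """
    Q, R = np.linalg.qr(A.astype(np.float32))        # "low precision" factorization
    x = np.linalg.solve(R, Q.T @ b.astype(np.float32)).astype(np.float64)
    r = b - A @ x                                    # residual of the LS problem
    Rd = R.astype(np.float64)
    for _ in range(iters):
        # Residuals of the augmented system, formed in working precision.
        f = b - r - A @ x
        g = -A.T @ r
        # Correction from the factorization: (R^T R) dx = A^T f - g.
        dx = np.linalg.solve(Rd, np.linalg.solve(Rd.T, A.T @ f - g))
        dr = f - A @ dx
        x, r = x + dx, r + dr
    return x

# Toy usage: an overdetermined system with a known least squares solution.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
b = rng.normal(size=100)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(lsq_iterative_refinement(A, b) - x_ref))   # near working precision
```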

2024-11-07 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Concentration inequalities describe the probability that the average of a sequence of random variables is close to its expected value. In particular, the concentration of Markov chains has been an extensively studied topic. In this talk, we first introduce a new Bernstein-type inequality for general Markov chains via an elementary approach. Compared with existing works, our result only requires the Markov chain to satisfy an iterated Poincaré inequality. In the second part of this talk, we study the inverse problem of estimating the transition matrix of a given Markov chain. We show the robustness of our Bernstein-type inequality by establishing non-asymptotic error bounds for the classical MLE method. Meanwhile, in the reversible case, we propose a new reversibility-preserving online SCE method with non-asymptotic deviation bounds.
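
For reference, a minimal sketch of the classical MLE estimator of a transition matrix from a single observed trajectory (count transitions and normalize rows); it does not implement the reversibility-preserving online method mentioned in the abstract.

```python
import numpy as np

def transition_matrix_mle(states, num_states):
    """Maximum likelihood estimate of a Markov chain transition matrix:
    count observed transitions along one trajectory and normalize each row."""
    counts = np.zeros((num_states, num_states))
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # avoid dividing by zero for unvisited states
    return counts / row_sums

# Toy usage: simulate a two-state chain and recover its transition matrix.
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
x = [0]
for _ in range(20000):
    x.append(rng.choice(2, p=P[x[-1]]))
print(transition_matrix_mle(np.array(x), 2))   # close to P
```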

2024-11-14 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In batched stochastic bandits, an agent collects noisy rewards or losses in batches, aiming to identify the optimal option while efficiently exploring the decision space. Beyond merely minimizing regret (which measures policy effectiveness), the agent also seeks to minimize the number of batches used. In this talk, we examine batched stochastic bandits for a significant class of functions over a compact doubling metric space, called “nondegenerate functions”, which may exhibit nonconvexity, nonsmoothness, or discontinuities. To address this problem, we introduce an algorithm called Geometric Narrowing (GN). The regret bound for GN is of order $ \widetilde{\mathcal{O}}(A_+^d \sqrt{T}) $, where $ d $ is the doubling dimension of the metric space and $ A_+ $ is a constant independent of both $ d $ and the time horizon $T$. Notably, GN requires only $ \mathcal{O}(\log \log T) $ batches to achieve this regret bound. We further present a lower bound analysis, demonstrating that the GN algorithm attains near-optimal regret with a minimal number of batches.

2024-11-21 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Neurons are the fundamental computational units of the brain. Specifically, the complex dendritic structures within neurons are crucial components for information integration and computation. Recently, quantitatively characterizing dendritic integration rules and understanding their implications for brain-inspired algorithms has become a key research topic. In this talk, we will first introduce the classical dendritic cable model and our recent theoretical and experimental results on a quantitative bilinear integration rule for dendrites. Next, we will discuss how the bilinear integration rule can be used to design artificial neural network models equivalent to biological neurons with complex dendritic structures. Finally, we will explore the implications of dendritic computation for designing artificial neural networks, examining how the information-processing mechanisms of neurons can be leveraged to enhance artificial neural network performance. Our work may help deepen the understanding of neuronal computation and provide valuable insights for the field of brain-inspired artificial intelligence.

2024-11-28 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Benefiting from the ability to reduce communication overhead and preserve the privacy of local agents, decentralized optimization methods have come to the forefront of distributed learning, especially for the decentralized training of DNNs. However, the widely employed ReLU activation function makes the loss function nonsmooth and without Clarke regularity, creating a gap between existing theoretical analyses and implementations in real-world training tasks. In this talk, we concentrate on decentralized optimization problems with nonsmooth nonconvex objective functions. We introduce a unified framework, named DSM, to analyze the global convergence of decentralized stochastic subgradient-based methods, by establishing that the generated sequence asymptotically approximates the trajectories of its associated differential inclusion. We show that our proposed framework covers a wide range of existing efficient decentralized subgradient-based methods, such as DSGD, DSGD with gradient tracking (DSGD-T), and DSGD with momentum (DSGD-M). In addition, we introduce the sign map to regularize the update directions in DSGD-M and show that the resulting method is also covered by our framework. Preliminary numerical experiments verify the validity of our theoretical results and exhibit the flexibility and potential of the framework for developing new decentralized subgradient-based algorithms.
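
For orientation, a bare-bones sketch of the DSGD update named above: each agent averages its neighbors' iterates through a mixing matrix and then takes a local stochastic subgradient step. The network, step size, and objectives below are toy assumptions; the sketch does not implement the DSM framework itself.

```python
import numpy as np

def dsgd(W, grads, x0, steps=200, lr=0.05):
    """Decentralized stochastic (sub)gradient descent (DSGD).

    W      : doubly stochastic mixing matrix of the communication graph
    grads  : list of per-agent (sub)gradient oracles
    Each step: gossip averaging with W, then a local subgradient step.
    """
    n_agents = W.shape[0]
    X = np.tile(np.asarray(x0, dtype=float), (n_agents, 1))   # one row per agent
    for _ in range(steps):
        X = W @ X                                              # gossip / averaging step
        for i in range(n_agents):
            X[i] -= lr * grads[i](X[i])                        # local subgradient step
    return X.mean(axis=0)

# Toy usage: 3 agents, each holding one quadratic f_i(x) = 0.5 * ||x - c_i||^2.
W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, -1.0])]
grads = [lambda x, c=c: x - c for c in centers]
print(dsgd(W, grads, x0=np.zeros(2)))   # approaches the average of the centers, [0, 0]
```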

2024-12-05 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Nonlinear Boolean equation systems play an important role in a wide range of applications. Grover's algorithm is one of the best-known quantum search algorithms for solving nonlinear Boolean equation systems on quantum computers. In this talk, we introduce several techniques that improve the efficiency of solving nonlinear Boolean equations within Grover's algorithm framework. First, a W-cycle circuit construction uses a recursive idea to increase the number of Boolean equations that can be solved with a fixed number of qubits. Then, a greedy compression technique is proposed to reduce the depth of the oracle circuit. Finally, a randomized Grover's algorithm randomly chooses a subset of the equations to form a random oracle at every iteration, which further reduces the circuit depth and the number of ancilla qubits. Numerical results on Boolean quadratic equations demonstrate the efficiency of the proposed techniques.
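
As background, a textbook statevector sketch of a plain Grover search for the single assignment satisfying a Boolean condition; it is unrelated to the W-cycle, compression, or randomized-oracle techniques of the talk, and the oracle below is a hypothetical single marked bitstring.

```python
import numpy as np

def grover_search(n_qubits, marked_bits, iters=None):
    """Simulate Grover's search for one marked bitstring on n_qubits qubits.

    Each iteration applies the oracle (phase flip on the marked state) and
    the diffusion operator (inversion about the mean amplitude)."""
    N = 2 ** n_qubits
    marked = int(marked_bits, 2)
    psi = np.full(N, 1 / np.sqrt(N))                  # uniform superposition
    if iters is None:
        iters = int(np.floor(np.pi / 4 * np.sqrt(N)))  # standard iteration count
    for _ in range(iters):
        psi[marked] *= -1                             # oracle: phase flip marked state
        psi = 2 * psi.mean() - psi                    # diffusion: inversion about mean
    return int(np.argmax(np.abs(psi) ** 2))

# Toy usage: find the 6-bit assignment 101101 among 64 candidates.
print(format(grover_search(6, "101101"), "06b"))      # prints 101101
```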

2024-12-12 @ Rm 1801, Guanghua East Tower [poster]

Abstract: As networks scale up and computational tasks grow increasingly complex, decentralized resource allocation algorithms have garnered significant attention in fields such as smart grids, vehicular networks, and wireless sensor networks. However, the presence of malicious agents that deviate from the given algorithm protocol and broadcast random, incorrect messages to the network poses a significant challenge. Such malicious messages can easily disrupt existing resource allocation algorithms, leading to wrong resource allocation strategies. To address this, we characterize such malicious behavior using the classical Byzantine attack model and present a series of Byzantine-resilient decentralized resource allocation algorithms. We analyze the theoretical performance of the proposed algorithms and validate their Byzantine resilience through numerical experiments.

2024-12-19 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The reconstruction of gene regulatory networks (GRNs) from data is vital in systems biology. Although different approaches have been proposed to infer causality from data, challenges remain, such as how to accurately infer the direction and type of interactions, how to deal with complex networks involving multiple feedback loops, and how to infer causality between variables from real-world data, especially single-cell data. Here, we tackle these problems with deep neural networks (DNNs). The underlying regulatory networks of different systems (gene regulation, ecology, disease, development) can be successfully reconstructed from trained DNN models. We show that DNNs are superior to existing approaches for network inference, including Boolean networks, random forests, and partial cross mapping. Further, by interrogating, from a dynamical systems perspective, the ensemble DNN model trained on single-cell data, we are able to unravel complex cell fate dynamics during preimplantation development. We also propose a data-driven approach to quantify the energy landscape of gene regulatory systems by combining DNNs with the partial self-consistent mean field approximation (PSCA) approach. We anticipate that the proposed method can be applied in other fields to decipher the underlying dynamical mechanisms of systems from data.

2024-12-26 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In recent work, it has been shown that determining a feedforward ReLU neural network within high uniform accuracy from point samples suffers from the curse of dimensionality in terms of the number of samples needed. As a consequence, feedforward ReLU neural networks are of limited use for applications where guaranteed high uniform accuracy is required. We consider the question of whether the sampling complexity can be improved by restricting the specific neural network architecture. To this end, we investigate invertible residual neural networks which are foundational architectures in deep learning and are widely employed in models that power modern generative methods. Our main result shows that the residual neural network architecture and invertibility do not help overcome the complexity barriers encountered with simpler feedforward architectures. Specifically, we demonstrate that the computational complexity of approximating invertible residual neural networks from point samples in the uniform norm suffers from the curse of dimensionality. Similar results are established for invertible convolutional residual neural networks.

2025-01-02 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Preconditioned eigensolvers enable the incorporation of preconditioners for eigenvalue computation, but their convergence analysis is intricate. Even for PINVIT, which targets the smallest eigenvalue of an SPD matrix, Neymeyr’s celebrated analysis is highly nontrivial and only yields convergence if the starting vector is fairly close to the desired eigenvector. While advanced methods like LOBPCG offer significant practical improvements, there remains no theoretical justification for their accelerations. In this talk, we present some recent theoretical advances on preconditioned eigensolvers, including both provable accelerations and improved convergence guarantees. These results are achieved by establishing novel convexity structures of Rayleigh quotients and analyzing the local properties of preconditioners. This talk is based on joint works with Foivos Alimisis (UNIGE), Zhaojun Bai (UC Davis), Wenbin Chen (Fudan), Daniel Kressner (EPFL) and Bart Vandereycken (UNIGE).
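
For context, a bare-bones sketch of the PINVIT iteration mentioned above: each step corrects the iterate by the preconditioned eigen-residual and renormalizes. The matrix, preconditioner, and iteration count are toy assumptions, and the sketch does not include the LOBPCG-style accelerations discussed in the talk.

```python
import numpy as np

def pinvit(A, B_inv, x0, iters=200):
    """Preconditioned inverse iteration (PINVIT) for the smallest eigenvalue
    of a symmetric positive definite matrix A.

    Each step subtracts the preconditioned residual B^{-1}(A x - rho(x) x),
    where rho is the Rayleigh quotient, and renormalizes the iterate."""
    x = np.asarray(x0, dtype=float)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        rho = x @ (A @ x)                 # Rayleigh quotient (||x|| = 1)
        r = A @ x - rho * x               # eigen-residual
        x = x - B_inv(r)                  # preconditioned correction
        x /= np.linalg.norm(x)
    return x @ (A @ x), x

# Toy usage: diagonal SPD matrix with a Jacobi (diagonal) preconditioner.
A = np.diag(np.arange(1.0, 101.0))
B_inv = lambda r: r / np.diag(A)          # apply B^{-1} = diag(A)^{-1}
rho, x = pinvit(A, B_inv, x0=np.random.default_rng(0).normal(size=100))
print(rho)                                # approaches the smallest eigenvalue, 1.0
```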