2021 Fall

The seminar of this semester is organized by Hanxiang Shen and Sihan Mao, and co-organized by the graduate student union in the School of Mathematical Sciences at Fudan. This section is partially sponsored by Ke Wei.

Past Presentations

2021-09-13 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: This presentation addresses the problem of image manipulation and generation under user-specific geometric constraints. Key challenges in solving this problem include (i) a limited training database, which causes low generalization ability; (ii) hand-crafted sketch input, which produces irregular structure; and (iii) handling noise to balance image quality against data privacy. To address these problems, a novel two-stage approach is proposed. Interactive image deformation is performed by editing contours. This is done in the latent sparse edge space with both color and gradient information. The output of editing is then fed into a multi-scale representation of the image to recover a high-quality output. The model is flexible in terms of transferability and training efficiency. Joint work with Prof. L. D. Cohen.

2021-09-27 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we propose and study a distributed algorithm for computing dominant (or truncated) singular value decompositions (SVD) of large and distributed data matrices by solving an optimization problem with orthogonality constraints. We consider a centralized network in which each node privately holds a subset of columns and only exchanges “safe” information with a central server, directly or indirectly, in a collaborative effort to calculate a dominant SVD for the whole matrix. In the framework of the alternating direction method of multipliers (ADMM), we propose a novel formulation for building consensus by equalizing subspaces spanned by splitting variables instead of equalizing the variables themselves. This technique greatly relaxes feasibility restrictions and accelerates convergence significantly, while at the same time yielding simple subproblems. We design several algorithmic features, including a low-rank multiplier formula and mechanisms for controlling subproblem solution accuracies, to increase the algorithm’s computational efficiency and reduce its communication overhead. More importantly, the possibility appears remote, if possible at all, for a malicious node to be able to uncover the data stored in another node through publicly shared quantities in the algorithm, which is not the case in many existing distributed or parallelized algorithms. We present convergence analysis results, including a worst-case complexity estimate, for our specialized nonconvex ADMM algorithm, and extensive experimental results indicating that the proposed algorithm, while safely guarding data privacy, has strong potential to deliver cutting-edge performance, especially when communication costs are high.

2021-10-11 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The principal rank-one (RO) components of an image represent the self-similarity of the image, which is an important property for image restoration. However, the RO components of a corrupted image could be decimated by the procedure of image denoising. We suggest that the RO property should be utilized and the decimation should be avoided in image restoration. To achieve this, we propose a new framework composed of two modules, i.e., the RO decomposition and the RO reconstruction. The RO decomposition decomposes a corrupted image into the RO components and a residual. This is achieved by successively applying RO projections to the image or its residuals to extract the RO components. The RO projections, based on neural networks, extract the closest RO component of an image. The RO reconstruction aims to reconstruct the important information from the RO components and the residual, respectively, and to restore the image from this reconstructed information. Experimental results on four tasks, i.e., noise-free image super-resolution (SR), realistic image SR, gray-scale image denoising, and color image denoising, show that the method is effective and efficient for image restoration, and that it delivers superior performance for realistic image SR and color image denoising.

2021-10-18 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Recurrent neural networks (RNNs) are among the most frequently employed methods to build machine learning models on temporal data. Despite their ubiquitous applications, many fundamental theoretical questions remain to be answered. We study the approximation properties and optimization dynamics of RNNs when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate. Moreover, a fine-grained dynamical analysis of training linear RNNs by gradient methods is performed. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization. When there is long-term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from severe slowdowns. In particular, both of these effects become exponentially more pronounced with increasing memory - a phenomenon we call the “curse of memory”. These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.

2021-10-25 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: Optimization methods for computing higher-order critical points of nonconvex problems have recently attracted growing research interest [1, 13–15], as they are able to exclude the so-called degenerate saddle points and reach a solution of better quality. Despite the theoretical developments in [1, 13–15], the corresponding numerical experiments are missing. This paper proposes an implementable higher-order method, named the adaptive high-order method (AHOM), to find third-order critical points. AHOM is obtained by solving an “easier” subproblem and incorporating an adaptive parameter-tuning strategy in each iteration of the algorithm. The iteration complexity of the proposed method is established. Some preliminary numerical results are provided to show that AHOM can escape from degenerate saddle points, where second-order methods may get stuck.

2021-11-01 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we will introduce our research on generative, retrieval-based, and task-oriented dialogue systems. For generative dialogue systems, we propose a new dialog pre-training framework called DialogVED, which introduces continuous latent variables into the encoder-decoder pre-training framework to increase the relevance and diversity of responses. We pre-train DialogVED on a large-scale dialogue corpus, achieving state-of-the-art results on multiple downstream tasks. For retrieval-based dialogue systems, we propose a fine-to-coarse distillation model based on contextual matching for coarse-grained response selection in open-domain conversations. Extensive experimental results on two self-constructed datasets show that the proposed methods achieve a significant improvement on all evaluation metrics compared with traditional baseline methods. For task-oriented dialogue systems, we propose a diagnosis-oriented dialogue system framework and introduce a large-scale medical dialogue corpus, DialoIMC, with multi-level fine-grained annotations. We establish three evaluation tracks on DialoIMC and report a set of benchmark results for each track, which demonstrates the usability of the dataset and sets a baseline for future studies.

2021-11-08 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: The Landau equation is a fundamental integro-differential equation describing the evolution of the distribution of charged particles in plasma physics. In this talk, I will introduce random batch particle methods for efficiently solving the homogeneous Landau equation. The methods are stochastic variations of the particle methods proposed by Carrillo et al. using the random batch strategy. Collisions take place only inside small, randomly selected batches, so that the computational cost is reduced from $O(N^2)$ to $O(N)$ per time step. Meanwhile, these methods conserve mass, momentum, and energy and preserve the decay of entropy.
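The cost reduction from $O(N^2)$ to $O(N)$ can be illustrated with a minimal sketch of the random batch idea. This is not the Landau collision operator from the talk: `pair_force` is a hypothetical toy antisymmetric interaction on scalar velocities, and `random_batch_step` is an illustrative name. With batches of fixed size $p$, each step costs about $N p$ pairwise evaluations instead of $N^2$, and the antisymmetry of the toy interaction keeps the total momentum unchanged up to rounding.

```python
import random


def pair_force(vi, vj):
    # Toy antisymmetric interaction (an assumption for illustration,
    # not the Landau collision kernel): pair_force(a, b) == -pair_force(b, a).
    return vj - vi


def random_batch_step(v, dt, batch_size, rng):
    """One explicit time step in which particles interact only inside random batches."""
    n = len(v)
    idx = list(range(n))
    rng.shuffle(idx)  # random partition of the particles into batches
    new_v = list(v)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        if len(batch) < 2:
            continue  # a leftover single particle has no partner this step
        for i in batch:
            total = sum(pair_force(v[i], v[j]) for j in batch if j != i)
            new_v[i] = v[i] + dt * total / (len(batch) - 1)
    return new_v


def simulate(n=1000, steps=50, dt=0.1, batch_size=2, seed=0):
    """Run a few steps and return the initial and final velocities."""
    rng = random.Random(seed)
    v0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    v = v0
    for _ in range(steps):
        v = random_batch_step(v, dt, batch_size, rng)
    return v0, v
```

Calling `simulate()` and comparing `sum(v0)` with `sum(v)` gives a quick check that momentum is conserved for this toy interaction. Energy is not conserved by this relaxation-type toy, unlike the methods described in the abstract.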

2021-11-15 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: It is of great interest to solve the inverse stationary radiative transport equation (RTE) with very large data sets. The standard way is to formulate the inverse problem as an optimization problem, but the bottleneck is that one has to solve the forward problem over and over again, which is time consuming. In this paper, we propose an offline/online solver for the RTE based on the Tailored Finite Point Method (TFPM) proposed in \cite{ShiA} and \cite{HanTwo}. TFPM for the RTE is uniformly convergent with respect to the mean free path and valid up to the boundary and interface layers. Two cases are considered: in the first, the RTE is solved with fixed scattering and absorption cross sections while the boundary conditions vary; in the second, the cross sections vary in a small domain and the boundary conditions change many times. In both cases, the solver can be decomposed into offline and online stages. The cost of the offline stage is comparable to that of classical methods, while the cost of the online stage is much lower. One only needs to compute the offline stage once and to update the online stage when the parameters vary. Our proposed solver is much cheaper when one needs to solve the RTE with multiple right-hand sides or when the cross sections vary in a small domain, and can thus accelerate the solution of inverse RTE problems.
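The offline/online split for multiple right-hand sides can be illustrated with a generic linear-algebra sketch. This is not TFPM or the solver from the talk: it assumes only that the discretized forward problem is a linear system $Ax = b$ in which $A$ is fixed and $b$ changes with the boundary data. The function names `lu_factor` and `lu_solve` are illustrative, and the factorization below omits pivoting, so it assumes a matrix that needs no row exchanges (for example, a strictly diagonally dominant one).

```python
def lu_factor(a):
    """Offline stage: Doolittle LU factorization without pivoting, O(n^3), done once.

    Assumes `a` is square and needs no row exchanges.
    Returns (lower, upper), where lower has a unit diagonal.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(value) for value in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Online stage: forward and backward substitution, O(n^2) per right-hand side."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x
```

In use, `lu_factor` is called once for the fixed matrix, and `lu_solve` is then called for each new right-hand side, which is where the savings accumulate when the forward problem must be solved many times.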

2021-11-22 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we focus on minimizing the sum of smooth and strongly convex local objective functions stored in a distributed manner across the nodes of an undirected network. We propose an Optimal Gradient Tracking (OGT) method, which is the first single-loop gradient-type method that is optimal in both gradient computation and communication complexities. The development of OGT has two steps. First, we propose a new accelerated gradient tracking method termed “Snapshot” Gradient Tracking (SS-GT). Inspired by the variance reduction methods Katyusha and L-Katyusha, SS-GT combines “snapshot” points and “negative momentums” with classical gradient tracking and outperforms previous single-loop accelerated gradient tracking methods. SS-GT is of independent interest and can be extended to more general settings such as time-varying graphs and directed graphs. Second, we develop a technique termed Loopless Chebyshev Acceleration (LCA), which can be implemented “looplessly” and achieves an effect similar to that of inner loops of Chebyshev acceleration. The LCA technique can accelerate many other gradient-tracking-based methods with respect to the graph condition number.
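For readers unfamiliar with the baseline, the following is a minimal sketch of classical gradient tracking, the building block that SS-GT extends. It is not OGT, SS-GT, or LCA. The setup is an assumption chosen for illustration: scalar local objectives $f_i(x) = \tfrac{1}{2} a_i (x - b_i)^2$, a doubly stochastic mixing matrix on a four-node ring, and a small constant step size. Each node keeps an iterate $x_i$ and a tracker $y_i$ of the average gradient, updated by $x^{k+1} = W x^k - \alpha y^k$ and $y^{k+1} = W y^k + \nabla f(x^{k+1}) - \nabla f(x^k)$.

```python
def gradient_tracking(a, b, mixing, step, iterations):
    """Classical gradient tracking for f_i(x) = 0.5 * a[i] * (x - b[i])**2.

    `mixing` is assumed to be a doubly stochastic matrix matching the network.
    Returns the list of local iterates, one per node.
    """
    n = len(a)

    def grad(i, x):
        return a[i] * (x - b[i])

    x = [0.0] * n
    y = [grad(i, x[i]) for i in range(n)]  # trackers start at the local gradients
    for _ in range(iterations):
        # Mix with neighbours, then move along the tracked average gradient.
        x_new = [sum(mixing[i][j] * x[j] for j in range(n)) - step * y[i]
                 for i in range(n)]
        # Mix the trackers and correct them with the change in local gradients.
        y = [sum(mixing[i][j] * y[j] for j in range(n))
             + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(n)]
        x = x_new
    return x


def ring_example():
    """Four nodes on a ring; the minimizer of the sum is sum(a*b) / sum(a)."""
    a = [1.0, 2.0, 3.0, 4.0]
    b = [1.0, -1.0, 2.0, 0.5]
    mixing = [
        [0.50, 0.25, 0.00, 0.25],
        [0.25, 0.50, 0.25, 0.00],
        [0.00, 0.25, 0.50, 0.25],
        [0.25, 0.00, 0.25, 0.50],
    ]
    x = gradient_tracking(a, b, mixing, step=0.02, iterations=3000)
    x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    return x, x_star
```

In `ring_example`, every node's iterate should approach the common minimizer `x_star` even though each node only evaluates its own gradient and communicates with its two neighbours.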

2021-12-06 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: In this talk, we propose a new nonlinear time series model, the autoregressive conditional accelerated Fréchet (AcAF) model, and introduce two new competing risk measures, endopathic and exopathic, to better learn risk patterns, decouple systemic risk, and improve risk management. We establish the probabilistic properties of stationarity and ergodicity of the AcAF model. Statistical inference is developed through conditional maximum likelihood estimation. The consistency and asymptotic normality of the estimators are derived. Simulations demonstrate the efficiency of the proposed estimators and the AcAF model's flexibility in modeling heterogeneous data. Empirical studies on stock returns and cryptocurrency trading show the superior performance of the AcAF model in terms of the identified risk patterns, enhancing the understanding of the systemic risks of a market and their causes, and making better risk management possible.

2021-12-13 16:10:00 - 17:00:00 @ Rm 1801, Guanghua East Tower [poster]

Abstract: This work was completed by Dr. Zhou Shijie, affiliated with Dr. Lin Wei. Here, we design a scenario for stochastic adaptive control. First, we theoretically prove the feasibility of stabilization, synchronization, and parameter identification in terms of multi-dimensional Brownian motion. A criterion for specific noise is also provided. Finally, we verify the feasibility of the approach with some typical examples of dynamical systems.