By employing modulus-based matrix splitting iteration methods as smoothers, we establish modulus-based multigrid methods for solving large sparse linear complementarity problems. The local Fourier analysis is used to quantitatively predict the asymptotic convergence factor of this class of multigrid methods. Numerical results indicate that the modulus-based multigrid methods of the W-cycle can achieve optimality in terms of both convergence factor and computing time, and their asymptotic convergence factors can be predicted perfectly by the local Fourier analysis of the corresponding modulus-based two-grid methods.
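For the linear complementarity problem LCP(*q*, *A*) — find *z* ≥ 0 with *w* = *Az* + *q* ≥ 0 and *z*^{T}*w* = 0 — the modulus reformulation underlying these smoothers can be sketched as follows. This is a minimal single-splitting variant with M = A and Ω = ωI (the paper's multigrid smoothers build on richer splittings; the matrix and parameters below are illustrative):

```python
import numpy as np

def modulus_lcp(A, q, omega=1.0, tol=1e-10, max_iter=500):
    """Basic modulus iteration for the LCP: find z >= 0 with
    w = A z + q >= 0 and z.T w = 0, via the change of variables
    z = |x| + x, w = Omega (|x| - x) with Omega = omega * I."""
    n = len(q)
    Om = omega * np.eye(n)
    x = np.zeros(n)
    for _ in range(max_iter):
        # (A + Omega) x_new = (Omega - A)|x| - q
        x_new = np.linalg.solve(A + Om, (Om - A) @ np.abs(x) - q)
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return np.abs(x) + x   # z = |x| + x

# LCP with a symmetric tridiagonal M-matrix
n = 8
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
q = np.linspace(-1, 1, n)
z = modulus_lcp(A, q)
w = A @ z + q
```

At a fixed point one checks directly that z ≥ 0, w = Ω(|x| − x) ≥ 0, and z^{T}w = 0, which is why the iteration solves the LCP; for symmetric positive definite A the iteration matrix (A + Ω)^{-1}(Ω − A) is a contraction.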

Determining the inertia of a matrix is important in many applications, for example, in the stability analysis of dynamical systems. In the (point-wise) H-matrix case, it was proven that the diagonal entries alone reveal this information. This paper generalises these results to the block H-matrix case for the 1-, 2-, and ∞-matrix norms. The usefulness of the block approach is illustrated on three relevant numerical examples arising in engineering.

The Jacobi, Gauss–Seidel, and successive over-relaxation (SOR) methods are well-known basic iterative methods for solving systems of linear equations. In this paper, we extend these basic methods to solve the tensor equation 𝒜*x*^{*m*−1} = *b*, where 𝒜 is an *m*th-order *n*-dimensional symmetric tensor and *b* is an *n*-dimensional vector. Under appropriate conditions, we show that the proposed methods are globally convergent and locally r-linearly convergent. Taking into account the special structure of the Newton method for the problem, we propose a Newton–Gauss–Seidel method, which is expected to converge faster than the above methods. The proposed methods can be extended to solve general symmetric tensor equations. Our preliminary numerical results show the effectiveness of the proposed methods.
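For reference, the classical point iterations being extended read, in the linear case Ax = b, as sweeps of the following form. This is a minimal dense-matrix SOR sketch (ω = 1 recovers Gauss–Seidel; the tensor versions replace the row sums by multilinear contractions, which is not shown here):

```python
import numpy as np

def sor(A, b, omega=1.2, tol=1e-10, max_iter=1000):
    """Successive over-relaxation for A x = b (omega = 1 gives Gauss-Seidel)."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # row sum with already-updated entries j < i, old entries j > i
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            break
    return x

A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
b = np.array([1.0, 2, 3])
x = sor(A, b)
```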

It is shown that the matrix sequence generated by Euler's method starting from the identity matrix converges to the principal *p*th root of a square matrix if all the eigenvalues of the matrix lie in a region that includes the one given for Newton's method by Guo in 2010. The convergence is cubic if the matrix is invertible. A modified version of Euler's method using the Schur decomposition is developed. Numerical experiments show that the modified algorithm has good overall numerical behavior.
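The abstract does not reproduce Euler's iteration itself, but the simplified Newton iteration it is compared against (the form analyzed by Guo) can be sketched; the test matrix and iteration count below are illustrative:

```python
import numpy as np

def newton_pth_root(A, p, iters=30):
    """Simplified Newton iteration X_{k+1} = ((p-1) X_k + X_k^{1-p} A) / p
    with X_0 = I; all iterates commute with A (they are polynomials in A),
    so each eigenvalue follows the scalar Newton iteration for x^p = a."""
    X = np.eye(A.shape[0])
    for _ in range(iters):
        X = ((p - 1) * X
             + np.linalg.matrix_power(np.linalg.inv(X), p - 1) @ A) / p
    return X

# SPD test matrix with eigenvalues well inside the convergence region
A = np.array([[2.0, 0.5], [0.5, 1.0]])
X = newton_pth_root(A, 3)   # principal cube root of A
```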

It is well known that the standard algorithm for the mixed least squares–total least squares (MTLS) problem uses the QR factorization to reduce the original problem to a standard total least squares problem of smaller size, which can be solved based on the singular value decomposition (SVD). In this paper, the MTLS problem is proven to be closely related to a weighted total least squares problem with its error-free columns multiplied by a large weighting factor. A criterion for choosing the weighting factor is given; and, for the sake of stability in solving the MTLS problem, the Cholesky factorization-based inverse (Cho-INV) iteration and Rayleigh quotient iteration are also considered. For large-scale MTLS problems, numerical tests show that Cho-INV is superior to the standard QR-SVD method, especially when there is a big gap between the desired and undesired singular values and when the coefficient matrix has many more error-contaminated columns. Rayleigh quotient iteration is more efficient than QR-SVD in most cases but fails occasionally; in some cases it converges much faster than Cho-INV yet remains less efficient because of its higher cost per iteration.
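For orientation, the SVD-based core step — reading the total least squares solution off the right singular vector of the augmented matrix — can be sketched for the pure TLS case (the MTLS reduction produces such a subproblem; variable names and test data are illustrative):

```python
import numpy as np

def tls(A, b):
    """Total least squares for A x ~ b via the SVD of the augmented matrix
    [A | b]: x = -v[:n] / v[n], where v is the right singular vector
    belonging to the smallest singular value."""
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]                 # right singular vector of the smallest sigma
    return -v[:n] / v[n]

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                 # consistent system: TLS recovers x_true
x = tls(A, b)
```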

Tikhonov regularization is one of the most popular approaches to solving linear discrete ill-posed problems. The choice of the regularization matrix may significantly affect the quality of the computed solution. When the regularization matrix is the identity, iterated Tikhonov regularization can yield computed approximate solutions of higher quality than (standard) Tikhonov regularization. This paper provides an analysis of iterated Tikhonov regularization with a regularization matrix different from the identity. Computed examples illustrate the performance of this method.
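A minimal sketch of iterated Tikhonov with a general regularization matrix L (the matrices below are synthetic test data, not from the paper; each step solves a Tikhonov problem for a correction to the current iterate):

```python
import numpy as np

def iterated_tikhonov(A, b, L, lam, steps=5):
    """Iterated Tikhonov: x_{k+1} = x_k + h_k, where h_k minimizes
    ||A h - r_k||^2 + lam ||L h||^2 and r_k = b - A x_k."""
    x = np.zeros(A.shape[1])
    K = A.T @ A + lam * (L.T @ L)         # regularized normal-equations matrix
    for _ in range(steps):
        x = x + np.linalg.solve(K, A.T @ (b - A @ x))
    return x

# synthetic ill-conditioned problem: singular values from 1 down to 1e-6
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((30, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
A = U @ np.diag(np.logspace(0, -6, 10)) @ V.T
x_true = rng.standard_normal(10)
b = A @ x_true
L = np.eye(10) - np.eye(10, k=1)          # first-difference regularization matrix
x = iterated_tikhonov(A, b, L, lam=1e-8, steps=8)
```

Since K ⪰ AᵀA, the residual b − Ax is non-increasing over the iterations, which is the mechanism behind the quality gains the paper analyzes.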

Image registration is a central problem in a variety of areas involving imaging techniques and is known to be challenging and ill-posed. Regularization functionals based on hyperelasticity provide a powerful mechanism for limiting the ill-posedness. A key feature of hyperelastic image registration approaches is their ability to model large deformations while guaranteeing their invertibility, which is crucial in many applications. To ensure that numerical solutions satisfy this requirement, we discretize the variational problem using piecewise linear finite elements, and then solve the discrete optimization problem using the Gauss–Newton method. In this work, we focus on computational challenges arising in approximately solving the Hessian system. We show that the Hessian is a discretization of a strongly coupled system of partial differential equations whose coefficients can be severely inhomogeneous. Motivated by a local Fourier analysis, we stabilize the system by thresholding the coefficients. We propose a Galerkin-multigrid scheme with a collective pointwise smoother. We demonstrate the accuracy and effectiveness of the proposed scheme, first on a two-dimensional problem of a moderate size and then on a large-scale real-world application with almost 9 million degrees of freedom.

A fast algorithm for enclosing the solution of the nonsymmetric algebraic Riccati equation arising in transport theory is proposed. The equation has a special structure, which is taken into account to reduce the complexity. By exploiting the structure, the enclosing process involves only quadratic complexity under a reasonable assumption. The algorithm moreover verifies the uniqueness and minimal positiveness of the enclosed solution. Numerical results show the efficiency of the algorithm.

Sparse symmetric indefinite linear systems of equations arise in numerous practical applications. In many situations, an iterative method is the method of choice but a preconditioner is normally required for it to be effective. In this paper, the focus is on a class of incomplete factorization algorithms that can be used to compute preconditioners for symmetric indefinite systems. A limited memory approach is employed that incorporates a number of new ideas with the goal of improving the stability, robustness, and efficiency of the preconditioner. These include the monitoring of stability as the factorization proceeds and the incorporation of pivot modifications when potential instability is observed. Numerical experiments involving test problems arising from a range of real-world applications demonstrate the effectiveness of our approach.

In this paper, we study a deblurring algorithm for images distorted by a random impulse response. We propose and develop a convex optimization model to recover the underlying image and the blurring function simultaneously. The objective function is composed of three terms: the data-fitting term between the observed image and the product of the estimated blurring function and the estimated image, the squared difference between the estimated blurring function and its mean, and the total variation regularization term for the estimated image. We show theoretically that, under some mild conditions, the resulting objective function is convex, in which case the global minimum is unique. We also present a proximal alternating minimization scheme to solve the resulting minimization problem. The numerical results confirm that the peak signal-to-noise ratio and structural similarity of the images restored by the proposed algorithm are best when the proposed objective function is convex, and numerical examples demonstrate the effectiveness of the proposed model and the efficiency of the numerical scheme.

We generalize the matrix Kronecker product to tensors and propose the tensor Kronecker product singular value decomposition, which decomposes a real *k*-way tensor 𝒜 into a linear combination of tensor Kronecker products with an arbitrary number *d* of factors. We show how to construct this decomposition, in which each factor is also a *k*-way tensor, thus including matrices (*k* = 2) as a special case. The problem is readily solved by reshaping and permuting 𝒜 into a *d*-way tensor, followed by an orthogonal polyadic decomposition. Moreover, we introduce the new notion of general symmetric tensors (encompassing symmetric, persymmetric, centrosymmetric, Toeplitz and Hankel tensors, etc.) and prove that when 𝒜 is structured, its factors will also inherit this structure.
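For the matrix case (*k* = 2, *d* = 2), the reshape-then-decompose idea reduces to the classical nearest-Kronecker-product computation, which can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def kron_svd_rank1(A, bshape, cshape):
    """Van Loan-Pitsianis style rank-1 Kronecker approximation: rearrange A
    so that ||A - B kron C||_F becomes a rank-1 matrix approximation
    problem, then read B and C off the leading singular pair."""
    m1, n1 = bshape
    m2, n2 = cshape
    # row (i, j) of R is vec(A_ij), the (i, j) block of A, in column-major order
    R = np.array([A[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2].flatten(order='F')
                  for j in range(n1) for i in range(m1)])
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    B = np.sqrt(s[0]) * U[:, 0].reshape(bshape, order='F')
    C = np.sqrt(s[0]) * Vt[0].reshape(cshape, order='F')
    return B, C

rng = np.random.default_rng(3)
B0, C0 = rng.standard_normal((2, 3)), rng.standard_normal((3, 2))
A = np.kron(B0, C0)                     # exact Kronecker product: R is rank 1
B, C = kron_svd_rank1(A, (2, 3), (3, 2))
```

When A is an exact Kronecker product, the rearranged matrix R has rank one and the factors are recovered exactly (up to a sign that cancels in B ⊗ C).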

The finite difference discretization of spatial fractional diffusion equations gives discretized linear systems whose coefficient matrices have a diagonal-plus-Toeplitz structure. For solving these diagonal-plus-Toeplitz linear systems, we construct a class of diagonal and Toeplitz splitting iteration methods and establish its unconditional convergence theory. In particular, we derive a sharp upper bound on its asymptotic convergence rate and deduce the optimal value of its iteration parameter. The diagonal and Toeplitz splitting iteration method naturally leads to a diagonal and circulant splitting preconditioner. Analysis shows that the eigenvalues of the corresponding preconditioned matrix are clustered around 1, especially when the discretization step-size *h* is small. Numerical results show that the diagonal and circulant splitting preconditioner can significantly improve the convergence properties of GMRES and BiCGSTAB, and that these preconditioned Krylov subspace iteration methods outperform the conjugate gradient method preconditioned by the approximate inverse circulant-plus-diagonal preconditioner proposed recently by Ng and Pan (M.K. Ng and J.-Y. Pan, SIAM J. Sci. Comput. 2010;32:1442-1464). Moreover, unlike this preconditioned conjugate gradient method, the preconditioned GMRES and BiCGSTAB methods show *h*-independent convergence behavior even for spatial fractional diffusion equations with discontinuous or large-jump coefficients.
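A sketch of the circulant ingredient of such a preconditioner, assuming a Strang-type circulant approximation of the Toeplitz part and — as a simplification not made in the paper — replacing the diagonal part by its average so the preconditioner stays FFT-diagonalizable:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, gmres

def strang_circulant_eigs(t_col, t_row):
    """Eigenvalues (via FFT) of the Strang circulant approximation of a
    Toeplitz matrix with first column t_col and first row t_row."""
    n = len(t_col)
    c = np.zeros(n)
    for k in range(n):
        c[k] = t_col[k] if k <= n // 2 else t_row[n - k]
    return np.fft.fft(c)

# diagonal-plus-Toeplitz test system (a 1D Laplacian-like Toeplitz part)
n = 64
d = np.linspace(1.0, 2.0, n)
t = np.zeros(n); t[0] = 2.0; t[1] = -1.0
A = np.diag(d) + toeplitz(t)

# circulant preconditioner with the diagonal replaced by its mean,
# applied in O(n log n) via the FFT
eigs = d.mean() + strang_circulant_eigs(t, t)
M = LinearOperator((n, n),
                   matvec=lambda r: np.fft.ifft(np.fft.fft(r) / eigs).real)
b = np.ones(n)
x, info = gmres(A, b, M=M)
```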

The treatment of the stochastic linear quadratic optimal control problem with finite time horizon requires the solution of stochastic differential Riccati equations. We propose efficient numerical methods, which exploit the particular structure and can be applied for large-scale systems. They are based on numerical methods for ordinary differential equations such as Rosenbrock methods, backward differentiation formulas, and splitting methods. The performance of our approach is tested in numerical experiments.

In this paper, we compare two block triangular preconditioners for different linearizations of the Rayleigh–Bénard convection problem discretized with finite element methods. The two preconditioners differ in the nested or non-nested use of a certain approximation of the Schur complement associated with the Navier–Stokes block. First, bounds on the generalized eigenvalues are obtained for the preconditioned systems linearized with both the Picard and Newton methods. Then, the performance of the proposed preconditioners is studied in terms of computational time. This investigation reveals some inconsistencies in the literature that are hereby discussed. We observe that the non-nested preconditioner works best for both the Picard and Newton cases; therefore, we further investigate its performance by extending its application to a mixed Picard–Newton scheme. Numerical results of two- and three-dimensional cases show that the convergence is robust with respect to the mesh size. We also characterize the performance of the various preconditioned linearization schemes in terms of the Rayleigh number.

Contour-integral-based eigensolvers are recent methods for computing the eigenvalues inside a given region in the complex plane. The best-known members are the Sakurai–Sugiura method, its stable version CIRR, and the FEAST algorithm. An attractive computational advantage of these methods is that they are easily parallelizable. The FEAST algorithm was developed for generalized Hermitian eigenvalue problems. It is stable and accurate. However, it may fail when applied to non-Hermitian problems. Recently, a dual-subspace FEAST algorithm was proposed to extend FEAST to non-Hermitian problems. In this paper, we instead use the oblique projection technique to extend FEAST to non-Hermitian problems. Our approach can be summarized as follows: (a) construct a particular contour integral to form a search subspace containing the desired eigenspace and (b) use the oblique projection technique to extract the desired eigenpairs with an appropriately chosen test subspace. The related mathematical framework is established. Compared with the dual-subspace FEAST algorithm, we can roughly halve the computational cost if only the eigenvalues, or the eigenvalues together with their right eigenvectors, are needed. We also address implementation issues such as how to choose a suitable starting matrix and how to design efficient stopping criteria. Numerical experiments are provided to illustrate that our method is stable and efficient.
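All of these solvers rest on a contour-integral spectral projector; a sketch using the trapezoidal rule on a circle (diagonal test matrix; node count, center, and radius are illustrative, and the paper's oblique-projection extraction step is not shown):

```python
import numpy as np

def contour_projector_apply(A, Y, center, radius, nodes=64):
    """Apply the spectral projector (1/2*pi*i) * contour integral of
    (z I - A)^{-1} dz, discretized by the trapezoidal rule on a circle,
    to a block of vectors Y.  For real A the projector is real."""
    n = A.shape[0]
    Z = np.zeros_like(Y, dtype=complex)
    for theta in 2 * np.pi * (np.arange(nodes) + 0.5) / nodes:
        z = center + radius * np.exp(1j * theta)
        # dz = i r e^{i theta} d theta; the 1/nodes weight absorbs 2*pi*i
        Z += radius * np.exp(1j * theta) \
             * np.linalg.solve(z * np.eye(n) - A, Y) / nodes
    return Z.real

# eigenvalues 0.5 and 1.0 lie inside the circle |z - 1| <= 1; 3 and 5 do not
lam = np.array([0.5, 1.0, 3.0, 5.0])
A = np.diag(lam)
Y = np.ones((4, 2))
PY = contour_projector_apply(A, Y, center=1.0, radius=1.0)
```

The discretized projector acts as a rational filter that is essentially 1 on eigencomponents inside the contour and essentially 0 outside, which is what makes the filtered block span the desired eigenspace.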

No abstract is available for this article.

The following paper discusses the application of a multigrid-in-time scheme to Least Squares Shadowing (LSS), a novel sensitivity analysis method for chaotic dynamical systems. While traditional sensitivity analysis methods break down for chaotic dynamical systems, LSS is able to compute accurate gradients. Multigrid is used because LSS requires solving a very large Karush–Kuhn–Tucker system constructed from the solution of the dynamical system over the entire time interval of interest. Several different multigrid-in-time schemes are examined, and a number of factors are found to heavily influence the convergence rate of multigrid-in-time for LSS, including the iterative method used for the smoother, how the coarse-grid system is formed, and how the least squares objective function at the center of LSS is weighted.

This paper proposes a new, low-communication algorithm for solving PDEs on massively parallel computers. The range decomposition (RD) algorithm exposes coarse-grain parallelism by applying nested iteration and adaptive mesh refinement locally before performing a global communication step. Just a few such steps are observed to be sufficient to obtain accuracy within a small multiple of the discretization error. The target applications are petascale and exascale machines, where hierarchical parallelism is required and traditional parallel numerical PDE communication patterns are costly because of message latency. The RD algorithm uses a partition of unity to equally distribute the error and, thus, the work. The computational advantage of this approach is that the decomposed problems can be solved in parallel without any communication until the partitioned solutions are summed, which is attractive in the paradigm of expensive communication but very cheap computation. This paper introduces the method and explains the details of the communication step. Two performance models are developed, showing that the latency cost associated with a traditional parallel implementation of nested iteration is proportional to log(*P*)^{2}, whereas the RD method reduces the communication latency to log(*P*) while maintaining similar bandwidth costs. Numerical results for two problems, Laplace and advection–diffusion, demonstrate the enhanced performance, and a heuristic argument explains why the method converges quickly. Copyright © 2016 John Wiley & Sons, Ltd.

The incompressible Stokes equations are a widely used model of viscous or tightly confined flow in which convection effects are negligible. In order to strongly enforce the conservation of mass at the element scale, special discretization techniques must be employed. In this paper, we consider a discontinuous Galerkin approximation in which the velocity field is *H*(div,Ω)-conforming and divergence-free, based on the Brezzi–Douglas–Marini finite-element space, with the complementary space (*P*_{0}) for the pressure. Because of the saddle-point structure and the nature of the resulting variational formulation, the linear systems can be difficult to solve, and specialized preconditioning strategies are required to solve them efficiently. We compare the effectiveness of two families of preconditioners for saddle-point systems when applied to the resulting matrix problem. Specifically, we consider block-factorization techniques, in which the velocity block is preconditioned using geometric multigrid, as well as fully coupled monolithic multigrid methods. We present parameter study data and a serial timing comparison, and we show that a monolithic multigrid preconditioner using Braess–Sarazin style relaxation provides the fastest time to solution for the test problem considered.

Large-scale scientific computing models are needed for the simulation of wave propagation, especially for multiple-frequency and high-frequency models in complex heterogeneous media. Multigrid methods provide efficient iterative solvers for many large sign-definite systems of equations resulting from physical models. Time-harmonic wave propagation models, however, lead to sign-indefinite systems with eigenvalues in the left half of the complex plane, so standard multigrid approaches applied in conjunction with a low-order finite difference or finite element method are not sufficient. In this work, we describe a high-order finite element method for multiple (low to high) frequency time-harmonic acoustic wave propagation on general curved, non-convex, and non-smooth domains with heterogeneous media, using a multigrid approximation of the shifted Laplacian operator as a preconditioner. We implement the model using an efficient geometric multigrid approach with parallel grid transfer operator calculations and the BiCGStab iterative solver. We demonstrate the efficiency and parallel performance of the computational model with input incident waves from low frequency (5 wavelengths) to high frequency (100 wavelengths).

In this paper, we present preconditioning techniques to accelerate the convergence of Krylov solvers at each step of an inexact Newton method for the computation of the leftmost eigenpairs of large, sparse symmetric positive definite matrices arising in large-scale scientific computations. We propose a two-stage spectral preconditioning strategy: the first stage produces a very rough approximation of a number of the leftmost eigenvectors; the second stage uses these approximations as starting vectors and also to construct the tuned preconditioner from an initial inverse approximation of the coefficient matrix, as proposed by Martínez in the framework of the implicitly restarted Lanczos method. The action of this spectral preconditioner results in clustering a number of the eigenvalues of the preconditioned matrices close to one. We also study the combination of this approach with a BFGS-style updating of the proposed spectral preconditioner, as described by Bergamaschi and Martínez. Extensive numerical testing on a set of representative large SPD matrices gives evidence of the acceleration provided by these spectral preconditioners.

The paper studies numerical properties of LU and incomplete LU factorizations applied to the discrete linearized incompressible Navier–Stokes problem, also known as the Oseen problem. A commonly used stabilized Petrov–Galerkin finite element method for the Oseen problem leads to a system of algebraic equations with a 2 × 2 block structure. While enforcing better stability of the finite element solution, the Petrov–Galerkin method perturbs the saddle-point structure of the matrix and may lead to less favorable algebraic properties of the system. The paper analyzes the stability of the LU factorization. This analysis quantifies the effect of the streamline upwind Petrov–Galerkin stabilization in terms of the perturbation made to a nonstabilized system. Further analysis shows how the perturbation depends on the particular finite element method, the choice of stabilization parameters, and the flow problem parameters. The analysis of the LU factorization and its stability helps to understand the properties of threshold ILU factorization preconditioners for the system. Numerical experiments for a model problem of blood flow in a coronary artery illustrate the performance of the threshold ILU factorization as a preconditioner. The dependence of the preconditioner properties on the stabilization parameters of the finite element method is also studied numerically.
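To illustrate the kind of threshold ILU preconditioning studied, here is a sketch using SciPy's `spilu` on a synthetic nonsymmetric sparse matrix (a stand-in chosen for simplicity; the paper's stabilized Oseen matrices are different):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres, spilu

# nonsymmetric banded sparse matrix (assumed stand-in for a stabilized
# convection-diffusion-type system; entries are illustrative)
n = 400
A = diags([4.0, -1.0, -1.3, -1.0, -0.7], [0, -1, 1, -20, 20],
          shape=(n, n), format='csc')

# threshold ILU: entries below drop_tol are discarded during factorization
ilu = spilu(A, drop_tol=1e-3, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)

b = np.ones(n)
x, info = gmres(A, b, M=M)
```

The `drop_tol` threshold trades memory for preconditioner quality, which mirrors the trade-off the paper examines against the stabilization parameters.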

Hillar and Lim famously demonstrated that “multilinear (tensor) analogues of many efficiently computable problems in numerical linear algebra are nondeterministic polynomial-time hard.” Despite many recent advancements, the state-of-the-art methods for computing such “tensor analogues” still suffer severely from the curse of dimensionality. In this paper, we show that the Tucker core of a tensor, however, retains many properties of the original tensor, including the CANDECOMP/PARAFAC (CP) rank, the border rank, the tensor Schatten quasi-norms, and the Z-eigenvalues. When the core tensor is smaller than the original tensor, this property leads to considerable computational advantages, as confirmed by our numerical experiments. In our analysis, we in fact work with a generalized Tucker-like decomposition that can accommodate any full-column-rank factor matrices.

The ChebFilterCG algorithm, proposed by Golub, Ruiz, and Touhami [SIAM J. Matrix Anal. Appl. 29 (2007), pp. 774-795], is an iterative method that combines a Chebyshev filter with the conjugate gradient method for solving symmetric positive definite linear systems with multiple right-hand sides. The Chebyshev filter is used to produce initial residuals rich in eigenvectors corresponding to the smallest eigenvalues, which are then used in the initial phase of the conjugate gradient method. This paper presents a convergence analysis of ChebFilterCG. In particular, it is shown theoretically and numerically that the algorithm yields an approximation of the invariant subspace associated with the smallest eigenvalues that can be recycled for solving several linear systems with the same matrix and different right-hand sides. A refined error bound for solving these systems is also given. The choice and influence of the Chebyshev filtering steps are discussed. Numerical experiments illustrate that the Chebyshev filter does not degrade the distribution of the smallest eigenvalues and highlight the effect of rounding errors when large outlying eigenvalues are present. Finally, it is shown that the method may become more effective when an additional Chebyshev filtering step is used in the initialization phase of ChebFilterCG.
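A sketch of the Chebyshev filtering building block: applying a shifted-and-scaled Chebyshev polynomial of A damps eigencomponents inside a chosen interval relative to those outside it (the interval, degree, and test spectrum below are illustrative):

```python
import numpy as np

def cheb_filter(apply_A, v, a, b, degree):
    """y = T_m((A - c I)/e) v with c = (a+b)/2, e = (b-a)/2: the Chebyshev
    polynomial T_m is bounded by 1 on [a, b] and grows rapidly outside,
    so eigencomponents inside [a, b] are damped relative to those below a."""
    c, e = (a + b) / 2, (b - a) / 2
    y_prev = v                                  # T_0(B) v
    y = (apply_A(v) - c * v) / e                # T_1(B) v
    for _ in range(degree - 1):
        y_new = 2 * (apply_A(y) - c * y) / e - y_prev
        y_prev, y = y, y_new
    return y

# diagonal test: damp the interval [1, 10], amplify the smallest eigenvalue 0.01
lam = np.array([0.01, 1.0, 4.0, 10.0])
apply_A = lambda x: lam * x
v = np.ones(4)
y = cheb_filter(apply_A, v, a=1.0, b=10.0, degree=8)
```

After filtering, the component along the smallest eigenvalue dominates by orders of magnitude, which is exactly the "residuals rich in the smallest eigenvectors" effect exploited in the initial CG phase.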

We consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. These methods are expected to have considerable potential for resiliency to faults when implemented on massively parallel machines. We establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
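The deterministic starting point that the Monte Carlo schemes accelerate is the preconditioned Richardson iteration obtained from a splitting A = M − N; a minimal sketch with the Jacobi splitting (test data illustrative):

```python
import numpy as np

def richardson(A, b, M_inv, iters=200):
    """Stationary (preconditioned Richardson) iteration
    x_{k+1} = x_k + M^{-1}(b - A x_k), convergent iff rho(I - M^{-1}A) < 1."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + M_inv(b - A @ x)
    return x

A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
b = np.array([1.0, 2, 3])
d = np.diag(A)
x = richardson(A, b, lambda r: r / d)   # Jacobi splitting: M = diag(A)
```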

Krylov subspace approximations to the matrix exponential are typically implemented with full orthogonalization rather than incomplete orthogonalization, even though the latter strategy is known to reduce the cost by truncating the recurrences of the modified Gram–Schmidt process. This study combines such a strategy with an adaptive step-by-step integration scheme that allows both the stepsize and the dimension of the Krylov subspace to vary. A convergence analysis is given. Numerical results on test problems drawn from systems biology and computer systems show a significant speedup over the standard implementation with full orthogonalization and fixed dimension.
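A sketch of the Krylov approximation with incomplete orthogonalization: each new Arnoldi vector is orthogonalized against only the last q basis vectors. For the symmetric test matrix below, q = 2 coincides with the Lanczos recurrence, so the truncation loses nothing; sizes and names are illustrative:

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm(A, v, m=15, q=2):
    """Approximate exp(A) v from an m-dimensional Krylov subspace built with
    incomplete orthogonalization (each new vector is orthogonalized against
    only the previous q basis vectors; q = m recovers full Arnoldi)."""
    n = len(v)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(max(0, j - q + 1), j + 1):   # truncated MGS recurrence
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                     # happy breakdown
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    # project: exp(A) v ~ beta * V_m exp(H_m) e_1
    return beta * (V[:, :m] @ expm(H[:m, :m])[:, 0])

rng = np.random.default_rng(2)
B = rng.standard_normal((30, 30))
A = (B + B.T) / 10                  # symmetric, moderate norm
v = rng.standard_normal(30)
y = krylov_expm(A, v)
```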