
Footnotes:

1Throughout this book, matrices and vectors in abstract multidimensional space are denoted by upright bold font. Vectors in physical three-dimensional space are instead denoted by italic bold font.
2This problem with polynomial fitting is sometimes called Runge's phenomenon.
3Sometimes called simply "orthogonal", the real version of "unitary".
4A highly abbreviated outline of the SVD is as follows. The M×M matrix STS is symmetric. Therefore it has real eigenvalues di2 which because of its form are non-negative. Its eigenvectors vi, satisfying STSvi=di2vi, can be arranged into an orthonormal set in order of decreasing magnitude of di2. The M eigenvectors can be considered the columns of an orthonormal matrix V, which diagonalizes STS so that VTSTSV = D2 is an M×M non-negative, diagonal matrix with diagonal values di2. Since (SV)T SV is diagonal, the columns of the N×M matrix SV are orthogonal. Its columns corresponding to di=0 are zero and are not useful to us. The useful columns corresponding to non-zero di (i=1,...,L, say, L ≤ M) can be normalized by dividing by di. Then by appending N−L normalized column N-vectors that are orthogonal to all the previous ones, we can construct a complete N×N orthonormal matrix U = [SVD−1L,UN−L]. Here D−1L denotes the M×L diagonal matrix with elements 1/di and UN−L denotes the appended extra columns. Now consider UDVT. The appended UN−L make zero contribution to this product because the lower rows of D which they multiply are always zero. The rest of the product is SVD−1L DLVT=S. Therefore we have constructed the singular value decomposition S=UDVT.
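To make the construction concrete, here is a brief NumPy sketch (illustrative only; none of the variable names come from the text) that builds U, D, and V for a random full-column-rank S by eigen-decomposing S^T S, and confirms that U D V^T reproduces S:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 4
S = rng.standard_normal((N, M))

# Eigen-decompose the symmetric M x M matrix S^T S; eigenvalues are d_i^2 >= 0.
d2, V = np.linalg.eigh(S.T @ S)
order = np.argsort(d2)[::-1]            # arrange in decreasing order of d_i^2
d2, V = d2[order], V[:, order]
d = np.sqrt(np.clip(d2, 0.0, None))

# Columns of S V, normalized by the (here non-zero) d_i, give the first M columns of U.
U_partial = (S @ V) / d                 # N x M, orthonormal columns
# Complete U to N x N with extra orthonormal columns via QR of a padded matrix.
U, _ = np.linalg.qr(np.hstack([U_partial, rng.standard_normal((N, N - M))]))
U[:, :M] = U_partial                    # restore the first M columns (QR may flip signs)

D = np.zeros((N, M))
np.fill_diagonal(D, d)
print(np.allclose(U @ D @ V.T, S))      # True: S = U D V^T
```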
5If M > N, the combination DD−1, which arises from forming SS−1, is not an M×M identity matrix. Instead it has ones only at most for the first N of the diagonal positions, and zeros thereafter. It is an N×N identity matrix with extra zero rows and columns padding it out to M×M. So the pseudo-inverse is a funny kind of inverse which works only one way.
If S is square and nonsingular, then the pseudo-inverse is exactly the same as the (normal) inverse.
If S were a singular square matrix, for example (and possibly in other situations) then at least one of the original dj would be zero. We usually consider the singular values (dj) to be arranged in descending order of size; so that the zero values come at the end. D−1 would then have an element 1/dj that is infinite, and the formal manipulations would be unjustified. What the pseudo-inverse does in these tricky cases is put the value of the inverse 1/dj equal to zero instead of infinity. In that case, once again, an incomplete identity matrix is produced, with extra diagonal zeros at the end. And it actually doesn't completely "work" as an inverse in either direction.
For those who know some linear algebra, what's happening is that the pseudo-inverse projects vectors in the range of the original matrix back into the complement of its nullspace.
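A minimal NumPy illustration of this behaviour (the singular matrix chosen here is purely for the example): the pseudo-inverse is formed by setting 1/dj to zero where dj vanishes, and the products with S are then only incomplete identities.

```python
import numpy as np

S = np.array([[1., 2.],
              [2., 4.]])              # singular: second row = 2 x first row

U, d, Vt = np.linalg.svd(S)
dinv = np.where(d > 1e-12 * d.max(), 1.0 / d, 0.0)   # 1/d_j -> 0 where d_j = 0
S_pinv = Vt.T @ np.diag(dinv) @ U.T

print(np.allclose(S_pinv, np.linalg.pinv(S)))  # True: same as the library pseudo-inverse
print(np.round(S @ S_pinv, 6))                 # an "incomplete" identity
print(np.round(S_pinv @ S, 6))                 # likewise, not the full identity
```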
6See for example, Numerical Recipes, W H Press, B P Flannery, S A Teukolsky, and W T Vetterling, Cambridge University Press (first edition, 1989) (henceforth cited simply as Numerical Recipes) Section 2.9.
7by QR decomposition
8This notation means the first N rows of the compound matrix consist of S, and the next NR rows are λR.
9This regularization is equivalent to what is sometimes called a "smoothing spline". In the limit of large smoothing parameter λ, the function f is a straight line (zero second derivative everywhere). In the limit of small λ, it is a cubic spline interpolation through all the values (xi,yi).
10This topic and many others in data fitting are addressed, for example, in Siegmund Brandt (2014) Data Analysis Statistical and Computational Methods for Scientists and Engineers, Fourth Ed., Springer, New York.
11Of course Fourier analysis is usually approached differently, as briefly discussed in the final chapter. We are here just using sine as an example function, in part because it is familiar.
12There are lots of other correct but more verbose ways of doing it.
13If f is independent of y, then we should use f(x_{n+1/2}) where x_{n+1/2}=(x_{n+1}+x_n)/2, and we then have a formula, accurate to second order, for performing simple integration:
∫_{x_0}^{x_n} f(x) dx = y_n − y_0 = ∑_{k=0}^{n−1} f(x_{k+1/2}) (x_{k+1} − x_k).
There are fancier, higher-order, ways to do integration, but this is by far the most straightforward.
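For instance, a short Python sketch of this midpoint sum (an illustration, not from the text), integrating f(x)=cos x so that the exact answer is sin(x_n) − sin(x_0):

```python
import numpy as np

f = np.cos
x = np.linspace(0.0, 2.0, 21)                 # mesh x_0 ... x_n (need not be uniform)
midpoints = 0.5 * (x[1:] + x[:-1])            # x_{k+1/2}
integral = np.sum(f(midpoints) * np.diff(x))  # sum of f(x_{k+1/2}) (x_{k+1} - x_k)

print(integral, np.sin(2.0) - np.sin(0.0))    # second-order accurate agreement
```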
14The notation O(ϵn) denotes additional terms whose magnitude is proportional to a (usually small) parameter ϵ raised to the power n or higher.
15The proof is rather hard work. Here's an outline. Use notation F(δ x)=f(y(xn+δx),δx) to refer to values of the derivative function along the exact orbit. Suppose we compose
∆y = (1/6 F(0) + 1/3 F(∆/2) + 1/3 F(∆/2) + 1/6 F(∆)) ∆.
This is the form corresponding to eqs. (2.16) and (2.17), but using the exact f on the orbit, rather than the intermediate f(n) estimates. By substituting from eq. (2.11) it is easy but tedious to show that this ∆y is equal to y(x_n+∆) − y_n + O(∆^5); that is, it is an accurate representation of the y-step to fourth-order. Actually it is instructive to realize that the symmetry of the δx positions, 0, ∆/2, and ∆, ensures that all even order errors in this ∆y are zero, while the first-order error is zero because the coefficients sum to unity. The factor of 2 difference between the center and the end coefficients is the choice that makes the third-order error zero. In itself this has not proved the scheme to be fourth-order accurate, because there are non-zero differences approximately proportional to ∂f/∂y between the f values and the F values:
f(1) − F(∆/2) = (∂f/∂y)(y(1) − y(∆/2)),
f(2) − F(∆/2) = (∂f/∂y)(y(2) − y(∆/2)), and
f(3) − F(∆) = (∂f/∂y)(y(3) − y(∆)).
(Because of the order of the y-differences one can quickly see that ∂^2f/∂y^2 terms don't need to be kept.) The successive differences such as (y(1) − y(∆/2)) can be expressed in terms of the Taylor expansion eq. (2.12), and then the f-differences are gathered in the combination of eq. (2.17): 1/6, 1/3, 1/3, 1/6. One can then evaluate the ∂f/∂y terms and demonstrate that they cancel to fourth order.
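The fourth-order behaviour can also be checked numerically. The sketch below (not the algebra of this footnote; it assumes the standard fourth-order Runge-Kutta step) integrates dy/dx = y, whose exact solution is e^x, and shows the error falling by roughly a factor of 16 each time ∆ is halved.

```python
import numpy as np

def f(y, x):
    return y                       # dy/dx = y, exact solution y = exp(x)

def rk4_solve(y0, x_end, delta):
    y, x = y0, 0.0
    while x < x_end - 1e-12:
        f1 = f(y, x)
        f2 = f(y + 0.5 * delta * f1, x + 0.5 * delta)
        f3 = f(y + 0.5 * delta * f2, x + 0.5 * delta)
        f4 = f(y + delta * f3, x + delta)
        y += delta * (f1 / 6 + f2 / 3 + f3 / 3 + f4 / 6)   # weights 1/6, 1/3, 1/3, 1/6
        x += delta
    return y

for delta in [0.1, 0.05, 0.025]:
    err = abs(rk4_solve(1.0, 1.0, delta) - np.exp(1.0))
    print(delta, err)              # error drops by ~16x per halving of delta
```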
16A more general form of bisection in which the interval is divided in two unequal parts weighted by the value of the function, sn=[f(sl)su−f(su)sl]/[f(sl)−f(su)] has comparable robustness, requires no additional function evaluations per step but a couple of extra multiplications, and converges quicker than plain bisection except in special cases. This is sometimes called the "false-position" root-finding method. It is generally possible to detect automatically when one of those slowly converging special cases is encountered and take additional measures.
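A minimal Python sketch of this false-position iteration (illustrative; the bracket-maintenance shown is one simple choice), finding the root of x^3 − 2 between 1 and 2:

```python
def false_position(f, sl, su, tol=1e-10, maxit=100):
    """Find a root of f bracketed by [sl, su] (requires f(sl)*f(su) < 0)."""
    fl, fu = f(sl), f(su)
    for _ in range(maxit):
        sn = (fl * su - fu * sl) / (fl - fu)   # function-weighted division point
        fn = f(sn)                             # the one new evaluation per step
        if abs(fn) < tol:
            break
        if fl * fn < 0:                        # root lies in [sl, sn]
            su, fu = sn, fn
        else:                                  # root lies in [sn, su]
            sl, fl = sn, fn
    return sn

print(false_position(lambda x: x**3 - 2.0, 1.0, 2.0))   # ~1.259921 (= 2**(1/3))
```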
17A shorthand way to remember the result for an order N derivative is that the numerator is the sum of the neighboring y values times the coefficients of a binomial expansion to the power N, and the denominator is ∆xN.
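As a quick illustration of this mnemonic (a sketch, not from the text), the fourth derivative of sin x estimated from binomially weighted neighbours, which should return approximately sin x itself:

```python
import math

def nth_derivative(f, x, n, dx):
    """Centered estimate: binomial-weighted neighbouring values divided by dx**n."""
    total = sum((-1) ** k * math.comb(n, k) * f(x + (n / 2 - k) * dx)
                for k in range(n + 1))
    return total / dx ** n

# d^4/dx^4 sin(x) = sin(x); compare the estimate with the exact value at x = 0.5
print(nth_derivative(math.sin, 0.5, 4, 0.05), math.sin(0.5))
```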
18An alternative approach to implementing boundary conditions is to regard the boundary positions as outside and not operated upon by the difference matrix. Then its indices run only from 2 to N−1, and it has dimensions (N−2)×(N−2). For Dirichlet conditions, the term in the difference stencils of positions 2 and N−1 that contains the boundary value is moved into the right-hand-side source vector. Generically the matrix equation is then

[ −2   1   0  ···   0    0 ] [ y_2     ]   [ g_2 ∆x^2 − y_L     ]
[  1  −2   1  ···   0    0 ] [  ⋮      ]   [  ⋮                 ]
[  ⋮   ⋱   ⋱   ⋱    ⋱   ⋮  ] [ y_n     ] = [ g_n ∆x^2           ]
[  0   0  ···   1  −2    1 ] [  ⋮      ]   [  ⋮                 ]
[  0   0  ···   0   1   −2 ] [ y_{N−1} ]   [ g_{N−1} ∆x^2 − y_R ].
This form keeps the matrix symmetric, which is advantageous for some inversion algorithms.
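A short NumPy sketch of this bookkeeping, assuming the model problem d²y/dx² = g on a uniform mesh with Dirichlet values y_L and y_R (the names and the test source are illustrative):

```python
import numpy as np

N, yL, yR = 21, 0.0, 1.0
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]
g = np.sin(np.pi * x)                              # example source

# Interior unknowns y_2 ... y_{N-1} (0-based indices 1 .. N-2 here).
Ni = N - 2
A = (np.diag(-2.0 * np.ones(Ni)) +
     np.diag(np.ones(Ni - 1), 1) + np.diag(np.ones(Ni - 1), -1))
rhs = g[1:-1] * dx**2
rhs[0] -= yL                                       # boundary terms moved to the RHS
rhs[-1] -= yR

y = np.empty(N)
y[0], y[-1] = yL, yR
y[1:-1] = np.linalg.solve(A, rhs)                  # solve the symmetric tridiagonal system

# Compare with the analytic solution y = -sin(pi x)/pi^2 + x for these boundary values.
print(np.max(np.abs(y - (-np.sin(np.pi * x) / np.pi**2 + x))))
```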
19In some circumstances one can deliberately choose the mesh so as to put the boundary at x3/2. Then the implementation represented by eq.  (3.15) does put the value at the correct place. This mesh choice is appropriate if the boundary condition is known to be of purely derivative form. It then lends itself to the alternative approach of the previous note, excluding the boundary from D. The top left coefficient becomes D22=−1, and ∆x dy/dx|L is added to the source vector, to implement the boundary condition. Mixed boundary conditions are not handled so easily like this, which is why a second-order accurate form with the boundary at x1 has been given.
20The error in the first derivative y′ is approximately y″∆x/2, first-order in ∆x, but the error in ∆y is second-order in ∆x, giving first-order accuracy.
21With periodic boundary conditions the homogeneous equation (d^2y/dx^2 = 0) is satisfied by any constant y. An additional condition must therefore be applied to make the solution unique. Moreover, there exists no solution of the differential equation with continuous derivative unless ∫g dx=0. These requirements are reflected in the fact that the matrix of this system is singular. If (3.19) is solved by pseudo-inverse, it gives the solution having zero mean, ∑_n y_n/N = 0, using as right-hand side, instead of ∆x^2 g, the quantity ∆x^2 (g_n − ∑_n g_n/N).
22On a structured mesh, a finite volume method is identical to the finite difference method provided this conservative differencing is used.
23The subscript q on ρq reminds us this is charge density, not mass density ρ here.
24Vector dependent variable problems are hyperbolic if the matrix of the coefficients of their differentials is diagonalizable with real eigenvalues, as we'll see later.
25Constructed with the distmesh routines from http://persson.berkeley.edu/distmesh/
26For object-oriented purists, this could be a "method".
27In effect, we linearize about an exact solution of the equations and the ψ we then analyse is the (presumed small) difference between the solution we obtain and the exact solution. In the text we avoid encumbrance of the notation by not drawing explicit attention to this fact.
28The Nyquist limit.
29Or, for example, an LU decomposition.
30See for example Numerical Recipes section 2.6
31Octave and Matlab have a simple built-in function to do reordering, called reshape().
32and of LU decomposition and backsubstitution
33We could alternatively have used integral positions and the second-order scheme of eq. (3.16).
34Removing the (first-order) time derivative changes the classification of the equation because it is no longer a differential equation in t at all.
35Jacobi's method actually can be used to solve a rather general matrix equation Bψ=s, and is implemented as ψ(n+1)=D−1(s−Rψ(n)), where D is the diagonal part of B and R=B−D is the rest. Its convergence is guaranteed if B is diagonally dominant, meaning that the absolute value of the diagonal term exceeds the sum of the absolute values of all the other coefficients of the corresponding row.
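For illustration, a minimal Python sketch of this Jacobi iteration applied to a small diagonally dominant system (names chosen for the example only):

```python
import numpy as np

B = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])        # diagonally dominant
s = np.array([1.0, 2.0, 3.0])

D = np.diag(np.diag(B))                   # diagonal part of B
R = B - D                                 # the rest
psi = np.zeros_like(s)
for _ in range(50):
    psi = np.linalg.solve(D, s - R @ psi) # psi_(n+1) = D^-1 (s - R psi_(n))

print(psi, np.linalg.solve(B, s))         # Jacobi result vs direct solve
```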
36For the matrix expert, the advancing matrix is then explicitly "two-cyclic, and consistently ordered".
37A Fourier treatment is not rigorously justified in general, but serves to illustrate simply the most important characteristics.
38See the enrichment section for an outline if you are interested. G D Smith Numerical Solution of Partial Differential Equations, Oxford University Press, 1985, p275ff gives an accessible detailed treatment.
39If we apply the boundary conditions, then the eigenmode is actually sin(jpπ/Nx)sin(kqπ/Ny) but the complex Von Neumann analysis gives the same result.
40In fact, if Nx=Ny and ∆x=∆y, then M=Nx, or alternatively if ∆y >> ∆x, and Nx is not far smaller than Ny, then M ≈ Nx.
41the difference between the "converged" iterative ψ(0) and the analytic ψ(0) normalized to the analytic value.
42If the Jacobian is position-dependent, then the modes are only approximately uncoupled, because our analysis is effectively assuming that J and ∂/∂x commute, which is not exact if J is spatially-varying. The stability analysis is then local, and only approximate. But in any case Von Neumann stability analysis is only approximate in non-uniform cases.
43provided the eigenvectors are linearly independent
44Although this can be proved, it is complicated.
45There is considerable subtlety in the question "how abruptly". After all any velocity change really takes a finite time and involves an acceleration. Why isn't it included in a? For present purposes we'll just finesse this question by saying that the source contains any velocity changes that are not included in the acceleration.
46(not multiple dependent variables that have to be arranged into a vector representation whose eigenvalue might not be real)
47The transport of radiation, of photons, can also be treated in this way, but their treatment has to be expressed in terms of energy (or momentum) rather than speed, since all photons travel at the same speed.
48Also there may be other spontaneous decays or fissions (delayed neutrons) but we'll ignore them for now.
49Most reactor physics texts express the velocity distribution differently, in terms of flux density ϕ ≡ v f, as the dependent variable. To retain the more general relevance to solving Boltzmann's equation, we here keep f as the dependent variable. The result is still equivalent to standard reactor physics equations.
50Although there we might have to include self-collisions.
51In practice delayed neutrons introduce a severe complication.
52Computational size reduction, retaining spatial dependence, might call for a very small number of speed groups. Perhaps even a single group NG=1. Then the neutron transport is reduced to a single diffusion equation.
53And the boundary conditions are also homogeneous. Of course "homogeneous" does not here mean uniform in space; it means having no constant terms.
54Strictly speaking, the inverse of the eigenvalues of the matrix (νΣfV)−1(−LtV−ΣsV).
55A generalized eigenvalue is the solution to (A−λB)v=0 where both A and B are general matrices.
56Actually if it were the largest eigenvalue of M, for a single group or diagonal G we could directly use the "power method", which is eq. 9.16 with the F-indices (n, n+1) exchanged. However, because we want the smallest eigenvalue 1/k, we must effectively invert M (which requires interior iteration) because M is never diagonal.
57e.g.  SOR
58See e.g. Alain Hébert (2009) "Applied Reactor Physics", Presses internationales Polytechnique, www.polymtl.ca. Actually many other choices of weighting will work as well, not just (GF(n+1))T.
59The zeroes in the non-diagonal matrices arise because scatterings hardly ever move neutrons to higher energy or reduce the energy by more than one group, and fissions give rise only to energetic neutrons. The values given are roughly appropriate for a pressurized water reactor.
60In reactor physics B is called the "Buckling".
61Alder and Wainwright, "Phase transition for a hard sphere system", J. Chem. Phys. 27, 1208-1209, (1957). Liquid behavior is an important phenomenon for which Molecular Dynamics is useful.
62For fixed ∆t, the Verlet and leap-frog schemes are equivalent, with the identification vn=(vn−1/2+vn+1/2)/2. The Verlet scheme, implemented in terms of velocity as in eq. (10.1), requires more storage or more acceleration evaluations: because two values of a are needed. It can be implemented in terms of the position advance alone as xn+1=2xn−xn−1+a∆t2, which requires the same storage as leap-frog.
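The equivalence is easy to confirm numerically. This sketch (illustrative; it assumes a simple harmonic oscillator with a = −x) advances the same initial condition with the position-only Verlet form and with leap-frog, and the two final positions agree to rounding error.

```python
dt, nsteps = 0.05, 200
acc = lambda x: -x                             # harmonic oscillator, a = -x
x0, v0 = 1.0, 0.0

# Position-only Verlet: x_{n+1} = 2 x_n - x_{n-1} + a(x_n) dt^2
x_prev = x0
x_now = x0 + v0 * dt + 0.5 * acc(x0) * dt**2   # Taylor step for x_1
for _ in range(nsteps - 1):
    x_prev, x_now = x_now, 2 * x_now - x_prev + acc(x_now) * dt**2

# Leap-frog: velocities held at half-integer steps
x, v = x0, v0 + 0.5 * dt * acc(x0)             # v_{1/2}
for _ in range(nsteps):
    x += v * dt                                # x_{n+1}
    v += dt * acc(x)                           # v_{n+3/2}

print(x_now, x)                                # identical to rounding error
```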
63Since the step cost for a neighboring domain of size ∝ Nl is ∝ NpNl3, and the step cost for the neighbor update is ∝ Np2/Nl, a formal optimum occurs when these are equal Nl ∝ Np1/4. So formally, the per-step cost of this Nl-optimized algorithm would be ∝ NpNp3/4: slightly better than Np2.
64See for example C K Birdsall and A B Langdon (1991) Plasma Physics via Computer Simulation, IOP Publishing, Bristol; or R W Hockney and J W Eastwood (1988) Computer Simulation using Particles, Taylor and Francis, New York
65See for example the SOR estimates.
66Siegmund Brandt (2014) Data Analysis Statistical and Computational Methods for Scientists and Engineers, Fourth Ed., Springer, New York, gives a much more expansive introduction to statistics and Monte Carlo techniques.
67Statisticians use the generic word "sample" to refer to the particular result of a single test or measurement.
68Division by the factor N−1 rather than N makes this formula an unbiassed estimate of the distribution variance. One way to understand this is to recognize that the number of degrees of freedom of ∑_{i=1}^{N} (vi−μN)^2 is N−1, not N. Using N−1 is sometimes called "Bessel's correction".
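A quick numerical illustration (not from the text): averaging both versions of the estimator over many small samples drawn from a distribution of known variance shows that only the N−1 form is unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
true_var, N, trials = 4.0, 5, 200000
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))

mu_N = samples.mean(axis=1, keepdims=True)
ssq = ((samples - mu_N) ** 2).sum(axis=1)

print(np.mean(ssq / (N - 1)))   # ~4.0: unbiased (Bessel's correction)
print(np.mean(ssq / N))         # ~3.2: biased low by the factor (N-1)/N
```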
69Often called the "expectation" of v
70Central Limit Theorem. It is not straightforward to prove that the distribution becomes Gaussian. But it is fairly easy to show that the variance of the sample mean is the variance of the distribution divided by N. From the definition of μN one can immediately deduce that
(μN − μ)^2 = [ (1/N) ∑_{i=1}^{N} (vi − μ) ]^2 = (1/N^2) ∑_{i,j}^{N} (vi − μ)(vj − μ).
Take the expectation 〈... 〉 of this quantity to obtain the variance of the distribution of sample means:
〈(μN − μ)^2〉 = (1/N^2) ∑_{i,j}^{N} 〈(vi − μ)(vj − μ)〉 = (1/N^2) ∑_{i}^{N} 〈(vi − μ)^2〉 = S^2/N.
The first equality, taking the expectation inside the sum, is a simple property of taking the expectation: the expectation of a sum is the sum of the expectations. The second equality uses the fact that 〈(vi−μ)(vj−μ)〉 = 0 for i ≠ j because the quantities (vi−μ) are statistically independent and have zero mean. That is sufficient to yield the required result. Our estimate for the distribution variance is S2=SN2. So the unbiassed estimate for the variance of μN is 〈(μN−μ)2〉 = SN2/N. The standard error is the square root of this quantity.
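The S^2/N result is easily checked by simulation. In this sketch (illustrative only), many sample means of N uniform variates, whose distribution variance is 1/12, are generated, and the observed variance of the means is compared with the distribution variance divided by N.

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 25, 100000
v = rng.uniform(0.0, 1.0, size=(trials, N))   # distribution variance = 1/12

sample_means = v.mean(axis=1)
print(sample_means.var())          # observed variance of the sample means
print((1.0 / 12.0) / N)            # distribution variance / N (squared standard error)
```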
71An un-normalized Gaussian distribution has three, including the height.
72Linear interpolation is then equivalent to representing pv as a histogram. So adequate resolution may require a fairly large number Nt.
73Quiet start and quasi-random selection. When starting a particle-in-cell simulation, the initial positions of the particles might be chosen using random numbers to decide their location. However, they then will have density fluctuations of various wavelengths that in plasmas may be bigger than are present after running the simulation for many steps. The reason for this discrepancy is that the feedback effect of the self-consistent electric potential tends to smooth out density fluctuations, so that in the fully developed simulation the noise level is lower than purely random. A simple way of saying the same thing is that individual particles repel others of the same type, preventing clumping of the particles. It is therefore often physically reasonable to start a PIC simulation with positions that are chosen to be more evenly spaced than purely random. Indeed, for some calculations it is advantageous (but non-physical) to start with density fluctuations even lower than the final level that would be present for a steady plasma. In either case, what is called a "Quiet Start" can be obtained by using what are called "quasi-random" numbers [see e.g. Numerical Recipes] instead of (pseudo) random numbers. Quasi-random numbers are somewhat random, but much smoother in their distribution because each new number takes account of the already used numbers and tries to avoid being close to them. Successive numbers are thus correlated rather than uncorrelated. For Monte-Carlo integration, such smoother distributions in space are also often highly appropriate and can give lower-noise results for the same number of samples, beating the weak, 1/√N, decrease of fractional error.
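As an illustration of the idea (not the text's own implementation), the following sketch draws particle positions from a van der Corput sequence, one simple kind of quasi-random number, and compares the bin-to-bin density fluctuations with those of a purely pseudo-random start.

```python
import numpy as np

def van_der_corput(n, base=2):
    """First n terms of the base-`base` van der Corput quasi-random sequence."""
    seq = []
    for i in range(1, n + 1):
        k, x, denom = i, 0.0, 1.0
        while k > 0:
            denom *= base
            k, rem = divmod(k, base)
            x += rem / denom
        seq.append(x)
    return np.array(seq)

Np, nbins = 10000, 20
quiet = van_der_corput(Np)                          # quasi-random positions in [0,1)
noisy = np.random.default_rng(3).uniform(size=Np)   # pseudo-random positions

for name, pos in [("quasi-random", quiet), ("pseudo-random", noisy)]:
    counts, _ = np.histogram(pos, bins=nbins, range=(0.0, 1.0))
    print(name, counts.std())   # bin-count fluctuations: far smaller for the quiet start
```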
74Or if we wish to do "quiet injection" that is smoother than purely random
75The Discrete Poisson distribution. Suppose there is a very large number N of similar, uncorrelated, events (for example radioactive decays of atoms) waiting to occur. In a time duration far shorter than the waiting time (e.g. the half-life) of an individual event, the probability that any one of them occurs is small, say p = r/N. Then r is the average total number of events occurring in this time duration. The total number of events actually occurring in a particular time sample is then an integer, and we want to find the probability of each possible integer. Any sample consists of N choices about the individual events: yes or no. The number of yes events is distributed as a Binomial distribution, in which the probability of n yes events is given by the number of different ways to choose n out of N, which is
N!/[n!(N−n)!], times the probability of each specific arrangement of n yes events and N−n no events, p^n (1−p)^{N−n}. The total is
pn = {N!/[n!(N−n)!]} p^n (1−p)^{N−n} = (r^n/n!) (1−r/N)^N [ N! / (N^n (1−r/N)^n (N−n)!) ].
Now recognize that the limit for large N (but constant n) of the square bracket term is 1; while the limit of the term (1−r/N)^N is exp(−r). Therefore the probability of obtaining n total events, of a type that are completely uncorrelated (N→ ∞), when their average rate of occurrence is r is
pn = (r^n/n!) exp(−r).
This is the discrete Poisson distribution.
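A small numerical check of this limit (an illustrative sketch): binomial probabilities with large N and p = r/N approach the Poisson form r^n exp(−r)/n!.

```python
import math

r, N = 3.0, 10000
p = r / N
for n in range(6):
    binom = math.comb(N, n) * p**n * (1 - p)**(N - n)   # binomial probability
    poisson = r**n * math.exp(-r) / math.factorial(n)   # Poisson limit
    print(n, round(binom, 6), round(poisson, 6))
```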
76Or double-precision if N is very large.
77The random generators I used here are both portable. The good generator is the RANLUX routine described by M. Luscher, Computer Physics Communications 79 (1994) 100, and F. James, Computer Physics Communications 79 (1994) 111, used with the lowest luxury level. The bad generator is the RAN1 routine from the first (FORTRAN) edition of Numerical Recipes (1989). It was replaced by better routines in later editions.
78For photons, the "velocity" should be interpreted as a combination of energy (frequency or wavelength) and propagation direction, since the particle's speed is always the speed of light.
79The velocity distribution fk in discrete velocity bins d3vk may also be desired. It is given by summing only those passages that occur in the velocity bin:
fk d^3v_k = ∑_{vi ∈ d^3v_k} ∆ti. It will have greater statistical uncertainty because of fewer samples per bin.
80One such book with lots of engineering examples is Steven C Chapra and Raymond P Canale (2006) (5th Ed, or later) "Numerical Methods for Engineers" McGraw-Hill, NY.
81The volume of a tetrahedron can be written as a 4×4 determinant using the cartesian coordinates (xk,yk,zk) of its four corners (k=1,4) as
V = (1/6) | 1  x1  y1  z1 |
          | 1  x2  y2  z2 |
          | 1  x3  y3  z3 |
          | 1  x4  y4  z4 | .
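For example, a brief NumPy check (illustrative only) for the tetrahedron with corners at the origin and at unit distance along each axis, whose volume is 1/6:

```python
import numpy as np

corners = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])

mat = np.hstack([np.ones((4, 1)), corners])  # rows (1, x_k, y_k, z_k)
volume = np.linalg.det(mat) / 6.0            # sign depends on the corner ordering
print(abs(volume))                           # 1/6 for this tetrahedron
```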






82These important details are discussed in texts devoted to finite elements, such as, Thomas J R Hughes (1987) "The Finite Element Method" Prentice Hall, Englewood Cliffs, NJ. The classic text on the background mathematical theory is, G Strang and G J Fix (1973, 2008) "An Analysis of the Finite Element Method", Reissued by Wellesley-Cambridge, Press, USA.
83Since f is periodic, its value at j=0 is the same as at j=N, so the range is equivalent to 1,...,N.
84If N is odd, then the highest frequency is obviously (N−1)/2, but we'll avoid reiterating this alternative.
85ϵ is a small adjustment of the end point to avoid it falling on a delta function.
86An extensive discussion is given in W H Press et al, "Numerical Recipes", chapter 12. Various open source implementations are available.
87A non-zero x0 can be removed by reexpressing the equation in terms of x′=x−x0.
88The present derivation draws on C P Jackson and P C Robinson (1985) "A numerical study of various algorithms related to the preconditioned conjugate gradient method", International Journal for Numerical Methods in Engineering, Vol 21, pp 1315-1338, and G Markham (1990) "Conjugate Gradient Type Methods for Indefinite, Asymmetric, and Complex Systems" IMA Journal of Numerical Analysis, Vol 10, pp 155-170.
89This is a type of Gram-Schmidt basis-set construction.
90Succinct descriptions of a wide range of iterative algorithms are given by R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. Van der Vorst (1994) "Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods" 2nd Edition, SIAM, Philadelphia, which is available at http://www.netlib.org/linalg/html_templates/report.html. Open source libraries of implementations of Krylov schemes are widely available. One of the more comprehensive is "SLATEC" at http://www.netlib.org/slatec/.
91 Bi-conjugate gradient iterations are implemented using two residual and two search-direction vectors, requiring just one multiplication by A and one by A^T per step, in this way:
1. Initialize: choose x_0, set r_0 = b − A x_0 and p_0 = r_0; choose r̄_0 and set p̄_0 = r̄_0; set k = 1.
2. Calculate α: α_{k−1} = r̄_{k−1}^T r_{k−1} / p̄_{k−1}^T A p_{k−1}.
3. Update rs, x: r_k = r_{k−1} − α_{k−1} A p_{k−1},   r̄_k = r̄_{k−1} − α_{k−1} A^T p̄_{k−1},   x_k = x_{k−1} + α_{k−1} p_{k−1}.
4. Calculate β: β_{k,k−1} = −r̄_k^T r_k / r̄_{k−1}^T r_{k−1}.
5. Update ps: p_k = r_k − β_{k,k−1} p_{k−1}   and   p̄_k = r̄_k − β_{k,k−1} p̄_{k−1}.
6. Convergence? If not converged, increment k and repeat from 2.
The conjugate gradient scheme is the same except no barred quantities are calculated and unbarred quantities replace them.
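A compact Python sketch of these steps (illustrative, not a library routine; it uses a simple residual-norm test for convergence and follows the sign conventions written above), applied to a small nonsymmetric system:

```python
import numpy as np

def bicg(A, b, x0, tol=1e-10, maxit=200):
    x = x0.copy()
    r = b - A @ x                         # residual r_0
    rbar = r.copy()                       # shadow residual, chosen here as r_0
    p, pbar = r.copy(), rbar.copy()
    for _ in range(maxit):
        Ap, ATpbar = A @ p, A.T @ pbar    # one multiply by A and one by A^T per step
        alpha = (rbar @ r) / (pbar @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        rbar_new = rbar - alpha * ATpbar
        if np.linalg.norm(r_new) < tol:
            return x
        beta = -(rbar_new @ r_new) / (rbar @ r)   # sign convention as in the steps above
        p = r_new - beta * p
        pbar = rbar_new - beta * pbar
        r, rbar = r_new, rbar_new
    return x

A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])           # small nonsymmetric test matrix
b = np.array([1.0, 2.0, 3.0])
print(bicg(A, b, np.zeros(3)), np.linalg.solve(A, b))   # the two should agree
```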
92A compact mathematical description is given, for example, by Stephen Jardin (2010) "Computational Methods for Plasma Physics" CRC press, Taylor and Francis, Boca Raton. Section 3.7.2.
93The number of iterations required to converge to a given accuracy for separated eigenvalues is approximately proportional to the square root of the matrix condition number. The condition number is the ratio of the maximum to the minimum eigenvalue. That is unity for the unit matrix because all eigenvalues are 1. For a matrix representing an isotropic finite-difference Laplacian, the eigenvalues are the squared wave numbers of the spatial Fourier modes. The condition number is therefore the sum of the squares of the number of mesh points in each dimension. The number of iterations required is therefore proportional to the mesh count of the largest dimension. That is the same as optimized SOR. Finding the optimum relaxation parameter for SOR is relatively easy for a rectangular mesh on a cubical domain. Consequently un-preconditioned Krylov methods for finite-difference equations in simple domains are generally not much faster than SOR. Krylov methods come into their own when the matrices are more complicated and optimizing SOR is impossible, for example for finite element methods on unstructured grids.
94There are some problems in liquids where sound waves are important; just a minority.
95This equation arises from substituting ρ = constant into the (sourceless) continuity equation, so
0 = ∂ρ/∂t = −∇.(ρv). If ∇ρ is non-zero, so ∇.(ρv) is non-zero even though ∇.v=0, then the pressure equation acquires an additional term.
96See, e.g. J H Ferziger and M Perić (2002) "Computational Methods for Fluid Dynamics" third edition, Springer, Berlin.
97See, e.g. S Jardin (2010) op cit.
98A comprehensive introduction to limiter methods is given in Randall J Leveque (2002) "Finite Volume Methods for Hyperbolic Problems", Cambridge University Press.
99A large jet aircraft may reach 10^9.
100See for example U. Piomelli (1999) "Large-eddy simulation: achievements and challenges" Progress in Aerospace Sciences, Vol 35, pp 335-362
101or the product of two different fluctuating quantities
102See for example B E Launder, G J Reece, and W Rodi (1975) "Progress in development of a Reynolds-stress turbulence closure", Journal of Fluid Mechanics, Vol 68, pp 537-566.
103Of course there are deeper ways to think about vector spaces, but we here use the simplest approach.
104So a column vector can be considered a J×1 matrix and a row vector a 1×J matrix.
105Consider recursively expanding the determinant and subsequent cofactors, always choosing not to expand by the rows that are identical. Eventually one ends up with all 2×2 cofactors composed of the identical rows. They are all zero.
106A left inverse AL such that ALA=I must always be equal to a right inverse AR such that AAR=I because AL=ALI=AL(AAR) = (ALA)AR=IAR=AR.