Programming the Commodore PET

My first programming languages were Fortran, learned in my mathematics degree, and Basic, which was the language built into the ROM (read only memory) of the Commodore PET. The PET—one of the early microcomputers, first produced in 1977—stored programs on cassette tapes, using its integral cassette deck. In those days you could buy programs on tape, at what today would seem very high prices, but people mostly wrote their own programs or typed them in from a magazine.

Many computer magazines existed that published (and paid for) programs written by hobbyists. Readers would happily type in a page or two of code in order to try the programs out. The two most popular categories of programs were games and utilities. I recently came across a file in which I’d kept a page from the January 1982 issue of Your Computer magazine containing one of my first published programs. The whole page is here in PDF form.


The purpose of the Screenprint program is to print the contents of the PET’s screen to an attached Commodore printer—something there was no built-in way to do. The pictures underneath the listing are screen dumps of the output from another program I wrote called Whirly Art.

Some explanation of the code is needed.

  • In Commodore Basic (a variant of Microsoft Basic) , line numbers are essential and are the target for a goto statement, which jumps to the specified line. Notice that the line numbers here are spaced about 10 apart: this is to allow for insertion of intermediate lines as the code is developed. There was no standard way to renumber the lines of a program, but I think I later acquired an add-on ROM that could do this.
  • There are no spaces within any non-comment line of code. Today this would be terrible coding practice, but each space took one byte of storage and in those days storage was at a premium. This bit of code was intended to be appended to another program, so it was important that it occupy as little memory as possible. The Basic interpreter was able to recognize keywords IF, THEN, FOR, etc., even when they were not surrounded by spaces.
  • The PEEK command reads a byte from RAM (random access memory) and the POKE commands writes a byte to RAM. CHR$(A) is a string character with ASCII value A.

I was surprised to find that scans of many issues of Your Computer are available on the Internet Archive. Indeed that archive contains a huge range of material on all topics, including a selection of books about the Commodore PET, some of which are in my loft but which I have not looked at for years.

Posted in software | Tagged | Leave a comment

Publication Peculiarities: More Author Lists

In an earlier post in my series of posts on publication peculiarities, I wrote about author lists. Here are some more offerings on the same topic.

Number of Authors

A contender for the world record for the paper with the greatest number of authors is the 5,154-author paper

G. Aad, B. Abbott, J. Abdallah, O. Abdinov et al., Combined measurement of the Higgs boson mass in pp Collisions at \sqrt{s}=7 and 8 TeV with the ATLAS and CMS experiments. Phys. Rev. Lett., 114, 191803, 2015.

It comprises 8.5 pages of text and 24.5 pages of author list and author addresses.

Names that Relate to the Paper Title

New Scientist magazine used the term nominative determinism for the tendency for authors to gravitate to fields of research related to their surname. (See this article for more background on the term.)

A. G. Cock, Genetical Studies on Growth and Form in the Fowl, Genetical Research 4, 167-192, 1963.

A. J. Splatt and D. Weedon, The Urethral Syndrome: Experience with the Richardson Urethroplasty, British Journal of Urology 49, 173-176, 1977.

Zhian Sun and Keith Shine, Studies of the Radiative Properties of Ice and Mixed-Phase Clouds, Q. J. R. Meteorol. Soc. 120, 111-137, 1994.


It’s not hard to find examples of husband-wife co-authors. Other relations are less common.


Nicholas J. Higham and Desmond J. Higham, Large Growth Factors in Gaussian Elimination with Pivoting, SIAM J. Matrix Anal. Appl., 10, 155-164, 1989.

Father (second author) and son (first author):

Alex Olshevsky and Vadim Olshevsky, Kharitonov’s Theorem and Bezoutians, Linear Algebra Appl., 399 (1), 285-297, 2005.

Michael Stewart and G. W. Stewart, On Hyperbolic Triangularization: Stability and Pivoting, SIAM J. Matrix Anal. Appl., 19, 847-860, 1998

Mother (Alicja) and daughter (Agata):

Alicja Smoktunowicz, Agata Smoktunowicz, and Ewa Pawelec, The three-term recursion for Chebyshev polynomials is mixed forward-backward stable, Numerical Algorithm, 69(4), 785–794, 2015.

Grandfather (Walter) and grandson (Daniel):

Walter Ledermann, Carol Alexander and Daniel Ledermann, Random Orthogonal Matrix Simulation, Linear Algebra Appl. 434, 1444-1467, 2011

Rhyming Names

G. E. P. Box and D. R. Cox, An Analysis of Transformations, Journal of the Royal Statistical Society. Series B (Methodological) 26, 211-252, 1964.

Pronounced the same but spelled differently:

Peter D. Burns and Roy S. Berns, Error Propagation Analysis in Color Measurement and Imaging, Color Research & Application 22, 280-289, 1997

The latter paper also has the distinction of having a DOI that I cannot get to parse correctly in this post: 10.1002/(SICI)1520-6378(199708)22:4<280::AID-COL9>3.0.CO;2-L

Names that are Colours

R. A. Brown and C. H. Green, Threats to Health or Safety: Perceived Risk and Willingness-To-Pay, Social Science & Medicine. Part C: Medical Economics 15, 67-75, 1981.

Esther Black and Craig White, Fear of Recurrence, Sense of Coherence and Posttraumatic Stress Disorder in Haematological Cancer Survivors, Psycho-Oncology 14, 510-515, 2005


The next article includes the unusual combination of Wright and Wrong.

S. Levi, C. T. Dollery, S. R. Bloom, J. Calam, T. M. Cox, H. J. F. Hodgson, M. S. Losowsky, M. B. Pepys, N. A. Wright, and O. M. Wrong, Campylobacter Pylori, Duodenal Ulcer Disease, and Gastrin, BMJ 299, 1093-1094, 1989.

Ones That Got Away

Many years ago Ron Mitchell, of the University of Dundee, told me that there was a report or paper by Collar and Tie, the first author presumably being the engineer Arthur Roderick Collar. I have not being able to locate this publication.

Fictitious Authors

There are a number of examples of fictitious authors with amusing names being included on papers. I will not try to document any here, but point to The true story of Stronzo Bestiale (and others scientific jokes) for some examples.


Thanks to Des Higham for pointing out Box and Cox, Brown and Green, and Cock, which are taken from Learning LaTeX (page 40) by D. F. Griffiths and D. J. Higham.

Posted in publication peculiarities | Leave a comment

PeerJ Computer Science: A New Publishing Experience


The download button and the button to (un)follow an article.

PeerJ Computer Science began operation in early 2015. I’ve just published a paper in the journal, and am also an editor of it. PeerJ Computer Science does a lot of things differently than journals that I’ve published with before, so I thought it would be useful to explain what is different and novel about it.

PeerJ Computer Science is an open access journal whose articles are published under a Creative Commons Attribution license. As with its older sister journal Peerj (life, biological and health sciences), authors pay a fee to publish. I won’t discuss the pricing details, which can be found here, but just note that many institutions already have a publishing plan with PeerJ, which means that authors at those institutions will find their articles are pre-paid. This was the case for me, though I had to pay a $99 fee for my co-author (a PhD student).

The editorial board is large: over 300 editors, each of whom has to choose which subjects, from a given list, match their research interests. There is no Editor in-Chief. When a paper is submitted, the PeerJ office contacts all editors whose interests match one of the subjects chosen by the submitting authors, asking if they would like to edit the paper. If no editor responds, after reminders if necessary, the editorial office regards the paper as either not in the scope of the journal or insufficiently interesting and returns it to the authors—though I believe this is not a common occurrence.

The journal’s aims and scope statement says

“PeerJ Computer Science only considers Research Articles … PeerJ Computer Science evaluates articles based only on an objective determination of scientific and methodological soundness, not on subjective determinations of ‘impact’ or ‘readership’ for example.”

In every other journal in which I’ve published, significance and likely impact are criteria for acceptance, so PeerJ Computer Science is very different in this regard. It’s too early for me to say how these easier-to-satisfy criteria affect the refereeing and editing process.

The coverage of the journal is defined by its list of categories. Closest to my interests are Algorithms and Analysis of Algorithms, Data Science, Optimization Theory and Computation, Scientific Computing and Simulation, and Software Engineering. Each of these categories forms an overlay, pulling out a subset of papers from the journal as a whole that lie in the given area.

The journal aims for a fast turnaround. Referees are given 2 weeks or 3 weeks (the editor chooses) to provide a report. PeerJ Computer Science gets a first decision back in about a month.

Here are some of the things I like about PeerJ Computer Science.

  • The process of submission, refereeing, and editing has been designed to be web-based, and it is very nice to use. The submitting author has a lot of information to complete, but much of it relates to the journal’s policies: funding sources, competing interests, and data availability must be entered, along with a description of what contribution each author made to the paper (something new for me, but standard in many areas of science).
  • I spend a lot of time on journal websites trying to find the “download PDF” button and the “export citations” button, which seem to be in a different place on every site. PeerJ has a delightfully simple big blue Download button (see the image at the top of the page): click it, and you can select what you want to download, be it PDF, BibTeX, or something else. What a brilliant idea!
  • All authors are given the opportunity to set up an author profile page, which provides a link to their PeerJ articles as well their website, Twitter account, GitHub account, ORCID, etc. See my profile and my co-author Weijian Zhang’s profile.
  • The journal has a thorough set of policies dealing with all aspects of ethics and procedures. These are laid out extremely clearly.
  • Once your paper is published you get access to a personal “To-do list” web page with ideas for how to publicize your paper, which include Tweeting about it and emailing colleagues, with one click producing a partially completed Tweet or email. The page explains “Why promote your work?” and records which To-dos have been done.
  • While the published paper is frozen, the page it sits on (pointed to by the DOI) is not. As an author you are able to add links at the end of the page, perhaps to blog posts, updated software, or follow-on work.
  • Referee reports and author responses can be made publicly available via a “Peer Review History” link, provided both authors and referees agree. The history for my paper is here. The original submission can also be downloaded.
  • The PeerJ staff read the reviews as they come in, and flag anything that might be problematic with the editor, such as an inadequate review. This is a great idea. I do enough editing, for various journals, that usually I look at reports only when they are all in. Early notification of issues with a review can shorten the time to a decision being made on a paper.
  • PeerJ uses LaTeX, and high quality PDF files can be downloaded. Papers are displayed in the browser in html form, with MathJaX used for the mathematics (see this example); they look very similar to the PDF version.
  • While PeerJ does not copy edit manuscripts, it does put them into the journal format and copy edit references into the journal style. The experience from my one published paper was very positive, and included a short email exchange about how best to format one reference. After having had a poor experience with copy editing and production at a commercial publisher recently, I found the PeerJ proofing stage a pleasure.
  • For readers, the web site works exactly as you would hope. Searches are fast and accurate. Thanks to the responsive web coding, papers can be read in html form comfortably on an iPhone.
  • The journal integrates with a preprint server, PeerJ Preprints, which supports versioning. Authors can start a submission at either PeerJ Preprints or PeerJ Computer Science and then export their submission to the other, retaining metadata. I have not tried PeerJ Preprints, but it offers an interesting alternative to the ArXiv or an institutional preprint server.
  • With its Paper Now experiment, PeerJ allows you to create, edit, and display a journal article entirely in GitHub. For an example see this GitHub article and this corresponding PeerJ article.

In summary, PeerJ Computer Science is completely different from all the journals I have previously published in or edited for, but I am impressed by what I’ve seen. By rethinking how a journal should be managed and published in the 21st century, the PeerJ team have brought some fresh ideas into this domain of academic publishing.

Posted in publishing | Leave a comment

Improved MATLAB Function Sqrtm


The MATLAB function sqrtm, for computing a square root of a matrix, first appeared in the 1980s. It was improved in MATLAB 5.3 (1999) and again in MATLAB 2015b. In this post I will explain how the recent changes have brought significant speed improvements.

Recall that every n-by-n nonsingular matrix A has a square root: a matrix X such that X^2 = A. In fact, it has at least two square roots, \pm X, and possibly infinitely many. These two extremes occur when A has a single block in its Jordan form (two square roots) and when an eigenvalue appears in two or more blocks in the Jordan form (infinitely many square roots).

In practice, it is usually the principal square root that is wanted, which is the one whose eigenvalues lie in the right-half plane. This square root is unique if A has no real negative eigenvalues. We make it unique in all cases by taking the square root of a negative number to be the one with nonnegative imaginary part.

The original sqrtm transformed A to Schur form and then applied a recurrence of Parlett, designed for general matrix functions; in fact it simply called the MATLAB funm function of that time. This approach can be unstable when A has nearly repeated eigenvalues. I pointed out the instability in a 1999 report A New Sqrtm for MATLAB and provided a replacement for sqrtm that employs a recurrence derived specifically for the square root by Björck and Hammarling in 1983. The latter recurrence is perfectly stable. My function also provided an estimate of the condition number of the matrix square root.

The importance of sqrtm has grown over the years because logm (for the matrix logarithm) depends on it, as do codes for other matrix functions, for computing arbitrary powers of a matrix and inverse trigonometric and inverse hyperbolic functions.

For a triangular matrix T, the cost of the recurrence for computing T^{1/2} is the same as the cost of computing T^{-1}, namely n^3/3 flops. But while the inverse of a triangular matrix is a level 3 BLAS operation, and so has been very efficiently implemented in libraries, the square root computation is not in the level 3 BLAS standard. As a result, my sqrtm implemented the Björck–Hammarling recurrence in M-code as a triply nested loop and was rather slow.

The new sqrtm introduced in MATLAB 2015b contains a new implementation of the Björck–Hammarling recurrence that, while still in M-code, is much faster. Here is a comparison of the underlying function sqrtm_tri (contained in toolbox/matlab/matfun/private/sqrtm_tri.m) with the relevant piece of code extracted from the old sqrtm. Shown are execution times in seconds for random triangular matrices an a quad-core Intel Core i7 processor.

n sqrtm_tri old sqrtm
10 0.0024 0.0008
100 0.0017 0.014
1000 0.45 3.12

For n=10, the new code is slower. But for n=100 we already have a factor 8 speedup, rising to a factor 69 for n=1000. The slowdown for n=10 is for a combination of two reasons: the new code is more general, as it supports the real Schur form, and it contains function calls that generate overheads for small n.

How does sqrtm_tri work? It uses a recursive partitioning technique. It writes

T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \\ \end{bmatrix}

and notes that

T^{1/2} = \begin{bmatrix} T_{11}^{1/2} & X_{12} \\ 0 & T_{22}^{1/2} \\ \end{bmatrix},

where T_{11}^{1/2} X_{12} + X_{12} T_{22}^{1/2} = T_{12}. In this way the task of computing the square root of T is reduced to the tasks of recursively computing the square roots of the smaller matrices T_{11} and T_{22} and then solving the Sylvester equation for X_{12}. The Sylvester equation is solved using an LAPACK routine, for efficiency. If you’d like to take a look at the code, type edit private/sqrtm_tri at the MATLAB prompt. For more on this recursive scheme for computing square roots of triangular matrices see Blocked Schur Algorithms for Computing the Matrix Square Root (2013) by Deadman, Higham and Ralha.

The sqrtm in MATLAB 2015b includes two further refinements.

  • For real matrices it uses the real Schur form, which means that all computations are carried out in real arithmetic, giving a speed advantage and ensuring that the result will not be contaminated by imaginary parts at the roundoff level.
  • It estimates the 1-norm condition number of the matrix square root instead of the 2-norm condition number, so exploits the normest1 function.

Finally, I note that the product of two triangular matrices is also not a level-3 BLAS routine, yet again it is needed in matrix function codes. A proposal for it to be included in the Intel Math Kernel Library was made recently by Peter Kandolf, and I strongly support the proposal.

Posted in research | Tagged , | Leave a comment

The Top 10 Algorithms in Applied Mathematics


From “Computational Science” by David E. Keyes in Princeton Companion to Applied Mathematics

In the January/February 2000 issue of Computing in Science and Engineering, Jack Dongarra and Francis Sullivan chose the “10 algorithms with the greatest influence on the development and practice of science and engineering in the 20th century” and presented a group of articles on them that they had commissioned and edited. (A SIAM News article by Barry Cipra gives a summary for anyone who does not have access to the original articles). This top ten list has attracted a lot of interest.

Sixteen years later, I though it would be interesting to produce such a list in a different way and see how it compares with the original top ten. My unscientific—but well defined— way of doing so is to determine which algorithms have the most page locators in the index of The Princeton Companion to Applied Mathematics (PCAM). This is a flawed measure for several reasons. First, the book focuses on applied mathematics, so some algorithms included in the original list may be outside its scope, though the book takes a broad view of the subject and includes many articles about applications and about topics on the interface with other areas. Second, the content is selective and the book does not attempt to cover all of applied mathematics. Third, the number of page locators is not necessarily a good measure of importance. However, the index was prepared by a professional indexer, so it should reflect the content of the book fairly objectively.

A problem facing anyone who compiles such a list is to define what is meant by “algorithm”. Where does one draw the line between an algorithm and a technique? For a simple example, is putting a rational function in partial fraction form an algorithm? In compiling the following list I have erred on the side of inclusion. This top ten list is in decreasing order of the number of page locators.

  1. Newton and quasi-Newton methods
  2. Matrix factorizations (LU, Cholesky, QR)
  3. Singular value decomposition, QR and QZ algorithms
  4. Monte-Carlo methods
  5. Fast Fourier transform
  6. Krylov subspace methods (conjugate gradients, Lanczos, GMRES, minres)
  7. JPEG
  8. PageRank
  9. Simplex algorithm
  10. Kalman filter

Note that JPEG (1992) and PageRank (1998) were youngsters in 2000, but all the other algorithms date back at least to the 1960s.

By comparison, the 2000 list is, in chronological order (no other ordering was given)

  • Metropolis algorithm for Monte Carlo
  • Simplex method for linear programming
  • Krylov subspace iteration methods
  • The decompositional approach to matrix computations
  • The Fortran optimizing compiler
  • QR algorithm for computing eigenvalues
  • Quicksort algorithm for sorting
  • Fast Fourier transform
  • Integer relation detection
  • Fast multipole method

The two lists agree in 7 of their entries. The differences are:

PCAM list 2000 list
Newton and quasi-Newton methods The Fortran Optimizing Compiler
Jpeg Quicksort algorithm for sorting
PageRank Integer relation detection
Kalman filter Fast multipole method

Of those in the right-hand column, Fortran is in the index of PCAM and would have made the list, but so would C, MATLAB, etc., and I draw the line at including languages and compilers; the fast multipole method nearly made the PCAM table; and quicksort and integer relation detection both have one page locator in the PCAM index.

There is a remarkable agreement between the two lists! Dongarra and Sullivan say they knew that “whatever we came up with in the end, it would be controversial”. Their top ten has certainly stimulated some debate, but I don’t think it has been too controversial. This comparison suggests that Dongarra and Sullivan did a pretty good job, and one that has stood the test of time well.

Finally, I point readers to a talk Who invented the great numerical algorithms? by Nick Trefethen for a historical perspective on algorithms, including most of those mentioned above.

Posted in Princeton Companion | 1 Comment

A Collection of Invalid Correlation Matrices


I’ve written before (here) about the increasingly common problem of matrices that are supposed to be correlation matrices (symmetric and positive semidefinite with ones on the diagonal) turning out to have some negative eigenvalues. This is usually bad news because it means that subsequent computations are unjustified and even dangerous. The problem occurs in a wide variety of situations. For example in portfolio optimization a consequence could be to take arbitrarily large positions in a stock, as discussed by Schmelzer and Hauser in Seven Sins in Portfolio Optimization.

Much research has been done over the last fifteen years or so on how to compute the nearest correlation matrix to a given matrix, and these techniques provide a natural way to correct an “invalid” correlation matrix. Of course, other approaches can be used, such as going back to the underlying data and massaging it appropriately, but it is clear from the literature that this is not always possible and practitioners may not have the mathematical or statistical knowledge to do it.

Nataša Strabić and I have built up a collection of invalid correlation matrices, which we used most recently in work on Bounds for the Distance to the Nearest Correlation Matrix. These are mostly real-life matrices, which makes them valuable for test purposes.

We have made our collection of invalid correlation matrices available in MATLAB form on GitHub as the repository matrices-correlation-invalid. I am delighted to be able to include, with the permission of investment company Orbis, two relatively large matrices, of dimensions 1399 and 3120, arising in finance. These were the matrices I used in my original 2002 paper.

Posted in research | Tagged , | Leave a comment

Behind the Scenes of the Princeton Companion to Applied Mathematics

I’ve published an article Making the Princeton Companion to Applied Mathematics in Mathematics Today, the membership magazine of the The Institute of Mathematics and its Applications.

The article describes the story behind the Princeton Companion to Applied Mathematics, published in October 2015, which I edited (along with associate editors Mark Dennis, Paul Glendinning, Paul Martin, Fadil Santosa and Jared Tanner).

Among the topics covered are

  • the motivation for the book and for publishing it in hard copy (as well as in e-book form),
  • the question “What is applied mathematics?”,
  • the challenge of producing 1000 pages of high quality typeset mathematical text,
  • the design of the book.

Complementary to the article is the YouTube video below, recorded and produced by George Miller, in which I talk about the project. It was filmed in and around the Alan Turing building at the University of Manchester. An interesting tidbit about George is that when he worked at Oxford University Press he set up and commissioned the Very Short Introductions series, of which Tim Gowers’s Mathematics: Very Short Introductions was one of the first volumes to be published.

Posted in Princeton Companion | Tagged | Leave a comment

Empty Matrices in MATLAB

What matrix has zero norm, unit determinant, and is its own inverse? The conventional answer would be that there is no such matrix. But the empty matrix [ ] in MATLAB satisfies these conditions:

>> A = []; norm(A), det(A), inv(A)
ans =
ans =
ans =

While many MATLAB users will be familiar with the use of [ ] as a way of removing a row or column of a matrix (e.g., A(:,1) = []), or omitting an argument in a function call (e.g., max(A,[],2)), fewer will be aware that [ ] is just one in a whole family of empty matrices. Indeed [ ] is the 0-by-0 empty matrix

>> size([])
ans =
     0     0

Empty matrices can have dimension n-by-0 or 0-by-n for any nonnegative integer n. One way to construct them is with double.empty (or the empty method of any other MATLAB class):

>> double.empty
ans =
>> double.empty(4,0)
ans =
   Empty matrix: 4-by-0

What makes empty matrices particularly useful is that they satisfy natural generalizations of the rules of matrix algebra. In particular, matrix multiplicaton is defined whenever the inner dimensions match up.

>> A = double.empty(0,5)*double.empty(5,0)
A =
>> A = double.empty(2,0)*double.empty(0,4)
A =
     0     0     0     0
     0     0     0     0

As the second example shows, the product of empty matrices with positive outer dimensions has zero entries. This ensures that expressions like the following work as we would hope:

>> p = 0; A = ones(3,2); B = ones(3,p); C = ones(2,3); D = ones(p,3);
>> [A B]*[C; D]
ans =
     2     2     2
     2     2     2
     2     2     2

In examples such as this empty matrices are very convenient, as they avoid us having to program around edge cases.

Empty matrices have been in MATLAB since 1986, their inclusion having been suggested by Rod Smart and Rob Schreiber \lbrack 1 \rbrack. A 1989 MATLAB manual says

We’re not sure we’ve done it correctly, or even consistently, but we have found the idea useful.

In those days there was only one empty matrix, the 0-by-0 matrix, and this led Nett and Haddad (1993) to describe the MATLAB implementation of the empty matrix concept as “neither correct, consistent, or useful, at least not for system-theoretic applications”. Nowadays MATLAB gets it right and indeed it adheres to the rules suggested by those authors and by de Boor (1990). If you are wondering how the values for norm([]), det([]) and inv([]) given above are obtained, see de Boor’s article for an explanation in terms of linear algebra transformations.

The concept of empty matrices dates back to before MATLAB. The earliest reference I am aware of is a 1970 book by Stoer and Witzgall. As the extract below shows, these authors recognized the need to support empty matrices of varying dimension and they understood how multiplication of empty matrices should work.


From Stoer and Witzgall (1970, page 3).


  1. John N. Little, The MathWorks Newsletter, 1 (1), March 1986.
Posted in matrix computations | Tagged | 1 Comment

Updated Catalogue of Software for Matrix Functions


From “help matfun” in MATLAB.

Edvin Deadman and I have updated the catalogue of software for matrix functions that we produced in 2014 (and which was discussed in this post). The new version, which has undergone some minor reorganization, is available here. It covers what is available in languages (C++, Fortran, Java, Julia, Python), problem solving environments (GNU Octave, Maple, Mathematica, MATLAB, R, Scilab), and libraries (GNU Scientific Library, NAG Library, SLICOT).


From NAG LIbrary Mark 25 News.

Here are some highlights of changes in the last two years that are reflected in the new version.

Other changes to the catalogue include these.

  • SLICOT has been added.
  • Two more R packages are included.

Suggestions for inclusion in a future revision are welcome.

Posted in software | Tagged , , , | Leave a comment

The Improved MATLAB Functions Expm and Logm


Equation from the Princeton Companion to Applied Mathematics, article “Functions of Matrices” (p. 97)

The matrix exponential is a ubiquitous matrix function, important both for theory and for practical computation. The matrix logarithm, an inverse to the exponential, is also increasingly used (see my earlier post, 400 Years of Logarithms).

MATLAB R2015b introduced new versions of the expm and logm functions. The Release Notes say

The algorithms for expm, logm, and sqrtm show increased accuracy, with logm and sqrtm additionally showing improved performance.

The help text for expm and logm is essentially unchanged from the previous versions, so what’s different about the new functions? (I will discuss sqrtm in a future post.)

The answer is that both functions make use of new backward error bounds that can be much smaller than the old ones for very nonnormal matrices, and so help to avoid a phenomenon known as overscaling. The key change is that when bounding a matrix power series p(X) = a_0 I + a_1 X + a_2 X^2 + \cdots, instead of bounding the kth term a_k X^k by |a_k| \|X\|^k, a potentially smaller bound is used.

This is best illustrated by example. Suppose we want to bound \|X^{12}\| and are not willing to compute X^{12} but are willing to compute lower powers of X. We have 12 = 6 \times 2 = 4 \times 3, so \|X^{12}\| is bounded by each of the terms (\|X^2\|)^6, (\|X^3\|)^4, (\|X^4\|)^3, and (\|X^6\|)^2. But it is easy to see that (\|X^6\|)^2 \le (\|X^2\|)^6 and (\|X^6\|)^2 \le (\|X^3\|)^4, so we can discard two of the bounds, ending up with

\|X^{12}\| \le \min( \|X^4\|^3, \|X^6\|^2 ).

This argument can be generalized so that every power of X is bounded in terms of the norms of X^p for values of p up to some small, fixed value. The gains can be significant. Consider the matrix

X = \begin{bmatrix}1 & 100 \\ 0 & 1 \end{bmatrix}.

We have \|X\|^{12} \approx 10^{24}, but

X^k = \begin{bmatrix}1 & 100k \\ 0 & 1 \end{bmatrix},

so the bound above is roughly \|X^{12}\| \le 6 \times 10^{7}, which is a significant improvement.

One way to understand what is happening is to note the inequality

\rho(X) \le \| X^k\| ^{1/k} \le \|X\|,

where \rho is the spectral radius (the largest modulus of any eigenvalue). The upper bound corresponds to the usual analysis. The lower bound is something that we cannot use to bound the norm of the power series. The middle term is what we are using, and as k\to\infty the middle term converges to the lower bound, which can be arbitrarily smaller than the upper bound.

What is the effect of these bounds on the algorithms in expm and logm? Both algorithms make use of Padé approximants, which are good only for small-normed matrices, so the algorithms begin by reducing the norm of the input matrix. Backward error bounds derived by bounding a power series as above guide the norm reduction and if the bounds are weak then the norm is reduced too much, which can result in loss of accuracy in floating point arithmetic—the phenomenon of overscaling. The new bounds greatly reduce the chance of overscaling.

In his blog post A Balancing Act for the Matrix Exponential, Cleve Moler describes a badly scaled 3-by-3 matrix for which the original expm suffers from overscaling and a loss of accuracy, but notes that the new algorithm does an excellent job.

The new logm has another fundamental change: it applies inverse scaling and squaring and Padé approximation to the whole triangular Schur factor, whereas the previous logm applied this technique to the individual diagonal blocks in conjunction with the Parlett recurrence.

For more on the algorithms underlying the new codes see these papers. The details of how the norm of a matrix power series is bounded are given in Section 4 of the first paper.

Posted in research | Tagged , | Leave a comment