The 2008 revision of the IEEE Standard for Floating-Point Arithmetic introduced a half precision 16-bit floating point format, known as fp16, as a storage format. Various manufacturers have adopted fp16 for computation, using the obvious extension of the rules for the fp32 (single precision) and fp64 (double precision) formats. For example, fp16 is supported by the NVIDIA P100 and V100 GPUs and the AMD Radeon Instinct MI25 GPU, as well as the A64FX Arm processor that will power the Fujitsu Post-K exascale computer.

Fp16 has the drawback for scientific computing of having a limited range, its largest positive number being . This has led to the development of an alternative 16-bit format that trades precision for range. The bfloat16 format is used by Google in its tensor processing units. Intel, which plans to support bfloat16 in its forthcoming Nervana Neural Network Processor, has recently (November 2018) published a white paper that gives a precise definition of the format.

The allocation of bits to the exponent and significand for bfloat16, fp16, and fp32 is shown in this table, where the implicit leading bit of a normalized number is counted in the significand.

Format | Significand | Exponent |
---|---|---|

bfloat16 | 8 bits | 8 bits |

fp16 | 11 bits | 5 bits |

fp32 | 24 bits | 8 bits |

Bfloat16 has three fewer bits in the significand than fp16, but three more in the exponent. And it has the same exponent size as fp32. Consequently, converting from fp32 to bfloat16 is easy: the exponent is kept the same and the significand is rounded or truncated from 24 bits to 8; hence overflow and underflow are not possible in the conversion.

On the other hand, when we convert from fp32 to the much narrower fp16 format overflow and underflow can readily happen, necessitating the development of techniques for rescaling before conversion—see the recent EPrint Squeezing a Matrix Into Half Precision, with an Application to Solving Linear Systems by me and Sri Pranesh.

The drawback of bfloat16 is its lesser precision: essentially 3 significant decimal digits versus 4 for fp16. The next table shows the unit roundoff , smallest positive (subnormal) number xmins, smallest normalized positive number xmin, and largest finite number xmax for the three formats.

xmins | xmin | xmax | ||
---|---|---|---|---|

bfloat16 | 3.91e-03 | (*) | 1.18e-38 | 3.39e+38 |

fp16 | 4.88e-04 | 5.96e-08 | 6.10e-05 | 6.55e+04 |

fp32 | 5.96e-08 | 1.40e-45 | 1.18e-38 | 3.40e+38 |

(*) Unlike the fp16 format, Intel’s bfloat16 does not support subnormal numbers. If subnormal numbers were supported in the same way as in IEEE arithmetic, xmins would be 9.18e-41.

The values in this table (and those for fp64 and fp128) are generated by the MATLAB function `float_params`

that I have made available on GitHub and at MathWorks File Exchange.

An interesting way to compare these different precisions is in summation of the harmonic series . The series diverges, but when summed in the natural order in floating-point arithmetic it converges, because the partial sums grow while the addends decrease and eventually the addend is small enough that it does not change the partial sum. Here is a table showing the computed sum of the harmonic series for different precisions, along with how many terms are added before the sum becomes constant.

Arithmetic | Computed Sum | Number of terms |
---|---|---|

bfloat16 | ||

fp16 | ||

fp32 | ||

fp64 |

The differences are striking! I determined the first three values in MATLAB. The fp64 value is reported by Malone based on a computation that took 24 days, and he also gives analysis to estimate the limiting sum and corresponding number of terms for fp64.

The NVIDIA V100 has tensor cores that can carry out the computation D = C + A*B in one clock cycle for 4-by-4 matrices A, B, and C with only one rounding error; this is a 4-by-4 fused multiply-add (FMA) operation. Moreover, C and D can be in fp32. The benefits that the speed and accuracy of the tensor cores can bring over plain fp16 is demonstrated in Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers.

Intel’s bfloat16 format supports a scalar FMA d = c + a*b, where c and d are in fp32.

A few years ago we had just single precision and double precision arithmetic. With the introduction of fp16 and fp128 in the IEEE standard in 2008, and now bfloat16 by Google and Intel, the floating-point landscape is becoming much more interesting.

The MATLAB R2018b release notes report a number of performance improvements, including faster startup and faster calls to built-in functions. I pick out here a few other highlights from the release (excluding the toolboxes) that are of interest from the numerical analysis point of view.

The new `xline`

and `yline`

functions add vertical or horizontal lines to a plot—something that I have needed from time to time and have previously had to produce with a `line`

or a `plot`

command, together with `hold on`

. For example, `xline(pi)`

plots a vertical line at `x = pi`

.

The `stackedplot`

function is mainly of interest for plotting multiple variables from a table in a single plot, but it can also plot the columns of a matrix. In this example, `A`

is the symmetric eigenvector matrix for the second difference matrix:

A = gallery('orthog',10,1); stackedplot(A);

The resulting plot clearly shows that the number of sign changes increases with the column index.

String arrays, introduced in R2016b, are now supported throughout MATLAB. In the previous example I could have typed `A = gallery("orthog",10,1)`

.

A new option `'all'`

to the functions `all`

, `any`

, `max`

, `min`

, `prod`

, `sum`

(and a few others) makes them operate on all the dimensions of the array. For example:

>> A = pascal(3) A = 1 1 1 1 2 3 1 3 6 >> max(A,[],'all') ans = 6 >> [prod(A,'all'), sum(A,'all')] ans = 108 19

The empty second argument in the `max`

call is needed because `max(x,y)`

is also a supported usage. This is a useful addition. I have often written `norm(A(:),inf)`

to avoid `max(max(abs(A)))`

(which does not work for arrays with more than two dimensions), but now I can write `max(abs(A),[],'all')`

without incurring the overhead of forming `A(:)`

.

New functions `sinpi`

and `cospi`

plot the sine and cosine functions at the specified multiple of . Thus `sinpi(x)`

is the same as `sin(pi*x)`

except that it does not explicitly compute `pi*x`

and so returns exact answers for integer arguments:

>> [sin(pi) sinpi(1)] ans = 1.2246e-16 0

- For earlier “What’s New in MATLAB” posts click here.
- See also MATLAB Guide, Third Edition (2017).

- adjacency matrices of graphs have entries from ;
- Bernoulli matrices, which occur in compressed sensing, have entries from ;
- Hadamard matrices have entries from and orthogonal columns; and
- matrices with elements from provide worst case growth factors for Gaussian elimination with partial pivoting and yield the most ill conditioned triangular matrices with elements bounded by .

Rob’s group have done extensive computations of eigenvalues and characteristic polynomials of Bohemian matrices, which have led to interesting observations and conjectures. Many beautiful visualizations are collected on the website http://www.bohemianmatrices.com as well as on the Bohemian Matrices Twitter feed.

In June 2018, Rob and I organized a 3-day workshop Bohemian Matrices and Applications, bringing together 16 people with an interest in the subject from a variety of backgrounds. The introductory talks by Rob, Steven, and I were videod (and are embedded below), and the slides from most of the talks are available on the conference website.

We scheduled plenty of time for discussion and working together. New collaborations were started, several open problems were solved and numerous new questions were posed.

The workshop has led to various strands of ongoing work. Steven has created the Characteristic Polynomial Database, which contains more than polynomials from more than Bohemian matrices and has led to a number of conjectures concerning matches of properties to sequences at the On-Line Encyclopedia of Integer Sequences. Three recent outputs are

- E. Y. S. Chan, R. M. Corless, L. Gonzalez-Vega, J. R. Sendra, J. Sendra and S. E. Thornton, Bohemian Upper Hessenberg Matrices, ArXiv preprint arXiv:1809.10653, September 2018.
- E. Y. S. Chan, R. M. Corless, L. Gonzalez-Vega, J. R. Sendra, J. Sendra and S. E. Thornton, Bohemian Upper Hessenberg Toeplitz Matrices, ArXiv preprint arXiv:1809.10664, September 2018.
- S. L. Hill, M. N. Huxley, M. C. Lettington and K. M. Schmidt, Some Properties and Applications of Non-trivial Divisor Functions, ArXiv preprint arXiv:1806.00651, June 2018; to appear in The Ramanujan Journal [added to this post October 26, 2018].

Sponsorship of the workshop by the Heilbronn Institute for Mathematical Research, the Royal Society and the School of Mathematics at the University of Manchester, as well as support from Canada for some of the Canadian participants, is gratefully acknowledged.

Gil talked about his famous 18.06 linear algebra course at MIT, which in its YouTube form has had almost 3 million views. A number of people in the audience commented that they had learned linear algebra from Gil.

Gil also talked about his book Linear Algebra and Learning from Data, due to be published at the end of 2018, which includes chapters on deep neural networks and back propagation. Many people will want to read Gil’s insightful slant on these topics (see also my SIAM News article The World’s Most Fundamental Matrix Equation).

As well as the video of Gil’s interview embedded below, two written interviews will be of interest to Gil’s fans:

- Farooq Ahmed, QnAs with Gilbert Strang, Proc. Nat. Acad. Sci. 108 (11), p. 4273, 2011.
- Y. K. Leong, Mathematical Conversations. Gilbert Strang: The Changing Face of Applied Mathematics, Newsletter of Institute for Mathematical Sciences, Institute for Mathematical Sciences National University of Singapore, pp 8-12, 2003.

The first part of the talk was about two aspects of complex arithmetic: the complex step method for approximating derivatives, and understanding multivalued functions with the help of the unwinding number. Then I talked about the role of the associativity of matrix multiplication, which turns out to be the key property that makes the Sherman-Morrison formula work (this formula gives the inverse of a matrix after a rank 1 update). I pointed out the role of associativity in backpropagation in neural networks and deep learning.

After giving an example of using randomization to avoid pathological cases, I discussed why low precision (specifically half precision) arithmetic is of growing interest and identified some issues that need to be overcome in order to make the best use of it.

Almost every talk at JuliaCon was livecast on YouTube, and these talks are available to watch on the Julia Language channel. The slides for my talk are available here.

Also at the conference, my PhD student Weijian Zhang spoke about the work he has been doing on evolving graphs in his PhD.

]]> Complex empty arrays, such as that generated with `complex([])`

now have an (empty) imaginary part instead of being real.

% R2017b >> complex([]) ans = [] % R2018a >> complex([]) ans = 0×0 empty complex double matrix

This is a consequence of a change in the way MATLAB stores complex arrays. Prior to R2018a it stored the real parts together and the imaginary parts together. In R2081 it uses an interleaved format in which the real and imaginary parts of a number are stored together; this is the storage format used in the C and C++ languages. For more details see MATLAB Support for Interleaved Complex API in C MEX Functions. Unless you write MEX files or create complex empty arrays, this change should have no effect on you.

The Live Editor continues to gain improved functionality, including embedded sliders and drop-down menus.

Legends can now have multiple columns, specified with the `NumColumns`

property for Legend objects. I must admit that I had not realized that is was already possible to arrange legends in a horizontal (1-by-n) rather than vertical orientation (n-by-1) using the `Orientation`

property.

Tall arrays have several new capabilities, including the ability to compute a least squares solution to an overdetermined linear system `Ax = b`

with a tall `A`

, which is done by QR factorization.

MATLAB starts up faster and has further optimizations in the execution engine.

The GraphPlot Object has some additional options for the `'force'`

, `'force3'`

, and `'circle'`

layouts.

I noticed that the recent Windows 10 Spring Creators Update broke the Symbolic Toolbox. An update to fix this (and various other issues) is available at this page (MathWorks login required).

For a concise but wide-ranging introduction and reference to MATLAB, see MATLAB Guide (third edition, 2017)

I’ve written a number of papers on algorithms to compute the (principal) logarithm of a matrix. The problem of computing the logarithm of a complex scalar—given a library routine that handles real arguments—might appear trivial, by comparison. That it is not can be seen by looking at early attempts to provide such a function in Algol 60.

The paper

J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number. Comm. ACM, 4(4), 179.

presents an Algol 60 code for computing , for a complex number . It uses the arctan function to obtain the argument of a complex number.

The paper

A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number. Comm. ACM, 5(6), 347.

notes three problems with Herndon’s code: it fails for with zero real part, the imaginary part of the logarithm is on the wrong range (it should be for the principal value), and the code uses `log`

(log to the base 10) instead of `ln`

(log to the base ). The latter error suggests to me that the code had never actually been run, as for almost any argument it would produce an incorrect value. This is perhaps not surprising since Algol 60 compilers must have only just started to become available in 1961.

The paper

M. L. Johnson and W. Sangren, W. (1962). Remark on Algorithm 48: Logarithm of a complex number. Comm. CACM, 5(7), 391.

contains more discussion about avoiding division by zero and getting signs correct. In

D. S. Collens (1964). Remark on remarks on Algorithm 48: Logarithm of a complex number. Comm. ACM, 7(8), 485.

Collens notes that Johnson and Sangren’s code wrongly gives and has a missing minus sign in one statement. Finally, Collens gives in

D. S. Collens (1964). Algorithm 243: Logarithm of a complex number: Rewrite of Algorithm 48. Comm. CACM, 7(11), 660.

a rewritten algorithm that fixes all the earlier errors.

So it took five papers over a three year period to produce a correct Algol 60 code for the complex logarithm! Had those authors had the benefit of today’s interactive computing environments that period could no doubt have been shortened, but working with multivalued complex functions is necessarily a tricky business, as I have explained in earlier posts here and here.

]]>The event was superbly organized by Josef Malek, Miroslav Rozlozník, Zdenek Strakos and Miroslav Tuma. This was a relaxed and friendly event, and the excellent weather enabled most meals to be taken on the terrace of the family-run Sporthotel Kácov in which we were staying.

I gave three lectures of about one hour each on *Multiprecision Algorithms*. The slides are available from this link. Here is an abstract for the lectures:

Today’s computing environments offer multiple precisions of floating-point arithmetic, ranging from quarter precision (8 bits) and half precision (16 bits) to double precision (64 bits) and even quadruple precision (128 bits, available only in software), as well as arbitrary precision arithmetic (again in software). Exploiting the available precisions is essential in order to reduce the time to solution, minimize energy consumption, and (when necessary) solve ill-conditioned problems accurately.

In this course we will describe the precision landscape, explain how we can exploit different precisions in numerical linear algebra, and discuss how to analyze the accuracy and stability of multiprecision algorithms.

Lecture 1. IEEE standard arithmetic and availability in hardware and software. Motivation for low precision from applications, including machine learning. Exploiting reduced communication cost of low precision. Issued relating to rounding error analyses in low precision. Simulating low precision for testing purposes. Challenges of implementing algorithms in low precision.Lecture 2. Basics of rounding error analysis, illustrated with summation. Why increasing precision is not a panacea. Software for high precision and its cost. Case study: the matrix logarithm in high precision.Lecture 3.Solving very linear systems (possibly very ill conditioned and/or sparse) using mixed precision: iterative refinement in three precisions. A hybrid direct-iterative method: GMRES-IR.

I gave an earlier version of these lectures in March 2018 at the EU Regional School held at the Aachen Institute for Advanced Study in Computational Engineering Science (AICES), Germany. This single two and a half hour lecture was recorded and can be viewed on YouTube. The slides are available here.

Joan Eileen Walsh was born on 7 October 1932 and passed away on 30 December 2017 at the age of 85.

Joan obtained a First Class B.A. honours degree in Mathematics from the University of Oxford in 1954. She then spent three years working as an Assistant Mistress at Howell’s School in Denbigh, North Wales. In 1957 Joan left teaching and enrolled at the University of Cambridge to study for a Diploma in Numerical Analysis. This qualification was awarded, with Distinction, in 1958. At this point, Joan returned to the University of Oxford Computing Laboratory to study for a D.Phil. under the supervision of Professor Leslie Fox. She was Fox’s first doctoral student. Her D.Phil. was awarded in 1961.

Between October 1960 and March 1963, Joan worked as a Mathematical Programmer for the CEGB (Central Electricity Generating Board) Computing Department in London. In April 1963, she was appointed to a Lectureship in the Department of Mathematics at the University of Manchester. She progressed through the positions of Senior Lecturer (1966) and Reader (1971) before being appointed as Professor of Numerical Analysis at the University of Manchester in October 1974. For the academic year 1967-1968 Joan had leave of absence at the SRC Atlas Computer Laboratory—a joint appointment with St Hilda’s College, Oxford.

Joan led the Numerical Analysis group at the University of Manchester until 1985, after which Christopher Baker took over. This was a period of expansion both for the Numerical Analysis group at Manchester and, more generally, for numerical analysis in Britain. This expansion of British numerical analysis was supported by special grants from the SRC (Science Research Council) to provide additional funding for the subject at the Universities of Dundee, Manchester and Oxford, from 1973 until 1976. This funding supported one Senior Research Fellow and two Research Fellows at each Institution. Joan helped establish the Manchester group as one of the leading Numerical Analysis research centres in the United Kingdom (with eight permanent staff by 1987)—a position that is maintained to the present day.

Joan was Head of the Department of Mathematics between 1986 and 1989, and subsequently became Pro-Vice Chancellor of the University of Manchester in 1990. She held the latter role for four years, and was responsible for undergraduate affairs across the University. Joan’s tenure as Pro-Vice Chancellor coincided with substantial, and sometimes controversial, changes in undergraduate teaching—for example, the introduction of semesterisation and of credit-based degree programmes; Joan managed these major changes across the University with her customary tact, energy and determination. Joan was an efficient and effective administrator at a time when relatively few women occupied senior management roles in universities.

After 35 years’ service, Joan retired from the University in 1998 and was appointed Professor Emeritus.

In retirement, Joan returned to her studies; between 2000 and 2003 she studied for an MA in “Contemporary Theology in the Catholic Tradition” at Heythrop College of the University of London.

Over the years, and particularly during her tenure as Pro-Vice Chancellor, Joan sat on, and chaired, numerous University committees, far too many to list. She had a very long relationship with Allen Hall (a University Hall of Residence) where she was on the Hall Advisory Committee from 1975 until her retirement in 1998.

Joan served leadership roles nationally, as well as in the University. She was Vice President of the IMA (1992-1993) and a member of the Council of the IMA (1990-1991 and 1994-1995). She was elected Fellow of the Institute of Mathematics and its Applications (IMA) in 1984. She was a member of the Computer Board for Universities and Research Councils for several years from the late 1970s. She encouraged the creation of its Software Provision Committee, formally constituted in 1980 with Joan as its first Chairman, which she led until 1985. She was also President of the National Conference of University Professors (1993–1994). Further, she was a member of the Board of Governors at Withington Girls’ School, a leading independent school, for six years between 1993 and 1999.

Nowadays, all computational scientists take for granted the existence of software libraries such as the NAG Library. It is unimaginable to undertake major computational tasks without them. In 1970, Joan was one of a group of four academics who founded the Nottingham Algorithms Group with the aim of developing a comprehensive mathematical software library for use by the group of universities that were running ICL 1906A mainframe computers. Subsequently, the Nottingham Algorithms Group moved from the University of Nottingham to the University of Oxford and the project was incorporated as the Numerical Algorithms Group (NAG) Ltd. Joan became the Founding Chairman of NAG Ltd. in 1976, a position she held for the next ten years. She was subsequently a member of the Council of NAG Ltd. from 1992 until 1996. In recognition of her contribution to the NAG project Joan was elected as a Founding Member of the NAG Life Service Recognition Award in 2011.

Joan’s research interests focused on the numerical solution of ordinary differential equation boundary value problems and the numerical solution of partial differential equations. She conducted much of her research in collaboration with PhD students, supervising the following PhD students at the University of Manchester, who obtained their degrees in the years shown:

- Thomas Sag, 1966;
- Les Graney, 1973;
- David Sayers, 1973;
- Geoffrey McKeown, 1977;
- Roderick Cook, 1978;
- Patricia Hanson, 1979;
- Guy Lonsdale, 1985;
- Chan Basaruddin, 1990;
- Fathalla Rihan (supervised jointly with C. T. H. Baker), 2000.

Joan was an important figure in the development of Numerical Analysis and Scientific Computing at the University of Manchester and in the UK more generally. Her essay Numerical Analysis at the Victoria University of Manchester, 1957-1979 gives an interesting perspective on early developments at Manchester.

*Brian Ford OBE, Founder Director of NAG, writes*:

Joan had a brilliant career in Mathematics (particularly areas of Numerical Mathematics, Ordinary and Partial Differential Equations), Computing, University Education and Teaching, and was an excellent researcher, teacher, administrator, doctoral supervisor and colleague. But she was so much more than that!

Joan was invariably kind and thoughtful, intellectually gifted and generous with advice and guidance. Her profound Christian faith illuminated every aspect of her life. Joan’s deep reading and wide intellectual interests coupled with her prudence and clear thinking gave her profound knowledge and command. She was excellent company –amusing, modest, never belittling nor intimidating- and enjoyed fine wine and food in good company. She held firm beliefs, gently and persuasively seeking what she saw as the right way. Many people turned to her for help, advice and references and were grateful for her readily-offered help and support.

Joan was a private, even guarded, person. A devout Catholic, on her retirement she completed an MA in “Contemporary Theology in the Catholic Tradition” at Heythorp College, University of London. Fluent in Latin and reading regularly at services, she loved the traditional Tridentine Mass of the Church. Along with her local bishop in Salford and other like-minded Catholics, she therefore worked actively for the restitution of the Tridentine Mass to the liturgy of the world-wide Church (sidelined after Vatican II in favour of local languages), an involvement which culminated her joining high-level discussions at the Vatican. This bore fruit, the Tridentine Latin Mass being officially declared the extraordinary form of the Roman Rite of Mass a few years later: Joan was thrilled. Such was Joan’s commitment to things she believed in and her endless thought and work for others.

Joan was an excellent contributor to the NAG Library, believing strongly in collaboration and sharing, with high quality standards for all aspects of our work and thorough checking and testing. She was an excellent first Chairman of NAG and invaluable colleague and advisor. We thoroughly enjoyed working together, invariably in an excellent spirit. We achieved much.

]]>The conference was expertly organized by Alex Pothen and Jie Shen. The attendees, numbering around 70, included many of Walter’s friends and colleagues.

The speakers made many references to Walter’s research contributions, particularly in the area of orthogonal polynomials. In my talk, Matrix Functions and their Sensitivity, I emphasized Walter’s work on conditioning of Vandermonde matrices.

A Vandermonde matrix is an matrix depending on parameters that has th column . It is nonsingular when the are distinct. This is a notoriously ill conditioned class of matrices. Walter said that he first experienced the ill conditioning when he computed Gaussian quadrature formulas from moments of a weight function.

Walter has written numerous papers on Vandermonde matrices that give much insight into their conditioning. Here is a very a brief selection of Walter’s results. For more, see my chapter Numerical Conditioning in Walter’s collected works.

In a 1962 paper he showed that

In 1978 he obtained

which differs from the upper bound by at most a factor . A 1975 result is that for equispaced on ,

A 1988 paper returns to lower bounds, showing that for and ,

When some of the coincide a confluent Vandermonde matrix can be defined, in which columns are “repeatedly differentiated”. Walter has obtained bounds for the confluent case, too.

These results quantify the extreme ill conditioning. I should note, though, that appropriate algorithms that exploit structure can nevertheless obtain accurate solutions to Vandermonde problems, as described in Chapter 22 of Accuracy and Stability of Numerical Algorithms.