How Accurate Are Spreadsheets in the Cloud?

For a vector x with n elements the sample variance is s_n^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2, where the sample mean is \overline{x} = \frac{1}{n} \sum_{i=1}^n x_i. An alternative formula often given in textbooks is s_n^2 = \frac{1}{n-1} \left( \sum_{i=1}^n x_i^2 - \frac{1}{n} \left(\sum_{i=1}^n x_i \right)^2 \, \right). This second formula has the advantage that it can be computed with just one pass through the data, whereas the first formula requires two passes. However, the one-pass formula can suffer damaging subtractive cancellation, making it numerically unstable. When I wrote my book Accuracy and Stability of Numerical Algorithms I found that several pocket calculators appeared to use the one-pass formula.

How do spreadsheets apps available in web browsers and hosted in the cloud fare on computations such as this? I used Google Sheets to compute the standard deviation of vectors of the form x = [m, m+1, m+2] (Google Sheets does not seem to have a built-in function for the sample variance; the standard deviation is the square root of the sample variance). Here is what I found. (The spreadsheet that produced these results is available as this xlsx file. Note that if you click on that link it will probably load into Excel and display the correct result.)

m Exact standard deviation Google’s result
10^7 1 1
10^8 1 0

The incorrect result 0 for m=10^8 is what I would expect from the one-pass formula in IEEE double precision arithmetic, which has the equivalent of about 16 significant decimal digits of precision, since \sum_{i=1}^n x_i^2 and \frac{1}{n} \left(\sum_{i=1}^n x_i\right)^2 are both about 10^{16} and so there is not enough precision to retain the difference (which is equal to 2). A computation in MATLAB verifies that the one-pass formula returns 0 in IEEE double precision arithmetic.

It seems that Google Sheets is using IEEE double precision arithmetic internally, because the expression 3\times (4/3-1)-1 evaluates to 2.2E-16. So it appears that Google may be using the one-pass formula.

This use of the unstable formula is deeply unsatisfactory, but it is just the tip of the iceberg. In a recent paper Spreadsheets in the Cloud—Not Ready Yet, Bruce McCullough and Talha Yalta show that Google Sheets, Excel Web App and Zoho Sheet all fail on various members of a set of “sanity tests”. This might not be too surprising if you are aware of McCullough’s earlier work in which he found errors in several versions of Microsoft Excel.

However, spreadsheets in the cloud bring further complications, as noted by McCullough and Yalta:

  • These spreadsheets apps do not carry version information and the software can be changed by the provider at any time without announcement. It is therefore impossible to reproduce results computed previously.
  • The hardware and software environment on which the software is running is not specified, which adds another level of irreproducibility.
  • McCullough and Yalta found that the Excel Web App could produce different output from Excel 2010. Anyone moving a spreadsheet between the two applications could be in for a surprise.

The conclusion: use spreadsheets in the cloud at your peril! In fact, I avoid spreadsheets altogether. Anything I want to do can be done better in MATLAB, LaTeX or Emacs ORG mode.

About these ads
This entry was posted in software and tagged . Bookmark the permalink.

3 Responses to How Accurate Are Spreadsheets in the Cloud?

  1. Nick and other readers,

    I think that you will also find my contributions on the Google Docs (+ Spreadsheets) product forum of interest. The short of it is in refs. [1] and [2] and I refer there to earlier contributions going back several years. My own tortuous process of discovery one error at a time can be followed in [3].

    The situation with Google spreadsheets is actually significantly worse yet than one would guess from the McCullough and Yalta paper. My very terse summary on 2013-02-06 under [1]: “Binary search in an N-element sorted array returning the N+1 element, round to integer returning a non-integer or the wrong integer, cumulative distribution functions returning a negative value, standard deviation and variance functions implemented in known bad ways, error function at 0 not equal to 0, gratuitous overflow errors in evaluation of discrete probability functions and combinatorial functions, function descriptions that read like “we haven’t a clue, try it out”; all these errors are still errors on 2013-02-06. Way to go, Google Docs and Spreadsheets technical team and management!”

    The errors are all known to the Google Docs and Spreadsheets technical team and management; they just ignore them.

    Bas Braams
    Vienna, Austria

    [1] (2012-05-06) Errors and other issues with statistical and mathematical functions in GSheets

    https://productforums.google.com/d/msg/docs/rxFCHYeMhrU/bzNv8ivD0JoJ

    [2] (2012-09-26) Errors in the spreadsheets program

    https://productforums.google.com/d/msg/docs/bgqBkRdxIWM/0u7tD-umUFQJ

    [3] (2011-11-13) normdist throws negative value still

    http://productforums.google.com/d/msg/docs/XfBPtoKJ1Ws/5wWPLajcUzAJ

  2. This is interesting and good to know! Fortunately, I only use spreadsheets for expense reports. I don’t expect any numerical issues—until SIAM has its Annual Meeting on Mars. :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s