SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

When is a zero not a zero?

DavidMSmith
Last updated: 2009/03/05 at 10:21 PM

Answer: when it's in floating point.

No, this isn't my entry for the "least funny joke ever" competition. It's the answer to a fairly common complaint of beginning R users, which goes something like this: "R has a bug! It's giving the wrong answer to a simple calculation!" (I paraphrase.) Let's see some examples of such "bugs":

"The square of the square root of two isn't two!"

> a <- sqrt(2)
> if(a*a != 2) print("R has a bug!")
[1] "R has a bug!"  # this shouldn't print, should it?

"Fractions which should be equal, aren't!"

> a <- (58/40 - 1)
> a
[1] 0.45
> b <- (18/40)
> b
[1] 0.45
> a==b
[1] FALSE  # shouldn't this be TRUE?

"The sum of the residuals isn't zero!"

> x <- 1:25 + rnorm(25)
> sum(x-mean(x))
[1] 1.509903e-14  # shouldn't this be zero?

"My while loop runs one iteration too many times!"

> j <- 0
> while (j < 1) j<-j+0.1
> j
[1] 1.1  # shouldn't this end with j equal to 1?

What's going on?

The short answer is that R, like pretty much every other piece of numerical software in existence, uses floating-point arithmetic to do its calculations. In each case above R is doing the right thing, given the principles of floating point. To use a strained analogy, floating-point arithmetic is to the "real" arithmetic you learned in school as Newtonian physics is to Einstein's Theory of Relativity: most of the time it works just as you expect, but in extreme cases the results can be surprising. Unfortunately, while floating-point arithmetic is familiar to computer scientists, it's rarely taught in statistics classes.

The basic principle is this: computers don't store numbers (except smallish integers and some fractions) exactly. It's very similar to the way you can't write down 1/3 in decimal exactly: however many 3s you add to the end of .3333333, the number you write will be close to, but not quite, one third.

The principle is the same for floating-point numbers; the main difference is that the underlying representation is binary, not decimal. Although the command j <- 0.1 looks like you're assigning the value "one-tenth" to j, in fact it is stored as a number close to, but not exactly, one tenth. (With the double-precision format used on most systems, the stored value is about 5.6 quintillionths, or 5.6e-18, more than that.) Most of the time you'll never notice, because an error on that scale is too small to print (actually, the error cancels out in the conversion from decimal to binary and back again). This "error cancellation" happens much of the time. For example, if we multiply j by 10, everything looks fine:

> j <- 0.1
> j*10 - 1
[1] 0

Sometimes, though, these errors accumulate:

> j+j+j+j+j+j+j+j+j+j-1  # ten j's
[1] -1.110223e-16
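
To look at the stored value directly, one option is to ask for more digits than R prints by default. The following is a minimal sketch, not a definitive recipe: the variable names are made up for illustration, and the exact digits printed by sprintf can vary by platform.

```r
# Print 0.1 with 20 decimal places instead of R's default 7 significant digits.
j <- 0.1
stored <- sprintf("%.20f", j)
print(stored)

# Multiplying by 10 rounds back to exactly 1 ...
times_ten <- j * 10 - 1

# ... but adding ten copies one at a time leaves a small residue.
total <- 0
for (i in 1:10) total <- total + j
summed <- total - 1

print(times_ten)
print(summed)
```

The loop performs the same left-to-right additions as the ten-term sum above, so it should leave the same tiny nonzero remainder.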

(One of the weird things about floating-point arithmetic is that it's not necessarily associative, so that (a+b)+c isn't always equal to a+(b+c), nor is it always distributive, so (a+b)*c might not be the same as a*c+b*c.) A similar effect is evident in the "residuals" example above. Sometimes, the errors can multiply dramatically if you use the wrong algorithm to make calculations, especially where very large and very small numbers mix. For example, calculating standard deviations using the naive "calculator algorithm" can give the wrong answer for large numbers with small variances. Thankfully, R's internal algorithms (including that for the sd function) are carefully coded to avoid such floating-point error accumulations. (Some other software tools haven't always been so careful.)
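
Both effects are easy to reproduce. The sketch below is illustrative only: naive_var is a hypothetical helper written here to mimic the "calculator algorithm" (sum of squares minus n times the squared mean), and the data are chosen so that the exact sample variance is 30.

```r
# Floating-point addition is not always associative.
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)
print(left == right)

# Hypothetical naive "calculator" formula for the sample variance:
# (sum of squares - n * mean^2) / (n - 1)
naive_var <- function(x) {
  n <- length(x)
  (sum(x^2) - n * mean(x)^2) / (n - 1)
}

# Large values with a small spread; the exact sample variance is 30.
x <- 1e9 + c(4, 7, 13, 16)
print(naive_var(x))  # unreliable: x^2 needs more digits than a double holds
print(var(x))        # R's built-in variance
```

The naive formula subtracts two numbers near 4e18 that agree in almost all their leading digits, so the small difference that carries the answer is lost to rounding. The built-in var (and sd, its square root) works from deviations about the mean and does not have this problem here.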

Here are some tips to help you avoid some of the most common floating-point pitfalls:

Don't test floating point numbers for exact equality.  If your code includes expressions like x==0 when x is a floating-point number, you're asking for trouble.

Use integer objects when working with whole numbers. If you know that x will only ever take integer values, give it an integer representation, like this: x <- as.integer(1). As long as you only ever add, subtract etc. other integers to/from x, it's safe to use the equality test, and expressions like x==0 are meaningful. (Bonus: you'll reduce memory usage, too.)
   
If you must test floating-point numbers, use fuzzy matching. If "real" arithmetic tells you x should be one, and x is floating point, test whether x is in a range near one, not whether it's one exactly. Replace code that looks like this: x==1, with this: abs(x-1)<eps, where eps is a small number. How small eps should be depends on the values you expect x to take. You can also use isTRUE(all.equal(x,1)), which tests x against a small default tolerance (about 1.5e-8) rather than requiring exact equality. A similar solution would help our "while loop" example above, but it's usually better to rewrite your code so that such a test isn't necessary.

Use internal algorithms where possible. R's built-in functions are carefully written to avoid accumulation of floating-point errors. Use functions like sd and scale instead of rolling your own variants.
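
As a rough sketch of how these tips might look in practice: the helper near and its default eps are assumptions made up for this example, not part of R, and the right tolerance depends on your data. The loop rewrite shows one possible way to avoid a floating-point stopping test altogether.

```r
# Hypothetical helper: TRUE when x is within eps of target.
near <- function(x, target, eps = 1e-8) {
  abs(x - target) < eps
}

a <- sqrt(2)
print(a * a == 2)                   # exact test
print(near(a * a, 2))               # tolerance test
print(isTRUE(all.equal(a * a, 2)))  # built-in comparison with a tolerance

# Rewrite the while loop with an integer counter so the stopping test
# is exact; the fractional value is derived from the counter afterwards.
i <- 0L
while (i < 10L) i <- i + 1L
j <- i / 10
print(j)
```

Wrapping all.equal in isTRUE matters because all.equal returns a character description of the difference, not FALSE, when the values are not close enough.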

Finally, it's always worth learning more about how floating-point arithmetic works. The Wikipedia article is a good start, and David Goldberg's article What Every Computer Scientist Should Know About Floating-Point Arithmetic has everything you ever wanted to know (and then some). And if you see other R users with floating-point woes, point them to the R FAQ entry Why doesn't R think these numbers are equal?

DavidMSmith March 5, 2009