A Floating Point Question Revisited

QUESTION: A machine stores floating point numbers in a 7-bit word. The first bit stores the sign of the number, the next three the biased exponent, and the next three the magnitude of the mantissa. You are asked to represent 33.35 in this word. The error you will get in this case would be
(A) underflow
(B) overflow
(C) NaN
(D) No error will be registered

The solution to the problem is given here.

However, a student asked me a follow-up question, and here is the answer.

QUESTION: I was doing the multiple choice question and I am having trouble understanding it. I looked at the solution, but I am still having trouble. I began by turning 33.35 into binary and I got 100001.01011. I am just having trouble putting it into the format. The maximum exponent value is 4 in this case, but the solution says you need 5. Maybe I do not understand exactly what underflow and overflow are.

ANSWER: The solution is given as you have pointed out.

The binary number in fixed-point format needs to be converted to floating point format. That gives 100001.01011 = 1.0000101011*2^5, since you move the radix point 5 places to the left. We move it 5 places because that leaves exactly one nonzero digit to the left of the radix point. This is no different from the procedure you use to convert a base-10 number to scientific notation.
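The conversion and normalization steps can be checked with a short script. This is a minimal sketch, assuming a positive input and truncating (not rounding) the fractional bits; the helper names `to_binary` and `normalize` are hypothetical, chosen only for illustration.

```python
import math

def to_binary(x: float, frac_bits: int) -> str:
    """Fixed-point binary string of a positive number x, keeping only
    frac_bits bits after the radix point (truncated, not rounded)."""
    int_part = int(x)
    frac = x - int_part
    bits = []
    for _ in range(frac_bits):
        frac *= 2          # shift the next binary digit left of the point
        bit = int(frac)    # that digit is 0 or 1
        bits.append(str(bit))
        frac -= bit
    return f"{int_part:b}." + "".join(bits)

def normalize(x: float) -> tuple[float, int]:
    """Return (m, e) with x = m * 2**e and 1 <= m < 2, i.e. one nonzero
    binary digit left of the radix point."""
    e = math.floor(math.log2(x))
    return x / 2**e, e

print(to_binary(33.35, 5))   # 100001.01011
m, e = normalize(33.35)
print("exponent needed:", e)
```

The exponent reported here is 5, the same as the number of places the radix point moved.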

Every floating point format has an upper limit on the numbers it can represent. Since the biased exponent has 3 bits, the biased exponent can range from 0 to 7, which means the unbiased exponent can range from -3 to 4 (the bias is 3: we add 3 to bias the exponent and subtract 3 to unbias it). We need to represent an unbiased exponent of 5, and that cannot be done because the maximum unbiased exponent is 4. So the number is larger than the largest number this format can represent. If you put 32 ounces of water in a 24-ounce cup, we say that the water overflowed. In the same way, the number overflows because it is more than the format can handle.
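The exponent-range argument can be written out as a small check. This sketch assumes, as the problem does, that every exponent bit pattern from 0 to 7 is usable (unlike IEEE 754, which reserves some patterns); the names `unbiased_exponent_range` and `check_exponent` are hypothetical.

```python
EXP_BITS = 3  # width of the biased-exponent field in this problem
BIAS = 3      # bias used in this problem

def unbiased_exponent_range(exp_bits: int, bias: int) -> tuple[int, int]:
    """Smallest and largest unbiased exponent, assuming all bit patterns
    0 .. 2**exp_bits - 1 are valid biased exponents."""
    return 0 - bias, (2**exp_bits - 1) - bias

def check_exponent(e: int, exp_bits: int = EXP_BITS, bias: int = BIAS) -> str:
    """Classify an unbiased exponent as fitting, too large, or too small."""
    lo, hi = unbiased_exponent_range(exp_bits, bias)
    if e > hi:
        return "overflow"
    if e < lo:
        return "underflow"
    return "ok"

print(unbiased_exponent_range(EXP_BITS, BIAS))  # (-3, 4)
print(check_exponent(5))                        # overflow
```

An exponent of 5 exceeds the maximum of 4, so the sketch classifies it as overflow.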

You can also see this another way (looking at a solution from a different angle always helps the brain and your long-term memory).

The maximum number you can represent in the given 7-bit word is 0111111, which translates to (1.111)2*2^((111)2-3). In base 10 this is (1.875)*2^(7-3) = 30 (the 3 is subtracted to unbias the exponent). Hence, 33.35 would overflow, just like when you put 32 ounces of water in a 24-ounce cup.
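To confirm that 30 is the largest value, one can decode all 128 possible words by brute force. This is a sketch under the problem's assumptions: bias of 3, an implied leading 1 in front of the 3 mantissa bits, and no special values (no zero, infinity, or NaN patterns); the `decode` function is a hypothetical helper.

```python
def decode(word: int) -> float:
    """Decode a 7-bit word: 1 sign bit, 3 biased-exponent bits (bias 3),
    3 mantissa bits with an implied leading 1 (no special values assumed)."""
    sign = (word >> 6) & 0b1
    biased_exp = (word >> 3) & 0b111
    mantissa = word & 0b111
    value = (1 + mantissa / 8) * 2.0 ** (biased_exp - 3)
    return -value if sign else value

largest = max(decode(w) for w in range(2**7))
print(largest)                                      # 30.0
print(format(0b0111111, "07b"), decode(0b0111111))  # 0111111 30.0
print(33.35 > largest)                              # True, so 33.35 overflows
```

The maximum comes from the word 0111111, matching the hand calculation.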


This post is brought to you by

Author: Autar Kaw

Autar Kaw (http://autarkaw.com) is a Professor of Mechanical Engineering at the University of South Florida. He has been at USF since 1987, the same year in which he received his Ph. D. in Engineering Mechanics from Clemson University. He is a recipient of the 2012 U.S. Professor of the Year Award. With major funding from NSF, he is the principal and managing contributor in developing the multiple award-winning online open courseware for an undergraduate course in Numerical Methods. The OpenCourseWare (nm.MathForCollege.com) annually receives 1,000,000+ page views, 1,000,000+ views of the YouTube audiovisual lectures, and 150,000+ page views at the NumericalMethodsGuy blog. His current research interests include engineering education research methods, adaptive learning, open courseware, massive open online courses, flipped classrooms, and learning strategies. He has written four textbooks and 80 refereed technical papers, and his opinion editorials have appeared in the St. Petersburg Times and Tampa Tribune.
