Lecture 7
Floating-Point Format
Lecture
Outline
- Floating-point format.
- Standard IEEE 754.
- Floating-point instructions.
- Programs with floating-point operations.
Examples
Workshop
- Find decimal values for the following binary values:
0.0 0.01 0.010 0.0011 0.00110 0.001101 0.0011010 0.00110011
- Find binary values for the following fractions:
1/2 1/8 3/4 5/16 11/32
- Find binary values for the following decimal values:
0.5 0.25 0.125 1.125 5.875 3.1875
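The three conversion exercises above can be checked with a small reference model. This is a minimal sketch in Python, not part of the course material; the helper names `binary_to_fraction` and `fraction_to_binary` are hypothetical. It sums negative powers of two in one direction and uses repeated doubling in the other.

```python
from fractions import Fraction

def binary_to_fraction(text):
    """Convert a binary literal such as '1.001' to an exact Fraction."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)  # bit at position k is worth 2^-k
    return value

def fraction_to_binary(value, max_bits=32):
    """Convert a non-negative number to a binary string by repeated doubling."""
    value = Fraction(value)
    whole = value.numerator // value.denominator
    rest = value - whole
    bits = []
    # max_bits only guards against non-terminating expansions such as 1/10.
    while rest and len(bits) < max_bits:
        rest *= 2
        bit = rest.numerator // rest.denominator  # integer part is the next bit
        bits.append(str(bit))
        rest -= bit
    return f"{whole:b}." + ("".join(bits) or "0")
```

For example, `binary_to_fraction("0.0011")` is 3/16, and `fraction_to_binary(Fraction(5, 16))` gives `0.0101`. Every value in the exercises is a sum of a few powers of two, so each expansion terminates after a few bits.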
- Write a program fprint.s that reads a single-precision and a double-precision floating-point value and prints each of them in binary format.
- Write a program fprint2.s that separately prints the fields (sign, fraction, exponent) of single- and double-precision floating-point values. The code of the previous program can be partially reused.
- Write a program farithm.s that reads three double values a, b, and c, calculates the value of the expression a + b - c, and prints the result.
- Write a program even_back.s that does the following: Input an integer value N and then N float values. Output, line by line, only the even ones, in reverse order. To decide whether a float number is even, convert (round) it to an integer value.
Input:
6 12.3 -11.0 3.25 88.01 0.0 1.25
Output:
0.0 88.01 12.3
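The selection logic of even_back.s can be modelled in a few lines of Python before writing the assembly. This is only a sketch: the function name `even_back` is hypothetical, and rounding ties to even (Python's `round`) is an assumption, since the task only says "rounded".

```python
def even_back(values):
    """Return the values whose rounded integer is even, in reverse input order."""
    kept = []
    for v in reversed(values):      # walk the input backwards
        if round(v) % 2 == 0:       # parity is tested on the rounded integer
            kept.append(v)          # but the original float is what gets printed
    return kept

if __name__ == "__main__":
    for v in even_back([12.3, -11.0, 3.25, 88.01, 0.0, 1.25]):
        print(v)
```

In assembly, one possible design is to store the N values on the stack or in an array as they are read, then walk them backwards, converting each to an integer (for example with fcvt.w.s) and testing its lowest bit.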
- Write a program no_dups.s that does the following: Input an integer value N and then N double values. Output all the doubles, skipping duplicates.
Input:
8 12.025 34.5 -12.0 23.25 12.025 -12.0 56.75 9.125
Output:
12.025 34.5 -12.0 23.25 56.75 9.125
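For the fprint.s and fprint2.s tasks above, it helps to have known-good bit patterns to compare against. The following Python sketch (the helper names `float_bits` and `float_fields` are hypothetical) uses the standard `struct` module to obtain the IEEE 754 encoding and split it into fields. It assumes the usual layout: 1 sign bit, then 8 exponent bits and 23 fraction bits for single precision, or 11 exponent bits and 52 fraction bits for double precision.

```python
import struct

def float_bits(value, double=False):
    """Return the IEEE 754 bit pattern of value as a string of 32 or 64 bits."""
    fmt, width = (">d", 64) if double else (">f", 32)
    raw = int.from_bytes(struct.pack(fmt, value), "big")
    return f"{raw:0{width}b}"

def float_fields(value, double=False):
    """Split the bit pattern into (sign, exponent, fraction) bit strings."""
    bits = float_bits(value, double)
    exp_len = 11 if double else 8
    return bits[0], bits[1:1 + exp_len], bits[1 + exp_len:]

if __name__ == "__main__":
    print(float_bits(1.0))
    print(float_fields(-0.15625))
    print(float_fields(1.0, double=True))
```

In assembly, the same split can be done by moving the raw bits into an integer register and applying shifts and masks; the Python version is only meant for checking the expected output.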
Homework
- Write a program fraction_truncate.s that does the following: Input three cardinal numbers A, B and n. Output a double F that has exactly n decimal places of A/B. You need to write a subroutine that accepts the double f = A/B in fa0 and the integer n in a0 and returns the rounded double F in fa0.
Hint: \(10^n \cdot A/B < 2^{31}\)
Input:
123 456 7
Output:
0.2697368
Spoiler: \(10^n \cdot A/B < 2^{31}\) means that you can just take the integer part of \(10^n \cdot A/B\), then divide the result back by \(10^n\).
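The spoiler above amounts to a scale, truncate and unscale sequence. A minimal Python sketch of that idea follows; the function name mirrors the task, but the code is an illustration, not the required assembly subroutine.

```python
def fraction_truncate(f, n):
    """Keep n decimal places of f by scaling, truncating and scaling back."""
    scale = 10 ** n
    scaled = int(f * scale)     # integer part; fits in 32 bits by the task's hint
    return scaled / scale

if __name__ == "__main__":
    a, b, n = 123, 456, 7
    print(f"{fraction_truncate(a / b, n):.{n}f}")
```

Because `f * scale` is computed in binary floating point, a product that should be an exact integer can land just below it, so borderline inputs may lose one unit in the last kept digit. The sample input 123 456 7 is not such a case.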
- Write a program cubic_root.s that does the following: Input two double values: A (positive or negative), with \(1 \le |A| \le 1000000\), and \(\varepsilon\), with \(0.00001 \le \varepsilon \le 0.01\). Calculate the cube root of A to within \(\varepsilon\) (you do not need to round the result).
Hint: you can always calculate the cube of a number!
Input:
1000 0.0001
Output:
9.99995
Spoiler: suppose the solution is between M and N (M < N). Select \(K=(M+N)/2\); if \(|K^3|>|A|\), then the solution is between M and K, otherwise it is between K and N.
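The bisection described in the spoiler can be sketched as follows. This is an illustrative Python model, not the reference solution. The starting interval [0, |A|] is an assumption (it contains the root because |A| >= 1), and a different starting interval or return point will print slightly different digits than the sample while still being within ε.

```python
def cubic_root(a, eps):
    """Approximate the cube root of a by bisection on |a|, within eps."""
    sign = -1.0 if a < 0 else 1.0
    target = abs(a)
    low, high = 0.0, target          # invariant: low**3 <= target <= high**3
    while high - low > eps:
        mid = (low + high) / 2
        if mid * mid * mid > target:
            high = mid               # root lies in [low, mid]
        else:
            low = mid                # root lies in [mid, high]
    return sign * low                # any point of the final interval is within eps

if __name__ == "__main__":
    print(cubic_root(1000.0, 0.0001))
```

Working with |A| and restoring the sign at the end keeps the comparison in the loop identical for positive and negative inputs.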
- Bonus task (2 bonus points). Write a program leibpi.s that does the following: Calculate the value of π using the Leibniz formula for π, accurate to N decimal places. Input N, output the result. Use the subroutine defined in fraction_truncate.s to truncate the remaining digits. Keep in mind that the formula actually calculates π/4, so you should probably start with 4 instead of 1 to get the required accuracy. Warning: the algorithm is slow; do not panic, but keep the code as simple as possible.
Input:
4
Output:
3.1416
Hint: to gain performance, keep everything in registers.
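The Leibniz series, scaled by 4 as the task suggests, is \(\pi = 4 - 4/3 + 4/5 - 4/7 + \dots\). The sketch below is one possible Python model of the summation loop. The stopping rule (stop when the next term drops below \(10^{-N}\)) is an assumption, chosen because the error of an alternating series is bounded by the first omitted term.

```python
def leibniz_pi(n):
    """Sum 4 - 4/3 + 4/5 - ... until the next term is below 10**-n."""
    eps = 10.0 ** -n
    total = 0.0
    sign = 1.0
    denom = 1.0
    while 4.0 / denom >= eps:
        total += sign * 4.0 / denom
        sign = -sign                 # terms alternate in sign
        denom += 2.0                 # denominators are the odd numbers 1, 3, 5, ...
    return total

if __name__ == "__main__":
    print(leibniz_pi(4))
```

The partial sums alternate above and below π, so the digits left after truncation depend on exactly where the loop stops. Check the stopping rule of your own solution against the sample output rather than relying on this sketch.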
References
- Standard IEEE 754 (Wikipedia).
- Standard IEEE 754-2008.
- Floating point. Section 3.5 in [CODR].
- Floating point. Section 2.4 in [CSPP].
- RISC-V Assembly Programmer’s Manual.
- RISC-V Formal Specifications in nML.