Tesla Configurable Float8 (CFloat8) & Float16 (CFloat16) Formats

Abstract

This standard specifies Tesla arithmetic formats and methods for the new 8-bit and 16-bit binary floating-point arithmetic in computer programming environments for deep learning neural network training. This standard also specifies exception conditions and their status flags. An implementation of a floating-point system conforming to this standard may be realized entirely in software, entirely in hardware, or in any combination of software and hardware.

Keywords

Arithmetic, binary, computer, deep learning, neural networks, training, exponent, floating-point, format, NaN, Infinity, number, mantissa, subnormal, denormal, configurable exponent bias, range, precision, rounding mode, random number generator, stochastic rounding.

Motivation

The original IEEE 754 standard, published in 1985, specified formats and methods for floating-point arithmetic in computer systems: standard and extended functions with single (32-bit) and double (64-bit) precision. The standard single and double precision formats are shown in Table 1 below.

Table 1: Floating-Point Formats Defined by the IEEE 754 Standard

Format                      | Sign Bit? | No. of Mantissa Bits | No. of Exponent Bits | Exponent Bias Value
Single Precision (Float32)  | Yes       | 1 + 23               | 8                    | 127
Double Precision (Float64)  | Yes       | 1 + 52               | 11                   | 1023

The purpose of the standard was to provide a method for computation with floating-point numbers that yields the same result whether the processing is done in hardware, software, or a combination of the two. The results of the computation are identical, independent of implementation, given the same input data. Errors and error conditions in the mathematical processing are reported in a consistent manner regardless of implementation.

The above formats have been widely adopted in computer systems, both hardware and software, for scientific, numeric, and various other computing applications.

Subsequently, the revised IEEE 754 standard of 2008 also included a half-precision (16-bit) format, but only as a storage format, without specifying its arithmetic operations. However, Nvidia and Microsoft had defined this datatype in the Cg language even earlier, in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002. The IEEE half-precision format has since been used not just for storage but also for performing arithmetic operations in various computer systems, especially for graphics and machine learning applications. The format is used in several computer graphics environments, including MATLAB, OpenEXR, JPEG XR, GIMP, OpenGL, Cg, Direct3D, and D3DX. Its advantage over the single-precision binary format is that it requires half the storage and bandwidth (at the expense of precision and range).

Subsequently, the IEEE half-precision format was adopted in machine learning systems such as the Nvidia AI processors, especially for training, due to the significantly increased memory storage and bandwidth requirements of such applications.

More recently, Google Brain, an artificial intelligence research group at Google, developed the Brain Floating Point, or BFloat16 (16-bit), format.
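To make the field widths and exponent biases listed in Table 1 concrete, the following is a minimal illustrative sketch (not part of the Tesla standard) that decodes an IEEE 754 single-precision value from its raw 32-bit pattern: the 8-bit exponent is interpreted relative to the bias of 127, and the 23 stored mantissa bits gain an implicit leading 1 for normal numbers. The function name decode_float32 and the sample bit pattern are illustrative choices, not taken from this document.

```python
# Illustrative sketch: decode an IEEE 754 single-precision value from its
# raw bits using the field widths and exponent bias from Table 1
# (1 sign bit, 8 exponent bits with bias 127, 23 stored mantissa bits).
import struct

def decode_float32(bits: int) -> float:
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF       # 8 exponent bits
    mantissa = bits & 0x7FFFFF           # 23 stored mantissa bits

    if exponent == 0xFF:                 # all-ones exponent: Infinity or NaN
        return float("nan") if mantissa else (-1) ** sign * float("inf")
    if exponent == 0:                    # subnormal (denormal) or zero: no implicit 1
        significand = mantissa / 2**23
        unbiased = 1 - 127
    else:                                # normal number: implicit leading 1
        significand = 1 + mantissa / 2**23
        unbiased = exponent - 127        # subtract the bias from Table 1

    return (-1) ** sign * significand * 2.0 ** unbiased

# Cross-check against the hardware interpretation of the same bit pattern.
bits = 0x40490FDB                        # bit pattern of ~3.14159274
assert decode_float32(bits) == struct.unpack(">f", bits.to_bytes(4, "big"))[0]
```

The same sign/exponent/mantissa decomposition applies to the double-precision row of Table 1 (11 exponent bits, bias 1023, 52 stored mantissa bits), only with wider fields.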