
CHAPTER 3  DATA REPRESENTATION

CHAPTER GOALS

Describe numbering systems and their use in data representation

Compare different data representation methods

Summarize the CPU data types and explain how nonnumeric data is represented

Describe common data structures and their uses

Computers manipulate and store a variety of data, such as numbers, text, sound, and pictures. This chapter describes how data is represented and stored in computer hardware. It also explains how simple data types are used as building blocks to create more complex data structures, such as arrays and records. Understanding data representation is key to understanding hardware and software technologies.

DATA REPRESENTATION AND PROCESSING

People can understand and manipulate data represented in a variety of forms. For example, they can understand numbers represented symbolically as Arabic numerals (such as 8714), Roman numerals (such as XVII), and simple lines or tick marks on paper (for example, ||| to represent the value 3). They can understand words and concepts represented with pictorial characters or alphabetic characters (for example, the English word computer or the Cyrillic text of the Russian word for computer) and in the form of sound waves (spoken words). People also extract data from visual images (photos and movies) and from the senses of taste, smell, and touch. The human brain's processing power and flexibility are evident in the rich variety of data representations it can recognize and understand.

To be manipulated or processed by the brain, external data representations, such as printed text, must be converted to an internal format and transported to the brain's processing circuitry. Sensory organs convert inputs, such as sight, smell, taste, sound, and skin sensations, into electrical impulses that are transported through the nervous system to the brain. Processing in the brain occurs as networks of neurons exchange data electrically.

Any data and information processor, whether organic, mechanical, electrical, or optical, must be capable of the following:

• Recognizing external data and converting it to an internal format
• Storing and retrieving data internally
• Transporting data between internal storage and processing components
• Manipulating data to produce results or decisions

Note that these capabilities correspond roughly to computer system components described in Chapter 2: I/O units, primary and secondary storage, the system bus, and the CPU.

Automated Data Processing

Computer systems represent data electrically and process it with electrical switches. Two-state (on and off) electrical switches are well suited for representing data that can be expressed in binary (1 or 0) format, as you see later in Chapter 4. Electrical switches are combined to form processing circuits, which are then combined to form processing subsystems and entire CPUs. You can see this processing as an equation:

A + B = C

In this equation, data inputs A and B, represented as electrical currents, are transported through processing circuits (see Figure 3.1). The electrical current emerging from the circuit represents a data output, C. Automated data processing, therefore, combines physics (electronics) and mathematics.

The physical laws of electricity, optics, and quantum mechanics are described by mathematical formulas. If a device's behavior is based on well-defined, mathematically described laws of physics, the device, in theory, can implement a processor to perform the equivalent mathematical function. This relationship between mathematics and physics underlies all automated computation devices, from mechanical clocks (using the

FIGURE 3.1 Two electrical inputs on the left flow through processing circuitry that generates their sum on the right

Courtesy of Course Technology/Cengage Learning


mathematical ratios of gears) to electronic microprocessors (using the mathematics of electrical voltage and resistance). As you learned in Chapter 2, in quantum mechanics, the mathematical laws are understood but not how to build reliable and cost-effective computing devices based on these laws.

Basing computer processing on mathematics and physics has limits, however. Processing operations must be based on mathematical functions, such as addition and equality comparison; use numerical data inputs; and generate numerical outputs. These processing functions are sufficient when a computer performs numeric tasks, such as accounting or statistical analysis. When you ask a computer to perform tasks such as searching text documents and editing sound, pictures, and video, numeric-processing functions do have limitations, but ones that modern software has largely overcome. However, when you want to use a computer to manipulate data with no obvious numeric equivalent (for example, literary or philosophical analysis of concepts such as mother, friend, love, and hate), numeric-processing functions have major shortcomings. As the data you want to process moves further away from numbers, applying computer technology to processing the data becomes increasingly difficult and less successful.

Binary Data Representation

In a decimal (base 10) number, each digit can have 1 of 10 possible values: 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. In a binary number, each digit can have only one of two possible values: 0 or 1. Computers represent data with binary numbers for two reasons:

• Binary numbers represented as binary electrical signals can be transported reliably between computer systems and their components (discussed in detail in Chapter 8).
• Binary numbers represented as electrical signals can be processed by two-state electrical devices that are easy to design and fabricate (discussed in detail in Chapter 4).

For computer applications to produce accurate outputs, reliable data transport is important. Given current technology, binary signals and processing devices represent the most cost-efficient tradeoffs between capacity, accuracy, reliability, and cost.

Binary numbers are also well suited to computer processing because they correspond directly with values in Boolean logic. This form of logic is named for 19th-century mathematician George Boole, who developed methods of reasoning and logical proof that use sequences of statements that can be evaluated only as true or false. Similarly, a computer can perform logical comparisons of two binary data values to determine whether one data value is greater than, equal to, less than, less than or equal to, not equal to, or greater than or equal to another value. As discussed in Chapter 2, a computer uses this primitive logical capability to exhibit intelligent behavior.

Both computers and humans can combine digits to represent and manipulate larger numbers. Decimal and binary notations are alternative forms of a positional numbering system, in which numeric values are represented as groups, or strings, of digits. The symbol used to represent a digit and the digit's position in a string determine its value. The value of the entire string is the sum of the values of all digits in the string.


For example, in the decimal numbering system, the number 5689 is interpreted as follows:

(5 × 1000) + (6 × 100) + (8 × 10) + (9 × 1)

= 5000 + 600 + 80 + 9 = 5689

The same series of operations can be represented in columnar form, with positions of the same value aligned in columns:

  5000
   600
    80
+    9
------
  5689

For whole numbers, values are accumulated from right to left. In the preceding example, the digit 9 is in the first position, 8 is in the second position, 6 is in the third, and 5 is in the fourth.

The maximum value, or weight, of each position is a multiple of the weight of the position to its right. In the decimal numbering system, the first (rightmost) position is the ones place (10^0), and the second position is 10 times the first position (10^1). The third position is 10 times the second position (10^2, or 100), the fourth is 10 times the third position (10^3, or 1000), and so on. In the binary numbering system, each position is 2 times the previous position, so position values for whole numbers are 1, 2, 4, 8, and so forth. The multiplier that describes the difference between one position and the next is the base, or radix, of the numbering system. In the decimal numbering system, it's 10, and in the binary numbering system, it's 2.

The fractional part of a numeric value is separated from the whole part by a period, although in some countries, a comma is used instead of a period. In the decimal numbering system, the period or comma is called a decimal point. In other numbering systems, the term radix point is used for the period or comma. Here's an example of a decimal value with a radix point:

5689.368

The fractional portion of this real number is .368, and its value is interpreted as follows:

(3 × 10^-1) + (6 × 10^-2) + (8 × 10^-3)

= (3 × .1) + (6 × .01) + (8 × .001)

= 0.3 + 0.06 + 0.008 = 0.368

Proceeding toward the right from the radix point, the weight of each position is a fraction of the position to its left. In the decimal (base 10) numbering system, the first position to the right of the decimal point represents tenths (10^-1), the second position represents hundredths (10^-2), the third represents thousandths (10^-3), and so forth.


In the binary numbering system, the first position to the right of the radix point represents halves (2^-1), the second position represents quarters (2^-2), the third represents eighths (2^-3), and so forth. As with whole numbers, each position has a weight 10 (or 2) times the position to its right. Table 3.1 compares decimal and binary notations for the values 0 through 10.

The number of digits needed to represent a value depends on the numbering system s base: The number of digits increases as the numbering system s base decreases. Therefore, values that can be represented in a compact format in decimal notation might require lengthy sequences of binary digits. For example, the decimal value 99 requires two decimal digits but seven binary digits. Table 3.2 summarizes the number of binary digits needed to represent decimal values up to 16 positions.

TABLE 3.1 Binary and decimal notations for the values 0 through 10

        Binary system (base 2)        Decimal system (base 10)
Place:  2^3  2^2  2^1  2^0           10^3  10^2  10^1  10^0
Values:   8    4    2    1           1000   100    10     1

        0    0    0    0    =    0    0    0    0
        0    0    0    1    =    0    0    0    1
        0    0    1    0    =    0    0    0    2
        0    0    1    1    =    0    0    0    3
        0    1    0    0    =    0    0    0    4
        0    1    0    1    =    0    0    0    5
        0    1    1    0    =    0    0    0    6
        0    1    1    1    =    0    0    0    7
        1    0    0    0    =    0    0    0    8
        1    0    0    1    =    0    0    0    9
        1    0    1    0    =    0    0    1    0


To convert a binary value to its decimal equivalent, use the following procedure:

1. Determine each position weight by raising 2 to the number of positions left (+) or right (−) of the radix point.

2. Multiply each digit by its position weight.

3. Sum all the values calculated in Step 2.
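The three-step procedure above can be sketched in Python. The function name is my own, not from the text; it simply multiplies each digit by its position weight and sums the results:

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary string such as '101101.101' to its decimal value."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Whole part: position weights 2^0, 2^1, ... moving left from the radix point
    for position, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** position
    # Fractional part: position weights 2^-1, 2^-2, ... moving right
    for position, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -position
    return value

print(binary_to_decimal("101101.101"))  # 45.625, as in Figure 3.2
```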

NOTE: The standard Windows calculator can convert between binary, octal, decimal, and hexadecimal. To open the calculator in Windows 7, click Start, All Programs, Accessories, Calculator. To convert a binary number to decimal, click View, Programmer from the menu (View, Scientific for Vista and earlier Windows versions). Click the Bin (for binary) option button at the upper left, enter the binary number in the text box, and then click the Dec (for decimal) option button.

TABLE 3.2 Binary notations for decimal values up to 16 positions

Number of bits (n)    Number of values (2^n)    Numeric range (decimal)
 1                         2                    0–1
 2                         4                    0–3
 3                         8                    0–7
 4                        16                    0–15
 5                        32                    0–31
 6                        64                    0–63
 7                       128                    0–127
 8                       256                    0–255
 9                       512                    0–511
10                     1,024                    0–1023
11                     2,048                    0–2047
12                     4,096                    0–4095
13                     8,192                    0–8191
14                    16,384                    0–16,383
15                    32,768                    0–32,767
16                    65,536                    0–65,535


Figure 3.2 shows how the binary number 101101.101 is converted to its decimal equivalent, 45.625.

In computer terminology, each digit of a binary number is called a bit. A group of bits that describe a single data value is called a bit string. The leftmost digit, which has the greatest weight, is called the most significant digit, or high-order bit. Conversely, the rightmost digit is the least significant digit, or low-order bit. A string of 8 bits is called a byte. Generally, a byte is the smallest unit of data that can be read from or written to a storage device.

The following mathematical rules define addition of positive binary digits:

0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
1 + 1 = 10

To add two positive binary bit strings, you first must align their radix points as follows:

101101.101
 10100.0010

FIGURE 3.2 Computing the decimal equivalent of a binary number Courtesy of Course Technology/Cengage Learning


The values in each column are added separately, starting with the least significant, or rightmost, digit. If a column result exceeds 1, the excess value must be carried to the next column and added to the values in that column.

N O T E The standard Windows calculator can add and subtract binary integers. To use this feature, click View, Programmer from the menu (View, Scientific for Vista and earlier Windows versions), and click the Bin option button. You can also click View, Digit grouping from the menu to place digits in groups of four for easier readability.

This is the result of adding the two preceding numbers:

   1111    1      (carry bits)
   101101.101
+   10100.0010
--------------
  1000001.1100

The result is the same as when adding the values in base-10 notation:

Binary          Real (fractions)    Real (decimal)
  101101.101        45 5/8              45.625
   10100.0010       20 1/8              20.125
 1000001.1100       65 3/4              65.750
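The column-by-column addition described above, with radix-point alignment and carries, can be sketched in Python (the function name and padding approach are my own, not from the text):

```python
def add_binary(a: str, b: str) -> str:
    """Add two positive binary strings (with optional radix points) column by column."""
    a_whole, _, a_frac = a.partition(".")
    b_whole, _, b_frac = b.partition(".")
    # Align the radix points by padding both parts to equal widths
    frac_len = max(len(a_frac), len(b_frac))
    whole_len = max(len(a_whole), len(b_whole)) + 1  # room for a final carry
    x = a_whole.rjust(whole_len, "0") + a_frac.ljust(frac_len, "0")
    y = b_whole.rjust(whole_len, "0") + b_frac.ljust(frac_len, "0")
    carry, digits = 0, []
    for bit_a, bit_b in zip(reversed(x), reversed(y)):  # least significant column first
        total = int(bit_a) + int(bit_b) + carry
        digits.append(str(total % 2))   # digit written in this column
        carry = total // 2              # excess carried to the next column
    result = "".join(reversed(digits))
    whole, frac = result[:whole_len], result[whole_len:]
    whole = whole.lstrip("0") or "0"
    return whole + ("." + frac if frac else "")

print(add_binary("101101.101", "10100.0010"))  # 1000001.1100
```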

Binary numbers usually contain many digits and are difficult for people to remember and manipulate without error. Compilers and interpreters for high-level programming languages, such as C and Java, convert decimal numbers into binary numbers automatically when generating CPU instructions and data values. However, sometimes programmers must deal with binary numbers directly, such as when they program in machine language or for some operating system (OS) utilities. To minimize errors and make dealing with binary numbers easier, numbering systems based on even multiples of 2 are sometimes used. These numbering systems include hexadecimal and octal, discussed in the following sections.

Hexadecimal Notation

Hexadecimal numbering uses 16 as its base or radix ("hex" = 6 and "decimal" = 10). There aren't enough numeric symbols (Arabic numerals) to represent 16 different values, so English letters represent the larger values (see Table 3.3).


The primary advantage of hexadecimal notation, compared with binary notation, is its compactness. Large numeric values expressed in binary notation require four times as many digits as those expressed in hexadecimal notation. For example, the data content of a byte requires eight binary digits (such as 11110000) but only two hexadecimal digits (such as F0). This compact representation helps reduce programmer error.

Hexadecimal numbers often designate memory addresses. For example, a 64 KB memory region contains 65,536 bytes (64 × 1024 bytes/KB). Each byte is identified by a sequential numeric address. The first byte is always address 0. Therefore, the range of possible memory addresses is 0 to 65,535 in decimal numbers, 0 to 1111111111111111 in binary numbers, and 0 to FFFF in hexadecimal numbers. As you can see from this example, hexadecimal addresses are more compact than decimal or binary addresses because of the numbering system's higher radix.
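Because 16 is a power of 2, each hexadecimal digit corresponds to exactly four binary digits, which is why conversion between the two is a simple digit-by-digit substitution. A small Python sketch (the helper name is my own):

```python
def hex_to_binary(hex_str: str) -> str:
    """Expand each hexadecimal digit into its 4-bit binary equivalent."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_str)

print(hex_to_binary("F0"))   # 11110000 -- one byte, two hex digits
print(format(65535, "X"))    # FFFF -- the top of a 64 KB address space
```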

When reading a numeric value in written text, the number s base might not be obvious. For example, when reading an OS error message or a hardware installation manual, should the number 1000 be interpreted in base 2, 10, 16, or something else? In mathematical expressions, the base is usually specified with a subscript, as in this example:

1001₂

The subscript 2 indicates that 1001 should be interpreted as a binary number. Similarly, in the following example, the subscript 16 indicates that 6044 should be interpreted as a hexadecimal number:

6044₁₆

The base of a written number can be made explicit by placing a letter at the end. For example, the letter B in this example indicates a binary number:

1001B

The letter H in this example indicates a hexadecimal number:

6044H

TABLE 3.3 Hexadecimal and decimal values

Base-16 digit    Decimal value        Base-16 digit    Decimal value
0                0                    8                 8
1                1                    9                 9
2                2                    A                10
3                3                    B                11
4                4                    C                12
5                5                    D                13
6                6                    E                14
7                7                    F                15


Normally, no letter is used to indicate a decimal (base 10) number. Some programming languages, such as Java and C, use the prefix 0x to indicate a hexadecimal number. For example, 0x1001 is equivalent to 1001₁₆.

Unfortunately, these conventions aren't observed consistently. Often it's left to the reader to guess the correct base by the number's content or the context in which it appears. A value containing a numeral other than 0 or 1 can't be binary, for instance. Similarly, the use of letters A through F indicates that the contents are expressed in hexadecimal. Bit strings are usually expressed in binary, and memory addresses are usually expressed in hexadecimal.

Octal Notation

Some OSs and machine programming languages use octal notation. Octal notation uses the base-8 numbering system and has a range of digits from 0 to 7. Large numeric values expressed in octal notation are one-third the length of corresponding binary notation and double the length of corresponding hexadecimal notation.
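Python happens to support all three alternative notations directly, which makes it easy to check the equivalences discussed above (the 0b/0o/0x prefixes are Python's conventions, analogous to the B/H suffixes in the text):

```python
# The same values written in binary, octal, and hexadecimal literals
assert 0b1001 == 9        # binary, like 1001B
assert 0o777 == 511       # octal
assert 0x6044 == 24644    # hexadecimal, like 6044H

# int() converts a string in any base from 2 to 36
print(int("1001", 2), int("777", 8), int("6044", 16))  # 9 511 24644
```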

GOALS OF COMPUTER DATA REPRESENTATION

Although all modern computers represent data internally with binary digits, they don't necessarily represent larger numeric values with positional bit strings. Positional numbering systems are convenient for people to interpret and manipulate because the sequential processing of digits in a string parallels the way the human brain functions and because people are taught to perform computations in a linear fashion. For example, positional numbering systems are well suited to adding and subtracting numbers in columns by using pencil and paper. Computer processors, however, operate differently from a human brain. Data representation tailored to human capabilities and limitations might not be best suited to computers.

Any representation format for numeric data represents a balance among several factors, including the following:

• Compactness
• Range
• Accuracy
• Ease of manipulation
• Standardization

As with many computer design decisions, alternatives that perform well in one factor often perform poorly in others. For example, a data format with a high degree of accuracy and a large range of representable values is usually difficult and expensive to manipulate because it's not compact.

Compactness and Range

The term compactness (or size) describes the number of bits used to represent a numeric value. Compact representation formats use fewer bits to represent a value, but they're limited in the range of values they can represent. For example, the largest binary integer that can be stored in 32 bits is 2^32 - 1, or 4,294,967,295₁₀. Halving the number of bits to make the representation more compact decreases the largest possible value to 2^16 - 1, or 65,535₁₀.

Computer users and programmers usually prefer a large numeric range. For example, would you be happy if your bank's computer limited your maximum checking account balance to 65,535 pennies? The extra bit positions required to increase the numeric range have a cost, however. Primary and secondary storage devices must be larger and, therefore, more expensive. Additional and more expensive capacity is required for data transmission between devices in a computer system or across computer networks. CPU processing circuitry becomes more complex and expensive as more bit positions are added. The more compact a data representation format, the less expensive it is to implement in computer hardware.

Accuracy

Although compact data formats can minimize hardware's complexity and cost, they do so at the expense of accurate data representation. The accuracy, or precision, of representation increases with the number of data bits used.

It's possible for routine calculations to generate quantities too large or too small to be contained in a machine's finite circuitry (that is, in a fixed number of bits). For example, the fraction 1/3 can't be represented accurately in a fixed number of bits because it's a nonterminating fractional quantity (0.333333333..., with an infinite number of 3s). In these cases, the quantities must be manipulated and stored as approximations, and each approximation introduces a degree of error. If approximate results are used as inputs for other computations, errors can be compounded and even result in major errors. For this reason, a program can have no apparent logical flaws yet still produce inaccurate results.
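This approximation error is easy to observe in any language with fixed-width floating-point numbers. The sketch below uses Python floats (fixed-width binary approximations) and the standard library's Fraction type (exact rational arithmetic) for contrast:

```python
from fractions import Fraction

# 0.1 and 0.2, like 1/3, have no exact binary representation, so a float
# stores an approximation, and arithmetic on approximations compounds the error.
print(0.1 + 0.2 == 0.3)    # False
print(0.1 + 0.2)           # 0.30000000000000004

# Exact rational arithmetic avoids the approximation entirely
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
```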

If all data types were represented in the most compact form possible, approximations would introduce unacceptable margins of error. If a large number of bits were allocated to each data value instead, machine efficiency and performance would be sacrificed, and hardware cost would be increased. The best balance in performance and cost can be achieved by using an optimum coding method for each type of data or each type of operation to be performed. Striving for this balance is the main reason for the variety of data representation formats used in modern CPUs.

Ease of Manipulation

When discussing computer processing, manipulation refers to executing processor instructions, such as addition, subtraction, and equality comparisons, and ease refers to machine efficiency. A processor's efficiency depends on its complexity (the number of its primitive components and the complexity of the wiring that binds them together). Efficient processor circuits perform their functions quickly because of the small number of components and the short distance electricity must travel. More complex devices need more time to perform their functions.

Data representation formats vary in their capability to support efficient processing. For example, most people have more difficulty performing computations with fractions than with decimal numbers. People process the decimal format more efficiently than the fractional format.


Unfortunately, there's no best representation format for all types of computation operations. For example, representing large numeric values as logarithms simplifies multiplication and division for people and computers because log A + log B = log (A × B) and log A - log B = log (A ÷ B). Logarithms complicate other operations, such as addition and subtraction, and they can increase a number's length substantially (for example, log 99 ≈ 1.9956351945975499153402557777533).
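The logarithm identities above can be verified numerically (within floating-point tolerance, since logs are themselves approximations):

```python
import math

a, b = 1234.0, 5678.0
# Multiplication becomes addition of logarithms
assert math.isclose(math.log(a) + math.log(b), math.log(a * b))
# Division becomes subtraction of logarithms
assert math.isclose(math.log(a) - math.log(b), math.log(a / b))
print("logarithm identities hold for", a, "and", b)
```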

Standardization

Data must be communicated between devices in a single computer and to other computers via networks. To ensure correct and efficient data transmission, data formats must be suitable for a wide variety of devices and computers. For this reason, several organizations have created standard data-encoding methods (discussed later in the Character Data section). Adhering to these standards gives computer users the flexibility to combine hardware from different vendors with minimal data communication problems.

CPU DATA TYPES

The CPUs of most modern computers can represent and process at least the following primitive data types:

• Integer
• Real number
• Character
• Boolean
• Memory address

The arrangement and interpretation of bits in a bit string are usually different for each data type. The representation format for each data type balances compactness, range, accuracy, ease of manipulation, and standardization. A CPU can also implement multiple versions of each type to support different types of processing operations.

Integers

An integer is a whole number, a value that doesn't have a fractional part. For example, the values 2, 3, 9, and 129 are integers, but the value 12.34 is not. Integer data formats can be signed or unsigned. Most CPUs provide an unsigned integer data type, which stores positive integer values as ordinary binary numbers. An unsigned integer's value is always assumed to be positive.

A signed integer uses one bit to represent whether the value is positive or negative. The choice of bit value (0 or 1) to represent the sign (positive or negative) is arbitrary. The sign bit is normally the high-order bit in a numeric data format. In most data formats, it's 1 for a negative number and 0 for a nonnegative number. (Note that 0 is a nonnegative number.)

The sign bit occupies a bit position that would otherwise be available to store part of the data value. Therefore, using a sign bit reduces the largest positive value that can be stored in any fixed number of bit positions. For example, the largest positive value that can be stored in an 8-bit unsigned integer is 255, or 2^8 - 1. If a bit is used for the sign, the largest positive value that can be stored is 127, or 2^7 - 1.

With unsigned integers, the lowest value that can be represented is always 0. With signed integers, the lowest value that can be represented is the negative of the highest value that can be stored (for example, -127 for 8-bit signed binary).

Excess Notation

One format that can be used to represent signed integers is excess notation, which always uses a fixed number of bits, with the leftmost bit representing the sign. For example, the value 0 is represented by a bit string with 1 in the leftmost digit and 0s in all the other digits. As shown in Table 3.4, all nonnegative values have 1 as the high-order bit, and negative values have 0 in this position. In essence, excess notation divides a range of ordinary binary numbers in half and uses the lower half for negative values and the upper half for nonnegative values.

To represent a specific integer value in excess notation, you must know how many storage bits are to be used, whether the value fits within the numeric range of excess notation for that number of bits, and whether the value to be stored is positive or negative. For any number of bits, the largest and smallest values in excess notation are 2^(n-1) - 1 and -2^(n-1), where n is the number of available storage bits.

TABLE 3.4 Excess notation

Bit string    Decimal value
1111            7
1110            6
1101            5
1100            4    (nonnegative numbers:
1011            3     high-order bit is 1)
1010            2
1001            1
1000            0
0111           -1
0110           -2
0101           -3
0100           -4    (negative numbers:
0011           -5     high-order bit is 0)
0010           -6
0001           -7
0000           -8


For example, consider storing a signed integer in 8 bits with excess notation. Because the leftmost bit is a sign bit, the largest positive value that can be stored is 2^7 - 1, or 127₁₀, and the smallest negative value that can be stored is -2^7, or -128₁₀. The range of positive values appears to be smaller than the range of negative values because 0 is considered a positive (nonnegative) number in excess notation. Attempting to represent larger positive or smaller negative values results in errors because the leftmost (sign) bit might be overwritten with an incorrect value.

Now consider how 9 and -9 are represented in 8-bit excess notation. Both values are well within the numeric range limits for 8-bit excess notation. The ordinary binary representation of 9₁₀ in 8 bits is 00001001. Recall that the excess notation representation of 0 is always a leading 1 bit followed by all 0 bits: 10000000 for 8-bit excess notation. Because 9 is nine integer values greater than 0, you can calculate the representation of 9 by adding its ordinary binary representation to the excess notation representation of 0 as follows:

10000000 + 00001001 = 10001001

To represent negative values, you use a similar method based on subtraction. Because -9 is nine integer values less than 0, you can calculate the representation of -9 by subtracting its ordinary binary representation from the excess notation representation of 0 as follows:

10000000 - 00001001 = 01110111
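The encoding amounts to adding a bias of 2^(n-1) to the value, which is why it is also called biased or excess-2^(n-1) notation. A Python sketch (the helper names are my own, not from the text):

```python
def to_excess(value: int, bits: int) -> str:
    """Encode a signed integer in excess-2^(n-1) notation."""
    bias = 2 ** (bits - 1)
    if not -bias <= value <= bias - 1:
        raise OverflowError(f"{value} does not fit in {bits}-bit excess notation")
    return format(value + bias, f"0{bits}b")  # add the bias, then print the bits

def from_excess(bit_string: str) -> int:
    """Decode an excess notation bit string back to a signed integer."""
    bias = 2 ** (len(bit_string) - 1)
    return int(bit_string, 2) - bias

print(to_excess(9, 8))          # 10001001, matching the addition above
print(to_excess(-9, 8))         # 01110111, matching the subtraction above
print(from_excess("10000000"))  # 0
```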

Twos Complement Notation

In the binary numbering system, the complement of 0 is 1, and the complement of 1 is 0. The complement of a bit string is formed by substituting 0 for all values of 1 and 1 for all values of 0. For example, the complement of 1010 is 0101. This transformation is the basis of twos complement notation. In this notation, nonnegative integer values are represented as ordinary binary values. For example, a twos complement representation of 7₁₀ using 4 bits is 0111.

Bit strings for negative integer values are determined by the following transformation:

complement of positive value + 1 = negative representation

Parentheses are a common mathematical notation for showing a value's complement; for example, if A is a numeric value, (A) represents its complement. In 4-bit twos complement representation, -7₁₀ is calculated as follows:

  (0111) + 0001
= 1000 + 0001
= 1001
= -7₁₀

As another example, take a look at the twos complement representation of 35₁₀ and -35₁₀ in 8 bits. The ordinary binary equivalent of 35₁₀ is 00100011, which is also the twos complement representation. To determine the twos complement representation of -35₁₀, use the previous formula with 8-bit numbers:

  (00100011) + 00000001
= 11011100 + 00000001
= 11011101
= -35₁₀
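The complement-and-add-1 transformation can be sketched directly in Python; masking to the field width mimics the fixed number of bit positions in hardware (the function name is my own):

```python
def twos_complement(value: int, bits: int) -> str:
    """Return the twos complement bit string of a signed integer."""
    if value >= 0:
        return format(value, f"0{bits}b")  # nonnegative: ordinary binary
    # Complement the positive magnitude, add 1, and keep only `bits` bits
    mask = (1 << bits) - 1
    return format((~(-value) + 1) & mask, f"0{bits}b")

print(twos_complement(35, 8))    # 00100011
print(twos_complement(-35, 8))   # 11011101, matching the calculation above
print(twos_complement(-7, 4))    # 1001
```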

Twos complement notation is awkward for most people, but it's highly compatible with digital electronic circuitry for the following reasons:

• The leftmost bit represents the sign.
• A fixed number of bit positions are used.
• Only two logic circuits are required to perform addition on single-bit values.
• Subtraction can be performed as addition of a negative value.

The latter two reasons enable CPU manufacturers to build processors with fewer components than are needed for other integer data formats, which saves money and increases computational speed. For these reasons, all modern CPUs represent and manipulate signed integers by using twos complement format.

Range and Overflow

Most modern CPUs use 64 bits to represent a twos complement value and support 32-bit formats for backward compatibility with older software. A 32-bit format is used in the remainder of this book to simplify the discussion and examples. A small positive value, such as 1, occupies 32 bits even though, in theory, only 2 bits are required (one for the value and one for the sign). Although people can deal with numeric values of varying lengths, computer circuitry isn't nearly as flexible. Fixed-width formats enable more efficient processor and data communication circuitry. The additional CPU complexity required to process variable-length data formats results in unacceptably slow performance. Therefore, when small numeric values are stored, the extra bit positions are filled with leading 0s.

The numeric range of a twos complement value is -(2^(n-1)) to (2^(n-1) - 1), where n is the number of bits used to store the value. The exponent is n - 1 because 1 bit is used for the sign. For 32-bit twos complement format, the numeric range is -2,147,483,648₁₀ to 2,147,483,647₁₀. With any fixed-width data storage format, it's possible that the result of a computation will be too large to fit in the format. For example, the Gross Domestic Product of each U.S. state was less than $2 billion in 2005. Therefore, these values can be represented as 32-bit twos complement integers. Adding these numbers to calculate Gross National Product (GNP), however, yields a sum larger than $2 billion. Therefore, a program that computes GNP by using 32-bit twos complement values will generate a value that exceeds the format's numeric range. This condition, referred to as overflow, is treated as an error by the CPU. Executing a subtraction instruction can also result in overflow, for example, -(2^31) - 1. Overflow occurs when the absolute value of a computational result contains too many bits to fit into a fixed-width data format.
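Overflow behavior can be simulated in Python, whose integers are otherwise unbounded, by masking results to 32 bits the way fixed-width hardware does (the function name is my own; many CPUs flag this condition rather than silently wrapping, so this models only the bit-level result):

```python
def add_int32(a: int, b: int) -> int:
    """Add two integers with 32-bit twos complement wraparound."""
    result = (a + b) & 0xFFFFFFFF    # keep only the low 32 bits, as hardware would
    if result >= 0x80000000:         # sign bit set: reinterpret as a negative value
        result -= 0x100000000
    return result

print(add_int32(2_147_483_647, 1))  # -2147483648: overflow wraps to the minimum
print(add_int32(5, 3))              # 8: small values are unaffected
```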

As with most other aspects of CPU design, data format length is one design factor that needs to be balanced with others. Large formats reduce the chance of overflow by increasing the maximum absolute value that can be represented, but many bits are wasted (padded with leading 0s) when smaller values are stored. If bits were free, there would be no tradeoff. However, extra bits increase processor complexity and storage requirements, which increase computer system cost. A CPU designer chooses a data format width by balancing numeric range, the chance of overflow during program execution, and the complexity, cost, and speed of processing and storage devices.

To avoid overflow and increase accuracy, some computers and programming languages define additional numeric data types called double-precision data formats. A double-precision data format combines two adjacent fixed-length data items to hold a single value. Double-precision integers are sometimes called long integers.

Overflow can also be avoided by careful programming. If a programmer anticipates that overflow is possible, the units of measure for program variables can be made larger. For example, calculations on centimeters could be converted to meters or kilometers, as appropriate.

Real Numbers

A real number can contain both whole and fractional components. The fractional portion is represented by digits to the right of the radix point. For example, the following computation uses real number data inputs and generates a real number output:

18.0 ÷ 4.0 = 4.5

This is the equivalent computation in binary notation:

10010.0 ÷ 100.0 = 100.1

Representing a real number in computer circuitry requires some way to separate the value's whole and fractional components (that is, the computer equivalent of a written radix point). A simple way to accomplish this is to define a storage format in which a fixed-length portion of the bit string holds the whole portion and the remainder of the bit string holds the fractional portion. Figure 3.3 shows this format with a sign bit and fixed radix point.

The format in Figure 3.3 is structurally simple because of the radix point's fixed location. The advantage of this simplicity is simpler and faster CPU processing circuitry.

FIGURE 3.3 A 32-bit storage format for real numbers using a fixed radix point Courtesy of Course Technology/Cengage Learning


Unfortunately, processing efficiency is gained by limiting numeric range. Although the sample format uses 32 bits, its numeric range is substantially less than 32-bit twos complement. Only 16 bits are allocated to the whole portion of the value. Therefore, the largest possible whole value is 2^16 - 1, or 65,535. The remaining bits store the fractional portion of the value, which can never be greater than or equal to 1.

You could increase the format's numeric range by allocating more bits to the whole portion (shifting the radix point in Figure 3.3 to the right). If the format's total size is fixed at 32 bits, however, the reallocation would reduce the number of bits used to store the fractional portion of the value. Reallocating bits from the fractional portion to the whole portion reduces the precision of fractional quantities, which reduces computational accuracy.
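The fixed radix point of Figure 3.3 can be sketched in Python. The layout below is a simplified assumption (a 16-bit fraction field, sign ignored); the names SCALE, to_fixed, and from_fixed are illustrative, not part of any standard:

```python
FRAC_BITS = 16
SCALE = 2 ** FRAC_BITS       # the encoded value of 1.0 in this format

def to_fixed(x):
    # Multiply by the scale and truncate; fractional precision finer
    # than 1/65536 is lost, just as bits beyond the fraction field would be.
    return int(x * SCALE)

def from_fixed(f):
    return f / SCALE

print(from_fixed(to_fixed(4.5)))   # 4.5 is exactly representable
print(from_fixed(to_fixed(0.1)))   # only approximately 0.1
```

Shifting FRAC_BITS down trades fractional precision for a larger whole range, which is exactly the tradeoff described above.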

Floating-Point Notation

One way of dealing with the tradeoff between range and precision is to abandon the concept of a fixed radix point. To represent extremely small (precise) values, move the radix point far to the left. For example, the following value has only a single digit to the left of the radix point:

0.0000000013526473

Similarly, very large values can be represented by moving the radix point far to the right, as in this example:

1352647300000000.0

Note that both examples have the same number of digits. By floating the radix point left or right, the first example trades range of the whole portion for increased fractional precision, and the second example trades fractional precision for increased whole range. Values can be very large or very small (precise) but not both at the same time.

People tend to commit errors when manipulating long strings of digits. To minimize errors, they often write large numbers in a more compact format called scientific notation. In scientific notation, the two preceding numbers shown are represented as 13,526,473 × 10^-16 and 13,526,473 × 10^8. Note that the numbering system's base (10) is part of the multiplier. The exponent can be interpreted as the number and direction of positional moves of the radix point, as shown in Figure 3.4. Negative exponents indicate movement to the left, and positive exponents indicate movement to the right.

FIGURE 3.4 Conversion of scientific notation to decimal notation Courtesy of Course Technology/Cengage Learning


Real numbers are represented in computers by using floating-point notation, which is similar to scientific notation except that 2 (rather than 10) is the base. A numeric value is derived from a floating-point bit string according to the following formula:

value = mantissa × 2^exponent

The mantissa holds the bits that are interpreted to derive the real number's digits. By convention, the mantissa is assumed to be preceded by a radix point. The exponent value indicates the radix point's position.

Many CPU-specific implementations of floating-point notation are possible. Differences in these implementations can include the length and coding formats of the mantissa and exponent and the radix point's location in the mantissa. Although twos complement can be used to code the exponent, mantissa, or both, other coding formats might offer better design tradeoffs. Before the 1980s, there was little compatibility in floating-point format between different CPUs, which made transporting floating-point data between different computers difficult or impossible.

The Institute of Electrical and Electronics Engineers (IEEE) addressed this problem in standard 754, which defines the following formats for floating-point data:

binary32: 32-bit format for base 2 values
binary64: 64-bit format for base 2 values
binary128: 128-bit format for base 2 values
decimal64: 64-bit format for base 10 values
decimal128: 128-bit format for base 10 values

The binary32 and binary64 formats were specified in the standard's 1985 version and have been adopted by all computer and microprocessor manufacturers. The other three formats were defined in the 2008 version. Computer and microprocessor manufacturers are currently in the process of incorporating these formats into their products, and some products (such as the IBM POWER6 processor) already include some newer formats. For the remainder of this chapter, all references to floating-point representation refer to the binary32 format, unless otherwise specified.

Figure 3.5 shows the binary32 format. The leading sign bit applies to the mantissa, not the exponent, and is 1 if the mantissa is negative. The 8-bit exponent is coded in excess notation (meaning its first bit is a sign bit). The 23-bit mantissa is coded as an ordinary binary number. It's assumed to be preceded by a binary 1 and the radix point. This format extends the mantissa's precision to 24 bits, although only 23 are actually stored.
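The binary32 field layout can be inspected directly. The sketch below uses Python's struct module to get a value's raw bits, assuming the standard excess-127 exponent coding; the function name binary32_fields is illustrative:

```python
import struct

def binary32_fields(x):
    """Unpack the sign, exponent, and mantissa fields of a binary32 value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit: sign of the mantissa
    exponent = (bits >> 23) & 0xFF    # 8 bits: excess-coded (bias 127)
    mantissa = bits & 0x7FFFFF        # 23 stored bits; the leading 1 is implied
    return sign, exponent, mantissa

# -6.5 is -1.101 (binary) x 2^2, so the stored exponent field is 2 + 127 = 129
print(binary32_fields(-6.5))   # (1, 129, 5242880)
```

The stored mantissa 5242880 is 101 followed by twenty 0 bits, the fraction digits of 1.101 after the implied leading 1.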


Range, Overflow, and Underflow

The number of bits in a floating-point string and the formats of the mantissa and exponent impose limits on the range of values that can be represented. The number of digits in the mantissa determines the number of significant (nonzero) digits in the largest and smallest values that can be represented. The number of digits in the exponent determines the number of possible bit positions to the right or left of the radix point.

Using the number of bits assigned to mantissa and exponent, the largest absolute value of a floating-point value appears to be the following:

1.11111111111111111111111 × 2^11111111

Exponents containing all 0s and all 1s, however, represent special data values in the IEEE standards. Therefore, the usable exponent range is reduced, and the decimal range for the entire floating-point value is approximately 10^-45 to 10^38.

Floating-point numbers with large absolute values have large positive exponents. When overflow occurs, it always occurs in the exponent. Floating-point representation is also subject to a related error condition called underflow. Very small numbers are represented by negative exponents. Underflow occurs when the absolute value of a negative exponent is too large to fit in the bits allocated to store it.

Precision and Truncation

Recall that scientific notation, including floating-point notation, trades numeric range for accuracy. Accuracy is reduced as the number of digits available to store the mantissa is reduced. The 23-bit mantissa used in the binary32 format represents approximately seven decimal digits of precision. However, many useful numbers contain more than seven nonzero decimal digits, such as the decimal equivalent of the fraction 1/3:

1 ÷ 3 = 0.33333333…

The number of digits to the right of the decimal point is infinite. Only a limited number of mantissa digits are available, however.

Numbers such as 1/3 are stored in floating-point format by truncation. The numeric value is stored in the mantissa, starting with its most significant bit, until all available bits are used. The remaining bits are discarded. An error or approximation occurs any time a floating-point value is truncated. However, the truncated digits are insignificant compared with the significant, or large, value that's stored. Problems can result when truncated values are used as input to computations. The error introduced by truncation can be magnified in subsequent computations, generating inaccurate results. The error resulting from a long series of computations starting with truncated inputs can be large.

FIGURE 3.5 IEEE binary32 floating-point format Courtesy of Course Technology/Cengage Learning

An added difficulty is that more values have nonterminating representations in the binary system than in the decimal system. For example, the fraction 1/10 is nonterminating in binary notation. The representation of this value in floating-point notation is a truncated value, but these problems can usually be avoided with careful programming. In general, programmers reserve binary floating-point calculations for quantities that can vary continuously over wide ranges, such as measurements made by scientific instruments.
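The nonterminating binary representation of 1/10 is easy to observe. Python floats are binary64 rather than binary32, but the truncation effect is the same in kind:

```python
# Each 0.1 is a truncated binary value, so ten of them don't sum to 1.0
total = sum(0.1 for _ in range(10))
print(total == 1.0)   # False
print(total)          # slightly less than 1.0
```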

When possible, programmers use data types other than binary32 to avoid or minimize the impact of truncation. Most current microprocessors can store and manipulate binary64 values, and support for binary128 is gradually being added. In addition, most programming languages can emulate binary128, decimal64, and decimal128 values, although processing these values is considerably slower than when the microprocessor supports them as hardware data types. Programmers seeking to minimize representation and computation errors should choose the largest floating-point format supported by hardware or software.

Monetary values are particularly sensitive to truncation errors. Most monetary systems have at least one fractional monetary unit, such as pennies (fractions of a U.S. dollar). Novice programmers sometimes assume that monetary amounts should be stored and manipulated as binary floating-point numbers. Inevitably, truncation errors caused by nonterminating representations of tenths and other fractions occur. Cumulative errors mount when truncated numbers, or approximations, are input in subsequent calculations.

One way to address the problem is to use integer arithmetic for accounting and financial applications. To do so, a programmer stores and manipulates monetary amounts in the smallest possible monetary unit, for example, U.S. pennies or Mexican pesos. Small denominations are converted to larger ones only when needed for output, such as printing a dollar amount on a check or account statement.
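A minimal sketch of this penny-based integer arithmetic (the variable names are illustrative):

```python
price_cents = 1999                       # $19.99 held as 1,999 pennies
quantity = 3
total_cents = price_cents * quantity     # exact integer arithmetic

dollars, cents = divmod(total_cents, 100)   # convert only for output
print(f"${dollars}.{cents:02d}")            # $59.97
```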

Although representing and manipulating monetary values as integers provides computational accuracy, this method has limitations. For example, complex formulas for computing interest on loans or investment balances include exponents and division. Intermediate calculation results for programs using these formulas can produce fractional quantities unless monetary amounts are scaled to very small units (for example, millionths of a penny).

The decimal64 and decimal128 bit formats defined in the 2008 version of IEEE standard 754 are intended to address the shortcomings of both binary floating-point and integer representation of monetary units. These formats provide accurate representation of decimal values, and the standard specifies rounding methods that can be used instead of truncation to improve computational accuracy. Both formats use the same basic approach as the binary formats, a mantissa and an exponent, but they encode three decimal digits in each 10-bit group.
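Python's decimal module implements decimal floating point in software (following the same general decimal arithmetic model the standard drew on), so it can illustrate the difference between binary and decimal representation:

```python
from decimal import Decimal

print(0.1 + 0.2 == 0.3)                                   # False with binary floating point
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True with decimal arithmetic
```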

Processing Complexity

The difficulty of learning to use scientific and floating-point notation is understandable. These formats are far more complex than integer data formats, and the complexity affects both people and computers. Although floating-point formats are optimized for processing efficiency, they still require complex processing circuitry. The simpler twos complement format used for integers requires much less complex circuitry.

The difference in processing circuitry complexity translates to a difference in speed of performing calculations. The magnitude of the difference depends on several factors, including the computation and the exact details of the processing circuitry. As a general rule, simple computational operations, such as addition and subtraction, take at least twice as long with floating-point numbers as with integers. The difference is even greater for operations such as division and exponentiation. For this reason and for reasons of accuracy, careful programmers never use a real number when an integer can be used, particularly for frequently updated data items.

Character Data

In their written form, English and many other languages use alphabetic letters, numerals, punctuation marks, and a variety of other special-purpose symbols, such as $ and &. Each symbol is a character. A sequence of characters that forms a meaningful word, phrase, or other useful group is a string. In most programming languages, single characters are surrounded by single quotation marks ('c'), and strings are surrounded by double quotation marks ("computer").

Character data can't be represented or processed directly in a computer because computers are designed to process only digital data (bits). It can be represented indirectly by defining a table that assigns a numeric value to each character. For example, the integer values 0 through 9 can be used to represent the characters (numerals) '0' through '9', the uppercase letters 'A' through 'Z' can be represented as the integer values 10 through 35, and so forth.
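The character-to-integer mapping is visible in any language's character functions. Python's ord and chr expose the numeric codes (modern Python uses Unicode, whose first 128 codes coincide with ASCII):

```python
print(ord('0'), ord('9'))   # 48 57: the numerals occupy contiguous codes
print(ord('A'), ord('a'))   # 65 97: upper- and lowercase have distinct codes
print(chr(65))              # A: chr inverts the mapping
```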

A table-based substitution of one set of symbols or values for another is one example of a coding method. All coding methods share several important characteristics, including the following:

All users must use the same coding and decoding methods.
The coded values must be capable of being stored or transmitted.
A coding method represents a tradeoff among compactness, range, ease of manipulation, accuracy, and standardization.

The following sections describe some common coding methods for character data.

EBCDIC

Extended Binary Coded Decimal Interchange Code (EBCDIC) is a character-coding method developed by IBM in the 1960s and used in all IBM mainframes well into the 2000s. Recent IBM mainframes and mainframe OSs support more recent character-coding methods, but support for EBCDIC is still maintained for backward compatibility. EBCDIC characters are encoded as strings of 8 bits.

ASCII

The American Standard Code for Information Interchange (ASCII), adopted in the United States in the 1970s, is a widely used coding method in data communication. The international equivalent of this coding method is International Alphabet 5 (IA5), an International Organization for Standardization (ISO) standard. Almost all computers and OSs support ASCII, although a gradual migration is in progress to its newer relative, Unicode.

ASCII is a 7-bit format because most computers and peripheral devices transmit data in bytes and because parity checking was used widely in the 1960s to 1980s for detecting transmission errors. Chapter 8 discusses parity checking and other error detection and correction methods. For now, the important characteristic of parity checking is that it requires 1 extra bit per character. Therefore, 1 of every 8 bits isn't part of the data value, leaving only 7 bits for data representation.

The standard ASCII version used for data transfer is sometimes called ASCII-7 to emphasize its 7-bit format. This coding table has 128, or 2^7, defined characters. Computers that use 8-bit bytes are capable of representing 256, or 2^8, different characters. In most computers, the ASCII-7 characters are included in an 8-bit character coding table as the first, or lower, 128 table entries. The additional, or upper, 128 entries are defined by the computer manufacturer and typically used for graphical characters, such as line-drawing characters and multinational characters, for example, á, ñ, and Ö. This encoding method is sometimes called ASCII-8. The term is a misnomer, as it implies that the entire table (all 256 entries) is standardized. In fact, only the first 128 entries are defined by the ASCII standard. Table 3.5 shows portions of the ASCII and EBCDIC coding tables.

TABLE 3.5 Partial listing of ASCII and EBCDIC codes

Symbol ASCII EBCDIC

0 0110000 11110000

1 0110001 11110001

2 0110010 11110010

3 0110011 11110011

4 0110100 11110100

5 0110101 11110101

6 0110110 11110110

7 0110111 11110111

8 0111000 11111000

9 0111001 11111001

A 1000001 11000001

B 1000010 11000010

C 1000011 11000011

a 1100001 10000001

b 1100010 10000010

c 1100011 10000011


Device Control

When text is printed or displayed on an output device, often it's formatted in a particular way. For example, text output to a printer is normally formatted in lines and paragraphs, and a customer record can be displayed onscreen so that it looks like a printed form. Certain text can be highlighted when printed or displayed by using methods such as underlining, bold font, or reversed background and foreground colors.

ASCII defines several device control codes (see Table 3.6) used for text formatting by sending them immediately before or after the characters they modify. Among the simpler codes are carriage return, which moves the print head or insertion point to the beginning of a line; line feed, which moves the print head or insertion point down one line; and bell, which generates a short sound, such as a beep or bell ring. In ASCII, each of these functions is assigned a numeric code and a short character name, such as CR for carriage return, LF for line feed, and BEL for bell. In addition, some ASCII device control codes are used to control data transfer. For example, ACK is sent to acknowledge correct receipt of data, and NAK is sent to indicate that an error has been detected.
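Several of these control codes survive as escape sequences in most programming languages; in Python, for example:

```python
# CR, LF, and BEL map to the ASCII codes listed in Table 3.6
print(ord('\r'), ord('\n'), ord('\a'))   # 13 10 7
```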

N O T E The 33 device control codes in the ASCII table occupy the first 32 entries (numbered 0 through 31) and the last entry (number 127).

TABLE 3.6 ASCII control codes

Decimal code Control character Description

000 NUL Null

001 SOH Start of heading

002 STX Start of text

003 ETX End of text

004 EOT End of transmission

005 ENQ Enquiry

006 ACK Acknowledge

007 BEL Bell

008 BS Backspace

009 HT Horizontal tabulation

010 LF Line feed

011 VT Vertical tabulation

012 FF Form feed

013 CR Carriage return

014 SO Shift out

015 SI Shift in

016 DLE Data link escape

017 DC1 Device control 1

018 DC2 Device control 2

019 DC3 Device control 3

020 DC4 Device control 4

021 NAK Negative acknowledge

022 SYN Synchronous idle

023 ETB End of transmission block

024 CAN Cancel

025 EM End of medium

026 SUB Substitute

027 ESC Escape

028 FS File separator

029 GS Group separator

030 RS Record separator

031 US Unit separator

127 DEL Delete

Software and Hardware Support

Because characters are usually represented in the CPU as unsigned integers, there's little or no need for special character-processing instructions. Instructions that move and copy unsigned integers behave the same whether the content being manipulated is an actual numeric value or an ASCII-encoded character. Similarly, an equality or inequality comparison instruction that works for unsigned integers also works for values representing characters.

The results of nonequality comparisons are less straightforward. The assignment of numeric codes to characters follows a specific order called a collating sequence. A greater-than comparison with two character inputs (for example, 'a' less than 'z') returns a result based on the numeric comparison of the corresponding ASCII codes; that is, whether the numeric code for 'a' is less than the numeric code for 'z'. If the character set has an order and the coding method follows the order, less-than and greater-than comparisons usually produce expected results.

However, using numeric values to represent characters can produce some unexpected or unplanned results. For example, the collating sequence of letters and numerals in ASCII follows the standard alphabetic order for letters and numeric order for numerals, but uppercase and lowercase letters are represented by different codes. As a result, an equality comparison between uppercase and lowercase versions of the same letter returns false because the numeric codes aren't identical. For example, 'a' doesn't equal 'A', as shown previously in Table 3.5. Punctuation symbols also have a specific order in the collating sequence, although there's no widely accepted ordering for them.
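These collating-sequence effects are easy to demonstrate in Python, which compares strings by numeric code point (matching ASCII for these characters):

```python
print('a' < 'z')    # True:  97 < 122
print('a' == 'A')   # False: 97 != 65
print('Z' < 'a')    # True:  90 < 97, so all uppercase letters sort first
```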

ASCII Limitations

ASCII's designers couldn't foresee the code's long lifetime (almost 50 years) or the revolutions in I/O device technologies that would take place. They never envisioned modern I/O device characteristics, such as color, bitmapped graphics, and selectable fonts. Unfortunately, ASCII doesn't have the range to define enough control codes to account for all the formatting and display capabilities in modern I/O devices.

ASCII is also an English-based coding method. This isn't surprising, given that when it was defined, the United States accounted for most computer use and almost all computer production. ASCII has a heavy bias toward Western languages in general and American English in particular, which became a major limitation as computer use and production proliferated worldwide.

Recall that 7-bit ASCII has only 128 table entries, 33 of which are used for device control. Only 95 printable characters can be represented, which are enough for a usable subset of the characters commonly used in American English text. This subset, however, doesn't include any modified Latin characters, such as ç and á, or characters from other alphabets.

The ISO partially addressed this problem by defining many different 256-entry tables based on the ASCII model. One, called Latin-1, contains the ASCII-7 characters in the lower 128 table entries and most of the characters used by Western European languages in the upper 128 table entries. The upper 128 entries are sometimes called multinational characters. The number of available character codes in a 256-entry table, however, is still much too small to represent the full range of printable characters in world languages.

Further complicating matters is that some printed languages aren't based on characters in the Western sense. Chinese, Japanese, and Korean written text consists of ideographs, which are pictorial representations of words or concepts. Ideographs are composed of graphical elements, sometimes called strokes, that number in the thousands. Other written languages, such as Arabic, present similar, although less severe, coding problems.
