Ch. 7: Bits and Bytes

Coding with symbols.
1. Digital information is just information represented using digits.
  1. Usual meaning of “digits” is the symbols 0-9.
  2. In fact, any symbols could be used: 601 925 3815, or ^)! (@% #*!%
  3. Use of digits can actually be misleading: a phone number is not a quantity.
2. Combinations.
  1. Digits (or letters, or shapes, or whatever) are basic symbols.
  2. To name a large number of things, you need to combine symbols into groups, called strings or words.
  3. If you have p symbols, there are pⁿ strings of length n.
    1. Six faces on a die, two dice, give 6² = 36 pairs.
    2. Ten digits, 10³ = 1000 3-digit numbers (000 through 999).
    3. Twenty-six capital letters, 26⁵ = 11,881,376 five-letter strings.
3. Ordering.
  1. Symbols made of digits are easily ordered for searching.
  2. We can do this with any symbols if we agree on the collating sequence.
  3. Alphabetical order is a collating sequence.
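The counting rule and the idea of a collating sequence can both be tried out in a few lines of Python. This is only a sketch: the names `count_strings`, `collating`, and `sort_key` are made up for illustration, and the four-symbol ordering is an arbitrary assumption, not a standard.

```python
from itertools import product

def count_strings(symbols, n):
    """Count the strings of length n by actually listing each one."""
    return sum(1 for _ in product(symbols, repeat=n))

pairs = count_strings("123456", 2)            # two dice: 6**2
three_digit = count_strings("0123456789", 3)  # 000 through 999: 10**3
five_letter = 26 ** 5                         # use the formula instead of listing

print(pairs, three_digit, five_letter)

# Any symbols can be ordered once we agree on a collating sequence.
# Here we simply declare (as an assumption) that ^ < ) < ! < (
collating = "^)!("

def sort_key(word):
    """Replace each symbol by its position in the collating sequence."""
    return [collating.index(symbol) for symbol in word]

words = ["!^", "^(", ")!", "^)"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

Listing the strings with `product` confirms the pⁿ formula for small cases. For five letters the formula is the practical choice, since listing nearly twelve million strings is slow.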
Binary
1. To record or transmit digital information, symbols must be recorded using some physical phenomenon.
  1. Shapes using ink on a paper page.
  2. Shapes pressed into clay (which is then hardened).
  3. Puffs of smoke.
  4. Flashes of light.
2. Computer engineers find it easiest to use physical phenomena that can be classified into two states, Present and Absent (PandA).
  1. On or off.
  2. Charged or discharged.
  3. Magnetized or not.
  4. Pit or land (non-pit).
3. Divide disks into areas.
4. Hexadecimal
  1. A more compact representation for strings of bits.
  2. Use 16 symbols, 0–9, A–F: hex “digits”.
  3. Each hex digit is four bits:
    
    0 0000 1 0001 2 0010 3 0011
    4 0100 5 0101 6 0110 7 0111
    8 1000 9 1001 A 1010 B 1011
    C 1100 D 1101 E 1110 F 1111
  4. Group starting at right; pad with zeros on the left.
  5. 101000001010110011 is 282B3
  6. 37D1 is 0011011111010001.
  7. Have you seen this before?
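The grouping rule for hexadecimal (group from the right, pad with zeros on the left) can be sketched in Python as follows. The function names `bits_to_hex` and `hex_to_bits` are hypothetical helpers written for this example. Python's built-in `int(..., 2)` and `format(..., "x")` would also do the job, but spelling out the steps makes the table lookup visible.

```python
HEX_DIGITS = "0123456789ABCDEF"

def bits_to_hex(bits):
    """Convert a string of 0s and 1s to hex, four bits per hex digit."""
    # Pad with zeros on the left so the length is a multiple of four.
    padded_length = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(padded_length)
    # Cut into groups of four and look each group up in the table.
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

def hex_to_bits(hex_string):
    """Convert hex digits back to bits, four bits per hex digit."""
    return "".join(format(HEX_DIGITS.index(h), "04b") for h in hex_string.upper())

print(bits_to_hex("101000001010110011"))  # 282B3
print(hex_to_bits("37D1"))                # 0011011111010001
```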
The humble byte.
1. A byte is simply a string of 8 bits.
2. Spelled “byte” instead of “bite” so that a single dropped letter can't turn it into “bit”.
Representing text.
1. Early and still widely used: ASCII (American Standard Code for Information Interchange).
  1. Originally, 7 bits, 0000000–1111111 (00–7F). That's 128 characters.
  2. PC memory is divided in bytes, making it convenient to use 8-bit characters, so they doubled the number of assignments to 256.
  3. Second bank, 10000000–11111111 is not well-standardized.
  4. Tom is 010101000110111101101101, or 546F6D
2. Unicode.
  1. ASCII covers US English.
  2. The second bank adds characters for other European languages.
  3. For the rest of the world, especially Middle Eastern and East Asian languages, we're going to need more than 8 bits.
  4. Unicode defines a vast number of codes for pretty much everyone.
    1. 16 bits (65,536 codes) cover the most common characters.
    2. 32 bits can hold any Unicode character, though far fewer than the 4,294,967,296 possible values are actually assigned.
3. UTF-8
  1. The most commonly used format for web pages.
  2. Codes vary in length.
    1. Characters from 7-bit ASCII are represented by their ASCII code, using one byte.
    2. Other characters use two or more bytes.
  3. Designed to allow coding of world alphabets without breaking old software.
  4. Where have you seen UTF-8 before in this class?
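The “Tom” example and the variable length of UTF-8 codes can be checked with Python's built-in `str.encode`. This is a small sketch. The helper `to_bits` is a made-up name, and the two non-ASCII sample characters (é and the euro sign) are chosen here only for illustration.

```python
def to_bits(data):
    """Show each byte as a string of eight bits."""
    return "".join(format(byte, "08b") for byte in data)

tom = "Tom".encode("ascii")
tom_bits = to_bits(tom)
tom_hex = tom.hex().upper()
print(tom_bits)  # 010101000110111101101101
print(tom_hex)   # 546F6D

# UTF-8 uses one byte for 7-bit ASCII characters and more for others.
samples = ["T", "\u00e9", "\u20ac"]  # T, e with acute accent, euro sign
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in lengths.items():
    print(repr(ch), "takes", n, "byte(s) in UTF-8")
```

Note that “T” comes out as the same single byte in ASCII and in UTF-8, which is why old ASCII-only software keeps working on plain English text.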
Encoding with Redundancy. Detects or avoids errors.
1. NATO radio code. Listener can identify the letters over noise.
2. Bar Codes.
  1. Bits are recorded by the presence or absence of a bar.
  2. Decimal digits are coded in binary.
  3. The UPC has a manufacturer and a product code.
  4. Each digit is assigned two codes, one for the manufacturer side, another for the product code side.
  5. Codes are chosen so that the reverse of any code is not a legal code.
    The scanner can tell when the code was read upside down.
  6. To create enough codes, each decimal digit requires seven bits. (What's the least number of bits needed to code all ten digits?)
3. Parity bit.
  1. Append an extra bit to your data so that the total count of 1's is always an even number. (Or an odd number, just so long as everyone agrees ahead of time.)
  2. The reader or receiver can detect when a single bit was read incorrectly. (How is that?)
  3. Does UPC use parity?
4. Error-Correcting codes.
  1. With these, the reader can actually correct damaged data, if the damage is not too severe.
  2. Used with computer memories and storage devices, and in space communications.
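Here is a minimal sketch of the even-parity scheme described above, assuming the data is a string of 0s and 1s and the parity bit is appended at the end. The function names are hypothetical, and the 7-bit data used is the ASCII code for “T”.

```python
def add_even_parity(bits):
    """Append one bit so that the total count of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)

def check_even_parity(bits_with_parity):
    """Return True if the count of 1s is even, i.e. no error was detected."""
    return bits_with_parity.count("1") % 2 == 0

def flip(bits, i):
    """Simulate a read error by flipping the bit at position i."""
    flipped = "1" if bits[i] == "0" else "0"
    return bits[:i] + flipped + bits[i + 1:]

sent = add_even_parity("1010100")  # 7-bit ASCII for T has three 1s
one_error = flip(sent, 2)
two_errors = flip(one_error, 3)

print(sent)                           # 10101001
print(check_even_parity(sent))        # True: arrives intact
print(check_even_parity(one_error))   # False: a single bad bit is detected
print(check_even_parity(two_errors))  # True: two bad bits slip through
```

A single flipped bit always changes the count of 1s from even to odd, so it is caught. Two flipped bits restore an even count, which is why parity only detects an odd number of errors and cannot say which bit was wrong.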
