Binary Coding Schemes
Alphabetic data, numeric data, alphanumeric data, symbols, sound data and video data are all represented in the computer as combinations of bits. The bits are grouped in a fixed size, such as 8 bits, 6 bits or 4 bits, and a code is formed by combining a fixed number of bits. Binary coding schemes represent data such as letters, the digits 0-9, and symbols in a standard code, in which each combination of bits represents a unique symbol. A standard code lets every programmer use the same combination of bits for the same symbol.
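As a small illustration of fixed-size groups of bits, the Python sketch below counts how many distinct codes a group of n bits can form and enumerates every code for a 4-bit group. The names `code_count` and `four_bit_codes` are made up for this example.

```python
from itertools import product


def code_count(n_bits):
    """Return how many distinct codes a group of n_bits bits can form."""
    return 2 ** n_bits


# Group sizes mentioned in the text: 4, 6 and 8 bits.
for size in (4, 6, 8):
    print(size, "bits ->", code_count(size), "codes")

# Enumerate every 4-bit code as a string of 0s and 1s.
four_bit_codes = ["".join(bits) for bits in product("01", repeat=4)]
print(four_bit_codes)
```

Each extra bit doubles the number of available codes, which is why an 8-bit scheme can hold far more symbols than a 6-bit one.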
The most commonly used binary coding schemes are:
- Extended Binary Coded Decimal Interchange Code (EBCDIC)
- American Standard Code for Information Interchange (ASCII)
- Unicode
1. EBCDIC
- The Extended Binary Coded Decimal Interchange Code (EBCDIC) uses 8 bits (4 bits for zone, 4 bits for digit) to represent a symbol in the data.
- EBCDIC allows 2^8 = 256 combinations of bits.
- EBCDIC can therefore represent up to 256 unique symbols. These include the decimal digits (0-9), lowercase letters (a-z), uppercase letters (A-Z), special characters, and control characters. The set contains both printable and non-printable characters; the non-printable ones control things such as cursor movement and printer vertical spacing.
- EBCDIC codes are mainly used in mainframe computers.
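The zone and digit split can be inspected with a short Python sketch. It assumes the standard-library codec `cp037` (one EBCDIC code page among several variants) as a representative of EBCDIC; the helper name `zone_and_digit` is hypothetical.

```python
def zone_and_digit(byte_value):
    """Split an 8-bit code into its upper 4 bits (zone) and lower 4 bits (digit)."""
    return byte_value >> 4, byte_value & 0x0F


for ch in ("A", "a", "0", "9"):
    # Assumption: cp037 stands in for EBCDIC; other EBCDIC code pages differ in places.
    code = ch.encode("cp037")[0]
    zone, digit = zone_and_digit(code)
    print(f"{ch!r}: code {code:3d} = {code:08b}  zone {zone:04b}  digit {digit:04b}")
```

In this code page the digits 0-9 share the zone 1111 and differ only in the digit bits, which is the pattern the zone/digit layout is meant to capture.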
2. ASCII
- The American Standard Code for Information Interchange (ASCII) is widely used in computers of all types.
- ASCII codes are of two types: ASCII-7 and ASCII-8.
- ASCII-7 is the standard 7-bit ASCII code. In ASCII-7, the first 3 bits are the zone bits and the next 4 bits are for the digits. ASCII-7 allows 2^7 = 128 combinations, so it represents 128 unique symbols. IBM later extended ASCII-7 to an 8-bit code, referred to here as ASCII-8.
- ASCII-8 is an extended version of ASCII-7. It is an 8-bit code with 4 bits for the zone and 4 bits for the digit. ASCII-8 allows 2^8 = 256 combinations, so it represents 256 unique symbols.
- Some notable ranges among the 256 ASCII-8 codes are:
- Codes 48 to 57 stand for the digits 0-9.
- Codes 65 to 90 stand for uppercase letters A-Z.
- Codes 97 to 122 stand for lowercase letters a-z.
- Codes 128 to 255 are the extended ASCII codes.
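The code ranges listed above can be checked with Python's built-in `ord` and `chr`, which convert between a character and its numeric code. The function `classify` is a hypothetical helper written for this sketch, not part of any standard.

```python
def classify(code):
    """Name the range that a code from 0 to 255 falls into."""
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase letter"
    if 97 <= code <= 122:
        return "lowercase letter"
    if 128 <= code <= 255:
        return "extended"
    return "other"  # control characters, space, punctuation


for ch in ("0", "9", "A", "Z", "a", "z"):
    code = ord(ch)
    # Show the 7-bit pattern as 3 zone bits followed by 4 digit bits.
    bits = format(code, "07b")
    print(ch, code, bits[:3], bits[3:], classify(code))

print(chr(65), chr(97))  # convert codes back to characters
```

Because each letter range is contiguous, an uppercase letter and its lowercase form always differ by 32, that is, by a single bit.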
3. Unicode
- Unicode is a universal character encoding standard for representing text, including letters, numbers and symbols, in multilingual environments. The Unicode Consortium, based in California, developed the Unicode standard.
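- A Unicode symbol can be stored in 32 bits (the UTF-32 encoding).
- 32 bits allow 2^32 = 4,294,967,296 (~4.3 billion) combinations. The Unicode standard itself restricts codes to the range 0 to 10FFFF hexadecimal, which gives 1,114,112 possible codes.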
- Unicode aims to give a unique code to every character or symbol in every written language, such as Chinese and Japanese. In addition to letters, mathematical and scientific symbols are also represented in Unicode.
- An advantage of Unicode is that it is compatible with ASCII. The first 128 codes in Unicode are identical to the ASCII-7 codes, and the first 256 codes are identical to ISO 8859-1 (Latin-1), a common 8-bit extension of ASCII.
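- Unicode is implemented by different character encodings, such as UTF-8, UTF-16 and UTF-32. UTF stands for Unicode Transformation Format. UTF-8 is the most commonly used encoding scheme. It is a variable-length encoding that uses 8 to 32 bits (1 to 4 bytes) per character; characters in the ASCII-7 range take a single byte.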
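The variable length of UTF-8 can be observed with Python's built-in `str.encode`. The sample characters and the helper name `utf8_length` are chosen only for illustration; the characters are written as escape sequences so the sketch does not depend on the file's own encoding.

```python
def utf8_length(ch):
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))


samples = [
    "A",            # Latin capital letter A
    "\u00e9",       # Latin small letter e with acute
    "\u20ac",       # euro sign
    "\u4e2d",       # a Chinese character
    "\U0001F600",   # an emoji outside the first 65,536 codes
]

for ch in samples:
    n_bytes = utf8_length(ch)
    print(f"U+{ord(ch):04X}: {n_bytes} byte(s) = {n_bytes * 8} bits, bytes {ch.encode('utf-8').hex()}")
```

Text that contains only ASCII-7 characters produces exactly the same bytes in UTF-8 as in ASCII, which is a large part of why UTF-8 is so widely adopted.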