Data Representation

What to expect

This chapter discusses binary, hexadecimal, character sets, images and sound. Some extracts are below.

Hand signals

We need to head over to France for an early binary code. Louis Braille injured an eye in his father’s leather workshop at the age of three, and the resulting infection caused him to go blind in both eyes by five. At age ten he obtained a scholarship to the Paris Institute for Blind Children, which at the time used a system of raised letters invented by Valentin Haüy. The letter shapes we know and love are not distinct enough to be easily discerned by touch, however, and Braille found Haüy’s system hard to learn.

While there, Braille was shown a system of raised dots used by the military to communicate at night. He took this system and improved upon it, using just six dots to represent all the letters of the alphabet plus numbers and some punctuation symbols. Each dot is raised or flat, and a blank space (effectively six flat dots) separates words and sentences. In this way, the grid of six dots could represent 2x2x2x2x2x2 = 2⁶ or 64 different characters.
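The counting argument above can be checked with a short Python sketch. It enumerates every raised/flat combination of six dots; the helper name `to_braille_char` is illustrative only, and the mapping to Unicode braille patterns (starting at U+2800, with dot n stored in bit n-1) is an addition that is not part of the original text.

```python
from itertools import product

# Each of the six dot positions is either flat (0) or raised (1).
cells = list(product((0, 1), repeat=6))
print(len(cells))  # 2 * 2 * 2 * 2 * 2 * 2 combinations


def to_braille_char(dots):
    """Map six dot states (dot 1 first) to a Unicode braille pattern.

    Assumption: Unicode braille patterns begin at U+2800 and dot n
    sets bit n-1 of the offset.
    """
    offset = sum(bit << i for i, bit in enumerate(dots))
    return chr(0x2800 + offset)


# Dot 1 raised and the rest flat is the braille letter "a".
print(to_braille_char((1, 0, 0, 0, 0, 0)))
```

Students could extend this by printing all 64 cells in a grid to see that the blank cell is simply the pattern with no dots raised.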

Magic Numbers

Knowing the format of a file before processing it with application code is so important that there are two ways a file can identify itself to the operating system. Windows uses file extensions such as .gif or .jpg and associates each extension with an application. It’s easy to accidentally change this extension, however, or even delete it. Meanwhile, Unix-derived operating systems such as Linux and macOS rely less on file extensions, so metadata in the file header also identifies the content. For example, all JPEG files begin with the two bytes FFD8 in hex, GIFs begin with the ASCII string GIF89a, and PDFs begin with %PDF. Because this trick originally used only numeric codes, this identifying string is still sometimes called the file’s “magic number”.
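As a minimal sketch of how a program might check a magic number, the Python below compares the first bytes of some content against the three signatures mentioned above. The names `SIGNATURES`, `identify` and `identify_file` are hypothetical, and a real tool would recognise many more formats than these three.

```python
# Signatures taken from the examples in the text; this table is
# deliberately tiny and is not a complete list of file formats.
SIGNATURES = {
    b"\xff\xd8": "JPEG",
    b"GIF89a": "GIF",
    b"%PDF": "PDF",
}


def identify(header: bytes) -> str:
    """Guess a file type from the first few bytes of its content."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the start of a file and identify it by its signature."""
    with open(path, "rb") as f:
        return identify(f.read(8))


print(identify(b"%PDF-1.7"))
print(identify(b"plain text"))
```

Because the check ignores the file name entirely, renaming photo.jpg to photo.txt would not change the result.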

Misconception

The number of possible colours in a bitmap image is equal to the number of bits per pixel, so 4 bits gives 4 colours.

Reality

The number of possible colours is calculated as 2^bit depth, so 4 bits gives 2^4 = 16 colours, and 8 bits gives 2^8 = 256 colours. This misconception can be tackled by doing some student activities with low bit depths from 1 to 4, so they can see how the number of bit patterns, therefore the number of colours, doubles each time you add a bit.
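A short Python sketch of the doubling pattern, suitable for the low bit depths suggested above. The function name `colours` is illustrative only.

```python
def colours(bit_depth: int) -> int:
    """Number of distinct colours available at a given bit depth."""
    return 2 ** bit_depth


# List every bit pattern for depths 1 to 4 so the doubling is visible.
for bits in range(1, 5):
    patterns = [format(n, f"0{bits}b") for n in range(colours(bits))]
    print(bits, "bit(s):", colours(bits), "colours", patterns)
```

Each extra bit produces two copies of the previous list of patterns, one prefixed with 0 and one with 1, which is why the count doubles.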

TL;DR.

At the heart of this topic is the idea that as long as we can turn information into binary data, we can use a computer to process it. Digital computers process binary numbers because they use two-state electrical signals. The challenge is therefore to find a transformation from the real-world information to binary. This transformation is called encoding and it makes use of a code.

ASCII and Unicode are used to encode text; JPEG, GIF and PNG do the same for bitmap images; and WAV, MP3 and AAC encode digital sound. But it’s important to realise that there is a virtually limitless number of ways of encoding information, and these are just the techniques that are widely used due to effectiveness or official recognition, or both.
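To make the idea of encoding concrete, here is a small Python sketch that turns text into the bit patterns a computer would store. The helper name `to_bits` is hypothetical, and UTF-8 is used as one common way of storing Unicode text; other Unicode encodings would give different bytes.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes, then show each byte as eight binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode(encoding))


print(to_bits("Hi", "ascii"))  # 01001000 01101001
print(to_bits("é"))            # 11000011 10101001 (two bytes in UTF-8)
```

The same characters give different bit patterns under different codes, which is why a program needs to know which encoding was used before it can decode the data.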

Analogue to digital conversion is the process of mapping the original data to the digital representation, and it’s vital to understand binary to really grasp the importance of bit depth and resolution and their effect on file size. Metadata is “data about data” and describes the contents of the file or something about the original information.
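The effect of bit depth and resolution on file size can be sketched with some simple arithmetic in Python. These function names are illustrative, and the figures are for raw, uncompressed data only: they ignore headers, metadata and any compression that formats such as JPEG or MP3 apply.

```python
def image_size_bytes(width: int, height: int, bit_depth: int) -> int:
    """Uncompressed bitmap size: one bit_depth-sized value per pixel."""
    total_bits = width * height * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


def sound_size_bytes(sample_rate: int, bit_depth: int,
                     channels: int, seconds: int) -> int:
    """Uncompressed sound size: one sample per channel per tick."""
    total_bits = sample_rate * bit_depth * channels * seconds
    return (total_bits + 7) // 8


print(image_size_bytes(800, 600, 24))      # 1440000
print(sound_size_bytes(44100, 16, 2, 60))  # 10584000
```

Doubling the bit depth, or doubling either the pixel count or the sample rate, doubles the size of the raw data.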

References

The PCK section in this chapter includes these references:

Use Python with an image library such as PIL: create and demonstrate some sample code, and then allow the students to edit it to make their own image filters. Some sample code is here; the program is easily edited to change the colour cast of a JPEG image: https://repl.it/@mraharrison/redfilter

Martin O’Hanlon has created a whole course on using Python with image filters here: https://www.futurelearn.com/info/courses/representing-data-with-images-and-sound/

Image filters (above) are a great means of linking programming to data representation. You could also link programming to sound data, using a Python library such as wave: https://docs.python.org/3/library/mm.html
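As one possible starting point for a sound activity, the sketch below uses Python’s standard wave module to build a short sine tone in memory. The function name `make_tone` and the chosen values (440 Hz, 8000 samples per second, 16-bit mono) are assumptions made for illustration, not taken from the chapter.

```python
import io
import math
import struct
import wave


def make_tone(freq_hz: float = 440.0, seconds: float = 1.0,
              sample_rate: int = 8000) -> bytes:
    """Return the bytes of a mono, 16-bit WAV file containing a sine tone."""
    n_samples = int(seconds * sample_rate)
    # Each sample is a signed 16-bit integer at half the maximum amplitude.
    frames = b"".join(
        struct.pack(
            "<h",
            int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_samples)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)             # mono
        wav.setsampwidth(2)             # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)   # samples per second
        wav.writeframes(frames)
    return buffer.getvalue()


data = make_tone()
print(data[:4], len(data))
```

Writing `data` to a file ending in .wav should give something students can play, and changing the sample rate or sample width lets them hear and measure the effect on quality and file size.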
