LEARN-Data.

Fertile Questions

for the Data Chapter of “How to Learn Computer Science”.

Welcome


Thank you for buying my book! This page discusses the content in the “Data” chapter and answers the “Fertile Questions” I asked there. There are no perfect answers, and you may well disagree with mine, but the point of a fertile question is to make you think.

Here are the questions, and my suggested answers. Do you agree?

Can a computer store and process anything we see or hear?

In short, yes. The answer to this question lies in my explanation in the “TL;DR” section in the book.

At the heart of this topic is the idea that if we can turn information into binary data, we can use a computer to process it. Digital computers process binary numbers because they use two-state electrical signals. The challenge is therefore to find a transformation from real-world information to binary. This transformation is called encoding and it makes use of a code. ASCII and Unicode are used to encode text; JPEG, GIF, PNG do the same for bitmap images; and WAV, MP3 and AAC encode digital sound. But it’s important to realise that there are virtually limitless ways of encoding information and these are just the techniques that are widely used, owing to their effectiveness or official recognition, or both.

How to Learn Computer Science, page 34

So as long as we can capture information from the real world, and find a way of encoding it – turning it into binary numbers – we can use a computer to process it.

Every file format has its own encoding method, and there are thousands of them, but we can define a new encoding method whenever we want. The challenge is to write a code that works efficiently when we have lots of data.
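
To make the idea of a code concrete, here is a minimal Python sketch. The first part shows the bits that ASCII assigns to the text "Hi". The second part defines a made-up 2-bit code for a four-letter alphabet; `TOY_CODE` and `toy_encode` are hypothetical names invented for this illustration, not a real file format.

```python
def to_bits(data: bytes) -> str:
    """Show each byte as a group of 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in data)


# An existing, widely used code: ASCII gives every character one byte.
print(to_bits("Hi".encode("ascii")))  # 01001000 01101001

# A new code we just made up (hypothetical): 2 bits per letter is enough
# when the alphabet only has four letters.
TOY_CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}


def toy_encode(message: str) -> str:
    """Encode a message using the made-up 2-bit code."""
    return "".join(TOY_CODE[letter] for letter in message)


print(toy_encode("GATTACA"))  # 10001111000100
```

The toy code needs only 14 bits for seven letters where ASCII would need 56, which hints at why choosing a code that suits the data matters when there is a lot of it.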

A Word document takes up just 100KB of storage until I insert images. Then it’s 12MB – why?

Text requires little storage compared to images. But why?

Consider how few different characters are needed in a text document. Remember the discussion of ASCII and Unicode. Even allowing for large character sets like Chinese, 4 bytes (32 bits) is all we need for every possible character. So the text in a document takes up, at most, 4 bytes times the number of characters. This will never get massive: 100KB is enough for tens of thousands of characters, which is many pages of text, even in Chinese!
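
You can check the arithmetic yourself with a short Python sketch. It assumes the text is stored as UTF-8 or UTF-32 (a real Word file wraps its text in extra formatting data, so this is only the size of the characters themselves). The function names are made up for this example.

```python
def text_size_bytes(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes needed to store the text in the given encoding."""
    return len(text.encode(encoding))


def max_text_bytes(character_count: int) -> int:
    """Upper limit: no Unicode character needs more than 4 bytes."""
    return 4 * character_count


print(text_size_bytes("Hello"))            # 5 (1 byte per character)
print(text_size_bytes("你好"))              # 6 (3 bytes per character)
print(text_size_bytes("你好", "utf-32-le"))  # 8 (4 bytes per character)
print(max_text_bytes(1000))                # 4000
```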

But a digital image (let’s talk about bitmap images only today) requires much more space. That’s because we need to record the colour of each pixel. JPEG images hold 24 bits (3 bytes) of colour information for each pixel, and a typical smartphone camera now takes images of around 12 “megapixels”, that is, 12 million pixels. So the raw data being collected from the real world by the camera sensor is 12 million times 24 bits: 288 million bits, or 36 MB.
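
The same calculation as a small Python sketch, assuming 24 bits per pixel and counting 1 MB as one million bytes. The function name is made up for this example.

```python
def raw_image_bytes(pixels: int, bits_per_pixel: int = 24) -> int:
    """Size of an uncompressed bitmap: one colour value per pixel."""
    return pixels * bits_per_pixel // 8  # 8 bits in a byte


size = raw_image_bytes(12_000_000)
print(size)              # 36000000
print(size / 1_000_000)  # 36.0 (megabytes)
```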

The JPEG format used by most digital cameras and smartphones does some clever compression though, so usually our files are a lot smaller than that, but still run to many megabytes for a typical digital photo.

JPG, GIF, SVG – why so many image file formats?

Although we know the JPEG format quite well, it’s not the only bitmap format. A few different bitmap formats sprang up from different sources in the 1980s and 1990s. From the book…

The JPEG format is excellent for digital photography but does not support transparency, and with GIF’s limitation of 256 colours, Portable Network Graphics (PNG) emerged as a rival format in 1996.

How to Learn Computer Science, page 30

Bitmaps are great for photos but not logos and fonts. They do not scale well, causing blurring, or “pixelation”, when enlarged. So another reason for multiple image formats is the need for a format that works better with curves and shapes. Vector images are made of mathematical formulae that scale well without losing sharpness. SVG is a vector format. Do you know any more?
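
Here is a rough Python sketch of the difference. The bitmap is a tiny grid of characters standing in for pixels, enlarged by simply repeating each pixel (one simple way of scaling, chosen for illustration). The vector version stores a line as two end points, so enlarging it is just multiplication. Both functions are made up for this example.

```python
def upscale_bitmap(bitmap: list[str], factor: int) -> list[str]:
    """Enlarge a bitmap by repeating every pixel: the result is blocky."""
    enlarged = []
    for row in bitmap:
        stretched = "".join(pixel * factor for pixel in row)
        enlarged.extend([stretched] * factor)
    return enlarged


def scale_line(start: tuple, end: tuple, factor: int) -> tuple:
    """Enlarge a vector line: only the two end points change."""
    return (
        (start[0] * factor, start[1] * factor),
        (end[0] * factor, end[1] * factor),
    )


diagonal = ["#..", ".#.", "..#"]
for row in upscale_bitmap(diagonal, 2):
    print(row)

print(scale_line((0, 0), (2, 2), 2))  # ((0, 0), (4, 4))
```

The enlarged bitmap is a staircase of 2-by-2 blocks, while the enlarged vector line is still one perfectly straight line between two points.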

Why does a 700MB CD sound the same as a 50MB MP3 album?

Some people may disagree, but there’s not much difference, for two reasons:

  • The limitation of the human ear: sounds that are too high or too low in frequency (pitch) for us to hear have been removed from the data
  • The remaining data is compressed to save space without affecting quality in a noticeable manner
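
The sizes can be estimated with a little arithmetic, sketched here in Python. CD audio stores 44,100 samples per second, 16 bits per sample, in 2 channels (stereo). The 128 kilobits per second MP3 bit rate and the 60-minute album length are assumptions chosen for illustration; real albums and MP3 settings vary.

```python
CD_SAMPLES_PER_SECOND = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2  # stereo


def cd_bytes(seconds: int) -> int:
    """Uncompressed CD audio: every sample is stored in full."""
    bits = CD_SAMPLES_PER_SECOND * CD_BITS_PER_SAMPLE * CD_CHANNELS * seconds
    return bits // 8


def mp3_bytes(seconds: int, kilobits_per_second: int = 128) -> int:
    """MP3 at a fixed bit rate (128 kbit/s is an assumed, common setting)."""
    return kilobits_per_second * 1000 * seconds // 8


album_seconds = 60 * 60  # an assumed 60-minute album

print(cd_bytes(album_seconds))   # 635040000, about 635 MB
print(mp3_bytes(album_seconds))  # 57600000, about 58 MB
```

Under these assumptions the MP3 album is roughly one eleventh the size of the CD version.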

My book for teachers, “How to Teach Computer Science”, goes into more detail on this topic. Ask your teacher to get a copy 🙂
