1.2 - Memory and Storage (Part 2)

Units of Data

- Computers use 1s and 0s (binary digits, bits) to represent information.
- A byte is 8 bits, which is enough to represent one character in binary with ASCII, or the American Standard Code for Information Interchange (standard ASCII itself only needs 7 of the 8 bits).
- Interestingly, a nibble is 4 bits or half a byte.
- Modern systems mainly use a different character system called Unicode, but the byte as 8 bits stuck.

B: Byte
KB: Kilobyte (1000 bytes)
MB: Megabyte (1,000,000 bytes)
And so on according to the SI prefixes.

In addition to this, a lowercase 'b' indicates bits, and an 'i' (as in MiB) indicates powers of 1024 rather than 1000. So 1 KiB = 1,024 B and 1 MiB = 1,048,576 B, but 1 Mb (megabit) is only 125,000 B, which is 0.125 MB or about 0.119 MiB.
(at GCSE [x]iB doesn't matter, so you may ignore this.)
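The unit relationships above can be checked with a little arithmetic. This is only a sketch: the constant and function names (`KB`, `MIB`, `megabits_to_bytes`) are made up for illustration.

```python
# Unit sizes in bytes. SI prefixes are powers of 1000,
# while the 'i' prefixes (KiB, MiB) are powers of 1024.
KB = 1000
MB = 1000 ** 2
KIB = 1024
MIB = 1024 ** 2
BITS_PER_BYTE = 8


def megabits_to_bytes(megabits):
    """Convert megabits (Mb) to bytes, using 1 Mb = 1,000,000 bits."""
    return megabits * 1_000_000 / BITS_PER_BYTE


one_megabit = megabits_to_bytes(1)
print(one_megabit)        # 125000.0 bytes
print(one_megabit / MB)   # 0.125 (megabytes)
print(one_megabit / MIB)  # roughly 0.119 (mebibytes)
```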


Reading data from a disk/disc

- The reason computers use binary is that they consist of billions (in modern computers, at least) of tiny switches called transistors. Switches are either on or off, so computers must use a number system whose digits are either on or off (1 or 0): binary.
- This means that all data entering a computer system must be converted to binary before it can be processed.
- This also means that storage devices use binary. On optical discs, it is represented by lands (higher) and pits (lower); on magnetic media, by positive or negative magnetic polarity; and in solid state media, by the presence or absence of electrons in a floating gate or charge trap layer.


Capacity Requirements

- Data capacity is the maximum amount of data a storage device can hold.
- By knowing the capacity of a storage device, and how the size of a type of data scales with length, you can calculate how much of different data types can be stored.
- Data types can include text, basic streams of bytes, sound files, any file format really.

- To work out how many files fit on a device, divide the device's capacity by the size of one file.
- For text files, file size = bytes per character * number of characters.
- For image files, file size = resolution in pixels (like 1920x1080, so 2,073,600) * colour depth (usually 24 bits). This gives a size in bits; divide by 8 for bytes.
- For sound files, file size = sample rate * duration in seconds * bit depth. This also gives a size in bits; divide by 8 for bytes.
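The formulas above can be sketched in Python as follows. The function names are hypothetical, and the example values (a 1 GB device, a 60 second sound at 44.1 kHz and 16 bits, a single audio channel) are assumptions chosen for illustration.

```python
def bits_to_bytes(bits):
    """Round a number of bits up to whole bytes."""
    return (bits + 7) // 8


def text_size_bytes(characters, bytes_per_character=1):
    return characters * bytes_per_character


def image_size_bytes(width, height, colour_depth_bits):
    # resolution (width * height pixels) * colour depth gives bits
    return bits_to_bytes(width * height * colour_depth_bits)


def sound_size_bytes(sample_rate_hz, duration_s, bit_depth):
    # sample rate * duration * bit depth gives bits (one channel assumed)
    return bits_to_bytes(sample_rate_hz * duration_s * bit_depth)


def files_that_fit(capacity_bytes, file_size_bytes):
    """Whole number of files of one size that fit in the capacity."""
    return capacity_bytes // file_size_bytes


image = image_size_bytes(1920, 1080, 24)
print(image)                                 # 6220800
print(sound_size_bytes(44_100, 60, 16))      # 5292000
print(files_that_fit(1_000_000_000, image))  # 160
```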


Representing Characters

This was alluded to earlier, in 'Units of Data'.

- Computers have to represent all data in binary, and this includes keyboard input characters.
- The more binary digits you use to represent characters, the more characters you can represent but the more bits are taken up.
- Computers use character sets for characters, which is a 'defined list of characters that can be understood by a computer'.
- Each character is given a binary code, and sets are ordered logically (e.g. in ASCII 00110000 is 0, while 00110001 is 1).
- A character set provides a standard for computers to send/receive text.
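To see the character codes mentioned above, Python's built-in `ord` and `chr` convert between a character and its code number. The helper names `char_to_binary` and `binary_to_char` are made up for this sketch.

```python
def char_to_binary(character, bits=8):
    """Return the character's code as a binary string, padded to `bits` digits."""
    return format(ord(character), f"0{bits}b")


def binary_to_char(binary):
    """Return the character whose code is the given binary string."""
    return chr(int(binary, 2))


print(char_to_binary("0"))         # 00110000
print(char_to_binary("1"))         # 00110001
print(char_to_binary("A"))         # 01000001
print(binary_to_char("01100001"))  # a
```

Because the codes are in order, '1' is exactly one more than '0', and 'B' is one more than 'A'.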

- The most common character sets are ASCII (American Standard Code for Information Interchange) and Unicode (Universal Character Encoding).
- As you might guess, ASCII only contains English characters (and some numbers and special characters), which makes sense since it's an American standard but is a bit short-sighted.
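- With Unicode, instead, the goal is to represent every character ever used, and more characters are added with every revision.
- At GCSE, Unicode is treated as using 16 bits per character, providing 65,536 possible characters. (Real Unicode encodings vary: UTF-8, for example, uses between 8 and 32 bits per character.)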

| ASCII | Unicode |
| --- | --- |
| 7 bits used | 16 bits used |
| 128 chars | 65,536 chars |
| Can represent English characters only | Used to represent characters across the world, and has the ability for emojis and other special characters. |
| Low storage space needed | Higher storage space required |
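The storage difference can be seen by encoding the same text both ways. This sketch assumes UTF-16 (Python's `utf-16-be` codec) as the Unicode encoding, because it matches the 16 bits per character figure used here; other Unicode encodings give different sizes.

```python
def characters_available(bits):
    """Number of different codes that fit in the given number of bits."""
    return 2 ** bits


text = "hello"
ascii_bytes = text.encode("ascii")        # 1 byte per character
unicode_bytes = text.encode("utf-16-be")  # 2 bytes per character here

print(characters_available(7))   # 128
print(characters_available(16))  # 65536
print(len(ascii_bytes))          # 5
print(len(unicode_bytes))        # 10
```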

Representing Sound

- For sound to be represented as binary so computers can process it, it must first be sampled and stored.
- To do this, when a microphone records, measurements of the original sound wave are taken and then stored as binary on secondary storage.
- Sound waves are analogue (meaning a continuous wave), but computers can only work with digital data.
- The process of conversion is simply called 'Analogue to Digital', or A2D.

To convert sound waves, the computer:
1. Measures the amplitude of the analogue sound wave, creating samples.
2. Generates binary values from the samples.
3. Creates a digital version of the sound wave from the samples.
Then, to play back the audio, the computer reverses the process, converting digital to analogue for speakers.
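The conversion steps can be sketched in code. This is an illustration only: a sine wave stands in for the analogue sound, and the function name and parameters (`sample_wave`, samples per second, bits per sample) are assumptions, not a real recording API.

```python
import math


def sample_wave(frequency_hz, samples_per_second, duration_s, bits_per_sample):
    """Sample a sine wave and return each sample as a binary string."""
    levels = 2 ** bits_per_sample  # number of values each sample can take
    count = int(samples_per_second * duration_s)
    samples = []
    for n in range(count):
        t = n / samples_per_second
        # Step 1: measure the amplitude (between -1 and 1) at time t
        amplitude = math.sin(2 * math.pi * frequency_hz * t)
        # Step 2: turn the measurement into a whole number, then binary
        level = round((amplitude + 1) / 2 * (levels - 1))
        samples.append(format(level, f"0{bits_per_sample}b"))
    # Step 3: the list of binary values is the digital version of the wave
    return samples


# One cycle of a 1 Hz wave, 8 samples per second, 4 bits per sample
print(sample_wave(1, 8, 1, 4))
```

Taking more samples per second, or using more bits per sample, gives a list that follows the original wave more closely but takes more space.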

If you still remember the 'Capacity Requirements' section, you may now be wondering where 'sample rate' and 'bit depth' come into this.
- Sample rate is the number of samples taken per second of an analogue sound wave, measured in hertz (Hz).
- The higher the sample rate, the less blocky the digital sound wave looks and the closer the sound is to the original.

[Image omitted. Source licensed under CC-BY-SA.]

Landline telephones typically use an 8 kHz sample rate, CDs use 44.1 kHz, and DVDs use 48 kHz.

Bit depth is the number of bits stored per sample of sound. A higher bit depth means a wider range of amplitude values can be represented, giving better sound quality, but it results in a larger file size.


Vectors vs Bitmaps

- There are two methods used for representing images: bitmaps, where each pixel is stored individually, and vectors, where images are represented through mathematical equations.

- In bitmap images, each pixel you see is a square that has an RGB (Red, Green, Blue) value.
- A pixel is the smallest possible element of a bitmap.
- Each RGB value has a binary code, often represented in hexadecimal (like #ffffff, more on hex later).
- Bitmap images are usually things like photographs.

- In vector images, the pixels shown on screen are calculated from stored mathematical equations and points, rather than being stored individually.
- With vectors, you can have infinite scalability, resizing up or down.
- To create a circle, the data stored would be the centre point (x, y coordinates) and the radius of the circle.
- Vectors are commonly used for things like logos, because the logo may be used on anything from a tiny coin to the side of a plane.
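The circle example can be sketched as follows. Only the centre and radius are stored, and the pixels are worked out when the image is drawn at a chosen size. The function `render_circle` and the use of 0 to 1 coordinates are assumptions made for this illustration, not how any particular vector format works.

```python
def render_circle(centre_x, centre_y, radius, size):
    """Draw a circle (given in 0..1 coordinates) on a size x size grid."""
    rows = []
    for row in range(size):
        line = ""
        for col in range(size):
            # Position of the middle of this pixel, in 0..1 coordinates
            x = (col + 0.5) / size
            y = (row + 0.5) / size
            inside = (x - centre_x) ** 2 + (y - centre_y) ** 2 <= radius ** 2
            line += "#" if inside else "."
        rows.append(line)
    return rows


# The same three stored numbers can be drawn at any size
for line in render_circle(0.5, 0.5, 0.4, 8):
    print(line)
for line in render_circle(0.5, 0.5, 0.4, 16):
    print(line)
```

The stored data stays the same size however large the output grid is, which is why vectors scale without losing quality.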


Colour Depth and Resolution

- Colour depth is the number of bits stored per pixel in bitmaps.
- The more colours needed in the image, the higher the colour depth.
- More colours allow finer detail to be shown.
- For example, a black and white image has a colour depth of 1 (1=white, 0=black), and an 8-colour image has a colour depth of 3 (000, 001, 010, 011, 100 and so on).
- A colour depth of 24 bits is called True Colour and is what most computers support and use.
- The more colours and detail (resolution) in the image, the more binary that needs to be stored and the higher the file size will be.
- When resolution increases in a bitmap, the number of pixels increases.
- Quality vs file size is an important consideration.
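The link between colour depth and the number of colours is a power of two: n bits give 2^n colours. A short sketch, with made-up function names:

```python
def colours_for_depth(colour_depth_bits):
    """Number of different colours available at a given colour depth."""
    return 2 ** colour_depth_bits


def depth_for_colours(number_of_colours):
    """Smallest colour depth (in bits) that can hold this many colours."""
    return max(1, (number_of_colours - 1).bit_length())


print(colours_for_depth(1))   # 2
print(colours_for_depth(3))   # 8
print(colours_for_depth(24))  # 16777216
print(depth_for_colours(8))   # 3
print(depth_for_colours(5))   # 3
```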


Compression

- Compression is defined as reducing the size of a file so it takes up less storage.
- Commonly compression is used to maximise the amount of data stored, or to minimise the amount of data travelling over a network.
- There are two types of compression: lossy and lossless.

- Lossless compression is where the data can be turned back into its original data, and is done through a process called encoding.
- It does not reduce the file size as much as lossy compression, but can be used on more types of data.
- It is used where a loss in quality is unacceptable.
- A lossless encoding algorithm looks for patterns in the data. For example, if a string was 'uwuuwuowoowo', it would be represented as 'uwu2owo2'.
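One simple way to get the result in the example above is to split the text into fixed-size chunks and count how many times each chunk repeats in a row. This is only a sketch of that idea: the function names are hypothetical, and the chunk size of 3 is an assumption that happens to suit this string. Real lossless algorithms find patterns in more flexible ways.

```python
def encode_repeats(text, chunk_size):
    """Split text into chunks and store each run as (chunk, count)."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    runs = []
    for chunk in chunks:
        if runs and runs[-1][0] == chunk:
            # Same chunk as the previous one: increase its count
            runs[-1] = (chunk, runs[-1][1] + 1)
        else:
            runs.append((chunk, 1))
    return runs


def decode_repeats(runs):
    """Rebuild the original text exactly from the (chunk, count) pairs."""
    return "".join(chunk * count for chunk, count in runs)


runs = encode_repeats("uwuuwuowoowo", 3)
print(runs)                                                 # [('uwu', 2), ('owo', 2)]
print("".join(f"{chunk}{count}" for chunk, count in runs))  # uwu2owo2
print(decode_repeats(runs))                                 # uwuuwuowoowo
```

Decoding gives back exactly the original string, which is what makes the compression lossless.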

- Lossy compression, on the other hand, removes quality from a file.
- Lossy compression is irreversible, but can greatly reduce the size of a file.
- It is suitable for files where reducing quality is ok.
- Compressing too much, for example on images, can result in blurriness or other issues.
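A very simple illustration of lossy compression is to throw away the least significant bits of each value. This sketch assumes 8-bit values (such as pixel brightness) reduced to 4 bits. The function names are made up, and real lossy formats use far more sophisticated methods.

```python
def reduce_bit_depth(values, keep_bits, original_bits=8):
    """Keep only the most significant bits of each value (lossy)."""
    shift = original_bits - keep_bits
    return [value >> shift for value in values]


def restore(values, keep_bits, original_bits=8):
    """Scale the reduced values back up; the discarded bits cannot be recovered."""
    shift = original_bits - keep_bits
    return [value << shift for value in values]


original = [200, 201, 202, 55]
compressed = reduce_bit_depth(original, 4)
restored = restore(compressed, 4)

print(compressed)  # [12, 12, 12, 3]
print(restored)    # [192, 192, 192, 48]
```

Each value now needs half as many bits, but the restored values differ from the originals, showing why lossy compression is irreversible.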