1’s and 0’s - The magic behind data representation


We all know that data in a computer is represented as a long series of bits, with each bit having a value of either 1 or 0. But if everything is just 1’s and 0’s, then how do we have numbers, strings, images, audio, video and so on? How are these 1’s and 0’s transformed into these sensible things? In this article we’ll discuss the general concept behind the internal representation of data in computers.

Registers

This is where the magic begins. So far we know that everything is just 1’s and 0’s. Let’s go a step deeper. This continuous stream of 1’s and 0’s can be divided into fixed-size chunks. Strictly speaking, a register is a small storage location inside the CPU, and a chunk of the same size in memory is usually called a word, but in this article we’ll use “register” loosely for both. These chunks are typically 32 bits or 64 bits wide, depending on the architecture. So, roughly speaking, a 32-bit architecture means that data is handled in 32-bit chunks, and the CPU typically processes data 32 bits at a time. Similarly, for a 64-bit system, the register size is 64 bits.

So now we know that a register equals either 32 or 64 consecutive bits in memory. Each of these chunks of data has a memory address associated with it. A memory address is how we refer to or access a particular chunk of data.
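To make the idea of addressed chunks concrete, here is a minimal sketch in Python. It assumes a 32-bit chunk size and uses the chunk's position as its address. The function name `split_into_words` and the dictionary standing in for memory are hypothetical and exist only for illustration. Real hardware usually gives an address to every byte rather than every chunk.

```python
WORD_BITS = 32  # assumed chunk size; a 64-bit machine would use 64


def split_into_words(bits: str, word_bits: int = WORD_BITS) -> dict[int, str]:
    """Split a string of '0'/'1' characters into fixed-width chunks.

    Each chunk is keyed by a made-up address: its position in the stream.
    """
    if len(bits) % word_bits != 0:
        raise ValueError("bit string length must be a multiple of the chunk size")
    words = {}
    for address, start in enumerate(range(0, len(bits), word_bits)):
        words[address] = bits[start:start + word_bits]
    return words


# 64 bits of raw data become two addressable 32-bit chunks.
memory = split_into_words("0" * 31 + "1" + "1" * 32)
print(memory[0])  # the chunk stored at address 0
print(memory[1])  # the chunk stored at address 1
```

Looking up `memory[1]` plays the role of accessing a chunk through its address.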

Encoding and Decoding

So now we have registers with addresses, containing data as 1’s and 0’s. How can these 1’s and 0’s be read as different types of data? If all data is stored as just 1’s and 0’s, then real data must somehow be converted, or encoded, into numbers and eventually into a binary representation of 1’s and 0’s. Encoding refers to converting real-world data into a binary representation that is stored in the computer, and decoding refers to converting that binary data back to its original form. Encoding and decoding differ between data types and between programming languages: the technique one language uses to encode an integer may be different from the technique used by another language. Hence the encoding algorithm plays a major role in how any real data is converted into its equivalent 1’s and 0’s. If you know how real data was converted to 1’s and 0’s, you can reverse those steps to convert the 1’s and 0’s back to the real data, and this is the decoding process.
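As a small illustration of encoding and decoding, the sketch below converts an integer and a short piece of text to bits and back. The choices made here (unsigned 32-bit integers written most significant bit first, and text encoded as UTF-8) are assumptions picked for the example, not the scheme of any particular language, and the function names are hypothetical.

```python
def encode_uint32(value: int) -> str:
    """Encode a non-negative integer as 32 bits, most significant bit first."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")
    return format(value, "032b")


def decode_uint32(bits: str) -> int:
    """Read a string of bits as an unsigned integer."""
    return int(bits, 2)


def encode_text(text: str) -> str:
    """Encode text as UTF-8 bytes, each written out as 8 bits."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def decode_text(bits: str) -> str:
    """Group bits into bytes and decode those bytes as UTF-8 text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


bits = encode_text("Hi!!")      # four characters, 32 bits in total
print(bits)
print(decode_text(bits))        # Hi!!
print(decode_uint32(bits))      # the very same bits, read as a number
```

The last two lines decode the same 32 bits in two different ways, which is why the decoder has to know which encoding was used.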

Information in registers = actual data + metadata

So now you know that real data is converted into binary data and then stored in the registers. In many encoding schemes, the resulting bits answer two questions: what kind of data is stored in the register, and what is the actual data being stored? The bits that describe the type of the data, or carry other additional details about it, are called the metadata, and the remaining bits contain the actual data. Which particular bits are metadata and which are actual data is determined by the encoding scheme of the particular data type. Hence, when decoding, the first step is to check the metadata to identify the type of data stored and any other information related to the actual data. Once the type is identified, the decoding algorithm for that type tells the CPU which bits contain the actual data and how they can be converted back into real data.
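The sketch below shows one way a 32-bit chunk could be split into metadata and actual data. The layout is invented for illustration: the top 4 bits hold a type tag and the remaining 28 bits hold the payload. The tag values `TAG_INT` and `TAG_CHAR` and the function names are hypothetical and do not describe any real language's encoding.

```python
TAG_BITS = 4        # assumed: top 4 bits are metadata (the type tag)
PAYLOAD_BITS = 28   # assumed: remaining 28 bits are the actual data

TAG_INT = 0b0001    # hypothetical tag meaning "small integer"
TAG_CHAR = 0b0010   # hypothetical tag meaning "single character"


def pack(tag: int, payload: int) -> int:
    """Combine a type tag and a payload into one 32-bit value."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in the tag bits")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in the payload bits")
    return (tag << PAYLOAD_BITS) | payload


def unpack(word: int) -> tuple[int, int]:
    """Split a 32-bit value back into (tag, payload)."""
    tag = word >> PAYLOAD_BITS
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    return tag, payload


def decode(word: int):
    """Check the metadata first, then interpret the payload accordingly."""
    tag, payload = unpack(word)
    if tag == TAG_INT:
        return payload
    if tag == TAG_CHAR:
        return chr(payload)
    raise ValueError(f"unknown tag {tag:#06b}")


print(format(pack(TAG_INT, 65), "032b"))   # tag bits followed by payload bits
print(decode(pack(TAG_INT, 65)))           # 65
print(decode(pack(TAG_CHAR, 65)))          # A
```

The payload bits are identical in both cases; only the tag decides whether they come back as the number 65 or the character A.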

Connecting registers

Now that we know that a register has bits containing actual data and metadata, how can we store everything in just one register? What if larger data, like images or videos, requires more space than a single register can provide? To tackle this, we separate the data into chunks and store them in different registers. To identify which chunk belongs to which data, and the order of the chunks, we can store the chunks in contiguous registers, back to back, and store the total number of registers holding the actual data as metadata. This way we can read the immediately following registers for the next chunks and continue until we have collected all the successive chunks containing the actual data. Another common technique is to store the memory address of the next chunk as data in the register. Such a stored address is commonly referred to as a pointer. When decoding, the pointer tells us which register holds the next chunk, that is, the continuation of the data in the current register. Again, the exact way these chunks are connected depends on the specific encoding and decoding scheme used for the data type.
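Both techniques can be sketched in a few lines of Python. A dictionary stands in for memory, with integer keys acting as addresses. The function names and the `END` marker are hypothetical and exist only for this example. Real schemes differ in how they store the count and the pointers.

```python
END = -1  # hypothetical marker meaning "there is no next chunk"


def store_contiguous(memory: dict, start: int, chunks: list) -> None:
    """Write a count (metadata) at `start`, then the chunks back to back."""
    memory[start] = len(chunks)
    for offset, chunk in enumerate(chunks, start=1):
        memory[start + offset] = chunk


def load_contiguous(memory: dict, start: int) -> list:
    """Read the count first, then that many consecutive chunks."""
    count = memory[start]
    return [memory[start + offset] for offset in range(1, count + 1)]


def store_linked(memory: dict, addresses: list, chunks: list) -> None:
    """Store each chunk with the address of the next one (a pointer)."""
    if len(addresses) != len(chunks):
        raise ValueError("need exactly one address per chunk")
    for i, (address, chunk) in enumerate(zip(addresses, chunks)):
        next_address = addresses[i + 1] if i + 1 < len(addresses) else END
        memory[address] = (chunk, next_address)


def load_linked(memory: dict, start: int) -> list:
    """Follow the pointers from `start` until the END marker is reached."""
    chunks = []
    address = start
    while address != END:
        chunk, address = memory[address]
        chunks.append(chunk)
    return chunks


memory = {}
store_contiguous(memory, 10, [7, 8, 9])        # uses addresses 10 to 13
print(load_contiguous(memory, 10))             # [7, 8, 9]

scattered = {}
store_linked(scattered, [40, 5, 17], [7, 8, 9])  # chunks need not be adjacent
print(load_linked(scattered, 40))              # [7, 8, 9]
```

The contiguous layout needs only one extra piece of metadata (the count) but requires neighbouring free space. The linked layout spends extra space on a pointer per chunk but lets the chunks live anywhere.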

Conclusion

Now that you have a general idea of how data is stored in and retrieved from computers, you can start looking into concrete implementations of encoding and decoding for different data types in different languages. I’ll be explaining how integers and other data types are represented internally in Elixir/Erlang in my next articles. If you are really interested in learning more about how computers work at a very low level, I would suggest starting your journey by watching this crash course series on YouTube.
