Blockchain and Bitcoin
Blockchain technology was first adopted by the Bitcoin cryptocurrency, and other cryptocurrencies such as Peercoin, Litecoin and Ethereum later followed suit. As a result, the word blockchain is closely associated with the word Bitcoin. But blockchain and Bitcoin are two distinct things: the latter is a cryptocurrency that uses the former as its underlying technology. Searching the Internet for blockchain-only material mostly leads to blog posts and videos that explain blockchain technology from the Bitcoin perspective.
Understanding blockchain technology from a use-case perspective is helpful, but concepts that belong solely to one use-case can bias your understanding of blockchain and cause confusion when you study other use-cases. So, in this article we will take a deep dive into the fundamentals of blockchain technology without getting into the specific details of Bitcoin or other use-cases, except where they are needed as examples.
What is blockchain technology?
In simple terms, blockchain is a technology for storing and managing any data securely, permanently and pseudo-anonymously. Permanence means that data cannot be edited or deleted once it is stored on a blockchain platform. Pseudo-anonymity means that the owners of the data are identified only by codes.
Presently, most database management systems use a centralized client-server architecture, where a server stores the database and clients connect to the server to fetch data. In contrast, a blockchain network is based on a peer-to-peer, decentralized and distributed architecture. All nodes (computers) connected to the blockchain network store the database, without the need for a centralized server. Broadly, we can break a blockchain platform into two parts: the network part and the database part.
As you might be aware, most legacy database systems use tables, made up of columns and rows, to store and retrieve data. Blockchain technology instead stores data as a series of blocks, each of which points to its predecessor.
A Real World Analogy
To understand the blockchain database in a practical sense, imagine an entire blockchain database as your family genealogy. Think of yourself as the latest block in your family genealogy and your father as the previous block, with the chain continuing backwards until the very first genesis block, “Adam”. Think of your unique genetics as the data stored in you, i.e., in block-Y. Similarly, unique genetic data is stored in your father (block-X) and so on back to block-A (Adam). Now, each block in this “human” blockchain is related to the previous block by DNA. Similarly, in the actual blockchain database, each data block is related to its previous block via a mathematically generated code, known as the “block hash” (more about this later). This entire chain of data blocks, the blockchain database, is stored on all the nodes of a blockchain network, which makes it a distributed database platform.
A look into the blockchain network model
A blockchain network is formed through peer-to-peer interconnections of nodes. A node can join or leave the network anytime. When it joins, it pulls a copy of the entire blockchain database from other nodes. Next, the node collects a set of new data from the users, verifies the data and packs them together along with the previous block hash to produce a new block. Lastly, it computes a hash of the newly created block and broadcasts the block to other nodes. Upon receiving the new block, the other nodes would validate the block by verifying the block hash and all the data inside the block. If the block passes the validation, the nodes append the new block to their respective copy of the blockchain database.
All nodes sync with each other to maintain the same snapshot of the database using a consensus protocol. This distributed data storage enables true decentralization and provides security against:
- Single point of failure
- Data manipulation
If data gets corrupted on one or a few nodes, the valid data stored on the majority of the other nodes replaces the corrupted data. As for data manipulation, a malicious attacker would need to compromise more than 50% of the nodes simultaneously to manipulate the data successfully, which is highly improbable unless the network has very few nodes. For instance, roughly 10,000 nodes participate in the Bitcoin network, which makes such an attack practically infeasible.
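The idea that the majority copy overrides a corrupted one can be sketched in a few lines of Python. This is a toy model only: the function names and the five-node example are made up for illustration, and real blockchain networks settle disagreements through their consensus rules rather than a simple head count of nodes.

```python
from collections import Counter

def majority_copy(copies):
    """Return the database copy held by a strict majority of nodes, or None."""
    if not copies:
        return None
    value, votes = Counter(copies).most_common(1)[0]
    return value if votes > len(copies) / 2 else None

def repair(copies):
    """Replace every copy that disagrees with the majority copy."""
    good = majority_copy(copies)
    if good is None:
        return list(copies)  # no majority, so nothing can be safely repaired
    return [good for _ in copies]

# Five hypothetical nodes; each copy is a tuple of records (tuples are hashable).
copies = [("a", "b", "c")] * 4 + [("a", "X", "c")]
repaired = repair(copies)  # the single corrupted copy is overwritten
```

If no copy is held by more than half of the nodes, `repair` leaves everything untouched, which mirrors why an attacker needs control of the majority to force a change.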
Abstract model of Blockchain database
The above figure represents an abstract model of the blockchain database, where blocks are chained using the previous block hash and arranged according to their creation timestamps. A block generally contains a header and a data section.
Besides distributed storage, linking each block to the previous block’s hash further strengthens security against data manipulation. In the above figure, the header of Block-O stores the hash of Block-N, the header of Block-N stores the hash of Block-M, and so on back to the first block. Since the first block, the genesis block, has no predecessor, it stores zeros in the previous hash field.
In general, a hash is a fixed-length code mathematically generated from a particular data input; with a good hash function, it is practically impossible to find two different inputs that produce the same code. The code is usually written in hexadecimal, using letters (a-f) and numbers (0-9). SHA-256 is a widely preferred algorithm for computing hashes. For any given input, it produces a 256-bit value, written as a sequence of 64 hexadecimal characters. E.g.,
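To make the linking concrete, here is a minimal Python sketch of blocks that carry a header and a data section, where each header stores the hash of the previous block. The field names (`prev_hash`, `timestamp`, `data`) and the JSON encoding are assumptions chosen for illustration; real platforms define their own binary block formats.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # the genesis block has no predecessor

def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the whole block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def new_block(prev_block, data, timestamp=None):
    """Create a block whose header points at the hash of prev_block."""
    prev_hash = GENESIS_PREV_HASH if prev_block is None else block_hash(prev_block)
    return {
        "header": {
            "prev_hash": prev_hash,
            "timestamp": time.time() if timestamp is None else timestamp,
        },
        "data": data,
    }

# A three-block chain: genesis <- block_1 <- block_2
genesis = new_block(None, ["genesis record"], timestamp=0)
block_1 = new_block(genesis, ["record 1", "record 2"], timestamp=1)
block_2 = new_block(block_1, ["record 3"], timestamp=2)
```

Because the hash covers the header as well as the data, the hash of `block_2` indirectly depends on every block before it.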
I want to compute the hash of this sentence using SHA-256.
e4a86ea27880dea48260a40c6429723a9aa63552129439f65a25b3e8a851341c
If the input changes even slightly, for example by a single period, the hash changes entirely.
I want to compute the hash of this sentence using SHA-256
6acc9f9100261af3f9123033e34d1829368db340a946d3c8cc7afff383c92a49
Note that the second hash is completely different after removing the period at the end of the input sentence. In a blockchain, a whole block is taken as the input for computing the block hash, and the next block stores this hash in its header. This way, an attacker who manipulates the data inside a block needs to recompute the hash not just for the manipulated block, but also for every subsequent block up to the latest block in the chain. Now, what if an attacker decides to recompute the hashes of all those blocks? It seems pretty straightforward, right? But two things make it difficult for the attacker to succeed: the consensus protocol and, of course, the distributed storage.
Any untrusted node can join the blockchain network without undergoing a (node) verification process. Therefore, blockchain platforms use consensus protocols to secure the network and achieve reliability. A consensus protocol keeps a blockchain network consistent, with all nodes (or at least 51% of the nodes) agreeing on a single snapshot of the database. The most commonly used consensus protocol is Proof-of-Work (PoW). An alternative consensus algorithm, Proof-of-Stake (PoS), is gaining traction among major blockchain platforms such as Peercoin, Bitshares and Ethereum.
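The effect of tampering on the chain of hashes can be seen in a small Python sketch. The block layout (a `prev_hash` string plus a `data` string) and the helper names are simplifications assumed for this example, not the format of any real platform.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 hash of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_chain(records):
    """Link records so each block stores the hash of the previous block."""
    chain, prev_hash = [], "0" * 64
    for data in records:
        block = {"prev_hash": prev_hash, "data": data}
        chain.append(block)
        prev_hash = sha256_hex(block["prev_hash"] + block["data"])
    return chain

def first_broken_link(chain):
    """Return the index of the first block whose prev_hash no longer matches, or None."""
    expected = "0" * 64
    for index, block in enumerate(chain):
        if block["prev_hash"] != expected:
            return index
        expected = sha256_hex(block["prev_hash"] + block["data"])
    return None

chain = build_chain(["pay A 5", "pay B 7", "pay C 1", "pay D 9"])
assert first_broken_link(chain) is None   # untouched chain: every link matches

chain[1]["data"] = "pay B 700"            # tamper with block 1 only
print(first_broken_link(chain))           # prints 2: the hash stored in block 2 no longer matches
```

To hide the change, the attacker would have to rewrite the `prev_hash` of block 2, which changes the hash of block 2, which in turn breaks block 3, and so on up to the tip of the chain.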
Any blockchain use-case adopting a Proof-of-Work mechanism requires a node to compute a “difficult” block hash before broadcasting its block to other nodes. That is, the node must compute a block hash that satisfies the difficulty target set by the network, e.g., a hash with 17 leading zeros. To do so, the node increments a variable called the “nonce” in the block header until the resulting hash has 17 leading zeros and thus satisfies the difficulty target. Finding a 64-character hash that meets the difficulty target is a time-intensive process. E.g., in Bitcoin, the difficulty is adjusted so that the network as a whole finds a new block about every 10 minutes on average. This block-hash computing process is popularly known as “solving a difficult mathematical puzzle”.
So basically, all nodes race to be the first (i.e. the winner) to compute the block hash, because the winning node gets a reward. The reward depends on the specific blockchain use-case. E.g., at the time of writing, Bitcoin rewards the winning node with 12.5 BTC (an amount that halves roughly every four years), plus all the transaction fees associated with the newly mined block.
Next, the winning node broadcasts the block containing the nonce that generated the block hash. The other nodes, which were also racing to find the hash, stop hashing and start validating the new block by checking whether the nonce really generates a block hash that satisfies the target difficulty. If it does, the block becomes the latest block on their respective copies of the blockchain database, and the winning node gets to claim the reward. If the nonce produces a hash that fails to meet the target difficulty, the nodes discard the block and resume their hash-computing race.
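The nonce search can be sketched in Python as follows. This is a simplified illustration, not Bitcoin's actual algorithm: the header layout and the function names are invented for the example, the difficulty is expressed as a number of leading zero hexadecimal characters, and a small difficulty of 4 is used so that the loop finishes quickly. Bitcoin itself applies SHA-256 twice and compares the result against a numeric target.

```python
import hashlib

def pow_hash(prev_hash, data, nonce):
    """Hash a toy block header made of the previous hash, the data and the nonce."""
    header = f"{prev_hash}|{data}|{nonce}".encode("utf-8")
    return hashlib.sha256(header).hexdigest()

def mine(prev_hash, data, difficulty):
    """Increment the nonce until the hash starts with `difficulty` zero hex characters."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_hash(prev_hash, data, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("0" * 64, "example data", difficulty=4)
print(nonce, digest)
```

Each additional leading zero character multiplies the expected number of attempts by 16, which is how a network can tune how long block creation takes.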
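The accept-or-discard decision made by a receiving node can be sketched as below. The block fields, the `DIFFICULTY` constant and the helper names are assumptions made for this toy example; `mine_block` merely stands in for the winning node. A real node would also verify all the data inside the block.

```python
import hashlib

DIFFICULTY = 3  # assumed toy target: three leading zero hex characters

def block_digest(block):
    """Hash the previous hash, the data and the nonce of a toy block."""
    payload = f"{block['prev_hash']}|{block['data']}|{block['nonce']}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def is_valid(block, tip_digest):
    """One hash is enough to check work that took many hashes to find."""
    return (block["prev_hash"] == tip_digest
            and block_digest(block).startswith("0" * DIFFICULTY))

def receive(chain, block):
    """Append the block if it validates against the local tip; otherwise discard it."""
    tip_digest = block_digest(chain[-1]) if chain else "0" * 64
    if is_valid(block, tip_digest):
        chain.append(block)
        return True
    return False

def mine_block(prev_hash, data):
    """Stand-in for the winning node: brute-force a nonce that meets the target."""
    block = {"prev_hash": prev_hash, "data": data, "nonce": 0}
    while not block_digest(block).startswith("0" * DIFFICULTY):
        block["nonce"] += 1
    return block

local_chain = []
winner_block = mine_block("0" * 64, "first block")
accepted = receive(local_chain, winner_block)  # True: the block is appended
```

Validation costs a single hash computation, while finding the nonce took thousands of attempts on average. This asymmetry is what lets every node cheaply check the winner's claim before resuming the race.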
The objective of Proof-of-Work
The main objective of Proof-of-Work is to attach a significant, predictable computational cost to the creation of each new block. For the sake of understanding, let us see what happens in the absence of PoW. As discussed earlier, after manipulating data, an attacker first needs to recompute all the hashes from the target block up to the latest block in the chain. Next, the attacker also needs to convince at least 51% of the nodes in the network to accept his manipulated blockchain. Without PoW, the attacker can do so if he can recompute all the hashes and also generate new blocks before any other node can produce a new block.
In short, if the attacker possesses a computer whose computing capacity is larger than the combined computing capacity of 51% of the nodes, then the attacker can produce the longest chain out of manipulated blocks. All nodes will then accept this (manipulated) longest chain as the valid blockchain database. With PoW in place, a single attacker cannot produce a longer chain faster than 51% of the nodes combined, because such vast computing capacity is simply unavailable as of today.
The enormous amount of energy it wastes is the main problem plaguing the Proof-of-Work model. Since only a single node wins the reward, the energy consumed by all the other nodes is a total waste. To minimize energy wastage while keeping the blockchain safe, alternative consensus protocols such as Proof-of-Stake (PoS) are gaining adoption.
In the Proof-of-Stake process, any node wishing to participate in block creation must deposit a certain amount of cryptocurrency as a stake. Next, a leader is elected from among the stakeholders based on the size of the stake deposited. The highest depositor has the highest chance of becoming the leader, creating the block and earning a reward. If the leader cheats, he loses his entire deposit.
The main objective of Proof-of-Stake is to introduce a large economic constraint that deters malicious actors from manipulating the database. That is, the cost of cheating is significantly larger than the gain obtained from cheating. Therefore, a rational node has a stronger economic incentive to be honest than to cheat.
Next, to prevent the highest depositor from always becoming the elected leader, PoS algorithms use the following mechanisms:
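To get a feel for why a minority attacker struggles to overtake the honest chain, here is a small Python sketch of the standard gambler's-ruin estimate used in the Bitcoin whitepaper. The function name and parameters are chosen for this example. It assumes the attacker controls a fixed share q of the total computing power and starts z blocks behind; it ignores network delays and any blocks found while the attacker is still preparing.

```python
def catch_up_probability(attacker_share, blocks_behind):
    """Probability that an attacker ever catches up from `blocks_behind` blocks behind.

    Gambler's-ruin result: (q / p) ** z when q < p, and 1 otherwise, where
    q is the attacker's share of computing power and p = 1 - q is the honest share.
    """
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must be between 0 and 1")
    q = attacker_share
    p = 1.0 - q
    if q >= p:
        return 1.0  # with half or more of the power, the attacker eventually wins
    return (q / p) ** blocks_behind

for share in (0.10, 0.30, 0.45):
    print(share, catch_up_probability(share, 6))
```

The probability shrinks exponentially with the number of blocks the attacker has to redo, as long as the attacker holds less than half of the computing power.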
- Random selection.
- Coin-age based selection.
While leader election is random in the former, the latter takes into account the age of the deposited coins. E.g., a node that has held 100 coins in its wallet for 30 days has a coin-age of 3000 (100 coins × 30 days). After it creates a block, its coin-age score is reset to zero, and the depositor needs to wait another 30 days to accumulate a new coin-age score for its existing coins.
Ideally, blockchain technology should be able to store any type of data (text, image or video). At present, most blockchain platforms store transaction-type data that strongly requires permanence, tamper resistance and privacy, e.g., financial transactions or government records. But there are a few blockchain-based platforms such as StorJ, which offers a file storage service, and Slate, which offers a Blockchain Video on Demand service. For now, how these platforms store large files using blockchain technology is out of the scope of this article.
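The two mechanisms can be combined in a short Python sketch: coin-age is computed as coins multiplied by days held, and the leader is drawn at random with a probability proportional to that score. The participant names and the lottery itself are illustrative assumptions; actual PoS protocols implement leader selection in their own, more involved ways.

```python
import random

def coin_age(coins, days_held):
    """Coin-age score: number of coins multiplied by the days they have been held."""
    return coins * days_held

def pick_leader(scores, rng=random):
    """Pick a leader at random, weighted by score, so the largest holder
    is more likely, but not certain, to win."""
    names = list(scores)
    weights = [scores[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical stakeholders and their coin-age scores.
ages = {
    "alice": coin_age(100, 30),  # 3000
    "bob": coin_age(50, 30),     # 1500
    "carol": coin_age(10, 90),   # 900
}
leader = pick_leader(ages, random.Random(42))
ages[leader] = 0  # the winner's coin-age is reset after it creates a block
```

Resetting the winner's score to zero means the same node cannot be drawn again until its coins have aged, which gives smaller stakeholders a turn.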