The dictionary meaning of shard is a small part of something. Sharding is the process of dividing a big database into smaller portions so that they can be managed and updated more easily. It was first introduced in MMORPGs (Massively Multiplayer Online Role-Playing Games) in 1999. Since these games had massive traffic, the huge database was divided into different segments, which were portrayed to the players as different scenes or landscapes. Splitting the players across different servers made it possible to handle the heavy traffic all at once.
Essentially, sharding is just another name for horizontal partitioning. In horizontal partitioning, the rows of a database are split, instead of the columns. This reduces the index size and thus results in better search performance. The basic purpose behind this is to be able to manage a vast database. As the number of transactions per unit of time grows, the size of a database increases linearly while the response time increases exponentially. A longer response time means slower output, which on a single database can only be dealt with by buying more expensive hardware.
All data-driven applications and websites growing at a significant pace will need to scale at some point. The huge amount of data will need to be securely accommodated in such a way that it can be easily accessed. This scaling has to be dynamic to keep up with the fluctuations in the future.
Therefore, the cost of maintaining such a large database in one place keeps rising. On the other hand, if sharding is performed, the database can be divided into smaller segments which can be managed individually from several places with much less expensive hardware. A common business example is to shard a customer database by geographic location. The data for customers in one location is then kept together on its own server. So, instead of going through a huge customer database for a piece of information, only the relevant smaller segment has to be processed.
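The geographic example above can be sketched in a few lines of Python. This is only an illustration: the region codes, shard names, and the helper functions `build_shards` and `find_customer` are assumptions made for the sketch, not part of any particular database product.

```python
from collections import defaultdict

# Hypothetical mapping from a customer's region to the server (shard) holding it.
REGION_TO_SHARD = {"EU": "shard-eu", "US": "shard-us", "APAC": "shard-apac"}


def build_shards(customers):
    """Group customer rows by the shard their region maps to."""
    shards = defaultdict(list)
    for row in customers:
        shards[REGION_TO_SHARD[row["region"]]].append(row)
    return dict(shards)


def find_customer(shards, region, customer_id):
    """Search only the shard for `region`, not the whole customer base."""
    for row in shards.get(REGION_TO_SHARD[region], []):
        if row["id"] == customer_id:
            return row
    return None


# Illustrative data only.
customers = [
    {"id": 1, "name": "Ana", "region": "EU"},
    {"id": 2, "name": "Bob", "region": "US"},
    {"id": 3, "name": "Chen", "region": "APAC"},
    {"id": 4, "name": "Dora", "region": "EU"},
]
shards = build_shards(customers)
```

A lookup such as `find_customer(shards, "US", 2)` touches only the rows stored for the US region, which is the point of splitting by rows rather than by columns.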
Sharding in terms of Blockchain
Sharding, in a database, has been used for a long time. It is quite simple as the developers just have to create a separate database structure that can operate for their given use case securely. In a blockchain, however, sharding becomes a little complicated.
In a blockchain network, the data is stored in blocks, and each block contains its own hash and the hash of the previous block. So, each block is tied to the block before it. A blockchain can also be thought of as a database whose nodes play the role of the data servers. The essence of blockchain lies in the fact that it is decentralized. Taking any measure that compromises the decentralization will result in weaker security. Then why perform sharding of blockchains at all?
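To make the linking of blocks concrete, here is a minimal sketch of a hash chain. It assumes SHA-256 over a JSON encoding of the block fields; the field names and the helpers `block_hash`, `make_block`, and `is_valid_chain` are illustrative choices and do not describe the block format of any real blockchain.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def make_block(index, data, prev_hash):
    """Create a block that stores its own hash and the previous block's hash."""
    return {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }


def is_valid_chain(chain):
    """Check that every stored hash is correct and every link matches."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False  # block contents were changed after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


# Illustrative two-block chain; the all-zero previous hash marks the first block.
genesis = make_block(0, "genesis", "0" * 64)
second = make_block(1, "alice pays bob 5", genesis["hash"])
chain = [genesis, second]
```

Changing the data in one block changes its hash, so every later block's stored `prev_hash` stops matching. This dependency between blocks is what makes cutting a chain into independent pieces harder than splitting database rows.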
It’s because blockchain networks face a serious and recurring issue, called bloat. Bloat is the challenge of storing massive amounts of data or blocks permanently on the chain. In other words, a blockchain has to be scalable to accommodate all this data, which is increasing by the second. Now, this causes a problem related to scalability as well as response time.
The idea of sharding in blockchains is to overcome these issues. This can potentially be achieved by separating the network into small, manageable segments. Sharding applied to the blockchain will cause the network to separate into individual shards, each containing a unique set of account balances and smart contracts. Nodes assigned to an individual shard will then verify its operations and transactions. These nodes are now only responsible for their own shard's transactions. This is better than each one verifying all transactions on the entire network.
This will not only divide a larger blockchain into smaller segments but also increase the speed because one node won’t have to verify all transactions in the network; there will be designated nodes for each verification.
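One simple way to picture this division is to map each account to a shard and group transactions by the sender's shard. The sketch below assumes a fixed number of shards and a SHA-256 based mapping; `NUM_SHARDS`, `shard_for`, and `group_by_shard` are hypothetical names, and real protocols use their own assignment rules.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(account, num_shards=NUM_SHARDS):
    """Map an account name to a shard deterministically."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(transactions, num_shards=NUM_SHARDS):
    """Split transactions so each shard's nodes see only their own batch."""
    groups = {shard: [] for shard in range(num_shards)}
    for tx in transactions:
        groups[shard_for(tx["sender"], num_shards)].append(tx)
    return groups


# Illustrative transactions only.
transactions = [
    {"sender": "alice", "receiver": "bob", "amount": 5},
    {"sender": "carol", "receiver": "dave", "amount": 2},
    {"sender": "alice", "receiver": "erin", "amount": 1},
]
batches = group_by_shard(transactions)
```

Because the mapping is deterministic, all of an account's transactions land in the same shard, so the nodes of that shard can check them without seeing the rest of the network's traffic.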
For example, the Bitcoin blockchain could initially only perform 3-7 transactions per second (TPS), and the Ethereum blockchain could handle 12-30 TPS. If we compare these with VISA’s speed, which is 24,000 TPS, we realize that there is a massive difference. Ethereum is said to have over 8000 computers in the network, each of which is lending a certain hash power to the network.
We can infer that increasing the number of computers won’t necessarily increase the processing speed. The whole ledger is kept on each of the computers, and each of them verifies every transaction, which makes the process of verification a lot slower. Instead of linear execution, parallel execution can be far more beneficial in this respect. Multiple computers will perform only their designated computations, in parallel. This will allow multiple transactions to be processed at the same time.
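The difference between linear and parallel execution can be illustrated with a small sketch in which each shard's batch is checked by its own worker. The validity rule (a positive amount) and the names `verify_batch` and `verify_in_parallel` are assumptions for the example. Python threads are used only to show the structure; they do not model real network nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_batch(batch):
    """Hypothetical check: a transaction is valid if its amount is positive."""
    return [tx["amount"] > 0 for tx in batch]


def verify_in_parallel(batches):
    """Verify every shard's batch at the same time, one worker per shard."""
    workers = max(1, len(batches))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input batches
        results = list(pool.map(verify_batch, batches.values()))
    return dict(zip(batches.keys(), results))


# Illustrative batches, keyed by shard number.
shard_batches = {
    0: [{"amount": 5}, {"amount": -1}],
    1: [{"amount": 2}],
}
verdicts = verify_in_parallel(shard_batches)
```

Each worker only ever looks at its own batch, which mirrors the idea that a node in one shard does not need to verify the transactions of another.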
There are quite a few challenges that are faced because this technology is still somewhat in the developmental stage. Changes are being made by developers to get the best of this technology. Some of the major challenges are mentioned below.
First and foremost, sharding a blockchain is very risky. All blocks in a blockchain are linked to the block preceding them, and they also include that block's hash. So how a blockchain should be sharded depends on its underlying consensus mechanism. A Proof-of-Work blockchain, for example, is very hard to shard. The transactions in a shard have to be validated, but the nodes of that shard don't hold the entire transaction history. Hence, new transactions would have to be validated without full knowledge of that history.
Secondly, shards are often exposed to security threats. If a hacker can take over the majority of a shard, then the blocks in it can easily be manipulated in a lot of ways.
Lack of communication across different shards is another major problem faced. There has to be a mechanism that can help establish communication, but this adds a separate layer of complication for the developers.
Troubles with Sharding
Communication and security are the two areas where sharding is at a disadvantage. When shards of a blockchain are created, each shard behaves as an individual blockchain network. By default, these individual networks have no way of communicating with each other. Moreover, the users and applications of one network cannot communicate with the users and applications of another network. This communication can only be achieved using a special inter-shard communication mechanism, which adds an extra layer of complicated code for the developers to create.
In a sharded blockchain, security is also a major concern. Since a huge blockchain is broken down into smaller subdomains, the hash power securing each shard also decreases. Therefore, it becomes easier for hackers to take over a single segment and manipulate it as per their desire. This is known as a single-shard takeover attack or a 1% attack, i.e., in a 100-shard network, it takes only 1% of the network hash rate to dominate an entire shard. Once this attack takes place, the manipulation by the hackers can lead to the submission of invalid transactions to the main network or to the permanent loss of these transactions.
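The arithmetic behind the 1% figure can be written down directly. The sketch below assumes that the network hash rate is split evenly across shards, which is a simplification; `shard_hash_share` and `majority_takeover_share` are illustrative names.

```python
def shard_hash_share(num_shards):
    """Fraction of the total network hash rate securing one shard (even split assumed)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return 1 / num_shards


def majority_takeover_share(num_shards):
    """Fraction of the total hash rate equal to half of one shard's hash rate.

    An attacker needs just over this amount to hold a majority in that shard.
    """
    return 0.5 * shard_hash_share(num_shards)


for n in (1, 10, 100):
    print(
        f"{n} shards: one shard holds {shard_hash_share(n):.2%} of the hash rate, "
        f"majority needs just over {majority_takeover_share(n):.2%}"
    )
```

Under this assumption, matching the whole hash rate of one shard in a 100-shard network takes 1% of the total, and holding a bare majority inside that shard takes only about half of that.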
Ethereum proposes a potential solution for this problem by using random sampling. In this method, shard notaries are randomly appointed to individual shards to verify the authenticity of their blocks, so an attacker cannot choose in advance which shard its nodes will end up in.
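The idea of random sampling can be sketched as a shuffle followed by an even split. This is not Ethereum's actual procedure: the function `assign_validators` is hypothetical, and the `seed` argument stands in for whatever source of randomness a real protocol would use, which must be unpredictable to attackers.

```python
import random


def assign_validators(validators, num_shards, seed):
    """Shuffle validators with a seeded generator and deal them out to shards."""
    rng = random.Random(seed)  # placeholder for a protocol-level randomness source
    shuffled = list(validators)
    rng.shuffle(shuffled)
    # Shard s receives every num_shards-th validator, starting at position s.
    return {s: shuffled[s::num_shards] for s in range(num_shards)}


# Illustrative validator names and seed.
validators = [f"validator-{i}" for i in range(10)]
assignment = assign_validators(validators, num_shards=3, seed=2024)
```

Reassigning with a fresh seed at regular intervals means that controlling a few validators does not let an attacker concentrate them in a shard of their choosing.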
Alternatives to Sharding
Developers recommend two alternatives to improve performance and transaction speed. The first one is to increase the block size to fit more data into one block. This allows a greater number of transactions to be performed in the same time, i.e., a higher TPS. Although it solves the performance and speed issues, it brings a new one forward. To verify a bigger block, a device with more computation power will be required. Only such devices will qualify as nodes.
This will increase the cost of running a node and will also result in smaller node pools. These node pools will be more centralized and therefore more vulnerable to a 51% attack, in which a hacker controls the majority of the network hash rate and uses it to manipulate the transaction history. Increasing the block size also risks splitting the community, because it requires a hard fork: if not everyone upgrades to the new blockchain, two blockchains will exist side by side, each with its own coin. The new and old versions will both contain a transaction history, much like parallel worlds with no connection to each other.
Even though this alternative does solve the previous issues, it poses some new ones, so increasing the block size won’t qualify as a long-term solution. The second alternative suggested by the developers is an altcoin.
An alternative coin or altcoin is a substitute for bitcoin. It makes it possible to run different applications on their own chains and with their own coins. The advantage is that no single blockchain will be overloaded, which results in better performance. However, the hash power will be distributed among different blockchains, which makes each of them more vulnerable to security threats. The hash power needed for a 51% attack on any one chain will be much lower, and thus a hacker can take it over more easily. Although this solves the performance issue, the security of the network is at risk. Hence, even this alternative isn’t a suitable option.
Sharding vs Sidechains
Sharding is implemented in the base-level protocol of a blockchain and is therefore called a layer-1 solution. It divides the main blockchain into smaller, individual blockchains called shards. On the other hand, a sidechain is a separate blockchain, but it is connected to the main blockchain using a two-way peg.
Sidechains have not been around for very long; the idea was introduced in 2014 in a research paper by Dr. Adam Back. According to his paper, sidechains allow digital assets from one blockchain to be used in a different blockchain. Moreover, the assets can also be moved back into the original blockchain. Sidechains are also referred to as child chains by some blockchain projects like Ardor, MOAC, etc.
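A two-way peg can be pictured as a bookkeeping rule: coins moved to the sidechain are locked on the main chain, and they are released again when the coins come back. The toy model below is a heavy simplification that keeps both ledgers in one object and ignores proofs, confirmations, and fees. The class `TwoWayPeg` and its method names are assumptions made for the illustration.

```python
class TwoWayPeg:
    """Toy model: coins locked on the main chain back coins issued on the sidechain."""

    def __init__(self, main_balances):
        self.main = dict(main_balances)  # balances on the main chain
        self.side = {}                   # balances on the sidechain
        self.locked = 0                  # main-chain coins held by the peg

    def to_sidechain(self, user, amount):
        """Lock coins on the main chain and credit the same amount on the sidechain."""
        if amount <= 0 or self.main.get(user, 0) < amount:
            raise ValueError("insufficient main-chain balance")
        self.main[user] -= amount
        self.locked += amount
        self.side[user] = self.side.get(user, 0) + amount

    def to_main_chain(self, user, amount):
        """Remove coins from the sidechain and release them on the main chain."""
        if amount <= 0 or self.side.get(user, 0) < amount:
            raise ValueError("insufficient sidechain balance")
        self.side[user] -= amount
        self.locked -= amount
        self.main[user] = self.main.get(user, 0) + amount
```

For example, after `peg = TwoWayPeg({"alice": 10})` and `peg.to_sidechain("alice", 4)`, Alice holds 6 coins on the main chain and 4 on the sidechain, and the amount locked always equals the total issued on the sidechain.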
Sidechains can sound like a far better technology than sharding. Aelf is a blockchain project which is making use of the sidechain technology. It is built as a multi-chain blockchain with parallel computing. Some of its unique features are:
- It solves scalability issues. The parallel processing reduces the latency rate.
- The resource segregation to be implemented will allow each smart contract to run on its own blockchain.
Advantages of Sharding
The faster transaction rate leads to an overall better experience for the users. Since there has been a rise in the number of applications that require constant communication with the database, it has become essential to maintain both security and speed. Sharding is a great technology for keeping up with both of these aspects.
Scalability has also become the need of the hour as the amount of data is increasing every second. Some apps might be doing fine right now, but over time, they are bound to have a lot of data to handle. Thus, sharding is a viable solution here as it creates small segments of data which can be easily accessed.
It also eliminates the need for machines with high computation power, because the aggregated data is no longer stored in one place. Since each machine only handles a shard of the data, a simple device can also do the work.
It is also easier to carry out maintenance on smaller shards than on a huge database or blockchain.
Who uses sharding?
Some blockchains have started using sharding, but some are still developing it. There is more than one approach to introducing a sharding mechanism into a blockchain. A lot of factors determine which one will work well with a certain blockchain. The major reason for using sharding is scaling. Scaling is essential to keep up with the increasing number of nodes and make room for them. Sharding makes massive blockchains sustainable by segmenting them into shards.
Zilliqa, a secure and scalable public blockchain platform, was the first one to incorporate sharding. It was launched in 2017, achieved a throughput of 2,828 TPS on its testnet, and can easily achieve 2,500 transactions at any point in time. Its striking feature is that it scales with the size of the network, as a new shard is created for every 600 nodes.
NEAR is another platform where sharding is used. It is a developer-friendly blockchain to test and deploy decentralized applications easily. It states that its sharding technology allows the nodes to stay small enough to run on simple cloud-hosted instances. These nodes can potentially be mobile devices as well in the future, which will make the whole process a lot easier.
Ethereum also plans on introducing sharding in its new update, Ethereum 2.0, which is to be launched in January 2020. Some other blockchain projects like Cardano, QuarkChain, and PChain are also moving towards sharding to solve the scalability issue.
The Future of Sharding
Distributed ledger technologies (DLTs) have been taken up by almost every industry. This also means that a lot of data is being stored, and it keeps growing. Some of these DLTs are doing fine now, but that might not be the case after a couple of years or even months. The information overload will be responsible for poor performance in the near future. If scaling isn’t done, these technologies will become so slow that it will take days or even weeks to load a blockchain.
Enter sharding, which is, as of now, the only viable option for all these problems. It balances a network’s load across all nodes and solves the blockchain trilemma.
The blockchain trilemma is that a blockchain can only possess two of the following three properties at a given time: decentralization, security, and scalability.
Sharding gained popularity recently after Facebook released more information on its Libra coin. The coin is meant for Facebook’s financial services, which are expected to launch in 2020. Facebook plans on creating a more stable cryptocurrency so that consumers are encouraged to use it for ordinary online transactions.
However, despite the benefits of sharding, its implementation in blockchains has been somewhat limited.