As the internet obtains petabytes and petabytes, software systems have to scale in order to handle the increased stress of data. People are also exptecting these systems to deliver output in a quick amount of time while also being accurate. On the side, many companies are expecting incoming software engineers to have the skills to be able to construct effective systems for their use case. We’ll look briefly on scaling with distributed Systems.w

How do I scale a software system?

There are two ways to have a system handle more data. One is to upgrade the computing power of the singular computer (monolith) running the software. This is called Vertical Scaling. The issue is that it becomes very expensive to keep upgrading computers more and more. It’s also been shown that the throughput returns of upgrading the compute power decrease and eventually become nil.

A cheaper version that most companies use is horizontal scaling. Companies can spread the load of input between many different computers and communicate the results with each other. This field of making computers that spread the load of computation, storage, or other functions is called Distributed Systems. Computers do not need to be the most premium, so they can be cheaper, and they have equal or better results than simply upgrading the power of the monolith.

CAP Theorem and proofs

The complexity of Distributed Systemse comes in the communication. Specifically, how should computers act if two computers have a network failure, or a computer shuts downs. Others in the network have no way of knowing if something bad has happened or if messages are traveling much slower than usual in the network. Theoretical Computer Scientists have figured out an interesting property if network failures occur in something known as the CAP Theorem. It says that given a network failure, a distributed database can provide two of the following: consistency, availability, or partition tolerance.

consistency - Computers having information that is comopletely correct with each other.

availability - Computers having information always available (but it may not be consistent with other computers)

partition tolerance The ability for the system to operate through a failure.

You can only have at most 2 when servers start failing. Partition Tolerance must always be used in the real world, otherwise the system stops working, so software engineers must take the choice to have consistency or availablity. Consistency ensures that information is never disagreeing with other computers, but that may mean that information that we search for is not available to us. Availability allows information to always be available but it may not be correct or the most up to date. A person may want availability if information just has to be present, but not necessarily the most current. This could be the youtube feed. A database holding people’s financials will want to be consistent because if two computers are saying different things about someone, that could lead to fraud charges even though the person did nothing wrong. Of course, these failures will not happen most of the time, but people have to think about them for when they do happen.

Thoughts

Distributed Systems are interesting as they are increasingly becoming used. The cloud is a specified field of Distributed Systems. Blockchains are completely distributed. It will be valuable for engineers to learn how these systems work and their internals. I hope to learn more about them, and actually build them in the future. I unfortunately cannot say too much about them since I am just a novice, but it would be beneficial to study for me and for your future.