Let’s try to understand why we should care about this in the first place. That requires a little bit of history.
We all know Moore’s Law. It says the number of transistors on an integrated circuit doubles approximately every two years. The prediction has held up pretty well since the late 1970s. A couple of things started changing around the middle of the last decade (around 2005):
- Moore’s Law stopped translating into faster single cores, since hardware makers hit a limit on how far they could push the clock speed, and
- there has been exponential growth of data from around that time.
The first point will help us understand our current topic, and the second leads to another interesting discussion: the rise of the NoSQL landscape and the problems with the traditional RDBMS. We will discuss the second point later.
…Unable to increase the clock speed, companies like Intel started putting multiple cores in the same machine, and we got dual-core, quad-core … machines. The existing languages, like C, C++ and Java, are designed to use threads to handle concurrency. But parallelism is different from concurrency. Simply put, concurrency is how we handle multiple request-response cycles at once, and parallelism is splitting a large CPU-intensive job across multiple processors.
That is a different problem, and even with the threaded model it’s difficult to write thread-safe code that keeps working over time. Livelocks and deadlocks become part of daily life when maintaining a large application written with threads. This is one of the reasons I like Node.js so much: concurrency is handled by the event loop, and you don’t have to worry about locking and synchronization.
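To make the thread-safety problem concrete, here is a minimal sketch using only the JVM standard library. The names CounterDemo, runInThreads, unsafeCount and safeCount are made up for this illustration. Several threads increment one shared counter: the unprotected version can lose updates, while the AtomicInteger version cannot.

```scala
import java.util.concurrent.atomic.AtomicInteger

object CounterDemo {
  // Hypothetical helper: starts `threads` threads, each calling `body`
  // `perThread` times, and waits for all of them to finish.
  def runInThreads(threads: Int, perThread: Int)(body: () => Unit): Unit = {
    val workers = (1 to threads).map { _ =>
      new Thread(() => (1 to perThread).foreach(_ => body()))
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
  }

  // Shared mutable state with no protection: `counter += 1` is a
  // read-modify-write, so concurrent increments can overwrite each other.
  def unsafeCount(threads: Int, perThread: Int): Int = {
    var counter = 0
    runInThreads(threads, perThread)(() => counter += 1)
    counter
  }

  // Same work, but every increment is atomic, so none are lost.
  def safeCount(threads: Int, perThread: Int): Int = {
    val counter = new AtomicInteger(0)
    runInThreads(threads, perThread)(() => { counter.incrementAndGet(); () })
    counter.get()
  }
}
```

The result of unsafeCount varies from run to run and is often lower than threads * perThread, which is exactly the kind of bug that is hard to reproduce in a large threaded application.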
Mutability becomes a nightmare when you have to share mutable state. If something does not change and you share it, you do not have to protect it, which means you don’t have to worry about safety or synchronization when you share immutable data. This is one of the great aspects of functional programming: immutability. It is what lets your code run on multiple processors. It’s not free, but it makes running your code on multiple cores almost trivial. In Scala this is generally achieved with immutable collections of objects.
Look at this code in Scala:
val list = (1 to 100000).toList
list.map(_ + 42)
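To make the operation run in parallel, you simply invoke the par method on the sequential collection, list. Awesome!! (From Scala 2.13 onward, par lives in the separate scala-parallel-collections module and needs import scala.collection.parallel.CollectionConverters._.)
list.par.map(_ + 42)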
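To see why immutability makes this safe, here is a rough sketch of splitting work over an immutable list by hand with standard-library Futures. It is not how par is implemented; the names ImmutableSharing and sumInChunks are invented for this example. The point is that every task reads the same list and nobody needs a lock, because nothing can change it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ImmutableSharing {
  // An immutable list, safe to hand to any number of threads.
  val numbers: List[Int] = (1 to 100000).toList

  // Hypothetical helper: sums `data` by cutting it into roughly `chunks`
  // pieces, summing each piece in its own Future, then adding the results.
  def sumInChunks(data: List[Int], chunks: Int): Long = {
    val size = math.max(1, math.ceil(data.size.toDouble / chunks).toInt)
    val partials: List[Future[Long]] =
      data.grouped(size).toList.map(chunk => Future(chunk.map(_.toLong).sum))
    // Block until every partial sum is ready, then combine them.
    Await.result(Future.sequence(partials), 30.seconds).sum
  }
}
```

Calling ImmutableSharing.sumInChunks(ImmutableSharing.numbers, 4) gives the same total as a plain sequential sum, with no synchronization code anywhere.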
Another important aspect of FP is that functions are first class; they are not second-class citizens as in C++ or Java. You can treat them like any other variable. Functions are pure: given the same input they always return the same result, and they are free of side effects. Functions are also higher-order: you can pass a function to a function and you can return a function. Closures follow naturally from this. You take an object and transform it into something else; you don’t change it. Monads!!
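A small sketch of these ideas in Scala, using only the standard library. The names FunctionsDemo, addFortyTwo, twice and multiplier are invented for this illustration.

```scala
object FunctionsDemo {
  // First class: a function stored in a value, like any other variable.
  val addFortyTwo: Int => Int = _ + 42

  // Higher-order: takes a function and returns a new function.
  def twice(f: Int => Int): Int => Int = x => f(f(x))

  // Closure: the returned function captures `factor` from its surroundings.
  def multiplier(factor: Int): Int => Int = x => x * factor

  // Transform, don't change: map builds a new list and leaves
  // `original` exactly as it was.
  val original: List[Int] = List(1, 2, 3)
  val transformed: List[Int] = original.map(addFortyTwo)
}
```

All of these functions are pure: calling FunctionsDemo.twice(FunctionsDemo.addFortyTwo)(0) any number of times always produces the same result and touches nothing outside itself.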
Scala harnesses all the power of functional programming and combines it with Object Oriented Programming. It’s a JVM language and fully interoperable with Java libraries.
Spark is written in Scala, and what Scala does for your code on a multi-core machine, Spark does across machines in a cluster. Parallelism!!