Before I run out the door for a badly needed haircut.
Fresh out of college and just into my first proper job, back in 1997 I sent the powers that be at EMC an email suggesting the company take a serious look at buying NUMA pioneer Sequent Computer Systems. I saw Sequent, which had attached itself to Oracle’s hip, as a great way for EMC to do real damage to IBM & Sun in what was an incredibly high margin business for both of them.
The business of running Oracle databases very very fast.
I did not receive a reply. A year or two later EMC bought Data General and dumped the AViiON server business, while IBM bought up Sequent only to smother it within months of the acquisition.
These days when I send such emails I tend to get an answer filled with reasons for not doing what I suggest. This is what’s called career progress.
I wouldn’t consider the above to be a bias, but you should see where I’m coming from in what follows.
--Disclosure over, to the topic at hand.--
Neither scale up nor scale out is ‘better’, and I don’t accept any gyrations on the topic: the decision between one or the other should be driven by fault domains, availability requirements, infrastructure overhead and concurrency programming requirements.
Let’s (quickly) look at both approaches. I’m going to use cluster computing to represent scale out and SMP/NUMA (Symmetric Multiprocessing/Non-Uniform Memory Access) to represent scale up.
In a scale out cluster design a group of cluster nodes, each containing memory, CPUs, CPU cache and I/O devices, is linked together via a low latency network and operates as a single entity. All the work to make that happen is done in software. This cluster design greatly enhances availability, manages fault domains and can allow for hot addition of capacity to the overall system. But it does so by adding extra layers of software complexity: individual nodes are islands of compute and I/O, separate from one another, which need to be monitored and managed. Anytime they need to communicate they have to traverse the network, a network which, while low latency, is much, much slower than CPU cores sitting on the same chip in the same socket.
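To make that cross-node cost concrete, here is a minimal Python sketch, entirely my own illustration rather than any real cluster stack: every hand-off between nodes pays a serialize, transmit, deserialize tax before any actual computation happens. A `Pipe` between processes stands in for the network link between two nodes.

```python
import pickle
from multiprocessing import Process, Pipe


def node_worker(conn):
    """Stands in for a remote cluster node: receive serialized work, compute, reply."""
    payload = conn.recv_bytes()              # bytes arrive off the 'wire'
    work_item = pickle.loads(payload)        # must deserialize before any work starts
    result = sum(work_item)                  # the actual computation
    conn.send_bytes(pickle.dumps(result))    # serialize the answer to send it back
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    node = Process(target=node_worker, args=(child_conn,))
    node.start()
    # Every hand-off is serialize -> transmit -> deserialize; on a real
    # cluster it also carries network latency on top.
    parent_conn.send_bytes(pickle.dumps(list(range(100))))
    answer = pickle.loads(parent_conn.recv_bytes())
    print(answer)  # 4950
    node.join()
```

Multiply that round trip by every piece of coordination the cluster software has to do, and you can see where the extra complexity and latency come from.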
In a scale up design you don’t get the hot add scaling, and fault domains can hit you incredibly hard unless you go the physical partitioning route. But all memory is global memory and all I/O is global I/O; when you add CPUs they don’t need to copy data across a network to one another to get work done, they can simply mark it in global memory as something that needs to be worked on. All of this makes the programming model much less complex, resulting in higher performance per system: you can pack them with CPU cores, and when you’re writing software to use them you’re not spending time on synchronising cluster node operating system clocks against OS judder, or the myriad other things required to keep nodes synchronised and operational.
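The shared-memory model can be sketched just as simply; again this is my own illustration, not from any particular system. Handing work to another CPU is just publishing a reference into memory every core can already see, with a lock for coordination instead of messages over a wire.

```python
import threading
import queue

# In a shared-memory (SMP/NUMA) system, 'handing work to another CPU' is
# just putting a reference on a queue in the one global address space.
work_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def cpu_worker():
    """Stands in for one CPU core pulling work out of global memory."""
    while True:
        item = work_queue.get()        # same object, same address space: no copy
        if item is None:               # sentinel: no more work
            break
        partial = sum(item)
        with results_lock:             # coordination via a lock, not messages
            results.append(partial)


threads = [threading.Thread(target=cpu_worker) for _ in range(4)]
for t in threads:
    t.start()

data = list(range(1000))
for start in range(0, 1000, 250):
    work_queue.put(data[start:start + 250])   # enqueue work: no serialization
for _ in threads:
    work_queue.put(None)                      # one sentinel per worker
for t in threads:
    t.join()

print(sum(results))  # 499500
```

Note what is absent: no pickling, no transport, no cluster membership to manage. The complexity that remains is classic concurrency control (the lock and the sentinels), which is exactly the trade the post describes.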
So instead of the most infantile of framings, ‘the future versus the past’, you should approach scale up and scale out as engineering decisions.
There are substantial operational benefits to both. Different benefits for each of them. But both models are too different to be compared in a vacuum.
And let’s not forget that cluster computing came about because the price of SMP/NUMA systems was considered prohibitively expensive, which it was, because the vendors of such systems artificially inflated it to keep the business massively lucrative for themselves.
If anyone says one is better than the other they could very well be right, but ask them the problem they’re looking to solve before you take it on faith.