Part 1 - Hardware
If you work with cluster, grid or cloud technologies such as Mesos, Spark, Hadoop, Solr Cloud or Kubernetes as a developer, architect or technical expert, you need your own private datacenter for testing and development. To test real-world scenarios like a fail-safe, resilient ZooKeeper cluster or a clustered Spark/Hadoop installation, you should have at least three independent machines. For an installation of Mesos/DCOS, a minimal setup of five machines is recommended.
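The three-machine minimum follows from majority quorums: a ZooKeeper ensemble stays available only while more than half of its servers can talk to each other. Here is a minimal sketch of that arithmetic; the function names are mine and only serve as illustration.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Servers that may fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


for n in (1, 2, 3, 4, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Two machines tolerate no failure at all, three tolerate one, and a fourth machine adds no extra safety until there is a fifth.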
There are several ways to build such an environment, each with its own drawbacks:
1) A virtualized environment running on a workstation laptop or PC
You can easily create a bunch of virtual machines and run them on a desktop or workstation. This approach works fine and is fast and cheap, but it has some problems:
- Your laptop may have only 16 GB of RAM, so each VM gets only 2-3 GB. For frameworks like Apache Spark, which rely heavily on caching, this does not work well.
- The performance of a virtualized environment is not predictable, because resources such as disk, network and memory access are shared between all VMs. So even if you have a workstation with an octa-core Intel Xeon processor, I/O will behave differently than on separate physical machines.
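The memory squeeze is easy to put in numbers. The sketch below takes the 16 GB laptop from the first point and reserves 4 GB for the host operating system and hypervisor; that reserve is my assumption, not a measured value.

```python
def ram_per_vm_gb(host_ram_gb: float, host_reserve_gb: float, vm_count: int) -> float:
    """Evenly split the RAM left after the host reserve across all VMs."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    available = host_ram_gb - host_reserve_gb
    if available <= 0:
        raise ValueError("host reserve uses up all RAM")
    return available / vm_count


HOST_RAM_GB = 16      # laptop from the example above
HOST_RESERVE_GB = 4   # assumed share for host OS and hypervisor

for vms in (3, 4, 5):
    share = ram_per_vm_gb(HOST_RAM_GB, HOST_RESERVE_GB, vms)
    print(f"{vms} VMs: {share:.1f} GB each")
```

With five VMs, the minimum mentioned for Mesos/DCOS, each one ends up with well under 3 GB, and that is before the cluster software itself claims its share.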
2) A cloud environment like AWS EC2
This is the way most people work with these technologies, but it also has some specific disadvantages. If you run into a performance problem, you will probably not be able to analyze it in detail. Cluster software is normally very sensitive to network latency and network performance. Since AWS can't guarantee that all your machines are in the same rack, the performance between some nodes can differ.
3) A datacenter with real hardware
You can build your own cluster from real server hardware, but that is normally far too expensive. And even if you can afford it, this solution is not portable, and in most enterprises you will not be allowed to run such a cluster. For testing and development it is much better to have a private cluster that is as personal and portable as your own laptop.
So what is a feasible solution?
I decided to build my own four-node cluster from Intel NUC mini PCs. Here are the technical facts for each node:
- NUC 6th Generation - Skylake
- Intel Core i5 - dual core with Hyper-Threading
- 32 GB DDR4 RAM
- 256 GB Samsung M.2 SSD
- Gigabit Ethernet
Each Intel NUC has to be fitted with RAM and an M.2 SSD. These parts have to be ordered separately.
This gives you a cluster with amazing capabilities:
- 16 hardware threads (8 physical cores with Hyper-Threading)
- 128 GB DDR4 RAM
- 1 TB of solid-state storage
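These totals are simply the per-node figures multiplied by four. Here is a small sketch of the arithmetic; `NodeSpec` and `cluster_totals` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    cores: int             # physical cores per node
    threads_per_core: int  # 2 with Hyper-Threading
    ram_gb: int
    ssd_gb: int


def cluster_totals(node: NodeSpec, count: int) -> dict:
    """Multiply the per-node resources by the number of identical nodes."""
    return {
        "cores": node.cores * count,
        "threads": node.cores * node.threads_per_core * count,
        "ram_gb": node.ram_gb * count,
        "ssd_gb": node.ssd_gb * count,
    }


# Per-node figures taken from the parts list above.
nuc = NodeSpec(cores=2, threads_per_core=2, ram_gb=32, ssd_gb=256)
totals = cluster_totals(nuc, count=4)
print(totals)
```

The SSD total comes out at 1024 GB, which is the 1 TB quoted in the list.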
Since I needed a portable solution, everything had to fit into a normal business case. I found a very slim aluminium attaché case on Amazon with the right dimensions to hold the NUC PCs and the network switch.
I decided to include a monitor and a keyboard to get direct access to the first node in the cluster. The monitor is also used for visualization and monitoring while the software runs. I ordered a Gechic HDMI monitor, which has the right dimensions to fit into the front of the case.
The NUC package includes screws for mounting. They also work in a case like this if you drill a small hole for each screw. For the internal wiring you have to use flexible network cables; otherwise the wiring becomes a problem. Mounting the power and network connectors in the case takes a little skill, but with some patience it works.
You can see the final result here:
This case will be my companion for the next year at all conferences and trade fairs, and even in my office. It is the perfect presenter for any cluster or cloud technology.
In the next part I will describe how to install DCOS and a Solr/Spark/Zeppelin cloud, and what you can do on top of such hardware.