Mar 18, 2016

Building a Solr, Spark and Zookeeper Cloud with Intel NUC PCs


Part 1 - Hardware

If you work with cluster, grid or cloud technologies like Mesos, Spark, Hadoop, Solr Cloud or Kubernetes as a developer, architect or technical expert, you need your own private datacenter for testing and development. To test real-world scenarios like a failsafe and resilient Zookeeper cluster or a clustered Spark/Hadoop installation, you should have at least three independent machines. For a Mesos/DCOS installation, a minimal setup of five machines is recommended.
There are several ways to build such an environment, each with its own drawbacks:

1) A virtualized environment running on a workstation laptop or PC

You can easily create a bunch of virtual machines and run them on a desktop or workstation. This approach is fast and cheap, but it has some problems:
  1. Your laptop may have only 16 gigabytes of RAM, so each VM gets only 2-3 gigabytes. For frameworks like Apache Spark, which rely heavily on caching, this does not work well.
  2. The performance of a virtualized environment is not predictable, because resources like disk, network and memory access are shared between all VMs. So even on a workstation with an octa-core Intel Xeon processor, I/O will behave differently than on dedicated hardware.

2) A cloud environment like the AWS EC2

This is how most people work with these technologies, but it also has some specific disadvantages. If you run into a performance problem, you will often not be able to analyze the details. Cluster software is typically very sensitive to network latency and throughput. Since AWS does not guarantee that all your machines are in the same rack, the performance between some nodes can differ.

3) A datacenter with real hardware

You can build your own cluster, but real server hardware is normally far too expensive. Even if you can afford it, such a solution is not portable, and in most enterprises you will not be allowed to run your own cluster anyway. For testing and development it is much better to have a private cluster that is as portable as your laptop.


So what is a feasible solution?

I decided to build my own 4-node cluster out of Intel NUC mini PCs. Here are the technical facts:
  • NUC 6th Generation - Skylake
  • Intel Core i5 - dual core with Hyper-Threading
  • 32 GB DDR4 RAM
  • 256 GB Samsung M.2 SSD
  • Gigabit Ethernet
The Intel NUC has to be equipped with RAM and an M.2 SSD; all these parts have to be ordered separately.

This gives you a cluster with amazing capabilities:
  • 16 Hyper Threading Units (8 Cores)
  • 128 GB DDR4 RAM
  • 1 TB Solid State Disk
Since I needed a portable solution, everything had to be packed into a normal business case. I found a very slim aluminium attaché case at Amazon with the right dimensions to hold the NUC PCs and the network switch.




I decided to include a monitor and a keyboard to get direct access to the first node in the cluster. The monitor is used for visualization and monitoring while the software runs. I ordered a Gechic HDMI monitor whose dimensions allow it to be mounted in the front of the case.





The NUC package includes screws for mounting, which also work in such a case if you drill a small hole for each screw. For the internal wiring you have to use flexible network cables; otherwise the cabling becomes a problem. Mounting the connectors for power and network in the case also takes a little skill, but with some patience it works.

You can see the final result here:



This case will be my companion for the next year at conferences, fairs and even in my office. It is the perfect demonstration platform for any cluster / cloud technology.

In the next part I will describe how to install DCOS and a Solr/Spark/Zeppelin cloud on top of this hardware and what you can do with it.

Have fun. 

Johannes Weigend

Mar 11, 2016

KubeCon 2016: Recap

All things Kubernetes: KubeCon 2016 in London (https://kubecon.io) revealed how attractive Kubernetes is to the community and how fast Kubernetes and its ecosystem are evolving. Evidence: 500 participants, most of them using Kubernetes in dev & production; impressive stats of the open source project and community; and profound talks reflecting real-life experiences. My takeaways:

Kubernetes Roadmap 

By the end of March, Kubernetes version 1.2 will be released with the following highlights:
  • New abstraction "Deployment": A deployment groups pod/rc/service definitions with additional deployment metadata. A deployment describes the desired target state of an application on a k8s cluster. When a deployment is applied k8s drives the current cluster state towards the desired state. This is performed on the server side and not on the client side (unlike in k8s < 1.2).
  • ConfigMaps & Secrets: Kubernetes can now handle configuration files & parameters as well as secrets and certificates cluster-wide. It stores the configs inside of etcd and makes it accessible through the k8s API. The configs are exposed by mounting the files into the pods (as tmpfs) and via env vars. They can also be referenced in the YAML files. They are updated live and atomically.
  • Brand new web UI: The Kubernetes Dashboard.
  • Improved scalability and support for multiple regions.
  • Better support for third-party extensions.
  • DaemonSet to better support the Sidekick pattern.
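To make the new Deployment and ConfigMap abstractions a bit more concrete, here is a minimal sketch of how they could work together. The names, image and config key are purely illustrative, and the apiVersion values reflect the APIs of the k8s 1.2 era:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config            # hypothetical name
data:
  log.level: INFO                # plain key/value configuration data
---
apiVersion: extensions/v1beta1   # Deployment API group as of k8s 1.2
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                    # desired state: three pod replicas
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example/my-app:1.0        # hypothetical container image
        env:
        - name: LOG_LEVEL                # config exposed as an env var
          valueFrom:
            configMapKeyRef:
              name: my-app-config
              key: log.level

After a kubectl create -f (or kubectl apply -f), the cluster converges towards the declared state on the server side; scaling or updating is then a matter of changing the spec and re-applying it.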
In about 16 weeks there'll be Kubernetes 1.3 with:
  • Better support for legacy applications with mechanisms like IP persistence.
  • Cluster federation (project Ubernetes) to join multiple k8s clusters together.
  • Further improved scalability.
  • Cluster autoscaling (automatically acquiring & releasing resources from the cloud provider).
  • In-Cluster IAM (LDAP / AM integration).
  • Scheduled jobs to better support batch processing on k8s.
  • Public cloud dashboard for Kubernetes-as-a-Service scenarios.
  • ... and more to come / to be discussed in the community.

Hot topics

The hot topics in my opinion were:
  • Higher-level abstractions & tools: Despite Kubernetes is a great advance in bridging the gap between devs and ops, there is the need for higher-level abstractions & tools - especially for the devs (cite: "Kubernetes should be an implementation detail for devs"). This is addressed by k8s itself (deployment abstraction) as well as by different approaches like kdeploy (https://github.com/kubernauts/kploy), Puppet Kubernetes (https://forge.puppetlabs.com/garethr/kubernetes), dgr (https://github.com/blablacar/dgr) or DEIS (http://deis.io). From a high-level point of view the community is putting the bricks on Kubernetes towards PaaS.
  • Continuous Delivery: Kubernetes is an enabler of continuous delivery (CD), and developing cloud native applications on k8s demands CD. There were several industry experience reports on using Kubernetes as the execution environment for CD workflows. Kubernetes handles scaling the CI/CD servers as well as the application itself. Best practice here is to separate different applications and stages by using k8s namespaces and to use ChatOps tools like Hubot (https://hubot.github.com) to provide fast feedback to devs & ops.
  • Stateful services: Kubernetes is great at running stateless microservices, but a lot of applications have to deal with (persistent) state. How do you run stateful services and even databases on Kubernetes without losing its benefits or even losing data in case of a rescheduling? K8s to the rescue! The answer is persistent volumes, which provide cluster-wide, non-ephemeral storage. A couple of different cluster storage providers are available for persistent volumes in k8s: classic ones like NFS and iSCSI; cloud native ones like GlusterFS and Ceph; cloud provider specific ones for GCE and AWS; and storage abstraction layers like Flocker. The competition is open! A minimal claim/mount sketch follows after this list.
  • Diagnosability: As applications and infrastructure become more and more fine-grained and distributed with platforms like k8s, the problem of diagnosing failures and finding optimization potential arises. Time for cluster-aware diagnosis tools like sysdig (http://www.sysdig.org), Scope (https://github.com/weaveworks/scope) and Kubernetes Dashboard (https://github.com/kubernetes/dashboard)!
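As a rough illustration of how a stateful pod claims non-ephemeral storage, here is a minimal PersistentVolumeClaim together with a pod spec that mounts it. Names, sizes and the mount path are made up for illustration; which storage backend actually satisfies the claim depends on the persistent volumes offered in the cluster:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: solr-data                # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce                # one node may mount it read/write
  resources:
    requests:
      storage: 10Gi              # requested capacity
---
apiVersion: v1
kind: Pod
metadata:
  name: solr
spec:
  containers:
  - name: solr
    image: solr:5.5              # hypothetical image tag
    volumeMounts:
    - name: data
      mountPath: /opt/solr/server/solr   # hypothetical data directory
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: solr-data       # bind the pod to the claim above

If the pod is rescheduled to another node, the claim and the data behind it survive, which is exactly what makes databases and other stateful services viable on k8s.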
Learn more about Kubernetes and other cloud native technologies on April 21, 2016 at our Cloud Native Night Meetup (RSVP) taking place in Mainz alongside the JAX conference (http://www.meetup.com/cloud-native-night).