What this tutorial is about
We are going to set up an architecture that combines the big-data computing framework Apache Spark with a sharded Apache SolrCloud. After that, we will learn how to start Spark tasks from Java without Spark having any knowledge of the task itself. For that purpose, we will install three virtual machines using VirtualBox.
The following image briefly illustrates our goal.
Technologies used
For this tutorial we will work with the following technologies:
- Apache Spark 1.4.0
- Apache Solr 5.2.1
- Zookeeper 3.4.6
- VirtualBox 4.3
- Ubuntu Server 64bit 15.04 (on each of our VMs)
- Java 8 (openjdk-8-jdk)
- Maven