Jul 24, 2015

Solr with Spark(s) - Or how to submit a Spark-Task which utilizes a Solr-Cloud from Java-Code


What this tutorial is about

We are going to set up an architecture that combines the Big-Data computing framework Apache Spark with a sharded Apache Solr-Cloud. After that, we will learn how to start Spark tasks from Java code without the Spark cluster having any prior knowledge of the task itself.

For that purpose we will install three virtual machines using VirtualBox.
The following image briefly illustrates our goal.



Technologies used

For this tutorial we will work with the following technologies:
  • Apache Spark 1.4.0
  • Apache Solr 5.2.1
  • Zookeeper 3.4.6
  • VirtualBox 4.3
  • Ubuntu Server 15.04, 64-bit (on each of our VMs)
  • Java 8 (openjdk-8-jdk)
  • Maven