May 18, 2017

ApacheCon / Apache BigData - Day 2

Here is my conference coverage for ApacheCon and Apache BigData NA 2017 day 2. See day 1 coverage here.

Apache Ignite
Like last year in Vancouver, Apache Ignite is again a big thing. It's really an amazing piece of technology. Here's the feature puzzle of Apache Ignite:
At the conference, the following Ignite topics were covered for the recently released version 2.0:

SQL Grid
Ignite supports ANSI SQL-99-compliant access to the data within a memory grid. It even supports the tricky things like (distributed) joins and groupings, full-text search within the data model, and geo-spatial queries. The data is always consistent and transactions are ACID, even if Ignite acts as a read-through/write-through cache for a relational database. This is a very interesting use case, as it allows Ignite to act as a caching SQL proxy in front of a relational database. Ignite SQL can be accessed through its own JDBC and ODBC drivers as well as through the Ignite SQL API. The relational data model within Ignite can be described and modified with SQL DDL and DML as well as with code annotations and XML configuration. The relational data model can also be imported from relational databases. Indexes are stored in-memory (off-heap) as B+ trees.
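To make the read-through/write-through idea a bit more concrete, here is a minimal sketch in plain Python. It does not use Ignite or its API at all: the class `ReadWriteThroughCache`, the `person` table and the `db_reads` counter are made up for illustration, and an in-memory SQLite database stands in for the relational database behind the cache.

```python
import sqlite3


class ReadWriteThroughCache:
    """Toy read-through/write-through cache in front of a relational table.

    Hypothetical illustration only; this is not how Ignite is implemented.
    """

    def __init__(self, conn):
        self.conn = conn
        self.entries = {}   # in-memory copy: id -> name
        self.db_reads = 0   # how often the backing database had to be queried

    def get(self, key):
        # Read-through: on a cache miss, load the row from the database
        # and keep it in memory for later reads.
        if key not in self.entries:
            self.db_reads += 1
            row = self.conn.execute(
                "SELECT name FROM person WHERE id = ?", (key,)
            ).fetchone()
            if row is None:
                return None
            self.entries[key] = row[0]
        return self.entries[key]

    def put(self, key, name):
        # Write-through: update the database first, then the in-memory copy,
        # so both sides stay in sync.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO person (id, name) VALUES (?, ?)",
                (key, name),
            )
        self.entries[key] = name


# An in-memory SQLite database plays the role of the relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ada')")
conn.commit()

cache = ReadWriteThroughCache(conn)
first = cache.get(1)     # miss: loaded from the database
second = cache.get(1)    # hit: served from memory
cache.put(2, "Grace")    # written to the database and to the cache
```

The second `get(1)` never touches the database, while `put` lands in both places. That is the behavior that lets a cache sit in front of a database as a proxy.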

Streaming
With data streamers you can import data into an Ignite cluster as a stream with automatic partitioning support. Prebuilt data streamers for Kafka, RocketMQ, sockets, JMS, MQTT and others are available. On the processing side, there are continuous SQL queries over sliding windows.
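The two ideas in this paragraph, partitioning incoming data by key and querying a sliding window over the most recent events, can be sketched in a few lines of plain Python. This is only a conceptual illustration and not Ignite's streaming API: `partition_for`, `SlidingWindow`, the sensor names and the window size are all assumptions made up for the example.

```python
from collections import deque
import zlib


def partition_for(key, partitions):
    # Stable hash, so the same key is always routed to the same partition.
    return zlib.crc32(key.encode("utf-8")) % partitions


class SlidingWindow:
    """Keeps only the most recent `size` values of a stream."""

    def __init__(self, size):
        self.events = deque(maxlen=size)  # oldest value drops out automatically

    def add(self, value):
        self.events.append(value)

    def average(self):
        if not self.events:
            return None
        return sum(self.events) / len(self.events)


PARTITIONS = 4    # hypothetical number of partitions in the cluster
WINDOW_SIZE = 3   # hypothetical window length, counted in events

stream = [
    ("sensor-a", 10.0),
    ("sensor-b", 1.0),
    ("sensor-a", 20.0),
    ("sensor-a", 30.0),
    ("sensor-a", 40.0),
]

placement = {}  # key -> partition the key is routed to
windows = {}    # key -> sliding window over that key's latest values

for key, value in stream:
    placement[key] = partition_for(key, PARTITIONS)
    windows.setdefault(key, SlidingWindow(WINDOW_SIZE)).add(value)

# The window for sensor-a now holds 20.0, 30.0 and 40.0; 10.0 has slid out.
print(windows["sensor-a"].average())  # prints 30.0
```

A continuous query would re-evaluate something like `average()` whenever new events arrive, instead of computing it once at the end as this sketch does.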

Web Console
There is a web console available for Apache Ignite for query execution, result visualization and monitoring. It also provides a wizard for importing schemas from relational databases.

File System
Ignite provides an in-memory file system which implements the Hadoop FileSystem API, so it can be used as an HDFS or Alluxio replacement for {Hadoop, Spark, Flink}. In this scenario it can also act as a caching layer between {Hadoop, Spark, Flink} and real (and persistent) HDFS.

Ignite 2.1
Ignite 2.1 will be released within the next few months. The big new thing will be Ignite's own high-performance persistent storage implementation, which makes durable scenarios possible without relying on external persistent storage solutions.

Btw: Ignite claims to be way faster than Hazelcast, and an Ignite book has just been completed.

Presto
When it comes to interactive analysis of big data, Facebook's Presto seems to be the jack of all trades. It supports full ANSI SQL (including joins), has its own JDBC driver and Tableau web connector, and can connect to various data sources like files within HDFS in formats like Parquet and ORC as well as other persistent storages like Cassandra, Hive, PostgreSQL, and Redis. Presto can be enhanced by UDFs and provides enterprise-grade features like Kerberos and LDAP authentication and secured cluster-internal communication. Presto is maintained by a solid community and has a broad user base. There's also a nice web interface for Presto available from Airbnb. Besides Facebook, Teradata also contributes to Presto with about 20 developers and provides its own Presto distribution with enterprise support.

IoT
Apache is very busy providing an open source IoT stack on top of Mynewt, a real-time operating system (RTOS) for low-level devices (Cortex M0-M4, MIPS, RISC-V) with included device management features like build and package management, remote firmware upgrade, secure bootloader and signed images.


The incubating Edgent project provides analytics capabilities at the edge, from the cloud to the IoT fog.
