Givre Charles, Rogers Paul. Learning Apache Drill: Query and Analyze Distributed Data Sources with SQL

O’Reilly Media, 2018. — 347 p. — ISBN: 1492032794.

Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3, and works with Hive metastores, distributed databases such as HBase and MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster.

At its core, Apache Drill is a SQL engine for big data. In practical terms, Drill acts as an intermediary that lets you query self-describing data using standard ANSI SQL. To borrow a comparison from the science fiction series Star Trek, Drill acts as a universal translator for your data: you can use SQL to interact with the data as if it were a table in a database, whether it is or not. Bringing this down to earth, Drill enables an analyst armed only with a knowledge of SQL, or a business intelligence (BI) tool such as Tableau, to analyze and query data without having to transform it or move it to a centralized data store.
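
To make the "file as a table" idea concrete, here is a minimal sketch (not taken from the book) of how a query against a raw JSON file might be submitted to Drill's REST endpoint from Python. It assumes a Drill instance on localhost with the default web port 8047 and the `dfs` storage plugin enabled; the file path `/tmp/logs/events.json` and the helper name `build_drill_request` are hypothetical. The code only builds the request and does not send it.

```python
import json
from urllib.request import Request

# Assumption: a local Drill instance listening on its default web/REST port.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request carrying one SQL query."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical path: the JSON file is addressed directly in the FROM clause,
# with no schema definition or load step beforehand.
sql = "SELECT * FROM dfs.`/tmp/logs/events.json` LIMIT 10"
request = build_drill_request(sql)
print(request.get_method(), request.full_url)
```

With a running Drill instance, passing `request` to `urllib.request.urlopen` should return the result set as JSON; the same query text can also be typed into Drill's own shell or web console.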