Dremio Corp. is adding a data catalog to its self-service data analytics platform in a major release announced today. The company is also incorporating new controls for multi-tenant deployments, boosting end-to-end data encryption, offering options to run inside software containers and adopting Gandiva, an open source library that speeds up expression evaluation on Apache Arrow, the columnar in-memory data format upon which the company’s namesake product is based.
Apache Arrow uses columnar in-memory analytics to boost query speeds up to 100-fold over conventional analytics engines. The technology is similar to that which Google LLC uses to deliver sub-second response times to search queries, but Dremio is optimized for analytical operations.
The data catalog in Dremio 3.0 isn’t a bid by the company to compete with the many existing enterprise data catalogs but rather is focused on capturing and organizing data for use in Dremio. Data catalogs are used to create an inventory and descriptions of data assets within an organization. Dremio has added a crowdsourcing element in the form of a shared wiki page accompanying each data set that can be used for meta-tagging and description.
Security gets a boost in this version with the addition of end-to-end Transport Layer Security, a successor protocol to Secure Sockets Layer. While Dremio had encryption features in earlier releases, they did not span the full data access spectrum. The platform now also supports Amazon Web Services Inc. EC2 instance profiles for secure access to AWS S3 storage. Native integration with Apache Ranger is also new in this release.
The new multi-tenant features enable data engineering teams to manage and optimize cluster resources across a variety of workloads and users, the company said. Workload management policies written in SQL can be applied for such tasks as resource allocation, query admission and timeouts.
“Most data analytics platforms treat all users the same, which means you have to provision different clusters for different users,” said Chief Marketing Officer Kelly Stirman. Dremio has added features that deliver “fine-grained controls over which users or resources get priority,” he said. For example, administrators can specify that an intern should never have priority access to the cluster outside of work hours.
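The intern example can be sketched as a simple priority rule. The following is an illustration only, written in Python rather than in Dremio's SQL-based policy language, whose syntax is not shown here. The `Query` class, the role labels, the priority names and the 9-to-5 working hours are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import time

# Assumed working hours; not taken from Dremio.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


@dataclass
class Query:
    user_role: str      # hypothetical role label, e.g. "intern" or "analyst"
    submitted_at: time  # local time of day the query was submitted


def assign_priority(query: Query) -> str:
    """Return "low" for interns outside working hours, otherwise "normal"."""
    in_work_hours = WORK_START <= query.submitted_at < WORK_END
    if query.user_role == "intern" and not in_work_hours:
        return "low"
    return "normal"
```

Under these assumptions, an intern's query submitted at 10 p.m. would be queued at low priority, while the same query submitted mid-morning, or any analyst's query, would run at normal priority.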
Also new in this release is compatibility with the Kubernetes orchestration framework via Docker images and templates. Kubernetes can be used to deploy and manage large collections of software containers, which are lightweight, isolated packages that bundle an application with the services it needs to run. Dremio has added charts that are compatible with the open source Helm Kubernetes package manager for provisioning and scaling. “Helm is what the cool kids are doing these days,” Stirman said.
Gandiva, which was built by Dremio developers, combines the LLVM run-time compiler with an execution kernel for efficient evaluation of arbitrary SQL expressions on Arrow. It’s claimed to provide up to 100-fold improvements in speed on certain types of queries. “In general, the more complex the query the better a candidate it is for Gandiva,” Stirman said, “but every query will be improved.”
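The general idea of compiling an expression once and then running it across whole columns of data can be shown with a toy sketch. This is not Gandiva's API and it does not use LLVM. Python's built-in `compile` stands in for the code-generation step, plain lists stand in for Arrow columns, and the function name `compile_expression` is hypothetical.

```python
def compile_expression(expr: str, column_names: list[str]):
    """Compile an arithmetic expression once; return a function over columns."""
    # The expression is parsed and compiled a single time, up front.
    code = compile(expr, "<expr>", "eval")

    def evaluate(columns: dict[str, list[float]]) -> list[float]:
        # Walk the named columns in lockstep and run the precompiled
        # expression for each row, with builtins disabled.
        cols = [columns[name] for name in column_names]
        return [
            eval(code, {"__builtins__": {}}, dict(zip(column_names, row)))
            for row in zip(*cols)
        ]

    return evaluate


# Usage: compile once, then reuse on any batch with the same columns.
total = compile_expression("price * qty", ["price", "qty"])
batch = {"price": [2.0, 3.0], "qty": [5, 4]}
result = total(batch)
```

The sketch only captures the compile-once, evaluate-many pattern. A real engine such as Gandiva generates native machine code and operates on columnar buffers, which is where the claimed speedups would come from.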
Dremio 3.0 is available immediately in both free community and paid enterprise editions.