Startup Imply Data Inc., which promotes the open-source Apache Druid real-time analytical database, today announced the third milestone in Project Shapeshift, its campaign to build a comprehensive toolset for developers building applications on top of Druid.
Imply offers a multicloud data platform for real-time data ingestion and visualization in event-driven and streaming applications such as ad brokering, network traffic analysis, observability monitoring and application log analysis. Imply, which was founded by the creators of Druid, has raised over $215 million, including a $100 million late-stage round early last year.
The latest milestone, called schema auto-discovery, enables Druid to create schemas on the fly without the need for advance definition. A database schema is a “blueprint” that describes how data is structured and how it relates to other tables or data models. The company said its auto-discovery feature can deliver the performance of a strongly typed data structure with schemaless flexibility.
“Data types are important for performance. Fast queries require strong data type definitions,” said David Wang, vice president of product. Schemas are commonly used in relational database management systems, but many of the new breed of NoSQL databases operate without them.
“We’re introducing the best of both worlds: the ability to have a strongly typed format but with a schema-less like ingestion,” Wang said. “This is a big deal because you no longer have to adjust and figure out what new column you’re adding or what new data type that needs to be created. It all happens seamlessly in the background.”
Although defining a schema requires additional processing power, Imply said it has come up with an approach that doesn’t affect performance. Druid is based on a columnar data store, a design generally well-suited to analytical queries. It stores data in chunks called segments, said Vadim Ogievetsky, co-founder and chief experience officer.
“We basically find new columns in your data, look at them, figure out the ideal type for them and then add them to the column set,” he said. “You don’t have to define anything in advance. The fact that we can do that without taking a step back on performance is really the crux of the announcement.”
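The process Ogievetsky describes (find new columns, pick a type for each, add them to the column set) can be illustrated with a short sketch. This is not Druid's implementation. The function names, the three type labels and the widening rule from `long` to `double` to `string` are assumptions made purely for illustration.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical type ordering, narrowest to widest (an assumption of this sketch).
_TYPE_ORDER = ["long", "double", "string"]


def infer_type(value: Any) -> str:
    """Map a single Python value to one of the illustrative column types."""
    if isinstance(value, bool):
        return "long"  # booleans are stored as 0/1 in this sketch
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(current: str, incoming: str) -> str:
    """Return the wider of two types so that no observed value is lost."""
    return max(current, incoming, key=_TYPE_ORDER.index)


def discover_schema(
    rows: Iterable[Dict[str, Any]],
    schema: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Scan rows, adding unseen columns and widening types where needed."""
    result = dict(schema or {})  # copy so the caller's schema is not mutated
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue  # a null carries no type information
            observed = infer_type(value)
            if column in result:
                result[column] = widen(result[column], observed)
            else:
                result[column] = observed  # newly discovered column
    return result


rows = [
    {"user": "a", "latency": 12},
    {"user": "b", "latency": 3.5, "status": 200},
]
schema = discover_schema(rows)
print(schema)
```

In this sketch, the `status` column appears only in the second row and is added without any prior declaration, while `latency` is widened from `long` to `double` once a fractional value shows up.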
The first three Shapeshift milestones were the release of a cloud-native database service, followed by a query engine and now automated schema generation. Imply expects to release a fourth, as-yet-unspecified component late this year.
“That’s likely getting towards the end of the Shapeshift initiative,” Wang said. “It has completely transformed Druid’s core architecture and how developers build applications.”