Very cool! One thing to keep in mind: DuckDB can directly query parquet files (a...

lukekim · on March 28, 2024

Yes, we're huge fans of DuckDB, Mark, Hannes and the team.

What we've found is that sometimes you want to materialize data in an OLTP DB. Spice gives you the choice to store some datasets in DuckDB and others in something like SQLite/PostgreSQL, and to join them together in a single SQL query, so you get the best of both worlds.

riku_iki · on March 28, 2024

DuckDB can both read from and write to PG. What exact use case are you unlocking?

lukekim · on March 29, 2024

DuckDB is awesome. As an OLAP columnar-store database it excels at certain operations, like aggregations. If your use-case is row-based lookups where an OLTP database would perform better, you now get a choice of engine, while still having a single place to access your data from your app.

Originally, we only supported DuckDB in our cloud product Spice Firecache, but we actually lost a customer because their use-case was better suited to an OLTP DB. Now you get that choice down to the dataset level and can still join across datasets in a single query. With Spice, you can load both SQLite and DuckDB together in the same process for local materialization and acceleration.

Finally, Spice OSS does more than just query data. You can read about the vision to power AI-driven applications by co-locating data with models at https://docs.spiceai.org/intelligent-applications.

riku_iki · on March 29, 2024

> If your use-case is row-based lookups where an OLTP database would perform better, you now get a choice of engine, while still having a single place to access your data from your app.

My understanding is that if you run some SQL in DuckDB against PG using the extension, say select * from t where id = 2; it will perform the actual lookup on the PG server, and the results will be accessible in DuckDB.

> With Spice, you can load both SQLite and DuckDB together in the same process for local materialization and acceleration.

You can do this in any Python, Java, C++, or whatever program.

lukekim · on March 29, 2024

You're right, and that might be a good choice if you wanted to deploy and operate an additional PostgreSQL server locally.

## Using DuckDB:

app -> duckdb -> network -> remote postgres (data) | local postgres (materialization)

## Using Spice:

app -> localhost gRPC/HTTP -> [Spice <duckdb|sqlite>] -> network -> [postgres|S3|snowflake|etc]

In addition, Spice manages the materialization for you. In the DuckDB-only case, you'd have to do a COPY FROM [remote postgres] to [local postgres] manually every time, and manage the data lifecycle yourself. That gets even more complicated if you want to do append or incremental updates of data to your local materialization.
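
To make the "manage the data lifecycle yourself" point concrete, here is a minimal sketch of an append-style refresh done by hand. It is not how Spice implements materialization. Both databases are in-memory SQLite stand-ins for the remote source and the local copy, and the `events` table, its columns, and the `refresh_append` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory SQLite database with a hypothetical events table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return db


def refresh_append(source, local):
    """Copy rows from source that are newer than the local high-water mark.

    Assumes ids only ever increase and rows are never updated or deleted.
    Returns the number of rows appended.
    """
    # Highest id already materialized locally (0 when the local copy is empty).
    (high_water,) = local.execute(
        "SELECT COALESCE(MAX(id), 0) FROM events"
    ).fetchone()
    rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (high_water,),
    ).fetchall()
    # Insert the new rows in one transaction so a failure leaves no partial batch.
    with local:
        local.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
    return len(rows)


source = make_db()  # stand-in for the remote database
local = make_db()   # stand-in for the local materialization

with source:
    source.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])
first = refresh_append(source, local)   # initial load

with source:
    source.execute("INSERT INTO events VALUES (3, 'c')")
second = refresh_append(source, local)  # picks up only the new row
third = refresh_append(source, local)   # nothing new to copy
```

Even this small version has to pick a watermark column and assumes the source is append-only. Updated or deleted rows would be missed, which is the kind of bookkeeping that grows once you do it by hand.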

phillip-spice · on March 29, 2024

DuckDB is an in-process DB similar to SQLite - so every application in your stack would need to embed it. Spice is a binary that has Flight SQL and HTTP query endpoints - so multiple applications can connect to it from any language.