DuckDB is an open-source, in-process, column-oriented relational database management system designed for high-performance analytical workloads.
Overview
- Developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands, first released in 2019.
- Provides a rich SQL dialect with support for complex queries, window functions, nested data types, and extensions.
- Designed to be fast, reliable, portable, and easy to use for analytical queries on large datasets.
- Available as a standalone CLI application, with client APIs for Python, R, Java, Node.js, and other languages.
- Runs inside the host process (in-process) instead of a traditional client-server model.
- Uses a vectorized query processing engine and columnar storage for high performance.
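
The columnar storage and vectorized processing mentioned above can be illustrated with a toy sketch in plain Python. This is not DuckDB's implementation. The table contents, the `scan` and `total_revenue` helpers, and the batch size of 4 are all assumptions made for illustration. The sketch only shows the general idea: values are stored per column, and operators consume batches of values rather than one row at a time.

```python
from array import array

# Hypothetical toy table stored column by column: each column is one
# contiguous typed array instead of one record object per row.
columns = {
    "quantity": array("q", [2, 5, 1, 7, 3, 4]),
    "price": array("d", [9.5, 3.0, 20.0, 1.5, 8.0, 2.5]),
}

VECTOR_SIZE = 4  # illustrative batch size chosen for this sketch


def scan(table, names, vector_size=VECTOR_SIZE):
    """Yield batches ("vectors") that hold slices of only the requested columns."""
    row_count = len(table[names[0]])
    for start in range(0, row_count, vector_size):
        yield {name: table[name][start:start + vector_size] for name in names}


def total_revenue(table, min_quantity):
    """Compute SUM(quantity * price) WHERE quantity >= min_quantity, batch by batch."""
    total = 0.0
    for batch in scan(table, ["quantity", "price"]):
        # Filter step: inspect the whole batch and keep the matching positions.
        selected = [i for i, q in enumerate(batch["quantity"]) if q >= min_quantity]
        # Aggregation step: touch only the selected positions of each column.
        total += sum(batch["quantity"][i] * batch["price"][i] for i in selected)
    return total


revenue = total_revenue(columns, min_quantity=3)
print(revenue)
```

Because the query reads only the columns it names, a column-oriented layout avoids loading unrelated fields, and working on batches spreads per-operator overhead across many values.
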
Features
- Simple installation with no external dependencies; runs on Linux, macOS, and Windows across multiple CPU architectures.
- Supports reading and writing file formats like CSV, Parquet, and JSON from local or remote (e.g., S3) storage.
- Offers parallel execution and can process larger-than-memory workloads efficiently.
- Extensible through an extension mechanism that can add new data types, functions, file formats, and SQL syntax.
- Open-source under the MIT License.
Usage and Support
- Reportedly used by companies such as Facebook, Google, and Airbnb for analytical workloads.
- DuckDB Labs provides commercial support and consulting services from the core contributors.
- Latest stable release is v1.0.0 as of June 3, 2024.
In summary, DuckDB is a high-performance, open-source analytical database designed for in-process querying of large datasets, with a rich SQL interface and support for various data formats and programming languages.