The author mentions that Parquet isn't the only format DuckDB "just works" with. They're right. Here's a single query against Postgres, Parquet, JSON, CSV, and Google Sheets:
```sql
ATTACH 'postgres://your_username:your_password@your_host:5432/postgres' AS pgdb (TYPE postgres, READ_ONLY);

with interests as (
  select *
  from read_parquet('https://storage.googleapis.com/duck-demo-data/user_interests.parquet')
), user_preferences as (
  select *
  from read_json_auto(
    'https://storage.googleapis.com/duck-demo-data/user_preferences.jsonl',
    format = 'newline_delimited',
    records = true
  )
), user_details as (
  select * from pgdb.user_details
), users as (
  select *
  from read_csv_auto('https://storage.googleapis.com/duck-demo-data/users.csv')
), one_more_thing as (
  select *
  from read_csv_auto(
    'https://docs.google.com/spreadsheets/export?format=csv&id=1O-sbeSxCpzhzZj5iTRnOplZX-dIAiQJAeIO0mlh2kSU',
    normalize_names = true
  )
)
select
  users.user_id,
  users.name,
  interests.interest,
  user_preferences.theme,
  user_preferences.language,
  user_details.hobby,
  one_more_thing.one_more_thing
from users
left join interests on users.user_id = interests.user_id
left join user_preferences on users.user_id = user_preferences.user_id
left join user_details on users.user_id = user_details.user_id
left join one_more_thing on users.user_id = one_more_thing.user_id;
```
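If you want to try the same multi-format join without network access or a Postgres server, here's a minimal offline sketch. The file names (users.csv, interests.parquet, prefs.json) and the rows are made up for illustration. It writes a CSV, a Parquet file, and a newline-delimited JSON file with COPY, then joins all three back in one query. It assumes a reasonably recent DuckDB where the JSON and Parquet extensions are available (they autoload in current releases).

```sql
-- Hypothetical sample data: write one tiny file per format.
COPY (SELECT * FROM (VALUES (1, 'Ann'), (2, 'Bob')) AS t(user_id, name))
  TO 'users.csv' (HEADER, DELIMITER ',');
COPY (SELECT * FROM (VALUES (1, 'chess')) AS t(user_id, interest))
  TO 'interests.parquet' (FORMAT parquet);
-- COPY ... (FORMAT json) writes newline-delimited JSON by default.
COPY (SELECT * FROM (VALUES (2, 'dark')) AS t(user_id, theme))
  TO 'prefs.json' (FORMAT json);

-- One query across CSV, Parquet, and JSON, same shape as the query above.
select
  users.user_id,
  users.name,
  interests.interest,
  prefs.theme
from read_csv('users.csv', header = true) as users
left join read_parquet('interests.parquet') as interests
  on users.user_id = interests.user_id
left join read_json('prefs.json') as prefs
  on users.user_id = prefs.user_id
order by users.user_id;
```

Ann comes back with chess and a NULL theme, Bob with a NULL interest and the dark theme, which is exactly what the left joins should produce.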
I feel DuckDB is at serious risk of becoming the load-bearing glue that holds everything together in some cursed future software stack. A sort of curl of data integration.