steve-chavez
Member

Addresses the problem on PostgREST/postgrest#1102

Adds an overloaded `csv_agg` function:

```sql
SELECT csv_agg(x, '|') AS body
FROM   projects x;
```
@steve-chavez
Member Author

steve-chavez commented Aug 4, 2025

The added loadtest shows this variant performs the same as csv_agg with a single arg: https://github.com/PostgREST/pg_csv/actions/runs/16736548331?pr=2

@steve-chavez
Member Author

I'm not sure if this form:

SELECT csv_agg(x, '|') 

will be the easiest to integrate with PostgREST Media Type Handlers; maybe a tsv_agg without params is better. At any rate, we can add a new agg function later.

@steve-chavez steve-chavez merged commit d724494 into PostgREST:master Aug 5, 2025
11 checks passed
@coveralls

coveralls commented Aug 5, 2025

Pull Request Test Coverage Report for Build 16736850387

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 97.368%

Totals Coverage Status
Change from base Build 16732377191: 0.1%
Covered Lines: 74
Relevant Lines: 76

💛 - Coveralls

@wolfgangwalther
Member

I'm not sure if this form:

SELECT csv_agg(x, '|') 

will be the easiest to integrate with PostgREST Media Type Handlers

What if we supported mapping media type parameters to these additional arguments? For example, a request for text/csv;delimiter=| could be mapped to csv_agg(x, delimiter => '|'). For this, it would be good if the csv_agg function had named arguments, at least for all except the first one.

That won't quite work with things like CREATE DOMAIN "text/tsv" ..., though. For these we'd probably still need a separate tsv_agg wrapper.

@steve-chavez
Member Author

What if we supported mapping media type parameters to these additional arguments? For example, a request for text/csv;delimiter=| could be mapped to csv_agg(x, delimiter => '|')

@wolfgangwalther That's a negative from pg side, look:

SELECT csv_agg(x, delimiter := E'\t') AS body
ERROR:  aggregates cannot use named arguments

Weirdly enough it does let me create the agg with named parameters:

create aggregate csv_agg(anyelement, delimiter "char") (
  sfunc     = csv_agg_transfn,
  stype     = internal,
  finalfunc = csv_agg_finalfn,
  parallel  = safe
);

Wrapping the aggs in another SQL function won't work because they wouldn't be aggregates anymore.

Looks like the only choice is to make the tsv_agg and other wrappers.

@wolfgangwalther
Member

SELECT csv_agg(x, delimiter := E'\t') AS body
ERROR:  aggregates cannot use named arguments

Urghs, really odd.

Weirdly enough it does let me create the agg with named parameters:

create aggregate csv_agg(anyelement, delimiter "char") (
  sfunc     = csv_agg_transfn,
  stype     = internal,
  finalfunc = csv_agg_finalfn,
  parallel  = safe
);

Technically, we don't need to actually call the function with these named arguments. If the aggregate can be created with them, we can do the catalog lookup and map the mimetypes with parameters to a csv_agg(x, E'\t') call, even without specifying the name on the call, by passing them in the right order and with the right type.
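
To make the catalog-lookup idea concrete, here is a minimal sketch on plain PostgreSQL. It assumes PostgreSQL 11 or later (for pg_proc.prokind) and uses a toy aggregate, join_agg, with a hypothetical transition function join_transfn, as a stand-in for csv_agg so that it runs without pg_csv installed. It only illustrates that the declared argument names are stored in the catalog while the call itself stays positional; it is not how PostgREST resolves handlers today.

```sql
-- Toy stand-in for csv_agg: joins text values with a delimiter.
-- join_transfn and join_agg are hypothetical names, not part of pg_csv.
create function join_transfn(state text, val text, delimiter text)
returns text as $$
  select case when state is null then val
              else state || delimiter || val
         end;
$$ language sql;

-- As in the thread, the aggregate can be declared with a named argument.
create aggregate join_agg(text, delimiter text) (
  sfunc = join_transfn,
  stype = text
);

-- The declared names are kept in pg_proc.proargnames, so a client could
-- look up the position of "delimiter" instead of hard-coding it.
select p.oid::regprocedure as signature,
       p.proargnames       as arg_names
from   pg_proc p
where  p.proname = 'join_agg'
and    p.prokind = 'a';

-- The call itself stays positional, since named notation is rejected
-- for aggregates.
select join_agg(v, '|' order by v) as body
from   (values ('a'), ('b'), ('c')) as t(v);
-- body: a|b|c
```

A client would still have to translate each media type parameter into a correctly typed literal at the right position, which is the extra work discussed in the following comments.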

@steve-chavez
Member Author

steve-chavez commented Aug 5, 2025

If the aggregate can be created with them, we can do the catalog lookup and map the mimetypes with parameters to a csv_agg(x, E'\t') call, even without specifying the name on the call, by passing them in the right order and with the right type.

Yeah, but that looks like much more work compared to just adding different aggs like tsv_agg. Considering we control pg_csv, we can just add those aggs.

Then integrating on PostgREST will be much simpler, just adding new builtin handlers here:
https://github.com/PostgREST/postgrest/blob/8fdd46c64bf7b999656cfc2584fd840502184e44/src/PostgREST/SchemaCache.hs#L1077-L1083

We already support geoJSON through PostGIS (ref). I think we need to make this more formal and detect some extensions in the schema cache. Right now we just fail at runtime for PostGIS; with detection, we could fail without hitting the db.

Once we have that, we can provide our builtin CSV handler as fallback and once the user does CREATE EXTENSION pg_csv, PostgREST will pick the new aggs/handlers. That's the best DX I think.

@steve-chavez
Member Author

We already support geoJSON through PostGIS (ref). I think we need to make this more formal and detect some extensions in the schema cache. Right now we just fail at runtime for PostGIS; with detection, we could fail without hitting the db.

Looks like we'd need the same mechanism for postgrest-openapi too.

@wolfgangwalther
Member

Once we have that, we can provide our builtin CSV handler as fallback and once the user does CREATE EXTENSION pg_csv, PostgREST will pick the new aggs/handlers. That's the best DX I think.

While I agree with the general direction, I would like to drop the builtin CSV handler eventually, when pg_csv is mature and widely available.

@steve-chavez
Member Author

I would like to drop the builtin CSV handler eventually, when pg_csv is mature and widely available.

Yes, I would too, fully agree.

@steve-chavez
Member Author

@wolfgangwalther I'm trying to integrate pg_csv into PostgREST to close PostgREST/postgrest#3627.

Since you've been doing some work on nixpkgs, I was wondering if you could help me.

On https://github.com/PostgREST/postgrest/blob/0f1ca8faac630518b343f195464beff47fe32c78/default.nix#L54-L61

I'm adding this

  postgresqlVersions =
    let
      pg_csv = pkgs.callPackage ./nix/ext/pg_csv.nix {};
      exts = p: [ p.postgis p.pg_safeupdate pg_csv ];
    in
    [
      { name = "postgresql-17"; postgresql = pkgs.postgresql_17.withPackages exts; }
      { name = "postgresql-16"; postgresql = pkgs.postgresql_16.withPackages exts; }
      { name = "postgresql-15"; postgresql = pkgs.postgresql_15.withPackages exts; }
      { name = "postgresql-14"; postgresql = pkgs.postgresql_14.withPackages exts; }
      { name = "postgresql-13"; postgresql = pkgs.postgresql_13.withPackages exts; }
    ];

Then pg_csv.nix is:

{
  stdenv,
  fetchFromGitHub,
  lib,
  postgresql,
}:

stdenv.mkDerivation rec {
  pname = "pg_csv";
  version = "0.1";

  buildInputs = [ postgresql postgresql.dev ];

  src = fetchFromGitHub {
    owner = "PostgREST";
    repo = "pg_csv";
    rev = "refs/tags/v${version}";
    hash = "sha256-N6dneoPQDEFg1cMvxARub+/xSAH86qihzZGlXTNE/hQ=";
  };
}

But that somehow doesn't find pg_config:

/nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash: line 1: pg_config: command not found
make: *** No rule to make target 'install'.  Stop.
error: builder for '/nix/store/dhczacqyy8ivdbn1lnwp4bcqysb43pp7-pg_csv-0.1.drv' failed with exit code 2;
       last 10 log lines:
       > no configure script, doing nothing
       > Running phase: buildPhase
       > build flags: SHELL=/nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash
       > /nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash: line 1: pg_config: command not found
       > cp sql/pg_csv.sql sql/pg_csv--0.1.sql
       > sed "s/@EXTVERSION@/0.1/g" pg_csv.control.in > pg_csv.control
       > Running phase: installPhase
       > install flags: SHELL=/nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash install
       > /nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash: line 1: pg_config: command not found
       > make: *** No rule to make target 'install'.  Stop.
       For full logs, run 'nix-store -l /nix/store/dhczacqyy8ivdbn1lnwp4bcqysb43pp7-pg_csv-0.1.drv'.

I tried using postgresqlBuildExtension but that also gave errors, how should I install an extension now?

@wolfgangwalther
Member

I'm adding this

  postgresqlVersions =
    let
      pg_csv = pkgs.callPackage ./nix/ext/pg_csv.nix {};
      exts = p: [ p.postgis p.pg_safeupdate pg_csv ];
    in
    [
      { name = "postgresql-17"; postgresql = pkgs.postgresql_17.withPackages exts; }
      { name = "postgresql-16"; postgresql = pkgs.postgresql_16.withPackages exts; }
      { name = "postgresql-15"; postgresql = pkgs.postgresql_15.withPackages exts; }
      { name = "postgresql-14"; postgresql = pkgs.postgresql_14.withPackages exts; }
      { name = "postgresql-13"; postgresql = pkgs.postgresql_13.withPackages exts; }
    ];

This will always build pg_csv against the same postgresql version. Thus, this will not work for the different versions we test here. pg_csv needs to be added to the postgresql package set. We currently don't have a way to extend that from outside nixpkgs, though.

Then pg_csv.nix is:

{
  stdenv,
  fetchFromGitHub,
  lib,
  postgresql,
}:

stdenv.mkDerivation rec {
  pname = "pg_csv";
  version = "0.1";

  buildInputs = [ postgresql postgresql.dev ];

  src = fetchFromGitHub {
    owner = "PostgREST";
    repo = "pg_csv";
    rev = "refs/tags/v${version}";
    hash = "sha256-N6dneoPQDEFg1cMvxARub+/xSAH86qihzZGlXTNE/hQ=";
  };
}

But that somehow doesn't find pg_config:

pg_config is only available as postgresql.pg_config in nixpkgs today, so you'd need to add that to your nativeBuildInputs.

I tried using postgresqlBuildExtension but that also gave errors, how should I install an extension now?

IIRC, postgresqlBuildExtension is only available inside nixpkgs, not as a consumer of it.

Upstreaming pg_csv is the way to go.

@steve-chavez
Member Author

Upstreaming pg_csv is the way to go.

Do you think it's fine to do that in its current state? I mean, I've tagged it as 0.1 mostly because it doesn't have many features, but it should be stable (test/loadtested + it's really simple too).

@wolfgangwalther
Member

I'd say it's OK to upstream it as an optional dependency of PostgREST. This dependency is not encoded in the nix expression, but it's nevertheless there.

@steve-chavez
Member Author

I'd say it's OK to upstream it as an optional dependency of PostgREST

I'd like it to be generally useful too. I think it is already convenient (I always forget the COPY syntax), but maybe we can make it more featureful and stable before upstreaming; we might avoid some breaking changes that way too. (Also, we can progress relatively quickly here.)

As I add more features (#4), I now understand the point Tom Lane made (ref) about csv_agg not being flexible enough.

The root problem is that we don't have named args as we found above, so adding multiple options becomes complicated for the user. CSV has many options, see https://specs.frictionlessdata.io//csv-dialect/ (this was shared on PostgREST/postgrest#1374 (comment)).

So maybe we should come up with a stable interface?

Using https://specs.frictionlessdata.io//csv-dialect/, a JSON interface would be something like:

SELECT csv_agg(x, '{"delimiter": ",", "header": true, "doubleQuote": true}') AS body
FROM   projects x;

(hstore would be more ergonomic, but unfortunately it requires doing CREATE EXTENSION hstore)

To reduce the json verbosity, maybe we could add an ENUM for the different boolean options:

SELECT csv_agg(x, '{"delimiter": ","}', CSV_HEADER_DQUOTE) AS body
FROM   projects x;

I kinda like the latter. I think that's the best we can do with the current limitation. Any thoughts?

(I'll have to check the perf of parsing the JSON, but I don't think it should matter. I've been adding loadtests for the csv_agg variants, and CI shows the added parameters cause negligible perf loss.)

@steve-chavez
Member Author

To reduce the json verbosity, maybe we could add an ENUM for the different boolean options:

I think the enum idea is good regardless of the 2nd parameter (maybe it can be an array too?). We already have the options of w/wo CSV header and w/wo BOM, so I can proceed with this.

@steve-chavez
Member Author

steve-chavez commented Aug 6, 2025

Ohh, I think I found a much better option. We use a composite type:

create type pg_csv_options as (
  bom bool,
  header bool,
  delim text
);

create function csv_options(bom bool default false, header bool default true, delim text default ',') returns pg_csv_options as $$ 
  select (bom,header,delim)::pg_csv_options; 
$$ language sql;

create aggregate csv_agg(anyelement, pg_csv_options) (
  sfunc     = csv_agg_transfn,
  stype     = internal,
  finalfunc = csv_agg_finalfn,
  parallel  = safe
);

select csv_agg(x, csv_options(bom := true)) from projects x;

This looks really nice! I think that's the final interface. Then we can add new options without adding more parameters, and we can name them too!
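
A small sketch of why this interface scales, runnable on plain PostgreSQL without pg_csv: the aggregate keeps a single options argument, while named notation and defaults live in the ordinary csv_options function, where PostgreSQL does allow them. The type and function definitions are repeated from the snippet above only so that the block is self-contained.

```sql
-- Definitions repeated from the snippet above.
create type pg_csv_options as (
  bom    bool,
  header bool,
  delim  text
);

create function csv_options(
  bom    bool default false,
  header bool default true,
  delim  text default ','
) returns pg_csv_options as $$
  select (bom, header, delim)::pg_csv_options;
$$ language sql;

-- Named notation works here because csv_options is a plain function,
-- not an aggregate; omitted options fall back to their defaults.
select * from csv_options(delim := '|');
-- bom = false, header = true, delim = '|'

-- Named arguments can be given in any order.
select * from csv_options(header := false, bom := true);
-- bom = true, header = false, delim = ','
```

With pg_csv installed, the resulting value would be passed as the second argument of the aggregate, as in the csv_agg(x, csv_options(bom := true)) call above.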

@wolfgangwalther
Member

It's really a shame that aggregates can't use named arguments...

Ohh, I think I found a much better option. We use a composite type:

That looks legit.

I'd use a bit more consistent naming:

  • looking at COPY: delim -> delimiter
  • type pg_csv_options -> csv_options (imho type and proc namespaces don't collide, so they can be the same name)

@steve-chavez
Member Author

I'd use a bit more consistent naming:

Nice! Making that change on #5

@steve-chavez
Member Author

steve-chavez commented Aug 9, 2025

I think there's a future where we could have pg_csv and postgrest-openapi inside a single postgrest-contrib, and maybe plmustache later too, for ease of installation (a single make && make install for users). PostGIS also has multiple extensions in a single repo: https://github.com/postgis/postgis/tree/master/extensions

@wolfgangwalther Maybe we should do the same? Would a single ext/postgrest-contrib.nix be simpler to maintain in nixpkgs?

@wolfgangwalther
Member

While I think postgrest-openapi and other things could sensibly be in a single postgrest-contrib, I think neither pg_csv, nor plmustache belong there - these can be used independently of PostgREST, so they should not be in there.

@steve-chavez
Member Author

I was thinking of adding the csv_populate_recordset function that is needed for solving PostgREST/postgrest#1043, but the parsing is a bit complicated and will take a while. So I'll upstream the extension to nixpkgs as it is for now.

@steve-chavez
Member Author

@wolfgangwalther Released v1.0 🚀

I had an old clone of nixpkgs on my machine but somehow pulling is never finishing :/. I wonder if you could give me a hand in adding pg_csv on nixpkgs and putting me as a maintainer there? No problem if not, I'll try again tomorrow (getting late here 😪 ).

@wolfgangwalther
Member

If nothing else works, you can also do a shallow clone of nixpkgs!

@steve-chavez
Member Author

@wolfgangwalther Cool, did the PR on NixOS/nixpkgs#439223. So far just checked it works by doing:

$ nix-build -A postgresqlPackages.pg_csv

@steve-chavez
Member Author

Cool, we got a mention on postgres weekly: https://postgresweekly.com/issues/614
