feat: add custom delimiter to csv_agg #2

steve-chavez · 2025-08-04T23:29:40Z

Addresses the problem on PostgREST/postgrest#1102

Adds an overloaded csv_agg function:

SELECT csv_agg(x, '|') AS body
FROM   projects x;

steve-chavez · 2025-08-04T23:53:10Z

The added loadtest proves this variant has the same performance as csv_agg with a single arg: https://github.com/PostgREST/pg_csv/actions/runs/16736548331?pr=2

steve-chavez · 2025-08-04T23:59:16Z

I'm not sure if this form:

SELECT csv_agg(x, '|')

Will be the easiest to integrate with PostgREST Media Type Handlers -- maybe a tsv_agg without params is better. At any rate, we can add a new agg function later.

coveralls · 2025-08-05T01:51:24Z

Pull Request Test Coverage Report for Build 16736850387

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.1%) to 97.368%

Totals
Change from base Build 16732377191:	0.1%
Covered Lines:	74
Relevant Lines:	76

💛 - Coveralls

wolfgangwalther · 2025-08-05T08:28:34Z

I'm not sure if this form:
SELECT csv_agg(x, '|') 
Will be the easiest to integrate with PostgREST Media Type Handlers

What if we would support parameters to media types being mapped to these additional arguments? For example when I request text/csv;delimiter=|, this could be mapped to csv_agg(x, delimiter => '|'). For this, it would be good if the csv_agg function had named arguments, at least all except the first one.

That won't quite work with things like CREATE DOMAIN "text/tsv" ..., though. For these we'd probably still need a separate tsv_agg wrapper.

steve-chavez · 2025-08-05T16:03:09Z

What if we would support parameters to media types being mapped to these additional arguments? For example when I request text/csv;delimiter=|, this could be mapped to csv_agg(x, delimiter => '|')

@wolfgangwalther That's a negative from pg side, look:

SELECT csv_agg(x, delimiter := E'\t') AS body
ERROR:  aggregates cannot use named arguments

Weirdly enough it does let me create the agg with named parameters:

create aggregate csv_agg(anyelement, delimiter "char") (
  sfunc     = csv_agg_transfn,
  stype     = internal,
  finalfunc = csv_agg_finalfn,
  parallel  = safe
);

Wrapping the aggs in another SQL function won't work because they wouldn't be aggregates anymore.

Looks like the only choice is to make the tsv_agg and other wrappers.

wolfgangwalther · 2025-08-05T16:06:45Z

SELECT csv_agg(x, delimiter := E'\t') AS body
ERROR:  aggregates cannot use named arguments

Urghs, really odd.

Weirdly enough it does let me create the agg with named parameters:

create aggregate csv_agg(anyelement, delimiter "char") (
  sfunc     = csv_agg_transfn,
  stype     = internal,
  finalfunc = csv_agg_finalfn,
  parallel  = safe
);

Technically, we don't need to actually call the function with these named arguments. If the aggregate can be created with them, we can do the catalog lookup and map the mimetypes with parameters to a csv_agg(x, E'\t') call, even without specifying the name on the call, by passing them in the right order and with the right type.
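A minimal sketch of that idea, using a hypothetical pure-SQL aggregate so it runs on a stock Postgres (demo_join_agg and demo_join_transfn are made up for illustration and are not part of pg_csv). The argument names are stored in pg_proc even though they can't be used in the call, so a client could read them from the catalog and then pass the values positionally:

```sql
-- Hypothetical stand-in for csv_agg; not part of pg_csv.
-- Transition function: append each item to the accumulator, separated by sep.
create function demo_join_transfn(acc text, item text, sep text)
returns text language sql immutable as $$
  select case when acc is null then item else acc || sep || item end
$$;

-- The aggregate is created with named arguments, as in the comment above.
create aggregate demo_join_agg(item text, delimiter text) (
  sfunc = demo_join_transfn,
  stype = text
);

-- Catalog lookup: argument names and types of the aggregate.
-- prokind = 'a' (aggregate) requires Postgres 11 or later.
select p.proname,
       p.proargnames,
       oidvectortypes(p.proargtypes) as arg_types
from   pg_proc p
where  p.proname = 'demo_join_agg'
and    p.prokind = 'a';

-- Positional call: the delimiter goes in the slot the catalog says is "delimiter".
select demo_join_agg(v, E'\t' order by v) as body
from   (values ('a'), ('b')) as t(v);
```

Whether PostgREST would do this lookup in the schema cache or elsewhere is left open here.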

steve-chavez · 2025-08-05T17:24:44Z

If the aggregate can be created with them, we can do the catalog lookup and map the mimetypes with parameters to a csv_agg(x, E'\t') call, even without specifying the name on the call, by passing them in the right order and with the right type.

Yeah, but that looks like much more work compared to just adding different aggs like tsv_agg. Considering we control pg_csv, we can just add those aggs.

Then integrating on PostgREST will be much simpler, just adding new builtin handlers here:
https://github.com/PostgREST/postgrest/blob/8fdd46c64bf7b999656cfc2584fd840502184e44/src/PostgREST/SchemaCache.hs#L1077-L1083

We already support geoJSON through PostGIS (ref), I think we need to make this more formal and detect some extensions on the schema cache. Right now we just fail at runtime for PostGIS, we can fail without hitting the db.

Once we have that, we can provide our builtin CSV handler as fallback and once the user does CREATE EXTENSION pg_csv, PostgREST will pick the new aggs/handlers. That's the best DX I think.
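A rough sketch of what the database side of such a check could look like, assuming the schema cache only needs to know whether an extension is installed in the current database. How PostgREST would actually wire this in is not settled in this thread:

```sql
-- Extensions of interest that are installed in the current database, with versions.
select e.extname, e.extversion
from   pg_extension e
where  e.extname in ('pg_csv', 'postgis');

-- Single boolean that a fallback decision could be based on.
select exists (
  select 1 from pg_extension where extname = 'pg_csv'
) as has_pg_csv;
```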

steve-chavez · 2025-08-05T17:30:50Z

We already support geoJSON through PostGIS (ref), I think we need to make this more formal and detect some extensions on the schema cache. Right now we just fail at runtime for PostGIS, we can fail without hitting the db.

Looks like we'd need the same mechanism for postgrest-openapi too.

wolfgangwalther · 2025-08-05T17:59:58Z

Once we have that, we can provide our builtin CSV handler as fallback and once the user does CREATE EXTENSION pg_csv, PostgREST will pick the new aggs/handlers. That's the best DX I think.

While I agree with the general direction, I would like to drop the builtin CSV handler eventually, when pg_csv is mature and widely available.

steve-chavez · 2025-08-05T18:10:52Z

I would like to drop the builtin CSV handler eventually, when pg_csv is mature and available widespread.

Yes, I would too, fully agree.

steve-chavez · 2025-08-06T15:51:55Z

@wolfgangwalther I'm trying to integrate pg_csv into PostgREST to close PostgREST/postgrest#3627.

Since you've been doing some work on nixpkgs, I was wondering if you could help me.

On https://github.com/PostgREST/postgrest/blob/0f1ca8faac630518b343f195464beff47fe32c78/default.nix#L54-L61

I'm adding this

  postgresqlVersions =
    let
      pg_csv = pkgs.callPackage ./nix/ext/pg_csv.nix {};
      exts = p: [ p.postgis p.pg_safeupdate pg_csv ];
    in
    [
      { name = "postgresql-17"; postgresql = pkgs.postgresql_17.withPackages exts; }
      { name = "postgresql-16"; postgresql = pkgs.postgresql_16.withPackages exts; }
      { name = "postgresql-15"; postgresql = pkgs.postgresql_15.withPackages exts; }
      { name = "postgresql-14"; postgresql = pkgs.postgresql_14.withPackages exts; }
      { name = "postgresql-13"; postgresql = pkgs.postgresql_13.withPackages exts; }
    ];

Then pg_csv.nix is:

{
  stdenv,
  fetchFromGitHub,
  lib,
  postgresql,
}:

stdenv.mkDerivation rec {
  pname = "pg_csv";
  version = "0.1";

  buildInputs = [ postgresql postgresql.dev ];

  src = fetchFromGitHub {
    owner = "PostgREST";
    repo = "pg_csv";
    rev = "refs/tags/v${version}";
    hash = "sha256-N6dneoPQDEFg1cMvxARub+/xSAH86qihzZGlXTNE/hQ=";
  };
}

But that somehow doesn't find pg_config:

/nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash: line 1: pg_config: command not found
make: *** No rule to make target 'install'.  Stop.
error: builder for '/nix/store/dhczacqyy8ivdbn1lnwp4bcqysb43pp7-pg_csv-0.1.drv' failed with exit code 2;
       last 10 log lines:
       > no configure script, doing nothing
       > Running phase: buildPhase
       > build flags: SHELL=/nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash
       > /nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash: line 1: pg_config: command not found
       > cp sql/pg_csv.sql sql/pg_csv--0.1.sql
       > sed "s/@EXTVERSION@/0.1/g" pg_csv.control.in > pg_csv.control
       > Running phase: installPhase
       > install flags: SHELL=/nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash install
       > /nix/store/ih68ar79msmj0496pgld4r3vqfr7bbin-bash-5.2p37/bin/bash: line 1: pg_config: command not found
       > make: *** No rule to make target 'install'.  Stop.
       For full logs, run 'nix-store -l /nix/store/dhczacqyy8ivdbn1lnwp4bcqysb43pp7-pg_csv-0.1.drv'.

I tried using postgresqlBuildExtension but that also gave errors, how should I install an extension now?

wolfgangwalther · 2025-08-06T16:06:13Z

I'm adding this

  postgresqlVersions =
    let
      pg_csv = pkgs.callPackage ./nix/ext/pg_csv.nix {};
      exts = p: [ p.postgis p.pg_safeupdate pg_csv ];
    in
    [
      { name = "postgresql-17"; postgresql = pkgs.postgresql_17.withPackages exts; }
      { name = "postgresql-16"; postgresql = pkgs.postgresql_16.withPackages exts; }
      { name = "postgresql-15"; postgresql = pkgs.postgresql_15.withPackages exts; }
      { name = "postgresql-14"; postgresql = pkgs.postgresql_14.withPackages exts; }
      { name = "postgresql-13"; postgresql = pkgs.postgresql_13.withPackages exts; }
    ];

This will always build pg_csv against the same postgresql version. Thus, this will not work for the different versions we test here. pg_csv needs to be added to the postgresql package set. We currently don't have a way to extend that from outside nixpkgs, though.

Then pg_csv.nix is:

{
  stdenv,
  fetchFromGitHub,
  lib,
  postgresql,
}:

stdenv.mkDerivation rec {
  pname = "pg_csv";
  version = "0.1";

  buildInputs = [ postgresql postgresql.dev ];

  src = fetchFromGitHub {
    owner = "PostgREST";
    repo = "pg_csv";
    rev = "refs/tags/v${version}";
    hash = "sha256-N6dneoPQDEFg1cMvxARub+/xSAH86qihzZGlXTNE/hQ=";
  };
}

But that somehow doesn't find pg_config:

pg_config is only available as postgresql.pg_config in nixpkgs today, so you'd need to add that to your nativeBuildInputs.

I tried using postgresqlBuildExtension but that also gave errors, how should I install an extension now?

IIRC, postgresqlBuildExtension is only available inside nixpkgs, not as a consumer of it.

Upstreaming pg_csv is the way to go.

steve-chavez · 2025-08-06T16:11:01Z

Upstreaming pg_csv is the way to go.

Do you think it's fine to do that in its current state? I mean, I've tagged it as 0.1 mostly because it doesn't have many features, but it should be stable (test/loadtested + it's really simple too).

wolfgangwalther · 2025-08-06T16:23:11Z

I'd say it's OK to upstream it as an optional dependency of PostgREST. This dependency is not encoded in the nix expression, but it's nevertheless there.

steve-chavez · 2025-08-06T22:12:11Z

I'd say it's OK to upstream it as an optional dependency of PostgREST

I'd like it to be generally useful too. I think it's already convenient (I always forget the COPY syntax), but maybe we can make it more featureful/stable before upstreaming; we might avoid some breaking changes that way too. (Also, we can progress relatively quickly here.)

As I add more features (#4), I now understand the point Tom Lane made (ref) about csv_agg not being flexible enough.

The root problem is that we don't have named args as we found above, so adding multiple options becomes complicated for the user. CSV has many options, see https://specs.frictionlessdata.io//csv-dialect/ (this was shared on PostgREST/postgrest#1374 (comment)).

So maybe we should come up with a stable interface?

Using https://specs.frictionlessdata.io//csv-dialect/, a JSON interface would be something like:

SELECT csv_agg(x, '{"delimiter": ",", "header": true, "doubleQuote": true}') AS body
FROM   projects x;

(hstore would be more ergonomic, but unfortunately it requires doing CREATE EXTENSION hstore)

To reduce the json verbosity, maybe we could add an ENUM for the different boolean options:

SELECT csv_agg(x, '{"delimiter": ","}', CSV_HEADER_DQUOTE) AS body
FROM   projects x;

I kinda like the latter. I think that's the best we can do with the current limitation. Any thoughts?

(I'll have to check the perf of parsing the JSON, but I don't think it should matter. I've been adding loadtests for the csv_agg variants, and CI shows the added parameters cause negligible perf loss.)

steve-chavez · 2025-08-06T23:15:21Z

To reduce the json verbosity, maybe we could add an ENUM for the different boolean options:

I think the enum idea is good regardless of the 2nd parameter (maybe it can be an array too?). We already have the options of w/wo CSV header and w/wo BOM, so I can proceed with this.

steve-chavez · 2025-08-06T23:35:21Z

Ohh, I think I found a much better option. We use a composite type:

create type pg_csv_options as (
  bom bool,
  header bool,
  delim text
);

create function csv_options(bom bool default false, header bool default true, delim text default ',') returns pg_csv_options as $$ 
  select (bom,header,delim)::pg_csv_options; 
$$ language sql;

create aggregate csv_agg(anyelement, pg_csv_options) (
  sfunc     = csv_agg_transfn,
  stype     = internal,
  finalfunc = csv_agg_finalfn,
  parallel  = safe
);

select csv_agg(x, csv_options(bom := true)) from projects x;

This looks really nice! I think that's the final interface. Then we can add new options without adding more parameters, and we can name them too!
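To see the named-argument part in isolation, here is a small sketch that needs only the type and the helper function, not the C aggregate. It uses the names exactly as proposed in this comment (pg_csv_options, csv_options, delim), which may differ from what the released extension ships, so run it in a scratch database without pg_csv installed:

```sql
-- Same composite type and helper as above, repeated so the example is self-contained.
create type pg_csv_options as (bom bool, header bool, delim text);

create function csv_options(bom bool default false, header bool default true, delim text default ',')
returns pg_csv_options language sql as $$
  select (bom, header, delim)::pg_csv_options;
$$;

-- Only bom is overridden; header and delim keep their defaults (true and ',').
select * from csv_options(bom := true);

-- Named arguments can be given in any order; bom keeps its default (false).
select * from csv_options(delim := '|', header := false);
```

Because the options travel as a single composite value, the aggregate keeps a fixed two-argument signature while new options only touch the type and the helper.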

wolfgangwalther · 2025-08-07T10:22:29Z

It's really a shame that aggregates can't use named arguments...

Ohh, I think I found a much better option. We use a composite type:

That looks legit.

I'd use a bit more consistent naming:

looking at COPY: delim -> delimiter
type pg_csv_options -> csv_options (imho type and proc namespaces don't collide, so they can be the same name)

steve-chavez · 2025-08-07T17:46:37Z

I'd use a bit more consistent naming:

Nice! Making that change on #5

steve-chavez · 2025-08-09T15:16:46Z

I think there's a future where we could have pg_csv and postgrest-openapi inside a single postgrest-contrib, and maybe later plmustache too, for ease of installation (one single make && make install for users). PostGIS also has multiple extensions in a single repo: https://github.com/postgis/postgis/tree/master/extensions

@wolfgangwalther Maybe we should do the same? Would a single ext/postgrest-contrib.nix be simpler to maintain in nixpkgs?

wolfgangwalther · 2025-08-09T15:23:32Z

While I think postgrest-openapi and other things could sensibly be in a single postgrest-contrib, I think neither pg_csv, nor plmustache belong there - these can be used independently of PostgREST, so they should not be in there.

steve-chavez · 2025-08-18T04:05:13Z

I was thinking of writing the csv_populate_recordset function that is needed to solve PostgREST/postgrest#1043, but the parsing is a bit complicated and will take a while. So I'll upstream the extension to nixpkgs as it is for now.

steve-chavez · 2025-09-01T04:43:19Z

@wolfgangwalther Released v1.0 🚀

Added the nullstr option (feat: add nullstr option #8)
Put some rationale on the README
Changed the project description to "Fast and flexible CSV processing for Postgres", I think "flexible" is key here.

I had an old clone of nixpkgs on my machine, but somehow pulling never finishes :/. I wonder if you could give me a hand adding pg_csv to nixpkgs and listing me as a maintainer there? No problem if not, I'll try again tomorrow (getting late here 😪 ).

wolfgangwalther · 2025-09-01T10:32:50Z

If nothing else works, you can also do a shallow clone of nixpkgs!

steve-chavez · 2025-09-01T14:51:29Z

@wolfgangwalther Cool, did the PR on NixOS/nixpkgs#439223. So far just checked it works by doing:

$ nix-build -A postgresqlPackages.pg_csv

steve-chavez · 2025-09-04T23:28:00Z

Cool, we got a mention on postgres weekly: https://postgresweekly.com/issues/614 ⭐

feat: add custom delimiter to csv_agg

781ace9

steve-chavez force-pushed the delim branch from a6ed0bd to 781ace9 Compare August 4, 2025 23:51

steve-chavez mentioned this pull request Aug 4, 2025

Feature request: option to change delimiter character for CSV output PostgREST/postgrest#1102

Closed

steve-chavez merged commit d724494 into PostgREST:master Aug 5, 2025
11 checks passed
