-
Notifications
You must be signed in to change notification settings - Fork 1
feat: add custom delimiter to csv_agg #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Addresses the problem on PostgREST/postgrest#1102 Adds an overloaded `csv_agg` function: ```sql SELECT csv_agg(x, '|') AS body FROM projects x; ```
The added loadtest proves this variant has the same performance as |
I'm not sure if this form: SELECT csv_agg(x, '|') Will be the easiest to integrate with PostgREST Media Type Handlers -- maybe a |
Pull Request Test Coverage Report for Build 16736850387Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
What if we would support parameters to media types being mapped to these additional arguments? For example when I request That won't quite work with things like |
@wolfgangwalther That's a negative from pg side, look: SELECT csv_agg(x, delimiter := E'\t') AS body
ERROR: aggregates cannot use named arguments Weirdly enough it does let me create the agg with named parameters: create aggregate csv_agg(anyelement, delimiter "char") (
sfunc = csv_agg_transfn,
stype = internal,
finalfunc = csv_agg_finalfn,
parallel = safe
); Wrapping the aggs in another SQL function won't work because they wouldn't be aggregates no more. Looks like the only choice is to make the |
Urghs, really odd.
Technically, we don't need to actually call the function with these named arguments. If the aggregate can be created with them, we can do the catalog lookup and map the mimetypes with parameters to a |
Yeah, but that looks like much more work compared to just adding different aggs like Then integrating on PostgREST will be much simpler, just adding new builtin handlers here: We already support geoJSON through PostGIS (ref), I think we need to make this more formal and detect some extensions on the schema cache. Right now we just fail at runtime for PostGIS, we can fail without hitting the db. Once we have that, we can provide our builtin CSV handler as fallback and once the user does |
Looks like we'd need the same mechanism for postgrest-openapi too. |
While I agree with the general direction, I would like to drop the builtin CSV handler eventually, when pg_csv is mature and available widespread. |
Yes, I would too, fully agree. |
@wolfgangwalther I'm trying to integrate Since you've been doing some work on nixpkgs, I was wondering if you could help me. I'm adding this postgresqlVersions =
let
pg_csv = pkgs.callPackage ./nix/ext/pg_csv.nix {};
exts = p: [ p.postgis p.pg_safeupdate pg_csv ];
in
[
{ name = "postgresql-17"; postgresql = pkgs.postgresql_17.withPackages exts; }
{ name = "postgresql-16"; postgresql = pkgs.postgresql_16.withPackages exts; }
{ name = "postgresql-15"; postgresql = pkgs.postgresql_15.withPackages exts; }
{ name = "postgresql-14"; postgresql = pkgs.postgresql_14.withPackages exts; }
{ name = "postgresql-13"; postgresql = pkgs.postgresql_13.withPackages exts; }
]; Then {
stdenv,
fetchFromGitHub,
lib,
postgresql,
}:
stdenv.mkDerivation rec {
pname = "pg_csv";
version = "0.1";
buildInputs = [ postgresql postgresql.dev ];
src = fetchFromGitHub {
owner = "PostgREST";
repo = "pg_csv";
rev = "refs/tags/v${version}";
hash = "sha256-N6dneoPQDEFg1cMvxARub+/xSAH86qihzZGlXTNE/hQ=";
};
} But that somehow doesn't find
I tried using |
This will always build
IIRC, Upstreaming |
Do you think it's fine to do that in its current state? I mean, I've tagged it as 0.1 mostly because it doesn't have many features, but it should be stable (test/loadtested + it's really simple too). |
I'd say it's OK to upstream it as an optional dependency of PostgREST. This dependency is not encoded in the nix expression, but it's nevertheless there. |
I'd like it to be generally useful too. I think it already is convenient (I always forget the COPY syntax) but maybe we can make it more featureful/stable before upstreaming, we might avoid some breaking changes that way too. (also we can progress relatively quick here) As I add more features (#4), I now understand the point Tom Lane made (ref) about The root problem is that we don't have named args as we found above, so adding multiple options becomes complicated for the user. CSV has many options, see https://specs.frictionlessdata.io//csv-dialect/ (this was shared on PostgREST/postgrest#1374 (comment)). So maybe we should come up with a stable interface? Using https://specs.frictionlessdata.io//csv-dialect/, a JSON interface would be something like: SELECT csv_agg(x, '{"delimiter": ",", "header": true, "doubleQuote": true}') AS body
FROM projects x; ( To reduce the json verbosity, maybe we could add an ENUM for the different boolean options: SELECT csv_agg(x, '{"delimiter": ","}', CSV_HEADER_DQUOTE) AS body
FROM projects x; I kinda like the latter. I think that's the best we can do with the current limitation. Any thoughts? (I'll have to check the perf of parsing the json, but I don't think it should matter.. I've been adding loadtests for the |
I think the |
Ohh, I think I found a much better option. We use a composite type: create type pg_csv_options as (
bom bool,
header bool,
delim text
);
create function csv_options(bom bool default false, header bool default true, delim text default ',') returns pg_csv_options as $$
select (bom,header,delim)::pg_csv_options;
$$ language sql;
create aggregate csv_agg(anyelement, pg_csv_options) (
sfunc = csv_agg_transfn,
stype = internal,
finalfunc = csv_agg_finalfn,
parallel = safe
);
select csv_agg(x, csv_options(bom := true)) from projects x; This looks really nice! I think that's the final interface. Then we can add new options without adding more parameters, and we can name them too! |
It's really a shame that aggregates can't use named arguments...
That looks legit. I'd use a bit more consistent naming:
|
Nice! Making that change on #5 |
I think there's a future where we could have @wolfgangwalther Maybe we should do the same? Would a single |
While I think |
I was thinking to do the |
@wolfgangwalther Released v1.0 🚀
I had an old clone of nixpkgs on my machine but somehow pulling is never finishing :/. I wonder if you could give me a hand in adding |
If nothing else works, you can also do a shallow clone of nixpkgs! |
@wolfgangwalther Cool, did the PR on NixOS/nixpkgs#439223. So far just checked it works by doing:
|
Cool, we got a mention on postgres weekly: https://postgresweekly.com/issues/614 ⭐ |
Addresses the problem on PostgREST/postgrest#1102
Adds an overloaded
csv_agg
function: