csv format for psql

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
109 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

csv format for psql

Daniel Verite
 Hi,


This patch implements csv as an output format in psql
(\pset format csv). It's quite similar to the unaligned format,
except that it applies CSV quoting rules (obviously!) and that
it prints no footer and no title.
As with unaligned, a header with column names is output unless
tuples_only is on. It also supports the fieldsep/fielsep_zero
and recordsep/recordsep_zero settings.

Most of times, the need for CSV is covered by \copy or COPY with
the CSV option, but there are some cases where it would be more
practical to have it as an output format in psql.

* \copy does not interpolate psql variables and is a single-line
command, so making a query fit these contraints can be cumbersome.
It can be got around by defining a temporary view and
\copy from that view, but that doesn't work in a read-only context
such as when connected to a standby.

* the server-side COPY TO STDOUT can also be used from psql,
typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv,
but that's too simple to extract multiple result sets per script.
COPY is also more rigid than psql in the options to delimit
fields and records.

* copy with csv can't help for the output of meta-commands
such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
does work with these.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

psql-csv-format.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Fabien COELHO-3

Hello Daniel,

> This patch implements csv as an output format in psql
> (\pset format csv).

Would you consider registering it in the next CF?

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule
In reply to this post by Daniel Verite


2018-01-30 9:31 GMT+01:00 Daniel Verite <[hidden email]>:
 Hi,


This patch implements csv as an output format in psql
(\pset format csv). It's quite similar to the unaligned format,
except that it applies CSV quoting rules (obviously!) and that
it prints no footer and no title.
As with unaligned, a header with column names is output unless
tuples_only is on. It also supports the fieldsep/fielsep_zero
and recordsep/recordsep_zero settings.

Most of times, the need for CSV is covered by \copy or COPY with
the CSV option, but there are some cases where it would be more
practical to have it as an output format in psql.

I absolutely agree


* \copy does not interpolate psql variables and is a single-line
command, so making a query fit these contraints can be cumbersome.
It can be got around by defining a temporary view and
\copy from that view, but that doesn't work in a read-only context
such as when connected to a standby.

* the server-side COPY TO STDOUT can also be used from psql,
typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv,
but that's too simple to extract multiple result sets per script.
COPY is also more rigid than psql in the options to delimit
fields and records.

* copy with csv can't help for the output of meta-commands
such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
does work with these.

It is great - long time I miss this feature - It is interesting for scripting, ETL, ..

This format is too important, so some special short or long option can be practical (it will be printed in help)

some like --csv

I found one issue - PostgreSQL default field separator is "|". Maybe good time to use more common "," ?

Or when field separator was not explicitly defined, then use "," for CSV, and "|" for other. Although it can be little bit messy

Thank you

Pavel
 


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Daniel Verite
In reply to this post by Fabien COELHO-3
        Fabien COELHO wrote:

> > This patch implements csv as an output format in psql
> > (\pset format csv).
>
> Would you consider registering it in the next CF?

Done here:

https://commitfest.postgresql.org/17/1500/

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Daniel Verite
In reply to this post by Pavel Stehule
        Pavel Stehule wrote:

> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv

I guess -C/--csv could be used, like there already is -H/--html.

> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy

Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters,  that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.

Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule


2018-01-31 13:58 GMT+01:00 Daniel Verite <[hidden email]>:
        Pavel Stehule wrote:

> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv

I guess -C/--csv could be used, like there already is -H/--html.

> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy

Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters,  that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.

Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.


The functionality is clean and great - default setting "|" is less. I have not strong opinion about it and I understand to your arguments. Has anybody some idea?

Regards

Pavel
 

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule


2018-01-31 14:09 GMT+01:00 Pavel Stehule <[hidden email]>:


2018-01-31 13:58 GMT+01:00 Daniel Verite <[hidden email]>:
        Pavel Stehule wrote:

> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv

I guess -C/--csv could be used, like there already is -H/--html.

> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy

Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters,  that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.

Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.


The functionality is clean and great - default setting "|" is less. I have not strong opinion about it and I understand to your arguments. Has anybody some idea?

two other ideas from me

1. currently \pset format xxx command has not any other parameters. Other optional parameters can be set of possible attributes like {, ; | \t header_off escape_all}. If this set will be known by tabcomplete, then it can be very friendly and I don't think so code will be more complex than in your proposal (long options can be "--csv --csv-options=", header_off" )

some like "\pset format csv , header_off

or just limited variant with only field separator specification

\pset format csv ;

2. if we support csv format, then we can introduce \gcsv xxx | gnuplot .... It is just idea.

@1 is much more interesting than @2


Regards

Pavel


 

Regards

Pavel
 

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule
In reply to this post by Daniel Verite


2018-01-31 13:58 GMT+01:00 Daniel Verite <[hidden email]>:
        Pavel Stehule wrote:

> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
>
> some like --csv

I guess -C/--csv could be used, like there already is -H/--html.

I prefer just long option only. Maybe in future we can use short "C" for something else better. There is only few free short commands.

Regards

Pavel

 

> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
>
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy

Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters,  that complicates the user interface
For instance if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.

Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Daniel Verite
        Pavel Stehule wrote:

> > I guess -C/--csv could be used, like there already is -H/--html.
> >
>
> I prefer just long option only. Maybe in future we can use short "C" for
> something else better. There is only few free short commands.

Looking at parse_psql_options(), currently psql has 35 long options
and all of them have a corresponding single-character invocation.
Unless there is a consensus for it, I don't feel like it's time to create
the precedent that some options deserve short forms and others
don't.

Sure at some point we'll run out of letters, but looking back ten
years ago, on Jan 2008 psql had 31 options, so the rate of adding new
ones does not look worrying.

Besides I like the fact that -C can be seen as a drop-in replacement for
-A, because in most cases, it's just a safer version of -A, as it
deals with the situation that the separator might be in the contents,
which is the main weakness of -A.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Daniel Verite
In reply to this post by Pavel Stehule
        Pavel Stehule wrote:

> some like "\pset format csv , header_off

If we did that, we'd need to reconsider the interactions
of this with \t on|off and \pset footer on|off
and how to keep things consistent with the unaligned
format.
I feel like it's not worth the trouble.
We can still add this later if users are confused with
the interface, but that interface is already well
established with the unaligned format.

> 2. if we support csv format, then we can introduce \gcsv xxx | gnuplot ....

I thought committers rejected this because they didn't want a
new variant of \g
The csv format makes it possible to not add any new \g-style
metacommand and inject csv into the existing ones.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule


2018-02-12 15:33 GMT+01:00 Daniel Verite <[hidden email]>:
        Pavel Stehule wrote:

> some like "\pset format csv , header_off

If we did that, we'd need to reconsider the interactions
of this with \t on|off and \pset footer on|off
and how to keep things consistent with the unaligned
format.

ok

 
I feel like it's not worth the trouble.
We can still add this later if users are confused with
the interface, but that interface is already well
established with the unaligned format.

> 2. if we support csv format, then we can introduce \gcsv xxx | gnuplot ....

I thought committers rejected this because they didn't want a
new variant of \g
The csv format makes it possible to not add any new \g-style
metacommand and inject csv into the existing ones.


ok
 

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Fabien COELHO-3
In reply to this post by Daniel Verite

Hello Daniel,

> This patch implements csv as an output format in psql
> (\pset format csv). It's quite similar to the unaligned format,
> except that it applies CSV quoting rules (obviously!) and that
> it prints no footer and no title.
> As with unaligned, a header with column names is output unless
> tuples_only is on. It also supports the fieldsep/fielsep_zero
> and recordsep/recordsep_zero settings.

Patch applies cleanly and compiles. "make check" ok, although there is
no specific test for this feature...

The documentation should mention the other CSV options (COPY, \copy, ...)
and explain how they compare to this one. Maybe a specific paragraph about
how to do CSV? I understand "\pset format csv" as triggering that all
outputs compared to per command options.

Given that this is somehow already available, I'm wondering why there is
no code sharing.

I find it annoying that setting csv keeps the default '|' separator, where
ISTM that it should be by default "," (as in COMMA separated value:-).
However it would not be a good idea to change another variables when setting
one, obviously.

Maybe some \csv command could set the format to csv, fieldsep to ",",
tuples_only to on, recordsep to '\n'? Not sure whether it would be
acceptable, though, and how to turn it off once turned on... Probably an
average (aka not good) idea:-)

The format adds column headers, however they are not escaped properly:

   psql> \pset format csv
   psql> \pset fieldsep ,
   psql> SELECT 1 AS "hello, world", 2 AS """";
     hello, world,"
     1,2

Also it does not seem to work properly in expanded mode, both for the
column and values:

   psql> \x
   psql> SELECT 1 AS "bla""", E'\n,"' AS foo;
     bla",1
     foo,
     ,"

There MUST be some tests, especially with ugly stuff (escapes, newlines,
double quotes, various types, expanded or not, field seps, strange
column names...).


> Most of times, the need for CSV is covered by \copy or COPY with
> the CSV option, but there are some cases where it would be more
> practical to have it as an output format in psql.
>
> * \copy does not interpolate psql variables and is a single-line
> command, so making a query fit these contraints can be cumbersome.
> It can be got around by defining a temporary view and
> \copy from that view, but that doesn't work in a read-only context
> such as when connected to a standby.
>
> * the server-side COPY TO STDOUT can also be used from psql,
> typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv,
> but that's too simple to extract multiple result sets per script.
> COPY is also more rigid than psql in the options to delimit
> fields and records.
>
> * copy with csv can't help for the output of meta-commands
> such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
> does work with these.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Daniel Verite
        Fabien COELHO wrote:

> Maybe some \csv command could set the format to csv, fieldsep to ",",
> tuples_only to on, recordsep to '\n'? Not sure whether it would be
> acceptable, though, and how to turn it off once turned on... Probably an
> average (aka not good) idea:-)

Thanks for reviewing this patch!

Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C
as discussed upthread. I'll add some regression tests shortly.

About the default separator, the approach taken by this patch is that
the csv output is equivalent to unaligned with csv quoting, so it has
the same behavior and follows the options of the unaligned mode
as much as possible.

That seemed like a good idea to me when I wrote it, but maybe
a better idea would to have a new \pset fieldsep_csv
parameter with its own command-line option, and ignore {fieldsep,
fieldsep_zero, recordsep, recordsep_zero} for this format, just like
they are being ignored for HTML, or latex, or asciidoc...
Does anyone who think that csv should be added care
about this "alignment" with unaligned? :)


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

psql-csv-format-v2.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule


2018-03-01 17:10 GMT+01:00 Daniel Verite <[hidden email]>:
        Fabien COELHO wrote:

> Maybe some \csv command could set the format to csv, fieldsep to ",",
> tuples_only to on, recordsep to '\n'? Not sure whether it would be
> acceptable, though, and how to turn it off once turned on... Probably an
> average (aka not good) idea:-)

Thanks for reviewing this patch!

Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C
as discussed upthread. I'll add some regression tests shortly.

About the default separator, the approach taken by this patch is that
the csv output is equivalent to unaligned with csv quoting, so it has
the same behavior and follows the options of the unaligned mode
as much as possible.

That seemed like a good idea to me when I wrote it, but maybe
a better idea would to have a new \pset fieldsep_csv
parameter with its own command-line option, and ignore {fieldsep,
fieldsep_zero, recordsep, recordsep_zero} for this format, just like
they are being ignored for HTML, or latex, or asciidoc...
Does anyone who think that csv should be added care
about this "alignment" with unaligned? :)

good idea

Pavel
 


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Fabien COELHO-3
In reply to this post by Daniel Verite

Hello Daniel,

> Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C
> as discussed upthread. I'll add some regression tests shortly.

Basically I'm waiting for the version with regression tests before
reviewing.

It is unclear whether committer will like it.

From my point of view being able to simply set postgres to generate csv is
fine with me, with example uses such as:

  psql --csv 'TABLE Stuff;' > stuff.csv

So that having the --csv option to turn to "full csv", i.e. set the format
and various seperators as expected, would be a nice have.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule


2018-03-07 10:45 GMT+01:00 Fabien COELHO <[hidden email]>:

Hello Daniel,

Attached is a v2 fixing the bugs you mentioned, and adding ---csv/-C
as discussed upthread. I'll add some regression tests shortly.

Basically I'm waiting for the version with regression tests before reviewing.

It is unclear whether committer will like it.

From my point of view being able to simply set postgres to generate csv is fine with me, with example uses such as:

        psql --csv 'TABLE Stuff;' > stuff.csv

So that having the --csv option to turn to "full csv", i.e. set the format and various seperators as expected, would be a nice have.

There is commad -c and it should be used. The --csv options should not to have a parameter. I don't like a idea to have more options for query execution.

Regards

Pavel
 

--
Fabien.


Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Fabien COELHO-3

Hello Pavel,

>>         psql --csv 'TABLE Stuff;' > stuff.csv
>
> There is commad -c and it should be used. The --csv options should not to
> have a parameter. I don't like a idea to have more options for query
> execution.

Yes, I agree and that is indeed what I meant, sorry for the typo. The
cleaner example would be something like:

   psql --csv -c 'TABLE foo' > foo.csv

With a -c to introduce the command.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Pavel Stehule


2018-03-07 19:40 GMT+01:00 Fabien COELHO <[hidden email]>:

Hello Pavel,

        psql --csv 'TABLE Stuff;' > stuff.csv

There is commad -c and it should be used. The --csv options should not to
have a parameter. I don't like a idea to have more options for query
execution.

Yes, I agree and that is indeed what I meant, sorry for the typo. The cleaner example would be something like:

  psql --csv -c 'TABLE foo' > foo.csv

With a -c to introduce the command.

ok :) it has sense now

Regards

Pavel

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

David Fetter
In reply to this post by Fabien COELHO-3
On Wed, Mar 07, 2018 at 07:40:49PM +0100, Fabien COELHO wrote:

>
> Hello Pavel,
>
> >>        psql --csv 'TABLE Stuff;' > stuff.csv
> >
> >There is commad -c and it should be used. The --csv options should not to
> >have a parameter. I don't like a idea to have more options for query
> >execution.
>
> Yes, I agree and that is indeed what I meant, sorry for the typo. The
> cleaner example would be something like:
>
>   psql --csv -c 'TABLE foo' > foo.csv
>
> With a -c to introduce the command.

This seems pretty specialized.  If we're adding something new, how about

    psql --format=csv -o foo.csv -c 'TABLE foo'

Or we could stick with:

    psql -P format=csv -o foo.csv -c 'TABLE foo'

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Reply | Threaded
Open this post in threaded view
|

Re: csv format for psql

Fabien COELHO-3

>>   psql --csv -c 'TABLE foo' > foo.csv
>>
>> With a -c to introduce the command.
>
> This seems pretty specialized.  If we're adding something new, how about
>
>    psql --format=csv -o foo.csv -c 'TABLE foo'
>
> Or we could stick with:
>
>    psql -P format=csv -o foo.csv -c 'TABLE foo'

Currently "-P format=csv" uses the unaligned formating separators, i.e.
'|' is used. I was suggesting that a special long option could switch
several variables to some specific values, i.e.

   --csv

Would be equivalent to something like:

   -P format=csv -P fieldsep=, -P recordsep=\n (?) -P tuples_only=on ...

I.e. really generate some csv from the data in just one option, not many.

But this is obviously debatable.

--
Fabien.

1234 ... 6
Previous Thread Next Thread