Create a Foreign Table for PostgreSQL CSV Logs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Create a Foreign Table for PostgreSQL CSV Logs

PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/runtime-config-logging.html
Description:

There is some imperfection  in PostgreSQL doc. There are two section in
documentation how to import PostgreSQL log into sql space and both not
complete. One of them is:
https://www.postgresql.org/docs/current/file-fdw.html
Example F.1. Create a Foreign Table for PostgreSQL CSV Logs
Here is described how to config file-fdw to get access to the PostgreSQL
logs, but there is not description or a link how to config PostgreSQL log
for this.

Other section is:
https://www.postgresql.org/docs/12/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG
19.8.4. Using CSV-Format Log Output
Here is described how to configure PostgreSQL log to import it into
previously created ordinary table by COPY command. May be this configuration
is compatible with "Example F.1. Create a Foreign Table for PostgreSQL CSV
Logs", but this method is not mentioned.

My idea to improve documentation by adding cross link: to section «Example
F.1. Create a Foreign Table for PostgreSQL CSV Logs» add link to «19.8.4.
Using CSV-Format Log Output» as example of log configuration and to section
«19.8.4. Using CSV-Format Log Output» add link to «Example F.1. Create a
Foreign Table for PostgreSQL CSV Logs» as one more example how PostgreSQL
logs can be imported inside PostgreSQL.
Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Bruce Momjian
On Fri, Aug 21, 2020 at 09:25:20AM +0000, PG Doc comments form wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/runtime-config-logging.html
> Description:
>
> There is some imperfection  in PostgreSQL doc. There are two section in
> documentation how to import PostgreSQL log into sql space and both not
> complete. One of them is:
> https://www.postgresql.org/docs/current/file-fdw.html
> Example F.1. Create a Foreign Table for PostgreSQL CSV Logs
> Here is described how to config file-fdw to get access to the PostgreSQL
> logs, but there is not description or a link how to config PostgreSQL log
> for this.
>
> Other section is:
> https://www.postgresql.org/docs/12/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG
> 19.8.4. Using CSV-Format Log Output
> Here is described how to configure PostgreSQL log to import it into
> previously created ordinary table by COPY command. May be this configuration
> is compatible with "Example F.1. Create a Foreign Table for PostgreSQL CSV
> Logs", but this method is not mentioned.
>
> My idea to improve documentation by adding cross link: to section «Example
> F.1. Create a Foreign Table for PostgreSQL CSV Logs» add link to «19.8.4.
> Using CSV-Format Log Output» as example of log configuration and to section
> «19.8.4. Using CSV-Format Log Output» add link to «Example F.1. Create a
> Foreign Table for PostgreSQL CSV Logs» as one more example how PostgreSQL
> logs can be imported inside PostgreSQL.
Good idea.  People have been confused about this before.  Attached is a
patch.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


csv.diff (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

David G Johnston
On Fri, Aug 21, 2020 at 2:58 PM Bruce Momjian <[hidden email]> wrote:
On Fri, Aug 21, 2020 at 09:25:20AM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/runtime-config-logging.html
> Description:
>
> There is some imperfection  in PostgreSQL doc. There are two section in
> documentation how to import PostgreSQL log into sql space and both not
> complete. One of them is:
> https://www.postgresql.org/docs/current/file-fdw.html
> Example F.1. Create a Foreign Table for PostgreSQL CSV Logs
> Here is described how to config file-fdw to get access to the PostgreSQL
> logs, but there is not description or a link how to config PostgreSQL log
> for this.
>
> Other section is:
> https://www.postgresql.org/docs/12/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG
> 19.8.4. Using CSV-Format Log Output
> Here is described how to configure PostgreSQL log to import it into
> previously created ordinary table by COPY command. May be this configuration
> is compatible with "Example F.1. Create a Foreign Table for PostgreSQL CSV
> Logs", but this method is not mentioned.
>
> My idea to improve documentation by adding cross link: to section «Example
> F.1. Create a Foreign Table for PostgreSQL CSV Logs» add link to «19.8.4.
> Using CSV-Format Log Output» as example of log configuration and to section
> «19.8.4. Using CSV-Format Log Output» add link to «Example F.1. Create a
> Foreign Table for PostgreSQL CSV Logs» as one more example how PostgreSQL
> logs can be imported inside PostgreSQL.

Good idea.  People have been confused about this before.  Attached is a
patch.

+ It is also possible to access the file as a foreign data wrapper
+        using <xref linkend="file-fdw"/>.

Seems more accurate to say "It is also possible to access the file as a foreign table, using the supplied <xref linkend="file-fdw"/> module."

The file_fdw -> config change looks good.


A bit off-topic, but since this is being touched anyway - the listing of fields in the paragraph is not particularly readable (but maybe we want to keep it for accessibility reasons?) while the CREATE TABLE statement is very readable and more accurate, though it could be better.  Adding CHECK constraints and -- comments to the CREATE TABLE command would be a welcome addition.  In particular I noticed:

paragraph: client host:port number
example: connection_from text,

could become:

connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the host and port of the client, colon-separated

I've been mentally playing around with the idea of having the Config section with the CREATE TABLE somehow describe both the plain table and foreign table variants directly and removing the example from the file_fdw section and instead leaving the cross-references in place from file_fdw to config to see the example and from config to file_fdw to get clarity on the options and the SERVER syntax.  As they are being written for copy-and-paste though, and it's not like we are going to change the format, having the table definition duplicated isn't a terrible option.  But consolidation is something to consider.

I may pick this up in the future unless someone thinks it wouldn't be a good idea.  I would be removing the paragraph of field names and make the table specification authoritative.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Bruce Momjian
On Fri, Aug 21, 2020 at 08:41:54PM -0700, David G. Johnston wrote:

> On Fri, Aug 21, 2020 at 2:58 PM Bruce Momjian <[hidden email]> wrote:
>     Good idea.  People have been confused about this before.  Attached is a
>     patch.
>
>
> + It is also possible to access the file as a foreign data wrapper
> +        using <xref linkend="file-fdw"/>.
>
> Seems more accurate to say "It is also possible to access the file as a foreign
> table, using the supplied <xref linkend="file-fdw"/> module."
>
> The file_fdw -> config change looks good.
OK, updated patch attached.

> A bit off-topic, but since this is being touched anyway - the listing of fields
> in the paragraph is not particularly readable (but maybe we want to keep it for
> accessibility reasons?) while the CREATE TABLE statement is very readable and
> more accurate, though it could be better.  Adding CHECK constraints and --
> comments to the CREATE TABLE command would be a welcome addition.  In
> particular I noticed:
>
> paragraph: client host:port number
> example: connection_from text,
>
> could become:
>
> connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the host and
> port of the client, colon-separated
>
> I've been mentally playing around with the idea of having the Config section
> with the CREATE TABLE somehow describe both the plain table and foreign table
> variants directly and removing the example from the file_fdw section and
> instead leaving the cross-references in place from file_fdw to config to see
> the example and from config to file_fdw to get clarity on the options and the
> SERVER syntax.  As they are being written for copy-and-paste though, and it's
> not like we are going to change the format, having the table definition
> duplicated isn't a terrible option.  But consolidation is something to
> consider.
>
> I may pick this up in the future unless someone thinks it wouldn't be a good
> idea.  I would be removing the paragraph of field names and make the table
> specification authoritative.
I am a little worried about adding this since the data is generated in
an automated way, and might change, or some config value might change
its format.  I think the example is to show how to load, and adding extra
constraints would just detract from the illustration, I think.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee


csv.diff (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

David G Johnston
On Sat, Aug 22, 2020 at 10:51 AM Bruce Momjian <[hidden email]> wrote:
On Fri, Aug 21, 2020 at 08:41:54PM -0700, David G. Johnston wrote:
> Adding CHECK constraints and --
> comments to the CREATE TABLE command would be a welcome addition.  In
> particular I noticed:
>
> paragraph: client host:port number
> example: connection_from text,
>
> could become:
>
> connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the host and
> port of the client, colon-separated

> I may pick this up in the future unless someone thinks it wouldn't be a good
> idea.  I would be removing the paragraph of field names and make the table
> specification authoritative.

I am a little worried about adding this since the data is generated in
an automated way, and might change, or some config value might change
its format.  I think the example is to show how to load, and adding extra
constraints would just detract from the illustration, I think.

Good constraints wouldn't include specific values, just general structure when appropriate.  The existing documentation already does this in the provided example - though relegating things to just comments would still be an improvement.  I see it less as an illustration in the Config section and more of a specification.  A config value changing its format is going to be something anyone using this example would have to understand and adapt to - it still seems worthwhile to have it fleshed out for the default.

The middle of the config section doesn't seem like a great place to teach that the COPY command exists.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Олег Самойлов
There must not be constraints at all. Constraints are needed to check incoming data to the table. But here table is read-only for database!

So all your constraints is totally useless.
> > could become:
> >
> > connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the host and
> > port of the client, colon-separated
> David J.



Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

David G Johnston
On Mon, Aug 24, 2020 at 6:17 AM Олег Самойлов <[hidden email]> wrote:
There must not be constraints at all. Constraints are needed to check incoming data to the table. But here table is read-only for database!

Please don't top-post.


So all your constraints is totally useless.
> > could become:
> >
> > connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the host and
> > port of the client, colon-separated

I'll agree that the benefit for adding the constraints to a foreign table are less than for a normal table but it is still not zero.  Constraints are also a form of documentation.  And also can be used (at least non-null ones) during optimization.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Bruce Momjian
On Mon, Aug 24, 2020 at 07:40:49AM -0700, David G. Johnston wrote:

> On Mon, Aug 24, 2020 at 6:17 AM Олег Самойлов <[hidden email]> wrote:
>
>     There must not be constraints at all. Constraints are needed to check
>     incoming data to the table. But here table is read-only for database!
>
>
> Please don't top-post.
>
>
>
>     So all your constraints is totally useless.
>     > > could become:
>     > >
>     > > connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the
>     host and
>     > > port of the client, colon-separated
>
>
> I'll agree that the benefit for adding the constraints to a foreign table are
> less than for a normal table but it is still not zero.  Constraints are also a
> form of documentation.  And also can be used (at least non-null ones) during
> optimization.

I feel constraints are going to lose focus of what we are trying to
show.  Do the constraints actually do anything on a foreign table?  If
not, we would have to mention that here too, which might be fine.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Олег Самойлов


> 24 авг. 2020 г., в 18:07, Bruce Momjian <[hidden email]> написал(а):
>
> On Mon, Aug 24, 2020 at 07:40:49AM -0700, David G. Johnston wrote:
>> On Mon, Aug 24, 2020 at 6:17 AM Олег Самойлов <[hidden email]> wrote:
>>
>>    There must not be constraints at all. Constraints are needed to check
>>    incoming data to the table. But here table is read-only for database!
>>
>>
>> Please don't top-post.
>>
>>
>>
>>    So all your constraints is totally useless.
>>>> could become:
>>>>
>>>> connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the
>>    host and
>>>> port of the client, colon-separated
>>
>>
>> I'll agree that the benefit for adding the constraints to a foreign table are
>> less than for a normal table but it is still not zero.  Constraints are also a
>> form of documentation.  And also can be used (at least non-null ones) during
>> optimization.
>
> I feel constraints are going to lose focus of what we are trying to
> show.  Do the constraints actually do anything on a foreign table?  

I'll add: "on foreign table based on read only text file without indexes, etc". :) But in this case I indeed added some CHECK() constraints for old PostgreSQL and change them for the partition syntax sugar on new PostgreSQL to point on different files according to PostgreSQL default log config, where logs of each day of a week is kept in the different files. For example:

ALTER SYSTEM SET log_destination=csvlog;
SELECT pg_reload_conf();
CREATE EXTENSION file_fdw;
CREATE SERVER file_fdw FOREIGN DATA WRAPPER file_fdw;
BEGIN;
CREATE SCHEMA pglog;
CREATE TABLE pglog.pglog (
        log_time timestamp(3),
        user_name text,
        database_name text,
        process_id integer,
        connection_from text,
        session_id text,
        session_line_num bigint,
        command_tag text,
        session_start_time timestamp with time zone,
        virtual_transaction_id text,
        transaction_id bigint,
        error_severity text,
        sql_state_code text,
        message text,
        detail text,
        hint text,
        internal_query text,
        internal_query_pos integer,
        context text,
        query text,
        query_pos integer,
        location text,
        application_name text
) PARTITION BY LIST (date_part('isodow', log_time));
CREATE FOREIGN TABLE pglog.Mon
        PARTITION OF pglog.pglog FOR VALUES IN (1)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Mon.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Tue
        PARTITION OF pglog.pglog FOR VALUES IN (2)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Tue.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Wed
        PARTITION OF pglog.pglog FOR VALUES IN (3)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Wed.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Thu
        PARTITION OF pglog.pglog FOR VALUES IN (4)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Thu.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Fri
        PARTITION OF pglog.pglog FOR VALUES IN (5)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Fri.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Sat
        PARTITION OF pglog.pglog FOR VALUES IN (6)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Sat.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Sun
        PARTITION OF pglog.pglog FOR VALUES IN (7)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Sun.csv', format 'csv' );
COMMIT;

But there is another point of view. The same table schema used now not only for file_fdw access to the log, but also in example where this is an ordinary table. In this case may be some other optimisation: indexes, etc.

But do we really need in the simple example such detailed and specific code? May be better give as simple as possible example as example, which everyone will can adapt for his own needs.

Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Николай Чадаев
Thank You Oleg.
 
A simple example is extremely important and necessary.
I will wait for this example.
 
-- 
Nick Chadaev
 
      +7(916)175-3061
      +7(958)820-7975
 
 
 
 
24.08.2020, 18:38, "Олег Самойлов" <[hidden email]>:


 

 24 авг. 2020 г., в 18:07, Bruce Momjian <[hidden email]> написал(а):
 
 On Mon, Aug 24, 2020 at 07:40:49AM -0700, David G. Johnston wrote:
 On Mon, Aug 24, 2020 at 6:17 AM Олег Самойлов <[hidden email]> wrote:
 
    There must not be constraints at all. Constraints are needed to check
    incoming data to the table. But here table is read-only for database!
 
 
 Please don't top-post.
 
 
 
    So all your constraints is totally useless.
 could become:
 
 connection_from text check(connection_from ~ '^[^:]+:[0-9]+$) -- the
    host and
 port of the client, colon-separated
 
 
 I'll agree that the benefit for adding the constraints to a foreign table are
 less than for a normal table but it is still not zero. Constraints are also a
 form of documentation. And also can be used (at least non-null ones) during
 optimization.
 
 I feel constraints are going to lose focus of what we are trying to
 show. Do the constraints actually do anything on a foreign table?


I'll add: "on foreign table based on read only text file without indexes, etc". :) But in this case I indeed added some CHECK() constraints for old PostgreSQL and change them for the partition syntax sugar on new PostgreSQL to point on different files according to PostgreSQL default log config, where logs of each day of a week is kept in the different files. For example:

ALTER SYSTEM SET log_destination=csvlog;
SELECT pg_reload_conf();
CREATE EXTENSION file_fdw;
CREATE SERVER file_fdw FOREIGN DATA WRAPPER file_fdw;
BEGIN;
CREATE SCHEMA pglog;
CREATE TABLE pglog.pglog (
        log_time timestamp(3),
        user_name text,
        database_name text,
        process_id integer,
        connection_from text,
        session_id text,
        session_line_num bigint,
        command_tag text,
        session_start_time timestamp with time zone,
        virtual_transaction_id text,
        transaction_id bigint,
        error_severity text,
        sql_state_code text,
        message text,
        detail text,
        hint text,
        internal_query text,
        internal_query_pos integer,
        context text,
        query text,
        query_pos integer,
        location text,
        application_name text
) PARTITION BY LIST (date_part('isodow', log_time));
CREATE FOREIGN TABLE pglog.Mon
        PARTITION OF pglog.pglog FOR VALUES IN (1)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Mon.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Tue
        PARTITION OF pglog.pglog FOR VALUES IN (2)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Tue.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Wed
        PARTITION OF pglog.pglog FOR VALUES IN (3)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Wed.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Thu
        PARTITION OF pglog.pglog FOR VALUES IN (4)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Thu.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Fri
        PARTITION OF pglog.pglog FOR VALUES IN (5)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Fri.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Sat
        PARTITION OF pglog.pglog FOR VALUES IN (6)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Sat.csv', format 'csv' );
CREATE FOREIGN TABLE pglog.Sun
        PARTITION OF pglog.pglog FOR VALUES IN (7)
        SERVER file_fdw
        OPTIONS ( filename '/var/lib/pgsql/12/data/log/postgresql-Sun.csv', format 'csv' );
COMMIT;

But there is another point of view. The same table schema used now not only for file_fdw access to the log, but also in example where this is an ordinary table. In this case may be some other optimisation: indexes, etc.

But do we really need in the simple example such detailed and specific code? May be better give as simple as possible example as example, which everyone will can adapt for his own needs.

Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Олег Самойлов


> 26 Aug 2020, в 09:24, Николай Чадаев <[hidden email]> написал(а):
>
> Thank You Oleg.
>  
> A simple example is extremely important and necessary.
> I will wait for this example.

It's already exists. The idea was to add cross links to this.



Reply | Threaded
Open this post in threaded view
|

Re: Create a Foreign Table for PostgreSQL CSV Logs

Bruce Momjian
In reply to this post by Bruce Momjian
On Sat, Aug 22, 2020 at 01:51:56PM -0400, Bruce Momjian wrote:

> On Fri, Aug 21, 2020 at 08:41:54PM -0700, David G. Johnston wrote:
> > On Fri, Aug 21, 2020 at 2:58 PM Bruce Momjian <[hidden email]> wrote:
> >     Good idea.  People have been confused about this before.  Attached is a
> >     patch.
> >
> >
> > + It is also possible to access the file as a foreign data wrapper
> > +        using <xref linkend="file-fdw"/>.
> >
> > Seems more accurate to say "It is also possible to access the file as a foreign
> > table, using the supplied <xref linkend="file-fdw"/> module."
> >
> > The file_fdw -> config change looks good.
>
> OK, updated patch attached.

Patch applied.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee