Improvements and additions to COPY progress reporting

Next Topic
 
classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Improvements and additions to COPY progress reporting

Matthias van de Meent-2
Hi,

With [0] we got COPY progress reporting. Before the column names of
this newly added view are effectively set in stone with the release of
pg14, I propose the following set of relatively small patches. These
are v2, because it is a patchset that is based on a set of patches
that I previously posted in [0].

0001 Adds a column to pg_stat_progress_copy which details the amount
of tuples that were excluded from insertion by the WHERE clause of the
COPY FROM command.

0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
of 'line'-terminology. 'Line' doesn't make sense in the binary copy
case, and only for the 'text' copy format there can be a guarantee
that the source / output file actually contains the reported amount of
lines, whereas the amount of data tuples (which is also what it's
called internally) is guaranteed to equal for all data types.

There was some discussion about this in [0] where the author thought
'line' is more consistent with the CSV documentation, and where I
argued that 'tuple' is both more consistent with the rest of the
progress reporting tables and more consistent with the actual counted
items: these are the tuples serialized / inserted (as noted in the CSV
docs; "Thus the files are not strictly one line per table row like
text-format files.").

Patch 0003 adds backlinks to the progress reporting docs from the docs
of the commands that have progress reporting (re/index, cluster,
vacuum, etc.) such that progress reporting is better discoverable from
the relevant commands, and removes the datname column from the
progress_copy view (that column was never committed). This too should
be fairly trivial and uncontroversial.

0004 adds the 'command' column to the progress_copy view; which
distinguishes between COPY FROM and COPY TO. The two commands are (in
my opinion) significantly different enough to warrant this column;
similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
which also report that information. I believe that this change is
appropriate; as the semantics of the columns change depending on the
command being executed.

Lastly, 0005 adds 'io_target' to the reported information, that is,
FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
be determined based on the commands in pg_stat_activity, it is
reasonably something that a user would want to query on, as the
origin/target of COPY has security and performance implications,
whereas other options (e.g. format) are less interesting for clients
that are not executing that specific COPY command.

Of special interest in 0005 is that it reports the io_target for the
logical replications' initial tablesyncs' internal COPY. This would
otherwise be measured, but no knowledge about the type of copy (or its
origin) would be available on the worker's side. I'm not married to
this patch 0005, but I believe it could be useful, and therefore
included it in the patchset.


With regards,

Matthias van de Meent.


[0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com

v2-0005-Add-a-io_target-column-to-the-copy-progress-view.patch (8K) Download Attachment
v2-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch (7K) Download Attachment
v2-0001-Add-progress-reporting-for-excluded-rows.patch (5K) Download Attachment
v2-0003-Add-backlinks-to-progress-reporting-documentation.patch (6K) Download Attachment
v2-0004-Add-a-command-column-to-the-copy-progress-view.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Josef Šimánek
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
<[hidden email]> napsal:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].

Hello. I had this in my backlog to revisit this feature as well before
the release. Thanks for picking this up.

> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").

As an mentioned author I have no preference over line or tuple
terminology here. For some cases "line" terminology fits better, for
some "tuple" one. Docs can be improved later if needed to make it
clear at some cases (for example in most common case probably - CSV
import/export) tuple equals one line in CSV.

> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.

This was part of my initial patch as well, but I decided to strip it
out to make the final patch as small as possible to make it quickly
mergeable without need of further discussion. From my side this is
useful to have directly in the progress report as well.

> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

Similar (simplified, not supporting CALLBACK) info was also part of
the initial patch and stripped out later. I'm also +1 on this info
being useful to have directly in the progress report.

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.

All patches seem good to me. I was able to apply them to current clean
master and "make check" has succeeded without problems.

>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Bharath Rupireddy
In reply to this post by Matthias van de Meent-2
On Tue, Feb 9, 2021 at 12:06 AM Matthias van de Meent
<[hidden email]> wrote:
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].

Thanks for working on the patches. Here are some comments:

0001 - +1 to add tuples_excluded and the patch LGTM.

0002 - Yes, the tuples_processed or tuples_excluded makes more sense
to me than lines_processed and lines_excluded. The patch LGTM.

0003 - Instead of just adding the progress reporting to "See also"
sections in the footer of the respective pages analyze, cluster and
others, it would be nice if we have a mention of it in the description
as pg_basebackup has something like below:
 <para>
   Whenever <application>pg_basebackup</application> is taking a base
   backup, the server's <structname>pg_stat_progress_basebackup</structname>
   view will report the progress of the backup.
   See <xref linkend="basebackup-progress-reporting"/> for details.

0004 -
1) How about PROGRESS_COPY_COMMAND_TYPE instead of
PROGRESS_COPY_COMMAND? The names looks bit confusing with the existing
PROGRESS_COMMAND_COPY.

0005 -
1) How about
+       or <literal>CALLBACK</literal> (used in the table
synchronization background
+       worker).
instead of
+       or <literal>CALLBACK</literal> (used in the tablesync background
+       worker).
Because "table synchronization" is being used in logical-replication.sgml.

2) I think cstate->copy_src = COPY_CALLBACK is assigned after the
switch case added in copyfrom.c
    if (data_source_cb)
    {
        cstate->copy_src = COPY_CALLBACK;
        cstate->data_source_cb = data_source_cb;
    }

Also, you can add this to the current commitfest.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Josef Šimánek
In reply to this post by Matthias van de Meent-2
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
<[hidden email]> napsal:

>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].
>
> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").
>
> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.
>
> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.
I took a little deeper look and I'm not sure if I understand FILE and
STDIO. I have finally tried to finalize some initial regress testing
of COPY command progress using triggers. I have attached the initial
patch  applicable to your changes. As you can see COPY FROM STDIN is
reported as FILE. That's probably expected, but it is a little
confusing for me since STDIN and STDIO sound similar. What is the
purpose of STDIO? When is the COPY command reported with io_target of
STDIO?

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.
>
>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com

v2-0006-initial-regress-testing-of-COPY-progress.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

0010203112132233
On Tue, 9 Feb 2021 at 09:32, Josef Šimánek <[hidden email]> wrote:

>
> po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
> <[hidden email]> napsal:
> > Lastly, 0005 adds 'io_target' to the reported information, that is,
> > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> > be determined based on the commands in pg_stat_activity, it is
> > reasonably something that a user would want to query on, as the
> > origin/target of COPY has security and performance implications,
> > whereas other options (e.g. format) are less interesting for clients
> > that are not executing that specific COPY command.
>
> I took a little deeper look and I'm not sure if I understand FILE and
> STDIO. I have finally tried to finalize some initial regress testing
> of COPY command progress using triggers. I have attached the initial
> patch  applicable to your changes. As you can see COPY FROM STDIN is
> reported as FILE. That's probably expected, but it is a little
> confusing for me since STDIN and STDIO sound similar. What is the
> purpose of STDIO? When is the COPY command reported with io_target of
> STDIO?

I checked for the type of the copy_src before it was correctly set,
therefore only reporting FILE type, but this will be fixed shortly in
v3.

Matthias


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Josef Šimánek
út 9. 2. 2021 v 12:51 odesílatel 0010203112132233 <[hidden email]> napsal:

>
> On Tue, 9 Feb 2021 at 09:32, Josef Šimánek <[hidden email]> wrote:
> >
> > po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
> > <[hidden email]> napsal:
> > > Lastly, 0005 adds 'io_target' to the reported information, that is,
> > > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> > > be determined based on the commands in pg_stat_activity, it is
> > > reasonably something that a user would want to query on, as the
> > > origin/target of COPY has security and performance implications,
> > > whereas other options (e.g. format) are less interesting for clients
> > > that are not executing that specific COPY command.
> >
> > I took a little deeper look and I'm not sure if I understand FILE and
> > STDIO. I have finally tried to finalize some initial regress testing
> > of COPY command progress using triggers. I have attached the initial
> > patch  applicable to your changes. As you can see COPY FROM STDIN is
> > reported as FILE. That's probably expected, but it is a little
> > confusing for me since STDIN and STDIO sound similar. What is the
> > purpose of STDIO? When is the COPY command reported with io_target of
> > STDIO?
>
> I checked for the type of the copy_src before it was correctly set,
> therefore only reporting FILE type, but this will be fixed shortly in
> v3.

OK, would you mind to integrate my regression test initial patch as
well in v3 or should I submit it later in a separate way?

> Matthias


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Matthias van de Meent-2
In reply to this post by Bharath Rupireddy
On Tue, 9 Feb 2021 at 08:12, Bharath Rupireddy
<[hidden email]> wrote:

>
> On Tue, Feb 9, 2021 at 12:06 AM Matthias van de Meent
> <[hidden email]> wrote:
> > With [0] we got COPY progress reporting. Before the column names of
> > this newly added view are effectively set in stone with the release of
> > pg14, I propose the following set of relatively small patches. These
> > are v2, because it is a patchset that is based on a set of patches
> > that I previously posted in [0].
>
> Thanks for working on the patches. Here are some comments:
>
> 0001 - +1 to add tuples_excluded and the patch LGTM.
>
> 0002 - Yes, the tuples_processed or tuples_excluded makes more sense
> to me than lines_processed and lines_excluded. The patch LGTM.
>
> 0003 - Instead of just adding the progress reporting to "See also"
> sections in the footer of the respective pages analyze, cluster and
> others, it would be nice if we have a mention of it in the description
> as pg_basebackup has something like below:
>  <para>
>    Whenever <application>pg_basebackup</application> is taking a base
>    backup, the server's <structname>pg_stat_progress_basebackup</structname>
>    view will report the progress of the backup.
>    See <xref linkend="basebackup-progress-reporting"/> for details.
Added

> 0004 -
> 1) How about PROGRESS_COPY_COMMAND_TYPE instead of
> PROGRESS_COPY_COMMAND? The names looks bit confusing with the existing
> PROGRESS_COMMAND_COPY.

The current name is consistent with the naming of the other
command-reporting progress views; CREATEIDX and CLUSTER both use the
*_COMMAND as this column indexes' internal name.

> 0005 -
> 1) How about
> +       or <literal>CALLBACK</literal> (used in the table
> synchronization background
> +       worker).
> instead of
> +       or <literal>CALLBACK</literal> (used in the tablesync background
> +       worker).
> Because "table synchronization" is being used in logical-replication.sgml.

Fixed

> 2) I think cstate->copy_src = COPY_CALLBACK is assigned after the
> switch case added in copyfrom.c
>     if (data_source_cb)
>     {
>         cstate->copy_src = COPY_CALLBACK;
>         cstate->data_source_cb = data_source_cb;
>     }

Yes, I noticed this too while working on the patchset, but apparently
didn't act on this... Fixed in attachted version.

> Also, you can add this to the current commitfest.

See https://commitfest.postgresql.org/32/2977/

On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <[hidden email]> wrote:
>
> OK, would you mind to integrate my regression test initial patch as
> well in v3 or should I submit it later in a separate way?

Attached, with minor fixes


With regards,

Matthias van de Meent

v3-0006-Add-progress-reporting-regression-tests.patch (6K) Download Attachment
v3-0004-Add-a-command-column-to-the-copy-progress-view.patch (5K) Download Attachment
v3-0003-Add-backlinks-to-progress-reporting-documentation.patch (10K) Download Attachment
v3-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch (7K) Download Attachment
v3-0005-Add-a-io_target-column-to-the-copy-progress-view.patch (8K) Download Attachment
v3-0001-Add-progress-reporting-for-excluded-rows.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Bharath Rupireddy
On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
<[hidden email]> wrote:

> > Also, you can add this to the current commitfest.
>
> See https://commitfest.postgresql.org/32/2977/
>
> On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <[hidden email]> wrote:
> >
> > OK, would you mind to integrate my regression test initial patch as
> > well in v3 or should I submit it later in a separate way?
>
> Attached, with minor fixes

Why do we need to have a new test file progress.sql for the test
cases? Can't we add them into existing copy.sql or copy2.sql? Or do
you have a plan to add test cases into progress.sql for other progress
reporting commands?

IMO, it's better not add any new test file but add the tests to existing files.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Matthias van de Meent-2
On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
<[hidden email]> wrote:

>
> On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> <[hidden email]> wrote:
> > > Also, you can add this to the current commitfest.
> >
> > See https://commitfest.postgresql.org/32/2977/
> >
> > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <[hidden email]> wrote:
> > >
> > > OK, would you mind to integrate my regression test initial patch as
> > > well in v3 or should I submit it later in a separate way?
> >
> > Attached, with minor fixes
>
> Why do we need to have a new test file progress.sql for the test
> cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> you have a plan to add test cases into progress.sql for other progress
> reporting commands?

I don't mind moving the test into copy or copy2, but the main reason
to put it in a seperate file is to test the 'copy' component of the
feature called 'progress reporting'. If the feature instead is 'copy'
and 'progress reporting' is part of that feature, then I'd put it in
the copy-tests, but because the documentation of this has it's own
docs page  'progress reporting', and because 'copy' is a subsection of
that, I do think that this feature warrants its own regression test
file.

There are no other tests for the progress reporting feature yet,
because COPY ... FROM is the only command that is progress reported
_and_ that can fire triggers while running the command, so checking
the progress view during the progress reported command is only
feasable in COPY progress reporting. To test the other progress
reporting views, we would need multiple sessions, which I believe is
impossible in this test format. Please correct me if I'm wrong; I'd
love to add tests for the other components. That will not be in this
patchset, though.

> IMO, it's better not add any new test file but add the tests to existing files.

In general I agree, but in some cases (e.g. new system component, new
full-fledged feature), new test files are needed. I think that this
could be one of those cases.


With regards,

Matthias van de Meent


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Josef Šimánek
čt 11. 2. 2021 v 15:27 odesílatel Matthias van de Meent
<[hidden email]> napsal:

>
> On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
> <[hidden email]> wrote:
> >
> > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> > <[hidden email]> wrote:
> > > > Also, you can add this to the current commitfest.
> > >
> > > See https://commitfest.postgresql.org/32/2977/
> > >
> > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <[hidden email]> wrote:
> > > >
> > > > OK, would you mind to integrate my regression test initial patch as
> > > > well in v3 or should I submit it later in a separate way?
> > >
> > > Attached, with minor fixes
> >
> > Why do we need to have a new test file progress.sql for the test
> > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > you have a plan to add test cases into progress.sql for other progress
> > reporting commands?
>
> I don't mind moving the test into copy or copy2, but the main reason
> to put it in a seperate file is to test the 'copy' component of the
> feature called 'progress reporting'. If the feature instead is 'copy'
> and 'progress reporting' is part of that feature, then I'd put it in
> the copy-tests, but because the documentation of this has it's own
> docs page  'progress reporting', and because 'copy' is a subsection of
> that, I do think that this feature warrants its own regression test
> file.
>
> There are no other tests for the progress reporting feature yet,
> because COPY ... FROM is the only command that is progress reported
> _and_ that can fire triggers while running the command, so checking
> the progress view during the progress reported command is only
> feasable in COPY progress reporting. To test the other progress
> reporting views, we would need multiple sessions, which I believe is
> impossible in this test format. Please correct me if I'm wrong; I'd
> love to add tests for the other components. That will not be in this
> patchset, though.
>
> > IMO, it's better not add any new test file but add the tests to existing files.
>
> In general I agree, but in some cases (e.g. new system component, new
> full-fledged feature), new test files are needed. I think that this
> could be one of those cases.

I have split it since it should be the start of progress reporting
testing at all. If you better consider this as part of COPY testing,
feel free to move it to already existing copy testing related files.
There's no real reason to keep it separated if not needed.

>
> With regards,
>
> Matthias van de Meent


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Bharath Rupireddy

On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <[hidden email]> wrote:
čt 11. 2. 2021 v 15:27 odesílatel Matthias van de Meent
<[hidden email]> napsal:
>
> On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
> <[hidden email]> wrote:
> >
> > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> > <[hidden email]> wrote:
> > > > Also, you can add this to the current commitfest.
> > >
> > > See https://commitfest.postgresql.org/32/2977/
> > >
> > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <[hidden email]> wrote:
> > > >
> > > > OK, would you mind to integrate my regression test initial patch as
> > > > well in v3 or should I submit it later in a separate way?
> > >
> > > Attached, with minor fixes
> >
> > Why do we need to have a new test file progress.sql for the test
> > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > you have a plan to add test cases into progress.sql for other progress
> > reporting commands?
>
> I don't mind moving the test into copy or copy2, but the main reason
> to put it in a seperate file is to test the 'copy' component of the
> feature called 'progress reporting'. If the feature instead is 'copy'
> and 'progress reporting' is part of that feature, then I'd put it in
> the copy-tests, but because the documentation of this has it's own
> docs page  'progress reporting', and because 'copy' is a subsection of
> that, I do think that this feature warrants its own regression test
> file.
>
> There are no other tests for the progress reporting feature yet,
> because COPY ... FROM is the only command that is progress reported
> _and_ that can fire triggers while running the command, so checking
> the progress view during the progress reported command is only
> feasable in COPY progress reporting. To test the other progress
> reporting views, we would need multiple sessions, which I believe is
> impossible in this test format. Please correct me if I'm wrong; I'd
> love to add tests for the other components. That will not be in this
> patchset, though.
>
> > IMO, it's better not add any new test file but add the tests to existing files.
>
> In general I agree, but in some cases (e.g. new system component, new
> full-fledged feature), new test files are needed. I think that this
> could be one of those cases.

I have split it since it should be the start of progress reporting
testing at all. If you better consider this as part of COPY testing,
feel free to move it to already existing copy testing related files.
There's no real reason to keep it separated if not needed.

+1 to move those test cases to existing copy test files.

Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Matthias van de Meent-2
On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
<[hidden email]> wrote:

>
>
> On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <[hidden email]> wrote:
>> I have split it since it should be the start of progress reporting
>> testing at all. If you better consider this as part of COPY testing,
>> feel free to move it to already existing copy testing related files.
>> There's no real reason to keep it separated if not needed.
>
>
> +1 to move those test cases to existing copy test files.
Thanks for your reviews. PFA v4 of the patchset, in which the tests
are put into copy.sql (well, input/copy.source). This also adds tests
for correctly reporting COPY ... FROM 'file'.

I've changed the notice-alerted format from manually naming each
column to calling to_jsonb and removing the unstable columns from the
reported value; this should therefore be stable and give direct notice
to changes in the view.

With regards,

Matthias van de Meent.

v4-0004-Add-a-command-column-to-the-copy-progress-view.patch (5K) Download Attachment
v4-0006-Add-copy-progress-reporting-regression-tests.patch (6K) Download Attachment
v4-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch (7K) Download Attachment
v4-0003-Add-backlinks-to-progress-reporting-documentation.patch (10K) Download Attachment
v4-0005-Add-a-io_target-column-to-the-copy-progress-view.patch (8K) Download Attachment
v4-0001-Add-progress-reporting-for-excluded-rows.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Matthias van de Meent-2
On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent
<[hidden email]> wrote:

>
> On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
> <[hidden email]> wrote:
> >
> >
> > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <[hidden email]> wrote:
> >> I have split it since it should be the start of progress reporting
> >> testing at all. If you better consider this as part of COPY testing,
> >> feel free to move it to already existing copy testing related files.
> >> There's no real reason to keep it separated if not needed.
> >
> >
> > +1 to move those test cases to existing copy test files.
>
> Thanks for your reviews. PFA v4 of the patchset, in which the tests
> are put into copy.sql (well, input/copy.source). This also adds tests
> for correctly reporting COPY ... FROM 'file'.
PFA v5, which fixes a failure in the pg_upgrade regression tests due
to incorrect usage of @abs_builddir@. I had the changes staged, but
forgot to add them to the patches.

Sorry for the noise.

-Matthias

v5-0006-Add-copy-progress-reporting-regression-tests.patch (2K) Download Attachment
v5-0005-Add-copy-progress-reporting-regression-tests.patch (6K) Download Attachment
v5-0002-Add-backlinks-to-progress-reporting-documentation.patch (10K) Download Attachment
v5-0004-Add-a-io_target-column-to-the-copy-progress-view.patch (8K) Download Attachment
v5-0003-Add-a-command-column-to-the-copy-progress-view.patch (5K) Download Attachment
v5-0001-Rename-lines-to-tuples-in-COPY-progress-reporting.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Bharath Rupireddy
On Fri, Feb 12, 2021 at 5:40 PM Matthias van de Meent
<[hidden email]> wrote:

>
> On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent
> <[hidden email]> wrote:
> >
> > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
> > <[hidden email]> wrote:
> > >
> > >
> > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <[hidden email]> wrote:
> > >> I have split it since it should be the start of progress reporting
> > >> testing at all. If you better consider this as part of COPY testing,
> > >> feel free to move it to already existing copy testing related files.
> > >> There's no real reason to keep it separated if not needed.
> > >
> > >
> > > +1 to move those test cases to existing copy test files.
> >
> > Thanks for your reviews. PFA v4 of the patchset, in which the tests
> > are put into copy.sql (well, input/copy.source). This also adds tests
> > for correctly reporting COPY ... FROM 'file'.
>
> PFA v5, which fixes a failure in the pg_upgrade regression tests due
> to incorrect usage of @abs_builddir@. I had the changes staged, but
> forgot to add them to the patches.
>
> Sorry for the noise.

Looks like the patch 0001 that was adding tuples_excluded was missing
and cfbot is also not happy with the v5 patch set.

Maybe, we may not need 6 patches as they are relatively very small
patches. IMO, the following are enough:

0001 - tuples_excluded, lines to tuples change, COPY FROM/COPY TO
addition, io_target -- basically all the code related patches can go
into 0001
0002 - documentation
0003 - tests - I think we can only have a simple test(in copy2.sql)
showing stdin/stdout and not have file related tests. Because these
patches work as expected, please find my testing below:

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+-----------+-----------------+-------------+------------------+-----------------
 2886103 | 12977 | postgres | 16384 | COPY FROM | FILE      |
83099648 |    85777795 |          9553999 |         1111111
(1 row)

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+-----------+-----------------+-------------+------------------+-----------------
 2886103 | 12977 | postgres | 16384 | COPY FROM | STDIO     |
     0 |           0 |                0 |               0
(1 row)

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid | command | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+---------+-----------+-----------------+-------------+------------------+-----------------
 2886103 | 12977 | postgres | 16384 | COPY TO | FILE      |
37771610 |           0 |          4999228 |               0
(1 row)

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
---------+-------+----------+-------+-----------+-----------+-----------------+-------------+------------------+-----------------
 2892816 | 12977 | postgres | 16384 | COPY FROM | CALLBACK  |
249777823 |           0 |         31888892 |               0
(1 row)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Matthias van de Meent-2
On Fri, 12 Feb 2021 at 13:40, Bharath Rupireddy
<[hidden email]> wrote:

>
> On Fri, Feb 12, 2021 at 5:40 PM Matthias van de Meent
> <[hidden email]> wrote:
> >
> > On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent
> > <[hidden email]> wrote:
> > >
> > > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
> > > <[hidden email]> wrote:
> > > >
> > > >
> > > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <[hidden email]> wrote:
> > > >> I have split it since it should be the start of progress reporting
> > > >> testing at all. If you better consider this as part of COPY testing,
> > > >> feel free to move it to already existing copy testing related files.
> > > >> There's no real reason to keep it separated if not needed.
> > > >
> > > >
> > > > +1 to move those test cases to existing copy test files.
> > >
> > > Thanks for your reviews. PFA v4 of the patchset, in which the tests
> > > are put into copy.sql (well, input/copy.source). This also adds tests
> > > for correctly reporting COPY ... FROM 'file'.
> >
> > PFA v5, which fixes a failure in the pg_upgrade regression tests due
> > to incorrect usage of @abs_builddir@. I had the changes staged, but
> > forgot to add them to the patches.
> >
> > Sorry for the noise.
>
> Looks like the patch 0001 that was adding tuples_excluded was missing
> and cfbot is also not happy with the v5 patch set.
>
> Maybe, we may not need 6 patches as they are relatively very small
> patches. IMO, the following are enough:
>
> 0001 - tuples_excluded, lines to tuples change, COPY FROM/COPY TO
> addition, io_target -- basically all the code related patches can go
> into 0001
> 0002 - documentation
> 0003 - tests - I think we can only have a simple test(in copy2.sql)
> showing stdin/stdout and not have file related tests. Because these
> patches work as expected, please find my testing below:
I agree with that split, the current split was mainly for the reason
that some of the patches (except 1, 3 and 6, which are quite
substantially different from the rest) each have had their seperate
concerns voiced about the changes contained in that patch (be it
direct or indirect); e.g. the renaming of lines_* to tuples_* is in my
opionion a good thing, and Josef disagrees.

Anyway, please find attached patchset v6 applying that split.

Regarding only a simple test: I believe it is useful to have at least
a test that distinguishes between two different import types. I've
made a mistake before, so I think it is useful to add a regression
tests to prevent someone else from making this same mistake (trivial
as it may be). Additionally, testing in copy.sql also allows for
validating the bytes_total column, which cannot be tested in copy2.sql
due to the lack of COPY FROM FILE -support over there. I'm +0.5 on
keeping it as-is in copy.sql, so unless someone has some strong
feelings about this, I'd like to keep it in copy.sql.

With regards,

Matthias van de Meent.

v6-0002-Add-backlinks-to-progress-reporting-documentation.patch (10K) Download Attachment
v6-0003-Add-copy-progress-reporting-regression-tests.patch (6K) Download Attachment
v6-0001-Add-progress-reported-components-for-COPY-progres.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Justin Pryzby
--- a/doc/src/sgml/ref/analyze.sgml                                                                                                                                                                                            
+++ b/doc/src/sgml/ref/analyze.sgml                                                                                                                                                                                            
@@ -273,6 +273,12 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea                                                                                                                          
     will not record new statistics for that table.  Any existing statistics                                                                                                                                                  
     will be retained.                                                                                                                                                                                                        
   </para>                                                                                                                                                                                                                    
+                                                                                                                                                                                                                              
+  <para>                                                                                                                                                                                                                      
+    Each backend running the <command>ANALYZE</command> command will report their                                                                                                                                            
+    progress to the <structname>pg_stat_progress_analyze</structname> view.                                                                                                                                                  
+    See <xref linkend="analyze-progress-reporting"/> for details.                                                                                                                                                            
+  </para>                                                                                                                                                                                                                    

I think this should say:

"..will report its progress to.."

Or:

"The progress of each backend running >ANALYZE< is reported in the
>pg_stat_progress_analyze< view."

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Tomas Vondra-6
In reply to this post by Matthias van de Meent-2
Hi,

I agree with these changes in general - I have a couple minor comment:

1) 0001

- the SGML docs are missing a couple tags

- The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
we do this in our codebase. Move the variable declarations to the
beginning, get rid of the out block. Or something like that.

- I fir the "io_target" name misleading, because in some cases it's
actually the *source*.


2) 0002

- I believe "each backend ... reports its" (not theirs), right?

- This seems more like a generic docs improvement, not quite specific to
the COPY progress patch. It's a bit buried, maybe it should be posted
separately. OTOH it's pretty small.


3) 0003

- Some whitespace noise, triggering "git am" warnings.

- Might be good to briefly explain what the regression test does with
the triggers, etc.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v7-0001-Add-progress-reported-components-for-COPY-progress-r.patch (10K) Download Attachment
v7-0002-0001-review.patch (1K) Download Attachment
v7-0003-Add-backlinks-to-progress-reporting-documentation.patch (7K) Download Attachment
v7-0004-0002-review.patch (4K) Download Attachment
v7-0005-Add-copy-progress-reporting-regression-tests.patch (4K) Download Attachment
v7-0006-0003-review.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Matthias van de Meent-2
Hi,

Thank you all for the suggestions. PFA version 8 of the patchset, in
which I have applied most of your comments. Unless explicitly named
below, I have applied the suggestions.


On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
<[hidden email]> wrote:
>
> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
> we do this in our codebase.

I saw this being used in (re)index progress reporting, that's where I
took inspiration from. It has been fixed in the attached version.

> - I fir the "io_target" name misleading, because in some cases it's
> actually the *source*.

Yes, I was also not quite happy with this, but couldn't find a better
one at the point of writing the initial patchset. Would
"io_operations", "io_port", "operates_through" or "through" maybe be
better?


With regards,

Matthias van de Meent

v8-0001-Add-progress-reported-components-for-COPY-progres.patch (16K) Download Attachment
v8-0002-Add-backlinks-to-progress-reporting-documentation.patch (10K) Download Attachment
v8-0003-Add-copy-progress-reporting-regression-tests.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Tomas Vondra-6


On 2/18/21 4:46 PM, Matthias van de Meent wrote:
> Hi,
>
> Thank you all for the suggestions. PFA version 8 of the patchset, in
> which I have applied most of your comments. Unless explicitly named
> below, I have applied the suggestions.
>

Thanks.

>
> On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
> <[hidden email]> wrote:
>>
>> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
>> we do this in our codebase.
>
> I saw this being used in (re)index progress reporting, that's where I
> took inspiration from. It has been fixed in the attached version.
>

Hmmm, good point. I haven't looked at the other places reporting
progress and I only ever saw this pattern in old code. I kinda dislike
these blocks, but admittedly that's rather subjective view. So if other
similar places do this when reporting progress, this probably should
too. What's your opinion on this?

>> - I fir the "io_target" name misleading, because in some cases it's
>> actually the *source*.
>
> Yes, I was also not quite happy with this, but couldn't find a better
> one at the point of writing the initial patchset. Would
> "io_operations", "io_port", "operates_through" or "through" maybe be
> better?
>

No idea. Let's see if someone has a better proposal ...


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Improvements and additions to COPY progress reporting

Bharath Rupireddy
On Fri, Feb 19, 2021 at 2:34 AM Tomas Vondra
<[hidden email]> wrote:

> > On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
> > <[hidden email]> wrote:
> >>
> >> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
> >> we do this in our codebase.
> >
> > I saw this being used in (re)index progress reporting, that's where I
> > took inspiration from. It has been fixed in the attached version.
> >
>
> Hmmm, good point. I haven't looked at the other places reporting
> progress and I only ever saw this pattern in old code. I kinda dislike
> these blocks, but admittedly that's rather subjective view. So if other
> similar places do this when reporting progress, this probably should
> too. What's your opinion on this?

Actually in the code base the style of that variable declaration and
usage of pgstat_progress_update_multi_param is a mix. For instance, in
lazy_scan_heap, ReindexRelationConcurrently, the variables are
declared at the start of the function. And in _bt_spools_heapscan,
index_build, validate_index, perform_base_backup, the variables are
declared within a separate block.

IMO, we can have the arrays declared at the start of the functions
i.e. the way it's done in v8-0001, because we can extend them for
reporting some other parameter(maybe in future).

> >> - I fir the "io_target" name misleading, because in some cases it's
> >> actually the *source*.
> >
> > Yes, I was also not quite happy with this, but couldn't find a better
> > one at the point of writing the initial patchset. Would
> > "io_operations", "io_port", "operates_through" or "through" maybe be
> > better?
> >
>
> No idea. Let's see if someone has a better proposal ...

 For COPY TO the name "source_type" column and for COPY FROM the name
"destination_type" makes sense. To have a combined column name for
both, how about naming that column as "io_type"?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


12