Progress reporting for pg_verify_checksums

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
65 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Progress reporting for pg_verify_checksums

Michael Banck-2
Hi,

my colleague Bernd Helmle recently added progress reporting to our
pg_checksums application[1]. I have now forward ported this to
pg_verify_checksums for the September commitfest, please see the
attached patch.

Here's the description:

This optionally prints the progress of pg_verify_checksums via read
kilobytes to the terminal with the new command line argument -P.

If -P was forgotten and pg_verify_checksums operates on a large cluster,
the caller can send SIGUSR1 to pg_verify_checksums to turn progress
status reporting on during runtime.


Michael

[1] https://github.com/credativ/pg_checksums/commit/2b691
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

pg_verify_checksums_progress.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3

Hallo Michael,

> my colleague Bernd Helmle recently added progress reporting to our
> pg_checksums application[1]. I have now forward ported this to
> pg_verify_checksums for the September commitfest, please see the
> attached patch.

It seems that the patch does not apply cleanly on master, neither with
"git apply" nor "patch". Could you rebase it?

> Here's the description:
>
> This optionally prints the progress of pg_verify_checksums via read
> kilobytes

MB? GB?

> to the terminal with the new command line argument -P.
>
> If -P was forgotten and pg_verify_checksums operates on a large cluster,
> the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> status reporting on during runtime.

Hmmm. I cannot say I like the signal feature much. Would it make sense for
the progress to be on by default, and to have a quiet option instead?

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Alvaro Herrera-9
On 2018-Sep-01, Fabien COELHO wrote:

> > If -P was forgotten and pg_verify_checksums operates on a large cluster,
> > the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> > status reporting on during runtime.
>
> Hmmm. I cannot say I like the signal feature much. Would it make sense for
> the progress to be on by default, and to have a quiet option instead?

Hmm, I recall this technique being used elsewhere and is sometimes
useful.  Can't remember where though -- by manpages, it's not rsync nor
pv ...

How about making it a toggle?  Default off, enable-able by option,
toggleable by signal.  (If you enable it via the signal, what's the rate
to report at?)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Banck-2
Hi,

On Mon, Sep 03, 2018 at 11:21:32AM -0300, Alvaro Herrera wrote:

> On 2018-Sep-01, Fabien COELHO wrote:
> > > If -P was forgotten and pg_verify_checksums operates on a large cluster,
> > > the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> > > status reporting on during runtime.
> >
> > Hmmm. I cannot say I like the signal feature much. Would it make sense for
> > the progress to be on by default, and to have a quiet option instead?
>
> Hmm, I recall this technique being used elsewhere and is sometimes
> useful.  Can't remember where though -- by manpages, it's not rsync nor
> pv ...
 
e2fsck does it. And I think it would be useful for pg_basebackup as
well, but that's for another time.

In any case, pg_basebackup has the same option and it is off by default
as well.  I guess the rationale for not having it on by default for
pg_basebackup is the fact is does not have any other output normally.

pg_verify_checksums writes a report at the end however, but maybe that
one could be demoted to verbose mode now that we have it?

> How about making it a toggle?  Default off, enable-able by option,
> toggleable by signal.  

That sounds like a good idea.

> (If you enable it via the signal, what's the rate to report at?)

You mean how often it reports progress (that seems independent of the
signal)? From testing, it's roughly every second or so.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Banck-2
In reply to this post by Michael Banck-2
Hi,

Am Freitag, den 31.08.2018, 14:50 +0200 schrieb Michael Banck:

> my colleague Bernd Helmle recently added progress reporting to our
> pg_checksums application[1]. I have now forward ported this to
> pg_verify_checksums for the September commitfest, please see the
> attached patch.
>
> Here's the description:
>
> This optionally prints the progress of pg_verify_checksums via read
> kilobytes to the terminal with the new command line argument -P.
>
> If -P was forgotten and pg_verify_checksums operates on a large cluster,
> the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> status reporting on during runtime.
Version 2 of the patch is attached. This is rebased to master as of
422952ee and changes the signal handling to be a toggle as suggested by
Alvaro.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

pg_verify_checksums_progress_V2.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3

Hallo Michael,

>> This optionally prints the progress of pg_verify_checksums via read
>> kilobytes to the terminal with the new command line argument -P.
>>
>> If -P was forgotten and pg_verify_checksums operates on a large cluster,
>> the caller can send SIGUSR1 to pg_verify_checksums to turn progress
>> status reporting on during runtime.
>
> Version 2 of the patch is attached. This is rebased to master as of
> 422952ee and changes the signal handling to be a toggle as suggested by
> Alvaro.

Patch applies cleanly and compiles.

About tests: "make check" is okay, but ITSM that the command is not
started once, ever, in any test:-( It is unclear whether any test triggers
checksums anyway...

The time() granularity to the second makes the result awkward on small
tests:

  8/1554552 kB (0%, 8 kB/s)
  1040856/1554552 kB (66%, 1040856 kB/s)
  1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.

The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
about storage which is usually measured in powers of 1,000, so I'd suggest
to use thousands.

Another reserve I have is that on any hardware it is likely that there
will be a lot of kilos, so maybe megas (MB) should be used instead.

I'm wondering whether the bandwidth display should be local (from the last
display) or global (from the start of the command), but for the last
forced one. This is an open question.

Maybe it would be nice to show elapsed time and expected completion time
based on the current speed.

I would be in favor or having this turned on by default, and silenced with
some option. I'm not sure other people would agree, though, so it is an
open question as well.

About the code:

  +               if (show_progress)
  +                       show_progress = false;
  +               else
  +                       show_progress = true;

Why not a simple:

  /* invert current state */
  show_progress = !show_progress;

I do not see much advantage in using intermediate string buffers for
values:

  + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
  +          total_size / 1024);
  + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
  +          current_size / 1024);
  + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
  +          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
  + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
  +          currentstr, totalstr, total_percent, currspeedstr);

ISTM that just one simple fprintf would be fine:

   fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
           formulas for each slot...);

ISTM that this line overwriting trick deserves a comment:

  + if (isatty(fileno(stderr)))
  +   fprintf(stderr, "\r");
  + else
  +   fprintf(stderr, "\n");

And it runs a little amok with "-v".

  + memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.

   +   /* Make sure we just report at least once a second */
   +   if ((now == last_progress_update) && !force) return;

The comments seems contradictory, I understand the code makes sure that it
is "at most" once a second. Pgbench uses "-P XXX" to strigger a progress
report every XXX second. I'm unsure whether it would be useful to allow
the user to change the 1 second display interval.

I'm not sure why you removed one unrelated line:

   #include "storage/checksum.h"
   #include "storage/checksum_impl.h"

  -
   static int64 files = 0;
   static int64 blocks = 0;
   static int64 badblocks = 0;


There is a problem in the scan_file code: The added sizeonly parameter is
not used. It should be removed.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Banck-2
Hi,

thanks for the review!

On Wed, Sep 19, 2018 at 05:17:05PM +0200, Fabien COELHO wrote:

> >>This optionally prints the progress of pg_verify_checksums via read
> >>kilobytes to the terminal with the new command line argument -P.
> >>
> >>If -P was forgotten and pg_verify_checksums operates on a large cluster,
> >>the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> >>status reporting on during runtime.
> >
> >Version 2 of the patch is attached. This is rebased to master as of
> >422952ee and changes the signal handling to be a toggle as suggested by
> >Alvaro.
>
> Patch applies cleanly and compiles.
>
> About tests: "make check" is okay, but ITSM that the command is not started
> once, ever, in any test:-( It is unclear whether any test triggers checksums
> anyway...
Yeah, this was discussed on another thread and some basic tap tests for
pg_verify_checksums were proposed (also by me), but this hasn't been
further addressed.

Personally, I think this would be a good thing to add to v11 even.

In any case, this particular feature might not be very easy to tap test,
but I am open to suggestions, of course.

> The time() granularity to the second makes the result awkward on small
> tests:
>
>  8/1554552 kB (0%, 8 kB/s)
>  1040856/1554552 kB (66%, 1040856 kB/s)
>  1554552/1554552 kB (100%, 1554552 kB/s)
>
> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
> much better precision.
>
> The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
> 1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
> about storage which is usually measured in powers of 1,000, so I'd suggest
> to use thousands.
>
> Another reserve I have is that on any hardware it is likely that there will
> be a lot of kilos, so maybe megas (MB) should be used instead.
The display is exactly the same as in pg_basebackup (except the
bandwith is shown as well), so right now I think it is more useful to be
consistent here.  So if we change that, pg_basebackup should be changed
as well I think.

Maybe the code could be factored out to some common file in the future.
 
> I'm wondering whether the bandwidth display should be local (from the last
> display) or global (from the start of the command), but for the last forced
> one. This is an open question.


 
> Maybe it would be nice to show elapsed time and expected completion time
> based on the current speed.

Maybe; this could be added to the factored out common code I mentioned
above.
 
> I would be in favor or having this turned on by default, and silenced with
> some option. I'm not sure other people would agree, though, so it is an open
> question as well.

If this runs in a script, it would be pretty annoying, so everybody
would have to add --no-progress so I am not convinced. Also, both
pg_basebackup and pgbench don't show progress by default.
 

> About the code:
>
>  +               if (show_progress)
>  +                       show_progress = false;
>  +               else
>  +                       show_progress = true;
>
> Why not a simple:
>
> /* invert current state */
> show_progress = !show_progress;
Fair enough.
 

> I do not see much advantage in using intermediate string buffers for values:
>
>  + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
>  +          total_size / 1024);
>  + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
>  +          current_size / 1024);
>  + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
>  +          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
>  + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
>  +          currentstr, totalstr, total_percent, currspeedstr);
>
> ISTM that just one simple fprintf would be fine:
>
>   fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
>           formulas for each slot...);
That code is adopted from pg_basebackup.c and the comment there says:

|     * Separate step to keep platform-dependent format code out of
|     * translatable strings.  And we only test for INT64_FORMAT
|     * availability in snprintf, not fprintf.

So that sounds legit to me.

> ISTM that this line overwriting trick deserves a comment:
>
>  + if (isatty(fileno(stderr)))
>  +   fprintf(stderr, "\r");
>  + else
>  +   fprintf(stderr, "\n");

I agree a comment should be in there. But that code is also taken
verbatim from pg_basebackup.c (but this time, there's no comment there,
either).

How about this:

|     * If we are reporting to a terminal, send a carriage return so that
|     * we stay on the same line.  If not, send a newline.

> And it runs a little amok with "-v".

Do you suggest we should make those mutually exlusive?  I agree that in
general, -P is not very useful if -v is on, but if you have a really big
table, it should still be, no?

>  + memset(&act, '\0', sizeof(act));
>
> pg uses 0 instead of '\0' everywhere else.

Ok.
 
>   +   /* Make sure we just report at least once a second */
>   +   if ((now == last_progress_update) && !force) return;
>
> The comments seems contradictory, I understand the code makes sure that it
> is "at most" once a second.

I guess you're right as the identical code in pg_basebackup.c has a
comment saying "Max once per second".

> Pgbench uses "-P XXX" to strigger a progress
> report every XXX second. I'm unsure whether it would be useful to allow the
> user to change the 1 second display interval.

I think pgbench writes a new line for each report? In that case, having
it every second for longer runs might be annoying and I can see the
point in having in configurable.

In the case of pg_basebackup/pg_verify_checksums, I guess 1 second is
fine.
 
> I'm not sure why you removed one unrelated line:
>
>   #include "storage/checksum.h"
>   #include "storage/checksum_impl.h"
>
>  -
>   static int64 files = 0;
>   static int64 blocks = 0;
>   static int64 badblocks = 0;

Merge error on my side I guess, will fix.
 
> There is a problem in the scan_file code: The added sizeonly parameter is
> not used. It should be removed.

Right, this might have been a leftover from an earlier version of the
code, I'll check back with Bernd to make sure that was not a
porting/merge error on my side.

I've attached V3 of the patch, addressing some of your comments.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

pg_verify_checksums_progress_V3.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Thomas Munro-3
In reply to this post by Alvaro Herrera-9
On Tue, Sep 4, 2018 at 2:21 AM Alvaro Herrera <[hidden email]> wrote:

> On 2018-Sep-01, Fabien COELHO wrote:
> > > If -P was forgotten and pg_verify_checksums operates on a large cluster,
> > > the caller can send SIGUSR1 to pg_verify_checksums to turn progress
> > > status reporting on during runtime.
> >
> > Hmmm. I cannot say I like the signal feature much. Would it make sense for
> > the progress to be on by default, and to have a quiet option instead?
>
> Hmm, I recall this technique being used elsewhere and is sometimes
> useful.  Can't remember where though -- by manpages, it's not rsync nor
> pv ...

On BSD-derived systems including macOS, you can hit ^T to deliver
SIGINFO to the foreground process (much like ^C delivers SIGINT).
Many well behaved programs including rsync (at least on FreeBSD) then
show you some kind of progress or status report:

munro@asterix $ sleep 60
load: 0.11  cmd: sleep 62234 [nanslp] 1.22r 0.00u 0.00s 0% 2032k
sleep: about 58 second(s) left out of the original 60

I've contemplated various crackpot ideas about teaching psql to show
information about what my backend is doing when I hit ^T (perhaps via
a second connection, similar to the way it cancels queries?)...

There was an interesting thread[1] about dumping various things from
backends on (multiplexed) SIGUSR1.  Also if memory serves, JVMs also
dump internal stuff when you signal them, so you can confirm that they
really did eat all 64GB of your RAM.  But none of these things are
examples of enabling/disabling an on-going status reporting mode,
which is perhaps what you meant, they're more like one-off pokes to
dump information.

[1] https://www.postgresql.org/message-id/flat/20140623101501.GN16260%40awork2.anarazel.de

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3
In reply to this post by Michael Banck-2

Hallo Michael,

About patch v3: applies cleanly and compiles.

The xml documentation should be updated! (It is kind of hard to notice
what is not there:-)

See "doc/src/sgml/ref/pg_verify_checksums.sgml".

>> The time() granularity to the second makes the result awkward on small
>> tests:
>>
>>  8/1554552 kB (0%, 8 kB/s)
>>  1040856/1554552 kB (66%, 1040856 kB/s)
>>  1554552/1554552 kB (100%, 1554552 kB/s)
>>
>> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
>> much better precision.

I still think it would make sense to use that instead of low-precision
time().

>> The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
>> 1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
>> about storage which is usually measured in powers of 1,000, so I'd suggest
>> to use thousands.
>>
>> Another reserve I have is that on any hardware it is likely that there will
>> be a lot of kilos, so maybe megas (MB) should be used instead.
>
> The display is exactly the same as in pg_basebackup (except the
> bandwith is shown as well), so right now I think it is more useful to be
> consistent here.

Hmmm... units are units, and the display is wrong. The fact that other
commands are wrong as well does not look like a good argument to persist
in an error.

> So if we change that, pg_basebackup should be changed as well I think.

I'm okay with fixing pg_basebackup as well! I'm unsure about the best
place to put such a function, though. Probably under "src/common/" where
there is already some front-end specific code ("fe_memutils.c").

> Maybe the code could be factored out to some common file in the future.

I would be okay with a progress printing function shared between some
commands. It just needs some place. pg_rewind also has a --rewind option.

>> Maybe it would be nice to show elapsed time and expected completion time
>> based on the current speed.
>
> Maybe; this could be added to the factored out common code I mentioned
> above.

Yep.

>> I would be in favor or having this turned on by default, and silenced with
>> some option. I'm not sure other people would agree, though, so it is an open
>> question as well.
>
> If this runs in a script, it would be pretty annoying, so everybody
> would have to add --no-progress so I am not convinced. Also, both
> pg_basebackup and pgbench don't show progress by default.
>

Ok.

>> I do not see much advantage in using intermediate string buffers for values:
>>
>>  + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
>>  +          total_size / 1024);
>>  + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
>>  +          current_size / 1024);
>>  + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
>>  +          (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
>>  + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
>>  +          currentstr, totalstr, total_percent, currspeedstr);
>>
>> ISTM that just one simple fprintf would be fine:
>>
>>   fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
>>           formulas for each slot...);
>
> That code is adopted from pg_basebackup.c and the comment there says:
>
> |     * Separate step to keep platform-dependent format code out of
> |     * translatable strings.  And we only test for INT64_FORMAT
> |     * availability in snprintf, not fprintf.
>
> So that sounds legit to me.

Hmmm. Translation. Not sure I like much the idea of translating units,
but why not.

> |     * If we are reporting to a terminal, send a carriage return so that
> |     * we stay on the same line.  If not, send a newline.
>
>> And it runs a little amok with "-v".
>
> Do you suggest we should make those mutually exlusive?  I agree that in
> general, -P is not very useful if -v is on, but if you have a really big
> table, it should still be, no?

Yep. I was just mentionning that they interfere on a terminal, but I agree
that there is no obvious fix.

>>  + memset(&act, '\0', sizeof(act));
>>
>> pg uses 0 instead of '\0' everywhere else.
>
> Ok.

Not '0', plain 0 (the integer of value zero).

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Banck-2
Hi,

On Wed, Sep 26, 2018 at 10:46:06AM +0200, Fabien COELHO wrote:
> The xml documentation should be updated! (It is kind of hard to notice what
> is not there:-)
>
> See "doc/src/sgml/ref/pg_verify_checksums.sgml".

Right, I've added a paragraph.
 

> >>The time() granularity to the second makes the result awkward on small
> >>tests:
> >>
> >> 8/1554552 kB (0%, 8 kB/s)
> >> 1040856/1554552 kB (66%, 1040856 kB/s)
> >> 1554552/1554552 kB (100%, 1554552 kB/s)
> >>
> >>I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
> >>much better precision.
>
> I still think it would make sense to use that instead of low-precision
> time().
As the use-case of this is not for small tests, I don't think it is
useful to make the code more complicated for this.
 

> >>The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying
> >>1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are
> >>about storage which is usually measured in powers of 1,000, so I'd suggest
> >>to use thousands.
> >>
> >>Another reserve I have is that on any hardware it is likely that there will
> >>be a lot of kilos, so maybe megas (MB) should be used instead.
> >
> >The display is exactly the same as in pg_basebackup (except the
> >bandwith is shown as well), so right now I think it is more useful to be
> >consistent here.
>
> Hmmm... units are units, and the display is wrong. The fact that other
> commands are wrong as well does not look like a good argument to persist in
> an error.
I've had a look around, and "kB" seems to be a common unit for 1024
bytes, e.g. in pg_test_fsync etc., unless I am mistaken?
 

> >So if we change that, pg_basebackup should be changed as well I think.
>
> I'm okay with fixing pg_basebackup as well! I'm unsure about the best place
> to put such a function, though. Probably under "src/common/" where there is
> already some front-end specific code ("fe_memutils.c").
>
> >Maybe the code could be factored out to some common file in the future.
>
> I would be okay with a progress printing function shared between some
> commands. It just needs some place. pg_rewind also has a --rewind option.
I guess you mean pg_rewind also has a --progress option.

I do agree it makes sense to refactor that, but I don't think this
should be part of this patch.
 
> >> + memset(&act, '\0', sizeof(act));
> >>
> >>pg uses 0 instead of '\0' everywhere else.
> >
> >Ok.
>
> Not '0', plain 0 (the integer of value zero).

Oops, thanks for spotting that.

I've attached v4 of the patch.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

pg_verify_checksums_progress_V4.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3

>>>> The time() granularity to the second makes the result awkward on small
>>>> tests:
>>>>
>>>> 8/1554552 kB (0%, 8 kB/s)
>>>> 1040856/1554552 kB (66%, 1040856 kB/s)
>>>> 1554552/1554552 kB (100%, 1554552 kB/s)
>>>>
>>>> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
>>>> much better precision.
>>
>> I still think it would make sense to use that instead of low-precision
>> time().
>
> As the use-case of this is not for small tests,

Even for a long run, the induce error by the 1 second imprecision on both
points is significant at the beginning of the scan.

> I don't think it is useful to make the code more complicated for this.

I do not think that it would be much more complicated to use the
portability macros to get a precise time.

>>> The display is exactly the same as in pg_basebackup (except the
>>> bandwith is shown as well), so right now I think it is more useful to be
>>> consistent here.
>>
>> Hmmm... units are units, and the display is wrong. The fact that other
>> commands are wrong as well does not look like a good argument to persist in
>> an error.
>
> I've had a look around, and "kB" seems to be a common unit for 1024
> bytes, e.g. in pg_test_fsync etc., unless I am mistaken?

I can only repeat my above comment: the fact that postgres is wrong
elsewhere is not a good reason to persist in reproducing an error.


See the mother of all knowledge https://en.wikipedia.org/wiki/Kilobyte

  - SI (decimal, 1000): kB, MB, GB, ...
  - IEC (binary 1024): KiB, MiB, GiB, ...
  - JEDEC (binary 1024, used for memory): KB, MB, GB.

Summary:
  - 1 kB = 1000 bytes
  - 1 KB = 1 KiB = 1024 bytes

Decimal is used for storage (HDD, SSD), binary for memory. That is life,
and IMHO Postgres code is not the place to invent new units.

Moreover, I still think that the progress should use MB defined as 1000000
bytes for showing the progress.

>> I would be okay with a progress printing function shared between some
>> commands. It just needs some place. pg_rewind also has a --rewind option.
>
> I guess you mean pg_rewind also has a --progress option.

Indeed.

> I do agree it makes sense to refactor that, but I don't think this
> should be part of this patch.

That's a point. I'd suggest to put the new progress function directly in
the common part, and other patches could take advantage of it for other
commands when someone feels like it.

Other comments:

function toggle_progress_report has empty lines after { and before },
but this style is not used elsewhere, I think that they should be removed.

The report_progress API should be ready to be used by another client
application, which suggest that global variables should be avoided.

Maybe:

   void report_progress(current, total, force)

and the scan start and last times could be a static variable inside the
function initialized on the first call, which would suggest to call the
function at the beginning of the scan, probably with current == 0.

Or maybe:

   time_type report_progress(current, total, start, last, force)

Which would return the last time, and the caller has responsability for
initializing a start time.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Thomas Munro-3
In reply to this post by Michael Banck-2
On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <[hidden email]> wrote:
> I've attached v4 of the patch.

Hi Michael,

Windows doesn't like sigaction:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

I'm not sure if we classify this as a "frontend" program.  Should it
be using pqsignal() from src/port/pqsignal.c?  Or perhaps just
sigaction as you have it (pqsignal.c says that we require sigaction on
all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
never going to work anyway.

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Dmitry Dolgov
> On Wed, Oct 3, 2018 at 12:51 AM Thomas Munro <[hidden email]> wrote:
>
> On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <[hidden email]> wrote:
> > I've attached v4 of the patch.
>
> Hi Michael,
>
> Windows doesn't like sigaction:
>
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189
>
> I'm not sure if we classify this as a "frontend" program.  Should it
> be using pqsignal() from src/port/pqsignal.c?  Or perhaps just
> sigaction as you have it (pqsignal.c says that we require sigaction on
> all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
> never going to work anyway.

Unfortunately, patch also has some conflicts with the current master. I'll move
it to the next CF.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Banck-2
In reply to this post by Thomas Munro-3
Hi,

On Wed, Oct 03, 2018 at 11:50:36AM +1300, Thomas Munro wrote:
> On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <[hidden email]> wrote:
> Windows doesn't like sigaction:
>
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189

Thanks for the report and sorry for taking so long to reply.
 
> I'm not sure if we classify this as a "frontend" program.  Should it
> be using pqsignal() from src/port/pqsignal.c?  Or perhaps just
> sigaction as you have it (pqsignal.c says that we require sigaction on
> all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is
> never going to work anyway.

I've used pqsignal now, and disabled that feature on Windows.

I've also updated one comment and added an additional regression test.

V5 attached.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

pg_verify_checksums_progress_V5.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3

Hallo Michael,

> V5 attached.

Patch applies cleanly, compiles, global & local make check are ok.

Given that the specific output is not checked, I do not think that the -P
check deserves a test on its own, I think that the -P option could simply
be added to any of the existing tests.

I'm still unhappy that "kB" is used for 1024 bytes in the output, contrary
to all existing standards (1 kB = 1000 bytes -- SI, 1 KB = 1 KiB = 1024
bytes -- IEC & JEDEC). The fact that this is also used wrongly elsewhere
in pg is not relevant, these are bugs to be fixed, not to be replicated.

Given the speed of verifying checksums and its storage-oriented status, I
also still think that a (possibly fractional) MB (1,000,000 bytes), or
even GB, is the right unit to use for reporting this progress. On my
laptop (SSD), verifying runs at least at 1.26 GB/s (on one small test),
there is no point in displaying kilobytes progress.

I still think that using a more precise time than time(), eg with existing
macros from "instr_time.h", would not cost anything more and result in a
better precision output. It would also allow to remove the check used to
avoid a division-by-zero by switching to double.

If the check is performed while online (other patch in queue), then the
size may change thus it may not reach or go beyond 100%. No big deal.

I'd consider inverting the sizeonly boolean, so that true does the check
and false does only the size collection. It seems more logical to me if it
performs more with true than with false.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3

> Given the speed of verifying checksums and its storage-oriented status, I
> also still think that a (possibly fractional) MB (1,000,000 bytes), or even
> GB, is the right unit to use for reporting this progress. On my laptop (SSD),
> verifying runs at least at 1.26 GB/s (on one small test), there is no point
> in displaying kilobytes progress.

Obviously the file is cached by the system at such speed, but still most
disks should provides dozens of MB per second of read bandwidth. If GB is
used, it should use fractional display (eg 1.25 GB) though.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Banck-2
Hi,

Am Dienstag, den 25.12.2018, 12:12 +0100 schrieb Fabien COELHO:
> > Given the speed of verifying checksums and its storage-oriented status, I
> > also still think that a (possibly fractional) MB (1,000,000 bytes), or even
> > GB, is the right unit to use for reporting this progress. On my laptop (SSD),
> > verifying runs at least at 1.26 GB/s (on one small test), there is no point
> > in displaying kilobytes progress.
>
> Obviously the file is cached by the system at such speed, but still most
> disks should provides dozens of MB per second of read bandwidth. If GB is
> used, it should use fractional display (eg 1.25 GB) though.

I think MB indeed makes more sense than kB, so I have changed that now
in V7, per attached.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

pg_verify_checksums_progress_V6.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Fabien COELHO-3

> I think MB indeed makes more sense than kB, so I have changed that now
> in V7, per attached.

You use 1024² bytes. What about 1000000 bytes per MB, as the unit is about
stored files?

Also, you did not answer to my other points:
  - use "instr_time.h" for better precision
  - invert sizeonly
  - reuse a test

--
Fabien.
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Michael Paquier-2
On Tue, Dec 25, 2018 at 07:05:30PM +0100, Fabien COELHO wrote:
> You use 1024² bytes. What about 1000000 bytes per MB, as the unit is about
> stored files?
>
> Also, you did not answer to my other points:
>  - use "instr_time.h" for better precision
>  - invert sizeonly
>  - reuse a test

It seems to me that the SIGUSR1 business is not necessary for the
basic layer of this feature, so I would rather rip that off now.  If
necessary we could always discuss that later on.  My take about the
option is that --progress should not be the default, but that reports
should only be provided if the caller wants them.

--quiet may have some value by itself, but that seems like a separate
discussion to me.

+   /*
+    * If we are reporting to a terminal, send a carriage return so that we
+    * stay on the same line.  If not, send a newline.
+    */
+   if (isatty(fileno(stderr)))
+       fprintf(stderr, "\r");
+   else
+       fprintf(stderr, "\n");
This bit is not really elegant, why not just '\r'?

+   /* The same for total size */
+   if (current_size > total_size)
+       total_size = current_size / 1048576;
Let's use that in a variable and not hardcode the number.

What's introduced here is very similar to what progress_report() does
in pg_rewind/logging.c.  The report depends on the context so this
cannot be a common routine logic but perhaps we could keep a
consistent output for both tools?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Progress reporting for pg_verify_checksums

Alvaro Herrera-9
On 2018-Dec-26, Michael Paquier wrote:

> +   /*
> +    * If we are reporting to a terminal, send a carriage return so that we
> +    * stay on the same line.  If not, send a newline.
> +    */
> +   if (isatty(fileno(stderr)))
> +       fprintf(stderr, "\r");
> +   else
> +       fprintf(stderr, "\n");
> This bit is not really elegant, why not just '\r'?

Umm, this is established coding pattern in pg_basebackup.c.
Stylistically I'd change all those cases to "fprintf(stderr,
isatty(fileno(stderr)) ? "\r" : "\n")" but leave the string alone, since
AFAIR it took some time to figure out what to do.  (I'd also make the
comment one line instead of four, say "Stay on the same line if
reporting to a terminal".  That makes the whole stanza two lines rather
than eight, which is the appropriate amount of space for it).

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

1234