[PATCH] Initial progress reporting for COPY command

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
34 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[PATCH] Initial progress reporting for COPY command

Josef Šimánek
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.

Few examples first:

"COPY (SELECT * FROM test) TO '/tmp/ids';"

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
(1 row)
 
"COPY test FROM '/tmp/ids';

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
(1 row)

Columns are inspired by CREATE INDEX progress report system view.

pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, since oid should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for both directions (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)

Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.

Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch

I havefew initial notes and questions.

I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).

1. Is that a good way to get progress of file processing?
2. Is it safe in given context to not care about errors? If not, what to do on error?

Some columns are not populated on certain COPY commands. For example when a file is not used, file_bytes_processed is set to 0. Would it be better to use NULL instead when the column is not related to the current command? Same problem is for relid column.

I have not found any tests for progress reporting. Are there any? It would need two backends running (one running COPY, one checking output of report view). Is there any similar test I can inspire at? In theory, it should be possible to use dblink_send_query to run async COPY command in the background.

My initial (attached) patch also doesn't introduce documentation for this system view. I can add that later once this patch is finalized (if that happens).

copy-progress-v1.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Michael Paquier-2
Hi Josef,

On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
> maillist (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
> I have prepared an initial patch for COPY command progress reporting.

Sounds like a good idea to me.

> I have not found any tests for progress reporting. Are there any? It would
> need two backends running (one running COPY, one checking output of report
> view). Is there any similar test I can inspire at? In theory, it should be
> possible to use dblink_send_query to run async COPY command in the
> background.

We don't have any tests in core.  I think that making deterministic
test cases is rather tricky here as long as we don't have a more
advanced testing framework that allows is to lock certain code paths
and keep around an expected state until a second session comes around
and looks at the progress catalog (even that would need adding more
code to core to mark the extra point looked at).  So I think that it is
fine to not focus on that for this feature.  The important parts are
the choice of the progress points and the data sent to MyProc, and
both should be chosen wisely.

> My initial (attached) patch also doesn't introduce documentation for this
> system view. I can add that later once this patch is finalized (if that
> happens).

You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Fujii Masao-4
In reply to this post by Josef Šimánek


On 2020/06/14 21:32, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.

Sounds nice!


> file - bool - is file is used?
> program - bool - is program used?

Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPY command?


> file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
> FROM/TO) when file is used (file = t)

What value is reported when STDOUT/STDIN is specified in COPY command?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Bharath Rupireddy
In reply to this post by Josef Šimánek
> I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).
>
> 1. Is that a good way to get progress of file processing?

IMO, it's better to handle the error cases. One possible case where
ftell can return -1 and set errno is when the total bytes processed is
more than LONG_MAX.

Will your patch handle file_bytes_processed reporting for COPY FROM
STDIN cases? For this case, ftell can't be used.

Instead of using ftell and worrying about the errors, a simple
approach could be to have a uint64 variable in CopyStateData to track
the number of bytes read whenever CopyGetData is called. This approach
can also handle the case of COPY FROM STDIN.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Michael Paquier-2


po 15. 6. 2020 v 2:18 odesílatel Michael Paquier <[hidden email]> napsal:
Hi Josef,

On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
> maillist (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
> I have prepared an initial patch for COPY command progress reporting.

Sounds like a good idea to me.

Great. I will continue working on this.
 
> I have not found any tests for progress reporting. Are there any? It would
> need two backends running (one running COPY, one checking output of report
> view). Is there any similar test I can inspire at? In theory, it should be
> possible to use dblink_send_query to run async COPY command in the
> background.

We don't have any tests in core.  I think that making deterministic
test cases is rather tricky here as long as we don't have a more
advanced testing framework that allows is to lock certain code paths
and keep around an expected state until a second session comes around
and looks at the progress catalog (even that would need adding more
code to core to mark the extra point looked at).  So I think that it is
fine to not focus on that for this feature.  The important parts are
the choice of the progress points and the data sent to MyProc, and
both should be chosen wisely.

Thanks for the info. I'm focusing exactly at looking for right spots to report the progress. I'll attach new patch with better places and supporting more options of reporting (including STDIN, STDOUT) soon and also I'll try to add it to commitfest.
 

> My initial (attached) patch also doesn't introduce documentation for this
> system view. I can add that later once this patch is finalized (if that
> happens).

You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.
--
Michael
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Fujii Masao-4


po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <[hidden email]> napsal:


On 2020/06/14 21:32, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.

Sounds nice!


> file - bool - is file is used?
> program - bool - is program used?

Are these fields really necessary in a progress view?
What values are reported when STDOUT/STDIN is specified in COPY command?

For STDOUT and STDIN file is true and program is false.
 
> file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
> FROM/TO) when file is used (file = t)

What value is reported when STDOUT/STDIN is specified in COPY command?

For my first patch nothing was reported on STDOUT/STDIN usage. I'll attach new patch soon supporting those as well.
 
 
 
Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Michael Paquier-2


po 15. 6. 2020 v 2:18 odesílatel Michael Paquier <[hidden email]> napsal:
Hi Josef,

On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
> maillist (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
> I have prepared an initial patch for COPY command progress reporting.

Sounds like a good idea to me.

> I have not found any tests for progress reporting. Are there any? It would
> need two backends running (one running COPY, one checking output of report
> view). Is there any similar test I can inspire at? In theory, it should be
> possible to use dblink_send_query to run async COPY command in the
> background.

We don't have any tests in core.  I think that making deterministic
test cases is rather tricky here as long as we don't have a more
advanced testing framework that allows is to lock certain code paths
and keep around an expected state until a second session comes around
and looks at the progress catalog (even that would need adding more
code to core to mark the extra point looked at).  So I think that it is
fine to not focus on that for this feature.  The important parts are
the choice of the progress points and the data sent to MyProc, and
both should be chosen wisely.

> My initial (attached) patch also doesn't introduce documentation for this
> system view. I can add that later once this patch is finalized (if that
> happens).

You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.

I have added documentation, more code comments and I'll upload patch to commit fest.
 
--
Michael
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Bharath Rupireddy


po 15. 6. 2020 v 7:34 odesílatel Bharath Rupireddy <[hidden email]> napsal:
> I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).
>
> 1. Is that a good way to get progress of file processing?

IMO, it's better to handle the error cases. One possible case where
ftell can return -1 and set errno is when the total bytes processed is
more than LONG_MAX.

Will your patch handle file_bytes_processed reporting for COPY FROM
STDIN cases? For this case, ftell can't be used.

Instead of using ftell and worrying about the errors, a simple
approach could be to have a uint64 variable in CopyStateData to track
the number of bytes read whenever CopyGetData is called. This approach
can also handle the case of COPY FROM STDIN.

Thanks for suggestion. I used this approach and latest patch supports both STDIN and STDOUT now. 
 
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Josef Šimánek
Thanks for all comments. I have updated code to support more options (including STDIN/STDOUT) and added some documentation.


I'm also attaching screenshot of HTML documentation and html documentation file.

I'll do my best to get this to commitfest now.

ne 14. 6. 2020 v 14:32 odesílatel Josef Šimánek <[hidden email]> napsal:
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.

Few examples first:

"COPY (SELECT * FROM test) TO '/tmp/ids';"

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
(1 row)
 
"COPY test FROM '/tmp/ids';

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
(1 row)

Columns are inspired by CREATE INDEX progress report system view.

pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, since oid should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for both directions (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)

Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.

Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch

I havefew initial notes and questions.

I'm using ftell to get current position in file to populate file_bytes_processed without error handling (ftell can return -1L and also populate errno on problems).

1. Is that a good way to get progress of file processing?
2. Is it safe in given context to not care about errors? If not, what to do on error?

Some columns are not populated on certain COPY commands. For example when a file is not used, file_bytes_processed is set to 0. Would it be better to use NULL instead when the column is not related to the current command? Same problem is for relid column.

I have not found any tests for progress reporting. Are there any? It would need two backends running (one running COPY, one checking output of report view). Is there any similar test I can inspire at? In theory, it should be possible to use dblink_send_query to run async COPY command in the background.

My initial (attached) patch also doesn't introduce documentation for this system view. I can add that later once this patch is finalized (if that happens).

progress-reporting.html (61K) Download Attachment
copy-progress-v2.patch (28K) Download Attachment
copy-progress-v2.diff (17K) Download Attachment
copy-progress-docs.jpg (204K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Fujii Masao-4
In reply to this post by Josef Šimánek


On 2020/06/21 20:33, Josef Šimánek wrote:

>
>
> po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <[hidden email] <mailto:[hidden email]>> napsal:
>
>
>
>     On 2020/06/14 21:32, Josef Šimánek wrote:
>      > Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
>
>     Sounds nice!
>
>
>      > file - bool - is file is used?
>      > program - bool - is program used?
>
>     Are these fields really necessary in a progress view?
>     What values are reported when STDOUT/STDIN is specified in COPY command?
>
>
> For STDOUT and STDIN file is true and program is false.

Could you tell me why these columns are necessary in *progress* view?
If we want to see what copy command is actually running, we can see
pg_stat_activity, instead. For example,

     SELECT pc.*, a.query FROM pg_stat_progress_copy pc, pg_stat_activity a WHERE pc.pid = a.pid;

>
>      > file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
>      > FROM/TO) when file is used (file = t)
>
>     What value is reported when STDOUT/STDIN is specified in COPY command?
>
>
> For my first patch nothing was reported on STDOUT/STDIN usage. I'll attach new patch soon supporting those as well.

Thanks for the patch!

With the patch, pg_stat_progress_copy seems to report the progress of
the processing on file_fdw. Is this intentional?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

vignesh C
In reply to this post by Josef Šimánek
On Sun, Jun 21, 2020 at 5:11 PM Josef Šimánek <[hidden email]> wrote:

>
> Thanks for all comments. I have updated code to support more options (including STDIN/STDOUT) and added some documentation.
>
> Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
>
> Diff version: https://github.com/simi/postgres/pull/5.diff
> Patch version: https://github.com/simi/postgres/pull/5.patch
>
> I'm also attaching screenshot of HTML documentation and html documentation file.
>
> I'll do my best to get this to commitfest now.
>
> ne 14. 6. 2020 v 14:32 odesílatel Josef Šimánek <[hidden email]> napsal:
>>
>> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
>>
>> Few examples first:
>>
>> "COPY (SELECT * FROM test) TO '/tmp/ids';"
>>
>> yr=# SELECT * from pg_stat_progress_copy;
>>    pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
>> ---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
>>  3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
>> (1 row)
>>
>> "COPY test FROM '/tmp/ids';
>>
>> yr=# SELECT * from pg_stat_progress_copy;
>>    pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
>> ---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
>>  3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
>> (1 row)
>>
>> Columns are inspired by CREATE INDEX progress report system view.
>>
>> pid - integer - PID of backend
>> datid - oid - OID of related database
>> datname - name - name of related database (this seems redundant, since oid should be enough, but it is the same in CREATE INDEX)
>> relid - oid - oid of table related to COPY command, when not known (for example when copying to file, it is 0)
>> direction - text - one of "FROM" or "TO" depends on command used
>> file - bool - is file is used?
>> program - bool - is program used?
>> lines_processed - bigint - amount of processed lines, works for both directions (FROM/TO)
>> file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
>> FROM/TO) when file is used (file = t)
>>
>> Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
>>

Few comments:
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int
minread, int maxread)
  break;
  }

+ CopyUpdateBytesProgress(cstate, bytesread);
+
  return bytesread;
 }

This is actually the read data, actual processing will happen later
like in CopyReadLineText, it would be better if
CopyUpdateBytesProgress is done later, if not it will give the same
value even though it does multiple inserts on the table.
lines_processed will keep getting updated but file_bytes_processed
will not be updated.

 +pg_stat_progress_copy| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'TO'::text
+            WHEN 1 THEN 'FROM'::text
+            ELSE NULL::text
+        END AS direction,
+    ((s.param2)::integer)::boolean AS file,
+    ((s.param3)::integer)::boolean AS program,
+    s.param4 AS lines_processed,
+    s.param5 AS file_bytes_processed

You could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Fujii Masao-4


po 22. 6. 2020 v 4:48 odesílatel Fujii Masao <[hidden email]> napsal:


On 2020/06/21 20:33, Josef Šimánek wrote:
>
>
> po 15. 6. 2020 v 6:39 odesílatel Fujii Masao <[hidden email] <mailto:[hidden email]>> napsal:
>
>
>
>     On 2020/06/14 21:32, Josef Šimánek wrote:
>      > Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
>
>     Sounds nice!
>
>
>      > file - bool - is file is used?
>      > program - bool - is program used?
>
>     Are these fields really necessary in a progress view?
>     What values are reported when STDOUT/STDIN is specified in COPY command?
>
>
> For STDOUT and STDIN file is true and program is false.

Could you tell me why these columns are necessary in *progress* view?
If we want to see what copy command is actually running, we can see
pg_stat_activity, instead. For example,

     SELECT pc.*, a.query FROM pg_stat_progress_copy pc, pg_stat_activity a WHERE pc.pid = a.pid; 
 
If that doesn't make any sense, I can remove those. I have not strong opinion about those values. Those were just around when I was looking for possible values to include in the progress report.

>
>      > file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
>      > FROM/TO) when file is used (file = t)
>
>     What value is reported when STDOUT/STDIN is specified in COPY command?
>
>
> For my first patch nothing was reported on STDOUT/STDIN usage. I'll attach new patch soon supporting those as well.

Thanks for the patch!

With the patch, pg_stat_progress_copy seems to report the progress of
the processing on file_fdw. Is this intentional?

Every action using internally COPY will be included in the progress report view.
I have spotted for example pg_dump does that and is reported there as well.
I do not see any problem regarding this. For pg_dump it is consistent with "pg_stat_activity" reporting COPY command in the query field.
 
Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by vignesh C


po 22. 6. 2020 v 9:15 odesílatel vignesh C <[hidden email]> napsal:
On Sun, Jun 21, 2020 at 5:11 PM Josef Šimánek <[hidden email]> wrote:
>
> Thanks for all comments. I have updated code to support more options (including STDIN/STDOUT) and added some documentation.
>
> Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
>
> Diff version: https://github.com/simi/postgres/pull/5.diff
> Patch version: https://github.com/simi/postgres/pull/5.patch
>
> I'm also attaching screenshot of HTML documentation and html documentation file.
>
> I'll do my best to get this to commitfest now.
>
> ne 14. 6. 2020 v 14:32 odesílatel Josef Šimánek <[hidden email]> napsal:
>>
>> Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL maillist (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer), I have prepared an initial patch for COPY command progress reporting.
>>
>> Few examples first:
>>
>> "COPY (SELECT * FROM test) TO '/tmp/ids';"
>>
>> yr=# SELECT * from pg_stat_progress_copy;
>>    pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
>> ---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
>>  3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
>> (1 row)
>>
>> "COPY test FROM '/tmp/ids';
>>
>> yr=# SELECT * from pg_stat_progress_copy;
>>    pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
>> ---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
>>  3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
>> (1 row)
>>
>> Columns are inspired by CREATE INDEX progress report system view.
>>
>> pid - integer - PID of backend
>> datid - oid - OID of related database
>> datname - name - name of related database (this seems redundant, since oid should be enough, but it is the same in CREATE INDEX)
>> relid - oid - oid of table related to COPY command, when not known (for example when copying to file, it is 0)
>> direction - text - one of "FROM" or "TO" depends on command used
>> file - bool - is file is used?
>> program - bool - is program used?
>> lines_processed - bigint - amount of processed lines, works for both directions (FROM/TO)
>> file_bytes_processed - amount of bytes processed when file is used (otherwise 0), works for both direction (
>> FROM/TO) when file is used (file = t)
>>
>> Patch is attached and can be found also at https://github.com/simi/postgres/pull/5.
>>

Few comments:
@@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int
minread, int maxread)
  break;
  }

+ CopyUpdateBytesProgress(cstate, bytesread);
+
  return bytesread;
 }

This is actually the read data, actual processing will happen later
like in CopyReadLineText, it would be better if
CopyUpdateBytesProgress is done later, if not it will give the same
value even though it does multiple inserts on the table.
lines_processed will keep getting updated but file_bytes_processed
will not be updated.

First I would like to explain what's reported (or at least I'm trying to get reported) at bytes_processed column.

When exporting to file it should start at 0 and end up at the actual final file size.
When importing from file, it should do the same. You can check file size before you start COPY FROM and get actual progress looking at bytes_processed.

This column is just a counter of bytes read from input on COPY FROM or amount of bytes going through COPY TO.

Thanks for the hint regarding "CopyReadLineText". I'll take a look.

For now I have tested those cases:

CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';

psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'

It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
I'll try to check more complex COPY commands to ensure everything is in sync.

If you have any ideas for testing queries, feel free to suggest.

 +pg_stat_progress_copy| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'TO'::text
+            WHEN 1 THEN 'FROM'::text
+            ELSE NULL::text
+        END AS direction,
+    ((s.param2)::integer)::boolean AS file,
+    ((s.param3)::integer)::boolean AS program,
+    s.param4 AS lines_processed,
+    s.param5 AS file_bytes_processed

You could include pg_size_pretty for s.param5 like
pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
users to understand bytes_processed when the data size increases.

I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.

Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
 
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Tomas Vondra-4
In reply to this post by Josef Šimánek
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:

>Thanks for all comments. I have updated code to support more options
>(including STDIN/STDOUT) and added some documentation.
>
>Patch is attached and can be found also at
>https://github.com/simi/postgres/pull/5.
>
>Diff version: https://github.com/simi/postgres/pull/5.diff
>Patch version: https://github.com/simi/postgres/pull/5.patch
>
>I'm also attaching screenshot of HTML documentation and html documentation
>file.
>
>I'll do my best to get this to commitfest now.
>

I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not to show file size for imports from a
file? I'd expect that to be the most common case.

I wonder if it made sense to show some estimates in the other cases. For
example when exporting query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more a wild idea for the future, I don't expect this patch to
implement that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek


po 22. 6. 2020 v 14:14 odesílatel Tomas Vondra <[hidden email]> napsal:
On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
>Thanks for all comments. I have updated code to support more options
>(including STDIN/STDOUT) and added some documentation.
>
>Patch is attached and can be found also at
>https://github.com/simi/postgres/pull/5.
>
>Diff version: https://github.com/simi/postgres/pull/5.diff
>Patch version: https://github.com/simi/postgres/pull/5.patch
>
>I'm also attaching screenshot of HTML documentation and html documentation
>file.
>
>I'll do my best to get this to commitfest now.
>

I see we're not showing the total number of bytes the COPY is expected
to process, which makes it hard to estimate how far we actually are.
Clearly there are cases when we really don't know that (exports, import
from stdin/program), but why not to show file size for imports from a
file? I'd expect that to be the most common case.

For COPY FROM file fstat is done and info is available already at https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934. It should be easy to update some param (param6 for example) with file size and expose it in report view. When not available, this column can be NULL.

Would that be enough?

On the other side everyone can check file size manually to get total value expected and just compare to reported bytes_processed. Alt. "wc -l" can be checked to get amount of lines and check lines_processed column to get progress. Should it check amount of lines and populate another column with lines total (using a configured separator) as well? AFAIK that would need full file scan which can be slow for huge files.
 
I wonder if it made sense to show some estimates in the other cases. For
example when exporting query result, maybe we could show the estimated
number of rows and size? Of course, that's prone to estimation errors
and it's more a wild idea for the future, I don't expect this patch to
implement that.

My plan here was to expose numbers not being currently available and let clients get the rest of info on their own.

For example:
- for "COPY (query) TO file" - EXPLAIN or COUNT variant of query could be executed before to get the amount of expected rows
- for "COPY table FROM file" - file size or amount of lines in file can be inspected first to get amount of expected rows or bytes to be processed

I see the current system view in my patch (and also all other report views currently available) more as a scaffold to build own tools.

For example CLI tools can use this to provide some kind of progress.
 
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Tomas Vondra-4
On Mon, Jun 22, 2020 at 03:33:00PM +0200, Josef Šimánek wrote:

>po 22. 6. 2020 v 14:14 odesílatel Tomas Vondra <[hidden email]>
>napsal:
>
>> On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
>> >Thanks for all comments. I have updated code to support more options
>> >(including STDIN/STDOUT) and added some documentation.
>> >
>> >Patch is attached and can be found also at
>> >https://github.com/simi/postgres/pull/5.
>> >
>> >Diff version: https://github.com/simi/postgres/pull/5.diff
>> >Patch version: https://github.com/simi/postgres/pull/5.patch
>> >
>> >I'm also attaching screenshot of HTML documentation and html documentation
>> >file.
>> >
>> >I'll do my best to get this to commitfest now.
>> >
>>
>> I see we're not showing the total number of bytes the COPY is expected
>> to process, which makes it hard to estimate how far we actually are.
>> Clearly there are cases when we really don't know that (exports, import
>> from stdin/program), but why not to show file size for imports from a
>> file? I'd expect that to be the most common case.
>>
>
>For COPY FROM file fstat is done and info is available already at
>https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934.
>It should be easy to update some param (param6 for example) with file size
>and expose it in report view. When not available, this column can be NULL.
>
>Would that be enough?
>

Yes, I think that'd be fine. The rows without a file should have NULL,
because we literally don't know what the value is. And 0 is a valid file
size, so we can't use it anyway.

>On the other side everyone can check file size manually to get total value
>expected and just compare to reported bytes_processed. Alt. "wc -l" can be
>checked to get amount of lines and check lines_processed column to get
>progress. Should it check amount of lines and populate another column with
>lines total (using a configured separator) as well? AFAIK that would need
>full file scan which can be slow for huge files.
>

Sure, but the extra `wc -l` is less convenient and you then need to
combine that with pg_stat_progress_copy. With the information right in
the view, you can do (100.0 * bytes_processed / bytes_total) and you get
the progress as a percentage. (I've omitted the NULL handling.)

As for the number of lines, I certainly don't think we need to scan the
file - that'd be far too expensive. What we might do is estimate it as

    total_bytes / (processed_bytes / processed_rows)

but that's something people can easily do on their own. So I don't think
it needs to be part of the patch, and IMHO bytes_processed / bytes_total
is a sufficient measure of progress.

>
>> I wonder if it made sense to show some estimates in the other cases. For
>> example when exporting query result, maybe we could show the estimated
>> number of rows and size? Of course, that's prone to estimation errors
>> and it's more a wild idea for the future, I don't expect this patch to
>> implement that.
>>
>
>My plan here was to expose numbers not being currently available and let
>clients get the rest of info on their own.
>
>For example:
>- for "COPY (query) TO file" - EXPLAIN or COUNT variant of query could be
>executed before to get the amount of expected rows
>- for "COPY table FROM file" - file size or amount of lines in file can be
>inspected first to get amount of expected rows or bytes to be processed
>
>I see the current system view in my patch (and also all other report views
>currently available) more as a scaffold to build own tools.
>
>For example CLI tools can use this to provide some kind of progress.
>

True, but I'd advise against putting this into v1 of the patch. Let's
keep it simple, get it committed and then maybe improve it later.

Some of these stats (like the estimates from a query) may be quite
unreliable, so I think it needs more discussion. We might invent
lines_estimated or something like that, for example.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

vignesh C
In reply to this post by Josef Šimánek
On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <[hidden email]> wrote:

>
> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>
> For now I have tested those cases:
>
> CREATE TABLE test(id int);
> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
> COPY (SELECT * FROM test) TO '/tmp/ids';
> COPY test FROM '/tmp/ids';
>
> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
> echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>
> It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
> I'll try to check more complex COPY commands to ensure everything is in sync.
>
> If you have any ideas for testing queries, feel free to suggest.

For copy from statement you could attach the session, put a breakpoint
at CopyReadLineText, execution will hit this breakpoint for every
record it is doing COPY FROM and parallely check if
pg_stat_progress_copy is getting updated correctly. I noticed it was
showing the file read size instead of the actual processed bytes.

>>  +pg_stat_progress_copy| SELECT s.pid,
>> +    s.datid,
>> +    d.datname,
>> +    s.relid,
>> +        CASE s.param1
>> +            WHEN 0 THEN 'TO'::text
>> +            WHEN 1 THEN 'FROM'::text
>> +            ELSE NULL::text
>> +        END AS direction,
>> +    ((s.param2)::integer)::boolean AS file,
>> +    ((s.param3)::integer)::boolean AS program,
>> +    s.param4 AS lines_processed,
>> +    s.param5 AS file_bytes_processed
>>
>> You could include pg_size_pretty for s.param5 like
>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
>> users to understand bytes_processed when the data size increases.
>
>
> I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
> For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>
> Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.

I felt we could add pg_size_pretty to make the view more user friendly.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Tomas Vondra-4
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:

>On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <[hidden email]> wrote:
>>
>> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>>
>> For now I have tested those cases:
>>
>> CREATE TABLE test(id int);
>> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
>> COPY (SELECT * FROM test) TO '/tmp/ids';
>> COPY test FROM '/tmp/ids';
>>
>> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
>> echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>>
>> It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
>> I'll try to check more complex COPY commands to ensure everything is in sync.
>>
>> If you have any ideas for testing queries, feel free to suggest.
>
>For copy from statement you could attach the session, put a breakpoint
>at CopyReadLineText, execution will hit this breakpoint for every
>record it is doing COPY FROM and parallely check if
>pg_stat_progress_copy is getting updated correctly. I noticed it was
>showing the file read size instead of the actual processed bytes.
>
>>>  +pg_stat_progress_copy| SELECT s.pid,
>>> +    s.datid,
>>> +    d.datname,
>>> +    s.relid,
>>> +        CASE s.param1
>>> +            WHEN 0 THEN 'TO'::text
>>> +            WHEN 1 THEN 'FROM'::text
>>> +            ELSE NULL::text
>>> +        END AS direction,
>>> +    ((s.param2)::integer)::boolean AS file,
>>> +    ((s.param3)::integer)::boolean AS program,
>>> +    s.param4 AS lines_processed,
>>> +    s.param5 AS file_bytes_processed
>>>
>>> You could include pg_size_pretty for s.param5 like
>>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
>>> users to understand bytes_processed when the data size increases.
>>
>>
>> I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
>> For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>>
>> Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
>
>I felt we could add pg_size_pretty to make the view more user friendly.
>

Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Pavel Stehule


út 23. 6. 2020 v 13:15 odesílatel Tomas Vondra <[hidden email]> napsal:
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
>On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <[hidden email]> wrote:
>>
>> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>>
>> For now I have tested those cases:
>>
>> CREATE TABLE test(id int);
>> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
>> COPY (SELECT * FROM test) TO '/tmp/ids';
>> COPY test FROM '/tmp/ids';
>>
>> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
>> echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>>
>> It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
>> I'll try to check more complex COPY commands to ensure everything is in sync.
>>
>> If you have any ideas for testing queries, feel free to suggest.
>
>For copy from statement you could attach the session, put a breakpoint
>at CopyReadLineText, execution will hit this breakpoint for every
>record it is doing COPY FROM and parallely check if
>pg_stat_progress_copy is getting updated correctly. I noticed it was
>showing the file read size instead of the actual processed bytes.
>
>>>  +pg_stat_progress_copy| SELECT s.pid,
>>> +    s.datid,
>>> +    d.datname,
>>> +    s.relid,
>>> +        CASE s.param1
>>> +            WHEN 0 THEN 'TO'::text
>>> +            WHEN 1 THEN 'FROM'::text
>>> +            ELSE NULL::text
>>> +        END AS direction,
>>> +    ((s.param2)::integer)::boolean AS file,
>>> +    ((s.param3)::integer)::boolean AS program,
>>> +    s.param4 AS lines_processed,
>>> +    s.param5 AS file_bytes_processed
>>>
>>> You could include pg_size_pretty for s.param5 like
>>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
>>> users to understand bytes_processed when the data size increases.
>>
>>
>> I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
>> For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>>
>> Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
>
>I felt we could add pg_size_pretty to make the view more user friendly.
>

Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.

+1, *_pretty functions should be used on the client side only. Server side (source) should be in raw format.

Regards

Pavel




regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Initial progress reporting for COPY command

Josef Šimánek
In reply to this post by Tomas Vondra-4


út 23. 6. 2020 v 13:15 odesílatel Tomas Vondra <[hidden email]> napsal:
On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
>On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek <[hidden email]> wrote:
>>
>> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
>>
>> For now I have tested those cases:
>>
>> CREATE TABLE test(id int);
>> INSERT INTO test SELECT 1 FROM generate_series(1, 1000000);
>> COPY (SELECT * FROM test) TO '/tmp/ids';
>> COPY test FROM '/tmp/ids';
>>
>> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,100000000)) TO STDOUT;' > /tmp/ryba.txt
>> echo /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
>>
>> It is easy to check lines count and bytes count are in sync (since 1 line is 2 bytes here - "1" and newline character).
>> I'll try to check more complex COPY commands to ensure everything is in sync.
>>
>> If you have any ideas for testing queries, feel free to suggest.
>
>For copy from statement you could attach the session, put a breakpoint
>at CopyReadLineText, execution will hit this breakpoint for every
>record it is doing COPY FROM and parallely check if
>pg_stat_progress_copy is getting updated correctly. I noticed it was
>showing the file read size instead of the actual processed bytes.
>
>>>  +pg_stat_progress_copy| SELECT s.pid,
>>> +    s.datid,
>>> +    d.datname,
>>> +    s.relid,
>>> +        CASE s.param1
>>> +            WHEN 0 THEN 'TO'::text
>>> +            WHEN 1 THEN 'FROM'::text
>>> +            ELSE NULL::text
>>> +        END AS direction,
>>> +    ((s.param2)::integer)::boolean AS file,
>>> +    ((s.param3)::integer)::boolean AS program,
>>> +    s.param4 AS lines_processed,
>>> +    s.param5 AS file_bytes_processed
>>>
>>> You could include pg_size_pretty for s.param5 like
>>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
>>> users to understand bytes_processed when the data size increases.
>>
>>
>> I was looking at the rest of reporting views and for me those seem to be just basic ones providing just raw data to be used later in custom nice friendly human-readable views built on the client side.
>> For example "pg_stat_progress_basebackup" also reports "backup_streamed" in raw form.
>>
>> Anyway if you would like to make this view more user-friendly, I can add that. Just ping me.
>
>I felt we could add pg_size_pretty to make the view more user friendly.
>

Please no. That'd make processing of the data (say, computing progress
as processed/total) impossible. It's easy to add pg_size_pretty if you
want it, it's impossible to undo it. I don't see a single pg_size_pretty
call in system_views.sql.


I think the same. 
 
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
12