list of extended statistics on psql

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
76 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

list of extended statistics on psql

Tatsuro Yamada-2
Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
                     List of extended statistics
  Schema | Table | Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
  public | t1    | stts1 | a, b    | f         | t            | f
  public | t1    | stts2 | a, b    | t         | t            | f
  public | t1    | stts3 | a, b    | t         | t            | t
  public | t2    | stts4 | b, c    | t         | t            | t
(4 rows)

postgres=# \?
...
   \dy     [PATTERN]      list event triggers
   \dz     [PATTERN]      list extended statistics
   \l[+]   [PATTERN]      list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

Thanks,
Tatsuro Yamada



add_list_extended_stats_for_psql_poc1.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Pavel Stehule


po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <[hidden email]> napsal:
Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
                     List of extended statistics
  Schema | Table | Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
  public | t1    | stts1 | a, b    | f         | t            | f
  public | t1    | stts2 | a, b    | t         | t            | f
  public | t1    | stts3 | a, b    | t         | t            | t
  public | t2    | stts4 | b, c    | t         | t            | t
(4 rows)

postgres=# \?
...
   \dy     [PATTERN]      list event triggers
   \dz     [PATTERN]      list extended statistics
   \l[+]   [PATTERN]      list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

+1 good idea

Pavel


Thanks,
Tatsuro Yamada


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Julien Rouhaud
On Mon, Aug 24, 2020 at 6:13 AM Pavel Stehule <[hidden email]> wrote:

>
> po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <[hidden email]> napsal:
>>
>> Hi!
>>
>> I created a POC patch that allows showing a list of extended statistics by
>> "\dz" command on psql. I believe this feature helps DBA and users who
>> would like to know all extended statistics easily. :-D
>>
>> I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
>> than "\dz" but they were already assigned. Therefore I used "\dz"
>> instead of them.
>>
>> Please find the attached patch.
>> Any comments are welcome!
>>
>> For Example:
>> =======================
>> CREATE TABLE t1 (a INT, b INT);
>> CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
>> CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
>> CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
>> ANALYZE t1;
>>
>> CREATE TABLE t2 (a INT, b INT, c INT);
>> CREATE STATISTICS stts4 ON b, c FROM t2;
>> ANALYZE t2;
>>
>> postgres=# \dz
>>                      List of extended statistics
>>   Schema | Table | Name  | Columns | Ndistinct | Dependencies | MCV
>> --------+-------+-------+---------+-----------+--------------+-----
>>   public | t1    | stts1 | a, b    | f         | t            | f
>>   public | t1    | stts2 | a, b    | t         | t            | f
>>   public | t1    | stts3 | a, b    | t         | t            | t
>>   public | t2    | stts4 | b, c    | t         | t            | t
>> (4 rows)
>>
>> postgres=# \?
>> ...
>>    \dy     [PATTERN]      list event triggers
>>    \dz     [PATTERN]      list extended statistics
>>    \l[+]   [PATTERN]      list databases
>> ...
>> =======================
>>
>> For now, I haven't written a document and regression test for that.
>> I'll create it later.
>
>
> +1 good idea

+1 that's a good idea.  Please add it to the next commitfest!

You have a typo:

+    if (pset.sversion < 10000)
+    {
+        char        sverbuf[32];
+
+        pg_log_error("The server (version %s) does not support
extended statistics.",
+                     formatPGVersionNumber(pset.sversion, false,
+                                           sverbuf, sizeof(sverbuf)));
+        return true;
+    }

the version test is missing a 0, the feature looks otherwise ok.

How about using \dX rather than \dz?


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tatsuro Yamada-2
Hi!

>> +1 good idea
>
> +1 that's a good idea.  Please add it to the next commitfest!

Thanks!


> You have a typo:
>
> +    if (pset.sversion < 10000)
> +    {
> +        char        sverbuf[32];
> +
> +        pg_log_error("The server (version %s) does not support
> extended statistics.",
> +                     formatPGVersionNumber(pset.sversion, false,
> +                                           sverbuf, sizeof(sverbuf)));
> +        return true;
> +    }
>
> the version test is missing a 0, the feature looks otherwise ok.
Ouch, I fixed on the attached patch.

The new patch includes:

  - Fix the version number check (10000 -> 100000)
  - Fix query to get extended stats info for sort order
  - Add handling [Pattern] e.g \dz stts*
  - Add document and regression test for \dz
 
> How about using \dX rather than \dz?

Thanks for your suggestion!
I'll replace it if I got consensus. :-D

Thanks,
Tatsuro Yamada

add_list_extended_stats_for_psql_poc2.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tatsuro Yamada-2
Hi Julien and Pavel!

>> How about using \dX rather than \dz?
>
> Thanks for your suggestion!
> I'll replace it if I got consensus. :-D

>> How about using \dX rather than \dz?
>
>Thanks for your suggestion!
>I'll replace it if I got consensus. :-D


I re-read a help message of \d* commands and realized it's better to
use "\dX".
There are already cases where the commands differ due to differences
in case, so I did the same way. Please find attached patch. :-D
 
For example:
==========
   \da[S]  [PATTERN]      list aggregates
   \dA[+]  [PATTERN]      list access methods
==========

Attached patch uses "\dX" instead of "\dz":
==========
   \dx[+]  [PATTERN]      list extensions
   \dX     [PATTERN]      list extended statistics
==========

Results of regress test of the feature are the following:
==========
-- check printing info about extended statistics
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

\dX
                           List of extended statistics
  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
  public | t1    | stts_1    | a, b             | f         | t            | f
  public | t1    | stts_2    | a, b             | t         | t            | f
  public | t1    | stts_3    | a, b             | t         | t            | t
  public | t2    | stts_4    | b, c             | t         | t            | t
(5 rows)

\dX stts_?
                     List of extended statistics
  Schema | Table |  Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
  public | t1    | stts_1 | a, b    | f         | t            | f
  public | t1    | stts_2 | a, b    | t         | t            | f
  public | t1    | stts_3 | a, b    | t         | t            | t
  public | t2    | stts_4 | b, c    | t         | t            | t
(4 rows)

\dX *hoge
                           List of extended statistics
  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
(1 row)
==========


Thanks,
Tatsuro Yamada



add_list_extended_stats_for_psql_by_dX_command.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Julien Rouhaud
Hi Yamada-san,

On Thu, Aug 27, 2020 at 03:13:09PM +0900, Tatsuro Yamada wrote:

>
> I re-read a help message of \d* commands and realized it's better to
> use "\dX".
> There are already cases where the commands differ due to differences
> in case, so I did the same way. Please find attached patch. :-D
> For example:
> ==========
>   \da[S]  [PATTERN]      list aggregates
>   \dA[+]  [PATTERN]      list access methods
> ==========
>
> Attached patch uses "\dX" instead of "\dz":
> ==========
>   \dx[+]  [PATTERN]      list extensions
>   \dX     [PATTERN]      list extended statistics
> ==========


Thanks for updating the patch!  This alias will probably be easier to remember.


>
> Results of regress test of the feature are the following:
> ==========
> -- check printing info about extended statistics
> create table t1 (a int, b int);
> create statistics stts_1 (dependencies) on a, b from t1;
> create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
> create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
> create table t2 (a int, b int, c int);
> create statistics stts_4 on b, c from t2;
> create table hoge (col1 int, col2 int, col3 int);
> create statistics stts_hoge on col1, col2, col3 from hoge;
>
> \dX
>                           List of extended statistics
>  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
> --------+-------+-----------+------------------+-----------+--------------+-----
>  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
>  public | t1    | stts_1    | a, b             | f         | t            | f
>  public | t1    | stts_2    | a, b             | t         | t            | f
>  public | t1    | stts_3    | a, b             | t         | t            | t
>  public | t2    | stts_4    | b, c             | t         | t            | t
> (5 rows)
>
> \dX stts_?
>                     List of extended statistics
>  Schema | Table |  Name  | Columns | Ndistinct | Dependencies | MCV
> --------+-------+--------+---------+-----------+--------------+-----
>  public | t1    | stts_1 | a, b    | f         | t            | f
>  public | t1    | stts_2 | a, b    | t         | t            | f
>  public | t1    | stts_3 | a, b    | t         | t            | t
>  public | t2    | stts_4 | b, c    | t         | t            | t
> (4 rows)
>
> \dX *hoge
>                           List of extended statistics
>  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
> --------+-------+-----------+------------------+-----------+--------------+-----
>  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
> (1 row)
> ==========


Thanks also for the documentation and regression tests.  This overall looks
good, I just have a two comments:

- there's a whitespace issue in the documentation part:

add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
          <varlistentry>
warning: 1 line adds whitespace errors.

- You're sorting the output on schema, table, extended statistics and columns
  but I think the last one isn't required since extended statistics names are
  unique.


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tatsuro Yamada-2
Hi Julien!
 

> Thanks also for the documentation and regression tests.  This overall looks
> good, I just have a two comments:


Thank you for reviewing the patch! :-D


> - there's a whitespace issue in the documentation part:
>
> add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
>  <varlistentry>
> warning: 1 line adds whitespace errors.


Oops, I forgot to use "git diff --check". I fixed it.

 
> - You're sorting the output on schema, table, extended statistics and columns
>    but I think the last one isn't required since extended statistics names are
>    unique.


You are right.
The sort key "columns" was not necessary so I removed it.

Attached new patch includes the above two fixes:

   - Fix whitespace issue in the documentation part
   - Remove unnecessary sort key from the query
      (ORDER BY 1, 2, 3, 4 -> ORDER BY 1, 2, 3)


Thanks,
Tatsuro Yamada


add_list_extended_stats_for_psql_by_dX_command_r2.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Alvaro Herrera-9
+1 for the general idea, and +1 for \dX being the syntax to use

IMO the per-type columns should show both the type being enabled as
well as it being built.

(How many more stat types do we expect -- Tomas?  I wonder if having one
column per type is going to scale in the long run.)

Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to.  Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is.  So we should give
room for that in psql's display too.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tatsuro Yamada-2
Hi Alvaro!

It's been ages since we created a progress reporting feature together. :-D

>>> +1 good idea
>>
>> +1 that's a good idea.  Please add it to the next commitfest!
>
>+1 for the general idea, and +1 for \dX being the syntax to use

Thank you for voting!


> IMO the per-type columns should show both the type being enabled as
well as it being built.

Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?


> Also, the stat obj name column should be first, followed by a single
> column listing both table and columns that it applies to.  Keep in mind
> that in the future we might want to add stats that cross multiple tables
> -- that's why the CREATE syntax is the way it is.  So we should give
> room for that in psql's display too.

I understand your suggestions are the following, right?

* The Current column order:
===================
   Schema | Table |  Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
   public | t1    | stts_1 | a, b    | f         | t            | f
   public | t1    | stts_2 | a, b    | t         | t            | f
   public | t1    | stts_3 | a, b    | t         | t            | t
   public | t2    | stts_4 | b, c    | t         | t            | t
===================

* The suggested column order is like this:
===================
    Name    | Schema | Table |     Columns      | Ndistinct | Dependencies | MCV
-----------+--------+-------+------------------+-----------+--------------+-----
  stts_1    | public | t1    | a, b             | f         | t            | f
  stts_2    | public | t1    | a, b             | t         | t            | f
  stts_3    | public | t1    | a, b             | t         | t            | t
  stts_4    | public | t2    | b, c             | t         | t            | t
===================

*  In the future, Extended stats that cross multiple tables will be
    shown maybe... (t1, t2):
===================
    Name    | Schema | Table  |     Columns      | Ndistinct | Dependencies | MCV
-----------+--------+--------+------------------+-----------+--------------+-----
  stts_5    | public | t1, t2 | a, b             | f         | t            | f
===================

If so, I can revise the column order as you suggested easily.
However, I have no idea how to show extended stats that cross
multiple tables and the status now.

I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?


In addition,
Currently, I use this query to get Extended stats info from pg_statistic_ext.

         SELECT
         stxnamespace::pg_catalog.regnamespace AS "Schema",
         c.relname AS "Table",
         stxname AS "Name",
         (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
          FROM pg_catalog.unnest(stxkeys) s(attnum)
          JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
          a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
         'd' = any(stxkind) AS "Ndistinct",
         'f' = any(stxkind) AS "Dependencies",
         'm' = any(stxkind) AS "MCV"
         FROM pg_catalog.pg_statistic_ext
         INNER JOIN pg_catalog.pg_class c
         ON stxrelid = c.oid
         ORDER BY 1, 2, 3;

Thanks,
Tatsuro Yamada





Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Alvaro Herrera-9
On 2020-Aug-28, Tatsuro Yamada wrote:

> > IMO the per-type columns should show both the type being enabled as
> > well as it being built.
>
> Hmm. I'm not sure how to get the status (enabled or disabled) of
> extended stats. :(
> Could you explain it more?

pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built.  (I'm not sure whether there's an easier way to determine this.)


> * The suggested column order is like this:
> ===================
>    Name    | Schema | Table |     Columns      | Ndistinct | Dependencies | MCV
> -----------+--------+-------+------------------+-----------+--------------+-----
>  stts_1    | public | t1    | a, b             | f         | t            | f
>  stts_2    | public | t1    | a, b             | t         | t            | f
>  stts_3    | public | t1    | a, b             | t         | t            | t
>  stts_4    | public | t2    | b, c             | t         | t            | t
> ===================

I suggest to do this

    Name    | Schema | Definition               | Ndistinct | Dependencies | MCV
 -----------+--------+--------------------------+-----------+--------------+-----
  stts_1    | public | (a, b) FROM t1           | f         | t            | f

> I suppose that the current column order is sufficient if there is
> no improvement of extended stats on PG14. Do you know any plan to
> improve extended stats such as to allow it to cross multiple tables on PG14?

I suggest that changing it in the future is going to be an uphill
battle, so better get it right from the get go, without requiring a
future restructure.

> In addition,
> Currently, I use this query to get Extended stats info from pg_statistic_ext.

Maybe something like this would do

SELECT
 stxnamespace::pg_catalog.regnamespace AS "Schema",
 stxname AS "Name",
 format('%s FROM %s',
 (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
  FROM pg_catalog.unnest(stxkeys) s(attnum)
  JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
  a.attnum = s.attnum AND NOT attisdropped)),
  stxrelid::regclass) AS "Definition",
  CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
  CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
  CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
 FROM pg_catalog.pg_statistic_ext es
 INNER JOIN pg_catalog.pg_class c
 ON stxrelid = c.oid
 LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
 ORDER BY 1, 2, 3;

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tomas Vondra-4
In reply to this post by Alvaro Herrera-9
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:
>+1 for the general idea, and +1 for \dX being the syntax to use
>
>IMO the per-type columns should show both the type being enabled as
>well as it being built.
>
>(How many more stat types do we expect -- Tomas?  I wonder if having one
>column per type is going to scale in the long run.)
>

I wouldn't expect a huge number of types. I can imagine maybe twice the
current number of types, but not much more. But I'm not sure the output
is easy to read even now ...


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tomas Vondra-4
In reply to this post by Alvaro Herrera-9
On Thu, Aug 27, 2020 at 11:26:17PM -0400, Alvaro Herrera wrote:

>On 2020-Aug-28, Tatsuro Yamada wrote:
>
>> > IMO the per-type columns should show both the type being enabled as
>> > well as it being built.
>>
>> Hmm. I'm not sure how to get the status (enabled or disabled) of
>> extended stats. :(
>> Could you explain it more?
>
>pg_statistic_ext_data.stxdndistinct is not null if the stats have been
>built.  (I'm not sure whether there's an easier way to determine this.)
>

It's the only way, I think. Which types were requested is stored in

    pg_statistic_ext.stxkind

and what was built is in pg_statistic_ext_data. But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Alvaro Herrera-9
On 2020-Aug-29, Tomas Vondra wrote:

> But if we want the
> output to show both what was requested and which types were actually
> built, that'll effectively double the number of columns needed :-(

I was thinking it would be one column per type showing either disabled or enabled
or built.  But another idea is to show one type per line that's at least
enabled.

> Also, it might be useful to show the size of the statistics built, just
> like we show for \d+ etc.

\dX+  I  suppose?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tomas Vondra-4
On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:

>On 2020-Aug-29, Tomas Vondra wrote:
>
>> But if we want the
>> output to show both what was requested and which types were actually
>> built, that'll effectively double the number of columns needed :-(
>
>I was thinking it would be one column per type showing either disabled or enabled
>or built.  But another idea is to show one type per line that's at least
>enabled.
>
>> Also, it might be useful to show the size of the statistics built, just
>> like we show for \d+ etc.
>
>\dX+  I  suppose?
>

Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Alvaro Herrera-9
On 2020-Aug-30, Tomas Vondra wrote:

> On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
> > On 2020-Aug-29, Tomas Vondra wrote:

> > > Also, it might be useful to show the size of the statistics built, just
> > > like we show for \d+ etc.
> >
> > \dX+  I  suppose?
>
> Right. I've only used \d+ as an example of an existing command showing
> sizes of the objects.

Yeah, I understood it that way too.

How can you measure the size of the stat objects in a query?  Are you
thinking in pg_column_size()?

I wonder how to report that.  Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used.  What do people think about this?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tomas Vondra-4
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:

>On 2020-Aug-30, Tomas Vondra wrote:
>
>> On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
>> > On 2020-Aug-29, Tomas Vondra wrote:
>
>> > > Also, it might be useful to show the size of the statistics built, just
>> > > like we show for \d+ etc.
>> >
>> > \dX+  I  suppose?
>>
>> Right. I've only used \d+ as an example of an existing command showing
>> sizes of the objects.
>
>Yeah, I understood it that way too.
>
>How can you measure the size of the stat objects in a query?  Are you
>thinking in pg_column_size()?
>

Either that or simply length() on the bytea value.

>I wonder how to report that.  Knowing that psql \-commands are not meant
>for anything other than human consumption, maybe we can use a format()
>string that says "built: %d bytes" when \dX+ is used (for each stat type),
>and just "built" when \dX is used.  What do people think about this?
>

I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tom Lane-2
Tomas Vondra <[hidden email]> writes:
> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
>> I wonder how to report that.  Knowing that psql \-commands are not meant
>> for anything other than human consumption, maybe we can use a format()
>> string that says "built: %d bytes" when \dX+ is used (for each stat type),
>> and just "built" when \dX is used.  What do people think about this?

Seems a little too cute to me.

> I'd use the same approach as \d+, i.e. a separate column with the size.
> Maybe that'd mean too many columns, though.

psql already has \d commands with so many columns that you pretty much
have to use \x mode to make them legible; \df+ for instance.  I don't
mind if \dX+ is also in that territory.  It'd be good though if plain
\dX can fit in a normal terminal window.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tatsuro Yamada-2
In reply to this post by Alvaro Herrera-9
Hi Alvaro,

>>> IMO the per-type columns should show both the type being enabled as
>>> well as it being built.
>>
>> Hmm. I'm not sure how to get the status (enabled or disabled) of
>> extended stats. :(
>> Could you explain it more?
>
> pg_statistic_ext_data.stxdndistinct is not null if the stats have been
> built. (I'm not sure whether there's an easier way to determine this.)


Ah.. I see! Thank you.


> I suggest to do this
>
>    Name    | Schema | Definition               | Ndistinct | Dependencies | MCV
> -----------+--------+--------------------------+-----------+--------------+-----
>  stts_1    | public | (a, b) FROM t1           | f         | t            | f
>
>> I suppose that the current column order is sufficient if there is
>> no improvement of extended stats on PG14. Do you know any plan to
>> improve extended stats such as to allow it to cross multiple tables on PG14?
>
> I suggest that changing it in the future is going to be an uphill
> battle, so better get it right from the get go, without requiring a
> future restructure.


I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column.


>> Currently, I use this query to get Extended stats info from pg_statistic_ext.
>
> Maybe something like this would do
>
> SELECT
> stxnamespace::pg_catalog.regnamespace AS "Schema",
> stxname AS "Name",
> format('%s FROM %s',
>  (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
>   FROM pg_catalog.unnest(stxkeys) s(attnum)
>   JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
>   a.attnum = s.attnum AND NOT attisdropped)),
>   stxrelid::regclass) AS "Definition",
>   CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct",
>   CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functional dependencies",
>   CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
>  FROM pg_catalog.pg_statistic_ext es
>  INNER JOIN pg_catalog.pg_class c
>  ON stxrelid = c.oid
>  LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
>  ORDER BY 1, 2, 3;

Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D
I got the below results by your query.

========
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

insert into t1 select i,i from generate_series(1,100) i;
analyze t1;


Your query gave this result:

  Schema |   Name    |         Definition         |     n-distinct     | functional dependencies |        mcv
--------+-----------+----------------------------+--------------------+-------------------------+--------------------
  public | stts_1    | a, b FROM t1               |                    | built                   |
  public | stts_2    | a, b FROM t1               | built              | built                   |
  public | stts_3    | a, b FROM t1               | built              | built                   | built
  public | stts_4    | b, c FROM t2               | enabled, not built | enabled, not built      | enabled, not built
  public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built      | enabled, not built
(5 rows)
========

I guess "enabled, not built" is a little redundant. The status would better to
have three patterns: "built", "not built" or nothing (NULL) like these:

   - "built":  extended stats is defined and built (collected by analyze cmd)
   - "not built": extended stats is defined but have not built yet
   - nothing (NULL): extended stats is not defined

What do you think about it?


I will send a new patch including :

   - Replace "Columns" and "Table" column with "Definition"
   - Show the status (built/not built/null) of extended stats by using
     pg_statistic_ext_data

Thanks,
Tatsuro Yamada





Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Tatsuro Yamada-2
In reply to this post by Tom Lane-2
On 2020/08/31 1:59, Tom Lane wrote:

> Tomas Vondra <[hidden email]> writes:
>> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
>>> I wonder how to report that.  Knowing that psql \-commands are not meant
>>> for anything other than human consumption, maybe we can use a format()
>>> string that says "built: %d bytes" when \dX+ is used (for each stat type),
>>> and just "built" when \dX is used.  What do people think about this?
>
> Seems a little too cute to me.
>
>> I'd use the same approach as \d+, i.e. a separate column with the size.
>> Maybe that'd mean too many columns, though.
>
> psql already has \d commands with so many columns that you pretty much
> have to use \x mode to make them legible; \df+ for instance.  I don't
> mind if \dX+ is also in that territory.  It'd be good though if plain
> \dX can fit in a normal terminal window.

Hmm. How about these instead of "built: %d bytes"?
I added three columns (N_size, D_size, M_size) to show size. See below:

===================
  postgres=# \dX
                                List of extended statistics
  Schema |   Name    |         Definition         | N_distinct | Dependencies |    Mcv
--------+-----------+----------------------------+------------+--------------+-----------
  public | stts_1    | a, b FROM t1               |            | built        |
  public | stts_2    | a, b FROM t1               | built      | built        |
  public | stts_3    | a, b FROM t1               | built      | built        | built
  public | stts_4    | b, c FROM t2               | not built  | not built    | not built
  public | stts_hoge | col1, col2, col3 FROM hoge | not built  | not built    | not built
(5 rows)

postgres=# \dX+
                                             List of extended statistics
  Schema |   Name    |         Definition         | N_distinct | Dependencies |    Mcv    | N_size | D_size | M_size
--------+-----------+----------------------------+------------+--------------+-----------+--------+--------+--------
  public | stts_1    | a, b FROM t1               |            | built        |           |        |     40 |
  public | stts_2    | a, b FROM t1               | built      | built        |           |     13 |     40 |
  public | stts_3    | a, b FROM t1               | built      | built        | built     |     13 |     40 |   6126
  public | stts_4    | b, c FROM t2               | not built  | not built    | not built |        |        |
  public | stts_hoge | col1, col2, col3 FROM hoge | not built  | not built    | not built |        |        |
===================

I used this query to get results of "\dX+".
===================
         SELECT
          stxnamespace::pg_catalog.regnamespace AS "Schema",
          stxname AS "Name",
          format('%s FROM %s',
            (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
             FROM pg_catalog.unnest(stxkeys) s(attnum)
             JOIN pg_catalog.pg_attribute a
             ON (stxrelid = a.attrelid
             AND a.attnum = s.attnum
             AND NOT attisdropped)),
          stxrelid::regclass) AS "Definition",
          CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built'
               WHEN 'd' = any(stxkind) THEN 'not built'
          END AS "N_distinct",
          CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built'
               WHEN 'f' = any(stxkind) THEN 'not built'
          END AS "Dependencies",
          CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built'
               WHEN 'm' = any(stxkind) THEN 'not built'
          END AS "Mcv",
        pg_catalog.length(stxdndistinct) AS "N_size",
        pg_catalog.length(stxddependencies) AS "D_size",
        pg_catalog.length(stxdmcv) AS "M_size"
        FROM pg_catalog.pg_statistic_ext es
        INNER JOIN pg_catalog.pg_class c
        ON stxrelid = c.oid
        LEFT JOIN pg_catalog.pg_statistic_ext_data esd
        ON es.oid = esd.stxoid
        ORDER BY 1, 2;
===================
 

Attached patch includes:

    - Replace "Columns" and "Table" column with "Definition"
    - Show the status (built/not built/null) of extended stats by
      using pg_statistic_ext_data
    - Add "\dX+" command to show size of extended stats

Please find the attached file! :-D


Thanks,
Tatsuro Yamada


add_list_extended_stats_for_psql_by_dX_and_dXplus_r3.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: list of extended statistics on psql

Justin Pryzby
In reply to this post by Alvaro Herrera-9
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:

> +1 for the general idea, and +1 for \dX being the syntax to use
>
> IMO the per-type columns should show both the type being enabled as
> well as it being built.
>
> (How many more stat types do we expect -- Tomas?  I wonder if having one
> column per type is going to scale in the long run.)
>
> Also, the stat obj name column should be first, followed by a single
> column listing both table and columns that it applies to.  Keep in mind
> that in the future we might want to add stats that cross multiple tables
> -- that's why the CREATE syntax is the way it is.  So we should give
> room for that in psql's display too.

There's also a plan for CREATE STATISTICS to support expresion statistics, with
the statistics functionality of an expression index, but without the cost of
index-update on UPDATE/DELETE.  That's Tomas' patch here:
https://commitfest.postgresql.org/29/2421/

I think that would compute ndistinct and MCV, same as indexes, but not
dependencies.  To me, I think it's better if there's a single column showing
the "kinds" of statistics to be generated (stxkind), rather than a column for
each.

I'm not sure why the length of the stats lists cast as text is useful to show?
We don't have a slash-dee command to show the number of MCV or histogram in
traditional, 1-D stats in pg_statistic, right ?  I think anybody wanting that
would learn to SELECT FROM pg_statistic*.  Also, the length of the text output
isn't very meaningful ?  If this is json, maybe you'd do something like this:
|SELECT a.stxdndistinct , COUNT(b) FROM pg_statistic_ext_data a , json_each(stxdndistinct::Json) AS b GROUP BY 1

I guess stxdmcv isn't json, but it seems especially meaningless to show
length() of its ::text, since we don't even "deserialize" the object to begin
with.

BTW, I've just started a new thread about displaying in psql \d the stats
target of target extended stats.

--
Justin


1234