pgbench - allow to create partitioned tables

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

pgbench - allow to create partitioned tables

Fabien COELHO-3

Hello devs,

While doing some performance tests and reviewing patches, I needed to
create partitioned tables. Given the current syntax this is time
consumming.

The attached patch adds two options to create a partitioned "account"
table in pgbench.

It allows to answer quickly simple questions, eg "what is the overhead of
hash partitioning on a simple select on my laptop"? Answer:

  # N=0..?
  sh> pgench -i -s 1 --partition-number=$N --partition-type=hash

  # then run
  sh> pgench -S -M prepared -P 1 -T 10

  # and look at latency:
  # no parts = 0.071 ms
  #   1 hash = 0.071 ms (did someone optimize this case?!)
  #   2 hash ~ 0.126 ms (+ 0.055 ms)
  #  50 hash ~ 0.155 ms
  # 100 hash ~ 0.178 ms
  # 150 hash ~ 0.232 ms
  # 200 hash ~ 0.279 ms
  # overhead ~ (0.050 + [0.0005-0.0008] * nparts) ms

--
Fabien.

pgbench-init-partitioned-1.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Simon Riggs
On Tue, 23 Jul 2019 at 19:26, Fabien COELHO <[hidden email]> wrote:

Hello devs,

While doing some performance tests and reviewing patches, I needed to
create partitioned tables. Given the current syntax this is time
consumming.

Good idea. I wonder why we didn't have it already.
 
The attached patch adds two options to create a partitioned "account"
table in pgbench.

It allows to answer quickly simple questions, eg "what is the overhead of
hash partitioning on a simple select on my laptop"? Answer:

  # N=0..?
  sh> pgench -i -s 1 --partition-number=$N --partition-type=hash

Given current naming of options, I would call this --partitions=number-of-partitions and --partition-method=hash
 
  # then run
  sh> pgench -S -M prepared -P 1 -T 10

  # and look at latency:
  # no parts = 0.071 ms
  #   1 hash = 0.071 ms (did someone optimize this case?!)
  #   2 hash ~ 0.126 ms (+ 0.055 ms)
  #  50 hash ~ 0.155 ms
  # 100 hash ~ 0.178 ms
  # 150 hash ~ 0.232 ms
  # 200 hash ~ 0.279 ms
  # overhead ~ (0.050 + [0.0005-0.0008] * nparts) ms

It is linear? 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Solutions for the Enterprise
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3

Hello Simon,

>> While doing some performance tests and reviewing patches, I needed to
>> create partitioned tables. Given the current syntax this is time
>> consumming.
>
> Good idea. I wonder why we didn't have it already.

Probably because I did not have to create partitioned table for some
testing:-)

>>   sh> pgench -i -s 1 --partition-number=$N --partition-type=hash
>
> Given current naming of options, I would call this
> --partitions=number-of-partitions and --partition-method=hash

Ok.

>>   # then run
>>   sh> pgench -S -M prepared -P 1 -T 10
>>
>>   # and look at latency:
>>   # no parts = 0.071 ms
>>   #   1 hash = 0.071 ms (did someone optimize this case?!)
>>   #   2 hash ~ 0.126 ms (+ 0.055 ms)
>>   #  50 hash ~ 0.155 ms
>>   # 100 hash ~ 0.178 ms
>>   # 150 hash ~ 0.232 ms
>>   # 200 hash ~ 0.279 ms
>>   # overhead ~ (0.050 + [0.0005-0.0008] * nparts) ms
>
> It is linear?
Good question. I would have hoped affine, but this is not very clear on
these data, which are the median of about five runs, hence the bracket on
the slope factor. At least it is increasing with the number of partitions.
Maybe it would be clearer on the minimum of five runs.

--
Fabien.

pgbench-init-partitioned-2.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3

>>>   # and look at latency:
>>>   # no parts = 0.071 ms
>>>   #   1 hash = 0.071 ms (did someone optimize this case?!)
>>>   #   2 hash ~ 0.126 ms (+ 0.055 ms)
>>>   #  50 hash ~ 0.155 ms
>>>   # 100 hash ~ 0.178 ms
>>>   # 150 hash ~ 0.232 ms
>>>   # 200 hash ~ 0.279 ms
>>>   # overhead ~ (0.050 + [0.0005-0.0008] * nparts) ms
>>
>> It is linear?
>
> Good question. I would have hoped affine, but this is not very clear on these
> data, which are the median of about five runs, hence the bracket on the slope
> factor. At least it is increasing with the number of partitions. Maybe it
> would be clearer on the minimum of five runs.
Here is a fellow up.

On the minimum of all available runs the query time on hash partitions is
about:

    0.64375 nparts + 118.30979 (in µs).

So the overhead is about 47.30979 + 0.64375 nparts, and it is indeed
pretty convincingly linear as suggested by the attached figure.

--
Fabien.

regression.png (54K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3
In reply to this post by Fabien COELHO-3

Attached v3 fixes strcasecmp non portability on windows, per postgresql
patch tester.

--
Fabien.

pgbench-init-partitioned-3.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi,

The patch looks good to me, Just one suggestion --partition-method option should be made dependent on --partitions, because it has no use unless used with --partitions. What do you think?
 
Regards,
Asif

The new status of this patch is: Waiting on Author
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3

> Just one suggestion --partition-method option should be made dependent
> on --partitions, because it has no use unless used with --partitions.
> What do you think?

Why not. V4 attached.

--
Fabien.

pgbench-init-partitioned-4.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Thanks. All looks good, making it ready for committer.

Regards,
Asif

The new status of this patch is: Ready for Committer
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

akapila
In reply to this post by Fabien COELHO-3
On Mon, Aug 26, 2019 at 11:04 PM Fabien COELHO <[hidden email]> wrote:
>
>
> > Just one suggestion --partition-method option should be made dependent
> > on --partitions, because it has no use unless used with --partitions.
> > What do you think?
>

Some comments:
*
+ case 11: /* partitions */
+ initialization_option_set = true;
+ partitions = atoi(optarg);
+ if (partitions < 0)
+ {
+ fprintf(stderr, "invalid number of partitions: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;

Is there a reason why we treat "partitions = 0" as a valid value?
Also, shouldn't we keep some max limit for this parameter as well?
Forex. how realistic it will be if the user gives the value of
partitions the same or greater than the number of rows in
pgbench_accounts table?  I understand it is not sensible to give such
a value, but I guess the API should behave sanely in such cases as
well.  I am not sure what will be the good max value for it, but I
think there should be one.  Anyone else have any better suggestions
for this?

*
@@ -3625,6 +3644,7 @@ initCreateTables(PGconn *con)
  const char *bigcols; /* column decls if accountIDs are 64 bits */
  int declare_fillfactor;
  };
+
  static const struct ddlinfo DDLs[] = {

Spurious line change.

*
+    "  --partitions=NUM         partition account table in NUM parts
(defaults: 0)\n"
+    "  --partition-method=(range|hash)\n"
+    "                           partition account table with this
method (default: range)\n"

Refer complete table name like pgbench_accounts instead of just
account. It will be clear and in sync with what we display in some
other options like --skip-some-updates.

*
+    "  --partitions=NUM         partition account table in NUM parts
(defaults: 0)\n"

/defaults/default.

*
I think we should print the information about partitions in
printResults.  It can help users while analyzing results.

*
+enum { PART_NONE, PART_RANGE, PART_HASH }
+ partition_method = PART_NONE;
+

I think it is better to follow the style of QueryMode enum by using
typedef here, that will make look code in sync with nearby code.

*
- int i;

  fprintf(stderr, "creating tables...\n");

- for (i = 0; i < lengthof(DDLs); i++)
+ for (int i = 0; i < lengthof(DDLs); i++)

This is unnecessary change as far as this patch is concerned.  I
understand there is no problem in writing either way, but let's not
change the coding pattern here as part of this patch.

*
+ if (partitions >= 1)
+ {
+ int64 part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+ char ff[64];
+ ff[0] = '\0';
+ append_fillfactor(ff, sizeof(ff));
+
+ fprintf(stderr, "creating %d partitions...\n", partitions);
+
+ for (int p = 1; p <= partitions; p++)
+ {
+ char query[256];
+
+ if (partition_method == PART_RANGE)
+ {

part_size can be defined inside "if (partition_method == PART_RANGE)"
as it is used here.  In general, this part of the code can use some
comments.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3

Hello Amit,

Thanks for the feedback.

> + case 11: /* partitions */
> + initialization_option_set = true;
> + partitions = atoi(optarg);
> + if (partitions < 0)
> + {
> + fprintf(stderr, "invalid number of partitions: \"%s\"\n",
> + optarg);
> + exit(1);
> + }
> + break;
>
> Is there a reason why we treat "partitions = 0" as a valid value?
Yes. It is an explicit "do not create partitioned tables", which differ
from 1 which says "create a partitionned table with just one partition".

> Also, shouldn't we keep some max limit for this parameter as well?

I do not think so. If someone wants to test how terrible it is to use
100000 partitions, we should not prevent it.

> Forex. how realistic it will be if the user gives the value of
> partitions the same or greater than the number of rows in
> pgbench_accounts table?

Although I agree that it does not make much sense, for testing purposes
why not, to test overheads in critical cases for instance.

> I understand it is not sensible to give such a value, but I guess the
> API should behave sanely in such cases as well.

Yep, it should work.

> I am not sure what will be the good max value for it, but I
> think there should be one.

I disagree. Pgbench is a tool for testing performance for given
parameters. If postgres accepts a parameter there is no reason why pgbench
should reject it.

> @@ -3625,6 +3644,7 @@ initCreateTables(PGconn *con)
>  const char *bigcols; /* column decls if accountIDs are 64 bits */
>  int declare_fillfactor;
>  };
> +
>  static const struct ddlinfo DDLs[] = {
>
> Spurious line change.

Indeed.

> *
> +    "  --partitions=NUM         partition account table in NUM parts
> (defaults: 0)\n"
> +    "  --partition-method=(range|hash)\n"
> +    "                           partition account table with this
> method (default: range)\n"
>
> Refer complete table name like pgbench_accounts instead of just account.
> It will be clear and in sync with what we display in some other options
> like --skip-some-updates.
Ok.

> *
> +    "  --partitions=NUM         partition account table in NUM parts
> (defaults: 0)\n"
>
> /defaults/default.

Ok.

> I think we should print the information about partitions in
> printResults.  It can help users while analyzing results.

Hmmm. Why not, with some hocus-pocus to get the information out of
pg_catalog, and trying to fail gracefully so that if pgbench is run
against a no partitioning-support version.

> *
> +enum { PART_NONE, PART_RANGE, PART_HASH }
> + partition_method = PART_NONE;
> +
>
> I think it is better to follow the style of QueryMode enum by using
> typedef here, that will make look code in sync with nearby code.

Hmmm. Why not. This means inventing a used-once type name for
partition_method. My great creativity lead to partition_method_t.

> *
> - int i;
>
>  fprintf(stderr, "creating tables...\n");
>
> - for (i = 0; i < lengthof(DDLs); i++)
> + for (int i = 0; i < lengthof(DDLs); i++)
>
> This is unnecessary change as far as this patch is concerned.  I
> understand there is no problem in writing either way, but let's not
> change the coding pattern here as part of this patch.
The reason I did that is that I had a stupid bug in a development version
which was due to an accidental reuse of this index, which would have been
prevented by this declaration style. Removed anyway.

> + if (partitions >= 1)
> + {
> + int64 part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
> + char ff[64];
> + ff[0] = '\0';
> + append_fillfactor(ff, sizeof(ff));
> +
> + fprintf(stderr, "creating %d partitions...\n", partitions);
> +
> + for (int p = 1; p <= partitions; p++)
> + {
> + char query[256];
> +
> + if (partition_method == PART_RANGE)
> + {
>
> part_size can be defined inside "if (partition_method == PART_RANGE)"
> as it is used here.
I just wanted to avoid recomputing the value in the loop, but indeed it
may be computed needlessly. Moved.

> In general, this part of the code can use some comments.

Ok.

Attached an updated version.

--
Fabien.

pgbench-init-partitioned-5.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Dilip Kumar-2
On Wed, Sep 11, 2019 at 6:08 PM Fabien COELHO <[hidden email]> wrote:
> Attached an updated version.
I have reviewed the patch and done some basic testing.  It works as
per the expectation

I have a few cosmetic comments

1.
+ if (partitions >= 1)
+ {
+ char ff[64];
+ ff[0] = '\0';
+ append_fillfactor(ff, sizeof(ff));

Generally, we give one blank line between the variable declaration and
the first statement of the block.

2.
+ if (p == 1)
+ sprintf(minvalue, "minvalue");
+ else
+ sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);

(p-1) -> (p - 1)

I am just wondering will it be a good idea to expand it to support
multi-level partitioning?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

akapila
In reply to this post by Fabien COELHO-3
On Wed, Sep 11, 2019 at 6:08 PM Fabien COELHO <[hidden email]> wrote:
>
>

I would like to take inputs from others as well for the display part
of this patch.  After this patch, for a simple-update pgbench test,
the changed output will be as follows (note: partition method and
partitions):
pgbench.exe -c 4 -j 4 -T 10 -N postgres
starting vacuum...end.
transaction type: <builtin: simple update>
scaling factor: 1
partition method: hash
partitions: 3
query mode: simple
number of clients: 4
number of threads: 4
duration: 10 s
number of transactions actually processed: 14563
latency average = 2.749 ms
tps = 1454.899150 (including connections establishing)
tps = 1466.689412 (excluding connections establishing)

What do others think about this?  This will be the case when the user
has used --partitions option in pgbench, otherwise, it won't change.

>
> > + case 11: /* partitions */
> > + initialization_option_set = true;
> > + partitions = atoi(optarg);
> > + if (partitions < 0)
> > + {
> > + fprintf(stderr, "invalid number of partitions: \"%s\"\n",
> > + optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > Is there a reason why we treat "partitions = 0" as a valid value?
>
> Yes. It is an explicit "do not create partitioned tables", which differ
> from 1 which says "create a partitionned table with just one partition".
>

Why would anyone want to use --partitions option in the first case
("do not create partitioned tables")?

>
> > I think we should print the information about partitions in
> > printResults.  It can help users while analyzing results.
>
> Hmmm. Why not, with some hocus-pocus to get the information out of
> pg_catalog, and trying to fail gracefully so that if pgbench is run
> against a no partitioning-support version.
>

+ res = PQexec(con,
+ "select p.partstrat, count(*) "
+ "from pg_catalog.pg_class as c "
+ "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
+ "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
+ "where c.relname = 'pgbench_accounts' "
+ "group by 1, c.oid");

Can't we write this query with inner join instead of left join?  What
additional purpose you are trying to serve by using left join?

> > *
> > +enum { PART_NONE, PART_RANGE, PART_HASH }
> > + partition_method = PART_NONE;
> > +
> >
> > I think it is better to follow the style of QueryMode enum by using
> > typedef here, that will make look code in sync with nearby code.
>
> Hmmm. Why not. This means inventing a used-once type name for
> partition_method. My great creativity lead to partition_method_t.
>

+partition_method_t partition_method = PART_NONE;

It is better to make this static.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3
In reply to this post by Dilip Kumar-2

Hello Dilip,

> Generally, we give one blank line between the variable declaration and
> the first statement of the block.

Ok.

> (p-1) -> (p - 1)

Ok.

> I am just wondering will it be a good idea to expand it to support
> multi-level partitioning?

ISTM that how the user could specify multi-level parameters is pretty
unclear, so I would let that as a possible extension if someone wants it
enough.

Attached v6 implements the two cosmetic changes outlined above.

--
Fabien.

pgbench-init-partitioned-6.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Dilip Kumar-2
On Fri, Sep 13, 2019 at 1:35 PM Fabien COELHO <[hidden email]> wrote:

Thanks for the updated version of the patch.

> > Generally, we give one blank line between the variable declaration and
> > the first statement of the block.
>
> Ok.
>
> > (p-1) -> (p - 1)
>
> Ok.
>
> > I am just wondering will it be a good idea to expand it to support
> > multi-level partitioning?
>
> ISTM that how the user could specify multi-level parameters is pretty
> unclear, so I would let that as a possible extension if someone wants it
> enough.
Ok
>
> Attached v6 implements the two cosmetic changes outlined above.

+ /* For RANGE, we use open-ended partitions at the beginning and end */
+ if (p == 1)
+ sprintf(minvalue, "minvalue");
+ else
+ sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);
+
+ if (p < partitions)
+ sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+ else
+ sprintf(maxvalue, "maxvalue");

I do not understand the reason why first partition need to be
open-ended?  Because we are clear that the minimum value of the aid is
1 in pgbench_accout.  So if you directly use
sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);  then also it
will give 1 as minvalue for the first partition and that will be the
right thing to do.  Am I missing something here?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3
In reply to this post by akapila

Hello Amit,

>>> Is there a reason why we treat "partitions = 0" as a valid value?
>>
>> Yes. It is an explicit "do not create partitioned tables", which differ
>> from 1 which says "create a partitionned table with just one partition".
>
> Why would anyone want to use --partitions option in the first case
> ("do not create partitioned tables")?

Having an explicit value for the default is generally a good idea, eg for
a script to tests various partitioning settings:

   for nparts in 0 1 2 3 4 5 6 7 8 9 ; do
     pgbench -i --partitions=$nparts ... ;
     ...
   done

Otherwise you would need significant kludging to add/remove the option.
Allowing 0 does not harm anyone.

Now if the consensus is to remove an explicit 0, it is simple enough to
change it, but my opinion is that it is better to have it.

>>> I think we should print the information about partitions in
>>> printResults.  It can help users while analyzing results.
>
> + res = PQexec(con,
> + "select p.partstrat, count(*) "
> + "from pg_catalog.pg_class as c "
> + "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
> + "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
> + "where c.relname = 'pgbench_accounts' "
> + "group by 1, c.oid");
>
> Can't we write this query with inner join instead of left join?  What
> additional purpose you are trying to serve by using left join?

I'm ensuring that there is always a one line answer, whether it is
partitioned or not. Maybe the count(*) should be count(something in p) to
get 0 instead of 1 on non partitioned tables, though, but this is hidden
in the display anyway.

> +partition_method_t partition_method = PART_NONE;
>
> It is better to make this static.

I do agree, but this would depart from all other global variables around
which are currently not static. Maybe a separate patch could turn them all
as static, but ISTM that this patch should not change the current style.

--
Fabien.


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Fabien COELHO-3
In reply to this post by Dilip Kumar-2

Hello Dilip,

> + /* For RANGE, we use open-ended partitions at the beginning and end */
> + if (p == 1)
> + sprintf(minvalue, "minvalue");
> + else
> + sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);
> +
> + if (p < partitions)
> + sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
> + else
> + sprintf(maxvalue, "maxvalue");
>
> I do not understand the reason why first partition need to be
> open-ended?  Because we are clear that the minimum value of the aid is 1
> in pgbench_accout.  So if you directly use sprintf(minvalue,
> INT64_FORMAT, (p-1) * part_size + 1);  then also it will give 1 as
> minvalue for the first partition and that will be the right thing to do.
> Am I missing something here?

This is simply for the principle that any value allowed for the primary
key type has a corresponding partition, and also that it exercices these
special values.

It also probably reduces the cost of checking whether a value belongs to
the first partition because one test is removed, so there is a small
additional performance benefit beyond principle and coverage.

--
Fabien.


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Dilip Kumar-2
On Fri, Sep 13, 2019 at 2:05 PM Fabien COELHO <[hidden email]> wrote:

>
>
> Hello Dilip,
>
> > + /* For RANGE, we use open-ended partitions at the beginning and end */
> > + if (p == 1)
> > + sprintf(minvalue, "minvalue");
> > + else
> > + sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);
> > +
> > + if (p < partitions)
> > + sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
> > + else
> > + sprintf(maxvalue, "maxvalue");
> >
> > I do not understand the reason why first partition need to be
> > open-ended?  Because we are clear that the minimum value of the aid is 1
> > in pgbench_accout.  So if you directly use sprintf(minvalue,
> > INT64_FORMAT, (p-1) * part_size + 1);  then also it will give 1 as
> > minvalue for the first partition and that will be the right thing to do.
> > Am I missing something here?

>
> This is simply for the principle that any value allowed for the primary
> key type has a corresponding partition, and also that it exercices these
> special values.

IMHO, the primary key values for the pgbench_accout tables are always
within the defined range minvalue=1 and maxvalue=scale*100000, right?
>
> It also probably reduces the cost of checking whether a value belongs to
> the first partition because one test is removed, so there is a small
> additional performance benefit beyond principle and coverage.

Ok,  I agree that it will slightly reduce the cost for the tuple
falling in the first and the last partition.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

akapila
In reply to this post by Fabien COELHO-3
On Fri, Sep 13, 2019 at 1:50 PM Fabien COELHO <[hidden email]> wrote:

> >>> Is there a reason why we treat "partitions = 0" as a valid value?
> >>
> >> Yes. It is an explicit "do not create partitioned tables", which differ
> >> from 1 which says "create a partitionned table with just one partition".
> >
> > Why would anyone want to use --partitions option in the first case
> > ("do not create partitioned tables")?
>
> Having an explicit value for the default is generally a good idea, eg for
> a script to tests various partitioning settings:
>
>    for nparts in 0 1 2 3 4 5 6 7 8 9 ; do
>      pgbench -i --partitions=$nparts ... ;
>      ...
>    done
>
> Otherwise you would need significant kludging to add/remove the option.
> Allowing 0 does not harm anyone.
>
> Now if the consensus is to remove an explicit 0, it is simple enough to
> change it, but my opinion is that it is better to have it.
>

Fair enough, let us see if anyone else wants to weigh in.

> >>> I think we should print the information about partitions in
> >>> printResults.  It can help users while analyzing results.
> >
> > + res = PQexec(con,
> > + "select p.partstrat, count(*) "
> > + "from pg_catalog.pg_class as c "
> > + "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
> > + "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
> > + "where c.relname = 'pgbench_accounts' "
> > + "group by 1, c.oid");
> >
> > Can't we write this query with inner join instead of left join?  What
> > additional purpose you are trying to serve by using left join?
>
> I'm ensuring that there is always a one line answer, whether it is
> partitioned or not. Maybe the count(*) should be count(something in p) to
> get 0 instead of 1 on non partitioned tables, though, but this is hidden
> in the display anyway.
>

Sure, but I feel the code will be simplified.  I see no reason for
using left join here.

> > +partition_method_t partition_method = PART_NONE;
> >
> > It is better to make this static.
>
> I do agree, but this would depart from all other global variables around
> which are currently not static.
>

Check QueryMode.

> Maybe a separate patch could turn them all
> as static, but ISTM that this patch should not change the current style.
>

No need to change others, but we can do it for this one.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Alvaro Herrera-9
On 2019-Sep-13, Amit Kapila wrote:

> On Fri, Sep 13, 2019 at 1:50 PM Fabien COELHO <[hidden email]> wrote:
> > >>> Is there a reason why we treat "partitions = 0" as a valid value?
> > >>
> > >> Yes. It is an explicit "do not create partitioned tables", which differ
> > >> from 1 which says "create a partitionned table with just one partition".
> > >
> > > Why would anyone want to use --partitions option in the first case
> > > ("do not create partitioned tables")?
> >
> > Having an explicit value for the default is generally a good idea, eg for
> > a script to tests various partitioning settings:
> >
> >    for nparts in 0 1 2 3 4 5 6 7 8 9 ; do
> >      pgbench -i --partitions=$nparts ... ;
> >      ...
> >    done
> >
> > Otherwise you would need significant kludging to add/remove the option.
> > Allowing 0 does not harm anyone.
> >
> > Now if the consensus is to remove an explicit 0, it is simple enough to
> > change it, but my opinion is that it is better to have it.
>
> Fair enough, let us see if anyone else wants to weigh in.

It seems convenient UI -- I vote to keep it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pgbench - allow to create partitioned tables

Alvaro Herrera-9
In reply to this post by akapila
On 2019-Sep-13, Amit Kapila wrote:

> I would like to take inputs from others as well for the display part
> of this patch.  After this patch, for a simple-update pgbench test,
> the changed output will be as follows (note: partition method and
> partitions):

> pgbench.exe -c 4 -j 4 -T 10 -N postgres
> starting vacuum...end.
> transaction type: <builtin: simple update>
> scaling factor: 1
> partition method: hash
> partitions: 3
> query mode: simple
> number of clients: 4
> number of threads: 4
> duration: 10 s
> number of transactions actually processed: 14563
> latency average = 2.749 ms
> tps = 1454.899150 (including connections establishing)
> tps = 1466.689412 (excluding connections establishing)
>
> What do others think about this?  This will be the case when the user
> has used --partitions option in pgbench, otherwise, it won't change.

I wonder what's the intended usage of this output ... it seems to be
getting a bit too long.  Is this intended for machine processing?  I
would rather have more things per line in a more compact header.
But then I'm not the kind of person who automates multiple pgbench runs.
Maybe we can get some input from Tomas, who does -- how do you automate
extracting data from collected pgbench output, or do you instead just
redirect the output to a file whose path/name indicates the parameters
that were used?  (I do the latter.)

I mean, if we changed it like this (and I'm not proposing to do it in
this patch, this is only an example), would it bother anyone?

$ pgbench -x -y -z ...
starting vacuum...end.
scaling factor: 1      partition method: hash   partitions: 1
transaction type: <builtin: simple update>      query mode: simple
number of clients: 4   number of threads: 4     duration: 10s
number of transactions actually processed: 14563
latency average = 2.749 ms
tps = 1454.899150 (including connections establishing)
tps = 1466.689412 (excluding connections establishing)


If this output doesn't bother people, then I suggest that this patch
should put the partition information in the line together with scaling
factor.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


12