pgsql: Fix some issues with step generation in partition pruning.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

pgsql: Fix some issues with step generation in partition pruning.

Etsuro Fujita-3
Fix some issues with step generation in partition pruning.

In the case of range partitioning, get_steps_using_prefix() assumes that
the passed-in prefix list contains at least one clause for each of the
partition keys earlier than one specified in the passed-in
step_lastkeyno, but the caller (ie, gen_prune_steps_from_opexps())
didn't take it into account, which led to a server crash or incorrect
results when the list contained no clauses for such partition keys, as
reported in bug #16500 and #16501 from Kobayashi Hisanori.  Update the
caller to call that function only when the list created there contains
at least one clause for each of the earlier partition keys in the case
of range partitioning.

While at it, fix some other issues:

* The list to pass to get_steps_using_prefix() is allowed to contain
  multiple clauses for the same partition key, as described in the
  comment for that function, but that function actually assumed that the
  list contained just a single clause for each of middle partition keys,
  which led to an assertion failure when the list contained multiple
  clauses for such partition keys.  Update that function to match the
  comment.
* In the case of hash partitioning, partition keys are allowed to be
  NULL, in which case the list to pass to get_steps_using_prefix()
  contains no clauses for NULL partition keys, but that function treats
  that case as like the case of range partitioning, which led to the
  assertion failure.  Update the assertion test to take into account
  NULL partition keys in the case of hash partitioning.
* Fix a typo in a comment in get_steps_using_prefix_recurse().
* gen_partprune_steps() failed to detect self-contradiction from
  strict-qual clauses and an IS NULL clause for the same partition key
  in some cases, producing incorrect partition-pruning steps, which led
  to incorrect results of partition pruning, but didn't cause any
  user-visible problems fortunately, as the self-contradiction is
  detected later in the query planning.  Update that function to detect
  the self-contradiction.

Per bug #16500 and #16501 from Kobayashi Hisanori.  Patch by me, initial
diagnosis for the reported issue and review by Dmitry Dolgov.
Back-patch to v11, where partition pruning was introduced.

Discussion: https://postgr.es/m/16500-d1613f2a78e1e090%40postgresql.org
Discussion: https://postgr.es/m/16501-5234a9a0394f6754%40postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/13838740f61fc455aa4196d257efc0b761daba1f

Modified Files
--------------
src/backend/partitioning/partprune.c          | 187 ++++++++++++++++++--------
src/test/regress/expected/partition_prune.out |  92 +++++++++++++
src/test/regress/sql/partition_prune.sql      |  71 ++++++++++
3 files changed, 294 insertions(+), 56 deletions(-)

Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Thomas Munro-5
On Tue, Jul 28, 2020 at 2:01 PM Etsuro Fujita <[hidden email]> wrote:
> src/backend/partitioning/partprune.c          | 187 ++++++++++++++++++--------

Hi Fujita-san

I wonder if this build farm failure is related?

Program terminated with signal 11, Segmentation fault.
...
#0  0x59c15c in gen_partprune_steps_internal (context=0x40223d28, clauses=0x0)
    at partprune.c:1483
1483 for (i = 0; i < pc->keyno; i++)

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gaur&dt=2020-08-01%2004%3A12%3A08


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> Hi Fujita-san
> I wonder if this build farm failure is related?

Sure looks that way, doesn't it?  I'm just now working to reproduce
on gaur's host and poke into it with a debugger.  More news in an
hour or two (it's slow :-().

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Etsuro Fujita-2
On Sat, Aug 1, 2020 at 11:13 PM Tom Lane <[hidden email]> wrote:
> Thomas Munro <[hidden email]> writes:
> > Hi Fujita-san
> > I wonder if this build farm failure is related?

Thanks for letting me know!

> Sure looks that way, doesn't it?  I'm just now working to reproduce
> on gaur's host and poke into it with a debugger.  More news in an
> hour or two (it's slow :-().

Thanks for that!

I'm not sure this is related to that failure, but self-reviewing the
patch, one thing I noticed I missed is that get_steps_using_prefix()
assumes that clauses in the passed-in prefix list are sorted in an
ascending order of partkeyidx, but the caller (ie,
gen_prune_steps_from_opexps()) doesn't take that into account.  I
might have broken something.  I'm a bit tired today, so I'll look at
this more closely tomorrow.

Best regards,
Etsuro Fujita


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Tom Lane-2
Etsuro Fujita <[hidden email]> writes:
> On Sat, Aug 1, 2020 at 11:13 PM Tom Lane <[hidden email]> wrote:
>> Sure looks that way, doesn't it?  I'm just now working to reproduce
>> on gaur's host and poke into it with a debugger.  More news in an
>> hour or two (it's slow :-().

> Thanks for that!

I've concluded that this is probably a compiler bug.  It doesn't fail
at optimization level -O0 or -O2, only -O1; and trying to step through
gen_partprune_steps_internal() suggests that the part_scheme local
variable is changing value, which it surely should not.  Since gaur
is running an ancient gcc version, and -O1 is doubtless a pretty
under-tested optimization level, bugs there are not so surprising.

There wasn't any amazingly good reason to be using -O1 for gaur,
so I've switched the animal to use -O2.  I expect it'll go back
to green in a few hours.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Etsuro Fujita-2
On Sun, Aug 2, 2020 at 3:16 AM Tom Lane <[hidden email]> wrote:

> I've concluded that this is probably a compiler bug.  It doesn't fail
> at optimization level -O0 or -O2, only -O1; and trying to step through
> gen_partprune_steps_internal() suggests that the part_scheme local
> variable is changing value, which it surely should not.  Since gaur
> is running an ancient gcc version, and -O1 is doubtless a pretty
> under-tested optimization level, bugs there are not so surprising.
>
> There wasn't any amazingly good reason to be using -O1 for gaur,
> so I've switched the animal to use -O2.  I expect it'll go back
> to green in a few hours.

I checked that gaur got back to green.  Thanks!

Best regards,
Etsuro Fujita


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Robert Haas
In reply to this post by Tom Lane-2
On Sat, Aug 1, 2020 at 2:16 PM Tom Lane <[hidden email]> wrote:
> There wasn't any amazingly good reason to be using -O1 for gaur,
> so I've switched the animal to use -O2.  I expect it'll go back
> to green in a few hours.

While I like having HP-UX buildfarm coverage, the status page says
that gaur is running 10.20, which according to noted reliable source
Wikipedia, was released in 1996 and went out of support in 2003. So I
wonder whether testing that version is really a sensible thing to do
at this point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Fix some issues with step generation in partition pruning.

Tom Lane-2
Robert Haas <[hidden email]> writes:
> While I like having HP-UX buildfarm coverage, the status page says
> that gaur is running 10.20, which according to noted reliable source
> Wikipedia, was released in 1996 and went out of support in 2003. So I
> wonder whether testing that version is really a sensible thing to do
> at this point.

Certainly HPUX 10.20 isn't interesting as an OS anymore.  I think the
point of this animal is mostly to test on HPPA hardware, and the reason
HPPA is interesting for us is mostly that it has such strange spinlocks,
and that's not OS-dependent.

If I still had support coverage on the machine, I'd upgrade to v11,
but I don't.  (Wikipedia claims that HPUX is going EOL altogether
next year, so such a change wouldn't buy much for long anyhow.)

I experimented a year or so ago with installing OpenBSD, which seems
to be the last major open-source OS with HPPA support.  I did get it
running, but it takes about twice as long to do "check-world" as
HPUX on the same machine, which does not speak well to the amount of
effort OpenBSD has put into the port.  It already takes gaur near
six hours to do one buildfarm run; I don't want it to be twelve.
So I'm planning to keep running gaur as-is until (a) the hardware dies
or (b) we make a policy decision that breaks support for this
hardware.

pademelon (same machine, vendor compiler) is just there to keep
us honest on C89 support until v11 is EOL.  There are other ways
to test C89 compliance, perhaps, but as long as this one is handy
I don't feel like putting effort into figuring that out.  (Besides,
we have darn little coverage of not-gcc-derived compilers.)

                        regards, tom lane