BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16863
Logged by:          Alexander Lakhin
Email address:      [hidden email]
PostgreSQL version: 13.2
Operating system:   Ubuntu 20.04
Description:        

On executing the query:
CREATE TABLE test (t text);
SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));

I get an assertion failure with the following callstack:
Core was generated by `postgres: law regression [local] SELECT              
                        '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f2fec581859 in __GI_abort () at abort.c:79
#2  0x0000559da7963ff8 in ExceptionalCondition (
    conditionName=conditionName@entry=0x559da7ab4690 "rel->rows > 0 ||
IS_DUMMY_REL(rel)",
    errorType=errorType@entry=0x559da79bf028 "FailedAssertion",
fileName=fileName@entry=0x559da7ab43b1 "allpaths.c",
    lineNumber=lineNumber@entry=462) at assert.c:67
#3  0x0000559da7731c94 in set_rel_size (root=root@entry=0x559da8c23528,
rel=rel@entry=0x559da8c24058, rti=rti@entry=1,
    rte=rte@entry=0x559da8b43978) at allpaths.c:462
#4  0x0000559da7731cd7 in set_base_rel_sizes
(root=root@entry=0x559da8c23528) at allpaths.c:323
#5  0x0000559da7733ae8 in make_one_rel (root=root@entry=0x559da8c23528,
joinlist=joinlist@entry=0x559da8c24700)
    at allpaths.c:185
#6  0x0000559da775a68f in query_planner (root=root@entry=0x559da8c23528,
    qp_callback=qp_callback@entry=0x559da775e617 <standard_qp_callback>,
qp_extra=qp_extra@entry=0x7ffe0e874eb0)
    at planmain.c:269
#7  0x0000559da77617bd in grouping_planner
(root=root@entry=0x559da8c23528,
    inheritance_update=inheritance_update@entry=false,
tuple_fraction=<optimized out>, tuple_fraction@entry=0)
    at planner.c:2058
#8  0x0000559da7763d5a in subquery_planner (glob=glob@entry=0x559da8b43748,
parse=parse@entry=0x559da8b43860,
    parent_root=parent_root@entry=0x0,
hasRecursion=hasRecursion@entry=false,
tuple_fraction=tuple_fraction@entry=0)
    at planner.c:1015
#9  0x0000559da77643f0 in standard_planner (parse=0x559da8b43860,
query_string=<optimized out>, cursorOptions=256,
    boundParams=<optimized out>) at planner.c:405
#10 0x0000559da77649e4 in planner (parse=parse@entry=0x559da8b43860,
    query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));",
    cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at planner.c:275
#11 0x0000559da783e71c in pg_plan_query
(querytree=querytree@entry=0x559da8b43860,
    query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));",
    cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at postgres.c:875
#12 0x0000559da783e7d9 in pg_plan_queries (querytrees=0x559da8c234f0,
    query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));",
    cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at postgres.c:966
#13 0x0000559da783eca9 in exec_simple_query (
    query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));")
    at postgres.c:1158
#14 0x0000559da7840cb0 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x559da8b6d7c8, dbname=<optimized out>,
    username=<optimized out>) at postgres.c:4329
#15 0x0000559da77ac39f in BackendRun (port=port@entry=0x559da8b63ad0) at
postmaster.c:4526
#16 0x0000559da77af4fa in BackendStartup (port=port@entry=0x559da8b63ad0) at
postmaster.c:4210
#17 0x0000559da77af741 in ServerLoop () at postmaster.c:1739
#18 0x0000559da77b0c6a in PostmasterMain (argc=3, argv=<optimized out>) at
postmaster.c:1412
#19 0x0000559da76f9e16 in main (argc=3, argv=0x559da8b3c980) at main.c:210

Reply | Threaded
Open this post in threaded view
|

Re: BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix

Alexander Lakhin-2
12.02.2021 22:00, PG Bug reporting form wrote:
> SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
> ...
> #2  0x0000559da7963ff8 in ExceptionalCondition (
>     conditionName=conditionName@entry=0x559da7ab4690 "rel->rows > 0 ||
> IS_DUMMY_REL(rel)",
>     errorType=errorType@entry=0x559da79bf028 "FailedAssertion",
> fileName=fileName@entry=0x559da7ab43b1 "allpaths.c",
>     lineNumber=lineNumber@entry=462) at assert.c:67
With the attached debugging patch applied I see that rel->rows there is
NaN. At the end of the walk by the following calls:
set_plain_rel_size -> set_baserel_size_estimates ->
clauselist_selectivity -> clauselist_selectivity_simple ->
clause_selectivity -> restriction_selectivity -> icregexeqsel ->
patternsel -> patternsel_common -> pattern_fixed_prefix ->
regex_fixed_prefix -> regex_selectivity
I've found a division that produces NaN:
sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);

The complete output with the debugging code:
psql:500.sql:2: INFO:  sel:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,
FIXED_CHAR_SEL: 0.200000, fixed_prefix_len: 500, pow:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
psql:500.sql:2: INFO:  sel: NaN
psql:500.sql:2: INFO:  rel->rows: NaN; nrows: NaN
psql:500.sql:2: INFO:  rel->rows: NaN
psql:500.sql:2: server closed the connection unexpectedly

With the prefix length 400 the output is different:
psql:400.sql:2: INFO:  sel:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002582249878086965166223199366465225696632326517795503118666931124080740300000000000000000000000000000000000000000000000000,
FIXED_CHAR_SEL: 0.200000, fixed_prefix_len: 400, pow:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002582249878086966144821031402096464952544460966116815567078211594740854000000000000000000000000000000000000000000000000000
psql:400.sql:2: INFO:  sel: 1.000000
psql:400.sql:2: INFO:  rel->rows: 7.000000; nrows: 6.800000
psql:400.sql:2: INFO:  rel->rows: 7.000000

On the master with the length 500 there is no assertion failure but the
row count is strange:
psql:master500.sql:3: INFO:  sel:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,
FIXED_CHAR_SEL: 0.200000, fixed_prefix_len: 500, pow:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
psql:master500.sql:3: INFO:  sel: NaN
psql:master500.sql:3: INFO:  rel->rows:
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000;
nrows: NaN
psql:master500.sql:3: INFO:  rel->rows:
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000

By the way, if the following check in restriction_selectivity() is
intended to throw error on an invalid selectivity, shouldn't the
isnan(result) test be appended here?:
    if (result < 0.0 || result > 1.0)
        elog(ERROR, "invalid restriction selectivity: %f", result);

Best regards,
Alexander

debug_sel.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix

Tom Lane-2
Alexander Lakhin <[hidden email]> writes:
> I've found a division that produces NaN:
> sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);

Hmm.  I'm not getting a NaN AFAICT, but I am getting pretty darn weird
estimates.  I agree this needs some kind of clamp.

regression=# create table test (t text);
CREATE TABLE
regression=# explain SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
...
 Seq Scan on test  (cost=0.00..27.00 rows=10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 width=32)
...

BTW, the message you're answering hasn't shown up here, nor is it
in the mailing list archives.  Odd.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix

Alexander Lakhin-2
Hello Tom,
12.02.2021 23:37, Tom Lane wrote:

> Alexander Lakhin <[hidden email]> writes:
>> I've found a division that produces NaN:
>> sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);
> Hmm.  I'm not getting a NaN AFAICT, but I am getting pretty darn weird
> estimates.  I agree this needs some kind of clamp.
>
> regression=# create table test (t text);
> CREATE TABLE
> regression=# explain SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
> ...
>  Seq Scan on test  (cost=0.00..27.00 rows=10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 width=32)
The same number I've seen on the master branch. But on REL_13_STABLE I
get a NaN and then the assertion failure. (I've chosen the version 13.2
in the bug report, but it's really delayed somewhere.)
Though that division produced a NaN for me on both branches.

Best regards,
Alexander


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix

Tom Lane-2
Alexander Lakhin <[hidden email]> writes:
> 12.02.2021 23:37, Tom Lane wrote:
>> Alexander Lakhin <[hidden email]> writes:
>>> I've found a division that produces NaN:
>>> sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);

>> Hmm.  I'm not getting a NaN AFAICT, but I am getting pretty darn weird
>> estimates.  I agree this needs some kind of clamp.
>>
>> regression=# create table test (t text);
>> CREATE TABLE
>> regression=# explain SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
>> ...
>> Seq Scan on test  (cost=0.00..27.00 rows=10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 width=32)

> The same number I've seen on the master branch. But on REL_13_STABLE I
> get a NaN and then the assertion failure.

The difference in behavior is evidently explained by commit a90c950fc,
which has decided to insert a random number in place of a NaN estimate.

Well, it's not really random, it's supposed to be 1e100 ... but EXPLAIN
doesn't know there's only 16 or so significant digits there.  Maybe we
need to work a bit harder on making that print nicely.  And maybe we
should rethink the idea that it's okay to gloss over a NaN estimate
that way.  I certainly don't see a defensible reason for assuming that
NaN means "a large value".

                        regards, tom lane