PostgreSQL 12 crash bug report


yi huang
1. Run the Docker version of PostgreSQL 12:

$ docker run --rm -p 5432:5432 -v /tmp:/data -e POSTGRES_USER=exchange -e POSTGRES_DB=exchange -e POSTGRES_PASSWORD=123456 postgres:12-alpine

2. Download the attached files to /tmp.
3. Import the schema and data:

$ docker exec -it $CONTAINER psql -U exchange exchange -f /data/t_order_schema.sql
$ docker exec -it $CONTAINER psql -U exchange exchange -f /data/t_order_data.sql

4. Run pgbench with the provided test script:

$ pgbench -h 127.0.0.1 -p 5432 -U exchange exchange -T 2 -c 4 -f /tmp/test.sql

Attachments: t_order_schema.sql (25K), test.sql (520 bytes), t_order_data.sql (37K)

Re: PostgreSQL 12 crash bug report

Tom Lane
yi huang <[hidden email]> writes:
> [ crashes with ]
> $ pgbench -h 127.0.0.1 -p 5432 -U exchange exchange -T 2 -c 4 -f /tmp/test.sql

Thanks for the report!

Note that this doesn't fail if you just run test.sql by itself, but
it does soon fall over as described when running multiple copies in
parallel.  Unsurprisingly given that, the crash is inside EvalPlanQual:

(gdb) bt
#0  memcpy () at ../sysdeps/x86_64/memcpy.S:526
#1  0x00000000004863b1 in fill_val (tupleDesc=0x18c7060,
    values=<value optimized out>, isnull=0x195e198,
    data=<value optimized out>, data_size=148670068,
    infomask=<value optimized out>, bit=0x7f07aef4207f "\017")
    at heaptuple.c:247
#2  heap_fill_tuple (tupleDesc=0x18c7060, values=<value optimized out>,
    isnull=0x195e198, data=<value optimized out>, data_size=148670068,
    infomask=<value optimized out>, bit=0x7f07aef4207f "\017")
    at heaptuple.c:336
#3  0x0000000000486921 in heap_form_tuple (tupleDescriptor=0x18c7060,
    values=0x7ffc691d58b0, isnull=0x195e198) at heaptuple.c:1090
#4  0x00000000004ddf78 in toast_build_flattened_tuple (tupleDesc=0x18c7060,
    values=<value optimized out>, isnull=0x195e198) at tuptoaster.c:1336
#5  0x00000000006441f0 in ExecEvalWholeRowVar (state=<value optimized out>,
    op=0x1958030, econtext=<value optimized out>) at execExprInterp.c:4019
#6  0x0000000000647602 in ExecInterpExpr (state=0x1957ea0, econtext=0x19576a0,
    isnull=0x7ffc691d8dff) at execExprInterp.c:519
#7  0x00000000006543ef in ExecEvalExprSwitchContext (
    node=<value optimized out>, accessMtd=0x67a140 <ValuesNext>,
    recheckMtd=0x679f00 <ValuesRecheck>)
    at ../../../src/include/executor/executor.h:307
#8  ExecProject (node=<value optimized out>, accessMtd=0x67a140 <ValuesNext>,
    recheckMtd=0x679f00 <ValuesRecheck>)
    at ../../../src/include/executor/executor.h:341
#9  ExecScan (node=<value optimized out>, accessMtd=0x67a140 <ValuesNext>,
    recheckMtd=0x679f00 <ValuesRecheck>) at execScan.c:199
#10 0x00000000006698dc in ExecProcNode (node=0x19573d8)
    at ../../../src/include/executor/executor.h:239
#11 MultiExecPrivateHash (node=0x19573d8) at nodeHash.c:164
#12 MultiExecHash (node=0x19573d8) at nodeHash.c:114
#13 0x000000000066b685 in ExecHashJoinImpl (pstate=0x1939d08)
    at nodeHashjoin.c:291
#14 ExecHashJoin (pstate=0x1939d08) at nodeHashjoin.c:572
#15 0x000000000064af7f in ExecProcNode (epqstate=<value optimized out>)
    at ../../../src/include/executor/executor.h:239
#16 EvalPlanQualNext (epqstate=<value optimized out>) at execMain.c:2692
#17 0x000000000064b502 in EvalPlanQual (estate=<value optimized out>,
    epqstate=0x18b5420, relation=<value optimized out>, rti=4,
    inputslot=0x195dfc0) at execMain.c:2475
#18 0x0000000000675f09 in ExecUpdate (mtstate=0x18b5328,
    tupleid=0x7ffc691d9130, oldtuple=0x0, slot=<value optimized out>,
    planSlot=0x18ca138, epqstate=0x18b5420, estate=0x18b4558, canSetTag=true)
    at nodeModifyTable.c:1389
#19 0x000000000067668d in ExecModifyTable (pstate=0x18b5328)
    at nodeModifyTable.c:2226
#20 0x000000000064d547 in ExecProcNode (queryDesc=0x18b4148,
    direction=<value optimized out>, count=0, execute_once=40)
    at ../../../src/include/executor/executor.h:239
#21 ExecutePlan (queryDesc=0x18b4148, direction=<value optimized out>,
    count=0, execute_once=40) at execMain.c:1648
#22 standard_ExecutorRun (queryDesc=0x18b4148,
    direction=<value optimized out>, count=0, execute_once=40)
    at execMain.c:365
#23 0x00000000007c32ab in ProcessQuery (plan=<value optimized out>,
    sourceText=0x17b7f28 "with tmp(id, avail_amount, deal_value, update_id) as (values (76107497731002368,0.00,16546.90746,76107519298113537),(76107514491441155,0.606,5808.03795,76107519298113537))\nupdate t_order set avail_amo"...,
    params=0x0, queryEnv=0x0, dest=<value optimized out>,
    completionTag=0x7ffc691d9430 "") at pquery.c:161
#24 0x00000000007c3613 in PortalRunMulti (portal=0x181dc48, isTopLevel=true,
    setHoldSnapshot=false, dest=0x7f07b7f9fce0, altdest=0x7f07b7f9fce0,
    completionTag=0x7ffc691d9430 "") at pquery.c:1283
#25 0x00000000007c3d80 in PortalRun (portal=0x181dc48,
    count=9223372036854775807, isTopLevel=true, run_once=true,
    dest=0x7f07b7f9fce0, altdest=0x7f07b7f9fce0,
    completionTag=0x7ffc691d9430 "") at pquery.c:796
#26 0x00000000007c0001 in exec_simple_query (
...

As per the stack trace, we're trying to build a new tuple for the output
of a ValuesScan node, but the targetlist for that seems completely insane:
heap_form_tuple is being given a 15-column tupdesc that includes

(gdb) p tupleDescriptor->attrs[1]
$8 = {attrelid = 0, attname = {data = "side", '\000' <repeats 59 times>},
  atttypid = 35644, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0,
  attcacheoff = 8, atttypmod = -1, attbyval = true, attstorage = 112 'p',
  attalign = 105 'i', attnotnull = false, atthasdef = false,
  atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
  attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}

which seems to be the column that we have a bogus Datum for.  But what's
the code doing expecting that to be available from the ValuesScan?  It's a
column of t_order.  And the other data is all wrong too: that composite
type should surely not have attlen = 4 nor attbyval = true.
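
(For contrast - just an illustrative sketch, not code from the tree - a
composite column is a pass-by-reference varlena, so any sane attribute
entry for it has to satisfy roughly this:)

    #include "postgres.h"
    #include "access/tupdesc.h"

    /*
     * Illustrative only: a composite (row-type) column is stored as a
     * pass-by-reference varlena, so its pg_attribute entry must have
     * attlen = -1 and attbyval = false - not the attlen = 4 /
     * attbyval = true shown in the bogus tupdesc above.
     */
    static void
    sanity_check_composite_attr(TupleDesc tupdesc, AttrNumber attnum)
    {
        Form_pg_attribute att = TupleDescAttr(tupdesc, attnum - 1);

        Assert(att->attlen == -1);
        Assert(!att->attbyval);
    }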

"explain verbose" claims that the Values node should be returning

               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=184)
                     Output: "*VALUES*".column2, "*VALUES*".column3, "*VALUES*".column4, "*VALUES*".*, "*VALUES*".column1

so it seems like somehow we've got wrong info in the EPQ context.
I wonder if this is related to the "Hash join explain is broken"
thread.

                        regards, tom lane



Re: PostgreSQL 12 crash bug report

Tom Lane
I wrote:
> yi huang <[hidden email]> writes:
>> [ crashes with ]
>> $ pgbench -h 127.0.0.1 -p 5432 -U exchange exchange -T 2 -c 4 -f /tmp/test.sql

> As per the stack trace, we're trying to build a new tuple for the output
> of a ValuesScan node, but the targetlist for that seems completely insane:

After digging through this, it seems clear that it's the fault of
ad0bda5d2, specifically this code in EvalPlanQualSlot:

            *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
                                       epqstate->origslot->tts_tupleDescriptor,
                                       &TTSOpsVirtual);

It's setting up an es_epqTupleSlot[] entry on the assumption that it
should have the same tupdesc as the output tuple that's to be rechecked.
This might accidentally fail to malfunction in single-table cases,
but it's quite broken for any join case --- in particular, for the
given test case, the scan tuple for the VALUES node surely doesn't have
the same tupdesc as the join result.

I think we need to change the API of EvalPlanQualSlot, because it
cannot determine the correct tupdesc when it's not handed a Relation.
I'm not entirely sure that its immediate callers can either :-( so
this might propagate out a ways.  Or perhaps we should move the
slot-making someplace else.
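
(For context, the branch quoted above is the no-Relation fallback; when
EvalPlanQualSlot does get a Relation, the descriptor and slot type come
from the table AM and there's no guessing.  An abridged sketch of the two
branches:)

    /* with a Relation: descriptor and slot ops come from the table AM */
    if (relation)
        *slot = table_slot_create(relation,
                                  &epqstate->estate->es_tupleTable);
    else
        /* without one (ROW_MARK_COPY): fall back to the plan output's
         * descriptor - wrong for any scan tuple in a join, per above */
        *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
                                   epqstate->origslot->tts_tupleDescriptor,
                                   &TTSOpsVirtual);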

I'll go make an open item for this.

                        regards, tom lane



Re: PostgreSQL 12 crash bug report

Andres Freund
Hi,

On 2019-07-31 19:07:29 -0400, Tom Lane wrote:

> I wrote:
> > yi huang <[hidden email]> writes:
> >> [ crashes with ]
> >> $ pgbench -h 127.0.0.1 -p 5432 -U exchange exchange -T 2 -c 4 -f /tmp/test.sql
>
> > As per the stack trace, we're trying to build a new tuple for the output
> > of a ValuesScan node, but the targetlist for that seems completely insane:
>
> After digging through this, it seems clear that it's the fault of
> ad0bda5d2, specifically this code in EvalPlanQualSlot:
>
>             *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
>                                        epqstate->origslot->tts_tupleDescriptor,
>                                        &TTSOpsVirtual);
>
> It's setting up an es_epqTupleSlot[] entry on the assumption that it
> should have the same tupdesc as the output tuple that's to be rechecked.
> This might accidentally fail to malfunction in single-table cases,
> but it's quite broken for any join case --- in particular, for the
> given test case, the scan tuple for the VALUES node surely doesn't have
> the same tupdesc as the join result.

:(

We really need to improve the test coverage for EPQ :(. I tried to add a
few cases, but it's still one of the least tested complicated areas in
PG.

To make sure I understand - the problem isn't the slot that we've
created in nodeModifyTable.c via EvalPlanQualSlot(), right? It's the one
we create in EvalPlanQualFetchRowMarks(), because we don't have a proper
tuple type handy to create the slot with?  And then ExecScanFetch() (on
behalf of ValueScan in this case) expects something with the correct
format in that slot, which causes the backtrace you show. And this
problem occurs "only" for ROW_MARK_COPY, because for everything else we
have an associated relation.

Previously we simply didn't need to know the type during EPQ setup,
because we only stored a HeapTuple anyway. And we'd know the appropriate
tupledesc at the places that access the tuple.

Right?


> I think we need to change the API of EvalPlanQualSlot, because it
> cannot determine the correct tupdesc when it's not handed a Relation.
> I'm not entirely sure that its immediate callers can either :-( so
> this might propagate out a ways.  Or perhaps we should move the
> slot-making someplace else.

I think we'll probably have to split EvalPlanQualSlot() into two
functions: one for nodeModifyTable.c, nodeLockRows.c, and the
!ROW_MARK_COPY cases in EvalPlanQualFetchRowMarks(), and another for the
ROW_MARK_COPY case, which has no Relation to take a tupdesc from.
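
(Very roughly, and with invented names, that split could look like the
sketch below - the second function's tupdesc parameter being exactly the
thing we don't yet know how to obtain for ROW_MARK_COPY:)

    #include "postgres.h"
    #include "access/tableam.h"
    #include "executor/executor.h"

    /* for callers that have the Relation in hand */
    TupleTableSlot *
    EvalPlanQualSlotForRel(EPQState *epqstate, Relation relation, Index rti)
    {
        TupleTableSlot **slot = &epqstate->estate->es_epqTupleSlot[rti - 1];

        if (*slot == NULL)
        {
            MemoryContext oldcontext =
                MemoryContextSwitchTo(epqstate->estate->es_query_cxt);

            /* descriptor and slot type come from the table AM */
            *slot = table_slot_create(relation,
                                      &epqstate->estate->es_tupleTable);
            MemoryContextSwitchTo(oldcontext);
        }
        return *slot;
    }

    /* for ROW_MARK_COPY, where the caller must supply a descriptor */
    TupleTableSlot *
    EvalPlanQualSlotForCopy(EPQState *epqstate, TupleDesc scandesc, Index rti)
    {
        TupleTableSlot **slot = &epqstate->estate->es_epqTupleSlot[rti - 1];

        if (*slot == NULL)
        {
            MemoryContext oldcontext =
                MemoryContextSwitchTo(epqstate->estate->es_query_cxt);

            *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
                                       scandesc, &TTSOpsHeapTuple);
            MemoryContextSwitchTo(oldcontext);
        }
        return *slot;
    }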


One bigger change - but possibly one worth it - would be to basically
move the work done in EvalPlanQualFetchRowMarks() to be done on-demand,
at least for ROW_MARK_COPY. We don't need to do

    /* fetch the whole-row Var for the relation */
    datum = ExecGetJunkAttribute(epqstate->origslot,
                                 aerm->wholeAttNo,
                                 &isNull);
    /* non-locked rels could be on the inside of outer joins */
    if (isNull)
        continue;

that early in the query.  We could call EvalPlanQualFetchRowMark() (note
the singular) from ExecScanFetch(); that ought to currently be the only
place that accesses ROW_MARK_COPY type rowmarks?
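
(A hypothetical sketch of that - the singular fetch function and the
estate->es_epqstate back-link are both invented; today a scan node has no
way to reach the EPQState, which is part of the problem:)

    #include "postgres.h"
    #include "executor/executor.h"

    /* called from ExecScanFetch() for a ROW_MARK_COPY rti */
    static TupleTableSlot *
    FetchCopyRowMarkOnDemand(EState *estate, Index scanrelid)
    {
        TupleTableSlot *slot = estate->es_epqTupleSlot[scanrelid - 1];

        if (TupIsNull(slot))
        {
            /* invented singular variant of EvalPlanQualFetchRowMarks() */
            EvalPlanQualFetchRowMark(estate->es_epqstate, scanrelid);
            slot = estate->es_epqTupleSlot[scanrelid - 1];
        }
        return slot;
    }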

We could do the same for the other types of rowmarks, but I think that'd
be semantically somewhat murkier.

That would have the advantage that we'd not unnecessarily put row
marks we might never need into slots.  There are probably not that many
cases where that matters, though.


Or we could keep enough information for ROW_MARK_COPY RowMarks around to
create the necessary tupledesc (there are a number of variants as to how -
don't zap the information in add_rte_to_flat_rtable, have setrefs.c
somehow maintain rti-indexed pointers to Plan nodes, ...).

Not sure yet what's best here.

Greetings,

Andres Freund



Re: PostgreSQL 12 crash bug report

Tom Lane
Andres Freund <[hidden email]> writes:
> On 2019-07-31 19:07:29 -0400, Tom Lane wrote:
>> It's setting up an es_epqTupleSlot[] entry on the assumption that it
>> should have the same tupdesc as the output tuple that's to be rechecked.
>> This might accidentally fail to malfunction in single-table cases,
>> but it's quite broken for any join case --- in particular, for the
>> given test case, the scan tuple for the VALUES node surely doesn't have
>> the same tupdesc as the join result.

> To make sure I understand - the problem isn't the slot that we've
> created in nodeModifyTable.c via EvalPlanQualSlot(), right? It's the one
> we create in EvalPlanQualFetchRowMarks(), because we don't have a proper
> tuple type handy to create the slot with?

Yeah, I think nodeModifyTable.c is fine, because it always passes the
target relation.  EvalPlanQualFetchRowMarks is not fine, and I'm unsure
about the call in nodeLockRows.c.

> Previously we simply didn't need to know the type during EPQ setup,
> because we only stored a HeapTuple anyway. And we'd know the appropriate
> tupledesc at the places that access the tuple.

Right.  So we gotta refactor that somehow.

> One bigger change - but possibly one worth it - would be to basically
> move the work done in EvalPlanQualFetchRowMarks() to be done on-demand,
> at least for ROW_MARK_COPY.

Hm.  Too tired to think that through right now.

                        regards, tom lane



Re: PostgreSQL 12 crash bug report

Andres Freund
Hi,

On 2019-07-31 22:37:40 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2019-07-31 19:07:29 -0400, Tom Lane wrote:
> >> It's setting up an es_epqTupleSlot[] entry on the assumption that it
> >> should have the same tupdesc as the output tuple that's to be rechecked.
> >> This might accidentally fail to malfunction in single-table cases,
> >> but it's quite broken for any join case --- in particular, for the
> >> given test case, the scan tuple for the VALUES node surely doesn't have
> >> the same tupdesc as the join result.
>
> > To make sure I understand - the problem isn't the slot that we've
> > created in nodeModifyTable.c via EvalPlanQualSlot(), right? It's the one
> > we create in EvalPlanQualFetchRowMarks(), because we don't have a proper
> > tuple type handy to create the slot with?
>
> Yeah, I think nodeModifyTable.c is fine, because it always passes the
> target relation.  EvalPlanQualFetchRowMarks is not fine, and I'm unsure
> about the call in nodeLockRows.c.

I think nodeLockRows.c itself should be fine - the slots it gets with
EvalPlanQualSlot() are either used by FDWs to store tuples in, or
nodeLockRows.c directly fetches tuples into them with
table_tuple_lock().  It itself is only dealing with locking rowmarks, for
which we obviously have a relation.  It does trigger
EvalPlanQualFetchRowMarks() for the non-locking marks - that part is
obviously broken, but not in nodeLockRows.c itself.


> > Previously we simply didn't need to know the type during EPQ setup,
> > because we only stored a HeapTuple anyway. And we'd know the appropriate
> > tupledesc at the places that access the tuple.
>
> Right.  So we gotta refactor that somehow.

One way to avoid doing so would be to just not deal with slots in the
case of ROW_MARK_COPY.  The reason why EPQ was made to use slots
instead of HeapTuples is that not doing so would require expensive
conversions for AMs that don't use HeapTuple.  But for COPY rowmarks
that's not an issue, because the source already is a HeapTuple (in the
form of a Datum).

That's imo not a particularly pretty approach, but it might be the most
viable one for v12.  Attached is a quick WIP implementation, which
indeed fixes the crash reported here and passes all our tests.
Obviously it'd need some polish if we went for that.
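
(The gist, as a paraphrased sketch rather than the patch itself: for
ROW_MARK_COPY, keep the whole-row Datum as a HeapTuple and only store it
into a slot at the point of use, where the scan node's own descriptor is
known.  The function name here is invented:)

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "executor/executor.h"
    #include "fmgr.h"

    static void
    store_copied_rowmark(Datum wholerow, TupleTableSlot *scanslot)
    {
        HeapTupleHeader td = DatumGetHeapTupleHeader(wholerow);
        HeapTupleData tuple;

        /* build a HeapTuple around the composite Datum's header */
        tuple.t_len = HeapTupleHeaderGetDatumLength(td);
        ItemPointerSetInvalid(&tuple.t_self);
        tuple.t_tableOid = InvalidOid;
        tuple.t_data = td;

        /* scanslot comes from the scan node, so its descriptor matches */
        ExecForceStoreHeapTuple(&tuple, scanslot, false);
    }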



I think it's pretty darn ugly that we don't know what kind of tuples
we're dealing with at the top-level of EPQ. But changing that seems far
from trivial.

The way things are set up, we don't have a good way to map from rtis to
the associated PlanStates, and currently only the PlanStates have enough
information to infer the slot type.  In fact, if we knew the ScanState
nodes for specific rtis, we could perhaps even just use each node's
ScanState->ss_ScanTupleSlot - seems fragile, but we've effectively done
so before v12.

And we can't easily add a way to map from rti to PlanState without adding
work to ExecInit* for every Scan node.  If ExecInitNode() knew whether
the relevant nodes were Scan nodes, we could maintain the mapping at
that stage, but afaict we don't know that.

I briefly looked at hacking something like ExecInitScanTupleSlot() to
maintain an EState rti->ScanState mapping, but we have ScanState
executor nodes for planner nodes that do *not* inherit from Scan
(e.g. Material). While that's ugly, it's not something we seem likely to
change all that soon - and using ExecInitScanTupleSlot() as a proxy is
just a hack anyway.  It sure would be nice if IsA() actually worked when
inheriting from a node...

We could easily maintain a mapping from Plan->plan_node_id to the
corresponding PlanState from within ExecInitNode(). But I don't think
that'd help that much, as we'd need an rti -> plan_node_id mapping,
which'd again not be trivial to maintain. We could do so in
set_plan_refs(), as that knows both the relevant plan_node_id, and the
adjusted scanrelid.
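
(Sketched, with both EState fields invented: ExecInitNode() would file
each PlanState under its plan_node_id, set_plan_refs() would record the
plan_node_id of the node scanning each rti, and EPQ setup could then
resolve:)

    #include "postgres.h"
    #include "nodes/execnodes.h"

    static ScanState *
    scanstate_for_rti(EState *estate, Index rti)
    {
        int         plan_node_id = estate->es_scan_node_id_by_rti[rti - 1];
        PlanState  *ps = estate->es_planstate_by_id[plan_node_id];

        /* no IsA() check possible, since ScanState is only inherited from */
        return (ScanState *) ps;
    }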

I'm not exactly sure what the intended uses of plan_node_id are - it was
introduced in d1b7c1ffe72e86932b5395f29e006c3f503bc53d and is,
exclusively?, used as a unique key into dynamic shared memory for
parallel query (so each node can cheaply build a key for its own
data).  So perhaps it'd be a gross misuse of that system.  But it doesn't
look like that to me.

Not sure if there are other things that could benefit from the
rti -> ScanState or plan_node_id -> PlanState mapping.  It'd be nice for
debugging, I guess, but that doesn't quite seem enough.


> > One bigger change - but possibly one worth it - would be to basically
> > move the work done in EvalPlanQualFetchRowMarks() to be done on-demand,
> > at least for ROW_MARK_COPY.
>
> Hm.  Too tired to think that through right now.

I briefly tried to implement that. The problem is that, as things are
currently set up, we don't have access to the corresponding epqstate
from within executor nodes. That prevents us from accessing
EPQState->{origslot, arowMarks}, which we need to actually be able to
fetch the rows to return.

We could solve that by referencing the EPQState from its EState (but not
from the "main" estate).  I've wondered before whether that'd be better
than having various es_epq* fields in EState.  I don't think it'd make
the modularity any worse, given that we're already referencing EPQState
fields from within EState.  If we moved es_epqTupleSlot into EPQState and
allocated it from within EvalPlanQualInit(), instead of
EvalPlanQualBegin(), we'd address your complaint that we now call that
earlier too.

Doesn't sound that hard. Seems like a big change for v12 though.


As far as I can tell, the only behavioural change of fetching rows later,
instead of in EvalPlanQualFetchRowMarks(), would be that we'd fetch
(locally or via FDW) rows that are referenced, but not locked, at the
same time we lock rows, and that we would not fetch them at all if any of
the to-be-locked rows couldn't be locked.  Which seems like a mild
improvement.

Greetings,

Andres Freund

Attachment: v1-0001-WIP-fix-crash-ROW_MARK_COPY.patch (7K)