BUG #16095: Segfault while executing trigger


BUG #16095: Segfault while executing trigger

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16095
Logged by:          Thomas Butz
Email address:      [hidden email]
PostgreSQL version: 12.0
Operating system:   Modified postgres:12 docker image (Debian Buster)
Description:        

Postgres package version: 12.0-2.pgdg100+1

I'm trying to set up Nominatim, and the import consistently fails
while updating a table because a postgres process terminates with a
segfault.
(discussion: https://github.com/openstreetmap/Nominatim/issues/1551)

So far I have been able to obtain a core dump and analyze it with gdb:

# gdb -q -c /tmp/core.postgres.1572952590
/usr/lib/postgresql/12/bin/postgres
Reading symbols from /usr/lib/postgresql/12/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/e3/1f64b70ce618d5c59b5485a32deeceb0264254.debug...done.
done.
[New LWP 581]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres:'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000560fcc7e0013 in GetMemoryChunkContext (pointer=0x0) at
./build/../src/include/utils/memutils.h:127
127     ./build/../src/include/utils/memutils.h: No such file or
directory.
(gdb) bt
#0  0x0000560fcc7e0013 in GetMemoryChunkContext (pointer=0x0) at
./build/../src/include/utils/memutils.h:127
#1  pfree (pointer=0x0) at ./build/../src/backend/utils/mmgr/mcxt.c:1033
#2  0x0000560fcc381a35 in heap_freetuple (htup=<optimized out>) at
./build/../src/backend/access/common/heaptuple.c:1340
#3  0x0000560fcc52fe69 in tts_buffer_heap_clear (slot=0x560fce8d8ff0) at
./build/../src/backend/executor/execTuples.c:652
#4  0x0000560fcc53022e in ExecClearTuple (slot=0x560fce8d8ff0) at
./build/../src/include/executor/tuptable.h:428
#5  ExecResetTupleTable (tupleTable=0x560fcdd30d28, shouldFree=false) at
./build/../src/backend/executor/execTuples.c:1165
#6  0x0000560fcc526171 in ExecEndPlan (estate=0x560fcdd2efd0,
planstate=<optimized out>) at
./build/../src/backend/executor/execMain.c:1560
#7  standard_ExecutorEnd (queryDesc=0x560fcdd17b50) at
./build/../src/backend/executor/execMain.c:496
#8  0x0000560fcc55b3f3 in _SPI_pquery (tcount=<optimized out>,
fire_triggers=<optimized out>, queryDesc=<optimized out>) at
./build/../src/backend/executor/spi.c:2533
#9  _SPI_execute_plan (plan=plan@entry=0x560fce495920,
paramLI=paramLI@entry=0x560fce4a6d50, snapshot=snapshot@entry=0x0,
crosscheck_snapshot=crosscheck_snapshot@entry=0x0,
read_only=read_only@entry=false,
    fire_triggers=fire_triggers@entry=true, tcount=<optimized out>) at
./build/../src/backend/executor/spi.c:2296
#10 0x0000560fcc55bb14 in SPI_execute_plan_with_paramlist
(plan=0x560fce495920, params=0x560fce4a6d50, read_only=<optimized out>,
tcount=0) at ./build/../src/backend/executor/spi.c:577
#11 0x00007efce42193b4 in exec_stmt_execsql
(estate=estate@entry=0x7ffff7444e00, stmt=stmt@entry=0x7efce41a4278) at
./build/../src/pl/plpgsql/src/pl_exec.c:4162
#12 0x00007efce421a0eb in exec_stmt (estate=estate@entry=0x7ffff7444e00,
stmt=0x7efce41a4278) at ./build/../src/pl/plpgsql/src/pl_exec.c:2033
#13 0x00007efce421d294 in exec_stmts (estate=0x7ffff7444e00,
stmts=<optimized out>) at ./build/../src/pl/plpgsql/src/pl_exec.c:1924
#14 0x00007efce421da98 in exec_for_query
(estate=estate@entry=0x7ffff7444e00, stmt=stmt@entry=0x7efce41a2a00,
portal=0x560fcd6cee30, prefetch_ok=prefetch_ok@entry=true) at
./build/../src/pl/plpgsql/src/pl_exec.c:6007
#15 0x00007efce421a756 in exec_stmt_fors (stmt=0x7efce41a2a00,
estate=0x7ffff7444e00) at ./build/../src/pl/plpgsql/src/pl_exec.c:2791
#16 exec_stmt (estate=estate@entry=0x7ffff7444e00, stmt=0x7efce41a2a00) at
./build/../src/pl/plpgsql/src/pl_exec.c:1997
#17 0x00007efce421d294 in exec_stmts (estate=0x7ffff7444e00,
stmts=<optimized out>) at ./build/../src/pl/plpgsql/src/pl_exec.c:1924
#18 0x00007efce421c020 in exec_stmt_if (stmt=0x7efce41a4a88,
estate=0x7ffff7444e00) at ./build/../src/pl/plpgsql/src/pl_exec.c:2474
#19 exec_stmt (estate=estate@entry=0x7ffff7444e00, stmt=0x7efce41a4a88) at
./build/../src/pl/plpgsql/src/pl_exec.c:1977
#20 0x00007efce421d294 in exec_stmts (estate=0x7ffff7444e00,
stmts=<optimized out>) at ./build/../src/pl/plpgsql/src/pl_exec.c:1924
#21 0x00007efce421c020 in exec_stmt_if (stmt=0x7efce41a8160,
estate=0x7ffff7444e00) at ./build/../src/pl/plpgsql/src/pl_exec.c:2474
#22 exec_stmt (estate=estate@entry=0x7ffff7444e00, stmt=0x7efce41a8160) at
./build/../src/pl/plpgsql/src/pl_exec.c:1977
#23 0x00007efce421d294 in exec_stmts (estate=0x7ffff7444e00,
stmts=<optimized out>) at ./build/../src/pl/plpgsql/src/pl_exec.c:1924
#24 0x00007efce421d65a in exec_stmt_block
(estate=estate@entry=0x7ffff7444e00, block=block@entry=0x7efce41b58a0) at
./build/../src/pl/plpgsql/src/pl_exec.c:1865
#25 0x00007efce421ae8b in exec_stmt (estate=estate@entry=0x7ffff7444e00,
stmt=0x7efce41b58a0) at ./build/../src/pl/plpgsql/src/pl_exec.c:1957
#26 0x00007efce421de36 in plpgsql_exec_trigger
(func=func@entry=0x560fcd691238, trigdata=0x7ffff7445240) at
./build/../src/pl/plpgsql/src/pl_exec.c:1005
#27 0x00007efce420fbc4 in plpgsql_call_handler (fcinfo=0x7ffff74450c0) at
./build/../src/pl/plpgsql/src/pl_handler.c:258
#28 0x0000560fcc4ff087 in ExecCallTriggerFunc
(trigdata=trigdata@entry=0x7ffff7445240, tgindx=tgindx@entry=2,
finfo=0x560fcdaf25d0, instr=0x0, per_tuple_context=0x560fcdfa4690) at
./build/../src/backend/commands/trigger.c:2432
#29 0x0000560fcc5060cc in ExecBRUpdateTriggers
(estate=estate@entry=0x560fcdaf1f00, epqstate=epqstate@entry=0x560fcdaf27a8,
relinfo=relinfo@entry=0x560fcdaf2170, tupleid=tupleid@entry=0x7ffff744540a,

    fdw_trigtuple=fdw_trigtuple@entry=0x0,
newslot=newslot@entry=0x560fcd833e10) at
./build/../src/backend/commands/trigger.c:3093
#30 0x0000560fcc54a57d in ExecUpdate (mtstate=mtstate@entry=0x560fcdaf26b0,
tupleid=0x7ffff744540a, oldtuple=0x0, slot=0x560fcd833e10,
planSlot=0x560fcd833bd0, epqstate=epqstate@entry=0x560fcdaf27a8,
estate=0x560fcdaf1f00,
    canSetTag=true) at
./build/../src/backend/executor/nodeModifyTable.c:1072
#31 0x0000560fcc54ba52 in ExecModifyTable (pstate=0x560fcdaf26b0) at
./build/../src/backend/executor/nodeModifyTable.c:2223
#32 0x0000560fcc524983 in ExecProcNode (node=0x560fcdaf26b0) at
./build/../src/include/executor/executor.h:239
#33 ExecutePlan (execute_once=<optimized out>, dest=0x560fcca2ec00
<donothingDR>, direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_UPDATE,
use_parallel_mode=<optimized out>,
    planstate=0x560fcdaf26b0, estate=0x560fcdaf1f00) at
./build/../src/backend/executor/execMain.c:1646
#34 standard_ExecutorRun (queryDesc=0x560fcd75a240, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
./build/../src/backend/executor/execMain.c:364
#35 0x0000560fcc68b437 in ProcessQuery (plan=<optimized out>,
sourceText=0x560fcd75a140 "update placex set indexed_status = 0 where
place_id = $1", params=0x560fcd75a1b0, queryEnv=0x0, dest=0x560fcca2ec00
<donothingDR>,
    completionTag=0x7ffff74459c0 "") at
./build/../src/backend/tcop/pquery.c:161
#36 0x0000560fcc68b66b in PortalRunMulti
(portal=portal@entry=0x560fcd6ced20, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=0x560fcca2ec00
<donothingDR>, dest@entry=0x560fcd65d270,
    altdest=0x560fcca2ec00 <donothingDR>, altdest@entry=0x560fcd65d270,
completionTag=completionTag@entry=0x7ffff74459c0 "") at
./build/../src/backend/tcop/pquery.c:1283
#37 0x0000560fcc68c25f in PortalRun (portal=portal@entry=0x560fcd6ced20,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x560fcd65d270,
    altdest=altdest@entry=0x560fcd65d270, completionTag=0x7ffff74459c0 "")
at ./build/../src/backend/tcop/pquery.c:796
#38 0x0000560fcc68975e in exec_execute_message
(max_rows=9223372036854775807, portal_name=0x560fcd65ce60 "") at
./build/../src/backend/tcop/postgres.c:2090
#39 PostgresMain (argc=<optimized out>, argv=argv@entry=0x560fcd687b38,
dbname=<optimized out>, username=<optimized out>) at
./build/../src/backend/tcop/postgres.c:4297
#40 0x0000560fcc60f579 in BackendRun (port=0x560fcd681d70,
port=0x560fcd681d70) at
./build/../src/backend/postmaster/postmaster.c:4431
#41 BackendStartup (port=0x560fcd681d70) at
./build/../src/backend/postmaster/postmaster.c:4122
#42 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1704
#43 0x0000560fcc61049f in PostmasterMain (argc=1, argv=0x560fcd652e60) at
./build/../src/backend/postmaster/postmaster.c:1377
#44 0x0000560fcc377ebd in main (argc=1, argv=0x560fcd652e60) at
./build/../src/backend/main/main.c:228


Re: BUG #16095: Segfault while executing trigger

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
> I'm trying to setup Nominatim and the import seems to consistently fail
> while updating a table because a postgres process terminates with a
> segfault.

Hmm ... can you reproduce this with a smaller test case, by any chance?
I'm not eager to figure out what Nominatim is, let alone install it.

It is evidently happening within a BEFORE UPDATE trigger, which is
interesting but not necessarily relevant.

> (gdb) bt
> #0  0x0000560fcc7e0013 in GetMemoryChunkContext (pointer=0x0) at
> ./build/../src/include/utils/memutils.h:127
> #1  pfree (pointer=0x0) at ./build/../src/backend/utils/mmgr/mcxt.c:1033
> #2  0x0000560fcc381a35 in heap_freetuple (htup=<optimized out>) at
> ./build/../src/backend/access/common/heaptuple.c:1340
> #3  0x0000560fcc52fe69 in tts_buffer_heap_clear (slot=0x560fce8d8ff0) at
> ./build/../src/backend/executor/execTuples.c:652
> #4  0x0000560fcc53022e in ExecClearTuple (slot=0x560fce8d8ff0) at
> ./build/../src/include/executor/tuptable.h:428
> #5  ExecResetTupleTable (tupleTable=0x560fcdd30d28, shouldFree=false) at
> ./build/../src/backend/executor/execTuples.c:1165
> #6  0x0000560fcc526171 in ExecEndPlan (estate=0x560fcdd2efd0,
> planstate=<optimized out>) at
> ./build/../src/backend/executor/execMain.c:1560

So pretty clearly, this slot has a null bslot->base.tuple pointer and
yet its TTS_FLAG_SHOULDFREE flag is set.

Wondering about how that could be, I notice that execTuples.c seems
to have a bad coding pattern of setting TTS_FLAG_SHOULDFREE *before*
the pointer is valid.  Eg, in tts_buffer_heap_materialize, a failure
in heap_form_tuple would leave the slot in an inconsistent state.

I'm not sure that that explains this report, because we typically
would not run ExecutorEnd on a plan tree that had failed, but I'm
still strongly inclined to run around and move those flag-setting
steps down a bit.  Andres, any objection?

I'm also noticing some places that seem pretty much like sloppy
copy and paste jobs, eg a couple lines further down yet, we have

                Assert(BufferIsValid(bslot->buffer));
                if (likely(BufferIsValid(bslot->buffer)))
                        ReleaseBuffer(bslot->buffer);
                bslot->buffer = InvalidBuffer;

What's the point of having both an assertion and a run-time test?

                        regards, tom lane



Re: BUG #16095: Segfault while executing trigger

Andres Freund
On 2019-11-05 12:38:32 -0500, Tom Lane wrote:

> PG Bug reporting form <[hidden email]> writes:
> > I'm trying to setup Nominatim and the import seems to consistently fail
> > while updating a table because a postgres process terminates with a
> > segfault.
>
> Hmm ... can you reproduce this with a smaller test case, by any chance?
> I'm not eager to figure out what Nominatim is, let alone install it.
>
> It is evidently happening within a BEFORE UPDATE trigger, which is
> interesting but not necessarily relevant.
>
> > (gdb) bt
> > #0  0x0000560fcc7e0013 in GetMemoryChunkContext (pointer=0x0) at
> > ./build/../src/include/utils/memutils.h:127
> > #1  pfree (pointer=0x0) at ./build/../src/backend/utils/mmgr/mcxt.c:1033
> > #2  0x0000560fcc381a35 in heap_freetuple (htup=<optimized out>) at
> > ./build/../src/backend/access/common/heaptuple.c:1340
> > #3  0x0000560fcc52fe69 in tts_buffer_heap_clear (slot=0x560fce8d8ff0) at
> > ./build/../src/backend/executor/execTuples.c:652
> > #4  0x0000560fcc53022e in ExecClearTuple (slot=0x560fce8d8ff0) at
> > ./build/../src/include/executor/tuptable.h:428
> > #5  ExecResetTupleTable (tupleTable=0x560fcdd30d28, shouldFree=false) at
> > ./build/../src/backend/executor/execTuples.c:1165
> > #6  0x0000560fcc526171 in ExecEndPlan (estate=0x560fcdd2efd0,
> > planstate=<optimized out>) at
> > ./build/../src/backend/executor/execMain.c:1560
>
> So pretty clearly, this slot has a null bslot->base.tuple pointer and
> yet its TTS_FLAG_SHOULDFREE flag is set.

Hm. It seems quite likely that this is yet another report of:

commit d986d4e87f61c68f52c68ebc274960dc664b7b4e
Author: Andres Freund <[hidden email]>
Date:   2019-10-04 11:59:34 -0700

    Fix crash caused by EPQ happening with a before update trigger present.

The copying of a slot into itself yielded precisely this symptom, a slot
with shouldfree set, but without a tuple associated.


> Wondering about how that could be, I notice that execTuples.c seems
> to have a bad coding pattern of setting TTS_FLAG_SHOULDFREE *before*
> the pointer is valid.  Eg, in tts_buffer_heap_materialize, a failure
> in heap_form_tuple would leave the slot in an inconsistent state.

> I'm not sure that that explains this report, because we typically
> would not run ExecutorEnd on a plan tree that had failed, but I'm
> still strongly inclined to run around and move those flag-setting
> steps down a bit.  Andres, any objection?

No objection here. I don't think it ought to cause an actual problem for
the reason you mention (we'd otherwise run into tons of problems with
slots trying to free tuples that have already been released via memory
contexts), but if the code otherwise looks OK, I see no reason not to
move the flag setting.


> I'm also noticing some places that seem pretty much like sloppy
> copy and paste jobs, eg a couple lines further down yet, we have
>
> Assert(BufferIsValid(bslot->buffer));
> if (likely(BufferIsValid(bslot->buffer)))
> ReleaseBuffer(bslot->buffer);
> bslot->buffer = InvalidBuffer;
>
> What's the point of having both an assertion and a run-time test?

Fair question. I think I might just have wanted to make sure we don't
leak a pin, but that's unconvincing, given that we actually have a
mechanism to do so reliably at eoxact.

Greetings,

Andres Freund



Re: BUG #16095: Segfault while executing trigger

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2019-11-05 12:38:32 -0500, Tom Lane wrote:
>> So pretty clearly, this slot has a null bslot->base.tuple pointer and
>> yet its TTS_FLAG_SHOULDFREE flag is set.

> Hm. It seems quite likely that this is yet another report of:
> commit d986d4e87f61c68f52c68ebc274960dc664b7b4e
>     Fix crash caused by EPQ happening with a before update trigger present.
> The copying of a slot into itself yielded precisely this symptom, a slot
> with shouldfree set, but without a tuple associated.

Thomas, can you try with that patch and see if it fixes the problem
for you?  We're quite close to 12.1 release, so if there's more to fix,
it'd be better to find out ASAP.

Note that this theory requires that the query we see being executed within
a BEFORE UPDATE trigger had itself fired a BEFORE UPDATE trigger, and that
there'd been concurrent commits causing EPQ to run.  That doesn't seem
exactly implausible, but it's data not evident in your report.


>> Wondering about how that could be, I notice that execTuples.c seems
>> to have a bad coding pattern of setting TTS_FLAG_SHOULDFREE *before*
>> the pointer is valid.  Eg, in tts_buffer_heap_materialize, a failure
>> in heap_form_tuple would leave the slot in an inconsistent state.
>> I'm not sure that that explains this report, because we typically
>> would not run ExecutorEnd on a plan tree that had failed, but I'm
>> still strongly inclined to run around and move those flag-setting
>> steps down a bit.  Andres, any objection?

> No objection here.

OK, I'll take a look at that tomorrow or so.  It's probably just
cosmetic, but it makes me itch anyway.

                        regards, tom lane



Re: BUG #16095: Segfault while executing trigger

Thomas Butz
> Thomas, can you try with that patch and see if it fixes the problem
> for you?  We're quite close to 12.1 release, so if there's more to fix,
> it'd be better to find out ASAP.
>
> Note that this theory requires that the query we see being executed within
> a BEFORE UPDATE trigger had itself fired a BEFORE UPDATE trigger, and that
> there'd been concurrent commits causing EPQ to run.  That doesn't seem
> exactly implausible, but it's data not evident in your report.

That seems to be the case: the software runs parallel UPDATE requests against a table called "placex", which has an active trigger[1]:


CREATE TRIGGER placex_before_update BEFORE UPDATE ON placex
    FOR EACH ROW EXECUTE PROCEDURE placex_update();


and the placex_update() function in turn issues UPDATE statements[2]:


UPDATE placex set linked_place_id = null, indexed_status = 2
    where linked_place_id = NEW.place_id;


[1] https://github.com/openstreetmap/Nominatim/blob/65daef70c1f9aa80e6d37fc12198309e49b883ca/sql/tables.sql#L198-L199
[2] https://github.com/openstreetmap/Nominatim/blob/05d7f9139247981cc7d9edc1edca9492e94b516b/sql/functions.sql#L1248-L1249


Are there any prebuilt deb packages with this patch?



Kind regards

Thomas Butz

Re: BUG #16095: Segfault while executing trigger

Tom Lane-2
Thomas Butz <[hidden email]> writes:
>> Thomas, can you try with that patch and see if it fixes the problem
>> for you? We're quite close to 12.1 release, so if there's more to fix,
>> it'd be better to find out ASAP.

> Are there any prebuilt deb packages with this patch?

Doubt it.  You'd have to take the deb source package, drop in a
replacement PG tarball, and rebuild.  If that's within your abilities,
you could grab a nightly snapshot tarball from

https://www.postgresql.org/ftp/snapshot/

Otherwise, it'll *probably* be fixed in next week's releases.
It'd just be annoying to miss the window if it's not fixed ...

                        regards, tom lane



Re: BUG #16095: Segfault while executing trigger

Thomas Butz
> Doubt it.  You'd have to take the deb source package, drop in a
> replacement PG tarball, and rebuild.  If that's within your abilities,
> you could grab a nightly snapshot tarball from
>
> https://www.postgresql.org/ftp/snapshot/
>
> Otherwise, it'll *probably* be fixed in next week's releases.
> It'd just be annoying to miss the window if it's not fixed ...
>
> regards, tom lane


I'll give it a shot ;)


Kind regards


Thomas Butz