BUG #16577: Segfault on altering a table located in a dropped tablespace

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #16577: Segfault on altering a table located in a dropped tablespace

PG Doc comments form
The following bug has been logged on the website:

Bug reference:      16577
Logged by:          Alexander Lakhin
Email address:      [hidden email]
PostgreSQL version: 13beta2
Operating system:   Ubuntu 20.04
Description:        

When executing the following query (a modified excerpt from the tablespace
regression test):
CREATE TABLESPACE regress_tblspace LOCATION '@testtablespace@';
CREATE TABLE test_default_tab_p(id bigint, val bigint)
    PARTITION BY LIST (id) TABLESPACE regress_tblspace;
CREATE INDEX test_index2 on test_default_tab_p (val) TABLESPACE
regress_tblspace;
DROP TABLESPACE regress_tblspace;
\d+ test_default_tab_p;
ALTER TABLE test_default_tab_p ALTER val TYPE bigint;

I get a segfault with the stacktrace:
Core was generated by `postgres: law regression [local] ALTER TABLE        
                        '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  quote_identifier (ident=0x0) at ruleutils.c:10754
10754           safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] ==
'_');
(gdb) bt
#0  quote_identifier (ident=0x0) at ruleutils.c:10754
#1  0x000055cd6c3798d3 in pg_get_indexdef_worker
(indexrelid=indexrelid@entry=16389, colno=colno@entry=0,
    excludeOps=excludeOps@entry=0x0, attrsOnly=attrsOnly@entry=false,
keysOnly=keysOnly@entry=false,
    showTblSpc=showTblSpc@entry=true, inherits=true, prettyFlags=0,
missing_ok=false) at ruleutils.c:1460
#2  0x000055cd6c379ab1 in pg_get_indexdef_string
(indexrelid=indexrelid@entry=16389) at ruleutils.c:1161
#3  0x000055cd6c0ca0c1 in RememberIndexForRebuilding (indoid=16389,
tab=tab@entry=0x55cd6c89df20) at tablecmds.c:11718
#4  0x000055cd6c0cd764 in ATExecAlterColumnType
(tab=tab@entry=0x55cd6c89df20, rel=rel@entry=0x7f0e3cef3570,
    cmd=<optimized out>, lockmode=lockmode@entry=8) at tablecmds.c:11280
#5  0x000055cd6c0dece5 in ATExecCmd (wqueue=wqueue@entry=0x7ffe0dcc7a30,
tab=tab@entry=0x55cd6c89df20,
    rel=rel@entry=0x7f0e3cef3570, cmd=<optimized out>,
lockmode=lockmode@entry=8, cur_pass=cur_pass@entry=1,
    context=0x7ffe0dcc7b40) at tablecmds.c:4523
#6  0x000055cd6c0df155 in ATRewriteCatalogs
(wqueue=wqueue@entry=0x7ffe0dcc7a30, lockmode=lockmode@entry=8,
    context=context@entry=0x7ffe0dcc7b40) at
../../../src/include/nodes/nodes.h:594
#7  0x000055cd6c0df3a0 in ATController
(parsetree=parsetree@entry=0x55cd6c712040, rel=rel@entry=0x7f0e3cef3570,
    cmds=0x55cd6c712008, recurse=true, lockmode=lockmode@entry=8,
context=context@entry=0x7ffe0dcc7b40)
    at tablecmds.c:3971
#8  0x000055cd6c0df42a in AlterTable (stmt=stmt@entry=0x55cd6c712040,
lockmode=lockmode@entry=8,
    context=context@entry=0x7ffe0dcc7b40) at tablecmds.c:3627
#9  0x000055cd6c2af6f8 in ProcessUtilitySlow
(pstate=pstate@entry=0x55cd6c89ddb0, pstmt=pstmt@entry=0x55cd6c7120e8,
    queryString=queryString@entry=0x55cd6c711350 "ALTER TABLE
test_default_tab_p ALTER val TYPE bigint;",
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0,
    dest=0x55cd6c712388, qc=0x7ffe0dcc8060) at utility.c:1269
#10 0x000055cd6c2af1f2 in standard_ProcessUtility (pstmt=0x55cd6c7120e8,
    queryString=0x55cd6c711350 "ALTER TABLE test_default_tab_p ALTER val
TYPE bigint;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55cd6c712388, qc=0x7ffe0dcc8060)
    at utility.c:1069
#11 0x000055cd6c2af2d1 in ProcessUtility (pstmt=pstmt@entry=0x55cd6c7120e8,
queryString=<optimized out>,
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>,
queryEnv=<optimized out>,
    dest=dest@entry=0x55cd6c712388, qc=0x7ffe0dcc8060) at utility.c:524
#12 0x000055cd6c2ab73c in PortalRunUtility
(portal=portal@entry=0x55cd6c7747f0, pstmt=pstmt@entry=0x55cd6c7120e8,
    isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55cd6c712388,

    qc=qc@entry=0x7ffe0dcc8060) at pquery.c:1157
#13 0x000055cd6c2ac270 in PortalRunMulti
(portal=portal@entry=0x55cd6c7747f0, isTopLevel=isTopLevel@entry=true,
    setHoldSnapshot=setHoldSnapshot@entry=false,
dest=dest@entry=0x55cd6c712388, altdest=altdest@entry=0x55cd6c712388,
    qc=qc@entry=0x7ffe0dcc8060) at pquery.c:1303
#14 0x000055cd6c2acf52 in PortalRun (portal=portal@entry=0x55cd6c7747f0,
count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
dest=dest@entry=0x55cd6c712388,
    altdest=altdest@entry=0x55cd6c712388, qc=0x7ffe0dcc8060) at
pquery.c:779
#15 0x000055cd6c2a93bc in exec_simple_query (
    query_string=query_string@entry=0x55cd6c711350 "ALTER TABLE
test_default_tab_p ALTER val TYPE bigint;")
    at postgres.c:1239
#16 0x000055cd6c2ab2d8 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x55cd6c73ca88, dbname=<optimized out>,
    username=<optimized out>) at postgres.c:4315
#17 0x000055cd6c216e67 in BackendRun (port=port@entry=0x55cd6c735380) at
postmaster.c:4523
#18 0x000055cd6c219fdd in BackendStartup (port=port@entry=0x55cd6c735380) at
postmaster.c:4215
#19 0x000055cd6c21a224 in ServerLoop () at postmaster.c:1727
#20 0x000055cd6c21b74d in PostmasterMain (argc=8, argv=<optimized out>) at
postmaster.c:1400
#21 0x000055cd6c164bed in main (argc=8, argv=0x55cd6c70b9e0) at main.c:210

Best regards,
Alexander

Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Michael Paquier-2
On Sun, Aug 09, 2020 at 11:00:01AM +0000, PG Bug reporting form wrote:

> When executing the following query (a modified excerpt from the tablespace
> regression test):
> CREATE TABLESPACE regress_tblspace LOCATION '@testtablespace@';
> CREATE TABLE test_default_tab_p(id bigint, val bigint)
>     PARTITION BY LIST (id) TABLESPACE regress_tblspace;
> CREATE INDEX test_index2 on test_default_tab_p (val) TABLESPACE
> regress_tblspace;
> DROP TABLESPACE regress_tblspace;
> \d+ test_default_tab_p;
> ALTER TABLE test_default_tab_p ALTER val TYPE bigint;
Thanks Alexander for the report.  Interesting case indeed.
For a normal table we would complain that the tablespace is not empty
when attempting to drop the tablespace.  But here we have only one
partitioned table still holding references to the tablespace.  Things
get even more spicy with stuff like that, once you try to play with
the orphaned tablespace reference:
CREATE TABLESPACE popo location '/tmp/popo';
CREATE TABLE parent_tab (a int) partition by list (a) tablespace popo;
DROP TABLESPACE popo;
CREATE TABLE child_tab partition of parent_tab for values in (1);
ERROR:  58P01: could not create directory
"pg_tblspc/24587/PG_12_201909212/16384": No such file or directory
LOCATION:  TablespaceCreateDbspace, tablespace.c:161

The issue with indexes is present since 11, but we have more as
reltablespace gets also set for partitioned tables since 12.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Tom Lane-2
Michael Paquier <[hidden email]> writes:
> Thanks Alexander for the report.  Interesting case indeed.
> For a normal table we would complain that the tablespace is not empty
> when attempting to drop the tablespace.  But here we have only one
> partitioned table still holding references to the tablespace.

Yeah, this seems like a mess.  DROP TABLESPACE supposes that it can
drop the tablespace if there are no physical files in it, and it's
really hard to see how it could test any more carefully given that
it cannot see what is in pg_class of other databases.

Offhand it seems like we could either

1. Start creating an empty physical file for each partitioned table
or index.

2. Forget the idea that a partitioned table/index has an associated
tablespace.

Neither of these are terribly attractive.  But I notice that we
already backed off the idea that this is a thing to some extent:

regression=# CREATE TABLE test_default_tab_p(id bigint, val bigint)
    PARTITION BY LIST (id) TABLESPACE pg_default;
ERROR:  cannot specify default tablespace for partitioned relations

I'm a bit inclined to think that this "feature" is sufficiently
broken that we should just drop it.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Alvaro Herrera-9
On 2020-Aug-09, Tom Lane wrote:

> Michael Paquier <[hidden email]> writes:
> > Thanks Alexander for the report.  Interesting case indeed.
> > For a normal table we would complain that the tablespace is not empty
> > when attempting to drop the tablespace.  But here we have only one
> > partitioned table still holding references to the tablespace.

Ah, so it turns out that the physical files were necessary after all.
Maybe the solution to this problem is indeed to have them.  It means
partly reverting this commit:

commit 807ae415c54628ade937cb209f0fc9913e6b0cf5
Author:     Alvaro Herrera <[hidden email]>
AuthorDate: Fri Jan 4 14:51:17 2019 -0300
CommitDate: Fri Jan 4 14:51:17 2019 -0300

    Don't create relfilenode for relations without storage
   
    Some relation kinds had relfilenode set to some non-zero value, but
    apparently the actual files did not really exist because creation was
    prevented elsewhere.  Get rid of the phony pg_class.relfilenode values.
   
    Catversion bumped, but only because the sanity_test check will fail if
    run in a system initdb'd with the previous version.
   
    Reviewed-by: Kyotaro HORIGUCHI, Michael Paquier
    Discussion: https://postgr.es/m/20181206215552.fm2ypuxq6nhpwjuc@...

But also fixing whatever *other* code was preventing creating of the
filenodes.

As for the crash at hand, it seems it can be solved easily by making
ruleutils avoid trying to dereference a null pointer, as in the attached
patch.  This seems necessary for branches 12 and 13, which have the
above commit, but not for 10/11 (which do have relfilenodes) nor master,
where we can revert it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

part-idx-tblspc.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On 2020-Aug-09, Tom Lane wrote:
>> Michael Paquier <[hidden email]> writes:
>>> For a normal table we would complain that the tablespace is not empty
>>> when attempting to drop the tablespace.  But here we have only one
>>> partitioned table still holding references to the tablespace.

> Ah, so it turns out that the physical files were necessary after all.
> Maybe the solution to this problem is indeed to have them.  It means
> partly reverting this commit:

I think actually the hardest part will be figuring out what is the
conversion path, e.g. will pg_upgrade have to know about this.

One point that strikes me is that that will put us in a place where
"does this relation have storage" is not a binary choice.  The possible
answers will be "yes", "no", or "has an empty stub file".  If we try
to take shortcuts rather than handling that honestly, we'll be in for
more bugs.

> As for the crash at hand, it seems it can be solved easily by making
> ruleutils avoid trying to dereference a null pointer, as in the attached
> patch.

Meh.  I have a feeling that that's just the tip of the iceberg of
things that will go wrong in this scenario.  I'm not sure how much
band-aid code we want to expend on the case --- after all, tablespace
create/drop is a superuser-only activity, so people aren't doing it
carelessly (one hopes).

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Michael Paquier-2
On Mon, Aug 10, 2020 at 02:59:26PM -0400, Tom Lane wrote:
> I think actually the hardest part will be figuring out what is the
> conversion path, e.g. will pg_upgrade have to know about this.
>
> One point that strikes me is that that will put us in a place where
> "does this relation have storage" is not a binary choice.  The possible
> answers will be "yes", "no", or "has an empty stub file".  If we try
> to take shortcuts rather than handling that honestly, we'll be in for
> more bugs.

Hmm.  Creating a file for partitioned table would be a completely new
thing as well.  heap_create() has never created a file for partitioned
tables since 10 so this could open to a new class of bugs.

>> As for the crash at hand, it seems it can be solved easily by making
>> ruleutils avoid trying to dereference a null pointer, as in the attached
>> patch.
>
> Meh.  I have a feeling that that's just the tip of the iceberg of
> things that will go wrong in this scenario.  I'm not sure how much
> band-aid code we want to expend on the case --- after all, tablespace
> create/drop is a superuser-only activity, so people aren't doing it
> carelessly (one hopes).

+1.  I would suspect that it is not the only code path that would run
into issues.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Michael Paquier-2
On Tue, Aug 11, 2020 at 04:04:07PM +0900, Michael Paquier wrote:
> Hmm.  Creating a file for partitioned table would be a completely new
> thing as well.  heap_create() has never created a file for partitioned
> tables since 10 so this could open to a new class of bugs.

This thread has stalled for a couple of weeks now, and I would tend to
take the path where we'd basically revert 8725958 and ca41030.  That's
too late for v13 to do anything about that.  But not for 14.  Any
opinions?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Alvaro Herrera-9
On 2020-Sep-08, Michael Paquier wrote:

> On Tue, Aug 11, 2020 at 04:04:07PM +0900, Michael Paquier wrote:
> > Hmm.  Creating a file for partitioned table would be a completely new
> > thing as well.  heap_create() has never created a file for partitioned
> > tables since 10 so this could open to a new class of bugs.
>
> This thread has stalled for a couple of weeks now, and I would tend to
> take the path where we'd basically revert 8725958 and ca41030.  That's
> too late for v13 to do anything about that.  But not for 14.  Any
> opinions?

Well, naturally I oppose this idea.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Tomas Vondra-4
On Tue, Sep 08, 2020 at 06:37:29PM -0300, Alvaro Herrera wrote:

>On 2020-Sep-08, Michael Paquier wrote:
>
>> On Tue, Aug 11, 2020 at 04:04:07PM +0900, Michael Paquier wrote:
>> > Hmm.  Creating a file for partitioned table would be a completely new
>> > thing as well.  heap_create() has never created a file for partitioned
>> > tables since 10 so this could open to a new class of bugs.
>>
>> This thread has stalled for a couple of weeks now, and I would tend to
>> take the path where we'd basically revert 8725958 and ca41030.  That's
>> too late for v13 to do anything about that.  But not for 14.  Any
>> opinions?
>
>Well, naturally I oppose this idea.
>

Would it actually solve the issue? ISTM we'd still have to expect cases
with partitioned tables without storage, so presumably we'd have to do
something else ...


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16577: Segfault on altering a table located in a dropped tablespace

Michael Paquier-2
On Wed, Sep 09, 2020 at 01:55:53AM +0200, Tomas Vondra wrote:
> Would it actually solve the issue? ISTM we'd still have to expect cases
> with partitioned tables without storage, so presumably we'd have to do
> something else ...

I am not sure what you mean here.  If we don't keep anymore references
to tablespace OIDs in pg_class for partitioned tables, meaning that we
don't leave anything dangling if the tablespace is dropped without any
files in its location, how could that be a problem?
--
Michael

signature.asc (849 bytes) Download Attachment