pg_basebackup failure after setting default_table_access_method option

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

pg_basebackup failure after setting default_table_access_method option

vignesh C

Hi,

I noticed pg_basebackup failure when default_table_access_method option is set.

Test steps:

Step 1: Init database 
./initdb -D data

Step 2: Start Server
./postgres -D data &

Step 3: Set Guc option 
export PGOPTIONS='-c default_table_access_method=zheap'

Step 4: Peform backup
/pg_basebackup -D backup -p 5432 --no-sync
2019-06-05 20:35:04.088 IST [11601] FATAL:  cannot read pg_class without having selected a database
pg_basebackup: error: could not connect to server: FATAL:  cannot read pg_class without having selected a database

Reason why it is failing:
pg_basebackup does not use any database to connect to server as it backs up the whole data instance. 
As the option default_table_access_method is set.
It tries to validate this option, but while validating the option in ScanPgRelation function:
if (!OidIsValid(MyDatabaseId))
elog(FATAL, "cannot read pg_class without having selected a database");

Here as pg_basebackup uses no database the command fails.

Fix:
The patch has the fix for the above issue:

Let me know your opinion on this.

--
Regards,
vignesh

0001-pg-backup-guc-option-validation-failure.patch (1004 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Haribabu Kommi-2

On Thu, Jun 6, 2019 at 1:46 AM vignesh C <[hidden email]> wrote:

Hi,

I noticed pg_basebackup failure when default_table_access_method option is set.

Test steps:

Step 1: Init database 
./initdb -D data

Step 2: Start Server
./postgres -D data &

Step 3: Set Guc option 
export PGOPTIONS='-c default_table_access_method=zheap'

Step 4: Peform backup
/pg_basebackup -D backup -p 5432 --no-sync
2019-06-05 20:35:04.088 IST [11601] FATAL:  cannot read pg_class without having selected a database
pg_basebackup: error: could not connect to server: FATAL:  cannot read pg_class without having selected a database

Reason why it is failing:
pg_basebackup does not use any database to connect to server as it backs up the whole data instance. 
As the option default_table_access_method is set.
It tries to validate this option, but while validating the option in ScanPgRelation function:
if (!OidIsValid(MyDatabaseId))
elog(FATAL, "cannot read pg_class without having selected a database");

Here as pg_basebackup uses no database the command fails.

Thanks for the details steps to reproduce the bug, I am also able to reproduce the problem.

 
Fix:
The patch has the fix for the above issue:

Let me know your opinion on this.

Thanks for the patch and it fixes the problem.

Regards,
Haribabu Kommi
Fujitsu Australia
Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Michael Paquier-2
On Thu, Jun 06, 2019 at 11:19:48AM +1000, Haribabu Kommi wrote:
> Thanks for the details steps to reproduce the bug, I am also able to
> reproduce the problem.

This way is even more simple, no need for zheap to be around:
=# create access method heap2 TYPE table HANDLER heap_tableam_handler;
CREATE ACCESS METHOD
And then:
PGOPTIONS="-c default_table_access_method=heap2" psql "replication=1"
psql: error: could not connect to server: FATAL:  cannot read pg_class
without having selected a database

> Thanks for the patch and it fixes the problem.

I was wondering if we actually need at all a catalog lookup at this
stage, simplifying get_table_am_oid() on the way so as we always
throw an error (its missing_ok is here to allow a proper error in the
GUC context).  The table AM lookup happens only when creating a table,
so we could just get a failure when attempting to create a table with
this incorrect value.

Actually, when updating a value and reloading and/or restarting the
server, it is possible to easily get in a state where we have an
invalid table AM parameter stored in the GUC, which is what the
callback is here to avoid.  If you attempt to update the parameter
with ALTER SYSTEM, then the command complains.  So it seems to me that
the user experience is inconsistent.
--
Michael


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

vignesh C
In reply to this post by Haribabu Kommi-2
Thanks Hari for helping in verifying.

Regards,
Vignesh


On Thu, Jun 6, 2019 at 6:50 AM Haribabu Kommi <[hidden email]> wrote:

On Thu, Jun 6, 2019 at 1:46 AM vignesh C <[hidden email]> wrote:

Hi,

I noticed pg_basebackup failure when default_table_access_method option is set.

Test steps:

Step 1: Init database 
./initdb -D data

Step 2: Start Server
./postgres -D data &

Step 3: Set Guc option 
export PGOPTIONS='-c default_table_access_method=zheap'

Step 4: Peform backup
/pg_basebackup -D backup -p 5432 --no-sync
2019-06-05 20:35:04.088 IST [11601] FATAL:  cannot read pg_class without having selected a database
pg_basebackup: error: could not connect to server: FATAL:  cannot read pg_class without having selected a database

Reason why it is failing:
pg_basebackup does not use any database to connect to server as it backs up the whole data instance. 
As the option default_table_access_method is set.
It tries to validate this option, but while validating the option in ScanPgRelation function:
if (!OidIsValid(MyDatabaseId))
elog(FATAL, "cannot read pg_class without having selected a database");

Here as pg_basebackup uses no database the command fails.

Thanks for the details steps to reproduce the bug, I am also able to reproduce the problem.

 
Fix:
The patch has the fix for the above issue:

Let me know your opinion on this.

Thanks for the patch and it fixes the problem.

Regards,
Haribabu Kommi
Fujitsu Australia


--
Regards,
vignesh
                          Have a nice day
Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Dmitry Dolgov
In reply to this post by Michael Paquier-2
> On Thu, Jun 6, 2019 at 9:06 AM Michael Paquier <[hidden email]> wrote:
>
> I was wondering if we actually need at all a catalog lookup at this
> stage, simplifying get_table_am_oid() on the way so as we always
> throw an error (its missing_ok is here to allow a proper error in the
> GUC context).

Just for me to understand, do you suggest to not check
default_table_access_method existence in check_default_table_access_method? If
yes, then

> The table AM lookup happens only when creating a table, so we could just get
> a failure when attempting to create a table with this incorrect value.

is correct, but doesn't it leave the room for some problems in the future with
a wrong assumptions about correctness of default_table_access_method?

> Actually, when updating a value and reloading and/or restarting the
> server, it is possible to easily get in a state where we have an
> invalid table AM parameter stored in the GUC, which is what the
> callback is here to avoid.  If you attempt to update the parameter
> with ALTER SYSTEM, then the command complains.  So it seems to me that
> the user experience is inconsistent.

Right, as far as I see the there is the same for e.g. default_tablespace due to
IsTransactionState condition. In fact looks like one can see the very same
issue with this option too, so probably it also needs to have MyDatabaseId
check.


Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Andres Freund
In reply to this post by Michael Paquier-2
Hi,

On 2019-06-06 16:06:36 +0900, Michael Paquier wrote:

> On Thu, Jun 06, 2019 at 11:19:48AM +1000, Haribabu Kommi wrote:
> > Thanks for the details steps to reproduce the bug, I am also able to
> > reproduce the problem.
>
> This way is even more simple, no need for zheap to be around:
> =# create access method heap2 TYPE table HANDLER heap_tableam_handler;
> CREATE ACCESS METHOD
> And then:
> PGOPTIONS="-c default_table_access_method=heap2" psql "replication=1"
> psql: error: could not connect to server: FATAL:  cannot read pg_class
> without having selected a database

Yea, need to fix that.


> > Thanks for the patch and it fixes the problem.
>
> I was wondering if we actually need at all a catalog lookup at this
> stage, simplifying get_table_am_oid() on the way so as we always
> throw an error (its missing_ok is here to allow a proper error in the
> GUC context).  The table AM lookup happens only when creating a table,
> so we could just get a failure when attempting to create a table with
> this incorrect value.

I think that'd be a bad plan. We check other such GUCs,
e.g. default_tablespace where this behaviour has been copied from, even
if not bulletproof.


> Actually, when updating a value and reloading and/or restarting the
> server, it is possible to easily get in a state where we have an
> invalid table AM parameter stored in the GUC, which is what the
> callback is here to avoid.

We have plenty other callbacks that aren't bulletproof, so I don't think
this is really something we should / can change in isolation here.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Andres Freund
In reply to this post by Dmitry Dolgov
Hi,

On 2019-06-08 16:03:09 +0200, Dmitry Dolgov wrote:
> > On Thu, Jun 6, 2019 at 9:06 AM Michael Paquier <[hidden email]> wrote:
> > The table AM lookup happens only when creating a table, so we could just get
> > a failure when attempting to create a table with this incorrect value.
>
> is correct, but doesn't it leave the room for some problems in the future with
> a wrong assumptions about correctness of default_table_access_method?

What do you mean by that? Every single use of
default_table_access_method (and similarly default_tablespace) has to
check the value, because it could be outdated / not checked due to wrong
context.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Dmitry Dolgov
> On Sat, Jun 8, 2019 at 5:30 PM Andres Freund <[hidden email]> wrote:
>
> On 2019-06-08 16:03:09 +0200, Dmitry Dolgov wrote:
> > > On Thu, Jun 6, 2019 at 9:06 AM Michael Paquier <[hidden email]> wrote:
> > > The table AM lookup happens only when creating a table, so we could just get
> > > a failure when attempting to create a table with this incorrect value.
> >
> > is correct, but doesn't it leave the room for some problems in the future with
> > a wrong assumptions about correctness of default_table_access_method?
>
> What do you mean by that?

I didn't have any particular problem in mind, just an abstract and probably
wrong observation. One more observation is that this

> Every single use of default_table_access_method (and similarly
> default_tablespace) has to check the value, because it could be outdated /
> not checked due to wrong context.

for default_tablespace clearly expressed in GetDefaultTablespace function (if
you see something like that, obviously you better use it), but there is nothing
like similar for default_table_access_method so one have to keep it in mind
(although of course it's not a problem so far, since it's being used in only
one place).


Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Michael Paquier-2
In reply to this post by Andres Freund
On Sat, Jun 08, 2019 at 08:26:07AM -0700, Andres Freund wrote:
> We have plenty other callbacks that aren't bulletproof, so I don't think
> this is really something we should / can change in isolation here.

Good point.  I was looking at the check callbacks in guc.c for similar
changes, and missed these.  After testing, only these parameters fail
with the same error:
- default_table_access_method
- default_text_search_config

For the second one it's much older.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Andres Freund
Hi,

On 2019-06-10 16:37:33 +0900, Michael Paquier wrote:

> On Sat, Jun 08, 2019 at 08:26:07AM -0700, Andres Freund wrote:
> > We have plenty other callbacks that aren't bulletproof, so I don't think
> > this is really something we should / can change in isolation here.
>
> Good point.  I was looking at the check callbacks in guc.c for similar
> changes, and missed these.  After testing, only these parameters fail
> with the same error:
> - default_table_access_method
> - default_text_search_config
>
> For the second one it's much older.

Yea, that's where the default_table_access_method code originates from,
obviously. I'll backpatch the default_text_search_config fix separately
(and first).

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Michael Paquier-2
On Mon, Jun 10, 2019 at 10:33:37PM -0700, Andres Freund wrote:
> Yea, that's where the default_table_access_method code originates from,
> obviously. I'll backpatch the default_text_search_config fix separately
> (and first).

So you are just planning to add a check on MyDatabaseId for both?  No
objections to that.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Andres Freund
Hi,

On 2019-06-11 14:56:36 +0900, Michael Paquier wrote:
> On Mon, Jun 10, 2019 at 10:33:37PM -0700, Andres Freund wrote:
> > Yea, that's where the default_table_access_method code originates from,
> > obviously. I'll backpatch the default_text_search_config fix separately
> > (and first).
>
> So you are just planning to add a check on MyDatabaseId for both?  No
> objections to that.

Well, all four. Given it's just copied code I don't see much code in
splitting the commit anymore.

I noticed some other uglyness: check_timezone calls interval_in(),
without any checks. Not a huge fan of doing all that in postmaster, even
leaving the wrong error reporting aside :(.  But that seems like a
plenty different enough issue to fix it separately, if we decide we want
to do so.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pg_basebackup failure after setting default_table_access_method option

Michael Paquier-2
On Mon, Jun 10, 2019 at 11:49:03PM -0700, Andres Freund wrote:
> Well, all four. Given it's just copied code I don't see much code in
> splitting the commit anymore.

Thanks for pushing the fix, the result looks fine.

> I noticed some other uglyness: check_timezone calls interval_in(),
> without any checks. Not a huge fan of doing all that in postmaster, even
> leaving the wrong error reporting aside :(.  But that seems like a
> plenty different enough issue to fix it separately, if we decide we want
> to do so.

Indeed, I have not noticed this one :(
--
Michael

signature.asc (849 bytes) Download Attachment