BUG #15734: Walsender process crashing when executing SHOW ALL;

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15734: Walsender process crashing when executing SHOW ALL;

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15734
Logged by:          Alexander Kukushkin
Email address:      [hidden email]
PostgreSQL version: 11.2
Operating system:   Ubuntu 18.04.2 LTS
Description:        

According to the documentation [1], replication protocol supports SHOW
command, which is similar to the SQL command SHOW [2], so I tried to execute
"SHOW ALL;" and walsender crashed with the following stack-trace:
Program received signal SIGSEGV, Segmentation fault.
MemoryContextAlloc (context=0x0, size=104) at
./build/../src/backend/utils/mmgr/mcxt.c:783
783     ./build/../src/backend/utils/mmgr/mcxt.c: No such file or
directory.
(gdb) bt
#0  MemoryContextAlloc (context=0x0, size=104) at
./build/../src/backend/utils/mmgr/mcxt.c:783
#1  0x0000558b292beb97 in CopySnapshot (snapshot=0x558b29715060
<CatalogSnapshotData>) at ./build/../src/backend/utils/time/snapmgr.c:674
#2  0x0000558b292bf855 in RegisterSnapshotOnOwner (snapshot=0x558b29715060
<CatalogSnapshotData>, owner=0x558b2ae178c8) at
./build/../src/backend/utils/time/snapmgr.c:884
#3  0x0000558b292bf896 in RegisterSnapshot (snapshot=<optimized out>) at
./build/../src/backend/utils/time/snapmgr.c:868
#4  0x0000558b28ece60f in systable_beginscan
(heapRelation=heapRelation@entry=0x7f96bedb2cc8, indexId=<optimized out>,
indexOK=<optimized out>, snapshot=snapshot@entry=0x0, nkeys=nkeys@entry=1,
key=key@entry=0x7ffe14269ec0)
    at ./build/../src/backend/access/index/genam.c:356
#5  0x0000558b2926bfd3 in SearchCatCacheList (cache=0x558b2ae2f700,
nkeys=nkeys@entry=1, v1=16384, v2=v2@entry=0, v3=v3@entry=0) at
./build/../src/backend/utils/cache/catcache.c:1650
#6  0x0000558b2927e2ee in SearchSysCacheList (cacheId=cacheId@entry=8,
nkeys=nkeys@entry=1, key1=<optimized out>, key2=key2@entry=0,
key3=key3@entry=0) at ./build/../src/backend/utils/cache/syscache.c:1427
#7  0x0000558b2917ef50 in roles_is_member_of (roleid=roleid@entry=16384) at
./build/../src/backend/utils/adt/acl.c:4862
#8  0x0000558b29182717 in is_member_of_role (member=16384,
role=role@entry=3374) at ./build/../src/backend/utils/adt/acl.c:4946
#9  0x0000558b2929fe0a in ShowAllGUCConfig (dest=<optimized out>) at
./build/../src/backend/utils/misc/guc.c:8280
#10 GetPGVariable (name=<optimized out>, dest=<optimized out>) at
./build/../src/backend/utils/misc/guc.c:8184
#11 0x0000558b2911b7aa in exec_replication_command
(cmd_string=cmd_string@entry=0x558b2ade36b0 "SHOW ALL;") at
./build/../src/backend/replication/walsender.c:1555
#12 0x0000558b291696a6 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x558b2ae125d8, dbname=<optimized out>, username=<optimized
out>) at ./build/../src/backend/tcop/postgres.c:4178
#13 0x0000558b290f4a7d in BackendRun (port=0x558b2ae0c690) at
./build/../src/backend/postmaster/postmaster.c:4361
#14 BackendStartup (port=0x558b2ae0c690) at
./build/../src/backend/postmaster/postmaster.c:4033
#15 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1706
#16 0x0000558b290f5abf in PostmasterMain (argc=17, argv=0x558b2adde020) at
./build/../src/backend/postmaster/postmaster.c:1379
#17 0x0000558b28e824c2 in main (argc=17, argv=0x558b2adde020) at
./build/../src/backend/main/main.c:228

I don't think that such a crash is expected behavior. Either it should
return the table with all settings or report simply error out with message
'unrecognized configuration parameter "ALL"'.

[1] https://www.postgresql.org/docs/11/protocol-replication.html
[2] https://www.postgresql.org/docs/11/sql-show.html

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15734: Walsender process crashing when executing SHOW ALL;

Michael Paquier-2
On Thu, Apr 04, 2019 at 07:54:19AM +0000, PG Bug reporting form wrote:
> I don't think that such a crash is expected behavior. Either it should
> return the table with all settings or report simply error out with message
> 'unrecognized configuration parameter "ALL"'.

No, it's not correct to just disallow ALL as we have a similar check
in GetConfigOptionByName() for individual parameters.

Problem is origically from 25fff40, and got worse as of 0c8910a0.  A
superuser is able to get the list of parameters, but I can easily see
the failure when using a plain replication user.  A replication user
is one which can take a full base backup, so he can easily retrieve
the full list of parameters anyway.  What about bypassing the problem
and just allow WAL sender contexts to always have access to parameter
values?  I don't think that this is much a security issue if one
thinks about the access to BASE_BACKUP.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15734: Walsender process crashing when executing SHOW ALL;

Michael Paquier-2
On Mon, Apr 08, 2019 at 04:39:03PM +0900, Michael Paquier wrote:
> Problem is origically from 25fff40, and got worse as of 0c8910a0.  A
> superuser is able to get the list of parameters, but I can easily see
> the failure when using a plain replication user.  A replication user
> is one which can take a full base backup, so he can easily retrieve
> the full list of parameters anyway.  What about bypassing the problem
> and just allow WAL sender contexts to always have access to parameter
> values?  I don't think that this is much a security issue if one
> thinks about the access to BASE_BACKUP.

After pondering more about this one, allowing replication to have the
same rights as a superuser in this case does not feel completely right
either as this is just a shortcut to bypass the syscache lookups
happening through is_member_of_role().  So attached is a much better
and simple idea: let's just use a transaction context when issuing the
SHOW command so as it is possible to perform cache lookups correctly.
This way, even a replication role is not able to see some parameters
except if the role is a member of pg_read_all_settings, which is more
consistent.

This needs a backpatch down to v10.

Any objections to that?
--
Michael

wal-sender-show.patch (607 bytes) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15734: Walsender process crashing when executing SHOW ALL;

Alvaro Herrera-9
On 2019-Apr-10, Michael Paquier wrote:

> After pondering more about this one, allowing replication to have the
> same rights as a superuser in this case does not feel completely right
> either as this is just a shortcut to bypass the syscache lookups
> happening through is_member_of_role().  So attached is a much better
> and simple idea: let's just use a transaction context when issuing the
> SHOW command so as it is possible to perform cache lookups correctly.
> This way, even a replication role is not able to see some parameters
> except if the role is a member of pg_read_all_settings, which is more
> consistent.
>
> This needs a backpatch down to v10.

Thanks for tracking this down.

I think we should have a few tests issuing SHOW ALL in a replication
connection with various levels of privilege; it's annoying that this bug
took two years to find.  With that, special-purpose buildfarm members
would tell us if we've made some mistake in transaction handling or
whatever.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15734: Walsender process crashing when executing SHOW ALL;

Michael Paquier-2
On Wed, Apr 10, 2019 at 11:01:21AM -0400, Alvaro Herrera wrote:
> I think we should have a few tests issuing SHOW ALL in a replication
> connection with various levels of privilege; it's annoying that this bug
> took two years to find.  With that, special-purpose buildfarm members
> would tell us if we've made some mistake in transaction handling or
> whatever.

What do you think about the set attached?  I prefer that we use
psql(on_error_die => 1) so as in the case of a crash or an error the
test dies hard and fast. In consequence, I have granted
pg_read_all_settings to the replication user to ensure that no errors
happen, still the syscache lookup is done.  We could have negative
tests but I don't think that's worth the addition as we already have
tests for pg_read_all_settings in src/test/regress/.  If I remove the
patched patch from walsender.c, the test fails immediately.
--
Michael

wal-sender-show-v2.patch (3K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: BUG #15734: Walsender process crashing when executing SHOW ALL;

Michael Paquier-2
On Thu, Apr 11, 2019 at 12:07:12PM +0900, Michael Paquier wrote:
> What do you think about the set attached?  I prefer that we use
> psql(on_error_die => 1) so as in the case of a crash or an error the
> test dies hard and fast. In consequence, I have granted
> pg_read_all_settings to the replication user to ensure that no errors
> happen, still the syscache lookup is done.  We could have negative
> tests but I don't think that's worth the addition as we already have
> tests for pg_read_all_settings in src/test/regress/.  If I remove the
> patched patch from walsender.c, the test fails immediately.

I have done another round of review of this patch and committed it
down to v10.  When testing on Windows, I have noticed that these new
regression tests suffer the same problem with SSPI authentication as
the recent tweaks I have done for pg_rewind.  So I have backpatched
the extension for PostgresNode::init done in d9f543e9 to be able to
add extra options for the authentication configuration so as the new
tests are able to work.

Thanks Alexander for the report!
--
Michael

signature.asc (849 bytes) Download Attachment