BUG #16817: kill process cause postmaster hang


The following bug has been logged on the website:

Bug reference:      16817
Logged by:          Bo Chen
Email address:      [hidden email]
PostgreSQL version: 11.8
Operating system:   euleros v2r7 x86_64
Description:        

Hi hackers

    Recently we encountered a problem: after killing the walwriter, we expected
the database to recover normally, but it did not (the postmaster hangs in the
'wait dead end' state, and the archiver doesn't exit).
    After analyzing this problem, we found it may be a long-standing bug: the
archiver uses 'system' to call the configured archive command. For
'system', the Linux Programmer's Manual states: 'During
execution of the command, SIGCHLD will be blocked, and SIGINT and SIGQUIT
will be ignored'.

    So, when a child crashes, we send SIGQUIT to the archiver only once.
If the archiver is executing 'system' at that moment, SIGQUIT is ignored, and
the postmaster then hangs in the 'wait dead end' state.
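
    The behaviour described above can be reproduced outside PostgreSQL with a
small standalone sketch. This is not the archiver's code: the function name
child_survives_sigquit_in_system, the handler on_sigquit, and the
'sleep 2' command are all made up for illustration, and the one-second delay
only assumes the child has reached 'system' by then.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: if SIGQUIT is ever delivered, exit with status 2. */
static void on_sigquit(int signo)
{
    (void) signo;
    _exit(2);
}

/*
 * Returns 1 if a child that is inside system() survives a single SIGQUIT,
 * 0 if the signal terminated it, and -1 on setup failure.
 */
int child_survives_sigquit_in_system(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        /* Child: install a SIGQUIT handler, then block in system(). */
        signal(SIGQUIT, on_sigquit);
        int rc = system("sleep 2");   /* stand-in for an archive command */
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: give the child time to enter system(), then signal it once. */
    sleep(1);
    kill(pid, SIGQUIT);               /* sent to the child only */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* Exit status 0 means the handler never ran: SIGQUIT was discarded. */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

    Calling child_survives_sigquit_in_system() from a main() should return 1
on a system whose 'system' follows the manual text quoted above, because the
signal arrives while SIGQUIT is set to be ignored and is therefore dropped
rather than left pending.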

    For this problem, we have added a SIGUSR2 to the archiver after the
SIGQUIT in HandleChildCrash. Is there any other solution?

   regards, ChenBo


Re: BUG #16817: kill process cause postmaster hang

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
>     Recently we encountered a problem: after killing the walwriter, we expected
> the database to recover normally, but it did not (the postmaster hangs in the
> 'wait dead end' state, and the archiver doesn't exit).
>     After analyzing this problem, we found it may be a long-standing bug: the
> archiver uses 'system' to call the configured archive command. For
> 'system', the Linux Programmer's Manual states: 'During
> execution of the command, SIGCHLD will be blocked, and SIGINT and SIGQUIT
> will be ignored'.

>     So, when a child crashes, we send SIGQUIT to the archiver only once.
> If the archiver is executing 'system' at that moment, SIGQUIT is ignored, and
> the postmaster then hangs in the 'wait dead end' state.

Not sure I believe this: why wouldn't the SIGKILL-after-5-seconds logic
get us out of that situation?

                        regards, tom lane
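
For readers unfamiliar with the pattern the reply refers to, here is a minimal
sketch of "SIGQUIT first, SIGKILL after a grace period". It is not the
postmaster's implementation: the function name quit_then_kill_demo and the
grace_ms parameter are hypothetical, and the child simply ignores SIGQUIT to
stand in for a process that does not react to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Forks a child that ignores SIGQUIT, sends it SIGQUIT, waits up to
 * grace_ms milliseconds, then escalates to SIGKILL.
 * Returns the signal number that terminated the child, 0 if it exited
 * on its own, or -1 on setup failure.
 */
int quit_then_kill_demo(int grace_ms)
{
    struct sigaction ign = {0}, old;

    /* Ignore SIGQUIT before fork() so the child inherits SIG_IGN. */
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGQUIT, &ign, &old) < 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0)
    {
        for (;;)
            pause();                  /* child: never reacts to SIGQUIT */
    }

    sigaction(SIGQUIT, &old, NULL);   /* parent: restore its own handling */
    if (pid < 0)
        return -1;

    int status;
    kill(pid, SIGQUIT);               /* polite request, ignored here */

    /* Poll every 10 ms until the grace period has elapsed. */
    for (int waited = 0; waited < grace_ms; waited += 10)
    {
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFSIGNALED(status) ? WTERMSIG(status) : 0;

        struct timespec ts = {0, 10 * 1000 * 1000};
        nanosleep(&ts, NULL);
    }

    kill(pid, SIGKILL);               /* cannot be caught or ignored */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With a short grace period such as quit_then_kill_demo(200), the result should
be SIGKILL, since the child never acts on SIGQUIT. Whether the postmaster's
own escalation reaches the archiver is exactly the question raised in the
reply, and this sketch does not answer it.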