dropdb --force

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
135 messages Options
1234567
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

vignesh C
On Thu, Oct 3, 2019 at 11:18 PM vignesh C <[hidden email]> wrote:

>
> On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule <[hidden email]> wrote:
> >
> > Thank you for careful review. I hope so all issues are out.
> >
> >
> Thanks Pavel for fixing the comments.
> Few comments:
> The else part cannot be hit in DropDatabase function as gram.y expects FORCE.
> +
> + if (strcmp(opt->defname, "force") == 0)
> + force = true;
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
> + parser_errposition(pstate, opt->location)));
> + }
> +
>
> We should change gram.y to accept any keyword and throw error from
> DropDatabase function.
> + */
> +drop_option: FORCE
> + {
> + $$ = makeDefElem("force", NULL, @1);
> + }
> + ;
>
> "This will also fail" should be "This will fail"
> +     <para>
> +      This will fail, if current user has no permissions to terminate other
> +      connections. Required permissions are the same as with
> +      <literal>pg_terminate_backend</literal>, described
> +      in <xref linkend="functions-admin-signal"/>.
> +
> +      This will also fail if we are not able to terminate connections or
> +      when there are active prepared transactions or active logical replication
> +      slots.
> +     </para>
>
> Can we add few tests for this feature.
>
Couple of minor comments:
DBNAME can be included
- * DROP DATABASE [ IF EXISTS ]
+ * DROP DATABASE [ IF EXISTS ] [ [ WITH ] ( options ) ]
can be
- * DROP DATABASE [ IF EXISTS ]
+ * DROP DATABASE [ IF EXISTS ] DBNAME [ [ WITH ] ( options ) ]

Should we include dbname in the below also?
+DROP DATABASE [ IF EXISTS ] <replaceable
class="parameter">name</replaceable> [ [ WITH ] ( <replaceable
class="parameter">option</replaceable> [, ...] ) ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can
be one of:</phrase>

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule
In reply to this post by vignesh C


čt 3. 10. 2019 v 19:48 odesílatel vignesh C <[hidden email]> napsal:
On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule <[hidden email]> wrote:
>
> Thank you for careful review. I hope so all issues are out.
>
>
Thanks Pavel for fixing the comments.
Few comments:
The else part cannot be hit in DropDatabase function as gram.y expects FORCE.
+
+ if (strcmp(opt->defname, "force") == 0)
+ force = true;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
+ parser_errposition(pstate, opt->location)));
+ }
+

I know - but somebody can call DropDatabase function outside parser. So is better check all possibilities.


We should change gram.y to accept any keyword and throw error from
DropDatabase function.
+ */
+drop_option: FORCE
+ {
+ $$ = makeDefElem("force", NULL, @1);
+ }
+ ;

I spent some time with thinking about it, and I think so this variant (with keyword) is well readable and very illustrative. This will be lost with generic variant.

When the keyword FORCE already exists, then I prefer current state.
 

"This will also fail" should be "This will fail"
+     <para>
+      This will fail, if current user has no permissions to terminate other
+      connections. Required permissions are the same as with
+      <literal>pg_terminate_backend</literal>, described
+      in <xref linkend="functions-admin-signal"/>.
+
+      This will also fail if we are not able to terminate connections or
+      when there are active prepared transactions or active logical replication
+      slots.
+     </para>

fixed
 

Can we add few tests for this feature.

there are not any other test for DROP DATABASE

We can check syntax later inside second patch (for -f option of dropdb command)

Regards

Pavel

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

drop-database-force-20191004.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

vignesh C
On Fri, Oct 4, 2019 at 9:54 PM Pavel Stehule <[hidden email]> wrote:

>
>
>
> čt 3. 10. 2019 v 19:48 odesílatel vignesh C <[hidden email]> napsal:
>>
>> On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule <[hidden email]> wrote:
>> >
>> > Thank you for careful review. I hope so all issues are out.
>> >
>> >
>> Thanks Pavel for fixing the comments.
>> Few comments:
>> The else part cannot be hit in DropDatabase function as gram.y expects FORCE.
>> +
>> + if (strcmp(opt->defname, "force") == 0)
>> + force = true;
>> + else
>> + ereport(ERROR,
>> + (errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
>> + parser_errposition(pstate, opt->location)));
>> + }
>> +
>
>
> I know - but somebody can call DropDatabase function outside parser. So is better check all possibilities.
>
>>
>> We should change gram.y to accept any keyword and throw error from
>> DropDatabase function.
>> + */
>> +drop_option: FORCE
>> + {
>> + $$ = makeDefElem("force", NULL, @1);
>> + }
>> + ;
>
>
> I spent some time with thinking about it, and I think so this variant (with keyword) is well readable and very illustrative. This will be lost with generic variant.
>
> When the keyword FORCE already exists, then I prefer current state.
>
>>
>>
>> "This will also fail" should be "This will fail"
>> +     <para>
>> +      This will fail, if current user has no permissions to terminate other
>> +      connections. Required permissions are the same as with
>> +      <literal>pg_terminate_backend</literal>, described
>> +      in <xref linkend="functions-admin-signal"/>.
>> +
>> +      This will also fail if we are not able to terminate connections or
>> +      when there are active prepared transactions or active logical replication
>> +      slots.
>> +     </para>
>
>
> fixed
>
>>
>>
>> Can we add few tests for this feature.
>
>
> there are not any other test for DROP DATABASE
>
> We can check syntax later inside second patch (for -f option of dropdb command)
>
Changes in this patch looks fine to me.
I'm not sure if we have forgotten to miss attaching the second patch
or can you provide the link to second patch.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


ne 6. 10. 2019 v 10:19 odesílatel vignesh C <[hidden email]> napsal:
On Fri, Oct 4, 2019 at 9:54 PM Pavel Stehule <[hidden email]> wrote:
>
>
>
> čt 3. 10. 2019 v 19:48 odesílatel vignesh C <[hidden email]> napsal:
>>
>> On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule <[hidden email]> wrote:
>> >
>> > Thank you for careful review. I hope so all issues are out.
>> >
>> >
>> Thanks Pavel for fixing the comments.
>> Few comments:
>> The else part cannot be hit in DropDatabase function as gram.y expects FORCE.
>> +
>> + if (strcmp(opt->defname, "force") == 0)
>> + force = true;
>> + else
>> + ereport(ERROR,
>> + (errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
>> + parser_errposition(pstate, opt->location)));
>> + }
>> +
>
>
> I know - but somebody can call DropDatabase function outside parser. So is better check all possibilities.
>
>>
>> We should change gram.y to accept any keyword and throw error from
>> DropDatabase function.
>> + */
>> +drop_option: FORCE
>> + {
>> + $$ = makeDefElem("force", NULL, @1);
>> + }
>> + ;
>
>
> I spent some time with thinking about it, and I think so this variant (with keyword) is well readable and very illustrative. This will be lost with generic variant.
>
> When the keyword FORCE already exists, then I prefer current state.
>
>>
>>
>> "This will also fail" should be "This will fail"
>> +     <para>
>> +      This will fail, if current user has no permissions to terminate other
>> +      connections. Required permissions are the same as with
>> +      <literal>pg_terminate_backend</literal>, described
>> +      in <xref linkend="functions-admin-signal"/>.
>> +
>> +      This will also fail if we are not able to terminate connections or
>> +      when there are active prepared transactions or active logical replication
>> +      slots.
>> +     </para>
>
>
> fixed
>
>>
>>
>> Can we add few tests for this feature.
>
>
> there are not any other test for DROP DATABASE
>
> We can check syntax later inside second patch (for -f option of dropdb command)
>
Changes in this patch looks fine to me.
I'm not sure if we have forgotten to miss attaching the second patch
or can you provide the link to second patch.

I plan to work on the second patch after committing of this first. Now we are early on commit fest, and the complexity of this or second patch is not too high be necessary to prepare patch series.

Regards

Pavel


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
In reply to this post by Pavel Stehule
On Fri, Oct 4, 2019 at 9:54 PM Pavel Stehule <[hidden email]> wrote:
>
>>
>> Can we add few tests for this feature.
>
> there are not any other test for DROP DATABASE
>

I think there are no dedicated tests for it but in a few tests, we use
it like in contrib\sepgsql\sql\alter.sql.  I am not sure if we can
write a predictable test for force option because it will never be
guaranteed to drop the database in the presence of other active
sessions.

Few more comments:
---------------------------------
1.
+ if (nprepared > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("database \"%s\" is used by %d prepared transaction(s)",
+ get_database_name(databaseId),
+ nprepared)));
+

You need to use errdetail_plural here to avoid translation problems,
see docs[1] for a detailed explanation.  You can use function
errdetail_busy_db.  Also, the indentation is not proper.

2.
TerminateOtherDBBackends()
{
..
+ /* We know so we have all necessary rights now */
+ foreach (lc, pids)
+ {
+ int pid = lfirst_int(lc);
+ PGPROC    *proc = BackendPidGetProc(pid);
+
+ if (proc != NULL)
+ {
+ /* If we have setsid(), signal the backend's whole process group */
+#ifdef HAVE_SETSID
+ (void) kill(-pid, SIGTERM);
+#else
+ (void) kill(pid, SIGTERM);
+#endif
+ }
+ }
+
+ /* sleep 100ms */
+ pg_usleep(100 * 1000L);
..
}

So here we are sending SIGTERM to all the processes and wait for 100ms
to allow them to exit.  Have you tested this with many processes?  I
think we can create 100~500 sessions or maybe more to the database
being dropped and see what is behavior?  One thing to notice is that
in function CountOtherDBBackends() we are sending SIGTERM to 10
autovacuum processes at-a-time.  That has been introduced in commit
4abd7b49f1e9, but the reason for the same is not mentioned.  I am not
sure if it is to avoid sending SIGTERM to many processes in quick
succession.

I think there should be more comments atop TerminateOtherDBBackends to
explain the working of it and some assumptions of that function.

3.
+opt_drop_option_list:
+ opt_with '(' drop_option_list ')' { $$ = $3; }
+ | /* EMPTY */

I think it is better to keep opt_with as part of the main syntax
rather than clubbing it with drop_option_list as we have in other
cases in the code.

4.
+drop_option: FORCE
+ {
+ $$ = makeDefElem("force", NULL, @1);
+ }
+ ;

We generally keep the option name "FORCE" in the new line.

5.  I think it is better if can support tab-completion for this feature.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


so 19. 10. 2019 v 12:37 odesílatel Amit Kapila <[hidden email]> napsal:
On Fri, Oct 4, 2019 at 9:54 PM Pavel Stehule <[hidden email]> wrote:
>
>>
>> Can we add few tests for this feature.
>
> there are not any other test for DROP DATABASE
>

I think there are no dedicated tests for it but in a few tests, we use
it like in contrib\sepgsql\sql\alter.sql.  I am not sure if we can
write a predictable test for force option because it will never be
guaranteed to drop the database in the presence of other active
sessions.

done - I push tests to /tests/regress/psql.sql


Few more comments:
---------------------------------
1.
+ if (nprepared > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg("database \"%s\" is used by %d prepared transaction(s)",
+ get_database_name(databaseId),
+ nprepared)));
+

You need to use errdetail_plural here to avoid translation problems,
see docs[1] for a detailed explanation.  You can use function
errdetail_busy_db.  Also, the indentation is not proper.

fixed


2.
TerminateOtherDBBackends()
{
..
+ /* We know so we have all necessary rights now */
+ foreach (lc, pids)
+ {
+ int pid = lfirst_int(lc);
+ PGPROC    *proc = BackendPidGetProc(pid);
+
+ if (proc != NULL)
+ {
+ /* If we have setsid(), signal the backend's whole process group */
+#ifdef HAVE_SETSID
+ (void) kill(-pid, SIGTERM);
+#else
+ (void) kill(pid, SIGTERM);
+#endif
+ }
+ }
+
+ /* sleep 100ms */
+ pg_usleep(100 * 1000L);
..
}

So here we are sending SIGTERM to all the processes and wait for 100ms
to allow them to exit.  Have you tested this with many processes?  I
think we can create 100~500 sessions or maybe more to the database
being dropped and see what is behavior?  One thing to notice is that
in function CountOtherDBBackends() we are sending SIGTERM to 10
autovacuum processes at-a-time.  That has been introduced in commit
4abd7b49f1e9, but the reason for the same is not mentioned.  I am not
sure if it is to avoid sending SIGTERM to many processes in quick
succession.

I tested it on linux Linux nemesis

5.3.6-300.fc31.x86_64 #1 SMP Mon Oct 14 12:26:42 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Tested with 1800 connections without any problem (under low load (only pg_sleep was called).


I think there should be more comments atop TerminateOtherDBBackends to
explain the working of it and some assumptions of that function.

done


3.
+opt_drop_option_list:
+ opt_with '(' drop_option_list ')' { $$ = $3; }
+ | /* EMPTY */

I think it is better to keep opt_with as part of the main syntax
rather than clubbing it with drop_option_list as we have in other
cases in the code.

done


4.
+drop_option: FORCE
+ {
+ $$ = makeDefElem("force", NULL, @1);
+ }
+ ;

We generally keep the option name "FORCE" in the new line.

done


5.  I think it is better if can support tab-completion for this feature.

done

I am sending fresh patch

Regards

Pavel


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

drop-database-force-20191019.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
On Sun, Oct 20, 2019 at 2:06 AM Pavel Stehule <[hidden email]> wrote:

>
> so 19. 10. 2019 v 12:37 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> On Fri, Oct 4, 2019 at 9:54 PM Pavel Stehule <[hidden email]> wrote:
>> >
>> >>
>> >> Can we add few tests for this feature.
>> >
>> > there are not any other test for DROP DATABASE
>> >
>>
>> I think there are no dedicated tests for it but in a few tests, we use
>> it like in contrib\sepgsql\sql\alter.sql.  I am not sure if we can
>> write a predictable test for force option because it will never be
>> guaranteed to drop the database in the presence of other active
>> sessions.
>
>
> done - I push tests to /tests/regress/psql.sql
>
>>
>> Few more comments:
>> ---------------------------------
>>
>> 2.
>> TerminateOtherDBBackends()
>> {
>> ..
>> + /* We know so we have all necessary rights now */
>> + foreach (lc, pids)
>> + {
>> + int pid = lfirst_int(lc);
>> + PGPROC    *proc = BackendPidGetProc(pid);
>> +
>> + if (proc != NULL)
>> + {
>> + /* If we have setsid(), signal the backend's whole process group */
>> +#ifdef HAVE_SETSID
>> + (void) kill(-pid, SIGTERM);
>> +#else
>> + (void) kill(pid, SIGTERM);
>> +#endif
>> + }
>> + }
>> +
>> + /* sleep 100ms */
>> + pg_usleep(100 * 1000L);
>> ..
>> }
>>
>> So here we are sending SIGTERM to all the processes and wait for 100ms
>> to allow them to exit.  Have you tested this with many processes?  I
>> think we can create 100~500 sessions or maybe more to the database
>> being dropped and see what is behavior?  One thing to notice is that
>> in function CountOtherDBBackends() we are sending SIGTERM to 10
>> autovacuum processes at-a-time.  That has been introduced in commit
>> 4abd7b49f1e9, but the reason for the same is not mentioned.  I am not
>> sure if it is to avoid sending SIGTERM to many processes in quick
>> succession.
>
>
> I tested it on linux Linux nemesis
>
> 5.3.6-300.fc31.x86_64 #1 SMP Mon Oct 14 12:26:42 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> Tested with 1800 connections without any problem
>

When you say 'without any problem', do you mean to say that the drop
database is successful?  In your tests were those sessions idle?  If
so, I think we should try with busy sessions.  Also, if you write any
script to do these tests, kindly share the same so that others can
also repeat those tests.

>(under low load (only pg_sleep was called).
>

I guess this is also possible because immediately after
TerminateOtherDBBackends, we call CountOtherDBBackends which again
give 5s to allow active connections to die. I am wondering why not we
call CountOtherDBBackends from TerminateOtherDBBackends after sending
the SIGTERM to all the sessions that are connected to the database
being dropped?  Currently, it looks odd that first, we wait for 100ms
after sending the signal and then separately wait for 5s in another
function.

Other comments:
1.
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 26a0bcf718..c8f545be18 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1182,3 +1182,11 @@ drop role regress_partitioning_role;

 -- \d on toast table (use pg_statistic's toast table, which has a known name)
 \d pg_toast.pg_toast_2619
+
+--
+-- DROP DATABASE FORCE test of syntax (should not to raise syntax error)
+--
+drop database not_exists (force);
+drop database not_exists with (force);
+drop database if exists not_exists (force);
+drop database if exists not_exists with (force);

I don't think psql.sql is the right place to add such tests.
Actually, there doesn't appear to be any database specific test file.
However, if we want to add to any existing file, then maybe
drop_if_exits.sql could be a better place for these tests as compare
to psql.sql.

2.
+ if (nprepared > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg_plural("database \"%s\" is used by %d prepared transaction",
+    "database \"%s\" is used by %d prepared transactions",
+    nprepared,
+    get_database_name(databaseId), nprepared)));

I think it is better if we mimic exactly what we have in the failure
of  CountOtherDBBackends.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


po 21. 10. 2019 v 7:11 odesílatel Amit Kapila <[hidden email]> napsal:
On Sun, Oct 20, 2019 at 2:06 AM Pavel Stehule <[hidden email]> wrote:
>
> so 19. 10. 2019 v 12:37 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> On Fri, Oct 4, 2019 at 9:54 PM Pavel Stehule <[hidden email]> wrote:
>> >
>> >>
>> >> Can we add few tests for this feature.
>> >
>> > there are not any other test for DROP DATABASE
>> >
>>
>> I think there are no dedicated tests for it but in a few tests, we use
>> it like in contrib\sepgsql\sql\alter.sql.  I am not sure if we can
>> write a predictable test for force option because it will never be
>> guaranteed to drop the database in the presence of other active
>> sessions.
>
>
> done - I push tests to /tests/regress/psql.sql
>
>>
>> Few more comments:
>> ---------------------------------
>>
>> 2.
>> TerminateOtherDBBackends()
>> {
>> ..
>> + /* We know so we have all necessary rights now */
>> + foreach (lc, pids)
>> + {
>> + int pid = lfirst_int(lc);
>> + PGPROC    *proc = BackendPidGetProc(pid);
>> +
>> + if (proc != NULL)
>> + {
>> + /* If we have setsid(), signal the backend's whole process group */
>> +#ifdef HAVE_SETSID
>> + (void) kill(-pid, SIGTERM);
>> +#else
>> + (void) kill(pid, SIGTERM);
>> +#endif
>> + }
>> + }
>> +
>> + /* sleep 100ms */
>> + pg_usleep(100 * 1000L);
>> ..
>> }
>>
>> So here we are sending SIGTERM to all the processes and wait for 100ms
>> to allow them to exit.  Have you tested this with many processes?  I
>> think we can create 100~500 sessions or maybe more to the database
>> being dropped and see what is behavior?  One thing to notice is that
>> in function CountOtherDBBackends() we are sending SIGTERM to 10
>> autovacuum processes at-a-time.  That has been introduced in commit
>> 4abd7b49f1e9, but the reason for the same is not mentioned.  I am not
>> sure if it is to avoid sending SIGTERM to many processes in quick
>> succession.
>
>
> I tested it on linux Linux nemesis
>
> 5.3.6-300.fc31.x86_64 #1 SMP Mon Oct 14 12:26:42 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> Tested with 1800 connections without any problem
>

When you say 'without any problem', do you mean to say that the drop
database is successful?  In your tests were those sessions idle?  If
so, I think we should try with busy sessions.  Also, if you write any
script to do these tests, kindly share the same so that others can
also repeat those tests.


sessions was active - but the function pg_sleep was called. Drop table (mainly logout of these users was successful).

I had a script just

for i in {1..1000}; do (psql -c "select pg_sleep(1000)" omega &> /dev/null &); done

I'll try to do some experiments - unfortunately,I have not a hw where I can test very large number of connections.

But surely I can use pg_bench, and I can check pg_bench load

 
>(under low load (only pg_sleep was called).
>

I guess this is also possible because immediately after
TerminateOtherDBBackends, we call CountOtherDBBackends which again
give 5s to allow active connections to die. I am wondering why not we
call CountOtherDBBackends from TerminateOtherDBBackends after sending
the SIGTERM to all the sessions that are connected to the database
being dropped?  Currently, it looks odd that first, we wait for 100ms
after sending the signal and then separately wait for 5s in another
function.

I'll look to this part, but I don't think so it is bad. 5s is maximum, not minimum of waiting. So if sigterm is successful in first 100ms, then  CountOtherDBBackends doesn't add any time. If not, then it dynamically waiting. 

If we don't wait in TerminateOtherDBBackends, then probably there should be necessary some cycles inside CountOtherDBBackends. I think so code like is correct

1. send SIGTERM to target processes
2. put some time to processes for logout (100ms)
3. check in loop (max 5 sec) on logout of all processes

Maybe my feeling is wrong, but I think so it is good to wait few time instead to call CountOtherDBBackends immediately - the first iteration should to fail, and then first iteration is useless without chance on success.


Other comments:
1.
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 26a0bcf718..c8f545be18 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1182,3 +1182,11 @@ drop role regress_partitioning_role;

 -- \d on toast table (use pg_statistic's toast table, which has a known name)
 \d pg_toast.pg_toast_2619
+
+--
+-- DROP DATABASE FORCE test of syntax (should not to raise syntax error)
+--
+drop database not_exists (force);
+drop database not_exists with (force);
+drop database if exists not_exists (force);
+drop database if exists not_exists with (force);

I don't think psql.sql is the right place to add such tests.
Actually, there doesn't appear to be any database specific test file.
However, if we want to add to any existing file, then maybe
drop_if_exits.sql could be a better place for these tests as compare
to psql.sql.

2.
+ if (nprepared > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg_plural("database \"%s\" is used by %d prepared transaction",
+    "database \"%s\" is used by %d prepared transactions",
+    nprepared,
+    get_database_name(databaseId), nprepared)));

I think it is better if we mimic exactly what we have in the failure
of  CountOtherDBBackends.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
On Mon, Oct 21, 2019 at 11:08 AM Pavel Stehule <[hidden email]> wrote:

>
> po 21. 10. 2019 v 7:11 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> >(under low load (only pg_sleep was called).
>> >
>>
>> I guess this is also possible because immediately after
>> TerminateOtherDBBackends, we call CountOtherDBBackends which again
>> give 5s to allow active connections to die. I am wondering why not we
>> call CountOtherDBBackends from TerminateOtherDBBackends after sending
>> the SIGTERM to all the sessions that are connected to the database
>> being dropped?  Currently, it looks odd that first, we wait for 100ms
>> after sending the signal and then separately wait for 5s in another
>> function.
>
>
> I'll look to this part, but I don't think so it is bad. 5s is maximum, not minimum of waiting. So if sigterm is successful in first 100ms, then  CountOtherDBBackends doesn't add any time. If not, then it dynamically waiting.
>
> If we don't wait in TerminateOtherDBBackends, then probably there should be necessary some cycles inside CountOtherDBBackends. I think so code like is correct
>
> 1. send SIGTERM to target processes
> 2. put some time to processes for logout (100ms)
> 3. check in loop (max 5 sec) on logout of all processes
>
> Maybe my feeling is wrong, but I think so it is good to wait few time instead to call CountOtherDBBackends immediately - the first iteration should to fail, and then first iteration is useless without chance on success.
>

I think the way I am suggesting by skipping the second step will allow
sleeping only when required.  Consider a case where there are just one
or two sessions connected to the database and they immediately exited
after the signal is sent.  In such a case you don't need to sleep at
all whereas, under your proposal, it will always sleep.  In the case
where a large number of sessions are present and the first 100ms are
not sufficient, we anyway need to wait dynamically.  So, I think the
second step not only looks odd but also seems to be redundant.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


po 21. 10. 2019 v 8:38 odesílatel Amit Kapila <[hidden email]> napsal:
On Mon, Oct 21, 2019 at 11:08 AM Pavel Stehule <[hidden email]> wrote:
>
> po 21. 10. 2019 v 7:11 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> >(under low load (only pg_sleep was called).
>> >
>>
>> I guess this is also possible because immediately after
>> TerminateOtherDBBackends, we call CountOtherDBBackends which again
>> give 5s to allow active connections to die. I am wondering why not we
>> call CountOtherDBBackends from TerminateOtherDBBackends after sending
>> the SIGTERM to all the sessions that are connected to the database
>> being dropped?  Currently, it looks odd that first, we wait for 100ms
>> after sending the signal and then separately wait for 5s in another
>> function.
>
>
> I'll look to this part, but I don't think so it is bad. 5s is maximum, not minimum of waiting. So if sigterm is successful in first 100ms, then  CountOtherDBBackends doesn't add any time. If not, then it dynamically waiting.
>
> If we don't wait in TerminateOtherDBBackends, then probably there should be necessary some cycles inside CountOtherDBBackends. I think so code like is correct
>
> 1. send SIGTERM to target processes
> 2. put some time to processes for logout (100ms)
> 3. check in loop (max 5 sec) on logout of all processes
>
> Maybe my feeling is wrong, but I think so it is good to wait few time instead to call CountOtherDBBackends immediately - the first iteration should to fail, and then first iteration is useless without chance on success.
>

I think the way I am suggesting by skipping the second step will allow
sleeping only when required.  Consider a case where there are just one
or two sessions connected to the database and they immediately exited
after the signal is sent.  In such a case you don't need to sleep at
all whereas, under your proposal, it will always sleep.  In the case
where a large number of sessions are present and the first 100ms are
not sufficient, we anyway need to wait dynamically.  So, I think the
second step not only looks odd but also seems to be redundant.

I checked the code, and I think so calling  CountOtherDBBackends from TerminateOtherDBBackends is not good idea. CountOtherDBBackends should be called anywhere, TerminateOtherDBBackends only with FORCE flag. So I wouldn't to change code.

But I can (and I have not any problem with it) remove or significantly decrease sleeping time in TerminateOtherDBBackends.

100 ms is maybe very much - but zero is maybe too low. If there will not be any time between TerminateOtherDBBackends and CountOtherDBBackends, then probably CountOtherDBBackends hit waiting 100ms.

What about only 5 ms sleeping in TerminateOtherDBBackends?



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule
In reply to this post by akapila

Hi


When you say 'without any problem', do you mean to say that the drop
database is successful?  In your tests were those sessions idle?  If
so, I think we should try with busy sessions.  Also, if you write any
script to do these tests, kindly share the same so that others can
also repeat those tests.

I run pgbench on database with -i -S 100 and -c 1000. It is maximum for my notebook. There was high load - about 50, and still DROP DATABASE FORCE is working without any problems



>(under low load (only pg_sleep was called).
>

I guess this is also possible because immediately after
TerminateOtherDBBackends, we call CountOtherDBBackends which again
give 5s to allow active connections to die. I am wondering why not we
call CountOtherDBBackends from TerminateOtherDBBackends after sending
the SIGTERM to all the sessions that are connected to the database
being dropped?  Currently, it looks odd that first, we wait for 100ms
after sending the signal and then separately wait for 5s in another
function.

I checked code, and would not to change calls. Now I think the code is well readable and has logical sense. But we can decrease sleep in  TerminateOtherDBBackends from 100 ms to 5 ms (just to increase chance to be all killed processes done, and then waiting in CountOtherDBBackends 100ms will not be hit.


Other comments:
1.
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 26a0bcf718..c8f545be18 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1182,3 +1182,11 @@ drop role regress_partitioning_role;

 -- \d on toast table (use pg_statistic's toast table, which has a known name)
 \d pg_toast.pg_toast_2619
+
+--
+-- DROP DATABASE FORCE test of syntax (should not to raise syntax error)
+--
+drop database not_exists (force);
+drop database not_exists with (force);
+drop database if exists not_exists (force);
+drop database if exists not_exists with (force);

I don't think psql.sql is the right place to add such tests.
Actually, there doesn't appear to be any database specific test file.
However, if we want to add to any existing file, then maybe
drop_if_exits.sql could be a better place for these tests as compare
to psql.sql.

moved


2.
+ if (nprepared > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_IN_USE),
+ errmsg_plural("database \"%s\" is used by %d prepared transaction",
+    "database \"%s\" is used by %d prepared transactions",
+    nprepared,
+    get_database_name(databaseId), nprepared)));

I think it is better if we mimic exactly what we have in the failure
of  CountOtherDBBackends.

done

Regards

Pavel


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

drop-database-force-20191021.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
In reply to this post by Pavel Stehule
On Mon, Oct 21, 2019 at 12:24 PM Pavel Stehule <[hidden email]> wrote:

>
> po 21. 10. 2019 v 8:38 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> > If we don't wait in TerminateOtherDBBackends, then probably there should be necessary some cycles inside CountOtherDBBackends. I think so code like is correct
>> >
>> > 1. send SIGTERM to target processes
>> > 2. put some time to processes for logout (100ms)
>> > 3. check in loop (max 5 sec) on logout of all processes
>> >
>> > Maybe my feeling is wrong, but I think so it is good to wait few time instead to call CountOtherDBBackends immediately - the first iteration should to fail, and then first iteration is useless without chance on success.
>> >
>>
>> I think the way I am suggesting by skipping the second step will allow
>> sleeping only when required.  Consider a case where there are just one
>> or two sessions connected to the database and they immediately exited
>> after the signal is sent.  In such a case you don't need to sleep at
>> all whereas, under your proposal, it will always sleep.  In the case
>> where a large number of sessions are present and the first 100ms are
>> not sufficient, we anyway need to wait dynamically.  So, I think the
>> second step not only looks odd but also seems to be redundant.
>
>
> I checked the code, and I think so calling  CountOtherDBBackends from TerminateOtherDBBackends is not good idea. CountOtherDBBackends should be called anywhere, TerminateOtherDBBackends only with FORCE flag. So I wouldn't to change code.
>

Sorry, but I am not able to understand the reason.  Are you worried
about the comments atop CountOtherDBBackends which says it is used in
Drop Database and related commands?

> But I can (and I have not any problem with it) remove or significantly decrease sleeping time in TerminateOtherDBBackends.
>
> 100 ms is maybe very much - but zero is maybe too low. If there will not be any time between TerminateOtherDBBackends and CountOtherDBBackends, then probably CountOtherDBBackends hit waiting 100ms.
>
> What about only 5 ms sleeping in TerminateOtherDBBackends?
>

I am not completely sure about what is the most appropriate thing to
do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
there is nothing broken with the logic.  Anyone else wants to weigh in
here?



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


po 21. 10. 2019 v 10:25 odesílatel Amit Kapila <[hidden email]> napsal:
On Mon, Oct 21, 2019 at 12:24 PM Pavel Stehule <[hidden email]> wrote:
>
> po 21. 10. 2019 v 8:38 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> > If we don't wait in TerminateOtherDBBackends, then probably there should be necessary some cycles inside CountOtherDBBackends. I think so code like is correct
>> >
>> > 1. send SIGTERM to target processes
>> > 2. put some time to processes for logout (100ms)
>> > 3. check in loop (max 5 sec) on logout of all processes
>> >
>> > Maybe my feeling is wrong, but I think so it is good to wait few time instead to call CountOtherDBBackends immediately - the first iteration should to fail, and then first iteration is useless without chance on success.
>> >
>>
>> I think the way I am suggesting by skipping the second step will allow
>> sleeping only when required.  Consider a case where there are just one
>> or two sessions connected to the database and they immediately exited
>> after the signal is sent.  In such a case you don't need to sleep at
>> all whereas, under your proposal, it will always sleep.  In the case
>> where a large number of sessions are present and the first 100ms are
>> not sufficient, we anyway need to wait dynamically.  So, I think the
>> second step not only looks odd but also seems to be redundant.
>
>
> I checked the code, and I think so calling  CountOtherDBBackends from TerminateOtherDBBackends is not good idea. CountOtherDBBackends should be called anywhere, TerminateOtherDBBackends only with FORCE flag. So I wouldn't to change code.
>

Sorry, but I am not able to understand the reason.  Are you worried
about the comments atop CountOtherDBBackends which says it is used in
Drop Database and related commands?

no, just now the code in dropdb looks like

if (force)
    TerminateOtherDBBackends(...);

CountOtherDBBackends(...);

if I call CountOtherDBBackends from TerminateOtherDBBackends, then code will look like

if (force)
  TerminateOtherDBBackends(...);
else
  CountOtherDBBackends(...);

That looks like CountOtherDBBackends is not called when force clause is active. And this is not true.

So I prefer current relations between routines.




> But I can (and I have not any problem with it) remove or significantly decrease sleeping time in TerminateOtherDBBackends.
>
> 100 ms is maybe very much - but zero is maybe too low. If there will not be any time between TerminateOtherDBBackends and CountOtherDBBackends, then probably CountOtherDBBackends hit waiting 100ms.
>
> What about only 5 ms sleeping in TerminateOtherDBBackends?
>

I am not completely sure about what is the most appropriate thing to
do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
there is nothing broken with the logic.  Anyone else wants to weigh in
here?

ok. But when I remove it, should not be better to set waiting in  CountOtherDBBackends to some smaller number than 100ms?

Pavel



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
On Mon, Oct 21, 2019 at 2:15 PM Pavel Stehule <[hidden email]> wrote:

>
> po 21. 10. 2019 v 10:25 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>>
>> Sorry, but I am not able to understand the reason.  Are you worried
>> about the comments atop CountOtherDBBackends which says it is used in
>> Drop Database and related commands?
>
>
> no, just now the code in dropdb looks like
>
> if (force)
>     TerminateOtherDBBackends(...);
>
> CountOtherDBBackends(...);
>
> if I call CountOtherDBBackends from TerminateOtherDBBackends, then code will look like
>
> if (force)
>   TerminateOtherDBBackends(...);
> else
>   CountOtherDBBackends(...);
>
> That looks like CountOtherDBBackends is not called when force clause is active. And this is not true.
>

Hmm, can't we pass force as a parameter to TerminateOtherDBBackends()
and then take the decision inside that function?  That will make the
code of dropdb function a bit simpler.

> So I prefer current relations between routines.
>
>
>
>>
>> > But I can (and I have not any problem with it) remove or significantly decrease sleeping time in TerminateOtherDBBackends.
>> >
>> > 100 ms is maybe very much - but zero is maybe too low. If there will not be any time between TerminateOtherDBBackends and CountOtherDBBackends, then probably CountOtherDBBackends hit waiting 100ms.
>> >
>> > What about only 5 ms sleeping in TerminateOtherDBBackends?
>> >
>>
>> I am not completely sure about what is the most appropriate thing to
>> do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
>> there is nothing broken with the logic.  Anyone else wants to weigh in
>> here?
>
>
> ok. But when I remove it, should not be better to set waiting in  CountOtherDBBackends to some smaller number than 100ms?
>

CountOtherDBBackends is called from other places as well, so I don't
think it is advisable to change the sleep time in that function.
Also, I don't want to add a parameter for it.  I think you have a
point that in some cases we might end up sleeping for 100ms when we
could do with less sleeping time, but I think it is true to some
extent today as well.  I think we can anyway change it in the future
if there is a problem with the sleep timing, but for now, I think we
can just call CountOtherDBBackends after sending SIGTERM and call it
good.  You might want to add a futuristic note in the code.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


út 22. 10. 2019 v 5:09 odesílatel Amit Kapila <[hidden email]> napsal:
On Mon, Oct 21, 2019 at 2:15 PM Pavel Stehule <[hidden email]> wrote:
>
> po 21. 10. 2019 v 10:25 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>>
>> Sorry, but I am not able to understand the reason.  Are you worried
>> about the comments atop CountOtherDBBackends which says it is used in
>> Drop Database and related commands?
>
>
> no, just now the code in dropdb looks like
>
> if (force)
>     TerminateOtherDBBackends(...);
>
> CountOtherDBBackends(...);
>
> if I call CountOtherDBBackends from TerminateOtherDBBackends, then code will look like
>
> if (force)
>   TerminateOtherDBBackends(...);
> else
>   CountOtherDBBackends(...);
>
> That looks like CountOtherDBBackends is not called when force clause is active. And this is not true.
>

Hmm, can't we pass force as a parameter to TerminateOtherDBBackends()
and then take the decision inside that function?  That will make the
code of dropdb function a bit simpler.

I don't think so it is correct. Because without FORCE flag, you should not to call TeminateOtherDBBackend ever.

Maybe I don't understand what is wrong.

if (force)
  terminate();

CountOtherDBBackends()
if (some numbers)
   ereport(ERROR, ..

This is fully correct for me.




> So I prefer current relations between routines.
>
>
>
>>
>> > But I can (and I have not any problem with it) remove or significantly decrease sleeping time in TerminateOtherDBBackends.
>> >
>> > 100 ms is maybe very much - but zero is maybe too low. If there will not be any time between TerminateOtherDBBackends and CountOtherDBBackends, then probably CountOtherDBBackends hit waiting 100ms.
>> >
>> > What about only 5 ms sleeping in TerminateOtherDBBackends?
>> >
>>
>> I am not completely sure about what is the most appropriate thing to
>> do, but I favor removing sleep from TerminateOtherDBBackends.  OTOH,
>> there is nothing broken with the logic.  Anyone else wants to weigh in
>> here?
>
>
> ok. But when I remove it, should not be better to set waiting in  CountOtherDBBackends to some smaller number than 100ms?
>

CountOtherDBBackends is called from other places as well, so I don't
think it is advisable to change the sleep time in that function.
Also, I don't want to add a parameter for it.  I think you have a
point that in some cases we might end up sleeping for 100ms when we
could do with less sleeping time, but I think it is true to some
extent today as well.  I think we can anyway change it in the future
if there is a problem with the sleep timing, but for now, I think we
can just call CountOtherDBBackends after sending SIGTERM and call it
good.  You might want to add a futuristic note in the code.


ok.

I removed sleeping from TerminateOtherDBBackends().

If you want to change any logic there, please, do it without any hesitations. Maybe I don't see, what you think.

Regards

Pavel

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

drop-database-force-20191022.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
On Tue, Oct 22, 2019 at 4:51 PM Pavel Stehule <[hidden email]> wrote:

>
> út 22. 10. 2019 v 5:09 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>>
>> CountOtherDBBackends is called from other places as well, so I don't
>> think it is advisable to change the sleep time in that function.
>> Also, I don't want to add a parameter for it.  I think you have a
>> point that in some cases we might end up sleeping for 100ms when we
>> could do with less sleeping time, but I think it is true to some
>> extent today as well.  I think we can anyway change it in the future
>> if there is a problem with the sleep timing, but for now, I think we
>> can just call CountOtherDBBackends after sending SIGTERM and call it
>> good.  You might want to add a futuristic note in the code.
>>
>
> ok.
>
> I removed sleeping from TerminateOtherDBBackends().
>
> If you want to change any logic there, please, do it without any hesitations. Maybe I don't see, what you think.
>

Fair enough, I will see if I need to change anything.  In the
meantime, can you look into thread related to CountDBSubscriptions[1]?
 I think the problem reported there will be more serious after your
patch, so it is better if we can fix it before this patch.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BqhLkCYG2oy9xug9ur_j%3DG2wQNRYAyd%2B-kZfZ1z42pLw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
On Wed, Oct 23, 2019 at 12:59 PM Amit Kapila <[hidden email]> wrote:

>
> On Tue, Oct 22, 2019 at 4:51 PM Pavel Stehule <[hidden email]> wrote:
> >
> > út 22. 10. 2019 v 5:09 odesílatel Amit Kapila <[hidden email]> napsal:
> >>
> >>
> >> CountOtherDBBackends is called from other places as well, so I don't
> >> think it is advisable to change the sleep time in that function.
> >> Also, I don't want to add a parameter for it.  I think you have a
> >> point that in some cases we might end up sleeping for 100ms when we
> >> could do with less sleeping time, but I think it is true to some
> >> extent today as well.  I think we can anyway change it in the future
> >> if there is a problem with the sleep timing, but for now, I think we
> >> can just call CountOtherDBBackends after sending SIGTERM and call it
> >> good.  You might want to add a futuristic note in the code.
> >>
> >
> > ok.
> >
> > I removed sleeping from TerminateOtherDBBackends().
> >
> > If you want to change any logic there, please, do it without any hesitations. Maybe I don't see, what you think.
> >
>
> Fair enough, I will see if I need to change anything.
>
While making some changes in the patch, I noticed that in
TerminateOtherDBBackends, there is a race condition where after we
release the ProcArrayLock, the backend process to which we decided to
send a signal exits by itself and the same pid can be assigned to
another backend which is connected to some other database.  This leads
to terminating a wrong backend.  I think we have some similar race
condition at few other places like in pg_terminate_backend(),
ProcSleep() and CountOtherDBBackends().  I think here the risk is a
bit more because there could be a long list of pids.

One idea could be that we write a new function similar to IsBackendPid
which takes dbid and ensures that pid belongs to that database and use
that before sending kill signal, but still it will not be completely
safe.  But, I think it can be closer to cases like we already have in
code.

Another possible idea could be to use the SendProcSignal mechanism
similar to how we have used it in CancelDBBackends() to allow the
required backends to exit by themselves.  This might be safer.

I am not sure if we can entirely eliminate this race condition and
whether it is a good idea to accept such a race condition even though
it exists in other parts of code.  What do you think?

BTW, I have added/revised some comments in the code and done few other
cosmetic changes, the result of which is attached.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

drop-database-force-20191024.amit.patch (21K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


čt 24. 10. 2019 v 11:10 odesílatel Amit Kapila <[hidden email]> napsal:
On Wed, Oct 23, 2019 at 12:59 PM Amit Kapila <[hidden email]> wrote:
>
> On Tue, Oct 22, 2019 at 4:51 PM Pavel Stehule <[hidden email]> wrote:
> >
> > út 22. 10. 2019 v 5:09 odesílatel Amit Kapila <[hidden email]> napsal:
> >>
> >>
> >> CountOtherDBBackends is called from other places as well, so I don't
> >> think it is advisable to change the sleep time in that function.
> >> Also, I don't want to add a parameter for it.  I think you have a
> >> point that in some cases we might end up sleeping for 100ms when we
> >> could do with less sleeping time, but I think it is true to some
> >> extent today as well.  I think we can anyway change it in the future
> >> if there is a problem with the sleep timing, but for now, I think we
> >> can just call CountOtherDBBackends after sending SIGTERM and call it
> >> good.  You might want to add a futuristic note in the code.
> >>
> >
> > ok.
> >
> > I removed sleeping from TerminateOtherDBBackends().
> >
> > If you want to change any logic there, please, do it without any hesitations. Maybe I don't see, what you think.
> >
>
> Fair enough, I will see if I need to change anything.
>

While making some changes in the patch, I noticed that in
TerminateOtherDBBackends, there is a race condition where after we
release the ProcArrayLock, the backend process to which we decided to
send a signal exits by itself and the same pid can be assigned to
another backend which is connected to some other database.  This leads
to terminating a wrong backend.  I think we have some similar race
condition at few other places like in pg_terminate_backend(),
ProcSleep() and CountOtherDBBackends().  I think here the risk is a
bit more because there could be a long list of pids.

One idea could be that we write a new function similar to IsBackendPid
which takes dbid and ensures that pid belongs to that database and use
that before sending kill signal, but still it will not be completely
safe.  But, I think it can be closer to cases like we already have in
code.

Another possible idea could be to use the SendProcSignal mechanism
similar to how we have used it in CancelDBBackends() to allow the
required backends to exit by themselves.  This might be safer.

I am not sure if we can entirely eliminate this race condition and
whether it is a good idea to accept such a race condition even though
it exists in other parts of code.  What do you think?

BTW, I have added/revised some comments in the code and done few other
cosmetic changes, the result of which is attached.

Tomorrow I'll check variants that you mentioned.

We sure so there are not any new connect to removed database, because we hold lock there. So check if oid db is same should be enough.

Pavel


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

akapila
On Thu, Oct 24, 2019 at 8:22 PM Pavel Stehule <[hidden email]> wrote:

>
> čt 24. 10. 2019 v 11:10 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> While making some changes in the patch, I noticed that in
>> TerminateOtherDBBackends, there is a race condition where after we
>> release the ProcArrayLock, the backend process to which we decided to
>> send a signal exits by itself and the same pid can be assigned to
>> another backend which is connected to some other database.  This leads
>> to terminating a wrong backend.  I think we have some similar race
>> condition at few other places like in pg_terminate_backend(),
>> ProcSleep() and CountOtherDBBackends().  I think here the risk is a
>> bit more because there could be a long list of pids.
>>
>> One idea could be that we write a new function similar to IsBackendPid
>> which takes dbid and ensures that pid belongs to that database and use
>> that before sending kill signal, but still it will not be completely
>> safe.  But, I think it can be closer to cases like we already have in
>> code.
>>
>> Another possible idea could be to use the SendProcSignal mechanism
>> similar to how we have used it in CancelDBBackends() to allow the
>> required backends to exit by themselves.  This might be safer.
>>
>> I am not sure if we can entirely eliminate this race condition and
>> whether it is a good idea to accept such a race condition even though
>> it exists in other parts of code.  What do you think?
>>
>> BTW, I have added/revised some comments in the code and done few other
>> cosmetic changes, the result of which is attached.
>
>
> Tomorrow I'll check variants that you mentioned.
>
> We sure so there are not any new connect to removed database, because we hold lock there.
>

Right.

> So check if oid db is same should be enough.
>

We can do this before sending a kill signal but is it enough?  Because
as soon as we release ProcArrayLock anytime the other process can exit
and a new process can use its pid.  I think this is more of a
theoretical risk and might not be easy to hit, but still, we can't
ignore it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: dropdb --force

Pavel Stehule


pá 25. 10. 2019 v 4:55 odesílatel Amit Kapila <[hidden email]> napsal:
On Thu, Oct 24, 2019 at 8:22 PM Pavel Stehule <[hidden email]> wrote:
>
> čt 24. 10. 2019 v 11:10 odesílatel Amit Kapila <[hidden email]> napsal:
>>
>> While making some changes in the patch, I noticed that in
>> TerminateOtherDBBackends, there is a race condition where after we
>> release the ProcArrayLock, the backend process to which we decided to
>> send a signal exits by itself and the same pid can be assigned to
>> another backend which is connected to some other database.  This leads
>> to terminating a wrong backend.  I think we have some similar race
>> condition at few other places like in pg_terminate_backend(),
>> ProcSleep() and CountOtherDBBackends().  I think here the risk is a
>> bit more because there could be a long list of pids.
>>
>> One idea could be that we write a new function similar to IsBackendPid
>> which takes dbid and ensures that pid belongs to that database and use
>> that before sending kill signal, but still it will not be completely
>> safe.  But, I think it can be closer to cases like we already have in
>> code.
>>
>> Another possible idea could be to use the SendProcSignal mechanism
>> similar to how we have used it in CancelDBBackends() to allow the
>> required backends to exit by themselves.  This might be safer.
>>
>> I am not sure if we can entirely eliminate this race condition and
>> whether it is a good idea to accept such a race condition even though
>> it exists in other parts of code.  What do you think?
>>
>> BTW, I have added/revised some comments in the code and done few other
>> cosmetic changes, the result of which is attached.
>
>
> Tomorrow I'll check variants that you mentioned.
>
> We sure so there are not any new connect to removed database, because we hold lock there.
>

Right.

> So check if oid db is same should be enough.
>

We can do this before sending a kill signal but is it enough?  Because
as soon as we release ProcArrayLock anytime the other process can exit
and a new process can use its pid.  I think this is more of a
theoretical risk and might not be easy to hit, but still, we can't
ignore it.

yes, there is a theoretical risk probably - the released pid should near current fresh pid from range 0 .. pid_max.

Probably the best solutions is enhancing SendProcSignal and using it here and fix CountOtherDBBackends.

Alternative solution can be killing in block 50 processes and recheck. I'll try to write both and you can decide for one

Pavel


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
1234567