Get memory contexts of an arbitrary backend process


Get memory contexts of an arbitrary backend process

torikoshia
Hi,

After commit 3e98c0bafb28de, we can display the usage of the
memory contexts using the pg_backend_memory_contexts system
view.

However, its target is limited to the process attached to
the current session.

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

The attached PoC patch makes pg_get_backend_memory_contexts()
display the memory contexts of the process with the specified PID.


   =# -- PID of the target process is 17051
   =# SELECT * FROM pg_get_backend_memory_contexts(17051);
            name           | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes
   ------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
    TopMemoryContext       |       |                  |     0 |       68720 |             5 |      16816 |          16 |      51904
    RowDescriptionContext  |       | TopMemoryContext |     1 |        8192 |             1 |       6880 |           0 |       1312
    MessageContext         |       | TopMemoryContext |     1 |       65536 |             4 |      19912 |           1 |      45624
    ...

It doesn't display the contexts of all the backends, but only
the contexts of the specified process.
I think that would be enough because I suppose this function
is used after investigation with the ps command or other OS
level utilities.


The rough idea of the implementation is as follows:

   1. send a signal to the specified process
   2. the signaled process dumps its memory contexts to a file
   3. read the dumped file and display it to the user
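
As a rough illustration of step 2 only (an editor's sketch, not the attached
patch: the function name, the tab-separated output format, and the assumed
headers nodes/memnodes.h and utils/memutils.h are all assumptions), the
signaled process could recursively walk its context tree like this:

static void
dump_context_recurse(FILE *fpout, MemoryContext context,
                     const char *parent, int level)
{
    MemoryContextCounters stat;
    MemoryContext child;

    /* collect this context's counters only (no printing) */
    memset(&stat, 0, sizeof(stat));
    (*context->methods->stats) (context, NULL, NULL, &stat);

    /* one record per context */
    fprintf(fpout, "%s\t%s\t%d\t%zu\t%zu\t%zu\t%zu\n",
            context->name, parent, level,
            stat.totalspace, stat.nblocks,
            stat.freespace, stat.freechunks);

    /* then recurse into the children */
    for (child = context->firstchild; child != NULL; child = child->nextchild)
        dump_context_recurse(fpout, child, context->name, level + 1);
}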


Any thoughts?

[1]
https://www.postgresql.org/message-id/72a656e0f71d0860161e0b3f67e4d771%40oss.nttdata.com


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment: 0001-Enabled-pg_get_backend_memory_contexts-to-collect-ar.patch (28K)

Re: Get memory contexts of an arbitrary backend process

Kasahara Tatsuhito
Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia <[hidden email]> wrote:
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
+1

> Attached PoC patch makes pg_get_backend_memory_contexts()
> display memory contexts of the specified PID of the process.
Thanks, it's a very good patch for discussion.

> It doesn't display contexts of all the backends but only
> the contexts of specified process.
Or we can run "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.

> The rough idea of implementation is like below:
>
>    1. send  a signal to the specified process
>    2. signaled process dumps its memory contexts to a file
>    3. read the dumped file and display it to the user
I agree with the overview of the idea.
Here are some comments and questions.

- Currently, "the signal transmission for dumping memory information"
  and "the read & output of dump information" are on the same interface,
  but I think it would be better to separate them.
  How about providing the following three types of functions for users?
  - send a signal to the specified pid
  - check the status of the signal sent and received
  - read the dumped information
- How about managing the status of signal send/receive and dump
  operations in a shared hash or something similar?
  Sending and receiving signals, dumping memory information, and
  referencing dump information all work asynchronously.
  Therefore, it would be good to have management information to check
  the status of each process.
  A simple idea (see the sketch after this list) is:
   - on sending a dump request to a PID, first record the following
     information in the shared hash:
     pid (the specified pid)
     loc (dump location, currently might be ASAP)
     recv (did the pid process receive a signal? initially false)
     dumped (did the pid process dump its memory information? initially false)
   - the specified process receives the signal, updates the status in the
     shared hash, then dumps at the specified location.
   - the specified process finishes dumping the memory information and
     updates the status in the shared hash.
- Does it allow one process to output multiple dump files?
  It appears to overwrite the file at present, but I thought it would be
  good to be able to generate multiple dump files in different phases
  (e.g., planning phase and execution phase) in the future.
- How is the dump file cleaned up?
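
A minimal shape for such a shared hash entry, following the fields listed
above (purely illustrative; the field names and types are assumptions, not
code from the patch):

typedef struct McxtDumpEntry
{
    pid_t       pid;        /* hash key: PID of the target process */
    int         loc;        /* requested dump location; currently only "ASAP" */
    bool        recv;       /* has the target received the signal yet? */
    bool        dumped;     /* has the target finished writing the dump? */
} McxtDumpEntry;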

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

Pavel Stehule
Hi

On Mon, Aug 31, 2020 at 17:03, Kasahara Tatsuhito <[hidden email]> wrote:
Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia <[hidden email]> wrote:
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
+1

> Attached PoC patch makes pg_get_backend_memory_contexts()
> display memory contexts of the specified PID of the process.
Thanks, it's a very good patch for discussion.

> It doesn't display contexts of all the backends but only
> the contexts of specified process.
or we can  "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.

> The rough idea of implementation is like below:
>
>    1. send  a signal to the specified process
>    2. signaled process dumps its memory contexts to a file
>    3. read the dumped file and display it to the user
I agree with the overview of the idea.
Here are some comments and questions.

- Currently, "the signal transmission for dumping memory information"
and "the read & output of dump information "
  are on the same interface, but I think it would be better to separate them.
  How about providing the following three types of functions for users?
  - send a signal to specified pid
  - check the status of the signal sent and received
  - read the dumped information
- How about managing the status of signal send/receive and dump
operations on a shared hash or others ?
  Sending and receiving signals, dumping memory information, and
referencing dump information all work asynchronously.
  Therefore, it would be good to have management information to check
the status of each process.
  A simple idea is that ..
   - send a signal to dump to a PID, it first record following
information into the shared hash.
     pid (specified pid)
     loc (dump location, currently might be ASAP)
     recv (did the pid process receive a signal? first false)
     dumped (did the pid process dump a mem information?  first false)
   - specified process receive the signal, update the status in the
shared hash, then dumped at specified location.
   - specified process finish dump mem information,  update the status
in the shared hash.
- Does it allow one process to output multiple dump files?
  It appears to be a specification to overwrite at present, but I
thought it would be good to be able to generate
  multiple dump files in different phases (e.g., planning phase and
execution phase) in the future.
- How is the dump file cleaned up?

For a very long time there has been a similar discussion about getting the session query and session execution plans from other sessions.

I am not sure how necessary the information in the memory dump is, but I am sure that taking the current execution plan and the complete text of the current query is pretty necessary information.

But it would be great if this infrastructure could be used for any debugging purpose.

Regards

Pavel


Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

Andres Freund
In reply to this post by torikoshia
Hi,

On 2020-08-31 20:22:18 +0900, torikoshia wrote:

> After commit 3e98c0bafb28de, we can display the usage of the
> memory contexts using pg_backend_memory_contexts system
> view.
>
> However, its target is limited to the  process attached to
> the current session.
>
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
>
> Attached PoC patch makes pg_get_backend_memory_contexts()
> display memory contexts of the specified PID of the process.

Awesome!


> It doesn't display contexts of all the backends but only
> the contexts of specified process.
> I think it would be enough because I suppose this function
> is used after investigations using ps command or other OS
> level utilities.

It can be used as a building block if all are needed. Getting the
infrastructure right is the big thing here, I think. Adding more
detailed views on top of that data later is easier.



> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
> index a2d61302f9..88fb837ecd 100644
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
>  REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
>  
>  CREATE VIEW pg_backend_memory_contexts AS
> -    SELECT * FROM pg_get_backend_memory_contexts();
> +    SELECT * FROM pg_get_backend_memory_contexts(-1);

-1 is odd. Why not use NULL or even 0?

> + else
> + {
> + int rc;
> + int parent_len = strlen(parent);
> + int name_len = strlen(name);
> +
> + /*
> + * write out the current memory context information.
> + * Since some elements of values are reusable, we write it out.

Not sure what the second comment line here is supposed to mean?


> + */
> + fputc('D', fpout);
> + rc = fwrite(values, sizeof(values), 1, fpout);
> + rc = fwrite(nulls, sizeof(nulls), 1, fpout);
> +
> + /* write out information which is not resuable from serialized values */

s/resuable/reusable/


> + rc = fwrite(&name_len, sizeof(int), 1, fpout);
> + rc = fwrite(name, name_len, 1, fpout);
> + rc = fwrite(&idlen, sizeof(int), 1, fpout);
> + rc = fwrite(clipped_ident, idlen, 1, fpout);
> + rc = fwrite(&level, sizeof(int), 1, fpout);
> + rc = fwrite(&parent_len, sizeof(int), 1, fpout);
> + rc = fwrite(parent, parent_len, 1, fpout);
> + (void) rc; /* we'll check for error with ferror */
> +
> + }

This format is not descriptive. How about serializing to json or
something? Or at least having field names?

Alternatively, build the same tuple we build for the SRF, and serialize
that. Then there's basically no conversion needed.
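
For illustration, the "serialize the same tuple" idea could look roughly
like this, assuming values[]/nulls[] are already filled exactly as for the
tuplestore path and fpout is the dump file (a sketch, not the patch's code):

/* Build the same tuple as for the SRF and write its raw bytes. */
HeapTuple   tup = heap_form_tuple(tupdesc, values, nulls);

fwrite(&tup->t_len, sizeof(uint32), 1, fpout);
fwrite(tup->t_data, tup->t_len, 1, fpout);
heap_freetuple(tup);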


> @@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
>  Datum
>  pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>  {
> + int pid =  PG_GETARG_INT32(0);
> +
>   ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
>   TupleDesc tupdesc;
>   Tuplestorestate *tupstore;
> @@ -147,11 +189,258 @@ pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>  
>   MemoryContextSwitchTo(oldcontext);
>  
> - PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> - TopMemoryContext, NULL, 0);
> + if (pid == -1)
> + {
> + /*
> + * Since pid -1 indicates target is the local process, simply
> + * traverse memory contexts.
> + */
> + PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> + TopMemoryContext, "", 0, NULL);
> + }
> + else
> + {
> + /*
> + * Send signal for dumping memory contexts to the target process,
> + * and read the dumped file.
> + */
> + FILE   *fpin;
> + char dumpfile[MAXPGPATH];
> +
> + SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
> +
> + snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid);
> +
> + while (true)
> + {
> + CHECK_FOR_INTERRUPTS();
> +
> + pg_usleep(10000L);
> +

Need better signalling back/forth here.



> +/*
> + * dump_memory_contexts
> + * Dumping local memory contexts to a file.
> + * This function does not delete the file as it is intended to be read by
> + * another process.
> + */
> +static void
> +dump_memory_contexts(void)
> +{
> + FILE   *fpout;
> + char tmpfile[MAXPGPATH];
> + char dumpfile[MAXPGPATH];
> +
> + snprintf(tmpfile, sizeof(tmpfile), "pg_memusage/%d.tmp", MyProcPid);
> + snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", MyProcPid);
> +
> + /*
> + * Open a temp file to dump the current memory context.
> + */
> + fpout = AllocateFile(tmpfile, PG_BINARY_W);
> + if (fpout == NULL)
> + {
> + ereport(LOG,
> + (errcode_for_file_access(),
> + errmsg("could not write temporary memory context file \"%s\": %m",
> + tmpfile)));
> + return;
> + }

Probably should be opened with O_CREAT | O_TRUNC?


Greetings,

Andres Freund



Re: Get memory contexts of an arbitrary backend process

torikoshia
In reply to this post by Pavel Stehule
On 2020-09-01 03:29, Pavel Stehule wrote:

> Hi
>
> On Mon, Aug 31, 2020 at 17:03, Kasahara Tatsuhito
> <[hidden email]> wrote:
>
>> Hi,
>>
>> On Mon, Aug 31, 2020 at 8:22 PM torikoshia
>> <[hidden email]> wrote:
>>> As discussed in the thread[1], it'll be useful to make it
>>> possible to get the memory contexts of an arbitrary backend
>>> process.
>> +1
>>
>>> Attached PoC patch makes pg_get_backend_memory_contexts()
>>> display memory contexts of the specified PID of the process.
>> Thanks, it's a very good patch for discussion.
>>
>>> It doesn't display contexts of all the backends but only
>>> the contexts of specified process.
>> or we can  "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
>> pg_stat_activity WHERE ...",
>> so I don't think it's a big deal.
>>
>>> The rough idea of implementation is like below:
>>>
>>> 1. send  a signal to the specified process
>>> 2. signaled process dumps its memory contexts to a file
>>> 3. read the dumped file and display it to the user
>> I agree with the overview of the idea.
>> Here are some comments and questions.

Thanks for the comments!

>>
>> - Currently, "the signal transmission for dumping memory
>> information"
>> and "the read & output of dump information "
>> are on the same interface, but I think it would be better to
>> separate them.
>> How about providing the following three types of functions for
>> users?
>> - send a signal to specified pid
>> - check the status of the signal sent and received
>> - read the dumped information

Is this for future extensibility, to make it possible to get
other information like the current execution plan, which was
suggested by Pavel?

If so, I agree with considering extensibility, but I'm not
sure whether it's necessary to provide these types of
functions to 'users'.

>> - How about managing the status of signal send/receive and dump
>> operations on a shared hash or others ?
>> Sending and receiving signals, dumping memory information, and
>> referencing dump information all work asynchronously.
>> Therefore, it would be good to have management information to
>> check
>> the status of each process.
>> A simple idea is that ..
>> - send a signal to dump to a PID, it first record following
>> information into the shared hash.
>> pid (specified pid)
>> loc (dump location, currently might be ASAP)
>> recv (did the pid process receive a signal? first false)
>> dumped (did the pid process dump a mem information?  first
>> false)
>> - specified process receive the signal, update the status in the
>> shared hash, then dumped at specified location.
>> - specified process finish dump mem information,  update the
>> status
>> in the shared hash.

Adding management information on shared memory seems necessary
when we want to have more control over dumping, like the 'dump
location' or any other information such as the 'current execution
plan'.
I'm going to consider this.


>> - Does it allow one process to output multiple dump files?
>> It appears to be a specification to overwrite at present, but I
>> thought it would be good to be able to generate
>> multiple dump files in different phases (e.g., planning phase and
>> execution phase) in the future.
>> - How is the dump file cleaned up?
>
>  For a very long time there has been similar discussion about taking
> session query and session execution plans from other sessions.
>
> I am not sure how necessary information is in the memory dump, but I
> am sure so taking the current execution plan and complete text of the
> current query is pretty necessary information.
>
> but can be great so this infrastructure can be used for any debugging
> purpose.

Thanks!
It would be good if some part of this effort could serve as
infrastructure for other debugging.
It may be hard, but I will keep your comment in mind.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

>
> Regards
>
> Pavel
>
>> Best regards,
>>
>> --
>> Tatsuhito Kasahara
>> kasahara.tatsuhito _at_ gmail.com
>
>



Re: Get memory contexts of an arbitrary backend process

torikoshia
In reply to this post by Andres Freund
Thanks for reviewing!

I'm going to modify the patch according to your comments.

On 2020-09-01 10:54, Andres Freund wrote:

> Hi,
>
> On 2020-08-31 20:22:18 +0900, torikoshia wrote:
>> After commit 3e98c0bafb28de, we can display the usage of the
>> memory contexts using pg_backend_memory_contexts system
>> view.
>>
>> However, its target is limited to the  process attached to
>> the current session.
>>
>> As discussed in the thread[1], it'll be useful to make it
>> possible to get the memory contexts of an arbitrary backend
>> process.
>>
>> Attached PoC patch makes pg_get_backend_memory_contexts()
>> display memory contexts of the specified PID of the process.
>
> Awesome!
>
>
>> It doesn't display contexts of all the backends but only
>> the contexts of specified process.
>> I think it would be enough because I suppose this function
>> is used after investigations using ps command or other OS
>> level utilities.
>
> It can be used as a building block if all are needed. Getting the
> infrastructure right is the big thing here, I think. Adding more
> detailed views on top of that data later is easier.
>
>
>
>> diff --git a/src/backend/catalog/system_views.sql
>> b/src/backend/catalog/system_views.sql
>> index a2d61302f9..88fb837ecd 100644
>> --- a/src/backend/catalog/system_views.sql
>> +++ b/src/backend/catalog/system_views.sql
>> @@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
>>  REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
>>
>>  CREATE VIEW pg_backend_memory_contexts AS
>> -    SELECT * FROM pg_get_backend_memory_contexts();
>> +    SELECT * FROM pg_get_backend_memory_contexts(-1);
>
> -1 is odd. Why not use NULL or even 0?
>
>> + else
>> + {
>> + int rc;
>> + int parent_len = strlen(parent);
>> + int name_len = strlen(name);
>> +
>> + /*
>> + * write out the current memory context information.
>> + * Since some elements of values are reusable, we write it out.
>
> Not sure what the second comment line here is supposed to mean?
>
>
>> + */
>> + fputc('D', fpout);
>> + rc = fwrite(values, sizeof(values), 1, fpout);
>> + rc = fwrite(nulls, sizeof(nulls), 1, fpout);
>> +
>> + /* write out information which is not resuable from serialized
>> values */
>
> s/resuable/reusable/
>
>
>> + rc = fwrite(&name_len, sizeof(int), 1, fpout);
>> + rc = fwrite(name, name_len, 1, fpout);
>> + rc = fwrite(&idlen, sizeof(int), 1, fpout);
>> + rc = fwrite(clipped_ident, idlen, 1, fpout);
>> + rc = fwrite(&level, sizeof(int), 1, fpout);
>> + rc = fwrite(&parent_len, sizeof(int), 1, fpout);
>> + rc = fwrite(parent, parent_len, 1, fpout);
>> + (void) rc; /* we'll check for error with ferror */
>> +
>> + }
>
> This format is not descriptive. How about serializing to json or
> something? Or at least having field names?
>
> Alternatively, build the same tuple we build for the SRF, and serialize
> that. Then there's basically no conversion needed.
>
>
>> @@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate
>> *tupstore,
>>  Datum
>>  pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>>  {
>> + int pid =  PG_GETARG_INT32(0);
>> +
>>   ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
>>   TupleDesc tupdesc;
>>   Tuplestorestate *tupstore;
>> @@ -147,11 +189,258 @@
>> pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>>
>>   MemoryContextSwitchTo(oldcontext);
>>
>> - PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>> - TopMemoryContext, NULL, 0);
>> + if (pid == -1)
>> + {
>> + /*
>> + * Since pid -1 indicates target is the local process, simply
>> + * traverse memory contexts.
>> + */
>> + PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
>> + TopMemoryContext, "", 0, NULL);
>> + }
>> + else
>> + {
>> + /*
>> + * Send signal for dumping memory contexts to the target process,
>> + * and read the dumped file.
>> + */
>> + FILE   *fpin;
>> + char dumpfile[MAXPGPATH];
>> +
>> + SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
>> +
>> + snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid);
>> +
>> + while (true)
>> + {
>> + CHECK_FOR_INTERRUPTS();
>> +
>> + pg_usleep(10000L);
>> +
>
> Need better signalling back/forth here.

Do you mean I should also send another signal from the dumping
process back to the caller of pg_get_backend_memory_contexts()
when it finishes dumping?

Regards,



--
Atsushi Torikoshi
NTT DATA CORPORATION

>
>
>
>> +/*
>> + * dump_memory_contexts
>> + * Dumping local memory contexts to a file.
>> + * This function does not delete the file as it is intended to be
>> read by
>> + * another process.
>> + */
>> +static void
>> +dump_memory_contexts(void)
>> +{
>> + FILE   *fpout;
>> + char tmpfile[MAXPGPATH];
>> + char dumpfile[MAXPGPATH];
>> +
>> + snprintf(tmpfile, sizeof(tmpfile), "pg_memusage/%d.tmp", MyProcPid);
>> + snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", MyProcPid);
>> +
>> + /*
>> + * Open a temp file to dump the current memory context.
>> + */
>> + fpout = AllocateFile(tmpfile, PG_BINARY_W);
>> + if (fpout == NULL)
>> + {
>> + ereport(LOG,
>> + (errcode_for_file_access(),
>> + errmsg("could not write temporary memory context file \"%s\":
>> %m",
>> + tmpfile)));
>> + return;
>> + }
>
> Probably should be opened with O_CREAT | O_TRUNC?
>
>
> Greetings,
>
> Andres Freund



Re: Get memory contexts of an arbitrary backend process

Kasahara Tatsuhito
In reply to this post by torikoshia
Hi,

On Thu, Sep 3, 2020 at 3:38 PM torikoshia <[hidden email]> wrote:

> >> - Currently, "the signal transmission for dumping memory
> >> information"
> >> and "the read & output of dump information "
> >> are on the same interface, but I think it would be better to
> >> separate them.
> >> How about providing the following three types of functions for
> >> users?
> >> - send a signal to specified pid
> >> - check the status of the signal sent and received
> >> - read the dumped information
>
> Is this for future extensibility to make it possible to get
> other information like the current execution plan which was
> suggested by Pavel?
Yes, but it's not only for future expansion, but also for the
usability and the stability of this feature.
For example, if you want to read one dumped file multiple times and
analyze it, you will want the ability to just read the dump.
Moreover, when it takes a long time from receiving the signal to
outputting the dump, or the dump output itself takes a long time,
users can investigate where the time is spent if each step is
separated.

> If so, I agree with considering extensibility, but I'm not
> sure it's necessary whether providing these types of
> functions for 'users'.
Of course, it is possible and may be necessary to provide a wrapped
sequence of steps, from sending a signal to reading the dump files.
But IMO, some users would like to perform the signal transmission,
state management and dump file reading steps separately.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

Tom Lane
Kasahara Tatsuhito <[hidden email]> writes:
> Yes, but it's not only for future expansion, but also for the
> usability and the stability of this feature.
> For example, if you want to read one dumped file multiple times and analyze it,
> you will want the ability to just read the dump.

If we design it to make that possible, how are we going to prevent disk
space leaks from never-cleaned-up dump files?

                        regards, tom lane



Re: Get memory contexts of an arbitrary backend process

Kasahara Tatsuhito
On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <[hidden email]> wrote:
> Kasahara Tatsuhito <[hidden email]> writes:
> > Yes, but it's not only for future expansion, but also for the
> > usability and the stability of this feature.
> > For example, if you want to read one dumped file multiple times and analyze it,
> > you will want the ability to just read the dump.
>
> If we design it to make that possible, how are we going to prevent disk
> space leaks from never-cleaned-up dump files?
In my thought, along with features such as a view that allows us to
see a list of dumped files, it would be better to have a function
that simply deletes the dump files associated with a specific PID,
or deletes all dump files.
Some files may be dumped with unexpected delays, so I think a
cleanup feature will be necessary.
(Also, as with pgsql_tmp files, it might be better to delete dump
files when PostgreSQL starts.)

Or should we try to delete the dump file as soon as we can read it?

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

Tomas Vondra
On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:

>On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <[hidden email]> wrote:
>> Kasahara Tatsuhito <[hidden email]> writes:
>> > Yes, but it's not only for future expansion, but also for the
>> > usability and the stability of this feature.
>> > For example, if you want to read one dumped file multiple times and analyze it,
>> > you will want the ability to just read the dump.
>>
>> If we design it to make that possible, how are we going to prevent disk
>> space leaks from never-cleaned-up dump files?
>In my thought, with features such as a view that allows us to see a
>list of dumped files,
>it would be better to have a function that simply deletes the dump
>files associated with a specific PID,
>or to delete all dump files.
>Some files may be dumped with unexpected delays, so I think the
>cleaning feature will be necessary.
>( Also, as the pgsql_tmp file, it might better to delete dump files
>when PostgreSQL start.)
>
>Or should we try to delete the dump file as soon as we can read it?
>

IMO making the cleanup a responsibility of the users (e.g. by exposing
the list of dumped files through a view and expecting users to delete
them in some way) is rather fragile.

I don't quite see what's the point of designing it this way. It was
suggested this improves stability and usability of this feature, but
surely making it unnecessarily complex contradicts both points?

IMHO if the user needs to process the dump repeatedly, what's preventing
him/her from storing it in a file, or something like that? At that point
it's clear it's up to them to remove the file. So I suggest to keep the
feature as simple as possible - hand the dump over and delete.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Get memory contexts of an arbitrary backend process

torikoshia
On 2020-09-04 21:46, Tomas Vondra wrote:

> On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:
>> On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <[hidden email]> wrote:
>>> Kasahara Tatsuhito <[hidden email]> writes:
>>> > Yes, but it's not only for future expansion, but also for the
>>> > usability and the stability of this feature.
>>> > For example, if you want to read one dumped file multiple times and analyze it,
>>> > you will want the ability to just read the dump.
>>>
>>> If we design it to make that possible, how are we going to prevent
>>> disk
>>> space leaks from never-cleaned-up dump files?
>> In my thought, with features such as a view that allows us to see a
>> list of dumped files,
>> it would be better to have a function that simply deletes the dump
>> files associated with a specific PID,
>> or to delete all dump files.
>> Some files may be dumped with unexpected delays, so I think the
>> cleaning feature will be necessary.
>> ( Also, as the pgsql_tmp file, it might better to delete dump files
>> when PostgreSQL start.)
>>
>> Or should we try to delete the dump file as soon as we can read it?
>>
>
> IMO making the cleanup a responsibility of the users (e.g. by exposing
> the list of dumped files through a view and expecting users to delete
> them in some way) is rather fragile.
>
> I don't quite see what's the point of designing it this way. It was
> suggested this improves stability and usability of this feature, but
> surely making it unnecessarily complex contradicts both points?
>
> IMHO if the user needs to process the dump repeatedly, what's
> preventing
> him/her from storing it in a file, or something like that? At that
> point
> it's clear it's up to them to remove the file. So I suggest to keep the
> feature as simple as possible - hand the dump over and delete.

+1.
If there are no other objections, I'm going to accept this
suggestion.

Regards



Re: Get memory contexts of an arbitrary backend process

Kasahara Tatsuhito
Hi,

On Thu, Sep 10, 2020 at 8:53 PM torikoshia <[hidden email]> wrote:

>
> On 2020-09-04 21:46, Tomas Vondra wrote:
> > On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:
> >> On Fri, Sep 4, 2020 at 2:40 AM Tom Lane <[hidden email]> wrote:
> >>> Kasahara Tatsuhito <[hidden email]> writes:
> >>> > Yes, but it's not only for future expansion, but also for the
> >>> > usability and the stability of this feature.
> >>> > For example, if you want to read one dumped file multiple times and analyze it,
> >>> > you will want the ability to just read the dump.
> >>>
> >>> If we design it to make that possible, how are we going to prevent
> >>> disk
> >>> space leaks from never-cleaned-up dump files?
> >> In my thought, with features such as a view that allows us to see a
> >> list of dumped files,
> >> it would be better to have a function that simply deletes the dump
> >> files associated with a specific PID,
> >> or to delete all dump files.
> >> Some files may be dumped with unexpected delays, so I think the
> >> cleaning feature will be necessary.
> >> ( Also, as the pgsql_tmp file, it might better to delete dump files
> >> when PostgreSQL start.)
> >>
> >> Or should we try to delete the dump file as soon as we can read it?
> >>
> >
> > IMO making the cleanup a responsibility of the users (e.g. by exposing
> > the list of dumped files through a view and expecting users to delete
> > them in some way) is rather fragile.
> >
> > I don't quite see what's the point of designing it this way. It was
> > suggested this improves stability and usability of this feature, but
> > surely making it unnecessarily complex contradicts both points?
> >
> > IMHO if the user needs to process the dump repeatedly, what's
> > preventing
> > him/her from storing it in a file, or something like that? At that
> > point
> > it's clear it's up to them to remove the file. So I suggest to keep the
> > feature as simple as possible - hand the dump over and delete.
Yeah, it might be better to avoid making the user responsible for removal.

I think it's fine to have an interface to delete in an emergency, but
I agree that users shouldn't be made aware of the existence or
deletion of dump files, basically.

> +1.
> If there are no other objections, I'm going to accept this
> suggestion.
So +1

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

Michael Paquier
On Thu, Sep 10, 2020 at 09:11:21PM +0900, Kasahara Tatsuhito wrote:
> I think it's fine to have an interface to delete in an emergency, but
> I agree that
> users shouldn't be made aware of the existence or deletion of dump
> files, basically.

Per the CF bot, the number of tests needs to be tweaked, because we
test each entry filtered out with is_deeply(), meaning that the number
of tests needs to be updated to reflect that if the filtered list is
changed:
t/010_pg_basebackup.pl ... 104/109 # Looks like you planned 109 tests but ran 110.
t/010_pg_basebackup.pl ... Dubious, test returned 255 (wstat 65280, 0xff00)
All 109 subtests passed

Simple enough to fix.
--
Michael


Re: Get memory contexts of an arbitrary backend process

torikoshia
Hi,

Thanks for all your comments, I updated the patch.


On Tue, Sep 1, 2020 at 12:03 AM Kasahara Tatsuhito
<[hidden email]> wrote:

> - How about managing the status of signal send/receive and dump
> operations on a shared hash or others ?
> Sending and receiving signals, dumping memory information, and
> referencing dump information all work asynchronously.
> Therefore, it would be good to have management information to
> check
> the status of each process.
> A simple idea is that ..
> - send a signal to dump to a PID, it first record following
> information into the shared hash.
> pid (specified pid)
> loc (dump location, currently might be ASAP)
> recv (did the pid process receive a signal? first false)
> dumped (did the pid process dump a mem information?  first
> false)
> - specified process receive the signal, update the status in the
> shared hash, then dumped at specified location.
> - specified process finish dump mem information,  update the
> status
> in the shared hash.
I added a shared hash table consisting of minimal members,
mainly for managing whether the file is dumped or not.
Some members like 'loc' seem useful in the future, but I
haven't added them since they're not essential at this point.



On 2020-09-01 10:54, Andres Freund wrote:

>>  CREATE VIEW pg_backend_memory_contexts AS
>> -    SELECT * FROM pg_get_backend_memory_contexts();
>> +    SELECT * FROM pg_get_backend_memory_contexts(-1);
>
> -1 is odd. Why not use NULL or even 0?

Changed it from -1 to NULL.

>> +             rc = fwrite(&name_len, sizeof(int), 1, fpout);
>> +             rc = fwrite(name, name_len, 1, fpout);
>> +             rc = fwrite(&idlen, sizeof(int), 1, fpout);
>> +             rc = fwrite(clipped_ident, idlen, 1, fpout);
>> +             rc = fwrite(&level, sizeof(int), 1, fpout);
>> +             rc = fwrite(&parent_len, sizeof(int), 1, fpout);
>> +             rc = fwrite(parent, parent_len, 1, fpout);
>> +             (void) rc;                              /* we'll check
>> for error with ferror */
>> +
>> +     }
> This format is not descriptive. How about serializing to json or
> something? Or at least having field names?
Added field names for each value.
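
For example, a self-describing record could be written with a helper along
these lines (just an illustration of the idea; the helper and the exact
layout are assumptions, not the code in the attached patch):

/* Write a length-prefixed field name followed by a length-prefixed value. */
static void
write_field(FILE *fpout, const char *label, const void *data, int len)
{
    int     label_len = strlen(label);

    fwrite(&label_len, sizeof(int), 1, fpout);
    fwrite(label, label_len, 1, fpout);
    fwrite(&len, sizeof(int), 1, fpout);
    fwrite(data, len, 1, fpout);
}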

>> +             while (true)
>> +             {
>> +                     CHECK_FOR_INTERRUPTS();
>> +
>> +                     pg_usleep(10000L);
>> +
>
> Need better signalling back/forth here.

Added a shared hash table that has a flag for managing whether the file
is dumped or not.
I modified it to use this flag.

>> +     /*
>> +      * Open a temp file to dump the current memory context.
>> +      */
>> +     fpout = AllocateFile(tmpfile, PG_BINARY_W);
>> +     if (fpout == NULL)
>> +     {
>> +             ereport(LOG,
>> +                             (errcode_for_file_access(),
>> +                              errmsg("could not write temporary
>> memory context file \"%s\": %m",
>> +                                             tmpfile)));
>> +             return;
>> +     }
>
> Probably should be opened with O_CREAT | O_TRUNC?
AllocateFile() calls fopen() and AFAIU fopen() with mode "w" corresponds
to open() with "O_WRONLY | O_CREAT | O_TRUNC".

Do you mean I should not use fopen() here?

On 2020-09-24 13:01, Michael Paquier wrote:

> On Thu, Sep 10, 2020 at 09:11:21PM +0900, Kasahara Tatsuhito wrote:
>> I think it's fine to have an interface to delete in an emergency, but
>> I agree that
>> users shouldn't be made aware of the existence or deletion of dump
>> files, basically.
>
> Per the CF bot, the number of tests needs to be tweaked, because we
> test each entry filtered out with is_deeply(), meaning that the number
> of tests needs to be updated to reflect that if the filtered list is
> changed:
> t/010_pg_basebackup.pl ... 104/109 # Looks like you planned 109 tests
> but ran 110.
> t/010_pg_basebackup.pl ... Dubious, test returned 255 (wstat 65280,
> 0xff00)
> All 109 subtests passed
>
> Simple enough to fix.
Incremented the number of tests.


Any thoughts?

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment: 0002-Enabled-pg_get_backend_memory_contexts-to-collect-ar.patch (33K)

Re: Get memory contexts of an arbitrary backend process

Kasahara Tatsuhito
Hi,

On Fri, Sep 25, 2020 at 4:28 PM torikoshia <[hidden email]> wrote:
> Thanks for all your comments, I updated the patch.
Thanks for updating the patch.
I did a brief test and code review.

> I added a shared hash table consisted of minimal members
> mainly for managing whether the file is dumped or not.
> Some members like 'loc' seem useful in the future, but I
> haven't added them since it's not essential at this point.
Yes, that would be good.

+        /*
+         * Since we allow only one session can request to dump memory
+         * context at the same time, check whether the dump files already
+         * exist.
+         */
+        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile, &stat_tmp) == 0)
+        {
+            pg_usleep(1000000L);
+        }

If pg_get_backend_memory_contexts() is executed by two or more
sessions at the same time, it cannot be run exclusively in this way.
Currently it seems to cause a crash when doing so.
This is easy to reproduce and can be done as follows.

[session-1]
BEGIN;
LOCK TABLE t1;
  [Session-2]
  BEGIN;
  LOCK TABLE t1; <- waiting
    [Session-3]
    select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
      [Session-4]
      select * FROM pg_get_backend_memory_contexts(<pid of session-2>);

If you issue commit or abort at session-1, you will get SEGV.

Instead of checking for the existence of the file, it might be better
to use a hash (mcxtdumpHash) entry with LWLock.
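
The exclusive check could then look roughly like this, reusing the patch's
mcxtdumpHash plus a new LWLock (a sketch only; the lock name, the entry type
and the error behaviour are assumptions):

bool        found;
McxtDumpEntry *entry;

LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
entry = (McxtDumpEntry *) hash_search(mcxtdumpHash, &dst_pid,
                                      HASH_ENTER, &found);
if (found)
{
    /* another session is already dumping this PID */
    LWLockRelease(McxtDumpHashLock);
    ereport(ERROR,
            (errmsg("a dump request for PID %d is already in progress",
                    dst_pid)));
}
entry->is_dumped = false;
LWLockRelease(McxtDumpHashLock);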

+        if (proc == NULL)
+        {
+            ereport(WARNING,
+                    (errmsg("PID %d is not a PostgreSQL server process",
+                            dst_pid)));
+            return (Datum) 1;
+        }

Shouldn't it clear the hash entry before return?

+        /* Wait until target process finished dumping file. */
+        while (!entry->is_dumped)
+        {
+            CHECK_FOR_INTERRUPTS();
+            pg_usleep(10000L);
+        }

If the target is killed and exits before dumping the memory
information, you're in an infinite loop here.
So how about making sure that any process that has to stop before
doing a memory dump changes the status of the hash entry
(entry->is_dumped) before stopping, so the caller can detect it?
I'm not sure whether it's the best way or not, but you might want to
use something like the on_shmem_exit callback.
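
One possible shape for that callback (a sketch; the function name and the use
of the hash entry are assumptions, and whether on_shmem_exit or
before_shmem_exit fits better is discussed in the follow-up):

/* Mark the entry as finished so a waiting requestor doesn't spin forever. */
static void
mcxtdump_shmem_exit(int code, Datum arg)
{
    McxtDumpEntry *entry;

    LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
    entry = (McxtDumpEntry *) hash_search(mcxtdumpHash, &MyProcPid,
                                          HASH_FIND, NULL);
    if (entry != NULL)
        entry->is_dumped = true;    /* or a dedicated "failed" state */
    LWLockRelease(McxtDumpHashLock);
}

/* The target backend would register it once, for example: */
static void
register_mcxtdump_exit_callback(void)
{
    on_shmem_exit(mcxtdump_shmem_exit, (Datum) 0);
}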

In the current design, if the caller stops processing before reading
the dumped file, you will have an orphaned file.
It looks like the following.

[session-1]
BEGIN;
LOCK TABLE t1;
  [Session-2]
  BEGIN;
  LOCK TABLE t1; <- waiting
    [Session-3]
    select * FROM pg_get_backend_memory_contexts(<pid of session-2>);

If you cancel or terminate the session-3, then issue commit or abort
at session-1, you will get orphaned files in pg_memusage.

So if you allow only one session to request a dump file, it could
call pg_memusage_reset() before sending the signal in this function.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com



Re: Get memory contexts of an arbitrary backend process

torikoshia
> On Thu, Oct 1, 2020 at 4:06 PM Kasahara Tatsuhito
> <[hidden email]> wrote:
> Hi,
>
> On Fri, Sep 25, 2020 at 4:28 PM torikoshia <[hidden email]>
> wrote:
> > Thanks for all your comments, I updated the patch.
> Thanks for updating the patch.
> I did a brief test and code review.

Thanks for your tests and review!

> > I added a shared hash table consisted of minimal members
> > mainly for managing whether the file is dumped or not.
> > Some members like 'loc' seem useful in the future, but I
> > haven't added them since it's not essential at this point.
> Yes, that would be good.
>
> +        /*
> +         * Since we allow only one session can request to dump
> memory context at
> +         * the same time, check whether the dump files already exist.
> +         */
> +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile,
> &stat_tmp) == 0)
> +        {
> +            pg_usleep(1000000L);
> +        }
>
> If pg_get_backend_memory_contexts() is executed by two or more
> sessions at the same time, it cannot be run exclusively in this way.
> Currently it seems to cause a crash when do it so.
> This is easy to reproduce and can be done as follows.
>
> [session-1]
> BEGIN;
> LOCK TABKE t1;
>   [Session-2]
>   BEGIN;
>   LOCK TABLE t1; <- waiting
>     [Session-3]
>     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>       [Session-4]
>       select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>
> If you issue commit or abort at session-1, you will get SEGV.
>
> Instead of checking for the existence of the file, it might be better
> to use a hash (mcxtdumpHash) entry with LWLock.
Thanks!
Added a LWLock and changed the way from checking the file existence
to finding the hash entry.

> +        if (proc == NULL)
> +        {
> +            ereport(WARNING,
> +                    (errmsg("PID %d is not a PostgreSQL server
> process", dst_pid)));
> +            return (Datum) 1;
> +        }
>
> Shouldn't it clear the hash entry before return?

Yeah. added codes for removing the entry.

>
> +        /* Wait until target process finished dumping file. */
> +        while (!entry->is_dumped)
> +        {
> +            CHECK_FOR_INTERRUPTS();
> +            pg_usleep(10000L);
> +        }
>
> If the target is killed and exit before dumping the memory
> information, you're in an infinite loop here.
> So how about making sure that any process that has to stop before
> doing a memory dump changes the status of the hash (entry->is_dumped)
> before stopping and the caller detects it?
> I'm not sure it's best or not, but you might want to use something
> like the on_shmem_exit callback.
Thanks for your idea!
Added a callback to change the status of the hash table entry.

Although I think it's necessary to remove this callback when it
finishes processing the memory dump, on_shmem_exit() does not seem
to have such a function.
I used before_shmem_exit() since it has a corresponding function to
remove a registered callback.
If that's inappropriate, I'm going to add a function that removes a
registered on_shmem_exit() callback.
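
The register/unregister pairing described here would look roughly like this
(a sketch; the callback name is illustrative):

/* around the dumping (or waiting/reading) code path */
before_shmem_exit(McxtDumpShmemExit, (Datum) 0);

/* ... dump the memory contexts, or wait for and read the dump ... */

cancel_before_shmem_exit(McxtDumpShmemExit, (Datum) 0);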

> In the current design, if the caller stops processing before reading
> the dumped file, you will have an orphaned file.
> It looks like the following.
>
> [session-1]
> BEGIN;
> LOCK TABKE t1;
>   [Session-2]
>   BEGIN;
>   LOCK TABLE t1; <- waiting
>     [Session-3]
>     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>
> If you cancel or terminate the session-3, then issue commit or abort
> at session-1, you will get orphaned files in pg_memusage.
>
> So if you allow only one session can request to dump file, it could
> call pg_memusage_reset() before send the signal in this function.
Although I'm going to allow only one session per target process,
I'd like to allow running multiple pg_get_backend_memory_contexts()
calls whose target processes are different.

Instead of calling pg_memusage_reset(), I added a callback for
cleaning up orphaned files and the elements of the hash table
using before_shmem_exit() through PG_ENSURE_ERROR_CLEANUP() and
PG_END_ENSURE_ERROR_CLEANUP().

I chose PG_ENSURE_ERROR_CLEANUP() and PG_END_ENSURE_ERROR_CLEANUP()
here since it can handle not only termination but also cancellation.

Any thoughts?


--
Atsushi Torikoshi

Attachment: 0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch (39K)

Re: Get memory contexts of an arbitrary backend process

Kyotaro Horiguchi
Wait...

> Attachments: 0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch

For a moment I thought that the number was the patch number, but the
predecessors are 0002-Enabled..collect.patch and 0001-(same
name). It's not mandatory, but we usually do as follows, and it's
the way of git.

v1-0001-Enabled...collect.patch
v2-0001-Enabled...collect.patch

The vN is added by the -v option of git format-patch.

At Thu, 22 Oct 2020 21:32:00 +0900, torikoshia <[hidden email]> wrote in

> > > I added a shared hash table consisted of minimal members
> > > mainly for managing whether the file is dumped or not.
> > > Some members like 'loc' seem useful in the future, but I
> > > haven't added them since it's not essential at this point.
> > Yes, that would be good.
> > +        /*
> > +         * Since we allow only one session can request to dump
> > memory context at
> > +         * the same time, check whether the dump files already exist.
> > +         */
> > +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile,
> > &stat_tmp) == 0)
> > +        {
> > +            pg_usleep(1000000L);
> > +        }
> > If pg_get_backend_memory_contexts() is executed by two or more
> > sessions at the same time, it cannot be run exclusively in this way.
> > Currently it seems to cause a crash when do it so.
> > This is easy to reproduce and can be done as follows.
> > [session-1]
> > BEGIN;
> > LOCK TABKE t1;
> >   [Session-2]
> >   BEGIN;
> >   LOCK TABLE t1; <- waiting
> >     [Session-3]
> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> >       [Session-4]
> >       select * FROM pg_get_backend_memory_contexts(<pid of
> > session-2>);
> > If you issue commit or abort at session-1, you will get SEGV.
> > Instead of checking for the existence of the file, it might be better
> > to use a hash (mcxtdumpHash) entry with LWLock.
>
> Thanks!
> Added a LWLock and changed the way from checking the file existence
> to finding the hash entry.

> > +        if (proc == NULL)
> > +        {
> > +            ereport(WARNING,
> > +                    (errmsg("PID %d is not a PostgreSQL server
> > process", dst_pid)));
> > +            return (Datum) 1;
> > +        }
> > Shouldn't it clear the hash entry before return?
>
> Yeah. added codes for removing the entry.

+ entry = AddEntryToMcxtdumpHash(dst_pid);
+
+ /* Check whether the target process is PostgreSQL backend process. */
+ /* TODO: Check also whether backend or not. */
+ proc = BackendPidGetProc(dst_pid);
+
+ if (proc == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
+
+ LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
+
+ if (hash_search(mcxtdumpHash, &dst_pid, HASH_REMOVE, NULL) == NULL)
+ elog(WARNING, "hash table corrupted");
+
+ LWLockRelease(McxtDumpHashLock);
+
+ return (Datum) 1;
+ }

Why do you enter a useless entry and then remove it immediately?

+ PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));
+ {
+ SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);

"PROCSIG_DUMP_MEMORY" is somewhat misleading. Hwo about
"PROCSIG_DUMP_MEMCXT" or "PROCSIG_DUMP_MEMORY_CONTEXT"?

I thought that the hash table would prevent multiple requestors from
making a request at once, but the patch doesn't seem to do that.

+ /* Wait until target process finished dumping file. */
+ while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)

This needs an LWLock. And this could read the entry after it is
reused by another backend if the dumper process is gone. That isn't
likely to happen, but theoretically another backend may set it to
MCXTDUMPSTATUS_NOTYET in between two successive checks on the member.
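
A locked version of that wait loop could look like this (a sketch using the
patch's names; the lock/unlock granularity shown is just one option):

for (;;)
{
    int     status;

    LWLockAcquire(McxtDumpHashLock, LW_SHARED);
    status = entry->dump_status;
    LWLockRelease(McxtDumpHashLock);

    if (status != MCXTDUMPSTATUS_NOTYET)
        break;

    CHECK_FOR_INTERRUPTS();
    pg_usleep(10000L);
}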

+ /*
+ * Make dump file ends with 'D'.
+ * This is checked by the caller when reading the file.
+ */
+ fputc('E', fpout);

Which is right?

+ fputc('E', fpout);
+
+ CHECK_FOR_INTERRUPTS();

This means that the process accepts another request and rewrites the
file even while the first requestor is reading it. And the file can
be removed by the first requestor before the second requestor can
read it.

+ mcxtdumpHash = ShmemInitHash("mcxtdump hash",
+   SHMEM_MEMCONTEXT_SIZE,
+   SHMEM_MEMCONTEXT_SIZE,

The space needed for this hash doesn't seem to be secured. The hash is
sized to 64 entries, so pg_get_backend_memory_contexts() may fail with
ERROR "out of shared memory".

The hash is used only to check whether the dump file is completed and
whether it ended with an error. If we need only those, a shared byte
array with the length of max_backends is sufficient.

+ PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));
+ {
+ SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
+
+ /* Wait until target process finished dumping file. */
+ while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
+ {
+ CHECK_FOR_INTERRUPTS();
+ pg_usleep(10000L);
+ }
+ }
+ PG_END_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));
+
+ if (entry->dump_status == MCXTDUMPSTATUS_ERROR)
+ {
..
+ if (stat(tmpfile, &stat_tmp) == 0)
+ unlink(tmpfile);
+ if (stat(dumpfile, &stat_tmp) == 0)
+ unlink(dumpfile);
...
+ return (Datum) 0;
+ }
+
+ /* Read values from the dumped file and put them on tuplestore. */
+ PutDumpedValuesOnTuplestore(dumpfile, tupstore, tupdesc, dst_pid);

This means that if the function gets sigint before the dumper creates
the file, the dumper can leave a dump file?

> > +        /* Wait until target process finished dumping file. */
> > +        while (!entry->is_dumped)
> > +        {
> > +            CHECK_FOR_INTERRUPTS();
> > +            pg_usleep(10000L);
> > +        }
> > If the target is killed and exit before dumping the memory
> > information, you're in an infinite loop here.
> > So how about making sure that any process that has to stop before
> > doing a memory dump changes the status of the hash (entry->is_dumped)
> > before stopping and the caller detects it?
> > I'm not sure it's best or not, but you might want to use something
> > like the on_shmem_exit callback.
>
> Thanks for your idea!
> Added a callback to change the status of the hash table entry.
>
> Although I think it's necessary to remove this callback when it
> finished
> processing memory dumping, on_shmem_exit() does not seem to have such
> a function.
> I used before_shmem_exit() since it has a corresponding function to
> remove registered callback.
> If it's inappropriate, I'm going to add a function removing the
> registered callback of on_shmem_exit().

This seems to leave a file for the pid.

> > In the current design, if the caller stops processing before reading
> > the dumped file, you will have an orphaned file.
> > It looks like the following.
> > [session-1]
> > BEGIN;
> > LOCK TABKE t1;
> >   [Session-2]
> >   BEGIN;
> >   LOCK TABLE t1; <- waiting
> >     [Session-3]
> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> > If you cancel or terminate the session-3, then issue commit or abort
> > at session-1, you will get orphaned files in pg_memusage.
> > So if you allow only one session can request to dump file, it could
> > call pg_memusage_reset() before send the signal in this function.
>
> Although I'm going to allow only one session per one target process,
> I'd like to allow running multiple pg_get_backend_memory_contexts()
> which target process is different.
>
> Instead of calling pg_memusage_reset(), I added a callback for
> cleaning up orphaned files and the elements of the hash table
> using before_shmem_exit() through PG_ENSURE_ERROR_CLEANUP() and
> PG_END_ENSURE_ERROR_CLEANUP().
>
> I chose PG_ENSURE_ERROR_CLEANUP() and PG_END_ENSURE_ERROR_CLEANUP()
> here since it can handle not only termination but also cancellation.
>
> Any thoughts?

+/*
+ * pg_memusage_reset
+ * Remove the memory context dump files.
+ */
+void
+pg_memusage_reset(int pid)

The function name seems to represent something different from what it
does.


I think we might need to step back to the basic design of this feature,
since this patch seems to have unhandled corner cases that are
difficult to find.

- Dump file lifecycle or state-transition of the dumper

  Currently the lifecycle of a dump file, or the state transition of
  the dumper process, doesn't seem to be well defined.

  The file is created only by the dumper.
  If the requestor reads the completed file, the reader removes it.

  If the dumper receives a cancel request, the dumper removes it if
  any.

  Of course the dumper removes it if it fails to complete the file.

- Better way of signalling between the requestor and the dumper

  I think there's no doubt about the request signal.

  About the complete signal, currently the requestor polls on a flag
  in a hash entry.  I'm wondering if we can get rid of polling (a
  latch-based sketch follows this item).  The problem with doing that
  is the lack of a means for the dumper to know the requestor.  We
  need to store the requestor pid or backend id in the shared hash
  entry.

  By the way, about the shared hash entry, it uses about 70kB for only
  64 entries, so it seems less efficient than a shared array that has
  MaxBackends entries.  If we used the following array on shared
  memory,

  struct hoge
  {
    BackendId requestor[MAX_BACKENDS];
    int       status[MAX_BACKENDS];
    LWLock    lock;
  };

  This array has a size of 24 * MaxBackends + 16, about 24kB for 1000
  backends.  It could be on dsm since this feature is not commonly
  used.
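
  If the requestor's pid/backend id is stored as above, the polling could in
  principle be replaced by latch signalling; a rough sketch (not something in
  the patch: the requestor_pid field is illustrative, and locking around
  dump_status is omitted for brevity):

  /* Requestor: record itself, send the request, then sleep on its latch. */
  entry->requestor_pid = MyProcPid;
  SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);

  while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
  {
      (void) WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                       1000L, PG_WAIT_EXTENSION);
      ResetLatch(MyLatch);
      CHECK_FOR_INTERRUPTS();
  }

  /* Dumper: after writing the file, wake the waiting requestor. */
  {
      PGPROC     *reqproc = BackendPidGetProc(entry->requestor_pid);

      if (reqproc != NULL)
          SetLatch(&reqproc->procLatch);
  }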


- The way to cancel a request already made (or management of the dump
  state transition).

  Cancellation seems to contain some race conditions. But basically
  that could be done by sending a request signal after setting the
  hoge.requestor above to some special value, not needing a third
  type of signal.  The special value should be different from the
  initial state, which signals that the process is accepting a new
  request.

  As a whole, that would look like the following?

------------------------------------------------------------
Successful request.

  Requestor         dumper       state
                    [idle]       initial
  [request] -------------------> requestor pid/backendid
           -signal->
                    [dumping]
           <-signal-[done]
  [read]
  [done]   --------------------> initial

------------------------------------------------------------
On failure, the dumper signals with setting state to initial.
  [request] -------------------> requestor pid/backendid
           -signal->
                    [dumping]
                                        [failed]     initial
           <-signal-
     (some other requestor might come meantime.)
  <sees that the requestor is not me>
  [failed]

------------------------------------------------------------
If the requestor wants to cancel the request, it sets the state to
'cancel' then signal.

Requestor         dumper       state
                    [idle]       initial
  [request] -------------------> cancel
                    <if canceled. clean up>
                    [dumping]
                    <if canceled. clean up>
           <-signal-[done]

           -signal-><try to clean up>


Other aspects to consider?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


 
 



Re: Get memory contexts of an arbitrary backend process

torikoshia
On 2020-10-23 13:46, Kyotaro Horiguchi wrote:

> Wait...
>
>> Attachments:
>> 0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch
>
> For a moment I thought that the number is patch number but the
> predecessors are 0002-Enabled..collect.patch and 0001-(same
> name). It's not mandatory but we usually do as the follows and it's
> the way of git.
>
> v1-0001-Enabled...collect.patch
> v2-0001-Enabled...collect.patch
>
> The vN is added by the -v option of git-format-patch.

Sorry for the confusion. I'll follow that way next time.

> At Thu, 22 Oct 2020 21:32:00 +0900, torikoshia
> <[hidden email]> wrote in
>> > > I added a shared hash table consisted of minimal members
>> > > mainly for managing whether the file is dumped or not.
>> > > Some members like 'loc' seem useful in the future, but I
>> > > haven't added them since it's not essential at this point.
>> > Yes, that would be good.
>> > +        /*
>> > +         * Since we allow only one session can request to dump
>> > memory context at
>> > +         * the same time, check whether the dump files already exist.
>> > +         */
>> > +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile,
>> > &stat_tmp) == 0)
>> > +        {
>> > +            pg_usleep(1000000L);
>> > +        }
>> > If pg_get_backend_memory_contexts() is executed by two or more
>> > sessions at the same time, it cannot be run exclusively in this way.
>> > Currently it seems to cause a crash when do it so.
>> > This is easy to reproduce and can be done as follows.
>> > [session-1]
>> > BEGIN;
>> > LOCK TABLE t1;
>> >   [Session-2]
>> >   BEGIN;
>> >   LOCK TABLE t1; <- waiting
>> >     [Session-3]
>> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>> >       [Session-4]
>> >       select * FROM pg_get_backend_memory_contexts(<pid of
>> > session-2>);
>> > If you issue commit or abort at session-1, you will get SEGV.
>> > Instead of checking for the existence of the file, it might be better
>> > to use a hash (mcxtdumpHash) entry with LWLock.
>>
>> Thanks!
>> Added a LWLock and changed the way from checking the file existence
>> to finding the hash entry.
>
>> > +        if (proc == NULL)
>> > +        {
>> > +            ereport(WARNING,
>> > +                    (errmsg("PID %d is not a PostgreSQL server
>> > process", dst_pid)));
>> > +            return (Datum) 1;
>> > +        }
>> > Shouldn't it clear the hash entry before return?
>>
>> Yeah. added codes for removing the entry.
>
> + entry = AddEntryToMcxtdumpHash(dst_pid);
> +
> + /* Check whether the target process is PostgreSQL backend process.
> */
> + /* TODO: Check also whether backend or not. */
> + proc = BackendPidGetProc(dst_pid);
> +
> + if (proc == NULL)
> + {
> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
> +
> + LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
> +
> + if (hash_search(mcxtdumpHash, &dst_pid, HASH_REMOVE, NULL) == NULL)
> + elog(WARNING, "hash table corrupted");
> +
> + LWLockRelease(McxtDumpHashLock);
> +
> + return (Datum) 1;
> + }
>
> Why do you enter a useless entry and then remove it immediately?

Do you mean I should check for the process's existence first,
since that lets us skip entering hash entries?
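
For illustration, that reordering might look like the following (a sketch
reusing the names from the patch; dst_pid is the function argument):

  /* Check the target process first so we never create a useless entry. */
  proc = BackendPidGetProc(dst_pid);

  if (proc == NULL)
  {
      ereport(WARNING,
              (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
      return (Datum) 1;
  }

  /* Only now register the request in the shared hash. */
  entry = AddEntryToMcxtdumpHash(dst_pid);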

>
> + PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum)
> Int32GetDatum(dst_pid));
> + {
> + SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
>
> "PROCSIG_DUMP_MEMORY" is somewhat misleading. How about
> "PROCSIG_DUMP_MEMCXT" or "PROCSIG_DUMP_MEMORY_CONTEXT"?

I'll go with "PROCSIG_DUMP_MEMCXT".

>
> I thought that the hash table would prevent multiple requestors from
> making a request at once, but the patch doesn't seem to do that.
>
> + /* Wait until target process finished dumping file. */
> + while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
>
> This needs an LWLock. And this could read the entry after it has been
> reused by another backend if the dumper process is gone. That isn't
> likely to happen, but theoretically the other backend may set it to
> MCXTDUMPSTATUS_NOTYET in between two successive checks on the member.

Thanks for your notification.
I'll use an LWLock.
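
For instance, the wait loop might read the flag under the lock, something
like this (only a sketch reusing the patch's names; error and cancel
states are omitted):

  for (;;)
  {
      int     status;

      LWLockAcquire(McxtDumpHashLock, LW_SHARED);
      status = entry->dump_status;
      LWLockRelease(McxtDumpHashLock);

      if (status != MCXTDUMPSTATUS_NOTYET)
          break;

      CHECK_FOR_INTERRUPTS();
      pg_usleep(10000L);
  }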

>
> + /*
> + * Make dump file ends with 'D'.
> + * This is checked by the caller when reading the file.
> + */
> + fputc('E', fpout);
>
> Which is right?

Sorry, the comment was wrong..

>
> + fputc('E', fpout);
> +
> + CHECK_FOR_INTERRUPTS();
>
> This means that the process accepts another request and rewrites the
> file even while the first requestor is reading it. And the file can
> be removed by the first requestor before the second requestor can read
> it.

I added CHECK_FOR_INTERRUPTS() here to make dump cancellation possible.
However, considering your point, I need to think of a way to handle only
the dump cancellation.

>
> + mcxtdumpHash = ShmemInitHash("mcxtdump hash",
> +   SHMEM_MEMCONTEXT_SIZE,
> +   SHMEM_MEMCONTEXT_SIZE,
> .
> The space needed for this hash doesn't seem to be secured. The hash is
> sized to 64 entries, so pg_get_backend_memory_contexts() may fail with
> ERROR "out of shared memory".
>
> The hash is used only to check if the dump file is completed and if it
> ended with an error. If we need only those, a shared byte array with the
> length of MaxBackends is sufficient.

Although there was a comment that controlling the dump location may be a
good idea, it also seems possible to include the location information
in the struct hoge you suggested below.

On Tue, Sep 1, 2020 at 12:03 AM Kasahara Tatsuhito <[hidden email]>
wrote:
|   - send a signal to dump to a PID, it first record following
information into the shared hash.
|     pid (specified pid)
|     loc (dump location, currently might be ASAP)


>
> + PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum)
> Int32GetDatum(dst_pid));
> + {
> + SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
> +
> + /* Wait until target process finished dumping file. */
> + while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)
> + {
> + CHECK_FOR_INTERRUPTS();
> + pg_usleep(10000L);
> + }
> + }
> + PG_END_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum)
> Int32GetDatum(dst_pid));
> +
> + if (entry->dump_status == MCXTDUMPSTATUS_ERROR)
> + {
> ..
> + if (stat(tmpfile, &stat_tmp) == 0)
> + unlink(tmpfile);
> + if (stat(dumpfile, &stat_tmp) == 0)
> + unlink(dumpfile);
> ...
> + return (Datum) 0;
> + }
> +
> + /* Read values from the dumped file and put them on tuplestore. */
> + PutDumpedValuesOnTuplestore(dumpfile, tupstore, tupdesc, dst_pid);
>
> This means that if the function gets sigint before the dumper creates
> the file, the dumper can leave a dump file?

In that case, the requestor removes the corresponding hash entry in the
callback McxtReqKill, and then the dumper, which cannot find the hash
entry, does not dump a file.

However, I've now noticed that when the requestor gets SIGINT just
after the dumper's check, the dump file remains.

In the current design, only the requestor can remove the dump file,
but it seems necessary to allow the dumper to remove it.


>> > +        /* Wait until target process finished dumping file. */
>> > +        while (!entry->is_dumped)
>> > +        {
>> > +            CHECK_FOR_INTERRUPTS();
>> > +            pg_usleep(10000L);
>> > +        }
>> > If the target is killed and exits before dumping the memory
>> > information, you're in an infinite loop here.
>> > So how about making sure that any process that has to stop before
>> > doing a memory dump changes the status of the hash (entry->is_dumped)
>> > before stopping and the caller detects it?
>> > I'm not sure it's best or not, but you might want to use something
>> > like the on_shmem_exit callback.
>>
>> Thanks for your idea!
>> Added a callback to change the status of the hash table entry.
>>
>> Although I think it's necessary to remove this callback when it has
>> finished dumping memory, on_shmem_exit() does not seem to have such
>> a function.
>> I used before_shmem_exit() since it has a corresponding function to
>> remove registered callback.
>> If it's inappropriate, I'm going to add a function removing the
>> registered callback of on_shmem_exit().
>
> This seems to leave a file for the pid.

As mentioned above, there is a chance that files remain.

>
>> > In the current design, if the caller stops processing before reading
>> > the dumped file, you will have an orphaned file.
>> > It looks like the following.
>> > [session-1]
>> > BEGIN;
>> > LOCK TABLE t1;
>> >   [Session-2]
>> >   BEGIN;
>> >   LOCK TABLE t1; <- waiting
>> >     [Session-3]
>> >     select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
>> > If you cancel or terminate the session-3, then issue commit or abort
>> > at session-1, you will get orphaned files in pg_memusage.
>> > So if you allow only one session can request to dump file, it could
>> > call pg_memusage_reset() before send the signal in this function.
>>
>> Although I'm going to allow only one session per target process,
>> I'd like to allow running multiple pg_get_backend_memory_contexts()
>> whose target processes are different.
>>
>> Instead of calling pg_memusage_reset(), I added a callback for
>> cleaning up orphaned files and the elements of the hash table
>> using before_shmem_exit() through PG_ENSURE_ERROR_CLEANUP() and
>> PG_END_ENSURE_ERROR_CLEANUP().
>>
>> I chose PG_ENSURE_ERROR_CLEANUP() and PG_END_ENSURE_ERROR_CLEANUP()
>> here since it can handle not only termination but also cancellation.
>>
>> Any thoughts?
>
> +/*
> + * pg_memusage_reset
> + * Remove the memory context dump files.
> + */
> +void
> +pg_memusage_reset(int pid)
>
> The function name seems to represent something different from what it
> does.

Yeah, it looks like it actually resets the memory.
I'll rename it to remove_memcxt_file or something.

>
> I think we might need to step-back to basic design of this feature
> since this patch seems to have unhandled corner cases that are
> difficult to find.

Agreed and thanks for writing it down below.


> - Dump file lifecycle or state-transition of the dumper
>
>   Currently the lifecycle of a dump file, or the state-transition of
>   the dumper process doesn't seem to be well defined.
>
>   The file is created only by the dumper.
>   If the requestor reads the completed file, the requestor removes it.
>
>   If the dumper receives a cancel request, the dumper removes the file,
>   if any.


>
>   Of course the dumper removes it if it fails to complete the file.
>
> - Better way of signalling between the requestor and the dumper
>
>   I think there's no doubt about the request signal.
>
>   About the complete signal, currently the requestor polls on a flag
>   in a hash entry.  I'm wondering if we can get rid of polling. The
>   problem with doing that is the lack of a means for the dumper to know
>   the requestor.  We need to store requestor pid or backend id in the
>   shared hash entry.

Agreed to get rid of polling.

BTW, it seems common to use a latch instead of pg_usleep() to wait until
signals arrive, as far as I can tell from latch.h.
I'm now thinking about using a latch here, which would remove the
polling.
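
A rough sketch of what that might look like (only a sketch; the helper
reading the status under the lock is hypothetical, requestor_pid is an
assumed variable, and the wait event is just a placeholder):

  /* requestor side: no busy polling, wake up when the dumper sets our latch */
  for (;;)
  {
      CHECK_FOR_INTERRUPTS();

      if (mcxtdump_read_status(entry) != MCXTDUMPSTATUS_NOTYET)  /* hypothetical helper */
          break;

      (void) WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                       1000L,      /* timeout so a dead dumper is eventually noticed */
                       WAIT_EVENT_PG_SLEEP);
      ResetLatch(MyLatch);
  }

  /* dumper side: after setting the status to DONE (or an error state) */
  {
      PGPROC *proc = BackendPidGetProc(requestor_pid);

      if (proc != NULL)
          SetLatch(&proc->procLatch);
  }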

>   By the way, about the shared hash entry, it uses about 70kB for only 64
>   entries, so it seems less efficient than a shared array that has
>   MaxBackends entries.  If we used the following array on shared memory,
>
>   struct hoge
>   {
>     BackendId requestor[MAX_BACKENDS];
>     int       status[MAX_BACKENDS];
>     LWLock    lock;
>   };

If the requestor's id is 5 and dumper's id is 10,
is this struct used like below?

- hoge.requestor[10] = 5
- Both status[5] and status[10] change like "request", "idle" or "done"

>   This array has the size of 24 * MaxBackends + 16. 24kB for 1000
>   backends.  It could be on dsm since this feature is not used
>   commonly.

Sorry, but I'm not sure about this calculation.
Do you mean 16.24kB for 1000 backends?

Regarding dsm, do you imagine using hoge on dsm?
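
For what it's worth, putting such an array on dsm might look roughly like
the following (only a sketch; McxtDumpSlot stands for a hypothetical
per-backend slot struct like the one sketched earlier in the thread, and
publishing the handle is left out):

  dsm_segment  *seg;
  McxtDumpSlot *slots;

  seg = dsm_create(mul_size(MaxBackends, sizeof(McxtDumpSlot)), 0);
  slots = (McxtDumpSlot *) dsm_segment_address(seg);

  dsm_pin_segment(seg);   /* keep the segment alive beyond the creating backend */
  dsm_pin_mapping(seg);   /* keep it mapped for this backend's lifetime */

  /*
   * The dsm_handle from dsm_segment_handle(seg) would still have to be
   * published in plain shared memory so that other backends can dsm_attach().
   */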

>
> - The way to cancel a request already made. (or management of the dump
>   state transition.)
>
>   Cancellation seems to contain some race conditions. But basically
>   that could be done by sending a request signal after setting the
>   hoge.requestor above to some special value, not needing the third

I imagined hoge.requestor was set to requestor's backendid.
Isn't it hoge.status?

>   type of signal.  The special value should be different from the
>   initial state, which signals that the process is accepting a new
>   request.
>
>   As a whole, that would look like the following:
>
> ------------------------------------------------------------
> Successful request.
>
>   Requestor         dumper       state
>                     [idle]       initial
>   [request] -------------------> requestor pid/backendid
>            -signal->
>                     [dumping]
>            <-signal-[done]
>   [read]
>   [done]   --------------------> initial
>
> ------------------------------------------------------------
> On failure, the dumper signals with setting state to initial.
>   [request] -------------------> requestor pid/backendid
>            -signal->
>                     [dumping]
>                     [failed]     initial
>            <-signal-
>      (some other requestor might come meantime.)
>   <sees that the requestor is not me>

Is this "some other requestor might come" case relevant to the dumper's failure?

Regards,

--
Atsushi Torikoshi

>   [failed]
>
> ------------------------------------------------------------
> If the requestor wants to cancel the request, it sets the state to
> 'cancel' then signal.
>
> Requestor         dumper       state
>                     [idle]       initial
>   [request] -------------------> cancel
>                     <if canceled. clean up>
>                     [dumping]
>                     <if canceled. clean up>
>            <-signal-[done]
>
>            -signal-><try to clean up>
>
>
> Other aspects to consider?
>
> regards.



Re: Get memory contexts of an arbitrary backend process

Georgios Kokolatos
Hi,

I noticed that this patch fails on the cfbot.
For this, I changed the status to: 'Waiting on Author'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author

Re: Get memory contexts of an arbitrary backend process

torikoshia
On 2020-10-28 15:32, torikoshia wrote:
> On 2020-10-23 13:46, Kyotaro Horiguchi wrote:

>> I think we might need to step-back to basic design of this feature
>> since this patch seems to have unhandled corner cases that are
>> difficult to find.

I've written out the basic design below and attached the
corresponding patch.

   # Communication flow between the dumper and the requestor
   - (1) When requesting a memory context dump, the requestor adds an entry
to the shared memory. The entry manages the dump state and it is set to
'REQUESTING'.
   - (2) The requestor sends the signal to the dumper and waits on the latch.
   - (3) The dumper looks into the corresponding shared memory entry and
changes its state to 'DUMPING'.
   - (4) When the dumper completes dumping, it changes the state to
'DONE' and sets the latch.
   - (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared memory
entry.

   # Query cancellation
   - When the requestor cancels dumping, e.g. by signaling with ctrl-C, the
requestor changes the status of the shared memory entry to 'CANCELING'.
   - The dumper checks the status when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it removes the dump file
and resets the shared memory entry.

   # Cleanup dump file and the shared memory entry
   - In the normal case, the requestor removes the dump file and resets the
shared memory entry as described in (5).
   - When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words while the
state is 'REQUESTING', the requestor does the cleanup.
   - When something happens on the dumper or the requestor after (3) and
before (4), in other words while the state is 'DUMPING', the dumper does
the cleanup. Specifically, if the requestor cancels the query, it just
changes the state to 'CANCELING' and the dumper notices it and cleans up
things later. OTOH, when the dumper fails to dump, it cleans up the dump
file and deletes the entry from the shared memory.
   - When something happens on the requestor after (4), i.e. while the state
is 'DONE', the requestor does the cleanup.
   - In the case of receiving SIGKILL or a power failure, all dump files
are removed during crash recovery.
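
Sketched as code, the states described above might look like this (names
are made up to match the description, not taken from the attached patch):

  typedef enum McxtDumpStatus
  {
      MCXTDUMPSTATUS_REQUESTING,  /* (1) requestor registered the request     */
      MCXTDUMPSTATUS_DUMPING,     /* (3) dumper is writing the dump file      */
      MCXTDUMPSTATUS_DONE,        /* (4) file complete; requestor may read it */
      MCXTDUMPSTATUS_CANCELING    /* requestor gave up; dumper must clean up  */
  } McxtDumpStatus;

  /* one shared entry per request, keyed by the target (dumper) pid */
  typedef struct McxtDumpEntry
  {
      pid_t           dst_pid;    /* target process that will dump              */
      pid_t           req_pid;    /* requestor, so the dumper can set its latch */
      McxtDumpStatus  status;
  } McxtDumpEntry;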


Although there was a suggestion that the shared memory hash
table should be changed to a more efficient structure,
I haven't done it in this patch.
I think it can be treated separately; I'm going to work
on that later.


On 2020-11-11 00:07, Georgios Kokolatos wrote:
> Hi,
>
> I noticed that this patch fails on the cfbot.
> For this, I changed the status to: 'Waiting on Author'.
>
> Cheers,
> //Georgios
>
> The new status of this patch is: Waiting on Author

Thanks for your notification; I've updated the patch.
Changed the status to: 'Waiting on Author'.

Regards,

--
Atsushi Torikoshi

v4-0001-Enabled-pg_get_backend_memory_contexts-to-collect.patch (42K) Download Attachment