Creating a function for exposing memory usage of backend process

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Creating a function for exposing memory usage of backend process

torikoshia
Hi,

As you may know better than I do, backend processes sometimes use a lot
of memory because of the various reasons like caches, prepared
statements and cursors.
When supporting PostgreSQL, I face situations for investigating the
reason of memory bloat.

AFAIK, the way to examine it is attaching a debugger and call
MemoryContextStats(TopMemoryContext), however, I feel some difficulties
doing it:

   - some production environments don't allow us to run a debugger easily
   - many lines about memory contexts are hard to analyze

Using an extension(pg_stat_get_memory_context() in pg_cheat_funcs[1]),
we can get the view of the memory contexts, but it's the memory contexts
of the backend which executed the pg_stat_get_memory_context().


[user interface]
If we have a function exposing memory contexts for specified PID,
we can easily examine them.
I imagine a user interface something like this:

   =# SELECT * FROM pg_stat_get_backend_memory_context(PID);

            name           |       parent       | level | total_bytes |
total_nblocks | free_bytes | free_chunks | used_bytes | some other
attibutes..
--------------------------+--------------------+-------+-------------+---------------+------------+-------------+------------
  TopMemoryContext         |                    |     0 |       68720 |  
           5 |       9936 |          16 |      58784
  TopTransactionContext    | TopMemoryContext   |     1 |        8192 |  
           1 |       7720 |           0 |        472
  PL/pgSQL function        | TopMemoryContext   |     1 |       16384 |  
           2 |       5912 |           1 |      10472
  PL/pgSQL function        | TopMemoryContext   |     1 |       32768 |  
           3 |      15824 |           3 |      16944
  dynahash                 | TopMemoryContext   |     1 |        8192 |  
           1 |        512 |           0 |       7680
...


[rough implementation ideas and challenges]
I suppose communication between a process which runs
pg_stat_get_backend_memory_context()(referred to as A) and
target backend(referred to as B) is like:

   1. A sends a message to B and order to dump the memory contexts
   2. B dumps its memory contexts to some shared area
   3. A reads the shared area and returns it to the function invoker

To do so, there seem some challenges.

(1) how to share memory contexts information between backend processes
The amount of memory contexts greatly varies depending on the
situation, so it's not appropriate to share the memory contexts using
fixed shared memory.
Also using the file on 'stats_temp_directory' seems difficult thinking
about the background of the shared-memory based stats collector
proposal[2].
Instead, I'm thinking about using dsm_mq, which allows messages of
arbitrary length to be sent and receive.

(2) how to send messages wanting memory contexts
Communicating via signal seems simple but assigning a specific number
of signal for this purpose seems wasting.
I'm thinking about using fixed shared memory to put dsm_mq handle.
To send a message, A creates a dsm_mq and put the handle on the shared
memory area. When B founds a handle, B dumps the memory contexts to the
corresponding dsm_mq.

However, enabling B to find the handle needs to check the shared memory
periodically. I'm not sure the suitable location and timing for this
checking yet, and doubt this way of communication is acceptable because
it gives certain additional loads to all the backends.

(3) clarifying the necessary attributes
As far as reading the past disucussion[3], it's not so clear what kind
of information should be exposed regarding memory contexts.


As a first step, to deal with (3) I'd like to add
pg_stat_get_backend_memory_context() which target is limited to the
local backend process.


Thanks for reading and how do you think?


[1]
https://github.com/MasaoFujii/pg_cheat_funcs#setof-record-pg_stat_get_memory_context
[2]
https://www.postgresql.org/message-id/flat/20180629.173418.190173462.horiguchi.kyotaro@...
[3]
https://www.postgresql.org/message-id/20190805171608.g22gxwmfr2r7uf6t%40alap3.anarazel.de


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Fujii Masao-4


On 2020/06/17 22:00, torikoshia wrote:

> Hi,
>
> As you may know better than I do, backend processes sometimes use a lot
> of memory because of the various reasons like caches, prepared
> statements and cursors.
> When supporting PostgreSQL, I face situations for investigating the
> reason of memory bloat.
>
> AFAIK, the way to examine it is attaching a debugger and call
> MemoryContextStats(TopMemoryContext), however, I feel some difficulties
> doing it:
>
>    - some production environments don't allow us to run a debugger easily
>    - many lines about memory contexts are hard to analyze

Agreed. The feature to view how local memory contexts are used in
each process is very useful!


> Using an extension(pg_stat_get_memory_context() in pg_cheat_funcs[1]),
> we can get the view of the memory contexts, but it's the memory contexts
> of the backend which executed the pg_stat_get_memory_context().
>
>
> [user interface]
> If we have a function exposing memory contexts for specified PID,
> we can easily examine them.
> I imagine a user interface something like this:
>
>    =# SELECT * FROM pg_stat_get_backend_memory_context(PID);

I'm afraid that this interface is not convenient when we want to monitor
the usages of local memory contexts for all the processes. For example,
I'd like to monitor how much memory is totally used to store prepared
statements information. For that purpose, I wonder if it's more convenient
to provide the view displaying the memory context usages for
all the processes.

To provide that view, all the processes need to save their local memory
context usages into the shared memory or the special files in their
convenient timing. For example, backends do that every end of query
execution (during waiting for new request from clients). OTOH,
the query on the view scans and displays all those information.

Of course there would be several issues in this idea. One issue is
the performance overhead caused when each process stores
its own memory context usage to somewhere. Even if backends do that
during waiting for new request from clients, non-negligible overhead
might happen. Performance test is necessary. Also this means that
we cannot see the memory context usage of the process in the middle of
query execution since it's saved at the end of query. If local memory bloat
occurs only during query execution and we want to investigate it, we still
need to use gdb to output the memory context information.

Another issue is that the large amount of shared memory might be
necessary to save the memory context usages for all the proceses. We can
save the usage information into the file instead, but which would cause
more overhead. If we use shared memory, the similar parameter like
track_activity_query_size might be necessary. That is, backends save
only the specified number of memory context information. If it's zero,
the feature is disabled.

Also we should reduce the same of information to save. For example,
instead of saving all memory context information like MemoryContextStats()
prints, it might be better to save the summary stats (per memory context
type) from them.


>
>             name           |       parent       | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes | some other attibutes..
> --------------------------+--------------------+-------+-------------+---------------+------------+-------------+------------
>   TopMemoryContext         |                    |     0 |       68720 |           5 |       9936 |          16 |      58784
>   TopTransactionContext    | TopMemoryContext   |     1 |        8192 |           1 |       7720 |           0 |        472
>   PL/pgSQL function        | TopMemoryContext   |     1 |       16384 |           2 |       5912 |           1 |      10472
>   PL/pgSQL function        | TopMemoryContext   |     1 |       32768 |           3 |      15824 |           3 |      16944
>   dynahash                 | TopMemoryContext   |     1 |        8192 |           1 |        512 |           0 |       7680
> ...
>
>
> [rough implementation ideas and challenges]
> I suppose communication between a process which runs
> pg_stat_get_backend_memory_context()(referred to as A) and
> target backend(referred to as B) is like:
>
>    1. A sends a message to B and order to dump the memory contexts
>    2. B dumps its memory contexts to some shared area
>    3. A reads the shared area and returns it to the function invoker
>
> To do so, there seem some challenges.
>
> (1) how to share memory contexts information between backend processes
> The amount of memory contexts greatly varies depending on the
> situation, so it's not appropriate to share the memory contexts using
> fixed shared memory.
> Also using the file on 'stats_temp_directory' seems difficult thinking
> about the background of the shared-memory based stats collector
> proposal[2].
> Instead, I'm thinking about using dsm_mq, which allows messages of
> arbitrary length to be sent and receive.
>
> (2) how to send messages wanting memory contexts
> Communicating via signal seems simple but assigning a specific number
> of signal for this purpose seems wasting.
> I'm thinking about using fixed shared memory to put dsm_mq handle.
> To send a message, A creates a dsm_mq and put the handle on the shared
> memory area. When B founds a handle, B dumps the memory contexts to the
> corresponding dsm_mq.
>
> However, enabling B to find the handle needs to check the shared memory
> periodically. I'm not sure the suitable location and timing for this
> checking yet, and doubt this way of communication is acceptable because
> it gives certain additional loads to all the backends.
>
> (3) clarifying the necessary attributes
> As far as reading the past disucussion[3], it's not so clear what kind
> of information should be exposed regarding memory contexts.
>
>
> As a first step, to deal with (3) I'd like to add
> pg_stat_get_backend_memory_context() which target is limited to the
> local backend process.

+1


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Robert Haas
On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
<[hidden email]> wrote:
> > As a first step, to deal with (3) I'd like to add
> > pg_stat_get_backend_memory_context() which target is limited to the
> > local backend process.
>
> +1

+1 from me, too. Something that exposed this via shared memory would
be quite useful, and there are other things we'd like to expose
similarly, such as the plan(s) from the running queries, or even just
the untruncated query string(s). I'd like to have a good
infrastructure for that sort of thing, but I think it's quite tricky
to do properly. You need a variable-size chunk of shared memory, so
probably it has to use DSM somehow, and you need to update it as
things change, but if you update it too frequently performance will
stink. If you ping processes to do the updates, how do you know when
they've completed them, and what happens if they don't respond in a
timely fashion? These are probably all solvable problems, but I don't
think they are very easy ones.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Kasahara Tatsuhito
In reply to this post by Fujii Masao-4
Hi !

On Thu, Jun 18, 2020 at 12:56 PM Fujii Masao
<[hidden email]> wrote:
> Agreed. The feature to view how local memory contexts are used in
> each process is very useful!
+1

> >    =# SELECT * FROM pg_stat_get_backend_memory_context(PID);
>
> I'm afraid that this interface is not convenient when we want to monitor
> the usages of local memory contexts for all the processes. For example,
> I'd like to monitor how much memory is totally used to store prepared
> statements information. For that purpose, I wonder if it's more convenient
> to provide the view displaying the memory context usages for
> all the processes.
How about separating a function that examines memory consumption
trends for all processes and a function that examines memory
consumption for a particular phase of a particular process?

For the former, as Fujii said, the function shows the output limited
information for each context type. All processes calculation and
output the information at idle status.

I think the latter is useful for debugging and other purposes.
For example, imagine preparing a function for registration like the following.
=# SELECT pg_stat_get_backend_memory_context_regist (pid, context,
max_children, calc_point)

pid: A target process
context: The top level of the context of interest
max_children: Maximum number of output for the target context's children
 (similar to MemoryContextStatsInternal()'s max_children)
calc_point: Single or multiple position(s) to calculate and output
context information
 (Existing hooks such as planner_hook, executor_start, etc.. could be used. )

This function informs the target PID to output the information of the
specified context at the specified calc_point.
When the target PID process reaches the calc_point, it calculates and
output the context information one time to a file or DSM.

(Currently PostgreSQL has no formal ways of externally modifying the
parameters of a particular process, so it may need to be
implemented...)

Sometimes I want to know the memory usage in the planning phase or
others with a query_string and/or plan_tree that before target process
move to the idle status.
So it would be nice to retrieve memory usage at some arbitrary point in time !

Regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Bharath Rupireddy
Hi,

While going through the mail chain on relation, plan and catalogue
caching [1], I'm thinking on the lines that is there a way to know the
current relation, plan and catalogue cache sizes? If there is a way
already,  please ignore this and it would be grateful if someone point
me to that.

Posting this here as I felt it's relevant.

If there is no such way to know the cache sizes and other info such as
statistics, number of entries, cache misses, hits etc.  can the
approach discussed here be applied?

If the user knows the cache statistics and other information, may be
we can allow user to take appropriate actions such as allowing him to
delete few entries through a command or some other way.

I'm sorry, If I'm diverting the topic being discussed in this mail
thread, please ignore if it is irrelevant.

[1] - https://www.postgresql.org/message-id/flat/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Kasahara Tatsuhito
Hi,

On Fri, Jun 26, 2020 at 3:42 PM Bharath Rupireddy
<[hidden email]> wrote:
> While going through the mail chain on relation, plan and catalogue
> caching [1], I'm thinking on the lines that is there a way to know the
> current relation, plan and catalogue cache sizes? If there is a way
> already,  please ignore this and it would be grateful if someone point
> me to that.
AFAIK the only way to get statistics on PostgreSQL's backend  internal
local memory usage is to use MemoryContextStats() via gdb to output
the information to the log, so far.

> If there is no such way to know the cache sizes and other info such as
> statistics, number of entries, cache misses, hits etc.  can the
> approach discussed here be applied?
I think it's partially yes.

> If the user knows the cache statistics and other information, may be
> we can allow user to take appropriate actions such as allowing him to
> delete few entries through a command or some other way.
Yeah, one of the purposes of the features we are discussing here is to
use them for such situation.

Regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
In reply to this post by Robert Haas
On 2020-06-20 03:11, Robert Haas wrote:
> On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
> <[hidden email]> wrote:
>> > As a first step, to deal with (3) I'd like to add
>> > pg_stat_get_backend_memory_context() which target is limited to the
>> > local backend process.
>>
>> +1
>
> +1 from me, too.

Attached a patch that adds a function exposing memory usage of local
backend.

It's almost same as pg_cheat_funcs's pg_stat_get_memory_context().
I've also added MemoryContexts identifier because it seems useful to
distinguish the same kind of memory contexts.

For example, when there are many prepared statements we can
distinguish them using identifiers, which shows the cached query.

   =# SELECT name, ident FROM pg_stat_get_memory_contexts() WHERE name =
'CachedPlanSource';
          name       |            ident
   ------------------+--------------------------------
    CachedPlanSource | PREPARE q1(text) AS SELECT ..;
    CachedPlanSource | PREPARE q2(text) AS SELECT ..;
   (2 rows)


> Something that exposed this via shared memory would
> be quite useful, and there are other things we'd like to expose
> similarly, such as the plan(s) from the running queries, or even just
> the untruncated query string(s). I'd like to have a good
> infrastructure for that sort of thing, but I think it's quite tricky
> to do properly. You need a variable-size chunk of shared memory, so
> probably it has to use DSM somehow, and you need to update it as
> things change, but if you update it too frequently performance will
> stink. If you ping processes to do the updates, how do you know when
> they've completed them, and what happens if they don't respond in a
> timely fashion? These are probably all solvable problems, but I don't
> think they are very easy ones.
Thanks for your comments!

It seems hard as you pointed out.
I'm going to consider it along with the advice of Fujii and Kasahara.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

0001-Adding-a-function-exposing-memory-usage-of-local-backend.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Fujii Masao-4


On 2020/06/29 12:01, torikoshia wrote:

> On 2020-06-20 03:11, Robert Haas wrote:
>> On Wed, Jun 17, 2020 at 11:56 PM Fujii Masao
>> <[hidden email]> wrote:
>>> > As a first step, to deal with (3) I'd like to add
>>> > pg_stat_get_backend_memory_context() which target is limited to the
>>> > local backend process.
>>>
>>> +1
>>
>> +1 from me, too.
>
> Attached a patch that adds a function exposing memory usage of local
> backend.

Thanks for the patch!
Could you add this patch to Commitfest 2020-07?

>
> It's almost same as pg_cheat_funcs's pg_stat_get_memory_context().

This patch provides only the function, but isn't it convenient to
provide the view like pg_shmem_allocations?


> I've also added MemoryContexts identifier because it seems useful to
> distinguish the same kind of memory contexts.

Sounds good. But isn't it better to document each column?
Otherwise, users cannot undersntad what "ident" column indicates.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Bharath Rupireddy
In reply to this post by Kasahara Tatsuhito
> > If there is no such way to know the cache sizes and other info such as
> > statistics, number of entries, cache misses, hits etc.  can the
> > approach discussed here be applied?
> I think it's partially yes.
>

> > If the user knows the cache statistics and other information, may be
> > we can allow user to take appropriate actions such as allowing him to
> > delete few entries through a command or some other way.
> Yeah, one of the purposes of the features we are discussing here is to
> use them for such situation.
>

Thanks Kasahara for the response. I will try to understand more about
getting the cache statistics and
also will study the possibility of applying this approach being
discussed here in this thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
In reply to this post by Fujii Masao-4
On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao
<[hidden email]> wrote:

> Could you add this patch to Commitfest 2020-07?

Thanks for notifying, I've added it.
BTW, I registered you as an author because this patch used
lots of pg_cheat_funcs' codes.

   https://commitfest.postgresql.org/28/2622/

> This patch provides only the function, but isn't it convenient to
> provide the view like pg_shmem_allocations?

> Sounds good. But isn't it better to document each column?
> Otherwise, users cannot undersntad what "ident" column indicates.

Agreed.
Attached a patch for creating a view for local memory context
and its explanation on the document.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

0002-Adding-a-function-exposing-memory-usage-of-local-backend.patch (21K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Fujii Masao-4


On 2020/07/01 14:48, torikoshia wrote:
> On Mon, Jun 29, 2020 at 3:13 PM Fujii Masao <[hidden email]> wrote:
>
>> Could you add this patch to Commitfest 2020-07?
>
> Thanks for notifying, I've added it.
> BTW, I registered you as an author because this patch used
> lots of pg_cheat_funcs' codes.
>
>    https://commitfest.postgresql.org/28/2622/

Thanks!

>
>> This patch provides only the function, but isn't it convenient to
>> provide the view like pg_shmem_allocations?
>
>> Sounds good. But isn't it better to document each column?
>> Otherwise, users cannot undersntad what "ident" column indicates.
>
> Agreed.
> Attached a patch for creating a view for local memory context
> and its explanation on the document.

Thanks for updating the patch!

You treat pg_stat_local_memory_contexts view as a dynamic statistics view.
But isn't it better to treat it as just system view like pg_shmem_allocations
or pg_prepared_statements  because it's not statistics information? If yes,
we would need to rename the view, move the documentation from
monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
the more appropriate source file.

+ tupdesc = rsinfo->setDesc;
+ tupstore = rsinfo->setResult;

These seem not to be necessary.

+ /*
+ * It seems preferable to label dynahash contexts with just the hash table
+ * name.  Those are already unique enough, so the "dynahash" part isn't
+ * very helpful, and this way is more consistent with pre-v11 practice.
+ */
+ if (ident && strcmp(name, "dynahash") == 0)
+ {
+ name = ident;
+ ident = NULL;
+ }

IMO it seems better to report both name and ident even in the case of
dynahash than report only ident (as name). We can easily understand
the context is used for dynahash when it's reported. But you think it's
better to report NULL rather than "dynahash"?

+/* ----------
+ * The max bytes for showing identifiers of MemoryContext.
+ * ----------
+ */
+#define MEMORY_CONTEXT_IDENT_SIZE 1024

Do we really need this upper size limit? Could you tell me why
this is necessary?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Daniel Gustafsson
In reply to this post by torikoshia
> On 1 Jul 2020, at 07:48, torikoshia <[hidden email]> wrote:

> Attached a patch for creating a view for local memory context
> and its explanation on the document.

For the next version (if there will be one), please remove the catversion bump
from the patch as it will otherwise just break patch application without
constant rebasing (as it's done now).  The committer will handle the catversion
change if/when it gets committed.

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
In reply to this post by Fujii Masao-4
On Wed, Jul 1, 2020 at 4:43 PM Fujii Masao <[hidden email]>
wrote:

Thanks for reviewing!

> You treat pg_stat_local_memory_contexts view as a dynamic statistics
> view.
> But isn't it better to treat it as just system view like
> pg_shmem_allocations
> or pg_prepared_statements  because it's not statistics information? If
> yes,
> we would need to rename the view, move the documentation from
> monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
> the more appropriate source file.

Agreed.
At first, I thought not only statistical but dynamic information about
exactly
what is going on was OK when reading the sentence on the manual below.

> PostgreSQL also supports reporting dynamic information about exactly
> what is going on in the system right now, such as the exact command
> currently being executed by other server processes, and which other
> connections exist in the system. This facility is independent of the
> collector process.

However, now I feel something strange because existing pg_stats_* views
seem
to be per cluster information but the view I'm adding is about per
backend
information.

I'm going to do some renaming and transportations.

- view name: pg_memory_contexts
- function name: pg_get_memory_contexts()
- source file: mainly src/backend/utils/mmgr/mcxt.c


> +       tupdesc = rsinfo->setDesc;
> +       tupstore = rsinfo->setResult;
>
> These seem not to be necessary.

Thanks!

> +       /*
> +        * It seems preferable to label dynahash contexts with just the
> hash table
> +        * name.  Those are already unique enough, so the "dynahash"
> part isn't
> +        * very helpful, and this way is more consistent with pre-v11
> practice.
> +        */
> +       if (ident && strcmp(name, "dynahash") == 0)
> +       {
> +               name = ident;
> +               ident = NULL;
> +       }
>
> IMO it seems better to report both name and ident even in the case of
> dynahash than report only ident (as name). We can easily understand
> the context is used for dynahash when it's reported. But you think it's
> better to report NULL rather than "dynahash"?

These codes come from MemoryContextStatsPrint() and my intension is to
keep consistent with it.

> +/* ----------
> + * The max bytes for showing identifiers of MemoryContext.
> + * ----------
> + */
> +#define MEMORY_CONTEXT_IDENT_SIZE      1024
>
> Do we really need this upper size limit? Could you tell me why
> this is necessary?

It also derived from MemoryContextStatsPrint().
I suppose it is for keeping readability of the log..

I'm going to follow the discussion on the mailing list and find why
these codes were introduced.
If there's no important reason to do the same in our context, I'll
change them.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
In reply to this post by Daniel Gustafsson
On 2020-07-01 20:47, Daniel Gustafsson wrote:

> For the next version (if there will be one), please remove the
> catversion bump
> from the patch as it will otherwise just break patch application
> without
> constant rebasing (as it's done now).  The committer will handle the
> catversion
> change if/when it gets committed.
>
> cheers ./daniel

Thanks for telling me it!
I'll do that way next time.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Fujii Masao-4
In reply to this post by torikoshia


On 2020/07/01 22:15, torikoshia wrote:

> On Wed, Jul 1, 2020 at 4:43 PM Fujii Masao <[hidden email]> wrote:
>
> Thanks for reviewing!
>
>> You treat pg_stat_local_memory_contexts view as a dynamic statistics view.
>> But isn't it better to treat it as just system view like pg_shmem_allocations
>> or pg_prepared_statements  because it's not statistics information? If yes,
>> we would need to rename the view, move the documentation from
>> monitoring.sgml to catalogs.sgml, and move the code from pgstat.c to
>> the more appropriate source file.
>
> Agreed.
> At first, I thought not only statistical but dynamic information about exactly
> what is going on was OK when reading the sentence on the manual below.
>
>> PostgreSQL also supports reporting dynamic information about exactly what is going on in the system right now, such as the exact command currently being executed by other server processes, and which other connections exist in the system. This facility is independent of the collector process.
>
> However, now I feel something strange because existing pg_stats_* views seem
> to be per cluster information but the view I'm adding is about per backend
> information.
>
> I'm going to do some renaming and transportations.
>
> - view name: pg_memory_contexts
> - function name: pg_get_memory_contexts()
> - source file: mainly src/backend/utils/mmgr/mcxt.c
>
>
>> +       tupdesc = rsinfo->setDesc;
>> +       tupstore = rsinfo->setResult;
>>
>> These seem not to be necessary.
>
> Thanks!
>
>> +       /*
>> +        * It seems preferable to label dynahash contexts with just the hash table
>> +        * name.  Those are already unique enough, so the "dynahash" part isn't
>> +        * very helpful, and this way is more consistent with pre-v11 practice.
>> +        */
>> +       if (ident && strcmp(name, "dynahash") == 0)
>> +       {
>> +               name = ident;
>> +               ident = NULL;
>> +       }
>>
>> IMO it seems better to report both name and ident even in the case of
>> dynahash than report only ident (as name). We can easily understand
>> the context is used for dynahash when it's reported. But you think it's
>> better to report NULL rather than "dynahash"?
>
> These codes come from MemoryContextStatsPrint() and my intension is to
> keep consistent with it.

Ok, understood! I agree that it's strange to display different names
for the same memory context between this view and logging.

It's helpful if the comment there refers to MemoryContextStatsPrint()
and mentions the reason that you told.


>
>> +/* ----------
>> + * The max bytes for showing identifiers of MemoryContext.
>> + * ----------
>> + */
>> +#define MEMORY_CONTEXT_IDENT_SIZE      1024
>>
>> Do we really need this upper size limit? Could you tell me why
>> this is necessary?
>
> It also derived from MemoryContextStatsPrint().
> I suppose it is for keeping readability of the log..

Understood. I may want to change the upper limit of query size to display.
But at the first step, I'm fine with limitting 1024 bytes.


>
> I'm going to follow the discussion on the mailing list and find why
> these codes were introduced.

https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
On Wed, Jul 1, 2020 at 10:15 PM torikoshia <[hidden email]>
wrote:
> I'm going to do some renaming and transportations.
>
> - view name: pg_memory_contexts
> - function name: pg_get_memory_contexts()
> - source file: mainly src/backend/utils/mmgr/mcxt.c

Attached an updated patch.

On Wed, Jul 1, 2020 at 10:58 PM Fujii Masao
<[hidden email]> wrote:
> Ok, understood! I agree that it's strange to display different names
> for the same memory context between this view and logging.
>
> It's helpful if the comment there refers to MemoryContextStatsPrint()
> and mentions the reason that you told.

Agreed. I changed the comments.

> > It also derived from MemoryContextStatsPrint().
> > I suppose it is for keeping readability of the log..
>
> Understood. I may want to change the upper limit of query size to
> display.
> But at the first step, I'm fine with limitting 1024 bytes.

Thanks, I've left it as it was.

> > I'm going to follow the discussion on the mailing list and find why
> > these codes were introduced.
>
> https://www.postgresql.org/message-id/12319.1521999065%40sss.pgh.pa.us

Thanks for sharing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

0003-Adding-a-function-exposing-memory-usage-of-local-backend.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Fujii Masao-4


On 2020/07/03 11:45, torikoshia wrote:
> On Wed, Jul 1, 2020 at 10:15 PM torikoshia <[hidden email]> wrote:
>> I'm going to do some renaming and transportations.
>>
>> - view name: pg_memory_contexts

I like more specific name like pg_backend_memory_contexts.
But I'd like to hear more opinions about the name from others.


>> - function name: pg_get_memory_contexts()
>> - source file: mainly src/backend/utils/mmgr/mcxt.c
>
> Attached an updated patch.

Thanks for updating the patch!

+       <structfield>level</structfield> <type>integer</type>

In catalog.sgml, "int4" and "int8" are used in other catalogs tables.
So "integer" in the above should be "int4"?

+       <structfield>total_bytes</structfield> <type>bigint</type>

"bigint" should be "int8"?

+       Identification information of the memory context. This field is truncated if the identification field is longer than 1024 characters

"characters" should be "bytes"?

It's a bit confusing to have both "This field" and "the identification field"
in one description. What about just "This field is truncated at 1024 bytes"?

+      <para>
+       Total bytes requested from malloc

Isn't it better not to use "malloc" in the description? For example,
what about something like "Total bytes allocated for this memory context"?

+#define PG_STAT_GET_MEMORY_CONTEXT_COLS 9

Isn't it better to rename this to PG_GET_MEMORY_CONTEXTS_COLS
for the consistency with the function name?

+ memset(nulls, 0, sizeof(nulls));

"values[]" also should be initialized with zero?

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao <[hidden email]>
wrote:

Thanks for your review!

> I like more specific name like pg_backend_memory_contexts.

Agreed.

When I was trying to add this function as statistics function,
I thought that naming pg_stat_getbackend_memory_context()
might make people regarded it as a "per-backend statistics
function", whose parameter is backend ID number.
So I removed "backend", but now there is no necessity to do
so.

> But I'd like to hear more opinions about the name from others.

I changed the name to pg_backend_memory_contexts for the time
being.


>> - function name: pg_get_memory_contexts()
>> - source file: mainly src/backend/utils/mmgr/mcxt.c


>> +       Identification information of the memory context. This field
>> is truncated if the identification field is longer than 1024
>> characters
>
> "characters" should be "bytes"?

Fixed, but I used "characters" while referring to the
descriptions on the manual of pg_stat_activity.query
below.

| By default the query text is truncated at 1024 characters;

It has nothing to do with this thread, but considering
multibyte characters, it also may be better to change it
to "bytes".


Regarding the other comments, I revised the patch as you pointed.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

0004-Adding-a-function-exposing-memory-usage-of-local-backend.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

Fujii Masao-4


On 2020/07/06 12:12, torikoshia wrote:

> On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao <[hidden email]> wrote:
>
> Thanks for your review!
>
>> I like more specific name like pg_backend_memory_contexts.
>
> Agreed.
>
> When I was trying to add this function as statistics function,
> I thought that naming pg_stat_getbackend_memory_context()
> might make people regarded it as a "per-backend statistics
> function", whose parameter is backend ID number.
> So I removed "backend", but now there is no necessity to do
> so.
>
>> But I'd like to hear more opinions about the name from others.
>
> I changed the name to pg_backend_memory_contexts for the time
> being.

+1


>>> - function name: pg_get_memory_contexts()
>>> - source file: mainly src/backend/utils/mmgr/mcxt.c
>
>
>>> +       Identification information of the memory context. This field is truncated if the identification field is longer than 1024 characters
>>
>> "characters" should be "bytes"?
>
> Fixed, but I used "characters" while referring to the
> descriptions on the manual of pg_stat_activity.query
> below.
>
> | By default the query text is truncated at 1024 characters;
>
> It has nothing to do with this thread, but considering
> multibyte characters, it also may be better to change it
> to "bytes".

Yeah, I agree we should write the separate patch fixing that. You will?
If not, I will do that later.


> Regarding the other comments, I revised the patch as you pointed.

Thanks for updating the patch! The patch basically looks good to me/
Here are some minor comments:

+#define MEMORY_CONTEXT_IDENT_SIZE 1024

This macro varible name sounds like the maximum allowed length of ident that
each menory context has. But actually this limits the maximum bytes of ident
to display. So I think that it's better to rename this macro to something like
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?

+#define PG_GET_MEMORY_CONTEXTS_COLS 9
+ Datum values[PG_GET_MEMORY_CONTEXTS_COLS];
+ bool nulls[PG_GET_MEMORY_CONTEXTS_COLS];

This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
for the consistency with the function name?

+{ oid => '2282', descr => 'statistics: information about all memory contexts of local backend',

Isn't it better to remove "statistics: " from the above description?

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>parent</structfield> <type>text</type>

There can be multiple memory contexts with the same name. So I'm afraid
that it's difficult to identify the actual parent memory context from this
"parent" column. This is ok when logging memory contexts by calling
MemoryContextStats() via gdb. Because child memory contexts are printed
just under their parent, with indents. But this doesn't work in the view.
To identify the actual parent memory or calculate the memory contexts tree
from the view, we might need to assign unique ID to each memory context
and display it. But IMO this is overkill. So I'm fine with current "parent"
column. Thought? Do you have any better idea?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Creating a function for exposing memory usage of backend process

torikoshia
On 2020-07-06 15:16, Fujii Masao wrote:

> On 2020/07/06 12:12, torikoshia wrote:
>> On Fri, Jul 3, 2020 at 7:33 PM Fujii Masao
>> <[hidden email]> wrote:
>>
>> Thanks for your review!
>>
>>> I like more specific name like pg_backend_memory_contexts.
>>
>> Agreed.
>>
>> When I was trying to add this function as statistics function,
>> I thought that naming pg_stat_getbackend_memory_context()
>> might make people regarded it as a "per-backend statistics
>> function", whose parameter is backend ID number.
>> So I removed "backend", but now there is no necessity to do
>> so.
>>
>>> But I'd like to hear more opinions about the name from others.
>>
>> I changed the name to pg_backend_memory_contexts for the time
>> being.
>
> +1
>
>
>>>> - function name: pg_get_memory_contexts()
>>>> - source file: mainly src/backend/utils/mmgr/mcxt.c
>>
>>
>>>> +       Identification information of the memory context. This field
>>>> is truncated if the identification field is longer than 1024
>>>> characters
>>>
>>> "characters" should be "bytes"?
>>
>> Fixed, but I used "characters" while referring to the
>> descriptions on the manual of pg_stat_activity.query
>> below.
>>
>> | By default the query text is truncated at 1024 characters;
>>
>> It has nothing to do with this thread, but considering
>> multibyte characters, it also may be better to change it
>> to "bytes".
>
> Yeah, I agree we should write the separate patch fixing that. You will?
> If not, I will do that later.
Thanks, I will try it!

>> Regarding the other comments, I revised the patch as you pointed.
>
> Thanks for updating the patch! The patch basically looks good to me/
> Here are some minor comments:
>
> +#define MEMORY_CONTEXT_IDENT_SIZE 1024
>
> This macro varible name sounds like the maximum allowed length of ident
> that
> each menory context has. But actually this limits the maximum bytes of
> ident
> to display. So I think that it's better to rename this macro to
> something like
> MEMORY_CONTEXT_IDENT_DISPLAY_SIZE. Thought?
Agreed.
MEMORY_CONTEXT_IDENT_DISPLAY_SIZE seems more accurate.

> +#define PG_GET_MEMORY_CONTEXTS_COLS 9
> + Datum values[PG_GET_MEMORY_CONTEXTS_COLS];
> + bool nulls[PG_GET_MEMORY_CONTEXTS_COLS];
>
> This macro variable name should be PG_GET_BACKEND_MEMORY_CONTEXTS_COLS
> for the consistency with the function name?

Thanks! Fixed it.

>
> +{ oid => '2282', descr => 'statistics: information about all memory
> contexts of local backend',
>
> Isn't it better to remove "statistics: " from the above description?

Yeah, it's my oversight.

>
> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>parent</structfield> <type>text</type>
>
> There can be multiple memory contexts with the same name. So I'm afraid
> that it's difficult to identify the actual parent memory context from
> this
> "parent" column. This is ok when logging memory contexts by calling
> MemoryContextStats() via gdb. Because child memory contexts are printed
> just under their parent, with indents. But this doesn't work in the
> view.
> To identify the actual parent memory or calculate the memory contexts
> tree
> from the view, we might need to assign unique ID to each memory context
> and display it. But IMO this is overkill. So I'm fine with current
> "parent"
> column. Thought? Do you have any better idea?
Indeed.
I also feel it's not usual to assign a unique ID, which
can vary every time the view displayed.

We show each context using a recursive function and this is
a kind of depth-first search. So, as far as I understand,
we can identify its parent using both the "parent" column
and the order of the rows.

Documenting these things may worth for who want to identify
the relation between parents and children.

Of course, in the relational model, the order of relation
does not have meaning so it's also unusual in this sense..


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

0005-Adding-a-function-exposing-memory-usage-of-local-backend.patch (16K) Download Attachment
12