[Patch] Optimize dropping of relation buffers using dlist


RE: [Patch] Optimize dropping of relation buffers using dlist

k.jamison@fujitsu.com
On Monday, September 28, 2020 11:50 AM, Tsunakawa-san wrote:

> From: Amit Kapila <[hidden email]>
> > I agree with the above two points.
>
> Thank you.  I'm relieved to know I didn't misunderstand.
>
>
> > > * Then, add a new function, say, smgrnblocks_cached() that simply
> > > returns
> > the cached block count, and DropRelFileNodeBuffers() uses it instead
> > of smgrnblocks().
> > >
> >
> > I am not sure if it worth adding a new function for this. Why not
> > simply add a boolean variable in smgrnblocks for this?
>
>
> One reason is that adding an argument requires modification of existing call
> sites (10 + a few).  Another is that, although this may be different for each
> person's taste, it's sometimes not easy to understand when a function call
> with true/false appears.  One such example is find_XXX(some_args,
> true/false), where the true/false represents missing_ok.  Another example is
> as follows.  I often wonder "what's the meaning of this false, and that true?"
>
>     if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false))
>         elog(ERROR, "InstallXLogFileSegment should not have failed");
>
> Fortunately, the new function is very short and doesn't duplicate much code.
> The function is a simple getter and the function name can convey the
> meaning straight (if the name is good.)
>
>
> > BTW, AFAICS, the latest patch
> > doesn't have code to address this point.
>
> Kirk-san, can you address this?  I don't mind much if you add an argument
> or a new function.

I may be missing something, so I'd like to check whether my understanding is correct,
as I'm confused about what exactly we mean by "cached value of nblocks".

As discussed upthread, smgrnblocks() does not always guarantee that it returns a
"cached" nblocks value, even in recovery.
When we enter this path in the recovery path of DropRelFileNodeBuffers(),
according to Tsunakawa-san:
>> * During recovery, DropRelFileNodeBuffers() gets the cached size of the relation fork.  If it is cached, trust it and optimize the buffer invalidation.  If it's not cached, we can't trust the return value of smgrnblocks() because it's the lseek(END) return value, so we avoid the optimization.

+ nTotalBlocks = smgrnblocks(smgr_reln, forkNum[j]);

But this comment in the smgrnblocks() source code says:
         * For now, we only use cached values in recovery due to lack of a shared
         * invalidation mechanism for changes in file size.
         */
        if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
                return reln->smgr_cached_nblocks[forknum];

So the nblocks returned in DropRelFileNodeBuffers is still not guaranteed to be a "cached value"?
And so we want to add a new function (I think that's less complicated than modifying smgrnblocks()):

/*
 * smgrnblocksvalid() -- Calculate the number of blocks that are cached in
 * the supplied relation.
 *
 * It is equivalent to calling smgrnblocks, but for now it is only used in
 * recovery, when DropRelFileNodeBuffers() is called, to ensure that only the
 * cached value, which is always valid, is used.
 *
 * This returns InvalidBlockNumber when smgr_cached_nblocks is not available
 * or when isCached is false.
 */
BlockNumber
smgrnblocksvalid(SMgrRelation reln, ForkNumber forknum, bool isCached)
{
        BlockNumber result;

        /*
         * For now, we only use cached values in recovery due to lack of a shared
         * invalidation mechanism for changes in file size.
         */
        if (InRecovery && isCached &&
                reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
                return reln->smgr_cached_nblocks[forknum];

        result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);

        reln->smgr_cached_nblocks[forknum] = result;

        if (!InRecovery && !isCached)
                return InvalidBlockNumber;

        return result;
}

Then in DropRelFileNodeBuffers:
+ nTotalBlocks = smgrnblocksvalid(smgr_reln, forkNum[j], true);

Is my understanding above correct?

Regards,
Kirk Jamison

RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
        From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> Is my understanding above correct?

No.  I simply meant DropRelFileNodeBuffers() calls the following function, and avoids the optimization if it returns InvalidBlockNumber.


BlockNumber
smgrcachednblocks(SMgrRelation reln, ForkNumber forknum)
{
        return reln->smgr_cached_nblocks[forknum];
}


Regards
Takayuki Tsunakawa


RE: [Patch] Optimize dropping of relation buffers using dlist

k.jamison@fujitsu.com
On Monday, September 28, 2020 5:08 PM, Tsunakawa-san wrote:

> From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> > Is my understanding above correct?
>
> No.  I simply meant DropRelFileNodeBuffers() calls the following function,
> and avoids the optimization if it returns InvalidBlockNumber.
>
>
> BlockNumber
> smgrcachednblocks(SMgrRelation reln, ForkNumber forknum) {
> return reln->smgr_cached_nblocks[forknum];
> }
Thank you for clarifying.

So in the new function, it goes something like:
        if (InRecovery)
        {
                if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
                        return reln->smgr_cached_nblocks[forknum];
                else
                        return InvalidBlockNumber;
        }

I've revised the patch and added the new function accordingly in the attached file.
I also did not remove the duplicate code from smgrnblocks because Amit-san mentioned
that when the caching for non-recovery cases is implemented, we can use it
for non-recovery cases as well.

Although I am not sure if the way it's written in DropRelFileNodeBuffers, using BlockNumberIsValid(nTotalBlocks), is okay:

                        nTotalBlocks = smgrcachednblocks(smgr_reln, forkNum[j]);
                        nBlocksToInvalidate = nTotalBlocks - firstDelBlock[j];

                        if (BlockNumberIsValid(nTotalBlocks) &&
                                nBlocksToInvalidate < BUF_DROP_FULLSCAN_THRESHOLD)
                        {
                                /* enter optimization loop */
                        }
                        else
                        {
                                /* full scan for each fork */
                        }

Regards,
Kirk Jamison

Attachment: v17-Optimize-DropRelFileNodeBuffers-during-recovery.patch (13K)
Attachment: v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch (1K)

Re: [Patch] Optimize dropping of relation buffers using dlist

Kyotaro Horiguchi-4
At Mon, 28 Sep 2020 08:57:36 +0000, "[hidden email]" <[hidden email]> wrote in

> On Monday, September 28, 2020 5:08 PM, Tsunakawa-san wrote:
>
> > From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> > > Is my understanding above correct?
> >
> > No.  I simply meant DropRelFileNodeBuffers() calls the following function,
> > and avoids the optimization if it returns InvalidBlockNumber.
> >
> >
> > BlockNumber
> > smgrcachednblocks(SMgrRelation reln, ForkNumber forknum) {
> > return reln->smgr_cached_nblocks[forknum];
> > }
>
> Thank you for clarifying.

FWIW, I (and maybe Amit) think that the property we need here is not
whether the value is cached or not, but the accuracy of the returned file
length, and that the "cached" property should be hidden behind the API.

Another reason for not adding this function is that the cached value is
not really reliable in a non-recovery environment.

> So in the new function, it goes something like:
>         if (InRecovery)
>         {
>                 if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
>                         return reln->smgr_cached_nblocks[forknum];
>                 else
>                         return InvalidBlockNumber;
>         }

If we add the new function, it should return InvalidBlockNumber
without consulting smgr_nblocks().

> I've revised the patch and added the new function accordingly in the attached file.
> I also did not remove the duplicate code from smgrnblocks because Amit-san mentioned
> that when the caching for non-recovery cases is implemented, we can use it
> for non-recovery cases as well.
>
> Although I am not sure if the way it's written in DropRelFileNodeBuffers,
> using BlockNumberIsValid(nTotalBlocks), is okay:
>
>                         nTotalBlocks = smgrcachednblocks(smgr_reln, forkNum[j]);
>                         nBlocksToInvalidate = nTotalBlocks - firstDelBlock[j];
>
>                         if (BlockNumberIsValid(nTotalBlocks) &&
>                                 nBlocksToInvalidate < BUF_DROP_FULLSCAN_THRESHOLD)
>                         {
>                                 /* enter optimization loop */
>                         }
>                         else
>                         {
>                                 /* full scan for each fork */
>                         }

Hmm. The current loop in DropRelFileNodeBuffers looks like this:

    if (InRecovery)
        for (each fork)
            if (the fork meets the criteria)
                <optimized dropping>
            else
                <full scan>

I think this is somewhat different from the current
discussion. Whether we sum up the number of blocks for all forks or
just use that of the main fork, we should take the full scan if we fail
to know the accurate size of any one of the forks. (In other words, it
is wasteful to run a full scan more than once for a single drop.)

Come to think of it, we can naturally sum up all forks' blocks, since
we need to call smgrnblocks for all forks anyway to know whether the
optimization is usable.

So that block would be something like this:

    for (each fork of the rel)
    {
        /* the function returns InvalidBlockNumber if !InRecovery */
        if (smgrnblocks returned InvalidBlockNumber)
        {
            total_blocks = InvalidBlockNumber;
            break;
        }
        total_blocks += nblocks of this fork;
    }

    /* <we could rely on the fact that InvalidBlockNumber is the maximum value> */
    if (total_blocks != InvalidBlockNumber && total_blocks < threshold)
        for (each fork of the rel)
            for (each block of the fork)
                <try dropping the buffer for the block>
    else
        <full scan dropping>
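Rendered as C, that block might look like the following (a sketch only;
smgrcachednblocks() and BUF_DROP_FULLSCAN_THRESHOLD are names floated in
this thread, not settled API):

        BlockNumber     nTotalBlocks = 0;
        BlockNumber     nForkBlocks;
        int             j;

        for (j = 0; j < nforks; j++)
        {
                /* returns InvalidBlockNumber unless a reliable cached size exists */
                nForkBlocks = smgrcachednblocks(smgr_reln, forkNum[j]);

                if (nForkBlocks == InvalidBlockNumber)
                {
                        nTotalBlocks = InvalidBlockNumber;
                        break;
                }
                nTotalBlocks += nForkBlocks;
        }

        if (nTotalBlocks != InvalidBlockNumber &&
                nTotalBlocks < BUF_DROP_FULLSCAN_THRESHOLD)
        {
                /* optimized way: drop the candidate blocks of each fork */
        }
        else
        {
                /* traditional way: one sequential scan over all of NBuffers */
        }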

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
In reply to this post by k.jamison@fujitsu.com
From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> I also did not remove the duplicate code from smgrnblocks because Amit-san
> mentioned that when the caching for non-recovery cases is implemented, we
> can use it for non-recovery cases as well.

But the extra code is not used now.  The code for future usage should be added when it becomes necessary.  Duplicate code may make people think that you should add an argument to smgrnblocks() instead of adding a new function.

+ if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
+ return reln->smgr_cached_nblocks[forknum];
+ else
+ return InvalidBlockNumber;

Anyway, the else block is redundant, as the variable contains InvalidBlockNumber.

Also, as Amit-san mentioned, the cause of the slight performance regression when shared_buffers is small needs to be investigated and addressed.  I think you can do it after sharing the performance result with a large shared_buffers.

I found no other problem.


Regards
Takayuki Tsunakawa


Re: [Patch] Optimize dropping of relation buffers using dlist

akapila
On Tue, Sep 29, 2020 at 7:21 AM [hidden email]
<[hidden email]> wrote:
>
> From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
>
> Also, as Amit-san mentioned, the cause of the slight performance regression when shared_buffers is small needs to be investigated and addressed.
>

Yes, I think it is mainly because of the extra instructions added in the
optimized code, which don't make up for the loss when the size of shared
buffers is small (at 128MB there are only 16384 buffers, so the
sequential scan is already cheap).

--
With Regards,
Amit Kapila.



RE: [Patch] Optimize dropping of relation buffers using dlist

k.jamison@fujitsu.com
In reply to this post by Kyotaro Horiguchi-4
On Tuesday, September 29, 2020 10:35 AM, Horiguchi-san wrote:

> FWIW, I (and maybe Amit) think that the property we need here is not
> whether the value is cached or not, but the accuracy of the returned file
> length, and that the "cached" property should be hidden behind the API.
>
> Another reason for not adding this function is that the cached value is not
> really reliable in a non-recovery environment.
>
> > So in the new function, it goes something like:
> >         if (InRecovery)
> >         {
> >                 if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
> >                         return reln->smgr_cached_nblocks[forknum];
> >                 else
> >                         return InvalidBlockNumber;
> >         }
>
> If we add the new function, it should return InvalidBlockNumber without
> consulting smgr_nblocks().
So here's how I revised it:

BlockNumber
smgrcachednblocks(SMgrRelation reln, ForkNumber forknum)
{
        if (InRecovery)
        {
                if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
                        return reln->smgr_cached_nblocks[forknum];
        }
        return InvalidBlockNumber;
}

> Hmm. The current loop in DropRelFileNodeBuffers looks like this:
>
>     if (InRecovery)
>         for (each fork)
>             if (the fork meets the criteria)
>                 <optimized dropping>
>             else
>                 <full scan>
>
> I think this is somewhat different from the current discussion. Whether we
> sum up the number of blocks for all forks or just use that of the main fork,
> we should take the full scan if we fail to know the accurate size of any one
> of the forks. (In other words, it is wasteful to run a full scan more than
> once for a single drop.)
>
> Come to think of it, we can naturally sum up all forks' blocks, since we need
> to call smgrnblocks for all forks anyway to know whether the optimization is
> usable.
I understand. We really don't have to enter the optimization when we know the
file size is inaccurate. That also makes the patch simpler.

> So that block would be something like this:
>
>     for (each fork of the rel)
>     {
>         /* the function returns InvalidBlockNumber if !InRecovery */
>         if (smgrnblocks returned InvalidBlockNumber)
>         {
>             total_blocks = InvalidBlockNumber;
>             break;
>         }
>         total_blocks += nblocks of this fork;
>     }
>
>     /* <we could rely on the fact that InvalidBlockNumber is the maximum value> */
>     if (total_blocks != InvalidBlockNumber && total_blocks < threshold)
>         for (each fork of the rel)
>             for (each block of the fork)
>                 <try dropping the buffer for the block>
>     else
>         <full scan dropping>
I followed this logic in the attached patch.
Thank you very much for the thoughtful reviews.

Performance measurement for large shared buffers to follow.

Best regards,
Kirk Jamison


Attachment: v18-Optimize-DropRelFileNodeBuffers-during-recovery.patch (14K)
Attachment: v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch (1K)

RE: [Patch] Optimize dropping of relation buffers using dlist

k.jamison@fujitsu.com
Hi,

I revised the patch again. Attached is V19.
The previous patch's algorithm missed entering the optimization loop.
So I corrected that and removed the extra function I added
in the previous versions.

The revised patch goes something like this:
        for (forks of rel)
        {
                if (smgrcachednblocks() == InvalidBlockNumber)
                        break;          /* go to full scan */
                if (nBlocksToInvalidate < BUF_DROP_FULLSCAN_THRESHOLD)
                        for (blocks of the fork)
                                <drop the buffer>
                else
                        break;          /* go to full scan */
        }
        <execute full scan>

Recovery performance measurement results are below.
But it seems there is overhead even with large shared buffers.

| s_b   | master | patched | %reg  |
|-------|--------|---------|-------|
| 128MB | 36.052 | 39.451  | 8.62% |
| 1GB   | 21.731 | 21.73   | 0.00% |
| 20GB  | 24.534 | 25.137  | 2.40% |
| 100GB | 30.54  | 31.541  | 3.17% |

I'll investigate further. Or if you have any feedback or advice, I'd appreciate it.

Machine specs used for testing:
RHEL7, 8 core, 256 GB RAM, xfs

Configuration:
wal_level = replica
autovacuum = off
full_page_writes = off

# For streaming replication from primary.
synchronous_commit = remote_write
synchronous_standby_names = ''

# For Standby.
#hot_standby = on
#primary_conninfo

shared_buffers = 128MB
# 1GB, 20GB, 100GB

Just in case it helps,
I also attached the recovery log 018_wal_optimize_node_replica.log,
with some ereport calls that print whether we enter the optimization loop or do a full scan.

Regards,
Kirk Jamison

Attachment: v19-Optimize-DropRelFileNodeBuffers-during-recovery.patch (11K)
Attachment: v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch (1K)
Attachment: 018_wal_optimize_node_replica.log (214K)

RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> Recovery performance measurement results below.
> But it seems there are overhead even with large shared buffers.
>
> | s_b   | master | patched | %reg  |
> |-------|--------|---------|-------|
> | 128MB | 36.052 | 39.451  | 8.62% |
> | 1GB   | 21.731 | 21.73   | 0.00% |
> | 20GB  | 24.534 | 25.137  | 2.40% |
> | 100GB | 30.54  | 31.541  | 3.17% |

Did you really check that the optimization path is entered and the traditional path is never entered?

With the following code, when the main fork does not meet the optimization criteria, the other forks are not optimized either.  You want to determine each fork's optimization separately, don't you?

+ /* If blocks are invalid, exit the optimization and execute full scan */
+ if (nTotalBlocks == InvalidBlockNumber)
+ break;


+ else
+ break;
+ }
  for (i = 0; i < NBuffers; i++)


Regards
Takayuki Tsunakawa






Re: [Patch] Optimize dropping of relation buffers using dlist

akapila
On Thu, Oct 1, 2020 at 8:11 AM [hidden email]
<[hidden email]> wrote:

>
> From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> > Recovery performance measurement results below.
> > But it seems there are overhead even with large shared buffers.
> >
> > | s_b   | master | patched | %reg  |
> > |-------|--------|---------|-------|
> > | 128MB | 36.052 | 39.451  | 8.62% |
> > | 1GB   | 21.731 | 21.73   | 0.00% |
> > | 20GB  | 24.534 | 25.137  | 2.40% |
> > | 100GB | 30.54  | 31.541  | 3.17% |
>
> Did you really check that the optimization path is entered and the traditional path is never entered?
>

I have one idea for performance testing. We can even test this for
non-recovery paths by removing the recovery-related checks, i.e. only
use it when there are cached blocks. You can do this if testing via the
recovery path is difficult, because in the end the performance should be
the same for the recovery and non-recovery paths.

--
With Regards,
Amit Kapila.



RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
From: Amit Kapila <[hidden email]>
> I have one idea for performance testing. We can even test this for
> non-recovery paths by removing the recovery-related check like only
> use it when there are cached blocks. You can do this if testing via
> recovery path is difficult because at the end performance should be
> same for recovery and non-recovery paths.

That's a good idea.


Regards
Takayuki Tsunakawa



Re: [Patch] Optimize dropping of relation buffers using dlist

Kyotaro Horiguchi-4
In reply to this post by tsunakawa.takay@fujitsu.com
At Thu, 1 Oct 2020 02:40:52 +0000, "[hidden email]" <[hidden email]> wrote in
> With the following code, when the main fork does not meet the
> optimization criteria, other forks are not optimized as well.  You
> want to determine each fork's optimization separately, don't you?

In more detail, if smgrcachednblocks() returned InvalidBlockNumber for
any of the forks, we should give up the optimization entirely, since we
need to run a full scan anyway.  On the other hand, if any of the
forks is smaller than the threshold, we can still use the optimization
as long as we know the accurate block number of all the forks.

Still, I prefer to use the total block number of all forks, since we
visit all the forks anyway.  Is there any reason to exclude forks other
than the main fork while we already visit all of them?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
From: Kyotaro Horiguchi <[hidden email]>
> In more detail, if smgrcachednblocks() returned InvalidBlockNumber for
> any of the forks, we should give up the optimization entirely, since we
> need to run a full scan anyway.  On the other hand, if any of the
> forks is smaller than the threshold, we can still use the optimization
> as long as we know the accurate block number of all the forks.

Ah, I got your point (many eyes in open source development are nice).  Still, I feel it's better to treat each fork separately, because the inner loop in the traditional path may be able to skip forks that have already been processed in the optimization path.  For example, if the forks[] array contains {fsm, vm, main} in this order (I know main is usually put at the beginning), fsm and vm are processed in the optimization path, and the inner loop in the traditional path can skip fsm and vm.

> Still, I prefer to use the total block number of all forks, since we
> visit all the forks anyway.  Is there any reason to exclude forks other
> than the main fork while we already visit all of them?

When the number of cached blocks for the main fork is below the threshold but the total cached blocks of all forks exceeds it, the optimization is skipped.  I think that's a waste (mottainai).


Regards
Takayuki Tsunakawa






Re: [Patch] Optimize dropping of relation buffers using dlist

Kyotaro Horiguchi-4
At Thu, 1 Oct 2020 04:20:27 +0000, "[hidden email]" <[hidden email]> wrote in
> From: Kyotaro Horiguchi <[hidden email]>
> > In more detail, if smgrcachednblocks() returned InvalidBlockNumber for
> > any of the forks, we should give up the optimization entirely, since we
> > need to run a full scan anyway.  On the other hand, if any of the
> > forks is smaller than the threshold, we can still use the optimization
> > as long as we know the accurate block number of all the forks.
>
> Ah, I got your point (many eyes in open source development are nice).  Still, I feel it's better to treat each fork separately, because the inner loop in the traditional path may be able to skip forks that have already been processed in the optimization path.  For example, if the forks[] array contains {fsm, vm, main} in this order (I know main is usually put at the beginning), fsm and vm are processed in the optimization path, and the inner loop in the traditional path can skip fsm and vm.

I thought that the advantage of this optimization is that we don't
need to visit all buffers?  If we need to run a full scan for any
reason, there's no point in looking up already-visited buffers
again. That's just wasted cycles.  Am I missing something?


> > Still, I prefer to use the total block number of all forks, since we
> > visit all the forks anyway.  Is there any reason to exclude forks other
> > than the main fork while we already visit all of them?
>
> When the number of cached blocks for the main fork is below the threshold but the total cached blocks of all forks exceeds it, the optimization is skipped.  I think that's a waste (mottainai).

I don't understand. If we choose the optimized dropping, the reason is
that the number of buffer lookups is fewer than a certain threshold. Why
do you think that the kind of fork a buffer belongs to is relevant to
the criteria?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



RE: [Patch] Optimize dropping of relation buffers using dlist

k.jamison@fujitsu.com
In reply to this post by akapila
On Thursday, October 1, 2020 11:49 AM, Amit Kapila wrote:

> On Thu, Oct 1, 2020 at 8:11 AM [hidden email]
> <[hidden email]> wrote:
> >
> > From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> > > Recovery performance measurement results below.
> > > But it seems there are overhead even with large shared buffers.
> > >
> > > | s_b   | master | patched | %reg  |
> > > |-------|--------|---------|-------|
> > > | 128MB | 36.052 | 39.451  | 8.62% |
> > > | 1GB   | 21.731 | 21.73   | 0.00% |
> > > | 20GB  | 24.534 | 25.137  | 2.40% |
> > > | 100GB | 30.54  | 31.541  | 3.17% |
> >
> > Did you really check that the optimization path is entered and the traditional
> path is never entered?
> >

Oops, thanks Tsunakawa-san for catching that.
I will fix it in the next patch, replacing break with continue.

> I have one idea for performance testing. We can even test this for
> non-recovery paths by removing the recovery-related checks, i.e. only use it
> when there are cached blocks. You can do this if testing via the recovery
> path is difficult, because in the end the performance should be the same for
> recovery and non-recovery paths.

For the non-recovery path, did you by any chance mean
measuring the cache hit rate for varying shared_buffers?

SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
  pg_statio_user_tables;


Regards,
Kirk Jamison

RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
From: Jamison, Kirk/ジャミソン カーク <[hidden email]>
> For non-recovery path, did you mean by any chance
> measuring the cache hit rate for varying shared_buffers?

No.  You can test the speed of DropRelFileNodeBuffers() during normal operation, i.e. by running TRUNCATE in psql, instead of performing recovery.  To enable that, you can just remove the checks for recovery, i.e. remove the InRecovery check and the check of whether the value is cached.



Regards
Takayuki Tsunakawa




RE: [Patch] Optimize dropping of relation buffers using dlist

tsunakawa.takay@fujitsu.com
In reply to this post by Kyotaro Horiguchi-4
From: Kyotaro Horiguchi <[hidden email]>
> I thought that the advantage of this optimization is that we don't
> need to visit all buffers?  If we need to run a full-scan for any
> reason, there's no point in looking-up already-visited buffers
> again. That's just wastefull cycles.  Am I missing somethig?
>
> I don't understand. If we chose to the optimized dropping, the reason
> is the number of buffer lookup is fewer than a certain threashold. Why
> do you think that the fork kind a buffer belongs to is relevant to the
> criteria?

I rethought this, and you certainly have a point, but...  OK, I think I understand now.  I was thinking about it in an overly complicated way.  In other words, you're suggesting "Let's simply treat all forks as one relation to determine whether to optimize," right?  That is, the code simply becomes:

Sum up the number of buffers to invalidate in all forks;
if (the cached sizes of all forks are valid &&
    # of buffers to invalidate < THRESHOLD)
{
        do the optimized way;
        return;
}
do the traditional way;

This will be simple, and I'm +1.


Regards
Takayuki Tsunakawa






RE: [Patch] Optimize dropping of relation buffers using dlist

k.jamison@fujitsu.com
On Thursday, October 1, 2020 4:52 PM, Tsunakawa-san wrote:
 

> From: Kyotaro Horiguchi <[hidden email]>
> > I thought that the advantage of this optimization is that we don't
> > need to visit all buffers?  If we need to run a full-scan for any
> > reason, there's no point in looking-up already-visited buffers again.
> > That's just wastefull cycles.  Am I missing somethig?
> >
> > I don't understand. If we chose to the optimized dropping, the reason
> > is the number of buffer lookup is fewer than a certain threashold. Why
> > do you think that the fork kind a buffer belongs to is relevant to the
> > criteria?
>
> I rethought this, and you certainly have a point, but...  OK, I think I
> understand now.  I was thinking about it in an overly complicated way.
> In other words, you're suggesting "Let's simply treat all forks as one
> relation to determine whether to optimize," right?  That is, the code
> simply becomes:
>
> Sum up the number of buffers to invalidate in all forks;
> if (the cached sizes of all forks are valid &&
>     # of buffers to invalidate < THRESHOLD)
> {
>         do the optimized way;
>         return;
> }
> do the traditional way;
>
> This will be simple, and I'm +1.
This is actually close to the v18 I posted trying Horiguchi-san's approach, but that
patch had a bug. So attached is an updated version (v20) trying this approach again.
I hope it's bug-free this time.

Regards,
Kirk Jamison
 

Attachment: v20-Optimize-DropRelFileNodeBuffers-during-recovery.patch (11K)
Attachment: v1-Prevent-invalidating-blocks-in-smgrextend-during-recovery.patch (1K)

Re: [Patch] Optimize dropping of relation buffers using dlist

Kyotaro Horiguchi-4
At Thu, 1 Oct 2020 12:55:34 +0000, "[hidden email]" <[hidden email]> wrote in

> On Thursday, October 1, 2020 4:52 PM, Tsunakawa-san wrote:
>  
> > From: Kyotaro Horiguchi <[hidden email]>
> > > I thought that the advantage of this optimization is that we don't
> > > need to visit all buffers?  If we need to run a full scan for any
> > > reason, there's no point in looking up already-visited buffers again.
> > > That's just wasted cycles.  Am I missing something?
> > >
> > > I don't understand. If we choose the optimized dropping, the reason
> > > is that the number of buffer lookups is fewer than a certain threshold.
> > > Why do you think that the kind of fork a buffer belongs to is relevant
> > > to the criteria?
> >
> > I rethought this, and you certainly have a point, but...  OK, I think I
> > understand now.  I was thinking about it in an overly complicated way.
> > In other words, you're suggesting "Let's simply treat all forks as one
> > relation to determine whether to optimize," right?  That is, the code
> > simply becomes:

Exactly.  The concept of the threshold is that if we would have to
repeat the buffer look-up more times than that, we consider a single
full scan more efficient.  Since we know we are going to drop the
buffers of all (or the specified) forks of the relation at once, the
number of look-ups is naturally the sum of the expected numbers of
buffers of all the forks.
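For instance, such a threshold could be tied to NBuffers (the macro name
follows this thread, but the divisor below is only an assumption for
illustration, not a value settled here):

        /*
         * Illustrative assumption: take the optimized path only when the
         * expected number of individual look-ups is a small fraction of one
         * sequential pass over all NBuffers buffer headers.
         */
        #define BUF_DROP_FULLSCAN_THRESHOLD     ((uint64) (NBuffers / 32))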

> > whether to optimize," right?  That is, the code simply becomes:
> >
> > Sum up the number of buffers to invalidate in all forks;
> > if (the cached sizes of all forks are valid &&
> >     # of buffers to invalidate < THRESHOLD)
> > {
> >         do the optimized way;
> >         return;
> > }
> > do the traditional way;
> >
> > This will be simple, and I'm +1.

Thanks!

> This is actually close to the v18 I posted trying Horiguchi-san's approach,
> but that patch had a bug. So attached is an updated version (v20) trying
> this approach again. I hope it's bug-free this time.

Thanks for the new version.

- * XXX currently it sequentially searches the buffer pool, should be
- * changed to more clever ways of searching.  However, this routine
- * is used only in code paths that aren't very performance-critical,
- * and we shouldn't slow down the hot paths to make it faster ...
+ * XXX The relation might have extended before this, so this path is

The following description is found in the comment for FlushRelationBuffers.

> * XXX currently it sequentially searches the buffer pool, should be
> * changed to more clever ways of searching.  This routine is not
> * used in any performance-critical code paths, so it's not worth
> * adding additional overhead to normal paths to make it go faster;
> * but see also DropRelFileNodeBuffers.

This looks to me like "We won't do that kind of optimization for
FlushRelationBuffers, but DropRelFileNodeBuffers would need it".  If
so, don't we need to revise that comment as well?

- * XXX currently it sequentially searches the buffer pool, should be
- * changed to more clever ways of searching.  However, this routine
- * is used only in code paths that aren't very performance-critical,
- * and we shouldn't slow down the hot paths to make it faster ...
+ * XXX The relation might have extended before this, so this path is
+ * only optimized during recovery when we can get a reliable cached
+ * value of blocks for specified relation.  In addition, it is safe to
+ * do this since there are no other processes but the startup process
+ * that changes the relation size during recovery.  Otherwise, or if
+ * not in recovery, proceed to usual invalidation process, where it
+ * sequentially searches the buffer pool.

This should no longer be an XXX comment.  It seems to me to describe
things in too much detail for this function's level. How about something
like the following? (except for its syntax, or phrasing :p)

===
If the expected maximum number of buffers to drop is small enough
compared to NBuffers, individual buffers are located by
BufTableLookup. Otherwise we scan through all buffers. Snnce we
mustn't leave a buffer behind, we take the latter way unless the
number is not reliably identified.  See smgrcachednblocks() for
details.
===

(I'm still mildly opposed to the function name, which seems to expose
 too much detail.)

+ * Get the total number of cached blocks and to-be-invalidated blocks
+ * of the relation.  If a fork's nblocks is not valid, break the loop.

The number of file blocks is not usually equal to the number of
existing buffers for the file. We might need to explain that
limitation here.


+ for (j = 0; j < nforks; j++)

Though I understand that j is meant to be associated with the fork
number, I'm a bit uncomfortable that j is used for the outermost
loop.

+ for (curBlock = firstDelBlock[j]; curBlock < nTotalBlocks; curBlock++)

Mmm. We should compare curBlock with the number of blocks of the fork,
not the total of all forks.

+ uint32 newHash; /* hash value for newTag */
+ BufferTag newTag; /* identity of requested block */
+ LWLock   *newPartitionLock; /* buffer partition lock for it */

It seems to be copied from somewhere, but the buffer is not new at
all.

+ if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
+ bufHdr->tag.forkNum == forkNum[j] &&
+ bufHdr->tag.blockNum == curBlock)
+ InvalidateBuffer(bufHdr); /* releases spinlock */

I think it cannot happen that the block is used for a different block
of the same relation-fork, but it could be safer to check
bufHdr->tag.blockNum >= firstDelBlock[j] instead.
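Putting these points together, the lookup part of the loop might look
roughly like this (a sketch against the existing buffer-manager API;
nForkBlocks[] is an assumed per-fork cached size array, per the comment
above about not comparing with the total):

        for (curBlock = firstDelBlock[j]; curBlock < nForkBlocks[j]; curBlock++)
        {
                BufferTag       bufTag;
                uint32          bufHash;
                LWLock     *bufPartitionLock;
                int             buf_id;
                BufferDesc *bufHdr;
                uint32          buf_state;

                /* create a tag so we can look up the buffer */
                INIT_BUFFERTAG(bufTag, rnode.node, forkNum[j], curBlock);

                /* determine its hash code and partition lock ID */
                bufHash = BufTableHashCode(&bufTag);
                bufPartitionLock = BufMappingPartitionLock(bufHash);

                /* check whether the block is in the buffer pool at all */
                LWLockAcquire(bufPartitionLock, LW_SHARED);
                buf_id = BufTableLookup(&bufTag, bufHash);
                LWLockRelease(bufPartitionLock);

                if (buf_id < 0)
                        continue;

                bufHdr = GetBufferDescriptor(buf_id);

                /*
                 * Recheck under the header spinlock: the buffer could have been
                 * evicted and reused for another page after the mapping lock
                 * was released.
                 */
                buf_state = LockBufHdr(bufHdr);

                if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
                        bufHdr->tag.forkNum == forkNum[j] &&
                        bufHdr->tag.blockNum >= firstDelBlock[j])
                        InvalidateBuffer(bufHdr);       /* releases spinlock */
                else
                        UnlockBufHdr(bufHdr, buf_state);
        }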


+/*
+ * smgrcachednblocks() -- Calculate the number of blocks that are cached in
+ * the supplied relation.
+ *
+ * It is equivalent to calling smgrnblocks, but only used in recovery for now
+ * when DropRelFileNodeBuffers() is called.  This ensures that only cached value
+ * is used which is always valid in recovery, since there is no shared
+ * invalidation mechanism that is implemented yet for changes in file size.
+ *
+ * This returns an InvalidBlockNumber when smgr_cached_nblocks is not available
+ * and when not in recovery.

Isn't it too concrete? We need to mention the buggy-kernel issue here
rather than the callers' behavior.

And if the comment is correct, we should Assert(InRecovery) at the
beginning of this function.
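With that assertion, the getter from upthread would reduce to just this
(a sketch):

BlockNumber
smgrcachednblocks(SMgrRelation reln, ForkNumber forknum)
{
        /* Only recovery guarantees that the cached value is trustworthy. */
        Assert(InRecovery);

        return reln->smgr_cached_nblocks[forknum];
}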

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [Patch] Optimize dropping of relation buffers using dlist

Kyotaro Horiguchi-4
At Fri, 02 Oct 2020 11:44:46 +0900 (JST), Kyotaro Horiguchi <[hidden email]> wrote in

> At Thu, 1 Oct 2020 12:55:34 +0000, "[hidden email]" <[hidden email]> wrote in
> - * XXX currently it sequentially searches the buffer pool, should be
> - * changed to more clever ways of searching.  However, this routine
> - * is used only in code paths that aren't very performance-critical,
> - * and we shouldn't slow down the hot paths to make it faster ...
> + * XXX The relation might have extended before this, so this path is
> + * only optimized during recovery when we can get a reliable cached
> + * value of blocks for specified relation.  In addition, it is safe to
> + * do this since there are no other processes but the startup process
> + * that changes the relation size during recovery.  Otherwise, or if
> + * not in recovery, proceed to usual invalidation process, where it
> + * sequentially searches the buffer pool.
>
> This should no longer be an XXX comment.  It seems to me to describe
> things in too much detail for this function's level. How about something
> like the following? (except for its syntax, or phrasing :p)
>
> ===
> If the expected maximum number of buffers to drop is small enough
> compared to NBuffers, individual buffers are located by
> BufTableLookup. Otherwise we scan through all buffers. Snnce we
> mustn't leave a buffer behind, we take the latter way unless the
> number is not reliably identified.  See smgrcachednblocks() for
> details.
> ===

The second-to-last phrase is inverted, and there are some typos. FWIW,
this is the revised version.

====
If we are expected to drop few enough buffers, we locate individual
buffers using BufTableLookup. Otherwise we scan through all
buffers. Since we mustn't leave a buffer behind, we take the latter
way unless the sizes of all the involved forks are known to be
accurate. See smgrcachednblocks() for details.
====

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

