WIP: Avoid creation of the free space map for small tables

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

WIP: Avoid creation of the free space map for small tables

John Naylor
Hi all,
A while back, Robert Haas noticed that the space taken up by very
small tables is dominated by the FSM [1]. Tom suggested that we could
prevent creation of the FSM until the heap has reached a certain
threshold size [2]. Attached is a WIP patch to implement that. I've
also attached a SQL script to demonstrate the change in behavior for
various scenarios.

The behavior that allows the simplest implementation I thought of is as follows:

-The FSM isn't created if the heap has fewer than 10 blocks (or
whatever). If the last known good block has insufficient space, try
every block before extending the heap.

-If a heap with a FSM is truncated back to below the threshold, the
FSM stays around and can be used as usual.

-If the heap tuples are all deleted, the FSM stays but has no leaf
blocks (same as on master). Although it exists, it won't be
re-extended until the heap re-passes the threshold.

--
Some notes:

-For normal mode, I taught fsm_set_and_search() to switch to a
non-extending buffer call, but the biggest missing piece is WAL
replay. I couldn't find a non-extending equivalent of
XLogReadBufferExtended(), so I might have to create one.

-There'll need to be some performance testing to make sure there's no
regression, and to choose a good value for the threshold. I'll look
into that, but if anyone has any ideas for tests, that'll help this
effort along.

-A possible TODO item is to teach pg_upgrade not to link FSMs for
small heaps. I haven't look into the feasibility of that, however.

-RelationGetBufferForTuple() now has two boolean variables that mean
"don't use the FSM", but with different behaviors. To avoid confusion,
I've renamed use_fsm to always_extend and revised the commentary
accordingly.

-I've only implemented this for heaps, because indexes (at least
B-tree) don't seem to be as eager to create a FSM. I haven't looked at
the code, however.

--
[1] https://www.postgresql.org/message-id/CA%2BTgmoac%2B6qTNp2U%2BwedY8-PU6kK_b6hbdhR5xYGBG3GtdFcww%40mail.gmail.com
[2] https://www.postgresql.org/message-id/11360.1345502641%40sss.pgh.pa.us

--
I'll add this to the November commitfest.

-John Naylor

fsmtest.sql (1K) Download Attachment
v1-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

Thomas Munro-3
On Sat, Oct 6, 2018 at 7:47 AM John Naylor <[hidden email]> wrote:
> A while back, Robert Haas noticed that the space taken up by very
> small tables is dominated by the FSM [1]. Tom suggested that we could
> prevent creation of the FSM until the heap has reached a certain
> threshold size [2]. Attached is a WIP patch to implement that. I've
> also attached a SQL script to demonstrate the change in behavior for
> various scenarios.

Hi John,

You'll need to tweak the test in contrib/pageinspect/sql/page.sql,
because it's currently asserting that there is an FSM on a small table
so make check-world fails.

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
On 10/6/18, Thomas Munro <[hidden email]> wrote:

> On Sat, Oct 6, 2018 at 7:47 AM John Naylor <[hidden email]> wrote:
>> A while back, Robert Haas noticed that the space taken up by very
>> small tables is dominated by the FSM [1]. Tom suggested that we could
>> prevent creation of the FSM until the heap has reached a certain
>> threshold size [2]. Attached is a WIP patch to implement that. I've
>> also attached a SQL script to demonstrate the change in behavior for
>> various scenarios.
>
> Hi John,
>
> You'll need to tweak the test in contrib/pageinspect/sql/page.sql,
> because it's currently asserting that there is an FSM on a small table
> so make check-world fails.
Whoops, sorry about that; the attached patch passes make check-world.
While looking into that, I also found a regression: If the cached
target block is the last block in the relation and there is no free
space, that block will be tried twice. That's been fixed as well.

Thanks,
-John Naylor

v2-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

Tom Lane-2
John Naylor <[hidden email]> writes:
> On 10/6/18, Thomas Munro <[hidden email]> wrote:
>> On Sat, Oct 6, 2018 at 7:47 AM John Naylor <[hidden email]> wrote:
>>> A while back, Robert Haas noticed that the space taken up by very
>>> small tables is dominated by the FSM [1]. Tom suggested that we could
>>> prevent creation of the FSM until the heap has reached a certain
>>> threshold size [2]. Attached is a WIP patch to implement that.

BTW, don't we need a similar hack for visibility maps?

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
On 10/7/18, Tom Lane <[hidden email]> wrote:
> John Naylor <[hidden email]> writes:
>> On 10/6/18, Thomas Munro <[hidden email]> wrote:
>>> On Sat, Oct 6, 2018 at 7:47 AM John Naylor <[hidden email]> wrote:
>>>> A while back, Robert Haas noticed that the space taken up by very
>>>> small tables is dominated by the FSM [1]. Tom suggested that we could
>>>> prevent creation of the FSM until the heap has reached a certain
>>>> threshold size [2]. Attached is a WIP patch to implement that.
>
> BTW, don't we need a similar hack for visibility maps?

The FSM is the bigger bang for the buck, and fairly simple to do, but
it would be nice to do something about VMs as well. I'm not sure if
simply lacking a VM would be as simple (or as free of downsides) as
for the FSM. I haven't studied the VM code in detail, however.

-John Naylor

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
In reply to this post by John Naylor
On Sat, Oct 6, 2018 at 12:17 AM John Naylor <[hidden email]> wrote:
>
> -There'll need to be some performance testing to make sure there's no
> regression, and to choose a good value for the threshold. I'll look
> into that, but if anyone has any ideas for tests, that'll help this
> effort along.
>

Can you try with a Copy command which copies just enough tuples to
fill the pages equivalent to HEAP_FSM_EXTENSION_THRESHOLD?  It seems
to me in such a case patch will try each of the blocks multiple times.
  It looks quite lame that we have to try again and again the blocks
which we have just filled by ourselves but may be that doesn't matter
much as the threshold value is small.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
In reply to this post by John Naylor
On Sat, Oct 6, 2018 at 12:17 AM John Naylor <[hidden email]> wrote:

>
> Hi all,
> A while back, Robert Haas noticed that the space taken up by very
> small tables is dominated by the FSM [1]. Tom suggested that we could
> prevent creation of the FSM until the heap has reached a certain
> threshold size [2]. Attached is a WIP patch to implement that. I've
> also attached a SQL script to demonstrate the change in behavior for
> various scenarios.
>
> The behavior that allows the simplest implementation I thought of is as follows:
>
> -The FSM isn't created if the heap has fewer than 10 blocks (or
> whatever). If the last known good block has insufficient space, try
> every block before extending the heap.
>
> -If a heap with a FSM is truncated back to below the threshold, the
> FSM stays around and can be used as usual.
>
> -If the heap tuples are all deleted, the FSM stays but has no leaf
> blocks (same as on master). Although it exists, it won't be
> re-extended until the heap re-passes the threshold.
>
> --
> Some notes:
>
> -For normal mode, I taught fsm_set_and_search() to switch to a
> non-extending buffer call, but the biggest missing piece is WAL
> replay.
>

fsm_set_and_search()
{
..
+ /*
+ * For heaps we prevent extension of the FSM unless the number of pages
+ * exceeds
HEAP_FSM_EXTENSION_THRESHOLD. For tables that don't already
+ * have a FSM, this will save an inode and a few kB
of space.
+ * For sane threshold values, the FSM address will be zero, so we
+ * don't bother dealing with
anything else.
+ */
+ if (rel->rd_rel->relkind == RELKIND_RELATION
+ && addr.logpageno == 0)

I am not sure if this is a solid way to avoid creating FSM.  What if
fsm_set_and_search gets called for the level other than 0?   Also,
when the relation has blocks more than HEAP_FSM_EXTENSION_THRESHOLD,
then first time when vacuum will try to record the free space in the
page, won't it skip recording free space for first
HEAP_FSM_EXTENSION_THRESHOLD pages?

I think you have found a good way to avoid creating FSM, but can't we
use some simpler technique like if the FSM fork for a relation doesn't
exist, then check the heapblk number for which we try to update the
FSM and if it is lesser than HEAP_FSM_EXTENSION_THRESHOLD, then avoid
creating the FSM.

> I couldn't find a non-extending equivalent of
> XLogReadBufferExtended(), so I might have to create one.
>

I think it would be better if we can find a common way to avoid
creating FSM both during DO and REDO time.  It might be possible if
somethin like what I have said above is feasible.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
On 10/13/18, Amit Kapila <[hidden email]> wrote:

> On Sat, Oct 6, 2018 at 12:17 AM John Naylor <[hidden email]> wrote:
>> -For normal mode, I taught fsm_set_and_search() to switch to a
>> non-extending buffer call, but the biggest missing piece is WAL
>> replay.
>>
>
> fsm_set_and_search()
> {
> ..
> + /*
> + * For heaps we prevent extension of the FSM unless the number of pages
> + * exceeds
> HEAP_FSM_EXTENSION_THRESHOLD. For tables that don't already
> + * have a FSM, this will save an inode and a few kB
> of space.
> + * For sane threshold values, the FSM address will be zero, so we
> + * don't bother dealing with
> anything else.
> + */
> + if (rel->rd_rel->relkind == RELKIND_RELATION
> + && addr.logpageno == 0)
>
> I am not sure if this is a solid way to avoid creating FSM.  What if
> fsm_set_and_search gets called for the level other than 0?
Thanks for taking a look. As for levels other than 0, I think that
only happens when fsm_set_and_search() is called by fsm_search(),
which will not cause extension.

> Also,
> when the relation has blocks more than HEAP_FSM_EXTENSION_THRESHOLD,
> then first time when vacuum will try to record the free space in the
> page, won't it skip recording free space for first
> HEAP_FSM_EXTENSION_THRESHOLD pages?

Hmm, that's a good point.

> I think you have found a good way to avoid creating FSM, but can't we
> use some simpler technique like if the FSM fork for a relation doesn't
> exist, then check the heapblk number for which we try to update the
> FSM and if it is lesser than HEAP_FSM_EXTENSION_THRESHOLD, then avoid
> creating the FSM.

I think I see what you mean, but to avoid the vacuum problem you just
mentioned, we'd need to check the relation size, too. I've attached an
unpolished revision to do this. It seems to work, but I haven't tested
the vacuum issue yet. I'll do that and some COPY performance testing
in the next day or so. There's a bit more repetition than I would
like, so I'm not sure it's simpler - perhaps RecordPageWithFreeSpace()
could be turned into a wrapper around RecordAndGetPageWithFreeSpace().

Also new in this version, some non-functional improvements to hio.c:
-debugging calls that are #ifdef'd out.
-move some code out into a function instead of adding another goto.

>> I couldn't find a non-extending equivalent of
>> XLogReadBufferExtended(), so I might have to create one.
>>
>
> I think it would be better if we can find a common way to avoid
> creating FSM both during DO and REDO time.  It might be possible if
> somethin like what I have said above is feasible.

That would be ideal.

-John Naylor

v3-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (26K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
> On 10/13/18, Amit Kapila <[hidden email]> wrote:
>> I think you have found a good way to avoid creating FSM, but can't we
>> use some simpler technique like if the FSM fork for a relation doesn't
>> exist, then check the heapblk number for which we try to update the
>> FSM and if it is lesser than HEAP_FSM_EXTENSION_THRESHOLD, then avoid
>> creating the FSM.

>> I think it would be better if we can find a common way to avoid
>> creating FSM both during DO and REDO time.  It might be possible if
>> somethin like what I have said above is feasible.

I've attached v4, which implements the REDO case, and as closely as
possible to the DO case. I've created a new function to guard against
creation of the FSM, which is called by  RecordPageWithFreeSpace() and
RecordAndGetPageWithFreeSpace(). Since XLogRecordPageWithFreeSpace()
takes a relfilenode and not a relation, I had to reimplement that
separately, but the logic is basically the same. It works under
streaming replication.

I've also attached a couple SQL scripts which, when the aforementioned
DEBUG1 calls are enabled, show what the heap insert code is doing for
different scenarios. Make check-world passes.

-John Naylor

v4-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (23K) Download Attachment
test-fsm-first-10-blocks.sql (1K) Download Attachment
test-nofsm-first-block.sql (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
In reply to this post by John Naylor
On Sun, Oct 14, 2018 at 1:09 AM John Naylor <[hidden email]> wrote:

>
> On 10/13/18, Amit Kapila <[hidden email]> wrote:
>
> > I think you have found a good way to avoid creating FSM, but can't we
> > use some simpler technique like if the FSM fork for a relation doesn't
> > exist, then check the heapblk number for which we try to update the
> > FSM and if it is lesser than HEAP_FSM_EXTENSION_THRESHOLD, then avoid
> > creating the FSM.
>
> I think I see what you mean, but to avoid the vacuum problem you just
> mentioned, we'd need to check the relation size, too.
>

Sure, but vacuum already has relation size.  In general, I think we
should try to avoid adding more system calls in the common code path.
It can impact the performance.

Few comments on your latest patch:
-
+static bool
+allow_write_to_fsm(Relation rel, BlockNumber heapBlk)
+{
+ BlockNumber heap_nblocks;
+
+ if (heapBlk > HEAP_FSM_EXTENSION_THRESHOLD ||
+ rel->rd_rel->relkind != RELKIND_RELATION)
+ return true;
+
+ /* XXX is this value cached? */
+ heap_nblocks = RelationGetNumberOfBlocks(rel);
+
+ if (heap_nblocks > HEAP_FSM_EXTENSION_THRESHOLD)
+ return true;
+ else
+ {
+ RelationOpenSmgr(rel);
+ return smgrexists(rel->rd_smgr, FSM_FORKNUM);
+ }
+}

I think you can avoid calling RelationGetNumberOfBlocks, if you call
smgrexists before and for the purpose of vacuum, we can get that as an
input parameter.  I think one can argue for not changing the interface
functions like RecordPageWithFreeSpace to avoid calling
RelationGetNumberOfBlocks, but to me, it appears worth to save the
additional system call.

-
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
-
- /*
- * If the FSM knows nothing of the rel, try the last page before we
- * give up and extend.  This avoids one-tuple-per-page syndrome during
- * bootstrapping or in a recently-started system.
- */
  if (targetBlock == InvalidBlockNumber)
- {
- BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
-
- if (nblocks > 0)
- targetBlock = nblocks - 1;
- }
+ targetBlock = get_page_no_fsm(relation, InvalidBlockNumber,
+   &try_every_page);


Is it possible to hide the magic of trying each block within
GetPageWithFreeSpace?  It will simplify the code and in future, if
another storage API has a different function for
RelationGetBufferForTuple, it will work seamlessly, provided they are
using same FSM.  One such user is zheap.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
On 10/15/18, Amit Kapila <[hidden email]> wrote:

> Few comments on your latest patch:
> -
> +static bool
> +allow_write_to_fsm(Relation rel, BlockNumber heapBlk)
> +{
> + BlockNumber heap_nblocks;
> +
> + if (heapBlk > HEAP_FSM_EXTENSION_THRESHOLD ||
> + rel->rd_rel->relkind != RELKIND_RELATION)
> + return true;
> +
> + /* XXX is this value cached? */
> + heap_nblocks = RelationGetNumberOfBlocks(rel);
> +
> + if (heap_nblocks > HEAP_FSM_EXTENSION_THRESHOLD)
> + return true;
> + else
> + {
> + RelationOpenSmgr(rel);
> + return smgrexists(rel->rd_smgr, FSM_FORKNUM);
> + }
> +}
>
> I think you can avoid calling RelationGetNumberOfBlocks, if you call
> smgrexists before

Okay, I didn't know which was cheaper, but I'll check smgrexists
first. Thanks for the info.

> and for the purpose of vacuum, we can get that as an
> input parameter.  I think one can argue for not changing the interface
> functions like RecordPageWithFreeSpace to avoid calling
> RelationGetNumberOfBlocks, but to me, it appears worth to save the
> additional system call.

I agree, and that should be fairly straightforward. I'll include that
in the next patch.

> -
> targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
> -
> - /*
> - * If the FSM knows nothing of the rel, try the last page before we
> - * give up and extend.  This avoids one-tuple-per-page syndrome during
> - * bootstrapping or in a recently-started system.
> - */
>   if (targetBlock == InvalidBlockNumber)
> - {
> - BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> -
> - if (nblocks > 0)
> - targetBlock = nblocks - 1;
> - }
> + targetBlock = get_page_no_fsm(relation, InvalidBlockNumber,
> +   &try_every_page);
>
>
> Is it possible to hide the magic of trying each block within
> GetPageWithFreeSpace?  It will simplify the code and in future, if
> another storage API has a different function for
> RelationGetBufferForTuple, it will work seamlessly, provided they are
> using same FSM.  One such user is zheap.

Hmm, here I'm a bit more skeptical about the trade offs. That would
mean, in effect, to put a function called get_page_no_fsm() in the FSM
code. ;-)  I'm willing to be convinced otherwise, of course, but
here's my reasoning:

For one, we'd have to pass prevBlockNumber and &try_every_block to
GetPageWithFreeSpace() (and RecordAndGetPageWithFreeSpace() by
extension), which are irrelevant to some callers. In addition, in
hio.c, there is a call where we don't want to try any blocks that we
have already, much less all of them:

/*
 * Check if some other backend has extended a block for us while
 * we were waiting on the lock.
 */
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);

By the time we get to this call, we likely wouldn't trigger the logic
to try every block, but I don't think we can guarantee that. We could
add a boolean parameter that means "consider trying every block", but
I don't think the FSM code should have so much state passed to it.

Thanks for reviewing,
-John Naylor

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
On Mon, Oct 15, 2018 at 4:09 PM John Naylor <[hidden email]> wrote:

> On 10/15/18, Amit Kapila <[hidden email]> wrote:
> > Few comments on your latest patch:
> > -
> > targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
> > -
> > - /*
> > - * If the FSM knows nothing of the rel, try the last page before we
> > - * give up and extend.  This avoids one-tuple-per-page syndrome during
> > - * bootstrapping or in a recently-started system.
> > - */
> >   if (targetBlock == InvalidBlockNumber)
> > - {
> > - BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> > -
> > - if (nblocks > 0)
> > - targetBlock = nblocks - 1;
> > - }
> > + targetBlock = get_page_no_fsm(relation, InvalidBlockNumber,
> > +   &try_every_page);
> >
> >
> > Is it possible to hide the magic of trying each block within
> > GetPageWithFreeSpace?  It will simplify the code and in future, if
> > another storage API has a different function for
> > RelationGetBufferForTuple, it will work seamlessly, provided they are
> > using same FSM.  One such user is zheap.
>
> Hmm, here I'm a bit more skeptical about the trade offs. That would
> mean, in effect, to put a function called get_page_no_fsm() in the FSM
> code. ;-)  I'm willing to be convinced otherwise, of course, but
> here's my reasoning:
>
> For one, we'd have to pass prevBlockNumber and &try_every_block to
> GetPageWithFreeSpace() (and RecordAndGetPageWithFreeSpace() by
> extension), which are irrelevant to some callers.
>

I think we can avoid using prevBlockNumber and try_every_block, if we
maintain a small cache which tells whether the particular block is
tried or not.  What I am envisioning is that while finding the block
with free space, if we came to know that the relation in question is
small enough that it doesn't have FSM, we can perform a local_search.
In local_seach, we can enquire the cache for any block that we can try
and if we find any block, we can try inserting in that block,
otherwise, we need to extend the relation.  One simple way to imagine
such a cache would be an array of structure and structure has blkno
and status fields.  After we get the usable block, we need to clear
the cache, if exists.

> In addition, in
> hio.c, there is a call where we don't want to try any blocks that we
> have already, much less all of them:
>
> /*
>  * Check if some other backend has extended a block for us while
>  * we were waiting on the lock.
>  */
> targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
>
> By the time we get to this call, we likely wouldn't trigger the logic
> to try every block, but I don't think we can guarantee that.
>

I think the current code as proposed has that limitation, but if we
have a cache, then we can check if the relation has actually extended
after taking the lock and if so we can try only newly added block/'s.

I am not completely sure if the idea described above is certainly
better, but it seems that it will be clean and can handle some of the
cases with ease.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
In reply to this post by akapila
On 10/15/18, Amit Kapila <[hidden email]> wrote:
> I think you can avoid calling RelationGetNumberOfBlocks, if you call
> smgrexists before

This is done in the attached v5, 0001.

> and for the purpose of vacuum, we can get that as an
> input parameter.  I think one can argue for not changing the interface
> functions like RecordPageWithFreeSpace to avoid calling
> RelationGetNumberOfBlocks, but to me, it appears worth to save the
> additional system call.

This is done in 0002. I also added a check for the cached value of
pg_class.relpages, since it's cheap and may help non-VACUUM callers.

> [proposal for a cache of blocks to try]

That's interesting. I'll have to do some reading elsewhere in the
codebase, and then I'll follow up.

Thanks,
-John Naylor

v5-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (24K) Download Attachment
v5-0002-Add-parameter-nblocks-to-RecordPageWithFreeSpace.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
On Tue, Oct 16, 2018 at 4:27 PM John Naylor <[hidden email]> wrote:
> > [proposal for a cache of blocks to try]
>
> That's interesting. I'll have to do some reading elsewhere in the
> codebase, and then I'll follow up.
>

Thanks, I have changed the status of this patch as "Waiting on
Author".  Feel free to change it once you have a new patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
In reply to this post by akapila
On 10/16/18, Amit Kapila <[hidden email]> wrote:

> I think we can avoid using prevBlockNumber and try_every_block, if we
> maintain a small cache which tells whether the particular block is
> tried or not.  What I am envisioning is that while finding the block
> with free space, if we came to know that the relation in question is
> small enough that it doesn't have FSM, we can perform a local_search.
> In local_seach, we can enquire the cache for any block that we can try
> and if we find any block, we can try inserting in that block,
> otherwise, we need to extend the relation.  One simple way to imagine
> such a cache would be an array of structure and structure has blkno
> and status fields.  After we get the usable block, we need to clear
> the cache, if exists.
Here is the design I've implemented in the attached v6. There is more
code than v5, but there's a cleaner separation between freespace.c and
hio.c, as you preferred. I also think it's more robust. I've expended
some effort to avoid doing unnecessary system calls to get the number
of blocks.
--

For the local, in-memory map, maintain a static array of status
markers, of fixed-length HEAP_FSM_CREATION_THRESHOLD, indexed by block
number. This is populated every time we call GetPageWithFreeSpace() on
small tables with no FSM. The statuses are

'zero' (beyond the relation)
'available to try'
'tried already'

Example for a 4-page heap:

01234567
AAAA0000

If we try block 3 and there is no space, we set it to 'tried' and next
time through the loop we'll try 2, etc:

01234567
AAAT0000

If we try all available blocks, we will extend the relation. As in the
master branch, first we call GetPageWithFreeSpace() again to see if
another backend extended the relation to 5 blocks while we were
waiting for the lock. If we find a new block, we will mark the new
block available and leave the rest alone:

01234567
TTTTA000

On the odd chance we still can't insert into the new block, we'll skip
checking any others and we'll redo the logic to extend the relation.

If we're about to successfully return a buffer, whether from an
existing block, or by extension, we clear the local map.

Once this is in shape, I'll do some performance testing.

-John Naylor

v6-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (36K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
I wrote:

> Once this is in shape, I'll do some performance testing.

On second thought, there's no point in waiting, especially if a
regression points to a design flaw.

I compiled patched postgres with HEAP_FSM_CREATION_THRESHOLD set to
32, then ran the attached script which populates 100 tables with
varying numbers of blocks. I wanted a test that created pages eagerly
and wrote to disk as little as possible. Config was stock, except for
fsync = off. I took the average of 10 runs after removing the slowest
and fastest run:

# blocks master patch
4 36.4ms 33.9ms
8 50.6ms 48.9ms
12 58.6ms 66.3ms
16 65.5ms 81.4ms

It seems under these circumstances a threshold of up to 8 performs
comparably to the master branch, with small block numbers possibly
faster than with the FSM, provided they're in shared buffers already.
I didn't bother testing higher values because it's clear there's a
regression starting around 10 or so, beyond which it helps to have the
FSM.

A case could be made for setting the threshold to 4, since not having
3 blocks of FSM in shared buffers exactly makes up for the 3 other
blocks of heap that are checked when free space runs out.

I can run additional tests if there's interest.

-John Naylor

fsm-copy-test.sql (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

John Naylor
Upthread I wrote:

> -A possible TODO item is to teach pg_upgrade not to link FSMs for
> small heaps. I haven't look into the feasibility of that, however.

This turned out to be relatively light weight (0002 attached). I had
to add relkind to the RelInfo struct and save the size of each heap as
its transferred. The attached SQL script will setup a couple test
cases to demonstrate pg_upgrade. Installations with large numbers of
small tables will be able to see space savings right away.

For 0001, I adjusted the README and docs, and also made some cosmetic
improvements to the code, mostly in the comments. I've set the
commitfest entry back to 'needs review'

One thing I noticed is that one behavior on master hasn't changed:
System catalogs created during bootstrap still have a FSM if they have
any data. Possibly related, catalogs also have a VM even if they have
no data at all. This isn't anything to get excited about, but it would
be nice to investigate, at least so it can be documented. A cursory
dig hasn't found the cause, but I'll keep doing that as time permits.

-John Naylor

v7-0001-Avoid-creation-of-the-free-space-map-for-small-ta.patch (40K) Download Attachment
v7-0002-During-pg_upgrade-skip-transfer-of-FSMs-if-they-w.patch (10K) Download Attachment
setup-fsm-pg_upgrade.sql (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
On Wed, Oct 31, 2018 at 1:42 PM John Naylor <[hidden email]> wrote:

>
> Upthread I wrote:
>
> > -A possible TODO item is to teach pg_upgrade not to link FSMs for
> > small heaps. I haven't look into the feasibility of that, however.
>
> This turned out to be relatively light weight (0002 attached). I had
> to add relkind to the RelInfo struct and save the size of each heap as
> its transferred. The attached SQL script will setup a couple test
> cases to demonstrate pg_upgrade. Installations with large numbers of
> small tables will be able to see space savings right away.
>
> For 0001, I adjusted the README and docs, and also made some cosmetic
> improvements to the code, mostly in the comments. I've set the
> commitfest entry back to 'needs review'
>
> One thing I noticed is that one behavior on master hasn't changed:
> System catalogs created during bootstrap still have a FSM if they have
> any data. Possibly related, catalogs also have a VM even if they have
> no data at all. This isn't anything to get excited about, but it would
> be nice to investigate, at least so it can be documented. A cursory
> dig hasn't found the cause, but I'll keep doing that as time permits.
>

Thanks for your work on this, I will try to review it during CF.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

Robert Haas
In reply to this post by John Naylor
On Tue, Oct 23, 2018 at 9:42 AM John Naylor <[hidden email]> wrote:
> A case could be made for setting the threshold to 4, since not having
> 3 blocks of FSM in shared buffers exactly makes up for the 3 other
> blocks of heap that are checked when free space runs out.

That doesn't seem like an unreasonable argument.  I'm not sure whether
the right threshold is 4 or something a little bigger, but I bet it's
not very large.  It seems important to me that before anybody thinks
about committing this, we construct some kind of destruction case
where repeated scans of the whole table are triggered as frequently as
possible, and then run that test with varying thresholds.  I might be
totally wrong, but I bet with a value as large as 32 you will be able
to find cases where it regresses in a big way.

We also need to think about what happens on the standby, where the FSM
is updated in a fairly different way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: WIP: Avoid creation of the free space map for small tables

akapila
On Wed, Oct 31, 2018 at 10:29 PM Robert Haas <[hidden email]> wrote:

>
> On Tue, Oct 23, 2018 at 9:42 AM John Naylor <[hidden email]> wrote:
> > A case could be made for setting the threshold to 4, since not having
> > 3 blocks of FSM in shared buffers exactly makes up for the 3 other
> > blocks of heap that are checked when free space runs out.
>
> That doesn't seem like an unreasonable argument.  I'm not sure whether
> the right threshold is 4 or something a little bigger, but I bet it's
> not very large.  It seems important to me that before anybody thinks
> about committing this, we construct some kind of destruction case
> where repeated scans of the whole table are triggered as frequently as
> possible, and then run that test with varying thresholds.
>

Why do you think repeated scans will be a destruction case when there
is no FSM for a small table?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

123