PoC Refactor AM analyse API


PoC Refactor AM analyse API

Смирнов Денис
Hello all!

I suggest refactoring the analyze AM API, as it is too heap-specific at the moment. The problem was inspired by Greenplum's analyze improvements for its append-optimized row and column AMs, which use variable-size compressed blocks.
Currently we do analyze in two steps:

1. Sample fixed-size blocks with algorithm S from Knuth (the BlockSampler functions).
2. Collect the tuples into a reservoir with algorithm Z from Vitter.

So this doesn't work, for example, for AMs that use variable-sized physical blocks. They need either weighted random sampling (WRS) algorithms such as A-Chao, or logical blocks in order to keep following Knuth's algorithm S (and then they have a problem with RelationGetNumberOfBlocks(), which estimates the number of physical blocks). Another problem concerns columns: they are not passed to the analyze begin scan, so column stores cannot benefit from their storage layout at ANALYZE TABLE (COL).
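For illustration, here is a minimal, self-contained sketch of weighted reservoir sampling with A-Chao (my own toy code, not part of the patch; it ignores the corner case where an item's inclusion probability k * w / W would exceed 1, as well as the modulo bias in the slot choice):

#include <stdlib.h>

typedef struct
{
    double      weight;         /* e.g. tuple count of a variable-size block */
    int         id;             /* whatever identifies the sampled unit */
} Item;

/* Weighted reservoir sampling, algorithm A-Chao (simplified). */
static void
chao_sample(const Item *stream, int nitems, Item *reservoir, int k)
{
    double      wsum = 0.0;

    for (int i = 0; i < nitems; i++)
    {
        wsum += stream[i].weight;

        if (i < k)
            reservoir[i] = stream[i];   /* fill the reservoir first */
        else
        {
            /* include with probability proportional to the item's weight */
            double      p = k * stream[i].weight / wsum;

            if ((double) rand() / RAND_MAX < p)
                reservoir[rand() % k] = stream[i];  /* evict a random slot */
        }
    }
}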

The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function, table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it can do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, since not all AMs are block-oriented.
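Roughly, the new table AM callback looks like this (the parameter names here are only my sketch; see tableam.h in the attached patch for the exact definition):

    int         (*acquire_sample_rows) (TableScanDesc scan,
                                        int elevel,
                                        int natts,
                                        VacAttrStats **stats,
                                        BufferAccessStrategy bstrategy,
                                        HeapTuple *rows,
                                        int targrows,
                                        double *totalrows,
                                        double *totaldeadrows);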





Best regards,
Denis Smirnov | Developer
[hidden email] 
Arenadata | Godovikova 9-17, Moscow 129085 Russia


Attachment: am-analyze-1.patch (23K)

Re: PoC Refactor AM analyse API

Смирнов Денис
It seems that my mail client set the wrong MIME type for the attached patch, so it was filtered out by the web archive. I am attaching the patch again for the web archive.




> On 7 Dec 2020, at 23:23, Смирнов Денис <[hidden email]> wrote:
>
> Hello all!
>
> I suggest refactoring the analyze AM API, as it is too heap-specific at the moment. The problem was inspired by Greenplum's analyze improvements for its append-optimized row and column AMs, which use variable-size compressed blocks.
> Currently we do analyze in two steps:
>
> 1. Sample fixed-size blocks with algorithm S from Knuth (the BlockSampler functions).
> 2. Collect the tuples into a reservoir with algorithm Z from Vitter.
>
> So this doesn't work, for example, for AMs that use variable-sized physical blocks. They need either weighted random sampling (WRS) algorithms such as A-Chao, or logical blocks in order to keep following Knuth's algorithm S (and then they have a problem with RelationGetNumberOfBlocks(), which estimates the number of physical blocks). Another problem concerns columns: they are not passed to the analyze begin scan, so column stores cannot benefit from their storage layout at ANALYZE TABLE (COL).
>
> The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function, table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it can do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, since not all AMs are block-oriented.
>
> <am-analyze-1.patch>
>
>
>
> Best regards,
> Denis Smirnov | Developer
> [hidden email]
> Arenadata | Godovikova 9-17, Moscow 129085 Russia
>


Attachment: am-analyze-1.patch.txt (24K)

Re: PoC Refactor AM analyse API

Andrey Borodin-2
In reply to this post by Смирнов Денис
Hi Denis!

> On 7 Dec 2020, at 18:23, Смирнов Денис <[hidden email]> wrote:
>
> I suggest refactoring the analyze AM API, as it is too heap-specific at the moment. The problem was inspired by Greenplum's analyze improvements for its append-optimized row and column AMs, which use variable-size compressed blocks.
> Currently we do analyze in two steps:
>
> 1. Sample fixed-size blocks with algorithm S from Knuth (the BlockSampler functions).
> 2. Collect the tuples into a reservoir with algorithm Z from Vitter.
>
> So this doesn't work, for example, for AMs that use variable-sized physical blocks. They need either weighted random sampling (WRS) algorithms such as A-Chao, or logical blocks in order to keep following Knuth's algorithm S (and then they have a problem with RelationGetNumberOfBlocks(), which estimates the number of physical blocks). Another problem concerns columns: they are not passed to the analyze begin scan, so column stores cannot benefit from their storage layout at ANALYZE TABLE (COL).
>
> The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function, table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it can do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, since not all AMs are block-oriented.

Just a few random notes about the idea.
Heap pages are not of a fixed size when measured in tuple count, and a comment in the code describes this:
 * Although every row has an equal chance of ending up in the final
 * sample, this sampling method is not perfect: not every possible
 * sample has an equal chance of being selected.  For large relations
 * the number of different blocks represented by the sample tends to be
 * too small.  We can live with that for now.  Improvements are welcome.

The current implementation provides a framework with shared code, though this framework is only suitable for block-of-tuples AMs, and it has statistical downsides when the tuple count per block varies too much.
Maybe we can just provide a richer API instead? That way we get both: we avoid copying code and we keep the flexibility.

Best regards, Andrey Borodin.


Re: PoC Refactor AM analyse API

Смирнов Денис
Andrey, thanks for your feedback!

I agree that AMs with fixed-size blocks can have very similar code in acquire_sample_rows() (though it is not a rule). But there are several points about the sampling in current master.

* It is not perfect - AM developers may want to improve it with other sampling algorithms.
* It is designed under heavy influence of the heap AM - for example, RelationGetNumberOfBlocks() returns a uint32, while other AMs can have a larger number of blocks.
* heapam_acquire_sample_rows() is a small function - I don't think it is much trouble for any AM developer to write something similar.
* Some AMs may use single-level sampling (only algorithm Z from Vitter, for example) - why not?

As a result we get a single, clear method to acquire rows for statistics. If we extend the current API rather than modify it (for example, in the manner it is done for FDWs), the code becomes more complicated and harder to understand.

> On 8 Dec 2020, at 18:42, Andrey Borodin <[hidden email]> wrote:
>
> Hi Denis!
>
>> On 7 Dec 2020, at 18:23, Смирнов Денис <[hidden email]> wrote:
>>
>> I suggest refactoring the analyze AM API, as it is too heap-specific at the moment. The problem was inspired by Greenplum's analyze improvements for its append-optimized row and column AMs, which use variable-size compressed blocks.
>> Currently we do analyze in two steps:
>>
>> 1. Sample fixed-size blocks with algorithm S from Knuth (the BlockSampler functions).
>> 2. Collect the tuples into a reservoir with algorithm Z from Vitter.
>>
>> So this doesn't work, for example, for AMs that use variable-sized physical blocks. They need either weighted random sampling (WRS) algorithms such as A-Chao, or logical blocks in order to keep following Knuth's algorithm S (and then they have a problem with RelationGetNumberOfBlocks(), which estimates the number of physical blocks). Another problem concerns columns: they are not passed to the analyze begin scan, so column stores cannot benefit from their storage layout at ANALYZE TABLE (COL).
>>
>> The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function, table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it can do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, since not all AMs are block-oriented.
>
> Just a few random notes about the idea.
> Heap pages are not of a fixed size when measured in tuple count, and a comment in the code describes this:
> * Although every row has an equal chance of ending up in the final
> * sample, this sampling method is not perfect: not every possible
> * sample has an equal chance of being selected.  For large relations
> * the number of different blocks represented by the sample tends to be
> * too small.  We can live with that for now.  Improvements are welcome.
>
> The current implementation provides a framework with shared code, though this framework is only suitable for block-of-tuples AMs, and it has statistical downsides when the tuple count per block varies too much.
> Maybe we can just provide a richer API instead? That way we get both: we avoid copying code and we keep the flexibility.
>
> Best regards, Andrey Borodin.
>




Re: PoC Refactor AM analyse API

Andrey Borodin-2


> On 8 Dec 2020, at 16:44, Denis Smirnov <[hidden email]> wrote:
>
> Andrey, thanks for your feedback!
>
> I agree that AMs with fixed-size blocks can have very similar code in acquire_sample_rows() (though it is not a rule). But there are several points about the sampling in current master.
>
> * It is not perfect - AM developers may want to improve it with other sampling algorithms.
> * It is designed under heavy influence of the heap AM - for example, RelationGetNumberOfBlocks() returns a uint32, while other AMs can have a larger number of blocks.
> * heapam_acquire_sample_rows() is a small function - I don't think it is much trouble for any AM developer to write something similar.
> * Some AMs may use single-level sampling (only algorithm Z from Vitter, for example) - why not?
>
> As a result we get a single, clear method to acquire rows for statistics. If we extend the current API rather than modify it (for example, in the manner it is done for FDWs), the code becomes more complicated and harder to understand.

This makes sense. The purpose of the API is to provide a flexible abstraction, and the current table_scan_analyze_next_block()/table_scan_analyze_next_tuple() API assumes too much about the AM implementation.
But why do you pass int natts and VacAttrStats **stats to acquire_sample_rows()? Is it of any use? It seems to break the abstraction too.

Best regards, Andrey Borodin.


Re: PoC Refactor AM analyse API

Смирнов Денис

> But why do you pass int natts and VacAttrStats **stats to acquire_sample_rows()? Is it of any use? It seems to break the abstraction too.

Yes, it is really a kludge that should be discussed. The main problem is that we don't pass projection information to the analyze scan (analyze begin scan relies only on relation information during initialization). As a result we can't benefit from column AMs during «analyze t(col)» and consume data only from the target columns. These parameters were added to solve this problem.
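Purely as an illustration of the intent (a hypothetical helper, not part of the patch): a column-oriented AM could derive a projection from these parameters inside its acquire_sample_rows() implementation and open only the analyzed columns, e.g.:

static Bitmapset *
analyzed_columns(int natts, VacAttrStats **stats)
{
    Bitmapset  *proj = NULL;

    /* collect the attribute numbers ANALYZE is actually interested in */
    for (int i = 0; i < natts; i++)
        proj = bms_add_member(proj, stats[i]->attr->attnum);

    return proj;
}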

Best regards,
Denis Smirnov | Developer
[hidden email]
Arenadata | Godovikova 9-17, Moscow 129085 Russia




Re: PoC Refactor AM analyse API

Heikki Linnakangas
On 30/12/2020 11:12, Denis Smirnov wrote:

>
>> But why do you pass int natts and VacAttrStats **stats to
>> acquire_sample_rows()? Is it of any use? It seems to break the
>> abstraction too.
>
> Yes, it is really a kludge that should be discussed. The main problem
> is that we don't pass projection information to the analyze scan
> (analyze begin scan relies only on relation information during
> initialization). As a result we can't benefit from column AMs
> during «analyze t(col)» and consume data only from the target
> columns. These parameters were added to solve this problem.

The documentation needs to be updated accordingly, see
AcquireSampleRowsFunc in fdwhandler.sgml.

This part of the patch, adding the list of columns being analyzed, seems
a bit unfinished. I'd suggest leaving that out for now, and adding it as
part of the "Table AM modifications to accept column projection lists"
patch that's also in this commitfest [1].

> This suggestion also ... removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and
> PROGRESS_ANALYZE_BLOCKS_DONE definitions, since not all AMs are
> block-oriented.

We shouldn't just remove them; a progress indicator is nice. It's true
that not all AMs are block-oriented, but those that are can still use
these. Perhaps we can add other PROGRESS_ANALYZE_* states for
non-block-oriented AMs, but that can wait until there is a concrete use
for it.

> static int
> acquire_sample_rows(Relation onerel, int elevel,
>                     HeapTuple *rows, int targrows,
>                     double *totalrows, double *totaldeadrows)
> {
>     int         numrows = 0;    /* # rows now in reservoir */
>     TableScanDesc scan;
>
>     Assert(targrows > 0);
>
>     scan = table_beginscan_analyze(onerel);
>
>     numrows = table_acquire_sample_rows(scan, elevel,
>                                         natts, stats,
>                                         vac_strategy, rows,
>                                         targrows, totalrows,
>                                         totaldeadrows);
>
>     table_endscan(scan);
>
>     /*
>      * If we didn't find as many tuples as we wanted then we're done. No sort
>      * is needed, since they're already in order.
>      *
>      * Otherwise we need to sort the collected tuples by position
>      * (itempointer). It's not worth worrying about corner cases where the
>      * tuples are already sorted.
>      */
>     if (numrows == targrows)
>         qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
>
>     return numrows;
> }

Perhaps better to move the qsort() into heapam_acquire_sample_rows(),
and document that the acquire_sample_rows() AM function must return the
tuples in 'ctid' order. In a generic API, it seems like a shady
assumption that they must be in order if we didn't find as many rows as
we wanted. Or always call qsort(); if the tuples are already in order,
that should be pretty quick.

The table_beginscan_analyze() call seems a bit pointless now. Let's
remove it, and pass the Relation to table_acquire_sample_rows directly.
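Something roughly like this (just an untested sketch; the exact argument list, e.g. whether natts/stats stay, is up to the patch):

static int
acquire_sample_rows(Relation onerel, int elevel,
                    HeapTuple *rows, int targrows,
                    double *totalrows, double *totaldeadrows)
{
    Assert(targrows > 0);

    /* the AM callback returns the sample already sorted in ctid order */
    return table_acquire_sample_rows(onerel, elevel, vac_strategy,
                                     rows, targrows,
                                     totalrows, totaldeadrows);
}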

> /*
> * This callback needs to fill reservour with sample rows during analyze
> * scan.
> */
> int (*acquire_sample_rows) (TableScanDesc scan,

The "reservoir" is related to the block sampler, but that's just an
implementation detail of the function. Perhaps something like "Acquire a
sample of rows from the table, for ANALYZE". And explain the arguments
here, or in table_acquire_sample_rows().

Overall, I like where this patch is going.

[1] https://commitfest.postgresql.org/31/2922/

- Heikki



Re: PoC Refactor AM analyse API

Смирнов Денис
Thanks for your review, Heikki.

I have made the changes you have requested.

1. All modifications related to column projection were reverted (they should be added in https://commitfest.postgresql.org/31/2922 if the current patch is merged one day).
2. I have restored the PROGRESS_ANALYZE_* states.
3. qsort() was moved into heapam_acquire_sample_rows(). A comment was also added stating that the acquire_sample_rows() AM function must return the tuples in physical table order.
4. table_beginscan_analyze() was removed as a redundant function.
5. The acquire_sample_rows() comment about the reservoir was changed.



Best regards,
Denis Smirnov | Developer
[hidden email]
Arenadata | Godovikova 9-17, Moscow 129085 Russia


Attachment: v2-0001-Refactor-AM-analyze-acqiure-rows-method.patch.txt (23K)

Re: PoC Refactor AM analyse API

Zhihong Yu
Hi,

+       *totalrows = floor((liverows / bs.m) * totalblocks + 0.5);

Is the above equivalent to:

+       *totalrows = ceil((liverows / bs.m) * totalblocks);

For compare_rows(), it seems the computation of oa and ob can be delayed to when ba == bb (after the first two if statements).
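For illustration, the restructuring I have in mind (a sketch based on compare_rows() in analyze.c):

static int
compare_rows(const void *a, const void *b)
{
    HeapTuple    ha = *(const HeapTuple *) a;
    HeapTuple    hb = *(const HeapTuple *) b;
    BlockNumber  ba = ItemPointerGetBlockNumber(&ha->t_self);
    BlockNumber  bb = ItemPointerGetBlockNumber(&hb->t_self);
    OffsetNumber oa;
    OffsetNumber ob;

    if (ba < bb)
        return -1;
    if (ba > bb)
        return 1;

    /* only compute the offsets once the blocks are known to be equal */
    oa = ItemPointerGetOffsetNumber(&ha->t_self);
    ob = ItemPointerGetOffsetNumber(&hb->t_self);
    if (oa < ob)
        return -1;
    if (oa > ob)
        return 1;
    return 0;
}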

Cheers

On Thu, Feb 18, 2021 at 6:06 PM Denis Smirnov <[hidden email]> wrote:
Thanks for your review, Heikki.

I have made the changes you have requested.

1. All modifications related to column projection were reverted (they should be added in https://commitfest.postgresql.org/31/2922 if the current patch is merged one day).
2. I have restored the PROGRESS_ANALYZE_* states.
3. qsort() was moved into heapam_acquire_sample_rows(). A comment was also added stating that the acquire_sample_rows() AM function must return the tuples in physical table order.
4. table_beginscan_analyze() was removed as a redundant function.
5. The acquire_sample_rows() comment about the reservoir was changed.


Best regards,
Denis Smirnov | Developer
[hidden email]
Arenadata | Godovikova 9-17, Moscow 129085 Russia


Re: PoC Refactor AM analyse API

Смирнов Денис
Hello, Zhihong.

Thanks for your comments.

1. I am afraid that the proposed equivalence of "floor(val + 0.5)" to "ceil(val)" is incorrect: "floor(2.1 + 0.5)" returns 2, while "ceil(2.1)" returns 3. So we can't use the replacement you have suggested.

2. >> For compare_rows(), it seems the computation of oa and ob can be delayed to when ba == bb (after the first two if statements).
I have checked some examples of the assembly generated by different compilers with the -O2/-O3 flags on https://gcc.godbolt.org and didn't see any big benefit in the resulting CPU instructions. You can check the attached example yourself.




Best regards,
Denis Smirnov | Developer
[hidden email]
Arenadata | Godovikova 9-17, Moscow 129085 Russia


Attachment: simplified_compare_rows.c (649 bytes)

Re: PoC Refactor AM analyse API

Zhihong Yu
Denis:
Thanks for considering my suggestion.

For #1, I didn't take your example into account. Thanks for pointing that out.

Cheers

On Thu, Feb 18, 2021 at 11:59 PM Denis Smirnov <[hidden email]> wrote:
Hello, Zhihong.

Thanks for your comments.

1. I am afraid that the proposed equivalence of "floor(val + 0.5)" to "ceil(val)" is incorrect: "floor(2.1 + 0.5)" returns 2, while "ceil(2.1)" returns 3. So we can't use the replacement you have suggested.

2. >> For compare_rows(), it seems the computation of oa and ob can be delayed to when ba == bb (after the first two if statements).
I have checked some examples of the assembly generated by different compilers with the -O2/-O3 flags on https://gcc.godbolt.org and didn't see any big benefit in the resulting CPU instructions. You can check the attached example yourself.



Best regards,
Denis Smirnov | Developer
[hidden email]
Arenadata | Godovikova 9-17, Moscow 129085 Russia