[proposal] de-TOAST'ing using an iterator


[proposal] de-TOAST'ing using an iterator

Binguo Bao
Hi hackers!
This proposal aims to provide the ability to de-TOAST a fully TOAST'd and compressed field using an iterator and then update the appropriate parts of the code to use the iterator where possible instead of de-TOAST'ing and de-compressing the entire value. Examples where this can be helpful include using position() from the beginning of the value, or doing a pattern or substring match.

de-TOAST iterator overview:
1. The caller requests a slice of the attribute value from the de-TOAST iterator.
2. The de-TOAST iterator checks whether the slice is available in the output buffer. If it is, return the result
    directly; otherwise, go to step 3.
3. The de-TOAST iterator checks whether the slice is available in the input buffer. If it is, go to step 4; otherwise,
    call fetch_datum_iterator to fetch datums from disk into the input buffer.
4. If the data in the input buffer is compressed, decompress data from the input buffer into the output buffer until
    the caller's needs are met.

I've implemented the prototype and applied it to the position() function to test performance.
Test tables:
-----------------------------------------------------------------------------------------------------
create table detoast_c (id serial primary key,
a text
);
insert into detoast_c (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);

create table detoast_u (id serial primary key,
a text
);
alter table detoast_u alter a set storage external;
insert into detoast_u (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);
----------------------------------------------------------------------------------
                    query                        |   master (ms)   |   patch (ms)
----------------------------------------------------------------------------------
select position('123' in a) from detoast_c;      |    4054.838     |    1440.735
select position('321' in a) from detoast_c;      |   25549.270     |   27696.245
select position('123' in a) from detoast_u;      |    8116.996     |    1386.802
select position('321' in a) from detoast_u;      |   28442.116     |   27672.319
----------------------------------------------------------------------------------
It can be seen that the iterator greatly improves the efficiency of partial de-TOAST
while causing almost no degradation in full de-TOAST efficiency.
Next, I will continue to study how to apply the iterator to more queries
and improve iterator efficiency, for example by using macros instead of function calls.

The patch is also available on github[1].
Any suggestions or comments would be much appreciated:)

Best regards, Binguo Bao.


0001-de-TOASTing-using-a-iterator.patch (30K)

Re: [proposal] de-TOAST'ing using an iterator

Thomas Munro-5
On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao <[hidden email]> wrote:


Hi Binguo,

Interesting work, and nice performance improvements so far.  Just by
the way, the patch currently generates warnings:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719

--
Thomas Munro
https://enterprisedb.com



Re: [proposal] de-TOAST'ing using an iterator

Binguo Bao
Hi Thomas,
I've fixed the warnings.


0001-de-TOASTing-using-a-iterator-2.patch (30K)

Re: [proposal] de-TOAST'ing using an iterator

Binguo Bao
This is the patch that fixes the warnings.

Best Regards,
Binguo Bao


0001-de-TOASTing-using-a-iterator-3.patch (30K)

Re: [proposal] de-TOAST'ing using an iterator

Binguo Bao
I have set the local build configuration to be the same as on the CI. This patch should be correct.

Best regards,
Binguo Bao


0001-de-TOASTing-using-a-iterator-4.patch (30K)

Re: [proposal] de-TOAST'ing using an iterator

John Naylor-2
On Wed, Jun 19, 2019 at 8:51 PM Binguo Bao <[hidden email]> wrote:
> [v4 patch]

Hi Binguo,

I can verify I get no warnings with the v4 patch. I've done some
additional performance testing. First, to sum up your results:

> insert into detoast_c (a) select repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from generate_series(1,100);

When the search pattern was at the beginning, the patch was several
times faster, and when the pattern was at the end, it was 3% slower
when uncompressed and 9% slower when compressed.

First, I'd like to advocate for caution when using synthetic
benchmarks involving compression. Consider this test:

insert into detoast_c (a)
select
    'abc'||
    repeat(
    (SELECT string_agg(md5(chr(i)), '')
    FROM generate_series(1,127) i)
    , 10000)
    ||'xyz'
from generate_series(1,100);

The results for the uncompressed case were not much different than
your test. However, in the compressed case the iterator doesn't buy us
much with beginning searches since full decompression is already fast:

                 master          patch
comp. beg.        869ms          837ms
comp. end       14100ms        16100ms
uncomp. beg.     6360ms          800ms
uncomp. end     21100ms        21400ms

and with compression it's 14% slower searching to the end. This is
pretty contrived, but I include it for demonstration.

To test something hopefully a bit more realistic, I loaded 100 records
each containing the 1995 CIA fact book (~3MB of ascii) with a pattern
string put at the beginning and end. For the end search, I used a
longer needle to speed up the consumption of text, hoping to put more
stress on the detoasting algorithms, for example:

select max(position('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
in a)) from detoast_*;

                 master          patch
comp. beg.        836ms           22ms
comp. end        1510ms         1700ms
uncomp. beg.      185ms           12ms
uncomp. end       851ms          903ms

Here, the "beginning" case is ~15-35x faster, which is very impressive
and much faster than with your generated contents. The "end" case is
up to 13% slower. It would be interesting to see where the break-even
point is, where the results are the same.

Reading the thread where you're working on optimizing partial
decompression [1], it seems you have two separate solutions for the
two problems. Maybe this is fine, but I'd like to bring up the
possibility of using the same approach for both kinds of callers.

I'm not an expert on TOAST, but maybe one way to solve both problems
is to work at the level of whole TOAST chunks. In that case, the
current patch would look like this:

1. The caller requests more of the attribute value from the de-TOAST iterator.
2. The iterator gets the next chunk and either copies or decompresses
the whole chunk into the buffer. (If inline, just decompress the whole
thing)

This seems simpler and also easy to adapt to callers that do know how
big a slice they want. I also suspect this way would be easier to
adapt to future TOAST formats not tied to heap or to a certain
compression algorithm. With less bookkeeping overhead, maybe there'll
be less worst-case performance degradation, while not giving up much
in the best case. (Note also that commit 9556aa01c6 already introduced
some performance degradation in near-end searches, when using
multibyte strings. This patch would add to that.) The regression
doesn't seem large, but I see more than your test showed, and it would
be nice to avoid it.

Thoughts, anyone?

[1] https://www.postgresql.org/message-id/flat/CAL-OGkux7%2BBm_J%3Dt5VpH7fJGGSm%2BPxWJtgs1%2BWU2g6cmLru%3D%3DA%40mail.gmail.com#705d074aa4ae305ed3d992b7e5b7af3c

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [proposal] de-TOAST'ing using an iterator

Binguo Bao
Hi, John

> First, I'd like to advocate for caution when using synthetic
> benchmarks involving compression. Consider this test: [...]
> The results for the uncompressed case were not much different than
> your test. However, in the compressed case the iterator doesn't buy us
> much with beginning searches since full decompression is already fast:
> [...]
> and with compression it's 14% slower searching to the end. This is
> pretty contrived, but I include it for demonstration.

I've reproduced the test case with the test scripts in the attachment on my laptop:

                  master          patch
comp. beg.      2686.77 ms     1532.79 ms
comp. end      17971.8 ms     21206.3 ms
uncomp. beg.    8358.79 ms     1556.93 ms
uncomp. end    23559.7 ms     22547.1 ms

In the compressed beginning case, the test result differs from yours: the patch is ~1.75x faster
rather than showing no improvement. The interesting thing is that the patch is 4% faster than master
in the uncompressed end case. I can't figure out the reason now.

> Reading the thread where you're working on optimizing partial
> decompression [1], it seems you have two separate solutions for the
> two problems. Maybe this is fine, but I'd like to bring up the
> possibility of using the same approach for both kinds of callers.
>
> I'm not an expert on TOAST, but maybe one way to solve both problems
> is to work at the level of whole TOAST chunks. In that case, the
> current patch would look like this:
> 1. The caller requests more of the attribute value from the de-TOAST iterator.
> 2. The iterator gets the next chunk and either copies or decompresses
> the whole chunk into the buffer. (If inline, just decompress the whole
> thing)

Thanks for your suggestion. It is indeed possible to implement PG_DETOAST_DATUM_SLICE using the de-TOAST iterator.
IMO the iterator is more suitable for situations where the caller doesn't know the slice size. If the caller knows the
slice size, it is reasonable to fetch enough chunks at once and then decompress them at once.
--
Best regards,
Binguo Bao

init-test.sh (964 bytes)
iterator-test.sh (890 bytes)

Re: [proposal] de-TOAST'ing using an iterator

John Naylor-2
On Tue, Jul 16, 2019 at 9:14 PM Binguo Bao <[hidden email]> wrote:
> In the compressed beginning case, the test result is different from yours since the patch is ~1.75x faster
> rather than no improvement. The interesting thing is that the patch if 4% faster than master in the uncompressed end case.
> I can't figure out reason now.

Probably some differences in our test environments. I wouldn't worry
about it too much, since we can show improvement in more realistic
tests.

>> I'm not an expert on TOAST, but maybe one way to solve both problems
>> is to work at the level of whole TOAST chunks. In that case, the
>> current patch would look like this:
>> 1. The caller requests more of the attribute value from the de-TOAST iterator.
>> 2. The iterator gets the next chunk and either copies or decompresses
>> the whole chunk into the buffer. (If inline, just decompress the whole
>> thing)
>
>
> Thanks for your suggestion. It is indeed possible to implement PG_DETOAST_DATUM_SLICE using the de-TOAST iterator.
> IMO the iterator is more suitable for situations where the caller doesn't know the slice size. If the caller knows the slice size,
> it is reasonable to fetch enough chunks at once and then decompress it at once.

That sounds reasonable for the reason of less overhead.

In the case where we don't know the slice size, how about the other
aspect of my question above: Might it be simpler and less overhead to
decompress entire chunks at a time? If so, I think it would be
enlightening to compare performance.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [proposal] de-TOAST'ing using an iterator

Binguo Bao
Hi John!
Sorry for the late reply. It took me some time to fix a random bug.

> In the case where we don't know the slice size, how about the other
> aspect of my question above: Might it be simpler and less overhead to
> decompress entire chunks at a time? If so, I think it would be
> enlightening to compare performance.

Good idea. I've tested your proposal with the scripts and patch v5 in the attachment:

                  master        patch v4        patch v5
comp. beg.        4364ms          1505ms          1529ms
comp. end        28321ms         31202ms         26916ms
uncomp. beg.      3474ms          1513ms          1523ms
uncomp. end      27416ms         30260ms         25888ms

The proposal improves suffix query performance greatly
with fewer calls to the decompression function.

Besides, do you have any other suggestions for the structure of DetoastIterator or ToastBuffer?
Maybe they can be designed to be more reasonable.

Thanks again for the proposal.
-- 
Best regards,
Binguo Bao

init-test.sh (964 bytes)
iterator-test.sh (890 bytes)
0001-de-TOASTing-using-a-iterator-5.patch (30K)

Re: [proposal] de-TOAST'ing using an iterator

John Naylor-2
On Thu, Jul 25, 2019 at 10:21 PM Binguo Bao <[hidden email]> wrote:
>
> Hi John!
> Sorry for the late reply. It took me some time to fix a random bug.

Don't worry, it's not late at all! :-)

>> In the case where we don't know the slice size, how about the other
>> aspect of my question above: Might it be simpler and less overhead to
>> decompress entire chunks at a time? If so, I think it would be
>> enlightening to compare performance.
>
>
> Good idea. I've tested your proposal with the scripts and patch v5 in the attachment:
>
>                   master          patch v4          patch v5
> comp. beg.        4364ms          1505ms          1529ms
> comp. end       28321ms        31202ms          26916ms
> uncomp. beg.     3474ms          1513ms          1523ms
> uncomp. end     27416ms        30260ms          25888ms
>
> The proposal improves suffix query performance greatly
> with fewer calls to the decompression function.

Looks good. I repeated my CIA fact book test and found no difference
with compression, but found that suffix search in the uncompressed
case had less regression (~5%) than v4 (>8%). Let's pursue this
further.

> Besides, do you have any other suggestions for the structure of DetoastIterator or ToastBuffer?

My goal for this stage of review was to understand more fully what the
code is doing, and make it as simple and clear as possible, starting
at the top level. In doing so, it looks like I found some additional
performance gains. I haven't looked much yet at the TOAST fetching
logic.


1). For every needle comparison, text_position_next_internal()
calculates how much of the value is needed and passes that to
detoast_iterate(), which then calculates if it has to do something or
not. This is a bit hard to follow. There might also be a performance
penalty -- the following is just a theory, but it sounds plausible:
The CPU can probably correctly predict that detoast_iterate() will
usually return the same value it did last time, but it still has to
call the function and make sure, which I imagine is more expensive
than advancing the needle. Ideally, we want to call the iterator only
if we have to.

In the attached patch (applies on top of your v5),
text_position_next_internal() simply compares hptr to the detoast
buffer limit, and calls detoast_iterate() until it can proceed. I
think this is clearer. (I'm not sure of the error handling, see #2.)
In this scheme, the only reason to know length is to pass to
pglz_decompress_iterate() in the case of in-line compression. As I
alluded to in my first review, I don't think it's worth the complexity
to handle that iteratively since the value is only a few kB. I made it
so in-line datums are fully decompressed as in HEAD and removed struct
members to match. I also noticed that no one updates or looks at
"toast_iter.done" so I removed that as well.

Now pglz_decompress_iterate() doesn't need length at all. For testing
I just set decompress_all = true and let the compiler optimize away
the rest. I left finishing it for you if you agree with these changes.

With this additional patch, the penalty for suffix search in my CIA
fact book test is only ~2% in the compressed case, and might even be
slightly faster than HEAD in the uncompressed case.


2). detoast_iterate() and fetch_datum_iterate() return a value but we
don't check it or do anything with it. Should we do something with it?
It's also not yet clear if we should check the iterator state instead
of return values. I've added some XXX comments as a reminder. We
should also check the return value of pglz_decompress_iterate().


3). Speaking of pglz_decompress_iterate(), I diff'd it with
pglz_decompress(), and I have some questions on it:

a).
+ srcend = (const unsigned char *) (source->limit == source->capacity
? source->limit : (source->limit - 4));

What does the 4 here mean in this expression? Is it possible it's
compensating for this bit in init_toast_buffer()?

+ buf->limit = VARDATA(buf->buf);

It seems the initial limit should also depend on whether the datum is
compressed, right? Can we just do this:

+ buf->limit = buf->position;

b).
- while (sp < srcend && dp < destend)
...
+ while (sp + 1 < srcend && dp < destend &&
...

Why is it here "sp + 1"?


4. Note that varlena.c has a static state variable, and a cleanup
function that currently does:

static void
text_position_cleanup(TextPositionState *state)
{
/* no cleanup needed */
}

It seems the detoast iterator could be embedded in this state
variable, and then free-ing can happen here. That has a possible
advantage that the iterator struct would be on the same cache line as
the state data. That would also remove the need to pass "iter" as a
parameter, since these functions already pass "state". I'm not sure if
this would be good for other users of the iterator, so maybe we can
hold off on that for now.

5. Would it be a good idea to add tests (not always practical), or
more Assert()'s? You probably already know this, but as a reminder
it's good to develop with asserts enabled, but never build with them
for performance testing.

I think that's enough for now. If you have any questions or
counter-arguments, let me know. I've set the commitfest entry to
waiting on author.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0002-addendum-to-detoast-v5.patch (8K)

Re: [proposal] de-TOAST'ing using an iterator

Binguo Bao


On Mon, Jul 29, 2019 at 11:49 AM, John Naylor <[hidden email]> wrote:
> On Thu, Jul 25, 2019 at 10:21 PM Binguo Bao <[hidden email]> wrote:
> My goal for this stage of review was to understand more fully what the
> code is doing, and make it as simple and clear as possible, starting
> at the top level. In doing so, it looks like I found some additional
> performance gains. I haven't looked much yet at the TOAST fetching
> logic.
>
> 1). For every needle comparison, text_position_next_internal()
> calculates how much of the value is needed and passes that to
> detoast_iterate(), which then calculates if it has to do something or
> not. This is a bit hard to follow. There might also be a performance
> penalty -- the following is just a theory, but it sounds plausible:
> The CPU can probably correctly predict that detoast_iterate() will
> usually return the same value it did last time, but it still has to
> call the function and make sure, which I imagine is more expensive
> than advancing the needle. Ideally, we want to call the iterator only
> if we have to.
>
> In the attached patch (applies on top of your v5),
> text_position_next_internal() simply compares hptr to the detoast
> buffer limit, and calls detoast_iterate() until it can proceed. I
> think this is clearer.

Yes, I think this is a general scenario where the caller continually
calls detoast_iterate until it gets enough data, so such operations can
be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate
function is called only when the data requested by the caller is greater than the
buffer limit.

> (I'm not sure of the error handling, see #2.)
> In this scheme, the only reason to know length is to pass to
> pglz_decompress_iterate() in the case of in-line compression. As I
> alluded to in my first review, I don't think it's worth the complexity
> to handle that iteratively since the value is only a few kB. I made it
> so in-line datums are fully decompressed as in HEAD and removed struct
> members to match.

Sounds good. This not only simplifies the structure and logic of the de-TOAST iterator
but also has no major impact on efficiency.
 
> I also noticed that no one updates or looks at
> "toast_iter.done" so I removed that as well.

toast_iter.done is now updated when the buffer limit reaches the buffer capacity,
so I added it back.
 
> Now pglz_decompress_iterate() doesn't need length at all. For testing
> I just set decompress_all = true and let the compiler optimize away
> the rest. I left finishing it for you if you agree with these changes.

Done. 
 
> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
> don't check it or do anything with it. Should we do something with it?
> It's also not yet clear if we should check the iterator state instead
> of return values. I've added some XXX comments as a reminder. We
> should also check the return value of pglz_decompress_iterate().

IMO, we need to provide users with a simple iterative interface.
Comparing the required data pointer with the buffer limit is an easy way to do that.
And the application scenarios of the iterator are mostly read operations,
so I think there is no need to return a value; instead, the iterator should throw an
exception for wrong calls, such as when all the data has already been iterated
but the user still calls the iterator.
 

> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
> pglz_decompress(), and I have some questions on it:
>
> a).
> + srcend = (const unsigned char *) (source->limit == source->capacity
> ? source->limit : (source->limit - 4));
>
> What does the 4 here mean in this expression?

Since we fetch chunks one by one, if we made srcend equal to the source buffer limit, then
in the while loop "while (sp < srcend && dp < destend)", sp might exceed the source buffer limit and
read unallocated bytes. Keeping a four-byte margin prevents sp from exceeding the source buffer limit.
Once we have read all the chunks, we no longer need to worry about crossing the boundary,
and srcend can simply equal the source buffer limit. I've added comments to explain this in patch v6.

 
> Is it possible it's
> compensating for this bit in init_toast_buffer()?
>
> + buf->limit = VARDATA(buf->buf);
>
> It seems the initial limit should also depend on whether the datum is
> compressed, right? Can we just do this:
>
> + buf->limit = buf->position;

I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of
the chunks we read may contain header information. For example, for compressed data chunks,
the first four bytes record the size of the raw data, which means that limit is four bytes ahead of position.
This initialization doesn't cause errors, even though position is less than limit in other cases,
because we always fetch chunks first and then decompress them.
 
> b).
> - while (sp < srcend && dp < destend)
> ...
> + while (sp + 1 < srcend && dp < destend &&
> ...
>
> Why is it "sp + 1" here?

Ignore it; I set the inactive state of detoast_iter->ctrl to 8 in patch v6 so that
ctrl is parsed correctly every time.
 

> 4. Note that varlena.c has a static state variable, and a cleanup
> function that currently does:
>
> static void
> text_position_cleanup(TextPositionState *state)
> {
> /* no cleanup needed */
> }
>
> It seems the detoast iterator could be embedded in this state
> variable, and then free-ing can happen here. That has a possible
> advantage that the iterator struct would be on the same cache line as
> the state data. That would also remove the need to pass "iter" as a
> parameter, since these functions already pass "state". I'm not sure if
> this would be good for other users of the iterator, so maybe we can
> hold off on that for now.

Good idea. I've implemented it in patch v6.
 
> 5. Would it be a good idea to add tests (not always practical), or
> more Assert()'s? You probably already know this, but as a reminder
> it's good to develop with asserts enabled, but never build with them
> for performance testing.

I've added more Assert()'s to check iterator state.
 

> I think that's enough for now. If you have any questions or
> counter-arguments, let me know. I've set the commitfest entry to
> waiting on author.
>
> --
> John Naylor                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

BTW, I found that the iterator comes in handy for json/jsonb operations such as finding a field value or getting array elements.
I will continue to optimize json/jsonb queries based on the de-TOAST iterator patch.

--
Best regards,
Binguo Bao

0001-de-TOASTing-using-a-iterator-6.patch (28K)

Re: [proposal] de-TOAST'ing using a iterator

John Naylor-2
On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <[hidden email]> wrote:

>
> John Naylor <[hidden email]> wrote on Mon, Jul 29, 2019 at 11:49 AM:
>>
>> 1). For every needle comparison, text_position_next_internal()
>> calculates how much of the value is needed and passes that to
>> detoast_iterate(), which then calculates if it has to do something or
>> not. This is a bit hard to follow. There might also be a performance
>> penalty -- the following is just a theory, but it sounds plausible:
>> The CPU can probably correctly predict that detoast_iterate() will
>> usually return the same value it did last time, but it still has to
>> call the function and make sure, which I imagine is more expensive
>> than advancing the needle. Ideally, we want to call the iterator only
>> if we have to.
>>
>> In the attached patch (applies on top of your v5),
>> text_position_next_internal() simply compares hptr to the detoast
>> buffer limit, and calls detoast_iterate() until it can proceed. I
>> think this is clearer.
>
>
> Yes, I think this is a general scenario where the caller continually
> calls detoast_iterate until gets enough data, so I think such operations can
> be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate
> function is called only when the data requested by the caller is greater than the
> buffer limit.

I like the use of a macro here. However, I think we can find a better
location for the definition. See the header comment of fmgr.h:
"Definitions for the Postgres function manager and function-call
interface." Maybe tuptoaster.h is as good a place as any?
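For illustration, the fast-path macro being discussed might look roughly like this (the struct layout, names, and fixed chunk size are assumptions for the sketch, not the actual patch code):

```c
#include <assert.h>

/*
 * Sketch of the fast-path macro described above.  Struct layout, names,
 * and the fixed 16-byte chunk size are assumptions for illustration,
 * not the actual patch code.
 */
typedef struct DemoIterator
{
    const char *limit;          /* end of data de-TOASTed so far */
    int         calls;          /* number of real iterator calls (demo) */
} DemoIterator;

/* Stand-in for detoast_iterate(): produce one more chunk of output. */
static void
demo_detoast_iterate(DemoIterator *iter)
{
    iter->limit += 16;
    iter->calls++;
}

/*
 * The cheap pointer comparison is done inline; the function call
 * happens only when the caller needs data beyond the current limit.
 */
#define DEMO_DETOAST_ITERATE(iter, need) \
    do { \
        while ((need) > (iter)->limit) \
            demo_detoast_iterate(iter); \
    } while (0)
```

The point of the macro form is that the common case (data already available) costs one predictable comparison and no function call.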

>> I also noticed that no one updates or looks at
>> "toast_iter.done" so I removed that as well.
>
>
> toast_iter.done is updated when the buffer limit reached the buffer capacity now.
> So, I added it back.

Okay.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
>> don't check it or do anything with it. Should we do something with it?
>> It's also not yet clear if we should check the iterator state instead
>> of return values. I've added some XXX comments as a reminder. We
>> should also check the return value of pglz_decompress_iterate().
>
>
> IMO, we need to provide users with a simple iterative interface.
> Using the required data pointer to compare with the buffer limit is an easy way.
> And the application scenarios of the iterator are mostly read operations.
> So I think there is no need to return a value, and the iterator needs to throw an
> exception for some wrong calls, such as all the data have been iterated,
> but the user still calls the iterator.

Okay, and I see these functions now return void. The original
pglz_decompress() returned a value that was checked against corruption.
Is there a similar check we can do for the iterative version?

>> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> pglz_decompress(), and I have some questions on it:
>>
>> a).
>> + srcend = (const unsigned char *) (source->limit == source->capacity
>> ? source->limit : (source->limit - 4));
>>
>> What does the 4 here mean in this expression?
>
>
> Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
> In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read unallocated bytes.

Why is this? That tells me the limit is incorrect. Can the setter not
determine the right value?

> Giving a four-byte buffer can prevent sp from exceeding the source buffer limit.

Why 4? That's a magic number. Why not 2, or 27?

> If we have read all the chunks, we don't need to be careful to cross the border,
> just make srcend equal to source buffer limit. I've added comments to explain it in patch v6.

That's a good thing to comment on, but it doesn't explain why. This
logic seems like a band-aid and I think a committer would want this to
be handled in a more principled way.

>> Is it possible it's
>> compensating for this bit in init_toast_buffer()?
>>
>> + buf->limit = VARDATA(buf->buf);
>>
>> It seems the initial limit should also depend on whether the datum is
>> compressed, right? Can we just do this:
>>
>> + buf->limit = buf->position;
>
>
> I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of
> the chunks we read may contain header information. For example, for compressed data chunks,
> the first four bytes record the size of raw data, this means that limit is four bytes ahead of position.
> This initialization doesn't cause errors, although the position is less than the limit in other cases.
> Because we always fetch chunks first, then decompress it.

I see what you mean now. This could use a comment or two to explain
that the stated constraints may not actually be satisfied at
initialization.

>> b).
>> - while (sp < srcend && dp < destend)
>> ...
>> + while (sp + 1 < srcend && dp < destend &&
>> ...
>>
>> Why is it here "sp + 1"?
>
>
> Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch v6 to
> achieve the purpose of parsing ctrl correctly every time.

Please explain further. Was the "sp + 1" correct behavior (and why),
or only for debugging setting ctrl/c correctly? Also, I don't think
the new logic for the ctrl/c variables is an improvement:

1. iter->ctrlc is initialized with '8' (even in the uncompressed case,
which is confusing). Any time you initialize with something not 0 or
1, it's a magic number, and here it's far from where the loop variable
is used. This is harder to read.

2. The first time through the loop, iter->ctrlc = 8, which immediately gets
set back to 0.

3. At the end of the loop, iter->ctrl/c are unconditionally set. In
v5, there was a condition which would usually avoid this copying of
values through pointers.

>> 4. Note that varlena.c has a static state variable, and a cleanup
>> function that currently does:
>>
>> static void
>> text_position_cleanup(TextPositionState *state)
>> {
>> /* no cleanup needed */
>> }
>>
>> It seems the detoast iterator could be embedded in this state
>> variable, and then free-ing can happen here. That has a possible
>> advantage that the iterator struct would be on the same cache line as
>> the state data. That would also remove the need to pass "iter" as a
>> parameter, since these functions already pass "state". I'm not sure if
>> this would be good for other users of the iterator, so maybe we can
>> hold off on that for now.
>
>
> Good idea. I've implemented it in patch v6.

That's better, and I think we can take it a little bit farther.

1. Notice that TextPositionState is allocated on the stack in
text_position(), which passes both the "state" pointer and the "iter"
pointer to text_position_setup(), and only then sets state->iter =
iter. We can easily set this inside text_position(). That would get
rid of the need for other callers to pass NULL iter to
text_position_setup().

2. DetoastIteratorData is fixed size, so I see no reason to allocate
it on the heap. We could allocate it on the stack in text_pos(), and
pass the pointer to create_detoast_iterator() (in this case maybe a
better name is init_detoast_iterator), which would return a bool to
tell text_pos() whether to pass down the pointer or a NULL. The
allocation of other structs (toast buffer and fetch iterator) probably
can't be changed without more work.
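A rough sketch of the calling pattern being suggested here (all names are hypothetical, not the actual patch code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down iterator state for the sketch. */
typedef struct DemoDetoastIteratorData
{
    bool        compressed;
} DemoDetoastIteratorData;

/*
 * Initialize a stack-allocated iterator.  Returns true if the datum
 * needs iterative de-TOASTing; the caller then passes &iter downward,
 * otherwise it passes NULL.
 */
static bool
demo_init_detoast_iterator(DemoDetoastIteratorData *iter, bool is_external,
                           bool is_compressed)
{
    if (!is_external)
        return false;           /* caller should pass NULL downward */
    iter->compressed = is_compressed;
    return true;
}
```

The caller would then do something like `DemoDetoastIteratorData iterdata; ... ok ? &iterdata : NULL ...`, avoiding any heap allocation for the fixed-size struct.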

>> 5. Would it be a good idea to add tests (not always practical), or
>> more Assert()'s? You probably already know this, but as a reminder
>> it's good to develop with asserts enabled, but never build with them
>> for performance testing.
>
>
> I've added more Assert()'s to check iterator state.

Okay.

> BTW, I found that iterators come in handy for json/jsonb's find field value or get array elements operations.
> I will continue to optimize the json/jsonb query based on the detoast iterator patch.

That will be an interesting use case.

There are other aspects of the patch I should investigate, but I'll
put that off for another time. Commitfest is over, but note that
review can happen at any time. I'll continue to do so as time permits.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [proposal] de-TOAST'ing using a iterator

Binguo Bao


John Naylor <[hidden email]> wrote on Fri, Aug 2, 2019 at 3:12 PM:
On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <[hidden email]> wrote:
>
> John Naylor <[hidden email]> wrote on Mon, Jul 29, 2019 at 11:49 AM:
>>
>> 1). For every needle comparison, text_position_next_internal()
>> calculates how much of the value is needed and passes that to
>> detoast_iterate(), which then calculates if it has to do something or
>> not. This is a bit hard to follow. There might also be a performance
>> penalty -- the following is just a theory, but it sounds plausible:
>> The CPU can probably correctly predict that detoast_iterate() will
>> usually return the same value it did last time, but it still has to
>> call the function and make sure, which I imagine is more expensive
>> than advancing the needle. Ideally, we want to call the iterator only
>> if we have to.
>>
>> In the attached patch (applies on top of your v5),
>> text_position_next_internal() simply compares hptr to the detoast
>> buffer limit, and calls detoast_iterate() until it can proceed. I
>> think this is clearer.
>
>
> Yes, I think this is a general scenario where the caller continually
> calls detoast_iterate until gets enough data, so I think such operations can
> be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate
> function is called only when the data requested by the caller is greater than the
> buffer limit.

I like the use of a macro here. However, I think we can find a better
location for the definition. See the header comment of fmgr.h:
"Definitions for the Postgres function manager and function-call
interface." Maybe tuptoaster.h is as good a place as any?

PG_DETOAST_ITERATE isn't a function-call interface in the strict sense,
but I notice that PG_FREE_IF_COPY is also defined in fmgr.h, and its logic is
similar to PG_DETOAST_ITERATE: make the condition check first and then
decide whether to call the function. Besides, PG_DETOAST_DATUM,
PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE, and
PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, so it is reasonable
to put all the de-TOAST interfaces together.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
>> don't check it or do anything with it. Should we do something with it?
>> It's also not yet clear if we should check the iterator state instead
>> of return values. I've added some XXX comments as a reminder. We
>> should also check the return value of pglz_decompress_iterate().
>
>
> IMO, we need to provide users with a simple iterative interface.
> Using the required data pointer to compare with the buffer limit is an easy way.
> And the application scenarios of the iterator are mostly read operations.
> So I think there is no need to return a value, and the iterator needs to throw an
> exception for some wrong calls, such as all the data have been iterated,
> but the user still calls the iterator.

Okay, and I see these functions now return void. The original
pglz_decompress() returned a value that was checked against corruption.
Is there a similar check we can do for the iterative version?

As far as I know, we can only do such a check after all of the compressed data
has been decompressed. If we are slicing, we can't do the check.
 

>> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> pglz_decompress(), and I have some questions on it:
>>
>> a).
>> + srcend = (const unsigned char *) (source->limit == source->capacity
>> ? source->limit : (source->limit - 4));
>>
>> What does the 4 here mean in this expression?
>
>
> Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
> In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read unallocated bytes.

Why is this? That tells me the limit is incorrect. Can the setter not
determine the right value?

There are three statements that change the value of `sp` in the loop `while (sp < srcend && dp < destend)`:
`ctrl = *sp++;`
`off = ((sp[0] & 0xf0) << 4) | sp[1]; sp += 2;`
`len += *sp++;`
Although we make sure `sp` is less than `srcend` when entering the loop, `sp` may still
go beyond `srcend` inside the loop, and we must ensure that `sp` always stays below `buf->limit` to avoid
reading unallocated data. So `srcend` can't be initialized to `buf->limit`. There is one exception:
once we've fetched all the data chunks and `buf->limit` has reached `buf->capacity`, it's impossible to read
unallocated data via `sp`.
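The bound on that overrun can be seen from a cut-down model of a single pglz tag-item read (a sketch of the statements quoted above, not the real decompressor):

```c
#include <assert.h>

/*
 * Model of the bytes one pglz tag item consumes after the loop's
 * "sp < srcend" test has already passed: two offset/length bytes,
 * plus one length-extension byte when the 4-bit length field is
 * saturated.  So sp can end up as far as srcend + 2.
 */
static const unsigned char *
demo_read_tag_item(const unsigned char *sp)
{
    int         off;
    int         len;

    len = (sp[0] & 0x0f) + 3;            /* low nibble: match length */
    off = ((sp[0] & 0xf0) << 4) | sp[1]; /* high nibble + next byte: offset */
    sp += 2;
    if (len == 18)                       /* 15 + 3: length is extended */
        len += *sp++;                    /* one more byte */
    (void) off;
    return sp;
}
```

So entering the loop with `sp = srcend - 1` can leave `sp` at `srcend + 2`, which is why a safety margin of at least two bytes (the patch uses four) is needed before the true end of the chunk.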

> Giving a four-byte buffer can prevent sp from exceeding the source buffer limit.

Why 4? That's a magic number. Why not 2, or 27?

As I explained above, `sp` may go beyond `srcend` in the loop, up to `srcend + 2`.
In theory, it's fine to make the safety margin any value greater than or equal to 2.
 
> If we have read all the chunks, we don't need to be careful to cross the border,
> just make srcend equal to source buffer limit. I've added comments to explain it in patch v6.

That's a good thing to comment on, but it doesn't explain why.

Yes, the current comment is puzzling. I'll improve it.
 
This
logic seems like a band-aid and I think a committer would want this to
be handled in a more principled way.

I don't want to change the pglz_decompress logic too much; the iterator should
focus on saving and restoring the original pglz_decompress state.
 
>> Is it possible it's
>> compensating for this bit in init_toast_buffer()?
>>
>> + buf->limit = VARDATA(buf->buf);
>>
>> It seems the initial limit should also depend on whether the datum is
>> compressed, right? Can we just do this:
>>
>> + buf->limit = buf->position;
>
>
> I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of
> the chunks we read may contain header information. For example, for compressed data chunks,
> the first four bytes record the size of raw data, this means that limit is four bytes ahead of position.
> This initialization doesn't cause errors, although the position is less than the limit in other cases.
> Because we always fetch chunks first, then decompress it.

I see what you mean now. This could use a comment or two to explain
the stated constraints may not actually be satisfied at
initialization.

Done.
 
>> b).
>> - while (sp < srcend && dp < destend)
>> ...
>> + while (sp + 1 < srcend && dp < destend &&
>> ...
>>
>> Why is it here "sp + 1"?
>
>
> Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch v6 to
> achieve the purpose of parsing ctrl correctly every time.

Please explain further. Was the "sp + 1" correct behavior (and why),
or only for debugging setting ctrl/c correctly?

In patch v5, if the condition is `sp < srcend`, suppose `sp = srcend - 1` before
entering the loop `while (sp < srcend && dp < destend)`. We enter the loop
and read a control byte (`sp` now equals `srcend`), so the program can't enter the
loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`; it then sets `iter->ctrlc` to 0,
exits the outer loop, and this iteration is over. On the next iteration,
the control byte is reread since `iter->ctrlc` equals 0, but the previous control byte
was never used. Changing the condition to `sp + 1 < srcend` avoids the case where
only a control byte is read and then the iteration ends.
 
Also, I don't think
the new logic for the ctrl/c variables is an improvement:

1. iter->ctrlc is initialized with '8' (even in the uncompressed case,
which is confusing). Any time you initialize with something not 0 or
1, it's a magic number, and here it's far from where the loop variable
is used. This is harder to read.

`iter->ctrlc` is used to record the value of `ctrlc` in pglz_decompress at the end of
the previous iteration (or loop). In pglz_decompress, `ctrlc`'s valid values are 0~7;
when `ctrlc` reaches 8, a control byte is read from the source
buffer into `ctrl` and `ctrlc` is reset to 0. Since a control byte must be read from the
source buffer into `ctrl` on the first iteration, `iter->ctrlc` should be initialized to '8'.
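The role of the saved control state can be sketched as follows (names are illustrative, and the real patch consumes control bits differently inside pglz_decompress_iterate; the reload-at-8 behavior is the point here):

```c
#include <assert.h>

/*
 * Sketch of how a saved (ctrl, ctrlc) pair lets decompression resume
 * across iterations.  ctrlc counts control bits already consumed;
 * 8 is the "exhausted" sentinel meaning a fresh control byte must be
 * read first.  Names are illustrative, not the actual patch code.
 */
typedef struct DemoCtrlState
{
    unsigned char ctrl;         /* current control byte */
    int         ctrlc;          /* bits consumed so far, 0..8 */
} DemoCtrlState;

static int
demo_next_ctrl_bit(DemoCtrlState *st, const unsigned char **sp)
{
    if (st->ctrlc == 8)         /* sentinel: reload the control byte */
    {
        st->ctrl = *(*sp)++;
        st->ctrlc = 0;
    }
    return (st->ctrl >> st->ctrlc++) & 1;
}
```

Initializing ctrlc to 8 (rather than 0) is what forces the very first call to read a control byte before consuming any bits.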
 
2. The first time through the loop, iter->ctrlc = 8, which immediately gets
set back to 0.

As I explained above, `iter->ctrlc = 8` makes a control byte be read
from the source buffer into `ctrl` on the first iteration. Besides, `iter->ctrlc = 8`
indicates that no valid `ctrlc` value was recorded at the end of a previous
iteration; obviously, there are no iterations before the first one.
 
3. At the end of the loop, iter->ctrl/c are unconditionally set. In
v5, there was a condition which would usually avoid this copying of
values through pointers.

Patch v6 simply records the value of `ctrlc` at the end of each iteration (or loop),
whether it is valid (0~7) or 8, and initializes `ctrlc` correctly on the next iteration (or loop).
I think patch v6 is more concise this way.
 

>> 4. Note that varlena.c has a static state variable, and a cleanup
>> function that currently does:
>>
>> static void
>> text_position_cleanup(TextPositionState *state)
>> {
>> /* no cleanup needed */
>> }
>>
>> It seems the detoast iterator could be embedded in this state
>> variable, and then free-ing can happen here. That has a possible
>> advantage that the iterator struct would be on the same cache line as
>> the state data. That would also remove the need to pass "iter" as a
>> parameter, since these functions already pass "state". I'm not sure if
>> this would be good for other users of the iterator, so maybe we can
>> hold off on that for now.
>
>
> Good idea. I've implemented it in patch v6.

That's better, and I think we can take it a little bit farther.

1. Notice that TextPositionState is allocated on the stack in
text_position(), which passes both the "state" pointer and the "iter"
pointer to text_position_setup(), and only then sets state->iter =
iter. We can easily set this inside text_position(). That would get
rid of the need for other callers to pass NULL iter to
text_position_setup().

Done.
 
2. DetoastIteratorData is fixed size, so I see no reason to allocate
it on the heap. We could allocate it on the stack in text_pos(), and
pass the pointer to create_detoast_iterator() (in this case maybe a
better name is init_detoast_iterator), which would return a bool to
tell text_pos() whether to pass down the pointer or a NULL. The
allocation of other structs (toast buffer and fetch iterator) probably
can't be changed without more work.

Done.

If there is anything else that is not explained clearly, please point it out.

--
Best regards,
Binguo Bao

0001-de-TOASTing-using-a-iterator-7.patch (26K)

Re: [proposal] de-TOAST'ing using a iterator

John Naylor-2
On Sat, Aug 3, 2019 at 11:11 PM Binguo Bao <[hidden email]> wrote:

>
> John Naylor <[hidden email]> wrote on Fri, Aug 2, 2019 at 3:12 PM:
>>
>> I like the use of a macro here. However, I think we can find a better
>> location for the definition. See the header comment of fmgr.h:
>> "Definitions for the Postgres function manager and function-call
>> interface." Maybe tuptoaster.h is as good a place as any?
>
> PG_DETOAST_ITERATE isn't a sample function-call interface,
> But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is
> similar to PG_DETOAST_ITERATE, make condition check first and then
> decide whether to call the function. Besides, PG_DETOAST_DATUM,
> PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE,
> PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable
> to put all the de-TOAST interface together.

Hmm, it's strange that those macros ended up there, but now I see why
it makes sense to add new ones there also.

>> Okay, and I see these functions now return void. The original
>> pglz_decompress() returned a value that was checked against corruption.
>> Is there a similar check we can do for the iterative version?
>
> As far as I know, we can just do such check after all compressed data is decompressed.
> If we are slicing, we can't do the check.

Okay.

>> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> >> pglz_decompress(), and I have some questions on it:
>> >>
>> >> a).
>> >> + srcend = (const unsigned char *) (source->limit == source->capacity
>> >> ? source->limit : (source->limit - 4));
>> >>
>> >> What does the 4 here mean in this expression?
>> >
>> > Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
>> > In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read unallocated bytes.
>>
>> Why is this? That tells me the limit is incorrect. Can the setter not
>> determine the right value?
>
> There are three statements that change the value of `sp` in the loop `while (sp < srcend && dp < destend)`:
> `ctrl = *sp++;`
> `off = ((sp[0] & 0xf0) << 4) | sp[1]; sp += 2;`
> `len += *sp++;`
> Although we make sure `sp` is less than `srcend` when entering the loop, `sp` may still
> go beyond `srcend` inside the loop, and we must ensure that `sp` always stays below `buf->limit` to avoid
> reading unallocated data. So `srcend` can't be initialized to `buf->limit`. There is one exception:
> once we've fetched all the data chunks and `buf->limit` has reached `buf->capacity`, it's impossible to read
> unallocated data via `sp`.

Thank you for the detailed explanation and the comment.

>> Please explain further. Was the "sp + 1" correct behavior (and why),
>> or only for debugging setting ctrl/c correctly?
>
> In patch v5, If the condition is `sp < srcend`, suppose `sp = srcend - 1` before
> entering the loop `while (sp < srcend && dp < destend)`, when entering the loop
> and read a control byte(sp equals to `srcend` now), the program can't enter the
> loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`, then set `iter->ctrlc` to 0,
> exit the first loop and then this iteration is over. At the next iteration,
> the control byte will be reread since `iter->ctrlc` equals to 0, but the previous control byte
> is not used. Changing the condition to `sp + 1 < srcend` avoids the case where only a control
> byte is read and then the iteration ends.

Okay, that's quite subtle. I agree the v6/7 way is more clear in this regard.

>> Also, I don't think
>> the new logic for the ctrl/c variables is an improvement:
>>
>> 1. iter->ctrlc is initialized with '8' (even in the uncompressed case,
>> which is confusing). Any time you initialize with something not 0 or
>> 1, it's a magic number, and here it's far from where the loop variable
>> is used. This is harder to read.
>
> `iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the end of
> the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is 0~7,
> When `ctrlc` reaches 8,  a control byte is read from the source
> buffer to `ctrl` and `ctrlc` is reset to 0. And a control byte should be read from the
> source buffer into `ctrl` on the first iteration. So `iter->ctrlc` should be initialized with '8'.

My point here is it looks strange out of context, but "0" looked
normal. Maybe a comment in init_detoast_buffer(), something like "8
means read a control byte from the source buffer on the first
iteration, see pg_lzdecompress_iterate()".

Or, possibly, we could have a macro like INVALID_CTRLC. That might
even improve the readability of the original function. This is just an
idea, and maybe others would disagree, so you don't need to change it
for now.

>> 3. At the end of the loop, iter->ctrl/c are unconditionally set. In
>> v5, there was a condition which would usually avoid this copying of
>> values through pointers.
>
> Patch v6 just records the value of `ctrlc` at the end of each iteration(or loop)
> whether it is valid (0~7) or 8, and initializes `ctrlc` on the next iteration(or loop) correctly.
> I think it is more concise in patch v6.

And, in the case mentioned above where we enter the while loop with sp
= srcend - 1, we can read the control byte and still store the
correct value for ctrlc.

>>> [varlena.c api]
>> That's better, and I think we can take it a little bit farther.
>>
>> 1. Notice that TextPositionState is allocated on the stack in
>> text_position(), which passes both the "state" pointer and the "iter"
>> pointer to text_position_setup(), and only then sets state->iter =
>> iter. We can easily set this inside text_position(). That would get
>> rid of the need for other callers to pass NULL iter to
>> text_position_setup().
>
>
> Done.

That looks much cleaner, thanks.

I've repeated my performance test to make sure there's no additional
regression in my suffix tests:

                master       patch v7
comp. end       1560ms       1600ms
uncomp. end      896ms        890ms

The regression from master in the compressed case is about 2.5%, which
is no different from the last patch I tested, so that's good.

At this point, there are no functional things that I think we need to
change. It's close to ready-for-committer. For the next version, I'd
like you go through the comments and edit for grammar, spelling, and
clarity as you see fit. I know you're not a native speaker of English,
so I can help you with anything that remains. Also note we use braces
on their own lines
{
    like this
}

We do have a source formatting tool (pgindent), but it helps
readability for committers to have it mostly standard beforehand.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [proposal] de-TOAST'ing using a iterator

Binguo Bao
Hi John,
>> Also, I don't think

>> the new logic for the ctrl/c variables is an improvement:
>>
>> 1. iter->ctrlc is initialized with '8' (even in the uncompressed case,
>> which is confusing). Any time you initialize with something not 0 or
>> 1, it's a magic number, and here it's far from where the loop variable
>> is used. This is harder to read.
>
> `iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the end of
> the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is 0~7,
> When `ctrlc` reaches 8,  a control byte is read from the source
> buffer to `ctrl` and `ctrlc` is reset to 0. And a control byte should be read from the
> source buffer into `ctrl` on the first iteration. So `iter->ctrlc` should be initialized with '8'.

My point here is it looks strange out of context, but "0" looked
normal. Maybe a comment in init_detoast_buffer(), something like "8
means read a control byte from the source buffer on the first
iteration, see pg_lzdecompress_iterate()".

Or, possibly, we could have a macro like INVALID_CTRLC. That might
even improve the readability of the original function. This is just an
idea, and maybe others would disagree, so you don't need to change it
for now.

All in all, the idea is much better than a magic number 8, so I've implemented it.
 
At this point, there are no functional things that I think we need to
change. It's close to ready-for-committer. For the next version, I'd
like you go through the comments and edit for grammar, spelling, and
clarity as you see fit. I know you're not a native speaker of English,
so I can help you with anything that remains.

I've tried my best to improve the comments, but there is likely still room for
further improvement; I hope you can help me perfect them.
 
Also note we use braces
on their own lines
{
    like this
}

Done. 
-- 
Best regards,
Binguo Bao

0001-de-TOASTing-using-a-iterator-8.patch (27K)

Re: [proposal] de-TOAST'ing using a iterator

John Naylor-2
On Fri, Aug 16, 2019 at 10:48 PM Binguo Bao <[hidden email]> wrote:
> [v8 patch with cosmetic changes]

Okay, looks good. I'll make a few style suggestions and corrections.
In the course of looking at this again, I have a few other questions
below as well.

It looks like you already do this for the most part, but I'll mention
that we try to keep lines, including comments, less than 80 characters
long. pgindent can try to fix that, but the results don't always look
nice.

About variable names: The iterator pointers are variously called
"iter", "iterator", and "fetch_iter". I found this confusing the first
time I read this code. I think we should use "iter" if we have only
one kind in the function, and "detoast_iter" and "fetch_iter" if we
have both kinds.
--

init_detoast_iterator():

+ * The "iterator" variable is normally just a local variable in the caller.

I don't think this comment is helpful to understand this function or its use.

+ * It only make sense to initialize de-TOAST iterator for external
on-disk value.

s/make/makes/
"a" de-TOAST iterator
s/value/values/

The comments in this function that start with "This is a ..." could be
shortened like this:

/* indirect pointer -- dereference it */

While looking at this again, I noticed we no longer need to test for
the in-line compressed case at all. I also tried some other cosmetic
rearrangements. Let me know what you think about the attached patch.
Also, I wonder if the VARATT_IS_EXTERNAL_INDIRECT case should come
first. Then the two normal cases are next to each other.


free_detoast_iterator(), free_fetch_datum_iterator(), and free_toast_buffer():

These functions should return void.

+ * Free the memory space occupied by the de-TOAST iterator include buffers and
+ * fetch datum iterator.

Perhaps "Free memory used by the de-TOAST iterator, including buffers
and fetch datum iterator."

The check

if (iter->buf != iter->fetch_datum_iterator->buf)

is what we need to do for the compressed case. Could we use this
directly instead of having a separate state variable iter->compressed,
with a macro like this?

#define TOAST_ITER_COMPRESSED(iter) \
    (iter->buf != iter->fetch_datum_iterator->buf)

Or maybe that's too clever?


detoast_iterate():

+ * As long as there is another data chunk in compression or external storage,

We no longer use the iterator with in-line compressed values.

+ * de-TOAST it into toast buffer in iterator.

Maybe "into the iterator's toast buffer"


fetch_datum_iterate():

My remarks for detoast_iterate() also apply here.


init_toast_buffer():

+ * Note the constrain buf->position <= buf->limit may be broken
+ * at initialization. Make sure that the constrain is satisfied
+ * when consume chars.

s/constrain/constraint/ (2 times)
s/consume/consuming/

Also, this comment might be better at the top the whole function?


pglz_decompress_iterate():

+ * Decompresses source into dest until the source is exhausted.

This comment is from the original function, but I think it would be
better to highlight the differences from the original, something like:

"This function is based on pglz_decompress(), with these additional
requirements:

1. We need to save the current control byte and byte position for the
caller's next iteration.

2. In pglz_decompress(), we can assume we have all the source bytes
available. This is not the case when we decompress one chunk at a
time, so we have to make sure that we only read bytes available in the
current chunk."

(I'm not sure about the term 'byte position', maybe there's a better one.)

+ * In the while loop, sp may go beyond the srcend, provides a four-byte
+ * buffer to prevent sp from reading unallocated bytes from source buffer.
+ * When source->limit reaches source->capacity, don't worry about reading
+ * unallocated bytes.

Here's my suggestion:

"In the while loop, sp may be incremented such that it points beyond
srcend. To guard against reading beyond the end of the current chunk,
we set srcend such that we exit the loop when we are within four bytes
of the end of the current chunk. When source->limit reaches
source->capacity, we are decompressing the last chunk, so we can (and
need to) read every byte."

+ for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)

Note you can also replace 8 with INVALID_CTRLC here.

tuptoaster.h:
+ * Constrains that need to be satisfied:

s/constrains/constraints/

+ * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
+ * the field is invalid and need to read the control byte from the
+ * source buffer in the next iteration, see pglz_decompress_iterate().
+ */
+#define INVALID_CTRLC 8

I think the macro might be better placed in pg_lzcompress.h, and for
consistency used in pglz_decompress(). Then the comment can be shorter
and more general. With my additional comment in
init_detoast_iterator(), hopefully it will be clear to readers.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

rearrange-init-iter.patch (3K)