tableam vs. TOAST

tableam vs. TOAST

Robert Haas
In a nearby thread[1], Ashwin Agrawal complained that there is no way
for a table AM to get rid of the TOAST table that the core system thinks
should be created.  To that I added a litany of complaints of my own,
including...

- the core system decides whether or not a TOAST table is needed based
on criteria that are very much heap-specific,
- the code for reading and writing values stored in a TOAST table is
heap-specific, and
- the core system assumes that you want to use the same table AM for
the main table and the toast table, but you might not (e.g. you might
want to use the regular old heap for the latter).

Attached is a series of patches which try to improve things in this
area.  Except possibly for 0001, this is v13 material; see discussion
on the other thread.  These likely need some additional work, but I've
done enough with them that I thought it would be worth publishing them
at this stage, because it seems that I'm not the only one thinking
about the problems that exist in this general area.  Here is an
overview:

0001 moves the needs_toast_table() calculation below the table AM
layer.  That allows a table AM to decide for itself whether it wants a
TOAST table.  The most obvious way in which a table AM might want to
be different from what core expects is to decide that the answer is
always "no," which it can do if it has some other method of storing
large values or doesn't wish to support them.  Another possibility is
that it wants logic that is basically similar to the heap, but with a
different size threshold because its tuple format is different. There
are probably other possibilities.
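
To illustrate, here is a minimal sketch of the "always no" case, assuming the
relation_needs_toast_table callback that 0001 adds to TableAmRoutine ("myam"
is a hypothetical AM, and the other required callbacks are elided):

/* A table AM that never wants a TOAST table. */
static bool
myam_relation_needs_toast_table(Relation rel)
{
    /* Large values are stored some other way, or simply not supported. */
    return false;
}

static const TableAmRoutine myam_methods = {
    .type = T_TableAmRoutine,
    /* ... other required callbacks ... */
    .relation_needs_toast_table = myam_relation_needs_toast_table,
};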

0002 breaks tuptoaster.c into three separate files.  It just does code
movement; no functional changes.  The three pieces are detoast.c,
which handles detoasting of toast values and inspection of the sizes
of toasted datums; heaptoast.c, which keeps all the functions that are
intrinsically heap-specific; and toast_internals.c, which is intended
to have a very limited audience.  A nice fringe benefit of this stuff
is that a lot of other files that currently have to include tuptoaster.h
and thus htup_details.h no longer do.
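
As a concrete illustration of the include-graph benefit, a caller that only
needs to detoast values changes roughly like this (a sketch; only the header
names from this patch are assumed):

/* Before 0002: the catch-all header, which drags in htup_details.h. */
#include "access/tuptoaster.h"

/* After 0002: just the narrower detoasting header. */
#include "access/detoast.h"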

0003 creates a new file toast_helper.c which is intended to help table
AMs implement insertion and deletion of toast table rows.  Most of the
AM-independent logic from the functions remaining in heaptoast.c is
moved to this file.  This leaves ~600 of the original ~2400
lines from tuptoaster.c as heap-specific logic, but a new heap AM
actually wouldn't need all of that stuff, because some of the logic
here is in support of stuff like record types, which use HeapTuple
internally and will continue to do so even if those record types are
stored in some other kind of table.
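
To give a feel for the resulting API, here is a rough sketch of how a table
AM's toasting routine might drive these helpers (declarations as quoted later
in this thread; rel/values/isnull/maxDataLen come from the AM's caller, the
ttc_* field names follow the patch's ToastTupleContext, and my_am_tuple_size()
is a hypothetical AM-specific size estimate):

ToastTupleContext ttc;
int         options = 0;

ttc.ttc_rel = rel;
ttc.ttc_values = values;
ttc.ttc_isnull = isnull;
/* ... old values and per-column state elided ... */
toast_tuple_init(&ttc);

/* First, compress the biggest compressible column until the tuple fits. */
while (my_am_tuple_size(&ttc) > maxDataLen)
{
    int         biggest = toast_tuple_find_biggest_attribute(&ttc, true, false);

    if (biggest < 0)
        break;                  /* nothing left to compress */
    toast_tuple_try_compression(&ttc, biggest);
}

/* Then push columns out of line until it fits (or nothing movable remains). */
while (my_am_tuple_size(&ttc) > maxDataLen)
{
    int         biggest = toast_tuple_find_biggest_attribute(&ttc, false, false);

    if (biggest < 0)
        break;
    toast_tuple_externalize(&ttc, biggest, options, TOAST_MAX_CHUNK_SIZE);
}

toast_tuple_cleanup(&ttc, true);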

0004 allows TOAST tables to be implemented using a table AM other than
heap.  In a certain sense this is the opposite of 0003.  0003 is
intended to help people who are implementing a new kind of main table,
whereas 0004 is intended to help people implementing a new kind of
TOAST table.  It teaches the code that inserts, deletes, and retrieves
TOAST rows to use slots, and it makes some efficiency improvements in
the hopes of offsetting any performance loss from so doing.  See
commit message and/or patch for full details.
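
In terms of the API described here, an AM that wants its TOAST tables stored
in the regular heap would do something like this (sketch; the member names are
assumptions based on the description above and later discussion):

/* Delegate TOAST storage back to the regular heap. */
static Oid
myam_relation_toast_am(Relation rel)
{
    return HEAP_TABLE_AM_OID;
}

static const TableAmRoutine myam_methods = {
    /* ... */
    .toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE,   /* an integer, not a callback */
    .relation_toast_am = myam_relation_toast_am,
};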

I believe that with all of these changes it should be pretty
straightforward for a table AM that wants to use itself to store TOAST
data to do so, or to delegate that task back to, say, the regular heap.
I haven't really validated that yet, but plan to do so.

In addition to what's in this patch set, I believe that we should
probably rename some of these functions and macros, so that the
heap-specific ones have heap-specific names and the generic ones
don't, but I haven't gone through all of that yet.  The existing
patches try to choose good names for the new things they add, but they
don't rename any of the existing stuff.  I also think we should
consider removing TOAST_MAX_CHUNK_SIZE from the control file, both
because I'm not sure anybody's really using the ability to vary that
for anything and because that solution doesn't seem entirely sensible
in a world of multiple AMs.  However, that is a debatable change, so
maybe others will disagree.

[1] http://postgr.es/m/CALfoeitE+P8UGii8=BsGQLpHch2EZWJhq4M+D-jfaj8YCa_FSw@...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v1-0004-Allow-TOAST-tables-to-be-implemented-using-table-.patch (48K)
v1-0001-tableam-Move-heap-specific-logic-from-needs_toast.patch (9K)
v1-0003-Create-an-API-for-inserting-and-deleting-rows-in-.patch (40K)
v1-0002-Split-tuptoaster.c-into-three-separate-files.patch (223K)

Re: tableam vs. TOAST

Robert Haas
Updated and rebased patches attached.

On Fri, May 17, 2019 at 5:21 PM Robert Haas <[hidden email]> wrote:
> 0001 moves the needs_toast_table() calculation below the table AM
> layer.  That allows a table AM to decide for itself whether it wants a
> TOAST table.  The most obvious way in which a table AM might want to
> be different from what core expects is to decide that the answer is
> always "no," which it can do if it has some other method of storing
> large values or doesn't wish to support them.  Another possibility is
> that it wants logic that is basically similar to the heap, but with a
> different size threshold because its tuple format is different. There
> are probably other possibilities.

This was committed as 1171d7d58545f26a402f76a05936d572bf29d53b per
discussion on another thread.

> 0002 breaks tuptoaster.c into three separate files.  It just does code
> movement; no functional changes.  The three pieces are detoast.c,
> which handles detoasting of toast values and inspection of the sizes
> of toasted datums; heaptoast.c, which keeps all the functions that are
> intrinsically heap-specific; and toast_internals.c, which is intended
> to have a very limited audience.  A nice fringe benefit of this stuff
> is that a lot of other files that currently have to include tuptoaster.h
> and thus htup_details.h no longer do.

Now 0001.  No changes.

> 0003 creates a new file toast_helper.c which is intended to help table
> AMs implement insertion and deletion of toast table rows.  Most of the
> AM-independent logic from the functions remaining in heaptoast.c is
> moved to this file.  This leaves ~600 of the original ~2400
> lines from tuptoaster.c as heap-specific logic, but a new heap AM
> actually wouldn't need all of that stuff, because some of the logic
> here is in support of stuff like record types, which use HeapTuple
> internally and will continue to do so even if those record types are
> stored in some other kind of table.

Now 0002. No changes.

> 0004 allows TOAST tables to be implemented using a table AM other than
> heap.  In a certain sense this is the opposite of 0003.  0003 is
> intended to help people who are implementing a new kind of main table,
> whereas 0004 is intended to help people implementing a new kind of
> TOAST table.  It teaches the code that inserts, deletes, and retrieves
> TOAST rows to use slots, and it makes some efficiency improvements in
> the hopes of offsetting any performance loss from so doing.  See
> commit message and/or patch for full details.

Now 0003.  Some brain fade repaired.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v2-0003-Allow-TOAST-tables-to-be-implemented-using-table-.patch (50K)
v2-0002-Create-an-API-for-inserting-and-deleting-rows-in-.patch (40K)
v2-0001-Split-tuptoaster.c-into-three-separate-files.patch (223K)

Re: tableam vs. TOAST

Robert Haas
On Tue, May 21, 2019 at 2:10 PM Robert Haas <[hidden email]> wrote:
> Updated and rebased patches attached.

And again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v3-0003-Allow-TOAST-tables-to-be-implemented-using-table-.patch (49K)
v3-0001-Split-tuptoaster.c-into-three-separate-files.patch (223K)
v3-0002-Create-an-API-for-inserting-and-deleting-rows-in-.patch (40K)

Re: tableam vs. TOAST

Prabhat Sahu
On Tue, Jun 11, 2019 at 9:47 PM Robert Haas <[hidden email]> wrote:
> On Tue, May 21, 2019 at 2:10 PM Robert Haas <[hidden email]> wrote:
> > Updated and rebased patches attached.
>
> And again.

Hi Robert,

I have tested the TOAST patches (v3) with different storage options (MAIN, EXTERNAL, EXTENDED, etc.) and
combinations of compression and out-of-line storage options.
I have used a few dummy tables with various tuple counts (10k, 20k, 40k, etc.) and different column lengths.
I issued a manual CHECKPOINT (with checkpoint_timeout = 1d, max_wal_size = 10GB) before each test to avoid performance fluctuations,
and calculated the results as the median of a few consecutive test executions.

Please find attached the SQL script that I used to perform the observations.

Below are the test scenarios I used to check the behavior and performance of the TOAST patches against PG master.
1. where a single column is compressed (SCC)
2. where multiple columns are compressed (MCC)
        -- ALTER the table column(s) to STORAGE "MAIN" to make sure that the column values are COMPRESSED.

3. where a single column is pushed to the TOAST table but not compressed (SCTNC)
4. where multiple columns are pushed to the TOAST table but not compressed (MCTNC)
        -- ALTER the table column(s) to STORAGE "EXTERNAL" to make sure that the column values are pushed to the TOAST table but not COMPRESSED.

5. where a single column is pushed to the TOAST table and also compressed (SCTC)
6. where multiple columns are pushed to the TOAST table and also compressed (MCTC)
        -- ALTER the table column(s) to STORAGE "EXTENDED" to make sure that the column values are pushed to the TOAST table and also COMPRESSED.

7. updating the tuples with similar data shouldn't affect the behavior of storage options.

Please find my observations below:
System Used: (VCPUs: 8, RAM: 16GB, Size: 640GB)
(For each test and tuple count, timings are shown as: without patch | with patch.)

1. SCC INSERT
    10000: 125921.737 ms (02:05.922) | 125992.563 ms (02:05.993)
    20000: 234263.295 ms (03:54.263) | 235952.336 ms (03:55.952)
    40000: 497290.442 ms (08:17.290) | 502820.139 ms (08:22.820)
    80000: 948470.603 ms (15:48.471) | 941778.952 ms (15:41.779)

1. SCC UPDATE
    10000: 263017.814 ms (04:23.018) | 270893.910 ms (04:30.894)
    20000: 488393.748 ms (08:08.394) | 507937.377 ms (08:27.937)
    40000: 1078862.613 ms (17:58.863) | 1053029.428 ms (17:33.029)
    80000: 2037119.576 ms (33:57.120) | 2023633.862 ms (33:43.634)

2. MCC INSERT
    10000: 35415.089 ms (00:35.415) | 35910.552 ms (00:35.911)
    20000: 70899.737 ms (01:10.900) | 70800.964 ms (01:10.801)
    40000: 142185.996 ms (02:22.186) | 142241.913 ms (02:22.242)

2. MCC UPDATE
    10000: 72043.757 ms (01:12.044) | 73848.732 ms (01:13.849)
    20000: 137717.696 ms (02:17.718) | 137577.606 ms (02:17.578)
    40000: 276358.752 ms (04:36.359) | 276520.727 ms (04:36.521)

3. SCTNC INSERT
    10000: 26377.274 ms (00:26.377) | 25600.189 ms (00:25.600)
    20000: 45702.630 ms (00:45.703) | 45163.510 ms (00:45.164)
    40000: 99903.299 ms (01:39.903) | 100013.004 ms (01:40.013)

3. SCTNC UPDATE
    10000: 78385.225 ms (01:18.385) | 76680.325 ms (01:16.680)
    20000: 151823.250 ms (02:31.823) | 153503.971 ms (02:33.504)
    40000: 308197.734 ms (05:08.198) | 308474.937 ms (05:08.475)

4. MCTNC INSERT
    10000: 26214.069 ms (00:26.214) | 25383.522 ms (00:25.384)
    20000: 50826.522 ms (00:50.827) | 50221.669 ms (00:50.222)
    40000: 106034.338 ms (01:46.034) | 106122.827 ms (01:46.123)

4. MCTNC UPDATE
    10000: 78423.817 ms (01:18.424) | 75154.593 ms (01:15.155)
    20000: 158885.787 ms (02:38.886) | 156530.964 ms (02:36.531)
    40000: 319721.266 ms (05:19.721) | 322385.709 ms (05:22.386)

5. SCTC INSERT
    10000: 38451.022 ms (00:38.451) | 38652.520 ms (00:38.653)
    20000: 71590.748 ms (01:11.591) | 71048.975 ms (01:11.049)
    40000: 143327.913 ms (02:23.328) | 142593.207 ms (02:22.593)

5. SCTC UPDATE
    10000: 82069.311 ms (01:22.069) | 81678.131 ms (01:21.678)
    20000: 138763.508 ms (02:18.764) | 138625.473 ms (02:18.625)
    40000: 277534.080 ms (04:37.534) | 277091.611 ms (04:37.092)

6. MCTC INSERT
    10000: 36325.730 ms (00:36.326) | 35803.368 ms (00:35.803)
    20000: 73285.204 ms (01:13.285) | 72728.371 ms (01:12.728)
    40000: 142324.859 ms (02:22.325) | 144368.335 ms (02:24.368)

6. MCTC UPDATE
    10000: 73740.729 ms (01:13.741) | 73002.511 ms (01:13.003)
    20000: 141309.859 ms (02:21.310) | 139676.173 ms (02:19.676)
    40000: 278906.647 ms (04:38.907) | 279522.408 ms (04:39.522)

All the observations look good to me, except that for SCC UPDATE with tuple counts 10K/20K and
for SCC INSERT with tuple count 40K there was a slight increase in the time taken with the patch.
For a better observation, I also ran the same Test 1 with a higher tuple count (80K), and that also looks fine.

I have also performed the tests below with TOAST table objects:
8. pg_dump/restore, pg_upgrade with these
9. Streaming Replication setup
10. Concurrent Transactions

While testing a few concurrent transactions, I have the following question:
-- Concurrent transactions acquiring a lock for a TOAST option (ALTER TABLE .. SET STORAGE .. MAIN/EXTERNAL/EXTENDED, etc.)

-- Session 1:
CREATE TABLE a (a_id text PRIMARY KEY);
CREATE TABLE b (b_id text);
INSERT INTO a VALUES ('a'), ('b');
INSERT INTO b VALUES ('a'), ('b'), ('b');

BEGIN;
ALTER TABLE b ADD CONSTRAINT bfk FOREIGN KEY (b_id) REFERENCES a (a_id);     -- Not Acquiring any lock

-- Session 2:
SELECT * FROM b WHERE b_id = 'a';             -- Shows result

-- Session 1:
ALTER TABLE b ALTER COLUMN b_id SET STORAGE EXTERNAL;        -- Acquire a lock

-- Session 2:
SELECT * FROM b WHERE b_id = 'a';            -- Hangs, waiting for the lock held by session 1

Is this expected behavior?


-- 

With Regards,

Prabhat Kumar Sahu


TOAST_table_test.sql (9K)

Re: tableam vs. TOAST

Thomas Munro
In reply to this post by Robert Haas
On Wed, Jun 12, 2019 at 4:17 AM Robert Haas <[hidden email]> wrote:
> On Tue, May 21, 2019 at 2:10 PM Robert Haas <[hidden email]> wrote:
> > Updated and rebased patches attached.
>
> And again.

Hi Robert,

Thus spake GCC:

detoast.c: In function ‘toast_fetch_datum’:
detoast.c:308:12: error: variable ‘toasttupDesc’ set but not used
[-Werror=unused-but-set-variable]
TupleDesc toasttupDesc;
^

--
Thomas Munro
https://enterprisedb.com



Re: tableam vs. TOAST

Robert Haas
In reply to this post by Prabhat Sahu
On Tue, Jun 25, 2019 at 2:19 AM Prabhat Sahu
<[hidden email]> wrote:
> I have tested the TOAST patches (v3) with different storage options (MAIN, EXTERNAL, EXTENDED, etc.) and
> combinations of compression and out-of-line storage options.
> I have used a few dummy tables with various tuple counts (10k, 20k, 40k, etc.) and different column lengths.
> I issued a manual CHECKPOINT (with checkpoint_timeout = 1d, max_wal_size = 10GB) before each test to avoid performance fluctuations,
> and calculated the results as the median of a few consecutive test executions.

Thanks for testing.

> All the observations look good to me, except that for SCC UPDATE with tuple counts 10K/20K and
> for SCC INSERT with tuple count 40K there was a slight increase in the time taken with the patch.
> For a better observation, I also ran the same Test 1 with a higher tuple count (80K), and that also looks fine.

Did you run each test just once?  How stable are the results?

> While testing a few concurrent transactions, I have the following question:
> -- Concurrent transactions acquiring a lock for a TOAST option (ALTER TABLE .. SET STORAGE .. MAIN/EXTERNAL/EXTENDED, etc.)
>
> -- Session 1:
> CREATE TABLE a (a_id text PRIMARY KEY);
> CREATE TABLE b (b_id text);
> INSERT INTO a VALUES ('a'), ('b');
> INSERT INTO b VALUES ('a'), ('b'), ('b');
>
> BEGIN;
> ALTER TABLE b ADD CONSTRAINT bfk FOREIGN KEY (b_id) REFERENCES a (a_id);     -- Not Acquiring any lock

For me, this acquires AccessShareLock and ShareRowExclusiveLock on the
target table.

rhaas=# select locktype, database, relation, pid, mode, granted from
pg_locks where relation = 'b'::regclass;
 locktype | database | relation |  pid  |         mode          | granted
----------+----------+----------+-------+-----------------------+---------
 relation |    16384 |    16872 | 93197 | AccessShareLock       | t
 relation |    16384 |    16872 | 93197 | ShareRowExclusiveLock | t
(2 rows)

I don't see what that has to do with the topic at hand, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tableam vs. TOAST

Robert Haas
In reply to this post by Thomas Munro
On Sun, Jul 7, 2019 at 11:08 PM Thomas Munro <[hidden email]> wrote:
> Thus spake GCC:
>
> detoast.c: In function ‘toast_fetch_datum’:
> detoast.c:308:12: error: variable ‘toasttupDesc’ set but not used
> [-Werror=unused-but-set-variable]
> TupleDesc toasttupDesc;
> ^

Hmm, fixed, I hope.

Here's an updated patch set.  In addition to the above fix, I fixed
things up for the new pgindent rules and added a fourth patch that
renames the detoasting functions to something that doesn't include
'heap.'

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0004-Rename-attribute-detoasting-functions.patch (18K)
0003-Allow-TOAST-tables-to-be-implemented-using-table-AMs.patch (50K)
0002-Create-an-API-for-inserting-and-deleting-rows-in-TOA.patch (41K)
0001-Split-tuptoaster.c-into-three-separate-files.patch (223K)

Re: tableam vs. TOAST

Prabhat Sahu
In reply to this post by Robert Haas


On Mon, Jul 8, 2019 at 9:06 PM Robert Haas <[hidden email]> wrote:
> On Tue, Jun 25, 2019 at 2:19 AM Prabhat Sahu
> <[hidden email]> wrote:
> > I have tested the TOAST patches (v3) with different storage options (MAIN, EXTERNAL, EXTENDED, etc.) and
> > combinations of compression and out-of-line storage options.
> > I have used a few dummy tables with various tuple counts (10k, 20k, 40k, etc.) and different column lengths.
> > I issued a manual CHECKPOINT (with checkpoint_timeout = 1d, max_wal_size = 10GB) before each test to avoid performance fluctuations,
> > and calculated the results as the median of a few consecutive test executions.
>
> Thanks for testing.
>
> > All the observations look good to me, except that for SCC UPDATE with tuple counts 10K/20K and
> > for SCC INSERT with tuple count 40K there was a slight increase in the time taken with the patch.
> > For a better observation, I also ran the same Test 1 with a higher tuple count (80K), and that also looks fine.
>
> Did you run each test just once?  How stable are the results?

No, I executed each test multiple times (7 times each) and calculated the result as the median among those,
and the results look stable (with the v3 patches).

--

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.

The Postgres Database Company


Re: tableam vs. TOAST

Robert Haas
On Tue, Jul 9, 2019 at 12:40 AM Prabhat Sahu
<[hidden email]> wrote:
>> Did you run each test just once?  How stable are the results?
>
> No, I executed each test multiple times (7 times each) and calculated the result as the median among those,
> and the results look stable (with the v3 patches).

I spent some time looking at your SCC test today.  I think this isn't
really testing the code that actually got changed in the patch: a
quick CPU profile shows that your SCC test is bottlenecked on
pg_lzcompress, which spends a huge amount of time compressing the
gigantic string of 'a's you've constructed, and that code is exactly
the same with the patch as it in master. So, I think that any
fluctuations between the patched and unpatched results are just random
variation.  There's no reason the patch should be slower with one row
count and faster with a different row count, anyway.

I tried to come up with a better test case that uses a more modest
amount of data, and ended up with this:

-- Setup.
CREATE OR REPLACE FUNCTION randomish_string(integer) RETURNS text AS $$
SELECT string_agg(random()::text, '') FROM generate_series(1, $1);
$$ LANGUAGE sql;

CREATE TABLE source_compressed (a int, b text);
INSERT INTO source_compressed
SELECT g, repeat('a', 2000) FROM generate_series(1, 10000) g;
CREATE TABLE sink_compressed (LIKE source_compressed);

CREATE TABLE source_external (a int, b text);
INSERT INTO source_external
SELECT g, randomish_string(400) FROM generate_series(1, 10000) g;

CREATE TABLE sink_external (LIKE source_external);
CREATE TABLE source_external_uncompressed (a int, b text);
ALTER TABLE source_external_uncompressed ALTER COLUMN b SET STORAGE EXTERNAL;
INSERT INTO source_external_uncompressed
SELECT g, randomish_string(400) FROM generate_series(1, 10000) g;
CREATE TABLE sink_external_uncompressed (LIKE source_external_uncompressed);
ALTER TABLE sink_external_uncompressed ALTER COLUMN b SET STORAGE EXTERNAL;

-- Test.
\timing
TRUNCATE sink_compressed, sink_external, sink_external_uncompressed;
CHECKPOINT;
INSERT INTO sink_compressed SELECT * FROM source_compressed;
INSERT INTO sink_external SELECT * FROM source_external;
INSERT INTO sink_external_uncompressed SELECT * FROM
source_external_uncompressed;

Roughly, on both master and with the patches, the first one takes
about 4.2 seconds, the second 7.5, and the third 1.2.  The third one
is the fastest because it doesn't do any compression.  Since it does
less irrelevant work than the other two cases, it has the best chance
of showing up any performance regression that the patch has caused --
if any regression existed, I suppose that it would be an increased
per-toast-fetch or per-toast-chunk overhead. However, I can't
reproduce any such regression.  My first attempt at testing that case
showed the patch about 1% slower, but that wasn't reliably
reproducible when I did it a bunch more times.  So as far as I can
figure, this patch does not regress performance in any
easily-measurable way.

Barring objections, I plan to commit the whole series of patches here
(latest rebase attached).  They are not perfect and could likely be
improved in various ways, but I think they are an improvement over
what we have now, and it's not like it's set in stone once it's
committed.  We can change it more if we come up with a better idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0004-Rename-attribute-detoasting-functions.patch (18K)
0002-Create-an-API-for-inserting-and-deleting-rows-in-TOA.patch (41K)
0003-Allow-TOAST-tables-to-be-implemented-using-table-AMs.patch (50K)
0001-Split-tuptoaster.c-into-three-separate-files.patch (224K)

Re: tableam vs. TOAST

Andres Freund
Hi,

On 2019-08-01 12:23:42 -0400, Robert Haas wrote:
> Barring objections, I plan to commit the whole series of patches here
> (latest rebase attached).  They are not perfect and could likely be
> improved in various ways, but I think they are an improvement over
> what we have now, and it's not like it's set in stone once it's
> committed.  We can change it more if we come up with a better idea.

Could you wait until I either had a chance to look again, or until, say,
Monday if I don't get around to it?  I'll try to get to it earlier than
that.

Greetings,

Andres Freund



Re: tableam vs. TOAST

Robert Haas
On Thu, Aug 1, 2019 at 1:53 PM Andres Freund <[hidden email]> wrote:
> Could you wait until I either had a chance to look again, or until, say,
> Monday if I don't get around to it?  I'll try to get to it earlier than
> that.

Sure, no problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tableam vs. TOAST

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2019-08-01 12:23:42 -0400, Robert Haas wrote:

> Roughly, on both master and with the patches, the first one takes
> about 4.2 seconds, the second 7.5, and the third 1.2.  The third one
> is the fastest because it doesn't do any compression.  Since it does
> less irrelevant work than the other two cases, it has the best chance
> of showing up any performance regression that the patch has caused --
> if any regression existed, I suppose that it would be an increased
> per-toast-fetch or per-toast-chunk overhead. However, I can't
> reproduce any such regression.  My first attempt at testing that case
> showed the patch about 1% slower, but that wasn't reliably
> reproducible when I did it a bunch more times.  So as far as I can
> figure, this patch does not regress performance in any
> easily-measurable way.

Hm, those all include writing, right? And for read-only we don't expect
any additional overhead, correct? The write overhead is probably too
large to show a bit of function call overhead - but if so, it'd probably be
on unlogged tables? And with COPY, because that utilizes multi_insert,
which means more toasting in a shorter amount of time?


.oO(why does everyone attach attachments out of order? Is that
a gmail thing?)


> From a4c858c75793f0f8aff7914c572a6615ea5babf8 Mon Sep 17 00:00:00 2001
> From: Robert Haas <[hidden email]>
> Date: Mon, 8 Jul 2019 11:58:05 -0400
> Subject: [PATCH 1/4] Split tuptoaster.c into three separate files.
>
> detoast.c/h contain functions required to detoast a datum, partially
> or completely, plus a few other utility functions for examining the
> size of toasted datums.
>
> toast_internals.c/h contain functions that are used internally to the
> TOAST subsystem but which (mostly) do not need to be accessed from
> outside.
>
> heaptoast.c/h contains code that is intrinsically specific to the
> heap AM, either because it operates on HeapTuples or is based on the
> layout of a heap page.
>
> detoast.c and toast_internals.c are placed in
> src/backend/access/common rather than src/backend/access/heap.  At
> present, both files still have dependencies on the heap, but that will
> be improved in a future commit.

I wonder if toasting.c should be moved too?

If anybody doesn't know git's --color-moved, it makes patches like this
a lot easier to review.

> index 00000000000..582af147ea1
> --- /dev/null
> +++ b/src/include/access/detoast.h
> @@ -0,0 +1,92 @@
> +/*-------------------------------------------------------------------------
> + *
> + * detoast.h
> + *    Access to compressed and external varlena values.

Hm. Does it matter that that also includes stuff like expanded datums?

> + * Copyright (c) 2000-2019, PostgreSQL Global Development Group
> + *
> + * src/include/access/detoast.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +#ifndef DETOAST_H
> +#define DETOAST_H

trailing whitespace after "#ifndef DETOAST_H ".


> commit 60d51e6510c66f79c51e43fe22730fe050d87854
> Author: Robert Haas <[hidden email]>
> Date:   2019-07-08 12:02:16 -0400
>
>     Create an API for inserting and deleting rows in TOAST tables.
>
>     This moves much of the non-heap-specific logic from toast_delete and
>     toast_insert_or_update into helper functions accessible via a new
>     header, toast_helper.h.  Using the functions in this module, a table
>     AM can implement creation and deletion of TOAST table rows with
>     much less code duplication than was possible heretofore.  Some
>     table AMs won't want to use the TOAST logic at all, but for those
>     that do this will make that easier.
>
>     Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@...

Hm. This leaves toast_insert_or_update() as a name. That makes it sound
like it's generic toast code, rather than heap specific?

It's definitely nice how a lot of repetitive code has been deduplicated.
Also makes it easier to see how algorithmically expensive
toast_insert_or_update() is :(.


> /*
>  * Second we look for attributes of attstorage 'x' or 'e' that are still
>  * inline, and make them external.  But skip this if there's no toast
>  * table to push them to.
>  */
> while (heap_compute_data_size(tupleDesc,
>                               toast_values, toast_isnull) > maxDataLen &&
>        rel->rd_rel->reltoastrelid != InvalidOid)

Shouldn't this condition be the other way round?


> if (for_compression)
>     skip_colflags |= TOASTCOL_INCOMPRESSIBLE;
>
> for (i = 0; i < numAttrs; i++)
> {
>     Form_pg_attribute att = TupleDescAttr(tupleDesc, i);
>
>     if ((ttc->ttc_attr[i].tai_colflags & skip_colflags) != 0)
>         continue;
>     if (VARATT_IS_EXTERNAL(DatumGetPointer(ttc->ttc_values[i])))
>         continue;               /* can't happen, toast_action would be 'p' */
>     if (for_compression &&
>         VARATT_IS_COMPRESSED(DatumGetPointer(ttc->ttc_values[i])))
>         continue;
>     if (check_main && att->attstorage != 'm')
>         continue;
>     if (!check_main && att->attstorage != 'x' && att->attstorage != 'e')
>         continue;
>
>     if (ttc->ttc_attr[i].tai_size > biggest_size)
>     {
>         biggest_attno = i;
>         biggest_size = ttc->ttc_attr[i].tai_size;
>     }
> }

Couldn't most of these be handled via colflags, instead of having
numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED,
TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but the
size check ought to boil down to a single mask test?
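
I.e. something along these lines (a sketch; the TOASTCOL_* names other than
TOASTCOL_INCOMPRESSIBLE are made up here):

uint8       skip_colflags = TOASTCOL_IGNORED;   /* already processed */

if (for_compression)
    skip_colflags |= TOASTCOL_INCOMPRESSIBLE | TOASTCOL_COMPRESSED;
skip_colflags |= check_main ? TOASTCOL_NOT_MAIN : TOASTCOL_NOT_EXTENDED;

for (i = 0; i < numAttrs; i++)
{
    if ((ttc->ttc_attr[i].tai_colflags & skip_colflags) != 0)
        continue;               /* one mask test replaces the chain of ifs */
    if (ttc->ttc_attr[i].tai_size > biggest_size)
    {
        biggest_attno = i;
        biggest_size = ttc->ttc_attr[i].tai_size;
    }
}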


> extern void toast_tuple_init(ToastTupleContext *ttc);
> extern int toast_tuple_find_biggest_attribute(ToastTupleContext *ttc,
>                                               bool for_compression,
>                                               bool check_main);
> extern void toast_tuple_try_compression(ToastTupleContext *ttc, int attribute);
> extern void toast_tuple_externalize(ToastTupleContext *ttc, int attribute,
>                                     int options, int max_chunk_size);
> extern void toast_tuple_cleanup(ToastTupleContext *ttc, bool cleanup_toastrel);

I wonder if a better prefix wouldn't be toasting_...


> +/*
> + * Information about one column of a tuple being toasted.
> + *
> + * NOTE: toast_action[i] can have these values:
> + *      ' '     default handling
> + *      'p'     already processed --- don't touch it
> + *      'x'     incompressible, but OK to move off
> + *
> + * NOTE: toast_attr[i].tai_size is only made valid for varlena attributes with
> + * toast_action[i] different from 'p'.
> + */
> +typedef struct
> +{
> +    struct varlena *tai_oldexternal;
> +    int32       tai_size;
> +    uint8       tai_colflags;
> +} ToastAttrInfo;

I think that comment is outdated?

> +/*
> + * Flags indicating the overall state of a TOAST operation.
> + *
> + * TOAST_NEEDS_DELETE_OLD indicates that one or more old TOAST datums need
> + * to be deleted.
> + *
> + * TOAST_NEEDS_FREE indicates that one or more TOAST values need to be freed.
> + *
> + * TOAST_HAS_NULLS indicates that nulls were found in the tuple being toasted.
> + *
> + * TOAST_NEEDS_CHANGE indicates that a new tuple needs to be built; in other
> + * words, the toaster did something.
> + */
> +#define TOAST_NEEDS_DELETE_OLD              0x0001
> +#define TOAST_NEEDS_FREE                    0x0002
> +#define TOAST_HAS_NULLS                     0x0004
> +#define TOAST_NEEDS_CHANGE                  0x0008

I'd make these enums. They're more often accessible in a debugger...
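
E.g., the same bits as a debugger-friendly enum (sketch):

typedef enum ToastFlag
{
    TOAST_NEEDS_DELETE_OLD = 0x0001,
    TOAST_NEEDS_FREE = 0x0002,
    TOAST_HAS_NULLS = 0x0004,
    TOAST_NEEDS_CHANGE = 0x0008
} ToastFlag;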



> commit 9e4bd383a00e6bb96088666e57673b343049345c
> Author: Robert Haas <[hidden email]>
> Date:   2019-08-01 10:37:02 -0400
>
>     Allow TOAST tables to be implemented using table AMs other than heap.
>
>     toast_fetch_datum, toast_save_datum, and toast_delete_datum are
>     adjusted to use tableam rather than heap-specific functions.  This
>     might have some performance impact, but this patch attempts to
>     mitigate that by restructuring things so that we don't open and close
>     the toast table and indexes multiple times per tuple.
>
>     tableam now exposes an integer value (not a callback) for the
>     maximum TOAST chunk size, and has a new callback allowing table
>     AMs to specify the AM that should be used to implement the TOAST
>     table. Previously, the toast AM was always the same as the table AM.
>
>     Patch by me, tested by Prabhat Sahu.
>
>     Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@...

I'm quite unconvinced that making the chunk size specified by the AM is
a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in
pg_control etc. It seems a bit dangerous to let AMs provide the size,
without being very clear that any change of the value will make data
inaccessible. It'd be different if the max were only used during
toasting.

I think the *size* checks should be weakened so we check:
1) After each chunk, whether the already assembled chunks exceed the
   expected size.
2) After all chunks have been collected, check that the size is exactly
   what we expect.

I don't think that removes a meaningful amount of error
checking. Missing tuples etc get detected by the chunk_ids not being
consecutive. The overall size is still verified.
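
Roughly like this (a sketch only: fetch_next_chunk() stands in for the real
scan loop, and the error messages are placeholders):

struct varlena *chunk;
int32       ressize = toast_pointer.va_extsize;     /* expected total size */
int32       assembled = 0;

while ((chunk = fetch_next_chunk(&scan)) != NULL)
{
    int32       chunksize = VARSIZE_ANY_EXHDR(chunk);

    if (assembled + chunksize > ressize)            /* check 1: never overrun */
        elog(ERROR, "toast chunks exceed expected size %d", ressize);
    memcpy(VARDATA(result) + assembled, VARDATA_ANY(chunk), chunksize);
    assembled += chunksize;
}

if (assembled != ressize)                           /* check 2: exact total */
    elog(ERROR, "toast value was %d bytes, expected %d", assembled, ressize);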


The obvious problem with this is the slice fetching logic. For slices
with an offset of 0, it's obviously trivial to implement. For the higher
slice logic, I'd assume we'd have to fetch the first slice by estimating
where the start chunk is based on the currently suggested chunk size, and
restarting the scan earlier/later if not accurate.  In most cases it'll
be accurate, so we'd not lose efficiency.

Greetings,

Andres Freund



Re: tableam vs. TOAST

Robert Haas
On Fri, Aug 2, 2019 at 6:42 PM Andres Freund <[hidden email]> wrote:
> Hm, those all include writing, right? And for read-only we don't expect
> any additional overhead, correct? The write overhead is probably too
> large to show a bit of function call overhead - but if so, it'd probably be
> on unlogged tables? And with COPY, because that utilizes multi_insert,
> which means more toasting in a shorter amount of time?

Yes and yes. I guess we could test the unlogged case and with COPY,
but in any realistic case you're still looking for a tiny CPU overhead
in a sea of I/O costs.  Even if an extremely short COPY on an unlogged
table regresses slightly, we do not normally reject patches that
improve code quality on the grounds that they add function call
overhead in a few places.  Code like this is hard to optimize and
maintain; as you remarked yourself, there are multiple opportunities
to do this stuff better that are hard to see in the current structure.

> .oO(why does everyone attach attachments out of order? Is that
> a gmail thing?)

Must be.

> I wonder if toasting.c should be moved too?

I mean, we could, but I don't really see a reason.  It'd just be
moving it from one fairly-generic place to another, and I'd rather
minimize churn.

> trailing whitespace after "#ifndef DETOAST_H ".

Will fix.

> Hm. This leaves toast_insert_or_update() as a name. That makes it sound
> like it's generic toast code, rather than heap specific?

I'll rename it to heap_toast_insert_or_update().  But I think I'll put
that in 0004 with the other renames.

> It's definitely nice how a lot of repetitive code has been deduplicated.
> Also makes it easier to see how algorithmically expensive
> toast_insert_or_update() is :(.

Yep.

> Shouldn't this condition be the other way round?

I had to fight pretty hard to stop myself from tinkering with the
algorithm -- this can clearly be done better, but I wanted to make it
match the existing structure as far as possible. It also only needs to
be tested once, not on every loop iteration, so if we're going to
start changing things, we should go further than just swapping the
order of the tests.  For now I prefer to draw a line in the sand and
change nothing.

> Couldn't most of these be handled via colflags, instead of having
> numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED,
> TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but the
> size check ought to boil down to a single mask test?

I'm not really seeing how more flags would significantly simplify this
logic, but I might be missing something.

> I wonder if a better prefix wouldn't be toasting_...

I'm open to other votes, but I think toast_tuple is more specific
than toasting_ and thus better.

> > +/*
> > + * Information about one column of a tuple being toasted.
> > + *
> > + * NOTE: toast_action[i] can have these values:
> > + *      ' '     default handling
> > + *      'p'     already processed --- don't touch it
> > + *      'x'     incompressible, but OK to move off
> > + *
> > + * NOTE: toast_attr[i].tai_size is only made valid for varlena attributes with
> > + * toast_action[i] different from 'p'.
> > + */
> > +typedef struct
> > +{
> > +    struct varlena *tai_oldexternal;
> > +    int32       tai_size;
> > +    uint8       tai_colflags;
> > +} ToastAttrInfo;
>
> I think that comment is outdated?

Oops.  Will fix.

> > +/*
> > + * Flags indicating the overall state of a TOAST operation.
> > + *
> > + * TOAST_NEEDS_DELETE_OLD indicates that one or more old TOAST datums need
> > + * to be deleted.
> > + *
> > + * TOAST_NEEDS_FREE indicates that one or more TOAST values need to be freed.
> > + *
> > + * TOAST_HAS_NULLS indicates that nulls were found in the tuple being toasted.
> > + *
> > + * TOAST_NEEDS_CHANGE indicates that a new tuple needs to be built; in other
> > + * words, the toaster did something.
> > + */
> > +#define TOAST_NEEDS_DELETE_OLD              0x0001
> > +#define TOAST_NEEDS_FREE                    0x0002
> > +#define TOAST_HAS_NULLS                     0x0004
> > +#define TOAST_NEEDS_CHANGE                  0x0008
>
> I'd make these enums. They're more often accessible in a debugger...

Ugh, I hate that style.  Abusing enums to make flag bits makes my skin
crawl.  I always wondered what the appeal was -- I guess now I know.
Blech.

> I'm quite unconvinced that making the chunk size specified by the AM is
> a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in
> pg_control etc. It seems a bit dangerous to let AMs provide the size,
> without being very clear that any change of the value will make data
> inaccessible. It'd be different if the max were only used during
> toasting.

I was actually thinking about proposing that we rip
TOAST_MAX_CHUNK_SIZE out of pg_control.  No real effort has been made
here to make this something that users could configure, and I don't
know of a good reason to configure it. It also seems pretty out of
place in a world where there are multiple AMs floating around -- why
should heap, and only heap, be checked there? Granted it does have
some pride of place, but still.

> I think the *size* checks should be weakened so we check:
> 1) After each chunk, whether the already assembled chunks exceed the
>    expected size.
> 2) After all chunks have been collected, check that the size is exactly
>    what we expect.
>
> I don't think that removes a meaningful amount of error
> checking. Missing tuples etc get detected by the chunk_ids not being
> consecutive. The overall size is still verified.
>
> The obvious problem with this is the slice fetching logic. For slices
> with an offset of 0, it's obviously trivial to implement. For the higher
> slice logic, I'd assume we'd have to fetch the first slice by estimating
> where the start chunk is based on the currently suggested chunk size, and
> restarting the scan earlier/later if not accurate.  In most cases it'll
> be accurate, so we'd not lose efficiency.

I don't feel entirely convinced that there's any rush to do all of
this right now, and the more I change the harder it is to make sure
that I haven't broken anything.  How strongly do you feel about this
stuff?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tableam vs. TOAST

Tom Lane
Robert Haas <[hidden email]> writes:
> I don't feel entirely convinced that there's any rush to do all of
> this right now, and the more I change the harder it is to make sure
> that I haven't broken anything.  How strongly do you feel about this
> stuff?

FWIW, I agree with your comment further up that this patch ought to
just refactor, not change any algorithms.  The latter can be done
in separate patches.

                        regards, tom lane



Re: tableam vs. TOAST

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2019-08-05 15:36:59 -0400, Robert Haas wrote:
> On Fri, Aug 2, 2019 at 6:42 PM Andres Freund <[hidden email]> wrote:
> > Hm. This leaves toast_insert_or_update() as a name. That makes it sound
> > like it's generic toast code, rather than heap specific?
>
> I'll rename it to heap_toast_insert_or_update().  But I think I'll put
> that in 0004 with the other renames.

Makes sense.


> > It's definitely nice how a lot of repetitive code has been deduplicated.
> > Also makes it easier to see how algorithmically expensive
> > toast_insert_or_update() is :(.
>
> Yep.
>
> > Shouldn't this condition be the other way round?
>
> I had to fight pretty hard to stop myself from tinkering with the
> algorithm -- this can clearly be done better, but I wanted to make it
> match the existing structure as far as possible. It also only needs to
> be tested once, not on every loop iteration, so if we're going to
> start changing things, we should go further than just swapping the
> order of the tests.  For now I prefer to draw a line in the sand and
> change nothing.

Makes sense.


> > Couldn't most of these be handled via colflags, instead of having
> > numerous individual checks? I.e. if you had TOASTCOL_COMPRESSED,
> > TOASTCOL_IGNORED, TOASTCOL_MAIN, TOASTFLAG_EXTERNAL, etc, all but the
> > size check ought to boil down to a single mask test?
>
> I'm not really seeing how more flags would significantly simplify this
> logic, but I might be missing something.

Well, right now you have a number of ifs for each attribute. If you
encoded all the parameters into flags, you could change that to a single
flag test - as far as I can tell, all the parameters could be
represented as that, if you moved MAIN etc into flags.  A single if for
flags (and then the size check) is cheaper than several separate checks.


> > I'm quite unconvinced that making the chunk size specified by the AM is
> > a good idea to do in isolation. We have TOAST_MAX_CHUNK_SIZE in
> > pg_control etc. It seems a bit dangerous to let AMs provide the size,
> > without being very clear that any change of the value will make data
> > inaccessible. It'd be different if the max were only used during
> > toasting.
>
> I was actually thinking about proposing that we rip
> TOAST_MAX_CHUNK_SIZE out of pg_control.  No real effort has been made
> here to make this something that users could configure, and I don't
> know of a good reason to configure it. It also seems pretty out of
> place in a world where there are multiple AMs floating around -- why
> should heap, and only heap, be checked there? Granted it does have
> some pride of place, but still.

> > I think the *size* checks should be weakened so we check:
> > 1) After each chunk, whether the already assembled chunks exceed the
> >    expected size.
> > 2) After all chunks have been collected, check that the size is exactly
> >    what we expect.
> >
> > I don't think that removes a meaningful amount of error
> > checking. Missing tuples etc get detected by the chunk_ids not being
> > consecutive. The overall size is still verified.
> >
> > The obvious problem with this is the slice fetching logic. For slices
> > with an offset of 0, it's obviously trivial to implement. For the higher
> > slice logic, I'd assume we'd have to fetch the first slice by estimating
> > where the start chunk is based on the currently suggested chunk size, and
> > restarting the scan earlier/later if not accurate.  In most cases it'll
> > be accurate, so we'd not lose efficiency.
>
> I don't feel entirely convinced that there's any rush to do all of
> this right now, and the more I change the harder it is to make sure
> that I haven't broken anything.  How strongly do you feel about this
> stuff?

I think we either should leave the hardcoded limit in place, or make it
actually not fixed. Ripping-it-out-but-not-actually just seems like a
trap for the unwary, without much point.

Greetings,

Andres Freund



Re: tableam vs. TOAST

Alvaro Herrera
In reply to this post by Robert Haas
On 2019-Aug-05, Robert Haas wrote:

> > Shouldn't this condition be the other way round?
>
> I had to fight pretty hard to stop myself from tinkering with the
> algorithm -- this can clearly be done better, but I wanted to make it
> match the existing structure as far as possible. It also only needs to
> be tested once, not on every loop iteration, so if we're going to
> start changing things, we should go further than just swapping the
> order of the tests.  For now I prefer to draw a line in the sand and
> change nothing.

I agree, and can we move forward with this 0001?  The idea here is to
change no code (as also suggested by Tom elsewhere), and it's the
largest patch in this series by a mile.  I checked --color-moved=zebra
and I think the patch looks fine, and also it compiles fine.  I ran
src/tools/pginclude/headerscheck on it and found no complaints.

So here's a rebased version, where the DETOAST_H whitespace has been
removed.  No other changes from your original.  Will you please push
soon?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-Split-tuptoaster.c-into-three-separate-files.patch (169K)

Re: tableam vs. TOAST

Robert Haas
On Thu, Sep 5, 2019 at 10:52 AM Alvaro Herrera <[hidden email]> wrote:
> I agree, and can we move forward with this 0001?  The idea here is to
> change no code (as also suggested by Tom elsewhere), and it's the
> largest patch in this series by a mile.  I checked --color-moved=zebra
> and I think the patch looks fine, and also it compiles fine.  I ran
> src/tools/pginclude/headerscheck on it and found no complaints.
>
> So here's a rebased version, where the DETOAST_H whitespace has been
> removed.  No other changes from your original.  Will you please push
> soon?

Done, thanks. Here's the rest again with the additional rename added
to 0003 (formerly 0004). I think it's probably OK to go ahead with
that stuff, too, but I'll wait a bit to see if anyone wants to raise
more objections.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v5-0003-Rename-some-toasting-functions-based-on-whether-t.patch (23K)
v5-0002-Allow-TOAST-tables-to-be-implemented-using-table-.patch (50K)
v5-0001-Create-an-API-for-inserting-and-deleting-rows-in-.patch (41K)

Re: tableam vs. TOAST

Andres Freund
Hi,

On 2019-09-05 13:42:40 -0400, Robert Haas wrote:
> Done, thanks. Here's the rest again with the additional rename added
> to 0003 (formerly 0004). I think it's probably OK to go ahead with
> that stuff, too, but I'll wait a bit to see if anyone wants to raise
> more objections.

Well, I still dislike making the toast chunk size configurable in a
halfhearted manner.

Greetings,

Andres Freund



Re: tableam vs. TOAST

Robert Haas
On Thu, Sep 5, 2019 at 3:10 PM Andres Freund <[hidden email]> wrote:
> On 2019-09-05 13:42:40 -0400, Robert Haas wrote:
> > Done, thanks. Here's the rest again with the additional rename added
> > to 0003 (formerly 0004). I think it's probably OK to go ahead with
> > that stuff, too, but I'll wait a bit to see if anyone wants to raise
> > more objections.
>
> Well, I still dislike making the toast chunk size configurable in a
> halfhearted manner.

So, I'd be willing to look into that some more.  But how about if I
commit the next patch in the series first?  I think this comment is
really about the second patch in the series, "Allow TOAST tables to be
implemented using table AMs other than heap," and it's fair to point
out that, since that patch extends table AM, we're somewhat committed
to it once we put it in.  But "Create an API for inserting and
deleting rows in TOAST tables." is just refactoring, and I don't see
what we gain from waiting to commit that part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: tableam vs. TOAST

Tom Lane
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> Well, I still dislike making the toast chunk size configurable in a
> halfhearted manner.

It's hard to make it fully configurable without breaking our on-disk
storage, because of the lack of any explicit representation of the chunk
size in TOAST data.  You have to "just know" how big the chunks are
supposed to be.

However, it's reasonable to ask why we should treat it as an AM property,
especially a fixed AM property as this has it.  If somebody does
reimplement toast logic in some other AM, they might well decide it's
worth the storage cost to be more flexible about the chunk size ... but
too bad, this design won't let them do it.

I don't entirely understand why relation_toast_am is a callback
while toast_max_chunk_size isn't, either.  Why would they not both
be callbacks?  That would at least let an AM set a per-relation
max chunk size, if it wanted.
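
That is, inside TableAmRoutine one could have (the second member being the
hypothetical per-relation variant, not what the patch currently has):

Oid         (*relation_toast_am) (Relation rel);        /* as in the patch */
int32       (*toast_max_chunk_size) (Relation rel);     /* per-relation variant */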

It seems like this design throws away most of the benefit of a fixed
chunk size (mostly, being able to do relevant modulo arithmetic with
shifts and masks rather than full-fledged integer division) without
getting much of anything in return.
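
For instance, the slice-fetching arithmetic today is all constant-divisor
(variable names roughly as in toast_fetch_datum_slice), which the compiler can
strength-reduce; with a per-AM or per-relation value it becomes a genuine
integer division on every fetch:

int32       startchunk = sliceoffset / TOAST_MAX_CHUNK_SIZE;
int32       endchunk = (sliceoffset + slicelength - 1) / TOAST_MAX_CHUNK_SIZE;
int32       startoffset = sliceoffset % TOAST_MAX_CHUNK_SIZE;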

                        regards, tom lane

