Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

Turns out in portions of the regression tests a good chunk of the
runtime is inside AddNewAttributeTuples() and
recordMultipleDependencies()'s heap insertions. Looking at a few
profiles I had lying around I found that in some production cases
too. ISTM we should use heap_multi_insert() for both, as the source
tuples ought to be around reasonably comfortably.

For recordMultipleDependencies() it'd obviously be better if we collected
all dependencies for new objects, rather than doing so separately. Right
now e.g. the code for a new table looks like:

                recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);

                recordDependencyOnOwner(RelationRelationId, relid, ownerid);

                recordDependencyOnNewAcl(RelationRelationId, relid, 0, ownerid, relacl);

                recordDependencyOnCurrentExtension(&myself, false);

                if (reloftypeid)
                {
                        referenced.classId = TypeRelationId;
                        referenced.objectId = reloftypeid;
                        referenced.objectSubId = 0;
                        recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
                }

and it'd obviously be more efficient to do that once if we went to using
heap_multi_insert() in the dependency code. But I suspect even if we
just used an extended API in AddNewAttributeTuples() (for the type /
collation dependencies), it'd be a win.
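
To make that concrete, a rough sketch (not actual code from any patch) of
what the collected form could look like, reusing the existing
ObjectAddresses machinery; the hypothetical part is that the final
recording call could then do a single heap_multi_insert() internally:

    ObjectAddresses *refs = new_object_addresses();

    /* collect the pg_depend references instead of inserting one by one */
    add_exact_object_address(&referenced, refs);
    if (reloftypeid)
    {
        ObjectAddress reloftype;

        ObjectAddressSet(reloftype, TypeRelationId, reloftypeid);
        add_exact_object_address(&reloftype, refs);
    }
    /* ... the extension dependency and friends would be collected too ... */

    /* one call, so the dependency code can batch the insertions */
    record_object_address_dependencies(&myself, refs, DEPENDENCY_NORMAL);
    free_object_addresses(refs);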

I'm not planning to work on this soon, but I thought it'd be worthwhile
to put this out there (even if potentially just as a note to myself).

Greetings,

Andres Freund


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 13 Feb 2019, at 19:27, Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> Turns out in portions of the regression tests a good chunk of the
> runtime is inside AddNewAttributeTuples() and
> recordMultipleDependencies()'s heap insertions. Looking at a few
> profiles I had lying around I found that in some production cases
> too. ISTM we should use heap_multi_insert() for both, as the source
> tuples ought to be around reasonably comfortably.
>
> For recordMultipleDependencies() it'd obviously be better if we collected
> all dependencies for new objects, rather than doing so separately. Right
> now e.g. the code for a new table looks like:
>
> recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
>
> recordDependencyOnOwner(RelationRelationId, relid, ownerid);
>
> recordDependencyOnNewAcl(RelationRelationId, relid, 0, ownerid, relacl);
>
> recordDependencyOnCurrentExtension(&myself, false);
>
> if (reloftypeid)
> {
> referenced.classId = TypeRelationId;
> referenced.objectId = reloftypeid;
> referenced.objectSubId = 0;
> recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
> }
>
> and it'd obviously be more efficient to do that once if we went to using
> heap_multi_insert() in the dependency code. But I suspect even if we
> just used an extended API in AddNewAttributeTuples() (for the type /
> collation dependencies), it'd be a win.
When a colleague was looking at heap_multi_insert in the COPY codepath I
remembered this and took a stab at a WIP patch inspired by this email, while
not following it to the letter.  It’s not going the full route of collecting
all the dependencies for creating a table, but adding ways to perform
heap_multi_insert in the existing codepaths as it seemed like a good place to
start.

It introduces a new function CatalogMultiInsertWithInfo which takes a set of
slots for use in heap_multi_insert, used from recordMultipleDependencies and
InsertPgAttributeTuples (which replaces calling InsertPgAttributeTuple
repeatedly).  The code is still a WIP with some kludges, following the show-
early philosophy.
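
For reference, the shape of the two new entry points, paraphrased from the
WIP patch (names and signatures are not final):

    /* builds pg_attribute rows for natts attributes and multi-inserts them */
    extern void InsertPgAttributeTuples(Relation pg_attribute_rel,
                                        FormData_pg_attribute *new_attributes,
                                        int natts,
                                        CatalogIndexState indstate);

    /* inserts the tuples in slot[] with one heap_multi_insert() call and
     * updates the catalog indexes */
    extern void CatalogMultiInsertWithInfo(Relation heapRel,
                                           TupleTableSlot **slot,
                                           int ntuples,
                                           CatalogIndexState indstate);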

It passes make check and some light profiling around regress suites indicates
that it does improve a bit by reducing the somewhat costly calls.  Is this
along the lines of what you were thinking or way off?

cheers ./daniel


Attachment: catalog_multi_insert.patch (26K)

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:

> > On 13 Feb 2019, at 19:27, Andres Freund <[hidden email]> wrote:
> >
> > Hi,
> >
> > Turns out in portions of the regression tests a good chunk of the
> > runtime is inside AddNewAttributeTuples() and
> > recordMultipleDependencies()'s heap insertions. Looking at a few
> > profiles I had lying around I found that in some production cases
> > too. ISTM we should use heap_multi_insert() for both, as the source
> > tuples ought to be around reasonably comfortably.
> >
> > For recordMultipleDependencies() it'd obviously be better if we collected
> > all dependencies for new objects, rather than doing so separately. Right
> > now e.g. the code for a new table looks like:
> >
> > recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
> >
> > recordDependencyOnOwner(RelationRelationId, relid, ownerid);
> >
> > recordDependencyOnNewAcl(RelationRelationId, relid, 0, ownerid, relacl);
> >
> > recordDependencyOnCurrentExtension(&myself, false);
> >
> > if (reloftypeid)
> > {
> > referenced.classId = TypeRelationId;
> > referenced.objectId = reloftypeid;
> > referenced.objectSubId = 0;
> > recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
> > }
> >
> > and it'd obviously be more efficient to do that once if we went to using
> > heap_multi_insert() in the dependency code. But I suspect even if we
> > just used an extended API in AddNewAttributeTuples() (for the type /
> > collation dependencies), it'd be a win.
>
> When a colleague was looking at heap_multi_insert in the COPY codepath I
> remembered this and took a stab at a WIP patch inspired by this email, while
> not following it to the letter.  It’s not going the full route of collecting
> all the dependencies for creating a table, but adding ways to perform
> heap_multi_insert in the existing codepaths as it seemed like a good place to
> start.

Cool. I don't quite have the energy to look at this right now, could you
create a CF entry for this?

Greetings,

Andres Freund



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 23 May 2019, at 03:46, Andres Freund <[hidden email]> wrote:
> On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:

>> When a colleague was looking at heap_multi_insert in the COPY codepath I
>> remembered this and took a stab at a WIP patch inspired by this email, while
>> not following it to the letter.  It’s not going the full route of collecting
>> all the dependencies for creating a table, but adding ways to perform
>> heap_multi_insert in the existing codepaths as it seemed like a good place to
>> start.
>
> Cool. I don't quite have the energy to look at this right now, could you
> create a CF entry for this?

Of course, done.

cheers ./daniel


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
In reply to this post by Daniel Gustafsson
Hi,

On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:
> It passes make check and some light profiling around regress suites indicates
> that it does improve a bit by reducing the somewhat costly calls.

Just for the record, here is the profile I did:

perf record --call-graph lbr make -s check-world -Otarget -j16 -s
perf report

Greetings,

Andres Freund



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
In reply to this post by Andres Freund
> On 23 May 2019, at 03:46, Andres Freund <[hidden email]> wrote:
> On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:

>> When a colleague was looking at heap_multi_insert in the COPY codepath I
>> remembered this and took a stab at a WIP patch inspired by this email, while
>> not following it to the letter.  It’s not going the full route of collecting
>> all the dependencies for creating a table, but adding ways to perform
>> heap_multi_insert in the existing codepaths as it seemed like a good place to
>> start.
>
> Cool. I don't quite have the energy to look at this right now, could you
> create a CF entry for this?

Attached is an updated version with some of the stuff we briefly discussed at
PGCon.  This version uses the ObjectAddresses API already established to collect
the dependencies, and performs a few more multi-inserts.  Profiling shows that
we are spending less time in catalog insertions, but whether it’s enough to
warrant the added complexity is up for debate.

The patch is still rough around the edges (TODO’s left to mark some areas), but
I prefer to get some feedback early rather than working too far in potentially
the wrong direction, so parking this in the CF for now.

cheers ./daniel


Attachment: catalog_multi_insert-v2.patch (60K)

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

On 2019-06-11 15:20:42 +0200, Daniel Gustafsson wrote:
> Attached is an updated version with some of the stuff we briefly discussed at
> PGCon.  This version uses the ObjectAddresses API already established to collect
> the dependencies, and performs a few more multi-inserts.

Cool.

> Profiling shows that
> we are spending less time in catalog insertions, but whether it’s enough to
> warrant the added complexity is up for debate.

Probably worth benchmarking e.g. temp table creation speed in
isolation. People do complain about that occasionally.

Greetings,

Andres Freund



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Thomas Munro
In reply to this post by Daniel Gustafsson
On Wed, Jun 12, 2019 at 1:21 AM Daniel Gustafsson <[hidden email]> wrote:
> The patch is still rough around the edges (TODO’s left to mark some areas), but
> I prefer to get some feedback early rather than working too far in potentially
> the wrong direction, so parking this in the CF for now.

Hi Daniel,

Given the above disclaimers the following may be entirely expected,
but just in case you weren't aware:
t/010_logical_decoding_timelines.pl fails with this patch applied.

https://travis-ci.org/postgresql-cfbot/postgresql/builds/555205042

--
Thomas Munro
https://enterprisedb.com



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 8 Jul 2019, at 00:02, Thomas Munro <[hidden email]> wrote:

>
> On Wed, Jun 12, 2019 at 1:21 AM Daniel Gustafsson <[hidden email]> wrote:
>> The patch is still rough around the edges (TODO’s left to mark some areas), but
>> I prefer to get some feedback early rather than working too far in potentially
>> the wrong direction, so parking this in the CF for now.
>
> Hi Daniel,
>
> Given the above disclaimers the following may be entirely expected,
> but just in case you weren't aware:
> t/010_logical_decoding_timelines.pl fails with this patch applied.
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/555205042
I hadn’t seen it, since I had fat-fingered and accidentally run the full tests
in a tree without assertions.  The culprit here seems to be an assertion in the
logical decoding code which doesn’t account for heap_multi_insert into catalog
relations (there are none today; this patch introduces them and thus trips the
assertion).  As the issue is somewhat unrelated, I’ve opened a separate
thread with a small patch:

        https://postgr.es/m/CBFFD532-C033-49EB-9A5A-F67EAEE9EB0B@...

The attached v3 also has that fix in order to see if the cfbot is happier with
this.

cheers ./daniel


Attachment: catalog_multi_insert-v3.patch (61K)

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Thomas Munro
On Tue, Jul 9, 2019 at 11:07 PM Daniel Gustafsson <[hidden email]> wrote:
> The attached v3 also has that fix in order to see if the cfbot is happier with
> this.

Noticed while moving this to the next CF:

heap.c: In function ‘StorePartitionKey’:
heap.c:3582:3: error: ‘referenced’ undeclared (first use in this function)
   referenced.classId = RelationRelationId;
   ^

--
Thomas Munro
https://enterprisedb.com



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 2 Aug 2019, at 00:41, Thomas Munro <[hidden email]> wrote:

>
> On Tue, Jul 9, 2019 at 11:07 PM Daniel Gustafsson <[hidden email]> wrote:
>> The attached v3 also has that fix in order to see if the cfbot is happier with
>> this.
>
> Noticed while moving this to the next CF:
>
> heap.c: In function ‘StorePartitionKey’:
> heap.c:3582:3: error: ‘referenced’ undeclared (first use in this function)
>    referenced.classId = RelationRelationId;
>    ^
Thanks, the attached v4 updates the patch to handle a0555ddab9b672a046 as well.

cheers ./daniel


Attachment: catalog_multi_insert-v4.patch (61K)

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Michael Paquier
On Mon, Aug 05, 2019 at 09:20:07AM +0200, Daniel Gustafsson wrote:
> Thanks, the attached v4 updates the patch to handle a0555ddab9b672a046 as well.

+   if (referenced->numrefs == 1)
+       recordDependencyOn(depender, &referenced->refs[0], behavior);
+   else
+       recordMultipleDependencies(depender,
+                                  referenced->refs, referenced->numrefs,
+                                  behavior);
This makes me wonder if we should not just add a shortcut in
recordMultipleDependencies() to use recordDependencyOn if there is
only one reference in the set.  That would save the effort of a multi
insert for all callers of recordMultipleDependencies() this way,
including the future ones.  And that could also be done independently
of the addition of InsertPgAttributeTuples(), no?
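
Something along these lines inside recordMultipleDependencies(), as a
sketch (assuming the single-row path is inlined there rather than bounced
back through recordDependencyOn()):

    if (nreferenced == 1)
    {
        HeapTuple   tup;

        /* values[]/nulls[] filled for the one reference as usual ... */

        /* a single dependency doesn't need the multi-insert slot setup */
        tup = heap_form_tuple(RelationGetDescr(dependDesc), values, nulls);
        CatalogTupleInsert(dependDesc, tup);
        heap_freetuple(tup);
        return;
    }

    /* ... otherwise set up the slots and heap_multi_insert() as before ... */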
--
Michael


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Michael Paquier
On Mon, Aug 05, 2019 at 04:45:59PM +0900, Michael Paquier wrote:

> +   if (referenced->numrefs == 1)
> +       recordDependencyOn(depender, &referenced->refs[0], behavior);
> +   else
> +       recordMultipleDependencies(depender,
> +                                  referenced->refs, referenced->numrefs,
> +                                  behavior);
> This makes me wonder if we should not just add a shortcut in
> recordMultipleDependencies() to use recordDependencyOn if there is
> only one reference in the set.  That would save the effort of a multi
> insert for all callers of recordMultipleDependencies() this way,
> including the future ones.  And that could also be done independently
> of the addition of InsertPgAttributeTuples(), no?
A comment in heap_multi_insert() needs to be updated, since with your
patch this becomes the case:
     /*
      * We don't use heap_multi_insert for catalog tuples yet, but
      * better be prepared...
      */

There is also this one in DecodeMultiInsert()
      * CONTAINS_NEW_TUPLE will always be set currently as multi_insert
      * isn't used for catalogs, but better be future proof.

(I am going to comment on the assertion issue on the other thread, I
got some suggestions about it.)
--
Michael


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Michael Paquier
On Tue, Aug 06, 2019 at 11:24:46AM +0900, Michael Paquier wrote:

> A comment in heap_multi_insert() needs to be updated, since with your
> patch this becomes the case:
>      /*
>       * We don't use heap_multi_insert for catalog tuples yet, but
>       * better be prepared...
>       */
>
> There is also this one in DecodeMultiInsert()
>       * CONTAINS_NEW_TUPLE will always be set currently as multi_insert
>       * isn't used for catalogs, but better be future proof.
Applying the latest patch, this results in an assertion failure for
the tests of test_decoding.

> (I am going to comment on the assertion issue on the other thread, I
> got some suggestions about it.)

This part has resulted in 75c1921, and we could just change
DecodeMultiInsert() so as if there is no tuple data then we'd just
leave.  However, I don't feel completely comfortable with that either
as it would be nice to still check that for normal relations we
*always* have a FPW available.

Daniel, your thoughts?  I am switching the patch as waiting on
author.
--
Michael


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 11 Nov 2019, at 09:32, Michael Paquier <[hidden email]> wrote:

> This part has resulted in 75c1921, and we could just change
> DecodeMultiInsert() so as if there is no tuple data then we'd just
> leave.  However, I don't feel completely comfortable with that either
> as it would be nice to still check that for normal relations we
> *always* have a FPW available.

XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
IIUC (that is, non logically decoded relations), so it seems to me that we can
have a fastpath out of DecodeMultiInsert() which inspects that flag without
problems. Is this proposal along the lines of what you were thinking?

The patch is now generating an error in the regression tests as well, on your
recently added CREATE INDEX test from 68ac9cf2499236996f3d4bf31f7f16d5bd3c77af.
By using the ObjectAddresses API the dependencies are deduped before being recorded,
removing the duplicate entry from that test output.  AFAICT there is nothing
benefiting us from having duplicate dependencies, so this seems an improvement
(albeit tiny and not very important), or am I missing something?  Is there a
use for duplicate dependencies?

Attached version contains the above two fixes, as well as a multi_insert for
dependencies in CREATE EXTENSION which I had forgotten to git add in previous
versions.

cheers ./daniel


Attachment: catalog_multi_insert-v5.patch (72K)

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:

> > On 11 Nov 2019, at 09:32, Michael Paquier <[hidden email]> wrote:
>
> > This part has resulted in 75c1921, and we could just change
> > DecodeMultiInsert() so as if there is no tuple data then we'd just
> > leave.  However, I don't feel completely comfortable with that either
> > as it would be nice to still check that for normal relations we
> > *always* have a FPW available.
>
> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
> IIUC (that is, non logically decoded relations), so it seems to me that we can
> have a fastpath out of DecodeMultiInsert() which inspects that flag without
> problems. Is this proposal along the lines of what you were thinking?

Maybe I'm missing something, but it's the opposite, no?

        bool need_tuple_data = RelationIsLogicallyLogged(relation);

...
                        if (need_tuple_data)
                                xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;

or am I misunderstanding what you mean?


> @@ -1600,10 +1598,16 @@ recordDependencyOnExpr(const ObjectAddress *depender,
>   /* Remove any duplicates */
>   eliminate_duplicate_dependencies(context.addrs);
>  
> - /* And record 'em */
> - recordMultipleDependencies(depender,
> -   context.addrs->refs, context.addrs->numrefs,
> -   behavior);
> + /*
> + * And record 'em. If we know we only have a single dependency then
> + * avoid the extra cost of setting up a multi insert.
> + */
> + if (context.addrs->numrefs == 1)
> + recordDependencyOn(depender, &context.addrs->refs[0], behavior);
> + else
> + recordMultipleDependencies(depender,
> +   context.addrs->refs, context.addrs->numrefs,
> +   behavior);

I'm not sure this is actually a worthwhile complexity to keep. Hard to
believe that setting up a multi-insert is going to have a significant
enough overhead to matter here?

And if it does, is there a chance we can hide this repeated block
somewhere within recordMultipleDependencies() or such? I don't like the
repetitiveness here. Seems recordMultipleDependencies() could easily
optimize the case of there being exactly one dependency to insert?


> +/*
> + * InsertPgAttributeTuples
> + * Construct and insert multiple tuples in pg_attribute.
> + *
> + * This is a variant of InsertPgAttributeTuple() which dynamically allocates
> + * space for multiple tuples. Having two so similar functions is a kludge, but
> + * for now it's a TODO to make it less terrible.
> + */
> +void
> +InsertPgAttributeTuples(Relation pg_attribute_rel,
> + FormData_pg_attribute *new_attributes,
> + int natts,
> + CatalogIndexState indstate)
> +{
> + Datum values[Natts_pg_attribute];
> + bool nulls[Natts_pg_attribute];
> + HeapTuple tup;
> + int i;
> + TupleTableSlot **slot;
> +
> + /*
> + * The slots are dropped in CatalogMultiInsertWithInfo(). TODO: natts
> + * number of slots is not a reasonable assumption as a wide relation
> + * would be detrimental, figuring a good number is left as a TODO.
> + */
> + slot = palloc(sizeof(TupleTableSlot *) * natts);

Hm. Looking at

SELECT avg(pg_column_size(pa)) FROM pg_attribute pa;

yielding ~144 bytes, we can probably cap this at 128 or such, without
losing efficiency. Or just use
#define MAX_BUFFERED_BYTES 65535
from copy.c or such (MAX_BUFFERED_BYTES / sizeof(FormData_pg_attribute)).
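
I.e. something like this when sizing the slot array (just a sketch, with
MAX_BUFFERED_BYTES borrowed from copy.c as suggested):

    /* cap the slot array instead of allocating one slot per attribute */
    int         nslots = Min(natts,
                             MAX_BUFFERED_BYTES / sizeof(FormData_pg_attribute));

    slot = palloc(sizeof(TupleTableSlot *) * nslots);

    /* ... and flush with a multi-insert whenever nslots tuples have been
     * buffered, rather than only once at the end ... */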


> + /* This is a tad tedious, but way cleaner than what we used to do... */

I don't like comments that refer to "what we used to" because there's no
way for anybody to make sense of that, because it's basically a dangling
reference :)


> + memset(values, 0, sizeof(values));
> + memset(nulls, false, sizeof(nulls));

> + /* start out with empty permissions and empty options */
> + nulls[Anum_pg_attribute_attacl - 1] = true;
> + nulls[Anum_pg_attribute_attoptions - 1] = true;
> + nulls[Anum_pg_attribute_attfdwoptions - 1] = true;
> + nulls[Anum_pg_attribute_attmissingval - 1] = true;
> +
> + /* attcacheoff is always -1 in storage */
> + values[Anum_pg_attribute_attcacheoff - 1] = Int32GetDatum(-1);
> +
> + for (i = 0; i < natts; i++)
> + {
> + values[Anum_pg_attribute_attrelid - 1] = ObjectIdGetDatum(new_attributes[i].attrelid);
> + values[Anum_pg_attribute_attname - 1] = NameGetDatum(&new_attributes[i].attname);
> + values[Anum_pg_attribute_atttypid - 1] = ObjectIdGetDatum(new_attributes[i].atttypid);
> + values[Anum_pg_attribute_attstattarget - 1] = Int32GetDatum(new_attributes[i].attstattarget);
> + values[Anum_pg_attribute_attlen - 1] = Int16GetDatum(new_attributes[i].attlen);
> + values[Anum_pg_attribute_attnum - 1] = Int16GetDatum(new_attributes[i].attnum);
> + values[Anum_pg_attribute_attndims - 1] = Int32GetDatum(new_attributes[i].attndims);
> + values[Anum_pg_attribute_atttypmod - 1] = Int32GetDatum(new_attributes[i].atttypmod);
> + values[Anum_pg_attribute_attbyval - 1] = BoolGetDatum(new_attributes[i].attbyval);
> + values[Anum_pg_attribute_attstorage - 1] = CharGetDatum(new_attributes[i].attstorage);
> + values[Anum_pg_attribute_attalign - 1] = CharGetDatum(new_attributes[i].attalign);
> + values[Anum_pg_attribute_attnotnull - 1] = BoolGetDatum(new_attributes[i].attnotnull);
> + values[Anum_pg_attribute_atthasdef - 1] = BoolGetDatum(new_attributes[i].atthasdef);
> + values[Anum_pg_attribute_atthasmissing - 1] = BoolGetDatum(new_attributes[i].atthasmissing);
> + values[Anum_pg_attribute_attidentity - 1] = CharGetDatum(new_attributes[i].attidentity);
> + values[Anum_pg_attribute_attgenerated - 1] = CharGetDatum(new_attributes[i].attgenerated);
> + values[Anum_pg_attribute_attisdropped - 1] = BoolGetDatum(new_attributes[i].attisdropped);
> + values[Anum_pg_attribute_attislocal - 1] = BoolGetDatum(new_attributes[i].attislocal);
> + values[Anum_pg_attribute_attinhcount - 1] = Int32GetDatum(new_attributes[i].attinhcount);
> + values[Anum_pg_attribute_attcollation - 1] = ObjectIdGetDatum(new_attributes[i].attcollation);
> +
> + slot[i] = MakeSingleTupleTableSlot(RelationGetDescr(pg_attribute_rel),
> +   &TTSOpsHeapTuple);
> + tup = heap_form_tuple(RelationGetDescr(pg_attribute_rel), values, nulls);
> + ExecStoreHeapTuple(heap_copytuple(tup), slot[i], false);

This seems likely to waste some effort - during insertion the slot will
be materialized, which'll copy the tuple. I think it'd be better to
construct the tuple directly inside the slot's tts_values/isnull, and
then store it with ExecStoreVirtualTuple().

Also, why aren't you actually just specifying shouldFree = true? We'd
want the result of the heap_form_tuple() to be freed eagerly, no? Nor do
I get why you're *also* heap_copytuple'ing the tuple, despite having
just locally formed it, and not referencing the result of
heap_form_tuple() further?  Obviously that's all unimportant if you
change the code to use ExecStoreVirtualTuple()?
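
I.e. roughly this shape instead of heap_form_tuple() + heap_copytuple()
(a sketch of the virtual-tuple pattern; values[]/nulls[] are the local
arrays already built above):

    slot[i] = MakeSingleTupleTableSlot(RelationGetDescr(pg_attribute_rel),
                                       &TTSOpsVirtual);

    ExecClearTuple(slot[i]);
    /* fill the slot's arrays directly, no intermediate HeapTuple to copy */
    memcpy(slot[i]->tts_values, values, sizeof(values));
    memcpy(slot[i]->tts_isnull, nulls, sizeof(nulls));
    ExecStoreVirtualTuple(slot[i]);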


> + }
> +
> + /* finally insert the new tuples, update the indexes, and clean up */
> + CatalogMultiInsertWithInfo(pg_attribute_rel, slot, natts, indstate);

Previous comment:

I think it might be worthwhile to clear and delete the slots
after inserting. That's potentially a good bit of memory, no?

Current comment:

I think I quite dislike the API of CatalogMultiInsertWithInfo freeing
the slots. It'd e.g. make it impossible to reuse them to insert more
data. It's also really hard to understand.


> +}
> +
>  /*
>   * InsertPgAttributeTuple
>   * Construct and insert a new tuple in pg_attribute.
> @@ -754,7 +827,7 @@ AddNewAttributeTuples(Oid new_rel_oid,
>    TupleDesc tupdesc,
>    char relkind)
>  {
> - Form_pg_attribute attr;
> + Form_pg_attribute *attrs;
>   int i;
>   Relation rel;
>   CatalogIndexState indstate;
> @@ -769,35 +842,42 @@ AddNewAttributeTuples(Oid new_rel_oid,
>  
>   indstate = CatalogOpenIndexes(rel);
>  
> + attrs = palloc(sizeof(Form_pg_attribute) * natts);

Hm. Why do we need this separate allocation? Isn't this exactly the same
as what's in the tupledesc?


> +/*
> + * CatalogMultiInsertWithInfo

Hm. The current function is called CatalogTupleInsert(), wouldn't
CatalogTuplesInsertWithInfo() or such be more accurate? Or
CatalogTuplesMultiInsertWithInfo()?



> @@ -81,7 +128,14 @@ recordMultipleDependencies(const ObjectAddress *depender,
>  
>   memset(nulls, false, sizeof(nulls));
>  
> - for (i = 0; i < nreferenced; i++, referenced++)
> + values[Anum_pg_depend_classid - 1] = ObjectIdGetDatum(depender->classId);
> + values[Anum_pg_depend_objid - 1] = ObjectIdGetDatum(depender->objectId);
> + values[Anum_pg_depend_objsubid - 1] = Int32GetDatum(depender->objectSubId);
> +
> + /* TODO is nreferenced a reasonable allocation of slots? */
> + slot = palloc(sizeof(TupleTableSlot *) * nreferenced);
> +
> + for (i = 0, ntuples = 0; i < nreferenced; i++, referenced++)
>   {
>   /*
>   * If the referenced object is pinned by the system, there's no real
> @@ -94,30 +148,34 @@ recordMultipleDependencies(const ObjectAddress *depender,
>   * Record the Dependency.  Note we don't bother to check for
>   * duplicate dependencies; there's no harm in them.
>   */
> - values[Anum_pg_depend_classid - 1] = ObjectIdGetDatum(depender->classId);
> - values[Anum_pg_depend_objid - 1] = ObjectIdGetDatum(depender->objectId);
> - values[Anum_pg_depend_objsubid - 1] = Int32GetDatum(depender->objectSubId);
> -
>   values[Anum_pg_depend_refclassid - 1] = ObjectIdGetDatum(referenced->classId);
>   values[Anum_pg_depend_refobjid - 1] = ObjectIdGetDatum(referenced->objectId);
>   values[Anum_pg_depend_refobjsubid - 1] = Int32GetDatum(referenced->objectSubId);
>  
>   values[Anum_pg_depend_deptype - 1] = CharGetDatum((char) behavior);
>  
> - tup = heap_form_tuple(dependDesc->rd_att, values, nulls);
> + slot[ntuples] = MakeSingleTupleTableSlot(RelationGetDescr(dependDesc),
> + &TTSOpsHeapTuple);
> +
> + tuple = heap_form_tuple(dependDesc->rd_att, values, nulls);
> + ExecStoreHeapTuple(heap_copytuple(tuple), slot[ntuples], false);
> + ntuples++;

Same concern as in the equivalent pg_attribute code.


> + ntuples = 0;
>   while (HeapTupleIsValid(tup = systable_getnext(scan)))
>   {
> - HeapTuple newtup;
> -
> - newtup = heap_modify_tuple(tup, sdepDesc, values, nulls, replace);
> - CatalogTupleInsertWithInfo(sdepRel, newtup, indstate);
> + slot[ntuples] = MakeSingleTupleTableSlot(RelationGetDescr(sdepRel),
> + &TTSOpsHeapTuple);
> + newtuple = heap_modify_tuple(tup, sdepDesc, values, nulls, replace);
> + ExecStoreHeapTuple(heap_copytuple(newtuple), slot[ntuples], false);
> + ntuples++;


> - heap_freetuple(newtup);
> + if (ntuples == DEPEND_TUPLE_BUF)
> + {
> + CatalogMultiInsertWithInfo(sdepRel, slot, ntuples, indstate);
> + ntuples = 0;
> + }
>   }

Too much copying again.


Greetings,

Andres Freund



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Michael Paquier
On Tue, Nov 12, 2019 at 10:33:16AM -0800, Andres Freund wrote:

> On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:
>> On 11 Nov 2019, at 09:32, Michael Paquier <[hidden email]> wrote:
>>
>>> This part has resulted in 75c1921, and we could just change
>>> DecodeMultiInsert() so as if there is no tuple data then we'd just
>>> leave.  However, I don't feel completely comfortable with that either
>>> as it would be nice to still check that for normal relations we
>>> *always* have a FPW available.
>>
>> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
>> IIUC (that is, non logically decoded relations), so it seems to me that we can
>> have a fastpath out of DecodeMultiInsert() which inspects that flag without
>> problems. Is this proposal along the lines of what you were thinking?
>
> Maybe I'm missing something, but it's the opposite, no?

> bool need_tuple_data = RelationIsLogicallyLogged(relation);

Yep.  This checks IsCatalogRelation() internally.

> ...
> if (need_tuple_data)
> xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
>
> or am I misunderstanding what you mean?

Not sure that I can think of a good way to properly track whether the
new tuple data is associated with a catalog or not, i.e. whether the
data is expected to always be present, to get a good sanity check when
doing the multi-insert decoding.  We could extend xl_multi_insert_tuple
with a flag to track that, but that seems like overkill for a simple
sanity check.
--
Michael


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

On 2019-11-13 15:58:28 +0900, Michael Paquier wrote:

> On Tue, Nov 12, 2019 at 10:33:16AM -0800, Andres Freund wrote:
> > On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:
> >> On 11 Nov 2019, at 09:32, Michael Paquier <[hidden email]> wrote:
> >>
> >>> This part has resulted in 75c1921, and we could just change
> >>> DecodeMultiInsert() so as if there is no tuple data then we'd just
> >>> leave.  However, I don't feel completely comfortable with that either
> >>> as it would be nice to still check that for normal relations we
> >>> *always* have a FPW available.
> >>
> >> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
> >> IIUC (that is, non logically decoded relations), so it seems to me that we can
> >> have a fastpath out of DecodeMultiInsert() which inspects that flag without
> >> problems. Is this proposal along the lines of what you were thinking?
> >
> > Maybe I'm missing something, but it's the opposite, no?
>
> > bool need_tuple_data = RelationIsLogicallyLogged(relation);
>
> Yep.  This checks after IsCatalogRelation().
>
> > ...
> > if (need_tuple_data)
> > xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
> >
> > or am I misunderstanding what you mean?
>
> Not sure that I can think about a good way to properly track if the
> new tuple data is associated to a catalog or not, aka if the data is
> expected all the time or not to get a good sanity check when doing the
> multi-insert decoding.  We could extend xl_multi_insert_tuple with a
> flag to track that, but that seems like an overkill for a simple
> sanity check..

My point was that I think there must be negation missing in Daniel's
statement - XLH_INSERT_CONTAINS_NEW_TUPLE will only be set if *not* a
catalog relation. But Daniel's statement says exactly the opposite, at
least by my read.

I can't follow what you're trying to get at in this sub discussion - why
would we want to sanity check something about catalog tables here? Given
that XLH_INSERT_CONTAINS_NEW_TUPLE is set exactly when the table is not
a catalog table, that seems entirely superfluous?

Greetings,

Andres Freund



Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Michael Paquier
On Wed, Nov 13, 2019 at 05:55:12PM -0800, Andres Freund wrote:
> My point was that I think there must be negation missing in Daniel's
> statement - XLH_INSERT_CONTAINS_NEW_TUPLE will only be set if *not* a
> catalog relation. But Daniel's statement says exactly the opposite, at
> least by my read.

FWIW, I am reading it the same way, i.e. Daniel's sentence is wrong and
what you say is right.

> I can't follow what you're trying to get at in this sub discussion - why
> would we want to sanity check something about catalog tables here? Given
> that XLH_INSERT_CONTAINS_NEW_TUPLE is set exactly when the table is not
> a catalog table, that seems entirely superfluous?

[ ... Looking ... ]
You are right, we could just rely on cross-checking that when we have
no data then XLH_INSERT_CONTAINS_NEW_TUPLE is not set, or something
like that:
@@ -901,11 +901,17 @@ DecodeMultiInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
        return;

    /*
-    * As multi_insert is not used for catalogs yet, the block should always
-    * have data even if a full-page write of it is taken.
+    * multi_insert can be used by catalogs, hence it is possible that
+    * the block does not have any data even if a full-page write of it
+    * is taken.
     */
     tupledata = XLogRecGetBlockData(r, 0, &tuplelen);
-    Assert(tupledata != NULL);
+    Assert(tupledata == NULL ||
+           (xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) != 0);
+
+    /* No data, then leave */
+    if (tupledata == NULL)
+        return;

The latest patch does not apply, so it needs a rebase.
--
Michael


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
In reply to this post by Andres Freund
> On 12 Nov 2019, at 19:33, Andres Freund <[hidden email]> wrote:

Thanks for reviewing!

> On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote:
>>> On 11 Nov 2019, at 09:32, Michael Paquier <[hidden email]> wrote:
>>
>>> This part has resulted in 75c1921, and we could just change
>>> DecodeMultiInsert() so as if there is no tuple data then we'd just
>>> leave.  However, I don't feel completely comfortable with that either
>>> as it would be nice to still check that for normal relations we
>>> *always* have a FPW available.
>>
>> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations
>> IIUC (that is, non logically decoded relations), so it seems to me that we can
>> have a fastpath out of DecodeMultiInsert() which inspects that flag without
>> problems. Is this proposal along the lines of what you were thinking?
>
> Maybe I'm missing something, but it's the opposite, no?
> ...
> or am I misunderstanding what you mean?
Correct, as has been discussed in this thread already, I managed to write it
backwards.

>> @@ -1600,10 +1598,16 @@ recordDependencyOnExpr(const ObjectAddress *depender,
>> /* Remove any duplicates */
>> eliminate_duplicate_dependencies(context.addrs);
>>
>> - /* And record 'em */
>> - recordMultipleDependencies(depender,
>> -   context.addrs->refs, context.addrs->numrefs,
>> -   behavior);
>> + /*
>> + * And record 'em. If we know we only have a single dependency then
>> + * avoid the extra cost of setting up a multi insert.
>> + */
>> + if (context.addrs->numrefs == 1)
>> + recordDependencyOn(depender, &context.addrs->refs[0], behavior);
>> + else
>> + recordMultipleDependencies(depender,
>> +   context.addrs->refs, context.addrs->numrefs,
>> +   behavior);
>
> I'm not sure this is actually a worthwhile complexity to keep. Hard to
> believe that setting up a multi-insert is going to have a significant
> enough overhead to matter here?
>
> And if it does, is there a chance we can hide this repeated block
> somewhere within recordMultipleDependencies() or such? I don't like the
> repetitiveness here. Seems recordMultipleDependencies() could easily
> optimize the case of there being exactly one dependency to insert?
Agreed, I've simplified by just calling recordMultipleDependencies() until that's
found to be too expensive.

>> + slot = palloc(sizeof(TupleTableSlot *) * natts);
>
> Hm. Looking at
>
> SELECT avg(pg_column_size(pa)) FROM pg_attribute pa;
>
> yielding ~144 bytes, we can probably cap this at 128 or such, without
> losing efficiency. Or just use
> #define MAX_BUFFERED_BYTES 65535
> from copy.c or such (MAX_BUFFERED_BYTES / sizeof(FormData_pg_attribute)).
Added a limit using MAX_BUFFERED_BYTES as above, or the number of tuples,
whichever is smaller.

>> + /* This is a tad tedious, but way cleaner than what we used to do... */
>
> I don't like comments that refer to "what we used to" because there's no
> way for anybody to make sense of that, because it's basically a dangling
> reference :)

This is a copy/paste from InsertPgAttributeTuple added in 03e5248d0f0, which in
turn quite likely was a copy/paste from InsertPgClassTuple added in b7b78d24f7f
back in 2006.  Looking at that commit, it's clear what the comment refers to
but it's quite useless in isolation.  Since the current coding is in its teens by
now, perhaps we should just remove the two existing occurrences?

I can submit a rewrite of the comments into something less gazing into the past
if you feel like removing these.

> This seems likely to waste some effort - during insertion the slot will
> be materialized, which'll copy the tuple. I think it'd be better to
> construct the tuple directly inside the slot's tts_values/isnull, and
> then store it with ExecStoreVirtualTuple().

I'm not sure why it looked that way, but it's clearly rubbish.  Rewrote it by
taking the TupleDesc as input (which addresses your other comment below too),
and creating the tuples directly using ExecStoreVirtualTuple.

>> + }
>> +
>> + /* finally insert the new tuples, update the indexes, and clean up */
>> + CatalogMultiInsertWithInfo(pg_attribute_rel, slot, natts, indstate);
>
> Previous comment:
>
> I think it might be worthwhile to clear and delete the slots
> after inserting. That's potentially a good bit of memory, no?
>
> Current comment:
>
> I think I quite dislike the API of CatalogMultiInsertWithInfo freeing
> the slots. It'd e.g. make it impossible to reuse them to insert more
> data. It's also really hard to understand.
I don't have strong feelings; I see merit in both approaches, but the reuse
aspect is clearly the winner.  I've changed it such that the caller is
responsible for freeing.
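
On the caller's side that ends up looking roughly like this (sketch):

    CatalogMultiInsertWithInfo(pg_attribute_rel, slot, natts, indstate);

    /* the insert function no longer frees the slots, so the caller can
     * reuse them for another batch, or drop them when done */
    for (i = 0; i < natts; i++)
        ExecDropSingleTupleTableSlot(slot[i]);
    pfree(slot);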

>> + attrs = palloc(sizeof(Form_pg_attribute) * natts);
>
> Hm. Why do we need this separate allocation? Isn't this exactly the same
> as what's in the tupledesc?

Fixed.

>> +/*
>> + * CatalogMultiInsertWithInfo
>
> Hm. The current function is called CatalogTupleInsert(), wouldn't
> CatalogTuplesInsertWithInfo() or such be more accurate? Or
> CatalogTuplesMultiInsertWithInfo()?

Fixed by opting for the latter, mostly since it seems to best convey what the
function does.

> Same concern as in the equivalent pg_attribute code.

> Too much copying again.

Both of them fixed.

The attached patch addresses all of the comments, thanks again for reviewing!

cheers ./daniel




Attachment: catalog_multi_insert-v6.patch (72K)