Ought to use heap_multi_insert() for pg_attribute/depend insertions?

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

Turns out in portions of the regression tests a good chunk of the
runtime is inside AddNewAttributeTuples() and
recordMultipleDependencies()'s heap insertions. Looking at a few
profiles I had lying around I found that in some production cases
too. ISTM we should use heap_multi_insert() for both, as the source
tuples ought to be around reasonably comfortably.

For recordMultipleDependencies() it'd obviously better if we collected
all dependencies for new objects, rather than doing so separately. Right
now e.g. the code for a new table looks like:

                recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);

                recordDependencyOnOwner(RelationRelationId, relid, ownerid);

                recordDependencyOnNewAcl(RelationRelationId, relid, 0, ownerid, relacl);

                recordDependencyOnCurrentExtension(&myself, false);

                if (reloftypeid)
                {
                        referenced.classId = TypeRelationId;
                        referenced.objectId = reloftypeid;
                        referenced.objectSubId = 0;
                        recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
                }

and it'd obviously be more efficient to do that once if we went to using
heap_multi_insert() in the dependency code. But I suspect even if we
just used an extended API in AddNewAttributeTuples() (for the type /
collation dependencies), it'd be a win.

I'm not planning to work on this soon, but I though it'd be worthwhile
to put this out there (even if potentially just as a note to myself).

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 13 Feb 2019, at 19:27, Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> Turns out in portions of the regression tests a good chunk of the
> runtime is inside AddNewAttributeTuples() and
> recordMultipleDependencies()'s heap insertions. Looking at a few
> profiles I had lying around I found that in some production cases
> too. ISTM we should use heap_multi_insert() for both, as the source
> tuples ought to be around reasonably comfortably.
>
> For recordMultipleDependencies() it'd obviously better if we collected
> all dependencies for new objects, rather than doing so separately. Right
> now e.g. the code for a new table looks like:
>
> recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
>
> recordDependencyOnOwner(RelationRelationId, relid, ownerid);
>
> recordDependencyOnNewAcl(RelationRelationId, relid, 0, ownerid, relacl);
>
> recordDependencyOnCurrentExtension(&myself, false);
>
> if (reloftypeid)
> {
> referenced.classId = TypeRelationId;
> referenced.objectId = reloftypeid;
> referenced.objectSubId = 0;
> recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
> }
>
> and it'd obviously be more efficient to do that once if we went to using
> heap_multi_insert() in the dependency code. But I suspect even if we
> just used an extended API in AddNewAttributeTuples() (for the type /
> collation dependencies), it'd be a win.
When a colleague was looking at heap_multi_insert in the COPY codepath I
remembered this and took a stab at a WIP patch inspired by this email, while
not following it to the letter.  It’s not going the full route of collecting
all the dependencies for creating a table, but adding ways to perform
multi_heap_insert in the existing codepaths as it seemed like a good place to
start.

It introduces a new function CatalogMultiInsertWithInfo which takes a set of
slots for use in heap_multi_insert, used from recordMultipleDependencies and
InsertPgAttributeTuples (which replace calling InsertPgAttributeTuple
repeatedly).  The code is still a WIP with some kludges, following the show-
early philosophy.

It passes make check and some light profiling around regress suites indicates
that it does improve a bit by reducing the somewhat costly calls.  Is this
along the lines of what you were thinking or way off?

cheers ./daniel


catalog_multi_insert.patch (26K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:

> > On 13 Feb 2019, at 19:27, Andres Freund <[hidden email]> wrote:
> >
> > Hi,
> >
> > Turns out in portions of the regression tests a good chunk of the
> > runtime is inside AddNewAttributeTuples() and
> > recordMultipleDependencies()'s heap insertions. Looking at a few
> > profiles I had lying around I found that in some production cases
> > too. ISTM we should use heap_multi_insert() for both, as the source
> > tuples ought to be around reasonably comfortably.
> >
> > For recordMultipleDependencies() it'd obviously better if we collected
> > all dependencies for new objects, rather than doing so separately. Right
> > now e.g. the code for a new table looks like:
> >
> > recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
> >
> > recordDependencyOnOwner(RelationRelationId, relid, ownerid);
> >
> > recordDependencyOnNewAcl(RelationRelationId, relid, 0, ownerid, relacl);
> >
> > recordDependencyOnCurrentExtension(&myself, false);
> >
> > if (reloftypeid)
> > {
> > referenced.classId = TypeRelationId;
> > referenced.objectId = reloftypeid;
> > referenced.objectSubId = 0;
> > recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
> > }
> >
> > and it'd obviously be more efficient to do that once if we went to using
> > heap_multi_insert() in the dependency code. But I suspect even if we
> > just used an extended API in AddNewAttributeTuples() (for the type /
> > collation dependencies), it'd be a win.
>
> When a colleague was looking at heap_multi_insert in the COPY codepath I
> remembered this and took a stab at a WIP patch inspired by this email, while
> not following it to the letter.  It’s not going the full route of collecting
> all the dependencies for creating a table, but adding ways to perform
> multi_heap_insert in the existing codepaths as it seemed like a good place to
> start.

Cool. I don't quite have the energy to look at this right now, could you
create a CF entry for this?

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
> On 23 May 2019, at 03:46, Andres Freund <[hidden email]> wrote:
> On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:

>> When a colleague was looking at heap_multi_insert in the COPY codepath I
>> remembered this and took a stab at a WIP patch inspired by this email, while
>> not following it to the letter.  It’s not going the full route of collecting
>> all the dependencies for creating a table, but adding ways to perform
>> multi_heap_insert in the existing codepaths as it seemed like a good place to
>> start.
>
> Cool. I don't quite have the energy to look at this right now, could you
> create a CF entry for this?

Of course, done.

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
In reply to this post by Daniel Gustafsson
Hi,

On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:
> It passes make check and some light profiling around regress suites indicates
> that it does improve a bit by reducing the somewhat costly calls.

Just for the record, here is the profile I did:

perf record --call-graph lbr make -s check-world -Otarget -j16 -s
perf report

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Daniel Gustafsson
In reply to this post by Andres Freund
> On 23 May 2019, at 03:46, Andres Freund <[hidden email]> wrote:
> On 2019-05-22 10:25:14 +0200, Daniel Gustafsson wrote:

>> When a colleague was looking at heap_multi_insert in the COPY codepath I
>> remembered this and took a stab at a WIP patch inspired by this email, while
>> not following it to the letter.  It’s not going the full route of collecting
>> all the dependencies for creating a table, but adding ways to perform
>> multi_heap_insert in the existing codepaths as it seemed like a good place to
>> start.
>
> Cool. I don't quite have the energy to look at this right now, could you
> create a CF entry for this?

Attached is an updated version with some of the stuff we briefly discussed at
PGCon.  This version use the ObjectAddresses API already established to collect
the dependencies, and perform a few more multi inserts.  Profiling shows that
we are spending less time in catalog insertions, but whether it’s enough to
warrant the added complexity is up for debate.

The patch is still rough around the edges (TODO’s left to mark some areas), but
I prefer to get some feedback early rather than working too far in potentially
the wrong direction, so parking this in the CF for now.

cheers ./daniel


catalog_multi_insert-v2.patch (60K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

Andres Freund
Hi,

On 2019-06-11 15:20:42 +0200, Daniel Gustafsson wrote:
> Attached is an updated version with some of the stuff we briefly discussed at
> PGCon.  This version use the ObjectAddresses API already established to collect
> the dependencies, and perform a few more multi inserts.

Cool.

> Profiling shows that
> we are spending less time in catalog insertions, but whether it’s enough to
> warrant the added complexity is up for debate.

Probably worth benchmarking e.g. temp table creation speed in
isolation. People do complain about that occasionally.

Greetings,

Andres Freund