Why does ExecComputeStoredGenerated() form a heap tuple

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Why does ExecComputeStoredGenerated() form a heap tuple

Andres Freund
Hi,

while rebasing the remaining tableam patches (luckily a pretty small set
now!), I had a few conflicts with ExecComputeStoredGenerated(). While
resolving I noticed:

        oldtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
        newtuple = heap_modify_tuple(oldtuple, tupdesc, values, nulls, replaces);
        ExecForceStoreHeapTuple(newtuple, slot);
        if (should_free)
                heap_freetuple(oldtuple);

        MemoryContextSwitchTo(oldContext);

First off, I'm not convinced this is correct:

ISTM you'd need at least an ExecMaterializeSlot() before the
MemoryContextSwitchTo() in ExecComputeStoredGenerated().

But what actually brought me to reply was that it seems like it'll cause
unnecessary slowdowns for !heap AMs. First, it'll form a heaptuple if
the slot isn't in that form, and then it'll cause a conversion by
storing a heap tuple even if the target doesn't use heap representation.

ISTM the above would be much more efficiently - even more efficient if
only heap is used - implemented as something roughly akin to:

    slot_getallattrs(slot);
    memcpy(values, slot->tts_values, ...);
    memcpy(nulls, slot->tts_isnull, ...);

    for (int i = 0; i < natts; i++)
    {
        if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED)
        {
            values[i] = ...
        }
        else
            values[i] = datumCopy(...);
    }

    ExecClearTuple(slot);
    memcpy(slot->tts_values, values, ...);
    memcpy(slot->tts_isnull, nulls, ...);
    ExecStoreVirtualTuple(slot);
    ExecMaterializeSlot(slot);

that's not perfect, but more efficient than your version...

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Andres Freund
Hi,

On 2019-03-30 19:57:44 -0700, Andres Freund wrote:

> while rebasing the remaining tableam patches (luckily a pretty small set
> now!), I had a few conflicts with ExecComputeStoredGenerated(). While
> resolving I noticed:
>
> oldtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
> newtuple = heap_modify_tuple(oldtuple, tupdesc, values, nulls, replaces);
> ExecForceStoreHeapTuple(newtuple, slot);
> if (should_free)
> heap_freetuple(oldtuple);
>
> MemoryContextSwitchTo(oldContext);
>
> First off, I'm not convinced this is correct:
>
> ISTM you'd need at least an ExecMaterializeSlot() before the
> MemoryContextSwitchTo() in ExecComputeStoredGenerated().
>
> But what actually brought me to reply was that it seems like it'll cause
> unnecessary slowdowns for !heap AMs. First, it'll form a heaptuple if
> the slot isn't in that form, and then it'll cause a conversion by
> storing a heap tuple even if the target doesn't use heap representation.
>
> ISTM the above would be much more efficiently - even more efficient if
> only heap is used - implemented as something roughly akin to:
>
>     slot_getallattrs(slot);
>     memcpy(values, slot->tts_values, ...);
>     memcpy(nulls, slot->tts_isnull, ...);
>
>     for (int i = 0; i < natts; i++)
>     {
>         if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED)
>         {
>             values[i] = ...
>         }
>         else
>             values[i] = datumCopy(...);
>     }
>
>     ExecClearTuple(slot);
>     memcpy(slot->tts_values, values, ...);
>     memcpy(slot->tts_isnull, nulls, ...);
>     ExecStoreVirtualTuple(slot);
>     ExecMaterializeSlot(slot);
>
> that's not perfect, but more efficient than your version...

Also, have you actually benchmarked this code? ISTM that adding a
stored generated column would cause quite noticable slowdowns in the
COPY path based on this code.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Peter Eisentraut-6
In reply to this post by Andres Freund
On 2019-03-31 04:57, Andres Freund wrote:
> while rebasing the remaining tableam patches (luckily a pretty small set
> now!), I had a few conflicts with ExecComputeStoredGenerated(). While
> resolving I noticed:

The core of that code was written a long time ago and perhaps hasn't
caught up with all the refactoring going on.  I'll look through your
proposal and update the code.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Peter Eisentraut-6
In reply to this post by Andres Freund
On 2019-03-31 05:00, Andres Freund wrote:
> Also, have you actually benchmarked this code? ISTM that adding a
> stored generated column would cause quite noticable slowdowns in the
> COPY path based on this code.

Yes, it'll be slower than not having it, but it's much faster than the
equivalent trigger.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Andres Freund
Hi,

On 2019-04-01 11:25:46 +0200, Peter Eisentraut wrote:
> On 2019-03-31 05:00, Andres Freund wrote:
> > Also, have you actually benchmarked this code? ISTM that adding a
> > stored generated column would cause quite noticable slowdowns in the
> > COPY path based on this code.
>
> Yes, it'll be slower than not having it, but it's much faster than the
> equivalent trigger.

It at the moment is quite noticably slower than directly inserting the
generated column.

postgres[11993][1]=# CREATE TABLE foo_without_generated(id int, copy_of_int int);
CREATE TABLE
Time: 0.625 ms
postgres[11993][1]=# CREATE TABLE foo_with_generated(id int, copy_of_int int generated always as (id) stored);
CREATE TABLE
Time: 0.771 ms
postgres[11993][1]=# INSERT INTO foo_without_generated SELECT g.i, g.i FROM generate_series(1, 1000000) g(i);
INSERT 0 1000000
Time: 691.533 ms
postgres[11993][1]=# INSERT INTO foo_with_generated SELECT g.i FROM generate_series(1, 1000000) g(i);
INSERT 0 1000000
Time: 825.471 ms
postgres[11993][1]=# COPY foo_without_generated TO '/tmp/foo_without_generated';
COPY 1000000
Time: 194.051 ms
postgres[11993][1]=# COPY foo_with_generated TO '/tmp/foo_with_generated';
COPY 1000000
Time: 153.146 ms
postgres[11993][1]=# ;TRUNCATE foo_without_generated ;COPY foo_without_generated FROM '/tmp/foo_without_generated';
Time: 0.178 ms
TRUNCATE TABLE
Time: 8.456 ms
COPY 1000000
Time: 394.990 ms
postgres[11993][1]=# ;TRUNCATE foo_with_generated ;COPY foo_with_generated FROM '/tmp/foo_with_generated';
Time: 0.147 ms
TRUNCATE TABLE
Time: 8.043 ms
COPY 1000000
Time: 508.918 ms

From a quick profile that's indeed largely because
ExecComputeStoredGenerated() is really inefficient - and it seems
largely unnecessarily so.  I think this should at least be roughly as
efficient as getting the additional data from the client.


Minor other point: I'm not a fan of defining more general infrastructure
like ExecComputedStoredGenerated() in nodeModifyTable.c - it's already
large and confusing, and it's not obvious that e.g. COPY would call into
it.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Peter Eisentraut-6
In reply to this post by Peter Eisentraut-6
On 2019-04-01 11:23, Peter Eisentraut wrote:
> On 2019-03-31 04:57, Andres Freund wrote:
>> while rebasing the remaining tableam patches (luckily a pretty small set
>> now!), I had a few conflicts with ExecComputeStoredGenerated(). While
>> resolving I noticed:
>
> The core of that code was written a long time ago and perhaps hasn't
> caught up with all the refactoring going on.  I'll look through your
> proposal and update the code.

The attached patch is based on your sketch.  It's clearly better in the
long term not to rely on heap tuples here.  But in testing this change
seems to make it slightly slower, certainly not a speedup as you were
apparently hoping for.


Test setup:

create table t0 (a int, b int);
insert into t0 select generate_series (1, 10000000);  -- 10 million
\copy t0 (a) to 'test.dat';

-- for comparison, without generated column
truncate t0;
\copy t0 (a) from 'test.dat';

-- master
create table t1 (a int, b int generated always as (a * 2) stored);
truncate t1;
\copy t1 (a) from 'test.dat';

-- patched
truncate t1;
\copy t1 (a) from 'test.dat';

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-Don-t-form-heap-tuple-in-ExecComputeStoredGenerated.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

David Rowley-3
On Tue, 23 Apr 2019 at 20:23, Peter Eisentraut
<[hidden email]> wrote:

> The attached patch is based on your sketch.  It's clearly better in the
> long term not to rely on heap tuples here.  But in testing this change
> seems to make it slightly slower, certainly not a speedup as you were
> apparently hoping for.
>
>
> Test setup:
>
> create table t0 (a int, b int);
> insert into t0 select generate_series (1, 10000000);  -- 10 million
> \copy t0 (a) to 'test.dat';
>
> -- for comparison, without generated column
> truncate t0;
> \copy t0 (a) from 'test.dat';
>
> -- master
> create table t1 (a int, b int generated always as (a * 2) stored);
> truncate t1;
> \copy t1 (a) from 'test.dat';
>
> -- patched
> truncate t1;
> \copy t1 (a) from 'test.dat';

I didn't do the exact same test, but if I use COPY instead of \copy,
then for me patched is faster.

Normal table:


postgres=# copy t0 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5437.768 ms (00:05.438)
postgres=# truncate t0;
TRUNCATE TABLE
Time: 20.775 ms
postgres=# copy t0 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5272.228 ms (00:05.272)

Master:

postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 6570.031 ms (00:06.570)
postgres=# truncate t1;
TRUNCATE TABLE
Time: 17.813 ms
postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 6486.253 ms (00:06.486)

Patched:

postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5359.338 ms (00:05.359)
postgres=# truncate table t1;
TRUNCATE TABLE
Time: 25.551 ms
postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5347.596 ms (00:05.348)


For the patch, I wonder if you need this line:

+ memcpy(values, slot->tts_values, sizeof(*values) * natts);

If you got rid of that and changed the datumCopy to use
slot->tts_values[i] instead.

Maybe it's also worth getting rid of the first memcpy for the null
array and just assign the element in the else clause.

It might also be cleaner to assign TupleDescAttr(tupdesc, i) to a
variable instead of using the macro 3 times. It'd make that datumCopy
line shorter too.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Peter Eisentraut-6
On 2019-04-24 00:26, David Rowley wrote:
> I didn't do the exact same test, but if I use COPY instead of \copy,
> then for me patched is faster.

OK, confirmed that way, too.

> For the patch, I wonder if you need this line:
>
> + memcpy(values, slot->tts_values, sizeof(*values) * natts);
>
> If you got rid of that and changed the datumCopy to use
> slot->tts_values[i] instead.

done

> Maybe it's also worth getting rid of the first memcpy for the null
> array and just assign the element in the else clause.

Tried that, seems to be slower.  So I left it as is.

> It might also be cleaner to assign TupleDescAttr(tupdesc, i) to a
> variable instead of using the macro 3 times. It'd make that datumCopy
> line shorter too.

Also done.

Updated patch attached.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v2-0001-Convert-ExecComputeStoredGenerated-to-use-tuple-s.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

David Rowley-3
On Thu, 16 May 2019 at 05:44, Peter Eisentraut
<[hidden email]> wrote:
> Updated patch attached.

This patch looks okay to me.

It's not for this patch, or probably for PG12, but it would be good if
we could avoid the formation of the Tuple until right before the new
tuple is inserted.

I see heap_form_tuple() is called 3 times for a single INSERT with:

create table t (a text, b text, c text generated always as (b || b) stored);

create or replace function t_trigger() returns trigger as $$
begin
NEW.b = UPPER(NEW.a);
RETURN NEW;
end;
$$ language plpgsql;

create trigger t_on_insert before insert on t for each row execute
function t_trigger();

insert into t (a) values('one');

and heap_deform_tuple() is called once for each additional
heap_form_tuple().  That's pretty wasteful :-(

Maybe Andres can explain if this is really required, or if it's just
something that's not well optimised yet.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Why does ExecComputeStoredGenerated() form a heap tuple

Andres Freund
Hi,

On 2019-05-20 14:23:34 +1200, David Rowley wrote:

> It's not for this patch, or probably for PG12, but it would be good if
> we could avoid the formation of the Tuple until right before the new
> tuple is inserted.
>
> I see heap_form_tuple() is called 3 times for a single INSERT with:
>
> create table t (a text, b text, c text generated always as (b || b) stored);
>
> create or replace function t_trigger() returns trigger as $$
> begin
> NEW.b = UPPER(NEW.a);
> RETURN NEW;
> end;
> $$ language plpgsql;
>
> create trigger t_on_insert before insert on t for each row execute
> function t_trigger();
>
> insert into t (a) values('one');
>
> and heap_deform_tuple() is called once for each additional
> heap_form_tuple().  That's pretty wasteful :-(
>
> Maybe Andres can explain if this is really required, or if it's just
> something that's not well optimised yet.

I think we can optimize this further, but it's not unexpected.

I see:

1) ExecCopySlot() call in in ExecModifyTable(). For INSERT SELECT the
   input will be in a virtual slot. We might be able to have some
   trickery to avoid this one in some case. Not sure how much it'd help
   - I think we most of the time would just move the forming of the
   tuple around - ExecInsert() wants to materialize the slot.

2) plpgsql form/deform due to updating a field. I don't see how we could
   easily fix that. We'd have to invent a mechanism that allows plpgsql to pass
   slots around. I guess it's possible you could make that work somehow?
   But I think we'd also need to change the external trigger interface -
   which currently specifies that the return type is a HeapTuple

3) ExecComputeStoredGenerated(). I suspect it's not particularly useful
   to get rid of the heap_form_tuple (from with ExecMaterialize())
   here. When actually inserting we'll have to actually form the tuple
   anyway.  But what I do wonder is whether it would make sense to move
   the materialization outside of that function. If there's constraints,
   or partitioning, we'll have to deform (parts of) the tuple, to access
   the necessary columns.

Currently materializing an unmaterialized slot (i.e. making it
independent from anything but memory referenced by the slot) also means
that later accesses will need to deform again. I'm fairly sure we can
improve that for many cases (IIRC we don't need to that for virtual
slots, but that's irrelevant here).

I'm pretty sure we get rid of most of this, but it'll be some work. I'm
also not sure how important it is - for INSERT/UPDATE, in how many cases
is the bottleneck those copies, rather than other parts of query
execution? I suspect you can measure it for some INSERT ... SELECT type
cases - but probably the overhead of triggers and GENERATED is going to
be higher.

Greetings,

Andres Freund