posgres 12 bug (partitioned table)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

posgres 12 bug (partitioned table)

Pavel Biryukov
Hello!
 
(Tested on PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit)
 
Bug reference:      16446
Logged by:          Георгий Драк
Email address:      sonicgd(at)gmail(dot)com
PostgreSQL version: 12.2
Operating system:   Debian 10.3
Description:        
 
 
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING xmin;
 
 
Reproduction.
 
 
1. Create schema
CREATE TABLE "tmp"
(
    id   bigint generated always as identity,
    date timestamptz not null,
    foo  int         not null,
    PRIMARY KEY ("id", "date")
)
    PARTITION BY RANGE ("date");
CREATE TABLE "tmp_2020" PARTITION OF "tmp" FOR VALUES FROM ('2020-01-01') TO
('2021-01-01');
 
 
2. Execute query
INSERT INTO "tmp" ("date", "foo")
VALUES (NOW(), 1)
RETURNING id, xmin;
 
 
3. Result - ERROR: virtual tuple table slot does not have system
attributes
 
 
4. Expected result - id and xmin of inserted row.
 
Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

Tom Lane-2
Pavel Biryukov <[hidden email]> writes:
> Hello. I'm catch error "virtual tuple table slot does not have system
> attributes" when inserting row into partitioned table with RETURNING
> xmin

Reproduced here.  The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression.  I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

David Rowley
On Fri, 5 Jun 2020 at 04:57, Tom Lane <[hidden email]> wrote:
> Reproduced here.  The example works in v11, and in v10 if you remove
> the unnecessary-to-the-example primary key, so it seems like a clear
> regression.  I didn't dig for the cause but I suppose it's related
> to Andres' slot-related changes.

Looks like c2fe139c20 is the breaking commit.

David


Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

Pavel Biryukov
Hello!
 
Is it going to be fixed? This problem stops us from migrating to 12...
-- 
С уважением, Павел
 
 
 
05.06.2020, 03:18, "David Rowley" <[hidden email]>:

On Fri, 5 Jun 2020 at 04:57, Tom Lane <[hidden email]> wrote:

 Reproduced here. The example works in v11, and in v10 if you remove
 the unnecessary-to-the-example primary key, so it seems like a clear
 regression. I didn't dig for the cause but I suppose it's related
 to Andres' slot-related changes.


Looks like c2fe139c20 is the breaking commit.

David

Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

Soumyadeep Chakraborty
Hello,

ccing [hidden email]

Upon investigation, it seems that the problem is caused by the
following:

The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot. If there are system
columns other than ctid and tableOid referenced in the RETURNING clause
(not only xmin as in the bug report), it will lead to the ERROR as
mentioned in this thread as virtual tuple table slots don't really store
such columns. (ctid and tableOid are present directly in the
TupleTableSlot struct and can be satisfied from there: refer:
slot_getsysattr()))

I have attached two alternate patches to solve the problem. Both patches
use and share a mechanism to detect if there are any such system
columns. This is done inside ExecBuildProjectionInfo() and we store this
info inside the ProjectionInfo struct. Then based on this info, system
columns are populated in a suitable slot, which is then passed on to
ExecProcessReturning(). (If it is deemed that this operation only be
done for RETURNING, we can just as easily do it in the callsite for
ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead
of doing it inside ExecBuildProjectionInfo())

The first patch [1] explicitly creates a heap tuple table slot, fills in the
system column values as we would do during heap_prepare_insert() and
then passes that slot to ExecProcessReturning(). (We use a heap tuple table
slot as it is guaranteed to support these attributes).

The second patch [2] instead of relying on a heap tuple table slot,
relies on ExecGetReturningSlot() for the right slot and
table_tuple_fetch_row_version() to supply the system column values.  It
does make the assumption that the AM would supply a slot that will have
these system columns.

[1] v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch
[2] v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch

Regards,
Soumyadeep (VMware)

v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch (7K) Download Attachment
v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

Amit Langote
Hi Soumyadeep,

Thanks for picking this up.

On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<[hidden email]> wrote:
> Upon investigation, it seems that the problem is caused by the
> following:
>
> The slot passed to the call to ExecProcessReturning() inside
> ExecInsert() is often a virtual tuple table slot.

Actually, not that often in practice.  The slot is not virtual, for
example, when inserting into a regular non-partitioned table.  Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():

        mtstate->mt_scans[i] =
            ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),

table_slot_callbacks(resultRelInfo->ri_RelationDesc));

Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation.  For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:

    if (relation->rd_tableam)
        tts_cb = relation->rd_tableam->slot_callbacks(relation);
    else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
    {
        /*
         * Historically FDWs expect to store heap tuples in slots. Continue
         * handing them one, to make it less painful to adapt FDWs to new
         * versions. The cost of a heap slot over a virtual slot is pretty
         * small.
         */
        tts_cb = &TTSOpsHeapTuple;
    }
    else
    {
        /*
         * These need to be supported, as some parts of the code (like COPY)
         * need to create slots for such relations too. It seems better to
         * centralize the knowledge that a heap slot is the right thing in
         * that case here.
         */
        Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
               relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
        tts_cb = &TTSOpsVirtual;
    }

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached).  In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.

> I have attached two alternate patches to solve the problem.

IMHO, they are solving the problem at the wrong place.  We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with.  We could
do what I suggest above or maybe find some other way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

partitioned-table-heap-slot.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

Soumyadeep Chakraborty
Hi Amit,

Thanks for your reply!

On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <[hidden email]> wrote:

>
> Hi Soumyadeep,
>
> Thanks for picking this up.
>
> On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
> <[hidden email]> wrote:
> > Upon investigation, it seems that the problem is caused by the
> > following:
> >
> > The slot passed to the call to ExecProcessReturning() inside
> > ExecInsert() is often a virtual tuple table slot.
>
> Actually, not that often in practice.  The slot is not virtual, for
> example, when inserting into a regular non-partitioned table.

Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.

> If I change this to return a "heap" slot for partitioned tables, just
> like for foreign tables, the problem goes away (see the attached).  In
> fact, even make check-world passes, so I don't know why it isn't that
> way to begin with.
>

This is what I had thought of initially but I had taken a step back for 2
reasons:

1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.

2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case.  However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.

All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.

[1] https://github.com/greenplum-db/postgres/tree/zedstore

Regards,
Soumyadeep


Reply | Threaded
Open this post in threaded view
|

Re: posgres 12 bug (partitioned table)

Amit Langote
Hi Soumyadeep,

On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
<[hidden email]> wrote:

> On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <[hidden email]> wrote:
> > If I change this to return a "heap" slot for partitioned tables, just
> > like for foreign tables, the problem goes away (see the attached).  In
> > fact, even make check-world passes, so I don't know why it isn't that
> > way to begin with.
>
> This is what I had thought of initially but I had taken a step back for 2
> reasons:
>
> 1. It is not mandatory for an AM to supply a heap tuple in the slot
> passed to any AM routine. With your patch, the slot passed to
> table_tuple_insert() inside ExecInsert() for instance is now expected to
> supply a heap tuple for the subsequent call to ExecProcessReturning().
> This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
> applying your patch on Zedstore [1], a columnar AM, and consequently, I
> got incorrect values for xmin, xmax etc with the query reported in this
> issue.

Ah, I see.  You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert().  So,
table_tuple_insert() always refers to a leaf partition's AM.   Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table.  How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops?  That would be the case, for
example, if the leaf partition is of Zedstore AM.  In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com