Duplicate Item Pointers in Gin index

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Duplicate Item Pointers in Gin index

R, Siva

Hi,

We are currently investigating an issue with a Gin index containing duplicate item pointers for one of its keys. While we are trying to reproduce this issue, we wanted to reach out to the community to validate our current working theory.

Background:
The discussion in the following thread:

https://www.postgresql.org/message-id/flat/CAD21AoBLUSyiYKnTYtSAbC%2BF%3DXDjiaBrOUEGK%2BzUXdQ8owfPKw%40mail.gmail.com#CAD21AoBLUSyiYKnTYtSAbC+F=XDjiaBrOUEGK+zUXdQ8owfPKw@...

is about vacuum skipping cleanup of gin pending list when a user backend has locked the pending list for cleanup.

One of the responses from Jeff Janes describes the source of a corruption. Here is the sequence of events leading up to it
  1. Consider a tuple that is inserted and just updated/deleted and passes the dead-to-all horizon. At this point, the old tuple id (corresponding to the insert) is eligible for vacuum.
  2. Due to the above bug, vacuum can skip clean up of pending list if a user backend already has a lock on it before proceeding to vacuum the gin posting tree.
  3. At this point, it is possible that the user backend flushes the pending list containing the dead tuple to the main gin index structure after vacuum has already passed over. Vacuum has marked that tuple id slot as re-usable when there is actually an index tuple pointing to it.

Given the above, our question is to find out if the above is a valid case for having duplicate tids in the index tree structure without the fix for the above bug (in non-debug builds where "Assert" is disabled) with the following sequence of events:
  1. Assume that in the above sequence, steps 1 and 2 have completed.
  2. Before the pending list flush from user backend has happened, vacuum has marked that tuple id as re-usable.
  3. User backend (let us call this backend 1) has released the lock on the metapage and is processing the pending list pages, adding the items to the build accumulator. Let us assume that there are currently 2 pages in the pending list and metadata->head is currently being processed (metadata->tail is not locked yet) and the old tuple id that was deleted is added to the accumulator.
  4. Another user backend (let us call this backend 2) uses the tid that was marked re-usable for a row that contains the same key datum as before, takes metapage lock and inserts into the pending list (metadata->tail) and returns (assume it does not do cleanup).
  5. The first backend (backend 1) has finished processing metadata->head, decides not to flush and is now processing metadata->tail page which also contains the old tuple id and adds it to the build accumulator (ginCombineData appends to the end of the list in the accumulator tree - checks for duplicates through a Assert [1] ). So now the build accumulator contains duplicates for that key datum.
  6. First backend has finished processing metadata->tail, finds it is the last page of pending list and then decides to flush (one of the conditions to start flushing if we are at end of pending list).
  7. Each list of item pointers that is obtained through ginGetBAEntry [2] and inserted into the main tree. ginGetBAEntry will sort the items if already not sorted using qsort and qsortCompareItemPointers method where the check for duplicates happens through a Assert [3]
  8. During ginEntryInsert when we actually place the data to the page (ginEntryInsert -> ginInsertItemPointers -> ginInsertValue -> ginPlaceToPage -> beginPlaceToPage [dataBeginPlaceToPage] -> dataBeginPlaceToPageLeaf -> addItemsToLeaf). Assume that this insertion does not trigger a split for simplicity.
  9. Function addItemsToLeaf calls ginMergeItemPointers to merge the item pointers in the input with the existing one in the leaf page. The comment on ginMergeItemPointers mentions that it eliminates duplicates [4] , but it eliminates duplicates only across the 2 input arrays, not within the input arrays (Example : ginMergeItemPointers({1, 2, 2, 3}, {3, 4, 5}) will yield {1, 2, 2, 3, 4, 5}).

While this is probably a rare scenario, we were wondering if this is possible and if so, of the consequences that it can have in other parts of the code. One place where we found the strictly increasing order of item pointers is validated is in ginCompressPostingList during encoding the item pointers (through Assert [5] Note that if there are duplicates, the encoding is not broken)

Best
Siva

Links -
[1] - https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/access/gin/ginbulk.c#L60
[2] - https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/access/gin/ginfast.c#L878
[3] - https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/access/gin/ginbulk.c#L250
[4] - https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/access/gin/ginpostinglist.c#L361
[5] - https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/access/gin/ginpostinglist.c#L213


Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Peter Geoghegan-4
On Tue, Feb 20, 2018 at 3:30 PM, R, Siva <[hidden email]> wrote:
> We are currently investigating an issue with a Gin index containing
> duplicate item pointers for one of its keys. While we are trying to
> reproduce this issue, we wanted to reach out to the community to validate
> our current working theory.

Please define "a GIN index containing duplicate item pointers". Are
the keys duplicated, the item pointers, or both? Where are the
duplicates contained (what part of the GIN structure)?

> Consider a tuple that is inserted and just updated/deleted and passes the
> dead-to-all horizon. At this point, the old tuple id (corresponding to the
> insert) is eligible for vacuum.
> Due to the above bug, vacuum can skip clean up of pending list if a user
> backend already has a lock on it before proceeding to vacuum the gin posting
> tree.
> At this point, it is possible that the user backend flushes the pending list
> containing the dead tuple to the main gin index structure after vacuum has
> already passed over. Vacuum has marked that tuple id slot as re-usable when
> there is actually an index tuple pointing to it.

It sounds like you're talking about a case where a TID value is
recycled by VACUUM, despite the fact that the GIN structure still
contains a TID intended to point to the original tuple -- an index
tuple that should have been killed before VACUUM recycled the
TID/killed the heap tuple. The TID points to a valid heap tuple, but
the wrong one. (This is a corruption scenario that we saw from time to
time in past cases where VACUUMing of indexes had bugs).

Is that general understanding of what you've said correct?

> Given the above, our question is to find out if the above is a valid case
> for having duplicate tids in the index tree structure without the fix for
> the above bug (in non-debug builds where "Assert" is disabled) with the
> following sequence of events:

I think that you'll be very lucky to get a useful answer to this
specific question. The recent bugfix was based on an observation that
Sawada-san made about the code not doing what it claimed to do, and
clearly needs to do. It's very hard to give you any assurance that
"duplicate TIDs" could be explained by the bug.

FWIW, I continue to have misgivings about pending list VACUUMing. I
plan to look at it again.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Masahiko Sawada
In reply to this post by R, Siva
On Wed, Feb 21, 2018 at 8:30 AM, R, Siva <[hidden email]> wrote:

> User backend (let us call this backend 1) has released the lock on the
> metapage and is processing the pending list pages, adding the items to the
> build accumulator. Let us assume that there are currently 2 pages in the
> pending list and metadata->head is currently being processed (metadata->tail
> is not locked yet) and the old tuple id that was deleted is added to the
> accumulator.
> Another user backend (let us call this backend 2) uses the tid that was
> marked re-usable for a row that contains the same key datum as before, takes
> metapage lock and inserts into the pending list (metadata->tail) and returns
> (assume it does not do cleanup).

IIUC, ginInsertCleanup() holds ExclusiveLock on metapage during adding
tuples in the pending list to the accumulator. And inserting entries
to the pending list also requires ExclusiveLock on metapage. This
behavior is not relevant with that bug fix. So I don't think that
backend2 can inserts a tuple while backend1 is processing the pending
list.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Peter Geoghegan-4
On Tue, Feb 20, 2018 at 7:19 PM, Masahiko Sawada <[hidden email]> wrote:
> IIUC, ginInsertCleanup() holds ExclusiveLock on metapage during adding
> tuples in the pending list to the accumulator. And inserting entries
> to the pending list also requires ExclusiveLock on metapage. This
> behavior is not relevant with that bug fix. So I don't think that
> backend2 can inserts a tuple while backend1 is processing the pending
> list.

You mean that you think that the problem that Siva described cannot
happen once Postgres has commit
3b2787e1f8f1eeeb6bd9565288ab210262705b56?

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Masahiko Sawada
On Wed, Feb 21, 2018 at 12:31 PM, Peter Geoghegan <[hidden email]> wrote:

> On Tue, Feb 20, 2018 at 7:19 PM, Masahiko Sawada <[hidden email]> wrote:
>> IIUC, ginInsertCleanup() holds ExclusiveLock on metapage during adding
>> tuples in the pending list to the accumulator. And inserting entries
>> to the pending list also requires ExclusiveLock on metapage. This
>> behavior is not relevant with that bug fix. So I don't think that
>> backend2 can inserts a tuple while backend1 is processing the pending
>> list.
>
> You mean that you think that the problem that Siva described cannot
> happen once Postgres has commit
> 3b2787e1f8f1eeeb6bd9565288ab210262705b56?

It's true the problem cannot happen once postgres has that commit.
Since the commit 3b2787e fixed so as not to happen the #1 event that
Siva described, so the problem cannot happen once postgres has it. But
what I meant is that I think the event #3 and #4 cannot happen even
without that commit.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

R, Siva
In reply to this post by Peter Geoghegan-4

Hi Peter,
Thanks for the response!

> Please define "a GIN index containing duplicate item pointers". Are
> the keys duplicated, the item pointers, or both? Where are the
> duplicates contained (what part of the GIN structure)?

The keys are not duplicates. For a particular index key,
there are 2 consecutive item pointers that are equal.
The duplicates are in a GIN data leaf page (posting tree).


> It sounds like you're talking about a case where a TID value is
> recycled by VACUUM, despite the fact that the GIN structure still
> contains a TID intended to point to the original tuple -- an index
> tuple that should have been killed before VACUUM recycled the
> TID/killed the heap tuple. The TID points to a valid heap tuple, but
> the wrong one. (This is a corruption scenario that we saw from time to
> time in past cases where VACUUMing of indexes had bugs).

> Is that general understanding of what you've said correct?

Yes that is correct.

Best
Siva


________________________________________
From: Peter Geoghegan <[hidden email]>
Sent: Tuesday, February 20, 2018 6:15 PM
To: R, Siva
Cc: [hidden email]
Subject: Re: Duplicate Item Pointers in Gin index

On Tue, Feb 20, 2018 at 3:30 PM, R, Siva <[hidden email]> wrote:
> We are currently investigating an issue with a Gin index containing
> duplicate item pointers for one of its keys. While we are trying to
> reproduce this issue, we wanted to reach out to the community to validate
> our current working theory.

Please define "a GIN index containing duplicate item pointers". Are
the keys duplicated, the item pointers, or both? Where are the
duplicates contained (what part of the GIN structure)?

> Consider a tuple that is inserted and just updated/deleted and passes the
> dead-to-all horizon. At this point, the old tuple id (corresponding to the
> insert) is eligible for vacuum.
> Due to the above bug, vacuum can skip clean up of pending list if a user
> backend already has a lock on it before proceeding to vacuum the gin posting
> tree.
> At this point, it is possible that the user backend flushes the pending list
> containing the dead tuple to the main gin index structure after vacuum has
> already passed over. Vacuum has marked that tuple id slot as re-usable when
> there is actually an index tuple pointing to it.

It sounds like you're talking about a case where a TID value is
recycled by VACUUM, despite the fact that the GIN structure still
contains a TID intended to point to the original tuple -- an index
tuple that should have been killed before VACUUM recycled the
TID/killed the heap tuple. The TID points to a valid heap tuple, but
the wrong one. (This is a corruption scenario that we saw from time to
time in past cases where VACUUMing of indexes had bugs).

Is that general understanding of what you've said correct?

> Given the above, our question is to find out if the above is a valid case
> for having duplicate tids in the index tree structure without the fix for
> the above bug (in non-debug builds where "Assert" is disabled) with the
> following sequence of events:

I think that you'll be very lucky to get a useful answer to this
specific question. The recent bugfix was based on an observation that
Sawada-san made about the code not doing what it claimed to do, and
clearly needs to do. It's very hard to give you any assurance that
"duplicate TIDs" could be explained by the bug.

FWIW, I continue to have misgivings about pending list VACUUMing. I
plan to look at it again.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

R, Siva
In reply to this post by Masahiko Sawada
Hi Masahiko,
Thanks for the response!

On Tue, Feb 20, 2018 at 7:19PM, Masahiko Sawada <[hidden email]>

> On Wed, Feb 21, 2018 at 8:30 AM, R, Siva <[hidden email]> wrote:
>> User backend (let us call this backend 1) has released the lock on the
>> metapage and is processing the pending list pages, adding the items to the
>> build accumulator. Let us assume that there are currently 2 pages in the
>> pending list and metadata->head is currently being processed (metadata->tail
>> is not locked yet) and the old tuple id that was deleted is added to the
>> accumulator.
>> Another user backend (let us call this backend 2) uses the tid that was
>> marked re-usable for a row that contains the same key datum as before, takes
>> metapage lock and inserts into the pending list (metadata->tail) and returns
>> (assume it does not do cleanup).

> IIUC, ginInsertCleanup() holds ExclusiveLock on metapage during adding
> tuples in the pending list to the accumulator. And inserting entries
> to the pending list also requires ExclusiveLock on metapage. This
> behavior is not relevant with that bug fix. So I don't think that
>backend2 can inserts a tuple while backend1 is processing the pending
 >list.

Did you mean pin on the metapage buffer during ginInsertCleanup and not lock
during addition of tuples to the accumulator? The exclusive lock on metapage
buffer is released after reading/locking head of pending list and before we
process pages/add tuples to the accumulator in ginInsertCleanup [1].

The metapage (not metapage buffer) is locked using LockPage upon entry
in ginInsertCleanup and unlocked after processing of pending list is complete.
Does that prevent concurrent insertions into the pending list?

Insertions take exclusive lock on the metabuffer but
does not do a LockPage on the metapage. [2] There is a comment in
ginInsertCleanup that mentions concurrent insertions into pending list
is possible while cleanup is in progress. [3]


[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/ginfast.c;h=56a3ff590a45f00088ff9f0fd9fece0809bbe2fe;hb=refs/heads/REL9_6_STABLE#l808
[2] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/ginfast.c;h=56a3ff590a45f00088ff9f0fd9fece0809bbe2fe;hb=refs/heads/REL9_6_STABLE#l255
[3] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/ginfast.c;h=56a3ff590a45f00088ff9f0fd9fece0809bbe2fe;hb=refs/heads/REL9_6_STABLE#l754

Best
Siva

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Peter Geoghegan-4
On Wed, Feb 21, 2018 at 3:02 PM, R, Siva <[hidden email]> wrote:
> Did you mean pin on the metapage buffer during ginInsertCleanup and not lock
> during addition of tuples to the accumulator? The exclusive lock on metapage
> buffer is released after reading/locking head of pending list and before we
> process pages/add tuples to the accumulator in ginInsertCleanup [1].

AFAICT, nobody ever holds just a pin on the metapage as some kind of
interlock (since nobody else ever acquires a "super exclusive lock" on
the metapage -- if anybody else ever did that, then simply holding a
pin might make sense as a way of blocking the "super exclusive" lock
acquisition). Maybe you're thinking of the root page of posting trees?

I think that Sawada-san simply means that holding an ExclusiveLock on
the metapage makes writers block each other, and concurrent VACUUMs.
At least, for as long as they're in ginInsertCleanup().

BTW, are you aware of this basic fact about GIN?:

/*
 * First, scan the pending list and collect any matching entries into the
 * bitmap.  After we scan a pending item, some other backend could post it
 * into the main index, and so we might visit it a second time during the
 * main scan.  This is okay because we'll just re-set the same bit in the
 * bitmap.  (The possibility of duplicate visits is a major reason why GIN
 * can't support the amgettuple API, however.) Note that it would not do
 * to scan the main index before the pending list, since concurrent
 * cleanup could then make us miss entries entirely.
 */

(This comment is from gingetbitmap().)

> The metapage (not metapage buffer) is locked using LockPage upon entry
> in ginInsertCleanup and unlocked after processing of pending list is complete.
> Does that prevent concurrent insertions into the pending list?

It's supposed to be. I suggested that there might be a similar though
distinct problem there, right before discussion trailed off [1] --
"The metapage HW lock is sufficient for writers to block writers, but
AFAICT it is not sufficient for writers to block readers."

Is that related to what you're asking about?

[1] https://www.postgresql.org/message-id/CAH2-Wz%3DGTnAPzEEZqYELOv3h1Fxpo5xhMrBP6aMGEKLKv95csQ%40mail.gmail.com
--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Masahiko Sawada
On Thu, Feb 22, 2018 at 8:28 AM, Peter Geoghegan <[hidden email]> wrote:

> On Wed, Feb 21, 2018 at 3:02 PM, R, Siva <[hidden email]> wrote:
>> Did you mean pin on the metapage buffer during ginInsertCleanup and not lock
>> during addition of tuples to the accumulator? The exclusive lock on metapage
>> buffer is released after reading/locking head of pending list and before we
>> process pages/add tuples to the accumulator in ginInsertCleanup [1].
>
> AFAICT, nobody ever holds just a pin on the metapage as some kind of
> interlock (since nobody else ever acquires a "super exclusive lock" on
> the metapage -- if anybody else ever did that, then simply holding a
> pin might make sense as a way of blocking the "super exclusive" lock
> acquisition). Maybe you're thinking of the root page of posting trees?
>
> I think that Sawada-san simply means that holding an ExclusiveLock on
> the metapage makes writers block each other, and concurrent VACUUMs.
> At least, for as long as they're in ginInsertCleanup().

Yes, but I realized my previous mail was wrong, sorry. Insertion to
pending list doesn't acquire ExclusiveLock on metapage. So we can
insert tuples to pending list while cleaning up.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Masahiko Sawada
On Thu, Feb 22, 2018 at 10:26 AM, Masahiko Sawada <[hidden email]> wrote:

> On Thu, Feb 22, 2018 at 8:28 AM, Peter Geoghegan <[hidden email]> wrote:
>> On Wed, Feb 21, 2018 at 3:02 PM, R, Siva <[hidden email]> wrote:
>>> Did you mean pin on the metapage buffer during ginInsertCleanup and not lock
>>> during addition of tuples to the accumulator? The exclusive lock on metapage
>>> buffer is released after reading/locking head of pending list and before we
>>> process pages/add tuples to the accumulator in ginInsertCleanup [1].
>>
>> AFAICT, nobody ever holds just a pin on the metapage as some kind of
>> interlock (since nobody else ever acquires a "super exclusive lock" on
>> the metapage -- if anybody else ever did that, then simply holding a
>> pin might make sense as a way of blocking the "super exclusive" lock
>> acquisition). Maybe you're thinking of the root page of posting trees?
>>
>> I think that Sawada-san simply means that holding an ExclusiveLock on
>> the metapage makes writers block each other, and concurrent VACUUMs.
>> At least, for as long as they're in ginInsertCleanup().
>
> Yes, but I realized my previous mail was wrong, sorry. Insertion to
> pending list doesn't acquire ExclusiveLock on metapage. So we can
> insert tuples to pending list while cleaning up.
>

Sorry for the very late response.

FWIW, I've looked at this again. I think that the situation Siva
reported in the first mail can happen before we get commit 3b2787e.
That is, gin indexes had had a data corruption bug. I've reproduced
the situation with PostgreSQL 10.1 and observed that a gin index can
corrupt. However, gingetbitmap (fortunately?) returned a correct
result even when the gin index is corrupted.

The minimum situation I reproduced is that each gin entry has two
pointers to the same TID as follows.

gin-entry 1     gin-entry2
(1, 147)         (1, 147)
(1, 147)         (1, 147)

The above situation is surely corrupted where I executed the all steps
Siva described in the first mail. The first TID of both entries points
to an already-vacuumed itempointer (the tuple is inserted, deleted and
vacuumed), whereas the second entries points to a live itempointer on
heap. In entryGetItem, since we check advancePast it doesn't return
the second TIDs in both posting list case and posting tree case. Also
even in partial match case, since TIDbitmap eliminates the duplication
entryGetItem can return a correct result. The corrupted gin index
returned a correct result actually but no assertion failure happened.

I'm not sure how you figured this duplicated item pointers issue out
but what I got through investigating this issue is that gin indexes
could return a correct result without no assertion failure even if it
somewhat corrupted. So maybe having amcheck for gin indexes would
resolve part of problems.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Peter Geoghegan-4
On Tue, Jun 12, 2018 at 11:01 PM, Masahiko Sawada <[hidden email]> wrote:
> FWIW, I've looked at this again. I think that the situation Siva
> reported in the first mail can happen before we get commit 3b2787e.
> That is, gin indexes had had a data corruption bug. I've reproduced
> the situation with PostgreSQL 10.1 and observed that a gin index can
> corrupt.

So, you've recreated the problem with Postgres from before 3b2787e,
but not after 3b2787e? Are you suggesting that 3b2787e might have
fixed it, or that it only hid the problem, or something else?

How did you recreate the problem? Do you have a test case you can share?

> However, gingetbitmap (fortunately?) returned a correct
> result even when the gin index is corrupted.

That's not really surprising.

> I'm not sure how you figured this duplicated item pointers issue out
> but what I got through investigating this issue is that gin indexes
> could return a correct result without no assertion failure even if it
> somewhat corrupted. So maybe having amcheck for gin indexes would
> resolve part of problems.

That seems like a really good idea. As you know, I have misgivings
about this area.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Masahiko Sawada
On Wed, Jun 13, 2018 at 3:32 PM, Peter Geoghegan <[hidden email]> wrote:

> On Tue, Jun 12, 2018 at 11:01 PM, Masahiko Sawada <[hidden email]> wrote:
>> FWIW, I've looked at this again. I think that the situation Siva
>> reported in the first mail can happen before we get commit 3b2787e.
>> That is, gin indexes had had a data corruption bug. I've reproduced
>> the situation with PostgreSQL 10.1 and observed that a gin index can
>> corrupt.
>
> So, you've recreated the problem with Postgres from before 3b2787e,
> but not after 3b2787e? Are you suggesting that 3b2787e might have
> fixed it, or that it only hid the problem, or something else?

I meant 3b2787e fixed it. I checked that at least the situation
doesn't happen after 3b2787e.

> How did you recreate the problem? Do you have a test case you can share?

I recreated it by executing each steps step by step using gdb. So I
can share the test case but it might not help.

create extension pageinspect;
create table g (c int[]);
insert into g select ARRAY[1] from generate_series(1,1000);
create index g_idx on g using gin (c);
alter table g set (autovacuum_enabled = off);
insert into g select ARRAY[1] from generate_series(1, 408); -- 408
items fit in exactly one page of pending list
insert into g select ARRAY[1] from generate_series(1, 100); -- insert
into 2nd page of pending list
select n_pending_pages, n_pending_tuples from
gin_metapage_info(get_raw_page('g_idx', 0));
insert into g select ARRAY[999]; -- insert into 2nd pending list page
delete from g where c = ARRAY[999];
-- At this point, gin entry of 'ARRAY[999]' exists on 2nd page of
pending list and deleted.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Alexander Korotkov
On Wed, Jun 13, 2018 at 11:40 AM Masahiko Sawada <[hidden email]> wrote:

>
> On Wed, Jun 13, 2018 at 3:32 PM, Peter Geoghegan <[hidden email]> wrote:
> > On Tue, Jun 12, 2018 at 11:01 PM, Masahiko Sawada <[hidden email]> wrote:
> >> FWIW, I've looked at this again. I think that the situation Siva
> >> reported in the first mail can happen before we get commit 3b2787e.
> >> That is, gin indexes had had a data corruption bug. I've reproduced
> >> the situation with PostgreSQL 10.1 and observed that a gin index can
> >> corrupt.
> >
> > So, you've recreated the problem with Postgres from before 3b2787e,
> > but not after 3b2787e? Are you suggesting that 3b2787e might have
> > fixed it, or that it only hid the problem, or something else?
>
> I meant 3b2787e fixed it. I checked that at least the situation
> doesn't happen after 3b2787e.

I also think that 3b2787e should fix such problems.  After 3b2787e,
vacuum is forced to cleanup all pending list entries, which were
inserted before vacuum start.  So, vacuum should have everything to be
vaccumed merged into posting lists/trees.

> > How did you recreate the problem? Do you have a test case you can share?
>
> I recreated it by executing each steps step by step using gdb. So I
> can share the test case but it might not help.
>
> create extension pageinspect;
> create table g (c int[]);
> insert into g select ARRAY[1] from generate_series(1,1000);
> create index g_idx on g using gin (c);
> alter table g set (autovacuum_enabled = off);
> insert into g select ARRAY[1] from generate_series(1, 408); -- 408
> items fit in exactly one page of pending list
> insert into g select ARRAY[1] from generate_series(1, 100); -- insert
> into 2nd page of pending list
> select n_pending_pages, n_pending_tuples from
> gin_metapage_info(get_raw_page('g_idx', 0));
> insert into g select ARRAY[999]; -- insert into 2nd pending list page
> delete from g where c = ARRAY[999];
> -- At this point, gin entry of 'ARRAY[999]' exists on 2nd page of
> pending list and deleted.

Is this test case completed?  It looks like there should be a
continuation with concurrent vacuum and insertions managed by gdb...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

Masahiko Sawada
On Wed, Jun 13, 2018 at 10:22 PM, Alexander Korotkov
<[hidden email]> wrote:

> On Wed, Jun 13, 2018 at 11:40 AM Masahiko Sawada <[hidden email]> wrote:
>>
>> On Wed, Jun 13, 2018 at 3:32 PM, Peter Geoghegan <[hidden email]> wrote:
>> > On Tue, Jun 12, 2018 at 11:01 PM, Masahiko Sawada <[hidden email]> wrote:
>> >> FWIW, I've looked at this again. I think that the situation Siva
>> >> reported in the first mail can happen before we get commit 3b2787e.
>> >> That is, gin indexes had had a data corruption bug. I've reproduced
>> >> the situation with PostgreSQL 10.1 and observed that a gin index can
>> >> corrupt.
>> >
>> > So, you've recreated the problem with Postgres from before 3b2787e,
>> > but not after 3b2787e? Are you suggesting that 3b2787e might have
>> > fixed it, or that it only hid the problem, or something else?
>>
>> I meant 3b2787e fixed it. I checked that at least the situation
>> doesn't happen after 3b2787e.
>
> I also think that 3b2787e should fix such problems.  After 3b2787e,
> vacuum is forced to cleanup all pending list entries, which were
> inserted before vacuum start.  So, vacuum should have everything to be
> vaccumed merged into posting lists/trees.
>
>> > How did you recreate the problem? Do you have a test case you can share?
>>
>> I recreated it by executing each steps step by step using gdb. So I
>> can share the test case but it might not help.
>>
>> create extension pageinspect;
>> create table g (c int[]);
>> insert into g select ARRAY[1] from generate_series(1,1000);
>> create index g_idx on g using gin (c);
>> alter table g set (autovacuum_enabled = off);
>> insert into g select ARRAY[1] from generate_series(1, 408); -- 408
>> items fit in exactly one page of pending list
>> insert into g select ARRAY[1] from generate_series(1, 100); -- insert
>> into 2nd page of pending list
>> select n_pending_pages, n_pending_tuples from
>> gin_metapage_info(get_raw_page('g_idx', 0));
>> insert into g select ARRAY[999]; -- insert into 2nd pending list page
>> delete from g where c = ARRAY[999];
>> -- At this point, gin entry of 'ARRAY[999]' exists on 2nd page of
>> pending list and deleted.
>
> Is this test case completed?  It looks like there should be a
> continuation with concurrent vacuum and insertions managed by gdb...
>

This is not completed test case. This is only for step 1 and we need
concurrent vacuum and insertions as you said.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Item Pointers in Gin index

R, Siva
On Tue, Jun 12, 2018 at 11:01 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> FWIW, I've looked at this again. I think that the situation Siva
> reported in the first mail can happen before we get commit 3b2787e.
> That is, gin indexes had had a data corruption bug. I've reproduced
> the situation with PostgreSQL 10.1 and observed that a gin index can
> corrupt.

Thank you so much for trying the steps out and reproducing the issue!
It is good that the correctness of query result is not affected when reading
from a gin index that has duplicates.

One place where an assertion will fail is when we compress the posting list [1]
but since it is a debug assert, it will be suppressed in release builds.
Here, the correctness of varbyte encoding is not affected,
but the encoded page will have redundant zero deltas.

When we repack leaf items [2] this may cause creation of a new segment
if all the items (including the duplicates) do not fit in a single page.

[1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/ginpostinglist.c;h=8d2d31ac7236de5e2c62c6fa745af45a2b895b2c;hb=refs/heads/REL_10_STABLE#l209
[2] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/gindatapage.c;h=2e5ea479763f32d6dc637e1c27d5975d124f293f;hb=refs/heads/REL_10_STABLE#l1601

Best
Siva