logical decoding : exceeded maxAllocatedDescs for .spill files

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
Hi,

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

I could reproduce it using a transaction containing a long series of
sub-transactions (possibly could be reproduced without
sub-transactions, but looking at the code I could come up with this
script using sub-transactions easily) :

create table tab(id int);

-- Function that does huge changes in a single transaction
create or replace function f(id int) returns int as
$$
begin
    -- Iterate this more than 492 times (max transient file
descriptors PG would typically allow)
    -- This will create that many sub-transactions due to presence of
exception block.
    FOR i IN 1..600 LOOP

        BEGIN
            -- Iterate more than 4096 times (so that changes spill to
disk: max_changes_in_memory)
            FOR j IN 1..5000 LOOP
               insert into tab values (1);
            END LOOP;
        EXCEPTION
            when division_by_zero then perform 'dummy';
        END;

    END LOOP;

    return id;
end $$ language plpgsql;

SELECT * FROM pg_create_logical_replication_slot('logical', 'test_decoding');

begin;
select f(1); -- Do huge changes in a single transaction
commit;

\! pg_recvlogical -d postgres  --slot=logical --verbose --start -f -

pg_recvlogical: starting log streaming at 0/0 (slot logical)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot logical)
BEGIN 1869
pg_recvlogical: confirming write up to 0/1B6D6E38, flush to 0/1B6D6E38
(slot logical)
pg_recvlogical: error: unexpected termination of replication stream:
ERROR:  exceeded maxAllocatedDescs (492) while trying to open file
"pg_replslot/logical/xid-2362-lsn-0-24000000.spill"
pg_recvlogical: disconnected; waiting 5 seconds to try again

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if  reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

Comments ?

[1] https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Álvaro Herrera
On 2019-Sep-11, Amit Khandekar wrote:

> I reproduced the error "exceeded maxAllocatedDescs (492) while trying
> to open file ...", which was also discussed about in the thread [1].
> This issue is similar but not exactly the same as [1]. In [1], the
> file for which this error used to show up was
> "pg_logical/mappings/map...." , while here it's the .spill file. And
> here the issue , in short, seems to be : The .spill file does not get
> closed there and then, unlike in [1] where there was a file descriptor
> leak.

Uh-oh :-(  Thanks for the reproducer -- I confirm it breaks things.

> Looking at the code, what might be happening is,
> ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
> files, but leaves them open if end of file is not reached. Eventually
> if end of file is reached, it gets closed. The function returns back
> without closing the file descriptor if  reorder buffer changes being
> restored are more than max_changes_in_memory. Probably later on, the
> rest of the changes get restored in another
> ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
> of such files opened, we can run out of the max files that PG decides
> to keep open (it has some logic that takes into account system files
> allowed to be open, and already opened).

Makes sense.

> Offhand, what I am thinking is, we need to close the file descriptor
> before returning from ReorderBufferRestoreChanges(), and keep track of
> the file offset and file path, so that next time we can resume reading
> from there.

I think doing this all the time would make restore very slow -- there's
a reason we keep the files open, after all.  It would be better if we
can keep the descriptors open as much as possible, and only close them
if there's trouble.  I was under the impression that using
OpenTransientFile was already taking care of that, but that's evidently
not the case.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tomas Vondra-4
On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

>On 2019-Sep-11, Amit Khandekar wrote:
>
>> I reproduced the error "exceeded maxAllocatedDescs (492) while trying
>> to open file ...", which was also discussed about in the thread [1].
>> This issue is similar but not exactly the same as [1]. In [1], the
>> file for which this error used to show up was
>> "pg_logical/mappings/map...." , while here it's the .spill file. And
>> here the issue , in short, seems to be : The .spill file does not get
>> closed there and then, unlike in [1] where there was a file descriptor
>> leak.
>
>Uh-oh :-(  Thanks for the reproducer -- I confirm it breaks things.
>
>> Looking at the code, what might be happening is,
>> ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
>> files, but leaves them open if end of file is not reached. Eventually
>> if end of file is reached, it gets closed. The function returns back
>> without closing the file descriptor if  reorder buffer changes being
>> restored are more than max_changes_in_memory. Probably later on, the
>> rest of the changes get restored in another
>> ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
>> of such files opened, we can run out of the max files that PG decides
>> to keep open (it has some logic that takes into account system files
>> allowed to be open, and already opened).
>
>Makes sense.
>
>> Offhand, what I am thinking is, we need to close the file descriptor
>> before returning from ReorderBufferRestoreChanges(), and keep track of
>> the file offset and file path, so that next time we can resume reading
>> from there.
>
>I think doing this all the time would make restore very slow -- there's a
>reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

>It would be better if we can keep the descriptors open as much as
>possible, and only close them if there's trouble.  I was under the
>impression that using OpenTransientFile was already taking care of that,
>but that's evidently not the case.
>

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Alvaro Herrera-9
On 2019-Sep-12, Tomas Vondra wrote:

> On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
> > On 2019-Sep-11, Amit Khandekar wrote:

> > I think doing this all the time would make restore very slow -- there's a
> > reason we keep the files open, after all.
>
> How much slower? It certainly will have a hit, but maybe it's negligible
> compared to all the other stuff happening in this code?

I dunno -- that's a half-assed guess based on there being many more
syscalls, and on the fact that the API is how it is in the first place.
(Andres did a lot of perf benchmarking and tweaking back when he was
writing this ... I just point out that I have a colleague that had to
invent *a lot* of new MemCxt infrastructure in order to make some of
Andres' perf optimizations cleaner, just as a semi-related data point.
Anyway, I digress.)

Anyway, such a fix would pessimize all cases, including every single
case that works today (which evidently is almost every single usage of
this feature, since we hadn't heard of this problem until yesterday), in
order to solve a problem that you can only find in very rare ones.

Another point of view is that we should make it work first, then make it
fast.  But the point remains that it works fine and fast for 99.99% of
cases.

> > It would be better if we can keep the descriptors open as much as
> > possible, and only close them if there's trouble.  I was under the
> > impression that using OpenTransientFile was already taking care of that,
> > but that's evidently not the case.
>
> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?

Yeah, I don't know what was in Amit's mind, but it seemed obvious to me
that such a fix required changing that API so that a seekpos is kept
together with the fd.  ReorderBufferRestoreChange is static in
reorderbuffer.c so it's not a problem to change its ABI.

I agree with trying to get a reorderbuffer-localized back-patchable fix
first, then we how to improve from that.

> As a sidenote - in the other thread about streaming, one of the patches
> does change how we log subxact assignments. In the end, this allows using
> just a single file for the top-level transaction, instead of having one
> file per subxact. That would also solve this.

:-(  While I would love to get that patch done and get rid of the issue,
it's not a backpatchable fix either.  ... however, it does mean we maybe
can get away with the reorderbuffer.c-local fix, and then just use your
streaming stuff to get rid of the perf problem forever?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Robert Haas
In reply to this post by Tomas Vondra-4
On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<[hidden email]> wrote:

> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?
>
> Anyway, I agree we need to do something, to fix this corner case (many
> serialized in-progress transactions). ISTM we have two options - either do
> something in the context of reorderbuffer.c, or extend the transient file
> API somehow. I'd say the second option is the right thing going forward,
> because it does allow doing it transparently and without leaking details
> about maxAllocatedDescs etc. There are two issues, though - it does
> require changes / extensions to the API, and it's not backpatchable.
>
> So maybe we should start with the localized fix in reorderbuffer, and I
> agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.).  It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Andres Freund
Hi,

On 2019-09-12 09:41:02 -0400, Robert Haas wrote:

> On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
> <[hidden email]> wrote:
> > I don't see how the current API could do that transparently - it does
> > track the files, but the user only gets a file descriptor. With just a
> > file descriptor, how could the code know to do reopen/seek when it's going
> > just through the regular fopen/fclose?
> >
> > Anyway, I agree we need to do something, to fix this corner case (many
> > serialized in-progress transactions). ISTM we have two options - either do
> > something in the context of reorderbuffer.c, or extend the transient file
> > API somehow. I'd say the second option is the right thing going forward,
> > because it does allow doing it transparently and without leaking details
> > about maxAllocatedDescs etc. There are two issues, though - it does
> > require changes / extensions to the API, and it's not backpatchable.
> >
> > So maybe we should start with the localized fix in reorderbuffer, and I
> > agree tracking offset seems reasonable.
>
> We've already got code that knows how to track this sort of thing.
> You just need to go through the File abstraction (PathNameOpenFile or
> PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
> functions that deal directly with fds (OpenTransientFile,
> BasicOpenFile, etc.).  It seems like it would be better to reuse the
> existing VFD layer than to invent a whole new one specific to logical
> replication.

Yea, I agree that that is the right fix.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Andres Freund
In reply to this post by Alvaro Herrera-9
Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:

> On 2019-Sep-12, Tomas Vondra wrote:
>
> > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
> > > On 2019-Sep-11, Amit Khandekar wrote:
>
> > > I think doing this all the time would make restore very slow -- there's a
> > > reason we keep the files open, after all.
> >
> > How much slower? It certainly will have a hit, but maybe it's negligible
> > compared to all the other stuff happening in this code?

I'd expect it to be significant.


> > As a sidenote - in the other thread about streaming, one of the patches
> > does change how we log subxact assignments. In the end, this allows using
> > just a single file for the top-level transaction, instead of having one
> > file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tomas Vondra-4
On Thu, Sep 12, 2019 at 11:34:01AM -0700, Andres Freund wrote:

>Hi,
>
>On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:
>> On 2019-Sep-12, Tomas Vondra wrote:
>>
>> > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
>> > > On 2019-Sep-11, Amit Khandekar wrote:
>>
>> > > I think doing this all the time would make restore very slow -- there's a
>> > > reason we keep the files open, after all.
>> >
>> > How much slower? It certainly will have a hit, but maybe it's negligible
>> > compared to all the other stuff happening in this code?
>
>I'd expect it to be significant.
>
>
>> > As a sidenote - in the other thread about streaming, one of the patches
>> > does change how we log subxact assignments. In the end, this allows using
>> > just a single file for the top-level transaction, instead of having one
>> > file per subxact. That would also solve this.
>
>Uhm, how is rollback to savepoint going to be handled in that case? I
>don't think it's great to just retain space for all rolled back
>subtransactions.
>

The code can just do ftruncate() to the position of the subtransaction.
That's what the patch [1] does.

[1] https://commitfest.postgresql.org/24/1927/


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
In reply to this post by Robert Haas
On Thu, 12 Sep 2019 at 19:11, Robert Haas <[hidden email]> wrote:

>
> On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
> <[hidden email]> wrote:
> > I don't see how the current API could do that transparently - it does
> > track the files, but the user only gets a file descriptor. With just a
> > file descriptor, how could the code know to do reopen/seek when it's going
> > just through the regular fopen/fclose?
> >
> > Anyway, I agree we need to do something, to fix this corner case (many
> > serialized in-progress transactions). ISTM we have two options - either do
> > something in the context of reorderbuffer.c, or extend the transient file
> > API somehow. I'd say the second option is the right thing going forward,
> > because it does allow doing it transparently and without leaking details
> > about maxAllocatedDescs etc. There are two issues, though - it does
> > require changes / extensions to the API, and it's not backpatchable.
> >
> > So maybe we should start with the localized fix in reorderbuffer, and I
> > agree tracking offset seems reasonable.
>

> We've already got code that knows how to track this sort of thing.

You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in context of tracking of file seek position. VFD  does not
have a mechanism to track file offsets if one of the vfd cached file
is closed and reopened. Robert, are you suggesting to add this
capability to VFD ? I agree that we could do it, but for
back-patching, offhand I couldn't think of a simpler way.



--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tom Lane-2
Amit Khandekar <[hidden email]> writes:
> You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
> that excess fds are automatically closed. But Alvaro seems to be
> talking in context of tracking of file seek position. VFD  does not
> have a mechanism to track file offsets if one of the vfd cached file
> is closed and reopened.

Hm.  It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead.  I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API.  I concur with that.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Robert Haas
On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:

> Amit Khandekar <[hidden email]> writes:
> > You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
> > that excess fds are automatically closed. But Alvaro seems to be
> > talking in context of tracking of file seek position. VFD  does not
> > have a mechanism to track file offsets if one of the vfd cached file
> > is closed and reopened.
>
> Hm.  It used to, but somebody got rid of that on the theory that
> we could use pread/pwrite instead.  I'm inclined to think that that
> was the right tradeoff, but it'd mean that getting logical decoding
> to adhere to the VFD API requires extra work to track file position
> on the caller side.

Oops.  I forgot that we'd removed that.

> Again, though, the advice that's been given here is that we should
> fix logical decoding to use the VFD API as it stands, not change
> that API.  I concur with that.

A reasonable position.  So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
On Fri, 13 Sep 2019 at 22:01, Robert Haas <[hidden email]> wrote:
> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:
> > Again, though, the advice that's been given here is that we should
> > fix logical decoding to use the VFD API as it stands, not change
> > that API.  I concur with that.
>
> A reasonable position.  So I guess logical decoding has to track the
> file position itself, but perhaps use the VFD layer for managing FD
> pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we add in-built support in VFD. But
yeah, for bank branches at least, we need to handle it outside of VFD.
Or may be we would add it if we find one more use-case.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

use_vfd_for_logrep.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tom Lane-2
Amit Khandekar <[hidden email]> writes:
> On Fri, 13 Sep 2019 at 22:01, Robert Haas <[hidden email]> wrote:
>> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:
>>> Again, though, the advice that's been given here is that we should
>>> fix logical decoding to use the VFD API as it stands, not change
>>> that API.  I concur with that.

>> A reasonable position.  So I guess logical decoding has to track the
>> file position itself, but perhaps use the VFD layer for managing FD
>> pooling.

> Yeah, something like the attached patch. I think this tracking of
> offsets would have been cleaner if we add in-built support in VFD. But
> yeah, for bank branches at least, we need to handle it outside of VFD.
> Or may be we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons.  It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction).  I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

                        regards, tom lane