logical decoding : exceeded maxAllocatedDescs for .spill files


logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
Hi,

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed in thread [1]. This issue
is similar to, but not exactly the same as, [1]. In [1], the file for
which the error showed up was "pg_logical/mappings/map...." , while
here it's the .spill file. And here the issue, in short, seems to be
that the .spill file does not get closed promptly, unlike in [1] where
there was a file descriptor leak.

I could reproduce it using a transaction containing a long series of
sub-transactions (it could possibly be reproduced without
sub-transactions, but from the code, a script using sub-transactions
was the easiest to come up with):

create table tab(id int);

-- Function that does huge changes in a single transaction
create or replace function f(id int) returns int as
$$
begin
    -- Iterate this more than 492 times (the max transient file
    -- descriptors PG would typically allow).
    -- This will create that many sub-transactions, due to the
    -- presence of the exception block.
    FOR i IN 1..600 LOOP

        BEGIN
            -- Iterate more than 4096 times, so that the changes
            -- spill to disk (max_changes_in_memory)
            FOR j IN 1..5000 LOOP
               insert into tab values (1);
            END LOOP;
        EXCEPTION
            when division_by_zero then perform 'dummy';
        END;

    END LOOP;

    return id;
end $$ language plpgsql;

SELECT * FROM pg_create_logical_replication_slot('logical', 'test_decoding');

begin;
select f(1); -- Do huge changes in a single transaction
commit;

\! pg_recvlogical -d postgres  --slot=logical --verbose --start -f -

pg_recvlogical: starting log streaming at 0/0 (slot logical)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot logical)
BEGIN 1869
pg_recvlogical: confirming write up to 0/1B6D6E38, flush to 0/1B6D6E38
(slot logical)
pg_recvlogical: error: unexpected termination of replication stream:
ERROR:  exceeded maxAllocatedDescs (492) while trying to open file
"pg_replslot/logical/xid-2362-lsn-0-24000000.spill"
pg_recvlogical: disconnected; waiting 5 seconds to try again

Looking at the code, what might be happening is this:
ReorderBufferIterTXNInit() => ReorderBufferRestoreChanges() opens the
files, and leaves them open until end of file is reached; only then
does a file get closed. The function returns without closing the file
descriptor if the reorder buffer changes being restored exceed
max_changes_in_memory; the rest of the changes get restored later, in
another ReorderBufferRestoreChanges() call. But meanwhile, if many
such files are left open, we can run out of the maximum number of
files PG decides to keep open (it has some logic that takes into
account the system-wide limit on open files, and how many are already
open).

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

Comments ?

[1] https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Álvaro Herrera
On 2019-Sep-11, Amit Khandekar wrote:

> I reproduced the error "exceeded maxAllocatedDescs (492) while trying
> to open file ...", which was also discussed about in the thread [1].
> This issue is similar but not exactly the same as [1]. In [1], the
> file for which this error used to show up was
> "pg_logical/mappings/map...." , while here it's the .spill file. And
> here the issue , in short, seems to be : The .spill file does not get
> closed there and then, unlike in [1] where there was a file descriptor
> leak.

Uh-oh :-(  Thanks for the reproducer -- I confirm it breaks things.

> Looking at the code, what might be happening is,
> ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
> files, but leaves them open if end of file is not reached. Eventually
> if end of file is reached, it gets closed. The function returns back
> without closing the file descriptor if  reorder buffer changes being
> restored are more than max_changes_in_memory. Probably later on, the
> rest of the changes get restored in another
> ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
> of such files opened, we can run out of the max files that PG decides
> to keep open (it has some logic that takes into account system files
> allowed to be open, and already opened).

Makes sense.

> Offhand, what I am thinking is, we need to close the file descriptor
> before returning from ReorderBufferRestoreChanges(), and keep track of
> the file offset and file path, so that next time we can resume reading
> from there.

I think doing this all the time would make restore very slow -- there's
a reason we keep the files open, after all.  It would be better if we
can keep the descriptors open as much as possible, and only close them
if there's trouble.  I was under the impression that using
OpenTransientFile was already taking care of that, but that's evidently
not the case.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tomas Vondra-4
On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:

>On 2019-Sep-11, Amit Khandekar wrote:
>
>> I reproduced the error "exceeded maxAllocatedDescs (492) while trying
>> to open file ...", which was also discussed about in the thread [1].
>> This issue is similar but not exactly the same as [1]. In [1], the
>> file for which this error used to show up was
>> "pg_logical/mappings/map...." , while here it's the .spill file. And
>> here the issue , in short, seems to be : The .spill file does not get
>> closed there and then, unlike in [1] where there was a file descriptor
>> leak.
>
>Uh-oh :-(  Thanks for the reproducer -- I confirm it breaks things.
>
>> Looking at the code, what might be happening is,
>> ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
>> files, but leaves them open if end of file is not reached. Eventually
>> if end of file is reached, it gets closed. The function returns back
>> without closing the file descriptor if  reorder buffer changes being
>> restored are more than max_changes_in_memory. Probably later on, the
>> rest of the changes get restored in another
>> ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
>> of such files opened, we can run out of the max files that PG decides
>> to keep open (it has some logic that takes into account system files
>> allowed to be open, and already opened).
>
>Makes sense.
>
>> Offhand, what I am thinking is, we need to close the file descriptor
>> before returning from ReorderBufferRestoreChanges(), and keep track of
>> the file offset and file path, so that next time we can resume reading
>> from there.
>
>I think doing this all the time would make restore very slow -- there's a
>reason we keep the files open, after all.

How much slower? It certainly will have a hit, but maybe it's negligible
compared to all the other stuff happening in this code?

>It would be better if we can keep the descriptors open as much as
>possible, and only close them if there's trouble.  I was under the
>impression that using OpenTransientFile was already taking care of that,
>but that's evidently not the case.
>

I don't see how the current API could do that transparently - it does
track the files, but the user only gets a file descriptor. With just a
file descriptor, how could the code know to do reopen/seek when it's going
just through the regular fopen/fclose?

Anyway, I agree we need to do something, to fix this corner case (many
serialized in-progress transactions). ISTM we have two options - either do
something in the context of reorderbuffer.c, or extend the transient file
API somehow. I'd say the second option is the right thing going forward,
because it does allow doing it transparently and without leaking details
about maxAllocatedDescs etc. There are two issues, though - it does
require changes / extensions to the API, and it's not backpatchable.

So maybe we should start with the localized fix in reorderbuffer, and I
agree tracking offset seems reasonable.

As a sidenote - in the other thread about streaming, one of the patches
does change how we log subxact assignments. In the end, this allows using
just a single file for the top-level transaction, instead of having one
file per subxact. That would also solve this.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Alvaro Herrera-9
On 2019-Sep-12, Tomas Vondra wrote:

> On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
> > On 2019-Sep-11, Amit Khandekar wrote:

> > I think doing this all the time would make restore very slow -- there's a
> > reason we keep the files open, after all.
>
> How much slower? It certainly will have a hit, but maybe it's negligible
> compared to all the other stuff happening in this code?

I dunno -- that's a half-assed guess based on there being many more
syscalls, and on the fact that the API is how it is in the first place.
(Andres did a lot of perf benchmarking and tweaking back when he was
writing this ... I just point out that I have a colleague that had to
invent *a lot* of new MemCxt infrastructure in order to make some of
Andres' perf optimizations cleaner, just as a semi-related data point.
Anyway, I digress.)

Anyway, such a fix would pessimize all cases, including every single
case that works today (which evidently is almost every single usage of
this feature, since we hadn't heard of this problem until yesterday), in
order to solve a problem that you can only find in very rare ones.

Another point of view is that we should make it work first, then make it
fast.  But the point remains that it works fine and fast for 99.99% of
cases.

> > It would be better if we can keep the descriptors open as much as
> > possible, and only close them if there's trouble.  I was under the
> > impression that using OpenTransientFile was already taking care of that,
> > but that's evidently not the case.
>
> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?

Yeah, I don't know what was in Amit's mind, but it seemed obvious to me
that such a fix required changing that API so that a seekpos is kept
together with the fd.  ReorderBufferRestoreChange is static in
reorderbuffer.c so it's not a problem to change its ABI.

I agree with trying to get a reorderbuffer-localized back-patchable fix
first, then see how we can improve from there.

> As a sidenote - in the other thread about streaming, one of the patches
> does change how we log subxact assignments. In the end, this allows using
> just a single file for the top-level transaction, instead of having one
> file per subxact. That would also solve this.

:-(  While I would love to get that patch done and get rid of the issue,
it's not a backpatchable fix either.  ... however, it does mean we maybe
can get away with the reorderbuffer.c-local fix, and then just use your
streaming stuff to get rid of the perf problem forever?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Robert Haas
In reply to this post by Tomas Vondra-4
On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
<[hidden email]> wrote:

> I don't see how the current API could do that transparently - it does
> track the files, but the user only gets a file descriptor. With just a
> file descriptor, how could the code know to do reopen/seek when it's going
> just through the regular fopen/fclose?
>
> Anyway, I agree we need to do something, to fix this corner case (many
> serialized in-progress transactions). ISTM we have two options - either do
> something in the context of reorderbuffer.c, or extend the transient file
> API somehow. I'd say the second option is the right thing going forward,
> because it does allow doing it transparently and without leaking details
> about maxAllocatedDescs etc. There are two issues, though - it does
> require changes / extensions to the API, and it's not backpatchable.
>
> So maybe we should start with the localized fix in reorderbuffer, and I
> agree tracking offset seems reasonable.

We've already got code that knows how to track this sort of thing.
You just need to go through the File abstraction (PathNameOpenFile or
PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
functions that deal directly with fds (OpenTransientFile,
BasicOpenFile, etc.).  It seems like it would be better to reuse the
existing VFD layer than to invent a whole new one specific to logical
replication.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Andres Freund
Hi,

On 2019-09-12 09:41:02 -0400, Robert Haas wrote:

> On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
> <[hidden email]> wrote:
> > I don't see how the current API could do that transparently - it does
> > track the files, but the user only gets a file descriptor. With just a
> > file descriptor, how could the code know to do reopen/seek when it's going
> > just through the regular fopen/fclose?
> >
> > Anyway, I agree we need to do something, to fix this corner case (many
> > serialized in-progress transactions). ISTM we have two options - either do
> > something in the context of reorderbuffer.c, or extend the transient file
> > API somehow. I'd say the second option is the right thing going forward,
> > because it does allow doing it transparently and without leaking details
> > about maxAllocatedDescs etc. There are two issues, though - it does
> > require changes / extensions to the API, and it's not backpatchable.
> >
> > So maybe we should start with the localized fix in reorderbuffer, and I
> > agree tracking offset seems reasonable.
>
> We've already got code that knows how to track this sort of thing.
> You just need to go through the File abstraction (PathNameOpenFile or
> PathNameOpenFilePerm or OpenTemporaryFile) rather than using the
> functions that deal directly with fds (OpenTransientFile,
> BasicOpenFile, etc.).  It seems like it would be better to reuse the
> existing VFD layer than to invent a whole new one specific to logical
> replication.

Yea, I agree that that is the right fix.

Greetings,

Andres Freund



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Andres Freund
In reply to this post by Alvaro Herrera-9
Hi,

On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:

> On 2019-Sep-12, Tomas Vondra wrote:
>
> > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
> > > On 2019-Sep-11, Amit Khandekar wrote:
>
> > > I think doing this all the time would make restore very slow -- there's a
> > > reason we keep the files open, after all.
> >
> > How much slower? It certainly will have a hit, but maybe it's negligible
> > compared to all the other stuff happening in this code?

I'd expect it to be significant.


> > As a sidenote - in the other thread about streaming, one of the patches
> > does change how we log subxact assignments. In the end, this allows using
> > just a single file for the top-level transaction, instead of having one
> > file per subxact. That would also solve this.

Uhm, how is rollback to savepoint going to be handled in that case? I
don't think it's great to just retain space for all rolled back
subtransactions.

Greetings,

Andres Freund



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tomas Vondra-4
On Thu, Sep 12, 2019 at 11:34:01AM -0700, Andres Freund wrote:

>Hi,
>
>On 2019-09-12 09:57:55 -0300, Alvaro Herrera wrote:
>> On 2019-Sep-12, Tomas Vondra wrote:
>>
>> > On Wed, Sep 11, 2019 at 09:51:40AM -0300, Alvaro Herrera from 2ndQuadrant wrote:
>> > > On 2019-Sep-11, Amit Khandekar wrote:
>>
>> > > I think doing this all the time would make restore very slow -- there's a
>> > > reason we keep the files open, after all.
>> >
>> > How much slower? It certainly will have a hit, but maybe it's negligible
>> > compared to all the other stuff happening in this code?
>
>I'd expect it to be significant.
>
>
>> > As a sidenote - in the other thread about streaming, one of the patches
>> > does change how we log subxact assignments. In the end, this allows using
>> > just a single file for the top-level transaction, instead of having one
>> > file per subxact. That would also solve this.
>
>Uhm, how is rollback to savepoint going to be handled in that case? I
>don't think it's great to just retain space for all rolled back
>subtransactions.
>

The code can just do ftruncate() to the position of the subtransaction.
That's what the patch [1] does.

[1] https://commitfest.postgresql.org/24/1927/


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
In reply to this post by Robert Haas
On Thu, 12 Sep 2019 at 19:11, Robert Haas <[hidden email]> wrote:

>
> On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
> <[hidden email]> wrote:
> > I don't see how the current API could do that transparently - it does
> > track the files, but the user only gets a file descriptor. With just a
> > file descriptor, how could the code know to do reopen/seek when it's going
> > just through the regular fopen/fclose?
> >
> > Anyway, I agree we need to do something, to fix this corner case (many
> > serialized in-progress transactions). ISTM we have two options - either do
> > something in the context of reorderbuffer.c, or extend the transient file
> > API somehow. I'd say the second option is the right thing going forward,
> > because it does allow doing it transparently and without leaking details
> > about maxAllocatedDescs etc. There are two issues, though - it does
> > require changes / extensions to the API, and it's not backpatchable.
> >
> > So maybe we should start with the localized fix in reorderbuffer, and I
> > agree tracking offset seems reasonable.
>

> We've already got code that knows how to track this sort of thing.

You mean tracking excess kernel fds, right? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in the context of tracking the file seek position. VFD does
not have a mechanism to track file offsets if one of the VFD-cached
files is closed and reopened. Robert, are you suggesting adding this
capability to VFD? I agree that we could do it, but for
back-patching, offhand I couldn't think of a simpler way.



--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tom Lane-2
Amit Khandekar <[hidden email]> writes:
> You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
> that excess fds are automatically closed. But Alvaro seems to be
> talking in context of tracking of file seek position. VFD  does not
> have a mechanism to track file offsets if one of the vfd cached file
> is closed and reopened.

Hm.  It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead.  I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API.  I concur with that.

                        regards, tom lane



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Robert Haas
On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:

> Amit Khandekar <[hidden email]> writes:
> > You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
> > that excess fds are automatically closed. But Alvaro seems to be
> > talking in context of tracking of file seek position. VFD  does not
> > have a mechanism to track file offsets if one of the vfd cached file
> > is closed and reopened.
>
> Hm.  It used to, but somebody got rid of that on the theory that
> we could use pread/pwrite instead.  I'm inclined to think that that
> was the right tradeoff, but it'd mean that getting logical decoding
> to adhere to the VFD API requires extra work to track file position
> on the caller side.

Oops.  I forgot that we'd removed that.

> Again, though, the advice that's been given here is that we should
> fix logical decoding to use the VFD API as it stands, not change
> that API.  I concur with that.

A reasonable position.  So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
On Fri, 13 Sep 2019 at 22:01, Robert Haas <[hidden email]> wrote:
> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:
> > Again, though, the advice that's been given here is that we should
> > fix logical decoding to use the VFD API as it stands, not change
> > that API.  I concur with that.
>
> A reasonable position.  So I guess logical decoding has to track the
> file position itself, but perhaps use the VFD layer for managing FD
> pooling.

Yeah, something like the attached patch. I think this tracking of
offsets would have been cleaner if we added in-built support in VFD. But
yeah, for back branches at least, we need to handle it outside of VFD.
Or maybe we would add it if we find one more use-case.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

[Attachment: use_vfd_for_logrep.patch (8K)]

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Tom Lane-2
Amit Khandekar <[hidden email]> writes:
> On Fri, 13 Sep 2019 at 22:01, Robert Haas <[hidden email]> wrote:
>> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:
>>> Again, though, the advice that's been given here is that we should
>>> fix logical decoding to use the VFD API as it stands, not change
>>> that API.  I concur with that.

>> A reasonable position.  So I guess logical decoding has to track the
>> file position itself, but perhaps use the VFD layer for managing FD
>> pooling.

> Yeah, something like the attached patch. I think this tracking of
> offsets would have been cleaner if we added in-built support in VFD. But
> yeah, for back branches at least, we need to handle it outside of VFD.
> Or maybe we would add it if we find one more use-case.

Again, we had that and removed it, for what seem to me to be solid
reasons.  It adds cycles when we're forced to close/reopen a file,
and it also adds failure modes that we could do without (ie, failure
of either the ftell or the lseek, which are particularly nasty because
they shouldn't happen according to the VFD abstraction).  I do not
think there is going to be any argument strong enough to make us
put it back, especially not for non-mainstream callers like logical
decoding.

                        regards, tom lane



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Andres Freund
Hi,

On 2019-09-14 14:34:21 -0400, Tom Lane wrote:

> Amit Khandekar <[hidden email]> writes:
> > On Fri, 13 Sep 2019 at 22:01, Robert Haas <[hidden email]> wrote:
> >> On Fri, Sep 13, 2019 at 12:14 PM Tom Lane <[hidden email]> wrote:
> >>> Again, though, the advice that's been given here is that we should
> >>> fix logical decoding to use the VFD API as it stands, not change
> >>> that API.  I concur with that.
>
> >> A reasonable position.  So I guess logical decoding has to track the
> >> file position itself, but perhaps use the VFD layer for managing FD
> >> pooling.
>
> > Yeah, something like the attached patch. I think this tracking of
> > offsets would have been cleaner if we added in-built support in VFD. But
> > yeah, for back branches at least, we need to handle it outside of VFD.
> > Or maybe we would add it if we find one more use-case.
>
> Again, we had that and removed it, for what seem to me to be solid
> reasons.  It adds cycles when we're forced to close/reopen a file,
> and it also adds failure modes that we could do without (ie, failure
> of either the ftell or the lseek, which are particularly nasty because
> they shouldn't happen according to the VFD abstraction).  I do not
> think there is going to be any argument strong enough to make us
> put it back, especially not for non-mainstream callers like logical
> decoding.

Yea, I think that's the right call. Avoiding kernel seeks is quite
worthwhile, and we shouldn't undo it just because of this use case. And
that'll become more and more important performance-wise (and has already
done so, with all the Intel fixes making syscalls much slower).

I could see an argument for adding a separate generic layer providing
position tracking on top of the VFD abstraction, however. It seems quite
possible that some other parts of the system could benefit from using
VFDs rather than plain fds. And they'd probably also need the
positional tracking.

Greetings,

Andres Freund



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
On Tue, 17 Sep 2019 at 21:19, Andres Freund <[hidden email]> wrote:

> On 2019-09-14 14:34:21 -0400, Tom Lane wrote:
> > Amit Khandekar <[hidden email]> writes:
> > > Yeah, something like the attached patch. I think this tracking of
> > > offsets would have been cleaner if we added in-built support in VFD. But
> > > yeah, for back branches at least, we need to handle it outside of VFD.
> > > Or maybe we would add it if we find one more use-case.
> >
> > Again, we had that and removed it, for what seem to me to be solid
> > reasons.  It adds cycles when we're forced to close/reopen a file,
> > and it also adds failure modes that we could do without (ie, failure
> > of either the ftell or the lseek, which are particularly nasty because
> > they shouldn't happen according to the VFD abstraction).

Ok. So you mean, when the caller calls FileRead() for sequential
reading, underneath, VFD would do a pread(), but if pread() returns an
error, the errno could belong to the read() or just as well to the
lseek(). If it's due to the lseek(), that's unexpected for the caller,
because for the caller it's just a sequential read. Yeah, makes sense.

>>  I do not
> > think there is going to be any argument strong enough to make us
> > put it back, especially not for non-mainstream callers like logical
> > decoding.

Ok. Also, more about putting back is in the below comments ...

>
> Yea, I think that's the right call. Avoiding kernel seeks is quite
> worthwhile, and we shouldn't undo it just because of this usecase. And
> that'll become more and more important performance-wise (and has already
> done so, with all the intel fixes making syscalls much slower).

By the way, I was not thinking of adding back the read() and
lseek() calls. I was saying we continue to use the pread() call, so
it's still a single system call. FileReadAt(..., offset) would do a
pread() with a user-supplied offset, and FileRead() would do a pread()
using an internally tracked offset. So for the user, FileReadAt() is
like pread(), and FileRead() is like read().

But I agree with Tom's objection about having to unnecessarily handle
lseek error codes.

>
> I could see an argument for adding a separate generic layer providing
> position tracking ontop of the VFD abstraction however. Seems quite
> possible that there's some other parts of the system that could benefit
> from using VFDs rather than plain fds. And they'd probably also need the
> positional tracking.

Yeah, that also could be done.

For now at least, what everyone seems to agree on is taking my
earlier attached patch forward.

I am going to see if I can add a TAP test for the patch, and will add
the patch to the commitfest soon.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <[hidden email]> wrote:
> Probably, for now at least, what everyone seems to agree is to take my
> earlier attached patch forward.
>
> I am going to see if I can add a TAP test for the patch, and will add
> the patch into the commitfest soon.

Attached is an updated patch v2. It has a new test scenario added in
the contrib/test_decoding/sql/spill test, and some minor code cleanup.

Going to add this to the Nov commitfest.





--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

[Attachment: use_vfd_for_logrep_v2.patch (11K)]

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

akapila
On Thu, Oct 3, 2019 at 4:48 PM Amit Khandekar <[hidden email]> wrote:

>
> On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <[hidden email]> wrote:
> > Probably, for now at least, what everyone seems to agree is to take my
> > earlier attached patch forward.
> >
> > I am going to see if I can add a TAP test for the patch, and will add
> > the patch into the commitfest soon.
>
> Attached is an updated patch v2.
>

I see that you have made changes in ReorderBufferRestoreChanges to use
PathNameOpenFile, but not in ReorderBufferSerializeTXN.  Is there a
reason for the same?  In my test environment, with the test provided
by you, I got the error (reported in this thread) via
ReorderBufferSerializeTXN.  See call stack below:

!errfinish(int dummy=0, ...)  Line 441 C
!OpenTransientFilePerm(const char * fileName=0x012deeac, int
fileFlags=33033, unsigned short fileMode=384)  Line 2272 + 0x57 bytes
C
!OpenTransientFile(const char * fileName=0x012deeac, int
fileFlags=33033)  Line 2256 + 0x15 bytes C
!ReorderBufferSerializeTXN(ReorderBuffer * rb=0x01ee4d80,
ReorderBufferTXN * txn=0x1f9a6ce8)  Line 2302 + 0x11 bytes C
!ReorderBufferIterTXNInit(ReorderBuffer * rb=0x01ee4d80,
ReorderBufferTXN * txn=0x01f08f80)  Line 1044 + 0xd bytes C


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
On Mon, 18 Nov 2019 at 17:20, Amit Kapila <[hidden email]> wrote:
> I see that you have made changes in ReorderBufferRestoreChanges to use
> PathNameOpenFile, but not in ReorderBufferSerializeTXN.  Is there a
> reason for the same?  In my test environment, with the test provided
> by you, I got the error (reported in this thread) via
> ReorderBufferSerializeTXN.

You didn't get this error with the patch applied, did you?

If you were debugging this without the patch applied, I suspect the
reason ReorderBufferSerializeTXN() => OpenTransientFile() is
generating this error is that the max limit must already have been
crossed because of earlier calls to ReorderBufferRestoreChanges().

Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is
sufficient, because the code in that function makes sure the fd gets
closed there itself.

If you are getting this error even with the patch applied, then this
needs investigation.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

akapila
In reply to this post by akapila
On Mon, Nov 18, 2019 at 5:20 PM Amit Kapila <[hidden email]> wrote:

>
> On Thu, Oct 3, 2019 at 4:48 PM Amit Khandekar <[hidden email]> wrote:
> >
> > On Wed, 18 Sep 2019 at 12:24, Amit Khandekar <[hidden email]> wrote:
> > > Probably, for now at least, what everyone seems to agree is to take my
> > > earlier attached patch forward.
> > >
> > > I am going to see if I can add a TAP test for the patch, and will add
> > > the patch into the commitfest soon.
> >
> > Attached is an updated patch v2.
> >
>
> I see that you have made changes in ReorderBufferRestoreChanges to use
> PathNameOpenFile, but not in ReorderBufferSerializeTXN.  Is there a
> reason for the same?
>

I have one more question regarding this patch.  It seems to me that
the files opened via OpenTransientFile or OpenTemporaryFile are
automatically closed at transaction end (abort), but that doesn't seem
to be the case for files opened with PathNameOpenFile.  See
AtEOXact_Files and AtEOSubXact_Files.  So, with the change proposed by
this patch, don't we need to deal with it in some other way?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: logical decoding : exceeded maxAllocatedDescs for .spill files

Amit Khandekar-2
On Mon, 18 Nov 2019 at 17:52, Amit Kapila <[hidden email]> wrote:
> I have one more question regarding this patch.   It seems to me that
> the files opened via OpenTransientFile or OpenTemporaryFile are
> automatically closed at transaction end(abort), but that doesn't seem
> to be the case for files opened with PathNameOpenFile.  See
> AtEOXact_Files and AtEOSubXact_Files.  So, now with the change
> proposed by this patch, don't we need to deal it in some other way?

For the APIs that use VFDs (like PathNameOpenFile), the files opened
are always recorded in the VfdCache array. So it is not required to do
the cleanup at (sub)transaction end, because the kernel fds get closed
dynamically in ReleaseLruFiles() whenever they reach the max_safe_fds
limit. So if a transaction aborts, the fds might remain open, but
those will get cleaned up whenever we require more fds, through
ReleaseLruFiles(). Whereas for files opened through
OpenTransientFile(), the VfdCache is not involved, so those need
transaction-end cleanup.


--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

