pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
While working on a patch, I noticed this pre-existing behavior, which seems to
be new since v11, maybe due to changes to SRF.

|postgres=# SELECT pg_ls_dir('.') LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|pg_ls_dir | pg_dynshmem

|postgres=# SELECT pg_ls_waldir() LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|-[ RECORD 1 ]+-------------------------------------------------------------
|pg_ls_waldir | (00000001000031920000007B,16777216,"2020-03-08 03:50:34-07")


Note, that doesn't happen with "SELECT * FROM".

I'm not sure what the solution is to that, but my patch was going to make it
worse rather than better for pg_ls_tmpdir.

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> While working on a patch, I noticed this pre-existing behavior, which seems to
> be new since v11, maybe due to changes to SRF.

> |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> |WARNING:  1 temporary files and directories not closed at end-of-transaction

Hmm, actually it looks to me like pg_ls_dir has been broken forever.
The reason the warning didn't show up before v11 is that CleanupTempFiles
didn't bleat about leaked "allocated" directories before that
(cf 9cb7db3f0).

I guess we ought to change that function to use returns-a-tuplestore
protocol instead of thinking it can hold a directory open across calls.
It's not hard to think of use-cases where the existing behavior would
cause issues worse than a nanny-ish WARNING, especially on platforms
with tight "ulimit -n" limits.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:

> Justin Pryzby <[hidden email]> writes:
> > While working on a patch, I noticed this pre-existing behavior, which seems to
> > be new since v11, maybe due to changes to SRF.
>
> > |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> > |WARNING:  1 temporary files and directories not closed at end-of-transaction
>
> Hmm, actually it looks to me like pg_ls_dir has been broken forever.
> The reason the warning didn't show up before v11 is that CleanupTempFiles
> didn't bleat about leaked "allocated" directories before that
> (cf 9cb7db3f0).
>
> I guess we ought to change that function to use returns-a-tuplestore
> protocol instead of thinking it can hold a directory open across calls.
> It's not hard to think of use-cases where the existing behavior would
> cause issues worse than a nanny-ish WARNING, especially on platforms
> with tight "ulimit -n" limits.

Thanks for the analysis.

Do you mean it should enumerate all files during the initial SRF call, or use
something other than the SRF_* macros ?

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
>> I guess we ought to change that function to use returns-a-tuplestore
>> protocol instead of thinking it can hold a directory open across calls.
>> It's not hard to think of use-cases where the existing behavior would
>> cause issues worse than a nanny-ish WARNING, especially on platforms
>> with tight "ulimit -n" limits.

> Thanks for the analysis.

> Do you mean it should enumerate all files during the initial SRF call, or use
> something other than the SRF_* macros ?

It has to enumerate all the files during the first call.  I suppose it
could do that and then still hand back the results one-at-a-time, but
there seems little point compared to filling a tuplestore immediately.
So probably the SRF_ macros are useless here.

Another possible solution is to register an exprstate-shutdown hook to
ensure the resource is cleaned up, but I'm not very happy with that
because it does nothing to prevent the hazard of overrunning the
available resources if you have several of these active at once.

I've just finished scanning the source code and concluding that all
of these functions are similarly broken:

pg_ls_dir
pg_ls_dir_files
pg_tablespace_databases
pg_logdir_ls_internal
pg_timezone_names
pgrowlocks

The first five risk leaking an open directory, the last risks leaking
an active tablescan and open relation.

I don't see anything in the documentation (either funcapi.h or
xfunc.sgml) warning that the function might not get run to completion,
either ...

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
I wrote:
> I've just finished scanning the source code and concluding that all
> of these functions are similarly broken:
> pg_ls_dir
> pg_ls_dir_files
> pg_tablespace_databases
> pg_logdir_ls_internal
> pg_timezone_names
> pgrowlocks

BTW, another thing I noticed while looking around is that some of
the functions using SRF_RETURN_DONE() think they should clean up
memory beforehand.  This is a waste of code/cycles, as long as the
memory was properly allocated in funcctx->multi_call_memory_ctx,
because funcapi.c takes care of deleting that context.

We should probably document that *any* manual cleanup before
SRF_RETURN_DONE() is an antipattern.  If you have to have cleanup,
it needs to be done via RegisterExprContextCallback instead.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
In reply to this post by Tom Lane-2
On Sun, Mar 08, 2020 at 03:40:09PM -0400, Tom Lane wrote:
> Justin Pryzby <[hidden email]> writes:
> > On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
> >> I guess we ought to change that function to use returns-a-tuplestore
> >> protocol instead of thinking it can hold a directory open across calls.
> > Thanks for the analysis.
> > Do you mean it should enumerate all files during the initial SRF call, or use
> > something other than the SRF_* macros ?
> It has to enumerate all the files during the first call.  I suppose it

> I've just finished scanning the source code and concluding that all
> of these functions are similarly broken:
> pg_ls_dir_files

I patched this one to see what it looks like and to allow /hopefully/ moving
forward one way or another with the pg_ls_tmpfile() patch set (or at least
avoid trying to do anything there which is too inconsistent with this fix).

> I don't see anything in the documentation (either funcapi.h or
> xfunc.sgml) warning that the function might not get run to completion,
> either ...

Also, at first glance, these seem to be passing constant "randomAccess=true"
rather than (bool) (rsinfo->allowedModes&SFRM_Materialize_Random)

$ git grep -wl SFRM_Materialize |xargs grep -l 'tuplestore_begin_heap(true'
contrib/dblink/dblink.c
contrib/pageinspect/brinfuncs.c
contrib/pg_stat_statements/pg_stat_statements.c
src/backend/access/transam/xlogfuncs.c
src/backend/commands/event_trigger.c
src/backend/commands/extension.c
src/backend/foreign/foreign.c
src/backend/replication/logical/launcher.c
src/backend/replication/logical/logicalfuncs.c
src/backend/replication/logical/origin.c
src/backend/replication/slotfuncs.c
src/backend/replication/walsender.c
src/backend/storage/ipc/shmem.c
src/backend/utils/adt/pgstatfuncs.c
src/backend/utils/misc/guc.c
src/backend/utils/misc/pg_config.c

--
Justin

v1-0001-pg_ls_dir_files-avoid-leaking-DIR-if-not-run-to-c.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
>>> On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
>>>> I guess we ought to change that function to use returns-a-tuplestore
>>>> protocol instead of thinking it can hold a directory open across calls.

> I patched this one to see what it looks like and to allow /hopefully/ moving
> forward one way or another with the pg_ls_tmpfile() patch set (or at least
> avoid trying to do anything there which is too inconsistent with this fix).

I reviewed this, added some test cases, and pushed it, so that we can see
if the buildfarm finds anything wrong.  (I'm not expecting that, because
this should all be pretty portable, but you never know.)  Assuming not,
we need to fix the other functions similarly, and then do something about
revising the documentation to warn against this coding style.  Do you
want to have a go at that?

> Also, at first glance, these seem to be passing constant "randomAccess=true"
> rather than (bool) (rsinfo->allowedModes&SFRM_Materialize_Random)

Hm.  Not a bug, but possibly a performance issue, if the tuplestore
gets big enough for that to matter.  (I think it doesn't matter until
we start spilling to temp files.)

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
On Wed, Mar 11, 2020 at 03:32:38PM -0400, Tom Lane wrote:
> > I patched this one to see what it looks like and to allow /hopefully/ moving
> > forward one way or another with the pg_ls_tmpfile() patch set (or at least
> > avoid trying to do anything there which is too inconsistent with this fix).
>
> I reviewed this, added some test cases, and pushed it, so that we can see

Thanks, tests were on my periphery..

|    In passing, fix bogus error report for stat() failure: it was
|    whining about the directory when it should be fingering the
|    individual file.  Doubtless a copy-and-paste error.

Thanks again ; that was my 0001 patch on the other thread.  No rebase conflict
even ;)
https://www.postgresql.org/message-id/20191228101650.GG12890%40telsasoft.com

> Do you want to have a go at that?

First draft attached.  Note that I handled pg_ls_dir, even though I'm proposing
on the other thread to collapse/merge/meld it with pg_ls_dir_files [0].
Possibly that's a bad idea with tuplestore, due to returning a scalar vs a row
and needing to conditionally call CreateTemplateTupleDesc vs
get_call_result_type.  I'll rebase that patch later today.

I didn't write test cases yet.  Also didn't look for functions not on your
list.

I noticed this doesn't actually do anything, but kept it for now...except in
pg_ls_dir error case:

src/include/utils/tuplestore.h:/* tuplestore_donestoring() used to be required, but is no longer used */
src/include/utils/tuplestore.h:#define tuplestore_donestoring(state)    ((void) 0)

I found a few documentation bits that I think aren't relevant, but could
possibly be misread to encourage the bad coding practice.  This is about *sql*
functions:

|37.5.8. SQL Functions Returning Sets
|When an SQL function is declared as returning SETOF sometype, the function's
|final query is executed TO COMPLETION, and each row it outputs is returned as
|an element of the result set.
|...
|Set-returning functions in the select list are always evaluated as though they
|are on the inside of a nested-loop join with the rest of the FROM clause, so
|that the function(s) are run TO COMPLETION before the next row from the FROM
|clause is considered.

--
Justin

[0] https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com
v9-0008-generalize-pg_ls_dir_files-and-retire-pg_ls_dir.patch

0001-SRF-avoid-leaking-resources-if-not-run-to-completion.patch (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
In reply to this post by Tom Lane-2
On Sun, Mar 08, 2020 at 04:30:44PM -0400, Tom Lane wrote:
> BTW, another thing I noticed while looking around is that some of
> the functions using SRF_RETURN_DONE() think they should clean up
> memory beforehand.  This is a waste of code/cycles, as long as the
> memory was properly allocated in funcctx->multi_call_memory_ctx,
> because funcapi.c takes care of deleting that context.
>
> We should probably document that *any* manual cleanup before
> SRF_RETURN_DONE() is an antipattern.  If you have to have cleanup,
> it needs to be done via RegisterExprContextCallback instead.

This part appears to be already in place since
e4186762ffaa4188e16702e8f4f299ea70988b96:

|The memory context that is current when the SRF is called is a transient
|context that will be cleared between calls. This means that you do not need to
|call pfree on everything you allocated using palloc; it will go away anyway.
|However, if you want to allocate any data structures to live across calls, you
|need to put them somewhere else. The memory context referenced by
|multi_call_memory_ctx is a suitable location for any data that needs to survive
|until the SRF is finished running. In most cases, this means that you should
|switch into multi_call_memory_ctx while doing the first-call setup.

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Alvaro Herrera-9
In reply to this post by Justin Pryzby
Nitpick: please see c4dcd9144ba6.

> From: Justin Pryzby <[hidden email]>
> Date: Wed, 11 Mar 2020 10:09:18 -0500
> Subject: [PATCH] SRF: avoid leaking resources if not run to completion
>
> Change to return a tuplestore populated immediately and returned in full.
>
> Discussion: https://www.postgresql.org/message-id/20200308173103.GC1357%40telsasoft.com

I wonder if this isn't saying that the whole value-per-call protocol is
bogus, in that it seems impossible to write a useful function with it.
Maybe we should add one final call with a special flag "function
shutdown" or something, so that these resources can be released if the
SRF isn't run to completion?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> I wonder if this isn't saying that the whole value-per-call protocol is
> bogus, in that it seems impossible to write a useful function with it.

Only if you have a *very* narrow definition of "useful function".
If you look through SRF_RETURN_DONE callers, only a small minority
are trying to do resource cleanup beforehand.

> Maybe we should add one final call with a special flag "function
> shutdown" or something, so that these resources can be released if the
> SRF isn't run to completion?

We already have an appropriate mechanism for cleaning up resources,
ie RegisterExprContextCallback.  I do not think what you're suggesting
could be made to work without an incompatible break in the API for
SRFs, so it's not an improvement over telling people more forcefully
about how to do resource cleanup correctly.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
In reply to this post by Justin Pryzby
On Thu, Mar 12, 2020 at 07:11:56AM -0500, Justin Pryzby wrote:

> > Do you want to have a go at that?
>
> First draft attached.  Note that I handled pg_ls_dir, even though I'm proposing
> on the other thread to collapse/merge/meld it with pg_ls_dir_files [0].
> Possibly that's a bad idea with tuplestore, due to returning a scalar vs a row
> and needing to conditionally call CreateTemplateTupleDesc vs
> get_call_result_type.  I'll rebase that patch later today.
>
> I didn't write test cases yet.  Also didn't look for functions not on your
> list.
>
> I noticed this doesn't actually do anything, but kept it for now...except in
> pg_ls_dir error case:
>
> src/include/utils/tuplestore.h:/* tuplestore_donestoring() used to be required, but is no longer used */
> src/include/utils/tuplestore.h:#define tuplestore_donestoring(state)    ((void) 0)
v2 attached - I will add to next CF in case you want to defer it until later.

--
Justin

v2-0001-SRF-avoid-leaking-resources-if-not-run-to-complet.patch (29K) Download Attachment
v2-0002-Clean-up-existing-anti-pattern-of-freeing-SRF-res.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> v2 attached - I will add to next CF in case you want to defer it until later.

Thanks, reviewed and pushed.  Since this is a bug fix (at least in part)
I didn't want to wait.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
On Mon, Mar 16, 2020 at 09:38:50PM -0400, Tom Lane wrote:
> Justin Pryzby <[hidden email]> writes:
> > v2 attached - I will add to next CF in case you want to defer it until later.
>
> Thanks, reviewed and pushed.  Since this is a bug fix (at least in part)
> I didn't want to wait.

Thanks for fixing my test case and pushing.

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> Thanks for fixing my test case and pushing.

The buildfarm just showed up another instability in the test cases
we added:

=========================== regression.diffs ================
diff -U3 /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/test/regress/expected/misc_functions.out /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/misc_functions.out
--- /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/test/regress/expected/misc_functions.out 2020-03-17 08:14:50.292037956 +0100
+++ /home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/misc_functions.out 2020-03-28 13:55:12.490024822 +0100
@@ -169,11 +169,7 @@
 
 select (w).size = :segsize as ok
 from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1;
- ok
-----
- t
-(1 row)
-
+ERROR:  could not stat file "pg_wal/000000010000000000000078": No such file or directory
 select count(*) >= 0 as ok from pg_ls_archive_statusdir();
  ok
 ----

It's pretty obvious what happened here: concurrent activity renamed or
removed the WAL segment between when we saw it in the directory and
when we tried to stat() it.

This seems like it would be just as much of a hazard for field usage
as it is for regression testing, so I propose that we fix these
directory-scanning functions to silently ignore ENOENT failures from
stat().  Are there any for which we should not do that?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote:
> The buildfarm just showed up another instability in the test cases
> we added:

Yea, as you said, this is an issue with the *testcase*.  The function behavior
didn't change, we just weren't previously exercising it.

>  select (w).size = :segsize as ok
>  from (select pg_ls_waldir() w) ss where length((w).name) = 24 limit 1;
> - ok
> -----
> - t
> -(1 row)
> -
> +ERROR:  could not stat file "pg_wal/000000010000000000000078": No such file or directory
>  select count(*) >= 0 as ok from pg_ls_archive_statusdir();
>   ok
>  ----
>
> It's pretty obvious what happened here: concurrent activity renamed or
> removed the WAL segment between when we saw it in the directory and
> when we tried to stat() it.
>
> This seems like it would be just as much of a hazard for field usage
> as it is for regression testing,

That's clearly true for pg_ls_waldir(), which call pg_ls_dir_files, and
includes some metadata columns.

> so I propose that we fix these directory-scanning functions to silently
> ignore ENOENT failures from stat().  Are there any for which we should not do
> that?

I think it's reasonable to ignore transient ENOENT for tmpdir, logdir, and
probably archive_statusdir.  That doesn't currently affect pg_ls_dir(), which
lists file but not metadata for an arbitrary dir, so doesn't call stat().

Note that dangling links in the other functions currently cause (wrong [0])
error.  I guess it should be documented if broken link is will be ignored due
to ENOENT.

Maybe we should lstat() the file to determine if it's a dangling link; if
lstat() fails, then skip it.  Currently, we use stat(), which shows metdata of
a link's *target*.  Maybe we'd change that.

Note that I have a patch which generalizes pg_ls_dir_files and makes
pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same
unless I add another flag to do otherwise (but behaving the same has its
merits).  It already uses lstat() to show links to dirs as isdir=no, which was
needed to avoid recursing into links-to-dirs in the new helper function
pg_ls_dir_recurse().  https://commitfest.postgresql.org/26/2377/

[0] Which you fixed in 085b6b667 and I previously fixed at:
https://www.postgresql.org/message-id/attachment/106478/v2-0001-BUG-in-errmsg.patch
|$ sudo ln -s foo /var/log/postgresql/bar
|ts=# SELECT * FROM pg_ls_logdir() ORDER BY 3;
|ERROR:  could not stat directory "/var/log/postgresql": No such file or directory


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> On Sat, Mar 28, 2020 at 01:13:54PM -0400, Tom Lane wrote:
>> so I propose that we fix these directory-scanning functions to silently
>> ignore ENOENT failures from stat().  Are there any for which we should not do
>> that?

> Maybe we should lstat() the file to determine if it's a dangling link; if
> lstat() fails, then skip it.  Currently, we use stat(), which shows metdata of
> a link's *target*.  Maybe we'd change that.

Hm, good point that ENOENT could refer to a symlink's target.  Still,
I'm not sure it's worth going out of our way to disambiguate that,
given that these directories aren't really supposed to contain symlinks.
(And on the third hand, if they aren't supposed to, then maybe these
functions needn't look through any symlinks?  In which case just
substituting lstat for stat would resolve the ambiguity.)

> Note that I have a patch which generalizes pg_ls_dir_files and makes
> pg_ls_dir() a simple wrapper, so if that's pursued, they would behave the same
> unless I add another flag to do otherwise (but behaving the same has its
> merits).  It already uses lstat() to show links to dirs as isdir=no, which was
> needed to avoid recursing into links-to-dirs in the new helper function
> pg_ls_dir_recurse().  https://commitfest.postgresql.org/26/2377/

I think we need a back-patchable fix for the ENOENT failure, seeing that
we back-patched the new regression test; intermittent buildfarm failures
are no fun in any branch.  So new functions aren't too relevant here,
although it's fair to look ahead at whether the same behavior will be
appropriate for them.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
I wrote:
> Justin Pryzby <[hidden email]> writes:
>> Maybe we should lstat() the file to determine if it's a dangling link; if
>> lstat() fails, then skip it.  Currently, we use stat(), which shows metdata of
>> a link's *target*.  Maybe we'd change that.

> Hm, good point that ENOENT could refer to a symlink's target.  Still,
> I'm not sure it's worth going out of our way to disambiguate that,
> given that these directories aren't really supposed to contain symlinks.
> (And on the third hand, if they aren't supposed to, then maybe these
> functions needn't look through any symlinks?  In which case just
> substituting lstat for stat would resolve the ambiguity.)

After looking at the callers of pg_ls_dir_files, and noticing that
it's already defined to ignore anything that's not a regular file,
I think switching to lstat makes sense.

I also grepped the other uses of ReadDir[Extended], and didn't see
any other ones that seemed desperately in need of changing.

So the attached seems like a sufficient fix.

                        regards, tom lane


diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 01185f2..8429a12 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -596,10 +596,15 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
 
  /* Get the file info */
  snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
- if (stat(path, &attrib) < 0)
+ if (lstat(path, &attrib) < 0)
+ {
+ /* Ignore concurrently-deleted files, else complain */
+ if (errno == ENOENT)
+ continue;
  ereport(ERROR,
  (errcode_for_file_access(),
  errmsg("could not stat file \"%s\": %m", path)));
+ }
 
  /* Ignore anything but regular files */
  if (!S_ISREG(attrib.st_mode))
Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Justin Pryzby
On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote:

> I wrote:
> > Justin Pryzby <[hidden email]> writes:
> >> Maybe we should lstat() the file to determine if it's a dangling link; if
> >> lstat() fails, then skip it.  Currently, we use stat(), which shows metdata of
> >> a link's *target*.  Maybe we'd change that.
>
> > Hm, good point that ENOENT could refer to a symlink's target.  Still,
> > I'm not sure it's worth going out of our way to disambiguate that,
> > given that these directories aren't really supposed to contain symlinks.
> > (And on the third hand, if they aren't supposed to, then maybe these
> > functions needn't look through any symlinks?  In which case just
> > substituting lstat for stat would resolve the ambiguity.)
>
> After looking at the callers of pg_ls_dir_files, and noticing that
> it's already defined to ignore anything that's not a regular file,
> I think switching to lstat makes sense.

Yea, only pg_ls_dir() shows special file types (and currently the others even
hide dirs).

The essence of your patch is to ignore ENOENT, but you also changed to use
lstat(), which seems unrelated.  That means we'll now hide (non-broken)
symlinks.  Is that intentional/needed ?  I guess maybe you're trying to fix the
bug (?) that symlinks aren't skipped?  If so, I guess it should be a separate
commit, or the commit message should say so.  I think the doc update is already
handled by: 8b6d94cf6c8319bfd6bebf8b863a5db586c19c3b (we didn't used to say we
skipped specials, and now we say we do, and we'll to follow through RSN and
actually do it, too).

> diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
> index 01185f2..8429a12 100644
> --- a/src/backend/utils/adt/genfile.c
> +++ b/src/backend/utils/adt/genfile.c
> @@ -596,10 +596,15 @@ pg_ls_dir_files(FunctionCallInfo fcinfo, const char *dir, bool missing_ok)
>  
>   /* Get the file info */
>   snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
> - if (stat(path, &attrib) < 0)
> + if (lstat(path, &attrib) < 0)
> + {
> + /* Ignore concurrently-deleted files, else complain */
> + if (errno == ENOENT)
> + continue;
>   ereport(ERROR,
>   (errcode_for_file_access(),
>   errmsg("could not stat file \"%s\": %m", path)));
> + }
>  
>   /* Ignore anything but regular files */
>   if (!S_ISREG(attrib.st_mode))



Reply | Threaded
Open this post in threaded view
|

Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

Tom Lane-2
Justin Pryzby <[hidden email]> writes:
> On Sun, Mar 29, 2020 at 12:37:05PM -0400, Tom Lane wrote:
>> After looking at the callers of pg_ls_dir_files, and noticing that
>> it's already defined to ignore anything that's not a regular file,
>> I think switching to lstat makes sense.

> Yea, only pg_ls_dir() shows special file types (and currently the others even
> hide dirs).

> The essence of your patch is to ignore ENOENT, but you also changed to use
> lstat(), which seems unrelated.  That means we'll now hide (non-broken)
> symlinks.  Is that intentional/needed ?

Well, the following comment says "ignore anything but regular files",
so I'm supposing that that is the behavior that we actually want here
and failed to implement correctly.  There might be scope for
additional directory-reading functions, but I'd think you'd want
more information (such as the file type) returned from anything
that doesn't act this way.

In practice, since these directories shouldn't contain symlinks,
it's likely moot.  The only place in PG data directories where
we actually expect symlinks is pg_tablespace ... and that contains
symlinks to directories, so that this function would ignore them
anyway.

                        regards, tom lane


12