Emacs vs pg_indent's weird indentation for function declarations

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Emacs vs pg_indent's weird indentation for function declarations

Thomas Munro-3
Hello,

Using either the .dir-locals.el settings or the "more complete"
src/tools/editors/emacs.samples, I have never convinced Emacs to
produce multi-line function declarations in .h files that satisfy
pg_indent.

For example, latch.c has the following definition:

int
WaitEventSetWait(WaitEventSet *set, long timeout,
                 WaitEvent *occurred_events, int nevents,
                 uint32 wait_event_info)

And now let's see the declaration in latch.h:

extern int WaitEventSetWait(WaitEventSet *set, long timeout,
                 WaitEvent *occurred_events, int nevents,
                 uint32 wait_event_info);

(I replaced all tabs with four space here; if you're reading in a
monospace font, you'll see that the first arguments off all lines
begin in the same column in the definition by not in the declaration.)

For a while I've been baffled by that: the first arguments of later
lines don't line up with that of the first line, but they're also not
in a constant column (it varies from function to function), and it's
also not caused by 8-space vs 4-space confusion.  It was only when I
put those two things next to each other just now in this email that I
finally spotted the logic it's using: if you remove "extern int " then
the later lines line up with the first argument of the top line.  This
works for other examples I looked at too.  Huh.

That's ... annoying.  I wish indent wouldn't do that, because it means
that my declarations get moved around every time I write code.  But if
it's not possible to change that for whatever technical or political
reason, I wonder if it's possible to teach Emacs to understand that
weird rule...

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> For a while I've been baffled by that: the first arguments of later
> lines don't line up with that of the first line, but they're also not
> in a constant column (it varies from function to function), and it's
> also not caused by 8-space vs 4-space confusion.  It was only when I
> put those two things next to each other just now in this email that I
> finally spotted the logic it's using: if you remove "extern int " then
> the later lines line up with the first argument of the top line.  This
> works for other examples I looked at too.  Huh.

Yeah.  I suspect that the underlying cause is that pgindent doesn't
really distinguish function declarations from definitions, at least
not when it comes time to indent the lines after the first one.

> That's ... annoying.  I wish indent wouldn't do that, because it means
> that my declarations get moved around every time I write code.

If you can fix it, I'd vote for accepting the patch.  I don't personally
have the desire to dig into the indent code that much ...

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Michael Paquier-2
On Mon, Jan 28, 2019 at 12:28:30AM -0500, Tom Lane wrote:
> Thomas Munro <[hidden email]> writes:
>> That's ... annoying.  I wish indent wouldn't do that, because it means
>> that my declarations get moved around every time I write code.
>
> If you can fix it, I'd vote for accepting the patch.  I don't personally
> have the desire to dig into the indent code that much ...

If you could get pgindent smarter in this area, it would be really
nice..
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Thomas Munro-3
On Mon, Jan 28, 2019 at 8:08 PM Michael Paquier <[hidden email]> wrote:

> On Mon, Jan 28, 2019 at 12:28:30AM -0500, Tom Lane wrote:
> > Thomas Munro <[hidden email]> writes:
> >> That's ... annoying.  I wish indent wouldn't do that, because it means
> >> that my declarations get moved around every time I write code.
> >
> > If you can fix it, I'd vote for accepting the patch.  I don't personally
> > have the desire to dig into the indent code that much ...
>
> If you could get pgindent smarter in this area, it would be really
> nice..
Ah, it's not indent doing it, it's pgindent's post_indent subroutine
trying to correct the effects of the (implied) -psl option, but not
doing a complete job of it (it should adjust the indentation lines of
later lines if it changes the first line).

One idea I had was to tell indent not to do that by using -npsl when
processing headers, like in the attached.  That fixes all the headers
I looked at, though of course it doesn't fix the static function
declarations that appear in .c files, so it's not quite the right
answer.

--
Thomas Munro
http://www.enterprisedb.com

hack-pgindent.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Thomas Munro-5
On Mon, Jan 28, 2019 at 9:48 PM Thomas Munro
<[hidden email]> wrote:

> On Mon, Jan 28, 2019 at 8:08 PM Michael Paquier <[hidden email]> wrote:
> > If you could get pgindent smarter in this area, it would be really
> > nice..
>
> Ah, it's not indent doing it, it's pgindent's post_indent subroutine
> trying to correct the effects of the (implied) -psl option, but not
> doing a complete job of it (it should adjust the indentation lines of
> later lines if it changes the first line).
>
> One idea I had was to tell indent not to do that by using -npsl when
> processing headers, like in the attached.  That fixes all the headers
> I looked at, though of course it doesn't fix the static function
> declarations that appear in .c files, so it's not quite the right
> answer.
I tried teaching pgindent's post_indent subroutine to unmangle the
multi-line declarations it mangles.  That produces correct
indentation!  But can also produce lines that exceed the column limit
we would normally wrap at (of course, because pg_bsd_indent had less
whitespace on the left when it made wrapping decisions).  Doh.
Attached for posterity, but it's useless.

So I think pg_bsd_indent itself needs to be fixed.  I think I know
where the problem is.  lexi.c isn't looking far ahead enough to
recognise multi-line function declarations:

        if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 &&
            state->in_parameter_declaration == 0 && state->block_init == 0) {
            char *tp = buf_ptr;
            while (tp < buf_end)
                if (*tp++ == ')' && (*tp == ';' || *tp == ','))
                    goto not_proc;
            strncpy(state->procname, token, sizeof state->procname - 1);
            if (state->in_decl)
                state->in_parameter_declaration = 1;
            return (funcname);
    not_proc:;
        }

That loop that looks for ')' followed by ';' is what causes the lexer
to conclude that the "foo" is an "ident" rather than a "funcname",
given the following input:

  extern void foo(int i);

But if because buf_end ends at the newline, it can't see the
semi-colon and concludes that "foo" is a "funcname" here:

  extern void foo(int i,
                  int j);

That determination causes indent.c's procnames_start_line (-psl) mode
to put "extern void" on its own line here and stop thinking of it as a
declaration:

    case funcname:
    case ident:     /* got an identifier or constant */
        if (ps.in_decl) {
            if (type_code == funcname) {
                ps.in_decl = false;
                if (procnames_start_line && s_code != e_code) {
                    *e_code = '\0';
                    dump_line();
                }

I guess it'd need something smarter than fill_buffer() to see far
enough, but simply loading N lines at once wouldn't be enough because
you could still happen to be looking at the final line in the buffer;
you'd probably need a sliding window.  I'm not planning on trying to
fix that myself in the short term, but since it annoys me every time I
commit anything, I couldn't resist figuring out where it's coming from
at least...

--
Thomas Munro
https://enterprisedb.com

0001-Fix-pgindent-s-postprocessing-of-multi-line-prototyp.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> I tried teaching pgindent's post_indent subroutine to unmangle the
> multi-line declarations it mangles.  That produces correct
> indentation!  But can also produce lines that exceed the column limit
> we would normally wrap at (of course, because pg_bsd_indent had less
> whitespace on the left when it made wrapping decisions).  Doh.
> Attached for posterity, but it's useless.

> So I think pg_bsd_indent itself needs to be fixed.  I think I know
> where the problem is.  lexi.c isn't looking far ahead enough to
> recognise multi-line function declarations:

I experimented with fixing this.  I was able to get pg_bsd_indent to
distinguish multi-line function declarations from definitions, but it
turns out that it doesn't help your concern about the lines being too
long after re-indenting.  Contrary to what you imagine above, it seems
pg_bsd_indent will not reflow argument lists, regardless of whether it
thinks there needs to be more or less leading whitespace.  I'm a bit
surprised that -bc doesn't cause that to happen, but it doesn't (and I'm
not sure we'd really want to force one-parameter-per-line, anyway).

Anyway, the attached hasty-and-undercommented change to pg_bsd_indent
allows removal of the "Move prototype names to the same line as return
type" hack in pgindent, and we then get prototypes with properly
lined-up arguments, but we'll have a lot of places with over-length
lines needing manual fixing.  Unless somebody wants to find where to
fix that in pg_bsd_indent, but I've had my fill of looking at that
spaghetti code for today.

                        regards, tom lane


diff --git a/indent.h b/indent.h
index 0fffd89..1708dbc 100644
--- a/indent.h
+++ b/indent.h
@@ -41,6 +41,8 @@ void diag2(int, const char *);
 void diag3(int, const char *, int);
 void diag4(int, const char *, int, int);
 void dump_line(void);
+int lookahead(void);
+void lookahead_reset(void);
 void fill_buffer(void);
 void parse(int);
 void pr_comment(void);
diff --git a/io.c b/io.c
index df11094..8d13a52 100644
--- a/io.c
+++ b/io.c
@@ -51,6 +51,13 @@ static char sccsid[] = "@(#)io.c 8.1 (Berkeley) 6/6/93";
 
 int         comment_open;
 static int  paren_target;
+
+static char *lookahead_buf; /* malloc'd buffer, or NULL initially */
+static char *lookahead_buf_end; /* end+1 of allocated space */
+static char *lookahead_start; /* => next char for fill_buffer() to fetch */
+static char *lookahead_ptr; /* => next char for lookahead() to fetch */
+static char *lookahead_end; /* last+1 valid char in lookahead_buf */
+
 static int pad_output(int current, int target);
 
 void
@@ -252,6 +259,58 @@ compute_label_target(void)
  : ps.ind_size * (ps.ind_level - label_offset) + 1;
 }
 
+/*
+ * Read data ahead of what has been collected into in_buffer.
+ *
+ * Successive calls get further and further ahead, until we hit EOF.
+ * Call lookahead_reset to rescan from just beyond in_buffer.
+ */
+int
+lookahead(void)
+{
+    while (lookahead_ptr >= lookahead_end) {
+      int i = getc(input);
+
+      if (i == EOF)
+ return i;
+      if (i == '\0')
+ continue; /* fill_buffer drops nulls, so do we */
+
+      if (lookahead_end >= lookahead_buf_end) {
+ /* Need to allocate or enlarge lookahead_buf */
+ char *new_buf;
+ size_t req;
+
+ if (lookahead_buf == NULL) {
+  req = 64;
+  new_buf = malloc(req);
+ } else {
+  req = (lookahead_buf_end - lookahead_buf) * 2;
+  new_buf = realloc(lookahead_buf, req);
+ }
+ if (new_buf == NULL)
+  errx(1, "too much lookahead required");
+ lookahead_start = new_buf + (lookahead_start - lookahead_buf);
+ lookahead_ptr = new_buf + (lookahead_ptr - lookahead_buf);
+ lookahead_end = new_buf + (lookahead_end - lookahead_buf);
+ lookahead_buf = new_buf;
+ lookahead_buf_end = new_buf + req;
+      }
+
+      *lookahead_end++ = i;
+    }
+    return (unsigned char) *lookahead_ptr++;
+}
+
+/*
+ * Reset so that lookahead() will again scan from just beyond what's in
+ * in_buffer.
+ */
+void
+lookahead_reset(void)
+{
+    lookahead_ptr = lookahead_start;
+}
 
 /*
  * Copyright (C) 1976 by the Board of Trustees of the University of Illinois
@@ -293,11 +352,16 @@ fill_buffer(void)
     p = in_buffer + offset;
     in_buffer_limit = in_buffer + size - 2;
  }
- if ((i = getc(f)) == EOF) {
- *p++ = ' ';
- *p++ = '\n';
- had_eof = true;
- break;
+ if (lookahead_start < lookahead_end) {
+  i = (unsigned char) *lookahead_start++;
+ } else {
+  lookahead_start = lookahead_ptr = lookahead_end = lookahead_buf;
+  if ((i = getc(f)) == EOF) {
+    *p++ = ' ';
+    *p++ = '\n';
+    had_eof = true;
+    break;
+  }
  }
  if (i != '\0')
     *p++ = i;
diff --git a/lexi.c b/lexi.c
index 3c7bfef..e637e1a 100644
--- a/lexi.c
+++ b/lexi.c
@@ -148,6 +148,39 @@ strcmp_type(const void *e1, const void *e2)
     return (strcmp(e1, *(const char * const *)e2));
 }
 
+/*
+ * Scan over a function argument declaration list, then see if it is
+ * followed by ';' or ',' indicating that it's just a prototype.
+ *
+ * We do not detect comments, so you can fool this by putting unbalanced
+ * parens inside a comment within the argument list.  So don't do that.
+ */
+static int
+is_prototype(char *tp)
+{
+  int paren_depth = 0;
+
+  lookahead_reset();
+  for (;;) {
+    int c;
+
+    if (tp < buf_end)
+      c = *tp++;
+    else {
+ c = lookahead();
+ if (c == EOF)
+  break;
+    }
+    if (c == '(')
+      paren_depth++;
+    else if (c == ')')
+      paren_depth--;
+    else if (paren_depth == 0 && !isspace((unsigned char) c))
+      return (c == ';' || c == ',');
+  }
+  return false;
+}
+
 int
 lexi(struct parser_state *state)
 {
@@ -348,15 +381,12 @@ lexi(struct parser_state *state)
  } /* end of if (found_it) */
  if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 &&
     state->in_parameter_declaration == 0 && state->block_init == 0) {
-    char *tp = buf_ptr;
-    while (tp < buf_end)
- if (*tp++ == ')' && (*tp == ';' || *tp == ','))
-    goto not_proc;
+  if (!is_prototype(buf_ptr)) {
     strncpy(state->procname, token, sizeof state->procname - 1);
     if (state->in_decl)
  state->in_parameter_declaration = 1;
     return (funcname);
-    not_proc:;
+  }
  }
  /*
  * The following hack attempts to guess whether or not the current
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
I wrote:
> I experimented with fixing this.  I was able to get pg_bsd_indent to
> distinguish multi-line function declarations from definitions, but it
> turns out that it doesn't help your concern about the lines being too
> long after re-indenting.  Contrary to what you imagine above, it seems
> pg_bsd_indent will not reflow argument lists, regardless of whether it
> thinks there needs to be more or less leading whitespace.

Actually, now that I think about it, pgindent seldom revisits line-break
decisions in code anyway; it's only aggressive about reflowing comments.
So maybe we shouldn't be expecting it to fix this.

Also, looking at sample results (attached), it seems like we don't have
such a large problem as one might guess.  It looks like people have
mostly formatted declarations to work with this indentation already,
either because their editors did it automatically, or because they were
thinking that this might get fixed someday.  (I know I've often had that
in the back of my mind.)  There are some places in the attached diff
that I might take the trouble to clean up manually, but not that many.

I found out that my initial draft of a multi-line lookahead function was
not nearly smart enough to survive contact with the PG source corpus,
but after rejiggering it to cope with comments and attributes, it seems
to do quite well.

A small problem with the "rejiggering" is that it now makes the wrong
choice for K&R-style function definitions, causing them to be weirdly
indented.  For our purposes, that's a non-problem so I'm not excited
about trying to make it smart enough to recognize those.  We do have
a couple of amazingly old and crufty K&R-style functions in src/port/,
though, so probably we'd wish to fix those.

Attached is working draft of pg_bsd_indent changes (still sans comments)
as well as a patch showing the difference between current pgindent
results on HEAD and the results of this version.  I think there's no
question that this is an improvement.

                        regards, tom lane


diff --git a/indent.h b/indent.h
index 0fffd89..1708dbc 100644
--- a/indent.h
+++ b/indent.h
@@ -41,6 +41,8 @@ void diag2(int, const char *);
 void diag3(int, const char *, int);
 void diag4(int, const char *, int, int);
 void dump_line(void);
+int lookahead(void);
+void lookahead_reset(void);
 void fill_buffer(void);
 void parse(int);
 void pr_comment(void);
diff --git a/io.c b/io.c
index df11094..8d13a52 100644
--- a/io.c
+++ b/io.c
@@ -51,6 +51,13 @@ static char sccsid[] = "@(#)io.c 8.1 (Berkeley) 6/6/93";
 
 int         comment_open;
 static int  paren_target;
+
+static char *lookahead_buf; /* malloc'd buffer, or NULL initially */
+static char *lookahead_buf_end; /* end+1 of allocated space */
+static char *lookahead_start; /* => next char for fill_buffer() to fetch */
+static char *lookahead_ptr; /* => next char for lookahead() to fetch */
+static char *lookahead_end; /* last+1 valid char in lookahead_buf */
+
 static int pad_output(int current, int target);
 
 void
@@ -252,6 +259,58 @@ compute_label_target(void)
  : ps.ind_size * (ps.ind_level - label_offset) + 1;
 }
 
+/*
+ * Read data ahead of what has been collected into in_buffer.
+ *
+ * Successive calls get further and further ahead, until we hit EOF.
+ * Call lookahead_reset to rescan from just beyond in_buffer.
+ */
+int
+lookahead(void)
+{
+    while (lookahead_ptr >= lookahead_end) {
+      int i = getc(input);
+
+      if (i == EOF)
+ return i;
+      if (i == '\0')
+ continue; /* fill_buffer drops nulls, so do we */
+
+      if (lookahead_end >= lookahead_buf_end) {
+ /* Need to allocate or enlarge lookahead_buf */
+ char *new_buf;
+ size_t req;
+
+ if (lookahead_buf == NULL) {
+  req = 64;
+  new_buf = malloc(req);
+ } else {
+  req = (lookahead_buf_end - lookahead_buf) * 2;
+  new_buf = realloc(lookahead_buf, req);
+ }
+ if (new_buf == NULL)
+  errx(1, "too much lookahead required");
+ lookahead_start = new_buf + (lookahead_start - lookahead_buf);
+ lookahead_ptr = new_buf + (lookahead_ptr - lookahead_buf);
+ lookahead_end = new_buf + (lookahead_end - lookahead_buf);
+ lookahead_buf = new_buf;
+ lookahead_buf_end = new_buf + req;
+      }
+
+      *lookahead_end++ = i;
+    }
+    return (unsigned char) *lookahead_ptr++;
+}
+
+/*
+ * Reset so that lookahead() will again scan from just beyond what's in
+ * in_buffer.
+ */
+void
+lookahead_reset(void)
+{
+    lookahead_ptr = lookahead_start;
+}
 
 /*
  * Copyright (C) 1976 by the Board of Trustees of the University of Illinois
@@ -293,11 +352,16 @@ fill_buffer(void)
     p = in_buffer + offset;
     in_buffer_limit = in_buffer + size - 2;
  }
- if ((i = getc(f)) == EOF) {
- *p++ = ' ';
- *p++ = '\n';
- had_eof = true;
- break;
+ if (lookahead_start < lookahead_end) {
+  i = (unsigned char) *lookahead_start++;
+ } else {
+  lookahead_start = lookahead_ptr = lookahead_end = lookahead_buf;
+  if ((i = getc(f)) == EOF) {
+    *p++ = ' ';
+    *p++ = '\n';
+    had_eof = true;
+    break;
+  }
  }
  if (i != '\0')
     *p++ = i;
diff --git a/lexi.c b/lexi.c
index 3c7bfef..df71a20 100644
--- a/lexi.c
+++ b/lexi.c
@@ -148,6 +148,54 @@ strcmp_type(const void *e1, const void *e2)
     return (strcmp(e1, *(const char * const *)e2));
 }
 
+/*
+ * Scan over a function argument declaration list, then see if it is
+ * followed by '{', indicating that it's a function definition.
+ * If it's followed by ';' or ',', it's a prototype.  Otherwise keep
+ * scanning, because there might be whitespace, comments, or attribute
+ * declarations before we get to the telltale punctuation.
+ *
+ * Note that this will make the wrong decision for a K&R-style function
+ * definition.  Too bad.
+ */
+static int
+is_func_definition(char *tp)
+{
+  int paren_depth = 0;
+  int in_comment = false;
+  int lastc = 0;
+
+  lookahead_reset();
+  for (;;) {
+    int c;
+
+    if (tp < buf_end)
+      c = *tp++;
+    else {
+ c = lookahead();
+ if (c == EOF)
+  break;
+    }
+    if (in_comment) {
+      if (lastc == '*' && c == '/')
+ in_comment = false;
+    } else if (lastc == '/' && c == '*')
+      in_comment = true;
+    else if (c == '(')
+      paren_depth++;
+    else if (c == ')') {
+      paren_depth--;
+      if (paren_depth < 0)
+ return false;
+    } else if (paren_depth == 0 && c == '{')
+      return true;
+    else if (paren_depth == 0 && (c == ';' || c == ','))
+      return false;
+    lastc = c;
+  }
+  return false;
+}
+
 int
 lexi(struct parser_state *state)
 {
@@ -348,15 +396,12 @@ lexi(struct parser_state *state)
  } /* end of if (found_it) */
  if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 &&
     state->in_parameter_declaration == 0 && state->block_init == 0) {
-    char *tp = buf_ptr;
-    while (tp < buf_end)
- if (*tp++ == ')' && (*tp == ';' || *tp == ','))
-    goto not_proc;
+  if (is_func_definition(buf_ptr)) {
     strncpy(state->procname, token, sizeof state->procname - 1);
     if (state->in_decl)
  state->in_parameter_declaration = 1;
     return (funcname);
-    not_proc:;
+  }
  }
  /*
  * The following hack attempts to guess whether or not the current
diff --git a/tests/declarations.0.stdout b/tests/declarations.0.stdout
index e97e447..ec63596 100644
--- a/tests/declarations.0.stdout
+++ b/tests/declarations.0.stdout
@@ -64,10 +64,10 @@ static int ald_shutingdown = 0;
 struct thread  *ald_thread;
 
 static int
-do_execve(td, args, mac_p)
- struct thread  *td;
- struct image_args *args;
- struct mac     *mac_p;
+ do_execve(td, args, mac_p)
+struct thread  *td;
+struct image_args *args;
+struct mac     *mac_p;
 {
 
 }
diff --git a/tests/list_head.0.stdout b/tests/list_head.0.stdout
index b6f0762..a8f985f 100644
--- a/tests/list_head.0.stdout
+++ b/tests/list_head.0.stdout
@@ -1,10 +1,10 @@
 /* $FreeBSD$ */
 /* See r309380 */
 static int
-do_execve(td, args, mac_p)
- struct thread  *td;
- struct image_args *args;
- struct mac     *mac_p;
+ do_execve(td, args, mac_p)
+struct thread  *td;
+struct image_args *args;
+struct mac     *mac_p;
 {
 
 }

indent-delta-results.patch.gz (335K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Thomas Munro-5
On Thu, May 16, 2019 at 9:17 AM Tom Lane <[hidden email]> wrote:

> I wrote:
> > I experimented with fixing this.  I was able to get pg_bsd_indent to
> > distinguish multi-line function declarations from definitions, but it
> > turns out that it doesn't help your concern about the lines being too
> > long after re-indenting.  Contrary to what you imagine above, it seems
> > pg_bsd_indent will not reflow argument lists, regardless of whether it
> > thinks there needs to be more or less leading whitespace.
>
> Actually, now that I think about it, pgindent seldom revisits line-break
> decisions in code anyway; it's only aggressive about reflowing comments.
> So maybe we shouldn't be expecting it to fix this.

Ahh, so my quick and dirty perl change was probably OK then.  Sorry I
didn't realise that before, but this certainly a better way, fixing
the right thing.

> A small problem with the "rejiggering" is that it now makes the wrong
> choice for K&R-style function definitions, causing them to be weirdly
> indented.  For our purposes, that's a non-problem so I'm not excited
> about trying to make it smart enough to recognize those.  We do have
> a couple of amazingly old and crufty K&R-style functions in src/port/,
> though, so probably we'd wish to fix those.

What kid of fork is pg_bsd_indent... do we care about upstreaming
changes?  I guess someone would need to deal with that case eventually
if so.

> Attached is working draft of pg_bsd_indent changes (still sans comments)
> as well as a patch showing the difference between current pgindent
> results on HEAD and the results of this version.  I think there's no
> question that this is an improvement.

+1

Thanks for looking into it!

--
Thomas Munro
https://enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> On Thu, May 16, 2019 at 9:17 AM Tom Lane <[hidden email]> wrote:
>> A small problem with the "rejiggering" is that it now makes the wrong
>> choice for K&R-style function definitions, causing them to be weirdly
>> indented.  For our purposes, that's a non-problem so I'm not excited
>> about trying to make it smart enough to recognize those.  We do have
>> a couple of amazingly old and crufty K&R-style functions in src/port/,
>> though, so probably we'd wish to fix those.

> What kid of fork is pg_bsd_indent... do we care about upstreaming
> changes?  I guess someone would need to deal with that case eventually
> if so.

We regard the FreeBSD copy as upstream, and I think we're mostly in sync
with that but only mostly.  So it would come down to whether the FreeBSD
maintainer is worried about K&R mode and what he wants to do about that.
Piotr, that's still you isn't it?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
I wrote:
>>> A small problem with the "rejiggering" is that it now makes the wrong
>>> choice for K&R-style function definitions, causing them to be weirdly
>>> indented.  For our purposes, that's a non-problem so I'm not excited
>>> about trying to make it smart enough to recognize those.  We do have
>>> a couple of amazingly old and crufty K&R-style functions in src/port/,
>>> though, so probably we'd wish to fix those.

To be concrete about that: the existing behavior when trying to decide
whether "foo(" starts a function declaration or a function definition
is to scan to the matching right paren[1] and then, if the immediately
next character is ';' or ',', it's a declaration; else it's a definition.
This fails on decls with attributes, such as

extern void pg_re_throw(void) __attribute__((noreturn));

My patch changes the rule to be "scan until we see a ';' ',' or '{'
that's not within parens; then it's a definition if that character
is '{', otherwise a declaration".  So it gets the right answer
for the above, but the wrong answer for

int
isinf(x)
double x;
{

It doesn't really seem practical to me to make the lookahead function
smart enough to tell the difference between attributes and K&R-style
parameter declarations.  What I'm thinking of doing to have an
upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr',
to indicate whether the user is more worried about K&R function
declarations than she is about function attributes.  The default
for this switch could be debated perhaps, but I'd just stick an
explicit -nkr into pgindent's invocation, so we wouldn't care too
much.  (This would also ensure that pgindent would fail if you
try to use a too-old copy of pg_bsd_indent...)

                        regards, tom lane

[1] Um, well, actually the existing behavior is to scan to the
*first* right paren.  But upgrading it to count parens correctly
seems like an unobjectionable improvement in any case.


Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Piotr Stefaniak
On 17/05/2019 16.48, Tom Lane wrote:
> I wrote:
>>>> A small problem with the "rejiggering" is that it now makes the wrong
>>>> choice for K&R-style function definitions, causing them to be weirdly
>>>> indented.  For our purposes, that's a non-problem so I'm not excited
>>>> about trying to make it smart enough to recognize those.  We do have
>>>> a couple of amazingly old and crufty K&R-style functions in src/port/,
>>>> though, so probably we'd wish to fix those.

> It doesn't really seem practical to me to make the lookahead function
> smart enough to tell the difference between attributes and K&R-style
> parameter declarations.  What I'm thinking of doing to have an
> upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr',
> to indicate whether the user is more worried about K&R function
> declarations than she is about function attributes.

I think it's safe to assume that upstream can drop support for K&R-style
parameters altogether.
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
Piotr Stefaniak <[hidden email]> writes:
> On 17/05/2019 16.48, Tom Lane wrote:
>> It doesn't really seem practical to me to make the lookahead function
>> smart enough to tell the difference between attributes and K&R-style
>> parameter declarations.  What I'm thinking of doing to have an
>> upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr',
>> to indicate whether the user is more worried about K&R function
>> declarations than she is about function attributes.

> I think it's safe to assume that upstream can drop support for K&R-style
> parameters altogether.

Cool.  I already created the switch, but maybe we could have it
default to -nkr?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Piotr Stefaniak
On 19/05/2019 19.27, Tom Lane wrote:

> Piotr Stefaniak <[hidden email]> writes:
>> On 17/05/2019 16.48, Tom Lane wrote:
>>> It doesn't really seem practical to me to make the lookahead function
>>> smart enough to tell the difference between attributes and K&R-style
>>> parameter declarations.  What I'm thinking of doing to have an
>>> upstreamable patch is to invent a new switch, perhaps '-kr'/'-nkr',
>>> to indicate whether the user is more worried about K&R function
>>> declarations than she is about function attributes.
>
>> I think it's safe to assume that upstream can drop support for K&R-style
>> parameters altogether.
>
> Cool.  I already created the switch, but maybe we could have it
> default to -nkr?

Sorry, but GNU indent already uses -kr for something else and I would
like FreeBSD indent have something like that under the same name.
Besides, indent has too many options and this one doesn't look like
particularly desired by anyone. It's possible that someone will complain
some day, but I don't think we should assume that they'll do or that
they're more important than the other users who benefit from your change
being the default behavior and no additional options.
Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
Piotr Stefaniak <[hidden email]> writes:
> On 19/05/2019 19.27, Tom Lane wrote:
>> Piotr Stefaniak <[hidden email]> writes:
>>> I think it's safe to assume that upstream can drop support for K&R-style
>>> parameters altogether.

>> Cool.  I already created the switch, but maybe we could have it
>> default to -nkr?

> Sorry, but GNU indent already uses -kr for something else and I would
> like FreeBSD indent have something like that under the same name.
> Besides, indent has too many options and this one doesn't look like
> particularly desired by anyone. It's possible that someone will complain
> some day, but I don't think we should assume that they'll do or that
> they're more important than the other users who benefit from your change
> being the default behavior and no additional options.

Huh.  OK, I'll rip the switch back out again.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Emacs vs pg_indent's weird indentation for function declarations

Tom Lane-2
I wrote:
> Piotr Stefaniak <[hidden email]> writes:
>> Sorry, but GNU indent already uses -kr for something else and I would
>> like FreeBSD indent have something like that under the same name.
>> Besides, indent has too many options and this one doesn't look like
>> particularly desired by anyone. It's possible that someone will complain
>> some day, but I don't think we should assume that they'll do or that
>> they're more important than the other users who benefit from your change
>> being the default behavior and no additional options.

> Huh.  OK, I'll rip the switch back out again.

Here's a proposed patch for you.

                        regards, tom lane


diff --git a/args.c b/args.c
index f79de75..1850542 100644
--- a/args.c
+++ b/args.c
@@ -55,7 +55,7 @@ static char sccsid[] = "@(#)args.c 8.1 (Berkeley) 6/6/93";
 #include "indent_globs.h"
 #include "indent.h"
 
-#define INDENT_VERSION "2.0"
+#define INDENT_VERSION "2.1"
 
 /* profile types */
 #define PRO_SPECIAL 1 /* special case */
diff --git a/indent.h b/indent.h
index 0fffd89..1708dbc 100644
--- a/indent.h
+++ b/indent.h
@@ -41,6 +41,8 @@ void diag2(int, const char *);
 void diag3(int, const char *, int);
 void diag4(int, const char *, int, int);
 void dump_line(void);
+int lookahead(void);
+void lookahead_reset(void);
 void fill_buffer(void);
 void parse(int);
 void pr_comment(void);
diff --git a/io.c b/io.c
index df11094..fbaa5dd 100644
--- a/io.c
+++ b/io.c
@@ -51,6 +51,14 @@ static char sccsid[] = "@(#)io.c 8.1 (Berkeley) 6/6/93";
 
 int         comment_open;
 static int  paren_target;
+
+static char *lookahead_buf; /* malloc'd buffer, or NULL initially */
+static char *lookahead_buf_end; /* end+1 of allocated space */
+static char *lookahead_start; /* => next char for fill_buffer() to fetch */
+static char *lookahead_ptr; /* => next char for lookahead() to fetch */
+static char *lookahead_end; /* last+1 valid char in lookahead_buf */
+static char *lookahead_bp_save; /* lookahead position in bp_save, if any */
+
 static int pad_output(int current, int target);
 
 void
@@ -252,6 +260,73 @@ compute_label_target(void)
  : ps.ind_size * (ps.ind_level - label_offset) + 1;
 }
 
+/*
+ * Read data ahead of what has been collected into in_buffer.
+ *
+ * Successive calls get further and further ahead, until we hit EOF.
+ * Call lookahead_reset() to rescan from just beyond in_buffer.
+ *
+ * Lookahead is automatically reset whenever fill_buffer() reads beyond
+ * the lookahead buffer, i.e., you can't use this for "look behind".
+ *
+ * The standard pattern for potentially multi-line lookahead is to call
+ * lookahead_reset(), then enter a loop that scans forward from buf_ptr
+ * to buf_end, then (if necessary) calls lookahead() to read additional
+ * characters from beyond the end of the current line.
+ */
+int
+lookahead(void)
+{
+    /* First read whatever's in bp_save area */
+    if (lookahead_bp_save != NULL && lookahead_bp_save < be_save)
+ return (unsigned char) *lookahead_bp_save++;
+    /* Else, we have to examine and probably fill the main lookahead buffer */
+    while (lookahead_ptr >= lookahead_end) {
+ int    i = getc(input);
+
+ if (i == EOF)
+    return i;
+ if (i == '\0')
+    continue; /* fill_buffer drops nulls, and so do we */
+
+ if (lookahead_end >= lookahead_buf_end) {
+    /* Need to allocate or enlarge lookahead_buf */
+    char       *new_buf;
+    size_t req;
+
+    if (lookahead_buf == NULL) {
+ req = 64;
+ new_buf = malloc(req);
+    } else {
+ req = (lookahead_buf_end - lookahead_buf) * 2;
+ new_buf = realloc(lookahead_buf, req);
+    }
+    if (new_buf == NULL)
+ errx(1, "too much lookahead required");
+    lookahead_start = new_buf + (lookahead_start - lookahead_buf);
+    lookahead_ptr = new_buf + (lookahead_ptr - lookahead_buf);
+    lookahead_end = new_buf + (lookahead_end - lookahead_buf);
+    lookahead_buf = new_buf;
+    lookahead_buf_end = new_buf + req;
+ }
+
+ *lookahead_end++ = i;
+    }
+    return (unsigned char) *lookahead_ptr++;
+}
+
+/*
+ * Reset so that lookahead() will again scan from just beyond what's in
+ * in_buffer.
+ */
+void
+lookahead_reset(void)
+{
+    /* Reset the main lookahead buffer */
+    lookahead_ptr = lookahead_start;
+    /* If bp_save isn't NULL, we need to scan that first */
+    lookahead_bp_save = bp_save;
+}
 
 /*
  * Copyright (C) 1976 by the Board of Trustees of the University of Illinois
@@ -261,7 +336,9 @@ compute_label_target(void)
  *
  * NAME: fill_buffer
  *
- * FUNCTION: Reads one block of input into input_buffer
+ * FUNCTION: Reads one line of input into in_buffer,
+ * sets up buf_ptr and buf_end to point to the line's start and end+1.
+ * (Note that the buffer does not get null-terminated.)
  *
  * HISTORY: initial coding November 1976 D A Willcox of CAC 1/7/77 A
  * Willcox of CAC Added check for switch back to partly full input
@@ -279,6 +356,7 @@ fill_buffer(void)
  buf_ptr = bp_save; /* do not read anything, just switch buffers */
  buf_end = be_save;
  bp_save = be_save = NULL;
+ lookahead_bp_save = NULL;
  if (buf_ptr < buf_end)
     return; /* only return if there is really something in
  * this buffer */
@@ -293,16 +371,21 @@ fill_buffer(void)
     p = in_buffer + offset;
     in_buffer_limit = in_buffer + size - 2;
  }
- if ((i = getc(f)) == EOF) {
+ if (lookahead_start < lookahead_end) {
+    i = (unsigned char) *lookahead_start++;
+ } else {
+    lookahead_start = lookahead_ptr = lookahead_end = lookahead_buf;
+    if ((i = getc(f)) == EOF) {
  *p++ = ' ';
  *p++ = '\n';
  had_eof = true;
  break;
+    }
  }
  if (i != '\0')
     *p++ = i;
  if (i == '\n')
- break;
+    break;
     }
     buf_ptr = in_buffer;
     buf_end = p;
diff --git a/lexi.c b/lexi.c
index 3c7bfef..d43723c 100644
--- a/lexi.c
+++ b/lexi.c
@@ -148,6 +148,74 @@ strcmp_type(const void *e1, const void *e2)
     return (strcmp(e1, *(const char * const *)e2));
 }
 
+/*
+ * Decide whether "foo(..." is a function definition or declaration.
+ *
+ * At call, we are looking at the '('.  Look ahead to find the first
+ * '{', ';' or ',' that is not within parentheses or comments; then
+ * it's a definition if we found '{', otherwise a declaration.
+ * Note that this rule is fooled by K&R-style parameter declarations,
+ * but telling the difference between those and function attributes
+ * seems like more trouble than it's worth.  This code could also be
+ * fooled by mismatched parens or apparent comment starts within string
+ * literals, but that seems unlikely in the context it's used in.
+ */
+static int
+is_func_definition(char *tp)
+{
+    int paren_depth = 0;
+    int in_comment = false;
+    int in_slash_comment = false;
+    int lastc = 0;
+
+    /* We may need to look past the end of the current buffer. */
+    lookahead_reset();
+    for (;;) {
+ int    c;
+
+ /* Fetch next character. */
+ if (tp < buf_end)
+    c = *tp++;
+ else {
+    c = lookahead();
+    if (c == EOF)
+ break;
+ }
+ /* Handle comments. */
+ if (in_comment) {
+    if (lastc == '*' && c == '/')
+ in_comment = false;
+ } else if (lastc == '/' && c == '*' && !in_slash_comment)
+    in_comment = true;
+ else if (in_slash_comment) {
+    if (c == '\n')
+ in_slash_comment = false;
+ } else if (lastc == '/' && c == '/')
+    in_slash_comment = true;
+ /* Count nested parens properly. */
+ else if (c == '(')
+    paren_depth++;
+ else if (c == ')') {
+    paren_depth--;
+    /*
+     * If we find unbalanced parens, we must have started inside a
+     * declaration.
+     */
+    if (paren_depth < 0)
+ return false;
+ } else if (paren_depth == 0) {
+    /* We are outside any parentheses or comments. */
+    if (c == '{')
+ return true;
+    else if (c == ';' || c == ',')
+ return false;
+ }
+ lastc = c;
+    }
+    /* Hit EOF --- for lack of anything better, assume "not a definition". */
+    return false;
+}
+
 int
 lexi(struct parser_state *state)
 {
@@ -348,15 +416,12 @@ lexi(struct parser_state *state)
  } /* end of if (found_it) */
  if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 &&
     state->in_parameter_declaration == 0 && state->block_init == 0) {
-    char *tp = buf_ptr;
-    while (tp < buf_end)
- if (*tp++ == ')' && (*tp == ';' || *tp == ','))
-    goto not_proc;
-    strncpy(state->procname, token, sizeof state->procname - 1);
-    if (state->in_decl)
- state->in_parameter_declaration = 1;
-    return (funcname);
-    not_proc:;
+    if (is_func_definition(buf_ptr)) {
+ strncpy(state->procname, token, sizeof state->procname - 1);
+ if (state->in_decl)
+    state->in_parameter_declaration = 1;
+ return (funcname);
+    }
  }
  /*
  * The following hack attempts to guess whether or not the current
diff --git a/tests/declarations.0 b/tests/declarations.0
index 6d668b1..8419494 100644
--- a/tests/declarations.0
+++ b/tests/declarations.0
@@ -70,10 +70,10 @@ static int ald_shutingdown = 0;
 struct thread *ald_thread;
 
 static int
-do_execve(td, args, mac_p)
- struct thread *td;
- struct image_args *args;
- struct mac *mac_p;
+do_execve(
+struct thread *td,
+struct image_args *args,
+struct mac *mac_p)
 {
 
 }
diff --git a/tests/declarations.0.stdout b/tests/declarations.0.stdout
index e97e447..ab5a447 100644
--- a/tests/declarations.0.stdout
+++ b/tests/declarations.0.stdout
@@ -64,10 +64,10 @@ static int ald_shutingdown = 0;
 struct thread  *ald_thread;
 
 static int
-do_execve(td, args, mac_p)
- struct thread  *td;
- struct image_args *args;
- struct mac     *mac_p;
+do_execve(
+  struct thread *td,
+  struct image_args *args,
+  struct mac *mac_p)
 {
 
 }
diff --git a/tests/list_head.0 b/tests/list_head.0
index 3a186ca..35874eb 100644
--- a/tests/list_head.0
+++ b/tests/list_head.0
@@ -1,10 +1,9 @@
 /* $FreeBSD$ */
 /* See r309380 */
 static int
-do_execve(td, args, mac_p)
- struct thread *td;
- struct image_args *args;
- struct mac *mac_p;
+do_execve(struct thread *td,
+struct image_args *args,
+struct mac *mac_p)
 {
 
 }
diff --git a/tests/list_head.0.stdout b/tests/list_head.0.stdout
index b6f0762..2ebcca5 100644
--- a/tests/list_head.0.stdout
+++ b/tests/list_head.0.stdout
@@ -1,10 +1,9 @@
 /* $FreeBSD$ */
 /* See r309380 */
 static int
-do_execve(td, args, mac_p)
- struct thread  *td;
- struct image_args *args;
- struct mac     *mac_p;
+do_execve(struct thread *td,
+  struct image_args *args,
+  struct mac *mac_p)
 {
 
 }