Displaying and dumping of table access methods

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Displaying and dumping of table access methods

Andres Freund
Hi,

Over in [1] we're discussing the development of the pluggable storage
patchset, which allows different ways of storing table data.

One thing I'd like to discuss with a wider audience than the
implementation details is psql and pg_dump handling of table access
methods.

Currently the patchset neither dumps nor displays table access
methods . That's clearly not right.

The reason for that however is not that it's hard to dump/display
code-wise, but that to me the correct behaviour is not obvious.

The reason to make table storage pluggable is after all that the table
access method can be changed, and part of developing new table access
methods is being able to run the regression tests.

A patch at [2] adds display of a table's access method to \d+ - but that
means that running the tests with a different default table access
method (e.g. using PGOPTIONS='-c default_table_access_method=...)
there'll be a significant number of test failures, even though the test
results did not meaningfully differ.

Similarly, if pg_dump starts to dump table access methods either
unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
unimportant differences.

A third issue, less important in my opinion, is that specifying the
table access method means that it's harder to dump/restore into a table
with a different AM.


One way to solve this would be for psql/pg_dump to only define the table
access methods for tables that differ from the currently configured
default_table_access_method. That'd mean that only a few tests that
intentionally test AMs would display/dump the access method.  On the
other hand that seems like it's a bit too much magic.

While I don't like that option, I haven't really come up with something
better.  Having alternative outputs for nearly every test file for
e.g. zheap if/when we merge it, seems unmaintainable. It's less insane
for the pg_dump tests.

An alternative approach based on that would be to hack pg_regress to
magically ignore "Access method: ..." type differences, but that seems
like a bad idea to me.

Greetings,

Andres Freund

[1] https://postgr.es/m/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de
[2] https://postgr.es/m/CA+q6zcWMHSbLkKO7eq95t15m3R1X9FCcm0kT3dGS2gGSRO9kKw@...
[3] https://postgr.es/m/20181215193700.nov7bphxyge4qkez@...

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Stephen Frost
Greetings,

* Andres Freund ([hidden email]) wrote:
> Over in [1] we're discussing the development of the pluggable storage
> patchset, which allows different ways of storing table data.
>
> One thing I'd like to discuss with a wider audience than the
> implementation details is psql and pg_dump handling of table access
> methods.
>
> Currently the patchset neither dumps nor displays table access
> methods . That's clearly not right.

Agreed.

> The reason for that however is not that it's hard to dump/display
> code-wise, but that to me the correct behaviour is not obvious.

While it might be a lot of changes to the regression output results, I
tend to feel that showng the access method is a very important aspect of
the relation and therefore should be in \d output.

> The reason to make table storage pluggable is after all that the table
> access method can be changed, and part of developing new table access
> methods is being able to run the regression tests.

We certainly want people to be able to run the regression tests, but it
feels like we will need more regression tests in the future as we wish
to cover both the built-in heap AM and the new zheap AM, so we should
really have those both run independently.  I don't think we'll be able
to have just one set of regression tests that cover everything
interesting for both, sadly.  Perhaps there's a way to have a set of
regression tests which are run for both, and another set that's run for
each, but building all of that logic is a fair bit of work and I'm not
sure how much it's really saving us.

> A patch at [2] adds display of a table's access method to \d+ - but that
> means that running the tests with a different default table access
> method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> there'll be a significant number of test failures, even though the test
> results did not meaningfully differ.

Yeah, I'm not really thrilled with this approach.

> Similarly, if pg_dump starts to dump table access methods either
> unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> unimportant differences.

In reality, pg_dump already depends on quite a few defaults to be in
place, so I don't see a particular issue with adding this into that set.
New tests would need to have new pg_dump checks, of course, but that's
generally the case as well.

> A third issue, less important in my opinion, is that specifying the
> table access method means that it's harder to dump/restore into a table
> with a different AM.

I understand this concern but I view it as an independent consideration.
There are a lot of transformations which one might wish for when dumping
and restoring data, a number of which are handled through various
options (--no-owner, --no-acls, etc) and it seems like we could do
something similar here.

> One way to solve this would be for psql/pg_dump to only define the table
> access methods for tables that differ from the currently configured
> default_table_access_method. That'd mean that only a few tests that
> intentionally test AMs would display/dump the access method.  On the
> other hand that seems like it's a bit too much magic.

I'm not a fan of depending on the currently set
default_table_access_method..  When would that be set in the pg_restore
process?  Or in the SQL script that's created?  Really though, that does
strike me as quite a bit of magic.

> While I don't like that option, I haven't really come up with something
> better.  Having alternative outputs for nearly every test file for
> e.g. zheap if/when we merge it, seems unmaintainable. It's less insane
> for the pg_dump tests.

I'm thinking less of alternative output files and more of additional
tests for zheap cases...

> An alternative approach based on that would be to hack pg_regress to
> magically ignore "Access method: ..." type differences, but that seems
> like a bad idea to me.

I agree, that's not a good idea.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Andres Freund
Hi,

On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:

> Greetings,
>
> * Andres Freund ([hidden email]) wrote:
> > Over in [1] we're discussing the development of the pluggable storage
> > patchset, which allows different ways of storing table data.
> >
> > One thing I'd like to discuss with a wider audience than the
> > implementation details is psql and pg_dump handling of table access
> > methods.
> >
> > Currently the patchset neither dumps nor displays table access
> > methods . That's clearly not right.
>
> Agreed.
>
> > The reason for that however is not that it's hard to dump/display
> > code-wise, but that to me the correct behaviour is not obvious.
>
> While it might be a lot of changes to the regression output results, I
> tend to feel that showng the access method is a very important aspect of
> the relation and therefore should be in \d output.

I don't see how that'd be feasible.  Imo it is/was absolutely crucial
for zheap development to be able to use the existing postgres tests.


> > The reason to make table storage pluggable is after all that the table
> > access method can be changed, and part of developing new table access
> > methods is being able to run the regression tests.
>
> We certainly want people to be able to run the regression tests, but it
> feels like we will need more regression tests in the future as we wish
> to cover both the built-in heap AM and the new zheap AM, so we should
> really have those both run independently.  I don't think we'll be able
> to have just one set of regression tests that cover everything
> interesting for both, sadly.  Perhaps there's a way to have a set of
> regression tests which are run for both, and another set that's run for
> each, but building all of that logic is a fair bit of work and I'm not
> sure how much it's really saving us.

I don't think there's any sort of contradiction here. I don't think it's
feasible to have tests tests for every feature duplicated for each
supported AM, we have enough difficulty maintaining our current
tests. But that doesn't mean it's a problem to have individual test
[files] run with an explicitly assigned AM - the test can just do a SET
default_table_access_method = zheap; or explicitly say USING zheap.

> > A patch at [2] adds display of a table's access method to \d+ - but that
> > means that running the tests with a different default table access
> > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > there'll be a significant number of test failures, even though the test
> > results did not meaningfully differ.
>
> Yeah, I'm not really thrilled with this approach.
>
> > Similarly, if pg_dump starts to dump table access methods either
> > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > unimportant differences.
>
> In reality, pg_dump already depends on quite a few defaults to be in
> place, so I don't see a particular issue with adding this into that set.
> New tests would need to have new pg_dump checks, of course, but that's
> generally the case as well.

I am not sure what you mean here? Are you suggesting that there'd be a
separate set of pg_dump test for zheap and every other possible future
AM?


To me the approach you're suggesting is going to lead to an explosion of
redundant tests, which are really hard to maintain, especially for
out-of-tree AMs. Out of tree AMs with the setup I propose can just
install the AM into the template database and set PGOPTIONS, and they
have pretty good coverage.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Stephen Frost
Greetings,

* Andres Freund ([hidden email]) wrote:

> On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > * Andres Freund ([hidden email]) wrote:
> > > Over in [1] we're discussing the development of the pluggable storage
> > > patchset, which allows different ways of storing table data.
> > >
> > > One thing I'd like to discuss with a wider audience than the
> > > implementation details is psql and pg_dump handling of table access
> > > methods.
> > >
> > > Currently the patchset neither dumps nor displays table access
> > > methods . That's clearly not right.
> >
> > Agreed.
> >
> > > The reason for that however is not that it's hard to dump/display
> > > code-wise, but that to me the correct behaviour is not obvious.
> >
> > While it might be a lot of changes to the regression output results, I
> > tend to feel that showng the access method is a very important aspect of
> > the relation and therefore should be in \d output.
>
> I don't see how that'd be feasible.  Imo it is/was absolutely crucial
> for zheap development to be able to use the existing postgres tests.
I don't agree with the general assumption that "we did this for
development and therefore it should be done that way forever".

Instead, I would look at adding tests where there's a difference
between the two, or possibly some difference, and make sure that there
isn't, and make sure that the code paths are covered.

> > > The reason to make table storage pluggable is after all that the table
> > > access method can be changed, and part of developing new table access
> > > methods is being able to run the regression tests.
> >
> > We certainly want people to be able to run the regression tests, but it
> > feels like we will need more regression tests in the future as we wish
> > to cover both the built-in heap AM and the new zheap AM, so we should
> > really have those both run independently.  I don't think we'll be able
> > to have just one set of regression tests that cover everything
> > interesting for both, sadly.  Perhaps there's a way to have a set of
> > regression tests which are run for both, and another set that's run for
> > each, but building all of that logic is a fair bit of work and I'm not
> > sure how much it's really saving us.
>
> I don't think there's any sort of contradiction here. I don't think it's
> feasible to have tests tests for every feature duplicated for each
> supported AM, we have enough difficulty maintaining our current
> tests. But that doesn't mean it's a problem to have individual test
> [files] run with an explicitly assigned AM - the test can just do a SET
> default_table_access_method = zheap; or explicitly say USING zheap.
I don't mean to suggest that there's a contradiction.  I don't have any
problem with new tests being added which set the default AM to zheap, as
long as it's clear that such is happening for downstream tests.

> > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > means that running the tests with a different default table access
> > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > there'll be a significant number of test failures, even though the test
> > > results did not meaningfully differ.
> >
> > Yeah, I'm not really thrilled with this approach.
> >
> > > Similarly, if pg_dump starts to dump table access methods either
> > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > > unimportant differences.
> >
> > In reality, pg_dump already depends on quite a few defaults to be in
> > place, so I don't see a particular issue with adding this into that set.
> > New tests would need to have new pg_dump checks, of course, but that's
> > generally the case as well.
>
> I am not sure what you mean here? Are you suggesting that there'd be a
> separate set of pg_dump test for zheap and every other possible future
> AM?
I'm suggesting that pg_dump would have additional tests for zheap, in
addition to the existing tests we already have.  No more, no less,
really.

> To me the approach you're suggesting is going to lead to an explosion of
> redundant tests, which are really hard to maintain, especially for
> out-of-tree AMs. Out of tree AMs with the setup I propose can just
> install the AM into the template database and set PGOPTIONS, and they
> have pretty good coverage.

I'm frankly much less interested in out-of-tree AMs as we aren't going
to have in-core regression tests for them anyway, because we can't as
they aren't in our tree and, ultimately, I don't find them to have
anywhere near the same value that in-core AMs have.

I don't have any problem with out-of-tree AMs hacking things up as they
see fit and then running whatever tests they want, but it is, and always
will be, a very different discussion and ultimately a different result
when we're talking about what will be incorporated and supported as part
of core.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Andres Freund
Hi,

On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:

> * Andres Freund ([hidden email]) wrote:
> > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > > * Andres Freund ([hidden email]) wrote:
> > > > Over in [1] we're discussing the development of the pluggable storage
> > > > patchset, which allows different ways of storing table data.
> > > >
> > > > One thing I'd like to discuss with a wider audience than the
> > > > implementation details is psql and pg_dump handling of table access
> > > > methods.
> > > >
> > > > Currently the patchset neither dumps nor displays table access
> > > > methods . That's clearly not right.
> > >
> > > Agreed.
> > >
> > > > The reason for that however is not that it's hard to dump/display
> > > > code-wise, but that to me the correct behaviour is not obvious.
> > >
> > > While it might be a lot of changes to the regression output results, I
> > > tend to feel that showng the access method is a very important aspect of
> > > the relation and therefore should be in \d output.
> >
> > I don't see how that'd be feasible.  Imo it is/was absolutely crucial
> > for zheap development to be able to use the existing postgres tests.
>
> I don't agree with the general assumption that "we did this for
> development and therefore it should be done that way forever".
>
> Instead, I would look at adding tests where there's a difference
> between the two, or possibly some difference, and make sure that there
> isn't, and make sure that the code paths are covered.

I think this approach makes no sense whatsover.  It's entirely possible
to encounter bugs in table AM relevant code in places one would not
think so. But even if one were, foolishly, to exclude those, the pieces
of code we know are highly affected by the way the AM works are a a
significant (at the very least 10-20% of tests) percentage of our
tests. Duplicating them even just between heap and zheap would be a
major technical debt.  DML, ON CONFLICT, just about all isolationtester
test, etc all are absolutely crucial to test different AMs for
correctness.


> > To me the approach you're suggesting is going to lead to an explosion of
> > redundant tests, which are really hard to maintain, especially for
> > out-of-tree AMs. Out of tree AMs with the setup I propose can just
> > install the AM into the template database and set PGOPTIONS, and they
> > have pretty good coverage.
>
> I'm frankly much less interested in out-of-tree AMs as we aren't going
> to have in-core regression tests for them anyway, because we can't as
> they aren't in our tree and, ultimately, I don't find them to have
> anywhere near the same value that in-core AMs have.

I think you must be missing my point: Adding spurious differences due to
"Table Access Method: heap" vs "Table Access Method: blarg" makes it
unnecessarily hard to reuse the in-core tests for any additional AM, be
it in-core or out of core.  I fail to see what us not having explicit
tests for such AMs in core has to do with my point.

Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
better from a maintainability POV than including the AM in the output.

Andres

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Stephen Frost
Greetings,

* Andres Freund ([hidden email]) wrote:

> On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> > * Andres Freund ([hidden email]) wrote:
> > > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > > > * Andres Freund ([hidden email]) wrote:
> > > > > Over in [1] we're discussing the development of the pluggable storage
> > > > > patchset, which allows different ways of storing table data.
> > > > >
> > > > > One thing I'd like to discuss with a wider audience than the
> > > > > implementation details is psql and pg_dump handling of table access
> > > > > methods.
> > > > >
> > > > > Currently the patchset neither dumps nor displays table access
> > > > > methods . That's clearly not right.
> > > >
> > > > Agreed.
> > > >
> > > > > The reason for that however is not that it's hard to dump/display
> > > > > code-wise, but that to me the correct behaviour is not obvious.
> > > >
> > > > While it might be a lot of changes to the regression output results, I
> > > > tend to feel that showng the access method is a very important aspect of
> > > > the relation and therefore should be in \d output.
> > >
> > > I don't see how that'd be feasible.  Imo it is/was absolutely crucial
> > > for zheap development to be able to use the existing postgres tests.
> >
> > I don't agree with the general assumption that "we did this for
> > development and therefore it should be done that way forever".
> >
> > Instead, I would look at adding tests where there's a difference
> > between the two, or possibly some difference, and make sure that there
> > isn't, and make sure that the code paths are covered.
>
> I think this approach makes no sense whatsover.  It's entirely possible
> to encounter bugs in table AM relevant code in places one would not
> think so. But even if one were, foolishly, to exclude those, the pieces
> of code we know are highly affected by the way the AM works are a a
> significant (at the very least 10-20% of tests) percentage of our
> tests. Duplicating them even just between heap and zheap would be a
> major technical debt.  DML, ON CONFLICT, just about all isolationtester
> test, etc all are absolutely crucial to test different AMs for
> correctness.
I am generally on board with minimizing the amount of duplicate code
that we have, but we must run those tests independently, so it's really
a question of if we build a system where we can parametize a set of
tests to run and then run them for every AM and compare the output to
either the generalized output or the per-AM output, or if we build on
the existing system and simply have an independent set of tests.  It's
not clear to me, at the current point, which will be the lower level of
ongoing effort, but when it comes to the effort required today, it seems
pretty clear to me that whacking around the current test environment to
rerun tests is a larger amount of effort.  If that's the requirement,
then so be it and I'm on-board, but I'm also open to considering a
lesser requirement for a completely new feature.

> > > To me the approach you're suggesting is going to lead to an explosion of
> > > redundant tests, which are really hard to maintain, especially for
> > > out-of-tree AMs. Out of tree AMs with the setup I propose can just
> > > install the AM into the template database and set PGOPTIONS, and they
> > > have pretty good coverage.
> >
> > I'm frankly much less interested in out-of-tree AMs as we aren't going
> > to have in-core regression tests for them anyway, because we can't as
> > they aren't in our tree and, ultimately, I don't find them to have
> > anywhere near the same value that in-core AMs have.
>
> I think you must be missing my point: Adding spurious differences due to
> "Table Access Method: heap" vs "Table Access Method: blarg" makes it
> unnecessarily hard to reuse the in-core tests for any additional AM, be
> it in-core or out of core.  I fail to see what us not having explicit
> tests for such AMs in core has to do with my point.
I don't think I'm missing your point.  If you believe that we should be
swayed by this argument into agreeing to change what we believe the
user-facing psql \d output should be, then I am very hopeful that you're
completely wrong.  The psql \d output should be driven by what will be
best for our users, not by what's best by external AMs or, really,
anything else.

> Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
> HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
> better from a maintainability POV than including the AM in the output.

I'm pretty sure I said in my last reply that I'm alright with psql and
pg_dump not outputting a result for the default value, provided the
default is understood to always really be the default, but, again, what
we should be concerned about here is what the end user is dealing with
and I'm not particularly incensed to support something different even if
it's around a variable of some kind for external AMs, or external
*whatevers*.

I'm also a bit confused as to why we are spending a good bit of time
arguing about external AMs without any discussion or definition of what
they are or what their requirements are.  If such things seriously
exist, then let us talk about them and try to come up with solutions for
them; if they don't, then we can talk about them when they come up.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Andres Freund
Hi,

On 2019-01-07 21:08:58 -0500, Stephen Frost wrote:

> * Andres Freund ([hidden email]) wrote:
> > On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> > > I don't agree with the general assumption that "we did this for
> > > development and therefore it should be done that way forever".
> > >
> > > Instead, I would look at adding tests where there's a difference
> > > between the two, or possibly some difference, and make sure that there
> > > isn't, and make sure that the code paths are covered.
> >
> > I think this approach makes no sense whatsover.  It's entirely possible
> > to encounter bugs in table AM relevant code in places one would not
> > think so. But even if one were, foolishly, to exclude those, the pieces
> > of code we know are highly affected by the way the AM works are a a
> > significant (at the very least 10-20% of tests) percentage of our
> > tests. Duplicating them even just between heap and zheap would be a
> > major technical debt.  DML, ON CONFLICT, just about all isolationtester
> > test, etc all are absolutely crucial to test different AMs for
> > correctness.
>
> I am generally on board with minimizing the amount of duplicate code
> that we have, but we must run those tests independently, so it's really
> a question of if we build a system where we can parametize a set of
> tests to run and then run them for every AM and compare the output to
> either the generalized output or the per-AM output, or if we build on
> the existing system and simply have an independent set of tests.  It's
> not clear to me, at the current point, which will be the lower level of
> ongoing effort

Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
again with a different default AM. That's precisely why I'm talking
about this. Just setting PGOPTIONS='-c
default_table_access_method=zheap' in the new makefile target (the ms
run scripts are similar) is sufficient.  And we don't need to force
everyone to constantly run tests with e.g. both heap and zheap, it's
sufficient to do so on a few buildfarm machines, and whenever changing
AM level code.  Rerunning all the tests with a different AM is just
setting the same environment variable, but running check-world as the
target.

Obviously that does not preclude a few tests that explicitly test
features specific to an AM. E.g. the zheap branch's tests have an
explicit zheap regression file and a few zheap specific isolationtester
spec files that a) test zheap specific beheaviour b) make sure that the
most basic zheap behaviour is tested even when not running the tests
with zheap as the default AM.

And even if you were to successfully argue that it's sufficient during
normal development to only have a few zheap specific additional tests,
we'd certainly want to make it possible to occasionally explicitly run
the rest of the tests under zheap to see whether additional stuff has
been broken - and that's much harder to sift through if there's a lot of
spurious test failures due to \d[+] outputting additional/differing
data.


> ..., but when it comes to the effort required today, it seems pretty
> clear to me that whacking around the current test environment to rerun
> tests is a larger amount of effort.

How did you come to that conclusion? Adding a makefile and vcregress.pl
target is pretty trivial.


> > Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
> > HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
> > better from a maintainability POV than including the AM in the output.
>
> I'm pretty sure I said in my last reply that I'm alright with psql and
> pg_dump not outputting a result for the default value

I don't see that anywhere in your replies.  Are you referring to:

> I don't have any problem with new tests being added which set the
> default AM to zheap, as long as it's clear that such is happening for
> downstream tests.

? If so, that's not addressing the reason why I'm suggesting to have
something like HIDE_TABLE_AMS. The point is that that'd allow us to
cater the default psql output to humans, while still choosing not to
display the AM for regression tests (thereby allowing to run the tests).


> provided the default is understood to always really be the default

What do you mean by that? Are you arguing that it should be impossible
in test scenarios to override default_table_access_method? Or that
pg_dump/psql should check for a hardcoded 'heap' AM (via a macro or
whatnot)?


> I'm also a bit confused as to why we are spending a good bit of time
> arguing about external AMs without any discussion or definition of what
> they are or what their requirements are.  If such things seriously
> exist, then let us talk about them and try to come up with solutions for
> them; if they don't, then we can talk about them when they come up.

We are working seriously hard on making AMs pluggable. Zheap is not yet,
and won't be that soon, part of core. The concerns for an in-core zheap
(which needs to maintain the test infrastructure during the remainder of
its out-of-core development!) and out-of-core AMs are pretty aligned.  I
don't get your confusion.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Michael Paquier-2
On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote:

> Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
> again with a different default AM. That's precisely why I'm talking
> about this. Just setting PGOPTIONS='-c
> default_table_access_method=zheap' in the new makefile target (the ms
> run scripts are similar) is sufficient.  And we don't need to force
> everyone to constantly run tests with e.g. both heap and zheap, it's
> sufficient to do so on a few buildfarm machines, and whenever changing
> AM level code.  Rerunning all the tests with a different AM is just
> setting the same environment variable, but running check-world as the
> target.
Another point is that having default_table_access_method facilitates
the restore of tables across AMs similarly to tablespaces, so CREATE
TABLE dumps should not include the AM part.

> And even if you were to successfully argue that it's sufficient during
> normal development to only have a few zheap specific additional tests,
> we'd certainly want to make it possible to occasionally explicitly run
> the rest of the tests under zheap to see whether additional stuff has
> been broken - and that's much harder to sift through if there's a lot of
> spurious test failures due to \d[+] outputting additional/differing
> data.

The specific-heap tests could be included as an extra module in
src/test/modules easily, so removing from the main tests what is not
completely transparent may make sense.  Most users use make-check to
test a patch quickly, so we could miss some bugs because of that
during review.  Still, people seem to be better-educated lately in the
fact that they need to do an actual check-world when checking a patch
at full.  So personally I can live with a split where it makes sense.
Being able to easily validate an AM implementation would be nice.
Isolation tests may be another deal though for DMLs.

> We are working seriously hard on making AMs pluggable. Zheap is not yet,
> and won't be that soon, part of core. The concerns for an in-core zheap
> (which needs to maintain the test infrastructure during the remainder of
> its out-of-core development!) and out-of-core AMs are pretty aligned.  I
> don't get your confusion.

I would imagine that a full-fledged AM should be able to maintain
compatibility with the full set of queries that heap is able to
support, so if you can make the tests transparent enough so as they
can be run for any AMs without alternate input in the core tree, then
that's a goal worth it.  Don't you have plan inconsistencies as well
with zheap?

In short, improving portability incrementally is good for the
long-term prospective.  From that point of view adding the AM to \d+
output may be a bad idea, as there are modules out of core which
rely on psql meta-commands, and it would be nice to be able to test
those tests as well for those plugins with different types of AMs.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Haribabu Kommi-2

On Tue, Jan 8, 2019 at 3:02 PM Michael Paquier <[hidden email]> wrote:
On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote:
> Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
> again with a different default AM. That's precisely why I'm talking
> about this. Just setting PGOPTIONS='-c
> default_table_access_method=zheap' in the new makefile target (the ms
> run scripts are similar) is sufficient.  And we don't need to force
> everyone to constantly run tests with e.g. both heap and zheap, it's
> sufficient to do so on a few buildfarm machines, and whenever changing
> AM level code.  Rerunning all the tests with a different AM is just
> setting the same environment variable, but running check-world as the
> target.

PGOPTIONS or any similar options are good for the AM development
to test their AM's with all the existing PostgreSQL features.
 
Another point is that having default_table_access_method facilitates
the restore of tables across AMs similarly to tablespaces, so CREATE
TABLE dumps should not include the AM part.

+1 to the above approach to dump "set default_table_access_method".
 
> And even if you were to successfully argue that it's sufficient during
> normal development to only have a few zheap specific additional tests,
> we'd certainly want to make it possible to occasionally explicitly run
> the rest of the tests under zheap to see whether additional stuff has
> been broken - and that's much harder to sift through if there's a lot of
> spurious test failures due to \d[+] outputting additional/differing
> data.

The specific-heap tests could be included as an extra module in
src/test/modules easily, so removing from the main tests what is not
completely transparent may make sense.  Most users use make-check to
test a patch quickly, so we could miss some bugs because of that
during review.  Still, people seem to be better-educated lately in the
fact that they need to do an actual check-world when checking a patch
at full.  So personally I can live with a split where it makes sense.
Being able to easily validate an AM implementation would be nice.
Isolation tests may be another deal though for DMLs.

> We are working seriously hard on making AMs pluggable. Zheap is not yet,
> and won't be that soon, part of core. The concerns for an in-core zheap
> (which needs to maintain the test infrastructure during the remainder of
> its out-of-core development!) and out-of-core AMs are pretty aligned.  I
> don't get your confusion.

I would imagine that a full-fledged AM should be able to maintain
compatibility with the full set of queries that heap is able to
support, so if you can make the tests transparent enough so as they
can be run for any AMs without alternate input in the core tree, then
that's a goal worth it.  Don't you have plan inconsistencies as well
with zheap?

In short, improving portability incrementally is good for the
long-term prospective.  From that point of view adding the AM to \d+
output may be a bad idea, as there are modules out of core which
rely on psql meta-commands, and it would be nice to be able to test
those tests as well for those plugins with different types of AMs.

I also agree that adding AM details to \d+ will lead to many unnecessary
failures. Currently \d+ also doesn't show all the details of the relation like
owner and etc.

Regards,
Haribabu Kommi
Fujitsu Australia
Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Peter Eisentraut-6
In reply to this post by Andres Freund
On 08/01/2019 00:56, Andres Freund wrote:
> A patch at [2] adds display of a table's access method to \d+ - but that
> means that running the tests with a different default table access
> method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> there'll be a significant number of test failures, even though the test
> results did not meaningfully differ.

For psql, a variable that hides the access method if it's the default.

> Similarly, if pg_dump starts to dump table access methods either
> unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> unimportant differences.

For pg_dump, track and set the default_table_access_method setting
throughout the dump (similar to how default_with_oids was handled, I
believe).

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Andres Freund
In reply to this post by Michael Paquier-2
Hi,

On 2019-01-08 13:02:00 +0900, Michael Paquier wrote:

> On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote:
> > Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
> > again with a different default AM. That's precisely why I'm talking
> > about this. Just setting PGOPTIONS='-c
> > default_table_access_method=zheap' in the new makefile target (the ms
> > run scripts are similar) is sufficient.  And we don't need to force
> > everyone to constantly run tests with e.g. both heap and zheap, it's
> > sufficient to do so on a few buildfarm machines, and whenever changing
> > AM level code.  Rerunning all the tests with a different AM is just
> > setting the same environment variable, but running check-world as the
> > target.
>
> Another point is that having default_table_access_method facilitates
> the restore of tables across AMs similarly to tablespaces, so CREATE
> TABLE dumps should not include the AM part.

That's what I suggested in the first message in this thread, or did I
miss a difference?


> > And even if you were to successfully argue that it's sufficient during
> > normal development to only have a few zheap specific additional tests,
> > we'd certainly want to make it possible to occasionally explicitly run
> > the rest of the tests under zheap to see whether additional stuff has
> > been broken - and that's much harder to sift through if there's a lot of
> > spurious test failures due to \d[+] outputting additional/differing
> > data.
>
> The specific-heap tests could be included as an extra module in
> src/test/modules easily, so removing from the main tests what is not
> completely transparent may make sense.

Why does it need to be an extra module, rather than just exta regression
files / sections of files that just explicitly specify the AM?  Seems a
lot easier and faster.


> > We are working seriously hard on making AMs pluggable. Zheap is not yet,
> > and won't be that soon, part of core. The concerns for an in-core zheap
> > (which needs to maintain the test infrastructure during the remainder of
> > its out-of-core development!) and out-of-core AMs are pretty aligned.  I
> > don't get your confusion.
>
> I would imagine that a full-fledged AM should be able to maintain
> compatibility with the full set of queries that heap is able to
> support, so if you can make the tests transparent enough so as they
> can be run for any AMs without alternate input in the core tree, then
> that's a goal worth it.  Don't you have plan inconsistencies as well
> with zheap?

In the core tests there's a fair number of things that can be cured by
adding an ORDER BY to the tests, which seems sensible. We've added a lot
of those over the years anyway.  There's additionally a number of plans
that change, which currently is handled by alternatives output files,
but I think we should move to reduce those differences, that's probably
not too hard.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Andres Freund
In reply to this post by Peter Eisentraut-6
Hi,

On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> On 08/01/2019 00:56, Andres Freund wrote:
> > A patch at [2] adds display of a table's access method to \d+ - but that
> > means that running the tests with a different default table access
> > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > there'll be a significant number of test failures, even though the test
> > results did not meaningfully differ.
>
> For psql, a variable that hides the access method if it's the default.

Yea, I think that seems the least contentious solution.  Don't like it
too much, but it seems better than the alternative. I wonder if we want
one for multiple regression related issues, or whether one specifically
about table AMs is more appropriate. I lean towards the latter.


> > Similarly, if pg_dump starts to dump table access methods either
> > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > unimportant differences.
>
> For pg_dump, track and set the default_table_access_method setting
> throughout the dump (similar to how default_with_oids was handled, I
> believe).

Yea, that's similar to that, and I think that makes sense.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Michael Paquier-2
In reply to this post by Andres Freund
On Tue, Jan 08, 2019 at 09:29:49AM -0800, Andres Freund wrote:
> On 2019-01-08 13:02:00 +0900, Michael Paquier wrote:
>> The specific-heap tests could be included as an extra module in
>> src/test/modules easily, so removing from the main tests what is not
>> completely transparent may make sense.
>
> Why does it need to be an extra module, rather than just extra regression
> files / sections of files that just explicitly specify the AM?  Seems a
> lot easier and faster.

The point would be to keep individual Makefiles simpler to maintain,
and separating things can make it simpler.  I cannot say for sure
without seeing how things would change though based on what you are
suggesting, and that may finish by being a matter of taste.

> In the core tests there's a fair number of things that can be cured by
> adding an ORDER BY to the tests, which seems sensible. We've added a lot
> of those over the years anyway.

When working on Postgres-XC I cursed about the need to add many ORDER
BY queries to ensure the ordering of tuples fetched from different
nodes, and we actually had an option to enforce the default
distribution used by tables, so that would be really nice to close the
gap.

> There's additionally a number of plans
> that change, which currently is handled by alternatives output files,
> but I think we should move to reduce those differences, that's probably
> not too hard.

Okay, that's not surprising.  I guess it depends on how many alternate
files are needed and if it is possible to split things so as we avoid
unnecessary output in alternate files.  A lot of things you are
proposing on this thread make sense in my experience.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Dmitry Dolgov
> On Tue, Jan 8, 2019 at 6:34 PM Andres Freund <[hidden email]> wrote:
>
> On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> > On 08/01/2019 00:56, Andres Freund wrote:
> > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > means that running the tests with a different default table access
> > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > there'll be a significant number of test failures, even though the test
> > > results did not meaningfully differ.
> >
> > For psql, a variable that hides the access method if it's the default.
>
> Yea, I think that seems the least contentious solution.  Don't like it
> too much, but it seems better than the alternative. I wonder if we want
> one for multiple regression related issues, or whether one specifically
> about table AMs is more appropriate. I lean towards the latter.

Are there any similar existing regression related issues? If no, then probably
the latter indeed makes more sense.

> > > Similarly, if pg_dump starts to dump table access methods either
> > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > > unimportant differences.
> >
> > For pg_dump, track and set the default_table_access_method setting
> > throughout the dump (similar to how default_with_oids was handled, I
> > believe).
>
> Yea, that's similar to that, and I think that makes sense.

Yes, sounds like a reasonable approach, I can proceed with it.

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

akapila
In reply to this post by Andres Freund
On Tue, Jan 8, 2019 at 11:04 PM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> > On 08/01/2019 00:56, Andres Freund wrote:
> > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > means that running the tests with a different default table access
> > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > there'll be a significant number of test failures, even though the test
> > > results did not meaningfully differ.
> >
> > For psql, a variable that hides the access method if it's the default.
>
> Yea, I think that seems the least contentious solution.
>

+1.

>  Don't like it
> too much, but it seems better than the alternative. I wonder if we want
> one for multiple regression related issues, or whether one specifically
> about table AMs is more appropriate. I lean towards the latter.
>

I didn't understand what is the earlier part "I wonder if we want one
for multiple regression related issues".  What do you mean by multiple
regression related issues?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Andres Freund
Hi,

On 2019-01-09 18:26:16 +0530, Amit Kapila wrote:

> On Tue, Jan 8, 2019 at 11:04 PM Andres Freund <[hidden email]> wrote:
> +1.
>
> >  Don't like it
> > too much, but it seems better than the alternative. I wonder if we want
> > one for multiple regression related issues, or whether one specifically
> > about table AMs is more appropriate. I lean towards the latter.
> >
>
> I didn't understand what is the earlier part "I wonder if we want one
> for multiple regression related issues".  What do you mean by multiple
> regression related issues?

One flag that covers all things that make psql output less useful for
regression test output, or one flag that just controls the table access
method display.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

akapila
On Wed, Jan 9, 2019 at 10:53 PM Andres Freund <[hidden email]> wrote:

>
> Hi,
>
> On 2019-01-09 18:26:16 +0530, Amit Kapila wrote:
> > On Tue, Jan 8, 2019 at 11:04 PM Andres Freund <[hidden email]> wrote:
> > +1.
> >
> > >  Don't like it
> > > too much, but it seems better than the alternative. I wonder if we want
> > > one for multiple regression related issues, or whether one specifically
> > > about table AMs is more appropriate. I lean towards the latter.
> > >
> >
> > I didn't understand what is the earlier part "I wonder if we want one
> > for multiple regression related issues".  What do you mean by multiple
> > regression related issues?
>
> One flag that covers all things that make psql output less useful for
> regression test output, or one flag that just controls the table access
> method display.
>

+1 for the later (one flag that just controls the table access method
display) as that looks clean.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Amit Khandekar-2
In reply to this post by Dmitry Dolgov
On Wed, 9 Jan 2019 at 14:30, Dmitry Dolgov <[hidden email]> wrote:

>
> > On Tue, Jan 8, 2019 at 6:34 PM Andres Freund <[hidden email]> wrote:
> >
> > On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> > > On 08/01/2019 00:56, Andres Freund wrote:
> > > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > > means that running the tests with a different default table access
> > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > > there'll be a significant number of test failures, even though the test
> > > > results did not meaningfully differ.
> > >
> > > For psql, a variable that hides the access method if it's the default.
> >
> > Yea, I think that seems the least contentious solution.  Don't like it
> > too much, but it seems better than the alternative. I wonder if we want
> > one for multiple regression related issues, or whether one specifically
> > about table AMs is more appropriate. I lean towards the latter.
>
> Are there any similar existing regression related issues? If no, then probably
> the latter indeed makes more sense.
>
> > > > Similarly, if pg_dump starts to dump table access methods either
> > > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > > > unimportant differences.
> > >
> > > For pg_dump, track and set the default_table_access_method setting
> > > throughout the dump (similar to how default_with_oids was handled, I
> > > believe).
> >
> > Yea, that's similar to that, and I think that makes sense.
>
> Yes, sounds like a reasonable approach, I can proceed with it.

Dmitry, I believe you have taken the pg_dump part only. If that's
right, I can proceed with the psql part. Does that sound right ?

>


--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Peter Eisentraut-6
In reply to this post by akapila
On 10/01/2019 04:58, Amit Kapila wrote:
>> One flag that covers all things that make psql output less useful for
>> regression test output, or one flag that just controls the table access
>> method display.
>>
> +1 for the later (one flag that just controls the table access method
> display) as that looks clean.

Yeah, I'd prefer a specific flag.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Displaying and dumping of table access methods

Dmitry Dolgov
In reply to this post by Amit Khandekar-2
> On Fri, Jan 11, 2019 at 6:02 AM Amit Khandekar <[hidden email]> wrote:
>
> > Yes, sounds like a reasonable approach, I can proceed with it.
>
> Dmitry, I believe you have taken the pg_dump part only. If that's
> right, I can proceed with the psql part. Does that sound right ?

Well, actually I've meant that I'm going to proceed with both, since I've
posted both psql and pg_dump patches. But of course you're welcome to submit
any new version or improvements you want.

12