Pluggable Storage - Andres's take


Pluggable Storage - Andres's take

Andres Freund
Hi,

As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.

The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests; there are FIXMEs / elog()s).  But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
  fronts. For one it reduces just about all type safety. There were
  numerous bugs in the patch where things were just cast from HeapTuple
  to TableTuple to HeapTuple (and even to TupleTableSlot).  I think it's
  a really, really bad idea to introduce a vague type like this for
  development purposes alone, it makes it way too hard to refactor -
  essentially throwing the biggest benefit of type safe languages out of
  the window.

  Additionally I think it's also the wrong approach architecturally. We
  shouldn't assume that a tuple can efficiently be represented as a
  single palloc'ed chunk. In fact, we should move *away* from relying on
  that so much.

  I've thus removed the TableTuple type entirely.


- Previous versions of the patchset exposed Buffers in the tableam.h API,
  performed buffer locking / pinning / ExecStoreTuple() calls outside of
  it.  That is wrong in my opinion, as various AMs will deal very
  differently with buffer pinning & locking. The relevant logic is
  largely moved within the AM.  Bringing me to the next point:


- tableam exposed various operations based on HeapTuple/TableTuple's
  (and their Buffers). This all needs to be slot based, as we can't
  represent the way each AM will deal with this.  I've largely converted
  the API to be slot based.  That has some fallout, but I think largely
  works.  Lots of outdated comments remain.


- I think the move of the indexing from outside the table layer into the
  storage layer isn't a good idea. It led to having to pass EState into
  the tableam, a callback API to perform index updates, etc.  This seems
  to have at least partially been triggered by the speculative insertion
  codepaths.  I've reverted this part of the changes.  The speculative
  insertion / confirm codepaths are now exposed to tableam.h - I think
  that's the right thing because we'll likely want to have that
  functionality across more than a single tuple in the future.


- The visibility functions relied on the *caller* performing buffer
  locking. That's not a great idea, because generic code shouldn't know
  about the locking scheme a particular AM needs.  I've changed the
  external visibility functions to instead take a slot, and perform the
  necessary locking inside.


- There were numerous tableam callback uses inside heapam.c - that makes
  no sense, we know what the storage is therein.  The relevant


- The integration between index lookups and heap lookups based on the
  results of an index lookup was IMO too tight.  The index code dealt
  with heap tuples, which isn't great.  I've introduced a new concept, a
  'IndexFetchTableData' scan. It's initialized when building an index
  scan, and provides the necessary state (say current heap buffer), to
  do table lookups from within a heap.


- The am of relations required for bootstrapping was set to 0 - I don't
  think that's a good idea. I changed it so it's set to the heap AM as
  well.


- HOT was encoded in the API in a bunch of places. That doesn't look
  right to me. I tried to improve a bit on that, but I'm not yet quite
  sure I like it. Needs written explanation & arguments...


- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
  a higher memory usage, because the resulting tuples weren't freed or
  anything. There might be a reason for doing such a change - we've
  certainly discussed that before - but I'm *vehemently* against doing
  that at the same time we introduce pluggable storage. Analyzing the
  performance effects will be hard enough without changes like this.


- I've for now backed out the heap rewrite changes, partially.  Mostly
  because I didn't like the way the abstraction looks, but haven't quite
  figured out how it should look.


- I did not like that speculative tokens were moved to slots. There's
  really no reason for them to live outside parameters to tableam.h
  functions.


- lotsa additional smaller changes.


- lotsa new bugs


My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work.  If others want I could
move repo to github and grant others write access.


I think the patch series should eventually look like:

- move vacuumlazy.c (and other similar files) into access/heap, there's
  really nothing generic here. This is a fairly independent task.
- slot'ify FDW RefetchForeignRow_function
- vtable based slot API, based on [1]
- slot'ify trigger API
- redo EPQ based on slots (prototyped in my git tree)
- redo trigger API to be slot based
- tuple traversal API changes
- tableam infrastructure, with error if a non-builtin AM is chosen
- move heap and calling code to be tableam based
- make vacuum callback based (not vacuum.c, just vacuumlazy.c)
- [other patches]
- allow other AMs
- introduce test AM


Tasks / Questions:

- split up patch
- Change heap table AM to not allocate handler function for each table,
  instead allocate it statically. Avoids a significant amount of data
  duplication, and allows for a few more compiler optimizations.
- Merge tableam.h and tableamapi.h and make most tableam.c functions
  small inline functions. Having one-line tableam.c wrappers makes this
  more expensive than necessary. We'll have enough trouble not
  regressing performance-wise as it is.
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
- COPY's multi_insert path should probably deal with a bunch of slots,
  rather than forming HeapTuples
- bitmap index scans probably need a new tableam.h callback, abstracting
  bitgetpage()
- I suspect IndexBuildHeapScan might need to move into the tableam.h API -
  it's not clear to me that it's realistically possible to do this in a
  generic manner.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20180220224318.gw4oe5jadhpmcdnm%40alap3.anarazel.de
[2] http://archives.postgresql.org/message-id/CAJrrPGcN5A4jH0PJ-s=6k3+SLA4pozC4HHRdmvU1ZBuA20TE-A@...
[3] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/pluggable-storage
[4] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary


Re: Pluggable Storage - Andres's take

Haribabu Kommi-2

On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <[hidden email]> wrote:
Hi,

As I've previously mentioned I had planned to spend some time to polish
Haribabu's version of the pluggable storage patch and rebase it on the
vtable based slot approach from [1]. While doing so I found more and
more things that I previously hadn't noticed. I started rewriting things
into something closer to what I think we want architecturally.

Thanks for the deep review and changes.
 
The current state of my version of the patch is *NOT* ready for proper
review (it doesn't even pass all tests, there's FIXME / elog()s).  But I
think it's getting close enough to its eventual shape that more eyes,
and potentially more hands on keyboards, can be useful.

I will try to update it to make sure that it passes all the tests and also try to
reduce the FIXME's.
 
The most fundamental issues I had with Haribabu's last version from [2]
are the following:

- The use of TableTuple, a typedef from void *, is bad from multiple
  fronts. For one it reduces just about all type safety. There were
  numerous bugs in the patch where things were just cast from HeapTuple
  to TableTuple to HeapTuple (and even to TupleTableSlot).  I think it's
  a really, really bad idea to introduce a vague type like this for
  development purposes alone, it makes it way too hard to refactor -
  essentially throwing the biggest benefit of type safe languages out of
  the window.

My earlier intention was to remove the HeapTuple usage entirely and replace
it with slots everywhere outside the tableam. But it ended up as TableTuple
before reaching that stage, because of the heavy use of HeapTuple.
 
  Additionally I think it's also the wrong approach architecturally. We
  shouldn't assume that a tuple can efficiently be represented as a
  single palloc'ed chunk. In fact, we should move *away* from relying on
  that so much.

  I've thus removed the TableTuple type entirely.

Thanks for the changes. I didn't check the code yet - so for now, whenever a
HeapTuple is required it will be generated from the slot?
 
- Previous versions of the patchset exposed Buffers in the tableam.h API,
  performed buffer locking / pinning / ExecStoreTuple() calls outside of
  it.  That is wrong in my opinion, as various AMs will deal very
  differently with buffer pinning & locking. The relevant logic is
  largely moved within the AM.  Bringing me to the next point:


- tableam exposed various operations based on HeapTuple/TableTuple's
  (and their Buffers). This all needs to be slot based, as we can't
  represent the way each AM will deal with this.  I've largely converted
  the API to be slot based.  That has some fallout, but I think largely
  works.  Lots of outdated comments.

Yes, I agree with you.
 
- I think the move of the indexing from outside the table layer into the
  storage layer isn't a good idea. It lead to having to pass EState into
  the tableam, a callback API to perform index updates, etc.  This seems
  to have at least partially been triggered by the speculative insertion
  codepaths.  I've reverted this part of the changes.  The speculative
  insertion / confirm codepaths are now exposed to tableam.h - I think
  that's the right thing because we'll likely want to have that
  functionality across more than a single tuple in the future.


- The visibility functions relied on the *caller* performing buffer
  locking. That's not a great idea, because generic code shouldn't know
  about the locking scheme a particular AM needs.  I've changed the
  external visibility functions to instead take a slot, and perform the
  necessary locking inside. 

When I first moved all the visibility functions into the tableam, I found
this problem; it will be good if the AM takes care of buffer locking and so on.


- There were numerous tableam callback uses inside heapam.c - that makes
  no sense, we know what the storage is therein.  The relevant


- The integration between index lookups and heap lookups based on the
  results of an index lookup was IMO too tight.  The index code dealt
  with heap tuples, which isn't great.  I've introduced a new concept, a
  'IndexFetchTableData' scan. It's initialized when building an index
  scan, and provides the necessary state (say current heap buffer), to
  do table lookups from within a heap.

I agree that it will be good with the new concept from index to access the
heap.
 
- The am of relations required for bootstrapping was set to 0 - I don't
  think that's a good idea. I changed it so it's set to the heap AM as
  well.


- HOT was encoded in the API in a bunch of places. That doesn't look
  right to me. I tried to improve a bit on that, but I'm not yet quite
  sure I like it. Needs written explanation & arguments...


- the heap tableam did a heap_copytuple() nearly everywhere. Leading to
  a higher memory usage, because the resulting tuples weren't freed or
  anything. There might be a reason for doing such a change - we've
  certainly discussed that before - but I'm *vehemently* against doing
  that at the same time we introduce pluggable storage. Analyzing the
  performance effects will be hard enough without changes like this.

How about using a slot instead of a tuple and reusing it? I don't know
yet whether that is possible everywhere.
 

- I've for now backed out the heap rewrite changes, partially.  Mostly
  because I didn't like the way the abstraction looks, but haven't quite
  figured out how it should look like.


- I did not like that speculative tokens were moved to slots. There's
  really no reason for them to live outside parameters to tableam.h
  functions.


- lotsa additional smaller changes.


- lotsa new bugs

Thanks for all the changes.
 

My current working state is at [3] (urls to clone repo are at [4]).
This is *HEAVILY WIP*. I plan to continue working on it over the next
days, but I'll temporarily focus onto v11 work.  If others want I could
move repo to github and grant others write access.

Yes, I want to access the code and do further development on it.

 

Tasks / Questions:

- split up patch

How about first generating the refactoring changes as patches, based on
the code in your repo, as discussed here[1]?
 
- Change heap table AM to not allocate handler function for each table,
  instead allocate it statically. Avoids a significant amount of data
  duplication, and allows for a few more compiler optimizations.

Some kind of static variable holding the handler for each tableam - but we
need to check how we can access that static handler from the relation.
 
- Merge tableam.h and tableamapi.h and make most tableam.c functions
  small inline functions. Having one-line tableam.c wrappers makes this
  more expensive than necessary. We'll have a big enough trouble not
  regressing performancewise.

OK.
 
- change scan level slot creation to use tableam function for doing so
- get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid

So with this, there shouldn't be a slot-to-tid mapping, or there
should be some other way to get the tid.
 
- COPY's multi_insert path should probably deal with a bunch of slots,
  rather than forming HeapTuples

OK. 

- bitmap index scans probably need a new tableam.h callback, abstracting
  bitgetpage()

OK.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: Pluggable Storage - Andres's take

Alexander Korotkov
In reply to this post by Andres Freund
Hi!

On Tue, Jul 3, 2018 at 10:06 AM Andres Freund <[hidden email]> wrote:
> As I've previously mentioned I had planned to spend some time to polish
> Haribabu's version of the pluggable storage patch and rebase it on the
> vtable based slot approach from [1]. While doing so I found more and
> more things that I previously hadn't noticed. I started rewriting things
> into something closer to what I think we want architecturally.

Great, thank you for working on this patchset!

> The current state of my version of the patch is *NOT* ready for proper
> review (it doesn't even pass all tests, there's FIXME / elog()s).  But I
> think it's getting close enough to its eventual shape that more eyes,
> and potentially more hands on keyboards, can be useful.
>
> The most fundamental issues I had with Haribabu's last version from [2]
> are the following:
>
> - The use of TableTuple, a typedef from void *, is bad from multiple
>   fronts. For one it reduces just about all type safety. There were
>   numerous bugs in the patch where things were just cast from HeapTuple
>   to TableTuple to HeapTuple (and even to TupleTableSlot).  I think it's
>   a really, really bad idea to introduce a vague type like this for
>   development purposes alone, it makes it way too hard to refactor -
>   essentially throwing the biggest benefit of type safe languages out of
>   the window.
>
>   Additionally I think it's also the wrong approach architecturally. We
>   shouldn't assume that a tuple can efficiently be represented as a
>   single palloc'ed chunk. In fact, we should move *away* from relying on
>   that so much.
>
>   I've thus removed the TableTuple type entirely.

+1, TableTuple was a vague concept.

> - Previous versions of the patchset exposed Buffers in the tableam.h API,
>   performed buffer locking / pinning / ExecStoreTuple() calls outside of
>   it.  That is wrong in my opinion, as various AMs will deal very
>   differently with buffer pinning & locking. The relevant logic is
>   largely moved within the AM.  Bringing me to the next point:
>
>
> - tableam exposed various operations based on HeapTuple/TableTuple's
>   (and their Buffers). This all needs to be slot based, as we can't
>   represent the way each AM will deal with this.  I've largely converted
>   the API to be slot based.  That has some fallout, but I think largely
>   works.  Lots of outdated comments.

Makes sense to me.  I like passing a TupleTableSlot to the tableam API
functions much more.

> - I think the move of the indexing from outside the table layer into the
>   storage layer isn't a good idea. It led to having to pass EState into
>   the tableam, a callback API to perform index updates, etc.  This seems
>   to have at least partially been triggered by the speculative insertion
>   codepaths.  I've reverted this part of the changes.  The speculative
>   insertion / confirm codepaths are now exposed to tableam.h - I think
>   that's the right thing because we'll likely want to have that
>   functionality across more than a single tuple in the future.

I agree that passing EState into tableam doesn't look good.  But I
believe that tableam needs way more control over indexes than it has
in your version of the patch.  Even if tableam-independent insertion into
indexes on tuple insert is more or less OK, on update we need
something smarter than just inserting index tuples depending on an
"update_indexes" flag.  A tableam-specific update method may decide to
update only some of the indexes.  For example, when zheap performs an
update in-place, it inserts only into the indexes whose fields were
updated.  And I think any undo-log based storage would have similar
behavior.  Moreover, it might be required to do something with existing
index tuples (for instance, as far as I know, zheap sets a "deleted"
flag on index tuples related to previous values of updated fields).

If we would like to move indexing outside of tableam, then we might
turn "update_indexes" from a bool into an enum with values like: "don't
insert index tuples", "insert all index tuples", "insert index tuples
only for updated fields" and so on.  But that looks more like a set of
hardcoded cases for particular implementations than a proper API.  So,
probably we shouldn't move indexing outside of tableam, but rather
provide better wrappers for doing that in tableam?

> - The visibility functions relied on the *caller* performing buffer
>   locking. That's not a great idea, because generic code shouldn't know
>   about the locking scheme a particular AM needs.  I've changed the
>   external visibility functions to instead take a slot, and perform the
>   necessary locking inside.

Makes sense to me.  But would it cause extra locking/unlocking and in
turn a performance impact?

> - There were numerous tableam callback uses inside heapam.c - that makes
>   no sense, we know what the storage is therein.  The relevant

Ok.

> - The integration between index lookups and heap lookups based on the
>   results of an index lookup was IMO too tight.  The index code dealt
>   with heap tuples, which isn't great.  I've introduced a new concept, a
>   'IndexFetchTableData' scan. It's initialized when building an index
>   scan, and provides the necessary state (say current heap buffer), to
>   do table lookups from within a heap.

+1

> - The am of relations required for bootstrapping was set to 0 - I don't
>   think that's a good idea. I changed it so it's set to the heap AM as
>   well.

+1

> - HOT was encoded in the API in a bunch of places. That doesn't look
>   right to me. I tried to improve a bit on that, but I'm not yet quite
>   sure I like it. Needs written explanation & arguments...

Yes, HOT is a heapam-specific feature.  Other tableams might not have
HOT.  But it appears that we still expose the hot_search_buffer()
function in the tableam API.  That function has no users, so it's just
redundant and can be removed.

> - the heap tableam did a heap_copytuple() nearly everywhere. Leading to
>   a higher memory usage, because the resulting tuples weren't freed or
>   anything. There might be a reason for doing such a change - we've
>   certainly discussed that before - but I'm *vehemently* against doing
>   that at the same time we introduce pluggable storage. Analyzing the
>   performance effects will be hard enough without changes like this.

I think once we've switched to slots, doing heap_copytuple() so
frequently is not required anymore.

> - I've for now backed out the heap rewrite changes, partially.  Mostly
>   because I didn't like the way the abstraction looks, but haven't quite
>   figured out how it should look like.

Yeah, it's hard part, but we need to invent something in this area...

> - I did not like that speculative tokens were moved to slots. There's
>   really no reason for them to live outside parameters to tableam.h
>   functions.

Good.

> My current working state is at [3] (urls to clone repo are at [4]).
> This is *HEAVILY WIP*. I plan to continue working on it over the next
> days, but I'll temporarily focus onto v11 work.  If others want I could
> move repo to github and grant others write access.

GitHub would be more convenient for me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Pluggable Storage - Andres's take

Alexander Korotkov
On Thu, Jul 5, 2018 at 3:25 PM Alexander Korotkov
<[hidden email]> wrote:
> > My current working state is at [3] (urls to clone repo are at [4]).
> > This is *HEAVILY WIP*. I plan to continue working on it over the next
> > days, but I'll temporarily focus onto v11 work.  If others want I could
> > move repo to github and grant others write access.
>
> Github would be more convinient for me.

I have another note.  It appears that you left my patch for locking the
last version of a tuple in one call (the heapam_lock_tuple() function)
almost without changes.  During the PGCon 2018 Developer Meeting I
remember you were somewhat unhappy with this approach.  So, do you have
any notes about that for now?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Pluggable Storage - Andres's take

Andres Freund
In reply to this post by Haribabu Kommi-2
Hi,

I've pushed up a new version to
https://github.com/anarazel/postgres-pluggable-storage which now passes
all the tests.

Besides a lot of bugfixes, I've rebased the tree, moved TriggerData to
be primarily slot based (with a conversion roundtrip when calling
trigger functions), and a lot of other small things.


On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:

> On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <[hidden email]> wrote:
> > The current state of my version of the patch is *NOT* ready for proper
> > review (it doesn't even pass all tests, there's FIXME / elog()s).  But I
> > think it's getting close enough to its eventual shape that more eyes,
> > and potentially more hands on keyboards, can be useful.
> >
>
> I will try to update it to make sure that it passes all the tests and also
> try to
> reduce the FIXME's.

Cool.

Alexander, Haribabu, if you give me (privately) your github accounts,
I'll give you write access to that repository.


> > The most fundamental issues I had with Haribabu's last version from [2]
> > are the following:
> >
> > - The use of TableTuple, a typedef from void *, is bad from multiple
> >   fronts. For one it reduces just about all type safety. There were
> >   numerous bugs in the patch where things were just cast from HeapTuple
> >   to TableTuple to HeapTuple (and even to TupleTableSlot).  I think it's
> >   a really, really bad idea to introduce a vague type like this for
> >   development purposes alone, it makes it way too hard to refactor -
> >   essentially throwing the biggest benefit of type safe languages out of
> >   the window.
> >
>
> My earlier intention was to remove the HeapTuple usage entirely and replace
> it with slot everywhere outside the tableam. But it ended up with TableTuple
> before it reach to the stage because of heavy use of HeapTuple.

I don't think that's necessary - a lot of the system catalog accesses
are going to continue to be heap tuple accesses. And the conversions you
did largely continued to access TableTuples as heap tuples - it was
just that the compiler didn't warn about it anymore.

A prime example of that is the way the rewriteheap / cluster
integration was done. Cluster continued to sort tuples as heap tuples -
even though that's likely incompatible with other tuple formats which
need different state.


> >   Additionally I think it's also the wrong approach architecturally. We
> >   shouldn't assume that a tuple can efficiently be represented as a
> >   single palloc'ed chunk. In fact, we should move *away* from relying on
> >   that so much.
> >
> >   I've thus removed the TableTuple type entirely.
> >
>
> Thanks for the changes, I didn't check the code yet, so for now whenever the
> HeapTuple is required it will be generated from slot?

Pretty much.


> > - the heap tableam did a heap_copytuple() nearly everywhere. Leading to
> >   a higher memory usage, because the resulting tuples weren't freed or
> >   anything. There might be a reason for doing such a change - we've
> >   certainly discussed that before - but I'm *vehemently* against doing
> >   that at the same time we introduce pluggable storage. Analyzing the
> >   performance effects will be hard enough without changes like this.
> >
>
> How about using of slot instead of tuple and reusing of it? I don't know
> yet whether it is possible everywhere.

Not quite sure what you mean?


> > Tasks / Questions:
> >
> > - split up patch
> >
>
> How about generating refactoring changes as patches first based on
> the code in your repo as discussed here[1]?

Sure - it was just at the moment too much work ;)


> > - Change heap table AM to not allocate handler function for each table,
> >   instead allocate it statically. Avoids a significant amount of data
> >   duplication, and allows for a few more compiler optimizations.
> >
>
> Some kind of static variable handlers for each tableam, but need to check
> how can we access that static handler from the relation.

I'm not sure what you mean by "how can we access"? We can just return a
pointer to the constant data from the current handler? Except that
adding a bunch of consts would be good, the external interface wouldn't
need to change?


> > - change scan level slot creation to use tableam function for doing so
> > - get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
> >
>
> so with this there shouldn't be a way from slot to tid mapping or there
> should be some other way.

I'm not sure I follow?


> > - bitmap index scans probably need a new tableam.h callback, abstracting
> >   bitgetpage()
> >
>
> OK.

Any chance you could try to tackle this?  I'm going to be mostly out
this week, so we'd probably not run across each other's feet...

Greetings,

Andres Freund


Re: Pluggable Storage - Andres's take

Andres Freund
In reply to this post by Alexander Korotkov
Hi,

On 2018-07-05 15:25:25 +0300, Alexander Korotkov wrote:

> > - I think the move of the indexing from outside the table layer into the
> >   storage layer isn't a good idea. It led to having to pass EState into
> >   the tableam, a callback API to perform index updates, etc.  This seems
> >   to have at least partially been triggered by the speculative insertion
> >   codepaths.  I've reverted this part of the changes.  The speculative
> >   insertion / confirm codepaths are now exposed to tableam.h - I think
> >   that's the right thing because we'll likely want to have that
> >   functionality across more than a single tuple in the future.
>
> I agree that passing EState into tableam doesn't look good.  But I
> believe that tableam needs way more control over indexes than it has
> in your version patch.  If even tableam-independent insertion into
> indexes on tuple insert is more or less ok, but on update we need
> something smarter rather than just insert index tuples depending
> "update_indexes" flag.  Tableam specific update method may decide to
> update only some of indexes.  For example, when zheap performs update
> in-place then it inserts to only indexes, whose fields were updated.
> And I think any undo-log based storage would have similar behavior.
> Moreover, it might be required to do something with existing index
> tuples (for instance, as I know, zheap sets "deleted" flag to index
> tuples related to previous values of updated fields).

I agree that we probably need more - I'm just inclined to think that we
need a more concrete target to work against.  Currently zheap's indexing
logic is still fairly naive; I don't think we'll get the interface right
without having worked further on the zheap side of things.


> > - The visibility functions relied on the *caller* performing buffer
> >   locking. That's not a great idea, because generic code shouldn't know
> >   about the locking scheme a particular AM needs.  I've changed the
> >   external visibility functions to instead take a slot, and perform the
> >   necessary locking inside.
>
> Makes sense for me.  But would it cause extra locking/unlocking and in
> turn performance impact?

I don't think so - nearly all the performance relevant cases do all the
visibility logic inside the AM. Where it matters, the underlying
functions that do not do the locking can be used.  Pretty much all the
converted places just had manual LockBuffer calls.


> > - HOT was encoded in the API in a bunch of places. That doesn't look
> >   right to me. I tried to improve a bit on that, but I'm not yet quite
> >   sure I like it. Needs written explanation & arguments...
>
> Yes, HOT is heapam specific feature.  Other tableams might not have
> HOT.  But it appears that we still expose hot_search_buffer() function
> in tableam API.  But that function have no usage, so it's just
> redundant and can be removed.

Yea, that was a leftover.


> > - the heap tableam did a heap_copytuple() nearly everywhere. Leading to
> >   a higher memory usage, because the resulting tuples weren't freed or
> >   anything. There might be a reason for doing such a change - we've
> >   certainly discussed that before - but I'm *vehemently* against doing
> >   that at the same time we introduce pluggable storage. Analyzing the
> >   performance effects will be hard enough without changes like this.
>
> I think once we switched to slots, doing heap_copytuple() do
> frequently is not required anymore.

It's mostly gone now.


> > - I've for now backed out the heap rewrite changes, partially.  Mostly
> >   because I didn't like the way the abstraction looks, but haven't quite
> >   figured out how it should look like.
>
> Yeah, it's hard part, but we need to invent something in this area...

I agree. But I really don't yet quite know what. I somewhat wonder if we
should just add a cluster_rel() callback to the tableam and let it deal
with everything :(.   As previously proposed the interface wouldn't have
worked with anything not losslessly encodable into a heaptuple, which is
unlikely to be sufficient.
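A minimal sketch of what such a cluster_rel() callback could look like, under the assumption that the AM owns the entire rewrite; all names here are invented for illustration and do not reflect the eventual interface:

```c
#include <assert.h>

/* Invented stand-ins; the real callback would work on Relations. */
typedef struct ToyRel { int live_tuples; } ToyRel;

typedef struct ToyTableAmRoutine {
    /* rewrite old_rel into new_rel, in whatever format the AM uses */
    void (*cluster_rel)(ToyRel *old_rel, ToyRel *new_rel);
} ToyTableAmRoutine;

static void
toy_heap_cluster_rel(ToyRel *old_rel, ToyRel *new_rel)
{
    /* a real heap AM would sort and copy tuples here; another AM could
     * use an entirely different representation, with no heaptuple step */
    new_rel->live_tuples = old_rel->live_tuples;
}

static const ToyTableAmRoutine toy_heap_routine = { toy_heap_cluster_rel };
```

Because the AM sees both relations, nothing outside the AM ever needs a lossless heaptuple encoding of its rows.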


FWIW, I plan to be mostly out until Thursday this week, and then I'll
rebase onto the new version of the abstract slot patch and then try to
split up the patchset. Once that's done, I'll do a prototype conversion
of zheap, which I'm sure will show up a lot of weaknesses in the current
abstraction.  Once that's done I hope we can collaborate / divide &
conquer to get the individual pieces into commit shape.

If either of you wants to get a head start separating something out,
let's try to organize who would do what? The EPQ and trigger
slotification are probably good candidates.

Greetings,

Andres Freund


Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
In reply to this post by Andres Freund

On Mon, Jul 16, 2018 at 11:35 PM Andres Freund <[hidden email]> wrote:
On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:
> On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <[hidden email]> wrote:

> > The most fundamental issues I had with Haribabu's last version from [2]
> > are the following:
> >
> > - The use of TableTuple, a typedef from void *, is bad from multiple
> >   fronts. For one it reduces just about all type safety. There were
> >   numerous bugs in the patch where things were just cast from HeapTuple
> >   to TableTuple to HeapTuple (and even to TupleTableSlot).  I think it's
> >   a really, really bad idea to introduce a vague type like this for
> >   development purposes alone, it makes it way too hard to refactor -
> >   essentially throwing the biggest benefit of type safe languages out of
> >   the window.
> >
>
> My earlier intention was to remove the HeapTuple usage entirely and replace
> it with slots everywhere outside the tableam. But it ended up as TableTuple
> before it reached that stage because of the heavy use of HeapTuple.

I don't think that's necessary - a lot of the system catalog accesses
are going to continue to be heap tuple accesses. And the conversions you
did significantly continue to access TableTuples as heap tuples - it was
just that the compiler didn't warn about it anymore.

A prime example of that is the way the rewriteheap / cluster
integration was done. Cluster continued to sort tuples as heap tuples -
even though that's likely incompatible with other tuple formats which
need different state.

OK. Understood.
 
> > - the heap tableam did a heap_copytuple() nearly everywhere. Leading to
> >   a higher memory usage, because the resulting tuples weren't freed or
> >   anything. There might be a reason for doing such a change - we've
> >   certainly discussed that before - but I'm *vehemently* against doing
> >   that at the same time we introduce pluggable storage. Analyzing the
> >   performance effects will be hard enough without changes like this.
> >
>
> How about using of slot instead of tuple and reusing of it? I don't know
> yet whether it is possible everywhere.

Not quite sure what you mean?

I thought using slots everywhere could reduce the use of heap_copytuple();
I understand from your reply in the other mail that you already made
those changes.
 

> > Tasks / Questions:
> >
> > - split up patch
> >
>
> How about generating refactoring changes as patches first based on
> the code in your repo as discussed here[1]?

Sure - it was just at the moment too much work ;)

Yes, it is too much work. How about doing this once most of the
open items are finished?
 

> > - Change heap table AM to not allocate handler function for each table,
> >   instead allocate it statically. Avoids a significant amount of data
> >   duplication, and allows for a few more compiler optimizations.
> >
>
> Some kind of static handler variable for each tableam, but we need to check
> how we can access that static handler from the relation.

I'm not sure what you mean by "how can we access"? We can just return a
pointer to the constant data from the current handler? Except that
adding a bunch of consts would be good, the external interface wouldn't
need to change?

I mean we may need to store a tableam ID in each table, so that based on
that ID we can get the static tableam handler, because at any given time
there may be tables using different tableam methods.
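The lookup described here - an identifier stored with the table resolving to a single const, statically allocated handler - could be sketched like this (invented names; the real mechanism would be driven by pg_am and the relcache):

```c
#include <assert.h>
#include <stddef.h>

/* Invented, simplified stand-in for a table AM routine table. */
typedef struct ToyAmRoutine {
    const char *name;
    int (*tuple_header_size)(void);
} ToyAmRoutine;

static int toy_heap_header_size(void) { return 23; }

/* one const, statically allocated handler per AM - nothing is copied */
static const ToyAmRoutine toy_heap_am = { "heap", toy_heap_header_size };

static const ToyAmRoutine *
toy_get_table_am(int am_id)
{
    /* the "tableam ID" stored with each table selects the static handler */
    switch (am_id)
    {
        case 0:  return &toy_heap_am;
        default: return NULL;
    }
}
```

Since every table of a given AM shares the same const handler, there is no per-table duplication, and the compiler can devirtualize calls through it more easily.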
 

> > - change scan level slot creation to use tableam function for doing so
> > - get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
> >
>
> so with this there wouldn't be a way to map from slot to tid, or there
> would have to be some other way.

I'm not sure I follow?

Regarding replacing heaptuple with slot: currently only the tid is passed
to the tableam methods via the slot. To get rid of the tid in the slot,
we may need some other way of passing it?
 
> > - bitmap index scans probably need a new tableam.h callback, abstracting
> >   bitgetpage()
> >
>
> OK.

Any chance you could try to tackle this?  I'm going to be mostly out
this week, so we'd probably not run across each other's feet...


OK, I will take care of the above point.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
On Tue, Jul 17, 2018 at 11:01 PM Haribabu Kommi <[hidden email]> wrote:

On Mon, Jul 16, 2018 at 11:35 PM Andres Freund <[hidden email]> wrote:
On 2018-07-04 20:11:21 +1000, Haribabu Kommi wrote:
> On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <[hidden email]> wrote:

> > - bitmap index scans probably need a new tableam.h callback, abstracting
> >   bitgetpage()
> >
>
> OK.

Any chance you could try to tackle this?  I'm going to be mostly out
this week, so we'd probably not run across each others feet...

OK, I will take care of the above point.

I added a new API in tableam.h to get all the visible tuples on a page,
abstracting the bitgetpage() function.
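The shape of such a callback can be illustrated with a toy analogue of what bitgetpage() does for heap - collect the offsets of the visible tuples on one page (invented names, simplified types; a real AM would use its own page format and snapshot checks):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy analogue of bitgetpage(): given per-tuple visibility for one page,
 * collect the 1-based offsets of the visible tuples into 'offsets' and
 * return how many there are.
 */
static int
toy_collect_visible_offsets(const bool *visible, int ntuples, int *offsets)
{
    int nvisible = 0;

    for (int i = 0; i < ntuples; i++)
    {
        if (visible[i])
            offsets[nvisible++] = i + 1;   /* OffsetNumbers are 1-based */
    }
    return nvisible;
}
```

A bitmap heap scan would then only iterate over the returned offsets, without knowing how the AM decided visibility.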

>>- Merge tableam.h and tableamapi.h and make most tableam.c functions
>> small inline functions. Having one-line tableam.c wrappers makes this
>> more expensive than necessary. We'll have a big enough trouble not
>> regressing performancewise.

I merged tableam.h and tableamapi.h into tableam.h and changed all the
functions to inlines. This change may have added some additional header
dependencies; I will check whether the need for them can be removed.

Attached are the updated patches on top of your github tree.

Currently I am working on the following.
- I observed that there is a crash when running isolation tests.
- COPY's multi_insert path should probably deal with a bunch of slots,
  rather than forming HeapTuples

Regards,
Haribabu Kommi
Fujitsu Australia

0002-New-API-to-get-heap-page-tuples.patch (12K)
0001-Merge-tableam.h-and-tableamapi.h.patch (80K)

Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
On Tue, Jul 24, 2018 at 11:31 PM Haribabu Kommi <[hidden email]> wrote:
On Tue, Jul 17, 2018 at 11:01 PM Haribabu Kommi <[hidden email]> wrote:

I added new API in the tableam.h to get all the page visible tuples to
abstract the bitgetpage() function.

>>- Merge tableam.h and tableamapi.h and make most tableam.c functions
>> small inline functions. Having one-line tableam.c wrappers makes this
>> more expensive than necessary. We'll have a big enough trouble not
>> regressing performancewise.

I merged tableam.h and tableamapi.h into tableam.h and changed all the
functions as inline. This change may have added some additional headers,
will check them if I can remove their need.

Attached are the updated patches on top your github tree.

Currently I am working on the following.
- I observed that there is a crash when running isolation tests.

While investigating the crash, I observed that it is due to the many
FIXMEs in the code. So I just made minimal fixes and am looking into
correcting the FIXMEs first.

One thing I observed is that a missing relation pointer leads to a crash
in the EvalPlan* functions, because not all ROW_MARK types contain a
relation pointer.

will continue to check all FIXME fixes.
 
- COPY's multi_insert path should probably deal with a bunch of slots,
  rather than forming HeapTuples

Implemented supporting of slots in the copy multi insert path.

Regards,
Haribabu Kommi
Fujitsu Australia

0002-Isolation-test-fixes-1.patch (3K)
0001-COPY-s-multi_insert-path-deal-with-bunch-of-slots.patch (20K)

Re: Pluggable Storage - Andres's take

Andres Freund
Hi,

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with.  I plan to push a tree including a lot of fixes and improvements
soon.

On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:

> while investing the crash, I observed that it is due to the lot of FIXME's
> in
> the code. So I just fixed minimal changes and looking into correcting
> the FIXME's first.
>
> One thing I observed is lack relation pointer is leading to crash in the
> flow of EvalPlan* functions, because all ROW_MARK types doesn't
> contains relation pointer.
>
> will continue to check all FIXME fixes.

Thanks.


> > - COPY's multi_insert path should probably deal with a bunch of slots,
> >   rather than forming HeapTuples
> >
>
> Implemented supporting of slots in the copy multi insert path.

Cool. I've not yet looked at it, but I plan to do so soon.  Will have to
rebase over the other copy changes first :(


- Andres


Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <[hidden email]> wrote:
Hi,

I'm currently in the process of rebasing zheap onto the pluggable
storage work. The goal, which seems to work surprisingly well, is to
find issues that the current pluggable storage patch doesn't yet deal
with.  I plan to push a tree including a lot of fixes and improvements
soon.

Sorry for coming late to this thread.

That's good. Did you find any problems porting zheap to pluggable
storage? Does it need any API changes or new API requirements?
 
On 2018-08-03 12:35:50 +1000, Haribabu Kommi wrote:
> while investing the crash, I observed that it is due to the lot of FIXME's
> in
> the code. So I just fixed minimal changes and looking into correcting
> the FIXME's first.
>
> One thing I observed is lack relation pointer is leading to crash in the
> flow of EvalPlan* functions, because all ROW_MARK types doesn't
> contains relation pointer.
>
> will continue to check all FIXME fixes.

Thanks.

I fixed some of the Isolation test problems. All the issues are related to
EPQ slot handling. Still more needs to be fixed.

Have the new TupleTableSlot abstraction patches in the recent thread [1]
fixed any of these issues? Then I can look into changing the FDW API to
return a slot instead of a tuple.


> > - COPY's multi_insert path should probably deal with a bunch of slots,
> >   rather than forming HeapTuples
> >
>
> Implemented supporting of slots in the copy multi insert path.

Cool. I've not yet looked at it, but I plan to do so soon.  Will have to
rebase over the other copy changes first :(

OK, understood. There are many changes in the COPY flow that conflict
with my changes. Please let me know once you are done with the rebase; I
can then fix those conflicts and regenerate the patch.

Attached is the patch with further fixes.

[1] - https://www.postgresql.org/message-id/CAFjFpRcNPQ1oOL41-HQYaEF%3DNq6Vbg0eHeFgopJhHw_X2usA5w%40mail.gmail.com

Regards,
Haribabu Kommi
Fujitsu Australia

0001-isolation-test-fixes-2.patch (3K)

Re: Pluggable Storage - Andres's take

Andres Freund
Hi,

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <[hidden email]> wrote:
> > I'm currently in the process of rebasing zheap onto the pluggable
> > storage work. The goal, which seems to work surprisingly well, is to
> > find issues that the current pluggable storage patch doesn't yet deal
> > with.  I plan to push a tree including a lot of fixes and improvements
> > soon.
> >
>
> Sorry for coming late to this thread.

No worries.


> That's good. Did you find any problems in porting zheap into pluggable
> storage? Does it needs any API changes or new API requirement?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

> Does the new TupleTableSlot abstraction patches has fixed any of these
> issues in the recent thread [1]? so that I can look into the change of
> FDW API to return slot instead of tuple.

Yea, that'd be a good thing to start with.

Greetings,

Andres Freund


Re: Pluggable Storage - Andres's take

Haribabu Kommi-2

On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <[hidden email]> wrote:
On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <[hidden email]> wrote:
> > I'm currently in the process of rebasing zheap onto the pluggable
> > storage work. The goal, which seems to work surprisingly well, is to
> > find issues that the current pluggable storage patch doesn't yet deal
> > with.  I plan to push a tree including a lot of fixes and improvements
> > soon.
> >
> That's good. Did you find any problems in porting zheap into pluggable
> storage? Does it needs any API changes or new API requirement?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

OK. Those are quite a lot of changes.
 
> Does the new TupleTableSlot abstraction patches has fixed any of these
> issues in the recent thread [1]? so that I can look into the change of
> FDW API to return slot instead of tuple.

Yea, that'd be a good thing to start with.

I found that only the RefetchForeignRow API needs the change, and have made it.
Along with that, I fixed all the issues with running make check-world. Patches
for both are attached.

Now I will look into the remaining FIXMEs that don't conflict with your further changes.

Regards,
Haribabu Kommi
Fujitsu Australia

0002-check-world-fixes.patch (11K)
0001-FDW-RefetchForeignRow-API-prototype-change.patch (8K)

Re: Pluggable Storage - Andres's take

Andres Freund
Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:

> On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <[hidden email]> wrote:
>
> > On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> > > On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <[hidden email]> wrote:
> > > > I'm currently in the process of rebasing zheap onto the pluggable
> > > > storage work. The goal, which seems to work surprisingly well, is to
> > > > find issues that the current pluggable storage patch doesn't yet deal
> > > > with.  I plan to push a tree including a lot of fixes and improvements
> > > > soon.
> > > >
> > > That's good. Did you find any problems in porting zheap into pluggable
> > > storage? Does it needs any API changes or new API requirement?
> >
> > A lot, yes. The big changes are:
> > - removal of HeapPageScanDesc
> > - introduction of explicit support functions for tablesample & bitmap scans
> > - introduction of callbacks for vacuum_rel, cluster
> >
> > And quite a bit more along those lines.
> >
>
> OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that.  There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus
  do not need to be copied anymore
- VACUUM FULL / CLUSTER is moved into one callback that does the actual
  copying. The various previous rewrite callbacks imo integrated at the
  wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs
  are going to need the syncscan APIs, so moving that into AM callbacks
  doesn't make sense.

Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
  usage somehow
- table creation callback, currently the pluggable-zheap patch has a few
  conditionals outside of access/zheap for that purpose (see RelationTruncate)
- header structure cleanup

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches


You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?


> > > Does the new TupleTableSlot abstraction patches has fixed any of these
> > > issues in the recent thread [1]? so that I can look into the change of
> > > FDW API to return slot instead of tuple.
> >
> > Yea, that'd be a good thing to start with.
> >
>
> I found out only the RefetchForeignRow API needs the change and done the
> same.
> Along with that, I fixed all the issues of  running make check-world.
> Attached patches
> for the same.

Thanks, that's really helpful!  I'll try to merge these soon.


I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.


Greetings,

Andres Freund


Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
On Fri, Aug 24, 2018 at 12:50 PM Andres Freund <[hidden email]> wrote:
Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
> On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <[hidden email]> wrote:
>
> > On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> > > On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <[hidden email]> wrote:
> > > > I'm currently in the process of rebasing zheap onto the pluggable
> > > > storage work. The goal, which seems to work surprisingly well, is to
> > > > find issues that the current pluggable storage patch doesn't yet deal
> > > > with.  I plan to push a tree including a lot of fixes and improvements
> > > > soon.
> > > >
> > > That's good. Did you find any problems in porting zheap into pluggable
> > > storage? Does it needs any API changes or new API requirement?
> >
> > A lot, yes. The big changes are:
> > - removal of HeapPageScanDesc
> > - introduction of explicit support functions for tablesample & bitmap scans
> > - introduction of callbacks for vacuum_rel, cluster
> >
> > And quite a bit more along those lines.
> >
>
> OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makese
sense to review or such, but it's probably more useful if you work based
on that.  There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

OK. Thanks, will check that also.
 
There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus
  do not need to be copied anymore
- VACUUM FULL / CLUSTER is moved into one callback that does the actual
  copying. The various previous rewrite callbacks imo integrated at the
  wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs
  are going to need the syncscan APIs, so moving that into AM callbacks
  doesn't make sense.

OK.
 
Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
  usage somehow

I will check it.
 
- table creation callback, currently the pluggable-zheap patch has a few
  conditionals outside of access/zheap for that purpose (see RelationTruncate)

I will check it.
 
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches


You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason for adding them to the AM is just to give the specific
AM control over whether it supports bulk insert or not.

The current framework doesn't allow AM-specific bulk insert state to be
passed from one function to another, and its structure is fixed. This
needs to be enhanced to also allow AM-specific private members.
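One way the described enhancement could look - a bulk-insert state with a common, AM-independent header plus an opaque AM-private part - is sketched below with invented names; note that the thread later leans toward removing this abstraction until it is actually needed:

```c
#include <assert.h>
#include <stddef.h>

/* Invented sketch: common bulk-insert header plus an AM-owned opaque part. */
typedef struct ToyBulkInsertState {
    int   buffered_tuples;   /* AM-independent bookkeeping */
    void *am_private;        /* state only the table AM understands */
} ToyBulkInsertState;

/* what a heap-like AM might stash in am_private */
typedef struct ToyHeapBulkState {
    int current_block;       /* e.g. the block being bulk-extended */
} ToyHeapBulkState;
```

The generic COPY code would only touch the header fields and hand the whole struct back to the AM's insert callback, which casts am_private to its own type.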
 
> > > Does the new TupleTableSlot abstraction patches has fixed any of these
> > > issues in the recent thread [1]? so that I can look into the change of
> > > FDW API to return slot instead of tuple.
> >
> > Yea, that'd be a good thing to start with.
> >
>
> I found out only the RefetchForeignRow API needs the change and done the
> same.
> Along with that, I fixed all the issues of  running make check-world.
> Attached patches
> for the same.

Thanks, that's really helpful!  I'll try to merge these soon.

I can share the rebased patches for the fixes, so that it will be easy to merge. 

I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.

That's good.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
On Tue, Aug 28, 2018 at 1:48 PM Haribabu Kommi <[hidden email]> wrote:
On Fri, Aug 24, 2018 at 12:50 PM Andres Freund <[hidden email]> wrote:
Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
> On Tue, Aug 21, 2018 at 6:59 PM Andres Freund <[hidden email]> wrote:
>
> > On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> > > On Sun, Aug 5, 2018 at 7:48 PM Andres Freund <[hidden email]> wrote:
> > > > I'm currently in the process of rebasing zheap onto the pluggable
> > > > storage work. The goal, which seems to work surprisingly well, is to
> > > > find issues that the current pluggable storage patch doesn't yet deal
> > > > with.  I plan to push a tree including a lot of fixes and improvements
> > > > soon.
> > > >
> > > That's good. Did you find any problems in porting zheap into pluggable
> > > storage? Does it needs any API changes or new API requirement?
> >
> > A lot, yes. The big changes are:
> > - removal of HeapPageScanDesc
> > - introduction of explicit support functions for tablesample & bitmap scans
> > - introduction of callbacks for vacuum_rel, cluster
> >
> > And quite a bit more along those lines.
> >
>
> OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makese
sense to review or such, but it's probably more useful if you work based
on that.  There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

OK. Thanks, will check that also.
 
- fixing your (Haribabu's) slotification of copy patch to compute memory
  usage somehow

I will check it.

Attached is the copy patch that brings back the size validation.
It computes the tuple size from the first tuple in the batch and uses the
same size for the rest of the tuples in that batch. This way the
calculation overhead is also reduced; there is a chance that the first
tuple is very small and the rest are very large, but I feel such cases are rare.
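The heuristic described above can be sketched as follows - measure the first tuple of a batch, then charge every later tuple that same size when deciding whether to flush (invented names and limits, simplified from the real COPY code):

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BUFFERED_TUPLES 1000
#define TOY_MAX_BUFFERED_BYTES  65535

typedef struct ToyBatch {
    int  ntuples;
    int  first_tuple_size;   /* measured once, at the start of each batch */
    long bytes;              /* estimated memory use of the batch */
} ToyBatch;

/* add one tuple; returns true when the caller should flush the batch */
static bool
toy_batch_add(ToyBatch *b, int actual_tuple_size)
{
    if (b->ntuples == 0)
        b->first_tuple_size = actual_tuple_size;   /* the only measurement */

    b->bytes += b->first_tuple_size;   /* estimate, not actual_tuple_size */
    b->ntuples++;

    return b->ntuples >= TOY_MAX_BUFFERED_TUPLES ||
           b->bytes   >= TOY_MAX_BUFFERED_BYTES;
}
```

The trade-off is visible in the second parameter being ignored after the first tuple: the estimate can be badly off when tuple sizes vary wildly within a batch.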
 
 
- table creation callback, currently the pluggable-zheap patch has a few
  conditionals outside of access/zheap for that purpose (see RelationTruncate

I will check it.

I found a couple of places where zheap uses some extra logic to verify
whether the AM is zheap, and takes some extra decisions based on that.
I am analyzing all of that extra code to see whether (and how) callbacks
can handle it. I will come back with more details later.
 
 
And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches


You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?

The main reason of adding them to AM is just to provide a control to
the specific AM to decide whether they can support the bulk insert or
not?

Current framework doesn't support AM specific bulk insert state to be
passed from one function to another and it's structure is fixed. This needs
to be enhanced to add AM specific private members also.

Do you want me to work on making it generic, so that AM methods can
extend the structure?
 
 
> > > Does the new TupleTableSlot abstraction patches has fixed any of these
> > > issues in the recent thread [1]? so that I can look into the change of
> > > FDW API to return slot instead of tuple.
> >
> > Yea, that'd be a good thing to start with.
> >
>
> I found out only the RefetchForeignRow API needs the change and done the
> same.
> Along with that, I fixed all the issues of  running make check-world.
> Attached patches
> for the same.

Thanks, that's really helpful!  I'll try to merge these soon.

I can share the rebased patches for the fixes, so that it will be easy to merge. 

Rebased FDW and check-world fixes patch is attached.
I will continue working on the rest of the miss items.

Regards,
Haribabu Kommi
Fujitsu Australia

0002-copy-memory-limit-fix.patch (8K)
0001-check-world-fixes.patch (8K)
0003-FDW-RefetchForeignRow-API-prototype-change.patch (8K)

Re: Pluggable Storage - Andres's take

Andres Freund
Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
> I found couple of places where the zheap is using some extra logic in
> verifying
> whether it is zheap AM or not, based on that it used to took some extra
> decisions.
> I am analyzing all the extra code that is done, whether any callbacks can
> handle it
> or not? and how? I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and some other ones we'll need to abstract.


> >> And then:
> >> - lotsa cleanups
> >> - rebasing onto a newer version of the abstract slot patchset
> >> - splitting out smaller patches
> >>
> >>
> >> You'd moved the bulk insert into tableam callbacks - I don't quite get
> >> why? There's not really anything AM specific in that code?
> >>
> >
> > The main reason of adding them to AM is just to provide a control to
> > the specific AM to decide whether they can support the bulk insert or
> > not?
> >
> > Current framework doesn't support AM specific bulk insert state to be
> > passed from one function to another and it's structure is fixed. This needs
> > to be enhanced to add AM specific private members also.
> >
>
> Do you want me to work on it to make it generic to AM methods to extend
> the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...


> @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
>   CommandId mycid, int hi_options,
>   ResultRelInfo *resultRelInfo,
>   BulkInsertState bistate,
> - int nBufferedTuples, TupleTableSlot **bufferedSlots,
> + int nBufferedSlots, TupleTableSlot **bufferedSlots,
>   uint64 firstBufferedLineNo);
>  static bool CopyReadLine(CopyState cstate);
>  static bool CopyReadLineText(CopyState cstate);
> @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
>   void       *bistate;
>   uint64 processed = 0;
>   bool useHeapMultiInsert;
> - int nBufferedTuples = 0;
> + int nBufferedSlots = 0;
>   int prev_leaf_part_index = -1;

> -#define MAX_BUFFERED_TUPLES 1000
> +#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.


>   if (useHeapMultiInsert)
>   {
> + int tup_size;
> +
>   /* Add this tuple to the tuple buffer */
> - if (nBufferedTuples == 0)
> + if (nBufferedSlots == 0)
> + {
>   firstBufferedLineNo = cstate->cur_lineno;
> - Assert(bufferedSlots[nBufferedTuples] == myslot);
> - nBufferedTuples++;
> +
> + /*
> + * Find out the Tuple size of the first tuple in a batch and
> + * use it for the rest tuples in a batch. There may be scenarios
> + * where the first tuple is very small and rest can be large, but
> + * that's rare and this should work for majority of the scenarios.
> + */
> + tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
> +  myslot->tts_values,
> +  myslot->tts_isnull);
> + }

This seems too expensive to me.  I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?
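The suggested alternative - charging each tuple the number of raw COPY input bytes it consumed, rather than calling heap_compute_data_size() per tuple - might look roughly like this (invented names; a sketch, not the eventual implementation):

```c
#include <assert.h>

/*
 * Proxy heuristic: the memory cost of a buffered tuple is approximated by
 * the number of raw COPY input bytes that produced it. The caller already
 * knows the line's start and end offsets from its read position, so no
 * per-tuple size computation is needed at all.
 */
static long
toy_charge_input_bytes(long buffered_bytes,
                       long line_start_offset, long line_end_offset)
{
    return buffered_bytes + (line_end_offset - line_start_offset);
}
```

Input bytes and formed-tuple bytes are not identical (quoting, escapes, nulls), but as a flush threshold the proxy only has to be roughly proportional.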

Greetings,

Andres Freund


Re: Pluggable Storage - Andres's take

Haribabu Kommi-2


On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <[hidden email]> wrote:
Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
> I found couple of places where the zheap is using some extra logic in
> verifying
> whether it is zheap AM or not, based on that it used to took some extra
> decisions.
> I am analyzing all the extra code that is done, whether any callbacks can
> handle it
> or not? and how? I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo) and som other ones we'll need to abstract.
 
OK. I will list all the areas that I found, with my observations on
whether to abstract them or leave them, and then implement around that.


> >> And then:
> >> - lotsa cleanups
> >> - rebasing onto a newer version of the abstract slot patchset
> >> - splitting out smaller patches
> >>
> >>
> >> You'd moved the bulk insert into tableam callbacks - I don't quite get
> >> why? There's not really anything AM specific in that code?
> >>
> >
> > The main reason of adding them to AM is just to provide a control to
> > the specific AM to decide whether they can support the bulk insert or
> > not?
> >
> > Current framework doesn't support AM specific bulk insert state to be
> > passed from one function to another and it's structure is fixed. This needs
> > to be enhanced to add AM specific private members also.
> >
>
> Do you want me to work on it to make it generic to AM methods to extend
> the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actuallly needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...

OK. Will remove them and share the patch.
 

> @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
>                                       CommandId mycid, int hi_options,
>                                       ResultRelInfo *resultRelInfo,
>                                       BulkInsertState bistate,
> -                                     int nBufferedTuples, TupleTableSlot **bufferedSlots,
> +                                     int nBufferedSlots, TupleTableSlot **bufferedSlots,
>                                       uint64 firstBufferedLineNo);
>  static bool CopyReadLine(CopyState cstate);
>  static bool CopyReadLineText(CopyState cstate);
> @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
>       void       *bistate;
>       uint64          processed = 0;
>       bool            useHeapMultiInsert;
> -     int                     nBufferedTuples = 0;
> +     int                     nBufferedSlots = 0;
>       int                     prev_leaf_part_index = -1;

> -#define MAX_BUFFERED_TUPLES 1000
> +#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.

OK. I will correct it.
 

>                               if (useHeapMultiInsert)
>                               {
> +                                     int tup_size;
> +
>                                       /* Add this tuple to the tuple buffer */
> -                                     if (nBufferedTuples == 0)
> +                                     if (nBufferedSlots == 0)
> +                                     {
>                                               firstBufferedLineNo = cstate->cur_lineno;
> -                                     Assert(bufferedSlots[nBufferedTuples] == myslot);
> -                                     nBufferedTuples++;
> +
> +                                             /*
> +                                              * Find out the Tuple size of the first tuple in a batch and
> +                                              * use it for the rest tuples in a batch. There may be scenarios
> +                                              * where the first tuple is very small and rest can be large, but
> +                                              * that's rare and this should work for majority of the scenarios.
> +                                              */
> +                                             tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
> +                                                                                                               myslot->tts_values,
> +                                                                                                               myslot->tts_isnull);
> +                                     }

This seems too expensive to me.  I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?

Yes, the cstate structure contains the line_buf member, which holds the
line length of the row; this can be used as the tuple length to limit
memory usage. Comments?

Regards,
Haribabu Kommi
Fujitsu Australia

Re: Pluggable Storage - Andres's take

Haribabu Kommi-2
On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <[hidden email]> wrote:

On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <[hidden email]> wrote:
Hi,

Thanks for the patches!

On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
> I found couple of places where the zheap is using some extra logic in
> verifying
> whether it is zheap AM or not, based on that it used to took some extra
> decisions.
> I am analyzing all the extra code that is done, whether any callbacks can
> handle it
> or not? and how? I can come back with more details later.

Yea, I think some of them will need to stay (particularly around
integrating undo), and some other ones we'll need to abstract.
 
OK. I will list all the areas that I found, with my observations on how to
abstract each one or leave it as-is, and then implement around that.

The following are the changes where the code specifically checks whether
it is a zheap relation or not.

Overall I found that it needs 3 new APIs, at the following locations, plus
a facility to provide handler options (like skip WAL, etc.):
1. RelationSetNewRelfilenode
2. heap_create_init_fork
3. estimate_rel_size
_hash_vacuum_one_page:
xlrec.flags = RelationStorageIsZHeap(heapRel) ?
XLOG_HASH_VACUUM_RELATION_STORAGE_ZHEAP : 0;

_bt_delitems_delete:
xlrec_delete.flags = RelationStorageIsZHeap(heapRel) ?
XLOG_BTREE_DELETE_RELATION_STORAGE_ZHEAP : 0;


Storing the handler type, and adding a new API for special handling when
checking for these new types, can remove the need for the above code.
RelationAddExtraBlocks:
if (RelationStorageIsZHeap(relation))
{
ZheapInitPage(page, BufferGetPageSize(buffer));
freespace = PageGetZHeapFreeSpace(page);
}

Adding new APIs for PageInit and PageGetHeapFreeSpace to redirect the calls
to the specific table AM handlers.

visibilitymap_set:
if (RelationStorageIsZHeap(rel))
{
recptr = log_zheap_visible(rel->rd_node, heapBuf, vmBuf,
   cutoff_xid, flags);
/*
* We do not have a page wise visibility flag in zheap.
* So no need to set LSN on zheap page.
*/
}

Handler options may remove the need for the above code.

validate_index_heapscan:
/* Set up for predicate or expression evaluation */
/* For zheap relations, the tuple is locally allocated, so free it. */
ExecStoreHeapTuple(heapTuple, slot, RelationStorageIsZHeap(heapRelation));


This will be solved after converting the validate_index_heapscan function
to use slots.

RelationTruncate:
/* Create the meta page for zheap */
if (RelationStorageIsZHeap(rel))
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
  InvalidTransactionId,
  InvalidMultiXactId);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
rel->rd_rel->relkind != 'p')
{
heap_create_init_fork(rel);
if (RelationStorageIsZHeap(rel))
ZheapInitMetaPage(rel, INIT_FORKNUM);
}

A new API in RelationSetNewRelfilenode and heap_create_init_fork can solve it.
cluster:
if (RelationStorageIsZHeap(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot cluster a zheap table")));
 
No change required.
 
copyFrom:
/*
* In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
* See zheap_prepare_insert for details.
* PBORKED / ZBORKED: abstract
*/
if (!RelationStorageIsZHeap(cstate->rel) && !XLogIsNeeded())
hi_options |= HEAP_INSERT_SKIP_WAL;
How about requesting the table AM handler to provide options and using them here?
ExecuteTruncateGuts:

// PBORKED: Need to abstract this
minmulti = GetOldestMultiXactId();

/*
* Need the full transaction-safe pushups.
*
* Create a new empty storage file for the relation, and assign it
* as the relfilenode value. The old storage file is scheduled for
* deletion at commit.
*
* PBORKED: needs to be a callback
*/
if (RelationStorageIsZHeap(rel))
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
  InvalidTransactionId, InvalidMultiXactId);
else
RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
  RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
{
heap_create_init_fork(rel);
if (RelationStorageIsZHeap(rel))
ZheapInitMetaPage(rel, INIT_FORKNUM);
}

New API inside RelationSetNewRelfilenode can handle it.
ATRewriteCatalogs:
/* Inherit the storage_engine reloption from the parent table. */
if (RelationStorageIsZHeap(rel))
{
static char *validnsps[] = HEAP_RELOPT_NAMESPACES;
DefElem *storage_engine;

storage_engine = makeDefElemExtended("toast", "storage_engine",
(Node *) makeString("zheap"),
DEFELEM_UNSPEC, -1);
reloptions = transformRelOptions((Datum) 0,
list_make1(storage_engine),
"toast",
validnsps, true, false);
}

I don't think anything can be done in the API here.

ATRewriteTable:
/*
* In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
* See zheap_prepare_insert for details.
*
* ZFIXME / PFIXME: We probably need a different abstraction for this.
*/
if (!RelationStorageIsZHeap(newrel) && !XLogIsNeeded())
hi_options |= HEAP_INSERT_SKIP_WAL;

Handler options can solve this as well.

estimate_rel_size:
if (curpages < 10 &&
(rel->rd_rel->relpages == 0 ||
(RelationStorageIsZHeap(rel) &&
rel->rd_rel->relpages == ZHEAP_METAPAGE + 1)) &&
!rel->rd_rel->relhassubclass &&
rel->rd_rel->relkind != RELKIND_INDEX)
curpages = 10;

/* report estimated # pages */
*pages = curpages;
/* quick exit if rel is clearly empty */
if (curpages == 0 || (RelationStorageIsZHeap(rel) &&
curpages == ZHEAP_METAPAGE + 1))
{
*tuples = 0;
*allvisfrac = 0;
break;
}
/* coerce values in pg_class to more desirable types */
relpages = (BlockNumber) rel->rd_rel->relpages;
reltuples = (double) rel->rd_rel->reltuples;
relallvisible = (BlockNumber) rel->rd_rel->relallvisible;

/*
* If it's a zheap relation, then subtract the pages
* to account for the metapage.
*/
if (relpages > 0 && RelationStorageIsZHeap(rel))
{
curpages--;
relpages--;
}

An API may be needed to estimate the relation size based on the handler type.
pg_stat_get_tuples_hot_updated and others:
/*
* Counter tuples_hot_updated stores number of hot updates for heap table
* and the number of inplace updates for zheap table.
*/
if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
RelationStorageIsZHeap(rel))
result = 0;
else
result = (int64) (tabentry->tuples_hot_updated);


Is the special condition needed? The value should be 0 for zheap anyway, right?

RelationSetNewRelfilenode:
/* Initialize the metapage for zheap relation. */
if (RelationStorageIsZHeap(relation))
ZheapInitMetaPage(relation, MAIN_FORKNUM);
A new API in RelationSetNewRelfilenode can solve this problem.
 

> >> And then:
> >> - lotsa cleanups
> >> - rebasing onto a newer version of the abstract slot patchset
> >> - splitting out smaller patches
> >>
> >>
> >> You'd moved the bulk insert into tableam callbacks - I don't quite get
> >> why? There's not really anything AM specific in that code?
> >>
> >
> > The main reason for adding them to the AM is just to give the specific
> > AM control over whether it can support bulk insert or not.
> >
> > The current framework doesn't support AM-specific bulk insert state being
> > passed from one function to another, and its structure is fixed. This needs
> > to be enhanced to also add AM-specific private members.
> >
>
> Do you want me to work on it to make it generic to AM methods to extend
> the structure?

I think the best thing here would be to *remove* all AM abstraction for
bulk insert, until it's actually needed. The likelihood of us getting
the interface right and useful without an actual user seems low. Also,
this already is a huge patch...

OK. Will remove them and share the patch.

Bulk insert API changes are removed.
 
 

> @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
>                                       CommandId mycid, int hi_options,
>                                       ResultRelInfo *resultRelInfo,
>                                       BulkInsertState bistate,
> -                                     int nBufferedTuples, TupleTableSlot **bufferedSlots,
> +                                     int nBufferedSlots, TupleTableSlot **bufferedSlots,
>                                       uint64 firstBufferedLineNo);
>  static bool CopyReadLine(CopyState cstate);
>  static bool CopyReadLineText(CopyState cstate);
> @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
>       void       *bistate;
>       uint64          processed = 0;
>       bool            useHeapMultiInsert;
> -     int                     nBufferedTuples = 0;
> +     int                     nBufferedSlots = 0;
>       int                     prev_leaf_part_index = -1;

> -#define MAX_BUFFERED_TUPLES 1000
> +#define MAX_BUFFERED_SLOTS 1000

What's the point of these renames? We're still dealing in tuples. Just
seems to make the patch larger.

OK. I will correct it.
 

>                               if (useHeapMultiInsert)
>                               {
> +                                     int tup_size;
> +
>                                       /* Add this tuple to the tuple buffer */
> -                                     if (nBufferedTuples == 0)
> +                                     if (nBufferedSlots == 0)
> +                                     {
>                                               firstBufferedLineNo = cstate->cur_lineno;
> -                                     Assert(bufferedSlots[nBufferedTuples] == myslot);
> -                                     nBufferedTuples++;
> +
> +                                             /*
> +                                              * Find out the Tuple size of the first tuple in a batch and
> +                                              * use it for the rest tuples in a batch. There may be scenarios
> +                                              * where the first tuple is very small and rest can be large, but
> +                                              * that's rare and this should work for majority of the scenarios.
> +                                              */
> +                                             tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
> +                                                                                                               myslot->tts_values,
> +                                                                                                               myslot->tts_isnull);
> +                                     }

This seems too expensive to me.  I think it'd be better if we instead
used the amount of input data consumed for the tuple as a proxy. Does that
sound reasonable?

Yes, the cstate structure contains the line_buf member, which holds the
line length of the row; this can be used as the tuple length to limit
memory usage. Comments?

Attached are the COPY FROM batch insert memory usage limit fix, and grammar
support for the USING method in CREATE TABLE AS as well.

Regards,
Haribabu Kommi
Fujitsu Australia

Attachments:
0001-copy-memory-limit-fix.patch (2K)
0003-CREATE-AS-USING-method-grammer-support.patch (2K)
0002-Remove-of-Bulk-insert-state-API.patch (9K)

Re: Pluggable Storage - Andres's take

akapila
On Mon, Sep 10, 2018 at 1:12 PM Haribabu Kommi <[hidden email]> wrote:

>
> On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <[hidden email]> wrote:
>>
> pg_stat_get_tuples_hot_updated and others:
> /*
> * Counter tuples_hot_updated stores number of hot updates for heap table
> * and the number of inplace updates for zheap table.
> */
> if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
> RelationStorageIsZHeap(rel))
> result = 0;
> else
> result = (int64) (tabentry->tuples_hot_updated);
>
>
> Is the special condition needed? The value should be 0 for zheap anyway, right?
>

I also think so.  Beena/Mithun have worked on this part of the code, so
it is better if they also confirm once.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
