Pluggable storage

Pluggable storage

Alvaro Herrera-9
Many have expressed their interest in this topic, but I haven't seen any
design of how it should work.  Here's my attempt; I've been playing with
this for some time now and I think what I propose here is a good initial
plan.  This will allow us to write permanent table storage that works
differently than heapam.c.  At this stage, I haven't thought through
whether this is going to allow extensions to define new storage modules;
I am focusing on AMs that can coexist with heapam in core.

The design starts with a new type of row in pg_am, of type "s" (for "storage").
The handler function returns a struct of node type StorageAmRoutine.  This
contains functions for 1) scans (beginscan, getnext, endscan) 2) tuples
(tuple_insert/update/delete/lock, as well as set_oid, get_xmin and the
like), and operations on tuples that are part of slots (tuple_deform,
materialize).

To support this, we introduce StorageTuple and StorageScanDesc.
StorageTuples represent a physical tuple coming from some storage AM.
It is necessary to have a pointer to a StorageAmRoutine in order to
manipulate the tuple.  For heapam.c, a StorageTuple is just a HeapTuple.
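
To make this concrete, here is a rough sketch of what the handler struct
might look like.  All member names and signatures below are provisional
illustrations, not what the final patch necessarily uses:

    /* Sketch only: names and signatures are illustrative, not final. */
    typedef void *StorageTuple;   /* opaque; heapam.c casts it to HeapTuple */

    typedef struct StorageAmRoutine
    {
        NodeTag         type;

        /* scans */
        StorageScanDesc (*scan_begin) (Relation rel, Snapshot snapshot,
                                       int nkeys, ScanKey keys);
        StorageTuple    (*scan_getnext) (StorageScanDesc scan,
                                         ScanDirection direction);
        void            (*scan_end) (StorageScanDesc scan);

        /* tuple operations */
        void            (*tuple_insert) (Relation rel, TupleTableSlot *slot,
                                         CommandId cid, int options,
                                         ItemPointer otid);
        HTSU_Result     (*tuple_update) (Relation rel, ItemPointer otid,
                                         TupleTableSlot *slot, CommandId cid,
                                         ItemPointer newtid);
        HTSU_Result     (*tuple_delete) (Relation rel, ItemPointer tid,
                                         CommandId cid);
        HTSU_Result     (*tuple_lock) (Relation rel, ItemPointer tid,
                                       TupleTableSlot *slot, CommandId cid,
                                       LockTupleMode mode);
        void            (*tuple_set_oid) (StorageTuple tuple, Oid oid);
        TransactionId   (*tuple_get_xmin) (StorageTuple tuple);

        /* operations on tuples contained in slots */
        void            (*slot_deform_tuple) (TupleTableSlot *slot, int natts);
        void            (*slot_materialize) (TupleTableSlot *slot);
        void            (*slot_clear_tuple) (TupleTableSlot *slot);
    } StorageAmRoutine;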

RelationData gains ->rd_stamroutine which is a pointer to the
StorageAmRoutine for the relation in question.  Similarly,
TupleTableSlot is augmented with a link to the StorageAmRoutine to
handle the StorageTuple it contains (probably in most cases it's set at
the same time as the tupdesc).  This implies that routines such as
ExecAssignScanType need to pass down the StorageAmRoutine from the
relation to the slot.
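
In slot terms, that could end up as simple as this sketch (the
tts_stamroutine field name is made up here):

    /* sketch: propagate the relation's storage AM into the scan slot */
    static void
    ExecAssignScanTypeFromRel(ScanState *scanstate, Relation rel)
    {
        TupleTableSlot *slot = scanstate->ss_ScanTupleSlot;

        ExecSetSlotDescriptor(slot, RelationGetDescr(rel));
        slot->tts_stamroutine = rel->rd_stamroutine;  /* hypothetical field */
    }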

The executor is modified so that instead of calling heap_insert etc
directly, it uses rel->rd_stamroutine to call these methods.  The
executor is still in charge of dealing with indexes, constraints, and
any other thing that's not the tuple storage itself (this is one major
point in which this differs from FDWs).  This all looks simple enough,
with one exception and a few notes:

exception a) ExecMaterializeSlot needs special consideration.  This is
used in two different ways: a1) is the stated "make tuple independent
from any underlying storage" point, which is handled by
ExecMaterializeSlot itself calling a method from the storage AM to
do any byte copying as needed.  ExecMaterializeSlot no longer returns a
HeapTuple, because there might not be any.  The second usage pattern a2)
is to create a HeapTuple that's passed to other modules which only deal
with HeapTuples and not slots (triggers are the main case I noticed, but I think
there are others such as the executor itself wanting tuples as Datum for
some reason).  For the moment I'm handling this by having a new
ExecHeapifyTuple which creates a HeapTuple from a slot, regardless of
the original tuple format.
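
A minimal sketch of that function, assuming the slot's storage AM knows
how to deform its tuple into the slot's values/isnull arrays:

    /* sketch: build a HeapTuple from a slot, whatever the tuple's format */
    HeapTuple
    ExecHeapifyTuple(TupleTableSlot *slot)
    {
        /* extract all columns; this goes through the slot's storage AM */
        slot_getallattrs(slot);

        /* then form an ordinary heap tuple from the extracted arrays */
        return heap_form_tuple(slot->tts_tupleDescriptor,
                               slot->tts_values,
                               slot->tts_isnull);
    }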

note b) EvalPlanQual currently maintains an array of HeapTuple in
EState->es_epqTuple.  I think it works to replace that with an array of
StorageTuples; EvalPlanQualFetch needs to call the StorageAmRoutine
methods in order to interact with it.  Other than those changes, it
seems okay.

note c) nodeSubplan has curTuple as a HeapTuple.  It seems simple
to replace this with an independent slot-based tuple.

note d) grp_firstTuple in nodeAgg / nodeSetOp.  These are less
simple than the above, but replacing the HeapTuple with a slot-based
tuple seems doable too.

note e) nodeLockRows uses lr_curtuples to feed EvalPlanQual.
TupleTableSlot also seems a good replacement.  This has fallout in other
users of EvalPlanQual, too.

note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
format.  In the long run, it may be possible to replace them with a
separate storage module that's specifically designed to handle tuples
meant for tuplestores etc.  That may simplify TupleTableSlot and
execTuples.  For the moment we keep the tts_mintuple as it is.  Whenever
a tuple is not already in heap format, we heapify it in order to put it in
the store.


The current heapam.c routines need some changes.  Currently, practice is
that heap_insert, heap_multi_insert, heap_fetch, heap_update scribble on
their input tuples to set the resulting ItemPointer in tuple->t_self.
This is messy if we want StorageTuples to be abstract.  I'm changing
this so that the resulting ItemPointer is returned in a separate output
argument; the tuple itself is left alone.  This is somewhat messy in the
case of heap_multi_insert because it returns several items; I think it's
acceptable to return an array of ItemPointers in the same order as the
input tuples.  This works fine for the only caller, which is COPY in
batch mode.  For the other routines, they don't really care where the
TID is returned AFAICS.
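
In other words, roughly this kind of signature change (sketched against the
current declarations; the final shape may well differ):

    /* today: heap_insert scribbles the new TID into tup->t_self */
    Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
                    int options, BulkInsertState bistate);

    /* proposed: the TID comes back in a separate output argument */
    Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
                    int options, BulkInsertState bistate,
                    ItemPointer otid);       /* out: where the tuple went */

    /* heap_multi_insert: one output TID per input tuple, in input order */
    void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                           CommandId cid, int options, BulkInsertState bistate,
                           ItemPointer tids);  /* out: array of ntuples TIDs */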


Additional noteworthy items:

i) Speculative insertion: the speculative insertion token is no longer
installed directly in the heap tuple by the executor (of course).
Instead, the token becomes part of the slot.  When the tuple_insert
method is called, the insertion routine is in charge of setting the
token from the slot into the storage tuple.  The executor is in charge of
calling method->speculative_finish() / abort() once the insertion has
been confirmed by the indexes.
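
Greatly simplified, the executor-side flow would look something like this
pseudo-C (tts_spec_token and the speculative_* methods are provisional
names; the real ON CONFLICT path has more steps):

    /* acquire a token and stash it in the slot, not in any tuple */
    slot->tts_spec_token =
        SpeculativeInsertionLockAcquire(GetCurrentTransactionId());

    /* the AM's insert routine copies the token from slot to storage tuple */
    rel->rd_stamroutine->tuple_insert(rel, slot, estate->es_output_cid,
                                      HEAP_INSERT_SPECULATIVE, &tid);

    /* insert index entries; specConflict reports an arbiter-index conflict */
    recheckIndexes = ExecInsertIndexTuples(slot, &tid, estate, true,
                                           &specConflict, arbiterIndexes);

    if (!specConflict)
        rel->rd_stamroutine->speculative_finish(rel, &tid);
    else
        rel->rd_stamroutine->speculative_abort(rel, &tid);   /* then retry */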

ii) execTuples has additional accessors for tuples-in-slot, such as
ExecFetchSlotTuple and friends.  I expect some of them to return
abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
wrapped in Datum), depending on callers.  We might be able to cut down
on these later; my first cut will try to avoid API changes to keep
fallout to a minimum.

iii) All tuples need to be identifiable by ItemPointers.  Storages that
have different requirements will need careful additional thought across
the board.

iv) System catalogs cannot use pluggable storage.  We continue to use
heap_open etc in the DDL code, in order not to make this more invasive
than it already is.  We may lift this restriction later for specific
catalogs, as needed.

v) Currently, one Buffer may be associated with one HeapTuple living in a
slot; when the slot is cleared, the buffer pin is released.  My current
patch moves the buffer pin to inside the heapam-based storage AM and the
buffer is released by the ->slot_clear_tuple method.  The rationale for
doing this is that some storage AMs might want to keep several buffers
pinned at once, for example, and want to release those pins not
individually but in batches as the scan moves forward (say a batch of
tuples in a columnar storage AM has column values spread across many
buffers; they must all be kept pinned until the scan has moved past the
whole set of tuples).  But I'm not really sure that this is a great
design.
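
For heapam, that method could be as simple as this sketch:

    /* sketch: heapam's slot_clear_tuple drops the slot's buffer pin */
    static void
    heapam_slot_clear_tuple(TupleTableSlot *slot)
    {
        if (BufferIsValid(slot->tts_buffer))
        {
            ReleaseBuffer(slot->tts_buffer);
            slot->tts_buffer = InvalidBuffer;
        }
        slot->tts_tuple = NULL;
    }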


I welcome comments on these ideas.  My patch for this is nowhere near
completion yet; expect things to change for items that I've overlooked,
but I hope I didn't overlook any major ones.  If things are handwavy, it is
probably because I haven't fully figured them out yet.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable storage

Andres Freund
On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
> Thanks for taking a stab at this.  I'd like to throw out a few concerns.
>
> One, I'm worried that adding an additional layer of pointer-jumping is
> going to slow things down and make Andres' work to speed up the
> executor more difficult.  I don't know that there is a problem there,
> and if there is a problem I don't know what to do about it, but I
> think it's something we need to consider.

I'm quite concerned about that as well.


> I am somewhat inclined to
> believe that we need to restructure the executor in a bigger way so
> that it passes around datums instead of tuples; I'm inclined to
> believe that the current tuple-centric model is probably not optimal
> even for the existing storage format.

I actually prototyped that, and it's not an easy win so far.  Column
extraction cost, even after significant optimization, is still often a
significant portion of the runtime.  And, e.g., having projection extract
all columns only after evaluating a restrictive qual that refers to an
"early" column can be a significant win.  We'd definitely have to give up
on extracting columns 0..n when accessing later columns... Hm.

Greetings,

Andres Freund



Re: Pluggable storage

Anastasia Lubennikova
In reply to this post by Alvaro Herrera-9
13.08.2016 02:15, Alvaro Herrera:

> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.  Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.  This will allow us to write permanent table storage that works
> differently than heapam.c.  At this stage, I haven't thought through
> whether this is going to allow extensions to define new storage modules;
> I am focusing on AMs that can coexist with heapam in core.
>
> The design starts with a new type of row in pg_am, of type "s" (for "storage").
> The handler function returns a struct of node type StorageAmRoutine.  This
> contains functions for 1) scans (beginscan, getnext, endscan) 2) tuples
> (tuple_insert/update/delete/lock, as well as set_oid, get_xmin and the
> like), and operations on tuples that are part of slots (tuple_deform,
> materialize).
>
> To support this, we introduce StorageTuple and StorageScanDesc.
> StorageTuples represent a physical tuple coming from some storage AM.
> It is necessary to have a pointer to a StorageAmRoutine in order to
> manipulate the tuple.  For heapam.c, a StorageTuple is just a HeapTuple.

StorageTuples concept looks really cool. I've got some questions on
details of implementation.

Do StorageTuples have fields common to all implementations?
Or is StorageTuple a totally abstract structure that has nothing to do
with the data, except pointing to it?

I mean, we already have the HeapTupleData structure, which is a pretty
good candidate to turn into StorageTuple.
It's already widely used in the executor and moreover, it's the only structure
(except MinimalTuples and all those crazy optimizations) that works with
tuples, both extracted from a page and created on the fly in an executor node.

typedef struct HeapTupleData
{
     uint32        t_len;           /* length of *t_data */
     ItemPointerData t_self;        /* SelfItemPointer */
     Oid            t_tableOid;     /* table the tuple came from */
     HeapTupleHeader t_data;        /* -> tuple header and data */
} HeapTupleData;

We can simply change the type of t_data from HeapTupleHeader to Pointer,
and maybe add a "t_handler" field that points to the handler functions.
I'm not sure whether that should be the name of the StorageAm, its OID, or
maybe the handler function itself.  Although, if I'm not mistaken, we always
have the RelationData at hand when we want to operate on the tuple, so having
t_handler in the StorageTuple may be excessive.


typedef struct StorageTupleData
{
     uint32          t_len;         /* length of *t_data */
     ItemPointerData t_self;        /* SelfItemPointer */
     Oid             t_tableOid;    /* table the tuple came from */
     Pointer         t_data;        /* -> tuple header and data.
                                     * This field should never be accessed
                                     * directly, only via StorageAm handler
                                     * functions, because we don't know the
                                     * underlying data structure. */
     ???             t_handler;     /* StorageAm that knows what to do
                                     * with the tuple */
} StorageTupleData;

This approach allows us to minimize code changes and ensures that we
won't miss any function that handles tuples.

Do you see any weak points of the suggestion?
What design do you use in your prototype?

> RelationData gains ->rd_stamroutine which is a pointer to the
> StorageAmRoutine for the relation in question.  Similarly,
> TupleTableSlot is augmented with a link to the StorageAmRoutine to
> handle the StorageTuple it contains (probably in most cases it's set at
> the same time as the tupdesc).  This implies that routines such as
> ExecAssignScanType need to pass down the StorageAmRoutine from the
> relation to the slot.

If we already have this pointer in t_handler as described above,
we don't need to pass it between functions and slots.
> The executor is modified so that instead of calling heap_insert etc
> directly, it uses rel->rd_stamroutine to call these methods.  The
> executor is still in charge of dealing with indexes, constraints, and
> any other thing that's not the tuple storage itself (this is one major
> point in which this differs from FDWs).  This all looks simple enough,
> with one exception and a few notes:

That is exactly what I tried to describe in my proposal, in the chapter
"Relation management".  I'm sure you've already noticed that it will
require a huge source-code cleanup.  I've carefully read the sources and
found "violators" of the abstraction in src/backend/commands.  The list is
attached to the wiki page:
https://wiki.postgresql.org/wiki/HeapamRefactoring

Besides these, there are some pretty strange and unrelated functions in
src/backend/catalog.
I'm willing to fix them, but I'd like to synchronize our efforts.

> exception a) ExecMaterializeSlot needs special consideration.  This is
> used in two different ways: a1) is the stated "make tuple independent
> from any underlying storage" point, which is handled by
> ExecMaterializeSlot itself calling a method from the storage AM to
> do any byte copying as needed.  ExecMaterializeSlot no longer returns a
> HeapTuple, because there might not be any.  The second usage pattern a2)
> is to create a HeapTuple that's passed to other modules which only deal
> with HeapTuples and not slots (triggers are the main case I noticed, but I think
> there are others such as the executor itself wanting tuples as Datum for
> some reason).  For the moment I'm handling this by having a new
> ExecHeapifyTuple which creates a HeapTuple from a slot, regardless of
> the original tuple format.

Yes, triggers are a very special case. Thank you for the explanation.

That still fits well with the format I suggested.
Nothing special to do: just substitute t_data with the proper HeapTupleHeader
representation.  I think it's a job for the StorageAm.  Let's say each StorageAm
must have a stam_to_heaptuple() function and the opposite function,
stam_from_heaptuple().
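
For instance, a sketch of the conversion pair each StorageAm would provide
(names as suggested above; the signatures are invented for illustration):

    /* sketch: per-AM conversion between StorageTuple and HeapTuple */
    typedef HeapTuple (*stam_to_heaptuple_function) (StorageTuple stuple,
                                                     TupleDesc tupdesc);
    typedef StorageTuple (*stam_from_heaptuple_function) (HeapTuple htuple,
                                                          TupleDesc tupdesc);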

> note b) EvalPlanQual currently maintains an array of HeapTuple in
> EState->es_epqTuple.  I think it works to replace that with an array of
> StorageTuples; EvalPlanQualFetch needs to call the StorageAmRoutine
> methods in order to interact with it.  Other than those changes, it
> seems okay.
>
> note c) nodeSubplan has curTuple as a HeapTuple.  It seems simple
> to replace this with an independent slot-based tuple.
>
> note d) grp_firstTuple in nodeAgg / nodeSetOp.  These are less
> simple than the above, but replacing the HeapTuple with a slot-based
> tuple seems doable too.
>
> note e) nodeLockRows uses lr_curtuples to feed EvalPlanQual.
> TupleTableSlot also seems a good replacement.  This has fallout in other
> users of EvalPlanQual, too.
>
> note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
> format.  In the long run, it may be possible to replace them with a
> separate storage module that's specifically designed to handle tuples
> meant for tuplestores etc.  That may simplify TupleTableSlot and
> execTuples.  For the moment we keep the tts_mintuple as it is.  Whenever
> a tuple is not already in heap format, we heapify it in order to put it in
> the store.
I wonder, do we really need MinimalTuples to support all formats?

> The current heapam.c routines need some changes.  Currently, practice is
> that heap_insert, heap_multi_insert, heap_fetch, heap_update scribble on
> their input tuples to set the resulting ItemPointer in tuple->t_self.
> This is messy if we want StorageTuples to be abstract.  I'm changing
> this so that the resulting ItemPointer is returned in a separate output
> argument; the tuple itself is left alone.  This is somewhat messy in the
> case of heap_multi_insert because it returns several items; I think it's
> acceptable to return an array of ItemPointers in the same order as the
> input tuples.  This works fine for the only caller, which is COPY in
> batch mode.  For the other routines, they don't really care where the
> TID is returned AFAICS.
>
>
> Additional noteworthy items:
>
> i) Speculative insertion: the speculative insertion token is no longer
> installed directly in the heap tuple by the executor (of course).
> Instead, the token becomes part of the slot.  When the tuple_insert
> method is called, the insertion routine is in charge of setting the
> token from the slot into the storage tuple.  Executor is in charge of
> calling method->speculative_finish() / abort() once the insertion has
> been confirmed by the indexes.
>
> ii) execTuples has additional accessors for tuples-in-slot, such as
> ExecFetchSlotTuple and friends.  I expect some of them to return
> abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
> wrapped in Datum), depending on callers.  We might be able to cut down
> on these later; my first cut will try to avoid API changes to keep
> fallout to a minimum.
I'd suggest replacing all occurrences of HeapTuple with StorageTuple.
Do you see any problems with it?

> iii) All tuples need to be identifiable by ItemPointers.  Storages that
> have different requirements will need careful additional thought across
> the board.

For a start, we can simply deny secondary indexes for these storages,
or require a function that converts a tuple identifier inside the storage
into an ItemPointer suitable for an index.
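
That could be one more optional method in the storage handler, roughly
(a hypothetical signature, just to illustrate the idea):

    /* sketch: map a storage-internal tuple identifier to an index TID */
    typedef bool (*stam_locator_to_itemptr_function) (Relation rel,
                                                      Datum locator,
                                                      ItemPointer out_tid);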

> iv) System catalogs cannot use pluggable storage.  We continue to use
> heap_open etc in the DDL code, in order not to make this more invasive
> than it already is.  We may lift this restriction later for specific
> catalogs, as needed.
+1

>
> v) Currently, one Buffer may be associated with one HeapTuple living in a
> slot; when the slot is cleared, the buffer pin is released.  My current
> patch moves the buffer pin to inside the heapam-based storage AM and the
> buffer is released by the ->slot_clear_tuple method.  The rationale for
> doing this is that some storage AMs might want to keep several buffers
> pinned at once, for example, and want to release those pins not
> individually but in batches as the scan moves forward (say a batch of
> tuples in a columnar storage AM has column values spread across many
> buffers; they must all be kept pinned until the scan has moved past the
> whole set of tuples).  But I'm not really sure that this is a great
> design.

Frankly, I doubt that it's realistic to implement columnar storage just as
a variant of pluggable storage.  It requires a lot of changes in the executor
and optimizer and so on, which are hardly compatible with the existing
tuple-oriented model.  However, I'm not so strong in this area, so if you
feel that it's possible, go ahead.

> I welcome comments on these ideas.  My patch for this is nowhere near
> completion yet; expect things to change for items that I've overlooked,
> but I hope I didn't overlook any major ones.  If things are handwavy, it is
> probably because I haven't fully figured them out yet.

Thank you again for taking on this big project.
Looking forward to the prototype. I think it will make the discussion
more concrete and useful.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Pluggable storage

Alvaro Herrera-9
Anastasia Lubennikova wrote:
> 13.08.2016 02:15, Alvaro Herrera:

> >To support this, we introduce StorageTuple and StorageScanDesc.
> >StorageTuples represent a physical tuple coming from some storage AM.
> >It is necessary to have a pointer to a StorageAmRoutine in order to
> >manipulate the tuple.  For heapam.c, a StorageTuple is just a HeapTuple.
>
> StorageTuples concept looks really cool. I've got some questions on
> details of implementation.
>
> Do StorageTuples have fields common to all implementations?
> Or is StorageTuple a totally abstract structure that has nothing to do
> with the data, except pointing to it?
>
> I mean, we already have the HeapTupleData structure, which is a pretty
> good candidate to turn into StorageTuple.

I was planning to replace all uses of HeapTuple in the executor with
StorageTuple, actually.  But the main reason I would like to avoid
HeapTupleData itself is that it contains an assumption that there is a
single palloc chunk that contains the tuple (t_len and t_data).  This
might not be true in representations that split the tuple, for example
in columnar storage where you have one column in page A and another
column in page B, for the same tuple.  I suppose there might be some
point to keeping t_tableOid and t_self, though.

> And maybe add a "t_handler" field that points to the handler functions.
> I'm not sure whether that should be the name of the StorageAm, its OID, or
> maybe the handler function itself.  Although, if I'm not mistaken, we always
> have the RelationData at hand when we want to operate on the tuple, so having
> t_handler in the StorageTuple may be excessive.

Yeah, I think the RelationData (or more precisely the StorageAmRoutine)
is always going to be available, so I don't think we need a pointer in
the tuple itself.

> This approach allows us to minimize code changes and ensures that we
> won't miss any function that handles tuples.
>
> Do you see any weak points of the suggestion?
> What design do you use in your prototype?

It's currently a "void *" pointer in my prototype.

> >RelationData gains ->rd_stamroutine which is a pointer to the
> >StorageAmRoutine for the relation in question.  Similarly,
> >TupleTableSlot is augmented with a link to the StorageAmRoutine to
> >handle the StorageTuple it contains (probably in most cases it's set at
> >the same time as the tupdesc).  This implies that routines such as
> >ExecAssignScanType need to pass down the StorageAmRoutine from the
> >relation to the slot.
>
> If we already have this pointer in t_handler as described above,
> we don't need to pass it between functions and slots.

I think it's better to have it in slots, so you can install multiple
tuples in the slot without having to change the routine pointers each
time.

> >The executor is modified so that instead of calling heap_insert etc
> >directly, it uses rel->rd_stamroutine to call these methods.  The
> >executor is still in charge of dealing with indexes, constraints, and
> >any other thing that's not the tuple storage itself (this is one major
> >point in which this differs from FDWs).  This all looks simple enough,
> >with one exception and a few notes:
>
> That is exactly what I tried to describe in my proposal, in the chapter
> "Relation management".  I'm sure you've already noticed that it will
> require a huge source-code cleanup.  I've carefully read the sources and
> found "violators" of the abstraction in src/backend/commands.  The list is
> attached to the wiki page:
> https://wiki.postgresql.org/wiki/HeapamRefactoring
>
> Besides these, there are some pretty strange and unrelated functions in
> src/backend/catalog.
> I'm willing to fix them, but I'd like to synchronize our efforts.

I very much would like to stay away from touching src/backend/catalog,
which are the functions that deal with system catalogs.  We can simply
say that system catalogs are hardcoded to use heapam.c storage for now.
If we later see a need to enable some particular catalog using a
different storage implementation, we can change the code for that
specific catalog in src/backend/catalog and everywhere else, to use the
abstract API instead of hardcoding heap_insert etc.  But that can be
left for a second pass.  (This is my point "iv" further below, to which
you said "+1").


> Nothing special to do: just substitute t_data with the proper HeapTupleHeader
> representation.  I think it's a job for the StorageAm.  Let's say each StorageAm
> must have a stam_to_heaptuple() function and the opposite function,
> stam_from_heaptuple().

Hmm, yeah, that also works.  We'd have to check again whether it's more
convenient to start from a slot rather than a StorageTuple.  AFAICS the
trigger.c code is all starting from a slot, so it makes sense to have
the conversion use the slot code -- that way, there's no need for each
storageAM to re-implement conversion to HeapTuple.

> >note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
> >format.  In the long run, it may be possible to replace them with a
> >separate storage module that's specifically designed to handle tuples
> >meant for tuplestores etc.  That may simplify TupleTableSlot and
> >execTuples.  For the moment we keep the tts_mintuple as it is.  Whenever
> >a tuple is not already in heap format, we heapify it in order to put it in
> >the store.
> I wonder, do we really need MinimalTuples to support all formats?

Sure.  I wouldn't want to say "you can create a table in columnar storage
format, but if you do, these tables cannot use hash join".

> >ii) execTuples has additional accessors for tuples-in-slot, such as
> >ExecFetchSlotTuple and friends.  I expect some of them to return
> >abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
> >wrapped in Datum), depending on callers.  We might be able to cut down
> >on these later; my first cut will try to avoid API changes to keep
> >fallout to a minimum.
>
> I'd suggest replacing all occurrences of HeapTuple with StorageTuple.
> Do you see any problems with it?

The HeapTuple-in-datum representation, as I recall, is used in the SQL
function manager; maybe other places too.  Maybe there's a way to fix
that layer so that it uses StorageTuple instead, but I prefer not to
touch it in the first phase.  We can fix it later.  This is already a
big enough patch ...

> >iii) All tuples need to be identifiable by ItemPointers.  Storages that
> >have different requirements will need careful additional thought across
> >the board.
>
> For a start, we can simply deny secondary indexes for these storages,
> or require a function that converts a tuple identifier inside the storage
> into an ItemPointer suitable for an index.

Umm.  I don't think rejecting secondary indexes would work very well.  I
think we can lift this limitation later; we just need to change the
IndexTuple abstraction so that it doesn't rely on ItemPointer as it
currently does.

> >v) Currently, one Buffer may be associated with one HeapTuple living in a
> >slot; when the slot is cleared, the buffer pin is released.  My current
> >patch moves the buffer pin to inside the heapam-based storage AM and the
> >buffer is released by the ->slot_clear_tuple method.  The rationale for
> >doing this is that some storage AMs might want to keep several buffers
> >pinned at once, for example, and want to release those pins not
> >individually but in batches as the scan moves forward (say a batch of
> >tuples in a columnar storage AM has column values spread across many
> >buffers; they must all be kept pinned until the scan has moved past the
> >whole set of tuples).  But I'm not really sure that this is a great
> >design.
>
> Frankly, I doubt that it's realistic to implement columnar storage just as
> a variant of pluggable storage.  It requires a lot of changes in the executor
> and optimizer and so on, which are hardly compatible with the existing
> tuple-oriented model.  However, I'm not so strong in this area, so if you
> feel that it's possible, go ahead.

Well, not *just* as a variant of pluggable storage.  This thread is just
one sub-project inside the greater project to enable column-oriented
storage; that includes further changes to executor, too, but I haven't
discussed those in this proposal.  I mentioned all this at the Brussels
developer meeting earlier this year.  (There I mostly talked about
vertical partitioning, which is a different subproject that I've put
aside for the moment, but really it's all part of the same thing.)
https://wiki.postgresql.org/wiki/Future_of_storage

Thanks for reading!

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable storage

Simon Riggs
In reply to this post by Andres Freund
On 16 August 2016 at 19:46, Andres Freund <[hidden email]> wrote:

> On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
>> Thanks for taking a stab at this.  I'd like to throw out a few concerns.
>>
>> One, I'm worried that adding an additional layer of pointer-jumping is
>> going to slow things down and make Andres' work to speed up the
>> executor more difficult.  I don't know that there is a problem there,
>> and if there is a problem I don't know what to do about it, but I
>> think it's something we need to consider.
>
> I'm quite concerned about that as well.

This objection would apply to all other proposals as well: FDWs etc.

Do you see some way to add flexibility yet without adding a branch
point in the code?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable storage

Alexander Korotkov-3
On Thu, Aug 18, 2016 at 10:58 AM, Simon Riggs <[hidden email]> wrote:
> On 16 August 2016 at 19:46, Andres Freund <[hidden email]> wrote:
>> On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
>>> Thanks for taking a stab at this.  I'd like to throw out a few concerns.
>>>
>>> One, I'm worried that adding an additional layer of pointer-jumping is
>>> going to slow things down and make Andres' work to speed up the
>>> executor more difficult.  I don't know that there is a problem there,
>>> and if there is a problem I don't know what to do about it, but I
>>> think it's something we need to consider.
>>
>> I'm quite concerned about that as well.
>
> This objection would apply to all other proposals as well: FDWs etc.
>
> Do you see some way to add flexibility yet without adding a branch
> point in the code?

It's impossible without a branch point in the code.  The question is where this branch should be located.
In particular, we can put this branch point into the planner by defining distinct executor nodes for each pluggable storage.  In that case, each storage would have its own optimized executor nodes.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Pluggable storage

akapila
In reply to this post by Alvaro Herrera-9
On Wed, Aug 17, 2016 at 10:33 PM, Alvaro Herrera
<[hidden email]> wrote:

> Anastasia Lubennikova wrote:
>>
>> Except these, there are some pretty strange and unrelated functions in
>> src/backend/catalog.
>> I'm willing to fix them, but I'd like to synchronize our efforts.
>
> I very much would like to stay away from touching src/backend/catalog,
> which are the functions that deal with system catalogs.  We can simply
> say that system catalogs are hardcoded to use heapam.c storage for now.
>

Does this mean that if any storage needs to access system catalog
information, they need to be aware of HeapTuple and other required
stuff like syscache?  Again, if they need to update some stats or
something like that, they need to be aware of heap tuple format.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Pluggable storage

Ants Aasma
In reply to this post by Alvaro Herrera-9
On Tue, Aug 16, 2016 at 9:46 PM, Andres Freund <[hidden email]> wrote:

> On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
>> I am somewhat inclined to
>> believe that we need to restructure the executor in a bigger way so
>> that it passes around datums instead of tuples; I'm inclined to
>> believe that the current tuple-centric model is probably not optimal
>> even for the existing storage format.
>
> I actually prototyped that, and it's not an easy win so far.  Column
> extraction cost, even after significant optimization, is still often a
> significant portion of the runtime.  And, e.g., having projection extract
> all columns only after evaluating a restrictive qual that refers to an
> "early" column can be a significant win.  We'd definitely have to give up
> on extracting columns 0..n when accessing later columns... Hm.

What about going even further than [1] in converting the executor to
being opcode-based and merging projection and qual evaluation into a
single pass?  The optimizer would then have some leeway about how to order
column extraction and qual evaluation.  It might even be worth it to
special-case some functions as separate opcodes (e.g. int4eq,
timestamp_lt).

Regards,
Ants Aasma

[1] https://www.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c@...



Re: Pluggable storage

Andres Freund


On August 18, 2016 7:44:50 AM PDT, Ants Aasma <[hidden email]> wrote:

>On Tue, Aug 16, 2016 at 9:46 PM, Andres Freund <[hidden email]> wrote:
>> On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
>>> I am somewhat inclined to
>>> believe that we need to restructure the executor in a bigger way so
>>> that it passes around datums instead of tuples; I'm inclined to
>>> believe that the current tuple-centric model is probably not optimal
>>> even for the existing storage format.
>>
>> I actually prototyped that, and it's not an easy win so far.  Column
>> extraction cost, even after significant optimization, is still often a
>> significant portion of the runtime.  And, e.g., having projection extract
>> all columns only after evaluating a restrictive qual that refers to an
>> "early" column can be a significant win.  We'd definitely have to give up
>> on extracting columns 0..n when accessing later columns... Hm.
>
>What about going even further than [1] in converting the executor to
>being opcode-based and merging projection and qual evaluation into a
>single pass?  The optimizer would then have some leeway about how to order
>column extraction and qual evaluation.  It might even be worth it to
>special-case some functions as separate opcodes (e.g. int4eq,
>timestamp_lt).
>
>Regards,
>Ants Aasma
>
>[1]
>https://www.postgresql.org/message-id/20160714011850.bd5zhu35szle3n3c@...

Good question.  I think I have a reasonable answer, but let's discuss that in the other thread.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Pluggable storage

Andres Freund
In reply to this post by Simon Riggs
On 2016-08-18 08:58:11 +0100, Simon Riggs wrote:

> On 16 August 2016 at 19:46, Andres Freund <[hidden email]> wrote:
> > On 2016-08-15 12:02:18 -0400, Robert Haas wrote:
> >> Thanks for taking a stab at this.  I'd like to throw out a few concerns.
> >>
> >> One, I'm worried that adding an additional layer of pointer-jumping is
> >> going to slow things down and make Andres' work to speed up the
> >> executor more difficult.  I don't know that there is a problem there,
> >> and if there is a problem I don't know what to do about it, but I
> >> think it's something we need to consider.
> >
> > I'm quite concerned about that as well.
>
> This objection would apply to all other proposals as well: FDWs etc.

Not really.  The place you draw the boundary significantly influences
where and how much of a price you pay.  Having another indirection
inside HeapTuple, which is accessed in many, many places, is something
different from having a seqscan equivalent that returns you a batch of
already-deformed tuples in array form.  In the latter case there's one
additional indirection per batch of tuples; in the former there are many
for each tuple.
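
Purely to illustrate the batch idea (this is not a proposal, and every
name here is invented):

    /* illustrative: a seqscan equivalent that returns deformed batches */
    typedef struct TupleBatch
    {
        int     ntuples;    /* number of tuples in this batch */
        int     natts;      /* number of extracted columns */
        Datum **values;     /* values[att][row] */
        bool  **isnull;     /* isnull[att][row] */
    } TupleBatch;

    /* one indirect call per batch, rather than one per tuple */
    typedef TupleBatch *(*scan_getnextbatch_function) (StorageScanDesc scan);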


> Do you see some way to add flexibility yet without adding a branch
> point in the code?

I'm not even saying that the approach of doing the indirection inside
the HeapTuple replacement is a no-go, just that it concerns me.  I do
think that working on only lugging around values/isnull arrays is
something that I could see working better, if some problems are
addressed beforehand.

Greetings,

Andres Freund



Re: Pluggable storage

Alvaro Herrera-9
In reply to this post by Alvaro Herrera-9
Alvaro Herrera wrote:
> Many have expressed their interest in this topic, but I haven't seen any
> design of how it should work.  Here's my attempt; I've been playing with
> this for some time now and I think what I propose here is a good initial
> plan.

I regret to announce that I'll have to stay away from this topic for a
little while, as I have another project taking priority.  I expect to
return to this shortly thereafter, hopefully in time to get it done for
pg10.

If anyone is interested in helping with the (currently not compilable)
patch I have, please mail me privately and we can discuss.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable storage

Alvaro Herrera-9
I have sent the partial patch I have to Hari Babu Kommi.  We expect that
he will be able to further this goal some more.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable storage

Haribabu Kommi-2


On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera <[hidden email]> wrote:
> I have sent the partial patch I have to Hari Babu Kommi.  We expect that
> he will be able to further this goal some more.

Thanks Alvaro for sharing your development patch.

Most of the patch design is the same as described by Alvaro in the first
mail [1].  I will detail the modifications, pending items and open items
(needing discussion) required to implement proper pluggable storage.

Here I have attached WIP patches to support pluggable storage.  The patches
in the series may not work individually; many things are still under
development.  These patches are just to share the approach of the current
development.

Some notable changes that I made to get the patch working:

1. Added the storage AM handler to the slot, because the relation is not
readily available in all the places where it is needed.
2. Retained the minimal tuple in the slot, as it is used in hash join.
As in the first version, I feel it is fine to allow creating
HeapTuple-format data.

Thanks everyone for sharing their ideas in the developer's unconference at
PGCon Ottawa.

Pending items:

1. Replacement of Tuple with slot in Trigger functionality
2. Replacement of Tuple with Slot from storage handler functions.
3. Remove/minimize the use of HeapTuple as a Datum.
4. Replace all references of HeapScanDesc with StorageScanDesc
5. Planner changes to consider the relation storage during the planning.
6. Any planner changes based on the discussion of open items?
7. Some executor changes to consider the storage advantages?

Open Items:

1. BitmapHeapScan and TableSampleScan are tightly coupled with
HeapTuple and HeapScanDesc, so these scans operate directly
on those structures to provide their results.

These scan types may not be applicable to different storage formats.
So how should we handle them?

Currently my goal is to provide the basic infrastructure of pluggable
storage as a first step, and later improve it further to gain performance
by taking advantage of the storage.

Regards,
Hari Babu
Fujitsu Australia



Attachments:
0010-compilation-fixes.patch (1K)
0001-Create-Access-method-change-to-include-storage-handl.patch (9K)
0002-Regression-suite-update-according-to-the-new-access-.patch (3K)
0003-Storage-access-method-API-functions.patch (18K)
0004-Heap-storage-handler.patch (50K)
0005-Adding-storageam-handler-to-slot.patch (31K)
0006-HeapTuple-replace-with-StorageTuple.patch (114K)
0007-Replace-slot-functions-with-storage-access-methods.patch (31K)
0008-Replace-heap_-functions-with-storage-access-methods.patch (87K)
0009-Remaining-heap_insert-calls-repalce.patch (3K)

Re: Pluggable storage

Robert Haas
On Mon, Jun 12, 2017 at 9:50 PM, Haribabu Kommi
<[hidden email]> wrote:
> Open Items:
>
> 1. BitmapHeapScan and TableSampleScan are tightly coupled with
> HeapTuple and HeapScanDesc, so these scans operate directly
> on those structures to provide their results.
>
> These scan types may not be applicable to different storage formats.
> So how should we handle them?

I think that BitmapHeapScan, at least, is applicable to any table AM
that has TIDs.   It seems to me that in general we can imagine three
kinds of table AMs:

1. Table AMs where a tuple can be efficiently located by a real TID.
By a real TID, I mean that the block number part is really a block
number and the item ID is really a location within the block (see the
ItemPointerData sketch at the end of this mail).  These
are necessarily quite similar to our current heap, but they can change
the tuple format and page format to some degree, and it seems like in
many cases it should be possible to plug them into our existing index
AMs without too much heartache.  Both index scans and bitmap index
scans ought to work.

2. Table AMs where a tuple has some other kind of locator.  For
example, imagine an index-organized table where the locator is the
primary key, which is a bit like what Alvaro had in mind for indirect
indexes.  If the locator is 6 bytes or less, it could potentially be
jammed into a TID, but I don't think that's a great idea.  For things
like int8 or numeric, it won't work at all.  Even for other things,
it's going to cause problems because the bit patterns won't be what
the code is expecting; e.g. bitmap scans care about the structure of
the TID, not just how many bits it is.  (Due credit: Somebody, maybe
Alvaro, pointed out this problem before, at PGCon.)  For these kinds
of tables, larger modifications to the index AMs are likely to be
necessary, at least if we want a really general solution, or maybe we
should have separate index AMs - e.g. btree for traditional TID-based
heaps, and generic_btree or indirect_btree or key_btree or whatever
for heaps with some other kind of locator.  It's not too hard to see
how to make index scans work with this sort of structure but it's very
unclear to me whether, or how, bitmap scans can be made to work.

3. Table AMs where a tuple doesn't really have a locator at all.  In
these cases, we can't support any sort of index AM at all.  When the
table is queried, there's really nothing the core system can do except
ask the table AM for a full scan, supply the quals, and hope the table
AM has some sort of smarts that enable it to optimize somehow.  For
example, you can imagine converting cstore_fdw into a table AM of this
sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
chunks to be proven uninteresting and skipped.  (You could use chunk
number + offset to turn this into a table AM of the previous type if
you wanted to support secondary indexes; not sure if that'd be useful,
but it'd certainly be harder.)

I'm more interested in #1 than in #3, and more interested in #3 than
#2, but other people may have different priorities.
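
For reference, the "real TID" of case #1 is the existing ItemPointerData
layout (this one is actual PostgreSQL code, from storage/itemptr.h):

    typedef struct ItemPointerData
    {
        BlockIdData     ip_blkid;   /* block number, as two uint16 halves */
        OffsetNumber    ip_posid;   /* item slot within that block */
    } ItemPointerData;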

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Pluggable storage

Michael Paquier
On Thu, Jun 22, 2017 at 4:47 AM, Robert Haas <[hidden email]> wrote:

> I think that BitmapHeapScan, at least, is applicable to any table AM
> that has TIDs.   It seems to me that in general we can imagine three
> kinds of table AMs:
>
> 1. Table AMs where a tuple can be efficiently located by a real TID.
> By a real TID, I mean that the block number part is really a block
> number and the item ID is really a location within the block.  These
> are necessarily quite similar to our current heap, but they can change
> the tuple format and page format to some degree, and it seems like in
> many cases it should be possible to plug them into our existing index
> AMs without too much heartache.  Both index scans and bitmap index
> scans ought to work.
>
> 2. Table AMs where a tuple has some other kind of locator.  For
> example, imagine an index-organized table where the locator is the
> primary key, which is a bit like what Alvaro had in mind for indirect
> indexes.  If the locator is 6 bytes or less, it could potentially be
> jammed into a TID, but I don't think that's a great idea.  For things
> like int8 or numeric, it won't work at all.  Even for other things,
> it's going to cause problems because the bit patterns won't be what
> the code is expecting; e.g. bitmap scans care about the structure of
> the TID, not just how many bits it is.  (Due credit: Somebody, maybe
> Alvaro, pointed out this problem before, at PGCon.)  For these kinds
> of tables, larger modifications to the index AMs are likely to be
> necessary, at least if we want a really general solution, or maybe we
> should have separate index AMs - e.g. btree for traditional TID-based
> heaps, and generic_btree or indirect_btree or key_btree or whatever
> for heaps with some other kind of locator.  It's not too hard to see
> how to make index scans work with this sort of structure but it's very
> unclear to me whether, or how, bitmap scans can be made to work.
>
> 3. Table AMs where a tuple doesn't really have a locator at all.  In
> these cases, we can't support any sort of index AM at all.  When the
> table is queried, there's really nothing the core system can do except
> ask the table AM for a full scan, supply the quals, and hope the table
> AM has some sort of smarts that enable it to optimize somehow.  For
> example, you can imagine converting cstore_fdw into a table AM of this
> sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
> chunks to be proven uninteresting and skipped.  (You could use chunk
> number + offset to turn this into a table AM of the previous type if
> you wanted to support secondary indexes; not sure if that'd be useful,
> but it'd certainly be harder.)
>
> I'm more interested in #1 than in #3, and more interested in #3 than
> #2, but other people may have different priorities.

Putting that in a couple of words:
1. Table AM with a 6-byte TID.
2. Table AM with a custom locator format, which could be TID-like.
3. Table AM with no locators.

Getting #1 to work first would already be really useful for users.  My
take on the matter is that being able to plug in-core index AMs directly
into a type-1 table AM is more useful in the long term, as multiple table
AMs can then use the same kind of index AM, provided it is designed nicely
enough.  The index AM logic then does not need to be duplicated across
multiple table AMs.  #3 implies that the index AM logic is implemented in
the table AM.  Not saying that it is not useful, but it does not feel
natural to have the planner request a sequential scan, just to have the
table AM secretly do some kind of index/skipping scan.
--
Michael



Re: Pluggable storage

Amit Langote-2
On 2017/06/22 10:01, Michael Paquier wrote:
> #3 implies that the index AM logic is implemented in the table
> AM. Not saying that it is not useful, but it does not feel natural to
> have the planner request a sequential scan, just to have the table
> AM secretly do some kind of index/skipping scan.

I had read a relevant comment on a pluggable storage thread awhile back
[1].  In short, the comment was that the planner should be able to get
some intelligence, via some API, from the heap storage implementation
about the latter's access cost characteristics.  The storage should
provide accurate-enough cost information to the planner when such a
request is made by, say, cost_seqscan(), so that the planner can make an
appropriate choice.  If two tables containing the same number of rows (and
of the same size in bytes, perhaps) use different storage implementations,
then, the planner's cost parameters remaining the same, cost_seqscan() will
end up calculating different costs for the two tables.  Perhaps SeqScan
would be chosen for one table but not the other based on that.
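
Concretely, that might be a handler callback shaped something like the
following (a hypothetical name and signature, purely to illustrate):

    /* hypothetical: let the storage AM report its own scan cost inputs */
    typedef void (*stam_estimate_costs_function) (Relation rel,
                                                  BlockNumber *pages,
                                                  double *tuples,
                                                  Cost *startup_cost,
                                                  Cost *total_cost);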

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoY3LXVUPQVdZW70XKp5PsXffO82pXXt%3DbeegcV%2B%3DRsQgg%40mail.gmail.com




Re: Pluggable storage

Michael Paquier
On Thu, Jun 22, 2017 at 11:12 AM, Amit Langote
<[hidden email]> wrote:

> On 2017/06/22 10:01, Michael Paquier wrote:
>> #3 implies that the index AM logic is implemented in the table
>> AM. Not saying that it is not useful, but it does not feel natural to
>> have the planner request a sequential scan, just to have the table
>> AM secretly do some kind of index/skipping scan.
>
> I had read a relevant comment on a pluggable storage thread awhile back
> [1].  In short, the comment was that the planner should be able to get
> some intelligence, via some API, from the heap storage implementation
> about the latter's access cost characteristics.  The storage should
> provide accurate-enough cost information to the planner when such a
> request is made by, say, cost_seqscan(), so that the planner can make
> appropriate choice.  If two tables containing the same number of rows (and
> the same size in bytes, perhaps) use different storage implementations,
>> then, the planner's cost parameters remaining the same, cost_seqscan() will
>> end up calculating different costs for the two tables.  Perhaps SeqScan
>> would be chosen for one table but not the other based on that.

Yeah, I agree that the costing part needs some clear attention and
thought, and the gains are absolutely huge with the correct
interface.  That could be done in a later step though.
--
Michael



Re: Pluggable storage

Alexander Korotkov-3
In reply to this post by Haribabu Kommi-2
On Tue, Jun 13, 2017 at 4:50 AM, Haribabu Kommi <[hidden email]> wrote:
> On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera <[hidden email]> wrote:
>> I have sent the partial patch I have to Hari Babu Kommi.  We expect that
>> he will be able to further this goal some more.
>
> Thanks Alvaro for sharing your development patch.
>
> Most of the patch design is the same as described by Alvaro in the first
> mail [1].  I will detail the modifications, pending items and open items
> (needing discussion) required to implement proper pluggable storage.
>
> Here I have attached WIP patches to support pluggable storage.  The
> patches in the series may not work individually; many things are still
> under development.  These patches are just to share the approach of the
> current development.
>
> Some notable changes that I made to get the patch working:
>
> 1. Added the storage AM handler to the slot, because the relation is not
> readily available in all the places where it is needed.
> 2. Retained the minimal tuple in the slot, as it is used in hash join.
> As in the first version, I feel it is fine to allow creating
> HeapTuple-format data.
>
> Thanks everyone for sharing their ideas in the developer's unconference at
> PGCon Ottawa.
>
> Pending items:
>
> 1. Replacement of Tuple with slot in Trigger functionality
> 2. Replacement of Tuple with Slot from storage handler functions.
> 3. Remove/minimize the use of HeapTuple as a Datum.
> 4. Replace all references of HeapScanDesc with StorageScanDesc
> 5. Planner changes to consider the relation storage during the planning.
> 6. Any planner changes based on the discussion of open items?
> 7. Some executor changes to consider the storage advantages?
>
> Open Items:
>
> 1. BitmapHeapScan and TableSampleScan are tightly coupled with
> HeapTuple and HeapScanDesc, so these scans operate directly
> on those structures to provide their results.

What about vacuum?  I see vacuum is untouched in the patchset, and it is not mentioned in this discussion.
Do you plan to override low-level functions like heap_page_prune(), lazy_vacuum_page() etc., but preserve the high-level logic of vacuum?
Or do you plan to let pluggable storage implement its own high-level vacuum algorithm?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Pluggable storage

Alexander Korotkov-3
In reply to this post by Michael Paquier
On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier <[hidden email]> wrote:
> Putting that in a couple of words:
> 1. Table AM with a 6-byte TID.
> 2. Table AM with a custom locator format, which could be TID-like.
> 3. Table AM with no locators.
>
> Getting #1 to work first would already be really useful for users.

What exactly would be useful for *users*?  Any kind of API is in itself completely useless for users, because they are users, not developers.  A storage API could be useful for developers to implement storage AMs, which in turn could be useful for users.  Then while saying that #1 is useful for users, it would be nice to keep in mind the particular storage AMs that can be implemented using #1.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Pluggable storage

Robert Haas
On Thu, Jun 22, 2017 at 8:32 AM, Alexander Korotkov
<[hidden email]> wrote:

> On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier <[hidden email]>
> wrote:
>> Putting that in a couple of words.
>> 1. Table AM with a 6-byte TID.
>> 2. Table AM with a custom locator format, which could be TID-like.
>> 3. Table AM with no locators.
>>
>> Getting #1 to work first would already be really useful for users.
>
> What exactly would be useful for *users*?  Any kind of API is in itself
> completely useless for users, because they are users, not developers.
> A storage API could be useful for developers to implement storage AMs,
> which in turn could be useful for users.

What's your point?  I assume that is what Michael meant.

> Then while saying that #1 is useful for users, it would be nice to keep
> in mind the particular storage AMs that can be implemented using #1.

I don't think anybody's arguing with that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

