Avoiding io penalty when updating large objects

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Avoiding io penalty when updating large objects

Mark Dilger
I would like to write a postgres extension type which represents a btree of data
and allows me to access and modify elements within that logical btree.  Assume
the type is named btree_extension, and I have the table:

CREATE TABLE example (
        a   TEXT,
        b   TEXT,
        c   BTREE_EXTENSION,
        UNIQUE(a,b)
);

If, for a given row, the value of c is, say, approximately 2^30 bytes large,
then I would expect it to be divided up into 8K chunks in an external table, and
I should be able to fetch individual chunks of that object (by offset) rather
than having to detoast the whole thing.

But what if I want to update a single chunk, or only a couple chunks?  How can I
go about loading chunks, modifying them, and writing them back to disk, without
incurring the overhead of writing 2^30 bytes back out to disk?  And if I can do
this in a hand coded c function, what does the corresponding SQL statement look
like to call the function?  Is it an update statement?

Also, is it possible that only the rows in the *external* table get marked as
updated during my transaction, or will the row in the "example" table be marked
as updated?

I expect this is not possible, but it would be really great if it were, and I
haven't found a definitive "No, you can't do this" in the documentation yet.
The idea is to store the first and second level entries of a tree directly in
columns "a" and "b", but then to store arbitrarily deep children in a btree type
stored in column "c".  It doesn't make sense to have a really wide table to
represent the tree for multiple reasons, mostly involving data duplication in
the leftward columns but also because you can't know ahead of time how wide to
make the table.

I look forward to any useful responses.

Thanks,

Mark Dilger

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Avoiding io penalty when updating large objects

Alvaro Herrera
On Tue, Jun 28, 2005 at 07:38:43PM -0700, Mark Dilger wrote:

> I would like to write a postgres extension type which represents a btree of
> data and allows me to access and modify elements within that logical btree.
> Assume the type is named btree_extension, and I have the table:
>
> CREATE TABLE example (
> a   TEXT,
> b   TEXT,
> c   BTREE_EXTENSION,
> UNIQUE(a,b)
> );
>
> If, for a given row, the value of c is, say, approximately 2^30 bytes
> large, then I would expect it to be divided up into 8K chunks in an
> external table, and I should be able to fetch individual chunks of that
> object (by offset) rather than having to detoast the whole thing.

I don't think you can do this with the TOAST mechanism.  The problem is
that there's no API which allows you to operate on only certain chunks
of data.  You can do it with large objects though -- those you create
with lo_creat().  You can do lo_seek(), lo_read() and lo_write() as you
see fit.  Of course, this allows you to change the LO by chunks.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [hidden email])
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Avoiding io penalty when updating large objects

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On Tue, Jun 28, 2005 at 07:38:43PM -0700, Mark Dilger wrote:
>> If, for a given row, the value of c is, say, approximately 2^30 bytes
>> large, then I would expect it to be divided up into 8K chunks in an
>> external table, and I should be able to fetch individual chunks of that
>> object (by offset) rather than having to detoast the whole thing.

> I don't think you can do this with the TOAST mechanism.  The problem is
> that there's no API which allows you to operate on only certain chunks
> of data.

There is the ability to fetch chunks of a toasted value (if it was
stored out-of-line but not compressed).  There is no ability at the
moment to update it by chunks.  If Mark needs the latter then large
objects are probably the best bet.

I'm not sure what it'd take to support chunkwise update of toasted
fields.  Jan, any thoughts?

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Avoiding io penalty when updating large objects

Mark Dilger
Tom Lane wrote:

> Alvaro Herrera <[hidden email]> writes:
>
>>On Tue, Jun 28, 2005 at 07:38:43PM -0700, Mark Dilger wrote:
>>
>>>If, for a given row, the value of c is, say, approximately 2^30 bytes
>>>large, then I would expect it to be divided up into 8K chunks in an
>>>external table, and I should be able to fetch individual chunks of that
>>>object (by offset) rather than having to detoast the whole thing.
>
>
>>I don't think you can do this with the TOAST mechanism.  The problem is
>>that there's no API which allows you to operate on only certain chunks
>>of data.
>
>
> There is the ability to fetch chunks of a toasted value (if it was
> stored out-of-line but not compressed).  There is no ability at the
> moment to update it by chunks.  If Mark needs the latter then large
> objects are probably the best bet.
>
> I'm not sure what it'd take to support chunkwise update of toasted
> fields.  Jan, any thoughts?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to [hidden email] so that your
>        message can get through to the mailing list cleanly

Ok,

If there appears to be a sane path to implementing this, I may be able to
contribute engineering effort to it.  (I manage a group of engineers and could
spare perhaps half a man year towards this.)  But I would like direction as to
how you all think this should be done, or whether it is just a bad idea.

I can also go with the large object approach.  I'll look into that.

Mark Dilger

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org