Best way to keep track of a sliced TOAST


Best way to keep track of a sliced TOAST

Bruno Hass
Hi,

I've been reading about TOASTing and would like to modify how slicing works by taking into consideration the type of the varlena field. These changes would support future implementations of type-specific optimized TOASTing functions. The first step would be to add information to the TOAST so we know whether it is sliced and by which function it was sliced and TOASTed. This information should not break the current on-disk format of TOASTs. I had the idea of putting this information in the varattrib struct's va_header, perhaps adding more bit layouts to represent sliced TOASTs. This idea, however, was pointed out to me as a rather naive approach. What would be the best way to do this?

Bruno Hass

Re: Best way to keep track of a sliced TOAST

Robert Haas
On Mon, Mar 11, 2019 at 9:27 AM Bruno Hass <[hidden email]> wrote:
> I've been reading about TOASTing and would like to modify how the slicing works by taking into consideration the type of the varlena field. These changes would support future implementations of type specific optimized TOAST'ing functions. The first step would be to add information to the TOAST so we know if it is sliced or not and by which function it was sliced and TOASTed. This information should not break the current on disk format of TOASTs. I had the idea of putting this information on the varattrib struct va_header, perhaps adding more bit layouts to represent sliced TOASTs. This idea, however, was pointed to me to be a rather naive approach. What would be the best way to do this?

Well, you can't really use va_header, because every possible bit
pattern for va_header means something already.  The first byte tells
us what kind of varlena we have:

 * Bit layouts for varlena headers on big-endian machines:
 *
 * 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
 * 01xxxxxx 4-byte length word, aligned, *compressed* data (up to 1G)
 * 10000000 1-byte length word, unaligned, TOAST pointer
 * 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 126b)
 *
 * Bit layouts for varlena headers on little-endian machines:
 *
 * xxxxxx00 4-byte length word, aligned, uncompressed data (up to 1G)
 * xxxxxx10 4-byte length word, aligned, *compressed* data (up to 1G)
 * 00000001 1-byte length word, unaligned, TOAST pointer
 * xxxxxxx1 1-byte length word, unaligned, uncompressed data (up to 126b)

All of the bits other than the ones that tell us what kind of varlena
we've got are part of the length word itself; you couldn't use any bit
pattern for some other purpose without breaking on-disk compatibility
with existing releases.  What you could possibly do is add a new
possible value of vartag_external, which tells us what "kind" of
toasted datum we've got.  Currently, toasted datums stored on disk are
always type 18, but there's no reason that I know of why we couldn't
have more than one possibility there.

However, I think you might want to discuss on this mailing list a bit
more about what you are hoping to achieve before you do too much
development, at least if you aspire to get something committed.  A
project like the one you are proposing sounds like something not for
the faint of heart, and it's not really clear what benefits you
anticipate.  I think there has been previous discussion of this topic
at least for jsonb, so you might also want to search the archives for
those discussions.  I wouldn't go so far as to say that this idea
can't work or wouldn't have any value, but it does seem like the kind
of thing where you could spend a lot of time going down a dead end,
and discussion on the list might help you avoid some of those dead
ends.

It seems to me that making this overly pluggable is likely to be a net
negative, because there probably aren't really that many different
ways of doing this that are useful, and because having to store more
identifying information will make the toasted datum larger.  One idea
is to let the datatype divide the datum up into variable-sized chunks
and then have the on-disk format store a list of chunk lengths in
chunk 0 (and following, if there are lots of chunks?) followed by the
chunks themselves.  The data would all go into the TOAST table as it
does today, and the TOASTed data could be read without knowing
anything about the data type.  However, code that knows how the data
was chunked at TOAST time could try to speed things up by operating
directly on the compressed data if it can figure out which chunk it
needs without fetching everything.

But that is just an idea, and it might turn out to suck.

Nice name, by the way, if an inferior spelling.  :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Best way to keep track of a sliced TOAST

Bruno Hass
> It seems to me that making this overly pluggable is likely to be a net
> negative, because there probably aren't really that many different
> ways of doing this that are useful, and because having to store more
> identifying information will make the toasted datum larger.  One idea
> is to let the datatype divide the datum up into variable-sized chunks
> and then have the on-disk format store a list of chunk lengths in
> chunk 0 (and following, if there are lots of chunks?) followed by the
> chunks themselves.  The data would all go into the TOAST table as it
> does today, and the TOASTed data could be read without knowing
> anything about the data type.  However, code that knows how the data
> was chunked at TOAST time could try to speed things up by operating
> directly on the compressed data if it can figure out which chunk it
> needs without fetching everything.

This idea is what I was hoping to achieve. Would we be able to optimize deTOASTing just by storing the chunk lengths in chunk 0? Also, wouldn't it break existing functions by dedicating a whole chunk (possibly more) to such metadata?


From: Robert Haas <[hidden email]>
Sent: Tuesday, March 12, 2019 2:34 PM
To: Bruno Hass
Cc: pgsql-hackers
Subject: Re: Best way to keep track of a sliced TOAST
 

Re: Best way to keep track of a sliced TOAST

Robert Haas
On Fri, Mar 15, 2019 at 7:37 AM Bruno Hass <[hidden email]> wrote:
> This idea is what I was hoping to achieve. Would we be able to make optimizations on deTOASTing  just by storing the chunk lengths in chunk 0?

I don't know. I guess we could also NOT store the chunk lengths and
just say that if you don't know which chunk you want by chunk number,
your only other alternative is to read the chunks in order.  The
problem with that is that you can no longer index by byte position
without fetching every chunk prior to that byte position, but maybe
that's not important enough to justify the overhead of a list of chunk
lengths.  Or maybe it depends on what you want to do with it.

Again, stuff like what you are suggesting here has been suggested
before.  I think the problem is if someone did the work to invent such
an infrastructure, that wouldn't actually do anything by itself.  We'd
then need to find an application of it where it brought us some clear
advantage.  As I said in my previous email, jsonb seems like a
promising candidate, but I don't think it's a slam dunk.  What would
the design look like, exactly?  Which operations would get faster, and
could we really make them work?  The existing format is, I think,
designed with a byte-oriented format in mind, and a chunk-oriented
format might have different design constraints.  It seems like an idea
with potential, but there's a lot of daylight between a directional
idea with potential and a specific idea accompanied by a high-quality
implementation thereof.

> Also, wouldn't it break existing functions by dedicating a whole chunk (possibly more) to such metadata?

Anybody writing such a patch would have to be prepared to fix any such
breakage that occurred, at least as regards core code.  I would guess
that this could be done without breaking too much third-party code,
but I guess it depends on exactly what the author of this hypothetical
patch ends up changing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company