Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
Anastasia's nbtree deduplication patch [1] has an open problem that I
would like to find a solution for: it currently assumes that there is
no difference between binary equality and opclass equality. That won't
work for opclasses such as btree/numeric, because compressing equal
numeric datums could destroy display scale if equal numeric datums
were naively lumped together (actually, the deduplication patch
doesn't work like that, but it has other subtle problems due to not
having worked out fundamental definitional issues).

We don't need to be able to assume that binary equality is exactly the
same thing as opclass equality at the level of individual tuples. We
only need to be able to assume that the user cannot observe any
differences when they are shown output for two datums that are
opclass-equal for any opclass that supports deduplication (i.e. cases
like the numeric_ops case just won't work, so we shouldn't event try).
I believe that it would be okay if we treated two IndexTuples as
equivalent and therefore targets to store together in the same posting
list when they happen to have distinct binary representations due to
the original datums having different TOAST input state. In short, the
deduplication patch cannot tolerate being unable to store
opclass-equal IndexTuples in the same posting list when the opclass
(or the underlying type being indexed) somehow allows that equality
isn't equivalence -- that's simply unsupportable. The opclass gets one
chance to say whether or not it vetoes the use of deduplication: at
CREATE INDEX time.

Consumers of this new infrastructure probably won't be limited to the
deduplication feature; the same infrastructure will be needed for a
B-Tree prefix compression patch (I'm thinking of a configurable CREATE
INDEX prefix compression feature). GIN never had to solve this problem
because its indexes are always lossy, and cannot support index-only
scans. It seems likely that a scheme like the one I have in mind can
work for the vast majority of Postgres B-Tree indexes in practice, so
I don't think that the user-visible restrictions I'm considering will
make the patch significantly less useful (it's already very useful).
The most notable restriction for users will almost certainly be not
supporting deduplication within indexes that use nondeterministic
collations. They were already paying a performance penalty during
hashing, though.

I would like to:

* Get some buy-in on whether or not the precise distinctions I would
like to make are correct for deduplication in particular, and as
useful as possible for other cases that we may need to add later on.

* Figure out the exact interface through which opclass/opfamily
authors can represent that their notion of equality is compatible with
deduplication/compression. (I think that the use of nondeterministic
collations should disable deduplication without explicit action from
the operator class -- that should just be baked in.)

* Mark most existing btree operator classes as being compatible with
deduplication as part of making the patch committable. As I said, I
believe that their semantics are already compatible with what we need
for deduplication to work sensibly, aside from a handful of specific
exceptions.

In any case, I'm certain that problems like the btree/numeric display
scale problem are simply not worth solving directly. That would add a
huge amount of complexity for very little benefit.

[1] https://commitfest.postgresql.org/24/2202/
--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Tom Lane-2
Peter Geoghegan <[hidden email]> writes:
> We don't need to be able to assume that binary equality is exactly the
> same thing as opclass equality at the level of individual tuples. We
> only need to be able to assume that the user cannot observe any
> differences when they are shown output for two datums that are
> opclass-equal for any opclass that supports deduplication (i.e. cases
> like the numeric_ops case just won't work, so we shouldn't event try).

Hmm, so that would exclude the optimization for numeric, float4/float8,
and nondeterministic text collations.  Anything else?

I agree that teaching opclasses to say whether this is okay is a
reasonable approach.

> Consumers of this new infrastructure probably won't be limited to the
> deduplication feature;

Indeed, we run up against this sort of thing all the time in, eg, planner
optimizations.  I think some sort of "equality is precise" indicator
would be really useful for a lot of things.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
On Sun, Aug 25, 2019 at 1:56 PM Tom Lane <[hidden email]> wrote:
> Hmm, so that would exclude the optimization for numeric, float4/float8,
> and nondeterministic text collations.  Anything else?

Any pseudo-type whose output function could possibly be dependent on
the output function of another type (in case it happens to be one of
the types that definitely aren't safe). Maybe we could make fine
distinctions about pseudo-type safety in certain contexts, but that
doesn't matter to the deduplication patch.

> I agree that teaching opclasses to say whether this is okay is a
> reasonable approach.

Great.

> > Consumers of this new infrastructure probably won't be limited to the
> > deduplication feature;
>
> Indeed, we run up against this sort of thing all the time in, eg, planner
> optimizations.  I think some sort of "equality is precise" indicator
> would be really useful for a lot of things.

The case that I happened to think of was "collation strength
reduction". In other words, an optimization that has the planner use a
merge equijoin whose joinqual involves two text columns using the "C"
collation, even though the "C" collation isn't otherwise usable.
Perhaps there are far more compelling planner optimization that I
haven't considered, though. This idea probably has problems with
interesting sort orders that aren't actually that interesting.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
On Sun, Aug 25, 2019 at 2:18 PM Peter Geoghegan <[hidden email]> wrote:
> > Indeed, we run up against this sort of thing all the time in, eg, planner
> > optimizations.  I think some sort of "equality is precise" indicator
> > would be really useful for a lot of things.
>
> The case that I happened to think of was "collation strength
> reduction".

I was thinking of stashing an "equality is precise" flag in the
metapage of each nbtree index, since we will only determine this once,
at CREATE INDEX time. That would make it fairly natural for the
planner to ask about the "equality is precise"-ness of the index at
the same point that it calls _bt_getrootheight(): within
get_relation_info().

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Tom Lane-2
Peter Geoghegan <[hidden email]> writes:
> I was thinking of stashing an "equality is precise" flag in the
> metapage of each nbtree index, since we will only determine this once,
> at CREATE INDEX time.

Sure.

> That would make it fairly natural for the
> planner to ask about the "equality is precise"-ness of the index at
> the same point that it calls _bt_getrootheight(): within
> get_relation_info().

The planner will almost certainly want to ask the opclass directly,
because most of the places where it wants to know this sort of thing
about operator behavior have nothing to do with indexes.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
On Sun, Aug 25, 2019 at 2:40 PM Tom Lane <[hidden email]> wrote:
> > I was thinking of stashing an "equality is precise" flag in the
> > metapage of each nbtree index, since we will only determine this once,
> > at CREATE INDEX time.
>
> Sure.

I suppose that we'd add something new to CREATE OPERATOR CLASS to make
this work? My instinct is to avoid adding things that are only
meaningful for a single AM to interfaces like CREATE OPERATOR CLASS,
but the system already has numerous dependencies on B-Tree opclasses
that seem comparable to me.

There is a single case where nbtree stores a type that differs from
the type actually being indexed by the operator class: the "name"
case, where the underlying storage type is actually cstring. I'm not
sure whether or not this needs to be treated as its own kind of
special case. I suppose that we can ignore it completely, because
we're not directly concerned with the physical representation used
within an index. In fact, a major goal for this new infrastructure is
that nbtree gets to fully own the representation (it just needs to
know about the high level or logical requirements).

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
On Sun, Aug 25, 2019 at 2:55 PM Peter Geoghegan <[hidden email]> wrote:
> I suppose that we'd add something new to CREATE OPERATOR CLASS to make
> this work? My instinct is to avoid adding things that are only
> meaningful for a single AM to interfaces like CREATE OPERATOR CLASS,
> but the system already has numerous dependencies on B-Tree opclasses
> that seem comparable to me.

Another question is whether or not it would be okay to define
"equality is precise"-ness to be "the system's generic equality
function works perfectly as a drop-in replacement for my own equality
operator's function". The system's generic equality function could be
the recently added datum_image_eq() function -- that looks like it
will do exactly what I have in mind. This would be a new way of using
datum_image_eq(), I think, since it wouldn't be okay for it to give an
answer that differed from the equality operator's function. It looks
like existing datum_image_eq() callers can deal with false negatives
(but not false positives, which are impossible).

This exceeds what is strictly necessary for the deduplication patch,
but it seems like the patch should make comparisons as fast as
possible in the context of deduplicating items (it would be nice if it
could just use datum_image_eq instead of an insertion scankey when
doing many comparisons to deduplicate items). We can imagine a
datatype with undefined garbage bytes that affect the answer that
datum_image_eq() gives, but could be safe targets for deduplication,
so it's not clear if being this aggressive will work. But maybe that
isn't actually possible among types that aren't inherently unsafe for
deduplication. And maybe we could be more aggressive with
optimizations in numerous other contexts by defining "equality is
precise"-ness as strict binary equality after accounting for TOAST
compression.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Antonin Houska-2
In reply to this post by Peter Geoghegan-4
Peter Geoghegan <[hidden email]> wrote:

> Consumers of this new infrastructure probably won't be limited to the
> deduplication feature;

It'd also solve an open problem of the aggregate push-down patch [1], in
particular see the mention of pg_opclass in [2]: the partial aggregate
node below the final join must not put multiple opclass-equal values of
which are not byte-wise equal into the same group because some
information needed by WHERE or JOIN/ON condition may be lost this
way. The scale of the numeric type is the most obvious example.

> I would like to:
>
> * Get some buy-in on whether or not the precise distinctions I would
> like to make are correct for deduplication in particular, and as
> useful as possible for other cases that we may need to add later on.
>
> * Figure out the exact interface through which opclass/opfamily
> authors can represent that their notion of equality is compatible with
> deduplication/compression.

It's not entirely clear to me whether opclass or opfamily should carry
this information. opclass probably makes more sense for index related
problems and the aggregate push-down patch can live with that. I don't
see particular reason to add any flag to opfamily. (Planner uses uses
both pg_opclass and pg_opfamily catalogs.)

I think the fact that the aggregate push-down would benefit from this
enhancement should affect choice of the new catalog attribute name,
i.e. it should be not mention words as concrete as "deduplication" or
"compression".

> (I think that the use of nondeterministic collations should disable
> deduplication without explicit action from the operator class -- that
> should just be baked in.)

(I think the aggregate push-down needs to consider the nondeterministic
collations too, I missed that so far.)

[1] https://commitfest.postgresql.org/24/1247/

[2] https://www.postgresql.org/message-id/10529.1547561178%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Anastasia Lubennikova
26.08.2019 14:15, Antonin Houska wrote:

> Peter Geoghegan <[hidden email]> wrote:
>
>> Consumers of this new infrastructure probably won't be limited to the
>> deduplication feature;
> It'd also solve an open problem of the aggregate push-down patch [1], in
> particular see the mention of pg_opclass in [2]: the partial aggregate
> node below the final join must not put multiple opclass-equal values of
> which are not byte-wise equal into the same group because some
> information needed by WHERE or JOIN/ON condition may be lost this
> way. The scale of the numeric type is the most obvious example.
>
>> I would like to:
>>
>> * Get some buy-in on whether or not the precise distinctions I would
>> like to make are correct for deduplication in particular, and as
>> useful as possible for other cases that we may need to add later on.
>>
>> * Figure out the exact interface through which opclass/opfamily
>> authors can represent that their notion of equality is compatible with
>> deduplication/compression.
> It's not entirely clear to me whether opclass or opfamily should carry
> this information. opclass probably makes more sense for index related
> problems and the aggregate push-down patch can live with that. I don't
> see particular reason to add any flag to opfamily. (Planner uses uses
> both pg_opclass and pg_opfamily catalogs.)
>
> I think the fact that the aggregate push-down would benefit from this
> enhancement should affect choice of the new catalog attribute name,
> i.e. it should be not mention words as concrete as "deduplication" or
> "compression".

The patch implementing new opclass option is attached.

It adds new attribute pg_opclass.opcisbitwise, which is set to true if
opclass equality is the same as binary equality.
By default it is true. It is set to false for numeric and float4, float8.

Does anyarray opclasses need special treatment?

New syntax for create opclass is  "CREATE OPERATOR CLASS NOT BITWISE ..."

Any ideas on better names?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


v1-Opclass-bitwise-equality.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Antonin Houska-2
Anastasia Lubennikova <[hidden email]> wrote:

> The patch implementing new opclass option is attached.
>
> It adds new attribute pg_opclass.opcisbitwise, which is set to true if opclass
> equality is the same as binary equality.
> By default it is true.

I think the default value should be false and we should only set it to true
for individual opclasses which do meet the bitwise equality requirement. Also
extension authors should explicitly state that their data types are bitwise
equal. Otherwise the existing opclasses, when created via pg_dump ->
pg_restore, can be used by the system incorrectly.

> It is set to false for numeric and float4, float8.

Are you sure about these?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Andrew Gierth
>>>>> "Antonin" == Antonin Houska <[hidden email]> writes:

 >> It is set to false for numeric and float4, float8.

 Antonin> Are you sure about these?

numeric values can compare equal but have different display scales (see
hash_numeric).

float4 and float8 both have representations for -0, which compares equal
to 0. (numeric technically has a representation for -0 too, but I
believe the current code carefully avoids ever generating it.)

--
Andrew (irc:RhodiumToad)


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Anastasia Lubennikova
In reply to this post by Antonin Houska-2
01.10.2019 8:41, Antonin Houska wrote:

> Anastasia Lubennikova <[hidden email]> wrote:
>
>> The patch implementing new opclass option is attached.
>>
>> It adds new attribute pg_opclass.opcisbitwise, which is set to true if opclass
>> equality is the same as binary equality.
>> By default it is true.
> I think the default value should be false and we should only set it to true
> for individual opclasses which do meet the bitwise equality requirement. Also
> extension authors should explicitly state that their data types are bitwise
> equal. Otherwise the existing opclasses, when created via pg_dump ->
> pg_restore, can be used by the system incorrectly.
Thank you for the feedback.

At first I implemented bitwise as default, because it is more common .
Though, I agree that it's essential to avoid false positives here.
The new version of the patch is attached. I also updated pg_dump.

A few more open questions:
1) How to handle contrib modules that create new opclasses?
Since default is 'not bitwise' it means that various operator classes
created in extensions
such as bloom, btree_gin and others, won't be able to take advantage of
various optimizations
that require the opclass to be BITWISE.

'v2-Opclass-bitwise-equality-0002' patch simply adds BITWISE keyword
where necessary.

2) Whether we should provide ALTER OPERATOR CLASS SET BITWISE syntax?

3) Current patch modifies regression test so that it checks CREATE
OPCLASS BITWISE syntax.
Is there anything else worth testing? As I see it, this patch is just
about infrastructure changes,
and more specific tests will be added by features that will implement
further optimizations.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


v2-Opclass-bitwise-equality-0002.patch (27K) Download Attachment
v2-Opclass-bitwise-equality-0001.patch (37K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
On Mon, Oct 28, 2019 at 11:11 AM Anastasia Lubennikova
<[hidden email]> wrote:

> At first I implemented bitwise as default, because it is more common .
> Though, I agree that it's essential to avoid false positives here.
> The new version of the patch is attached. I also updated pg_dump.
>
> A few more open questions:
> 1) How to handle contrib modules that create new opclasses?
> Since default is 'not bitwise' it means that various operator classes
> created in extensions
> such as bloom, btree_gin and others, won't be able to take advantage of
> various optimizations
> that require the opclass to be BITWISE.

What optimizations? Do we anticipate that other index AMs will benefit
from BITWISE-ness?

> 'v2-Opclass-bitwise-equality-0002' patch simply adds BITWISE keyword
> where necessary.
>
> 2) Whether we should provide ALTER OPERATOR CLASS SET BITWISE syntax?

I think that that's probably not desirable. There should at least be a
strong practical advantage if we go that way. This would mean ALTER
OPERATOR CLASS could change the "substance" of an opclass, which is
fundamentally different from what it can do already (it currently just
changes the owner, or the schema that it is stored in).

> 3) Current patch modifies regression test so that it checks CREATE
> OPCLASS BITWISE syntax.
> Is there anything else worth testing? As I see it, this patch is just
> about infrastructure changes,
> and more specific tests will be added by features that will implement
> further optimizations.

I think so too -- this is really about associating a single piece of
information with an operator class.

BTW: No need to bump catversion when posting a patch, which I see in
"v2-Opclass-*0001.patch". That is our policy. (A catversion bump is
generally supposed to be done at the last minute, just as the patch is
committed. This avoids unnecessary conflicts against the master branch
over time, as a patch is developed.)

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Anastasia Lubennikova
14.11.2019 0:25, Peter Geoghegan wrote:

> On Mon, Oct 28, 2019 at 11:11 AM Anastasia Lubennikova
> <[hidden email]>  wrote:
>> At first I implemented bitwise as default, because it is more common .
>> Though, I agree that it's essential to avoid false positives here.
>> The new version of the patch is attached. I also updated pg_dump.
>>
>> A few more open questions:
>> 1) How to handle contrib modules that create new opclasses?
>> Since default is 'not bitwise' it means that various operator classes
>> created in extensions
>> such as bloom, btree_gin and others, won't be able to take advantage of
>> various optimizations
>> that require the opclass to be BITWISE.
> What optimizations? Do we anticipate that other index AMs will benefit
> from BITWISE-ness?
I was thinking of possible planner optimizations, that Tom mentioned up
thread.
Though, I don't have any specific examples. Anyway, we can implement
support for user-defined opclasses later.
>> 3) Current patch modifies regression test so that it checks CREATE
>> OPCLASS BITWISE syntax.
>> Is there anything else worth testing? As I see it, this patch is just
>> about infrastructure changes,
>> and more specific tests will be added by features that will implement
>> further optimizations.
> I think so too -- this is really about associating a single piece of
> information with an operator class.
Great. It seems that the patch is ready for commit.
I attached new version with pg_opclass documentation update.

One more thing I am uncertain about  is array_ops. Arrays may contain
bitwise and not bitwise element types.
What is the correct value of opcisbitwise the array_ops itself?

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


v3-Opclass-bitwise-equality-0002.patch (27K) Download Attachment
v3-Opclass-bitwise-equality-0001.patch (37K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Antonin Houska-2
Anastasia Lubennikova <[hidden email]> wrote:

> I attached new version with pg_opclass documentation update.
>
> One more thing I am uncertain about  is array_ops. Arrays may contain bitwise
> and not bitwise element types.
> What is the correct value of opcisbitwise the array_ops itself?

How about setting opcisbitwise to false for the array_ops opclass and checking
opcisbitwise of the element type whenever we need to know whether the array is
"bitwise equal"? When checking array_eq(), I thought whether the existence of
"expanded array" format is a problem but it does not seem to be: the
conversion of "expanded" value to "flat" value and then back to the "expanded"
should not change the array contents.

Anyway, in the current version of the patch I see that array_ops opclasses
have opcisbitwise=true. It should be false even if you don't use the approach
of checking the element type.

Besides that, I think that record_ops is similar to array_ops and therefore it
should not set opcisbitwise to true.

I also remember that, when thinking about the problem in the context of the
aggregate push down patch, I considered some of the geometric types
problematic. For example, box_eq() uses this expression

#define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)

so equality does not imply bitwise equality here. Maybe you should only set
the flag for btree opclasses for now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Robert Haas
In reply to this post by Peter Geoghegan-4
On Wed, Nov 13, 2019 at 4:25 PM Peter Geoghegan <[hidden email]> wrote:
> I think that that's probably not desirable. There should at least be a
> strong practical advantage if we go that way. This would mean ALTER
> OPERATOR CLASS could change the "substance" of an opclass, which is
> fundamentally different from what it can do already (it currently just
> changes the owner, or the schema that it is stored in).

My impression is that this is more of an implementation restriction
than a design goal. I don't really remember the details, but it seems
to me that there were locking and/or cache invalidation problems with
making ALTER OPERATOR CLASS do more substantive things -- and that it
was because of those problems, not a lack of desire, that we didn't
support it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Peter Geoghegan-4
On Thu, Dec 19, 2019 at 12:05 PM Robert Haas <[hidden email]> wrote:
> My impression is that this is more of an implementation restriction
> than a design goal. I don't really remember the details, but it seems
> to me that there were locking and/or cache invalidation problems with
> making ALTER OPERATOR CLASS do more substantive things -- and that it
> was because of those problems, not a lack of desire, that we didn't
> support it.

I agree with you. My point was only that this is something that the
operator class author is really expected to get right the first time
around -- just like the behavior of B-Tree support function 1. We're
really only concerned about the upgrade path for external types that
could see a benefit from the optimization planned for nbtree (and
possibly other such optimization). Providing a non-disruptive way to
get that benefit after a pg_upgrade only seems like a nice-to-have to
me, because it's not as if anything will stop working as well as it
once did. Also, there aren't that many external types that will be
made more useful by being able to use optimizations like
deduplication; in practice almost all B-Tree indexes only use a small
handful of operator classes that are shipped in core Postgres. Once
you're using common types like text and integer, a pg_upgrade'd
database is only a REINDEX away from being able to use deduplication
(though I am not even sure if even that will be necessary in the final
patch; I hope to be able to avoid even that inconvenience with indexes
using core operator classes).

If the underlying behavior of an operator class actually changes, then
that's a disaster for all the usual reasons. It doesn't make that much
sense to reverse an earlier decision to make an operator class
BITWISE. Better to drop everything, and recreate everything, since
your indexes should be considered corrupt anyway. (Also, I don't think
that it's that hard to get it right, so this will probably never
happen.)

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Anastasia Lubennikova
In reply to this post by Antonin Houska-2
19.12.2019 18:19, Antonin Houska wrote:

> Anastasia Lubennikova <[hidden email]> wrote:
>
>> I attached new version with pg_opclass documentation update.
>>
>> One more thing I am uncertain about  is array_ops. Arrays may contain bitwise
>> and not bitwise element types.
>> What is the correct value of opcisbitwise the array_ops itself?
> How about setting opcisbitwise to false for the array_ops opclass and checking
> opcisbitwise of the element type whenever we need to know whether the array is
> "bitwise equal"? When checking array_eq(), I thought whether the existence of
> "expanded array" format is a problem but it does not seem to be: the
> conversion of "expanded" value to "flat" value and then back to the "expanded"
> should not change the array contents.
>
> Anyway, in the current version of the patch I see that array_ops opclasses
> have opcisbitwise=true. It should be false even if you don't use the approach
> of checking the element type.
>
> Besides that, I think that record_ops is similar to array_ops and therefore it
> should not set opcisbitwise to true.
>
> I also remember that, when thinking about the problem in the context of the
> aggregate push down patch, I considered some of the geometric types
> problematic. For example, box_eq() uses this expression
>
> #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
>
> so equality does not imply bitwise equality here. Maybe you should only set
> the flag for btree opclasses for now.
Thank you for pointing out at the issue with geometric opclasses.
If I understand it correctly, regular float types are not bitwise as well.

I updated the patchset.
The first patch now contains only infrastructure changes
and the second one sets opcisbitwise for btree opclasses in pg_opclass.dat.

I've tried to be conservative and only mark types that are 100% bitwise
safe.
See attached v2-Opclass-isbitwise.out file.

Non-atomic types, such as record, range, json and enum depend on element
types.
Text can be considered bitwise (i.e. texteq uses memcmp) only when
specific collation clauses are satisfied.

We can make this 'opcisbitwise' parameter enum (or char) instead of
boolean to mark
"always bitwise", "never bitwise" and "maybe bitwise". Though, I doubt
if it will be helpful in any real use case.

What do you think?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


v4-Opclass-bitwise-equality-0001.patch (13K) Download Attachment
v4-Opclass-bitwise-equality-0002.patch (8K) Download Attachment
v4-Opclass-isbitwise.out (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Alvaro Herrera-9
In reply to this post by Anastasia Lubennikova

> @@ -106,6 +106,18 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
>      </listitem>
>     </varlistentry>
>  
> +    <varlistentry>
> +    <term><literal>NOT BITWISE</literal></term>
> +    <listitem>
> +     <para>
> +      If present, the operator class equality is not the same as equivalence.
> +      For example, two numerics can compare equal but have different scales.
> +      Most opclasses implement bitwise equal comparison, alternative behaviour
> +      must be set explicitly.
> +     </para>
> +    </listitem>
> +   </varlistentry>

Am I the only one bothered by the fact that this patch (and all
downstream discussion) reduces the term "bitwise equality" to simply
"bitwise"?  It reads really strange to me, both in the resulting SQL
grammar as well as in struct names, code comments etc.  "This operator
class is bitwise."

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> Am I the only one bothered by the fact that this patch (and all
> downstream discussion) reduces the term "bitwise equality" to simply
> "bitwise"?  It reads really strange to me, both in the resulting SQL
> grammar as well as in struct names, code comments etc.  "This operator
> class is bitwise."

I agree, that's really poor English.

                        regards, tom lane


12