Proposal: Global Index

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Proposal: Global Index

Ibrar Ahmed-4

A global index by very definition is a single index on the parent table that maps to many
underlying table partitions. The parent table itself does not have any underlying storage,
so it must, therefore, retrieve the data satisfying index constraints from the underlying tables.
In very crude terms, it is an accumulation of data from table partitions so that data spanning
across multiple partitions are accessed in one go as opposed to individually querying each
partition.

For the initial version of this work, we are only considering to build b-tree global indexes.

- Partitioned Index (Index Partitioning)
When global indexes become too large, then those are partitioned to keep the performance
and maintenance overhead manageable. These are not within the scope of this work.


- Local Index
A local index is an index that is local to a specific table partition; i.e. it doesn’t span across
multiple partitions. So, when we create an index on a parent table, it will create a separate
index for all its partitions. Unfortunately, PostgreSQL uses the terminology of “partitioned index”
when it refers to local indexes. This work with fix this terminology for PostgreSQL so that the
nomenclature remains consistent with other DBMS.


- Why We Need Global Index?
A global index is expected to give two very important upgrades to the partitioning feature set in
PostgreSQL. It is expected to give a significant improvement in read-performance for queries
targeting multiple local indexes of partitions. It also adds a unique constraint across partitions.


- Unique Constraint
Data uniqueness is a critical requirement for building an index. For global indexes that span across
multiple partitions, uniqueness will have to be enforced on index column(s). This effectively translates
into a unique constraint.


- Performance
Currently, the pseudo index created on the parent table of partitions does not contain any
data. Rather, it dereferences to the local indexes when an index search is required. This
means that multiple indexes will have to be evaluated and data to be combined thereafter.
However, with the global indexes, data will reside with global index declared on the parent
table. This avoids the need for multi-level index lookups. So read performance is expected
to be significantly higher in cases. There will however be a negative performance impact
during write (insert/update) of data. This is discussed in more detail later on.


- Creating a GLOBAL Index - Syntax
A global index may be created with the addition of a “GLOBAL” keyword to the index statement.
Alternatively, one could specify the “LOCAL” keyword to create local indexes on partitions.
We are suggesting to call this set of keywords: “partition_index_type”. By default,
partition_index_type will be set as LOCAL. Here is a sample of the create index syntax.

CREATE Index idx parent (columns) [GLOBAL | LOCAL];

Note: There is no shift/reduced by adding these options.


- Pointing Index to Tuple
Currently, CTID carries a page and offset information for a known heap (table name). However,
in the context of global indexes, this information within an index is insufficient. Since the index is
expected to carry tuples from multiple partitions (heaps), CTID alone will not be able to link an index
node to a tuple. This requires carrying additional data for the heap name to be stored with each
index node.

How this should be implemented is a point to be discussed. A few possibilities are listed below:

-- Expand CTID to include a relfilenode id. In PostgreSQL-Conf Asia, Bruce suggested having the OID
instead of relfilenode as relfilenode can be duplicated across tablespaces.
      -- Using OID introduces another complication where we would need to query catalog for OID to
         heap mapping.

-- The second option is to have a variable-length CTID. We can reserve some top-level bit for segregation of
Global CTID or Standard CTID. Robert Haas suggested in PostgreSQL-EU to discuss this with Peter Geoghegan.
      -- I discussed it with Peter and he believes that it is a very invasive approach that requires a whole lot of
         the effort to get committed.

-- Heikki pointed me to include heap specific information using the INCLUDE keyword so that heap information
is stored with each index node as data.
      -- We (Peter and I) also discussed that option and this looks a more easy and non-invasive option.


- Optimizer
The challenge with optimizer is a selection between local and global indexes when both are present.
How do we: 

-- Evaluate the cost of scanning a global index?
-- When should the LOCAL index be preferred over the GLOBAL index and vice versa?
        -- Should we hit a GLOBAL index when the query is targeting a couple of partitions only?
        -- We need to consider the sizes of those partitions being hit and the sizes of partitions not being hit.
-- Bruce suggested that we prioritize a GLOBAL index in the first version so that in every case,
the GLOBAL index is utilized.


- Write Performance and Vacuum
There will be some write performance degradation because every change in partition tables must
propagate upwards to the GLOBAL index on the parent table. This can be thought of as another index
on a table, however, the [slight] performance degradation will be due to the fact that the GLOBAL
index may carry a much bigger dataset with data from multiple partitions resulting in a higher tree
traversal and update time. This applies to both write and vacuum processes.

It is still an open question though on how this will be handled within the code and how we can better
optimize this process.

I have a POC patch and working on finalizing the patch, Hamid Akhtar is also working with me on this
work.



--
Ibrar Ahmed
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Tom Lane-2
Ibrar Ahmed <[hidden email]> writes:
> A global index by very definition is a single index on the parent table
> that maps to many
> underlying table partitions.

I believe that the current design of partitioning is explicitly intended
to avoid the need for such a construct.  It'd be absolutely disastrous
to have such a thing from many standpoints, including the breadth of
locking needed to work with the global index, the difficulty of vacuuming,
and the impossibility of cheaply attaching or detaching partitions.

In other words, this is a "feature" we do not want.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Robert Haas
On Wed, Oct 30, 2019 at 10:13 AM Tom Lane <[hidden email]> wrote:
> I believe that the current design of partitioning is explicitly intended
> to avoid the need for such a construct.  It'd be absolutely disastrous
> to have such a thing from many standpoints, including the breadth of
> locking needed to work with the global index, the difficulty of vacuuming,
> and the impossibility of cheaply attaching or detaching partitions.
>
> In other words, this is a "feature" we do not want.

I don't think that's true. Certainly, a lot of EnterpriseDB customers
want this feature - it comes up regularly in discussions here. But
that is not to say that the technical challenges are not formidable,
and I don't think this proposal really grapples with any of them.
However, that doesn't mean that the feature isn't desirable.

One of the biggest reasons why people want it is to enforce uniqueness
for secondary keys - e.g. the employees table is partitioned by
employee ID, but SSN should also be unique, at least among employees
for whom it's not NULL.

But people also want it for faster data retrieval: if you're looking
for a commonly-occurring value, an index per partition is fine. But if
you're looking for values that occur only once or a few times across
the whole hierarchy, an index scan per partition is very costly.
Consider, e.g.:

Nested Loop
-> Seq Scan
-> Append
  -> Index Scan on each_partition

You don't have to have very many partitions for that to suck, and it's
a thing that people want to do. Runtime partition pruning helps with
this case a lot, but, once again, only for the primary key. Secondary
keys are a big problem for partitioning today, in many ways.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Tom Lane-2
Robert Haas <[hidden email]> writes:
> On Wed, Oct 30, 2019 at 10:13 AM Tom Lane <[hidden email]> wrote:
>> I believe that the current design of partitioning is explicitly intended
>> to avoid the need for such a construct.  It'd be absolutely disastrous
>> to have such a thing from many standpoints, including the breadth of
>> locking needed to work with the global index, the difficulty of vacuuming,
>> and the impossibility of cheaply attaching or detaching partitions.
>> In other words, this is a "feature" we do not want.

> I don't think that's true. Certainly, a lot of EnterpriseDB customers
> want this feature - it comes up regularly in discussions here. But
> that is not to say that the technical challenges are not formidable,
> and I don't think this proposal really grapples with any of them.
> However, that doesn't mean that the feature isn't desirable.

Well, the *effects* of the feature seem desirable, but that doesn't
mean that we want an implementation that actually has a shared index.
As soon as you do that, you've thrown away most of the benefits of
having a partitioned data structure in the first place.

No, I don't have an idea how we might support, eg, uniqueness of
non-partition-key columns without that.  But we need to spend our
effort on figuring that out, not on building a complicated mechanism
whose performance is never going to not suck.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Peter Geoghegan-4
On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <[hidden email]> wrote:
> Well, the *effects* of the feature seem desirable, but that doesn't
> mean that we want an implementation that actually has a shared index.
> As soon as you do that, you've thrown away most of the benefits of
> having a partitioned data structure in the first place.

Right, but that's only the case for the global index. Global indexes
are useful when used judiciously. They enable the use of partitioning
for use cases where not being able to enforce uniqueness across all
partitions happens to be a deal breaker. I bet that this is fairly
common.

> No, I don't have an idea how we might support, eg, uniqueness of
> non-partition-key columns without that.  But we need to spend our
> effort on figuring that out, not on building a complicated mechanism
> whose performance is never going to not suck.

I don't think that there is a way to solve the problem that doesn't
look very much like a global index. Also, being able to push down a
partition number when scanning a global index seems like it would be
very compelling in some scenarios.

I'm a bit worried about the complexity that will need to be added to
nbtree to make global indexes work, but it's probably possible to come
up with something that isn't too bad. GIN already uses an
implementation level attribute number column for multi-column GIN
indexes, which is a little like what Ibrar has in mind. The really
complicated new code required for global indexes will be in places
like vacuumlazy.c.


--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Tom Lane-2
Peter Geoghegan <[hidden email]> writes:
> On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <[hidden email]> wrote:
>> Well, the *effects* of the feature seem desirable, but that doesn't
>> mean that we want an implementation that actually has a shared index.
>> As soon as you do that, you've thrown away most of the benefits of
>> having a partitioned data structure in the first place.

> Right, but that's only the case for the global index. Global indexes
> are useful when used judiciously.

But ... why bother with partitioning then?  To me, the main reasons
why you might want a partitioned table are

* ability to cheaply add and remove partitions, primarily so that
you can cheaply do things like "delete the oldest month's data".

* ability to scale past our limits on the physical size of one table
--- both the hard BlockNumber-based limit, and the performance
constraints of e.g. vacuuming a very large table.

Both of those go out the window with a global index.  So you might
as well just have one table and forget all the overhead.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Andres Freund
Hi,

On 2019-10-30 13:05:57 -0400, Tom Lane wrote:

> Peter Geoghegan <[hidden email]> writes:
> > On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <[hidden email]> wrote:
> >> Well, the *effects* of the feature seem desirable, but that doesn't
> >> mean that we want an implementation that actually has a shared index.
> >> As soon as you do that, you've thrown away most of the benefits of
> >> having a partitioned data structure in the first place.
>
> > Right, but that's only the case for the global index. Global indexes
> > are useful when used judiciously.
>
> But ... why bother with partitioning then?  To me, the main reasons
> why you might want a partitioned table are

Quite commonly there's a lot of *other* indexes, often on a lot wider
data than the primary key, that don't need to be global. And whereas in
a lot of cases the primary key in a partitioned table has pretty good
locality (and thus will be mostly buffered IO), other indexes will often
not have that property (i.e. not have much correlation with table
position).


> * ability to cheaply add and remove partitions, primarily so that
> you can cheaply do things like "delete the oldest month's data".

You can still do that to some degree with a global index. Imagine
e.g. keeping a 'partition id' as a sort-of column in the global
index. That allows you to drop the partition, without having to
immediately rebuild the index, by checking the partition id against the
live partitions during lookup.  So sure, your'e wasting space for a bit
in the global index, but it'll also be space that is likely to be fairly
efficiently reclaimed the next time vacuum touches the index.  And if
not the global index can be rebuilt concurrently without blocking
writes.


> * ability to scale past our limits on the physical size of one table
> --- both the hard BlockNumber-based limit, and the performance
> constraints of e.g. vacuuming a very large table.

For that to be a problem for a global index the global index (which will
often be something like two int4 or int8 columns) itself would need to
be above the block number based limit - which doesn't seem that close.

WRT vacuuming - based on my observations the table itself isn't a
performance problem for vacuuming all that commonly anymore, it's the
associated index scans. So yea, that's a problem.



Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Stephen Frost
In reply to this post by Peter Geoghegan-4
Greetings,

* Peter Geoghegan ([hidden email]) wrote:

> On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <[hidden email]> wrote:
> > Well, the *effects* of the feature seem desirable, but that doesn't
> > mean that we want an implementation that actually has a shared index.
> > As soon as you do that, you've thrown away most of the benefits of
> > having a partitioned data structure in the first place.
>
> Right, but that's only the case for the global index. Global indexes
> are useful when used judiciously. They enable the use of partitioning
> for use cases where not being able to enforce uniqueness across all
> partitions happens to be a deal breaker. I bet that this is fairly
> common.
Absolutely- our lack of such is a common point of issue when folks are
considering using or migrating to PostgreSQL.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Isaac Morland
On Thu, 31 Oct 2019 at 14:50, Stephen Frost <[hidden email]> wrote:
Greetings,

* Peter Geoghegan ([hidden email]) wrote:
[....] 

Absolutely- our lack of such is a common point of issue when folks are
considering using or migrating to PostgreSQL.

Not sure how similar my situation really is, but I find myself wanting to have indices that cross non-partition members of an inheritance hierarchy:

create table t (
    id int,
    primary key (id)
);

create table t1 (
    a text
) inherits (t);

create table t2 (
    b int,
    c int
) inherits (t);

So "t"s are identified by an integer; and one kind of "t" has a single text attribute while a different kind of "t" has 2 int attributes. The idea is that there is a single primary key constraint on the whole hierarchy that ensures only one record with a particular id can exist in all the tables together. I can imagine wanting to do this with other unique constraints also.

At present I don't actually use inheritance; instead I put triggers on the child tables that do an insert on the parent table, which has the effect of enforcing the uniqueness I want.
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Tomas Vondra-4
On Thu, Oct 31, 2019 at 03:02:40PM -0400, Isaac Morland wrote:

>On Thu, 31 Oct 2019 at 14:50, Stephen Frost <[hidden email]> wrote:
>
>> Greetings,
>>
>> * Peter Geoghegan ([hidden email]) wrote:
>>
>[....]
>
>>
>> Absolutely- our lack of such is a common point of issue when folks are
>> considering using or migrating to PostgreSQL.
>>
>
>Not sure how similar my situation really is, but I find myself wanting to
>have indices that cross non-partition members of an inheritance hierarchy:
>
>create table t (
>    id int,
>    primary key (id)
>);
>
>create table t1 (
>    a text
>) inherits (t);
>
>create table t2 (
>    b int,
>    c int
>) inherits (t);
>
>So "t"s are identified by an integer; and one kind of "t" has a single text
>attribute while a different kind of "t" has 2 int attributes. The idea is
>that there is a single primary key constraint on the whole hierarchy that
>ensures only one record with a particular id can exist in all the tables
>together. I can imagine wanting to do this with other unique constraints
>also.
>

IMO the chances of us supporting global indexes with generic inheritance
hierarchies are about zero. We don't even support creating "partition"
indexes on those hierarchies ...

>At present I don't actually use inheritance; instead I put triggers on the
>child tables that do an insert on the parent table, which has the effect of
>enforcing the uniqueness I want.

Does it? Are you sure it actually works in READ COMMITTED? What exactly
does the trigger do?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Jeremy Schneider-2
In reply to this post by Andres Freund
On 10/30/19 10:27, Andres Freund wrote:

> On 2019-10-30 13:05:57 -0400, Tom Lane wrote:
>> Peter Geoghegan <[hidden email]> writes:
>>> On Wed, Oct 30, 2019 at 9:23 AM Tom Lane <[hidden email]> wrote:
>>>> Well, the *effects* of the feature seem desirable, but that doesn't
>>>> mean that we want an implementation that actually has a shared index.
>>>> As soon as you do that, you've thrown away most of the benefits of
>>>> having a partitioned data structure in the first place.
>>
>>> Right, but that's only the case for the global index. Global indexes
>>> are useful when used judiciously.
>>
>> But ... why bother with partitioning then?  To me, the main reasons
>> why you might want a partitioned table are
>
> Quite commonly there's a lot of *other* indexes, often on a lot wider
> data than the primary key, that don't need to be global. And whereas in
> a lot of cases the primary key in a partitioned table has pretty good
> locality (and thus will be mostly buffered IO), other indexes will often
> not have that property (i.e. not have much correlation with table
> position).

I asked around a little bit and got some interesting responses. Thought
I'd pass two of them along.

One person worked on a payments network (150,000+ installed readers),
the transaction table was date partitioned (1 per day) based on insert
timestamp, but lookups and updates were typically by the unique
transaction id.  Oracle DB, they kept 180 daily partitions, several
million rows per day. Transactions did not arrive in order, and could be
delayed if some part of the network was slow (they opted to allow the $2
charge rather than reject sales) and when the cash transaction records
were uploaded. Step one for their PG conversion created a read replica
in PG 9.6, and the cost of doing the individual index lookups across 180
partitions (and 180 indexes) was very high, so they stored max and min
txn id per partition and would generate a query with all the dates that
a txn id could have been in so that only a small number of partition
indexes would be accessed. They wanted a global index on txn id for
performance, not for uniqueness – id generated on reader with guid-like
semantics.

A second person worked on several large-scale systems and he relayed
that in some cases where they used Oracle global indexes on partitioned
tables, they ended up deciding to reverse that decision as things scaled
because of restrictive locking during partition maintenance (this is the
exact issue Tom points out). So even on a database _with_ the option of
using a global index, they've sometimes opted for "workaround" design
patterns instead:
* To solve uniqueness, manage serialization at the appliation level.
Isolate operations (e.g. using a queue) and use that to make sure that
two sessions don’t try to insert the same record at the same time.  From
an RDBMS, this looks like a separate, smaller table that is being used
to manage work activity.
* To solve the additional IO for a global table scan ... We often don’t
need to do this because the load in this pattern is not typically highly
concurrent.  If we are looking for higher concurrency, we can usually
add a hack/workaround that filters on a partition key to provide “pretty
good” pruning.  The net result is that you get 2-3x the IO due to the
lack of global index (same workaround as first story above).

Quote: "So ... I don’t actually like the idea of introducing this.
Unless, someone can solve the ugly challenges we have had [around
partition maintenance operations]."

I actually don't think those challenges are so un-solvable. I think that
global indexes will be irrelevant to most workloads. I'm not entirely
convinced that they won't be useful for a few people with specific
workloads and large amounts of data in PostgreSQL where the benefits
outweigh the costs. I definitely agree that care needs to be taken
around index maintenance operations if there's an effort here.


>> * ability to cheaply add and remove partitions, primarily so that
>> you can cheaply do things like "delete the oldest month's data".
>
> You can still do that to some degree with a global index. Imagine
> e.g. keeping a 'partition id' as a sort-of column in the global
> index. That allows you to drop the partition, without having to
> immediately rebuild the index, by checking the partition id against the
> live partitions during lookup.  So sure, your'e wasting space for a bit
> in the global index, but it'll also be space that is likely to be fairly
> efficiently reclaimed the next time vacuum touches the index.  And if
> not the global index can be rebuilt concurrently without blocking
> writes.

Another idea might be to leverage PostgreSQL's partial indexes. If the
index is created "where date>2020" and you're dropping an index from
2019 then you can entirely ignore the index. Not a panacea for every
index maintenance operation, but for the super-common case of dropping
the oldest partition you can now:

1) create new index concurrently "where dt>2020"
2) drop the old index
3) drop the 2019 partition

doesn't solve world hunger but there's lots of benefit for such a simple
hack.


>> * ability to scale past our limits on the physical size of one table
>> --- both the hard BlockNumber-based limit, and the performance
>> constraints of e.g. vacuuming a very large table.
>
> For that to be a problem for a global index the global index (which will
> often be something like two int4 or int8 columns) itself would need to
> be above the block number based limit - which doesn't seem that close.
>
> WRT vacuuming - based on my observations the table itself isn't a
> performance problem for vacuuming all that commonly anymore, it's the
> associated index scans. So yea, that's a problem.

I'm sure zheap will make all our dreams come true, right?  :D

-Jeremy


--
Jeremy Schneider
Database Engineer
Amazon Web Services


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Jeremy Schneider-2
On 11/25/19 15:05, Jeremy Schneider wrote:

> ... the cost of doing the individual index lookups across 180
> partitions (and 180 indexes) was very high, so they stored max and min
> txn id per partition and would generate a query with all the dates that
> a txn id could have been in so that only a small number of partition
> indexes would be accessed.
>
> .. If we are looking for higher concurrency, we can usually
> add a hack/workaround that filters on a partition key to provide “pretty
> good” pruning.  The net result is that you get 2-3x the IO due to the
> lack of global index (same workaround as first story above).

Is that basically like a global BRIN index with granularity at the
partition level?

-J

--
Jeremy Schneider
Database Engineer
Amazon Web Services


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Bruce Momjian
On Mon, Nov 25, 2019 at 03:44:39PM -0800, Jeremy Schneider wrote:

> On 11/25/19 15:05, Jeremy Schneider wrote:
> > ... the cost of doing the individual index lookups across 180
> > partitions (and 180 indexes) was very high, so they stored max and min
> > txn id per partition and would generate a query with all the dates that
> > a txn id could have been in so that only a small number of partition
> > indexes would be accessed.
> >
> > .. If we are looking for higher concurrency, we can usually
> > add a hack/workaround that filters on a partition key to provide “pretty
> > good” pruning.  The net result is that you get 2-3x the IO due to the
> > lack of global index (same workaround as first story above).
>
> Is that basically like a global BRIN index with granularity at the
> partition level?

Exactly!  :-)

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Jose Luis Tallon
On 19/12/19 4:03, Bruce Momjian wrote:

> On Mon, Nov 25, 2019 at 03:44:39PM -0800, Jeremy Schneider wrote:
>> On 11/25/19 15:05, Jeremy Schneider wrote:
>>> ... the cost of doing the individual index lookups across 180
>>> partitions (and 180 indexes) was very high, so they stored max and min
>>> txn id per partition and would generate a query with all the dates that
>>> a txn id could have been in so that only a small number of partition
>>> indexes would be accessed.
>>>
>>> .. If we are looking for higher concurrency, we can usually
>>> add a hack/workaround that filters on a partition key to provide “pretty
>>> good” pruning.  The net result is that you get 2-3x the IO due to the
>>> lack of global index (same workaround as first story above).
>> Is that basically like a global BRIN index with granularity at the
>> partition level?
> Exactly!  :-)

Actually, one "kind of" BRIN index *per partitioned table* mapping (key
range) -> (partition oid)... and so concurrency doesn't need to be very
affected.

(we don't need to do things just like other RDBMS do, ya know... ;)


IIRC, this precise approach was suggested around 2016 when initially
discussing the "declarative partitioning" which originated Postgres'
current partitioning scheme, in order to optimize partition pruning.


Just my .02€

     / J.L.




Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Bruce Momjian
On Thu, Dec 19, 2019 at 09:48:40AM +0100, Jose Luis Tallon wrote:

> On 19/12/19 4:03, Bruce Momjian wrote:
> > On Mon, Nov 25, 2019 at 03:44:39PM -0800, Jeremy Schneider wrote:
> > > On 11/25/19 15:05, Jeremy Schneider wrote:
> > > > ... the cost of doing the individual index lookups across 180
> > > > partitions (and 180 indexes) was very high, so they stored max and min
> > > > txn id per partition and would generate a query with all the dates that
> > > > a txn id could have been in so that only a small number of partition
> > > > indexes would be accessed.
> > > >
> > > > .. If we are looking for higher concurrency, we can usually
> > > > add a hack/workaround that filters on a partition key to provide “pretty
> > > > good” pruning.  The net result is that you get 2-3x the IO due to the
> > > > lack of global index (same workaround as first story above).
> > > Is that basically like a global BRIN index with granularity at the
> > > partition level?
> > Exactly!  :-)
>
> Actually, one "kind of" BRIN index *per partitioned table* mapping (key
> range) -> (partition oid)... and so concurrency doesn't need to be very
> affected.
>
> (we don't need to do things just like other RDBMS do, ya know... ;)
>
>
> IIRC, this precise approach was suggested around 2016 when initially
> discussing the "declarative partitioning" which originated Postgres' current
> partitioning scheme, in order to optimize partition pruning.

Robert Haas identified two needs for global indexes:

        https://www.postgresql.org/message-id/CA+Tgmob_J2M2+QKWrhg2NjQEkMEwZNTfd7a6Ubg34fJuZPkN2g@...
       
        One of the biggest reasons why people want it is to enforce uniqueness
        for secondary keys - e.g. the employees table is partitioned by
        employee ID, but SSN should also be unique, at least among employees
        for whom it's not NULL.
       
        But people also want it for faster data retrieval: if you're looking
        for a commonly-occurring value, an index per partition is fine. But if
        you're looking for values that occur only once or a few times across
        the whole hierarchy, an index scan per partition is very costly.

I don't see lossy BRIN indexes helping with the uniqueness use-case, and
I am not sure they would help with the rare case either.  They would
help for range-based partitions, but I thought our existing facilities
worked in that case.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Jeremy Schneider-2
On 12/19/19 08:12, Bruce Momjian wrote:
> I don't see lossy BRIN indexes helping with the uniqueness use-case, and
> I am not sure they would help with the rare case either.  They would
> help for range-based partitions, but I thought our existing facilities
> worked in that case.

Correlated data.  The existing facilities work if the filtering column
is exactly the same as the partition column.  But it's not at all
uncommon to have other columns with correlated data, perhaps the most
obvious of which is timeseries tables with many date columns of various
definitions (row first update, row latest update, invoice date, payment
date, process date, ship date, etc).

What if you could use *two* indexes in a single execution plan?  Use the
global BRIN to narrow down to 2 or 3 out of a hundred or more
partitions, then use local indexes to find specific rows in the
partitions of interest?  That might work, without being too overly
complicated.

-J


--
Jeremy Schneider
Database Engineer
Amazon Web Services


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Bruce Momjian
On Thu, Dec 19, 2019 at 11:28:55AM -0800, Jeremy Schneider wrote:

> On 12/19/19 08:12, Bruce Momjian wrote:
> > I don't see lossy BRIN indexes helping with the uniqueness use-case, and
> > I am not sure they would help with the rare case either.  They would
> > help for range-based partitions, but I thought our existing facilities
> > worked in that case.
>
> Correlated data.  The existing facilities work if the filtering column
> is exactly the same as the partition column.  But it's not at all
> uncommon to have other columns with correlated data, perhaps the most
> obvious of which is timeseries tables with many date columns of various
> definitions (row first update, row latest update, invoice date, payment
> date, process date, ship date, etc).
>
> What if you could use *two* indexes in a single execution plan?  Use the
> global BRIN to narrow down to 2 or 3 out of a hundred or more
> partitions, then use local indexes to find specific rows in the
> partitions of interest?  That might work, without being too overly
> complicated.

No, that is very interesting --- having secondary indexes for
partitioned tables that trim most partitions.  Would index lookups on
each partition index be very slow?  BRIN indexes?  I am assuming global
indexes would only avoid such lookups.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Jim Nasby
In reply to this post by Tom Lane-2

> On Oct 30, 2019, at 12:05 PM, Tom Lane <[hidden email]> wrote:
>
> But ... why bother with partitioning then?  To me, the main reasons
> why you might want a partitioned table are
>
> * ability to cheaply add and remove partitions, primarily so that
> you can cheaply do things like "delete the oldest month's data".
>
> * ability to scale past our limits on the physical size of one table
> --- both the hard BlockNumber-based limit, and the performance
> constraints of e.g. vacuuming a very large table.

A third case is data locality. In that case global indexes would be useful for queries that do not correlate will with hot data.

> Both of those go out the window with a global index.  So you might
> as well just have one table and forget all the overhead.

Partition pruning could still be valuable even with global indexes, provided that we teach vacuum how to clean up tuples in an index that point at a partition that has been deleted.

Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

曾文旌(义从)
I've been following this topic for a long time. It's been a year since the last response.
It was clear that our customers wanted this feature as well, and a large number of them mentioned it.

So, I wish the whole feature to mature as soon as possible.
I summarized the scheme mentioned in the email and completed the POC patch(base on PG_13).

Next, I encountered some difficulties when implementing the DDL of the partition table with global index, and I hope to get some help from the community

Here are some details what has been implemented
1 Definition of global index
Using the INCLUDE keyword to include the tableoid of the partitioned table.

2. Maintenance of global index by partition table DML.
Both INSERT and UPDATE of a partitioned table maintain global index

3. Global index scan
Planner: Processes predicate conditions on the primary partition, generating paths and plans for the global index.
Executer: index scan get indextup, get the tableoid from indextup, and verify the visibility of the data in the partition.

4. Vacuum partition table maintains global index
Each partitioned table VACUUM cleans its own garbage data in the global index.

After the above function point is completed, the global index can be used without partition table DDL.

Demo:
--Use pgbench to create the test partition table
pgbench -i -s 1000 --partitions=6 --partition-method=range

—- create global index on bid, bid is not partition key
CREATE INDEX  idx_pgbench_accounts_bid on pgbench_accounts(bid) global;

— check global index status
select * , sum(alivetup) over()as sum_alivetup, sum(deadtup) over() as sum_deadtup from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup 
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_2 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_3 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_4 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_5 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_6 | 16666665 |       0 |    100000000 |           0
(6 rows)

— run pgbench for for a while
pgbench -M prepared  -j 32 -c 32 -T 60 -P1


—- check global index, The index has bloated
postgres=# select * , sum(alivetup) over()as sum_alivetup, sum(deadtup) over() as sum_deadtup from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup 
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16717733 |       0 |    100306102 |           0
 pgbench_accounts_2 | 16717409 |       0 |    100306102 |           0
 pgbench_accounts_3 | 16717540 |       0 |    100306102 |           0
 pgbench_accounts_4 | 16717972 |       0 |    100306102 |           0
 pgbench_accounts_5 | 16717578 |       0 |    100306102 |           0
 pgbench_accounts_6 | 16717870 |       0 |    100306102 |           0
(6 rows)

—- vacuum partition table
vacuum pgbench_accounts;

—- Garbage is collected, global index still looks correct and valid.
postgres=# select * , sum(alivetup) over()as sum_alivetup, sum(deadtup) over() as sum_deadtup from bt_get_global_index_status('idx_pgbench_accounts_bid');
      relname       | alivetup | deadtup | sum_alivetup | sum_deadtup 
--------------------+----------+---------+--------------+-------------
 pgbench_accounts_1 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_2 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_3 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_4 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_5 | 16666667 |       0 |    100000000 |           0
 pgbench_accounts_6 | 16666665 |       0 |    100000000 |           0
(6 rows)

—-

—- global index scan works well
postgres=# select tableoid ,count(*) from pgbench_accounts where bid = 834 group by tableoid;
 tableoid | count 
----------+-------
    16455 | 33335
    16458 | 66665
(2 rows)

postgres=# explain select tableoid ,count(*) from pgbench_accounts where bid = 834 group by tableoid;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2945.23..2945.24 rows=1 width=12)
   Group Key: pgbench_accounts.tableoid
   ->  Global Index Scan using idx_pgbench_accounts_bid on pgbench_accounts  (cost=0.50..10.18 rows=587011 width=4)
         Index Cond: (bid = 834)
(4 rows)


The following is how to implement DDL of global index. How to maintain global index of DDL of partitioned table.
This seems to be more difficult than the previous work.

I understand there are four main parts

1 Build global index or reindex, especially in concurrent mode

2 Detach partition
Would it be a good idea to make a flag to global index  and let VACUUM handle the index data of the Detach partition?

3 Attach partition
It is easy to Attach a new empty partition, but adding a new one with data is not.
If there is a unique key conflict, do we slowly clean up the garbage or invalidate the entire index?

4 Truncate partition with global index
Do we need to process the heap and index data separately in multiple transactions?
This will lose the ability to roll back for Truncate operation.
Is it worth it?


Looking forward to your feedback.

Thanks!

Wenjing








global_index_poc_v1.patch (96K) Download Attachment
smime.p7s (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Proposal: Global Index

Bruce Momjian
On Thu, Jan  7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
> I've been following this topic for a long time. It's been a year since the last response.
> It was clear that our customers wanted this feature as well, and a large number of them mentioned it.
>
> So, I wish the whole feature to mature as soon as possible.
> I summarized the scheme mentioned in the email and completed the POC patch(base on PG_13).

I think you need to address the items mentioned in this blog, and the
email link it mentions:

        https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020

I am not clear this is a feature we will want.  Yes, people ask for it,
but if the experience will be bad for them and they will regret using
it, I am not sure we want it.  Of course, if you code it up and we get
a good user experience, we would want it --- I am just saying it is not
clear right now.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



12