explanation for random_page_cost is outdated

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

explanation for random_page_cost is outdated

apt.postgresql.org Repository Update
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/runtime-config-query.html
Description:

Explanation for random_page_cost is rather outdated, because it did only for
case of mechanical hdd. But all modern database servers, which I know, made
upon SSD. Do or not do default value for random_page_cost equal to 1 is the
question, but, IMHO, at list in the documentation  about random_page_cost
need to add in a speculation about SSD.

It's important because a business programming now is mostly web programming.
Most database is poorly designed by web programmer, tables looked like a
primary key and a huge json (containing all) with large gin index upon it.
Now I am seeing a table with a GIN index 50% of the table size. The database
is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
by planner, but with random_page_cost=1 the result may be not excellent, but
acceptable for web programmers.
Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Bruce Momjian
On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
> Description:
>
> Explanation for random_page_cost is rather outdated, because it did only for
> case of mechanical hdd. But all modern database servers, which I know, made
> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
> question, but, IMHO, at list in the documentation  about random_page_cost
> need to add in a speculation about SSD.
>
> It's important because a business programming now is mostly web programming.
> Most database is poorly designed by web programmer, tables looked like a
> primary key and a huge json (containing all) with large gin index upon it.
> Now I am seeing a table with a GIN index 50% of the table size. The database
> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
> by planner, but with random_page_cost=1 the result may be not excellent, but
> acceptable for web programmers.

Does this sentence in the random_page_cost docs unclear or not have enough
visibility:

        https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
       
        Storage that has a low random read cost relative to sequential, e.g.
        solid-state drives, might also be better modeled with a lower value for
        random_page_cost.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Олег Самойлов
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?

Much better will be write: if you use SSD set 1.

Олег

> 19 марта 2020 г., в 23:56, Bruce Momjian <[hidden email]> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation  about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
>    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>    
>    Storage that has a low random read cost relative to sequential, e.g.
>    solid-state drives, might also be better modeled with a lower value for
>    random_page_cost.
>
> --
>  Bruce Momjian  <[hidden email]>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +



Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

yigong hu
Sorry to hijack the thread, I also recently have similar observation that the statement about random_page_cost on SSD is ambiguous. The current document says that

> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.

However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:

 >  Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.

Detail:

I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.

I attached my testing script, the postgresql configuration file, and planner output.

On Mon, Apr 27, 2020 at 3:19 AM Олег Самойлов <[hidden email]> wrote:
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?

Much better will be write: if you use SSD set 1.

Олег

> 19 марта 2020 г., в 23:56, Bruce Momjian <[hidden email]> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation  about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
>    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>   
>    Storage that has a low random read cost relative to sequential, e.g.
>    solid-state drives, might also be better modeled with a lower value for
>    random_page_cost.
>
> --
>  Bruce Momjian  <[hidden email]>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +






postgresql.conf (31K) Download Attachment
query_analytical.sh (3K) Download Attachment
planner_output (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Pavel Stehule


ne 26. 4. 2020 v 21:25 odesílatel yigong hu <[hidden email]> napsal:
Sorry to hijack the thread, I also recently have similar observation that the statement about random_page_cost on SSD is ambiguous. The current document says that

> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.

However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:

 >  Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.

I depends on estimation. Lot of people use random_page_cost as fix of broken estimation. Then configures this value to some strange values. Lot of other queries with good estimation can be worse then.



Detail:

I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.

I attached my testing script, the postgresql configuration file, and planner output.

On Mon, Apr 27, 2020 at 3:19 AM Олег Самойлов <[hidden email]> wrote:
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?

Much better will be write: if you use SSD set 1.

Олег

> 19 марта 2020 г., в 23:56, Bruce Momjian <[hidden email]> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation  about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause.  With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
>    https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>   
>    Storage that has a low random read cost relative to sequential, e.g.
>    solid-state drives, might also be better modeled with a lower value for
>    random_page_cost.
>
> --
>  Bruce Momjian  <[hidden email]>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +





Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Bruce Momjian
On Mon, Apr 27, 2020 at 06:02:41AM +0200, Pavel Stehule wrote:

>
>
> ne 26. 4. 2020 v 21:25 odesílatel yigong hu <[hidden email]> napsal:
>
>     Sorry to hijack the thread, I also recently have similar observation that
>     the statement about random_page_cost on SSD is ambiguous. The current
>     document says that
>
>     > Storage that has a low random read cost relative to sequential, e.g.
>     solid-state drives, might also be better modeled with a lower value for
>     random_page_cost.
>
>     However, this statement does not clarify what values might be good. For
>     some workload, the default value 4.0 would cause bad performance and
>     lowering random_page_cost to a value 3.0 or 2.0 does not solve the
>     performance problem. Only when the random_page_cost is lowered to below 1.2
>     will the bad performance be mitigated. Thus, I would suggest elaborating on
>     this description further as:
>
>      >  Storage that has a low random read cost relative to sequential, e.g.
>     solid-state drives, might also be better modeled with a value that is close
>     to 1 for random_page_cost.
>
>
> I depends on estimation. Lot of people use random_page_cost as fix of broken
> estimation. Then configures this value to some strange values. Lot of other
> queries with good estimation can be worse then.
I have been recommending 1.1 as a value for random_page_cost for SSDs
for years, and I think it would be helpful to suggest that value, so doc
patch attached.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

random.diff (1003 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Олег Самойлов
Yes, I saw recommendations for 1.1 early, but why? Why such exactly precision number, why 1.1? Is here ever a theoretical or experimental prof?

As for me, random_page_cost depended not only not characteristic of a storage device (hdd or ssd), but also on assumptions about how much of the database is in memory cache (90% by default). And this is a very rough assumption (of cause in ideal whole database must fit in the memory cache).

And so I don't see any reason to recommend exactly value 1.1, simple 1 is good too, especially for an ideal server with huge memory cache.


> 27 апр. 2020 г., в 19:16, Bruce Momjian <[hidden email]> написал(а):
>
> I have been recommending 1.1 as a value for random_page_cost for SSDs
> for years, and I think it would be helpful to suggest that value, so doc
> patch attached.
>
> --
>  Bruce Momjian  <[hidden email]>        https://momjian.us
>  EnterpriseDB                             https://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +                      Ancient Roman grave inscription +
> <random.diff>



Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Bruce Momjian
On Mon, Apr 27, 2020 at 07:24:45PM +0300, Олег Самойлов
wrote:
> Yes, I saw recommendations for 1.1 early, but why? Why such exactly
> precision number, why 1.1? Is here ever a theoretical or experimental
> prof?

Well, SSD random performance is slightly slower than sequential, so the
value should be slighly larger than 1.1.  Clearly 2.0 or higher is not
recommended.  Could it be 1.05 or 1.01 or 1.15?  Yeah, but probably
giving 1.1 is at least a good suggestion to start with.

> As for me, random_page_cost depended not only not characteristic of a
> storage device (hdd or ssd), but also on assumptions about how much of
> the database is in memory cache (90% by default). And this is a very
> rough assumption (of cause in ideal whole database must fit in the
> memory cache).

True.

> And so I don't see any reason to recommend exactly value 1.1, simple 1
> is good too, especially for an ideal server with huge memory cache.

Uh, well, 1.0 is fine for non-SSDs too if all the data is in cache, and
there are no database writes.

---------------------------------------------------------------------------

>
>
> > 27 апр. 2020 г., в 19:16, Bruce Momjian <[hidden email]>
> > написал(а):
> >
> > I have been recommending 1.1 as a value for random_page_cost for
> > SSDs for years, and I think it would be helpful to suggest that
> > value, so doc patch attached.
> >
> > -- Bruce Momjian <[hidden email]> https://momjian.us EnterpriseDB
> > https://enterprisedb.com
> >
> > + As you are, so once was I.  As I am, so you will be. + + Ancient
> > Roman grave inscription + <random.diff>
>

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Reply | Threaded
Open this post in threaded view
|

Re: explanation for random_page_cost is outdated

Bruce Momjian
In reply to this post by Bruce Momjian
On Mon, Apr 27, 2020 at 12:16:30PM -0400, Bruce Momjian wrote:

> On Mon, Apr 27, 2020 at 06:02:41AM +0200, Pavel Stehule wrote:
> >      >  Storage that has a low random read cost relative to sequential, e.g.
> >     solid-state drives, might also be better modeled with a value that is close
> >     to 1 for random_page_cost.
> >
> >
> > I depends on estimation. Lot of people use random_page_cost as fix of broken
> > estimation. Then configures this value to some strange values. Lot of other
> > queries with good estimation can be worse then.
>
> I have been recommending 1.1 as a value for random_page_cost for SSDs
> for years, and I think it would be helpful to suggest that value, so doc
> patch attached.

Patch applied back through 9.5.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +