Re: Additional Chapter for Tutorial

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

David G Johnston
Removing -docs as moderation won’t let me cross-post.

On Monday, October 26, 2020, David G. Johnston <[hidden email]> wrote:
On Monday, October 26, 2020, Jürgen Purtz <[hidden email]> wrote:
On 21.10.20 22:33, David G. Johnston wrote:

Two, I find the amount of detail being provided here to be on the too-much side.  A bit more judicious use of links into the appropriate detail chapters seems warranted.

The patch is intended to give every interested person an overall impression of the chapter within its new position. Because it has moved from part 'Tutorial' to 'Internals' the text should be very accurate concerning technical issues - like all the other chapters in this part. A tutorial chapter has a more superficial nature.

Haven’t reviewed the patches yet but...

I still think that my comment applies even with the move to internals.  The value here is putting together a coherent narrative and making deeper implementation details accessible.  If those details are already covered elsewhere in the documentation (not source code) links should be given serious consideration.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Jürgen Purtz
On 26.10.20 15:53, David G. Johnston wrote:
Removing -docs as moderation won’t let me cross-post.

On Monday, October 26, 2020, David G. Johnston <[hidden email]> wrote:
On Monday, October 26, 2020, Jürgen Purtz <[hidden email]> wrote:
On 21.10.20 22:33, David G. Johnston wrote:

Two, I find the amount of detail being provided here to be on the too-much side.  A bit more judicious use of links into the appropriate detail chapters seems warranted.

The patch is intended to give every interested person an overall impression of the chapter within its new position. Because it has moved from part 'Tutorial' to 'Internals' the text should be very accurate concerning technical issues - like all the other chapters in this part. A tutorial chapter has a more superficial nature.

Haven’t reviewed the patches yet but...

I still think that my comment applies even with the move to internals.  The value here is putting together a coherent narrative and making deeper implementation details accessible.  If those details are already covered elsewhere in the documentation (not source code) links should be given serious consideration.

David J.

Please find the new patch in the attachment after integrating David's suggestions: a) versus the last patch and b) versus master.

Notably it contains

  • nearly all of his suggestions (see sgml file for comments 'DGJ')
  • reduction of <firstterm>. This was a hangover from the pre-glossary-times. I tried to emphasis standard terms. This is no longer necessary because nowadays they are clearly defined in the glossary.

--

J. Purtz



0009-architecture-vs-0008.patch (39K) Download Attachment
0009-architecture-vs-master.patch (230K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Erik Rijkers
On 2020-10-30 11:57, Jürgen Purtz wrote:
> On 26.10.20 15:53, David G. Johnston wrote:
>> Removing -docs as moderation won’t let me cross-post.
>>

Hi,

I applied 0009-architecture-vs-master.patch to head
and went through architecture.sgml (only that file),
then produced the attached .diff


And I wrote down some separate items:

1.
'Two Phase Locking' and 'TPL' should be, I think,
'Two-Phase Commit'. Please someone confirm.
(no changes made)

2.
To compare xid to sequence because they similarly 'count up' seems a bad
idea.
(I don't think it's always true in the case of sequences)
(no changes made)

3.
'accesses' seems a somewhat strange word most of the time just 'access'
may be better.  Not sure - native speaker wanted. (no changes made)

4.
'heap', in postgres, means often (always?) files. But more generally,
the meaning is more associated with memory.  Therefore it would be good
I think to explicitly use 'heap file' at least in the beginning once to
make clear that heap implies 'safely written away to disk'.  Again, I'm
not quite sure if my understanding is correct - I have made no changes
in this regard.



Erik Rijkers

architecture.sgml.20201030.diff (27K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Justin Pryzby
On Fri, Oct 30, 2020 at 05:45:00PM +0100, Erik Rijkers wrote:

> On 2020-10-30 11:57, Jürgen Purtz wrote:
> > On 26.10.20 15:53, David G. Johnston wrote:
> > > Removing -docs as moderation won’t let me cross-post.
> > >
>
> Hi,
>
> I applied 0009-architecture-vs-master.patch to head
> and went through architecture.sgml (only that file),
> then produced the attached .diff
Now I applied 0009 as well as Erik's changes and made some more of my own :)

I'm including all patches so CFBOT is happy.

> 3.
> 'accesses' seems a somewhat strange word most of the time just 'access' may
> be better.  Not sure - native speaker wanted. (no changes made)

You're right, and I included that part.

--
Justin

0001-Additional-Chapter-for-Tutorial.patch (21K) Download Attachment
0002-Fix-accesses-per-suggestion-from-Erik.patch (1K) Download Attachment
0003-More-fixes-on-top.patch (27K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Jürgen Purtz
In reply to this post by Erik Rijkers
On 30.10.20 17:45, Erik Rijkers wrote:

> Hi,
>
> I applied 0009-architecture-vs-master.patch to head
> and went through architecture.sgml (only that file),
> then produced the attached .diff
>
>
> And I wrote down some separate items:
>
> 1.
> 'Two Phase Locking' and 'TPL' should be, I think,
> 'Two-Phase Commit'. Please someone confirm.
> (no changes made)
>
> 2.
> To compare xid to sequence because they similarly 'count up' seems a
> bad idea.
> (I don't think it's always true in the case of sequences)
> (no changes made)
>
> 3.
> 'accesses' seems a somewhat strange word most of the time just
> 'access' may be better.  Not sure - native speaker wanted. (no changes
> made)
>
> 4.
> 'heap', in postgres, means often (always?) files. But more generally,
> the meaning is more associated with memory.  Therefore it would be
> good I think to explicitly use 'heap file' at least in the beginning
> once to make clear that heap implies 'safely written away to disk'. 
> Again, I'm not quite sure if my understanding is correct - I have made
> no changes in this regard.
>
>
>
> Erik Rijkers
All suggestions so far are summarized in the attached patch with the
following exceptions:

- 'Two Phase Locking' is the intended term.

- Not adopted:

      Second, the transfer of dirty buffers from Shared Memory to
      files must take place. This is the primary task of the
-    Background Writer process. Because I/O activities can block
+    Checkpointer process. Because I/O activities can block
      other processes, it starts periodically and

Partly adopted:

-    the data in the old version of the row does not change! ...

-    before. Nothing is thrown away so far! Only <literal>xmax</literal> ...

--

J. Purtz





0010-architecture-vs-master.patch (230K) Download Attachment
0010-architecture-vs-0009.patch (44K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Erik Rijkers
On 2020-11-01 16:38, Jürgen Purtz wrote:

> On 30.10.20 17:45, Erik Rijkers wrote:
>>
>> And I wrote down some separate items:
>>
>> 1.
>> 'Two Phase Locking' and 'TPL' should be, I think,
>> 'Two-Phase Commit'. Please someone confirm.
>> (no changes made)
>>
>> Erik Rijkers
>
> All suggestions so far are summarized in the attached patch with the
> following exceptions:
>
> - 'Two Phase Locking' is the intended term.

OK, so what is 'Two Phase Locking'?  The term is not explained, and not
used anywhere else in the manual.  You propose to introduce it here, in
the tutorial.  I don't know what it means, and I am not really a
beginner.

'Two Phase Locking' should be explained somewhere, and how it relates
(or not) to Two-Phase Commit (2PC), don't you agree?


Erik Rijkers
























Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Jürgen Purtz
On 02.11.20 07:15, Erik Rijkers wrote:

> On 2020-11-01 16:38, Jürgen Purtz wrote:
>> On 30.10.20 17:45, Erik Rijkers wrote:
>>>
>>> And I wrote down some separate items:
>>>
>>> 1.
>>> 'Two Phase Locking' and 'TPL' should be, I think,
>>> 'Two-Phase Commit'. Please someone confirm.
>>> (no changes made)
>>>
>>> Erik Rijkers
>>
>> All suggestions so far are summarized in the attached patch with the
>> following exceptions:
>>
>> - 'Two Phase Locking' is the intended term.
>
> OK, so what is 'Two Phase Locking'?  The term is not explained, and
> not used anywhere else in the manual.  You propose to introduce it
> here, in the tutorial.  I don't know what it means, and I am not
> really a beginner.
>
> 'Two Phase Locking' should be explained somewhere, and how it relates
> (or not) to Two-Phase Commit (2PC), don't you agree?
>
>
> Erik Rijkers
>
>
It may be possible to explain OCC and 2PL in two or three sentences
within the glossary. But I think, we shall not try to explain such
general strategies. They are not specific to PG and even not
implemented. Instead, if the paragraph is too detailed, we can use a
more general formulation without explicitly naming locking strategies.

OLD:

     A first approach to implement protections against concurrent
     access to the same data may be the locking of critical
     rows. Two such techniques are:
     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
     and <emphasis>Two Phase Locking</emphasis> (2PL).
     <productname>PostgreSQL</productname> implements a third, more
     sophisticated technique: <firstterm>Multiversion Concurrency
     Control</firstterm> (MVCC). The crucial advantage of MVCC ...

Proposal:

     A first approach to implement protections against concurrent
     access to the same data may be the locking of critical
     rows.
     <productname>PostgreSQL</productname> implements a more
     sophisticated technique which avoids any locking:
<firstterm>Multiversion Concurrency
     Control</firstterm> (MVCC). The crucial advantage of MVCC ...

Any thoughts or other suggestions?

--

J. Purtz




Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Erik Rijkers
On 2020-11-02 09:26, Jürgen Purtz wrote:

> OLD:
>
>     A first approach to implement protections against concurrent
>     access to the same data may be the locking of critical
>     rows. Two such techniques are:
>     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
>     and <emphasis>Two Phase Locking</emphasis> (2PL).
>     <productname>PostgreSQL</productname> implements a third, more
>     sophisticated technique: <firstterm>Multiversion Concurrency
>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
>
> Proposal:
>
>     A first approach to implement protections against concurrent
>     access to the same data may be the locking of critical
>     rows.
>     <productname>PostgreSQL</productname> implements a more
>     sophisticated technique which avoids any locking:
> <firstterm>Multiversion Concurrency
>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
>
> Any thoughts or other suggestions?
>

Yes, just leave it out. Much better, as far as I'm concerned.

Erik




Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Jürgen Purtz
On 02.11.20 09:44, Erik Rijkers wrote:

> On 2020-11-02 09:26, Jürgen Purtz wrote:
>
>> OLD:
>>
>>     A first approach to implement protections against concurrent
>>     access to the same data may be the locking of critical
>>     rows. Two such techniques are:
>>     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
>>     and <emphasis>Two Phase Locking</emphasis> (2PL).
>>     <productname>PostgreSQL</productname> implements a third, more
>>     sophisticated technique: <firstterm>Multiversion Concurrency
>>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
>>
>> Proposal:
>>
>>     A first approach to implement protections against concurrent
>>     access to the same data may be the locking of critical
>>     rows.
>>     <productname>PostgreSQL</productname> implements a more
>>     sophisticated technique which avoids any locking:
>> <firstterm>Multiversion Concurrency
>>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
>>
>> Any thoughts or other suggestions?
>>
>
> Yes, just leave it out. Much better, as far as I'm concerned.
>
> Erik
>
>
Because there have been no more comments in the last days I created a
consolidated patch. It contains Erik's suggestion and some tweaks for
the text size within graphics.

--

J. Purtz


0011-architecture.patch (230K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Erik Rijkers
On 2020-11-07 13:24, Jürgen Purtz wrote:
>>
> Because there have been no more comments in the last days I created a
> consolidated patch. It contains Erik's suggestion and some tweaks for
> the text size within graphics.
>
> [0011-architecture.patch]

Hi,

I went through architecture.sgml once more; some proposed changes
attached.

And in some .svg files I noticed 'jungest' which should be 'youngest', I
suppose.
I did not change them but below is filelist of  grep -l 'jung'.

./doc/src/sgml/images/freeze-ink.svg
./doc/src/sgml/images/freeze-ink-svgo.svg
./doc/src/sgml/images/freeze-raw.svg
./doc/src/sgml/images/wraparound-ink.svg
./doc/src/sgml/images/wraparound-ink-svgo.svg
./doc/src/sgml/images/wraparound-raw.svg


Thanks,

Erik



architecture.sgml.20201107.diff (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Jürgen Purtz
On 07.11.20 20:15, Erik Rijkers wrote:

> On 2020-11-07 13:24, Jürgen Purtz wrote:
>>>
>> Because there have been no more comments in the last days I created a
>> consolidated patch. It contains Erik's suggestion and some tweaks for
>> the text size within graphics.
>>
>> [0011-architecture.patch]
>
> Hi,
>
> I went through architecture.sgml once more; some proposed changes
> attached.
>
> And in some .svg files I noticed 'jungest' which should be 'youngest',
> I suppose.
> I did not change them but below is filelist of  grep -l 'jung'.
>
> ./doc/src/sgml/images/freeze-ink.svg
> ./doc/src/sgml/images/freeze-ink-svgo.svg
> ./doc/src/sgml/images/freeze-raw.svg
> ./doc/src/sgml/images/wraparound-ink.svg
> ./doc/src/sgml/images/wraparound-ink-svgo.svg
> ./doc/src/sgml/images/wraparound-raw.svg
>
>
> Thanks,
>
> Erik
>
>
Good catches. Everything applied.

--

J. Purtz


0012-architecture.patch (230K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

David G Johnston
On Sun, Nov 8, 2020 at 8:56 AM Jürgen Purtz <[hidden email]> wrote:

Good catches. Everything applied.

Reviewed the first three sections.

template0 - I would remove the schema portions of this and simply note this as being a pristine recovery database in the diagram.

I would drop the word "more" and just say "system schemas".  I would drop pg_toast from the list of system schema and focus on the three user-facing ones.

Instead of "my_schema" (optional) I would do "my_schema" (example)

Server Graphic
#3 Global SQL Objects: Objects which are shared among all databases within a cluster.
#6 Client applications are prohibited from connecting to template0
#1 If by you we mean "the client" saying that you work "in the cluster data" doesn't really help.  I would emphasize the point that the client sees an endpoint the Postmaster publishes as a port or socket file and that plus the database name defines the endpoint the client connects to (meld with #5)

In lieu of some of the existing detail provided about structure I would add information about configuration and search_path at this level.

I like the object type enumeration - I would suggest grouping them by type in a manner consistent with the documentation and making each one a link to its "primary" section - the SQL Command reference if all else fails.

The "i" in internal in 51.3 (the image) needs capitalization).

You correctly add both Extension and Collation as database-level objects but they are not mentioned anywhere else.  They do belong here and need to be tied in properly in the text.

The whole thing needs a good pass focused on capitalization.  Both for typos and to decide when various primary concepts like Instance should be capitalized and when not.

51.4 - When you look at the diagram seeing /pg/data/base looks really cool, but when reading the prose where both the "pg" and the "base" are omitted and all you get are repeated references to "data", the directory name choice becomes an issue IMO.  I suggest (and changed the attached) to name the actual root directory "pgdata".  You should change the /pg/ directory name to something like ".../tutorial_project/".

Since you aren't following alphabetical order anyway I would place pg_tblspc after globals since tablespaces are globals and thus proximity links them here - and pointing out that pg_tblspc holds the data makes stating that global doesn't contain tablespace data unnecessary.

Maybe point out somewhere the the "base/databaseOID" directory represents the default tablespace for each database, which isn't "global", only the non-default tablespaces are considered globals (or just get rid of the mentioned on "non-default tablespace" for now).

David J.


0012-architecture-dgj-suggestions.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

David G Johnston
In reply to this post by Jürgen Purtz
On Sun, Nov 8, 2020 at 8:56 AM Jürgen Purtz <[hidden email]> wrote:
Good catches. Everything applied.

MVCC Section

The first paragraph and example in the MVCC section is a good example but seems misplaced - its relationship to MVCC generally is tenuous, rather I would expect a discussion of the serializable isolation mode to follow.

I'm not sure how much detail this section wants to get into given the coverage of concurrency elsewhere in the documentation.  "Not much" would be my baseline.

I would suggest spelling out what "OLTP" stands for and ideally pointing the user to the glossary for the term.

Tending more toward a style gripe but the amount of leader phrases and redundancy are at a level that I am noticing them when I read this but do not have the same impression having read large portions of documentation. In particular:

"When we speak about transaction IDs, you need to know that xids are like sequences."

"But keep in mind that xids are independent of any time measurement — in milliseconds or otherwise. If you dive deeper into PostgreSQL, you will recognize parameters with names such as 'xxx_age'. Despite their names, these '_age' parameters do not specify a period of time but represent a certain number of transactions, e.g., 100 million."

Could just be:  xids are sequences and age computations involving them measure a transaction count as opposed to a time interval.

Then I would consider adding a bit more detail/context here.

xids are 32bit sequences, with a reserved value to handle wrap-around.  There are 4 billion values in the sequence but wrap-around handling must occur every 2 billion transactions. Age computations involving xids measure a transaction count as opposed to a time interval.

I would move the mentioning of "vacuum" to the main paragraph about delete and not solely as a "keep in mind" note.

The part before the diagram seems like it should be much shorter, concise, and provide links to the excellent documentation.  The part after the image, and the image itself, are good material, though possibly should be in a main administration chapter instead of an internals chapter.

The first bullet of "keep in mind" is both wordy and wrong - in particular "as xids grow old row versions get out of scope over time" doesn't make sense (or rather it only does in the context of wrap-around, not normal visibility).  Having the only mention of bloat be here is also not ideal, it too should be weaved into the main narrative.  The "keep in mind" section here should be a recap of already covered material in a succinct form, nothing should be new to someone who just read the entire section.

I don't think that usage of exclamation marks (!) is warranted here, though emphasis on the key phrase wouldn't hurt.

Vacuum Section

avoid -> prevent (continued growth)

Autovacuum is enabled by default.  The whole note needs commas.

I'd try to get rid of "at arbitrary point in time"

"Instance." we've already described where instances are previously ("on the server")

The other sections - these seem misplaced for the tutorial, update the main documentation if this information is wholly missing or lacking.  The MVCC chapter can incorporate overview information as it is a strict consequence of that implementation.

Statistics belong elsewhere - the tutorial should not use poor command implementation choices as a guide for user education.

In short, this whole section should not exist and its content moved to more appropriate areas (mainly MVCC).  Vacuum is a tool that one must use but the narrative should be about the system generally.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial

Jürgen Purtz
In reply to this post by David G Johnston
On 10.11.20 00:14, David G. Johnston wrote:
Reviewed the first three sections.

template0 - I would remove the schema portions of this and simply note this as being a pristine recovery database in the diagram.
ok

I would drop the word "more" and just say "system schemas".  I would drop pg_toast from the list of system schema and focus on the three user-facing ones.
ok

Instead of "my_schema" (optional) I would do "my_schema" (example)
The terms 'optional' and 'default' are used at various places with their literal meaning. We shall not change them.

Server Graphic
#3 Global SQL Objects: Objects which are shared among all databases within a cluster.
#6 Client applications are prohibited from connecting to template0
ok
#1 If by you we mean "the client" saying that you work "in the cluster data" doesn't really help.  I would emphasize the point that the client sees an endpoint the Postmaster publishes as a port or socket file and that plus the database name defines the endpoint the client connects to (meld with #5)
ok, with some changes.

In lieu of some of the existing detail provided about structure I would add information about configuration and search_path at this level.
Search path appended. But IMO configuration questions are out of scope of this sub-chapter.

I like the object type enumeration - I would suggest grouping them by type in a manner consistent with the documentation and making each one a link to its "primary" section - the SQL Command reference if all else fails.
ok. But don't how to group them in a better way.

The "i" in internal in 51.3 (the image) needs capitalization).
ok

You correctly add both Extension and Collation as database-level objects but they are not mentioned anywhere else.  They do belong here and need to be tied in properly in the text.
Have some courage to the gap, it's an introductory chapter.

The whole thing needs a good pass focused on capitalization.  Both for typos and to decide when various primary concepts like Instance should be capitalized and when not.
'Instance' and 'Cluster' are now uppercase because of their importance, everything else lowercase for better reading.

51.4 - When you look at the diagram seeing /pg/data/base looks really cool, but when reading the prose where both the "pg" and the "base" are omitted and all you get are repeated references to "data", the directory name choice becomes an issue IMO.  I suggest (and changed the attached) to name the actual root directory "pgdata".  You should change the /pg/ directory name to something like ".../tutorial_project/".

The graphic shall reflect the default behavior of PG. Without the parameter -D, initdb creates the new cluster in the directory where PGDATA points to. This is in many cases /var/lib/pgsql/data. Therefore 'data' and its subdirectory 'base' are not my invention but reflects the default situation.

(Diving a little deeper into this issue I noticed that there is a parameter 'cluster_name' in the config file. But it does not change the name of the cluster's root directory, it only changes the names of the running processes. Choosing 'instance_name' instead of 'cluster_name' as the parameter's name would be a better choice imo - but that is not what we are speaking about in the context of the new chapter).

I changed the very first directory in the graphic to visualize the standard behavior; I reverted your recommendation to use 'pgdata' instead of 'data' in the text part.

Since you aren't following alphabetical order anyway I would place pg_tblspc after globals since tablespaces are globals and thus proximity links them here - and pointing out that pg_tblspc holds the data makes stating that global doesn't contain tablespace data unnecessary.
ok

Maybe point out somewhere the the "base/databaseOID" directory represents the default tablespace for each database, which isn't "global", only the non-default tablespaces are considered globals (or just get rid of the mentioned on "non-default tablespace" for now).

ok

more:

1) some changes concerning the nature of connections (52.2: logical perspective). IMO accessing multiple databases within one connection is not a question of configuring, you have to take more actions. But I'm not sure we should mention this at all.

2) you propose to cancel or trim down the paragraphs behind figure 51.2. (cluster, database, schema). I believe that a textual description of this hierarchy is essential for the understanding of the system. Because it isn't described explicitly at a different place, it should remain.

--- snipp -------- from other e-mail ----

MVCC Section

The first paragraph and example in the MVCC section is a good example but seems misplaced - its relationship to MVCC generally is tenuous, rather I would expect a discussion of the serializable isolation mode to follow.

I'm not sure how much detail this section wants to get into given the coverage of concurrency elsewhere in the documentation.  "Not much" would be my baseline.
The paragraph focus on the fact that new row versions are generated instead of locking something. Explaining serialization isolation modes is imo very complicate and out of the scope of this subchapter. If we want to give an overview - in addition to the exiting documentation - it should be a separate subchapter.

I would suggest spelling out what "OLTP" stands for and ideally pointing the user to the glossary for the term.
ok, but not added to glossary. The given explanation "... with a massive number of concurrent write actions" should be sufficient.

Tending more toward a style gripe but the amount of leader phrases and redundancy are at a level that I am noticing them when I read this but do not have the same impression having read large portions of documentation. In particular:
Because I'm not a native English speaker, orthographic and style hits are always welcome.

"When we speak about transaction IDs, you need to know that xids are like sequences."

"But keep in mind that xids are independent of any time measurement — in milliseconds or otherwise. If you dive deeper into PostgreSQL, you will recognize parameters with names such as 'xxx_age'. Despite their names, these '_age' parameters do not specify a period of time but represent a certain number of transactions, e.g., 100 million."

Could just be:  xids are sequences and age computations involving them measure a transaction count as opposed to a time interval.
ok

Then I would consider adding a bit more detail/context here.

xids are 32bit sequences, with a reserved value to handle wrap-around.  There are 4 billion values in the sequence but wrap-around handling must occur every 2 billion transactions. Age computations involving xids measure a transaction count as opposed to a time interval.

I would move the mentioning of "vacuum" to the main paragraph about delete and not solely as a "keep in mind" note.
The mentioning here at the food of the page is a crossover to the next subchapter.

The part before the diagram seems like it should be much shorter, concise, and provide links to the excellent documentation.  The part after the image, and the image itself, are good material, though possibly should be in a main administration chapter instead of an internals chapter.

vacuum: The problem - and one reason for the existence of this subchapter - is that vacuum's documentation is scattered across may pages:

19.4: parameters to configure the server, especially five parameters 'vacuum_cost_xxx'.

19.10: parameters to configure autovacuum.

19.11: parameters to configure client connections, especially five parameters 'vacuum_xxx' concerning their freeze-behavior.

24.1: explains the general necessity of (auto)vacuum and their strategies.

The page about the SQL command VACUUM explains the different options (FULL, FREEZE, ..) and their meaning.

Because of the structure of our documentation as well as the complexity of the issue that's ok. The existing documentation describes every parameter very well, but I'm missing a page where the 'big picture' of vacuum is explained (not necessarily here). It shall show the relationship between the huge number of parameters and an explanation *why* they exists. As far as we don't have such a page within the vacuum documentation the proposed subchapter fills the gap. (The provided graphics can be included multiple times without generating redundancies - here and at arbitrary other places.)


The first bullet of "keep in mind" is both wordy and wrong - in particular "as xids grow old row versions get out of scope over time" doesn't make sense (or rather it only does in the context of wrap-around, not normal visibility).  Having the only mention of bloat be here is also not ideal, it too should be weaved into the main narrative.  The "keep in mind" section here should be a recap of already covered material in a succinct form, nothing should be new to someone who just read the entire section.
ok.

I don't think that usage of exclamation marks (!) is warranted here, though emphasis on the key phrase wouldn't hurt.
ok

Vacuum Section

avoid -> prevent (continued growth)
ok

Autovacuum is enabled by default.  The whole note needs commas.
ok

I'd try to get rid of "at arbitrary point in time"
ok

"Instance." we've already described where instances are previously ("on the server")
ok

The other sections - these seem misplaced for the tutorial, update the main documentation if this information is wholly missing or lacking.  The MVCC chapter can incorporate overview information as it is a strict consequence of that implementation.

Statistics belong elsewhere - the tutorial should not use poor command implementation choices as a guide for user education.

In short, this whole section should not exist and its content moved to more appropriate areas (mainly MVCC).  Vacuum is a tool that one must use but the narrative should be about the system generally.


concerning vacuum section: see my comments above

concerning 'the other sections' (transactions, reliability, backup (plus: someone should add 'replication', I'm not familiar with this issue)): The intention of the chapter is to give a *summary* about PG's essential architecture and about central implementation aspects. This implies that the chapters does not present any new information. They shall only show (or repeat) essential things in their context and explain *why* they are used. In this sense the three chapters may be reasonable. Concerning this, I like to hear some comments from other people.


Attachments:

0013-architecture.patch: complete patch vs. master

0013-architecture.sgml.diff: changes in file architecture.sgml since 0012

0013-images.diff: changes in files *-raw.svg since 0012

--

J. Purtz



0013-architecture.patch (234K) Download Attachment
0013-architecture.sgml.diff (28K) Download Attachment
0013-images.diff (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Additional Chapter for Tutorial - arch-dev.sgml

Erik Rijkers
On 2020-11-15 19:45, Jürgen Purtz wrote:
>>

(smallish) Changes to arch-dev.sgml

Erik

arch-dev.sgml.20201120.diff (5K) Download Attachment