Additional Chapter for Tutorial


Additional Chapter for Tutorial

Jürgen Purtz
Our documentation explains commands, tools, and parameters in detail
and with high accuracy. Nevertheless, my impression is that we neglect
the 'big picture': why certain processes exist and how they relate to
each other, summaries of strategies, visualizations of key situations,
and so on. People with mature knowledge don't miss this information
because they already know it. But for beginners such explanations would
be a great help. Before GSoD 2019 we had similar discussions.

Over time, I plan to extend the 'Tutorial' part with an additional
chapter that gives an overview of key design decisions and basic
features. The typical audience should be people with limited prior
knowledge of database systems and some interest in PostgreSQL. Attached
is a patch for the first sub-chapter. Subsequent sub-chapters should
be: MVCC, transactions, VACUUM, backup, replication, ... - mostly with
the focus on the PostgreSQL implementation and not on generic topics
like b-trees.

There is a predecessor of this patch:
https://www.postgresql.org/message-id/974e09b8-edf5-f38f-2fb5-a5875782ffc9%40purtz.de
In the meantime, its glossary part has been separated and committed. The
new patch contains two elements: textual descriptions and 4 figures. My
opinion concerning figures is set out in detail in the previous patch.

Kind regards, Jürgen Purtz



Attachment: 0001-architecture.patch (129K)

Re: Additional Chapter for Tutorial

Corey Huinker
> Over time, I plan to extend the 'Tutorial' part with an additional
> chapter that gives an overview of key design decisions and basic
> features. The typical audience should be people with limited prior
> knowledge of database systems and some interest in PostgreSQL. Attached
> is a patch for the first sub-chapter. Subsequent sub-chapters should
> be: MVCC, transactions, VACUUM, backup, replication, ... - mostly with
> the focus on the PostgreSQL implementation and not on generic topics
> like b-trees.

+1
 

Re: Additional Chapter for Tutorial

Erik Rijkers
In reply to this post by Jürgen Purtz
On 2020-04-17 19:56, Jürgen Purtz wrote:
> Our documentation explains commands, tools, and parameters in detail
> and with high accuracy. Nevertheless, my impression is that we neglect
> the 'big picture': why certain processes

> [0001-architecture.patch]

Very good stuff, and useful. I think.

I mean that but nevertheless here is a lot of comment :)

(I didn't fully compile as docs, just read the 'text' from the patch
file)


Collabortion
Collaboration

drop 'resulting'


He acts in close cooperation with the
It acts in close cooperation with the

He loads the configuration files, allocates the
It loads the configuration files, allocates the

process</firstterm>. He checks the authorization, starts a
process</firstterm>. It checks the authorization, starts a

and instructs the client application to connect to him. All further
and instructs the client application to connect to it. All further

by him.
by it.

In an first attempt
In a first attempt

much huger than memory, it's likely that
much larger than memory, it's likely that

RAM is performed in units of complete pages while retaining
RAM is performed in units of complete pages, retaining

Sooner or later it is necessary to overwrite old RAM
Sooner or later it becomes necessary to overwrite old RAM

transfered
transferred
   (multiple times)

who runs
which runs

He writes
It writes

This is the primarily duty of the
This is primarily the duty of the
   or possibly:
This is the primary duty of the

he starts periodically
it starts periodically

speeds up a possibly occurring recovery.
can speed up recovery.

writen
written

collects counter about accesses
collects counters about accesses

and others. He stores the obtained information in system
and more. It stores the obtained information in system

sudirectories consists
subdirectories consist  <-- plural, no -s

there are information
there is information

and contains the ID of the
and contains the ID (pid) of the

( IMHO, it is conventional (and therefore easier to read) to have 'e.g.'
followed by a comma, and not by a semi-colon, although obviously that's
not really wrong either. )


Thanks,

Erik Rijkers





Re: Additional Chapter for Tutorial

Jürgen Purtz
On 17.04.20 20:40, Erik Rijkers wrote:
> Very good stuff, and useful. I think.
>
> I mean that but nevertheless here is a lot of comment :)
>
> (I didn't fully compile as docs, just read the 'text' from the patch
> file)

Thanks. Added nearly all of the suggestions.


--

Jürgen Purtz


Attachment: 0002-architecture.patch (129K)

Re: Additional Chapter for Tutorial

Jürgen Purtz
On 20.04.20 10:30, Jürgen Purtz wrote:

> On 17.04.20 20:40, Erik Rijkers wrote:
>> Very good stuff, and useful. I think.
>>
>> I mean that but nevertheless here is a lot of comment :)
>>
>> (I didn't fully compile as docs, just read the 'text' from the patch
>> file)
>
> Thanks. Added nearly all of the suggestions.
>
>
What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics.
Made some modifications in previous sub-chapters and in existing titles.
Added some glossary entries.

--

Jürgen Purtz



Attachment: 0003-architecture.patch (226K)

Re: Additional Chapter for Tutorial - (review first half of 0003)

Erik Rijkers
On 2020-04-29 16:13, Jürgen Purtz wrote:

> On 20.04.20 10:30, Jürgen Purtz wrote:
>> On 17.04.20 20:40, Erik Rijkers wrote:
>>> Very good stuff, and useful. I think.
>>>
>>> I mean that but nevertheless here is a lot of comment :)
>>>
>>> (I didn't fully compile as docs, just read the 'text' from the patch
>>> file)
>>
>> Thanks. Added nearly all of the suggestions.
>>
>>
> What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics.
> Made some modifications in previous sub-chapters and in existing
> titles. Added some glossary entries.

> [0003-architecture.patch]

Hi Jürgen,


Here are again some suggested changes, up to line 600 of the patch
(that is around the start of the new MVCC paragraph).

I may have repeated some things you have already rejected (it was too
much work to go back and check).  I am not a native speaker of English.

One general remark: in my humble opinion, you write too many capitalized
words.  It's not really a problem, but overall it becomes a bit too much.
I have not marked these; perhaps in some future iteration.

I'll probably read through the latter part of the patch later (probably
tomorrow).

Thanks,

Erik Rijkers



they merely send requests to the server side and receives
they merely send requests to the server side and receive

is a group of tightly coupled other server side processes plus a
is a group of tightly coupled other server-side processes plus a

Client requests (SELECT, UPDATE, ...) usually leads to the
Client requests (SELECT, UPDATE, ...) usually lead to the

Because files are much larger than memory, it's likely that
Because files are often larger than memory, it's likely that

RAM is performed in units of complete pages, retaining their size and
layout.
RAM is performed in units of complete pages.

Reading file pages is notedly slower than reading
Reading file pages is slower than reading

of the <firstterm>Backend processes</firstterm> has done the job those
pages are available for all other
of the <firstterm>Backend processes</firstterm> has read pages into
memory those pages are available for all other

they must be transferred back to disk. This is a two-step process.
they must be written back to disk. This is a two-step process.

Because of the sequential nature of this writing, it is much
Because this writing is sequential, it is much

in an independent process. Nevertheless all
in an independent process. Nevertheless, all

huge I/O activities can block other processes significantly,
I/O activities can block other processes,

it starts periodically and acts only for a short period.
it starts periodically and is active only for a short period.

duty. As its name suggests, he has to create
duty. As its name suggests, it has to create

In consequence, after a <firstterm>Checkpoint</firstterm>
After a <firstterm>Checkpoint</firstterm>,

In correlation with data changes,
As a result of data changes,

text lines about serious and non-serious events which can happen
text lines about serious and less serious events which can happen

database contains many <glossterm
linkend="glossary-schema">schema</glossterm>,
database contains many <glossterm
linkend="glossary-schema">schemas</glossterm>,

belongs to a certain <firstterm>schema</firstterm>, they cannot
belongs to a single <firstterm>schema</firstterm>, they cannot

A <firstterm>Cluster</firstterm> is the outer frame for a
A <firstterm>Cluster</firstterm> is the outer container for a

<literal>postgres</literal> as a copy of
<literal>postgres</literal> is generated as a copy of

role of <literal>template0</literal> as the origin
role of <literal>template0</literal> as the pristine origin

are different objects and absolutely independent from each
are different objects and independent from each

complete <firstterm>cluster</firstterm>, independent from
<firstterm>cluster</firstterm>, independent from

anywhere in the file system. In many cases, the environment
somewhere in the file system. In many cases, the environment

some files, all of which are necessary to store long lasting
some files, all of which are necessary to store long-lasting

<firstterm>tablespaces</firstterm> itself.
<firstterm>tablespaces</firstterm> themselves.

<firstterm>Postgres</firstterm> (respectively
<firstterm>Postmaster</firstterm>) process.
<firstterm>Postgres</firstterm> process (also known as
<firstterm>Postmaster</firstterm>).

<title>MVCC</title>
<title>MVCC - Multiversion Concurrency Control</title>

The dabase must take a sensible decision to prevent the application
The database must take a sensible decision to prevent the application

# this sentence I just don't understand - can you please elucidate?
The database must take a sensible decision to prevent the application
from promising delivery of the single article to both clients.

Re: Additional Chapter for Tutorial

Peter Eisentraut-6
In reply to this post by Jürgen Purtz
On 2020-04-29 16:13, Jürgen Purtz wrote:

> On 20.04.20 10:30, Jürgen Purtz wrote:
>> On 17.04.20 20:40, Erik Rijkers wrote:
>>> Very good stuff, and useful. I think.
>>>
>>> I mean that but nevertheless here is a lot of comment :)
>>>
>>> (I didn't fully compile as docs, just read the 'text' from the patch
>>> file)
>>
>> Thanks. Added nearly all of the suggestions.
>>
>>
> What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics.
> Made some modifications in previous sub-chapters and in existing titles.
> Added some glossary entries.

I don't see this really as belonging into the tutorial.  The tutorial
should be hands-on, how do you get started, how do you get some results.

Your material is more of an overview of the whole system.  What's a new
user supposed to do with that?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Additional Chapter for Tutorial

Jürgen Purtz
On 29.04.20 21:12, Peter Eisentraut wrote:
>
> I don't see this really as belonging into the tutorial.  The tutorial
> should be hands-on, how do you get started, how do you get some results.
>
Yes, the tutorial should be a short overview and give instructions on
how to start. IMO the first 4 sub-chapters fulfill this expectation.
Admittedly, the fifth (VACUUM) is extensive and offers many details.

While inspecting the existing documentation I noticed that there are
many details about VACUUM, AUTOVACUUM, all of their parameters, and
their behavior. But the information is spread across many pages:
Automatic Vacuuming, Client Connection Defaults, Routine Vacuuming,
Resource Consumption, VACUUM. Even for a person with some prior
knowledge it is hard to get an overview of how this fits together and
why things are solved in exactly this way. In the end we have very good
descriptions of all details but I miss the 'big picture'. Therefore I
summarized central aspects and tried to answer the question 'why is it
done in this way?'. I do not dispute that the current version of the
page is not adequate for beginners. But somewhere we should have such a
summary about vacuuming and freezing.

How to proceed?

- Remove the page and add a short paragraph to the MVCC page instead.

- Cut down the page to a tiny portion.

- Divide it into two parts: a) a short introduction and b) the rest
after a statement like 'The following offers more details and parameters
that are more interesting for an experienced user than for a beginner.
You can easily skip it.'


> Your material is more of an overview of the whole system.  What's a
> new user supposed to do with that?

When I dive into a new subject, I'm more interested in its architecture
than in its details. We should offer beginners an overview of the major
PG components and strategies.


--

Jürgen Purtz





Re: Additional Chapter for Tutorial

Jürgen Purtz
On 30.04.20 14:31, Jürgen Purtz wrote:

> On 29.04.20 21:12, Peter Eisentraut wrote:
>>
>> I don't see this really as belonging into the tutorial.  The tutorial
>> should be hands-on, how do you get started, how do you get some results.
>>
> Yes, the tutorial should be a short overview and give instructions on
> how to start. IMO the first 4 sub-chapters fulfill this expectation.
> Admittedly, the fifth (VACUUM) is extensive and offers many details.
>
> While inspecting the existing documentation I noticed that there are
> many details about VACUUM, AUTOVACUUM, all of their parameters, and
> their behavior. But the information is spread across many pages:
> Automatic Vacuuming, Client Connection Defaults, Routine Vacuuming,
> Resource Consumption, VACUUM. Even for a person with some prior
> knowledge it is hard to get an overview of how this fits together and
> why things are solved in exactly this way. In the end we have very good
> descriptions of all details but I miss the 'big picture'. Therefore I
> summarized central aspects and tried to answer the question 'why is it
> done in this way?'. I do not dispute that the current version of the
> page is not adequate for beginners. But somewhere we should have such a
> summary about vacuuming and freezing.
>
> How to proceed?
>
> - Remove the page and add a short paragraph to the MVCC page instead.
>
> - Cut down the page to a tiny portion.
>
> - Divide it into two parts: a) a short introduction and b) the rest
> after a statement like 'The following offers more details and
> parameters that are more interesting for an experienced user than for
> a beginner. You can easily skip it.'
>
>
>> Your material is more of an overview of the whole system.  What's a
>> new user supposed to do with that?
>
> When I dive into a new subject, I'm more interested in its
> architecture than in its details. We should offer beginners an
> overview of the major PG components and strategies.
>
>
Compared with the previous patch, this one contains:

- Position and title changed to reflect its intention and importance.

- A <note> delimits VACUUM basics from details. This is done because I
cannot find another suitable place for such a summarizing description.

- Three additional sub-chapters.

--

Jürgen Purtz


Attachment: 0004-architecture.patch (232K)