Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Wrong statistics for size of XLOG_SWITCH during pg_waldump.

movead.li@highgo.ca
Hello hackers,

We know that pg_waldump can statistics size for every kind of records. When I use
the feature I find it misses some size for XLOG_SWITCH records. When a user does
pg_wal_switch(), then postgres will discard the remaining size in the current wal
segment, and the pg_waldump tool misses the discard size.

I think it will be better if pg_waldump  can show the matter, so I make a patch
which regards the discard size as a part of XLOG_SWITCH record, it works if we
want to display the detail of wal records or the statistics, and patch attached.

What's your opinion?


Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

fix_waldump_size_for_wal_switch.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Kyotaro Horiguchi-4
At Fri, 9 Oct 2020 13:41:25 +0800, "[hidden email]" <[hidden email]> wrote in

> Hello hackers,
>
> We know that pg_waldump can statistics size for every kind of records. When I use
> the feature I find it misses some size for XLOG_SWITCH records. When a user does
> a pg_wal_switch(), then postgres will discard the remaining size in the current wal
> segment, and the pg_waldump tool misses the discard size.
>
> I think it will be better if pg_waldump  can show the matter, so I make a patch
> which regards the discard size as a part of XLOG_SWITCH record, it works if we
> want to display the detail of wal records or the statistics, and patch attached.
>
> What's your opinion?

I think that the length of the XLOG_SWITCH record is no other than 24
bytes. Just adding the padding? garbage bytes to that length doesn't
seem the right thing to me.

If we want pg_waldump to show that length somewhere, it could be shown
at the end of that record explicitly:

rmgr: XLOG        len (rec/tot):     24/16776848, tx:          0, lsn: 0/02000148, prev 0/02000110, desc: SWITCH, trailing-bytes: 16776944

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

movead.li@highgo.ca

>I think that the length of the XLOG_SWITCH record is no other than 24
>bytes. Just adding the padding? garbage bytes to that length doesn't
>seem the right thing to me.
>
>If we want pg_waldump to show that length somewhere, it could be shown
>at the end of that record explicitly:
> 
>rmgr: XLOG        len (rec/tot):     24/16776848, tx:          0, lsn: 0/02000148, prev 0/02000110, desc: SWITCH, trailing-bytes: 16776944

Thanks, I think it's good idea, and new patch attached.

Here's the lookes:
rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 0/030000D8, prev 0/03000060, desc: SWITCH, trailing-bytes: 16776936



Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

fix_waldump_size_for_wal_switch_v2.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Michael Paquier-2
On Sat, Oct 10, 2020 at 09:50:02AM +0800, [hidden email] wrote:
>> I think that the length of the XLOG_SWITCH record is no other than 24
>> bytes. Just adding the padding? garbage bytes to that length doesn't
>> seem the right thing to me.
>
> Here's the lookes:
> rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 0/030000D8, prev 0/03000060, desc: SWITCH, trailing-bytes: 16776936


 static void
-XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
+XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len, uint32 *junk_len)
 {
If you wish to add more information about a XLOG_SWITCH record, I
don't think that changing the signature of XLogDumpRecordLen() is
adapted because the record length of this record is defined as
Horiguchi-san mentioned upthread, and the meaning of junk_len is
confusing here.  It seems to me that any extra information should be
added to xlog_desc() where there should be an extra code path for
(info == XLOG_SWITCH).  XLogReaderState should have all the
information you are lookng for.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

movead.li@highgo.ca

Thanks for reply.

>If you wish to add more information about a XLOG_SWITCH record, I
>don't think that changing the signature of XLogDumpRecordLen() is
>adapted because the record length of this record is defined as
>Horiguchi-san mentioned upthread, and the meaning of junk_len is
>confusing here.  It seems to me that any extra information should be
>added to xlog_desc() where there should be an extra code path for
>(info == XLOG_SWITCH).  XLogReaderState should have all the
>information you are lookng for.
We have two places to use the 'junk_len', one is when we show the 
detail record information, another is when we statistics the percent
of all kind of wal record kinds(by --stat=record). The second place
will not run the xlog_desc(), so it's not a good chance to do the thing.

I am still can not understand why it can't adapted to change the
signature of XLogDumpRecordLen(), maybe we can add a new function
to caculate the 'junk_len' and rename the 'junk_len' as 'skiped_size' or
'switched_size'?



Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Kyotaro Horiguchi-4
At Mon, 12 Oct 2020 09:46:37 +0800, "[hidden email]" <[hidden email]> wrote in

>
> Thanks for reply.
>
> >If you wish to add more information about a XLOG_SWITCH record, I
> >don't think that changing the signature of XLogDumpRecordLen() is
> >adapted because the record length of this record is defined as
> >Horiguchi-san mentioned upthread, and the meaning of junk_len is
> >confusing here.  It seems to me that any extra information should be
> >added to xlog_desc() where there should be an extra code path for
> >(info == XLOG_SWITCH).  XLogReaderState should have all the
> >information you are lookng for.
> We have two places to use the 'junk_len', one is when we show the
> detail record information, another is when we statistics the percent
> of all kind of wal record kinds(by --stat=record). The second place
> will not run the xlog_desc(), so it's not a good chance to do the thing.
>
> I am still can not understand why it can't adapted to change the
> signature of XLogDumpRecordLen(), maybe we can add a new function
> to caculate the 'junk_len' and rename the 'junk_len' as 'skiped_size' or
> 'switched_size'?

The reason is the function XLogDumpRecordLen is a common function
among all kind of LOG records, not belongs only to XLOG_SWICH. And the
junk_len is not useful for other than XLOG_SWITCH.  Descriptions
specifc to XLOG_SWITCH is provided by xlog_desc().

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Michael Paquier-2
On Wed, Oct 14, 2020 at 10:29:44AM +0900, Kyotaro Horiguchi wrote:
> The reason is the function XLogDumpRecordLen is a common function
> among all kind of LOG records, not belongs only to XLOG_SWICH. And the
> junk_len is not useful for other than XLOG_SWITCH.  Descriptions
> specifc to XLOG_SWITCH is provided by xlog_desc().

Yeah.  In its current shape, it means that only pg_waldump would be
able to know this information.  If you make this information part of
xlogdesc.c, any consumer of the WAL record descriptions would be able
to show this information, so it would provide a consistent output for
any kind of tools.

On top of that, it seems to me that the calculation used in the patch
is wrong in two aspects at quick glance:
1) startSegNo and endSegNo point always to two different segments with
a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
segment border instead before extracting SizeOfXLogLongPHD, no?
2) This stuff should also check after the case of a WAL *page* border
where you'd need to adjust based on SizeOfXLogShortPHD instead.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Andres Freund
Hi,

On 2020-10-14 15:52:43 +0900, Michael Paquier wrote:
> Yeah.  In its current shape, it means that only pg_waldump would be
> able to know this information.  If you make this information part of
> xlogdesc.c, any consumer of the WAL record descriptions would be able
> to show this information, so it would provide a consistent output for
> any kind of tools.

I'm not convinced by this argument. The only case where accounting for
the "wasted" length seems really interesting is for --stats=record - and
for that including it in the record description is useless. When looking
at plain records the length is sufficiently deducable by looking at the
next record's LSN.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Kyotaro Horiguchi-4
At Wed, 14 Oct 2020 13:46:13 -0700, Andres Freund <[hidden email]> wrote in

> Hi,
>
> On 2020-10-14 15:52:43 +0900, Michael Paquier wrote:
> > Yeah.  In its current shape, it means that only pg_waldump would be
> > able to know this information.  If you make this information part of
> > xlogdesc.c, any consumer of the WAL record descriptions would be able
> > to show this information, so it would provide a consistent output for
> > any kind of tools.
>
> I'm not convinced by this argument. The only case where accounting for
> the "wasted" length seems really interesting is for --stats=record - and
> for that including it in the record description is useless. When looking
> at plain records the length is sufficiently deducable by looking at the
> next record's LSN.
I'm not sure the exact motive of this proposal, but if we show the
wasted length in the stats result, I think it should be other than
existing record types.

  XLOG/CHECKPOINT_SHUTDOWN    1 (  0.50) ..
  ...
  Btree/INSERT_LEAF          63 ( 31.19) ..
+ EMPTY                       1 ( xx.xx) ..
----------------------------------------
  Total                      ...


By the way, I noticed that --stats=record shows two lines for
Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
all four bits of xl_info is used to identify record id.

The fourth bit of xl_info of XLOG records is used to signal the
existence of record has 'xinfo' field or not. So an XLOG record with
recid == 8 actually exists but it is really a record that recid == 0
and has xinfo field.

I didn't come up with a cleaner solution but the attached fixes that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..c544b90d88 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -438,6 +438,13 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 
  recid = XLogRecGetInfo(record) >> 4;
 
+ /*
+ * XXX: There is a special case for XACT records. Those records use the MSB
+ * of recid for another purpose. Ignore that bit in that case.
+ */
+ if (rmid == RM_XACT_ID)
+ recid &= 0x07;
+
  stats->record_stats[rmid][recid].count++;
  stats->record_stats[rmid][recid].rec_len += rec_len;
  stats->record_stats[rmid][recid].fpi_len += fpi_len;
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

movead.li@highgo.ca
Thanks for all the suggestions.

>Yeah.  In its current shape, it means that only pg_waldump would be
>able to know this information.  If you make this information part of
>xlogdesc.c, any consumer of the WAL record descriptions would be able
>to show this information, so it would provide a consistent output for
>any kind of tools.
I have change the implement, move some code into xlog_desc().

>On top of that, it seems to me that the calculation used in the patch
>is wrong in two aspects at quick glance:
>1) startSegNo and endSegNo point always to two different segments with
>a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
>segment border instead before extracting SizeOfXLogLongPHD, no?
Yes you are right, and I use 'record->EndRecPtr - 1' instead.

>2) This stuff should also check after the case of a WAL *page* border
>where you'd need to adjust based on SizeOfXLogShortPHD instead.
The 'junk_len -= SizeOfXLogLongPHD' code is considered for when the
remain size of a wal segment can not afford a XLogRecord struct for
XLOG_SWITCH, it needn't care *page* border.

>I'm not sure the exact motive of this proposal, but if we show the
>wasted length in the stats result, I think it should be other than
>existing record types.
Yes agree, and now it looks like below as new patch:

movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 -z
Type                                           N      (%)          Record size      (%)             FPI size      (%)        Combined size      (%)
----                                           -      ---          -----------      ---             --------      ---        -------------      ---
XLOG                                           5 ( 31.25)                  300 (  0.00)                    0 (  0.00)                  300 (  0.00)
XLOGSwitchJunk                                 3 ( 18.75)             50330512 (100.00)                    0 (  0.00)             50330512 (100.00)


movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 --stat=record
XLOG/SWITCH                                    3 ( 18.75)                   72 (  0.00)                    0 (  0.00)                   72 (  0.00)
XLOG/SWITCH_JUNK                               3 ( 18.75)             50330512 (100.00)                    0 (  0.00)             50330512 (100.00)

The shortcoming now is I do not know how to handle the 'count' of SWITCH_JUNK
in pg_waldump results. Currently I regard SWITCH_JUNK as one count.

>By the way, I noticed that --stats=record shows two lines for
>Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
>all four bits of xl_info is used to identify record id.
Thanks,I didn't notice it before, and your patch added into v3 patch attached.



Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

fix_waldump_size_for_wal_switch_v3.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Kyotaro Horiguchi-4
At Thu, 15 Oct 2020 12:56:02 +0800, "[hidden email]" <[hidden email]> wrote in
> Thanks for all the suggestions.
>
> >Yeah.  In its current shape, it means that only pg_waldump would be
> >able to know this information.  If you make this information part of
> >xlogdesc.c, any consumer of the WAL record descriptions would be able
> >to show this information, so it would provide a consistent output for
> >any kind of tools.
> I have change the implement, move some code into xlog_desc().

Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that?  (For clarity, I'm not
suggesting that you should reving it.)

> >On top of that, it seems to me that the calculation used in the patch
> >is wrong in two aspects at quick glance:
> >1) startSegNo and endSegNo point always to two different segments with
> >a XLOG_SWITCH record, so you should check that ReadRecPtr is not at a
> >segment border instead before extracting SizeOfXLogLongPHD, no?
> Yes you are right, and I use 'record->EndRecPtr - 1' instead.

+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);

We use XLByteToPrevSeg instead for this purpose.

> >2) This stuff should also check after the case of a WAL *page* border
> >where you'd need to adjust based on SizeOfXLogShortPHD instead.
> The 'junk_len -= SizeOfXLogLongPHD' code is considered for when the
> remain size of a wal segment can not afford a XLogRecord struct for
> XLOG_SWITCH, it needn't care *page* border.
>
> >I'm not sure the exact motive of this proposal, but if we show the
> >wasted length in the stats result, I think it should be other than
> >existing record types.
> Yes agree, and now it looks like below as new patch:
You forgot to add a correction for short headers.

> movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 -z
> Type                                           N      (%)          Record size      (%)             FPI size      (%)        Combined size      (%)
> ----                                           -      ---          -----------      ---             --------      ---        -------------      ---
> XLOG                                           5 ( 31.25)                  300 (  0.00)                    0 (  0.00)                  300 (  0.00)
> XLOGSwitchJunk                                 3 ( 18.75)             50330512 (100.00)                    0 (  0.00)             50330512 (100.00)
>
>
> movead@movead-PC:/h2/pg/bin$ ./pg_waldump -p ../walbk/ -s 0/3000000 -e 0/6000000 --stat=record
> XLOG/SWITCH                                    3 ( 18.75)                   72 (  0.00)                    0 (  0.00)                   72 (  0.00)
> XLOG/SWITCH_JUNK                               3 ( 18.75)             50330512 (100.00)                    0 (  0.00)             50330512 (100.00)
>
> The shortcoming now is I do not know how to handle the 'count' of SWITCH_JUNK
> in pg_waldump results. Currently I regard SWITCH_JUNK as one count.

+ if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)

We need a comment for the special code path.
We don't follow this kind of convension. Rather use "variable =
constant".

+ {
+ junk_len = xlog_switch_junk_len(record);
+ stats->count_xlog_switch++;
+ stats->junk_size += junk_len;

junk_len is used only the last line above.  We don't need that
temporary variable.

+ * If the wal switch record spread on two segments, we should extra minus the

This comment is sticking out of 80-column border.  However, I'm not
sure if we have reached a conclustion about the target console-width.

+ info = (rj << 4) & ~XLR_INFO_MASK;
+ switch_junk_id = "XLOG/SWITCH_JUNK";
+ if( XLOG_SWITCH == info && stats->count_xlog_switch > 0)

This line order is strange. At least switch_junk_id is used only in
the if-then block.

I'm not confindent on which is better, but I feel that this is not a
work for display side, but for the recorder side like attached.

> >By the way, I noticed that --stats=record shows two lines for
> >Transaction/COMMIT. The cause is that XLogDumpCountRecord assumes the
> >all four bits of xl_info is used to identify record id.
> Thanks,I didn't notice it before, and your patch added into v3 patch attached.

Sorry for the confusion, but it would be a separate topic even if we
are going to fix that..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..04042a50a4 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -142,6 +142,31 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
  }
 }
 
+uint32
+xlog_switch_junk_len(XLogReaderState *record)
+{
+ uint32 junk_len;
+ XLogSegNo startSegNo;
+ XLogSegNo endSegNo;
+
+ XLByteToSeg(record->ReadRecPtr, startSegNo, record->segcxt.ws_segsize);
+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
+
+ junk_len = record->EndRecPtr - record->ReadRecPtr - XLogRecGetTotalLen(record);
+ /*
+ * If the wal switch record spread on two segments, we should extra minus the
+ * long page head. I mean the case when the remain size of a wal segment can not
+ * afford a XLogRecord struct for XLOG_SWITCH.
+ */
+ if(startSegNo != endSegNo)
+ junk_len -= SizeOfXLogLongPHD;
+
+
+ Assert(junk_len >= 0);
+
+ return junk_len;
+}
+
 const char *
 xlog_identify(uint8 info)
 {
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 31e99c2a6d..ab4e7c830f 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -24,6 +24,7 @@
 #include "common/logging.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "catalog/pg_control.h"
 
 static const char *progname;
 
@@ -66,8 +67,8 @@ typedef struct Stats
 typedef struct XLogDumpStats
 {
  uint64 count;
- Stats rmgr_stats[RM_NEXT_ID];
- Stats record_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
+ Stats rmgr_stats[RM_NEXT_ID + 1];
+ Stats record_stats[RM_NEXT_ID + 1][MAX_XLINFO_TYPES];
 } XLogDumpStats;
 
 #define fatal_error(...) do { pg_log_fatal(__VA_ARGS__); exit(EXIT_FAILURE); } while(0)
@@ -441,6 +442,22 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
  stats->record_stats[rmid][recid].count++;
  stats->record_stats[rmid][recid].rec_len += rec_len;
  stats->record_stats[rmid][recid].fpi_len += fpi_len;
+
+
+ /* Add junk-space stats for XLOG_SWITCH  */
+ if(rmid == RM_XLOG_ID && XLogRecGetInfo(record) == XLOG_SWITCH)
+ {
+ uint64 junk_len = xlog_switch_junk_len(record);
+
+ rmid = WALDUMP_RM_ID;
+ recid = 0;
+
+ stats->rmgr_stats[rmid].count++;
+ stats->rmgr_stats[rmid].rec_len += junk_len;
+
+ stats->record_stats[rmid][recid].count++;
+ stats->record_stats[rmid][recid].rec_len += junk_len;
+ }
 }
 
 /*
@@ -616,7 +633,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
  * calculate column totals.
  */
 
- for (ri = 0; ri < RM_NEXT_ID; ri++)
+ for (ri = 0; ri <= RM_NEXT_ID; ri++)
  {
  total_count += stats->rmgr_stats[ri].count;
  total_rec_len += stats->rmgr_stats[ri].rec_len;
@@ -634,7 +651,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
    "Type", "N", "(%)", "Record size", "(%)", "FPI size", "(%)", "Combined size", "(%)",
    "----", "-", "---", "-----------", "---", "--------", "---", "-------------", "---");
 
- for (ri = 0; ri < RM_NEXT_ID; ri++)
+ for (ri = 0; ri <= RM_NEXT_ID; ri++)
  {
  uint64 count,
  rec_len,
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca4b1..efddc70ac1 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -35,6 +35,18 @@
 #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask) \
  { name, desc, identify},
 
-const RmgrDescData RmgrDescTable[RM_MAX_ID + 1] = {
+static const char *waldump_xlogswitch_identify(uint8 info);
+
+const RmgrDescData RmgrDescTable[RM_MAX_ID + 2] = {
 #include "access/rmgrlist.h"
+ PG_RMGR(WALDUMP_RM_ID, "OTHERS", NULL, NULL, waldump_xlogswitch_identify, NULL, NULL, NULL)
 };
+
+static const char *
+waldump_xlogswitch_identify(uint8 info)
+{
+ if (info == 0)
+ return "XLOG_SWITCH_JUNK";
+ else
+ return NULL;
+}
diff --git a/src/bin/pg_waldump/rmgrdesc.h b/src/bin/pg_waldump/rmgrdesc.h
index 42f8483b48..2a67d1050e 100644
--- a/src/bin/pg_waldump/rmgrdesc.h
+++ b/src/bin/pg_waldump/rmgrdesc.h
@@ -20,4 +20,6 @@ typedef struct RmgrDescData
 
 extern const RmgrDescData RmgrDescTable[];
 
+#define WALDUMP_RM_ID RM_NEXT_ID
+
 #endif /* RMGRDESC_H */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..8cbc309108 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -315,6 +315,7 @@ extern XLogRecPtr RequestXLogSwitch(bool mark_unimportant);
 
 extern void GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli);
 
+extern uint32 xlog_switch_junk_len(XLogReaderState *record);
 /*
  * Exported for the functions in timeline.c and xlogarchive.c.  Only valid
  * in the startup process.
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Kyotaro Horiguchi-4
At Thu, 15 Oct 2020 17:32:10 +0900 (JST), Kyotaro Horiguchi <[hidden email]> wrote in

> At Thu, 15 Oct 2020 12:56:02 +0800, "[hidden email]" <[hidden email]> wrote in
> > Thanks for all the suggestions.
> >
> > >Yeah.  In its current shape, it means that only pg_waldump would be
> > >able to know this information.  If you make this information part of
> > >xlogdesc.c, any consumer of the WAL record descriptions would be able
> > >to show this information, so it would provide a consistent output for
> > >any kind of tools.
> > I have change the implement, move some code into xlog_desc().
>
> Andres suggested that we don't need that description with per-record
> basis. Do you have a reason to do that?  (For clarity, I'm not
> suggesting that you should reving it.)

Sorry. Maybe I deleted wrong letters in the "reving" above.

====
Andres suggested that we don't need that description with per-record
basis. Do you have a reason to do that?  (For clarity, I'm not
suggesting that you should remove it.)
====

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

movead.li@highgo.ca
In reply to this post by Kyotaro Horiguchi-4
Thanks for all the suggestion, and new patch attached.

>Andres suggested that we don't need that description with per-record
>basis. Do you have a reason to do that?  (For clarity, I'm not
>suggesting that you should reving it.)
I think Andres is saying if we just log it in xlog_desc() then we can not get
the result for '--stats=record' case. And the patch solve the problem.

>+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
>We use XLByteToPrevSeg instead for this purpose.
Thanks and follow the suggestion.

>You forgot to add a correction for short headers.
Infact, we need to consider this matter when the remain size of a wal
segment can not afford a XLogRecord struct for XLOG_SWITCH. 
It's mean that if record->ReadRecPtr is on A wal segment, then
record->EndRecPtr is on (A+2) wal segment. So we need to minus
the longpagehead size on (A+1) wal segment.
In my thought we need not to care the short header, if my understand
is wrong?

>+ if(RM_XLOG_ID == rmid && XLOG_SWITCH == info)
> 
>We need a comment for the special code path.
>We don't follow this kind of convension. Rather use "variable =
>constant".
>+ {
>+ junk_len = xlog_switch_junk_len(record);
>+ stats->count_xlog_switch++;
>+ stats->junk_size += junk_len;
> 
>junk_len is used only the last line above.  We don't need that
>temporary variable.
> 
>+ * If the wal switch record spread on two segments, we should extra minus the
>This comment is sticking out of 80-column border.  However, I'm not
>sure if we have reached a conclustion about the target console-width.
>+ info = (rj << 4) & ~XLR_INFO_MASK;
>+ switch_junk_id = "XLOG/SWITCH_JUNK";
>+ if( XLOG_SWITCH == info && stats->count_xlog_switch > 0)
> 
>This line order is strange. At least switch_junk_id is used only in
>the if-then block.
Thanks and follow the suggestions.

 
>I'm not confindent on which is better, but I feel that this is not a
>work for display side, but for the recorder side like attached.
The patch really seems more clearly, but the new 'OTHERS' may confuse
the users and we hard to handle it with '--rmgr=RMGR' option. So I have
not use this design in this patch, let's see other's opinion.

>Sorry for the confusion, but it would be a separate topic even if we
>are going to fix that..
Sorry, I remove the code, make sense we should discuss it in a
separate topic.


Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

fix_waldump_size_for_wal_switch_v4.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Kyotaro Horiguchi-4
At Fri, 16 Oct 2020 16:21:47 +0800, "[hidden email]" <[hidden email]> wrote in
> Thanks for all the suggestion, and new patch attached.
>
> >Andres suggested that we don't need that description with per-record
> >basis. Do you have a reason to do that?  (For clarity, I'm not
> >suggesting that you should reving it.)
> I think Andres is saying if we just log it in xlog_desc() then we can not get
> the result for '--stats=record' case. And the patch solve the problem.

Mmm.

>                                                                      and
> for that including it in the record description is useless. When looking
> at plain records the length is sufficiently deducable by looking at the
> next record's LSN.

It looks to me "We can know that length by subtracting the LSN of
XLOG_SWITCH from the next record's LSN so it doesn't add any
information."

> >+ XLByteToSeg(record->EndRecPtr - 1, endSegNo, record->segcxt.ws_segsize);
> >We use XLByteToPrevSeg instead for this purpose.
> Thanks and follow the suggestion.
>
> >You forgot to add a correction for short headers.
> Infact, we need to consider this matter when the remain size of a wal
> segment can not afford a XLogRecord struct for XLOG_SWITCH.
> It's mean that if record->ReadRecPtr is on A wal segment, then
> record->EndRecPtr is on (A+2) wal segment. So we need to minus
> the longpagehead size on (A+1) wal segment.
> In my thought we need not to care the short header, if my understand
> is wrong?

Maybe.

When a page doesn't have sufficient space for a record, the record is
split into to pieces and the last half is recorded after the header of
the next page. If it next page is in the next segment, the header is a
long header and a short header otherwise.

If it were the last page of a segment,

ReadRecPtr
v
<--- SEG A ------->|<---------- SEG A+1 ----------------->|<-SEG A+2
<XLOG_SWITCH_FIRST>|<LONG HEADER><XLOG_SWTICH_LAST><EMPTY>|<LONG HEADER>

So the length of <EMPTY> is:

  LOC(SEG A+2) - ReadRecPtr - LEN(long header) - LEN(XLOG_SWITCH)


If not, that is, it were not the last page of a segment.

<-------------------- SEG A ---------------------------->|<-SEG A+1
< page n ->|<-- page n + 1 ---------->|....|<-last page->|<-first page
<X_S_FIRST>|<SHORT HEADER><X_S_LAST><EMPTY..............>|<LONG HEADER>

So the length of <EMPTY> in this case is:

  LOC(SEG A+1) - ReadRecPtr - LEN(short header) - LEN(XLOG_SWITCH)

> >I'm not confindent on which is better, but I feel that this is not a
> >work for display side, but for the recorder side like attached.
> The patch really seems more clearly, but the new 'OTHERS' may confuse
> the users and we hard to handle it with '--rmgr=RMGR' option. So I have
> not use this design in this patch, let's see other's opinion.

Yeah, I don't like the "OTHERS", too.

> >Sorry for the confusion, but it would be a separate topic even if we
> >are going to fix that..
> Sorry, I remove the code, make sense we should discuss it in a
> separate topic.

Agreed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

movead.li@highgo.ca

>It looks to me "We can know that length by subtracting the LSN of
>XLOG_SWITCH from the next record's LSN so it doesn't add any
>information."
Sorry,maybe I miss this before.
But I think it will be better to show it clearly.

>So the length of <EMPTY> in this case is:
>LOC(SEG A+1) - ReadRecPtr - LEN(short header) - LEN(XLOG_SWITCH)
Thanks, I should not have missed this and fixed.



Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

fix_waldump_size_for_wal_switch_v5.patch (6K) Download Attachment