RHEL 7 (systemd) reboot


Bryce Pepper

I am running three instances (under different users) on a RHEL 7 server to support a vendor product.

In the defined services, the start and stop scripts work fine when invoked with systemctl {start|stop} whatever.service, but we have automated monthly patching which does a reboot.

Judging by /var/log/messages, the stop scripts do not get invoked on reboot, so I created a new shutdown service as described here <https://unix.stackexchange.com/questions/211924/effect-of-reboot-signal-on-systemd-service-state>.

It appears that PostgreSQL is receiving a signal from somewhere prior to my script running…

Oct 05 14:18:56 kccontrolmt01 NetworkManager[787]: <info>  [1538767136.0967] manager: NetworkManager state is now DISCONNECTED
Oct 05 14:18:56 kccontrolmt01 dbus[740]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispa
Oct 05 14:18:56 kccontrolmt01 dbus[740]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation
Oct 05 14:18:56 kccontrolmt01 network[29310]: Shutting down interface eth0:  Device 'eth0' successfully disconnected.
Oct 05 14:18:56 kccontrolmt01 network[29310]: [  OK  ]
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: ------------------------
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Shutting down CONTROL-M.
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: ------------------------
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Waiting ...
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: psql action failed. cannot perform sql command in /data00/ctmlinux/ctm_server/tmp/upd_CMS_SY
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: db_execute_sql failed while processing /data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_29448.
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Failed to update CMS_SYSPRM table.
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Be aware that the Configuration Agent might start the CONTROL-M/Server

The database must be available for the product to shut down in a consistent state.

I am open to suggestions.

Thanks,
Bryce

Bryce Pepper
Sr. Unix Applications Systems Engineer
The Kansas City Southern Railway Company
114 West 11th Street  |  Kansas City,  MO 64105
Office:  816.983.1512
Email:  [hidden email]


Re: RHEL 7 (systemd) reboot

Adrian Klaver-4
On 10/9/18 11:06 AM, Bryce Pepper wrote:

> I am running three instances (under different users) on a RHEL 7 server
> to support a vendor product.
>
> In the defined services, the start & stop scripts work fine when invoked
> with systemctl {start|stop} whatever.service  but we have automated
> monthly patching which does a reboot.
>
> Looking in /var/log/messages and the stop scripts do not get invoked on
> reboot, therefore I created a new shutdown service as described here
> <https://unix.stackexchange.com/questions/211924/effect-of-reboot-signal-on-systemd-service-state>.
>
> It appears that PostGreSQL is receiving a signal from somewhere prior to
> my script running…
>

>
> The database must be available for the product to shut down in a
> consistent state.
>
> I am open to suggestions.

What is the below doing or coming from?:

db_execute_sql failed while processing
/data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_29448.



--
Adrian Klaver
[hidden email]


RE: RHEL 7 (systemd) reboot

Bryce Pepper
Adrian,
Thanks for the inquiry.  The function (db_execute_sql) comes from a vendor (BMC) product called Control-M, which is a scheduling product.
The tmp file is deleted before I can see its contents, but I believe it is trying to update some columns in the CMS_SYSPRM table.
I also think the PostgreSQL instance is already stopped, which is why db_execute_sql fails.  I will try to modify the vendor function to save off the contents of the query.

Bryce

p.s. Do you know of any verbose logging that could be turned on to catch when pgsql is being terminated?
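
One low-effort place to look, offered as a sketch rather than a confirmed diagnosis: PostgreSQL itself writes a "received ... shutdown request" line to its server log when the postmaster is told to stop, and the mode (smart, fast, immediate) corresponds to the signal it received (SIGTERM, SIGINT, SIGQUIT). The helper below only greps a server log for those lines. The function name pg_shutdown_requests is hypothetical, and the log path must be supplied by the caller, since the thread does not say where these instances log.

```shell
#!/bin/sh
# Print every shutdown request recorded in a PostgreSQL server log.
# The mode shows which signal the postmaster received:
#   smart = SIGTERM, fast = SIGINT, immediate = SIGQUIT
# $1: path to the server log (an assumption; depends on the installation)
pg_shutdown_requests() {
    grep -E 'received (smart|fast|immediate) shutdown request' "$1"
}
```

Comparing the timestamp of such a line with the stop_ctmlinux_server.sh entries in /var/log/messages would show whether the database was asked to stop before the vendor script ran.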




RE: RHEL 7 (systemd) reboot

Bryce Pepper
In reply to this post by Adrian Klaver-4
Here are the contents of the query and the error:
[root@kccontrolmt01 tmp]# cat ctm.Xf9pQkg2
update CMS_SYSPRM set CURRENT_STATE='STOPPING',DESIRED_STATE='Down' where DESIRED_STATE <> 'Ignored'
;
psql: could not connect to server: Connection refused
        Is the server running on host "kccontrolmt01" (10.1.32.53) and accepting
        TCP/IP connections on port 5433?



Re: RHEL 7 (systemd) reboot

Adrian Klaver-4
In reply to this post by Bryce Pepper
On 10/10/18 5:32 AM, Bryce Pepper wrote:
> Adrian,
> Thanks for the inquiry.  The function (db_execute_sql) is coming from a vendor (BMC) product called Control-M. It is a scheduling product.
> The tmp file is deleted before I can see its contents but I believe it is trying to update some columns in the CMS_SYSPRM table.
> I also think the postgresql instance is already stopped and hence why the db_execute fails.  I will try to modify the vendor function to save off the contents of the query.

Alright, I'm confused. In your earlier post you said the stop script is
not running. Yet here it is, just not at the right time. I think a more
detailed explanation is needed:

1) Is the stop script you are concerned about a systemd script, and is it one
that you created or one the system provided?

2) What is the shutdown service you refer to?

3) Is there a separate shutdown script for the Control-M product?

4) What do you expect to happen vs what is happening?

>
> Bryce
>
> p.s. Do you know of any verbose logging that could be turned on to catch when pgsql is being terminated?
>


--
Adrian Klaver
[hidden email]


RE: RHEL 7 (systemd) reboot

Bryce Pepper
Sorry, I wasn't clear in the prior posts.  

The stop script is running during reboot. The problem is that the database is not reachable when the stop script runs.  The ctmdist server shutdown sequence is as follows:
   Stop control-m application
   Stop control-m configuration agent
   Stop database

As you can see, the intent is for the database to be shut down after the product.

As you noticed in /var/log/messages, the stop_ctmlinux_server.sh script is running but is unable to execute the update query.

I created the following service definition and scripts. Note that there are two datacenters (ctmdist, ctmlinux) with comparable scripts, so I have included only one set:

[root@kccontrolmt01 ~]# cat ControlM_Shutdown.service
[Unit]
Description=Run mycommand at shutdown
Requires=network.target CTM_Postgre.service
DefaultDependencies=no
Before=shutdown.target reboot.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/root/scripts/control-m_shutdown.sh

[Install]
WantedBy=multi-user.target


[root@kccontrolmt01 ~]# cat /root/scripts/control-m_shutdown.sh
#!/bin/sh
# Stop any running Control-M services.
# systemctl is-active prints the unit state (e.g. "active", "inactive").
# The variable is quoted and compared with "=", the portable form for [ under /bin/sh.
STATUS=$(/usr/bin/systemctl is-active CTMLinux_Server.service)
if [ "${STATUS}" = "active" ]; then
  /usr/bin/systemctl stop CTMLinux_Server.service
fi

STATUS=$(/usr/bin/systemctl is-active CTMDist_Server.service)
if [ "${STATUS}" = "active" ]; then
  /usr/bin/systemctl stop CTMDist_Server.service
fi

STATUS=$(/usr/bin/systemctl is-active EnterpriseManager.service)
if [ "${STATUS}" = "active" ]; then
  /usr/bin/systemctl stop EnterpriseManager.service
fi
exit 0


#!/bin/bash

# stop CONTROL-M
if [ -f /data00/ctmlinux/ctm_server/scripts/shut_ctm ]; then
  echo "Stopping CONTROL-M application"
  /data00/ctmlinux/ctm_server/scripts/shut_ctm
fi

# stop CONTROL-M Configuration Agent
if [ -f /data00/ctmlinux/ctm_server/scripts/shut_ca ]; then
  echo "Stopping CONTROL-M Server Configuration Agent"
  /data00/ctmlinux/ctm_server/scripts/shut_ca
fi

# stop database
/data00/ctmlinux/ctm_server/scripts/dbversion
if [ $? -ne 0 ] ; then
  echo "SQL Server is already stopped "
else
  if [ -f /data00/ctmlinux/ctm_server/scripts/shutdb ]; then
    echo "Stopping SQL server for CONTROL-M"
    /data00/ctmlinux/ctm_server/scripts/shutdb
  fi
fi

exit 0


Re: RHEL 7 (systemd) reboot

Adrian Klaver-4
On 10/10/18 7:37 AM, Bryce Pepper wrote:
> Sorry, I wasn't clear in the prior posts.
>
> The stop script is running during reboot. The problem is the database is not reachable when the stop script runs.  The ctmdist server shut down is as follows:
>     Stop control-m application
>     Stop control-m configuration agent
>     Stop database

Several things:

1) In your OP there was this:

Oct 05 14:18:56 kccontrolmt01 network[29310]: Shutting down interface
eth0:  Device 'eth0' successfully disconnected.

Oct 05 14:18:56 kccontrolmt01 network[29310]: [  OK  ]

Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]:
------------------------

Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Shutting
down CONTROL-M.

So is your Postgres instance running on the same machine as the CTM
instance or does the eth0 need to be up to reach the database?

2) In the above there is:
"Shutting down CONTROL-M."

Yet in script below there is:
"Stopping CONTROL-M application"

Is this because there are sub-scripts involved or the "Stopping ..." is
embedded in the script?

3) I am by no means a shell script expert and I will admit to not fully
understanding what control-m_shutdown.sh does. Still here it goes:

a) Are there actually two shebangs in one file or are there two files
involved?

b) What is:

# stop database
/data00/ctmlinux/ctm_server/scripts/dbversion
if [ $? -ne 0 ] ; then
   echo "SQL Server is already stopped "
else
   if [ -f /data00/ctmlinux/ctm_server/scripts/shutdb ]; then
     echo "Stopping SQL server for CONTROL-M"
     /data00/ctmlinux/ctm_server/scripts/shutdb
   fi

actually doing?

I ask because from what I can see there are a set of parallel processes
initiated and it is possible that the database server is winning. It
comes down to what 'if [ $? -ne 0 ]' is testing.






--
Adrian Klaver
[hidden email]


RE: RHEL 7 (systemd) reboot

Bryce Pepper
Adrian,

Thanks for being willing to dig into this.  

You are correct there are other scripts being called from mine (delivered by BMC with their software).   In order to stay in support and work with their updates I use the vendor supplied scripts/programs.  

The Control-M product is installed on this single server and is broken down into the following parts:
Enterprise server with dedicated PostgreSQL instance
Distributed datacenter with agent and dedicated PostgreSQL instance
Linux datacenter with agent and dedicated PostgreSQL instance

To cut down on the noise, my post only focused on the "Distributed" side and shutdown process -- although the ControlM_Shutdown.service unit stop script manages all of the above components.

In ControlM_Shutdown.service there is a Requires= statement intended to ensure that the network is available while this systemd unit runs.

You noticed eth0 being disconnected in /var/log/messages.  I showed that to highlight that the unit was not executing in the order I had intended; again, refer to the Requires= statement.

The second shebang is from one of the invoked subscripts (stop_ctmdist_server.sh), which is the "main" shutdown sequence for the Distributed datacenter. (I think the "SQL server" echo from BMC is there because the product can be configured with other databases and they use the term generically; it does not mean Microsoft SQL Server.)

The dbversion check is used to verify that the PostgreSQL instance for this datacenter is running; it returns a non-zero return code if the instance is unreachable. (I could use pg_isready or pg_ctl, but that would diverge further from the BMC-supported technique.)
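
For comparison, here is a minimal sketch of what a pg_isready-based check could look like. It is not part of the BMC tooling. The function name wait_for_db, the retry count, and the PG_ISREADY override variable are all hypothetical; the host and port in the usage line are the ones shown in the psql error earlier in the thread.

```shell
#!/bin/sh
# Wait until a PostgreSQL instance accepts connections, or give up.
# PG_ISREADY may be overridden (e.g. with a full path to the binary).
PG_ISREADY="${PG_ISREADY:-pg_isready}"

# $1: host, $2: port, $3: number of attempts (illustrative default: 5)
wait_for_db() {
    host="$1"; port="$2"; tries="${3:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # pg_isready exits 0 only when the server is accepting connections
        if "$PG_ISREADY" -q -h "$host" -p "$port"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

Usage might look like: wait_for_db kccontrolmt01 5433 || echo "database not reachable" >&2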

You probably also noticed the Requires= on CTM_Postgre.service in the shutdown service posted earlier.  This was one of my attempts to ensure the instance was available by starting it outside of the BMC routines (if it is already running, the BMC routines will not start it; the dbversion check is on the start side as well).  I thought that if I managed the PostgreSQL instance outside of the product, I could ensure it was running.  Unfortunately that didn't work, as the instance shut down on its own; presumably a resource (perhaps the network) was terminated and PostgreSQL shut down.

So, to restate the original post: it appears the PostgreSQL instance is unavailable when the stop script runs.

Thanks,
Bryce

[root@kccontrolmt01 ~]# systemctl --full cat ControlM_Shutdown.service
# /etc/systemd/system/ControlM_Shutdown.service
[Unit]
Description=Run ControlM shutdown process
Requires=graphical.target multi-user.target network.target network.service sockets.target
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target poweroff.target kexec.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash /root/scripts/control-m_shutdown.sh
TimeoutStopSec=4min

[Install]
WantedBy=multi-user.target
[root@kccontrolmt01 ~]#


Re: RHEL 7 (systemd) reboot

Adrian Klaver-4
On 10/11/18 6:33 AM, Bryce Pepper wrote:

> Adrian,
>
> Thanks for being willing to dig into this.
>
> You are correct there are other scripts being called from mine (delivered by BMC with their software).   In order to stay in support and work with their updates I use the vendor supplied scripts/programs.
>
> The Control-M product is installed on this single server and is broken down into the following parts:
> Enterprise server with dedicated postgresql instance
> Distributed datacenter with agent and dedicated postgresql instance
> Linux datacenter with with agent and dedicated postgresql instance
>
> To cut down on the noise, my post only focused on the "Distributed" side and shutdown process -- although the ControlM_Shutdown.service unit stop script manages all of the above components.
>
> In the ControlM_Shutdown.service there is a requires statement identifying that  network must be available while this systemd unit runs.
>
> You noticed that the eth0 disconnected in the /var/log/messages.   I showed that to highlight that the unit was not executing in the order I had intended, again refer to the requires statement.
>
> The second shebang is from one of the invoked subscripts (stop_ctmdist_server.sh) and is the "main" shutdown sequence for the Distributed datacenter (I think the "SQL server" echo from BMC is because it can be configured with other databases and they use it in a generic term --- not meaning sqlserver from Microsoft).
>
> The dbversion check is being used to verify pgsql instance for this datacenter is running and returns a non-zero return code if the instance is unreachable (I could use pg_isready or pg_ctl but would diverge further from the BMC supported technique).
>
> You probably also noticed in the earlier posted shutdown service a requires of CTM_Postgre.service.  This was one of my attempts to ensure the instance was available by actually starting the instance outside of the BMC routines (if it is already running the BMC routines will not start -- the dbversion check is on the start side also).  I thought if I managed the postgresql instance outside of the product I could ensure it was running.  Unfortunately that didn't work as the instance shutdown on its own, presumably a resource (perhaps network) was terminated and postgresql shutdown.
>
> So to restate the original post...   It appears the postgresql instance is unavailable when the stop script runs.
>
> Thanks,
> Bryce
>
> [root@kccontrolmt01 ~]# systemctl --full cat ControlM_Shutdown.service
> # /etc/systemd/system/ControlM_Shutdown.service
> [Unit]
> Description=Run ControlM shutdown process
> Requires=graphical.target multi-user.target network.target network.service sockets.target
> DefaultDependencies=no
> Before=shutdown.target reboot.target halt.target poweroff.target kexec.target

Again I am not a systemd expert, but I believe the Before line above is
the opposite of what you want:

https://serverfault.com/questions/812584/in-systemd-whats-the-difference-between-after-and-requires#812589

The answer above quotes the man page
(https://www.freedesktop.org/software/systemd/man/systemd.unit.html):

"... Note that when two units with an ordering dependency between them
are shut down, the inverse of the start-up order is applied. i.e. if a
unit is configured with After= on another unit, the former is stopped
before the latter if both are shut down. ..."
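
Read literally, the quoted passage suggests ordering the Control-M units after the database unit, so that they start later and are stopped earlier. The following is a minimal sketch under that reading, not a verified fix for this system. The helper name write_ordering_dropin and the file name 10-ordering.conf are hypothetical; the unit names in the usage line are the ones that appear in this thread.

```shell
#!/bin/sh
# Write a systemd drop-in that orders UNIT after DEP.
# With After=DEP, UNIT is started after DEP and stopped before DEP at shutdown.
# $1: unit to order, $2: unit it should come after,
# $3: drop-in root (defaults to /etc/systemd/system; pass a scratch dir to try it out)
write_ordering_dropin() {
    unit="$1"; dep="$2"; root="${3:-/etc/systemd/system}"
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/10-ordering.conf" <<EOF
[Unit]
After=$dep
EOF
}
```

Usage might look like: write_ordering_dropin CTMLinux_Server.service CTM_Postgre.service && systemctl daemon-reload. This only helps if the database is actually started and stopped through the named unit rather than by the vendor scripts.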


>
> [Service]
> Type=oneshot
> RemainAfterExit=true
> ExecStart=/bin/true
> ExecStop=/bin/bash /root/scripts/control-m_shutdown.sh
> TimeoutStopSec=4min
>
> [Install]
> WantedBy=multi-user.target
> [root@kccontrolmt01 ~]#
>


--
Adrian Klaver
[hidden email]


RE: RHEL 7 (systemd) reboot

Bryce Pepper
Adrian,

I tried changing the Before to After, but the PostgreSQL instance was still shut down too early.

I appreciate all of the help but think I'm going to ask the patching group to ensure they stop the control-m services prior to reboot.
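
A minimal sketch of what such a pre-reboot step could look like, assuming the three unit names from control-m_shutdown.sh above. The function name stop_controlm_units and the SYSTEMCTL override variable are hypothetical; how the patching group would actually hook this in is not covered here.

```shell
#!/bin/sh
# Stop the Control-M units (names taken from this thread) before a reboot.
# SYSTEMCTL can be overridden for a dry run, e.g. SYSTEMCTL=echo.
SYSTEMCTL="${SYSTEMCTL:-/usr/bin/systemctl}"

stop_controlm_units() {
    rc=0
    for unit in CTMLinux_Server.service CTMDist_Server.service EnterpriseManager.service; do
        # is-active --quiet exits 0 only when the unit is active
        if "$SYSTEMCTL" is-active --quiet "$unit"; then
            # Remember a failure but keep going so the other units are still stopped
            "$SYSTEMCTL" stop "$unit" || rc=1
        fi
    done
    return "$rc"
}
```

Usage might look like: stop_controlm_units && systemctl reboot. Because the units are stopped while the system is still fully up, the vendor stop scripts run with the network and database available.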

Bryce

Oct 11 09:19:57 kccontrolmt01 su[9816]: pam_unix(su-l:session): session opened for user sa_ctmlinux_uat by (uid=0)
Oct 11 09:19:57 kccontrolmt01 systemd[1]: Started Restore /run/initramfs.
Oct 11 09:19:57 kccontrolmt01 stop_ctmdist_agent.sh[9671]: setenv: Too many arguments.
Oct 11 09:19:57 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: setenv: Too many arguments.
Oct 11 09:19:57 kccontrolmt01 stop_ctmdist_agent.sh[9671]: Killing Control-M/Agent Listener pid:5595
Oct 11 09:19:57 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: Killing Control-M/Agent Listener pid:5977
Oct 11 09:19:58 kccontrolmt01 stop_ctmdist_agent.sh[9671]: 2018-10-11 09:19:58 Listener process stopped
Oct 11 09:19:58 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: 2018-10-11 09:19:58 Listener process stopped
Oct 11 09:19:58 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: Killing Control-M/Agent Tracker pid:6199
Oct 11 09:19:58 kccontrolmt01 stop_ctmdist_agent.sh[9671]: Killing Control-M/Agent Tracker pid:6172
Oct 11 09:19:58 kccontrolmt01 systemd[1]: Stopped Dynamic System Tuning Daemon.
Oct 11 09:19:59 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: 2018-10-11 09:19:59 Tracker process stopped
Oct 11 09:19:59 kccontrolmt01 stop_ctmdist_agent.sh[9671]: 2018-10-11 09:19:59 Tracker process stopped
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Eracent EUA Service.
Oct 11 09:19:59 kccontrolmt01 su[9815]: pam_unix(su-l:session): session closed for user sa_ctmdist_uat
Oct 11 09:19:59 kccontrolmt01 su[9816]: pam_unix(su-l:session): session closed for user sa_ctmlinux_uat
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Control-M CTM Dist Agent.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Control-M CTM Dist Server...
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Control-M CTM Linux Agent.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Control-M CTM Linux Server...
Oct 11 09:19:59 kccontrolmt01 su[10319]: (to sa_ctmdist_uat) root on none
Oct 11 09:19:59 kccontrolmt01 su[10320]: (to sa_ctmlinux_uat) root on none
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 systemd-logind[777]: Failed to start session scope session-c12.scope: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 su[10319]: pam_systemd(su-l:session): Failed to create session: Resource deadlock avoided
Oct 11 09:19:59 kccontrolmt01 su[10319]: pam_unix(su-l:session): session opened for user sa_ctmdist_uat by (uid=0)
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 systemd-logind[777]: Failed to start session scope session-c13.scope: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 su[10320]: pam_systemd(su-l:session): Failed to create session: Resource deadlock avoided
Oct 11 09:19:59 kccontrolmt01 su[10320]: pam_unix(su-l:session): session opened for user sa_ctmlinux_uat by (uid=0)
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Eracent EPA Service.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped target Network.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Network.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping LSB: Bring up/down networking...
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: ------------------------
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Shutting down CONTROL-M.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: ------------------------
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Waiting ...
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: psql action failed. cannot perform sql command in /data00/ctmdist/ctm_server/tmp/upd_CMS_SYSP
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: db_execute_sql failed while processing /data00/ctmdist/ctm_server/tmp/upd_CMS_SYSPRM_10512.sq
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Failed to update CMS_SYSPRM table.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Be aware that the Configuration Agent might start the CONTROL-M/Server
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: ------------------------
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Shutting down CONTROL-M.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: ------------------------
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Waiting ...
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: psql action failed. cannot perform sql command in /data00/ctmlinux/ctm_server/tmp/upd_CMS_SY
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: db_execute_sql failed while processing /data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_10571.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Failed to update CMS_SYSPRM table.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Be aware that the Configuration Agent might start the CONTROL-M/Server
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <info>  [1539267600.3979] device (eth0): state change: activated -> deactivating (reason 'user-requeste
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <info>  [1539267600.4062] manager: NetworkManager state is now DISCONNECTING
Oct 11 09:20:00 kccontrolmt01 dbus[748]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispa
Oct 11 09:20:00 kccontrolmt01 dbus[748]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <info>  [1539267600.4228] audit: op="device-disconnect" interface="eth0" ifindex=2 pid=10883 uid=0 resu
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <info>  [1539267600.4240] device (eth0): state change: deactivating -> disconnected (reason 'user-reque
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <warn>  [1539267600.4319] platform-linux: do-change-link[2]: failure changing link: failure 97 (Address
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <warn>  [1539267600.4325] device (eth0): failed to enable userspace IPv6LL address handling (unspecifie
Oct 11 09:20:00 kccontrolmt01 NetworkManager[789]: <info>  [1539267600.4509] manager: NetworkManager state is now DISCONNECTED
Oct 11 09:20:00 kccontrolmt01 dbus[748]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispa
Oct 11 09:20:00 kccontrolmt01 dbus[748]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation
Oct 11 09:20:00 kccontrolmt01 network[10323]: Shutting down interface eth0:  Device 'eth0' successfully disconnected.

Re: RHEL 7 (systemd) reboot

Adrian Klaver-4
On 10/11/18 7:53 AM, Bryce Pepper wrote:
> Adrian,
>
> I tried changing the Before to After, but the PostgreSQL instance was still shut down too early.

In an earlier post you had:

cat ControlM_Shutdown.service
[Unit]
Description=Run mycommand at shutdown
Requires=network.target CTM_Postgre.service

Did you add CTM_Postgre.service to After= ?

My suspicion being that CTM_Postgre.service is running before you get to
ControlM_Shutdown.service. Unless of course CTM_Postgre.service does not
exist anymore.
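
A rough way to check that, as a sketch only: systemd stops units in the reverse of their start order, so a unit that lists CTM_Postgre.service in After= has its stop action run before CTM_Postgre.service itself is stopped. The helper below (has_after is a made-up name, not part of systemd) only tests whether a unit file's After= line names a given unit. It reads just the one file it is given, so it does not see drop-ins or what systemd has actually loaded.

```shell
#!/bin/sh
# has_after UNIT_FILE DEPENDENCY
# Exit status 0 if an After= line in UNIT_FILE lists DEPENDENCY, non-zero otherwise.
# Hypothetical helper for illustration; it parses the file text only.
has_after() {
    # Keep the After= lines, split them on '=' and spaces into one word
    # per line, then look for an exact match on the dependency name.
    grep -E '^After=' "$1" | tr ' =' '\n\n' | grep -qxF "$2"
}

# Example call (the unit file location is an assumption):
# has_after /etc/systemd/system/ControlM_Shutdown.service CTM_Postgre.service \
#     && echo "ordered after CTM_Postgre.service" \
#     || echo "no After= ordering on CTM_Postgre.service"
```

On the live system, `systemctl show -p After ControlM_Shutdown.service` reports the ordering systemd has actually loaded, which is the more reliable check.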

Then there is this:

Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not running.

which to me looks like the script is running twice.

>
> I appreciate all of the help but think I'm going to ask the patching group to ensure they stop the control-m services prior to reboot.

Yeah, there seem to be hidden dependencies at work.
>
> Bryce
>



--
Adrian Klaver
[hidden email]


RE: RHEL 7 (systemd) reboot

Bryce Pepper
I disabled and removed the CTM_Postgre.service as it didn't help (and I didn't want too many moving parts left out there).

I did find a post https://superuser.com/questions/1016827/how-do-i-run-a-script-before-everything-else-on-shutdown-with-systemd that I think is getting me closer.

I tried RequiresMountsFor=/data00, which starts the script much sooner, but unfortunately the PostgreSQL instance is already unreachable by the time the script gets there.

These are two separate datacenter shutdowns (ctmdist and ctmlinux), each with its own stop script:

Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not running.
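
To make it easier to see that these lines come from two different scripts rather than one script running twice, the entries can be grouped by their program[pid] tag. This is only a sketch: count_by_tag is a made-up helper name, and it assumes the standard syslog layout shown above, where the tag is the fifth whitespace-separated field.

```shell
#!/bin/sh
# count_by_tag: reads syslog-format lines on stdin and prints
# "COUNT TAG" for each distinct program[pid] tag (the fifth field).
# Hypothetical helper for illustration only.
count_by_tag() {
    awk '{ tag = $5; sub(/:$/, "", tag); n[tag]++ }
         END { for (t in n) print n[t], t }' | sort -k2
}

# Example call against the system log:
# grep 'stop_ctm' /var/log/messages | count_by_tag
```

Two distinct script names with two distinct PIDs would point to two separate stop scripts, each run once.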


Re: RHEL 7 (systemd) reboot

Adrian Klaver-4
On 10/11/18 10:43 AM, Bryce Pepper wrote:
> I disabled and removed the CTM_Postgre.service as it didn't help (and I didn't want too many moving parts left out there).
>
> I did find a post https://superuser.com/questions/1016827/how-do-i-run-a-script-before-everything-else-on-shutdown-with-systemd that I think is getting me closer.
>
> I tried RequiresMountsFor=/data00, which starts the script much sooner, but unfortunately the PostgreSQL instance is already unreachable by the time the script gets there.

Seems to me the first priority is finding what is shutting down Postgres.

Does the system log show anything?

If not, find the shutdown time in the Postgres log and correlate that
with the system log.
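
A minimal sketch of that correlation, under some assumptions: Postgres logs "received smart shutdown request", "received fast shutdown request" or "received immediate shutdown request" when it is told to stop (these correspond to SIGTERM, SIGINT and SIGQUIT respectively), followed by "database system is shut down". The function names below are made up for illustration, and the Postgres log path in the usage comment is a placeholder since it depends on the installation.

```shell
#!/bin/sh
# pg_shutdown_lines PG_LOG
# Prints the shutdown-related lines from a Postgres server log.
pg_shutdown_lines() {
    grep -E 'received (smart|fast|immediate) shutdown request|database system is shut down' "$1"
}

# syslog_at PREFIX SYSLOG
# Prints the lines of SYSLOG whose timestamp starts with PREFIX,
# for example "Oct 11 09:19" to see everything logged in that minute.
syslog_at() {
    awk -v p="$1" 'index($0, p) == 1' "$2"
}

# Example calls (the Postgres log path is an assumption):
# pg_shutdown_lines /path/to/postgresql.log
# syslog_at 'Oct 11 09:19' /var/log/messages
```

The type of shutdown request in the Postgres log hints at which signal was sent, and the system log lines from the same minute show which units systemd was stopping at that point.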

>
> These are two separate datacenter shutdowns (ctmdist and ctmlinux), each with its own stop script:
>
> Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many arguments.
> Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M application
> Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not running.
> Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many arguments.
> Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping CONTROL-M application
> Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not running.
>


--
Adrian Klaver
[hidden email]