Following are details from the server log files showing what's been going on:
Code:
root pts/1 pool-###-##-##-# Sat Feb 24 13:18:40 2018 still logged in
root pts/0 pool-###-##-##-# Sat Feb 24 13:16:20 2018 still logged in
root pts/1 pool-###-##-##-# Fri Feb 23 20:49:48 2018 - Sat Feb 24 03:37:44 2018 (06:47)
root pts/0 pool-###-##-##-# Fri Feb 23 18:58:05 2018 - Fri Feb 23 20:50:05 2018 (01:52)
root pts/1 pool-###-##-##-# Fri Feb 23 14:42:58 2018 - Fri Feb 23 18:24:10 2018 (03:41)
root pts/0 pool-###-##-##-# Fri Feb 23 14:41:41 2018 - Fri Feb 23 18:22:58 2018 (03:41)
root pts/1 pool-###-##-##-# Thu Feb 22 23:54:23 2018 - Fri Feb 23 00:52:59 2018 (00:58)
root pts/0 pool-###-##-##-# Thu Feb 22 23:52:58 2018 - Fri Feb 23 00:52:45 2018 (00:59)
root pts/0 pool-###-##-##-# Wed Feb 21 19:22:58 2018 - Wed Feb 21 19:23:10 2018 (00:00)
root pts/0 pool-###-##-##-# Wed Feb 21 12:23:52 2018 - Wed Feb 21 12:23:59 2018 (00:00)
root pts/1 pool-###-##-##-# Tue Feb 20 17:06:43 2018 - Wed Feb 21 01:21:08 2018 (08:14)
root pts/0 pool-###-##-##-# Tue Feb 20 16:43:54 2018 - Wed Feb 21 01:23:16 2018 (08:39)
runlevel (to lvl 3) 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
reboot system boot 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
root pts/1 ##-###-###-###.d Sat Feb 17 23:24:29 2018 - Sun Feb 18 00:48:24 2018 (01:23)
root pts/0 ##-###-###-###.d Sat Feb 17 23:21:06 2018 - Sun Feb 18 00:48:03 2018 (01:26)
root pts/0 ##-###-###-###.d Sat Feb 17 22:48:24 2018 - Sat Feb 17 23:19:25 2018 (00:31)
runlevel (to lvl 3) 2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Mon Feb 19 10:15:08 2018 (2+08:03)
reboot system boot 2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Sat Feb 24 13:27:03 2018 (7+11:15)
runlevel (to lvl 3) 2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 17 02:11:54 2018 (04:03)
reboot system boot 2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 24 13:27:03 2018 (7+15:19)
root pts/1 pool-###-##-##-# Wed Feb 14 18:04:21 2018 - Wed Feb 14 18:15:28 2018 (00:11)
root pts/0 pool-###-##-##-# Wed Feb 14 17:45:05 2018 - Wed Feb 14 18:15:34 2018 (00:30)
root pts/0 pool-###-##-##-# Thu Feb 8 05:26:42 2018 - Thu Feb 8 05:51:09 2018 (00:24)
runlevel (to lvl 3) 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot system boot 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down 2.6.32-696.16.1. Thu Feb 8 05:22:31 2018 - Thu Feb 8 05:23:45 2018 (00:01)
runlevel (to lvl 6) 2.6.32-696.16.1. Thu Feb 8 05:22:19 2018 - Thu Feb 8 05:22:31 2018 (00:00)
root pts/0 pool-###-##-##-# Thu Feb 8 03:45:56 2018 - Thu Feb 8 05:09:05 2018 (01:23)
root pts/5 pool-###-##-##-# Thu Feb 8 02:05:21 2018 - Thu Feb 8 05:05:51 2018 (03:00)
root pts/4 pool-###-##-##-# Thu Feb 8 02:04:40 2018 - Thu Feb 8 03:36:24 2018 (01:31)
root pts/3 pool-###-##-##-# Thu Feb 8 02:03:54 2018 - down (03:18)
root pts/2 pool-###-##-##-# Thu Feb 8 01:14:14 2018 - Thu Feb 8 02:32:39 2018 (01:18)
root pts/1 pool-###-##-##-# Thu Feb 8 01:10:49 2018 - Thu Feb 8 02:33:22 2018 (01:22)
root pts/0 pool-###-##-##-# Thu Feb 8 00:10:15 2018 - Thu Feb 8 02:09:24 2018 (01:59)
root pts/0 pool-###-##-##-# Wed Feb 7 22:40:47 2018 - Wed Feb 7 23:02:51 2018 (00:22)
runlevel (to lvl 3) 2.6.32-696.16.1. Wed Feb 7 03:24:10 2018 - Thu Feb 8 05:22:19 2018 (1+01:58)
reboot system boot 2.6.32-696.16.1. Wed Feb 7 03:24:10 2018 - Thu Feb 8 05:22:19 2018 (1+01:58)
runlevel (to lvl 3) 2.6.32-696.16.1. Tue Feb 6 21:20:12 2018 - Wed Feb 7 03:24:10 2018 (06:03)
reboot system boot 2.6.32-696.16.1. Tue Feb 6 21:20:12 2018 - Thu Feb 8 05:22:19 2018 (1+08:02)
root pts/0 pool-###-##-##-# Sat Feb 3 01:21:42 2018 - Sat Feb 3 01:22:13 2018 (00:00)
runlevel (to lvl 3) 2.6.32-696.16.1. Thu Feb 1 20:16:02 2018 - Tue Feb 6 21:20:12 2018 (5+01:04)
reboot system boot 2.6.32-696.16.1. Thu Feb 1 20:16:02 2018 - Thu Feb 8 05:22:19 2018 (6+09:06)
root pts/1 pool-###-##-##-# Thu Dec 28 23:43:22 2017 - Fri Dec 29 01:02:03 2017 (01:18)
root pts/0 pool-###-##-##-# Thu Dec 28 23:36:30 2017 - Fri Dec 29 01:01:55 2017 (01:25)
root pts/0 pool-###-##-##-# Thu Dec 28 22:24:56 2017 - Thu Dec 28 23:36:17 2017 (01:11)
runlevel (to lvl 3) 2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb 1 20:16:02 2018 (34+21:59)
reboot system boot 2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb 8 05:22:19 2018 (41+07:05)
shutdown system down 2.6.32-696.1.1.e Thu Dec 28 22:15:39 2017 - Thu Dec 28 22:16:53 2017 (00:01)
runlevel (to lvl 6) 2.6.32-696.1.1.e Thu Dec 28 22:15:18 2017 - Thu Dec 28 22:15:39 2017 (00:00)
root pts/3 pool-###-##-##-# Thu Dec 28 19:50:37 2017 - down (02:24)
root pts/2 pool-###-##-##-# Thu Dec 28 15:59:54 2017 - Thu Dec 28 22:15:10 2017 (06:15)
root pts/1 pool-###-##-##-# Thu Dec 28 15:49:58 2017 - Thu Dec 28 22:05:46 2017 (06:15)
root pts/0 pool-###-##-##-# Thu Dec 28 15:30:31 2017 - Thu Dec 28 22:05:52 2017 (06:35)
runlevel (to lvl 3) 2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)
reboot system boot 2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)
root pts/0 pool-###-##-##-# Mon Dec 18 20:41:14 2017 - Mon Dec 18 20:45:12 2017 (00:03)
root pts/0 pool-###-##-##-# Mon Dec 18 19:46:20 2017 - Mon Dec 18 20:27:35 2017 (00:41)
root pts/0 pool-###-##-##-# Tue Dec 12 20:46:45 2017 - Tue Dec 12 21:01:20 2017 (00:14)
root pts/0 pool-###-##-##-# Tue Dec 12 16:34:05 2017 - Tue Dec 12 16:43:43 2017 (00:09)
root pts/2 pool-###-##-##-# Mon Dec 11 14:53:00 2017 - Mon Dec 11 20:18:45 2017 (05:25)
root pts/1 pool-###-##-##-# Mon Dec 11 14:52:09 2017 - Mon Dec 11 15:32:06 2017 (00:39)
root pts/0 pool-###-##-##-# Mon Dec 11 14:50:27 2017 - Mon Dec 11 20:18:29 2017 (05:28)
Seven server hard reboots (power disruptions?) on Dec 27, Feb 1, Feb 6, Feb 7, Feb 16, Feb 17 & Feb 19.
In the log above, the problematic reboots are the seven listed here. The reboots on Dec 28 and Feb 8 are what they are supposed to look like. Reboots are supposed to be preceded by shutdown entries (the "shutdown system down" and "runlevel (to lvl 6)" lines), showing a clean/graceful reboot of the server.
When the preceding shutdown entries are missing, that pretty much means the server either crashed or had a power disruption that caused it to power off immediately and reboot.
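The check described above can be sketched in a few lines of Python. This is only a rough illustration, not a tool I actually ran: it assumes the log text comes from something like `last -x -F` (newest entry first, as in the listing above), and the function name `find_unclean_reboots` and the sample data are made up for the example. It flags every "reboot" entry whose next-older entry is not a "shutdown" entry.

```python
import re

# Matches the first full timestamp on a line, e.g. "Mon Feb 19 10:15:08 2018".
TIMESTAMP = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
)


def find_unclean_reboots(lines):
    """Return boot timestamps of reboots with no shutdown entry before them.

    `lines` must be ordered newest first, the way the log above is listed.
    (Hypothetical helper, written for this post only.)
    """
    records = [line.strip() for line in lines if line.strip()]
    unclean = []
    for index, record in enumerate(records):
        if record.split()[0] != "reboot":
            continue
        if index + 1 >= len(records):
            # Oldest entry in the log: nothing earlier to compare against.
            continue
        older = records[index + 1]
        if older.split()[0] != "shutdown":
            match = TIMESTAMP.search(record)
            unclean.append(match.group(0) if match else record)
    return unclean


# A few lines copied from the log above: one unclean and one clean reboot.
sample = """\
reboot system boot 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
root pts/1 ##-###-###-###.d Sat Feb 17 23:24:29 2018 - Sun Feb 18 00:48:24 2018 (01:23)
reboot system boot 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down 2.6.32-696.16.1. Thu Feb 8 05:22:31 2018 - Thu Feb 8 05:23:45 2018 (00:01)
""".splitlines()

print(find_unclean_reboots(sample))  # ['Mon Feb 19 10:15:08 2018']
```

The Feb 8 boot is skipped because the entry right before it is a shutdown entry. The Feb 19 boot is flagged because the entry before it is an ordinary login session. A reboot that happens to be the oldest line in the log is left unclassified, since there is nothing earlier to check it against.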
Now for the possibly good news: The datacenter staff where I colocate the server checked the power distribution unit (PDU) at my request. Below is one of the replies I received about it:
Quote:
The failure indicators on the PDU that I'm seeing are: one of the banks has failed completely and is unusable, and the display on the unit is reading an error rather than displaying its current power usage. These are typically signs that the PDU is heading towards complete failure.
The unit does have a management port, but I'm unsure if it actually logs these types of issues. I'm also hesitant to console into the unit or attempt to reset it, as I have witnessed that cause complete failure once they start going bad.
|
Quote:
In regards to the failing PDU, I will contact management about replacing it, since we would need to schedule a maintenance window with multiple clients in order to swap it out.
|
With that, I think it's very likely the reboots were caused by the failing power distribution unit. We should be in the clear once the datacenter PDU is replaced, but I'll do some additional hardware diagnostic testing afterwards to make sure.