02-20-2018, 06:32 PM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
All fixed up again. Things might be a little slow for a while, though, as I back up and download various files from the server.
|
02-20-2018, 07:03 PM
|
Super Moderator
|
|
Join Date: Aug 2000
Posts: 13,826
|
|
John, thank you for doing all this work! I must admit that I don't know what all a forum entails.
|
02-20-2018, 07:50 PM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
Quote:
Originally Posted by carnation
John, thank you for doing all this work! I must admit that I don't know what all a forum entails.
|
It's not so much a forum issue as it is a server hardware issue. More specifically, the database corruption is being caused by sporadic power cycling / reboots of the server.
I'm not yet sure what's causing that to happen. Maybe I'll get lucky and it will be something on the datacenter's side of things, such as a faulty power distribution unit or similar. Otherwise, if it's the server then it could be the server's power supply unit causing reboots when it hits certain limits or maybe capacitors on the motherboard starting to go bad and causing reboots in specific circumstances. Might even be a failed/failing RAM module.
At this point my plan of action is to determine whether the problem is on the server side or the datacenter side, then proceed from there.
If it's definitely the server going haywire, I'll most likely look into renting a server elsewhere and moving things there for a while, rather than building a new server for GC. I'll likely build a new server for GC again at some point, but now is not the time for that.
GC's current server has been in 24/7 operation since late 2013, so we're getting close to 5 years on this hardware.
|
02-20-2018, 09:35 PM
|
Super Moderator
|
|
Join Date: Jul 2001
Location: On the beach. Well....not really but near it. :0)
Posts: 13,539
|
|
Many Thanks John!
__________________
Sigma Gamma Rho Sorority, Inc. ** Greater Service, Greater Progress Since 1922
|
02-24-2018, 07:06 PM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
Following are details from the server log files showing what's been going on:
Code:
root pts/1 pool-###-##-##-# Sat Feb 24 13:18:40 2018 still logged in
root pts/0 pool-###-##-##-# Sat Feb 24 13:16:20 2018 still logged in
root pts/1 pool-###-##-##-# Fri Feb 23 20:49:48 2018 - Sat Feb 24 03:37:44 2018 (06:47)
root pts/0 pool-###-##-##-# Fri Feb 23 18:58:05 2018 - Fri Feb 23 20:50:05 2018 (01:52)
root pts/1 pool-###-##-##-# Fri Feb 23 14:42:58 2018 - Fri Feb 23 18:24:10 2018 (03:41)
root pts/0 pool-###-##-##-# Fri Feb 23 14:41:41 2018 - Fri Feb 23 18:22:58 2018 (03:41)
root pts/1 pool-###-##-##-# Thu Feb 22 23:54:23 2018 - Fri Feb 23 00:52:59 2018 (00:58)
root pts/0 pool-###-##-##-# Thu Feb 22 23:52:58 2018 - Fri Feb 23 00:52:45 2018 (00:59)
root pts/0 pool-###-##-##-# Wed Feb 21 19:22:58 2018 - Wed Feb 21 19:23:10 2018 (00:00)
root pts/0 pool-###-##-##-# Wed Feb 21 12:23:52 2018 - Wed Feb 21 12:23:59 2018 (00:00)
root pts/1 pool-###-##-##-# Tue Feb 20 17:06:43 2018 - Wed Feb 21 01:21:08 2018 (08:14)
root pts/0 pool-###-##-##-# Tue Feb 20 16:43:54 2018 - Wed Feb 21 01:23:16 2018 (08:39)
runlevel (to lvl 3) 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
reboot system boot 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
root pts/1 ##-###-###-###.d Sat Feb 17 23:24:29 2018 - Sun Feb 18 00:48:24 2018 (01:23)
root pts/0 ##-###-###-###.d Sat Feb 17 23:21:06 2018 - Sun Feb 18 00:48:03 2018 (01:26)
root pts/0 ##-###-###-###.d Sat Feb 17 22:48:24 2018 - Sat Feb 17 23:19:25 2018 (00:31)
runlevel (to lvl 3) 2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Mon Feb 19 10:15:08 2018 (2+08:03)
reboot system boot 2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Sat Feb 24 13:27:03 2018 (7+11:15)
runlevel (to lvl 3) 2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 17 02:11:54 2018 (04:03)
reboot system boot 2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 24 13:27:03 2018 (7+15:19)
root pts/1 pool-###-##-##-# Wed Feb 14 18:04:21 2018 - Wed Feb 14 18:15:28 2018 (00:11)
root pts/0 pool-###-##-##-# Wed Feb 14 17:45:05 2018 - Wed Feb 14 18:15:34 2018 (00:30)
root pts/0 pool-###-##-##-# Thu Feb 8 05:26:42 2018 - Thu Feb 8 05:51:09 2018 (00:24)
runlevel (to lvl 3) 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot system boot 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down 2.6.32-696.16.1. Thu Feb 8 05:22:31 2018 - Thu Feb 8 05:23:45 2018 (00:01)
runlevel (to lvl 6) 2.6.32-696.16.1. Thu Feb 8 05:22:19 2018 - Thu Feb 8 05:22:31 2018 (00:00)
root pts/0 pool-###-##-##-# Thu Feb 8 03:45:56 2018 - Thu Feb 8 05:09:05 2018 (01:23)
root pts/5 pool-###-##-##-# Thu Feb 8 02:05:21 2018 - Thu Feb 8 05:05:51 2018 (03:00)
root pts/4 pool-###-##-##-# Thu Feb 8 02:04:40 2018 - Thu Feb 8 03:36:24 2018 (01:31)
root pts/3 pool-###-##-##-# Thu Feb 8 02:03:54 2018 - down (03:18)
root pts/2 pool-###-##-##-# Thu Feb 8 01:14:14 2018 - Thu Feb 8 02:32:39 2018 (01:18)
root pts/1 pool-###-##-##-# Thu Feb 8 01:10:49 2018 - Thu Feb 8 02:33:22 2018 (01:22)
root pts/0 pool-###-##-##-# Thu Feb 8 00:10:15 2018 - Thu Feb 8 02:09:24 2018 (01:59)
root pts/0 pool-###-##-##-# Wed Feb 7 22:40:47 2018 - Wed Feb 7 23:02:51 2018 (00:22)
runlevel (to lvl 3) 2.6.32-696.16.1. Wed Feb 7 03:24:10 2018 - Thu Feb 8 05:22:19 2018 (1+01:58)
reboot system boot 2.6.32-696.16.1. Wed Feb 7 03:24:10 2018 - Thu Feb 8 05:22:19 2018 (1+01:58)
runlevel (to lvl 3) 2.6.32-696.16.1. Tue Feb 6 21:20:12 2018 - Wed Feb 7 03:24:10 2018 (06:03)
reboot system boot 2.6.32-696.16.1. Tue Feb 6 21:20:12 2018 - Thu Feb 8 05:22:19 2018 (1+08:02)
root pts/0 pool-###-##-##-# Sat Feb 3 01:21:42 2018 - Sat Feb 3 01:22:13 2018 (00:00)
runlevel (to lvl 3) 2.6.32-696.16.1. Thu Feb 1 20:16:02 2018 - Tue Feb 6 21:20:12 2018 (5+01:04)
reboot system boot 2.6.32-696.16.1. Thu Feb 1 20:16:02 2018 - Thu Feb 8 05:22:19 2018 (6+09:06)
root pts/1 pool-###-##-##-# Thu Dec 28 23:43:22 2017 - Fri Dec 29 01:02:03 2017 (01:18)
root pts/0 pool-###-##-##-# Thu Dec 28 23:36:30 2017 - Fri Dec 29 01:01:55 2017 (01:25)
root pts/0 pool-###-##-##-# Thu Dec 28 22:24:56 2017 - Thu Dec 28 23:36:17 2017 (01:11)
runlevel (to lvl 3) 2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb 1 20:16:02 2018 (34+21:59)
reboot system boot 2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb 8 05:22:19 2018 (41+07:05)
shutdown system down 2.6.32-696.1.1.e Thu Dec 28 22:15:39 2017 - Thu Dec 28 22:16:53 2017 (00:01)
runlevel (to lvl 6) 2.6.32-696.1.1.e Thu Dec 28 22:15:18 2017 - Thu Dec 28 22:15:39 2017 (00:00)
root pts/3 pool-###-##-##-# Thu Dec 28 19:50:37 2017 - down (02:24)
root pts/2 pool-###-##-##-# Thu Dec 28 15:59:54 2017 - Thu Dec 28 22:15:10 2017 (06:15)
root pts/1 pool-###-##-##-# Thu Dec 28 15:49:58 2017 - Thu Dec 28 22:05:46 2017 (06:15)
root pts/0 pool-###-##-##-# Thu Dec 28 15:30:31 2017 - Thu Dec 28 22:05:52 2017 (06:35)
runlevel (to lvl 3) 2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)
reboot system boot 2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)
root pts/0 pool-###-##-##-# Mon Dec 18 20:41:14 2017 - Mon Dec 18 20:45:12 2017 (00:03)
root pts/0 pool-###-##-##-# Mon Dec 18 19:46:20 2017 - Mon Dec 18 20:27:35 2017 (00:41)
root pts/0 pool-###-##-##-# Tue Dec 12 20:46:45 2017 - Tue Dec 12 21:01:20 2017 (00:14)
root pts/0 pool-###-##-##-# Tue Dec 12 16:34:05 2017 - Tue Dec 12 16:43:43 2017 (00:09)
root pts/2 pool-###-##-##-# Mon Dec 11 14:53:00 2017 - Mon Dec 11 20:18:45 2017 (05:25)
root pts/1 pool-###-##-##-# Mon Dec 11 14:52:09 2017 - Mon Dec 11 15:32:06 2017 (00:39)
root pts/0 pool-###-##-##-# Mon Dec 11 14:50:27 2017 - Mon Dec 11 20:18:29 2017 (05:28)
Seven server hard reboots (power disruptions?) on Dec 27, Feb 1, Feb 6, Feb 7, Feb 16, Feb 17 & Feb 19.
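In the log above, the problematic reboots (in red) are the "reboot system boot" entries with no "shutdown system down" entry directly beneath them. The reboots highlighted in blue, on Dec 28 and Feb 8, are what they are supposed to look like. Reboots are supposed to be preceded by shutdown messages (listed just below the reboot entry, since the log shows the newest entries first), showing a clean/graceful reboot of the server.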
With the preceding shutdown log messages missing, that pretty much means the server either crashed or had a power disruption that caused the server to immediately power off and reboot.
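For anyone who wants to run the same check automatically, here is a minimal sketch in Python. It assumes the log is in the format shown above, which looks like the output of `last -x -F` (the exact command used is an assumption), and it uses a simple heuristic: a "reboot" entry counts as clean only if the line directly below it is a "shutdown" entry. The function name `find_unclean_boots` and the abridged `SAMPLE` excerpt are made up for illustration.

```python
from datetime import datetime

# Abridged excerpt of the log above (newest entries first).
SAMPLE = """\
runlevel (to lvl 3) 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
reboot system boot 2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
root pts/1 ##-###-###-###.d Sat Feb 17 23:24:29 2018 - Sun Feb 18 00:48:24 2018 (01:23)
runlevel (to lvl 3) 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot system boot 2.6.32-696.20.1. Thu Feb 8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down 2.6.32-696.16.1. Thu Feb 8 05:22:31 2018 - Thu Feb 8 05:23:45 2018 (00:01)
runlevel (to lvl 6) 2.6.32-696.16.1. Thu Feb 8 05:22:19 2018 - Thu Feb 8 05:22:31 2018 (00:00)
"""


def find_unclean_boots(last_output: str) -> list[datetime]:
    """Return boot times of 'reboot' entries with no 'shutdown' entry directly below."""
    lines = [ln.strip() for ln in last_output.splitlines() if ln.strip()]
    unclean = []
    for i, line in enumerate(lines):
        if not line.startswith("reboot"):
            continue
        if i + 1 >= len(lines):
            # Oldest line in the excerpt: nothing below it, so we cannot tell.
            continue
        if lines[i + 1].startswith("shutdown"):
            # Heuristic: a shutdown record just below means a graceful reboot.
            continue
        fields = line.split()
        # fields[4] is the weekday; fields[5:9] are month, day, time, year.
        stamp = " ".join(fields[5:9])
        unclean.append(datetime.strptime(stamp, "%b %d %H:%M:%S %Y"))
    return unclean


for boot in find_unclean_boots(SAMPLE):
    print("No clean shutdown before the boot at", boot)
```

On this excerpt the Feb 19 boot is flagged and the Feb 8 boot is not, which matches the red/blue marking described above. The month names are parsed with `%b`, so this assumes an English locale.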
Now for the possibly good news: the datacenter staff where I colocate the server checked on the power distribution unit at my request. Below is one of the replies I received about it:
Quote:
The failure indicators on the PDU that I'm seeing are that one of the banks has failed completely and is unusable, and the display on the unit is reading an error rather than displaying its current power usage. These are typically signs that the PDU is heading towards complete failure.
The unit does have a management port, but I'm unsure if it actually logs these types of issues. I'm also hesitant to console into the unit or attempt to reset it, as I have witnessed that cause complete failure once they start going bad.
|
Quote:
With regard to the failing PDU, I will contact management about replacing it, since we would need to schedule a maintenance window with multiple clients in order to swap it out.
|
With that, I think it's very likely the reboots were caused by the failing power distribution unit. We should be in the clear once the datacenter PDU is replaced, but I'll do some additional hardware diagnostic testing afterwards to make sure.
|
02-24-2018, 09:59 PM
|
Super Moderator
|
|
Join Date: Jul 2001
Location: On the beach. Well....not really but near it. :0)
Posts: 13,539
|
|
And I say again, Thank you John for keeping the ship sailing.
__________________
Sigma Gamma Rho Sorority, Inc. ** Greater Service, Greater Progress Since 1922
|
02-25-2018, 06:27 AM
|
Super Moderator
|
|
Join Date: Sep 2003
Location: naples, florida
Posts: 18,422
|
|
Ditto! Many thanks to you John.
__________________
I live in Fantasyland and I have waterfront property.
|
02-25-2018, 09:55 AM
|
GreekChat Member
|
|
Join Date: Feb 2007
Location: Indiana
Posts: 4,277
|
|
THANK YOU for your dedication to the good in Greek Life and your work to allow us to voice our opinions and—hopefully—offer some positive information and assistance to those who are seeking to join our ranks.
|
02-26-2018, 09:47 AM
|
GreekChat Member
|
|
Join Date: Apr 2001
Location: Rockville,MD,USA
Posts: 3,502
|
|
Thank you very muchly!
__________________
Because "undergrads, please abandon your national policies and make something up" will end well --KnightShadow
|
02-26-2018, 08:00 PM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
Happened again...
Code:
runlevel (to lvl 3) 2.6.32-696.20.1. Mon Feb 26 08:15:42 2018 - Mon Feb 26 19:56:46 2018 (11:41)
reboot system boot 2.6.32-696.20.1. Mon Feb 26 08:15:42 2018 - Mon Feb 26 19:56:46 2018 (11:41)
Taking GC offline for maybe 30 minutes or so to check on things.
|
02-26-2018, 08:43 PM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
Whenever a spontaneous reboot occurs, it usually leaves some tables crashed and others not closed properly. Fortunately, as far as I'm aware, so far none of this has caused any major database problems.
Anyhow, everything should be back in order and functioning properly again.
|
02-26-2018, 10:27 PM
|
GreekChat Member
|
|
Join Date: May 2007
Location: Michigan
Posts: 4,417
|
|
Thanks for everything, John!
__________________
Gamma Phi Beta
|
02-27-2018, 08:19 PM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
Had another spontaneous server hard reboot today...
|
03-02-2018, 02:24 AM
|
Administrator
|
|
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,151
|
|
2 spontaneous reboots on March 1st. One around 9:30 AM and the other around 10:00 PM. That's what took GC offline for a while last night. Some database issues just slow things down a bunch but others take the site completely offline.
Just finished fixing things up.
With the reboots happening more often, I might move GC to a different server temporarily until things with the current server or datacenter equipment are sorted out.
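To put rough numbers on how often the hard reboots have been occurring, the gaps between boot timestamps can be computed directly. This is a minimal sketch: the timestamps are copied from the log excerpts posted earlier in this thread, the Feb 27 and March 1 reboots are left out because no exact timestamps were posted for them, and the helper name `gaps_between` is made up for illustration.

```python
from datetime import datetime

# Boot times of the hard reboots, copied from the logs earlier in this thread.
HARD_REBOOTS = [
    "Dec 27 11:31:10 2017",
    "Feb 1 20:16:02 2018",
    "Feb 6 21:20:12 2018",
    "Feb 7 03:24:10 2018",
    "Feb 16 22:07:56 2018",
    "Feb 17 02:11:54 2018",
    "Feb 19 10:15:08 2018",
    "Feb 26 08:15:42 2018",
]


def gaps_between(stamps):
    """Return the time elapsed between each pair of consecutive boots."""
    times = sorted(datetime.strptime(s, "%b %d %H:%M:%S %Y") for s in stamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


for gap in gaps_between(HARD_REBOOTS):
    print(gap)
```

The gaps are printed oldest first, one per line, as Python `timedelta` values.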
|
03-02-2018, 06:33 PM
|
Moderator
|
|
Join Date: Feb 2002
Location: You're looking at Planet Earth
Posts: 6,541
|
|
Thanks for keeping us updated, John. Gotta love hardware!
__________________
"If you want to criticize my methods, fine. But you can keep your snide remarks to yourself. And while you're at it, don't criticize my methods." Rupert Giles, BtVS
|