GreekChat.com Forums (http://www.greekchat.com/gcforums/index.php)
-   Greek Life (http://www.greekchat.com/gcforums/forumdisplay.php?f=24)
-   -   (Resolved) GreekChat Recent Outages & Server Issues (http://www.greekchat.com/gcforums/showthread.php?t=240332)

John 12-28-2017 11:30 PM

(Resolved) GreekChat Recent Outages & Server Issues
 
For a good part of the day today (and possibly part of yesterday) GC was offline due to multiple crashed database tables. I usually check GC's status on a regular basis but found out sooner thanks to SAEalumnus emailing me about it.

Unsure exactly what caused the issue. The server did reboot around 24 hours ago so it may have been a momentary power disruption which caused crashed tables and/or corruption issues in several databases.

In any case, I spent the majority of the day and evening backing up the databases, checking and repairing them, and then backing up again afterward.

Everything seems back in order now, but if anyone notices something that is not functioning correctly with GC please let me know and I will check into it.

Titchou 12-29-2017 07:23 AM

Thanks, John!!!!

carnation 12-29-2017 08:22 AM

Thank you so much!

Kevin 12-29-2017 10:27 AM

Thanks John.

AZ-AlphaXi 12-29-2017 10:47 AM

Thank you!!

naraht 12-29-2017 10:59 AM

Thank you so much for your work.

Sciencewoman 12-29-2017 05:53 PM

Thanks, John!! And, Happy New Year!

Cheerio 12-29-2017 06:38 PM

Thank you to John and to all the mods/super mods who make GC stand out in a good way!

NinjaPoodle 12-29-2017 08:49 PM

Thanks!

Cheerio 12-29-2017 09:04 PM

But in looking at all the sudden "new" user names listed (on the left in the box) on GC right now, John and the mods may have their hands full repairing any damage or multi-posts the spammers can muster. Good luck to our wonderful mods!

AlwaysSAI 12-29-2017 09:16 PM

I don't post as often as I did a few years ago, but I'm still grateful for all your work on GC, John. Thank you!!!!!

aephi alum 12-29-2017 11:14 PM

John, thank you for all your hard work in keeping this site up and running. I know firsthand what a PITA database corruption is - even with a well-planned backup-and-restore plan, the restore is no picnic.

AnchorAlumna 01-03-2018 12:40 AM

Thank you, John!

John 02-08-2018 05:49 AM

Seems we may have had a repeat of whatever caused the outage at the end of December. ZTAngel & carnation reported some errors and problems logging on to GC in the past 2 days or so.

After looking into everything and getting it all sorted out, it's basically the same as what happened 5 or 6 weeks ago. The server had been rebooted recently (possibly a hard boot due to power disruption, although I'm not certain about it), with resulting database corruption. This time, though, GC's database wasn't corrupted badly enough to take the site completely offline, but it was still causing problems.

Issues have been corrected... everything should be working normally again.

FSUZeta 02-08-2018 06:13 AM

Thank you John.

carnation 02-08-2018 06:30 AM

Thanks! I love to check GC during the day.

AZ-AlphaXi 02-08-2018 08:50 AM

John .. as always, thank you for your work and support.

Xidelt 02-08-2018 09:03 AM

Thank you so much for this! I really enjoy reading Greekchat and keeping up on what's going on in the fraternity/sorority world.

AZTheta 02-08-2018 11:13 AM

Thanks again, John. I thought it was something with Chrome or Safari. You fixed it!

IndianaSigKap 02-08-2018 04:59 PM

Poor reading skills for $500, Alex.

At first, I read the subject line as "GreekChat OUTRAGE today" not outage. :D

Cheerio 02-08-2018 05:08 PM

Thank you for getting greekchat up and running again, John. :) Happy to know the problem (in my case, for three days) was not on my end of the interwebs.

Sciencewoman 02-08-2018 08:14 PM

Thank you, John!

John 02-18-2018 12:35 AM

I've been away from home for a few days. Returning tomorrow and figured I would check on things here.

As my luck would have it... we had another problem with GC's server yesterday. I just finished up repairs and sorting things out. Things should be working normally again. Fortunately my tablet was up to the task.

Sorry for this additional outage over the past day.

With this becoming a regular problem, I will post again in the next few days with a plan on how to handle it.

In addition, there have been some changes/improvements that I've been wanting to make to GC for a while now and have been planning for that to begin happening very soon. This is unrelated to the recent server problems. Later this week I'll be posting more info regarding that as well.

FSUZeta 02-18-2018 07:00 AM

Thank you John.

andthen 02-18-2018 09:05 AM

Thanks for all that you do that allows us to post.

PGD-GRAD 02-18-2018 09:52 AM

THANK YOU for keeping our channels open.....

AZTheta 02-18-2018 11:09 AM

thanks, John. I thought it was a problem on my end b/c AZ-AlphaXi said she wasn't having any problems yesterday logging in/on. Hope you can resolve it easily without too much headache.

APhi4Ever 02-18-2018 06:22 PM

Ty so much! I appreciate all you do.

Sciencewoman 02-18-2018 07:15 PM

Thanks, John!!!!!

John 02-20-2018 05:05 PM

Server problems appear to have happened again. The server was power cycled sometime in the past day (not by me). Similar to one of the previous times, some GCers are able to access the site, but others are probably just seeing a database error message or something similar.

Working on this now. Should be sorted out soon.

John 02-20-2018 06:32 PM

All fixed up again. Things might be a little slow for a while, though, as I back up and download various files from the server.

carnation 02-20-2018 07:03 PM

John, thank you for doing all this work! I must admit that I don't know what all a forum entails.

John 02-20-2018 07:50 PM

Quote:

Originally Posted by carnation (Post 2454070)
John, thank you for doing all this work! I must admit that I don't know what all a forum entails.

It's not so much a forum issue as it is a server hardware issue. More specifically, database corruption issues being caused by sporadic power cycling / reboots of the server.

I'm not yet sure what's causing that to happen. Maybe I'll get lucky and it will be something on the datacenter's side of things, such as a faulty power distribution unit or similar. Otherwise, if it's the server then it could be the server's power supply unit causing reboots when it hits certain limits or maybe capacitors on the motherboard starting to go bad and causing reboots in specific circumstances. Might even be a failed/failing RAM module.

At this point my plan of action is to determine whether the problem is on the server side or the datacenter side, then proceed from there.

If it's definitely the server going haywire I'll most likely look into renting a server elsewhere and move things there for a while, rather than building a new server for GC. I'll likely build a new server for GC again at some point, but now is not the time for that.

GC's current server has been in 24/7 operation since late 2013, so we're getting close to 5 years on this hardware.

NinjaPoodle 02-20-2018 09:35 PM

Many Thanks John!

John 02-24-2018 07:06 PM

Following are details from the server log files showing what's been going on:

Code:


root    pts/1        pool-###-##-##-# Sat Feb 24 13:18:40 2018  still logged in
root    pts/0        pool-###-##-##-# Sat Feb 24 13:16:20 2018  still logged in
root    pts/1        pool-###-##-##-# Fri Feb 23 20:49:48 2018 - Sat Feb 24 03:37:44 2018  (06:47)
root    pts/0        pool-###-##-##-# Fri Feb 23 18:58:05 2018 - Fri Feb 23 20:50:05 2018  (01:52)
root    pts/1        pool-###-##-##-# Fri Feb 23 14:42:58 2018 - Fri Feb 23 18:24:10 2018  (03:41)
root    pts/0        pool-###-##-##-# Fri Feb 23 14:41:41 2018 - Fri Feb 23 18:22:58 2018  (03:41)
root    pts/1        pool-###-##-##-# Thu Feb 22 23:54:23 2018 - Fri Feb 23 00:52:59 2018  (00:58)
root    pts/0        pool-###-##-##-# Thu Feb 22 23:52:58 2018 - Fri Feb 23 00:52:45 2018  (00:59)
root    pts/0        pool-###-##-##-# Wed Feb 21 19:22:58 2018 - Wed Feb 21 19:23:10 2018  (00:00)
root    pts/0        pool-###-##-##-# Wed Feb 21 12:23:52 2018 - Wed Feb 21 12:23:59 2018  (00:00)
root    pts/1        pool-###-##-##-# Tue Feb 20 17:06:43 2018 - Wed Feb 21 01:21:08 2018  (08:14)
root    pts/0        pool-###-##-##-# Tue Feb 20 16:43:54 2018 - Wed Feb 21 01:23:16 2018  (08:39)
runlevel (to lvl 3)  2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)
reboot  system boot  2.6.32-696.20.1. Mon Feb 19 10:15:08 2018 - Sat Feb 24 13:27:03 2018 (5+03:11)

root    pts/1        ##-###-###-###.d Sat Feb 17 23:24:29 2018 - Sun Feb 18 00:48:24 2018  (01:23)
root    pts/0        ##-###-###-###.d Sat Feb 17 23:21:06 2018 - Sun Feb 18 00:48:03 2018  (01:26)
root    pts/0        ##-###-###-###.d Sat Feb 17 22:48:24 2018 - Sat Feb 17 23:19:25 2018  (00:31)
runlevel (to lvl 3)  2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Mon Feb 19 10:15:08 2018 (2+08:03)
reboot  system boot  2.6.32-696.20.1. Sat Feb 17 02:11:54 2018 - Sat Feb 24 13:27:03 2018 (7+11:15)

runlevel (to lvl 3)  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 17 02:11:54 2018  (04:03)
reboot  system boot  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 24 13:27:03 2018 (7+15:19)

root    pts/1        pool-###-##-##-# Wed Feb 14 18:04:21 2018 - Wed Feb 14 18:15:28 2018  (00:11)
root    pts/0        pool-###-##-##-# Wed Feb 14 17:45:05 2018 - Wed Feb 14 18:15:34 2018  (00:30)
root    pts/0        pool-###-##-##-# Thu Feb  8 05:26:42 2018 - Thu Feb  8 05:51:09 2018  (00:24)
runlevel (to lvl 3)  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot  system boot  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down  2.6.32-696.16.1. Thu Feb  8 05:22:31 2018 - Thu Feb  8 05:23:45 2018  (00:01)
runlevel (to lvl 6)  2.6.32-696.16.1. Thu Feb  8 05:22:19 2018 - Thu Feb  8 05:22:31 2018  (00:00)

root    pts/0        pool-###-##-##-# Thu Feb  8 03:45:56 2018 - Thu Feb  8 05:09:05 2018  (01:23)
root    pts/5        pool-###-##-##-# Thu Feb  8 02:05:21 2018 - Thu Feb  8 05:05:51 2018  (03:00)
root    pts/4        pool-###-##-##-# Thu Feb  8 02:04:40 2018 - Thu Feb  8 03:36:24 2018  (01:31)
root    pts/3        pool-###-##-##-# Thu Feb  8 02:03:54 2018 - down                      (03:18)
root    pts/2        pool-###-##-##-# Thu Feb  8 01:14:14 2018 - Thu Feb  8 02:32:39 2018  (01:18)
root    pts/1        pool-###-##-##-# Thu Feb  8 01:10:49 2018 - Thu Feb  8 02:33:22 2018  (01:22)
root    pts/0        pool-###-##-##-# Thu Feb  8 00:10:15 2018 - Thu Feb  8 02:09:24 2018  (01:59)
root    pts/0        pool-###-##-##-# Wed Feb  7 22:40:47 2018 - Wed Feb  7 23:02:51 2018  (00:22)
runlevel (to lvl 3)  2.6.32-696.16.1. Wed Feb  7 03:24:10 2018 - Thu Feb  8 05:22:19 2018 (1+01:58)
reboot  system boot  2.6.32-696.16.1. Wed Feb  7 03:24:10 2018 - Thu Feb  8 05:22:19 2018 (1+01:58)

runlevel (to lvl 3)  2.6.32-696.16.1. Tue Feb  6 21:20:12 2018 - Wed Feb  7 03:24:10 2018  (06:03)
reboot  system boot  2.6.32-696.16.1. Tue Feb  6 21:20:12 2018 - Thu Feb  8 05:22:19 2018 (1+08:02)

root    pts/0        pool-###-##-##-# Sat Feb  3 01:21:42 2018 - Sat Feb  3 01:22:13 2018  (00:00)
runlevel (to lvl 3)  2.6.32-696.16.1. Thu Feb  1 20:16:02 2018 - Tue Feb  6 21:20:12 2018 (5+01:04)
reboot  system boot  2.6.32-696.16.1. Thu Feb  1 20:16:02 2018 - Thu Feb  8 05:22:19 2018 (6+09:06)

root    pts/1        pool-###-##-##-# Thu Dec 28 23:43:22 2017 - Fri Dec 29 01:02:03 2017  (01:18)
root    pts/0        pool-###-##-##-# Thu Dec 28 23:36:30 2017 - Fri Dec 29 01:01:55 2017  (01:25)
root    pts/0        pool-###-##-##-# Thu Dec 28 22:24:56 2017 - Thu Dec 28 23:36:17 2017  (01:11)
runlevel (to lvl 3)  2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb  1 20:16:02 2018 (34+21:59)
reboot  system boot  2.6.32-696.16.1. Thu Dec 28 22:16:53 2017 - Thu Feb  8 05:22:19 2018 (41+07:05)
shutdown system down  2.6.32-696.1.1.e Thu Dec 28 22:15:39 2017 - Thu Dec 28 22:16:53 2017  (00:01)
runlevel (to lvl 6)  2.6.32-696.1.1.e Thu Dec 28 22:15:18 2017 - Thu Dec 28 22:15:39 2017  (00:00)

root    pts/3        pool-###-##-##-# Thu Dec 28 19:50:37 2017 - down                      (02:24)
root    pts/2        pool-###-##-##-# Thu Dec 28 15:59:54 2017 - Thu Dec 28 22:15:10 2017  (06:15)
root    pts/1        pool-###-##-##-# Thu Dec 28 15:49:58 2017 - Thu Dec 28 22:05:46 2017  (06:15)
root    pts/0        pool-###-##-##-# Thu Dec 28 15:30:31 2017 - Thu Dec 28 22:05:52 2017  (06:35)
runlevel (to lvl 3)  2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)
reboot  system boot  2.6.32-696.1.1.e Wed Dec 27 11:31:10 2017 - Thu Dec 28 22:15:18 2017 (1+10:44)

root    pts/0        pool-###-##-##-# Mon Dec 18 20:41:14 2017 - Mon Dec 18 20:45:12 2017  (00:03)
root    pts/0        pool-###-##-##-# Mon Dec 18 19:46:20 2017 - Mon Dec 18 20:27:35 2017  (00:41)
root    pts/0        pool-###-##-##-# Tue Dec 12 20:46:45 2017 - Tue Dec 12 21:01:20 2017  (00:14)
root    pts/0        pool-###-##-##-# Tue Dec 12 16:34:05 2017 - Tue Dec 12 16:43:43 2017  (00:09)
root    pts/2        pool-###-##-##-# Mon Dec 11 14:53:00 2017 - Mon Dec 11 20:18:45 2017  (05:25)
root    pts/1        pool-###-##-##-# Mon Dec 11 14:52:09 2017 - Mon Dec 11 15:32:06 2017  (00:39)
root    pts/0        pool-###-##-##-# Mon Dec 11 14:50:27 2017 - Mon Dec 11 20:18:29 2017  (05:28)

Seven server hard reboots (power disruptions?) on Dec 27, Feb 1, Feb 6, Feb 7, Feb 16, Feb 17 & Feb 19.

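In the log above, the problematic reboots are the "reboot system boot" entries that have no "shutdown system down" entry logged just before them (the log is listed newest-first, so that entry would appear directly beneath the reboot line). The two reboots that do have one, on Dec 28 and Feb 8, are what they are supposed to look like. Reboots are supposed to be preceded by shutdown messages, showing a clean/graceful reboot of the server.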

With the preceding shutdown log messages missing, that pretty much means the server either crashed or had a power disruption that caused the server to immediately power off and reboot.
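The check described above can be sketched in a few lines of Python. This is only an illustration, assuming the log has been saved as plain text in the same newest-first layout as the log above. The function name `classify_reboots` and the `SAMPLE` excerpt (copied from that log) are hypothetical names used for this example, not something that runs on GC's server.

```python
import re

# Matches the first timestamp on a line, e.g. "Thu Feb  8 05:23:45 2018".
TIMESTAMP = re.compile(
    r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
)


def classify_reboots(log_text):
    """Return (timestamp, is_clean) for each "reboot" record, newest first.

    Assumption: records are listed newest-first, so a clean reboot has a
    "shutdown" record as the very next non-blank line beneath it.
    """
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    results = []
    for i, line in enumerate(lines):
        if not line.startswith("reboot"):
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # not in the expected layout; skip rather than guess
        stamp = " ".join(match.group(0).split())  # collapse "Feb  8" padding
        # If the log is cut off right after a reboot record, there is no
        # following line and the reboot is reported as not clean.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        results.append((stamp, following.startswith("shutdown")))
    return results


SAMPLE = """\
reboot  system boot  2.6.32-696.20.1. Fri Feb 16 22:07:56 2018 - Sat Feb 24 13:27:03 2018 (7+15:19)

root    pts/0        pool-###-##-##-# Thu Feb  8 05:26:42 2018 - Thu Feb  8 05:51:09 2018  (00:24)
runlevel (to lvl 3)  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Fri Feb 16 22:07:56 2018 (8+16:44)
reboot  system boot  2.6.32-696.20.1. Thu Feb  8 05:23:45 2018 - Sat Feb 24 13:27:03 2018 (16+08:03)
shutdown system down  2.6.32-696.16.1. Thu Feb  8 05:22:31 2018 - Thu Feb  8 05:23:45 2018  (00:01)
"""

for stamp, clean in classify_reboots(SAMPLE):
    print(stamp, "clean" if clean else "NO SHUTDOWN LOGGED")
```

In this excerpt the Feb 16 reboot is flagged because the next record beneath it is a login session rather than a shutdown, while the Feb 8 reboot counts as clean.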

Now for the possibly good news: The datacenter staff where I colocate the server checked on the power distribution unit at my request. Below are two of the replies I received about it:

Quote:

The failure indicators on the PDU that I'm seeing are: one of the banks has failed completely and is unusable, and the display on the unit is reading "error" rather than displaying its current power usage. These are typically signs that the PDU is heading towards complete failure.

The unit does have a management port, but I'm unsure if it actually logs these types of issues. I'm also hesitant to console the unit or attempt to reset it, as I have witnessed that cause complete failure once they start going bad.

Quote:

In regards to the failing PDU, I will contact management about replacing it, since we would need to schedule a maintenance window with multiple clients in order to swap it out.

With that, I think it's very likely the reboots were caused by the failing power distribution unit. We should be in the clear once the datacenter PDU is replaced, but I'll do some additional hardware diagnostic testing afterwards to make sure.

NinjaPoodle 02-24-2018 09:59 PM

And I say again, Thank you John for keeping the ship sailing.

FSUZeta 02-25-2018 06:27 AM

Ditto! Many thanks to you John.

PGD-GRAD 02-25-2018 09:55 AM

THANK YOU for your dedication to the good in Greek Life and your work to allow us to voice our opinions and—hopefully—offer some positive information and assistance to those who are seeking to join our ranks.

naraht 02-26-2018 09:47 AM

Thank you very muchly!

John 02-26-2018 08:00 PM

Happened again...

Code:

runlevel (to lvl 3)  2.6.32-696.20.1. Mon Feb 26 08:15:42 2018 - Mon Feb 26 19:56:46 2018  (11:41)
reboot  system boot  2.6.32-696.20.1. Mon Feb 26 08:15:42 2018 - Mon Feb 26 19:56:46 2018  (11:41)

Taking GC offline for maybe 30 minutes or so to check on things.


All times are GMT -4. The time now is 07:10 AM.
