Brian Enos's Forums... Maku mozo!

USPSA Web Server Issues



I had to bring the USPSA web server down for some emergency maintenance.

Strange things had been happening over the past couple of days. While I was able to fix isolated problems, I suspected some broader problem and ran a disk utility ("fsck" for you Linux/Unix weenies) in "diagnostic" mode. Yikes!!! - It reported numerous internal inconsistencies in the file system.

I am currently running this utility in "repair" mode, which will take a while, and requires taking the server offline. Once it completes, I'll have a better idea where we are, and whether there is a hardware problem with the drive (if the utility corrects things, I will run the diagnostic daily for a few weeks just to make sure we stay in good shape).
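For anyone wondering how a daily diagnostic run like that could be checked automatically, here is a rough sketch. It assumes the standard fsck exit status, which the fsck(8) man page documents as a sum of bit flags, and the "-n" option that asks the checker to report problems without repairing them. The device name and the function names are made up for illustration; this is not the script used on the USPSA server.

```python
import subprocess

# Exit-status bit flags as documented in the fsck(8) man page.
FSCK_EXIT_FLAGS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

def describe_fsck_status(status):
    """Translate an fsck exit status (a sum of flags) into readable messages."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_FLAGS.items() if status & bit]

def run_readonly_check(device):
    """Run fsck in report-only mode on a device (hypothetical helper).

    Needs root, and the file system should be unmounted or mounted
    read-only for the result to be meaningful.
    """
    result = subprocess.run(["fsck", "-n", device])
    return describe_fsck_status(result.returncode)

# Decoding a status without touching any disk:
print(describe_fsck_status(4))
```

A cron job could call run_readonly_check("/dev/sdb1") (a placeholder device) once a day and mail the result whenever it is anything other than "no errors".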

I'll post any updates to this topic as they become available. "boudrie.com" is on the same server, so you can PM me here if you need to reach me ... but don't bother with general "when is it going to be up" questions - I prefer to spend my energy fixing the problem rather than talking about it (and I'll post any new info here).

Rob


The disk on the USPSA server is not in good shape. I ran further tests, and the problems with the drive are beyond the ability of the "fsck" utility to self-heal.

I suspect this was caused by a hardware problem on the drive. I am having the server farm replace the drive, and will restore the entire server from backups.

We will probably be down for a day or so (although there may be periods where it is up in a semi-crippled state) - a backup restore will take a number of hours to run. Fortunately, I believe we are in pretty good shape with regard to backups. (And yes, that includes the other sites on this server - including several area sites.)

I was able to snapshot the USPSA databases before doing the work, so all on-line squadding from the various matches is sitting safely on a development machine in my basement, and I expect a 100.00% restore of that information.

Rob


I thought I'd heard the USPSA server disks were on a RAID array? Was this problem bad enough to break the array?

Even so, RAID can ONLY repair damage it can detect; if a signature match cannot be found for a compare, then the data will be lost.

If that's the case and the issue wasn't caught sooner, I'd be more inclined to think it's a controller failure than a disk failure causing the corruption.


The USPSA server is not on a RAID array (we were back when we actually owned a box at Shooters.com).

There are two reasons for this:

1) Our server farm is phasing out hardware RAID boxes

2) The RAID drives are dependent on the particular type of system. If a system develops a motherboard problem, and we have RAID drives requiring a particular type of controller, we're stuck if the server farm is out of that type of server and cannot find us another system.

But, things are not that bad:

1) We have a second drive in the box, with regular backups. The most recent one I have which precedes any of these problems is from 5/11.

2) I was able to grab a snapshot of the database tables, match results, classifier scores, and uploaded data while the system limped along. This means that I should be able to recover with virtually no loss of web server data.

3) I am probably going to set up a database replication server. This is not RAID, but a networked database server which will stay in sync by ongoing communication with the main server.

It's all a matter of money. If I had my dream system, we'd be running switched fibre to an industrial strength storage array (the kind that calls the factory for service if something starts to act up, and the first indication of the problem is the repair person at the door with a replacement drive). But... it's all a matter of balancing cost vs. benefit.

Also, as I know from personal experience, some low end RAID systems merely give you the illusion you have redundancy (been there, done that :).

Another thing I could consider is using Linux software based RAID, but it may not be practical to install it on the box we currently rent.

Rob


2) The RAID drives are dependent on the particular type of system. If a system develops a motherboard problem, and we have RAID drives requiring a particular type of controller, we're stuck if the server farm is out of that type of server and cannot find us another system.

HW RAID seems to work best/safest in a separate, self-contained storage subsystem....

3) I am probably going to set up a database replication server. This is not RAID, but a networked database server which will stay in sync by ongoing communication with the main server.

Does this protect from corruption in the main DB instance?? My understanding of DB replication is that, depending on implementation and circumstances, you might propagate data corruption issues between the instances?? Obviously, it doesn't replace backups, but allows you to have a hot spare server...

Also, as I know from personal experience, some low end RAID systems merely give you the illusion you have redundancy (been there, done that :).

Heh heh - I've always wondered about that....

Another thing I could consider is using Linux software based RAID, but it may not be practical to install it on the box we currently rent.

If you go there, my experience suggests that looking at mirrors is a more efficient setup (vs. a software based RAID-5 or something), at least in terms of processing horsepower on the host system. It does require more disk - but with as big as spindles have gotten, two disks mirrored to each other might be plenty of disk space....
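To put rough numbers on the "requires more disk" point, here is a small sketch of the usable-capacity arithmetic for a two-disk mirror versus RAID 5. The 250 GB disk size is only an example, and the function names are invented for illustration.

```python
def mirror_usable_gb(disk_gb):
    """Two-disk mirror: each disk holds a full copy, so usable space is one disk."""
    return disk_gb

def raid5_usable_gb(disk_gb, disks):
    """RAID 5: one disk's worth of space across the set is spent on parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return disk_gb * (disks - 1)

def overhead_fraction(usable_gb, disk_gb, disks):
    """Share of raw capacity that goes to redundancy rather than data."""
    raw = disk_gb * disks
    return (raw - usable_gb) / raw

# Example disk size (an assumption, not the actual server's drives).
DISK_GB = 250
print(mirror_usable_gb(DISK_GB), overhead_fraction(mirror_usable_gb(DISK_GB), DISK_GB, 2))
print(raid5_usable_gb(DISK_GB, 3), overhead_fraction(raid5_usable_gb(DISK_GB, 3), DISK_GB, 3))
```

The mirror gives up half of the raw space and three-disk RAID 5 gives up a third, but the mirror never has to compute parity on the host CPU, which is the processing cost mentioned above.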


Database replication would replicate any data corruption, but would protect against hardware failure on the main server.

It would not provide a hot spare, as the second server is on a secondary network accessible only once you are on a machine at the server farm - but it does protect the data.
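To make the corruption point concrete, here is a toy sketch (not a real database, and not the setup planned for the USPSA server). It assumes the simplest possible model: every write applied to the primary is shipped unchanged to the replica. The class name, keys, and values are all invented for illustration.

```python
import copy

class ToyServer:
    """Stand-in for a database server; rows is just a dict."""
    def __init__(self):
        self.rows = {}

def replicated_write(primary, replica, key, value):
    """Apply a write to the primary and ship the same write to the replica."""
    primary.rows[key] = value
    replica.rows[key] = value

primary, replica = ToyServer(), ToyServer()
replicated_write(primary, replica, "squad_12", "good data")
backup = copy.deepcopy(primary.rows)  # point-in-time backup taken here

# A corrupted write is faithfully copied to the replica...
replicated_write(primary, replica, "squad_12", "\x00\x00garbage")
corruption_replicated = replica.rows["squad_12"] == primary.rows["squad_12"]

# ...so only the earlier backup still holds the good value.
backup_still_good = backup["squad_12"] == "good data"

# If the primary's drive dies, the replica still has every row it was sent.
primary.rows = None
replica_survives = "squad_12" in replica.rows
```

In this model the replica covers a dead drive on the main server, while the point-in-time backup is what covers bad data, which is why replication does not replace backups.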

I'm a big fan of separate RAID boxes. I usually use an Emulex fibre card to connect to arrays over switched fibre, but iSCSI shows promise. I just wish EV1 would get some enterprise class arrays and rent out LUNs.


The server farm has installed a new hard drive in the server, and is currently doing the operating system and web server software install.

I will start the restores when I get home tonight, so we should be back in operation tomorrow.

This server also has most of the area web sites (I think it has 1, 2, 3, 6, 7 and 8). Those webmasters should relax, as I will be restoring a backup of those sites from May 11. Ditto for the Infinity Firearms and JP Rifles web sites (each of those companies pays for space on the server, significantly reducing USPSA's cost).


EV1 isn't known for high quality stuff; if you're wanting that, Verio or The Planet is more of a place to host. Good enough though... ever given thought to colo rather than leasing a server? (If not already colo.)

On a side note, RAID 5 would be the best way to go, with at least 3 disks on, say, a 3Ware card (if you were using SATA 150). That way, if any physical hardware issues arise with any of the disks, the bad one could easily be swapped for a replacement and you can rebuild the array without much hassle.

----

On a side note, since the acronym "RAID" is being thrown around, it stands for Redundant Array of Independent Disks. Basically, instead of all of a file being stored on one disk, only parts of it are put on each disk. How the data is stored depends on the type of RAID used.

Some RAID levels are fault tolerant, while others are not. One level, RAID 0, just splits the data across two or more drives; if one drive fails, you lose all data. Compare that with, say, RAID 5: the parity bits are distributed over the drive array, so if one drive fails you can rebuild the data on that drive using the data and parity bits contained on the other drives.
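For the curious, the parity trick is just XOR. Below is a minimal sketch for a single stripe, assuming equal-sized blocks; a real RAID 5 array also rotates which drive holds the parity from stripe to stripe, which this toy version leaves out. The function names and sample data are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_with_parity(data_blocks):
    """Return the data blocks plus one parity block for a single stripe."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild_missing(stripe, missing_index):
    """Recompute the block at missing_index from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)

# Two data "drives" plus one parity "drive".
stripe = stripe_with_parity([b"USPS", b"A db"])

# Pretend drive 0 died and rebuild its block from the other two.
recovered = rebuild_missing(stripe, 0)
print(recovered == b"USPS")
```

The same function rebuilds any one missing block, data or parity, but if two blocks in the stripe are lost there is nothing left to XOR against, which is why RAID 5 only survives a single drive failure.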


Are you running a journaling file system? I've had much better luck recovering from Bad Events with ext3 than with ext2. Too late now, but something to keep in mind if you have to/get to rebuild.


EV1 has seemed to be at a nice price point. If I were going to go RAID, I would want to get a box which was supported - not a "leftover" where we could not get a replacement to just drop our disks into if there was a problem. They started work on the replacement drive within one hour of my ticket submission - I delayed that until 1PM, since the server was limping along enough to serve email; and that was late enough for HQ to pick up the emails that came in during the evening but early enough for them to be "done" by the time I get home.

I am using the EXT3 journaling file system on the server (including the problem drive!). I prefer ReiserFS, which I use on the home system, but it's not a mainstream file system on the Red Hat Enterprise Linux on our server, so I stuck with EXT3.

There will be an outage of about 30 minutes sometime after 3:30 Thursday afternoon. The ISP is keeping the bad drive in the server as a third drive in case I want to try to get something off of it.

The restores are going very smoothly; however, I am working smallest to largest - I just finished installing areas 1, 2, 3, and 8 which are on this system.

This is a real pain, but we do OK considering this runs USPSA a total of about $100/month (after we collect from SVI and JP for their space rental on the box).

Rob


The ISP replaced our primary hard drive, and I did a complete restore last evening. While it appears to be in good shape, there is the chance I missed something - if you see a problem, email rob@boudrie.com and I'll look into it.

I had them leave the old drive in temporarily, as I was still able to extract a few very recent updates from it. The ISP will be removing this drive sometime after 3:30 on Thursday, May 19, resulting in about 20 minutes of downtime.

