Brian Enos's Forums... Maku mozo!

USPSA WEB SERVER DOWN



A $50-a-year host with unlimited drive space would be a good place to keep a mirror of the site. When disaster hits you just do a DNS redirect and allow a little time for it to propagate. This is great because it's a separate site in case of fire, flood or acts of God. It doesn't matter how good your back-up stuff is: if the host is gone, your site is down.

I have two hosts with 5 domains/subs on them. I can use 'em as a live site with a quick DNS switch. <_< Basically, I keep each site on two servers in two different locations. I don't do it for the simple sites, but where clients are paying... you bet.

It's cheaper than "The Uruguayans" and it's fully functional immediately.

If you like, you can park a copy of the USPSA site on one of my servers. It could save this from happening again. I can just set you up with a sub domain and an FTP account... The cost would be zero! Just a thought. :D

The bottom line is that shit will break, and a hot site (a server on another host) waiting to be swapped over is the best option for limiting downtime in a disaster situation.

PS It also throws a bit of a wrench in online squadding.

JT

Edited by JThompson

Sounds like a management and planning fail more than anything. Sure, you can blame the datacenter jockeys, but in reality you set yourself up for failure.

It's risk/effort/reward.

The budget is minimal. There is NOBODY with web server sysadmin skills on staff. I take care of the web stuff at home after a "day job". USPSA does some work on the site in-house, the staff helps out on some web design/maintenance, and we had a consulting company re-design the USPSA.ORG look and feel. Yes, USPSA HQ is aware of the need to bring expertise in-house.

If I had been properly notified of a RAID 5 single drive failure, this would have been a non-event. Give me a budget and some staff and I can set up a system that will even survive uninterrupted if some folks representing a religion of peace take out the building hosting the server. Remember that day job I mentioned - it's all about reliable data storage :).

In some senses, this is an example of a setup for success:

- We had a single preventive mechanism (RAID 5) in place to protect live data. The hardware did what it should; human processes failed.

- We performed regular backups so we could survive any hardware failure. That worked!!!

As a financial exercise, let's suppose the down time for this is 4 days. It would represent less than one day downtime per year based on our experience to date averaged out over the years. How much $$ should USPSA spend to protect against this? We are already at three nines (.999) uptime, or the server is down less than one hour out of a thousand of operation. Each additional "nine" of reliability costs.
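The "nines" arithmetic above can be sketched in a few lines. This is only an illustration: the function names are made up for the example, and a year is taken as 8760 hours. It shows that three nines allows roughly 8.76 hours of downtime a year, while one full day of downtime a year works out to about 99.7% uptime.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(nines: int) -> float:
    """Hours of downtime per year permitted at the given number of nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return HOURS_PER_YEAR * unavailability


def availability(downtime_hours_per_year: float) -> float:
    """Fraction of the year the server is up, given its yearly downtime."""
    return 1 - downtime_hours_per_year / HOURS_PER_YEAR


for n in (2, 3, 4):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours/year of downtime")

# One full day down per year, expressed as an availability figure.
print(f"24 hours/year down: {availability(24):.4%} uptime")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why each one costs more than the last.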

Dave Thomas and I discussed this years ago, with simple conclusions:

1. It is important not to lose a lot of data if we have a disaster

2. We are not going to spend a lot of $$ to totally eliminate the risk of a couple days downtime every few years when something happens.

Naturally, we will be discussing the matter again as the degree to which USPSA members rely on web based services has increased significantly over the past few years.


Status:

- The data was restored from backup, and we are getting the server state as of May 4. Yes, a few squadding changes and results uploads could be lost.

- I had all the user data and databases (the really important stuff), but missed some stuff that would speed up the configuration on restore - so I am spending a bit more time than I would like getting things back in operation.

- The ONE thing I was not able to restore (missed in the backup config) was the main password for each web site (the one you use to FTP to your web site). Individual email passwords were restored. If you take care of a web site on the USPSA server, email me (rob at boudrie dot com) telling me what you would like the password reset to and I will do it.

- I have restored many of the sites, but still have some to go - including some of the area sites and some USPSA sites. I'll post additional info as progress progresses.


A $50-a-year host with unlimited drive space would be a good place to keep a mirror of the site. When disaster hits you just do a DNS redirect and allow a little time for it to propagate. This is great because it's a separate site in case of fire, flood or acts of God. It doesn't matter how good your back-up stuff is: if the host is gone, your site is down.

OK, I'll bite.

This would be trivial if our site were static, and pretty straightforward if the only dynamic changes were in the database.

The server has many functions that upload files to the server. Some are user visible (upload classification results); some are via a content management admin panel. What procedure would you recommend to easily keep the files in sync, and also ensure that a file is synced at the same time as a database entry that references that file is updated? (Database replication itself is easy using a log shipping approach for continuous replication.) It would need to be a bit fancier than a cron job with "rsync", since data could get out of sync between file and database. It would also need to catch any change to a content file uploaded to the primary server. Now, explain how I do this all in an evening or two (though there are some cloud approaches that look promising).
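To make the file/database drift concrete, here is a minimal sketch of a check a mirror could run. It assumes the replica database can produce a list of relative file paths it references; that list, the directory layout, and the function names are all hypothetical, not taken from the USPSA setup. It reports references whose files have not arrived on the mirror yet.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_missing_files(referenced: Iterable[str], mirror_root: Path) -> List[str]:
    """Return referenced relative paths that do not exist under mirror_root."""
    return sorted(
        rel for rel in set(referenced) if not (mirror_root / rel).is_file()
    )


def demo() -> List[str]:
    """Build a throwaway mirror with one file present and one missing."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "results").mkdir()
        (root / "results" / "match1.txt").write_text("scores")
        # Pretend these paths came from a query against the replica database.
        referenced = ["results/match1.txt", "results/match2.txt"]
        return find_missing_files(referenced, root)


print(demo())
```

A check like this only detects the mismatch after the fact. It does not solve the harder problem raised above of syncing the file and its database row together.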

There are two primary metrics in the disaster recovery world:

RTO - Recovery Time Objective. How long does it take to get back up? We are at a couple of days for a full server loss. That is an inconvenience for USPSA, but would cost a brokerage house millions, hence the difference in resources.

RPO - Recovery Point Objective. If you have a crash at time T(crash), and recover from a backup taken at time T(backup), how big is the difference between these times? This is about one day for a USPSA recovery from full server loss.
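Both metrics are simple time differences. The sketch below uses made-up timestamps (not the actual dates of this outage) and hypothetical function names to show the RPO-style data-loss window, T(crash) - T(backup), and the RTO-style time to get back up.

```python
from datetime import datetime, timedelta


def recovery_point(crash: datetime, last_backup: datetime) -> timedelta:
    """Data-loss window: how far behind the crash the restored backup is."""
    if last_backup > crash:
        raise ValueError("backup cannot be newer than the crash")
    return crash - last_backup


def recovery_time(crash: datetime, restored: datetime) -> timedelta:
    """Outage length: how long it took to get back in service."""
    if restored < crash:
        raise ValueError("service cannot be restored before the crash")
    return restored - crash


# Illustrative timestamps only.
t_backup = datetime(2024, 1, 1, 3, 0)
t_crash = datetime(2024, 1, 2, 3, 0)
t_restored = datetime(2024, 1, 4, 15, 0)

print("data-loss window:", recovery_point(t_crash, t_backup))
print("time to recover: ", recovery_time(t_crash, t_restored))
```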


Thank you sir for your hard work in this situation. I'm sure you are doing all you can and it will turn out fine. Mild inconvenience for us, a lot of work for you. Thanks again.


Thank you sir for your hard work in this situation. I'm sure you are doing all you can and it will turn out fine. Mild inconvenience for us, a lot of work for you. Thanks again.

+1 We all have day jobs, and understand this is a volunteer sport.

Thanks for working on it over this special weekend Rob.


If it matters (in other words, if you don't know about it or it won't be fixed by another fix), I get an error when I look up my new USPSA #, and the lookup for the old one doesn't show the new one.

http://www.uspsa.org/uspsa-classifer-lookup-results.php?number=fy64276

http://www.uspsa.org/uspsa-classifer-lookup-results.php?number=l3344



I think you may be missing the main point here. You CAN do most of what you need with cron and use the back-up to restore the rest. The point is that you have 99% still available no matter how bad a crash is or how long it takes to fix. As you said, it's not hyper critical to have all the data such as classifier uploads and the day-to-day upload stuff from users. That stuff would otherwise be unavailable anyway.

What you would have is a complete site with data perhaps one day out of date and a few areas unavailable to use until the primary is restored. What you would not have is a page not found or a dead site. To me this is unacceptable. Brian Enos has a back-up site to post in the event the main site goes down.
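A nightly mirror job of the kind described could look roughly like the sketch below, which assumes rsync over SSH and is meant to be started from cron. The paths, the remote account, and the function names are placeholders, not the real USPSA or mirror hosts. It only copies files, so the file-versus-database drift raised earlier in the thread still applies.

```python
import subprocess
from typing import List


def build_mirror_command(source_dir: str, remote: str, dest_dir: str) -> List[str]:
    """Build an rsync command that makes dest_dir on remote match source_dir.

    -a keeps permissions and timestamps, -z compresses in transit, and
    --delete removes files on the mirror that no longer exist at the source.
    """
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    src = source_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", src, f"{remote}:{dest_dir}"]


def run_mirror(command: List[str]) -> int:
    """Run the command and return rsync's exit status (0 means success)."""
    return subprocess.run(command, check=False).returncode


# Placeholder values; nothing is executed here, the command is only printed.
cmd = build_mirror_command(
    "/var/www/site", "mirror@backup.example.org", "/home/mirror/site"
)
print(" ".join(cmd))
```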

I understand you guys are all volunteers and we thank you for your hard work, but it may be time to take this from the volunteers and get it in the hands of someone who can give it the proper time and attention. This is not to run you guys down, but to say this is one of the areas we need to look at. At the very least, the USPSA should compensate you guys for this stuff. It isn't like you are doing it for a local club that can't afford a few bucks for web design and a DEDICATED webmaster to run it.

We pay the Prez, we pay people to run Steel Challenge... which, correct me if I'm wrong, has never made a profit since we bought it, but I digress. We need to be putting more $ into critical areas, not looking for ways we can save a $.

If you build it they will come... Perhaps it's time to take a look at making a Per Diem available for certain jobs or areas to help us grow.

Back on point... there is no reason not to have a backup site going. You can always say well this is a problem or that is a problem, but it's really not rocket science.

I do appreciate what you guys do, but I see no reason it should all fall on your shoulders to save us a few bucks a year.

JT

Edited by JThompson

Man did that hose the USPSA self squadding.... You guys going to be able to get something current restored for it? The way it is now, it's a HUGE mess.

We have to take part of the blame for not printing a hard copy of it...

Also get this error when making a change:

ERR: Problem sending copy of notice to match administrator

JT

Edited by JThompson

Is the USPSA site back up? I am still getting errors trying to access their web page.

NVM moment of stupidity, I can get to the site just fine.

Edited by DFinan

I still get an error when trying to renew my membership, but only when using https/SSL. It will allow non-https, but I'd rather not put my credit card info on a non-secure web page.

Secure Connection Failed

An error occurred during a connection to www.uspsa.org.

SSL received a record that exceeded the maximum permissible length.

(Error code: ssl_error_rx_record_too_long)

The page you are trying to view can not be shown because the authenticity of the received data could not be verified. Please contact the web site owners to inform them of this problem. Alternatively, use the command found in the help menu to report this broken site.
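One commonly reported cause of this browser error is a server answering in plain HTTP on the HTTPS port. The thread does not say whether that was the case here, so treat the following as an illustration only. The sketch (function names are made up) reads the first five bytes of a reply the way a TLS client reads a record header, and shows why the text "HTTP/" decodes to an impossibly long record.

```python
from typing import Tuple

# A TLS 1.2 encrypted record may carry at most 2**14 + 2048 bytes.
MAX_TLS_RECORD_LENGTH = 2**14 + 2048
# Valid record content types: change_cipher_spec, alert, handshake, app data.
TLS_CONTENT_TYPES = (20, 21, 22, 23)


def parse_record_header(first_bytes: bytes) -> Tuple[int, Tuple[int, int], int]:
    """Split the first five bytes into (content type, version, length)."""
    if len(first_bytes) < 5:
        raise ValueError("need at least 5 bytes for a record header")
    content_type = first_bytes[0]
    version = (first_bytes[1], first_bytes[2])
    length = int.from_bytes(first_bytes[3:5], "big")
    return content_type, version, length


def looks_like_tls(first_bytes: bytes) -> bool:
    """True if the bytes form a plausible TLS record header."""
    content_type, version, length = parse_record_header(first_bytes)
    return (
        content_type in TLS_CONTENT_TYPES
        and version[0] == 3
        and length <= MAX_TLS_RECORD_LENGTH
    )


plain_http = b"HTTP/1.1 400 Bad Request\r\n"
print(parse_record_header(plain_http), looks_like_tls(plain_http))

tls_handshake = bytes([22, 3, 1, 0, 80])  # handshake record, 80-byte body
print(parse_record_header(tls_handshake), looks_like_tls(tls_handshake))
```

Read as a record header, the bytes "P/" give a length of 0x502F (20527), which is over the limit.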


How will this affect classifiers that have been uploaded but can't be paid for until the https side of the site gets running again?

Not a big problem if it was the 1st of May but it is the 9th today. :cheers:


Having a $50/yr hosting package somewhere else would be a good idea, but another idea for future improvement would be to have redundant servers. I certainly hope that your SQL database ISN'T on your main web server, but if it is, get it off of there and put it on a separate server. If you're using Windows, look into DFS replication to create a redundant server. Both would have RAID 5 at LEAST. You would also set up your primary web server the same way. It's hard to explain in words, but since you're a server guy, I'm sure you get the idea. (You'd have 4 servers total.)

Hard drives aren't the only things that fail...you need more than one machine. A better RAID configuration only protects you from HDD failure scenarios, not CPU, memory, and mobo stuff.

This all depends on how much downtime you're comfortable with.

Edited by Erik S.

We are working on the SSL problem.

As to redundancy - yes, it's possible but would require a bunch of coding to handle redundancy on the file part - not to mention the need to mirror email configurations, any new sites created on the server for USPSA use, etc.

It's not like we don't know about these industry standard techniques. The one thing nobody is offering is an explanation as to how we will do this with minimal time and cost, including the custom apps and ongoing maintenance issues. Keep in mind that USPSA currently has *NOBODY* on the staff with this sort of sysadmin skill, which means hiring someone to do the work. Yes, we can do better - but I don't think we can get more per dollar than we do now.

This all depends on how much downtime you're comfortable with.

Put another way, it all depends on how much we are willing to spend to avoid downtime that has historically averaged less than one day per year.

What does look promising is using a virtual system in a cloud environment. That way the entire system can be backed up on a daily basis as a "machine image" and, in the event of a server failure, just restarted on another server with no need to do an OS and application reinstall.


No, it just is hidden :closedeyes:

I had to change the IP address to install a secure certificate (it's best for an SSL site to have its own IP). The IP for the site has changed from 209.62.63.82 to 209.62.63.83. The DNS should propagate shortly or, if you are a hacker, you can change "/etc/hosts" on Linux or %systemroot%\system32\drivers\etc\hosts on Windows. My recommendation, though, is don't fool around with that if you don't already know what I am talking about, as you can forget it and have problems when the IP changes again in the future.
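A less invasive alternative to editing the hosts file is to check whether your own resolver has picked up the new address yet. The sketch below uses the two addresses given above; the function name and the stub resolvers are made up for the example. The lookup is injectable so the example runs without touching the network, and calling resolves_to with the default resolver does a real lookup.

```python
import socket
from typing import Callable

OLD_IP = "209.62.63.82"
NEW_IP = "209.62.63.83"


def resolves_to(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """True if hostname currently resolves to expected_ip on this machine."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # socket.gaierror (name not found, no network) is an OSError.
        return False


# Stub resolvers standing in for a stale and an updated DNS cache.
def stale_cache(_hostname: str) -> str:
    return OLD_IP


def fresh_cache(_hostname: str) -> str:
    return NEW_IP


print(resolves_to("www.uspsa.org", NEW_IP, resolve=stale_cache))
print(resolves_to("www.uspsa.org", NEW_IP, resolve=fresh_cache))
```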

And yes, I know there is a "self signed" cert there now. I'll have a Godaddy cert up once I attend to the day job for a while :).
