spanky Posted May 8, 2011 Quote: "Occurrences like this tend to expand the budget and fill in the holes in the processes." Ain't that the truth...
JThompson Posted May 8, 2011 A $50-a-year host with unlimited drive space would be a good place to keep a mirror of the site. When disaster hits, you just do a DNS redirect and allow a little time for it to propagate. This is great because it's a separate site in case of fire, flood or acts of God. It doesn't matter how good your backup setup is: if the host is gone, your site is down. I have two hosts with 5 domains/subdomains on them, and I can use either one as a live site with a quick DNS switch. Basically, I keep each site on two servers in two different locations. I don't do it for the simple sites, but where clients are paying... you bet. It's cheaper than "The Uruguayans" and it's fully functional immediately. If you like, you can park a copy of the USPSA site on one of my servers; it could keep this from happening again. I can just set you up with a subdomain and an FTP account, and the cost would be zero! Just a thought. The bottom line is that shit will break, and a hot site (a server on another host) waiting to be swapped in is the best option for limiting downtime in a disaster. PS It also throws a bit of a wrench in online squadding. JT Edited May 8, 2011 by JThompson
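The hot-site idea above hinges on deciding when to flip DNS to the mirror. A minimal sketch, assuming a hypothetical health check that reports healthy/unhealthy on a schedule: fail over only after several consecutive failures, so one slow response does not trigger a switch. The class name, threshold, and "primary"/"mirror" labels are illustrative, not anything USPSA runs. Even after the switch, resolvers may keep handing out the old address until the cached record's TTL expires, which is the "little time to propagate".

```python
class FailoverMonitor:
    """Decide whether DNS should point at the primary or the mirror (hypothetical)."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0
        self.active = "primary"

    def record(self, healthy: bool) -> str:
        """Feed one health-check result; return the target DNS should use."""
        if healthy:
            # Any success resets the streak, so a single blip never causes a switch.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                # Stay on the mirror until someone deliberately fails back.
                self.active = "mirror"
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor(threshold=3)
    for result in [False, False, True, False, False, False]:
        print(monitor.record(result))
```

Failing back is left manual on purpose, which avoids flapping between hosts while the primary is being repaired.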
Chris Keen Posted May 8, 2011 Quote: "PS It also throws a bit of a wrench in online squadding yes? J" YES!
Rob Boudrie Posted May 8, 2011 (Author) Quote: "Sounds like a management and planning fail more than anything. Sure you can blame the datacenter jockeys, but in reality you set yourself up for failure." It's risk/effort/reward. The budget is minimal. There is NOBODY with web server sysadmin skills on staff. I take care of the web stuff at home after a "day job". USPSA does some work on the site in-house, the staff helps out with some web design and maintenance, and we had a consulting company redesign the look and feel of USPSA.ORG. Yes, USPSA HQ is aware of the need to bring expertise in-house. If I had been properly notified of a RAID 5 single-drive failure, this would have been a non-event. Give me a budget and some staff, and I can set up a system that would keep running uninterrupted even if some folks representing a religion of peace took out the building hosting the server. Remember that day job I mentioned? It's all about reliable data storage. In some senses, this is an example of a setup for success:
- We had a single preventive mechanism (RAID 5) in place to protect live data. The hardware did what it should; the human processes failed.
- We performed regular backups so we could survive any hardware failure. That worked!!!
As a financial exercise, let's suppose the downtime for this is 4 days. Averaged over the years, based on our experience to date, that still represents less than one day of downtime per year. How much $$ should USPSA spend to protect against this? We are already at three nines (.999) uptime, meaning the server is down less than one hour out of every thousand hours of operation. Each additional "nine" of reliability costs more. Dave Thomas and I discussed this years ago and reached simple conclusions:
1. It is important not to lose a lot of data if we have a disaster.
2. We are not going to spend a lot of $$ to totally eliminate the risk of a couple of days of downtime every few years when something happens.
Naturally, we will be discussing the matter again, as the degree to which USPSA members rely on web-based services has increased significantly over the past few years.
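The financial exercise above is easy to check with a little arithmetic. A minimal sketch (the 8,760-hour year ignores leap years): each additional nine cuts the yearly downtime budget by a factor of ten. For reference, three nines allows about 8.8 hours of downtime a year, while a full day of downtime per year works out to roughly 99.7% availability.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period the server was up."""
    return 1 - downtime_hours / period_hours


def allowed_downtime_hours(nines: int, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget for an availability of N nines (3 -> 99.9%)."""
    return period_hours * 10 ** -nines


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} nines: {allowed_downtime_hours(n):.2f} h/year of downtime allowed")
    print(f"1 day down per year: {availability(24):.4%} available")
```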
Rob Boudrie Posted May 8, 2011 (Author) Status:
- The data was restored from backup, and we are bringing the server back to its state as of May 4. Yes, a few squadding changes and results uploads could be lost.
- I had all the user data and databases (the really important stuff), but missed some items that would have sped up the configuration on restore, so I am spending more time than I would like getting things back in operation.
- The ONE thing I was not able to restore (it was missed in the backup config) was the main password for each web site (the one you use to FTP to your web site). Individual email passwords were restored. If you take care of a web site on the USPSA server, email me (rob at boudrie dot com) with the password you would like it reset to and I will do it.
- I have restored many of the sites but still have some to go, including some of the area sites and some USPSA sites.
I'll post additional info as progress progresses.
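The status note shows a classic gap: the data was backed up, but the FTP passwords and some configuration were not. One way to catch that before a disaster is to compare the backup's file list against a checklist of everything a rebuild needs. A minimal sketch; the paths and glob patterns are hypothetical placeholders, not the actual layout of the USPSA server.

```python
from fnmatch import fnmatchcase


def missing_from_backup(backed_up_paths: list[str], required_patterns: list[str]) -> list[str]:
    """Return each required glob pattern that matches nothing in the backup listing."""
    return [
        pattern
        for pattern in required_patterns
        if not any(fnmatchcase(path, pattern) for path in backed_up_paths)
    ]


if __name__ == "__main__":
    # Hypothetical listing, e.g. produced by a backup tool's "list contents" command.
    listing = [
        "/home/site1/public_html/index.php",
        "/var/lib/mysql/results/scores.ibd",
    ]
    # Hypothetical checklist: site files, databases, and account/config data.
    checklist = ["/home/*", "/var/lib/mysql/*", "/etc/*"]
    print(missing_from_backup(listing, checklist))  # ['/etc/*']
```

Note that `fnmatchcase` lets `*` match across `/`, so `/home/*` covers every file under `/home`.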
yoshidaex Posted May 8, 2011 Thanks for all the work, Rob.
Rob Boudrie Posted May 8, 2011 (Author) Quote: "$50 a year host with unlimited drive space would be a good place to keep a mirror of the site. When disaster hits you just do a DNS redirect, a little time to propagate.... This is great because it's a separate site in case of fire, flood or acts of God. Doesn't matter how good your back-up stuff is if the host is gone your site is down." OK, I'll bite. This would be trivial if our site were static, and pretty straightforward if the only dynamic changes were in the database. The server has many functions that upload files to it. Some are user-visible (uploading classification results); some go through a content management admin panel. What procedure would you recommend to easily keep the files in sync, and also ensure that a file is synced at the same time as the database entry that references it is updated? (Database replication is easy using a log-shipping approach for continuous replication.) It would need to be a bit fancier than a cron job running "rsync", since the files and the database could get out of sync. It would also need to catch any change to a content file uploaded to the primary server. Now, explain how I do all this in an evening or two (though there are some cloud approaches that look promising). There are two primary metrics in the disaster recovery world:
RTO (Recovery Time Objective): How long does it take to get back up? We are at a couple of days for a full server loss. That is an inconvenience for USPSA, but it would cost a brokerage house millions, hence the difference in resources.
RPO (Recovery Point Objective): If you crash at time T(crash) and recover from a backup taken at T(backup), how big is the gap between these two times? This is about one day for a USPSA recovery from full server loss.
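The two metrics above can be made concrete for a single incident. A minimal sketch; the timestamps in the example are hypothetical, and the targets simply reuse the figures quoted above (a couple of days for RTO, about one day for RPO). Strictly, RTO and RPO are targets, so the code compares one incident's measured downtime and data-loss window against them.

```python
from datetime import datetime, timedelta


def recovery_metrics(crash: datetime, last_backup: datetime, restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (downtime, data-loss window) for one incident."""
    if not last_backup <= crash <= restored:
        raise ValueError("expected last_backup <= crash <= restored")
    downtime = restored - crash             # the quantity an RTO bounds
    data_loss_window = crash - last_backup  # the quantity an RPO bounds
    return downtime, data_loss_window


def meets_objectives(crash: datetime, last_backup: datetime, restored: datetime,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True if this incident stayed within both recovery objectives."""
    downtime, data_loss_window = recovery_metrics(crash, last_backup, restored)
    return downtime <= rto and data_loss_window <= rpo


if __name__ == "__main__":
    backup = datetime(2011, 1, 3, 2, 0)    # hypothetical nightly backup
    crash = datetime(2011, 1, 3, 14, 0)    # hypothetical failure 12 hours later
    restored = datetime(2011, 1, 5, 2, 0)  # hypothetical restore 36 hours after the crash
    print(recovery_metrics(crash, backup, restored))
    print(meets_objectives(crash, backup, restored, rto=timedelta(days=2), rpo=timedelta(days=1)))
```

With nightly backups, the data-loss window is worst when the crash lands just before the next backup, approaching a full day, which matches the "about one day" RPO.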
whitedog Posted May 8, 2011 Thank you, sir, for your hard work in this situation. I'm sure you are doing all you can and it will turn out fine. Mild inconvenience for us, a lot of work for you. Thanks again.
Chris Keen Posted May 8, 2011 Quote: "Thank you sir for your hard work in this situation. I'm sure you are doing all you can and it will turn out fine. Mild inconvenience for us, a lot of work for you. Thanks again." +1. We all have day jobs and understand this is a volunteer sport. Thanks for working on it over this special weekend, Rob.
alpha-charlie Posted May 9, 2011 Looks like things are pretty much up and running. My match scores from today are posted.
spanky Posted May 9, 2011 If it matters (in other words, if you don't already know about it or it won't be fixed by another fix): I get an error when I look up my new USPSA #, and the lookup for the old one doesn't show the new one. http://www.uspsa.org/uspsa-classifer-lookup-results.php?number=fy64276 http://www.uspsa.org/uspsa-classifer-lookup-results.php?number=l3344
JThompson Posted May 9, 2011 Quote: "OK, I'll bite. [...] What procedure would you recommend to easily keep the files in sync, and also assure that a file is synced at the same time a database entry that references that file is updated [...]" I think you may be missing the main point here. You CAN do most of what you need with cron and use the backup to restore the rest. The point is that you would still have 99% of the site available, no matter how bad a crash is or how long it takes to fix. As you said, it's not hypercritical to have all the data, such as classifier uploads and the day-to-day uploads from users. That stuff would otherwise be unavailable anyway. What you would have is a complete site with data perhaps one day out of date and a few areas unavailable until the primary is restored. What you would not have is a "page not found" error or a dead site, which to me is unacceptable. Brian Enos has a backup site to post to in the event the main site goes down. I understand you guys are all volunteers and we thank you for your hard work, but it may be time to take this from the volunteers and put it in the hands of someone who can give it the proper time and attention. This is not to run you guys down, but to say this is one of the areas we need to look at. At the very least, USPSA should compensate you guys for this stuff. It isn't like you are doing it for a local club that can't afford a few bucks for web design and a DEDICATED webmaster to run it. We pay the Prez, we pay people to run Steel Challenge... which, correct me if I'm wrong, has never made a profit since we bought it, but I digress. We need to be putting more $ into critical areas, not looking for ways to save a $. If you build it, they will come... Perhaps it's time to look at making a per diem available for certain jobs or areas to help us grow. Back on point... there is no reason not to have a backup site going. You can always say this is a problem or that is a problem, but it's really not rocket science. I do appreciate what you guys do, but I see no reason it should all fall on your shoulders to save us a few bucks a year. JT Edited May 9, 2011 by JThompson
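The cron suggestion amounts to a periodic one-way copy from the primary to the mirror. rsync does this natively; as a sketch of what such a job decides, the code below uses the same quick check rsync applies by default (file size and modification time) to find files that need copying. It is a simplified stand-in, not a replacement for rsync: it never deletes files removed on the primary and, as Rob points out, it knows nothing about the database rows that reference the files. The directory names are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def files_needing_copy(src: Path, dst: Path) -> list[Path]:
    """Relative paths under src that are missing from dst or differ in size/mtime."""
    changed = []
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if not target.is_file():
            changed.append(rel)
            continue
        s, t = path.stat(), target.stat()
        if s.st_size != t.st_size or int(s.st_mtime) != int(t.st_mtime):
            changed.append(rel)
    return sorted(changed)


def mirror_once(src: Path, dst: Path) -> list[Path]:
    """One pass of a cron-style one-way mirror; returns the relative paths copied."""
    copied = files_needing_copy(src, dst)
    for rel in copied:
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also copies the modification time, so the next pass skips this file.
        shutil.copy2(src / rel, target)
    return copied


if __name__ == "__main__":
    # Throwaway directories stand in for the primary's upload area and the mirror.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        primary, mirror = Path(a), Path(b)
        (primary / "uploads").mkdir()
        (primary / "uploads" / "match1.txt").write_text("results")
        print(mirror_once(primary, mirror))  # first pass copies the file
        print(mirror_once(primary, mirror))  # second pass finds nothing to do
```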
Torogi Posted May 9, 2011 Rob, thank you, sir, for the hard work. Looks like we are back online now.
SethML3602 Posted May 9, 2011 Yes, thank you for all your hard work. Go ahead and take the next couple of days off. Just tell them I said it was ok.
JThompson Posted May 9, 2011 Man, did that hose the USPSA self-squadding... Are you guys going to be able to restore something current for it? The way it is now, it's a HUGE mess. We have to take part of the blame for not printing a hard copy of it... I also get this error when making a change: ERR: Problem sending copy of notice to match administrator JT Edited May 9, 2011 by JThompson
DFinan Posted May 9, 2011 Is the USPSA site back up? I am still getting errors trying to access their web page. NVM, moment of stupidity; I can get to the site just fine. Edited May 9, 2011 by DFinan
MarkCO Posted May 9, 2011 Quote: "Is the USPSA site back up? I am still getting errors trying to access their web page." You might try refreshing. It was up last night and this morning.
Mark R Posted May 9, 2011 I still get an error when trying to renew my membership, but only over HTTPS/SSL. It works over plain HTTP, but I'd rather not put my credit card info on a non-secure web page. The error:
Secure Connection Failed
An error occurred during a connection to www.uspsa.org. SSL received a record that exceeded the maximum permissible length. (Error code: ssl_error_rx_record_too_long)
The page you are trying to view can not be shown because the authenticity of the received data could not be verified. Please contact the web site owners to inform them of this problem. Alternatively, use the command found in the help menu to report this broken site.
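A common cause of ssl_error_rx_record_too_long is a client that expects a TLS record but receives something else, very often plain HTTP served on the HTTPS port (for example, before a certificate is configured for that address). That would fit Rob's later note about moving the site to its own IP for the certificate, though the thread does not confirm the exact cause. The sketch below reads the first five bytes of a reply the way a TLS client does: one byte of content type, two of version, two of length. RFC 5246 (TLS 1.2) caps a record at 2^14 + 2048 bytes, and the bytes "HTTP/" decode to a length of 20,527, which is over that limit.

```python
# RFC 5246: TLSCiphertext.length MUST NOT exceed 2^14 + 2048.
MAX_TLS_RECORD_LEN = 2**14 + 2048
# Record content types from RFC 5246: change_cipher_spec, alert, handshake, application_data.
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def inspect_record_header(reply: bytes) -> dict:
    """Decode the first five bytes of a server reply as a TLS record header."""
    if len(reply) < 5:
        raise ValueError("need at least 5 bytes")
    length = int.from_bytes(reply[3:5], "big")
    return {
        "content_type": reply[0],
        "version": (reply[1], reply[2]),
        "length": length,
        "known_type": reply[0] in TLS_CONTENT_TYPES,
        "too_long": length > MAX_TLS_RECORD_LEN,
    }


if __name__ == "__main__":
    # A plain-HTTP response read by a client that expected TLS.
    print(inspect_record_header(b"HTTP/1.1 200 OK\r\n"))
    # Start of a genuine TLS handshake record (type 22, version 3.1, length 42).
    print(inspect_record_header(bytes([0x16, 0x03, 0x01, 0x00, 0x2A])))
```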
Poppa Bear Posted May 9, 2011 How will this affect classifiers that have been uploaded but can't be paid for until the HTTPS side of the site is running again? Not a big problem if it were the 1st of May, but it is the 9th today.
Erik S. Posted May 9, 2011 Having a $50/yr hosting package somewhere else would be a good idea, but another idea for future improvement would be to have redundant servers. I certainly hope your SQL database ISN'T on your main web server, but if it is, get it off of there and put it on a separate server. If you're using Windows, look into DFS Replication to create a redundant server. Both would have RAID 5 at LEAST. You would also set up your primary web server the same way. It's hard to explain in words, but since you're a server guy, I'm sure you get the idea (you'd have 4 servers total). Hard drives aren't the only things that fail... you need more than one machine. A better RAID configuration only protects you from HDD failure scenarios, not CPU, memory, and motherboard failures. This all depends on how much downtime you're comfortable with. Edited May 9, 2011 by Erik S.
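Whatever RAID level is chosen, the failure Rob described earlier was that a degraded array went unnoticed. A minimal sketch, assuming Linux software RAID (md), which reports array state in /proc/mdstat; a hardware RAID controller would need its vendor's tool instead, and the thread doesn't say which the USPSA server uses. mdadm's own `--monitor` mode, with a `MAILADDR` line in mdadm.conf, can send these alerts natively; the parser below only illustrates what such a check looks for. The sample text is illustrative, not output from the USPSA server.

```python
import re

HEADER_RE = re.compile(r"^(md\d+)\s*:")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")

SAMPLE = """Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1](F) sda1[0]
      2929893888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

md1 : active raid1 sdf1[1] sde1[0]
      976630336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Names of md arrays whose status shows fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            # "_" in the [UU_U] map marks a missing or failed member.
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    # On a real Linux host, pass the contents of /proc/mdstat instead of SAMPLE.
    print(degraded_arrays(SAMPLE))
```

Run from cron, a non-empty result would be the cue to send an alert, which is the notification that was missing here.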
Rob Boudrie Posted May 9, 2011 (Author) We are working on the SSL problem. As to redundancy: yes, it's possible, but it would require a bunch of coding to handle redundancy on the file side, not to mention the need to mirror email configurations, any new sites created on the server for USPSA use, etc. It's not like we don't know about these industry-standard techniques. The one thing nobody is offering is an explanation of how we would do this with minimal time and cost, including the custom apps and ongoing maintenance. Keep in mind that USPSA currently has *NOBODY* on staff with this sort of sysadmin skill, which means hiring someone to do the work. Yes, we can do better, but I don't think we can get more per dollar than we do now. Quote: "This all depends on how much downtime you're comfortable with." Put another way, it all depends on how much we are willing to spend to avoid downtime that has historically averaged less than one day per year. What does look promising is running a virtual system in a cloud environment. That way, the entire system can be backed up daily as a "machine image" and, in the event of a server failure, simply restarted on another server with no need for an OS and application reinstall.
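The file-side coding Rob mentions comes down to ordering: a replicated database row must not become visible on the mirror before the uploaded file it points to has arrived intact. One way to get that, sketched under assumptions: the primary records a SHA-256 checksum alongside each file reference, and the mirror applies shipped rows strictly in log order, stopping at the first row whose file is not yet present with a matching checksum. The `PendingRow` fields and the directory layout are hypothetical; nothing in the thread says the USPSA schema stores checksums.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PendingRow:
    """A shipped database change that references an uploaded file (hypothetical schema)."""
    row_id: int
    file_path: str  # path relative to the mirror's upload directory
    sha256: str     # checksum recorded on the primary at upload time


def file_matches(root: Path, row: PendingRow) -> bool:
    """True if the referenced file is on the mirror with the expected checksum."""
    candidate = root / row.file_path
    return candidate.is_file() and hashlib.sha256(candidate.read_bytes()).hexdigest() == row.sha256


def applicable_prefix(rows: list[PendingRow], mirror_root: Path) -> list[PendingRow]:
    """Rows that can be applied now, in log order, stopping at the first unsynced file."""
    ready = []
    for row in rows:
        if not file_matches(mirror_root, row):
            break  # later rows wait too, so the log is never applied out of order
        ready.append(row)
    return ready


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "results.txt").write_bytes(b"match results")
        digest = hashlib.sha256(b"match results").hexdigest()
        rows = [
            PendingRow(1, "results.txt", digest),
            PendingRow(2, "not-yet-synced.txt", digest),
            PendingRow(3, "results.txt", digest),
        ]
        print([r.row_id for r in applicable_prefix(rows, root)])  # [1]
```

Stopping at the first gap, rather than skipping it, trades a little freshness for a mirror that never shows a database entry pointing at a missing or half-copied file.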
9supercomp Posted May 10, 2011 I uploaded a classifier yesterday with no problems, but it would not let me pay for it yet. I called Val today and we took care of payment over the phone.
DocMedic Posted May 10, 2011 It seems to have gone back down.
Rob Boudrie Posted May 10, 2011 (Author) No, it's just hidden. I had to change the IP address to install a secure certificate (it's best for an SSL site to have its own IP). The IP for the site has changed from 209.62.63.82 to 209.62.63.83. The DNS should propagate shortly. If you are a hacker, you can edit "/etc/hosts" on Linux or %systemroot%\system32\drivers\etc\hosts on Windows, but my recommendation is not to fool around with that if you don't already know what I am talking about, as you can forget about it and have problems when the IP changes again in the future. And yes, I know there is a "self-signed" cert there now. I'll have a GoDaddy cert up once I attend to the day job for a while.
spanky Posted May 10, 2011 Any thoughts on the classification issue I posted?