Infrastructure
August 26 2005 at 7:43 PM
Steven Roussey (Premier Login sroussey)
Network54 Employee
--------------------------------------------------------------------------------
It has been a long process, and it has slowed development of v2, but we needed to rebuild the hardware and software server platform that the Network54 code runs on. We are about one week away from finishing. Next Friday (September 2) we will have a hard downtime, meaning the site will be completely offline. More about that later.
The backstory: we have been running this site for a long time, which means most of our equipment is old. Some of it is really old, and some of it is failing. So over the last few months we have done the following:
1. Replaced our database servers (with nice ones that have redundant power, redundant system disks, etc.).
2. Replaced all but one of our storage arrays (the new ones also have redundant power, every disk has a redundant second copy, and there are spare disks).
3. Replaced our power systems.
4. Replaced 80% of our webservers.
5. Upgraded the OS on all the machines to RHEL 3.
6. Later, upgraded the DB machines to RHEL 4.
7. Changed ISP and changed IP addresses (latency is 10x lower!).
8. Changed the version of our application server software.
9. Changed our webserver software and architecture (Keep-Alive is now on, so image loading and chat should feel snappier).
10. Made many changes to improve security and backups. After our big competitor lost 5 years of their customers' posts, we wanted to make sure that didn't happen to us. Nothing is perfect, of course, but we now make regular backups that are encrypted and stored across the country. So even if an earthquake and fire destroy all of Los Angeles, as long as we are still alive, we will be able to reconstruct the site with at most a week of data lost. Images are the exception: they are on a separate system and don't get backed up as often. That is something we are changing for v2.
What is left:
Saturday: 10 minutes of "read-only" mode. I need to check what is wrong with one of the storage arrays so we know what to plan for later in the week.
Monday: Update the OS on the webservers. No one should notice, since we will take them out one at a time.
Tuesday: Likely one hour of "read-only" mode while we replace the main storage array (the one we are checking on Saturday).
Friday: Physically moving the equipment to a new rack. Nothing will be plugged in for a while, hence the hard downtime. Be prepared for the site to be truly down all day. The move itself should take only about four hours, with only one to two hours of complete downtime. We will probably have a spare server showing a plain maintenance page, but there will be no read-only mode. So we are hoping for only a couple of hours, but we are packing food, etc. in case we are there all day.