Thread: real bad
07-07-2006, 09:46 AM   #7
rob
Resident NetOp/*nix Geek
Join Date: Dec 2003
Posts: 223
Jason,

Thanks for the level-headed, informative post.

With regard to the downtime we've had over the last few months: a large amount of it has been power-related. Two incidents affecting the whole suite of the datacentre that houses our core networking equipment caused outages. In addition, we've had some problems with our own power distribution infrastructure. Unfortunately for us, and ultimately for our customers, it's almost impossible to keep a rack on fully redundant, diverse power feeds in a datacentre; you rely on the datacentre itself to provide a reliable power feed for your equipment.

We are currently implementing a large number of changes to our infrastructure, ranging from replacing cabinets to provide a more manageable and accessible physical infrastructure, to adding more power resiliency. Within the next couple of months, the network infrastructure will be fully resilient to power failure of both our own PDU equipment and the datacentre's supply (core equipment is being placed onto UPSes). We're also rotating out all of our current power distribution equipment and replacing it with new equipment, to try to avoid future failures. Our new infrastructure is also more distributed, so an infrastructure failure will no longer cause outages across such a large proportion of the network.

Where we have had extended outages, we have immediately dispatched staff on-site. In my opinion, we handled these as quickly and efficiently as we could under the circumstances, and we communicated the actions we were taking to our customers as quickly as possible.

As Jason says, with more and more web applications and packages having security issues (monitor a list like Full-Disclosure and you'll see hundreds), security is becoming paramount for a reliable hosting platform. While this can sometimes cause problems for some customers, I believe that once we're made aware of a customer's problem, we do our very best to help with it. I'm sorry if this hasn't been your experience; please send me the ticket references for your issues so that we can make our support procedures more efficient.

Finally, if you are seeing performance issues with any of our servers, we need more information than "it's slow!" to debug them. Please put together a more detailed report (including your ISP, the time you experienced the problem, your IP address, the full URL of the site you were trying to access, and any other data you feel is relevant) and we will look into it. Unfortunately, sketchy reports posted on these forums don't allow me to investigate an issue to the extent that I'd like.

At the end of the day, I believe that when an issue is reported to us, we look into it in depth to find the cause and to ensure that our platforms deliver the necessary performance and reliability.

If any of our customers, prospective customers, or ex-customers would like to comment on this, or ask questions about our current upgrade plan, I'd be more than happy to answer them; please contact me personally at rob@catalyst2.net.

Kind regards,
Rob
__________________
Rob Shakir - rob@catalyst2.com