For those who are interested, here's a quick tutorial/speculation on why the DCL servers had such problems last night, and why "fixing it" is a harder problem than you might expect...
Ultimately, the DCL web site is just a piece of software running on one or more computers. Like any computer, the more work you ask it to do at once, the slower it runs.
We can make a rough guess at what a normal load for the DCL web site looks like.
DCL has about 5,000 guests/week (2 ships at about 2,500 each; the Wonder running 2 cruises/week offsets the fact that for much of the year the ships don't sail full).
Call that 260,000 passengers/year (5,000 x 52 weeks).
Not all of those passengers visit the web site (usually one person books for the whole family, and some use travel agents instead). Let's say half of them, or 130,000 passengers on the site/year.
Each of them visits the site multiple times, and many people visit the site without ever booking, so let's multiply by 15 to account for both: 1.95 million visits/year.
Each visit averages about 8 page views (per Alexa), which gives 15.6 million page views/year.
Spread over the 525,600 minutes in a year, that comes to about 30 page views/minute, or one every 2 seconds.
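For anyone who wants to check the math, here's the same back-of-envelope chain as a small Python script. Every constant is one of the guesses above (guests/week, the 50% share, the 15x repeat factor, 8 pages/visit), not a figure from Disney, so change them and rerun if you think a guess is off.

```python
# Back-of-envelope estimate of normal DCL web traffic.
# All constants are the forum guesses above, NOT official figures.

GUESTS_PER_WEEK = 5_000           # 2 ships x ~2,500 guests
WEEKS_PER_YEAR = 52
SHARE_VISITING_SITE = 0.5         # one booker per family, some use TAs
VISITS_PER_VISITOR = 15           # repeat visits plus non-bookers
PAGES_PER_VISIT = 8               # per Alexa
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def estimate_normal_load() -> dict:
    """Walk the chain of guesses from passengers down to page views/minute."""
    passengers = GUESTS_PER_WEEK * WEEKS_PER_YEAR
    visitors = passengers * SHARE_VISITING_SITE
    visits = visitors * VISITS_PER_VISITOR
    page_views = visits * PAGES_PER_VISIT
    per_minute = page_views / MINUTES_PER_YEAR
    return {
        "passengers_per_year": passengers,
        "visits_per_year": visits,
        "page_views_per_year": page_views,
        "page_views_per_minute": per_minute,
        "seconds_between_views": 60 / per_minute,
    }


if __name__ == "__main__":
    for name, value in estimate_normal_load().items():
        print(f"{name:>24}: {value:,.1f}")
```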
A powerful server can handle that pretty easily. There's no way to know for sure how many servers they have, but they probably aren't running a large server farm. Maybe 2-3 machines, all talking to a database backend that is probably enterprise grade (Oracle?) and shared with their main reservation system.
So, last night you had about 1,800 repeat guests, plus those on the Wonder. Not all of them were trying to book; let's assume 250 of us were. At 12:01 that produced a surge of roughly 250 requests/second, and most of these were heavy requests (logins, reservation lookups, excursion database lookups, etc.). Compared with the normal rate of one page view every 2 seconds, it's not unreasonable to think the instantaneous load was 500x normal. It's virtually certain that it was at least 100x normal.
So, if you're Disney, what do you do? Do you run a server farm of 100 machines where 98 of them sit idle except for a 3-hour stretch every once in a while? Obviously not.
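The surge comparison works the same way. This sketch divides the assumed 12:01 rate by the normal rate, then shows what fraction of a farm sized for that peak would sit idle the rest of the time. It assumes capacity scales linearly with the number of machines, which is optimistic: a shared database backend usually becomes the bottleneck long before the web servers do.

```python
# Rough comparison of the 12:01 surge against normal traffic, using the
# guesses above. Assumes capacity scales linearly with machine count,
# which real systems rarely do (a shared database is usually the bottleneck).

NORMAL_REQ_PER_SEC = 0.5   # ~30 page views/minute, i.e. one every 2 seconds
SURGE_REQ_PER_SEC = 250.0  # assumed: ~250 people hitting the site at once


def surge_multiplier(normal: float, surge: float) -> float:
    """How many times normal load the surge represents."""
    return surge / normal


def idle_fraction(multiplier: float) -> float:
    """Share of a surge-sized server farm that sits idle under normal load."""
    return 1 - 1 / multiplier


if __name__ == "__main__":
    m = surge_multiplier(NORMAL_REQ_PER_SEC, SURGE_REQ_PER_SEC)
    print(f"surge is {m:.0f}x normal")  # surge is 500x normal
    for mult in (100, m):
        print(f"at {mult:.0f}x, {idle_fraction(mult):.1%} of a surge-sized farm is idle")
```

Even at the conservative 100x figure, 99% of a farm built for the peak would be doing nothing most of the year.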
What you (hopefully) do is some load testing, plus error-checking in your software so that:
A - Errors don't corrupt your database (i.e., people losing confirmed excursions or reservations, people seeing info from other people's reservations, etc.)
B - When errors do occur (timeouts, conflicts, etc.), the server drops connections but doesn't die completely.
I know it's frustrating (I spent almost 3 hours online and ended up having to log in through a remote connection because my machines were completely locked out). But at the same time, I think it's important for people to realize that "fixing the system" isn't a matter of calling a handyman over for a few hours to do something obvious. Building systems that can handle sudden surges in traffic is a VERY tough problem, even if you can anticipate them.
The good news is that things should get better with two new ships coming in. Assuming the DCL IT folks aren't stupid, they know they'll have to scale up for the higher average traffic two new ships will bring. That will also leave them better able to handle the surges from one or two cruises. They may also get really smart and build some new systems on the cloud computing services that are becoming available (which make it easier to quickly bring on extra capacity).
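The cloud idea boils down to sizing the server count to the current load instead of the worst-case load. Here's a hypothetical sketch of that calculation; the per-instance capacity and the min/max limits are invented numbers, and real autoscalers also have to wait for new machines to boot, which is exactly why a spike that arrives all at once at 12:01 is still hard.

```python
import math

# Hypothetical figures, purely for illustration.
REQ_PER_SEC_PER_INSTANCE = 25.0  # what one instance can serve comfortably
MIN_INSTANCES = 2                # always-on baseline
MAX_INSTANCES = 40               # budget cap


def desired_instances(current_req_per_sec: float,
                      per_instance: float = REQ_PER_SEC_PER_INSTANCE,
                      lo: int = MIN_INSTANCES,
                      hi: int = MAX_INSTANCES) -> int:
    """Instances needed for the current load, clamped to [lo, hi]."""
    needed = math.ceil(current_req_per_sec / per_instance)
    return max(lo, min(hi, needed))


if __name__ == "__main__":
    for load in (0.5, 50, 250, 2000):
        print(f"{load:>7} req/s -> {desired_instances(load)} instances")
```

Under these made-up numbers you'd pay for 2 machines most of the time and about 10 during a 250 requests/second surge, instead of keeping a large farm running year-round.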
Ok, enough of lecture mode. I just ran some numbers out of curiosity (and to avoid doing any real work). Even though I feel the pain of those who didn't get everything they wanted (including my own pain on that score), I think it's important to realize that the DCL folks aren't being negligent or stupid here. They really are dealing with some extraordinarily complex problems, and the problems we're hearing about (being bounced out, missing brunch reservations) are NOTHING compared to the kinds of problems that are possible (scrambled or lost reservations, passengers getting lost or transferred between reservations, etc.). The messages we're getting about reservations being locked aren't the result of stupidity or carelessness; they're a sign of the application protecting you from those much worse possibilities.
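Here's roughly what a reservation lock does, as a hypothetical in-memory Python sketch (the ReservationLocks class and the session names are invented for illustration; a real system would keep locks in the database and expire them after a timeout). When a second session tries to change a reservation that another session is already editing, it's refused rather than allowed to make a conflicting change at the same time.

```python
import threading


class ReservationLocks:
    """Hypothetical lock table: at most one editing session per reservation."""

    def __init__(self) -> None:
        self._guard = threading.Lock()     # protects the table itself
        self._owners: dict[str, str] = {}  # reservation id -> session id

    def acquire(self, reservation: str, session: str) -> bool:
        """Return True if this session may edit; False means "reservation locked"."""
        with self._guard:
            owner = self._owners.get(reservation)
            if owner is None or owner == session:
                self._owners[reservation] = session
                return True
            return False  # another session is mid-change: refuse rather than interleave

    def release(self, reservation: str, session: str) -> None:
        """Drop the lock, but only if this session actually holds it."""
        with self._guard:
            if self._owners.get(reservation) == session:
                del self._owners[reservation]


if __name__ == "__main__":
    locks = ReservationLocks()
    print(locks.acquire("RES123", "laptop"))  # True
    print(locks.acquire("RES123", "phone"))   # False: "reservation is locked"
    locks.release("RES123", "laptop")
    print(locks.acquire("RES123", "phone"))   # True
```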