Web Application “GO-LIVE” Checklist
So you’ve coded it, tested it, debugged it, and your client likes it. Time to go live. NOT.
Going live is the point of no return. Once the application is live, the whole thing goes from debug mode to *operations* mode. You must react in *real time* to anything bad that happens to it.
Thus before turning the big switch on, it helps to have a sense of what you are getting into, and it is also your responsibility to let your client know where the risks are.
Here is a list of questions, some of them pointed, and others just to make you think about what else might go wrong and what the cost of that would be. I don’t claim the list is complete. At the same time, the list is too long to apply to most smaller projects. Use it as a source of ideas to build your own checklist.
General risk management
- Have we documented the client’s disaster scenarios?
- Have we documented how we mitigate these risks, how we recover from them?
- Have we tested the recovery processes?
- Have we implemented and deployed all elements of risk mitigation and recovery?
Application hosting
- Is the control panel disabled, if that’s possible?
- Is SSH access set up to use public-key authentication, with a passphrase protecting the private key?
- Is SSH access (port 22) restricted to IP addresses belonging to the client and/or Logimake?
- Is “root” user barred from using SSH?
- Are the usernames and passwords of the accounts that can access SSH random strings?
- Is the firewall set up? Does it allow access to any ports other than 80, 20, 21, 22, 443, and those above port number 1024 which are needed for testing?
- Is the process for deploying the application on a “fresh” OS documented?
- Has this process been tested?
- Does the client know the time required to deploy the application on a new server?
- Have we benchmarked the hosting service’s tech support (responsiveness and quality)?
- How long will it take us to recover from a complete and permanent meltdown at our hosting provider?
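Some of the SSH questions above can be checked mechanically. Here is a minimal sketch that reads the text of an sshd_config file and flags two of the items (root login and password logins). The function names and the EXPECTED table are my own, not part of any tool, and the parser is deliberately simplified: it assumes whitespace-separated "Keyword value" lines and ignores Match blocks and included files.

```python
def parse_sshd_config(text):
    """Return {lowercased keyword: value}; the first occurrence of a keyword wins,
    which mirrors how sshd treats repeated keywords."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value: ignored in this sketch
        settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


# Hypothetical policy for this checklist: no root login, no password logins.
EXPECTED = {"permitrootlogin": "no", "passwordauthentication": "no"}


def audit_sshd_config(text):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    settings = parse_sshd_config(text)
    problems = []
    for key, wanted in EXPECTED.items():
        actual = settings.get(key)
        if actual is None:
            problems.append(f"{key} is not set explicitly")
        elif actual.lower() != wanted:
            problems.append(f"{key} is {actual!r}, expected {wanted!r}")
    return problems
```

In practice you would feed it the contents of the server's actual config file and treat any returned problem as a blocker for go-live.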
Database
- What data in the database is sensitive in the commercial and legal sense?
- Have we documented how difficult it is for a hacker to access this data?
- If db transactions are used (commit / rollback), is the db engine for all the tables set to InnoDB?
- Is the database accessible on a TCP port or only on a socket?
- Can the database host, port, username, password and database name be guessed easily?
- Is there a need for the application to use multiple db users with different privileges on the db tables depending on the part of the application accessing the db?
- Have the usernames and passwords been changed to random strings?
- Do these db users have limited privileges, and only on the application db?
- Has the root username been changed to a random string?
- Is the root password set to a random string?
- Is the frequency of the backups sufficient?
- Are the backups sent offline instantly?
- Has the recovery process been documented?
- Has the recovery process been tested?
- Is the client aware of the potential loss of data between the last backup and the time of failure?
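Two of the database questions lend themselves to a quick script: finding tables that are not on InnoDB, and generating random credentials. The sketch below assumes MySQL and a DB-API cursor from a driver that uses %s placeholders; the function names are mine. The query reads information_schema.TABLES, which MySQL provides.

```python
import secrets

# Lists base tables in one schema whose storage engine is not InnoDB.
NON_INNODB_SQL = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_TYPE = 'BASE TABLE' "
    "AND ENGINE <> 'InnoDB'"
)


def non_innodb_tables(cursor, schema):
    """Return [(table_name, engine), ...] for tables that would not honor commit / rollback."""
    cursor.execute(NON_INNODB_SQL, (schema,))
    return [(row[0], row[1]) for row in cursor.fetchall()]


def random_credential(nbytes=24):
    """Return a URL-safe random string built from `nbytes` random bytes."""
    return secrets.token_urlsafe(nbytes)
```

An empty result from non_innodb_tables is what you want to see before go-live. For usernames, check your database's length limit before using the generated string as-is.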
Storage
- What local files are sensitive in the commercial and legal sense?
- List all processes in the application which require hard-drive storage, and on which partition (Apache, MySQL, etc)
- Compare to available storage on each partition
- What are the means for notifying admins that storage space is running low?
- When these processes run out of storage space, do they corrupt data?
- When these processes run out of storage space, do they fail gracefully as seen by the user?
- Same questions about off-site storage (e.g. S3)
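For the question about notifying admins when space runs low, a small check run from cron is often enough to start with. This is only a sketch: the 10% threshold and the function name are assumptions, and the notification itself (email, pager, etc.) is left out.

```python
import shutil


def low_space_partitions(paths, min_free_fraction=0.10, usage=shutil.disk_usage):
    """Return [(path, free_fraction), ...] for paths whose free space is below the threshold.

    `usage` is injectable so the check can be exercised without filling a real disk.
    """
    low = []
    for path in paths:
        stats = usage(path)  # named tuple with total, used, free (in bytes)
        free_fraction = stats.free / stats.total
        if free_fraction < min_free_fraction:
            low.append((path, free_fraction))
    return low
```

Pass it one path per partition listed in the storage inventory, for example the directories used by Apache and MySQL, and alert on any non-empty result.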
Error handling
- List all types of errors and exceptions that can be generated by each sub-system
- Are all types of errors and exceptions caught and handled in a way that lets the user or admins recover from the error manually?
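One way to make "caught and handled" concrete is a last-resort wrapper around each request handler: the admins get the full traceback in the log, the user gets a short message with a reference they can quote. The sketch below is framework-agnostic; handle_request and the response shape are made up for illustration.

```python
import logging
import uuid

logger = logging.getLogger("app")


def handle_request(handler, *args, **kwargs):
    """Run `handler`; on any exception, log the traceback and return a safe message."""
    try:
        return {"ok": True, "result": handler(*args, **kwargs)}
    except Exception:
        ref = uuid.uuid4().hex[:8]  # short id that ties the user's report to the log entry
        logger.exception("Unhandled error, ref=%s", ref)
        return {
            "ok": False,
            "ref": ref,
            "message": f"Something went wrong. Please quote reference {ref} to support.",
        }
```

A catch-all like this does not replace handling the specific errors listed for each sub-system; it only guarantees that nothing reaches the user as a raw stack trace.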
Response Time and Throughput
- Have we listed all I/O channels (I/O ports, HD access), their capacity, and the expected average and peak traffic?
- Have we load tested the application from each I/O channel?
- Are the test results documented and approved by the client as acceptable?
- Have we verified that the load balancers and load balanced resources have the expected I/O statistics?
- Have we verified that session handling is done as we expect by front-end load-balancers?
- Have we benchmarked replication / synchronization between database instances?
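When documenting load-test results for the client, averages alone hide the slow tail, so it helps to report a high percentile next to the mean. Here is a minimal sketch that summarizes a list of measured response times; it uses the nearest-rank percentile definition, and the choice of the 95th percentile is an assumption, not a rule.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return mean, 95th percentile and maximum of response times in milliseconds."""
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p95": percentile(samples_ms, 95),
        "max": max(samples_ms),
    }
```

Running summarize once per I/O channel, at both expected average and peak traffic, gives a compact table the client can sign off on.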
3rd-Party Systems
- Have we listed all the 3rd party systems?
- Have we documented how to access the following resources for each: administration panel, tech support, billing support, developer’s documentation?
- When is the next payment due for each 3rd-party system? The next renewal? Who has the authority to decide and to purchase?
- What are the scheduled system outages?
- How do we deal with each of them?
- Have we listed all scenarios where an external system fails, and how our system handles that?
- Is the client aware of the risks presented by 3rd-party system failures?
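For each failure scenario listed above, the application needs a defined behavior when the external system does not answer. A common pattern is to retry a few times and then fall back to something safe, such as a cached value or a "try again later" message. The sketch below is generic; call_with_fallback and its parameters are illustrative names, not any library's API.

```python
import time


def call_with_fallback(call, fallback, attempts=3, delay_s=0.0, sleep=time.sleep):
    """Try `call()` up to `attempts` times; return (value, used_fallback).

    `fallback` receives the last exception so it can decide what to return.
    `sleep` is injectable so the retry delay can be skipped in tests.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return call(), False
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_s)  # pause before the next attempt
    return fallback(last_error), True
```

The returned flag lets the caller tell the user, or the log, that the answer did not come from the live system.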
Development and test data
- Is there any trace in the deployed code and/or data of development and/or test activity?
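A crude text scan catches many leftovers before they reach production. The sketch below checks a file's contents line by line against a few patterns; the pattern list is only a starting point that I made up for illustration, and each project will want its own.

```python
import re

# Hypothetical starter patterns; extend them with project-specific markers.
DEV_TRACE_PATTERNS = {
    "debug flag": re.compile(r"\bdebug\s*=\s*(true|1|on)\b", re.I),
    "localhost reference": re.compile(r"\b(localhost|127\.0\.0\.1)\b"),
    "todo marker": re.compile(r"\b(TODO|FIXME|XXX)\b"),
    "test email": re.compile(r"\b[\w.+-]+@example\.(com|org|net)\b", re.I),
}


def find_dev_traces(text):
    """Return [(line_number, label), ...] for every line that matches a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in DEV_TRACE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

The same idea applies to data: a query for test accounts or sample orders in the production database answers the other half of the question.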
Security
- Have we listed the various types of data that are subject to security breaches (theft, fraud, etc.)?
- Have we documented the risks associated with each type of data being compromised?
- Have we reviewed the code for security using our internal security guide?
- Have we had the code audited by an external security expert?
- What means do we have for rapidly stopping and/or reverting damage being done by an ongoing security breach?
- Can we tell when data has been compromised, and which data?
- Does the client have a plan for controlling the damage resulting from a security breach?
- Is the developer’s liability in case of a security breach documented, understood by the client, and acceptable to the client and to the developer?
