Is it possible to explain how to build a system that scales well as demand grows in less than 200 pages?
The answer is realistically no - in fact I doubt that it could be done even in 200 pages. What is needed is experience, and there is no substitute for it.
Firstly we need to define what makes something enterprise level, or at least enterprise class; I define it as having the following characteristics:
- Flexible and easily extended
- Split into layers and levels.
- Designed for maintenance and reliability, and tested.
Follow simple rules:
1. Find out the expected number of concurrent users, then multiply this by 100 to get a baseline. Or, to keep things simple, you could assume that you need to be able to handle 1 million to begin with.
Then bear this in mind whilst designing the basic architecture. At this point do not put in actual technologies - don't name languages, databases or servers - because naming them this early channels the thinking towards those technologies and away from the design.
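The sizing rule can be written down as a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the requests-per-user figure is an assumption you would need to measure or estimate for your own system.

```python
HEADROOM_FACTOR = 100        # the "multiply by 100" rule
DEFAULT_TARGET = 1_000_000   # the "assume 1 million" shortcut


def design_target(expected_concurrent_users=None):
    """Return the number of concurrent users to design for.

    With no estimate available, fall back to the 1 million shortcut.
    """
    if expected_concurrent_users is None:
        return DEFAULT_TARGET
    if expected_concurrent_users < 0:
        raise ValueError("expected_concurrent_users must not be negative")
    return expected_concurrent_users * HEADROOM_FACTOR


def required_throughput(target_users, requests_per_user_per_minute):
    """Convert a user target into requests per second.

    requests_per_user_per_minute is an assumed workload figure,
    not something the rule itself provides.
    """
    return target_users * requests_per_user_per_minute / 60.0
```

For example, an estimate of 2,500 concurrent users gives a design target of 250,000; if each user is assumed to make 6 requests a minute, that is 25,000 requests per second for the test harness to aim at.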
2. Draw a basic architectural diagram; make it fit onto a sheet of A3 - ideally A4 - because basic architectural diagrams should be very high level. Each box on the basic diagram is going to have another sheet of its own, and so on.
At this point identify what could limit performance. Assume nothing. Invest time in building a test harness that simulates the throughput that is required, and see how the components perform. It doesn't need to be complicated - or even too close to the final design - just a representation to gain some metrics.
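A test harness of this kind can be very small. The sketch below is one possible shape, not a finished tool: `stand_in_component` is a hypothetical placeholder for whatever box on the diagram you want to measure, and the request counts and concurrency are arbitrary starting values.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def stand_in_component(payload):
    """Hypothetical placeholder for the component under test."""
    time.sleep(0.001)  # pretend the real work takes about 1 ms
    return payload * 2


def run_load(component, total_requests, concurrency):
    """Call component total_requests times from a pool of threads
    and return basic throughput and latency figures."""

    def timed_call(i):
        start = time.perf_counter()
        component(i)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall = time.perf_counter() - wall_start

    return {
        "requests": total_requests,
        "throughput_per_s": total_requests / wall,
        "median_ms": statistics.median(latencies) * 1000,
        # nearest-rank style approximation of the 95th percentile
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))] * 1000,
    }


if __name__ == "__main__":
    print(run_load(stand_in_component, total_requests=2000, concurrency=50))
```

Swap the placeholder for a call into the real component (or a rough prototype of it) and compare the measured throughput against the target figure. Threads are enough for components that mostly wait on I/O; CPU-bound work would need processes or a different driver.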
3. Guiding principles.
- Assume nothing
- Prototype and prove everything.
Design your database to avoid the need for locking and to avoid transaction commit failures. Be sensible about database normalisation - for example, making a table contain its parent key is OK and can increase performance considerably. Make ownership of tables clear in the design, and make it match the real world as closely as possible without needing too many tables. Always spend time and effort on the keys: make them as natural as possible and never, ever, allow the database to generate a unique key, because generated keys only temporarily mask the problem of a design that does not provide unique keys where they are required.
4. Good Database Design
- Avoid locks
- Use natural keys - never DB generated unique ids
- Keep it simple - minimise the number of tables
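To make the points about keys concrete, here is a minimal sketch using SQLite from the Python standard library. The tables, columns and sample data are invented for illustration. Every key is a natural, composite key, and `order_line` carries the full key of its parent (including `customer_code`), so a per-customer query needs no join.

```python
import sqlite3

# Hypothetical schema: no AUTOINCREMENT or generated ids anywhere.
SCHEMA = """
CREATE TABLE customer (
    customer_code TEXT PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE customer_order (
    customer_code TEXT NOT NULL REFERENCES customer(customer_code),
    order_ref     TEXT NOT NULL,
    PRIMARY KEY (customer_code, order_ref)
);
CREATE TABLE order_line (
    customer_code TEXT NOT NULL,
    order_ref     TEXT NOT NULL,
    sku           TEXT NOT NULL,
    quantity      INTEGER NOT NULL,
    PRIMARY KEY (customer_code, order_ref, sku),
    FOREIGN KEY (customer_code, order_ref)
        REFERENCES customer_order(customer_code, order_ref)
);
"""


def build_demo_db():
    """Create an in-memory database with a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customer VALUES ('ACME', 'Acme Ltd')")
    conn.execute("INSERT INTO customer_order VALUES ('ACME', 'PO-1001')")
    conn.executemany(
        "INSERT INTO order_line VALUES (?, ?, ?, ?)",
        [("ACME", "PO-1001", "WIDGET", 3), ("ACME", "PO-1001", "GADGET", 1)],
    )
    conn.commit()
    return conn


def units_for_customer(conn, customer_code):
    """Total units ordered by a customer, read from order_line alone."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM order_line "
        "WHERE customer_code = ?",
        (customer_code,),
    ).fetchone()
    return row[0]
```

With natural keys the database itself rejects a second `WIDGET` line on the same order, which is exactly the kind of design problem a generated id would have hidden.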
Identify what needs to be stored in the database; it may even be sensible to have more than one database. Now that you have the basic architectural design, consider the implementation and how each component may need to be provided by one or more distinct servers. Figure out how this will be done early on, and validate it by writing some code to test it. Ensure that as far as possible the dependence on specific technologies is minimal - e.g. ensure that the database is abstracted by a layer, and split components into tiers and peers.
Avoid dependence on specific technologies - use layers and generalised approaches.
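One way to keep the database behind a layer is sketched below. The `CustomerStore` interface and both implementations are hypothetical names chosen for this example; the point is only that the calling code (`register_customer`) never mentions a specific database.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class CustomerStore(ABC):
    """The only view of storage that the rest of the system sees."""

    @abstractmethod
    def save(self, code: str, name: str) -> None: ...

    @abstractmethod
    def find_name(self, code: str) -> Optional[str]: ...


class InMemoryCustomerStore(CustomerStore):
    """Dictionary-backed store, handy for prototypes and tests."""

    def __init__(self):
        self._rows = {}

    def save(self, code, name):
        self._rows[code] = name

    def find_name(self, code):
        return self._rows.get(code)


class SqliteCustomerStore(CustomerStore):
    """SQLite-backed store; another database would get its own class."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(code TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, code, name):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO customer (code, name) VALUES (?, ?)",
                (code, name),
            )

    def find_name(self, code):
        row = self._conn.execute(
            "SELECT name FROM customer WHERE code = ?", (code,)
        ).fetchone()
        return row[0] if row else None


def register_customer(store: CustomerStore, code: str, name: str):
    """Business logic written against the layer, not the technology."""
    store.save(code, name)
    return store.find_name(code)
```

The same `register_customer` call works with `InMemoryCustomerStore()` in a prototype and with `SqliteCustomerStore(sqlite3.connect(":memory:"))` or a production-backed store later, which also makes it easy to performance test each option in isolation.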
In many ways designing an enterprise level solution based around web servers is a little easier, with load balancers able to assist, and even without load balancers it isn't hard to share out resource provision across different servers. Finally, and possibly most easily missed, is the importance of not allowing flights of fancy to creep into the design. Be ruthless about what is required and stick to the simplest solution - even if it is a simple background process update to create a static copy of static data. Don't be afraid of having items that aren't updated immediately - identify the processes that don't need to be instantaneous and build a simple system to manage the workload.
Be ruthless about what is required and stick to the simplest solution.
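As an illustration of a simple background process that maintains a static copy, here is a rough sketch. `StaticCopyBuilder` is a made-up name, and a real system might use a separate process or a scheduled job rather than a thread; the idea is just that writers queue their changes and return at once, while readers are served a pre-built copy that may be slightly out of date.

```python
import json
import queue
import threading


class StaticCopyBuilder:
    """Queues updates and rebuilds a static JSON copy in the background."""

    def __init__(self):
        self._pending = queue.Queue()
        self._source = {}        # touched only by the worker thread
        self._snapshot = "{}"    # the static copy that readers are given
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def update(self, key, value):
        """Record a change and return immediately."""
        self._pending.put((key, value))

    def read_snapshot(self):
        """Return the latest static copy; it may lag behind updates."""
        return self._snapshot

    def wait_until_idle(self):
        """Block until every queued update has been applied (for tests)."""
        self._pending.join()

    def _run(self):
        while True:
            key, value = self._pending.get()
            self._source[key] = value
            # Build the new copy fully, then swap it in with a single
            # assignment so readers see either the old or the new copy.
            self._snapshot = json.dumps(self._source, sort_keys=True)
            self._pending.task_done()
```

Because only the worker touches the source data, no locking is needed around it, and readers never wait on a writer.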
So in summary, design well, assume nothing, and performance test everything to identify bottlenecks. Use a good unit testing framework.
Sample Enterprise Application Architecture Diagram
The following diagram is based on a real, live application architecture. In the diagram, any item with a drop shadow is an instance of a service. This instance could be on one of many servers.
There are many protocols and transmission methods used within this architecture:
- CGI (Common Gateway Interface, under Apache)
- CORBA / IIOP
- Direct Linking (DLL / static library)
Emesary is used both within code modules and, via both a TCP/IP bridge and a CORBA/IIOP bridge, to allow communication between servers.