Online Trading: How many users do you want?
Updated: Feb 15, 2020
A closed system for displaying currently published buy / sell prices, and working submitted orders between users.
- Java Spring Boot
- SQL Server
- Marionette Webapp
- EC2 instance with Windows serving content and API
- Route 53 to Elastic IP attached to the EC2 Instance
- Maintain user login sessions in Spring
- For performance, hold all state in memory (e.g. current order prices)
- Persist trades, user accounts, and metadata to SQL Server
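As a rough illustration of the last two points, the sketch below keeps live order prices in memory and writes a trade record through a persistence boundary when an order completes. The names `TradeStore`, `InMemoryTradeStore`, and `LiveOrders` are hypothetical and not taken from the proposed system; the in-memory store only stands in for SQL Server so the example runs on its own.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical persistence boundary; in the proposed stack this would write to SQL Server.
interface TradeStore {
    void save(String tradeRecord);
}

// Stand-in store so the sketch runs without a database.
class InMemoryTradeStore implements TradeStore {
    final List<String> saved = new ArrayList<>();

    @Override
    public synchronized void save(String tradeRecord) {
        saved.add(tradeRecord);
    }
}

// Keeps currently published order prices in memory for fast reads.
class LiveOrders {
    private final Map<String, BigDecimal> priceByOrderId = new ConcurrentHashMap<>();
    private final TradeStore store;

    LiveOrders(TradeStore store) {
        this.store = store;
    }

    void publish(String orderId, BigDecimal price) {
        priceByOrderId.put(orderId, price);
    }

    // Returns null when the order is not (or no longer) live.
    BigDecimal priceOf(String orderId) {
        return priceByOrderId.get(orderId);
    }

    // Turns a live order into a persisted trade; returns false if it was already gone.
    boolean completeTrade(String orderId) {
        // remove() is atomic, so only one caller can take a given order.
        BigDecimal price = priceByOrderId.remove(orderId);
        if (price == null) {
            return false;
        }
        try {
            store.save(orderId + " @ " + price);
        } catch (RuntimeException e) {
            // Persistence failed: restore the order and surface the error.
            priceByOrderId.putIfAbsent(orderId, price);
            throw e;
        }
        return true;
    }
}
```

The point of the design is that a trade is only reported as complete once the store has accepted it, and a persistence failure is raised to the caller rather than swallowed.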
A trading system must be highly reliable, secure, and performant; each interaction is transactional and cannot fail silently. The maximum number of concurrent users is not massive; however, there can be no discernible change in latency as the user count varies. User sessions must be completely segregated so that actions and events are not visible across accounts. Crucially, any errors must be handled appropriately, with the system designed so that unexpected errors can be isolated. The commercial value per user event is relatively high, so verbose log output can be afforded, along with status checks on events propagating through the system.
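One way to picture the segregation and error-handling requirements is an event router keyed by account, where a failing listener is logged and counted but does not affect anyone else. This is only a sketch: `AccountEventRouter`, the logger name `trading.events`, and the use of plain strings for events are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical router: events are only ever delivered to listeners of the owning account.
class AccountEventRouter {
    private static final Logger LOG = Logger.getLogger("trading.events");

    private final Map<String, List<Consumer<String>>> listenersByAccount =
            new ConcurrentHashMap<>();

    void subscribe(String accountId, Consumer<String> listener) {
        listenersByAccount
                .computeIfAbsent(accountId, id -> new CopyOnWriteArrayList<>())
                .add(listener);
    }

    // Delivers the event to the account's listeners and returns how many of them failed.
    int publish(String accountId, String event) {
        int failures = 0;
        List<Consumer<String>> listeners =
                listenersByAccount.getOrDefault(accountId, List.of());
        for (Consumer<String> listener : listeners) {
            try {
                listener.accept(event);
            } catch (RuntimeException e) {
                // Isolate the failure: log it verbosely and carry on with the next listener.
                failures++;
                LOG.log(Level.SEVERE, "Delivery failed for account " + accountId, e);
            }
        }
        return failures;
    }
}
```

Returning the failure count gives the caller something to check, so a delivery problem is visible in both the log and the control flow.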
This system will require a single source of truth for matching orders into trades, naturally creating a single point of failure (PoF) on a business-critical function. The stack should functionally separate the PoF ("separation of concerns") from the rest of the system's functions, so that the PoF's implementation can be optimised independently.
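A minimal sketch of that separation follows: the rest of the system would depend only on a narrow interface, while the matching logic lives behind it and can be replaced or tuned later. The interface `MatchingEngine`, the class `SingleThreadedMatchingEngine`, integer prices in minor units, and the price-time priority rule (with trades executing at the resting order's price) are all assumptions for illustration, not the system's actual design.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical boundary: callers see only this interface, never the order book.
interface MatchingEngine {
    // Submits a limit order and returns a description of each trade it produced.
    List<String> submit(String orderId, boolean buy, long price, long quantity);
}

class SingleThreadedMatchingEngine implements MatchingEngine {
    private static final class Resting {
        final String orderId;
        long remaining;

        Resting(String orderId, long remaining) {
            this.orderId = orderId;
            this.remaining = remaining;
        }
    }

    // Bids sorted highest price first, asks lowest first; FIFO within each price level.
    private final TreeMap<Long, Deque<Resting>> bids =
            new TreeMap<>(Comparator.<Long>reverseOrder());
    private final TreeMap<Long, Deque<Resting>> asks = new TreeMap<>();

    @Override
    public synchronized List<String> submit(String orderId, boolean buy, long price, long quantity) {
        if (price <= 0 || quantity <= 0) {
            throw new IllegalArgumentException("price and quantity must be positive");
        }
        TreeMap<Long, Deque<Resting>> opposite = buy ? asks : bids;
        List<String> trades = new ArrayList<>();
        long remaining = quantity;
        while (remaining > 0 && !opposite.isEmpty()) {
            Map.Entry<Long, Deque<Resting>> best = opposite.firstEntry();
            long bestPrice = best.getKey();
            boolean crosses = buy ? bestPrice <= price : bestPrice >= price;
            if (!crosses) {
                break;
            }
            Resting resting = best.getValue().peekFirst();
            long fill = Math.min(remaining, resting.remaining);
            resting.remaining -= fill;
            remaining -= fill;
            // Assumed convention: the trade executes at the resting order's price.
            trades.add(orderId + " x " + resting.orderId + ": " + fill + " @ " + bestPrice);
            if (resting.remaining == 0) {
                best.getValue().pollFirst();
            }
            if (best.getValue().isEmpty()) {
                opposite.remove(bestPrice);
            }
        }
        if (remaining > 0) {
            // Any unfilled quantity rests on the book at the order's own price.
            (buy ? bids : asks)
                    .computeIfAbsent(price, p -> new ArrayDeque<>())
                    .addLast(new Resting(orderId, remaining));
        }
        return trades;
    }
}
```

Because callers only hold a `MatchingEngine` reference, the synchronized in-process version here could later be swapped for a dedicated service without touching the rest of the application.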
Java is stable and performant on a sufficiently large, well-managed EC2 instance. EC2 instances do require ongoing maintenance, for example security patches and SSL certificate renewals, which will periodically bring the service offline. Java lends itself well to containerisation with the JVM, so such a system might be better suited to an ECS Fargate service.
SQL Server is likewise a proven component and can be backed up regularly within AWS. A real-time replica feed can be taken offsite, practically eliminating the risk of data loss. SQL Server on RDS requires periodic maintenance downtime, so depending on the trading hours this could be a problem.
Running a single EC2 (or ECS Fargate) instance is generally discouraged; best practice is to run multiple instances in parallel to mitigate disasters such as hardware failure, deployment errors, and software errors. Furthermore, with multiple instances the user capacity of the system can be easily scaled to balance latency and cost. Architecting such a system requires that the application be stateless; this could be difficult in a system that maintains live orders in memory, and might require different skill sets to write.
For the single source of truth (and PoF), a potential MVP solution is to use the existing SQL Server to hold current orders and to use SQL transactions from multiple stateless instances. Over time, as the load increases, the SQL Server will become quite expensive to maintain and will limit application performance. A more scalable solution would be to implement an AWS-managed Memcached or Redis cluster for holding current orders in memory, move to an auto-scaling DynamoDB for user metadata, and use CloudWatch (with the CloudWatch agent on EC2) for shipping logs.
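Whichever shared store is chosen, the property the stateless instances rely on is an atomic "only if still open" state change. The sketch below illustrates that idea with an in-memory map standing in for the shared store; `OrderStateStore`, `InMemoryOrderStateStore`, `StatelessOrderHandler`, and the status values are hypothetical names. With SQL Server, a comparable guard could be a conditional `UPDATE ... WHERE id = ? AND status = 'OPEN'` inside a transaction, checked by its affected-row count.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical abstraction over whichever shared store holds current orders.
interface OrderStateStore {
    void open(String orderId);

    // Atomically moves an order between states; false if it was not in the expected state.
    boolean transition(String orderId, String expected, String next);
}

// In-memory stand-in so the sketch runs locally; not a substitute for a real shared store.
class InMemoryOrderStateStore implements OrderStateStore {
    private final ConcurrentMap<String, String> statusById = new ConcurrentHashMap<>();

    @Override
    public void open(String orderId) {
        statusById.put(orderId, "OPEN");
    }

    @Override
    public boolean transition(String orderId, String expected, String next) {
        // replace(key, old, new) only succeeds if the current value equals 'old'.
        return statusById.replace(orderId, expected, next);
    }
}

// Holds no order state of its own, so any number of instances can run in parallel.
class StatelessOrderHandler {
    private final OrderStateStore store;

    StatelessOrderHandler(OrderStateStore store) {
        this.store = store;
    }

    // Returns true for exactly one caller per open order.
    boolean tryFill(String orderId) {
        return store.transition(orderId, "OPEN", "FILLED");
    }
}
```

Two handlers sharing one store can both call `tryFill` for the same order; only the first call succeeds, and the second sees `false` rather than producing a duplicate trade.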
The infrastructure and software architecture as proposed would provide the functionality required for a minimum usable product. However, the implementation would bring some well-known risks for which standard, albeit more expensive, solutions have already been developed. For example, at a minimum we recommend that the probability of the single PoF failing is mitigated as far as reasonably practicable by re-architecting to use a different, dedicated service.