Our parent firm was already using a home-grown research database for contribution and distribution of 400 GB of research information. It was suggested that we copy their setup and code to meet our rapid startup requirement. Our parent firm had implemented a Sybase ASE 11. 9 database running on high-end HP Unix servers (hp superdome with 256GB memory with a 2000GB (2TB) mirrored storage array)engineered for large-scale databases. This high-end architecture is ideal for the specific demands of online transaction processing workloads and high availability (never goes down).
Unfortunately, the whole setup is very expensive and very centralized (Hong Kong). In order to meet their performance criteria, they resorted to distributing all or parts of the database across multiple servers in different time zones. This was very expensive solution to maintain because of the database licensing, hardware, communication costs associated with distributing the data, and personnel to support the distributed databases (3 databases x 55 tables in each database x 4 countries).
We were starting with 6 countries and over 100,000 simultaneous users. The arent solution cost would have an initial capital expenditure of $4,200,000 and a run rate of $46,000 per month (to start). Our budget to start was some where around $125,000 (not including personnel). Using a copy of Microsoft SQL Server 2000, I prototyped the database and supporting structures (indexes, stored procs, tables, etc) in two weeks using our parent firm and another successful internet-based research supplier as templates for our base service.
The license for the software was $900, and the server was already being used for other non-production work. SQL Server license was free because we already had a development license. I set SQL (ANSI SQL-92), JAVA (J2EE), Javascript, XML, and HTLM as the baseline for our standards for development. Notepad was the development tool. I picked up a free evaluation copy of ColdFusion 5. 0 for the middleware components and downloaded JDK 1. 3 from Sun. com for the Java development environment.
The majority of the database activity revolves around a master transaction table for collecting and distributing data. Seperate reference tables (in 1-to-many relationships with the transaction tables) are used for user ccess, markets, countries, clients, contributors, names, address, etc, in order to minimize table maintenance and complicated screen inputs. Command-line batch jobs using SQL BULK INSERT we created to transfer data between database servers.
In six weeks we were ready to test the system’s performance on a Compaq Proliant 8500 class machine with 8 CPUs, 16GB memory, with external storage for 200GB. We had <3 second response time with a simulated load of 250,000 internet users. The rest of the hardware ($65,000) plus software (database, middleware, and tools) ran just under $35,000. To overcome the issues of high availability (very apparent when dealing with any Microsoft OS), we built redundant servers for $100,000 and installed clustering software.
Each dual cluster can support 500,000 users for just under $250,000. When we roll out the system in production, it’ll cost 1/4 of the parent firms system and support twice the number of users. And instead of using remote distributed databases for enhancing the access times, we contracted with an internet data caching service that manages this sort of application distribution problem for a fraction of the original cost.