I propose to go to work for a small start-up company that needs a DBA. The company is so new that I will not only be responsible for installing and managing a database application, but will also need to design the architecture that the database will reside on (including the RAID configuration) and the network (SAN or otherwise) that the data will travel over as applications make requests of the database. While I will not be expected to support all of this once the equipment is purchased and the hardware installed, I am expected to make the recommendations and to justify them.
The company has 30 researchers (PhDs, BSs, and research assistants) and one main product. It is continually running clinical trials to gather enough evidence that its product is effective. Logging the samples and surveys collected at trial sites generates a heavy WRITE load on the database, about xx GB of data per week. How the data will be manipulated and analyzed also needs investigation, because the READ load must be estimated as well.
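To start sizing the write load, a week's volume can be converted into an average ingest rate and compared with a 1 Gb/s link. This is an illustrative sketch only: the real weekly volume is still the unknown xx GB, so `WEEKLY_GB = 50` is a hypothetical placeholder, and the code assumes writes are spread evenly over a 12-hour, 5-day working week and uses decimal units (1 GB = 1,000 MB).

```python
# Hypothetical inputs: the real weekly volume ("xx GB") is not yet known.
WEEKLY_GB = 50.0           # placeholder weekly write volume, in GB
HOURS_PER_DAY = 12         # assumed 07:00-19:00 working day
DAYS_PER_WEEK = 5          # assumed working days per week
GIGABIT_MB_PER_S = 125.0   # 1 Gb/s = 1000 Mb/s / 8 bits per byte = 125 MB/s (theoretical)


def average_ingest_mb_per_s(weekly_gb: float,
                            hours_per_day: float = HOURS_PER_DAY,
                            days_per_week: float = DAYS_PER_WEEK) -> float:
    """Average write rate in MB/s if the weekly volume arrives evenly during working hours."""
    seconds = hours_per_day * days_per_week * 3600
    return weekly_gb * 1000 / seconds


def link_utilization(rate_mb_per_s: float,
                     link_mb_per_s: float = GIGABIT_MB_PER_S) -> float:
    """Fraction of a link's theoretical capacity consumed by a given rate."""
    return rate_mb_per_s / link_mb_per_s


if __name__ == "__main__":
    rate = average_ingest_mb_per_s(WEEKLY_GB)
    print(f"average ingest: {rate:.3f} MB/s, "
          f"{link_utilization(rate):.2%} of a 1 Gb/s link")
```

Averages hide peaks, such as a batch of instrument runs finishing at once, so the actual read and write load still has to be measured once the lab is running.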
As I consider the problems and solutions for this company, I will keep the following at the forefront of my mind:
• Mission Alignment
• Organizational Fit (Culture)
• User Access (Availability)
• Will users be able to do their jobs?
• Security of Organization Resources
• Privacy Issues
• Return on Investment (ROI)
The start-up is moving to new office space in a business park. It is made up of 30 scientists, research assistants, and marketing and business staff. Half of this group know each other from collaborative work at their former biotech company, a once-successful venture that has since grown stale, relying on its early success in developing a diagnostic instrument. This spinoff wants to develop something new and has creative and exciting ideas for a new direction.
With venture capital funding, the company will be setting up office and lab space. Those familiar with the R&D behind their former company’s product remember that data labeling, collection, and analysis were done “by hand” using handwritten labels, Excel spreadsheets, and Access. Today’s scientific instruments (fluorometers, spectrophotometers, high-pressure liquid chromatographs, mass spectrometers), however, allow many of these processes to be automated. Barcoded patient samples can be fed into these instruments, which convert test results into electronic data that flows directly into predesigned databases. Third-party software then lets investigators perform high-throughput screening on this data so that relevant trends and significant statistics can be generated.
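A minimal sketch of the barcode-to-database flow described above, assuming an instrument can export its results as CSV. The export layout, the `results` table, and its column names are hypothetical, and Python's built-in `sqlite3` stands in for the production SQL Server database purely so the example runs anywhere.

```python
import csv
import io
import sqlite3

# Hypothetical instrument export: one row per barcoded sample reading.
SAMPLE_EXPORT = """barcode,instrument,analyte,value
PT0001-A,fluorometer,marker_x,12.7
PT0002-A,fluorometer,marker_x,9.3
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical results table keyed by barcode, instrument, and analyte."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               barcode    TEXT NOT NULL,
               instrument TEXT NOT NULL,
               analyte    TEXT NOT NULL,
               value      REAL NOT NULL,
               PRIMARY KEY (barcode, instrument, analyte)
           )"""
    )


def load_export(conn: sqlite3.Connection, export_text: str) -> int:
    """Parse a CSV export and insert its rows in one transaction; return the row count."""
    rows = [
        (r["barcode"], r["instrument"], r["analyte"], float(r["value"]))
        for r in csv.DictReader(io.StringIO(export_text))
    ]
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO results (barcode, instrument, analyte, value) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(load_export(conn, SAMPLE_EXPORT), "rows loaded")
```

Keying each row on the barcode, instrument, and analyte means an accidentally re-imported file is rejected by the primary key instead of silently duplicating results.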
The importance of a DBA to this company cannot be overstated. With a well-designed database and network backbone, scientific discovery will be limited only by the pace at which scientists can run their tests and feed the database. Scientists will investigate the data generated in-house from desktops hard-wired to the growing databases, so the network must provide enough bandwidth that queries do not slow down their work. Because the database is central to the company’s success, the DBA has been asked to make recommendations not only for the server software and hardware that the database will reside on, but also for the network that will be used to access it.
Top technology priorities guiding the DBA’s decisions:
Mission Alignment
Requirement: to have a product ready in 5 years, the network and servers will need to support high-throughput data collection and analysis for that entire period without interruption.
Proposal: high bandwidth.
Organizational Fit (Culture)
Requirement: stick with proven technology, as this biotech company is committed to good stewardship of its venture capital.
Proposal: steer clear of experimental IT technologies and focus on the core values and culture of the scientific community, which call for a reliable and consistent network experience. “The equipment used for communications over multi-mode optical fiber is much less expensive than that for single-mode optical fiber… Because of its high capacity and reliability, multi-mode optical fiber generally is used for backbone applications in buildings.” (Source: Wikipedia: Multi-mode optical fiber)
“Multimode fiber systems offer flexible, reliable and cost effective cabling solutions for local area networks (LANs), Storage Area Networks (SANs), central offices and data centers…. OM3 fiber is a logical and cost-effective choice for short-range applications that need to support 1Gb/s or multi-gigabit speeds, especially when the cabling component costs account for less than 3% of the total spend.” (Source: http://www.fols.org/fols_library/white_papers/documents/mmfiberwhitepaper.pdf)
User Access (Availability)
Requirement: high availability during working hours, loosely defined as 07:00 to 19:00. This covers the typical workday while allowing flexible start and end times. There is no 24/7/365 mandate, but remote access will be provided through a VPN, and the network will generally be available except during posted maintenance hours.
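Proposal: RAID (redundant array of independent disks) storage, so that a single disk failure does not take the database offline, and low-maintenance fiber-optic cabling.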
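The availability requirement can be encoded as a simple scheduling rule: heavy maintenance runs only outside the 07:00 to 19:00 window. This sketch assumes the window applies every day, since the requirement does not single out weekdays, and the kinds of maintenance named in the comments are hypothetical examples.

```python
from datetime import datetime, time

# Core availability window from the requirement: 07:00 up to (not including) 19:00.
CORE_START = time(7, 0)
CORE_END = time(19, 0)


def in_core_hours(moment: datetime) -> bool:
    """True if the moment falls inside the high-availability window (assumed every day)."""
    return CORE_START <= moment.time() < CORE_END


def can_run_maintenance(moment: datetime) -> bool:
    """Disruptive work (e.g. patching or full backups) is allowed only outside core hours."""
    return not in_core_hours(moment)


if __name__ == "__main__":
    for t in (datetime(2024, 1, 8, 12, 0), datetime(2024, 1, 8, 22, 30)):
        print(t.strftime("%H:%M"), "maintenance allowed:", can_run_maintenance(t))
```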
Will users be able to do their jobs?
1. High-speed collection and querying of large volumes of data
Proposal: a high-bandwidth network capable of moving data between computers at one gigabit per second (1 Gb/s; a gigabit, not a gigabyte, which works out to roughly 125 megabytes per second before protocol overhead). In this company’s case, the medium should be multi-mode fiber-optic cable. Every computer will also need a network interface card that implements IEEE 802.3z (Gigabit Ethernet over fiber; its 1000BASE-SX variant is designed for multi-mode fiber). These cards will be installed in every computer on the company’s network (LAN), and a fiber-optic cable will run from each card to a switch. The switch acts as a relay, passing frames between computers according to the MAC addresses of their network cards. A switch that supports speeds of up to 1 Gb/s, such as a Versitron LAN switch, will move data between computers on the LAN; Cisco is also likely to offer suitable switches. Any printers and scanners that are to be shared will need a network card as well, or will need to be connected to a computer that has one.
Finally, a router will be needed to connect the company’s LAN to external networks (i.e., WANs or the outside world). It will have to be compatible with whatever type of connection the provider brings in from the street, whether cable, copper, or fiber. Again, I would contact Cisco.
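The switch behavior described above (forwarding by MAC address) can be illustrated with a toy model of a learning switch. This is a simplified sketch, not how any particular vendor's switch is implemented: real switches also age out table entries, handle VLANs, and run spanning tree, and the MAC addresses in the example are made up.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of address learning and forwarding in an Ethernet switch."""

    def __init__(self, num_ports: int) -> None:
        self.ports = list(range(num_ports))
        self.mac_table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Handle a frame arriving on in_port; return the ports it is sent out on."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown (or broadcast) destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination sits on the same port; nothing to forward.
            return []
        return [out_port]


if __name__ == "__main__":
    sw = LearningSwitch(num_ports=4)
    print(sw.receive(0, "00:1a:2b:00:00:01", "00:1a:2b:00:00:02"))  # floods
    print(sw.receive(1, "00:1a:2b:00:00:02", "00:1a:2b:00:00:01"))  # learned reply
```

After the first exchange both addresses are in the table, so later frames between the two computers go out a single port instead of being flooded to the whole LAN.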
2. Database and hardware capable of high volumes of reads and writes
Proposal: MS SQL Server 2008 on RAID 10
3. Safe: the lab is full of electrically powered equipment, e.g., an autoclave for sterilizing.
Proposal: fiber optics. With many electrical cables running through the walls, copper network cabling could suffer electrical interference; optical fiber carries light rather than electrical signals and is immune to this kind of interference.
Security of Organization Resources
Requirement: offsite backups of intellectual property, marketing data, human resources information, etc.
Proposal: backups to tape
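Requirement: encryption to protect the identities of clinical sample donors and the company’s unpublished work
Proposal: Transparent Data Encryption (TDE), available in SQL Server 2008 Enterprise Edition, which encrypts the database and log files at rest, along with other types of encryption.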
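TDE protects the files at rest, but anyone with query access still sees plaintext. One complementary technique, swapped in here as an illustration rather than something this proposal specifies, is to replace donor identifiers with keyed pseudonyms (HMAC-SHA256) before they reach research tables. The donor ID format and the key below are hypothetical; in practice the key would live in a secured key store, not in source code.

```python
import hashlib
import hmac


def pseudonymize(donor_id: str, secret_key: bytes) -> str:
    """Map a donor identifier to a keyed HMAC-SHA256 pseudonym (64 hex characters).

    The same donor always gets the same pseudonym under one key, so records
    can still be joined, but without the key the pseudonym cannot be linked
    back to the donor.
    """
    return hmac.new(secret_key, donor_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"replace-with-a-key-from-a-secure-key-store"  # hypothetical placeholder
    print(pseudonymize("DONOR-000123", key))
```

Note that this is keyed hashing, not reversible encryption: if researchers ever need the real identity, a separate, tightly controlled mapping table (or reversible encryption) is required.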
Return on Investment (ROI)
Requirement: the installation must support data needs 5 years out, so that no reinvestment in infrastructure is needed during the estimated product development life cycle and before a hoped-for IPO.
Proposal: multi-mode fiber, which uses less expensive LED light sources for short-distance optical links such as enterprise backbones. “The equipment used for communications over multi-mode optical fiber is much less expensive than that for single-mode optical fiber… Because of its high capacity and reliability, multi-mode optical fiber generally is used for backbone applications in buildings.” (Source: Wikipedia: Multi-mode optical fiber)
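A rough sizing sketch for the five-year requirement. Because the weekly volume is still the unknown xx GB, every input is a hypothetical placeholder: the year-one weekly volume, an assumed annual growth rate, 52 weeks per year, and two on-disk copies to model RAID 10 mirroring. It leaves out indexes, transaction logs, and backups, all of which add to the total.

```python
def projected_storage_gb(weekly_gb: float, years: int = 5,
                         annual_growth: float = 0.0,
                         copies: int = 2) -> float:
    """Raw disk capacity (GB) needed to hold `years` of collected data.

    weekly_gb     -- weekly write volume in year one (hypothetical input)
    annual_growth -- assumed fractional increase in weekly volume each year
    copies        -- copies kept on disk; 2 models RAID 10 mirroring
    """
    total = 0.0
    weekly = weekly_gb
    for _ in range(years):
        total += weekly * 52          # approximate a year as 52 weeks
        weekly *= 1 + annual_growth   # next year's weekly volume
    return total * copies


if __name__ == "__main__":
    # Hypothetical: 50 GB/week in year one, growing 20% per year, mirrored.
    print(f"{projected_storage_gb(50, years=5, annual_growth=0.20):,.0f} GB raw")
```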
Server architecture: a network is only as fast as its slowest component, so it is important to invest in the right server as well. The best servers will be built with a redundant array of hard disk drives in a RAID 10 configuration.
“Why? Excellent performance with Read and Write. RAID 10 has advantage of both RAID 0 and RAID 1. RAID 10 uses all the drives in the array to gain higher I/O rates so more drives in the array higher performance. RAID 5 has penalty for write performance because of the parity in check.”
In RAID 10, mirrored disks ensure that every block of data is written twice. If one disk fails, its mirror still holds a complete copy of the data.
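To put numbers on the quoted point about RAID 5's write penalty, the sketch below uses the common rule of thumb that each front-end write costs 2 back-end disk I/Os on RAID 10 (one per mirror) and 4 on RAID 5 (read data, read parity, write data, write parity), then solves for the front-end load an array can sustain. The disk count, per-disk IOPS, and write fraction are hypothetical inputs, not measurements.

```python
# Rule-of-thumb write penalties: back-end disk I/Os per front-end write.
WRITE_PENALTY = {
    "RAID 0": 1,   # striping only, no redundancy
    "RAID 10": 2,  # each write lands on both disks of a mirrored pair
    "RAID 5": 4,   # read data, read parity, write data, write parity
}


def effective_iops(level: str, disks: int, iops_per_disk: float,
                   write_fraction: float) -> float:
    """Approximate front-end IOPS an array sustains for a given read/write mix.

    Solves X * (reads + writes * penalty) = raw back-end IOPS for X.
    """
    raw = disks * iops_per_disk
    read_fraction = 1.0 - write_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])


if __name__ == "__main__":
    # Hypothetical: 8 disks at 150 IOPS each, half of all I/O is writes.
    for level in ("RAID 10", "RAID 5"):
        print(level, effective_iops(level, disks=8, iops_per_disk=150, write_fraction=0.5))
```

With these hypothetical inputs the model gives 800 IOPS for RAID 10 against 480 for RAID 5. The trade-off is capacity: RAID 10 leaves half of the raw space usable, while RAID 5 gives up only one disk's worth to parity.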
Application architecture: SQL Server 2008 will be used. To protect the data, different types of files will be written to different drives. First, the operating system that the SQL Server databases run on will be stored on its own drive, so that in the event of an OS failure, one only needs to reload the OS. Transaction log files will be written to separate drives from the data files, and backup files will be written to separate drives as well. These measures reduce the risk of data loss and reduce contention for reads and writes on any one drive. The separation also allows concurrent processes to run on separate disks, improving performance. All of this will be invisible to end users and managed by the DBA. Even more granular work will be done to optimize the relationships between tables in the databases and the indexes that speed up sorting and retrieval of data.
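The file-placement rule in the paragraph above can be written down as a simple layout check. The drive letters are hypothetical placeholders, and .mdf/.ndf/.ldf are SQL Server's default extensions for data and transaction log files; the check only confirms that no drive hosts more than one role.

```python
from typing import Dict, List

# Hypothetical layout: drive letters are placeholders, one role per drive.
LAYOUT: Dict[str, str] = {
    "operating system": "C:",
    "data files (.mdf/.ndf)": "D:",
    "transaction log (.ldf)": "E:",
    "backups": "F:",
}


def shared_drives(layout: Dict[str, str]) -> Dict[str, List[str]]:
    """Return drives that host more than one role (empty if the design is respected)."""
    by_drive: Dict[str, List[str]] = {}
    for role, drive in layout.items():
        by_drive.setdefault(drive, []).append(role)
    return {drive: roles for drive, roles in by_drive.items() if len(roles) > 1}


if __name__ == "__main__":
    conflicts = shared_drives(LAYOUT)
    print("layout OK" if not conflicts else f"shared drives: {conflicts}")
```

For the separation to help, each letter should map to separate physical disks or arrays rather than partitions of a single disk.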
Each of these measures is recommended for a high-throughput, high-bandwidth environment.