Best enterprise solution for RHEL build environment required

Folks, I'm looking for the best enterprise-level solution to build our company's software on.

We currently use Red Hat Enterprise Linux (RHEL) as the operating system in our lab. We used to run servers with HP-UX 11i v1, v2 and v3, but because of the ability to run virtual servers on RHEL we've completely switched to RHEL 5 and 6. Still, HP remains our preferred partner to purchase hardware from.

About two years ago, we introduced BL460c G7 blade servers and chassis to replace our rack servers. The much smaller form factor, six-power-supply redundancy, multiple LAN connections, and the space and power savings were much appreciated. We later augmented our server capacity with BL460c G8 servers and chassis and deployed new networks within the lab to accommodate the additional physical and virtual server IPs.

But because of the limited HDD capacity of each blade server (max. 2 x 600GB 2.5-inch SAS drives, i.e. 600GB in RAID 1 or 1.2TB in RAID 0, which we rarely use), we introduced an HP P2000 G3 SAN array with 60TB total capacity (24TB SATA + 36TB SAS) and 8Gbps Fibre Channel access in our lab.

I mention all this just to give folks an idea of our department's enterprise-level configuration and to get some suggestions on purchasing the latest, fastest and best RHEL hardware at a reasonable cost for a new product build activity at our company.

I hear about flash-based enterprise storage options for SANs, and about G9 servers, these days, but I get the feeling that our third-party HP vendors are not offering us the latest options for the astronomical prices they quote. My key focus is speed and eliminating every bottleneck to a rapid product build, which in my opinion are hard disk and network latency.

Also, would a rack server be better suited to builds than a blade server? I miss the larger amount of storage directly attached to a rack server, and I'm also not a fan of running virtual servers from SAN volumes attached to a blade server, as that gives me two potential points of failure, i.e. the SAN and the blade server itself.

I am not a networking/lab expert with any formal training, so please excuse me if I'm missing something :). I've tried to explain our scenario as best I could. Please put forth your suggestions, as I would like to learn of similar successful deployments from folks with any kind of enterprise-level lab exposure.
 
What is your requirement w.r.t. applications? Is all of this just to run some web server/database, etc.?
What kind of throughput are you looking at when you say HDD and network?
How much high availability do you want, i.e. how many nines?

BTW, conflict of interest disclaimer: I will never suggest you go for HP because I think they are shit. ;)
 
W.r.t. applications, we would probably not be running any web server or database. Our requirement is to run a daily build process that pulls packages from 2-4 work sites in different geographies via rsync, runs some processes to bundle those packages into an integrated product build, and delivers the product build back to those locations, again via rsync.
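
In rough outline, the flow looks something like the sketch below (the site names, user and paths are placeholders, not our real ones):

```
#!/bin/bash
# Rough outline of the daily flow; hosts, user and paths are placeholders.
SITES="site-a.example.com site-b.example.com site-c.example.com"
STAGING=/build/staging
OUTPUT=/build/output/product-$(date +%Y%m%d).tar.gz

# Pull the package drops from every site in parallel.
for site in $SITES; do
    rsync -az --partial --delete "build@${site}:/export/packages/" "${STAGING}/${site}/" &
done
wait

# ... assembly/bundling steps run here to produce $OUTPUT ...

# Push the finished build back to every site.
for site in $SITES; do
    rsync -az --partial "$OUTPUT" "build@${site}:/export/builds/" &
done
wait
```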

Our product size is around 30GB. A current build of our product on existing hardware takes around 4-6 hours. This may sound unreasonable, but I want to reduce it to around half an hour.

Agreed about HP. Their third-party vendor has disappointed us in both pricing and service. Quotes from Dell for similar hardware are more competitive, but our company has an agreement to give first preference to HP for procurement of enterprise hardware.
 
1. If I were you I would install OpenStack on the hardware and create a private cloud. A virtual server farm removes the single point of failure for the servers.
2. A build process (as in compile & link) is more CPU intensive than I/O intensive. I don't think you will need flash storage for that; what you will need is processing power.
3. We used to build our code with IncrediBuild. It basically schedules the compilation across multiple machines and speeds up the process by a big factor, but it only works on Windows and Visual Studio. The gcc equivalent for Linux is distcc (https://code.google.com/p/distcc/); never used it, but it should work (see the sketch after this list).
4. I don't think you really need high availability and an enterprise SAN for this. A SAN is supposed to be highly available; if you are expecting the SAN to fail, then the design is at fault. What is your local fabric setup? Ideally you use two switches as a minimum when configuring a SAN environment. In your case I would simply use a NAS setup with bonded interfaces to avoid storage failures. But then again, you don't really need an 8G SAN for 30-100GB of throughput per day; e.g., our environment with an 8G SAN pushes something like 8-10TB per day and still has plenty of bandwidth left.
5. If your SAN array supports hybrid RAID, try to increase the capacity of the volume. Spreading it over more spindles improves I/O performance. I am not really sure about this on HP arrays, but there should be tools on the array to profile the I/O patterns and throughput. But then again, a build process is more CPU intensive than I/O intensive.
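
Never used distcc myself, so take this as a sketch rather than a recipe; the package source, subnet and host names are assumptions on my part:

```
# On each helper machine: install distcc (EPEL, or build from source on RHEL 5/6)
# and run the daemon, allowing connections from the build master's subnet.
yum install -y distcc
distccd --daemon --allow 10.0.0.0/24

# On the build master: list the helpers and fan the compile jobs out.
export DISTCC_HOSTS="localhost helper1 helper2 helper3"
make -j16 CC="distcc gcc"
```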
 
Thank you for your reply.

W.r.t. points 2 and 3, our product build process does not involve any compilation but rather assembly of application packages delivered to us from various geographic locations to create an integrated product. I'm not 100% sure of this, though, as we are yet to receive the transition of the product build activity. Our specific build process seems to be I/O intensive, as I do not see a lot of CPU usage during the build: 1-2 cores run at full load while the remaining 30 cores sit relatively idle.
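
For what it's worth, this is roughly how I've been checking whether we are CPU-bound or I/O-bound during a build (all from the sysstat package on RHEL):

```
mpstat -P ALL 5     # per-core utilisation; high %iowait with low %usr points at the disks
iostat -xm 5        # per-device stats; %util near 100 means that device is saturated
pidstat -d 5        # per-process read/write rates while the build is running
```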

W.r.t. point 4, we require a SAN volume to be attached to our product build server mainly to store daily builds, each of which is around 30GB. The 500GB or so filesystem on our 600GB server disk (two 600GB disks in RAID 1) would run out of space after a few days of producing a 30GB build each day. So we attach, say, a 4TB SAN slice (or SAN volume, as it is called) from our 60TB SAN to the blade server as a physical device (/dev/sdX), and then deploy an ext3 or ext4 filesystem on this device.
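
The filesystem deployment itself is nothing special; roughly the following, with /dev/sdX standing in for whatever device the SAN volume shows up as:

```
mkfs.ext4 -L builds /dev/sdX       # ext4 is standard on RHEL 6; on RHEL 5 we stick to ext3
mkdir -p /builds
mount LABEL=builds /builds
# mount by label (or UUID) rather than /dev/sdX, since the letter can change between boots
echo "LABEL=builds  /builds  ext4  defaults  0 2" >> /etc/fstab
```
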
W.r.t. high availability, we are currently using a single SAN switch through which all our physical servers access the SAN. We had a SAN switch failure about a year ago that brought my group's activities to a complete stop until the faulty switch was replaced. We've since procured a second SAN switch from HP for multipathing our server access to the SAN, but the HP folks for some reason seem unable to figure out how to connect our SAN to the second switch :).

I am not familiar with OpenStack or hybrid RAID. I will read up on them and see whether we can use them for our specific activity.
 
already procured a 2nd SAN switch from HP
unable to figure out how to connect our SAN to the 2nd SAN switch
It goes like this: you have two switches and two dual-port HBAs in each box. The even ports (port 0) on both HBAs go to the even fabric (switch 0), and the odd ports (port 1) on both HBAs go to the odd fabric (switch 1). This way the connection is resilient to a port failure, an HBA failure and a switch failure. The same applies to the array's ports. You end up with around 8 paths to the SAN LUN; no matter which one fails, the storage never goes down.
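
Once both fabrics are cabled up, RHEL's device-mapper-multipath merges all those paths into one device. Roughly like this (this is the RHEL 6 way; on RHEL 5 you edit /etc/multipath.conf and start multipathd by hand):

```
yum install -y device-mapper-multipath
mpathconf --enable --with_multipathd y   # writes a default /etc/multipath.conf and starts multipathd
multipath -ll                            # should list the LUN with multiple active paths
# the LUN then shows up as /dev/mapper/mpathN -- use that instead of /dev/sdX for filesystems
```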

On the other hand, if you set up a NAS server, you can have two Ethernet ports (either 1G or 10G) and bond them, with each port connected to a different network switch. That way you again get better HA.
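
Something like this on RHEL 6 (interface names and the IP are placeholders; active-backup needs nothing special on the switches, whereas 802.3ad/LACP does):

```
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.0.0.50
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# then: service network restart
```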

we require a SAN volume to be attached to our product build server mainly to store daily builds, each of which is around 30G.
This means you need archival-type storage and not high-performance storage. You can simply do the build on locally attached storage and transfer it to the archive (NAS) later. Using 80k IOPS storage for archival is not advisable here, as it would be too costly. From what I understand at this point, the HP guys are simply trying to leech you by suggesting such a thing.

So here is what I suggest: if your build process is disk I/O intensive from start to finish, then you should go for NVM. NVM, or flash storage, is a PCIe card that would give you around 145k IOPS; a 1.4TB card costs something like 5k. You can buy two and do a RAID 1. Then, once the build is done, archive it on the current SAN (or a new NAS, if you are thinking of building one) with more storage.
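
Mirroring the two cards is just software RAID; the device names and mount point below are placeholders for whatever the cards enumerate as on your system:

```
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/fioa /dev/fiob
mkfs.ext4 /dev/md0
mkdir -p /build/scratch
mount /dev/md0 /build/scratch
mdadm --detail --scan >> /etc/mdadm.conf   # persist the array definition across reboots
```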

That being said, I don't know much about blade servers and whether they support/sell NVM storage at all. Something like this: http://h18006.www1.hp.com/products/storageworks/io_accelerator/

Hybrid RAID
Page 5 of http://h20566.www2.hp.com/hpsc/doc/public/display?docId=emr_na-c00687518 discusses it. Basically, you stripe a volume across a lot of drives (usually all the HDDs in the array or storage group) to get faster storage (RAID 50, RAID 60, etc.).
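
If the array-side tools turn out to be lacking, fio from the host gives a quick read on a layout before and after you widen the stripe. Run it only against a scratch/test LUN, since raw writes are destructive; the device path below is a placeholder, and fio comes from EPEL on RHEL 5/6:

```
fio --name=seqwrite --filename=/dev/mapper/mpath_test --direct=1 --ioengine=libaio \
    --rw=write --bs=1M --iodepth=16 --runtime=60 --time_based
fio --name=randrw  --filename=/dev/mapper/mpath_test --direct=1 --ioengine=libaio \
    --rw=randrw --bs=8k --iodepth=32 --runtime=60 --time_based
```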

OpenStack
Basically VMware ESX, but open source and kickass. Developed by NASA and Rackspace, and very actively developed.
 
My apologies for my late response (I was getting ready to travel for some company business). Thank you for your analysis and reply; it is very useful. The I/O accelerator card seems to be a cost-effective solution that was already suggested earlier by folks at our remote site but somehow got ignored. I am now revisiting the idea.

RAID 50 and RAID 60 are new to us and we've never considered such a configuration before. I will examine how costly they are versus RAID 6 and how much performance gain we could expect over a RAID 6 configuration.
 