Now there has been a lot of discussion about whether cost per terabyte is a good metric when purchasing storage, nearly all of it dissenting (some of the dissent may even have come from my direction) but don't let me ever be accused of running with the herd or even consistency. I'm now going to argue that cost per terabyte is a fine metric for purchase of storage.
Now I wouldn't argue that cost per raw terabyte is a good metric for storage but per useable terabyte it certainly is. Indeed for me, it is sometimes the only metric!
Have I gone mad? Well, I don't think so. I simply have a different use case to most of you: I need capacity and throughput. I have a known and very consistent workload, it is pretty much purely deterministic, and I don't need any of your fancy snaps, dedupe, replication etc. But I do need throughput measured in gigabytes per second and I need capacity.
Oh hang on but I am worried about physical space and environmentals and I am worried about scalability. I don't want to end up managing fourteen arrays when another vendor can do it in two and building a new data centre is not an option today; well, not until we've finished the new one. So even in this the most simple of cases where the requirements might be quite straightforward, demanding but straightforward; straight cost per terabyte is still not the complete picture.
And of course, note I said useable terabyte; useable terabyte is not just about RAID and hot-spare overhead, it is also about how much disk I need to get the throughput I need. Why is life never simple?
But cost per terabyte can be an okay metric; you just need to know what that cost consists of.
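To make that concrete, here is a rough sketch of how the number of drives, and so the cost per useable terabyte, can be set by throughput rather than capacity. Everything in it is an assumption for illustration: the function names, the per-drive figures, the prices, and the simplification that drive throughput adds up linearly across the array. It is not any vendor's sizing method.

```python
import math


def drives_needed(capacity_tb, throughput_gb_s, drive_tb, drive_mb_s,
                  raid_efficiency, hot_spare_fraction):
    """Total drives to buy for a capacity AND a throughput target.

    All parameters are hypothetical inputs for illustration:
      raid_efficiency    fraction of raw capacity left after RAID overhead
      hot_spare_fraction fraction of the total drives held back as hot spares
    """
    # Drives needed to hold the data once RAID overhead is taken off.
    for_capacity = math.ceil(capacity_tb / (drive_tb * raid_efficiency))
    # Drives needed to sustain the throughput (1 GB/s taken as 1000 MB/s),
    # assuming per-drive throughput simply adds up.
    for_throughput = math.ceil(throughput_gb_s * 1000 / drive_mb_s)
    # Whichever requirement is more demanding decides the active drive count.
    active = max(for_capacity, for_throughput)
    # Buy enough drives that the active ones remain after spares are set aside.
    return math.ceil(active / (1 - hot_spare_fraction))


def cost_per_useable_tb(capacity_tb, total_drives, price_per_drive,
                        fixed_cost=0.0):
    """Cost divided by the capacity actually required, not the raw capacity."""
    return (total_drives * price_per_drive + fixed_cost) / capacity_tb


# Same 100 TB requirement, two different throughput targets.
for gb_s in (4, 10):
    n = drives_needed(100, gb_s, drive_tb=2.0, drive_mb_s=100,
                      raid_efficiency=0.8, hot_spare_fraction=0.05)
    print(gb_s, "GB/s:", n, "drives,", cost_per_useable_tb(100, n, 500.0), "per useable TB")
```

With these made-up figures the lower throughput target is capacity-bound and the higher one is throughput-bound, so the same 100 TB costs more per useable terabyte once the throughput requirement dictates the spindle count.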
P.S. If you think your data-growth is bad, spare a thought for me and my team; the move to 3D video is going to drive data-growth for us in the broadcast media sector to scary amounts. And not only does it drive data-growth, it is also going to drive raw throughput demands through the proverbial roof.
Martin-
You're talking out both sides of your mouth (Don't worry, it's a common customer affliction :-) ) You say $/TB usable is a good metric for you, then in the next breath you say you don't care about higher level functionality. You're failing to connect the most important dots that many large customers do- it's about cost per TB used, not usable- you have to give the vendors as much information about your data set as you can, to see if any of their 'whizbangs' can reduce your spinning rust requirements (Dedupe, Caching, SSD, etc.). The used figure is what is left after not just RAID, Block Checksum, etc, but after those higher level functionalities too.
A wise IT Director once said "all Terabytes are not created equal." He learned this the hard way. His normal procedure was to do all comparisons by price per TB RAW. He thought he was smartly getting the best bang for his buck- until we got him to expose the project requirements for the vendors. Then more interesting and advanced (and less costly) solutions came forward. He finally saw what the array vendors could provide from their R&D labs besides more/bigger disks.
Posted by: JustaStorageGuy | January 19, 2010 at 11:40 PM
Full Disclosure - I’m an Enterprise Architect with Xiotech :)
Nice blog post and you are speaking our language!! Go check out our Intelligent Storage Element (ISE) (http://xiotech.com/ise-technology.php ). Talk about a solution that is predictable!!
Also, check out our SPC-2 results here : http://www.storageperformance.org/results/b00031_Xiotech_Emprise5000-146GB_executive-summary.pdf We ran the test at almost full !!! Now that really is performance predictability !!
You can practically figure out how many Emprise 5000 (ISE's) solutions you would need to “Ring the bell” for your specific workload no matter if it’s 20% full or 95%!!.
I’m working on a VDI Blog post that talks about this predictability being really important when sizing for “boot storms”. The last thing you want to do is over buy, or under buy !! Predictability is a key !!
Again – great post.
@XIO_Tommyt
ttrogden.wordpress.com
PS - JustAStorageGuy - go read my blog on Cost per TB - ( http://bit.ly/4WvpiK ) I sort of agree with you. Not so much about the bloated stuff on the controllers like DeDupe etc, but in Cost per RAW TB's being a horrible metric when comparing solutions.
Posted by: Tommmy Trogden | January 20, 2010 at 06:18 AM
JASG; you should give me a bit more credit about knowing the use-case; in many cases you are right but in this particular use case you are wrong. This is a specific use-case where all the bells and whistles are not going to buy me very much but that's okay because many vendors don't get this either. And for this particular use we provided very specific known requirements which were based on throughput, capacity and known I/O access patterns.
The majority of storage vendors focus on a generic use-case and that's fine for 90% of the market but there are simply bits of the market which don't care for various reasons.
Your metrics vary from use case to use case; sometimes the features are important and sometimes not. I have various use-cases to deal with in life; sometimes it'll be $$ per terabyte and sometimes it'll be different.
No not all terabytes are created equal...but then again not all workloads are created equal either.
Posted by: Martin G | January 20, 2010 at 08:30 AM
Martin-
Actually I think we're saying the same thing- It's all about the specific workload you are looking to support. My point was just that some companies look for solutions to very specific workloads by using very general means (I need this many iops, this mix of r/w, this many tbs) which can cause them to miss out on higher level functionality that may skew those requirements (for the better hopefully).
Tommy-
I find what Xiotech is doing with ISE is a very interesting way to handle some of the issues inherent in larger (1+TB) drives. That said, you are missing your own point. It's not about formatted capacity either. Those companies that you are comparing Xiotech to have much more functionality than your array that can (in some cases) expand their useable capacity way above yours- you may be writing to 95% of formatted capacity and they might only be writing 80%, but in that 80% they may be storing twice as much data as you (that bloated "dedupe" stuff).
To my previous post, the real utilization metric should be TB's stored. In Martin's case for this app, it may not include any of that functionality; for other workloads it might.
Posted by: JustaStorageGuy | January 20, 2010 at 01:10 PM
Martin, great post as always. Here's an interesting metric that someone came up with over at HP a long time ago--PB/admin. Carter wrote about this on our blog many moons ago. But basically the thinking is that when you're dealing with this level of scale you have to factor in the people costs. The complexity of managing unstructured data that is growing at an accelerated rate--exactly what you're talking about here--means that you had better come up with ways to simplify your stack. Just another perspective!
Here's the original post: http://onlinestorageoptimization.com/index.php/the-new-storage-cost-metric-petabytesadmin/
Posted by: Sunshine Mugrabi | January 20, 2010 at 07:12 PM
Oh I know of places which have petabytes per admin. Very proud the management is but if you look at their environment it's fairly close to the event horizon.
Posted by: Martin G | January 20, 2010 at 07:17 PM
JustaStorageGuy, The ISE is not designed to add all the bells and whistles of a typical Storage Enclosure that ends up having a scalable ceiling. The ISE is the "Best Storage Enclosure Building Block" available on the market. This is accomplished by designing and building an enclosure that has intelligence around the "dumb" drive. Think about what all storage solutions would like to be able to do when it comes to intelligently communicating with the telemetry of the physical device. By building a solution around those parameters, the building block can take care of the most important features of a storage solution. And that is to Store the business data, Protect the Business data with integrity and then be able to retrieve the data as rapidly and reliably as possible while not compromising that just because the business adds more data.
So the choice is to use an enclosure that relies on the thin and deep architecture where the storage resources come only from the controllers on the top ultimately making the system top heavy. Or to use an enclosure that enables the storage architecture to grow wide and deep while adding physical resources to every enclosure deployment. Predictably grow the architecture in the direction that the business requires.
Posted by: Travis Teddy | January 20, 2010 at 09:56 PM
Interesting article... I guess it's sort of saying the same thing that I put in my article on Jan 8th http://www.stuiesav.com/2010/01/measuring-cost-of-storage.html interesting your point on throughput also - maybe an additional charge metric is IOPS....????
as ever - top notch article!
Posted by: Stuart Savill | January 29, 2010 at 12:45 AM