26 posts from November 2008

November 28, 2008

Unstructured Myth??

I see various figures thrown about as to how much unstructured data is produced, stored etc. by people; often it is this data which people believe can be deduped, ILMed, migrated, compressed etc. This is often expressed as a percentage of the total data stored by a company: is it 10%, 15%, 20%, 30%+? But how much is there really?

I suspect very few people really know; for example, nearly all of my unstructured data sits on my laptop hard-drive; a great deal of it is my offline mail folders, as I'm a fairly heavy mail-user and probably have 2-3 gigs of PST file generated over three years. Documents, another couple of gigs; so I suspect all-in-all I have less than about 10 gigs of unstructured data and I suspect that I'm way above average in our company. Even if I was average, I'd only really be looking at 150 terabytes of data across the company and to be honest, that is less than 5% of our total storage estate.
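To sanity-check that arithmetic, here is a minimal sketch. It assumes decimal units (1 TB = 1,000 GB); the headcount and estate size it derives are implied by the figures above rather than stated anywhere.

```python
# Back-of-envelope check of the figures above.
# Assumption: decimal units (1 TB = 1,000 GB). The headcount and the
# estate size are derived from the quoted numbers, not stated figures.
GB_PER_USER = 10              # "less than about 10 gigs" each
TOTAL_UNSTRUCTURED_TB = 150   # company-wide total quoted above
SHARE_OF_ESTATE_PCT = 5       # "less than 5%" of the storage estate

implied_users = TOTAL_UNSTRUCTURED_TB * 1000 / GB_PER_USER
implied_min_estate_tb = TOTAL_UNSTRUCTURED_TB * 100 / SHARE_OF_ESTATE_PCT

print(f"Implied number of users: {implied_users:,.0f}")
print(f"Estate must exceed: {implied_min_estate_tb:,.0f} TB")
```

In other words, roughly 15,000 users at 10 gigs each, sitting in an estate of more than 3 petabytes.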

Logs for various systems should be compressed by default, and we know logs generally compress really well. But we've had cron-jobs etc. for years which rotate logs, compress them etc. Granted, we need to keep them for longer and longer periods of time, but even so we should be able to get that sort of stuff onto the right tier of disk straight away.

So what other unstructured data might people be storing; okay, I work for a media company and we have wodges of the stuff but that's slightly unusual.

I wonder how much unstructured stuff there really is? Any thoughts? And of the unstructured stuff which users are storing, how much really belongs on the corporate storage in the first place?

This is purely the anecdotal musings of a storage manager but what are the real figures?

November 27, 2008

Virtual Value

A recent blog by Tony Asaro had some of my fellow bods in the storage world rolling their eyes, letting out tired sighs and expressing a general disbelief that something so hackneyed is still being blogged about.

Tony blogs about Intelligent Storage Tiering: the fact that huge amounts of data are basically not accessed 90 days after they have been created. Nothing especially new here, and then obviously what you need to do is implement virtualisation to allow you to move this data around. This data movement is all seamless and invisible to the users and the applications etc....

This is such an oversimplification that it is really beginning to get on my proverbials! Firstly, even without virtualisation I have already implemented storage tiering; I do it in the array. I have my primary disks, which in my case are generally 300 gig FC, and I have a lower tier which is 500 gig 'low cost FC'. I am considering using 1 TB SATA drives as well in the array. Simply implementing the two tiers has saved me a huge amount of money; I can use the lower tier as a clone target/replication target and for data which doesn't need the screaming performance that the users have become addicted to. I keep this all within an array boundary and to be honest, I don't really want a single application spread across multiple arrays if I can avoid it.

Currently my big space-hogs tend to be databases; I could tier these within the array (and arguably we do, we have another tier of disk which we don't talk about...we have some very small RAID-1 volumes where we can put redo-logs).

But to tier any more means a lot of work, not by the storage team but by the DBAs and application developers. In a previous job, we used Outerbay to achieve this but the tiering was a side-effect of the work we needed to do to get data out of a PeopleSoft environment to enable it to be upgraded in a reasonable time.

And how long do you carry out your data-classification for? When do you move data? After 90 days? Six months? A year? If you move the data, how quickly can you move it back? How do you ensure that you've got enough fast disk in case you need to move it back? What happens to that data which is written once, not accessed for 90 days, gets moved and then, let's say, an annual billing reconciliation job runs and it needs access to all that data? Sure, it's still accessible but unfortunately you've just dramatically increased the length of the billing run.
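The billing example can be sketched in a few lines. This is purely illustrative: the `Dataset` class, the `choose_tier` function and the dates are hypothetical, and the rule is the naive "days since last access" policy rather than any particular product's behaviour.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    last_accessed: date


def choose_tier(ds: Dataset, today: date, demote_after_days: int = 90) -> str:
    """Naive policy: pick a tier purely on days since last access."""
    idle_days = (today - ds.last_accessed).days
    return "slow" if idle_days > demote_after_days else "fast"


# Hypothetical data set read once a year by a billing reconciliation job;
# it was last touched 300 days ago and the job is due to run again soon.
today = date(2008, 11, 27)
billing = Dataset("billing-history", today - timedelta(days=300))

print(choose_tier(billing, today))                         # 90-day rule
print(choose_tier(billing, today, demote_after_days=365))  # one-year rule
```

With the 90-day rule the data is sitting on slow disk when the annual job arrives; with a one-year rule it stays on fast disk the whole time. An age threshold alone knows nothing about the access pattern, which is why the intelligence has to come from somewhere else.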

The magic bullet of Storage Virtualisation is really not magic at all if you want to reduce costs; it is a bullet which needs a lot of aiming and calibration; it might hit the bulls-eye, it might just wing the target. You need to understand what you are doing; you could cripple your business!

I'll tell you where the greatest value of virtualisation is potentially for me, that is heterogeneous data-mobility. But it's not huge cross-array storage pools, not today anyway. And it's not currently intelligent storage tiering.

Concentrate on building me Storage Management tools which enable me to easily apply my intelligence to storage tiering.

Happy Thanksgiving...

To my American readers....Have a happy Thanksgiving!

Enjoy your turkey, think of us stuck in work!

And tomorrow, go and do your best to stimulate the world's economy in the 'Black Friday' sales!!

November 26, 2008

A New Addiction

I have identified a new medical condition, 'Speedy Storage Dependency'. This addiction is suffered by many developers, DBAs, PMs and many others in the IT industry. It can take many forms depending on the addict but the most common form is based on the instant gratification received by putting your poorly written and designed application onto fast disk.

Unfortunately, the addiction is often accompanied by another addiction, 'Supersized Storage Dependency'. This has the impact of making the addictions exponentially more expensive as unfortunately no company has yet developed a 'SuperSized Speedy Storage Device'.

Often the excuse for this addiction is that it is too hard to change now and the alternatives are just too expensive to contemplate!

As the enablers to this addiction, we in the infrastructure teams need to stage an intervention and develop alternative coping mechanisms. Otherwise the dream of Clustered Storage, Cloud Optimised Storage and fluffy Fairyland Storage will just remain that, a dream!! We need to point out the behaviour which leads to these addictions but in order to do this, we need assistance from the manufacturers of the addictive substances.

Unfortunately, the manufacturers of the addictive substances appear to be more interested in creating the crystal meth equivalent to yet further fuel the 'Speedy Storage Dependency'. Just say No to SSD until a suitable dependency plan is developed!

We need to bring the consequences of this addiction to the attention of the addicts. I am going to campaign for a health warning to be put on all disk arrays in future: 'This Disk Array may lead to overweight, unfit and unsafe applications which may damage the health of your company!'.

November 25, 2008

Storage Vending Machine?

Self-service storage provision every now and then comes on the menu at work and I'm wondering if anyone has ever seen this work on a large scale? Has anyone tried it in a complex multi-tiered storage environment?

When I say self-service, I mean the user is presented with a series of questions and at the end of the process, the storage is automagically provisioned and made available. This sounds all very well and dandy but what really concerns me is how this would be done practically and I'm wondering if anyone has used tools or even built their own? I think things like wide-striping make it more feasible but I think the questions to guide the user may elicit blank responses.

For example, how do you ask a user to define performance? Ask most users to define their workload in IOPS and they will have no clue. Ask them whether the workload is going to be random I/O? Another blank look I suspect. Ask them whether the workload is high, medium or low; you'll get an answer, but I wouldn't bet your life on it.

So let's think of the questions which really need to be asked and the sort of detail we need to make appropriate decisions:

  •  Space Requirement - Today and 18 months hence?
  •  I/O density - (IOPS per gig) and does it increase with the amount of space?
  •  Availability Requirement?
  •  Sequential/Random mix?
  •  Read/Write mix?
  •  Peak load time?
  •  RPO?
  •  RTO?
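
To make the list concrete, here is a rough sketch of what a self-service form would have to capture before anything could be provisioned. The `StorageRequest` class and all of its field names are hypothetical, as are the example numbers; it is not modelled on any real provisioning tool.

```python
from dataclasses import dataclass


@dataclass
class StorageRequest:
    """Hypothetical record of the answers a self-service form would need."""
    capacity_gb_now: float
    capacity_gb_18_months: float
    peak_iops: float
    availability_pct: float
    random_io_pct: float      # remainder is sequential
    read_pct: float           # remainder is writes
    peak_load_window: str
    rpo_minutes: int
    rto_minutes: int

    def io_density(self, at_18_months: bool = False) -> float:
        """IOPS per gig, assuming peak IOPS stays flat as capacity grows."""
        capacity = self.capacity_gb_18_months if at_18_months else self.capacity_gb_now
        return self.peak_iops / capacity


# Made-up example request.
req = StorageRequest(
    capacity_gb_now=500, capacity_gb_18_months=2000, peak_iops=1000,
    availability_pct=99.999, random_io_pct=80, read_pct=70,
    peak_load_window="09:00-11:00", rpo_minutes=15, rto_minutes=60,
)
print(req.io_density())                   # IOPS per gig today
print(req.io_density(at_18_months=True))  # IOPS per gig in 18 months
```

Note the assumption baked into `io_density`: that the I/O load stays flat while the space grows. Whether that holds is exactly the second question on the list, and it is one most users will not be able to answer.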

I am quite happy with the concept of automated provisioning tools which can be used by storage administrators and I'm certainly happy with the concept that my teams can manage more storage per head. Automation is a good thing but are we really ready for self-service provision of storage? 

Perhaps if we had a one size fits all uber-tier? Aha!! XIV here we come!!! Or perhaps, self-aware-storage!

November 24, 2008

Spin Spindown to me?

Like a certain Alex McDonald, I don't get spin-down! Okay, I get the idea; when a disk isn't being accessed, after a certain period of time you can spin it down and hence you can save power, wave your little eco-worrier flag and feel good about yourself. But does it work?

Okay, first things first; I do not believe that spin-down fixes one of the fundamental challenges that a lot of data centre managers face today; they simply cannot get enough power into their data centres. Their data centres may be in the wrong place but that's not an easily solvable issue without building new ones (and there isn't a lot of appetite for that at this moment). So they need to engage their local power-companies to upgrade the power coming into the buildings; not a cheap option.

But hang on a minute, surely spun-down disks require less power? Actually no, not really; they need less power on average but you still need to be able to cope with the peak-load, which is everything spun-up. It might not be a likely scenario but if you can't supply enough power to spin them all up, you might find some interesting problems. I've been through the fun and games of having to shut things down extremely quickly due to over-loaded and over-heated PDUs; ones with smoke pouring out of them and electricians with fire extinguishers trying to buy us time to do clean shutdowns. So I still need to upgrade my power.
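The average-versus-peak point can be put in numbers. The wattages and disk count below are made up for illustration, not measured figures for any drive, and the sketch ignores any extra draw during spin-up itself.

```python
def power_draw_watts(disks: int, spinning_fraction: float,
                     active_w: float = 12.0, standby_w: float = 2.0) -> float:
    """Total draw for a pool where only some of the disks are spinning.

    active_w and standby_w are illustrative per-disk figures, not vendor data.
    """
    spinning = disks * spinning_fraction
    spun_down = disks - spinning
    return spinning * active_w + spun_down * standby_w


disks = 1000
average_draw = power_draw_watts(disks, 0.25)  # a quarter of the pool spinning
peak_draw = power_draw_watts(disks, 1.0)      # everything spun up at once

print(f"Typical draw: {average_draw:,.0f} W")
print(f"Draw to provision for: {peak_draw:,.0f} W")
```

Spin-down lowers the first number and so the electricity bill, but the feed into the building and the PDUs still have to be sized for the second.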

And then, wide-striping is going to completely stuff spin-down. So we'll end up with a non-wide-striped pool; I'd even suggest that you'd probably want to avoid parity RAID and stick with RAID-1.

Making spin-down work in a block-environment is going to be a challenge; volume managers are probably going to spin all the disks up in a volume. I guess a 1-to-1 relationship between disk and volume would work but I reckon most of the disks would still end up spinning. Or I could define spin-down pools I guess and use some kind of stub to point to the migrated files.

Spin-down in NAS environments may be more practical and may be easier to implement.

And yes, it works pretty well with desk-tops (mostly avoids crashing my computer).

But I'm really struggling at the moment to see how it's going to work; I suspect I need to do some more research but some pointers would be nice! And what's wrong with tape? It's the ultimate spun-down magnetic rust!

November 21, 2008

Dry your eyes - No More Tiers

In a valiant effort to redefine storage tiers, Kostadis has come up with a very practical and useable set of differentiators for storage tiers and for getting across the differences between the levels. We can call them what we want but they are tiers!

I did some work on this before the summer but really couldn't come up with clear definitions and an explanation as to what I meant; so I'm now going to plagiarise Kostadis' work.

Dedicated Performance Tier - for those applications which have performance demands which mean that they are not good sharers - Exchange is a good example, badly written Oracle databases; anything which needs spindles to provide raw I/Os. In the ideal world, you'd give them their own array. You will probably see them utilising 70%+ of the available I/Os of a spindle but often space utilisation is dreadful and you still might find yourself trying to hand-tune. You are really talking about 15K Fibre Channel disks or SSDs.

Shared Performance Tier - for those applications which have a fair balance between I/Os and capacity. These applications are generally fairly cache-friendly; you'll still find a fairly high % of the I/Os utilised and space utilisation is better. I'd suggest things like VMware images live quite happily on this tier. You are probably talking about 10K, larger Fibre Channel disks but you might get away with 500 gig SATA/low cost fibre. You probably don't want to go much larger as your I/O density drops too much and your space utilisation drops.

Capacity Tier - for those applications which are just space-hogs; file-serving etc. You should be able to drive up the space utilisation much higher and you can probably use large SATA disks; you should be aiming to drive up the space utilisation as high as possible. I/O utilisation may be high but you don't really care; cheap as chips is the order of the day.

In each of the tiers, you will have service offerings; replication, snap-shots, dedupe, encryption etc. Availability in my case is a given, 99.999% during the service day; however service days may differ, so I might be able to take planned downtime but unplanned downtime must correlate to five nines availability. And in each of the tiers, your presentation layer could be block or file.
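To put a number on five nines, here is a small sketch of the unplanned downtime it allows. The function name and the two example service days are hypothetical; only the 99.999% figure comes from the text above.

```python
def allowed_downtime_minutes(service_hours_per_day: float,
                             service_days_per_year: int,
                             availability_pct: float = 99.999) -> float:
    """Unplanned downtime per year permitted inside the service day."""
    service_minutes = service_hours_per_day * 60 * service_days_per_year
    return service_minutes * (100 - availability_pct) / 100


# Two illustrative service days: round-the-clock, and 12 hours on 260 working days.
print(f"24x7: {allowed_downtime_minutes(24, 365):.2f} minutes per year")
print(f"12h x 260 days: {allowed_downtime_minutes(12, 260):.2f} minutes per year")
```

That works out at a little over five minutes a year for a round-the-clock service and under two minutes for the shorter service day, which is why planned downtime has to fall outside the service day.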

You may be able to come up with a couple more tiers; I can think of a couple which are useful in my specific circumstances, one of which is tape.

Now why is this at all useful? Well, it potentially allows the infrastructure guys to really articulate the impact of poorly written applications or at least applications which don't play nice. It also allows us to explain why at least some disk utilisation rates are so poor and ensure that responsibility to drive down the TCO of storage is a shared responsibility. It is also important that the two utilisation metrics are articulated:

  • Available/Utilised I/Os
  • Available/Utilised Space
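
As a rough illustration of why both metrics matter, take a single spindle on the dedicated performance tier. The figures below are invented to match the shape described above (I/Os heavily used, space barely touched); they are not measurements.

```python
def utilisation_pct(used: float, available: float) -> float:
    """Utilised as a percentage of available; works for I/Os or space."""
    return 100.0 * used / available


# Hypothetical spindle on the dedicated performance tier.
io_util = utilisation_pct(used=135, available=180)    # IOPS
space_util = utilisation_pct(used=60, available=300)  # gigs

print(f"I/O utilisation:   {io_util:.0f}%")
print(f"Space utilisation: {space_util:.0f}%")
```

Quote only the space figure and the disk looks wasted; quote only the I/O figure and it looks full. Reporting the pair makes it clear that the application, not the storage team, is what is driving the cost.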

To be honest, this has probably helped me more than you, dear reader, in that it has helped me crystallise some ideas in my own mind! But for the rest of today, I shan't be worrying about storage as I've got a day off!

p.s I lied...there are still Tiers!

November 20, 2008

Everything Connects

In a previous post I decided to shine a light on the murky world of SAN administration and blithely stated that the complexity of fibre-channel is over-stated. And I'll stand by that; I can provision storage in the fibre-channel world pretty much as fast as NAS. I don't need to worry about getting IP addresses allocated, ensuring routing is correct and, the bane of our lives, ensuring that firewalls aren't getting in the way. So when all things are considered, fibre-channel administration is pretty easy until.....

You want to upgrade a component's firmware! Then life becomes nightmarish; last year it took us three months to plan and implement a firmware update on some of our DMXs. Why? Well, we had to ensure that all our servers were at the right firmware; operating-system at the right patch; Oracle at the right patch; ensure that no conflicts were found; then we had to do our directors, then finally we got to do our DMXs. And guess what? We had some problems; so we re-checked everything, only to find that the levels had changed and the HEAT reports flagged things red which were fine when they were run previously. It actually turned out to be something else....

And then I want to install a new DMX4 and guess what? The whole process starts again!

If I want to upgrade some NAS firmware, I don't worry about any of this and that is where NAS wins currently. So FC guys, can you make that easier? Can you not just blithely insist that the latest firmware levels are what we need to go to? I want to know if something won't work obviously but I'll tell you another thing; when my guys raise a support call, tell your support guys that simply telling them to go to the latest level doesn't wash! They'll ask where in the release notes for the latest patch it states that it will fix our problem...why? See above! It simply isn't realistic to track every firmware....and obviously, we'll never go to the first release of anything!

November 19, 2008

Wide Boy...

Marc 'Driving without Due Care and Attention' Farley has posted his top 10 Storage Innovations on his blog and unsurprisingly Thin Provisioning makes his list but he also gives a passing mention to wide-striping and suggests that might turn out to be as innovative. And he's right; in fact I would suggest that it is more important and will be foundational for all new arrays based on spinning rust (apart from the Platypus using NetApp, which stripes wide but in a different way).

Now whether 3PAR can claim the innovation or not, I'll leave to our current community historian, Stephen Foskett, but from an end-user point of view it is massively important. Our storage estates are simply too large to manage at the micro-level generally; wide-striping allows us to manage at a macro-level.

The next big thing is policy-driven storage management; no, not Atmos; but assigning policies to LUNs defining performance, protection, ILM-flows etc. Yes, it already exists today but in my experience, it's not widely used and trusted. Yes, you could do this without wide-striping but I think it makes the whole thing easier from an implementation point of view and hey, it might discourage some of my admins from thinking that they can do a better job than the automated routines...sometimes they can but life isn't long enough.

November 18, 2008

Echo and Bounce....

EMC's announcement of Decho is a good start of what I was trying to articulate here; I hope this is the beginning of services which will allow us all to manage our Digital Lives more easily. I had a suspicion that we were going to see something along these lines and I am expecting more especially in light of the Atmos announcements last week; I'm working on another blog post about Atmos and some thoughts on that. It's just Mozy today but hopefully we're going to see more than just online back-up services.

I wonder what level of integration we are going to see with other online services; for example, wouldn't it be cool when we purchase something from iTunes for it not only to be downloaded to our home-system but for it to move to our digital vault as well? When we take a picture, perhaps with an Eye-Fi enabled card, for it to get automagically uploaded to our digital vault? Is it only me or are not the possibilities for this sort of service fantastic? That's the digital Echo...

And the digital bounce; well, I want to be able to bounce between formats depending on the device I'm accessing the data from. So that the presentation of the asset becomes transparent to me.

Also, I think we are witnessing something very interesting with EMC and the direction it is taking. The ambition in moving beyond its traditional boundaries/markets and what it acquires next....eyes too big for its belly? We'll see.

p.s 'Zilla, I didn't answer your little teaser because someone had already hinted. But I'm enjoying your little teaser campaigns, you're wasted.......in your job! A role in marketing surely beckons!