With Automated Tiering solutions now rapidly becoming de rigueur (in announcement form, anyway), perhaps it's time to consider again what is going to go wrong and how much trouble the whole thing is going to cause some people.
Fully Automated Tiering solutions are going to cause catastrophic failures when they are deployed without an understanding of the potential pitfalls. I keep going on about this because I think some people are sleep-walking into a false nirvana where they no longer need skilled people to manage their infrastructure estates. In fact, automated solutions mean that you are going to need even more skilled people; you will probably need fewer monkeys, but you are going to need people who can think and learn.
Fully Automated Tiering solutions cannot be fully automated and still be safe for most large businesses to run. They work on the premise that data ages out to slower, less performant disk; this reduces the amount of expensive fast disk you require and can save you serious money. So why wouldn't you do this? It's obvious that you want to, isn't it?
Actually, in most cases it probably is; unstructured file data is almost certainly safe to shift on this basis. In fact, most unstructured data could be stuck on the lowest tier of disk from day one and most people wouldn't notice. You could write most of it to /dev/null and not that many people would notice.
But structured data often has a more complex life-cycle. In accounting and billing systems, for instance, the data can be very active initially before eventually entering a period of dormancy when it basically goes to sleep. Then something happens to wake it up: periodic accounting runs, audits, and suddenly the data is awake and almost certainly required pretty sharpish. If your automated array has moved all the dormant data to a less performant tier, your periodic accounting jobs will suddenly take a lot longer than expected and potentially cripple your infrastructure. And those are just the predictable periodic jobs.
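The trap can be sketched in a few lines of Python. The tier latencies and the 30/90-day demotion thresholds below are made-up illustrative numbers, not any vendor's actual policy; the point is simply what a naive age-based rule does to a batch job that wakes dormant data:

```python
from dataclasses import dataclass

# Hypothetical per-tier read latencies in milliseconds (illustrative only).
TIER_LATENCY_MS = {"fast": 1, "mid": 5, "slow": 20}

@dataclass
class Block:
    days_since_access: int
    tier: str = "fast"

def demote_by_age(blocks):
    """Naive age-based policy: the longer data sleeps, the slower its tier."""
    for b in blocks:
        if b.days_since_access > 90:
            b.tier = "slow"
        elif b.days_since_access > 30:
            b.tier = "mid"

def job_cost_ms(blocks):
    """Total read latency for a job that must touch every block."""
    return sum(TIER_LATENCY_MS[b.tier] for b in blocks)

# A year-end accounting run reads 10,000 blocks untouched for four months.
blocks = [Block(days_since_access=120) for _ in range(10_000)]
print(job_cost_ms(blocks))   # all still on the fast tier: 10,000 ms
demote_by_age(blocks)
print(job_cost_ms(blocks))   # after demotion: 200,000 ms, a 20x slowdown
```

The policy did exactly what it was told, and the quarterly job is now twenty times slower; nothing here is broken from the array's point of view, which is precisely the problem.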
Automated Tiering solutions also encourage poor data management policies; actually, they don't so much encourage them as reinforce a lot of the bad practice which is already in place. How many people have applications whose data will grow forever? Applications which have been designed to acquire data but never age it out? Well, with Automated Tiering solutions, growing your data forever no longer attracts the cost it once did, certainly from a storage point of view, and it might even make you look good: as the data continues to grow, the active data as a percentage of the estate falls, the amount sitting on the lower tiers increases, and you could argue that you have really good storage management policies in place. And you do!
However, you have really poor Data Management policies in place, and I think one of the consequences of Automated Tiering is the need for Data Management: your storage might well be self-managing, but your data isn't. If you do not address this and do not get a true understanding of your data, its life-cycle and its value, you are looking down the barrel of a gun. Eventually you are going to face a problem which dwarfs the current storage-management problems.
What kind of problems?
- Applications that are impossible to upgrade because the time taken to upgrade data-in-place will be unacceptable.
- Applications that are impossible to recover in a timely fashion.
- Application data which is impossible to migrate.
- Data corruptions which down your application for weeks.
And I'm sure there are a lot more. It's hard enough to get the business to agree on retention periods for back-ups; you need to start addressing Data Management now with both your Business and your Application teams.
The good news is that as Storage Management becomes easier, you've got more time to think about it; the bad news is you should have been doing something about it already.