The Storage Buddhist's blog post about IBM Easy Tier with SATA and SSD started me thinking.
And I quote from his blog:
'Easy Tier was left to automatically learn the SPC-1 benchmark and respond (again, automatically)'
Very cool, and IBM gets some good results from this, but I have a few questions, and they go out to all vendors who are doing automated tiering of some sort (I include NetApp's PAM here).
1) How long does the array take to learn the workload, and how long does it take to reach an optimal state?
This is very important because it impacts any new application deployment. Do I deploy to the fastest disk and artificially throttle it to the required SLAs? Users are funny, you see: if they discover that application performance is degrading over time, even if it is within the SLAs, they generally complain. Or do I deploy to the slowest tier and let the array tune itself to meet the required SLAs? A new deployment might be the worst possible time to impact performance, and it may hurt user acceptance of the application.
Do I have to come up with some artificial way of simulating the production load and run that for a period to enable the array to tune itself prior to any real users? If so, that has an impact on my ability to respond quickly to my users' demands, and my dynamic data-centre becomes less dynamic than I want.
2) How does the automated tiering impact replication?
With traditional replication technologies, one tries to keep the layout of the local and the remote array in sync. This is challenging enough in a non-dynamic environment and is often a manual task. How do you keep the array layouts in sync if the array is constantly changing its layout to reflect the load on it? How can you be sure that your remote recovery array is in an optimal state? Do you want to? I suppose the answer depends very much on my first question: how long does it take to reach an optimal state? Is it hours? Is it days? What is the delta between optimal and non-optimal performance?
If you have to keep the arrays completely in sync, what is the impact of sending array layout changes to the remote array? How much additional network bandwidth do we require?
3) What is the impact on restores, both from tape and from snaps/clones?
The backup application has no idea how the underlying physical structure has been changed; it will just sequentially restore the blocks on the LUNs it can see. However, the array has no idea whether the blocks being restored are the hot blocks it moved onto SSD or the cold blocks it moved onto SATA. It could be a right dog's dinner, and the array will obviously see a completely different I/O pattern in a restore scenario; you might not want your array tuning itself at this point.
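To make the restore worry concrete, here is a toy sketch in Python. It is not how Easy Tier or any real array works; it assumes a deliberately naive policy that simply promotes the extents with the highest access counts, and every number in it (100 extents, 10 SSD slots, 30 restore writes per extent) is made up for illustration. A real array may well treat large sequential writes differently.

```python
from collections import Counter


def ssd_candidates(heat, ssd_slots):
    """Return the extents a naive 'hottest wins' policy would place on SSD."""
    return {extent for extent, _ in heat.most_common(ssd_slots)}


def simulate(num_extents=100, ssd_slots=10, restore_writes=30):
    """Compare SSD placement before and after a restore (hypothetical numbers)."""
    heat = Counter()

    # Production load: the first ssd_slots extents are genuinely hot,
    # everything else is touched only occasionally.
    for extent in range(num_extents):
        heat[extent] += 20 if extent < ssd_slots else 1
    before = ssd_candidates(heat, ssd_slots)

    # Restore of a LUN living on the second half of the extents: the backup
    # application rewrites every block, hot or cold, and the array just
    # sees a burst of writes to those extents.
    for extent in range(num_extents // 2, num_extents):
        heat[extent] += restore_writes
    after = ssd_candidates(heat, ssd_slots)

    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print("SSD candidates before restore:", sorted(before))
    print("SSD candidates after restore: ", sorted(after))
```

In this toy model the restored cold extents end up with higher counts than the genuinely hot ones, so the naive policy would want to evict the real working set from SSD. Whether a given vendor's algorithm suffers from anything like this is exactly the question.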
I suspect that the answers to these questions could be quite complex, but I am certainly interested in how the various vendors mitigate some of the risks that I highlight. Automated Storage Tiering is still in its infancy; I think we've a lot of lessons to learn.
Or do I worry too much?