- Learning Ceph(Second Edition)
- Anthony D'Atri Vaibhav Bhembre Karan Singh
- 661 words
- 2021-07-08 09:43:57
Storage drive capacity
Less capacious drive models are often more attractive from a price/GB (or price/TB) angle than each generation's densest monsters, but there are other factors to weigh. Every drive needs a bay to live in and a controller / HBA / expander channel. Cheap drives are no bargain if you have to acquire, manage, rack, power, and cool double the number of servers in order to house them.
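To make the server-count argument concrete, here is a minimal sketch of the arithmetic. The 2,400 TB raw capacity target and the 12-bay chassis are assumptions chosen for illustration, and the function names are hypothetical; substitute your own figures.

```python
import math


def drives_needed(target_raw_tb: int, drive_tb: int) -> int:
    """Number of drives required to reach a raw capacity target."""
    return math.ceil(target_raw_tb / drive_tb)


def servers_needed(target_raw_tb: int, drive_tb: int, bays_per_server: int) -> int:
    """Number of servers required to house those drives."""
    return math.ceil(drives_needed(target_raw_tb, drive_tb) / bays_per_server)


# Assumed figures for illustration only.
TARGET_RAW_TB = 2400
BAYS_PER_SERVER = 12

for drive_tb in (4, 8):
    print(
        f"{drive_tb} TB drives: "
        f"{drives_needed(TARGET_RAW_TB, drive_tb)} drives, "
        f"{servers_needed(TARGET_RAW_TB, drive_tb, BAYS_PER_SERVER)} servers"
    )
```

Under these assumptions the 4 TB option needs 600 drives in 50 servers, while the 8 TB option needs 300 drives in 25 servers. Any per-drive savings has to be weighed against the cost of the extra chassis, rack space, power, and cooling.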
You may also find that the awesomeness of your Ceph cluster draws users and applications out of the woodwork, and the capacity you initially thought adequate might be severely taxed in the next fiscal year or even quarter. Clusters can be expanded by adding additional servers or by swapping existing smaller drives for larger ones. The former can be a considerable undertaking from financial and logistical angles, if you can even go back to the data center well for additional racks in a suitable location. The latter can be done piecemeal, but will probably require significant engineering and logistical resources. Care is also needed to avoid disrupting ongoing client operations as your cluster rebalances.
One might then decide to provision the largest drives available up front. This can be an effective strategy, but with several critical caveats:
As we write, the spinning drives with the largest capacities often utilize a technology known as shingled magnetic recording (SMR) to achieve stunning densities. SMR unfortunately imposes a substantial penalty on write operations, which often makes these drives unsuitable for Ceph deployment, especially in latency-sensitive block-storage applications. Elaborate caching may somewhat mitigate this drawback, but this author asserts that these are not the drives you're looking for.
Fewer, more-capacious drives also present a trade-off against more, less-capacious drives: in classic terms, more spindles are faster. With traditional rotating drives (also known as spinning rust), drive throughput tends not to keep pace with drive capacity. Thus, a cluster built with 300 8 TB drives may offer much less aggregate speed (IOPS) than a cluster of the same raw capacity constructed from 600 4 TB drives.
Newer technologies may conversely present the opposite: larger models of SAS, SATA, or NVMe solid-state drives are often several times faster than smaller models because of internal parallelism, in which operations are distributed across multiple smaller electronic components.
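The spindle-count trade-off for rotating drives can be sketched with simple arithmetic. The per-drive figure of 150 IOPS below is an assumption for illustration, not a measured value, and the sketch further assumes that a 4 TB and an 8 TB spinning drive deliver roughly the same IOPS, which simplifies the point that throughput does not keep pace with capacity.

```python
def aggregate_iops(drive_count: int, iops_per_drive: int) -> int:
    """Naive aggregate IOPS: every drive contributes equally."""
    return drive_count * iops_per_drive


# Assumed per-drive figure, for illustration only.
ASSUMED_IOPS_PER_HDD = 150

# (drive count, drive size in TB) for two clusters of equal raw capacity.
for count, size_tb in ((300, 8), (600, 4)):
    raw_tb = count * size_tb
    iops = aggregate_iops(count, ASSUMED_IOPS_PER_HDD)
    print(f"{count} x {size_tb} TB: {raw_tb} TB raw, about {iops} IOPS")
```

Both layouts provide 2,400 TB raw, but under these assumptions the 600-drive cluster offers twice the aggregate IOPS of the 300-drive cluster. Real-world results depend on the workload, replication, and the drives in question.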
Thus, it is crucial to plan ahead for the capacity you will need next month, next year, and three years from now. There are multiple paths to growth, but you will save yourself indescribable grief if you prepare a plan of some sort in advance. You may even choose to not fully populate the drive bays in your Ceph OSD servers. Say you choose a 12-bay chassis but only populate eight of those bays initially with 6 TB drives. Next year, when you need additional capacity, you might populate those remaining four bays per server with 8 TB or 10 TB drives at the same unit cost as today's smaller drives. Deployment would not require lockstep removal of those smaller drives, though that would still be possible down the road.
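The partially populated chassis example works out as follows. This is a rough sketch of per-server raw capacity only: it ignores replication or erasure-coding overhead, and the `DriveGroup` class and helper names are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriveGroup:
    """A set of identical drives installed in one server."""
    count: int
    size_tb: int


def bays_used(groups: list[DriveGroup]) -> int:
    """Total drive bays occupied by the given groups."""
    return sum(g.count for g in groups)


def server_raw_tb(groups: list[DriveGroup]) -> int:
    """Raw (pre-replication) capacity of one server in TB."""
    return sum(g.count * g.size_tb for g in groups)


BAYS_PER_CHASSIS = 12

# Year one: eight of the twelve bays hold 6 TB drives.
initial = [DriveGroup(count=8, size_tb=6)]
# Next year: the four empty bays receive 10 TB drives.
expanded = initial + [DriveGroup(count=4, size_tb=10)]

for label, layout in (("initial", initial), ("expanded", expanded)):
    assert bays_used(layout) <= BAYS_PER_CHASSIS
    print(f"{label}: {bays_used(layout)} bays, {server_raw_tb(layout)} TB raw")
```

With these numbers each server grows from 48 TB to 88 TB raw without removing a single existing drive. Filling the empty bays with 8 TB drives instead would yield 80 TB.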
https://www.gbmb.org/tb-to-tib is a useful calculator for converting between TB and TiB. It shows us, for example, that a nominal 8 TB drive is in truth a 7.28 TiB drive. When a filesystem such as XFS is laid down, the structural overhead and the small percentage of space reserved for the root user further decrease the capacity reported, especially by df. This distinction is widely known, but it's still quite easy to get tripped up by it.
That said, we primarily write TB and PB throughout this book, both because they are more familiar to readers, and to be consistent with the unit labels that Ceph and other tools display.
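The TB-to-TiB conversion mentioned above is easy to reproduce without an online calculator. A minimal sketch follows; the function name is our own, and the only facts relied on are that a terabyte is 10^12 bytes and a tebibyte is 2^40 bytes.

```python
def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


for nominal_tb in (4, 6, 8, 10):
    print(f"{nominal_tb} TB = {tb_to_tib(nominal_tb):.2f} TiB")
# 4 TB = 3.64 TiB
# 6 TB = 5.46 TiB
# 8 TB = 7.28 TiB
# 10 TB = 9.09 TiB
```

The roughly 9 percent gap applies before any filesystem overhead is subtracted, so the figure a tool reports for a mounted drive will be lower still.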