ZFS performance models for a streaming server

I’ve spent a fair bit of the last week puzzling through the various postings on ZFS performance. While Richard Elling’s blog posts were informative, they didn’t tell me much about the workload that interests me most: high-throughput multimedia streaming. I eventually took the question to the ZFS-discuss list and got a lot of knowledgeable feedback. Once I understood the basic size/reliability/performance tradeoffs, what I learned about the various ZFS configurations boiled down, for my purposes, to one choice:

Which is more important to your ZFS workflow:
random access or write performance?

I suppose this is old hat to people who are very familiar with RAID systems and/or ZFS, but it took some digging for me to find out. I’ll set it out in words, in contrast with the pretty but dizzying graphs at relling’s site. These guidelines obviously set aside any other bottlenecks, but as was consistently pointed out on the list, media speed is usually the limiting factor for performance.

For mirrored configurations:
  • Small, random reads scale linearly with the number of disks; writes scale linearly with the number of mirror sets.
  • Sequential read throughput scales linearly with the number of disks; write throughput scales linearly with the number of mirror sets.

For parity (RAID-Z, RAID-Z2) configurations:
  • Small, random I/O reads and writes scale linearly with the number of RAID sets.
  • Sequential read and write throughput scales linearly with the number of data (non-parity) disks.

In other words, mirrors suffer on writes: write performance collapses to the number of mirror sets. RAID-Z groups suffer most with random I/O: both random reads and random writes collapse to the number of RAID groups. A hypothetical table with two different configurations of 12 disks (four 3-way mirror sets vs. two RAID-Z2 sets) helps show the strong contrast:

                  Random I/O      Sequential I/O
config            Read   Write    Read   Write
mirror: 4*3       12y    4y       12z    4z
RAIDZ2: 2(4+2)    2y     2y       8z     8z

where y is the small, random IOPS a single drive can deliver and z is the sustained media throughput of a single drive.
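The rules of thumb above are simple enough to write down as a small function, which makes it easy to check a table like this one or try other layouts. This is only a sketch of the guidelines as stated here, not a model of how ZFS actually schedules I/O; the `Layout` class and `scaling` function are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """A pool built from identical groups. Hypothetical helper type."""
    kind: str        # "mirror" or "raidz"
    groups: int      # number of mirror sets or RAID-Z groups
    data: int        # mirror: disks per set; raidz: data disks per group
    parity: int = 0  # raidz only: parity disks per group (not used in the rules)


def scaling(layout):
    """Return multipliers (random_read, random_write, seq_read, seq_write).

    The random figures multiply one drive's small-I/O IOPS (y); the
    sequential figures multiply one drive's sustained throughput (z).
    """
    g = layout.groups
    if layout.kind == "mirror":
        disks = g * layout.data
        # Reads can be served by any disk; writes must go to every side of a set.
        return (disks, g, disks, g)
    if layout.kind == "raidz":
        data_disks = g * layout.data
        # Random I/O scales with groups; streaming scales with data disks.
        return (g, g, data_disks, data_disks)
    raise ValueError(f"unknown layout kind: {layout.kind!r}")


print(scaling(Layout("mirror", groups=4, data=3)))           # (12, 4, 12, 4)
print(scaling(Layout("raidz", groups=2, data=4, parity=2)))  # (2, 2, 8, 8)
```

The two printed tuples correspond to the mirror and RAIDZ2 rows of the table.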

My case is fairly clear: I want to both read and write multimedia streams fairly equally, so I favour RAID-Z groups. I don’t need the same sort of long data life that others do, so I set RAID-Z2 aside for now.

In my particular case, I have sixteen 500GB SATA II drives to work with for the RAID. I am committed to one hot spare, so I’m down to 15 drives. Once I get my server, I need to find out how much, and at what point, performance degrades when excessive numbers of streams are added and/or random I/O requests are added to the mix.

For a long time, I had assumed I would use three five-drive RAID-Z sets. Looking at five three-drive sets instead, I have to consider whether a ~17% drop in peak streaming performance is worth a 67% improvement in baseline small, random I/O (essentially the worst-case scenario).

How do I arrive at that? Five 2+1 RAIDZ groups have 10 data disks compared to the 12 in three 4+1 RAIDZ groups. If I go from 4+1 to 2+1 groups, I lose two disks’ worth of data storage and the equivalent amount of maximum streaming capacity, but gain two more RAIDZ groups to share the seeking that random I/O requires. Actually, another table really makes the picture quite clear. I throw in a set of mirrored drives as further food for thought.
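For anyone who wants to check the percentages, the arithmetic can be sketched in a few lines. It assumes only the scaling rules given earlier (streaming follows data disks, random I/O follows the number of groups); the helper names are illustrative.

```python
def percent_change(old, new):
    """Signed percentage change going from old to new."""
    return (new - old) / old * 100.0


# Both layouts use all 15 drives: 3 groups of 4+1 versus 5 groups of 2+1.
wide = {"groups": 3, "data_per_group": 4}
narrow = {"groups": 5, "data_per_group": 2}


def data_disks(cfg):
    """Total non-parity disks in the pool."""
    return cfg["groups"] * cfg["data_per_group"]


streaming = percent_change(data_disks(wide), data_disks(narrow))  # 12 -> 10 data disks
random_io = percent_change(wide["groups"], narrow["groups"])      # 3 -> 5 groups

print(f"streaming:  {streaming:+.1f}%")  # -16.7%
print(f"random I/O: {random_io:+.1f}%")  # +66.7%
```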

                  Random I/O      Sequential I/O
config            Read   Write    Read   Write    Capacity
RAIDZ: 3(4+1)     3y     3y       12z    12z      6.0TB
RAIDZ: 5(2+1)     5y     5y       10z    10z      5.0TB
mirror: 7*2       14y    7y       14z    7z       3.5TB
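The capacity column is just data disks times drive size. A quick sketch of that arithmetic follows, which also shows how many of the 15 available drives each layout actually uses. It counts a 500GB drive as 0.5TB and ignores filesystem overhead, so treat the figures as raw estimates; the `summarize` helper is hypothetical.

```python
DRIVE_TB = 0.5   # one 500GB drive, counted as 0.5TB (raw, no overhead)
AVAILABLE = 15   # 16 drives minus one hot spare

# (label, groups, data disks per group, redundant disks per group)
layouts = [
    ("RAIDZ 3(4+1)", 3, 4, 1),
    ("RAIDZ 5(2+1)", 5, 2, 1),
    ("mirror 7*2", 7, 1, 1),  # a 2-way mirror holds one disk's worth of data
]


def summarize(groups, data, redundant):
    """Return (drives used, drives left idle, raw capacity in TB)."""
    used = groups * (data + redundant)
    capacity_tb = groups * data * DRIVE_TB
    return used, AVAILABLE - used, capacity_tb


for label, groups, data, redundant in layouts:
    used, idle, cap = summarize(groups, data, redundant)
    print(f"{label:<14} used: {used:2d}  idle: {idle}  capacity: {cap:.1f}TB")
```

Note that seven 2-way mirrors only consume 14 of the 15 drives, so that layout leaves one drive idle in addition to the hot spare.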