Pause for Testing

To recap, the major components of the ZFS storage server were:

  • PCI Case’s IPC-C3E-BAR65-XP-SAS 3U chassis w/16 hot-swap SAS/SATA2 bays
  • AMD Opteron 275 processors (dual-core, 2.2 GHz)
  • Tyan S3892 (K8HM) motherboard
  • 8 GB ECC memory
  • 80 GB boot drives (connected to the motherboard)
  • 2× Supermicro AOC-SAT2-MV8 8-port SATA controllers (PCI-X)
  • 16× 500 GB Seagate enterprise-class SATA hard drives

This configuration is intended to act as a streaming multimedia recorder/server; the most demanding disk I/O workload is writing a stream of data arriving over two GigE links. Imagine what it might take to store uncompressed high-definition video. There are likely to be other demanding tasks, but this is the most extreme.
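
As a rough target, two GigE links running flat out work out to:

    2 × 1 Gbit/sec ≈ 2 × 125 MB/sec = 250 MB/sec (before protocol overhead)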

Tyan’s S3892 suffers from some ambiguous documentation. The PDF specification sheet states in the text that there are two 133 MHz PCI-X slots and one 100 MHz slot, whereas the block diagram says they all run at 133 MHz. The manual says nothing about it, and emailing Tyan support produced no answer. Using the block diagram as my guide, I split the two Supermicro controllers between the two PCI-X busses, and decided to test the hardware configuration to make sure.

Basically, I wanted to test and compare against the model I previously blogged about; a concrete reading of it for this box follows the list.

    For mirrored configurations:
  • Small, random reads scale linearly with the number of disks; writes scale linearly with the number of mirror sets.
  • Sequential read throughput scales linearly with the number of disks; write throughput scales linearly with the number of mirror sets.
    For parity (RAID-Z, RAID-Z2) configurations:
  • Small, random I/O reads and writes scale linearly with the number of RAID sets.
  • Sequential read and write throughput scales linearly with the number of data (non-parity) disks.
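
Applied to this box, with the sixteenth disk held back as a spare, the scaling factors the model predicts are:

    5 × (2+1) RAID-Z  : 5 vdevs for random I/O, 10 data disks for sequential I/O
    3 × (4+1) RAID-Z  : 3 vdevs for random I/O, 12 data disks for sequential I/O
    7 × 2-way mirrors : 14 disks for random and sequential reads,
                        7 mirror sets for random and sequential writes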

Bonnie-64 was designed to turn up performance bottlenecks. That is precisely what I was looking for. Could I tell that one controller was on a bus running at 75% of the speed of the other? I could, in fact, but the overall combined performance was very decent. The tests did, however, show some limitations in my hardware.
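
For reference, the raw numbers behind that 75% figure are the theoretical bus peaks (before any overhead):

    64-bit PCI-X @ 100 MHz : 8 bytes × 100 MHz ≈   800 MB/sec
    64-bit PCI-X @ 133 MHz : 8 bytes × 133 MHz ≈ 1,067 MB/sec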

I compared configurations from two to fifteen disks (I always want to have a hot spare running), with 2+1 RAID-Z vdevs, 4+1 RAID-Z vdevs, and two-way mirror vdevs (example pool layouts follow the list). So each graph reflects fifteen runs of Bonnie-64:

  • With the 2+1 RAID-Z: 3, 6, 9, 12, or 15 disks in the zpool,
  • With the 4+1 RAID-Z: 5, 10, or 15 disks in a zpool, and
  • With the mirror configuration: 2, 4, 6, 8, 10, 12, or 14 disks in the zpool.
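
The zpool commands for these layouts look like the following; the device names are placeholders, and the real c#t#d# targets depend on how the disks hang off the two controllers. Larger pools just repeat more vdevs of the same shape:

    # one 2+1 RAID-Z vdev (three disks)
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0

    # one 4+1 RAID-Z vdev (five disks)
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

    # two two-way mirror vdevs (four disks)
    zpool create tank mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0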

All of the graphs are plotted against the number of data disks (i.e., the total number of disks for the mirrors, but only the number of non-parity disks for the RAID-Z configurations) or, in the case of random seeks, against the number of RAID-Z/mirror sets. All of the tests were performed with 32 GB test files; from the results, it’s pretty clear we’re well past any caching effects.
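
A 32 GB Bonnie-64 run amounts to something like the following (the scratch directory is a placeholder; -s takes the file size in megabytes):

    ./Bonnie -d /tank/bench -s 32768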

Block Writes, MB/sec

Block writes were always going to be the metric I was most sensitive to, because of the workload described above. You can see that there is a strong levelling-off of block write performance just below 390 MB/sec. The mirrored configurations increase their write speed at half the rate of the RAID-Z configurations, as we would expect from the slower writes indicated in the model. The “ideal” line is fairly arbitrary, as it’s an extrapolation from fairly few data points; it is, however, indicative of what the performance model might predict.

Block Reads, MB/sec

Sustained read performance is much less limited than write performance. The “ideal” line also has a 33% steeper slope than for writes: it appears we consistently achieve four block reads in the time it takes to do three block writes. Strangely, the 4+1 RAID-Z groups underperform by a fair bit (I can’t comment on the statistical significance at the moment, but it seems fairly consistent). The 7×2 mirror configuration tops out at 735 MB/sec on reads, which seems fairly decent.
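
For scale, that peak works out to roughly 735 ÷ 14 ≈ 52 MB/sec per data disk in the 7×2 mirror case.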

Random Seeks /sec

I’ll admit that the random seek figures baffle me a bit. Everything I’ve read so far suggested that random seek performance would scale linearly with the number of vdevs (or, for the mirrors, the number of disks). Instead, the numbers fit a logarithmic curve fairly well. Am I running into lots of vibration? Am I hitting an unexpected bottleneck that’s unrelated to data transfer over the bus?
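
One way to poke at that last question is to watch per-device service times and utilisation while the seek phase is running, e.g. with Solaris’s iostat (five-second interval, picked arbitrarily):

    iostat -xn 5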

This is a beast of a post already. I’ll push this out to the world, and start writing up the next installment, wherein I note that one of the SATA controllers is, in fact, on a slower PCI-X bus, and describe what I do to fix it.

edit: All of this was on Solaris Express Community Edition, Nevada 70, with the ZFS boot patch applied.