What are Blocks and Recordsize?

When creating a ZFS dataset (or later), you can set a property called recordsize, which is the maximum block size ZFS uses for the files in that dataset.

When ZFS stores a file, it breaks up the file into blocks and keeps track of where on your storage device(s) it wrote these blocks. The recordsize can have significant effects on performance.

For large files, you should use a large block size. The larger the block size, the less metadata is needed to track the blocks, and the fewer metadata entries ZFS has to access when reading or writing them.
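To put rough numbers on this, the sketch below counts how many records a file needs at a given recordsize and how much space their block pointers take. It assumes one 128-byte block pointer per record (the on-disk size of a ZFS block pointer) and ignores indirect-block levels, compression and sector rounding. The function names are hypothetical helpers for illustration, not ZFS tools.

```sh
#!/bin/sh
# Hypothetical helpers: rough record counts for a file (1 KB = 1024 bytes).

# Number of records needed to hold a file: ceil(file_size / recordsize).
records_for() {
    size=$1 rs=$2
    echo $(( (size + rs - 1) / rs ))
}

# Bytes of block pointers for those records, assuming 128 bytes each
# and ignoring indirect-block levels.
blkptr_bytes() {
    echo $(( $(records_for "$1" "$2") * 128 ))
}

one_gb=$(( 1024 * 1024 * 1024 ))
for rs in 131072 1048576; do
    echo "recordsize=$rs records=$(records_for $one_gb $rs) blkptr_bytes=$(blkptr_bytes $one_gb $rs)"
done
```

Under those assumptions, going from 128 KB to 1 MB records cuts the block-pointer metadata for a 1 GB file eightfold, from 8192 pointers (1 MB) to 1024 pointers (128 KB).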

Conversely, for small files, too large a block size wastes space. For example, if you write a 4 KB text file with the block size set to 128 KB, you will waste 124 KB of space [1].
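The arithmetic in that example follows the simplified model from the footnote, in which every record is allocated at the full recordsize. A minimal sketch of that model (the function name is hypothetical; compression and sector rounding are ignored):

```sh
#!/bin/sh
# Hypothetical helper for the simplified model: every record is allocated
# at the full recordsize, so the unused tail of the last record is wasted.
# Ignores compression and sector (ashift) rounding.
padding_bytes() {
    size=$1 rs=$2
    records=$(( (size + rs - 1) / rs ))
    echo $(( records * rs - size ))
}

kb=1024
echo "4 KB file, 128 KB records: $(( $(padding_bytes $((4 * kb)) $((128 * kb))) / kb )) KB wasted"
echo "129 KB file, 128 KB records: $(( $(padding_bytes $((129 * kb)) $((128 * kb))) / kb )) KB wasted"
```

In practice ZFS stores a file smaller than the recordsize in a single block sized to fit (rounded up to the pool's sector size), so this kind of padding mostly shows up in the last record of files larger than the recordsize, as in the 129 KB case. With compression enabled, the zero-filled tail of that record compresses well, which recovers most of it.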

If you're running a database on ZFS-backed storage, you should generally match the recordsize to the database's page size, i.e. the size of the chunks it reads and writes.
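Matching matters because ZFS reads and checksums whole records, so reading a page smaller than the recordsize pulls the rest of its record in with it (and a partial-record write becomes a read-modify-write). A rough sketch of the read side, using a hypothetical helper that assumes record-aligned pages and no cache hits:

```sh
#!/bin/sh
# Hypothetical helper: bytes read from disk per byte the database asked for
# when it reads one page stored inside a larger record.
# Assumes record-aligned pages and no cache hits.
read_amplification() {
    page=$1 rs=$2
    if [ "$rs" -le "$page" ]; then
        echo 1
    else
        echo $(( rs / page ))
    fi
}

# PostgreSQL's default page is 8 KB; MySQL InnoDB's default is 16 KB.
echo "8 KB page, 128 KB records: x$(read_amplification 8192 131072)"
echo "16 KB page, 16 KB records: x$(read_amplification 16384 16384)"
```

Under these assumptions, an 8 KB PostgreSQL page read from a dataset with the default 128 KB recordsize pulls in 16 times as much data. Setting recordsize=8K on that dataset brings the factor back to 1.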

Setting Recordsize

Normally, to set the recordsize to 1 MB for the dataset tank, you use the command

zfs set recordsize=1M tank
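zfs only accepts a recordsize that is a power of two, at least 512 bytes, and no larger than the limit set by the zfs_max_recordsize module option. The sketch below is a hypothetical pre-check of those rules; the function name and the way the limit is passed in are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical pre-check before running zfs set recordsize=...:
# the value must be a power of two, at least 512 bytes, and no larger
# than the module's zfs_max_recordsize (passed in as $2).
valid_recordsize() {
    rs=$1 max=$2
    [ "$rs" -ge 512 ] || return 1
    [ "$rs" -le "$max" ] || return 1
    # A power of two has exactly one bit set, so rs & (rs - 1) is zero.
    [ $(( rs & (rs - 1) )) -eq 0 ]
}

for rs in 1048576 16777216 786432; do
    if valid_recordsize "$rs" 1048576; then
        echo "$rs: ok"
    else
        echo "$rs: rejected"
    fi
done
```

On a real system the limit can be read from /sys/module/zfs/parameters/zfs_max_recordsize, and the result confirmed afterwards with zfs get recordsize tank. Changing recordsize only affects data written afterwards; existing files keep the block size they were written with.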

I read that ZFS supports blocks of up to 16 MB. However, when I tried zfs set recordsize=16M tank, I got an error saying that 16 MB records weren't supported. How to fix this is very poorly documented: it requires setting a ZFS kernel module option in a configuration file. On Arch Linux, this file was /etc/modprobe.d/zfs.conf (it may be different on other distros). You have to add the line

options zfs zfs_max_recordsize=16777216
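The module option takes a value in bytes, so 16 MB is 16 × 1024 × 1024 = 16777216. A minimal sketch that does this conversion and prints the matching line; the helper name is hypothetical and it only understands K, M and G suffixes as powers of 1024:

```sh
#!/bin/sh
# Hypothetical helper: convert a size like 16M into bytes (powers of 1024).
to_bytes() {
    case $1 in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# Print the line to add to /etc/modprobe.d/zfs.conf.
echo "options zfs zfs_max_recordsize=$(to_bytes 16M)"
```

Once the module has been loaded with the new option, cat /sys/module/zfs/parameters/zfs_max_recordsize should report 16777216.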

Afterwards, make sure to regenerate the Linux boot image (initramfs) so the option takes effect, then reboot. On Ubuntu, run update-initramfs -u; on Arch Linux, run mkinitcpio -p linux.

  1. I've simplified this for brevity.