

tl;dr: Hard drives already do this, the risks of loss are astronomically low, ZFS is useless for many common data loss scenarios, start backing your data up, you lazy bastards, and RAID-5 is not as bad as you think.

Bit rot just doesn’t work that way.

I am absolutely sick and tired of people in forums hailing ZFS (and sometimes btrfs which shares similar “advanced” features) as some sort of magical way to make all your data inconveniences go away. If you were to read the ravings of ZFS fanboys, you’d come away thinking that the only thing ZFS won’t do is install kitchen cabinets for you and that RAID-Z is the Holy Grail of ways to organize files on a pile of spinning rust platters.

In reality, the way that ZFS is spoken of by the common Unix-like OS user shows a gross lack of understanding of how things really work under the hood. It’s like the “knowledge” that you’re supposed to discharge a battery as completely as possible before charging it again: that advice was accurate for old Ni-Cd battery chemistry, yet it still hasn’t gone away, and following it will destroy your laptop or cell phone lithium-ion cells far faster than if you’d just left them on the charger all the time. Bad knowledge that has spread widely tends to have a very hard time dying. This post shall serve as all of the nails AND the coffin for the ZFS and btrfs feature-worshiping nonsense we see today.

Side note: in case you don’t already know, “bit rot” is the phenomenon where data on a storage medium gets damaged because of that medium “breaking down” over time naturally. Remember those old floppies you used to store your photos on and how you’d get read errors on a lot of them ten years later? That’s sort of like how bit rot works, except bit rot is a lot scarier because it supposedly goes undetected, silently destroying your data and you don’t ever find out until it’s too late and even your backups are corrupted.

“ZFS has CRCs for data integrity”

A certain category of people are terrified of the techno-bogeyman named “bit rot.” These people think that a movie file not playing back or a picture getting mangled is caused by data on hard drives “rotting” over time without any warning. The magical remedy they use to combat this today is the holy CRC, or “cyclic redundancy check”: a family of checksum algorithms that produce a number which will always come out the same as long as the data fed into them is the same.

This is, by far, the number one pain-in-the-ass statement out of the classic ZFS fanboy’s mouth and is the basis for most of the assertions that ZFS “protects your data” or “guards against bit rot” or other similar claims. While it is true that keeping a hash of a chunk of data will tell you whether that data is damaged, the filesystem CRCs are an unnecessary and redundant waste of space, and their usefulness is greatly exaggerated by hordes of ZFS fanatics.

Hard drives already do it better

Enter error-correcting codes (ECC). You might recognize that term because it’s also the specification for a type of RAM module that has extra bits for error checking and correction. What the CRC Jesus clan don’t seem to realize is that all hard drives since the IDE interface became popular in the 1990s have had ECC built into their design, and every single bit of information stored on the drive is both protected by it and transparently rescued by it once in a while.

Hard drives (as well as solid-state drives) use an error-correcting code to protect against small numbers of bit flips by both detecting and correcting them. If too many bits flip, or the flips happen in a very specific way, the ECC in hard drives will either detect an uncorrectable error and indicate this to the computer, or the ECC will be thwarted and “rotten” data will successfully be passed back to the computer as if it were legitimate. The latter scenario is the only bit rot that can happen on the physical medium and pass unnoticed, but what does it take to get there? A single bit flip is easily detected and corrected, so we’re talking about a scenario where multiple bit flips happen in close proximity and in such a pattern that the damaged sector still passes the ECC math as valid.

While it is a possible scenario, it is also very unlikely. A drive that has this many bit errors in close proximity is likely to be failing, and the S.M.A.R.T. status should indicate a higher reallocated sector count (or worse) while this sort of failure is going on. If you’re monitoring your drive’s S.M.A.R.T. status (as you should be) and it starts deteriorating, replace the drive!
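If you want concrete commands for that monitoring, smartmontools does the job on Linux; a quick sketch (the device name is an example):

```shell
# Overall health verdict and the attributes that matter for failing media
smartctl -H /dev/sda
smartctl -A /dev/sda | grep -Ei 'reallocated|pending|uncorrect'

# Run an extended self-test in the background, then check results later
smartctl -t long /dev/sda
smartctl -l selftest /dev/sda
```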

Flipping off your CRCs

Note that in most of these bit-flip scenarios, the drive transparently fixes everything and the computer never hears a peep about it. ZFS CRCs won’t change anything if the drive can recover from the error. If the drive can’t recover and sends back the dreaded uncorrectable error (UNC) for the requested sector(s), the drive’s error detection has already done the job that the ZFS CRCs are supposed to do; namely, the damage was detected and reported.

What about the very unlikely scenario where several bits flip in a specific way that thwarts the hard drive’s ECC? This is the only scenario where the hard drive would lose data silently, therefore it’s also the only bit rot scenario that ZFS CRCs can help with. ZFS with CRC checking will detect the damage despite the drive failing to do so and the damage can be handled by the OS appropriately…but what has this gained us? Unless you’re using specific kinds of RAID with ZFS or have an external backup you can restore from, it won’t save your data, it’ll just tell you that the data has been damaged and you’re out of luck.
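To be fair to anyone who does run ZFS: that detection only happens when blocks are actually read, so it has to be exercised deliberately. A sketch, assuming a pool named “tank”:

```shell
# Read and verify every checksummed block in the pool
zpool scrub tank

# The CKSUM column counts blocks that failed verification; without
# redundancy (mirror/RAID-Z), damaged files are merely listed as
# permanent errors -- detected, not recovered
zpool status -v tank
```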

Hardware failure will kill your data

If your drive’s on-board controller hardware, your data cable, your power supply, your chipset with your hard drive interface inside, your RAM’s physical slot connection, or any other piece of the hardware chain that goes from the physical platters to the CPU has some sort of problem, your data will be damaged. It should be noted that SATA links protect each transfer with a 32-bit CRC (the same CRC-32 polynomial used by IEEE 802.3 Ethernet), so the transmission from the drive’s controller to the host system’s drive controller is protected from transmission errors. Using ECC RAM only helps with errors in the RAM itself; data can still become corrupted while being shuffled around in other circuits, and the damaged values stored in ECC RAM will be “correct” as far as the ECC RAM is concerned.

The magic CRCs I keep making fun of will help with these failures a little more because the hard drive’s ECC no longer protects the data once the data is outside of a CRC/ECC capable intermediate storage location. This is the only remotely likely scenario that I can think of which would make ZFS CRCs beneficial.

…but again: how likely is this sort of hardware failure to happen without the state of something else in the machine being trashed and crashing something? What are the chances of your chipset scrambling the data only while the other millions of transistors and capacitors on the die remain in a functional and valid working state? As far as I’m concerned, not very likely.

Data loss due to user error, software bugs, kernel crashes, or power supply issues usually won’t be caught by ZFS CRCs at all. Snapshots may help, but they depend on the damage being caught before the snapshot of the good data is removed. If you save something and come back six months later to find it damaged, your snapshots may only reach back a few months, every one of them containing the damaged file, while the good copy was rotated out long ago. ZFS might help you a little, but it’s still no magic bullet.

Nothing replaces backups

By now, you’re probably realizing something about the data CRC gimmick: it doesn’t hold much value for data integrity and it’s only useful for detecting damage, not correcting it and recovering good data. You should always back up any data that is important to you. You should always keep it on a separate physical medium that is ideally not attached to the computer on a regular basis.

Back up your data. I don’t care about your choice of filesystem or what magic software you write that will check your data for integrity. Do backups regularly and make sure the backups actually work.

In all of my systems, I use the far less exciting XFS on Linux with metadata CRCs (once they were added to XFS) on top of a software RAID-5 array. I also keep external backups of all systems updated on a weekly basis. I run S.M.A.R.T. long tests on all drives monthly (including the backups) and about once a year I will test my backups against my data with a tool like rsync that has a checksum-based matching option to see if something has “rotted” over time.

All of my data loss tends to come from poorly typed ‘rm’ commands. I have yet to encounter a failure mode that I could not bounce back from in the past 10 years. ZFS and btrfs are complex filesystems with a few good things going for them, but XFS is simple, stable, and all of the concerning data loss bugs were ironed out a long time ago. It scales well and it performs better all-around than any other filesystem I’ve ever tested. I see no reason to move to ZFS and I strongly question the benefit of catching a highly unlikely set of bit damage scenarios in exchange for the performance hit and increased management complexity that these advanced features will cost me…and if I’m going to turn those features off, why switch in the first place?

Bonus: RAID-5 is not dead, stop saying it is

A related category of blind zealot is the RAID zealot, often following in the footsteps of the ZFS zealot or even occupying the same meat-suit. They loudly scream about the benefits of RAID-6, RAID-10, and fancier RAID configurations. They scorn RAID-5 for having terrible rebuild times and hype up the fact that “if a second drive dies while rebuilding, you lose everything!” They point at 10TB hard drives, do back-of-the-napkin equations, and tell you about how dangerous and stupid it is to use RAID-5 and how their system that gives you less space on more drives is so much better.

Stop it, fanboys. You’re dead wrong and you’re showing your ignorance of good basic system administration practices.

I will concede that your fundamental points are mostly correct. Yes, RAID-5 can potentially have a longer rebuild time than multi-stripe redundant formats like RAID-6. Yes, losing a second drive after one fails or during a rebuild will lose everything on the array. Yes, a 32TB RAID-5 with five 8TB drives will take a long time to rebuild (about 50 hours at 180 MB/sec). No, this isn’t acceptable in an enterprise server environment. Yes, the infamous RAID-5 write hole (where a stripe and its parity aren’t both updated before a crash or power failure and the data is damaged as a result) is a problem, though a very rare one to encounter in the real world. How do I, the smug techno-weenie advocating for dead old stupid RAID-5, counter these obviously correct points?

  • Longer rebuild time? This is only true if you’re using the drives for other work while the rebuild is running. What you really mean is that RAID levels with more redundancy slow down less when the rebuild is interrupted by other I/O. No RAID exists that doesn’t slow down while rebuilding. If you leave the array mostly idle during the rebuild, it’ll go a lot faster. No surprise there!
  • Losing a second drive? This is possible but statistically very unlikely. However, let’s assume you ordered a bunch of bad Seagates from the same lot number and you really do have a second failure during rebuild. So what? You should be backing up the data to an external backup, in which case this failure does not matter. RAID-6 doesn’t mean you can skip the backups. Are you really not backing up your array? What’s wrong with you?
  • RAID-5 in the enterprise? Yeah, that’s pretty much dead because the rebuild process slowdown is worse. An enterprise might have 28 drives in a RAID-10 because it’s faster in all respects. Most of us aren’t an enterprise and can’t afford 28 drives in the first place. It’s important to distinguish between the guy building a storage server for a rack in a huge datacenter and the guy building a home server for video editing work (which happens to be my most demanding use case).
  • The RAID-5 “write hole?” Use an uninterruptible power supply (UPS). You should be doing this on any machine with important data on it anyway! Even if you don’t use a UPS, Linux as of kernel version 4.4 has added journaling features for RAID arrays in an effort to close the RAID-5 write hole.
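That journaling works by dedicating a fast device (typically an SSD) as a write journal when the array is created; a sketch with example device names:

```shell
# Create a 3-disk RAID-5 whose stripe writes land in a journal first;
# after a crash or power loss, partial stripes are replayed from the
# journal instead of leaving parity inconsistent (the "write hole")
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 \
      --write-journal /dev/nvme0n1p1
```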

A home or small business user is better off with RAID-5 if they’re also doing backups like everyone should anyway. With a 7200 RPM 3TB drive (the best $/GB ratio in 7200 RPM drives as of this writing) costing around $95 each shipped, I can only afford so many drives. I know that I need at least three for a RAID-5, and I need twice as many because I need to back that RAID-5 up, ideally to another machine with an identically sized RAID-5 inside. That’s a minimum of six drives for $570 to get two 6TB RAID-5 arrays, one main and one backup. I can buy a nice laptop or even build a great budget gaming desktop for that price, but for these storage servers I haven’t even bought the other components yet. To get 6TB in a RAID-6 or RAID-10 configuration, I’ll need four drives instead of three for each array, adding $190 to the initial storage drive costs. I’d rather spend that money on the other parts, and in the rare instance that I must rebuild the array I can read from the backup server to reduce the rebuild time impact. I’m not worried about a few extra hours of rebuild.
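Both back-of-the-napkin figures above are easy to verify in shell arithmetic:

```shell
# Rebuild time: 32 TB is 32,000,000 MB; at a sustained 180 MB/sec,
# that's roughly 177,778 seconds, or about two full days
echo $(( 32000000 / 180 / 3600 ))   # prints 49 (hours)

# Drive cost: two 3-drive RAID-5 arrays (main + backup) at $95 a drive,
# versus the one extra drive per array that RAID-6/RAID-10 demands
echo $(( 2 * 3 * 95 ))   # prints 570
echo $(( 2 * 95 ))       # prints 190
```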

Not everyone has thousands of dollars to allocate to their storage arrays or the same priorities. All system architecture decisions are trade-offs and some people are better served with RAID-5. I am happy to say, however, that if you’re so adamant that I shouldn’t use RAID-5 and should upgrade to your RAID levels, I will be happy to take your advice on one condition.

Buy me the drives with your own money and no strings attached. I will humbly and graciously accept your gift and thank you for your contribution to my technical evolution.

If you can add to the conversation, please feel free to comment. I want to hear your thoughts. Comments are moderated but I try to approve them quickly.

(If you just want the good stuff and don’t care about the story, visit the snd2remote web page. Bottom line: run a listener on the LAN, then on a Linux box type “snd2remote C:/WINDOWS/Media/ding.wav”)

Problem: You’re running applications on a remote Linux system across a LAN, possibly using an X server such as Xming, or a VNC setup such as x11vnc, to view them on a Windows machine. The programs run fine, displaying on the Windows machine, but there is no way to hear notifications for events such as receiving an email or IM. Every program either has an option that lets you “run a custom command” on notification events or has an add-on available that enables this functionality, but there is no custom command that makes sounds play on the Windows machine you’re actually using.

Solution: Roll your own solution using free tools that are already available!

I use Mozilla Thunderbird for email and Pidgin for all instant messaging services. For two years, I sat directly in front of the server at the shop and used these applications. Now, however, I use an old Windows laptop that we can’t sell for technical reasons, along with Xming, to run these Linux apps on my server and show them as apps on my Windows laptop. Unfortunately, as I have been getting busier and busier, I have been forgetting to periodically check my mail window. Thunderbird in Xming has no tray mail notification and no sound on the Windows machine at all; likewise with Pidgin, which (at best) can put a star in the chat window title. However, Pidgin on Linux comes with an option to “use a custom command” for playing sounds. Thunderbird doesn’t, but an add-on easily adds “during new mail notification, run this command” capabilities to it.

So, I know that I CAN do this if I can run a command which plays the sound on the Windows machine, but a brief Google hunt turns up nothing of interest. Oh, I’m sure I could somehow use JACK or some other complicated audio system, but I couldn’t care less about setting that up and dealing with the extra admin overhead if there’s a simpler way!

It takes some real Linux geek thought processes to do, but I figured out a way that, regardless of which machine in my shop I sit at, I can get audio notifications for my Xming apps (or any app that can run a command to play a sound for that matter).  The answer: UDP broadcast the sound you want to play to all the machines on the network.

I created a bash shell script and a Windows NT/2000/XP command prompt batch file (it uses SET /P, which I don’t think DOS/Win9x/Me have) that pair together to trigger sounds on my Windows machine from my Linux machine. An additional constraint which I artificially imposed on myself, to minimize the number of required downloads, was to use only Windows batch commands and not “cheat” by pulling in a UNIX-style shell, sed, grep, and other UNIX commands. This also forced me to learn some ways of doing things in a batch file that I had never done before, meaning that not only is this easier for others to duplicate, but I’ll be a better Windows admin too.

I had to download two free tools to make this work: socat on the Linux side, and a netcat build on the Windows side.

Here’s how we do it. The computer which is generating the notification event needs to order the computer at which we’re actually sitting to play some sort of sound. Ideally, we need to be able to play lots of different sounds, too. So, the quick and easy solution was to use socat (and netcat for Windows, since there’s no native socat port) to broadcast the names of sound files to play, receive those broadcast names, and then try to play them. Most of the time I spent was actually on file scanning and filtering code rather than making the *cat tools behave nicely.
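The real scripts live on the snd2remote page, but the core mechanism can be sketched in a few lines of shell. This is not the actual snd2remote code; port 9999 and the file path are arbitrary examples, and aplay stands in for whatever WAV player you use:

```shell
# Listener (run on every machine that should play sounds): receive one
# filename per UDP datagram and attempt to play it if it exists locally
while true; do
    file="$(socat -u udp4-recvfrom:9999 -)"
    [ -f "$file" ] && aplay -q "$file"
done

# Broadcaster (run once per notification event on the sending machine):
# send the filename to every listener on the LAN in one datagram
echo "/home/share/media/ding.wav" | \
    socat -u - udp4-datagram:255.255.255.255:9999,broadcast
```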

“snd2remote” on Linux can both listen for events AND broadcast them, and does some fairly clever search tricks to try very hard to play a WAV file that is not valid for the Linux machine as-is.  The Windows version is restricted to batch file commands, which means it’s much “dumber” and therefore needs more hand-holding; most of the clever logic is in the Linux side so the Windows side can afford to be dumbed down.

snd2remote taught me the following lessons and/or has the following nice features:

  1. You can make netcat exit immediately on receipt of a single UDP packet in listen mode by piping an empty “echo” into it. This drastically decreases latency because the 1-second timeout is unnecessary.
  2. The way to remove timeout latency in socat is to specify a “udp4-recvfrom” endpoint instead of “udp4-listen” and also specify the “unidirectional” switch.
  3. Translating to backslashes for dumb Windows batch listeners and letting Linux listeners translate them back to normal forward slashes makes life a thousand times easier.
  4. Making the Linux version assume UNC paths such as \\server\share\path\file.wav equals /home/share/path/file.wav on the Linux side allows the use of a default Samba home directory share as a source for both machines to fetch the same WAV files from.
  5. Letting the Linux script perform quick scans in a couple of obvious locations for alternate copies of the specified file makes it extremely easy to feed Windows-specific paths and have them still play on the Linux listeners without replicating the path to the same file on Windows.
  6. snd2remote in listen mode checks the ROOT of the executing user’s home directory, does full subdirectory scanning of a custom directory (defaults to $HOME/media), and performs the previously mentioned UNC path to home directory translation, in an exhaustive effort to find a playable copy of the requested WAV file.
  7. It’s better to use the shell construct $(commands) than to use (backticked) `commands` because the latter interprets special characters in a way that makes life very painful.
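Lessons 3 and 7 are easy to demonstrate in a few lines (the example path is arbitrary):

```shell
# Lesson 3: a dumb Windows batch listener wants backslashes, but the
# Linux side can translate them back to forward slashes with tr
winpath='C:\WINDOWS\Media\ding.wav'
unixpath="$(printf '%s' "$winpath" | tr '\\' '/')"
echo "$unixpath"   # C:/WINDOWS/Media/ding.wav

# Lesson 7: $(...) nests cleanly where backticks would need escaping
echo "$(basename "$(dirname "$unixpath")")"   # Media
```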

By running “snd2remote -l” on a Linux box or “snd2remote-listener.bat” on a Windows box, any other Linux box on the LAN can broadcast a sound event to all listening computers with a simple command:

snd2remote [-q] /path/to/file.wav

For example, to play the classic Windows “ding” on all listeners:

snd2remote C:/WINDOWS/Media/ding.wav

If you are interested in the internal workings of these scripts, they are heavily commented to explain in detail what is going on.  Lots of shell/command interpreter constructs and command combinations are used in them which I had not needed, or had rarely used, before; therefore, these scripts would be a good starting point to nudge your way into some interesting scripting.

The code has grown too long and complex to document everything here, but it is freely available at the c02ware snd2remote page.

One of the greatest things in the world about Linux is the amount of flexibility it gives a person in creating a solution to a problem, and I found myself faced with yet another situation where this became relevant lately.  Some background is in order before I get to the point of the story, however, so grab your popcorn and soda, and prepare to be astonished!

As many readers will no doubt know, I am hard at work on Tritech Service System 3.0, the next incarnation of c02ware’s Linux distribution which we use regularly in our shop.  Since TSS is primarily developed with the needs of a computer repair service business in mind, we obviously look for ways to help TSS help us do our jobs, which is where some of the strange and unique features I’ve come up with originate.  One of the most novel ideas has been that of having a TSS CD which doesn’t require upgrading, because every upgrade cycle I’m forced to distribute new burned CDs to all of the technicians and rewrite all of our bootable USB flash drives for the new system.  TSS 3.0 will have the ability to upgrade over a network automatically during early startup.

Being able to self-upgrade before running anything very important is difficult, primarily because it’s easy to make a messy solution that works in one specific case, but very hard to make a clean solution that works 99% of the time. I’ve been writing custom scripts for the private TSS CDs that refer to our core server by its internal IP address, meaning that if the CD isn’t on our network, or the server is down for any reason, there’s no way to run that script. Making these scripts generic, so that they automatically locate any suitable server clone (such as a mirror on an external hard drive), is much harder: you have to scan for things, and if you’re using a network resource without a pre-programmed IP address, you have to discover that resource somehow, without any prior knowledge of its existence.

Yes, this is all sort of straying from the point of the post, but I’m building up to it, so bear with me for a moment longer.

THE QUESTION:  “How can I find a network resource containing files I need, even if I have no idea what the server’s IP address is…AND get 100% of the information required to access those files without any manual user intervention?” Nothing that I could think of immediately seemed to be a viable solution, other than perhaps somehow pulling a particular network name via Samba and using that, but then I considered that perhaps the server itself might want to be running TSS with the default hostname of…well, “tss” (which obviously presents a problem, one which will also be fixed at some point!)  Then it started to hit me: what if we could somehow set up a “beacon” that would magically tell every “beacon-capable” TSS on the network that the beacon sender is a server?

Eureka!  …but not so fast…

The first thing that came to mind was, of course, good old fashioned netcat.  Netcat  (or just nc) is notoriously useful software that does a very simple thing: provide a way to pipe commands and information over TCP and UDP connections.  Well, netcat definitely would have done the job, so I tried something similar to this, to pipe a “configuration string” via UDP to the broadcast IP address for the LAN I was on at the time:

echo " share path/to/file username password" | nc -u 9999

Unfortunately, all  kinds of really freakin’ weird socket errors were spit out, and it totally choked on me.  Apparently you can’t use netcat to do UDP broadcast on Linux for some reason (at least, not GNU netcat).  I was extremely disappointed, and I started to research netcat to see if there was some sort of solution.  I hit a forum post that mentioned socat and looked it up.  Holy cow, I hit jackpot!

socat is referred to by many as “netcat on steroids” and now I can see why. Where netcat is a simple TCP or UDP pipeline tool, socat is a much more generic socket-to-socket version of cat (thus “socat,” or “socket cat”). This bad boy can hook up UNIX sockets, files on disk, standard input/output/error, TCP, UDP, raw IP with no protocol, and more, bidirectionally, and without restrictions on which can be input or output. The help text for the usable socket modes alone is over 50 lines long, and it doesn’t complain at all if I try to use it to do a UDP broadcast.

What happened when I tried it out?  Check it out:

# socat UDP4-LISTEN:9999 GOPEN:foo &
[1] 14130
# echo " share path/to/files username password" | \
> socat - UDP4-DATAGRAM:,broadcast
# cat foo
 share path/to/files username password
[1]+  Done                    socat UDP4-LISTEN:9999 GOPEN:foo

Of course, this was all on the same machine, so I tried something on an old Linux router box on the network which I can’t use socat on for technical reasons.  With GNU netcat listening, the exact same configuration beacon broadcast came in perfectly:

$ nc -u -l -p 9999
 share path/to/files username password

What does all of this silly stuff mean in the end?

With some clever scripting, Tritech Service System 3.0 will be able to automatically update itself from any server on the LAN. That means that when TSS 3.0.1 or 3.1 or even 4.0 appear, one machine running the newer version in its default configuration will be able to give its updated packages to all other machines on the network without any user intervention required. That means I can set up a TSS package repository on my main server and set up a TSS package beacon, and all of the TSS 3.0 CDs and flash drives I give to technicians won’t ever need to be replaced or rewritten just to get updated packages.  Bug fixes from one running new version will automatically propagate to older ones.  If I make a custom package that isn’t part of mainstream TSS, such as our custom scripts, they can be distributed only to private installations by the on-location server rather than being burned to the CD, eliminating the need for special private versions of TSS for in-shop use.

This, combined with an enhanced package acquisition system, dependency management, lower memory usage, persistent directories, and a host of other features will make Tritech Service System 3 one of the most flexible live Linux distributions in existence.
