
Category Archives: advice

tl;dr: Hard drives already do this, the risks of loss are astronomically low, ZFS is useless for many common data loss scenarios, start backing your data up you lazy bastards, and RAID-5 is not as bad as you think.

Bit rot just doesn’t work that way.

I am absolutely sick and tired of people in forums hailing ZFS (and sometimes btrfs which shares similar “advanced” features) as some sort of magical way to make all your data inconveniences go away. If you were to read the ravings of ZFS fanboys, you’d come away thinking that the only thing ZFS won’t do is install kitchen cabinets for you and that RAID-Z is the Holy Grail of ways to organize files on a pile of spinning rust platters.

In reality, the way that ZFS is spoken of by the common Unix-like OS user shows a gross lack of understanding of how things really work under the hood. It’s like the “knowledge” that you’re supposed to discharge a battery as completely as possible before charging it again. That advice hasn’t gone away even though it was accurate for old Ni-Cd battery chemistry and will destroy your laptop or cell phone lithium-ion cells far faster than if you had just left them on the charger all the time. Bad knowledge that has spread widely tends to have a very hard time dying. This post shall serve as all of the nails AND the coffin for the ZFS and btrfs feature-worshiping nonsense we see today.

Side note: in case you don’t already know, “bit rot” is the phenomenon where data on a storage medium gets damaged because of that medium “breaking down” over time naturally. Remember those old floppies you used to store your photos on and how you’d get read errors on a lot of them ten years later? That’s sort of like how bit rot works, except bit rot is a lot scarier because it supposedly goes undetected, silently destroying your data and you don’t ever find out until it’s too late and even your backups are corrupted.

“ZFS has CRCs for data integrity”

A certain category of people are terrified of the techno-bogeyman named “bit rot.” These people think that a movie file not playing back or a picture getting mangled is caused by data on hard drives “rotting” over time without any warning. The magical remedy they use to combat this today is the holy CRC, or “cyclic redundancy check.” It’s a certain family of hash algorithms that produce a magic number that will always be the same if the data used to generate it is the same every time.
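To make the "magic number" idea concrete, here is a minimal sketch using the CRC-32 routine from Python's standard zlib module. It is only an illustration of the general technique: the sample block, the helper names, and the choice of CRC-32 are my assumptions for the example and say nothing about which checksum any particular filesystem uses internally.

```python
import zlib

def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

# Hypothetical stand-in for a block of stored file data.
block = b"holiday photo bytes " * 64

# The checksum that would be saved alongside the data at write time.
stored_crc = crc32_of(block)

# Same data in, same number out.
same_again = crc32_of(block) == stored_crc

# Simulate a single "rotted" bit and check the block again.
rotted = flip_bit(block, 1234)
damage_detected = crc32_of(rotted) != stored_crc

print("unchanged data matches:", same_again)
print("single bit flip detected:", damage_detected)
```

Notice what the check gives you: a yes-or-no answer about whether the data changed. Nothing in it can turn the rotted block back into the good one.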

This is, by far, the number one pain in the ass statement out of the classic ZFS fanboy’s mouth and is the basis for most of the assertions that ZFS “protects your data” or “guards against bit rot” or other similar claims. While it is true that keeping a hash of a chunk of data will tell you if that data is damaged or not, the filesystem CRCs are an unnecessary and redundant waste of space and their usefulness is greatly exaggerated by hordes of ZFS fanatics.

Hard drives already do it better

Enter error-correcting codes (ECC). You might recognize that term because it’s also the specification for a type of RAM module that has extra bits for error checking and correction. What the CRC Jesus clan don’t seem to realize is that all hard drives since the IDE interface became popular in the 1990s have ECC built into their design and every single bit of information stored on the drive is both protected by it and transparently rescued by it once in a while.

Hard drives (as well as solid-state drives) use an error-correcting code to protect against small numbers of bit flips by both detecting and correcting them. If too many bits flip or the flips happen in a very specific way, the ECC in hard drives will either detect an uncorrectable error and indicate this to the computer or the ECC will be thwarted and “rotten” data will successfully be passed back to the computer as if it were legitimate. The latter scenario is the only bit rot that can happen on the physical medium and pass unnoticed, but what did it take to get there? One bit flip will easily be detected and corrected, so we’re talking about a scenario where multiple bit flips happen in close proximity and in such a manner that the damaged data still looks mathematically valid to the ECC.
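If you want to see both behaviors in miniature, here is a toy sketch using the classic Hamming(7,4) code, which stores 4 data bits in 7 and can repair any single flipped bit. This is an assumption-laden illustration only: real drives use far stronger codes over whole sectors, and the function names here are made up for the example. Unlike a drive, this toy decoder also has no "uncorrectable error" state, so every double flip it sees gets "fixed" into the wrong answer.

```python
from itertools import combinations

def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0 means "looks fine"
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]]

def flip(codeword, positions):
    """Return a copy of codeword with the given 0-based positions inverted."""
    damaged = list(codeword)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged

data = [1, 0, 1, 1]
stored = encode(data)

# Every possible single bit flip is repaired transparently.
single_ok = all(decode(flip(stored, [i])) == data for i in range(7))

# Every possible pair of flips is "repaired" into wrong data without complaint.
double_wrong = all(decode(flip(stored, pair)) != data
                   for pair in combinations(range(7), 2))

print("all single flips corrected:", single_ok)
print("all double flips silently miscorrected:", double_wrong)
```

The single-flip case is the transparent rescue described above. The double-flip case is the thwarted one: the decoder lands on a different valid codeword and hands it back as if nothing happened.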

While it is a possible scenario, it is also very unlikely. A drive that has this many bit errors in close proximity is likely to be failing and the S.M.A.R.T. status should indicate a higher reallocated sectors count or even worse when this sort of failure is going on. If you’re monitoring your drive’s S.M.A.R.T. status (as you should be) and it starts deteriorating, replace the drive!

Flipping off your CRCs

Note that in most of these bit-flip scenarios, the drive transparently fixes everything and the computer never hears a peep about it. ZFS CRCs won’t change anything if the drive can recover from the error. If the drive can’t recover and sends back the dreaded uncorrectable error (UNC) for the requested sector(s), the drive’s error detection has already done the job that the ZFS CRCs are supposed to do; namely, the damage was detected and reported.

What about the very unlikely scenario where several bits flip in a specific way that thwarts the hard drive’s ECC? This is the only scenario where the hard drive would lose data silently, therefore it’s also the only bit rot scenario that ZFS CRCs can help with. ZFS with CRC checking will detect the damage despite the drive failing to do so and the damage can be handled by the OS appropriately…but what has this gained us? Unless you’re using specific kinds of RAID with ZFS or have an external backup you can restore from, it won’t save your data, it’ll just tell you that the data has been damaged and you’re out of luck.

Hardware failure will kill your data

If your drive’s on-board controller hardware, your data cable, your power supply, your chipset with your hard drive interface inside, your RAM’s physical slot connection, or any other piece of the hardware chain that goes from the physical platters to the CPU has some sort of problem, your data will be damaged. It should be noted that SATA drive interfaces use IEEE 802.3 CRCs so the transmission from the drive CPU to the host system’s drive controller is protected from transmission errors. Using ECC RAM only helps with errors in the RAM itself, but data can become corrupted while being shuffled around in other circuits and the damaged values stored in ECC RAM will be “correct” as far as the ECC RAM is concerned.

The magic CRCs I keep making fun of will help with these failures a little more because the hard drive’s ECC no longer protects the data once the data is outside of a CRC/ECC capable intermediate storage location. This is the only remotely likely scenario that I can think of which would make ZFS CRCs beneficial.

…but again: how likely is this sort of hardware failure to happen without the state of something else in the machine being trashed and crashing something? What are the chances of your chipset scrambling the data only while the other millions of transistors and capacitors on the die remain in a functional and valid working state? As far as I’m concerned, not very likely.

Data loss due to user error, software bugs, kernel crashes, or power supply issues usually won’t be caught by ZFS CRCs at all. Snapshots may help, but they depend on the damage being caught before the snapshot of the good data is removed. If you save something and come back six months later and find it’s damaged, your snapshots might just contain a few months with the damaged file and the good copy was lost a long time ago. ZFS might help you a little, but it’s still no magic bullet.

Nothing replaces backups

By now, you’re probably realizing something about the data CRC gimmick: it doesn’t hold much value for data integrity and it’s only useful for detecting damage, not correcting it and recovering good data. You should always back up any data that is important to you. You should always keep it on a separate physical medium that is ideally not attached to the computer on a regular basis.

Back up your data. I don’t care about your choice of filesystem or what magic software you write that will check your data for integrity. Do backups regularly and make sure the backups actually work.

In all of my systems, I use the far less exciting XFS on Linux with metadata CRCs (once they were added to XFS) on top of a software RAID-5 array. I also keep external backups of all systems updated on a weekly basis. I run S.M.A.R.T. long tests on all drives monthly (including the backups) and about once a year I will test my backups against my data with a tool like rsync that has a checksum-based matching option to see if something has “rotted” over time.
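For anyone who wants to do that yearly comparison without rsync, the idea can be sketched in a few lines of Python. This is not what rsync does internally, just a stand-in under my own assumptions: SHA-256 as the hash, and hypothetical names like compare_trees. It walks the live data, hashes each file and its counterpart in the backup, and reports anything missing or different.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(source: Path, backup: Path) -> dict:
    """Compare every file under source against the same path under backup."""
    report = {"missing": [], "mismatched": []}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        backup_file = backup / relative
        if not backup_file.is_file():
            report["missing"].append(str(relative))
        elif file_digest(src_file) != file_digest(backup_file):
            report["mismatched"].append(str(relative))
    return report
```

Calling compare_trees(Path("/data"), Path("/mnt/backup/data")) with your own paths (these two are placeholders) returns the two lists. A mismatch only tells you the copies differ, not which one is the good one, and a file you edited since the last backup will show up too, so check modification times before blaming rot.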

All of my data loss tends to come from poorly typed ‘rm’ commands. I have yet to encounter a failure mode that I could not bounce back from in the past 10 years. ZFS and btrfs are complex filesystems with a few good things going for them, but XFS is simple, stable, and all of the concerning data loss bugs were ironed out a long time ago. It scales well and it performs better all-around than any other filesystem I’ve ever tested. I see no reason to move to ZFS and I strongly question the benefit of catching a highly unlikely set of bit damage scenarios in exchange for the performance hit and increased management complexity that these advanced features will cost me…and if I’m going to turn those features off, why switch in the first place?

Bonus: RAID-5 is not dead, stop saying it is

A related category of blind zealot is the RAID zealot, often following in the footsteps of the ZFS zealot or even occupying the same meat-suit. They loudly scream about the benefits of RAID-6, RAID-10, and fancier RAID configurations. They scorn RAID-5 for having terrible rebuild times and hype up the fact that “if a second drive dies while rebuilding, you lose everything!” They point at 10TB hard drives and do back-of-the-napkin equations and tell you about how dangerous and stupid it is to use RAID-5 and how their system that gives you less space on more drives is so much better.

Stop it, fanboys. You’re dead wrong and you’re showing your ignorance of good basic system administration practices.

I will concede that your fundamental points are mostly correct. Yes, RAID-5 can potentially have a longer rebuild time than multi-stripe redundant formats like RAID-6. Yes, losing a second drive after one fails or during a rebuild will lose everything on the array. Yes, a 32TB RAID-5 with five 8TB drives will take a long time to rebuild (about 50 hours at 180 MB/sec.) No, this isn’t acceptable in an enterprise server environment. Yes, the infamous RAID-5 write hole (where a stripe and its parity aren’t both updated before a crash or power failure and the data is damaged as a result) is a problem, though a very rare one to encounter in the real world. How do I, the smug techno-weenie advocating for dead old stupid RAID-5, counter these obviously correct points?

  • Longer rebuild time? This is only true if you’re using the drives for something other than rebuilding while it’s rebuilding. What you really mean is that rebuilding slows down less when you interrupt it with other work if you’re using RAID levels with more redundancy. No RAID exists that doesn’t slow down when rebuilding. If you don’t use it much during the rebuild, it’ll go a lot faster. No surprise there!
  • Losing a second drive? This is possible but statistically very unlikely. However, let’s assume you ordered a bunch of bad Seagates from the same lot number and you really do have a second failure during rebuild. So what? You should be backing up the data to an external backup, in which case this failure does not matter. RAID-6 doesn’t mean you can skip the backups. Are you really not backing up your array? What’s wrong with you?
  • RAID-5 in the enterprise? Yeah, that’s pretty much dead because of the rebuild process slowdown being worse. An enterprise might have 28 drives in a RAID-10 because it’s faster in all respects. Most of us aren’t an enterprise and can’t afford 28 drives in the first place. It’s important to distinguish between the guy building a storage server for a rack in a huge datacenter and the guy building a home server for video editing work (which happens to be my most demanding use case).
  • The RAID-5 “write hole?” Use an uninterruptible power supply (UPS). You should be doing this on any machine with important data on it anyway! Assuming you don’t use a UPS, Linux as of kernel version 4.4 has added journaling features for RAID arrays in an effort to close the RAID-5 write hole problem.
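The rebuild estimate for the 32TB array mentioned above is plain arithmetic, and it is worth seeing which assumption drives the number. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a sustained 180 MB/sec with no other load; the function name is hypothetical. The roughly 50-hour figure corresponds to pushing the full 32TB through at that rate. If the rebuild is instead limited by writing one 8TB replacement drive at the same rate, the same arithmetic gives a much smaller number.

```python
def hours_to_transfer(terabytes: float, mb_per_sec: float) -> float:
    """Hours needed to move the given decimal terabytes at a sustained rate."""
    megabytes = terabytes * 1_000_000
    seconds = megabytes / mb_per_sec
    return seconds / 3600

# Moving the capacity of the whole 32TB array at 180 MB/sec.
whole_array_hours = hours_to_transfer(32, 180)

# Assumption: only one 8TB replacement drive has to be written at 180 MB/sec.
single_drive_hours = hours_to_transfer(8, 180)

print(f"whole array:  {whole_array_hours:.1f} hours")
print(f"single drive: {single_drive_hours:.1f} hours")
```

Either way it is a long, boring wait and not a catastrophe, and real rebuilds will be slower than this whenever the array is busy doing other work.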

A home or small business user is better off with RAID-5 if they’re also doing backups like everyone should anyway. With a 7200 RPM 3TB drive (the best $/GB ratio in 7200 RPM drives as of this writing) costing around $95 each shipped, I can only afford so many drives. I know that I need at least three for a RAID-5 and I need twice as many because I need to back that RAID-5 up, ideally to another machine with another identically sized RAID-5 inside. That’s a minimum of six drives for $570 to get two 6TB RAID-5 arrays, one main and one backup. I can buy a nice laptop or even build a great budget gaming desktop for that price, but for these storage servers I haven’t even bought the other components yet. To get 6TB in a RAID-6 or RAID-10 configuration, I’ll need four drives instead of three for each array, adding $190 to the initial storage drive costs. I’d rather spend that money on the other parts and in the rare instance that I must rebuild the array I can use the backup server to read from to reduce my rebuild time impact. I’m not worried about a few extra hours of rebuild.
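The drive math above is easy to check for yourself. Here is a small sketch that uses the figures from this post (3TB drives at $95 each, one main array plus one identical backup array) and the standard capacity rules for each RAID level. The function names are made up for the example.

```python
DRIVE_TB = 3        # capacity per drive, from the post
DRIVE_PRICE = 95    # dollars per drive shipped, from the post

def usable_tb(level: str, drives: int, drive_tb: int = DRIVE_TB) -> int:
    """Usable capacity for a few common RAID levels with equal-size drives."""
    if level == "raid5":
        return (drives - 1) * drive_tb    # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * drive_tb    # two drives' worth of parity
    if level == "raid10":
        return (drives // 2) * drive_tb   # every drive is mirrored
    raise ValueError(f"unsupported RAID level: {level}")

def cost_with_backup(drives_per_array: int, price: int = DRIVE_PRICE) -> int:
    """Drive cost for a main array plus an identically sized backup array."""
    return 2 * drives_per_array * price

raid5_cost = cost_with_backup(3)
raid6_cost = cost_with_backup(4)

print("RAID-5, 3 drives:", usable_tb("raid5", 3), "TB, both arrays cost $", raid5_cost)
print("RAID-6, 4 drives:", usable_tb("raid6", 4), "TB, both arrays cost $", raid6_cost)
print("RAID-10, 4 drives:", usable_tb("raid10", 4), "TB")
print("extra cost over RAID-5: $", raid6_cost - raid5_cost)
```

Plug in current drive prices and sizes and the gap changes, but the shape of the trade-off stays the same: the extra redundancy costs one more drive per array for the same usable space.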

Not everyone has thousands of dollars to allocate to their storage arrays or the same priorities. All system architecture decisions are trade-offs and some people are better served with RAID-5. I am happy to say, however, that if you’re so adamant that I shouldn’t use RAID-5 and should upgrade to your RAID levels, I will be happy to take your advice on one condition.

Buy me the drives with your own money and no strings attached. I will humbly and graciously accept your gift and thank you for your contribution to my technical evolution.

If you can add to the conversation, please feel free to comment. I want to hear your thoughts. Comments are moderated but I try to approve them quickly.

I have been seeing A LOT of people lately who have been caught in today’s most common computer scams.

I want to review them briefly and help you avoid making a mistake and giving control of your computer or bank account to a scammer. All of them are modern takes on the “snake oil” smoke-and-mirrors show from history designed to separate you from your money.

There are three ways that the latest wave of tech scams work:

  1. You get a random call from someone claiming to be from Microsoft or another large computer company, sometimes on all of your cell and home phones in a short time frame. They’re always sporting a fairly heavy foreign accent and phrase things strangely. They’ll tell you all kinds of stories about how terrible your computer is or how many viruses you’re leaking on the Internet. It’ll sound REALLY BAD. They’ll offer to help you fix it…for a price of course.
  2. The pop-up scary talking warning! Your browser loads an infected website or a malicious ad and gets kicked over to a HUGE SCARY WARNING that says your computer is infected and you need to call the number on the screen. If your speakers aren’t muted, it’ll also talk to you in a synthesized voice. If you call, you’ll get the same people as in (1) but this time they didn’t have to luck up and cold-call you, plus you’ll already be terrified so they can trick you into doing what they want.
  3. You call “tech support” for a large company like HP or Dell. You’re not really talking to an HP or Dell employee; you’re talking to an iYogi employee in India whose job is to sell you a support contract. I’m not sure if they’re the same people doing the other two, but it’s the same song and dance as the other two: you’ll get a nice show hyping up how horrible of a situation your computer is in and a hard sell on buying support from them.

In all of these situations, the person on the phone will want to use remote support tools such as TeamViewer or Citrix GoToAssist to get remote control of your computer. Once they have remote control, they are capable of doing ANYTHING THEY WANT to your computer, though they don’t usually seem to infect machines; it’s mainly a high-pressure sales pitch for $300 of computer snake oil.


For cold-call scammers in (1), hang up quickly. If they call again later, keep hanging up. The more they talk, the more likely it is that they’ll convince you to remote them in and pay up.

For the huge scary pop-up in (2), open Task Manager and kill your browser from there. If that’s not working out, just hold the power button on the computer for five seconds and it’ll shut off. Your computer IS NOT INFECTED. If it happens again after rebooting, try power-cycling your modem and router; these can get temporarily “infected” in a way that causes the computer to land on these scary sites quickly, but this “infection” doesn’t survive the power to the box being unplugged.

For the big corporate tech support calls in (3), it’s a bit more difficult because sometimes you’ll be talking to a legitimate support agent that isn’t going to try to scam you. The key things that tell you it’s going to be a scam are that they (A) want to get remote access to your computer without spending a lot of time trying to talk you through it first, (B) tell you that your computer has serious problems and want to help you fix them, or (C) mention money at any point in the process. IF ANY OF THESE THREE THINGS HAPPENS, try calling back or seek help from someone else that you trust. Make sure you’re calling the support phone number on the manufacturer’s official website as well!

Almost all of the computers I’ve checked in the past month that were targeted by these scams didn’t have any serious problems before or after the scammer got on, but many of my customers had to initiate chargebacks on their cards, change their bank accounts, or get their cards exchanged, which is frustrating and annoying.

If you’re in or near the Chatham County, Randolph County, Orange County, or Wake County areas of North Carolina and you’re concerned that your computer has been messed up by a scammer, you can get support from me at Tritech Computer Solutions in Siler City, including 100% free in-store diagnostics and repair quotes.

As a small business owner, I get repeatedly clobbered with hordes of attempts to sell me stuff I don’t need.  Sometimes it seems like I’m spending more time trying to run off people who want me to give THEM money than making money myself.  However, I am sympathetic to the plight of the salesperson, seeing how I have to wear that hat as the business owner from time to time myself, and it’s really a tough gig.  Rejection is difficult enough to handle infrequently, and I can only imagine how badly a day slam full of rejections can tax one’s mind.

Two recent incidents, however, make me wonder if some salespeople are asking for the rejection or even the abuse that they receive.  I’ll talk about the short one first; the second one is a bit of an epic saga, a grand adventure into the world of what us Internet-savvy folk call EPIC FAIL.

Someone from some sort of local directory called the shop, seeking my purchase of a listing in their directory.  Our word-of-mouth income is so good (and our experience with other advertising forms has been so poor) that we don’t really need to advertise much at all; our customers are well taken care of, and in return those customers take care of us.  Well, this here salesperson weren’t gunna have nunavit!  Despite my attempts to make it clear that we did not wish to advertise AT THIS TIME, the person didn’t seem to understand what I was saying.  Eventually I was forced to enter “blunt mode” and outright state that “I don’t want to do this.”

This genius had a clever response to my clear refusal.  What was it?  Oh, the suspense is probably killing you.

“So you’re saying that you’re not accepting new customers?”

If your face just slammed into the desk in disbelief, now you know how I felt at that moment.

If she was smart, she would have offered her contact information so that I could call her should I change my mind in the future (which HAS happened in the past).  Instead, she chose to insult my intelligence with one of the scummiest sales tactics in the book.  What kind of stupid business owner is going to fall for that kind of line?  I can’t imagine anyone who deserves to be in business at all tripping over this lame attempt at forcing a sale.

Just to prove that I understand what’s going on here, I’ll explain how this statement was theoretically supposed to work on me.  When faced with relatively strong refusal, a salesperson may be able to “save the sale” by changing the client’s mindset.  This is actually a very common sales tactic and is apparently extremely popular with multi-level marketing sales.  Note how MLM salespeople don’t approach you saying “wanna sell some products and make money?”  They instead ask a series of questions which you are generally certain to answer in the affirmative.  The theory here is that if you say an equivalent to “yes” three times, you’ll be more willing to agree to a sale, because you’ve been nudged into an agreeing mindset.  The trick with the question “so you don’t want new customers?” is to extract a denial of that question and a subsequent positive statement, i.e. “yes, I DO want new customers,” to then inch me back towards the affirmative.  Unfortunately, human beings aren’t robots and business owners know better than to fall for such paltry tricks.  My response was a sarcastic “No, I’m not accepting new customers.” *click* and that was the end of the conversation.  Hint: if you’re trying to sell me something, don’t insult my intelligence.

If you think that’s bad, though, you’d love getting a load of the next sales call I dealt with.  Someone who runs a local sports reporting website (and one that appears very hastily assembled, no less) wanted me to purchase advertising in the sidebars of the site.  The first suspicious part was that site content was rather minimal, and used a WordPress installation that seemed to have been partially broken by someone.  The fabulous claims of a good unique visitor count that this guy rolled off certainly didn’t seem to match the semi-broken nature of the site, and anyone can say that they have thousands of unique visits per week.  The second problem, though, was in the advertising format they were using: it was positively insane.  Ads are formatted sort of like vertical business cards and stacked on top of each other scaling all the way down the blog…on BOTH SIDES OF THE PAGE.  And they didn’t seem to stop, either: though ad placement in the two columns was totally random, there had to be at least 30-40 ads on every page.  It screams “we made this site to sell bogus worthless advertising!” and it looks unprofessional.

The salesman who called was where the real problems began, though.  In general, he did a good job of working around my rejections until I switched from vague business reasons to concrete observations about what he had said and the site which he sent me to examine.  These problems resulted in the quick termination of the call, and one very upset salesman.

Problem 1:  “Here’s what I’ll do.  I’m going to call some of the people that advertise on the site already, and I’ll ask them how it’s working for them.  Once I have that information, I’ll make a decision.”  His response?  “Now that doesn’t make one bit of sense.”  Immediately the alarm bells go off in my head: what I propose makes ABSOLUTELY POSITIVELY PERFECT SENSE.  Before buying an ad and blowing all that money, why on earth wouldn’t I try to find some sort of metric for determining how well it works, especially if it’s advertising I already don’t feel that I need anyway?  He tried to convince me that the success of a chiropractor’s ad or a home renovator’s ad might not be the same as my own, and to some minor extent that may have been a correct statement, but if I called ALL of the advertisers and MOST of them found it to be a waste of dollars, doesn’t that speak volumes about the performance of the advertising in general?  He tried to offer me some “success stories” to which I replied that success stories from the mouth of the salesperson don’t mean anything because they can be easily fabricated.  It wasn’t taking much for him to get pretty annoyed with me, but then…

Problem 2:  “You said that you don’t have any computer repair people on the site and that you want one, but I see an ad for ‘Randolph Telephone Company AtomicTechs’ on the site.”  The guy clearly didn’t know how to respond because I caught him in a lie.  He tried to use the silly generic slogan from that ad to convince me that “that’s not the same as what you do!”  Once again, a salesman thinks I’m a complete idiot.  That was the end of the game.

In short, if you’re selling something, don’t be stupid about it.  Understand who you’re selling to before you call, or at least figure it out.  And whatever you do, don’t lie to or belittle the potential customer.  Would you buy anything from someone who belittled you or told you a load of bull about the product?

I didn’t think so.
