

I have a cleanup program that I’ve written as a Bash shell script. Over the years, it has morphed from a thing that just deleted a few fixed directories if they existed at all (mostly temporary file directories found on Windows) to a very flexible cleanup tool that can take a set of rules and rewrite and modify them to apply to multiple versions of Windows, along with safeguards that check the rules and auto-rewritten rules to prevent the equivalent of an “rm -rf /*” from happening. It’s incredibly useful for me; when I back up a customer’s PC data, I run the cleaner script first to delete many gigabytes of unnecessary junk and speed up the backup and restore process significantly.

Unfortunately, having the internal rewrite and safety check rules has the side effect of massively slowing the process. I’ve been tolerating the slowness for a long time, but as the rule set increased in size over the past few years the script has taken longer and longer to complete, so I finally decided to find out what was really going on and fix this speed problem.

Profiling shell scripts isn’t quite as easy as profiling C programs; with C, you can just use a tool like Valgrind to find out where all the effort is going, but shell scripts depend on the speed of the shell, the kernel, and the plethora of programs executed by the script, so it’s harder to follow what goes on and find the time sinks. However, I observed that a lot of time was spent in the steps between deleting items; since each rewrite and safety check is done on-the-fly as deletion rules are presented for processing, those were likely candidates. The first thing I wanted to know was how many times the script called an external program to do work; you can easily kill a shell script’s performance with unnecessary external program executions. To gather this info, I used the strace tool:

strace -f -o strace.txt tt_cleaner

This produced a file called “strace.txt” which contains every single system call issued by both the cleaner script and any forked programs. I then looked for the execve() system call and gathered the counts of the programs executed, excluding “execve resumed” events which aren’t actual execve() calls:

grep execve strace.txt | sed 's/.*execve/execve/' | cut -d\" -f2 | grep -v resumed | sort | uniq -c | sort -g

The resulting output consisted of numbers below 100 until the last two lines, and that’s when I realized where the bottleneck might be:

4157 /bin/sed
11227 /usr/bin/grep

That’s a LOT of calls to sed, but the number of calls to grep was almost three times bigger, so that’s where I started searching for ways to improve. As I’ve said, the rewrite code takes each deletion rule and rewrites it for other possible interpretations: “Username\Application Data” on Windows XP moved to “Username\AppData\Roaming” on Vista and up, “All Users\Application Data” moved to “C:\ProgramData” in the same, and there is a potential mirror of every single rule in “Username\AppData\Local\VirtualStore”. The rewrite code handles the expansion of the deletion rules to cover every single one of these possible cases. The outer loop of the rewrite engine grabs each rewrite rule in order while the inner loop does the actual rewriting to the current rule and all prior rewrites to ensure no possibilities are missed (VirtualStore is largely to blame for this double-loop architecture). This means that anything done within the inner loop is executed a huge number of times, and the very first command in the inner loop looked like this:

if echo "${RWNAMES[$RWNCNT]}" | grep -qi "${REWRITE0[$RWCNT]}"

This checks to see if the rewrite rule applies to the cleaner rule before doing the rewriting work. It calls grep once for every single iteration of the inner loop. I replaced this line with the following:

if [[ "${RWNAMES[$RWNCNT]}" =~ .*${REWRITE0[$RWCNT]}.* ]]

I also had to add “shopt -s nocasematch” at the top of the shell script to make the comparison case-insensitive. The result was a 6x speed increase: testing on an existing data backup which had already been cleaned (no “work” to do) showed a consistent time reduction from 131 seconds to 22 seconds! The grep count dropped massively, too:

97 /usr/bin/grep

Bash can do wildcard and regular expression matching of strings (the =~ comparison operator is a regex match), so any shell script that uses the “echo-grep” combination in a loop stands to benefit greatly from these Bash features. Unfortunately, they are not POSIX shell features and using them will lead to non-portable scripts, but if you will never run the script on other shells and the performance boost is significant, why not use them?
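As a minimal, self-contained sketch of the swap (the rule and pattern strings below are invented for illustration, not taken from the real cleaner rule set):

```shell
#!/bin/bash
# Make [[ ]] comparisons (both == and =~) case-insensitive
shopt -s nocasematch

RULE="Username\\Application Data"   # hypothetical cleaner rule
PATTERN="application data"          # hypothetical rewrite pattern

# Old style: forks an echo and a grep on every inner-loop iteration
if echo "$RULE" | grep -qi "$PATTERN"; then
    echo "grep: match"
fi

# New style: pure Bash regex match, no external process
if [[ "$RULE" =~ $PATTERN ]]; then
    echo "bash: match"
fi
```

Note that the regex variable is left unquoted on the right-hand side of =~; quoting it would force a literal string match in newer Bash versions.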

The bigger lesson here is that you should take some time to learn about the features offered by your shell if you’re writing advanced shell scripts.

Update: After writing this article, I set forth to eliminate the thousands of calls to sed. I was able to change an “echo-sed” combination to a couple of Bash substring substitutions. Try it out:

FOO=${VARIABLE/string_to_replace/replacement}

It accepts $VARIABLES where the strings go, so it’s quite powerful. Best of all, the total runtime dropped to 10.8 seconds for a total speed boost of over 11x!
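A quick sketch of the substitution forms (the variable names and path string here are made up for the example):

```shell
#!/bin/bash
VARIABLE="C:\\Users\\Public\\Temp"   # hypothetical path string

# Single-slash form: replaces the FIRST match only
FIND="Public"
REPL="Administrator"
FOO=${VARIABLE/$FIND/$REPL}
echo "$FOO"    # C:\Users\Administrator\Temp

# Double-slash form: replaces ALL matches (here, \ -> /)
BAR=${VARIABLE//\\/\/}
echo "$BAR"    # C:/Users/Public/Temp
```

The pattern side even accepts shell wildcards, so a surprising amount of sed work can move into pure Bash this way.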

If you take a look at the page for the Tritech Service System distribution of Linux, you’ll notice a few new things. The most obvious is that I’m redoing the c02ware site design; there’s now a basic logo, proper site navigation, a mobile-friendly layout, and a cleaner-looking color scheme. Consistency across pages has been greatly improved, and lots of unnecessary old junk and confusing content has been completely tossed out.

This change is being driven by my push to release the Tritech Service System with all of our proprietary bits included as a commercial product, with regular updates, bug fixes, and support. I will continue to release TSS without any proprietary bits as a public and completely free system, but for anyone in the PC repair business, the paid-for stuff can easily pay for itself in workflow acceleration and productivity boosts within a month, and we want to be able to bring that advantage to other PC service shops and I.T. departments. If you are interested in being notified when the Tritech Service System becomes available for purchase, send me an email and I will keep you in the loop.

A major goal in TSS is keeping the system as small as possible without cutting out basic features. In the effort to move towards this goal, I have released MiniTSS 2.9.0! The download is a paltry eight megabytes in size and includes the following software packages:

  • busybox 1.21.1
  • chntpw 110511
  • cifsmount (mount.cifs helper)
  • dd-rescue 1.28
  • dropbear 0.52
  • fuse 2.8.3
  • glibc 2.10.1
  • libblkid 1.1.0
  • libuuid 1.3.0
  • ncurses 5.6
  • ntfs-3g-ntfsprogs 2011.4.12
  • pv 1.2.0
  • rsync 3.0.7
  • socat 1.7.2.1
  • sysfsutils 2.1.0
  • tar 1.22
  • tss-base-fs-mini 101
  • tss-bootstrap
  • udev 163
  • xz-utils 5.0.4
  • zlib 1.2.3

MiniTSS is not just a live CD/USB system. You can download the “source” archive, unpack it on your Linux system, add or remove packages to initramfs as you see fit, and rebuild your own custom version with whatever software you actually need. The system only provides basic tools and a command line interface, and therefore is aimed at intermediate-level Linux users.

This question was posed on a forum:

I have a customer who has a computer, 2 SATA disk (striped in RAID config. Windows won’t load. Diag reports bad hard drive. When I disconnect one, it kills the stripe and the computer appears to not have a hard drive at all. Seems kind of silly to have it set this way as it increases the risk of failure. Other than putting each hard drive in another computer, I’d like to determine which of the disk are bad.

Also, not quite sure how to attack data recovery as they are a stripe set and plugging in to a SATA to USB does not appear to be a valid method. If I put a third hard drive in as a boot drive, do i have to reconfig the stripe set and if i do, will it kill the data.

I have reassembled two RAID-0 “striped” drives onto a single larger drive by hand before. It’s actually a programmatically simple operation, but it requires a lot of low-level knowledge and some guesswork. The specific pair I had appeared to store the metadata somewhere other than the start of the disk, and I was able to discover through a hex editor that the array used a 64KB stripe size. I also spotted which drive was first in the array, because only the drive containing the first stripe has a partition table.

At a Linux command prompt, with the two RAID-0 disks connected (disks that were NOT being detected properly by the Linux LVM2 “FakeRAID” algorithms, by the way) along with a destination disk of twice their size, I wrote a very simple script that looked something like this (sda/sdb as the RAID-0 pair, sdc as the destination disk; this might work under ash or similar shells as well):

---- cut here ----

#!/bin/bash

# X=sda position, Y=sdb position, Z=sdc position, STRIPE=stripe size
X=0; Y=0; Z=0; STRIPE=65536

# Retrieve the size of a RAID-0 disk so we can terminate at the end;
# /proc/partitions reports 1K blocks, so convert to bytes first
SIZE=$(grep 'sda$' /proc/partitions | awk '{print $3}')
SIZE=$(( SIZE * 1024 ))
# Divide size by stripe, rounding up to include any tail blocks
SIZE=$(( (SIZE + STRIPE - 1) / STRIPE ))
# X counts source stripes; Z advances twice per pass on the destination
while [ "$X" -lt "$SIZE" ]
do
dd if=/dev/sda of=/dev/sdc seek=$Z skip=$X bs=$STRIPE count=1
Z=$(( Z + 1 ))
dd if=/dev/sdb of=/dev/sdc seek=$Z skip=$Y bs=$STRIPE count=1
Z=$(( Z + 1 ))
X=$(( X + 1 ))
Y=$(( Y + 1 ))
done

---- cut here ----

Note that all it does is load 64K at a time from each disk in alternation and save it to the third disk in sequential order. This is untested, requires modification to suit your scenario, and is only written here as an example. It does not abort if a ‘dd’ command fails, so it will keep going during data recovery; you will lose any stripe that contains a bad block, though. The algorithm could be improved to use dd_rescue (if you have it) or to copy smaller units within a stripe so that a bad block only costs a partial stripe.
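As a hedged sketch of that last idea (the file arguments are placeholders for the real /dev/sdX devices), copying each stripe in smaller sub-blocks with dd’s conv=noerror,sync means a read error only zero-fills one 4K sub-block instead of losing the whole 64K stripe:

```shell
#!/bin/bash
STRIPE=65536
SUB=4096
PER=$(( STRIPE / SUB ))    # sub-blocks per stripe

# copy_stripe <src> <src_stripe_no> <dst> <dst_stripe_no>
# conv=noerror keeps dd going past read errors; conv=sync pads the
# failed sub-block with zeros so the output stays correctly aligned.
copy_stripe() {
    dd if="$1" of="$3" bs=$SUB count=$PER conv=noerror,sync \
       skip=$(( $2 * PER )) seek=$(( $4 * PER )) 2>/dev/null
}
```

Dropping this into the loop above means each call still moves one full stripe, just in 4K units, so a single bad sector costs 4K of data rather than 64K.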
