PlutoSpin- Putting a New Spin on Programming
TARFIXER

tar file problems come in many forms, so it helps to understand and diagnose the typical failure modes yourself:
1) tar file has bit/byte errors
2) tar'ing up non-static files and/or non-locking filesystems
3) tar'ing up filesystem with bad sectors or through a poor network connection

Diagnosis and Possible Approaches to Fix
1) tar file has bit/byte errors
If the tar file is compressed, these errors typically show up as a decompression error. The challenge is two-fold. First, some utility or manual binary editing is needed to obtain a file that can be decompressed. Then you will typically run into the other types of problems described below, which will confuse tar. If the compression was done with bzip2, the recovery path is much easier than with gzip, because bzip2 comes with a bzip2recover utility. It splits the compressed file into its individual blocks and writes each one to a file named like "rec00001file.bz2", "rec00002file.bz2"... A good approach is then to decompress the intact blocks and concatenate them into pieces that are as large as possible. Treat each large piece as a corrupted tar file and run tarfixer on it.
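The decompress-and-concatenate step can be sketched in Python. This is only an illustration and not part of tarfixer. The function name decompress_runs and the glob pattern are made up for the example, and it assumes the "rec*" files from bzip2recover sort into their original order by name.

```python
import bz2
import glob

def decompress_runs(pattern):
    """Decompress recovered blocks in name order and join neighbours.

    Returns a list of byte strings. Each one is a run of consecutive
    blocks that decompressed cleanly; a damaged block ends the run.
    """
    runs = []
    current = bytearray()
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, "rb") as f:
                current += bz2.decompress(f.read())
        except (OSError, ValueError, EOFError):
            # Damaged block: close the current run and start a new one.
            if current:
                runs.append(bytes(current))
                current = bytearray()
    if current:
        runs.append(bytes(current))
    return runs
```

Each returned run can then be written to its own file and handed to tarfixer as a corrupted, uncompressed tar file.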

Bit/byte errors in uncompressed tar files cause two types of errors for tar. If the error is in a "tar header" region (the 512 bytes that usually precede each file's data), tar will usually get badly confused and skip files or stop extracting. If the error is in the data region, tar will not know about it and will simply extract the file with the error in it. There is no CRC or checksum for the data region.
The hardest part is overcoming "tar header" corruption. The problem is that tar uses the file size expectations it finds in the header to know how to extract the data following it into a file and where to look for the next file. This is why tar typically goes berserk when confronted with this problem.
The tarfixer program searches the file byte by byte looking for tar headers and avoids writing inconsistent headers into its fixed file.
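A byte-by-byte header search of this kind can be sketched in Python. This is a guess at the general technique, not tarfixer's actual code: it looks for the POSIX "ustar" magic, which sits 257 bytes into a header, and accepts a candidate only if the header checksum (octal text at bytes 148-155, computed with that field counted as eight spaces) matches. The names find_headers and header_checksum_ok are made up for the example.

```python
BLOCK = 512  # size of one tar header block

def header_checksum_ok(block):
    """True if block is 512 bytes, has the ustar magic and a valid checksum."""
    if len(block) != BLOCK or block[257:262] != b"ustar":
        return False
    field = block[148:156].strip(b" \0")
    if not field:
        return False
    try:
        stored = int(field, 8)  # checksum is stored as octal text
    except ValueError:
        return False
    # The checksum is the sum of all header bytes, with the checksum
    # field itself counted as eight ASCII spaces.
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    return stored == computed

def find_headers(data):
    """Yield the offset of every plausible header, wherever it is aligned."""
    pos = data.find(b"ustar", 257)
    while pos != -1:
        start = pos - 257
        if header_checksum_ok(data[start:start + BLOCK]):
            yield start
        pos = data.find(b"ustar", pos + 1)
```

Because the search does not assume any alignment, it still finds headers after a few bytes have been inserted or dropped earlier in the file. A header whose own bytes are damaged fails the checksum test and is skipped.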

2) tar'ing up non-static files and/or non-locking filesystems
tar'ing files on non-locking filesystems (i.e., a read and a write can happen at the same time on the same data) can cause tar to insert the wrong file size into its headers and then get completely confused and either skip files or stop extracting. Essentially, tar puts the file size it originally sees into the header but then outputs a different amount of data after it, because the file changed between the two operations. The tarfixer program will detect the difference and change the header to reflect the amount of data following it.
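The size correction can be sketched in Python as follows. This illustrates the idea only and is not tarfixer's implementation. It assumes the distance from the end of a header to the start of the next header (called gap here) is already known, and that the whole gap is taken as the new file size, since the true length inside the last padded block cannot be recovered from the archive alone. All function names are made up for the example.

```python
BLOCK = 512  # tar stores data in 512-byte blocks

def declared_size(header):
    """File size from the header: octal text at bytes 124-135."""
    return int(header[124:136].strip(b" \0") or b"0", 8)

def padded(size):
    """Space a file of this size occupies, rounded up to whole blocks."""
    return -(-size // BLOCK) * BLOCK

def patch_size(header, new_size):
    """Return a copy of the header with a new size and a matching checksum."""
    hdr = bytearray(header)
    hdr[124:136] = b"%011o\0" % new_size   # 11 octal digits plus NUL
    hdr[148:156] = b" " * 8                # checksum field counts as spaces
    hdr[148:156] = b"%06o\0 " % sum(hdr)   # 6 octal digits, NUL, space
    return bytes(hdr)

def fix_header(header, gap):
    """Leave a consistent header alone; otherwise make its size match gap."""
    if padded(declared_size(header)) == gap:
        return header
    return patch_size(header, gap)
```

A header that claims 5 bytes but is followed by 1024 bytes of data before the next header would be rewritten to claim 1024 bytes, so tar finds the next header where it expects it.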

3) tar'ing up filesystem with bad sectors or through a poor network connection
This usually results in issues similar to (1) and (2). One added complication
is that some corrupted filesystems will not maintain data offsets correctly. tar
depends on its headers being on 512-byte boundaries. So if a byte or two is
skipped, it will cause the typical skipping or stopping of extraction. Since
tarfixer scans the file byte by byte, it will fix the headers to be on 512-byte
boundaries.
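Putting headers back on block boundaries amounts to re-emitting each member with its data padded to a multiple of 512 bytes. A minimal Python sketch, assuming each header is already a consistent 512-byte block whose size field matches its payload (the function name realign is made up for the example and this is not tarfixer's code):

```python
BLOCK = 512  # tar block size in bytes

def realign(members):
    """Join (header, payload) pairs so every header starts on a block boundary."""
    out = bytearray()
    for header, payload in members:
        out += header                            # one 512-byte header block
        out += payload
        out += b"\0" * (-len(payload) % BLOCK)   # pad data to the next boundary
    out += b"\0" * (2 * BLOCK)                   # end of archive: two zero blocks
    return bytes(out)
```

With the padding restored, every header offset in the output is a multiple of 512, which is what tar expects when it steps from one member to the next.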

tarfixer
Requirements:
POSIX 1003.1-1990 tar'ed file. GNU extensions are not directly supported, but
if they were not used/needed during the tar process it will probably still work. Older (pre-POSIX) tar'ed files will not work, because there is no "magic" value in their headers to trigger off of.
bzip2 and gzip files are supported as long as the problem is corruption of the
tar process and not a problem with the decompression. Lempel-Ziv compress (*.Z) files are not directly supported.

Usage:
tarfixer [-j, --bzip2] [-z, --gzip] [-n, --parse] [-o fixed_output_file] filename

-j, --bzip2:  turn on bzip2 compression for fixed_output_file
-z, --gzip:   turn on gzip compression for fixed_output_file
-n, --parse:  just parse filename, but do not fix or write anything
fixed_output_file: fixed version of filename (cannot be the same as filename)
     defaults to fixed_filename.tar{.gz,.bz2}

Defaults:
tarfixer will auto-detect the compression type of filename. If filename was bzip2 compressed, fixed_output_file will also be bzip2 compressed.
The compression options only affect the fixed_output_file. So if filename was not compressed but you want the fixed_output_file to be bzip2 compressed, pass the -j option to tarfixer. This is useful if disk space is limited.

Downloads:
Static Binary For Linux (tested on FC6):
tarfixer.tar.bz2

Source Code (GPL):
tarfixer_src.tar.bz2
Copyright © 2005 E Berta. All rights reserved.