How to Archive, Compress and Extract Files in Linux

A handy guide to archiving, compressing and extracting files on the Linux command line, whether to create backups or to archive data in various formats.

By Tim Trott • Linux Tips and Tutorials • July 24, 2016
943 words, estimated reading time 3 minutes.
Introduction to Linux

This article is part of a series of articles. The articles in the series are:

  1. How to Download and Install Linux Step by Step For Beginners
  2. Essential Guide to Working with Files in Linux
  3. Understanding Linux File Permissions and Permission Calculator
  4. How to Archive, Compress and Extract Files in Linux
  5. Linux Piping and Redirection Explained
  6. Hardlinks and Softlinks in Linux Explained With Examples
  7. How to Create and Use Bash Scripts in Linux
  8. Data Recovery in Linux - How To Recover Your Data after Drive Failures
  9. Apache Web Server Administration Cheat Sheet for Linux
  10. Essential MariaDB and MySQL Administration Tips on Linux
  11. How to Switch from Windows to Linux - A Complete Guide

It's often useful to be able to archive multiple files into a single file for backup or organisation reasons. These archives are often compressed to save disk space. Read on and learn how to archive, compress and extract files in Linux.

The TAR command, a fundamental and robust archiving tool in Linux, takes its name from Tape ARchive: it was originally used to write a stream of files to a sequential tape device. Today it is more commonly used to write the archive to an ordinary file in the file system, which makes it a versatile tool for archiving in Linux.

Let's create an archive of my current home directory and back up all the data.

We can use the DU (Disk Usage) command to see the current size of the directory, and we will compare this to the generated archive size.

du -sh .

Executing this command in my home directory shows 288K used.

Creating Archives with TAR

Let's go ahead and create an archive of these files.

tar -cvf /tmp/$USER.tar $HOME

This command has a bunch of flags and parameters; let's look at each of them and see what they do.

  • -c create archive
  • -v verbose (see what's happening)
  • -f specify the archive file name

The parameters are broken down as follows. $USER is an environment variable that contains the current user's name, and $HOME is an environment variable that contains the path to the current user's home directory. For me, this command creates an archive called tim.tar in /tmp containing my home directory.

We can run this command and see what happens. Remember when we saw the size of the current home directory? We can do the same with the newly created archive.

du -h /tmp/tim.tar
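If you would rather not archive your real home directory while experimenting, the following sketch runs the same steps on a throwaway directory. The scratch location from mktemp and the sample files are assumptions for illustration; -C is a standard tar option that changes into a directory first, so the archive stores relative paths such as home/notes.txt.

```shell
#!/usr/bin/env bash
# Sketch: create a tar archive of a scratch "home" directory and compare sizes.
# Assumption: the directory and files below are made up for illustration.
set -euo pipefail

work=$(mktemp -d)                 # temporary working area
trap 'rm -rf "$work"' EXIT        # clean up when the script exits

mkdir -p "$work/home/docs"
echo "hello" > "$work/home/notes.txt"
echo "quarterly report" > "$work/home/docs/report.txt"

du -sh "$work/home"               # disk usage of the directory

# -c create, -v verbose, -f archive file name; -C switches to $work first
tar -cvf "$work/backup.tar" -C "$work" home

du -h "$work/backup.tar"          # disk usage of the archive
```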

When we examine the size of the archive, we notice that it's a bit smaller than the original home directory. This isn't due to compression but to the file system block size. In simple terms, the block size is the smallest amount of space a file can occupy on disk, typically 4 KB, so even a file containing only a few characters still takes up 4 KB. A 5 KB file fills one 4 KB block and part of a second, so 8 KB is allocated and most of the second block is wasted space. Inside a tar archive, each file is stored as a 512-byte header followed by its data rounded up to the next 512 bytes, so much less space is lost per file.
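To make the comparison concrete, here is the 5 KB example worked through in shell arithmetic. It is a simplified model that assumes 4096-byte file system blocks and ignores the padding tar adds at the end of an archive; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Worked arithmetic for a single 5 KB file (simplified model).
# Assumptions: 4096-byte file system blocks; tar's end-of-archive padding is ignored.
file_size=5120    # 5 KB of data
fs_block=4096     # file system block size
tar_block=512     # tar stores headers and data in 512-byte blocks

# On disk: round the file size up to a whole number of file system blocks
disk_bytes=$(( (file_size + fs_block - 1) / fs_block * fs_block ))

# In the archive: one 512-byte header plus the data rounded up to 512 bytes
tar_bytes=$(( tar_block + (file_size + tar_block - 1) / tar_block * tar_block ))

echo "on disk: $disk_bytes bytes, in archive: $tar_bytes bytes"
# prints: on disk: 8192 bytes, in archive: 5632 bytes
```

The saving grows for very small files: a 10-byte file still occupies a full 4096-byte block on disk, but only 1024 bytes (a header plus one data block) inside the archive.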

Testing TAR Archives

Now, let's check our archive. While TAR is a reliable tool for backing up your data, it's worth confirming that it can read the archive back before you rely on it. The -t flag lists the contents of an archive without extracting anything; if tar cannot read the archive, it prints an error and exits with a non-zero status.

tar -tf /tmp/tim.tar
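A sketch of how this check might be scripted, for example in a backup job. It assumes a scratch archive built from two made-up files; in practice you would point it at /tmp/tim.tar. Listing only proves that tar can read the archive's structure; it does not compare the contents against the original files.

```shell
#!/usr/bin/env bash
# Sketch: list an archive's contents and use tar's exit status as a basic check.
# Assumption: a small scratch archive stands in for /tmp/tim.tar.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data"
echo "one" > "$work/data/a.txt"
echo "two" > "$work/data/b.txt"
tar -cf "$work/data.tar" -C "$work" data

# -t lists members without extracting; tar exits non-zero if it cannot read the archive
if members=$(tar -tf "$work/data.tar"); then
  echo "archive is readable and contains:"
  echo "$members"
else
  echo "tar could not read the archive" >&2
  exit 1
fi
```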

Expanding Archives

To expand the archive into the current directory, use the -x (extract) flag, again with the verbose and file flags.

tar -xvf /tmp/tim.tar

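This will unpack all the files into the current directory. Because tar strips the leading / from member names when it creates an archive, the files end up under home/tim/ inside the current directory rather than overwriting your real home directory.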
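Extracting into the current directory is not always what you want. This sketch uses tar's -C option to extract into a separate directory and then compares the restored file with the original; the paths are made-up scratch locations.

```shell
#!/usr/bin/env bash
# Sketch: extract an archive into a chosen directory and confirm the result.
# Assumption: scratch files and directories stand in for a real backup.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data" "$work/restore"
echo "hello" > "$work/data/a.txt"
tar -cf "$work/data.tar" -C "$work" data

# -x extract, -v verbose, -f archive file; -C extracts into $work/restore
# instead of the current directory
tar -xvf "$work/data.tar" -C "$work/restore"

# cmp exits 0 only if the two files are byte-for-byte identical
cmp "$work/data/a.txt" "$work/restore/data/a.txt" && echo "restored copy matches"
```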

Compressing Archives

TAR files can also be compressed using the gzip and bzip2 commands to make backups smaller. This is useful when transferring them over a network or fitting more data onto backup devices.

Using gzip to Compress the Archive

gzip tim.tar

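Well, that was easy. This has now created an archive called tim.tar.gz. NOTE: gzip replaces the original archive with the compressed one. The compressed archive is usually much smaller than the original, although how much depends on the data: text compresses well, while files that are already compressed, such as JPEGs or videos, barely shrink.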
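This sketch compresses a scratch archive and prints the sizes before and after. It assumes a generated, highly repetitive text file as sample data, so the saving it shows is better than you would see with typical mixed files.

```shell
#!/usr/bin/env bash
# Sketch: compress a tar archive with gzip and compare the sizes.
# Assumption: a generated, highly repetitive text file stands in for real data.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data"
for i in $(seq 1 2000); do
  echo "line $i of some repetitive sample text"
done > "$work/data/log.txt"

tar -cf "$work/data.tar" -C "$work" data
tar_size=$(wc -c < "$work/data.tar")
echo "tar:    $tar_size bytes"

gzip "$work/data.tar"             # replaces data.tar with data.tar.gz
echo "tar.gz: $(wc -c < "$work/data.tar.gz") bytes"
ls "$work"                        # shows data and data.tar.gz; data.tar is gone
```

If you want to keep the uncompressed archive as well, gzip 1.6 and later accept -k (--keep).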

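Using bzip2 to Compress the Archive

bzip2 is an alternative to gzip that usually compresses more tightly at the cost of speed. The two commands are used in the same way, but their file formats are different, so a .bz2 file must be decompressed with bunzip2 (or bzip2 -d).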

Creating a bzip2 archive is as easy as running this command, which replaces tim.tar with tim.tar.bz2:

bzip2 tim.tar

Uncompressing Archives

Uncompressing archives, often called unzipping, is the reverse process. It restores the original .tar file in the same directory as the compressed file.

To uncompress gzip archives

This is the opposite of the gzip command. It recreates the original archive and removes the .gz file.

gunzip tim.tar.gz

To uncompress bzip archives

bunzip2 tim.tar.bz2

Again, this will recreate the original archive and remove the .bz2 file.
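To see that gzip and gunzip are exact opposites, this sketch compresses an archive, decompresses it again and compares the result with a saved copy. The file names are hypothetical; bzip2 and bunzip2 behave the same way with .bz2 files.

```shell
#!/usr/bin/env bash
# Sketch: gzip followed by gunzip gives back the identical archive.
# Assumption: a scratch archive and a saved copy are used for the comparison.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data"
echo "hello" > "$work/data/a.txt"
tar -cf "$work/data.tar" -C "$work" data
cp "$work/data.tar" "$work/original.tar"   # keep a copy to compare against

gzip "$work/data.tar"         # data.tar -> data.tar.gz
gunzip "$work/data.tar.gz"    # data.tar.gz -> data.tar; the .gz file is removed

cmp "$work/data.tar" "$work/original.tar" && echo "round trip is lossless"
```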

Streamlining Compression

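It's a pain having to issue two commands to archive and compress files. Luckily, tar can run the compression step itself: the -z flag filters the archive through gzip and -j filters it through bzip2. The first two commands below create compressed archives of the home directory; the last two extract them.

tar -cvzf tim.tar.gz $HOME
tar -cvjf tim.tar.bz2 $HOME

tar -xvzf tim.tar.gz
tar -xvjf tim.tar.bz2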
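The following sketch runs the one-step commands on a scratch directory and also shows the equivalent explicit pipe, where -f - tells tar to write the archive to standard output. It assumes GNU tar or a compatible implementation, and it only tries -j if bzip2 is installed.

```shell
#!/usr/bin/env bash
# Sketch: archive and compress in one step, then the same thing with a pipe.
# Assumption: scratch directories stand in for a real home directory.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data" "$work/out"
echo "hello" > "$work/data/a.txt"

# -z filters the archive through gzip when creating and when extracting
tar -czf "$work/data.tar.gz" -C "$work" data
tar -xzf "$work/data.tar.gz" -C "$work/out"

# Equivalent pipeline: "-f -" sends the archive to standard output for gzip
tar -cf - -C "$work" data | gzip > "$work/piped.tar.gz"
gzip -t "$work/piped.tar.gz" && echo "piped archive is valid gzip"

# -j does the same with bzip2, if it is installed
if command -v bzip2 > /dev/null; then
  tar -cjf "$work/data.tar.bz2" -C "$work" data
fi
```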

Using CPIO for Archiving

CPIO (CoPy Input Output) is another general file archiver utility. Like TAR, it does not compress its output by default, but the archive can be piped through gzip or bzip2 to compress it.

CPIO can read the directory and path names of the files to archive from the STDIN pipe, which means that you can use commands such as find to specify what is included in the archive.

find . -name '*.pdf' | cpio -o > /tmp/pdf.cpio

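This runs find to search the current directory tree for PDF files and passes their names to cpio, which archives them. The -o flag selects copy-out mode, which writes the archive to standard output; the > redirection saves it to /tmp/pdf.cpio.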

Expanding the archive is just as easy:

cpio -id < /tmp/pdf.cpio

In this case, -i (copy-in mode) reads the archive from standard input, and -d creates directories if they don't already exist.
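This sketch runs the same copy-out and copy-in steps on scratch directories, so you can see that only the matched files are archived and that -d recreates the docs directory. The sample file names are hypothetical, and the script skips itself if cpio is not installed.

```shell
#!/usr/bin/env bash
# Sketch: archive only the *.pdf files with find + cpio, then extract them elsewhere.
# Assumption: made-up sample files; the "PDF" is plain text with a .pdf name.
set -euo pipefail

if ! command -v cpio > /dev/null; then
  echo "cpio is not installed; skipping" >&2
else
  work=$(mktemp -d)
  trap 'rm -rf "$work"' EXIT
  mkdir -p "$work/src/docs" "$work/out"
  echo "not really a PDF" > "$work/src/docs/manual.pdf"
  echo "not archived" > "$work/src/docs/notes.txt"

  # Copy-out: find supplies file names on standard input, cpio writes the archive
  (cd "$work/src" && find . -name '*.pdf' | cpio -o > "$work/pdf.cpio")

  # Copy-in: -i reads the archive, -d creates missing directories such as docs/
  (cd "$work/out" && cpio -id < "$work/pdf.cpio")
  ls -R "$work/out"
fi
```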

Imaging with DD

The dd command (sometimes nicknamed "disk duplicator") is a tool for archiving and backing up complete partitions or entire disks by copying them block by block. This is called imaging or cloning a hard drive. Disk images can be used to back up drives, create snapshots or create ISO images of CDs and DVDs. The images are exact, byte-for-byte copies of the source, so a partition image or an ISO image can be mounted like any other device.

Create an ISO image from a CD

dd if=/dev/sr0 of=cdimage.iso

if= specifies the input source (here the CD drive) and of= specifies the output file; note that dd options are written without a leading dash. Depending on your distro, you may need to change sr0 to the device name of your CD drive.

Create an image of a hard drive

dd if=/dev/sda of=harddrive.img

Create an image of a partition

dd if=/dev/sda1 of=partition.img
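Imaging a real disk needs root access and a device you can safely read, so this sketch uses an ordinary file as a stand-in for /dev/sda1. The file name disk.raw is hypothetical; the point is that dd copies every byte, which cmp then confirms.

```shell
#!/usr/bin/env bash
# Sketch: "image" a 1 MiB file with dd and verify the copy is exact.
# Assumption: disk.raw is a hypothetical stand-in for a device such as /dev/sda1.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

head -c 1048576 /dev/urandom > "$work/disk.raw"   # 1 MiB of random "disk" data

# if= input, of= output; bs sets how many bytes are copied per read/write
# (the default is 512). dd reports statistics on stderr, hidden here.
dd if="$work/disk.raw" of="$work/disk.img" bs=65536 2> /dev/null

cmp -s "$work/disk.raw" "$work/disk.img" && echo "image is an exact copy"
```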
