uniq

The uniq command, by default (with no options) will look for consecutive lines in a file, and will only output 1 copy of a row of the file for each batch of consecutive lines that agree. For instance, if there are five copies of the same row of data in a file, uniq will only print that row one time.

The uniq command will not, however, sort the file first. For this reason, it is common to use the uniq command in a pipeline, after the sort command.

There are several helpful options available with the uniq utility. For instance, the -c option prints the number of copies of the line (at the far left of the line).

The -u option indicates that we only want to print lines that are isolated, in other words, which are not in a batch of consecutive repeated lines. In contrast, the -d option tells uniq that we only want to lines that do occur in a batch of two or more consecutive repeated lines.

The -n option (where n is an integer) signifies that we want to skip the first n lines of each row of data, when comparing whether lines are equal. FIXTHIS page 419 in Section 21.20 of the Unix Power Tools book has an example with -4 that demonstrates how this is useful. In that example, the first four columns of a file are timestamps that should not be considered, when comparing whether lines are identical. (In other words, in that example, the uniq command only pays attention to the 5th field and beyond, when comparing the rows to see if they are unique.)

Using the uniq command in a pipeline with sort

FIXTHIS we might want to add an example about sorting before using uniq. For instance, I have done an example of this with the airline data, emphasizing that it is necessary to sort the data before using the uniq command.