Comparing two files
In day-to-day work you often need to compare two files and find the differences between them. Maybe you have two lists of users, or two lists of serial numbers. You know the lists are different sizes, so there must be differences between them, and you need to determine which lines are missing from the smaller list. Linux has a number of utilities that you can use for this work.
The fgrep command
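Let's take a look first at using fgrep. The fgrep utility is like grep and egrep, but it treats every pattern as a fixed string rather than a regular expression, which is usually faster when there are many patterns. For this work we don't need regular expressions, since we are only comparing literal text. On current systems fgrep is the older name for grep -F, and recent versions of GNU grep print a warning when it is called as fgrep, so you may prefer to type grep -F; the switches are the same. So let's look at the command.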
fgrep -v -f file1 file2
The first switch, -v, tells the program to invert the match, so it shows what is in file 2 that is NOT in file 1. If you want what is in both file 1 and file 2, simply leave the -v off. The -f switch takes the file name that follows it, file1, and reads the strings to search for from that file, one per line; file2 is the file that gets searched. Keep in mind that a match only requires a line of file 2 to contain a line of file 1, so fred in file 1 also matches frederick in file 2. If the whole line has to be identical, add the -x switch. The fgrep command is case sensitive by default. If you are not sure the case is the same in both files and you want a case-insensitive search, add the -i switch.
fgrep -v -i -f file1 file2
This does a case-insensitive comparison, so Fred and fred, or fred and FRED, would match. One thing to keep in mind is that each line of file 2 is tested on its own. If an entry in file 1 appears several times in file 2, every one of those lines counts as a match, so with -v none of them are output and without -v all of them are. Also make sure file 1 contains no blank lines: unless -x is used, an empty search string matches every line, so with -v nothing would be output at all.
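As a small illustration of these switches, the sketch below builds two throwaway lists in a temporary directory and runs the comparison three ways. The file contents and variable names are made up for the example, the command is spelled grep -F (the current name for fgrep), and -x is an extra option that requires the whole line to match rather than part of it.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf '%s\n' fred alice > "$dir/file1"
printf '%s\n' Fred frederick alice alice bob > "$dir/file2"

# Lines of file2 that do not contain any line of file1 (substring match).
only_in_2=$(grep -F -v -f "$dir/file1" "$dir/file2")

# Same comparison, but -x requires the whole line to be identical.
only_in_2_whole=$(grep -F -x -v -f "$dir/file1" "$dir/file2")

# Whole-line comparison that also ignores case.
only_in_2_nocase=$(grep -F -x -i -v -f "$dir/file1" "$dir/file2")

printf 'substring match:\n%s\n' "$only_in_2"
printf 'whole line:\n%s\n' "$only_in_2_whole"
printf 'whole line, ignoring case:\n%s\n' "$only_in_2_nocase"

rm -r "$dir"
```

With these sample lists, the substring run drops frederick because it contains fred, while the -x runs keep it. Both copies of alice are treated the same way in every run.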
The comm command
The comm command compares two files line by line, looking for lines that are common to both. The nice thing about comm is that it can output what is unique to file 1, what is unique to file 2, and what is common to both, in three separate columns. There is a caution here though: both files must be sorted for the comparison to work correctly. You can either sort the files ahead of time, or run the sort command on each file as you feed it into comm. So let's look at how the command would look.
comm file1 file2
It is as simple as that. To show only what is in file 2 and not in file 1, suppress column 1 (lines only in file 1) and column 3 (lines common to both). The command would look like this.
comm -13 file1 file2
To show only what is in file 1 and not in file 2, suppress column 2 (lines only in file 2) and column 3 (lines common to both) instead.
comm -23 file1 file2
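To see how the columns and the suppression switches fit together, here is a minimal sketch using two small, already sorted lists. The names and variables are made up for the example, and comm -12, which keeps only the common lines, is added for completeness.

```shell
dir=$(mktemp -d)
# comm expects sorted input; these sample lists are already in order.
printf '%s\n' alice bob carol > "$dir/file1"
printf '%s\n' bob carol dave erin > "$dir/file2"

# No switches: column 1 has no indent, column 2 is indented by one tab,
# and column 3 by two tabs.
all_columns=$(comm "$dir/file1" "$dir/file2")

only_in_2=$(comm -13 "$dir/file1" "$dir/file2")  # unique to file2
only_in_1=$(comm -23 "$dir/file1" "$dir/file2")  # unique to file1
in_both=$(comm -12 "$dir/file1" "$dir/file2")    # common to both

printf '%s\n' "$all_columns"

rm -r "$dir"
```

Each digit names a column to hide, so -13 leaves only column 2 and -23 leaves only column 1.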
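The comm command is case sensitive, the same as the fgrep command. Unlike fgrep, the comm that ships with most Linux distributions (GNU coreutils) has no -i switch; the BSD and macOS versions do. On Linux, if you don't know whether the case differs between the files, convert both to the same case before sorting and comparing them.
comm -13 <(tr '[:upper:]' '[:lower:]' < file1 | sort) <(tr '[:upper:]' '[:lower:]' < file2 | sort)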
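On systems where comm has no case-insensitive option, such as GNU coreutils, one way to get the same effect is to fold both lists to lower case before comparing. The sketch below assumes made-up sample names and writes the folded lists to temporary files so that it runs in any POSIX shell. LC_ALL=C is set on both sort and comm so that they agree on the ordering.

```shell
dir=$(mktemp -d)
printf '%s\n' Alice BOB carol > "$dir/file1"
printf '%s\n' alice Bob Dave > "$dir/file2"

# Lowercase each list, then sort it, so comm sees comparable, ordered lines.
tr '[:upper:]' '[:lower:]' < "$dir/file1" | LC_ALL=C sort > "$dir/file1.lower"
tr '[:upper:]' '[:lower:]' < "$dir/file2" | LC_ALL=C sort > "$dir/file2.lower"

# Lines that appear only in file2 once case is ignored.
only_in_2=$(LC_ALL=C comm -13 "$dir/file1.lower" "$dir/file2.lower")
printf '%s\n' "$only_in_2"

rm -r "$dir"
```

The result is reported in lower case, because the comparison runs on the folded copies rather than on the original files.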
If you want to sort each file on the same command line, you can use process substitution, which is available in bash, zsh, and ksh.
comm -13 <(sort -u file1) <(sort -u file2)
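The -u switch tells sort to output only one copy of each line, so duplicates are removed as the files are sorted. This matters because comm pairs lines one to one: an entry that appears once in file 1 and twice in file 2 would otherwise be reported once as common and once as unique to file 2. Also run sort and comm under the same locale, since comm expects the ordering that sort produced.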
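The effect of -u is easiest to see side by side. The sketch below uses made-up, unsorted lists with repeated entries, and compares the output of comm after sort -u with the output after a plain sort. It uses temporary files instead of process substitution so that it runs in any POSIX shell, and it sets LC_ALL=C so that sort and comm use the same ordering.

```shell
dir=$(mktemp -d)
printf '%s\n' carol alice bob alice > "$dir/file1"
printf '%s\n' dave bob bob alice > "$dir/file2"

# sort -u puts the lines in order and drops repeated lines.
LC_ALL=C sort -u "$dir/file1" > "$dir/file1.uniq"
LC_ALL=C sort -u "$dir/file2" > "$dir/file2.uniq"
only_in_2=$(LC_ALL=C comm -13 "$dir/file1.uniq" "$dir/file2.uniq")

# A plain sort keeps the repeats. comm pairs lines one to one, so the
# second "bob" in file2 has no partner in file1 and is reported as unique.
LC_ALL=C sort "$dir/file1" > "$dir/file1.plain"
LC_ALL=C sort "$dir/file2" > "$dir/file2.plain"
only_in_2_dups=$(LC_ALL=C comm -13 "$dir/file1.plain" "$dir/file2.plain")

printf 'with sort -u:\n%s\n' "$only_in_2"
printf 'with plain sort:\n%s\n' "$only_in_2_dups"

rm -r "$dir"
```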
Copyright © 2016