Search a text file for lines present in another text file and redirect the output

rupeshforu3

Disciple
Hi, I am Rupesh from India. I have a system with an Intel 10th-gen i3 processor and an ASUS Prime H510M-E motherboard. I have installed Arch Linux and it is working fine.

I have two text files, input1.txt and input2.txt. For each line of input1.txt I want to check whether a corresponding line is present in input2.txt, and redirect the lines that have no match to another text file, missing.txt.

input1 text file consists of lines in the following pattern.

This is a video file 1[abcdef].mp4
This is a video file 2[ghijklm].mp4
This is a video file 3[nopqrst].mp4
This is a video file 4[uvwxyz].mp4

input2 text file consists of lines in the following pattern.

This is a video file 1.mp4
This is a video file 2.mp4
This is a video file 4.mp4


Note that one line is missing from the second text file, i.e. "This is a video file 3.mp4".

The second text file consists of the lines from the first text file with the bracketed characters before the extension removed.

My requirement is a script that takes each line of input1.txt in turn and searches for the corresponding name: first "This is a video file 1", then "This is a video file 2", and so on.

If a match is found, the line should be ignored. If no match is found, the line must be redirected to another text file called missing.txt.

In the above case missing.txt must contain the following line

This is a video file 3[nopqrst].mp4

You might suggest the comm or diff utilities, but they are not applicable here because some characters are missing from the lines in the second text file.

You might also suggest a text editor like gedit, but I have 1000 lines in the first text file and 800 lines in the second.

I want all 200 missing lines in missing.txt.

I have a script with a while loop that reads all the lines in the second text file, input2.txt.

Code:
#!/bin/bash

# Read input2.txt line by line; IFS= and -r keep whitespace and backslashes intact.
while IFS= read -r name; do

   echo "$name"

done < input2.txt


But I don't know how to search the input1 text file or how to use the if construct.

At present I am studying the Linux operating system and its utilities from the beginning, and so far I have covered how Linux boots and an introduction to systemd. It will take time before I can write a script like this on my own.

Kindly provide a script which searches the input1 text file and redirects the unmatched lines to missing.txt.

Regards,
Rupesh.
 
I would try reading input1.txt line by line and grepping input2.txt for each line. When there isn't a match, append the line to missing.txt.

Something off the top of my head:
Bash:
while IFS= read -r line1; do
    # -F treats the line as a fixed string, so "[" and "." are not regex syntax.
    if ! grep -qF -- "$line1" input2.txt; then
        echo "$line1" >> missing.txt
    fi
done < input1.txt

This doesn't account for the extra characters in input1.txt but I'm sure you can figure that out ;)
 
Something like this might work (on mobile so can't test):
Bash:
diff -U 0 input1.txt input2.txt
This should show the lines that differ between input1.txt and input2.txt without any context lines. Note that diff compares whole lines, so lines whose bracketed part was removed will also be reported as changed rather than ignored.
 
Suggestions from other people are as follows

Code:
% cat input1.txt | while read LINE; do fgrep "$(echo ${LINE} | sed 's/\[.*//')" input2.txt > /dev/null || echo "${LINE}" >> missing.txt; done

% cat missing.txt
This is a video file 3[nopqrst].mp4

while read -r LINE; do
  grep -F "${LINE%[*}" input2.txt > /dev/null || echo "$LINE" >> missing.txt;
done < input1.txt

wdiff input1.txt input2.txt |grep -v -E '{.*}$'
This is a video file [-3[nopqrst].mp4


sed 's/\([0-9][0-9]*\).*\./\1./' <(sort input1.txt)  | comm -3 - <(sort input2.txt)
 
Turn this into a script. You can try a better 'sed' pattern
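A possible way to turn this into a script, sketched under a few assumptions. It swaps the sed + comm pipeline for a single awk pass, so that the original input1.txt lines (with their bracketed tags) are what gets written out. It assumes each name carries at most one bracketed tag and that input2.txt is not empty. The file names are the ones from the question; the scratch directory and the sample data are only there to make the sketch runnable.

```shell
#!/bin/bash
# Sketch only: sample data from the question, written to a scratch directory.
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' 'This is a video file 1[abcdef].mp4' \
              'This is a video file 2[ghijklm].mp4' \
              'This is a video file 3[nopqrst].mp4' \
              'This is a video file 4[uvwxyz].mp4' > input1.txt
printf '%s\n' 'This is a video file 1.mp4' \
              'This is a video file 2.mp4' \
              'This is a video file 4.mp4' > input2.txt

awk '
    # First file (input2.txt): remember every line.
    NR == FNR { seen[$0] = 1; next }
    # Second file (input1.txt): strip the tag, then look the name up.
    {
        key = $0
        sub(/[ \t]*\[[^]]*]/, "", key)   # drop "[...]" and blanks in front of it
        if (!(key in seen)) print        # print the ORIGINAL line when unmatched
    }
' input2.txt input1.txt > missing.txt

cat missing.txt
```

Because input2.txt is read only once, this should scale better than running grep once per line, although for 1000 lines either approach is fast enough.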

[~] cat /tmp/input1.txt
This is a video file 1[abcdef].mp4
This is a video file 2[ghijklm].mp4
This is a video file 3[nopqrst].mp4
This is a video file 4[uvwxyz].mp4
[~] cat /tmp/input2.txt
This is a video file 1.mp4
This is a video file 2.mp4
This is a video file 4.mp4

[~] cat /tmp/input1.txt | sed -e 's/\[[a-z]*\]//g' | sort > /tmp/x1.txt
[~] cat /tmp/input2.txt | sort > /tmp/x2.txt

[~] diff /tmp/x1.txt /tmp/x2.txt
1d0
<
4d2
< This is a video file 3.mp4
[~]
 
Hi, none of the scripts worked correctly.
Hi, I think the issue can be resolved if we follow the correct procedure, and I think bash itself is working fine.

1) First, for each line in input1.txt, the square brackets and the characters within them must be removed.

2) The trailing space character must be removed.

3) The resulting line must be searched for within input2.txt.

4) If the line is found in input2.txt, the line must be ignored.

5) If the line is not found, the original line from input1.txt, including the square brackets and their contents, must be redirected to missing.txt.
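The five steps above could be sketched in bash roughly as follows. This is a minimal sketch rather than a tested solution for the real files: it assumes at most one bracketed tag per line, it uses sed for steps 1 and 2, and it uses grep -x (whole-line match), which is my choice and not something the thread specifies. The sample lines are modelled on the ones in the question, with one line given a space before the tag, and are written to a scratch directory.

```shell
#!/bin/bash
# Sketch only: sample lines modelled on the thread, in a scratch directory.
work=$(mktemp -d) && cd "$work" || exit 1
printf '%s\n' 'This is a video file 1[abcdef].mp4' \
              'This is a video file 2 [ghijklm].mp4' \
              'This is a video file 3[nopqrst].mp4' \
              'This is a video file 4[uvwxyz].mp4' > input1.txt
printf '%s\n' 'This is a video file 1.mp4' \
              'This is a video file 2.mp4' \
              'This is a video file 4.mp4' > input2.txt

: > missing.txt                      # start with an empty result file
while IFS= read -r line; do
    [ -n "$line" ] || continue       # skip empty lines
    # Steps 1 and 2: remove "[...]" together with any whitespace before it.
    name=$(printf '%s\n' "$line" | sed -e 's/[[:space:]]*\[[^]]*]//')
    # Steps 3 to 5: -F = fixed string, -x = whole line, -q = no output.
    if ! grep -qxF -- "$name" input2.txt; then
        printf '%s\n' "$line" >> missing.txt
    fi
done < input1.txt

cat missing.txt
```

Dropping -x would make grep accept the name anywhere inside a line of input2.txt, which is closer to the earlier suggestions but can produce false matches (for example "file 1.mp4" inside "file 11.mp4" would not match with -x, and the shorter name could match inside a longer line without it).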

Previously I used the following code to remove square brackets, including their contents, from file names.

for x in *.mp4; do mv "$x" "${x// \[*\]/}"; done

We must modify the above code and use it in step 1 and step 2.
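To see what that expansion does to a line rather than a file name, here is a small check. The variable names are made up for illustration, and the two sample strings follow the patterns shown in this thread (one with a space before the tag, one without).

```shell
#!/bin/bash
# Two sample names: one with a space before the tag, one without.
with_space='This is a video file 1 [abcdefg].mp4'
no_space='This is a video file 1[abcdef].mp4'

# Pattern from the mv loop: a space, then "[...]".
a="${with_space// \[*\]/}"    # space and tag removed
b="${no_space// \[*\]/}"      # unchanged, there is no " [" to match

# Same pattern without the leading space.
c="${no_space//\[*\]/}"       # tag removed
d="${with_space//\[*\]/}"     # tag removed, but the space before it stays

printf '<%s>\n' "$a" "$b" "$c" "$d"
```

So the pattern with the leading space only helps for names that have a space before the bracket, and the pattern without it leaves a stray space behind in those same names.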

I thought about how to solve the problem and came up with the above. I am sharing it to show my effort.

Let me try and see if I can succeed. The problem is that I am not as talented as you. I have not studied sed, awk, grep, etc.

I have been working on this problem for the past 4 days without success, which is why I commented that bash is not working properly. Sorry for that.
Hi, I have finally created a small script based on my past experience and the suggestions from people like you. It is as follows.

Code:
#!/bin/bash

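# Start from an empty result file; -f avoids an error if it does not exist yet.
rm -f missing.txt

while IFS= read -r LINE; do
   
#   echo $LINE
   # Remove a space followed by "[...]" from the line.
   NAME="${LINE// \[*\]/}"
   # Drop the last four characters (".mp4").
   NAME2=${NAME%????}
   NAME3="$(echo -e "${NAME2}" | sed -e 's/^[[:space:]]*//')"
   NAME4="${NAME3}.mp4"
#   echo $NAME2
#   echo ${NAME}
   # Append the original line to missing.txt when NAME4 is not found in input2.txt.
   grep -F "$(echo ${NAME4})" input2.txt > /dev/null || echo "${LINE}" >> missing.txt
done < input1.txt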

Many of you suggested one-line commands, which work in some cases and not in others.

But the above code works in all situations. I have created this script step by step.

1) First, I removed the square brackets, including their contents, from each line of input1.txt using the following code.

NAME="${LINE// \[*\]/}"

2) I tried to remove the trailing space from each line using the following code.

NAME2=${NAME%????}
NAME3="$(echo -e "${NAME2}" | sed -e 's/^[[:space:]]*//')"
NAME4="${NAME3}.mp4"

3) Finally, I searched input2.txt for lines containing the contents of the variable NAME4 using the following code.

grep -F "$(echo ${NAME4})" input2.txt > /dev/null || echo "${LINE}" >> missing.txt

The most important step is step 1, removing the square brackets and their contents.

The above script works fine except for step 2; that is, the code in step 2 does not remove the trailing space.

By trailing space I mean the following.

input1.txt consists of some lines with pattern

This is a video file 1 [abcdefg].mp4

input2.txt consists of lines with pattern

This is a video file 1.mp4

The script should ignore this line and not redirect it to missing.txt, but it is being redirected.

I have included code to remove the trailing whitespace before running grep. I think the trailing whitespace has been removed, but the redundant line is still being redirected to missing.txt.

I think that if the script ran properly, missing.txt would contain only 80 lines, but at present it contains 165 lines.

Something is better than nothing, and you may be irritated if I keep saying "not working, not working", so I am stopping work on this for now.

If you still want to know what is not working, my answer is that the search does not work well for lines in input1.txt that have whitespace before the square brackets.

Thanks for your patience.
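A possible reason the trailing space survives, offered as a guess since the real files are not shown: the sed expression in step 2 is anchored with ^, so it strips whitespace at the start of the line rather than at the end. The small check below compares the two anchors; the variable names lead and trail are made up for illustration.

```shell
#!/bin/bash
# A value like NAME2 when a space is left before the extension.
name2='This is a video file 1 '

# Expression used in the script: ^ anchors at the START of the line.
lead=$(printf '%s\n' "$name2" | sed -e 's/^[[:space:]]*//')
# Anchoring with $ instead removes whitespace at the END of the line.
trail=$(printf '%s\n' "$name2" | sed -e 's/[[:space:]]*$//')

printf '<%s>\n' "$lead" "$trail"
```

Separately, the pattern in step 1 only matches a tag that has a space in front of it, so lines without that space keep their tag and would also end up in missing.txt.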
 