Shell script to find broken links in a website

#!/bin/bash
if [ $# -ne 1 ];
then
      echo -e "Usage: $0 URL\n"
      exit 1;
fi

echo "Broken links:"

mkdir /tmp/$$.lynx
cd /tmp/$$.lynx || exit 1    # Abort if the working directory could not be entered.

# Visit web pages recursively and create lists of all hyperlinks in the website.
lynx -traversal "$1" > /dev/null
count=0;

# reject.dat contains all the links found during the traversal; deduplicate it.
sort -u reject.dat > links.txt

while read link;
do
  output=$(curl -I "$link" -s | grep "HTTP/.*OK");
      if [[ -z $output ]];
      then
          echo "$link";
          let count++
      fi
done < links.txt

[ $count -eq 0 ] && echo "No broken links found."    # The echo runs only when the test in brackets succeeds.
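
For example, assuming the script is saved as find_broken.sh and made executable (the file name and the output shown here are hypothetical), a run might look like this:

$ chmod +x find_broken.sh
$ ./find_broken.sh http://example.com
Broken links:
http://example.com/old-page.html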

Note: lynx records only HTTP 404 URLs, so URLs that fail with other error types are omitted and you need to check their return status manually.
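
One way to do that check is to print the numeric status code for each link instead of grepping for an OK header. This is a minimal sketch, not part of the original script; it assumes links.txt was already produced by the traversal above, and the -L flag (follow redirects) is an added assumption:

while read link;
do
    # -s silences progress output, -o /dev/null discards the body,
    # and -w "%{http_code}" prints only the numeric status code.
    code=$(curl -s -o /dev/null -L -w "%{http_code}" "$link")
    [ "$code" -ne 200 ] && echo "$code $link"
done < links.txt

Each printed line shows the failing status (for example 403 or 500) next to its URL, so non-404 failures are visible at a glance.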

 
