Sunday, March 12, 2017

Shell script to fetch valid and unexpired domains from a URL list

8:26 AM Posted by Dilli Raj Maharjan


Let's suppose we have URLs in a file, as below:
[root@core tmp]# cat > url_lists.txt
https://adhitprofits.com/?ref=just4us
http://ladys.ro/PGEUhnzx
http://8322699.fgxpress.com/
http://www.hourlychicpay.com/?ref=Galalmog
http://www.uprivatebanking.com/massimo8
http://promatrixplus.com/?ref=2909
http://www.trideci.com/index.php?rid=galalmog123
http://turbopayplan.com/?ref=wealthnmore
http://galalmog.stiforptour.com/


We need to list all the domains that are valid and not expired, sorted, in the format below:
adhitprofits.com
ladys.ro
fgxpress.com
hourlychicpay.com
uprivatebanking.com
promatrixplus.com
trideci.com
turbopayplan.com
stiforptour.com

Use the following shell command to filter out the domains:
cat url_lists.txt | grep "\.com" | grep -v "[0-9]" | grep -v -e "-" | grep -v "@" | tr '[:upper:]' '[:lower:]' | \
sed -e 's/http[s]*:\/\///g' | awk -F "/" '{print $1}' | awk -F "." '{print $(NF-1)"."$NF}' | sort | uniq > all_url_filtered_v1.txt

Descriptions:
Command: Description
grep "\.com": Keeps only the lines that contain .com (the dot is escaped so that it matches a literal dot)
grep -v "[0-9]": Removes URLs that contain digits (the pattern is quoted so that the shell does not expand it as a file glob)
grep -v -e "-": Removes URLs that contain a hyphen (-e marks the hyphen as a pattern, not an option)
grep -v "@": Removes URLs that contain @
tr '[:upper:]' '[:lower:]': Converts upper case to lower case
\: Line continuation; it splits the command across lines to make it more readable
sed -e 's/http[s]*:\/\///g': Finds http:// and https:// and replaces them with nothing
awk -F "/" '{print $1}': Splits the URL on / and prints the first part only (the host name)
awk -F "." '{print $(NF-1)"."$NF}': Splits the host name on . and prints the second-to-last and last parts with "." in between
sort: Sorts the output
uniq: Removes duplicate lines so that each domain is listed once
> all_url_filtered_v1.txt: Redirects all output to all_url_filtered_v1.txt
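
To see what the sed and awk steps do to a single URL, the extraction part of the pipeline can be wrapped in a small function. This is only a sketch: extract_domain is a hypothetical helper name that does not appear in the original command, and the grep filters are left out on purpose.

```shell
# extract_domain: hypothetical helper, not part of the original command.
# Prints the last two dot-separated labels of the host name in a URL.
extract_domain() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/^http[s]*:\/\///' \
        | awk -F "/" '{print $1}' \
        | awk -F "." '{print $(NF-1)"."$NF}'
}

extract_domain "http://www.hourlychicpay.com/?ref=Galalmog"   # prints hourlychicpay.com
extract_domain "http://galalmog.stiforptour.com/"             # prints stiforptour.com
```

Keeping only the last two labels is a simplification. It works for names such as www.example.com, but for a host under a two-part suffix such as example.co.uk it would print co.uk.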



Install the jwhois package, which provides the whois command used to fetch the expiration date of a domain.
yum install jwhois-4.0-19.el6.x86_64
whois oraclecloudadmin.com
whois oraclecloudadmin.com | grep "Expiration Date" | awk '{print $NF}' | awk -F "T" '{print $1}'
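
The parsing half of that command can be tried without a network connection by feeding it text on standard input. The sketch below assumes the whois output contains a line ending in a timestamp such as 2018-03-05T10:20:30Z; the sample lines are made up for illustration and parse_expiry is a hypothetical helper name.

```shell
# parse_expiry: hypothetical helper. Reads whois output on stdin and prints
# the date part (YYYY-MM-DD) of the first "Expiration Date" line.
parse_expiry() {
    grep "Expiration Date" | head -n 1 | awk '{print $NF}' | awk -F "T" '{print $1}'
}

# The sample lines below are an assumption about the output format, not real whois data.
printf '%s\n' \
    "Domain Name: EXAMPLE.COM" \
    "Registrar Registration Expiration Date: 2018-03-05T10:20:30Z" \
    | parse_expiry   # prints 2018-03-05
```

whois output is not standardized. Some registries label the field differently (for example "Registry Expiry Date"), in which case the grep pattern has to be adjusted and the function prints nothing until it is.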





Create a script that reads every line of the output file. For each domain, it uses the dig command to check whether the domain is valid, based on the presence of an ANSWER SECTION, and then uses the whois command to find the expiration date and compares it with the current date of the system.

cat test.sh  
#!/bin/bash

# Loop through every line of the filtered domain list
while IFS= read -r line
do
        # Use dig to find out whether the domain resolves
        nsoutput=$(dig "${line}" | grep -A1 "ANSWER SECTION" | grep -v "ANSWER SECTION")
        # If the answer section contains a record, the domain is considered VALID
        if [ "${nsoutput}" != "" ]; then
                # Fetch the expiry date of the domain and compare it with today's date.
                # YYYY-MM-DD strings sort in date order, so a string comparison is enough.
                ED=$(whois "${line}" | grep "Expiration Date" | awk '{print $NF}' | awk -F "T" '{print $1}')
                if [[ "${ED}" > "$(date +%Y-%m-%d)" ]]
                then
                        echo "${line}" >> all_url_filtered_final.txt
                fi
        fi
done < all_url_filtered_v1.txt
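
The expiry check in the script works because dates written as YYYY-MM-DD sort the same way as text and as dates. A minimal sketch of that comparison on its own is shown here, using a hypothetical is_unexpired helper that is not part of the original script. Instead of the bash string comparison, it strips the hyphens and compares the results as integers, which also works in a plain POSIX shell.

```shell
# is_unexpired: hypothetical helper. Succeeds when the expiry date in $1 is
# later than the reference date in $2; both must be in YYYY-MM-DD form.
is_unexpired() {
    expiry=$(printf '%s' "$1" | tr -d '-')
    today=$(printf '%s' "$2" | tr -d '-')
    # An empty or non-numeric expiry date is treated as expired.
    case "$expiry" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$expiry" -gt "$today" ]
}

is_unexpired "2018-03-05" "2017-03-12" && echo "not expired"
is_unexpired "2016-12-31" "2017-03-12" || echo "expired"
```

In the script, the second argument would be the output of date +%Y-%m-%d. A domain that expires on the reference date itself counts as expired, which matches the strict comparison used in the script.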


