adblock for unbound, dnsmasq, dnscrypt-proxy

Started by guest19228, October 30, 2018, 01:11:49 AM

Inspired by this https://devinstechblog.com/block-ads-with-dns-in-opnsense/ and this https://github.com/openwrt/packages/tree/master/net/adblock/files, and forced by the demands of a project, I created a little script to create blacklists for adblocking in unbound. My first attempt was adblock-update-hosts.sh. It is very close to the original script from Devin's Tech Blog. I found it not flexible enough, because every new blacklist requires adding more code. To gain flexibility and make the management of blacklists easier I did a complete rework. The new result is adblock.sh plus configuration files in /var/adblock. It is in a very early state and only one publicly available blacklist has been added so far. You can find all the files in the attached archive.
However, I'm not a programmer and have no intention of becoming one. Most of the code is "stolen" from Google searches and adapted to my needs, without my fully understanding what the more complex parts do (especially the "damned" regex used in the script). All I will still do is add the remaining blacklists from here https://github.com/openwrt/packages/blob/master/net/adblock/files/adblock.conf.
So I'm looking for someone who is interested in taking over. It might be possible to combine it with this project: https://forum.opnsense.org/index.php?topic=9523.0 so that it becomes a frontend for my little script.
Please review and rip it into pieces. ;D

Update: removed that attachment because I created a new version

I'll try to add some of those lists to the BIND plugin. UnboundBL will also come at some point, probably this year.

This is the final version. It will now create blacklists for use with unbound, dnsmasq, bind and dnscrypt-proxy.

To get the following help text just call the script with an argument.

To enable a blacklist source create a symlink for it from /var/adblock/blacklists_available to /var/adblock/blacklists_enabled
To disable a blacklist source delete the symlink from /var/adblock/blacklists_enabled
Be careful how many blacklist sources you enable. The resulting lists may become really huge (about 62 MB for bind, 30-32 MB for unbound).
To enable logging and truncating the log file uncomment the relevant lines in the script
To automatically activate updated blacklists in your DNS server or resolver uncomment the relevant lines "pluginctl dns" at the end of the script
To enable that list in unbound add "include: /var/adblock/unbound/blacklist-unb-nxd.conf" if you want an NXDOMAIN reply, or "include: /var/adblock/unbound/blacklist-unb-ip.conf" if you want those domains to point to IP 0.0.0.0, to the custom options for unbound in the gui.
To enable that list in dnsmasq add "servers-file=/var/adblock/dnsmasq/blacklist-dnsmasq.conf" to the custom options for dnsmasq in the gui.
How to enable it in bind I could not figure out; it requires changes in /usr/local/etc/named.conf, but this is not possible via the gui.
You would have to add 'include "/var/adblock/bind/blacklist.zones";' to named.conf.
To enable the blacklist in dnscrypt-proxy you have to add 'include "/var/adblock/dnscrypt/dnscrypt_blacklist.conf"' to /usr/local/etc/dnscrypt-proxy/dnscrypt-proxy.toml
Because dnscrypt-proxy is currently not supported via the gui you have to restart it manually every time the blacklist changes, or just uncomment the relevant line at the end of the script.
To create the blacklists just run the program without an argument.
Blacklist sources available:
        adaway                         focus on mobile ads, infrequent updates, approx. 400 entries
        adguard                        combined adguard dns filter list, frequent updates, approx. 17.000 entries
        bitcoin                        focus on malicious bitcoin mining sites, infrequent updates, approx. 80 entries
        custom blacklist               static local domain blacklist, always deny these domains
        disconnect                     mozilla driven blocklist, numerous updates on the same day, approx. 4.700 entries
        dshield                        generic blocklist, daily updates, approx. 3.500 entries
        feodo                          focus on feodo botnet, daily updates, approx. 0-10 entries
        hphosts-adservers              broad blocklist with ad/tracking servers, monthly updates, approx. 19.200 entries
        hphosts-exploit sites          broad blocklist with exploit sites, irregular updates, approx. 1.100 entries
        hphosts-fraud sites            broad blocklist with fraud sites, monthly updates, approx. 183.300 entries
        hphosts-hijack sites           blocklist with hijack sites, irregular updates, approx. 250 entries
        hphosts-malwareservers         broad blocklist with malware sites, monthly updates, approx. 199.000 entries
        hphosts-phishing sites         broad blocklist with phishing sites, monthly updates, approx. 150.000 entries
        hphosts-warez/piracy sites     broad blocklist with warez/piracy sites, monthly updates, approx. 2.100 entries
        malware                        broad blocklist, daily updates, approx. 18.300 entries
        malwarelist                    focus on malware, daily updates, approx. 1.200 entries
        openphish                      focus on phishing, numerous updates on the same day, approx. 2.400 entries
        ransomware                     focus on ransomware by abuse.ch, numerous updates on the same day, approx. 1.900 entries
        reg_cn                         focus on chinese ads plus generic easylist additions, daily updates, approx. 11.700 entries
        reg_cz                         focus on czech ads maintained by Turris Omnia Users, infrequent updates, approx. 100 entries
        reg_de                         focus on german ads plus generic easylist additions, daily updates, approx. 9.200 entries
        reg_id                         focus on indonesian ads plus generic easylist additions, weekly updates, approx. 9.600 entries
        reg_nl                         focus on dutch ads plus generic easylist additions, weekly updates, approx. 9.400 entries
        reg_pl                         focus on polish ads, daily updates, approx. 90 entries
        reg_ro                         focus on romanian ads plus generic easylist additions, weekly updates, approx. 9.400 entries
        reg_ru                         focus on russian ads plus generic easylist additions, weekly updates, approx. 14.500 entries
        shalla                         huge blocklist archive subdivided in different categories, daily updates. Check http://www.shallalist.de/categories.html for more categories
        spam404                        generic blocklist, infrequent updates, approx. 6.000 entries
        sysctl                         broad blocklist, weekly updates, approx. 16.500 entries
        urlhaus                        urlhaus RPZ domains by abuse.ch, numerous updates on the same day, approx. 3.500 entries
        ut_capitole                    huge blocklist archive subdivided in different categories, daily updates. Check https://dsi.ut-capitole.fr/blacklists/index_en.php for more categories
        whocares                       broad blocklist, weekly updates, approx. 10.000 entries
        winhelp                        broad blocklist, infrequent updates, approx. 13.000 entries
        winspy                         focus on windows spy & telemetry domains, infrequent updates, approx. 300 entries
        youtube                        focus on youtube ad-related subdomains, dynamic request API, approx. 150 entries
        yoyo                           focus on ad related domains, weekly updates, approx. 2.400 entries
        zeus                           focus on zeus botnet by abuse.ch, daily updates, approx. 400 entries
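The enable/disable steps from the help text could be wrapped in a few shell functions, roughly like this. It is only a sketch: the function names and the ADBLOCK_ROOT override are my own assumptions and not part of adblock.sh, and the source name used below (yoyo) assumes the file in blacklists_available is named like the entry in the list.

```shell
#!/bin/sh
# Sketch: enable/disable a blacklist source by managing its symlink.
# adblock_root defaults to the path from the help text; ADBLOCK_ROOT is a
# hypothetical override that makes it possible to try this in a scratch directory.
adblock_root="${ADBLOCK_ROOT:-/var/adblock}"

enable_source() {
    # $1 = file name of a source in blacklists_available
    src="$adblock_root/blacklists_available/$1"
    [ -f "$src" ] || { echo "no such source: $1" >&2; return 1; }
    ln -sf "$src" "$adblock_root/blacklists_enabled/$1"
}

disable_source() {
    # Remove only the symlink; the source file itself is left untouched.
    rm -f "$adblock_root/blacklists_enabled/$1"
}

list_enabled() {
    # Show which sources are currently enabled.
    ls "$adblock_root/blacklists_enabled"
}
```

With these loaded, "enable_source yoyo" followed by a run of the script would include that list, and "disable_source yoyo" would drop it again on the next run.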

Although it is working, I believe the code is somewhat inefficient. I would appreciate it if an experienced shell scripter (especially one experienced with awk and regex) would review and optimize it.

@mimugmail: Great to read. Maybe you can use my script as a backend to feed the supported DNS servers/resolvers.


I'll have a look next week. Is there an easy way to add more custom lists?
Are you sure about the copyright in the script? I'm not sure you are allowed to remove other people's copyright notices when you were "inspired" by their code.

@mimugmail:
If you want to add a new blacklist source you just have to create a file in /var/adblock/blacklists_available. You can take the existing files as examples. To enable it, create a symlink to it in /var/adblock/blacklists_enabled. I chose that approach because I believe it is the easiest way to handle the blacklist sources. There is no need to modify the source code of the script, and using it in a gui should also be very simple (just display the directory listing and the description lines from each of the files there).
If you want to create files for other resolvers/dns servers you just have to add a few variables like
readonly unbound_path="$config_path"'unbound/'
readonly unbound_nxd_block_file='blacklist-unb-nxd.conf'
readonly unbound_nxd_string='BEGIN{print "server:"}{printf "local-zone: \"%s\" static\n", $1}'

and some code lines at the end of the script like
# Converting to unbound format
awk "$unbound_ip_string" "$tmp_path$tmp_file_1" > "$unbound_path$unbound_ip_block_file"
awk "$unbound_nxd_string" "$tmp_path$tmp_file_1" > "$unbound_path$unbound_nxd_block_file"
echo "$unbound_path$unbound_ip_block_file has $( wc -l "$unbound_path$unbound_ip_block_file" | awk '{ print $1 }' ) lines"
echo "$unbound_path$unbound_nxd_block_file has $( wc -l "$unbound_path$unbound_nxd_block_file" | awk '{ print $1 }' ) lines"

The echo lines are only a debugging aid, I forgot to comment them out  :D

When I created the new script I had in mind to make extending it as easy as possible. If you have suggestions on how to make it better, just let me know.
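To see what the two unbound awk strings produce, they can be run against a small sample list. The strings are the ones quoted in this thread; the sample domains are made up, and feeding them through a pipe instead of the temp file is just for the demo.

```shell
#!/bin/sh
# The two awk programs quoted in this thread, applied to a tiny sample list.
unbound_nxd_string='BEGIN{print "server:"}{printf "local-zone: \"%s\" static\n", $1}'
unbound_ip_string='BEGIN{print "server:"}{printf "local-data: \"%s A 0.0.0.0\"\n", $1}'

# Made-up example domains, one per line (the script reads them from a temp file).
sample=$(printf '%s\n' ads.example.com tracker.example.net)

nxd_out=$(printf '%s\n' "$sample" | awk "$unbound_nxd_string")
ip_out=$(printf '%s\n' "$sample" | awk "$unbound_ip_string")

printf '%s\n\n%s\n' "$nxd_out" "$ip_out"
# Prints:
#   server:
#   local-zone: "ads.example.com" static
#   local-zone: "tracker.example.net" static
#
#   server:
#   local-data: "ads.example.com A 0.0.0.0"
#   local-data: "tracker.example.net A 0.0.0.0"
```

A format for another resolver would follow the same pattern: one more awk string plus one more awk call writing to that resolver's file.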

Regarding the copyright:
https://github.com/openwrt/packages/blob/master/net/adblock/ is GPLv3, but I did not use any code from there. It was just the inspiration and a help in deciding which blacklist sources to add.

https://devinstechblog.com/block-ads-with-dns-in-opnsense/ does not seem to carry any copyright notice, and the project it is based on is no longer available.

Most of the code is a complete rewrite. What's still actively used from the original code may be 3 or 4 lines like
## Clean up any stale tempfile
echo "Removing old files..."

Everything else is my own creation.
If you have any problem with the copyright just remove it. I do not care. It was added by the IDE. It asked me for one and I chose the FreeBSD license.

In the version I attached here I fixed some typos and added some more comments to make understanding of the code a little bit easier.

However, as I stated before, I'm not a programmer, and I spent most of the time figuring out the regex strings for awk to remove the unneeded parts from the original blacklists. Although regex is very powerful, it is a real pain in the ass for the inexperienced. :'(
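For anyone fighting with the same kind of clean-up, here is a rough sketch of how a hosts-style list could be reduced to bare domains with awk. This is an illustration only and not the regex used in adblock.sh; the function name is made up, and it assumes the domain is the last field on each line.

```shell
#!/bin/sh
# Sketch: reduce a hosts-style blacklist (read from stdin) to one domain per line.
clean_hosts_list() {
    awk '
        { sub(/#.*/, "") }                 # drop comments, full-line or trailing
        { sub(/\r$/, "") }                 # drop DOS line endings
        NF == 0 { next }                   # skip lines that are now empty
        {
            d = tolower($NF)               # assume the domain is the last field
            if (d ~ /^[0-9.]+$/) next      # skip bare IP addresses
            if (d !~ /^[a-z0-9_-]+(\.[a-z0-9_-]+)+$/) next   # needs at least one dot
            if (!seen[d]++) print d        # de-duplicate, keep the first occurrence
        }
    '
}
```

It would be used as a filter, for example "clean_hosts_list < downloaded_list > domains.txt", where both file names are placeholders.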

Another update, hopefully the last.
Fixed some more typos, and changed or removed some code shellcheck was complaining about.
What is left, and what I do not know how to fix, are some variable definitions like
readonly unbound_ip_string='BEGIN{print "server:"}{printf "local-data: \"%s A 0.0.0.0\"\n", $1}'
                           ^-- SC2016: Expressions don't expand in single quotes, use double quotes for that.

I need to get those strings passed to awk exactly like this, without any expansion. If someone knows how to write that better, please help.
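One possible way to handle this: since the single quotes are intentional here ($1 has to reach awk unexpanded), the warning can be silenced for that one line with a ShellCheck directive comment. The second half is an alternative sketch with a made-up function name; as far as I know ShellCheck does not raise SC2016 for a literal passed directly to awk, but I have not verified that for every version.

```shell
#!/bin/sh
# The single quotes are intentional: $1 must reach awk unexpanded,
# so the warning is disabled for the next command only.
# shellcheck disable=SC2016
readonly unbound_ip_string='BEGIN{print "server:"}{printf "local-data: \"%s A 0.0.0.0\"\n", $1}'

# Alternative sketch: keep the awk program in a function instead of a variable.
# to_unbound_ip is a hypothetical name, not something from adblock.sh.
to_unbound_ip() {
    awk 'BEGIN { print "server:" } { printf "local-data: \"%s A 0.0.0.0\"\n", $1 }' "$@"
}
```

Both variants give the same result: awk "$unbound_ip_string" file and to_unbound_ip file produce identical output.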

The program comes with only the "personal/private" custom blacklist enabled which is empty by default. If you want to use some of the public blacklists provided, please do not forget to create the symlink for them in /var/adblock/blacklists_enabled.

This is my final version; no more contributions are planned.

Changelog

    2018-11-06

    * Restructured code, putting most of it in functions making understanding it more complicated  :P
    * eliminated several shellcheck complaints
    * added code to call script with options
    * added code to restart dns subsystem via cli
    * added code to display version via cli
    * added code to display info text via cli
    * added code to display help about command line options via cli
    * added code to insert blacklist line in dnscrypt-proxy config file via cli
    * added code to remove blacklist line from dnscrypt-proxy config file via cli
    * added code to restart dnscrypt-proxy via cli
    * added code to insert blacklist line into bind config file via cli
    * added code to delete blacklist line from bind config file via cli
    * added code to restart bind service but disabled it because it can be done via WebUI
    * added code to enable/disable blacklist sources via cli

    2018-11-04

    * Changed path for unbound blacklist files back to /var/unbound because unbound seems to be unhappy when the blacklist file is not in that path at boot time. I got the error "fatal error: Could not read config file: /var/unbound/unbound.conf. Maybe try unbound -dd, it stays on the command line to see more errors, or unbound-checkconf", although it did not have any problems with the previous location when stopping and starting it via the gui
    * Changed path for bind blacklist files to /usr/local/etc/namedb because bind wants to have the blacklist files in its working directory
    * Added functionality to insert needed include line into /usr/local/etc/namedb/named.conf for bind
    * Added functionality to restart bind service
    * Updated help text to reflect current changes
    * minor code enhancements by combining several piped awk commands
Have fun with this little script  8)