Master Hosts File, step 4 of 4

by Baz Cuda

Version 1 (July 15, 2019)

Download (20 downloads)

Getting com.llamalab.safs.NoSuchFileException? See the p.p.s. at the end of this description ;)

Steps 1 - 3 are uploaded as a separate flow:

1. Downloads the Steven Black repository from GitHub (https://github.com/StevenBlack/hosts/).
2. Unzips the file.
3. Merges all the many "hosts" files into one file.

Step 4 processes the 73 MB merged file (currently ~894,000 lines), removes every line that doesn't contain a URL, and then removes all duplicate URLs. This reduces the hosts file to 72,881 distinct URLs (in a 2 MB file), which can then be loaded into apps such as DNS66.
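For readers who think in code rather than blocks, the filter-then-deduplicate idea can be sketched in Python. This is only an illustration, not a transcription of the flow: the function names are made up, and the rule used to decide whether a line "contains a URL" (an address followed by a hostname, with comments stripped) is an assumption about typical hosts-file lines.

```python
def extract_host(line):
    """Return the hostname from a hosts-style line, or None if there is none.

    Assumption: a useful line looks like "<ip> <hostname>", optionally
    followed by a "#" comment. The real flow may use a different rule.
    """
    entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not entry:
        return None
    fields = entry.split()
    if len(fields) < 2:
        return None
    return fields[1].lower()


def dedupe_hosts(lines):
    """Keep the first occurrence of each hostname, preserving input order."""
    seen = set()
    unique = []
    for line in lines:
        host = extract_host(line)
        if host is None or host in seen:
            continue
        seen.add(host)
        unique.append(host)
    return unique


sample = [
    "# Title: merged hosts",
    "0.0.0.0 ads.example.com",
    "",
    "0.0.0.0 tracker.example.net  # inline comment",
    "0.0.0.0 ads.example.com",
]
print(dedupe_hosts(sample))
```

A set makes each duplicate check roughly constant time, which is what matters when the input runs to hundreds of thousands of lines.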

This was written as a learning exercise with a practical purpose.
Constructive comments are most welcome.

Please note, I only run step 4 on my Asus gaming laptop (inside BlueStacks, the Android emulator). It takes 30 minutes, but it shows the amazing capabilities of Automate.
My Samsung Galaxy S10+ 512GB took 50 minutes to process the first 170,000 records; my laptop caught up with it in less than 5 minutes :D. It shows that, given the right resources, Automate can be used for some hefty processing. On my phone, Automate was processing roughly 100 records per second; on my laptop it was processing roughly 1,000 records per second.

The "Stop" button in Automate sometimes has difficulty cancelling a flow which is in a fast, intensive loop. I've added a notification which, if you click or dismiss it, will stop the flow.
Also, in block 132 you can, if you wish, set a maximum number of records to be processed from the input file.
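The same two safety valves, a stop signal checked inside the loop and an optional cap on records, look roughly like this in Python. The names `process`, `stop_requested` and `max_records` are hypothetical; in the flow the stop signal comes from the notification and the cap is set in block 132.

```python
import threading


def process(records, stop_requested, max_records=None):
    """Count records until the input ends, a stop is requested, or the cap is hit."""
    processed = 0
    for _record in records:
        # Check the stop signal on every pass so a tight loop stays cancellable.
        if stop_requested.is_set():
            break
        # Optional cap on how many input records are handled.
        if max_records is not None and processed >= max_records:
            break
        processed += 1
    return processed


stop = threading.Event()
print(process(range(10), stop, max_records=3))  # stops at the cap
stop.set()                                      # simulate tapping the notification
print(process(range(10), stop))                 # stops immediately
```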

I recommend you turn off logging. This flow will then display much more meaningful info in the live log window:

number of URLs / number of lines examined [number of non-URL lines rejected, number of duplicate URLs rejected]

This is displayed either on every 1000th new URL, or every 1000th duplicate URL.
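As a rough sketch of how such a status line can be produced, here is a Python generator that keeps the four counters and emits the line on every Nth new or duplicate entry. It is an assumption-laden stand-in, not the flow itself: `progress_lines` is a made-up name, and `None` is used to represent a non-URL line.

```python
def progress_lines(records, every=1000):
    """Yield "urls / lines [non-URL, duplicates]" on every `every`-th new or duplicate entry."""
    seen = set()
    examined = rejected = duplicates = 0
    for record in records:
        examined += 1
        if record is None:  # stands in for a line with no URL
            rejected += 1
            continue
        if record in seen:
            duplicates += 1
            report = duplicates % every == 0
        else:
            seen.add(record)
            report = len(seen) % every == 0
        if report:
            yield f"{len(seen)} / {examined} [{rejected}, {duplicates}]"


# A tiny run with every=2 so the output is easy to follow.
for status in progress_lines(["a", None, "b", "a", "a"], every=2):
    print(status)
```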

I hope you will find my use of Automate interesting and parts of the flow useful in your own flows. I've tried to exploit the flexibility of the layout tools to make the flow, um....flow :D

p.s. Automate needs an efficient StringBuilder function (see block 63) :D
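To show what is meant by a StringBuilder, here is the pattern in Python, where `io.StringIO` plays that role: pieces are appended to a buffer and the final string is produced once at the end, instead of building a new, longer string on every pass of the loop. The function name and the one-entry-per-line output format are assumptions for illustration; I am not describing what block 63 actually does.

```python
import io


def build_output(hosts):
    """Join hosts into one newline-terminated text block using a buffer."""
    buffer = io.StringIO()
    for host in hosts:
        buffer.write(host)
        buffer.write("\n")
    # The full string is materialised only once, here.
    return buffer.getvalue()


print(build_output(["ads.example.com", "tracker.example.net"]), end="")
```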

p.p.s. I've just realised that block 139 (which takes a full Automate backup right at the top of the flow) contains a hard-coded folder on my device. You can either change it to a folder of your choice or remove the block entirely. It was there as a precaution during testing: I had to uninstall Automate to force it to crash out of the tight loop, and then re-install and restore from backup. The backup is no longer needed now that the flow works and any interaction with the notification will successfully stop it.

3.0 average rating from 1 review
