HUGE 📄file processor

by Ricardo Fernández Serrata

Version 6 (February 6, 2021)

This flow shows how to read and process large amounts of data without running out of memory. Because the data is loaded from storage in buffered blocks, even a huge file never has to fit in the memory heap all at once.

The example data processor here is a byte inverter (you can call it a "NOTter"): it inverts every bit of every byte. Of course, you can use buffering for something else, such as encryption, regex find-and-replace, encoding, decoding, parsing, serializing, etc.
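
As a rough illustration of the same idea outside AM, here is a minimal Python sketch of a buffered byte inverter. It is not the flow itself: the function name invert_stream and its parameters are made up for this example, and it only assumes a readable source and a writable destination.

```python
import io


def invert_stream(src, dst, bs=4096):
    """Copy src to dst with every bit inverted, holding at most bs bytes at a time.

    Hypothetical helper for illustration; returns the number of bytes processed.
    """
    if bs <= 0:
        raise ValueError("bs must be a positive integer")
    # Lookup table that maps each byte value b to its bitwise NOT (255 - b).
    table = bytes(255 - b for b in range(256))
    total = 0
    while True:
        block = src.read(bs)  # never reads more than bs bytes
        if not block:         # empty result means end of data
            break
        dst.write(block.translate(table))
        total += len(block)
    return total


# Small in-memory demo; real use would pass open binary files instead.
src = io.BytesIO(b"\x00\xff\x0f")
dst = io.BytesIO()
invert_stream(src, dst, bs=2)
print(dst.getvalue())  # b'\xff\x00\xf0'
```

Only one block of at most bs bytes is held in memory at any moment, which is the whole point of buffering. Swapping the translate step for another transformation gives a different processor.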

bs (block size) must be specified in bytes, not kilobytes. It can be any positive integer you want, but powers of 2 are recommended. bs defines the maximum DD buffer size and the AM array size (although, because AM is a dynamic language, arrays allocate more memory than strictly necessary). Bigger blocks are faster, but with diminishing returns. Use blocks (chunks) small enough to avoid crashes and memory allocation errors.
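
To make the role of bs more concrete, the sketch below computes how many blocks a file splits into and builds a dd command line that reads one block. This is an assumption about how a block could be fetched (using dd's standard if=, bs=, skip= and count= operands with bk as the block index), not a copy of the command the flow actually runs. The helper names and the example path are hypothetical.

```python
def block_count(file_size, bs):
    """Number of blocks of bs bytes needed to cover file_size bytes (ceiling division)."""
    if bs <= 0:
        raise ValueError("bs must be a positive integer")
    return (file_size + bs - 1) // bs


def dd_read_args(path, bs, bk):
    """Argument list for a dd call that outputs only block number bk (0-based).

    Assumed invocation for illustration: skip=bk skips bk blocks of bs bytes,
    count=1 copies a single block.
    """
    return ["dd", "if=" + path, "bs=" + str(bs), "skip=" + str(bk), "count=1"]


# A 10,000-byte file read in 4096-byte blocks needs 3 blocks (the last one is partial).
print(block_count(10_000, 4096))  # 3
print(dd_read_args("/sdcard/in.bin", 4096, 2))
```

The last block is usually shorter than bs, so whatever processes the data should rely on the actual length of each block rather than on bs.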

Don't try to invert an inverted file while the original (non-inverted) file is still in the same directory with the same name. This flow always appends data to the destination instead of overwriting it, so the result would be appended to the existing file, and I didn't add runtime support for a custom destination path.

Please understand that AM is slow even for 2MB files, especially when iterating over individual bytes instead of SWords, DWords, or QWords.

Don't use this flow with files larger than 128 GB, because bk (index) could overflow. To avoid number overflow, create your own implementation of BigInt.

The DD (cmd) executable should be available on every Android device since v2.3 "Gingerbread", so compatibility should be guaranteed.

Update: Looping is now reduced by processing 2 bytes (1 SWord) "simultaneously". I haven't run benchmarks, but it should be faster. Piping dd output to the xxd, base64, or od commands is bad for backward compatibility and memory allocation (even though storage R/W is improved), so I avoided them. Also, AM's Base64 decoder doesn't support custom charsets like ISO-8859-1, so trying to decode Base64 corrupts the data. To avoid this, xxd and hexDecode would have to be used, but then memory allocation gets even worse.
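
The 2-bytes-per-iteration idea can be sketched in Python as follows. This is only an illustration of the technique, not the flow's actual blocks: the function name invert_words is invented here, and the pairing of bytes into a 16-bit value is one possible way to do it.

```python
def invert_words(block):
    """Invert all bits of block, handling two bytes (one 16-bit word) per loop iteration."""
    out = bytearray(len(block))
    even = len(block) - (len(block) % 2)  # length of the part made of whole words
    for i in range(0, even, 2):
        word = (block[i] << 8) | block[i + 1]  # pack two bytes into one 16-bit value
        word ^= 0xFFFF                         # a single XOR inverts both bytes
        out[i] = word >> 8
        out[i + 1] = word & 0xFF
    if even < len(block):
        # Odd-sized block: the last byte has no partner, so invert it on its own.
        out[even] = block[even] ^ 0xFF
    return bytes(out)


print(invert_words(b"\x00\xff\x0f"))  # b'\xff\x00\xf0'
```

The loop runs about half as many times as a byte-by-byte version. Whether that is actually faster depends on the cost of packing and unpacking in the host environment, so it is worth measuring. Blocks with an odd number of bytes (typically the last one) need the extra step shown at the end.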

4.7 average rating from 3 reviews