Text lengths

by Ricardo Fernández Serrata

Version 2 (May 18, 2022)


All the different lengths that a string can have, because Unicode is not ASCII.

CU: Code-Units
CP: Code-Points
B: Bytes
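
As a rough illustration in plain Java (not Automate's own expression language), the three lengths can be measured as below. The class and method names are made up for this sketch, and UTF-8 is an assumed encoding for the byte count.

```java
import java.nio.charset.StandardCharsets;

class TextLengths {
    // CU: UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // CP: Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // B: bytes, here in UTF-8 (an assumption; other encodings give other counts).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "a\u00E9\uD83D\uDE00"; // "a", U+00E9 and U+1F600
        System.out.println(codeUnits(s) + " CU, " + codePoints(s) + " CP, " + utf8Bytes(s) + " B");
    }
}
```

For the sample string the three counts all differ, which is the whole point of telling CU, CP and B apart.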

The length operator (`#`) returns its value in constant time, because Java stores the length as part of each string's metadata, so there is no need to scan the string. `findAll()`, in contrast, takes at least linear time even in the best case, and its worst case depends on the regex: one with heavy backtracking can take EXPONENTIAL time. In short, `#` is always fast, while `findAll()` gets slower as its input grows.
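
A sketch in plain Java of why the two costs differ. It assumes CPs are counted by matching one code point per regex match, which may not be exactly what this flow passes to `findAll()`; `ScanCost` and `countByRegex` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanCost {
    // DOTALL so that line terminators are counted as well.
    private static final Pattern ANY_CODE_POINT = Pattern.compile(".", Pattern.DOTALL);

    // Walks the whole string: one regex match per code point, so linear time.
    static int countByRegex(String s) {
        Matcher m = ANY_CODE_POINT.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00\n"; // "a", U+1F600 and a newline
        // length() just reads a stored value; countByRegex() has to scan.
        System.out.println(s.length() + " CU, " + countByRegex(s) + " CP");
    }
}
```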

If `s` is a text string, then `#split(s) = #s` always holds, because `split(text, null)` works at the CU level.
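
A plain-Java sketch of what splitting at the CU level means. `splitCodeUnits` is a hypothetical helper written for this example; it is not Automate's `split`.

```java
class CodeUnitSplit {
    // Cuts the string after every UTF-16 code unit, so a surrogate pair
    // ends up as two separate pieces, each holding a lone surrogate.
    static String[] splitCodeUnits(String s) {
        String[] parts = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            parts[i] = s.substring(i, i + 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "x\uD83D\uDE00"; // "x" followed by U+1F600
        System.out.println(splitCodeUnits(s).length + " pieces, " + s.length() + " CUs");
    }
}
```

The number of pieces always equals the CU count, never the CP count, whenever the text contains a surrogate pair.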

If your flow has to check how many CPs a text has, and has to do it repeatedly on the same text, store the result of `findAll` once in a variable, and code your flow to read the variable instead of calling `findAll` again. Your flow will run faster and use less energy.
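
The same idea, sketched in plain Java rather than as flow blocks. `CachedCodePoints` and its `scans` counter are hypothetical and exist only to show that the text is scanned once.

```java
class CachedCodePoints {
    private final String text;
    private int cached = -1; // -1 means "not computed yet"
    int scans = 0;           // how many times the text was actually scanned

    CachedCodePoints(String text) {
        this.text = text;
    }

    // Scans the text on the first call only; later calls read the stored value.
    int count() {
        if (cached < 0) {
            cached = text.codePointCount(0, text.length());
            scans++;
        }
        return cached;
    }

    public static void main(String[] args) {
        CachedCodePoints c = new CachedCodePoints("a\uD83D\uDE00");
        for (int i = 0; i < 3; i++) {
            System.out.println(c.count());
        }
        System.out.println("scans: " + c.scans);
    }
}
```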

In general, `char(x)[0] != x`, not just because `x` might be a non-integer, but also because `char` can return a surrogate pair, while `[0]` selects only the first CU (ignoring the second surrogate CU of the pair).
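
The surrogate-pair half of this can be reproduced in plain Java, as a sketch. `Character.toChars` stands in for Automate's `char` here, which is an assumption about how closely the two correspond; the class and method names are hypothetical.

```java
class FirstCodeUnit {
    // Builds the text for code point x, then reads back only its first code unit.
    static int firstCodeUnit(int x) {
        String s = new String(Character.toChars(x));
        return s.charAt(0);
    }

    // Reads back the whole first code point instead.
    static int firstCodePoint(int x) {
        String s = new String(Character.toChars(x));
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        int x = 0x1F600; // outside the Basic Multilingual Plane
        System.out.println(Integer.toHexString(firstCodeUnit(x)));  // the high surrogate
        System.out.println(Integer.toHexString(firstCodePoint(x))); // x itself
    }
}
```

For code points that fit in one CU the two methods agree; they diverge only when `x` needs a surrogate pair.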

Related: hsivonen.fi/string-length

LICENSE: https://unlicense.org
