Text lengths

by Ricardo Fernández Serrata

Version 2 (September 8, 2021)


All the different lengths that a text can have. Because Unicode is not ASCII.

CU: Code-Units
CP: Code-Points
B: Bytes
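
As a rough illustration of how the three lengths differ, here is a minimal Python sketch (not Automate code). The helper name `text_lengths` is hypothetical. It assumes CU means UTF-16 code units, as in Java strings, and B means UTF-8 bytes; the flow itself may count bytes in a different encoding.

```python
def text_lengths(s: str) -> dict:
    """Return the code-unit, code-point and byte lengths of s.

    Assumptions (not taken from the flow): CU = UTF-16 code units,
    B = UTF-8 bytes.
    """
    return {
        "CU": len(s.encode("utf-16-le")) // 2,  # each UTF-16 code unit is 2 bytes
        "CP": len(s),                           # Python's len() counts code points
        "B": len(s.encode("utf-8")),            # UTF-8 uses 1 to 4 bytes per code point
    }


print(text_lengths("a"))           # {'CU': 1, 'CP': 1, 'B': 1}
print(text_lengths("\u00f1"))      # {'CU': 1, 'CP': 1, 'B': 2}
print(text_lengths("\U0001F600"))  # {'CU': 2, 'CP': 1, 'B': 4}
```

Only for pure ASCII text do all three numbers agree.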

The length operator (#) runs in constant time, because Java stores each string's length alongside its contents, so there is no need to scan the string. findAll() is at best linear in the size of its input, and its worst case depends on the regex: a pattern that backtracks can take EXPONENTIAL time. In other words, `#` is always fast, while findAll() is never faster than one full pass over the text.

If s is a text string, then #split(s) = #s always holds, because split(text, null) works at the CU level and yields one element per code unit.

If your flow has to check how many CPs a text has, and has to do so repeatedly on the same text, store the result of #findAll() in a variable once, and have the flow read that variable instead of calling findAll() again. Your flow will run faster and use less energy.
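
The same "scan once, reuse the result" idea, sketched in Python rather than Automate. The helpers `utf16_units` and `count_code_points` are hypothetical stand-ins for the linear scan that findAll() performs; they assume well-formed UTF-16 input.

```python
import struct


def utf16_units(s: str) -> list:
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def count_code_points(units) -> int:
    """Linear scan: every code unit except a low surrogate starts a code point."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


text = "hi \U0001F600"
units = utf16_units(text)

cp_count = count_code_points(units)  # scan once and keep the result

# Later checks read the variable instead of scanning the text again.
fits_in_limit = cp_count <= 10
is_empty = cp_count == 0
print(cp_count, fits_in_limit, is_empty)  # 4 True False
```

If the text changes, the stored count has to be recomputed.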

In general, char(x)[0] != x, not just because `x` might be a non-integer, but because char() can return a surrogate pair (for code points above U+FFFF), while `[0]` selects only the first code unit and ignores the second surrogate CU of the pair.
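
A small Python sketch of why the first code unit can differ from the code point. The function `first_code_unit` is a hypothetical emulation of char(x)[0], assuming the result is compared as a number; it does not handle non-integer input or code points inside the surrogate range itself.

```python
def first_code_unit(code_point: int) -> int:
    """Encode one code point as UTF-16 and return its first code unit."""
    data = chr(code_point).encode("utf-16-le")
    return int.from_bytes(data[:2], "little")


print(hex(first_code_unit(0x41)))     # 0x41, same as the input
print(hex(first_code_unit(0x1F600)))  # 0xd83d, the high surrogate, not 0x1f600
```

For any code point up to U+FFFF (outside the surrogate range) the round trip is lossless; above that, the first code unit is always a high surrogate in the range 0xD800 to 0xDBFF.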
