Investigation into ASCII ctype inherent methods performance with lookup table

Prompted by https://github.com/rust-lang/rust/issues/68983#issuecomment-613753557, I started looking into whether the performance of methods like `is_ascii_alphabetic` could be improved by using a lookup table.

The full list of methods:
- `is_ascii_alphabetic`
- `is_ascii_alphanumeric`
- `is_ascii_control`
- `is_ascii_digit`
- `is_ascii_graphic`
- `is_ascii_hexdigit`
- `is_ascii_lowercase`
- `is_ascii_punctuation`
- `is_ascii_uppercase`
- `is_ascii_whitespace`

## Implementations
Currently, all of these methods are implemented by [matching on characters/ranges](https://github.com/rust-lang/rust/blob/5fd2f06e99a985dd896684cb2c9f8c7090eca1ab/src/libcore/num/mod.rs#L4657-L4659):
```rust
#[inline]
pub const fn is_ascii_alphabetic(&self) -> bool {
    matches!(*self, b'A'..=b'Z' | b'a'..=b'z')
}
```

I investigated two ways of implementing these functions with a lookup table:

A lookup table for the entire `u8` range:  
```rust
#[inline]
pub const fn is_ascii_alphabetic(&self) -> bool {
    const LOOKUP_TABLE : [bool; 258] = ...;
    
    LOOKUP_TABLE[*self as usize]
}
```
A hybrid approach by checking if the byte is ascii first, reducing table size by half:
```rust
#[inline]
pub const fn is_ascii_alphabetic(&self) -> bool {
    const LOOKUP_TABLE : [bool; 128] = ...;
    
    self.is_ascii() && LOOKUP_TABLE[*self as usize]
}
```
I will be calling these implementations `branch`, `lookup` and `hybrid` respectively throughout the rest of this report.

Using the features `const_if_match` and `const_loop`, the lookup tables for `lookup` and `hybrid` can be easily generated:
<details>

```rust
macro lookup_table($condition:expr, $n:literal) {
    {
        const LOOKUP_TABLE: [bool; $n] = {
            let mut lookup = [false; $n];

            let mut i = 0;
            while i < $n {
                lookup[i] = $condition(&(i as u8));
                i += 1;
            }

            lookup
        };

        LOOKUP_TABLE
    }
}

#[inline]
pub const fn is_ascii_alphabetic_lookup(byte: &u8) -> bool {
    lookup_table!(u8::is_ascii_alphabetic, 256)[*self as usize]
}

#[inline]
pub const fn is_ascii_alphabetic_hybrid(byte: &u8) -> bool {
    byte.is_ascii() && lookup_table!(u8::is_ascii_alphabetic, 128)[*self as usize]
}
```
</details>

The instructions these different implementations compile down to can be compared on [godbolt](https://rust.godbolt.org/z/aFnVi4).

## Benchmark
*Note: I do not have enough experience with benchmarking to know if this approach is fully representative, let me know if you know any way the benchmark can be improved.*

The task I used for benchmarking is iterating through the bytes of a source text and counting how many of them satisfy a condition, e.g. `is_ascii_alphabetic`: 

```rust
const SOURCE_TEXT : &'static str = ...;

#[bench]
fn is_ascii_alphabetic_branch {
    $bench.iter(|| {
        let mut total = 0;
        for byte in SOURCE_TEXT.bytes() {
            if byte.is_ascii_alphabetic { total += 1; }
        }

        assert_eq!(total, ...);
    });
}
```

This results in a tight loop, which due to caching might be favorable to the lookup tables. However, some [quick searching](https://github.com/search?l=Rust&q=is_ascii_alphabetic&type=Code) for the use of the ascii methods in open source projects reveals at least a few instances of them being used in a loop or iterator filter, so the benchmark is representative of at least some real-world usage. 

As a source text I used "Hamlet, Prince of Denmark", adapted from https://www.gutenberg.org/files/27761/27761-0.txt, a primarily ascii text of 4k+ lines:

<details>
<summary>Benchmark</summary>

```rust
#![feature(test)]
#![feature(decl_macro)]
#![feature(const_if_match)]
#![feature(const_loop)]
#![feature(const_ascii_ctype_on_intrinsics)]

extern crate test;


macro lookup_table($condition:expr, $n:literal) {
    {
        const LOOKUP_TABLE: [bool; $ n] = {
            let mut lookup_table = [false; $ n];

            let mut i = 0;
            while i < $ n {
                lookup_table[i] = $ condition(&(i as u8));
                i += 1;
            }

            lookup_table
        };

        LOOKUP_TABLE
    }
}

macro bench($condition:ident, $branch_ident:ident, $hybrid_ident:ident, $lookup_ident:ident, $expected:literal) {
    #[bench]
    fn $branch_ident(bench: &mut test::Bencher) {
        bench_impl!(bench, u8::$condition, $expected);
    }

    #[bench]
    fn $hybrid_ident(bench: &mut test::Bencher) {
        #[inline]
        fn condition(byte: &u8) -> bool {
            byte.is_ascii() && lookup_table!(u8::$condition, 128)[*byte as usize]
        }

        bench_impl!(bench, condition, $expected);
    }

    #[bench]
    fn $lookup_ident(bench: &mut test::Bencher) {
        #[inline]
        fn condition(byte: &u8) -> bool {
            lookup_table!(u8::$condition, 256)[*byte as usize]
        }

        bench_impl!(bench, condition, $expected);
    }
}

macro bench_impl($bench:expr, $condition:expr, $expected:literal) {
    $bench.iter(|| {
        let mut total = 0;
        for byte in SOURCE_TEXT.bytes() {
            if $condition(&byte) { total += 1; }
        }

        assert_eq!(total, $expected);
    });
}


// "Hamlet, Prince of Denmark", adapted from https://www.gutenberg.org/files/27761/27761-0.txt
const SOURCE_TEXT: &'static str = include_str!("hamlet.txt");

bench!(is_ascii_alphabetic, is_ascii_alphabetic_branch, is_ascii_alphabetic_hybrid, is_ascii_alphabetic_lookup, 83663);
bench!(is_ascii_alphanumeric, is_ascii_alphanumeric_branch, is_ascii_alphanumeric_hybrid, is_ascii_alphanumeric_lookup, 83718);
bench!(is_ascii_control, is_ascii_control_branch, is_ascii_control_hybrid, is_ascii_control_lookup, 8216);
bench!(is_ascii_digit, is_ascii_digit_branch, is_ascii_digit_hybrid, is_ascii_digit_lookup, 55);
bench!(is_ascii_graphic, is_ascii_graphic_branch, is_ascii_graphic_hybrid, is_ascii_graphic_lookup, 94284);
bench!(is_ascii_hexdigit, is_ascii_hexdigit_branch, is_ascii_hexdigit_hybrid, is_ascii_hexdigit_lookup, 23825);
bench!(is_ascii_lowercase, is_ascii_lowercase_branch, is_ascii_lowercase_hybrid, is_ascii_lowercase_lookup, 76787);
bench!(is_ascii_punctuation, is_ascii_punctuation_branch, is_ascii_punctuation_hybrid, is_ascii_punctuation_lookup, 10566);
bench!(is_ascii_uppercase, is_ascii_uppercase_branch, is_ascii_uppercase_hybrid, is_ascii_uppercase_lookup, 6876);
bench!(is_ascii_whitespace, is_ascii_whitespace_branch, is_ascii_whitespace_hybrid, is_ascii_whitespace_lookup, 33893);
```
</details>

Files: [benches.zip](https://github.com/rust-lang/rust/files/4708850/benches.zip)

### Results
| method | `branch` | `hybrid` | `lookup` |
|---|---|---|---|
| `is_ascii_alphabetic` | 364,130 ±15,302 | 349,235 ±32,393 | 402,965 ±22,624 |
| `is_ascii_alphanumeric` | 310,435 ±12,975 | 346,025 ±18,873 | 404,880 ±23,582 |
| `is_ascii_control` | 154,802 ±17,761 | 105,160 ±10,883 | 197,493 ±73,893 |
| `is_ascii_digit` | 96,971 ± 23,020 | 98,597 ± 9,754 | 97,051 ± 4,354 |
| `is_ascii_graphic` | 274,238 ± 12,480 | 296,010 ± 24,268 | 340,445 ± 32,739 |
| `is_ascii_hexdigit` | 358,390 ± 21,165 | 280,170 ± 17,197 | 309,484 ±9,983 |
| `is_ascii_lowercase` | 329,840 ± 45,032 | 334,270 ±21,637 | 398,315 ±18,173 |
| `is_ascii_punctuation` | 372,600 ± 22,126 | 157,147 ±7,353 | 173,555 ±19,918 |
| `is_ascii_uppercase` | 131,011 ±26,059 | 135,658 ±19,618 | 142,463 ±19,191 |
| `is_ascii_whitespace` | 232,675 ±40,710 | 292,750 ±61,629 | 350,440 ±56,609 |

Adapted from the output of `cargo bench`, results are in ns.

### Analysis
It seems the `hybrid` approach with the smaller lookup table was overall as fast or faster than `lookup`, even though it has to do an extra check (smaller table fits better in cache?).

The `hybrid` approach was significantly faster than the current `branch` implementation for:
- `is_ascii_control` (105,160 ±10,883 vs. 154,802 ±17,761)
- `is_ascii_hexdigit` (280,170 ± 17,197 vs. 358,390 ± 21,165)
- `is_ascii_punctuation` (157,147 ±7,353 vs. 372,600 ± 22,126)

Looking at the current implementation for [is_ascii_hexdigit](https://github.com/rust-lang/rust/blob/5fd2f06e99a985dd896684cb2c9f8c7090eca1ab/src/libcore/num/mod.rs#L4828-L4830) and [is_ascii_punctuation](https://github.com/rust-lang/rust/blob/5fd2f06e99a985dd896684cb2c9f8c7090eca1ab/src/libcore/num/mod.rs#L4865-L4867) and [compiler output](https://rust.godbolt.org/z/TKDJws), these two methods have the most complex branches of all the ascii methods, indicating that there is indeed potential for a faster implementation using a lookup table.

Why the `hybrid` version of `is_ascii_control` is faster I can not say, maybe caching or an artifact of the way I benchmarked?

### Conclusion
From this preliminary investigation it seems that at least some of the ascii methods can be implemented faster, prompting further investigation. 

As further steps I propose first evaluating the merits of this benchmark, and then conduct more. There are still a number of questions: are the results of this benchmark correct, or is another factor interfering with the results? Is the benchmark representative of real-world usage? Is the source text influencing the benchmark?  

Once we are sure our benchmarks are measuring what we want, we can proceed with the investigation: quantify the trade-off of speed vs. memory, and compare alternate implementations. This information will inform the discussion on whether it makes sense to change the current implementations. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Investigation into ASCII ctype inherent methods performance with lookup table #72895

Implementations

Benchmark

Results

Analysis

Conclusion

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

method	`branch`	`hybrid`	`lookup`
`is_ascii_alphabetic`	364,130 ±15,302	349,235 ±32,393	402,965 ±22,624
`is_ascii_alphanumeric`	310,435 ±12,975	346,025 ±18,873	404,880 ±23,582
`is_ascii_control`	154,802 ±17,761	105,160 ±10,883	197,493 ±73,893
`is_ascii_digit`	96,971 ± 23,020	98,597 ± 9,754	97,051 ± 4,354
`is_ascii_graphic`	274,238 ± 12,480	296,010 ± 24,268	340,445 ± 32,739
`is_ascii_hexdigit`	358,390 ± 21,165	280,170 ± 17,197	309,484 ±9,983
`is_ascii_lowercase`	329,840 ± 45,032	334,270 ±21,637	398,315 ±18,173
`is_ascii_punctuation`	372,600 ± 22,126	157,147 ±7,353	173,555 ±19,918
`is_ascii_uppercase`	131,011 ±26,059	135,658 ±19,618	142,463 ±19,191
`is_ascii_whitespace`	232,675 ±40,710	292,750 ±61,629	350,440 ±56,609

Investigation into ASCII ctype inherent methods performance with lookup table #72895

Description

Implementations

Benchmark

Results

Analysis

Conclusion

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions