[RFC] StringHelper: check if string has utf8 chars before doing utf8 conversion? #19

@andrepereiradasilva

Steps to reproduce the issue

I was testing the performance of StringHelper calls.
As far as I understand, they are mostly a way to make native PHP functions UTF-8 compatible.

But this adds some performance overhead, since even pure ASCII strings go through the UTF-8 conversion.

So I ran some tests with 1000 iterations in 3 scenarios:

  • Current: use the StringHelper method
  • Utf8CheckBefore: first check utf8_decode($str) === $str (or even StringHelper::is_ascii($str) === true) to decide whether to use the StringHelper method or the native PHP method
  • Native: use the native PHP method (with UTF-8 issues)
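
The Utf8CheckBefore scenario could look roughly like the sketch below. This is only an illustration, not the StringHelper code: `isAsciiOnly`, `strtoupperCheckBefore` and `strposCheckBefore` are hypothetical names, the `mb_*` functions stand in for the UTF-8 aware StringHelper methods (so mbstring is assumed), and a byte-range regex replaces the `utf8_decode($str) === $str` check.

```php
<?php
// Hypothetical helpers for illustration; none of these names exist in StringHelper.

/** Returns true when $str contains only 7-bit ASCII bytes. */
function isAsciiOnly(string $str): bool
{
    // Without the /u modifier PCRE matches raw bytes, so any byte >= 0x80
    // means the string is not plain ASCII.
    return preg_match('/[\x80-\xFF]/', $str) === 0;
}

/** Upper-cases $str, taking the native path for plain ASCII input. */
function strtoupperCheckBefore(string $str): string
{
    if (isAsciiOnly($str)) {
        return strtoupper($str);
    }

    // Stand-in for the UTF-8 aware StringHelper method.
    return mb_strtoupper($str, 'UTF-8');
}

/** Finds $needle in $haystack and returns a character offset, or false. */
function strposCheckBefore(string $haystack, string $needle, int $offset = 0)
{
    // In an ASCII haystack byte offsets equal character offsets,
    // so the native function already gives the right answer.
    if (isAsciiOnly($haystack)) {
        return strpos($haystack, $needle, $offset);
    }

    return mb_strpos($haystack, $needle, $offset, 'UTF-8');
}
```

The trade-off is that every call pays for one extra scan of the string, which is the cost that shows up in the UTF-8 rows of the results.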

I used two strings to test:

  • ASCII String: "Without utf-8 chars"
  • UTF-8 String: "With utf-8 chars 纹身馆简介你好"
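
For anyone who wants to repeat the measurements, a timing loop along these lines should be enough. It is a minimal sketch and not the script used for the numbers in this issue: `benchMs` is a hypothetical helper, and `mb_strtoupper` is used as a stand-in for the StringHelper call so the snippet runs without the library (mbstring assumed).

```php
<?php
/**
 * Runs $fn $iterations times and returns the elapsed wall-clock time in milliseconds.
 * Hypothetical helper, for illustration only.
 */
function benchMs(callable $fn, int $iterations = 1000): float
{
    $start = microtime(true);

    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return (microtime(true) - $start) * 1000.0;
}

$strings = [
    'ASCII String' => 'Without utf-8 chars',
    'UTF-8 String' => 'With utf-8 chars 纹身馆简介你好',
];

foreach ($strings as $label => $str) {
    // Native PHP function (not UTF-8 safe).
    $native = benchMs(function () use ($str) {
        strtoupper($str);
    });

    // UTF-8 aware call, standing in for the StringHelper method.
    $utf8Aware = benchMs(function () use ($str) {
        mb_strtoupper($str, 'UTF-8');
    });

    printf("%s: [Utf8Aware: %.3f ms | Native: %.3f ms]\n", $label, $utf8Aware, $native);
}
```

Absolute numbers will differ per machine and PHP version, so only the relative differences between scenarios are meaningful.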

Results on some methods

strpos tests

> ASCII String: [Current: 1.308 ms | Utf8CheckBefore:  0.740 ms | Native: 0.140 ms]
> UTF-8 String: [Current: 1.313 ms | Utf8CheckBefore:  2.268 ms | Native: 0.132 ms]

strtoupper tests

> ASCII String: [Current: 6.142 ms | Utf8CheckBefore:  0.339 ms | Native: 0.080 ms]
> UTF-8 String: [Current: 6.977 ms | Utf8CheckBefore:  7.769 ms | Native: 0.103 ms]

strrev tests

> ASCII String: [Current: 3.864 ms | Utf8CheckBefore:  0.378 ms  | Native: 0.095 ms]
> UTF-8 String: [Current: 4.524 ms | Utf8CheckBefore:  5.275 ms  | Native: 0.111 ms]

Remarks

It's clear that the Utf8CheckBefore method is much faster on plain ASCII strings (from roughly 1.8x for strpos to about 18x for strtoupper in these tests), and somewhat slower on strings with UTF-8 chars (about 0.75 to 0.96 ms more per 1000 iterations).
Since much of the StringHelper usage across the CMS (and overall) should involve plain ASCII strings, wouldn't it be better to check whether the string has UTF-8 chars before deciding whether to use the UTF-8 methods or not?

System information (as much as possible)

String 2.0
PHP 7.0.19 with the mbstring extension (without it, it is even slower)

Additional comments

@frankmayer I would appreciate your contribution here too
