Swift 中的字符串是如何比较的?

今天写二叉树用到了 Comparable 协议,突然想到 Swift 中 String 也是遵守 Comparable 的,它们是如何互相比较的呢?

🤔

首先我们来看一下下面这一段代码:

0x00.png

🤨?

🤯

首先我们查找一下 String 的定义中的 extension:

0x01.jpg

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
extension String : Equatable {
/// Returns a Boolean value indicating whether two values are equal.
///
/// Equality is the inverse of inequality. For any values `a` and `b`,
/// `a == b` implies that `a != b` is `false`.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
@inlinable public static func == (lhs: String, rhs: String) -> Bool
}
extension String : Comparable {
/// Returns a Boolean value indicating whether the value of the first
/// argument is less than that of the second argument.
///
/// This function is the only requirement of the `Comparable` protocol. The
/// remainder of the relational operator functions are implemented by the
/// standard library for any type that conforms to `Comparable`.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
@inlinable public static func < (lhs: String, rhs: String) -> Bool
}

看来 String 确实是遵守 Comparable 协议的,那我们就得去找一下源码了。 下载 swift 源码

我们定位到 swift/stdlib/public/core/String.swift, 尝试搜索 Comparable 关键字,但是……?

0x02.png

不要灰心,通过查询 API Reference可以得知 Comparable 也是标准库的一份子,所以我们打开一下同目录的 Comparable.swift

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
public protocol Comparable : Equatable {
/// Returns a Boolean value indicating whether the value of the first
/// argument is less than that of the second argument.
///
/// This function is the only requirement of the `Comparable` protocol. The
/// remainder of the relational operator functions are implemented by the
/// standard library for any type that conforms to `Comparable`.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
static func < (lhs: Self, rhs: Self) -> Bool
/// Returns a Boolean value indicating whether the value of the first
/// argument is less than or equal to that of the second argument.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
static func <= (lhs: Self, rhs: Self) -> Bool
/// Returns a Boolean value indicating whether the value of the first
/// argument is greater than or equal to that of the second argument.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
static func >= (lhs: Self, rhs: Self) -> Bool
/// Returns a Boolean value indicating whether the value of the first
/// argument is greater than that of the second argument.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
static func > (lhs: Self, rhs: Self) -> Bool
}
extension Comparable {
/// Returns a Boolean value indicating whether the value of the first argument
/// is greater than that of the second argument.
///
/// This is the default implementation of the greater-than operator (`>`) for
/// any type that conforms to `Comparable`.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
@inlinable
public static func > (lhs: Self, rhs: Self) -> Bool {
return rhs < lhs
}
/// Returns a Boolean value indicating whether the value of the first argument
/// is less than or equal to that of the second argument.
///
/// This is the default implementation of the less-than-or-equal-to
/// operator (`<=`) for any type that conforms to `Comparable`.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
@inlinable
public static func <= (lhs: Self, rhs: Self) -> Bool {
return !(rhs < lhs)
}
/// Returns a Boolean value indicating whether the value of the first argument
/// is greater than or equal to that of the second argument.
///
/// This is the default implementation of the greater-than-or-equal-to operator
/// (`>=`) for any type that conforms to `Comparable`.
///
/// - Parameters:
/// - lhs: A value to compare.
/// - rhs: Another value to compare.
/// - Returns: `true` if `lhs` is greater than or equal to `rhs`; otherwise,
/// `false`.
@inlinable
public static func >= (lhs: Self, rhs: Self) -> Bool {
return !(lhs < rhs)
}
}

但是这里只有 对比较符的实现,而且都是基于 < 的实现,那么 < 又是怎么实现的呢?

🧐

我们继续来查找一下源码,因为源码都解决不能你的问题,你再怎么Google也是没有用的

0x03.png

这个名字吸引了我的注意,点进去看一下:

1
2
3
4
5
6
7
extension String : Comparable {
@inlinable @inline(__always) // For the bitwise comparision
@_effects(readonly)
public static func < (lhs: String, rhs: String) -> Bool {
return _stringCompare(lhs._guts, rhs._guts, expecting: .less)
}
}

原来在标准库中 Comparable 本身并没有默认的实现方法,我们再打开StringComparison.swift

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
@inlinable @inline(__always) // top-level fastest-paths
@_effects(readonly)
internal func _stringCompare(
_ lhs: _StringGuts, _ rhs: _StringGuts, expecting: _StringComparisonResult
) -> Bool {
if lhs.rawBits == rhs.rawBits { return expecting == .equal }
return _stringCompareWithSmolCheck(lhs, rhs, expecting: expecting)
}
@usableFromInline
@_effects(readonly)
internal func _stringCompareWithSmolCheck(
_ lhs: _StringGuts, _ rhs: _StringGuts, expecting: _StringComparisonResult
) -> Bool {
// ASCII small-string fast-path:
if lhs.isSmallASCII && rhs.isSmallASCII {
let lhsRaw = lhs.asSmall._storage
let rhsRaw = rhs.asSmall._storage
if lhsRaw.0 != rhsRaw.0 {
return _lexicographicalCompare(
lhsRaw.0.bigEndian, rhsRaw.0.bigEndian, expecting: expecting)
}
return _lexicographicalCompare(
lhsRaw.1.bigEndian, rhsRaw.1.bigEndian, expecting: expecting)
}
return _stringCompareInternal(lhs, rhs, expecting: expecting)
}

我们由此可以知道,如果 .isSmallASCII 是相同的,则把其 .asSmall._storage 赋值给变量 lhsRaw && rhsRaw 。这个方法对 ASCII 字符串对比做了优化,其他情况用_stringCompareInternal:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
@inline(never) // Keep `_stringCompareWithSmolCheck` fast-path fast
@usableFromInline
@_effects(readonly)
internal func _stringCompareInternal(
_ lhs: _StringGuts, _ rhs: _StringGuts, expecting: _StringComparisonResult
) -> Bool {
guard _fastPath(lhs.isFastUTF8 && rhs.isFastUTF8) else {
return _stringCompareSlow(lhs, rhs, expecting: expecting)
}
let isNFC = lhs.isNFC && rhs.isNFC
return lhs.withFastUTF8 { lhsUTF8 in
return rhs.withFastUTF8 { rhsUTF8 in
return _stringCompareFastUTF8(
lhsUTF8, rhsUTF8, expecting: expecting, bothNFC: isNFC)
}
}
}

我们再看到 _stringCompareSlow

1
2
3
4
5
6
7
@_effects(readonly)
private func _stringCompareSlow(
_ lhs: _StringGuts, _ rhs: _StringGuts, expecting: _StringComparisonResult
) -> Bool {
return _stringCompareSlow(
lhs, 0..<lhs.count, rhs, 0..<rhs.count, expecting: expecting)
}

而其中 _lexicographicalCompare 定义如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
@_effects(readonly)
@inline(__always)
private func _lexicographicalCompare<I: FixedWidthInteger>(
_ lhs: I, _ rhs: I, expecting: _StringComparisonResult
) -> Bool {
return expecting == .equal ? lhs == rhs : lhs < rhs
}
...
internal enum _StringComparisonResult {
case equal
case less
@inlinable @inline(__always)
internal init(signedNotation int: Int) {
_internalInvariant(int <= 0)
self = int == 0 ? .equal : .less
}
@inlinable @inline(__always)
static func ==(
_ lhs: _StringComparisonResult, _ rhs: _StringComparisonResult
) -> Bool {
switch (lhs, rhs) {
case (.equal, .equal): return true
case (.less, .less): return true
default: return false
}
}
}

这下我们大致明白 _stringCompareWithSmolCheck 的功能了,为了更加充分的理解,我们可以在 StringGuts.swift 中找到如下 _StringGuts 的定义:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
@_fixed_layout
public // SPI(corelibs-foundation)
struct _StringGuts {
@usableFromInline
internal var _object: _StringObject
@inlinable @inline(__always)
internal init(_ object: _StringObject) {
self._object = object
_invariantCheck()
}
// Empty string
@inlinable @inline(__always)
init() {
self.init(_StringObject(empty: ()))
}
}

我们继续看下去:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
extension _StringGuts {
@inlinable
internal var rawBits: _StringObject.RawBitPattern {
@inline(__always) get { return _object.rawBits }
}
}
....
@inlinable
internal var asSmall: _SmallString {
@inline(__always) get { return _SmallString(_object) }
}
internal var isSmallASCII: Bool {
@inline(__always) get { return _object.isSmall && _object.smallIsASCII }
}

好吧,asSmall 是被声明为_SmallString 类型, 那我们在打开 SmallString.swift 看看:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
@_fixed_layout @usableFromInline
internal struct _SmallString {
@usableFromInline
internal typealias RawBitPattern = (UInt64, UInt64)
// Small strings are values; store them raw
@usableFromInline
internal var _storage: RawBitPattern
@inlinable
internal var rawBits: RawBitPattern {
@inline(__always) get { return _storage }
}
...

由此段代码可知,_SmallString(UInt64, UInt64) 的别称。

从以上可以看出,在 swift中,StringCharacters 的集合,我们在 Characters.swift 中可以找到:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
extension Character : Equatable {
@inlinable @inline(__always)
@_effects(readonly)
public static func == (lhs: Character, rhs: Character) -> Bool {
return lhs._str == rhs._str
}
}
extension Character : Comparable {
@inlinable @inline(__always)
@_effects(readonly)
public static func < (lhs: Character, rhs: Character) -> Bool {
return lhs._str < rhs._str
}
}

我们继续深挖:

1
2
3
4
5
internal init(unchecked str: String) {
self._str = str
_invariantCheck()
}
}

Assert.swift 中:

1
2
3
4
5
6
7
8
9
10
11
12
@usableFromInline @_transparent
internal func _internalInvariant(
_ condition: @autoclosure () -> Bool, _ message: StaticString = StaticString(),
file: StaticString = #file, line: UInt = #line
) {
#if INTERNAL_CHECKS_ENABLED
if !_branchHint(condition(), expected: true) {
_fatalErrorMessage("Fatal error", message, file: file, line: line,
flags: _fatalErrorFlags())
}
#endif
}

我们再找到:swift-corelibs-foundation/bootstrap/x86_64-apple-darwin/usr/local/include/unicode/coll.h

1
2
3
4
5
6
7
8
9
10
/**
* LESS is returned if source string is compared to be less than target
* string in the compare() method.
* EQUAL is returned if source string is compared to be equal to target
* string in the compare() method.
* GREATER is returned if source string is compared to be greater than
* target string in the compare() method.
* @see Collator#compare
* @deprecated ICU 2.6. Use C enum UCollationResult defined in ucol.h
*/

我们再找到 swift-corelibs-foundation/bootstrap/x86_64-apple-darwin/usr/local/include/unicode/unistr.h

1
2
3
4
5
6
7
8
/**
* characters (code units or code points) in a UnicodeString.
* It's possible not only to create an
* iterator that iterates over an entire UnicodeString, but also to
* create one that iterates over only a subrange of a UnicodeString
* (iterators over different subranges of the same UnicodeString don't
* compare equal).
**/

官方文档已经告诉了我们一切由此我们可以得出不同字符串的比较方式不同的,对于普通的 ASCII字符串,采用比较首位大小的方式进行判断,而遇见Unicode字符串时,如果两个字符串或两个字符值的扩展字符集规范相同,那么它们将被视作相等。如果它们含有相同的意义与表现形式,那么即使组成它们的 Unicode 标量元素不一样,它们的扩展字符集规范也会被视作相等。

举个例子,LATIN SMALL LETTER E WITH ACUTEU+00E9)在规范上等同于 LATIN SMALL LETTER E (U+0065)。 这两种都是使用扩展字符集表示字符 é 的有效方法,所以他们被视作在规范上相等。

1
2
3
4
5
6
7
8
9
10
// "Voulez-vous un café?" using LATIN SMALL LETTER E WITH ACUTE
let eAcuteQuestion = "Voulez-vous un caf\u{E9}?"
// "Voulez-vous un café?" using LATIN SMALL LETTER E and COMBINING ACUTE ACCENT
let combinedEAcuteQuestion = "Voulez-vous un caf\u{65}\u{301}?"
if eAcuteQuestion == combinedEAcuteQuestion {
print("These two strings are considered equal")
}
// Prints "These two strings are considered equal"

相反的,英语中的 LATIN CAPITAL LETTER A (U+0041, or "A") 与俄语中的 CYRILLIC CAPITAL LETTER A (U+0410, or "А") 是不相等的。虽然它们两者十分相似,但是它们没有相同的语义:

1
2
3
4
5
6
7
8
let latinCapitalLetterA: Character = "\u{41}"
let cyrillicCapitalLetterA: Character = "\u{0410}"
if latinCapitalLetterA != cyrillicCapitalLetterA {
print("These two characters are not equivalent.")
}
// Prints "These two characters are not equivalent."

当一个 Unicode 字符串被写入到一个文本文件或其他存储空间时,在字符串中的 Unicode 标量将会其编码成 Unicode 定义的编码形式中的一种。每种形式都会将字符串编码成更小块的代码单元。其中包含了把字符串编码成 8 位代码单元的 UTF-8 编码形式,以及能够将字符串编码成 16 位代码单元的 UTF-16 编码形式,和将字符串编码成 32 位代码单元的 UTF-32 代码形式。通过为每个标准化后的字符串生成对照元素数组,根据对照元素数组,生成二进制的排序 Key,对比二进制数据的大小,其结果就是原字符串的大小。

再回头看看最前面的那几行代码,恍然大悟。

Swift 5.0

🥳

Comparable

Foundation

swift-lldb

swift

Title: Swift 中的字符串是如何比较的?

Author: Tuski

Published: 04/07/2019 - 21:08:57

Updated: 04/08/2019 - 15:35:45

Link: http://www.perphet.com/2019/04/Swift-中的字符串是如何比较的?/

Protocol: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Reprinted please keep the original link and author

Thx F Sup