What is a character?
It is now generally accepted that a character has a code point and an encoding, and that it belongs to a character set.
A character set can specify both the code points and the encoding, or only the code points. For example, the Unicode character set, which aims to represent all the characters in the world by assigning each one a code point, specifies multiple encoding schemes, such as utf-8, utf-16, and utf-32.
To give a concrete example, the character 0 in Unicode has the code point U+0030. Its encoding depends on which encoding scheme you use. In utf-8, its encoding is 00110000. In utf-16, with big-endian byte order, its encoding is 0000000000110000, and in utf-32, with big-endian byte order, its encoding is 00000000000000000000000000110000.
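The bit patterns above can be reproduced in Swift. This is a minimal sketch: the helper bits(_:width:) is a hypothetical name introduced here for illustration, and since Swift has no utf-32 view, the Unicode scalar value is used as the 32-bit code unit. Byte order does not appear at this level, because the views expose code units rather than raw bytes.

```swift
// Hypothetical helper: render an integer as a zero-padded binary string.
func bits<T: BinaryInteger>(_ value: T, width: Int) -> String {
    let raw = String(value, radix: 2)
    return String(repeating: "0", count: max(0, width - raw.count)) + raw
}

let zero: Character = "0"

// The code point of "0" in hexadecimal.
let codePoint = String(zero.unicodeScalars.first!.value, radix: 16, uppercase: true)

// One entry per code unit in each encoding.
let utf8Bits = zero.utf8.map { bits($0, width: 8) }
let utf16Bits = zero.utf16.map { bits($0, width: 16) }
// Assumption: the scalar value stands in for the utf-32 code unit.
let utf32Bits = zero.unicodeScalars.map { bits($0.value, width: 32) }

print("U+00\(codePoint)")  // U+0030
print(utf8Bits)            // ["00110000"]
print(utf16Bits)           // ["0000000000110000"]
print(utf32Bits)           // ["00000000000000000000000000110000"]
```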
That said, some human-readable characters can be formed from one or more Unicode code points. For example, the letter é is represented in Unicode either by the single code point U+00E9, which is the LATIN SMALL LETTER E WITH ACUTE é, or by the combination of the two code points U+0065, which is the LATIN SMALL LETTER E e, and U+0301, which is the COMBINING ACUTE ACCENT ́.
Swift goes further down this road than most other languages. In Swift, a character is not a Unicode code point, but the combination of one or more Unicode code points that together form one human-readable character (an extended grapheme cluster). In other words, é written as the combination of the two Unicode code points U+0065 and U+0301 is not two characters in Swift, but only one.
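A short sketch of what this means in practice: the same word spelled with the precomposed é and with the combining sequence has the same number of characters, even though the number of code points differs. The variable names are illustrative only.

```swift
let precomposed = "caf\u{E9}"    // é as the single code point U+00E9
let decomposed = "cafe\u{301}"   // e (U+0065) followed by U+0301

// Both strings contain four characters.
print(precomposed.count, decomposed.count)  // 4 4

// The number of code points differs.
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)  // 4 5

// The last element of the decomposed string is one Character made of two code points.
let last: Character = decomposed.last!
print(last.unicodeScalars.count)  // 2
```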
Introspection: know your data
Introspection is the ability to ask questions about your data, such as whether it is this or that, and to view or get a description of the data.
Swift does not offer primitive types in the way other languages do, so a character is not a number, as it is in C, but a structure. A character has a set of properties that answer certain questions.
var isASCII: Bool
var isLetter: Bool
var isCased: Bool
var isUppercase: Bool
var isLowercase: Bool
var isNumber: Bool
var isWholeNumber: Bool
var isHexDigit: Bool
var isSymbol: Bool
var isMathSymbol: Bool
var isCurrencySymbol: Bool
var isPunctuation: Bool
var isWhitespace: Bool
var isNewline: Bool
While programming, you will often need to ask some of these questions: is this whitespace, is this a letter, is this a number?
The answers to all of these questions are defined by the Unicode standard.
var c: Character = "ࠀ" // SAMARITAN LETTER ALAF
c.isASCII // false
c.isLetter // true
c.isCased // false
c.isUppercase // false
c.isLowercase // false
c = Character("⅛") // VULGAR FRACTION ONE EIGHTH
c.isNumber // true
c.isWholeNumber // false
c = Character("F") // LATIN CAPITAL LETTER F
c.isHexDigit // true
c = Character("∏") // N-ARY PRODUCT
c.isSymbol // true
c.isMathSymbol // true
c.isCurrencySymbol // false
c = Character("¶") // PILCROW SIGN
c.isPunctuation // true
c = "\u{00A0}" // NO-BREAK SPACE
c.isWhitespace // true
c = Character("\n")
c.isWhitespace // true
c.isNewline // true
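These properties become useful when walking over a string character by character. The following is a minimal sketch; the function name summarize and the shape of its result are assumptions made for illustration, not part of the standard library.

```swift
// Hypothetical helper: count characters by category using Character properties.
func summarize(_ text: String) -> (letters: Int, numbers: Int, whitespace: Int, other: Int) {
    var result = (letters: 0, numbers: 0, whitespace: 0, other: 0)
    for char in text {
        if char.isLetter {
            result.letters += 1
        } else if char.isNumber {
            result.numbers += 1
        } else if char.isWhitespace {
            result.whitespace += 1
        } else {
            result.other += 1
        }
    }
    return result
}

let summary = summarize("Café 5 = fun!")
print(summary.letters)     // 7
print(summary.numbers)     // 1
print(summary.whitespace)  // 3
print(summary.other)       // 2
```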
A character value can be printed using the print function.
let c = Character("∏")
print(c) // ∏
If what you need is an encoding view of a character, in other words the encoding of the character, you can use the character's utf8 and utf16 properties.
let c: Character = "વ" // GUJARATI LETTER VA
let utf8View = c.utf8 // Get the utf-8 encoding
utf8View.count // Number of bytes used to encode વ: 3
utf8View.forEach { print($0) } // The bytes used to encode વ are: 224 170 181
let utf16View = c.utf16 // Get the utf-16 encoding of વ
utf16View.count // Number of 16-bit code units needed to encode વ: 1
utf16View.forEach { print($0) } // The encoding of વ in utf-16 is: 2741
What if you want to view the Unicode code points that make up a character? In that case, you can use the character's unicodeScalars property.
var char: Character = "વ" // GUJARATI LETTER VA
var unicodeScalars = char.unicodeScalars // Get the Unicode scalars that make up the character વ
unicodeScalars.count // The code point of વ is U+0AB5, so વ is formed of only one Unicode scalar: 1
unicodeScalars.forEach { print($0) } // Print the Unicode scalars that make up વ. The result is: વ
char = "\u{65}\u{301}"
print(char) // char is the human-readable character é, formed of the two Unicode scalars \u{65} and \u{301}
unicodeScalars = char.unicodeScalars // Get the Unicode scalars that make up é
unicodeScalars.count // The count of the Unicode scalars that make up é: 2
unicodeScalars.forEach { print($0) } // The Unicode scalars that make up é are: e ́
A character in Swift does not have a count property, since it is only one human-readable character. A character's encoding views, on the other hand, do have a count property: in utf-8, each code point is encoded using one to four bytes, and in utf-16, each code point is encoded using one or two 16-bit units.
Additionally, as shown in the previous examples, a character in Swift can be formed of more than one Unicode code point. As such, a character's UnicodeScalarView, which is the view of the character's Unicode code points, also has a count property.
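To see how these counts relate, here is a small sketch using a flag character, which is a single human-readable character built from two code points that each need four bytes in utf-8 and two 16-bit units in utf-16. The variable names are illustrative.

```swift
// REGIONAL INDICATOR SYMBOL LETTER U followed by REGIONAL INDICATOR SYMBOL LETTER S.
let flag: Character = "\u{1F1FA}\u{1F1F8}"

let scalarCount = flag.unicodeScalars.count  // Number of code points
let utf8Count = flag.utf8.count              // Number of bytes in utf-8
let utf16Count = flag.utf16.count            // Number of 16-bit units in utf-16
let characterCount = String(flag).count      // Number of characters once wrapped in a string

print(scalarCount, utf8Count, utf16Count, characterCount)  // 2 8 4 1
```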
Comparison
Swift is one of the few programming languages that compares characters correctly out of the box, in the sense that canonically equivalent characters compare as equal. By contrast, in Python:
>>> str_1 = "\u00E9"  # 'é'
>>> str_2 = "\u0065\u0301"  # 'é'
>>> str_1 == str_2
False
whereas in Swift:
var char_1: Character = "\u{E9}" // é
var char_2: Character = "\u{65}\u{301}" // é
char_1 == char_2 // true
Swift supports comparing characters for equality (==), inequality (!=), and order, using less than (<), less than or equal (<=), greater than (>), and greater than or equal (>=).
var char_1: Character = "f" // f
char_1.unicodeScalars.forEach { print($0.value) } // Decimal values of the Unicode scalars of f: 102
var char_2: Character = "\u{E9}" // é
char_2.unicodeScalars.forEach { print($0.value) } // Decimal values of the code points of é, written as \u{E9}: 233
var char_3: Character = "\u{65}\u{301}" // é
char_3.unicodeScalars.forEach { print($0.value) } // Decimal values of the code points of é, written as \u{65}\u{301}: 101 769
char_1 < char_2 // f < é written as \u{E9}: true
char_2 == char_3 // é written as \u{E9} is equal to é written as \u{65}\u{301}: true
char_1 < char_3 // f < é written as \u{65}\u{301}: true
Character comparison in Swift is not locale-dependent. Locale can affect, for example, currency and date formatting.
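One consequence of locale-independent ordering can be sketched by sorting a few characters. The result follows the Unicode-based ordering rather than a dictionary ordering, so é lands after z, whereas a French dictionary would place it next to e. The variable names are illustrative.

```swift
let letters: [Character] = ["z", "\u{E9}", "a", "e"]

// sorted() uses the same < operator shown above.
let sortedLetters = letters.sorted()

print(sortedLetters)  // ["a", "e", "z", "é"]
```

If you need dictionary-style ordering for a given language, that is a separate, locale-aware operation and is not what the comparison operators on Character provide.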
Ways to create a character
You can create a character by writing it directly, enclosed in double quotes. Since a string literal is also delimited by double quotes, when creating a character from a literal you must specify that the variable or constant being created is of the Character type.
var char_1: Character = "a" // a
A Unicode escape sequence, "\u{hh..}", can also be used when creating a character. Between the braces of the escape sequence, you must have between one and eight hexadecimal digits, representing the character's Unicode code point. If the character is made of more than one Unicode code point, you can use more than one Unicode escape sequence.
var char: Character = "\u{69}" // i
char = "\u{65}\u{301}" // é
Additionally, there are various initializers that can be used to create a character. What follows are examples of each.
var char = Character("a")
char = Character("\u{65}\u{301}") // Create characters from a string
char = Character(Unicode.Scalar(97)) // Create a character from a Unicode code point specified in decimal. The code point 97, in decimal, is the character a.
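When the code point is only known at run time, it may not be a valid Unicode scalar, so the Unicode.Scalar initializer that takes a UInt32 returns an optional. The following is a minimal sketch; the function name makeCharacter(codePoint:) is hypothetical.

```swift
// Hypothetical helper: build a Character from a code point, or nil if it is not a valid scalar.
func makeCharacter(codePoint: UInt32) -> Character? {
    guard let scalar = Unicode.Scalar(codePoint) else { return nil }
    return Character(scalar)
}

let letterA = makeCharacter(codePoint: 0x61)         // The character a
let surrogate = makeCharacter(codePoint: 0xD800)     // nil: surrogate code points are not scalars
let outOfRange = makeCharacter(codePoint: 0x110000)  // nil: beyond the Unicode range

print(letterA as Any, surrogate as Any, outOfRange as Any)
```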
Getting the ASCII, whole number, and hex digit values
A character in Swift also defines properties that return an optional value. They let you get, when one exists, the hexadecimal digit value, the whole number value, and the ASCII value of a character.
var char: Character = "\u{61}\u{030A}" // This is the letter å, written as the code point U+0061, the letter a, followed by the code point U+030A, the COMBINING RING ABOVE.
char.hexDigitValue // nil
char.wholeNumberValue // nil
char.asciiValue // nil
char = "a"
char.hexDigitValue // Optional(10)
char.wholeNumberValue // nil
char.asciiValue // Optional(97)
char = "1"
char.hexDigitValue // Optional(1)
char.wholeNumberValue // Optional(1)
char.asciiValue // Optional(49)
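As an illustration of how these optional properties can be put to work, here is a minimal sketch that parses a hexadecimal string digit by digit with hexDigitValue. The function name parseHex is hypothetical, and the sketch does not guard against integer overflow on very long inputs.

```swift
// Hypothetical helper: parse a string of hexadecimal digits, or return nil on any non-hex character.
func parseHex(_ text: String) -> Int? {
    guard !text.isEmpty else { return nil }
    var result = 0
    for char in text {
        // hexDigitValue is nil for characters that are not hexadecimal digits.
        guard let digit = char.hexDigitValue else { return nil }
        result = result * 16 + digit
    }
    return result
}

print(parseHex("1F") as Any)   // Optional(31)
print(parseHex("ff") as Any)   // Optional(255)
print(parseHex("xyz") as Any)  // nil
```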
Uppercase and lowercase
The methods lowercased and uppercased can be used to convert a character to a lowercase string and to an uppercase string, respectively.
Why a string and not another character, you might ask? Simply because the conversion of certain characters can produce more than one character.
var chars: [Character] = ["⅔", "Ḳ", "ģ", "ß"]
for char in chars {
    print("\(char) , uppercase is \(char.uppercased()) , lowercase is \(char.lowercased())")
}
// ⅔ , uppercase is ⅔ , lowercase is ⅔
// Ḳ , uppercase is Ḳ , lowercase is ḳ
// ģ , uppercase is Ģ , lowercase is ģ
// ß , uppercase is SS , lowercase is ß
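If you do need a Character back, one option is to accept the conversion only when the result is exactly one character. This is a minimal sketch; the function name uppercasedCharacter is hypothetical.

```swift
// Hypothetical helper: uppercase a character, or return nil when the result is not a single character.
func uppercasedCharacter(_ char: Character) -> Character? {
    let upper = char.uppercased()  // uppercased() returns a String
    return upper.count == 1 ? upper.first : nil
}

let sharpS: Character = "ß"
let upperSharpS = sharpS.uppercased()

print(upperSharpS, upperSharpS.count)            // SS 2
print(uppercasedCharacter("a") as Any)           // Optional("A")
print(uppercasedCharacter(sharpS) as Any)        // nil
```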