Characters in Swift: a tutorial

What is a character?

Today it is generally accepted that a character has a code point and an encoding, and that it belongs to a character set.

A character set can specify both the code points and the encoding, or only the code points. For example, the Unicode character set, which aims to represent the characters of all the world's writing systems by assigning each one a code point, defines several encoding schemes, such as UTF-8, UTF-16, and UTF-32.

To give a concrete example, the character 0 has the Unicode code point U+0030. Its encoding depends on the scheme used. In UTF-8 it is 00110000. In UTF-16 with big-endian byte order it is 0000000000110000. In UTF-32 with big-endian byte order it is 00000000000000000000000000110000.
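
The bit patterns above can be reproduced in Swift. What follows is only a minimal sketch: the helper name bits is made up for this illustration, and the UTF-32 value is taken from the Unicode scalar value, since Character offers utf8 and utf16 views but no utf32 view.

```swift
// Hypothetical helper: render an unsigned integer as a zero-padded binary string.
func bits<T: BinaryInteger>(_ value: T, width: Int) -> String {
    let digits = String(value, radix: 2)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

let zero: Character = "0"

// The code point of "0" is U+0030, which is 48 in decimal.
let codePoint = zero.unicodeScalars.first!.value

let utf8Bits = zero.utf8.map { bits($0, width: 8) }
let utf16Bits = zero.utf16.map { bits($0, width: 16) }
let utf32Bits = bits(codePoint, width: 32)

print(String(codePoint, radix: 16))  // 30
print(utf8Bits)                      // ["00110000"]
print(utf16Bits)                     // ["0000000000110000"]
print(utf32Bits)                     // 00000000000000000000000000110000
```

The values printed are code units as numbers. Byte order only comes into play once 16-bit or 32-bit units are written out as individual bytes.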

That said, some human-readable characters can be formed from one or more Unicode code points. The letter é is represented either by the single code point U+00E9, LATIN SMALL LETTER E WITH ACUTE, or by the combination of two code points: U+0065, LATIN SMALL LETTER E, and U+0301, COMBINING ACUTE ACCENT.

Swift goes further than most other languages here. In Swift, a character is not a single Unicode code point, but a combination of one or more code points that together form one human-readable character (what Unicode calls an extended grapheme cluster). In other words, é written as the two code points U+0065 and U+0301 is one character in Swift, not two.
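
One way to observe this is to count the characters and the Unicode scalars of a string. The sketch below uses two made-up constants, precomposed and decomposed, holding the same word written both ways.

```swift
let precomposed = "caf\u{E9}"     // é as the single code point U+00E9
let decomposed = "cafe\u{301}"    // e followed by COMBINING ACUTE ACCENT

// count counts Characters, so both spellings have four.
print(precomposed.count)                 // 4
print(decomposed.count)                  // 4

// The number of code points differs.
print(precomposed.unicodeScalars.count)  // 4
print(decomposed.unicodeScalars.count)   // 5
```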

Introspection: know your data

Introspection is the ability to ask questions about your data, such as whether it is this or that, and to view or get a description of it.

Unlike some other languages, Swift has no primitive types, so a Character is not a number, as char is in C, but a structure. This structure has properties that answer certain questions about the character.

var isASCII: Bool

var isLetter: Bool
var isCased: Bool
var isUppercase: Bool
var isLowercase: Bool

var isNumber: Bool
var isWholeNumber: Bool
var isHexDigit: Bool

var isSymbol: Bool
var isMathSymbol: Bool
var isCurrencySymbol: Bool

var isPunctuation: Bool

var isWhitespace: Bool
var isNewline: Bool

There is no need to memorize all of these at this stage. While programming, you will regularly need to ask some of these questions, such as: is this whitespace, is this a letter, is this a number?

All these questions are answered by deferring to the definitions in the Unicode standard.

var c: Character = "ࠀ"
// SAMARITAN LETTER ALAF

c.isASCII           // false
c.isLetter          // true
c.isCased           // false
c.isUppercase       // false
c.isLowercase       // false


c = Character("⅛")
// VULGAR FRACTION ONE EIGHTH
c.isNumber          // true
c.isWholeNumber     // false


c = Character("F")
// LATIN CAPITAL LETTER F
c.isHexDigit        // true


c = Character("∏")
// N-ARY PRODUCT
c.isSymbol          // true
c.isMathSymbol      // true
c.isCurrencySymbol  // false


c = Character("¶")
// PILCROW SIGN
c.isPunctuation     // true


c = "\u{00A0}"
// NO-BREAK SPACE
c.isWhitespace      // true


c = Character("\n")
c.isWhitespace      // true
c.isNewline         // true
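
As an illustration of how these properties are typically used, here is a minimal sketch that tallies the letters, numbers, and whitespace in a piece of text. The function name tally and the sample text are made up for this example.

```swift
// Hypothetical helper: count letters, numbers, and whitespace in a piece of text.
func tally(_ text: String) -> (letters: Int, numbers: Int, whitespace: Int) {
    var result = (letters: 0, numbers: 0, whitespace: 0)
    for character in text {
        if character.isLetter {
            result.letters += 1
        } else if character.isNumber {
            result.numbers += 1
        } else if character.isWhitespace {
            result.whitespace += 1
        }
    }
    return result
}

let counts = tally("Room 42\n")
print(counts)
// (letters: 4, numbers: 2, whitespace: 2)
```

The newline is counted as whitespace, because isWhitespace is also true for newline characters.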

A character value can be printed with the print function.

let c = Character("∏")
print(c)
// Prints: ∏

To get an encoding view of a character, in other words its encoded form, use the utf8 and utf16 properties of Character.

let c: Character = "વ"
// GUJARATI LETTER VA


let utf8View = c.utf8
// Get the UTF-8 encoding

utf8View.count
/* Number of bytes used to
   encode વ:
     3 */

utf8View.forEach { print($0) }
/* The bytes used to encode વ are:
     224
     170
     181 */



let utf16View = c.utf16
// Get the UTF-16 encoding of વ

utf16View.count
/* Number of 16-bit code units
   needed to encode વ:
     1 */

utf16View.forEach { print($0) }
/* The UTF-16 encoding of વ is:
     2741 */

To view the Unicode code points that make up a character, use the unicodeScalars property of Character.

var char: Character = "વ"
// GUJARATI LETTER VA

var unicodeScalars = char.unicodeScalars
/* Get the Unicode scalars that
   make up the character વ */

unicodeScalars.count
/* The code point of વ is U+0AB5,
   so વ is formed of a single
   Unicode scalar, and the count is:
     1 */

unicodeScalars.forEach { print($0) }
/* Print the Unicode scalars that make up
   વ. The result is:
     વ */


char = "\u{65}\u{301}"
print(char)
/* char is the human-readable
   character é, formed of the
   two Unicode scalars \u{65} and
   \u{301} */

unicodeScalars = char.unicodeScalars
/* Get the Unicode scalars that
   make up é */

unicodeScalars.count
/* The number of Unicode scalars
   that make up é:
     2 */

unicodeScalars.forEach { print($0) }
/* The Unicode scalars that make up é are:
e
́ */

A Character in Swift has no count property, since it is always exactly one human-readable character. Its encoding views do have a count property: UTF-8 encodes each code point in one to four bytes, and UTF-16 encodes each code point in one or two 16-bit units.

Also, as the previous examples show, a Character in Swift can consist of more than one Unicode code point, so its UnicodeScalarView, the view of its code points, has a count property as well.
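
Because a Character can hold several code points, the counts of its views can be larger than what a single code point needs. The sketch below uses a flag, which is written as two regional indicator symbols; the constant name flag is made up for this example.

```swift
// U+1F1EF and U+1F1F5 are the regional indicator symbols J and P.
// Together they form one Character, the flag of Japan.
let flag: Character = "\u{1F1EF}\u{1F1F5}"

print(flag.unicodeScalars.count)  // 2 code points
print(flag.utf16.count)           // 4 code units, two per code point
print(flag.utf8.count)            // 8 bytes, four per code point
```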

Comparison

Swift is one of the few programming languages in which character comparison respects Unicode canonical equivalence out of the box. For example, in Python:

>>> str_1 = "\u00E9"           # 'é'
>>> str_2 = "\u0065\u0301"     # 'é'
>>> str_1 == str_2
False

whereas in Swift:

let char_1: Character = "\u{E9}"         // é
let char_2: Character = "\u{65}\u{301}"  // é
char_1 == char_2 // true

Swift supports comparing characters for equality and inequality (== and !=), and for order (<, <=, >, and >=).

let char_1: Character = "f"              // f
char_1.unicodeScalars.forEach { print($0.value) }
/* Unicode scalar values of f, in decimal:
     102 */

let char_2: Character = "\u{E9}"         // é
char_2.unicodeScalars.forEach { print($0.value) }
/* Unicode scalar values of é written as
   \u{E9}, in decimal:
     233 */

let char_3: Character = "\u{65}\u{301}"  // é
char_3.unicodeScalars.forEach { print($0.value) }
/* Unicode scalar values of é written as
   \u{65}\u{301}, in decimal:
     101
     769 */

char_1 < char_2
/* f < é written as \u{E9}:
     true */

char_2 == char_3
/* é written as \u{E9} is equal to
   é written as \u{65}\u{301}:
     true */

char_1 < char_3
/* f < é written as \u{65}\u{301}:
     true */

Character comparison in Swift is not locale dependent. Locale affects other things, such as currency and date formatting.
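
A practical consequence is that sorting characters does not give a dictionary order for any particular language. The following is a small sketch with made-up sample data. The result shown matches what the comparisons above suggest, with é ordered after z, whereas a French dictionary would place é next to e.

```swift
let letters: [Character] = ["z", "\u{E9}", "a", "e"]

// sorted() uses the < operator of Character, with no locale involved.
let sorted = letters.sorted()
print(sorted)
// ["a", "e", "z", "é"]
```

For user-facing, language-aware ordering, a locale-aware comparison from a library such as Foundation would be needed instead.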

Ways to create a character

You can create a character by writing it directly between double quotes. Because a string literal is also written with double quotes, you must state explicitly that the variable or constant being created is of type Character. Otherwise Swift infers the type String.

var char_1: Character = "a"
// a

A Unicode escape sequence, "\u{h...}", can also be used when creating a character. Between the braces of the escape sequence there must be one to eight hexadecimal digits, giving the Unicode code point of the character. If the character consists of more than one code point, use more than one escape sequence.

var char: Character = "\u{69}"
// i
char = "\u{65}\u{301}"
// é

In addition, there are several initializers that can be used to create a character. Examples of each follow.

var char = Character("a")
char = Character("\u{65}\u{301}")
// Create characters from a string


char = Character(Unicode.Scalar(97))
/* Create a character from a Unicode
   code point given in decimal.
   The code point 97, in decimal, is the
   character a. */
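
When the code point is only known at run time, not every number is a valid Unicode scalar. The sketch below wraps the failable Unicode.Scalar initializer that takes a UInt32; the function name character(fromCodePoint:) is made up for this example.

```swift
// Hypothetical helper: build a Character from a numeric code point, if it is valid.
func character(fromCodePoint value: UInt32) -> Character? {
    // Unicode.Scalar(_: UInt32) returns nil for values that are not Unicode scalars.
    guard let scalar = Unicode.Scalar(value) else { return nil }
    return Character(scalar)
}

let va = character(fromCodePoint: 0x0AB5)         // GUJARATI LETTER VA
let surrogate = character(fromCodePoint: 0xD800)  // a surrogate, not a Unicode scalar

print(va == "વ")          // true
print(surrogate == nil)   // true
```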

Getting the ASCII, whole number, and hex digit values

Character also defines properties that return an optional value. They let you get the hexadecimal digit value, the whole number value, and the ASCII value of a character, when it has one.

var char: Character = "\u{61}\u{030A}"
/* This is the letter å, written as
   the code point U+0061, the letter a,
   followed by the code point U+030A,
   COMBINING RING ABOVE. */

char.hexDigitValue
// nil
char.wholeNumberValue
// nil
char.asciiValue
// nil


char = "a"
char.hexDigitValue
// Optional(10)
char.wholeNumberValue
// nil
char.asciiValue
// Optional(97)


char = "1"
char.hexDigitValue
// Optional(1)
char.wholeNumberValue
// Optional(1)
char.asciiValue
// Optional(49)
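
Since these properties return nil when the question does not apply, they fit naturally into parsing code. The following is a minimal sketch of a hexadecimal parser built on hexDigitValue. The function name parseHex is made up for this example, and overflow on very long inputs is deliberately not handled.

```swift
// Hypothetical helper: parse a hexadecimal string such as "1F" into an Int.
// Returns nil for an empty string or if any character is not a hex digit.
// Note: overflow on very long inputs is not handled in this sketch.
func parseHex(_ text: String) -> Int? {
    guard !text.isEmpty else { return nil }
    var value = 0
    for character in text {
        guard let digit = character.hexDigitValue else { return nil }
        value = value * 16 + digit
    }
    return value
}

print(parseHex("1F") as Any)   // Optional(31)
print(parseHex("xyz") as Any)  // nil

// wholeNumberValue is not limited to ASCII digits.
let arabicIndicThree: Character = "\u{0663}"   // ARABIC-INDIC DIGIT THREE
print(arabicIndicThree.wholeNumberValue as Any)  // Optional(3)
```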

Uppercase and lowercase

The methods lowercased() and uppercased() convert a character to a lowercase string and to an uppercase string, respectively.

Why a string, and not another character, you might ask? Because converting certain characters can produce more than one character.

let chars: [Character] = ["⅔", "Ḳ", "ģ", "ß"]

for char in chars {
    print("\(char) , uppercase is \(char.uppercased()) , lowercase is \(char.lowercased())")
}

⅔ , uppercase is ⅔ , lowercase is ⅔
Ḳ , uppercase is Ḳ , lowercase is ḳ
ģ , uppercase is Ģ , lowercase is ģ
ß , uppercase is SS , lowercase is ß
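
If a Character is needed back, the result has to be checked first, because the string may hold more than one character. The sketch below shows one possible way to do that; the function name uppercasedCharacter is made up for this example.

```swift
// Hypothetical helper: return the uppercase form as a Character,
// or nil when uppercasing yields more than one character.
func uppercasedCharacter(_ character: Character) -> Character? {
    let upper = character.uppercased()
    return upper.count == 1 ? upper.first : nil
}

print(uppercasedCharacter("a") == "A")   // true
print(uppercasedCharacter("ß") == nil)   // true, since the uppercase form is "SS"
```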