Character encoding for SMS

Character encoding

A character encoding maps each character to a numeric value so that text can be represented and transmitted. It is necessary for the accurate transmission of the very diverse characters and symbols used in text messages all over the world.

SMS concatenation

SMS concatenation joins the parts of a message that is longer than a single SMS. To make this possible, each part carries a few hidden characters (how many depends on the encoding): a continuation prefix and the sequence number of the part within the message.
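As an illustration of what these hidden characters can look like, the sketch below builds the common 6-byte concatenation header (User Data Header, 8-bit reference variant) described in 3GPP TS 23.040. The function name and the sample values are hypothetical, and nothing here is specific to SENDR or our API. It also shows why the overhead differs by encoding: 6 bytes occupy 7 GSM 7-bit characters, but only 3 UNICODE 16-bit characters.

```python
import math


def concat_udh(reference: int, total_parts: int, part_number: int) -> bytes:
    """Build the 6-byte UDH for one part of a concatenated SMS (8-bit reference)."""
    if not 0 <= reference <= 0xFF:
        raise ValueError("reference must fit in one byte")
    if not 1 <= part_number <= total_parts <= 0xFF:
        raise ValueError("part_number must be in 1..total_parts (max 255)")
    return bytes([
        0x05,         # UDHL: five header bytes follow
        0x00,         # IEI: concatenated short message, 8-bit reference
        0x03,         # IEDL: three data bytes follow
        reference,    # same value in every part of one message
        total_parts,  # how many parts the message has
        part_number,  # position of this part, starting at 1
    ])


udh = concat_udh(reference=0x2A, total_parts=3, part_number=2)
print(udh.hex())  # 0500032a0302

# Space taken by the header, expressed in characters of each encoding.
gsm_overhead = math.ceil(len(udh) * 8 / 7)  # 7-bit characters -> 7
unicode_overhead = len(udh) // 2            # 16-bit characters -> 3
print(gsm_overhead, unicode_overhead)       # 7 3
```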

Supported characters using GSM

The full table of characters supported in the GSM encoding is available on Wikipedia.

Character limits for GSM

Any text of up to 160 characters is counted as one SMS.

For texts longer than 160 characters, one SMS is counted for every 153 characters (7 characters of each SMS are reserved for concatenation).
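The counting rule above can be sketched as a small function. This is only an illustration: the function name is hypothetical, and it assumes the length passed in is already the number of GSM characters (in the GSM alphabet, extension characters such as € occupy two character slots, so they would need to be counted twice).

```python
import math

GSM_SINGLE_LIMIT = 160  # characters that fit in a single SMS
GSM_PART_LIMIT = 153    # characters per SMS once concatenation is needed


def gsm_sms_count(length: int) -> int:
    """Return how many SMS are counted for a GSM text of the given length."""
    if length <= GSM_SINGLE_LIMIT:
        return 1
    return math.ceil(length / GSM_PART_LIMIT)


for n in (160, 161, 306, 307):
    print(n, gsm_sms_count(n))
# 160 1
# 161 2
# 306 2
# 307 3
```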

Supported characters using UNICODE

Unicode represents most of the characters used in common languages. It is frequently updated to account for previously uncovered needs, and currently defines more than 128,000 characters. The full table of characters supported in the UNICODE encoding is available on Wikipedia.

Character limits for UNICODE

Any text of up to 70 characters is counted as one SMS.

For texts longer than 70 characters, one SMS is counted for every 67 characters (3 characters of each SMS are reserved for concatenation).
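The same rule for UNICODE, as a hedged sketch. The function name is hypothetical, and the code assumes that a "character" here is one 16-bit unit, so characters outside the Basic Multilingual Plane (most emoji, for example) count as two.

```python
import math

UNICODE_SINGLE_LIMIT = 70  # characters that fit in a single SMS
UNICODE_PART_LIMIT = 67    # characters per SMS once concatenation is needed


def unicode_sms_count(text: str) -> int:
    """Return how many SMS are counted for a text sent with UNICODE encoding."""
    # Count 16-bit units rather than Python code points (assumption, see above).
    units = len(text.encode("utf-16-be")) // 2
    if units <= UNICODE_SINGLE_LIMIT:
        return 1
    return math.ceil(units / UNICODE_PART_LIMIT)


print(unicode_sms_count("я" * 70))   # 1
print(unicode_sms_count("я" * 71))   # 2
print(unicode_sms_count("я" * 135))  # 3
```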

Specify/force character encoding

Character encoding using SENDR

On SENDR, GSM encoding is forced by default to reduce the cost of sending SMS. Non-GSM Latin characters are converted to similar GSM characters when possible (for instance ê -> e). If you use a UNICODE character that has no GSM equivalent (Cyrillic or Chinese characters, for instance), the encoding switches to UNICODE to ensure the accurate transmission of your message.
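The behaviour described above can be approximated as follows. This is a minimal sketch and not SENDR's actual conversion logic: the function names are hypothetical, the character table is transcribed from the GSM default alphabet and should be checked against the published table, and accents are removed with Unicode decomposition, which is only one possible way to find a "similar" character.

```python
import unicodedata

# GSM default alphabet (basic table) plus its extension table.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM_EXTENDED = set("^{}\\[~]|€\f")
GSM_CHARS = GSM_BASIC | GSM_EXTENDED


def to_gsm_char(ch: str):
    """Return a GSM equivalent for ch, or None if there is none."""
    if ch in GSM_CHARS:
        return ch
    # Decompose (ê -> e + combining circumflex) and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    if stripped and all(c in GSM_CHARS for c in stripped):
        return stripped
    return None


def choose_encoding(text: str):
    """Return (encoding, text_to_send): GSM when every character can be converted."""
    converted = []
    for ch in text:
        replacement = to_gsm_char(ch)
        if replacement is None:
            # At least one character has no GSM equivalent: keep the text as is.
            return "UNICODE", text
        converted.append(replacement)
    return "GSM", "".join(converted)


print(choose_encoding("Fête"))    # ('GSM', 'Fete')
print(choose_encoding("Привет"))  # ('UNICODE', 'Привет')
```

Note that characters already in the GSM alphabet, such as é, are left untouched; only characters outside it are converted.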

SENDR displays the encoding used, as well as the number of characters and the corresponding SMS length:

[Image: Encoding.gif]

Character encoding using our API

With our API, you can force a specific encoding using the “force_encoding” option in the ‘SMS.options’ object. This option enables the automatic conversion of non-GSM characters (for instance ê -> e). More information is available in our API docs for SMS.
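As a rough sketch of where this option sits in a request, the snippet below builds a JSON payload. Only the "force_encoding" option name comes from this article. The helper name, the "to" and "body" fields, the overall payload shape, and the "GSM" value are assumptions made for illustration; refer to the API docs for the actual method, parameters, and accepted values.

```python
import json


def build_sms_payload(to: str, body: str, force_encoding=None) -> dict:
    """Build a hypothetical request payload with an options object."""
    options = {}
    if force_encoding is not None:
        # Option name taken from the article; the accepted values are assumed.
        options["force_encoding"] = force_encoding
    return {"to": to, "body": body, "options": options}


payload = build_sms_payload("+10000000000", "Fête à 18h", force_encoding="GSM")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```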

Text formatting in SMS (API)

The body of a text message is a string, so basic string formatting can be used:

  • The newline character (\n) is interpreted and produces a new line in the received text.
  • Tabs are not rendered (carrier-level restrictions), but spaces in the string are preserved unless they are at the beginning of the message.
    "    Hello Bob" is displayed on the recipient's phone as "Hello Bob", but
    "Hello Bob    here is your package number" is displayed exactly as it is in the string.

Maximum length of a concatenated SMS

A concatenated SMS is limited to 16 parts (the UDH that joins the parts identifies each one by its part number and the total number of parts). Therefore, a single concatenated SMS can contain up to:

GSM encoding: 2448 characters (16 x 153)

UNICODE encoding: 1072 characters (16 x 67)
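These limits follow directly from the per-part sizes, and a pre-send length check could be sketched as below. The constant and function names are hypothetical.

```python
MAX_PARTS = 16
PART_LIMITS = {"GSM": 153, "UNICODE": 67}  # characters per concatenated part


def max_concatenated_length(encoding: str) -> int:
    """Return the longest text that fits in one concatenated SMS."""
    return MAX_PARTS * PART_LIMITS[encoding]


def fits_in_one_message(length: int, encoding: str) -> bool:
    """Check whether a text of the given length can be sent as one concatenated SMS."""
    return length <= max_concatenated_length(encoding)


print(max_concatenated_length("GSM"))      # 2448
print(max_concatenated_length("UNICODE"))  # 1072
print(fits_in_one_message(2449, "GSM"))    # False
```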

 
