Transfer Encoding Base64

In computer science, Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding. Each Base64 digit represents exactly 6 bits of data. Three 8-bit bytes (i.e., a total of 24 bits) can therefore be represented by four 6-bit Base64 digits.

For big files, such as 10 MB and larger, binary file transfer is preferable to Base64. AX 2012 has built-in support for encoding and decoding the Base64 data type, with some gaps and drawbacks. Base64 encoding schemes are commonly used when binary data needs to be stored and transferred over media that are designed to deal with text. In a MIME encoded-word, the encoding can be either 'Q', denoting Q-encoding, which is similar to the quoted-printable encoding, or 'B', denoting Base64 encoding. The encoded text is the Q-encoded or Base64-encoded text. An encoded-word may not be more than 75 characters long, including charset, encoding, encoded text, and delimiters.

Common to all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web[1] where its uses include the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.[2]

Design

The particular set of 64 characters chosen to represent the 64 place-values for the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean.[3] For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.

The earliest instances of this type of encoding were created for dialup communication between systems running the same OS — e.g., uuencode for UNIX, BinHex for the TRS-80 (later adapted for the Macintosh) — and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.[4][5][6][3]

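Base64 table

The Base64 index table:

Index | Char | Index | Char | Index | Char | Index | Char
0     | A    | 16    | Q    | 32    | g    | 48    | w
1     | B    | 17    | R    | 33    | h    | 49    | x
2     | C    | 18    | S    | 34    | i    | 50    | y
3     | D    | 19    | T    | 35    | j    | 51    | z
4     | E    | 20    | U    | 36    | k    | 52    | 0
5     | F    | 21    | V    | 37    | l    | 53    | 1
6     | G    | 22    | W    | 38    | m    | 54    | 2
7     | H    | 23    | X    | 39    | n    | 55    | 3
8     | I    | 24    | Y    | 40    | o    | 56    | 4
9     | J    | 25    | Z    | 41    | p    | 57    | 5
10    | K    | 26    | a    | 42    | q    | 58    | 6
11    | L    | 27    | b    | 43    | r    | 59    | 7
12    | M    | 28    | c    | 44    | s    | 60    | 8
13    | N    | 29    | d    | 45    | t    | 61    | 9
14    | O    | 30    | e    | 46    | u    | 62    | +
15    | P    | 31    | f    | 47    | v    | 63    | /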

Examples

The example below uses ASCII text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.

A quote from Thomas Hobbes' Leviathan:

is represented as a byte sequence of 8-bit-padded ASCII characters encoded in MIME's Base64 scheme as follows (newlines and whitespaces may be present anywhere but are to be ignored on decoding):

In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from left to right (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.

As this example illustrates, Base64 encoding converts three octets into four encoded characters.

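Source text (ASCII)    | M         | a         | n
Source octets          | 77 (0x4d) | 97 (0x61) | 110 (0x6e)
Bits (6-bit groups)    | 010011    | 010110    | 000101    | 101110
Base64 sextets         | 19        | 22        | 5         | 46
Base64 characters      | T         | W         | F         | u
Base64 encoded octets  | 84 (0x54) | 87 (0x57) | 70 (0x46) | 117 (0x75)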
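The regrouping of 24 bits into four sextets can be reproduced with a few lines of integer arithmetic. The following is a minimal Python sketch, assuming a full three-octet group; the helper name `encode_group` is hypothetical and exists only to illustrate the arithmetic.

```python
import base64

# Standard Base64 alphabet, in index order (0..63).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three octets into four Base64 characters (illustrative helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles a full three-octet group")
    # Join the three octets into one 24-bit integer.
    buffer = int.from_bytes(three_bytes, "big")
    # Read four 6-bit groups, most significant first.
    sextets = [(buffer >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)

print(encode_group(b"Man"))
# Cross-check against the standard library encoder.
print(base64.b64encode(b"Man").decode("ascii"))
```

Both lines should print the same four characters, since the standard library applies the same grouping.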

= padding characters might be added to make the last encoded block contain four Base64 characters.

If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the following = padding characters):

Source text (ASCII)    | M         | a
Source octets          | 77 (0x4d) | 97 (0x61)
Bits (6-bit groups)    | 010011    | 010110    | 000100
Base64 sextets         | 19        | 22        | 4         | Padding
Base64 characters      | T         | W         | E         | =
Base64 encoded octets  | 84 (0x54) | 87 (0x57) | 69 (0x45) | 61 (0x3D)

If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the following = padding characters):

Source text (ASCII)    | M
Source octets          | 77 (0x4d)
Bits (6-bit groups)    | 010011    | 010000
Base64 sextets         | 19        | 16        | Padding   | Padding
Base64 characters      | T         | Q         | =         | =
Base64 encoded octets  | 84 (0x54) | 81 (0x51) | 61 (0x3D) | 61 (0x3D)
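The three cases above (three, two, and one significant input octets) can be checked against a standard encoder. This is a small Python sketch using the standard library `base64` module; the variable names are illustrative only.

```python
import base64

# Encode inputs of three, two, and one octets and count the padding characters.
results = {}
for text in (b"Man", b"Ma", b"M"):
    encoded = base64.b64encode(text).decode("ascii")
    results[text] = encoded
    print(text, encoded, "padding:", encoded.count("="))
```

Each output is four characters long; only the number of trailing = characters differs.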

Output padding

The final == sequence indicates that the last group contained only one byte, and = indicates that it contained two bytes. The example below illustrates how truncating the input of the above quote changes the output padding:

Input length | Input text           | Output length | Output text                  | Padding
20           | any carnal pleasure. | 28            | YW55IGNhcm5hbCBwbGVhc3VyZS4= | 1
19           | any carnal pleasure  | 28            | YW55IGNhcm5hbCBwbGVhc3VyZQ== | 2
18           | any carnal pleasur   | 24            | YW55IGNhcm5hbCBwbGVhc3Vy     | 0
17           | any carnal pleasu    | 24            | YW55IGNhcm5hbCBwbGVhc3U=     | 1
16           | any carnal pleas     | 24            | YW55IGNhcm5hbCBwbGVhcw==     | 2

The same characters will be encoded differently depending on their position within the three-octet group which is encoded to produce the four characters. For example:

Input     | Output
pleasure. | cGxlYXN1cmUu
leasure.  | bGVhc3VyZS4=
easure.   | ZWFzdXJlLg==
asure.    | YXN1cmUu
sure.     | c3VyZS4=

The ratio of output bytes to input bytes is 4:3 (33% overhead). Specifically, given an input of n bytes, the output will be $4\lceil n/3 \rceil$ bytes long, including padding characters.
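The length formula can be written with integer arithmetic and compared against a real encoder. The sketch below is in Python; `encoded_length` is a hypothetical helper name, and the check assumes padded output without line breaks.

```python
import base64

def encoded_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)

# Compare the formula with the standard library for a range of input sizes.
for n in (0, 1, 2, 3, 16, 17, 18, 19, 20):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```

The integer expression `(n + 2) // 3` is the usual way to take the ceiling of n/3 without floating-point arithmetic.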

In theory, the padding character is not needed for decoding, since the number of missing bytes can be calculated from the number of Base64 digits. In some implementations, the padding character is mandatory, while for others it is not used. One case in which padding characters are required is concatenating multiple Base64 encoded files.

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded                  | Padding | Length | Decoded
YW55IGNhcm5hbCBwbGVhcw== | ==      | 1      | any carnal pleas
YW55IGNhcm5hbCBwbGVhc3U= | =       | 2      | any carnal pleasu
YW55IGNhcm5hbCBwbGVhc3Vy | None    | 3      | any carnal pleasur
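The relationship between the padding and the size of the last decoded group can be observed directly. This is a short Python sketch using the standard library decoder; the variable names are illustrative.

```python
import base64

samples = [
    "YW55IGNhcm5hbCBwbGVhcw==",
    "YW55IGNhcm5hbCBwbGVhc3U=",
    "YW55IGNhcm5hbCBwbGVhc3Vy",
]

decoded = {}
for encoded in samples:
    data = base64.b64decode(encoded)
    decoded[encoded] = data
    # Each padding character removes one byte from the final three-byte group.
    last_group_bytes = 3 - encoded.count("=")
    print(encoded, last_group_bytes, data)
```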

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits and 8 bits are required to create a byte; a minimum of two Base64 characters is therefore required: the first character contributes 6 bits, and the second character contributes its first 2 bits. For example:

Last group length (characters) | Encoded                  | Last group length (bytes) | Decoded
2                              | YW55IGNhcm5hbCBwbGVhcw   | 1                         | any carnal pleas
3                              | YW55IGNhcm5hbCBwbGVhc3U  | 2                         | any carnal pleasu
4                              | YW55IGNhcm5hbCBwbGVhc3Vy | 3                         | any carnal pleasur
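Many decoders expect padded input, so one common approach to unpadded text is to restore the padding from the length before decoding. The following Python sketch assumes the input contains only alphabet characters; `decode_unpadded` is a hypothetical helper, not part of any library.

```python
import base64

def decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' padding was omitted (illustrative helper)."""
    remainder = len(text) % 4
    if remainder == 1:
        # One leftover character carries only 6 bits, which cannot form a byte.
        raise ValueError("invalid length for unpadded Base64")
    # Restore 0, 1, or 2 padding characters so the length is a multiple of 4.
    return base64.b64decode(text + "=" * (-len(text) % 4))

for encoded in ("YW55IGNhcm5hbCBwbGVhcw",
                "YW55IGNhcm5hbCBwbGVhc3U",
                "YW55IGNhcm5hbCBwbGVhc3Vy"):
    print(encoded, decode_unpadded(encoded))
```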

Implementations and history

Variants summary table

Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the index table for index 62 and 63, and the character used for padding (which may be mandatory in some protocols, or removed in others). The table below summarizes these known variants and links to the subsections below.

Encoding | 62nd char | 63rd char | Pad | Line separators | Line length | Line checksum | Decoding non-encoding characters
Base64 for Privacy-Enhanced Mail (PEM; RFC 1421; deprecated) | + | / | = mandatory | CR+LF | 64, or lower for the last line | No | No
Base64 transfer encoding for MIME (RFC 2045) | + | / | = mandatory | CR+LF | At most 76 | No | Discarded
base64 for RFC 4648 (previously, RFC 3548; standard)[a] | + | / | = mandatory | N/A | N/A | N/A | No
base64url URL- and filename-safe (RFC 4648 §5)[a] | - | _ | = optional | N/A | N/A | N/A | No
Radix-64 for OpenPGP (RFC 4880) | + | / | = mandatory | CR+LF | At most 76 | Radix-64 encoded 24-bit CRC | No
Base64 for UTF-7 (RFC 2152) | + | / | | N/A | N/A | N/A | No
Base64 encoding for IMAP mailbox names (RFC 3501) | + | , | | N/A | N/A | N/A | No
Y64 URL-safe Base64 from YUI Library[7] | . | _ | - optional | N/A | N/A | N/A | No
XML name tokens (Nmtoken) Base64 | . | - | | N/A | N/A | N/A | No
XML identifiers (Name) Base64 | _ | : | | N/A | N/A | N/A | No
Program identifier Base64 variant 1 (non-standard) | _ | - | Unknown
Program identifier Base64 variant 2 (non-standard) | . | _ | Unknown
Freenet URL-safe Base64 (non-standard) | ~ | - | = optional | N/A | N/A | N/A | No

[a] It is important to note that this variant is intended to provide common features where they are not desired to be specialised by implementations, ensuring robust engineering. This is particularly in light of separate line encodings and restrictions, which have not been considered when previous standards have been co-opted for use elsewhere. Thus, the features indicated here may be over-ridden.

Because Base64 has so many variants, base62x has been introduced to unify them by excluding symbols from its output; that is, its textual representation uses only letters and digits.

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a 'printable encoding' scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[8]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the + and / symbols. The = symbol is also used as a padding suffix.[4] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', and the indicated character is output.

The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
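The buffer procedure described above, including the zero fill and the = suffix, can be sketched as follows. This is a minimal Python illustration rather than a reference implementation: the function name `printable_encode` is hypothetical, and the sketch omits the 64-character line breaking that PEM also requires.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def printable_encode(data: bytes) -> str:
    """Encode bytes with a 24-bit buffer, six bits at a time (illustrative sketch)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 octets short of a full group
        # First byte goes to the most significant eight bits; unused bits stay zero.
        buffer = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # One '=' per zero-filled octet replaces the trailing characters.
        if missing:
            chars[4 - missing:] = ["="] * missing
        out.append("".join(chars))
    return "".join(out)

print(printable_encode(b"any carnal pleasure."))
```

Because every group emits exactly four characters, the output length is always a multiple of four, as the text notes.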

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.

MIME

The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[5] MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM, and uses the = symbol for output padding in the same way, as described in RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally it specifies that any extra-alphabetic characters must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length, though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated by inverting that relation:

$\text{decoded size} \approx (\text{encoded size} - 814) / 1.37$
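The 76-character line limit and the tolerance for non-alphabet characters can be observed with Python's standard library. This is a sketch under stated assumptions: `base64.encodebytes` ends lines with a bare newline rather than the CR+LF pair used on the wire, so the measured ratio is slightly below the 1.37 figure, and message headers are not included.

```python
import base64

data = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary data

# encodebytes() breaks the output into lines of at most 76 characters.
mime_body = base64.encodebytes(data)
line_lengths = [len(line) for line in mime_body.splitlines()]
ratio = len(mime_body) / len(data)

# b64decode() discards characters outside the alphabet by default,
# so the line breaks do not need to be stripped before decoding.
decoded = base64.b64decode(mime_body)

print(max(line_lengths), round(ratio, 2), decoded == data)
```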

UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[9][10]

The 'Modified Base64' alphabet consists of the MIME Base64 alphabet, but does not use the '=' padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the '=' character is reserved in that context as the escape character for 'quoted-printable' encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits leaving up to three unused bits in the last Base64 digit.
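The difference between MIME Base64 and modified Base64 is visible in a two-character example. The Python sketch below relies on the standard `utf-7` codec; the sample string is an arbitrary choice for illustration.

```python
import base64

text = "£1"

# UTF-7 wraps the modified Base64 run in '+' ... '-' and leaves ASCII as-is.
utf7 = text.encode("utf-7")

# The same character encoded as UTF-16 (big-endian) and then as MIME Base64.
mime_b64 = base64.b64encode("£".encode("utf-16-be"))

# Modified Base64 is the MIME form with the '=' padding dropped.
modified = mime_b64.rstrip(b"=")

print(utf7, mime_b64, modified)
```

The unpadded run produced by hand appears inside the UTF-7 output, between the + and - delimiters.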

OpenPGP

OpenPGP (RFC 4880) specifies Radix-64 encoding, also known as 'ASCII armor'. Radix-64 is identical to the 'Base64' encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the '=' symbol as a separator, appended to the encoded output data.[11]
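The checksum step can be sketched in a few lines. The Python below uses the CRC-24 initial value and polynomial given in RFC 4880; the function names `crc24` and `armor_checksum` are hypothetical, and the sketch covers only the checksum line, not the full armor format with its header and footer lines.

```python
import base64

# Constants from RFC 4880 (CRC-24 used by ASCII armor).
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data: bytes) -> int:
    """Compute the 24-bit CRC over the raw (unencoded) input data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor_checksum(data: bytes) -> str:
    """Return the '='-prefixed Base64 form of the CRC (illustrative helper)."""
    return "=" + base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")

print(armor_checksum(b"any carnal pleasure."))
```

Since the CRC is exactly three bytes, its Base64 form is always four characters with no padding, so the leading = is unambiguous as a separator.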

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the seldom-used Base32 and Base16 encodings.

Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.[6]

RFC 4648

This RFC obsoletes RFC 3548 and focuses on Base64/32/16:

This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.

Filenames

Another variant called modified Base64 for filename uses '-' instead of '/', because Unix and Windows filenames cannot contain '/'.

It may be preferable to use the modified Base64 for URL instead, since the filenames could then also be used in URLs.

URL applications

Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in URL requires encoding of '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.

For this reason, modified Base64 for URL variants exist (such as base64url in RFC 4648), where the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_'. URL encoders/decoders are then no longer necessary and have no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
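The effect of the URL-safe alphabet can be compared with percent-encoding of standard Base64. This is a short Python sketch using the standard library; the three input bytes are chosen only because they map to indices 62 and 63.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF, 0xFE])  # sextets 62, 63, 63, 62

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

# Standard Base64 needs percent-encoding before it can be placed in a URL.
percent_encoded = quote(standard, safe="")

# Padding can be dropped where the variant in use allows it.
unpadded = base64.urlsafe_b64encode(bytes([0xFB, 0xFF])).decode("ascii").rstrip("=")

print(standard, url_safe, percent_encoded, unpadded)
```

The percent-encoded form is three times as long as the URL-safe form for these characters, which is the overhead the text describes.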

Program identifiers

There are other variants that use _- or ._ when the Base64 variant string must be used within valid identifiers for programs.

XML

XML identifiers and name tokens are encoded using two variants:

  • .- for use in XML name tokens (Nmtoken), or even
  • _: for use in more restricted XML identifiers (Name).

HTML

The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[12] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.

Other applications

Example of an SVG containing embedded JPEG images encoded in Base64[13]

Base64 can be used in a variety of contexts:

  • Base64 can be used to transmit and store text that might otherwise cause delimiter collision
  • Spammers use Base64 to evade basic anti-spamming tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages.
  • Base64 is used to encode character strings in LDIF files
  • Base64 is often used to embed binary data in an XML file, using a syntax similar to <data encoding='base64'>…</data> e.g. favicons in Firefox's exported bookmarks.html.
  • Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
  • The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
  • The FreeSWAN ipsec implementation precedes Base64 strings with 0s, so they can be distinguished from text or hexadecimal strings.
  • Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.[14]
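As an illustration of the data URI use mentioned in the list above, the following Python sketch builds a Base64 data: URI from raw bytes. The helper name `to_data_uri` and the sample payload are hypothetical; a real stylesheet would embed image or font bytes instead.

```python
import base64

def to_data_uri(payload: bytes, media_type: str) -> str:
    """Build a data: URI carrying the payload as Base64 (illustrative helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

uri = to_data_uri(b"Man", "text/plain")
print(uri)

# The payload can be recovered by decoding everything after the first comma.
recovered = base64.b64decode(uri.split(",", 1)[1])
```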

Radix-64 applications not compatible with Base64

  • Uuencoding, traditionally used on UNIX, uses ASCII 32 (space) through 95 ('_'), consecutively, making its 64-character set ' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'. Avoiding all lower-case letters was helpful because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power because it was only necessary to add 32, not do a lookup. Its use of most punctuation characters and the space character limits its usefulness.
  • BinHex 4 (HQX), which was used within the classic Mac OS, uses a different set of 64 characters. It uses upper and lower case letters, digits, and punctuation characters, but does not use some visually confusable characters like '7', 'O', 'g' and 'o'. Its 64-character set is '!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr'.
  • Several other applications use radix-64 sets that are more similar to the Base64 format but in a different order, starting with two symbols, then numerals, then uppercase, then lowercase:
    • Unix stores password hashes computed with crypt in the /etc/passwd file using radix-64 encoding called B64. It uses a mostly-alphanumeric set of characters, plus . and /. Its 64-character set is './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. Padding is not used.
    • The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format using radix-64. Its 64-character set is also './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'.[15]
    • bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, and the algorithm uses a similar but permuted alphabet. Its 64-character set is './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'.[16]
    • Xxencoding uses a mostly-alphanumeric character set similar to crypt and GEDCOM, but using + and - rather than . and /. Its 64-character set is '+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'.
  • 6PACK, used with some terminal node controllers, uses a different set of 64 characters.[17]
    • Bash uses the character set '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_' for base-64 numeric literals, which can be seen by evaluating each character in order: for i in {0..9} {a..z} {A..Z} @ _; do printf "$i : $((64#$i))\n"; done

See also

  • Ascii85 (also called Base85)
  • Binary-to-text encoding for a comparison of various encoding algorithms

References

  1. 'Base64 encoding and decoding - Web APIs MDN'.
  2. 'When to base64 encode images (and when not to)'.
  3. The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
  4. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
  5. Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
  6. The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
  7. 'YUIBlog'. YUIBlog. Retrieved 2012-06-21.
  8. Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
  9. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
  10. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
  11. OpenPGP Message Format. IETF. November 2007. doi:10.17487/RFC4880. RFC 4880. Retrieved March 18, 2010.
  12. '7.3. Base64 utility methods'. HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2017. Introduced by changeset 5814, 2011-02-01.
  13. <image xlink:href='data:image/jpeg;base64,JPEG contents encoded in Base64' .. />
  14. JSFiddle. 'Edit fiddle - JSFiddle'. jsfiddle.net.
  15. 'The GEDCOM Standard Release 5.5'. Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
  16. Provos, Niels (1997-02-13). 'src/lib/libc/crypt/bcrypt.c r1.1'. Retrieved 2018-05-18.
  17. '6PACK a 'real time' PC to TNC protocol'. Retrieved 2013-05-19.

Retrieved from 'https://en.wikipedia.org/w/index.php?title=Base64&oldid=899315946'

Definition

Specifies the Content-Transfer-Encoding header information for an email message attachment.

Inheritance
ValueType
TransferEncoding

Fields

Base64 (value 1)

Encodes stream-based data. See RFC 2045 Section 6.8.

EightBit (value 3)

The data is in 8-bit characters that may represent international characters with a total line length of no longer than 1000 8-bit characters. For more information about this 8-bit MIME transport extension, see IETF RFC 6152.

QuotedPrintable (value 0)

Encodes data that consists of printable characters in the US-ASCII character set. See RFC 2045 Section 6.7.

SevenBit (value 2)

Used for data that is not encoded. The data is in 7-bit US-ASCII characters with a total line length of no longer than 1000 characters. See RFC 2045 Section 2.7.

Unknown (value -1)

Indicates that the transfer encoding is unknown.

Examples

The following code example displays TransferEncoding used by an attachment.

Remarks

The values in the TransferEncoding enumeration are used with the AttachmentBase.TransferEncoding property.

The Content-Transfer-Encoding header specifies the encoding of the associated message body so that it meets SMTP requirements. SMTP requires data for transport to be in 7-bit US-ASCII characters with lines no longer than 1000 characters.
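The role of the Content-Transfer-Encoding header can be illustrated outside .NET as well. The following is a rough analogue in Python's standard `email` package, not the AttachmentBase API described here; the subject, body text, and file name are made-up sample values.

```python
from email.message import EmailMessage

# Build a message with a plain-text body and one binary attachment.
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the attached file.")
payload = bytes(range(16))
msg.add_attachment(payload, maintype="application",
                   subtype="octet-stream", filename="data.bin")

# Read the transfer encoding that was chosen for each attachment part.
encodings = [str(part["Content-Transfer-Encoding"])
             for part in msg.iter_attachments()]
print(encodings)

# Decoding the part according to its header recovers the original bytes.
attachment = next(msg.iter_attachments())
restored = attachment.get_payload(decode=True)
```

Binary attachments are Base64-encoded by default in this library, which keeps the serialized message within the 7-bit constraints described above.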

Content-Transfer-Encoding values are described in detail in RFC 2045 Section 6, available at https://www.ietf.org.
