MessageFormat 2.0 Specification

Table of Contents

  1. Introduction
    1. Conformance
    2. Terminology and Conventions
    3. Stability Policy
  2. Syntax
    1. Productions
    2. Tokens
    3. message.abnf
  3. Errors
    1. Error Handling
    2. Syntax Errors
    3. Data Model Errors
    4. Resolution Errors
    5. Message Function Errors
  4. Registry
    1. registry.dtd
  5. Formatting
  6. Interchange data model

Introduction

One of the challenges in adapting software to work for users with different languages and cultures is the need for dynamic messages. Whenever a user interface needs to present data as part of a larger string, that data needs to be formatted (and the message may need to be altered) to make it culturally accepted and grammatically correct.

For example, if your US English (en-US) interface has a message like:

Your item had 1,023 views on April 3, 2023

You want the translated message to be appropriately formatted into French:

Votre article a eu 1 023 vues le 3 avril 2023

Or Japanese:

あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。

This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages and APIs. This will enable the integration of existing internationalization APIs (such as the date and number formats shown above), grammatical matching (such as plurals or genders), as well as user-defined formats and message selectors.

This specification is the successor to ICU MessageFormat, henceforth called ICU MessageFormat 1.0.

Conformance

Everything in this specification is normative except for: sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Terminology and Conventions

A term looks like this when it is defined in this specification.

A reference to a term looks like this.

Examples are non-normative and styled like this.

Stability Policy

[!IMPORTANT] The provisions of the stability policy are not in effect until the conclusion of the technical preview and adoption of this specification.

Updates to this specification will not change the syntactical meaning, the runtime output, or other behaviour of valid messages written for earlier versions of this specification that only use functions defined in this specification. Updates to this specification will not remove any syntax provided in this version. Future versions MAY add additional structure or meaning to existing syntax.

Updates to this specification will not remove any reserved keywords or sigils.

[!NOTE] Future versions may define new keywords.

Updates to this specification will not reserve or assign meaning to any character "sigils" except for those in the reserved production.

Updates to this specification will not remove any functions defined in the default registry nor will they remove any options or option values. Additional options or option values MAY be defined.

[!NOTE] This does not guarantee that the results of formatting will never change. Even when the specification doesn't change, the functions for date formatting, number formatting and so on will change their results over time.

Later specification versions MAY make previously invalid messages valid.

Updates to this specification will not introduce message syntax that, when parsed according to earlier versions of this specification, would produce syntax or data model errors. Such messages MAY produce errors when formatted according to an earlier version of this specification.

From version 2.0, MessageFormat will only reserve, define, or require function names or function option names consisting of characters in the ranges a-z, A-Z, and 0-9. All other names in these categories are reserved for the use of implementations or users.

[!NOTE] Users defining custom names SHOULD include at least one character outside these ranges to ensure that they will be compatible with future versions of this specification.
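
The naming rule above can be checked mechanically. The following Python sketch is illustrative only: the helper name `uses_only_reserved_range` is hypothetical and is not defined by this specification.

```python
import re

# Names made up solely of a-z, A-Z, and 0-9 are the only ones this
# specification will reserve, define, or require (from version 2.0).
_SPEC_RANGE = re.compile(r"[A-Za-z0-9]+")


def uses_only_reserved_range(name: str) -> bool:
    """Return True if a future version of the spec could claim this name."""
    return _SPEC_RANGE.fullmatch(name) is not None


# A custom name containing a character outside the ranges stays safe.
print(uses_only_reserved_range("datetime"))  # True
print(uses_only_reserved_range("my-date"))   # False
```

A tool that registers custom functions could use such a check to warn when a custom name might collide with a later version of the specification.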

Later versions of this specification will not introduce changes to the data model that would result in a data model representation based on this version being invalid.

For example, existing interfaces or fields will not be removed.

Later versions of this specification MAY introduce changes to the data model that would result in future data model representations not being valid for implementations of this version of the data model.

For example, a future version could introduce a new keyword, whose data model representation would be a new interface that is not recognized by this version's data model.

Later specification versions will not introduce syntax that cannot be represented by this version of the data model.

For example, a future version could introduce a new keyword. The future version's data model would provide an interface for that keyword while this version of the data model would parse the value into the interface UnsupportedStatement. Both data models would be "valid" in their context, but this version's would be missing any functionality for the new statement type.

DRAFT MessageFormat 2.0 Syntax

Table of Contents

[TBD]

Introduction

This section defines the formal grammar describing the syntax of a single message.

Design Goals

This section is non-normative.

The design goals of the syntax specification are as follows:

  1. The syntax should leverage the familiarity with ICU MessageFormat 1.0 in order to lower the barrier to entry and increase the chance of adoption. At the same time, the syntax should fix the pain points of ICU MessageFormat 1.0.

  2. The syntax inside translatable content should be easy to understand for humans. This includes making it clear which parts of the message body are translatable content, which parts inside it are placeholders for expressions, as well as making the selection logic predictable and easy to reason about.

  3. The syntax surrounding translatable content should be easy for developers and localization engineers to write and edit, and easy for machines to parse.

  4. The syntax should make a single message easily embeddable inside many container formats: .properties, YAML, XML, inlined as string literals in programming languages, etc. This includes a future MessageResource specification.

Design Restrictions

This section is non-normative.

The syntax specification takes into account the following design restrictions:

  1. Whitespace outside the translatable content should be insignificant. It should be possible to define a message entirely on a single line with no ambiguity, as well as to format it over multiple lines for clarity.

  2. The syntax should define as few special characters and sigils as possible. Note that this necessitates extra care when presenting messages for human consumption, because they may contain invisible characters such as U+200B ZERO WIDTH SPACE, control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters (U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10), private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content.

Messages and their Syntax

The purpose of MessageFormat is to allow content to vary at runtime. This variation might come from placing a value into the content, from selecting a different piece of content based on some data value, or from a combination of the two.

MessageFormat calls the template for a given formatting operation a message.

The values passed in at runtime (which are to be placed into the content or used to select between different content items) are called external variables. The author of a message can also assign local variables, including variables that modify external variables.

This part of the MessageFormat specification defines the syntax for a message, along with the concepts and terminology needed when processing a message during the formatting of a message at runtime.

The complete formal syntax of a message is described by the ABNF.

Well-formed vs. Valid Messages

A message is well-formed if it satisfies all the rules of the grammar. Attempting to parse a message that is not well-formed will result in a Syntax Error.

A message is valid if it is well-formed and also meets the additional content restrictions and semantic requirements about its structure defined below for declarations, matcher and options. Attempting to parse a message that is not valid will result in a Data Model Error.

The Message

A message is the complete template for a specific message formatting request.

A variable is a name associated with a resolved value.

An external variable is a variable whose name and initial value are supplied by the caller to MessageFormat or available in the formatting context. Only an external variable can appear as an operand in an input declaration.

A local variable is a variable created as the result of a local declaration.

[!NOTE] This syntax is designed to be embeddable into many different programming languages and formats. As such, it avoids constructs, such as character escapes, that are specific to any given file format or processor. In particular, it avoids using quote characters common to many file formats and formal languages so that these do not need to be escaped in the body of a message.

[!NOTE] In general (and except where required by the syntax), whitespace carries no meaning in the structure of a message. While many of the examples in this spec are written on multiple lines, the formatting shown is primarily for readability.

Example: This message:

.local $foo   =   { |horse| }
{{You have a {$foo}!}}

Can also be written as:

.local $foo={|horse|}{{You have a {$foo}!}}

An exception to this is: whitespace inside a pattern is always significant.

[!NOTE] The syntax assumes that each message will be displayed with a left-to-right display order and be processed in the logical character order. The syntax also permits the use of right-to-left characters in identifiers, literals, and other values. This can result in confusion when viewing the message.

Additional restrictions or requirements, such as permitting the use of certain bidirectional control characters in the syntax, might be added during the Tech Preview to better manage bidirectional text. Feedback on the creation and management of messages containing bidirectional tokens is strongly desired.

A message can be a simple message or it can be a complex message.

message = simple-message / complex-message

A simple message contains a single pattern, with restrictions on its first character. An empty string is a valid simple message.

simple-message = [simple-start pattern]
simple-start   = simple-start-char / escaped-char / placeholder

A complex message is any message that contains declarations, a matcher, or both. A complex message always begins with either a keyword that has a . prefix or a quoted pattern and consists of:

  1. an optional list of declarations, followed by
  2. a complex body

complex-message = *(declaration [s]) complex-body
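
Because a complex message always begins with a `.`-prefixed keyword or a quoted pattern, the two kinds of message can be told apart from their first characters. The Python sketch below illustrates this; `classify_message` is a hypothetical helper, and it only inspects the start of the string without checking that the message is well-formed.

```python
def classify_message(source: str) -> str:
    """Classify a message by its first characters only (no validation)."""
    # A complex message starts with a keyword such as .input, .local,
    # or .match, or with the "{{" that opens a quoted pattern.
    if source.startswith(".") or source.startswith("{{"):
        return "complex"
    # Everything else, including the empty string, is a simple message.
    return "simple"


print(classify_message("Hello, {$name}!"))  # simple
print(classify_message("{{Hello}}"))        # complex
```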

Declarations

A declaration binds a variable identifier to a value within the scope of a message. This variable can then be used in other expressions within the same message. Declarations are optional: many messages will not contain any declarations.

An input-declaration binds a variable to an external input value. The variable-expression of an input-declaration MAY include an annotation that is applied to the external value.

A local-declaration binds a variable to the resolved value of an expression.

For compatibility with later MessageFormat 2 specification versions, declarations MAY also include reserved statements.

declaration       = input-declaration / local-declaration / reserved-statement
input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression

Variables, once declared, MUST NOT be redeclared. A message that does any of the following is not valid and will produce a Duplicate Declaration error during processing:

A local-declaration MAY overwrite an external input value as long as the external input value does not appear in a previous declaration.
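
As a rough illustration of the redeclaration rule, the Python sketch below works on a deliberately simplified model: each declaration is reduced to the variable it binds plus the variables used in its expression. The `Declaration` shape and the function name are assumptions made for this example; a real implementation would operate on the parsed data model and would apply the full set of restrictions.

```python
from typing import Iterable, List, Set, Tuple

# Simplified model: (bound variable, variables used in its expression).
Declaration = Tuple[str, Iterable[str]]


def find_redeclarations(declarations: List[Declaration]) -> List[str]:
    """Return bound names that already appeared in a previous declaration."""
    seen: Set[str] = set()
    duplicates: List[str] = []
    for bound, used in declarations:
        if bound in seen:
            duplicates.append(bound)
        # Both the bound name and the names it uses count as "appearing".
        seen.add(bound)
        seen.update(used)
    return duplicates


# .local $y = {$x}  followed by  .local $x = {...}  redeclares $x,
# because $x already appeared in the earlier declaration.
print(find_redeclarations([("y", ["x"]), ("x", [])]))  # ['x']
```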

[!NOTE] These restrictions only apply to declarations. A placeholder or selector can apply a different annotation to a variable than one applied to the same variable named in a declaration. For example, this message is valid:

.input {$var :number maximumFractionDigits=0}
.match {$var :number maximumFractionDigits=2}
0 {{The selector can apply a different annotation to {$var} for the purposes of selection}}
* {{A placeholder in a pattern can apply a different annotation to {$var :number maximumFractionDigits=3}}}

(See the Errors section for examples of invalid messages)

Reserved Statements

A reserved statement reserves additional .keywords for use by future versions of this specification. Any such future keyword must start with ., followed by two or more lower-case ASCII characters.

The rest of the statement supports a similarly wide range of content as reserved annotations, but it MUST end with one or more expressions.

reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
reserved-keyword   = "." name

[!NOTE] The reserved-keyword ABNF rule is a simplification, as it MUST NOT be considered to match any of the existing keywords .input, .local, or .match.

This allows flexibility in future standardization, as future definitions MAY define additional semantics and constraints on the contents of these reserved statements.

Implementations MUST NOT assign meaning or semantics to a reserved statement: these are reserved for future standardization. Implementations MUST NOT remove or alter the contents of a reserved statement.
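
The distinction between the defined keywords and keywords reserved for the future can be sketched in Python as follows. The function name is hypothetical. The sketch applies the prose rule for future keywords (a `.` followed by two or more lower-case ASCII letters); the `reserved-keyword` ABNF rule itself is broader, since it accepts any name after the `.`.

```python
import re

DEFINED_KEYWORDS = {".input", ".local", ".match"}
# "." followed by two or more lower-case ASCII letters.
_FUTURE_KEYWORD_SHAPE = re.compile(r"\.[a-z]{2,}")


def classify_keyword(token: str) -> str:
    """Classify a statement keyword token."""
    if token in DEFINED_KEYWORDS:
        return "defined"
    if _FUTURE_KEYWORD_SHAPE.fullmatch(token):
        return "reserved"
    return "other"


print(classify_keyword(".match"))  # defined
print(classify_keyword(".when"))   # reserved
```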

Complex Body

The complex body of a complex message is the part that will be formatted. The complex body consists of either a quoted pattern or a matcher.

complex-body = quoted-pattern / matcher

Pattern

A pattern contains a sequence of text and placeholders to be formatted as a unit. Unless there is an error, resolving a message always results in the formatting of a single pattern.

pattern = *(text-char / escaped-char / placeholder)

A pattern MAY be empty.

A pattern MAY contain an arbitrary number of placeholders to be evaluated during the formatting process.

Quoted Pattern

A quoted pattern is a pattern that is "quoted" to prevent interference with other parts of the message. A quoted pattern starts with a sequence of two U+007B LEFT CURLY BRACKET {{ and ends with a sequence of two U+007D RIGHT CURLY BRACKET }}.

quoted-pattern = "{{" pattern "}}"

A quoted pattern MAY be empty.

An empty quoted pattern:

{{}}

Text

text is the translatable content of a pattern. Any Unicode code point is allowed, except for U+0000 NULL and the surrogate code points U+D800 through U+DFFF inclusive. The characters U+005C REVERSE SOLIDUS \, U+007B LEFT CURLY BRACKET {, and U+007D RIGHT CURLY BRACKET } MUST be escaped as \\, \{, and \} respectively.
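
A tool that generates messages from raw strings needs to apply these escapes. The Python sketch below shows one way to do so; `escape_text` is a hypothetical helper, and it does not reject the disallowed code points (NULL and surrogates).

```python
def escape_text(raw: str) -> str:
    """Escape backslash and curly brackets for use as pattern text."""
    out = []
    for ch in raw:
        if ch in "\\{}":
            out.append("\\")  # prefix each special character with a backslash
        out.append(ch)
    return "".join(out)


print(escape_text("Use {braces} here"))  # Use \{braces\} here
```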

In the ABNF, text is represented by non-empty sequences of simple-start-char, text-char, and escaped-char. The first of these is used at the start of a simple message, and matches text-char except for not allowing U+002E FULL STOP .. The ABNF uses content-char as a shared base for text and quoted literal characters.

Whitespace in text, including tabs, spaces, and newlines is significant and MUST be preserved during formatting.

simple-start-char = content-char / s / "@" / "|"
text-char         = content-char / s / "." / "@" / "|"
quoted-char       = content-char / s / "." / "@" / "{" / "}"
reserved-char     = content-char / "."
content-char      = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-3F        ; omit @ (%x40)
                  / %x41-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { | } (%x7B-7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-D7FF    ; omit surrogates
                  / %xE000-10FFFF
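
For readers who prefer code to ABNF, the `content-char` ranges above can be transcribed into a small Python predicate. This is a direct, non-normative transcription; the names `CONTENT_CHAR_RANGES` and `is_content_char` are chosen for this example only.

```python
# Inclusive code point ranges copied from the content-char rule.
CONTENT_CHAR_RANGES = [
    (0x01, 0x08), (0x0B, 0x0C), (0x0E, 0x1F), (0x21, 0x2D),
    (0x2F, 0x3F), (0x41, 0x5B), (0x5D, 0x7A), (0x7E, 0x2FFF),
    (0x3001, 0xD7FF), (0xE000, 0x10FFFF),
]


def is_content_char(ch: str) -> bool:
    """Return True if the single character ch matches content-char."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in CONTENT_CHAR_RANGES)


print(is_content_char("a"))  # True
print(is_content_char("."))  # False
```

Characters such as `.`, `@`, and `|` are excluded here because the rules that build on `content-char` (for example `text-char` and `quoted-char`) add them back individually where they are allowed.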

When a pattern is quoted by embedding the pattern in curly brackets, the resulting message can be embedded into various formats regardless of the container's whitespace trimming rules. Otherwise, care must be taken to ensure that pattern-significant whitespace is preserved.

Example: In a Java .properties file, the values hello and hello2 both contain an identical message which consists of a single pattern. This pattern consists of text with exactly three spaces before and after the word "Hello":

hello = {{   Hello   }}
hello2=\   Hello  \ 

Placeholder

A placeholder is an expression or markup that appears inside of a pattern and which will be replaced during the formatting of a message.

placeholder = expression / markup

Matcher

A matcher is the complex body of a message that allows runtime selection of the pattern to use for formatting. This allows the form or content of a message to vary based on values determined at runtime.

A matcher consists of the keyword .match followed by at least one selector and at least one variant.

When the matcher is processed, the result will be a single pattern that serves as the template for the formatting process.

A message can only be considered valid if the following requirements are satisfied:

matcher         = match-statement 1*([s] variant)
match-statement = match 1*([s] selector)

A message with a matcher:

.input {$count :number}
.match {$count}
one {{You have {$count} notification.}}
*   {{You have {$count} notifications.}}

A message containing a matcher formatted on a single line:

.match {:platform} windows {{Settings}} * {{Preferences}}

Selector

A selector is an expression that ranks or excludes the variants based on the value of the corresponding key in each variant. The combination of selectors in a matcher thus determines which pattern will be used during formatting.

selector = expression

There MUST be at least one selector in a matcher. There MAY be any number of additional selectors.

A message with a single selector that uses a custom function :hasCase which is a selector that allows the message to choose a pattern based on grammatical case:

.match {$userName :hasCase}
vocative {{Hello, {$userName :person case=vocative}!}}
accusative {{Please welcome {$userName :person case=accusative}!}}
* {{Hello!}}

A message with two selectors:

.input {$numLikes :integer}
.input {$numShares :integer}
.match {$numLikes} {$numShares}
0   0   {{Your item has no likes and has not been shared.}}
0   one {{Your item has no likes and has been shared {$numShares} time.}}
0   *   {{Your item has no likes and has been shared {$numShares} times.}}
one 0   {{Your item has {$numLikes} like and has not been shared.}}
one one {{Your item has {$numLikes} like and has been shared {$numShares} time.}}
one *   {{Your item has {$numLikes} like and has been shared {$numShares} times.}}
*   0   {{Your item has {$numLikes} likes and has not been shared.}}
*   one {{Your item has {$numLikes} likes and has been shared {$numShares} time.}}
*   *   {{Your item has {$numLikes} likes and has been shared {$numShares} times.}}

Variant

A variant is a quoted pattern associated with a set of keys in a matcher. Each variant MUST begin with a sequence of keys, and terminate with a valid quoted pattern. The number of keys in each variant MUST match the number of selectors in the matcher.

Keys are separated from each other by whitespace. Whitespace is permitted but not required between the last key and the quoted pattern.

variant = key *(s key) [s] quoted-pattern
key     = literal / "*"
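
The requirement that every variant has as many keys as the matcher has selectors is easy to check once a matcher has been parsed. The Python sketch below assumes a hypothetical, simplified representation in which each variant is a pair of its key list and its pattern source; it is not the interchange data model.

```python
from typing import List, Tuple

# Simplified model: (keys, quoted pattern source).
Variant = Tuple[List[str], str]


def variants_with_wrong_key_count(
    selector_count: int, variants: List[Variant]
) -> List[int]:
    """Return indexes of variants whose key count differs from the selectors."""
    return [
        index
        for index, (keys, _pattern) in enumerate(variants)
        if len(keys) != selector_count
    ]


variants = [
    (["0", "0"], "{{no likes, not shared}}"),
    (["one"], "{{one like}}"),           # only one key for two selectors
    (["*", "*"], "{{likes and shares}}"),
]
print(variants_with_wrong_key_count(2, variants))  # [1]
```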

Key

A key is a value in a variant for use by a selector when ranking or excluding variants during the matcher process. A key can be either a literal value or the "catch-all" key *.

The catch-all key is a special key, represented by *, that matches all values for a given selector.

Expressions

An expression is a part of a message that will be determined during the message's formatting.

An expression MUST begin with U+007B LEFT CURLY BRACKET { and end with U+007D RIGHT CURLY BRACKET }. An expression MUST NOT be empty. An expression cannot contain another expression. An expression MAY contain one or more attributes.

A literal-expression contains a literal, optionally followed by an annotation.

A variable-expression contains a variable, optionally followed by an annotation.

An annotation-expression contains an annotation without an operand.

expression            = literal-expression
                      / variable-expression
                      / annotation-expression
literal-expression    = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression   = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"

There are several types of expression that can appear in a message. All expressions share a common syntax. The types of expression are:

  1. The value of a local-declaration
  2. A selector
  3. A kind of placeholder in a pattern

Additionally, an input-declaration can contain a variable-expression.

Examples of different types of expression

Declarations:

.input {$x :function option=value}
.local $y = {|This is an expression|}

Selectors:

.match {$selector :functionRequired}

Placeholders:

This placeholder contains a literal expression: {|literal|}
This placeholder contains a variable expression: {$variable}
This placeholder references a function on a variable: {$variable :function with=options}
This placeholder contains a function expression with a variable-valued option: {:function option=$variable}

Annotation

An annotation is part of an expression containing either a function together with its associated options, or a private-use annotation or a reserved annotation.

annotation = function
           / private-use-annotation
           / reserved-annotation

An operand is the literal of a literal-expression or the variable of a variable-expression.

An annotation can appear in an expression by itself or following a single operand. When following an operand, the operand serves as input to the annotation.

Function

A function is named functionality in an annotation. Functions are used to evaluate, format, select, or otherwise process data values during formatting.

Each function is defined by the runtime's function registry. A function's entry in the function registry will define whether the function is a selector or formatter (or both), whether an operand is required, what form the values of an operand can take, what options and option values are valid, and what outputs might result. See function registry for more information.

A function starts with a prefix sigil : followed by an identifier. The identifier MAY be followed by one or more options. Options are not required.

function = ":" identifier *(s option)

A message with a function operating on the variable $now:

It is now {$now :datetime}.

Options

An option is a key-value pair containing a named argument that is passed to a function.

An option has an identifier and a value. The identifier is separated from the value by a U+003D EQUALS SIGN = along with optional whitespace. The value of an option can be either a literal or a variable.

Multiple options are permitted in an annotation. Options are separated from the preceding function identifier and from each other by whitespace. Each option's identifier MUST be unique within the annotation: an annotation with duplicate option identifiers is not valid.

The order of options is not significant.

option = identifier [s] "=" [s] (literal / variable)
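
The uniqueness requirement on option identifiers can be checked with a simple count. The Python sketch below assumes the options of one annotation have already been parsed into (identifier, value) pairs; that representation and the function name are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def duplicate_option_names(options: List[Tuple[str, str]]) -> List[str]:
    """Return the option identifiers that occur more than once."""
    counts = Counter(name for name, _value in options)
    return sorted(name for name, n in counts.items() if n > 1)


# {$date :datetime weekday=long weekday=short} is not valid.
print(duplicate_option_names([("weekday", "long"), ("weekday", "short")]))
# ['weekday']
```

Because the order of options is not significant, the check does not depend on where the duplicates appear.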

Examples of functions with options

A message using the :datetime function. The option weekday has the literal long as its value:

Today is {$date :datetime weekday=long}!

A message using the :datetime function. The option weekday has a variable $dateStyle as its value:

Today is {$date :datetime weekday=$dateStyle}!

Private-Use Annotations

A private-use annotation is an annotation whose syntax is reserved for use by a specific implementation or by private agreement between multiple implementations. Implementations MAY define their own meaning and semantics for private-use annotations.

A private-use annotation starts with either U+0026 AMPERSAND & or U+005E CIRCUMFLEX ACCENT ^.

Characters, including whitespace, are assigned meaning by the implementation. The definition of escapes in the reserved-body production, used for the body of a private-use annotation, is an affordance to implementations that wish to use a syntax exactly like other functions. Specifically:

A private-use annotation MAY be empty after its introducing sigil.

private-use-annotation = private-start [[s] reserved-body]
private-start          = "^" / "&"

[!NOTE] Users are cautioned that private-use annotations cannot be reliably exchanged and can result in errors during formatting. It is generally a better idea to use the function registry to define additional formatting or annotation options.

Here are some examples of what private-use sequences might look like:

Here's private use with an operand: {$foo &bar}
Here's a placeholder that is entirely private-use: {&anything here}
Here's a private-use function that uses normal function syntax: {$operand ^foo option=|literal|}
The character \| has to be paired or escaped: {&private || |something between| or isolated: \| }
Stop {& "translate 'stop' as a verb" might be a translator instruction or comment }
Protect stuff in {^ph}<a>{^/ph}private use{^ph}</a>{^/ph}

Reserved Annotations

A reserved annotation is an annotation whose syntax is reserved for future standardization.

A reserved annotation starts with a reserved character. The remaining part of a reserved annotation, called a reserved body, MAY be empty or contain arbitrary text that starts and ends with a non-whitespace character.

This allows maximum flexibility in future standardization, as future definitions MAY define additional semantics and constraints on the contents of these annotations.

Implementations MUST NOT assign meaning or semantics to an annotation starting with reserved-annotation-start: these are reserved for future standardization. Whitespace before or after a reserved body is not part of the reserved body. Implementations MUST NOT remove or alter the contents of a reserved body, including any interior whitespace, but MAY remove or alter whitespace before or after the reserved body.

While a reserved sequence is technically "well-formed", unrecognized reserved-annotations or private-use-annotations have no meaning.

reserved-annotation       = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"

reserved-body             = reserved-body-part *([s] reserved-body-part)
reserved-body-part        = reserved-char / escaped-char / quoted
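
Since each kind of annotation is introduced by its own sigil, the kind can be determined from the first character alone. The Python sketch below is illustrative; the function name and the returned labels are chosen for this example and are not defined by the specification.

```python
PRIVATE_START = set("^&")
RESERVED_START = set("!%*+<>?~")


def classify_annotation(annotation: str) -> str:
    """Classify an annotation by its introducing sigil."""
    first = annotation[:1]  # empty string if the annotation is empty
    if first == ":":
        return "function"
    if first in PRIVATE_START:
        return "private-use"
    if first in RESERVED_START:
        return "reserved"
    return "unknown"


print(classify_annotation(":datetime weekday=long"))  # function
print(classify_annotation("&bar"))                    # private-use
print(classify_annotation("!todo"))                   # reserved
```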

Markup

Markup placeholders are pattern parts that can be used to represent non-language parts of a message, such as inline elements or styling that should apply to a span of parts.

Markup MUST begin with U+007B LEFT CURLY BRACKET { and end with U+007D RIGHT CURLY BRACKET }. Markup MAY contain one or more attributes.

Markup comes in three forms:

Markup-open starts with U+0023 NUMBER SIGN # and represents an opening element within the message, such as markup used to start a span. It MAY include options.

Markup-standalone starts with U+0023 NUMBER SIGN # and has a U+002F SOLIDUS / immediately before its closing } representing a self-closing or standalone element within the message. It MAY include options.

Markup-close starts with U+002F SOLIDUS / and is a pattern part ending a span.

markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}"  ; open and standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close

A message with one button markup span and a standalone img markup element:

{#button}Submit{/button} or {#img alt=|Cancel| /}.

A message with options in the closing tag:

{#ansi attr=|bold,italic|}Bold and italic{/ansi attr=|bold|} italic only {/ansi attr=|italic|} no formatting.

A markup-open can appear without a corresponding markup-close. A markup-close can appear without a corresponding markup-open. Markup placeholders can appear in any order without making the message invalid. However, specifications or implementations defining markup might impose requirements on the pairing, ordering, or contents of markup during formatting.
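
The three forms of markup can be distinguished by their sigils. The Python sketch below assumes it is handed the source text of a single markup placeholder; `classify_markup` is a hypothetical helper that looks only at the sigils and does not validate identifiers, options, or attributes.

```python
def classify_markup(placeholder: str) -> str:
    """Classify a markup placeholder such as '{#img /}' by its sigils."""
    if not (placeholder.startswith("{") and placeholder.endswith("}")):
        raise ValueError("markup must be wrapped in { and }")
    inner = placeholder[1:-1].strip()
    if inner.startswith("/"):
        return "close"
    if inner.startswith("#"):
        # A "/" immediately before the closing "}" marks standalone markup.
        return "standalone" if inner.endswith("/") else "open"
    raise ValueError("not markup")


print(classify_markup("{#button}"))             # open
print(classify_markup("{#img alt=|Cancel| /}")) # standalone
print(classify_markup("{/button}"))             # close
```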

Attributes

Attributes are reserved for standardization by future versions of this specification. Examples in this section are meant to be illustrative and might not match future requirements or usage.

[!NOTE] The Tech Preview does not provide a built-in mechanism for overriding values in the formatting context (most notably the locale). Nor does it provide a mechanism for identifying specific expressions, such as by assigning a name or id. The utility of these types of mechanisms has been debated. There are at least two proposed mechanisms for implementing support for these. Specifically, one mechanism would be to reserve specifically-named options, possibly using a Unicode namespace (e.g. locale=xxx or u:locale=xxx). Such options would be reserved for use in any and all functions or markup. The other mechanism would be to use the reserved "expression attribute" syntax for this purpose (e.g. @locale=xxx or @id=foo). Neither mechanism was included in this Tech Preview. Feedback on the preferred mechanism for managing these features is strongly desired.

In the meantime, function authors and other implementers are cautioned to avoid creating function-specific or implementation-specific option values for this purpose. One workaround would be to use the implementation's namespace for these features to ensure later interoperability when such a mechanism is finalized during the Tech Preview period. Specifically:

An attribute is an identifier with an optional value that appears in an expression or in markup.

Attributes are prefixed by a U+0040 COMMERCIAL AT @ sign, followed by an identifier. An attribute MAY have a value which is separated from the identifier by a U+003D EQUALS SIGN = along with optional whitespace. The value of an attribute can be either a literal or a variable.

Multiple attributes are permitted in an expression or markup. Each attribute is separated by whitespace.

The order of attributes is not significant.

attribute = "@" identifier [[s] "=" [s] (literal / variable)]

Examples of expressions and markup with attributes:

A message including a literal that should not be translated:

In French, "{|bonjour| @translate=no}" is a greeting

A message with markup that should not be copied:

Have a {#span @can-copy}great and wonderful{/span @can-copy} birthday!

Other Syntax Elements

This section defines common elements used to construct messages.

Keywords

A keyword is a reserved token that has a unique meaning in the message syntax.

The following three keywords are defined: .input, .local, and .match. Keywords are always lowercase and start with U+002E FULL STOP ..

input = %s".input"
local = %s".local"
match = %s".match"

Literals

A literal is a character sequence that appears outside of text in various parts of a message. A literal can appear as a key value, as the operand of a literal-expression, or in the value of an option. A literal MAY include any Unicode code point except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF.

All code points are preserved.

A quoted literal begins and ends with U+007C VERTICAL BAR |. The characters \ and | within a quoted literal MUST be escaped as \\ and \|.

An unquoted literal is a literal that does not require the | quotes around it to be distinct from the rest of the message syntax. An unquoted literal MAY be used when the content of the literal contains no whitespace and otherwise matches the unquoted production. Any unquoted literal MAY be quoted. Implementations MUST NOT distinguish between quoted and unquoted literals that have the same sequence of code points.

Unquoted literals can contain a name or consist of a number-literal. A number-literal uses the same syntax as JSON and is intended for the encoding of number values in operands or options, or as keys for variants.

literal        = quoted / unquoted
quoted         = "|" *(quoted-char / escaped-char) "|"
unquoted       = name / number-literal
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
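As an illustration of the number-literal production, the following minimal Python sketch transcribes it into a regular expression. The names NUMBER_LITERAL and is_number_literal are hypothetical and are not defined by this specification.

```python
import re

# Transcription of the number-literal production:
# ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
# [0-9] is spelled out because \d would also accept non-ASCII digits.
NUMBER_LITERAL = re.compile(
    r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?"
)


def is_number_literal(text: str) -> bool:
    """Return True if the whole string matches number-literal."""
    return NUMBER_LITERAL.fullmatch(text) is not None
```

Under this sketch, values such as 0, -1.5 and 1E-3 are accepted, while 01, +1, 1. and .5 are rejected, as in JSON.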

Names and Identifiers

An identifier is a character sequence that identifies a function, markup, or option. Each identifier consists of a name optionally preceded by a namespace. When present, the namespace is separated from the name by a U+003A COLON :. Built-in functions and their options do not have a namespace identifier.

The namespace u (U+0075 LATIN SMALL LETTER U) is reserved for future standardization.

Function identifiers are prefixed with :. Markup identifiers are prefixed with # or /. Option identifiers have no prefix.

A name is a character sequence used in an identifier or as the name for a variable or the value of an unquoted literal.

Variable names are prefixed with $.

Valid content for names is based on Namespaces in XML 1.0's NCName. This is different from XML's Name in that it MUST NOT contain a U+003A COLON :. Otherwise, the set of characters allowed in a name is large.

[!NOTE] External variables can be passed in that are not valid names. Such variables cannot be referenced in a message, but are not otherwise errors.

Examples:

A variable:

This has a {$variable}

A function:

This has a {:function}

An add-on function from the icu namespace:

This has a {:icu:function}

An option and an add-on option:

This has {:options option=value icu:option=add_on}

Support for namespaces and their interpretation is implementation-defined in this release.

variable   = "$" name
option     = identifier [s] "=" [s] (literal / variable)

identifier = [namespace ":"] name
namespace  = name
name       = name-start *name-char
name-start = ALPHA / "_"
           / %xC0-D6 / %xD8-F6 / %xF8-2FF
           / %x370-37D / %x37F-1FFF / %x200C-200D
           / %x2070-218F / %x2C00-2FEF / %x3001-D7FF
           / %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char  = name-start / DIGIT / "-" / "."
           / %xB7 / %x300-36F / %x203F-2040
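The name and identifier productions can be checked code point by code point. The following Python sketch copies the ranges from the ABNF above; the function names is_name and is_identifier are hypothetical, and the identifier is assumed to be passed without its :, # or / prefix.

```python
# Code point ranges copied from the name-start production.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0x5F, 0x5F),  # ALPHA / "_"
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFC), (0x10000, 0xEFFFF),
]

# Additional ranges that name-char allows after the first character.
NAME_CHAR_EXTRA_RANGES = [
    (0x30, 0x39), (0x2D, 0x2D), (0x2E, 0x2E),  # DIGIT / "-" / "."
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name(text: str) -> bool:
    """Return True if text matches: name = name-start *name-char."""
    if not text or not _in_ranges(text[0], NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ch, NAME_START_RANGES) or _in_ranges(ch, NAME_CHAR_EXTRA_RANGES)
        for ch in text[1:]
    )


def is_identifier(text: str) -> bool:
    """Return True if text matches: identifier = [namespace ":"] name."""
    namespace, separator, name = text.rpartition(":")
    if separator:
        return is_name(namespace) and is_name(name)
    return is_name(name)
```

Because U+003A COLON is not in either range list, a string with more than one colon is rejected by is_identifier.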

Escape Sequences

An escape sequence is a two-character sequence starting with U+005C REVERSE SOLIDUS \.

An escape sequence allows the appearance of lexically meaningful characters in the body of text, quoted, or reserved (which includes, in this case, private-use) sequences. Each escape sequence represents the literal character immediately following the initial \.

escaped-char = backslash ( backslash / "{" / "|" / "}" )
backslash    = %x5C ; U+005C REVERSE SOLIDUS "\"

[!NOTE] The escaped-char rule allows escaping some characters in places where they do not need to be escaped, such as braces in a quoted literal. For example, |foo {bar}| and |foo \{bar\}| are synonymous.

When writing or generating a message, escape sequences SHOULD NOT be used unless required by the syntax. That is, inside literals only escape | and inside patterns only escape { and }.
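A message generator could apply this guidance with two small helpers, sketched below in Python. The helper names are hypothetical. The sketch also escapes a literal backslash in both contexts, on the assumption that an unescaped backslash would otherwise be read as the start of an escape sequence.

```python
def escape_pattern_text(text: str) -> str:
    """Escape text for use inside a pattern: only \\, { and } are escaped."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def quote_literal(value: str) -> str:
    """Wrap a value in | quotes, escaping only \\ and | inside it."""
    body = value.replace("\\", "\\\\").replace("|", "\\|")
    return "|" + body + "|"
```

Braces are left unescaped by quote_literal and vertical bars are left unescaped by escape_pattern_text, since neither is required by the syntax in that position.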

Whitespace

Whitespace is defined as one or more of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (new line), U+000D CARRIAGE RETURN, U+3000 IDEOGRAPHIC SPACE, or U+0020 SPACE.

Inside patterns and quoted literals, whitespace is part of the content and is recorded and stored verbatim. Whitespace is not significant outside translatable text, except where required by the syntax.

[!NOTE] The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for compatibility with certain East Asian keyboards and input methods, in which users might accidentally create these characters in a message.

s = 1*( SP / HTAB / CR / LF / %x3000 )
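The whitespace set defined here is narrower than the notion of whitespace built into many programming languages. The following Python sketch (with hypothetical names) transcribes the s production and contrasts it with str.isspace(), which also accepts characters such as U+00A0 NO-BREAK SPACE.

```python
import re

# s = 1*( SP / HTAB / CR / LF / %x3000 )
WHITESPACE = re.compile("[ \t\r\n\u3000]+")


def is_syntax_whitespace(text: str) -> bool:
    """Return True if text consists only of the whitespace characters above."""
    return WHITESPACE.fullmatch(text) is not None
```

A tokenizer built on a generic whitespace test would therefore accept input that the grammar does not.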

Complete ABNF

The grammar is formally defined in message.abnf using the ABNF notation [STD68], including the modifications found in RFC 7405.

RFC 7405 defines a variation of ABNF that is case-sensitive. Some ABNF tools are only compatible with the specification found in RFC 5234. To make message.abnf compatible with that version of ABNF, replace the rules of the same name with this block:

input = %x2E.69.6E.70.75.74  ; ".input"
local = %x2E.6C.6F.63.61.6C  ; ".local"
match = %x2E.6D.61.74.63.68  ; ".match"
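The hexadecimal form of each keyword is simply the sequence of its code points. As a sketch, the following hypothetical Python helper derives the RFC 5234 compatible terminal from the keyword text:

```python
def to_rfc5234_terminal(keyword: str) -> str:
    """Spell out a case-sensitive string as an ABNF %x terminal."""
    return "%x" + ".".join(f"{ord(ch):02X}" for ch in keyword)


for keyword in (".input", ".local", ".match"):
    # The rule name is the keyword without its leading full stop.
    print(f'{keyword[1:]} = {to_rfc5234_terminal(keyword)}  ; "{keyword}"')
```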

MessageFormat 2.0 Errors

Errors can occur during the processing of a message. Some errors can be detected statically, such as those due to problems with message syntax, violations of requirements in the data model, or requirements defined by a function. Other errors might be detected during selection or formatting of a given message. Where available, the use of validation tools is recommended, as early detection of errors makes their correction easier.

Error Handling

Syntax Errors and Data Model Errors apply to all message processors, and MUST be emitted as soon as possible. The other error categories are only emitted during formatting, but it might be possible to detect them with validation tools.

During selection and formatting, expression handlers MUST only emit Message Function Errors.

Implementations do not have to check for or emit Resolution Errors or Message Function Errors in expressions that are not otherwise used by the message, such as placeholders in unselected patterns or declarations that are never referenced during formatting.

In all cases, when encountering a runtime error, a message formatter MUST provide some representation of the message. An informative error or errors MUST also be separately provided.

When a message contains more than one error, or contains some error which leads to further errors, an implementation which does not emit all of the errors SHOULD prioritise Syntax Errors and Data Model Errors over others.

When an error occurs within a selector, the selector MUST NOT match any variant key other than the catch-all * and a Resolution Error or a Message Function Error MUST be emitted.

Syntax Errors

Syntax Errors occur when the syntax representation of a message is not well-formed.

Example invalid messages resulting in a Syntax Error:

{{Missing end braces
{{Missing one end brace}
Unknown {{expression}}
.local $var = {|no message body|}

Data Model Errors

Data Model Errors occur when a message is invalid due to violating one of the semantic requirements on its structure.

Variant Key Mismatch

A Variant Key Mismatch occurs when the number of keys on a variant does not equal the number of selectors.

Example invalid messages resulting in a Variant Key Mismatch error:

.match {$one :func}
1 2 {{Too many}}
* {{Otherwise}}

.match {$one :func} {$two :func}
1 2 {{Two keys}}
* {{Missing a key}}
* * {{Otherwise}}
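A validator might detect this error by comparing key counts. The Python sketch below assumes a simplified, hypothetical representation in which each variant is given as its list of keys; it is not the interchange data model.

```python
from typing import List, Sequence


def find_variant_key_mismatches(
    selector_count: int, variants: Sequence[Sequence[str]]
) -> List[int]:
    """Return the indexes of variants whose key count differs from the selector count."""
    return [
        index
        for index, keys in enumerate(variants)
        if len(keys) != selector_count
    ]
```

Applied to the two invalid messages above, the sketch reports the first variant of the first message and the second variant of the second message.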

Missing Fallback Variant

A Missing Fallback Variant error occurs when the message does not include a variant with only catch-all keys.

Example invalid messages resulting in a Missing Fallback Variant error:

.match {$one :func}
1 {{Value is one}}
2 {{Value is two}}

.match {$one :func} {$two :func}
1 * {{First is one}}
* 1 {{Second is one}}
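The check for a fallback variant can be sketched in Python as follows. As a simplifying assumption, each variant is represented by its list of keys and the string "*" stands for the catch-all key; a real data model would need to distinguish the catch-all key from a quoted literal |*|.

```python
from typing import Sequence


def has_fallback_variant(variants: Sequence[Sequence[str]]) -> bool:
    """Return True if some variant consists only of catch-all keys."""
    return any(
        len(keys) > 0 and all(key == "*" for key in keys)
        for keys in variants
    )
```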

Missing Selector Annotation

A Missing Selector Annotation error occurs when the message contains a selector that does not have an annotation, or contains a variable that does not directly or indirectly reference a declaration with an annotation.

Examples of invalid messages resulting in a Missing Selector Annotation error:

.match {$one}
1 {{Value is one}}
* {{Value is not one}}

.local $one = {|The one|}
.match {$one}
1 {{Value is one}}
* {{Value is not one}}

.input {$one}
.match {$one}
1 {{Value is one}}
* {{Value is not one}}

Duplicate Declaration

A Duplicate Declaration error occurs when a variable is declared more than once. Note that an input variable is implicitly declared when it is first used, so explicitly declaring it after such use is also an error.

Examples of invalid messages resulting in a Duplicate Declaration error:

.input {$var :number maximumFractionDigits=0}
.input {$var :number minimumFractionDigits=0}
{{Redeclaration of the same variable}}

.local $var = {$ext :number maximumFractionDigits=0}
.input {$var :number minimumFractionDigits=0}
{{Redeclaration of a local variable}}

.input {$var :number minimumFractionDigits=0}
.local $var = {$ext :number maximumFractionDigits=0}
{{Redeclaration of an input variable}}

.input {$var :number minimumFractionDigits=$var2}
.input {$var2 :number}
{{Redeclaration of the implicit input variable $var2}}

.local $var = {$ext :someFunction}
.local $var = {$error}
.local $var2 = {$var2 :error}
{{{$var} cannot be redefined. {$var2} cannot refer to itself}}
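The rule, including the implicit declaration of variables on first use, can be sketched in Python as follows. The Declaration record is a hypothetical simplification: refs lists the variables used in the declaration's expression, excluding the operand of an .input declaration, which is the declared variable itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Declaration:
    kind: str  # "input" or "local"
    name: str  # declared variable name, without the "$"
    refs: List[str] = field(default_factory=list)  # variables used in the expression


def find_duplicate_declarations(declarations: List[Declaration]) -> List[str]:
    """Return the names whose declaration is a Duplicate Declaration error."""
    seen = set()  # variables declared explicitly or implicitly so far
    duplicates = []
    for decl in declarations:
        self_reference = decl.kind == "local" and decl.name in decl.refs
        if decl.name in seen or self_reference:
            duplicates.append(decl.name)
        seen.add(decl.name)
        seen.update(decl.refs)  # first use implicitly declares a variable
    return duplicates
```

For the last example above, the sketch reports both var (redeclared) and var2 (a local variable that refers to itself).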

Duplicate Option Name

A Duplicate Option Name error occurs when the same identifier appears on the left-hand side of more than one option in the same expression.

Examples of invalid messages resulting in a Duplicate Option Name error:

Value is {42 :number style=percent style=decimal}

.local $foo = {horse :func one=1 two=2 one=1}
{{This is {$foo}}}
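Detecting this error only requires the option identifiers of a single expression, in source order. A minimal Python sketch, using a hypothetical helper name:

```python
from collections import Counter
from typing import List, Sequence


def find_duplicate_option_names(option_names: Sequence[str]) -> List[str]:
    """Return each option identifier that appears more than once in one expression."""
    counts = Counter(option_names)
    return [name for name, count in counts.items() if count > 1]
```

Identifiers are compared in full, so option and icu:option are treated as different options.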

Resolution Errors

Resolution Errors occur when the runtime value of a part of a message cannot be determined.

Unresolved Variable

An Unresolved Variable error occurs when a variable reference cannot be resolved.

For example, attempting to format either of the following messages would result in an Unresolved Variable error if done within a context that does not provide for the variable reference $var to be successfully resolved:

The value is {$var}.

.match {$var :func}
1 {{The value is one.}}
* {{The value is not one.}}

Unknown Function

An Unknown Function error occurs when an expression includes a reference to a function which cannot be resolved.

For example, attempting to format either of the following messages would result in an Unknown Function error if done within a context that does not provide for the function :func to be successfully resolved:

The value is {horse :func}.

.match {|horse| :func}
1 {{The value is one.}}
* {{The value is not one.}}

Unsupported Expression

An Unsupported Expression error occurs when an expression uses syntax reserved for future standardization, or for private implementation use that is not supported by the current implementation.

For example, attempting to format this message would result in an Unsupported Expression error because it includes a reserved annotation.

The value is {!horse}.

Attempting to format this message would result in an Unsupported Expression error if done within a context that does not support the ^ private use sigil:

.match {|horse| ^private}
1 {{The value is one.}}
* {{The value is not one.}}

Unsupported Statement

An Unsupported Statement error occurs when a message includes a reserved statement.

For example, attempting to format this message would result in an Unsupported Statement error:

.some {|horse|}
{{The message body}}

Bad Selector

A Bad Selector error occurs when a message includes a selector with a resolved value which does not support selection.

For example, attempting to format this message would result in a Bad Selector error:

.local $day = {|2024-05-01| :date}
.match {$day}
* {{The due date is {$day}}}

Message Function Errors

A Message Function Error is any error that occurs when calling a message function implementation or which depends on validation associated with a specific function.

Implementations SHOULD provide a way for functions to emit (or cause to be emitted) any of the types of error defined in this section. Implementations MAY also provide implementation-defined Message Function Error types.

For example, attempting to format any of the following messages might result in a Message Function Error if done within a context that

  1. Provides for the variable reference $user to resolve to an object { name: 'Kat', id: 1234 },
  2. Provides for the variable reference $field to resolve to a string 'address', and
  3. Uses a :get message function which requires its argument to be an object and an option field to be provided with a string value.

The exact type of Message Function Error is determined by the message function implementation.

Hello, {horse :get field=name}!

Hello, {$user :get}!

.local $id = {$user :get field=id}
{{Hello, {$id :get field=name}!}}

Your {$field} is {$id :get field=$field}

Bad Operand

A Bad Operand error is any error that occurs due to the content or format of the operand, such as when the operand provided to a function during function resolution does not match one of the expected implementation-defined types for that function; or in which a literal operand value does not have the required format and thus cannot be processed into one of the expected implementation-defined types for that specific function.

For example, the following messages each produce a Bad Operand error because the literal |horse| does not match the number-literal production, which is a requirement of the function :number for its operand:

.local $horse = {|horse| :number}
{{You have a {$horse}.}}

.match {|horse| :number}
1 {{The value is one.}}
* {{The value is not one.}}

Bad Option

A Bad Option error is an error that occurs when there is an implementation-defined error with an option or its value. These might include:

For example, the following message might produce a Bad Option error because the literal foo does not match the production digit-size-option, which is a requirement of the function :number for its minimumFractionDigits option:

The answer is {42 :number minimumFractionDigits=foo}.

Bad Variant Key

A Bad Variant Key error is an error that occurs when a variant key does not match the expected implementation-defined format.

For example, the following message produces a Bad Variant Key error because horse is not a recognized plural category and does not match the number-literal production, which is a requirement of the :number function:

.match {42 :number}
1     {{The value is one.}}
horse {{The value is a horse.}}
*     {{The value is not one.}}

WIP DRAFT MessageFormat 2.0 Registry

Implementations and tooling can greatly benefit from a structured definition of formatting and matching functions available to messages at runtime. This specification is intended to provide a mechanism for storing such declarations in a portable manner.

Goals

This section is non-normative.

The registry provides a machine-readable description of MessageFormat 2 extensions (custom functions), in order to support the following goals and use-cases:

Conformance and Use

This section is normative.

To be conformant with MessageFormat 2.0, an implementation MUST implement the functions, options and option values, operands and outputs described in the section Default Registry below.

Implementations MAY implement additional functions or additional options. In particular, implementations are encouraged to provide feedback on proposed options and their values.

[!IMPORTANT] In the Tech Preview, the registry data model should be regarded as experimental. Changes to the format are expected during this period. Feedback on the registry's format and implementation is encouraged!

Implementations are not required to provide a machine-readable registry nor to read or interpret the registry data model in order to be conformant.

The MessageFormat 2.0 Registry was created to describe the core set of formatting and selection functions, including operands, options, and option values. This is the minimum set of functionality needed for conformance. By using the same names and values, messages can be used interchangeably by different implementations, regardless of programming language or runtime environment. This ensures that developers do not have to relearn core MessageFormat syntax and functionality when moving between platforms and that translators do not need to know about the runtime environment for most selection or formatting operations.

The registry provides a machine-readable description of functions suitable for tools, such as those used in translation automation, so that variant expansion and information about available options and their effects are available in the translation ecosystem. To that end, implementations are strongly encouraged to provide appropriately tailored versions of the registry for consumption by tools (even if not included in software distributions) and to encourage any add-on or plug-in functionality to provide a registry to support localization tooling.

Registry Data Model

This section is non-normative.

[!IMPORTANT] This part of the specification is not part of the Tech Preview.

The registry contains descriptions of function signatures. registry.dtd describes its data model.

The main building block of the registry is the <function> element. It represents an implementation of a custom function available to translation at runtime. A function defines a human-readable <description> of its behavior and one or more machine-readable signatures of how to call it. Named <validationRule> elements can optionally define regex validation rules for literals, option values, and variant keys.

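MessageFormat 2 functions can be invoked in two contexts: inside placeholders, to format a value as part of the message's output, and inside selectors, to contribute to selecting a variant.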

A single function name may be used in both contexts, regardless of whether it's implemented as one or multiple functions.

A signature defines one particular set of at most one argument and any number of named options that can be used together in a single call to the function. <formatSignature> corresponds to a function call inside a placeholder inside translatable text. <matchSignature> corresponds to a function call inside a selector.

A signature may define the positional argument of the function with the <input> element. If the <input> element is not present, the function is defined as a nullary function. A signature may also define one or more <option> elements representing named options to the function. An option can be omitted in a call to the function, unless the required attribute is present. They accept either a finite enumeration of values (the values attribute) or validate their input with a regular expression (the validationRule attribute). Read-only options (the readonly attribute) can be displayed to translators in CAT tools, but may not be edited.

As the <input> and <option> rules may be locale-dependent, each signature can include an <override locales="..."> that extends and overrides the corresponding input and options rules. If multiple <override> elements would match the current locale, only the first one is used.

Matching-function signatures additionally include one or more <match> elements to define the keys against which they can match when used as selectors.

Functions may also include <alias> definitions, which provide shorthands for commonly used option baskets. An alias name may be used equivalently to a function name in messages. Its <setOption> values are always set, and may not be overridden in message annotations.

If a <function>, <input> or <option> includes multiple <description> elements, each SHOULD have a different xml:lang attribute value. This allows for the descriptions of these elements to be themselves localized according to the preferred locale of the message authors and editors.

Example

The following registry.xml is an example of a registry file which may be provided by an implementation to describe its built-in functions. For the sake of brevity, only locales="en" is considered.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
    <function name="platform">
        <description>Match the current OS.</description>
        <matchSignature>
            <match values="windows linux macos android ios"/>
        </matchSignature>
    </function>

    <validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
    <validationRule id="positiveInteger" regex="[0-9]+"/>
    <validationRule id="currencyCode" regex="[A-Z]{3}"/>

    <function name="number">
        <description>
            Format a number.
            Match a **formatted** numerical value against CLDR plural categories or against a number literal.
        </description>

        <matchSignature>
            <input validationRule="anyNumber"/>
            <option name="type" values="cardinal ordinal"/>
            <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="maximumFractionDigits" validationRule="positiveInteger"/>
            <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
            <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
            <!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
            <match locales="en" values="one two few other" validationRule="anyNumber"/>
            <match values="zero one two few many other" validationRule="anyNumber"/>
        </matchSignature>

        <formatSignature>
            <input validationRule="anyNumber"/>
            <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="maximumFractionDigits" validationRule="positiveInteger"/>
            <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
            <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
            <option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
            <option name="currency" readonly="true" validationRule="currencyCode"/>
        </formatSignature>

        <alias name="integer">
          <description>Locale-sensitive integral number formatting</description>
          <setOption name="maximumFractionDigits" value="0" />
          <setOption name="style" value="decimal" />
        </alias>
    </function>
</registry>

Given the above description, the :number function is defined to work both in a selector and a placeholder:

.match {$count :number}
1 {{One new message}}
* {{{$count :number} new messages}}

Furthermore, :number's <matchSignature> contains two <match> elements which allow the validation of variant keys. The element whose locales best matches the current locale using resource item lookup from LDML is used. An element with no locales attribute is the default (and is considered equivalent to the root locale).
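As an illustration of how a tool might consume such a registry, the following Python sketch validates a literal option value against a format signature. It embeds a reduced fragment of the example registry, ignores <override> elements, and assumes that a validation rule has to match the whole value and that its regex is compatible with Python's re module. The function name is_valid_option is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Reduced fragment of the example registry, for illustration only.
REGISTRY_XML = """
<registry xml:lang="en">
    <validationRule id="positiveInteger" regex="[0-9]+"/>
    <validationRule id="currencyCode" regex="[A-Z]{3}"/>
    <function name="number">
        <formatSignature>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
            <option name="currency" readonly="true" validationRule="currencyCode"/>
        </formatSignature>
    </function>
</registry>
"""


def is_valid_option(root, function: str, name: str, value: str) -> bool:
    """Check a literal option value against the function's format signature."""
    rules = {rule.get("id"): rule.get("regex") for rule in root.iter("validationRule")}
    signature = root.find(f"./function[@name='{function}']/formatSignature")
    if signature is None:
        return False
    for option in signature.iter("option"):
        if option.get("name") != name:
            continue
        values = option.get("values")
        if values is not None:
            # Finite enumeration: the values attribute is space-separated.
            return value in values.split()
        rule_id = option.get("validationRule")
        if rule_id is not None:
            return re.fullmatch(rules[rule_id], value) is not None
        return True  # option declared without constraints
    return False  # option not declared in this signature


ROOT = ET.fromstring(REGISTRY_XML)
print(is_valid_option(ROOT, "number", "style", "percent"))
print(is_valid_option(ROOT, "number", "minimumFractionDigits", "foo"))
```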


A localization engineer can then extend the registry by defining the following customRegistry.xml file.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
    <function name="noun">
        <description>Handle the grammar of a noun.</description>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="plural" values="one other"/>
                <option name="case" values="nominative genitive" default="nominative"/>
            </override>
        </formatSignature>
    </function>

    <function name="adjective">
        <description>Handle the grammar of an adjective.</description>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="plural" values="one other"/>
                <option name="case" values="nominative genitive" default="nominative"/>
            </override>
        </formatSignature>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="accord"/>
            </override>
        </formatSignature>
    </function>
</registry>

Messages can now use the :noun and the :adjective functions. The following message references the first signature of :adjective, which expects the plural and case options:

You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!

The following message references the second signature of :adjective, which only expects the accord option:

.input {$object :noun case=nominative}
{{You see {$color :adjective article=indefinite accord=$object} {$object}!}}

Default Registry

[!IMPORTANT] This part of the specification is part of the Tech Preview and is NORMATIVE.

This section describes the functions which each implementation MUST provide to be conformant with this specification.

[!NOTE] The Stability Policy allows for updates to Default Registry functions to add support for new options. As implementations are permitted to ignore options that they do not support, it is possible to write messages using options not defined below which currently format with no error, but which could produce errors when formatted with a later edition of the Default Registry. Therefore, using options not explicitly defined here is NOT RECOMMENDED.

String Value Selection and Formatting

The :string function

The function :string provides string selection and formatting.

Operands

The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported, or any literal value. All other values produce a Bad Operand error.

For example, in Java, implementations of the java.lang.CharSequence interface (such as java.lang.String or java.lang.StringBuilder), the type char, or the class java.lang.Character might be considered as the "implementation-defined types". Such an implementation might also support other classes via the method toString(). This might be used to enable selection of an enum value by name, for example.

Other programming languages would define string and character sequence types or classes according to their local needs, including, where appropriate, coercion to string.

Options

The function :string has no options.

[!NOTE] Proposals for string transformation options or implementation experience with user requirements are desired during the Tech Preview.

Selection

When implementing MatchSelectorKeys(resolvedSelector, keys) where resolvedSelector is the resolved value of a selector expression and keys is a list of strings, the :string selector performs as described below.

  1. Let compare be the string value of resolvedSelector.
  2. Let result be a new empty list of strings.
  3. For each string key in keys:
    1. If key and compare consist of the same sequence of Unicode code points, then
      1. Append key as the last element of the list result.
  4. Return result.
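The steps above can be sketched in Python as follows. The sketch assumes that the selector has already been resolved to a string; Python compares str values code point by code point, which matches step 3.1.

```python
from typing import List


def match_selector_keys(resolved_selector: str, keys: List[str]) -> List[str]:
    """Sketch of MatchSelectorKeys for the :string selector."""
    compare = resolved_selector          # step 1
    result: List[str] = []               # step 2
    for key in keys:                     # step 3
        if key == compare:               # step 3.1: same code point sequence
            result.append(key)           # step 3.1.1
    return result                        # step 4
```

No case folding or normalization is applied, so a precomposed "é" (U+00E9) does not match the decomposed sequence "e" followed by U+0301.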

[!NOTE] Matching of key and compare values is sensitive to the sequence of code points in each string. As a result, variations in how text can be encoded can affect the performance of matching. The function :string does not perform case folding or Unicode Normalization of string values. Users SHOULD encode messages and their parts (such as keys and operands), in Unicode Normalization Form C (NFC) unless there is a very good reason not to. See also: String Matching

[!NOTE] Unquoted string literals in a variant do not include spaces. If users wish to match strings that include whitespace (including U+3000 IDEOGRAPHIC SPACE) to a key, the key needs to be quoted.

For example:

.match {$string :string}
| space key | {{Matches the string " space key "}}
*             {{Matches the string "space key"}}

Formatting

The :string function returns the string value of the resolved value of the operand.

Numeric Value Selection and Formatting

The :number function

The function :number is a selector and formatter for numeric values.

Operands

The function :number requires a Number Operand as its operand.

Options

Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.

[!NOTE] The names of options and their values were derived from the options in JavaScript's Intl.NumberFormat.

The following options and their values are required to be available on the function :number:

[!NOTE] The following options and option values are being developed during the Technical Preview period.

The following values for the option style are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

The following options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

Default Value of select Option

The value plural is the default for the option select because it is the most common use case for numeric selection. It can be used for exact value matches but also allows for the grammatical needs of languages using CLDR's plural rules. This might not be noticeable in the source language (particularly English), but can cause problems in target locales that the original developer is not considering.

For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:

.match {$var :number}
1   {{You have one last chance}}
one {{You have {$var} chance remaining}}
*   {{You have {$var} chances remaining}}

The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.

Percent Style

When implementing style=percent, the numeric value of the operand MUST be multiplied by 100 for the purposes of formatting.

For example,

The total was {0.5 :number style=percent}.

should format in a manner similar to:

The total was 50%.
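The scaling step can be illustrated with a minimal Python sketch. It is deliberately not locale-aware: a real implementation would delegate digit grouping, decimal separators and the placement of the percent sign to a locale-sensitive number formatter. The function name format_percent is hypothetical.

```python
def format_percent(value: float) -> str:
    """Multiply the operand by 100, then render it with a trailing percent sign."""
    scaled = value * 100
    # The "g" presentation type drops insignificant trailing zeros.
    return f"{scaled:g}%"
```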

Selection

The function :number performs selection as described in Number Selection below.

The :integer function

The function :integer is a selector and formatter for matching or formatting numeric values as integers.

Operands

The function :integer requires a Number Operand as its operand.

Options

Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.

[!NOTE] The names of options and their values were derived from the options in JavaScript's Intl.NumberFormat.

The following options and their values are required in the default registry to be available on the function :integer:

[!NOTE] The following options and option values are being developed during the Technical Preview period.

The following values for the option style are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

The following options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

Default Value of select Option

The value plural is the default for the option select because it is the most common use case for numeric selection. It can be used for exact value matches but also allows for the grammatical needs of languages using CLDR's plural rules. This might not be noticeable in the source language (particularly English), but can cause problems in target locales that the original developer is not considering.

For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:

.match {$var :integer}
1   {{You have one last chance}}
one {{You have {$var} chance remaining}}
*   {{You have {$var} chances remaining}}

The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.

Percent Style

When implementing style=percent, the numeric value of the operand MUST be multiplied by 100 for the purposes of formatting.

For example,

The total was {0.5 :number style=percent}.

should format in a manner similar to:

The total was 50%.

Selection

The function :integer performs selection as described in Number Selection below.

Number Operands

The operand of a number function is either an implementation-defined type or a literal whose contents match the number-literal production in the ABNF. All other values produce a Bad Operand error.

For example, in Java, any subclass of java.lang.Number plus the primitive types (byte, short, int, long, float, double, etc.) might be considered as the "implementation-defined numeric types". Implementations in other programming languages would define different types or classes according to their local needs.

[!NOTE] String values passed as variables in the formatting context's input mapping can be formatted as numeric values as long as their contents match the number-literal production in the ABNF.

For example, if the value of the variable num were the string -1234.567, it would behave identically to the local variable in this example:

.local $example = {|-1234.567| :number}
{{{$num :number} == {$example}}}
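
As a rough Python sketch of this operand check: the regular expression below is an assumed transcription of the number-literal production from message.abnf (it is not quoted in this section), and the accepted Python types stand in for the "implementation-defined numeric types". The names `is_number_literal` and `is_number_operand` are hypothetical.

```python
import re
from decimal import Decimal

# Assumed regular-expression form of the number-literal production:
# optional minus, integer part without leading zeros, optional fraction,
# optional exponent.
NUMBER_LITERAL = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?")


def is_number_literal(text: str) -> bool:
    """Return True if the whole string matches the number-literal pattern."""
    return NUMBER_LITERAL.fullmatch(text) is not None


def is_number_operand(value: object) -> bool:
    """Return True if value would be usable as a number operand in this sketch.

    A False result corresponds to emitting a Bad Operand error.
    """
    if isinstance(value, bool):   # bool is an int subclass in Python; exclude it
        return False
    if isinstance(value, (int, float, Decimal)):
        return True
    return isinstance(value, str) and is_number_literal(value)


print(is_number_operand("-1234.567"))  # True
print(is_number_operand("01"))         # False
```

Note that the check is applied to the whole string, so inputs such as "+1", ".5", or "1." are rejected even though many host-language number parsers accept them.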

[!NOTE] Implementations are encouraged to provide support for compound types or data structures that provide additional semantic meaning to the formatting of number-like values. For example, in ICU4J, the type com.ibm.icu.util.Measure can be used to communicate a value that includes a unit or the type com.ibm.icu.util.CurrencyAmount can be used to set the currency and related options (such as the number of fraction digits).

Digit Size Options

Some options of number functions are defined to take a "digit size option". Implementations of number functions use these options to control aspects of numeric display such as the number of fraction, integer, or significant digits.

A "digit size option" is an option value that the function interprets as a small integer value greater than or equal to zero. Implementations MAY define an upper limit on the resolved value of a digit size option consistent with that implementation's practical limits.

In most cases, the value of a digit size option will be a string that encodes the value as a decimal integer. Implementations MAY also accept implementation-defined types as the value. When provided as a string, the representation of a digit size option matches the following ABNF:

digit-size-option = "0" / (%x31-39 [DIGIT])
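
A minimal Python sketch of parsing such a value, assuming a hypothetical helper `parse_digit_size_option`. Accepting a Python `int` directly is an assumption that stands in for "implementation-defined types"; `ValueError` stands in for whatever error the implementation would report.

```python
import re

# Mirrors the ABNF above: "0", or a digit 1-9 optionally followed by
# exactly one more digit, so string values range from 0 to 99.
DIGIT_SIZE_OPTION = re.compile(r"0|[1-9][0-9]?")


def parse_digit_size_option(value: object) -> int:
    """Return the option value as a non-negative int, or raise ValueError."""
    if isinstance(value, int) and not isinstance(value, bool):
        if value < 0:
            raise ValueError(f"digit size option must be >= 0: {value!r}")
        return value
    if isinstance(value, str) and DIGIT_SIZE_OPTION.fullmatch(value):
        return int(value)
    raise ValueError(f"invalid digit size option: {value!r}")


print(parse_digit_size_option("2"))   # 2
print(parse_digit_size_option("42"))  # 42
```

Strings such as "007" or "100" do not match the production and are rejected in this sketch.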

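Number Selection

Number selection has three modes, chosen by the value of the option select: exact, plural (the default), and ordinal. Each mode is described under Rule Selection below.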

When implementing MatchSelectorKeys(resolvedSelector, keys) where resolvedSelector is the resolved value of a selector expression and keys is a list of strings, numeric selectors perform as described below.

  1. Let exact be the JSON string representation of the numeric value of resolvedSelector. (See Determining Exact Literal Match for details.)
  2. Let keyword be a string which is the result of rule selection on resolvedSelector.
  3. Let resultExact be a new empty list of strings.
  4. Let resultKeyword be a new empty list of strings.
  5. For each string key in keys:
    1. If the value of key matches the production number-literal, then
      1. If key and exact consist of the same sequence of Unicode code points, then
        1. Append key as the last element of the list resultExact.
    2. Else if key is one of the keywords zero, one, two, few, many, or other, then
      1. If key and keyword consist of the same sequence of Unicode code points, then
        1. Append key as the last element of the list resultKeyword.
    3. Else, emit a Bad Variant Key error.
  6. Return a new list whose elements are the concatenation of the elements (in order) of resultExact followed by the elements (in order) of resultKeyword.

[!NOTE] Implementations are not required to implement this exactly as written. However, the observed behavior must be consistent with what is described here.
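
The steps above can be sketched in Python roughly as follows. The function takes the already computed `exact` and `keyword` strings (steps 1 and 2) as parameters. The regular expression is an assumed transcription of the number-literal production, and returning the offending keys as a second list is just one hypothetical way to model "emit a Bad Variant Key error".

```python
import re

# Assumed regular-expression form of the number-literal production.
NUMBER_LITERAL = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?")
PLURAL_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}


def match_selector_keys(
    exact: str, keyword: str, keys: list[str]
) -> tuple[list[str], list[str]]:
    """Return (matching keys in preference order, keys that are bad variant keys)."""
    result_exact: list[str] = []
    result_keyword: list[str] = []
    bad_keys: list[str] = []
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            # Python str equality compares sequences of code points.
            if key == exact:
                result_exact.append(key)
        elif key in PLURAL_KEYWORDS:
            if key == keyword:
                result_keyword.append(key)
        else:
            bad_keys.append(key)
    # Exact matches are always preferred over keyword matches.
    return result_exact + result_keyword, bad_keys


print(match_selector_keys("1", "one", ["one", "1", "other"]))
# (['1', 'one'], [])
```

Passing the empty string as `keyword` disables keyword matching, which is how the exact mode behaves in this sketch.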

Rule Selection

If the option select is set to exact, rule-based selection is not used. Return the empty string.

[!NOTE] Since valid keys cannot be the empty string in a numeric expression, returning the empty string disables keyword selection.

If the option select is set to plural, selection should be based on CLDR plural rule data of type cardinal. See charts for examples.

If the option select is set to ordinal, selection should be based on CLDR plural rule data of type ordinal. See charts for examples.

Apply the rules defined by CLDR to the resolved value of the operand and the function options, and return the resulting keyword. If no rules match, return other.

Example. In CLDR 44, the Czech (cs) plural rule set is published in the CLDR Language Plural Rules chart.

A message in Czech might be:

.match {$numDays :number}
one  {{{$numDays} den}}
few  {{{$numDays} dny}}
many {{{$numDays} dne}}
*    {{{$numDays} dní}}

Using the rules found above, the results of various operand values might look like:

Operand value   Keyword   Formatted Message
1               one       1 den
2               few       2 dny
5               other     5 dní
22              other     22 dní
27              other     27 dní
2.4             many      2,4 dne
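
A small Python sketch of cardinal rule selection for this message. The rules are hard-coded here as an assumption about the Czech data in CLDR (one: integer 1; few: integers 2 through 4; many: any value with visible fraction digits; other: everything else). A real implementation would read them from CLDR rather than embed them, and the function names are hypothetical.

```python
def czech_cardinal_keyword(operand: str) -> str:
    """Return the plural keyword for a plain decimal string such as "2" or "2.4"."""
    integer_part, _, fraction_part = operand.partition(".")
    i = abs(int(integer_part))   # CLDR operand i: integer digits of the value
    v = len(fraction_part)       # CLDR operand v: count of visible fraction digits
    if i == 1 and v == 0:
        return "one"
    if 2 <= i <= 4 and v == 0:
        return "few"
    if v != 0:
        return "many"
    return "other"


# Variants of the Czech message above, keyed by plural keyword ("*" maps to other).
VARIANTS = {"one": "{} den", "few": "{} dny", "many": "{} dne", "other": "{} dní"}


def format_days(operand: str) -> str:
    """Pick the variant and insert the number, using a decimal comma as in Czech."""
    pattern = VARIANTS[czech_cardinal_keyword(operand)]
    return pattern.format(operand.replace(".", ","))


for value in ["1", "2", "5", "27", "2.4"]:
    print(value, czech_cardinal_keyword(value), format_days(value))
# 1 one 1 den
# 2 few 2 dny
# 5 other 5 dní
# 27 other 27 dní
# 2.4 many 2,4 dne
```

The operand is handled as a string so that visible fraction digits survive: a value written as 2.0 has v = 1 and selects many, while 2 selects few.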

Determining Exact Literal Match

[!IMPORTANT] The exact behavior of exact literal match is only defined for non-zero-filled integer values. Annotations that use fraction digits or significant digits might work in specific implementation-defined ways. Users should avoid depending on these types of keys in message selection.

Number literals in the MessageFormat 2 syntax use the format defined for a JSON number. A resolvedSelector exactly matches a numeric literal key if, when the numeric value of resolvedSelector is serialized using the format for a JSON number, the two strings are equal.

[!NOTE] Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits.
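
For integer values, the comparison can be illustrated with Python's `json` module, which writes an `int` as a JSON number without zero padding or fraction digits. The helper names are hypothetical, and the sketch deliberately covers only the integer case that this section defines.

```python
import json


def exact_key_for(value: int) -> str:
    """Serialize an integer the way it would be written as a JSON number."""
    return json.dumps(value)


def is_exact_match(value: int, key: str) -> bool:
    """Compare the serialized value with a variant key, code point by code point."""
    return exact_key_for(value) == key


print(is_exact_match(1, "1"))    # True
print(is_exact_match(1, "1.0"))  # False
print(is_exact_match(1, "01"))   # False
```

Because the comparison is on strings, keys such as 1.0 or 1e0 do not match the integer 1 here even though they denote the same number.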

Date and Time Value Formatting

This subsection describes the functions and options for date/time formatting. Selection based on date and time values is not required in this release.

[!NOTE] Selection based on date/time types is not required by MF2. Implementations should use care when defining selectors based on date/time types. The types of queries found in implementations such as java.time.TemporalAccessor are complex and user expectations may be inconsistent with good I18N practices.

The :datetime function

The function :datetime is used to format date/time values, including the ability to compose user-specified combinations of fields.

If no options are specified, this function defaults to the following:

[!NOTE] The default formatting behavior of :datetime is inconsistent with Intl.DateTimeFormat in JavaScript and with {d,date} in ICU MessageFormat 1.0. This is because, unlike those implementations, :datetime is distinct from :date and :time.

Operands

The operand of the :datetime function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce a Bad Operand error.

Options

The :datetime function can use either the appropriate style options or a collection of field options (but not both) to control the formatted output.

If both are specified, a Bad Option error MUST be emitted and a fallback value used as the resolved value of the expression.
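
A hedged Python sketch of this check. The option names in the two sets are assumptions modeled on Intl.DateTimeFormat, since the option lists are not reproduced in this section. `BadOptionError` and `check_datetime_options` are hypothetical names, and raising an exception stands in for emitting the error and substituting a fallback value.

```python
# Assumed option names, modeled on Intl.DateTimeFormat.
STYLE_OPTIONS = {"dateStyle", "timeStyle"}
FIELD_OPTIONS = {
    "weekday", "era", "year", "month", "day",
    "hour", "minute", "second", "fractionalSecondDigits", "timeZoneName",
}


class BadOptionError(Exception):
    """Hypothetical stand-in for the Bad Option error."""


def check_datetime_options(options: dict[str, str]) -> None:
    """Raise BadOptionError if style options and field options are mixed."""
    used_style = STYLE_OPTIONS.intersection(options)
    used_field = FIELD_OPTIONS.intersection(options)
    if used_style and used_field:
        raise BadOptionError(
            f"cannot combine style options {sorted(used_style)} "
            f"with field options {sorted(used_field)}"
        )


check_datetime_options({"dateStyle": "long", "timeStyle": "short"})  # accepted
check_datetime_options({"year": "numeric", "month": "long"})         # accepted
```

In a full implementation the caller would catch the error, report it, and use a fallback value as the resolved value of the expression.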

[!NOTE] The names of options and their values were derived from the options in JavaScript's Intl.DateTimeFormat.

Style Options

The function :datetime has these style options.

Field Options

Field options describe which fields to include in the formatted output and what format to use for that field. The implementation may use this annotation to configure which fields appear in the formatted output.

[!NOTE] Field options do not have default values because they are only to be used to compose the formatter.

The field options are defined as follows:

[!IMPORTANT] The value 2-digit for some field options must be quoted in the MessageFormat syntax because it starts with a digit but does not match the number-literal production in the ABNF.

.local $correct = {$someDate :datetime year=|2-digit|}
.local $syntaxError = {$someDate :datetime year=2-digit}

The function :datetime has the following options:

[!NOTE] The following options do not have default values because they are only to be used as overrides for locale-and-value dependent implementation-defined defaults.

The following date/time options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

The :date function

The function :date is used to format the date portion of date/time values.

If no options are specified, this function defaults to the following:

Operands

The operand of the :date function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce a Bad Operand error.

Options

The function :date has these options:

The :time function

The function :time is used to format the time portion of date/time values.

If no options are specified, this function defaults to the following:

Operands

The operand of the :time function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce a Bad Operand error.

Options

The function :time has these options:

Date and Time Operands

The operand of a date/time function is either an implementation-defined date/time type or a date/time literal value, as defined below. All other operand values produce a Bad Operand error.

A date/time literal value is a non-empty string consisting of an ISO 8601 date, or an ISO 8601 datetime optionally followed by a timezone offset. As implementations differ slightly in their parsing of such strings, ISO 8601 date and datetime values not matching the following regular expression MAY also be supported. Furthermore, matching this regular expression does not guarantee validity, given the variable number of days in each month.

(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?

When the time is not present, implementations SHOULD use 00:00:00 as the time. When the offset is not present, implementations SHOULD use a floating time type (such as Java's java.time.LocalDateTime) to represent the time value. For more information, see Working with Timezones.
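
The following Python sketch shows one way these rules might be applied. It uses the regular expression above verbatim, defaults a missing time to 00:00:00, and returns a naive `datetime` (Python's closest analogue to a floating time) when no offset is present. The function name is hypothetical, and `ValueError` stands in for a Bad Operand error.

```python
import re
from datetime import datetime, timedelta, timezone

# The regular expression from the text above, copied verbatim.
DATE_TIME_LITERAL = re.compile(
    r"(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?"
)


def parse_date_time_literal(text: str) -> datetime:
    """Parse a date/time literal; raise ValueError if it is not usable."""
    m = DATE_TIME_LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a date/time literal: {text!r}")
    year, month, day = int(text[0:4]), int(text[5:7]), int(text[8:10])
    hour = minute = second = micro = 0   # default time is 00:00:00
    tz = None                            # no offset: keep the value floating
    if m.group(3) is not None:           # group 3 is the whole "T..." part
        hour, minute, second = int(text[11:13]), int(text[14:16]), int(text[17:19])
        fraction = m.group(5)            # like ".5" or ".123", or None
        if fraction:
            micro = int(fraction[1:].ljust(6, "0"))
        offset = m.group(6)              # "Z", "+05:30", "-08:00", or None
        if offset == "Z":
            tz = timezone.utc
        elif offset:
            sign = -1 if offset[0] == "-" else 1
            delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
            tz = timezone(sign * delta)
    # datetime() itself rejects impossible dates such as 2024-02-30.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


print(parse_date_time_literal("2024-02-06T16:40:00Z"))  # 2024-02-06 16:40:00+00:00
print(parse_date_time_literal("2024-02-06"))            # 2024-02-06 00:00:00
```

The final constructor call covers the case the text warns about: a string can match the regular expression and still name a day that does not exist in that month.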

[!IMPORTANT] The ABNF and syntax of MF2 do not formally define date/time literals. This means that a message can be syntactically valid but produce a Bad Operand error at runtime.

[!NOTE] String values passed as variables in the formatting context's input mapping can be formatted as date/time values as long as their contents are date/time literals.

For example, if the value of the variable now were the string 2024-02-06T16:40:00Z, it would behave identically to the local variable in this example:

.local $example = {|2024-02-06T16:40:00Z| :datetime}
{{{$now :datetime} == {$example}}}

[!NOTE] True time zone support in serializations is expected to coincide with the adoption of Temporal in JavaScript. The form of these serializations is known and is a de facto standard. Support for these extensions is expected to be required after the Tech Preview. See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/