Using MF2 with C++

A Technical Preview implementation of MF2 is available in ICU, the International Components for Unicode library. Beginning with version 75.0, ICU includes MF2 implementations in C++ and Java.

For comprehensive documentation on the C++ implementation, see the ICU API guide. The ICU User Guide contains general information on using ICU, but has not yet been updated to include MF2 documentation.

Quick start

The following instructions assume you are using the clang C/C++ compiler on a Linux system.

The Tech Preview implementation of MF2 in ICU4C continues to evolve, so to ensure consistency with this documentation, it's best to build ICU from source. Building ICU takes some time, but only needs to be done once (unless further changes occur to the API).

git clone https://github.com/unicode-org/icu.git
cd icu
mkdir build && cd build   # out-of-source build directory, so that ../icu4c resolves
../icu4c/source/runConfigureICU Linux/clang  --disable-renaming \
    --prefix=$HOME/icu76
make          # or make -j8 to build in parallel on 8 cores
make install

ICU is now installed in the icu76 subdirectory of your home directory. Avoid installing globally, since it may override the version of ICU that comes with your distribution, and break things that depend on ICU.

If you don't want to build ICU from source, ICU 76 should be fairly close to the version of the MessageFormat syntax and APIs described on this site, but some differences may exist, such as the syntax of .match (whether it selects on expressions or variables). As of this writing, ICU 76 is a release candidate that can be downloaded from GitHub. In the future, it might be sufficient to upgrade ICU using your system package manager.

A simple client program

Next, let's write a simple program that uses the MF2 API:

#define U_DISABLE_RENAMING 1

#include <map>

#include <unicode/messageformat2.h>

using namespace icu;
using namespace message2;

bool testMessageFormat() {
    UErrorCode errorCode = U_ZERO_ERROR;
    UParseError parseError;

    MessageFormatter::Builder builder(errorCode);
    UnicodeString pattern = "Hello, {$userName}!";
    MessageFormatter mf =
        builder.setPattern(pattern, parseError, errorCode)
               .build(errorCode);

    std::map<UnicodeString, message2::Formattable> argMap;
    argMap["userName"] = message2::Formattable("John");
    MessageArguments args(argMap, errorCode);

    UnicodeString result = mf.formatToString(args, errorCode);
    return (result == "Hello, John!");
}

int main() {
    return testMessageFormat() ? 0 : 1;
}

The line #define U_DISABLE_RENAMING 1 is important if you built ICU from source with the --disable-renaming option, as in the instructions above. Without it, you may see linker errors, because the library defines symbols in the icu namespace while your code refers to symbols in the icu_76 namespace.

Before we look at how to compile and run this program, let's go through what it does.

Error handling

The line beginning with UErrorCode errorCode sets up an error code that will be passed by reference to all the MessageFormat functions. ICU4C does not use exceptions, so this mechanism is used for signaling errors. By default, the MF2 API provides fallback output for certain types of errors, while other types of errors (syntax errors and data model errors) cause a hard failure. The message embedded in this program doesn't contain any errors, so we won't go into detail here. The error code is also used to signal errors that could happen at lower layers of ICU, such as a memory allocation failure.

The next line, beginning with UParseError, sets up a data structure that is used by the MF2 parser in case the message contains any parse errors. In that case, the structure will be set to contain a line and column number. There's no need to worry about that yet, since this message has no syntax errors.

For more details, see the ICU FAQ on error handling.

MessageFormatter::Builder

In the next line, we start invoking the MF2 API. The MessageFormatter class is immutable, so like other immutable classes in ICU, it's constructed using the builder pattern. builder is now bound to a mutable MessageFormatter::Builder object.

Next, we declare a C++ variable holding a message, using ICU's UnicodeString type:

UnicodeString pattern = "Hello, {$userName}!";

The next line creates a MessageFormatter object by first calling setPattern() on the builder, and then build(). The builder methods can all be chained. We now have a MessageFormatter that has our desired message built into it.

For more details, see the API documentation on MessageFormatter and MessageFormatter::Builder.

Arguments and Formattable

Since this message has a variable, $userName, that is not declared as a .local, we need to provide it as an external argument. This is done using the MessageArguments class. The line beginning with std::map creates a C++ map object where the keys are UnicodeStrings and the values are Formattable objects. The next line sets the userName key in the map to "John".

It's a little more complicated than that, because the message2::Formattable class has to be used to represent the values of arguments to the message. This is basically a wrapper class that allows arguments to have different types. Usually, calling the Formattable constructor as in the example just works.

The class Formattable is referred to explicitly as message2::Formattable because ICU also has a separate, distinct icu::Formattable class.

The next line constructs a MessageArguments object from the std::map we just created. This step is needed because formatToString() takes a MessageArguments object rather than a std::map.

For more details, see the API documentation on MessageArguments and message2::Formattable.

Formatting to string

Finally, we call the MessageFormatter's formatToString() method on the arguments. Note that the message (pattern) is fixed for each MessageFormatter object, but formatToString() can be called repeatedly with different MessageArguments objects, so the same formatter can be applied to different arguments.

The last line returns a boolean indicating whether the output of the message formatter was the expected value, specifically "Hello, John!". The main() function in the program sets the shell exit code based on the return value of our test function.

Building and running the program

To compile this program, save it to a file named testmessageformat.cpp and use the following command:

clang++  -I$HOME/icu76/include -L$HOME/icu76/lib \
  -o testmessageformat testmessageformat.cpp \
  -licui18n -licuuc -licudata -licuio

For more complicated experiments, you'll want to create a simple Makefile or use your preferred build system; the ICU user guide has some information on integrating ICU with different build systems.

Before running the program, set your LD_LIBRARY_PATH so that the loader knows where to search for the shared ICU library:

export LD_LIBRARY_PATH=$HOME/icu76/lib

And then run the program with:

./testmessageformat

This program has no output, but you can check the exit code to see that the string comparison succeeded and the output of the formatter was correct:

$ echo $?
0