Xtext vs. MPS: Decision Criteria

tl;dr If we started a new domain-specific language tomorrow, we could choose between different language workbenches or, more generally, between textual and structural / projectional systems. We should decide case by case, guided by these criteria: targeted user group, tool environment, language properties, input type, environment, model-to-model and model-to-text transformations, extensibility, theory, scalability, model evolution, language test support, and longevity.

This post is based on a presentation and discussion we had at the Strumenta Community. You can download the slides, although reading on might be a bit clearer on the details. Special thanks to Eelco Visser for his contributions regarding language workbenches besides Xtext and MPS.

Introduction

This whole post tries to answer one question:

Tomorrow I want to start a new domain-specific language.
Which criteria shall I think about to decide on a language workbench?

The most important, and most useless answer to this question is: “It depends.” Every language workbench has its own strengths and weaknesses, and we should assess them anew for each language or project. All criteria mentioned below are worth consideration, and should be weighed against the needs of the language or project at hand.

Almost every aspect described below can be realized in any language workbench — if we really wanted to torture ourselves, we could write an ASCII-art text DSL to “draw” diagrams, or force a really complex piece of procedural logic into lines and boxes. On the other hand, an existing text-based processing chain integrates rather well with a textual DSL, and tables work nicely in a structured environment.

I personally only know Xtext and MPS well enough to offer an educated opinion. Thankfully, during the presentation several others chimed in to offer additional insights. Thus, we can extend this post’s content (to some degree) to “Textual vs. Structural: Decision Criteria”.

What do we mean by textual and structural language workbenches?

As a loose distinction, we’re using the rule of thumb “If you directly edit what’s written on disk, it’s textual.”

Structural describes both projectional and graphical systems. In projectional systems, the user has no influence on how things are shown; with graphical systems, the user may have some influence: think of manually laying out a diagram (thanks to Jos Warmer for this clarification).

Examples of textual systems include

ANTLR
MontiCore
Racket
Rascal
Spoofax
Xtext

Examples of structural systems are

MetaEdit+
MPS
Sirius

Targeted User Group

If our DSL targeted developers, we might go for a textual system. Developers are used to the powerful tools provided by a good editor or an IDE, and expect this kind of support for handling their “source code” — or, in this case, model. Textual systems might integrate better with their other tools.

If we targeted business users, they might prefer a structural system. The main competitor in this field is Excel with hand-crafted validation rules and obscure VBA scripts attached. Typically, business users profit more from projectional features like mixing text, tables and diagrams.

Tool Environment

If our client had an existing infrastructure to deploy Eclipse-based tooling, we would probably want to leverage that. This implies using an Eclipse-based language workbench like Rascal, Sirius or Xtext. If we wanted model integration with existing tools, EMF would be our best bet, pointing towards Eclipse.

If our client already leaned towards IntelliJ or similar systems, MPS would be more familiar to them. Spoofax supports both Eclipse and IntelliJ.

Language Properties

If (parts of) our DSL had an established text-based language, we would want to reuse our users’ existing knowledge and provide a similar textual language. Textual syntax often provides aids to parsers that are difficult to reproduce fluently in structural systems.

As an example, think of a C-style if-statement. In text, the user types i, f, maybe a space, and ( without even thinking about it. In a projectional editor, she still types i and f, but the parenthesis is probably automatically added by the projection.

// | denotes cursor position
if (|«condition») {
  «statements»
}

If she typed (, we would have two bad choices: either we add the parenthesis inside the condition, which is probably not what the user wanted in 95 % of the cases; or we ignore the parenthesis, making the other 5 % really hard to enter.

One important language property is whether we can parse it with reasonable effort and accuracy. For more traditional systems like ANTLR and Xtext, we reach the threshold of unparsable input rather quickly. More advanced systems like Spoofax and Rascal can handle ambiguities well. However, as an extreme example, I doubt we could ever have a parser that reconstructs the semantics of an ASCII-art UML diagram. More realistically, it might be pretty hard for a parser to distinguish mixed free text with unmarked references — think of a free text with some syntactically unmarked references to a user-defined ontology sprinkled in the text: This is free text, with Ornithopters or other Dune references.
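
To make the problem concrete, here is a minimal Python sketch of the naive approach: match every word of the free text against the ontology. The ontology contents and the function name find_candidate_references are made up for illustration; a real system would need far more context than this.

```python
import re

# Hypothetical user-defined ontology; the terms are illustrative only.
ONTOLOGY = {"Ornithopter", "Dune", "Spice"}


def find_candidate_references(text, ontology):
    """Return (term, word) pairs for every word that might refer to a term.

    A word is a candidate if it equals a term or the term plus a plural "s".
    Nothing in the text tells us whether a reference was intended at all.
    """
    candidates = []
    for word in re.findall(r"[A-Za-z]+", text):
        for term in sorted(ontology):
            if word == term or word == term + "s":
                candidates.append((term, word))
    return candidates


text = "This is free text, with Ornithopters or other Dune references."
print(find_candidate_references(text, ONTOLOGY))
```

For the sentence above, this reports Ornithopter and Dune as candidates. It reports Spice for “Spice up the report” just as happily, and nothing in the text tells us whether the author meant a reference or an ordinary word.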

Other structures might be parsable, but are very cumbersome to enter — I have yet to see a textual language where writing tables is less than annoying.

Related to parseability is language integration. Almost all technical languages use traditional parser systems, leading to the joy of escaping: <span onclick="if(myVar.substr(\"\\'\") < 5) myTag.style = \'.header > ul { font-weight: bold; } \'">. More modern languages aren’t that pedantic, but try to write the previous sentence in markdown …
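
The layering behind such a line can be sketched in a few lines of Python. This is an illustration only: json.dumps stands in for a JavaScript string-literal encoder (an assumption that holds for simple strings like this one), and html.escape handles the HTML attribute layer.

```python
import html
import json

css = ".header > ul { font-weight: bold; }"

# Layer 1: the CSS text becomes a quoted string literal inside JavaScript.
js = "myTag.style = " + json.dumps(css) + ";"

# Layer 2: the JavaScript becomes an HTML attribute value, so quotes and
# angle brackets have to be replaced by character references.
attr = html.escape(js, quote=True)

tag = '<span onclick="' + attr + '">'
print(tag)
```

Each embedded language adds its own escaping pass, and every pass has to be undone in reverse order by the consumer.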

If we wanted to integrate non-textual content or languages in a textual system, it gets tricky pretty soon. In fact, we would have to solve a lot of the problems projectional editors face. As an example, think of the parameter info many IDEs can project in the source code: The Java file contains myObj.myFunc("Niko", false), but the IDE displays myObj.myFunc(name: "Niko", authorized: false). If the cursor was just to the right of the opening parenthesis, and we pressed the right arrow key, would we move to the left or right of the double quotes? What if the user could interact with the projected part, e.g. a color selector? These examples are projected mix-ins, but it doesn’t get better at all if we imagined the file contents <img src="data:image/png;base64,iVBORw …"/>, and wanted to display an inline pixel editor. The aforementioned table embedded into some text is another example.

Structural systems really shine if we want different editors for the same content, or different viewpoints on the content. To illustrate different editors for the same content, think of a state machine. If we wanted to discuss it with our colleagues, it should be presented in the well-known lines-and-boxes form. We would still want to retarget a transition or add a state graphically. However, if we had to write it from scratch and had a good structure in mind, or just wanted to refactor an existing one, a text-like representation would be much more efficient.

Different viewpoints can be as simple as “more or less detail”: in a component model, we might want to see only the connections between components, or also their internal wiring. Textual editors can also hide parts of the content: most IDEs, by default, fold the legal header comment in a source code file.

As an example of different viewpoints, imagine a complex model of a machine that integrates mechanical, electrical, and cost aspects. All of these are interconnected, so the integrated model is very valuable. Hardly anybody would like to see all the details. However, different users would be interested in different combinations: the safety engineer needs to know about currents and moving parts, and the production planner wants to look at costs and parts that are hard to get. In a textual system, we could create reports with such contents, but would have to accept serious limitations if we wanted all the viewpoints to be editable (e.g. a complex distribution to different files + projection into a different file).

Input Type

A blank slate can be unsuitable for some types of users and input. If we wanted the user to provide very specific data, we would offer them a form or a wizard. These are very simple structured systems. A state machine DSL provides the user with much more flexibility, but enforces some structure: we can’t point a transition to another transition, only to a state. In a structured implementation of this DSL, the user would just not be able to create such an invalid transition; a textual DSL would allow writing it, but mark it as erroneous.

If our users were developers, they would be used to starting with an empty window, entering the right syntax, and handling error messages. If we targeted people mostly dealing with forms, they might be scared by the empty window, or would not know how to fix the error reported by the system. (“Scared” might sound funny, but there’s quite some anecdotal evidence.)

In a structural system, developers might be really annoyed that they have 15 very similar states with only one transition each, but still have to write them as separate multi-line blocks; they would feel limited by the rigid structure. For the other group, we could project explanatory texts, and visually separate scaffolding from places where they should enter something; they would feel guided by the pre-existing structure.
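
The textual side of the state machine example can be sketched in Python: the parser accepts a transition that points to another transition, and only a separate validation step reports it. The tiny syntax (state X, transition name: A -> B) and the function names are invented for this sketch and are not taken from any workbench.

```python
import re

STATE = re.compile(r"state (\w+)$")
TRANSITION = re.compile(r"transition (\w+): (\w+) -> (\w+)$")


def parse(source):
    """Parse the hypothetical DSL into state names and named transitions."""
    states, transitions = set(), {}
    for line in source.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if m := STATE.match(line):
            states.add(m.group(1))
        elif m := TRANSITION.match(line):
            transitions[m.group(1)] = (m.group(2), m.group(3))
        else:
            raise SyntaxError(f"cannot parse: {line!r}")
    return states, transitions


def validate(states, transitions):
    """Return error messages; syntactically valid text may still be invalid."""
    errors = []
    for name, (source, target) in transitions.items():
        for end in (source, target):
            if end in transitions:
                errors.append(f"{name}: '{end}' is a transition, not a state")
            elif end not in states:
                errors.append(f"{name}: unknown state '{end}'")
    return errors


model = """
state Idle
state Running
transition start: Idle -> Running
transition broken: Idle -> start
"""
print(validate(*parse(model)))
```

A structural editor would never let the user construct the second transition in the first place; here, the user can type it and has to understand the error message afterwards.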

To some degree we can adjust our language design to the appropriate level of flexibility. If we implemented an OO-class-like system, we could either allow class content in arbitrary order, or (by grammar / language definition) enforce a fixed order: first constructors, then attributes, then public methods, and private methods only at the end.

Environment

Textual systems have been around for a long time, so we know how to integrate them with other systems. Any workflow system can move text files around, and every versioning system can store, merge and diff such files. We understand perfectly how to handle them as build artifacts, and can inspect them on any system with a simple text editor. The Language Server Protocol provides an established technology to use textual languages in a web context.

Any such integration is more complicated with structural systems. They might store their contents in XML or binary, so we require specific support for version control. As of now (March 2021), I’m not aware of a production-quality structural language workbench based on web technology. I hope this will change within the next year.

On the other hand, if our project does not require tight external integration and targets a desktop environment, a system like MPS provides lots of well-integrated tooling out of the box.

Transformations: Model-to-Model

The main distinction for this criterion is between EMF-enabled systems, and others. Our chances to leverage existing transformation technologies, or re-use existing transformations, would be pretty good in an EMF ecosystem. EMF provides a very powerful common platform, and a plethora of tooling (both industrial and academic) is available.

Two very strong suits of MPS are intermediate languages, and extensible transformations. EMF provides frameworks to lock several model-to-model transformations into a chain, but it still requires quite some manual work and plumbing. In MPS, this approach is used extensively both by MPS itself, and most of the more complex custom languages I know of. The tool support is excellent; for example, it takes literally one click to inspect all intermediate models of a transformation chain.
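
The idea of a transformation chain with inspectable intermediate models can be sketched in plain Python. This is not MPS code and does not use any MPS API; the model shape (a list of dictionaries), the “unless” construct and all function names are assumptions made for the sketch.

```python
def desugar_unless(model):
    """Rewrite hypothetical 'unless' nodes into 'if' nodes with a negated condition."""
    result = []
    for node in model:
        if node["kind"] == "unless":
            node = {"kind": "if", "cond": ("not", node["cond"]), "body": node["body"]}
        result.append(node)
    return result


def flatten_conditions(model):
    """Render condition tuples as strings, one step closer to the target language."""
    def render(cond):
        return f"!({render(cond[1])})" if isinstance(cond, tuple) else cond
    return [{**node, "cond": render(node["cond"])} for node in model]


def run_chain(model, steps):
    """Apply each transformation in order and keep every intermediate model."""
    intermediates = [("input", model)]
    for step in steps:
        model = step(model)
        intermediates.append((step.__name__, model))
    return model, intermediates


source_model = [{"kind": "unless", "cond": "ready", "body": "wait()"}]
final, intermediates = run_chain(source_model, [desugar_unless, flatten_conditions])
for name, snapshot in intermediates:
    print(name, snapshot)
```

Each step only has to understand the language one level above its output, and keeping the snapshots is what makes the chain debuggable. The plumbing in run_chain is the part that a workbench with built-in support takes off our hands.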

Every model-to-model transformation in MPS can be extended by other transformations. It depends on language and transformation design how feasible a specific extension is in practice, but it is used a lot in real-world systems.

Transformations: Model-to-Text

Tightly controlling the output of a model-to-text transformation tends to be easier in textual systems. On the one hand, it’s doable to maintain the formatting (i.e. white space, indentation, newlines) of some part of the input. On the other hand, the system is usually designed to output arbitrary text, so we can tweak it as required. Xtend integrates very nicely with Xtext (or any other EMF-based system), and provides superior support for model-to-text transformation: It natively supports polymorphic dispatch, and allows indenting generation templates by both the template and the output structure, with a clear way to tell them apart.
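
To show what polymorphic dispatch buys us in a generator, here is a small Python analogy using functools.singledispatch. It is not Xtend, and it does not reproduce Xtend’s template indentation handling; the Entity and Attribute model classes and the Java-like output are invented for the sketch.

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class Attribute:
    name: str
    type: str


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@singledispatch
def generate(node, indent=0):
    """Fallback for model elements without a registered generator."""
    raise TypeError(f"no generator for {type(node).__name__}")


@generate.register
def _(node: Attribute, indent=0):
    return " " * indent + f"private {node.type} {node.name};"


@generate.register
def _(node: Entity, indent=0):
    # The output indentation is passed down explicitly to the children.
    lines = [" " * indent + f"class {node.name} {{"]
    lines += [generate(attribute, indent + 4) for attribute in node.attributes]
    lines.append(" " * indent + "}")
    return "\n".join(lines)


person = Entity("Person", [Attribute("name", "String"), Attribute("age", "int")])
print(generate(person))
```

The generator picks the right function from the runtime type of the model element, so adding a new element type means adding one function instead of extending a chain of type checks.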

If we didn’t need, or even wanted to prevent, customization of the output, structural systems could be helpful. The final text is structured by the transformation, or post-processed by a pretty printer.

For MPS, we need to consider whether the output format is available as a language. In this case, we use a chain of model-to-model transformations and have the final model take care of the text output, which usually is very close to the model. Java and XML languages ship with MPS; C, JSON, partial C++, partial C#, and others are available from the community.

Extensibility

Xtext assumes a closed world, whereas MPS assumes an open world. Thus, if we wanted to tightly control our DSL environment, this takes very little effort with Xtext. Using MPS in a controlled environment requires a lot of work.

On the other hand, if our DSL served as an open platform, MPS inherently offers any kind of extensibility we could wish for. With Xtext, we would have to explicitly design each required extension point.

Conceptual Framework / Theory

Parsers and related text-processing tools have been well researched since the 1970s, and the research continues to move forward. Computer science has built up a solid theoretical understanding of the problem and the available solutions. We can find several comparable, stable and usable implementations for any major approach.

Structural systems are a niche topic in computer science; Eelco provided some pointers. We don’t understand structural editors well enough to come up with sensible, objective ways to compare them. All usable implementations I know of are proprietary (although often Open Source).

Scalability

As parsers have been around for a long time, we understand pretty well how they can be tuned. They are widely used, so there’s a lot of experience available on how to design a language to be efficiently parsable. Xtext has been used in production with gigabyte-sized models. The same experience provides us with very performant editors. I’d expect a textual system to fail more gracefully as we close in on its limits: loading, purely displaying the content, syntax highlighting, folding, navigation, validation, and generation should scale differently, and the system should remain partially usable with the subset of aspects that still work. If a model became too big for our tooling, we could always fall back to plain text editors; they can edit files of any size. We also know how to generate from very big models: C++ compilers build up completely inlined files of several hundreds of megabytes; the aforementioned gigabyte-sized Xtext models are processed by generators.

Practical experience with MPS shows scalability issues in several aspects. The default serialization format stores one model with all its root nodes in one XML file, and performance degrades seriously for larger models. Using any of the other built-in serialization formats (XML per root node; binary) helps a lot. The editor is always rendered completely. Depending on the editor implementation, it might be re-rendered on every model change, or even on every cursor movement. I’m not aware of any comprehensive guide on how to tackle editor performance issues (in my experience, we should try to avoid the flow layout for bigger parts of the editor). The biggest performance issue with possibly any structural system is the missing fallback: Once we have a model too big for the system (e.g. by import), it’s very hard to do something about the model’s size, as we would need the system to actually edit the model. Thankfully, we can still edit the model programmatically in most cases. Both validation and generation performance in MPS depend heavily on the language implementation. The model-to-model transformation approach tends to use quite some memory; I’d assume model-to-model transformations (with free model navigation) to be harder to optimize for memory usage than model-to-text transformations.

Model Evolution

Xtext does not provide any specific support for model evolution. As a conceptual advantage of textual systems, we can migrate models with text processing tools. Search / replace or sed can be sufficient for smaller changes to model instances. As a drawback, we cannot store any meta-information in the model that stays out of sight (and out of reach) of the user. Thus, we have to put version information directly into our language content in some way.

MPS stores the used language version with every model instance. It detects if a newer version is available, and can run migration scripts on the instance.
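
Whichever system stores the version, the migration mechanism itself looks roughly like the following Python sketch. It is not MPS’s migration API; the dictionary-based model, the language_version key and the two migration steps are assumptions made for illustration.

```python
def rename_label_to_name(model):
    """Hypothetical migration from version 1 to 2: 'label' was renamed to 'name'."""
    model = dict(model)
    model["name"] = model.pop("label")
    return model


def add_default_priority(model):
    """Hypothetical migration from version 2 to 3: a new mandatory property."""
    return {**model, "priority": 0}


# Maps the version a model currently has to the step that lifts it by one.
MIGRATIONS = {1: rename_label_to_name, 2: add_default_priority}


def migrate(model, migrations, current_version):
    """Run migration steps until the model reaches current_version."""
    model = dict(model)
    while model["language_version"] < current_version:
        step = migrations[model["language_version"]]
        model = step(model)
        model["language_version"] += 1
    return model


old_instance = {"language_version": 1, "label": "Idle"}
print(migrate(old_instance, MIGRATIONS, current_version=3))
```

The important precondition is that every instance reliably carries its version. In a textual language that information has to live somewhere in the visible content, where users can also edit or delete it.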

Language Test

Most aspects of Xtext-based languages are implemented in Java (or another JVM language), enabling regular JUnit-based tests. Xtext ships with some utilities to simplify such tests, and to ease tests for parsing errors. Xpect, an auxiliary language to Xtext, allows embedding language-specific tests like validation, auto-complete and scoping in comments of example model instances. In practice, most transformation tests compare the generated output to some reference by text comparison.
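
Such a reference comparison is simple to build. The following sketch uses Python’s difflib instead of JUnit to keep it short; the function name diff_against_reference and the sample texts are made up for illustration.

```python
import difflib


def diff_against_reference(generated, reference):
    """Return a unified diff as a list of lines; an empty list means a match."""
    return list(difflib.unified_diff(
        reference.splitlines(),
        generated.splitlines(),
        fromfile="reference",
        tofile="generated",
        lineterm="",
    ))


reference = "class Person {\n    private String name;\n}"
generated = "class Person {\n    private int name;\n}"

for line in diff_against_reference(generated, reference):
    print(line)
```

A test would simply assert that the diff is empty and print it on failure. The weakness of this style is that any formatting change in the generator breaks every reference file, even if the output is semantically unchanged.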

Naturally, MPS does not support (or need) parsing tests. It provides specific tests for editors, generators, and other language aspects. The editor tests support checking interaction schemes like cursor movement, intentions, or auto-complete. Generator tests are hardly usable in practice, as they require the generated output model to be identical to a reference model, and don’t allow checking intermediate models. The tests for other language aspects use language extensibility to annotate regular models with checks for validation, scoping, type calculation etc. MPS provides technically separated language aspects, and specific DSLs, for e.g. scoping or validation. They are efficient, but make it hard to test contained logic with regular JUnit tests.

Longevity

We can safely assume we will always be able to open text files as long as we can read the storage media. Text could even be printed. It’s a bit less clear whether parsing technology in 50 years’ time will easily cope with the structures of today’s languages. Today’s (traditional, as described above) parsers would have a hard time parsing something like PL/1, where any keyword can be used as an identifier in an unambiguous context.

If we stored structured models in binary, it might be very hard to retrieve the contents if the system itself was lost. If we used an XML dialect, we could probably recover the basic structures (containment + type, reference + type, metatype, property) of the model.
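
As a rough idea of such a recovery, the following Python sketch walks an XML file and lists containment, properties and references. The XML dialect shown is invented for this example and is not the actual storage format of MPS or any other workbench.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dialect; element and attribute names are assumptions.
SAMPLE = """
<model>
  <node id="1" concept="StateMachine">
    <property name="name" value="Door"/>
    <node id="2" concept="State" role="states">
      <property name="name" value="Open"/>
    </node>
    <node id="3" concept="Transition" role="transitions">
      <reference role="target" to="2"/>
    </node>
  </node>
</model>
"""


def outline(element, depth=0):
    """List each node's concept, properties and references, indented by containment."""
    lines = []
    for node in element.findall("node"):
        props = {p.get("name"): p.get("value") for p in node.findall("property")}
        refs = {r.get("role"): r.get("to") for r in node.findall("reference")}
        lines.append("  " * depth + f"{node.get('concept')} {props} refs={refs}")
        lines.extend(outline(node, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(SAMPLE.strip()))))
```

Even without the original tooling, the containment tree, the property values and the reference targets survive. What is lost is everything the language definition knew: which concepts exist, what they mean, and which constraints applied.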

Let’s assume we lost the DSL system itself, and only know the model instances, or cannot modify the DSL system. (This scenario is not extremely unlikely — there are a lot of productive mainframe programs without available source code.) I don’t have a clear opinion whether it would be easier to filter out all the “noise” from a parsed text file to recover the underlying concepts, or to reassemble the basic structures from an XML file.

In the more probable case, our DSL system is outdated, but we can still run and modify it, e.g. in a virtual environment. Then we can write an exporter that uses the original retrieval logic (irrespective of parsing or structured model loading), and export the model contents to a suitable format.
