The minimum infrastructure for running languages and models
In my last post I wrote about what constitutes a language, and what you might want to call the data structures that don’t. In this one I want to discuss the minimum infrastructure necessary to “implement” one or more languages.
Here is why I think this is a relevant question. If you have decided that you want to solve some kind of problem with a language (metamodel + validation or type checks + custom syntax + some form of execution), then the next question is: how do I implement this? How do I actually build it?
Language Workbenches — or maybe not
Ideally, you have the opportunity to use a language workbench like MPS (or others, but that’s not the point here). They come with all the tooling necessary to implement your language. Just learn MPS, and you’re golden :)
But using MPS (or one of its language workbench brethren) has drawbacks, too. One is the “learning” part: some of these tools are very powerful and therefore not so easy to learn. They are also very opinionated about deployment. For example, MPS is a (relatively fat) Java application. So is Eclipse/EMF/Xtext. The others aren’t much “thinner”. These days you might want to run your language in the browser (or potentially in a mobile app). There is a growing number of web-native language workbenches (or at least first steps towards them), but (a) they are all still work in progress, and (b) they also come with quite a bit of infrastructure. Examples include Modelix, ProjectIt and a whole bunch of academic prototypes.
Finally, you might already have an existing software ecosystem into which you have to integrate the “language stuff”.
A Robust M3 Layer
So where do you start? The most important building block is the ability to process models uniformly. This means that you have to implement a bunch of classes to
- represent models in memory,
- persist them somehow using a metamodel-independent serialization format (not a syntax, see my last post),
- provide an API to read, traverse and modify models,
- and support a rudimentary but generic way of editing them.
All of this must be independent of your actual language, at least if you plan to develop multiple languages. In other words, you have to implement your own M3 layer, your own meta-metamodel.
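To make the first two items of the list above concrete (in-memory representation and persistence), here is a minimal TypeScript sketch. The `MNode` class and the JSON storage shape `StoredNode` are hypothetical, invented for illustration; they are not the formats used by MPS, EMF or Modelix. The point is that `store` and `load` only walk the generic structure (concept, plus properties, children and references keyed by role name), so the same code persists models of any language. On disk, references become target ids, and loading resolves them in a second pass once all nodes exist.

```typescript
// Hypothetical in-memory node of a generic M3 layer: no language knowledge,
// everything is keyed by role name, and references point directly at nodes.
class MNode {
  id: string;
  concept: string;
  parent: MNode | undefined = undefined;
  properties: { [role: string]: string } = {};
  children: { [role: string]: MNode[] } = {};
  references: { [role: string]: MNode } = {};

  constructor(id: string, concept: string) {
    this.id = id;
    this.concept = concept;
  }

  addChild(role: string, child: MNode): MNode {
    (this.children[role] = this.children[role] || []).push(child);
    child.parent = this;
    return child;
  }
}

// Hypothetical persisted form: plain JSON data, references stored as target ids.
interface StoredNode {
  id: string;
  concept: string;
  properties: { [role: string]: string };
  children: { [role: string]: StoredNode[] };
  references: { [role: string]: string };
}

// Works for models of any language: it only walks the generic structure.
function store(node: MNode): StoredNode {
  const children: { [role: string]: StoredNode[] } = {};
  for (const role of Object.keys(node.children)) {
    children[role] = node.children[role].map(store);
  }
  const references: { [role: string]: string } = {};
  for (const role of Object.keys(node.references)) {
    references[role] = node.references[role].id;
  }
  return { id: node.id, concept: node.concept, properties: { ...node.properties }, children, references };
}

// Loading in two passes: rebuild the containment tree and index nodes by id,
// then resolve references once every potential target exists.
function load(data: StoredNode): MNode {
  const byId: { [id: string]: MNode } = {};
  const pending: { node: MNode; role: string; targetId: string }[] = [];

  const build = (d: StoredNode): MNode => {
    const node = new MNode(d.id, d.concept);
    byId[d.id] = node;
    node.properties = { ...d.properties };
    for (const role of Object.keys(d.children)) {
      for (const c of d.children[role]) node.addChild(role, build(c));
    }
    for (const role of Object.keys(d.references)) {
      pending.push({ node, role, targetId: d.references[role] });
    }
    return node;
  };

  const root = build(data);
  for (const p of pending) {
    const target = byId[p.targetId];
    if (!target) throw new Error(`Unresolved reference ${p.role} -> ${p.targetId}`);
    p.node.references[p.role] = target;
  }
  return root;
}

// Usage: a tiny state machine, written to JSON text and read back.
const door = new MNode("1", "StateMachine");
door.properties["name"] = "Door";
const doorOpen = door.addChild("states", new MNode("2", "State"));
const doorClosed = door.addChild("states", new MNode("3", "State"));
doorOpen.addChild("transitions", new MNode("4", "Transition")).references["target"] = doorClosed;

const text = JSON.stringify(store(door));
const copy = load(JSON.parse(text) as StoredNode);
```

In `copy`, the transition's `target` again points at a node object, namely the loaded copy of the closed state rather than the original `doorClosed`.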
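Since this M3 layer does not know about any particular M2 (metamodel), model access will be reflective (as seen from the metamodel). Typical operations found in these APIs include querying a node's concept (its metaclass), reading and writing properties by name, navigating to the parent, to children and to reference targets by role name, and adding or removing children.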
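Here is a minimal TypeScript sketch of such a reflective API. The class `ReflectiveNode` and its method names are hypothetical, loosely in the spirit of the reflective APIs mentioned in this post but not copied from any of them. The generic traversal function `descendants` illustrates that language-independent algorithms need nothing beyond these operations.

```typescript
// Hypothetical reflective M3 API. The M3 layer knows no metamodel, so concepts,
// properties, child roles and reference roles are all passed as strings.
class ReflectiveNode {
  private readonly concept: string;
  private parentNode: ReflectiveNode | undefined = undefined;
  private readonly props: { [name: string]: string } = {};
  private readonly kids: { [role: string]: ReflectiveNode[] } = {};
  private readonly refs: { [role: string]: ReflectiveNode | undefined } = {};

  constructor(concept: string) {
    this.concept = concept;
  }

  getConcept(): string { return this.concept; }
  getParent(): ReflectiveNode | undefined { return this.parentNode; }

  getProperty(name: string): string | undefined { return this.props[name]; }
  setProperty(name: string, value: string): void { this.props[name] = value; }

  // Returns a copy so callers cannot bypass addChild/removeChild.
  getChildren(role: string): ReflectiveNode[] { return (this.kids[role] || []).slice(); }
  getAllChildren(): ReflectiveNode[] {
    let all: ReflectiveNode[] = [];
    for (const role of Object.keys(this.kids)) all = all.concat(this.kids[role]);
    return all;
  }

  addChild(role: string, child: ReflectiveNode): void {
    // Containment is a tree: moving a node detaches it from its old parent.
    if (child.parentNode) child.parentNode.removeChild(child);
    (this.kids[role] = this.kids[role] || []).push(child);
    child.parentNode = this;
  }

  removeChild(child: ReflectiveNode): void {
    for (const role of Object.keys(this.kids)) {
      const i = this.kids[role].indexOf(child);
      if (i >= 0) {
        this.kids[role].splice(i, 1);
        child.parentNode = undefined;
        return;
      }
    }
  }

  getReferenceTarget(role: string): ReflectiveNode | undefined { return this.refs[role]; }
  setReferenceTarget(role: string, target: ReflectiveNode | undefined): void { this.refs[role] = target; }
}

// Generic depth-first (pre-order) traversal; works for models of any language.
function descendants(node: ReflectiveNode): ReflectiveNode[] {
  let result: ReflectiveNode[] = [];
  for (const child of node.getAllChildren()) {
    result.push(child);
    result = result.concat(descendants(child));
  }
  return result;
}

// Usage: build and query a state machine purely through the reflective API.
const machine = new ReflectiveNode("StateMachine");
const idle = new ReflectiveNode("State");
idle.setProperty("name", "idle");
const running = new ReflectiveNode("State");
running.setProperty("name", "running");
machine.addChild("states", idle);
machine.addChild("states", running);
const start = new ReflectiveNode("Transition");
idle.addChild("transitions", start);
start.setReferenceTarget("target", running);

const stateNames = descendants(machine)
  .filter((n) => n.getConcept() === "State")
  .map((n) => n.getProperty("name"));
```

Here `stateNames` is `["idle", "running"]`. Note that every role and property name is a plain string, which is exactly the inconvenience that typed, metamodel-specific APIs remove.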
To get a better understanding, you can check out EMF Ecore and its Java mapping, or the MPS structure language and the SModel API. A nice, simple and clean example is the API provided by Modelix. If you want to torture yourself, you can alternatively read the OMG MOF standard :-)
In principle you now have all you need: you can work with any model, expressed in any language, and then, for example, implement type checkers or editing frameworks on top of it. However, in practice, you will want to do one more thing: provide a convenient way to define metamodels.
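As an illustration of building a tool on top of the M3 layer, here is a minimal TypeScript sketch of a language-independent checker, assuming a hypothetical `M3Node` shape. It reports references that are unset or point to nodes no longer contained in the model, and it works for any language because it only looks at the generic structure, never at language-specific types.

```typescript
// Hypothetical minimal node shape: only what a generic checker needs.
interface M3Node {
  concept: string;
  parent?: M3Node;
  children: M3Node[];
  references: { [role: string]: M3Node | undefined };
}

function rootOf(node: M3Node): M3Node {
  let current = node;
  while (current.parent) current = current.parent;
  return current;
}

// A language-independent check: every reference must be set and must point to
// a node that is contained in the same model (not deleted or detached).
function findDanglingReferences(root: M3Node): string[] {
  const problems: string[] = [];
  const visit = (node: M3Node): void => {
    for (const role of Object.keys(node.references)) {
      const target = node.references[role];
      if (!target) {
        problems.push(`${node.concept}.${role} is not set`);
      } else if (rootOf(target) !== root) {
        problems.push(`${node.concept}.${role} points outside the model`);
      }
    }
    node.children.forEach(visit);
  };
  visit(root);
  return problems;
}

// Usage: a transition whose target state was removed from the model.
const sm: M3Node = { concept: "StateMachine", children: [], references: {} };
const s1: M3Node = { concept: "State", parent: sm, children: [], references: {} };
const tr: M3Node = { concept: "Transition", parent: s1, children: [], references: {} };
sm.children.push(s1);
s1.children.push(tr);
const removed: M3Node = { concept: "State", children: [], references: {} };
tr.references["target"] = removed;

const report = findDanglingReferences(sm);
```

Here `report` contains the single entry `"Transition.target points outside the model"`; pointing the reference at `s1` instead leaves it empty.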
Definition of Metamodels
The problem with only defining an M3 is that access to all models is necessarily reflective, because there is no tool-processable definition of a particular language’s metamodel (i.e., the structure of models expressed with that language).
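Existing M3s like the ones in MPS or EMF allow you to express metamodels declaratively (in the simplest case, this could be a JSON format) and then generate typed APIs for the metamodel. Internally, these typed APIs are implemented using the generic, reflective ones. So, if we use our typical state machine DSL as an example, you could have a metamodel-specific API in which a state machine hands you its states as typed objects and a transition gives you its target state directly, instead of you asking for children and reference targets by role-name strings.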
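A minimal TypeScript sketch of this split, assuming a hypothetical JSON-like metamodel format and a stand-in reflective class `RNode` (neither is the MPS or EMF API). The classes `StateMachine`, `State` and `Transition` are written by hand here, but they have the shape a generator could emit from the definition: typed accessors that delegate to the generic, reflective operations. As a small bonus, concept names are derived from the definition at the type level, so a misspelled concept does not compile.

```typescript
// Minimal reflective node (hypothetical), standing in for the generic M3 API.
class RNode {
  readonly concept: string;
  private readonly props: { [name: string]: string } = {};
  private readonly kids: { [role: string]: RNode[] } = {};
  private readonly refs: { [role: string]: RNode | undefined } = {};
  constructor(concept: string) { this.concept = concept; }
  getProperty(name: string): string | undefined { return this.props[name]; }
  setProperty(name: string, value: string): void { this.props[name] = value; }
  getChildren(role: string): RNode[] { return (this.kids[role] || []).slice(); }
  addChild(role: string, child: RNode): void { (this.kids[role] = this.kids[role] || []).push(child); }
  getReferenceTarget(role: string): RNode | undefined { return this.refs[role]; }
  setReferenceTarget(role: string, target: RNode | undefined): void { this.refs[role] = target; }
}

// Declarative metamodel definition in a hypothetical JSON-like format; this is
// the input from which a generator would produce the typed classes below.
const stateMachineMetamodel = {
  StateMachine: { properties: ["name"], children: { states: "State" } },
  State: { properties: ["name"], children: { transitions: "Transition" } },
  Transition: { properties: ["event"], references: { target: "State" } },
};

// Concept names are derived from the definition, so typos fail to compile.
type ConceptName = keyof typeof stateMachineMetamodel;
const newNode = (concept: ConceptName): RNode => new RNode(concept);

// "Generated" typed API: each class wraps a reflective node and delegates.
class StateMachine {
  readonly node: RNode;
  constructor(node: RNode = newNode("StateMachine")) { this.node = node; }
  get name(): string { return this.node.getProperty("name") ?? ""; }
  set name(value: string) { this.node.setProperty("name", value); }
  get states(): State[] { return this.node.getChildren("states").map((n) => new State(n)); }
  addState(state: State): void { this.node.addChild("states", state.node); }
}

class State {
  readonly node: RNode;
  constructor(node: RNode = newNode("State")) { this.node = node; }
  get name(): string { return this.node.getProperty("name") ?? ""; }
  set name(value: string) { this.node.setProperty("name", value); }
  get transitions(): Transition[] { return this.node.getChildren("transitions").map((n) => new Transition(n)); }
  addTransition(transition: Transition): void { this.node.addChild("transitions", transition.node); }
}

class Transition {
  readonly node: RNode;
  constructor(node: RNode = newNode("Transition")) { this.node = node; }
  get event(): string { return this.node.getProperty("event") ?? ""; }
  set event(value: string) { this.node.setProperty("event", value); }
  get target(): State | undefined {
    const t = this.node.getReferenceTarget("target");
    return t ? new State(t) : undefined;
  }
  set target(state: State | undefined) { this.node.setReferenceTarget("target", state ? state.node : undefined); }
}

// Usage: client code works with typed objects and never spells out role names.
const doorMachine = new StateMachine();
doorMachine.name = "Door";
const isOpen = new State();
isOpen.name = "open";
const isClosed = new State();
isClosed.name = "closed";
doorMachine.addState(isOpen);
doorMachine.addState(isClosed);
const closeDoor = new Transition();
closeDoor.event = "close";
closeDoor.target = isClosed;
isOpen.addTransition(closeDoor);

const targetName = doorMachine.states[0].transitions[0].target?.name;
```

Here `targetName` is `"closed"`. The typed wrappers hold no state of their own; the reflective node stays the single source of truth, so generic tools and typed client code can work on the same model side by side.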
To make references work reasonably, your M2-definition approach should provide a way to define the scope of references, i.e., which type-compatible nodes are valid reference targets.
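One way to do this, sketched in TypeScript under the assumption of a hypothetical `SNode` shape and a registry keyed by `"Concept.role"` strings (this is not the scoping API of MPS or any other tool): each reference role gets a scope function that computes the candidate targets. An editor can offer these candidates for code completion, and a checker can flag any target outside the scope.

```typescript
// Hypothetical minimal node: just enough structure to compute scopes.
interface SNode {
  concept: string;
  name: string;
  parent?: SNode;
  children: SNode[];
}

// A scope function returns all valid targets for one reference of a node.
type ScopeFn = (source: SNode) => SNode[];

function enclosing(node: SNode, concept: string): SNode | undefined {
  let current = node.parent;
  while (current && current.concept !== concept) current = current.parent;
  return current;
}

// Scopes registered per "Concept.referenceRole", as part of the M2 definition.
const scopes: { [conceptAndRole: string]: ScopeFn } = {
  // A transition may only target states of its own state machine.
  "Transition.target": (t) => {
    const machine = enclosing(t, "StateMachine");
    return machine ? machine.children.filter((c) => c.concept === "State") : [];
  },
};

// Used by editors (code completion) and by checkers (validation).
function scopeOf(source: SNode, role: string): SNode[] {
  const fn = scopes[`${source.concept}.${role}`];
  return fn ? fn(source) : [];
}

function isInScope(source: SNode, role: string, target: SNode): boolean {
  return scopeOf(source, role).indexOf(target) >= 0;
}

// Usage helper: create a node and attach it to its parent.
function mk(concept: string, name: string, parent?: SNode): SNode {
  const node: SNode = { concept, name, parent, children: [] };
  if (parent) parent.children.push(node);
  return node;
}

const lamp = mk("StateMachine", "Lamp");
const lampOn = mk("State", "on", lamp);
mk("State", "off", lamp);
const toggle = mk("Transition", "toggle", lampOn);
const other = mk("StateMachine", "Other");
const otherIdle = mk("State", "idle", other);

const completion = scopeOf(toggle, "target").map((s) => s.name);
const crossMachine = isInScope(toggle, "target", otherIdle);
```

Here `completion` is `["on", "off"]`, and `crossMachine` is `false`: the `idle` state is type-compatible, but it belongs to a different state machine.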
Scaling and Notifications
Except in the simplest use cases, you have to make sure that the system scales to large and/or many models. If you store your models as a node graph in a database, you have to ensure that appropriate lazy loading and unloading of nodes is supported, ideally transparently to the user of the M3 API. If you store them in files (or JSON blobs in a database), then your M3 layer must provide a way to define the granularity of these files or blobs, and likely some notion of model-file import is needed. Some of this can be non-trivial, and encapsulating it in a well-defined M3 layer is a major reason for having one.
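A minimal TypeScript sketch of transparent lazy loading, assuming a hypothetical `NodeStore` that stands in for the database and keeps each node as a record whose children are referenced by id. Clients only see `LazyNode` handles; records are fetched on first access, cached in a `Repository`, and can be unloaded again, after which the next access silently reloads them. A real implementation would also need eviction policies and write-back of changes, which are left out here.

```typescript
// Hypothetical storage record: children are referenced by id, not embedded,
// so each node can be loaded individually.
interface NodeRecord {
  id: string;
  concept: string;
  properties: { [name: string]: string };
  childIds: string[];
}

// Stand-in for a database; it counts reads so that laziness is observable.
class NodeStore {
  reads = 0;
  private readonly records: { [id: string]: NodeRecord } = {};
  put(record: NodeRecord): void { this.records[record.id] = record; }
  get(id: string): NodeRecord {
    this.reads++;
    const record = this.records[id];
    if (!record) throw new Error(`No node with id ${id}`);
    return record;
  }
}

// What clients of the M3 API see: a handle that loads data only when asked.
class LazyNode {
  readonly id: string;
  private readonly repo: Repository;
  constructor(id: string, repo: Repository) { this.id = id; this.repo = repo; }
  getConcept(): string { return this.repo.load(this.id).concept; }
  getProperty(name: string): string | undefined { return this.repo.load(this.id).properties[name]; }
  getChildren(): LazyNode[] {
    // Creating child handles is cheap: the children's records are not fetched yet.
    return this.repo.load(this.id).childIds.map((id) => new LazyNode(id, this.repo));
  }
}

// Caches loaded records; unload() frees one, and the next access reloads it.
class Repository {
  private readonly cache: { [id: string]: NodeRecord } = {};
  private readonly store: NodeStore;
  constructor(store: NodeStore) { this.store = store; }
  load(id: string): NodeRecord {
    return this.cache[id] || (this.cache[id] = this.store.get(id));
  }
  unload(id: string): void { delete this.cache[id]; }
  root(id: string): LazyNode { return new LazyNode(id, this); }
}

// Usage: only the nodes that are actually accessed are read from the store.
const db = new NodeStore();
db.put({ id: "sm", concept: "StateMachine", properties: { name: "Door" }, childIds: ["s1", "s2"] });
db.put({ id: "s1", concept: "State", properties: { name: "open" }, childIds: [] });
db.put({ id: "s2", concept: "State", properties: { name: "closed" }, childIds: [] });

const repo = new Repository(db);
const machineNode = repo.root("sm");                      // no read yet
const readsBefore = db.reads;                             // 0
const machineName = machineNode.getProperty("name");      // reads "sm"
const stateNodes = machineNode.getChildren();             // "sm" is cached, no read
const firstStateName = stateNodes[0].getProperty("name"); // reads "s1" only
const readsAfter = db.reads;                              // 2 ("s2" was never read)
repo.unload("s1");
stateNodes[0].getProperty("name");                        // transparently reloads "s1"
```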
I would say that this concludes the minimum reasonable infrastructure. You now have an API onto which you can build — for example — type checkers, editors or interpreters. Sure, if you build many of these, then you will certainly build additional frameworks and libraries to let you implement those more efficiently, but such additional infrastructure can be built iteratively, as needed.
One piece of infrastructure that is worth mentioning explicitly here is notifications, where clients (type checkers, editors, interpreters) can subscribe to changes of the model in order to react to them. This is crucial if you want your model editors and processing services to integrate seamlessly with a modern web application, where users expect changes to be synchronized immediately between collaborators and data derived from the model to be updated incrementally.
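A minimal observer-style sketch in TypeScript, assuming hypothetical `Model` and `ObservableNode` classes and a hypothetical `ModelChange` event type. Every mutation goes through the M3 API, which emits a change event, so subscribers such as a checker or a derived view can update incrementally instead of recomputing everything. Synchronizing changes between users over the network is out of scope for this local sketch.

```typescript
// Hypothetical change events emitted by the M3 layer for every mutation.
type ModelChange =
  | { kind: "propertyChanged"; node: ObservableNode; name: string; oldValue?: string; newValue: string }
  | { kind: "childAdded"; node: ObservableNode; role: string; child: ObservableNode };

type ChangeListener = (change: ModelChange) => void;

// Model-wide notification hub: clients subscribe once and see all changes.
class Model {
  private listeners: ChangeListener[] = [];
  subscribe(listener: ChangeListener): () => void {
    this.listeners.push(listener);
    return () => { this.listeners = this.listeners.filter((l) => l !== listener); };
  }
  emit(change: ModelChange): void {
    // Iterate over a copy so listeners may unsubscribe while being notified.
    for (const listener of this.listeners.slice()) listener(change);
  }
}

class ObservableNode {
  readonly concept: string;
  private readonly model: Model;
  private readonly props: { [name: string]: string } = {};
  private readonly kids: { [role: string]: ObservableNode[] } = {};
  constructor(model: Model, concept: string) { this.model = model; this.concept = concept; }

  getProperty(name: string): string | undefined { return this.props[name]; }
  setProperty(name: string, value: string): void {
    const oldValue: string | undefined = this.props[name];
    if (oldValue === value) return; // no actual change, no notification
    this.props[name] = value;
    this.model.emit({ kind: "propertyChanged", node: this, name, oldValue, newValue: value });
  }
  addChild(role: string, child: ObservableNode): void {
    (this.kids[role] = this.kids[role] || []).push(child);
    this.model.emit({ kind: "childAdded", node: this, role, child });
  }
}

// Usage: derived data (a state count) is updated incrementally from events.
const model = new Model();
const machine = new ObservableNode(model, "StateMachine");
let stateCount = 0;
const changes: string[] = [];
const unsubscribe = model.subscribe((change) => {
  if (change.kind === "childAdded" && change.child.concept === "State") stateCount++;
  if (change.kind === "propertyChanged") {
    changes.push(`${change.name}: ${change.oldValue ?? "(unset)"} -> ${change.newValue}`);
  }
});

const idle = new ObservableNode(model, "State");
machine.addChild("states", idle);  // stateCount becomes 1
idle.setProperty("name", "idle");  // recorded in changes
idle.setProperty("name", "idle");  // same value: no event
unsubscribe();
machine.addChild("states", new ObservableNode(model, "State")); // no longer observed
```

Afterwards, `stateCount` is `1` and `changes` is `["name: (unset) -> idle"]`: the redundant write produced no event, and the second state was added after the subscriber had left.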
What I describe here as the minimum might sound like a lot. But in fact it is usually just a couple of thousand lines of code, which you can even adapt from the linked examples. It is therefore not a coincidence that Modelix started with exactly this. It is now considerably more than a few thousand lines because it addresses many non-functional concerns, but what I describe here was the rationale behind its development roadmap.
In contrast, if you don’t start with this kind of infrastructure, you will not have any integration layer across your models, and your language development endeavour will descend into ad-hoc-ness and improvisation.
Thanks to Kolja Dummann and Sascha Lisson for useful input on a previous version of the document. And to some of my customers for making me think about the topic.