ISoLA 2018: Thoughts on X-by-Construction

Markus Voelter
8 min readNov 14, 2018


X-by-Construction refers to the idea of ensuring, by construction, that a particular property (“X”) holds in a program. The X is meant to be replaced by any property you care about. A classical property X is correctness (correctness-by-construction), and it means that, if you are able to write the program, it is also correct. But the X can also stand for security, performance or scalability. There was a track on X-by-C at ISoLA, and it prompted some thoughts about what the “by construction” actually means. I thought I knew. But now I am not sure. And based on my experience in the workshop, it seems others are confused as well. So here is some food for thought.

Expressing what you want

The first point in ensuring any particular property, whether by construction or not, is to be able to specify this property. This is harder than it might seem!

Let’s say you write in your program 2 + 2. What does it mean for this to be correct? It means that the plus operator acts in accordance with some kind of specification, that is usually part of the semantics of the language you use. But capturing this formally is not quite so easy (and is in practice often not done). The more you move to non-functional properties, the harder it is to specify those things in the first place.

In the rest of this little essay I focus on X = Functional Correctness because there we at least have some idea of how to specify it (pre- and postconditions, model checking of state machines and the like). I think my observations hold for the other X as well, but specifying the expectation becomes harder.

The Extremes

So let’s start with the extremes: assume we have a language for modelling state machines. One particular — and very trivial — aspect of correctness might be that every state machine has at most one start state. The IDE has the usual graphical editor with a palette. Let’s say that the IDE prevents you from dropping a start state symbol onto your canvas if there is already one. So you physically cannot create a model that violates this (admittedly trivial) correctness constraint. This is clearly correctness by construction.

Let’s consider the other extreme: a running Java program throws a null pointer exception and the fails. This is clearly not correctness-by-construction (it not correctness-by-anything) because the program fails during execution.

Between the Extremes

Let’s say the state machine language lets you drop a second start state onto the canvas, but then immediately marks it as an error with a red squiggly. Or for a more advanced property which we can specify using temporal logic, every change to the machine (or property) triggers a model check that comes back with an error (or not) after a reasonable time, so the feedback happens “during construction”. Strictly speaking, you were able to construct a faulty model, but you were made aware of the error immediately. Does this count as “by construction”? If you say yes, then basically every interactive program analysis such as type systems is part of correctness-by-construction. That’s fine, but the “by construction” starts to lose its meaning a bit.

Relaxing our constraints further, what about if the verification takes longer (model checking will, for non-trivial machines or properties), and you get an email from your CI server that runs the checks after 30 minutes. Or the next morning. This is not interactive, so does this count as by-construction? If yes, then this basically means that every error you can find before you ship/deploy/run the system, through verification, simulation or exhaustive testing constitutes “by-construction”. At the workshop there were talks that (at least implicitly) held this view. I don’t think this is a useful definition, because then we all do “by construction” all the time; it’s effectively similar to “software QA”.


Often the scenario is slightly more complicated because a transformation step is involved. Say your state machine is correct as defined on the level of the model (take the “only one start state” as an example or a valid model check of your temporal properties with NuSMV). Now you translate it. How do you know that the result of the transformation is correct?

In the context of compilers we simply assume correctness. Because of their wide-spread use and “long history” this is often a valid assumption.

If you don’t “just believe” correctness, then one thing you can do is prove your transformation correct. This is exceptionally hard to do, but of course possible; some simple examples were presented in the ISoLA track. Some compilers, such as CompCert, are proven to preserve correctness. I don’t think there are practically usable formalisms where, for a transformation I write myself, I can prove its correctness. Please tell me if I am wrong here.

I would say that a model M1 that is correct (either by construction or through a build-check-fix cycle) and is then transformed into another model (or code) M2 using a proven-to-be-correct transformation constitutes correctness-by-construction with regard to M2. A provably-correct transformation is one that guarantees that (a particular class of) properties that hold on an input M1 always also hold the resulting M2. The reason why I think this is by-construction is that there is no need to check any particular M2, because, from any valid M1, the system can only create valid M2s. No need to check a particular one.

A slightly less onerous way for achieving correctness-by-construction is the following: let’s say you have a model M1 (say, a state machine) as well as a specification which properties you want to hold (say, a bunch of LTL properties). Now you translate the model to code (say, C) and you encode the LTL properties in a way they can be checked on the C code. And then you run the checker on the C code to proof that the properties hold. Is this a valid approach? Yes, but you have to ensure that the properties are translated correctly, otherwise the checker saying “ok” does not mean anything (the classical case of a generator always generating the equivalent of assert true)! A manual review doesn’t really count, right? So this basically means you have to proof the transformation of the properties to be correct! This might be simpler than proving the whole compiler, but it is still a relatively tall order.

So is this correctness by construction? Well, technically not, right? Because you verify each particular M2. There is no proof that any M2 will be correct as long as the input M1 was correct. But because all the proving infrastructure is automatically generated, I tend to count this as correctness by construction.

By the way: we used this approach a while ago with mbeddr-based C extensions and CBMC-based testing. We got an ASE paper accepted about it. But we didn’t really prove that the properties were translated correctly…

There is an interesting special case: if the transformation from M1 to M2 stays in the same language and the M1 properties don’t have to be changed at all to apply to M2, then the approach is valid without a proof. While this sounds like a strange corner case, it is not: optimizers do exactly this. And proving that an optimized (presumably much less understandable) program has the same semantics as the unoptimized one is a very relevant case.

From this observation I would extrapolate that if a development approach is correct-by-construction, we have to be able to define and check a property on some kind of model before the system (code) exists, and then we guarantee that the system (code) fulfils the property without manual checking. It does not matter if we prove the transformation correct (any correct M1 will lead to a correct M2) or whether we automatically check that is true for any particular M1 and M2. Of course the former is a stronger guarantee, but I’d suggest the second one is good enough in practice.

DSLs without by-construction?

This hints at an interesting middle ground: use a nice DSL to specify your model and then exploit the domain semantics for generating code that implements the semantics together with properties that check those semantics on the code level. And yes, ideally you should prove those correct, but maybe you can make do with a slightly weaker guarantee: make sure the implementation generator and the property generator are developed independently to avoid common-cause errors. It is then very unlikely that the implementation and the properties are “compatibly broken”. Not sure it qualifies as correct-by-construction, though. This is basically what we did in the Voluntis project.

The reason why using DSLs is desirable is that the models and the properties specified against the models are more readable, so validation becomes easier. We should not forget that it is not very helpful if we verify internal correctness, but cannot validate that the behavior we want is actually the one we have specified for the verifier to check.

Correctness vs. Security

If we look at security-by-construction, then things become even harder. A slightly colloquial way of distinguishing security from correctness is that correctness specifies what exactly the system should do. And security ensures that it does not do anything else (which could then presumably be exploited). Staying with our state machine, one that is secure with respect to this definition is one that is fully specified, i.e., we define the reaction to every event in every state. There are no “unspecified behaviors” left to be exploited.

But lets now consider a translation to C. On the functional level the behaviors are the same, and we can presumably prove this on the level of the code. Also here, no extra behaviors are left. So it’s secure. Or is it? Of course not. Because there could be any number of low level errors in the libraries our program uses that can be exploited. We’d have to verify all of this as well, treating rhw whole stack as whitebox, and attempting all kinds of malicious attacks. I don’t know how this can be automated.

Loose Ends

Here are two few more things to think about. A development process that puts X first, and ensures, through process and methodology that X holds in the developed system is not X-by-construction. You can maybe call it X-by-design, and it is certainly useful, but it is different.

I’d also like to revisit runtime errors. Let’s say we have a way of generating an implementation that degrades safety. We might not be able to prove that the system is functionally correct, but we can ensure through architectural means (such as watchdogs that check properties at runtime) that if incorrect behavior is detected, we go to a safe state. Of course the system is no longer available in this state (so X = Availability is not guaranteed), but the architecture, to some degree, guarantees certain safety properties “by construction”, because of the architecture. I think one could make arguments that this is safety-by-construction.


I think we have two options. We either have to widen the definition of X-by-Construction to the point where it includes pretty much everything that mechanically finds violations of X before we deploy/ship/run the system. But then it’s a distinction without a difference. Or we stay with the strict definition, but then we require proofs, either of the transformation of the program or of the transformation of the properties. For realistic system sizes in realistic (industry) contexts I think we are quite a ways away from this.

So where does this leave us? I think it’s obvious: for any particular system/language/approach/context we have to find an approach that is feasible and then argue why and how it guarantees particular properties before the system fails in the real world. Whether we do this through an actual proof, verification of particular models, simulation (as in Simbench, see the respective chapter in this paper) or through exhaustive testing (as in the Voluntis case) does not really matter.

So next time, let’s have a workshop that discusses great case studies for such systems, and let’s not use the “by-construction” word.



Markus Voelter

software (language) engineer, science & engineering podcaster, cross-country glider pilot. On medium mostly for the software stuff.