Comment Markup Language (CML)



[/fusion_text][menu_anchor name="intro"][fusion_text]


There are many tools to help identify the changes that have been made to code. Source control and diff tools show you what has been changed and where, down to the line level. From this you can get to the module that has been changed, and sometimes that is enough. However, for a number of reasons, I often need the context of what has been changed. Some of the code I work on is more ‘legacy’ than object-oriented and can still have some big, monolithic blocks of code. Refactoring is not necessarily the answer, so the nasty part is that the code has to be changed in situ.

When we put together documentation around a change, we need to know not only what has been changed and where, but also the context of the change: which procedure, routine or function. CML was ‘born’ out of this frustration, to try and alleviate some of the pain.

[/fusion_text][menu_anchor name="WhatDoesItDo"][fusion_text]

What does it do?

I like to think of CML as an intelligent yellow highlighter. It not only shows you what has changed, it is also smart enough to look backwards and show you the parent that has been affected. With the right settings it can show you the whole tree of change within a single document.

At its most basic, it is just a case of putting a marker into your code or document behind the appropriate comment mark for the language that you are using, be that --, ' or #, or whatever.

Because that comment is ignored by your system, it is only available in the source. By correctly identifying the source material, the CMLParser knows what your comments look like and looks inside them for CML markers. Should it find any, it will do its magic and extract the semantic meaning of the change that you have made.

How does it do it?

Very simply, actually. Depending on the nature of the document that is being parsed the CMLParser knows some of the structure of your document. It knows what it should be looking for that defines the structure of your document and looks for those structure markers as it reads your document. As it passes a structure marker, it takes a note of what it has found and remembers the last marker it found. When it then finds a CML marker, it outputs the last structure marker that it found before it with all the relevant information.
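As a rough illustration of that "remember the last structure marker" idea, here is a minimal sketch for Ruby source. It is not the actual CMLParser code: the marker form (`# @cml <note>`), the patterns and the names `find_changes` and `Change` are all assumptions made up for this example.

```ruby
# Hypothetical sketch: neither the marker syntax nor these names come from CML itself.
STRUCTURE_PATTERN = /^\s*(?:def|class|module)\s+([\w.:?!]+)/ # structure markers for Ruby source
CML_PATTERN       = /#\s*@cml\b\s*(.*)$/                      # assumed marker form: "# @cml <note>"

Change = Struct.new(:parent, :line_number, :note)

# Walk the source line by line, remembering the most recent structure marker.
# Each CML marker is reported together with that remembered parent.
def find_changes(source)
  last_structure = nil
  changes = []
  source.each_line.with_index(1) do |line, number|
    structure = STRUCTURE_PATTERN.match(line)
    last_structure = structure[1] if structure

    marker = CML_PATTERN.match(line)
    changes << Change.new(last_structure, number, marker[1].strip) if marker
  end
  changes
end

SAMPLE_SOURCE = <<~'RUBY'
  def calculate_total(items)
    # @cml apply discount before tax
    items.sum
  end

  def print_receipt(order)
    puts order
  end
RUBY

find_changes(SAMPLE_SOURCE).each do |change|
  puts "#{change.parent} (line #{change.line_number}): #{change.note}"
end
```

For the sample above, the only marker sits inside `calculate_total`, so that is the parent reported for it. A real parser would need one set of structure patterns per language or document type, and would have to cope with markers that appear before any structure marker (here the parent is simply `nil`).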


CML grew from a need to document what we thought we would be changing (the initial impact assessment) and then what we had actually changed (the impact statement). This gives us the ability to quickly identify where we think things will change, then confirm empirically what has really changed and document it quickly and easily, without spending hours doing it manually.

Who’s it for?

It grew out of a need to produce quality software documentation around change management. If you have any need for such a thing then you might find it useful.

How do I use it?

It’s written in Ruby to be relatively platform independent, so you will need to ensure that you have Ruby installed on your system.

Isn’t it just another comment?

Yes and no! If you already practice good commenting, then using CML should not be an issue; this was one of the design decisions. In that sense it is just a structured comment that carries more semantic meaning. Unusually, however, CML is designed to be removed from the source as part of the parsing process. Because CML blocks can be quite large, leaving them in could ‘bloat’ your code quite quickly, and the parser would then potentially have to read loads of old CML to find the latest change.

If you don’t want the default behaviour then you can provide a switch when executing the parser to leave the CML in place. This can be useful for a pass through a folder to get an overview of changes but without needing to remove the markers.
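To make the two behaviours concrete, here is a small sketch of stripping markers by default and leaving them alone on request. Again, this is only an assumption of how it could work: the `# @cml` marker form, the `strip_cml` name and the `keep:` keyword are invented for the example and are not taken from the real parser.

```ruby
# Hypothetical marker form ("# @cml <note>"); the real CML syntax may differ.
CML_MARKER = /\s*#\s*@cml\b.*$/

# Returns the source with CML markers stripped, or untouched when keep is true.
def strip_cml(source, keep: false)
  return source if keep

  cleaned = source.each_line.map do |line|
    stripped = line.sub(CML_MARKER, "")
    # Drop lines that held nothing but a marker; keep code that merely had one appended.
    next if stripped.strip.empty? && !line.strip.empty?

    stripped
  end
  cleaned.compact.join
end

marked = "def total\n  # @cml changed rounding\n  x = 1 # @cml new value\nend\n"
puts strip_cml(marked)             # markers removed (the default described above)
puts strip_cml(marked, keep: true) # source left exactly as it was
```

Note that this naive pattern would also match a `# @cml` that happens to sit inside a string literal, which is one reason a real implementation needs to know the comment rules of each language.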

Why is it Open Source?

Because I believe that this is a problem that other people and other development teams face on a daily basis, and that CML will help towards the production of quality documentation, even in an Agile environment.

By making it Open Source, I can invite others to contribute to the idea. Together we can expand the number of languages and document types that it can support, and hopefully it’ll become a useful tool to many development teams around the world.


  • You can pass in a single file
  • You can pass in a whole folder structure
  • You can provide the path to an ignore file that lists the documents to process and those not to process
  • You can apply a switch to output just the files that have been changed instead of the default tree structure
  • You can apply a switch to not remove the CML from the document (Which should be the default? Extract or leave?)
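The options in the list above could be wired up with Ruby's standard `OptionParser` along these lines. This is a sketch only: the flag names (`--ignore-file`, `--files-only`, `--keep-markers`), the `cmlparser` command name and the helper names are all made up for illustration, so check the real tool for its actual switches.

```ruby
require "optparse"

# All flag names below are invented for illustration; they are not the tool's real switches.
def parse_cml_args(argv)
  options = { ignore_file: nil, files_only: false, keep_markers: false }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: cmlparser [options] PATH"
    opts.on("--ignore-file FILE", "List of documents to process or skip") { |file| options[:ignore_file] = file }
    opts.on("--files-only", "Print only the changed files, not the tree") { options[:files_only] = true }
    opts.on("--keep-markers", "Leave the CML in the documents") { options[:keep_markers] = true }
  end

  paths = parser.parse(argv) # non-destructive; returns the non-option arguments
  raise ArgumentError, "a file or folder PATH is required" if paths.empty?

  options.merge(path: paths.first)
end

# Expand a single file or a whole folder structure into the list of files to read.
def expand_path(path)
  return [path] unless File.directory?(path)

  Dir.glob(File.join(path, "**", "*")).select { |entry| File.file?(entry) }
end

options = parse_cml_args(["--keep-markers", "--ignore-file", "skip.txt", "src"])
puts options.inspect
```

Here removal is the default and `--keep-markers` opts out, matching the behaviour described earlier; flipping that default would only mean inverting the one flag.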