Code Annotation Format
Author: David Betz
Version: 0.7
Created: 8/12/2008
Updated: 1/26/09
Preamble
This document outlines the specification for the Code Annotation Format ("CAF"). The CAF specifies a way of representing code metadata in the code itself in order to reduce the time the brain needs to parse code.
This is not a set of coding guidelines, a set of rules, a series of conventions, or a formatting specification. It does not deal with how and when to use particular types or where to place block beginning and ending markers, nor does it address how and when to use IDisposable. Authoritative documents on these topics already exist, and individuals and corporate entities alike are expected to follow them.
This format has been designed around the following core principles (names in parentheses):
- (free) Code shall never be implemented with a specific tool, settings, or color scheme in mind.
- (current) Code should be based on the current reality, not on prior art.
- (natural) Code shall take into consideration the psychology of the human, not merely the function of the system.
Commentary
- (free) Far too often, those who design conventions design them assuming that everyone will use the same IDE they do. This severe arrogance is rampant amongst Microsoft-centric developers. Not everyone uses Visual Studio all the time. You cannot count on shortcut keys, macros, plug-ins, and syntax coloring being available at all times.
- (current) This is a common-sense principle. C# code should look very different from C++, Java, and VB code. One of the mistakes of .NET 1.x is that C# looked far too much like Java. This caused no end of confusion for developers who concluded that C# was a variant of Java, when in fact it borrowed concepts from many different languages.
- (natural) Code styles should be considered dead on arrival if they are designed from a system perspective rather than a human perspective. The former was very popular in the early decades of computing (e.g., COBOL and FORTRAN). Modern systems are designed from a human perspective; C# 2.0 and later, for example, were designed with the human in mind. Designing for the human is never as easy as it sounds and takes considerable research to get right.
While the CAF may be applied anywhere, it is intended primarily for C#, C/C++, and IL, and partially for JavaScript. A running commentary accompanies the rules.
The CAF works best when the code to which it is applied follows the industry best practices set out in three existing works.
These works are:
· Framework Design Guidelines, 2nd Edition (not to be confused with FxCop, which often contradicts the official rules)
· Refactoring by Martin Fowler
· Visual Studio 2005/2008 Default Settings
The first source shall be followed at all times. No company or group rule shall ever take exception to that which is written in this document. It is law. You do not come to the law to judge the law, but you come to be judged by the law. Where your practices contradict that which is written in the law, it is you, not the law, that must change.
The second source is widely regarded as the standard, platform-independent way to think about manageable code.
The third is so widely accepted that deviation from it will lead to inevitable disaster. The CAF does not deal with “formatting”. The Visual Studio defaults have set the de-facto standard and it is an act of futility to attempt to augment it.
Code Annotation Format Details
This metadata takes the form of comments in the code. The annotations were heavily influenced by the Instinct commenting format created by Digital Evolution Group. The rules have been tweaked only slightly to bring them more in line with the three core principles.
The following sections state each rule, followed by optional example and commentary sections.
Empty lines shall never be used in the body of a method or property; "//+" (without quotes) shall be used in their place.
Commentary
This rule is the cornerstone of the CAF. Many of the following rules are derived from it. There are three primary reasons for this rule existing in this form:
First, this rule employs a technique of visual compression that allows code to be parsed much faster. When a person reads from top to bottom, the brain builds a sense of "momentum" in the reading. When the eye hits a blank area, this momentum is destroyed, which increases the latency of reading any code.
However, visual code compression without any separation leads to raw obfuscation. That is, it would be nearly impossible to see where one part of the code ends and the next begins. Therefore, a visual symbol (a plus sign) is used in its place. This tells the brain that there is a small bump here, but the momentum does not stop. This decreases mental latency and thus increases reading speed.
This is in direct contrast to a programming language like VB, which has extremely high mental latency; the brain is kept constantly parsing. Instead of short symbols like "{" and "}" as in C#, VB uses long keywords such as "Sub" and "End Sub", which take longer for the brain to parse. This natural-language parsing increases latency.
This concept is somewhat related to the linguistic phenomenon called hiatus. Hiatus is a phonological term (phonology is the study of the sound systems of languages) for the separate pronunciation of two adjacent vowels. Speakers avoid hiatus so strongly that many languages eliminate it over time. For example, in English, the word "an" is used before a word starting with a vowel sound. Saying "a apple" increases the latency of speech; adding the movable "n" decreases it. The same idea applies to the //+ in the CAF.
Second, if there is a blank line in code, nothing states whether the break was deliberate. The developer may have inadvertently hit "return", confusing those trying to interpret the code. By using a deliberate "//+", the developer states that the break is intentional. This leads to less confusion, reduces the number of questions in the mind of a future developer, and decreases the probability of a future developer removing that break.
Third, some notes left by developers (see comment-related rules later in this CAF) may be ambiguous. They may either be a statement of action or a statement of explanation to a future developer. The example will explain more.
Example (ambiguity)
// Write out rendering
This comment is ambiguous. Is this a note to a future developer to "write out rendering"? Or is it a statement that the following code "writes out rendering"? There is absolutely no reason to give cause to ask such questions. This problem is easily solved by writing a deliberate note as follows:
//+ write out rendering
This "hardens" the note to be a permanent comment, instead of a temporary note.
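To see the rule applied to a complete member, here is a minimal sketch of a method body in which every place a blank line would normally go holds a //+ instead, two of them carrying short labels. The InvoiceCalculator type and its ComputeTotal method are hypothetical names used only for illustration.

```csharp
using System;
//+
// Hypothetical type used only to illustrate //+ separators inside a method body.
public static class InvoiceCalculator
{
    //- @ComputeTotal -//
    public static Decimal ComputeTotal(Decimal[] lineAmounts, Decimal taxRate)
    {
        if (lineAmounts == null)
        {
            throw new ArgumentNullException("lineAmounts");
        }
        //+ subtotal
        Decimal subtotal = 0;
        foreach (Decimal amount in lineAmounts)
        {
            subtotal += amount;
        }
        //+ tax
        Decimal tax = subtotal * taxRate;
        //+
        return subtotal + tax;
    }
}
```

The separators mark deliberate breaks between the validation, subtotal, tax, and return parts without ever leaving an empty line in the body.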
Each non-field member of a class shall be labeled with the metadata header //- @Name -// placed above any XML documentation and attributes. The "Name" portion shall be the name of the member, and the prefix symbol shall state the access modifier.
The access modifier symbols are as follows:
- @ - public (IL:public)
- $ - private (IL:private)
- # - protected (IL:family)
- ~ - internal (IL:assembly)
- % - internal protected (IL:famorassembly)
- & - [not supported in C#] (IL:famandassembly)
Commentary
This rule is also a core rule from which others may be derived.
This follows directly from the (free) and (natural) code principles. You may never assume that everyone will use Visual Studio at all times; that is not reality. There's no reason to open Visual Studio to make a quick change to a file that will go through ASP.NET's automatic compilation. You also won't have Visual Studio around when you do a dump analysis of a server, or even when you make a change on an FTP server.
Designing to a specific IDE is arrogant and must be avoided. Therefore, measures must be taken to allow efficient readability in a general text editor. By applying the aforementioned header metadata, there is a very obvious signal that a member is present. This drastically decreases the time required to mentally parse a file.
When using Visual Studio, the metadata header will show in a different color than the surrounding area, thus providing even more value. This will "pull" the eye to the area so it can be quickly parsed.
This metadata header also fixes the problem of method signature hiding brought on by attributes. With this in place, the eye can quickly find the method in question. This provides an even greater benefit when XML documentation is applied to a member. The eye will see a small //- @Name -// area on top of a massive block of text and will quickly be able to see the name of the member. Thus, the method signature hiding problem is solved as well.
Example
//- $Process -//
private void Process() { }
Example
//- @Id -//
/// <summary>
/// This is how a field is identified to the system.
/// </summary>
[DataMember]
public Int32 Id { get; set; }
Example
//- @SetScopedEntry -//
/// <summary>
/// Sets the scoped entry.
/// </summary>
/// <param name="scope">The scope.</param>
/// <param name="key">The key.</param>
/// <param name="value">The value.</param>
public void SetScopedEntry(String scope, String key, String value)
{
    this.Add(scope + "::" + key, value);
}
Commentary
This technique is based on a time-tested technique from the C++ world. In the header files for Microsoft's Win32 API, members are declared, in part, vertically, which makes them slightly quicker to parse. For example, from the Win32 headers:
WINBASEAPI
VOID
WINAPI
OutputDebugStringA(
__in_opt LPCSTR lpOutputString
);
WINBASEAPI
VOID
WINAPI
OutputDebugStringW(
__in_opt LPCWSTR lpOutputString
);
This C++ example will be mentioned again in the next rule; see its example for more information.
Overloaded, semantically equivalent, or technically grouped members shall be grouped together without line breaks and share the same header metadata.
Commentary
This allows for members to be naturally grouped together.
Example (overloaded)
//- @ComputeTotal -//
public Int32 ComputeTotal() { }
public Int32 ComputeTotal(Int32 padding) { }
//- @AddProduct -//
public String AddProduct(Product product) { }
public String AddProduct(Product product, Int32 padding) { }
public String AddProduct(Product product, Int32 padding, Double metric) { }
Example (semantically equivalent)
//- OutputDebugString -//
WINBASEAPI
VOID
WINAPI
OutputDebugStringA(
__in_opt LPCSTR lpOutputString
);
WINBASEAPI
VOID
WINAPI
OutputDebugStringW(
__in_opt LPCWSTR lpOutputString
);
Example (technically grouped)
See example in next rule.
When the members are technically grouped async members, the name shall be the true method name suffixed with "(Async)".
Commentary
This allows very quick visual access to the async version of the method. The idea is that both parts of the async pattern represent the same method. Therefore, they share the same metadata. This allows a person to see the true name (the non-async name) instead of having to mentally parse the async mechanics from the true name.
Example (also: technically grouped example from previous rule)
[ServiceContract(Namespace = Information.Namespace.Person, Name = "IPermissionService")]
public interface IPermissionServiceAsync
{
    //- CreatePerson (Async) -//
    [OperationContract(AsyncPattern = true)]
    IAsyncResult BeginCreatePerson(Person person, AsyncCallback callback, Object state);
    Int32 EndCreatePerson(IAsyncResult result);
    //- GetPersonList (Async) -//
    [OperationContract(AsyncPattern = true)]
    IAsyncResult BeginGetPersonList(AsyncCallback callback, Object state);
    List<Person> EndGetPersonList(IAsyncResult result);
}
Constructors shall be labeled with a //- @Ctor -// header as metadata.
Example
//- @Ctor -//
public Person( ) { }
public Person(String firstName) { }
The Fowler approach will be used when dealing with comments: comments are all too often used to cover up code that, as he puts it, has a "smell".
Commentary
Per the official .NET coding guidelines, comments should never be used as a cover-up for sloppily written garbage. Properly designed code should be self-documenting. This is especially true in an environment like .NET, which ships the metadata alongside the implementation. Never use comments as an excuse for coding laziness or inelegance. Methods, properties, events, classes, structures, fields, parameters, locals, and delegates should be named so that they describe exactly what the entity provides.
Using comments in a method (or property accessor) to describe what a particular piece of code does is all too often an excuse for not refactoring. If a method does so much that a developer feels obligated to document it, the method is a good candidate for refactoring. Instead of a single block, it would probably be better to have several smaller blocks, each in a separate method with a name that describes the functionality of that block.
This doesn't mean no comments should be used. However, when a comment is used, it should provide a value that refactoring cannot. Again: comments are never to replace properly designed code or a trained professional. Also, do not use comments to cater to the ignorance of an entry-level developer.
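As a minimal sketch of that refactoring, assume a hypothetical URL-cleanup method that once carried comments such as "// trim the whitespace" and "// remove the trailing slash". Each commented block becomes a private method whose name says what the comment used to say, so the public method reads as a summary of its own steps. UrlNormalizer, Normalize, TrimWhitespace, and RemoveTrailingSlash are invented names for illustration.

```csharp
using System;
//+
// Hypothetical type: each former comment has become the name of a private method.
public static class UrlNormalizer
{
    //- @Normalize -//
    public static String Normalize(String url)
    {
        String trimmed = TrimWhitespace(url);
        return RemoveTrailingSlash(trimmed);
    }

    //- $TrimWhitespace -//
    private static String TrimWhitespace(String url)
    {
        return url.Trim();
    }

    //- $RemoveTrailingSlash -//
    private static String RemoveTrailingSlash(String url)
    {
        if (url.EndsWith("/"))
        {
            return url.Substring(0, url.Length - 1);
        }
        //+
        return url;
    }
}
```

No comment is needed inside Normalize; the two method names carry the explanation.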
Prefer //+ prefixed contextual labels over comments.
Commentary
When a method contains logic that is not deemed a good candidate for refactoring, some statement of purpose should be applied. Humans are contextual creatures; we deal in scopes. The example will explain more.
Example
When dealing with an HttpHandler called "UrlProcessingHttpHandler", a developer should know that this HTTP handler is to be used for URL processing. Names and comments in this class should not restate this. Furthermore, in an "Extract" method in this class, it's meaningless to write the following comment:
// Do link extraction for URL processing
Not only is this note ambiguous, thereby ignoring a previously stated rule, but it also states information that is entirely redundant in a properly designed architecture. Given the context, the following label is preferred over the previous comment:
In this example, the context, or rather the scope, of the link parsing is "process url", so there is a sign that says "//+ link". You know what you are
//+ link
Prefer lowercase labels over sentence-case labels.
Commentary
English is written in one of the relatively few alphabets that have two sets of letters: for each letter there is a mirrored counterpart, the lowercase and the uppercase form. For longer sentences, the two sets aid mental parsing. For short labels, however, the overhead of switching between the two sets only increases latency and provides no extra value.
Example
See previous "link" example
Never use plurals (except for Flags enumerations, which the FDG requires to have plural names).
Commentary
This is a logical progression from standard database design and REST-based principles. For example, there is no such thing as an "Orders" table in a database; that makes no sense. It makes no sense to say that "Orders is related to Products", because the grammatical numbers don't agree (in English, we use "are" for plural, not "is"). It's "Order is related to Product": there is an "Order" table which holds orders.
The same principle applies to REST-based design, where, for the same grammatical reasons, the following is incorrect:
http://www.tempuri.com/documents/people/edit/1
Instead, it should be the following:
http://www.tempuri.com/document/person/edit/1
In the same way, there is no "GetProducts" method.
Furthermore, you have no idea what is returned from a "GetProducts" method. Is it an array? A list? While it will not always be possible to name a method after its specific return type, changing GetProducts to GetProductList, GetProductArray, or GetProductDictionary goes a long way toward improving the readability of the code. This is extremely important when using an editor other than Visual Studio. Even in Visual Studio, however, it helps the developer determine the return type much faster.
Example
Instead of "People", use:
//- @PersonList -//
public List<Person> PersonList { get; set; }
Example
Instead of "Points", use:
//- @PointArray -//
public Point[] PointArray { get; set; }
Example
Instead of "GetProducts", use:
//- @GetProductList -//
public List<Product> GetProductList()
{
    //+ return product list
}
The following rules are derivations of the //+ rule.
Separate sections of namespace imports with a //+.
Commentary
As mentioned earlier, the //+ is the cornerstone of the CAF. This rule is simply another application of using //+ to decrease latency in code parsing.
Example
using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;
//+
using Nalarium;
using Nalarium.Activation;
//+
using Nalarium.Blog.Service;
//+
namespace Nalarium.Blog.Controls
{
}
Order the namespace import sections from the most core to those most specific to your project.
Commentary
The sections are ordered from the most core (System, Microsoft), through third-party entities and your library entities, to your project entities. The progression runs from the least to the most specific to your project. When a system is designed following proper architectural principles, this progression should be very clear; for example, no section should know about the sections below it.
Example
See example from previous rule
Separate the namespace import section from the first true file entity with a //+
Commentary
See commentary from first rule on namespace section separation.
Example
See example from first rule on namespace section separation.
Always make sure to only retain the namespace imports that your current file is using.
Commentary
By importing too many namespaces, you defeat their entire purpose. With each namespace that is added, the probability of a naming conflict increases. Avoiding conflicts is not the only purpose of namespaces, either. They are containers, and every imported namespace pours the contents of its container into the file, again defeating the purpose of having containers.
Furthermore, when a developer opens a file, he or she should be able to quickly ascertain the purpose of the file by looking at the namespaces. If the file uses System.Text and System.Text.RegularExpressions, it's clear that something directly involving text is in this file. If you see System.Web, then you know something directly involving web-related resources is in this file.
Even in Visual Studio, excess namespaces bloat IntelliSense, making it harder to find the specific type you are looking for. For this reason as well as the others, it's highly recommended that the Microsoft PowerCommands "Remove and Sort" command be used liberally to remove unnecessary namespaces.
Example
With the following namespaces imported into a particular file, it should be very clear to what this file relates:
using System;
using System.Reflection;
using System.Reflection.Emit;
Avoid importing namespaces that will be used fewer than three separate times, or for only a single type.
Commentary
The purpose of a namespace is to have a container for types. The purpose of importing a namespace is to pour the contents of that container into the current context. If you are only going to use a single type from a particular namespace, using the namespace import feature of C# and C++/CLI is counter to its original purpose.
Furthermore, explicitly writing out the namespace of an entity increases readability. Each time a namespace is imported, the readability of the code decreases, because a developer must either know exactly which type is being used or constantly correlate the type with the context and the imported namespaces. Visual Studio aids in this correlation, but designing for a particular IDE is a very poor and extremely dangerous practice.
Example
The following method instantiates the System.IO.MemoryStream class only once, so that type is fully qualified. Note that, since the file deals with XML, the System.Xml namespace has been imported. Further note that the Nalarium.IO namespace was not imported but was accessed through a relative namespace (i.e., IO.StreamConverter).
using System;
using System.Xml;
//+
namespace Nalarium.Xml
{
    public static class XmlFormatter
    {
        //- @Format -//
        public static String Format(String input)
        {
            System.IO.MemoryStream stream = new System.IO.MemoryStream();
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(input);
            XmlTextWriter writer = new XmlTextWriter(stream, null);
            writer.Formatting = Formatting.Indented;
            writer.IndentChar = ' ';
            writer.Indentation = 2;
            doc.Save(writer);
            //+
            String output = IO.StreamConverter.GetStreamText(stream);
            Int32 lastAngle = output.LastIndexOf(">");
            output = output.Substring(0, lastAngle + 1);
            //+
            return output;
        }
    }
}
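For a self-contained variant that does not depend on the Nalarium library, the sketch below qualifies System.Text.StringBuilder in place, because it is the only type the file needs from System.Text. The Sample namespace, the CsvJoiner type, and its Join method are hypothetical names chosen for illustration.

```csharp
using System;
//+
namespace Sample
{
    // Hypothetical type: System.Text is not imported because only one of its types is used.
    public static class CsvJoiner
    {
        //- @Join -//
        public static String Join(String[] fields)
        {
            System.Text.StringBuilder builder = new System.Text.StringBuilder();
            for (Int32 i = 0; i < fields.Length; i++)
            {
                if (i > 0)
                {
                    builder.Append(',');
                }
                builder.Append(fields[i]);
            }
            //+
            return builder.ToString();
        }
    }
}
```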
Import the System namespace even if it is only used once
Commentary
The System namespace is a special namespace containing the CLR system primitives. It is highly likely to be used repeatedly over the lifetime of a particular type. Moreover, given that CLR system primitives are very common, it's beneficial to avoid excess "System." prefixes on them.
Separate the field section, the property section, the constructor section, and the method section from one another with a //+, using a section name only for the field section.
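A minimal sketch of this rule, using hypothetical names (Sample, Greeter, Greet): String is the only System type the file touches, yet using System; is still imported so the primitive is not written as System.String.

```csharp
using System;
//+
namespace Sample
{
    // Hypothetical type: System is imported even though only String is used from it.
    public static class Greeter
    {
        //- @Greet -//
        public static String Greet(String name)
        {
            return "Hello, " + name;
        }
    }
}
```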
Commentary
This rule dramatically increases the ability of the eye to see separations in code, and it does so without #region blocks, whose collapsing behavior depends on an IDE.
Example
public class MyType
{
    private Type _type = typeof(MyType);
    //+ field
    private Int32 code;
    //+
    //- @Name -//
    public String Name { get; set; }
    //+
    //- @Ctor -//
    public MyType( ) { }
    public MyType(String name)
    {
        this.Name = name;
    }
    //+
    //- @DoStuff -//
    public void DoStuff( ) { }
    public void DoStuff(Int32 index) { }
    //- @DoOtherStuff -//
    public void DoOtherStuff( ) { }
}
Use a section name only for the field section.
Commentary
Since properties are typically nouns, constructors are always labeled "Ctor", and methods are typically verbs, section names add very little given the names of the individual entities. Therefore, a section name is used only for the instance field section. It's used there because fields do not carry the same name metadata as methods, constructors, and properties. The "field" label increases the ability of the eye to find the section.
Example
See example for previous rule
Place a //+ before the "return" keyword in all blocks containing more than one line of code (excluding the "return" line).
Commentary
The return section of a method is entirely separate from the rest of the method; it's almost a footer. Technically speaking, it's how the system knows to return to a previous point in the call stack. Given this importance, it should be separated out so the footer is easier to see.
Example
//- @IsNullOrEmpty -//
public static Boolean IsNullOrEmpty<T>(T[] array)
{
    if (array == null || array.Length == 0)
    {
        return true;
    }
    //+
    return false;
}
Example
//- #AccessType -//
protected AccessType AccessType
{
    get
    {
        if (!String.IsNullOrEmpty(this.Secure))
        {
            return AccessType.Secure;
        }
        if (!String.IsNullOrEmpty(this.Checked))
        {
            return AccessType.Checked;
        }
        if (this.Index > 0)
        {
            return AccessType.Index;
        }
        //+
        return AccessType.Default;
    }
}
Use //++ to prefix a very special note.
Commentary
In certain situations, it may be appropriate to pull a developer's attention to a particular, special area of code. For something to be special, it must be different. The //++ symbol acts as a strong pull to a simple statement. The keyword here is "simple". Writing more than a sentence gives weight to the possibility that refactoring may be necessary.
Example
//++ due to how IIS6 works, this is only compatible with IIS7 integrated mode
Use the following pattern to declare a multiline explanatory block:
//++
//+ information here
//++
Commentary
This format for an explanatory block allows the developer to quickly see the note. Keep in mind that explanatory blocks in code should be either temporary or reserved for extremely complex areas of code that have already been refactored. Comments are never to be used as cop-outs for not refactoring because of improperly set deadlines.
Note also that explanatory blocks should be entirely in lowercase to avoid the latency of switching between the uppercase and lowercase sets.
For C#, one multiline explanatory block per 5,000 lines of code is a good ratio to aim for. A higher ratio probably indicates that more refactoring is needed. This is because C# supports XML documentation, a feature designed specifically for this purpose, and it is a compiler feature, not an IDE feature. For C++ and IL, explanatory comments may be used in places where XML documentation would normally appear in C#.
Example
//++
//+ this is a hack to be replaced by a future version; however,
//+ for the time being, this autodetects the protocol in place.
//++
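For contrast, the sketch below moves a comparable explanation into XML documentation, which the C# compiler can extract into an XML file (via its /doc option) instead of leaving it in the method body. ProtocolDetector and Detect are hypothetical names loosely echoing the protocol-autodetection note above; the header still sits above the XML documentation, as the CAF requires.

```csharp
using System;
//+
// Hypothetical type: the explanation lives in XML documentation, not in the method body.
public static class ProtocolDetector
{
    //- @Detect -//
    /// <summary>
    /// Returns "https" when the supplied URL uses a secure scheme; otherwise returns "http".
    /// </summary>
    /// <param name="url">The absolute URL to inspect.</param>
    public static String Detect(String url)
    {
        if (url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return "https";
        }
        //+
        return "http";
    }
}
```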
Use the following pattern to declare a TODO code section:
//++
//TODO: Note
//++
Commentary
This pattern provides a very obvious visual cue that action must be taken, in every IDE and in a basic text editor alike. Visual Studio will also list the line in its Task List, since //TODO: is a comment token Visual Studio recognizes by default.
Note that the text should follow sentence casing. Given that the word TODO is required to be in all caps, the following letter should also be a capital letter to lower latency. A lower case letter following the final "O" provides too much of an abrupt switch to the note.
Keep the note short where possible. However, since these statements are temporary by nature, they may be as long as required to explain the situation; there are no strict length requirements.
Example
//++
//TODO: Implement validation for access code
//++
