Once you get used to working with gofmt on a daily basis, you want to have it everywhere. I have been writing a good amount of assembler for Go, and I was getting annoyed by not having similar functionality for assembler.
This led to having to manually align and re-align comments. I also often encountered a mixture of tabs and spaces, sometimes at random.
The main goals were:
- It should provide predictable formatting.
- It should be as unintrusive as possible.
- Tab settings should not affect alignment.
Here is the package, along with installation instructions: https://github.com/klauspost/asmfmt.
To help integration, besides a standalone executable, I have added it to forks of the original goreturns. This means that when processing a package, it will also format assembler files. Hopefully that will make integration easier for you.
Before digging deeper, let’s look at a short example from the standard library. Here is the original code:
    // func castagnoliSSE42(crc uint32, p byte) uint32
    TEXT ·castagnoliSSE42(SB),NOSPLIT,$0
        MOVL crc+0(FP), AX // CRC value
        MOVQ p+8(FP), SI // data pointer
        MOVQ p_len+16(FP), CX; // len(p)
        NOTL AX
        /* If there's less than 8 bytes to process, we do it byte-by-byte. */
        CMPQ CX, $8
        JL cleanup
        // Process individual bytes until the input is 8-byte aligned.
    startup:
This is the output after running asmfmt:
    // func castagnoliSSE42(crc uint32, p byte) uint32
    TEXT ·castagnoliSSE42(SB), NOSPLIT, $0
        MOVL crc+0(FP), AX    // CRC value
        MOVQ p+8(FP), SI      // data pointer
        MOVQ p_len+16(FP), CX // len(p)
        NOTL AX
        // If there's less than 8 bytes to process, we do it byte-by-byte.
        CMPQ CX, $8
        JL   cleanup
        // Process individual bytes until the input is 8-byte aligned.
    startup:
The changes are fairly small, but in this example we can observe the following:
- Parameters separated by a “,” always have a space after the comma.
- Comments are aligned.
- Single-line block comments are converted to line comments.
- The first parameter of every block is aligned.
- Unneeded semicolons at the EOL are removed.
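Two of the rules in the list above, the space after each comma and the removal of a trailing semicolon, are simple enough to sketch in a few lines of Go. This is only an illustration and not the code asmfmt actually uses: `formatOperands` and `trimTrailingSemicolon` are hypothetical helpers, and the sketch assumes there are no commas inside quoted strings and that the instruction is not part of a macro line continuation, where semicolons must stay.

```go
package main

import (
	"fmt"
	"strings"
)

// formatOperands (hypothetical helper) splits an operand list on commas,
// trims every operand and joins them again with ", ".
// Assumption: no commas inside quoted strings.
func formatOperands(ops string) string {
	parts := strings.Split(ops, ",")
	for i, p := range parts {
		parts[i] = strings.TrimSpace(p)
	}
	return strings.Join(parts, ", ")
}

// trimTrailingSemicolon (hypothetical helper) drops one semicolon at the
// end of an instruction, along with any whitespace around it.
func trimTrailingSemicolon(ins string) string {
	ins = strings.TrimSpace(ins)
	return strings.TrimSpace(strings.TrimSuffix(ins, ";"))
}

func main() {
	fmt.Println(formatOperands("·castagnoliSSE42(SB),NOSPLIT,$0"))
	fmt.Println(trimTrailingSemicolon("MOVQ p_len+16(FP), CX;"))
}
```

A real formatter has to do this while keeping track of comments and macros, which is where most of the actual logic goes.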
Indentation is calculated automatically. There is a lot of logic behind it; the most important rules are listed here.
The main challenge is that assembler files don’t have the same structure as a Go file. Therefore the formatting is done on a per-line basis, but with a state kept from the previous lines.
The main thing that can yield unexpected results is comment indentation. The formatter attempts to guess whether you are at the start of a new section by looking at the previous instruction. If it is RET or JMP, it assumes that you are about to start a new section and places the comment at level 0.
This is the only unexpected situation that can arise from processing on a line-by-line basis, and in practice I haven’t seen many problems arise from it.
    // func somefunc(a byte) int32
    TEXT ·somefunc(SB), 7, $0
        // Indentation increased to 1
        MOVQ a+0(FP), R10
        // Still level 1 because last instruction was not a terminator.
        RET // RET is a terminator
    // This comment is at level 0, because it is following a terminator
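The terminator rule above can be sketched as a tiny state machine that looks at one line at a time and remembers whether the last instruction was RET or JMP. This is a simplified illustration, not asmfmt's implementation: the `indentState` type and its `level` method are hypothetical names, and the sketch assumes that only TEXT lines open a section and ignores labels, macros and block comments.

```go
package main

import (
	"fmt"
	"strings"
)

// indentState (hypothetical) is the state carried from line to line.
type indentState struct {
	inText     bool // a TEXT line has been seen
	terminated bool // the last instruction was RET or JMP
}

// isTerminator reports whether the instruction mnemonic is RET or JMP.
func isTerminator(ins string) bool {
	fields := strings.Fields(ins)
	if len(fields) == 0 {
		return false
	}
	return fields[0] == "RET" || fields[0] == "JMP"
}

// level returns the indentation level for line and updates the state.
func (s *indentState) level(line string) int {
	t := strings.TrimSpace(line)
	switch {
	case t == "":
		return 0 // blank line, state unchanged
	case strings.HasPrefix(t, "//"):
		// A comment is indented unless it follows a terminator.
		if s.inText && !s.terminated {
			return 1
		}
		return 0
	case strings.HasPrefix(t, "TEXT"):
		s.inText, s.terminated = true, false
		return 0
	default:
		// Any other line is treated as an instruction.
		s.terminated = isTerminator(t)
		return 1
	}
}

func main() {
	src := []string{
		"// func somefunc(a byte) int32",
		"TEXT ·somefunc(SB), 7, $0",
		"// Indentation increased to 1",
		"MOVQ a+0(FP), R10",
		"// Still level 1 because last instruction was not a terminator.",
		"RET // RET is a terminator",
		"// This comment is at level 0, because it is following a terminator",
	}
	var s indentState
	for _, l := range src {
		fmt.Printf("%d  %s\n", s.level(l), l)
	}
}
```

Because the decision only depends on the previous instruction, a comment that follows a JMP in the middle of a function also ends up at level 0, which is the unexpected case described above.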
Line continuation is another issue. Even though it isn’t used that often, I thought it was important to make it look nice, since keeping continuation lines aligned by hand is an annoying task. Here is a short example:
    #define MOD_SUB2(x, y, p, LABEL) \
        XORQ R10,R10; \
        SUBQ y, x; \
        CMOVQCS p, R9; \
        CMOVQCC R10, R9; \
        ADDQ R9, x; \
And after running asmfmt:
    #define MOD_SUB2(x, y, p, LABEL) \
        XORQ    R10, R10; \
        SUBQ    y, x;     \
        CMOVQCS p, R9;    \
        CMOVQCC R10, R9;  \
        ADDQ    R9, x;    \
The #define line and the instructions are seen as two separate blocks since the indentation changes, so the line continuation backslash on the first line is aligned differently from the rest. An indentation change will always change alignment, so we can remain tab-size independent.
Here semi-colons are kept, since they have a meaning.
Comments received some special love, and I attempted to make comment placement and indentation as seamless as possible.
The most intrusive change is that single-line block comments /* comment */ are converted to line comments // comment if it is deemed safe to do so. Multi-line block comments are retained.
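As a rough sketch of the conversion described above, the following Go function rewrites a block comment only when it occupies the whole line. The notion of "safe" used here is my own assumption for the sake of the example (one comment, nothing after it, no nested comment markers); asmfmt's actual checks may differ, and `convertBlockComment` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// convertBlockComment (hypothetical helper) turns a line consisting only
// of a single-line block comment into a line comment and keeps the
// leading indentation. Any other line is returned unchanged.
func convertBlockComment(line string) string {
	t := strings.TrimSpace(line)
	if len(t) < 4 || !strings.HasPrefix(t, "/*") || !strings.HasSuffix(t, "*/") {
		return line
	}
	body := t[2 : len(t)-2]
	// Assumed safety check: leave the line alone if the body contains
	// another comment marker, e.g. two block comments on one line.
	if strings.Contains(body, "/*") || strings.Contains(body, "*/") {
		return line
	}
	indent := line[:len(line)-len(strings.TrimLeft(line, " \t"))]
	return indent + "// " + strings.TrimSpace(body)
}

func main() {
	fmt.Println(convertBlockComment("\t/* If there's less than 8 bytes to process, we do it byte-by-byte. */"))
	fmt.Println(convertBlockComment("\t/* keep */ MOVQ AX, BX"))
}
```

The second call leaves the line untouched, since turning the comment into a line comment would also comment out the instruction that follows it.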
In my experience, avoiding unnecessary block comments is a plus. If you want to quickly disable a block of code, you can easily do so by wrapping it in a block comment. If there is already a block comment within that code, that doesn’t work.
For block comments there is also special indentation for lines starting with an asterisk. This is to ensure that blocks like this still look good:
    /*
     * whole thing backwards has
     * adjusted addresses
     */
Ordinary line comments have a space after the //. The only exceptions are if the first character is “+” or “/”. The first is for //+build !noasm. The second is for line separators like ////////////////// which I encountered in a few places.
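The spacing rule for line comments can be sketched like this. It is a minimal illustration with a hypothetical `formatLineComment` helper; I am assuming that a comment which already has whitespace after the // is left as it is, which the rule above does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLineComment (hypothetical helper) makes sure there is a space
// after "//", except when the comment text starts with '+' (build tags)
// or '/' (separator lines).
func formatLineComment(c string) string {
	if !strings.HasPrefix(c, "//") {
		return c // not a line comment
	}
	body := c[2:]
	if body == "" {
		return c // empty comment
	}
	switch body[0] {
	case '+', '/':
		return c // the two exceptions
	case ' ', '\t':
		return c // assumption: existing whitespace is kept
	}
	return "// " + body
}

func main() {
	for _, c := range []string{
		"//CRC value",
		"//+build !noasm",
		"//////////////////",
		"// data pointer",
	} {
		fmt.Println(formatLineComment(c))
	}
}
```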
Spaces or Tabs for alignment
The only real point of discussion was whether tabs should be used exclusively for indentation. Tabs have the advantage that they don’t require a fixed-width font for alignment to work.
However, tabs are also unreliable if we want consistent formatting. First of all, we don’t want to force people to use a fixed tab size. Second, in variable-width fonts we still cannot use tabs reliably for comment alignment.
Therefore for the leftmost indentation we use a tab, and for comment and line continuation we use spaces. This means that the leftmost indentation will be the width specified by the tab size. This also has the side-effect that we do not attempt to align non-indented and indented lines.
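To make the tradeoff concrete, here is a small sketch of how one block of instructions could be laid out: a single tab for the leftmost indentation, then spaces to pad every instruction to the width of the longest one so that the comments line up. The `asmLine` type and `alignComments` function are hypothetical, and the sketch assumes all lines in the block share the same indentation level.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// asmLine (hypothetical) is one instruction with an optional comment.
type asmLine struct {
	ins     string
	comment string
}

// alignComments indents every line with one tab and pads with spaces so
// that all comments in the block start in the same column.
func alignComments(block []asmLine) []string {
	width := 0
	for _, l := range block {
		if n := utf8.RuneCountInString(l.ins); n > width {
			width = n
		}
	}
	out := make([]string, len(block))
	for i, l := range block {
		if l.comment == "" {
			out[i] = "\t" + l.ins
			continue
		}
		pad := strings.Repeat(" ", width-utf8.RuneCountInString(l.ins))
		out[i] = "\t" + l.ins + pad + " " + l.comment
	}
	return out
}

func main() {
	block := []asmLine{
		{"MOVL crc+0(FP), AX", "// CRC value"},
		{"MOVQ p+8(FP), SI", "// data pointer"},
		{"MOVQ p_len+16(FP), CX", "// len(p)"},
	}
	for _, l := range alignComments(block) {
		fmt.Println(l)
	}
}
```

Since every line in the block starts with the same single tab, the comments stay aligned in a fixed-width font no matter which tab size the editor uses. Lines at a different indentation level start a new block and are not aligned with this one.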
I think this tradeoff makes sense, and exposing tabs as an option will just make formatting inconsistent, which is one of the things we are trying to avoid.
I do not expect any major changes, but there might still be some unintentional corner cases that need working out.
Feedback is of course still very welcome, but I will be very skeptical about changes unless they have a very clear upside.