Code Generation with Roslyn - Comparison of SyntaxFactory and CSharpSyntaxTree.ParseText

At Code Connect, we need to generate C# code that runs our users’ unit tests. We devised an internal framework that discovers the relevant methods (e.g. test methods, test initialize and test cleanup) and builds C# code that invokes them in the order in which the testing framework would invoke them.

We’ve been using Microsoft.CodeAnalysis.SyntaxFactory for all of our code generation needs, but it requires very lengthy code. We decided to test its performance against CSharpSyntaxTree.ParseText.

The task in our benchmark is to generate code that consists of a class and a number of methods. To test performance with different sizes of syntax trees, we’ll measure the time it takes to build between 1 and 4096 methods in a class. Furthermore, we’ll measure both the case where the generated methods are empty and the case where they contain 4 syntactically rich lines of C# code.

Here’s an example of the code we might generate. It consists of a single class containing a Run method and a specified number of methods with random identifiers:

    public class DemoClass : IDemo
    {
        public void Run()
        {
            method123();
        }

        private void method123()
        {
            var myNumber = Int32.Parse("method123".Substring(6));
            Console.WriteLine("method123 says hi to " + (myNumber + 1));
            System.Diagnostics.Debug.WriteLine("Method {0} executed.", new[] { myNumber });
            var test = myNumber * 2;
        }
    }
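To give a sense of why the SyntaxFactory version gets long, here is a minimal sketch that builds just one empty method, node by node. The class name EmptyMethodSketch and the helper BuildEmptyMethod are hypothetical names used for illustration; they are not taken from our generator. With ParseText, the equivalent input would simply be the method's source text.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical illustration, not the benchmarked generator.
public static class EmptyMethodSketch
{
    // Builds the syntax node for: private void <name>() { }
    public static MethodDeclarationSyntax BuildEmptyMethod(string name)
    {
        return SyntaxFactory.MethodDeclaration(
                SyntaxFactory.PredefinedType(SyntaxFactory.Token(SyntaxKind.VoidKeyword)),
                SyntaxFactory.Identifier(name))
            .WithModifiers(SyntaxFactory.TokenList(SyntaxFactory.Token(SyntaxKind.PrivateKeyword)))
            .WithBody(SyntaxFactory.Block())
            // Nodes built this way carry no whitespace; this adds default formatting.
            .NormalizeWhitespace();
    }

    public static void Main()
    {
        // Prints the source text of the generated method.
        Console.WriteLine(BuildEmptyMethod("method123").ToFullString());
    }
}
```

Every keyword, identifier and block is a separate factory call, which is why the line count grows quickly once the method bodies are no longer empty.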

Performance analysis

We will compare how long it takes each approach to generate the desired code. The number of methods is the independent variable. We have tested two approaches to generating syntax, each generating either empty or relatively complex methods. The lower the measurement, the better the approach.

The reported time is the number of milliseconds needed to generate a type, averaged over 100 samples. Each syntax tree was built in a separate process. Standard deviation error bars were too small to be visible on the plot. All data is available here.

Graph: Code generation time [ms] vs. complexity. Lower is better.

When generating a modest number of methods, ParseText is noticeably faster, whether the methods have bodies or are empty. Let’s see the performance when generating more methods:

Graph: Code generation time [ms] vs. complexity. Lower is better.

The performance of SyntaxFactory is very sensitive to both the amount and the complexity of the generated code, which might be correlated with its high number of object allocations. Depending on the complexity of the generated code, SyntaxFactory might be faster than ParseText.

How many objects were actually allocated? Roughly twice as many in the SyntaxFactory approach. However, SyntaxFactory kept the heap size lower:

Graph: SyntaxFactory generating 4096 empty methods used 83k allocations.

Graph: ParseText generating 4096 empty methods used 48k allocations.

Graph: SyntaxFactory generating 4096 complex methods used 470k allocations.

Graph: ParseText generating 4096 complex methods used 185k allocations.

At Code Connect, we rarely need to create a type with a hundred methods. The most common scenario is creating a type with only one method. In either case, ParseText is the faster and lighter choice.

Developer experience

The most noticeable difference is that SyntaxFactoryCodeGenerator.cs has 581 source lines of code, while ParseTextCodeGenerator.cs has 86. However, most of the code for the SyntaxFactory experiment was generated by the Roslyn Quoter tool. Most of my changes involved extracting chunks of code into functions that use yield return. When in doubt, I used the Roslyn Syntax Visualizer to see the structure of the C# syntax trees.

How does the number of lines translate to the number of errors? Is it proportional in the same way for the two approaches? It’s hard to tell: C# grammar is for the most part unambiguous, so ParseText usually outputs the tree that you wanted, without you needing to know the detailed structure of the tree. In some situations “usually” is not acceptable, and this method can’t be used.

SyntaxFactory makes extensive use of the type system, and the best thing about the SyntaxFactory approach is that it builds exactly what you tell it to build. The worst thing about the SyntaxFactory approach is that it builds exactly what you tell it to build. Some rules are very stringent and there is no way to bend them. SyntaxFactory builds exactly the tree that you, the programmer, literally asked for. Watch out when you’re manually editing generated code!

Consider a situation where you need to build the method invocation expression Method("one", 2, -3d);. Roslyn Quoter provides the following syntax for the contents of the ArgumentList:

    SyntaxFactory.SeparatedList<ArgumentSyntax>(
        new SyntaxNodeOrToken[]{
            // First argument: the string literal "one"
            SyntaxFactory.Argument(
                SyntaxFactory.LiteralExpression(
                    SyntaxKind.StringLiteralExpression,
                    SyntaxFactory.Literal(
                        SyntaxFactory.TriviaList(),
                        @"""one""",
                        @"""one""",
                        SyntaxFactory.TriviaList()))),
            SyntaxFactory.Token(
                SyntaxKind.CommaToken),
            // Second argument: the numeric literal 2
            SyntaxFactory.Argument(
                SyntaxFactory.LiteralExpression(
                    SyntaxKind.NumericLiteralExpression,
                    SyntaxFactory.Literal(
                        SyntaxFactory.TriviaList(),
                        @"2",
                        2,
                        SyntaxFactory.TriviaList()))),
            SyntaxFactory.Token(
                SyntaxKind.CommaToken),
            // Third argument: a unary minus applied to the numeric literal 3d
            SyntaxFactory.Argument(
                SyntaxFactory.PrefixUnaryExpression(
                    SyntaxKind.UnaryMinusExpression,
                    SyntaxFactory.LiteralExpression(
                        SyntaxKind.NumericLiteralExpression,
                        SyntaxFactory.Literal(
                            SyntaxFactory.TriviaList(),
                            @"3d",
                            3d,
                            SyntaxFactory.TriviaList())))
                .WithOperatorToken(
                    SyntaxFactory.Token(
                        SyntaxKind.MinusToken)))});

Each of the arguments is a different kind of ExpressionSyntax. The first two are LiteralExpressionSyntax nodes of kind StringLiteralExpression and NumericLiteralExpression, respectively. The last one is a PrefixUnaryExpressionSyntax which wraps a NumericLiteralExpression.

Suppose that the method arguments are supplied by the user as a string "\"one\", 2, -3d". After splitting this string into three substrings, you must choose the appropriate expression for each one of them.

If you need a NumericLiteralExpression, you must pass a numeric value to SyntaxFactory.Literal as the third parameter. Setting SyntaxKind.NumericLiteralExpression doesn’t by itself mean that you will obtain a NumericLiteralExpression.
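The effect of the value's type can be seen on the tokens themselves. The following is a minimal sketch (the class name LiteralKindDemo is hypothetical) that uses the single-argument overloads of SyntaxFactory.Literal and prints the kind of token each one creates:

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical illustration of how the CLR type of the value selects the token kind.
public static class LiteralKindDemo
{
    public static void Main()
    {
        // A string value yields a string literal token, even if it looks like a number.
        SyntaxToken fromString = SyntaxFactory.Literal("2");

        // An int value yields a numeric literal token.
        SyntaxToken fromInt = SyntaxFactory.Literal(2);

        Console.WriteLine(fromString.Kind()); // StringLiteralToken
        Console.WriteLine(fromInt.Kind());    // NumericLiteralToken
    }
}
```

So a substring taken from user input has to be converted to a number before it is handed to SyntaxFactory.Literal; otherwise the token is a string literal regardless of the SyntaxKind passed to LiteralExpression.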

If you use the wrong expression, the syntax tree won’t compile, but it will appear OK! Read more about invalid syntax trees on Josh’s blog, and add a test that tries to compile the syntax tree.
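One way such a test could look is sketched below. This is an assumption about how you might set it up, not the check used in our generator: the class CompileCheck and the method Compiles are hypothetical names, and the only assembly referenced is the one that defines System.Object, so generated code that uses other assemblies would need additional references.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper: reports whether a syntax tree compiles without errors.
public static class CompileCheck
{
    public static bool Compiles(SyntaxTree tree)
    {
        // Assumption: the core library alone is enough for the code under test.
        var references = new[]
        {
            MetadataReference.CreateFromFile(typeof(object).Assembly.Location)
        };

        var compilation = CSharpCompilation.Create(
            "GeneratedCodeCheck",
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Any error-level diagnostic means the tree does not compile.
        return !compilation.GetDiagnostics()
            .Any(d => d.Severity == DiagnosticSeverity.Error);
    }

    public static void Main()
    {
        var valid = CSharpSyntaxTree.ParseText("class C { void M() { int x = 1; } }");
        var invalid = CSharpSyntaxTree.ParseText("class C { void M() { int x = \"a\"; } }");
        Console.WriteLine(Compiles(valid));
        Console.WriteLine(Compiles(invalid));
    }
}
```

A root node built with SyntaxFactory can be wrapped with CSharpSyntaxTree.Create before being passed to a helper like this.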

In conclusion, if the parameter is a string, then you need to parse it, validate it and convert it yourself. Only after parsing it can you use the SyntaxFactory method.
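As an alternative to hand-written parsing, and not what the SyntaxFactory generator in this post does, the parser can be asked to classify the arguments through SyntaxFactory.ParseArgumentList. The sketch below uses a hypothetical class name and a hard-coded string standing in for the user's input:

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical illustration: let the parser choose the expression kinds.
public static class ArgumentParsingSketch
{
    public static void Main()
    {
        // Stand-in for user input. It is wrapped in parentheses because
        // ParseArgumentList parses a parenthesized argument list.
        string userInput = "\"one\", 2, -3d";
        ArgumentListSyntax list = SyntaxFactory.ParseArgumentList("(" + userInput + ")");

        foreach (ArgumentSyntax argument in list.Arguments)
        {
            // Prints each argument's text and the kind the parser assigned to it.
            Console.WriteLine(argument.Expression + " : " + argument.Expression.Kind());
        }
    }
}
```

For this input the parser should report a string literal, a numeric literal and a unary minus expression, matching the structure of the Quoter output above. The trade-off is the one described earlier: you get what the parser decides, not necessarily what you intended, so invalid input still needs to be detected separately.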

Which method to use?

Conclusion

Initially I dismissed the ParseText approach, thinking that SyntaxFactory would be the fastest because it is the closest to the metal. However, SyntaxFactory is slower than ParseText and allocates roughly twice as many objects as ParseText.

There are many surprising and clever performance optimizations in Roslyn’s source code: read Robin Sedlaczek’s excellent analysis of Roslyn’s performance and if you have an hour, watch Dustin Campbell’s presentation on performance in a large codebase.

The source code

The source code of the generator is available on GitHub.

You can follow the call tree, which is identical in both ParseTextCodeGenerator and SyntaxFactoryCodeGenerator. The entry point is the method GenerateType, which calls getAllMembers. This method calls both getRunMethod and getMethods. The former calls getMethodInvocations and the latter calls getMethodBody. There is an extra step in ParseTextCodeGenerator.GenerateType: you must get the root of the tree and descend through the child nodes to find the desired TypeDeclarationSyntax.
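That extra step can be sketched as follows. This is a simplified illustration under the assumption that the parsed source contains the type you want as its first type declaration; the names ParseTextSketch and ParseType are hypothetical and are not the names used in the repository.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical illustration of finding the type inside a parsed tree.
public static class ParseTextSketch
{
    public static TypeDeclarationSyntax ParseType(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // ParseText returns a whole tree rooted at a compilation unit,
        // so walk down to the first type declaration.
        return tree.GetRoot()
            .DescendantNodes()
            .OfType<TypeDeclarationSyntax>()
            .First();
    }

    public static void Main()
    {
        var type = ParseType("public class DemoClass : IDemo { public void Run() { } }");
        Console.WriteLine(type.Identifier.Text); // DemoClass
    }
}
```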

ParseText code generation: Screenshot of code generated with Code Connect

SyntaxFactory code generation: Screenshot of code generated with Code Connect
