Unified Parser

Category: Parser (takes in source code and outputs cc.json)

The Unified Parser generates code metrics from a single source code file or a project folder without relying on any tools other than CodeCharta. It writes a cc.json file, which is compressed by default or uncompressed when -nc is passed.

| Language | Supported file extensions |
| --- | --- |
| JavaScript | .js, .cjs, .mjs, .jsx |
| TypeScript | .ts, .cts, .mts |
| TSX | .tsx |
| Java | .java |
| Kotlin | .kt |
| C# | .cs |
| C++ | .cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx |
| C | .c, .h |
| Objective-C | .m |
| Python | .py |
| Go | .go |
| PHP | .php |
| Ruby | .rb |
| Swift | .swift |
| Bash | .sh |
| Vue | .vue |
| Delphi | .pas, .dpr |

| Metric | Description |
| --- | --- |
| complexity | Complexity of the file based on the number of paths through the code. Also includes complexity introduced by the definition of functions, classes, etc. (represents the 'cognitive load' needed to get an overview of the whole file) |
| logic_complexity | Complexity of the file based on the number of paths through the code, similar to cyclomatic complexity (only counts complexity in code, not complexity introduced by the definition of functions, classes, etc.) |
| comment_lines | The number of comment lines found in a file |
| number_of_functions | The number of functions and methods in a file |
| loc (Lines of Code) | Lines of code including empty lines and comments |
| rloc (Real lines of code) | Number of lines that contain at least one character that is neither whitespace nor part of a comment |
| long_method | Code smell showing the number of functions with more than 10 real lines of code (rloc) |
| long_parameter_list | Code smell showing the number of functions with more than 4 parameters |
| excessive_comments | Code smell showing whether a file has more than 10 comment lines |
| comment_ratio | The ratio of comment lines to real lines of code (rloc) |
| message_chains | Code smell showing occurrences of method call chains with 4 or more consecutive calls, suggesting tight coupling |

Some metrics are calculated per function rather than per file. For each file, each of these metrics is reported as max, min, mean and median values, and the metric names are prefixed accordingly (“max_”, “min_”, …).

| Metric per function | Description |
| --- | --- |
| parameters_per_function | The number of parameters for each function |
| complexity_per_function | The complexity inside the body of a function |
| rloc_per_function | The real lines of code inside the body of a function |
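
The per-file aggregation of a per-function metric can be sketched roughly as follows. This is an illustration only, not the parser's implementation: the helper name `aggregate_per_function` is hypothetical, the `mean_` and `median_` prefixes are assumed by analogy with the documented `max_` and `min_` prefixes, and reporting 0 for a file without functions is likewise an assumption.

```python
from statistics import mean, median


def aggregate_per_function(metric_name, values):
    """Summarize one per-function metric for a single file (hypothetical helper)."""
    prefixes = ("max", "min", "mean", "median")
    if not values:
        # Assumption: a file without functions reports 0 for every aggregate.
        return {f"{prefix}_{metric_name}": 0 for prefix in prefixes}
    return {
        f"max_{metric_name}": max(values),
        f"min_{metric_name}": min(values),
        # The "mean_" and "median_" prefixes are assumed, not confirmed by the docs.
        f"mean_{metric_name}": mean(values),
        f"median_{metric_name}": median(values),
    }


# Example: real lines of code of four functions found in one file.
summary = aggregate_per_function("rloc_per_function", [3, 12, 7, 20])
print(summary)
```

The same helper would apply unchanged to parameters_per_function and complexity_per_function, since only the list of per-function values differs.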

| Parameter | Description |
| --- | --- |
| FOLDER or FILE | The project folder or code file to parse. To merge the result with an existing project piped into STDIN, pass a '-' as an additional argument |
| -bf, --base-file=<baseFile> | base cc.json file with checksums to skip unchanged files during analysis |
| --bypass-gitignore | disable automatic .gitignore-based file exclusion (uses regex-based exclusion of common build folders) |
| -e, --exclude=<exclude> | comma-separated list of regex patterns to exclude files/folders (applied in addition to .gitignore patterns) |
| -fe, --file-extensions=<fileExtensions> | comma-separated list of file extensions; only files with these extensions are parsed (default: any) |
| --commit=<ref> | analyze the codebase at a specific git commit, tag, branch, or date expression (creates a temporary worktree). Cannot be combined with --local-changes. See Commit-Based Analysis |
| -h, --help | displays this help and exits |
| -ibf, --include-build-folders | include build folders (out, build, dist and target) and common resource folders (e.g. resources, node_modules or files/folders starting with '.') |
| --local-changes | only analyze files that differ from the remote tracking branch (uncommitted, staged, unstaged, untracked). See Local Changes |
| -nc, --not-compressed | save uncompressed output file |
| -o, --output-file=<outputFile> | output file (or empty for stdout) |
| --verbose | displays messages about parsed and ignored files |

Usage: ccsh unifiedparser [-h] [--bypass-gitignore] [--commit=<ref>] [-ibf]
                          [--local-changes] [-nc] [--verbose]
                          [-bf=<baseFile>] [-o=<outputFile>]
                          [-e=<specifiedExcludePatterns>]...
                          [-fe=<fileExtensionsToAnalyse>]... FILE or FOLDER...

The Unified Parser can analyze either a single file or a project folder; here are some sample commands:

ccsh unifiedparser src/test/resources -o foo.cc.json
ccsh unifiedparser src/test/resources/foo.ts -o foo.cc.json
ccsh unifiedparser src/test/resources -o foo.cc.json -nc --verbose
ccsh unifiedparser src/test/resources -o foo.cc.json --include-build-folders -e=something -e=/.*\.foo
ccsh unifiedparser src/test/resources -o foo.cc.json --bypass-gitignore

If a project is piped into the Unified Parser, the parser's results and the piped project are merged. The resulting project takes the project name specified for the Unified Parser.

cat pipeInput.cc.json | ccsh unifiedparser src/test/resources - -o merged.cc.json

Known issues

  • In Ruby, the 'lambda' keyword is not counted correctly for complexity and number of functions
  • In C, C++ and Objective-C, using void as a parameter counts as 1 for parameters per function

This section describes what is counted for each metric per language. The parser uses Tree-sitter to parse source code and identifies specific AST node types for each metric.

Complexity is calculated using McCabe Complexity, counting the number of paths through the code. Each language has specific constructs that contribute to complexity:

JavaScript (.js, .cjs, .mjs, .jsx)
  • Control flow: if_statement, do_statement, for_statement, while_statement, for_in_statement, ternary_expression, switch_case, switch_default, catch_clause
  • Functions: function_declaration, generator_function_declaration, arrow_function, generator_function, method_definition, class_static_block, function_expression
  • Logical operators: &&, ||, ?? in binary expressions

TypeScript (.ts, .cts, .mts)
  • Control flow: if_statement, do_statement, for_statement, while_statement, for_in_statement, ternary_expression, conditional_type, switch_case, switch_default, catch_clause
  • Functions: function_declaration, generator_function_declaration, arrow_function, generator_function, method_definition, class_static_block, function_expression
  • Logical operators: &&, ||, ?? in binary expressions

TSX (.tsx)
  • Control flow: if_statement, do_statement, for_statement, while_statement, for_in_statement, ternary_expression, conditional_type, switch_case, switch_default, catch_clause
  • Functions: function_declaration, generator_function_declaration, arrow_function, generator_function, method_definition, class_static_block, function_expression
  • Logical operators: &&, ||, ?? in binary expressions

Java (.java)
  • Control flow: if_statement, do_statement, for_statement, while_statement, enhanced_for_statement, ternary_expression, switch_label, catch_clause
  • Functions: constructor_declaration, method_declaration, lambda_expression, static_initializer, compact_constructor_declaration
  • Logical operators: &&, || in binary expressions

Kotlin (.kt)
  • Control flow: if_expression, for_statement, while_statement, do_while_statement, elvis_expression, conjunction_expression, disjunction_expression, when_entry, catch_block
  • Functions: function_declaration, anonymous_function, anonymous_initializer, lambda_literal, secondary_constructor, setter, getter

C# (.cs)
  • Control flow: if_statement, do_statement, foreach_statement, for_statement, while_statement, conditional_expression, is_expression, and_pattern, or_pattern, switch_section, switch_expression_arm, catch_clause
  • Functions: constructor_declaration, method_declaration, lambda_expression, local_function_statement, accessor_declaration
  • Logical operators: &&, ||, ?? in binary expressions

C++ (.cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx)
  • Control flow: if_statement, do_statement, for_statement, while_statement, for_range_loop, conditional_expression, case_statement, catch_clause, seh_except_clause
  • Functions: lambda_expression, function_definition, abstract_function_declarator, function_declarator
  • Logical operators: &&, ||, and, or, xor in binary expressions

C (.c, .h)
  • Control flow: if_statement, do_statement, for_statement, while_statement, conditional_expression, case_statement, seh_except_clause
  • Functions: function_definition, abstract_function_declarator, function_declarator
  • Logical operators: &&, || in binary expressions

Objective-C (.m)
  • Control flow: if_statement, do_statement, for_statement, while_statement, conditional_expression, case_statement, @catch
  • Functions: function_definition, block_expression
  • Logical operators: &&, || in binary expressions

Python (.py)
  • Control flow: if_statement, elif_clause, if_clause, for_statement, while_statement, for_in_clause, conditional_expression, list, boolean_operator, case_pattern, except_clause
  • Functions: function_definition, lambda expressions (with specific nesting rules)

Go (.go)
  • Control flow: if_statement, for_statement, communication_case, expression_case, type_case, default_case
  • Functions: method_declaration, func_literal, function_declaration, method_spec
  • Logical operators: &&, || in binary expressions

PHP (.php)
  • Control flow: if_statement, else_if_clause, do_statement, for_statement, while_statement, foreach_statement, conditional_expression, case_statement, default_statement, match_conditional_expression, match_default_expression, catch_clause
  • Functions: method_declaration, lambda_expression, arrow_function, anonymous_function, function_definition, function_static_declaration
  • Logical operators: &&, ||, ??, and, or, xor in binary expressions

Ruby (.rb)
  • Control flow: if, elsif, for, until, while, do_block, when, else, rescue
  • Functions: lambda, method, singleton_method
  • Logical operators: &&, ||, and, or in binary expressions

Bash (.sh)
  • Control flow: if_statement, elif_clause, for_statement, while_statement, c_style_for_statement, ternary_expression, list, case_item
  • Functions: function_definition
  • Logical operators: &&, || in binary expressions

Swift (.swift)
  • Control flow: if_statement, guard_statement, for_statement, while_statement, repeat_while_statement, switch_entry, catch_block, defer_statement, nil_coalescing_expression, ternary_expression, willset_clause, didset_clause
  • Functions: function_declaration, init_declaration, deinit_declaration, lambda_literal, subscript_declaration, computed_getter, computed_setter
  • Logical operators: conjunction_expression, disjunction_expression

Delphi (.pas, .dpr)
  • Control flow: if, ifElse, for, foreach, while, case, caseCase, repeat, try, exceptionHandler
  • Functions: defProc (procedure/function implementation), lambda
  • Logical operators: kAnd, kOr, kXor in exprBinary
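
The counting idea for the two complexity metrics can be illustrated with a small sketch for Python sources. It is not the parser's implementation: it uses Python's built-in `ast` module as a stand-in for Tree-sitter, covers only a subset of the Python node types listed in this section, counts every lambda (the parser applies nesting rules that are not reproduced here), and assumes that `complexity` equals `logic_complexity` plus the number of function-like nodes. The function name `count_complexity` is hypothetical.

```python
import ast

# Stand-ins for some of the Tree-sitter node types listed for Python.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                ast.ExceptHandler, ast.comprehension)
FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)


def count_complexity(source):
    """Count branching constructs and function definitions in Python source."""
    logic = 0
    functions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            logic += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp with three operands,
            # which corresponds to two logical operators.
            logic += len(node.values) - 1
        elif isinstance(node, FUNCTION_NODES):
            functions += 1
    # Assumption: complexity adds function definitions on top of logic_complexity.
    return {"logic_complexity": logic, "complexity": logic + functions}


SAMPLE = '''
def classify(x):
    if x > 0 and x < 10:
        return "small"
    elif x >= 10:
        return "large"
    for _ in range(3):
        pass
    return "other"
'''

print(count_complexity(SAMPLE))
```

In the sample, the if, the elif (a nested `ast.If`), the `and` operator and the for loop each contribute one to logic_complexity, and the function definition adds one more to complexity.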

Comment lines are counted based on language-specific comment syntax:

  • JavaScript/TypeScript/TSX: comment, html_comment
  • Java: block_comment, line_comment
  • Kotlin: line_comment, multiline_comment
  • C#: comment
  • C/C++: comment
  • Objective-C: comment
  • Python: comment and unassigned string literals (used as block comments)
  • Go: comment
  • PHP: comment
  • Ruby: comment
  • Swift: comment, multiline_comment
  • Bash: comment
  • Delphi: comment (covers // line, { } brace, and (* *) star comments)

Function counting identifies different types of function definitions per language:

JavaScript (.js, .cjs, .mjs, .jsx)
  • Simple functions: function_declaration, generator_function_declaration, method_definition, function_expression
  • Arrow functions: Assigned to variables (detected via variable_declarator with arrow_function value)

TypeScript (.ts, .cts, .mts)
  • Simple functions: function_declaration, generator_function_declaration, method_definition, function_expression
  • Arrow functions: Assigned to variables (detected via variable_declarator with arrow_function value)

TSX (.tsx)
  • Simple functions: function_declaration, generator_function_declaration, method_definition, function_expression
  • Arrow functions: Assigned to variables (detected via variable_declarator with arrow_function value)

Java (.java)
  • Methods and constructors: method_declaration, constructor_declaration, compact_constructor_declaration
  • Lambda expressions: Assigned to variables (detected via variable_declarator with lambda_expression value)

Kotlin (.kt)
  • Simple functions: secondary_constructor, setter, getter
  • Complex functions: Property declarations with lambda literals, anonymous functions, or initializers; function declarations with function bodies

C# (.cs)
  • Methods and constructors: constructor_declaration, method_declaration, local_function_statement, accessor_declaration
  • Lambda expressions: Assigned to variables (detected via variable_declarator with lambda_expression value)

C++ (.cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx)
  • Functions: function_definition
  • Lambda expressions: Assigned to variables (detected via init_declarator with lambda_expression value)

C (.c, .h)
  • Functions: function_definition

Objective-C (.m)
  • Functions: function_definition (C functions), method_definition (Objective-C methods)

Python (.py)
  • Functions: function_definition
  • Lambda expressions: Assigned to variables (detected via assignment with lambda value)

Go (.go)
  • Functions: method_declaration, func_literal, function_declaration, method_spec

PHP (.php)
  • Simple functions: method_declaration, function_definition, function_static_declaration
  • Anonymous functions: Assigned to variables (detected via assignment_expression with anonymous_function, arrow_function, or lambda_expression value)

Ruby (.rb)
  • Methods: method, singleton_method
  • Lambda expressions: Assigned to variables (detected via assignment with lambda value)

Bash (.sh)
  • Functions: function_definition

Swift (.swift)
  • Functions: function_declaration, init_declaration, deinit_declaration, computed_getter, computed_setter

Delphi (.pas, .dpr)
  • Functions: defProc (procedure/function implementation in the implementation section). Forward declarations in the interface section (declProc) are not counted, and lambda contributes only to complexity.
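
The Python rule in this section (function definitions, plus lambdas only when they are assigned to a variable) could be sketched as follows. The sketch uses Python's built-in `ast` module as a stand-in for the Tree-sitter grammar, and the function name `count_functions` is hypothetical; it illustrates the rule and makes no claim about how the parser implements it.

```python
import ast


def count_functions(source):
    """Count function definitions and lambdas assigned to variables."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Corresponds to function_definition (includes methods in classes).
            count += 1
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Lambda):
            # Corresponds to an assignment whose value is a lambda.
            count += 1
    return count


SAMPLE = '''
def named():
    pass

square = lambda x: x * x

print(sorted([3, 1], key=lambda v: -v))
'''

print(count_functions(SAMPLE))
```

Here `named` and `square` are counted, while the lambda passed inline as `key=` is not, because it is never assigned to a variable.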

LOC is calculated as the total number of lines in the file, including empty lines and comments. This metric is language-independent and simply counts from the first line to the last line of the file.

RLOC counts only lines that contain actual code, excluding:

  • Empty lines (whitespace only)
  • Comment-only lines
  • Lines that are part of multi-line comments

This metric is calculated by counting all lines that are not identified as comment nodes by the Tree-sitter parser for each language.
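
The difference between LOC and RLOC can be illustrated with a deliberately simplified sketch. The parser relies on Tree-sitter comment nodes; the code below instead assumes a language with a single line-comment prefix, ignores multi-line comments, and would misread a comment marker that appears inside a string literal. The function name `count_loc_and_rloc` is hypothetical.

```python
def count_loc_and_rloc(source, comment_prefix="#"):
    """Count total lines and lines containing code (simplified heuristic)."""
    lines = source.splitlines()
    rloc = 0
    for line in lines:
        # Drop everything after the comment marker, then check whether
        # any non-whitespace character remains on the line.
        code_part = line.split(comment_prefix, 1)[0]
        if code_part.strip():
            rloc += 1
    return {"loc": len(lines), "rloc": rloc}


SAMPLE = "import os\n\n# only a comment\nprint(os.name)  # trailing comment\n"
print(count_loc_and_rloc(SAMPLE))
```

The sample has four lines in total; the empty line and the comment-only line do not count toward RLOC, while the line with a trailing comment still does because it contains code.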

Parameters per function counts the number of parameters declared for each function. The metric identifies parameter nodes specific to each language:

  • JavaScript/TypeScript/TSX: formal_parameter, required_parameter
  • Java: formal_parameter
  • Kotlin: parameter
  • C#: parameter
  • C++: parameter_declaration
  • C: parameter_declaration
  • Objective-C: parameter_declaration (C functions), keyword_declarator (Objective-C method parameters)
  • Python: identifier parameters in parameters node
  • Go: parameter_declaration
  • PHP: simple_parameter, variadic_parameter, property_promotion_parameter
  • Ruby: identifier parameters
  • Swift: parameter
  • Bash: Parameters are counted from function definitions
  • Delphi: declArg

Message Chains is a code smell that detects method call chains with 4 or more consecutive calls (e.g., obj.a().b().c().d()), which can indicate tight coupling and violations of the Law of Demeter. The metric counts only method/function calls, not property accesses.
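
A rough sketch of this rule for Python sources is shown below. It uses Python's built-in `ast` module (`ast.Call` and `ast.Attribute`) as a stand-in for the Tree-sitter chain and call nodes, and it assumes that each outermost chain with at least 4 calls counts as one occurrence; how the parser treats longer or nested chains is not confirmed here. The function names are hypothetical.

```python
import ast

CHAIN_THRESHOLD = 4  # "4 or more consecutive calls"
CHAIN_NODES = (ast.Call, ast.Attribute)


def calls_in_chain(node):
    """Walk down a chain from its outermost node and count only the calls."""
    calls = 0
    while isinstance(node, CHAIN_NODES):
        if isinstance(node, ast.Call):
            calls += 1
            node = node.func   # the callee expression, e.g. obj.a().b
        else:
            node = node.value  # the object of an attribute access
    return calls


def count_message_chains(source):
    """Count outermost call chains that reach the threshold."""
    nodes = list(ast.walk(ast.parse(source)))
    # Collect chain nodes that are the receiver of another chain node,
    # so that each chain is evaluated once, from its outermost node.
    inner = set()
    for node in nodes:
        child = None
        if isinstance(node, ast.Call):
            child = node.func
        elif isinstance(node, ast.Attribute):
            child = node.value
        if isinstance(child, CHAIN_NODES):
            inner.add(id(child))
    return sum(
        1
        for node in nodes
        if isinstance(node, CHAIN_NODES)
        and id(node) not in inner
        and calls_in_chain(node) >= CHAIN_THRESHOLD
    )


print(count_message_chains("obj.a().b().c().d()"))
print(count_message_chains("obj.a.b.c.d()"))
```

The first expression contains four consecutive calls and is reported once; the second contains four attribute accesses but only one call, so it is not reported.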

JavaScript (.js, .cjs, .mjs, .jsx)
  • Chain nodes: call_expression, member_expression
  • Call nodes: call_expression

TypeScript (.ts, .cts, .mts)
  • Chain nodes: call_expression, member_expression
  • Call nodes: call_expression

TSX (.tsx)
  • Chain nodes: call_expression, member_expression
  • Call nodes: call_expression

Java (.java)
  • Chain nodes: method_invocation, field_access
  • Call nodes: method_invocation

Kotlin (.kt)
  • Chain nodes: call_expression, navigation_expression
  • Call nodes: call_expression

C# (.cs)
  • Chain nodes: invocation_expression, member_access_expression
  • Call nodes: invocation_expression

C++ (.cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx)
  • Chain nodes: call_expression, field_expression
  • Call nodes: call_expression

C (.c, .h)
  • Chain nodes: call_expression, field_expression
  • Call nodes: call_expression

Python (.py)
  • Chain nodes: call, attribute
  • Call nodes: call

Go (.go)
  • Chain nodes: call_expression, selector_expression
  • Call nodes: call_expression

PHP (.php)
  • Chain nodes: member_call_expression, scoped_call_expression, member_access_expression
  • Call nodes: member_call_expression, scoped_call_expression

Ruby (.rb)
  • Chain nodes: call
  • Call nodes: call

Bash (.sh)
Message chains are not applicable to Bash as it does not support method chaining.

Swift (.swift)
  • Chain nodes: call_expression, navigation_expression
  • Call nodes: call_expression

Delphi (.pas, .dpr)
  • Chain nodes: exprCall, exprDot
  • Call nodes: exprCall, exprDot (paren-less Obj.M1.M2.M3.M4 chains). When exprDot is wrapped in exprCall (e.g. Obj.M1().M2().M3().M4()), only exprCall counts, preventing double-counting of message-chain calls.

The following code smell metrics are derived from the base metrics and are calculated after the tree traversal:

long_method counts the number of functions in a file that have more than 10 real lines of code (RLOC). This is a language-independent metric that uses the per-function RLOC values.

long_parameter_list counts the number of functions in a file that have more than 4 parameters. This is a language-independent metric that uses the per-function parameter counts.

excessive_comments is a binary metric (0 or 1) that indicates whether a file has more than 10 comment lines. This threshold helps identify files that may be over-commented.

comment_ratio is the ratio of comment lines to real lines of code (comment_lines / rloc). The result is rounded to two decimal places. If RLOC is 0, the ratio is 0.0.
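
The derivation of these four code smell metrics from the base metrics can be sketched as below. The thresholds and the zero-RLOC rule are taken from the descriptions above; the function name `derive_code_smells` and its parameter names are hypothetical, and Python's `round` may treat exact ties differently from the rounding used by the parser.

```python
LONG_METHOD_RLOC = 10          # more than 10 real lines of code
LONG_PARAMETER_LIST = 4        # more than 4 parameters
EXCESSIVE_COMMENT_LINES = 10   # more than 10 comment lines


def derive_code_smells(rloc_per_function, parameters_per_function,
                       comment_lines, rloc):
    """Derive the code smell metrics of one file from its base metrics."""
    return {
        "long_method": sum(
            1 for value in rloc_per_function if value > LONG_METHOD_RLOC
        ),
        "long_parameter_list": sum(
            1 for value in parameters_per_function if value > LONG_PARAMETER_LIST
        ),
        "excessive_comments": 1 if comment_lines > EXCESSIVE_COMMENT_LINES else 0,
        # The ratio is defined as 0.0 when the file has no real lines of code.
        "comment_ratio": round(comment_lines / rloc, 2) if rloc else 0.0,
    }


# Example: a file with three functions, 11 comment lines and 44 real lines of code.
print(derive_code_smells([3, 11, 25], [1, 5, 4], comment_lines=11, rloc=44))
```

In this example two functions exceed 10 real lines of code, one function has more than 4 parameters, the file crosses the comment-line threshold, and the comment ratio is 11 / 44.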