Unified Parser
Category: Parser (takes in source code and outputs cc.json)
The Unified Parser generates code metrics from a source code file or a project folder without relying on tools other than
CodeCharta. It writes a cc.json file (compressed by default, or uncompressed with -nc).
Supported Languages
| Language | Supported file extensions |
|---|---|
| JavaScript | .js, .cjs, .mjs, .jsx |
| TypeScript | .ts, .cts, .mts |
| TSX | .tsx |
| Java | .java |
| Kotlin | .kt |
| C# | .cs |
| C++ | .cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx |
| C | .c, .h |
| Objective-C | .m |
| Python | .py |
| Go | .go |
| PHP | .php |
| Ruby | .rb |
| Swift | .swift |
| Bash | .sh |
| Vue | .vue |
| Delphi | .pas, .dpr |
Supported Metrics
| Metric | Description |
|---|---|
| complexity | Complexity of the file based on the number of paths through the code. Also includes complexity introduced by the definition of functions, classes, etc. (represents the ‘cognitive load’ needed to get an overview of the whole file) |
| logic_complexity | Complexity of the file based on number of paths through the code, similar to cyclomatic complexity (only counts complexity in code, not complexity introduced by definition of functions, classes, etc.) |
| comment_lines | The number of comment lines found in a file |
| number_of_functions | The number of functions and methods in a file |
| loc (Lines of Code) | Lines of code including empty lines and comments |
| rloc (Real lines of code) | Number of lines that contain at least one character which is neither a whitespace nor a tabulation nor part of a comment |
| long_method | Code smell showing the number of functions with more than 10 real lines of code (rloc) |
| long_parameter_list | Code smell showing the number of functions with more than 4 parameters |
| excessive_comments | Code smell showing whether a file has more than 10 comment lines |
| comment_ratio | The ratio of comment lines to real lines of code (rloc) |
| message_chains | Code smell showing occurrences of method call chains with 4 or more consecutive calls suggesting tight coupling |
Some metrics are calculated on a per-function basis rather than per-file. Each of these metrics has max, min, mean and median values for each file. The names of these metrics are prefixed by “max_”, “min_”, …
| Metric per function | Description |
|---|---|
| parameters_per_function | The number of parameters for each function |
| complexity_per_function | The complexity inside the body of a function |
| rloc_per_function | The real lines of code inside the body of a function |
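
As a rough illustration of how the per-function values are summarised per file, the following minimal sketch computes the four aggregates with Python's standard library. The helper name `aggregate_per_function`, the sample values, and the behaviour for files without functions are assumptions made for this example, not the parser's actual implementation.

```python
from statistics import mean, median


def aggregate_per_function(metric_name, values):
    """Summarise one per-function metric for a file as max/min/mean/median."""
    prefixes = ("max", "min", "mean", "median")
    if not values:
        # Assumption: a file without functions reports 0 for every aggregate.
        return {f"{prefix}_{metric_name}": 0 for prefix in prefixes}
    return {
        f"max_{metric_name}": max(values),
        f"min_{metric_name}": min(values),
        f"mean_{metric_name}": mean(values),
        f"median_{metric_name}": median(values),
    }


# Hypothetical file with four functions taking 0, 2, 2 and 5 parameters.
summary = aggregate_per_function("parameters_per_function", [0, 2, 2, 5])
print(summary)
```

For the sample values, the maximum is 5, the minimum is 0, the mean is 2.25 and the median is 2.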
Usage and Parameters
| Parameter | Description |
|---|---|
| `FOLDER` or `FILE` | The project folder or code file to parse. To merge the result with an existing project piped into STDIN, pass `-` as an additional argument |
| `-bf, --base-file=<baseFile>` | base cc.json file with checksums to skip unchanged files during analysis |
| `--bypass-gitignore` | disable automatic .gitignore-based file exclusion (uses regex-based exclusion of common build folders) |
| `-e, --exclude=<exclude>` | comma-separated list of regex patterns to exclude files/folders (applied in addition to .gitignore patterns) |
| `-fe, --file-extensions=<fileExtensions>` | comma-separated list of file-extensions to parse only those files (default: any) |
| `--commit=<ref>` | analyze the codebase at a specific git commit, tag, branch, or date expression (creates a temporary worktree). Cannot be combined with `--local-changes`. See Commit-Based Analysis |
| `-h, --help` | displays this help and exits |
| `-ibf, --include-build-folders` | include build folders (out, build, dist and target) and common resource folders (e.g. resources, node_modules or files/folders starting with `.`) |
| `--local-changes` | only analyze files that differ from the remote tracking branch (uncommitted, staged, unstaged, untracked). See Local Changes |
| `-nc, --not-compressed` | save uncompressed output file |
| `-o, --output-file=<outputFile>` | output file (or empty for stdout) |
| `--verbose` | displays messages about parsed and ignored files |
Usage:

    ccsh unifiedparser [-h] [--bypass-gitignore] [--commit=<ref>] [-ibf] [--local-changes] [-nc] [--verbose] [-bf=<baseFile>] [-o=<outputFile>] [-e=<specifiedExcludePatterns>]... [-fe=<fileExtensionsToAnalyse>]... FILE or FOLDER...

Examples
The Unified Parser can analyze either a single file or a project folder; here are some sample commands:

    ccsh unifiedparser src/test/resources -o foo.cc.json
    ccsh unifiedparser src/test/resources/foo.ts -o foo.cc.json
    ccsh unifiedparser src/test/resources -o foo.cc.json -nc --verbose
    ccsh unifiedparser src/test/resources -o foo.cc.json --include-build-folders -e=something -e=/.*\.foo
    ccsh unifiedparser src/test/resources -o foo.cc.json --bypass-gitignore

If a project is piped into the UnifiedParser, the results and the piped project are merged. The resulting project has the project name specified for the UnifiedParser.

    cat pipeInput.cc.json | ccsh unifiedparser src/test/resources - -o merged.cc.json

Known issues
- In Ruby, the `lambda` keyword is not counted correctly for complexity and number of functions
- In C/C++/Objective-C, using `void` as a parameter counts as 1 for parameters per function
Detailed Metric Calculation
This section describes what is counted for each metric per language. The parser uses Tree-sitter to parse source code and identifies specific AST node types for each metric.
Complexity
Complexity is calculated using McCabe Complexity, counting the number of paths through the code. Each language has specific constructs that contribute to complexity:
JavaScript (.js, .cjs, .mjs, .jsx)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `for_in_statement`, `ternary_expression`, `switch_case`, `switch_default`, `catch_clause`
- Functions: `function_declaration`, `generator_function_declaration`, `arrow_function`, `generator_function`, `method_definition`, `class_static_block`, `function_expression`
- Logical operators: `&&`, `||`, `??` in binary expressions
TypeScript (.ts, .cts, .mts)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `for_in_statement`, `ternary_expression`, `conditional_type`, `switch_case`, `switch_default`, `catch_clause`
- Functions: `function_declaration`, `generator_function_declaration`, `arrow_function`, `generator_function`, `method_definition`, `class_static_block`, `function_expression`
- Logical operators: `&&`, `||`, `??` in binary expressions
TSX (.tsx)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `for_in_statement`, `ternary_expression`, `conditional_type`, `switch_case`, `switch_default`, `catch_clause`
- Functions: `function_declaration`, `generator_function_declaration`, `arrow_function`, `generator_function`, `method_definition`, `class_static_block`, `function_expression`
- Logical operators: `&&`, `||`, `??` in binary expressions
Java (.java)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `enhanced_for_statement`, `ternary_expression`, `switch_label`, `catch_clause`
- Functions: `constructor_declaration`, `method_declaration`, `lambda_expression`, `static_initializer`, `compact_constructor_declaration`
- Logical operators: `&&`, `||` in binary expressions
Kotlin (.kt)
- Control flow: `if_expression`, `for_statement`, `while_statement`, `do_while_statement`, `elvis_expression`, `conjunction_expression`, `disjunction_expression`, `when_entry`, `catch_block`
- Functions: `function_declaration`, `anonymous_function`, `anonymous_initializer`, `lambda_literal`, `secondary_constructor`, `setter`, `getter`
C# (.cs)
- Control flow: `if_statement`, `do_statement`, `foreach_statement`, `for_statement`, `while_statement`, `conditional_expression`, `is_expression`, `and_pattern`, `or_pattern`, `switch_section`, `switch_expression_arm`, `catch_clause`
- Functions: `constructor_declaration`, `method_declaration`, `lambda_expression`, `local_function_statement`, `accessor_declaration`
- Logical operators: `&&`, `||`, `??` in binary expressions
C++ (.cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `for_range_loop`, `conditional_expression`, `case_statement`, `catch_clause`, `seh_except_clause`
- Functions: `lambda_expression`, `function_definition`, `abstract_function_declarator`, `function_declarator`
- Logical operators: `&&`, `||`, `and`, `or`, `xor` in binary expressions
C (.c, .h)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `conditional_expression`, `case_statement`, `seh_except_clause`
- Functions: `function_definition`, `abstract_function_declarator`, `function_declarator`
- Logical operators: `&&`, `||` in binary expressions
Objective-C (.m)
- Control flow: `if_statement`, `do_statement`, `for_statement`, `while_statement`, `conditional_expression`, `case_statement`, `@catch`
- Functions: `function_definition`, `block_expression`
- Logical operators: `&&`, `||` in binary expressions
Python (.py)
- Control flow: `if_statement`, `elif_clause`, `if_clause`, `for_statement`, `while_statement`, `for_in_clause`, `conditional_expression`, `list`, `boolean_operator`, `case_pattern`, `except_clause`
- Functions: `function_definition`, lambda expressions (with specific nesting rules)
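
To illustrate how `complexity` and `logic_complexity` relate, the following minimal sketch approximates both for Python source using the standard-library `ast` module instead of Tree-sitter. The mapping from `ast` classes to the node types listed above, the function name `approximate_complexity`, and the assumption that `complexity` equals `logic_complexity` plus the number of function definitions are illustrative only; the parser's actual counts may differ (for example, `list`, `if_clause` and `case_pattern` are not modelled here).

```python
import ast

# Assumption: these ast classes stand in for the Tree-sitter node types
# listed above; this is not the parser's own node set.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                ast.ExceptHandler, ast.comprehension)
FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)


def approximate_complexity(source):
    """Count decision points and function definitions in Python source."""
    logic = 0
    functions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            logic += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` holds three operands, i.e. two operators.
            logic += len(node.values) - 1
        elif isinstance(node, FUNCTION_NODES):
            functions += 1
    # Assumption: function definitions add one each on top of the logic count.
    return {"logic_complexity": logic, "complexity": logic + functions}


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2 == 0:
            return "even"
    return "other"
'''

print(approximate_complexity(SAMPLE))
```

In the sample, two `if` statements, one `for` loop and one `and` operator give a logic count of 4; the single function definition raises the combined count to 5.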
Go (.go)
- Control flow: `if_statement`, `for_statement`, `communication_case`, `expression_case`, `type_case`, `default_case`
- Functions: `method_declaration`, `func_literal`, `function_declaration`, `method_spec`
- Logical operators: `&&`, `||` in binary expressions
PHP (.php)
- Control flow: `if_statement`, `else_if_clause`, `do_statement`, `for_statement`, `while_statement`, `foreach_statement`, `conditional_expression`, `case_statement`, `default_statement`, `match_conditional_expression`, `match_default_expression`, `catch_clause`
- Functions: `method_declaration`, `lambda_expression`, `arrow_function`, `anonymous_function`, `function_definition`, `function_static_declaration`
- Logical operators: `&&`, `||`, `??`, `and`, `or`, `xor` in binary expressions
Ruby (.rb)
- Control flow: `if`, `elsif`, `for`, `until`, `while`, `do_block`, `when`, `else`, `rescue`
- Functions: `lambda`, `method`, `singleton_method`
- Logical operators: `&&`, `||`, `and`, `or` in binary expressions
Bash (.sh)
- Control flow: `if_statement`, `elif_clause`, `for_statement`, `while_statement`, `c_style_for_statement`, `ternary_expression`, `list`, `case_item`
- Functions: `function_definition`
- Logical operators: `&&`, `||` in binary expressions
Swift (.swift)
- Control flow: `if_statement`, `guard_statement`, `for_statement`, `while_statement`, `repeat_while_statement`, `switch_entry`, `catch_block`, `defer_statement`, `nil_coalescing_expression`, `ternary_expression`, `willset_clause`, `didset_clause`
- Functions: `function_declaration`, `init_declaration`, `deinit_declaration`, `lambda_literal`, `subscript_declaration`, `computed_getter`, `computed_setter`
- Logical operators: `conjunction_expression`, `disjunction_expression`
Delphi (.pas, .dpr)
- Control flow: `if`, `ifElse`, `for`, `foreach`, `while`, `case`, `caseCase`, `repeat`, `try`, `exceptionHandler`
- Functions: `defProc` (procedure/function implementation), `lambda`
- Logical operators: `kAnd`, `kOr`, `kXor` in `exprBinary`
Comment Lines
Comment lines are counted based on language-specific comment syntax:
- JavaScript/TypeScript/TSX: `comment`, `html_comment`
- Java: `block_comment`, `line_comment`
- Kotlin: `line_comment`, `multiline_comment`
- C#: `comment`
- C/C++: `comment`
- Objective-C: `comment`
- Python: `comment` and unassigned string literals (used as block comments)
- Go: `comment`
- PHP: `comment`
- Ruby: `comment`
- Swift: `comment`, `multiline_comment`
- Bash: `comment`
- Delphi: `comment` (covers `//` line, `{ }` brace, and `(* *)` star comments)
Number of Functions
Function counting identifies different types of function definitions per language:
JavaScript (.js, .cjs, .mjs, .jsx)
- Simple functions: `function_declaration`, `generator_function_declaration`, `method_definition`, `function_expression`
- Arrow functions: Assigned to variables (detected via `variable_declarator` with `arrow_function` value)
TypeScript (.ts, .cts, .mts)
- Simple functions: `function_declaration`, `generator_function_declaration`, `method_definition`, `function_expression`
- Arrow functions: Assigned to variables (detected via `variable_declarator` with `arrow_function` value)
TSX (.tsx)
- Simple functions: `function_declaration`, `generator_function_declaration`, `method_definition`, `function_expression`
- Arrow functions: Assigned to variables (detected via `variable_declarator` with `arrow_function` value)
Java (.java)
- Methods and constructors: `method_declaration`, `constructor_declaration`, `compact_constructor_declaration`
- Lambda expressions: Assigned to variables (detected via `variable_declarator` with `lambda_expression` value)
Kotlin (.kt)
- Simple functions: `secondary_constructor`, `setter`, `getter`
- Complex functions: Property declarations with lambda literals, anonymous functions, or initializers; function declarations with function bodies
C# (.cs)
- Methods and constructors: `constructor_declaration`, `method_declaration`, `local_function_statement`, `accessor_declaration`
- Lambda expressions: Assigned to variables (detected via `variable_declarator` with `lambda_expression` value)
C++ (.cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx)
- Functions: `function_definition`
- Lambda expressions: Assigned to variables (detected via `init_declarator` with `lambda_expression` value)
C (.c, .h)
- Functions: `function_definition`
Objective-C (.m)
- Functions: `function_definition` (C functions), `method_definition` (Objective-C methods)
Python (.py)
- Functions: `function_definition`
- Lambda expressions: Assigned to variables (detected via assignment with lambda value)
Go (.go)
- Functions: `method_declaration`, `func_literal`, `function_declaration`, `method_spec`
PHP (.php)
- Simple functions: `method_declaration`, `function_definition`, `function_static_declaration`
- Anonymous functions: Assigned to variables (detected via `assignment_expression` with `anonymous_function`, `arrow_function`, or `lambda_expression` value)
Ruby (.rb)
- Methods: `method`, `singleton_method`
- Lambda expressions: Assigned to variables (detected via assignment with lambda value)
Bash (.sh)
- Functions: `function_definition`
Swift (.swift)
- Functions: `function_declaration`, `init_declaration`, `deinit_declaration`, `computed_getter`, `computed_setter`
Delphi (.pas, .dpr)
- Functions: `defProc` (procedure/function implementation in the `implementation` section). Forward declarations in the `interface` section (`declProc`) are not counted, and `lambda` contributes only to complexity.
Lines of Code (LOC)
LOC is calculated as the total number of lines in the file, including empty lines and comments. This metric is language-independent and simply counts from the first line to the last line of the file.
Real Lines of Code (RLOC)
RLOC counts only lines that contain actual code, excluding:
- Empty lines (whitespace only)
- Comment-only lines
- Lines that are part of multi-line comments
This metric is calculated by counting all lines that are not identified as comment nodes by the Tree-sitter parser for each language.
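
The difference between LOC and RLOC can be sketched with a deliberately simplified line counter. This example assumes source that only uses `#` line comments and classifies lines by their first non-whitespace character; the real parser relies on Tree-sitter comment nodes instead, so the function name `count_lines` and its rules are illustrative assumptions. In this sketch, a code line with a trailing comment counts as real code only.

```python
def count_lines(source):
    """Return loc, rloc and comment_lines for source using only '#' comments."""
    lines = source.splitlines()
    rloc = 0
    comment_lines = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue                # empty line: contributes to loc only
        if stripped.startswith("#"):
            comment_lines += 1      # comment-only line
        else:
            rloc += 1               # line with at least one code character
    return {"loc": len(lines), "rloc": rloc, "comment_lines": comment_lines}


SAMPLE = "# setup\nx = 1\n\ny = x + 1  # trailing comment\n"
print(count_lines(SAMPLE))
```

The four-line sample contains one comment-only line, one empty line and two lines with code, so LOC is 4 while RLOC is 2.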
Parameters per Function
Parameters per function counts the number of parameters declared for each function. The metric identifies parameter nodes specific to each language:
- JavaScript/TypeScript/TSX: `formal_parameter`, `required_parameter`
- Java: `formal_parameter`
- Kotlin: `parameter`
- C#: `parameter`
- C++: `parameter_declaration`
- C: `parameter_declaration`
- Objective-C: `parameter_declaration` (C functions), `keyword_declarator` (Objective-C method parameters)
- Python: `identifier` parameters in `parameters` node
- Go: `parameter_declaration`
- PHP: `simple_parameter`, `variadic_parameter`, `property_promotion_parameter`
- Ruby: `identifier` parameters
- Swift: `parameter`
- Bash: Parameters are counted from function definitions
- Delphi: `declArg`
Message Chains
Message Chains is a code smell that detects method call chains with 4 or more consecutive calls (e.g., obj.a().b().c().d()), which
can indicate tight coupling and violations of the Law of Demeter. The metric counts only method/function calls, not property accesses.
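
The counting rule can be sketched for Python source with the standard-library `ast` module, where `ast.Call` plays the role of a call node and `ast.Attribute` that of a non-call chain node. This is a minimal approximation under those assumptions, not the parser's Tree-sitter-based implementation; the names `chain_calls`, `count_message_chains` and `CHAIN_THRESHOLD` are hypothetical.

```python
import ast

CHAIN_THRESHOLD = 4  # "4 or more consecutive calls"


def chain_calls(node):
    """Count the calls along one chain, walking from the outermost call inwards."""
    calls = 0
    while isinstance(node, (ast.Call, ast.Attribute)):
        if isinstance(node, ast.Call):
            calls += 1
            node = node.func
        else:
            node = node.value  # property access: part of the chain, not a call
    return calls


def count_message_chains(source):
    """Count chains in the source that reach the call threshold."""
    tree = ast.parse(source)
    # Nodes that continue an enclosing chain must not start a chain themselves.
    inner = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            inner.add(id(node.func))
        elif isinstance(node, ast.Attribute):
            inner.add(id(node.value))
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and id(node) not in inner
        and chain_calls(node) >= CHAIN_THRESHOLD
    )


print(count_message_chains("obj.a().b().c().d()"))  # four calls in one chain
print(count_message_chains("obj.a.b.c.d()"))        # one call, three property accesses
```

Only the outermost call of a chain is evaluated, so a chain of five calls is reported once rather than once per sub-chain.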
JavaScript (.js, .cjs, .mjs, .jsx)
- Chain nodes: `call_expression`, `member_expression`
- Call nodes: `call_expression`
TypeScript (.ts, .cts, .mts)
- Chain nodes: `call_expression`, `member_expression`
- Call nodes: `call_expression`
TSX (.tsx)
- Chain nodes: `call_expression`, `member_expression`
- Call nodes: `call_expression`
Java (.java)
- Chain nodes: `method_invocation`, `field_access`
- Call nodes: `method_invocation`
Kotlin (.kt)
- Chain nodes: `call_expression`, `navigation_expression`
- Call nodes: `call_expression`
C# (.cs)
- Chain nodes: `invocation_expression`, `member_access_expression`
- Call nodes: `invocation_expression`
C++ (.cpp, .cc, .cxx, .c++, .hh, .hpp, .hxx)
- Chain nodes: `call_expression`, `field_expression`
- Call nodes: `call_expression`
C (.c, .h)
- Chain nodes: `call_expression`, `field_expression`
- Call nodes: `call_expression`
Python (.py)
- Chain nodes: `call`, `attribute`
- Call nodes: `call`
Go (.go)
- Chain nodes: `call_expression`, `selector_expression`
- Call nodes: `call_expression`
PHP (.php)
- Chain nodes: `member_call_expression`, `scoped_call_expression`, `member_access_expression`
- Call nodes: `member_call_expression`, `scoped_call_expression`
Ruby (.rb)
- Chain nodes: `call`
- Call nodes: `call`
Bash (.sh)
Message chains are not applicable to Bash as it does not support method chaining.
Swift (.swift)
- Chain nodes: `call_expression`, `navigation_expression`
- Call nodes: `call_expression`
Delphi (.pas, .dpr)
- Chain nodes: `exprCall`, `exprDot`
- Call nodes: `exprCall`, `exprDot` (paren-less `Obj.M1.M2.M3.M4` chains). When `exprDot` is wrapped in `exprCall` (e.g. `Obj.M1().M2().M3().M4()`), only `exprCall` counts, preventing double-counting of message-chain calls.
Code Smells
The following code smell metrics are derived from the base metrics and are calculated after the tree traversal:
Long Method
Counts the number of functions in a file that have more than 10 real lines of code (RLOC). This is a language-independent metric that uses the per-function RLOC values.
Long Parameter List
Counts the number of functions in a file that have more than 4 parameters. This is a language-independent metric that uses the per-function parameter counts.
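
Both function-level smells reduce to counting per-function values above a threshold. The sketch below shows this with hypothetical sample data; the helper `count_over` and the constant names are assumptions for illustration, while the thresholds are the ones stated above.

```python
LONG_METHOD_RLOC = 10      # "more than 10 real lines of code"
LONG_PARAMETER_COUNT = 4   # "more than 4 parameters"


def count_over(values, threshold):
    """Count how many per-function values are strictly above the threshold."""
    return sum(1 for value in values if value > threshold)


# Hypothetical per-function values for one file with four functions.
rloc_per_function = [3, 10, 11, 42]
parameters_per_function = [0, 4, 5, 1]

long_method = count_over(rloc_per_function, LONG_METHOD_RLOC)
long_parameter_list = count_over(parameters_per_function, LONG_PARAMETER_COUNT)
print(long_method, long_parameter_list)
```

Because the comparison is strict, a function with exactly 10 RLOC or exactly 4 parameters is not flagged.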
Excessive Comments
Binary metric (0 or 1) that indicates whether a file has more than 10 comment lines. This threshold helps identify files that may be over-commented.
Comment Ratio
Calculates the ratio of comment lines to real lines of code (comment_lines / rloc). The result is rounded to two decimal places. If RLOC is 0, the ratio is 0.0.
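
The two comment-based metrics can be written down in a few lines. This is a minimal sketch assuming the file-level `comment_lines` and `rloc` values are already known; the function name `comment_smells` is hypothetical, and Python's built-in `round` may treat edge cases differently from the parser's own rounding.

```python
def comment_smells(comment_lines, rloc):
    """Derive excessive_comments and comment_ratio from file-level counts."""
    excessive = 1 if comment_lines > 10 else 0
    # Guard against division by zero: files without real code get a ratio of 0.0.
    ratio = round(comment_lines / rloc, 2) if rloc else 0.0
    return {"excessive_comments": excessive, "comment_ratio": ratio}


print(comment_smells(12, 48))  # 12 comment lines, 48 real lines of code
print(comment_smells(5, 0))    # no real lines of code
```

With 12 comment lines and 48 RLOC, the file is flagged as over-commented and the ratio is 0.25; with 0 RLOC, the ratio falls back to 0.0.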