
Feature: Support for parsing C/C++ test descriptions #2231

Open
Master5 opened this issue May 8, 2025 · 9 comments
@Master5

Master5 commented May 8, 2025

Description

Add support for parsing C/C++ test descriptions independently of any specific test framework. The parser should extract structured test documentation embedded in comments above test functions, using recognizable markers such as @test and Markdown headers.

Problem

Currently, there is no built-in support for parsing test descriptions directly from C/C++ source files without relying on a specific test framework (e.g., CTest, GoogleTest, etc.). This makes it difficult to extract metadata or structured documentation from tests in a consistent way across different codebases.

Additionally, without structured parsing, it's not possible to automatically generate software test documentation that includes traceability links (e.g., to requirements or test cases). This limits the ability to ensure compliance with standards or maintain a clear test-to-requirement mapping.

Solution

Implement a general-purpose C/C++ parser that can identify specially formatted comment blocks placed above test functions. These blocks should include markers like @test and structured headers such as "Expected Results:".

To support downstream tooling, the format of these headers should align with the grammar supported by StrictDoc, allowing seamless generation of documentation and traceability reports. This enables the automatic creation of structured test documentation compatible with existing documentation pipelines or compliance tools.

/** 
 * @test Test Initialization
 *
 * @relation(TC-00001, scope=function)
 * 
 * **Intention**: blabla  
 *
 * **Input**:  
 * blabla  
 *
 * **Expected Results**:  
 * blabla
 */
TEST_CASE("Test Initialization", testInitialization)
{
    ...
}
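A minimal sketch of how a comment block like the one above could be turned into structured data. The regular expressions and the names `strip_decoration` and `parse_test_description` are assumptions made for illustration; they are not StrictDoc's actual grammar or API.

```python
import re

COMMENT = '''/**
 * @test Test Initialization
 *
 * @relation(TC-00001, scope=function)
 *
 * **Intention**: blabla
 *
 * **Input**:
 * blabla
 *
 * **Expected Results**:
 * blabla
 */'''

# Hypothetical patterns, chosen only to match the example above.
RELATION_RE = re.compile(r"@relation\(([^,)]+)(?:,\s*scope=(\w+))?\)")
FIELD_RE = re.compile(r"\*\*(?P<name>[^*:]+)\*\*:\s*(?P<value>.*)")


def strip_decoration(comment: str) -> list[str]:
    """Remove the /** ... */ frame and the leading ' * ' of each line."""
    body = comment.strip().removeprefix("/**").removesuffix("*/")
    # (?!\*) keeps the '**' of a bold field name from being eaten.
    return [re.sub(r"^\s*\*(?!\*) ?", "", line).rstrip() for line in body.splitlines()]


def parse_test_description(comment: str) -> dict:
    """Collect the @test title, @relation markers and **Field**: value pairs."""
    result = {"title": None, "relations": [], "fields": {}}
    current = None  # name of the field that continuation lines belong to
    for line in strip_decoration(comment):
        if line.startswith("@test"):
            result["title"] = line[len("@test"):].strip()
            current = None
        elif (m := RELATION_RE.match(line)):
            result["relations"].append({"uid": m.group(1).strip(), "scope": m.group(2)})
            current = None
        elif (m := FIELD_RE.match(line)):
            current = m.group("name").strip()
            result["fields"][current] = m.group("value").strip()
        elif current is not None and line:
            # A non-empty line after a field header extends that field.
            existing = result["fields"][current]
            result["fields"][current] = (existing + "\n" + line).strip()
    return result


parsed = parse_test_description(COMMENT)
print(parsed["title"], parsed["relations"], parsed["fields"])
```

The sketch only handles the comment text itself; locating the function that follows the comment would need a real C/C++ parser.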

Additional Information

  • The parser should be independent of any specific test framework.
  • Could be useful for generating test documentation
@Master5 Master5 changed the title Feature: Feature: Support for parsing C/C++ test descriptions May 8, 2025
@stanislaw
Collaborator

cc @haxtibal @thseiler @richardbarlow @nicpappler @johanenglund @RobertoBagnara @fkromer what do you think about the syntax proposed? Any feedback is appreciated.

@stanislaw stanislaw added this to the 2025-Q2 milestone May 8, 2025
@nicpappler

Generally I like the syntax. What I'm just wondering, do we also want a unique ID for a test, so that we can uniquely identify it? I just assume there will not be a separate test case specification in a requirement style describing the test, right? So the test here is both test case specification and test case implementation, right?

@fkromer

fkromer commented May 8, 2025

Context: In embedded Linux devices it's pretty common to have heterogeneous language usage (C/C++/Rust, TypeScript, Flutter, ...). I'd extend the requirement to "Implement a general-purpose programming language parser..." in the first place to avoid possible future incompatibilities.

Do potential future language specific source code comment processors (equivalents to e.g. C/C++ Doxygen) support the syntax as well? Otherwise it would not be possible to process those comments with those tools anymore.

Rust: rustdoc
TypeScript: typedoc
Flutter: dartdoc
...

It would be great to be that generic to enable consideration of Yocto ptest (sh) as well.

@fkromer

fkromer commented May 8, 2025

@nicpappler

Generally I like the syntax. What I'm just wondering, do we also want a unique ID for a test, so that we can uniquely identify it? I just assume there will not be a separate test case specification in a requirement style describing the test, right? So the test here is both test case specification and test case implementation, right?

In general I like the syntax as well. My assumption is that TC-00001 is the UID of an SDoc test specification (empty, just a placeholder the parser/logic can refer to) which is referenced by the test implementation. The parser extracts the test specification from the source code and updates the empty SDoc test specification.

@johanenglund
Contributor

This feature would be a very welcome addition.

One thing I don't entirely understand is the linkage to the tested requirement, if TC-00001 in the example is a test case.

@stanislaw
Collaborator

Thanks to everyone for your comments so far. Let me also share my thoughts and considerations.

As someone who has to maintain many features and nuances of StrictDoc in the long term, I would prefer a markup that works consistently across all types of source files and remains independent of any single document processing system, which the proposed example does seem to allow for. The example in the issue description features Doxygen. I am almost fine with using the **<Field name>**: <Field content> syntax, but one thing comes to mind: having both @relation() and **...**: ... types of fields is slightly inconsistent.

Example: The following C code:

/**
 * @brief Adds two integers.
 *
 * @test Test Initialization
 *
 * @relation(TC-00001, scope=function)
 *
 * **Intention**: blabla
 *
 * **Input**:
 * blabla
 *
 * **Expected Results**:
 * This
 * \n\n
 * is how we can do
 * \n\n
 * paragraphs.
 *
 */
int add(int a, int b);

...translates to Doxygen as follows.

[Image: Doxygen example]

I guess there are two things that are inconsistent here, but maybe it is all not so bad and can be acceptable from the user experience point of view:

1) The @relation marker is printed as-is. Previously @johanenglund suggested that Doxygen can be customized so that the @relation{} form is used, see here: https://strictdoc.readthedocs.io/en/latest/latest/docs/strictdoc_01_user_guide.html#SECTION-UG-Doxygen. If the {} syntax is not used, is it too disturbing to see the @relation(...) directly in the Doxygen HTML output?

2) Using @relation for relations and the bold syntax for node fields does feel inconsistent, but expressing relations with the bold syntax would result in something like this, which also looks strange:

**Relation: TC-00001, scope=function**

I am not sure if there is a better way of doing this, and I am open to suggestions.

Test case description vs test case source

I just assume there will not be a separate test case specification in a requirement style describing the test, right? So the test here is both test case specification and test case implementation, right?

My understanding is that @Master5 wants to keep the test specification directly in the source code which makes sense from the maintainability point of view.

What StrictDoc can do here is to parse both the @relation markers and the **<Field name>: <Field value>** parts and create SDoc nodes at runtime. This way a document with a test specification will be generated automatically, with each test specification item linking to its source code.

StrictDoc's source parser can work in two different modes:

  1. If no **Key:** Value field is provided, the behavior will be as it is now: the node pointed to by @relation is linked with a source code function.

  2. If at least one **Key:** Value IS provided, the node pointed to by @relation is linked with the auto-generated SDoc description node which is then linked to the source code function.
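The two modes can be sketched as follows. This is only an illustration of the decision described above; `ParsedComment`, `build_links` and the `function:` / `auto-node:` labels are hypothetical names, not part of StrictDoc.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedComment:
    """Hypothetical result of parsing one source comment."""
    relation_uid: str                 # node referenced by @relation(...)
    function_name: str                # function the comment is attached to
    fields: dict = field(default_factory=dict)  # parsed **Key:** Value pairs


def build_links(comment: ParsedComment) -> list[tuple[str, str]]:
    """Return (from, to) trace links for one parsed comment."""
    function = f"function:{comment.function_name}"
    if not comment.fields:
        # Mode 1: no fields, so the referenced node links straight to the function.
        return [(comment.relation_uid, function)]
    # Mode 2: an auto-generated description node sits between the two.
    node = f"auto-node:{comment.function_name}"
    return [(comment.relation_uid, node), (node, function)]


print(build_links(ParsedComment("REQ-001", "test_init")))
print(build_links(ParsedComment("REQ-001", "test_init", {"Intention": "blabla"})))
```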

How to deal with UID?

Generally I like the syntax. What I'm just wondering, do we also want a unique ID for a test, so that we can uniquely identify it?

When it comes to generating the UID, there are again two behaviors:

  1. If a user provides a field **UID:** Foobar, then the SDoc node will be auto-generated with this UID.
  2. If a user does not provide it, the SDoc node will be auto-generated with a UID synthesized from a relative source file path + the function name.
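A minimal sketch of the UID rule above. The function name `resolve_uid` and the `path:function` output format are assumptions; the thread only says the UID is synthesized from the relative source file path and the function name.

```python
def resolve_uid(fields: dict, relative_path: str, function_name: str) -> str:
    """Prefer an explicit UID field, otherwise synthesize one."""
    explicit = fields.get("UID")
    if explicit:
        return explicit
    # Assumed format: normalized relative path, a colon, then the function name.
    normalized = relative_path.replace("\\", "/")
    return f"{normalized}:{function_name}"


print(resolve_uid({"UID": "Foobar"}, "tests/test_init.c", "testInitialization"))
print(resolve_uid({}, "tests/test_init.c", "testInitialization"))
```

A synthesized UID changes whenever the file or the function is renamed, which may matter for anything that links to the generated node.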

Open questions

  1. I think we should assume that not only test nodes can have such source code descriptions, and this is why one field could simply be: **Node type: TEST_CASE** or **NODE_TYPE: TEST_CASE**.

In general, it feels like using ALL_CAPS names for the field names would match StrictDoc's current convention better but I am open to relaxing the constraints at the source code level.

  2. How to avoid collisions between the **<name>:** <value> syntax and a user who happens to use the same syntax for unrelated purposes? Such use will confuse StrictDoc's marker parser and will make it generate nonsensical documents.

@Master5
Author

Master5 commented May 13, 2025

Thanks again everyone for the great input. Based on the discussion, I'd like to propose a refined, more consistent and semantically clear approach that may work across multiple languages and tooling setups, including StrictDoc.

I don't think StrictDoc needs to be aware of the actual test case description markup or structure. Any metadata, such as the relationship to the requirement, should use existing syntax, such as @relation(...). I propose that the test description be opaque to StrictDoc until the document is actually rendered. This would allow for different styles of test description without adaptation to StrictDoc.

Relation Marker Clarification

Just a quick clarification regarding the @relation(TC-00001, scope=function) marker:

  • In this context, the @relation tag is used to establish a traceability link between the test (in code) and the requirement it verifies.
  • The identifier TC-00001 in the original example looks like it points to another test case. To reduce confusion, we propose using REQ-00001 to explicitly indicate that the target is a requirement, not a test.
  • This will improve clarity and avoid misinterpretation in both tooling and manual review.

Proposal: @-based tags and opaque test description

  • All @tags are reserved and explicitly StrictDoc-relevant (and parseable)
  • Other content is left opaque, as free-form test documentation or human-readable descriptions
  • Works well in /** */, """, and /// comment formats
  • Compatible with potential Gherkin-style test expression or markdown or rst

Markdown Variant

This approach keeps all tooling-relevant fields prefixed with @ (for traceability), while the rest of the documentation uses Markdown headers for semantic structuring.

1. C/C++

/**
 * @relation(REQ-000001, scope=function)
 * @verification_method(LLT)
 *
 * # Intention
 * Sunday isn't Friday
 *
 * # Input
 * - today is Sunday
 * - I ask whether it's Friday yet
 *
 * # Expected result
 * - I should be told "Nope"
 */

2. Python (e.g. PyTest)

"""
@relation(REQ-000001, scope=function)
@verification_method(LLT)

# Intention
Sunday isn't Friday

# Input
- today is Sunday
- I ask whether it's Friday yet

# Expected result
- I should be told "Nope"
"""

3. Rust

/// @relation(REQ-000001, scope=function)
/// @verification_method(LLT)
///
/// # Intention
/// Sunday isn't Friday
///
/// # Input
/// - today is Sunday
/// - I ask whether it's Friday yet
///
/// # Expected result
/// - I should be told "Nope"

Gherkin-Style Variant

This format aligns with BDD-style scenarios and could optionally be supported for teams already using Gherkin syntax.

1. C/C++

/**
 * @relation(REQ-000001, scope=function)
 * @verification_method(LLT)
 *
 * Scenario: Sunday isn't Friday
 *   Given today is Sunday
 *   When I ask whether it's Friday yet
 *   Then I should be told "Nope"
 */

2. Python

"""
@relation(REQ-000001, scope=function)
@verification_method(LLT)

Scenario: Sunday isn't Friday
  Given today is Sunday
  When I ask whether it's Friday yet
  Then I should be told "Nope"
"""

3. Rust

/// @relation(REQ-000001, scope=function)
/// @verification_method(LLT)
///
/// Scenario: Sunday isn't Friday
///   Given today is Sunday
///   When I ask whether it's Friday yet
///   Then I should be told "Nope"

Explanation of @ Metadata Markers

All lines starting with @ are intended for structured metadata and should be interpreted by tools like StrictDoc. These lines should be consistently parseable and ideally defined by a grammar or schema.

Currently used:
@relation(REQ-000001, scope=function):
Creates a traceability link to a requirement.

@verification_method(LLT):
Specifies how the requirement is verified.
Common values could be:

  • LLT = Low-Level Test
  • HLT = High-Level Test
  • Analysis
  • Review

You could expand this with additional @key(value) fields in the future — e.g., @author.
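A minimal sketch of the split between @ metadata and opaque description text. The pattern `TAG_RE` and the function `split_metadata` are hypothetical; they only illustrate the @key(value) shape described above and are not a grammar defined by StrictDoc.

```python
import re

# Assumed shape: @key(arg, name=value, ...) alone on a line.
TAG_RE = re.compile(r"^@(?P<key>[A-Za-z_]\w*)\((?P<args>[^)]*)\)\s*$")


def split_metadata(lines: list[str]):
    """Separate @key(...) lines from the free-form description."""
    tags = []     # (key, positional args, keyword args)
    opaque = []   # everything else, passed through untouched
    for line in lines:
        m = TAG_RE.match(line.strip())
        if not m:
            opaque.append(line)
            continue
        args = [a.strip() for a in m.group("args").split(",") if a.strip()]
        positional = [a for a in args if "=" not in a]
        keyword = {}
        for a in args:
            if "=" in a:
                name, value = a.split("=", 1)
                keyword[name.strip()] = value.strip()
        tags.append((m.group("key"), positional, keyword))
    return tags, opaque


tags, opaque = split_metadata([
    "@relation(REQ-000001, scope=function)",
    "@verification_method(LLT)",
    "",
    "# Intention",
    "Sunday isn't Friday",
])
print(tags)
print(opaque)
```

Because the description lines are never interpreted, the same function would accept the Markdown and the Gherkin-style variants unchanged.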


  • Using Markdown headers makes the documentation more semantic and easier to parse.
  • All StrictDoc-relevant markers begin with @, and the rest can remain tool-agnostic.
  • This structure should be compatible with most doc generators (e.g., Doxygen, rustdoc, typedoc) and future extensions (e.g., Yocto test specs).
  • Encourages better separation of logic (metadata vs narrative), while staying simple and readable.

@stanislaw
Collaborator

stanislaw commented May 16, 2025

Just a heads up: @mettta and I are finishing a relatively large migration of the document model. When that is done, I would like to get to implementing a prototype for this test descriptions parser as one of the next work items.

In general, I am still seeing the following inconsistencies:

  • Both the **<Field name>**: <Field value> and # <Field name> syntaxes could cause conflicts with the non-StrictDoc use of Markdown in Doxygen. I saw many projects that actively use #-headers to write documentation in Doxygen. This would cause a direct conflict with StrictDoc trying to parse them as SDocNodeFields.
  • I see a lack of consistency in how we decide which fields are reserved with the @ character and which are not. For example, why would @verification_method become a privileged/reserved field but not the other ones?

I am leaning towards implementing a parser that would support multiple flavours of the syntax. I will probably start with **<Field name>**: <Field value> as the one that is more readable and will hopefully cause fewer conflicts with other non-SDoc-related Markdown on average.
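A minimal sketch of what selectable flavours could look like, assuming a simple registry of one regular expression per flavour. The names `FLAVOURS` and `parse_fields` and both patterns are hypothetical and do not describe the merged implementation.

```python
import re

# One hypothetical field-header pattern per syntax flavour.
FLAVOURS = {
    "bold": re.compile(r"^\*\*(?P<name>[^*:]+)\*\*:\s*(?P<value>.*)$"),
    "header": re.compile(r"^#\s+(?P<name>.+?)\s*$"),
}


def parse_fields(lines: list[str], flavour: str = "bold") -> dict:
    """Group comment body lines into fields using the chosen flavour."""
    pattern = FLAVOURS[flavour]
    fields = {}
    current = None
    for line in lines:
        m = pattern.match(line)
        if m:
            current = m.group("name").strip()
            # The header flavour has no inline value group.
            fields[current] = (m.groupdict().get("value") or "").strip()
        elif current is not None and line.strip():
            fields[current] = (fields[current] + "\n" + line.strip()).strip()
    return fields


print(parse_fields(["**Intention**: blabla", "**Input**:", "blabla"]))
print(parse_fields(["# Intention", "Sunday isn't Friday"], flavour="header"))
```

With the flavour chosen explicitly, a project that already uses #-headers in Doxygen could stay on the bold flavour and avoid the conflict described above.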

Any further thoughts on a syntax that would be Markdown-friendly but also non-invasive are appreciated.

@stanislaw
Collaborator

For everyone involved in this discussion, I have merged the initial work here: #2272. See the description there. Now it is to be discussed whether the default grammar chosen is good enough or we need to implement more flavors/customizations.

@stanislaw stanislaw modified the milestones: 2025-Q2, 2025-Q3 Jun 1, 2025