Skip to content

Forward dataflow analysis for collecting comparison facts #657

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 79 commits into from
Aug 22, 2019

Conversation

ppashakhanloo
Copy link
Contributor

@ppashakhanloo ppashakhanloo commented Aug 7, 2019

This pull request adds a forward dataflow analysis for collecting facts in every function for each CFG block.
I have added some new tests under test/CheckedC/dump_dataflow_facts.c.
Some comments still need to be added.

The main function that runs the analysis is AvailableFactsAnalysis::Analyze() which implements the iterative worklist algorithm. This function must be called before checking the bounds declarations.

For each CFG block B, the following sets are computed:

  • Gen[B]: For every if-node, if the condition is a binary operator where opcode is one of <, <=, >, or >=, add the condition to Gen[B]
  • Kill[B]: First, collect all the variables that are defined in block B, and if any expression in the program contains any of these variables, add it to Kill[B]
  • In[B]: The intersection of the Out set of Predecessors of B
  • Out[B]: First, compute the set union between In[B] and Gen[B]. Next, remove any comparison in Kill[B].

For using the collected facts, the following functions can be used:

  • void Next(); // moving to the next block in CFG
  • void GetFacts(std::set<std::pair<Expr *, Expr *>>& InequalityFacts); // getting the collected facts of the current CFG block

Representation of the comparison facts:

  • Each fact is represented by a pair of two Expr*s. By (Expr1, Expr2), we mean Expr1 <= Expr2 or Expr1 < Expr2.

@ppashakhanloo ppashakhanloo changed the title WIP: Forward dataflow analysis for collecting inequality facts Forward dataflow analysis for collecting inequality facts Aug 9, 2019
@ppashakhanloo
Copy link
Contributor Author

@dtarditi I believe this PR is ready for a review of this stage.

@ppashakhanloo ppashakhanloo changed the title Forward dataflow analysis for collecting inequality facts Forward dataflow analysis for collecting comparison facts Aug 9, 2019
Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! I have a number of suggestions for the code.

// - Expr1 <= Expr2
// - Expr2 > Expr1
// - Expr2 >= Expr1
void ExtractComparisons(const Stmt *St, ComparisonSet &ISet) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can St be an Expression? I'd like you to use the Expr::ignoreParens() method to ignore parenthesis, so that this is more general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

// This function looks recursively for comparisons in statement `S`.
// Then, it records them as elements of type Comparison in `ISet`.
// Each Comparison is a pair in form of (Expr1, Expr2) where both
// Expr1 and Expr2 are of type Expr*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is confusing. It should just say that for each expression, Expr1 <= Expr2 holds.

@@ -3285,6 +3288,318 @@ namespace {
};
}

namespace {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code for this analysis should be broken out into separate files: a .h file for the class declaration and cpp for the class implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1491,6 +1492,8 @@ Expr *Sema::MakeAssignmentImplicitCastExplicit(Expr *E) {

namespace {
class CheckBoundsDeclarations {
public:
typedef llvm::SmallPtrSet<const Stmt *, 16> StmtSet;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard-wiring 16 into the code, can we use a constant expression?

}

// This function fills `ComparisonFacts` with pairs (Expr1, Expr2) where
// Expr1 < Expr2, Expr1 <= Expr2, Expr2 > Expr1, or Expr2 >= Expr1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is confusing. The invariant is that Expr1 <= Expr2.

}

// Given a set of Comparisons `InputSet`, this function negates them
// by swaping the element of pairs, and records them in `OutputSet`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: swaping->swapping.

@@ -0,0 +1,213 @@
// Tests for dumping of datafow analysis for collecting facts
//
// RUN: %clang_cc1 -DDEBUG_DATAFLOW -fcheckedc-extension %s 2>&1 | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add a compiler flag. After the analysis, you'd check the compiler flag and dump the results accordingly. You can look at the recent commit by @mgrang to see how to do this.

Copy link
Contributor Author

@ppashakhanloo ppashakhanloo Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The new flag is -fdump-extracted-comparison-facts.

b = c;
}

// CHECK: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should include block number information. Otherwise this will be very difficult to debug in the future.

Could you add more test cases for more complicated expressions?

WorkList.push(GetByCFGBlock(*I));
}

//if(DEBUG_DATAFLOW)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be moved out to a separate routine. The dumps should be controlled by a compiler flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return false;
}

// This function looks recursively for comparisons in statement `S`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proper invariant is this: this function computes a list of comparisons E1 <= E2 that will be true if the expression S evaluates to true. Clearly if S is simple direct comparison expression E3 op E4, a comparison can be created if op is one of LE, LT, GE, GT, or EQ. If S has the form A && B, comparisons can be created for A and B.

Could you add a TODO for handling logical negation the (!) operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dtarditi
Copy link
Member

dtarditi commented Aug 9, 2019

We also need to conservative in the set of comparisons that we collect with respect to pointers and calls.

  • We can't include comparisons whose expressions involve calls. The call could produce a different value.
  • Similarly a comparison shouldn't include a reference to a volatile variable.
  • If an expression in a comparison contains a pointer dereference, we need to kill the comparison at any potential pointer assignment expression (assignment via a pointer or a call).

Copy link

@sunnychatterjee sunnychatterjee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Pardis,

I've left some comments in the PR for your consideration.

Thank you,
Sunny

@@ -1520,6 +1521,8 @@ Expr *Sema::MakeAssignmentImplicitCastExplicit(Expr *E) {

namespace {
class CheckBoundsDeclarations {
public:
typedef llvm::SmallPtrSet<const Stmt *, 16> StmtSet;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "using" is the preferred syntax over C-style typedef :) Similar suggestion for other places where you've used a typedef.

ComparisonSet Kill, GenThen, GenElse;

public:
ElevatedCFGBlock(const CFGBlock *Block) : Block(Block) {}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ElevatedCFGBlock(const CFGBlock *Block) : Block(Block) {} -> Can we use a different variable name than the data member?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunnychatterjee I followed a similar naming convention that appears in the existing constructors in this file. Should I change it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppashakhanloo You don't have to change this if it appears in the existing convention. Personally, I don't prefer this style since you have to qualify member access with a "this" in such cases.


class ElevatedCFGBlock {
private:
const CFGBlock *Block;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add default initializations for the data members? For example:

private:
const CFGBlock *Block = nullptr;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, for other data members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunnychatterjee ElevatedCFGBlock has one constructor with const CFGBlock *Block as input. And there is no default constructor for it. I am not sure why we need to add default initializations for the data members here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppashakhanloo For a single data member like in your example, it's fine. I usually follow the rule to always initialize data members of the object: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-always.
This way, we avoid errors when constructors sometimes don't initialize all the data members of a class.

public:
ElevatedCFGBlock(const CFGBlock *Block) : Block(Block) {}

friend class AvailableFactsAnalysis;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friend class AvailableFactsAnalysis; --> Is there a way to avoid the "friend"? For example, we can have public methods to access the data members of "ElevatedCFGBlock".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunnychatterjee I used friend since AvailableFactsAnalysis should be the only class that manipulates an object of ElevatedCFGBlock. (I thought of it as an Iterator class.)
If I make public methods to access the data members of it, I think I share more than necessary with the world outside AvailableFactsAnalysis. Is there a better way to achieve this goal?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppashakhanloo Thanks for the explanation. I try to avoid "friend" wherever possible since it tends to break encapsulation. However, there are exceptions to this like iterators as you mention. Given that the two classes are so close to each other (nested), let's leave them as is.

if (Block->getBlockID() > MaxBlockID)
MaxBlockID = Block->getBlockID();

BlockIDs.resize(MaxBlockID + 1, 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clear BlockIDs before calling resize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

WorkList.push(GetByCFGBlock(*I));
}

//if(DEBUG_DATAFLOW)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant to remove the comment here right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I added a compiler flag for debugging purposes and removed this comment.

return Blocks[BlockIDs[B->getBlockID()]];
}

// Given two sets S1 and S2, the return value is S1 \ S2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You meant: S1 - S2, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunnychatterjee I will change this comment to S1 - S2 for more clarity. I meant \ which is the set difference operator.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppashakhanloo This was a clarifying question. Feel free keep the set difference operator.

if (const Expr *E = dyn_cast<Expr>(St))
AllExprs.insert(E);
for (auto I = St->child_begin(); I != St->child_end(); ++I)
CollectExpressions(*I, AllExprs);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use range-for loop here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

CollectDefinedVars(*I, DefinedVars);
}

//#if DEBUG_DATAFLOW

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant to remove the comment for DEBUG_DATAFLOW, right? I'd expect the pattern to look like this in the production code. Alternatively, you can use some debug flags to enable this.

#DEBUG_DATAFLOW 0 // Set this to 1 for debugging

#if DEBUG_DATAFLOW
...
#endif // DEBUG_DATAFLOW

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I removed this comment and added a new compiler flag which is only used when testing the functionality of this part of the code.

@@ -3394,8 +3726,11 @@ void Sema::CheckFunctionBodyBoundsDecls(FunctionDecl *FD, Stmt *Body) {
ComputeBoundsDependencies(Tracker, FD, Body);
std::unique_ptr<CFG> Cfg = CFG::buildCFG(nullptr, Body, &getASTContext(), CFG::BuildOptions());
CheckBoundsDeclarations Checker(*this, Body, Cfg.get(), FD->getBoundsExpr());
if (Cfg != nullptr)
AvailableFactsAnalysis Collector(*this, Cfg.get());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this inside the if-block below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests for assignments via pointers or dereferences via pointers need to be corrected.

The operators are defined in clang/include/clang/AST/OperationKinds.def. The *', '->and[]` operators all produce lvalues. In the clang AST, memory is read by using an LValueToRvalue Cast. Memory is modified in the lvalue appearing as the left-hand side of an assignment or as the subexpression of increment or decrement operator.

@@ -169,7 +169,6 @@ CODEGENOPT(CoverageMapping , 1, 0) ///< Generate coverage mapping regions to
///< enable code coverage analysis.
CODEGENOPT(DumpCoverageMapping , 1, 0) ///< Dump the generated coverage mapping
///< regions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears a newline was deleted here. Could you add it back?

@@ -2625,6 +2625,9 @@ def : Flag<["-"], "integrated-as">, Alias<fintegrated_as>, Flags<[DriverOption]>
def : Flag<["-"], "no-integrated-as">, Alias<fno_integrated_as>,
Flags<[CC1Option, DriverOption]>;

def fdump_extracted_comparison_facts : Flag<["-"], "fdump-extracted-comparison-facts">, Group<f_Group>, Flags<[CC1Option]>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this next to the other Checked C flags? The flags beging at def fcheckedc_extension.


bool AvailableFactsAnalysis::ContainsFunctionCall(const Expr *E) {
if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(E))
if (const FunctionDecl *FD = dyn_cast<FunctionDecl>(DRE->getDecl()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the right check for whether an expression contains a call expression. You should for CallExpr, not the presence of a function call.

@ppashakhanloo
Copy link
Contributor Author

@dtarditi I addressed your comments and added more tests.
Then, I updated CompareUpperOffsets/CompareLowerOffsets to use Facts of the current block that is passed by reference to them. Instead of the function EqualExtended that was used before, I added LessThanOrEqualExtended which is similar to EqualExtended except that it also looks up facts when comparing two offsets. The corresponding test is in test/CheckedC/static-checking/bounds-decl-checking.c .

The code is ready for review.

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some additional suggestions for the code. I think you can simplify the code by using the existing function Sema::CheckIsNonModifying also. You'll want to pass an argument so that CheckIsNonModifying doesn't report an error if it s find a modifying expression.

return true;
}
if (const CallExpr *CE = dyn_cast<CallExpr>(E)) {
for (int ArgIndex = 0; ArgIndex < CE->getNumArgs(); ArgIndex++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any call expression must be considered as possibly doing a pointer assignment. We should not consider the types of the arguments passed to the function call. It's possible that a pointer could be accessible to a function via a global variable or through a structure type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return false;
}

// This function return true if an expression is volatile, and false otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: return->returns

// - If the child of `E` is `e`, then `f(e)` must be true. `f(e)` is defined in
// a separate function `IsPointerDerefLValue`.
// Otherwise, the function returns false;
bool AvailableFactsAnalysis::IsPointerDeref(const Expr *E) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check whether E contains a pointer dereference. That means you need to check the children recursively also. I think you should rename this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return false;
}

bool AvailableFactsAnalysis::IsPointerAssignment(const Expr *E) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check whether E contains a pointer assignment. You need to recursively check the children too. I think you should rename this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

CollectExpressions(I.first, Exprs);
CollectExpressions(I.second, Exprs);
for (auto InnerExpr : Exprs)
if (const DeclRefExpr *DRE = dyn_cast<DeclRefExpr>(InnerExpr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A variable v produces an lvalue. For the variable to be used, the lvalue must be the used by an LValueToRValue cast expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining this. Fixed.

// - `E` has the form `E1 && E2`: ExtractComparisons in E1 and E2 and add them to `ISet`.
// TODO: handle the case where logical negation operator (!) is used.
void AvailableFactsAnalysis::ExtractComparisons(const Expr *E, ComparisonSet &ISet) {
if (IsVolatile(E) || ContainsCallExpr(E))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the function Sema::CheckIsNonModifying instead.

// - `E` has the form `E1 || E2`: ExtractNegatedComparisons in E1 and E2 and add
// them to `ISet`.
void AvailableFactsAnalysis::ExtractNegatedComparisons(const Expr *E, ComparisonSet &ISet) {
if (IsVolatile(E) || ContainsCallExpr(E))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the function Sema::CheckIsNonModifying instead.


// This function return true if an expression is volatile, and false otherwise.
// An expression is volatile if there is at least one volatile variable in it.
bool AvailableFactsAnalysis::IsVolatile(const Expr *E) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overly conservative: it would flag an expression that used &v, where v is a volatile variable, as being volatile. I suggest that you use the existing function CheckIsNonModifying in the functions that are calling this instead.

// We assume a variable is defined if it appears in the lhs of an assignment:
// 1. increment or decrement operator (a++)
// 2. assignment operator (a += 1, a = 2)
void AvailableFactsAnalysis::CollectDefinedVars(const Stmt *St, std::set<const VarDecl *> &DefinedVars) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could miss some definitions of variables. In the unary operator case, the expression being incremented/decremented could be parenthesized. In both cases, the LValueBitCast could also be parenthesized. I think you need a loop that ignores parentheses and lvalue bit casts until nothing changes. This should be factored into a separate function.

return false;

bool EqualWithoutFacts = EqualValue(Ctx, VariablePart1, VariablePart2, EquivExprs);
bool LEWithFacts = FactExists(Ctx, VariablePart1, VariablePart2, EquivExprs, Facts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic would be clearer to return true if EqualWithoutFact is true, and similarly return true if LEWtihFacts is true. Then return false otherwise.

@ppashakhanloo
Copy link
Contributor Author

@dtarditi Thanks for your comments. I addressed all of them.
The code is ready for review.

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you.

There is one change in a comment that I'd like you to make. I don't need to re-review it before you complete the PR.

@@ -2151,6 +2156,62 @@ namespace {
return true;
}

// This function is an extension of EqualExtended. It looks into the provided `Facts`
// in order to prove `Base1 + Offset1 <= Offset2 + Offset2`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's typo here: Offset2 is used twice.

Could you add a TODO comment: we are ignoring the possibility of overflow in the addition.

@ppashakhanloo ppashakhanloo merged commit 8bb1425 into master Aug 22, 2019
@dtarditi dtarditi deleted the collect-facts branch August 26, 2019 21:17
Machiry pushed a commit to Machiry/checkedc-clang that referenced this pull request Jan 21, 2022
The most important changes:

- Clean up the multi-decl detection and rewriting code, and unify the
  code paths for global variables, fields, and local variables.

- Automatically generate names for unnamed inline structs if they need
  to be rewritten, and remove 3C's previous workarounds that tried to
  avoid needing to rewrite multi-decls with unnamed inline structs that
  3C couldn't handle correctly.

- Inside a struct, when rewriting a field multi-decl that contains an
  inline struct, correctly detect the presence of the inline struct to
  avoid losing it completely, and additionally move it to the top level
  to avoid a compiler warning.

- Add support for typedef multi-decls on par with variable and field
  multi-decls.

See correctcomputation#657 for
more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants