Skip to content

WI #2039 Compute cyclic call using transitive closure. #2052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
202 changes: 202 additions & 0 deletions TypeCobol.Analysis/Cfg/ControlFlowGraphBuilder.PerformCallRelation.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using TypeCobol.Compiler.Nodes;

namespace TypeCobol.Analysis.Cfg
{
public partial class ControlFlowGraphBuilder<D>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be in the class ControlFlowGraphBuilder ?
It seems to me that the transitive closure mechanism could be put aside the ControlFlowGraphBuilder.
I mean it can be called by ControlFlowGraphBuilder, but the mechanism could also be called later. Once the graph is fully calculated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes because only ControlFlowGraphBuilder knows how to discover direct perfrom calls. Once the graph is fully calculated direct perform calls context will be difficult to reproduce without retraversing the graph and analyzing each block to look for performs and perform targets..

{
/// <summary>
/// Represent the perform call chain of a PerformProcedure caller
/// </summary>
private class PerformCallChain
{
/// <summary>
/// Source Perform Procedure instruction
/// </summary>
internal PerformProcedure PerformCaller;
/// <summary>
/// List of PerformProcedure (Nodes) that form the call chain from the 'PerformCaller'
/// </summary>
internal List<Node> PerformCallees;
/// <summary>
/// All target procedures
/// </summary>
internal HashSet<Procedure> Procedures;

/// <summary>
/// Constructor
/// </summary>
/// <param name="performCaller"></param>
internal PerformCallChain(PerformProcedure performCaller)
{
this.PerformCaller = performCaller;
this.PerformCallees = new List<Node>();
this.Procedures = new HashSet<Procedure>();
}
/// <summary>
/// Constructor
/// </summary>
/// <param name="performCaller"></param>
/// <param name="performCallees"></param>
/// <param name="procedures"></param>
internal PerformCallChain(PerformProcedure performCaller, List<Node> performCallees, HashSet<Procedure> procedures)
{
this.PerformCaller = performCaller;
this.PerformCallees = performCallees;
this.Procedures = procedures;
}
}

/// <summary>
/// Representation of Perform Call Relation.
/// It is a dictionary which gives for each Procedure its Perform Call chain.
/// </summary>
private class PerformCallRelation : Dictionary<Procedure, PerformCallChain>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inheritance from Dictionary disturbs me.
You are not extending the concept of Dictionary here but rather you want to use a Dictionary to store some information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not an extension of the concept of Dictionary because Generic Type Variables <TKey, TValue> of the Dictionary base class are instantiated with <Procedure, PerformCallChain>types.

{
/// <summary>
/// The Relation's domain
/// </summary>
internal HashSet<Procedure> Domain;

internal PerformCallRelation()
{
Domain = new HashSet<Procedure>();
}
/// <summary>
/// Add a PERFORM CALL relation : Caller => Callee (Procedure)
/// Example of code:
/// main.
/// perform a
/// goback
/// .
/// a.
/// perform b
/// .
/// The relation is: a => b
/// </summary>
/// <param name="performCaller">PERFORM procedure caller, source instruction: in the example (perform a)</param>
/// <param name="caller">Caller procedure: in the example 'a'</param>///
/// <param name="performCallee">The PERFORM procedure called, target instruction: in the example (perform b)</param>
/// <param name="callee">The called procedure: in the example 'b' </param>
internal void AddToRelation(PerformProcedure performCaller, Procedure caller, PerformProcedure performCallee, Procedure callee)
{
if (!TryGetValue(caller, out var performCallChain))
{
performCallChain = new PerformCallChain(performCaller);
Add(caller, performCallChain);
}
performCallChain.Procedures.Add(callee);
if (!performCallChain.PerformCallees.Contains(performCallee))
{
performCallChain.PerformCallees.Add(performCallee);
}
Domain.Add(caller);
Domain.Add(callee);
}

/// <summary>
/// Compute the transitive closure of the PERFORM CALL relation.
/// We browse the the graph represented using the Call Relation, then we found during the visit cycles.
/// In the same time we compute the call chain.
/// </summary>
internal void TransitiveClosure()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works but it seems we don't realy use the full capabilities of the CFG here. There is no use of SuccessorEdges.

Plus this algorithm only focus on cyclic perform call.
Why don't we try to detect all cyclic code? Like this:

        procedure division.
        a.
            go to b.
        b.
            perform a.

There is also PR #2050 currently in progress with almost the same algorithm.
Why not use the DFS in this other PR to detect every cycle ?
If our internal quality tools only want to report perform cycle as error, then we can filter the result.
And we can report other cycles (cycles not only with perform) as warning as a bonus for developers.

Copy link
Contributor Author

@mayanje mayanje Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this will be an improvement because IBM compiler reports these warnings in such situation:

12  IGYCB7309-W   There may be a loop from the "PERFORM" statement at "PERFORM (line 12.1)" to itself. 
12  IGYCB7310-W   The "PERFORM" statement at "PERFORM (line 12.1)" cannot reach its exit.

But it seems that qualis does not report such situation also.

But our parser report a CFG error like this:
--- Diagnostics ---
Line 12[13,21] <37, Warning, General> - Warning: Statement ' go to b' located at line 10, column 13 prevents this PERFORM statement to reach its exit.

Ok for such situation I agree, Abstract Interpretation is welcome to be very similar with IBM Compiler.

{
// The dfs stack of all procedures in active consideration.
Stack<Procedure> S = new Stack<Procedure>();
// The dictionary N is used for three purposes
// 1) To record unmarked procedures (N[p] == 0).
// 2) To associate a positive number value with procedures that are active consideration (0 < N[p] < Infnity).
// 3) To mark Infinity Procedures whose cycles have already been found
Dictionary<Procedure, int> N = new Dictionary<Procedure, int>();

// For each procedure in the relation's domain
foreach (var a in Domain)
{ // For each procedure 'a' not yet considered, treat it.
var na = Visited(a);
if (na == 0)
dfs(a);
}

// Depth First Search Traversal.
void dfs(Procedure a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last version of the code without Tuple is much clearer.
Can you use more meaningful name for variables?
Like procedure or at least proc instead of a for the name of the procedure.
Same remarks for variables : d, fa, b, nb, ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 35d0734

{
S.Push(a);
int d = S.Count;
N[a] = d;
var fa = GetPerformCallChain(a);
if (fa.Procedures != null)
{ //For all procedure b called by a
foreach (var b in fa.Procedures.ToArray())
{
int nb = Visited(b);
if (nb == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my previous remarks on meaningful variable name, you could inline variable:

                            if (Visited(b)== 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 35d0734

dfs(b);//Procedure not visited yet
int _na = Visited(a);
int _nb = Visited(b);
N[a] = Math.Min(_na, _nb); // N[b] < N[a] : we find a cycle
var fb = GetPerformCallChain(b);
// f(a) = f(a) U f(b)
// We update the call chain
UpdatePerformCallChain(fb, fa);
}
}
var na = Visited(a);
if (na == d)
{ // 'a' is an entry point of a calculated component, that is to say the start procedure of a cycle.
// because if na != d then 'a' has been included in another cycle whose root is not 'a'.
do
{
N[S.Peek()] = Int32.MaxValue; // The Node(The procedure) is terminated
if (TryGetValue(S.Peek(), out var top))
{
UpdatePerformCallChain(fa, top);
}
else
{
Add(S.Peek(), new PerformCallChain(
fa.PerformCaller, new List<Node>(fa.PerformCallees), new HashSet<Procedure>(fa.Procedures)));
}
} while (S.Count > 0 && S.Pop() != a);
}
}
// Get the visited mark of the procedure a
int Visited(Procedure a)
{
if (N.TryGetValue(a, out var na))
return na;
return 0;
}
// Get the PerformCallChain value of the procedure a
PerformCallChain GetPerformCallChain(Procedure a)
{
if (TryGetValue(a, out var v))
return v;
else
return new PerformCallChain(null, new List<Node>(), new HashSet<Procedure>());
}
// Update PerformCallChain value of procedure 'to' when prodecure 'from' call
// procedure 'to' this the transitive aspect: all procedures called by 'from' are
// also called by 'to'
void UpdatePerformCallChain(PerformCallChain from, PerformCallChain to)
{
foreach (var p in from.PerformCallees)
{
if (!to.PerformCallees.Contains(p))
{
to.PerformCallees.Add(p);
}
}
foreach (var p in from.Procedures)
{
to.Procedures.Add(p);
}
}
}
}
}
}

Loading