Programs Manipulating Dynamic Heap Objects

The lower part of Figure 3 shows heap configurations for some states in the computation tree. Path Explosion: Lazy initialization may contribute to the path explosion problem; in the worst case, the number of paths is exponential with respect to the size of the heap. The first algorithm is essentially an improved version of the lazy initialization algorithm presented by Khurshid et al.

The problem with the original lazy initialization algorithm (shown in Figure 2) lies in the third non-deterministic choice for an uninitialized reference (non-primitive) field: ``an object created during the previous initialization of a field of type T.'' It turns out that we can still reduce the size of the computation tree.
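The choices made by lazy initialization can be illustrated with a minimal sketch. It assumes the standard three-way choice for an uninitialized reference field (Null, a fresh symbolic object, or an object created by a previous initialization of a field of type T); the names SymObject, Heap, and lazy_init_choices are hypothetical, and subtyping is ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SymObject:
    """A symbolic heap object whose fields are still uninitialized."""
    type_name: str
    ident: int


@dataclass
class Heap:
    # Objects created by earlier lazy initializations, in creation order.
    lazily_created: List[SymObject] = field(default_factory=list)


def lazy_init_choices(heap: Heap, field_type: str) -> List[Optional[SymObject]]:
    """Enumerate candidate values for an uninitialized field of type T.

    Illustrative only: exact type equality stands in for subtyping.
    """
    choices: List[Optional[SymObject]] = [None]  # choice 1: Null
    fresh = SymObject(field_type, len(heap.lazily_created))
    choices.append(fresh)  # choice 2: a fresh symbolic object
    # Choice 3: any object created by a previous initialization
    # of a field of the same type.
    choices.extend(o for o in heap.lazily_created if o.type_name == field_type)
    return choices
```

Under these assumptions, each previously created object of type T adds one more branch at every later initialization of a T-typed field, which is where the third choice enlarges the computation tree.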

Figure 1 illustrates the symbolic computation tree of the example method abs.
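As a rough illustration of what such a tree contains, the sketch below enumerates the leaves of a symbolic computation tree together with their path conditions. The body of abs is not reproduced in this section, so the program encoded in ABS (return -x if x < 0, otherwise x) is an assumption, as are the tuple encoding and the function name explore. Path conditions are kept as strings and are not checked for satisfiability.

```python
from typing import List, Tuple

# Hypothetical encoding: ("ret", expr) or ("if", cond, then_node, else_node).
ABS = ("if", "x < 0", ("ret", "-x"), ("ret", "x"))


def explore(node, path_condition: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Depth-first walk returning (path condition, symbolic result) per leaf."""
    if node[0] == "ret":
        return [(" && ".join(path_condition) or "true", node[1])]
    _, cond, then_node, else_node = node
    # Taking the branch adds cond; skipping it adds the negation.
    return (explore(then_node, path_condition + (cond,))
            + explore(else_node, path_condition + (f"!({cond})",)))
```

For the assumed abs, the walk yields two leaves, one per branch outcome.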

FORMALIZATION

Finally, Section 4.6 outlines proofs of the relative soundness and completeness of SEL, SELA, and SELB using a concrete JVM implementation as a basis. The JVM runtime structures and the subset of the JVM instruction set used in the semantics are presented in Section 4.1.1 and Section 4.1.2, respectively. The two possible final states are due to nondeterminism at the branch points, and all JVM instructions we model have at most two branches. Final states can be normal states, exceptions, or errors.

If the end state of the chosen path is unsatisfactory, SymExe will go back and explore the next end state. Since large parts of the SELA semantic rules are shared with those of SEL, we present only the additional rules of SELA in this subsection. Rules IF ACMPEQ2-A, IF ACMPEQ3-A, and IF ACMPEQ4-A handle the case of the first operand being a symbolic location.

For each of the ifnull and ifnonnull instructions, there is one additional rule for handling the symbolic location operand case. The remaining instructions (new, anewarray, instanceof, and checkcast) in this category have the same rules as those in SEL, because no symbolic location can appear in the operands.

Fig. 8. Rules for Object Creation and Manipulation Instructions (1) in SEL

EXPERIMENTAL RESULT

We compare the lazy, lazier, and lazier# initialization algorithms using experimental data from Bogor/Kiasan. As expected, the lazier# initialization algorithm outperforms the lazier initialization algorithm, which outperforms the lazy initialization algorithm in most examples.

For all examples except Sort and GC, the lazier# initialization algorithm outperforms the lazier initialization algorithm, and the lazier initialization algorithm outperforms the lazy initialization algorithm, in terms of shorter running time and fewer explored paths. For example, Figure 19 shows the ratios of the number of paths explored by the lazy, lazier, and lazier# initialization algorithms over the number of paths explored by the lazy initialization algorithm for examples AATree and TreeMap. Not only does the lazier# initialization algorithm explore the fewest paths, it actually explores the optimal number.

For example, the lazier# and lazier initialization algorithms are no better than the lazy initialization algorithm on the Sort and GC examples because Sort only uses an array of integers and in GC all objects are fully expanded. The rates of increase in the number of explored paths and in the running time of the lazier# initialization algorithm are the smallest among the three algorithms. For example, Figure 20 shows the number of paths explored by the three algorithms for AATree.remove and TreeMap.put.

When comparing the performance of Sireum/Kiasan and Bogor/Kiasan, there is only one common configuration: the lazier# initialization algorithm using CVC3 over IPC.

Fig. 19. Ratio of #Paths Explored by the Lazy, Lazier, and Lazier# Initialization Algorithms over #Paths Explored by the Lazy Initialization Algorithm with k = 3

RELATED WORK [Symbolic Execution]

More specifically, CUTE uses concrete execution for testing and symbolic execution for directing concrete execution to cover branches. Kiasan maintains an apparently correct portion of the heap while CUTE uses constraints to maintain relationships between heap nodes. Pex started as Unit Meister [Tillmann and Schulte 2005], a pure symbolic execution engine with loop delimiters.

More recently, Pex has incorporated CUTE's approach of mixing concrete execution and symbolic execution to achieve high degrees of branch coverage. 2005] is another symbolic execution tool for generating minimal sequences of public method calls to test a class. 2000] approach to Kiasan is explicit state model checking using a depth-first exploration strategy: both perform a forward path-sensitive analysis and can control temporal properties.

Explicit state model checking can be classified into two categories, stateful and stateless, depending on whether it stores states, and symbolic execution can be thought of as stateless explicit state model checking. Symbolic execution is more abstract than simple model checking because it manipulates symbols instead of concrete values. Model checking is good at dealing with concurrent programs, but symbolic execution is usually limited to sequential programs due to state matching problems.

The main advantage of TVLA is that it can verify heap-manipulating programs because it calculates an over-approximation of the state.

CONCLUSION AND FUTURE WORK

The lazy, lazier, and lazier# initialization algorithms used in Kiasan can also be viewed as heap abstraction techniques, where the symbolic objects, locations, and references are the summary nodes. To complete the analysis, TVLA includes a blur [Lev-Ami and Sagiv 2000] operation, which merges heap shapes after the focus operation, whereas the lazy, lazier, and lazier# initialization algorithms never merge states. Thus, the lazy, lazier, and lazier# initialization algorithms are more precise, but at the cost of non-termination.

The main disadvantage of TVLA is that the user generally needs to provide instrumentation predicates to facilitate verification. Early work by some authors and others demonstrates that adaptive decision processes can achieve orders-of-magnitude speedups [Belt et al. Additionally, in some cases it would be useful to introduce abstractions to deal with commonly used data structures and their properties.

For example, when analyzing Java programs using strings, it would be more efficient to use string models/abstractions/theories (e.g., [Hopcroft and Ullman 1979]) than to use the actual implementation of the java.lang.String class in the Java standard library. In conclusion, we believe that both contracts and adapted abstract models would be the key to further scaling symbolic execution. We would like to extend the metamathematical analysis of instance optimality to commonly used bounds in the bounded verification literature and to more data structures.

Such an analysis technique can then serve as a benchmark for evaluating the effectiveness of bounded verification algorithms.

ACKNOWLEDGEMENTS

In Proceedings of the 7th joint meeting of European Software Engineering Conference (ESEC) and ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE). JML4: Towards an Industrial Grade IVE for Java and Next Generation Research Platform for JML. Abstract Interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixed points.

Towards a case-optimal symbolic execution algorithm for analyzing strong properties of object-oriented programs. In OOPSLA ’08: Proceedings of the 23rd ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications. In Tools and Algorithms for the Construction and Analysis of Systems, 9th International Conference, TACAS 2003 held as part of the Joint European Conferences on Software Theory and Practice, ETAPS 2003, Warsaw, Poland, April Proceedings, H.

Proceedings of the 9th European Conference on Software Engineering held together with the 11th ACM SIGSOFT Symposium on Foundations of Software Engineering. In Proceedings of the 7th International Conference on Tools and Algorithms for System Design and Analysis (TACAS01), T. In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2004, Boston, Massachusetts, USA, July G.

FORMALIZATION OF THE SWAP EXAMPLE

1 Bytecode Execution of swap in JVM

We will use the swap method shown in Program 1 as an example to show the formalizations of initial conditions in SEL, SELA, and SELB. Note that three empty sets appear in the state: the globals, the heap, and the path condition.

CONCRETE SEMANTIC RULES

Instruction getfield f reads the f-field of the object referenced by the location at the top of the stack. For the instanceof τ test, according to the JVM specification [Lindholm and Yellin 1999], the test returns False if the object is Null; if the object is not Null and the object type is a subtype of τ, it returns True; otherwise it returns False. Instruction checkcast is similar. However, there are two differences: first, the test passes if the operand is Null; second, the instruction does not return any value: if the test passes, it does nothing.
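The two type tests can be sketched as follows. This is a minimal illustration rather than the formal rules: the class hierarchy in SUPER and all function names are hypothetical, a reference is modeled only by its runtime type (None stands for Null), and the failing case of the cast raises an exception, following the JVM specification's ClassCastException.

```python
from typing import Dict, Optional

# Hypothetical class hierarchy: class name -> direct superclass.
SUPER: Dict[str, Optional[str]] = {
    "Object": None,
    "Node": "Object",
    "TreeNode": "Node",
}


def is_subtype(t: str, tau: str) -> bool:
    """Walk the superclass chain of t looking for tau."""
    cur: Optional[str] = t
    while cur is not None:
        if cur == tau:
            return True
        cur = SUPER[cur]
    return False


def instanceof(obj_type: Optional[str], tau: str) -> bool:
    """False for Null; otherwise True exactly when the type is a subtype of tau."""
    return obj_type is not None and is_subtype(obj_type, tau)


class ClassCastError(Exception):
    """Stands in for the JVM's ClassCastException."""


def checkcast(obj_type: Optional[str], tau: str) -> None:
    """Passes on Null or on a subtype and returns no value; fails otherwise."""
    if obj_type is None or is_subtype(obj_type, tau):
        return None
    raise ClassCastError(f"{obj_type} cannot be cast to {tau}")
```

The only asymmetry between the two functions is the treatment of Null and the absence of a result in checkcast, which mirrors the two differences listed above.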

The new array has all indices initialized to the default value of the element type by the new-carr function shown in Listing 6. Instruction if_icmplt tests whether the second value from the top of the stack is less than the value at the top. If this is the case, then execution jumps to the instruction operand (rule IF ICMPLT1-C); otherwise, execution simply moves to the next instruction (rule IF ICMPLT2-C). Similar to if_icmplt, instruction if_acmpeq checks for equality between two object references (which may be Null) at the top of the stack, and execution branches if the equality holds.
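A small sketch of the two branch instructions follows. It is illustrative only: the operand stack is a Python list with the top at the end, references are modeled as integer locations with None for Null, and "the next instruction" is simplified to pc + 1 rather than the real bytecode offset. The function names mirror the JVM mnemonics, but the signatures are assumptions.

```python
from typing import List, Optional, Tuple


def if_icmplt(stack: List[int], pc: int, target: int) -> Tuple[List[int], int]:
    """Pop two ints; jump to target if the lower one is less than the top one."""
    *rest, value1, value2 = stack
    return rest, (target if value1 < value2 else pc + 1)


def if_acmpeq(stack: List[Optional[int]], pc: int,
              target: int) -> Tuple[List[Optional[int]], int]:
    """Pop two references (None models Null); jump to target if they are equal."""
    *rest, ref1, ref2 = stack
    return rest, (target if ref1 == ref2 else pc + 1)
```

Each function returns the remaining stack and the new program counter, so the two outcomes of a function correspond to the two rules of the instruction.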

Instruction ifnull performs a Nullity test on the top of the operand stack, and execution jumps if it is Null. Rules for the assume and assert instructions (shown in Figure 30): The semantics for assume and assert are standard: if the top of the stack is true, both assume and assert do nothing; otherwise, assume silently terminates execution by setting the path condition to False, while assert signals an error and terminates execution. We adopt a shorthand notation f(D0) to represent a function whose domain is restricted to D0, which is supposed to be a subset of the domain.
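The difference between assume and assert can be sketched on a simplified concrete state. The State record and its field names are hypothetical, the path condition is reduced to a Boolean, and popping the tested value off the stack is an assumption of this sketch rather than something stated in the rules.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class State:
    stack: Tuple[bool, ...]       # top of the stack is the last element
    path_condition: bool = True   # simplified to a Boolean in this sketch
    error: bool = False
    terminated: bool = False


def assume(s: State) -> State:
    """Continue if the top is true; otherwise end the path silently."""
    *rest, top = s.stack
    if top:
        return replace(s, stack=tuple(rest))
    return replace(s, stack=tuple(rest), path_condition=False, terminated=True)


def assert_(s: State) -> State:
    """Continue if the top is true; otherwise flag an error and terminate."""
    *rest, top = s.stack
    if top:
        return replace(s, stack=tuple(rest))
    return replace(s, stack=tuple(rest), error=True, terminated=True)
```

In this model a failed assume leaves error unset, so the path is simply discarded, whereas a failed assert is reported.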

Then we can construct a state σc ∈ Statec by applying ET, ES, the identity switch ρ to σs and allowing any undefined field/index to be the default value of the field/index type.

Fig. 24. Formalization of the Trace 1-11-112-1121 in Figure 5

KRIPKE STRUCTURE

The above simulation relation essentially says that if two states are related by a certain relation, the final states obtained by applying their own transitions are also related by the same relation. We also use unlabeled power Kripke structures in the main text, where LP(K) is omitted.
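The condition can be checked mechanically on finite, unlabeled transition systems. The sketch below assumes the usual one-directional reading of simulation (every step of the first system is matched by some step of the second that lands back in the relation); the function name and the dictionary encoding of transitions are hypothetical, and state labels are omitted.

```python
from typing import Dict, Hashable, Set, Tuple

Transitions = Dict[Hashable, Set[Hashable]]


def is_simulation(rel: Set[Tuple[Hashable, Hashable]],
                  trans_a: Transitions,
                  trans_b: Transitions) -> bool:
    """Check that rel is preserved by the transitions of both systems.

    For every related pair (s, t) and every step s -> s_next in the
    first system, some step t -> t_next in the second system must
    satisfy (s_next, t_next) in rel.
    """
    for s, t in rel:
        for s_next in trans_a.get(s, set()):
            if not any((s_next, t_next) in rel
                       for t_next in trans_b.get(t, set())):
                return False
    return True
```

A relation that pairs the initial states but omits the successors fails the check, because the successor pair would fall outside the relation.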

PROOF OF LEMMA 4.1