Internship with Benoit Baudry at KTH
Adaptation of Amplified Unit Tests for Human Comprehension

Table of Contents

1 Introduction

2 Findings

2.1 Bibliography

2.1.1 Writing

  1. Context
  2. Problem statement

    Amplified tests contain many new assertions and new inputs, and DSpot keeps a test solely because it detects new mutants; there is no enquiry into the usefulness of each individual amplification. Because the number of new mutants killed is often significantly lower than the number of amplifications (especially assertions, which are what effectively detect a mutant), we end up with a lot of useless statements. This noise is problematic because it makes the review process longer, and because the less focused a test is, the harder it is to comprehend. Noise is not the only threat to focus: the new mutants can be completely different from, and located in a different part of the SUT than, the ones the original test case could detect. The final step towards a human-friendly output would be to add context and a human-like description of the amplification. To sum up, the three identified problems to tackle are noise reduction, focus confinement, and natural language description of the amplification.

  3. Technical Difficulties

    In order to describe a mutant, you need information about it. You could give the mutator's "category", but you only have its class name. You could give the column of the statement it was applied to, so as to highlight it, but you do not have access to that. You could use its position in the AST, but you do not know that either.

    Knowing which assertion killed which mutant is essential, be it to start a program slice from that assertion or simply to paraphrase the assertion and explain what bug is detected. But you do not know that. And you barely know, through ugly comments, which assertions are the result of amplifications. Identifying afterwards the role that each assertion plays is cumbersome. You could run the test case against every mutant, but you do not have direct access to the mutants in the first place. And what would you do with them? Instrument the test by adding a probe after each assertion? How do you automate such a stream of test executions elegantly? Maybe you could remove assertions one by one and check whether the mutants keep getting killed? I know we are in SBSE, but that is quite ugly.
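
    To make that last idea concrete, below is a minimal sketch of the assertion-removal loop written with Spoon. The MutationRunner interface is a hypothetical stand-in for however PIT would actually be re-run on a single test method; it is not an existing DSpot or PIT API.

import java.util.List;

import spoon.reflect.code.CtInvocation;
import spoon.reflect.declaration.CtMethod;

public class AssertionRoleFinder {

    /** Hypothetical hook standing in for a real mutation analysis run on one test method. */
    public interface MutationRunner {
        int countKilledMutants(CtMethod<?> testMethod);
    }

    /** Removes assertions one by one (on clones) and reports how many kills each assertion accounts for. */
    public static void reportAssertionRoles(CtMethod<?> amplifiedTest, MutationRunner runner) {
        int baseline = runner.countKilledMutants(amplifiedTest);
        List<CtInvocation<?>> assertions = assertionsOf(amplifiedTest);
        for (int i = 0; i < assertions.size(); i++) {
            CtMethod<?> clone = amplifiedTest.clone();
            // same traversal order as in the original, so index i points at the same assertion
            assertionsOf(clone).get(i).delete();
            int stillKilled = runner.countKilledMutants(clone);
            System.out.printf("%s -> %d mutant(s) killed only thanks to this assertion%n",
                    assertions.get(i), baseline - stillKilled);
        }
    }

    private static List<CtInvocation<?>> assertionsOf(CtMethod<?> test) {
        return test.getElements((CtInvocation<?> invocation) ->
                invocation.getExecutable().getSimpleName().startsWith("assert"));
    }
}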

    As said before, we have no direct information on the position of amplifications in the new test case, which makes it harder to generate descriptions or apply minimization to them. But what data structure would you use? Bookkeeping of positions in the AST? How would you keep it up to date across multiple rounds of amplification?
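
    One possible shape for such bookkeeping is sketched below: each amplified statement is recorded together with the amplifier that produced it and the round it was produced in. This is only an illustration of the difficulty; keying on object identity means that every cloning round has to re-register its statements.

import java.util.IdentityHashMap;
import java.util.Map;

import spoon.reflect.code.CtStatement;

/** Naive bookkeeping of amplifications, keyed by statement identity (sketch only). */
public class AmplificationLog {

    public static final class Record {
        public final String amplifierName;
        public final int round;

        public Record(String amplifierName, int round) {
            this.amplifierName = amplifierName;
            this.round = round;
        }
    }

    // Identity map: two syntactically equal statements are still distinct amplifications.
    private final Map<CtStatement, Record> records = new IdentityHashMap<>();

    public void register(CtStatement amplified, String amplifierName, int round) {
        records.put(amplified, new Record(amplifierName, round));
    }

    /** After a test method is cloned, the clone's statements are new objects and must be re-registered. */
    public void transfer(CtStatement original, CtStatement clonedCopy) {
        Record record = records.get(original);
        if (record != null) {
            records.put(clonedCopy, record);
        }
    }

    public Record lookup(CtStatement statement) {
        return records.get(statement);
    }
}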

  4. On the usefulness of works from (code maintenance|software artifacts summarization)

    A lot of effort has been put into generating human-friendly descriptions for various kinds of software artifacts. In particular, there has been work on generating documentation (or comments, descriptions, summaries) for source code, methods, and, more interestingly for us, unit test cases. These tools can generate a natural language description of what a piece of code does and identify its most important parts or lines. But explaining why a code change was made, or what role a piece of code plays (i.e. understanding the intentions of the developers), is harder: tools need additional information or limit the scope by identifying stereotypes (e.g. labelling a commit as Feature Addition).

    But those works are not directly applicable to our problem. First, we already know why an amplified test was kept: because it can detect a new bug.

  5. On the usefulness of works from test cases minimization

    Using delta-diff we can identify useless statements and then remove them. But more powerful program minimisation tools are available. One might expect that the more minimisation is applied, the less code is left to describe, and thus the easier the description is to generate; however this is not obvious, and other details have to be taken into account.

    First, we might not want to modify the original part of the test: the developer might already be familiar with it, and keeping it intact might make it less overwhelming to grasp the purpose of the test case. Even if the developer has never seen the test before, a hand-written test is probably easier to understand than a compacted version.

    Moreover, existing minimisation tools probably cannot be told to leave certain parts untouched.
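
    Below is a rough sketch of what "minimise, but leave the original statements alone" could look like. It assumes we already know which statements were added by amplification, and it uses a hypothetical stillKillsNewMutants oracle that would re-run mutation analysis behind the scenes; it is not DSpot's actual minimisation.

import java.util.function.Predicate;

import spoon.reflect.code.CtStatement;
import spoon.reflect.declaration.CtMethod;

public class AmplifiedTestMinimizer {

    /**
     * Greedily removes amplified statements (never original ones) as long as the
     * amplified test still kills all of its new mutants.
     */
    public static CtMethod<?> minimize(CtMethod<?> amplifiedTest,
                                       Predicate<CtStatement> isAmplified,
                                       Predicate<CtMethod<?>> stillKillsNewMutants) {
        CtMethod<?> best = amplifiedTest;
        for (int i = best.getBody().getStatements().size() - 1; i >= 0; i--) {
            CtStatement candidate = best.getBody().getStatement(i);
            // isAmplified must survive cloning (e.g. rely on a marker, not object identity)
            if (!isAmplified.test(candidate)) {
                continue; // never touch the hand-written part of the test
            }
            CtMethod<?> attempt = best.clone();
            attempt.getBody().getStatement(i).delete();
            if (stillKillsNewMutants.test(attempt)) {
                best = attempt; // the statement was noise; keep the shorter test
            }
        }
        return best;
    }
}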

  6. On the usefulness of an NLG

    The generated sentences should always follow the same structure, built from the same templates a human would use.
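
    For instance, with SimpleNLG (referenced below) a fixed template can be filled and realised grammatically. This is only a sketch of the kind of sentence we would want; the subject and object strings are made up.

import simplenlg.framework.NLGFactory;
import simplenlg.lexicon.Lexicon;
import simplenlg.phrasespec.SPhraseSpec;
import simplenlg.realiser.english.Realiser;

public class AmplificationSentence {
    public static void main(String[] args) {
        Lexicon lexicon = Lexicon.getDefaultLexicon();
        NLGFactory factory = new NLGFactory(lexicon);
        Realiser realiser = new Realiser(lexicon);

        // Template: "<subject> checks <object>." filled with data from one amplification.
        SPhraseSpec sentence = factory.createClause();
        sentence.setSubject("the amplified test");
        sentence.setVerb("check");
        sentence.setObject("the return value of getTrue()");

        // Prints something along the lines of: "The amplified test checks the return value of getTrue()."
        System.out.println(realiser.realiseSentence(sentence));
    }
}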

  7. What do we then propose as contribution

2.1.2 References

  1. Cultural
    • Search Based Software Engineering: Techniques, Taxonomy, Tutorial (harman2012search)
      • TODO
    • A Few Billion Lines of Code Later (bessey2010few)
      • Great to understand the limits of static analysis but also some of the limits of all analysis
      • Difficult to analyze code because of the diversity of build automation tools
      • "By default, companies refuse to let an external force modify anything."
      • "A misunderstood explanation means the error is ignored or, worse, transmuted into a false positive."
      • Many standards
      • Some people don't care about bugs, sometimes improving the tool reveals more bugs which is bad for the manager
    • Lessons from Building Static Analysis Tools at Google (sadowski2018lessons) Tricorder: Building a Program Analysis Ecosystem (sadowski2015tricorder)
      • Great to understand the challenges in pushing an analysis tool in the real world
      • notes on the printed paper
      • such a tool needs to be
        1. Integrated/Easy to use
        2. Free of false positive
        3. Easy to understand
    • Spoon: A Library for Implementing Analyses and Transformations of Java Source Code (pawlak2016spoon)
      • let's say it's like llvm/clang
    • Regression Testing Minimisation, Selection and Prioritisation : A Survey (yoo2012regression)
      • TODO
    • Clustering Test Cases to Achieve Effective & Scalable Prioritisation Incorporating Expert Knowledge (yoo2009clustering)
      • TODO
    • Measuring software redundancy (carzaniga2015measuring)
      • TODO
    • Automatic Software Diversity in the Light of Test Suites (baudry2015automatic)
      • analysis of common features (e.g. number of tests covering one statement)
      • plastic behavior (have different behaviors while still remaining correct) study
      • different details compared to 4 and 9
    • Tailored source code transformations to synthesize computationally diverse program variants (baudry2014tailored)
      • More details than in 4
    • Selecting a Software Engineering Tool: Lessons Learnt from Mutation Analysis (delahaye2015selecting)
      • TODO
    • The Oracle Problem in Software Testing: A Survey (barr2015oracle)
      • TODO
    • Human-Centered Design Meets Cognitive Load Theory: Designing Interfaces that Help People Think (oviatt2006human)
    • Grounded Theory in Software Engineering Research: A Critical Review and Guidelines (stol2016grounded)
    • Five Misunderstandings about Case-study Research (flyvbjerg2006five)
      • TODO
    • Is Search-based Unit Test Generation Research Stuck in a Local Optimum? 🌟🌟 (rojas2017search)
      • list of challenges
        1. Searching Flat Fitness Landscapes
        2. Achieving High Code Coverage
        3. Tests Without Oracles
        4. Ugly Generated Test Code
        5. Research Papers Instead of Usable Tools
  2. Unit Testing
  3. Mutation Testing
    • Is Mutation Testing Ready to be Adopted Industry-Wide? (movzucha2016mutation)
    • Investigating the Correlation between Mutation Score and Coverage Score (assylbekov2013investigating)
    • An Analysis and Survey of the Development of Mutation Testing 🌟 (jia2011analysis)
      • TODO
    • PIT: A Practical Mutation Testing Tool for Java (Demo) 🌟 (coles2016pit)
      • Well written
      • PIT is fast (manipulate bytecode), which is one of the reasons it can be used in real life
      • test selection
      • robust, easy to use, well integrated (cites 10)
    • Resolving the Equivalent Mutant Problem in the Presence of Non-determinism and Coincidental Correctness (patel2016resolving)
      • TODO
    • An Experimental Evaluation of PIT's Mutation Operators (andersson2017experimental)
      • TODO
    • Are Mutation Scores Correlated with Real Fault Detection? (papadakis2018mutation)
      • TODO
    • A Transformational Language for Mutant Description (simao2009transformational)
      • TODO
      • unfortunately it doesn't give clues on how to describe mutants as they see mutation simply as a match-and-replace process.
      • kind of look like a formal description of the design of a DSL
    • An Experimental Evaluation of Selective Mutation (offutt1993experimental)
    • A theoretical study of fault coupling (wah2000theoretical)
    • Proteum IM 2.0: An Integrated Mutation Testing Environment 🌟🌟 (delamaro2001proteum)
      • TODO
  4. Search-based Software Testing
    • Search-based software testing: Past, present and future (mcminn2011search)
      • Already read from previous internship
    • Genetic Improvement of Software: a Comprehensive Survey (petke2017genetic)
    • Evosuite 🌟 (fraser2011evosuite) (fraser2013evosuite)
      • State-of-the-art tool
      • Very sophisticated, difficult to reproduce experiments because it changes fast and a lot of parameters are tweaked
      • minimization
        • remove unnecessary statements
        • careful not to generate long test cases
    • An Approach to Test Data Generation for Killing Multiple Mutants 🌟 (liu2006approach)
  5. Test Amplification
    • B-Refactoring: Automatic Test Code Refactoring to Improve Dynamic Analysis (xuan2016b)
      • Split tests for each fragment to cover a simple part of the control flow.
      • Help with respect to fault localization.
    • Test data regeneration: generating new test data from existing test data (yoo2012test)
    • The Emerging Field of Test Amplification: A Survey (danglot2017emerging)
      • Dense
      • Good overview of goals (Table 1) and methods (Table 2)
    • DSpot: Test Amplification for Automatic Assessment of Computational Diversity (baudry2015dspot)
      • Comparison with TDR 2 and also concurrent to 7
        • "the key differences between DSpot and TDR are: TDR stacks multiple transformations together; DSpot has more new transformation operators on test cases: DSpot considers a richer observation space based on arbitrary data types and sequences of method calls."
        • "We count the number of variants that are identified as computationally different using DSpot and TDR. "
    • A Systematic Literature Review on Test Amplification 🌟
      • TODO
    • Genetic-Improvement based Unit Test Amplification for Java 🌟
      • TODO
    • Dynamic Analysis can be Improved with Automatic Test Suite Refactoring (xuan2015dynamic)
      • TODO
    • Automatic Test Case Optimization: A Bacteriologic Algorithm (baudry2005automatic)
      • TODO
      • Compared to DSpot, no assertions generation, small programs.
  6. Automated Test Generation
    • How Do Automatically Generated Unit Tests Influence Software Maintenance? 🌟🌟 (shamshiri2018how)
      • TODO
    • Generating Unit Tests with Descriptive Names Or: Would You Name Your Children Thing1 and Thing2? 🌟🌟🌟 (daka2017generating)
      • TODO
    • An Empirical Investigation on the Readability of Manual and Generated Test Cases 🌟🌟🌟 (grano2018empirical)
      • TODO
  7. Generating natural language descriptions for software artifacts
    1. Surveys
      • Survey of Methods to Generate Natural Language from Source Code 🌟 (neubig2016survey)
        1. Survey papers
          • recommends 2
        2. Generation Methods
          1. manual rules/templates
            • SWUM 2&6
              • test cases 4 & 3
              • changes 6 & 1
              • exceptions 5
            • multiple lines description 5
              • not useful, too high level
            • using execution path information 5 & 4
              • not useful(?)
        3. Content Selection Methods
        4. Targeted Software Units
        5. Training Data Creation
        6. Evaluation
          • TODO later
      • Summarizing Software Artifacts: A Literature Review 🌟 (nazar2016summarizing)
        • very complete
      • Automatic Summarising: The State of the Art (jones2007automatic)
    2. Tools for tests
      • Automatically Documenting Software Artifacts 🌟 (li2018automatically)
        • PhD thesis
        • Chapter 4 (p. 109) on tag for unit tests
        • catalog of 21 stereotypes for methods in unit tests
          • 14 JUnit API-Based Stereotypes for Methods in Unit Test Cases
            • Boolean verifier
            • Null verifier
            • Equality verifier
            • Identity verifier
            • Utility verifier
            • Exception verifier
            • Condition Matcher
            • Assumption setter
            • Test initializer
            • Test cleaner
            • Logger
            • Ignored method
            • Hybrid verifier
            • Unclassified
          • 7 C/D-Flow Based Stereotypes for Methods in Unit Test Cases
            • Branch verifier
            • Iterative verifier
            • Public field verifier
            • API utility verifier
            • Internal call verifier
            • Execution tester
            • Empty tester
      • Automatically Documenting Unit Test Cases 🌟🌟 (li2016automatically) (git)
        • Survey with developers and projects mining study to justify automatic documentation of unit tests
        • uses a SWUM implementation in C#
        • example of templates and placeholders
        • as with other similar works it may not be useful for us
      • Towards Generating Human-Oriented Summaries of Unit Test Cases 🌟 (kamimura2013towards)
      • Automated Documentation Inference to Explain Failed Tests (zhang2011automated)
        • could be used to improve the documentation and precision of try/catch amplification
      • Automatically Identifying Focal Methods under Test in Unit Test Cases (ghafari2015automatically)
        • not useful, we are focusing on explaining edge cases
    3. Commits/Code changes
      • On Automatically Generating Commit Messages via Summarization of Source Code Changes (cortes2014automatically) ChangeScribe: A Tool for Automatically Generating Commit Messages (linares2015changescribe)
        • Good entry point for the related work
        • Classifies commit with stereotypes
        • Uses templates for sentences, and fills it with commit stereotypes (2)
        • lacks 'why' information
      • Using Stereotypes to Help Characterize Commits (dragan2011using)
        • Only categorize based on added or deleted methods
      • Towards Automatic Generation of Short Summaries of Commits (jiang2017towards)
      • Automatically Generating Commit Messages from Diffs using Neural Machine Translation (jiang2017automatically)
        • trying to be less verbose and add context
      • On Automatic Summarization of What and Why Information in Source Code Changes (shen2016automatic)
        • Better than ChangeScribe1
        • Categories of Commits in Terms of Maintenance Task and Corresponding Description (based on 3) (why information)

           | Categories of commits | Description                                                        |
           |-----------------------+--------------------------------------------------------------------|
           | Implementation        | New requirements                                                   |
           | Corrective            | Processing failure, Performance failure, Implementation failure   |
           | Adaptive              | Change in data environment                                         |
           | Perfective            | Processing inefficiency, Performance enhancement, Maintainability |
           | Non functional        | Code clean-up, Legal, Source control system management            |
        • What information: description (more like diff (ChangeDistiller) dump) of changes
        • only keep information for methods that are called many times
        • boilerplates not interesting
      • Automatically Documenting Program Changes (buse2010automatically)
        • precise description
        • nicely written, but not useful for us
      • Towards a taxonomy of software change 🌟 (buckley2005towards)
        • purely about what information
        • nice charts or tables to display all possible information
    4. General/Others
      • Comment Generation for Source Code: State of the Art, Challenges and Opportunities (https://arxiv.org/pdf/1802.02971.pdf)
        • TODO
        • Information Retrieval ("analyze the natural language clues in the source code") -> not relevant
        • Program Structure Information (summary from important statements) -> not relevant(?)
        • Software Artifacts Beyond Source Code (using the social interaction revolving around development) -> not relevant
        • Fundamental NLP Techniques -> not relevant
        • Not very useful… "current approach only generate descriptive comments"
      • The Emergent Laws of Method and Class Stereotypes in Object Oriented Software (dragan2011emergent)
        • Excerpt from PhD Thesis
        • Source of the Taxonomy of Method Stereotypes 🌟
        • C++
      • The Dimensions of Maintenance (swanson1976dimensions)
        • Foundational paper
      • JStereoCode: Automatically Identifying Method and Class Stereotypes in Java Code (moreno2012jstereocode)
        • Extending Dragan's work for Java
      • Automatic Documentation Inference for Exceptions 🌟 (buse2008automatic)
        • well written
        • could be used to improve the documentation and precision of try/catch amplification
        • nice study of percentage of what and why information in open-source projects' commit messages
      • Towards Automatically Generating Summary Comments for Java Methods 🌟 (sridhara2010towards) (+ PhD thesis)
        • well written
        • SWUM, central lines selection, …
        • again not exactly useful for us
      • Integrating Natural Language and Program Structure Information to Improve Software Search and Exploration (hill2010integrating)
        • PhD thesis
        • Source of SWUM
        • SWUM implementation as Eclipse plugin
      • Swummary: Self-Documenting Code (herbert2016swummary) (git)
        • focal method extraction -> Swum.NET
      • Automatic Source Code Summarization of Context for Java Methods (mcburney2016automatic)
        • looks very complete but again not quite useful
      • Method Execution Reports: Generating Text and Visualization to Describe Program Behavior 🌟🌟 (beck2017method)
        • good list of possible information
        • TODO
      • Towards Automatically Generating Descriptive Names for Unit Tests (zhang2016towards)
        • TODO
  8. Commits/Code survey
    • What's a Typical Commit? A Characterization of Open Source Software Repositories (alali2008s)
      • Useful to know what terms to use
      • According to 1 the most used terms are fix, add, test, bug, patch and the most used combinations are file-fix, fix-use, add-bug, remove-test, and file-update.
    • On the Nature of Commits (hattori2008nature)
    • What do large commits tell us? A taxonomical study of large commits (hindle2008large)
      • extending 3
    • Cognitive Processes in Program Comprehension (letovsky1987cognitive)
      • Foundational paper
    • On the Naturalness of Software (hindle2012naturalness)
      • Code is repetitive and predictable
  9. Natural Language Generator
    • SimpleNLG: A realisation engine for practical applications (gatt2009simplenlg)
      • TODO
  10. Code Evolution
  11. Test Case Minimisation
  12. Not Relevant
    1. Knowledge
      • Poster: Construct Bug Knowledge Graph for Bug Resolution (wang2017construct)
      • Towards the Visualization of Usage and Decision Knowledge in Continuous Software Engineering (johanssen2017towards)
        • Pretty figures
        • Design of a tool to visualize various kinds of knowledge
      • Method Execution Reports: Generating Text and Visualization to Describe Program Behavior (beck2017method)
    2. Testing Related
      • SCOTCH: Test-to-Code Traceability using Slicing and Conceptual Coupling (qusef2011scotch)
      • ComTest: A Tool to Impart TDD and Unit Testing to Introductory Level Programming (lappalainen2010comtest)
    3. Others
      • A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes (loyola2017neural)
        • Multiple good citation to papers on NL and SE
      • Automatically Capturing Source Code Context of NL-Queries for Software Maintenance and Reuse (hill2009automatically)
      • How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms (panichella2013effectively)
        • Enhancement that doesn't really interest us
        • "in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling."
      • Software traceability with topic modeling (asuncion2010software)
        • "navigate the software architecture and view semantic topics associated with relevant artifacts and architectural components"
      • Automatically Detecting and Describing High Level Actions within Methods (sridhara2011automatically)
        • too high level
      • Automatic Generation of Natural Language Summaries for Java Classes (moreno2013automatic)
      • Using Method Stereotype Distribution as a Signature Descriptor for Software Systems (dragan2009using)
      • Reverse Engineering Method Stereotypes (dragan2006reverse)
      • Supporting Program Comprehension with Source Code Summarization (haiduc2010supporting)
        • motivations
      • Natural Language-based Software Analyses and Tools for Software Maintenance (pollock2009natural)
        • more about analysis than generation

2.2 Contribution

2.2.1 Minimisation

2.2.2 Focus

2.2.3 Replace original test or keep both

2.2.4 Explanation

  1. Slicing
  2. Natural Description

2.2.5 Ranking

3 Development

4 Global Goals [0/2]

4.1 TODO Report <2018-06-08 Fri 12:00>

  • Thanks all the team in report (Benjamin, Benoit, Martin)

4.2 TODO Defense <2018-06-25 Mon>

4.2.1 DONE Talk @ Workshop Software Engineering Research <2018-03-08 Thu 10:00>–<2018-03-08 Thu 10:20>

4.2.2 TODO Talk @ Workshop Software Engineering Research <2018-05-08 Tue>

4.2.3 TODO Defense Rehearsal @ ENS <2018-06-22 Fri>

5 Journal [14/21]

5.1 DONE Preliminary Bibliographical Work <2017-09-18 Mon>–<2018-02-07 Wed>

5.1.1 Things Done

  • Meeting with Benoit <2017-09-22 Fri>
    • 1, 2, 3 issues for possible work to do
    • 1 possible work: explain if a mutant isn't killed because of oracle or input
    • focus on mutation (e.g. mutation score)
    • work will be on Dspot and PIT.
  • Read blog on PIT and Descartes
    • Sum up PIT/Descartes
    • List of wanted features
  • Meeting with Benoit <2017-11-23 Thu>
    • The purpose of DSpot has shifted right?
      • interesting to talk about the history in bibliography? No, there is a new paper
    • Enough space to talk about related work? Present a few papers in detail and cite others
    • Current organisation of bibliography
      • General techniques
        • Definitions
        • Mutants
        • etc
      • Useful tools
        • DSpot
    • do extensive evaluation (comparison from scratch vs amplification)
    • find literals to help tests
    • add mutation operator for specific data structures
    • stack mutations
    • add explanations
    • 3 big open problems
  • Meeting with Benoit <2017-12-22 Fri>
    • reduce only the generated tests
    • big question: minimal generated tests
      • pre or post treatment
      • order of presenting PRs
      • this is the big question
      • we don't want to touch the original suite
      • we want the programmer to understand the new tests
    • add an example of junit test
    • talk about the trend of genetic improvement
    • don't necessarily cite Automatic software diversity in the light of test suites and Tailored source code transformations to synthesize computationally diverse program variants
  • Talk rehearsal <2018-01-28 Sun 08:30>, notes by Vladislav
    • More illustrations (workflow graph?)
    • Check the test case example (too complicated for not much, not really java)
    • Year and conference acronym in footcite
    • Careful with lambdas for TDR (check with supervisor)
    • More details on commits/pull requests and emphasize the importance of developers reviewing generated tests
    • Slide 10 -> ugly (different spacings)
    • Stacking operators: explanation too sparse
    • 4th point in conclusion slide too vague. Not just the goal but also the mean to achieve it
  • https://blog.acolyer.org/2018/01/23/why-is-random-testing-effective-for-partition-tolerance-bugs/

5.1.2 Blocking Points

5.1.3 Planned Work [6/6]

  • Read papers
  • Meeting with Benoit <2017-09-22 Fri 15:00>–<2017-09-22 Fri 15:30>
  • Meeting with Benoit <2017-11-23 Thu 15:00>–<2017-11-23 Thu 16:00>
  • Send link to repo
  • Ask Maud about plane tickets refund
  • Meeting with Benoit <2017-12-22 Fri 10:30>–<2017-12-22 Fri 11:30>

5.2 DONE Week 1 & 2 <2018-02-07 Wed>–<2018-02-18 Sun>

5.2.1 Things Done

  • Wrote the little example of use of Spoon (I simply added it in spoon-examples)
package fr.inria.gforge.spoon.transformation;

import spoon.processing.AbstractProcessor;
import spoon.reflect.code.*;

/**
 * Removes if when there is no else and if the body consists only of a return
 *
 * @author Simon Bihel
 */
public class RemoveIfReturn extends AbstractProcessor<CtIf> {

    @Override
    public void process(CtIf element) {
        CtStatement elseStmt = element.getElseStatement();
        if (elseStmt != null) { return; } // only transform ifs that have no else branch

        CtStatement thenStmt = element.getThenStatement();
        if (thenStmt instanceof CtReturn) { // simple case with directly a then statement
            element.replace(thenStmt);
            return;
        }
        if (thenStmt instanceof CtBlock) { // case with a block whose only statement is a return
            CtBlock<?> thenBlock = (CtBlock<?>) thenStmt;
            if (thenBlock.getStatements().size() == 1 && thenBlock.getStatement(0) instanceof CtReturn) {
                element.replace(thenStmt);
            }
        }
    }
}
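
  • A minimal sketch of how to apply such a processor with the Spoon Launcher (the input and output paths are placeholders):

import spoon.Launcher;

public class RunRemoveIfReturn {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        launcher.addInputResource("src/main/java"); // placeholder: sources to transform
        launcher.addProcessor(new RemoveIfReturn());
        launcher.setSourceOutputDirectory("spooned"); // transformed sources are written here
        launcher.run();
    }
}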

  • Clang static analyzer for windows
    • Clang is painful to install on Windows… It requires llvm and Microsoft Visual Studio. And there is no other choice than building from source. And it requires Perl to run.
    • Should probably use CPPcheck
    • Cppcheck has a GUI and an installer for Windows. 👍
    • example of bugs http://courses.cs.vt.edu/~cs1206/Fall00/bugs_CAS.html
    • no bug in the provided code
  • Software Maintenance seems to be an important keyword/field for the documentation of code
  • To what extent are documenting source code changes useful for us?
    • Only few changes made by DSpot
    • The source of the change is a tool, not a human
    • Still useful to see how they formulate features in natural language
    • DSpot doesn't add new features, we want the purpose of enhanced tests.
    • Don't really care about Pyramid method because it compares with human written messages
  • GitHub's PR templates are just plain text templates.
  • Went through papers that cited ChangeScribe. Went partly through citations by ChangeScribe.
  • Spent a lot of time on generating natural language from source code
  • Submitted a fix for a bug in vim-orgmode
  • Natural Language Generators
    • found on github, for java
      1. SimpleNLG
        • 410 stars, 215 citations
        • Seems to be just what we need
      2. NaLanGen
        • 2 stars
    • ChangeScribe seems to use a homemade generator
  • "The Software Word Usage Model (SWUM) is one of the first models of this type, and can be used for converting Java method calls into natural language statements (Hill et al., 2009)."
  • Looking at the code of DSpot to get info on generated tests
    • looks like a list of amplified test are generated and you don't know what was the amplifier

5.2.2 Blocking Points

  • Is it useful to explore approaches for augmenting the context provided by differencing tools?

5.2.3 Planned Work [6/12]

  • Read papers
  • should I register for ICST? and ICSE? -> Yes, talk/remind Benoit
  • Sign papers grant
  • Is there a Slack or something?
  • Get familiar with Spoon
    • Read paper
    • Little project, remove if when there is no else and the body is just a return.
      • Write the program
      • Write tests
  • Get familiar with Dspot
    • Running it
    • Contributing
      • Pick issues
      • Fix them
  • See boiler-plates for NLP way of building sentences.
    • a.k.a templates, placeholder templates
    • Search for papers and read them
    • Search for tools
  • Sign contract with KTH Relocation <2018-02-13 Tue 14:00>–<2018-02-13 Tue 15:30>
  • Categorize papers of preliminaries
  • Lookup what static analysis is possible with clang Cppcheck [100%]
    • find tools
    • it is for mechatronics students who write small programs for arduinos
    • show them what tests are and what's possible to discover bugs
    • Think of what they could be taught
    • Test Cppcheck on a windows machine
      • Install windows on the small computer
      • Test the code provided in the course
  • Go to Entré for badge and PIN code outside working hours
  • Run tools that I encounter in papers

5.3 DONE Week 3 <2018-02-19 Mon>–<2018-02-25 Sun>

5.3.1 Things Done

  • Work on DSpot documentation
  • Read reviews of bibliographic report
  • How to remember what amplification has been applied?
    • Go through logs
      • nothing useful in them
    • Comments directly in the code
      • name of the amplifier used in the line before
      • could easily be enriched if necessary
    • Enrich test methods with a new parameter
      • last resort
  • A json file summarizes killed mutants (with their location)
  • Need to keep focus

To select the new test case to be proposed as pull request, we look for an amplified test that kills mutants which are all located in the same method.

(this was done manually)

  • Need for automated minimization

A second point in the preparation of the pull request relates to the length of the amplified test. Once a test method has been selected as a candidate pull request, we analyze it and manually make it clearer and more concise, we call this process the manual minimization of the amplified test. We note that automated minimization of amplified tests is an interesting area of future work, left out of the scope of this paper.

  • SWUM is really about analysis. Trying to reformulate things without making sense of them.
  • Possible title: Adaptation of Amplified Unit Tests for Human Comprehension
  • Swum.NET

UnitTestScribe also uses SWUM.NET to generate a general NL description for each unit test case method. SWUM.NET captures both linguistic and structural information about a program, and then generates a sentence describing the purpose of a source code method.

  • Started writing
  • Made a PR for vim-grammarous
  • Discussion on how to minimize generated tests

5.3.2 Blocking Points

  • Where is the "keep tests that kill mutants all located in the same method" step implemented? It seems to be implemented according to the paper, but the issue is still open and it proposes a solution that seems different from just looking at the json file at the end of the process.
    • it was done manually

5.3.3 Planned Work [7/12]

  • Read papers
  • Register for ICST
  • Get familiar with Dspot [1/6]
    • Running it
    • Contributing
      • Pick issues
      • Fix them
    • Write documentation [2/4]
      • Key methods [3/5]
        • Assertion generation [2/2]
          • AssertGenerator
          • MethodsAssertGenerator
        • Input amplification [1/2]
          • glue
          • amplifiers
        • Pre-amplification
        • Amplification
        • Compilation & run [2/3]
          • compileAndRunTests
          • buildClassForSelection
          • TestCompiler
      • Rename amplifyTests to express the fact that it is only doing input amplification
      • compileAndRunTests
        • Why return null when not all methods were compilable or some tests failed?
      • Renaming plural variables
    • Work on removing all deprecated classes in stamp [0/1]
      • Remove unused deprecated methods of TestSelector
    • More precise try/catch?
      • Would that be useful? Feasible?
    • Extract hard-coded amplifications messages
  • Lab access denied outside working hours
    • Go to Entré
    • Go again to Entré
    • Send email to request access to the lab
      • resend
    • Resolved
  • Run tools that I encounter in papers
    • tools not really useful are they(?)
    • closing this for now
  • Find a way to know which amplifications have been applied and/or how to implement it
  • Make DHELL PR maven compiler version
  • Start writing [0/4]
    • Problem statement
      • scientific
        • quite short
      • technical
    • Comparison with works on description
      • Explaining what they do
        • badly written
        • quite short
      • Why we can't apply them for our work
    • Comparison with works on test cases minimization
      • Explaining what they do
      • Why we can't apply them for our work
    • Whether using an NLG is useful
  • Start doing a simple NL commit messages generator
    • for later, first we need minimization
  • Maybe reorganize the references on descriptions
  • Read about identify essential parts of a test for killing a specific mutant
  • Search for papers on mutation testing and same location targeting

5.4 DONE Week 4 <2018-02-26 Mon>–<2018-03-04 Sun>

5.4.1 Things Done

  • Added git hook to commit the html version of the reporting
  • Explored the use of slicing to detect the cause of new killed mutant
    • Need observation-based slicing with mutation score(?)
  • Nothing on summarization and mutation testing
    • You usually think the other way around, what do I need to do in order to kill this new mutant
  • srcSlice not supporting Java (paper)
  • JavaSlice does not support Java 8
  • Kaveri (Indus Java Program Slicer) old and eclipse plugin
  • JavaBST not available? Paper badly written
  • WALA
  • Fixed org export and also pull on server
  • Starred every vim plugin I use with Github's API and PyGitbub
  • Explored end-user description of Pitest mutators
    • Pitest has user-friendly mutators, now the question is how to use/access them
cd .. && mvn clean package -DskipTests && cd dhell && mvn clean package && java -jar ../dspot/target/dspot-1.0.6-SNAPSHOT-jar-with-dependencies.jar -p ./dspot.properties -i 1 -t eu.stamp.examples.dhell.HelloAppTest -a MethodAdd --verbose && vim dspot-out/eu.stamp.examples.dhell.HelloAppTest_mutants_killed.json

5.4.2 Blocking Points

  • NL commit message generator
    • how to know which amplifications were applied?
  • What is a program/test slice for a mutation score criterion?
    • dataflow slice starting from the killing assertion

5.4.3 Planned Work [3/9]

  • Register for ICST
  • Dspot [2/5]
    • Contributing
    • Write documentation [2/2]
      • Key methods [2/2]
        • Input amplification
          • amplifiers
        • Compilation & run
          • TestCompiler
            • no need
      • compileAndRunTests
        • Why return null when not all methods were compilable or some tests failed?
      • PR
    • Work on removing all deprecated classes in stamp
      • Remove unused deprecated methods of TestSelector
    • More precise try/catch?
      • Would that be useful? Feasible?
    • Extract hard-coded amplifications messages
  • Start writing [0/4]
    • Problem statement
      • technical
    • Comparison with works on description
      • Why we can't apply them for our work
    • Comparison with works on test cases minimization
      • Explaining what they do
      • Why we can't apply them for our work
    • Whether using an NLG is useful
  • Read about identify essential parts of a test for killing a specific mutant
  • Search for papers on mutation testing and same location targeting
  • Start doing a simple NL commit messages generator [0/2]
    • DSpot automated PR
    • Simple PR description [3/4]
      • Add a field in the killed mutants json file
      • Print it
        • done automatically
      • Stupid message
      • Long stupid description
        • Get what amplifications were applied
        • done
  • Replace fr.inria.stamp with eu.stamp
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant [1/4]
    • Need a more precise location for the mutant location
      • column number
        • not available
      • maybe I don't need it
    • Need to know the killing assertion
      • Add a trace of this when a test is kept
    • Adding as dependency
    • Use it

5.5 DONE Week 5 <2018-03-05 Mon>–<2018-03-11 Sun>

5.5.1 Things Done

  • Tried to use Sourcetrail
    • Needed to run mvn install -DskipTests -Djacoco.skip=true
    • displayed no references or class
  • Worked on presentation for the workshop
  • Proposed mutators taxonomy
    • Literal change
    • Object change
    • New assertion
  • Meeting with Benoit
    • in commit message, talk about bugs instead of mutants
    • 3 steps
      • oracle enhancement only
      • new input
      • combination
    • write why the problem is difficult
    • write different kinds of message with each a specific focus
    • maybe compare trace of amplified test vs original
    • study commit messages related to tests
  • git log --grep "^Tested"

5.5.2 Blocking Points

  • What will the scientific contribution be?
    • Software Engineering is often at the border.
    • We tackle complex problems that the industry is not particularly interested in, at least not directly.
    • applying existing methods and seeing whether they scale, or simply that they can be implemented, is a contribution in itself
  • What kind of evaluation?
    • survey
    • performance
    • comparison with study of repos

5.5.3 Planned Work [3/10]

  • Talk @ Workshop Software Engineering Research <2018-03-08 Thu 10:00>–<2018-03-08 Thu 10:20>
    • Workshop <2018-03-08 Thu 09:30>–<2018-03-08 Thu 12:30>
    • Room 4523
  • Register for ICST
  • Dspot [0/1]
    • Extract hard-coded amplifications messages
  • Start writing [4/4]
    • Problem statement
      • technical
    • Comparison with works on description
      • Why we can't apply them for our work
    • Comparison with works on test cases minimization
      • Explaining what they do
        • rephrase a description from a survey or something
      • Why we can't apply them for our work
    • Whether using an NLG is useful
  • Start doing a simple NL commit messages generator
    • Long stupid description
      • Get what amplifications were applied
      • done
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant [1/4]
    • Need a more precise location for the mutant location
      • column number
        • not available
      • maybe I don't need it
    • Need to know the killing assertion
      • Add a trace of this when a test is kept
    • Adding as dependency
    • Use it
  • Fix https://github.com/STAMP-project/dspot/issues/336
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing

5.6 DONE Week 6 <2018-03-12 Mon>–<2018-03-18 Sun>

5.6.1 Things Done

5.6.2 Blocking Points

  • How to detect an amplification that modifies a statement?
    • added amplification -> easy
    • modifying amplification -> ?
      • maybe they have tags/annotations?
        • maybe I could implement that
    • use annotations during the amplification process to "tag" amplified statements (see the sketch after this list)
  • What about a change listener to detect amplifications? and easier amplification counter
    • it is silly because we are applying amplifications
    • and big overhead
    • No, use annotations
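
  • A minimal sketch of that tagging idea, using Spoon inline comments (the AMPLIFICATION marker format is made up):

import spoon.reflect.code.CtComment;
import spoon.reflect.code.CtStatement;
import spoon.reflect.factory.Factory;

public class AmplificationTagger {

    /** Attaches a marker comment to a statement produced or modified by an amplifier. */
    public static void tag(Factory factory, CtStatement amplified, String amplifierName, int round) {
        CtComment marker = factory.Code().createInlineComment(
                "AMPLIFICATION: " + amplifierName + " (round " + round + ")");
        amplified.addComment(marker);
    }

    /** A statement counts as amplified if it carries such a marker comment. */
    public static boolean isAmplified(CtStatement statement) {
        return statement.getComments().stream()
                .anyMatch(comment -> comment.getContent().startsWith("AMPLIFICATION:"));
    }
}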

5.6.3 Planned Work [8/17]

  • Change apartment <2018-03-15 Thu>
    • move out, hotel -> university
    • retrieve keys <2018-03-15 Thu 12:00>–<2018-03-15 Thu 16:30>
    • move in
  • Register for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [1/2]
    • https://github.com/STAMP-project/dspot/issues/362#issuecomment-372384168
      • it is possible to explore the AST and get the amplifications
    • documentation [4/4]
      • reduce
      • ampTestToParent
      • tmpAmpTestToParent [5/5]
        • is it a buffer to add new relations after applying mutators?
          • yes, to have rounds
        • why isn't it used everywhere?
          • ampTestToParent is directly modified
        • returns the input without modification, need refactor
        • when it is used somewhere, amplification counter not incremented
          • I was wrong
        • document it
      • StatementAdd doesn't increment amplification counter
      • these issues were resolved
    • implement annotations
    • explore trees to find amplifications
  • Clean amplification code [8/8]
    • amplification counter increment during cloning
      • two methods
        • refactor i-amp
        • refactor a-amp
      • remove original public method and make sure everything work
      • clean imports
    • rework assertion counter because 1 clone can mean many a-amplifications
      • don't increment counter in during cloning
    • removing an assertion means +1 for the counter?
      • no
    • parenting link
      • update parenting link during cloning
      • remove updating outside
      • remove plain getter
      • load buffer before starting
    • parenting map loading is ugly
      • yeah, well…
    • documentation
    • tests for verifying counter?
      • with report
    • close issue in message
  • rename Amplifier to InputAmplifier
    • too many conflicts
  • add up amplifications of parents?
    • no if a parent has amplifications it is reported
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Ask for access outside working hours
  • Respond to https://github.com/STAMP-project/dspot/issues/54
  • Understand https://github.com/STAMP-project/dspot/pull/360
  • Work on report
    • use explicit definitions
    • work on background sections
    • add Java examples
    • insist on the distinction between what and why information
    • describe thoroughly the oracle problem
  • Read papers
  • Meeting with Benoit <2018-03-16 Fri>
    • Make a formal proposal of natural language description
    • Ask people what they think about it
    • Ask (Simon U, Spoon) (, XWiki) (, SAT4J) what they think of my proposal
    • Difficulties to evaluate because there isn't a lot of material (DSpot isn't an established tool).
  • formal description of the NL description
  • ask for opinions

5.7 DONE Week 7 <2018-03-19 Mon>–<2018-03-25 Sun>

5.7.1 Things Done

  • https://stackoverflow.com/questions/14040/developer-testing-vs-qa-team-testing-what-is-the-right-division-of-work
  • Precise description of NL amplification description
    1. High level description:
      • Enhancement or rework
        1. [TEST] Enhancement of <original test's name>.

        2. [TEST] New test.

      • Part of the system where mutants are

        Target <method with mutants>.

    2. Slice for each new mutant killed, starting from the killing assertion
      1. New oracle
        • NL paraphrase of the assertion

          The <checked property> is checked.

        • NL description of the impact of the mutant
          • variables with different values

            If <changed variables> have different values,

          • branches differences

            and <different branches> are explored, the test can detect them.

      2. New behaviour
        1. Enhancement (Old mutants are still detected)
          • new interactions with objects

            New interactions with <object> using <methods>.

          • if also new oracle enhancement
            • new branches during execution

        2. Rework
          • unit test documentation

            <methods> are called on <object>.

  • Learned about the visitor pattern. That name is regrettably confusing.
  • Mutant description
    • Whole method removal
      • "this method was previously not tested in a scenario where it is useful"
    • change of condition
      • "these branches were previously not tested in a scenario where they are useful"
    • change to value of variable
      • "this variable was previously not tested in a scenario where it is useful"
    • DON'T DESCRIBE MUTANTS
      • too complicated, huge repercussions, little insights
      • starting to question the relevance of focus on mutants in the same method
        • should see it the other way, check properties for this method
  • Got a cold 🤧

5.7.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
  • Assertion log, can there be REMOVE+ADD instead of MODIFY?

5.7.3 Planned Work [0/12]

  • Register for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [6/9]
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report
    • use explicit definitions
    • work on background sections
    • add Java examples
    • insist on the distinction between what and why information
    • describe thoroughly the oracle problem
  • Read papers
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)

5.8 DONE Week 8 <2018-03-26 Mon>–<2018-04-01 Sun>

5.8.1 Things Done

  • I did so much crap in the PR to refactor parent map 😣
  • Discovered the field of cognitive support for unit test comprehension
    • why didn't I hear about that before???
  • ID of mutator now available in report
  • Worked on the report

5.8.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
  • Assertion log, can there be REMOVE+ADD instead of MODIFY?
  • Should I focus solely on mutants description, amplifications descriptions or test case as a whole?
  • Need to identify the main object that is interacted with

5.8.3 Planned Work [2/12]

  • Register for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [1/4]
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report
    • use explicit definitions
    • work on background sections
    • add Java examples
    • insist on the distinction between what and why information
    • describe thoroughly the oracle problem
  • Read papers
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)

5.9 DONE Week 9 <2018-04-02 Mon>–<2018-04-08 Sun>

5.9.1 Things Done

  • Spent time on UCL PhD application and IELTS

5.9.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
  • Assertion log, can there be REMOVE+ADD instead of MODIFY?
  • Should I focus solely on mutants description, amplifications descriptions or test case as a whole?
  • Need to identify the main object that is interacted with

5.9.3 Planned Work [5/17]

  • Plan ICST trip
    • hotel
      • going to take the train every day (the train alone is 1h, one way)
    • train
  • Get reimbursed for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [1/4]
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [1/6]
    • use explicit definitions
    • work on background sections
    • add Java examples
    • insist on the distinction between what and why information
    • describe thoroughly the oracle problem
    • add papers from 'General/Others'
  • Read papers
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • SCOTCH might actually be useful
    • check other 'Not Relevant' papers
  • Meeting with Benoit <2018-04-03 Tue>
    • Preparation for ICST
      • hotel or train every day?
        • don't know
    • my future
      • Interesting PhD at UCL but already 3 internships that went badly
        • have to be motivated, ask questions
    • citing blog posts
      • yes
    • no subjective comment
    • a test is a subset of observable responses that are equal to specification
    • a program is complete but not specifications
    • good introduction to talk about abstract definition of verification
    • need to select subset of R (in def of test activities)
    • encapsulation principle hides stuff which means you can't observe them
      • if something is private you can't test it
    • delegations is difficult to test
    • extremely difficult to define levels of tests
    • how do you know you have a good oracle
    • fig 3, test inputs are only lines 2-5 & 7
    • be clear about what infos we have about the amplified test, how it is collected, etc.
    • speak about coverage more than mutants
    • decide between coverage enhancement, slicing or minimization, text or casual relationships
    • need a good use case throughout the thesis
  • Questions about the PhD position at UCL
  • Why is KTH blocking my ens-rennes mails
    • grey listed
    • send email to support
    • warn benoit and madeleine

5.10 DONE Week 10 <2018-04-09 Mon>–<2018-04-15 Sun>

5.10.1 Things Done

  • ICST
    • Infer, sapienz
    • 1st talk interesting, good intro on flaky tests
    • Repairnator paper
    • Tests as specifications, or something like that, a paper that gives formal definitions of tests
    • Talk about overfitting in thesis
    • Write use cases instead of evaluation with developers
    • Really good keynote for testing in the video game industry
    • Nice paper/presentation by Shin Hong
    • Good talk by José Rojas on my subject
      • Check references
    • Check assumptions generation for mutation testing (mutant assumption) (question for talk about mutation compression)
  • Code Defenders talk by José Rojas
  • Bachelor and Master workshop
    • Zimin's talk
      • what if you predict a line in another function that is closer to another prediction in the good function

5.10.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
  • Assertion log, can there be REMOVE+ADD instead of MODIFY?
  • Should I focus solely on mutants description, amplifications descriptions or test case as a whole?
  • Need to identify the main object that is interacted with

5.10.3 Planned Work [3/22]

  • ICST <2018-04-10 Tue>–<2018-04-12 Thu>
  • Apply to the PhD position
  • Check advice from last week's meeting with Benoit
  • Get reimbursed for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [1/4]
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [1/6]
    • use explicit definitions
    • work on background sections
    • add Java examples
    • insist on the distinction between what and why information
    • describe thoroughly the oracle problem
    • overfitting
  • Read papers
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • SCOTCH might actually be useful
    • check other 'Not Relevant' papers
  • read code defenders papers
  • check Repairnator paper
  • Tests as specifications, or something like that, a paper that gives formal definitions of tests
  • Use cases instead of evaluation
  • Check mutant assumption
  • Read José Rojas's ICST paper

5.11 DONE Week 11 <2018-04-16 Mon>–<2018-04-22 Sun>

5.11.1 Things Done

5.11.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
  • Assertion log, can there be REMOVE+ADD instead of MODIFY?
  • Should I focus solely on mutants description, amplifications descriptions or test case as a whole?
  • Need to identify the main object that is interacted with

5.11.3 Planned Work [8/21]

  • Change top mattress <2018-04-16 Mon 13:00>
  • Apply to the PhD position
  • Get reimbursed for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [1/4]
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [2/6]
    • work on background sections
    • add Java examples
    • insist on the distinction between what and why information
    • describe thoroughly the oracle problem
    • over fitting
    • ugly C#
  • Read papers
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • SCOTCH might actually be useful
    • check other 'Not Relevant' papers
  • read code defenders papers
  • check Repairnator paper
    • example of software development bot
  • Tests as specifications, or something like that, a paper that gives formal definitions of tests
  • Use cases instead of evaluation
  • Check mutant assumption
    • it's mutation applied at the software design level
  • Read José Rojas's ICST paper

5.12 DONE Week 12 <2018-04-23 Mon>–<2018-04-29 Sun>

5.12.1 Things Done

  • Proofread Long's paper
  • Mattias' presentation

5.12.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
  • Assertion log, can there be REMOVE+ADD instead of MODIFY?
    • there seems to be a lot of MODIFY
  • Should I focus solely on mutants description, amplifications descriptions or test case as a whole?
  • Need to identify the main object that is interacted with
  • Ugly to pass the AmplificationListener around

5.12.3 Planned Work [3/17]

  • Apply to the PhD position
  • Get reimbursed for ICST
  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [1/5]
    • use another counter to keep the pointer to AST nodes of amplifications
    • add categories of modifying amplifiers
      • MODIFY LITERALS
      • MODIFY INTERACTIONS
    • write test to make sure every amplifier includes the counter update
    • write test to make sure every amplifier logs the amplifications
    • identify amplification when writing the JSON report for the test case
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [1/6]
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • check other 'Not Relevant' papers
  • read code defenders papers
  • concurrent map in amplification listener
  • upgrade jacoco version
    • not my job, and PIT is probably also not supporting java10
  • unspecify tests [4/5]
    • use a Set to avoid failing tests due to the ordering of assertions (see the sketch after this list)
    • testBuildNewAssertWithComment(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testBuildNewAssert(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testNoInstrumentationOnGeneratedObject(fr.inria.diversify.dspot.assertGenerator.AssertGeneratorHelperTest): expected:<..."test_sd6__1");(..)
    • testWithLoop(fr.inria.diversify.dspot.amplifier.StatementAddTest): expected:<...compute(0);(..)
    • find others that could fail
    • problem of readability when it fails
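
A minimal sketch of the "use a Set" idea above: instead of comparing the amplified test's body against a single expected string, which hard-codes one ordering of the generated assertions, extract the assertion statements and compare them as sets. The class and helper names below (e.g. extractAssertionStatements) are hypothetical, not DSpot's actual test API.

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

import org.junit.Test;

public class OrderInsensitiveAssertionCheckTest {

  // Hypothetical helper: pull the assertion statements out of a generated test body.
  private static Set<String> extractAssertionStatements(String testBody) {
    return Arrays.stream(testBody.split(";"))
        .map(String::trim)
        .filter(statement -> statement.contains("Assert.assert"))
        .collect(Collectors.toSet());
  }

  @Test
  public void generatedAssertionsMatchRegardlessOfOrder() {
    // Same assertions, generated in a different order.
    String expected = "org.junit.Assert.assertTrue(cl.getTrue());"
        + "org.junit.Assert.assertFalse(cl.getFalse());";
    String generated = "org.junit.Assert.assertFalse(cl.getFalse());"
        + "org.junit.Assert.assertTrue(cl.getTrue());";

    // Comparing sets makes the check independent of the assertion ordering.
    assertEquals(extractAssertionStatements(expected),
        extractAssertionStatements(generated));
  }
}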

5.13 DONE Week 13 <2018-04-30 Mon>–<2018-05-06 Sun>

5.13.1 Things Done

  • DSpot on javapoet
project=.
targetModule=.
src=src/main/java/
testSrc=src/test/java
javaVersion=8
outputDirectory=dspot-out/
filter=com.squareup.javapoet.*
  • there are new amplified tests but only with new amplifications
  • DSpot on mustache.java (9.0) compiler
  • Listener only collects a-amplifications; trying something else, like a global counter (see the sketch below)
    • and it was very slow
  • First version of prmessagegen
python3 main.py -amplog ../dspot/javapoet/dspot-out/com.squareup.javapoet.TypeNameTest_amp_log.json -mutants ../dspot/javapoet/dspot-out/com.squareup.javapoet.TypeNameTest_mutants_killed.json
for filename in ../dspot/javapoet/dspot-out/*_amp_log.json; do python3 main.py -test "${filename%_amp_log.json}"; done
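
A minimal sketch of the "global counter" idea mentioned above: each amplifier reports every amplification it applies to a single static registry, which can later be dumped into the *_amp_log.json report. All names here (AmplificationLog, Category, record) are hypothetical, not DSpot's actual classes.

import java.util.ArrayList;
import java.util.List;

public final class AmplificationLog {

  // Coarse categories of amplifications, matching the MODIFY LITERALS /
  // MODIFY INTERACTIONS distinction planned for the amplifiers.
  public enum Category { ADD_ASSERTION, MODIFY_LITERALS, MODIFY_INTERACTIONS }

  public static final class Entry {
    final int id;              // global, monotonically increasing counter
    final String testName;     // amplified test the change belongs to
    final Category category;   // kind of amplification applied
    final String description;  // e.g. "replaced literal 0 with 1"

    Entry(int id, String testName, Category category, String description) {
      this.id = id;
      this.testName = testName;
      this.category = category;
      this.description = description;
    }
  }

  private static final List<Entry> ENTRIES = new ArrayList<>();
  private static int counter = 0;

  // Called by an amplifier right after it applies a change; the returned id can
  // serve as the pointer to the corresponding AST node across amplification rounds.
  public static synchronized int record(String testName, Category category, String description) {
    ENTRIES.add(new Entry(counter, testName, category, description));
    return counter++;
  }

  public static synchronized List<Entry> entries() {
    return new ArrayList<>(ENTRIES);
  }

  private AmplificationLog() { }
}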

5.13.2 Blocking Points

  • Assertion count
    • Is the process: remove all assertions and generate all possible assertions?
    • if so then all assertions are counted as amplifications
    • it's fine
  • Should I focus solely on mutants description, amplifications descriptions or test case as a whole?
    • do everything and compare
  • Need to identify the main object that is interacted with
  • Ugly to pass the AmplificationListener around
    • stand-alone tool

5.13.3 Planned Work [12/19]

  • Simple NL commit messages generator
  • Classification of mutators
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Study commit messages related to tests
  • More precise try/catch would actually be useful for slicing
  • Retrieve amplifications [3/4]
    • add categories of modifying amplifiers
      • MODIFY LITERALS
      • MODIFY INTERACTIONS
    • write test to make sure every amplifier includes the counter update
    • write test to make sure every amplifier logs the amplifications
    • identify amplification when writing the JSON report for the test case
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [1/5]
  • formal description of the NL description
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • check other 'Not Relevant' papers
  • read code defenders papers
  • concurrent map in amplification listener
  • unspecify tests [4/5]
    • use a Set to avoid failing tests due to the ordering of assertions
    • testBuildNewAssertWithComment(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testBuildNewAssert(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testNoInstrumentationOnGeneratedObject(fr.inria.diversify.dspot.assertGenerator.AssertGeneratorHelperTest): expected:<..."test_sd6__1");(..)
    • testWithLoop(fr.inria.diversify.dspot.amplifier.StatementAddTest): expected:<...compute(0);(..)
    • find others that could fail
    • problem of readability when it fails
    • https://github.com/STAMP-project/dspot/pull/413
  • Referees reminder (deadline <2018-05-11 Fri>)
  • Answer Benoit's mail on Towards Automatically Generating Descriptive Names for Unit Tests
    • they partly analyse the text in the test's body
    • Action
    • Expected Outcome
    • Scenario Under Test
    • we want to explain amplifications
  • Meeting Benoit <2018-05-02 Wed 17:00>–<2018-05-02 Wed 18:00>
    • to present
      • the state of the art: what the papers you have read do and what differentiates your approach from the others
      • the state of your tool: where it stands with respect to DSpot, what it can do
      • your plans for an experimental validation of this tool
      • your plans until the end of the internship
    • print report
    • discuss SCAM (paper deadline <2018-06-15 Fri>)
      • I'll never have a contribution and a report by <2018-06-08 Fri>, so adding a paper on top of that…
    • a table for the related works could still be nice, in addition to the text
    • abstract myself from failing tests
    • hyperlink of the line of the mutant
    • have the generator independent from DSpot
      • serialize the list of amplifications
    • multiple versions of generators
    • pick a text extension
    • have a discussion on the different versions
    • use projects from the evaluation in the paper, not dhell
  • mutants reported should only be the new ones (see the sketch after this list)
    • can remove, afterwards, the redundant mutants already killed by the parent test
  • remove discarded test from the amplification log
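
A minimal sketch of the "only the new ones" filtering above: keep the mutants killed by the amplified test that its parent (original) test does not already kill. Mutant identifiers are assumed to come from the PIT report; the class and method names are hypothetical.

import java.util.HashSet;
import java.util.Set;

public final class NewMutantsFilter {

  // Returns the mutants that only the amplified test detects.
  public static Set<String> newMutants(Set<String> killedByAmplified, Set<String> killedByParent) {
    Set<String> onlyNew = new HashSet<>(killedByAmplified);
    onlyNew.removeAll(killedByParent);  // drop mutants the parent test already kills
    return onlyNew;
  }

  private NewMutantsFilter() { }
}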

5.14 DONE Week 14 <2018-05-07 Mon>–<2018-05-13 Sun>

5.14.1 Things Done

  • New command line usage: python3 main.py -p ../dspot/javapoet/dspot.properties -t com.squareup.javapoet.JavaFileTest
  • for link previews to work on GitHub
    • the URL can't be inside a Markdown link
    • the URL can't contain /./
  • Using a distance metric for mutants to express how indirect the mutant detection is?
    • that's an unclear and confusing metric
  • run DSpot on jsoup
    • very long
    • NPE on minimisation
  • run DSpot on twilio
    • way too big
  • running DSpot on dhell doesn't yield anything
  • Pierre Laperdrix talk on browser fingerprinting
  • run DSpot on scribejava-core, no i-amplified test
  • tried on jodatime but JUnit too old
  • socketio test suite not passing
  • javacpp test suite not passing
  • google-java-format seems to be way too long
  • nothing for fb-contrib
  • fess is too slow
  • run on xwiki-commons-core
  • didn't work on jabref
  • didn't work on webdrivermanager

5.14.2 Blocking Points

  • Need to identify the main object that is interacted with
  • What does it mean to have only a-amp, when the original tests are already a-amplified?
    • nah, must be a problem, I don't collect all a-amps

5.14.3 Planned Work [6/15]

  • Renew SL access card <2018-05-08 Tue>
  • Workshop <2018-05-08 Tue>
    • Room 4423
  • Study for IELTS
  • Withdraw UCL application
    • withdraw online application
    • people to apologise to
      • Justyna
      • Shin
      • ENS profs
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Retrieve amplifications [0/1]
    • add categories of modifying amplifiers
      • MODIFY LITERALS
      • MODIFY INTERACTIONS
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [0/4]
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • unspecify tests [4/5]
    • use a Set to avoid failing tests due to the ordering of assertions
    • testBuildNewAssertWithComment(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testBuildNewAssert(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testNoInstrumentationOnGeneratedObject(fr.inria.diversify.dspot.assertGenerator.AssertGeneratorHelperTest): expected:<..."test_sd6__1");(..)
    • testWithLoop(fr.inria.diversify.dspot.amplifier.StatementAddTest): expected:<...compute(0);(..)
    • find others that could fail
    • problem of readability when it fails
    • https://github.com/STAMP-project/dspot/pull/413
  • improvements [4/10]
    • group assertions
    • Add links in report
    • "The new test can detect if toBuilder returns XXX instead of the regular value. The original test 'toto' could not detect this fault" where XXX is the value injected by the mutation
      • isn't really suitable for other kinds of mutants, nor even for all return-related mutants: what might be interesting is that they change the state of the SUT without having a direct relation with the test case
    • also consider when mutation modifies the state and is detected later
    • Don't name mutators, only explain the transformation instance (i.e. mutator category?)
    • Group mutants that are on the same line
    • show the generated test with a diff
      • don't put a diff for assertions when there are too many (more than 10)
    • don't talk about 'new detectable bugs' but about 'assess more behavior than original' and/or 'reach more paths than original'
    • For example, "this new test assesses more behavior than the original: it can detect 5 changes in the source code that the original test could not detect:" and then show the changes (diffs); see the sketch after this list
    • handle try/catch
  • Read dspot.properties and read various info like the path
  • Measure the overhead of the amplification logging
    • javapoet
      • DSpot(master): 47m22s
      • DSpot(collect_amp): 45m52s 45m50s 52m3s
  • Make better names for generated tests
  • Don't add try/catch when @Test(expected = IllegalArgumentException.class)
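
A minimal sketch of the "assesses more behavior than the original" phrasing planned above: build the description from the list of changes (new mutants) that only the amplified test detects. MutantInfo and its fields are hypothetical stand-ins for the data read from the PIT report and the amplification log.

import java.util.List;

public final class AmplifiedTestDescriber {

  public static final class MutantInfo {
    final String location;   // e.g. "TypeName.java:113"
    final String change;     // e.g. "replaced return value with null"

    public MutantInfo(String location, String change) {
      this.location = location;
      this.change = change;
    }
  }

  public static String describe(String originalTestName, List<MutantInfo> newMutants) {
    StringBuilder sb = new StringBuilder();
    sb.append("This new test assesses more behavior than the original '")
      .append(originalTestName)
      .append("': it can detect ")
      .append(newMutants.size())
      .append(" change(s) in the source code that the original test could not detect:\n");
    for (MutantInfo mutant : newMutants) {
      // One line per change; a diff or a hyperlink to the mutated line could replace this.
      sb.append("  - ").append(mutant.change).append(" at ").append(mutant.location).append('\n');
    }
    return sb.toString();
  }

  private AmplifiedTestDescriber() { }
}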

5.15 TODO Week 15 <2018-05-14 Mon>–<2018-05-20 Sun>

5.15.1 Things Done

5.15.2 Blocking Points

  • Need to identify the main object that is interacted with

5.15.3 Planned Work [0/11]

  • IELTS Speaking <2018-05-18 Fri 10:00>
  • IELTS <2018-05-19 Sat 08:30>
  • Integrate WALA to compute a slice per new mutant
    • Adding as dependency
    • Use it
  • Retrieve amplifications [0/1]
    • add categories of modifying amplifiers
      • MODIFY LITERALS
      • MODIFY INTERACTIONS
  • Retrieve mutants
    • The issue on runtime info from PIT
  • Work on report [0/4]
  • ask for opinions (e.g. Simon U, XWiki, SAT4J)
  • unspecify tests [4/5]
    • use a Set to avoid failing tests due to the ordering of assertions
    • testBuildNewAssertWithComment(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testBuildNewAssert(fr.inria.diversify.dspot.assertGenerator.MethodsAssertGeneratorTest): expected:<....junit.Assert.assert[True(((fr.inria.sample.ClassWithBoolean)cl).getTrue());(..)
    • testNoInstrumentationOnGeneratedObject(fr.inria.diversify.dspot.assertGenerator.AssertGeneratorHelperTest): expected:<..."test_sd6__1");(..)
    • testWithLoop(fr.inria.diversify.dspot.amplifier.StatementAddTest): expected:<...compute(0);(..)
    • find others that could fail
    • problem of readability when it fails
    • https://github.com/STAMP-project/dspot/pull/413
  • improvements [0/6]
    • "The new test can detect if toBuilder returns XXX instead of the regular value. The original test 'toto' could not detect this fault" where XXX is the value injected by the mutation
      • isn't really suitable for other kinds of mutants, nor even for all return-related mutants: what might be interesting is that they change the state of the SUT without having a direct relation with the test case
    • also consider when mutation modifies the state and is detected later
    • Don't name mutators, only explain the transformation instance (i.e. mutator category?)
    • don't talk about 'new detectable bugs' but about 'assess more behavior than original' and/or 'reach more paths than original'
    • For example, "this new test assesses more behavior than the original: it can detect 5 changes in the source code that the original test could not detect:" and then show the changes (diffs)
    • handle try/catch
  • Make better names for generated tests
  • Don't add try/catch when @Test(expected = IllegalArgumentException.class); see the sketch below
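
A minimal sketch of the try/catch item above: when the test already declares @Test(expected = ...), the amplified version should not also wrap the call in a try/catch. Integer.parseInt is only a stand-in for a SUT call that throws an IllegalArgumentException.

import org.junit.Test;

public class ExpectedExceptionStyleTest {

  // Redundant style: the generated try/catch duplicates what the expected clause already does.
  @Test(expected = IllegalArgumentException.class)
  public void amplifiedWithRedundantTryCatch() {
    try {
      Integer.parseInt("not a number");  // throws NumberFormatException, an IllegalArgumentException
    } catch (IllegalArgumentException e) {
      throw e;  // re-thrown only so the expected clause still sees it
    }
  }

  // Preferred style: keep the original declaration and let it handle the exception.
  @Test(expected = IllegalArgumentException.class)
  public void amplifiedWithoutTryCatch() {
    Integer.parseInt("not a number");
  }
}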

5.16 TODO Week 16 <2018-05-21 Mon>–<2018-05-27 Sun>

5.16.1 Things Done

5.16.2 Blocking Points

5.16.3 Planned Work

5.17 TODO Week 17 <2018-05-28 Mon>–<2018-06-03 Sun>

5.17.1 Things Done

5.17.2 Blocking Points

5.17.3 Planned Work

5.18 TODO Week 18 <2018-06-04 Mon>–<2018-06-10 Sun>

5.18.1 Things Done

5.18.2 Blocking Points

5.18.3 Planned Work [0/2]

  • Report <2018-06-08 Fri 12:00>
  • Renew SL access card <2018-06-07 Thu>

5.19 TODO Week 19 <2018-06-11 Mon>–<2018-06-17 Sun>

5.19.1 Things Done

5.19.2 Blocking Points

5.19.3 Planned Work

5.20 TODO Week 20 <2018-06-18 Mon>–<2018-06-24 Sun>

5.20.1 Things Done

5.20.2 Blocking Points

5.20.3 Planned Work [0/1]

  • Defense Rehearsal @ ENS <2018-06-22 Fri>

5.21 TODO Week 21 <2018-06-25 Mon>–<2018-07-01 Sun>

5.21.1 Things Done

5.21.2 Blocking Points

5.21.3 Planned Work [0/1]

  • Defense <2018-06-25 Mon>

6 Conclusion

Date: 2018-02-07 Wed 00:00

Author: Simon Bihel

Created: 2018-05-13 Sun 21:15
