All right, next up, Shivam Gupta.
Okay.
Good afternoon, everyone.
Today I will be talking about my GSoC project.
It was mentored by Henrik Olson, and the project is about
patch-based code coverage testing for LLVM patches.
The agenda for this talk: first we will introduce what the project is about,
then the terminology we use, like how LLVM test cases are written,
and then we will see what LLVM source-based code coverage is, which is used to get the code coverage of a patch.
Then we will see how it is implemented.
It is basically a Python script, so we will see which functions are used to implement this tool.
And then we will see a demo on a patch that is already there in the LLVM community.
We will see which lines are covered and which lines are not covered with this patch.
So we will start with the introduction.
So LLVM test cases are written in the lit format.
Regression tests are written in lit format, and unit test cases are written in the Google Test
or Google Mock formats.
The goal of this project is to help developers create good test coverage for their patches,
and it will also help reviewers know whether the code being submitted
has good test coverage or not.
So this is the project, and to accomplish it we have created a Python tool.
It's around 800 lines of Python code. It takes the patch as input,
then it extracts some information, like which source lines are in the patch
and which test cases and test case lines are in the patch,
and then we build the LLVM project with code coverage enabled.
So it will instrument our binaries.
Whenever we run a test case with such a binary,
it will generate a profile data file that is further processed,
and then the tool shows which source lines of the patch are covered and which are not covered.
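The "extract the source lines in the patch" step can be illustrated with a minimal sketch. This is not the tool's actual code: the function name `added_lines` is hypothetical, and the sketch assumes the patch is a plain unified diff as produced by `git diff`.

```python
import re
from collections import defaultdict

# Matches a unified-diff hunk header and captures the starting line
# number on the "new file" side, e.g. "@@ -10,3 +10,4 @@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(patch_text):
    """Map each file in a unified diff to the new-file line numbers it adds.

    Hypothetical helper for illustration only. An added line whose own
    content starts with "++ " would be mistaken for a file header.
    """
    result = defaultdict(list)
    current_file = None
    new_line = 0
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("+++ "):
            # File header: remember the path, dropping git's "b/" prefix.
            path = line[4:].strip()
            current_file = None if path == "/dev/null" else re.sub(r"^b/", "", path)
            in_hunk = False
            continue
        match = HUNK_RE.match(line)
        if match:
            new_line = int(match.group(1))
            in_hunk = True
            continue
        if not in_hunk or current_file is None:
            continue
        if line.startswith("+"):
            # Added line: record it and advance the new-file counter.
            result[current_file].append(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            # Removed lines and "\ No newline" markers do not exist in the new file.
            continue
        else:
            # Context line: present in the new file, but not modified.
            new_line += 1
    return dict(result)
```

The resulting line numbers are what a coverage report would later be checked against.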
So the LLVM test suites basically have two kinds of test cases written for any patch.
One is regression tests, and the second is unit tests.
Mainly, regression tests are written for most of the patches.
These regression tests are in .ll format or .c format for different tools.
So mostly our focus is on regression tests, and some test cases are written as unit tests.
These unit tests are tests for libraries, like the Support library or the ADT data types.
So these check a feature and how well it is integrated in the system.
So this is the unit test case.
The regression test is very small, but you can see at the top there is a RUN line, which is what actually runs for this test case.
Then there is the unit test case.
This is using the Google Test library.
So it has some macros to check.
It is not important, but these two kinds of test cases are in LLVM for any patch.
And then we will see what source-based code coverage is.
Source-based code coverage consists of three steps.
The first step is compiling the program with coverage enabled.
We want to instrument the binary.
So we will use the -fprofile-instr-generate flag, and this will generate a foo binary which is instrumented.
In the next step, when we run this binary, it will generate a profraw file.
That profraw file contains the data for creating coverage reports later.
The next tool is llvm-profdata.
This tool is used to convert the profraw format to the profdata format, which is then used by llvm-cov to generate or show the report of which lines are covered or not.
In the next slide we have a simple test case, and I have generated the report.
It checks if a number is even or odd.
If we pass, suppose, 5, it will say that the number is odd, and this line under the if condition will not run.
So it will show like this.
This is the report of llvm-cov for any program.
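The compile, run, index, and report sequence can be sketched as a list of commands. This is a minimal illustration, not the slide's exact content: the helper name `coverage_commands` and the file names `foo.c`, `foo.profraw`, and `foo.profdata` are assumptions, and `-fcoverage-mapping` is included because Clang's source-based coverage needs it alongside `-fprofile-instr-generate`.

```python
import shlex


def coverage_commands(source="foo.c", binary="foo"):
    """Return the command lines for a basic source-based coverage run.

    Hypothetical helper; file names are placeholders.
    """
    profraw = f"{binary}.profraw"
    profdata = f"{binary}.profdata"
    return [
        # 1. Compile with instrumentation and coverage mapping.
        ["clang", "-fprofile-instr-generate", "-fcoverage-mapping",
         source, "-o", binary],
        # 2. Run the instrumented binary; LLVM_PROFILE_FILE names the raw profile.
        ["env", f"LLVM_PROFILE_FILE={profraw}", f"./{binary}"],
        # 3. Index the raw profile into the profdata format.
        ["llvm-profdata", "merge", "-sparse", profraw, "-o", profdata],
        # 4. Show a line-by-line coverage report.
        ["llvm-cov", "show", f"./{binary}", f"-instr-profile={profdata}"],
    ]


if __name__ == "__main__":
    for cmd in coverage_commands():
        print(shlex.join(cmd))
```

Each inner list can be passed to `subprocess.run` on a machine where clang and the LLVM tools are on the PATH.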
Next is the implementation.
For the implementation I have submitted two patches.
The first one is about a change in LLVM lit.
This is the testing tool that is used to run the regression test cases in LLVM.
Initially, whenever we ran a test case, it would generate the profile data under some random name.
So we have modified that and given a proper name for every test case.
So it will generate a properly named file in a specific directory.
So this is the categorization of the profile data.
Next we have the main tool, which has all the functions that parse the patch, build the LLVM project, and generate the data,
then process the data to show the coverage report to the reviewer or the patch author.
Next, these are some of the functions that are implemented in the tool.
The first two functions are just logging functions.
And then it goes sequentially, as the names suggest: first we create the patch from the last commit or from the patch itself.
Then we extract the source files, and then we have a function to write the source file allow list, which is used to reduce the coverage data.
Because if we generate the coverage data for all the files of LLVM, then it will be around 150 MB for each test case.
So it would be difficult to process later.
So we have used a flag, -fprofile-list.
This flag is used to generate coverage data for only the files in the patch.
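Writing such an allow list could look roughly like the sketch below. The helper name `profile_list_text` is hypothetical, and the `source:<path>=allow` and `default:skip` entries are an assumption about the profile list syntax based on the Clang user manual; the exact syntax has changed between Clang versions, so check the manual for the version in use.

```python
def profile_list_text(source_files):
    """Build the contents of a file to pass to clang via -fprofile-list=<file>.

    Hypothetical helper. The entry syntax ("source:<path>=allow",
    "default:skip") is assumed from the Clang user manual and may differ
    in older Clang releases.
    """
    # One allow entry per patched source file, deduplicated and sorted
    # so the output is stable across runs.
    lines = [f"source:{path}=allow" for path in sorted(set(source_files))]
    # Everything not listed above is left uninstrumented.
    lines.append("default:skip")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(profile_list_text(["llvm/lib/Support/APInt.cpp"]), end="")
```

Restricting instrumentation this way is what keeps the per-test profile data small enough to process.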
Next we extract the modified source lines from the patch, and then we build the project.
We build the project with the flag LLVM_BUILD_INSTRUMENTED_COVERAGE.
This flag is passed during the CMake invocation.
When we pass this, the binaries that are built for the LLVM project will have instrumentation enabled.
And then we run a single test case with coverage, which is a helper function, and next we run the modified lit test cases or unit test cases.
If the patch contains a lit test case, then it will run that function for the regression test,
and if it has a unit test case, then the other function will be called and the test case will run.
Next we have process coverage data, which will process the data.
And next, similarly, we have a function for the coverage files, and it will run.
Then we have print coverage details, which actually prints the coverage details.
We also have a log file, because print coverage details prints a lot of details.
So it will print some things to the log file, and then we print the common uncovered lines. So in a patch there is one source file,
but there are many test cases.
If one test case covers a source line, then it is covered.
But if none of the test cases cover a source line, then it means that this line is uncovered.
So it will print the uncovered lines this way, and then there are some helper functions which are not important.
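The "common uncovered line" rule can be written down as a small set computation. This is a sketch of the idea only; the function name `common_uncovered` and the input shapes are hypothetical, not the tool's actual data structures.

```python
def common_uncovered(modified_lines, covered_by_test):
    """Return modified lines that no test case covers.

    modified_lines: iterable of line numbers changed by the patch.
    covered_by_test: dict mapping a test name to the line numbers that
    test executed in the same source file.
    Hypothetical helper for illustration.
    """
    uncovered = set(modified_lines)
    for covered in covered_by_test.values():
        # A line covered by any single test is no longer "commonly uncovered".
        uncovered -= set(covered)
    return sorted(uncovered)


if __name__ == "__main__":
    # Lines 10-13 were modified; two tests cover 10, 11 and 12 between them.
    print(common_uncovered([10, 11, 12, 13],
                           {"a.ll": {10, 11}, "b.ll": {11, 12}}))
```

With no test cases at all, every modified line is reported as uncovered.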
This is the GitHub CI workflow, which is a file that is used to compile the project on GitHub.
So it is building the project, and then at the end it is running a Python script, git-check-coverage.
This is the file name. So it will run the Python code here, and then it will print the coverage result.
I will show you; this is the format.
It will show the common uncovered lines for the...