<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Writing your own checks</title>
<link rel="stylesheet" type="text/css" href="mystyle.css"/>
</head>

<body>

<!-- The header -->
<table border="0" width="100%" summary="header layout">
<tr>
<td>
<h1>Writing your own Checks</h1>
</td>
<td align="right">
<img src="logo.png" alt="Checkstyle Logo"/>
</td>
</tr>
</table>

<!-- content -->
<table border="0" width="100%" cellpadding="5" summary="body layout">
<tr>
<!--Left menu-->
<td class="menu" valign="top">
<p><a href="#overview">Overview</a></p>
<p><a href="#checks">Writing Checks</a></p>
<ul>
<li><a href="#grammar">Java Grammar</a></li>
<li><a href="#gui">Checkstyle GUI</a></li>
<li><a href="#visitor">Visitor Pattern</a></li>
<li><a href="#regtokens">Visitor in Action</a></li>
<li><a href="#astnav">Navigating the AST</a></li>
<li><a href="#configchecks">Defining Properties</a></li>
<li><a href="#logerrors">Logging Errors</a></li>
<li><a href="#integrate">Integrating Checks</a></li>
<li><a href="#limitations">Limitations</a></li>
</ul>
<p><a href="#filesetchecks">Writing FileSetChecks</a></p>
<p><a href="#huh">Huh?</a></p>
</td>

<!--Content-->
<td class="content" valign="top" align="left">
<a name="overview"></a>
<h2>Overview</h2>
<p class="body">
OK, so you have finally decided to write your own check.
Welcome aboard, this is really a fun thing to do. There are
actually two kinds of checks, so before you can start, you have
to find out which kind of check you want to implement.
</p>

<p class="body">
The functionality of Checkstyle is implemented in modules that
can be plugged into Checkstyle. Modules can be containers for
other modules, i.e. they form a tree structure. The top-level
modules that are known directly to the Checkstyle kernel (which
is also a module and forms the root of the tree) are
FileSetChecks. These are pretty simple to grasp: they take a set
of input files and fire error messages.
</p>

<p class="body">
Checkstyle provides a few FileSetCheck implementations by
default and one of them happens to be the TreeWalker. A
TreeWalker typically has some submodules, called Checks. The
TreeWalker operates by separately transforming each of the Java
input files into an abstract syntax tree and then handing the
result over to each of the Check submodules, each of which in
turn looks at a certain aspect of the tree.
</p>

<a name="checks"></a>
<h2>Writing Checks</h2>
<p class="body">

Most of the functionality of Checkstyle is implemented as
Checks. If you know how to write your own Checks, you can extend
Checkstyle according to your needs without having to wait for
the Checkstyle development team. You are about to become a
Checkstyle Expert.
</p>

<p class="body">
Suppose you have a convention that the number of methods in a
class should not exceed a certain limit, say 30. This rule makes
sense: a class should only do one thing and do it well. With a
zillion methods, chances are that the class does more than one
thing. The only problem you have is that your convention is not
checked by Checkstyle, so you'll have to write your own check
and plug it into the Checkstyle framework.
</p>

<p class="body">
This chapter is organized as a tour that takes you through the
process step by step and explains both the theoretical
foundations and the Checkstyle API along the way.
</p>

<a name="grammar"></a>
<h3>Java Grammar</h3>
<p class="body">
Every Java program is structured into files, and each of these
files has a certain structure. For example, if there is a
package statement then it is the first line of the file that is
not a comment or whitespace. After the package statement comes a
list of import statements, which is followed by a class or
interface definition, and so on.
</p>
<p class="body">
If you have ever read an introductory-level Java book, you probably
know all of the above. And if you have studied computer science,
you probably also know that the rules of the Java language
can be formally specified using a grammar (this statement is
simplified for didactic purposes).
</p>
<p class="body">
Tools exist which read a grammar definition and produce a parser
for the language that is specified in the grammar. In other
words, the output of the tool is a program that can transform a
stream of characters (a Java file) into a tree representation
that reflects the structure of the file. Checkstyle uses the
parser generator <a href="http://www.antlr.org">ANTLR</a>, but
that is an implementation detail you do not need to worry about
when writing checks. Several other parser generators exist, and
they all work well.
</p>

<a name="gui"></a>
<h3>The Checkstyle SDK GUI</h3>
<p class="body">
Still with us? Great, you have mastered the basic theory, so here
is your reward - a GUI that displays the structure of a Java
source file. To run it, type the following on the command line:
</p>
<pre>
java -classpath checkstyle-all-${version}.jar com.puppycrawl.tools.checkstyle.gui.Main
</pre>
<p class="body">
Click the button at the bottom of the frame
and select a syntactically correct Java source file. The frame
will be populated with a tree that corresponds to the structure
of the Java source code.
</p>

<p class="body">
TODO: screenshot
</p>

<p class="body">
In the leftmost column you can open and close branches of the
tree; the remaining columns display information about each node
in the tree. The second column displays a token type for each
node. As you navigate from the root of the tree to one of the
leaves, you'll notice that the token type denotes smaller and
smaller units of your source file, i.e. close to the root you
might see the token type CLASS_DEF (a node that represents a
class definition) while you will see token types like IDENT (an
identifier) near the leaves of the tree.
</p>

<p class="body">
We'll get back to the details in the other columns later; they
are important for implementing checks but not for understanding
the basic concepts. For now it is sufficient to know that the
GUI is a tool that lets you look at the structure of a Java
file, i.e. you can see the Java grammar 'in action'.
</p>

<a name="visitor"></a>
<h3>Understanding the visitor pattern</h3>
<p class="body">
Ready for a bit more theory? OK, here it comes: the last thing
you need before you can start writing checks is an understanding
of the Visitor pattern.
</p>

<p class="body">
When working with ASTs, a simple approach to defining check operations
on them would be to add a check() method to the class that defines
the AST nodes. For example, our AST type could have a method
checkNumberOfMethods(). Such an approach would suffer from a few
serious drawbacks. Most importantly, it does not provide an extensible
design: the checks have to be known at compile time, and there is no
way to write plugins.
</p>

<p class="body">
Hence Checkstyle's AST classes do not have any methods that implement
checking functionality. Instead, Checkstyle's TreeWalker takes a set
of objects that conform to a Check interface. OK, you're right -
actually it's not an interface but an abstract class that provides
some helper methods. A Check provides
methods that take an AST as an argument and perform the checking
process for that AST, most prominently <span
class="code">visitToken()</span>.
</p>

<p class="body">

It is important to understand that the individual Checks do not
drive the AST traversal. Instead, the TreeWalker initiates a recursive
descent from the root of the AST to the leaf nodes and calls the Check
methods.
</p>

<p class="body">
Before any visitor method is called, the TreeWalker will call
beginTree() to give the Check a chance to do some
initialization. Then, when performing the recursive descent from the
root to the leaf nodes, the visitToken() method is called. Unlike the
basic examples in the pattern book, there is a visitToken() counterpart
called leaveToken(). The TreeWalker will call that method to signal
that the subtree below the node has been processed and the TreeWalker
is backtracking from the node. After the root node has been left, the
TreeWalker will call finishTree().
</p>

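<p class="body">
The call order described above can be sketched with a toy model.
Everything in this sketch (ToyNode, ToyCheck, ToyWalker, TraceCheck,
VisitorSketch) is a hypothetical stand-in written for illustration,
not part of the Checkstyle API, and unlike the real TreeWalker the toy
walker calls visitToken() for every node rather than only for
registered token types.
</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Toy AST node; a stand-in for illustration, not Checkstyle's DetailAST. */
class ToyNode {
    final String type;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String type, ToyNode... kids) {
        this.type = type;
        for (ToyNode kid : kids) {
            children.add(kid);
        }
    }
}

/** Toy counterpart of the Check base class with the four callbacks. */
abstract class ToyCheck {
    void beginTree(ToyNode root) { }
    void visitToken(ToyNode node) { }
    void leaveToken(ToyNode node) { }
    void finishTree(ToyNode root) { }
}

/** Drives the traversal; the checks never walk the tree themselves. */
class ToyWalker {
    private final List<ToyCheck> checks = new ArrayList<>();

    void addCheck(ToyCheck check) {
        checks.add(check);
    }

    void walk(ToyNode root) {
        for (ToyCheck c : checks) c.beginTree(root);
        descend(root);
        for (ToyCheck c : checks) c.finishTree(root);
    }

    private void descend(ToyNode node) {
        // visit the node, process its subtree, then signal backtracking
        for (ToyCheck c : checks) c.visitToken(node);
        for (ToyNode child : node.children) descend(child);
        for (ToyCheck c : checks) c.leaveToken(node);
    }
}

/** Records the order in which the walker invokes the callbacks. */
class TraceCheck extends ToyCheck {
    final List<String> trace = new ArrayList<>();

    @Override void beginTree(ToyNode root) { trace.add("beginTree"); }
    @Override void visitToken(ToyNode node) { trace.add("visit " + node.type); }
    @Override void leaveToken(ToyNode node) { trace.add("leave " + node.type); }
    @Override void finishTree(ToyNode root) { trace.add("finishTree"); }
}

class VisitorSketch {
    /** Walks a tiny hand-built tree and returns the recorded call order. */
    static List<String> run() {
        ToyNode tree = new ToyNode("CLASS_DEF",
            new ToyNode("IDENT"),
            new ToyNode("METHOD_DEF", new ToyNode("IDENT")));
        TraceCheck check = new TraceCheck();
        ToyWalker walker = new ToyWalker();
        walker.addCheck(check);
        walker.walk(tree);
        return check.trace;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

<p class="body">
The recorded trace starts with beginTree, visits CLASS_DEF before any
of its children, leaves CLASS_DEF only after all of them have been
left, and ends with finishTree.
</p>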
<p class="body">
If you'd like to learn more about the Visitor pattern you should
grab a copy of the GoF
<a href="http://c2.com/cgi/wiki?DesignPatternsBook">Design
Patterns</a> book.
</p>

<a name="regtokens"></a>
<h3>Visitor in action</h3>
<p class="body">
Let's get back to our example and start writing code - that's why
you came here, right?
When you fire up the Checkstyle GUI and look at a few source
files, you'll figure out pretty quickly that you are mainly
interested in the number of tree nodes of type METHOD_DEF. The
number of such tokens should be counted separately for each
CLASS_DEF / INTERFACE_DEF.
</p>

<p class="body">
Hence we need to register the Check for the token types
CLASS_DEF and INTERFACE_DEF. The TreeWalker will only call
visitToken() for these token types. Because the requirements of
our task are so simple, there is no need to implement the other
fancy methods, like finishTree(), etc., so here is our first
shot at our check implementation:
</p>

<pre>
package com.mycompany.checks;

import com.puppycrawl.tools.checkstyle.api.*;

public class MethodLimitCheck extends Check
{
    /** maximum number of methods allowed per class or interface */
    private int max = 30;

    /** the TreeWalker calls visitToken() only for these token types */
    public int[] getDefaultTokens()
    {
        return new int[]{TokenTypes.CLASS_DEF, TokenTypes.INTERFACE_DEF};
    }

    public void visitToken(DetailAST ast)
    {
        // METHOD_DEF nodes are children of the OBJBLOCK node,
        // not of the CLASS_DEF/INTERFACE_DEF node itself
        DetailAST objBlock = ast.findFirstToken(TokenTypes.OBJBLOCK);
        int methodDefs = objBlock.getChildCount(TokenTypes.METHOD_DEF);
        if (methodDefs > max) {
            log(ast.getLineNo(), "too many methods, only " + max + " are allowed");
        }
    }
}
</pre>

<a name="astnav"></a>
<h3>Navigating the Abstract Syntax Tree (AST)</h3>
<p class="body">
In the example above you already saw that the DetailAST class
provides utility methods to extract information from the tree, like
getChildCount(). By now you have probably consulted the API
documentation and found that DetailAST additionally provides methods
for navigating around in the syntax tree, like finding the next
sibling of a node, the children of a node, the parent of a node, etc.
</p>

<p class="body">
These methods provide great power for developing complex
checks. Most of the checks that Checkstyle provides by default
use these methods to analyze the environment of the ASTs that
are visited by the TreeWalker. Don't abuse that feature for
exploring the whole tree, though. Let the TreeWalker drive the
tree traversal and limit the visitor to the neighbours of a
single AST.
</p>

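<p class="body">
To make "limit the visitor to the neighbours of a single AST" concrete,
here is a small self-contained sketch. MiniAst and NavigationSketch are
hypothetical names invented for this illustration, and the tree shape
is simplified; the only assumption carried over from Checkstyle is that
method definitions sit below an OBJBLOCK node inside the class
definition. The helper looks at the visited node's direct surroundings
and nothing else.
</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Toy node with parent and child links; not Checkstyle's DetailAST. */
class MiniAst {
    final String type;
    MiniAst parent;
    final List<MiniAst> children = new ArrayList<>();

    MiniAst(String type, MiniAst... kids) {
        this.type = type;
        for (MiniAst kid : kids) {
            kid.parent = this;
            children.add(kid);
        }
    }

    /** First direct child of the given type, or null if there is none. */
    MiniAst firstChild(String wanted) {
        for (MiniAst child : children) {
            if (child.type.equals(wanted)) {
                return child;
            }
        }
        return null;
    }

    /** Number of direct children of the given type. */
    int childCount(String wanted) {
        int n = 0;
        for (MiniAst child : children) {
            if (child.type.equals(wanted)) {
                n++;
            }
        }
        return n;
    }
}

class NavigationSketch {
    /** Counts methods by inspecting only the visited node's neighbourhood. */
    static int countMethods(MiniAst classDef) {
        MiniAst body = classDef.firstChild("OBJBLOCK");
        return body == null ? 0 : body.childCount("METHOD_DEF");
    }

    /** Builds a simplified class definition with two methods and one field. */
    static MiniAst sampleClass() {
        return new MiniAst("CLASS_DEF",
            new MiniAst("IDENT"),
            new MiniAst("OBJBLOCK",
                new MiniAst("METHOD_DEF"),
                new MiniAst("VARIABLE_DEF"),
                new MiniAst("METHOD_DEF")));
    }

    public static void main(String[] args) {
        System.out.println(countMethods(sampleClass()));
    }
}
```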
<a name="configchecks"></a>
<h3>Defining Check Properties</h3>
<p class="body">

OK, Mr. Checkstyle, that's all very nice, but in my company we
have several projects, and each allows a different number of
methods. I need to control my Check through properties, so where
is the API to do that?
</p>

<p class="body">
Well, the short answer is, there is no API. It's magic. Really!
</p>

<p class="body">
If you need to make something configurable, just add a setter method
to the Check:
</p>

<pre>
public class MethodLimitCheck extends Check
{
    // code from above omitted for brevity

    /** makes "max" a property that can be set in the config file */
    public void setMax(int limit)
    {
        max = limit;
    }
}
</pre>

<p class="body">
With this code added, you can set the property <span
class="code">max</span> for the MethodLimitCheck module in the
config file. It doesn't get any simpler than that. The secret is
that Checkstyle uses JavaBean introspection to set the JavaBean
properties. That works for all primitive types like boolean,
int, long, etc., as well as Strings and arrays of these types.
</p>

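<p class="body">
If you are curious how a framework can find a setter without any
dedicated API, the following sketch shows the general idea using the
standard java.beans package. It is not Checkstyle's actual
configuration code: BeanConfigSketch, LimitBean and configure() are
hypothetical names invented here, and only int properties are handled.
</p>

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

class BeanConfigSketch {

    /** Hypothetical check-like bean; only the public setter matters. */
    public static class LimitBean {
        private int max = 30;

        public int getMax() {
            return max;
        }

        public void setMax(int limit) {
            max = limit;
        }
    }

    /** Sets the int property called name on bean from a config string. */
    static void configure(Object bean, String name, String value)
            throws Exception {
        BeanInfo info = Introspector.getBeanInfo(bean.getClass());
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getName().equals(name)
                    && pd.getPropertyType() == int.class
                    && pd.getWriteMethod() != null) {
                // only int is converted in this sketch; a real framework
                // would support more property types
                pd.getWriteMethod().invoke(bean, Integer.valueOf(value));
                return;
            }
        }
        throw new IllegalArgumentException("no int property: " + name);
    }

    public static void main(String[] args) throws Exception {
        LimitBean bean = new LimitBean();
        configure(bean, "max", "45");
        System.out.println(bean.getMax());
    }
}
```

<p class="body">
The bean never registers its property anywhere; the presence of a
public setMax(int) method is what makes "max" discoverable.
</p>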
<a name="logerrors"></a>
<h3>Logging errors</h3>
<p class="body">
Detecting errors is one thing, presenting them to the user is
another. To do that, the Check base class provides several log
methods, the simplest of which is Check.log(String). In your
check you can simply use a verbatim error string like in <span
class="code">log("Too many methods, only " + max +
" are allowed");</span> as the argument. That will
work, but it's not the best possible solution if your check is
intended for a wider audience.
</p>

<p class="body">
If you are not living in a country where people speak English,
you may have noticed that Checkstyle writes internationalized
error messages; for example, if you live in Germany, the error
messages are in German. The individual checks don't have to do
anything fancy to achieve this; it's actually quite easy and the
Checkstyle framework does most of the work.
</p>

<p class="body">
To support internationalized error messages, you need to create
a messages.properties file alongside your Check class, i.e. the
Java file and the properties file should be in the same
directory. Add a symbolic error code and an English
representation to messages.properties; the file should
contain the following line: <span
class="code">too.many.methods=Too many methods, only {0} are
allowed</span>. Then replace the verbatim error message with
the symbolic representation and use one of the log helper
methods to provide the dynamic part of the message (max in this
case): <span class="code">log("too.many.methods",
max);</span>. Please consult the documentation of Java's <a
href="http://java.sun.com/j2se/1.4.1/docs/api/java/text/MessageFormat.html">MessageFormat</a>
to learn about the syntax of format strings (especially about
those funny numbers in the translated text).
</p>

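<p class="body">
As a quick illustration of what the {0} placeholder does, here is a
minimal sketch that uses MessageFormat directly. MessageSketch and
render() are names made up for this example, and the pattern is
hard-coded rather than read from a properties file; how Checkstyle
wires this up internally is not shown.
</p>

```java
import java.text.MessageFormat;
import java.util.Locale;

class MessageSketch {
    /** Pattern as it would appear on the right-hand side in messages.properties. */
    static final String PATTERN = "Too many methods, only {0} are allowed";

    /** Fills the {0} slot of the pattern with the given limit. */
    static String render(int max) {
        // a fixed locale keeps the number formatting predictable in this sketch
        MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
        return format.format(new Object[] {Integer.valueOf(max)});
    }

    public static void main(String[] args) {
        System.out.println(render(30));
    }
}
```

<p class="body">
A translated pattern can put {0} anywhere in the sentence, which is why
the dynamic part is passed separately instead of being concatenated
into the message.
</p>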
<p class="body">
Supporting a new language is very easy now: simply create a new
messages file for the language, e.g. messages_fr.properties to
provide French error messages. The correct file will be chosen
automatically, based on the language settings of the user's
operating system.
</p>

<a name="integrate"></a>
<h3>Integrate your Check</h3>
<p class="body">
TODO: Explain the config system and how to integrate a user check.
</p>

<a name="limitations"></a>
<h3>Limitations</h3>
<p class="body">
OK, so you have written your first Check, and you have found
several flaws in many of your programs. You now know that your
boss does not follow the coding conventions he wrote. And you
know that you are the king of the world. To become a programming
god, you want to write your second check - now wait, first you
should know what your limits are.
</p>

<p class="body">
There are basically only two of them:
</p>
<ul>
<li>You cannot determine the type of an expression.</li>
<li>You cannot see the content of other files.</li>
</ul>
<p class="body">
TODO: Explain the practical consequences of these limitations.
</p>


<a name="filesetchecks"></a>
<h2>Writing FileSetChecks</h2>
<p class="body">
Writing a FileSetCheck is pretty straightforward: just inherit
from AbstractFileSetCheck, implement the process(File[]
files) method, and you're done. A very simple example could fire
an error if the number of files that are passed in exceeds a
certain limit.
</p>
<p class="body">
TODO: Implement that FSC and provide it as an example. Sketch:
</p>
<pre>
private int max = 100;

public void setMax(int aMax)
{
    max = aMax;
}

public void process(File[] files)
{
    if (files != null && files.length > max)
    {
        // assumption: report the error against the first file beyond the limit
        final String path = files[max].getPath();

        // build the error list
        // assumption: the limit is the argument for the message text
        Object[] key = new Object[]{Integer.valueOf(max)};
        LocalizedMessage[] errors = new LocalizedMessage[1];
        final String className = getClass().getName();
        final int pkgEndIndex = className.lastIndexOf('.');
        final String pkgName = className.substring(0, pkgEndIndex);
        final String bundle = pkgName + ".messages";
        errors[0] = new LocalizedMessage(
            0, bundle, "max.files.exceeded", key);

        // fire the errors to the AuditListeners
        getMessageDispatcher().fireErrors(path, errors);
    }
}
</pre>
<p class="body">
Note that by implementing the setMax() method the FileSetCheck
automatically makes "max" a legal configuration
parameter that you can use in the Checkstyle configuration file.
</p>
<p class="body">
There are virtually no limits to what you can do in
FileSetChecks. The craziest ideas we've had so far are
</p>
<ul>
<li class="body">to find global code problems like unused public methods.</li>
<li class="body">to find duplicate code.</li>
<li class="body">to port the TreeWalker solution to check C#
instead of Java.</li>
</ul>

<a name="huh"></a>
<h2>Huh? I can't figure it out!</h2>
<p class="body">
That's probably our fault; it means that we have to provide
better docs. Please do not hesitate to ask questions on the user
mailing list, as this will help us to improve this document.
Please make your question as precise as possible; we will not
be able to answer questions like "I want to write a check
but I don't know how, can you help me?". Be precise: tell us
what you are trying to do (the purpose of the check), what you have
understood so far, and what exactly is the problem that blocks you.
</p>

</td>
</tr>
</table>
<hr />
</body> </html>