<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Writing your own checks</title>
<link rel="stylesheet" type="text/css" href="mystyle.css"/>
</head>
<body>
<!-- The header -->
<table border="0" width="100%" summary="header layout">
<tr>
<td>
<h1>Writing your own Checks</h1>
</td>
<td align="right">
<img src="logo.png" alt="Checkstyle Logo"/>
</td>
</tr>
</table>
<!-- content -->
<table border="0" width="100%" cellpadding="5" summary="body layout">
<tr>
<!--Left menu-->
<td class="menu" valign="top">
<p><a href="#overview">Overview</a></p>
<p><a href="#checks">Writing Checks</a></p>
<ul>
<li><a href="#grammar">Java Grammar</a></li>
<li><a href="#gui">Checkstyle GUI</a></li>
<li><a href="#visitor">Visitor Pattern</a></li>
<li><a href="#regtokens">Visitor in Action</a></li>
<li><a href="#astnav">Navigating the AST</a></li>
<li><a href="#integrate">Integrating Checks</a></li>
<li><a href="#limitations">Limitations</a></li>
</ul>
<p><a href="#filesetchecks">Writing FileSetChecks</a></p>
<p><a href="#huh">Huh?</a></p>
</td>
<!--Content-->
<td class="content" valign="top" align="left">
<a name="overview"></a>
<h2>Overview</h2>
<p class="body">
OK, so you have finally decided to write your own check.
Welcome aboard, this is really a fun thing to do. There are
actually two kinds of checks, so before you can start, you have
to find out which kind of check you want to implement.
</p>
<p class="body">
The functionality of Checkstyle is implemented in modules that
can be plugged into Checkstyle. Modules can be containers for
other modules, i.e. they form a tree structure. The top-level
modules that are known directly to the Checkstyle kernel (which
is also a module and forms the root of the tree) are
FileSetChecks. These are pretty simple to grasp: they take a set
of input files and fire error messages.
</p>
<p class="body">
Checkstyle provides a few FileSetCheck implementations by
default and one of them happens to be the TreeWalker. A
TreeWalker typically has some submodules, called Checks. The
TreeWalker operates by separately transforming each of the Java
input files into an abstract syntax tree and then handing the
result over to each of the Check submodules which in turn have a
look at a certain aspect of the tree.
</p>
<a name="checks"></a>
<h2>Writing Checks</h2>
<p class="body">
Most of the functionality of Checkstyle is implemented as
Checks. If you know how to write your own Checks, you can extend
Checkstyle according to your needs without having to wait for
the Checkstyle development team. You are about to become a
Checkstyle Expert.
</p>
<p class="body">
Suppose you have a convention that the number of methods in a
class should not exceed a certain limit, say 30. This rule makes
sense: a class should only do one thing and do it well. With a
zillion methods, chances are that the class does more than one
thing. The only problem you have is that your convention is not
checked by Checkstyle, so you'll have to write your own check
and plug it into the Checkstyle framework.
</p>
<p class="body">
This chapter is organized as a tour that takes you through the
process step by step and explains both the theoretical
foundations and the Checkstyle API along the way.
</p>
<a name="grammar"></a>
<h3>Java Grammar</h3>
<p class="body">
Every Java program is structured into files, and each of these
files has a certain structure. For example, if there is a
package statement then it is the first line of the file that is
not comment or whitespace. After the package statement comes a
list of import statements, which is followed by a class or
interface definition, and so on.
</p>
<p class="body">
If you have ever read an introductory level Java book you probably
knew all of the above. And if you have studied computer science,
you probably also know that the rules that specify the Java Language
can be formally specified using a Grammar (statement is simplified
for didactic purposes).
</p>
<p class="body">
Tools exist which read a grammar definition and produce a parser
for the language that is specified in the grammar. In other
words the output of the tool is a program that can transform a
stream of characters (a Java File) into a tree representation
that reflects the structure of the file. Checkstyle uses the
parser generator <a href="http://www.antlr.org">ANTLR</a> but
that is an implementation detail you do not need to worry about
when writing checks. Several other parser generators exist and
they all work well.
</p>
<a name="gui"></a>
<h3>The Checkstyle SDK GUI</h3>
<p class="body">
Still with us? Great, you have mastered the basic theory, so here
is your reward: a GUI that displays the structure of a Java
source file. To run it, type the following on the command line:
</p>
<pre>
java -classpath checkstyle-all-${version}.jar com.puppycrawl.tools.checkstyle.gui.Main
</pre>
<p class="body">
Click the button at the bottom of the frame
and select a syntactically correct Java source file. The frame
will be populated with a tree that corresponds to the structure
of the Java source code.
</p>
<p class="body">
TODO: screenshot
</p>
<p class="body">
In the leftmost column you can open and close branches of the
tree; the remaining columns display information about each node
in the tree. The second column displays a token type for each
node. As you navigate from the root of the tree to one of the
leaves, you'll notice that the token type denotes smaller and
smaller units of your source file, i.e. close to the root you
might see the token type CLASS_DEF (a node that represents a
class definition) while you will see token types like IDENT (an
identifier) near the leaves of the tree.
</p>
<p class="body">
We'll get back to the details in the other columns later; they
are important for implementing checks but not for understanding
the basic concepts. For now it is sufficient to know that the
GUI is a tool that lets you look at the structure of a Java
file, i.e. you can see the Java grammar 'in action'.
</p>
<a name="visitor"></a>
<h3>Understanding the visitor pattern</h3>
<p class="body">
TODO: A brief explanation of the Visitor pattern, xref to
GoF/pattern wiki.
</p>
<a name="regtokens"></a>
<h3>Visitor in action</h3>
<p class="body">
When you fire up the Checkstyle GUI and look at a few source
files you'll figure out pretty quickly that you are mainly
interested in the number of tree nodes of type METHOD_DEF. The
number of such tokens should be counted separately for each
CLASS_DEF / INTERFACE_DEF.
</p>
<p class="body">
Now you have to decide how constructors are treated. Do they
count as a method for the purposes of your Check? Maybe you
should make that configurable, and we have good news for you:
Checkstyle lets you control the token types for which your
visitor methods are called.
</p>
<p class="body">
TODO: Explain how. Explain the visitor methods
(visitToken, leaveToken, beginTree, endTree).
</p>
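<p class="body">
As a rough illustration of the idea, and not of the Checkstyle API
itself, the following self-contained sketch uses hypothetical stand-in
classes (Node, MethodCounter and a walk method, all invented here) to
show how a walker can call visitToken when it enters a node and
leaveToken when it leaves it, but only for the token types the check
has registered. The toy check counts METHOD_DEF nodes separately for
each CLASS_DEF.
</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VisitorSketch {
    /** Hypothetical stand-in for an AST node; this is not Checkstyle's DetailAST. */
    static class Node {
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String type, Node... kids) {
            this.type = type;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Toy check that counts METHOD_DEF nodes separately for each CLASS_DEF. */
    static class MethodCounter {
        /** Token types this check wants to be called for. */
        final Set<String> tokens = new HashSet<>(Arrays.asList("CLASS_DEF", "METHOD_DEF"));
        /** One running counter per class that is currently open (innermost on top). */
        final Deque<Integer> counts = new ArrayDeque<>();
        /** Final counts, in the order in which the classes were left. */
        final List<Integer> results = new ArrayList<>();

        void visitToken(Node node) {
            if (node.type.equals("CLASS_DEF")) {
                counts.push(0);                  // entering a class: start a fresh counter
            } else if (!counts.isEmpty()) {
                counts.push(counts.pop() + 1);   // any other registered token counts as a method
            }
        }

        void leaveToken(Node node) {
            if (node.type.equals("CLASS_DEF")) {
                results.add(counts.pop());       // leaving a class: its count is final
            }
        }
    }

    /** Depth-first walk that only notifies the check about registered token types. */
    static void walk(Node node, MethodCounter check) {
        boolean registered = check.tokens.contains(node.type);
        if (registered) {
            check.visitToken(node);
        }
        for (Node child : node.children) {
            walk(child, check);
        }
        if (registered) {
            check.leaveToken(node);
        }
    }

    static List<Integer> countMethods(Node root) {
        MethodCounter check = new MethodCounter();
        walk(root, check);
        return check.results;
    }

    public static void main(String[] args) {
        Node inner = new Node("CLASS_DEF", new Node("METHOD_DEF"));
        Node outer = new Node("CLASS_DEF",
            new Node("METHOD_DEF"), new Node("IDENT"), inner, new Node("METHOD_DEF"));
        System.out.println(countMethods(outer)); // prints [1, 2]
    }
}
```

<p class="body">
In this toy version, deciding whether constructors count is just a
matter of which type names are in the registered set: adding a
constructor token type to it would make constructors count as methods
without touching the counting logic.
</p>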
<a name="astnav"></a>
<h3>Navigating the Abstract Syntax Tree (AST)</h3>
<p class="body">
TODO: Explain the navigation methods in DetailAST and how to
use them.
</p>
<a name="logerrors"></a>
<h3>Logging errors</h3>
<p class="body">
Detecting errors is one thing, presenting them to the user is
another. To do that, the Check base class provides several log
methods, the simplest of which is Check.log(String). In your
check you can simply pass a verbatim error string as the argument, as in <span
class="code">log(&quot;Too many methods, only &quot; + mMax +
&quot; are allowed&quot;);</span>. That will
work, but it's not the best possible solution if your check is
intended for a wider audience.
</p>
<p class="body">
If you are not living in a country where people speak English,
you may have noticed that Checkstyle writes internationalized
error messages; for example, if you live in Germany the error
messages are in German. The individual checks don't have to do
anything fancy to achieve this. It's actually quite easy, and the
Checkstyle framework does most of the work.
</p>
<p class="body">
To support internationalized error messages, you need to create
a messages.properties file alongside your Check class, i.e. the
Java file and the properties file should be in the same
directory. Add a symbolic error code and an English
representation to messages.properties; the file should
contain the following line: <span
class="code">too.many.methods=Too many methods, only {0} are
allowed</span>. Then replace the verbatim error message with
the symbolic representation and use one of the log helper
methods to provide the dynamic part of the message (mMax in this
case): <span class="code">log(&quot;too.many.methods&quot;,
mMax);</span>. Please consult the documentation of Java's <a
href="http://java.sun.com/j2se/1.4.1/docs/api/java/text/MessageFormat.html">MessageFormat</a>
to learn about the syntax of format strings (especially about
those funny numbers in the translated text).
</p>
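<p class="body">
To make the role of {0} concrete, here is a small standalone sketch
that uses java.text.MessageFormat directly. The class name
MessageFormatSketch and the render helper are made up for this
example; inside a check you would not call MessageFormat yourself,
since the log helper methods are expected to do the equivalent for you.
</p>

```java
import java.text.MessageFormat;
import java.util.Locale;

public class MessageFormatSketch {
    /** Fills the numbered placeholders of a pattern with the given arguments. */
    static String render(String pattern, Object... args) {
        // A fixed locale keeps number formatting predictable in this example.
        return new MessageFormat(pattern, Locale.ENGLISH).format(args);
    }

    public static void main(String[] args) {
        // The pattern is the value of too.many.methods in messages.properties.
        String pattern = "Too many methods, only {0} are allowed";
        System.out.println(render(pattern, 30)); // prints Too many methods, only 30 are allowed
    }
}
```

<p class="body">
Two details of MessageFormat are worth knowing when you write message
texts: a single quote is an escape character, so a literal apostrophe
has to be doubled, and numeric arguments are formatted according to
the locale, so a large value may appear with grouping separators.
</p>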
<p class="body">
Supporting a new language is now very easy: simply create a new
messages file for the language, e.g. messages_fr.properties to
provide French error messages. The correct file will be chosen
automatically, based on the language settings of the user's
operating system.
</p>
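<p class="body">
The file selection can be tried out with the standard
java.util.ResourceBundle machinery. The sketch below is only an
illustration under stated assumptions: the class BundleLookupSketch
and its lookup helper are invented, the French text is an example
translation written for this sketch, and the two properties files are
simulated by in-memory strings (through a custom
ResourceBundle.Control) so that the example runs without any files on
the classpath. It also takes the locale as a parameter instead of
reading the user's default locale.
</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleLookupSketch {
    /** In-memory stand-ins for messages.properties and messages_fr.properties. */
    static final Map<String, String> FILES = new HashMap<>();
    static {
        FILES.put("messages",
            "too.many.methods=Too many methods, only {0} are allowed\n");
        FILES.put("messages_fr",
            "too.many.methods=Trop de m\\u00e9thodes, seulement {0} sont autoris\\u00e9es\n");
    }

    /** Serves bundles from FILES instead of the classpath. */
    static final ResourceBundle.Control IN_MEMORY = new ResourceBundle.Control() {
        @Override
        public List<String> getFormats(String baseName) {
            return FORMAT_PROPERTIES;
        }

        @Override
        public Locale getFallbackLocale(String baseName, Locale locale) {
            return null; // go straight to the base file, ignoring the JVM default locale
        }

        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) throws IOException {
            // toBundleName yields e.g. "messages_fr" for French and "messages" for the root.
            String content = FILES.get(toBundleName(baseName, locale));
            return content == null
                ? null
                : new PropertyResourceBundle(new StringReader(content));
        }
    };

    /** Looks up a message key for the given locale, falling back to the base file. */
    static String lookup(String key, Locale locale) {
        return ResourceBundle.getBundle("messages", locale, IN_MEMORY).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(lookup("too.many.methods", Locale.FRENCH));
        System.out.println(lookup("too.many.methods", Locale.GERMAN));
    }
}
```

<p class="body">
With these two simulated files, a French locale resolves to the
French text, while a German locale finds no messages_de file and ends
up with the English text from the base file.
</p>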
<a name="integrate"></a>
<h3>Integrate your Check</h3>
<p class="body">
TODO: Explain the config system and how to integrate a user check.
</p>
<a name="limitations"></a>
<h3>Limitations</h3>
<p class="body">
OK, so you have written your first Check, and you have found
several flaws in many of your programs. You now know that your
boss does not follow the coding conventions he wrote. And you
know that you are the king of the world. To become a programming
god, you want to write your second check - now wait, first you
should know what your limits are.
</p>
<p class="body">
There are basically only two of them:
</p>
<ul>
<li>You cannot determine the type of an expression.</li>
<li>You cannot see the content of other files.</li>
</ul>
<p class="body">
TODO: Explain the practical consequences of these limitations.
</p>
<a name="filesetchecks"></a>
<h2>Writing FileSetChecks</h2>
<p class="body">
Writing a FileSetCheck is pretty straightforward: Just inherit
from AbstractFileSetCheck and implement the process(File[]
files) method and you're done. A very simple example could fire
an error if the number of files that are passed in exceeds a
certain limit.
</p>
<p class="body">
TODO: Implement that FSC and provide it as an example. Sketch:
</p>
<pre>
private int max = 100;

public void setMax(int aMax)
{
    max = aMax;
}

public void process(File[] files)
{
    if (files != null &amp;&amp; files.length &gt; max)
    {
        // assumption: report the error against the first file beyond the limit
        final String path = files[max].getPath();

        // build the error list; the limit fills {0} in the message text
        final Object[] key = new Object[] {new Integer(max)};
        final LocalizedMessage[] errors = new LocalizedMessage[1];
        final String className = getClass().getName();
        final int pkgEndIndex = className.lastIndexOf('.');
        final String pkgName = className.substring(0, pkgEndIndex);
        final String bundle = pkgName + ".messages";
        errors[0] = new LocalizedMessage(
            0, bundle, "max.files.exceeded", key);

        // fire the errors to the AuditListeners
        getMessageDispatcher().fireErrors(path, errors);
    }
}
</pre>
<p class="body">
Note that by implementing the setMax() method the FileSetCheck
automatically makes &quot;max&quot; a legal configuration
parameter that you can use in the Checkstyle configuration file.
</p>
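<p class="body">
The convention behind this is the usual JavaBeans one: a public
setXxx method defines a property named xxx. The following sketch shows
the general idea with plain reflection. It is not Checkstyle's actual
configuration code; the classes SetterConfigSketch and FileLimitCheck
and the configure helper are hypothetical, and only int properties are
handled.
</p>

```java
import java.lang.reflect.Method;

public class SetterConfigSketch {
    /** Hypothetical check with one configurable property, like setMax() in the sketch. */
    public static class FileLimitCheck {
        private int max = 100;

        public void setMax(int aMax) {
            max = aMax;
        }

        public int getMax() {
            return max;
        }
    }

    /** Applies a textual int property by calling the matching public setter. */
    static void configure(Object target, String property, String value) {
        // "max" becomes "setMax"
        String setter = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method method = target.getClass().getMethod(setter, int.class);
            method.invoke(target, Integer.parseInt(value));
        } catch (ReflectiveOperationException e) {
            // no such setter, or the setter itself failed
            throw new IllegalArgumentException("Cannot set property " + property, e);
        }
    }

    public static void main(String[] args) {
        FileLimitCheck check = new FileLimitCheck();
        configure(check, "max", "50");
        System.out.println(check.getMax()); // prints 50
    }
}
```

<p class="body">
A property name without a matching setter is rejected in this sketch,
which is roughly the behaviour you would want from a configuration
system: a typo in a parameter name should not go unnoticed.
</p>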
<p class="body">
There are virtually no limits to what you can do in
FileSetChecks. The craziest ideas we've had so far are:
</p>
<ul>
<li class="body">to find global code problems like unused public methods.</li>
<li class="body">to find duplicate code.</li>
<li class="body">to port the TreeWalker solution to check C#
instead of Java.</li>
</ul>
<a name="huh"></a>
<h2>Huh? I can't figure it out!</h2>
<p class="body">
That's probably our fault; it means that we have to provide
better docs. Please do not hesitate to ask questions on the user
mailing list, as this will help us to improve this document.
Please make your question as precise as possible: we will not
be able to answer questions like &quot;I want to write a check
but I don't know how, can you help me?&quot;.
</p>
</td>
</tr>
</table>
<hr />
</body> </html>