From 08d574454c61b7802af40958d5bf2a4d786379cb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lars=20K=C3=BChne?=
+ * Performs a line-by-line comparison of all code lines and reports
+ * duplicate code if a sequence of lines differs only in
+ * indentation. All import statements in Java code are ignored; any
+ * other line - including javadoc, whitespace lines between methods,
+ * etc. - is considered (which is why the check is called
+ * strict).
+ * To configure the check:
- * There are many approaches for detecting duplicate code. Some involve
- * parsing a file of a programming language and analyzing the source trees
- * of all files. This is a very powerful approach for a specific programming
- * language (such as Java), as it can potentially even detect duplicate code
- * where linebreaks have been changed, variables have been renamed, etc.
+ * To configure the check so that it allows larger equivalent blocks:
*
- * This copy and paste detection implementation works differently.
- * It cannot detect copy and paste code where the author deliberately
- * tries to hide his copy+paste action. Instead it focusses on the standard
- * corporate problem of reuse by copy and paste. Usually this leaves linebreaks
- * and variable names intact. Since we do not need to analyse a parse tree
- * our tool is not tied to a particular programming language.
+ * To configure the check so that it handles files with the UTF-8 charset:
* com.puppycrawl.tools.checkstyle.checks.duplicates
+ * Checker
+ *
-Duplicate code detection allows you to find
+Duplicate code detection allows you to find
code that has been generated by Copy/Paste programming. Duplicate code typically
leads to higher maintenance cost because bugs will need to be fixed twice,
more code needs to be tested, etc.
@@ -28,16 +28,49 @@ Note that there are brilliant commercial implementations of duplicate code
detection tools. One that is particularly noteworthy is
Simian
from RedHill Consulting, Inc.
-
Simian has managed to find a very good balance of the above tradeoffs.
It is superior to the checks in this package in many respects.
Simian is reasonably priced (free for noncommercial projects)
and includes a Checkstyle plugin.
+
+The following table summarizes the characteristics of the available
+Checkstyle plugins for duplicate code detection:
+
We encourage all users of Checkstyle to evaluate Simian as an
alternative to the Checks we offer in our distribution.
- There are many trade-offs when writing a duplicate code detection tool.
- Some of the conflicting goals are:
-
+ *
+ * | name | description | type | default value |
+ * |---|---|---|---|
+ * | min | how many lines must be equal to be considered a duplicate | int | 12 |
+ * | charset | name of the file charset | String | System property "file.encoding" |
+ *
+
+| Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches |
+|---|---|---|---|---|---|
+| StrictDuplicateCode | Medium | Very Low | Possible but very unlikely | any language | No |
+| Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support |
+
-
-
- Note that there are brilliant commercial implementations of duplicate
- code detection tools. One that is particularly noteworthy is Simian
- from RedHill Consulting, Inc.
-
- Simian is reasonably priced (free for noncommercial projects) and
- includes a Checkstyle plugin. We encourage all users of Checkstyle to
- evaluate Simian as an alternative to the Checks we offer in our
- distribution.
-
- The following table summarizes the characteristics of the available
- Checkstyle plugins for duplicate code detection:
-
-| Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches |
-|---|---|---|---|---|---|
-| StrictDuplicateCode | Medium | Very Low | Possible but very unlikely | any language | No |
-| Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support |
- Performs a line-by-line comparison of all code lines and reports
- duplicate code, i.e. a sequence of lines that differ only in
- indentation. All import statements in Java code are ignored, any
- other line - including javadoc, whitespace lines between methods,
- etc. - is considered (which is why the check is called
- strict).
-
-| name | description | type | default value |
-|---|---|---|---|
-| min | how many lines must be equal to be considered a duplicate | int | 12 |
-| charset | name of the file charset | String | System property "file.encoding" |
To configure the check:
- To configure the check so that it allows larger equivalent blocks:
-
- To configure the check so that it handles files with the UTF-8 charset:
-
-com.puppycrawl.tools.checkstyle.checks.duplicates
- Checker
-
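
Reviewer note: the Javadoc in this patch describes the comparison only in prose, so here is a minimal sketch of that behaviour for two files. It is not the check's actual implementation. The class name `StrictDuplicateSketch` and the methods `normalize` and `findDuplicates` are hypothetical, and the sketch assumes that an ignored import line simply ends a run of matching lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
public class StrictDuplicateSketch {

    /** Strips indentation; returns null for lines the sketch ignores (Java imports). */
    static String normalize(String line) {
        String stripped = line.replaceFirst("^\\s+", "");
        return stripped.startsWith("import ") ? null : stripped;
    }

    /**
     * Returns one {startA, startB, length} triple (0-based line indices) for every
     * maximal run of at least min lines that differ only in indentation.
     */
    static List<int[]> findDuplicates(List<String> a, List<String> b, int min) {
        List<int[]> result = new ArrayList<>();
        // run[i][j] = length of the matching run that ends at a[i-1] and b[j-1]
        int[][] run = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            String left = normalize(a.get(i - 1));
            for (int j = 1; j <= b.size(); j++) {
                String right = normalize(b.get(j - 1));
                // an ignored line (null) never matches, so it ends the run
                if (left != null && left.equals(right)) {
                    run[i][j] = run[i - 1][j - 1] + 1;
                }
            }
        }
        // report only runs that cannot be extended by one more line
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                boolean endsHere = i == a.size() || j == b.size()
                        || run[i + 1][j + 1] == 0;
                if (run[i][j] >= min && endsHere) {
                    int len = run[i][j];
                    result.add(new int[] {i - len, j - len, len});
                }
            }
        }
        return result;
    }
}
```

Calling `findDuplicates(linesOfA, linesOfB, 12)` mirrors the documented default of `min`. The quadratic table keeps the sketch short; it says nothing about how the check achieves the speed and memory characteristics listed in the table.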