diff --git a/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java b/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java index 58cd7a96c..082df43a0 100644 --- a/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java +++ b/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java @@ -29,23 +29,73 @@ import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; /** - * Checks for duplicate code that only differs by indentation. + *
+ * Performs a line-by-line comparison of all code lines and reports + * duplicate code if two sequences of at least min lines differ only in + * indentation. All import statements in Java code are ignored; any + * other line (including javadoc, whitespace lines between methods, + * etc.) is considered, which is why the check is called + * strict. + *
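The strict comparison described above can be illustrated with a small, hypothetical Java sketch. It is not Checkstyle's implementation: the class `StrictDuplicateSketch` and its methods are made-up names, it compares only two files rather than every checked file, and it uses a naive quadratic scan. It strips leading whitespace so that lines differing only in indentation compare equal, skips lines starting with `import `, keeps blank lines, and reports where at least `min` consecutive normalized lines match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a "strict" duplicate scan between two files.
// Not Checkstyle's StrictDuplicateCodeCheck implementation.
public class StrictDuplicateSketch {

    // A normalized line plus its original 1-based line number.
    static final class Line {
        final String text;
        final int lineNo;

        Line(String text, int lineNo) {
            this.text = text;
            this.lineNo = lineNo;
        }
    }

    // Strips indentation and skips Java import statements.
    static List<Line> normalize(List<String> raw) {
        List<Line> result = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            // Remove leading whitespace only, so lines that differ solely in
            // indentation compare equal; blank lines are kept.
            String text = raw.get(i).replaceFirst("^\\s+", "");
            if (text.startsWith("import ")) {
                continue; // import statements are ignored
            }
            result.add(new Line(text, i + 1));
        }
        return result;
    }

    // Returns "lineA:lineB" for the start of every maximal run of at least
    // min equal normalized lines shared by the two files (O(n*m) scan).
    public static List<String> findDuplicates(List<String> fileA, List<String> fileB, int min) {
        List<Line> a = normalize(fileA);
        List<Line> b = normalize(fileB);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            for (int j = 0; j < b.size(); j++) {
                // Skip positions inside a run that started earlier, so each
                // maximal run is reported only once.
                if (i > 0 && j > 0 && a.get(i - 1).text.equals(b.get(j - 1).text)) {
                    continue;
                }
                int len = 0;
                while (i + len < a.size() && j + len < b.size()
                        && a.get(i + len).text.equals(b.get(j + len).text)) {
                    len++;
                }
                if (len >= min) {
                    hits.add(a.get(i).lineNo + ":" + b.get(j).lineNo);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> fileA = Arrays.asList(
                "import java.util.List;",
                "int x = 1;",
                "int y = 2;",
                "int z = 3;");
        List<String> fileB = Arrays.asList(
                "    int x = 1;",
                "    int y = 2;",
                "    int z = 3;",
                "return;");
        System.out.println(findDuplicates(fileA, fileB, 3)); // prints [2:1]
    }
}
```

Here lines 2 to 4 of the first file match lines 1 to 3 of the second once indentation is ignored and the import is skipped; raising `min` to 4 would report nothing.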
+ *

| name | description | type | default value |
|---|---|---|---|
| min | how many lines must be equal to be considered a duplicate | int | 12 |
| charset | name of the file charset | String | System property "file.encoding" |

+ *
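The fallback behaviour of the `charset` property can be sketched as follows. This is a hypothetical helper (`CharsetSketch`, `resolveCharset` and `readLines` are illustrative names, not Checkstyle API): an explicitly configured name goes straight to `Charset.forName`, so a misspelled name fails loudly, while an unset property falls back to the `file.encoding` system property and then to the JVM default charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Checkstyle's API.
public class CharsetSketch {

    // Uses the configured name if given, else the "file.encoding" property,
    // else the JVM default charset.
    public static Charset resolveCharset(String configured) {
        if (configured != null) {
            // Throws IllegalArgumentException for unknown or illegal names.
            return Charset.forName(configured);
        }
        String fromProperty = System.getProperty("file.encoding");
        if (fromProperty != null) {
            try {
                return Charset.forName(fromProperty);
            } catch (IllegalArgumentException e) {
                // The property holds a name this JVM cannot resolve; fall through.
            }
        }
        return Charset.defaultCharset();
    }

    // Reads all lines of a file using the resolved charset.
    public static List<String> readLines(Path file, String configured) throws IOException {
        return Files.readAllLines(file, resolveCharset(configured));
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("UTF-8")); // prints UTF-8
        System.out.println(resolveCharset(null));    // depends on the JVM
    }
}
```

Since JDK 18 (JEP 400) the default `file.encoding` is UTF-8, so leaving `charset` unset may behave differently on older JVMs that used the platform encoding.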
To configure the check:
+ *- * There are many approaches for detecting duplicate code. Some involve - * parsing a file of a programming language and analyzing the source trees - * of all files. This is a very powerful approach for a specific programming - * language (such as Java), as it can potentially even detect duplicate code - * where linebreaks have been changed, variables have been renamed, etc. + * To configure the check so that it allows larger equivalent blocks: *
+ *- * This copy and paste detection implementation works differently. - * It cannot detect copy and paste code where the author deliberately - * tries to hide his copy+paste action. Instead it focusses on the standard - * corporate problem of reuse by copy and paste. Usually this leaves linebreaks - * and variable names intact. Since we do not need to analyse a parse tree - * our tool is not tied to a particular programming language. + * To configure the check so that it handles files with the UTF-8 charset: *
+ *com.puppycrawl.tools.checkstyle.checks.duplicates
+ *+ * Checker + *
-Duplicate code detection allows you to find +Duplicate code detection allows you to find code that has been generated by Copy/Paste programming. Duplicate code typically leads to higher maintenance cost because bugs will need to be fixed twice, more code needs to be tested, etc. @@ -28,16 +28,49 @@ Note that there are brilliant commercial implementations of duplicate code detection tools. One that is particularly noteworthy is Simian from RedHill Consulting, Inc. -
-Simian has managed to find a very good balance of the above tradeoffs. It is superior to the checks in this package in many repects. Simian is reasonably priced (free for noncommercial projects) and includes a Checkstyle plugin. +
++The following table summarizes the characteristics of the available +Checkstyle plugins for duplicate code detection: +
+

| Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches |
|---|---|---|---|---|---|
| StrictDuplicateCode | Medium | Very Low | Possible but very unlikely | any language | No |
| Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support |
We encourage all users of Checkstyle to evaluate Simian as an alternative to the Checks we offer in our distribution.
+ \ No newline at end of file diff --git a/src/xdocs/config_duplicates.xml b/src/xdocs/config_duplicates.xml deleted file mode 100755 index 5f6048d76..000000000 --- a/src/xdocs/config_duplicates.xml +++ /dev/null @@ -1,143 +0,0 @@ - - -- There are many trade-offs when writing a duplicate code detection tool. - Some of the conflicting goals are: -
- Note that there are brilliant commercial implementations of duplicate - code detection tools. One that is particularly noteworthy is Simian - from RedHill Consulting, Inc. -
- -- Simian is reasonably priced (free for noncommercial projects) and - includes a Checkstyle plugin. We encourage all users of Checkstyle to - evaluate Simian as an alternative to the Checks we offer in our - distribution. -
- -- The following table summarizes the characteristics of the available - Checkstyle plugins for duplicate code detection: -
-

| Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches |
|---|---|---|---|---|---|
| StrictDuplicateCode | Medium | Very Low | Possible but very unlikely | any language | No |
| Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support |
- Performs a line-by-line comparison of all code lines and reports - duplicate code, i.e. a sequence of lines that differ only in - indentation. All import statements in Java code are ignored, any - other line - including javadoc, whitespace lines between methods, - etc. - is considered (which is why the check is called - strict). -
-

| name | description | type | default value |
|---|---|---|---|
| min | how many lines must be equal to be considered a duplicate | int | 12 |
| charset | name of the file charset | String | System property "file.encoding" |
To configure the check:
-- To configure the check so that it allows larger equivalent blocks: -
-- To configure the check so that it handles files with the UTF-8 charset: -
-com.puppycrawl.tools.checkstyle.checks.duplicates
-- Checker -
-