diff --git a/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java b/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java index 58cd7a96c..082df43a0 100644 --- a/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java +++ b/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/StrictDuplicateCodeCheck.java @@ -29,23 +29,73 @@ import org.apache.commons.logging.Log; import org.apache.commons.logging.LogFactory; /** - * Checks for duplicate code that only differs by indentation. + *

+ * Performs a line-by-line comparison of all code lines and reports + * duplicate code if a sequence of lines differs only in + * indentation. All import statements in Java code are ignored, any + * other line - including javadoc, whitespace lines between methods, + * etc. - is considered (which is why the check is called + * strict). + *

+ * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + *
name | description | type | default value
min | how many lines must be equal to be considered a duplicate | int | 12
charset | name of the file charset | String | System property "file.encoding"
+ *
+ * + * + *

To configure the check:

+ * + * <module name="StrictDuplicateCode"/> + * * *

- * There are many approaches for detecting duplicate code. Some involve - * parsing a file of a programming language and analyzing the source trees - * of all files. This is a very powerful approach for a specific programming - * language (such as Java), as it can potentially even detect duplicate code - * where linebreaks have been changed, variables have been renamed, etc. + * To configure the check so that it allows larger equivalent blocks: *

+ * + * <module name="StrictDuplicateCode"> + * <property name="min" value="15"/> + * </module> + * + * *

- * This copy and paste detection implementation works differently. - * It cannot detect copy and paste code where the author deliberately - * tries to hide his copy+paste action. Instead it focusses on the standard - * corporate problem of reuse by copy and paste. Usually this leaves linebreaks - * and variable names intact. Since we do not need to analyse a parse tree - * our tool is not tied to a particular programming language. + * To configure the check so that it handles files with the UTF-8 charset: *

+ * + * <module name="StrictDuplicateCode"> + * <property name="charset" value="UTF-8"/> + * </module> + * + *
+ * + * + *

com.puppycrawl.tools.checkstyle.checks.duplicates

+ *
+ * + * + *

+ * Checker + *

+ *
* * @author Lars Kühne */ diff --git a/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/package.html b/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/package.html index 60800ece1..f90582b40 100644 --- a/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/package.html +++ b/src/checkstyle/com/puppycrawl/tools/checkstyle/checks/duplicates/package.html @@ -1,7 +1,7 @@

-Duplicate code detection allows you to find +Duplicate code detection allows you to find code that has been generated by Copy/Paste programming. Duplicate code typically leads to higher maintenance cost because bugs will need to be fixed twice, more code needs to be tested, etc. @@ -28,16 +28,49 @@ Note that there are brilliant commercial implementations of duplicate code detection tools. One that is particularly noteworthy is Simian from RedHill Consulting, Inc. -

-

Simian has managed to find a very good balance of the above tradeoffs. It is superior to the checks in this package in many respects. Simian is reasonably priced (free for noncommercial projects) and includes a Checkstyle plugin. +

+

+The following table summarizes the characteristics of the available +Checkstyle plugins for duplicate code detection: +

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches
StrictDuplicateCode | Medium | Very Low | Possible but very unlikely | any language | No
Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support
+ +

We encourage all users of Checkstyle to evaluate Simian as an alternative to the Checks we offer in our distribution.

+ \ No newline at end of file diff --git a/src/xdocs/config_duplicates.xml b/src/xdocs/config_duplicates.xml deleted file mode 100755 index 5f6048d76..000000000 --- a/src/xdocs/config_duplicates.xml +++ /dev/null @@ -1,143 +0,0 @@ - - - - - - Duplicate Code - Checkstyle Development Team - - - - -

- There are many trade-offs when writing a duplicate code detection tool. - Some of the conflicting goals are: -

-

- -

- Note that there are brilliant commercial implementations of duplicate - code detection tools. One that is particularly noteworthy is Simian - from RedHill Consulting, Inc. -

- -

- Simian is reasonably priced (free for noncommercial projects) and - includes a Checkstyle plugin. We encourage all users of Checkstyle to - evaluate Simian as an alternative to the Checks we offer in our - distribution. -

- -

- The following table summarizes the characteristics of the available - Checkstyle plugins for duplicate code detection: -

- - - - - - - - - - - - - - - - - - - - - - - - - - -
Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches
StrictDuplicateCode | Medium | Very Low | Possible but very unlikely | any language | No
Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support
- -
-

- Performs a line-by-line comparison of all code lines and reports - duplicate code, i.e. a sequence of lines that differ only in - indentation. All import statements in Java code are ignored, any - other line - including javadoc, whitespace lines between methods, - etc. - is considered (which is why the check is called - strict). -

- - - - - - - - - - - - - - - - - - - - - -
name | description | type | default value
min | how many lines must be equal to be considered a duplicate | int | 12
charset | name of the file charset | String | System property "file.encoding"
-
- - -

To configure the check:

- -<module name="StrictDuplicateCode"/> - - -

- To configure the check so that it allows larger equivalent blocks: -

- -<module name="StrictDuplicateCode"> - <property name="min" value="15"/> -</module> - - -

- To configure the check so that it handles files with the UTF-8 charset: -

- -<module name="StrictDuplicateCode"> - <property name="charset" value="UTF-8"/> -</module> - -
- - -

com.puppycrawl.tools.checkstyle.checks.duplicates

-
- - -

- Checker -

-
-
- -
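
Review note: the Javadoc moved into StrictDuplicateCodeCheck describes a line-by-line comparison that ignores indentation and Java import statements and reports runs of at least `min` equal lines. The sketch below illustrates that idea only. It is not the code of StrictDuplicateCodeCheck; the class name `StrictDuplicateSketch` and its methods are hypothetical, and using `trim()` (which also drops trailing whitespace) is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of strict, indentation-insensitive duplicate detection. */
public class StrictDuplicateSketch {

    /**
     * Two lines count as equal when they match after trimming whitespace.
     * Import statements never match, mirroring "import statements are ignored".
     * Blank lines do match each other, since the check is described as strict.
     */
    public static boolean sameIgnoringIndent(String x, String y) {
        String tx = x.trim();
        String ty = y.trim();
        return !tx.startsWith("import ") && tx.equals(ty);
    }

    /** Length of the longest run of consecutive equal lines shared by a and b. */
    public static int longestDuplicateRun(List<String> a, List<String> b) {
        int best = 0;
        // prev[j] = length of the matching run ending at a[i-2], b[j-1]
        int[] prev = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            int[] cur = new int[b.size() + 1];
            for (int j = 1; j <= b.size(); j++) {
                if (sameIgnoringIndent(a.get(i - 1), b.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    /** Reports a duplicate when at least min consecutive lines are equal. */
    public static boolean hasDuplicate(List<String> a, List<String> b, int min) {
        return longestDuplicateRun(a, b) >= min;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList(
            "import java.util.List;", "int x = 1;", "  int y = 2;", "return x + y;");
        List<String> b = Arrays.asList(
            "import java.util.List;", "    int x = 1;", "int y = 2;", "\treturn x + y;", "}");
        System.out.println(longestDuplicateRun(a, b));
        System.out.println(hasDuplicate(a, b, 12));
    }
}
```

The quadratic comparison is chosen for readability, not speed; it says nothing about how the actual check achieves the "Medium" speed and "Very Low" memory usage listed in the table.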