Diffstat (limited to 'README')
-rw-r--r-- README 124
1 files changed, 47 insertions, 77 deletions
diff --git a/README b/README
index b80a187..dbbe6f1 100644
--- a/README
+++ b/README
@@ -11,16 +11,13 @@ under which a source file is made available.
This tool uses a source file as input and outputs the licenses
identified within that file.
-If you need to know the detail of Ninka, please see the following
-paper:
+If you need to know the detail of Ninka, please see the following paper:
Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
method for automatic license identification of source code files. In
25nd IEEE/ACM International Conference on Automated Software
Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
-download it from
-
-http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
+download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
If you use Ninka for research purposes, we would appreciate you cite
the above paper.
@@ -28,13 +25,13 @@ the above paper.
* Contributors
- Paul Clough for his code to split sentences
-- Anthony Kohan for writing the excel and sqlite backends.
-- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- Anthony Kohan for writing the excel and sqlite backends
+- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- René Scheibe for modularizing the code
* License
- Except for the directories comments and splitter, Ninka is licensed
- under the GPLv2+
+ Ninka is licensed under the GPLv2+:
Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German
@@ -51,59 +48,41 @@ the above paper.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
- - splitter.pl is a derivative work of the Rule-based sentence
- splitter script by Paul Paul Clough. Please see splitter/README
- for details.
+ Ninka::SentenceExtraxtor is a derivative work of the rule-based sentence
+ splitter script by Paul Paul Clough.
- - comments is based on a program to remove comments by Jon Newman,
- it is released under the GNU General Public License Version 2 or
- (at your option) any later version.
+ comments is based on a program to remove comments by Jon Newman.
* Requirements
- Perl version 5 or above
-- for ninka-excel.pl: Perl module Spreadsheet::WriteExcel
- https://metacpan.org/release/Spreadsheet-WriteExcel/
-- for ninka-sqlite.pl: Perl module DBD::SQLite
+- for ninka-excel: Perl module Spreadsheet::WriteExcel
+ https://metacpan.org/release/Spreadsheet-WriteExcel
+- for ninka-sqlite: Perl module DBD::SQLite
https://metacpan.org/release/DBD-SQLite
* How to install
1. Unpack the distribution in a directory.
- 2. Optional: Build and install comments (make sure it is somwehere in the
- path) (see directory comments)
-
+ 2. Optional: Build and install comments (make sure it is somwehere in the path) (see directory comments)
-* Usage:
+* Usage
-Ninka uses a pipe model (see below). Each step of the "pipe" creates a
-file, but
+ninka [options] filename
-ninka.pl [options] [filename]
+Available options:
-Available options
+ -i create intermediary files
-v verbose
- -d delete intermediate files
- -C force creation of comments file
- -c stop after creation of comments
- -S force creation of sentences file
- -s stop after creation of sentences
- -G force creation of goodsent file
- -g stop after creation of goodsent
- -T force creation of senttok file
- -t stop after creation of senttok
- -L force creation of license file
- -f force all processing
-
Example:
- ninka.pl foo.c
+ ninka -i foo.c
It will create five files:
- 1. foo.c.comments: extracted the first two comments blocks, where
- the license is usually
+ 1. foo.c.comments: extracted the first comments blocks, where
+ the license is usually included
2. foo.c.sentences: creates the list of sentences in the license
statement
3. foo.c.goodsent: contains sentences that are likely to be part of
@@ -117,69 +96,60 @@ It will create five files:
- Licenses
- Unmatched sentences in *.senttok that were not matched
-
-
+The files are not required for Ninka's functionality. But they can help
+to debug license detection issues.
* Ninka model
Ninka uses a pipe-model. Each stage of the pipe does something very specific:
- 1. Comment extractor.
+1. Comment extractor
- - directory: extComments
+ - Module: Ninka::CommentExtractor
- - command: extComments.pl, might use comments (included in distribution)
+ - Purpose: Extracts top comments of source code.
+ If no comment extractor is known for the language,
+ then extracts top lines from source (currently 700)
- - Purpose: Extracts top comments of source code. If no
- comment extractor is known for the language, then extracts top lines from source (currently 700)
-
- - Creates <filename>.comments file
+ - Output: <filename>.comments
2. Split sentences in comments
- - directory: splitter
-
- - command: splitter.pl
-
- - Purpose: Ninka works by matching sentences of licenses, hence
- it needs to properly break text into sentences.
-
- - Outputs <filename>.sentences
-
-3. Filter "good" sentences.
+ - Module: Ninka::SentenceExtractor
- - directory filter
+ - Purpose: Ninka works by matching sentences of licenses,
+ hence it needs to properly break text into sentences.
- - command: filter.pl
+ - Output: <filename>.sentences
- - Purpose: some sentences are related to a license, some are
- not. It is valuable to know if a file contains lines that look
- like a license or not (e.g. to know that a file has no license)
+3. Filter "good" sentences
- - Outputs: <filename>.goodsent, and <filename>.badsent (not used)
+ - Module: Ninka::SentenceFilter
-4. Tokenizes sentences
+ - Purpose: Some sentences are related to a license, some are not.
+ It is valuable to know if a file contains lines that look like
+ a license or not (e.g. to know that a file has no license).
- - Directory senttok
+ - Output: <filename>.goodsent and <filename>.badsent
- - command: senttok.pl
+4. Tokenize sentences
- - Purpose: It creates a file that corresponds to the recognized
- sentence tokens. For each sentence, it outputs its sentence token, or unknown otherwise.
+ - Module: Ninka::SentenceTokenizer
- - Outputs: <filename>.senttok
+ - Purpose: It creates a file that corresponds to the recognized sentence tokens.
+ For each sentence, it outputs its sentence token, or unknown otherwise.
-5. Matches sentences to licenses
+ - Output: <filename>.senttok
- - Directory matcher
+5. Match sentences to licenses
- - Command: matcher.pl
+ - Module: Ninka::LicenseMatcher
- - Purpose: looks at the sequence of sentence tokens and outputs the licenses found
+ - Purpose: It looks at the sentence tokens and outputs the licenses found.
- Output: <filename>.license
-The script ninka.pl takes care of all these steps, and optionally removes
+The script ninka takes care of all these steps, and optionally creates
intermediary files, and writes to the stdout the licenses found.
------