1 files changed, 47 insertions, 77 deletions
diff --git a/README b/README
index b80a187..dbbe6f1 100644
--- a/README
+++ b/README
@@ -11,16 +11,13 @@ under which a source file is made available.
 This tool uses a source file as input and outputs the licenses
 identified within that file.
 
-If you need to know the detail of Ninka, please see the following
-paper:
+If you need to know the details of Ninka, please see the following paper:
 
 Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
 method for automatic license identification of source code files. In
 25nd IEEE/ACM International Conference on Automated Software
 Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
-download it from
-
-http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
+download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
 
 If you use Ninka for research purposes, we would appreciate you cite
 the above paper.
@@ -28,13 +25,13 @@ the above paper.
 * Contributors
 
 - Paul Clough for his code to split sentences
-- Anthony Kohan for writing the excel and sqlite backends.
-- Armijn Hemel from Tjaldur Software Governance Solutions  for multiple bug reports and suggestions
+- Anthony Kohan for writing the excel and sqlite backends
+- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
+- René Scheibe for modularizing the code
 
 * License
 
-  Except for the directories comments and splitter, Ninka is licensed
-  under the GPLv2+
+  Ninka is licensed under the GPLv2+:
 
     Copyright (C) 2009-2014  Yuki Manabe and Daniel M. German
 
@@ -51,59 +48,41 @@ the above paper.
     You should have received a copy of the GNU General Public License
     along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
-  - splitter.pl is a derivative work of the Rule-based sentence
-    splitter script by Paul Paul Clough. Please see splitter/README
-    for details.
+  Ninka::SentenceExtractor is a derivative work of the rule-based sentence
+  splitter script by Paul Clough.
 
-  - comments is based on a program to remove comments by Jon Newman,
-    it is released under the GNU General Public License Version 2 or
-    (at your option) any later version.
+  comments is based on a program to remove comments by Jon Newman.
 
 * Requirements
 
 - Perl version 5 or above
-- for ninka-excel.pl: Perl module Spreadsheet::WriteExcel
-  https://metacpan.org/release/Spreadsheet-WriteExcel/
-- for ninka-sqlite.pl: Perl module DBD::SQLite
+- for ninka-excel: Perl module Spreadsheet::WriteExcel
+  https://metacpan.org/release/Spreadsheet-WriteExcel
+- for ninka-sqlite: Perl module DBD::SQLite
   https://metacpan.org/release/DBD-SQLite
 
 * How to install
 
   1. Unpack the distribution in a directory.
-  2. Optional: Build and install comments (make sure it is somwehere in the
-     path) (see directory comments)
-
+  2. Optional: Build and install comments (see directory comments) and make sure it is somewhere in the path
 
-* Usage:
+* Usage
 
-Ninka uses a pipe model (see below). Each step of the "pipe" creates a
-file, but
+ninka [options] filename
 
-ninka.pl [options] [filename]
+Available options:
 
-Available options
+  -i create intermediary files
   -v verbose
-  -d delete intermediate files
-  -C force creation of comments file
-  -c stop after creation of comments
-  -S force creation of sentences file
-  -s stop after creation of sentences
-  -G force creation of goodsent file
-  -g stop after creation of goodsent
-  -T force creation of senttok file
-  -t stop after creation of senttok
-  -L force creation of license file
-  -f force all processing
-
 
 Example:
 
-   ninka.pl foo.c
+  ninka -i foo.c
 
 It will create five files:
 
-  1. foo.c.comments: extracted the first two comments blocks, where
-     the license is usually
+  1. foo.c.comments: contains the first comment blocks, where
+     the license is usually included
   2. foo.c.sentences: creates the list of sentences in the license
      statement
   3. foo.c.goodsent: contains sentences that are likely to be part of
@@ -117,69 +96,60 @@ It will create five files:
      - Licenses
      - Unmatched sentences in *.senttok that were not matched
 
-
-
+These files are not required for Ninka to work, but they can help
+to debug license detection issues.
 
 * Ninka model
 
 Ninka uses a pipe-model. Each stage of the pipe does something very specific:
 
- 1. Comment extractor.
+1. Comment extractor
 
-    - directory: extComments
+    - Module: Ninka::CommentExtractor
 
-    - command: extComments.pl, might use comments (included in distribution)
+    - Purpose: Extracts top comments of source code.
+               If no comment extractor is known for the language,
+               then extracts top lines from source (currently 700)
 
-    - Purpose: Extracts top comments of source code. If no
-          comment extractor is known for the language, then extracts top lines from source (currently 700)
-
-    - Creates <filename>.comments file
+    - Output: <filename>.comments
 
 2. Split sentences in comments
 
-     - directory: splitter
-
-     - command: splitter.pl
-
-     - Purpose: Ninka works by matching sentences of licenses, hence
-       it needs to properly break text into sentences.
-
-     - Outputs <filename>.sentences
-
-3. Filter "good" sentences.
+     - Module: Ninka::SentenceExtractor
 
-     - directory filter
+     - Purpose: Ninka works by matching sentences of licenses,
+                hence it needs to properly break text into sentences.
 
-     - command: filter.pl
+     - Output: <filename>.sentences
 
-     - Purpose: some sentences are related to a license, some are
-       not. It is valuable to know if a file contains lines that look
-       like a license or not (e.g. to know that a file has no license)
+3. Filter "good" sentences
 
-     - Outputs: <filename>.goodsent, and <filename>.badsent (not used)
+     - Module: Ninka::SentenceFilter
 
-4. Tokenizes sentences
+     - Purpose: Some sentences are related to a license, some are not.
+                It is valuable to know if a file contains lines that look like
+                a license or not (e.g. to know that a file has no license).
 
-     - Directory senttok
+     - Output: <filename>.goodsent and <filename>.badsent
 
-     - command: senttok.pl
+4. Tokenize sentences
 
-     - Purpose: It creates a file that corresponds to the recognized
-       sentence tokens. For each sentence, it outputs its sentence token, or unknown otherwise.
+     - Module: Ninka::SentenceTokenizer
 
-     - Outputs: <filename>.senttok
+     - Purpose: It creates a file that corresponds to the recognized sentence tokens.
+                For each sentence, it outputs its sentence token, or unknown otherwise.
 
-5. Matches sentences to licenses
+     - Output: <filename>.senttok
 
-     - Directory matcher
+5. Match sentences to licenses
 
-     - Command: matcher.pl
+     - Module: Ninka::LicenseMatcher
 
-     - Purpose: looks at the sequence of sentence tokens and outputs the licenses found
+     - Purpose: It looks at the sentence tokens and outputs the licenses found.
 
      - Output: <filename>.license
 
-The script ninka.pl takes care of all these steps, and optionally removes
+The script ninka takes care of all these steps, and optionally creates
 intermediary files, and writes to the stdout the licenses found.
 
 ------
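
As a rough illustration of the -i behaviour documented by this patch, the sketch below prints the intermediate file names that the README describes for a given input file. The function name ninka_intermediate_files is hypothetical and not part of Ninka. The five suffixes are taken from the pipeline stages listed in the README; the additional .badsent file written by the filter stage is left out.

```shell
# Hypothetical helper, not shipped with Ninka: print the names of the
# five intermediate files the README lists for `ninka -i <file>`.
ninka_intermediate_files() {
    src="$1"
    # One suffix per pipeline stage, in pipeline order.
    for ext in comments sentences goodsent senttok license; do
        printf '%s.%s\n' "$src" "$ext"
    done
}
```

After running ninka -i foo.c, the output of ninka_intermediate_files foo.c could be passed to ls to check which of the files were actually written.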