1 files changed, 176 insertions, 0 deletions
diff --git a/README.TXT b/README.TXT
new file mode 100644
index 0000000..f5a016a
--- /dev/null
+++ b/README.TXT
@@ -0,0 +1,176 @@
+* Contact information. 
+
+Any feedback is appreciated. You can email Daniel M. German
+<dmg@uvic.ca> and Yuki Manabe <y-manabe@ist.osaka-u.ac.jp>.
+
+* Introduction
+
+Ninka is a license identification tool that identifies the license(s)
+under which a source file is made available.
+
+This tool takes a source file as input and outputs the licenses
+identified within that file.
+
+For details on how Ninka works, please see the following
+paper:
+
+Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
+method for automatic license identification of source code files. In
+25th IEEE/ACM International Conference on Automated Software
+Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
+download it from
+
+http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf
+
+If you use Ninka for research purposes, we would appreciate it if you
+cite the above paper.
+
+* License
+ 
+  Except for the directories comments and splitter, Ninka is licensed
+  under the AGPLv3+
+ 
+    Copyright (C) 2009-2010  Yuki Manabe and Daniel M. German
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+  - splitter.pl is a derivative work of the Rule-based sentence
+    splitter script by Paul Clough. Please see splitter/README
+    for details.
+
+  - comments is based on a program to remove comments by Jon Newman.
+    It is released under the GNU General Public License Version 2 or
+    (at your option) any later version.
+
+* Requirements
+
+Perl version 5
+
+* How to install
+
+  1. Unpack the distribution in a directory.
+  2. Build and install comments (see the comments directory) and make
+     sure it is somewhere in the path
+  3. Build splitter.pl (see splitter/README for instructions)
+
+* Usage:
+
+Ninka uses a pipe model (see below). Each step of the "pipe" creates a
+file, but the ninka.pl script takes care of running all the steps:
+
+ninka.pl [options] [filename] 
+
+Available options
+  -v verbose
+  -d delete intermediate files
+  -C force creation of comments file
+  -c stop after creation of comments
+  -S force creation of sentences file
+  -s stop after creation of sentences
+  -G force creation of goodsent file
+  -g stop after creation of goodsent
+  -T force creation of senttok file
+  -t stop after creation of senttok
+  -L force creation of license file
+  -f force all processing
+
+
+Example:
+
+   ninka.pl foo.c
+
+It will create six files:
+
+  1. foo.c.comments: contains the first two comment blocks, where
+     the license is usually found
+  2. foo.c.sentences: contains the list of sentences in the license
+     statement
+  3. foo.c.goodsent: contains sentences that are likely to be part of
+     a license statement
+  4. foo.c.badsent: contains the sentences that are not part of
+     foo.c.goodsent
+  5. foo.c.senttok: Each sentence in *.goodsent is converted into a
+     tokenized sentence (or unmatched, when none matches)
+  6. foo.c.license: List of licenses found in the file. It contains a
+     single line with 3 fields (semicolon delimited):
+     - Licenses
+     - Sentences in *.senttok that were not matched
+
+   
+
+
+* Ninka model
+
+Ninka uses a pipe-model. Each stage of the pipe does something very specific:
+
+ 1. Comment extractor. 
+
+    - directory: extComments
+
+    - command: extComments.pl, might use comments (included in distribution)
+    
+    - Purpose: Extracts the top comments of the source code. If no
+          comment extractor is known for the language, it extracts the top lines of the source (currently 700)
+
+    - Creates <filename>.comments file
+
+2. Split sentences in comments
+ 
+     - directory: splitter
+
+     - command: splitter.pl
+
+     - Purpose: Ninka works by matching sentences of licenses, hence
+       it needs to properly break text into sentences.
+
+     - Outputs: <filename>.sentences
+
+3. Filter "good" sentences.
+
+     - directory: filter
+
+     - command: filter.pl
+
+     - Purpose: Some sentences are related to a license and some are
+       not. It is valuable to know whether a file contains lines that
+       look like a license (e.g. to know that a file has no license)
+
+     - Outputs: <filename>.goodsent, and <filename>.badsent (not used)
+
+4. Tokenizes sentences
+
+     - directory: senttok
+ 
+     - command: senttok.pl
+
+     - Purpose: Creates a file containing the recognized sentence
+       tokens. For each sentence, it outputs the matching sentence token, or unknown if none matches.
+
+     - Outputs: <filename>.senttok
+
+5. Matches sentences to licenses
+
+     - directory: matcher
+
+     - command: matcher.pl
+
+     - Purpose: Looks at the sequence of sentence tokens and outputs the licenses found
+
+     - Outputs: <filename>.license
+      
+The script ninka.pl takes care of all these steps, optionally removes
+intermediate files, and writes the licenses found to stdout.
+
+------
+