From 6720a4799ccf1f0b31523a00a66de96280502a54 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:04:34 +0200 Subject: Update README Use Markdown. --- README | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/README b/README index dbbe6f1..9a13aff 100644 --- a/README +++ b/README @@ -1,12 +1,14 @@ -* Contact information. +Contact information +=================== Any feedback will be appreciated. You can email us at Daniel M. German and Yuki Manabe -* Introduction +Introduction +------------ Ninka is license identification tool that identifies the license(s) -under which a source file is made available. +under which a given source file is made available. This tool uses a source file as input and outputs the licenses identified within that file. @@ -22,7 +24,8 @@ download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf. If you use Ninka for research purposes, we would appreciate you cite the above paper. -* Contributors +Contributors +------------ - Paul Clough for his code to split sentences - Anthony Kohan for writing the excel and sqlite backends -- cgit v1.2.1 From bb9fe62301a3139cf563186eb9f5f505af581559 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:22:15 +0200 Subject: Rename README to README.md Take advantage of Markdown --- README | 197 -------------------------------------------------------------- README.md | 197 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 197 insertions(+), 197 deletions(-) delete mode 100644 README create mode 100644 README.md diff --git a/README b/README deleted file mode 100644 index 9a13aff..0000000 --- a/README +++ /dev/null @@ -1,197 +0,0 @@ -Contact information -=================== - -Any feedback will be appreciated. You can email us at Daniel M. 
German - and Yuki Manabe - -Introduction ------------- - -Ninka is license identification tool that identifies the license(s) -under which a given source file is made available. - -This tool uses a source file as input and outputs the licenses -identified within that file. - -If you need to know the detail of Ninka, please see the following paper: - -Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching -method for automatic license identification of source code files. In -25nd IEEE/ACM International Conference on Automated Software -Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or -download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf. - -If you use Ninka for research purposes, we would appreciate you cite -the above paper. - -Contributors ------------- - -- Paul Clough for his code to split sentences -- Anthony Kohan for writing the excel and sqlite backends -- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions -- René Scheibe for modularizing the code - -* License - - Ninka is licensed under the GPLv2+: - - Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as - published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . - - Ninka::SentenceExtraxtor is a derivative work of the rule-based sentence - splitter script by Paul Paul Clough. - - comments is based on a program to remove comments by Jon Newman. 
- -* Requirements - -- Perl version 5 or above -- for ninka-excel: Perl module Spreadsheet::WriteExcel - https://metacpan.org/release/Spreadsheet-WriteExcel -- for ninka-sqlite: Perl module DBD::SQLite - https://metacpan.org/release/DBD-SQLite - -* How to install - - 1. Unpack the distribution in a directory. - 2. Optional: Build and install comments (make sure it is somwehere in the path) (see directory comments) - -* Usage - -ninka [options] filename - -Available options: - - -i create intermediary files - -v verbose - -Example: - - ninka -i foo.c - -It will create five files: - - 1. foo.c.comments: extracted the first comments blocks, where - the license is usually included - 2. foo.c.sentences: creates the list of sentences in the license - statement - 3. foo.c.goodsent: contains sentences that are likely to be part of - a license statement - 4. foo.c.badsent: contains the sentences that are not part of - foo.c.goodsent - 5. foo.c.senttok: Each sentence in *.goodsent is converted into a - tokenized sentence (or unmatched, when none matches) - 6. foo.c.license: List of licenses found in the file. Its contains a - single line with 3 fields (semicolon delimited): - - Licenses - - Unmatched sentences in *.senttok that were not matched - -The files are not required for Ninka's functionality. But they can help -to debug license detection issues. - -* Ninka model - -Ninka uses a pipe-model. Each stage of the pipe does something very specific: - -1. Comment extractor - - - Module: Ninka::CommentExtractor - - - Purpose: Extracts top comments of source code. - If no comment extractor is known for the language, - then extracts top lines from source (currently 700) - - - Output: .comments - -2. Split sentences in comments - - - Module: Ninka::SentenceExtractor - - - Purpose: Ninka works by matching sentences of licenses, - hence it needs to properly break text into sentences. - - - Output: .sentences - -3. 
Filter "good" sentences - - - Module: Ninka::SentenceFilter - - - Purpose: Some sentences are related to a license, some are not. - It is valuable to know if a file contains lines that look like - a license or not (e.g. to know that a file has no license). - - - Output: .goodsent and .badsent - -4. Tokenize sentences - - - Module: Ninka::SentenceTokenizer - - - Purpose: It creates a file that corresponds to the recognized sentence tokens. - For each sentence, it outputs its sentence token, or unknown otherwise. - - - Output: .senttok - -5. Match sentences to licenses - - - Module: Ninka::LicenseMatcher - - - Purpose: It looks at the sentence tokens and outputs the licenses found. - - - Output: .license - -The script ninka takes care of all these steps, and optionally creates -intermediary files, and writes to the stdout the licenses found. - ------- - -How to read the output: - -Assume, for example, this output: - -eq.c;MITX11noNotice;1;2;2;6;0;Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary - - -So Ninka detects all the sentences, including the MIT variant, it -finds the GPL bsd intention. But the license is not really BSD. - -The disclaimers are not what you expect. Now, in all fairness, maybe -this is another license. 
- - -Let me translate the output for you: - -file: eq.c; -License(s) found: MITX11noNotice - - -;1;2;2;6;0; -Found 1 license -Composed of 2 lines (tokens) -2 tokens were ignored -6 tokens were not mached: Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary (-1 indicates where a match happened) -0 tokens were unknown - - -Another example: - -nsAccessibilityUtils.cpp;MPLv1_1;1;1;3;7;2;UNKNOWN,MPL1_1_GPL2_LGPL2_1intentionVer0,1,-1,-1,MPLsee,Copyright,-1,Altern,UNKNOWN,MPLoptionNOTGPLVer0,MPLoptionIfNotDelete3licsVer0,licenseBlockEnd - -License matched:MPLv1_1; -One license: 1; -Composed of one token: 1; -3 token were ignored 3; -7 tokens were matched but not recognized as a license: UNKNOWN,MPL1_1_GPL2_LGPL2_1intentionVer0,1,-1,-1,MPLsee,Copyright,-1,Altern,UNKNOWN,MPLoptionNOTGPLVer0,MPLoptionIfNotDelete3licsVer0,licenseBlockEnd -2 of those tokens were unknown diff --git a/README.md b/README.md new file mode 100644 index 0000000..9a13aff --- /dev/null +++ b/README.md @@ -0,0 +1,197 @@ +Contact information +=================== + +Any feedback will be appreciated. You can email us at Daniel M. German + and Yuki Manabe + +Introduction +------------ + +Ninka is license identification tool that identifies the license(s) +under which a given source file is made available. + +This tool uses a source file as input and outputs the licenses +identified within that file. + +If you need to know the detail of Ninka, please see the following paper: + +Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching +method for automatic license identification of source code files. In +25nd IEEE/ACM International Conference on Automated Software +Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or +download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf. + +If you use Ninka for research purposes, we would appreciate you cite +the above paper. 
+ +Contributors +------------ + +- Paul Clough for his code to split sentences +- Anthony Kohan for writing the excel and sqlite backends +- Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions +- René Scheibe for modularizing the code + +* License + + Ninka is licensed under the GPLv2+: + + Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as + published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + + Ninka::SentenceExtraxtor is a derivative work of the rule-based sentence + splitter script by Paul Paul Clough. + + comments is based on a program to remove comments by Jon Newman. + +* Requirements + +- Perl version 5 or above +- for ninka-excel: Perl module Spreadsheet::WriteExcel + https://metacpan.org/release/Spreadsheet-WriteExcel +- for ninka-sqlite: Perl module DBD::SQLite + https://metacpan.org/release/DBD-SQLite + +* How to install + + 1. Unpack the distribution in a directory. + 2. Optional: Build and install comments (make sure it is somwehere in the path) (see directory comments) + +* Usage + +ninka [options] filename + +Available options: + + -i create intermediary files + -v verbose + +Example: + + ninka -i foo.c + +It will create five files: + + 1. foo.c.comments: extracted the first comments blocks, where + the license is usually included + 2. foo.c.sentences: creates the list of sentences in the license + statement + 3. 
foo.c.goodsent: contains sentences that are likely to be part of + a license statement + 4. foo.c.badsent: contains the sentences that are not part of + foo.c.goodsent + 5. foo.c.senttok: Each sentence in *.goodsent is converted into a + tokenized sentence (or unmatched, when none matches) + 6. foo.c.license: List of licenses found in the file. Its contains a + single line with 3 fields (semicolon delimited): + - Licenses + - Unmatched sentences in *.senttok that were not matched + +The files are not required for Ninka's functionality. But they can help +to debug license detection issues. + +* Ninka model + +Ninka uses a pipe-model. Each stage of the pipe does something very specific: + +1. Comment extractor + + - Module: Ninka::CommentExtractor + + - Purpose: Extracts top comments of source code. + If no comment extractor is known for the language, + then extracts top lines from source (currently 700) + + - Output: .comments + +2. Split sentences in comments + + - Module: Ninka::SentenceExtractor + + - Purpose: Ninka works by matching sentences of licenses, + hence it needs to properly break text into sentences. + + - Output: .sentences + +3. Filter "good" sentences + + - Module: Ninka::SentenceFilter + + - Purpose: Some sentences are related to a license, some are not. + It is valuable to know if a file contains lines that look like + a license or not (e.g. to know that a file has no license). + + - Output: .goodsent and .badsent + +4. Tokenize sentences + + - Module: Ninka::SentenceTokenizer + + - Purpose: It creates a file that corresponds to the recognized sentence tokens. + For each sentence, it outputs its sentence token, or unknown otherwise. + + - Output: .senttok + +5. Match sentences to licenses + + - Module: Ninka::LicenseMatcher + + - Purpose: It looks at the sentence tokens and outputs the licenses found. 
+ + - Output: .license + +The script ninka takes care of all these steps, and optionally creates +intermediary files, and writes to the stdout the licenses found. + +------ + +How to read the output: + +Assume, for example, this output: + +eq.c;MITX11noNotice;1;2;2;6;0;Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary + + +So Ninka detects all the sentences, including the MIT variant, it +finds the GPL bsd intention. But the license is not really BSD. + +The disclaimers are not what you expect. Now, in all fairness, maybe +this is another license. + + +Let me translate the output for you: + +file: eq.c; +License(s) found: MITX11noNotice + + +;1;2;2;6;0; +Found 1 license +Composed of 2 lines (tokens) +2 tokens were ignored +6 tokens were not mached: Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary (-1 indicates where a match happened) +0 tokens were unknown + + +Another example: + +nsAccessibilityUtils.cpp;MPLv1_1;1;1;3;7;2;UNKNOWN,MPL1_1_GPL2_LGPL2_1intentionVer0,1,-1,-1,MPLsee,Copyright,-1,Altern,UNKNOWN,MPLoptionNOTGPLVer0,MPLoptionIfNotDelete3licsVer0,licenseBlockEnd + +License matched:MPLv1_1; +One license: 1; +Composed of one token: 1; +3 token were ignored 3; +7 tokens were matched but not recognized as a license: UNKNOWN,MPL1_1_GPL2_LGPL2_1intentionVer0,1,-1,-1,MPLsee,Copyright,-1,Altern,UNKNOWN,MPLoptionNOTGPLVer0,MPLoptionIfNotDelete3licsVer0,licenseBlockEnd +2 of those tokens were unknown -- cgit v1.2.1 From 58c66d009cf2c958e6eb355eafe55c2450ef3ff8 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:24:56 +0200 Subject: Update README.md More Markdown. 
--- README.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 9a13aff..f592f22 100644 --- a/README.md +++ b/README.md @@ -32,7 +32,8 @@ Contributors - Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions - René Scheibe for modularizing the code -* License +License +------- Ninka is licensed under the GPLv2+: @@ -56,7 +57,8 @@ Contributors comments is based on a program to remove comments by Jon Newman. -* Requirements +Requirements +------------ - Perl version 5 or above - for ninka-excel: Perl module Spreadsheet::WriteExcel @@ -69,14 +71,15 @@ Contributors 1. Unpack the distribution in a directory. 2. Optional: Build and install comments (make sure it is somwehere in the path) (see directory comments) -* Usage +Usage +----- -ninka [options] filename +'''ninka [options] filename Available options: -i create intermediary files - -v verbose + -v verbose''' Example: -- cgit v1.2.1 From 3eeb3f60e155eb593d69f3ce6966e6d9bc8c38b7 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:27:02 +0200 Subject: Update README.md again Moar Markdown. --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index f592f22..b60b983 100644 --- a/README.md +++ b/README.md @@ -74,12 +74,12 @@ Requirements Usage ----- -'''ninka [options] filename +```ninka [options] filename Available options: -i create intermediary files - -v verbose''' + -v verbose``` Example: -- cgit v1.2.1 From eb387189e9096ecfb5d8b6f2fa64cd2934aa34ed Mon Sep 17 00:00:00 2001 From: "Jeremiah C. 
Foster" Date: Mon, 22 Jun 2015 11:27:27 +0200 Subject: Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b60b983..c9c77b5 100644 --- a/README.md +++ b/README.md @@ -74,7 +74,7 @@ Requirements Usage ----- -```ninka [options] filename + ```ninka [options] filename Available options: -- cgit v1.2.1 From a92750bf0f876feb0773f2590a13e58dc56a38f8 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:29:41 +0200 Subject: Update README.md --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index c9c77b5..c719cdf 100644 --- a/README.md +++ b/README.md @@ -74,16 +74,16 @@ Requirements Usage ----- - ```ninka [options] filename +| ninka [options] filename Available options: - -i create intermediary files - -v verbose``` +| -i create intermediary files +| -v verbose``` Example: - ninka -i foo.c +| ninka -i foo.c It will create five files: -- cgit v1.2.1 From 888e0c95ed334f2e78dd4cb3746dbee6e6876b3a Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:30:42 +0200 Subject: Update README.md --- README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index c719cdf..2629d75 100644 --- a/README.md +++ b/README.md @@ -74,16 +74,16 @@ Requirements Usage ----- -| ninka [options] filename + ninka [options] filename Available options: -| -i create intermediary files -| -v verbose``` + -i create intermediary files + -v verbose``` Example: -| ninka -i foo.c + ninka -i foo.c It will create five files: @@ -105,7 +105,8 @@ It will create five files: The files are not required for Ninka's functionality. But they can help to debug license detection issues. -* Ninka model +Ninka model +----------- Ninka uses a pipe-model. 
Each stage of the pipe does something very specific: -- cgit v1.2.1 From b59f5c7ce0b4368214205a61b4ec4b5d3acfb3ad Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:32:07 +0200 Subject: Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2629d75..0c0fc9e 100644 --- a/README.md +++ b/README.md @@ -79,7 +79,7 @@ Usage Available options: -i create intermediary files - -v verbose``` + -v verbose Example: -- cgit v1.2.1 From 6a97fdbabd08d3807444c0d12490f6359124e7e2 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:33:52 +0200 Subject: New public domain sentence. --- lib/Ninka/licensesentence.dict | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/Ninka/licensesentence.dict b/lib/Ninka/licensesentence.dict index 5ab826c..f564b3d 100644 --- a/lib/Ninka/licensesentence.dict +++ b/lib/Ninka/licensesentence.dict @@ -504,6 +504,7 @@ publicDomain:12:0:^The author of this program disclaims copyright$ publicDomain:13:0:^This file has been put into the public domain$ publicDomain:52:1:This ([^ ]+)is public domain software: publicDomain:53:1:and placed in the public domain: +publicDomain:54:1:and is placed in the public domain: qtCommercialuse:52:1:Commercial Usage Licensees holding valid Qt Commercial licenses may use this file in accordance with the Qt Commercial License Agreement provided with the Software or, alternatively, in accordance with the terms contained in a written agreement between you and Nokia: qtCommercialuse:10:2:Commercial License Usage Licensees holding valid commercial Qt licenses can use this file in accordance with the commercial license agreement provided with the Software or, alternatively, in accordance with the terms contained in a written agreement between you and (.+): -- cgit v1.2.1 From bbe071f5253741a82f07fbc9df8dd0b314442e84 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. 
Foster" Date: Mon, 22 Jun 2015 11:50:33 +0200 Subject: Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 0c0fc9e..fd552be 100644 --- a/README.md +++ b/README.md @@ -74,12 +74,12 @@ Requirements Usage ----- - ninka [options] filename + ninka [options] filename Available options: - -i create intermediary files - -v verbose + -i create intermediary files + -v verbose Example: -- cgit v1.2.1 From 4a9fb449715a9bf8738ec7c3c0924a06dc597261 Mon Sep 17 00:00:00 2001 From: "Jeremiah C. Foster" Date: Mon, 22 Jun 2015 11:54:15 +0200 Subject: Update README.md --- README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index fd552be..c2152ea 100644 --- a/README.md +++ b/README.md @@ -83,7 +83,7 @@ Available options: Example: - ninka -i foo.c + ninka -i foo.c It will create five files: @@ -165,8 +165,7 @@ How to read the output: Assume, for example, this output: -eq.c;MITX11noNotice;1;2;2;6;0;Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary - + eq.c;MITX11noNotice;1;2;2;6;0;Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary So Ninka detects all the sentences, including the MIT variant, it finds the GPL bsd intention. But the license is not really BSD. -- cgit v1.2.1
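Note on the "How to read the output" section touched by the last patch: the README describes each Ninka result as a single semicolon-delimited line. A minimal sketch of how such a line could be split into named fields follows. The field names, the `NinkaResult` type, and the `parse_ninka_line` helper are illustrative assumptions and not part of Ninka itself. Treating the license field as comma-separated is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NinkaResult:
    """One line of Ninka output. Field names are hypothetical, chosen to
    mirror the README's explanation of the example lines."""
    filename: str
    licenses: List[str]
    license_count: int      # "Found 1 license"
    matched_tokens: int     # "Composed of 2 lines (tokens)"
    ignored_tokens: int     # "2 tokens were ignored"
    unmatched_tokens: int   # "6 tokens were not matched"
    unknown_tokens: int     # "0 tokens were unknown"
    remaining: List[str]    # trailing comma-separated token list


def parse_ninka_line(line: str) -> NinkaResult:
    # Split into at most 8 fields so the trailing token list stays in one piece.
    parts = line.strip().split(";", 7)
    if len(parts) != 8:
        raise ValueError(f"expected 8 semicolon-separated fields, got {len(parts)}")
    filename, licenses, *counts, remaining = parts
    n_lic, n_match, n_ign, n_unmatched, n_unknown = (int(c) for c in counts)
    return NinkaResult(
        filename=filename,
        # Assumption: several licenses would be separated by commas.
        licenses=licenses.split(",") if licenses else [],
        license_count=n_lic,
        matched_tokens=n_match,
        ignored_tokens=n_ign,
        unmatched_tokens=n_unmatched,
        unknown_tokens=n_unknown,
        remaining=remaining.split(",") if remaining else [],
    )


if __name__ == "__main__":
    example = (
        "eq.c;MITX11noNotice;1;2;2;6;0;Copyright,-1,-1,DualLicenseIntention,"
        "GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary"
    )
    print(parse_ninka_line(example))
```

Splitting with a maximum of seven separators keeps the final token list intact even though it contains commas, so the counts can be read positionally as the README explains them.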