author | dmg <dmg@uvic.ca> | 2013-07-01 19:26:30 -0400 |
---|---|---|
committer | dmg <dmg@uvic.ca> | 2013-07-01 19:26:30 -0400 |
commit | fd9bd05d191b3baa16c074197913c9ef0a9478b8 | |
tree | 3db02ba61d30a302123c1d90883fee8052ab569c | |
parent | b68c20fc1930836b4e77c3a75efb69bf03bad7c2 | |
tighten some definitions to avoid
false positives without hurting precision. In particular, removed
"subject" as a "legal" term.
-rw-r--r-- | ChangeLog | 4 |
-rw-r--r-- | filter/Makefile | 2 |
-rwxr-xr-x | filter/criticalword.dict | 5 |
3 files changed, 9 insertions, 2 deletions
@@ -1,5 +1,9 @@
 2013-07-01 dmg <dmg@uvic.ca>
 
+	* filter/criticalword.dict: tighten some definitions to avoid
+	false positives without hurting precision. In particular, removed
+	"subject" as a "legal" term.
+
 	* senttok/licensesentence.dict (publicDomain): Added another public domain.
 
 2011-02-08 <dmg@uvic.ca>
diff --git a/filter/Makefile b/filter/Makefile
index eb1ffea..8c0d42b 100644
--- a/filter/Makefile
+++ b/filter/Makefile
@@ -1,4 +1,4 @@
 default:
 	cp ../senttok/licensesentence.dict /tmp/test.sentences
 	./filter.pl /tmp/test.sentences
-	diff -w -B /dev/null /tmp/test.badsent
+	egrep -v '^#' /tmp/test.badsent
diff --git a/filter/criticalword.dict b/filter/criticalword.dict
index a2228bf..b58a586 100755
--- a/filter/criticalword.dict
+++ b/filter/criticalword.dict
@@ -107,7 +107,7 @@ for details
 all intellectual property rights
 sale
 sell
-subject
+subject to
 terms
 #under# too common to be useful
 warranties
@@ -119,6 +119,8 @@ meet some day
 notices
 legal
 accompanying
+included with this distribution for more information
+See the file
 public domain
 special exception
 notwithstanding
@@ -128,3 +130,4 @@ suitability
 computer program whose purpose
 disclaims copyright
 software is covered
+Copyright
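
The effect of replacing the `subject` entry with `subject to` in `filter/criticalword.dict` can be illustrated with a minimal sketch. It assumes the dictionary entries are matched as case-insensitive whole words or phrases, which may differ from what `filter.pl` actually does; the function `find_critical_terms` and both sample sentences are hypothetical and do not come from ninka.

```python
import re


def find_critical_terms(sentence, terms):
    """Return the terms found in the sentence as whole words or phrases.

    Assumption: case-insensitive, word-boundary matching. This is an
    illustration only, not ninka's actual matching logic.
    """
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append(term)
    return hits


# Hypothetical sample sentences, not taken from ninka's test data.
non_license = "The subject of this module is image processing."
license_like = "This software is subject to the terms of the license."

old_terms = ["subject"]     # dictionary entry before this commit
new_terms = ["subject to"]  # dictionary entry after this commit

# The bare word flags an ordinary sentence; the longer phrase does not.
old_false_hits = find_critical_terms(non_license, old_terms)
new_false_hits = find_critical_terms(non_license, new_terms)

# The longer phrase still flags the license-like sentence.
new_true_hits = find_critical_terms(license_like, new_terms)
```

Under these assumptions the bare word matches the non-license sentence while the two-word phrase matches only the license-like one, which is the kind of false positive the commit message describes.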