author dmg <dmg@uvic.ca> 2013-07-01 19:26:30 -0400
committer dmg <dmg@uvic.ca> 2013-07-01 19:26:30 -0400
commit fd9bd05d191b3baa16c074197913c9ef0a9478b8
tree 3db02ba61d30a302123c1d90883fee8052ab569c
parent b68c20fc1930836b4e77c3a75efb69bf03bad7c2
tighten some definitions to avoid
false positives without hurting precision. In particular, removed "subject" as a "legal" term.
-rw-r--r-- ChangeLog 4
-rw-r--r-- filter/Makefile 2
-rwxr-xr-x filter/criticalword.dict 5
3 files changed, 9 insertions, 2 deletions
diff --git a/ChangeLog b/ChangeLog
index c58547b..6b2c2f5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
2013-07-01 dmg <dmg@uvic.ca>
+ * filter/criticalword.dict: tighten some definitions to avoid
+ false positives without hurting precision. In particular, removed
+ "subject" as a "legal" term.
+
* senttok/licensesentence.dict (publicDomain): Added another public domain.
2011-02-08 <dmg@uvic.ca>
diff --git a/filter/Makefile b/filter/Makefile
index eb1ffea..8c0d42b 100644
--- a/filter/Makefile
+++ b/filter/Makefile
@@ -1,4 +1,4 @@
default:
cp ../senttok/licensesentence.dict /tmp/test.sentences
./filter.pl /tmp/test.sentences
- diff -w -B /dev/null /tmp/test.badsent
+ egrep -v '^#' /tmp/test.badsent
diff --git a/filter/criticalword.dict b/filter/criticalword.dict
index a2228bf..b58a586 100755
--- a/filter/criticalword.dict
+++ b/filter/criticalword.dict
@@ -107,7 +107,7 @@ for details
all intellectual property rights
sale
sell
-subject
+subject to
terms
#under# too common to be useful
warranties
@@ -119,6 +119,8 @@ meet some day
notices
legal
accompanying
+included with this distribution for more information
+See the file
public domain
special exception
notwithstanding
@@ -128,3 +130,4 @@ suitability
computer program whose purpose
disclaims copyright
software is covered
+Copyright
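
The effect of narrowing `subject` to `subject to` in the criticalword.dict hunk above can be illustrated with a minimal sketch. It is not the logic of ninka's filter.pl: the helper `matches_any`, the two sample sentences, and the choice of case-insensitive whole-word matching are all assumptions made for illustration only.

```python
import re


def matches_any(sentence, terms):
    """Return the terms found in the sentence.

    Assumption: terms are matched case-insensitively on word boundaries.
    The real filter may match differently.
    """
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append(term)
    return hits


# Dictionary entry before and after this commit.
old_terms = ["subject"]
new_terms = ["subject to"]

# Hypothetical sentences: one without legal meaning, one license-like.
non_legal = "The subject of this email is the release schedule."
legal = "This program is subject to the terms of the license."

print(matches_any(non_legal, old_terms))  # ['subject']
print(matches_any(non_legal, new_terms))  # []
print(matches_any(legal, new_terms))      # ['subject to']
```

Under these assumptions, the longer entry no longer flags the non-legal sentence, while the license-like sentence is still flagged.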