author     Martyn Russell <martyn@lanedo.com>  2009-12-11 12:44:05 +0000
committer  Martyn Russell <martyn@lanedo.com>  2009-12-11 12:44:05 +0000
commit     74286929a4712925747056bb945b4afce447fd54 (patch)
tree       7850bfbcc41b7bd80cb101cbca3cc43667c15f27 /utils/data-generators
parent     0c01f4884359067accd0a790ae2906c2b412ef29 (diff)
download   tracker-74286929a4712925747056bb945b4afce447fd54.tar.gz
Merged whitespace branch
Diffstat (limited to 'utils/data-generators')
-rw-r--r--  utils/data-generators/README                                   18
-rwxr-xr-x  utils/data-generators/barnum/README                            24
-rwxr-xr-x  utils/data-generators/barnum/convert_data.py                   12
-rwxr-xr-x  utils/data-generators/barnum/gen_data.py                       10
-rwxr-xr-x  utils/data-generators/barnum/gencc.py                          16
-rwxr-xr-x  utils/data-generators/barnum/genpw.py                          38
-rwxr-xr-x  utils/data-generators/generate-all-and-import.sh                2
-rwxr-xr-x  utils/data-generators/generate-data-for-contact-messages.py    44
-rwxr-xr-x  utils/data-generators/generate-data-for-music.py               54
-rw-r--r--  utils/data-generators/id32nmmTurtle.py                        344
-rwxr-xr-x  utils/data-generators/id32ttl.py                               58
11 files changed, 310 insertions, 310 deletions
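Note on reading this diff: the paired -/+ lines look identical because, as far as can be told from the commit message and the equal insertion and deletion counts, they differ only in whitespace (trailing spaces, and possibly line endings in id32nmmTurtle.py). A minimal Python 3 sketch of that kind of cleanup follows. The function name is hypothetical and this is not the tool the author used.

```python
def strip_trailing_whitespace(text):
    """Return text with trailing spaces/tabs removed from every line.

    splitlines() also drops '\r\n' endings, so the result uses '\n' only.
    """
    if not text:
        return text
    cleaned = [line.rstrip() for line in text.splitlines()]
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = "That's it. All turtle file generators are run   \r\nare imported to tracker.\t\n"
    print(repr(strip_trailing_whitespace(sample)))
```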
diff --git a/utils/data-generators/README b/utils/data-generators/README
index a197365ae..195e31504 100644
--- a/utils/data-generators/README
+++ b/utils/data-generators/README
@@ -15,7 +15,7 @@ Easy method:
./generate-all-and-import.sh
-That's it. All turtle file generators are run, and their contents
+That's it. All turtle file generators are run, and their contents
are imported to tracker.
Detailed method:
@@ -31,10 +31,10 @@ Detailed method:
./id32ttl.py directory/
This script will crawl "directory" extracting the id3 information of
- the files there.
+ the files there.
* To generate information about music according to nmm
- specification:
+ specification:
python id32nmmTurtle.py dir_of_music_files/
@@ -45,11 +45,11 @@ Detailed method:
./generate-data-for-bookmarks.py > bookmarks.ttl
This script uses bookmarks.in to get the data and prints the output
- in the stdout.
+ in the stdout.
* To generate some feeds information:
- ./get-fresh-planets.sh
+ ./get-fresh-planets.sh
(This script gets the atom feeds of planet.maemo and planet.gnome.
Be sure you have internet connection, the proxy correctly setted,
@@ -67,13 +67,13 @@ Other files
===========
Dont touch these files, they are included in the generation scripts
-described before:
+described before:
-* tools.py:
+* tools.py:
Some common functionality
-* bookmarks.in:
+* bookmarks.in:
Input information to generate bookmarks
-* songlist.ttl:
+* songlist.ttl:
The id3 information is harvested from mp3 files in the filesystem.
This is a file with pre-generated information, so can be used to
test even when you dont have a huge mp3 collection in your computer.
diff --git a/utils/data-generators/barnum/README b/utils/data-generators/barnum/README
index f36bb8361..cf11c38ac 100755
--- a/utils/data-generators/barnum/README
+++ b/utils/data-generators/barnum/README
@@ -1,25 +1,25 @@
What is Barnum?
===============
-Barnum is a python-based application for quickly and easily creating
+Barnum is a python-based application for quickly and easily creating
pseudo-random data typically used for application testing.
Why did you create Barnum?
==========================
-I am developing a shopping cart application in Django and realized that I
-needed a bunch of data to simulate the store's behavior under somewhat normal
-production usage.
+I am developing a shopping cart application in Django and realized that I
+needed a bunch of data to simulate the store's behavior under somewhat normal
+production usage.
-I got tired of always trying to think of names and addresses for customers and
+I got tired of always trying to think of names and addresses for customers and
so decided to automate the process a little bit. Such was born Barnum.
Why is Barnum unique?
=====================
-I was able to find some online systems for generating large amounts of test
-data. I could not find any application that had the breadth of data generation
-capabilities nor the ability to easily interface with Django in the way I
+I was able to find some online systems for generating large amounts of test
+data. I could not find any application that had the breadth of data generation
+capabilities nor the ability to easily interface with Django in the way I
wanted to.
One of the most unique aspects of Barnum is that the data is what I'll call
@@ -37,7 +37,7 @@ You should notice a couple of things about this data.
- There's a realistic first and last name
- The street names are also plausible
- Arthur, ND is a real city and the zip code is 58006
- - 701 is an area code used for North Dakota
+ - 701 is an area code used for North Dakota
- The fictional company is somewhat reasonable.
- The job position also makes sense.
@@ -73,7 +73,7 @@ The gen_data.py script is the primary showcase for how to create random data
using Barnum. If you run it from the command line:
python gen_data.py
-
+
You'll see some sample data output.
If you'd like to call it from another script, here's an example or two from the
@@ -114,7 +114,7 @@ endless!
Where does the data come from?
==============================
-I pulled sample data and existing scripts from a bunch of different sources.
+I pulled sample data and existing scripts from a bunch of different sources.
- The names are from 1990 US Census data http://www.census.gov/genealogy/names/names_files.html
- The street names are from real us streets in a few locales.
- Company names are randomly generated by me.
@@ -157,7 +157,7 @@ Where did this name come from?
Choosing names for projects is kind of fun but kind of a hassle. There needs to be a name
but it can't be anything too stupid. I started off thinking of an acronym and ended up with
-PT ("Python Testing") and immediately thought of P.T. Barnum. I really liked the name
+PT ("Python Testing") and immediately thought of P.T. Barnum. I really liked the name
because I was using this for Satchmo and project made in Django. Single word names seemed
cool. Also, I like the fact that P.T. Barnum was really a master at making people think
something was real that wasn't. Which is exactly what this little script does.
diff --git a/utils/data-generators/barnum/convert_data.py b/utils/data-generators/barnum/convert_data.py
index e7915cbbb..cd9218c9f 100755
--- a/utils/data-generators/barnum/convert_data.py
+++ b/utils/data-generators/barnum/convert_data.py
@@ -1,6 +1,6 @@
#!/usr/bin/python2.5
"""
-This application converts the various text files stored in the source-data
+This application converts the various text files stored in the source-data
directory into a pickled python object to be used by the random data
generator scripts
@@ -29,10 +29,10 @@ import random
import os
data_dir = "barnum/source-data"
-simple_files_to_process = ['street-names.txt', 'street-types.txt', 'latin-words.txt',
+simple_files_to_process = ['street-names.txt', 'street-types.txt', 'latin-words.txt',
'email-domains.txt', 'job-titles.txt', 'company-names.txt',
'company-types.txt']
-
+
def load_files():
# Process Zip Codes
all_zips = {}
@@ -48,7 +48,7 @@ def load_files():
state_area_codes = {}
for line in area_code_file:
clean_line = line.replace(' ','').rstrip('\n')
- state_area_codes[line.split(':')[0]] = clean_line[3:].split(',')
+ state_area_codes[line.split(':')[0]] = clean_line[3:].split(',')
pickle.dump(state_area_codes, output)
area_code_file.close()
@@ -95,6 +95,6 @@ if __name__ == "__main__":
response = string.lower(raw_input("Type 'yes' to reload the data from source files and create a new source file: "))
if response == 'yes':
load_files()
-
-
+
+
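The area-code loop in the convert_data.py hunk above keys each entry on the text before the colon and splits the remainder on commas, using a fixed `[3:]` slice. Below is a minimal Python 3 sketch of the same parsing, assuming input lines shaped like `ND: 701`. That layout is inferred from the slicing in the diff, and `parse_area_codes` is a hypothetical name.

```python
def parse_area_codes(lines):
    """Map each state abbreviation to its list of area-code strings."""
    state_area_codes = {}
    for line in lines:
        clean = line.replace(" ", "").rstrip("\n")
        if not clean:
            continue  # ignore blank lines
        # partition(':') replaces the fixed [3:] slice, so the key
        # does not have to be exactly two characters long.
        state, _, codes = clean.partition(":")
        state_area_codes[state] = codes.split(",") if codes else []
    return state_area_codes


if __name__ == "__main__":
    print(parse_area_codes(["ND: 701\n", "CA: 209, 213\n"]))
```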
diff --git a/utils/data-generators/barnum/gen_data.py b/utils/data-generators/barnum/gen_data.py
index f4941e68b..333ecd60a 100755
--- a/utils/data-generators/barnum/gen_data.py
+++ b/utils/data-generators/barnum/gen_data.py
@@ -73,7 +73,7 @@ def create_name(full_name=True, gender=None):
def create_job_title():
return random.choice(job_titles)
-
+
def create_phone(zip_code=None):
if not zip_code:
zip_code = random.choice(all_zips.keys())
@@ -99,7 +99,7 @@ def create_sentence(min=4, max=15):
for word in range(1, random.randint(min, max-1)):
sentence.append(random.choice(latin_words))
return " ".join(sentence) + "."
-
+
def create_paragraphs(num=1, min_sentences=4, max_sentences=7):
paragraphs = []
for para in range(0, num):
@@ -122,9 +122,9 @@ def create_date(numeric=True, past=False, max_years_future=10, max_years_past=10
else:
start = datetime.datetime.today()
num_days = max_years_future * 365
-
+
random_days = random.randint(1, num_days)
- random_date = start + datetime.timedelta(days=random_days)
+ random_date = start + datetime.timedelta(days=random_days)
return(random_date)
def create_birthday(age=random.randint (16, 80)):
@@ -143,7 +143,7 @@ def create_company_name(biz_type=None):
if not biz_type:
biz_type = random.choice(company_type)
if biz_type == "LawFirm":
- name.append( random.choice(last_names)+ ", " + random.choice(last_names) + " & " +
+ name.append( random.choice(last_names)+ ", " + random.choice(last_names) + " & " +
random.choice(last_names))
name.append('LLP')
else:
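One thing worth noting in the gen_data.py diff above: `def create_birthday(age=random.randint (16, 80))` evaluates its default once, when the function is defined, so every call that omits `age` reuses the same value. The Python 3 sketch below shows the usual `None`-default pattern. The function body is a hypothetical stand-in, since the diff does not show the original, and the 365-day year mirrors the approximation used in `create_date`.

```python
import datetime
import random


def create_birthday(age=None, today=None, rng=random):
    """Return a random date of birth for someone of roughly the given age."""
    if age is None:
        age = rng.randint(16, 80)  # drawn on every call, not at definition time
    if today is None:
        today = datetime.date.today()
    # Hypothetical body: go back `age` years plus up to one extra year.
    days_back = age * 365 + rng.randint(0, 364)
    return today - datetime.timedelta(days=days_back)


if __name__ == "__main__":
    print(create_birthday(), create_birthday())
```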
diff --git a/utils/data-generators/barnum/gencc.py b/utils/data-generators/barnum/gencc.py
index 85dc26861..37db52b10 100755
--- a/utils/data-generators/barnum/gencc.py
+++ b/utils/data-generators/barnum/gencc.py
@@ -26,10 +26,10 @@ import random
import sys
import copy
-visaPrefixList = [ ['4', '5', '3', '9'],
- ['4', '5', '5', '6'],
+visaPrefixList = [ ['4', '5', '3', '9'],
+ ['4', '5', '5', '6'],
['4', '9', '1', '6'],
- ['4', '5', '3', '2'],
+ ['4', '5', '3', '2'],
['4', '9', '2', '9'],
['4', '0', '2', '4', '0', '0', '7', '1'],
['4', '4', '8', '6'],
@@ -67,8 +67,8 @@ jcbPrefixList16 = [ ['3', '0', '8', '8'],
jcbPrefixList15 = [ ['2', '1', '0', '0'],
['1', '8', '0', '0'] ]
-voyagerPrefixList = [ ['8', '6', '9', '9'] ]
-
+voyagerPrefixList = [ ['8', '6', '9', '9'] ]
+
"""
'prefix' is the start of the CC number as a string, any number of digits.
@@ -85,7 +85,7 @@ def completed_number(prefix, length):
ccnumber.append(digit)
- # Calculate sum
+ # Calculate sum
sum = 0
pos = 0
@@ -113,7 +113,7 @@ def completed_number(prefix, length):
checkdigit = ((sum / 10 + 1) * 10 - sum) % 10
ccnumber.append( str(checkdigit) )
-
+
return ''.join(ccnumber)
def credit_card_number(prefixList, length, howMany):
@@ -121,7 +121,7 @@ def credit_card_number(prefixList, length, howMany):
result = []
for i in range(howMany):
-
+
ccnumber = copy.copy( random.choice(prefixList) )
result.append( completed_number(ccnumber, length) )
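The check-digit line in the gencc.py hunk above, `((sum / 10 + 1) * 10 - sum) % 10`, relies on Python 2 integer division; under Python 3 the `/` would produce a float. The function appears to follow the Luhn scheme, so here is a minimal Python 3 sketch of that scheme. The names `luhn_check_digit`, `complete_number` and `luhn_valid` are hypothetical, and this is a rewrite under that assumption rather than the original code.

```python
import random


def luhn_check_digit(partial):
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the digit next to the check digit is doubled first.
    for pos, ch in enumerate(reversed(partial)):
        d = int(ch)
        if pos % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def complete_number(prefix, length, rng=random):
    """Pad prefix with random digits to length-1, then append the check digit."""
    digits = list(prefix)
    while len(digits) < length - 1:
        digits.append(str(rng.randint(0, 9)))
    partial = "".join(digits)
    return partial + str(luhn_check_digit(partial))


def luhn_valid(number):
    """Return True if the full digit string passes the Luhn check."""
    total = 0
    for pos, ch in enumerate(reversed(number)):
        d = int(ch)
        if pos % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


if __name__ == "__main__":
    print(complete_number("4539", 16))
```

The generated numbers are test data only; passing the Luhn check says nothing about whether a number is issued to anyone.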
diff --git a/utils/data-generators/barnum/genpw.py b/utils/data-generators/barnum/genpw.py
index 446ea0c88..f5cb8b831 100755
--- a/utils/data-generators/barnum/genpw.py
+++ b/utils/data-generators/barnum/genpw.py
@@ -1,32 +1,32 @@
#!/usr/bin/python2.5
## Generate a human readable 'random' password
-## password will be generated in the form 'word'+digits+'word'
+## password will be generated in the form 'word'+digits+'word'
## eg.,nice137pass
## parameters: number of 'characters' , number of 'digits'
## Pradeep Kishore Gowda <pradeep at btbytes.com >
-## License : GPL
+## License : GPL
## Date : 2005.April.15
-## Revision 1.2
-## ChangeLog:
-## 1.1 - fixed typos
-## 1.2 - renamed functions _apart & _npart to a_part & n_part as zope does not allow functions to
+## Revision 1.2
+## ChangeLog:
+## 1.1 - fixed typos
+## 1.2 - renamed functions _apart & _npart to a_part & n_part as zope does not allow functions to
## start with _
def nicepass(alpha=6,numeric=2):
"""
- returns a human-readble password (say rol86din instead of
- a difficult to remember K8Yn9muL )
+ returns a human-readble password (say rol86din instead of
+ a difficult to remember K8Yn9muL )
"""
import string
import random
vowels = ['a','e','i','u', 'y']
consonants = [a for a in string.ascii_lowercase if a not in vowels]
digits = string.digits
-
+
####utility functions
def a_part(slen):
ret = ''
- for i in range(slen):
+ for i in range(slen):
if i%2 ==0:
randid = random.randint(0,20) #number of consonants
ret += consonants[randid]
@@ -34,26 +34,26 @@ def nicepass(alpha=6,numeric=2):
randid = random.randint(0,4) #number of vowels
ret += vowels[randid]
return ret
-
+
def n_part(slen):
ret = ''
for i in range(slen):
randid = random.randint(0,9) #number of digits
ret += digits[randid]
return ret
-
- ####
- fpl = alpha/2
+
+ ####
+ fpl = alpha/2
if alpha % 2 :
- fpl = int(alpha/2) + 1
- lpl = alpha - fpl
-
+ fpl = int(alpha/2) + 1
+ lpl = alpha - fpl
+
start = a_part(fpl)
mid = n_part(numeric)
end = a_part(lpl)
-
+
return "%s%s%s" % (start,mid,end)
-
+
if __name__ == "__main__":
for i in range(10):
print nicepass(6,2)
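The genpw.py code above builds a password of the form word + digits + word by alternating consonants and vowels. It depends on Python 2 behaviour (`alpha/2` as integer division, the `print` statement). A minimal Python 3 sketch of the same idea follows. The helper `alpha_part` and the `rng` parameter are additions for illustration, and the vowel set mirrors the list in the diff, which omits 'o'.

```python
import random
import string

VOWELS = "aeiuy"  # same letters as the diff; 'o' therefore counts as a consonant
CONSONANTS = [c for c in string.ascii_lowercase if c not in VOWELS]


def alpha_part(n, rng):
    """Return n letters, consonant first, alternating with vowels."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(n)
    )


def nicepass(alpha=6, numeric=2, rng=None):
    """Return a pronounceable password such as 'rol86din'."""
    rng = rng or random.Random()
    first = (alpha + 1) // 2  # the first word gets the extra letter when alpha is odd
    last = alpha - first
    digits = "".join(rng.choice(string.digits) for _ in range(numeric))
    return alpha_part(first, rng) + digits + alpha_part(last, rng)


if __name__ == "__main__":
    for _ in range(10):
        print(nicepass(6, 2))
```

Like the original, this uses the `random` module, which is adequate for test data but not for real credentials.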
diff --git a/utils/data-generators/generate-all-and-import.sh b/utils/data-generators/generate-all-and-import.sh
index 40c53c5cb..971e9936d 100755
--- a/utils/data-generators/generate-all-and-import.sh
+++ b/utils/data-generators/generate-all-and-import.sh
@@ -1,6 +1,6 @@
#!/bin/sh
# generate and import all local .ttl files
-# takes as one parameter the number of entries that should be
+# takes as one parameter the number of entries that should be
# generated in each category
./generate-all.sh $1
diff --git a/utils/data-generators/generate-data-for-contact-messages.py b/utils/data-generators/generate-data-for-contact-messages.py
index 30e16f58d..d47ec2b7e 100755
--- a/utils/data-generators/generate-data-for-contact-messages.py
+++ b/utils/data-generators/generate-data-for-contact-messages.py
@@ -63,7 +63,7 @@ def generatePhoneNumber():
def generatePhoneCalls (many):
for i in range (0, many):
callUID = str(random.randint(0, sys.maxint))
-
+
duration = random.randint (0, 50)
relationType = random.randint (0,100) % 2
if (relationType == 0):
@@ -95,9 +95,9 @@ def generateSMS (many):
#Received SMS
sys.stdout.write('\tnmo:from [a nco:Contact; nco:hasPhoneNumber <' + phoneUri + '>];\n')
sys.stdout.write('\tnmo:to [a nco:Contact; nco:hasPhoneNumber <' + myOwnPhoneNumberURI + '>];\n')
-
+
sys.stdout.write('\tnmo:sentDate "' + getPseudoRandomDate () + '";\n')
- sys.stdout.write('\tnmo:receivedDate "' + getPseudoRandomDate () + '";\n')
+ sys.stdout.write('\tnmo:receivedDate "' + getPseudoRandomDate () + '";\n')
if (random.randint(0, 4) > 3):
sys.stdout.write ('\tnao:hasTag [a nao:Tag ; nao:prefLabel ' + getRandomTag () +'];\n')
sys.stdout.write('\tnmo:plainTextMessageContent "' + str.replace(gen_data.create_paragraphs(1, 5, 8), "\n", "") + '".\n')
@@ -136,7 +136,7 @@ def generateEmail(sys, gen_data, str, random):
ccid = random.randint(0, len(previousContacts) - 1)
sys.stdout.write('\tnmo:cc <urn:uuid:' + previousContacts[ccid] + '>;\n')
sys.stdout.write('\tnmo:messageHeader [a nmo:MessageHeader; nmo:headerName "cc"; nmo:headerValue "' + previousEmailAddresses[ccid] + '"];\n')
-
+
if random.randint(0, 10) > 7 and len(previousContacts) > 1:
bccid = random.randint(0, len(previousContacts) - 1)
sys.stdout.write('\tnmo:bcc <urn:uuid:' + previousContacts[bccid] + '>;\n')
@@ -144,7 +144,7 @@ def generateEmail(sys, gen_data, str, random):
#TODO add some sense to the email titles. Some reply chains as well.
if (random.randint(0, 4) > 3):
sys.stdout.write ('\tnao:hasTag [a nao:Tag ; nao:prefLabel ' + getRandomTag () +'];\n')
-
+
sys.stdout.write('\tnmo:messageSubject "' + str.replace(gen_data.create_paragraphs(1, 2, 2), "\n", "") + '";\n')
sys.stdout.write('\tnmo:plainTextMessageContent "' + str.replace(gen_data.create_paragraphs(1, 2, 3), "\n", "") + '".\n')
@@ -173,7 +173,7 @@ sys.stdout.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/200
sys.stdout.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
sys.stdout.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
sys.stdout.write("@prefix nmo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#>.\n")
-sys.stdout.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
+sys.stdout.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
sys.stdout.write("@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>.\n")
sys.stdout.write("@prefix ncal: <http://www.semanticdesktop.org/ontologies/2007/04/02/ncal#>.\n")
@@ -181,7 +181,7 @@ sys.stdout.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
sys.stdout.write('<mailto:me@me.com> a nco:EmailAddress; \n')
sys.stdout.write('\tnco:emailAddress "me@me.com".\n')
sys.stdout.write('\n')
-sys.stdout.write('<urn:uuid:1> a nco:PersonContact; \n')
+sys.stdout.write('<urn:uuid:1> a nco:PersonContact; \n')
sys.stdout.write('\tnco:fullname "Me Myself";\n')
sys.stdout.write('\tnco:nameGiven "Me";\n')
sys.stdout.write('\tnco:nameFamily "Myself";\n')
@@ -204,7 +204,7 @@ for dummy in range (0, count):
firstName, lastName = gen_data.create_name()
zip, city, state = gen_data.create_city_state_zip()
postalAddressID=str(random.randint(0, sys.maxint))
-
+
UID = str(random.randint(0, sys.maxint))
phoneNumber = gen_data.create_phone()
phoneUri = 'tel:+1' + phoneNumber.translate(allchars,' -()')
@@ -216,11 +216,11 @@ for dummy in range (0, count):
hasPhoneNumber = False
jobTitle = gen_data.create_job_title()
- generatePostalAddress()
- generateEmailAddress()
-
+ generatePostalAddress()
+ generateEmailAddress()
+
#Only every 3rd have Phone or IM to add variation.
- if random.randint(0, 3) > 2 or count == 1:
+ if random.randint(0, 3) > 2 or count == 1:
generateIMAccount(gen_data, str)
hasIMAccount = True
if random.randint(0, 3) > 2 or count == 1:
@@ -229,8 +229,8 @@ for dummy in range (0, count):
if (withPhone): generatePhoneCalls(3)
if (withPhone): generateSMS (4)
-
- sys.stdout.write('<urn:uuid:' + UID + '> a nco:PersonContact; \n')
+
+ sys.stdout.write('<urn:uuid:' + UID + '> a nco:PersonContact; \n')
sys.stdout.write('\tnco:fullname "' + firstName + ' ' + lastName +'";\n')
sys.stdout.write('\tnco:nameGiven "' + firstName + '";\n')
sys.stdout.write('\tnco:nameFamily "' + lastName + '";\n')
@@ -238,26 +238,26 @@ for dummy in range (0, count):
#sys.stdout.write('\tnco:title "'+jobTitle+'";\n')
sys.stdout.write('\tnco:hasEmailAddress <mailto:' + emailAddress + '>;\n')
if hasPhoneNumber: sys.stdout.write('\tnco:hasPhoneNumber <' + phoneUri + '>;\n')
- if hasIMAccount: sys.stdout.write('\tnco:hasIMAccount <xmpp:' + xmppAddress + '>;\n')
+ if hasIMAccount: sys.stdout.write('\tnco:hasIMAccount <xmpp:' + xmppAddress + '>;\n')
if (random.randint(0, 4) > 3):
sys.stdout.write ('\tnao:hasTag [a nao:Tag ; nao:prefLabel ' + getRandomTag () + '];\n')
sys.stdout.write('\tnco:hasPostalAddress <urn:uuid:' + postalAddressID + '>.\n')
sys.stdout.write('\n')
-
+
#calendarEntryID=str(random.randint(0, sys.maxint))
#if random.randint(0, 3)>2 and count>2 and len(previousContacts):
# generateCalendarEntry(gen_data, str, random)
-
- #20% Send emails. Those who do, send 1-30 emails. EMails have CC and BCC people
+
+ #20% Send emails. Those who do, send 1-30 emails. EMails have CC and BCC people
if random.randint(0, 10)>8 or count==1:
- emailcount=random.randint(1, 30)
+ emailcount=random.randint(1, 30)
for dummy in range (0, emailcount):
generateEmail(sys, gen_data, str, random)
sys.stdout.write('\n')
previousContacts.append(UID)
previousEmailAddresses.append(emailAddress)
-
- #TODO INSERT IM - Use just a nmo:Message for that for now.
-
+
+ #TODO INSERT IM - Use just a nmo:Message for that for now.
+
#TODO: Insert bookmarks
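The contact generator above assembles Turtle by concatenating strings straight into `sys.stdout.write`, so a name containing a double quote would produce invalid output. The Python 3 sketch below builds one `nco:PersonContact` record with literal escaping. The property names come from the diff; the function names `turtle_literal` and `person_contact` are hypothetical.

```python
def turtle_literal(text):
    """Quote text as a Turtle string literal, escaping \\, \" and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def person_contact(uid, first_name, last_name, email):
    """Return one nco:PersonContact record as Turtle text."""
    lines = [
        "<urn:uuid:%s> a nco:PersonContact;" % uid,
        "\tnco:fullname %s;" % turtle_literal(first_name + " " + last_name),
        "\tnco:nameGiven %s;" % turtle_literal(first_name),
        "\tnco:nameFamily %s;" % turtle_literal(last_name),
        "\tnco:hasEmailAddress <mailto:%s>." % email,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(person_contact("1", "Me", "Myself", "me@me.com"))
```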
diff --git a/utils/data-generators/generate-data-for-music.py b/utils/data-generators/generate-data-for-music.py
index a13c7afc7..d09e6319c 100755
--- a/utils/data-generators/generate-data-for-music.py
+++ b/utils/data-generators/generate-data-for-music.py
@@ -41,9 +41,9 @@ def generate_name():
def update_tag(artistid, artistname, albumid, trackid, genreid):
global fileid
-
+
length = random.randint(5000,5000000)
- song = 'SongTitle%03u' % fileid
+ song = 'SongTitle%03u' % fileid
album = 'Album%03u' % albumid
genre = 'Genre%03u' % genreid
trackstr = str(artistid) + '/' + str(trackid)
@@ -55,7 +55,7 @@ def update_tag(artistid, artistname, albumid, trackid, genreid):
random.randint(1, 12),
random.randint(1, 28))
created = modified
-
+
if not artist_UID.has_key(artistname):
#print " The new artist is "+artist
UID = str(random.randint(0, sys.maxint))
@@ -68,8 +68,8 @@ def update_tag(artistid, artistname, albumid, trackid, genreid):
if not album_UID.has_key(album):
album_UID[album] = album
f.write('<urn:album:' + album + '> a nmm:MusicAlbum; \n')
-
- if len(UID)>0:
+
+ if len(UID)>0:
f.write('\tnmm:albumArtist <urn:uuid:' + UID + '>;\n')
f.write('\tnie:title "' + album + '".\n\n')
@@ -77,7 +77,7 @@ def update_tag(artistid, artistname, albumid, trackid, genreid):
UID = artist_UID[artistname]
f.write('<file://' + urllib.pathname2url(fullpath) + '> a nmm:MusicPiece,nfo:FileDataObject;\n')
- if len(song) > 0:
+ if len(song) > 0:
f.write('\tnie:title "' + song + '";\n')
f.write('\tnfo:fileName \"' + artistname + '.mp3\";\n')
@@ -89,7 +89,7 @@ def update_tag(artistid, artistname, albumid, trackid, genreid):
if len(trackstr) > 0:
trackArray = trackstr.split("/")
- if len(trackArray) > 0:
+ if len(trackArray) > 0:
f.write('\tnmm:trackNumber ' + trackArray[0] + ';\n')
f.write('\tnmm:length ' + str(length) + ';\n')
@@ -110,7 +110,7 @@ def create_track(artistid, albumid, genreid, settings):
def generate(settings):
''' A total of TotalTracks files will be generated.
These contain the specified number of albums.'''
- '''
+ '''
filepath = settings['OutputDir']
try:
os.makedirs(filepath)
@@ -118,7 +118,7 @@ def generate(settings):
print 'Directory exists'
'''
- global album_UID
+ global album_UID
genreid = 1
artistid = 1
albumid = 0
@@ -137,36 +137,36 @@ if __name__ == '__main__':
parser = OptionParser()
- parser.add_option("-T", "--TotalTracks",
+ parser.add_option("-T", "--TotalTracks",
dest='TotalTracks',
- help="Specify (mandatory) the total number of files to be generated",
+ help="Specify (mandatory) the total number of files to be generated",
metavar="TotalTracks")
- parser.add_option("-r", "--ArtistCount",
- dest='ArtistCount',
+ parser.add_option("-r", "--ArtistCount",
+ dest='ArtistCount',
default=2,
- help="Specify (mandatory) the total number of Artists." ,
+ help="Specify (mandatory) the total number of Artists." ,
metavar="ArtistCount")
- parser.add_option("-a", "--album-count",
- dest='AlbumCount',
+ parser.add_option("-a", "--album-count",
+ dest='AlbumCount',
default=5,
- help="Specify (mandatory) the number of albums per artist.",
+ help="Specify (mandatory) the number of albums per artist.",
metavar="AlbumCount")
- parser.add_option("-g", "--genre-count",
- dest='GenreCount',
+ parser.add_option("-g", "--genre-count",
+ dest='GenreCount',
default=10,
- help="Specify the genre count" ,
+ help="Specify the genre count" ,
metavar="GenreCount")
- parser.add_option("-o", "--output",
- dest='OutputFileName',
+ parser.add_option("-o", "--output",
+ dest='OutputFileName',
default='generate-data-for-music.ttl',
- help="Specify the output ttl filename. e.g. -T 2000 -r 25 -a 20 -g 10 -o generated_songs.ttl",
+ help="Specify the output ttl filename. e.g. -T 2000 -r 25 -a 20 -g 10 -o generated_songs.ttl",
metavar="OutputFileName")
(options, args) = parser.parse_args()
-
- mandatories = ['TotalTracks', 'ArtistCount', 'AlbumCount']
- for m in mandatories:
- if not options.__dict__[m]:
+
+ mandatories = ['TotalTracks', 'ArtistCount', 'AlbumCount']
+ for m in mandatories:
+ if not options.__dict__[m]:
# Set defaults
if m == "TotalTracks":
options.TotalTracks = 100
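In the option setup above, `ArtistCount` and `AlbumCount` already carry defaults, so the `mandatories` loop only ever needs to fill in `TotalTracks`. The Python 3 sketch below expresses the same options with `argparse` instead of the `optparse` module used in the diff, giving every count a default and an integer type. Flag names and default values are taken from the diff; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options of generate-data-for-music.py."""
    parser = argparse.ArgumentParser(
        description="Generate music test data as a Turtle file"
    )
    parser.add_argument("-T", "--TotalTracks", type=int, default=100,
                        help="total number of files to generate")
    parser.add_argument("-r", "--ArtistCount", type=int, default=2,
                        help="total number of artists")
    parser.add_argument("-a", "--album-count", dest="AlbumCount", type=int,
                        default=5, help="number of albums per artist")
    parser.add_argument("-g", "--genre-count", dest="GenreCount", type=int,
                        default=10, help="number of genres")
    parser.add_argument("-o", "--output", dest="OutputFileName",
                        default="generate-data-for-music.ttl",
                        help="output ttl filename")
    return parser


if __name__ == "__main__":
    # Example from the original help text: -T 2000 -r 25 -a 20 -g 10 -o generated_songs.ttl
    print(build_parser().parse_args())
```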
diff --git a/utils/data-generators/id32nmmTurtle.py b/utils/data-generators/id32nmmTurtle.py
index 6a98fd786..00d4ed099 100644
--- a/utils/data-generators/id32nmmTurtle.py
+++ b/utils/data-generators/id32nmmTurtle.py
@@ -1,172 +1,172 @@
-#!/usr/bin/python
-# -*- coding: utf-8 -*-
-#
-#
-# Copyright (c) 2007 Urho Konttori <urho.konttori@gmail.com>
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation; either version 2 of the
-# License, or (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
-# 02111-1307, USA.
-#
-import os, datetime, time, internals.id3reader as id3reader
-import sys, urllib, random
-indexPath="/Volumes/OSX"
-
-if len(sys.argv)>1:
- indexPath=str(sys.argv[1])
-else:
- print "Usage: python id32nmmTurtle.py <path-to-index>."
- sys.exit (1)
-
-
-songcounter=0
-
-filelist=[]
-folderlist=[]
-foldermap=[]
-depth=0
-artist_UID = {}
-album_UID = {}
-
-class FileProcessor:
-
- def __init__(self):
- self.f=open("./songlist.ttl", 'w' )
-
- self.f.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
- self.f.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n")
- self.f.write("@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#>.\n")
- self.f.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#>.\n")
- self.f.write("@prefix nmm: <http://www.tracker-project.org/temp/nmm#>.\n")
- self.f.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
- self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
- self.f.write("@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>.\n");
- self.f.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
-
- def addMp3(self, fullpath, fileName):
- global songcounter
- global g_UID
- try:
- year=""
- album=""
- song=""
- artist=""
- trackstr=""
- genre=""
- comment=""
- year=""
- length=0
- id3r = id3reader.Reader(fullpath)
- if id3r.getValue('album'): album = id3r.getValue('album')
- if id3r.getValue('title'): song = id3r.getValue('title')
- if id3r.getValue('performer'): artist = id3r.getValue('performer')
- if id3r.getValue('year'): year = id3r.getValue('year')
- if id3r.getValue('genre'): genre = id3r.getValue('genre')
- if id3r.getValue('comment'): comment = id3r.getValue('comment')
- length=random.randint(5000,5000000 )
- if id3r.getValue('track'):
- trackstr=str(id3r.getValue('track'))
- if id3r.getValue('TPA'):
- partOfSet=id3r.getValue('TPA')
- if partOfSet=="None": partOfSet=""
- modified=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getmtime(fullpath)))
- created=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getctime(fullpath)))
- size = os.path.getsize(fullpath)
-
-
- artistUID = ""
- albumUID = ""
- UID=""
- if not artist_UID.has_key(artist):
- #print " The new artist is "+artist
- UID = str(random.randint(0, sys.maxint))
- artist_UID[artist] = UID
- self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
- #self.f.write('<urn:artist:'+artist+'> a nco:Contact; \n')
- self.f.write('\tnco:fullname "'+artist+'".\n\n')
- else :
- #print 'Artist exists ' + artist
- UID = artist_UID[artist]
-
- if not album_UID.has_key(album):
- #print " The new album is "+artist
-
- album_UID[artist] = album
- self.f.write('<urn:album:'+album+'> a nmm:MusicAlbum; \n')
-
- if len(partOfSet)>0:
- setArray=partOfSet.split("/")
- if len(setArray)>0: self.f.write('\tnmm:setNumber '+setArray[0]+';\n')
- if len(setArray)>1: self.f.write('\tnmm:setCount '+setArray[1]+';\n')
- if len(UID)>0: self.f.write('\tnmm:albumArtist <urn:uuid:'+UID+'>;\n')
- self.f.write('\tnie:title "'+album+'".\n\n')
- else :
- #print 'Artist exists ' + artist
- UID = artist_UID[artist]
-
- self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nmm:MusicPiece,nfo:FileDataObject;\n')
- if len(song)>0: self.f.write('\tnie:title "'+song+'";\n')
- if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
- if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
- if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
- self.f.write('\tnfo:fileSize '+str(size)+';\n')
- if len(album)>0: self.f.write('\tnmm:musicAlbum <urn:album:'+album+'>;\n')
-# if len(year)>0: self.f.write('\tnid3:recordingYear '+str(year)+';\n')
- if len(genre)>0: self.f.write('\tnmm:genre "'+genre+'";\n')
- if len(trackstr)>0:
- trackArray=trackstr.split("/")
- if len(trackArray)>0: self.f.write('\tnmm:trackNumber '+trackArray[0]+';\n')
-
-
- if length>0: self.f.write('\tnmm:length '+str(length)+';\n')
- if len(UID)>0: self.f.write('\tnmm:performer <urn:uuid:'+UID+'>.\n\n')
-
-
- songcounter+=1
-
-
- if songcounter==1:
- print id3r.dump()
-
-
-
- except IOError, message:
- print "ID TAG ERROR: getIDTags(): IOERROR:", message
-
- def getOSDir(self,addpath, filelist, depth=0):
- try:
- test=os.path.exists(addpath)
- depth=depth+1
- if (test and depth<8):
- #folderlist.append(addpath)
- #folderCounter=len(folderlist)-1
- for fileName in os.listdir (addpath):
- try:
- #filelist.append(addpath+"/"+fileName)
- if fileName.endswith(".mp3") or fileName.endswith(".MP3"):
- self.addMp3(addpath+"/"+fileName, fileName)
- #foldermap.append(folderCounter)
- if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
- self.getOSDir(addpath+"/"+fileName,filelist, depth)
- except OSError, message:
- print "getOSDir():OSError:", message
- except OSError, message:
- print "getOSDir():OSError:", message
-
-
-fileProcessor=FileProcessor()
-startTime = time.time()
-fileProcessor.getOSDir(indexPath, filelist, depth)
-fileProcessor.f.close()
-print "created "+ str(songcounter) +" songs to turtle file in " + str(time.time()-startTime)+ " seconds."
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+#
+#
+# Copyright (c) 2007 Urho Konttori <urho.konttori@gmail.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation; either version 2 of the
+# License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+# 02111-1307, USA.
+#
+import os, datetime, time, internals.id3reader as id3reader
+import sys, urllib, random
+indexPath="/Volumes/OSX"
+
+if len(sys.argv)>1:
+ indexPath=str(sys.argv[1])
+else:
+ print "Usage: python id32nmmTurtle.py <path-to-index>."
+ sys.exit (1)
+
+
+songcounter=0
+
+filelist=[]
+folderlist=[]
+foldermap=[]
+depth=0
+artist_UID = {}
+album_UID = {}
+
+class FileProcessor:
+
+ def __init__(self):
+ self.f=open("./songlist.ttl", 'w' )
+
+ self.f.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
+ self.f.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n")
+ self.f.write("@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#>.\n")
+ self.f.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#>.\n")
+ self.f.write("@prefix nmm: <http://www.tracker-project.org/temp/nmm#>.\n")
+ self.f.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
+ self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
+ self.f.write("@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>.\n");
+ self.f.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
+
+ def addMp3(self, fullpath, fileName):
+ global songcounter
+ global g_UID
+ try:
+ year=""
+ album=""
+ song=""
+ artist=""
+ trackstr=""
+ genre=""
+ comment=""
+ year=""
+ length=0
+ id3r = id3reader.Reader(fullpath)
+ if id3r.getValue('album'): album = id3r.getValue('album')
+ if id3r.getValue('title'): song = id3r.getValue('title')
+ if id3r.getValue('performer'): artist = id3r.getValue('performer')
+ if id3r.getValue('year'): year = id3r.getValue('year')
+ if id3r.getValue('genre'): genre = id3r.getValue('genre')
+ if id3r.getValue('comment'): comment = id3r.getValue('comment')
+ length=random.randint(5000,5000000 )
+ if id3r.getValue('track'):
+ trackstr=str(id3r.getValue('track'))
+ if id3r.getValue('TPA'):
+ partOfSet=id3r.getValue('TPA')
+ if partOfSet=="None": partOfSet=""
+ modified=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getmtime(fullpath)))
+ created=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getctime(fullpath)))
+ size = os.path.getsize(fullpath)
+
+
+ artistUID = ""
+ albumUID = ""
+ UID=""
+ if not artist_UID.has_key(artist):
+ #print " The new artist is "+artist
+ UID = str(random.randint(0, sys.maxint))
+ artist_UID[artist] = UID
+ self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
+ #self.f.write('<urn:artist:'+artist+'> a nco:Contact; \n')
+ self.f.write('\tnco:fullname "'+artist+'".\n\n')
+ else:
+ #print 'Artist exists ' + artist
+ UID = artist_UID[artist]
+
+ if not album_UID.has_key(album):
+ #print " The new album is "+artist
+
+ album_UID[artist] = album
+ self.f.write('<urn:album:'+album+'> a nmm:MusicAlbum; \n')
+
+ if len(partOfSet)>0:
+ setArray=partOfSet.split("/")
+ if len(setArray)>0: self.f.write('\tnmm:setNumber '+setArray[0]+';\n')
+ if len(setArray)>1: self.f.write('\tnmm:setCount '+setArray[1]+';\n')
+ if len(UID)>0: self.f.write('\tnmm:albumArtist <urn:uuid:'+UID+'>;\n')
+ self.f.write('\tnie:title "'+album+'".\n\n')
+ else:
+ #print 'Artist exists ' + artist
+ UID = artist_UID[artist]
+
+ self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nmm:MusicPiece,nfo:FileDataObject;\n')
+ if len(song)>0: self.f.write('\tnie:title "'+song+'";\n')
+ if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
+ if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
+ if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
+ self.f.write('\tnfo:fileSize '+str(size)+';\n')
+ if len(album)>0: self.f.write('\tnmm:musicAlbum <urn:album:'+album+'>;\n')
+# if len(year)>0: self.f.write('\tnid3:recordingYear '+str(year)+';\n')
+ if len(genre)>0: self.f.write('\tnmm:genre "'+genre+'";\n')
+ if len(trackstr)>0:
+ trackArray=trackstr.split("/")
+ if len(trackArray)>0: self.f.write('\tnmm:trackNumber '+trackArray[0]+';\n')
+
+
+ if length>0: self.f.write('\tnmm:length '+str(length)+';\n')
+ if len(UID)>0: self.f.write('\tnmm:performer <urn:uuid:'+UID+'>.\n\n')
+
+
+ songcounter+=1
+
+
+ if songcounter==1:
+ print id3r.dump()
+
+
+
+ except IOError, message:
+ print "ID TAG ERROR: getIDTags(): IOERROR:", message
+
+ def getOSDir(self,addpath, filelist, depth=0):
+ try:
+ test=os.path.exists(addpath)
+ depth=depth+1
+ if (test and depth<8):
+ #folderlist.append(addpath)
+ #folderCounter=len(folderlist)-1
+ for fileName in os.listdir (addpath):
+ try:
+ #filelist.append(addpath+"/"+fileName)
+ if fileName.endswith(".mp3") or fileName.endswith(".MP3"):
+ self.addMp3(addpath+"/"+fileName, fileName)
+ #foldermap.append(folderCounter)
+ if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
+ self.getOSDir(addpath+"/"+fileName,filelist, depth)
+ except OSError, message:
+ print "getOSDir():OSError:", message
+ except OSError, message:
+ print "getOSDir():OSError:", message
+
+
+fileProcessor=FileProcessor()
+startTime = time.time()
+fileProcessor.getOSDir(indexPath, filelist, depth)
+fileProcessor.f.close()
+print "created "+ str(songcounter) +" songs to turtle file in " + str(time.time()-startTime)+ " seconds."
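
Note on the file above: it builds Turtle by plain string concatenation, so a tag value containing a double quote or a space would yield an invalid literal or `<urn:album:...>` IRI. It also tests `album_UID` by album title but stores the entry under the artist name. The following is a minimal sketch of one way to handle both, not part of this commit. It is written for Python 3 (the script itself is Python 2), and the helper names `turtle_literal`, `album_urn` and `album_block` are hypothetical.

```python
import urllib.parse


def turtle_literal(text):
    """Return text as a double-quoted Turtle string literal with escapes."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def album_urn(album):
    """Percent-encode the album title so it is safe inside <urn:album:...>."""
    return "<urn:album:" + urllib.parse.quote(album, safe="") + ">"


def album_block(album, seen_albums):
    """Return the Turtle block for an album, or "" if it was already emitted.

    seen_albums is a set keyed by album title, so the membership test and
    the stored key always refer to the same thing.
    """
    if album in seen_albums:
        return ""
    seen_albums.add(album)
    return (
        album_urn(album) + " a nmm:MusicAlbum;\n"
        "\tnie:title " + turtle_literal(album) + ".\n\n"
    )


if __name__ == "__main__":
    seen = set()
    print(album_block('Say "Hi"', seen), end="")
    # The second call returns an empty string because the album is known.
    print(repr(album_block('Say "Hi"', seen)))
```

The same escaping would apply to the `nie:title`, `nfo:fileName` and `nmm:genre` values written in `addMp3`, assuming those tag values can contain arbitrary characters.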
diff --git a/utils/data-generators/id32ttl.py b/utils/data-generators/id32ttl.py
index 26bb934dd..d1f342595 100755
--- a/utils/data-generators/id32ttl.py
+++ b/utils/data-generators/id32ttl.py
@@ -38,18 +38,18 @@ foldermap=[]
depth=0
artist_UID = {}
class FileProcessor:
-
+
def __init__(self):
self.f=open("./songlist.ttl", 'w' )
-
+
self.f.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
self.f.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n")
self.f.write("@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#>.\n")
self.f.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#>.\n")
self.f.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
- self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
+ self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
self.f.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
-
+
def addMp3(self, fullpath, fileName):
global songcounter
global g_UID
@@ -79,50 +79,50 @@ class FileProcessor:
modified=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getmtime(fullpath)))
created=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getctime(fullpath)))
size = os.path.getsize(fullpath)
-
- #UID=str(random.randint(0, sys.maxint))
+
+ #UID=str(random.randint(0, sys.maxint))
UID = ""
if not artist_UID.has_key(artist):
- #print " The new artist is "+artist
- UID = str(random.randint(0, sys.maxint))
- artist_UID[artist] = UID
- self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
- self.f.write('\tnco:fullname "'+artist+'".\n')
+ #print " The new artist is "+artist
+ UID = str(random.randint(0, sys.maxint))
+ artist_UID[artist] = UID
+ self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
+ self.f.write('\tnco:fullname "'+artist+'".\n')
else :
- #print 'Artist exists ' + artist
- UID = artist_UID[artist]
-
- self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nid3:ID3Audio,nfo:FileDataObject;\n')
- if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
- if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
- if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
+ #print 'Artist exists ' + artist
+ UID = artist_UID[artist]
+
+ self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nid3:ID3Audio,nfo:FileDataObject;\n')
+ if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
+ if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
+ if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
self.f.write('\tnfo:fileSize '+str(size)+';\n')
if len(album)>0: self.f.write('\tnid3:albumTitle "'+album+'";\n')
if len(year)>0: self.f.write('\tnid3:recordingYear '+str(year)+';\n')
if len(song)>0: self.f.write('\tnid3:title "'+song+'";\n')
if len(trackstr)>0: self.f.write('\tnid3:trackNumber "'+trackstr+'";\n')
if len(partOfSet)>0: self.f.write('\tnid3:partOfSet "'+partOfSet+'";\n')
-
- if len(genre)>0: self.f.write('\tnid3:contentType "'+genre+'";\n')
+
+ if len(genre)>0: self.f.write('\tnid3:contentType "'+genre+'";\n')
if len(comment)>0: self.f.write('\tnid3:comments "'+comment+'";\n')
if length>0: self.f.write('\tnid3:length '+str(length)+';\n')
if len(UID)>0: self.f.write('\tnid3:leadArtist <urn:uuid:'+UID+'>.\n\n')
-
-
+
+
songcounter+=1
-
-
+
+
if songcounter==1:
print id3r.dump()
-
-
+
+
except IOError, message:
print "ID TAG ERROR: getIDTags(): IOERROR:", message
def getOSDir(self,addpath, filelist, depth=0):
try:
- test=os.path.exists(addpath)
+ test=os.path.exists(addpath)
depth=depth+1
if (test and depth<8):
#folderlist.append(addpath)
@@ -132,8 +132,8 @@ class FileProcessor:
#filelist.append(addpath+"/"+fileName)
if fileName.endswith(".mp3") or fileName.endswith(".MP3"):
self.addMp3(addpath+"/"+fileName, fileName)
- #foldermap.append(folderCounter)
- if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
+ #foldermap.append(folderCounter)
+ if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
self.getOSDir(addpath+"/"+fileName,filelist, depth)
except OSError, message:
print "getOSDir():OSError:", message