author | Martyn Russell <martyn@lanedo.com> | 2009-12-11 12:44:05 +0000 |
---|---|---|
committer | Martyn Russell <martyn@lanedo.com> | 2009-12-11 12:44:05 +0000 |
commit | 74286929a4712925747056bb945b4afce447fd54 (patch) | |
tree | 7850bfbcc41b7bd80cb101cbca3cc43667c15f27 /utils/data-generators | |
parent | 0c01f4884359067accd0a790ae2906c2b412ef29 (diff) | |
download | tracker-74286929a4712925747056bb945b4afce447fd54.tar.gz |
Merged whitespace branch
Diffstat (limited to 'utils/data-generators')
-rw-r--r-- | utils/data-generators/README | 18
-rwxr-xr-x | utils/data-generators/barnum/README | 24
-rwxr-xr-x | utils/data-generators/barnum/convert_data.py | 12
-rwxr-xr-x | utils/data-generators/barnum/gen_data.py | 10
-rwxr-xr-x | utils/data-generators/barnum/gencc.py | 16
-rwxr-xr-x | utils/data-generators/barnum/genpw.py | 38
-rwxr-xr-x | utils/data-generators/generate-all-and-import.sh | 2
-rwxr-xr-x | utils/data-generators/generate-data-for-contact-messages.py | 44
-rwxr-xr-x | utils/data-generators/generate-data-for-music.py | 54
-rw-r--r-- | utils/data-generators/id32nmmTurtle.py | 344
-rwxr-xr-x | utils/data-generators/id32ttl.py | 58
11 files changed, 310 insertions, 310 deletions
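Every file in this commit has matching insertion and deletion counts (310 each overall), and the visible hunks differ only in whitespace: trailing spaces and tabs are dropped, and `id32nmmTurtle.py` is rewritten in full (172 lines out, 172 in), which is consistent with, but not proof of, a CRLF-to-LF line-ending conversion. The commit message says only "Merged whitespace branch", so the following is a minimal sketch of that kind of cleanup under those assumptions, not the tool actually used. The helper names `normalize_whitespace` and `clean_file` are hypothetical, and the sketch is Python 3 while the scripts in the diff are Python 2. It does not reproduce the one non-trailing change visible here (`else :` becoming `else:`).

```python
import sys


def normalize_whitespace(text: str) -> str:
    """Convert CRLF/CR endings to LF and strip trailing spaces/tabs from each line.

    Hypothetical helper: assumes the cleanup only targets trailing whitespace
    and line endings.
    """
    # Unify line endings first so every line is split on a plain "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # rstrip only spaces and tabs; whether the file ends with a newline is preserved
    # because split("\n") yields a trailing empty string in that case.
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def clean_file(path: str) -> bool:
    """Rewrite path in place if normalization changes it; return True if it did."""
    # newline="" disables newline translation so "\r\n" is seen (and written) verbatim.
    with open(path, "r", encoding="utf-8", newline="") as fh:
        original = fh.read()
    cleaned = normalize_whitespace(original)
    if cleaned == original:
        return False
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(cleaned)
    return True


if __name__ == "__main__":
    # Usage: python3 normalize_ws.py FILE...  (script name is hypothetical)
    for p in sys.argv[1:]:
        if clean_file(p):
            print("cleaned", p)
```

Running `git diff -w` after such a pass (or on this commit) should show no changes other than line-ending ones, which is a quick way to confirm that nothing but whitespace was touched.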
diff --git a/utils/data-generators/README b/utils/data-generators/README index a197365ae..195e31504 100644 --- a/utils/data-generators/README +++ b/utils/data-generators/README @@ -15,7 +15,7 @@ Easy method: ./generate-all-and-import.sh -That's it. All turtle file generators are run, and their contents +That's it. All turtle file generators are run, and their contents are imported to tracker. Detailed method: @@ -31,10 +31,10 @@ Detailed method: ./id32ttl.py directory/ This script will crawl "directory" extracting the id3 information of - the files there. + the files there. * To generate information about music according to nmm - specification: + specification: python id32nmmTurtle.py dir_of_music_files/ @@ -45,11 +45,11 @@ Detailed method: ./generate-data-for-bookmarks.py > bookmarks.ttl This script uses bookmarks.in to get the data and prints the output - in the stdout. + in the stdout. * To generate some feeds information: - ./get-fresh-planets.sh + ./get-fresh-planets.sh (This script gets the atom feeds of planet.maemo and planet.gnome. Be sure you have internet connection, the proxy correctly setted, @@ -67,13 +67,13 @@ Other files =========== Dont touch these files, they are included in the generation scripts -described before: +described before: -* tools.py: +* tools.py: Some common functionality -* bookmarks.in: +* bookmarks.in: Input information to generate bookmarks -* songlist.ttl: +* songlist.ttl: The id3 information is harvested from mp3 files in the filesystem. This is a file with pre-generated information, so can be used to test even when you dont have a huge mp3 collection in your computer. diff --git a/utils/data-generators/barnum/README b/utils/data-generators/barnum/README index f36bb8361..cf11c38ac 100755 --- a/utils/data-generators/barnum/README +++ b/utils/data-generators/barnum/README @@ -1,25 +1,25 @@ What is Barnum? 
=============== -Barnum is a python-based application for quickly and easily creating +Barnum is a python-based application for quickly and easily creating pseudo-random data typically used for application testing. Why did you create Barnum? ========================== -I am developing a shopping cart application in Django and realized that I -needed a bunch of data to simulate the store's behavior under somewhat normal -production usage. +I am developing a shopping cart application in Django and realized that I +needed a bunch of data to simulate the store's behavior under somewhat normal +production usage. -I got tired of always trying to think of names and addresses for customers and +I got tired of always trying to think of names and addresses for customers and so decided to automate the process a little bit. Such was born Barnum. Why is Barnum unique? ===================== -I was able to find some online systems for generating large amounts of test -data. I could not find any application that had the breadth of data generation -capabilities nor the ability to easily interface with Django in the way I +I was able to find some online systems for generating large amounts of test +data. I could not find any application that had the breadth of data generation +capabilities nor the ability to easily interface with Django in the way I wanted to. One of the most unique aspects of Barnum is that the data is what I'll call @@ -37,7 +37,7 @@ You should notice a couple of things about this data. - There's a realistic first and last name - The street names are also plausible - Arthur, ND is a real city and the zip code is 58006 - - 701 is an area code used for North Dakota + - 701 is an area code used for North Dakota - The fictional company is somewhat reasonable. - The job position also makes sense. @@ -73,7 +73,7 @@ The gen_data.py script is the primary showcase for how to create random data using Barnum. 
If you run it from the command line: python gen_data.py - + You'll see some sample data output. If you'd like to call it from another script, here's an example or two from the @@ -114,7 +114,7 @@ endless! Where does the data come from? ============================== -I pulled sample data and existing scripts from a bunch of different sources. +I pulled sample data and existing scripts from a bunch of different sources. - The names are from 1990 US Census data http://www.census.gov/genealogy/names/names_files.html - The street names are from real us streets in a few locales. - Company names are randomly generated by me. @@ -157,7 +157,7 @@ Where did this name come from? Choosing names for projects is kind of fun but kind of a hassle. There needs to be a name but it can't be anything too stupid. I started off thinking of an acronym and ended up with -PT ("Python Testing") and immediately thought of P.T. Barnum. I really liked the name +PT ("Python Testing") and immediately thought of P.T. Barnum. I really liked the name because I was using this for Satchmo and project made in Django. Single word names seemed cool. Also, I like the fact that P.T. Barnum was really a master at making people think something was real that wasn't. Which is exactly what this little script does. 
diff --git a/utils/data-generators/barnum/convert_data.py b/utils/data-generators/barnum/convert_data.py index e7915cbbb..cd9218c9f 100755 --- a/utils/data-generators/barnum/convert_data.py +++ b/utils/data-generators/barnum/convert_data.py @@ -1,6 +1,6 @@ #!/usr/bin/python2.5 """ -This application converts the various text files stored in the source-data +This application converts the various text files stored in the source-data directory into a pickled python object to be used by the random data generator scripts @@ -29,10 +29,10 @@ import random import os data_dir = "barnum/source-data" -simple_files_to_process = ['street-names.txt', 'street-types.txt', 'latin-words.txt', +simple_files_to_process = ['street-names.txt', 'street-types.txt', 'latin-words.txt', 'email-domains.txt', 'job-titles.txt', 'company-names.txt', 'company-types.txt'] - + def load_files(): # Process Zip Codes all_zips = {} @@ -48,7 +48,7 @@ def load_files(): state_area_codes = {} for line in area_code_file: clean_line = line.replace(' ','').rstrip('\n') - state_area_codes[line.split(':')[0]] = clean_line[3:].split(',') + state_area_codes[line.split(':')[0]] = clean_line[3:].split(',') pickle.dump(state_area_codes, output) area_code_file.close() @@ -95,6 +95,6 @@ if __name__ == "__main__": response = string.lower(raw_input("Type 'yes' to reload the data from source files and create a new source file: ")) if response == 'yes': load_files() - - + + diff --git a/utils/data-generators/barnum/gen_data.py b/utils/data-generators/barnum/gen_data.py index f4941e68b..333ecd60a 100755 --- a/utils/data-generators/barnum/gen_data.py +++ b/utils/data-generators/barnum/gen_data.py @@ -73,7 +73,7 @@ def create_name(full_name=True, gender=None): def create_job_title(): return random.choice(job_titles) - + def create_phone(zip_code=None): if not zip_code: zip_code = random.choice(all_zips.keys()) @@ -99,7 +99,7 @@ def create_sentence(min=4, max=15): for word in range(1, random.randint(min, max-1)): 
sentence.append(random.choice(latin_words)) return " ".join(sentence) + "." - + def create_paragraphs(num=1, min_sentences=4, max_sentences=7): paragraphs = [] for para in range(0, num): @@ -122,9 +122,9 @@ def create_date(numeric=True, past=False, max_years_future=10, max_years_past=10 else: start = datetime.datetime.today() num_days = max_years_future * 365 - + random_days = random.randint(1, num_days) - random_date = start + datetime.timedelta(days=random_days) + random_date = start + datetime.timedelta(days=random_days) return(random_date) def create_birthday(age=random.randint (16, 80)): @@ -143,7 +143,7 @@ def create_company_name(biz_type=None): if not biz_type: biz_type = random.choice(company_type) if biz_type == "LawFirm": - name.append( random.choice(last_names)+ ", " + random.choice(last_names) + " & " + + name.append( random.choice(last_names)+ ", " + random.choice(last_names) + " & " + random.choice(last_names)) name.append('LLP') else: diff --git a/utils/data-generators/barnum/gencc.py b/utils/data-generators/barnum/gencc.py index 85dc26861..37db52b10 100755 --- a/utils/data-generators/barnum/gencc.py +++ b/utils/data-generators/barnum/gencc.py @@ -26,10 +26,10 @@ import random import sys import copy -visaPrefixList = [ ['4', '5', '3', '9'], - ['4', '5', '5', '6'], +visaPrefixList = [ ['4', '5', '3', '9'], + ['4', '5', '5', '6'], ['4', '9', '1', '6'], - ['4', '5', '3', '2'], + ['4', '5', '3', '2'], ['4', '9', '2', '9'], ['4', '0', '2', '4', '0', '0', '7', '1'], ['4', '4', '8', '6'], @@ -67,8 +67,8 @@ jcbPrefixList16 = [ ['3', '0', '8', '8'], jcbPrefixList15 = [ ['2', '1', '0', '0'], ['1', '8', '0', '0'] ] -voyagerPrefixList = [ ['8', '6', '9', '9'] ] - +voyagerPrefixList = [ ['8', '6', '9', '9'] ] + """ 'prefix' is the start of the CC number as a string, any number of digits. 
@@ -85,7 +85,7 @@ def completed_number(prefix, length): ccnumber.append(digit) - # Calculate sum + # Calculate sum sum = 0 pos = 0 @@ -113,7 +113,7 @@ def completed_number(prefix, length): checkdigit = ((sum / 10 + 1) * 10 - sum) % 10 ccnumber.append( str(checkdigit) ) - + return ''.join(ccnumber) def credit_card_number(prefixList, length, howMany): @@ -121,7 +121,7 @@ def credit_card_number(prefixList, length, howMany): result = [] for i in range(howMany): - + ccnumber = copy.copy( random.choice(prefixList) ) result.append( completed_number(ccnumber, length) ) diff --git a/utils/data-generators/barnum/genpw.py b/utils/data-generators/barnum/genpw.py index 446ea0c88..f5cb8b831 100755 --- a/utils/data-generators/barnum/genpw.py +++ b/utils/data-generators/barnum/genpw.py @@ -1,32 +1,32 @@ #!/usr/bin/python2.5 ## Generate a human readable 'random' password -## password will be generated in the form 'word'+digits+'word' +## password will be generated in the form 'word'+digits+'word' ## eg.,nice137pass ## parameters: number of 'characters' , number of 'digits' ## Pradeep Kishore Gowda <pradeep at btbytes.com > -## License : GPL +## License : GPL ## Date : 2005.April.15 -## Revision 1.2 -## ChangeLog: -## 1.1 - fixed typos -## 1.2 - renamed functions _apart & _npart to a_part & n_part as zope does not allow functions to +## Revision 1.2 +## ChangeLog: +## 1.1 - fixed typos +## 1.2 - renamed functions _apart & _npart to a_part & n_part as zope does not allow functions to ## start with _ def nicepass(alpha=6,numeric=2): """ - returns a human-readble password (say rol86din instead of - a difficult to remember K8Yn9muL ) + returns a human-readble password (say rol86din instead of + a difficult to remember K8Yn9muL ) """ import string import random vowels = ['a','e','i','u', 'y'] consonants = [a for a in string.ascii_lowercase if a not in vowels] digits = string.digits - + ####utility functions def a_part(slen): ret = '' - for i in range(slen): + for i in range(slen): if i%2 
==0: randid = random.randint(0,20) #number of consonants ret += consonants[randid] @@ -34,26 +34,26 @@ def nicepass(alpha=6,numeric=2): randid = random.randint(0,4) #number of vowels ret += vowels[randid] return ret - + def n_part(slen): ret = '' for i in range(slen): randid = random.randint(0,9) #number of digits ret += digits[randid] return ret - - #### - fpl = alpha/2 + + #### + fpl = alpha/2 if alpha % 2 : - fpl = int(alpha/2) + 1 - lpl = alpha - fpl - + fpl = int(alpha/2) + 1 + lpl = alpha - fpl + start = a_part(fpl) mid = n_part(numeric) end = a_part(lpl) - + return "%s%s%s" % (start,mid,end) - + if __name__ == "__main__": for i in range(10): print nicepass(6,2) diff --git a/utils/data-generators/generate-all-and-import.sh b/utils/data-generators/generate-all-and-import.sh index 40c53c5cb..971e9936d 100755 --- a/utils/data-generators/generate-all-and-import.sh +++ b/utils/data-generators/generate-all-and-import.sh @@ -1,6 +1,6 @@ #!/bin/sh # generate and import all local .ttl files -# takes as one parameter the number of entries that should be +# takes as one parameter the number of entries that should be # generated in each category ./generate-all.sh $1 diff --git a/utils/data-generators/generate-data-for-contact-messages.py b/utils/data-generators/generate-data-for-contact-messages.py index 30e16f58d..d47ec2b7e 100755 --- a/utils/data-generators/generate-data-for-contact-messages.py +++ b/utils/data-generators/generate-data-for-contact-messages.py @@ -63,7 +63,7 @@ def generatePhoneNumber(): def generatePhoneCalls (many): for i in range (0, many): callUID = str(random.randint(0, sys.maxint)) - + duration = random.randint (0, 50) relationType = random.randint (0,100) % 2 if (relationType == 0): @@ -95,9 +95,9 @@ def generateSMS (many): #Received SMS sys.stdout.write('\tnmo:from [a nco:Contact; nco:hasPhoneNumber <' + phoneUri + '>];\n') sys.stdout.write('\tnmo:to [a nco:Contact; nco:hasPhoneNumber <' + myOwnPhoneNumberURI + '>];\n') - + 
sys.stdout.write('\tnmo:sentDate "' + getPseudoRandomDate () + '";\n') - sys.stdout.write('\tnmo:receivedDate "' + getPseudoRandomDate () + '";\n') + sys.stdout.write('\tnmo:receivedDate "' + getPseudoRandomDate () + '";\n') if (random.randint(0, 4) > 3): sys.stdout.write ('\tnao:hasTag [a nao:Tag ; nao:prefLabel ' + getRandomTag () +'];\n') sys.stdout.write('\tnmo:plainTextMessageContent "' + str.replace(gen_data.create_paragraphs(1, 5, 8), "\n", "") + '".\n') @@ -136,7 +136,7 @@ def generateEmail(sys, gen_data, str, random): ccid = random.randint(0, len(previousContacts) - 1) sys.stdout.write('\tnmo:cc <urn:uuid:' + previousContacts[ccid] + '>;\n') sys.stdout.write('\tnmo:messageHeader [a nmo:MessageHeader; nmo:headerName "cc"; nmo:headerValue "' + previousEmailAddresses[ccid] + '"];\n') - + if random.randint(0, 10) > 7 and len(previousContacts) > 1: bccid = random.randint(0, len(previousContacts) - 1) sys.stdout.write('\tnmo:bcc <urn:uuid:' + previousContacts[bccid] + '>;\n') @@ -144,7 +144,7 @@ def generateEmail(sys, gen_data, str, random): #TODO add some sense to the email titles. Some reply chains as well. 
if (random.randint(0, 4) > 3): sys.stdout.write ('\tnao:hasTag [a nao:Tag ; nao:prefLabel ' + getRandomTag () +'];\n') - + sys.stdout.write('\tnmo:messageSubject "' + str.replace(gen_data.create_paragraphs(1, 2, 2), "\n", "") + '";\n') sys.stdout.write('\tnmo:plainTextMessageContent "' + str.replace(gen_data.create_paragraphs(1, 2, 3), "\n", "") + '".\n') @@ -173,7 +173,7 @@ sys.stdout.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/200 sys.stdout.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n") sys.stdout.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n") sys.stdout.write("@prefix nmo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#>.\n") -sys.stdout.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n") +sys.stdout.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n") sys.stdout.write("@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>.\n") sys.stdout.write("@prefix ncal: <http://www.semanticdesktop.org/ontologies/2007/04/02/ncal#>.\n") @@ -181,7 +181,7 @@ sys.stdout.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n") sys.stdout.write('<mailto:me@me.com> a nco:EmailAddress; \n') sys.stdout.write('\tnco:emailAddress "me@me.com".\n') sys.stdout.write('\n') -sys.stdout.write('<urn:uuid:1> a nco:PersonContact; \n') +sys.stdout.write('<urn:uuid:1> a nco:PersonContact; \n') sys.stdout.write('\tnco:fullname "Me Myself";\n') sys.stdout.write('\tnco:nameGiven "Me";\n') sys.stdout.write('\tnco:nameFamily "Myself";\n') @@ -204,7 +204,7 @@ for dummy in range (0, count): firstName, lastName = gen_data.create_name() zip, city, state = gen_data.create_city_state_zip() postalAddressID=str(random.randint(0, sys.maxint)) - + UID = str(random.randint(0, sys.maxint)) phoneNumber = gen_data.create_phone() phoneUri = 'tel:+1' + phoneNumber.translate(allchars,' -()') @@ -216,11 +216,11 @@ for dummy 
in range (0, count): hasPhoneNumber = False jobTitle = gen_data.create_job_title() - generatePostalAddress() - generateEmailAddress() - + generatePostalAddress() + generateEmailAddress() + #Only every 3rd have Phone or IM to add variation. - if random.randint(0, 3) > 2 or count == 1: + if random.randint(0, 3) > 2 or count == 1: generateIMAccount(gen_data, str) hasIMAccount = True if random.randint(0, 3) > 2 or count == 1: @@ -229,8 +229,8 @@ for dummy in range (0, count): if (withPhone): generatePhoneCalls(3) if (withPhone): generateSMS (4) - - sys.stdout.write('<urn:uuid:' + UID + '> a nco:PersonContact; \n') + + sys.stdout.write('<urn:uuid:' + UID + '> a nco:PersonContact; \n') sys.stdout.write('\tnco:fullname "' + firstName + ' ' + lastName +'";\n') sys.stdout.write('\tnco:nameGiven "' + firstName + '";\n') sys.stdout.write('\tnco:nameFamily "' + lastName + '";\n') @@ -238,26 +238,26 @@ for dummy in range (0, count): #sys.stdout.write('\tnco:title "'+jobTitle+'";\n') sys.stdout.write('\tnco:hasEmailAddress <mailto:' + emailAddress + '>;\n') if hasPhoneNumber: sys.stdout.write('\tnco:hasPhoneNumber <' + phoneUri + '>;\n') - if hasIMAccount: sys.stdout.write('\tnco:hasIMAccount <xmpp:' + xmppAddress + '>;\n') + if hasIMAccount: sys.stdout.write('\tnco:hasIMAccount <xmpp:' + xmppAddress + '>;\n') if (random.randint(0, 4) > 3): sys.stdout.write ('\tnao:hasTag [a nao:Tag ; nao:prefLabel ' + getRandomTag () + '];\n') sys.stdout.write('\tnco:hasPostalAddress <urn:uuid:' + postalAddressID + '>.\n') sys.stdout.write('\n') - + #calendarEntryID=str(random.randint(0, sys.maxint)) #if random.randint(0, 3)>2 and count>2 and len(previousContacts): # generateCalendarEntry(gen_data, str, random) - - #20% Send emails. Those who do, send 1-30 emails. EMails have CC and BCC people + + #20% Send emails. Those who do, send 1-30 emails. 
EMails have CC and BCC people if random.randint(0, 10)>8 or count==1: - emailcount=random.randint(1, 30) + emailcount=random.randint(1, 30) for dummy in range (0, emailcount): generateEmail(sys, gen_data, str, random) sys.stdout.write('\n') previousContacts.append(UID) previousEmailAddresses.append(emailAddress) - - #TODO INSERT IM - Use just a nmo:Message for that for now. - + + #TODO INSERT IM - Use just a nmo:Message for that for now. + #TODO: Insert bookmarks diff --git a/utils/data-generators/generate-data-for-music.py b/utils/data-generators/generate-data-for-music.py index a13c7afc7..d09e6319c 100755 --- a/utils/data-generators/generate-data-for-music.py +++ b/utils/data-generators/generate-data-for-music.py @@ -41,9 +41,9 @@ def generate_name(): def update_tag(artistid, artistname, albumid, trackid, genreid): global fileid - + length = random.randint(5000,5000000) - song = 'SongTitle%03u' % fileid + song = 'SongTitle%03u' % fileid album = 'Album%03u' % albumid genre = 'Genre%03u' % genreid trackstr = str(artistid) + '/' + str(trackid) @@ -55,7 +55,7 @@ def update_tag(artistid, artistname, albumid, trackid, genreid): random.randint(1, 12), random.randint(1, 28)) created = modified - + if not artist_UID.has_key(artistname): #print " The new artist is "+artist UID = str(random.randint(0, sys.maxint)) @@ -68,8 +68,8 @@ def update_tag(artistid, artistname, albumid, trackid, genreid): if not album_UID.has_key(album): album_UID[album] = album f.write('<urn:album:' + album + '> a nmm:MusicAlbum; \n') - - if len(UID)>0: + + if len(UID)>0: f.write('\tnmm:albumArtist <urn:uuid:' + UID + '>;\n') f.write('\tnie:title "' + album + '".\n\n') @@ -77,7 +77,7 @@ def update_tag(artistid, artistname, albumid, trackid, genreid): UID = artist_UID[artistname] f.write('<file://' + urllib.pathname2url(fullpath) + '> a nmm:MusicPiece,nfo:FileDataObject;\n') - if len(song) > 0: + if len(song) > 0: f.write('\tnie:title "' + song + '";\n') f.write('\tnfo:fileName \"' + artistname + 
'.mp3\";\n') @@ -89,7 +89,7 @@ def update_tag(artistid, artistname, albumid, trackid, genreid): if len(trackstr) > 0: trackArray = trackstr.split("/") - if len(trackArray) > 0: + if len(trackArray) > 0: f.write('\tnmm:trackNumber ' + trackArray[0] + ';\n') f.write('\tnmm:length ' + str(length) + ';\n') @@ -110,7 +110,7 @@ def create_track(artistid, albumid, genreid, settings): def generate(settings): ''' A total of TotalTracks files will be generated. These contain the specified number of albums.''' - ''' + ''' filepath = settings['OutputDir'] try: os.makedirs(filepath) @@ -118,7 +118,7 @@ def generate(settings): print 'Directory exists' ''' - global album_UID + global album_UID genreid = 1 artistid = 1 albumid = 0 @@ -137,36 +137,36 @@ if __name__ == '__main__': parser = OptionParser() - parser.add_option("-T", "--TotalTracks", + parser.add_option("-T", "--TotalTracks", dest='TotalTracks', - help="Specify (mandatory) the total number of files to be generated", + help="Specify (mandatory) the total number of files to be generated", metavar="TotalTracks") - parser.add_option("-r", "--ArtistCount", - dest='ArtistCount', + parser.add_option("-r", "--ArtistCount", + dest='ArtistCount', default=2, - help="Specify (mandatory) the total number of Artists." , + help="Specify (mandatory) the total number of Artists." 
, metavar="ArtistCount") - parser.add_option("-a", "--album-count", - dest='AlbumCount', + parser.add_option("-a", "--album-count", + dest='AlbumCount', default=5, - help="Specify (mandatory) the number of albums per artist.", + help="Specify (mandatory) the number of albums per artist.", metavar="AlbumCount") - parser.add_option("-g", "--genre-count", - dest='GenreCount', + parser.add_option("-g", "--genre-count", + dest='GenreCount', default=10, - help="Specify the genre count" , + help="Specify the genre count" , metavar="GenreCount") - parser.add_option("-o", "--output", - dest='OutputFileName', + parser.add_option("-o", "--output", + dest='OutputFileName', default='generate-data-for-music.ttl', - help="Specify the output ttl filename. e.g. -T 2000 -r 25 -a 20 -g 10 -o generated_songs.ttl", + help="Specify the output ttl filename. e.g. -T 2000 -r 25 -a 20 -g 10 -o generated_songs.ttl", metavar="OutputFileName") (options, args) = parser.parse_args() - - mandatories = ['TotalTracks', 'ArtistCount', 'AlbumCount'] - for m in mandatories: - if not options.__dict__[m]: + + mandatories = ['TotalTracks', 'ArtistCount', 'AlbumCount'] + for m in mandatories: + if not options.__dict__[m]: # Set defaults if m == "TotalTracks": options.TotalTracks = 100 diff --git a/utils/data-generators/id32nmmTurtle.py b/utils/data-generators/id32nmmTurtle.py index 6a98fd786..00d4ed099 100644 --- a/utils/data-generators/id32nmmTurtle.py +++ b/utils/data-generators/id32nmmTurtle.py @@ -1,172 +1,172 @@ -#!/usr/bin/python
-# -*- coding: utf-8 -*-
-#
-#
-# Copyright (c) 2007 Urho Konttori <urho.konttori@gmail.com>
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation; either version 2 of the
-# License, or (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-# General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
-# 02111-1307, USA.
-#
-import os, datetime, time, internals.id3reader as id3reader
-import sys, urllib, random
-indexPath="/Volumes/OSX"
-
-if len(sys.argv)>1:
- indexPath=str(sys.argv[1])
-else:
- print "Usage: python id32nmmTurtle.py <path-to-index>."
- sys.exit (1)
-
-
-songcounter=0
-
-filelist=[]
-folderlist=[]
-foldermap=[]
-depth=0
-artist_UID = {}
-album_UID = {}
-
-class FileProcessor:
-
- def __init__(self):
- self.f=open("./songlist.ttl", 'w' )
-
- self.f.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
- self.f.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n")
- self.f.write("@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#>.\n")
- self.f.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#>.\n")
- self.f.write("@prefix nmm: <http://www.tracker-project.org/temp/nmm#>.\n")
- self.f.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
- self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
- self.f.write("@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>.\n");
- self.f.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
-
- def addMp3(self, fullpath, fileName):
- global songcounter
- global g_UID
- try:
- year=""
- album=""
- song=""
- artist=""
- trackstr=""
- genre=""
- comment=""
- year=""
- length=0
- id3r = id3reader.Reader(fullpath)
- if id3r.getValue('album'): album = id3r.getValue('album')
- if id3r.getValue('title'): song = id3r.getValue('title')
- if id3r.getValue('performer'): artist = id3r.getValue('performer')
- if id3r.getValue('year'): year = id3r.getValue('year')
- if id3r.getValue('genre'): genre = id3r.getValue('genre')
- if id3r.getValue('comment'): comment = id3r.getValue('comment')
- length=random.randint(5000,5000000 )
- if id3r.getValue('track'):
- trackstr=str(id3r.getValue('track'))
- if id3r.getValue('TPA'):
- partOfSet=id3r.getValue('TPA')
- if partOfSet=="None": partOfSet=""
- modified=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getmtime(fullpath)))
- created=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getctime(fullpath)))
- size = os.path.getsize(fullpath)
-
-
- artistUID = ""
- albumUID = ""
- UID=""
- if not artist_UID.has_key(artist):
- #print " The new artist is "+artist
- UID = str(random.randint(0, sys.maxint))
- artist_UID[artist] = UID
- self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
- #self.f.write('<urn:artist:'+artist+'> a nco:Contact; \n')
- self.f.write('\tnco:fullname "'+artist+'".\n\n')
- else :
- #print 'Artist exists ' + artist
- UID = artist_UID[artist]
-
- if not album_UID.has_key(album):
- #print " The new album is "+artist
-
- album_UID[artist] = album
- self.f.write('<urn:album:'+album+'> a nmm:MusicAlbum; \n')
-
- if len(partOfSet)>0:
- setArray=partOfSet.split("/")
- if len(setArray)>0: self.f.write('\tnmm:setNumber '+setArray[0]+';\n')
- if len(setArray)>1: self.f.write('\tnmm:setCount '+setArray[1]+';\n')
- if len(UID)>0: self.f.write('\tnmm:albumArtist <urn:uuid:'+UID+'>;\n')
- self.f.write('\tnie:title "'+album+'".\n\n')
- else :
- #print 'Artist exists ' + artist
- UID = artist_UID[artist]
-
- self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nmm:MusicPiece,nfo:FileDataObject;\n')
- if len(song)>0: self.f.write('\tnie:title "'+song+'";\n')
- if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
- if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
- if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
- self.f.write('\tnfo:fileSize '+str(size)+';\n')
- if len(album)>0: self.f.write('\tnmm:musicAlbum <urn:album:'+album+'>;\n')
-# if len(year)>0: self.f.write('\tnid3:recordingYear '+str(year)+';\n')
- if len(genre)>0: self.f.write('\tnmm:genre "'+genre+'";\n')
- if len(trackstr)>0:
- trackArray=trackstr.split("/")
- if len(trackArray)>0: self.f.write('\tnmm:trackNumber '+trackArray[0]+';\n')
-
-
- if length>0: self.f.write('\tnmm:length '+str(length)+';\n')
- if len(UID)>0: self.f.write('\tnmm:performer <urn:uuid:'+UID+'>.\n\n')
-
-
- songcounter+=1
-
-
- if songcounter==1:
- print id3r.dump()
-
-
-
- except IOError, message:
- print "ID TAG ERROR: getIDTags(): IOERROR:", message
-
- def getOSDir(self,addpath, filelist, depth=0):
- try:
- test=os.path.exists(addpath)
- depth=depth+1
- if (test and depth<8):
- #folderlist.append(addpath)
- #folderCounter=len(folderlist)-1
- for fileName in os.listdir (addpath):
- try:
- #filelist.append(addpath+"/"+fileName)
- if fileName.endswith(".mp3") or fileName.endswith(".MP3"):
- self.addMp3(addpath+"/"+fileName, fileName)
- #foldermap.append(folderCounter)
- if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
- self.getOSDir(addpath+"/"+fileName,filelist, depth)
- except OSError, message:
- print "getOSDir():OSError:", message
- except OSError, message:
- print "getOSDir():OSError:", message
-
-
-fileProcessor=FileProcessor()
-startTime = time.time()
-fileProcessor.getOSDir(indexPath, filelist, depth)
-fileProcessor.f.close()
-print "created "+ str(songcounter) +" songs to turtle file in " + str(time.time()-startTime)+ " seconds."
+#!/usr/bin/python +# -*- coding: utf-8 -*- +# +# +# Copyright (c) 2007 Urho Konttori <urho.konttori@gmail.com> +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation; either version 2 of the +# License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA +# 02111-1307, USA. +# +import os, datetime, time, internals.id3reader as id3reader +import sys, urllib, random +indexPath="/Volumes/OSX" + +if len(sys.argv)>1: + indexPath=str(sys.argv[1]) +else: + print "Usage: python id32nmmTurtle.py <path-to-index>." 
+    sys.exit (1)
+
+
+songcounter=0
+
+filelist=[]
+folderlist=[]
+foldermap=[]
+depth=0
+artist_UID = {}
+album_UID = {}
+
+class FileProcessor:
+
+    def __init__(self):
+        self.f=open("./songlist.ttl", 'w' )
+
+        self.f.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
+        self.f.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n")
+        self.f.write("@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#>.\n")
+        self.f.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#>.\n")
+        self.f.write("@prefix nmm: <http://www.tracker-project.org/temp/nmm#>.\n")
+        self.f.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
+        self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
+        self.f.write("@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>.\n");
+        self.f.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
+
+    def addMp3(self, fullpath, fileName):
+        global songcounter
+        global g_UID
+        try:
+            year=""
+            album=""
+            song=""
+            artist=""
+            trackstr=""
+            genre=""
+            comment=""
+            year=""
+            length=0
+            id3r = id3reader.Reader(fullpath)
+            if id3r.getValue('album'): album = id3r.getValue('album')
+            if id3r.getValue('title'): song = id3r.getValue('title')
+            if id3r.getValue('performer'): artist = id3r.getValue('performer')
+            if id3r.getValue('year'): year = id3r.getValue('year')
+            if id3r.getValue('genre'): genre = id3r.getValue('genre')
+            if id3r.getValue('comment'): comment = id3r.getValue('comment')
+            length=random.randint(5000,5000000 )
+            if id3r.getValue('track'):
+                trackstr=str(id3r.getValue('track'))
+            if id3r.getValue('TPA'):
+                partOfSet=id3r.getValue('TPA')
+            if partOfSet=="None": partOfSet=""
+            modified=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getmtime(fullpath)))
+            created=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getctime(fullpath)))
+            size = os.path.getsize(fullpath)
+
+
+            artistUID = ""
+            albumUID = ""
+            UID=""
+            if not artist_UID.has_key(artist):
+                #print " The new artist is "+artist
+                UID = str(random.randint(0, sys.maxint))
+                artist_UID[artist] = UID
+                self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
+                #self.f.write('<urn:artist:'+artist+'> a nco:Contact; \n')
+                self.f.write('\tnco:fullname "'+artist+'".\n\n')
+            else:
+                #print 'Artist exists ' + artist
+                UID = artist_UID[artist]
+
+            if not album_UID.has_key(album):
+                #print " The new album is "+artist
+
+                album_UID[artist] = album
+                self.f.write('<urn:album:'+album+'> a nmm:MusicAlbum; \n')
+
+                if len(partOfSet)>0:
+                    setArray=partOfSet.split("/")
+                    if len(setArray)>0: self.f.write('\tnmm:setNumber '+setArray[0]+';\n')
+                    if len(setArray)>1: self.f.write('\tnmm:setCount '+setArray[1]+';\n')
+                if len(UID)>0: self.f.write('\tnmm:albumArtist <urn:uuid:'+UID+'>;\n')
+                self.f.write('\tnie:title "'+album+'".\n\n')
+            else:
+                #print 'Artist exists ' + artist
+                UID = artist_UID[artist]
+
+            self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nmm:MusicPiece,nfo:FileDataObject;\n')
+            if len(song)>0: self.f.write('\tnie:title "'+song+'";\n')
+            if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
+            if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
+            if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
+            self.f.write('\tnfo:fileSize '+str(size)+';\n')
+            if len(album)>0: self.f.write('\tnmm:musicAlbum <urn:album:'+album+'>;\n')
+#           if len(year)>0: self.f.write('\tnid3:recordingYear '+str(year)+';\n')
+            if len(genre)>0: self.f.write('\tnmm:genre "'+genre+'";\n')
+            if len(trackstr)>0:
+                trackArray=trackstr.split("/")
+                if len(trackArray)>0: self.f.write('\tnmm:trackNumber '+trackArray[0]+';\n')
+
+
+            if length>0: self.f.write('\tnmm:length '+str(length)+';\n')
+            if len(UID)>0: self.f.write('\tnmm:performer <urn:uuid:'+UID+'>.\n\n')
+
+
+            songcounter+=1
+
+
+            if songcounter==1:
+                print id3r.dump()
+
+
+        except IOError, message:
+            print "ID TAG ERROR: getIDTags(): IOERROR:", message
+
+    def getOSDir(self,addpath, filelist, depth=0):
+        try:
+            test=os.path.exists(addpath)
+            depth=depth+1
+            if (test and depth<8):
+                #folderlist.append(addpath)
+                #folderCounter=len(folderlist)-1
+                for fileName in os.listdir (addpath):
+                    try:
+                        #filelist.append(addpath+"/"+fileName)
+                        if fileName.endswith(".mp3") or fileName.endswith(".MP3"):
+                            self.addMp3(addpath+"/"+fileName, fileName)
+                            #foldermap.append(folderCounter)
+                        if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
+                            self.getOSDir(addpath+"/"+fileName,filelist, depth)
+                    except OSError, message:
+                        print "getOSDir():OSError:", message
+        except OSError, message:
+            print "getOSDir():OSError:", message
+
+
+fileProcessor=FileProcessor()
+startTime = time.time()
+fileProcessor.getOSDir(indexPath, filelist, depth)
+fileProcessor.f.close()
+print "created "+ str(songcounter) +" songs to turtle file in " + str(time.time()-startTime)+ " seconds."
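id32nmmTurtle.py builds every Turtle string literal by plain concatenation (for example `'\tnco:fullname "'+artist+'".\n\n'`), so an ID3 tag that contains a double quote, a backslash or a line break would yield invalid Turtle. A minimal sketch of an escaping helper, assuming the escapes the W3C Turtle grammar requires inside a `"..."` short string; `turtle_literal` and `contact_triples` are hypothetical names, not part of these scripts:

```python
def turtle_literal(value):
    """Return value as a double-quoted Turtle string literal.

    Hypothetical helper: escapes the characters that may not appear
    unescaped inside a "..." literal (backslash, double quote, LF, CR).
    """
    # Escape the backslash first so the escapes added below are not doubled.
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def contact_triples(uid, artist):
    # Same shape as the nco:Contact block written by addMp3, but the
    # artist name goes through turtle_literal instead of raw concatenation.
    return ("<urn:uuid:" + uid + "> a nco:Contact; \n"
            "\tnco:fullname " + turtle_literal(artist) + ".\n\n")


if __name__ == "__main__":
    print(contact_triples("42", 'The "Unknown" Artist'))
```

Escaping at the point where each literal is written keeps the rest of the generator unchanged; the same helper would apply to the album, title, genre and comment values.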
diff --git a/utils/data-generators/id32ttl.py b/utils/data-generators/id32ttl.py
index 26bb934dd..d1f342595 100755
--- a/utils/data-generators/id32ttl.py
+++ b/utils/data-generators/id32ttl.py
@@ -38,18 +38,18 @@ foldermap=[]
 depth=0
 artist_UID = {}
 class FileProcessor:
-
+
     def __init__(self):
         self.f=open("./songlist.ttl", 'w' )
-
+
         self.f.write("@prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>.\n")
         self.f.write("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n")
         self.f.write("@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#>.\n")
         self.f.write("@prefix nid3: <http://www.semanticdesktop.org/ontologies/2007/05/10/nid3#>.\n")
         self.f.write("@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#>.\n")
-        self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
+        self.f.write("@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>.\n")
         self.f.write("@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.\n")
-
+
     def addMp3(self, fullpath, fileName):
         global songcounter
         global g_UID
@@ -79,50 +79,50 @@ class FileProcessor:
             modified=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getmtime(fullpath)))
             created=time.strftime("%Y-%m-%dT%H:%M:%S",time.localtime(os.path.getctime(fullpath)))
             size = os.path.getsize(fullpath)
-
-            #UID=str(random.randint(0, sys.maxint))
+
+            #UID=str(random.randint(0, sys.maxint))
             UID = ""
             if not artist_UID.has_key(artist):
-                #print " The new artist is "+artist
-                UID = str(random.randint(0, sys.maxint))
-                artist_UID[artist] = UID
-                self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
-                self.f.write('\tnco:fullname "'+artist+'".\n')
+                #print " The new artist is "+artist
+                UID = str(random.randint(0, sys.maxint))
+                artist_UID[artist] = UID
+                self.f.write('<urn:uuid:'+UID+'> a nco:Contact; \n')
+                self.f.write('\tnco:fullname "'+artist+'".\n')
             else :
-                #print 'Artist exists ' + artist
-                UID = artist_UID[artist]
-
-            self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nid3:ID3Audio,nfo:FileDataObject;\n')
-            if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
-            if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
-            if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
+                #print 'Artist exists ' + artist
+                UID = artist_UID[artist]
+
+            self.f.write('<file://'+urllib.pathname2url(fullpath)+'> a nid3:ID3Audio,nfo:FileDataObject;\n')
+            if len(fileName)>0: self.f.write('\tnfo:fileName "'+fileName+'";\n')
+            if len(modified)>0: self.f.write('\tnfo:fileLastModified "'+modified+'" ;\n')
+            if len(created)>0: self.f.write('\tnfo:fileCreated "'+created+'";\n')
             self.f.write('\tnfo:fileSize '+str(size)+';\n')
             if len(album)>0: self.f.write('\tnid3:albumTitle "'+album+'";\n')
             if len(year)>0: self.f.write('\tnid3:recordingYear '+str(year)+';\n')
             if len(song)>0: self.f.write('\tnid3:title "'+song+'";\n')
             if len(trackstr)>0: self.f.write('\tnid3:trackNumber "'+trackstr+'";\n')
             if len(partOfSet)>0: self.f.write('\tnid3:partOfSet "'+partOfSet+'";\n')
-
-            if len(genre)>0: self.f.write('\tnid3:contentType "'+genre+'";\n')
+
+            if len(genre)>0: self.f.write('\tnid3:contentType "'+genre+'";\n')
             if len(comment)>0: self.f.write('\tnid3:comments "'+comment+'";\n')
             if length>0: self.f.write('\tnid3:length '+str(length)+';\n')
             if len(UID)>0: self.f.write('\tnid3:leadArtist <urn:uuid:'+UID+'>.\n\n')
-
-
+
+
             songcounter+=1
-
-
+
+
             if songcounter==1:
                 print id3r.dump()
-
-
+
+
         except IOError, message:
             print "ID TAG ERROR: getIDTags(): IOERROR:", message
 
 
     def getOSDir(self,addpath, filelist, depth=0):
         try:
-            test=os.path.exists(addpath)
+            test=os.path.exists(addpath)
             depth=depth+1
             if (test and depth<8):
                 #folderlist.append(addpath)
@@ -132,8 +132,8 @@ class FileProcessor:
                         #filelist.append(addpath+"/"+fileName)
                         if fileName.endswith(".mp3") or fileName.endswith(".MP3"):
                             self.addMp3(addpath+"/"+fileName, fileName)
-                            #foldermap.append(folderCounter)
-                        if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
+                            #foldermap.append(folderCounter)
+                        if os.path.isdir(addpath+"/"+fileName) and not (fileName.find('.')==0) and not (fileName.find("debian")==0) and not (fileName.find('Maps')==0) and not (fileName.find('maps')==0):
                             self.getOSDir(addpath+"/"+fileName,filelist, depth)
                     except OSError, message:
                         print "getOSDir():OSError:", message
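Each hunk in id32ttl.py pairs a removed line with an added line that differs only in whitespace (trailing spaces, whitespace-only blank lines, tab/space mixes), which is what "Merged whitespace branch" describes. `git diff -w` (`--ignore-all-space`) together with `--ignore-blank-lines` hides such changes. A rough Python sketch of the same check, assuming both versions of a file are available as strings; `only_whitespace_changed` is a hypothetical name:

```python
import re


def normalise(line):
    # Drop every whitespace character, so tab/space and trailing-space
    # differences compare equal (similar in spirit to `git diff -w`).
    return re.sub(r"\s+", "", line)


def only_whitespace_changed(old_text, new_text):
    """Return True if old_text and new_text differ only in whitespace.

    Whitespace-only lines are discarded before comparing, so turning a
    line of spaces into an empty line also counts as a whitespace change.
    """
    old_lines = [normalise(l) for l in old_text.splitlines() if l.strip()]
    new_lines = [normalise(l) for l in new_text.splitlines() if l.strip()]
    return old_lines == new_lines


if __name__ == "__main__":
    # Mirrors the first id32ttl.py hunk: a blank line holding spaces
    # is replaced by a truly empty line.
    before = "class FileProcessor:\n    \n    def __init__(self):\n"
    after = "class FileProcessor:\n\n    def __init__(self):\n"
    print(only_whitespace_changed(before, after))
```

Unlike git, this compares whole files rather than producing hunks; it only answers whether any non-whitespace content changed.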