author: Sebastian Thiel <byronimo@gmail.com> 2011-06-13 22:48:55 +0200
committer: Sebastian Thiel <byronimo@gmail.com> 2011-06-13 22:48:55 +0200
commit: a51b65d35d402791f774efe95d4b848cc524a403
tree: c0c25cc4fc9bd8612ca758320a8db6480bea762a
parent: 78cdb214fb6273169433fa662ad630afc26eb36a
Finished tutorial section, improved capabilities of the buffer implementation to be more pythonic. Unfortunately, not all docs build yet because of a typical Sphinx issue that results in an error which does not tell at all what the actual culprit is.
-rw-r--r-- doc/source/intro.rst 9
-rw-r--r-- doc/source/tutorial.rst 118
-rw-r--r-- smmap/buf.py 10
-rw-r--r-- smmap/mman.py 6
-rw-r--r-- smmap/test/test_buf.py 4
-rw-r--r-- smmap/test/test_tutorial.py 83
6 files changed, 221 insertions, 9 deletions
diff --git a/doc/source/intro.rst b/doc/source/intro.rst
index b35e785..30bff0d 100644
--- a/doc/source/intro.rst
+++ b/doc/source/intro.rst
@@ -34,11 +34,6 @@ Limitations
* In python below 2.6, memory maps will be created in compatibility mode which works, but creates inefficient memory mappings as they always start at offset 0.
* It wasn't tested on python 2.7 and 3.x.
-###############
-Getting Started
-###############
-It is advised to have a look at the :ref:`Usage Guide <tutorial-label>` for a brief introduction on the different database implementations.
-
################
Installing smmap
################
@@ -53,7 +48,9 @@ As the command will install smmap in your respective python distribution, you wi
If you have downloaded the source archive, the package can be installed by running the ``setup.py`` script::
$ python setup.py install
-
+
+It is advised to have a look at the :ref:`Usage Guide <tutorial-label>` for a brief introduction on the different database implementations.
+
##################
Homepage and Links
##################
diff --git a/doc/source/tutorial.rst b/doc/source/tutorial.rst
new file mode 100644
index 0000000..917b245
--- /dev/null
+++ b/doc/source/tutorial.rst
@@ -0,0 +1,118 @@
+.. _tutorial-label:
+
+###########
+Usage Guide
+###########
+This text briefly introduces you to the basic design decisions and accompanying classes.
+
+******
+Design
+******
+Per application, there is a *MemoryManager* which is held as a static instance and used throughout the application. It can be configured to keep your resources within certain limits.
+
+To access mapped regions, you require a cursor. Cursors point to exactly one file and serve as handles into it. As long as a cursor exists, the memory region it refers to will remain available.
+
+For convenience, a buffer implementation is provided which handles cursors and resource allocation behind its simple buffer-like interface.
+
+***************
+Memory Managers
+***************
+There are two types of memory managers: one uses *static* windows, the other uses *sliding* windows. A window is a region of a file mapped into memory. The names might be somewhat misleading, as technically windows are always static; the difference is that the *sliding* version allocates relatively small windows, whereas the *static* version always maps the whole file.
+
+The *static* manager does nothing more than keep a client count on the respective memory maps, which always map the whole file. This allows some assumptions that can lead to simplified data access and increased performance, but it reduces compatibility with 32 bit systems and giant files.
+
+The *sliding* memory manager therefore should be the default manager when preparing an application for handling huge amounts of data on 32 bit and 64 bit platforms::
+
+ import smmap
+ # This instance should be globally available in your application
+ # It is configured to be well suited for 32 bit and 64 bit applications.
+ mman = smmap.SlidingWindowMapManager()
+
+ # the manager provides useful information about its current state,
+ # like the number of open file handles or the amount of mapped memory
+ mman.num_file_handles()
+ mman.mapped_memory_size()
+ # and many more ...
+
+
+Cursors
+*******
+*Cursors* are handles that point onto a window, i.e. a region of a file mapped into memory. From them you may obtain a buffer through which the data of that window can actually be accessed::
+
+ import smmap.test.lib
+ fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")
+
+ # obtain a cursor to access some file.
+ c = mman.make_cursor(fc.path)
+
+ # the cursor is now associated with the file, but not yet usable
+ assert c.is_associated()
+ assert not c.is_valid()
+
+ # before you can use the cursor, you have to specify a window you want to
+ # access. The following just says you want as much data as possible starting
+ # from offset 0.
+ # To be sure your region could be mapped, query for validity
+ assert c.use_region().is_valid() # use_region returns self
+
+ # once a region was mapped, you must query its dimension regularly
+ # to ensure you don't try to access its buffer out of its bounds
+ assert c.size()
+ c.buffer()[0] # first byte
+ c.buffer()[1:10] # 9 bytes starting at offset 1
+ c.buffer()[c.size()-1] # last byte
+
+ # it's recommended not to create big slices when feeding the buffer
+ # into consumers (e.g. struct or zlib).
+ # Instead, either give the buffer directly, or use Python's buffer() builtin.
+ buffer(c.buffer(), 1, 9) # 9 bytes starting at offset 1, without copying them
+
+ # you can query absolute offsets, and check whether an offset is included
+ # in the cursor's data.
+ assert c.ofs_begin() < c.ofs_end()
+ assert c.includes_ofs(100)
+
+ # If one of your region requests is out of bounds, the
+ # cursor will become invalid. It cannot be used in that state
+ assert not c.use_region(fc.size, 100).is_valid()
+ # map as much as possible after skipping the first 100 bytes
+ assert c.use_region(100).is_valid()
+
+ # You can explicitly free cursor resources by unusing the cursor's region
+ c.unuse_region()
+ assert not c.is_valid()
+
+
+Now you would have to write your algorithms around this interface to properly slide through huge amounts of data.
+
+Alternatively you can use a convenience interface.
+
+*******
+Buffers
+*******
+To make first use easier, at the expense of performance, there is a Buffer implementation which uses a cursor underneath.
+
+With it, you can access all data in a possibly huge file without having to take care of setting the cursor to different regions yourself::
+
+ # Create a default buffer which can operate on the whole file
+ buf = smmap.SlidingWindowMapBuffer(mman.make_cursor(fc.path))
+
+ # you can use it right away
+ assert buf.cursor().is_valid()
+
+ buf[0] # access the first byte
+ buf[-1] # access the last byte of the file
+ buf[-10:] # access the last ten bytes
+
+ # If you want to keep the instance between different accesses, use the
+ # dedicated methods
+ buf.end_access()
+ assert not buf.cursor().is_valid() # you cannot use the buffer anymore
+ assert buf.begin_access(offset=10) # start using the buffer at an offset
+
+ # it will stop using resources automatically once it goes out of scope
+
+Disadvantages
+*************
+Buffers cannot be used in place of strings or maps, hence you have to slice them to obtain valid input for consumers such as struct and zlib. Each slice copies data, and this overhead makes buffers slower than using cursors directly.
+
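Note on the tutorial above: its closing remark about writing algorithms that slide through huge amounts of data can be illustrated without smmap itself. The following is a minimal sketch in Python 3 that uses only the standard library's mmap module. `iter_windows` and its parameters are hypothetical names made up for this illustration and are not part of smmap's API. It maps one bounded window at a time, which is the general idea behind the sliding manager, although smmap's own implementation may well differ.

```python
import mmap
import os


def iter_windows(path, window_size):
    """Yield (offset, data) for consecutive windows of the file at `path`.

    Hypothetical helper for illustration only, not part of smmap.
    """
    # mmap offsets must be multiples of the allocation granularity, so the
    # window size is rounded down to such a multiple (at least one unit).
    gran = mmap.ALLOCATIONGRANULARITY
    window_size = max(gran, (window_size // gran) * gran)
    file_size = os.path.getsize(path)

    with open(path, "rb") as fp:
        offset = 0
        while offset < file_size:
            # the last window may be shorter than window_size
            length = min(window_size, file_size - offset)
            with mmap.mmap(fp.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as window:
                # window[:] copies the mapped bytes so they stay usable
                # after the mapping is closed
                yield offset, window[:]
            offset += length
```

Only one window is mapped at any time, so the address space in use stays bounded regardless of the file size. Copying each window with `window[:]` keeps the sketch simple; a consumer that can work on the mapping directly would avoid that copy.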
diff --git a/smmap/buf.py b/smmap/buf.py
index c4d2522..9b24026 100644
--- a/smmap/buf.py
+++ b/smmap/buf.py
@@ -47,6 +47,8 @@ class SlidingWindowMapBuffer(object):
def __getitem__(self, i):
c = self._c
assert c.is_valid()
+ if i < 0:
+ i = self._size + i
if not c.includes_ofs(i):
c.use_region(i, 1)
# END handle region usage
@@ -57,6 +59,12 @@ class SlidingWindowMapBuffer(object):
# fast path, slice fully included - saves a concatenate operation and
# should be the default
assert c.is_valid()
+ if i < 0:
+ i = self._size + i
+ if j == sys.maxint:
+ j = self._size
+ if j < 0:
+ j = self._size + j
if (c.ofs_begin() <= i) and (j < c.ofs_end()):
b = c.ofs_begin()
return c.buffer()[i-b:j-b]
@@ -68,6 +76,7 @@ class SlidingWindowMapBuffer(object):
md = str()
while l:
c.use_region(ofs, l)
+ assert c.is_valid()
d = c.buffer()[:l]
ofs += len(d)
l -= len(d)
@@ -102,6 +111,7 @@ class SlidingWindowMapBuffer(object):
self._size = size
#END set size
return res
+ # END use our cursor
return False
def end_access(self):
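Note on the buf.py hunks above: the added code normalizes negative indices against the buffer size and treats `sys.maxint` as an open slice end, which is a Python 2 convention. As a sketch only, the same normalization can be written in Python 3, where `__getslice__` and `sys.maxint` no longer exist and `__getitem__` receives a `slice` object instead. `ByteView` is a hypothetical stand-in class backed by an in-memory bytes object; it is not smmap's SlidingWindowMapBuffer.

```python
class ByteView(object):
    """Hypothetical buffer-like wrapper, used only to show index handling."""

    def __init__(self, data):
        self._data = data
        self._size = len(data)

    def __len__(self):
        return self._size

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() resolves negative and omitted bounds
            # against the size, replacing the manual checks
            start, stop, step = key.indices(self._size)
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            return self._data[start:stop]

        # plain integer index: a negative value counts from the end
        if key < 0:
            key += self._size
        if not 0 <= key < self._size:
            raise IndexError("index out of range")
        return self._data[key:key + 1]
```

With this in place, `view[-1]` yields the last byte and `view[-10:]` the last ten bytes, matching the behaviour the new test in test_buf.py checks for.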
diff --git a/smmap/mman.py b/smmap/mman.py
index 9629eca..deba998 100644
--- a/smmap/mman.py
+++ b/smmap/mman.py
@@ -11,17 +11,16 @@ from weakref import ref
import sys
from sys import getrefcount
-__all__ = ["StaticWindowMapManager", "SlidingWindowMapManager"]
+__all__ = ["StaticWindowMapManager", "SlidingWindowMapManager", "WindowCursor"]
#{ Utilities
#}END utilities
-
class WindowCursor(object):
"""Pointer into the mapped region of the memory manager, keeping the map
alive until it is destroyed and no other client uses it.
-
+
Cursors should not be created manually, but are instead returned by the SlidingWindowMapManager
:note: The current implementation is suited for static and sliding window managers, but it also means
that it must be suited for the somewhat quite different sliding manager. It could be improved, but
@@ -85,6 +84,7 @@ class WindowCursor(object):
def use_region(self, offset = 0, size = 0, flags = 0):
"""Assure we point to a window which allows access to the given offset into the file
+
:param offset: absolute offset in bytes into the file
:param size: amount of bytes to map. If 0, all available bytes will be mapped
:param flags: additional flags to be given to os.open in case a file handle is initially opened
diff --git a/smmap/test/test_buf.py b/smmap/test/test_buf.py
index d8b7fbc..9881c62 100644
--- a/smmap/test/test_buf.py
+++ b/smmap/test/test_buf.py
@@ -50,6 +50,10 @@ class TestBuf(TestBase):
assert data[offset] == buf[0]
assert data[offset:offset*2] == buf[0:offset]
+ # negative indices, partial slices
+ assert buf[-1] == buf[len(buf)-1]
+ assert buf[-10:] == buf[len(buf)-10:len(buf)]
+
# end access makes its cursor invalid
buf.end_access()
assert not buf.cursor().is_valid()
diff --git a/smmap/test/test_tutorial.py b/smmap/test/test_tutorial.py
new file mode 100644
index 0000000..a9f4b1c
--- /dev/null
+++ b/smmap/test/test_tutorial.py
@@ -0,0 +1,83 @@
+from lib import TestBase
+
+class TestTutorial(TestBase):
+
+ def test_example(self):
+ # Memory Managers
+ ##################
+ import smmap
+ # This instance should be globally available in your application
+ # It is configured to be well suited for 32 bit and 64 bit applications.
+ mman = smmap.SlidingWindowMapManager()
+
+ # the manager provides useful information about its current state,
+ # like the number of open file handles or the amount of mapped memory
+ assert mman.num_file_handles() == 0
+ assert mman.mapped_memory_size() == 0
+ # and many more ...
+
+ # Cursors
+ ##########
+ import smmap.test.lib
+ fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")
+
+ # obtain a cursor to access some file.
+ c = mman.make_cursor(fc.path)
+
+ # the cursor is now associated with the file, but not yet usable
+ assert c.is_associated()
+ assert not c.is_valid()
+
+ # before you can use the cursor, you have to specify a window you want to
+ # access. The following just says you want as much data as possible starting
+ # from offset 0.
+ # To be sure your region could be mapped, query for validity
+ assert c.use_region().is_valid() # use_region returns self
+
+ # once a region was mapped, you must query its dimension regularly
+ # to ensure you don't try to access its buffer out of its bounds
+ assert c.size()
+ c.buffer()[0] # first byte
+ c.buffer()[1:10] # 9 bytes starting at offset 1
+ c.buffer()[c.size()-1] # last byte
+
+ # it's recommended not to create big slices when feeding the buffer
+ # into consumers (e.g. struct or zlib).
+ # Instead, either give the buffer directly, or use Python's buffer() builtin.
+ buffer(c.buffer(), 1, 9) # 9 bytes starting at offset 1, without copying them
+
+ # you can query absolute offsets, and check whether an offset is included
+ # in the cursor's data.
+ assert c.ofs_begin() < c.ofs_end()
+ assert c.includes_ofs(100)
+
+ # If one of your region requests is out of bounds, the
+ # cursor will become invalid. It cannot be used in that state
+ assert not c.use_region(fc.size, 100).is_valid()
+ # map as much as possible after skipping the first 100 bytes
+ assert c.use_region(100).is_valid()
+
+ # You can explicitly free cursor resources by unusing the cursor's region
+ c.unuse_region()
+ assert not c.is_valid()
+
+ # Buffers
+ #########
+ # Create a default buffer which can operate on the whole file
+ buf = smmap.SlidingWindowMapBuffer(mman.make_cursor(fc.path))
+
+ # you can use it right away
+ assert buf.cursor().is_valid()
+
+ buf[0] # access the first byte
+ buf[-1] # access the last byte of the file
+ buf[-10:] # access the last ten bytes
+
+ # If you want to keep the instance between different accesses, use the
+ # dedicated methods
+ buf.end_access()
+ assert not buf.cursor().is_valid() # you cannot use the buffer anymore
+ assert buf.begin_access(offset=10) # start using the buffer at an offset
+
+ # it will stop using resources automatically once it goes out of scope
+
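Note on the `buffer(...)` calls in the tutorial and its test: the `buffer` builtin exists only in Python 2. In Python 3, the comparable zero-copy tool is `memoryview`. The sketch below shows the same idea of handing a sub-range to a consumer such as struct without copying it first. It uses a plain bytearray in place of a mapped region, and `read_pair` is a hypothetical helper name, not something smmap provides.

```python
import struct


def read_pair(buf, offset):
    """Unpack two little-endian uint32 values starting at `offset`.

    Hypothetical helper: slicing a memoryview creates a new view onto
    the same memory instead of copying the bytes.
    """
    view = memoryview(buf)[offset:offset + 8]
    return struct.unpack("<II", view)


# one padding byte, two packed integers, then unrelated trailing data
data = bytearray(b"\x00" + struct.pack("<II", 7, 9) + b"tail")
pair = read_pair(data, 1)
```

Because the view shares memory with `data`, later writes to `data` are visible through any view created from it, which is also why such views should not outlive the region they refer to.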