Commit log
Change-Id: I540eb2fff8a9f67815fda26263350ecaa217f217
Previously, we could over-assign how many parts should be in a tier.
This would cause the local `parts` variable to go negative, which meant
that our `while parts` loop would never terminate.
Change-Id: Id7e7889742ca37cf1a9c0d55fba78d967e90e8d0
Closes-Bug: 1642538
(cherry picked from commit 2e7a7347fc58676fbaabce3d87a15866796d32e4)
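The non-termination described above can be reproduced with a minimal sketch; the function and variable names here are illustrative, not Swift's actual ring-builder code:

```python
def assign_parts(parts, tiers):
    """Spread `parts` assignments across tiers, guarding against
    over-assignment driving the local counter negative."""
    assigned = {tier: 0 for tier in tiers}
    while parts:  # if parts ever went negative, this would spin forever
        for tier in tiers:
            if parts <= 0:  # the guard: stop as soon as nothing is left
                return assigned
            assigned[tier] += 1
            parts -= 1
    return assigned
```

Without the `parts <= 0` guard, an inner loop that decrements past zero leaves `while parts` permanently truthy, which is the hang the fix addresses.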
This occurs when a new object is written to a new suffix in a
non-empty partition: the suffix is added to the invalidations file but
not to the hashes pickle file. When that partition is replicated,
replication of the suffix only completes on the first and every 10th
run of the replicator. Rsync runs for every new suffix because the
destination does not return a hash for the new suffix, even though the
suffix content is in the same state on both sides.
This bug was introduced in 2.7.0.
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Change-Id: Ie2700f6e6171f2ecfa7d07b0f18b79e90cbf1c8a
Closes-Bug: #1634967
(cherry picked from commit 8ac432fff3e01a07f4bff918bb9cc38d93532b43)
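A toy model of the suffix bookkeeping involved (names are simplified stand-ins; the real logic lives in Swift's diskfile module, and the fix amounts to making sure an invalidated suffix also lands in the hashes mapping):

```python
def invalidate_hash(hashes, invalidations, suffix):
    # Record that a suffix needs rehashing. The bug: the suffix was
    # appended to the invalidations file but never entered into the
    # hashes pickle, so a fresh suffix could be missed.
    invalidations.append(suffix)
    hashes.setdefault(suffix, None)  # None means "needs rehash"

def consolidate_hashes(hashes, invalidations):
    # Fold pending invalidations into the hashes mapping and reset
    # the invalidations list, mirroring the on-disk consolidation step.
    for suffix in invalidations:
        hashes[suffix] = None
    invalidations.clear()
    return hashes
```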
Previously, the do_listdir option was set on every 10th replication run.
Due to the randomness of the job listing this might update a given
partition much less often than expected, for example with 1000
partitions per replicator only every ~70th run.
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Christian Schwede <cschwede@redhat.com>
Related-Bug: #1634967
Closes-Bug: 1644807
Change-Id: Ib5c9dd17e40150450ec57a728ae8652fbc730af6
This shortens the shebang used in infra, because we are hitting the
128-byte limit.
Also adds bindep.txt, which is needed for infra.
Change-Id: I02477d81b836df71780942189d37d616944c4dce
(cherry picked from commit 5d7a3a4 and aab2cee)
This patch makes the ECDiskFileReader check the validity of EC
fragment metadata as it reads chunks from disk and quarantine a
diskfile with bad metadata. This in turn means that both the object
auditor and a proxy GET request will cause bad EC fragments to be
quarantined.
This change is motivated by bug 1631144, which may result in corrupt
EC fragments being written to disk that nevertheless appear valid to
the object auditor's md5 hash and content-length checks.
NotImplemented:
* perform metadata check when a read starts on any frag_size
boundary, not just at zero
Related-Bug: #1631144
Closes-Bug: #1633647
This is a backport of commit 2a75091c58948fb664016c0e91e72acd313e4610
Change-Id: Ifa6a7f8aaca94c7d39f4aeb9d4fa3f59c4f6ee13
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Previously, if a reconstructor sync type job failed to provide
sufficient bytes from a reconstructed fragment body iterator to match
the content-length that the ssync sender had already sent to the ssync
receiver, the sender would still proceed to send the next
subrequest. The ssync receiver might then write the start of the next
subrequest to the partially complete diskfile for the previous
subrequest (including writing subrequest headers to that diskfile)
until it has received content-length bytes.
Since a reconstructor ssync job does not send an ETag header (it
cannot because it does not know the ETag of a reconstructed fragment
until it has been sent) then the receiving object server does not
detect the "bad" data written to the fragment diskfile, and worse,
will label it with an ETag that matches the md5 sum of the bad
data. The bad fragment file will therefore appear good to the auditor.
There is no easy way for the ssync sender to communicate a lack of
source data to the receiver other than by disconnecting the
session. So this patch adds a check in the ssync sender that the sent
byte count is equal to the sent Content-Length header value for each
subrequest, and disconnects if a mismatch is detected.
The disconnect prevents the receiver finalizing the bad diskfile, but
also prevents subsequent fragments in the ssync job being sync'd until
the next cycle.
N.B. Though this is a backport patch to the 2.7.0 release, it differs
from the original commit 3218f8b064e462d901466b04a4813e15ec96da85 on
the master branch in the number of eventlet trampoline sleeps made
before asserting the written log in test/unit/obj/test_ssync.py. That
is because in 2.7.0 there is an extra eventlet coro for writing chunks
into the real diskfile, which was subsequently removed with commit
4c11833a9cbff499725365e535e217f3eae3c442 during the 2.7.0-2.8.0
development cycle.
Closes-Bug: #1631144
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I54068906efdb9cd58fcdc6eae7c2163ea92afb9d
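The sender-side check can be sketched as follows (the connection object and helper name are hypothetical, not Swift's actual ssync classes):

```python
def send_subrequest_body(connection, headers, body_iter):
    """Stream one subrequest body, disconnecting on a short read so the
    receiver can never finalize a partially written diskfile."""
    expected = int(headers['Content-Length'])
    sent = 0
    for chunk in body_iter:
        connection.send(chunk)
        sent += len(chunk)
    if sent != expected:
        # A reconstructed fragment iterator came up short: the only safe
        # signal we can give the receiver is to drop the session.
        connection.close()
        raise ValueError('sent %d of %d bytes' % (sent, expected))
```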
Following fd86d5a, the object-auditor would leave status files so it
could resume where it left off if restarted. However, this would also
cause the object-reconstructor to print warnings like:
Unexpected entity in data dir: u'/srv/node4/sdb8/objects/auditor_status_ZBF.json'
...which isn't actually terribly useful or actionable. The auditor will
clean it up (eventually); the operator doesn't have to do anything.
Now, the reconstructor will specifically ignore those status files.
Partial-Bug: 1583305
Change-Id: I2f3d0bd2f1e242db6eb263c7755f1363d1430048
(cherry picked from commit ad16e2c77bb61bdf51a7d3b2c258daf69bfc74da)
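The filter itself is simple; a sketch of the idea (the function name is illustrative):

```python
import os

def is_partition_dir(datadir, name):
    # The auditor's resumable-status files are expected housekeeping,
    # not stray data: skip them silently instead of warning.
    if name.startswith('auditor_status_'):
        return False
    return os.path.isdir(os.path.join(datadir, name))
```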
Ignore `auditor_status_*.json` files when collecting jobs, so the
replicator won't treat those paths as places to find objects;
previously this caused an exception that increased the failure count
in the replicator report.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Change-Id: Ib15a0987288d9ee32432c1998aefe638ca3b223b
Closes-Bug: #1583305
(cherry picked from commit 65b1820407ea40bd7d65a5356a58a689befe3cb5)
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: I6dc5872fe3005df1d98a7d914f4488a9d3b2f39f
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: Idffa1f644ef1363a043a0f7dd9f10e801b0dc374
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: I995ccd6d0494dd3535c6b93565c5ad9860d6f79e
Previously, versioned_writes assumed that all container servers would
always have the latest Swift code, allowing them to return reversed
listings. This could cause the wrong version of a file to be restored
during rolling upgrades.
Now, versioned_writes will check that the listing returned is actually
reversed. If it isn't, we will revert to getting the full (in-order)
listing of versions and reversing it on the proxy.
Change-Id: Ib53574ff71961592426cb386ef00a75eb5824def
Closes-Bug: 1562083
(cherry picked from commit ebf0b220127b14bec7c05f1bc0286728f27f39d1)
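A sketch of the proxy-side fallback, assuming a listing of dicts with 'name' keys and a callable that fetches the full forward listing (both are simplifications of the middleware's real plumbing):

```python
def ensure_reversed_listing(listing, fetch_full_listing):
    """Return a newest-first version listing even when an old container
    server ignores the reverse query parameter."""
    names = [item['name'] for item in listing]
    if names == sorted(names, reverse=True):
        return listing  # the server honored reverse=true
    # Rolling-upgrade fallback: fetch everything in order and reverse
    # it on the proxy side.
    return list(reversed(fetch_full_listing()))
```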
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: Ib7e30b55fd1795bf63cb0c22a97580be5e6f9f23
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: Ia78689c73464b565fc6a09a633a1e9a84d92a97d
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: I68439736f0d0cfc981d4eacbdd1908f57e82b4c9
Change-Id: If8945bca78f90e95f6d05baefc078b8905f6bdb2
Follow up for change [1] to add some assertions to check that
marker param is included in sequential GET requests sent during
a full listing.
Extract multiple FakeConn class definitions to single class at
module level and share between all classes.
Also, explicitly unpack the return values from base request calls
made in the full listing section of base_request, and explicitly
return a list to make it more consistent with the rest of the method.
[1] Change-Id: I6892390d72f70f1bc519b482d4f72603e1570163
Change-Id: Iad038709f46364b8324d25ac79be4317add79df5
The internal_client is used in swift-dispersion-report, and if one has
more than 10000 containers or objects the remainder are not queried.
This patch adds support to the internal_client for iterating over all
containers/objects when the listing exceeds the default limit of 10000
entries and the argument full_listing=True is used.
Closes-Bug: 1314817
Closes-Bug: 1525995
Change-Id: I6892390d72f70f1bc519b482d4f72603e1570163
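The pagination pattern can be sketched like this (the page-fetching callable stands in for internal_client's per-request listing call):

```python
def iter_full_listing(get_page, limit=10000):
    """Collect a complete listing by re-issuing the request with a
    marker until a short page signals the end."""
    marker = ''
    results = []
    while True:
        page = get_page(marker=marker, limit=limit)
        results.extend(page)
        if len(page) < limit:
            return results
        marker = page[-1]['name']  # resume after the last entry seen
```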
Change-Id: I16ad0c61b048921ca01fa96862ae7eea0eec6017
DiskFile already fills in the _ondisk_info attribute when it tries to open
a diskfile - even if the DiskFile's fileset is not valid or deleted.
During this process the rsync tempfiles would be discovered and logged,
but no-one would attempt to clean them up - even if they were really old.
Instead of logging and ignoring unexpected files when validating a
DiskFile fileset, we'll add unexpected files to the unexpected key in
the _ondisk_info attribute.
With a little bit of re-organization in the auditor's object_audit method
to get things into a single return path we can add an unconditional check
for unexpected files and remove those that are "old enough".
Since the replicator will kill any rsync processes that are running longer
than the configured rsync_timeout we know that any rsync tempfiles older
than this can be deleted.
Split unlink_older_than in common.utils into two functions to allow an
explicit list of previously discovered paths to be passed in to avoid an
extra listdir. Since the getmtime handling already ignores OSError,
there's less concern of a race condition where a previously discovered
unexpected file is reaped by rsync while we're attempting to clean it up.
Update some doc on the new config option.
Closes-Bug: #1554005
Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283
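The split-out utility might look like this sketch (the real helper is in swift.common.utils; treat the exact signature here as an assumption):

```python
import os

def unlink_paths_older_than(paths, mtime_cutoff):
    """Remove each already-discovered path whose mtime predates the
    cutoff, tolerating files that vanish underneath us (e.g. an rsync
    tempfile reaped by rsync itself)."""
    for path in paths:
        try:
            if os.path.getmtime(path) < mtime_cutoff:
                os.unlink(path)
        except OSError:
            pass  # already gone or unreadable: leave it for a later pass
```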
This change adds a remote HEAD object request before each call to
sync_row.
Currently, container-sync-row attempts to replicate the object
(using PUT) regardless of the existence of the object on the remote side,
thus causing each object to be transferred over the wire several times
(depending on the replication factor).
An alternative to HEAD is to do a conditional PUT (using 100-continue).
However, that change is more involved and requires upgrading both the
client- and server-side clusters to work.
At the Tokyo design summit it was decided to start with the HEAD approach.
Change-Id: I60d982dd2cc79a0f13b0924507cd03d7f9c9d70b
Closes-Bug: #1277223
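The decision logic reduces to a timestamp comparison; a sketch with a hypothetical HEAD callable that returns None or a metadata dict:

```python
def should_sync_row(head_remote, name, local_timestamp):
    # HEAD the remote object first: only PUT when it is missing or
    # older than our row's timestamp, instead of always transferring.
    remote_meta = head_remote(name)
    if remote_meta is None:
        return True  # not there yet
    return float(remote_meta.get('x-timestamp', 0)) < float(local_timestamp)
```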
Updates docs to remove warnings that container sync only
works with object_post_as_copy=True. Since commit e91de49
container sync will also sync POST updates when using
object_post_as_copy=False.
Change-Id: I5cc3cc6e8f9ba2fef6f896f2b11d2a4e06825f7f
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: Ia2c2819db372da46538d71a80888a4e27538bdcd
If you invoked the object auditor with --once, it would run the
full-audit checker(s) once, but it would run the ZBF checker over and
over until the full-audit checkers were done. Now it runs the ZBF and
full-audit checkers once each.
Change-Id: Ieeaa6fba4184a069756ee150727f24df7833697a
After running:
python setup.py build_sphinx
there is a .eggs directory left in the repo root directory
which is not currently ignored by git.
Change-Id: Id15811f94046fd8bb22153425bf5cafe6c045453
Change-Id: Ia44138aadcd30c474f744a9c552220e18302ecc6
Since commit "Update container sync to use internal client", get_object
is done using the internal_client and not directly on nodes, which
makes the block of code that shuffles the nodes redundant.
Change-Id: I45a6dab05f6f87510cf73102b1ed191238209efe
Currently, the ECObjectController removes the 'content-length' header.
This part is ok, except that the value is being used to set
'X-Backend-Obj-Content-Length', so it is always 0. This leads to
fallocate not being called on a PUT (details in the bug), since the
size is 0.
This change makes use of some numbers returned from the EC Driver
get_segment_info method in order to calculate the expected on-disk
size that should be allocated. The EC controller will now set the
'X-Backend-Obj-Content-Length' value appropriately.
Co-Authored-By: Kota Tsuyuzaki
Co-Authored-By: John Dickinson
Co-Authored-By: Tim Burke
Change-Id: Ifd16c1438539e6fd9bb2dbcd053d11bea2e09fee
Fixes: bug 1532008
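Assuming segment-info keys like those returned by PyECLib's get_segment_info (treat the exact key names as an assumption), the expected on-disk size of one fragment archive falls out as:

```python
def expected_fragment_archive_size(segment_info):
    # All full segments contribute a full fragment; the (possibly
    # shorter) last segment contributes the last fragment size.
    return ((segment_info['num_segments'] - 1)
            * segment_info['fragment_size']
            + segment_info['last_fragment_size'])
```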
For more information about this automatic import see:
https://wiki.openstack.org/wiki/Translations/Infrastructure
Change-Id: I70db7d29a9859cb47144ac49df8c289d1c2ec3e6
The object auditor will save a short status file on each device, containing a
list of remaining partitions for auditing. If the auditor is restarted, it will
only audit partitions not yet checked. If all partitions on the current device
have been checked, it will simply skip this device. Once all partitions on all
disks are successfully audited, all status files are removed.
Closes-Bug: #1183656
Change-Id: Icf1d920d0942ce48f1d3d374ea4d63dbc29ea464
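A sketch of the status-file mechanics, with a simplified single file name (the real auditor keeps one per audit mode and device):

```python
import json
import os

STATUS_NAME = 'auditor_status.json'

def save_audit_status(datadir, remaining_partitions):
    """Persist the partitions still to audit; remove the file once the
    device is fully audited."""
    status_file = os.path.join(datadir, STATUS_NAME)
    if remaining_partitions:
        with open(status_file, 'w') as f:
            json.dump({'partitions': remaining_partitions}, f)
    elif os.path.exists(status_file):
        os.unlink(status_file)

def load_audit_status(datadir, all_partitions):
    """Resume from a previous run if a status file exists; otherwise
    audit everything."""
    try:
        with open(os.path.join(datadir, STATUS_NAME)) as f:
            return json.load(f)['partitions']
    except (OSError, IOError, ValueError, KeyError):
        return list(all_partitions)
```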
If an EC diskfile is missing its .durable file (for example
due to a partial PUT failure) then the ssync missing check
will fail to open the file and will consider it
missing. This can result in possible reconstruction of the
fragment archive (for a sync job) and definite transmission
of the fragment archive (for sync and revert jobs), which is
wasteful.
This patch makes the ssync receiver inspect the diskfile
state after attempting to open it, and if fragments exist at
the timestamp of the sender's diskfile, but a .durable file
is missing, then the receiver will commit the diskfile at
the sender's timestamp. As a result, there is no longer any
need to send a fragment archive.
Change-Id: I4766864fcc0a3553976e8fd85bbb2fc782f04abd
Today recon will include normal files in the payload it returns for
/recon/unmounted and /recon/diskusage. As a result it can trigger
bogus alarms on any operations-side monitoring checking for unmounted
disks or disks that show up in diskusage with weird looking stats.
This change adds an isdir check for the entries it finds in /srv/node.
Change-Id: Iad72e03fdda11ff600b81b4c5d58020cc4b9048e
Closes-bug: #1556747
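The guard is a one-line check per entry; sketched (the function name is illustrative):

```python
import os

def list_device_dirs(devices_root):
    # Only directories can be mount points; a stray regular file in
    # /srv/node would otherwise show up in recon with bogus stats.
    return sorted(
        name for name in os.listdir(devices_root)
        if os.path.isdir(os.path.join(devices_root, name)))
```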