Commit message  Author  Age  Files  Lines
* ceph, config, auth: better messages on failure to open keyring/ceph.conf  [wip-5634]  Dan Mick  2013-07-16  7  -15/+31
| If something as simple as file ownership is wrong, Ceph commands and daemons can fail to run, and the diagnostics are not great. Improve that for at least the specific cases of unopenable keyring and ceph.conf files.
|
| Fixes: #5634
| Signed-off-by: Dan Mick <dan.mick@inktank.com>
* mon/MDSMonitor: make 'mds cluster_{up,down}' idempotent  Sage Weil  2013-07-16  1  -6/+2
| Signed-off-by: Sage Weil <sage@inktank.com>
* osdmaptool: fix cli tests  Sage Weil  2013-07-16  2  -9/+9
| From the HASHPSPOOL change in acbc2f0bc0b4266125403aebb28e6e3a2365394d.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
* Merge branch 'wip-ceph-disk' into next  Sage Weil  2013-07-16  1  -47/+75
|\
| | Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
| | Tested-by: Jing Yuan Luke <jyluke@gmail.com>
| * ceph-disk: use /sys/block to determine partition device names  Sage Weil  2013-07-16  1  -1/+24
| | Not all devices are basename + number; some have intervening character(s), like /dev/cciss/c0d1p2.
| |
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * ceph-disk: reimplement is_partition() using /sys/block  Sage Weil  2013-07-16  1  -16/+9
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * ceph-disk: use get_dev_name() helper throughout  Sage Weil  2013-07-16  1  -4/+4
| | This is more robust than the broken split trick.
| |
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * ceph-disk: refactor list_[all_]partitions  Sage Weil  2013-07-16  1  -32/+19
| | Make these methods work in terms of device *names*, not paths, and fix up the only direct list_partitions() caller to do the same.
| |
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * ceph-disk: add get_dev_name, path helpers  Sage Weil  2013-07-16  1  -0/+25
|/
| Signed-off-by: Sage Weil <sage@inktank.com>
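The ceph-disk commits in this merge describe mapping between device paths and device names and finding partition names through /sys/block rather than assuming "basename + number". A minimal Python sketch of that idea follows. It is not the code from these commits: the function bodies, the `sys_block` and `dev_root` parameters, and the rule used to reject look-alike entries such as `sda12` are assumptions made for illustration. The one fact relied on is that sysfs spells a `/` in a device name as `!` (for example `cciss!c0d1`).

```python
import os


def get_dev_name(path, dev_root="/dev"):
    """'/dev/cciss/c0d1' -> 'cciss!c0d1' (sysfs uses '!' in place of '/')."""
    return os.path.relpath(path, dev_root).replace("/", "!")


def get_dev_path(name, dev_root="/dev"):
    """'cciss!c0d1p2' -> '/dev/cciss/c0d1p2'."""
    return os.path.join(dev_root, name.replace("!", "/"))


def get_partition_name(disk_path, pnum, sys_block="/sys/block", dev_root="/dev"):
    """Return the sysfs name of partition `pnum` of the disk, or None.

    Partitions show up as subdirectories of /sys/block/<disk>/ whose names
    start with the disk name and end with the partition number, possibly
    with intervening non-digit characters (the 'p' in 'c0d1p2').
    """
    disk = get_dev_name(disk_path, dev_root)
    suffix = str(pnum)
    for entry in sorted(os.listdir(os.path.join(sys_block, disk))):
        if not (entry.startswith(disk) and entry.endswith(suffix)):
            continue
        middle = entry[len(disk):len(entry) - len(suffix)]
        # Assumption: reject entries such as 'sda12' when looking for
        # partition 2 by refusing digits between disk name and number.
        if not any(c.isdigit() for c in middle):
            return entry
    return None
```

For example, with a sysfs tree containing `cciss!c0d1/cciss!c0d1p2`, `get_partition_name("/dev/cciss/c0d1", 2)` yields the name `cciss!c0d1p2`, which `get_dev_path()` turns back into `/dev/cciss/c0d1p2`.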
* mon/OSDMonitor: fix typo  Sage Weil  2013-07-16  1  -1/+1
| From 5eac38797d9eb5a59fcff1d81571cff7a2f10e66
|
| Signed-off-by: Sage Weil <sage@inktank.com>
* osd/OSDMonitor: make 'osd pool rmsnap ...' not racy/crashy  Sage Weil  2013-07-16  1  -24/+26
| Ensure that the snap does in fact exist before we try to remove it. This avoids a crash where we get two dup rmsnap requests (due to thrashing, or a reconnect, or something), the committed (p) value does have the snap, but the uncommitted (pp) does not. This fails the old test such that we try to remove it from pp again, and assert.
|
| Restructure the flow so that it is easier to distinguish the committed short return from the uncommitted return (which must still wait for the commit).
|
|      0> 2013-07-16 14:21:27.189060 7fdf301e9700 -1 osd/osd_types.cc: In function 'void pg_pool_t::remove_snap(snapid_t)' thread 7fdf301e9700 time 2013-07-16 14:21:27.187095
| osd/osd_types.cc: 662: FAILED assert(snaps.count(s))
|
|  ceph version 0.66-602-gcd39d8a (cd39d8a6727d81b889869e98f5869e4227b50720)
|  1: (pg_pool_t::remove_snap(snapid_t)+0x6d) [0x7ad6dd]
|  2: (OSDMonitor::prepare_command(MMonCommand*)+0x6407) [0x5c1517]
|  3: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x1fb) [0x5c41ab]
|  4: (PaxosService::dispatch(PaxosServiceMessage*)+0x937) [0x598c87]
|  5: (Monitor::handle_command(MMonCommand*)+0xe56) [0x56ec36]
|  6: (Monitor::_ms_dispatch(Message*)+0xd1d) [0x5719ad]
|  7: (Monitor::handle_forward(MForward*)+0x821) [0x572831]
|  8: (Monitor::_ms_dispatch(Message*)+0xe44) [0x571ad4]
|  9: (Monitor::ms_dispatch(Message*)+0x32) [0x588c52]
|  10: (DispatchQueue::entry()+0x549) [0x7cf1d9]
|  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7060fd]
|  12: (()+0x7e9a) [0x7fdf35165e9a]
|  13: (clone()+0x6d) [0x7fdf334fcccd]
|  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
| Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
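The control flow described in the rmsnap message above can be illustrated with a small Python sketch. This is not the monitor code: the function name, the return strings, and the use of plain sets for the committed (p) and pending (pp) snapshot lists are all assumptions made for illustration.

```python
def handle_rmsnap(committed_snaps, pending_snaps, snap):
    """Idempotent 'rmsnap' handling, modeled with plain sets.

    committed_snaps plays the role of the committed (p) value and
    pending_snaps the uncommitted (pp) value from the commit message.
    """
    if snap not in committed_snaps:
        # Committed short return: the snap is already gone, reply at once.
        return "reply_now"
    if snap in pending_snaps:
        # First request: stage the removal in the pending value.
        pending_snaps.remove(snap)
    # A duplicate request finds the snap in p but not in pp. Nothing is
    # removed a second time, but the reply must still wait for the commit.
    return "wait_for_commit"
```

Calling this twice before the commit lands leaves the pending set untouched the second time, which is the case that used to trip the assert in remove_snap().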
* ObjectStore: add omap_rmkeyrange to dump  Samuel Just  2013-07-16  1  -0/+14
| Signed-off-by: Samuel Just <sam.just@inktank.com>
| Reviewed-by: Sage Weil <sage@inktank.com>
* OSD: add perfcounter tracking messages delayed pending a map  Samuel Just  2013-07-16  2  -0/+4
| Signed-off-by: Samuel Just <sam.just@inktank.com>
| Reviewed-by: Sage Weil <sage@inktank.com>
* FileStore: add a perf counter for time spent acquiring op queue throttle  Samuel Just  2013-07-16  2  -0/+5
| Signed-off-by: Samuel Just <sam.just@inktank.com>
| Reviewed-by: Sage Weil <sage@inktank.com>
* Merge branch 'wip-4779' into next  Sage Weil  2013-07-16  5  -36/+85
|\
| | Reviewed-by: Sage Weil <sage@inktank.com>
| * mon/OSDMonitor: return error if we can't set the new bucket's name  Sage Weil  2013-07-16  1  -1/+5
| | Signed-off-by: Sage Weil <sage@inktank.com>
| | Reviewed-by: Dan Mick <dan.mick@inktank.com>
| * crush: return EINVAL on invalid name from {insert,update,create_or_move}_item, set_item_name  Sage Weil  2013-07-16  2  -1/+14
| | Signed-off-by: Sage Weil <sage@inktank.com>
| | Reviewed-by: Dan Mick <dan.mick@inktank.com>
| * crush: add is_valid_crush_name() helper  Sage Weil  2013-07-16  2  -0/+20
| | [A-Za-z0-9-_.]+
| |
| | Signed-off-by: Sage Weil <sage@inktank.com>
| | Reviewed-by: Dan Mick <dan.mick@inktank.com>
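The character class given in the is_valid_crush_name() message can be tried out with a short Python sketch. The helper in the commit lives in the C++ crush code; this Python function only mirrors the stated pattern and is an illustration, not a port. The hyphen is moved to the end of the class, which matches the same characters while avoiding any ambiguity about ranges.

```python
import re

# Same character set as [A-Za-z0-9-_.]+ from the commit message,
# with '-' placed last so it is unambiguously a literal.
_CRUSH_NAME_RE = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_crush_name(name):
    """True if name is non-empty and uses only the allowed characters."""
    return _CRUSH_NAME_RE.fullmatch(name) is not None
```

Names such as `host-1.rack_2` pass, while an empty string or a name containing a space or a slash does not.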
| * MonCommands.h: use new validation for crush names (CephString goodchars)  Dan Mick  2013-07-12  1  -16/+17
| | Signed-off-by: Dan Mick <dan.mick@inktank.com>
| * ceph_argparse.py: allow valid char RE arg to CephString  Dan Mick  2013-07-12  1  -8/+19
| | Change badchars to goodchars (no one was using badchars); allow goodchars to be a RE character class of valid characters for the param. First use: crush item names.
| |
| | Signed-off-by: Dan Mick <dan.mick@inktank.com>
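One way to read "goodchars as a RE character class of valid characters" is sketched below in Python. The class name `StringArg`, its constructor, and the choice to expand the class into a set of printable characters are hypothetical; they are not taken from ceph_argparse.py and should be treated as an assumption about how such a validator might be written.

```python
import re
from string import printable


class StringArg:
    """Hypothetical string argument type with an optional 'goodchars' class."""

    def __init__(self, goodchars=""):
        # goodchars is expected to be a character class such as
        # '[A-Za-z0-9_.-]'; re.compile raises re.error if it is malformed.
        re.compile(goodchars)
        self.goodchars = goodchars
        # Expand the class once into the set of printable characters it allows.
        self.goodset = frozenset(
            c for c in printable if goodchars and re.fullmatch(goodchars, c)
        )

    def valid(self, s):
        """Return s if acceptable, otherwise raise ValueError."""
        if self.goodset and not set(s) <= self.goodset:
            bad = "".join(sorted(set(s) - self.goodset))
            raise ValueError("invalid chars %r in %r" % (bad, s))
        return s
```

With `StringArg("[A-Za-z0-9_.-]")`, the value `osd.0` is accepted and `bad name` is rejected; with no goodchars at all, every string is accepted.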
| * ceph_argparse: ignore prefix mismatches, but quit if non-prefix  Dan Mick  2013-07-12  1  -9/+9
| | I don't know what I was thinking; this was always the right validation algorithm, and I broke it trying to simplify.
| |
| | Signed-off-by: Dan Mick <dan.mick@inktank.com>
| * ceph_argparse.py: validate's 3rd arg is not verbose, it's partial  Dan Mick  2013-07-12  1  -1/+1
| | Signed-off-by: Dan Mick <dan.mick@inktank.com>
* | Merge pull request #439 from yehudasa/wip-rgw-next  Gregory Farnum  2013-07-16  1  -0/+5
|\ \
| | | rgw: quiet down ECANCELED on put_obj_meta()
| | |
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | rgw: quiet down ECANCELED on put_obj_meta()  Yehuda Sadeh  2013-07-16  1  -0/+5
| | | Fixes: #5439
| | | ECANCELED there means that we lost in a race to write the object. We should treat it as a successful write. This is reviving an old behavior that was changed inadvertently.
| | |
| | | Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
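The ECANCELED handling described above amounts to mapping one specific error code to success. A minimal Python sketch, assuming the usual convention of negative errno return values; the function name is hypothetical and the real change is in the C++ rgw code:

```python
import errno


def normalize_put_obj_meta_result(ret):
    """Treat -ECANCELED (we lost a write race) as a successful write.

    ret follows the negative-errno convention: 0 on success, -errno on
    failure. Any other error is passed through unchanged.
    """
    if ret == -errno.ECANCELED:
        return 0
    return ret
```

Only ECANCELED is swallowed here; an error such as -EIO still reaches the caller.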
* | | mon: OSDMonitor: only thrash and propose if we are the leader  Joao Eduardo Luis  2013-07-16  1  -2/+8
| | | 'thrash_map' is only set if we are the leader, so we would thrash and propose the pending value if we are the leader. However, we should keep the 'is_leader()' check not only for clarity's sake (an unfamiliar reader may cry OMGBUG, prompting a patch much like this), but also because we may lose a subsequent election and become a peon instead, while still holding a 'thrash_map' value > 0 -- and we really don't want to propose while being a peon.
| | |
| | | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
| | | Reviewed-by: Sage Weil <sage@inktank.com>
* | | mon/MDSMonitor: make 'ceph mds remove_data_pool ...' idempotent  Sage Weil  2013-07-16  1  -0/+2
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | mon/OSDMonitor: clean up waiting_for_map messages on shutdown  Sage Weil  2013-07-16  4  -0/+23
| | | Do not leak these.
| | |
| | | Fixes: #5643
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
| | | Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
* | | mon/OSDMonitor: send_to_waiting() in on_active()  Sage Weil  2013-07-16  1  -5/+5
| | | The send_latest() helper may put a message in the waiting_for_map list if we are not readable, but currently send_to_waiting() is only called from update_from_paxos(), and it is possible that we may be unreadable but not get a map update. Instead, share the map when we are active.
| | |
| | | Do the same for check_subs(), which is also about sharing the *new* map. Leave share_map_with_random_osd() and process_failures() which are not concerned with whether this is the latest map or not.
| | |
| | | This problem surfaced when we changed the timing of refresh relative to paxos commit, since update_from_paxos() is now not normally called while readable; see f1ce8d7c955a2443111bf7d9e16b4c563d445712 and c711203c0d4b924e5951aa808b243bf06e7ad23a.
| | |
| | | Fixes: #5643
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
| | | Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
* | | osd: do not enable HASHPSPOOL pool feature by default  Sage Weil  2013-07-16  1  -2/+1
| | | This was added in kernel 3.9 and should not yet be enabled by default.
| | |
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | ceph-disk: rely on /dev/disk/by-partuuid instead of special-casing journal symlinks  Sage Weil  2013-07-16  1  -34/+3
| | | This was necessary when ceph-disk-udev didn't create the by-partuuid (and other) symlinks for us, but now it is fragile and error-prone. (It also appears to be broken on a certain customer RHEL VM.) See d7f7d613512fe39ec883e11d201793c75ee05db1.
| | |
| | | Instead, just use the by-partuuid symlinks that we spent all that ugly effort generating.
| | |
| | | Backport: cuttlefish
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | Reviewed-by: Dan Mick <dan.mick@inktank.com>
* | | PendingReleaseNotes: formatted ceph CLI output and ceph-rest-api  Dan Mick  2013-07-16  1  -0/+11
|/ /
| | Signed-off-by: Dan Mick <dan.mick@inktank.com>
* | mon: Monitor: StoreConverter: clearer debug message on 'needs_conversion()'  Joao Eduardo Luis  2013-07-16  1  -1/+1
| | The previous debug message outputted the function's name, as often our functions do. This was however a source of bewilderment, as users would see those in logs and think their stores would need conversion. Changing this message is trivial enough and it will make ceph users happier log readers.
| |
| | Backport: cuttlefish
| | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
| | Reviewed-by: Sage Weil <sage@inktank.com>
* | mon: Monitor: StoreConverter: sanitize 'store' pointer on init  Joao Eduardo Luis  2013-07-16  1  -0/+1
| | We are supposed to have umount'ed the store and set the pointer to NULL. We should not tolerate any other case on init().
| |
| | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
| | Reviewed-by: Sage Weil <sage@inktank.com>
* | mon: Monitor: do not reopen MonitorDBStore during conversion  Joao Eduardo Luis  2013-07-16  1  -1/+0
| | We already open the store on ceph_mon.cc, before we start the conversion. Given we are unable to reproduce this every time a conversion is triggered, we are led to believe that this causes a race in leveldb that will lead to 'store.db/LOCK' being locked upon the open this patch removes.
| |
| | Regardless, reopening the db here is pointless as we already did it when we reach Monitor::StoreConverter::convert().
| |
| | Fixes: #5640
| | Backport: cuttlefish
| | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
| | Reviewed-by: Sage Weil <sage@inktank.com>
* | Merge pull request #438 from yehudasa/wip-rgw-next  Gregory Farnum  2013-07-16  3  -24/+45
|\ \
| | | Fix an issue with bucket placements and with listing on new installations.
| | |
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | rgw: handle ENOENT when listing bucket metadata entries  Yehuda Sadeh  2013-07-15  1  -2/+12
| | | Just return success (with an empty list)
| | |
| | | Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
| * | rgw: fix bucket placement assignment  Yehuda Sadeh  2013-07-15  2  -22/+33
| | | When we set bucket.instance meta, we need to set the correct bucket placement to the bucket (according to the specific placement rule). However, it might be that bucket placement was never configured and we just go by the defaults, using the old legacy pools selection.
| | |
| | | Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
* | | OSD: add config option for peering_wq batch size  Samuel Just  2013-07-15  3  -5/+6
| | | Large peering_wq batch sizes may excessively delay peering messages resulting in unreasonably long peering. This may speed up peering.
| | |
| | | Backport: cuttlefish
| | | Related: #5084
| | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| | | Reviewed-by: Sage Weil <sage@inktank.com>
* | | mon: make report pure json  Sage Weil  2013-07-15  1  -8/+3
| | | Put the crc in the status string and drop the header and footer. If users want to capture it,
| | |
| | |   ceph report 2>&1 > foo.txt
| | |
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | Merge remote-tracking branch 'gh/wip-mon-report' into next  Sage Weil  2013-07-15  10  -1/+42
|\ \ \
| * | | mon: include some (basic) auth info in report  Sage Weil  2013-07-14  4  -0/+16
| | | | Nothing privileged!
| | | |
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | | mon: include paxos info in report  Sage Weil  2013-07-14  3  -0/+18
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | | mon: move quorum out of monmap  Sage Weil  2013-07-14  1  -1/+1
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | | mon: include service first_committed in report  Sage Weil  2013-07-14  4  -0/+7
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | | ceph: drop --threshold hack for 'pg dump_stuck'  Sage Weil  2013-07-15  4  -8/+5
| | | | We can live with the incompatibility here; the hack is currently not working anyway (see #5623).
| | | |
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | | Reviewed-by: Dan Mick <dan.mick@inktank.com>
* | | | msg/Pipe: be a bit more explicit about encoding outgoing messages  Sage Weil  2013-07-15  1  -2/+8
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | | messages/MClientReconnect: clear data when encoding  Sage Weil  2013-07-15  1  -0/+1
| | | | The MClientReconnect puts everything in the data payload portion of the message and nothing in the front portion. That means that if the message is resent (socket failure or something), the messenger thinks it hasn't been encoded yet (front empty) and reencodes, which means everything gets added (again) to the data portion.
| | | |
| | | | Decoding keeps going until it runs out of data, so the second copy means we decode garbage snap realms, leading to the crash in this bug.
| | | |
| | | | Clearing data each time around resolves the problem, although it does mean we do the encoding work multiple times. We could alternatively (or also) stick some data in the front portion of the payload (ignored), but that changes the wire protocol and I would rather not do that.
| | | |
| | | | Fixes: #4565
| | | | Backport: cuttlefish
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | | Reviewed-by: Greg Farnum <greg@inktank.com>
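The double-encoding problem described above can be reproduced in miniature. The Python sketch below is a toy model, not the Ceph messenger: the class, the `send()` and `decode()` helpers, the 32-bit little-endian encoding, and the `clear_before_encode` switch are all invented for illustration. It keeps only the three properties the message relies on: the front stays empty, the sender re-encodes whenever the front is empty, and the decoder reads until the data runs out.

```python
import struct


class ReconnectMessage:
    """Toy message that encodes everything into 'data' and leaves 'front' empty."""

    def __init__(self, realms, clear_before_encode=True):
        self.realms = list(realms)
        self.clear_before_encode = clear_before_encode
        self.front = b""
        self.data = bytearray()

    def encode_payload(self):
        if self.clear_before_encode:
            # The fix: start from an empty data section on every encode.
            self.data.clear()
        for realm in self.realms:
            self.data += struct.pack("<I", realm)


def send(msg):
    """Toy sender: an empty front is taken to mean 'not encoded yet'."""
    if not msg.front:
        msg.encode_payload()
    return bytes(msg.data)


def decode(data):
    """Toy receiver: keeps decoding 32-bit values until the data runs out."""
    return [struct.unpack_from("<I", data, off)[0] for off in range(0, len(data), 4)]
```

Sending a message twice with `clear_before_encode=False` makes the receiver see every realm twice; with clearing enabled the resend decodes to the same list as the first send, at the cost of encoding again.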
* | | | Merge pull request #436 from ceph/wip-mon-fixes  Sage Weil  2013-07-15  6  -74/+103
|\ \ \ \
| | | | | Wip mon fixes
| | | | |
| | | | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | | | mon: set forwarded message recv stamp  Sage Weil  2013-07-15  1  -0/+3
| | | | | Set it to the stamp of the MForward that carried us. One could argue we really want the original receive stamp on the origin, but that is not available to us, and this is better than nothing.
| | | | |
| | | | | In particular, this gives 'ceph log ...' commands a timestamp when they are forwarded via a peon. The stamp is still between when the request is sent and when it is committed/acked, so all is well from the client's perspective.
| | | | |
| | | | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | | | mon: drop win_election() _reset() kludge and strengthen assertions  Sage Weil  2013-07-15  1  -8/+3
| | | | | This is only there for the benefit of win_standalone_election(), but it doesn't need it, it clutters the code, and weakens our assertions. Now the only win_election() callers are win_standalone_election() (which is a single path that just did _reset()) and from the elector.
| | | | |
| | | | | Signed-off-by: Sage Weil <sage@inktank.com>