| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
This is to prevent the Erlang VM from attempting to open files while an
operator is undeleting a database. Obviously, the nodes must have been
configured for database recovery for this to be useful.
|
|
|
|
|
|
|
|
|
|
|
| |
The sleeps there are not enough when run in a constrained test environment.
Adjust timeouts to let the tests pass even when the CPU usage limit is set down
to 1% in my VirtualBox VM.
Also switch to using macro defines to make it look slightly cleaner.
Fixes #731
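As a rough illustration of the macro-define cleanup mentioned above (module
name, values, and the test body are made up, not taken from the actual patch):

    -module(timeout_macro_sketch).
    -include_lib("eunit/include/eunit.hrl").

    %% Tunable in one place instead of bare integers scattered through the tests
    -define(DELAY, 500).
    -define(TIMEOUT, 60).

    slow_environment_test_() ->
        {timeout, ?TIMEOUT, fun() ->
            timer:sleep(?DELAY),            % stand-in for the real work
            ?assert(true)
        end}.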
|
|
|
|
| |
Closes #729. See the ticket for additional information.
|
|
|
|
|
|
|
| |
The test was racy. Use the test_util:wait/1 function there, just like other
places such as couch_index_compaction_tests.
Fixes #724
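A minimal sketch of that polling pattern, assuming test_util:wait/1 keeps
calling the fun until it returns something other than the atom `wait` (the
`view_is_built/0` predicate is a hypothetical placeholder):

    wait_for_view() ->
        test_util:wait(fun() ->
            case view_is_built() of
                true  -> ok;    % condition met, stop waiting
                false -> wait   % not yet, poll again
            end
        end).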
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were a couple of issues with the previous ddoc_cache implementation
that made it possible to tip over the ddoc_cache_opener process. First,
there were a lot of messages flowing through a single gen_server. And
second, the cache relied on periodically evicting entries to ensure an
entry was not cached forever after it had changed on disk.
The new version makes two important changes. First, entries now have an
associated process that manages the cache entry. This process periodically
refreshes the entry, and if the entry has changed or no longer exists it
removes the entry from the cache.
The second major change is that the cache entry process directly mutates
the related ets table entries so that our performance is not dependent
on the speed of ets table mutations. Using a custom entry that does no
work, the cache can now sustain roughly one million operations a second
with twenty thousand clients fighting over a cache limited to one
thousand items. In production this means that cache performance will
likely be rate limited by other factors, such as loading design documents
from disk.
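A very rough sketch of the per-entry refresher idea described above (module,
table, and function names are illustrative only, not the actual ddoc_cache
code):

    -module(ddoc_cache_entry_sketch).
    -export([start/3]).

    start(Tab, Key, LoadFun) ->
        spawn_link(fun() -> loop(Tab, Key, LoadFun) end).

    loop(Tab, Key, LoadFun) ->
        timer:sleep(60000),                   % periodic refresh interval
        case LoadFun() of
            {ok, DDoc} ->
                ets:insert(Tab, {Key, DDoc}), % the entry process mutates ets directly
                loop(Tab, Key, LoadFun);
            {error, not_found} ->
                ets:delete(Tab, Key)          % ddoc is gone: evict the entry and exit
        end.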
|
|
|
|
|
| |
This removes an old merge artifact that caused the event notifications to be
sent twice per design document update.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replications checkpoint to _local documents identified by replication ids. If
replication ids change, replication tasks will not be able to find their
previous checkpoints and will rewind their change feeds back to 0. For a large
database that could mean reprocessing millions of documents.
The current version of the replication id generation algorithm hashes the full
URL of the source and target, their headers (including the authorization ones),
and a few other things. This means that when a user changes their password and
updates their replication document, the replication ids will change and all the
checkpoints will be invalidated.
Also, it is fairly common to upgrade services from http:// to https://.
Replication endpoint URIs then typically just change their scheme
accordingly. However, the scheme is part of the replication id calculation, so
the replication ids would then change as well.
Introduce a more robust replication id generation algorithm which can handle
some of those issues. The new algorithm (illustrated in the sketch below):
1. Excludes the source and target URI scheme from the replication id
calculation. As long as the host and other parts stay the same, changing the
scheme will have no effect on the replication id.
2. Ignores inline (specified in the URL) basic authentication passwords.
3. Ignores basic authentication passwords even if provided in the
basic authorization headers.
4. Is insensitive to switching between providing basic authentication
credentials inline or in a headers section. However, it includes the username
used in the basic auth in the calculation. It is a plausible scenario that
http://user1:pass1@a.host.com is really a different database than
http://user2:pass2@a.host.com
Issue #688
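An illustrative sketch of the normalization step, not the actual
couch_replicator code: drop the scheme and any password before the endpoint is
folded into the replication id hash.

    -module(rep_id_sketch).
    -export([normalize_endpoint/1]).

    %% "https://adm:pass@db.example.com:6984/db" and
    %% "http://adm:other@db.example.com:6984/db" normalize to the same term.
    normalize_endpoint(Url) ->
        {ok, {_Scheme, UserInfo, Host, Port, Path, Query}} = http_uri:parse(Url),
        User = case string:tokens(UserInfo, ":") of
            [U | _] -> U;   % keep the username, drop the password
            []      -> ""
        end,
        {User, Host, Port, Path ++ Query}.  % scheme intentionally left out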
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, if the replication id algorithm was updated, the replicator would
migrate checkpoint documents but keep them in memory. They would be written to
their respective databases only if checkpoints needed to be updated, which
doesn't happen unless the source database changes. As a result it was possible
for checkpoints to be lost. Here is how it could happen:
1. Checkpoints were created for the current (3) version of the replication id.
Assume the replication document contains some credentials that look like
'adm:pass', and the computed v3 replication id is "3abc...".
2. The replication id algorithm is updated to version 4. Version 4 ignores
passwords, such that changing authentication from 'adm:pass' to 'adm:pass2'
would not change the replication ids.
3. The server code is updated to version 4. The replicator looks for
checkpoints with the new version 4 id, which it calculates to be "4def...". It
can't find it, so it looks for v3, finds "3abc..." and decides to migrate it.
However, the migration only happens in memory. That is, the checkpoint document
is updated, but a checkpoint needs to happen for it to be written to disk.
4. There are no changes to the source db, so no checkpoints are forced to
happen.
5. The user hears that the new replicator version is improved, that passwords
shouldn't alter the replication ids, and that all the checkpoints are reused.
They update the replication document with their new credentials - adm:pass2.
6. The updated document with 'adm:pass2' credentials is processed by the
replicator. It computes the v4 replication id - "4def...". It's the same as
before since it wasn't affected by the pass -> pass2 change. That replication
checkpoint document is not found on either the source or the target. The
replicator then computes the v3 id to find the older version. However, v3 is
affected by the passwords, so there it computes "3ghi..." which is different
from the previous v3, which was "3abc...". It cannot find it. It computes v2
and checks, then v1, and eventually gives up without finding a checkpoint and
restarts the change feed from 0 again.
To fix it, update `find_replication_logs` to also write the migrated
replication checkpoint documents to their respective databases as soon as it
finds them; a rough sketch of the idea follows.
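A hedged sketch of the fix's idea (the record fields, the include path, and the
helper name are assumptions, not the actual couch_replicator code): once an
old-format checkpoint is found and migrated, write it back immediately instead
of waiting for the next checkpoint.

    -include_lib("couch/include/couch_db.hrl").

    migrate_and_persist(Db, #doc{} = OldLog, NewId) ->
        MigratedLog = OldLog#doc{id = NewId, revs = {0, []}},
        {ok, _Rev} = couch_db:update_doc(Db, MigratedLog, []),  % persist right away
        MigratedLog.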
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also decrease the number of rounds from 5 to 3.
With:
`VBoxManage bandwidthctl ${VM} set Limit --limit 100K`
it needed over 400 seconds to pass with 5 rounds.
Fixes #725
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when a ddoc is updated, the couch_index and couch_index_updater
processes corresponding to the previous version of the ddoc will still exist
until all indexing processing initiated by them is done.
When the ddoc of a big database is rapidly modified, this puts a lot
of unnecessary strain on database resources.
With this change, when a ddoc is updated:
* all couch_index processes for the previous version of the ddoc will be shut down
* all couch_index_updater processes linked to them will die as well
* all processes waiting for indexing activity to finish (waiters on
couch_index:get_status) will receive an immediate reply:
ddoc_updated. Interactive user requests (view queries) will get the response:
{404, <<"not_found">>, <<"Design document was updated or deleted.">>}
Check if there are ddocs that use the same couch_index process
before closing it on ddoc_updated (see the sketch below):
1. When opening an index, always add a record {DbName, {DDocId, Sig}} to ?BY_DB.
2. On ddoc_updated, check if there are other ddocs in ?BY_DB with the same Sig.
If there are none, stop the couch_index processes.
If there are others, only remove the {DbName, {DDocId, Sig}}
record from ?BY_DB for this ddoc.
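A sketch of that lookup (the ets table layout and the stop_index_processes/2
helper are assumed for illustration, not lifted from couch_index_server):

    -define(BY_DB, couch_index_by_db_sketch).  % assumed: a bag table keyed on DbName

    handle_ddoc_updated(DbName, DDocId, Sig) ->
        ets:delete_object(?BY_DB, {DbName, {DDocId, Sig}}),
        Others = [D || {_Db, {D, S}} <- ets:lookup(?BY_DB, DbName), S =:= Sig],
        case Others of
            [] -> stop_index_processes(DbName, Sig);  % no other ddoc shares this index
            _  -> ok                                  % still in use, keep it running
        end.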
|
| |
|
| |
|
|\
| |
| | |
Update Rebar file with Fauxton 1.1.13
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that sometimes a multipart/related PUT with a doc and
an attachment would fail with the connection being unexpectedly
closed before the client (ibrowse) gets to parse the 413 error
response.
That makes the test flaky, so it is disabled for now.
Issue #574
|
|
|
|
|
|
|
|
|
| |
In some cases, such as when the replicator flushes a document received from an
open_revs response, it explicitly sets the number of retries to 0 because
the context for that request might not be restartable and the retry should
happen at a higher level.
Issue #574
|
| |
|
| |
|
|
|
|
|
| |
Someone asked on Slack/IRC about this so I figured I'd clean it up a bit
to be more clear on how it works.
|
|\
| |
| | |
Fix/peruser test
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| | |
This reverts commit 4b63ba898562382e48a1899af5efa3cb77bda1d7.
|
| |
| |
| |
| |
| |
| | |
Several eunit tests tend to fail by timing out when run on
travis-ci. This change increases timeouts on the more commonly failing
tests, and improves test robustness.
|
| |
| |
| |
| |
| |
| |
| | |
This makes sure that we correctly synchronize with the process running
compaction before we perform our desired assertions.
Fixes #701
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The previous default timeout of 5 seconds was not enough when running in an
environment where disk access is severely throttled.
To add a timeout, the test function was changed into a test generator (see the
example below). That also made the `with` construct unnecessary.
Fixes #695
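For illustration, the shape of that change (the module, values, and the slow
operation here are made up):

    -module(test_gen_sketch).
    -include_lib("eunit/include/eunit.hrl").

    do_slow_thing() -> ok.   % stand-in for the real slow operation

    %% before: a plain test function, subject to eunit's 5 second default timeout
    %% slow_test() -> ?assertEqual(ok, do_slow_thing()).

    %% after: a test generator that carries its own timeout (in seconds)
    slow_test_() ->
        {timeout, 60, fun() ->
            ?assertEqual(ok, do_slow_thing())
        end}.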
|
|/ |
|
|\ |
|
| | |
|
|/
|
|
| |
https://github.com/apache/couchdb-config/pull/16
|
|
|
|
|
|
| |
Looks like an oversight in commit 789f75d.
Closes #703
|
| |
|
|
|
|
|
|
|
|
| |
The test was repeatedly creating/deleting the exact same DB
name, which is a recipe for disaster. Changed to use unique
DB names (see the sketch below).
Closes #705.
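An illustrative helper for the unique-name approach (not the actual test code):

    %% Generate a database name that cannot collide across repeated runs.
    unique_db_name(Prefix) ->
        Suffix = integer_to_list(erlang:unique_integer([positive])),
        list_to_binary(Prefix ++ "-" ++ Suffix).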
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we could potentially attempt to restart couch, immediately
attempt to see whether couch had restarted, and fail if the server wasn't
there (pre- or post-restart).
This change wraps all attempts to contact couch in restartServer()
with try blocks and simplifies the check-if-restarted logic.
Closes #669. May or may not help with #673.
|
|
|
|
|
|
| |
LEGAL-303
Closes #697
|
| |
|
|
|
|
|
|
|
|
| |
Could reproduce issue #633 by limiting disk throughput in a VBox
VM instance to about 5KB/s. Try to increase the timeouts to let it
handle such apparent slowdowns.
Fixed #633
|
| |
|
|
|
|
|
|
|
|
| |
Replication cancelation doesn't immediately update active tasks. Instead, use
the new `waitReplicationTaskStop(rep_id)` function to properly wait for the
task status.
Issue #634
|
|
|
|
|
|
|
|
|
|
|
| |
The previous version of this test relied on trying to bump into the
all_dbs_active error from the couch_server LRU. This proved to be rather
difficult to make reliable assertions about. In hindsight, all
we really care about is that the compactor holds a monitor against the
database, and then we can trust that couch_server will not evict anything
that is actively monitored (see the sketch below).
Fixes #680
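A hedged sketch of what such a check could look like (couch_db:get_pid/1 and
the overall shape are assumptions, not necessarily the code in the actual test):

    %% True if the compactor process currently monitors the database process.
    compactor_monitors_db(Db, CompactorPid) ->
        DbPid = couch_db:get_pid(Db),
        {monitored_by, Pids} = erlang:process_info(DbPid, monitored_by),
        lists:member(CompactorPid, Pids).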
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was used from a test only, and it wasn't reliable. Because the replicator
job delays initialization, the `State` would be either #rep_state{} or #rep{}.
If a replication job hasn't finished initializing, the state would be #rep{},
and a call like get_details, which matches the state against #rep_state{},
would fail with a badmatch error.
As seen in issue #686.
So remove the `get_details` call and let the test rely on task polling as all
the other tests do.
|
|\
| |
| | |
Use test_util:stop_config in mem3_util_test
|
|/
|
|
|
|
|
| |
config:stop is asynchronous, which causes test failures with an error
like the following:
{error,{already_started,<0.32662.3>}}
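Roughly, the synchronous stop that test_util:stop_config is described as
providing looks like this (implementation assumed, not copied from the helper):

    stop_config_sync() ->
        Pid = whereis(config),
        Ref = erlang:monitor(process, Pid),
        config:stop(),
        receive
            {'DOWN', Ref, process, Pid, _Reason} -> ok   % wait for the process to exit
        after 5000 ->
            erlang:error(config_stop_timeout)
        end.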
|
|\
| |
| | |
3367 fix test case
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| | |
We should use random names for databases. Otherwise the test fails with a
"database already exists" error. This commit uses a random name for the users
db and corrects the section name for the `authentication_db` setting.
COUCHDB-3367
|