diff options
author | Steven Danna <steve@chef.io> | 2016-12-13 17:57:08 +0000 |
---|---|---|
committer | Steven Danna <steve@chef.io> | 2016-12-13 17:57:08 +0000 |
commit | c20f0740ce2fce5f3968e9f18a2ac8ac6c9b779f (patch) | |
tree | 3eb30cfe83a866ae6dc2e1718647367ae7fbf20c /lib/chef/chef_fs | |
parent | c56cf68d67e01c0923009b58e01d24ce0efe537d (diff) | |
download | chef-c20f0740ce2fce5f3968e9f18a2ac8ac6c9b779f.tar.gz |
[cheffs] Don't iterate parent object on exist? callssd/chef-fs-n-squared
Previously, this code determined if an object existed by doing the
following:
parent.children.any? { |child| child.api_child_name == api_child_name }
For organizations and object types with a small number of total objects,
this wasn't problematic; however, it has very bad worst-case behavior.
For example, if a user was attempting to restore an organization with
30k client records to an empty organization, each client upload would do
the following:
1. Run GET /clients
2. Iterate the list returned in (1) normalizing the clients names along
the way.
3. Upload the client via POST when exist? returned false.
When the clients don't exist, this means step (2) will always iterate
over every member returned in step (1). By the time you get into the
1000s of clients, this iteration dominates the running time of the
process. For instance, consider the following ruby profile data:
Measure Mode: wall_time
Thread ID: 16939380
Fiber ID: 22163920
Total: 109.860468
Sort by: self_time
%self total self wait child calls name
26.83 29.475 29.475 0.000 0.000 500 <Class::IO>#select
8.60 38.876 9.446 0.000 29.430 464125 <Class::Chef::ChefFS::PathUtils>#join
7.02 12.226 7.717 0.000 4.509 928750 Chef::ChefFS::FileSystem::ChefServer::RestListEntry#api_child_name
6.79 7.459 7.459 0.000 0.000 930250 String#gsub
5.31 5.834 5.834 0.000 0.000 2320625 <Class::Chef::ChefFS::PathUtils>#regexp_path_separator
3.33 3.654 3.654 0.000 0.000 500 OpenSSL::X509::Store#set_default_paths
3.28 42.482 3.606 0.000 38.876 464125 Chef::ChefFS::FileSystem::BaseFSObject#initialize
2.95 3.244 3.244 0.000 0.000 930250 <Class::File>#extname
2.29 48.421 2.513 0.000 45.908 483375 *Class#new
38 wall clock seconds spent in PathUtils.join (part of the name
normalization during the list walk)
Note, an alternative might be to skip the exist? check completely,
opting to blindly POST and then rescuing the 409 and retry with a PUT.
Experimentation shows that PathUtils.join can also be improved
substantially; however, I'll leave those for a follow-up PR.
Signed-off-by: Steven Danna <steve@chef.io>
Diffstat (limited to 'lib/chef/chef_fs')
-rw-r--r-- | lib/chef/chef_fs/file_system/chef_server/rest_list_entry.rb | 9 |
1 files changed, 8 insertions, 1 deletions
diff --git a/lib/chef/chef_fs/file_system/chef_server/rest_list_entry.rb b/lib/chef/chef_fs/file_system/chef_server/rest_list_entry.rb index b8ec5f8524..eb793d5e87 100644 --- a/lib/chef/chef_fs/file_system/chef_server/rest_list_entry.rb +++ b/lib/chef/chef_fs/file_system/chef_server/rest_list_entry.rb @@ -69,7 +69,14 @@ class Chef def exists? if @exists.nil? begin - @exists = parent.children.any? { |child| child.api_child_name == api_child_name } + rest.get(api_path) + @exists = true + rescue Net::HTTPServerException => e + if e.response.code == "404" + @exists = false + else + raise + end rescue Chef::ChefFS::FileSystem::NotFoundError @exists = false end |