delta/mongo.git - github.com: mongodb/mongo.git

	Commit message	Author	Age	Files	Lines
*	SERVER-73030 Generate random array data for CE testing	Timour Katchaounov	2023-02-22	1	-0/+1
	* Ensure all generated fields are non-empty.
	* Reduce/modify some data distributions.
*	SERVER-73979 Partition large random data files into chunks	Timour Katchaounov	2023-02-17	1	-22/+58
	Save randomly generated data into multiple files (chunks), each holding a limited number of documents. This avoids the 2GB limit on JS files imposed by Node.js.
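	A minimal sketch of the chunking idea described above, not the code from the commit. The names `chunked` and `write_chunks`, the `<base_name>_<index>` file naming, and the plain JSON output are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Iterator, List


def chunked(docs: List[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of docs with at most chunk_size documents each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(docs), chunk_size):
        yield docs[start:start + chunk_size]


def write_chunks(docs: List[dict], out_dir: Path, base_name: str,
                 chunk_size: int) -> List[Path]:
    """Write each chunk to its own file and return the paths in order.

    The file naming scheme here is hypothetical.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunked(docs, chunk_size)):
        path = out_dir / f"{base_name}_{index}"
        path.write_text(json.dumps(chunk))
        paths.append(path)
    return paths
```

	Bounding the number of documents per file keeps every file small without having to measure its serialized size up front.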
*	SERVER-73031 Generate random data with mixed data types	Timour Katchaounov	2023-02-15	1	-1/+7
	* Add generation of random data with mixed data types.
	* Add generation of random dates and doubles.
	* Refactor the Python generation framework with respect to types.
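	One way to picture a field with mixed data types is to pick a per-type generator at random for each value. This is a sketch under assumptions: the function names, value ranges, and weights below are hypothetical and do not come from the commit.

```python
import random
from datetime import datetime, timedelta
from typing import Callable, List, Sequence


def random_int(rng: random.Random) -> int:
    return rng.randint(0, 1000)


def random_double(rng: random.Random) -> float:
    return rng.uniform(0.0, 1000.0)


def random_date(rng: random.Random) -> datetime:
    # Hypothetical range: any second within one year after 2000-01-01.
    start = datetime(2000, 1, 1)
    return start + timedelta(seconds=rng.randrange(0, 365 * 24 * 3600))


def random_mixed(rng: random.Random,
                 generators: Sequence[Callable[[random.Random], object]],
                 weights: Sequence[float],
                 n: int) -> List[object]:
    """Return n values, each produced by a generator chosen with the given weights."""
    chosen = rng.choices(generators, weights=weights, k=n)
    return [gen(rng) for gen in chosen]
```

	Passing a seeded `random.Random` instance keeps the generated data reproducible across runs.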
*	SERVER-73030 Generate random array data for CE testing	Timour Katchaounov	2023-01-27	1	-5/+10
	* Add generation of random array data of integers and strings.
	* Minor change to string generation to limit printable chars.
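	A rough illustration of array generation with a restricted character set. The alphabet (ASCII letters and digits), the minimum length of 1, and all function names are assumptions for this sketch, not details taken from the commit.

```python
import random
import string
from typing import Callable, List

# Hypothetical restricted alphabet: letters and digits only.
ALPHABET = string.ascii_letters + string.digits


def random_string(rng: random.Random, max_len: int) -> str:
    """Return a string of 1..max_len characters drawn from ALPHABET."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_array(rng: random.Random,
                 gen_elem: Callable[[random.Random], object],
                 max_len: int) -> List[object]:
    """Return a list of 1..max_len elements produced by gen_elem."""
    return [gen_elem(rng) for _ in range(rng.randint(1, max_len))]
```

	For example, `random_array(rng, lambda r: random_string(r, 8), 5)` yields an array of up to five short strings, and `random_array(rng, lambda r: r.randint(0, 100), 5)` an array of integers.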
*	SERVER-72662 Generate random string data for CE	Timour Katchaounov	2023-01-20	1	-13/+27
	* Add generation of random string data.
	* Remove ce_generate_data_settings.py, which is no longer needed or used.
*	SERVER-72663 Visualize distribution of generated data	Timour Katchaounov	2023-01-17	1	-0/+16
	* Visualize generated data via histograms stored as png files in jstests/query_golden/libs/data.
	* Add a couple more mixed distributions.
*	SERVER-72236 Generate random integer data for CE	Timour Katchaounov	2023-01-12	1	-9/+4
	Address final review comments.
*	SERVER-72236 Generate random integer data for CE	Timour Katchaounov	2023-01-10	1	-4/+15
	Generate random data with integers. The approach is as follows:
	- There is one collection for each different cardinality. All collections contain the same fields.
	- Each field contains data generated from a certain data distribution. The data could be anything: same type, mixed types, same mathematical distribution (e.g. normal), or a mixed distribution.
	- The committed configuration file and the corresponding data file are reduced to only two small collections. For actual experiments one needs to add more data sizes and re-generate the data locally. This is done so that Evergreen tests can run fast, and to reduce the size of the git repository.
	- All data is saved in a single JavaScript file, jstests/query_golden/libs/data/ce_accuracy_test.data, with a corresponding schema file, jstests/query_golden/libs/data/ce_accuracy_test.schema.
	- The data file is a JavaScript file that can be loaded directly inside a JS test. Loading this file creates a global variable dataSet. This is the only way to load external JSON data without installing additional tools in Evergreen.
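	The "data file as JavaScript" idea can be sketched as follows: serialize the collections as JSON and prefix the result with an assignment to dataSet, so that evaluating the file defines the variable. The function name and the layout of the `collections` argument are hypothetical, and the bare assignment (no `var`/`let`/`const`) is an assumption about how the global is created, not something the commit message states.

```python
import json
from typing import List


def render_js_data_file(collections: List[dict]) -> str:
    """Return JavaScript source that assigns the collections to a global dataSet.

    JSON is valid JavaScript expression syntax, so the serialized data can be
    embedded directly on the right-hand side of the assignment.
    """
    return "dataSet = " + json.dumps(collections, indent=1) + ";\n"
```

	A generator script would write the returned string to the data file, and a JS test would then evaluate that file before reading dataSet.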
*	SERVER-72036 Implement data generation and loading into JS CE accuracy tests	Timour Katchaounov	2022-12-17	1	-0/+140
	* Extend the data generation Python framework for cost calibration to support data generation for CE testing as follows:
	  - the entry point is ce_generate_data.py,
	  - the configuration of the generated data is in ce_generate_data_settings.py,
	  - all collection data is exported into a single JSON file stored in 'jstests/query_golden/libs/data', and a schema file stored in the same directory.
	* Implement a JS data loader function that also creates all indexes specified in the schema file.
	* Add a small JS test that shows how to load the generated JSON files into collections.