| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
* Ensure all generated fields are non-empty.
* Reduce/modify some data distributions.
|
|
|
|
|
|
| |
Save randomly generated data into files split into chunks
with a limited number of documents per chunk. This is done
to avoid the 2GB limit on JS files imposed by Node.js.
|
|
|
|
|
|
| |
* Added generation of random data with mixed data types
* Generation of random dates and doubles
* Some refactoring of the python generation framework wrt types
|
|
|
|
|
| |
* Add generation of random array data of integers and strings.
* Minor change to string generation to limit printable chars.
|
|
|
|
|
| |
* Added generation of random string data.
* Remove ce_generate_data_settings.py - no longer needed and used.
|
|
|
|
|
| |
* Visualize generated data via histograms stored as png files in stests/query_golden/libs/data.
* Added a couple of more mixed distributions.
|
|
|
|
| |
Address final review comments.
|
|
|
|
|
|
|
|
|
| |
Generate random data with integers. The approach is as follows:
- There is one collection for each different cardinality. All collections contain the same fields.
- Each field contains the data generated from a certain data distribution. The data could be anything - same type, mixed types, same mathematical distribution (e.g. normal), or a mixed distribution.
- The committed configuration file, and the corresponding data file are reduced to only two small collections. For actual experiments one needs to add more data sizes, and re-generate the data locally. This is done so that Evergreen tests can run fast, and to reduce the size of the git repository.
- All data is saved in a single JavaScript file: jstests/query_golden/libs/data/ce_accuracy_test.data, with a corresponding schema file jstests/query_golden/libs/data/ce_accuracy_test.schema.
- The data file is a JavaScript file that can be loaded directly inside a JS test. When loading this file, it creates a global variable dataSet. The reason is that this is the only way to load an external JSON file that doesn't need to install external tools in Evergreen.
|
|
* Extend the data generation Python framework for cost calibration to support data generation for CE testing as follows:
- the entry point is ce_generate_data.py,
- the configuration of the generated data is in ce_generate_data_settings.py,
- all collection data is exported into a single JSON file stored in 'jstests/query_golden/libs/data', and a schema file stored in the same directory
* Implement a JS data loader function that also creates all indexes specified in the schema file.
* Add a small JS test that shows how to load the generated JSON files into collections.
|