NOTE: Our datasets have moved from /mnt/training to another data repository.
The data used in these experiments is approximately 4.5 TB, and the technology behind
/mnt/training replicates that data to more than 30 distinct object stores across multiple
clouds, for a combined footprint of more than 120 TB.
To replicate these experiments (see the warning below), look for any references to /mnt/training or dbfs:/mnt/training in your code.
Delete the /mnt/training (or dbfs:/mnt/training) reference and replace it with wasbs://spark-ui-simulator@dbacademy.blob.core.windows.net.
There should be no need for additional changes to your code.
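If you have many paths to update, the prefix swap can be done with a small helper. The sketch below is hypothetical: the name to_new_location and the sample path are illustrative only and are not part of any Databricks API. It assumes that the whole dbfs:/mnt/training prefix, including the dbfs: scheme, is what gets replaced, and that the rest of each path is unchanged.

```python
# Hypothetical helper (not a Databricks API): rewrites legacy /mnt/training
# paths to the new wasbs:// location described above.
NEW_ROOT = "wasbs://spark-ui-simulator@dbacademy.blob.core.windows.net"

# Check the longer, dbfs:-qualified prefix first so its scheme is dropped too.
OLD_ROOTS = ("dbfs:/mnt/training", "/mnt/training")


def to_new_location(path: str) -> str:
    """Return `path` with a leading /mnt/training (or dbfs:/mnt/training)
    prefix replaced by NEW_ROOT; any other path is returned unchanged."""
    for old in OLD_ROOTS:
        # Match the whole prefix only, so e.g. /mnt/training2 is left alone.
        if path == old or path.startswith(old + "/"):
            return NEW_ROOT + path[len(old):]
    return path


if __name__ == "__main__":
    # Hypothetical legacy path, for illustration only.
    print(to_new_location("dbfs:/mnt/training/some-dataset/file.parquet"))
```

In a notebook you would pass the rewritten path to whatever reader your code already uses, for example spark.read.parquet(to_new_location(old_path)) if the data happens to be Parquet.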
WARNING: Please do not expect to replicate our results precisely. These new datasets are
located in a data center in Washington, US, and pulling this data from any other region will
incur significant, if not crippling, network overhead.
For example, execution time for West US 2 → West US can differ wildly from that for
West US 2 → Southeast Asia, or even West US 2 → North Europe. In addition, the
intranational and international movement of this data is subject to the random events
typical of our global network, making it nearly impossible to get repeatable results.