Reply to: Use Pipe mode with CSV datasets for faster training on AWS

#1338

aluck
Participant

Amazon SageMaker supports two mechanisms for transferring training data: File mode and Pipe mode. In File mode, the training data is downloaded to an encrypted EBS volume attached to the training instance before training starts. In Pipe mode, by contrast, the input data is streamed directly to the training algorithm while it is running. This continuous streaming has several significant advantages. First, the startup time of a training job becomes independent of the size of the input data, so jobs start much sooner, especially on gigabyte- and petabyte-scale datasets. Second, you don't have to pay for a large disk volume just to hold the downloaded dataset. Finally, if your training algorithm is I/O-bound, the highly concurrent, high-throughput reading mechanism used by Pipe mode can significantly speed up model training.
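
To make the difference concrete, here is a minimal sketch in Python of what requesting Pipe mode for a CSV channel could look like, plus a reader that consumes rows as they arrive instead of waiting for a full download. The helper names `build_csv_channel` and `stream_csv_rows`, the channel name, and the bucket path are hypothetical. The dictionary follows the shape of an `InputDataConfig` entry in the SageMaker CreateTrainingJob request, and the FIFO path in the usage note assumes the `/opt/ml/input/data/<channel>_<epoch>` naming used inside the training container, so check both against the current SageMaker documentation before relying on them.

```python
import csv
from typing import Iterator, List


def build_csv_channel(name: str, s3_uri: str, input_mode: str = "Pipe") -> dict:
    """Build one InputDataConfig entry for a CreateTrainingJob request.

    Hypothetical helper: it only assembles a plain dict and makes no AWS call.
    """
    # Only the two modes discussed in this post are accepted here.
    if input_mode not in ("File", "Pipe"):
        raise ValueError(f"unsupported input mode: {input_mode}")
    return {
        "ChannelName": name,
        "InputMode": input_mode,      # "Pipe" streams, "File" downloads first
        "ContentType": "text/csv",    # tells the algorithm the data is CSV
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


def stream_csv_rows(path: str) -> Iterator[List[str]]:
    """Yield CSV rows one at a time from a file or named pipe.

    Rows are parsed lazily, so memory use stays flat however large the
    input is, which is what makes streaming from a FIFO practical.
    """
    with open(path, newline="") as stream:
        yield from csv.reader(stream)
```

Inside a Pipe mode training container, one would expect to call something like `stream_csv_rows("/opt/ml/input/data/train_0")` for the first epoch of a channel named `train` (path assumed, as noted above). The same function works on an ordinary local CSV file, which is handy for testing the parsing logic before launching a job.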
