January 8, 2020 - 17:18 #1336
Amazon SageMaker built-in algorithms now support Pipe mode for fetching datasets in CSV format from Amazon Simple Storage Service (S3) into Amazon SageMaker while training machine learning (ML) models.
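As a rough illustration of what selecting Pipe mode with CSV input looks like, the sketch below builds a request body shaped like the SageMaker `CreateTrainingJob` API call. The function name, channel name, instance type, volume size, and runtime limit are assumptions chosen for the example, and the algorithm-specific hyperparameters a real job needs are left out. Nothing is sent to AWS; the code only constructs a dictionary.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               input_mode="Pipe"):
    """Build a CreateTrainingJob-style request for a CSV dataset.

    Hypothetical helper for illustration only. The instance settings
    below are placeholders, not recommendations.
    """
    if input_mode not in ("Pipe", "File"):
        raise ValueError("input_mode must be 'Pipe' or 'File'")

    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # URI of the built-in algorithm container (for example Linear Learner).
            "TrainingImage": image_uri,
            # "Pipe" streams from S3; "File" downloads to the volume first.
            "TrainingInputMode": input_mode,
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",      # assumed channel name
                "ContentType": "text/csv",   # CSV input, as in the announcement
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.c5.xlarge",  # placeholder instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,            # placeholder volume size
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("sagemaker").create_training_job(**request)`, after adding the `HyperParameters` required by the chosen algorithm.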
With Pipe input mode, the data is streamed directly to the algorithm container while model training is in progress. This is unlike File mode, which downloads the data to the local Amazon Elastic Block Store (EBS) volume before training starts. With Pipe mode, your training jobs start faster, use significantly less disk space, and finish sooner, which reduces your overall cost to train machine learning models. In some of our internal benchmarks that trained a regression model with the Amazon SageMaker Linear Learner algorithm on a 3.9 GB CSV dataset, using Pipe mode instead of File mode reduced the overall training time by up to 40 percent.
January 8, 2020 - 17:26 #1337
The following Amazon SageMaker built-in algorithms now have full support for training with datasets in CSV format using Pipe input mode:
Principal Component Analysis (PCA)
Linear Learner (Classification and Regression)
Neural Topic Model (NTM)
Random Cut Forest
January 8, 2020 - 17:28 #1338
Amazon SageMaker supports two mechanisms for transferring training data: File mode and Pipe mode. In File mode, the training data is first downloaded to an encrypted EBS volume attached to the training instance, and training begins only after the download completes. In Pipe mode, by contrast, the input data is streamed directly to the training algorithm while it is running. This continuous streaming has a few significant advantages. First, the startup time of a training job becomes independent of the size of the input data, so jobs start much sooner, especially when training on gigabyte- and petabyte-scale datasets. Second, you don’t have to pay for a large disk volume just to hold the downloaded dataset. Finally, if your training algorithm is I/O-bound, the highly concurrent, high-throughput reading mechanism that Pipe mode uses can significantly speed up model training.
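The difference between the two modes can be pictured with a small sketch of streaming consumption: rows are processed one at a time as they arrive, so nothing has to be fully downloaded or held in memory before work begins. This is only an analogy under stated assumptions, not how the built-in algorithms are implemented. The function names are hypothetical, and an in-memory stream stands in for the data source. In a custom container, Pipe mode data is typically read from a named pipe under `/opt/ml/input/data/`, which the built-in algorithms handle internally.

```python
import csv
import io
from typing import IO, Iterator, List


def stream_csv_rows(stream: IO[str]) -> Iterator[List[float]]:
    """Yield numeric rows one at a time from any text stream."""
    for row in csv.reader(stream):
        if row:  # skip blank lines
            yield [float(value) for value in row]


def running_column_means(stream: IO[str]) -> List[float]:
    """Compute per-column means in a single pass over the stream.

    Only the running sums are kept, so memory use does not grow
    with the size of the dataset. Assumes every row has the same
    number of columns.
    """
    count = 0
    sums: List[float] = []
    for row in stream_csv_rows(stream):
        if not sums:
            sums = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[i] += value
        count += 1
    return [s / count for s in sums] if count else []


if __name__ == "__main__":
    # An in-memory stream stands in for data arriving from S3.
    sample = io.StringIO("1,2\n3,4\n")
    print(running_column_means(sample))
```

Because the consumer starts work on the first row it receives, its startup cost does not depend on how many rows follow, which mirrors the first advantage described above.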