Predicting Training Time Without Training

Uploaded by: hospitable_26882 File: .pdf Size: 2.02 MB Upload time: 2021-01-24 08:55:49

We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by solving a low-dimensional Stochastic Differential Equation (SDE) in function space. Using this result, we are able to predict the time it takes for Stochastic Gradient Descent (SGD) to fine-tune a model to a given loss without having to perform any training. In our experiments, we are able to predict the training time of a ResNet within a 20% error margin on a variety of datasets and hyper-parameters, at a 30- to 45-fold reduction in cost compared to actual training. We also discuss how to further reduce the computational and memory cost of our method. In particular, we show that by exploiting the spectral properties of the gradients' matrix it is possible to predict training time on a large dataset while processing only a subset of the samples.
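As a rough illustration of the linearization idea, the sketch below covers only a noise-free special case: full-batch gradient descent on a linearized model with a mean-squared-error loss. Under those assumptions (ours, not stated in the abstract), the residual r_t = f_t - y evolves as r_{t+1} = (I - lr * J J^T / N) r_t, so the loss at any step follows from the eigendecomposition of the Gram matrix of the gradients, with no training loop. The function names, the MSE loss, and the deterministic setting are hypothetical choices for illustration; the method described in the abstract solves an SDE that also models SGD noise, which this sketch omits.

```python
import numpy as np


def predict_steps_to_loss(jacobian, residual0, lr, target_loss, max_steps=100_000):
    """Predict the first step at which the MSE loss reaches target_loss.

    Assumes a linearized model trained by full-batch gradient descent.
    jacobian: (N, D) gradients of the N outputs w.r.t. the D weights.
    residual0: (N,) initial residual f(w0) - y.
    Returns the step count, or None if not reached within max_steps.
    """
    n = residual0.shape[0]
    gram = jacobian @ jacobian.T / n            # (N, N) Gram matrix of the gradients
    eigvals, eigvecs = np.linalg.eigh(gram)     # symmetric eigendecomposition
    coeffs = eigvecs.T @ residual0              # residual in the eigenbasis
    for t in range(max_steps + 1):
        # Each eigen-direction decays independently by (1 - lr * lambda_k)^t.
        decay = (1.0 - lr * eigvals) ** (2 * t)
        loss = 0.5 * np.sum(decay * coeffs ** 2) / n
        if loss <= target_loss:
            return t
    return None


def simulate_linear_gd(jacobian, residual0, lr, target_loss, max_steps=100_000):
    """Reference: actually run gradient descent on the linearized model."""
    n = residual0.shape[0]
    residual = residual0.copy()
    for t in range(max_steps + 1):
        loss = 0.5 * residual @ residual / n
        if loss <= target_loss:
            return t
        weight_step = -lr * jacobian.T @ residual / n   # gradient step in weight space
        residual = residual + jacobian @ weight_step    # linearized change in outputs
    return None
```

On a small random problem the closed-form prediction can be compared against `simulate_linear_gd`; in this idealized setting the two step counts should agree up to floating-point effects at the threshold.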
