Recent studies show that LSTM-based neural optimizers are competitive with state-of-the-art hand-designed optimization methods for short horizons. Existing neural optimizers learn how to update the optimizee parameters, namely, predicting the product of learning rates and gradients directly and we suspect it's the reason why the training task becomes unnecessarily difficult. Instead, we train a neural optimizer to only control the learning rates of another optimizer using gradients of the training loss with respect to the learning rates. Furthermore, with the assumption that learning rates tend to remain unchanged over a certain number of iterations, the neural optimizer is only allowed to propose learning rates every $S$ iterations where the learning rates are fixed during these $S$ iterations and this enables it to generalize to longer horizons. The optimizee is trained by Adam on MNIST, and our neural optimizer learns to tune the learning rates for the Adam. After 5 meta-iterations, another optimizee trained by Adam whose learning rates are tuned by the learned but frozen neural optimizer, outperforms those trained by existing hand-designed and learned neural optimizers in terms of convergence rate and final accuracy for long horizons across several datasets.
The back-propagation (BP) algorithm has been considered the de-facto method for training deep neural networks. It back-propagates errors from the output layer to the hidden layers in an exact manner using the transpose of the feedforward weights. However, it has been argued that this is not biologically plausible because back-propagating error signals with the exact incoming weights is not considered possible in biological neural systems. In this work, we propose a biologically plausible paradigm of neural architecture based on related literature in neuroscience and asymmetric BP-like methods. Specifically, we propose two bidirectional learning algorithms with trainable feedforward and feedback weights. The feedforward weights are used to relay activations from the inputs to target outputs. The feedback weights pass the error signals from the output layer to the hidden layers. Different from other asymmetric BP-like methods, the feedback weights are also plastic in our framework and are trained to approximate the forward activations. Preliminary results show that our models outperform other asymmetric BP-like methods on the MNIST and the CIFAR-10 datasets.
The performance of deep neural networks is well-known to be sensitive to the setting of their hyperparameters. Recent advances in reverse-mode automatic differentiation allow for optimizing hyperparameters with gradients. The standard way of computing these gradients involves a forward and backward pass of computations. However, the backward pass usually needs to consume unaffordable memory to store all the intermediate variables to exactly reverse the forward training procedure. In this work we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path, through which we approximately reverse the training trajectory. Experiments on several image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory compared to state-of-the-art methods for optimizing hyperparameters with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks.
Predicting the affective valence of unknown multiword expressions is key for concept-level sentiment analysis. ActiveSpace 2 is a vector space model, built by means of random projection, that allows for reasoning by analogy on natural language concepts. By reducing the dimensionality of affective common-sense knowledge, the model allows semantic features associated with concepts to be generalized and, hence, allows concepts to be intuitively clustered according to their semantic and affective relatedness. Such an affective intuition (so called because it does not rely on explicit features, but rather on implicit analogies) enables the inference of emotions and polarity conveyed by multi-word expressions, thus achieving effcient concept-level sentiment analysis.
MATLAB source code can be downloaded here.
Ant Colony Optimization is computationallyexpensive when it comes to complex problems. This paper presents andimplements a parallel MAX-MIN Ant System (MMAS) basedon a GPU+CPU hardware platform under the MATLABenvironment with Jacket toolbox to solve Traveling SalesmanProblem (TSP). The key idea is to let all ants share only onepseudorandom number matrix, one pheromone matrix, onetaboo matrix, and one probability matrix. We also use a newselection approach based on those matrices, named AIR (All-In-Roulette). The main contribution of this paper is thedescription of how to design parallel MMAS based on thoseideas and the comparison to the relevant sequential version. Thecomputational results show that our parallel algorithm is muchmore efficient than the sequential version.