
## Katyusha Research Paper

### Research Areas

Machine Learning, Optimization, Algorithms

### Research Interests

I work on the foundations of machine learning and optimization, and apply them to subjects across deep learning, theoretical computer science, and statistics. I am also interested in the mathematical modeling of physical, social, economic, and biological systems.

• ICML
• NIPS
• STOC
• FOCS
• SODA
• ICALP
• SoCG
• WSDM
• ICDM
• CIKM
• EC
• ITCS
• WINE
• POPL

### Acknowledgements

I would love to thank my wonderful collaborators, without whom the papers below would never have been accomplished. In inverse chronological order:

Sébastien Bubeck (1), Ankit Garg (1), Wei Hu (1), Avi Wigderson (2), Rafael Oliveira (2), Yining Wang (2), Aarti Singh (2), Naman Agarwal (1), Brian Bullins (1), Tengyu Ma (1), Yuanzhi Li (12), Elad Hazan (4), Karthik Sridharan (1), Yang Yuan (4), Peter Richtárik (1), Zheng Qu (1), Zhenyu Liao (2), Yin Tat Lee (1), Lorenzo Orecchia (8), Aditya Bhaskara (1), Ilya Razenshteyn (1), Rati Gelashvili (2), Nir Shavit (1), Aaron Sidford (1), Silvio Lattanzi (2), Vahab Mirrokni (2), Jonathan Kelner (2), Sasa Misailovic (1), Martin Rinard (1), Alessandro Chiesa (4), Silvio Micali (5), Wei Chen (1), Pinyan Lu (2), Xiaorui Sun (2), Bo Tang (1), Yajun Wang (2), Zheng Chen (5), Tom Minka (1), Haixun Wang (1), Dong Wang (1), Chenguang Zhu (5), Gang Wang (4), Weizhu Chen (5).

## Working Papers

### Remarks

• My middle name "Allen" was legally merged into my family name in February 2014, becoming "Allen-Zhu".
• Authors marked with * are for equal contribution.
"Natasha 2: Faster Non-Convex Optimization Than SGD" (3rd edition). By: Zeyuan Allen-Zhu. August 2017 (revised February 2018).
We design a stochastic algorithm to train any smooth neural network to $$\varepsilon$$-approximate local minima, using $$O(\varepsilon^{-3.25})$$ backpropagations. The best previous result was essentially $$O(\varepsilon^{-4})$$, by SGD. More broadly, our algorithm finds $$\varepsilon$$-approximate local minima of any smooth non-convex function at rate $$O(\varepsilon^{-3.25})$$, with only oracle access to stochastic gradients.

"Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits". By: Zeyuan Allen-Zhu*, Sébastien Bubeck*, Yuanzhi Li*. February 2018.
Regret bounds in online learning compare the player's performance to $$L^*$$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $$T$$. The more refined concept of a first-order regret bound replaces this with a scaling $$\sqrt{L^*}$$, which may be much smaller than $$\sqrt{T}$$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full-information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.

"Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization". By: Zeyuan Allen-Zhu. February 2018.
The problem of minimizing sum-of-nonconvex functions (i.e., convex functions that are averages of non-convex ones) is becoming increasingly important in machine learning, and is the core machinery for PCA, SVD, regularized Newton's method, accelerated non-convex optimization, and more. We show how to provably obtain an accelerated stochastic algorithm for minimizing sum-of-nonconvex functions, by adding one additional line to the well-known SVRG method. This line corresponds to momentum, and shows how to directly apply momentum to the finite-sum stochastic minimization of sum-of-nonconvex functions. As a side result, our method enjoys linear parallel speed-up using mini-batches.

"How To Make the Gradients Small Stochastically". By: Zeyuan Allen-Zhu. January 2018.
In convex stochastic optimization, convergence rates in terms of minimizing the objective are well established. However, in terms of making the gradients small, the best known convergence rate was $$O(\varepsilon^{-8/3})$$, and it was left open how to improve it. In this paper, we improve this rate to $$\tilde{O}(\varepsilon^{-2})$$, which is optimal up to log factors.

"Neon2: Finding Local Minima via First-Order Oracles" (2nd edition). By: Zeyuan Allen-Zhu*, Yuanzhi Li*. November 2017 (revised February 2018).
We propose a reduction for non-convex optimization that can (1) turn a stationary-point-finding algorithm into a local-minimum-finding one, and (2) replace Hessian-vector product computations with gradient computations only. It works in both the stochastic and the deterministic settings, without hurting the algorithm's performance. As applications, our reduction turns Natasha 2 into a first-order method without hurting its performance. It also converts SGD, GD, SCSG, and SVRG into local-minimum-finding algorithms that outperform some of the best known results.

"Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach".
By: Zeyuan Allen-Zhu*, Yuanzhi Li*, Aarti Singh*, Yining Wang*. November 2017 (submitted to a slow-reviewing journal).
The experimental design problem concerns the selection of k points from a potentially large design pool of p-dimensional vectors, so as to maximize the statistical efficiency of regression on the selected k design points. Statistical efficiency is measured by *optimality criteria*, including A(verage), D(eterminant), T(race), E(igen), V(ariance) and G-optimality. Except for T-optimality, exact optimization is NP-hard. We propose a polynomial-time regret minimization framework that achieves a $$(1+\varepsilon)$$ approximation with only $$O(p/\varepsilon^2)$$ design points, for all the optimality criteria above. In contrast, to the best of our knowledge, before our work no polynomial-time algorithm achieved a $$(1+\varepsilon)$$ approximation for D/E/G-optimality, and the best polynomial-time algorithm achieving a $$(1+\varepsilon)$$ approximation for A/V-optimality required $$k=\Omega(p^2/\varepsilon)$$ design points.
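To make the experimental design problem concrete, here is a minimal sketch of a classical greedy baseline for D-optimality: pick k design points one at a time, each time maximizing the log-determinant of the (ridge-regularized) information matrix. This is an illustrative baseline only, not the regret-minimization algorithm of the paper; the function name `greedy_d_optimal` and the ridge term are assumptions made for this sketch.

```python
import numpy as np

def greedy_d_optimal(X, k, ridge=1e-6):
    """Greedily select k rows of X (n x p) to maximize log det of the
    regularized information matrix M = ridge*I + sum of x x^T over the
    chosen rows. A classical baseline, not the paper's method."""
    n, p = X.shape
    chosen = []
    M = ridge * np.eye(p)  # small ridge keeps M invertible at the start
    for _ in range(k):
        M_inv = np.linalg.inv(M)
        best_i, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            x = X[i]
            # matrix determinant lemma:
            # log det(M + x x^T) = log det(M) + log(1 + x^T M^{-1} x)
            gain = np.log1p(x @ M_inv @ x)
            if gain > best_gain:
                best_i, best_gain = i, gain
        chosen.append(best_i)
        M = M + np.outer(X[best_i], X[best_i])
    return chosen

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))  # 50 candidate design points in R^4
sel = greedy_d_optimal(X, 8)
print(len(sel), len(set(sel)))  # 8 distinct design points selected
```

The rank-one determinant update makes each greedy step cheap; the paper's contribution is replacing this kind of heuristic with a framework that gives provable $$(1+\varepsilon)$$ guarantees across all the listed optimality criteria.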

## Published Papers (All Peer-Reviewed)

"Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing". By: Zeyuan Allen-Zhu*, Ankit Garg*, Yuanzhi Li*, Rafael Oliveira*, Avi Wigderson*. STOC 2018: Symposium on Theory of Computing.
We propose a new second-order method for geodesically convex optimization on the natural hyperbolic metric over positive definite matrices. We apply it to solve the operator scaling problem in time polynomial in the input size and logarithmic in the error. This is an exponential improvement over previous algorithms, which were analyzed in the usual Euclidean, commutative metric (for which the above problem is not convex). As a consequence, we solve the equivalence problem for the left-right group action underlying the operator scaling problem, and derandomize a new class of Polynomial Identity Testing (PIT) problems, which was the original motivation for studying operator scaling.

"Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls". By: Zeyuan Allen-Zhu*, Elad Hazan*, Wei Hu*, Yuanzhi Li*. NIPS 2017: Neural Information Processing Systems (spotlight).
We propose a rank-k variant of the classical Frank-Wolfe algorithm to solve convex optimization over a trace-norm ball. Our algorithm replaces the top singular-vector computation (1-SVD) in Frank-Wolfe with a top-k singular-vector computation (k-SVD), which can be done by repeatedly applying 1-SVD k times. Our algorithm has a linear convergence rate when the objective function is smooth and strongly convex, and the optimal solution has rank at most k. This improves the convergence rate and the total complexity of the Frank-Wolfe method and its variants.

"First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate". By: Zeyuan Allen-Zhu*, Yuanzhi Li*. FOCS 2017: Symposium on Foundations of Computer Science.
We study online principal component analysis (PCA), that is, finding the top k eigenvectors of a $$d\times d$$ hidden matrix $$\mathbf{\Sigma}$$ from online data samples drawn from the covariance matrix $$\mathbf{\Sigma}$$. We provide global convergence for Oja's algorithm, which is popular in practice but lacked theoretical understanding for $$k>1$$. We also provide a modified variant $$\mathsf{Oja}^{++}$$ that runs even faster than Oja's. Our results match the information-theoretic lower bound in terms of the dependency on error, on eigengap, on rank $$k$$, and on dimension $$d$$, up to poly-log factors. In addition, our convergence rate can be made gap-free, that is, proportional to the approximation error and independent of the eigengap. In contrast, for general rank k, before our work (1) it was open to design any algorithm with an efficient global convergence rate; and (2) it was open to design any algorithm with an (even local) gap-free convergence rate.

"Much Faster Algorithms for Matrix Scaling". By: Zeyuan Allen-Zhu*, Yuanzhi Li*, Rafael Oliveira*, Avi Wigderson*. FOCS 2017: Symposium on Foundations of Computer Science.
We develop several efficient algorithms for the classical Matrix Scaling problem, which is used in many diverse areas, from preconditioning linear systems to approximating the permanent. On an input $$n\times n$$ matrix $$A$$, this problem asks to find diagonal scaling matrices $$X, Y$$ (if they exist), so that $$X A Y$$ $$\varepsilon$$-approximates a doubly stochastic matrix, or more generally a matrix with prescribed row and column sums. We address the general scaling problem as well as some important special cases. In particular, if $$A$$ has $$m$$ nonzero entries, and if there exist $$X, Y$$ with polynomially large entries such that $$X A Y$$ is doubly stochastic, then we can solve the problem in total complexity $$\tilde{O}(m + n^{4/3})$$.
This greatly improves on the best known previous results, which were either $$\tilde{O}(n^4)$$ or $$O(m n^{1/2}/\varepsilon)$$.

"Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU". By: Zeyuan Allen-Zhu*, Yuanzhi Li*. ICML 2017: International Conference on Machine Learning.
Matrix multiplicative weight update (MMWU) is an extremely powerful algorithmic tool for computer science and related fields. However, it comes with a slow running time due to the matrix exponential and eigendecomposition computations. For this reason, many researchers have studied the follow-the-perturbed-leader (FTPL) framework, which is faster, but a factor $$\sqrt{d}$$ worse than the optimal regret of MMWU for dimension-d matrices. In this paper, we propose a follow-the-compressed-leader framework which not only matches the optimal regret of MMWU (up to polylog factors), but also runs even faster than FTPL. Our main idea is to compress the matrix exponential computation to dimension 3 in the adversarial setting, or dimension 1 in the stochastic setting. This result resolves an open question regarding how to obtain both (nearly) optimal and efficient algorithms for the online eigenvector problem.

"Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter". By: Zeyuan Allen-Zhu. ICML 2017: International Conference on Machine Learning.
Given a non-convex function $$f(x)$$ that is an average of n smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue $$-\sigma$$ of the Hessian. This parameter $$\sigma$$ captures how strongly non-convex $$f(x)$$ is, and is analogous to the strong convexity parameter for convex optimization. Our methods outperform the best known results for a wide range of $$\sigma$$, and can also be used to find approximate local minima.
In particular, we find an interesting dichotomy: there exists a threshold $$\sigma_0$$ such that the fastest methods for $$\sigma>\sigma_0$$ and for $$\sigma<\sigma_0$$ have drastically different behaviors: the former scales with $$n^{2/3}$$ and the latter scales with $$n^{3/4}$$.

"Near-Optimal Design of Experiments via Regret Minimization". By: Zeyuan Allen-Zhu*, Yuanzhi Li*, Aarti Singh*, Yining Wang*. ICML 2017: International Conference on Machine Learning.
The results of this paper have been outperformed by our subsequent paper entitled "Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach".

"Faster Principal Component Regression and Stable Matrix Chebyshev Approximation". By: Zeyuan Allen-Zhu*, Yuanzhi Li*. ICML 2017: International Conference on Machine Learning.
We solve principal component regression (PCR), up to a multiplicative accuracy $$1+\gamma$$, by reducing the problem to $$\tilde{O}(\gamma^{-1})$$ black-box calls of ridge regression. Therefore, our algorithm does not require any explicit construction of the top principal components, and is suitable for large-scale PCR instances. In contrast, previous results required $$\tilde{O}(\gamma^{-2})$$ such black-box calls. We obtain this result by developing a general stable recurrence formula for matrix Chebyshev polynomials, and a degree-optimal polynomial approximation to the matrix sign function. Our techniques may be of independent interest, especially when designing iterative methods.

"Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition". By: Zeyuan Allen-Zhu*, Yuanzhi Li*. ICML 2017: International Conference on Machine Learning.
We study k-GenEV, the problem of finding the top k generalized eigenvectors, and k-CCA, the problem of finding the top k vectors in canonical-correlation analysis. We propose algorithms LazyEV and LazyCCA to solve the two problems with running times linearly dependent on the input size and on k.
Furthermore, our algorithms are DOUBLY-ACCELERATED: their running times depend only on the square root of the matrix condition number, and on the square root of the eigengap. This is the first such result for either k-GenEV or k-CCA. We also provide the first gap-free results, whose running times depend on $$1/\sqrt{\varepsilon}$$ rather than on the eigengap.

"Katyusha: The First Direct Acceleration of Stochastic Gradient Methods". By: Zeyuan Allen-Zhu. JMLR 2017: Journal of Machine Learning Research (to appear). STOC 2017: Symposium on Theory of Computing (abstract).
We introduce Katyusha, the first direct, primal-only stochastic gradient method that has a provably accelerated convergence rate in convex optimization. In contrast, previous methods are based on dual coordinate descent, which is more restrictive, or on outer-inner loops, which make them "blind" to the underlying stochastic nature of the optimization process. Katyusha is the first algorithm that incorporates acceleration directly into stochastic gradient updates. Unlike previous results, Katyusha obtains an optimal convergence rate. It also supports proximal updates, non-Euclidean norm smoothness, non-uniform sampling, and mini-batch sampling. When applied to interesting classes of convex objectives, including smooth objectives (e.g., Lasso, Logistic Regression), strongly convex objectives (e.g., SVM), and non-smooth objectives (e.g., L1SVM), Katyusha improves the best known convergence rates. The main ingredient behind our result is Katyusha momentum, a novel "negative momentum on top of momentum" that can be incorporated into a variance-reduction based algorithm and speed it up. As a result, since variance reduction has been successfully applied to a fast-growing list of practical problems, our paper suggests that in each such case, one had better hurry up and give Katyusha a hug.

"Finding Approximate Local Minima Faster Than Gradient Descent".
By: Naman Agarwal*, Zeyuan Allen-Zhu*, Brian Bullins*, Elad Hazan*, Tengyu Ma*. STOC 2017: Symposium on Theory of Computing.
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time linear in the input representation. The time complexity of our algorithm to find an approximate local minimum is even faster than that of gradient descent to find a critical point. Our algorithm applies to a general class of optimization problems, including training a neural network and other non-convex objectives arising in machine learning.

"Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent". By: Zeyuan Allen-Zhu*, Lorenzo Orecchia*. ITCS 2017: Innovations in Theoretical Computer Science.
First-order methods play a central role in large-scale machine learning. Even though many variations exist, each suited to a particular problem, almost all such methods fundamentally rely on two types of algorithmic steps: gradient descent, which yields primal progress, and mirror descent, which yields dual progress. We observe that the performances of gradient and mirror descent are complementary, so that faster algorithms can be designed by linearly coupling the two. We show how to reconstruct Nesterov's accelerated gradient methods using linear coupling, which gives a cleaner interpretation than Nesterov's original proofs. We also demonstrate the power of linear coupling by extending it to many other settings that Nesterov's methods cannot apply to.

"LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain". By: Zeyuan Allen-Zhu*, Yuanzhi Li*. NIPS 2016: Neural Information Processing Systems (spotlight).
We study k-SVD, the problem of obtaining the first k singular vectors of a matrix A.
Recently, a few breakthroughs have been made on k-SVD: Musco and Musco [1] provided the first gap-free theorem for the block Krylov method, Shamir [2] discovered the first variance-reduction stochastic method, and Bhojanapalli et al. [3] provided the fastest $$O(\mathsf{nnz}(A)+\mathsf{poly}(1/\varepsilon))$$-time algorithm using alternating minimization. In this paper, we put forward a new and simple LazySVD framework that improves on the above breakthroughs. This framework leads to a faster gap-free method outperforming [1], and the first accelerated and stochastic method outperforming [2]. In the $$O(\mathsf{nnz}(A)+\mathsf{poly}(1/\varepsilon))$$ running-time regime, LazySVD outperforms [3] in certain parameter regimes without even using alternating minimization.

"Optimal Black-Box Reductions Between Optimization Objectives". By: Zeyuan Allen-Zhu*, Elad Hazan*. NIPS 2016: Neural Information Processing Systems (spotlight).
The diverse world of machine learning applications has given rise to a plethora of algorithms and optimization methods, finely tuned to the specific regression or classification task at hand. We reduce the complexity of algorithm design for machine learning through reductions: we develop reductions that take a method developed for one setting and apply it to the entire spectrum of smoothness and strong-convexity in applications. Furthermore, unlike existing results, our new reductions are OPTIMAL and more PRACTICAL. We show how these new reductions give rise to new and faster running times for training linear classifiers for various families of loss functions, and conclude with experiments showing their successes also in practice.

"Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters". By: Zeyuan Allen-Zhu*, Yang Yuan*, Karthik Sridharan. NIPS 2016: Neural Information Processing Systems (spotlight).
The amount of data available in the world is growing faster than our ability to deal with it.
However, if we take advantage of its internal structure, data may become much smaller for machine learning purposes. In this paper we focus on one of the most fundamental machine learning tasks, empirical risk minimization (ERM), and provide faster algorithms with the help of the clustering structure of the data. We introduce a simple notion of raw clustering that can be efficiently obtained with just one pass over the data, and propose two algorithms. Our variance-reduction based algorithm ClusterSVRG introduces a new gradient estimator using the clustering information, and our accelerated algorithm ClusterACDM is built on a novel Haar transformation applied to the dual space of each cluster. Our algorithms outperform their classical counterparts both in theory and in practice.

"Variance Reduction for Faster Non-Convex Optimization". By: Zeyuan Allen-Zhu*, Elad Hazan*. ICML 2016: International Conference on Machine Learning.
We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remained full gradient descent, which converges in $$O(1/\varepsilon)$$ iterations for smooth objectives, and stochastic gradient descent, which converges in $$O(1/\varepsilon^2)$$ iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges at an $$O(1/\varepsilon)$$ rate, and is faster than full gradient descent by $$\Omega(n^{1/3})$$.
We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.

"Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling". By: Zeyuan Allen-Zhu*, Peter Richtárik*, Zheng Qu*, Yang Yuan*. ICML 2016: International Conference on Machine Learning.
Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems. Up to a primal-dual transformation, it is also the same as accelerated stochastic gradient descent, one of the central methods used in machine learning. In this paper, we improve the best known running time of accelerated coordinate descent by a factor of up to $$\sqrt{n}$$. Our improvement is based on a clean, novel non-uniform sampling that selects each coordinate with a probability proportional to the square root of its smoothness parameter. Our proof technique also deviates from the classical estimation sequence technique used in prior work. Our speed-up applies to important problems such as empirical risk minimization and solving linear systems, both in theory and in practice.

"Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives". By: Zeyuan Allen-Zhu* and Yang Yuan*. ICML 2016: International Conference on Machine Learning.
Many classical algorithms are found, only years later, to outlive the confines in which they were conceived, and continue to be relevant in unforeseen settings. In this paper, we show that SVRG is one such method: although originally designed for strongly convex objectives, it is also very robust in non-strongly convex and sum-of-non-convex settings. More precisely, we provide new analyses that improve the state-of-the-art running times in both settings, by either applying SVRG or a novel variant of it.
Since non-strongly convex objectives include important examples such as Lasso and logistic regression, and sum-of-non-convex objectives include famous examples such as stochastic PCA and are even believed to be related to training deep neural nets, our results also imply better performance in these applications.

"Optimization Algorithms for Faster Computational Geometry". By: Zeyuan Allen-Zhu*, Zhenyu Liao*, Yang Yuan*. ICALP 2016: International Colloquium on Automata, Languages, and Programming.
We study two fundamental problems in computational geometry: finding the maximum inscribed ball (MaxIB) inside a bounded polyhedron defined by $$m$$ hyperplanes, and the minimum enclosing ball (MinEB) of a set of $$n$$ points, both in $$d$$-dimensional space. We improve the running time of iterative algorithms on MaxIB from $$\tilde{O}(m d \alpha^3 / \varepsilon^3)$$ to $$\tilde{O}(md + m \sqrt{d} \alpha / \varepsilon)$$, a speed-up of up to $$\tilde{O}(\sqrt{d} \alpha^2 / \varepsilon^2)$$, and on MinEB from $$\tilde{O}(n d / \sqrt{\varepsilon})$$ to $$\tilde{O}(nd + n \sqrt{d} / \sqrt{\varepsilon})$$, a speed-up of up to $$\tilde{O}(\sqrt{d})$$. Our improvements are based on a novel saddle-point optimization framework. We propose a new algorithm L1L2SPSolver for solving a class of regularized saddle-point problems, and apply a randomized Hadamard space rotation, a technique borrowed from compressive sensing. Interestingly, the motivation for using the Hadamard rotation comes solely from our optimization view and not from the original geometry problem: indeed, it is not immediately clear why MaxIB or MinEB, as a geometric problem, should be easier to solve if we rotate the space by a unitary matrix. We hope that our optimization perspective sheds light on solving other geometric problems as well.

"Restricted Isometry Property for General p-Norms". By: Zeyuan Allen-Zhu*, Rati Gelashvili*, Ilya Razenshteyn*.
IEEE-IT 2016: IEEE Transactions on Information Theory. SoCG 2015: International Symposium on Computational Geometry (abstract).
The Restricted Isometry Property (RIP) is a fundamental property of a matrix that enables sparse recovery. Informally, an $$m \times n$$ matrix satisfies RIP of order $$k$$ for the $$\ell_p$$ norm if $$\|Ax\|_p \approx \|x\|_p$$ for every $$x$$ with at most $$k$$ non-zero coordinates. For every $$1 \leq p < \infty$$ we obtain almost tight bounds on the minimum number of rows $$m$$ necessary for the RIP property to hold. Previously, only the cases $$p \in \big\{1, 1 + \frac{1}{\log k}, 2\big\}$$ had been studied. Interestingly, our results show that the case $$p = 2$$ is a 'singularity' in terms of the optimal value of $$m$$. We also obtain almost tight bounds for the column sparsity of RIP matrices, and discuss implications of our results for the Stable Sparse Recovery problem.

"Using Optimization to Obtain a Width-Independent, Parallel, Simpler, and Faster Positive SDP Solver". By: Zeyuan Allen-Zhu*, Yin Tat Lee*, and Lorenzo Orecchia*. SODA 2016: Symposium on Discrete Algorithms.
We study the design of polylogarithmic-depth algorithms for approximately solving packing and covering semidefinite programs (positive SDPs for short). This is a natural SDP generalization of the well-studied positive LP problem. Although positive LPs can be solved in polylogarithmic depth using only $$\log^{2} n/\varepsilon^3$$ parallelizable iterations [3], the best known positive SDP solvers, due to Jain and Yao [18], require $$\log^{14} n /\varepsilon^{13}$$ parallelizable iterations. Several alternative solvers have been proposed to reduce the exponents in the number of iterations [19, 29]. However, the correctness of the convergence analyses in these works has been called into question [29], as they both rely on algebraic monotonicity properties that do not generalize to matrix algebra.
In this paper, we propose a very simple algorithm based on the optimization framework proposed in [3] for LP solvers. Our algorithm needs only $$\log^2 n / \varepsilon^3$$ iterations, matching that of the best LP solver. To surmount the obstacles encountered by previous approaches, our analysis requires a new matrix inequality that extends the Lieb-Thirring inequality, and a sign-consistent, randomized variant of the gradient truncation technique proposed in [2, 3].

"Expanders via Local Edge Flips". By: Zeyuan Allen-Zhu*, Aditya Bhaskara*, Silvio Lattanzi*, Vahab Mirrokni*, and Lorenzo Orecchia*. SODA 2016: Symposium on Discrete Algorithms.
Designing distributed and scalable algorithms to improve network connectivity is a central topic in peer-to-peer networks. In this paper we focus on the following well-known problem: given an n-node d-regular network, we want to design a decentralized, local algorithm that transforms the graph into one that has good connectivity properties (low diameter, expansion, etc.) without affecting the sparsity of the graph. To this end, Mahlmann and Schindelhauer introduced the random "flip" transformation, in which, at each time step, a random pair of vertices connected by an edge decide to 'swap a neighbor'. They conjectured that performing $$\mathrm{poly}(d) \times n\log n$$ such flips at random would convert any connected d-regular graph into a d-regular expander graph, with high probability. However, the best known upper bound on the number of steps is roughly $$O(n^{17} d^{23})$$, obtained via a delicate Markov chain comparison argument. Our main result is to prove that a natural instantiation of the random flip produces an expander in at most $$O(n^2 d^2 \sqrt{\log n})$$ steps, with high probability. Our argument uses a potential-function analysis based on the matrix exponential, together with recent beautiful results on the higher-order Cheeger inequality of graphs.
We also show that our technique can be used to analyze another well-studied random process known as the 'random switch', and show that it produces an expander in $$O(n d)$$ steps for $$d=\Omega(\log n)$$, with high probability.

"Nearly Linear-Time Packing and Covering LP Solvers". By: Zeyuan Allen-Zhu*, Lorenzo Orecchia*. STOC 2015: Symposium on Theory of Computing (abstract). MAPR 2018: Mathematical Programming (to appear).
Packing and covering linear programs (PC-LPs) form an important class of linear programs (LPs) across computer science, operations research, and optimization. In 1993, Luby and Nisan constructed an iterative algorithm for approximately solving PC-LPs in nearly linear time, where the time complexity scales nearly linearly in $$N$$, the number of nonzero entries of the matrix, and polynomially in $$1/\varepsilon$$, where $$\varepsilon$$ is the (multiplicative) approximation error. Unfortunately, existing nearly linear-time algorithms for solving PC-LPs require time at least proportional to $$\varepsilon^{-2}$$. In this paper, we break this longstanding barrier by designing a packing LP solver that runs in time $$\tilde{O}(N \varepsilon^{-1})$$ and a covering LP solver that runs in time $$\tilde{O}(N \varepsilon^{-1.5})$$. Our packing solver can be extended to run in time $$\tilde{O}(N \varepsilon^{-1})$$ for a class of well-behaved covering programs. In a follow-up work, Wang et al. showed that all covering LPs can be converted into well-behaved ones by a reduction that blows up the problem size only logarithmically.

"Spectral Sparsification and Regret Minimization Beyond Multiplicative Updates". By: Zeyuan Allen-Zhu*, Zhenyu Liao*, Lorenzo Orecchia*. STOC 2015: Symposium on Theory of Computing.
In this paper, we provide a novel construction of the linear-sized spectral sparsifiers of Batson, Spielman and Srivastava [BSS14].
While previous constructions required $$\Omega(n^4)$$ running time [BSS14, Zou12], our sparsification routine can be implemented in almost-quadratic running time $$O(n^{2+\varepsilon})$$. The fundamental conceptual novelty of our work is the leveraging of a strong connection between sparsification and a regret minimization problem over density matrices. This connection was known to provide an interpretation of the randomized sparsifiers of Spielman and Srivastava [SS11] via the application of matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we explain how matrix MWU naturally arises as an instance of the Follow-the-Regularized-Leader framework, and generalize this approach to yield a larger class of updates. This new class allows us to accelerate the construction of linear-sized spectral sparsifiers, and to give novel insights into the motivation behind Batson, Spielman and Srivastava [BSS14].

"Using Optimization to Solve Positive LPs Faster in Parallel". By: Zeyuan Allen-Zhu*, Lorenzo Orecchia*. SODA 2015: Symposium on Discrete Algorithms.
We study the design of nearly-linear-time algorithms for approximately solving positive linear programs. Both the parallel and the sequential deterministic versions of these algorithms require $$\tilde{O}(\varepsilon^{-4})$$ iterations, a dependence that has not been improved since the introduction of these methods in 1993 by Luby and Nisan. Moreover, previous algorithms and their analyses rely on update steps and convergence arguments that are combinatorial in nature, and do not seem to arise naturally from an optimization viewpoint. In this paper, we leverage insights from optimization theory to construct a novel algorithm that breaks the longstanding $$\tilde{O}(\varepsilon^{-4})$$ barrier. Our algorithm has a simple analysis and a clear motivation.
Our work introduces a number of novel techniques, such as the combined application of gradient descent and mirror descent, and a truncated, smoothed version of the standard multiplicative weight update, which may be of independent interest. "Knightian Analysis of the Vickrey Mechanism" By: Alessandro Chiesa*, Silvio Micali* and Zeyuan Allen Zhu*. Econometrica 2015: Volume 83, Issue 5, pages 1727-1754. "Shorter Arithmetization of Nondeterministic Computations". By: Alessandro Chiesa*, Zeyuan Allen Zhu*. TCS 2015: Theoretical Computer Science, Volume 600, pages 107-131. Arithmetizing computation is a crucial component of many fundamental results in complexity theory, including results that gave insight into the power of interactive proofs, multi-prover interactive proofs, and probabilistically-checkable proofs. Informally, an arithmetization is a way to encode a machine's computation so that its correctness can be easily verified via few probabilistic algebraic checks. We study the problem of arithmetizing nondeterministic computations for the purpose of constructing short probabilistically-checkable proofs (PCPs) with polylogarithmic query complexity. In such a setting, a PCP's proof length depends (at least!) linearly on the length, in bits, of the encoded computation. Thus, minimizing the number of bits in the encoding is crucial for minimizing PCP proof length. In this paper we show how to arithmetize any T-step computation on a nondeterministic Turing machine by using a polynomial encoding of length $$O\big( T \cdot (\log T)^2 \big) \enspace.$$ Previously, the best known length was $$\Omega(T \cdot (\log T)^4)$$. For nondeterministic random-access machines, our length is $$O(T \cdot (\log T)^{2+o(1)})$$, while prior work only achieved $$\Omega(T \cdot (\log T)^5)$$. The polynomial encoding that we use is the Reed--Solomon code. 
When combined with the best PCPs of proximity for this code, our result yields quasilinear-size PCPs with polylogarithmic query complexity that are shorter, by at least two logarithmic factors, than in all prior work. Our arithmetization also enjoys additional properties. First, it is succinct, i.e., the encoding of the computation can be probabilistically checked in $$(\log T)^{O(1)}$$ time; this property is necessary for constructing short PCPs with a polylogarithmic-time verifier. Furthermore, our techniques extend, in a certain well-defined sense, to the arithmetization of yet other NEXP-complete languages. "Johnson-Lindenstrauss Compression with Neuroscience-Based Constraints". By: Zeyuan Allen-Zhu*, Rati Gelashvili*, Silvio Micali*, Nir Shavit*. PNAS 2014: Proceedings of the National Academy of Sciences of the USA, vol 111, no 47. Johnson-Lindenstrauss (JL) matrices implemented by sparse random synaptic connections are thought to be a prime candidate for how convergent pathways in the brain compress information. However, to date, there is no complete mathematical support for such implementations given the constraints of real neural tissue. The fact that neurons are either excitatory or inhibitory implies that every so implementable JL matrix must be sign-consistent (i.e., all entries in a single column must be either all non-negative or all non-positive), and the fact that any given neuron connects to a relatively small subset of other neurons implies that the JL matrix had better be sparse. We construct sparse JL matrices that are sign-consistent, and prove that our construction is essentially optimal. Our work answers a mathematical question that was triggered by earlier work and is necessary to justify the existence of JL compression in the brain, and emphasizes that inhibition is crucial if neurons are to perform efficient, correlation-preserving compression. "Reconstructing Markov Processes from Independent and Anonymous Experiments". 
By: Silvio Micali*, Zeyuan Allen Zhu*. DAM 2015: Discrete Applied Mathematics, volume 200, pages 108-122.We investigate the problem of exactly reconstructing, with high confidence and up to isomorphism, the ball of radius r centered at the starting state of a Markov process from independent and anonymous experiments. In an anonymous experiment, the states are visited according to the underlying transition probabilities, but no global state names are known: one can only recognize whether two states, reached within the same experiment, are the same. We prove quite tight bounds for such exact reconstruction in terms of both the number of experiments and their lengths. "Knightian Self Uncertainty in the VCG Mechanism for Unrestricted Combinatorial Auctions". By: Alessandro Chiesa*, Silvio Micali*, Zeyuan Allen Zhu*. ACM-EC 2014: Conference on Economics and Computation. "Flow-Based Algorithms for Local Graph Clustering". By: Lorenzo Orecchia* and Zeyuan Allen Zhu*. SODA 2014: Symposium on Discrete Algorithms. Given a subset S of vertices of an undirected graph G, the cut-improvement problem asks us to find a subset S that is similar to A but has smaller conductance. A very elegant algorithm for this problem has been given by Andersen and Lang [AL08] and requires solving a small number of single-commodity maximum flow computations over the whole graph G. In this paper, we introduce LocalImprove, the first cut-improvement algorithm that is local, i.e. that runs in time dependent on the size of the input set A rather than on the size of the entire graph. Moreover, LocalImprove achieves this local behaviour while essentially matching the same theoretical guarantee as the global algorithm of Andersen and Lang. The main application of LocalImprove is to the design of better local-graph-partitioning algorithms. 
All previously known local algorithms for graph partitioning are random-walk based and can only guarantee an output conductance of $$\tilde{O}(\sqrt{\phi})$$ when the target set has conductance $$\phi \in [0,1]$$. Very recently, Zhu, Lattanzi and Mirrokni [ZLM13] improved this to $$O(\phi / \sqrt{\mathsf{Conn}})$$ where the internal connectivity parameter $$\mathsf{Conn} \in [0,1]$$ is defined as the reciprocal of the mixing time of the random walk over the graph induced by the target set. In this work, we show how to use LocalImprove to obtain a constant approximation $$O(\phi)$$ as long as $$\mathsf{Conn}/\phi = \Omega(1)$$. This yields the first flow-based algorithm. Moreover, its performance strictly outperforms the ones based on random walks and surprisingly matches that of the best known global algorithm, which is SDP-based, in this parameter regime [MMV12]. Finally, our results show that spectral methods are not the only viable approach to the construction of local graph partitioning algorithm and open door to the study of algorithms with even better approximation and locality guarantees. "Local Graph Clustering Beyond Cheeger's Inequality". By: Zeyuan Allen Zhu, Silvio Lattanzi, Vahab Mirrokni. ICML 2013: International Conference on Machine Learning. Motivated by applications of large-scale graph clustering, we study random-walk-based local algorithms whose running times depend only on the size of the output cluster, rather than the entire graph. All previously known such algorithms guarantee an output conductance of $$\tilde{O}(\sqrt{\phi(A)})$$ when the target set A has conductance $$\phi(A)\in[0,1]$$. In this paper, we improve it to $\tilde{O}\bigg( \min\Big\{\sqrt{\phi(A)}, \frac{\phi(A)}{\sqrt{\mathsf{Conn}(A)}} \Big\} \bigg)\enspace,$ where the internal connectivity parameter $$\mathsf{Conn}(A)\in[0,1]$$ is defined as the reciprocal of the mixing time of the random walk over the induced subgraph on A. 
For instance, using $$\mathsf{Conn}(A)=\Omega(\lambda(A)/\log n)$$ where $$\lambda$$ is the second eigenvalue of the Laplacian of the induced subgraph on A, our conductance guarantee can be as good as $$\tilde{O}(\phi(A)/\sqrt{\lambda(A)})$$. This builds an interesting connection to the recent advance of the so-called improved Cheeger's Inequality [KKL+13], which says that global spectral algorithms can provide a conductance guarantee of $$O(\phi_{\mathsf{opt}}/\sqrt{\lambda_3})$$ instead of $$O(\sqrt{\phi_{\mathsf{opt}}})$$. In addition, we provide theoretical guarantee on the clustering accuracy (in terms of precision and recall) of the output set. We also prove that our analysis is tight, and perform empirical evaluation to support our theory on both synthetic and real data. It is worth noting that, our analysis outperforms prior work when the cluster is well-connected. In fact, the better it is well-connected inside, the more significant improvement (both in terms of conductance and accuracy) we can obtain. Our results shed light on why in practice some random-walk-based algorithms perform better than its previous theory, and help guide future research about local clustering. "A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time". By: Jonathan A. Kelner*, Lorenzo Orecchia*, Aaron Sidford*, Zeyuan Allen Zhu*. STOC 2013: Symposium on Theory of Computing. In this paper, we present a simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time. It uses very little of the machinery that previously appeared to be necessary for a such an algorithm. It does not require recursive preconditioning, spectral sparsi cation, or even the Chebyshev Method or Conjugate Gradient. 
After constructing a "nice" spanning tree of a graph associated with the linear system, the entire algorithm consists of the repeated application of a simple (non-recursive) update rule, which it implements using a lightweight data structure. The algorithm is numerically stable and can be implemented without the increased bit-precision required by previous solvers. As such, the algorithm has the fastest known running time under the standard unit-cost RAM model. We hope that the simplicity of the algorithm and the insights yielded by its analysis will be useful in both theory and practice. "Mechanism Design with Approximate Valuations". By: Alessandro Chiesa*, Silvio Micali*, Zeyuan Allen Zhu*. Invited to ACM Trans. on Computation Theory.ITCS 2012: Innovations in Theoretical Computer Science. "Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations". By: Zeyuan Allen Zhu, Sasa Misailovic, Jonathan A. Kelner, Martin Rinard.

Soviet rocketry spanned the entire history of the Soviet Union (1922–1991). Rocket scientists and engineers contributed to the development of rocket and jet engine propulsion systems, which were first used for artillery and later for fighter aircraft, bombers, ballistic missiles, and space exploration. Progress was greatly augmented by the reverse engineering of German technology captured by westward-moving troops during the final days of World War II and the immediate period following.

## Liquid Fuel: The early contribution

Traditionally, the Russian military had used only solid fuel in rockets (generally black powder).[1] Russian involvement in rocketry began in 1903 when Konstantin Tsiolkovsky published a paper on liquid-propelled rockets (LPREs).[2] Though the idea of rockets stems from the 1600s, Tsiolkovsky's efforts made significant advances in the use of liquid fuel. His work challenged traditional thought and sparked a revolution in science which embraced new ideas in rocket technology.[3]

One of the early leaders in Russian rocket engine development was Friedrich Zander, a German-born spaceflight enthusiast. Zander was the head of a group of rocket researchers called GIRD, the Russian acronym for "Group for Research of Reactive Propulsion," which was created in 1931. Zander, who idolized the Russian scientist Konstantin Tsiolkovsky and the German rocket scientist Hermann Oberth, oversaw the development of Russia's first liquid fueled rocket, the GIRD 10. The rocket was launched successfully in 1933, and it reached an altitude of 1300 feet, but Zander died before the test took place.[4]

Two research groups played exceptionally important roles in the early development of Soviet Cold War-era jet propulsion: The Leningrad Gas Dynamics Laboratory (GDL), and the Group for Research of Reactive Propulsion (GIRD). Due to their similar objectives and parallel existence, there was some overlap between GDL and GIRD, and the two organizations were eventually merged. Generally speaking, however, while the GDL focused primarily on the development of rocket engines, GIRD was involved with the engineering design, construction, and testing of the craft that would be powered using the engines that the GDL developed.[5]

The GDL was formed in 1928, initially focusing primarily on the development of solid-fuel rockets for military use, such as anti-aircraft and anti-tank weapons, though it branched out into liquid-fueled rocket engines in May 1929. These rockets were developed to be used as engines for aircraft instead of for conventional artillery. It was primarily through the work of the GDL that the OR and ORM series of rocket engines were developed, which would become the backbone of early Russian jet development.[6]

GIRD began as the Jet Engine Section of a larger civil defense organization known as the Society for the Promotion of Defense and Aerochemical Development; GIRD's role was to deliver practical jet engine technology to be employed in aerial military applications. Although branches of GIRD were established in major cities all throughout the Soviet Union, the two most active branches were those in Moscow (MosGIRD, formed in January 1931) and in Leningrad (LenGIRD, formed in November 1931).[7] MosGIRD worked on the development of space research, liquid-propellant rockets, rocket design as it pertained to aircraft, and the construction of a supersonic wind tunnel (used for the aerodynamic testing of the aircraft that they developed), whereas LenGIRD developed solid-fuel rockets used for photographing the upper atmosphere, carrying flares, and atmospheric sounding.[8]

Early pioneers in the field began to postulate that liquid fuels were more powerful than solid fuels.[9] Some of the early fuels used by these scientists were oxygen, alcohol, methane, hydrogen, or combinations of them.[10] A bitter rivalry developed between the researchers of these institutes.[11]

In order to obtain maximum military benefits, the Red Army's chief-of-staff Marshal Mikhail Tukhacheskii merged GIRD with the GDL to study both fuel types. The new group was given the designation RNII.[12] Before merging, the GDL had conducted liquid fuel tests and used nitric acid, while the GIRD had been using liquid oxygen.[13] A brilliant, though often confrontational Sergei Korolev, headed the GIRD when it merged into RNII, and he was originally RNII's deputy director. Korolev's boss was a hard-nosed man from the GDL by the name of Kleimenov. Bitter in-fighting slowed the pace and quality of the research at RNII, but despite internal dissention, Korolev began to produce designs of missiles with liquid fueled engines. By 1932, RNII was using liquid oxygen with kerosene as a coolant as well as nitric acid and a hydrocarbon.[14] By 1933, the Soviets had successfully developed the GIRD 09. The engine was tested on a glider, which reached a propelled altitude of 1300 feet before its thrust chamber failed.[15]

## Applications in early aircraft

As a young adult, Sergei Korolev (1907–1966) had always been fascinated by aviation. At college, his fascination towards rocketry and space travel grew. He became one of the most important rocket engineers of Soviet aircraft technology, and became "Chief Designer" of the Soviet space program.[16] Sergei Korolev was a vitally important member of GIRD, and later became the head of the Soviet space program. Korolev would play a crucial role in both the launch of Sputnik in 1957, and the mission which put Yuri Gagarin in space in 1961. In the early stages of Soviet rocketry science, Korolev, who initiated the program in Moscow, called the Group for the Study of Reactive Motion, abbreviated as GIRD in Russian. As a renowned aeronautical engineer and the director of GIRD, he and his colleagues were enthusiastic supporters of Russia's race to space, and their focus was aimed at using liquid propellant to push their rockets into the atmosphere.[17]

In 1931, Korolev had come to Zander with a conceptual design for a rocket-powered aircraft called the RP-1.[18] This craft was essentially a glider, powered with one of GDL's rocket motors, the OR-2. The OR-2 was a rocket engine powered with gasoline and liquid oxygen, and produced a thrust of 500 Newtons. In May 1932, about a year before Zander died, Korolev became the director of GIRD. At this point, he continued developing his design for the RP-1, an updated version called the RP-2, and another craft that he called the RP-218. The plan for the RP-218 called for a two-seat rocket powered plane, complete with a pressurized cabin, a retractable undercarriage, and equipment for high altitude research. The design was never realized, though, because at the time, there was not a rocket powerful enough and light enough to make the RP-218 practical.[18]

In September 1933, GIRD was combined with the Gas Dynamics Laboratory, and the conglomerate was named the RN II Rocket Scientific Research Institution. When the two institutes combined, they brought together two of the most exceptional and successful engineers in the history of Soviet rocketry. Korolev teamed up with propulsion engineer Valentin Glushko, and together they excelled in the rocket industry, pushing the Soviet Union ahead of the United States in the space race. Instead of pursuing the RP-218, in 1935, Korolev and RN II began developing the SK-9, a simple wooden two-seat glider which was to be used for testing rocket engines.[19] The rear seat was replaced with tanks holding kerosene and nitric acid, and the OR-2 rocket motor was installed in the fuselage. The resulting craft was referred to as the RP-318. The RP-318 was tested numerous times with the engine installed, and was deemed ready for test flights in April 1938, but the plane's development halted when Joseph Stalin performed a purge of RN II, executing the director and chief engineer, and imprisoning Korolev to the Kolyma gold mines for 10 years.[20] Despite all of his achievements, Korolev's identity actually remained a Soviet secret up until his death in 1966.[21]

## World War II

The Soviets began to redesign the thrust chambers of their rocket engines, as well as investigate better ignition systems. These research endeavors were receiving more attention and funding as Europe began its escalation into the chaos of World War II. The Soviet rocket program had developed engines with two-stage ignition and variable thrust nearly two years before Germany rolled out their Me-163.[22] However, the Soviet engine was only on gliders for testing, and was not available for full-powered flight. The engine's thrust was too low, and pressure build-up caused systemic failures.

Toward the end of 1938, work resumed on the RP-318 at N II-3, which was the new title for the Rocket Scientific Research Institution. The aircraft was repaired and modified, with the addition of a new, more powerful engine to replace the OR-2. The new engine (the ORM-65) had been originally designed for a use in a single-launch cruise missile, but was adapted so that it could be employed in a multi-use aircraft.[23] For comparison to the OR-2, the new ORM-65 could produce a variable thrust between 700 and 1400 Newtons. After extensive testing, on February 28, 1940, the new RP-318-1 was successfully tested in a full-powered flight; the craft attained a speed of 90 mph, reached an altitude of 1.8 miles, in 110 seconds of operation, and was landed safely when the fuel was exhausted. Although this was a momentous occasion in Russian jet development, further plans to enhance this aircraft were shelved, and when the German Army neared Moscow in August 1941, the RP-318-1 was burned to keep it away from the Germans.[24]

The German invasion of Russia in the summer of 1941 led to an acute sense of urgency for the Soviets to develop practical rocket-powered aircraft. The Russian conventional air force was dominated by the Luftwaffe, with scores of their planes being shot down by individual German fighters.[25] The Russians needed a superior weapon to counter the German air forces, and they looked to rocket-powered interceptor craft as the solution to their dilemma. In spring of 1941, Andrei Kostikov (the new director of N II-3, previously RN II) and Mikhail Tikhonravov began designing a new rocket-powered interceptor, the Kostikov 302.

The Kostikov 302 became the first Russian rocket plane that would have many features shared with modern fighter aircraft. It was built out of wood, with some aluminum, but it included a pressurized cockpit and retractable landing gear. Another key aspect of the Kostikov 302 was that it was equipped with hydraulic actuators, which allowed the pilot to fly the aircraft with more ease. These actuators, in effect the equivalent of power steering in a car, greatly reduced amount of force the pilots had to apply to control the plane. Because of the ongoing war with Germany, Russian officials strove to make the Kostikov aircraft a functional military asset as quickly as possible. This entailed outfitting it with armored glass, armored plates, several 20 mm cannons, and the option of a payload of either rockets or bombs under the wings. Although it had limited range, this aircraft became a serviceable tool for the purpose of brief forays, such as intercepting enemy aircraft. However, by 1944, the 302 was unable to reach Kostikov's performance requirements, in part because the engine technology was not keeping pace with the aircraft development.[26]

The research teams made an important breakthrough in 1942: finally producing a tested and combat-ready rocket engine, the D-7-A-1100. This utilized a kerosene liquid fuel with a nitric acidoxidizer. However, the Nazi invasion had the Soviet high command centered on other matters, and the engine was never produced for use.[27]

## German impact

By 1944 Nazi Germany was crumbling underneath a two front war. Both American and Soviet forces were in a race for German rocket facilities. The Soviet army occupied Peenemünde on May 2, 1944. Realizing the importance of their capture, the Soviets immediately began salvage and repair of the facility. They also began an equivalent operation to Operation Paperclip in order to catch German scientists. Although the Soviets missed Wernher von Braun's research group, they did capture Helmut Gröttrup and 200 rocket specialists.[28] As part of the Peenemünde occupation, the Soviets obtained the V-2 rocket platform, the A-9/A-10 ocean range rockets, the Rheinbote surface-surface missile, and the R-4/M air to air missile. These represent a few of the notable platforms, captured mostly intact and operational. Gröttrup's team swelled to 5500 personnel, and he put Peenemünde back into full production. Not satisfied with the results, Stalin ordered that the facility and its personnel be moved back to the USSR, where the scientists were distributed to various facilities.[29]

Another major factor in the development of modern Russian aircraft was technology obtained from the Germans after the end of World War II. Since most of the Axis powers were in no condition to repay the billions of dollars that they supposedly owed, the Soviets deployed "trophy brigades", whose task it was to confiscate all equipment, materials, and technology that would be of scientific use to the USSR.[30] Siddiqi points out that the Soviets obtained models of multiple jet fighters, jet engines, and a wealth of technical information concerning equipment related to aviation. By the summer of 1945, the Soviet Union had control of 600 German aviation plants, which constituted more than 50% of all of Germany's aerospace industry. In fact, the Soviet Commissariat of Aviation Industry (NKAP) sent Russian aviation engineers to Germany in order to study in-depth details of German aircraft design: wing design, rocket propulsion, and electronic systems were of particular interest.[31] German expertise about reactive propulsion played a considerable role in the progression of Soviet development of both jet aircraft and rocket-powered spacecraft. Major General Nikolai Petrov, who headed a commission sent by the NKAP to examine German research facilities, informed the trophy brigades in Soviet occupied Germany that their task involved:

"... the removal, safekeeping and shipment to Moscow of all German experimental aircraft and engines of all types; aviation equipment, components and all materials associated with their design and production; scientific research materials; laboratory installations; wind tunnels; instrumentation; libraries; and scientific archives. The Commission must work on the scene immediately after Soviet troops capture appropriate locations, scientific centers and industrial regions of Germany."[32]

By 1953, the Soviet Union had developed their own version of the V-2. This system was called the R-11 missile. The R-11 was operational by 1955, had a range of 270 kilometers, and had an engine with a thrust of 8300 kgf. This system, inspired by the German V-2, became the basis for Submarine Launched Ballistic Missiles (SLBM). However, this application required a change of fuel from the land based fuel of Nitric acid with kerosene to the actual V-2 fuel using a graphite gas jet.[33]

Over the course of the Cold War, the Soviet Union developed an estimated 500 LPRE rocket platforms. In 1982, the Soviets began testing of the RD-170. This nitric acid and kerosene propelled rocket was capable of producing more thrust than any engine available. The RD-170 had 4 variable thrusters with staged combustion. The engine experienced early technical difficulties, and it experienced massive damage as it was shut down in stages. To remediate this, Soviet engineers had to reduce its thrust capacity. The engine was officially flight tested successfully in 1985.[34]

The need for mobile nuclear forces began to increase as the Cold War escalated in the early 1950s. The idea of naval launched tactical nuclear weaponry began to take hold. By 1950, the USSR had developed submarine launched ballistic missiles. These missiles were multi stage, but due to fuel constraints, they could not be launched from underwater. The initial missile system used land based armaments. The USSR is the only known nation to utilize LPRE fueled engines for its SLBMs.

Aside from the nuclear aspect of rocket propelled missiles, Soviet scientists sought to harness this technology for other weapon systems. As early as 1938, the Soviets were capable of using rockets for anti-personnel purposes. This technology had been honed with the Katyusha Rocket used extensively against the Nazis during the German invasion.[35] During World War II, there is no record of any liquid fueled weapons being either produced or designed.[36] From 1958 to 1962, the Soviets researched and developed LPRE propelled anti-aircraft missile systems. These rockets primarily used nitric acid ratioed with a hypergolic amine for fuel.[37]

### Andrei Tupolev

Andrei Tupolev was a leading aircraft designer of Soviet Russia. Tupolev was part of a company that specialized in all metal military aircraft. Tupolev recruited and formed TsAGI which was the Soviet aviation research institute. From the 1920s until 1937, Tupolev and his group worked on design and production of Soviet aircraft. In 1937, Tupolev was arrested by Stalin during the Great Purge. While in prison at Bulshevo Prison in Moscow, Tupolev was recruited by the NKVD to run TsKB-29. This organization utilized political prisoners to produce aircraft for the Soviet state. While in prison, Tupolev began to focus on bomber design and produced the Tu-2 which became the premier Soviet bomber during World War II.[38]

After World War II, Tupolev was assigned to working on reverse engineering scavenged US B-29 bombers. From his work, he produced the Tu-4. As the Cold war began to take shape, the emphasis began to turn toward speed of aircraft. By 1950, Tupolev's group produced the USSR's first turboprop, the Tu-95. Production and design progressed rapidly, and by 1952 Tupolev had produced the first Soviet jet bomber, the Tu-16. The Tu-22 quickly followed as a twin engine jet bomber. The Tupolev group evolved more into civilian jet aircraft until his death in 1972.[38]

### Pavel Sukhoi

Pavel Sukhoi was a senior designer at the Central Aerohydrodynamics Institute in Moscow. This design group was under the control of Tupolev's TsAGI. In 1939, Moscow ordered Sukhoi to head a new scientific research group called OKB. This organization was based in modern-day Kharkiv, Ukraine. This new organization under Sukhoi's direction began research and design of round attack aircraft. The first of these was these was the Su-6. The onslaught of the Nazi invasion disrupted fighter development for the OKB. Following the end of World War II, Stalin directed Sukhoi to begin investigations into jet aircraft.[39] Early development issues, combined with political prejudice, doomed the first Soviet jet fighter, the Su-9, and it was never in production. Stalin thought the group's designs were too close to captured German jet aircraft. As a result, the design bureau was closed and moved back to Tupolev's department in Moscow.[39]

Sukhoi's luck changed again in 1953 when Stalin died. The new government permitted him to create another independent jet fighter design group. By 1954, the group was named OKB-51, which remains to this day an active research group. The early 1950s and 1960s yielded tremendous results in the form of the Su-7 and delta wing Su-9. These two fighters were individually updated with new technology to later become the Su-11 and Su-15 fighter interceptors. Upon his death in 1975, Pavel Sukhoi's name was added to the bureau name in recognition of his services.[39]

### Development of MiG aircraft

One of the premier fighter jet aircraft that Russia employed throughout the duration of the Cold War was the MiG. In a Britannica Academic article, Siddiqi explains that in 1939, Joseph Stalin called for the creation of a new jet aircraft for the Russian military. The men chosen to lead the design of this new fighter were Artem I. Mikoyan and Mikhail I. Gurevich; the abbreviation MiG is a conjunction of the last names of these men. The first aircraft they designed was the I-200. The I-200 was a single engine jet, designed to operate at high altitudes and at high speed to intercept enemy bombers. This aircraft was first flown in 1940 (only 1 year after Stalin's declaration), and it was later renamed the MiG-1. Later, an improved MiG-3 was developed, and by 1942, Mikoyan and Gurevich's team was made an independent design bureau, known informally as MiG, but formally as OKB-155 (which stands for Experimental Design Bureau in Russian).[40]

Throughout the Cold War, OKB-155 churned out some of Russia's most important jet aircraft. According to Siddiqi, technical information captured from defeated Germany played a substantial role in OKB-155 rolling out the USSR's first jet fighter, the MiG-9, in 1946. Other prominent jets designed and produced by this group include the MiG-15, the MiG-17, the MiG-19, the MiG-21, the MiG-23, and the MiG-25. The MiG-15 through the MiG-21 were produced in the mid-1940s into the latter part of the 1950s. The MiG-23 and MiG-25 were not developed until the 1960s. Each of these aircraft offered unique capabilities to the Soviet military. The MiG-15 was employed primarily against American forces during the Korean War, and proved to be highly successful. The MiG-17, -19, and -21 continued to improve upon this design, as each model reached progressively greater speeds; The MiG-19 was Russia's first supersonic jet to be produced in industrial quantities, and the MiG-21 attained speeds in excess of Mach 2. Finally, the MiG-23 was the Soviet Union's first variable-sweep wing fighter aircraft, and the MiG-25 was Russia's first jet capable of reaching Mach 3.[40]

Sputnik 1 was the first artificial Earth satellite. On October 4, 1957, the USSR launched it into orbit and received transmissions from it.[41] Sputnik 1 was the forerunner of a series of satellite missions, and the technology was continually upgraded as satellite weights increased. The first notable failure occurred with Sputnik 4, an uncrewed test of the Vostok capsule: a guidance-system malfunction pointed the capsule in the wrong direction for the orbit-exiting engine burn, sending it instead into a higher orbit, which decayed approximately four months later.[42] The success of Sputnik 1 was followed by the launch of 175 meteorological rockets over the next two years. In all, ten Sputnik satellites were launched.

The ability to launch satellites came from the Soviet intercontinental ballistic missile (ICBM) program, which supplied the RD-107 engine for the Vostok launch vehicle. The first Vostok version had one core engine and four strap-on booster engines, all capable of thrust vectoring, and it was fueled by liquid oxygen and kerosene. There were a total of 20 thrust chambers, each contributing 55,000 pounds of thrust.[43] The Vostok engine, designated RD-107 and, in its core-stage variant, RD-108, was the first true Soviet design; its turbopumps were driven by decomposing hydrogen peroxide monopropellant. This family of engines was used not just on the Vostok but also on the Voskhod, Molniya, and Soyuz launch vehicles.[44]
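As a quick sanity check on the figures above (assuming, as the text implies, that all 20 chambers fire together at the cited 55,000 pounds-force each), the combined liftoff thrust works out as follows:

```python
LBF_TO_N = 4.448222  # newtons per pound-force

chambers = 20
thrust_per_chamber_lbf = 55_000  # figure cited in the text

total_lbf = chambers * thrust_per_chamber_lbf
total_n = total_lbf * LBF_TO_N

print(f"Total thrust: {total_lbf:,} lbf ≈ {total_n / 1e6:.1f} MN")
# → Total thrust: 1,100,000 lbf ≈ 4.9 MN
```

That is, roughly 1.1 million pounds-force, or about 4.9 meganewtons, at liftoff under the cited per-chamber figure.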

By 1959 the space program needed a three-stage platform, so the Vostok engine was adapted accordingly for launching Moon probes; by 1963 the Vostok was equipped for four-stage applications, and this platform was used for the first multi-crewed flight.[45] At the start of 1964 the Soviets introduced a new engine into their booster program, the RD-0110, which replaced the RD-107 in the upper stage of both the Molniya and Soyuz launch vehicles. These engines burned liquid oxygen and kerosene, with the kerosene also serving as coolant. The RD-0110 had four steerable vernier thrusters, and it was unusual in that it was started by a solid-propellant charge but ran on its liquid propellants in flight.[46]

This development created a new problem for the Soviet scientific community, however: the Vostok was too powerful for the newer, lighter satellites intended for low Earth orbit. The space community turned once again to the Soviet missile command. The new intermediate-range ballistic missile (IRBM) systems provided two options: the Sandal (one stage) or the Skean (two stages). Both systems were upgraded to a new RD-111 engine. Following these upgrades, the largest satellite to date, Proton 1, was launched in 1965.[47] Proton 1 used the RD-119 engine, which provided nearly 13.3 million newtons (3.0 million pounds-force) of thrust and was used to reach low Earth orbit.[47]

## Notes

1. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 473.
2. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare: Technology, Conflict, and Terror in the Soviet Union". Technology and Culture. 44 (3): 474.
3. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 474.
4. ^Van Pelt, Michel (2012). Rocketing into the Future: The History and Technology of Rocket Planes. New York: Springer Business and Science Media. p. 120.
5. ^Stoiko, Michael. Soviet Rocketry. p. 46.
6. ^Stoiko, Michael (1970). Soviet Rocketry: Past, Present, and Future. New York, Chicago, and San Francisco: Holt, Rinehart, and Winston. pp. 43–44.
7. ^Stoiko, Michael. Soviet Rocketry. p. 51.
8. ^Stoiko, Michael. Soviet Rocketry. pp. 51–53.
9. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines in Russia, Formerly the Soviet Union". Journal of Propulsion and Power. 19 (6): 2.
10. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power: 2–3.
11. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 476.
12. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 478.
13. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 481.
14. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 3.
15. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 4.
16. ^West, John (2001). "Historical Aspects of the Early Soviet/Russian Manned Space Program". Journal of Applied Physiology: 1501–1511.
17. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 470–501. doi:10.1353/tech.2003.0133.
18. ^ abVan Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. p. 120.
19. ^Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. p. 121.
20. ^Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. pp. 121–122.
21. ^West, John (2001). "Historical Aspects of the Early Soviet/Russian Manned Space Program". Journal of Applied Physiology: 1501–1511.
22. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 15.
23. ^Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. p. 122.
24. ^Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. p. 123.
25. ^Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. p. 120.
26. ^Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. pp. 123–125.
27. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 6.
28. ^Stoiko, Michael. Soviet Rocketry. p. 71.
29. ^Stoiko, Michael. Soviet Rocketry. pp. 71–72.
30. ^Siddiqi, Asif (December 2004). "Russians in Germany: Founding the Post-War Missile Programme". Europe-Asia Studies. 56 (8): 1134.
31. ^Siddiqi, Asif (December 2004). "Russians in Germany: Founding the Post-War Missile Programme". Europe-Asia Studies. 56 (8): 1135.
32. ^Siddiqi, Asif (December 2004). "Russians in Germany: Founding the Post-War Missile Programme". Europe-Asia Studies. 56 (8): 1135–1136. doi:10.1080/1465342042000308893.
33. ^Chertok, B. (2004). "German Influence in USSR". Acta Astronautica. 55: 735–740. Bibcode:2004AcAau..55..735C. doi:10.1016/j.actaastro.2004.05.025.
34. ^Sutton, George (Nov–Dec 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 15.
35. ^Siddiqi, Asif (July 2003). "The Rockets' Red Glare". Technology and Culture. 44 (3): 493.
36. ^Chertok, B. (2004). "German Influence in USSR". Acta Astronautica. 55: 738. Bibcode:2004AcAau..55..735C. doi:10.1016/j.actaastro.2004.05.025.
37. ^Sutton, George (July 2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 25.
38. ^ abSiddiqi, Asif. "Tupolev". Britannica Academic. Encyclopædia Britannica. Retrieved April 4, 2016.
39. ^ abcSiddiqi, Asif. "Sukhoy". Britannica Academic. Encyclopædia Britannica. Retrieved April 4, 2016.
40. ^ abSiddiqi, Asif. "MiG". Britannica Academic. Encyclopædia Britannica. Retrieved April 4, 2016.
41. ^Stoiko, Michael (1970). Soviet Rocketry. p. 79.
42. ^Stoiko, Michael (1970). Soviet Rocketry. pp. 84–87.
43. ^Stoiko, Michael (1970). Soviet Rocketry. p. 93.
44. ^Sutton, George (2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 10.
45. ^Stoiko, Michael (1970). Soviet Rocketry. p. 95.
46. ^Sutton, George (2003). "History of Liquid-Propellant Rocket Engines". Journal of Propulsion and Power. 19 (6): 19.
47. ^ abStoiko, Michael (1970). Soviet Rocketry. p. 97.

## Bibliography

• Burgess, Colin, and Hall, Rex. The First Soviet Cosmonaut Team: Their Lives, Legacy, and Historical Impact. Berlin: Springer, 2009.
• Chertok, B. (2004). "German Influence in USSR". Acta Astronautica. 55: 735–740. Bibcode:2004AcAau..55..735C. doi:10.1016/j.actaastro.2004.05.025.
• Chertok, B. E. Rockets and People: Volume II. Washington, D.C.: NASA, 2006. Accessed April 7, 2016. https://history.nasa.gov/SP-4110/vol2.pdf.
• Chertok, Boris Evseyevich. Rockets and People: Volume IV: The Moon Race. Washington, D.C.: National Aeronautics and Space Administration, NASA History Office, Office of External Affairs, 2005.
• Columbia Electronic Encyclopedia, 6th Edition. Wernher Von Braun. June 2015. Accessed April 8, 2016.
• Darrin, Ann Garrison and O'Leary, Beth Laura. Handbook of Space Engineering, Archaeology, and Heritage. Boca Raton: Taylor & Francis, 2009.
• Faure, Gunter, and Mensing, Teresa M. Introduction to Planetary Science: The Geological Perspective. Dordrecht: Springer, 2007.
• "Glushko." Encyclopedia Astronautica Glushko. Web, Accessed 08 Apr. 2016. <https://web.archive.org/web/20060830130455/http://astronautix.com/astros/glushko.htm>.
• Hagler, Gina. Modeling Ships and Space Craft: The Science and Art of Mastering the Oceans and Sky. New York: Springer, 2013.
• Harvey, Brian. Russian Planetary Exploration: History, Development, Legacy, Prospects. Berlin: Springer, 2007.
• "Konstantin Tsiolkovsky" Florida International University. Accessed 08 Apr. 2016. <http://www.allstar.fiu.edu/aero/tsiolkovsky.htm>
• "Konstantin Tsiolkovsky." NASA. Accessed 08 Apr. 2016. <https://www.nasa.gov/audience/foreducators/rocketry/home/konstantin-tsiolkovsky.html>
• Lethbridge, Cliff. "History of Rocketry Chapter 6: 1945 to the Creation of NASA." Spaceline. (2000). Accessed April 7, 2016. http://www.spaceline.org/history/6.html.
• MSFC History Office: NASA. Biography of Wernher Von Braun. Accessed April 7, 2016. http://history.msfc.nasa.gov/vonbraun/bio.html.
• "MiG." Encyclopædia Britannica. Britannica Academic. Encyclopædia Britannica Inc. Web, Accessed 04 Apr. 2016. <http://academic.eb.com/EBchecked/topic/748233/MiG>.
• O'Brien, Jason L.; Sears, Christine E. "Victor or Villain? Wernher von Braun and the Space Race". Social Studies. 102 (2).
• "Sergei Korolev." Russian Space Web. Web, Accessed 08 Apr. 2016. <http://www.russianspaceweb.com/korolev.html>
• Siddiqi, Asif A. The Rockets' Red Glare: Spaceflight and the Russian Imagination, 1857-1957. (2004).
• Siddiqi, Asif A. "The Rockets' Red Glare: Technology, Conflict, and Terror in the Soviet Union." Technology and Culture 44, No. 3 (July 2003).
• Siddiqi, Asif A. "Russians in Germany: Founding the Post-War Missile Programme." Europe-Asia Studies 56, No.8 (Dec. 2004).
• Stoiko, Michael. Soviet Rocketry: Past, Present, and Future. New York, Chicago, and San Francisco: Holt, Rinehart, and Winston, 1970.
• Sutton, George P. "History of Liquid-Propellant Rocket Engines in Russia, Formerly the Soviet Union." Journal of Propulsion and Power 19, No.6 (Nov.-Dec. 2003).
• "Sukhoy." Encyclopædia Britannica. Britannica Academic. Encyclopædia Britannica Inc., 2016. Web. 11 Apr. 2016. <http://academic.eb.com/EBchecked/topic/749677/Sukhoy>.
• "Tupolev." Encyclopædia Britannica. Britannica Academic. Encyclopædia Britannica Inc., 2016. Web. 11 Apr. 2016. <http://academic.eb.com/EBchecked/topic/750142/Tupolev>.
• Van Pelt, Michel. Rocketing into the Future: The History and Technology of Rocket Planes. New York: Springer Business and Science Media, 2012.
• West, John. "Historical Aspects of The Early Soviet/Russian Manned Space Program." Journal of Applied Physiology 91.4 (2001).
• "Yury Alekseyevich Gagarin." Encyclopædia Britannica. Britannica Academic. Encyclopædia Britannica Inc., 2016. Web. 20 Apr. 2016. <http://academic.eb.com/EBchecked/topic/223437/Yury-Alekseyevich-Gagarin>.
Members of GIRD. Left to right, standing: I. P. Fortikov, Yu. A. Pobedonostsev, Zabotin; sitting: A. Levitsky, Nadezhda Sumarokova, Sergei Korolev, B. I. Charanovsky, Friedrich Zander.
Katyusha rocket launcher in action.
Tupolev Tu-16 bomber, the first Soviet jet bomber.
Sputnik 1, the first artificial Earth satellite.