The Annals of Statistics

Principal support vector machines for linear and nonlinear sufficient dimension reduction

Bing Li, Andreas Artemiou, and Lexin Li



We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient dimension reduction. The basic idea is to divide the response variables into slices and use a modified form of support vector machine to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, √n-consistent, and asymptotically normal estimator of the sufficient dimension reduction space. The method is then generalized to nonlinear sufficient dimension reduction using the reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient dimension reduction methods by simulation and in real data analysis, and through both comparisons firmly establish its practical advantages.
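The linear procedure summarized in the abstract (slice the response, fit a separating hyperplane per cut point, then align the normal vectors by principal components) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the penalty weight `lam`, the step size, the number of slices, and the toy data are all assumptions made for illustration. Plain subgradient descent on a covariance-weighted hinge-loss objective stands in for a proper SVM solver.

```python
import numpy as np

def fit_hyperplane(Xc, labels, Sigma, lam=1.0, steps=500, lr=0.05):
    """Subgradient descent on psi' Sigma psi + lam * mean(hinge loss).

    Illustrative stand-in for an SVM solver; all defaults are assumptions.
    """
    n, p = Xc.shape
    psi = np.zeros(p)
    t = 0.0
    for _ in range(steps):
        margin = labels * (Xc @ psi - t)
        active = margin < 1.0  # points that violate the margin
        # Subgradient of the objective with respect to psi and t.
        g_psi = 2.0 * Sigma @ psi - lam * (Xc[active].T @ labels[active]) / n
        g_t = lam * labels[active].sum() / n
        psi -= lr * g_psi
        t -= lr * g_t
    return psi

def psvm_directions(X, y, d=1, n_slices=5, lam=1.0):
    """Return d estimated directions from the aligned normal vectors."""
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    # Cut points that divide the response into n_slices slices.
    cuts = np.quantile(y, np.arange(1, n_slices) / n_slices)
    p = X.shape[1]
    M = np.zeros((p, p))
    for q in cuts:
        labels = np.where(y > q, 1.0, -1.0)
        psi = fit_hyperplane(Xc, labels, Sigma, lam)
        M += np.outer(psi, psi)
    # Principal components of the normal vectors: leading eigenvectors of M.
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]

# Toy data (assumed): the response depends on X only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 5))
y = X[:, 0] + 0.2 * rng.standard_normal(400)

B = psvm_directions(X, y, d=1)
alignment = abs(B[0, 0])  # |cosine| between the estimate and the first axis
print(B.ravel(), alignment)
```

In this toy setting the estimated direction should load mainly on the first coordinate. A quadratic-programming SVM solver would be the more usual choice in practice than the fixed-step subgradient loop used here.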

Article information

Ann. Statist. Volume 39, Number 6 (2011), 3182–3210.

First available in Project Euclid: 5 March 2012


Primary: 62-09: Graphical methods; 62G08: Nonparametric regression; 62H12: Estimation

Keywords: contour regression; invariant kernel; inverse regression; principal components; reproducing kernel Hilbert space; support vector machine


Li, Bing; Artemiou, Andreas; Li, Lexin. Principal support vector machines for linear and nonlinear sufficient dimension reduction. Ann. Statist. 39 (2011), no. 6, 3182–3210. doi:10.1214/11-AOS932.



