Carnegie Mellon University

Model Selection and Stopping Rules for High-Dimensional Forward Selection

Thesis posted on 2018-04-01, authored by Jerzy Wieczorek
Forward Selection (FS) is a popular variable-selection method for linear regression. Working in a sparse high-dimensional setting, we derive sufficient conditions for FS to attain model-selection consistency when the true model size is known. Our conditions are similar to earlier results for the closely related Orthogonal Matching Pursuit (OMP), but they are obtained using a different argument. We also demonstrate why a submodularity-based argument is not fruitful for establishing correct model recovery.
Since the true model size is rarely known in practice, we also derive sufficient conditions for model-selection consistency of FS with a data-driven stopping rule based on a sequential variant of cross-validation (CV). As a by-product of our proofs, we obtain a sharp (sufficient and almost necessary) condition for model-selection consistency when using "wrapper" forward search for linear regression; this appears to be the first consistency result for any wrapper model-selection method. We illustrate the intuition behind our results and demonstrate the performance of our methods using simulation studies and real datasets.
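
For readers unfamiliar with the method, the sketch below illustrates the general idea behind FS with a data-driven stopping rule: variables are added greedily to a linear model, and the search halts when held-out prediction error stops improving. This is only a minimal illustration under stated assumptions (plain K-fold CV on centered data, ordinary-least-squares fits); it is not the thesis's sequential-CV procedure, and the function name forward_selection_cv, its parameters, and the exact stopping criterion are assumptions made for the example.

import numpy as np

def forward_selection_cv(X, y, max_steps=None, n_folds=5, seed=0):
    """Greedy forward selection for linear regression with a K-fold-CV stopping rule.

    Assumes X and y are centered, so no intercept is fit.
    """
    n, p = X.shape
    max_steps = max_steps if max_steps is not None else min(n - 1, p)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold assignment

    def cv_error(support):
        # Mean squared held-out error of the OLS fit on `support`, averaged over folds.
        errs = []
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            if support:
                beta, *_ = np.linalg.lstsq(X[tr][:, support], y[tr], rcond=None)
                pred = X[te][:, support] @ beta
            else:
                pred = np.full(te.sum(), y[tr].mean())  # empty model: predict the mean
            errs.append(np.mean((y[te] - pred) ** 2))
        return np.mean(errs)

    selected = []
    best_err = cv_error(selected)
    for _ in range(max_steps):
        # Greedy FS step: add the candidate that most reduces in-sample RSS.
        candidates = [j for j in range(p) if j not in selected]
        rss = []
        for j in candidates:
            Xs = X[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss.append(np.sum((y - Xs @ beta) ** 2))
        best_j = candidates[int(np.argmin(rss))]

        # Stopping rule: keep the new variable only if CV error improves.
        new_err = cv_error(selected + [best_j])
        if new_err >= best_err:
            break
        selected.append(best_j)
        best_err = new_err
    return selected

# Toy high-dimensional example: n = 100 observations, p = 200 predictors,
# with only the first three predictors active.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 200))
X -= X.mean(axis=0)
y = 3 * X[:, 0] + 2 * X[:, 1] - 1.5 * X[:, 2] + rng.standard_normal(100)
y -= y.mean()
print(forward_selection_cv(X, y))  # typically a list containing 0, 1, and 2

In this toy setting the design is high-dimensional (p > n) but the true model is sparse, so the greedy search typically picks up the three active variables before the CV-based rule stops it.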

History

Date

2018-04-01

Degree Type

  • Dissertation

Department

  • Statistics

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Jing Lei
