This paper evaluates how autoencoder variants with different architectures and parameter settings affect the quality of 2D projections for spatial ensembles, and proposes a guided selection approach based on partially labeled data. Extracting features with autoencoders prior to applying techniques like UMAP substantially enhances the projection results and better conveys spatial structures and spatio-temporal behavior. Our comprehensive study demonstrates that the choice of variant has a substantial impact, and shows that which variants yield the best projection results is highly data-dependent. We therefore propose to guide the selection of an autoencoder configuration for a specific ensemble based on projection metrics. These metrics rely on labels, which are, however, prohibitively time-consuming to obtain for the full ensemble. To address this, we demonstrate that a small subset of labeled members suffices for choosing an autoencoder configuration. We discuss results for various types of autoencoders applied to two fundamentally different ensembles with thousands of members: channel structures in soil from Markov chain Monte Carlo and time-dependent experimental data on droplet-film interaction.
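
The guided selection can be illustrated with a minimal sketch. It assumes that each candidate autoencoder configuration has already produced a 2D projection of the ensemble, and it scores each projection on the labeled subset only. The metric used here, the fraction of a member's k nearest labeled neighbors that share its label (often called neighborhood hit), is a stand-in chosen for illustration and is not necessarily one of the metrics used in the study. The configuration names `conv_ae` and `dense_ae`, the toy coordinates, and both function names are hypothetical.

```python
import math

def neighborhood_hit(points, labels, k=2):
    """Mean fraction of each point's k nearest neighbors that share its label."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # Rank all other labeled points by Euclidean distance in the 2D projection.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        nearest = others[:k]
        total += sum(labels[j] == labels[i] for j in nearest) / len(nearest)
    return total / n

def select_configuration(projections, labeled_idx, labels, k=2):
    """Score every candidate projection on the labeled subset; return the best."""
    scores = {}
    for name, projection in projections.items():
        # Only the labeled members contribute to the score.
        subset = [projection[i] for i in labeled_idx]
        scores[name] = neighborhood_hit(subset, labels, k)
    best = max(scores, key=scores.get)
    return best, scores

# Toy data (hypothetical): 8 ensemble members, 2D projections per configuration.
projections = {
    "conv_ae": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
                (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)],
    "dense_ae": [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0),
                 (0.0, 0.1), (5.0, 5.1), (0.1, 0.1), (5.1, 5.1)],
}
labeled_idx = [0, 1, 2, 5, 6, 7]          # only a subset of members is labeled
labels = ["A", "A", "A", "B", "B", "B"]   # labels aligned with labeled_idx

best, scores = select_configuration(projections, labeled_idx, labels, k=2)
print(best, scores)
```

In this toy setup the first projection separates the two labeled classes cleanly and therefore receives the higher score. In practice, the projections would come from applying a technique like UMAP to the features extracted by each autoencoder configuration.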