1 Overview

In Tang et al. (2024), we developed the simplified MyProstateScore2.0 (sMPS2), a 7-gene urine test that achieves state-of-the-art diagnostic accuracy for predicting high-grade prostate cancer, similar to that of the original 18-gene MyProstateScore2.0 (MPS2) (Tosoian et al. 2024). This simplified biomarker test provides a more cost-effective alternative to the original MPS2 test and greatly increases its accessibility for routine clinical care.

In this PCS documentation (Yu and Kumbier 2020), we expand upon the sMPS2 model development pipeline, transparently documenting and justifying human judgment calls (including data preprocessing and modeling decisions) when possible. We also provide additional visualizations and stability analyses to further support the robustness and generalizability of the sMPS2 test.

2 Exploratory Data Analysis

In this section, we provide a brief exploration of the Development Cohort data, which was used to develop the simplified MyProstateScore2.0 (sMPS2) test. This Development Cohort consists of 761 samples and was used to build the original MPS2 models (Tosoian et al. 2024).

Below, we visualize the (marginal) distribution of each gene and clinical variable, grouped by prostate cancer (PCa) grade. Some interesting observations:

  • The expression of some genes, such as PCA3 and T2:ERG (used in the original MPS test (Tomlins et al. 2011)), shows clear differences between low- and high-grade prostate cancers (Figure 2.2).
  • While PSA, by itself, appears to have limited diagnostic accuracy in this cohort, prostate volume shows greater potential for distinguishing between low- and high-grade PCa (Figure 2.4).
  • We emphasize that these are merely observations based on marginal analyses. More formal analyses will be conducted in subsequent sections.

We also plot a correlation heatmap (Figure 2.3), showing the pairwise correlations between the expression levels of each pair of genes (as measured via their Ct values). This correlation heatmap shows that there are indeed strong positive correlations within groups of genes, which can complicate the interpretation and affect the stability of feature importances.

Primary Outcome

Figure 2.1: Number of high-grade and low-grade prostate cancer (PCa) patients in Development Cohort.

Gene Expression Data

Figure 2.2: Distribution of Ct values in Development Cohort for each gene by prostate cancer (PCa) grade.

Figure 2.3: Correlation heatmap of gene expression (Ct values) in Development Cohort data. Genes have been clustered using hierarchical clustering.

Clinical Variables

Figure 2.4: Distribution of clinical features in Development Cohort by prostate cancer (PCa) grade.

3 Data Preprocessing Choices

As discussed in Tang et al. (2024), gene expression in each urine sample was measured via the cycle threshold (Ct) using qPCR profiling across 54 genes. These 54 genes were previously nominated as potential biomarkers for prostate cancer (PCa) detection in the MPS2 study (Tosoian et al. 2024) and are thus of interest here. In what follows, we recap the data preprocessing procedure used in this study (also described in Tang et al. (2024)) and provide additional justification for our judgment calls wherever possible.

As a starting point, we preprocessed the expression data as in the original MPS2 study (Tosoian et al. 2024):

  1. We set the upper Ct value limit to 35. Specifically, Ct values greater than this limit were considered undetected and set to 35. Ct values from OpenArray that were “Undetermined” or “Inconclusive/No Amp” were also considered to be undetected and set to the upper Ct value limit of 35.

    • While it is arguably common practice to set the upper Ct value limit to 40, previous work has shown that setting the Ct value limit to 40 can often introduce unwanted biases and that setting this limit to 35 can effectively reduce this bias (McCall et al. 2014).
  2. We computed the standard deviation (SD) across 3 technical replicates. If SD \(\geq\) 1, the replicate farthest from the mean was removed; otherwise, all 3 replicates were kept. This is to help filter out poor quality replicates.

  3. We computed the average Ct value across the remaining technical replicates.

  4. All samples with an average Ct value of the reference gene KLK3 above the 95th percentile were removed.

    • Note that KLK3 is a well-known prostate marker. Hence, if the urine sample does not contain detectable levels of KLK3 (i.e., the average Ct value for KLK3 is high), it is unlikely that the sample will contain detectable levels of other prostate cancer biomarkers. We thus performed this filtering step to remove poor quality samples.
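  5. We normalized the average Ct values for each target gene by KLK3 using the formula \(-(\overline{\mathrm{Ct}}_{X} - \overline{\mathrm{Ct}}_{\mathrm{KLK3}})\), where \(\overline{\mathrm{Ct}}_{X}\) denotes the average Ct of target gene X and \(\overline{\mathrm{Ct}}_{\mathrm{KLK3}}\) the average Ct of KLK3. Because a lower Ct corresponds to higher expression, the leading negative sign makes larger normalized values correspond to higher expression relative to KLK3.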

  6. Finally, z-score scaling was applied to the normalized average Ct before downstream model development and feature selection.
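As a rough illustration of steps 1 through 6, the following minimal Python sketch applies the base preprocessing pipeline to a small nested dictionary of replicate readings. It is not the code used in the study: the function names (`cap_ct`, `average_replicates`, `preprocess`), the input layout, the use of the sample SD, and the percentile interpolation method are all assumptions made for illustration.

```python
import statistics

CT_LIMIT = 35.0  # upper Ct limit from step 1
UNDETECTED = {"Undetermined", "Inconclusive/No Amp"}


def cap_ct(raw):
    """Step 1: map undetected or above-limit readings to the Ct limit."""
    if raw in UNDETECTED:
        return CT_LIMIT
    return min(float(raw), CT_LIMIT)


def average_replicates(raw_replicates):
    """Steps 2-3: drop the replicate farthest from the mean if SD >= 1, then average."""
    cts = [cap_ct(r) for r in raw_replicates]
    # Assumption: sample SD (n - 1 denominator); the text does not specify which SD.
    if len(cts) == 3 and statistics.stdev(cts) >= 1:
        mean = statistics.fmean(cts)
        farthest = max(cts, key=lambda c: abs(c - mean))
        cts.remove(farthest)
    return statistics.fmean(cts)


def preprocess(samples, reference="KLK3"):
    """samples: {sample_id: {gene: [three raw replicate readings]}}."""
    avg = {sid: {g: average_replicates(r) for g, r in genes.items()}
           for sid, genes in samples.items()}

    # Step 4: drop samples whose reference-gene Ct exceeds the 95th percentile.
    # Assumption: "inclusive" interpolation; other conventions shift the cutoff slightly.
    ref_cts = [genes[reference] for genes in avg.values()]
    cutoff = statistics.quantiles(ref_cts, n=100, method="inclusive")[94]
    kept = {sid: genes for sid, genes in avg.items() if genes[reference] <= cutoff}

    # Step 5: normalize each target gene as -(Ct_gene - Ct_reference).
    norm = {sid: {g: -(ct - genes[reference]) for g, ct in genes.items() if g != reference}
            for sid, genes in kept.items()}

    # Step 6: z-score each gene across the retained samples (needs >= 2 samples).
    for g in sorted(next(iter(norm.values()))):
        values = [norm[sid][g] for sid in norm]
        mu, sd = statistics.fmean(values), statistics.stdev(values)
        for sid in norm:
            norm[sid][g] = (norm[sid][g] - mu) / sd if sd > 0 else 0.0
    return norm
```

Calling `preprocess` on a dictionary keyed by sample ID returns the z-scored, KLK3-normalized values for the retained samples only, so samples removed in step 4 do not influence the scaling in step 6.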

We refer to this data preprocessing pipeline as the base preprocessing pipeline. However, there are several alternative but equally reasonable ways to deal with undetectable Ct values and poor quality samples during data preprocessing. While we cannot explore all possible preprocessing choices, we do explore a few alternatives in this work in order to improve the robustness of our model and conclusions. Namely, we considered the following alternative preprocessing pipelines:

  • Ct limit = 40 preprocessing pipeline: Rather than setting the upper Ct value limit to 35 for undetected replicates, we instead follow popular practice and set the upper Ct value limit to 40. All other preprocessing steps remain unchanged from the base preprocessing pipeline.
  • Normalized Ct limit = -21 preprocessing pipeline: In the aforementioned data preprocessing pipelines, the Ct values for undetected replicates were set prior to the normalization of the Ct values. Thus, the normalized Ct value for undetected replicates differs between genes. For comparison, in this preprocessing pipeline, we instead replace the Ct values for all undetected replicates after Ct normalization to have a constant value of -21 (which was the lowest Ct value post-normalization). All other preprocessing steps remain unchanged from the base preprocessing pipeline.
  • No sample exclusion preprocessing pipeline: Rather than excluding all samples with an average Ct value of the reference gene KLK3 above the 95th percentile, this preprocessing pipeline does not exclude any samples based upon their Ct value for the reference gene KLK3. This is to assess whether or not the exclusion of samples based upon their Ct value for the reference gene KLK3 impacts downstream conclusions. All other data preprocessing steps remain unchanged from the base preprocessing pipeline.
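To make the differences between the four pipelines concrete, here is a small sketch that encodes each pipeline as a configuration object and shows how an undetected reading is treated under each. The class `PreprocessConfig`, its field names, the pipeline labels, and the helper `normalized_ct` are hypothetical, and the helper works on a single reading for simplicity rather than reproducing the full replicate-averaging procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreprocessConfig:
    name: str
    # Value assigned to undetected readings before normalization.
    ct_limit: float = 35.0
    # If set, value assigned to undetected readings after normalization.
    normalized_floor: Optional[float] = None
    # KLK3 percentile above which samples are excluded; None disables exclusion.
    klk3_percentile_cutoff: Optional[float] = 95.0


PIPELINES = [
    PreprocessConfig("base"),
    PreprocessConfig("ct_limit_40", ct_limit=40.0),
    PreprocessConfig("normalized_ct_limit_-21", normalized_floor=-21.0),
    PreprocessConfig("no_sample_exclusion", klk3_percentile_cutoff=None),
]


def normalized_ct(gene_ct, klk3_ct, undetected, config):
    """Normalized Ct for one reading under a given pipeline (illustrative only)."""
    if undetected and config.normalized_floor is not None:
        # Constant value regardless of the sample's KLK3 level.
        return config.normalized_floor
    if undetected:
        gene_ct = config.ct_limit
    return -(gene_ct - klk3_ct)
```

Under the first two configurations, the normalized value of an undetected reading depends on the KLK3 level it is normalized against, whereas under the third it is the same constant everywhere.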

In addition to these preprocessed gene expression data, we also have access to various clinical data for each sample. We chose to focus on the following clinical variables for model development, as they are both known to be associated with high-grade prostate cancer and generally available in clinical practice: age, race, family history of prostate cancer, abnormal DRE, prior negative biopsy, and prostate specific antigen (PSA) (Thompson et al. 2006).

4 Modeling Choices

For each preprocessed dataset, we trained many different statistical/machine learning models to predict high-grade prostate cancer (PCa). Specifically, we considered the following models:

Logistic-based Models:

  • Logistic regression
  • Logistic regression with \(L_1\) (LASSO) regularization
  • Logistic regression with \(L_2\) (ridge) regularization
  • Logistic regression with combined \(L_1\) + \(L_2\) (elastic net) regularization

Tree-based Models:

  • Random forests (RF)
  • Gradient boosting decision trees (GBDT)
  • RuleFit
  • Random forests+ (RF+)
  • Fast interpretable greedy-tree sums (FIGS)

We focused on these logistic- and tree-based models given the importance of interpretability and our goal of identifying important genes for reliable biomarker development. While the logistic-based models are generally thought to serve as baseline models, the tree-based models can provide greater flexibility to capture more complex relationships between genes and the outcome of interest without sacrificing interpretability. We note that other flexible but interpretable machine learning models could also be considered and may be of interest in future work. However, in this current work, we chose to first focus on these logistic-based and tree-based models. Tree-based models are often particularly well suited for biological tasks such as this, in part due to the resemblance between the thresholding behavior of decision trees and the on-off switch-like behavior commonly thought to govern genetic processes (Nelson, Lehninger, and Cox 2008).

We detailed the hyperparameters and Python implementation used for each model in Table 1 in Tang et al. (2024). Hyperparameters were tuned using 5-fold cross-validation.

5 Prediction Check

As the first step in the sMPS2 model development pipeline, we performed a prediction check to filter out models which have poor prediction performance and thus may not accurately reflect reality (Yu and Kumbier 2020). Here, guided by the PCS framework, we use prediction performance as a reality check and a minimum requirement for interpretability. Moreover, we assess the prediction performance not only for different models but also for different data preprocessing pipelines. Examining multiple prediction metrics (i.e., area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and classification accuracy), we found that:

  • The prediction performance was quite stable across different data preprocessing pipelines. Notably, the variation in prediction performance across data preprocessing pipelines (blue barplot) was substantially smaller than the variation in prediction performance across models (pink barplot). These observations suggest that downstream conclusions are robust to these data preprocessing choices.
  • Ordinary logistic regression appears to have reasonable prediction performance (only \(\sim 1\%\) lower than the best-performing model in terms of AUROC). Given its simplicity, we chose to use logistic regression as the baseline model to determine whether or not other models passed the prediction check.
  • RuleFit, GBDT, and FIGS, on average, performed worse than logistic regression across the different prediction performance metrics, suggesting that they may not be appropriate fits for this data. RF also performed slightly worse than logistic regression on average. However, unlike RuleFit, GBDT, and FIGS, RF yielded higher prediction performance (measured via AUROC, AUPRC, and classification accuracy) than logistic regression in at least one data preprocessing pipeline. We thus excluded RuleFit, GBDT, and FIGS (but not RF) from the remainder of the model development pipeline.

We defer additional methodological details on how this prediction check was conducted to Tang et al. (2024).
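The screening logic described above can be sketched as follows. This is one possible reading of the rule, not the procedure from Tang et al. (2024): we assume a model passes if, in at least one data preprocessing pipeline, it matches or exceeds the baseline on every metric. The function names and the nested-dictionary layout of `scores` are hypothetical. The second function mirrors the two summaries that compare variation across pipelines with variation across models.

```python
from statistics import fmean


def passes_prediction_check(scores, model, baseline="logistic"):
    """scores: {model: {pipeline: {metric: mean validation value}}}.

    Assumed rule: pass if the model is at least as good as the baseline on
    every metric in at least one data preprocessing pipeline.
    """
    for pipeline, baseline_metrics in scores[baseline].items():
        model_metrics = scores[model][pipeline]
        if all(model_metrics[m] >= baseline_metrics[m] for m in baseline_metrics):
            return True
    return False


def pipeline_vs_model_spread(scores, metric):
    """Return (range across pipelines per model, gap to the best model's mean)."""
    mean_by_model = {}
    range_across_pipelines = {}
    for model, by_pipeline in scores.items():
        values = [metrics[metric] for metrics in by_pipeline.values()]
        mean_by_model[model] = fmean(values)
        range_across_pipelines[model] = max(values) - min(values)
    best = max(mean_by_model.values())
    gap_to_best = {model: best - mean for model, mean in mean_by_model.items()}
    return range_across_pipelines, gap_to_best
```

If the ranges across pipelines are much smaller than the gaps to the best model, the choice of model matters more than the choice of preprocessing, which is the pattern reported in the bullets above.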

AUROC

Figure 5.1: (Left) For each choice of data preprocessing and prediction model, the validation AUROC, averaged across 4 CV folds and 10 repeated Development-Test splits, is shown. The error bars represent the inner 95% quantile range of the distribution of AUROCs. (Middle, Right) We compare the variation in AUROC across data preprocessing pipelines and methods. In the middle subplot, we show the range of mean AUROCs across the four data preprocessing pipelines for each method. In the right subplot, we show the difference between the mean AUROC from each method and the best performing method (i.e., logistic regression with the elastic net penalty) across all data preprocessing pipelines. The difference in AUROCs across data preprocessing pipelines is substantially smaller than that across prediction methods, suggesting that the development pipeline and downstream findings are robust to data preprocessing choices.

AUPRC

Figure 5.2: (Left) For each choice of data preprocessing and prediction model, the validation AUPRC, averaged across 4 CV folds and 10 repeated Development-Test splits, is shown. The error bars represent the inner 95% quantile range of the distribution of AUPRCs. (Middle, Right) We compare the variation in AUPRC across data preprocessing pipelines and methods. In the middle subplot, we show the range of mean AUPRCs across the four data preprocessing pipelines for each method. In the right subplot, we show the difference between the mean AUPRC from each method and the best performing method (i.e., logistic regression with the elastic net penalty) across all data preprocessing pipelines. The difference in AUPRCs across data preprocessing pipelines is substantially smaller than that across prediction methods, suggesting that the development pipeline and downstream findings are robust to data preprocessing choices.

Accuracy

Figure 5.3: (Left) For each choice of data preprocessing and prediction model, the validation accuracy, averaged across 4 CV folds and 10 repeated Development-Test splits, is shown. The error bars represent the inner 95% quantile range of the distribution of accuracies. (Middle, Right) We compare the variation in accuracy across data preprocessing pipelines and methods. In the middle subplot, we show the range of mean accuracies across the four data preprocessing pipelines for each method. In the right subplot, we show the difference between the mean accuracy from each method and the best performing method (i.e., logistic regression with the elastic net penalty) across all data preprocessing pipelines. The difference in accuracies across data preprocessing pipelines is substantially smaller than that across prediction methods, suggesting that the development pipeline and downstream findings are robust to data preprocessing choices.

6 Stability-driven Gene Ranking

After filtering out the poor-performing prediction models, we sought next to identify the topmost important genes, which were stably important across all four data preprocessing pipelines, six prediction-checked models, and ten Development-Test splits (i.e., \(4 \times 6 \times 10 = 240\) combinations). Details on how we computed the gene importances for each model are discussed in Tang et al. (2024).

We instead use this opportunity to conduct a stability analysis of the PCS-ensembled gene rankings. As discussed in Tang et al. (2024), the obtained PCS-ensembled gene rankings are an ensemble of gene rankings across four different data preprocessing pipelines and six prediction-checked models (RF, RF+, logistic elastic net, logistic LASSO, logistic ridge, and ordinary logistic regression). We chose to use these six prediction models since each passed the prediction check. However, it is natural to wonder whether the PCS-ensembled gene rankings would change if a different subset of the prediction-checked prediction models were used. In particular, since the original set of prediction models consisted of four logistic-based models and two tree-based models, we investigated how the PCS-ensembled gene rankings would change if we used a “balanced” set of prediction models, composed of two logistic-based models and two tree-based models.

Below in the Gene Ranking Summary tab, we examined the PCS-ensembled gene rankings, ensembled across all four data preprocessing pipelines and

  • all six prediction-checked models (RF, RF+, logistic elastic net, logistic LASSO, logistic ridge, and ordinary logistic regression)
  • two logistic-based models (logistic elastic net and logistic ridge) and the two tree-based models (RF and RF+)
  • two logistic-based models (logistic elastic net and logistic LASSO) and the two tree-based models (RF and RF+)
  • two logistic-based models (logistic elastic net and ordinary logistic regression) and the two tree-based models (RF and RF+)

Here, we chose to always include logistic elastic net in the PCS ensemble as it demonstrated the highest predictive power in the prediction check step.
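The ensembles listed above, and the summary statistics reported for them (mean rank, SD of rank, and the proportion of fits in which a gene lands in the top k), can be sketched as follows. The model and pipeline labels, the dictionary layout of `ranks`, and the function names are hypothetical. How each individual gene ranking is obtained is described in Tang et al. (2024) and is not reproduced here.

```python
from itertools import product
from statistics import fmean, stdev

PIPELINES = ["base", "ct_limit_40", "normalized_ct_limit_-21", "no_sample_exclusion"]
ALL_MODELS = ["logistic", "elastic_net", "lasso", "ridge", "rf", "rf_plus"]
# Balanced ensembles: elastic net plus one other logistic-based model, with RF and RF+.
BALANCED_ENSEMBLES = {
    "enet_ridge": ["elastic_net", "ridge", "rf", "rf_plus"],
    "enet_lasso": ["elastic_net", "lasso", "rf", "rf_plus"],
    "enet_logistic": ["elastic_net", "logistic", "rf", "rf_plus"],
}
N_SPLITS = 10


def fit_keys(models):
    """All (pipeline, model, split) combinations for a set of models."""
    return list(product(PIPELINES, models, range(N_SPLITS)))


def summarize_rankings(ranks, models, top_ks=(5, 10, 17)):
    """ranks: {(pipeline, model, split): {gene: rank}}, where rank 1 is most important.

    Returns (gene, summary) pairs sorted by mean rank across the selected fits.
    """
    keys = [k for k in fit_keys(models) if k in ranks]
    genes = list(ranks[keys[0]])
    summary = {}
    for gene in genes:
        gene_ranks = [ranks[k][gene] for k in keys]
        stats = {
            "mean_rank": fmean(gene_ranks),
            "sd_rank": stdev(gene_ranks) if len(gene_ranks) > 1 else 0.0,
        }
        for k in top_ks:
            stats[f"top_{k}"] = sum(r <= k for r in gene_ranks) / len(gene_ranks)
        summary[gene] = stats
    return sorted(summary.items(), key=lambda item: item[1]["mean_rank"])
```

With all six models, `fit_keys` yields the 240 combinations mentioned earlier, and each balanced ensemble yields 160. Comparing the output of `summarize_rankings` across ensembles shows how much a single model contributes to a gene's mean rank and rank variability.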

Takeaways from this stability analysis of the PCS-ensembled gene rankings:

  • Besides the shuffled ranking of PCA3, the top gene rankings are the same across the different PCS ensembles:
    • Order of top-ranked genes when including all methods or the balanced ensemble with logistic elastic net, logistic, RF, and RF+: T2:ERG, SCHLAP1, OR51E2, PCAT14, TFF3, PCA3, APOC1
    • Order of top-ranked genes using the two balanced method ensembles, excluding logistic regression: T2:ERG, SCHLAP1, PCA3, OR51E2, PCAT14, TFF3, APOC1
  • When ensembling across all six prediction-checked models, PCA3 was the 5th ranked gene according to its mean rank, ranked in the top 5 in almost 50% of the fits, but appeared to have mild instability (seen by the moderate SD of its ranking distribution) (Figure 6.1). However, when excluding the ordinary logistic regression model (which performed worse than the regularized logistic regression models in the prediction check (Figure 5.1)) from the balanced PCS ensembles, PCA3 became the 3rd ranked gene according to its mean rank, ranked in the top 5 in over 60% of the fits, and exhibited far greater stability than before (Figures 6.2-6.3). This boost in ranking and stability supports the case for including PCA3 in the final sMPS2 model.
  • While APOC1 appears to be a stably ranked top 10 gene in the initial PCS ensemble using all six prediction-checked models, we found that APOC1 only appeared in the top 10 in approximately 50% of the fits when using the balanced PCS ensembles (Figures 6.2-6.4). The top 6 genes, in contrast, appeared in the top 10 in >70% (and often >85%) of the fits when using the balanced PCS ensembles. This contrast is further seen by the stark drop in the top 10 stability plot between TFF3 and APOC1, possibly suggesting that APOC1 is not as stably important as the other top genes and perhaps should be excluded from the final sMPS2 model.
  • Across the different PCS ensembles, the 7th-ranked gene (CAMKK2 in the original PCS ensemble and ERG in the balanced PCS ensembles) and onward appear to have substantially more unstable gene rankings compared to the top 6 genes (as made evident in the SD Rank subplots). For this reason, we chose not to include these genes in the final sMPS2 model.
  • Overall, this stability analysis provided crucial information to help us decide which genes to include in the final sMPS2 model. More specifically, using evidence provided by this stability analysis, we decided to include T2:ERG, SCHLAP1, OR51E2, PCAT14, TFF3, and PCA3 in the final s7MPS2 model. Given the borderline status of APOC1, we also developed the s8MPS2 model, which includes all of the genes from s7MPS2 and APOC1.

To supplement this stability analysis, we also provide a more granular view of the gene rankings per data preprocessing pipeline and model in the Gene Ranking Heatmap tab. These heatmaps showcase both genes that are stably important across all data preprocessing pipelines and methods as well as genes that are stably important across only a subset of data preprocessing pipelines and/or methods. In particular, these heatmaps confirm that the logistic regression model drives much of the instability that we observed previously in the PCA3 gene rankings.

Gene Ranking Summary

Aggregating All Methods

Figure 6.1: Summary of the gene importance rankings, as measured by their mean gene ranking across four data preprocessing pipelines and 6 prediction-checked models (Logistic, Logistic Elastic Net, Logistic Lasso, Logistic Ridge, RF, RF+), the variability of their gene rankings as measured by the standard deviation (SD) of this distribution, and the proportion of times that the gene appeared in the top 5, 10, and 17 genes. The six genes highlighted in dark teal were used in the s7MPS2 model. The APOC1 gene, highlighted in light teal, was used in the s8MPS2 model.

Aggregating RF, RF+, Elastic Net, Ridge

Figure 6.2: Summary of the gene importance rankings, as measured by their mean gene ranking across four data preprocessing pipelines and 4 prediction-checked models (Logistic Elastic Net, Logistic Ridge, RF, RF+), the variability of their gene rankings as measured by the standard deviation (SD) of this distribution, and the proportion of times that the gene appeared in the top 5, 10, and 17 genes. The six genes highlighted in dark teal were used in the s7MPS2 model. The APOC1 gene, highlighted in light teal, was used in the s8MPS2 model.

Aggregating RF, RF+, Elastic Net, Lasso

Figure 6.3: Summary of the gene importance rankings, as measured by their mean gene ranking across four data preprocessing pipelines and 4 prediction-checked models (Logistic Elastic Net, Logistic Lasso, RF, RF+), the variability of their gene rankings as measured by the standard deviation (SD) of this distribution, and the proportion of times that the gene appeared in the top 5, 10, and 17 genes. The six genes highlighted in dark teal were used in the s7MPS2 model. The APOC1 gene, highlighted in light teal, was used in the s8MPS2 model.

Aggregating RF, RF+, Elastic Net, Logistic

Figure 6.4: Summary of the gene importance rankings, as measured by their mean gene ranking across four data preprocessing pipelines and 4 prediction-checked models (Logistic, Logistic Elastic Net, RF, RF+), the variability of their gene rankings as measured by the standard deviation (SD) of this distribution, and the proportion of times that the gene appeared in the top 5, 10, and 17 genes. The six genes highlighted in dark teal were used in the s7MPS2 model. The APOC1 gene, highlighted in light teal, was used in the s8MPS2 model.

Gene Ranking Heatmap

Figure 6.5: Heatmap of the mean gene ranking (averaged across 10 Development-Test splits) per data preprocessing pipeline and model choice.

Figure 6.6: Heatmap of the gene ranking per data preprocessing pipeline, model, and Development-Test split. Each row corresponds to a different Development-Test split for the data preprocessing pipeline and model choice labeled on the right.

7 Validation

We next assessed the impact of the choice of gene panel size (i.e., the number of top-ranked genes used in the sMPS2 model) and the gene ranking strategy (i.e., model-specific versus model-ensembled versus PCS-ensembled) on the prediction accuracy, evaluated on the test set (from the Development-Test split). Test prediction accuracies are averaged across 10 different Development-Test splits. In this section, we summarize these test prediction results (measured via AUROC, AUPRC, and classification accuracy) across the different gene panel sizes, gene ranking strategies, data preprocessing pipelines, and model choices. In general, the results suggest that 6 or 7 genes are sufficient to achieve competitive prediction performance, and that the PCS-ensembled gene ranking strategy is the most robust across different data preprocessing pipelines and model choices.
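The panel-size comparison described above can be sketched as a loop over the top-k genes of a ranking, scoring each panel on the test portion of every Development-Test split and averaging. This is a schematic only: `evaluate_panel_sizes` and the `fit_and_predict` callable are hypothetical placeholders for whichever model is being trained, and the rank-based `auroc` helper is a standard formulation rather than the implementation used in the study.

```python
from statistics import fmean


def auroc(labels, scores):
    """Rank-based AUROC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_panel_sizes(ranked_genes, splits, fit_and_predict, panel_sizes):
    """Mean test AUROC per panel size, averaged across Development-Test splits.

    ranked_genes: genes ordered from most to least important.
    splits: list of (dev_data, test_data, test_labels) tuples.
    fit_and_predict: hypothetical callable (genes, dev_data, test_data) -> test scores.
    """
    results = {}
    for k in panel_sizes:
        panel = ranked_genes[:k]  # top-k genes under the given ranking
        results[k] = fmean(
            auroc(test_labels, fit_and_predict(panel, dev, test))
            for dev, test, test_labels in splits
        )
    return results
```

Running the same loop with rankings from different strategies (model-specific, model-ensembled, PCS-ensembled) gives one curve per strategy, in the same shape as the comparisons summarized in this section.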

In Tang et al. (2024), we additionally conducted and detailed an external validation study, which confirmed the strong prediction accuracy of the sMPS2 models. However, given the blinded nature of the external validation study, the data is not accessible by the co-first authors for use in this PCS documentation. We refer the reader to the original publication for details.

AUROC

Figure 7.1: Mean AUROC, evaluated on the test set, when training various models (rows) using various choices of gene panel sizes (x-axis), data preprocessing pipelines (columns), and gene rankings (color). The PCS-ensembled gene rankings (in black) generally yield the highest test AUROCs compared to other procedures for obtaining the gene rankings. Moreover, using 6 or 7 predictor genes (vertical dotted and dashed lines, respectively) yields very competitive test prediction performance and is often comparable to the highest achieved AUROC.

AUPRC

Figure 7.2: Mean AUPRC, evaluated on the test set, when training various models (rows) using various choices of gene panel sizes (x-axis), data preprocessing pipelines (columns), and gene rankings (color). The PCS-ensembled gene rankings (in black) generally yield the highest test AUPRCs compared to other procedures for obtaining the gene rankings. Moreover, using 6 or 7 predictor genes (vertical dotted and dashed lines, respectively) yields very competitive test prediction performance and is often comparable to the highest achieved AUPRC.

Accuracy

Figure 7.3: Mean accuracy, evaluated on the test set, when training various models (rows) using various choices of gene panel sizes (x-axis), data preprocessing pipelines (columns), and gene rankings (color). The PCS-ensembled gene rankings (in black) generally yield the highest test accuracies compared to other procedures for obtaining the gene rankings. Moreover, using 6 or 7 predictor genes (vertical dotted and dashed lines, respectively) yields very competitive test prediction performance and is often comparable to the highest achieved accuracy.

8 Final Remarks

In this PCS documentation (Yu and Kumbier 2020), we have shed additional light on the various decisions that were made throughout the development of the sMPS2 model and have justified many of these choices to the best of our ability. While we acknowledge that other equally-reasonable choices could have been made, we hope that this documentation will be a useful resource for researchers and clinicians who are interested in building upon this work.

Bibliography

McCall, Matthew N, Helene R McMurray, Hartmut Land, and Anthony Almudevar. 2014. “On Non-Detects in qPCR Data.” Bioinformatics 30 (16): 2310–16.
Nelson, David L, Albert L Lehninger, and Michael M Cox. 2008. Lehninger Principles of Biochemistry. Macmillan.
Tang, Tiffany M, Yuping Zhang, Ana M Kenney, Cassie Xie, Yingye Zheng, Jeffrey J Tosoian, Lanbo Xiao, et al. 2024. “A Simplified MyProstateScore2.0 for High-Grade Prostate Cancer.”
Thompson, Ian M, Donna Pauler Ankerst, Chen Chi, Phyllis J Goodman, Catherine M Tangen, M Scott Lucia, Ziding Feng, Howard L Parnes, and Charles A Coltman Jr. 2006. “Assessing Prostate Cancer Risk: Results from the Prostate Cancer Prevention Trial.” Journal of the National Cancer Institute 98 (8): 529–34.
Tomlins, Scott A, Sheila MJ Aubin, Javed Siddiqui, Robert J Lonigro, Laurie Sefton-Miller, Siobhan Miick, Sarah Williamsen, et al. 2011. “Urine TMPRSS2: ERG Fusion Transcript Stratifies Prostate Cancer Risk in Men with Elevated Serum PSA.” Science Translational Medicine 3 (94): 94ra72–72.
Tosoian, Jeffrey J, Yuping Zhang, Lanbo Xiao, Cassie Xie, Nathan L Samora, Yashar S Niknafs, Zoey Chopra, et al. 2024. “Development and Validation of an 18-Gene Urine Test for High-Grade Prostate Cancer.” JAMA Oncology.
Yu, Bin, and Karl Kumbier. 2020. “Veridical Data Science.” Proceedings of the National Academy of Sciences 117 (8): 3920–29. https://doi.org/10.1073/pnas.1901326117.
