Large “omics” datasets are increasingly available
in public repositories, such as GEO1, ArrayExpress2, ProteomeXchange3 and the GDC4. However, the sheer volume of information requires automated
procedures to extract what is biologically relevant for researchers and
clinicians. Historically, the NCI60 cell line panel5 (60 cell lines) was the go-to resource that combined phenotypic
data with drug response. Later, GDSC6 and CTRP7 emerged, covering a much greater number of cell lines (987 for
GDSC, 860 for CTRP), albeit for a much smaller set of therapies (GDSC1: 320,
CTRP: 481) and with a considerably less complete drug-cell line matrix. The next step
in this evolution is the Cancer Dependency Map (CDM; https://depmap.org/). The CDM integrates
data from several projects, such as gene expression from CCLE, CRISPR knockouts8, proteomics9 and the PRISM10 drug repositioning screen (578 cell lines, 4686 drugs).
Ironically, when it comes to investigating
the sensitivity to drug treatment in a cancer-specific context, limitations of
the data become immediately apparent. For example, only 45 cell lines are
available for the average cancer type. By...