A Single-Cell Atlas of the Multicellular Ecosystem of Primary and Metastatic Hepatocellular Carcinoma
SUMMARY
Hepatocellular carcinoma (HCC) represents a paradigm of the relation between tumor microenvironment (TME) and tumor development. Here, we generated > 70,000 single-cell transcriptomes for 10 HCC patients from four relevant sites: primary tumor, portal vein tumor thrombus (PVTT), metastatic lymph node and non-tumor liver. We discovered a cluster of antitumor central memory T (TCM) cells enriched in intratumoral tertiary lymphoid structures (TLSs) of HCC. We found chronic HBV/HCV infection increases the infiltration of CD8+ T cells in tumors but aggravates the exhaustion of tumor-infiltrating lymphocytes. We identified CD11b+ macrophages to be terminally differentiated tumor-associated macrophages (TAMs) and two distinct differentiation trajectories are related to their accumulation. We further demonstrated CD11b+ TAMs promote HCC cells invasion and migration, and angiogenesis. Our data also revealed the heterogeneous population of malignant hepatocytes and their potential multifaceted roles in shaping the immune microenvironment of HCC. Finally, we identified seven TME subtypes of HCC that can predict patient prognosis. Collectively, this large-scale, single-cell atlas deepens our understanding of the ecosystem in primary and metastatic HCCs, might facilitating the development of new immune therapy strategies for this malignancy.
Droplet-based scRNA-seq and gene expression quantification
Single-cell suspensions were converted to barcoded scRNA-seq libraries by using the Chromium Single Cell 3’ Library, Gel Bead & Multiplex Kit and Chip Kit (10x Genomics), aiming for an estimated 5,000 cells per library and following the manufacturer’s instructions. Samples were processed using kits pertaining to V2 barcoding chemistry of 10x Genomics. Single samples are always processed in a single well of a PCR plate, allowing all cells from a sample to be treated with the same master mix and in the same reaction vessel. For each patient, all samples (NTL, PT, PVTT and MLN) were processed in parallel in the same thermal cycler. The generated scRNA-seq libraries were sequenced on a NovaSeq sequencer (Illumina). The Cell Ranger software (version 2.2.0; 10x Genomics) was used to perform sample demultiplexing, barcode processing and single-cell 3’ counting. Cell Ranger’s mkfastq function was used to demultiplex raw base call files from the sequencer, into sample-specific fastq files. Afterward, fastq files for each sample were processed with Cell Ranger’s count function, which was used to align reads to human genome (build hg38) and quantify gene expression levels in single cells.
Quality control and batch correction
To filter out low-quality cells and doublets (two cells encapsulated in a single droplet), for each sample, cells were removed that had either fewer than 200 unique molecular identifiers (UMIs), over 8,000 or below 200 expressed genes. To filter out dead or dying cells, cells were further removed that had over 10% UMIs derived from mitochondrial genome. This resulted in a total of 71,915 high-quality single-cell transcriptomes in all samples.
To further merge samples across tissues and patients, we run a canonical correlation analysis (CCA) for batch correction using the RunMultiCCA function in R package Seurat v2. To calculate canonical correlation vectors (CCVs), variably expressed genes were selected for each sample as having a normalized expression between 0.125 and 3, and a quantile-normalized variance exceeding 0.5, and then combined across all samples. The resulting 2,773 non-redundant variable genes were summarized by CCA, and the first 15 CCVs were aligned to combine raw gene expression matrices generated per sample. The aligned CCVs were also used for tSNE dimensionality reduction using the RunTSNE function in Seurat.
Cell clustering
For cell clustering, we used the FindClusters function in Seurat v2 that implements shared nearest neighbor (SNN) modularity optimization-based clustering algorithm on 30 aligned CCVs with resolution 1–4, leading to 26–61 clusters. A resolution of 3 was chosen for the analysis and a final of 53 clusters were obtained.