
Contents lists available at ScienceDirect

Data in Brief

journal homepage: www.elsevier.com/locate/dib

Data Article

A high-resolution mass spectrometry based proteomic dataset of human regulatory T cells

Harshi Weerakoon ^a,b,c^, John J. Miles ^d,e^, Ailin Lepletier ^d,f^, Michelle M. Hill ^a,g,∗^

^a^ Precision and Systems Biomedicine Laboratory, QIMR Berghofer Medical Research Institute, Herston, QLD, Australia

^b^ School of Biomedical Sciences, The University of Queensland, Brisbane, QLD, Australia

^c^ Department of Biochemistry, Faculty of Medicine and Allied Sciences, Rajarata University of Sri Lanka, Saliyapura, Sri Lanka

^d^ Human Immunity Laboratory, QIMR Berghofer Medical Research Institute, Herston, QLD, Australia

^e^ Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia

^f^ Laboratory of Vaccines for the Developing World, Institute for Glycomics, Southport, QLD, Australia

^g^ Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Brisbane, QLD, Australia

Article info

Article history:
Received 18 September 2021
Revised 21 November 2021
Accepted 3 December 2021
Available online 6 December 2021

Keywords: Regulatory T cell; Conventional T cell; Proteomics; Tandem mass spectrometry; LC-MS/MS

Abstract

Regulatory T cells (Tregs) play a core role in maintaining immune tolerance, homeostasis, and host health. High-resolution analysis of the Treg proteome is required to identify enriched biological processes and pathways distinct to this important immune cell lineage. We present a comprehensive proteomic dataset of Tregs paired with conventional CD4⁺ (Conv CD4⁺) T cells in healthy individuals. Tregs and Conv CD4⁺ T cells were sorted to high purity using dual magnetic bead-based and flow cytometry-based methodologies. Proteins were trypsin-digested and analysed using label-free data-dependent acquisition mass spectrometry (DDA-MS) followed by label-free quantitation (LFQ) proteomics analysis using MaxQuant software. Approximately 4,000 T cell proteins were identified with a 1% false discovery rate, of which approximately 2,800 proteins were consistently identified and quantified in all the samples. Finally, flow cytometry with a monoclonal antibody was used to validate the elevated abundance of the protein phosphatase CD148 in Tregs. This proteomic dataset serves as a reference point for future mechanistic and clinical T cell immunology and identifies receptors, processes, and pathways distinct to Tregs. Collectively, these data will lead to a better understanding of Treg immunophysiology and potentially reveal novel leads for therapeutics seeking Treg regulation.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

∗ Corresponding author.
E-mail address: [email protected] (M.M. Hill).
https://doi.org/10.1016/j.dib.2021.107687

Specifications Table

| Item | Details |
| --- | --- |
| Subject | Immunology |
| Specific subject area | Regulatory T cells (Tregs) play a key role in directing adaptive immunity, particularly enforcing immune tolerance. However, few studies have examined the ex vivo Treg proteome. High-resolution and fully quantitative MS was used to profile the Treg and conventional CD4⁺ (Conv CD4⁺) T cell proteomes from healthy human blood. Bioinformatics yielded reliable, reproducible, and quantitative data. Differentially abundant proteins were identified between subsets, and flow cytometry was performed to validate elevated cell surface levels of CD148 in Tregs. |
| Type of data | Tables and Figures. |
| How data were acquired | An Orbitrap Fusion™ Tribrid™ (Thermo Fisher Scientific, USA) coupled inline to a nanoACQUITY UPLC (Waters, USA) was used to acquire label-free proteomic data using data-dependent acquisition (DDA-MS). |
| Data format | Raw and analysed data. |
| Parameters for data collection | CD3⁺, CD4⁺, CD25^high^, CD127^low^, FOXP3⁺ and CD3⁺, CD4⁺, CD25⁻, FOXP3⁻ cells were identified as Tregs and Conv CD4⁺, respectively. Peptides were separated using a 160 min chromatographic gradient at a 0.3 μl/min flow rate and ionised at 1900 V and 285 °C. The MS1 mass range and resolution were 380 – 1500 m/z and 120,000 FWHM, respectively. Injection times for MS1 and MS2 were 50 ms and 70 ms, respectively. Fifteen ions were selected to trigger MS2 at 90 s dynamic exclusion. The total cycle time was two seconds. |
| Description of data collection | Label-free proteomics data were acquired on peptide samples prepared from Treg and Conv CD4⁺ T cells isolated from peripheral blood mononuclear cells (PBMC) of three healthy volunteers (age 30-35 years) at the QIMR Berghofer Medical Research Institute (QIMRB), Brisbane, Australia. |
| Data source location | Raw proteomic data are available via ProteomeXchange [1]. |
| Data accessibility | Repository name: ProteomeXchange. Data identification number: PXD022095. Direct URL to data: http://www.ebi.ac.uk/pride/archive/projects/PXD022095. FTP download: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2021/08/PXD022095 |
| Related research article | H. Weerakoon, J. Straube, K. Lineburg, L. Cooper, S. Lane, C. Smith, S. Alabbas, J. Begun, J.J. Miles, M.M. Hill, A. Lepletier, Expression of CD49f defines subsets of human regulatory T cells with divergent transcriptional landscape and function that correlate with ulcerative colitis disease activity, Clin. Transl. Immunol. 10 (2021) e1334. https://doi.org/10.1002/cti2.1334 |
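As a rough illustration of how the repository details above might be used programmatically, the sketch below joins the FTP directory quoted in the table with a deposited file name (here `parameters.txt`, listed in Table 1). It assumes, without confirmation from the article, that the deposited files sit directly under the project directory; `build_file_url` is a hypothetical helper name, and nothing is downloaded.

```python
# FTP directory quoted in the Specifications Table.
PXD_FTP_DIR = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2021/08/PXD022095"


def build_file_url(file_name: str, base: str = PXD_FTP_DIR) -> str:
    """Return the assumed URL of one deposited file.

    Assumption: files are stored flat under the project directory.
    """
    if "/" in file_name:
        raise ValueError("expected a bare file name, not a path")
    return base.rstrip("/") + "/" + file_name


# Example with a file name taken from Table 1.
print(build_file_url("parameters.txt"))
```

If the repository layout differs, only `PXD_FTP_DIR` or the joining rule would need to change.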

Value of the Data

• This label-free quantitative dataset provides a high-resolution landscape of Treg and Conv CD4⁺ T cell proteomes from primary T cell subsets in circulating human blood.

• T cell immunologists can use this dataset to explore Treg and Conv CD4⁺ T cell proteomic landscapes and provide novel insights into the fundamental workings of T cells from different lineages.

• Clinical researchers can use this dataset to explore Treg protein dysfunction in different clinical settings. These studies will help correct dysfunction through new interventions and result in better patient management and outcomes across Treg-associated diseases.

• Researchers in T cell biology and immunology can use the described methods to achieve further knowledge gain in the important field of immunoproteomics. The methodologies will also aid researchers working with small amounts of volunteer or patient material.

1. Data Description

The proteomic dataset described here was generated from regulatory T cells (Tregs) and conventional CD4⁺ T cells (Conv CD4⁺) purified from peripheral blood mononuclear cells (PBMC) of three healthy volunteers (age 30-35 years). The workflow across sample collection, protein processing, and data-dependent acquisition mass spectrometry (DDA-MS) analysis is depicted in Fig. 1 and detailed in the Experimental Design, Materials, and Methods section. The DDA-MS data were analysed using MaxQuant software (Release 1.6.0.16) [2] against the UniProt/SwissProt human reviewed database [3] using a 1% false discovery rate (FDR) cut-off to filter peptide spectral matches and proteins. The .raw files obtained from DDA-MS analysis and the MaxQuant search results are available through the ProteomeXchange data repository (PXD022095) as detailed in Table 1. These publicly available raw data can be re-analysed using different parameters and databases to retrieve more specific information depending on the research interest. The consistency of the proteomic data across donors is shown in Fig. 2, including MS/MS spectral count per sample (Fig. 2A) and per protein (Fig. 2B), the number of peptides per sample (Fig. 2C) and per protein (Fig. 2D), and the percentage of amino acid sequence coverage in peptides (Fig. 2E) and per sample (Fig. 2F). Of the total identified proteins (n = 4,177), 92% were identified with single UniProt IDs in the UniProt/SwissProt database (Fig. 2G). A similar number of proteins was identified for each donor in both the Treg and Conv CD4⁺ T cells (Fig. 2H).

Table 1
Description of the data files deposited in ProteomeXchange data repository under the data identification number of PXD022095.

| No. | Title of the file/folder | Description |
| --- | --- | --- |
| 1 | Rep1_Treg.raw | .raw file of Treg cells - Replicate 1 |
| 4 | Rep1_nonTreg.raw | .raw file of Conv CD4⁺ T cells - Replicate 1 |
| 5 | Rep2_nonTreg.raw | .raw file of Conv CD4⁺ T cells - Replicate 2 |
| 6 | Rep3_nonTreg.raw | .raw file of Conv CD4⁺ T cells - Replicate 3 |
| 7 | search.zip | MaxQuant output files resulting from the analysis of the above .raw files against the UniProt/SwissProt human reviewed proteome |
| 8 | parameters.txt | Parameters used in the data analysis through the MaxQuant search engine |
| 9 | human_proteome_reviewed_25102017.fasta | UniProt/SwissProt proteome database used in the analysis |
| 10 | Treg_nonTreg_protein_quantification.txt | MaxQuant output file giving the protein quantification data and LFQ normalised protein intensities |

Fig. 1. Study workflow. A. Workflow for sorting Tregs and Conventional (Conv) CD4⁺. PBMC obtained from three volunteers underwent two rounds of magnetic-activated cell sorting (MACS) followed by flow cytometry-based cell sorting (FACS), yielding Tregs and Conv CD4⁺ at high purity. B. Proteomic sample preparation and mass spectrometry (LC-MS/MS) analysis. Peptide samples were prepared from whole cell lysate using protein co-precipitation with trypsin in methanol. The resulting tryptic peptides were desalted before LC-MS/MS analysis on an Orbitrap Fusion Tribrid coupled inline to a nanoACQUITY UPLC. C. DDA-MS data were deconvoluted using the MaxQuant search engine against the UniProt/SwissProt human proteome. Differential expression analysis was performed on only high-quality label-free protein intensity data. D. Orthogonal validation of CD148 (PTPRJ) enrichment in Treg cells.

Fig. 2. Summary of DDA-MS proteomic dataset in human Tregs. Label-free DDA-MS data were analysed using the MaxQuant search engine against the UniProt/SwissProt human reviewed proteome database. MS/MS spectral data determined at 1% FDR were selected for further analysis across Tregs (blue) and Conv CD4⁺ (orange). A. The number of MS/MS spectra detected in each sample. B. Mean MS/MS spectra per protein per sample. C. Total number of peptides and unique + razor peptides detected in each sample. D. Mean unique + razor peptides per protein per sample. E. Percentage of amino acid sequence coverage from the peptide dataset (unique + razor). F. Mean percentage of amino acid sequence coverage per peptide per sample (unique + razor). G. Total number of proteins quantified with single or multiple UniProt entries. Here, 92% of proteins had single entries (light grey), and 8% of proteins had multiple peptide entries (dark grey). H. The number of proteins quantified per sample. Error bars with standard deviation are shown.

Normalised protein intensity values obtained from MaxLFQ [4] in MaxQuant were then used in differential abundance analysis to identify proteins enriched in Tregs relative to paired Conv CD4⁺. First, data cleaning retained proteins with (i) at least two unique razor peptides, (ii) an m-score of five and (iii) less than 50% missing intensity values across samples. The missing intensity values were imputed using the maximum likelihood estimate algorithm (R package) [5]. Next, differentially abundant proteins were identified using a multiple t-test with FDR correction from a published two-stage linear step-up procedure [6] (Table S1, Fig. 1C). Analysis returned 227 differentially abundant proteins between Treg and Conv CD4⁺ T cells, with 157 (69% of total) proteins significantly enriched in Tregs. The enriched biological processes in each cell type were determined using gene ontology (GO) and pathway analysis comprising Kyoto Encyclopaedia of Genes and Genomes (KEGG) analysis [7] and Reactome pathways [8]. Functional enrichment analysis using the STRING network [9] revealed no enriched biological processes for Conv CD4⁺ T cells, while Treg cells were enriched with negative regulation of interferon-gamma production, T cell activation, leukocyte cell-cell adhesion and nuclear protein export (Fig. 3A). Further, extracellular matrix organisation, mitotic prophase and nuclear envelope reassembly were the most enriched pathways in Tregs. Conv CD4⁺ showed enrichment of viral mRNA translation, selenocysteine synthesis and peptide chain elongation pathways. Detailed results, including enrichment score, the number of proteins detected, and gene nomenclature, are provided in Table S2.

Fig. 3. Tregs and Conv CD4⁺ T cells exhibit divergent proteomes. A. Word clouds of biological processes and Reactome pathways enriched in Treg versus Conv CD4⁺. B. Volcano plot showing differential abundance of proteins between Treg and Conv CD4⁺, using q < 0.05 and log₂FC > 1 or < -1 as cut-offs (dotted lines). Compared with Conv CD4⁺, proteins significantly upregulated (red) and downregulated (blue) in Treg are shown, along with CD49f (ITGA6) and CD148 (PTPRJ), which are shown in green. Gene lists of the top 10 most upregulated and top 10 most downregulated proteins are shown. C. Proteomic data quantifying CD148 levels in Treg and Conv CD4⁺. Pairwise comparison showed higher abundance of CD148 in Treg cells in all donors (q < 0.0001, multiple t-test with false discovery rate correction). D. Flow cytometry data for cell surface CD148 in Treg and Conv CD4⁺ T cells in six healthy donors denoted by different colours. Note that some of the data points overlap (∗p < 0.05, Mann Whitney U test). E. Representative histogram from one donor displaying the fluorescence intensity difference of CD148 between the two T cell populations.

To validate the proteomics dataset by orthogonal methods, we selected two Treg-enriched cell surface proteins with available monoclonal antibodies for flow cytometry, specifically CD49f (ITGA6) and CD148 (PTPRJ). The relative abundance of these and other proteins is shown in the volcano plot (Fig. 3B). We have recently reported the flow cytometry and functional validation of CD49f in Tregs [10]. CD148 is a protein tyrosine phosphatase that regulates T cell receptor (TCR) signalling through Src family kinases (SFK), and has dual inhibitory/activatory functions and immune cell-specific patterning [11]. In agreement with the marked enrichment observed by DDA-MS (Log₂FC = 3.02, p < 0.001, Fig. 3C), flow cytometry validation confirmed CD148 enrichment in Tregs (Fig. 3D and 3E).
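For readers less used to log-scale reporting, the short sketch below converts the Log₂FC reported for CD148 into a linear fold change, which comes out at roughly eight-fold. The function name `fold_change_from_log2` is illustrative only and is not part of the authors' analysis pipeline.

```python
def fold_change_from_log2(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


# Log2FC reported in the text for CD148 (PTPRJ), Treg versus Conv CD4+.
cd148_fold = fold_change_from_log2(3.02)
print(f"CD148 linear fold change: {cd148_fold:.1f}")
```

A Log₂FC of 1, the volcano plot cut-off in Fig. 3B, corresponds in the same way to a two-fold difference.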

2. Experimental Design, Materials, and Methods

2.1. Experimental design

The experimental phases are depicted in Fig. 1, including T cell isolation (Fig. 1A), proteomic sample preparation and data acquisition (Fig. 1B), proteomics data deconvolution and analysis (Fig. 1C), and the orthogonal validation (Fig. 1D).

2.2. Human T cell isolation

Sequential magnetic-activated cell isolation (MACS) and flow cytometry-based cell sorting (FACS) were used to isolate Tregs and Conv CD4⁺ T cells from PBMC to high purity (Fig. 1A).

2.2.1. Isolation of human PBMC from venous blood

PBMC were isolated from fresh venous blood from volunteers at QIMRB, Brisbane, Australia. Ficoll-Paque™ density gradient medium (GE Healthcare, USA) was used to separate PBMC from peripheral blood. Samples were centrifuged at 434 x g for 20 mins with no-brake deceleration. PBMC were then washed three times with Roswell Park Memorial Institute (RPMI-1640) medium.

2.2.2. MACS purification of T cell subsets

CD3⁺ T cells were next purified from PBMC (n = 3) using MACS. Here, a human pan T cell isolation kit (Miltenyi Biotec, Germany) was used for negative cell selection as per the manufacturer's instructions. The unlabeled CD3⁺ T cell flow-through was collected and then moved to another round of MACS cell isolation. Here, purified CD3⁺ T cells were labelled with CD25-PE (BioLegend, USA) for 20 mins at 4 °C, washed with cold MACS buffer (Miltenyi Biotec, Germany), then labelled with anti-PE magnetic beads (Miltenyi Biotec, Germany) and purified per the manufacturer's instructions using LS columns (Miltenyi Biotec, Germany). CD25^high^ (column-bound) and CD25^low^ (flow-through) T cells were collected, washed in R10 medium (RPMI-1640 containing 10% FCS) and transferred into 5 ml tubes for FACS-based sorting. This dual MACS method reduces FACS time and increases sort yields.

2.2.3. FACS purification of Tregs and Conv CD4⁺ T cells

MACS-sorted CD25^high^ and CD25^low^ T cells were next stained with LIVE/DEAD® Fixable Aqua (Life Technologies, USA), CD3-APCe780 (eBioscience, Thermo Fisher Scientific, USA), CD4-BV711 and CD127-BV786 (BD Biosciences, USA) for 20 mins at 4 °C, then washed with cold FACS buffer three times before sorting. Single cells were sorted on a BD FACSAria III (BD Biosciences, USA), gated using FSC-W and FSC-H, and non-viable cells were dumped. In sorting, the viable single cells were first gated to select the CD3⁺ and CD4⁺ T cell population. Of these, CD25^high^, CD127^low^ cells were sorted as Treg cells while CD25⁻ cells were sorted as Conv CD4⁺ T cells [10]. Low flow rates were used to increase yield through a reduction in stream abort events. Approximately 10⁶ T cells were sorted for Treg and Conv CD4⁺ in tandem for each sample. Each sorted T cell sample was washed three times with phosphate buffered saline (pH 7.2) and then lysed in 100 μl lysis buffer containing 1% sodium dodecyl sulphate (Bio-Rad, USA) in 100 mM triethylammonium bicarbonate (TEAB; Sigma-Aldrich, USA) and 1 x Roche complete protease inhibitor cocktail (Sigma-Aldrich, USA), and stored at -80 °C for proteomic sample preparation.

2.2.4. Protein extraction and trypsin digestion

Each cell lysate was next thawed on ice, and 200 ng of ovalbumin (Sigma-Aldrich, USA) was added as an internal standard. The amount of protein in each cell lysate was quantified at a wavelength of 562 nm using the Pierce bicinchoninic acid (BCA) protein assay (Thermo Fisher Scientific, USA), following the manufacturer's instructions. For trypsin digestion, 20 μg protein from each sample was reduced with 10 mM Tris(2-carboxyethyl)phosphine hydrochloride (Thermo Fisher Scientific, USA) at 60 °C for 30 mins followed by alkylation with freshly prepared 40 mM chloroacetamide (Sigma, USA) at 37 °C in the dark for 45 mins in a volume of 100 μl. Detergents in the samples were then removed using protein co-precipitation with trypsin in methanol [12,13] using 1 μl of 1 μg/μl sequencing grade modified porcine trypsin (Promega, USA) and 1 ml of 100% cold (-20 °C) chromAR grade methanol (Honeywell Research Chemicals, USA) to obtain a final lysate:methanol ratio of 1:10. Samples were mixed thoroughly by vortexing and kept overnight at -20 °C for protein precipitation. The next day, samples were centrifuged for 15 mins at 16,100 x g at 4 °C, and the supernatant was aspirated carefully without disturbing the protein pellet. The protein pellets were washed (15 mins at 16,100 x g at 4 °C) two times with 1 ml of 90% and 100% cold methanol, respectively. Methanol was removed after the second centrifugation through careful pipetting and air drying for 2-3 mins. Protein pellets were then resuspended in 50 mM TEAB containing 5% acetonitrile (ACN, Honeywell Research Chemicals, USA) and mixed thoroughly by repeated pipetting and vortexing. Samples were next incubated at 37 °C for 2 hrs with shaking. One μl of 1 μg/μl (1:100) trypsin was then added, vortexed and incubated for 12 hrs at 37 °C. The enzymatic reaction was next inhibited by adding 25 μl of 5% formic acid (FA, Sigma, USA), and samples were desalted using Strata-X polymeric reversed-phase 10 mg/ml C18 cartridges (Phenomenex, USA). Conditioning, equilibrating, and washing steps in the desalting procedure were performed as follows. First, the C18 cartridges were conditioned through 2 washes of 100% ACN and equilibrated via 2 washes with 0.1% FA. Samples were then incubated in conditioned cartridges for 1 min to allow adsorption of ionised peptides. Cartridges were washed twice with 0.1% FA, and adsorbed peptides were eluted into new 1.5 ml microcentrifuge tubes using 500 μl 80% ACN in 0.1% FA. In each of these steps 1 ml of the relevant solution was used. Peptide samples were finally dried in a speed vacuum at 35 °C and stored at -80 °C until mass spectrometry (LC-MS/MS) analysis.

2.2.5. Peptide quantification and preparation for LC-MS/MS analysis

For LC-MS/MS analysis, each sample was resuspended in 20 μl of 0.1% FA, and peptide quantification was performed using the Pierce micro BCA protein estimation kit (Thermo Fisher Scientific, USA) as per the manufacturer's instructions. After adjusting the peptide concentration to 0.2 μg/μl, 5 μl of solution (1 μg peptide) from each sample was injected into the LC-MS/MS system.
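The concentration adjustment described above is simple dilution arithmetic, sketched below. The target concentration and injection volume come from the text; the measured concentration in the example (0.5 μg/μl) is a made-up value, the function names are illustrative, and the sketch ignores any volume consumed by the micro BCA assay itself.

```python
TARGET_CONC_UG_PER_UL = 0.2   # target peptide concentration (from the text)
INJECTION_VOLUME_UL = 5.0     # injected volume (from the text)


def diluent_to_add(measured_conc: float, volume_ul: float,
                   target_conc: float = TARGET_CONC_UG_PER_UL) -> float:
    """Volume of 0.1% FA (in ul) to add so the sample reaches target_conc."""
    if measured_conc < target_conc:
        raise ValueError("sample is already below the target concentration")
    # C1 * V1 = C2 * V2  ->  V2 - V1 = V1 * (C1 / C2 - 1)
    return volume_ul * (measured_conc / target_conc - 1.0)


def injected_mass_ug(conc: float = TARGET_CONC_UG_PER_UL,
                     volume_ul: float = INJECTION_VOLUME_UL) -> float:
    """Peptide mass (in ug) delivered by one injection."""
    return conc * volume_ul


# Hypothetical sample: 20 ul measured at 0.5 ug/ul.
print(diluent_to_add(0.5, 20.0))   # ul of diluent to add
print(injected_mass_ug())          # ug of peptide per injection
```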

2.2.6. LC-MS/MS proteomic sample analysis

A hybrid LC-MS/MS system, the Orbitrap Fusion™ Tribrid™ (Thermo Fisher Scientific, USA), was used to analyse the peptide samples using DDA-MS. The LC-MS/MS parameters used in sample acquisition are detailed in Table 2.
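The LC gradient listed in Table 2 can be represented as a list of set points, as in the sketch below. It assumes linear ramps between consecutive set points and a constant composition before the first and after the last one; the article does not state the ramp shape, so both are assumptions, and `percent_b` is an illustrative name.

```python
from bisect import bisect_right

# (time in minutes, % buffer B) set points copied from Table 2.
GRADIENT = [(3, 5.0), (10, 9.0), (120, 26.0), (145, 40.0),
            (152, 80.0), (157, 80.0), (160, 1.0)]


def percent_b(t_min: float) -> float:
    """Buffer B percentage at t_min, assuming linear ramps between set points."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]    # assumed hold before the first set point
    if t_min >= times[-1]:
        return GRADIENT[-1][1]   # assumed hold after the last set point
    i = bisect_right(times, t_min)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


# Composition halfway through the long 9% to 26% ramp.
print(percent_b(65))
```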

2.2.7. DDA-MS data deconvolution, protein identification and intensity normalisation

MaxQuant (Release 1.6.0.16) software was used in the analysis of .raw files from DDA-MS, by searching against the UniProt/SwissProt human reviewed proteome database containing 20,242 entries (October, 2017). Parameters used in peptide and protein identification in this study are summarised in Table 3. The MaxLFQ tool in MaxQuant software was used to normalise peptide and protein intensities for label-free quantification. For robustness, the internal control (chicken ovalbumin, UniProt ID P01012) and the common MS contaminant database inbuilt in MaxQuant software were used for benchmarking.

Table 2
Chromatographic and mass spectrometry parameters used in sample analysis.

| Parameter | Description/Settings |
| --- | --- |
| Mass spectrometer | Orbitrap Fusion™ Tribrid™ (Thermo Fisher Scientific, USA) |
| LC system | nanoACQUITY UPLC (Waters, USA) |
| Total LC gradient | 175 minutes |
| Buffers | A: 0.1% FA; B: 100% ACN + 0.1% FA |
| LC gradient (buffer B concentration) | 5% at 3 minutes, 9% at 10 minutes, 26% at 120 minutes, 40% at 145 minutes, 80% at 152 minutes, 80% at 157 minutes and 1% at 160 minutes |
| Trap column | Symmetry C18 trap, 2G VM trap (Waters, USA), 100 Å, 5 μm particle size, 180 μm × 20 mm |
| Column | BEH C18 (Waters, USA), 130 Å, 1.7 μm particle size, 75 μm × 200 mm |
| Flow rate | 0.3 μl/minute |
| Ion source | EASY-Max NG™ ion source (Thermo Fisher Scientific, USA) |
| Ion spray voltage | 1900 V |
| Heating temperature | 285 °C |
| Data acquisition method | DDA-MS |
| MS1 mass range | 380 – 1500 m/z |
| MS1 injection time | 50 milliseconds |
| MS1 resolution | 120,000 FWHM |
| Charge state | Rejecting +1, selecting +2 to +7 |
| No. of ions selected to trigger MS2 | 15 |
| MS2 fragmentation | Higher Energy C-trap Dissociation (HCD) |
| MS2 injection time | 70 milliseconds |
| MS2 resolution | 30,000 FWHM |
| Dynamic exclusion | 90 seconds |
| Cycle time | 2 seconds |

Table 3
Parameters used in DDA-MS data analysis and label-free quantification using MaxQuant software and MaxLFQ.

| Parameter | Settings |
| --- | --- |
| Digestive enzyme | Trypsin |
| Maximum number of miscleavages | 2 |
| Fixed modification | Carbamidomethylation (Cysteine) |
| Variable modifications | Oxidation (Methionine), Acetylation (Protein N-terminal) |
| Precursor mass tolerance | ±20 ppm |
| Product mass tolerance | ±40 ppm |
| Maximum peptide charge | +7 |
| Match between runs | Yes (0.7 minutes match time window and 20 minutes alignment time window) |
| Protein FDR | 0.01 |
| Peptide spectral match (PSM) FDR | 0.01 |
| Minimum peptides | 1 |
| Peptide for protein quantification | Unique + Razor |
| Peptide selection for quantification | Unmodified peptides and only the peptides modified with oxidation and N terminal acetylation |
| No. of peptides for quantification | ≥2 |
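To make the ppm tolerances in Table 3 concrete, the sketch below converts a relative tolerance into an absolute m/z window. The example m/z of 1000 is an arbitrary value inside the MS1 range of Table 2, and `ppm_window` is an illustrative helper rather than a MaxQuant function.

```python
from typing import Tuple


def ppm_window(mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Return the (lower, upper) m/z bounds for a +/- tol_ppm tolerance."""
    delta = mz * tol_ppm / 1e6   # absolute half-width in m/z units
    return mz - delta, mz + delta


# Precursor tolerance from Table 3 applied to a hypothetical ion at m/z 1000.
low, high = ppm_window(1000.0, 20.0)
print(f"{low:.2f} to {high:.2f}")
```

Because the tolerance is relative, the same ±20 ppm setting gives a narrower absolute window at the low end of the mass range than at the high end.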

2.2.8. Data filtering and missing value imputation

Common contaminants and false-positive proteins detected during the MaxQuant search were removed manually from further analysis. Proteins with more than one UniProt/SwissProt accession identified with ≤ 1 peptide and/or an m_score of ≤ 5 were removed. The resulting protein lists were further examined for the percentage of proteins with unrecorded intensity values, and only the proteins with < 50% missing protein intensity values were selected, imputed using maximum likelihood estimation (R package) and used for downstream analysis.
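The filtering logic can be sketched as a single predicate, shown below under stated assumptions. It reads the criteria as: drop contaminants, require at least two peptides, require a score above 5, and require fewer than 50% missing intensities. The record layout, field names and example values are all hypothetical and do not reflect the column names of the deposited MaxQuant output; imputation is not shown.

```python
from typing import Dict, List, Optional


def missing_fraction(intensities: List[Optional[float]]) -> float:
    """Fraction of samples with no recorded LFQ intensity (None = missing)."""
    return sum(v is None for v in intensities) / len(intensities)


def keep_protein(record: Dict) -> bool:
    """Apply the data-cleaning criteria as interpreted in the lead-in."""
    return (
        not record["contaminant"]                    # contaminants are dropped
        and record["razor_unique_peptides"] >= 2     # more than one peptide
        and record["score"] > 5                      # scores of 5 or less are dropped
        and missing_fraction(record["lfq"]) < 0.5    # under 50% missing values
    )


# Hypothetical records: six intensities stand for 3 Treg + 3 Conv CD4+ samples.
proteins = [
    {"id": "PROT_A", "contaminant": False, "razor_unique_peptides": 4,
     "score": 32.1, "lfq": [1.2e7, 1.1e7, None, 9.0e6, 8.5e6, 1.0e7]},
    {"id": "PROT_B", "contaminant": False, "razor_unique_peptides": 1,
     "score": 40.0, "lfq": [2.0e6, 2.1e6, 1.9e6, 2.2e6, 2.0e6, 2.3e6]},
    {"id": "PROT_C", "contaminant": False, "razor_unique_peptides": 3,
     "score": 12.0, "lfq": [None, None, None, 5.0e6, 5.5e6, 4.8e6]},
    {"id": "PROT_D", "contaminant": True, "razor_unique_peptides": 9,
     "score": 80.0, "lfq": [3.0e8, 3.1e8, 2.9e8, 3.2e8, 3.0e8, 3.1e8]},
]

kept = [p["id"] for p in proteins if keep_protein(p)]
print(kept)
```

In this toy input only `PROT_A` survives: `PROT_B` has a single peptide, `PROT_C` is missing exactly half of its intensities, and `PROT_D` is flagged as a contaminant.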

2.2.9. Differentially abundant protein identification and pathway analyses

LFQ normalised expression data of the selected proteins were used to identify the differentially abundant proteins. Following imputation of the missing values in these selected proteins, differentially abundant proteins between Treg and Conv CD4⁺ were analysed (Treg vs. Conv CD4⁺). The results were obtained as Log₂FC values, where positive values represent proteins more abundant in Treg cells. A multiple t-test with false discovery determination by the two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli was used to calculate q values of differentially abundant proteins between Tregs and Conv CD4⁺ T cells. Functional enrichment of the proteomic dataset was analysed from Log₂FC values using the bioinformatics software STRING: Functional protein association network, version 11.5, to determine GO functional enrichment, KEGG and Reactome pathways with a cut-off FDR of 0.05%. The R package and GraphPad Prism (version 9.2.0 for Windows, GraphPad Software, San Diego, California USA) were used in bioinformatics analysis and graph generation. Word clouds were generated using the online tool (https://www.wordclouds.com, September 8th, 2021).
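The two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli [6] can be sketched in a few lines, as below. This is a minimal pure-Python illustration of the reject/accept decision at a chosen level q, not the authors' actual implementation (the article reports using R and GraphPad Prism), and it does not compute per-protein q values. The function names and the example p-values are hypothetical.

```python
from typing import List


def bh_reject_count(sorted_p: List[float], q: float) -> int:
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(sorted_p)
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= i * q / m:
            k = i            # keep the largest rank that passes its threshold
    return k


def bky_two_stage(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Reject decisions from the two-stage linear step-up procedure."""
    m = len(p_values)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: p_values[i])
    sorted_p = [p_values[i] for i in order]

    q1 = q / (1.0 + q)
    r1 = bh_reject_count(sorted_p, q1)          # stage 1: BH at level q/(1+q)
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m

    m0_hat = m - r1                              # estimated number of true nulls
    r2 = bh_reject_count(sorted_p, q1 * m / m0_hat)   # stage 2: adjusted level

    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= r2
    return reject


# Hypothetical p-values, e.g. one per protein from the t-tests.
example_p = [0.6, 0.001, 0.041, 0.9, 0.008, 0.039]
print(bky_two_stage(example_p, q=0.05))
```

On these example values the adaptive second stage rejects four hypotheses, whereas a single Benjamini-Hochberg pass at the same level would reject only two, which illustrates why the two-stage variant is described as adaptive.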

2.2.10. Validation of MS results by flow cytometry

DDA-MS data were validated using FACS. Here, 10⁶ PBMC from six healthy individuals, including donors used in the proteomics study, were labelled with the forkhead box P3 (FOXP3) staining kit as per the manufacturer's instructions (BioLegend, USA), followed by surface staining with LIVE/DEAD® Fixable Aqua (Life Technologies, USA), CD3-APCe780 (eBioscience, Thermo Fisher Scientific, USA), CD4-BV711 (BD Biosciences, USA), CD25-PEcy7 (BD Biosciences, USA), CD127-BV786 (BD Biosciences, USA) and CD148-PE (BioLegend, USA) for 20 mins at 4 °C, and washed with cold FACS buffer three times. Single cells were analysed on a BD LSRFortessa (BD Biosciences, USA) using BD FACSDiva 8.0 software (BD Biosciences, USA). Gating comprised FSC-W and FSC-H with non-viable cells dumped. Flow data were analysed using FlowJo v10 (TreeStar, USA).

Ethics Statement

Ethical clearance for this study was obtained from the QIMRB human research ethics committee (HREC, #P2058). Informed consent was obtained from all volunteers and the study adhered to the Declaration of Helsinki of 1975.

CRediT Author Statement

Harshi Weerakoon: Data curation; Formal analysis; Validation; Investigation; Methodology; Writing - original draft.

John J. Miles: Conceptualization; Methodology; Resources; Supervision; Writing - Review & Editing; Project administration; Funding acquisition.

Ailin Lepletier: Conceptualization; Methodology; Supervision; Writing - Review & Editing; Project administration.

Michelle M. Hill: Conceptualization; Methodology; Resources; Supervision; Writing - original draft; Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.


Acknowledgments

We thank the volunteers for blood samples for this study. We sincerely thank Dr Renee Richards and Dr Jason Mulvenna (QIMRB) for administrative and project support. HW was supported by an International Postgraduate Research Scholarship, The University of Queensland, Brisbane, Australia, and a PhD Top-up Scholarship (QIMRB). JJM was supported by a National Health and Medical Research Council Career Development Level 2 Fellowship (1131732). The authors thank the staff from the flow cytometry facility and the analytical facility at QIMRB.

Supplementary Materials

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.dib.2021.107687.

References

[1] Y. Perez-Riverol, A. Csordas, J. Bai, M. Bernal-Llinares, S. Hewapathirana, D.J. Kundu, A. Inuganti, J. Griss, G. Mayer, M. Eisenacher, E. Pérez, J. Uszkoreit, J. Pfeuffer, T. Sachsenberg, S. Yilmaz, S. Tiwary, J. Cox, E. Audain, M. Walzer, A.F. Jarnuczak, T. Ternent, A. Brazma, J.A. Vizcaíno, The PRIDE database and related tools and resources in 2019: improving support for quantification data, Nucleic Acids Res. 47 (2019) D442–D450, doi: 10.1093/nar/gky1106.

[2] J. Cox, M. Mann, MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification, Nat. Biotechnol. 26 (2008) 1367–1372, doi: 10.1038/nbt.1511.

[3] T.U. Consortium, UniProt: a worldwide hub of protein knowledge, Nucleic Acids Res. 47 (2019) D506–D515, doi: 10.1093/nar/gky1049.

[4] J. Cox, M.Y. Hein, C.A. Luber, I. Paron, N. Nagaraj, M. Mann, Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ, Mol. Cell. Proteom. 13 (2014) 2513–2526, doi: 10.1074/mcp.M113.031591.

[5] RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. URL http://www.rstudio.com/.

[6] Y. Benjamini, A.M. Krieger, D. Yekutieli, Adaptive linear step-up procedures that control the false discovery rate, Biometrika 93 (2006) 491–507, doi: 10.1093/biomet/93.3.491.

[7] M. Kanehisa, S. Goto, KEGG: kyoto encyclopedia of genes and genomes, Nucleic Acids Res. 28 (2000) 27–30, doi: 10.1093/nar/28.1.27.

[8] B. Jassal, L. Matthews, G. Viteri, C. Gong, P. Lorente, A. Fabregat, K. Sidiropoulos, J. Cook, M. Gillespie, R. Haw, F. Loney, B. May, M. Milacic, K. Rothfels, C. Sevilla, V. Shamovsky, S. Shorser, T. Varusai, J. Weiser, G. Wu, L. Stein, H. Hermjakob, P. D'Eustachio, The reactome pathway knowledgebase, Nucleic Acids Res. 48 (2020) D498–D503, doi: 10.1093/nar/gkz1031.

[9] D. Szklarczyk, A.L. Gable, D. Lyon, A. Junge, S. Wyder, J. Huerta-Cepas, M. Simonovic, N.T. Doncheva, J.H. Morris, P. Bork, L.J. Jensen, C. von Mering, STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets, Nucleic Acids Res. 47 (2019) D607–D613, doi: 10.1093/nar/gky1131.

[10] H. Weerakoon, J. Straube, K. Lineburg, L. Cooper, S. Lane, C. Smith, S. Alabbas, J. Begun, J.J. Miles, M.M. Hill, A. Lepletier, Expression of CD49f defines subsets of human regulatory T cells with divergent transcriptional landscape and function that correlate with ulcerative colitis disease activity, Clin. Transl. Immunol. 10 (2021) e1334, doi: 10.1002/cti2.1334.

[11] O. Stepanek, T. Kalina, P. Draber, T. Skopcova, K. Svojgr, P. Angelisova, V. Horejsi, A. Weiss, T. Brdicka, Regulation of Src family kinases involved in T cell receptor signaling by protein-tyrosine phosphatase CD148, J. Biol. Chem. 286 (2011) 22101–22112, doi: 10.1074/jbc.M110.196733.

[12] K.A. Dave, M.J. Headlam, T.P. Wallis, J.J. Gorman, Preparation and analysis of proteins and peptides using MALDI TOF/TOF mass spectrometry, Curr. Protoc. Protein Sci. Chapter 16 (2011) Unit 16.13, doi: 10.1002/0471140864.ps1613s63.

[13] H. Weerakoon, J. Potriquet, A.K. Shah, S. Reed, B. Jayakody, C. Kapil, M.K. Midha, R.L. Moritz, A. Lepletier, J. Mulvenna, J.J. Miles, M.M. Hill, A primary human T-cell spectral library to facilitate large scale quantitative T-cell proteomics, Sci. Data 7 (2020) 412, doi: 10.1038/s41597-020-00744-3.