Contents lists available atScienceDirect
Information Processing Letters
www.elsevier.com/locate/ipl
A refined approximation for Euclidean k-means
Fabrizio Grandoni
a,∗ , Rafail Ostrovsky
b, Yuval Rabani
c, Leonard J. Schulman
d, Rakesh Venkat
eaIDSIA,USI-SUPSI,Switzerland bUCLA,UnitedStatesofAmerica cTheHebrewUniversityofJerusalem,Israel dCaltech,UnitedStatesofAmerica eIITHyderabad,India
a rt i c l e i nf o a b s t ra c t
Articlehistory:
Received15July2021 Accepted23January2022 Availableonline2February2022 CommunicatedbyLeahEpstein
Keywords:
Approximationalgorithms Euclideank-means Euclideanfacilitylocation Integralitygaps
IntheEuclideank-MeansproblemwearegivenacollectionofnpointsDinanEuclidean spaceandapositiveintegerk.Ourgoalistoidentifyacollectionofkpointsinthesame space(centers)soastominimizethesumofthesquaredEuclideandistancesbetweeneach pointinDandtheclosestcenter.ThisproblemisknowntobeAPX-hardandthecurrent bestapproximationratioisaprimal-dual6.357 approximationbasedonastandardLPfor theproblem[Ahmadianetal.FOCS’17,SICOMP’20].
In thisnote weshow how aminor modification ofAhmadian etal.’sanalysis leads to a slightlyimproved6.12903 approximation. As arelated result,we alsoshow thatthe mentionedLPhasintegralitygapatleast 16+15√5>1.2157.
©2022TheAuthor(s).PublishedbyElsevierB.V.Thisisanopenaccessarticleunderthe CCBY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
ClusteringisacentralprobleminComputerScience,withmanyapplicationsindatascience,machinelearningetc.Oneof themostfamousandbest-studiedproblemsinthisareaisEuclideank-Means:givenasetDofnpoints(ordemands)inR andanintegerk∈ [1
,
n],selectkpoints S (centers)soastominimizej∈Dd2
(
j,
S)
.Hered(
j,
i)
istheEuclideandistance betweenpoints j andi andforasetofpointsI,d(
j,
I)
=mini∈Id(
j,
i)
.Inotherwords,wewishtoselectk centerssoasto minimizethesumofthesquaredEuclideandistancesbetweeneachdemandandtheclosestcenter.Equivalently,afeasible solutionisgivenbya partitionofthedemandsintok subsets(clusters).Thecost wC ofa clusterC⊂Disj∈Cd2
(
c, μ )
, whereμ
isthe centerofmassofC.Werecall that wC canalsobe expressedas 2|1C|j∈C
j∈Cd2
(
j,
j)
.Ourgoalis to minimizethetotalcostoftheseclusters.Euclideank-Meansiswell-studiedintermsofapproximationalgorithms.ItisknowntobeAPX-hard.Moreprecisely,itis hardtoapproximatek-Meansbelowafactor1
.
0013 inpolynomialtimeunless P=N P [6,16].Thehardness wasimproved to 1.
07 under the Unique Games Conjecture [9]. Some heuristics are known to perform very well in practice, however their approximation factor is O(
logk)
or worse on generalinstances [3,4,17,21]. Constantapproximation algorithms are known.Alocal-searchalgorithmbyKanugoetal. [15] providesa9+ε
approximation.1 Theauthorsalsoshowthatnatural local-search basedalgorithmscannot performbetter thanthis. Thisratiowas improvedto 6.
357 byAhmadian etal. [1,2]*
Correspondingauthor.E-mailaddress:fabrizio@idsia.ch(F. Grandoni).
1 Throughoutthispaperbyεwemeananarbitrarilysmallpositiveconstant.W.l.o.g.weassumeε≤1.
https://doi.org/10.1016/j.ipl.2022.106251
0020-0190/©2022TheAuthor(s).PublishedbyElsevierB.V.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).
usinga primal-dualapproach. Theyalso provea 9+
ε
approximation forgeneral(possiblynon-Euclidean) metrics.Better approximationfactorsareknownunderreasonablerestrictionsontheinput[5,7,10,20].APTASisknownforconstantk[19]orforconstantdimension
[10,12].Noticethat
can bealwaysassumedtobe O
(
logn)
by astandardapplicationofthe Johnson-Lindenstrausstransform[14].Thiswasrecentlyimprovedto O(
logk+log logn)
[8] andfinallyto O(
logk)
[18].InthispaperwedescribeasimplemodificationoftheanalysisofAhmadianetal. [2] whichleadstoaslightlyimproved approximationforEuclideank-Means(seeSection2).
Theorem1.Thereexistsadeterministicpolynomial-timealgorithmforEuclideank-Meanswithapproximationratio
ρ
+ε
forany positiveconstantε >
0,whereρ :=
⎛
⎝
1+
1 2
(
2+
3 3−
2√
2
+
33
+
2√
2)
⎞
⎠
2
<
6.
12903.
The above approximation ratio is w.r.t. the optimal fractional solution to a standard LP relaxation L Pk-Means for the problem(definedlater). Asaside result(seeSection 3),weprove alower boundontheintegralitygapofthisrelaxation (wearenotawareofanyexplicitsuchlowerboundintheliterature).
Theorem2.TheintegralitygapofL Pk-Means,evenintheEuclideanplane(i.e.,for
=2),isatleast16+15√5
>
1.
2157.1.1. Preliminaries
As mentioned earlier, one can formulate Euclideank-Means in term of the selection of k centers. In this case, it is convenienttodiscretizethepossiblechoicesforthecenters,henceobtainingapolynomial-sizesetF ofcandidatecenters, atthecostofanextrafactor1+
ε
intheapproximationratio(wewillneglectthisfactorintheapproximationratiossince itisabsorbedbyanalogousfactorsintherestoftheanalysis).Inparticularwewillusetheconstructionin[11] (Lemma24) thatchoosesasF thecentersofmassofanycollectionofupto16/ ε
2 pointswithrepetitions.Inparticular|F|=O(
n16/ε2)
inthiscase.Letc
(
j,
i)
beanabbreviationford2(
j,
i)
.ThenastandardLP-relaxationfork-Meansisasfollows:min
i∈F,j∈D
xi j
·
c(
j,
i)
L Pk-Meanss
.
t.
i∈F
xi j
≥
1∀
j∈
Dxi j
≤
yi∀
j∈
D, ∀
i∈
Fi∈F
yi
≤
k∀
j∈
D, ∀
i∈
Fxi j
,
yi≥
0∀
j∈
D, ∀
i∈
FIn an integral solution,we interpret yi=1 as i beinga selectedcenter in S (i is open), andxi j=1 as demand j being assignedtocenteri.2 Thefirstfamilyofconstraintsstatesthateachdemandhastobeassignedtosomecenter,thesecond onethatademandcanonlybeassignedtoanopencenter,andthethirdonethatwecanopenatmostkcenters.
Foranyparameter
λ >
0 (Lagrangianmultiplier),theLagrangianrelaxation L P(λ)
ofL Pk-Means (w.r.t.thelastmatrixcon- straint)anditsdualD P(λ)
areasfollows:min
i∈F,j∈D
xi j
·
c(
j,
i) + λ ·
i∈F
yi
− λ
k L P(λ)
s
.
t.
i∈F
xi j
≥
1∀
j∈
Dxi j
≤
yi∀
j∈
D,∀
i∈
Fxi j
,
yi≥
0∀
j∈
D,∀
i∈
F2 Technicallyeachdemandisautomaticallyassignedtotheclosestopencenter.Howeveritisconvenienttoallowalsosub-optimalassignmentsinthe LPrelaxation.
max
j∈D
α
j− λ
k D P(λ)
s
.
t.
j∈D
max
{
0, α
j−
c(
j,
i) } ≤ λ ∀
i∈
F (1)α
j≥
0∀
j∈
DAbovemax{0
, α
j−c(
j,
i)
}replacesthedualvariableβ
i j correspondingtothesecondconstraintintheprimalinthestandard formulationofthedualLP.Noticethat,byremovingthefixedterm−λkintheobjectivefunctionsofL P(λ)
andD P(λ)
,one obtains thestandardLPrelaxationL PF L(λ)
fortheFacilityLocationproblem(FL)withuniformfacilitycostλ
andits dual D PF L(λ)
.Wesaythata
ρ
-approximationalgorithmforaFLinstanceoftheabovetypeisLagrangianMultiplierPreserving(LMP)if itreturnsasetoffacilitiesS thatsatisfies:j∈D
c
(
j,
S) ≤ ρ (
O P T(λ) − λ|
S|),
where O P T
(λ)
isthevalueoftheoptimalsolutiontoL PF L(λ)
. 2. ArefinedapproximationforEuclideank-meansInthissection we presentourrefined approximationforEuclideank-Means.Westart bypresentingthe LMPapproxi- mation algorithmforthe FLinstances arisingfromk-Meansdescribed in[2] in Section 2.1.Wethen presenttheanalysis ofthatalgorithmasin[2] inSection2.2.InSection2.3we describeourrefined analysisofthesamealgorithm.Finally,in Section2.4wesketchhowtousethistoapproximatek-Means.
2.1. Aprimal-dualLMPalgorithmforEuclideanfacilitylocation
Weconsider aninstanceofEuclideanFLinducedby ak-Meansinstanceinthementionedway,foragivenLagrangian multiplier
λ >
0.We considerexactly the sameLagrangian Multiplier Preserving(LMP) primal-dual algorithm J V
(δ)
asin[2].In more detail, letδ
≥2 be a parameter to be fixed later. The algorithm consists of a dual-growth phase and a pruningphase.The dual-growth phase is exactly asin the classical primal-dual algorithm JV by Jain andVazirani [13]. We start with all the dual variables set to 0 and an empty set Ot of tentatively open facilities. The clients such that
α
j≥c(
j,
i)
for some i∈Ot arefrozen,andtheother clientsareactive.Wegrowthedualvariables ofactive clientsatuniformrateuntil one ofthefollowingtwo eventshappens. Thefirsteventisthat some constraintoftype (1) becomes tight.At thatpoint the corresponding facilityi isadded to Ot andall clients j withα
j≥c(
j,
i)
are set tofrozen. The second eventisthatα
j=c(
j,
i)
forsomei∈Ot.Inthatcase j issettofrozen.Inanycase,thefacilityw(
j)
thatcauses j tobecomefrozenis calledthewitnessof j.Thephasehaltswhenallclientsarefrozen.Inthepruningphasewe willclosesomefacilitiesinOt,henceobtainingthefinalsetofopenfacilities I S.Here J V
(δ)
deviates from J V.Foreach client j∈D,let N(
j)
= {i∈F:α
j>
c(
j,
i)
} be thesetof facilities i such that j contributed with apositive amount to the openingof i. Symmetrically,for i∈F, let N(
i)
= {j∈D:α
j>
c(
j,
i)}
be theclients that contributed with a positive amount to the opening of i. For i∈Ot, we let ti=maxj∈N(i)α
j, where the valuesα
j are consideredattheendofthedual-growthphase.Wesetconventionallyti=0 forN(
i)
= ∅.Intuitively,ti isthe“time”when facilityi istentativelyopen(atwhichpointallthedualvariablesofcontributingclientsstopgrowing).Wedefineaconflict graph H overtentatively open facilitiesasfollows. Thenode set of H is Ot. Weplace an edgebetween i,
i∈Ot iff the followingtwoconditionshold:(1)forsome client j, j∈N(
i)
∩N(
i)
(inwords, j contributestotheopeningofbothi and i) and(2)one has c(
i,
i)
≤δ
·min{ti,
ti}.In thisgraphwe compute a maximal independent set I S,which providesthe desiredsolutiontothefacilitylocationproblem(whereeachclientisassignedtotheclosestfacilityinI S).Weremarkthatthepruningphaseof J V differsfromtheoneof J V
(δ)
onlyinthedefinitionofH,wherecondition(2) isnotrequiredtohold(or,equivalently, J V behaveslike J V(
+∞)
forλ >
0).2.2. Theanalysisin[2]
Thegeneralgoalistoshowthat
j∈D
c
(
j,
I S) ≤ ρ (
j∈D
α
j− λ|
I S|),
for some
ρ
≥1 as small as possible. This shows that the algorithm is an LMPρ
-approximation for the problem. It is sufficienttoprovethat,foreachclient j,onehasc
(
j,
I S) ρ ≤ α
j−
i∈N(j)∩I S
( α
j−
c(
j,
i)) = α
j−
i∈I S
max
{
0, α
j−
c(
j,
i) } .
LetS=N
(
j)
∩I S ands= |S|.Wedistinguish3 casesdependingonthevalueofs.CaseA:s=1 Let S= {i∗}.Thenforany
ρ
≥1, c(
j,
I S)
ρ ≤
c(
j,
I S) =
c(
j,
i∗
) = α
j− ( α
j−
c(
j,
i∗)).
CaseB:s
>
1 Here weusethe propertiesofEuclideanmetrics.Thesumi∈Sc
(
j,
i)
isthesumofthesquareddistances from j tothe facilitiesin S. Thisquantityis lowerboundedby thesumofthe squareddistancesfrom S tothe centroidμ
of S.Recallthati∈Sc
( μ ,
i)
=2s1i∈S
i∈Sc
(
i,
i)
.Wealsoobservethat,byconstruction,foranytwodistinct i,
i∈I S onehasc
(
i,
i) > δ ·
min{
ti,
ti} ≥ δ · α
j,
wherethelastinequalityfollowsfromthefactthat jiscontributingtotheopeningofbothiandi.Altogetheroneobtains
i∈S
c
(
j,
i) ≥
i∈S
c
( μ ,
i) =
1 2s i∈Si∈Sc
(
i,
i) ≥ (
s−
1)δ α
j2
.
Thus
i∈S
( α
j−
c(
j,
i)) ≤ (
s− δ(
s−
1)
2
) α
j= (
s(
1− δ
2) + δ
2
) α
j δ≥2,s≥2≤ (
2− δ
2) α
j.
Usingthefactthat
α
j>
c(
j,
i)
foralli∈S,henceα
j>
c(
j,
I S)
,onegets( δ
2
−
1)
c(
j,
I S)
δ≥≤
2( δ
2−
1) α
j.
Weconcludethat
i∈S
( α
j−
c(
j,
i)) + ( δ
2
−
1)
c(
j,
I S) ≤ (
2− δ
2
) α
j+ ( δ
2
−
1) α
j= α
j.
Thisgivesthedesiredinequalityassumingthat
ρ
≥δ/21−1.CaseC:s=0 Considerthewitnessi=w
(
j)
of j.Noticethatα
j≥tiandα
j≥c(
j,
i)
=d2(
j,
i)
.Hence d(
j,
i) +
δ
ti≤ (
1+ √ δ) √
α
j.
Ifi∈I S,thend
(
j,
I S)
≤d(
j,
i)
.Otherwisethereexistsi∈I S suchthatd2(
i,
i)
≤δ
min{ti,
ti}≤δ
ti.Thusd(
j,
I S)
≤d(
j,
i)
+ d(
i,
i)
≤d(
j,
i)
+√δ
ti.Inbothcasesonehasd(
j,
I S)
≤(
1+√δ)
√α
j,hence c(
j,
I S) ≤ (
1+ √
δ)
2α
j.
Thisgivesthedesiredinequalityfor
ρ
≥(
1+√δ)
2. Fixingδ
Altogetherwecansetρ
=max{δ/21−1, (
1+√δ)
2}.Thebestchoiceforδ
(namely,theonethatminimizesρ
)isthe solutionof δ/21−1=(
1+√δ)
2.Thisisachievedforδ
2.
3146 andgivesρ
6.
3574.2.3. Arefinedanalysis
WerefinetheanalysisinCaseBasfollows.Let =
i∈Sc
(
j,
i)
.Ourgoalistoupperbound c(
j,
S)
α
j−
i∈S
( α
j−
c(
j,
i)) =
c(
j,
S)
i∈Sc
(
j,
i) − (
s−
1) α
j=
c(
j,
S)
− (
s−
1) α
j.
Insteadofusingtheupperboundc
(
j,
S)
≤α
j weusetheaveragec
(
j,
S) ≤
1 s i∈Sc
(
j,
i) =
s.
Thenitissufficienttoupperbound 1
s
− (
s−
1) α
j.
The derivativein oftheabove functionis 1s −(s−1)αj
( −(s−1)αj)2
<
0.Hence themaximumisachievedforthesmallestpossible valueof .Recallthatwealreadyshowedthat ≥(s−12)δαj.Henceavalidupperboundis1 s
(s−1)δαj 2 (s−1)δαj
2
− (
s−
1) α
j=
1 sδ/
2δ/
2−
1s≥2,δ≥2
≤ δ/
4δ/
2−
1.
This imposes
ρ
≥ δ/δ/2−41 rather thanρ
≥δ/21−1 in Case B.Notice that thisis an improvementforδ <
4. The bestchoice ofδ
is now obtained by imposing δ/δ/2−41 =(
1+√δ)
2. This givesδ
=12(
2+3 3−2√2+3 3+2√
2
)
2.
1777 andρ
=1+
1 2
(
2+33−2√ 2+3
3+2√ 2
)
2
<
6.
12903.2.4. Fromfacilitylocationtok-means
Wecanusetherefined
ρ
:=1+
1 2
(
2+33−2√ 2+3
3+2√ 2
)
2
approximationforEuclideanFacilityLocationfrom previoussectiontoderivea
ρ
+ε
approximationforEuclideank-Means,foranyconstantε >
0.Herewefollowtheapproach of[2] withonlyminorchanges.Inmoredetail,theauthorsconsideravariantoftheFLalgorithmdescribedbefore,whose approximationfactorisρ
+ε
ratherthanρ
.Acarefuluseofthisalgorithmleadstoasolutionopeningpreciselykfacilities, which leads to the desiredapproximation factor. In their analysisthe authors use slight modifications of the inequality(
1+√δ)
√α
j≥d(
j,
i)
+√δ
ti (comingfromCaseC,whichisthesameintheir andouranalysis).Thegoalistoprovethat themodifiedalgorithmisρ
+ε
approximate.Hereδ
andρ
areusedasparameters.Thereforeitissufficienttoreplacetheir valuesoftheseparameterswiththeonescomingfromourrefinedanalysis.Therestisidentical.3. Lowerboundontheintegralitygap
In thissectionwe describe ourlower bound instanceforthe integralitygapof L Pk-Means. Itis convenientto consider firstthefollowingslightlydifferentrelaxation,basedonclusters(withwC asdefinedinSection1):
min
C∈C
wCxC L Pk-Means
s
.
t.
C∈C:j∈C
xC
≥
1∀
j∈
DC∈C xC
≤
kxC
≥
0∀
C∈
CHere C denotes theset ofpossible clusters, i.e.thepossible subsetsofpoints. In an integral solution xC=1 meansthat clusterC ispartofoursolution.
Ourinstanceison theEuclideanplane, andits points arethe(10)vertices oftworegular pentagonsofside length 1.
ThesepentagonsareplacedsothatanytwoverticesofdistinctpentagonsareatlargeenoughdistanceM tobefixedlater.
Here k=5. We remark that our argument can be easily extended to an arbitrarynumber of points by taking 2h such pentagons forany integer h≥1 so that the pairwise distancebetween vertices ofdistinct pentagons is atleast M, and settingk=5h.
Afeasible fractionalsolutionis obtainedby settingxC =0
.
5 forevery C consistingofa pairofconsecutiveverticesin thesamepentagon(soweareconsidering10 fractionalclustersintotal).Obviouslythissolutionisfeasible.ThecostwC of eachsuchclusterC is2(
0.
5)
2=0.
5.Hencethecostofthisfractionalsolutionis10·0.
5·0.
5=52.Next consider the optimalintegral solution,consisting of5 clusters. Recall that the radius of each pentagon (i.e. the distancefromavertextoitscenter) isr=
2 5−√
50
.
851 andthedistancebetweentwo non-consecutiveverticesinthe same pentagonis d=√52+11.
618.A solutionwithtwo clustersconsistingoftheverticesof eachpentagoncosts 10r2. Any cluster involving vertices ofdistinct pentagonscosts atleast M2/
2, hence for M large enough the optimal solution forms clusters only with vertices of the same pentagon. In more detail the optimal solution consists of x∈ {1,
2,
3,
4} clusters containing the vertices of one pentagon and 5−x clusters containing the vertices of the remaining pentagon.Let w
(
x)
be theminimum costassociated withonepentagonassuming that we formx clusterswithitsvertices. Clearly w(
1)
=5r2=5−10√5.Regarding w(
4)
,itisobviouslyconvenienttochoosetwoconsecutiveverticesintheuniqueclusterof size2.Thus w(
4)
=1/
2.Forx∈ {2,
3},wenote,asiseasytoverify,thatclusterswithconsecutiveverticesarelessexpensive thanthealternatives.Forw(
2)
,onemightformoneclusterofsize1 andoneofsize4.Thiswouldcost 3(1+4d2)=15+83√5. Alternatively,onemightformoneclusterofsize2 andoneofsize3,atsmallercost 12+2+3d2=10+6√5.Thus w(
2)
=10+6√5. Forx=3,onemightformtwoclustersofsize1 andoneofsize3,ortwoclustersofsize2 andoneofsize1.Theassociated costinthetwocasesis 2+3d2>
1 and2·12=1,resp.Hence w(
3)
=1.Sotheoverallcostoftheoptimalintegralsolutionis min{w(
1)
+w(
4),
w(
2)
+w(
3)
}=w(
2)
+w(
3)
=16+6√5.ThustheintegralitygapofL Pk-Means isatleast 16+√5
6 ·25=16+15√5. Consider next L Pk-Means.Herea technicalcomplicationcomesfromthe definitionofF whichisnot partoftheinput instance ofk-Means. The sameconstruction as above worksifwe let F contain the centersof mass ofany setof 2 or 3 points. Notice thatthis isautomatically guaranteedby the construction in[11] for 16
/ ε
2≥3. Inthiscasethe optimal integral solutionsto L Pk-Means andL Pk-Means are thesameintheconsideredexample.Furthermoreone obtainsafeasible fractional solutionto L Pk-Means ofcost5/
2 bysetting yi=0.
5 for thecentersofmassofanytwo consecutivevertices of thesamepentagon,andsettingxi j=0.
5 foreachpointi andthetwoclosestcenters jwithpositive yj.Thisconcludesthe proofofTheorem2.Declarationofcompetinginterest
The authors declare that they haveno known competingfinancial interests or personal relationships that could have appearedtoinfluencetheworkreportedinthispaper.
Acknowledgements
WorksupportedinpartbytheNSFgrants1909972andCNS-2001096,theSNFexcellencegrant200020B_182865/1,the BSFgrants2015782and2018687,andtheISFgrants956-15,2533-17,and3565-21.
References
[1]SaraAhmadian,AshkanNorouzi-Fard,OlaSvensson,JustinWard,Betterguaranteesfork-meansandEuclideank-medianbyprimal-dualalgorithms, in:ChrisUmans(Ed.),58thIEEEAnnualSymposiumonFoundationsofComputerScience,FOCS2017,Berkeley,CA,USA,October15-17,2017,IEEE ComputerSociety,,2017,pp. 61–72.
[2]SaraAhmadian,AshkanNorouzi-Fard,OlaSvensson,JustinWard,Betterguaranteesfork-meansandEuclideank-medianbyprimal-dualalgorithms, SIAMJ.Comput.49 (4)(2020).
[3]DavidArthur,SergeiVassilvitskii,Howslowisthek-meansmethod?,in:NinaAmenta,OtfriedCheong(Eds.),Proceedingsofthe22ndACMSymposium onComputationalGeometry,Sedona,Arizona,USA,June5–7,2006,ACM,2006,pp. 144–153.
[4]DavidArthur,SergeiVassilvitskii,k-means++:theadvantagesofcarefulseeding,in:NikhilBansal,KirkPruhs,CliffordStein(Eds.),Proceedingsof theEighteenthAnnual ACM-SIAMSymposiumon DiscreteAlgorithms,SODA2007, NewOrleans,Louisiana,USA, January7-9, 2007, SIAM,2007, pp. 1027–1035.
[5]PranjalAwasthi,AvrimBlum,OrSheffet,StabilityyieldsaPTASfork-medianandk-meansclustering,in:51thAnnualIEEESymposiumonFoundations ofComputerScience,FOCS2010,October23-26,2010,LasVegas,Nevada,USA,IEEEComputerSociety,2010,pp. 309–318.
[6]PranjalAwasthi,MosesCharikar,RavishankarKrishnaswamy,AliKemalSinop,ThehardnessofapproximationofEuclideank-means,in:LarsArge, JánosPach(Eds.),31stInternationalSymposiumonComputationalGeometry,SoCG2015,June22-25,2015,Eindhoven,TheNetherlands,in:LIPIcs, vol. 34,SchlossDagstuhl- Leibniz-ZentrumfürInformatik,2015,pp. 754–767.
[7]Maria-FlorinaBalcan,AvrimBlum,AnupamGupta,Approximateclusteringwithouttheapproximation,in:ClaireMathieu(Ed.),Proceedingsofthe TwentiethAnnualACM-SIAMSymposiumonDiscreteAlgorithms,SODA2009,NewYork,NY,USA,January4-6,2009,SIAM,2009,pp. 1068–1077.
[8]LucaBecchetti,MarcBury,VincentCohen-Addad,FabrizioGrandoni,ChrisSchwiegelshohn,Obliviousdimensionreductionfork-means:beyondsub- spacesandtheJohnson-Lindenstrausslemma,in:MosesCharikar,EdithCohen(Eds.),Proceedingsofthe51stAnnualACMSIGACTSymposiumon TheoryofComputing,STOC2019,Phoenix,AZ,USA,June23-26,2019,ACM,2019,pp. 1039–1050.
[9]VincentCohen-Addad,C.S.Karthik,Inapproximabilityofclusteringinlpmetrics,in:DavidZuckerman(Ed.),60thIEEEAnnualSymposiumonFounda- tionsofComputerScience,FOCS2019,Baltimore,Maryland,USA,November9-12,2019,IEEEComputerSociety,2019,pp. 519–539.
[10]VincentCohen-Addad,PhilipN.Klein,ClaireMathieu,Localsearchyieldsapproximationschemesfork-meansandk-medianinEuclideanandminor- freemetrics,SIAMJ.Comput.48 (2)(2019)644–667.
[11]Wenceslas FernandezdelaVega,MarekKarpinski,ClaireKenyon,YuvalRabani,Approximation schemesfor clusteringproblems,in:LawrenceL.
Larmore,MichelX.Goemans(Eds.),Proceedingsofthe35thAnnualACMSymposiumonTheoryofComputing,June9-11,2003,SanDiego,CA,USA, ACM,2003,pp. 50–58.
[12]ZacharyFriggstad,MohsenRezapour,MohammadR.Salavatipour,LocalsearchyieldsaPTASfork-meansindoublingmetrics,SIAMJ.Comput.48 (2) (2019)452–480.
[13]KamalJain,VijayV.Vazirani,Approximationalgorithmsformetricfacilitylocationandk-medianproblemsusingtheprimal-dualschemaandLa- grangianrelaxation,J.ACM48 (2)(2001)274–296.
[14]WilliamB.Johnson,JoramLindenstrauss,ExtensionsofLipschitzmappingsintoaHilbertspace,Contemp.Math.26(1984)189–206.
[15]TapasKanungo,DavidM.Mount,NathanS.Netanyahu,ChristineD.Piatko,RuthSilverman,AngelaY.Wu,Alocalsearchapproximationalgorithmfor k-meansclustering,Comput.Geom.28 (2–3)(2004)89–112.
[16]EuiwoongLee,MelanieSchmidt,JohnWright,Improvedandsimplifiedinapproximabilityfork-means,Inf.Process.Lett.120(2017)40–43.
[17]StuartP.Lloyd,LeastsquaresquantizationinPCM,IEEETrans.Inf.Theory28 (2)(1982)129–136.
[18]KonstantinMakarychev,YuryMakarychev,IlyaP.Razenshteyn,PerformanceofJohnson-Lindenstrausstransformfork-meansandk-mediansclustering, in:MosesCharikar,EdithCohen(Eds.),Proceedingsofthe51stAnnualACMSIGACTSymposiumonTheoryofComputing,STOC2019,Phoenix,AZ, USA,June23-26,2019,ACM,2019,pp. 1027–1038.
[19]JiríMatousek,Onapproximategeometrick-clustering,DiscreteComput.Geom.24 (1)(2000)61–84.
[20]RafailOstrovsky,YuvalRabani,LeonardJ.Schulman,ChaitanyaSwamy,TheeffectivenessofLloyd-typemethodsforthek-meansproblem,J.ACM59 (6) (2012)28.
[21]AndreaVattani,k-meansrequiresexponentiallymanyiterationsevenintheplane,DiscreteComput.Geom.45 (4)(2011)596–616.