A refined approximation for Euclidean k-means



Information Processing Letters

www.elsevier.com/locate/ipl


Fabrizio Grandoni^{a,∗}, Rafail Ostrovsky^{b}, Yuval Rabani^{c}, Leonard J. Schulman^{d}, Rakesh Venkat^{e}

^a IDSIA, USI-SUPSI, Switzerland
^b UCLA, United States of America
^c The Hebrew University of Jerusalem, Israel
^d Caltech, United States of America
^e IIT Hyderabad, India

Article info

Article history: Received 15 July 2021. Accepted 23 January 2022. Available online 2 February 2022. Communicated by Leah Epstein.

Keywords: Approximation algorithms; Euclidean k-means; Euclidean facility location; Integrality gaps

Abstract

In the Euclidean k-Means problem we are given a collection of $n$ points $D$ in a Euclidean space and a positive integer $k$. Our goal is to identify a collection of $k$ points in the same space (centers) so as to minimize the sum of the squared Euclidean distances between each point in $D$ and the closest center. This problem is known to be APX-hard and the current best approximation ratio is a primal-dual 6.357 approximation based on a standard LP for the problem [Ahmadian et al. FOCS'17, SICOMP'20].

In this note we show how a minor modification of Ahmadian et al.'s analysis leads to a slightly improved 6.12903 approximation. As a related result, we also show that the mentioned LP has integrality gap at least $\frac{16+\sqrt{5}}{15} > 1.2157$.

© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Clustering is a central problem in Computer Science, with many applications in data science, machine learning, etc. One of the most famous and best-studied problems in this area is Euclidean k-Means: given a set $D$ of $n$ points (or demands) in $\mathbb{R}^{\ell}$ and an integer $k \in [1, n]$, select $k$ points $S$ (centers) so as to minimize $\sum_{j \in D} d^2(j, S)$. Here $d(j, i)$ is the Euclidean distance between points $j$ and $i$ and, for a set of points $I$, $d(j, I) = \min_{i \in I} d(j, i)$. In other words, we wish to select $k$ centers so as to minimize the sum of the squared Euclidean distances between each demand and the closest center. Equivalently, a feasible solution is given by a partition of the demands into $k$ subsets (clusters). The cost $w_C$ of a cluster $C \subseteq D$ is $\sum_{j \in C} d^2(j, \mu)$, where $\mu$ is the center of mass of $C$. We recall that $w_C$ can also be expressed as $\frac{1}{2|C|} \sum_{j \in C} \sum_{j' \in C} d^2(j, j')$. Our goal is to minimize the total cost of these clusters.
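The two expressions for the cluster cost $w_C$ and the k-Means objective can be checked on a toy instance. The following is a minimal sketch; the helper names (`sqdist`, `cluster_cost`, `cluster_cost_pairwise`, `kmeans_cost`) and the four-point example are illustrative assumptions, not part of the paper.

```python
from itertools import product

def sqdist(p, q):
    """Squared Euclidean distance d^2(p, q)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Center of mass mu of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[t] for p in points) / len(points) for t in range(dim))

def cluster_cost(points):
    """w_C as the sum of squared distances to the center of mass."""
    mu = centroid(points)
    return sum(sqdist(p, mu) for p in points)

def cluster_cost_pairwise(points):
    """w_C as (1 / (2|C|)) times the sum over ordered pairs of d^2(j, j')."""
    total = sum(sqdist(p, q) for p, q in product(points, repeat=2))
    return total / (2 * len(points))

def kmeans_cost(demands, centers):
    """k-Means objective: each demand pays d^2 to its closest center."""
    return sum(min(sqdist(j, i) for i in centers) for j in demands)

# Hypothetical example: the four corners of a square of side 2.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(cluster_cost(cluster), cluster_cost_pairwise(cluster))
print(kmeans_cost(cluster, [(1.0, 1.0)]))
```

Both cost functions should agree on any cluster; the pairwise form is the one used later in the analysis of Case B.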

Euclidean k-Means is well-studied in terms of approximation algorithms. It is known to be APX-hard. More precisely, it is hard to approximate k-Means below a factor 1.0013 in polynomial time unless $P = NP$ [6,16]. The hardness was improved to 1.07 under the Unique Games Conjecture [9]. Some heuristics are known to perform very well in practice, however their approximation factor is $O(\log k)$ or worse on general instances [3,4,17,21]. Constant approximation algorithms are known. A local-search algorithm by Kanungo et al. [15] provides a $9+\varepsilon$ approximation.¹ The authors also show that natural local-search based algorithms cannot perform better than this. This ratio was improved to 6.357 by Ahmadian et al. [1,2] using a primal-dual approach. They also prove a $9+\varepsilon$ approximation for general (possibly non-Euclidean) metrics. Better approximation factors are known under reasonable restrictions on the input [5,7,10,20]. A PTAS is known for constant $k$ [19] or for constant dimension $\ell$ [10,12]. Notice that $\ell$ can always be assumed to be $O(\log n)$ by a standard application of the Johnson-Lindenstrauss transform [14]. This was recently improved to $O(\log k + \log\log n)$ [8] and finally to $O(\log k)$ [18].

∗ Corresponding author. E-mail address: fabrizio@idsia.ch (F. Grandoni).

¹ Throughout this paper by $\varepsilon$ we mean an arbitrarily small positive constant. W.l.o.g. we assume $\varepsilon \le 1$.

https://doi.org/10.1016/j.ipl.2022.106251

In this paper we describe a simple modification of the analysis of Ahmadian et al. [2] which leads to a slightly improved approximation for Euclidean k-Means (see Section 2).

Theorem 1. There exists a deterministic polynomial-time algorithm for Euclidean k-Means with approximation ratio $\rho + \varepsilon$ for any positive constant $\varepsilon > 0$, where

\[
\rho := \left(1 + \sqrt{\tfrac{1}{2}\left(2 + \sqrt[3]{3 - 2\sqrt{2}} + \sqrt[3]{3 + 2\sqrt{2}}\right)}\right)^{2} < 6.12903.
\]

The above approximation ratio is w.r.t. the optimal fractional solution to a standard LP relaxation $LP_{k\text{-Means}}$ for the problem (defined later). As a side result (see Section 3), we prove a lower bound on the integrality gap of this relaxation (we are not aware of any explicit such lower bound in the literature).

Theorem 2. The integrality gap of $LP_{k\text{-Means}}$, even in the Euclidean plane (i.e., for $\ell = 2$), is at least $\frac{16+\sqrt{5}}{15} > 1.2157$.

1.1. Preliminaries

As mentioned earlier, one can formulate Euclidean k-Means in terms of the selection of $k$ centers. In this case, it is convenient to discretize the possible choices for the centers, hence obtaining a polynomial-size set $F$ of candidate centers, at the cost of an extra factor $1+\varepsilon$ in the approximation ratio (we will neglect this factor in the approximation ratios since it is absorbed by analogous factors in the rest of the analysis). In particular we will use the construction in [11] (Lemma 24) that chooses as $F$ the centers of mass of any collection of up to $16/\varepsilon^2$ points with repetitions. In particular $|F| = O(n^{16/\varepsilon^2})$ in this case.

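Let $c(j, i)$ be an abbreviation for $d^2(j, i)$. Then a standard LP relaxation for k-Means is as follows:

\[
\begin{array}{llll}
\min & \sum_{i \in F,\, j \in D} x_{ij} \cdot c(j, i) & & (LP_{k\text{-Means}}) \\
\text{s.t.} & \sum_{i \in F} x_{ij} \ge 1 & \forall j \in D & \\
 & x_{ij} \le y_i & \forall j \in D,\ \forall i \in F & \\
 & \sum_{i \in F} y_i \le k & & \\
 & x_{ij},\, y_i \ge 0 & \forall j \in D,\ \forall i \in F &
\end{array}
\]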

In an integral solution, we interpret $y_i = 1$ as $i$ being a selected center in $S$ ($i$ is open), and $x_{ij} = 1$ as demand $j$ being assigned to center $i$.² The first family of constraints states that each demand has to be assigned to some center, the second one that a demand can only be assigned to an open center, and the third one that we can open at most $k$ centers.

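For any parameter $\lambda > 0$ (Lagrangian multiplier), the Lagrangian relaxation $LP(\lambda)$ of $LP_{k\text{-Means}}$ (w.r.t. the last matrix constraint) and its dual $DP(\lambda)$ are as follows:

\[
\begin{array}{llll}
\min & \sum_{i \in F,\, j \in D} x_{ij} \cdot c(j, i) + \lambda \cdot \sum_{i \in F} y_i - \lambda k & & (LP(\lambda)) \\
\text{s.t.} & \sum_{i \in F} x_{ij} \ge 1 & \forall j \in D & \\
 & x_{ij} \le y_i & \forall j \in D,\ \forall i \in F & \\
 & x_{ij},\, y_i \ge 0 & \forall j \in D,\ \forall i \in F &
\end{array}
\]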

² Technically each demand is automatically assigned to the closest open center. However it is convenient to allow also sub-optimal assignments in the LP relaxation.


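\[
\begin{array}{llll}
\max & \sum_{j \in D} \alpha_j - \lambda k & & (DP(\lambda)) \\
\text{s.t.} & \sum_{j \in D} \max\{0, \alpha_j - c(j, i)\} \le \lambda & \forall i \in F & (1) \\
 & \alpha_j \ge 0 & \forall j \in D &
\end{array}
\]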

Above, $\max\{0, \alpha_j - c(j, i)\}$ replaces the dual variable $\beta_{ij}$ corresponding to the second constraint in the primal in the standard formulation of the dual LP. Notice that, by removing the fixed term $-\lambda k$ in the objective functions of $LP(\lambda)$ and $DP(\lambda)$, one obtains the standard LP relaxation $LP_{FL}(\lambda)$ for the Facility Location problem (FL) with uniform facility cost $\lambda$ and its dual $DP_{FL}(\lambda)$.

We say that a $\rho$-approximation algorithm for a FL instance of the above type is Lagrangian Multiplier Preserving (LMP) if it returns a set of facilities $S$ that satisfies:

\[
\sum_{j \in D} c(j, S) \le \rho \left( OPT(\lambda) - \lambda |S| \right),
\]

where $OPT(\lambda)$ is the value of the optimal solution to $LP_{FL}(\lambda)$.

2. A refined approximation for Euclidean k-means

In this section we present our refined approximation for Euclidean k-Means. We start by presenting the LMP approximation algorithm for the FL instances arising from k-Means described in [2] in Section 2.1. We then present the analysis of that algorithm as in [2] in Section 2.2. In Section 2.3 we describe our refined analysis of the same algorithm. Finally, in Section 2.4 we sketch how to use this to approximate k-Means.

2.1. A primal-dual LMP algorithm for Euclidean facility location

We consider an instance of Euclidean FL induced by a k-Means instance in the mentioned way, for a given Lagrangian multiplier $\lambda > 0$.

We consider exactly the same Lagrangian Multiplier Preserving (LMP) primal-dual algorithm $JV(\delta)$ as in [2]. In more detail, let $\delta \ge 2$ be a parameter to be fixed later. The algorithm consists of a dual-growth phase and a pruning phase.

The dual-growth phase is exactly as in the classical primal-dual algorithm $JV$ by Jain and Vazirani [13]. We start with all the dual variables set to 0 and an empty set $O_t$ of tentatively open facilities. The clients such that $\alpha_j \ge c(j, i)$ for some $i \in O_t$ are frozen, and the other clients are active. We grow the dual variables of active clients at uniform rate until one of the following two events happens. The first event is that some constraint of type (1) becomes tight. At that point the corresponding facility $i$ is added to $O_t$ and all clients $j$ with $\alpha_j \ge c(j, i)$ are set to frozen. The second event is that $\alpha_j = c(j, i)$ for some $i \in O_t$. In that case $j$ is set to frozen. In any case, the facility $w(j)$ that causes $j$ to become frozen is called the witness of $j$. The phase halts when all clients are frozen.

In the pruning phase we will close some facilities in $O_t$, hence obtaining the final set of open facilities $IS$. Here $JV(\delta)$ deviates from $JV$. For each client $j \in D$, let $N(j) = \{i \in F : \alpha_j > c(j, i)\}$ be the set of facilities $i$ such that $j$ contributed with a positive amount to the opening of $i$. Symmetrically, for $i \in F$, let $N(i) = \{j \in D : \alpha_j > c(j, i)\}$ be the clients that contributed with a positive amount to the opening of $i$. For $i \in O_t$, we let $t_i = \max_{j \in N(i)} \alpha_j$, where the values $\alpha_j$ are considered at the end of the dual-growth phase. We set conventionally $t_i = 0$ for $N(i) = \emptyset$. Intuitively, $t_i$ is the "time" when facility $i$ is tentatively open (at which point all the dual variables of contributing clients stop growing). We define a conflict graph $H$ over tentatively open facilities as follows. The node set of $H$ is $O_t$. We place an edge between $i, i' \in O_t$ iff the following two conditions hold: (1) for some client $j$, $j \in N(i) \cap N(i')$ (in words, $j$ contributes to the opening of both $i$ and $i'$) and (2) one has $c(i, i') \le \delta \cdot \min\{t_i, t_{i'}\}$. In this graph we compute a maximal independent set $IS$, which provides the desired solution to the facility location problem (where each client is assigned to the closest facility in $IS$).

We remark that the pruning phase of $JV$ differs from the one of $JV(\delta)$ only in the definition of $H$, where condition (2) is not required to hold (or, equivalently, $JV$ behaves like $JV(+\infty)$ for $\lambda > 0$).

2.2. The analysis in [2]

The general goal is to show that

\[
\sum_{j \in D} c(j, IS) \le \rho \left( \sum_{j \in D} \alpha_j - \lambda |IS| \right),
\]

for some $\rho \ge 1$ as small as possible. This shows that the algorithm is an LMP $\rho$-approximation for the problem. It is sufficient to prove that, for each client $j$, one has

\[
\frac{c(j, IS)}{\rho} \le \alpha_j - \sum_{i \in N(j) \cap IS} \left( \alpha_j - c(j, i) \right) = \alpha_j - \sum_{i \in IS} \max\{0, \alpha_j - c(j, i)\}.
\]
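The step from the per-client inequality to the LMP guarantee is only stated above. A short derivation, written under our reading that constraint (1) is tight at the end of the dual-growth phase for every facility in $O_t \supseteq IS$, is the following sketch:

```latex
\begin{align*}
\sum_{j \in D} c(j, IS)
  &\le \rho \sum_{j \in D} \Big( \alpha_j - \sum_{i \in IS} \max\{0, \alpha_j - c(j, i)\} \Big) \\
  &= \rho \Big( \sum_{j \in D} \alpha_j - \sum_{i \in IS} \sum_{j \in D} \max\{0, \alpha_j - c(j, i)\} \Big) \\
  &= \rho \Big( \sum_{j \in D} \alpha_j - \lambda |IS| \Big).
\end{align*}
```

The first line sums the per-client inequality over $j \in D$, the second exchanges the order of summation, and the third uses tightness of constraint (1) for each $i \in IS$.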

Let $S = N(j) \cap IS$ and $s = |S|$. We distinguish 3 cases depending on the value of $s$.

Case A: $s = 1$. Let $S = \{i^*\}$. Then for any $\rho \ge 1$,

\[
\frac{c(j, IS)}{\rho} \le c(j, IS) = c(j, i^*) = \alpha_j - \left( \alpha_j - c(j, i^*) \right).
\]

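Case B: $s > 1$. Here we use the properties of Euclidean metrics. The sum $\sum_{i \in S} c(j, i)$ is the sum of the squared distances from $j$ to the facilities in $S$. This quantity is lower bounded by the sum of the squared distances from $S$ to the centroid $\mu$ of $S$. Recall that $\sum_{i \in S} c(\mu, i) = \frac{1}{2s} \sum_{i \in S} \sum_{i' \in S} c(i, i')$. We also observe that, by construction, for any two distinct $i, i' \in IS$ one has

\[
c(i, i') > \delta \cdot \min\{t_i, t_{i'}\} \ge \delta \cdot \alpha_j,
\]

where the last inequality follows from the fact that $j$ is contributing to the opening of both $i$ and $i'$. Altogether one obtains

\[
\sum_{i \in S} c(j, i) \ge \sum_{i \in S} c(\mu, i) = \frac{1}{2s} \sum_{i \in S} \sum_{i' \in S} c(i, i') \ge \frac{(s-1)\,\delta\,\alpha_j}{2}.
\]

Thus

\[
\sum_{i \in S} \left( \alpha_j - c(j, i) \right) \le \left( s - \frac{\delta (s-1)}{2} \right) \alpha_j = \left( s \left( 1 - \frac{\delta}{2} \right) + \frac{\delta}{2} \right) \alpha_j \overset{\delta \ge 2,\, s \ge 2}{\le} \left( 2 - \frac{\delta}{2} \right) \alpha_j.
\]

Using the fact that $\alpha_j > c(j, i)$ for all $i \in S$, hence $\alpha_j > c(j, IS)$, one gets

\[
\left( \frac{\delta}{2} - 1 \right) c(j, IS) \overset{\delta \ge 2}{\le} \left( \frac{\delta}{2} - 1 \right) \alpha_j.
\]

We conclude that

\[
\sum_{i \in S} \left( \alpha_j - c(j, i) \right) + \left( \frac{\delta}{2} - 1 \right) c(j, IS) \le \left( 2 - \frac{\delta}{2} \right) \alpha_j + \left( \frac{\delta}{2} - 1 \right) \alpha_j = \alpha_j.
\]

This gives the desired inequality assuming that $\rho \ge \frac{1}{\delta/2 - 1}$.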

Case C: $s = 0$. Consider the witness $i = w(j)$ of $j$. Notice that $\alpha_j \ge t_i$ and $\alpha_j \ge c(j, i) = d^2(j, i)$. Hence

\[
d(j, i) + \sqrt{\delta\, t_i} \le (1 + \sqrt{\delta}) \sqrt{\alpha_j}.
\]

If $i \in IS$, then $d(j, IS) \le d(j, i)$. Otherwise there exists $i' \in IS$ such that $d^2(i, i') \le \delta \min\{t_i, t_{i'}\} \le \delta\, t_i$. Thus $d(j, IS) \le d(j, i) + d(i, i') \le d(j, i) + \sqrt{\delta\, t_i}$. In both cases one has $d(j, IS) \le (1 + \sqrt{\delta}) \sqrt{\alpha_j}$, hence

\[
c(j, IS) \le (1 + \sqrt{\delta})^2 \alpha_j.
\]

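This gives the desired inequality for $\rho \ge (1 + \sqrt{\delta})^2$.

Fixing $\delta$. Altogether we can set $\rho = \max\left\{ \frac{1}{\delta/2 - 1}, (1 + \sqrt{\delta})^2 \right\}$. The best choice for $\delta$ (namely, the one that minimizes $\rho$) is the solution of $\frac{1}{\delta/2 - 1} = (1 + \sqrt{\delta})^2$. This is achieved for $\delta \simeq 2.3146$ and gives $\rho \simeq 6.3574$.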
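The balancing step can be reproduced numerically. The sketch below uses plain bisection on the interval $(2, 4]$, where the Case B bound decreases and the Case C bound increases in $\delta$; the function names and the bracketing interval are illustrative assumptions.

```python
import math

def case_b_bound(delta):
    """Requirement on rho from Case B in the analysis of [2]: 1 / (delta/2 - 1)."""
    return 1.0 / (delta / 2.0 - 1.0)

def case_c_bound(delta):
    """Requirement on rho from Case C: (1 + sqrt(delta))^2."""
    return (1.0 + math.sqrt(delta)) ** 2

def balance(lo, hi, decreasing, increasing, iters=100):
    """Bisection for the point where a decreasing and an increasing bound meet."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if decreasing(mid) > increasing(mid):
            lo = mid  # crossing point lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2.0

delta_old = balance(2.0 + 1e-9, 4.0, case_b_bound, case_c_bound)
rho_old = case_c_bound(delta_old)
print(delta_old, rho_old)  # close to the values 2.3146 and 6.3574 quoted in the text
```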

2.3. A refined analysis

We refine the analysis in Case B as follows. Let $\Delta = \sum_{i \in S} c(j, i)$. Our goal is to upper bound

\[
\frac{c(j, S)}{\alpha_j - \sum_{i \in S} \left( \alpha_j - c(j, i) \right)} = \frac{c(j, S)}{\sum_{i \in S} c(j, i) - (s-1)\alpha_j} = \frac{c(j, S)}{\Delta - (s-1)\alpha_j}.
\]

Instead of using the upper bound $c(j, S) \le \alpha_j$ we use the average

\[
c(j, S) \le \frac{1}{s} \sum_{i \in S} c(j, i) = \frac{\Delta}{s}.
\]


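Then it is sufficient to upper bound

\[
\frac{1}{s} \cdot \frac{\Delta}{\Delta - (s-1)\alpha_j}.
\]

The derivative in $\Delta$ of the above function is $\frac{1}{s} \cdot \frac{-(s-1)\alpha_j}{(\Delta - (s-1)\alpha_j)^2} < 0$. Hence the maximum is achieved for the smallest possible value of $\Delta$. Recall that we already showed that $\Delta \ge \frac{(s-1)\,\delta\,\alpha_j}{2}$. Hence a valid upper bound is

\[
\frac{1}{s} \cdot \frac{\frac{(s-1)\delta\alpha_j}{2}}{\frac{(s-1)\delta\alpha_j}{2} - (s-1)\alpha_j} = \frac{1}{s} \cdot \frac{\delta/2}{\delta/2 - 1} \overset{s \ge 2,\, \delta \ge 2}{\le} \frac{\delta/4}{\delta/2 - 1}.
\]

This imposes $\rho \ge \frac{\delta/4}{\delta/2 - 1}$ rather than $\rho \ge \frac{1}{\delta/2 - 1}$ in Case B. Notice that this is an improvement for $\delta < 4$. The best choice of $\delta$ is now obtained by imposing $\frac{\delta/4}{\delta/2 - 1} = (1 + \sqrt{\delta})^2$. This gives

\[
\delta = \frac{1}{2} \left( 2 + \sqrt[3]{3 - 2\sqrt{2}} + \sqrt[3]{3 + 2\sqrt{2}} \right) \simeq 2.1777
\]

and

\[
\rho = \left( 1 + \sqrt{\tfrac{1}{2} \left( 2 + \sqrt[3]{3 - 2\sqrt{2}} + \sqrt[3]{3 + 2\sqrt{2}} \right)} \right)^2 < 6.12903.
\]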
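The closed-form value of $\delta$ can be sanity-checked by plugging it into both sides of the balancing equation. The snippet below is a minimal floating-point check, not a proof; the variable names are illustrative assumptions.

```python
import math

# Closed-form delta from the refined analysis.
root2 = math.sqrt(2.0)
delta_new = 0.5 * (2.0 + (3.0 - 2.0 * root2) ** (1.0 / 3.0) + (3.0 + 2.0 * root2) ** (1.0 / 3.0))

# Both requirements on rho evaluated at this delta.
rho_case_c = (1.0 + math.sqrt(delta_new)) ** 2          # (1 + sqrt(delta))^2
rho_case_b = (delta_new / 4.0) / (delta_new / 2.0 - 1)  # (delta/4) / (delta/2 - 1)

print(delta_new, rho_case_b, rho_case_c)
```

The two printed bounds should coincide up to rounding error, and both should lie just below 6.12903.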

2.4. From facility location to k-means

We can use the refined $\rho := \left( 1 + \sqrt{\tfrac{1}{2} \left( 2 + \sqrt[3]{3 - 2\sqrt{2}} + \sqrt[3]{3 + 2\sqrt{2}} \right)} \right)^2$ approximation for Euclidean Facility Location from the previous section to derive a $\rho + \varepsilon$ approximation for Euclidean k-Means, for any constant $\varepsilon > 0$. Here we follow the approach of [2] with only minor changes. In more detail, the authors consider a variant of the FL algorithm described before, whose approximation factor is $\rho + \varepsilon$ rather than $\rho$. A careful use of this algorithm leads to a solution opening precisely $k$ facilities, which leads to the desired approximation factor. In their analysis the authors use slight modifications of the inequality $(1 + \sqrt{\delta}) \sqrt{\alpha_j} \ge d(j, i) + \sqrt{\delta\, t_i}$ (coming from Case C, which is the same in their and our analysis). The goal is to prove that the modified algorithm is $\rho + \varepsilon$ approximate. Here $\delta$ and $\rho$ are used as parameters. Therefore it is sufficient to replace their values of these parameters with the ones coming from our refined analysis. The rest is identical.

3. Lower bound on the integrality gap

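In this section we describe our lower bound instance for the integrality gap of $LP_{k\text{-Means}}$. It is convenient to consider first the following slightly different relaxation, based on clusters (with $w_C$ as defined in Section 1):

\[
\begin{array}{llll}
\min & \sum_{C \in \mathcal{C}} w_C\, x_C & & (LP'_{k\text{-Means}}) \\
\text{s.t.} & \sum_{C \in \mathcal{C} :\, j \in C} x_C \ge 1 & \forall j \in D & \\
 & \sum_{C \in \mathcal{C}} x_C \le k & & \\
 & x_C \ge 0 & \forall C \in \mathcal{C} &
\end{array}
\]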

Here $\mathcal{C}$ denotes the set of possible clusters, i.e. the possible subsets of points. In an integral solution $x_C = 1$ means that cluster $C$ is part of our solution.

Our instance is on the Euclidean plane, and its points are the (10) vertices of two regular pentagons of side length 1. These pentagons are placed so that any two vertices of distinct pentagons are at a large enough distance $M$ to be fixed later. Here $k = 5$. We remark that our argument can be easily extended to an arbitrary number of points by taking $2h$ such pentagons for any integer $h \ge 1$ so that the pairwise distance between vertices of distinct pentagons is at least $M$, and setting $k = 5h$.

A feasible fractional solution is obtained by setting $x_C = 0.5$ for every $C$ consisting of a pair of consecutive vertices in the same pentagon (so we are considering 10 fractional clusters in total). Obviously this solution is feasible. The cost $w_C$ of each such cluster $C$ is $2(0.5)^2 = 0.5$. Hence the cost of this fractional solution is $10 \cdot 0.5 \cdot 0.5 = \frac{5}{2}$.
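The feasibility and cost of this fractional solution can be verified directly. In the sketch below the pentagon placement (centers at distance $M = 1000$) and all helper names are illustrative assumptions; only the values $x_C = 0.5$, $k = 5$ and the side length 1 come from the text.

```python
import math

def pentagon(cx, cy, side=1.0):
    """Vertices of a regular pentagon with the given side length, centered at (cx, cy)."""
    r = side / (2.0 * math.sin(math.pi / 5.0))  # circumradius
    return [(cx + r * math.cos(2.0 * math.pi * t / 5.0),
             cy + r * math.sin(2.0 * math.pi * t / 5.0)) for t in range(5)]

def cluster_cost(points):
    """w_C: sum of squared distances to the center of mass."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    return sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in points)

M = 1000.0  # assumed separation between the two pentagons
points = pentagon(0.0, 0.0) + pentagon(M, 0.0)
k = 5

# One fractional cluster per pair of consecutive vertices of the same pentagon.
pairs = [(5 * p + t, 5 * p + (t + 1) % 5) for p in range(2) for t in range(5)]
x = {C: 0.5 for C in pairs}

coverage = [sum(val for C, val in x.items() if j in C) for j in range(len(points))]
total_x = sum(x.values())
frac_cost = sum(val * cluster_cost([points[j] for j in C]) for C, val in x.items())
print(coverage, total_x, frac_cost)
```

Every point is covered by exactly two half-weight clusters, the total weight equals $k$, and the objective evaluates to $5/2$ up to rounding.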

Next consider the optimal integral solution, consisting of 5 clusters. Recall that the radius of each pentagon (i.e. the distance from a vertex to its center) is $r = \sqrt{\frac{2}{5 - \sqrt{5}}} \simeq 0.851$ and the distance between two non-consecutive vertices in the same pentagon is $d = \frac{\sqrt{5} + 1}{2} \simeq 1.618$. A solution with two clusters consisting of the vertices of each pentagon costs $10 r^2$. Any cluster involving vertices of distinct pentagons costs at least $M^2 / 2$, hence for $M$ large enough the optimal solution forms clusters only with vertices of the same pentagon. In more detail the optimal solution consists of $x \in \{1, 2, 3, 4\}$ clusters containing the vertices of one pentagon and $5 - x$ clusters containing the vertices of the remaining pentagon.


Let $w(x)$ be the minimum cost associated with one pentagon assuming that we form $x$ clusters with its vertices. Clearly $w(1) = 5 r^2 = \frac{10}{5 - \sqrt{5}}$. Regarding $w(4)$, it is obviously convenient to choose two consecutive vertices in the unique cluster of size 2. Thus $w(4) = 1/2$. For $x \in \{2, 3\}$, we note, as is easy to verify, that clusters with consecutive vertices are less expensive than the alternatives. For $w(2)$, one might form one cluster of size 1 and one of size 4. This would cost $\frac{3(1 + d^2)}{4} = \frac{15 + 3\sqrt{5}}{8}$. Alternatively, one might form one cluster of size 2 and one of size 3, at smaller cost $\frac{1}{2} + \frac{2 + d^2}{3} = \frac{10 + \sqrt{5}}{6}$. Thus $w(2) = \frac{10 + \sqrt{5}}{6}$. For $x = 3$, one might form two clusters of size 1 and one of size 3, or two clusters of size 2 and one of size 1. The associated cost in the two cases is $\frac{2 + d^2}{3} > 1$ and $2 \cdot \frac{1}{2} = 1$, resp. Hence $w(3) = 1$. So the overall cost of the optimal integral solution is $\min\{w(1) + w(4), w(2) + w(3)\} = w(2) + w(3) = \frac{16 + \sqrt{5}}{6}$. Thus the integrality gap of the cluster-based relaxation $LP'_{k\text{-Means}}$ is at least $\frac{16 + \sqrt{5}}{6} \cdot \frac{2}{5} = \frac{16 + \sqrt{5}}{15}$.

Consider next $LP_{k\text{-Means}}$. Here a technical complication comes from the definition of $F$, which is not part of the input instance of k-Means. The same construction as above works if we let $F$ contain the centers of mass of any set of 2 or 3 points. Notice that this is automatically guaranteed by the construction in [11] for $16/\varepsilon^2 \ge 3$. In this case the optimal integral solutions to $LP_{k\text{-Means}}$ and $LP'_{k\text{-Means}}$ are the same in the considered example. Furthermore one obtains a feasible fractional solution to $LP_{k\text{-Means}}$ of cost $5/2$ by setting $y_i = 0.5$ for the centers of mass $i$ of any two consecutive vertices of the same pentagon, and setting $x_{ij} = 0.5$ for each point $j$ and the two closest centers $i$ with positive $y_i$. This concludes the proof of Theorem 2.
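The values $w(1), \dots, w(4)$ and the resulting gap can be confirmed by brute force over all 52 set partitions of one pentagon. The sketch below assumes, as argued in the text, that $M$ is large enough for no cluster to mix the two pentagons; the helper names are hypothetical.

```python
import math

def pentagon(side=1.0):
    """Vertices of a regular pentagon with the given side length, centered at the origin."""
    r = side / (2.0 * math.sin(math.pi / 5.0))
    return [(r * math.cos(2.0 * math.pi * t / 5.0),
             r * math.sin(2.0 * math.pi * t / 5.0)) for t in range(5)]

def cluster_cost(points):
    """w_C: sum of squared distances to the center of mass."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    return sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in points)

def partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` in a block of its own
        for b in range(len(part)):                  # or added to an existing block
            yield part[:b] + [[first] + part[b]] + part[b + 1:]

verts = pentagon()
w = {}  # w[x] = cheapest way to split one pentagon into x clusters
for part in partitions(list(range(5))):
    cost = sum(cluster_cost([verts[v] for v in block]) for block in part)
    w[len(part)] = min(w.get(len(part), math.inf), cost)

# Integral optimum with k = 5: x clusters in one pentagon, 5 - x in the other.
integral = min(w[x] + w[5 - x] for x in range(1, 5))
fractional = 2.5  # cost of the half-integral solution described in the text
gap = integral / fractional
print(w, integral, gap)
```

The enumeration is tiny here, so no pruning is needed; the printed gap should match $\frac{16+\sqrt{5}}{15}$ up to rounding.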

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Work supported in part by the NSF grants 1909972 and CNS-2001096, the SNF excellence grant 200020B_182865/1, the BSF grants 2015782 and 2018687, and the ISF grants 956-15, 2533-17, and 3565-21.

References

[1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, Justin Ward, Better guarantees for k-means and Euclidean k-median by primal-dual algorithms, in: Chris Umans (Ed.), 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, IEEE Computer Society, 2017, pp. 61–72.

[2] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, Justin Ward, Better guarantees for k-means and Euclidean k-median by primal-dual algorithms, SIAM J. Comput. 49 (4) (2020).

[3] David Arthur, Sergei Vassilvitskii, How slow is the k-means method?, in: Nina Amenta, Otfried Cheong (Eds.), Proceedings of the 22nd ACM Symposium on Computational Geometry, Sedona, Arizona, USA, June 5–7, 2006, ACM, 2006, pp. 144–153.

[4] David Arthur, Sergei Vassilvitskii, k-means++: the advantages of careful seeding, in: Nikhil Bansal, Kirk Pruhs, Clifford Stein (Eds.), Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, SIAM, 2007, pp. 1027–1035.

[5] Pranjal Awasthi, Avrim Blum, Or Sheffet, Stability yields a PTAS for k-median and k-means clustering, in: 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, IEEE Computer Society, 2010, pp. 309–318.

[6] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, Ali Kemal Sinop, The hardness of approximation of Euclidean k-means, in: Lars Arge, János Pach (Eds.), 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, in: LIPIcs, vol. 34, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015, pp. 754–767.

[7] Maria-Florina Balcan, Avrim Blum, Anupam Gupta, Approximate clustering without the approximation, in: Claire Mathieu (Ed.), Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, SIAM, 2009, pp. 1068–1077.

[8] Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, Chris Schwiegelshohn, Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma, in: Moses Charikar, Edith Cohen (Eds.), Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, ACM, 2019, pp. 1039–1050.

[9] Vincent Cohen-Addad, C.S. Karthik, Inapproximability of clustering in lp metrics, in: David Zuckerman (Ed.), 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, IEEE Computer Society, 2019, pp. 519–539.

[10] Vincent Cohen-Addad, Philip N. Klein, Claire Mathieu, Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics, SIAM J. Comput. 48 (2) (2019) 644–667.

[11] Wenceslas Fernandez de la Vega, Marek Karpinski, Claire Kenyon, Yuval Rabani, Approximation schemes for clustering problems, in: Lawrence L. Larmore, Michel X. Goemans (Eds.), Proceedings of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003, San Diego, CA, USA, ACM, 2003, pp. 50–58.

[12] Zachary Friggstad, Mohsen Rezapour, Mohammad R. Salavatipour, Local search yields a PTAS for k-means in doubling metrics, SIAM J. Comput. 48 (2) (2019) 452–480.

[13] Kamal Jain, Vijay V. Vazirani, Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation, J. ACM 48 (2) (2001) 274–296.

[14] William B. Johnson, Joram Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemp. Math. 26 (1984) 189–206.

[15] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu, A local search approximation algorithm for k-means clustering, Comput. Geom. 28 (2–3) (2004) 89–112.

[16] Euiwoong Lee, Melanie Schmidt, John Wright, Improved and simplified inapproximability for k-means, Inf. Process. Lett. 120 (2017) 40–43.

[17] Stuart P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28 (2) (1982) 129–136.

[18] Konstantin Makarychev, Yury Makarychev, Ilya P. Razenshteyn, Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering, in: Moses Charikar, Edith Cohen (Eds.), Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, ACM, 2019, pp. 1027–1038.

[19] Jirí Matousek, On approximate geometric k-clustering, Discrete Comput. Geom. 24 (1) (2000) 61–84.

[20] Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy, The effectiveness of Lloyd-type methods for the k-means problem, J. ACM 59 (6) (2012) 28.

[21] Andrea Vattani, k-means requires exponentially many iterations even in the plane, Discrete Comput. Geom. 45 (4) (2011) 596–616.