• Tidak ada hasil yang ditemukan

ي ﺧﻮدﮐﺎرﺳﺎز ﺑﺮاي ﯽﺘﯾ ﺗﻘﻮ ي ﺮﮔﯿ ﺎدﯾ ﺘﻢﯾ اﻟﮕﻮر ﺗﻮﺳﻌﻪ ﻣﺪ

N/A
N/A
Protected

Academic year: 2024

Membagikan "ي ﺧﻮدﮐﺎرﺳﺎز ﺑﺮاي ﯽﺘﯾ ﺗﻘﻮ ي ﺮﮔﯿ ﺎدﯾ ﺘﻢﯾ اﻟﮕﻮر ﺗﻮﺳﻌﻪ ﻣﺪ"

Copied!
10
0
0

Teks penuh

(1)

ﯽﺿﺎﯾر لﺪﻣ ﻪﻌﺳﻮﺗ رﻮﮕﻟا

ﻢﺘ دﺎ ﯿﮔ ي ﻮﻘﺗ ﯾﺘ ياﺮﺑ زﺎﺳرﺎﮐدﻮﺧ ي

بآ ﺪﻨﺑ يﺎﻫ ﯽﯾﻮﺸﮐ رد

لﺎﻧﺎﮐ ﺎﻫ

يدرﻮﻫﺎﺷ ﻢﻇﺎﮐ -1

ﻢﻌﻨﻣ داﻮﺟﺪﻤﺤﻣ

2

*

:ﺖﻓﺎﯾرد ﺦﯾرﺎﺗ 28

/ 05 / 1392

ﺗ :شﺮﯾﺬﭘ ﺦﯾرﺎ 31

/ 03 / 1394

هﺪﯿﮑﭼ

ﻪﮑﺒﺷ دﺮﮑﻠﻤﻋ دﻮﺒﻬﺑ ،بآ زا ﻪﻨﯿﻬﺑ هدﺎﻔﺘﺳا رﻮﻈﻨﻣ ﻪﺑ و يزروﺎﺸﮐ بآ ﺶﺨﺑ رد ﺖﯾﺮﯾﺪﻣ ﻒﻌﺿ و بآ دﻮﺒﻤﮐ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ هزوﺮﻣا يروﺮـﺿ يرﺎـﯿﺑآ يﺎﻫ

ﻪﻧﺎﻣﺎﺳ يﺮﯿﮔرﺎﮑﺑ ،ﺮﯿﺧا يﺎﻬﻟﺎﺳ رد .ﺖﺳا ﭘ رﺎﮐدﻮﺧ لﺮﺘﻨﮐ يﺎﻫ

ﯿ ﻪﮑﺒﺷ دﺮﮑﻠﻤﻋ دﻮﺒﻬﺑ رﻮﻈﻨﻣ ﻪﺑ ﻪﺘﻓﺮﺸ ﻣ يرﺎﯿﺑآ يﺎﻫ

ياﺮﺑ .ﺖﺳا ﻪﺘﻓﺮﮔراﺮﻗ نﺎﻘﻘﺤﻣ ﻪﺟﻮﺗ درﻮ

ﻪﻧﺎﻣﺎﺳ ﻦﯾا زا هدﺎﻔﺘﺳا ﻢﺘﺴﯿﺳ ﯽﺿﺎﯾر لﺪﻣ ﻪﯿﻬﺗ ،ﺎﻫ

و رﺎﮐدﻮﺧ لﺮﺘﻨﮐ يﺎﻫ هزﺎﺳ

لﺪﻣ ﺎﺑ ﯽﻘﯿﻔﻠﺗ ترﻮﺼﺑ طﻮﺑﺮﻣ يﺎﻫ .ﺖـﺳا يروﺮـﺿ ﯽﮑﯿﻣﺎﻨﯾدورﺪﯿﻫ يﺎﻫ

ا زا هدﺎﻔﺘﺳا ﺎﺑ ﯽﯾﻮﺸﮐ ﻪﭽﯾرد ﺖﺳدﻻﺎﺑ رﺎﮐدﻮﺧ لﺮﺘﻨﮐ ﻢﺘﺴﯿﺳ ﯽﺿﺎﯾر لﺪﻣ ﻪﻌﺳﻮﺗ ،ﻖﯿﻘﺤﺗ ﻦﯾا ﯽﻠﺻا فﺪﻫ ﮏـﯾ ترﻮـﺼﺑ ﯽﺘﯾﻮـﻘﺗ يﺮﯿﮔدﺎـﯾ ﻢﺘﯾرﻮـﮕﻟ

ﯽﮑﯿﻣﺎﻨﯾدورﺪﯿﻫ لﺪﻣ رد ﻪﻣﺎﻧﺮﺑﺮﯾز ﻪﯿﻟوا ﯽﺑد ﺎﺑ يدورو نﺎﯾﺮﺟ ياﺮﺑ ﯽﺸﻫﺎﮐ و ﯽﺸﯾاﺰﻓا يﻮﯾرﺎﻨﺳ ود .ﺖﺳاICSS

25 و ﺪـﺷ ﻪـﺘﻓﺮﮔ ﺮﻈﻧ رد ﻪﯿﻧﺎﺛ رد ﺮﺘﯿﻟ

ﺑ ﻢﺘﺴﯿﺳ ﯽﯾﻮﮕﺨﺳﺎﭘ نﺎﻣز و ﺎﻫﺎﻄﺧ ﻂﺳﻮﺘﻣ ﻦﯿﮕﻧﺎﯿﻣ ،ﻖﻠﻄﻣ يﺎﻄﺧ ﺮﺜﮐاﺪﺣ يﺎﻬﺼﺧﺎﺷ .ﺖﻓﺮﮔ مﺎﺠﻧا يزﺎﺳ ﻪﯿﺒﺷ لﺮـﺘﻨﮐ ﻢﺘـﺴﯿﺳ دﺮـﮑﻠﻤﻋ ﯽﺑﺎﯾزرا ياﺮ

ﯽﻣ نﺎﺸﻧ ﻞﺻﺎﺣ ﺞﯾﺎﺘﻧ .ﺪﻣآ ﺖﺳﺪﺑ ﺮﻔﺻ ﺮﺑاﺮﺑ نﺎﯾﺮﺟ ﺶﻫﺎﮐ يﻮﯾرﺎﻨﺳ ياﺮﺑ ﺎﻬﺼﺧﺎﺷ ﻦﯾا ﺮﯾدﺎﻘﻣ .ﺪﺷ هداد ﻪﻌﺳﻮﺗ ﯽﺒﺳﺎﻨﻣ دﺮﮑﻠﻤﻋ ﻪﺘﻓﺎﯾ ﻪﻌﺳﻮﺗ لﺪﻣ ﺪﻫد

ﯽﻣ لﺮﺘﻨﮐ ﯽﺑﻮﺨﺑ ار بآ ﺢﻄﺳ و ﻪﺘﺷاد ﯽﻣ دﺎﻬﻨﺸﯿﭘ يرﺎﯿﺑآ يﺎﻬﻟﺎﻧﺎﮐ ياﺮﺑ نآ ﻪﻌﺳﻮﺗ ور ﻦﯾا زا .ﺪﻨﮐ

.دﻮﺷ

هژاو :يﺪﯿﻠﮐ يﺎﻫ ﻖﯿﻔﻠﺗ

،لﺪﻣ ﯾ ،دﺮﮑﻠﻤﻋ دﻮﺒﻬﺑ ﮔدﺎ

ﯿﺮ ي

ﻪﻣﺪﻘﻣ 1 2

ﻪﺑ ﻞﯿﻟد ﻦﯿﯾﺎﭘ ندﻮﺑ دﺮـﮑﻠﻤﻋ ﻪﮑﺒـﺷ ،يرﺎـﯿﺑآ يﺎـﻫ هﺪـﯾا

دﻮـﺒﻬﺑ

يﺎﻬﺷور ﻞﯾﻮﺤﺗ و ﻊﯾزﻮﺗ و بآ ﺶﯾاﺰﻓا فﺎﻄﻌﻧا يﺮﯾﺬﭘ حﺮـﻄﻣ عﻮﺿﻮﻣ

ﻦﯿﻘﻘﺤﻣ ﻦﯿﺑ رد ﺖﺳا

. دﻮـﺒﻬﺑ ياﺮـﺑ ﻦـﮑﻤﻣ يﺎـﻬﻫار ﻦﯾﺮﺘﻬﺑ زا ﯽﮑﯾ

فﺎـﻄﻌﻧا ﺶﯾاﺰـﻓا و دﺮﮑﻠﻤﻋ ﺬـﭘ

،زﺎـﺑور يﺎـﻬﻟﺎﻧﺎﮐ رد بآ ﻞـﯾﻮﺤﺗ يﺮﯾ

ﺎــﻬﻟﺎﻧﺎﮐ رد بآ ﻞـﯾﻮﺤﺗ يﺎـﻫ ﻢﺘــﺴﯿﺳ يزﺎـﺳرﺎﮐدﻮﺧ ﺎـﯾ نﻮﯿـﺳﺎﻣﻮﺗا ﯽﻣ ﻪﻧﺎﻣﺎـﺳ ﺎـﺑ رﺎـﺑ ﻦﯿـﻟوا يرﺎـﯿﺑآ يﺎﻬﻟﺎﻧﺎﮐ يزﺎﺳرﺎﮐدﻮﺧ .ﺪﺷﺎﺑ يﺎـﻫ

ﯽﮑﯿﻧﺎﮑﻣوﺮﺘﮑﻟا Littleman

و Colvin ) 6(

ﻪـﭽﯾرد ﺎـﺑ ﺲﭙﺳ و يﺎـﻫ

ﺮﯿﺧا يﺎﻬﻟﺎﺳ رد .ﺖﻓﺎﯾ ﻪﻌﺳﻮﺗ ﻮﯾوآ و ﺲﯾوآ ،ﻞﯿﻣآ ﯽﮑﯿﻟورﺪﯿﻫ رﺎﮐدﻮﺧ لﺪﻣ ﻪﻌﺳﻮﺗ ﺎﺑ ﺮﺗ ،ﯽﺿﺎﯾر يﺎﻫ

لﺪﻣ ﺐﯿﮐ ﻪﯿﺒﺷ يﺎﻫ لﺎﻧﺎﮐ نﺎﯾﺮﺟ زﺎﺳ ﺎﻫ

ﻢﺘﯾرﻮﮕﻟا ﺎﺑ .ﺖﺳا ﻪﺘﻓﺮﮔ راﺮﻗ ﻦﯿﻘﻘﺤﻣ ﻪﺟﻮﺗ درﻮﻣ ،ﻪﺘﻓﺮﺸﯿﭘ لﺮﺘﻨﮐ يﺎﻫ

ﻦﯿﻘﻘﺤﻣ زا ﯽﺧﺮﺑ و هداد راﺮﻗ ﯽﺳرﺮﺑ درﻮﻣ ار دﻮﺟﻮﻣ يﺎﻬﻤﺘﯾرﻮﮕﻟا

راﺮﻗ ﺚﺤﺑ درﻮﻣ ار ﺎﻬﻤﺘﯾرﻮﮕﻟا ﻦﯾا زا ﮏﯾ ﺮﻫ ﺐﯾﺎﻌﻣ و ﺎﯾاﺰﻣ ،تﺎﺼﺨﺸﻣ هداد ) ﺪﻧا 7 و 10 ﻪﻧﻮﻤﻧ .(

ا زا ﯽﯾﺎﻫ ﻢﺘﯾرﻮـﮕﻟا ﻦﯾ هژوﺮـﭘ رد ﻪـﮐ ﺎـﻫ

يﺎـﻫ

1 و 2 - ﻪﺘﺧﻮﻣآ ﺶﻧاد رﺎﯿﺸﻧاد و يﺮﺘﮐد

هوﺮﮔ ﯽﺳﺪﻨﻬﻣ هزﺎـﺳ يﺎـﻫ ،ﯽـﺑآ هﺎﮕـﺸﻧاد

ﺖﯿﺑﺮﺗ سرﺪﻣ

*) - :لﻮﺌﺴﻣ هﺪﻨﺴﯾﻮﻧ

Email: [email protected] (

ﻪﺘﻓﺮﮔ راﺮﻗ هدﺎﻔﺘﺳا درﻮﻣ يرﺎﯿﺑآ :زا ﺪـﻨﺗرﺎﺒﻋ ﺪـﻧا

)PI 2 ،(

Bival ) 12 ،(

CARRD ) 3

،(

Fuzzy ) 16 ( ، )MPC 15 ،(

Adaptive control

) 13 .(

ﯽﻣ هﺪﻫﺎﺸﻣ ﻪﮐ رﻮﻄﻧﺎﻤﻫ دﻮﺷ

هﺪﯾﺪﭘ ﺮﺘﻬﺑ لﺮﺘﻨﮐ ﯽﮑﯿﻟورﺪﯿﻫ يﺎﻫ

ﺘﻓﺮﮔ راﺮﻗ ﯽﺳرﺮﺑ درﻮﻣ ﻒﻠﺘﺨﻣ ﻦﯿﻘﻘﺤﻣ ﻂﺳﻮﺗ نﺎﻨﭽﻤﻫ يﺎﻬﮑﯿﻨﮑﺗ و ﻪ

ﯽﻣ ﻪﻌﺳﻮﺗ لﺎﺣ رد ﺮﺘﻬﺑ لﺮﺘﻨﮐ يﺮﯿﮔدﺎﯾ ﻢﺘﯾرﻮﮕﻟا ﺎﺘﺳار ﻦﯿﻤﻫ رد .ﺪﺷﺎﺑ

.ﺖﺳا ﻪﺘﻓﺮﮔ راﺮﻗ ﻪﺟﻮﺗ درﻮﻣ ﻖﯿﻘﺤﺗ ﻦﯾا رد ﯽﺘﯾﻮﻘﺗ

ﯽﻋﻮﻨــﺼﻣ شﻮـﻫ يﺎـﻬﻤﺘﯾرﻮﮕﻟا زا ﯽـﮑﯾ

ﻢﺘﯾرﻮــﮕﻟا لﺮــﺘﻨﮐ يﺎـﻫ

ﯽﻣ ﻪﺘﻓﺮﺸﯿﭘ ﯽﻣ ﻪﮐ ﺪﺷﺎﺑ

هزﺎﺳ رﺎﮐدﻮﺧ لﺮﺘﻨﮐ رﻮﻈﻨﻣ ﻪﺑ ﺎﻬﻧآ زا ناﻮﺗ ﺎـﻫ

يرﺎﯿﺑآ يﺎﻬﻟﺎﻧﺎﮐ رد .دﺮـﮐ هدﺎﻔﺘﺳا

ﻦﺘﺧﺎـﺳ ﺶـﻧاد ،ﯽﻋﻮﻨـﺼﻣ شﻮـﻫ

ﻦﯿﺷﺎﻣ ﻪﻣﺎﻧﺮﺑ ﺎﯾ ﺎﻫ ﺖـﺳا ﺪﻨﻤﺷﻮﻫ يﺎﻫ

ﻪﺧﺎـﺷ زا ﯽـﮑﯾ . و ﻢـﻬﻣ يﺎـﻫ

ﯽﺘﯾﻮﻘﺗ يﺮﯿﮔدﺎﯾ ،ﯽﻋﻮﻨﺼﻣ شﻮﻫ دﺮﺑرﺎﮐﺮﭘ (RL)3

ﯽـﻣ هﻮـﺤﻧ .ﺪـﺷﺎﺑ

ﻪﯿﺒﺷ لﻮﻃ رد ﺖﻗد ﺶﯾاﺰﻓا ﻪﺠﯿﺘﻧ رد و ﻢﺘﯾرﻮﮕﻟا ﻦﯾا دﺮﮑﻠﻤﻋ ﺎﺑ يزﺎﺳ

ﺪﯾدﺮﮔ ﯽﺳرﺮﺑ شور ﻦﯾا زا هدﺎﻔﺘﺳا )5

ﺪﻧور ﺲﭙﺳ و ( ﻢﺘﯾرﻮـﮕﻟا ﻦﯾا ﺎﻤﻧ

) ﺪــﺷ ﻪــﺋارا ﻦﯿــﻘﻘﺤﻣ هدﺎﻔﺘــﺳا رﻮــﻈﻨﻣ ﻪــﺑ 4

ﻢﺘﯾرﻮــﮕﻟا .(

ياﺮــﺑRL

هﺮﻬﺑ درﻮـﻣ هزﺎﺑ ود ﺎﺑ زﺎﺑور لﺎﻧﺎﮐ ﮏﯾ رد ﯽﺋﻮﺸﮐ ﻪﭽﯾرد هزﺎﺳ يرادﺮﺑ

) ﺖﻓﺮﮔ راﺮﻗ هدﺎﻔﺘﺳا 9

لﺪـﻣ ﻪﻌـﺳﻮﺗ ﻖـﯿﻘﺤﺗ ﻦﯾا زا فﺪﻫ .(

ﯽـﺿﺎﯾر

ﻢﺘﺴﯿﺳ لﺮﺘﻨﮐ رﺎﮐدﻮﺧ ﺖﺳدﻻﺎﺑ ﻪﭽﯾرد ﯽﯾﻮﺸﮐ ﺎﺑ هدﺎﻔﺘﺳا زا ﻢﺘﯾرﻮﮕﻟا

3- Reinforcement Learning

(2)

ﺎﯾ لﺪﻣ ﺎﺑ ﻖﯿﻔﻠﺗ ترﻮﺼﺑ و يﺮﯿﮔد ﯽﮑﯿﻣﺎﻨﯾدورﺪﯿﻫ

ﯽﻣICSS .ﺪﺷﺎﺑ

ﻢﺘﯾرﻮﮕﻟا ﯽﻧﺎﺒﻣ ﯽﻓﺮﻌﻣ RL

ﻦﯿﺷﺎﻣ يﺮﯿﮔدﺎﯾ رد

، ﻪﻣﺎﻧﺮﺑ فﺪﻫ ﻪﻧﺎﯾار يﺰﯾر

ﻪـﮐ ﺖﺳا يﻮﺤﻨﺑ ﺎﻫ

ﺪﻨﮐ هدﺎﻔﺘﺳا هﺪﺷ هداد ﻪﻠﺌﺴﻣ ﮏﯾ ﻞﺣ ياﺮﺑ ﻪﺘﺷﺬﮔ تﺎﯿﺑﺮﺠﺗ زا ﺪﻧاﻮﺘﺑ .

ﺎﯾ د ﺮﯿﮔ ي ﯽﺘﯾﻮﻘﺗ رد ﺎﯿﺑ ﮏﯾ ن ﺧﺎﻨﺷ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ ﯽﻨﻌﯾ ﯽﻠﮐ ت

ﻂﯿﺤﻣ و ﺮﺑ

ا ﺎﺳ س ﻌﺗ ﺞﯾﺎﺘﻧ ـ ﻂﯿﺤﻣ ﺎﺑ ﻪﮐﯽﺗﻼﻣﺎ دﻮﺟو

اد ﻪﺘﺷ و ﻮﺳ د ﺎﻫ ز و ﻪﮐﯽﯾﺎﻬﻧﺎﯾ

رد ﻪﺠﯿﺘﻧ ا ﺎﺠﻧ م ﺶﻨﮐ يﺎﻫ ﺖﺳﺪﺑ ﻒﻠﺘﺨﻣ آ

ﺖﺳا هﺪﻣ

، ذﺎـﺨﺗا يدﺮﺒﻫار

دﻮﺷ ﻪﺑ ﻞﻤﻋ ﺎﺑ ﻪﮐ رد نآ

ﺪﻣ ﺪﻨﻠﺑ ت

، ﻪﻨﯿﺸﯿﺑ ﺖﯿﺑﻮﻠﻄﻣ ) دﻮﺷ

14 .(

رﻮﻄﺑ اﺰﺟا ﯽﻠﮐ ء ﯽﻠﺻا ﻞﻣﺎﺷ ﯽﺘﯾﻮﻘﺗ يﺮﯿﮔدﺎﯾ ﻞﺋﺎﺴﻣ ﺞﻨﭘ

ﺮـﺼﻨﻋ

ﻞﻣﺎﻋ ﻂﯿﺤﻣ ،1

ﯿﺳ ،2

ﺖﺳﺎ شادﺎﭘ ﻊﺑﺎﺗ ،3

لﺪﻣ و4

.ﺪﻨﺘﺴﻫ ﻞـﻣﺎﻋ نﺎـﻤﻫ

ﻪﮐ ﺖﺳا هﺪﻧﺮﯿﮔدﺎﯾ ﻪﻋﻮﻤﺠﻣ ﻖﯾﺮﻃزا

ﺶﯾﺎﻬﮐﺮﺤﻣ و ﺎﻫﺮﮕﺴﺣزا يا ﯽﻣ

يﺎﻬﺸﻨﮐ ﻞﻣﺎﻋ ﻪﮐ ﺖﺳا ﯽﯾﺎﻀﻓ ﻂﯿﺤﻣ .ﺪﺷﺎﺑ ﻞﻣﺎﻌﺗ رد ﻂﯿﺤﻣ ﺎﺑ ﺪﻧاﻮﺗ

5

هداد مﺎﺠﻧا نآ رد ار ﻪﻃﻮﺑﺮﻣ ﯽﻣ ﺮﯿﺛﺎﺗ نآ يور و

دراﺬﮔ ﺳ . ﯿ ﺖﺳﺎ

، ﻪﺘﺴﻫ

ﺖﺳا ﯽﺘﯾﻮﻘﺗ هﺪﻧﺮﯿﮔدﺎﯾ ﻞﻣﺎﻋ ﯽﻠﺻا ﻪﮐ

ﯽﻣ ﺺﺨﺸﻣ ﻪﭼ ﻞﻣﺎﻋ ﻪﮐ ﺪﻨﮐ

ﺎﺣﺮﻫ رد ار ﯽﺸﻨﮐ ﺖﻟ

ﺪﯾﺎﺑ مﺎﺠﻧا ﺪﻫد . ﺶﻨﮐ ﺖﯿﺑﻮﻠﻄﻣ ناﺰﯿﻣ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ

ﯽﻣ هداد نآ ﻪﺑ ﯽﺷادﺎﭘ ،هﺪﺷ مﺎﺠﻧا ﻪﮐ دﻮﺷ

راﺪـﻘﻣ نآ ﺮﻃ زا ـﯾ ﻖ ﻊﺑﺎـﺗ

ﻣ ﺺﺨﺸﻣ شادﺎﭘ ﯽ

دﻮﺷ ﻪﯿﺒﺷ ار ﻂﯿﺤﻣ رﺎﺘﻓر لﺪﻣ . ﺖﻟﺎﺣ و هدﺮﮐ يزﺎﺳ

ﯿﺳ يﺪﻌﺑ ﻢﺘﺴ ﺶﯿﭘ ار ﯽﻣ ﯽﻨﯿﺑ ﺪﻨﮐ )1 (.

رد د ﻖﯿﻘﺤﺗ ﻦﯾا ﻪﮑﺒﺷ ر

هزﺎـﺳ ،ﻞـﻣﺎﻋ زا رﻮﻈﻨﻣ يرﺎﯿﺑآ يﺎﻫ يﺎـﻫ

بآ ﯽﻣ ﺪﻨﺑ هداد دﺎﯾ ﻒﻠﺘﺨﻣ ﻂﯾاﺮﺷ ياﺮﺑ ﺎﻬﻧآ ﯽﮔﺪﺷزﺎﺑ ناﺰﯿﻣ ﻪﮐ ﺪﺷﺎﺑ

ﯽﻣ .دﻮﺷ ﺤﻣ زا رﻮﻈﻨﻣ ﯿ

ﻂ ، لﺎﻧﺎﮐ ﻪﮑﺒﺷ ﺎﯾ ﻂﯾاﺮﺷ ﻪﮐ ﺖﺳا ﺎﻬﻟﺎﻧﺎﮐ زا يا

ﺖﻟﺎﺣ لدﺎﻌﻣ ﻪﮐ) نآ نورد نﺎﯾﺮﺟ ﯽﮑﯿﻟورﺪﯿﻫ ﯽﺘﯾﻮـﻘﺗ يﺮﯿﮔدﺎـﯾ رد6

ﻪﺑ يدورو ناﻮﻨﻋ ﻪﺑ ،(ﺖﺳا رﻮﮕﻟا

ﯾ ﻢﺘ ﯽﻣ هدادRL ﺮـﺑ يﺮﯿﮔدﺎﯾ و دﻮﺷ

ﯽﻣ مﺎﺠﻧا نآ تاﺮﯿﯿﻐﺗ سﺎﺳا دﺮﯿﮔ

. ناﻮﻨﻋ ﻪﺑ بآ ﻖﻤﻋ ،ﻖﯿﻘﺤﺗ ﻦﯾا رد

ﯽﻣ ﺖﻟﺎﺣ ﯽﻣ ﯽﮑﯿﻣﺎﻨﯾدورﺪﯿﻫ لﺪﻣ ،لﺪﻣ زا رﻮﻈﻨﻣ .ﺪﺷﺎﺑ رد ﻪـﮐ ﺪـﺷﺎﺑ

ﻖﯿﻘﺤﺗ ﻦﯾا ﺖـﺳا ﻪـﺘﻓﺮﮔ راﺮـﻗ هدﺎﻔﺘـﺳا درﻮﻣICSS

ناﺰـﯿﻣ ،ﻖـﻤﻋ .

ﯽﻘﻠﺗ ﻮﻀﻋ ﮏﯾ ناﻮﻨﻋ ﻪﺑ نآ ﺎﺑ ﻂﺒﺗﺮﻣ شادﺎﭘ و ﯽﮔﺪﺷزﺎﺑ ﯽﻣ

ﻪﮐ دﻮﺷ

ﺮﯾﺎﻔﯿﺳﻼﮐ ﮏﯾ ﯽﻣ هﺪﯿﻣﺎﻧ7

زا ﺲـﭘ ﺎﻫﺮﯾﺎﻔﯿـﺳﻼﮐ زا يا ﻪﻋﻮﻤﺠﻣ .دﻮﺷ

ﯽـﻣ ﻞﯿﮑـﺸﺗ ار ﯽﺘﯾﻮـﻘﺗ يﺮﯿﮔدﺎـﯾ ﻢﺘﯾرﻮـﮕﻟا ﺖﺳﺎﯿﺳ ،يﺮﯿﮔدﺎﯾ ﺪـﻫد

نآ ﺎﺑ ﻂﺒﺗﺮﻣ ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ ناﺰﯿﻣ ،ﺺﺨﺸﻣ ﻖﻤﻋ ﺮﻫ ياﺮﺑ ﻪﮑﯾرﻮﻄﺑ ﯽـﻣ فﺪـﻫ راﺪـﻘﻣ رد بآ ﻖﻤﻋ ﺖﯿﺒﺜﺗ ﺐﺟﻮﻣ ﻪﮐ هﺮـﯿﺧذ نآ رد دﻮـﺷ

ﯽﻣ .ددﺮﮔ ﻞﺣاﺮﻣ ﺠﻧا ﻞﮑﺷ رد ﯽﺘﯾﻮﻘﺗ يﺮﯿﮔدﺎﯾ رد تﺎﯿﻠﻤﻋ مﺎ 1

هداد نﺎـﺸﻧ

1- Agent

2- Environment 3- Policy

4- Reward function 5- Actions 6- State 7- Classifier ا .ﺖــﺳا هﺪــﺷ

ــﯾ ﻦ ﺗﺮﺗ ﻪــﺑ ﻞــﺣاﺮﻣ ــﯿ

ﺐ ﺖــﻟﺎﺣ هﺪﻫﺎــﺸﻣ ﻞﻣﺎــﺷ St

،

ﻢﯿﻤﺼﺗ ﺶﻨﮐ مﺎﺠﻧا ياﺮﺑ ﻞﻣﺎﻋ يﺮﯿﮔ at

ﺶﻨﮐ مﺎﺠﻧا ، at

، ﻪﯿﺒﺷ يزﺎـﺳ

و ﻢﺘﺴﯿﺳ ) ﺪﯾﺪﺟ ﺖﻟﺎﺣ هﺪﻫﺎﺸﻣ +1

St

( ، شادﺎـﭘ صﺎﺼﺘﺧا +1

Rt

ياﺮـﺑ

ﺶﻨﮐ at

ﻣ تﺎﯿﺑﺮﺠﺗ زا يﺮﯿﮔدﺎﯾ و ﯽ

ﺪـﺷﺎﺑ ﺎـﺗ ﻪـﮐ ـﺳر ﯿﺪ ن طﺮـﺷ ﻪـﺑ

ﺎﭘﯾ راﺪ ي اﯾ ﻦ ﻣ راﺮﮑﺗ ﺪﻧور ﯽ

) دﻮﺷ 14 (.

ﻪﮑﺒﺷ رد هزﺎﺳ رد ،يرﺎﯿﺑآ يﺎﻫ بآ يﺎﻫ

ﺖـﻬﺟ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ اﺪﺘﺑا ﺪﻨﺑ

ﻦﯿﯾﺎﭘ ﺎﯾ ﺖﺳدﻻﺎﺑ) لﺮﺘﻨﮐ ﯽﻣ هﺪﻫﺎﺸﻣ نﺎﯾﺮﺟ ﻖﻤﻋ (ﺖﺳد

) دﻮﺷ St

ﺎﺑ و (

ﻞـﻣﺎﻋ ،فﺪـﻫ ﻖـﻤﻋ زا نآ فاﺮﺤﻧا ناﺰﯿﻣ و ﻖﻤﻋ ﻦﯾا راﺪﻘﻣ ﻪﺑ ﻪﺟﻮﺗ ار ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ رد ﺮﯿﯿﻐﺗ ناﺰﯿﻣ ﯽﻣ لﺎﻤﻋا اﺮﻧآ و ﻪﺒﺳﺎﺤﻣ

) ﺪـﻨﮐ at

.(

ﻪﯿﺒﺷ ﯽﻣ مﺎﺠﻧا يزﺎﺳ و دﻮﺷ

ﻖﻤﻋ ﯽـﻣ هﺪﻫﺎـﺸﻣ ﺪﯾﺪﺟ دﻮـﺷ

) +1 St

و (

ﯽﮔﺪﺷزﺎﺑ ﻪﺑ ﺲﭙﺳ at

ﯽﻣ هداد شادﺎﭘ ﻪـﮐ ﯽﻧﺎـﻣز ﺎﺗ ﺪﻨﯾآﺮﻓ ﻦﯾا و دﻮﺷ

ﯽﻣ ﻪﻣادا دﻮﺷ ﺖﯿﺒﺜﺗ فﺪﻫ ﻖﻤﻋ رد نﺎﯾﺮﺟ ﻖﻤﻋ .ﺪﺑﺎﯾ

(14)ﯽﺘﯾﻮﻘﺗ يﺮﯿﮔدﺎﯾﺪﻨﯾاﺮﻓ -1ﻞﮑﺷ Figure 1- Reinforcement Learning procees [14]

ﻫ لﺪﻣ ﯿ دورﺪ ﻣﺎﻨ ﯿ ICSS

لﺪــﻣ ﻪﯿﺒــﺷ ﯽﯾﺎــﻧاﻮﺗICSS

ﻢــﻬﻣ تﺎــﺼﺨﺸﻣ ﯽﻣﺎــﻤﺗ يزﺎــﺳ

لﺎﺳ رد ﺰﻧﺎﻣ ﻂﺳﻮﺗ و دراد ار يرﺎﯿﺑآ يﺎﻬﻟﺎﻧﺎﮐرد زﺎﯿﻧ درﻮﻣ و ﯽﮑﯿﻟورﺪﯿﻫ 1990 ﺪﯾدﺮﮔ ﻪﯿﻬﺗ .

ﯽﻣ لﺪﻣ ﻦﯾا زا هدﺎﻔﺘﺳا ﺎﺑ هﺮـﻬﺑ ﻪﺑ ناﻮﺗ

زا يرادﺮـﺑ

هزﺎﺳ ﻪﮑﺒﺷ رد ﺎﻫ ماﺪﻗا يرﺎﯿﺑآ يﺎﻫ

ﺎـﺑ لﺪـﻣ ﻦﯾا دﺮﮑﻠﻤﻋ ﺖﺤﺻ .دﺮﮐ

هداد زا هدﺎﻔﺘﺳا ﻪﺘﻓﺮﮔراﺮﻗ ﺪﯿﯾﺎﺗ و ﯽﺳرﺮﺑ درﻮﻣ ﯽﻫﺎﮕﺸﯾﺎﻣزآ ﻖﯿﻗد يﺎﻫ

ﺖـﺳا هﺪﯿﺳر ادﺎﻧﺎﮐ ناﺮﻤﻋ ﻦﯿﺳﺪﻨﻬﻣ ﻦﻤﺠﻧا ﺪﯿﯾﺎﺗ ﻪﺑ و ﺖﺳا )

11 (.

رد

ﻪﯿﺒﺷ نﺎﮑﻣا لﺪﻣ ﻦﯾا هزﺎﺳ زا ﯽﻬﺟﻮﺗ ﻞﺑﺎﻗ ﻒﯿﻃ يزﺎﺳ

دﻮﺟو ﯽﺑآ يﺎﻫ

ﺮﮔﺮﻈﻧ رد يزﺮﻣ طﺮﺷ ﮏﯾ ترﻮﺻ ﻪﺑ ماﺪﮐﺮﻫ ﻪﮐ دراد ﯽﻣ ﻪﺘﻓ

ﺪﻧﻮﺷ . ود

لﺪﻣ زا ﻢﻬﻣ ﻪﻣﺎﻧﺮﺑﺮﯾز ﺪﻧرادرﻮﺧﺮﺑ ﺖﯿﻤﻫا زا ﻖﯿﻘﺤﺗ ﻦﯾا رد ﻪﮐICSS

ﻪﻣﺎﻧﺮﺑﺮﯾز يﺎﻫ WRRST (WRite ReStarT)

و RDRST (ReaD

ReStarT) ﯽﻣ

ﻪـﻣﺎﻧﺮﺑﺮﯾز .ﺪـﺷﺎﺑ WRRST

ﻪﯿﺒـﺷ تﺎـﻋﻼﻃا ﯽﻣﺎـﻤﺗ

نﺎـﻣز رد هﺪـﺷ يزﺎﺳ ﻪـﻣﺎﻧﺮﺑﺮﯾز و هﺮـﯿﺧذ ﯽـﻨﺘﻣ ﻞـﯾﺎﻓ ﮏـﯾ رد ارt

RDRST تﺎﻋﻼﻃا ﻦﯾا ﻪﯿﺒﺷ ﻪﻣادا ياﺮﺑ ار

ﯽﻣ ﯽﻧاﻮﺧاﺮﻓ يزﺎﺳ .ﺪﻨﮐ

لﺮﺘﻨﮐ ﻢﺘﺴﯿﺳ ﯽﺿﺎﯾر لﺪﻣ RL

ﻢﺘﯾرﻮﮕﻟا ﯽﺿﺎﯾر لﺪﻣ ﻖﯿﻘﺤﺗ ﻦﯾا رد ﺎـﺑ و ﺪـﺷ هداد ﻪﻌـﺳﻮﺗ ،RL

ﯽﮑﯿﻣﺎﻨﯾدورﺪﯿﻫ لﺪﻣ رد ﻢﺘﯾرﻮـﮕﻟا ﻦﯾا يﺎﻤﻧﺪﻧور .ﺪﯾدﺮﮔ ﻖﯿﻔﻠﺗICSS

ﻞﮑﺷ 2 ردﺎﮐ .ﺖﺳا هﺪﺷ هداد نﺎﺸﻧ route

هﺪـﯾد ﻞﮑـﺷ ﻦـﯾا رد ﻪـﮐ

(3)

ﯽﻣ ﻪﻣﺎﻧﺮﺑ ،دﻮﺷ route1

ﯽﻣ يزﺎﺳ ﻪﯿﺒﺷ ار رﺎﮔﺪﻧﺎﻣﺮﯿﻏ نﺎﯾﺮﺟ ﻪﮐ ﺪﺷﺎﺑ

ﯽﻣ ﻪـﻣﺎﻧﺮﺑﺮﯾز ﻦﯾﺪﻨﭼ ﻞﻣﺎﺷ دﻮﺧ و ﺪﻨﮐ ﯽـﻣ2

ﻪـﻣﺎﻧﺮﺑ ﻞـﺧاد رد .ﺪـﺷﺎﺑ

route ﻪﻣﺎﻧﺮﺑﺮﯾز ﺪﻨﻧﺎﻣ ﯽﯾﺎﻫ ،pool

estimat و action ﻪـﮐ ﺪﻧراد راﺮﻗ

ﻪﻣﺎﻧﺮﺑﺮﯾز action ﻪﻣﺎﻧﺮﺑﺮﯾز زا ﯽﮑﯾ ﻖـﯿﻘﺤﺗ ﻦـﯾا رد ﻪـﮐ ﺖـﺳا ﯽﯾﺎﻫ

ﺮﺑﺮﯾز ﺮﯾﺎﺳ و هﺪﺷ هداد ﻪﻌﺳﻮﺗ ﻪﻣﺎﻧ

ﺰﻧﺎﻣ ﻂﺳﻮﺗ ﺎﻫ ﺖﺳا هﺪﺷ هداد ﻪﻌﺳﻮﺗ

ﻪـﻣﺎﻧﺮﺑﺮﯾز ﻦـﯾا .ﺖـﺴﯿﻧ ﻖـﯿﻘﺤﺗ ﻦـﯾا ﺚـﺤﺑ ﺰﺟ و ياﺮـﺟا ياﺮـﺑ ﺎـﻫ

ﻪﯿﺒﺷ ﯽﻣ زﺎﯿﻧ درﻮﻣ يزﺎﺳ .ﺪﻨﺷﺎﺑ

مﺮــﻧ ﻪــﮑﯿﺘﻗو راﺰــﻓا

ﯽــﻣ اﺮــﺟاICSS و ﯽــﮑﯾﺰﯿﻓ تﺎــﻋﻼﻃا دﻮــﺷ

ﻪﻣﺎﻧﺮﺑﺮﯾز ﻂﺳﻮﺗ ﯽﮑﯿﻟورﺪﯿﻫ WRRST

ﯽﻣ هﺮﯿﺧذ ،ﺪﻌﺑ ﻪﻠﺣﺮﻣ رد .دﻮﺷ

هﺮﻬﺑ ﯽﻣ مﺎﺠﻧا يرادﺮﺑ و دﻮﺷ

ﯽﻣ ﺮﯿﯿﻐﺗ يدورو نﺎﯾﺮﺟ نﺎﯾﺮﺟ ﺮﯿﯿﻐﺗ .ﺪﻨﮐ

ﯽﻣ بآ ﻖﻤﻋ ﺮﯿﯿﻐﺗ ﺚﻋﺎﺑ يدورو ) ﻪـﻣﺎﻧﺮﺑ ﻪﻣادا رد .دﻮﺷ

continue رد

ﻞﮑﺷ 2 ﻪﻣﺎﻧﺮﺑ ،(

و هﺪـﺷ هﺪﻧاﻮﺧ تﺎﻋﻼﻃا ﯽﻣﺎﻤﺗ و هﺪﺷ ﯽﻧاﻮﺧاﺮﻓRL

هﺮﻬﺑ ﻪﺑ هﺪﺷ يرادﺮﺑ ﯽﻣ ﻞﻘﺘﻨﻣRL

رد .دﻮﺷ ﻪـﯿﻟوا ﺮﯾدﺎـﻘﻣ اﺪـﺘﺑا ،RL

ﺪﻨﻧﺎﻣ ﺮﺜﮐاﺪﺣ و (شادﺎﭘ ﺮﺜﮐاﺪﺣ) Rmax

شرﺎﻤـﺷ ﻢﺘﯾرﻮـﮕﻟا ﻪـﺑ3

RL

ﯽــﻣ ﯽــﻓﺮﻌﻣ ) شرﺎﻤــﺷ ﺮﺜﮐاﺪــﺣ هزاﺪــﻧا ﻪــﺑ ﺲﭙــﺳ .دﻮــﺷ

1000 ،(

ﺪـﯿﻟﻮﺗ ﻢﺘﯾرﻮـﮕﻟا ﻂﺳﻮﺗ ﯽﻓدﺎﺼﺗ ترﻮﺼﺑ ﻪﯿﻟوا ﺖﯿﻌﻤﺟ يﺎﻫﺮﯾﺎﻔﯿﺳﻼﮐ ﯽﻣ شرﺎﻤﺷ زا ﺮﺘﻤﮐ ﺎﻬﺷرﺎﻤﺷ داﺪﻌﺗ ﻪﮐ ﯽﻧﺎﻣز ﺎﺗ ،رﻮﻈﻨﻣ ﻦﯾا ياﺮﺑ .دﻮﺷ

و ﻪﺘﻓﺎﯾ ﺶﯾاﺰﻓا ﺎﻫ شرﺎﻤﺷ ﺪﺷﺎﺑ ﺮﺜﮐاﺪﺣ RDRST

ﯽﻣ ﯽﻧاﻮﺧاﺮﻓ .دﻮﺷ

ﺎﻣز هزاﺪﻧا ﻪﺑ ن ﯽﻣ ﺶﯾاﺰﻓاdt

راﺮـﻗ ﻪـﺟﻮﺗ درﻮﻣ ﯽﻧﺎﻣز مﺎﮔ ﮏﯾ و ﺪﺑﺎﯾ

ﯽﻣ دﺮﯿﮔ (t=t+dt) ﻪﻣﺎﻧﺮﺑ . route ﯽﻣ ﯽﻧاﻮﺧاﺮﻓ ﻪﻣﺎﻧﺮﺑﺮﯾز و دﻮﺷ

pool

ﻪﻣﺎﻧﺮﺑﺮﯾز ﺮﯾﺎﺳ و ﻪﻣﺎﻧﺮﺑﺮﯾز ﻪﺑ نﺪﯿﺳر ﺎﺗ ﺎﻫ

action ﯽـﻣ اﺮﺟا رد .ﺪﻧﻮـﺷ

ﻪﻣﺎﻧﺮﺑﺮﯾز ﻂﺳﻮﺗ ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ ﻪﻣادا action

ﯽﻣ ﻪﺒﺳﺎﺤﻣ ﺲﭙﺳ دﻮﺷ

ﻧﺮﺑﺮﯾز ﻪﻣﺎ estimat ﻪﻣﺎﻧﺮﺑﺮﯾز ﺮﯾﺎﺳ و يﺎﻫ

route ﯽﻣ اﺮﺟا رد ﻪـﮐ دﻮـﺷ

ﯽﻣ ﺺﺨﺸﻣ ﯽﮑﯿﻟورﺪﯿﻫ يﺎﻫﺮﺘﻣارﺎﭘﺮﯾﺎﺳ و ﺪﯾﺪﺟ ﻖﻤﻋ نآ ﻪﺠﯿﺘﻧ .ددﺮﮔ

ﻪﻣﺎﻧﺮﺑﺮﯾز ﻪﻣادا رد reward

ﻪﺑ ،ﺪﯾﺪﺟ ﻖﻤﻋ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ و هﺪﺷ ﯽﻧاﻮﺧاﺮﻓ

ﯽﻣ هداد شادﺎﭘ هﺪﺷ لﺎﻤﻋا ﯽﮔﺪﺷزﺎﺑ طﺮﺷ .دﻮﺷ

ﯽﻣ لﺮﺘﻨﮐc ﺮﮔا دﻮﺷ

ﻮﺗ ﺮﯾﺎﻔﺳﻼﮐc=0 ﯽﻣ ﻪﻓﺎﺿا دﻮﺟﻮﻣ ﺖﯿﻌﻤﺟ ﻪﺑ هﺪﺷ ﺪﯿﻟ

ﺪﻨﯾاﺮﻓ ﻦﯾا .دﻮﺷ

ﯽﻣ راﺮﮑﺗ دﻮﺷ ﻞﺻﺎﺣ شرﺎﻤﺷ ﺮﺜﮐاﺪﺣ ﻪﮐ ﯽﻧﺎﻣز ﺎﺗ ترﻮـﺼﻨﯾا رد .دﻮﺷ

و هﺪﯿـﺳر نﺎـﯾﺎﭘ ﻪـﺑ ﻪـﯿﻟوا ﺖـﯿﻌﻤﺟ ﺪﯿﻟﻮﺗ ﻪـﮑﻨﯾا ﻪـﺑ ﻪـﺟﻮﺗ ﺎـﺑ .c=1

ﻪﻣﺎﻧﺮﺑﺮﯾز WRRST ﯽﻤﻧ ﯽﻧاﻮﺧاﺮﻓ ﻪﺳوﺮﭘ ﻦﯾا رد

يﺮـﯿﯿﻐﺗ نﺎـﻣز دﻮﺷ

ﯽﻤﻧ ﻪﯿﺒﺷ مﺎﺠﻧا ياﺮﺑ .ﺪﻨﮐ ﻞﺣاﺮﻣ يزﺎﺳ

ﯽﻣ مﺎﺠﻧا ﺮﯾز .دﻮﺷ

ﻪﯿﺒﺷ زا ﻞﮑﯿﺳ ﺮﻫ رد يﺮﯿﮔدﺎﯾ ﺪﻨﯾاﺮﻓ زا لوا مﺎﮔ رد راﺪـﻘﻣ يزﺎـﺳ

شادﺎﭘ ﯽـﻠﺒﻗ شادﺎﭘ و هﺪﻧرﺎﻤﺷ ،R )4

هداد راﺮـﻗ ﺮﻔـﺻ ﺎـﺑ ﺮـﺑاﺮﺑ (PR

ﯽﻣ ﻪﻣﺎﻧﺮﺑﺮﯾز .دﻮﺷ يﺎﻫ

RDRST ، route و reward ياﺮﺑ ﺐﯿﺗﺮﺗ ﻪﺑ

ﯽﻣ اﺮﺟا و هﺪﺷ ﯽﻧاﻮﺧاﺮﻓ ﯽﻧﺎﻣز مﺎﮔ ﮏﯾ ﮏـﯾ نآ ﻪـﺠﯿﺘﻧ رد ﻪﮐ ﺪﻧﻮﺷ

ﻋ ﯽﻣ ﺪﯿﻟﻮﺗ ﻮﻀ ﯽﻧﻮﻨﮐ شادﺎﭘ ﺲﭙﺳ .دﻮﺷ

)5

ﺎﺑ و هﺪﺷ ﻪﺒﺳﺎﺤﻣ (CR PR

1- Program

2- subprogram 3- Counter0 4- Previous Reward 5- Current Reward ﯽﻣ ﻪﺴﯾﺎﻘﻣ

ﺮﮔا .دﻮﺷ زا ﺮﺘﻤﮐCR

ﺎﻤﯿﻘﺘﺴﻣ هﺪﺷ ﺪﯿﻟﻮﺗ ﻮﻀﻋ ،ﺪﺷﺎﺑPR

ﯽﻣ ﻪﻓﺎﺿا ﺖﯿﻌﻤﺟ ﻪﺑ ترﻮﺼﻨﯾا ﺮﯿﻏ رد ،دﻮﺷ

ﺎﺑ ﺮﺑاﺮﺑR هداد راﺮـﻗCR

و هﺪﺷ ﺎﺑ ﺮﺑاﺮﺑ ﺰﯿﻧPR

ﯽﻣ هداد راﺮﻗCR هﺪﺷ ﺪﯿﻟﻮﺗ ﻮﻀﻋ ﺲﭙﺳ و دﻮﺷ

ﯿﻟوا ﺖﯿﻌﻤﺟ ﻪﺑ ﯽﻣ ﻪﻓﺎﺿا ﻪ

راﺪﻘﻣ .دﻮﺷ ﺎﺑR

ﯽـﻣ ﻪﺴﯾﺎﻘﻣRmax .دﻮـﺷ

ﺮﮔا زا ﺮﺘﻤﮐR ﯽﻣ ﻪﻣادا ﺪﻨﯾاﺮﻓ ﻦﯾا ﺪﺷﺎﺑRmax

ترﻮﺼﻨﯾا ﺮﯿﻏ رد ﺪﺑﺎﯾ

WRRST ﯽـﻣ هﺮـﯿﺧذ رﻮﮐﺬـﻣ نﺎﻣز تﺎﻋﻼﻃا و هﺪﺷ ﯽﻧاﻮﺧاﺮﻓ

دﻮـﺷ

ﻪﻣﺎﻧﺮﺑﺮﯾز ﺲﭙﺳ report

شراﺰـﮔ ﻪـﻣﺎﻧﺮﺑ ﯽـﺟوﺮﺧ و هﺪـﺷ ﯽﻧاﻮـﺧاﺮﻓ

ﯽﻣ ﯽﻣ يﺪﻌﺑ ﯽﻧﺎﻣز مﺎﮔ ﻪﺑ ﻪﻣﺎﻧﺮﺑ و دﻮﺷ نﺎﻣز مﺎﻤﺗا ﺎﺗ ﺪﻨﯾاﺮﻓ ﻦﯾا .دور

ﻪﯿﺒﺷ لﺪﻣ ﻞﺧاد رد ﯽﻧﺎﻣز مﺎﮔ ﺮﻫ ياﺮﺑ يزﺎﺳ ﯽﻣ راﺮﮑﺗICSS

.دﻮﺷ

ﻪﻣﺎﻧﺮﺑﺮﯾز ترﺎﭼﻮﻠﻓ action

ﻞﮑﺷ رد 3 رد .ﺖـﺳا هﺪﺷ هداد نﺎﺸﻧ

تﺎﭼﻮﻠﻓ ﻦﯾا Rclass

ﯽـﻣ هﺪـﺷ بﺎﺨﺘﻧا ﺮﯾﺎﻔﯿﺳﻼﮐ شادﺎﭘ .ﺪـﺷﺎﺑ

GO

ﯽﻣ ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ ـﺳا هﺪـﺷ ﻪﺘـﺷﻮﻧ رﺎﺼﺘﺧا ﻪﺑ ﻞﮑﺷ رد ﻪﮐ ﺪﺷﺎﺑ

.ﺖ

ﯽﻣ ﯽﻧاﻮﺧاﺮﻓ ﻪﻣﺎﻧﺮﺑﺮﯾز ﻦﯾا ﻪﮐ ﯽﺘﻗو ﯽﻣ ﯽﻌﺳ ،دﻮﺷ

ﯽﮔﺪﺷزﺎﺑ ﻪﮐ دﻮﺷ

ﯽﻣ ﺖﯿﺒﺜﺗ فﺪﻫ ﻖﻤﻋ رد ار بآ ﻖﻤﻋ ﻪﮐ ﺐﺳﺎﻨﻣ ﻪﭽﯾرد .دﻮﺷ اﺪﯿﭘ ،ﺪﻨﮐ

ﯽـﻣ مﺎـﺠﻧا ﻮﺠﺘـﺴﺟ دﻮﺟﻮﻣ ﺖﯿﻌﻤﺟ رد اﺪﺘﺑا ،رﻮﻈﻨﻣ ﻦﯾا ياﺮﺑ و دﻮـﺷ

MS6

ﯽﻣ ﻞﯿﮑﺸﺗ .دﻮﺷ

ﻖﻤﻋ ﻪﮐ ﺖﺳا ﯽﯾﺎﻫﺮﯾﺎﻔﯿﺳﻼﮐ مﺎﻤﺗ ﻞﻣﺎﺷMS

ﻋ ﺎﺑ ﺮﺑاﺮﺑ ﺎﻬﻧآ رد ﯽﻣ دﻮﺟﻮﻣ ﻖﻤ

يﺎﻫﺮﯾﺎﻔﯿﺳﻼﮐ زا ﯽﮑﯾ شادﺎﭘ ﺮﮔا .ﺪﺷﺎﺑ

ﺎﺑ ﺮﺑاﺮﺑ MS لﺎـﻤﻋا نآ ﯽﮔﺪـﺷزﺎﺑ ناﺰـﯿﻣ ترﻮـﺼﻨﯾا رد ﺪﺷﺎﺑRmax

ﯽﻣ ﺪـﯿﻟﻮﺗ ﯽﻓدﺎـﺼﺗ ترﻮـﺼﺑ ﺮﯾﺎﻔﯿـﺳﻼﮐ ﮏـﯾ ترﻮﺼﻨﯾا ﺮﯿﻏ رد دﻮﺷ

ﯽﻣ ﺲﭘ ﺰﯿﻧ نآ شادﺎﭘ و دﻮﺟﻮﻣ ﻖﻤﻋ ﺎﺑ ﺮﺑاﺮﺑ ﺮﯾﺎﻔﯿﺳﻼﮐ ﻦﯾا ﻖﻤﻋ .دﻮﺷ

ﻪﯿﺒﺷ زا ﯽﻣ صﺎﺼﺘﺧا يزﺎﺳ ﺘﯾﺎﻬﻧ و ﺪﺑﺎﯾ

ﻪـﺑ و هﺪـﺷ ﻞﯿﻤﮑﺗﺮﯾﺎﻔﯿﺳﻼﮐ ﺎ

ﯽﻣ ﻪﻓﺎﺿا ﺖﯿﻌﻤﺟ .دﻮﺷ

ﻪﻣﺎﻧﺮﺑﺮﯾز رد reward

تﻻدﺎـﻌﻣ زا هدﺎﻔﺘﺳا ﺎﺑ شادﺎﭘ راﺪﻘﻣ 1

و 2

ﯽﻣ ﻪﺒﺳﺎﺤﻣ .دﻮﺷ

)1 (

t t

Y Y A Y

نآ رد ﻪﮐ =Y

و هﺪﺷ هﺪﻫﺎﺸﻣ ﻖﻤﻋ راﺪﻘﻣ Yt

ﻖﻤﻋ راﺪﻘﻣ=

ﯽﻣ فﺪﻫ -

.ﺪﺷﺎﺑ

)2 ( ) 0

1 ( A R R 

نآ رد ﻪﮐ R0

= ﯽﻣ ﻦﮑﻤﻣ شادﺎﭘ ﺮﺜﮐاﺪﺣ ﺮـﻫ ﻪـﮐ ﯽـﻨﻌﻣ ﻦﯾﺪﺑ ،ﺪﺷﺎﺑ

ﺪﺷﺎﺑ فﺪﻫ ﻖﻤﻋ ﺎﺑ ﺮﺑاﺮﺑ بآ ﻖﻤﻋ ﺖﻗو R

R0 ﺢﯾﺮﺸﺗ يﺎﻫﺪﻨﯾاﺮﻓ.

ﺮﺑ ﻻﺎﺑ رد هﺪﺷ .ﺖﺳاﺮﺟا ﻞﺑﺎﻗ لﺎﻧﺎﮐ زا هزﺎﺑ ﮏﯾ يا

ﻞﮑﺷ ﻢﺘﯾرﻮﮕﻟا ﺪﻨﯾاﺮﻓ ،هزﺎﺑ ﻦﯾﺪﻨﭼ ﺎﺑ لﺎﻧﺎﮐ ﮏﯾ ياﺮﺑ 2

ياﺮﺑ اﺪﺘﺑا

ﯽﻣ مﺎﺠﻧا لوا هزﺎﺑ هزﺎﺑ ياﺮﺑ ﺲﭙﺳ و دﻮﺷ

2 هزﺎـﺑ ياﺮﺑ ﺲﭙﺳ و يﺎـﻫ

ﯽﻣ راﺮﮑﺗ ﺐﯿﺗﺮﺗ ﻪﺑ يﺪﻌﺑ .دﻮﺷ

6- Matching Set

(4)

ﻞﮑﺷ 2 ﻢﺘﯾرﻮﮕﻟا يﺎﻤﻧﺪﻧور- ﻪﺘﻓﺎﯾ ﻪﻌﺳﻮﺗRL

Figure 2- The flowchart of developed RL

ﻢﺘﯾرﻮﮕﻟا رد ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ ﻪﺒﺳﺎﺤﻣ يﺎﻤﻧﺪﻧور-3 ﻞﮑﺷ RL

Figure 3- The flowchart of gate opening calculation in RL algorithm

ﺺﺧﺎﺷ ﯽﺑﺎﯾزرا يﺎﻫ

ﻪﻌﺳﻮﺗ لﺪﻣ ﯽﺑﺎﯾزرا ياﺮﺑ ﻪﺘﻓﺎﯾ

زا ﺺﺧﺎـﺷ يﺎـﻄﺧ ﺮﺜﮐاﺪـﺣ يﺎـﻫ

) ﻖــﻠﻄﻣ MAE1

) ﺎــﻫﺎﻄﺧ ﻖــﻠﻄﻣ ﻦﯿﮕﻧﺎــﯿﻣ ،(

IAE2

) ( 8 و ( نﺎــﻣز

ﺲﮑﻋ ) ﻢﺘﺴﯿﺳ ﻞﻤﻌﻟا SRT3

) ( 14 ﺺﺧﺎﺷ .ﺪﺷ هدﺎﻔﺘﺳا ( ﺮﺜﮐاﺪـﺣ يﺎـﻫ

ﻪﺑ ﺎﻫﺎﻄﺧ ﻖﻠﻄﻣ ﻦﯿﮕﻧﺎﯿﻣ و ﻖﻠﻄﻣ يﺎﻄﺧ ﻂﺑاور ترﻮﺻ

3 و 4 ﯽـﻓﺮﻌﻣ

1- Maximum Absolute Error

2- Integral of Absolute magnitude of Error 3- System Response Time

ﯽﻣ :ﺪﻧﻮﺷ

)3 (

t t

Y Y MAE max(Y  )

)4 (

t T

t

Y Y T Y

t IAE

0

،نآ رد ﻪﮐ ﻪﯿﺒﺷ هرود لﻮﻃ =T

ﯽﻣ يزﺎﺳ ﻼﺒـﻗ ﺎـﻫﺮﺘﻣارﺎﭘ ﻪﯿﻘﺑ و ﺪﺷﺎﺑ

(5)

هﺪﺷ ﯽﻓﺮﻌﻣ نﺎـﺸﻧ ﻖـﻠﻄﻣ يﺎﻄﺧ ﺮﺜﮐاﺪﺣ ﺺﺧﺎﺷ .ﺪﻧا

ﺮﺜﮐاﺪـﺣ ةﺪـﻨﻫد

،ﺎـﻄﺧ ﻖـﻠﻄﻣ ﻦﯿﮕﻧﺎـﯿﻣ ﺺﺧﺎـﺷ .ﺖـﺳا فﺪﻫ ﻖﻤﻋ زا ﻖﻤﻋ فاﺮﺤﻧا نﺎﺸﻧ ةرود لﻮـﻃرد فﺪـﻫ ﻖـﻤﻋ زا بآ ﻖـﻤﻋ فاﺮـﺤﻧا ﻂﺳﻮﺘﻣةﺪﻨﻫد

ﻪﯿﺒﺷ ﯿﺑ ﯽﻧﺎﻣز ﮥﻠﺻﺎﻓ .ﺖﺳا يزﺎﺳ هدوﺪـﺤﻣ زا ﻖـﻤﻋ فاﺮﺤﻧا عوﺮﺷ ﻦ

ﺲﮑﻋ نﺎﻣز ار هدوﺪﺤﻣ ﻦﯾا نورد رد ﻖﻤﻋ هرﺎﺑود ﺖﯿﺒﺜﺗ ﺎﺗ زﺎﺠﻣ ﻞﻤﻌﻟا

ﯽﻣ ﺎﺑ ار نآ و ﺪﻨﻣﺎﻧ ﯽﻣ نﺎﺸﻧSRT

ﺺﺧﺎـﺷ ﻦﯾاراﺪـﻘﻣ ﻪـﭼ ﺮﻫ .ﺪﻨﻫد

ﺪﺷﺎﺑ ﺮﺘﻤﮐ ﯽـﻣ ﺖﯿﺒﺜﺗ زﺎﺠﻣ ةدوﺪﺤﻣ ﻞﺧاد رد ﺮﺘﻌﯾﺮﺳ بآ ﻖﻤﻋ،

.دﻮـﺷ

ﺺﺧﺎﺷ تاﺮﯿﯿﻐﺗ ﮥﻨﻣاد ﻦﯿﮕﻧﺎـﯿﻣ و ﻖـﻠﻄﻣ يﺎﻄﺧ ﺮﺜﮐاﺪﺣ يﺎﻫ

ﻖـﻠﻄﻣ

و ﺮﻔﺻ ﻦﯿﺑ ﺎﻄﺧ 100

رد ﻖـﻤﻋ تاﺮـﯿﯿﻐﺗ ﻪـﮐ ﯽﺗرﻮﺻ رد .ﺖﺳا ﺪﺻرد

ﺲﮑﻋ نﺎﻣز ،دﺮﯿﮔ ترﻮﺻ زﺎﺠﻣ هدوﺪﺤﻣ نورد ﺪﻫاﻮﺧ ﺮﻔﺻ ﺮﺑاﺮﺑ ﻞﻤﻌﻟا

.دﻮﺑ ﻪﯿﺒﺷ يﺎﻫﻮﯾرﺎﻨﺳ يزﺎﺳ

لﺮﺘﻨﮐ ﻢﺘﺴﯿﺳ ﯽﺑﺎﯾزرا و نﻮﻣزآ ياﺮﺑ ﻪﻌﺳﻮﺗRL

ﻪﺑ ﯽﻣﻮﻠﻓ ،ﻪﺘﻓﺎﯾ

لﻮﻃ 10 ﺐﯿﺗﺮﺗ ﻪﺑ ضﺮﻋ و عﺎﻔﺗرا ﺎﺑ و ﺮﺘﻣ 45

/0 و 3/

0 ﺮﺘﻣ ﺮـﻈﻧ رد

رﺎﮐدﻮﺧ ﻪﭽﯾرد ﮏﯾ ﻪﻠﯿﺳﻮﺑ ﻪﮐ هدﻮﺑ هزﺎﺑ ود ياراد مﻮﻠﻓ ﻦﯾا .ﺪﺷ ﻪﺘﻓﺮﮔ ﻖﻄﻨﻣ ﺎﺑ هﺪﺷ اﺪﺟ ﺮﮔﺪﮑﯾ زاRL

ﻪﻠﺻﺎﻓ رد ﻪﭽﯾرد ﻦﯾا .ﺪﻧا 4

زا يﺮـﺘﻣ

ﮏﯾ زا هدﺎﻔﺘﺳا ﺎﺑ مﻮﻠﻓ ﻪﺑ يدورو نﺎﯾﺮﺟ .ﺖﺳا ﻪﺘﻓﺮﮔ راﺮﻗ مﻮﻠﻓ ياﺪﺘﺑا ﯽﻣ لﺮﺘﻨﮐ ،ﺖﺳا ﻪﺘﻓﺮﮔ راﺮﻗ مﻮﻠﻓ ياﺪﺘﺑا رد ﻪﮐ ﻪﮑﻠﻓ ﺮﯿﺷ .دﻮﺷ

ﻖـﻤﻋ

ﯽﻣ لﺮﺘﻨﮐ رﺎﮐدﻮﺧ ﻪﭽﯾرد ﺖﺳدﻻﺎﺑ رد بآ ﯽـﺳررﺮﺑ رﻮـﻈﻨﻣ ﻪـﺑ .دﻮﺷ

ﯽﮑﯿﻟورﺪﯿﻫ و ﯽﮑﯾﺰﯿﻓ تﺎﺼﺨﺸﻣ اﺪﺘﺑا ،رﺎﮐدﻮﺧ لﺮﺘﻨﮐ ﻢﺘﺴﯿﺳ دﺮﮑﻠﻤﻋ لﺪﻣ ﻪﺑ مﻮﻠﻓ ﯽﮔﺪـﺷزﺎﺑ ،ﻪﯿﻟوا ﯽﺑد ﺮﯾدﺎﻘﻣ ﺲﭙﺳ و ﺪﺷ ﯽﻓﺮﻌﻣICSS

ﺮﯾدﺎـﻘﻣ رد فﺪـﻫ ﻖـﻤﻋ و ﻪﭽﯾرد ﻪﯿﻟوا 25 l/s

، 14.3 cm و

20 cm

ﻮﯾرﺎﻨﺳ ود .ﺪﻧﺪﺷ ﻢﯿﻈﻨﺗ راﺮـﻗ ﻪـﺟﻮﺗ درﻮـﻣ نﺎﯾﺮﺟ ﺶﻫﺎﮐ و ﺶﯾاﺰﻓا ي

ﻪـﯿﻟوا ﯽـﺑد ،نﺎـﯾﺮﺟ ﺶﯾاﺰـﻓا يﻮﯾرﺎﻨﺳ رد .ﺖﻓﺮﮔ 25 l/s

ﻪـﺑ 30 l/s

ﯽﻣ ﺶﯾاﺰﻓا ﻪﯿﻟوا ﯽﺑد نﺎﯾﺮﺟ ﺶﻫﺎﮐ يﻮﯾرﺎﻨﺳ رد و ﺖﻓﺎﯾ

25 l/s ﻪﺑ 20

ﯽﻣ ﺶﻫﺎﮐl/s ﺖﻓﺎﯾ

.

و ﺞﯾﺎﺘﻧ ﺚﺤﺑ

ﻪﯿﺒﺷ ،هﺪﺷ هرﺎﺷا يﺎﻫﻮﯾرﺎﻨﺳ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ يزﺎﺳ

ﺞﯾﺎـﺘﻧ و ﺪﺷ مﺎﺠﻧا

ﮔ ﺰﯿﻟﺎﻧآ تﺎـﻓاﺮﺤﻧا رﺎـﮐدﻮﺧ ﻪـﭽﯾرد ،نﺎﯾﺮﺟ تاﺮﯿﯿﻐﺗ ﻪﺑ ﺦﺳﺎﭘ رد .ﺪﯾدﺮ

ﯽﻣ ﺖﯿﺒﺜﺗ زﺎﺠﻣ هدوﺪﺤﻣ ﻞﺧاد رد اﺮﻧآ و هدﺮﮐ لﺮﺘﻨﮐ ار بآ ﺢﻄﺳ .ﺪﻨﮐ

راﺪـﻘﻣ و ﻪﭽﯾرد ﺖﺳدﻻﺎﺑ رد بآ ﺢﻄﺳ و ﯽﺑد تاﺮﯿﯿﻐﺗ ﮏﯿﺗﺎﻤﺷ ﻞﮑﺷ يﺎﻬﻠﮑﺷ رد ﺎﻫﻮﯾرﺎﻨﺳ ياﺮﺑ ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ 4

و 5 هﺪـﺷ هداد نﺎـﺸﻧ

ﺺﺧﺎﺷ ﺮﯾدﺎﻘﻣ ﻦﯿﻨﭽﻤﻫ .ﺖﺳا ﻫ

لﺪﺟ رد ﯽﺑﺎﯾزرا يﺎ 1

.ﺖﺳا هﺪﺷ ﻪﺋارا

زا ﺖﺳدﻻﺎﺑ نﺎﯾﺮﺟ ،نﺎﯾﺮﺟ ﺶﯾاﺰﻓا يﻮﯾرﺎﻨﺳ ياﺮﺑ 25 l/s

ﻪـﺑ 30 l/s

ﺢﻄـﺳ ،ﺶﯾاﺰﻓا ﻦﯾا ﻪﺑ ﺦﺳﺎﭘ رد يﺮﯿﮔدﺎﯾ مﺎﺠﻧا زا ﺲﭘ .ﺖﻓﺎﯾ ﺶﯾاﺰﻓا ﻪﺑ ﻪﭽﯾرد ﺖﺳدﻻﺎﺑ رد بآ 20.7 cm

راﺪـﻘﻣ ﻪﺑ ﺲﭙﺳ و ﺖﻓﺎﯾ ﺶﯾاﺰﻓا 20.5 cm

ﻣ هدوﺪﺤﻣ .ﺪﯾدﺮﮔ ﺖﯿﺒﺜﺗ نآ رد و ﺶﻫﺎﮐ زﺎﺠ

± 10%

ﻖﻤﻋ

ﺎﺑ ﺮﺑاﺮﺑ فﺪﻫ 18 cm

ﺎﺗ 22 cm ﯽـﻣ ،بآ ﻖـﻤﻋ ﻪـﺑ ﻪـﺟﻮﺗ ﺎـﺑ .ﺪـﺷﺎﺑ

ﯽﻣ هﺪﻫﺎﺸﻣ ﻖـﻤﻋ زﺎـﺠﻣ هدوﺪﺤﻣ ﻞﺧاد رد هراﻮﻤﻫ بآ ﻖﻤﻋ ﻪﮐ دﻮﺷ

ﺺﺧﺎﺷ راﺪﻘﻣ ﻪﺠﯿﺘﻧرد .ﺖﺳا هﺪﺸﻧ جرﺎﺧ نآ زا و هدﻮﺑ ﺎـﺑ ﺮـﺑاﺮﺑSRT

ﯽﻣ ﺮﻔﺻ ﺺﺧﺎﺷ ﺮﯾدﺎﻘﻣ .ﺪﺷﺎﺑ يﺎﻫ

وIAE ﺎـﺑ ﺮـﺑاﺮﺑ ﺐﯿﺗﺮﺗ ﻪﺑMAE

2.57%

و ﻣ3.5%

ﯽ رد هﺪـﺷ ﺪﯿﻟﻮﺗ ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ ﺶﻫﺎﮐ راﺪﻘﻣ .ﺪﺷﺎﺑ

ﻞﮑﺷ رد يﺮﯿﮔدﺎﯾ ﺪﻨﯾاﺮﻓ لﻮﻃ 6

ناﺰـﯿﻣ ،اﺪـﺘﺑا .ﺖﺳا هﺪﺷ هداد نﺎﺸﻧ

ﺎﺑ ﺮﺑاﺮﺑ ﻪﯿﻟوا ﺖﯿﻌﻤﺟ 1000

ﯽﻣ ﻪﯿﺒـﺷ لﻮﻃ رد ﻪﮐ ﺪﺷﺎﺑ ناﺰـﯿﻣ يزﺎـﺳ

زا ﺪﻌﺑ و ﻪﺘﻓﺎﯾ ﺶﻫﺎﮐ باﻮﺟ ﻪﺑ نﺪﯿﺳر ياﺮﺑ هﺪﺷ ﺪﯿﻟﻮﺗ ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ دوﺪﺣ 0.03 hr ﯽﻣ ﺮﻔﺻ ﺎﺑ ﺮﺑاﺮﺑ .دﻮﺷ

راﺪـﻘﻣ ﻪﮐ ﺖﺳا ﯽﻨﻌﻣ ﻦﯾﺪﺑ ﻦﯾا

مﺰﯿﻧﺎـﮑﻣ ﺮﻃﺎـﺨﺑ باﻮـﺟ ﻪـﺑ نﺪﯿـﺳر ياﺮﺑ ﯽﻧﺎﻣز مﺎﮔ ﺮﻫ رد ﺎﻫراﺮﮑﺗ ﯽﻣ ﺶﻫﺎﮐ يﺮﯿﮔدﺎﯾ ﺪﯿﻟﻮﺗ ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ ،يﺮﯿﮔدﺎﯾ مﺎﻤﺗا زا ﺲﭘ و ﺪﺑﺎﯾ

ﯽﻤﻧ ﻪـﮐ ﺪـﻨﭼ ﺮﻫ ،ﺪﻨﮐ اﺪﯿﭘ ﺶﻫﺎﮐ ﺪﯾﺎﺑ بآ ﻖﻤﻋ ،ﺪﻌﺑ ﻪﻠﺣﺮﻣ رد .دﻮﺷ

ﺲﮑﻋ وﺪـﺤﻣ ﻞـﺧاد رد بآ ﻖـﻤﻋ هﺪﺷ ﺚﻋﺎﺑ ﻪﭽﯾرد ﺐﺳﺎﻨﻣ ﻞﻤﻌﻟا هد

فﺪـﻫ ﻖـﻤﻋ راﺪـﻘﻣ ﻪـﺑ ﯽﻧﺎﻣز مﺎﮔ ﺮﻫ يﺎﻬﺘﻧا رد و هﺪﺷ ﺖﯿﺒﺜﺗ زﺎﺠﻣ ﻪﺘﮑﻧ .ﺖﺳا هﺪﯿﺳر ﯽـﻣ ﺖﯿﻤﻫا ﺰﺋﺎﺣ ﺎﺠﻨﯾا رد ﻪﮐ يا

ﻪـﮐ ﺖـﺴﻨﯾا ﺪـﺷﺎﺑ

ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ ناﺰﯿﻣ ﻢﺘﯾرﻮﮕﻟا ﯽـﻣ جاﺮﺨﺘـﺳا يرﻮﻃ ار ﺎﻫ

ﻪـﮐ ﺪـﻨﮐ

ﻪﯿﺒﺷ ﺪﻨﯾاﺮﻓ لﻮﻃ رد ﻖﻤﻋ تاﺮﯿﯿﻐﺗ ﻦﮑﻤﻣ ﻦﯾاﺮﺑﺎﻨﺑ ،دﻮﺷ ﻞﻗاﺪﺣ يزﺎﺳ

ﻤﻋ ﻪﮐ ﯽﺘﻟﺎﺣ رد ﺖﺳا ﯽـﻣ اﺪﯿﭘ ﺶﻫﺎﮐ بآ ﻖ

.دﻮـﺷ زﺎـﺑ ﻪـﭽﯾرد ﺪـﻨﮐ

ﺮﻇﺎﻨﺘﻣ ﺮﯾدﺎﻘﻣ ،SRT

وMAE ﯽﻣ ﺮﻔﺻ ﺎﺑ ﺮﺑاﺮﺑIAE

ﻞﮑـﺷ .ﺪـﺷﺎﺑ 7

ﻢﺘﯾرﻮﮕﻟا رﺎﺘﻓر ﻪﯿﺒﺷ هرود لﻮﻃ رد ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ ﺪﯿﻟﻮﺗ ردRL

ار يزﺎﺳ

ﯽﻣ نﺎﺸﻧ ﻪﯿﺒﺷ ياﺪﺘﺑا رد .ﺪﻫد ﺎـﺑ ﺮﺑاﺮﺑ هﺪﺷ ﺪﯿﻟﻮﺗ ﺖﯿﻌﻤﺟ داﺪﻌﺗ يزﺎﺳ

1000 ﯽﻣ ﻣ ﯽﻧﺎﻣز مﺎﮔ ﺮﻫ لﻮﻃ رد .ﺪﺷﺎﺑ هﺪﺷ ﺪﯿﻟﻮﺗ ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ ناﺰﯿ

نﺎﻣز رد و ﻪﺘﻓﺎﯾ ﺶﻫﺎﮐ 0.16 hr

ﻪﯿﺒﺷ عوﺮﺷ زا ﺖـﯿﻌﻤﺟ داﺪﻌﺗ يزﺎﺳ

ﯽـﻣ ،ﺞﯾﺎـﺘﻧ ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ .ﺖﺳا هﺪﯿﺳر ﺮﻔﺻ ﻪﺑ هﺪﺷ ﺪﯿﻟﻮﺗ ﻪـﺠﯿﺘﻧ ناﻮـﺗ

ﻪﻌﺳﻮﺗ لﺮﺘﻨﮐ ﻢﺘﺴﯿﺳ ﻪﮐ ﺖﻓﺮﮔ باﻮﺟ ﻪﺑ نﺪﯿﺳر نﺎﻣز و ﺖﻗد رد ﻪﺘﻓﺎﯾ

و هدﻮﺑ يﺪﻨﻤﻧاﻮﺗ ﻢﺘﺴﯿﺳ ﮏﯾ ـﻫ ﻖـﻤﻋ رد بآ ﻖـﻤﻋ ﺖﯿﺒﺜﺗ ياﺮﺑ

فﺪ

ﯽﻣ ﺐﺳﺎﻨﻣ رﺎﯿﺴﺑ .ﺪﺷﺎﺑ

ﻪﺠﯿﺘﻧ يﺮﯿﮔ ﯽﻠﮐ

ﺎﺑ ﯽﺋﻮﺸﮐ ﻪﭽﯾرد هزﺎﺳرﺎﮐدﻮﺧ لﺮﺘﻨﮐ ﯽﺿﺎﯾر لﺪﻣ ،ﻖﯿﻘﺤﺗ ﻦﯾا رد ﻢﺘﯾرﻮﮕﻟا ﯽﮑﯿﻣﺎﻨﯾدورﺪـﯿﻫ لﺪﻣ ﺎﺑ ﯽﻘﯿﻔﻠﺗ ترﻮﺼﺑ وRL

ﻪـﯿﻬﺗICSS

ﺶﻫﺎﮐ و ﺶﯾاﺰﻓا يﻮﯾرﺎﻨﺳ ود .ﺖﻓﺮﮔ راﺮﻗ ﯽﺑﺎﯾزرا و نﻮﻣزآ درﻮﻣ و ﺪﺷ ﻪﯿﻟوا ﯽﺑد ﺎﺑ يدورو ﯽﺑد 25

ﻪﯿﻧﺎﺛ رد ﺮﺘﯿﻟ تاﺮـﯿﯿﻐﺗ .ﺪﺷ ﻪﺘﻓﺮﮔ ﺮﻈﻧ رد

ﻪـﺑ ﺦـﺳﺎﭘ رد (رﺎـﮐدﻮﺧ ﻪـﭽﯾرد ﺖﺳدﻻﺎﺑ ) لﺮﺘﻨﮐ ﻪﻄﻘﻧ رد بآ ﻖﻤﻋ ﯽﻣ نﺎﺸﻧ ﻪﮐ ﺪﺷ هﺪﻫﺎﺸﻣ ﺖﺳدﻻﺎﺑ نﺎﯾﺮﺟ تاﺮﯿﯿﻐﺗ نﺎﻣز تﺪﻣ رد ﺪﻫد

.ﺖﺳا هدﺮﮐ ﺖﯿﺒﺜﺗ فﺪﻫ ﻖﻤﻋ رد ار بآ ﻖﻤﻋ ﯽﻫﺎﺗﻮﮐ

(6)

نﺎﯾﺮﺟ ﺶﯾاﺰﻓا يﻮﯾرﺎﻨﺳ رد ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ و ﻖﻤﻋ ،نﺎﯾﺮﺟ تاﺮﯿﯿﻐﺗ-4 ﻞﮑﺷ Figure 4- Flow, water depth and gate opening variations in flow increase scenario

نﺎﯾﺮﺟ ﺶﻫﺎﮐ يﻮﯾرﺎﻨﺳ رد ﻪﭽﯾرد ﯽﮔﺪﺷزﺎﺑ و ﻖﻤﻋ ،نﺎﯾﺮﺟ تاﺮﯿﯿﻐﺗ-5 ﻞﮑﺷ Figure 5- Flow, water depth and gate opening variations in flow decrease scenario

لوﺪﺟ 1 - ا يﺎﻫ ﺺﺧﺎﺷ ﺮﯾدﺎﻘﻣ نﺎﯾﺮﺟ ﺶﻫﺎﮐ و ﺶﯾاﺰﻓا يﺎﻫﻮﯾرﺎﻨﺳ ياﺮﺑ ﯽﺑﺎﯾزر

Table 1- Performance indicators for flow increase and decrease scenarios نﺎﯾﺮﺟ ﺶﯾاﺰﻓا

(Flow increase)

نﺎﯾﺮﺟ ﺶﻫﺎﮐ (Flow decrease)

Yt(cm) 20.00 20.00

Max Y(cm) 20.07 20.00

Min Y(cm) 20.00 20.00

MAE 3.50 0.00

IAE 2.57 0.00

SRT10(min) 0.00 0.00

(7)

نﺎﯾﺮﺟ ﺶﯾاﺰﻓا يﻮﯾرﺎﻨﺳ رد هﺪﺷ ﺪﯿﻟﻮﺗ ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ داﺪﻌﺗ رادﻮﻤﻧ-6 ﻞﮑﺷ Figure 6- Generation of population diagram in flow increase scenario

نﺎﯾﺮﺟ ﺶﻫﺎﮐ يﻮﯾرﺎﻨﺳ رد هﺪﺷ ﺪﯿﻟﻮﺗ ﺪﯾﺪﺟ ﺖﯿﻌﻤﺟ داﺪﻌﺗ رادﻮﻤﻧ - 7 ﻞﮑﺷ Figure 7- Generation of population diagram in flow decrease scenario

ﯽﻣ نﺎﺸﻧ ﺞﯾﺎﺘﻧ لﺮـﺘﻨﮐ ﻢﺘـﺴﯿﺳ ﺪﻫد

نﺎـﻣز و ﺖـﻗد ﺮـﻈﻧ زاRL

ﯽﻣ ﯽﺑﻮﻠﻄﻣ دﺮﮑﻠﻤﻋ ياراد ﯽﯾﻮﮕﺨﺳﺎﭘ ﯽﻣ و ﺪﺷﺎﺑ

تﺎـﻌﻟﺎﻄﻣ ياﺮﺑ ﺪﻧاﻮﺗ هزﺎﺳ لﺮﺘﻨﮐ ياﺮﺑ ﺮﺘﺸﯿﺑ

.دﺮﯿﮔ راﺮﻗ هدﺎﻔﺘﺳا درﻮﻣ يرﺎﯿﺑآ يﺎﻬﻟﺎﻧﺎﮐ رد ﺎﻫ

ﻊﺑﺎﻨ

1- Alpaydin E. 2004. Introduction to Machine Learning. The MIT Press.

2- Bennett S. 1993. A History of Control Engineering. Peter Peregrinus Ltd.

3- Burt C.M. 1983. Regulation of Sloping Canals by Automatic Downstream Control. Dept. of Agricultural and Irrigation Engineering, Utah State University.

4- Butz M., and Wilson S. 2001a. An algorithmic description of XCS. Advances in Learning Classifier Systems, 267- 274.

5- Butz M.V., Kovacs T., Lanzi P.L., and Wilson S.W. 2001b. How XCS evolves accurate classifiers. Proceedings of the Third Genetic and Evolutionary Computation Conference (GECCO).

6- Buyalski C.P., Ehler D., Falvey, H.T., Rogers D.C., and Serfozo E.A. 1991. Canal Systems Automation Manual:

Volume 1. US Department of Interior.

(8)

7- Clemmens A. and Replogle J. 1989. Control of irrigation canal networks. Journal of irrigation and drainage engineering, 115(1): 96-110.

8-Clemmens A.J., Kacerek T. F., Grawitz B., and Schuurmans W. 1998. Test cases for canal control algorithms. Journal of irrigation and drainage engineering, 124(1): 23-30.

9- Hernández J. and Merkley G. 2011. Canal structure automation rules using an accuracy-based learning classifier system, a genetic algorithm, and a hydraulic simulation model. I: result. Journal of irrigation and drainage engineering, 137(1), 12–16.

10- Malaterre P.O., Rogers D. C., and Schuurmans J. 1998. Classification of canal control algorithms. Journal of irrigation and drainage engineering, 124(1): 3-10.

11- Monem M.J., and Manz D. H. 1994. Application of simulation techniques for improving the performance of irrigation conveyance systems. Journal of Water Resources Engineering, 2: 1-22.

12-Rogers D.C., and Goussard J. 1998. Canal control algorithms currently in use. Journal of irrigation and drainage engineering, 124(1): 11-15.

13- Sawadogo S., Faye R., Benhammou A., and Akouz K. 2000. Decentralized adaptive predictive control of multireach irrigation canal. Institute of Electrical and Electronics Engineers: 5: 3438-3442.

14-Sutton R.S., and Barto A. G. 1998. Reinforcement Learning: An Introduction. Cambridge University Press.

15- Wagemaker R. 2005. Model predictive control on irrigation canals application of various internal models. Delft University of Technology, Netherland.

16- Zhang Q., Wu C.H., and Tilt K. 1996. Application of fuzzy logic in an irrigation control system. Industrial Technology, Proceedings of Institute of Electrical and Electronics Engineers International Conference on, Institute of Electrical and Electronics Engineers, New York.

(9)

Development of Reinforcement Learning Algorithm for Automation of Slide Gate Check Structure in Canals

K. Shahverdi1 – M.J. Monem2* Received: 15-11-2013 Accepted: 21-06-2015

Introduction: Nowadays considering water shortage and weak management in agricultural water sector and for optimal uses of water, irrigation networks performance need to be improveed. Recently, intelligent management of water conveyance and delivery, and better control technologies have been considered for improving the performance of irrigation networks and their operation. For this affair, providing of mathematical model of automatic control system and related structures, which connected with hydrodynamic models, is necessary. The main objective of this research, is development of mathematical model of RL upstream control algorithm inside ICSS hydrodynamic model as a subroutine.

Materials and Methods: In the learning systems, a set of state-action rules called classifiers compete to control the system based on the system's receipt from the environment. One could be identified five main elements of the RL: an agent, an environment, a policy, a reward function, and a simulator. The learner (decision-maker) is called the agent. The thing it interacts with, comprising everything outside the agent, is called the environment. The agent selects an action based on existing state in the environment. When the agent takes an action and performs on environment, the environment goes new state and reward is assigned based on it.

The agent and the environment continually interact to maximize the reward. The policy is a set of state-action pair, which have higher rewards. It defines the agent's behavior and says which action must be taken in which state. The reward function defines the goal in a RL problem. The reward function defines what the good and bad events are for the agent. The higher the reward, the better the action. The simulator provides environment information. In irrigation canals, the agent is the check structures. The action and state are the check structures adjustment and the water depth, respectively. The environment comprises the hydraulic information existing in the canal. Policy is a map of water depth-check structure pairs. Reward function is defined based on the difference between water depth and target depth, and the simulator is a hydrodynamic model which, in the present study, was Irrigation Conveyance System Simulation (ICSS). In the developed RL, the RL begins with required initializations, and then the canal structures are operated. While the maximum reward is reached at the time step of t, the agent receives some representation of the state of the environment and, on that basis, selects an action. The simulator performs the action and provides information on the state of the new environment by simulating the canal system. Finally, the reward is assigned. Maximizing the reward, the RL goes on to the next time step. This process is continued until the final simulation time step is reached. The learning process is similar for all operations. The ICSS hydrodynamic model was used to simulate the canal system and provide the environmental information. Input to the ICSS was the new selected action. The ICSS performed the action and simulated the canal system. The output from the ICSS was information on the new environment for use in the next time step. Two scenarios of flow increase and decrease with initial flow of 25 l/s were simulated. MAE (Maximum Absolute Error), IAE (Integral of Absolute magnitude of Error) and SRT (System Response Time) indicators have been used to assess developed model. For flow decrease scenario, the indicators value are obtained zero.

Results and Discussion: Results were obtained from the performed scenarios. In the flow increase scenario, water depth variations were inside the dead band, therefore, SRT indicator was obtained zero. The MAE and IAE indicators were obtained 3.5% and 2.57%, respectively, which showed the water depth deviations from target depth was very low. In the flow decrease scenario, the all indicators values were obtained zero. At time zero in two scenarios, 1000 populations were generated and tested. As the RL controlled the water depth, it generated new populations, too. The reason for this is that the RL generates a new population if there is no classifier with maximum reward in the population. There are no new generation after 0.03 hr and 0.16 hr in flow increase and flow decrease scenarios, respectively. Considering the results, it could be concluded that the developed control system is a powerful technique in terms of accuracy and response time for water depth

1 , 2- Ph.D. Graduated and Professor of Water Structures Engineering Department, Tarbiat Modares University (*- Corresponding Author Email: [email protected])

(10)

control.

Conclusion: In this research, the RL upstream control system was developed and connected with ICSS hydrodynamic model and evaluated in two scenarios of flow increase and flow decrease. The results showed an ability to control of deviations, short response time and accurate performance of the developed RL control system, which could be used for further study in irrigation canals.

Keywords: Learning, Model development, Performance improvement

Referensi

Dokumen terkait

Cross-over لﺪﻣ هرﺎﺷا يﺎﻫ ﻤﻫ هﺪﺷ ﮕ ﺮﺑ ﻲﻨﺘﺒﻣ ﻲ ﭘ ﺘﺳﻮﻴ ﮕ دﻮﺑ ﻲ ها .ﺪﻧ رد لﺎﺳ ﺮﻴﺧا يﺎﻫ ﮔ ﺶﻳاﺮ ﭼ ﻤﺸ ﮕ شور ﻪﺑ يﺮﻴ لﺎﻴﺳ ﺮﺑ ﻲﻨﺘﺒﻣ يﺎﻫ ﮔ ﻪﺑ ﻪﺘﺴﺴ صﻮﺼﺧ شور ﻪﻜﺒﺷ ﻦﻣﺰﺘﻟﻮﺑ يا ترﻮﺻ ﮔ ﻪﺘﻓﺮ

.364 ،1359 ،ﮓﻧﻮﯾﺖﺳا ﮏﯿﻟﻮﺒﻤﺳ ىﺎﻫدﻮﻤﻧ ترﻮﺻ ﻪﺑ ﻂﻘﻓ ﺎﻬﻧآ هرﻮﻄﺳا ﻪﮐ ﻰﯾﺎﺠﻧآ زا و ﺖﺳا دﺎﻤﻧ ﺎﺠﻨﯾا رد ﺮﯿﻃﺎﺳا نﺎﯿﺑ نﺎﺑز ىﺎﻫوﺮﯿﻧ زا ﻰﯾﺎﯿﻧد و دراد هرﺎﺷا لﻮﻘﻌﻣ و سﻮﺴﺤﻣﺮﯿﻏ ﻢﯿﻫﺎﻔﻣ ﻪﺑ ﻢﯿﻘﺘﺴﻣ هرﺎﺷا ،ﺪﺸﮐ

،ﺪﻧا ﯽﻧدﻮﻣزآ زا تﺎﻌﻟﺎﻄﻣ ﻦﯾا ﺮﺘﺸﯿﺑ ﻪﺑ ﯽﻧاﻮﯿﺣ يﺎﻫ شﻮﻣ هﮋﯾو هدﺮﮐ هدﺎﻔﺘﺳا ﯽﯾاﺮﺤﺻ يﺎﻫ ﺪﻧا 23 ،ﻦﯾا دﻮﺟو ﺎﺑ ، ﻦﯿﻤﻫ و ﯽﻧﺪﺑ ﯽﮔدﺎﻣآ ﺮﯿﺛﺄﺗ ﺺﺧﺎﺷ ﺦﺳﺎﭘ ﺮﺑ دﺮﻣ ﺮﺑاﺮﺑ رد نز ﺖﯿﺴﻨﺟ رﻮﻃ ﯽﻧدﻮﻣزآ

ﺺﻴﺨﺸﺗ يﺎﻫ نﺎﻣرد قﻮﻓ يﺎﻫ ﻲﭘ رد ﺰﻴﻧ ار ﻲﻔﻠﺘﺨﻣ ﺘﻧ ﻪﺑ ﺚﺤﺑ ﻦﻳا نﺎﻳﺎﭘ رد ﻪﻛ ﻪﺘﺷاد ﺪﺷ ﺪﻫاﻮﺧ هرﺎﺷا ﺰﻴﻧ نآ ﻪﺠﻴ ﻪﻛ هﺪﻣآ ﻊﺑﺎﻨﻣ رد ﺮﻈﻧ ﻦﻳا زا و دﻮﺟو ﺎﺑ رد ﺺﺨﺸﻣ ﺶﻳاﺰﻓا ناﺪﻨﻤﻟﺎﺳ رد ﺖﻧﻮﻔﻋ ضراﻮﻋ

،هﺪﺷ دﺎﯾ لﺎﻤﺘﺣا ﺮﺑﺎﻨﺑ ﻪﮑﻧآ لﺎﺣ ،ﺖﺳا هداد ﻞﯾﺪﺒﺗ و صﺎﺼﻗ نﺎﯾﺮﺟ مﺪﻋ یﺮﻬﻗ ﻪﺑ نآ رد شرا ﺎﯾ و ﻪﯾد ﻮﻣ ﺮﺑ هدراو یﺪﻤﻋ تﺎﯾﺎﻨﺟ ، ﻪﯿﺟﻮﺗ ﯽﻤﻧ ﺮﯾﺬﭘ ،هزوﺮﻣا ﻪﮐاﺮﭼ .ﺪﺷﺎﺑ هدﺮﮐ ﻊﻓر ار ﻊﻧﺎﻣ ﻦﯾا ﻮﻣ و

ﻪﻟﺄﺴﻣ ﻖﯿﻘﺤﺗ ﻦﯾا ﯽﻠﺻا ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ ﻪﮐ ﺖﺳا نآ ﺪﻨﭼ رد ﺎﻬﻫﺎﮕﺸﻧاد رد ناﺮﺘﺧد ﺶﯿﭘ زا ﺶﯿﺑ رﻮﻀﺣ ناﻮﻨﻋ ﻪﺑ و ﻪﯿﻟﺎﻋ تﻼﯿﺼﺤﺗ ﻪﺑ ﯽﺑﺎﯿﺘﺳد و ﺮﯿﺧا لﺎﺳ ﻪﺑ ور ﺖﯿﻌﻤﺟ ﻪﮐ هدﺮﮐ ﻞﯿﺼﺤﺗ نﺎﻧاﻮﺟ زا ﯽﺸﺨﺑ و نﺎﮕﺒﺨﻧ

ﺎﺑ ﻪﻤﻫ ﮔ ﯿﺮ ي ﺪﯾوﻮﮐ - 19 ﻪﻋﻮﻤﺠﻣ ا ي ﺎﺑ ﻂﺒﺗﺮﻣ ﯽﻋﺎﻤﺘﺟا و ﯽﻧﺎﻤﺴﺟ ،ﯽﻧاور تﻼﮑﺸﻣ زا سوﺮﯾوﺎﻧوﺮﮐ 2019دﻮﺸﻧ ﻪﺟﻮﺗ تﻼﮑﺸﻣ ﻦﯾا ﻪﺑ ﺮﮔا ﻪﮐ 2021 ،نارﺎﮑﻤﻫ و يﺪﻧﺎﻫرو ﺪﺷ دﺎﺠﯾا نﺎﻬﺟ مدﺮﻣ زا يرﺎﯿﺴﺑ رد ، و

ﯽﻣ ﻪﮑﻨﯾا ﻪﺑ ﻪﺟﻮﺗ ﺎﺑ و ﺪﺷﺎﺑ ﻞﺋﺎـﺴﻣ ﻪﺑ و ﺪﻨﺘﺴﻫ هﺪﯿﭽﯿﭘ رﺎﯿﺴﺑ رﻮﺷ بآ ﻞﺧاﺪﺗ ﯽـﻤﻧ ﯽـﻠﮐ رﻮـﻃ ﻪـﺑ ﺪـﻨﻧاﻮﺗ شور ﻦﯾاﺮﺑﺎﻨﺑ ،ﺪﻧﻮﺷ ﻞﺣ ﯽﻠﯿﻠﺤﺗ ترﻮﺻ هﺪـﯾا يراﺰـﺑا يدﺪـﻋ يﺎﻫ لآ ﻪﯿﺒﺷ ياﺮﺑ ﺶﯿﭘ و