What I didisifthe Q valueisthemax Q valueforeachofthepossibleactions, I markedthatthatcombinationwithgreendotOtherwise, it's reddotsoyoucanseehereovertime.
I alsopridecouldjustkeptitgreendotbutwhatever.
Anyway, um, myinitialplanactuallywastocreate a three D chartandthen, um, andhave, like, cubesbasically, and I startedtodothatandthenitactuallywasn't ascoolas I thoughtitwasgonnabe.
Then I waslike, Well, I couldjustgraphlikeeachoftheactions, andthisiskindofwhat I cameupwith.
Sothisisattheveryend, andyoucanactuallyseehere.
Actionzero.
Thatactionzerowas, I think, pushleftactionOneif I recall, rightisjustdonothing.
Andtheaction, too, ispushright.
Soyoucanseeinthesekindofthisclusterhere, it's like, thisisprettymuchalways a pushleftandlikesomeofthesethataren't pushedleftoractuallyprobablyjustthoseairstilljustleftoverfromtherandominitializationandwejusthaven't hitthatcombinationyetisprobablywhat's goingonthere.
Um, andthenhereagain, I can't imaginemanycircumstanceswhereyouwouldn't wanttobepushingonewayortheother.
I mean, there's probablysomelike, asyoucomeup, likemaybeonlywanttogosohighupthehillsorsomething, likewhenyou'reswingingbacktheotherway.
Butonethingyou'llnoteisitneveractuallytouchesanyoftheselikefullyrightpositionsbecauseofvictorywasthat I wanttosaylike 0.5, buttheactionspaceforthat X axiswentto 0.6 or 0.7 orsomethinglikethat, and I thinkthat's whythesenevergettouched, becausethethecarnevergoestothosepositions.
Soit's kindofcooltoseeitbecauseyoucan.
Youcanactuallyseewhat's happeninginthe, uhintheexampleonthegraph, whichisjustkindofcool, Um, andalsojustthefactthatitslowlygrowsoutand I thinkit's again.
I thinkit's slowlygrowingoutbecauseitlearnstopushitselffurtherandfurtherbybuildingmomentum.
Soimmediatelyafter I learnedquelearning, thefirstthing I wanttodoismakemyownenvironment.
I I didn't reallyintendThiodo a tutorialonthatjustbecause I didn't reallyseewhy.
Buteverybody's askingaboutit, andit's, like, soobvious, likethefirstthing I wanttodoisbuildmyownkeylearningenvironment.
So I guess I shouldn't besurprisedthatlotsofpeoplewanttoseeusmakeourown.
So, uh, that's what I dointhenexttutorial, andthebenefitthereisthatyoucanifit's yourenvironment, youcanmakecertaintweaksandchangesand, likeslowlyaddcomplexityorremove.
Or, youknowyouhavefullcontrolovertheenvironment.
Whereis, likewithmountaincar?
I can't reallymakethatenvironmenttoomuchmorecomplicated.
Um, I couldgowith a differentenvironment, maybe, orsomethinglikethat, butyetyouneverhave a lotofcontrol.
Ifyoucanmakeyourownenvironment, plusyouconstructtoe, understandhowyoucouldtakesomelikerealworldexamplesormaybe a gamethatyouwanttomakeonyourownorwhatever.