In math, we call these relations functions. It's our way of representing a set of patterns, a mapping,
a relationship between many variables.
No matter what machine learning model we use, no matter what dataset we use, the goal of machine learning is to optimize for an objective, and by doing so we are approximating a function.
We'll see that there exists a valley, the minimum. We'll use our error to help us compute the partial derivative with respect to each weight value we have, and this gives us our gradient.
The gradient represents the change in the error when the weights are changed by a very small value from their original value.
We use the gradient to update the values of our weights in a direction such that the error is minimized.
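To make that concrete, here's a minimal sketch of that update rule; the one-weight error function and the learning rate are just illustrative assumptions.

```python
# Gradient descent on a toy error surface error(w) = (w - 3)^2.
# The minimum (the bottom of the valley) sits at w = 3.

def gradient(w):
    # derivative of (w - 3)^2 with respect to w
    return 2 * (w - 3)

w = 0.0              # initial weight value
learning_rate = 0.1  # how big a step we take each update

for step in range(50):
    grad = gradient(w)            # how the error changes for a tiny change in w
    w = w - learning_rate * grad  # move in the direction that lowers the error

print(w)  # ends up very close to 3
```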
The first derivative tells us if the function is increasing or decreasing at a certain point, and the second derivative tells us if the first derivative is increasing or decreasing, which hints at its curvature.
First order methods provide us with a line that is tangential to a point on an error surface, and second order methods provide us with a quadratic surface that kisses the curvature of the error surface.
Haha, get a room, you two. The advantage, then, of second order methods is that they don't ignore the curvature of the error surface, and in terms of step-wise performance, they are better.
Let's look at a popular second order optimization technique called Newton's method, named after the dude who invented calculus.
Whose name was…
There are actually two versions of Newton's method. The first version is for finding the roots of a polynomial, all those points where it intersects the x-axis.
So if you threw a ball and recorded its trajectory, finding the root of the equation would tell you exactly what time it hit the ground.
Let's say we have a function f of x and some initial guessed solution. Newton's method says that we first find the slope of the tangent line at our guess point, then find the point at which the tangent line intersects the x-axis. That intersection becomes our new, better guess, and we repeat until we converge on the root.
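Here's a rough sketch of that root-finding loop; the ball-trajectory function and the starting guess below are made-up numbers just for illustration.

```python
# Newton's method for root finding: repeatedly jump to where the
# tangent line crosses the x-axis.

def f(t):
    # height of a thrown ball at time t (illustrative values)
    return 10 * t - 0.5 * 9.81 * t**2 + 2.0

def f_prime(t):
    # slope of the tangent line at time t
    return 10 - 9.81 * t

t = 3.0  # initial guessed solution
for _ in range(10):
    t = t - f(t) / f_prime(t)  # where the tangent line hits the x-axis

print(t)  # roughly 2.22 seconds, when the ball hits the ground
```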
At a high level, given a random starting location, we construct a quadratic approximation to the objective function that matches the first and second derivative values at that point. We then step to the minimum of that quadratic approximation and repeat the process.
OK, let's go over two cases of Newton's method for optimization to learn more: a 1D case and a 2D case.
In the first case we've got a 1-dimensional function. We can obtain a quadratic approximation at a given point of the function using what's called a Taylor series expansion,
neglecting terms of order three or higher.
A Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function's derivatives at a single point.
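Keeping only terms up to the second derivative, that expansion around a point x is f(x + dx) ≈ f(x) + f'(x) dx + ½ f''(x) dx², and minimizing that quadratic gives the Newton step dx = -f'(x) / f''(x). Here's a sketch of the 1D case, using an objective function I'm assuming just for illustration.

```python
# 1D Newton's method for optimization on f(x) = x^4 - 3x^3 + 2 (illustrative).

def f_prime(x):
    return 4 * x**3 - 9 * x**2   # first derivative

def f_double_prime(x):
    return 12 * x**2 - 18 * x    # second derivative (the curvature)

x = 4.0  # random starting location
for _ in range(20):
    # jump to the minimum of the quadratic Taylor approximation at x
    x = x - f_prime(x) / f_double_prime(x)

print(x)  # settles near 2.25, where the first derivative is zero
```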
So when should you use a second order method? First order methods are usually less computationally expensive and less time consuming per step, converging pretty fast on large datasets.
Here are the key points to remember: first order optimization techniques use the first derivative of a function to minimize it, second order optimization techniques use
the second derivative. The Jacobian is a matrix of first partial derivatives and the Hessian is a matrix of second partial derivatives.
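And here's what a Newton step looks like in more than one dimension, using the gradient and the Hessian; the 2D quadratic objective below is an assumed example, not from the video.

```python
# Multivariate Newton's method: w <- w - H^(-1) * gradient.
import numpy as np

def gradient(w):
    x, y = w
    # objective: f(x, y) = 2x^2 + xy + y^2 - 4x
    return np.array([4 * x + y - 4, x + 2 * y])

def hessian(w):
    # matrix of second partial derivatives of f
    return np.array([[4.0, 1.0],
                     [1.0, 2.0]])

w = np.array([5.0, -3.0])  # random starting location
for _ in range(5):
    w = w - np.linalg.solve(hessian(w), gradient(w))  # Newton update

print(w)  # lands on the minimum, roughly [1.14, -0.57], in a single step
```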
And Newton's method is a popular second order optimization technique that can sometimes outperform gradient descent. Last week's coding challenge winner is Alberto Garces.