Packt Machine Learning with Spark

Machine Learning with Spark

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2015

Production reference: 1170215

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78328-851-9

www.packtpub.com

Cover image by Akshay Paunikar (akshaypaunikar4@gmail.com)

Credits

Author: Nick Pentreath
Reviewers: Andrea Mostosi, Hao Ren, Krishna Sankar
Commissioning Editor: Rebecca Youe
Acquisition Editor: Rebecca Youe
Content Development Editor: Susmita Sabat
Technical Editors: Vivek Arora, Pankaj Kadam
Copy Editor: Karuna Narayanan
Project Coordinator: Milton Dsouza
Proofreaders: Simran Bhogal, Maria Gould, Ameesha Green, Paul Hindle
Indexer: Priya Sane
Graphics: Sheetal Aute, Abhinash Sahu
Production Coordinator: Nitesh Thakur
Cover Work: Nitesh Thakur

About the Author

Nick Pentreath has a background in financial markets, machine learning, and software development. He has worked at Goldman Sachs Group, Inc.; as a research scientist at the online ad targeting start-up Cognitive Match Limited, London; and led the Data Science and Analytics team at Mxit, Africa's largest social network.

He is a cofounder of Graphflow, a big data and machine learning company focused on user-centric recommendations and customer intelligence. He is passionate about combining commercial focus with machine learning and cutting-edge technology to build intelligent systems that learn from data to add value to the bottom line.

Nick is a member of the Apache Spark Project Management Committee.

Acknowledgments

Writing this book has been quite a rollercoaster ride over the past year, with many ups and downs, late nights, and working weekends. It has also been extremely rewarding to combine my passion for machine learning with my love of the Apache Spark project, and I hope to bring some of this out in this book.

I would like to thank the Packt Publishing team for all their assistance throughout the writing and editing process: Rebecca, Susmita, Sudhir, Amey, Neil, Vivek, Pankaj, and everyone who worked on the book.

Thanks also go to Debora Donato at StumbleUpon for assistance with data- and legal-related queries.

Writing a book like this can be a somewhat lonely process, so it is incredibly helpful to get the feedback of reviewers to understand whether one is headed in the right direction (and what course adjustments need to be made). I'm deeply grateful to Andrea Mostosi, Hao Ren, and Krishna Sankar for taking the time to provide such detailed and critical feedback.

I could not have gotten through this project without the unwavering support of all my family and friends, especially my wonderful wife, Tammy, who will be glad to have me back in the evenings and on weekends once again. Thank you all!
Finally, thanks to all of you reading this; I hope you find it useful.

About the Reviewers

Andrea Mostosi is a technology enthusiast. An innovation lover since he was a child, he started a professional job in 2003 and worked on several projects, playing almost every role in the computer science environment. He is currently the CTO at The Fool, a company that tries to make sense of web and social data. During his free time, he likes traveling, running, cooking, biking, and coding.

I would like to thank my geek friends: Simone M, Daniele V, Luca T, Luigi P, Michele N, Luca O, Luca B, Diego C, and Fabio B. They are the smartest people I know, and comparing myself with them has always pushed me to be better.

Hao Ren is a software developer who is passionate about Scala, distributed systems, machine learning, and Apache Spark. He was an exchange student at EPFL when he learned about Scala in 2012. He is currently working in Paris as a backend and data engineer for ClaraVista, a company that focuses on high-performance marketing. His work responsibility is to build a Spark-based platform for purchase prediction and a new recommender system.

Besides programming, he enjoys running, swimming, and playing basketball and badminton. You can learn more at his blog http://www.invkrh.me.

Krishna Sankar is a chief data scientist at BlackArrow, where he is focusing on enhancing user experience via inference, intelligence, and interfaces. Earlier stints include working as a principal architect and data scientist at Tata America International Corporation, director of data science at a bioinformatics start-up company, and as a distinguished engineer at Cisco Systems, Inc. He has spoken at various conferences about data science (http://goo.gl/9pyjmh), machine learning (http://goo.gl/ssem2r), and social media analysis (http://goo.gl/d9ypvq). He has also been a guest lecturer at the Naval Postgraduate School. He has written a few books on Java, wireless LAN security, Web 2.0, and now on Spark. His other passion is LEGO robotics. Earlier in April, he was at the St. Louis FLL World Competition as a robots design judge.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.packtpub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packtpub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.packtpub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

PacktLib

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.packtpub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface

Chapter 1: Getting Up and Running with Spark
    Installing and setting up Spark locally
    Spark clusters
    The Spark programming model
    SparkContext and SparkConf
    The Spark shell
    Resilient Distributed Datasets
    Creating RDDs
    Spark operations
    Caching RDDs
    Broadcast variables and accumulators
    The first step to a Spark program in Scala
    The first step to a Spark program in Java
    The first step to a Spark program in Python
    Getting Spark running on Amazon EC2
    Launching an EC2 Spark cluster
    Summary

Chapter 2: Designing a Machine Learning System
    Introducing MovieStream
    Business use cases for a machine learning system
    Personalization
    Targeted marketing and customer segmentation
    Predictive modeling and analytics
    Types of machine learning models
    The components of a data-driven machine learning system
    Data ingestion and storage
    Data cleansing and transformation