/Users/andrea/_magisterarbeit/korpus/clean/trainkorpus/1/file_125.html NN ----------------------------------------- : Parallel JJ Patterns NNS for IN Synchronization NN on IN Shared JJ Memory NP Multiprocessors NP Parallel JJ Patterns NNS for IN Synchronization NN on IN Shared JJ Memory NP Multiprocessors NP Paul NP E NP . SENT McKenney NP Sequent NN Computer NP Systems NP , , Inc NP . SENT Copyright NN c SYM 1995 CD , , Paul NP E NP . SENT McKenney NP All NP rights NNS reserved VVD Abstract JJ Synchronization NN primitives NNS allow VVP controlled VVN coordination NN of IN different JJ activities NNS on IN a DT multiprocessor NN system NN . SENT The DT only JJ reason NN to TO go VV to TO the DT trouble NN of IN parallelizing VVG a DT program NN is VBZ to TO gain VV performance NN , , so RB it PP is VBZ necessary JJ to TO understand VV the DT performance NN implications NNS of IN synchronization NN primitives NNS in IN addition NN to TO their PP$ correctness NN , , liveness NN , , and CC safety NN properties NNS . SENT This DT paper NN presents VVZ a DT set NN of IN patterns NNS for IN the DT use NN of IN a DT simple JJ set NN of IN synchronization NN primitives NNS to TO increase VV performance NN or CC reduce VV maintenance NN costs NNS of IN parallel JJ programs NNS running VVG on IN symmetric JJ shared VVN memory NN multiprocessors NNS . SENT Section NN sec NN . SENT example NN presents VVZ the DT example NN that WDT is VBZ used VVN throughout IN the DT paper NN to TO demonstrate VV use NN of IN the DT patterns NNS . SENT Section NN sec NN . SENT parallel NN describes VVZ the DT overall JJ parallelization NN pattern NN , , and CC the DT succeeding VVG sections NNS describe VVP subpatterns NNS of IN this DT pattern NN . SENT Example NN Algorithm NN A DT simple JJ hashed VVN lookup NN table NN is VBZ used VVN to TO illustrate VV the DT patterns NNS in IN this DT paper NN . SENT This DT example NN searches VVZ for IN a DT specified JJ key NN in IN a DT list NN of IN elements NNS and CC performs VVZ operations NNS on IN those DT elements NNS . SENT Both CC the DT individual JJ elements NNS and CC the DT list NN itself PP may MD require VV mutual JJ exclusion NN . SENT The DT data NN structure NN for IN a DT mono JJ processor NN implementation NN is VBZ shown VVN in IN Figure NN fig NN . SENT lkupstruct NN . SENT typedef NN struct NN locktab NN struct NN locktab NN t NN rc NN next JJ . SENT int NP rc NN key NN . SENT int NP rc NN data NNS . SENT locktab NN t NN . SENT Lookup NP Table NN Element NN Again RB , , this DT structure NN will MD be VB embellished VVN as IN needed VVN for IN each DT of IN the DT example NN uses VVZ with IN synchronization NN . SENT A DT search NN for IN the DT element NN with IN a DT given VVN key NN might MD be VB implemented VVN as IN shown VVN in IN Figure NN fig NN . SENT lkupsrch NN . SENT Header NN for IN list NN of IN locktab NN t's NNS . SENT locktab NN t NN looktab NN head NN LOOKTAB NP NHASH NP NULL NN . SENT define VV LOOKTAB NP HASH VV key JJ key JJ Return NN a DT pointer NN to TO the DT element NN of IN the DT table NN with IN the DT specified JJ key NN , , or CC return NN NULL NN if IN no DT such JJ element NN exists VVZ . SENT locktab NN t NN locktab NN search NN int NP key JJ locktab NN t NN p NN . SENT p NN locktab NN head NN LOOKTAB NP HASH NN key NN . SENT while IN p NN . SENT NULL NN if IN p NN rc NN key JJ key JJ return NN NULL NN . SENT if IN p NN rc NN key JJ key JJ return NN p NN . SENT p NN p NN rc NN next JJ . SENT return NN NULL NN . SENT Lookup NP Table NN Search NP Parallelization NP Problem NN Naive JJ approaches NNS to TO parallelization NN often RB result NN in IN inadequate JJ speedup NNS and CC excessive JJ contention NN , , while IN more RBR sophisticate VV approaches NNS often RB result VVP in IN good JJ speedup NNS but CC excessive JJ maintenance NN costs NNS . SENT Context NN An DT existing JJ sequential JJ program NN with IN inadequate JJ performance NN , , an DT existing JJ parallel JJ program NN with IN inadequate JJ speedup NNS , , or CC an DT existing JJ parallel JJ program NN with IN excessive JJ maintenance NN costs NNS . SENT Forces VVZ The DT forces NNS acting VVG on IN the DT performance NN of IN parallel JJ programs NNS are VBP speedup NNS , , contention NN , , overhead RB , , economics NNS , , and CC development NN and CC maintenance NN costs NNS . SENT Speedup NNS . SENT Getting VVG a DT program NN to TO run VV faster JJR is VBZ the DT only JJ reason NN to TO go VV to TO all DT of IN the DT time NN and CC trouble NN required VVN to TO parallelize VV it PP . SENT Speedup NNS is VBZ defined VVN to TO be VB the DT ratio NN of IN the DT time NN required VVN to TO run VV a DT sequential JJ version NN of IN the DT program NN to TO the DT time NN required VVN to TO run VV a DT parallel JJ version NN . SENT Contention NN . SENT If IN more JJR CPUs NNS are VBP applied VVN to TO a DT parallel JJ program NN than IN can MD be VB kept VVN busy JJ given VVN that IN program NN , , the DT excess JJ CPUs NN are VBP prevented VVN from IN doing VVG useful JJ work NN by IN contention NN . SENT Overhead RB . SENT A DT monoprocessor NN version NN of IN a DT given VVN parallel JJ program NN would MD not RB need VV synchronization NN primitives NNS . SENT Therefore RB , , any DT time NN consumed VVN by IN these DT primitives NNS is VBZ overhead NN that WDT does VVZ not RB contribute VV directly RB to TO the DT useful JJ work NN that IN the DT program NN is VBZ intended VVN to TO accomplish VV . SENT Note NN that IN the DT important JJ measure NN is VBZ usually RB the DT relationship NN between IN the DT synchronization NN overhead NN and CC the DT serial JJ overhead NN the DT greater JJR the DT serial JJ overhead NN of IN a DT given VVN critical JJ section NN , , the DT greater JJR the DT amount NN of IN synchronization NN overhead NN that WDT may MD be VB tolerated VVN . SENT Read VVN to TO Write VV Ratio NN . SENT A DT data NNS structure NN that WDT is VBZ mostly RB rarely RB updated VVN may MD be VB protected VVN with IN lower JJR overhead JJ synchronization NN primitives NNS than IN may MD a DT data NN structure NN with IN a DT high JJ update NN rate NN . SENT Economics NNS . SENT Budgetary JJ constraints NNS can MD limit VV the DT number NN of IN CPUs NP available JJ regardless RB of IN the DT potential JJ speedup NNS . SENT Development NN and CC Maintenance NN Costs NNS . SENT Budgetary JJ constraints NNS can MD limit VV the DT number NN and CC types NNS of IN modifications NNS made VVN to TO an DT existing JJ program NN a DT given VVN degree NN of IN speedup NNS is VBZ worth JJ only RB so RB much JJ time NN and CC trouble NN . SENT These DT forces NNS will MD act VV together RB to TO enforce VV a DT maximum JJ speedup NNS . SENT The DT first JJ three CD forces NNS are VBP deeply RB interrelated VVN , , so RB the DT remainder NN of IN this DT section NN analyzes VVZ these DT interrelationships NNS . SENT footnote NN A DT real JJ world NN parallel NN system NN will MD have VH many JJ additional JJ forces NNS acting VVG on IN it PP , , such JJ as IN data NN structure NN layout NN , , memory NN hierarchy NN latencies NNS , , and CC bandwidth NN limitations NNS . SENT Note NN that IN these DT forces NNS may MD also RB appear VV as IN part NN of IN the DT context NN . SENT For IN example NN , , economics NNS may MD act VV as IN a DT force NN cheaper JJR is VBZ better JJR or CC as IN part NN of IN the DT context NN the DT cost NN must MD not RB exceed VV 100 CD . SENT Appendix NN sec NN . SENT forcerelate NN presents VVZ mathematical JJ relationships NNS between IN these DT forces NNS . SENT An DT understanding NN of IN these DT relationships NNS can MD be VB very RB helpful JJ when WRB resolving VVG the DT forces NNS acting VVG on IN an DT existing JJ parallel JJ program NN . SENT The DT following NN gives VVZ a DT brief JJ summary NN of IN these DT relationships NNS . SENT The DT less JJR time NN a DT program NN spends VVZ in IN critical JJ sections NNS , , the DT better JJR the DT potential JJ speedup NNS . SENT See VV Figure NN fig NN . SENT speedup NNS . SENT The DT fraction NN of IN time NN that IN the DT program NN spends VVZ in IN a DT given VVN critical JJ section NN must MD be VB much RB less JJR than IN the DT reciprocal JJ of IN the DT number NN of IN CPUs NP for IN the DT actual JJ speedup NNS to TO approach VV the DT number NN of IN CPUs NP . SENT For IN example NN , , as RB shown VVN on IN Figure NN fig NN . SENT cpulim RB , , a DT program NN running VVG on IN 10 CD CPUs NP must MD spend VV much RB less JJR than IN one CD tenth NN of IN its PP$ time NN in IN the DT critical JJ section NN if IN it PP is VBZ to TO scale VV well RB . SENT Contention NN effects NNS will MD consume VV the DT excess JJ CPU NN and CC or CC wallclock JJ time NN should MD the DT actual JJ speedup NNS be VB less JJR than IN the DT number NN of IN available JJ CPUs NP . SENT The DT larger JJR the DT gap NN between IN the DT number NN of IN CPUs NP and CC the DT actual JJ speedup NNS , , the DT less RBR efficiently RB the DT CPUs NP will MD be VB used VVN . SENT Figure NN fig NN . SENT cpueff NP shows VVZ speedups NNS attainable JJ given VVN a DT desired VVN efficiency NN m NN . SENT The DT greater JJR the DT desired VVN efficiency NN , , the DT smaller JJR the DT acheivable JJ speedup NNS . SENT If IN the DT available JJ synchronization NN primitives NNS have VHP high JJ overhead NN compared VVN to TO the DT critical JJ sections NNS that IN they PP guard VVP , , the DT best JJS way NN to TO improve VV speedup NNS is VBZ to TO reduce VV the DT number NN of IN times NNS that IN the DT primitives NNS are VBP invoked VVN perhaps RB by IN fusing VVG critical JJ sections NNS , , using VVG data NN ownership NN , , or CC by IN moving VVG toward IN a DT more JJR coarse JJ grained VVN parallelism NN such JJ as IN code NN locking VVG . SENT See VV Regime NN 1 CD in IN Figure NN fig NN . SENT overhead RB . SENT If IN the DT critical JJ sections NNS have VHP high JJ overhead NN compared VVN to TO the DT primitives NNS guarding VVG them PP , , the DT best JJS way NN to TO improve VV speedup NNS is VBZ to TO increase VV parallelism NN by IN moving VVG to TO reader NN writer NN locking VVG , , partitioning VVG , , or CC data NNS ownership NN . SENT See VV Regime NN 3 CD in IN Figure NN fig NN . SENT overhead RB . SENT If IN the DT critical JJ sections NNS have VHP high JJ overhead NN compared VVN to TO the DT primitives NNS guarding VVG them PP and CC the DT data NNS structure NN being VBG guarded VVN is VBZ read VVN much RB more RBR often RB than IN modified VVN , , the DT best JJS way NN to TO increase VV parallelism NN is VBZ to TO move VV to TO reader NN writer NN locking VVG . SENT Solution NN Use VV the DT subpatterns NNS described VVN in IN this DT paper NN to TO re NN engineer NN the DT program NN so RB as IN to TO better RBR resolve VV the DT forces NNS acting VVG on IN it PP . SENT Resulting VVG Context NN A DT program NN that IN better JJR resolves NNS the DT forces NNS acting VVG on IN it PP . SENT This DT can MD mean VV a DT program NN with IN better JJR speedup NNS or CC a DT program NN with IN lower JJR maintenance NN costs NNS , , depending VVG on IN the DT context NN . SENT Design NN Rationale NN The DT relationships NNS between IN the DT subpatterns NNS is VBZ shown VVN in IN Figure NN fig NN . SENT relate VVP . SENT relate VVP Relationship NN Between IN Subpatterns NNS of IN Parallel JJ Pattern NN There EX are VBP many JJ other JJ parallel JJ programming NN performance NN patterns NNS . SENT footnote NN Operation NN combining VVG , , optimistic JJ locking VVG , , spin VV versus CC block VV , , wait VV free JJ synchronization NN , , and CC data NNS layout NN , , to TO name VV but CC a DT few JJ . SENT any DT attempt NN to TO capture VV all DT of IN them PP in IN a DT single JJ paper NN would MD be VB as RB foolhardy JJ as IN an DT attempt NN to TO capture VV all DT data NN structure NN related VVN patterns NNS in IN a DT single JJ paper NN . SENT The DT arcs NNS in IN Figure NN fig NN . SENT relate VV correspond VV to TO transformations NNS between IN the DT subpatterns NNS . SENT A DT Parallelize NP an DT existing JJ sequential JJ program NN . SENT This DT increases NNS speedup NNS , , but CC the DT increase NN will MD be VB limited VVN by IN contention NN and CC overhead NN . SENT In IN addition NN , , parallelization NN usually RB increases VVZ maintenance NN costs NNS . SENT B SYM Serialize VV a DT parallel JJ program NN . SENT This DT decreases VVZ maintenance NN costs NNS , , but CC also RB eliminates VVZ speedup NNS . SENT C SYM Insert NN explicit JJ locking VVG into IN a DT parallel JJ program NN that IN previously RB relied VVN on IN data NN ownership NN . SENT This DT increases NNS overhead NN , , thereby RB decreasing VVG speedup NNS . SENT This DT action NN might MD nevertheless RB be VB warranted VVN when WRB maintaining VVG a DT parallel JJ version NN of IN a DT serial JJ third JJ party NN product NN a DT new JJ version NN of IN the DT product NN might MD incorporate VV changes NNS that WDT invalidate VV the DT assumptions NNS on IN which WDT the DT data NNS ownership NN rested VVD . SENT The DT effect NN is VBZ to TO give VV up RP some DT speedup NNS in IN return NN for IN reduced JJ maintenance NN costs VVZ the DT more RBR heavily RB a DT third JJ party NN product NN is VBZ modified VVN , , the DT greater JJR the DT maintenance NN costs NNS . SENT D SYM Partition NN a DT data NN structure NN so IN that DT individual JJ CPUs NP own JJ portions NNS of IN it PP , , eliminating VVG the DT need NN for IN explicit JJ locking VVG and CC its PP$ overhead NN thereby RB increasing VVG speedup NNS . SENT E SYM Combine VV partitioned VVN critical JJ sections NNS . SENT This DT can MD increase VV speedup NNS on IN systems NNS with IN very RB high JJ synchronization NN primitive JJ overhead NN , , but CC will MD normally RB decrease VV speedup NNS . SENT The DT simpler JJR code NN locking VVG paradigm NN often RB results VVZ in IN decreased VVN maintenance NN costs NNS . SENT F SYM Partition NN code NN locked VVN critical JJ sections NNS . SENT This DT usually RB increases VVZ speedup NNS , , but CC may MD also RB increase VV maintenance NN costs NNS . SENT G NP Split NP up RB fused VVD critical JJ sections NNS . SENT If IN the DT overhead NN of IN the DT synchronization NN primitives NNS is VBZ low RB compared VVN to TO that DT of IN the DT code NN making VVG up RP the DT critical JJ section NN , , this DT will MD increase VV speedup NNS . SENT This DT action NN can MD be VB motivated VVN when WRB moving VVG the DT program NN to TO a DT machine NN with IN cheaper JJR synchronization NN primitives NNS . SENT H NP Fuse NP critical JJ sections NNS . SENT If IN the DT overhead NN of IN the DT synchronization NN primitives NNS is VBZ high RB compared VVN to TO that DT of IN the DT code NN making VVG up RP the DT critical JJ section NN , , this DT will MD increase VV speedup NNS . SENT I PP , , L NP , , M NP Use NP reader NN writer NN locking VVG . SENT If IN the DT critical JJ sections NNS have VHP high JJ overhead NN compared VVN to TO that DT of IN the DT synchronization NN primitives NNS , , if IN there EX is VBZ a DT high JJ read NN to TO write VV ratio NN , , and CC if IN the DT existing JJ program NN has VHZ high JJ contention NN , , this DT will MD increase VV speedup NNS . SENT J NP , , K NP , , N NP Stop NP using VVG reader NN writer NN locking VVG . SENT Since IN reader NN writer NN primitives NNS often RB have VHP higher JJR overhead NN than IN do VV normal JJ synchronization NN primitives NNS , , this DT action NN will MD result VV in IN greater JJR speedup NNS if IN the DT read NN to TO write VV ratio NN or CC the DT contention NN level NN is VBZ too RB low JJ for IN reader NN writer NN primitives NNS to TO be VB effective JJ . SENT Code NN Locking VVG Problem NN Obtaining VVG and CC maintaining VVG large JJ speedups NNS , , particularly RB from IN existing VVG third JJ party NN code NN , , can MD be VB time NN consuming NN and CC expensive JJ . SENT Context NN An DT existing JJ sequential JJ program NN whose WP$ critical JJ sections NNS are VBP very RB small JJ compared VVN to TO the DT parallel JJ portion NN of IN the DT code NN , , of IN which WDT only JJ modest JJ speedups NNS are VBP required VVN , , or CC whose WP$ development NN or CC maintenance NN cost NN constraints NNS rule VVP out RP more RBR effective JJ parallelization NN techniques NNS . SENT Forces NNS Speedup NNS Contention NN Overhead NN Economics NP Development NP and CC Maintenance NN Costs VVZ The DT required VVN speedup NNS may MD be VB modest JJ , , or CC the DT contention NN and CC overhead NN present NN may MD be VB negligible JJ , , thereby RB allowing VVG adequate JJ speedups NNS . SENT Economics NNS may MD prohibit VV applying VVG more JJR than IN a DT small JJ number NN of IN CPUs NP to TO a DT problem NN , , thereby RB rendering VVG high JJ speedups NNS irrelevant JJ . SENT Development NN and CC maintenance NN costs NNS of IN highly RB parallel JJ code NN might MD be VB prohibitive JJ . SENT Solution NN Use NN code NN locking VVG . SENT Resulting VVG Context NN A DT moderately RB parallel JJ program NN that WDT is VBZ very RB similar JJ to TO its PP$ sequential JJ counterpart NN , , resulting VVG in IN relatively RB low JJ development NN and CC maintenance NN costs NNS . SENT Design NN Rationale NN Code NP locking VVG is VBZ the DT simplest JJS locking VVG design NN . SENT It PP is VBZ especially RB easy JJ to TO retrofit VV an DT existing JJ program NN to TO use VV code NN locking VVG in IN order NN to TO run VV it PP on IN an DT multiprocessor NN . SENT However RB , , most JJS programs NNS of IN any DT size NN and CC complexity NN require VVP much RB of IN the DT execution NN to TO occur VV in IN critical JJ sections NNS , , which WDT in IN turn NN sharply RB limits VVZ the DT scaling NN as IN specified VVN by IN Amdahl's NP Law NP . SENT Therefore RB , , code VV locking VVG should MD be VB used VVN on IN programs NNS that WDT spend VVP only RB a DT small JJ fraction NN of IN their PP$ run NN time NN in IN critical JJ sections NNS or CC from IN which WDT only JJ modest JJ scaling NN is VBZ required VVN . SENT In IN these DT cases NNS , , the DT simplicity NN of IN code NN locking VVG will NN result NN in IN much RB lower JJR development NN costs NNS being VBG incurred VVN while IN parallelizing VVG . SENT The DT lookup NN table NN example NN could MD be VB used VVN verbatim RB , , but CC calls VVZ to TO locktab NN search NN function NN and CC subsequent JJ uses NNS of IN the DT return NN value NN would MD have VH to TO be VB surrounded VVN by IN a DT synchronization NN primitive JJ as IN shown VVN in IN Figure NN fig NN . SENT codelock NN . SENT Global JJ lock NN for IN locktab NN t NN manipulations NNS . SENT slock NN t NN locktab NN mutex NN . SENT . SENT . SENT . SENT Look VV up RP a DT locktab NN element NN and CC examine VV it PP . SENT S PP LOCK VVP locktab NN mutex NN . SENT p NN locktab NN search NN mykey NN . SENT insert NN code NN here RB to TO examine VV or CC update VV the DT element NN . SENT S PP UNLOCK VV locktab NN mutex NN . SENT Code NN Locking VVG Lookup NP Table NN Note NN that IN the DT lock NN must MD be VB held VVN across IN the DT use NN of IN the DT element NN as IN well RB as RB across IN the DT lookup NN , , particularly RB if IN the DT element NN can MD be VB deleted VVN . SENT This DT means VVZ that IN it PP is VBZ not RB useful JJ to TO bury VV the DT locking VVG in IN locktab NN search NN itself PP . SENT Nevertheless RB , , it PP is VBZ relatively RB easy JJ to TO create VV and CC maintain VV a DT code NN locking VVG version NN of IN a DT program NN . SENT No DT restructuring NN is VBZ needed VVN , , only RB the DT admittedly RB non JJ trivial JJ task NN of IN inserting VVG the DT locking VVG operations NNS . SENT This DT program NN would MD scale VV well RB if IN the DT table NN search NN and CC update NN was VBD a DT very RB small JJ part NN of IN the DT program's JJ execution NN time NN . SENT Partitioning VVG Problem NN Straightforward JJ parallelizations NNS such JJ as IN those DT using VVG code NN locking VVG typically RB exhibit NN very RB low JJ speedups NNS . SENT Context NN An DT existing JJ sequential JJ or CC parallel JJ program NN requiring VVG greater JJR speedup NNS than IN can MD be VB obtained VVN via IN code NN locking VVG , , and CC whose WP$ data NN structures NNS may MD be VB split VVN up RP into IN independent JJ partitions NNS , , so RB that IN different JJ partitions NNS may MD be VB operated VVN on IN in IN parallel NN . SENT Forces NNS Speedup NNS Contention NN Overhead NN Economics NP Development NP and CC Maintenance NN Costs NNS A DT greater JJR speedup NNS than IN can MD be VB obtained VVN via IN code NN locking VVG must NN be VB needed VVN badly RB enough RB to TO invest VV the DT development NN , , maintenance NN , , and CC machine NN resources NNS required VVN for IN more JJR aggressive JJ parallelization NN . SENT The DT program NN must MD also RB be VB partitionable JJ so RB as RB to TO allow VV partitioning VVG to TO be VB used VVN to TO reduce VV contention NN and CC overhead NN . SENT Solution NN Use NN partitioning VVG . SENT Resulting VVG Context NN A DT more RBR heavily RB parallel JJ program NN that WDT is VBZ less RBR similar JJ to TO its PP$ sequential JJ counterpart NN , , but CC which WDT exhibits VVZ much RB lower JJR contention NN and CC overhead NN and CC thus RB higher JJR speedups NNS . SENT Design NN Rationale NN Many JJ algorithms NNS and CC data NNS structures NNS may MD be VB split VVN into IN independent JJ parts NNS , , with IN each DT part NN having VHG its PP$ own JJ independent JJ critical JJ section NN . SENT Then RB the DT critical JJ sections NNS for IN each DT part NN could MD execute VV in IN parallel NN although IN only RB one CD instance NN of IN the DT critical JJ section NN for IN a DT given VVN part NN could MD be VB executing VVG at IN a DT given VVN time NN . SENT This DT partitioning VVG is VBZ useful JJ in IN Regime NN 3 CD of IN Figure NN fig NN . SENT overhead RB , , where WRB critical JJ section NN overhead NN must MD be VB reduced VVN . SENT Partitioning VVG reduces VVZ this DT overhead NN by IN distributing VVG the DT instances NNS of IN the DT overly RB large JJ critical JJ section NN into IN multiple JJ critical JJ sections NNS . SENT This DT reduces VVZ n NN c NN in IN Equation NN eq NP . SENT overhead RB , , thereby RB reducing VVG T NN c NN and CC increasing VVG the DT speedup NNS , , as RB can MD be VB seen VVN from IN Figures NNS fig NN . SENT speedup NNS , , fig NN . SENT cpulim NN , , and CC fig NN . SENT cpueff NP . SENT For IN example NN , , if IN there EX were VBD several JJ independent JJ lookup NN tables NNS in IN a DT program NN , , each DT could MD have VH its PP$ own JJ critical JJ section NN , , as RB shown VVN in IN Figure NN fig NN . SENT partition NN . SENT This DT figure NN assumes VVZ that DT locktab NN search NN has VHZ been VBN modified VVN to TO take VV an DT additional JJ pointer NN to TO the DT table NN to TO be VB searched VVD . SENT Structuring VVG the DT code NN in IN this DT manner NN allows VVZ searches NNS of IN different JJ tables NNS to TO proceed VV in IN parallel NN , , although IN searches NNS of IN a DT particular JJ table NN will MD still RB be VB serialized VVN . SENT Global JJ lock NN for IN locktab NN t NN manipulations NNS . SENT slock NN t NN my PP$ locktab NN mutex NN LOCKTAB NP NHASH NP . SENT . SENT . SENT . SENT Look VV up RP a DT locktab NN element NN and CC examine VV it PP . SENT S PP LOCK VVP my PP$ locktab NN mutex NN LOCKTAB NP HASH NN mykey NN . SENT p NN locktab NN search VVP my PP$ locktab NN , , mykey NP . SENT insert NN code NN here RB to TO examine VV or CC update VV the DT element NN . SENT S PP UNLOCK VV my PP$ locktab NN mutex NN LOCKTAB NP HASH NN mykey NN . SENT Partitioned VVN Lookup JJ Table NN Locating VVG the DT lock NN for IN each DT hash NN line NN in IN the DT hash NN line NN header NN data NNS structure NN , , as RB shown VVN in IN Figure NN fig NN . SENT datalock NN , , results NNS in IN a DT special JJ case NN of IN partitioning VVG known VVN as IN data NNS locking VVG . SENT This DT may MD be VB done VVN if IN the DT individual JJ elements NNS in IN a DT table NN are VBP independent JJ from IN one CD another DT , , and CC allows VVZ searches NNS of IN a DT single JJ table NN to TO proceed VV in IN parallel NN if IN the DT items NNS being VBG searched VVD for IN do VVP not RB collide VV after IN being VBG hashed VVN . SENT Type NN definition NN for IN locktab NN header NN . SENT All DT fields NNS are VBP protected VVN by IN lth NN mutex NN . SENT typedef NN struct NN locktab NN head NN locktab NN t NN lth NN head NN . SENT slock NN t NN lth NN mutex NN . SENT locktab NN head NN t NN . SENT Header NN for IN list NN of IN locktab NN t's NNS . SENT locktab NN head NN t NN looktab NN head NN LOCKTAB NP NHASH NP NULL NN . SENT define VV LOCKTAB NP HASH VV key JJ key JJ Return NN a DT pointer NN to TO the DT element NN of IN the DT table NN with IN the DT specified JJ key NN , , or CC return NN NULL NN if IN no DT such JJ element NN exists VVZ . SENT locktab NN t NN locktab NN search NN int NP key JJ locktab NN t NN p NN . SENT p NN locktab NN head NN LOCKTAB NP HASH NN key NN . SENT lth NN head NN . SENT while IN p NN . SENT NULL NN if IN p NN rc NN key JJ key JJ return NN NULL NN . SENT if IN p NN rc NN key JJ key JJ return NN p NN . SENT p NN p NN rc NN next JJ . SENT return NN NULL NN . SENT . SENT . SENT . SENT Look VV up RP a DT locktab NN element NN and CC examine VV it PP . SENT S PP LOCK VVP locktab NN head NN LOCKTAB NP HASH NN key NN . SENT lth NN mutex NN . SENT p NN locktab NN search NN mykey NN . SENT insert NN code NN here RB to TO examine VV or CC update VV the DT element NN . SENT S PP UNLOCK VV locktab NN head NN LOCKTAB NP HASH NN key NN . SENT lth NN mutex NN . SENT Data NNS Locking VVG These DT two CD techniques NNS can MD be VB combined VVN to TO allow VV parallel RB searching VVG within IN and CC between IN lookup NN tables NNS . SENT In IN this DT case NN , , data NNS locking VVG is VBZ not RB too RB much RB more RBR complex JJ than IN code NN locking VVG . SENT Programs NNS containing VVG data NNS structures NNS that WDT are VBP not RB fully RB independent JJ are VBP much RB more RBR challenging JJ . SENT such JJ programs NNS must MD be VB restructured VVN in IN order NN to TO get VV the DT needed VVN independence NN or CC must MD use VV complex JJ locking VVG models NNS that WDT allow VVP near IN independence NN to TO be VB exploited VVN . SENT However RB , , in IN the DT second JJ example NN , , once RB the DT transformation NN to TO data NNS locking VVG is VBZ completed VVN , , the DT speedup NNS may MD be VB increased VVN simply RB by IN increasing VVG the DT size NN of IN the DT hash NN table NN . SENT Data NP Ownership NP Problem NN Programs NNS with IN frequent JJ , , small JJ critical JJ section NN do VVP not RB get VV high JJ speedups NNS on IN machines NNS with IN high JJ synchronization NN overheads NNS . SENT Context NN An DT existing JJ sequential JJ or CC parallel JJ program NN that WDT can MD be VB partitioned VVN so RB that IN only RB one CD process NN accesses VVZ a DT given VVN set NN of IN data NNS at IN a DT time NN , , thereby RB greatly RB reducing VVG or CC eliminating VVG the DT need NN for IN synchronization NN . SENT Forces NNS Speedup NNS Contention NN Overhead NN Economics NP Development NP and CC Maintenance NN Costs VVZ A DT high JJ speedup NNS must MD be VB needed VVN badly RB enough RB to TO invest VV the DT resources NNS needed VVN to TO develop VV , , maintain VV , , and CC run VV a DT highly RB parallel JJ version NN . SENT The DT program NN must MD be VB perfectly RB partitionable JJ , , eliminating VVG contention NN an DT synchronization NN overhead NN . SENT Solution NN Use NN data NNS ownership NN . SENT Resulting VVG Context NN A DT more RBR heavily RB parallel JJ program NN that WDT is VBZ even RB less RBR similar JJ to TO its PP$ sequential JJ counterpart NN , , but CC which WDT exhibits VVZ much RB lower JJR contention NN and CC overhead NN and CC thus RB higher JJR speedups NNS . SENT Design NN Rationale NN The DT last JJ example NN of IN the DT previous JJ section NN splits VVZ a DT lookup NN table NN into IN multiple JJ independent JJ classes NNS of IN keys NNS , , where WRB the DT key JJ classes NNS are VBP defined VVN by IN a DT hash NN function NN . SENT If IN each DT key JJ class NN can MD be VB assigned VVN to TO a DT separate JJ process NN , , we PP have VHP a DT special JJ case NN of IN partitioning VVG known VVN as IN data NN ownership NN . SENT Data NP ownership NN is VBZ a DT very RB powerful JJ pattern NN , , as IN it PP can MD entirely RB eliminate VV synchronization NN overhead JJ d NN c SYM becomes VVZ zero CD in IN Equation NN eq NP . SENT overhead NN . SENT This DT is VBZ particularly RB useful JJ in IN cases NNS in IN Regime NN 1 CD in IN Figure NN fig NN . SENT overhead RB that RB have VH small JJ critical JJ sections NNS whose WP$ overhead NN is VBZ dominated VVN by IN synchronization NN overhead NN . SENT If IN the DT key JJ classes NNS are VBP well RB balanced JJ , , data NNS ownership NN can MD allow VV virtually RB unlimited JJ speedups NNS . SENT Partitioning VVG the DT lookup NN table NN example NN over IN separate JJ processes NNS would MD require VV a DT convention NN or CC mechanism NN to TO force VV lookups NNS for IN keys NNS of IN a DT given VVN class NN to TO be VB performed VVN by IN the DT proper JJ process NN . SENT Such PDT a DT convention NN or CC mechanism NN would MD be VB application NN and CC implementation NN specific NN , , and CC beyond IN the DT scope NN of IN this DT paper NN . SENT However RB , , a DT data NN ownership NN partitioning VVG for IN a DT parallel JJ memory NN allocator NN with IN partially RB independent JJ data NN structures NNS is VBZ described VVN by IN McKenney NP and CC Slingwine NP McKenney NP 93 CD . SENT Critical JJ Section NN Fusing VVG Problem NN Programs NNS with IN frequent JJ , , small JJ critical JJ section NN do VVP not RB get VV high JJ speedups NNS on IN machines NNS with IN high JJ synchronization NN overheads NNS . SENT Context NN An DT existing JJ program NN with IN many JJ small JJ critical JJ sections NNS so IN that DT synchronization NN overhead NN dominates VVZ , , but CC which WDT is VBZ not RB easily RB partitionable JJ . SENT Forces NNS Speedup NNS Contention NN Overhead NN Economics NP Development NP and CC Maintenance NN Costs NNS Additional JJ speedup NNS must MD be VB needed VVN badly RB enough RB to TO justify VV the DT cost NN of IN creating VVG , , maintaining VVG , , and CC running VVG a DT highly RB parallel JJ program NN . SENT Synchronization NN overhead NN must MD dominate VV , , and CC contention NN must MD be VB low JJ enough RB to TO allow VV increasing VVG the DT size NN of IN critical JJ sections NNS . SENT Solution NN Fuse NP critical JJ sections NNS . SENT Resulting VVG Context NN A DT program NN with IN fewer JJR but CC larger JJR critical JJ sections NNS , , thus RB less RBR subject JJ to TO synchronization NN overheads NNS . SENT Design NN Rationale NN A DT program NN in IN Regime NN 1 CD in IN Figure NN fig NN . SENT overhead NN can MD reduce VV T NN c NN by IN decreasing VVG n NN c NN . SENT If IN the DT overhead NN of IN the DT code NN between IN two CD critical JJ sections NNS is VBZ less JJR than IN the DT overhead NN of IN the DT synchronization NN primitives NNS , , such PDT a DT decrease NN can MD be VB obtained VVN by IN fusing VVG the DT two CD critical JJ sections NNS . SENT For IN example NN , , imagine VV a DT program NN containing VVG back RB to TO back VV searches NNS of IN a DT code NN locked VVN lookup NN table NN . SENT A DT straightforward JJ implementation NN would MD acquire VV the DT lock NN , , do VVP the DT first JJ search NN , , do VVP the DT first JJ update NN , , release VV the DT lock NN , , acquire VV the DT lock NN once RB more JJR , , do VVP the DT second JJ search NN , , do VVP the DT second JJ update NN , , and CC finally RB release VV the DT lock NN . SENT If IN this DT sequence NN of IN code NN was VBD dominated VVN by IN synchronization NN overhead NN , , eliminating VVG the DT first JJ release NN and CC second JJ acquisition NN as IN shown VVN in IN Figure NN fig NN . SENT fuse NN would MD increase VV speedup NNS . SENT Global JJ lock NN for IN locktab NN t NN manipulations NNS . SENT slock NN t NN locktab NN mutex NN . SENT . SENT . SENT . SENT Look VV up RP a DT locktab NN element NN and CC examine VV it PP . SENT S PP LOCK VVP locktab NN mutex NN . SENT p NN locktab NN search NN mykey NN . SENT insert NN code NN here RB to TO examine VV or CC update VV the DT element NN . SENT p NN locktab NN search NN myotherkey NN . SENT insert NN code NN here RB to TO examine VV or CC update VV the DT element NN . SENT S PP UNLOCK VV locktab NN mutex NN . SENT Critical JJ Section NN Fusing VVG Reader NP Writer NP Locking VVG Problem NN Programs NNS that WDT spend VVP most JJS of IN their PP$ time NN in IN critical JJ sections NNS get VVP very RB small JJ speedups NNS , , even RB if IN the DT critical JJ sections NNS do VVP not RB modify VV data NNS . SENT Context NN An DT existing JJ program NN with IN much JJ of IN the DT code NN contained VVN in IN critical JJ sections NNS , , but CC where WRB the DT read NN to TO write VV ratio NN is VBZ large JJ . SENT Forces NNS Speedup NNS Contention NN Overhead NN Read NP to TO Write VV Ratio NN Synchronization NN overhead NN must MD not RB dominate VV , , contention NN must MD be VB high JJ , , and CC read VVP to TO write VV ratio NN must MD be VB high JJ . SENT Low JJ synchronization NN overhead NN is VBZ especially RB important JJ , , as IN most JJS implementations NNS of IN reader NN writer NN locking VVG incur VV greater JJR synchronization NN overhead NN than IN does VVZ normal JJ locking VVG . SENT Specialized JJ forms NNS of IN reader NN writer NN locking VVG may NN be VB used VVN when WRB synchronization NN overhead RB dominates VVZ Andrews NPS 91 JJ Tay NP 87 CD . SENT Solution NN Use NN reader NN writer NN locking VVG . SENT Resulting VVG Context NN A DT program NN that WDT allows VVZ CPUs NN that WDT are VBP not RB modifying VVG data NNS to TO proceed VV in IN parallel NN , , thereby RB increasing VVG speedup NNS . SENT Design NN Rationale NN If IN synchronization NN overhead NN is VBZ negligible JJ i NP . SENT e SYM . SENT , , the DT program NN uses VVZ coarse JJ grained VVN parallelism NN , , and CC only RB f NN w NN fraction NN of IN the DT critical JJ sections NNS modify VV data NNS , , then RB allowing VVG multiple JJ readers NNS to TO proceed VV in IN parallel NN can MD greatly RB increase VV speedup NNS . SENT The DT lookup NN table NN example NN uses VVZ read VV side NN primitives NNS to TO search VV the DT table NN and CC write VV size NN primitives NNS to TO modify VV it PP . SENT Figure NN fig NN . SENT rwrdlock NN shows NNS locking VVG for IN search NN , , and CC Figure NN fig NN . SENT rwwrlock NN shows NNS locking VVG for IN update NN . SENT Since IN this DT example NN demonstrates VVZ reader NN writer NN locking VVG applied VVN to TO a DT code NN locked VVN program NN , , the DT locks NNS must MD surround VV the DT calls NNS to TO locktab NN search NN as RB well RB as IN the DT code NN that WDT examines VVZ or CC modifies VVZ the DT selected JJ element NN . SENT Reader NN writer NN locking VVG can NN easily RB be VB adapted VVN to TO partitioned VVN parallel JJ programs NNS as IN well RB . SENT Global JJ lock NN for IN locktab NN t NN manipulations NNS . SENT srwlock NN t NN locktab NN mutex NN . SENT . SENT . SENT . SENT Look VV up RP a DT locktab NN element NN and CC examine VV it PP . SENT S NP RDLOCK NP locktab NN mutex NN . SENT p NN locktab NN search NN mykey NN . SENT insert NN code NN here RB to TO examine VV the DT element NN . SENT S PP UNLOCK VV locktab NN mutex NN . SENT Read NP Side NP Locking VVG Global JJ lock NN for IN locktab NN t NN manipulations NNS . SENT srwlock NN t NN locktab NN mutex NN . SENT . SENT . SENT . SENT Look VV up RP a DT locktab NN element NN and CC examine VV it PP . SENT S NP WRLOCK NP locktab NN mutex NN . SENT p NN locktab NN search NN mykey NN . SENT insert NN code NN here RB to TO update VV the DT element NN . SENT S PP UNLOCK VV locktab NN mutex NN . SENT Write VV Side NP Locking VVG Acknowledgments NNS I PP owe VVP thanks NNS to TO Ward NP Cunningham NP and CC Steve NP Peterson NP for IN encouraging VVG me PP to TO set VV these DT ideas NNS down RB . SENT I PP am VBP indebted JJ to TO Dale NP Goebel NP for IN his PP$ consistent JJ support NN . SENT Relations NNS of IN Forces NPS The DT forces NNS acting VVG on IN the DT performance NN of IN parallel JJ programs NNS are VBP speedup NNS , , contention NN , , overhead RB , , economics NNS , , and CC development NN and CC maintenance NN costs NNS . SENT Of IN these DT , , speedup NNS , , contention NN , , overhead RB , , read VVP to TO write VV ratio NN , , and CC economics NNS have VHP strong JJ mathematical JJ interrelationships NNS . SENT Consider VV an DT idealized VVN parallel JJ program NN with IN a DT single JJ critical JJ section NN , , and CC where WRB all DT code NN outside IN the DT critical JJ section NN may MD be VB executed VVN with IN an DT arbitrarily RB fine JJ granularity NN . SENT footnote NN In IN real JJ life NN , , performance NN penalties NNS must MD be VB paid VVN for IN too RB fine JJ granularity NN , , resulting VVG in IN speedup NNS limitations NNS . SENT In IN addition NN , , data NN dependencies NNS can MD result VV in IN additional JJ limitations NNS . SENT This DT idealized VVN model NN nevertheless RB produces VVZ several JJ widely RB applicable JJ rules NNS of IN thumb NN . SENT Only RB one CD CPU NN may MD execute VV in IN the DT critical JJ section NN at IN a DT given VVN time NN , , and CC thus RB at IN most JJS one CD CPUs NP worth NN of IN throughput NN may MD be VB obtained VVN from IN that DT critical JJ section NN . SENT This DT effect NN is VBZ known VVN as IN Amdahl's NP Law NP , , and CC may MD be VB cast VVN into IN mathematical JJ form NN as RB follows VVZ . SENT S PP where WRB S NP is VBZ the DT maximum JJ speedup NNS obtainable JJ and CC T NN c NN is VBZ the DT fraction NN of IN total JJ time NN spent VVN in IN the DT critical JJ section NN , , including VVG the DT overhead NN of IN the DT mutual JJ exclusion NN primitive JJ . SENT This DT expression NN can MD be VB roughly RB adapted VVN for IN programs NNS containing VVG multiple JJ independent JJ critical JJ sections NNS by IN setting NN T NN c SYM equal JJ to TO the DT maximum NN of IN the DT times NNS spent VVN in IN the DT critical JJ sections NNS . SENT footnote NN An DT analogy NN to TO queuing VVG systems NNS can MD provide VV more JJR accurate JJ results NNS . SENT Contention NN is VBZ analogous JJ to TO in RB queue VV waiting VVG time NN , , overhead NN and CC critical JJ section NN execution NN time NN are VBP analogous JJ to TO service NN rate NN , , and CC attempts VVZ to TO enter VV the DT critical JJ section NN are VBP analogous JJ to TO arrival NN rate NN . SENT A DT plot NN of IN Equation NN eq NP . SENT speedup NNS is VBZ shown VVN in IN Figure NN fig NN . SENT speedup NNS . SENT Theoretical JJ Speedup NNS Limit NN The DT combined JJ forces NNS of IN contention NN and CC overhead NN will MD always RB force VV a DT program's JJ speedup NNS to TO be VB below IN the DT line NN in IN this DT plot NN . SENT a DT program NN that WDT spends VVZ 25 CD of IN its PP$ time NN in IN its PP$ critical JJ section NN cannot NN achieve VV a DT speedup NNS of IN greater JJR than IN four CD . SENT The DT next JJ force NN , , economics NNS , , manifests VVZ itself PP as IN a DT constraint NN on IN speedup NNS . SENT This DT paper NN looks VVZ at IN two CD simple JJ economic JJ constraints NNS . SENT 1 LS a DT budgetary JJ limit NN on IN the DT number NN of IN CPUs NP , , and CC 2 CD a DT cost NN effectiveness NN constraint NN that WDT requires VVZ that IN at IN least JJS some DT fraction NN of IN the DT last JJ CPU's NP capacity NN be VB used VVN for IN useful JJ work NN . SENT The DT first JJ constraint NN limits VVZ the DT number NN of IN available JJ CPUs NP n NN . SENT S NP e SYM 1 CD frac NN 1 CD frac NN 1 CD T NN c SYM n NN T NN c SYM This DT equation NN simply RB divides VVZ the DT parallelizable JJ fraction NN of IN the DT work NN 1 CD T NN c SYM , , assuming VVG negligible JJ synchronization NN overhead NN by IN the DT available JJ number NN of IN CPUs NP , , adds VVZ the DT serial JJ fraction NN and CC takes VVZ the DT reciprocal JJ to TO calculate VV the DT speedup NNS . SENT Rearranging VVG . SENT S NP e SYM 1 CD frac NN n NN 1 CD n NN 1 CD T NN c SYM This DT expression NN is VBZ plotted VVD on IN Figure NN fig NN . SENT cpulim NNS for IN n NN equal JJ to TO 10 CD , , 100 CD , , and CC 1000 CD . SENT Speedup NNS Given VVN Limited NP CPUs NP The DT straight JJ line NN on IN this DT plot NN is VBZ the DT theoretical JJ speedup NNS given VVN unlimited JJ numbers NNS of IN CPUs NP . SENT Contention NN and CC overhead RB force VV a DT program's JJ speedup NNS to TO be VB below IN this DT line NN , , while IN economics NNS forces VVZ it PP even RB further RBR below RB , , depending VVG on IN the DT number NN of IN CPUs NP available JJ . SENT The DT second JJ constraint NN places VVZ a DT lower JJR bound VVN on IN the DT marginal JJ speedup NNS obtained VVN by IN adding VVG an DT additional JJ CPU NN . SENT The DT marginal JJ speedup NNS is VBZ calculated VVN by IN subtracting VVG the DT speedup NNS attained VVN by IN n NN 1 CD CPUs NN from IN that DT attained VVN by IN n NN CPUs NP . SENT m NN frac NN n NN 1 CD n NN 1 CD T NN c SYM frac JJ n NN 1 CD 1 CD n NN 2 CD T NN c SYM Solving VVG this DT equation NN for IN n NN gives VVZ the DT maximum JJ number NN of IN CPUs NP that WDT may MD be VB added VVN while IN still RB beating VVG a DT specified JJ marginal JJ speedup NNS m NN . SENT n NN frac NN sqrt NN m NN 2 CD T NN c SYM 2 CD 4 CD m NN T NN c SYM 4 CD m NN 3 CD m NN T NN c SYM 2 CD m NN 2 CD m NN T NN c SYM Substituting VVG this DT CPU NN limit NN into IN Equation NN eq NP . SENT speedupe NN 1 CD and CC thus RB once RB again RB assuming VVG that DT synchronization NN overhead NN is VBZ negligible JJ . SENT S NP e SYM 2 CD frac NN sqrt NN m NN T NN c SYM 2 CD 4 CD T NN c SYM 4 CD 3 CD T NN c SYM sqrt NN m NN 2 CD sqrt NN m NN T NN c SYM sqrt NN m NN T NN c SYM 2 CD 4 CD T NN c SYM 4 CD T NN c SYM 2 CD sqrt NN m NN This DT equation NN is VBZ plotted VVD for IN marginal JJ speedups NNS of IN 0 CD , , 0 CD . SENT 5 CD , , 0 CD . SENT 7 CD , , and CC 0 CD . SENT 9 CD in IN Figure NN fig NN . SENT cpueff NP . SENT Speedup NNS Given VVN Marginal JJ Speedup NNS Constraint NN Again RB , , contention NN and CC overhead RB force VV a DT program's JJ speedup NNS to TO be VB below IN the DT speedup NNS 0 CD line NN , , while IN economics NNS forces VVZ it PP even RB further RBR below RB , , depending VVG on IN the DT number NN of IN CPUs NP available JJ . SENT The DT common JJ thread NN through IN all DT of IN the DT speedup NNS plots NNS is VBZ that DT minimizing VVG T NN c NN leads VVZ to TO high JJ speedups NNS . SENT The DT T NN c SYM term NN can MD be VB broken VVN down RP into IN critical JJ section NN and CC synchronization NN overhead NN components NNS as RB follows VVZ . SENT T NN c SYM n NN c NN d NN c SYM d NN p NN where WRB n NN c NN is VBZ the DT number NN of IN times NNS per IN unit NN time NN that IN the DT critical JJ section NN was VBD entered VVN , , d NN c SYM the DT critical JJ section NN overhead NN , , and CC d LS p NN the DT synchronization NN overhead NN . SENT Figure NN fig NN . SENT overhead NN shows VVZ a DT plot NN of IN T NN c NN versus CC d NN c NN for IN n NN c NN 1 CD and CC d NN p NN 0 CD . SENT 001 CD . SENT Critical JJ Section NN Overhead NN Regimes NNS This DT plot NN shows VVZ three CD regimes NNS . SENT the DT first JJ below IN 0 CD . SENT 0002 CD dominated VVN by IN the DT synchronization NN overhead NN d SYM p NN , , the DT second NN from IN 0 CD . SENT 0002 CD to TO 0 CD . SENT 003 CD where WRB both CC have VHP influence NN , , and CC the DT third JJ above IN 0 CD . SENT 003 CD dominated VVN by IN the DT critical JJ section NN overhead NN d SYM c NN . SENT Synchronization NN overheads NNS can MD be VB quite RB large JJ in IN comparison NN to TO critical JJ section NN overheads NNS . SENT For IN example NN , , the DT Stanford NP Flash NP project NN plans VVZ to TO produce VV a DT machine NN with IN d NN p NN equal JJ to TO about RB one CD to TO two CD microseconds NNS but CC with IN d NN c NN as IN low JJ as IN a DT few JJ nanoseconds NNS Heinrich NP 94 CD so IN that DT synchronization NN overhead NN can MD be VB over IN two CD orders NNS of IN magnitude NN greater JJR than IN critical JJ section NN overhead NN . SENT Critical JJ sections NNS guarding VVG simple JJ single JJ variable JJ updates NNS on IN the DT Flash NP therefore RB fall VVP on IN the DT extreme JJ left NN of IN Figure NN fig NN . SENT overhead RB . SENT maximum JJ speedups NNS are VBP determined VVN in IN this DT case NN by IN the DT overhead NN of IN the DT synchronization NN primitives NNS . SENT The DT high JJ cost NN of IN general JJ purpose NN synchronization NN operations NNS relative JJ to TO machine NN instructions NNS is VBZ likely JJ to TO persist VV for IN the DT foreseeable JJ future NN Hennessy NP 91 CD Stone NP 91 CD Note NN also RB that IN there EX is VBZ nothing NN stopping NN T NN c SYM from IN being VBG greater JJR than IN one CD , , in IN which WDT case NN the DT theoretical JJ speedup NNS with IN be VB less JJR than IN one CD . SENT This DT simply RB means VVZ that IN the DT parallelism NN present NN in IN the DT implementation NN is VBZ not RB sufficient JJ to TO overcome VV the DT synchronization NN overhead NN . SENT Many JJ projects NNS , , both CC in IN academia NN and CC industry NN , , have VHP experienced VVN this DT frustrating JJ situation NN . SENT Reader NN writer NN primitives NNS are VBP helpful JJ in IN cases NNS where WRB the DT overhead NN of IN the DT critical JJ section NN dominates VVZ that IN of IN the DT primitives NNS and CC where WRB the DT read NN to TO write VV ratio NN is VBZ high JJ . SENT The DT derivations NNS in IN this DT section NN will MD use VV the DT fraction NN of IN operations NNS that WDT require VVP write VV side NN locking VVG f NN w NN in IN place NN of IN the DT more RBR intuitive JJ read VV to TO write VV ratio NN . SENT The DT speedup NNS due JJ to TO use VV of IN reader NN writer NN locking VVG can NN be VB approximated VVN as RB follows VVZ . SENT S NP f SYM w NN 1 CD f SYM w NN min NN n NN , , frac NP 1 CD f SYM w NN f NN w NN where WRB n NN is VBZ the DT number NN of IN CPUs NP . SENT This DT approximation NN assumes VVZ that IN readers NNS and CC writers NNS alternate JJ access NN to TO the DT data NN structure NN . SENT one CD writer NN proceeds NNS through IN its PP$ critical JJ section NN , , then RB all DT readers NNS proceed VVP , , then RB the DT next JJ writer NN proceeds NNS , , and CC so RB on IN . SENT It PP also RB assumes VVZ that IN the DT overhead NN of IN the DT reader NN and CC writer NN critical JJ sections NNS are VBP constant JJ and CC identical JJ . SENT More RBR sophisticated JJ analyses NNS may MD be VB found VVN in IN the DT literature NN Reiman NP 91 CD . SENT A DT plot NN of IN this DT expression NN is VBZ shown VVN in IN Figure NN fig NN . SENT rwlock NN . SENT Reader NP Writer NP Locking VVG Speedup NNS As RB can MD be VB seen VVN from IN this DT figure NN , , reader NN writer NN locking VVG can NN achieve VV nearly RB linear JJ speedups NNS if IN the DT write VVP fraction NN is VBZ low JJ enough RB i NP . SENT e SYM . SENT , , if IN the DT read NN to TO write VV ratio NN is VBZ high JJ enough RB . SENT The DT following VVG sections NNS examine VVP ways NNS to TO increase VV speedups NNS by IN reducing VVG contention NN and CC overhead NN . SENT Although IN methods NNS of IN removing VVG economic JJ constraints NNS lie VVP beyond IN the DT scope NN of IN this DT paper NN , , reductions NNS in IN contention NN and CC overhead NN can MD result VV in IN more JJR effective JJ use NN of IN limited JJ resources NNS . SENT And CC 91 CD Gregory NP R NP . SENT Andrews NPS . SENT Paradigms NNS for IN process NN interaction NN in IN distributed VVN programs NNS . SENT ACM NP Computing NP Surveys NNS . SENT , , 1991 CD . SENT HJ NP 91 CD John NP L NP . SENT Hennessy NP and CC Norman NP P NN . SENT Jouppi NP . SENT Computer NN technology NN and CC architecture NN . SENT An DT evolving VVG interaction NN . SENT IEEE NP Computer NP , , pages NNS 18 CD 28 CD , , September NP 1991 CD . SENT HKO NP 94 CD Mark NP Heinrich NP , , Jeffrey NP Kuskin NP , , David NP Ofelt NP , , John NP Heinlein NP , , Joel NP Baxter NP , , Jaswinder NP Pal NP Singh NP , , Richard NP Simoni NP , , Kourosh NP Gharachorloo NP , , David NP Nakahira NP , , Mark NP Horowitz NP , , Anoop NP Gupta NP , , Mendel NP Rosenblum NP , , and CC John NP Hennessy NP . SENT The DT performance NN impact NN of IN flexibility NN in IN the DT stanford NP flash NN multiprocessor NN . SENT Proceedings NNS of IN the DT 6 CD th NN International NP Conference NP on IN Architectural JJ Support NN for IN Programming NN Languages NNS and CC Operating NN Systems NPS , , October NP 1994 CD . SENT MS NN 93 CD Paul NP E NP . SENT McKenney NP and CC Jack NP Slingwine NP . SENT Efficient JJ kernel NN memory NN allocation NN on IN shared VVN memory NN multiprocessors NNS . SENT In IN USENIX NP Conference NP Proceedings NNS , , Berkeley NP CA MD , , February NP 1993 CD . SENT RW NP 91 NP Martin NP I PP . SENT Reiman NP and CC Paul NP E NP . SENT Wright NP . SENT Performance NN analysis NN of IN concurrent JJ read VVN exclusive JJ write VVP . SENT ACM NP , , pages NNS 168 CD 177 CD , , February NP 1991 CD . SENT SC NP 91 CD Harold NP S NP . SENT Stone NP and CC John NP Cocke NP . SENT Computer NN architecture NN in IN the DT 1990 CD s PP . SENT IEEE NP Computer NP , , pages NNS 30 CD 38 CD , , September NP 1991 CD . SENT Tay NP 87 CD Y NN . SENT C LS . SENT Tay NP . SENT Locking VVG Performance NN in IN Centralized JJ Databases NNS . SENT Academic NP Press NP , , 1987 CD . SENT