/Users/andrea/_magisterarbeit/korpus/clean/testkorpus/1/file_17.html NN ----------------------------------------- : FLASH VV . SENT Integration NN of IN Distributed NP Shared VVD Memory NP and CC Message NP Passing VVG Stanford NP University NP Computer NP Systems NPS Laboratory NP Semiannual JJ Technical NP Report NP Advanced NP Research NP Projects NPS Agency NP For IN the DT period NN October NP 1994 CD March NP 1995 CD Contract NP Number NP . SENT DABT NP 63 CD 94 CD C NP 0054 CD AO NP B NP 837 CD Principal NN Investigator NN John NP L NP . SENT Hennessy NP jlh NP vsop NN . SENT stanford NP . SENT edu NN 415 CD 725 CD 3712 JJ 415 CD 725 CD 6949 CD fax NN Co NP principal JJ Investigator NN Mark NP A NP . SENT Horowitz NP horowitz NP chroma NN . SENT stanford NP . SENT edu NN 415 CD 725 CD 3707 JJ 415 CD 725 CD 6949 CD fax NN Table NN of IN Contents NNS Executive NP Summary NN Technical NP Summary NN Parallel NN Hardware NN Magic NP Hardware NN Status NN Architecture NP Simulation NP Tools NP Applications NP and CC Architectural JJ Studies NNS Parallel JJ Application NN Effort NN Latency NN , , Occupancy NN , , and CC Bandwidth NN Issues NNS in IN Distributed NP Shared VVD Memory NP MPs NP COMA NN Protocol NP on IN FLASH NN Software NP Based VVN Clustered JJ Distributed NP Virtual JJ Shared VVD Memory NP Shared VVD Cache NP Architecture NP Studies NPS Operating NP System NP and CC Compilers NNS FLASH VVP OS NN Hive NN Efficient JJ Logic NP Simulation NP Data NP and CC Computation NN Transformations NNS for IN Multiprocessors NP Verification NP State NP Enumeration NN New NP Methods NNS in IN Processor NN Verification NP Bibliography NN Executive NP Summary NN FLASH NN Design NN . SENT Over IN the DT last JJ few JJ months NNS , , the DT hardware NN design NN team NN has VHZ completed VVN the DT coding VVG of IN the DT MAGIC JJ RTL NP . 
SENT We PP have VHP also RB partitioned VVN the DT RTL NP into IN control NN and CC datapath NN logic NN so RB that IN the DT RTL NP is VBZ amenable JJ to TO logic NN synthesis NN and CC datapath NN layout NN . SENT The DT hardware NN efforts NNS over IN the DT next JJ few JJ months NNS will MD be VB concerned VVN mostly RB with IN synthesizing VVG the DT RTL NP model NN and CC addressing VVG any DT resulting VVG timing NN or CC layout NN difficulties NNS . SENT Of IN course NN , , we PP also RB will MD be VB focusing VVG heavily RB on IN design NN validation NN , , but CC that DT topic NN is VBZ detailed JJ in IN a DT separate JJ section NN of IN this DT report NN . SENT DSM NP Architecture NP Studies NPS . SENT We PP have VHP developed VVN a DT model NN for IN DSM NP multiprocessors NNS that WDT extends VVZ the DT logP NN ideas NNS to TO shared JJ memory NN architectures NNS . SENT Using VVG the DT model NN , , we PP show VVP the DT importance NN of IN occupancy NN , , the DT DSM NP counterpart NN to TO the DT message NN passing VVG overhead NN . SENT The DT results NNS show VVP that IN several JJ proposed VVN design NN points NNS have VHP poor JJ performance NN . SENT SPLASH NN 2 CD Benchmarks NNS . SENT We PP released VVD the DT SPLASH NN 2 CD benchmarks NNS , , which WDT includes VVZ both DT new JJ benchmarks NNS and CC improvements NNS over IN SPEC NN 1 CD . SENT Extensive JJ documentation NN , , including VVG scaling VVG instructions NNS , , is VBZ included VVN . SENT The DT benchmark JJ documentation NN , , and CC an DT extensive JJ set NN of IN benchmarks NNS are VBP available JJ by IN the DT Internet NN . SENT Architecture NP Simulation NP Tools NP . SENT We PP have VHP expanded VVN the DT functionality NN of IN FlashLite NP , , our PP$ system NN level NN simulator NN , , and CC have VHP continued VVN to TO develop VV multiple JJ protocols NNS to TO run VV on IN FLASH NN . 
SENT We PP have VHP used VVN FlashLite NP both CC as IN a DT tool NN for IN performance NN studies NNS 10 CD and CC as IN a DT hardware NN verification NN tool NN . SENT Our PP$ FlashLite NP Verilog NP environment NN has VHZ enabled VVN us PP to TO test VV basic JJ protocol NN operations NNS within IN the DT MAGIC JJ chip NN and CC has VHZ been VBN instrumental JJ in IN finding VVG most JJS of IN the DT early JJ bugs NNS in IN the DT design NN . SENT We PP are VBP currently RB working VVG on IN a DT lower JJR level NN interface NN that WDT will MD allow VV us PP to TO write VV more JJR controlled JJ directed VVN diagnostics NNS to TO further RBR verify VV the DT chip NN . SENT We PP are VBP also RB in IN the DT process NN of IN converting VVG the DT front JJ end NN reference NN generator NN for IN FlashLite NP to TO a DT processor NN emulator NN so RB that IN we PP can MD more RBR thoroughly RB model JJ data NN transfer NN throughout IN the DT processor NN memory NN system NN hierarchy NN . SENT This DT accuracy NN in IN data NN handling NN will MD further RBR increase VV the DT test NN coverage NN of IN our PP$ protocols NNS as RB well RB as IN the DT test NN coverage NN for IN the DT MAGIC JJ chip NN itself PP . SENT Parallel JJ Applications NP . SENT We PP have VHP continued VVN our PP$ efforts NNS in IN studying VVG the DT characteristics NNS of IN parallel JJ applications NNS on IN shared JJ address NN space NN multiprocessors NNS . SENT As IN part NN of IN this DT effort NN we PP have VHP released VVN the DT SPLASH NN 2 CD suite NN of IN parallel JJ applications NNS to TO facilitate VV the DT study NN of IN centralized VVN and CC distributed VVN shared JJ address NN space NN multiprocessors NNS . 
SENT The DT SPLASH NN 2 CD suite NN is VBZ the DT successor NN to TO the DT SPLASH NN suite NN , , and CC includes VVZ several JJ new JJ codes NNS in IN application NN domains NNS not RB previously RB explored VVN in IN SPLASH NN Shared VVN Cache NP Architecture NP Studies NPS . SENT We PP have VHP studied VVN the DT impact NN of IN clustering VVG at IN the DT second JJ level NN cache NN on IN the DT performance NN of IN small JJ scale NN bus NN based VVN multiprocessors NNS . SENT We PP show VVP that IN in IN an DT eight CD processor NN system NN , , the DT use NN of IN a DT large JJ shared JJ secondary JJ cache NN that WDT might MD be VB implemented VVN with IN MCM NP technology NN can MD significantly RB reduce VV bus NN contention NN and CC overall JJ execution NN time NN . SENT FLASH NN Operating NN System NP . SENT A DT first JJ prototype NN of IN Hive NN , , the DT operating VVG system NN for IN the DT FLASH NN machine NN , , has VHZ been VBN booted VVN and CC runs VVZ on IN a DT machine NN simulator NN . SENT This DT version NN contains VVZ most JJS of IN the DT mechanisms NNS for IN the DT internal JJ fault NN containment NN system NN . SENT We PP have VHP been VBN able JJ to TO do VV some DT performance NN studies NNS using VVG complex JJ workloads NNS as RB well RB as IN some DT fault NN injection NN studies NNS to TO evaluate VV its PP$ effectiveness NN . SENT We PP also RB have VHP an DT initial JJ implementation NN of IN the DT automatic JJ page NN migration NN and CC replication NN system NN that WDT is VBZ in IN the DT process NN of IN being VBG debugged VVN and CC tuned VVN . SENT Simulation NP Environment NP . SENT We PP continue VVP to TO enhance VV the DT SimOS NP simulation NN environment NN to TO make VV it PP a DT faster JJR , , more RBR accurate JJ model NN of IN the DT FLASH NN machine NN . 
SENT We PP have VHP added VVN to TO SimOS NP an DT accurate JJ CPU NN model NN of IN a DT superscalar NN dynamically RB scheduled VVN processor NN like IN the DT MIPS NP R NP 10000 CD , , which WDT the DT FLASH NN machine NN will MD use VV . SENT We PP have VHP also RB added VVN more RBR accurate JJ disk NN models NNS to TO the DT simulator NN as IN well RB as RB improved VVN its PP$ statistics NNS gathering VVG capabilities NNS . SENT Efficient JJ Logic NP Simulation NP . SENT We PP have VHP developed VVN a DT new JJ approach NN to TO event NN driven JJ simulation NN that WDT does VVZ not RB use VV a DT centralized JJ run NN time NN event NN queue NN , , yet RB is VBZ capable JJ of IN handling VVG arbitrary JJ models NNS , , including VVG those DT with IN unclocked JJ feedback NN and CC nonunit NN delay NN . SENT The DT elimination NN of IN the DT event NN queue NN significantly RB reduces VVZ run VVN time NN overhead NN , , resulting VVG in IN faster JJR simulation NN . SENT We PP have VHP implemented VVN our PP$ algorithm NN in IN a DT prototype NN Verilog NN simulator NN called VVN VeriSUIF NP . SENT Using VVG this DT simulator NN we PP demonstrate VVP improved JJ performance NN vs NP . SENT a DT commercial JJ simulator NN on IN a DT small JJ set NN of IN programs NNS . SENT Compiler NN Tools NP . SENT Effective JJ memory NN hierarchy NN utilization NN is VBZ critical JJ to TO the DT performance NN of IN modern JJ multiprocessor NN architectures NNS . SENT We PP have VHP developed VVN the DT first JJ compiler NN system NN that WDT fully RB automatically RB parallelizes VVZ sequential JJ programs NNS and CC changes VVZ the DT original JJ array NN layouts NNS to TO improve VV memory NN system NN performance NN . SENT We PP ran VVD our PP$ compiler NN on IN a DT set NN of IN application NN programs NNS and CC measured VVN their PP$ performance NN on IN the DT Stanford NP DASH NN multiprocessor NN . 
SENT Our PP$ results NNS show VVP that IN the DT compiler NN can MD effectively RB optimize VV parallelism NN in IN conjunction NN with IN memory NN subsystem NN performance NN . SENT Parallel JJ VLSI NP Simulation NP . SENT Development NN of IN a DT heterogeneous JJ multi NNS level NN mixed JJ mode NN simulation NN environment NN implemented VVN on IN multiprocessors NNS . SENT The DT project NN goal NN is VBZ to TO speedup NNS the DT simulation NN performance NN for IN large JJ system NN design NN tasks NNS through IN parallel JJ computing NN . SENT The DT research NN work NN includes VVZ distributed VVN scheduling NN , , parallel JJ execution NN of IN feedback NN networks NNS and CC integration NN of IN various JJ simulation NN paradigms NNS . SENT Formal JJ Verification NP . SENT We PP have VHP been VBN developing VVG formal JJ verification NN methods NNS for IN use NN in IN FLASH NN . SENT We PP have VHP successfully RB used VVN a DT new JJ verification NN method NN to TO formally RB verify VV the DT complete JJ pipeline NN for IN the DT protocol NN processor NN PP NP in IN FLASH NN . SENT We PP have VHP a DT prototype NN of IN an DT improved JJ verifier NN for IN synchronous JJ systems NNS , , where WRB the DT input NN language NN and CC implementation NN are VBP based VVN on IN C NP . SENT State NN Enumeration NN . SENT We PP continued VVD work NN on IN the DT automatic JJ generation NN of IN corner NN case NN tests NNS using VVG state NN enumeration NN , , and CC succeeded VVN in IN finding VVG several JJ non JJ trivial JJ bugs NNS in IN the DT Protocol NP Processor NN design NN 9 CD . SENT This DT work NN was VBD extended VVN to TO other JJ parts NNS of IN the DT MAGIC JJ chip NN , , starting VVG with IN the DT Data NP Buffer NN Allocator NN and CC moving VVG to TO the DT Inbox NN . SENT A DT new JJ state NN enumeration NN tool NN , , called VVD MPP NP , , has VHZ been VBN developed VVN to TO replace VV Synchronous JJ Murphi NP . 
SENT MPP NP is VBZ functionally RB equivalent JJ to TO Synchronous JJ Murphi NP but CC does VVZ not RB suffer VV from IN an DT exponential JJ increase NN in IN the DT size NN of IN the DT code NN it PP generates VVZ with IN increased JJ non JJ determinism NN in IN the DT model NN . SENT This DT will MD allow VV us PP to TO model VV larger JJR parts NNS of IN the DT design NN for IN automatic JJ corner NN case NN test NN generation NN . SENT Technical JJ Summary NN Parallel NN Hardware NN Magic NP Hardware NN Status NN The DT hardware NN design NN has VHZ progressed VVN significantly RB over IN the DT last JJ few JJ months NNS . SENT The DT focus NN of IN the DT design NN team NN has VHZ been VBN in IN two CD areas NNS . SENT 1 CD completion NN of IN the DT RTL NP description NN of IN the DT MAGIC JJ chip NN . SENT and CC 2 CD partitioning VVG the DT RTL NP in IN preparation NN for IN logic NN synthesis NN and CC datapath NN layout NN . SENT Task NP 1 CD involved VVD completing VVG the DT RTL NP code NN such JJ that IN all DT functional JJ features NNS of IN the DT chip NN are VBP implemented VVN and CC , , at IN least JJS at IN a DT rudimentary JJ level NN , , tested VVN . SENT Though IN the DT actual JJ amount NN of IN code NN that WDT had VHD to TO be VB written VVN to TO complete VV this DT task NN was VBD not RB large JJ , , the DT goal NN of IN having VHG an DT RTL NP description NN that WDT is VBZ functionally RB complete JJ required VVN us PP to TO resolve VV a DT number NN of IN outstanding JJ design NN alternatives NNS , , and CC to TO coordinate VV with IN the DT operating VVG system NN team NN to TO ensure VV that IN the DT final JJ MAGIC JJ feature NN set NN would MD provide VV efficient JJ support NN for IN the DT operating NN system NN . 
SENT Because IN the DT MAGIC JJ chip NN contains VVZ an DT embedded VVN general JJ purpose NN processor NN , , a DT number NN of IN low JJ level NN operating VVG system NN functions NNS are VBP actually RB being VBG implemented VVN on IN MAGIC JJ rather RB than IN on IN the DT compute NN processor NN . SENT Though IN the DT flexibility NN of IN the DT MAGIC JJ chip NN allows VVZ the DT exact JJ implementation NN of IN these DT functions NNS to TO be VB modified VVN even RB after IN the DT design NN is VBZ frozen JJ , , efficient JJ low JJ level NN communication NN between IN the DT MAGIC JJ chip NN and CC the DT OS NN running VVG on IN the DT compute NN processor NN requires VVZ hardware NN support NN . SENT Along RB with IN implementing VVG all PDT the DT functionality NN in IN the DT RTL NP , , the DT design NN team NN also RB partitioned VVD the DT RTL NP so RB that IN it PP will MD be VB amenable JJ to TO logic NN synthesis NN . SENT As RB with IN almost RB all DT complex JJ chips NNS , , MAGIC NN will MD make VV heavy JJ use NN of IN logic NN synthesis NN to TO translate VV the DT RTL NP description NN into IN actual JJ gates NNS . SENT Logic NN synthesis NN will MD be VB used VVN mostly RB for IN the DT control NN logic NN . SENT to TO achieve VV higher JJR performance NN and CC density NN , , layout NN of IN the DT datapath NN logic NN will MD make VV use NN of IN specialized JJ datapath NN layout NN and CC compilation NN tools NNS instead RB of IN a DT traditional JJ logic NN synthesis NN program NN . SENT Basically RB , , then RB , , the DT goal NN of IN the DT RTL NP partitioning VVG effort NN was VBD to TO separate VV the DT RTL NP into IN control NN logic NN meant VVN to TO be VB synthesized VVN and CC datapath NN logic NN meant VVN for IN manual JJ layout NN or CC processing NN by IN datapath NN tools NNS . 
SENT In IN tandem NN with IN the DT partitioning VVG effort NN , , we PP also RB began VVD the DT actual JJ synthesis NN process NN , , taking VVG a DT number NN of IN the DT larger JJR control NN logic NN sections NNS of IN the DT chip NN and CC running VVG them PP through IN the DT Synopsys NP logic NN synthesis NN program NN . SENT The DT results NNS of IN these DT synthesis NN runs VVZ allow VV us PP to TO gauge VV how WRB efficiently RB the DT synthesis NN program NN is VBZ able JJ to TO implement VV the DT control NN logic NN , , both CC in IN terms NNS of IN delay NN and CC area NN . SENT Because IN the DT target NN clock NN rate NN for IN the DT MAGIC JJ chip NN is VBZ 100 CD MHz NN , , effective JJ use NN of IN the DT synthesis NN tools NNS will MD be VB necessary JJ if IN the DT target NN clock NN rate NN is VBZ to TO be VB met VVN . SENT In IN many JJ cases NNS , , RTL NP modifications NNS will MD be VB required VVN to TO improve VV the DT output NN of IN the DT synthesis NN tools NNS . SENT Since IN the DT RTL NP is VBZ now RB complete JJ and CC fully RB partitioned VVN , , the DT focus NN of IN the DT hardware NN design NN team NN over IN the DT next JJ few JJ months NNS will MD shift VV from IN implementation NN of IN the DT RTL NP to TO synthesis NN , , timing NN , , and CC other JJ issues NNS related VVN to TO the DT translation NN of IN the DT RTL NP model NN into IN a DT gate NN level NN description NN suitable JJ for IN fabrication NN . SENT As IN noted VVN , , much RB of IN this DT effort NN will MD be VB driven VVN by IN the DT results NNS of IN the DT synthesis NN process NN as IN critical JJ logic NN paths NNS and CC other JJ timing NN or CC layout NN issues NNS are VBP identified VVN , , the DT RTL NP will MD need VV to TO be VB altered VVN to TO alleviate VV the DT problem NN . 
SENT Though IN we PP do VVP not RB expect VV significant JJ difficulties NNS in IN this DT area NN , , the DT process NN is VBZ nevertheless RB time NN consuming NN . SENT Architecture NP Simulation NP Tools NP We PP have VHP increased VVN the DT quality NN of IN many JJ of IN our PP$ simulation NN tools NNS . SENT Our PP$ main JJ simulation NN tool NN is VBZ FlashLite NP , , which WDT is VBZ the DT system NN level NN simulator NN written VVN in IN C NP and CC C NP that IN models NNS each DT component NN on IN the DT FLASH NN node NN as RB well RB as IN the DT internals NNS of IN the DT MAGIC JJ chip NN itself PP . SENT Included VVN in IN the DT simulation NN is VBZ the DT Protocol NP Processor NN portion NN of IN the DT MAGIC JJ chip NN which WDT runs VVZ the DT actual JJ protocol NN code NN we PP will MD run VV on IN the DT FLASH NN prototype NN . SENT The DT base NN cache NN coherence NN protocol NN code NN has VHZ been VBN expanded VVN to TO include VV support NN for IN cache NN coherent JJ I NP O NP , , code NN to TO boot NN the DT Protocol NP Processor NN , , code NN for IN error NN handling NN and CC software NN partitioning VVG of IN the DT machine NN into IN cells NNS . SENT FlashLite NP has VHZ also RB been VBN partitioned VVN to TO better RBR facilitate VV the DT simultaneous JJ development NN of IN multiple JJ protocols NNS . SENT The DT development NN of IN an DT alternate JJ cache NN coherence NN COMA NN protocol NN is VBZ finished VVN , , and CC is VBZ being VBG performance NN tuned VVN . SENT In IN addition NN , , message NN passing NN protocols NNS are VBP expanding VVG from IN the DT original JJ RDMA NP implementation NN . SENT A DT simple JJ memory NN copy NN protocol NN is VBZ now RB operational JJ and CC will MD be VB used VVN in IN our PP$ operating NN system NN development NN . 
SENT Besides IN the DT ability NN to TO run VV multiprocessor NN applications NNS through IN FlashLite NP to TO test VV the DT protocols NNS and CC analyze VV machine NN performance NN , , we PP also RB have VHP the DT ability NN to TO run VV an DT N NP processor NN simulation NN with IN N NP 1 CD FlashLite NP nodes NNS and CC one CD Verilog NP node NN . SENT That DT is VBZ , , on IN one CD of IN the DT nodes NNS we PP replace VVP the DT FlashLite NP model NN of IN the DT MAGIC JJ chip NN with IN the DT actual JJ Verilog NP model NN of IN the DT MAGIC JJ chip NN . SENT We PP call VVP this DT simulation NN regime NN FlashLite NP Verilog NP . SENT This DT proves VVZ to TO be VB a DT convenient JJ and CC useful JJ way NN to TO generate VV test NN vectors NNS for IN the DT part NN and CC to TO ensure VV that DT basic JJ protocol NN operations NNS are VBP handled VVN properly RB . SENT With IN the DT addition NN of IN I PP O NN modeling NN in IN FlashLite NP , , we PP can MD now RB test VV the DT I NN O NN subsystem NN via IN FlashLite NP Verilog NP as RB well RB . SENT Currently RB , , we PP are VBP developing VVG a DT verification NN environment NN that WDT gives VVZ a DT diagnostic JJ writer NN lower JJR level NN access NN to TO the DT pins NNS of IN MAGIC NN to TO facilitate VV directed VVN tests NNS . SENT It PP is VBZ these DT diagnostics NNS which WDT will MD form VV the DT suite NN of IN regression NN tests NNS that IN we PP run VVP through IN the DT part NN . SENT As IN an DT execution NN driven VVN memory NN system NN simulator NN , , FlashLite NP accepts VVZ a DT reference NN stream NN from IN some DT reference NN generator NN . SENT Our PP$ original JJ reference NN generator NN , , Tango NP Lite NP , , annotated VVD an DT executable's JJ load NN and CC store NN instructions NNS with IN calls NNS into IN FlashLite NP which WDT then RB modelled VVD those DT loads NNS and CC stores NNS appropriately RB . 
SENT This DT form NN of IN reference NN generation NN has VHZ the DT limitation NN that IN it PP is VBZ extremely RB difficult JJ to TO simulate VV data NNS handling VVG since IN library NN calls NNS and CC system NN calls NNS are VBP not RB annotated VVN . SENT For IN this DT reason NN , , we PP have VHP developed VVN another DT reference NN generator NN , , Mipsy NP , , which WDT emulates VVZ each DT instruction NN of IN the DT main JJ processor NN , , including VVG all DT library NN calls VVZ . SENT With IN Mipsy NP , , we PP can MD simulate VVP proper JJ data NNS handling VVG throughout IN the DT memory NN system NN . SENT This DT is VBZ critical JJ for IN verifying VVG both CC the DT cache NN coherence NN protocol NN , , and CC when WRB using VVG FlashLite NP Verilog NP , , verifying VVG the DT MAGIC JJ chip NN design VV itself PP . SENT We PP are VBP adding VVG uncached JJ operations NNS , , and CC prefetching VVG support NN to TO Mipsy NP as RB well RB as IN detailed JJ statistics NNS similar JJ to TO the DT ones NNS available JJ under IN Tango NP Lite NP . SENT Applications NNS and CC Architectural JJ Studies NNS Parallel JJ Application NN Effort NN Recently RB , , we PP have VHP continued VVN our PP$ efforts NNS in IN studying VVG the DT characteristics NNS of IN parallel JJ applications NNS on IN shared JJ address NN space NN multiprocessors NNS . SENT As IN part NN of IN this DT effort NN see VVP 19 CD for IN details NNS , , we PP have VHP released VVN the DT SPLASH NN 2 CD suite NN of IN parallel JJ applications NNS to TO facilitate VV the DT study NN of IN centralized VVN and CC distributed VVN shared JJ address NN space NN multiprocessors NNS . SENT The DT SPLASH NN 2 CD suite NN is VBZ the DT successor NN to TO the DT SPLASH NN suite NN , , and CC includes VVZ several JJ new JJ codes NNS in IN application NN domains NNS not RB previously RB explored VVN in IN SPLASH NN . 
SENT In IN addition NN , , several JJ of IN the DT SPLASH NN codes NNS have VHP been VBN improved VVN to TO use VV better JJR algorithms NNS and CC data NNS structures NNS , , and CC to TO incorporate VV an DT understanding NN of IN the DT underlying JJ architecture NN to TO enhance VV performance NN . SENT We PP have VHP quantitatively RB characterized VVN the DT SPLASH NN 2 CD programs NNS in IN terms NNS of IN fundamental JJ properties NNS and CC architectural JJ interactions NNS that WDT are VBP important JJ to TO understand VV them PP well RB . SENT The DT properties NNS we PP have VHP studied VVN include VVP the DT computational JJ load NN balance NN , , communication NN to TO computation NN ratio NN , , sizes NNS and CC scaling VVG of IN the DT important JJ working NN sets NNS , , sharing VVG behavior NN , , and CC spatial JJ locality NN . SENT We PP have VHP also RB addressed VVN methodological JJ considerations NNS for IN pruning VVG the DT domain NN of IN application NN and CC machine NN parameters NNS for IN architectural JJ studies NNS . SENT Architectural JJ evaluations NNS often RB have VHP a DT huge JJ space NN of IN application NN and CC machine NN parameters NNS to TO consider VV , , and CC many JJ of IN these DT parameters NNS can MD have VH substantial JJ impact NN on IN the DT results NNS of IN a DT given VVN study NN . SENT Since IN most JJS studies NNS that WDT evaluate VVP architectural JJ ideas NNS use VVP software NN simulation NN which WDT is VBZ typically RB very RB slow JJ , , it PP is VBZ important JJ to TO identify VV and CC avoid VV unrealistic JJ combinations NNS of IN problem NN and CC machine NN parameters NNS , , and CC to TO identify VV the DT realistic JJ ones NNS so RB that IN the DT rest NN of IN the DT space NN can MD be VB pruned VVN . 
SENT We PP find VVP that IN an DT understanding NN of IN the DT working VVG set NN characteristics NNS of IN an DT application NN , , coupled VVN with IN an DT understanding NN of IN the DT growth NN rates NNS of IN key JJ characteristics NNS such JJ as IN the DT communication NN to TO computation NN ratio NN is VBZ crucial JJ to TO selecting VVG important JJ and CC meaningful JJ points NNS in IN the DT experiment NN space NN . SENT Latency NN , , Occupancy NN , , and CC Bandwidth NN Issues NNS in IN Distributed NP Shared VVD Memory NP MPs NP Designers NNS of IN distributed VVN shared JJ memory NN DSM NP multiprocessors NNS are VBP moving VVG toward IN the DT use NN of IN commodity NN parts NNS , , not RB only RB in IN the DT processor NN and CC memory NN subsystem NN but CC also RB in IN the DT communication NN architecture NN . SENT While IN the DT desire NN to TO use VV commodity NN parts NNS and CC not RB perturb VV the DT underlying JJ uniprocessor NN node NN can MD compromise VV the DT efficiency NN of IN the DT communication NN architecture NN , , the DT impact NN on IN the DT end NN performance NN of IN applications NNS is VBZ unclear JJ . SENT We PP studied VVD this DT see VV 10 CD for IN details NNS performance NN impact NN through IN detailed JJ simulation NN and CC analytical JJ modeling NN , , using VVG a DT range NN of IN important JJ applications NNS and CC computational JJ kernels NNS . SENT To TO try VV and CC make VV some DT general JJ claims NNS about IN DSMs NNS , , we PP characterized VVD the DT communication NN architectures NNS of IN DSM NP machines NNS by IN four CD parameters NNS , , similar JJ to TO those DT in IN the DT logP NN model NN , , rather RB than IN studying VVG one CD specific JJ machine NN in IN detail NN . 
SENT The DT four CD parameters NNS are VBP latency NN , , occupancy NN of IN the DT communication NN controller NN in IN this DT model NN , , gap NN node NN to TO network NN bandwidth NN , , and CC finally RB the DT number NN of IN processors NNS . SENT Conventional JJ wisdom NN is VBZ that DT latency NN is VBZ the DT dominant JJ performance NN bottleneck NN in IN DSM NP machines NNS for IN scientific JJ applications NNS . SENT We PP showed VVD that IN for IN 64 CD processors NNS , , controller NN occupancy NN also RB has VHZ a DT substantial JJ impact NN on IN application NN performance NN , , even RB in IN the DT range NN of IN realistic JJ occupancies NNS that WDT are VBP being VBG proposed VVN for IN cache NN coherent JJ DSM NP machines NNS . SENT Of IN the DT two CD contributions NNS of IN occupancy NN to TO performance NN degradation NN the DT latency NN it PP adds VVZ and CC the DT contention NN it PP induces VVZ it PP is VBZ the DT contention NN component NN that WDT dominates VVZ the DT performance NN regardless RB of IN network NN latency NN . SENT As IN expected VVN , , techniques NNS to TO reduce VV the DT impact NN of IN latency NN make VVP controller NN occupancy NN a DT greater JJR bottleneck NN . SENT What WP is VBZ surprising JJ , , however RB , , is VBZ that IN the DT performance NN impact NN of IN occupancy NN is VBZ substantial JJ even RB for IN highly RB tuned VVN applications NNS and CC even RB in IN the DT absence NN of IN latency NN hiding NN techniques NNS . SENT We PP also RB showed VVD that IN the DT applications NNS we PP studied VVD are VBP not RB limited VVN by IN the DT bandwidth NN in IN recent JJ and CC upcoming JJ machines NNS . SENT Scaling VVG the DT problem NN size NN is VBZ often RB used VVN as IN a DT technique NN to TO overcome VV limitations NNS in IN communication NN latency NN and CC bandwidth NN . 
SENT We PP showed VVD that IN in IN many JJ structured JJ computations NNS occupancy NN induced VVD contention NN is VBZ not RB alleviated VVN by IN increasing VVG problem NN size NN , , and CC that IN there EX are VBP important JJ classes NNS of IN applications NNS for IN which WDT the DT performance NN lost VVN by IN using VVG higher JJR latency NN networks NNS or CC higher JJR occupancy NN controllers NNS cannot NN be VB regained VVN easily RB , , if IN at IN all DT , , by IN scaling VVG the DT problem NN size NN . SENT COMA NN Protocol NP on IN FLASH NN A DT key JJ attribute NN of IN FLASH NN is VBZ that IN it PP will MD allow VV experimentation NN with IN multiple JJ protocols NNS . SENT Over IN the DT last JJ six CD months NNS , , we PP have VHP continued VVN development NN of IN the DT COMA NN Cache NP Only JJ Memory NP Architecture NP protocol NN for IN the DT FLASH NN multiprocessor NN . SENT The DT first JJ phase NN of IN the DT work NN was VBD focused VVN toward IN understanding VVG the DT requirements NNS a DT COMA NN protocol NN places NNS on IN the DT FLASH NN design NN , , specifically RB , , on IN the DT custom NN node NN controller NN MAGIC NN . SENT The DT next JJ phase NN of IN the DT work NN was VBD implementation NN without IN respect NN to TO physical JJ constraints NNS such JJ as IN input NN queue NN sizes NNS and CC numbers NNS of IN available JJ buffers NNS both CC of IN which WDT were VBD assumed VVN to TO be VB infinite JJ . SENT Completion NN of IN these DT phases NNS took VVD roughly RB six CD months NNS . SENT After IN this DT ideal JJ protocol NN was VBD debugged VVN , , implementing VVG the DT remaining JJ functionality NN with IN respect NN to TO finite JJ queue NN depths NNS and CC buffer NN management NN began VVD . SENT This DT phase NN took VVD roughly RB six CD months NNS , , and CC resulted VVD in IN a DT functional JJ , , albeit IN unoptimized JJ protocol NN . 
SENT In IN fleshing VVG out RP this DT portion NN of IN the DT design NN , , we PP were VBD able JJ to TO evaluate VV the DT MAGIC JJ chip NN and CC provide VV feedback NN to TO the DT hardware NN team NN concerning VVG performance NN and CC correctness NN problems NNS . SENT One CD example NN concerns NNS resource NN allocation NN . SENT Before IN our PP$ work NN , , certain JJ buffering VVG elements NNS were VBD allocated VVN in IN such PDT a DT way NN that IN one CD of IN the DT possible JJ request NN queues NNS could MD be VB repeatedly RB denied VVN a DT buffer NN . SENT The DT result NN is VBZ starvation NN of IN that DT unit NN . SENT Because IN the DT COMA NN protocol NN is VBZ more JJR buffer NN intensive JJ than IN the DT base NN protocol NN , , it PP exposed VVD this DT problem NN . SENT The DT problem NN has VHZ now RB been VBN solved VVN using VVG a DT simple JJ partitioning VVG of IN the DT buffer NN elements NNS . SENT A DT baseline JJ performance NN comparison NN of IN the DT COMA NN protocol NN versus IN the DT base NN protocol NN was VBD then RB undertaken VVN . SENT Preliminary JJ results NNS indicate VVP that DT COMA NN can MD provide VV performance NN gains NNS of IN up RB to TO 50 CD if IN the DT application NN reads VVZ large JJ data NNS sets NNS that WDT do VVP not RB fit VV in IN the DT caches NNS , , but CC can MD also RB degrade VV performance NN by IN up RB to TO 30 CD for IN applications NNS that WDT involve VVP large JJ amounts NNS of IN writing NN . SENT However RB , , there EX is VBZ much RB room NN for IN low JJ level NN optimizations NNS in IN the DT COMA NN protocol NN . SENT We PP are VBP currently RB in IN the DT process NN of IN optimizing VVG the DT COMA NN protocol NN so RB that IN we PP can MD perform VV a DT more RBR meaningful JJ comparison NN to TO the DT base NN protocol NN . 
SENT In IN addition NN , , another DT goal NN of IN this DT work NN is VBZ to TO run VV COMA NN on IN workloads NNS previously RB not RB studied VVN , , for IN example NN , , operating VVG systems NNS . SENT To TO that DT end NN , , we PP are VBP also RB modifying VVG the DT COMA NN protocol NN so RB that IN it PP can MD interface NN with IN a DT detailed JJ operating NN system NN simulator NN , , SimOS NP , , developed VVN at IN Stanford NP . SENT We PP expect VVP these DT modifications NNS and CC optimizations NNS to TO be VB done VVN over IN the DT next JJ few JJ months NNS , , and CC then RB will MD undertake VV a DT more RBR thorough JJ evaluation NN of IN COMA NN versus IN the DT base NN for IN both DT scientific JJ workloads NNS and CC operating VVG system NN workloads NNS . SENT After IN that DT , , we PP will MD evaluate VV whether IN or CC not RB further RBR architectural JJ changes NNS to TO MAGIC NN can MD improve VV performance NN . SENT Software NN Based VVN Clustered VVN Distributed NP Virtual JJ Shared VVD Memory NP We PP are VBP collaborating VVG with IN Silicon NP Graphics NPS to TO evaluate VV software NN based VVN distributed VVD virtual JJ shared VVN memory NN over IN fast JJ networks NNS . SENT The DT topic NN of IN virtual JJ shared JJ memory NN is VBZ not RB new JJ , , however RB research NN to TO date NN has VHZ focused VVN more JJR on IN networks NNS of IN workstations NNS . SENT By IN using VVG n NN way NN multiprocessors NNS instead RB of IN workstations NNS as IN the DT underlying JJ nodes NNS we PP can MD take VV advantage NN of IN the DT improved JJ communication NN to TO computation NN ratio NN between IN the DT nodes NNS , , and CC amortize VV cost NN of IN the DT fast JJ network NN link NN . 
SENT Existing JJ bus NN based VVN shared JJ memory NN multiprocessors NNS scale NN to TO tens NNS of IN processors NNS and CC next JJ generation NN HW NP technologies NNS , , such JJ as IN FLASH NN , , will MD scale VV to TO 100 CD s PP of IN processors NNS . SENT Clustered VVN DVM NP introduces VVZ a DT new JJ scaling NN principle NN which WDT has VHZ the DT potential NN to TO increase VV the DT number NN of IN processors NNS in IN each DT generation NN of IN machines NNS using VVG SW NP while IN leveraging VVG the DT state NN of IN the DT art NN in IN HW NP . SENT The DT resulting VVG machine NN uses VVZ both DT HW NP and CC SW NP coherency NN and CC maintains VVZ the DT shared VVN memory NN programming NN paradigm NN of IN the DT underlying JJ multiprocessors NNS . SENT HW NP is VBZ used VVN within IN the DT node NN and CC SW NP between IN them PP . SENT The DT TLB NP is VBZ used VVN to TO detect VV coherency NN violations NNS for IN SW NP coherency NN . SENT We PP are VBP prototyping VVG a DT Distributed NP Virtual JJ Shared JJ Memory NP machine NN using VVG SGI NP Challenge NP machines NNS and CC a DT high JJ speed NN switch NN based VVN HIPPI NP network NN . SENT The DT prototype NN is VBZ leveraging VVG the DT work NN of IN the DT Stanford NP FLASH NN HW NP development NN effort NN by IN using VVG the DT FLASH NN protocol NN handlers NNS state NN machines NNS . SENT Because IN FLASH NN executes VVZ SW NP handlers NNS on IN a DT relatively RB general JJ purpose NN processor NN MAGIC NN , , the DT protocols NNS are VBP well RB suited VVN for IN software NN implementation NN . SENT The DT prototype NN code NN is VBZ being VBG implemented VVN both CC at IN the DT user NN level NN and CC at IN the DT kernel NN level NN SGI NP IRIX NP based VVN . SENT User NN and CC kernel NN implementations NNS will MD be VB able JJ to TO exist VV on IN the DT same JJ network NN , , facilitating VVG debugging VVG . 
SENT A DT kernel NN level NN implementation NN is VBZ necessary JJ to TO provide VV fast RB interrupts VVZ and CC user NN level NN delivery NN of IN data NN pages NNS , , which WDT are VBP the DT underlying JJ coherency NN objects NNS , , directly RB to TO user NN space NN . SENT The DT current JJ status NN of IN the DT project NN is VBZ as RB follows VVZ . SENT We PP have VHP implemented VVN the DT FLASH NN DELAYED VVD mode NN handlers NNS and CC are VBP starting VVG to TO implement VV the DT EAGER JJ mode NN handlers NNS . SENT The DT code NN has VHZ been VBN tested VVN over IN slower JJR networks NNS ethernet NP , , FDDI NP and CC will MD soon RB be VB tested VVN on IN HIPPI NP . SENT Experiments NNS will MD then RB begin VV to TO characterize VV the DT application NN space NN and CC understand VV the DT performance NN . SENT Eventually RB more RBR novel JJ protocol NN optimizations NNS , , such JJ as IN lazy JJ release NN and CC possibly RB , , entry NN consistency NN , , will MD be VB tested VVN . SENT Silicon NP Graphics NPS is VBZ also RB supporting VVG this DT work NN by IN providing VVG computer NN time NN for IN development NN , , a DT testbed NN for IN experiments NNS , , and CC the DT efforts NNS of IN an DT IRIX NP kernel NN SW NP engineer NN . SENT Shared VVN Cache NP Architecture NP Studies NPS Small NP scale NN , , shared VVN memory NN multiprocessors NNS represent VVP an DT important JJ class NN of IN multiprocessor NN design NN due JJ to TO their PP$ increasing VVG popularity NN in IN both CC the DT workstation NN and CC server NN markets NNS . SENT Typically RB , , these DT multiprocessors NNS are VBP comprised VVN of IN two CD to TO eight CD high JJ performance NN processing NN nodes NNS connected VVN together RB , , and CC to TO main JJ memory NN , , using VVG a DT shared VVN global JJ bus NN . 
SENT Parallel JJ applications NNS can MD place VV heavy JJ demands NNS on IN the DT bus NN system NN due JJ to TO capacity NN and CC communication NN misses VVZ in IN the DT L NP 2 CD caches NNS , , yet RB , , bus NN performance NN has VHZ failed VVN to TO scale VV at IN the DT same JJ rate NN as IN processor NN performance NN due JJ to TO the DT inherent JJ limitations NNS of IN the DT bus NN topology NN . SENT To TO address VV the DT bus NN performance NN bottleneck NN , , we PP have VHP investigated VVN the DT performance NN improvements NNS that WDT are VBP possible RB using VVG the DT plentiful JJ low JJ latency NN interconnections NNS available JJ using VVG multi NNS chip NN module NN MCM NP technology NN in IN a DT shared VVN secondary JJ cache NN architecture NN . SENT Our PP$ results NNS show VVP that IN clustering VVG at IN the DT secondary JJ cache NN provides VVZ much RB of IN the DT same JJ sorts NNS of IN benefits NNS that IN result NN from IN sharing VVG at IN the DT primary JJ cache NN without IN the DT degradation NN in IN primary JJ cache NN access NN time NN and CC bandwidth NN . SENT This DT degradation NN becomes VVZ more RBR of IN an DT issue NN in IN a DT multiprogramming NN workload NN and CC applications NNS without IN much JJ sharing NN , , since IN there EX are VBP no DT benefits NNS from IN clustering VVG to TO counteract VV the DT degradation NN . SENT We PP also RB find VVP that IN for IN parallel JJ scientific JJ applications NNS with IN high JJ levels NNS of IN communication NN , , clustering VVG can MD dramatically RB reduce VV contention NN for IN the DT shared VVN global JJ bus NN while IN not RB increasing VVG contention NN at IN the DT shared VVN secondary JJ caches NNS appreciably RB . SENT This DT has VHZ the DT effect NN of IN more RBR evenly RB distributing VVG resource NN contention NN throughout IN the DT system NN , , and CC can MD result VV in IN large JJ performance NN gains NNS . 
SENT Operating VVG System NP and CC Compilers NNS FLASH VVP OS NN Hive VV The DT work NN on IN the DT operating VVG system NN for IN the DT FLASH NN machine NN has VHZ advanced VVN significantly RB over IN the DT last JJ six CD months NNS . SENT By IN the DT end NN of IN the DT reporting VVG period NN the DT operating NN system NN , , Hive NN , , has VHZ booted VVN and CC runs VVZ complex JJ workloads NNS within IN our PP$ simulation NN environment NN . SENT Using VVG the DT SimOS NP simulation NN environment NN , , we PP have VHP been VBN measuring VVG and CC evaluating VVG the DT performance NN and CC correctness NN of IN Hive's NP failure NN containment NN mechanisms NNS . SENT The DT description NN of IN this DT mechanism NN and CC the DT results NNS of IN our PP$ experience NN to TO date NN can MD be VB found VVN in IN the DT paper NN Hive NN . SENT Fault NN Containment NN For IN Shared JJ Memory NP Multiprocessors NP 14 CD which WDT has VHZ been VBN submitted VVN to TO SOSP NP . SENT Development NN work NN on IN Hive NN during IN the DT reporting NN period NN included VVD . SENT extending VVG the DT virtual JJ memory NN system NN to TO support VV sharing NN of IN memory NN among IN kernels NNS , , extending VVG the DT file NN system NN to TO support VV locating VVG and CC accessing VVG file NN data NNS cached VVN in IN a DT remote JJ kernel NN , , extending VVG the DT process NN fork VV mechanism NN to TO fork VV a DT child NN process NN on IN a DT remote JJ kernel NN , , extending VVG the DT process NN signal NN and CC process NN group NN mechanisms NNS to TO manage VV distributed VVN sets NNS of IN processes NNS , , and CC implementing VVG a DT failure NN recovery NN mechanism NN that WDT reestablishes VVZ a DT consistent JJ system NN state NN when WRB one NN or CC more JJR of IN the DT kernels NNS fails VVZ . SENT These DT changes NNS represent VVP much JJ new JJ code NN and CC heavy JJ modifications NNS of IN the DT existing JJ code NN . 
SENT Much RB of IN our PP$ time NN has VHZ been VBN spent VVN and CC will MD continue VV to TO be VB spent VVN on IN the DT debugging VVG , , performance NN tuning VVG , , and CC functional JJ enhancement NN of IN this DT code NN . SENT A DT second JJ focus NN of IN the DT operating VVG system NN work NN on IN Hive NN has VHZ been VBN on IN mechanisms NNS and CC policies NNS for IN reducing VVG memory NN latency NN of IN a DT CC NP NUMA NP machine NN like IN FLASH NN . SENT For IN the DT last JJ six CD months NNS , , we PP have VHP focused VVN on IN using VVG automatic JJ page NN replication NN and CC migration NN to TO move VV memory NN pages NNS closer RBR to TO the DT CPUs NP that WDT are VBP accessing VVG them PP . SENT Using VVG SimOS NP , , we PP studied VVD the DT benefits NNS of IN automatic JJ page NN replication NN and CC migration NN on IN CC NP NUMA NP and CC CC NP NOW RB compute VV servers NNS . SENT We PP studied VVD several JJ realistic JJ compute NN server NN workloads NNS including VVG a DT program NN development NN environment NN , , engineering NN simulations NNS , , a DT commercial JJ database NN Sybase NP , , and CC a DT multiprogrammed JJ parallel JJ graphics NNS and CC scientific JJ workload NN . SENT Results NNS of IN this DT study NN are VBP the DT focus NN of IN the DT paper NN , , OS NN Support NN for IN Improving VVG Data NP Locality NN on IN CC NP NUMA NP Compute VV Servers NNS , , 16 CD which WDT has VHZ been VBN submitted VVN to TO SOSP NP . SENT We PP found VVD that DT automatic JJ page NN migration NN and CC replication NN can MD improve VV performance NN by IN 15 CD to TO 35 CD on IN CC NP NUMA NP machines NNS and CC by IN 15 CD to TO 50 CD on IN CC NP NOWs NNS for IN many JJ workloads NNS . SENT We PP studied VVD various JJ metrics NNS for IN migrating VVG and CC replicating VVG pages NNS and CC found VVD that DT sampling NN cache NN misses VVZ does VVZ very RB well RB in IN approximating VVG cache NN misses VVZ . 
SENT However RB TLB NP misses VVZ do VV not RB consistently RB approximate JJ cache NN misses VVZ . SENT We PP also RB found VVD the DT cost NN of IN migrating VVG and CC replicating VVG pages NNS is VBZ very RB sensitive JJ to TO the DT OS NN implementation NN , , and CC efficient JJ data NN structures NNS and CC algorithms NNS need VVP to TO be VB studied VVN further RBR . SENT As IN part NN of IN page NN migration NN replication NN study NN , , we PP have VHP developed VVN an DT initial JJ implementation NN of IN the DT Hive NN virtual JJ memory NN system NN that WDT supports VVZ page NN migration NN and CC replication NN . SENT This DT work NN required VVD redoing VVG much RB of IN the DT locking VVG structure NN of IN the DT existing JJ virtual JJ memory NN to TO support VV this DT new JJ functionality NN . SENT Finally RB , , the DT SimOS NP simulation NN environment NN was VBD advanced VVN in IN two CD areas NNS . SENT First RB , , it PP was VBD heavily RB used VVN by IN the DT above JJ studies NNS which WDT resulted VVD in IN many JJ of IN the DT remaining JJ bugs NNS being VBG flushed VVN out RP and CC some DT new JJ functionality NN to TO enhance VV its PP$ usability NN . SENT Much RB of IN new JJ functionality NN involves VVZ better JJR ways NNS to TO keep VV and CC generate VV the DT statistics NNS collected VVN from IN the DT simulation NN . SENT In IN particular JJ , , a DT system NN of IN annotations NNS that WDT allowed VVD the DT user NN of IN the DT simulation NN system NN to TO specify VV what WP to TO study VV was VBD built VVN . SENT Using VVG annotations NNS , , we PP have VHP been VBN able JJ to TO study VV the DT behavior NN of IN the DT operating VVG system NN and CC application NN programs NNS in IN detail NN . SENT The DT second JJ push NN in IN the DT simulation NN environment NN has VHZ been VBN to TO increase VV its PP$ accuracy NN . 
SENT We PP have VHP included VVN detailed JJ simulation NN models NNS of IN advanced JJ microprocessors NNS . SENT These DT CPU NN models NNS include VVP features NNS such JJ as IN multiple JJ instruction NN issue NN , , out RB of IN order NN issue NN , , and CC non JJ blocking VVG caches NNS . SENT Many JJ of IN these DT features NNS are VBP needed VVN to TO accurately RB model VV the DT MIPS NP R NP 10000 CD being VBG used VVN in IN FLASH NN . SENT We PP also RB added VVD more JJR detailed JJ models NNS of IN the DT memory NN systems NNS as RB well RB as IN some DT of IN the DT I NN O NN devices NNS . SENT Efficient JJ Logic NP Simulation NP Modern NP digital JJ system NN design NN relies VVZ heavily RB on IN simulation NN to TO reduce VV the DT number NN of IN design NN errors NNS and CC to TO improve VV system NN efficiency NN . SENT In IN large JJ system NN designs NNS , , so RB much JJ time NN is VBZ spent VVN in IN simulation NN that IN it PP has VHZ become VVN a DT design NN bottleneck NN . SENT Event NN driven VVN simulation NN and CC levelized JJ compiled VVN simulation NN are VBP two CD well RB known VVN simulation NN techniques NNS that WDT are VBP currently RB used VVN in IN digital JJ system NN design NN . SENT Levelized JJ compiled VVN code NN logic NN simulators NNS have VHP the DT potential NN to TO provide VV much RB higher JJR simulation NN performance NN than IN event NN driven VVN simulators NNS because IN they PP eliminate VVP much RB of IN the DT run VVN time NN overhead NN associated VVN with IN ordering VVG and CC propagating VVG events NNS . SENT The DT main JJ disadvantage NN of IN levelized JJ compiled VVN simulation NN techniques NNS is VBZ that IN they PP are VBP not RB general JJ . 
SENT We PP have VHP devised VVN a DT general JJ method NN compiling VVG event NN driven VVN models NNS called VVD static JJ simulation NN that WDT combines VVZ the DT generality NN of IN event NN driven VVN simulations NNS and CC the DT efficiency NN of IN the DT levelized JJ simulation NN approach NN . SENT Like IN event NN driven VVN simulation NN , , our PP$ technique NN applies VVZ to TO all DT general JJ models NNS , , including VVG both DT synchronous JJ and CC asynchronous JJ designs NNS . SENT The DT only JJ restriction NN is VBZ that IN any DT specified VVN delays NNS in IN the DT simulation NN must MD be VB known VVN constants NNS at IN compile VV time NN . SENT In IN our PP$ method NN , , we PP represent VVP the DT event NN driven VVN behavior NN with IN an DT event NN graph NN , , whose WP$ vertices NN represent VVP events NNS in IN the DT simulation NN and CC whose WP$ edges NNS represent VVP the DT causal JJ relationships NNS between IN the DT events NNS . SENT We PP then RB use VVP the DT general JJ technique NN of IN partial JJ evaluation NN to TO schedule VV the DT events NNS as RB well RB as IN possible JJ using VVG statically RB available JJ information NN . SENT Specifically RB , , the DT compiler NN tries VVZ to TO approximate VV the DT dynamic JJ simulation NN process NN by IN keeping VVG track NN of IN all PDT the DT available JJ static JJ information NN that WDT affects VVZ the DT contents NNS of IN the DT run VVN time NN event NN queue NN in IN a DT dynamic JJ simulation NN . SENT To TO test VV our PP$ algorithm NN , , we PP have VHP implemented VVN a DT prototype NN simulator NN , , called VVD VeriSUIF NP , , using VVG the DT SUIF NP Stanford NP University NP Intermediate NP Format NN compiler NN system NN . SENT Our PP$ prototype NN implementation NN of IN the DT simulator NN achieves VVZ an DT average JJ speedup NNS of IN about IN two CD when WRB compared VVN to TO VCS NP 2 CD . SENT 3 CD on IN six CD benchmarks NNS . 
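The event graph idea above can be illustrated with a toy sketch. It is not the VeriSUIF algorithm: it assumes an acyclic graph, constant non-negative delays known ahead of time, and that every event fires exactly once, after its latest cause. The function name `static_schedule` and the sample event names are hypothetical.

```python
from collections import defaultdict, deque


def static_schedule(edges):
    """Order events ahead of time from an event graph.

    edges: list of (cause, effect, delay) with constant, non-negative delays.
    Returns a list of (time, event) pairs in a causally consistent order.
    Toy assumption: each event fires once, after its latest cause.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for cause, effect, delay in edges:
        succ[cause].append((effect, delay))
        indeg[effect] += 1
        nodes.update((cause, effect))

    time = {n: 0 for n in nodes}
    ready = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v, delay in succ[u]:
            time[v] = max(time[v], time[u] + delay)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("event graph has a cycle; this sketch cannot schedule it")

    # Sort by time; ties keep topological order so causes precede effects.
    position = {n: i for i, n in enumerate(order)}
    ranked = sorted(order, key=lambda n: (time[n], position[n]))
    return [(time[n], n) for n in ranked]


if __name__ == "__main__":
    graph = [("clk", "a", 1), ("clk", "b", 2), ("a", "c", 3), ("b", "c", 0)]
    for t, event in static_schedule(graph):
        print(t, event)
```

Because the whole order is computed before the simulation runs, no run-time event queue is needed for this toy graph, which is the kind of overhead the static approach aims to remove.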
SENT More RBR importantly RB , , our PP$ average JJ scheduling NN overhead NN amounts NNS to TO only RB 4 CD of IN that WDT found VVD in IN the DT VCS NP code NN . SENT We PP are VBP improving VVG our PP$ implementation NN with IN the DT goal NN of IN simulating VVG a DT chip NN as IN complex NN as IN MAGIC JJ five CD times NNS faster RBR than IN it PP is VBZ possible JJ today NN . SENT Data NNS and CC Computation NN Transformations NNS for IN Multiprocessors NNS Even RB though RB shared VVN address NN space NN machines NNS have VHP hardware NN support NN for IN coherence NN , , getting VVG good JJ performance NN on IN these DT machines NNS requires VVZ programmers NNS to TO pay VV special JJ attention NN to TO the DT memory NN hierarchy NN . SENT Today NN , , expert NN users NNS restructure VV their PP$ codes NNS and CC change VV their PP$ data NN structures NNS manually RB to TO improve VV a DT program's JJ locality NN of IN reference NN . SENT We PP have VHP developed VVN a DT fully RB automatic JJ compiler NN that WDT translates VVZ sequential JJ code NN to TO efficient JJ parallel JJ code NN on IN shared JJ address NN space NN machines NNS . SENT Multiprocessor NN memory NN hierarchy NN performance NN can MD be VB affected VVN by IN the DT decisions NNS made VVN in IN a DT number NN of IN phases NNS of IN the DT compilation NN process NN . SENT parallelization NN , , assignment NN of IN computation NN to TO processors NNS and CC data NNS layout NN designs NNS . SENT We PP show VVP that IN the DT problem NN of IN optimizing VVG parallelism NN in IN conjunction NN with IN memory NN system NN performance NN can MD be VB reduced VVN to TO two CD simpler JJR problems NNS . SENT The DT first JJ step NN chooses VVZ the DT parallelization NN and CC computation NN assignment NN such JJ that IN synchronization NN and CC true JJ sharing NN are VBP minimized VVN , , without IN regard NN to TO the DT data NN layout NN . 
SENT The DT algorithm NN for IN this DT step NN is VBZ useful JJ also RB for IN distributed VVN address NN space NN machines NNS . SENT The DT second JJ step NN then RB makes VVZ the DT data NNS accessed VVN by IN each DT processor NN contiguous JJ in IN memory NN . SENT We PP introduce VV two CD primitive JJ data NN transforms VVZ , , strip VV mining NN and CC permutation NN , , and CC show VVP that IN they PP can MD be VB used VVN to TO derive VV all PDT the DT standard JJ data NN distributions NNS . SENT These DT primitives NNS can MD also RB be VB used VVN for IN other JJ purposes NNS , , such JJ as IN data NN layout NN optimizations NNS on IN uniprocessor NN code NN . SENT Since IN our PP$ data NN transformation NN algorithm NN uses VVZ data NN decompositions NNS as IN inputs NNS , , it PP is VBZ also RB immediately RB applicable JJ to TO HPF NP programs NNS . SENT HPF NP programs NNS have VHP traditionally RB been VBN targeted VVN for IN distributed VVN address NN space NN machines NNS . SENT Our PP$ algorithm NN uses VVZ this DT same JJ information NN to TO optimize VV for IN shared VVN address NN space NN machines NNS , , while IN taking VVG full JJ advantage NN of IN the DT underlying JJ hardware NN . SENT Locality NN of IN reference NN is VBZ achieved VVN simply RB by IN making VVG the DT data NNS accessed VVN by IN each DT processor NN contiguous JJ and CC relying VVG on IN the DT cache NN hardware NN to TO provide VV memory NN management NN and CC coherence NN functions NNS . SENT We PP have VHP implemented VVN our PP$ techniques NNS in IN the DT SUIF NP compiler NN system NN . SENT We PP ran VVD our PP$ compiler NN over IN a DT set NN of IN sequential JJ FORTRAN NP programs NNS and CC measured VVN the DT performance NN of IN our PP$ parallelized JJ code NN on IN the DT DASH NN multiprocessor NN . 
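A minimal sketch of the two primitive data transforms named above, strip mining and permutation, applied to a one-dimensional array. It shows how a cyclic distribution can be turned into a layout in which each processor's elements are contiguous. The helper names and the assumption that the processor count divides the array length are illustrative and not taken from the SUIF implementation.

```python
def strip_mine(i, strip):
    """Split index i into (strip number, offset within strip)."""
    return (i // strip, i % strip)


def permute(index, perm):
    """Reorder the dimensions of a multi-dimensional index."""
    return tuple(index[p] for p in perm)


def linearize(index, shape):
    """Row-major address of a multi-dimensional index."""
    addr = 0
    for k, n in zip(index, shape):
        addr = addr * n + k
    return addr


def cyclic_layout(n, nprocs):
    """New address of each element i when element i belongs to
    processor i % nprocs and each processor's data must be contiguous.
    Assumes nprocs divides n (illustrative restriction)."""
    per_proc = n // nprocs
    layout = []
    for i in range(n):
        row, proc = strip_mine(i, nprocs)        # (i // P, i % P)
        swapped = permute((row, proc), (1, 0))   # processor dimension outermost
        layout.append(linearize(swapped, (nprocs, per_proc)))
    return layout


if __name__ == "__main__":
    print(cyclic_layout(8, 2))
```

With 8 elements and 2 processors, the even elements land in addresses 0 to 3 and the odd elements in addresses 4 to 7, so each processor touches one contiguous region.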
SENT Our PP$ experimental JJ results NNS show VVP that IN our PP$ algorithm NN can MD dramatically RB improve VV the DT parallel JJ performance NN of IN shared JJ address NN space NN machines NNS . SENT Verification NN We PP have VHP been VBN developing VVG formal JJ verification NN methods NNS for IN use NN in IN FLASH NN . SENT There EX have VHP been VBN two CD initiatives NNS . SENT state NN enumeration NN methods NNS for IN finite JJ state NN machines NNS , , and CC a DT new JJ method NN for IN formally RB verifying VVG processor NN pipelines NNS . SENT State NN Enumeration NN Our PP$ technique NN of IN using VVG state NN enumeration NN to TO automatically RB generate VV corner NN case NN tests NNS for IN validation NN produced VVD some DT promising JJ results NNS with IN the DT Protocol NP Processor NN design NN . SENT 9 CD Some RB bugs NNS were VBD discovered VVN that IN required VVN a DT complex JJ combination NN of IN unusual JJ events NNS to TO occur VV simultaneously RB . SENT These DT cases NNS may MD not RB have VH been VBN found VVN using VVG our PP$ hand NN written VVN test NN vectors NNS or CC with IN randomly RB generated VVN vectors NNS . SENT The DT use NN of IN state NN enumeration NN gives VVZ us PP an DT automatic JJ way NN to TO create VV lots NNS of IN corner NN case NN situations NNS directly RB from IN the DT RTL NP Verilog NP , , hopefully RB covering VVG situations NNS that IN the DT designers NNS overlooked VVN in IN their PP$ testing NN . SENT We PP continued VVD our PP$ work NN by IN applying VVG the DT same JJ technique NN to TO the DT Data NP Buffer NN Allocator NN DBA NP unit NN of IN MAGIC NN . SENT The DT state NN enumeration NN required VVD some DT down RB scaling VVG of IN the DT size NN of IN counters NNS and CC the DT number NN of IN input NN ports NNS modelled VVD . SENT However RB , , it PP was VBD functionally RB complete JJ in IN other JJ respects NNS . 
SENT Running VVG the DT generated VVN test NN vectors NNS showed VVD that IN the DT RTL NP Verilog NP conformed VVD to TO the DT abstract JJ model NN that IN we PP had VHD created VVN from IN the DT written JJ specification NN of IN the DT DBA NP for IN the DT corner NN cases NNS found VVD in IN the DT state NN enumeration NN . SENT The DT next JJ unit NN targeted VVN was VBD the DT Inbox NN . SENT Work NN on IN this DT has VHZ progressed VVN to TO the DT point NN where WRB test NN vectors NNS are VBP about RB to TO be VB generated VVN and CC run VVN . SENT We PP are VBP also RB beginning VVG to TO work VV on IN techniques NNS to TO extend VV this DT unit NN level NN testing NN method NN so IN that DT corner NN case NN tests NNS can MD be VB generated VVN spanning VVG multiple JJ units NNS of IN the DT chip NN and CC run VV on IN a DT full JJ chip NN simulation NN model NN . SENT This DT extension NN is VBZ required VVN since IN a DT unit NN may MD pass VV a DT test NN when WRB run VVN in IN isolation NN against IN a DT specification NN model NN , , and CC yet RB contain VV a DT bug NN because IN of IN conflicting JJ assumptions NNS with IN other JJ units NNS . SENT Since IN October NP , , we PP have VHP transitioned VVN from IN Synchronous JJ Murphi NP to TO MPP NP as IN our PP$ state NN enumeration NN tool NN . SENT While IN working VVG on IN the DT Inbox NN , , we PP discovered VVD that IN Synchronous JJ Murphi NP suffered VVD from IN an DT exponential JJ growth NN in IN the DT size NN of IN the DT code NN it PP produced VVD . SENT The DT Synchronous JJ Murphi NP compiler NN takes VVZ a DT description NN of IN the DT model NN in IN its PP$ own JJ specialized JJ language NN and CC creates VVZ a DT C NP program NN that WDT will MD perform VV the DT state NN enumeration NN for IN that DT model NN . SENT Unconstrained JJ inputs NNS to TO the DT model NN are VBP represented VVN in IN Synchronous JJ Murphi NP as IN non JJ deterministic JJ variables NNS . 
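State enumeration with unconstrained inputs can be sketched as a breadth-first search that tries every input combination in every reachable state. The code below is a toy illustration only and does not reproduce Synchronous Murphi or MPP; the saturating counter model, its two inputs, and all function names are hypothetical. In this sketch the input combinations are iterated at run time.

```python
from collections import deque
from itertools import product


def enumerate_states(initial, step, input_domains):
    """Explore all states reachable from `initial`.

    step(state, inputs) -> next state
    input_domains: one iterable of possible values per unconstrained input.
    Returns (set of reachable states, list of (state, inputs, next_state)).
    """
    seen = {initial}
    frontier = deque([initial])
    transitions = []
    while frontier:
        state = frontier.popleft()
        # Try every combination of the non-deterministic inputs.
        for inputs in product(*input_domains):
            nxt = step(state, inputs)
            transitions.append((state, inputs, nxt))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, transitions


def counter_step(state, inputs):
    """Hypothetical unit model: a counter that saturates at 3."""
    inc, reset = inputs
    if reset:
        return 0
    if inc:
        return min(state + 1, 3)
    return state


if __name__ == "__main__":
    states, transitions = enumerate_states(0, counter_step, [(0, 1), (0, 1)])
    print(sorted(states))
    print(len(transitions), "transitions, each a candidate test case")
```

Every recorded transition pairs a reachable state with an input combination, so the list can serve as raw material for corner case test vectors.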
SENT The DT verifier NN tries VVZ all DT possible JJ combinations NNS of IN these DT non JJ deterministic JJ variables NNS to TO find VV all PDT the DT reachable JJ states NNS . SENT The DT problem NN arises VVZ because IN the DT Synchronous JJ Murphi NP compiler NN statically RB expands VVZ all DT possible JJ combinations NNS of IN the DT non JJ deterministic JJ variables NNS in IN the DT generated VVN C NP code NN . SENT The DT resulting VVG exponential JJ growth NN in IN code NN size NN quickly RB overwhelms VVZ the DT C NP compiler NN . SENT For IN many JJ of IN our PP$ units NNS , , this DT prevents VVZ us PP from IN modelling VVG all PDT the DT inputs NNS together RB , , forcing VVG us PP to TO choose VV subsets NNS for IN each DT run NN and CC tying VVG the DT remaining VVG inputs NNS to TO known JJ values NNS . SENT Our PP$ new JJ state NN enumeration NN tool NN , , MPP NP , , was VBD developed VVN to TO overcome VV this DT problem NN . SENT Instead RB of IN statically RB expanding VVG the DT combinations NNS of IN non JJ determinism NN , , it PP does VVZ this DT dynamically RB at IN run NN time NN . SENT This DT is VBZ a DT trade NN off RP between IN the DT amount NN of IN non JJ determinism NN that WDT can MD be VB modelled VVN and CC its PP$ running VVG time NN . SENT We PP have VHP found VVN that IN we PP can MD drastically RB increase VV the DT amount NN of IN non JJ determinism NN modelled VVD , , giving VVG us PP a DT better JJR chance NN of IN reaching VVG more JJR corner NN cases NNS in IN each DT unit NN for IN test NN generation NN . SENT New JJ Methods NNS in IN Processor NN Verification NP Last JJ year NN under IN other JJ funding NN , , we PP developed VVD a DT new JJ method NN for IN formally RB verifying VVG processor NN pipeline NN control NN . SENT The DT method NN compares VVZ two CD high JJ level NN descriptions NNS , , reporting VVG inconsistencies NNS if IN it PP detects VVZ them PP . 
SENT One CD description NN is VBZ of IN the DT microprocessor NN implementation NN , , called VVD the DT MicroArchitecture NP MA NN . SENT The DT microarchitecture NN description NN must MD capture VV all DT of IN the DT pipelining VVG and CC cycle NN level NN timing NN of IN the DT implementation NN . SENT The DT second JJ description NN , , which WDT specifies VVZ the DT correctness NN requirements NNS of IN the DT process NN , , is VBZ a DT programmer NN level NN view NN of IN the DT microprocessor NN , , which WDT we PP call VVP the DT Instruction NN Set VVN Architecture NP ISA NP . SENT Both DT descriptions NNS must MD have VH functionally RB equivalent JJ data NN path NN elements NNS . SENT for IN example NN , , the DT processor NN ALU NP must MD be VB the DT same JJ in IN both DT descriptions NNS . SENT The DT new JJ method NN is VBZ similar JJ to TO traditional JJ methods NNS , , but CC requires VVZ orders NNS of IN magnitude NN less CC labor NN because IN it PP automates VVZ many JJ of IN the DT difficult JJ parts NNS of IN the DT verification NN process NN . SENT The DT ultimate JJ theorem NN to TO be VB proved VVN is VBZ that IN programs NNS execute VVP equivalently RB on IN the DT MA NN and CC ISA NP in IN other JJ words NNS , , the DT MA NN is VBZ a DT faithful JJ implementation NN of IN the DT ISA NP . SENT More RBR precisely RB , , we PP aim VVP to TO show VV that IN , , for IN every DT step NN of IN every DT program NN , , the DT MA NN state NN can MD be VB mapped VVN to TO an DT ISA NP state NN . SENT This DT is VBZ done VVN by IN creating VVG an DT abstraction NN function NN to TO do VV the DT mapping NN , , and CC proving VVG that IN it PP holds VVZ for IN every DT step NN of IN a DT program NN . SENT Our PP$ method NN and CC traditional JJ methods NNS would MD approach VV this DT as IN an DT induction NN proof NN . 
SENT The DT most RBS difficult JJ and CC time NN consuming NN parts NNS of IN this DT process NN would MD be VB finding VVG the DT induction NN hypothesis NN the DT right JJ abstraction NN function NN , , and CC proving VVG that IN the DT hypothesis NN was VBD correct JJ . SENT Our PP$ method NN greatly RB reduces VVZ the DT difficulty NN of IN both DT of IN these DT tasks NNS . SENT We PP have VHP discovered VVN a DT processor NN specific JJ trick NN to TO make VV it PP easier JJR to TO find VV an DT appropriate JJ abstraction NN function NN . SENT The DT primary JJ problem NN is VBZ that IN , , at IN any DT given VVN time NN , , the DT MA NN has VHZ a DT number NN of IN partially RB executed VVN instructions NNS in IN its PP$ pipeline NN . SENT For IN example NN , , it PP is VBZ difficult JJ to TO compare VV the DT registers NNS and CC program NN counters NNS of IN the DT MA NN and CC ISA NP because IN the DT MA NN may MD have VH a DT jump NN instruction NN sitting VVG somewhere RB in IN its PP$ pipe NN which WDT updates NNS the DT program NN counter NN in IN a DT different JJ step NN from IN when WRB it PP updates VVZ the DT registers NNS . SENT However RB , , it PP is VBZ relatively RB simple JJ to TO define VV an DT abstraction NN function NN for IN the DT special JJ case NN where WRB there EX are VBP no RB partially RB executed VVN instructions NNS we PP call VVP this DT a DT clean JJ state NN it PP requires VVZ that IN certain JJ parts NNS of IN the DT MA NN state NN , , such JJ as IN the DT registers NNS , , memory NN , , and CC program NN counter NN , , directly RB match VV corresponding JJ state NN in IN the DT ISA NP . SENT Once RB the DT abstraction NN function NN has VHZ been VBN specified VVN in IN a DT clean JJ state NN , , our PP$ method NN can MD automatically RB augment VV the DT definition NN for IN the DT unclean JJ state NN . 
SENT If IN the DT processor NN starts VVZ in IN an DT unclean JJ state NN , , it PP can MD be VB forced VVN into IN a DT clean JJ state NN by IN stalling VVG e NN . SENT g NN . SENT simulating VVG a DT cache NN miss VVP for IN a DT certain JJ number NN of IN cycles NNS , , which WDT will MD finish VV executing VVG the DT instructions NNS in IN the DT pipeline NN without IN starting VVG any DT new JJ instructions NNS . SENT Using VVG a DT symbolic JJ simulator NN , , this DT stalling VVG process NN can MD be VB executed VVN from IN a DT symbolic JJ state NN , , yielding VVG a DT flushing VVG function NN which WDT can MD be VB applied VVN to TO any DT state NN . SENT The DT flushing VVG function NN can MD then RB be VB composed VVN with IN the DT user NN supplied VVD abstraction NN function NN , , to TO produce VV an DT abstraction NN function NN that WDT works VVZ for IN any DT state NN by IN flushing VVG to TO a DT clean JJ state NN , , then RB applying VVG the DT abstraction NN function NN for IN clean JJ states NNS . SENT The DT second JJ difficult JJ task NN is VBZ proving VVG that IN the DT abstraction NN function NN truly RB is VBZ an DT abstraction NN function NN . SENT Let VV MAstep NN s PP be VB the DT function NN that WDT simulates VVZ one CD step NN of IN the DT MA NN , , starting VVG in IN state NN s PP , , and CC returning VVG the DT resulting VVG state NN as IN a DT value NN . SENT let VV ISAstep NN s PP be VB the DT function NN that WDT simulates VVZ a DT step NN in IN the DT ISA NP given VVN an DT ISA NP state NN s PP . SENT and CC let VV ABS NP s NN be VB the DT abstraction NN function NN , , which WDT maps VVZ MA NN states NNS to TO ISA NP states NNS . SENT Then RB it PP is VBZ necessary JJ to TO prove VV the DT theorem NN ABS NP MAstep NP s PP ISAstep NP ABS NP s PP for IN legal JJ MA NN state NN s PP . 
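Written out with its punctuation, the theorem reads ABS(MAstep(s)) = ISAstep(ABS(s)). The sketch below checks it by brute force on a toy machine. It is an illustration under stated assumptions, not the symbolic method of the report: the ISA is a hypothetical accumulator machine, the MA adds a single pipeline latch, flushing is one stall cycle, and the check enumerates a small set of concrete states instead of simulating symbolically.

```python
from itertools import product

# Hypothetical program: instruction k adds PROG[k] to the accumulator.
PROG = (3, 1, 4)


def isa_step(s):
    """ISA view: state is (pc, acc); one instruction completes per step."""
    pc, acc = s
    return (pc + 1, acc + PROG[pc])


def ma_step(s, stall=False):
    """MA view: state is (pc, acc, latch); latch holds a fetched but
    not yet executed operand, or None when the pipeline is empty."""
    pc, acc, latch = s
    if latch is not None:
        acc += latch                 # execute stage retires the latched instruction
    if stall:
        return (pc, acc, None)       # no new fetch while stalled
    return (pc + 1, acc, PROG[pc])   # fetch stage fills the latch


def flush(s):
    """Stall for one cycle, which empties this one-stage toy pipeline."""
    return ma_step(s, stall=True)


def abs_clean(s):
    """User-supplied abstraction, defined only for clean states."""
    pc, acc, latch = s
    assert latch is None, "abs_clean expects a clean state"
    return (pc, acc)


def abstraction(s):
    """Abstraction for any state: flush first, then project."""
    return abs_clean(flush(s))


def commutes(s):
    return abstraction(ma_step(s)) == isa_step(abstraction(s))


def check_all():
    states = product(range(len(PROG)), range(5), (None, 0, 1, 2))
    return all(commutes(s) for s in states)


if __name__ == "__main__":
    print(check_all())
```

The composition in `abstraction` mirrors the construction described above: the flushing function is applied first, and the clean-state abstraction is applied to the result.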
SENT In IN other JJ words NNS , , if IN we PP start VVP with IN s PP and CC take VV a DT step NN in IN the DT MA NN , , the DT result NN is VBZ the DT same JJ as IN abstracting VVG s PP first JJ , , then RB taking VVG a DT step NN in IN the DT ISA NP . SENT This DT is VBZ an DT induction NN hypothesis NN that WDT can MD be VB used VVN to TO prove VV that DT programs NNS execute VVP in IN the DT same JJ way NN on IN the DT MA NN and CC ISA NP . SENT We PP use VVP a DT particular JJ logic NN that WDT is VBZ well RB suited VVN for IN processor NN verification NN problems NNS . SENT the DT logic NN of IN equality NN over IN uninterpreted JJ functions NNS . SENT This DT logic NN allows VVZ expressions NNS like IN f NN a DT g NN b SYM to TO appear VV in IN logical JJ formulas NNS , , along RB with IN the DT usual JJ Boolean NP functions NNS . SENT The DT uninterpreted JJ function NN symbols NNS are VBP especially RB useful JJ for IN modelling VVG data NN path NN computations NNS . SENT For IN example NN , , ALU NP opcode NP , , src NP 1 CD , , src NP 2 CD might MD represent VV the DT result NN of IN the DT ALU NP when WRB given VVN opcode NN , , src NP 1 CD , , and CC src NP 2 CD . SENT Note NN that IN the DT expression NN says VVZ nothing NN about IN what WP the DT ALU NP actually RB does VVZ , , or CC how WRB many JJ bits NNS of IN data NNS are VBP in IN src NP 1 CD and CC src NP 2 CD . SENT This DT information NN is VBZ not RB necessary JJ , , since IN the DT MA NN and CC ISA NP have VHP the DT same JJ ALU NP and CC data NN values NNS . SENT We PP have VHP written VVN a DT program NN that WDT checks VVZ validity NN is VBZ it PP always RB true JJ . SENT for IN formulas NNS in IN this DT logic NN . SENT The DT problem NN is VBZ NP NN complete JJ , , so RB the DT program NN is VBZ heuristic JJ . SENT However RB , , it PP has VHZ succeeded VVN on IN some DT large JJ practical JJ examples NNS . 
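To make the logic of equality over uninterpreted functions concrete, the sketch below decides one restricted question: whether a set of premise equalities forces a goal equality, using a simple congruence closure. It is not the heuristic validity checker mentioned above, which also handles Boolean structure; the term encoding (strings for constants, tuples for applications) and the function name `entails` are assumptions made for illustration.

```python
def subterms(term, acc):
    """Collect term and all of its subterms into the set acc."""
    acc.add(term)
    if isinstance(term, tuple):          # ("f", arg1, arg2, ...)
        for arg in term[1:]:
            subterms(arg, acc)


def entails(premises, goal):
    """Return True if the premise equalities imply the goal equality,
    treating every function symbol as uninterpreted."""
    terms = set()
    for left, right in list(premises) + [goal]:
        subterms(left, terms)
        subterms(right, terms)

    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path halving
            t = parent[t]
        return t

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    for left, right in premises:
        union(left, right)

    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:                           # propagate congruence to a fixpoint
        changed = False
        for i, s in enumerate(apps):
            for t in apps[i + 1:]:
                same_symbol = s[0] == t[0] and len(s) == len(t)
                if same_symbol and all(
                    find(a) == find(b) for a, b in zip(s[1:], t[1:])
                ):
                    if union(s, t):
                        changed = True
    return find(goal[0]) == find(goal[1])


if __name__ == "__main__":
    goal = (("ALU", "opcode", "src1", "x"), ("ALU", "opcode", "src2", "x"))
    print(entails([("src1", "src2")], goal))
    print(entails([], goal))
```

Nothing in the check depends on what ALU computes or how wide its operands are, which matches the point made above about uninterpreted function symbols.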
SENT The DT theorem NN ABS NP MAstep NP s PP ISAstep NP ABS NP s PP can MD be VB created VVN automatically RB by IN generating VVG the DT abstraction NN function NN as RB described VVD above IN , , and CC generating VVG the DT MAstep NP and CC ISAstep NP functions NNS by IN simulating VVG the DT MA NN and CC ISA NP for IN one CD step NN starting VVG in IN a DT symbolic JJ state NN . SENT We PP have VHP applied VVN this DT formal JJ verification NN method NN to TO the DT FLASH NN protocol NN processor NN PP NP . SENT Formally RB verifying VVG the DT PP NP is VBZ a DT challenging JJ problem NN . SENT It PP is VBZ a DT dual JJ pipeline NN VLIW NP processor NN , , with IN complex JJ constraints NNS on IN the DT code NN generated VVN by IN the DT compiler NN . SENT We PP hand RB translated VVN the DT Verilog NP description NN of IN FLASH NN written VVN by IN the DT designers NNS to TO the DT input NN language NN for IN our PP$ symbolic JJ simulator NN . SENT We PP wrote VVD an DT ISA NP for IN the DT PP NP in IN the DT same JJ language NN . SENT We PP have VHP been VBN able JJ to TO formally RB verify VV the DT entire JJ pipeline NN for IN the DT PP NP , , which WDT is VBZ orders NNS of IN magnitude NN more RBR difficult JJ than IN previous JJ examples NNS that WDT have VHP been VBN verified VVN using VVG other JJ methods NNS . SENT To TO do VV this DT , , we PP had VHD to TO deal VV with IN some DT difficulties NNS in IN modelling VVG . SENT First RB , , unlike IN many JJ processors NNS , , the DT PP NP does VVZ not RB continue VV processing VVG the DT instructions NNS already RB in IN the DT pipeline NN when WRB it PP gets VVZ a DT cache NN miss VV . SENT So RB we PP had VHD to TO add VV this DT capability NN in IN order NN to TO get VV the DT abstraction NN function NN . 
SENT This DT entails VVZ some DT extra JJ work NN , , but CC it PP is VBZ not RB dangerous JJ since IN the DT MAstep NN in IN the DT theorem NN can MD still RB be VB performed VVN using VVG the DT original JJ , , unmodified JJ description NN the DT modified JJ hardware NN is VBZ only RB used VVN to TO obtain VV an DT abstraction NN function NN . SENT Also RB , , the DT compiler NN constraints NNS had VHD to TO be VB written VVN in IN our PP$ logic NN , , and CC used VVN as IN assumptions NNS in IN the DT proof NN of IN correctness NN since IN the DT PP NP is VBZ not RB guaranteed VVN to TO work VV correctly RB if IN the DT constraints NNS are VBP not RB satisfied VVN . SENT The DT PP NP also RB has VHZ a DT complex JJ memory NN system NN with IN caches NNS , , store NN buffers NNS , , and CC so RB on IN . SENT We PP modelled VVD the DT memory NN system NN abstractly RB to TO verify VV the DT pipeline NN . SENT We PP are VBP currently RB working VVG on IN verifying VVG that IN the DT actual JJ memory NN system NN implementation NN is VBZ consistent JJ with IN the DT abstract JJ memory NN . SENT When WRB this DT succeeds VVZ , , the DT results NNS of IN the DT individual JJ pipeline NN and CC memory NN system NN verifications NNS can MD be VB composed VVN , , without IN doing VVG any DT additional JJ verification NN , , with IN full JJ confidence NN that IN the DT results NNS are VBP sound JJ . SENT The DT verification NN of IN the DT memory NN system NN is VBZ posing VVG some DT special JJ problems NNS because IN it PP has VHZ embedded VVN state NN machines NNS , , which WDT seem VVP to TO cause VV our PP$ method NN some DT difficulty NN . SENT Research NP is VBZ currently RB underway JJ to TO understand VV and CC solve VV this DT problem NN . SENT Bibliography NN 1 CD . SENT Amarasinghe NP , , S NP . SENT , , Anderson NP , , J NP . SENT , , Lam NP , , M NP . SENT S PP . SENT , , and CC Tseng NP , , C NP . SENT W NP . 
SENT An DT Overview NN of IN the DT SUIF NP Compiler NN for IN Scalable JJ Parallel JJ Machines NNS in IN Proceedings NNS of IN the DT 7 CD th NN SIAM NP Conference NP on IN Parallel JJ Processing NP for IN Scientific NP Computing NP . SENT San NP Francisco NP , , CA MD . SENT February NP , , 1995 CD . SENT 2 LS . SENT Anderson NP , , J NP . SENT , , Amarasinghe NP , , S NP . SENT , , and CC Lam NP , , M NP . SENT S NP . SENT Data NNS and CC Computation NN Transformations NNS for IN Multiprocessors NNS in IN Proceedings NNS of IN the DT 5 CD th NN Symposium NN on IN Principles NNS and CC Practice NN of IN Parallel JJ Programming NN . SENT ACM NP SIGPLAN NP . SENT July NP , , 1995 CD . SENT To TO appear VV . SENT 3 LS . SENT Chandra NP , , R NN . SENT , , Devine NP , , S NP . SENT , , Verghese NP , , B NN . SENT , , Gupta NP , , A NP . SENT , , et CC al NP . SENT Scheduling NP and CC Page NP Migration NN for IN Multiprocessor NN Compute VV Servers NNS in IN Sixth NP International NP Conference NP on IN Architectural JJ Support NN for IN Programming NN Languages NNS and CC Operating VVG Systems NP . SENT ACM NP IEEE NP . SENT San NP Jose NP , , CA MD . SENT pgs NNS . SENT 12 CD 24 CD . SENT October NP , , 1994 CD . SENT 4 LS . SENT Chapin NP , , J NP . SENT , , Herrod NP , , S NP . SENT , , Rosenblum NP , , M NP . SENT , , and CC Gupta NP , , A NP . SENT Memory NN System NP Performance NP of IN UNIX NP on IN CC NP NUMA NP Multiprocessors NP in IN Joint NP International NP Conference NP on IN Measurement NN and CC Modeling NN of IN Computer NP Systems NPS . SENT ACM NP Sigmetrics NP . SENT Ottawa NP , , Ontario NP , , Canada NP . SENT May NP , , 1995 CD . SENT To TO appear VV . SENT 5 CD . SENT French NP , , R NN . SENT , , Lam NP , , M NP . SENT S PP . SENT , , Levitt NP , , J NP . SENT , , and CC Olukotun NP , , K NP . SENT A DT General NP Method NN for IN Compiling VVG Event NP Driven NP Simulations NNS in IN 32 CD nd NN Design NN Automation NN Conference NN . 
SENT IEEE NP ACM NP . SENT San NP Francisco NP , , CA MD . SENT June NP , , 1995 CD . SENT To TO appear VV . SENT 6 CD . SENT Hall NP , , M NP . SENT , , Murphy NP , , B NN . SENT , , and CC Amarasinghe NP , , S NP . SENT Interprocedural NP Parallelization NP Analysis NP . SENT A DT Case NN Study NN in IN Proceedings NNS of IN the DT 7 CD th NN SIAM NP Conference NP on IN Parallel JJ Processing NP for IN Scientific NP Computing NP . SENT San NP Francisco NP , , CA MD . SENT February NP , , 1995 CD . SENT 7 CD . SENT Heinlein NP , , J NP . SENT , , Gharachorloo NP , , K NP . SENT , , Dresser NP , , S NP . SENT , , and CC Gupta NP , , A NP . SENT Integration NN of IN Message NP Passing VVG and CC Shared VVD Memory NP in IN the DT Stanford NP FLASH NN Multiprocessor NN in IN Sixth NP International NP Conference NP on IN Architectural JJ Support NN for IN Programming NN Languages NNS and CC Operating VVG Systems NP . SENT ACM NP IEEE NP . SENT San NP Jose NP , , CA MD . SENT pgs NNS . SENT 38 CD 50 CD . SENT October NP , , 1994 CD . SENT 8 CD . SENT Heinrich NP , , M NP . SENT , , Kuskin NP , , J NP . SENT , , Ofelt NP , , D NP . SENT , , Heinlein NP , , J NP . SENT , , et CC al NP . SENT The DT Performance NP Impact NN of IN Flexibility NN in IN the DT Stanford NP FLASH NN Multiprocessor NN in IN Sixth NP International NP Conference NP on IN Architectural JJ Support NN for IN Programming NN Languages NNS and CC Operating VVG Systems NP . SENT ACM NP IEEE NP . SENT San NP Jose NP , , CA MD . SENT pgs NNS . SENT 274 CD 285 CD . SENT October NP , , 1994 CD . SENT 9 CD . SENT Ho NP , , R NN . SENT , , Yang NP , , C NP . SENT H NN . SENT , , Horowitz NP , , M NP . SENT , , and CC Dill NP , , D NP . SENT Architecture NN Validation NN for IN Processors NNS in IN 22 CD nd NN International JJ Symposium NN on IN Computer NP Architecture NP . SENT ACM NP SIGARCH NP and CC IEEE NP TCCA NP . SENT Santa NP Margherita NP Ligure NN , , Italy NP . SENT June NP , , 1995 CD . 
SENT To TO appear VV . SENT 10 CD . SENT Holt NP , , C NP . SENT , , Heinrich NP , , M NP . SENT , , Singh NP , , J NP . SENT P NN . SENT , , Rothberg NP , , E NN . SENT , , et CC al NP . SENT The DT Effects NNS of IN Latency NN , , Occupancy NN , , and CC Bandwidth NN in IN Distributed NP Shared VVD Memory NP Multiprocessors NP . SENT Stanford NP University NP Computer NP Systems NPS Laboratory NP . SENT Technical NP Report NP , , CSL NP TR NP 95 CD 660 CD . SENT January NP 1995 CD . SENT 11 CD . SENT Nayfeh NP , , B NP . SENT and CC Olukotun NP , , K NP . SENT Evaluating VVG the DT Impact NN of IN Clustering VVG for IN Small NP Scale NP Shared VVD Memory NP Multiprocessors NP in IN 2 CD nd NN International JJ Symposium NN on IN High NP Performance NP Computer NP Architecture NP . SENT IEEE NP ACM NP . SENT San NP Jose NP , , CA MD . SENT February NP , , 1996 CD . SENT Submitted VVN . SENT 12 CD . SENT Olukotun NP , , K NP . SENT , , Bergmann NP , , J NP . SENT , , and CC Chang NP , , K NP . SENT Y NP . SENT Rationale NN and CC Design NN of IN the DT Hydra NN Multiprocessor NN . SENT Stanford NP University NP Computer NP Systems NPS Laboratory NP . SENT Technical NP Report NP , , CSL NP TR NP 94 CD 645 CD . SENT November NP 1994 CD . SENT 13 CD . SENT Rosenblum NP , , M NP . SENT , , Herrod NP , , S NP . SENT , , Witchel NP , , E NN . SENT , , and CC Gupta NP , , A NP . SENT , , Complete JJ Computer NP System NP Simulation NP . SENT The DT SimOS NP Approach NN . SENT IEEE NP Journal NP of IN Parallel NN and CC Distributed NP Technology NP . SENT November NP 1995 CD . SENT To TO appear VV . SENT 14 CD . SENT Rosenblum NP , , M NP . SENT Hive NN . SENT Fault NN Containment NN for IN Shared JJ Memory NP Multiprocessors NP in IN SOSP NP 95 CD . SENT 1995 CD . SENT Submitted VVN . SENT 15 CD . SENT Sidiropoulos NP , , S NP . SENT and CC Horowitz NP , , M NP . 
SENT Current JJ Integrating VVG Receivers NNS for IN High NP Speed NN System NP Interconnects NP in IN Custom NP Integrated NP Circuits NNS Conference NN . SENT IEEE NP Electron NP Devices NPS Society NP . SENT Santa NP Clara NP , , CA MD . SENT May NP , , 1995 CD . SENT To TO appear VV . SENT 16 CD . SENT Verghese NP , , B NN . SENT , , Devine NP , , S NP . SENT , , Rosenblum NP , , M NP . SENT , , and CC Gupta NP , , A NP . SENT OS NN Support NN for IN Improving VVG Data NP Locality NN on IN CC NP NUMA NP Compute VV Servers NNS in IN Symposium NN on IN Systems NP Programming NN . SENT 1996 CD . SENT To TO appear VV . SENT 17 CD . SENT Wilson NP , , R NN . SENT , , French NP , , R NN . SENT , , Wilson NP , , C NP . SENT , , Amarasinghe NP , , S NP . SENT , , et CC al NP . SENT , , SUIF NP . SENT An DT Infrastructure NN for IN Research NP on IN Parallelizing VVG and CC Optimizing VVG Compilers NNS . SENT ACM NP SIGPLAN NP Notices NNS . SENT Vol NP . SENT 29 CD 12 CD . SENT pgs NNS . SENT 31 CD 37 CD . SENT December NP 1994 CD . SENT 18 CD . SENT Woo NP , , S NP . SENT , , Singh NP , , J NP . SENT P NN . SENT , , and CC Hennessy NP , , J NP . SENT The DT Performance NP Advantages NNS of IN Integrating VVG Block NP Data NP Transfer NN in IN Cache NP Coherent JJ Multiprocessors NNS in IN Sixth NP International NP Conference NP on IN Architectural JJ Support NN for IN Programming NN Languages NNS and CC Operating VVG Systems NP . SENT ACM NP IEEE NP . SENT San NP Jose NP , , CA MD . SENT pgs NNS . SENT 219 CD 230 CD . SENT October NP , , 1994 CD . SENT 19 CD . SENT Woo NP , , S NP . SENT , , Ohara NP , , M NP . SENT , , Torrie NP , , E NN . SENT , , Singh NP , , J NP . SENT P NN . SENT , , et CC al NP . SENT Methodological JJ Considerations NNS and CC Characterization NN of IN the DT SPLASH NN 2 CD Parallel NN Application NN Suite NP in IN 22 CD nd NN International JJ Symposium NN on IN Computer NP Architecture NP . SENT ACM NP SIGARCH NP and CC IEEE NP TCCA NP . 
SENT Santa NP Margherita NP Ligure NN , , Italy NP . SENT June NP , , 1995 CD . SENT To TO appear VV . SENT Last JJ modified VVN 5 CD 24 CD 95 CD by IN Joel NP Baxter NP , , webmaster JJR www JJ flash NN . SENT stanford NP . SENT edu NN . SENT