Assembling a Balanced Corpus from the Internet

Johan Dewe, Telia Research
Jussi Karlgren, SICS and University of Helsinki
Ivan Bretan, Telia Research

Address for correspondence: Jussi Karlgren, SICS, Box 1263, 164 29 Kista, Sweden. Fax: +46 8 751 72 30. E-mail: Jussi.Karlgren@sics.se

Balanced Corpora for Textual Research

For empirically oriented textual research it is crucial to have materials available for extracting statistics, training probabilistic algorithms, and testing hypotheses about language and language processing in general. In recent years, the awareness that text is not just text, but that texts come in several forms, has spread from the more theoretical and literary subfields of linguistics to the more practically oriented fields of information retrieval and natural language processing. As a consequence, several test collections available for research explicitly attempt to cover many or most well-established textual genres, or functional styles, in well-balanced proportions (Francis and Kucera, 1982; Källgren, 1990). The creation of such a collection is a complex matter in several respects.
Our research area is the construction of retrieval tools for the Internet, so for our purposes the choice of genres to include is one of the more central problems: there is no well-established genre palette for Internet materials. To find materials to experiment with, we need to create them in a form suitable for our purposes. This is a double-edged problem, involving both vaguely expressed user expectations and the establishment of categories from large numbers of features which, taken singly, have low predictive and explanatory power. This paper outlines the methodology we use to determine which genres to include.

Stylistic Variation and Genre

Texts exhibit considerable variation. Variation in topic or content is quite obvious and is the basis for most categorization enterprises in information retrieval research, but variation in style is just as noticeable and forms a second basis for categorization: poetry, prose, non-fiction, reference materials, and so forth are all stylistic categories, or genres. Stylistic variation shows through stylistic items, that is, observable choices of linguistic items. Stylistic items can be observed on any level of linguistic abstraction.
Stylistic items may be lexical, as in the choice between words of similar meaning but different connotations; syntactic, as in the choice between equivalent constructions with different communicative import; or textual, as in decisions about textual organization. Each stylistic item is of little import on its own, but taken together the items are indicative of systematic differences. A set of documents with a perceived consistent tendency to make the same stylistic choices is called a genre or, more specifically, if it has an established communicative function, a functional style (see, e.g., Enkvist, 1973; Vachek, 1975).

Stylistic variation between genres or language varieties can be detected reliably using a large battery of quite simple stylistic items, such as pronoun counts or the relative frequencies of certain types of constructions such as agentless passives (Biber, 1988, 1989; Karlgren and Cutting, 1994). Such simple measures have been used for authorship determination, through calculations of word length distributions (Mendenhall, 1887), and, with some success, predictively for information retrieval (Karlgren, 1996; Karlgren and Straszheim, 1997; Strzalkowski et al., 1996).

Establishing Genres

Method

In previous similar studies, we have used introspective methods.
That is, we have established genres mainly on the basis of personal experience (Ben Cheikh and Zackrisson, 1994; Hussain and Tzikas, 1995). In other text collections organized by genre, genre is largely equated with source: texts from some organization are categorized together with texts from similar organizations, without regard for text usage (e.g., journalistic press archives, personal letters, technical documentation; see, e.g., Källgren, 1990). For this study, we wished to have a better foundation for our genre palette. Our basic source of knowledge is interviewing users about their perceptions of the types of material they find and interact with online. We collate the impressions and try to define genres that are both reasonably consistent with what users expect and observable and conveniently computable using the measures of stylistic variation outlined in the previous section (cf. Figure 1).

Figure 1. A snapshot of the methodology, showing the interplay between vaguely expressed user expectations and observable, conveniently computable categories.

Questionnaire

The questionnaire in Figure 2 was sent to 648 computer users (students, researchers, and teachers) at Stockholm University and the Royal Institute of Technology.
We received 7 error messages and 67 responses, which gives a response rate of about 10 per cent.

Hi. I need two minutes of your time. For my M.Sc. project I will classify WWW documents by genre.

What is a genre? A genre is a group of documents with similarities as regards form. Journalistic material, for instance, gives us several examples of genres: we find scientific materials, short stories, news items, advertisements, and so forth. In a larger perspective a newspaper itself is a genre, as compared to crime fiction, parliamentary records, and chat group text. Similarly, it should be possible to categorize materials from the WWW into genres. The obvious ones I can figure out myself, but I do not want to constrain myself to a single perspective, so I need your help to gain a wider view.

What genres do you feel you find on the WWW? Take a minute to think over the question, and send me a list of the genres that occur to you. All replies are useful to me.

Thank you for your time,
Johan Dewe, d92-jde@nada.kth.se

Figure 2. The genre questionnaire. This is an English translation; the Swedish original can be found at http://www.stacken.kth.se/dewe/dropjaw/enkat.txt

Compiling the results

Science, Entertainment, Information

Here I am, Sales pitches, Serious material

Home pages, Data bases, Guest books, Comics, Pornography, FAQs, Search pages, Corporate info, Product info, Reference materials

My immediate reaction is that genres from general society will be found on the WWW as well. We get stuck in old conventions... e.g., e-mail conventions follow paper letter conventions. I would start by using genres from ordinary life and see if they are applicable to the WWW.

Home pages, Public info, Non-government organization info, Search info, Corporate info, Informative advertisements, Non-informative advertisements, Research materials, Games and pornography

News, Economic info, News, Tourism, Sports, Games, Adult pages, Science, Culture, Language, Media

Public documents, Internal documents, Personal documents

Information; Check out what a flashy page I can code; I guess we have to be on the net too

Figure 3. Some translated excerpts from the answers to the questionnaire. The answers in their entirety can be found at http://www.stacken.kth.se/dewe/dropjaw/enkatsvar.txt

The answers ranged from very short lists to extensive discussions; some examples are shown in Figure 3.
It was very clear to us that most respondents conflated genre and form on the one hand with content and topic on the other (tourism, sports, games, adult pages). This is not surprising: genre and topic are not independent dimensions of variation, and a typical library categorization reflects both dimensions simultaneously. Several respondents did give examples of more cleanly form-oriented genres as well (home pages, data bases, FAQs, search pages, reference materials). Some respondents gave explicit references to paper genres; one lengthy quote is given among the examples in Figure 3. The intention of the information provider showed up as a genre formation criterion in several responses (here I am, sales pitches, serious material), as did an alternative formulation of the same criterion, the type of author (commercial info, public info, non-governmental organization info). Some responses explicitly brought up quality (boring home pages) and text ecology or intended environment (public documents, internal documents, personal documents). We have attempted to systematize some of the user-perceived distinctions, namely those that are predictable enough to be modeled with simple metrics, in the genre palette shown in Figure 4.
Informal, private: personal home pages.
Public, commercial: home pages for the general public.
Searchable indices: pages with feedback (customer dialogue; searchable indexes).
Journalistic materials: press (news, reportage, editorials, reviews, popular reporting, e-zines).
Reports: scientific, legal, and public materials; formal text.
Other running text
FAQs
Link collections
Other listings and tables
Asynchronous multi-party correspondence: contributions to discussions, requests, comments; Usenet News materials.
Error messages

Figure 4. The current genre palette.

When we try to assign textual materials to the various categories automatically, we expect to find that some genres are not as useful as they may seem at first sight: some of these categories may have to be adjusted, merged, split, or redefined as the collection is evaluated using statistical methods. The categories shown in Figure 4 are starting points for research, not final results.

Finding Samples

We use three methods to collect data from the World Wide Web. Firstly, we take queries used for the Text REtrieval Conference (Harman, 1996), namely TREC queries nos. 251–300 (fields topic and description), and run them through Altavista, a search service on the Internet. We use the top ten hits for each query to retrieve about 500 documents. Secondly, we take sixty queries from Magellan, another search service on the Internet. Magellan provides a voyeur page (http://mckinley.voyeur.com/voyeur.cgi/voyeur.1) which displays real user queries in real time. We run the sixty queries through Magellan and similarly obtain about 600 documents. Thirdly, we use history files from local Netscape users to retrieve about 700 additional documents.

Genre                            TREC via     Magellan      History        Total
                                Altavista  Voyeur list
01 Informal, private                   11           67           50          128
02 Public, commercial                  23           87           87          197
03 Searchable indices                   4           14           55           73
04 Journalistic materials              50           28           16           94
05 Reports                            106            5            2          113
06 Other running text                  73           49           38          160
07 FAQs                                 0            4            8           12
08 Link collections                    31           50           67          148
09 Listings, tables                    17          138           70          225
10 Discussions                         16            0            8           24
11 Error messages                      55           36           93          184
Total                                 386          478          494         1358

Figure 5. The current composition of the corpus, by genre and URL source.
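The row and column totals in Figure 5 can be rechecked mechanically, which is also a convenient way to see how each URL source contributes to each genre. The following is a minimal Python sketch. The counts are transcribed from Figure 5. The dictionary name `COMPOSITION` and the helper functions are hypothetical illustrations, not part of the authors' tooling.

```python
# Counts transcribed from Figure 5. Column order:
# (TREC via Altavista, Magellan Voyeur list, History).
COMPOSITION = {
    "01 Informal, private": (11, 67, 50),
    "02 Public, commercial": (23, 87, 87),
    "03 Searchable indices": (4, 14, 55),
    "04 Journalistic materials": (50, 28, 16),
    "05 Reports": (106, 5, 2),
    "06 Other running text": (73, 49, 38),
    "07 FAQs": (0, 4, 8),
    "08 Link collections": (31, 50, 67),
    "09 Listings, tables": (17, 138, 70),
    "10 Discussions": (16, 0, 8),
    "11 Error messages": (55, 36, 93),
}


def row_totals(table):
    """Number of documents per genre, summed over the three URL sources."""
    return {genre: sum(counts) for genre, counts in table.items()}


def column_totals(table):
    """Number of documents per URL source, summed over all genres."""
    return tuple(sum(column) for column in zip(*table.values()))


def genre_shares(table):
    """Each genre's share of the whole corpus, as a fraction."""
    totals = row_totals(table)
    grand_total = sum(totals.values())
    return {genre: total / grand_total for genre, total in totals.items()}


if __name__ == "__main__":
    print(column_totals(COMPOSITION))        # (386, 478, 494)
    print(sum(column_totals(COMPOSITION)))   # 1358
    for genre, share in genre_shares(COMPOSITION).items():
        print(f"{genre:28s} {share:6.1%}")
```

Recomputing the sums shows that every row and column total printed in Figure 5 is consistent. The shares also make the skew between sources visible. For example, Listings, tables is the largest genre at roughly a sixth of the corpus, while Reports come almost entirely from the TREC queries (106 of 113).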
Evaluating the choice of genres

To evaluate the genre palette, we sent the list of genres we had settled on to the same recipients from whom we originally solicited the genre distinctions, asking whether they understood what the genres represented and whether any obvious genre was missing. We received 102 responses. Most respondents claimed to understand what type of text our genre labels were intended to cover. Most categories received comments of one form or another, but most comments were caused by our giving too few examples of what the genres were intended to cover. The largest share of comments concerned the category Interactive pages: many respondents were annoyed that this category was not of the same type as the others. Some respondents objected to or did not understand the labels (e.g., FAQ, Listings, tables, or Error messages); many asked for a download page or FTP database category; some wondered about the all-inclusiveness of Other running text; several asked for a specific category for search engines; and several suggested more content-based genres.
Many pointed out that some of the categories were less suitable for searching, in that they could not imagine themselves ever searching specifically for Error messages or Interactive pages. Several respondents pointed out that the categories were not mutually exclusive. In summary, the most central objections were either of a kind that would be remedied in an interactive situation where examples are readily available, or requests for more flexible genre assignment.

Recognizing genres automatically

The genre palette, besides being intuitively understandable, needs to be workable for automatic analysis. We calculate a fairly large number of textual features for each individual text and combine them into a categorization decision using a machine learning algorithm. The pioneering work by Douglas Biber (1988, 1989) on computational, corpus-based stylistics has been descriptive rather than predictive, aiming to find distinctions between different registers or varieties of spoken and written language. It has made use of large numbers of stylistic features collected from previous, non-computational work, weighing them together using standard methods from multivariate statistics. We use this work as a basis for ours.
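The simple stylistic items discussed above, such as pronoun counts and word length distributions in the spirit of Mendenhall (1887), are cheap to compute. The following is a minimal Python sketch that assumes plain-text input. The pronoun list and the function name `stylistic_items` are hypothetical. Syntactic items such as agentless passives need a parser, so the sketch leaves them out.

```python
import re
from collections import Counter

# Hypothetical, deliberately small word class; studies such as Biber (1988)
# rely on much larger, carefully curated lists.
PERSONAL_PRONOUNS = {"i", "me", "we", "us", "you", "he", "him", "she", "her",
                     "it", "they", "them"}
WORD_RE = re.compile(r"[A-Za-z]+")


def stylistic_items(text: str) -> dict:
    """Compute a few simple lexical stylistic measures for one text."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    n = len(words)
    if n == 0:
        return {"pronoun_rate": 0.0, "mean_word_length": 0.0,
                "length_distribution": {}}
    lengths = Counter(len(w) for w in words)
    return {
        # relative frequency of personal pronouns among all word tokens
        "pronoun_rate": sum(w in PERSONAL_PRONOUNS for w in words) / n,
        "mean_word_length": sum(len(w) for w in words) / n,
        # share of tokens of each length, like a Mendenhall-style curve
        "length_distribution": {k: lengths[k] / n for k in sorted(lengths)},
    }


if __name__ == "__main__":
    print(stylistic_items("We think you like it. Nobody asked them."))
```

For the input "We think you like it. Nobody asked them." the function returns a pronoun rate of 0.5, a mean word length of 3.875, and the length distribution {2: 0.25, 3: 0.125, 4: 0.25, 5: 0.25, 6: 0.125}. Taken singly, each such number says little. The point made in the text is that many of them together separate genres.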
Most of the features from Biber's work that we use here are fairly lexical in nature, for ease of processing: for instance, the relative frequency of certain classes of words, such as personal pronouns, emphatic expressions, or downtoning expressions. We add more general textual and genre-specific features, such as the relative number of digits or the average word length (Karlgren, 1996; Karlgren and Straszheim, 1997). Still other features are geared specifically to the Internet material we have been using for experimentation, such as the number of images or the number of HREF links in the document. We normalize the measurements by mean and standard deviation, and combine them (40 of them, at present) into simple if-then categorization rules using C4.5, a non-parametric categorization tool (Quinlan, 1993).

If there are more "because" than average, longer words than average, and the type-token ratio is above average, then the object is of class Textual with a certainty of 90.0%.

Figure 6. An example classification rule.

We have a few dozen rules to categorize texts into one of the eleven genres defined in the sections above. The genres partition into two major hypercategories.
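The normalization step and the rule in Figure 6 can be illustrated with a short sketch. Each measurement is replaced by its z-score over the collection, so "above average" simply means "z-score greater than zero". The following is a minimal Python sketch under stated assumptions. The feature subset, the regular expressions, and the function names are hypothetical. The Figure 6 rule is hand-coded here, whereas C4.5 would learn such rules from labeled data. The certainty of 90.0% is returned as 0.90.

```python
import re
from statistics import mean, pstdev

WORD_RE = re.compile(r"[A-Za-z]+")


def raw_features(html: str) -> dict:
    """A small, hypothetical subset of the kinds of measures named in the text."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, enough for a sketch
    words = [w.lower() for w in WORD_RE.findall(text)]
    n = len(words) or 1  # avoid division by zero on pages without words
    return {
        "digits": sum(c.isdigit() for c in text) / max(len(text), 1),
        "word_length": sum(len(w) for w in words) / n,
        "ttr": len(set(words)) / n,  # type-token ratio
        "because": words.count("because") / n,
        "images": len(re.findall(r"<img\b", html, re.IGNORECASE)),
        "hrefs": len(re.findall(r"\bhref\s*=", html, re.IGNORECASE)),
    }


def normalize(rows: list) -> list:
    """Replace each measurement by its z-score over the whole collection."""
    stats = {k: (mean(r[k] for r in rows), pstdev(r[k] for r in rows))
             for k in rows[0]}
    # A feature with zero spread carries no information; map it to 0.0.
    return [{k: (r[k] - m) / s if s else 0.0 for k, (m, s) in stats.items()}
            for r in rows]


def figure6_rule(z: dict):
    """Hand-coded version of the example rule in Figure 6 (not learned here)."""
    if z["because"] > 0 and z["word_length"] > 0 and z["ttr"] > 0:
        return ("Textual", 0.90)
    return None  # rule does not fire; other rules would be tried
```

For example, a paragraph of running prose that uses "because" several times, compared against two link-heavy pages, gets positive z-scores on all three features and is classed as Textual. The link pages do not trigger the rule. Normalizing before rule induction keeps thresholds comparable across features measured on very different scales, such as counts of images versus a type-token ratio.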
The textual hypercategory comprises genres 04, 05, 06, 07, and 10, and the non-textual one comprises genres 01, 02, 03, 08, 09, and 11; each in turn splits into five or six subcategories. These splits are of varying quality. The first split does quite well, with something like a ninety per cent success rate, while the subsplits make the wrong choice somewhere between once in three and once in four times. With additional features and a better-defined genre palette, results should improve. However, to get really useful results the categorization should not be exclusive: every object should potentially be of several genres.

Conclusions

Internet users have a vague sense of genres among the documents they retrieve and read. The impressions users have of genre can be elicited and, to some extent, formalized enough for genre collection. The names of genres should be judiciously chosen to be on an appropriate level of abstraction, so that mismatches will not faze readers.

References

Douglas Biber. 1988. Variation across Speech and Writing. Cambridge University Press.

Douglas Biber. 1989. A typology of English texts. Linguistics, 27:3–43.

Naoufel Ben Cheikh and Magnus Zackrisson. 1994. Genrekategorisering av text för filtrering av elektroniska meddelanden [Genre Classification of Texts for Filtering of Electronic Messages]. Bachelor's thesis in Computer and Systems Sciences, Stockholm University.

Nils Erik Enkvist. 1973. Linguistic Stylistics. The Hague: Mouton.

Donna Harman (ed.). 1996. The Fourth Text REtrieval Conference (TREC-4). National Institute of Standards Special Publication 500-236. Washington.

Fahima Polly Hussain and Ioannis Tzikas. 1995. Ordstatistisk kategorisering av text för filtrering av elektroniska meddelanden [Genre Classification of Texts by Word Occurrence Statistics for Filtering of Electronic Messages]. Bachelor's thesis in Computer and Systems Sciences, Stockholm University.

Jussi Karlgren. 1996. Stylistic Variation in an Information Retrieval Experiment. In Proceedings of NeMLaP-2, Bilkent, September 1996. Ankara: Bilkent University. Also in the Computation and Language E-Print Archive: cmp-lg/9608003.

Jussi Karlgren and Douglass Cutting. 1994. Recognizing Text Genres with Simple Metrics Using Discriminant Analysis. In Proceedings of the 15th International Conference on Computational Linguistics (COLING 94), Kyoto. Also in the Computation and Language E-Print Archive: cmp-lg/9410008.
Jussi Karlgren and Troy Straszheim. 1997. Visualizing Stylistic Variation. In Proceedings of the 30th HICSS, Maui.

W. N. Francis and H. Kucera. 1982. Frequency Analysis of English Usage. Houghton Mifflin.

Gunnel Källgren. 1990. The First Million is Hardest to Get: Corpus Tagging. In Proceedings of the 13th International Conference on Computational Linguistics (COLING 90), Hans Karlgren (ed.), Helsinki.

T. C. Mendenhall. 1887. The Characteristic Curves of Composition. Science, 9:237–249.

J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann.

Tomek Strzalkowski, Louise Guthrie, Jussi Karlgren, Jim Leistensnider, Fang Lin, Jose Perez-Carballo, Troy Straszheim, Jin Wang, and Jon Wilding. 1996. Natural Language Information Retrieval: TREC-5 Report. In Proceedings of the Fifth Text REtrieval Conference (TREC-5), Donna Harman (ed.). National Institute of Standards Special Publication. Washington.

Josef Vachek. 1975. Some remarks on functional dialects of standard languages. In Style and Text: Studies presented to Nils Erik Enkvist, Håkan Ringbom (ed.). Stockholm: Skriptor, and Turku: Åbo Akademi.