/Users/andrea/_magisterarbeit/korpus/clean/trainkorpus/27/file5.html NN ----------------------------------------- : Frequently RB Asked VVN Questions NNS Home NP Site NN Map NN Search NP Character NN Properties NP , , Case NP Mappings NNS Names NNS Do VVP all DT scripts NNS have VHP upper JJ and CC lower JJR case NN . SENT Do VV the DT case NN mappings NNS in IN Unicode NP allow VVP a DT round NN trip NN . SENT Doesn't NP this DT cause NN a DT problem NN . SENT Why WRB aren't NN there RB extra JJ characters NNS to TO support VV locale NN independent JJ casing NN for IN Turkish JJ . SENT Why WRB is VBZ there RB no DT upper JJ case NN SHARP JJ S NP . SENT Is VBZ all RB of IN the DT Unicode NP case NN mapping NN information NN in IN UnicodeData NP . SENT txt NN . SENT Near IN the DT end NN of IN the DT SpecialCasing NP . SENT txt RB , , there EX are VBP the DT two CD lines NNS on IN SIGMA NN that IN look NN weird JJ to TO me PP . SENT Could MD you PP explain VV them PP . SENT Where WRB are VBP private JJ use NN characters NNS used VVN , , and CC how WRB should MD they PP be VB handled VVN . SENT The DT character NN name NN for IN the DT control NN character NN U NP 0082 CD is VBZ BREAK NN PERMITTED VVN HERE RB . SENT Does VVZ that RB mean VV I PP have VHP to TO interpret VV that DT control NN character NN in IN that DT way NN . SENT Where WRB can MD I PP find VV formal JJ definitions NNS of IN the DT terms NNS used VVN in IN the DT Character NN Name NN field NN of IN the DT UnicodeData NP . SENT txt NN file NN . SENT Most RBS specifically RB , , precise JJ explanations NNS of IN designations NNS like IN turned VVN , , inverse JJ , , inverted JJ , , reversed VVD , , rotated VVD . SENT Are VBP any DT unassigned JJ characters NNS or CC reserved JJ characters NNS given VVN default NN properties NNS . SENT Unicode NP now RB treats VVZ the DT SOFT JJ HYPHEN NN as IN format NN control NN Cf NP character NN when WRB formerly RB it PP was VBD a DT punctuation NN character NN Pd NP . SENT Doesn't NP this DT break NN ISO NP 8859 CD 1 CD compatibility NN . SENT Where WRB can MD I PP find VV the DT numerical JJ values NNS of IN characters NNS with IN the DT Hexadecimal NN Digit NN property NN . SENT Why WRB is VBZ the DT hacek NP accent NN called VVD caron NP in IN Unicode NP . SENT Q NP . SENT Do VV all DT scripts NNS have VHP upper JJ and CC lower JJR case NN . SENT No RB , , as IN a DT matter NN of IN fact NN , , most JJS scripts NNS do VVP not RB have VH cases NNS . SENT JR NP Q NP . SENT Do VV the DT case NN mappings NNS in IN Unicode NP allow VVP a DT round NN trip NN . SENT A DT . SENT No RB , , there EX are VBP instances NNS where WRB two CD characters NNS map VVP to TO the DT same JJ result NN . SENT For IN example NN , , both CC a DT sigma NN and CC a DT final JJ sigma NN uppercase JJ to TO a DT capital NN sigma NN . SENT There EX are VBP other JJ cases NNS where WRB the DT uppercase JJ of IN a DT character NN requires VVZ decomposition NN . SENT In IN some DT cases NNS , , the DT correct JJ mapping NN also RB depends VVZ on IN the DT locale NN . SENT For IN example NN , , in IN Turkish JJ , , an DT i NP maps NNS to TO an DT uppercase JJ dotted JJ I NN . SENT MD NP Q NP . SENT Doesn't NP this DT cause NN a DT problem NN . SENT A DT . SENT Remember VV that IN in IN general JJ , , case NN mappings NNS of IN strings NNS lose VVP information NN and CC thus RB do VVP not RB allow VV round NN tripping VVG . SENT Take VV the DT word NN anglo NP American NP or CC the DT Italian JJ word NN vederLa NN . SENT Once RB you PP uppercase VV , , lowercase VV or CC titlecase VV these DT strings NNS , , you PP can't VVD recover VV the DT original JJ just RB by IN performing VVG the DT reverse JJ operation NN . SENT MD NP Q NP . SENT Why WRB aren't NN there RB extra JJ characters NNS to TO support VV locale NN independent JJ casing NN for IN Turkish JJ . SENT A DT . SENT The DT fact NN is VBZ that IN there EX is VBZ too RB much JJ data NNS coded VVN in IN 8859 CD 9 CD with IN 0 CD xDD JJ LATIN JJ CAPITAL NN LETTER NN I NN WITH IN DOT NP and CC 0 CD xFD NN LATIN JJ SMALL NP LETTER NN DOTLESS NP I NN which WDT contains VVZ both DT Turkish JJ and CC non JJ Turkish JJ text NN . SENT Transcoding VVG this DT data NNS to TO Unicode NP would MD be VB intolerably RB difficult JJ if IN it PP all RB had VHD to TO be VB tagged VVN first RB to TO sort VV out IN which WDT 0 CD x NN 49 CD characters NNS are VBP ordinary JJ I PP and CC which WDT are VBP CAPITAL NN LETTER NN DOTLESS NP I NP . SENT Better RBR to TO accept VV the DT compromise NN and CC get VV on RP with IN moving VVG to TO Unicode NP . SENT Moreover RB , , there EX is VBZ a DT strong JJ doubt NN that IN users NNS will MD get VV it PP right RB in IN future NN either CC when WRB they PP enter VVP new JJ characters NNS . SENT JC NP Q NP . SENT Why WRB is VBZ there RB no DT upper JJ case NN SHARP JJ S NP . SENT A DT . SENT There EX are VBP 139 CD lower JJR case NN letters NNS in IN Unicode NP 2 CD . SENT 1 CD that WDT have VHP no DT direct JJ uppercase JJ equivalent NN . SENT Should MD there RB be VB introduced VVN new JJ bogus JJ characters NNS for IN all DT of IN them PP , , so RB that IN when WRB you PP see VVP an DT fl NN ligature NN you PP can MD uppercase VV it PP to TO FL NP without IN expanding VVG anything NN . SENT Of IN course NN not RB . SENT Note NN that IN case NN conversion NN is VBZ inherently RB language NN sensitive JJ , , notably RB in IN the DT case NN of IN IPA NP , , which WDT needs VVZ to TO be VB left VVN strictly RB alone RB even RB when WRB embedded VVN in IN another DT language NN which WDT is VBZ being VBG case NN converted VVN . SENT The DT best JJS you PP can MD get VV is VBZ an DT approximate JJ fit NN . SENT JC NP Q NP . SENT Is VBZ all RB of IN the DT Unicode NP case NN mapping NN information NN in IN UnicodeData NP . SENT txt NN . SENT A DT . SENT No UH . SENT The DT UnicodeData NP . SENT txt NN file NN includes VVZ all DT of IN the DT 1 CD . SENT 1 CD case NN mappings NNS , , but CC doesn't NN include VVP 1 CD . SENT many JJ mappings NNS such JJ as IN the DT one NN needed VVN for IN uppercasing VVG . SENT Since IN many JJ parsers NNS now RB expect VVP this DT file NN to TO have VH at IN most JJS single JJ characters NNS in IN the DT case NN mapping NN fields NNS , , an DT additional JJ file NN SpecialCasing NP . SENT txt NN was VBD added VVN to TO provide VV the DT 1 CD . SENT many JJ mappings NNS . SENT For IN more JJR information NN , , see VVP UTR NP 21 CD Case NP Mappings NNS MD NP Q NP . SENT Near IN the DT end NN of IN the DT SpecialCasing NP . SENT txt RB , , there EX are VBP the DT two CD lines NNS on IN SIGMA NN that IN look NN weird JJ to TO me PP . SENT Can MD you PP explain VV them PP . SENT 03 CD C NP 3 CD . SENT 03 CD C NP 2 CD . SENT 03 CD A NP 3 CD . SENT 03 CD A NP 3 CD . SENT FINAL JJ . SENT GREEK NP SMALL NP LETTER NN SIGMA NN 03 CD C NP 2 CD . SENT 03 CD C NP 3 CD . SENT 03 CD A NP 3 CD . SENT 03 CD A NP 3 CD . SENT NON JJ FINAL JJ . SENT GREEK NP SMALL NP LETTER NN FINAL JJ SIGMA NN A NP . SENT Both CC of IN these DT are VBP conditional JJ column NN 5 CD . SENT that DT is VBZ , , in IN normal JJ Greek JJ text NN a DT 03 CD C NP 3 CD non JJ final JJ sigma NN should MD be VB written VVN as IN 03 CD C NP 2 CD final JJ sigma NN if IN it PP is VBZ at IN the DT end NN of IN a DT word NN , , and CC a DT 03 CD C NP 2 CD final JJ sigma NN should MD be VB written VVN as IN a DT 03 CD C NP 3 CD non JJ final JJ sigma NN if IN it PP is VBZ not RB at IN the DT end NN of IN a DT word NN . SENT That's NNS what WP these DT two CD lines NNS would MD mean VV if IN they PP were VBD uncommented JJ . SENT However RB , , they PP are VBP commented VVN , , just RB for IN that DT reason NN . SENT the DT SpecialCasing NP file NN is VBZ not RB intended VVN to TO normalize VV the DT appearance NN of IN a DT small JJ sigma NN . SENT MD NP Q NP . SENT Where WRB are VBP private JJ use NN characters NNS used VVN , , and CC how WRB should MD they PP be VB handled VVN . SENT A DT . SENT Private JJ use NN characters NNS also RB known VVN as IN user NN defined VVN characters NNS are VBP used VVN commonly RB in IN East NP Asia NP , , particularly RB Japan NP , , China NP , , and CC Korea NP , , to TO extend VV the DT available JJ characters NNS in IN various JJ national JJ standard NN and CC vendor NN character NN sets NNS . SENT The DT Unicode NP Standard NP also RB makes VVZ provision NN for IN private JJ use NN characters NNS . SENT Since IN the DT Unicode NP Standard NP includes VVZ so RB many JJ more JJR standard JJ characters NNS than IN any DT other JJ character NN encoding VVG , , there EX is VBZ less JJR of IN a DT requirement NN for IN private JJ use NN characters NNS than IN in IN a DT typical JJ legacy NN character NN set VVD . SENT however RB , , there EX are VBP occasionally RB cases NNS where WRB characters NNS that WDT are VBP not RB yet RB in IN the DT standard JJ need NN to TO be VB represented VVN by IN codepoints NNS in IN the DT Private NP Use NP Area NP PUA NP . SENT Some DT private JJ use NN characters NNS may MD never RB get VV standard JJ encodings NNS for IN one CD reason NN or CC another DT . SENT Also RB , , a DT particular JJ implementation NN may MD choose VV to TO use VV private JJ use NN characters NNS for IN specific JJ internal JJ purposes NNS . SENT It PP is VBZ relatively RB easy JJ for IN Input NN Method NN Editors NNS IME JJ to TO allow VV private JJ use NN characters NNS to TO be VB added VVN in IN the DT PUA NP , , keeping VVG track NN of IN the DT text NN sequence NN that WDT should MD convert VV to TO those DT private JJ use NN characters NNS . SENT With IN modern JJ font NN technologies NNS such JJ as IN OpenType NP and CC AAT NP , , these DT characters NNS can MD also RB be VB added VVN to TO fonts NNS for IN display NN . SENT However RB , , the DT same JJ codepoints NNS in IN the DT PUA NP may MD be VB given VVN different JJ meanings NNS in IN different JJ contexts NNS , , since IN they PP are VBP , , after IN all DT , , defined VVN by IN users NNS and CC are VBP not RB standardized JJ . SENT If IN text NN comes VVZ , , for IN example NN , , from IN a DT legacy NN NEC NP encoding VVG in IN Japan NP , , the DT same JJ codepoint NN in IN the DT PUA NP may MD mean VV something NN entirely RB different JJ if IN interpreted VVN on IN a DT legacy NN Fujitsu NP machine NN , , even RB though IN both DT systems NNS would MD share VV the DT same JJ standard JJ codepoints NNS . SENT For IN each DT given VVN interpretation NN of IN a DT private JJ use NN character NN one NN would MD have VH to TO pick VV the DT appropriate JJ IME JJ user NN dictionary NN and CC fonts NNS to TO work VV with IN it PP . SENT One PP should MD not RB expect VV the DT rest NN of IN an DT operating VVG system NN to TO override VV the DT character NN properties NNS for IN these DT private JJ use NN characters NNS , , since IN private JJ use NN characters NNS can MD have VH different JJ meanings NNS , , depending VVG on IN how WRB they PP originated VVP . SENT In IN terms NNS of IN line NN breaking NN , , case NN conversions NNS , , and CC other JJ textual JJ processes NNS , , private JJ use NN characters NNS will MD typically RB be VB treated VVN by IN the DT operating VVG system NN as IN otherwise RB undistinguished JJ letters NNS or CC ideographs NNS with IN no DT uppercase JJ lowercase JJ distinctions NNS . SENT MD NP and CC KW NP Q NP . SENT The DT character NN name NN for IN the DT control NN character NN U NP 0082 CD is VBZ BREAK NN PERMITTED VVN HERE RB . SENT Does VVZ that RB mean VV I PP have VHP to TO interpret VV that DT control NN character NN in IN that DT way NN . SENT A DT . SENT The DT character NN names NNS are VBP actually RB undefined JJ , , and CC simply RB marked VVD by IN to TO indicate VV their PP$ functional JJ use NN . SENT What WP you PP are VBP thinking VVG of IN as IN names NNS are VBP marked VVN as IN aliases NNS pointing VVG to TO the DT ISO NP 6429 CD usage NN , , as RB in IN http NN . SENT www JJ . SENT unicode NN . SENT org NP charts VVZ PDF NP U NP 0080 CD . SENT pdf NN . SENT The DT Unicode NP Standard NP does VVZ not RB define VV U NP 0082 CD to TO mean VV BREAK NN PERMITTED VVN HERE RB . SENT It PP just RB says VVZ that IN it PP is VBZ a DT control NN code NN , , one CD which WDT in IN ISO NP 6429 CD has VHZ that DT name NN and CC meaning NN . SENT Implementers NNS of IN the DT Unicode NP Standard NP are VBP not RB required VVN to TO interpret VV the DT U NP 0082 CD in IN accordance NN with IN ISO NP 6429 CD or CC to TO interpret VV it PP at IN all DT . SENT The DT standard NN does VVZ assign VV particular JJ properties NNS and CC semantics NNS for IN the DT high JJ use NN controls NNS , , including VVG tab NN , , carriage NN return NN , , line NN feed NN , , form NN feed NN , , and CC next JJ line NN . SENT But CC it PP does VVZ not RB give VV the DT majority NN of IN control NN codes NNS any DT semantics NNS at IN all DT . SENT that WDT is VBZ left VVN to TO a DT higher JJR level NN protocol NN . SENT MD NP Q NP . SENT Where WRB can MD I PP find VV formal JJ definitions NNS of IN the DT terms NNS used VVN in IN the DT Character NN Name NN field NN of IN the DT UnicodeData NP . SENT txt NN file NN . SENT Most RBS specifically RB , , precise JJ explanations NNS of IN designations NNS like IN turned VVN , , inverse JJ , , inverted JJ , , reversed VVD , , rotated VVD A DT . SENT These DT terms NNS are VBP basically RB typographical JJ rather RB than IN Unicode NN specific NN . SENT A DT turned VVN character NN is VBZ one NN that WDT has VHZ been VBN rotated VVD 180 CD degrees NNS around IN its PP$ center NN . SENT A DT turned VVN e NN winds NNS up RB with IN the DT opening NN in IN the DT upper JJ left JJ portion NN . SENT U NP 0259 CD LATIN NP SMALL NP LETTER NN SCHWA NN is VBZ a DT turned VVN e NN . SENT An DT inverted JJ character NN has VHZ been VBN flipped VVN along IN the DT horizontal JJ axis NN . SENT An DT inverted JJ e NN winds NNS up RB with IN the DT opening NN in IN the DT upper JJ right NN portion NN . SENT There EX is VBZ no DT Unicode NP character NN representing VVG an DT inverted JJ e NN . SENT A DT reversed JJ character NN has VHZ been VBN flipped VVN along IN the DT vertical JJ axis NN . SENT A DT reversed JJ e NN winds NNS up RB with IN the DT opening NN in IN the DT lower JJR left NN portion NN . SENT U NP 0258 CD LATIN NP SMALL NP LETTER NN REVERSED VVD E NP is VBZ an DT reversed JJ e NN . SENT A DT rotated VVD character NN has VHZ been VBN rotated VVD 90 CD degrees NNS , , but CC one CD can't NN tell VV which WDT way NN without IN looking VVG at IN the DT glyph NN . SENT U NP 213 NP A NP ROTATED VVD CAPITAL NP Q NP is VBZ a DT Q NP that WDT has VHZ been VBN rotated VVN counterclockwise RB . SENT Inverse JJ means NNS that IN the DT white JJ parts NNS of IN the DT glyph NN are VBP made VVN black JJ , , and CC vice NN versa FW . SENT An DT inverse NN e SYM looks VVZ like IN a DT normal JJ e NN but CC is VBZ white JJ on IN a DT black JJ background NN . SENT There EX is VBZ no DT Unicode NP character NN representing VVG an DT inverse JJ e NN . SENT JC NP Q NP . SENT Are VBP any DT unassigned JJ characters NNS or CC reserved JJ characters NNS given VVN default NN properties NNS . SENT A DT . SENT The DT Bidi NP Algorithm NN UAX NP 9 CD gives VVZ different JJ default NN Bidi NP Class NP property NN values NNS to TO certain JJ ranges NNS of IN unassigned JJ codepoints NNS . SENT see VV the DT discussion NN of IN the DT Bidi NP Class NP in IN UCD NP . SENT html NN for IN details NNS . SENT This DT is VBZ different JJ than IN the DT general JJ policy NN of IN giving VVG a DT single JJ default NN value NN to TO all DT unassigned JJ codepoints NNS . SENT Also RB look VV at IN the DT UCD NP file NN DerivedBidiClass NN . SENT txt NN which WDT assigns VVZ Bidi NP Class NP values VVZ to TO the DT unassigned JJ codepoints NNS anything NN not RB mentioned VVN in IN that DT file NN belongs VVZ to TO class NN L NP . SENT Note NN . SENT for IN each DT Unicode NP property NN , , UCD NP . SENT html NN also RB summarizes VVZ where WRB to TO find VV the DT data NNS for IN the DT property NN values NNS , , and CC the DT default NN value NN used VVN for IN unassigned JJ characters NNS . SENT Q NP . SENT Unicode NP now RB treats VVZ the DT SOFT JJ HYPHEN NN as IN format NN control NN Cf NP character NN when WRB formerly RB it PP was VBD a DT punctuation NN character NN Pd NP . SENT Doesn't NP this DT break NN ISO NP 8859 CD 1 CD compatibility NN . SENT A DT . SENT No UH . SENT The DT ISO NP 8859 CD 1 CD standard NN defines VVZ the DT SOFT JJ HYPHEN NN as IN a DT graphic JJ character NN that WDT is VBZ imaged VVN by IN a DT graphic JJ symbol NN identical JJ with IN , , or CC similar JJ to TO , , that IN representing VVG hyphen NN section NN 6 CD . SENT 3 LS . SENT 3 LS , , but CC does VVZ not RB specify VV details NNS of IN how WRB or CC when WRB it PP is VBZ to TO be VB displayed VVN , , nor CC other JJ details NNS of IN its PP$ semantics NNS . SENT The DT soft JJ hyphen NN has VHZ had VHN a DT long JJ history NN of IN legacy NN implementation NN in IN two CD or CC more JJR incompatible JJ ways NNS . SENT Unicode NP clarifies VVZ the DT semantics NNS of IN this DT character NN for IN Unicode NP implementations NNS , , but CC this DT does VVZ not RB affect VV its PP$ usage NN in IN ISO NP 8859 CD 1 CD implementations NNS . SENT Processes NNS that WDT convert VVP back RB and CC forth RB may MD need VV to TO pay VV attention NN to TO semantic JJ differences NNS between IN the DT standards NNS , , just RB as RB for IN any DT other JJ character NN . SENT In IN a DT terminal JJ emulation NN environment NN , , particularly RB in IN ISO NP 8859 CD 1 CD contexts NNS , , one PP could MD display VV the DT soft JJ hyphen NN as IN a DT hyphen NN in IN all DT circumstances NNS . SENT The DT change NN in IN semantics NNS of IN the DT Unicode NP character NN does VVZ not RB require VV that IN implementations NNS of IN terminal JJ emulators NNS in IN other JJ environments NNS , , such JJ as IN ISO NP 8859 CD 1 CD , , make VVP any DT change NN in IN their PP$ current JJ behavior NN . SENT Q NP . SENT Where WRB can MD I PP find VV the DT numerical JJ values NNS of IN characters NNS with IN the DT Hexadecimal NN Digit NN Hex NN Digit NN property NN . SENT A DT . SENT The DT Unicode NP Standard NP provides VVZ the DT Hex NN Digit NN property NN , , which WDT specifies VVZ which WDT characters NNS are VBP hexadecimal NN digits NNS . SENT 0 CD 9 CD , , A DT F NN , , a DT f NN , , and CC their PP$ fullwidth NN equivalents NNS . SENT The DT ASCII NP Hex NN Digit NN property NN specifies VVZ the DT intersection NN of IN the DT Hex NN Digit NN property NN and CC the DT Basic JJ Latin JJ block NN . SENT There EX is VBZ no DT table NN in IN the DT UCD NP mapping NN the DT hexadecimal NN digit NN characters NNS to TO their PP$ values NNS , , analogous JJ to TO the DT Numeric JJ Value NP property NN . SENT The DT table NN linked VVN here RB removes VVZ this DT real JJ , , if IN trivial JJ , , gap NN . SENT JC NP Q NP . SENT Why WRB is VBZ the DT hacek NP accent NN called VVD caron NP in IN Unicode NP . SENT A DT . SENT Nobody NN knows VVZ . SENT Legend NP has VHZ it PP that IN the DT term NN was VBD first RB spotted VVN in IN one CD of IN the DT giant JJ books NNS from IN the DT 30 CD s PP at IN Mergenthaler NP Linotype NN Company NN in IN Brooklyn NP , , NY NP . SENT but CC no DT one NN has VHZ been VBN able JJ to TO confirm VV that IN . SENT More RBR accurate JJ reports NNS trace VV the DT term NN back RB to TO the DT mid JJ 80 CD s PP where WRB we PP do VVP have VH documented VVN sightings NNS of IN caron NP in IN publications NNS such JJ as RB . SENT The DT TypEncyclopedia NP by IN Frank NP Romano NP , , ISBN NP . SENT 0 CD 835 JJ 21925 CD 9 CD , , Libraries NNS Unlimited NP . SENT 1984 CD p NN . SENT 6 CD shows NNS the DT mark NN with IN the DT notation NN caron NP hacek NP clicka NP IBM's NP Green NP Book NP which WDT has VHZ an DT original JJ copyright NN date NN of IN 1986 CD . SENT Caron NP Accent NN appears VVZ on IN p NN . SENT K NP 432 CD , , in IN a DT table NN entitled VVD Diacritic NN Mark NP Special JJ Graphic JJ Characters NNS . SENT National NP Language NP Support NN Reference NN Manual NP . SENT 4 CD th NN ed NP . SENT 1994 CD . SENT National NP Language NP Design NP Guide NP , , 2 CD SGML NP Adobe NP documentation NN in IN this DT 1986 CD reference NN Unicode NP and CC ISO NP 8859 CD x SYM just RB carried VVN the DT tradition NN along RB . SENT In IN an DT article NN published VVN in IN 2001 CD . SENT Orthographic JJ diacritics NNS and CC multilingual JJ computing NN , , J NP . SENT C SYM . SENT Wells NP a DT linguist NN at IN the DT University NP College NP in IN London NP writes VVZ . SENT The DT term NN caron NP , , however RB , , is VBZ wrapped VVN in IN mystery NN . SENT Incredibly RB , , it PP seems VVZ to TO appear VV in IN no DT current JJ dictionary NN of IN English NP , , not RB even RB the DT OED NP . SENT Whoever WP the DT originator NN is VBZ , , we PP suspect VVP that IN he PP has VHZ probably RB taken VVN his PP$ secret NN to TO the DT grave NN by IN now RB . SENT Various JJ authors NNS