These Perl programs can be used to identify the individual genres. contains_xxx.pl recognizes whether genre xxx is contained in the text; find_xxx.pl recognizes whether the complete text is of genre xxx.
Usage:
perl program.pl [DIRECTORY [CORPUS [FILENAME]]]
where CORPUS := train | test
When no filename is given, the whole directory will be processed.
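The following is a minimal sketch of the invocation forms that the usage line allows. The directory ./corpus and the file name example.txt are placeholders, not names from this distribution, and find_blog.pl merely stands in for any of the programs. The helper show is hypothetical: it only prints each command line instead of running it, so the sketch works without the scripts or a corpus at hand.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust to your own setup.
CORPUS_DIR="./corpus"
SAMPLE_FILE="example.txt"

# Hypothetical helper: prints the command line instead of executing it.
show() { printf '%s\n' "$*"; }

# DIRECTORY only
show perl find_blog.pl "$CORPUS_DIR"

# DIRECTORY and CORPUS: no filename, so the whole directory is processed
show perl find_blog.pl "$CORPUS_DIR" train

# DIRECTORY, CORPUS and FILENAME: process a single file
show perl find_blog.pl "$CORPUS_DIR" test "$SAMPLE_FILE"
```

To actually run a program, drop the show prefix. Because the arguments are nested optionals, a FILENAME can only be given together with DIRECTORY and CORPUS.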
For the programs contains_literatur and find_literatur, the TreeTagger has to be installed and the path to it adapted in those programs. Some other programs need already tagged versions of the files. See the corpora for examples of how these files have to look.
- contains_code.pl
- contains_formular.pl (form)
- contains_literaturliste.pl (list of references)
- contains_statistik.pl (statistics)
- find_anleitung.pl (tutorials)
- find_bericht.pl (report)
- find_blog.pl
- find_brief.pl (letter)
- find_code.pl
- find_comment.pl (commentary)
- find_dictionary.pl
- find_drehbuch.pl (screenplay)
- find_erklaerung.pl (explanation)
- find_faq.pl
- find_feature.pl
- find_formular.pl (form)
- find_forum.pl (bulletin board)
- find_gesetz.pl (law)
- find_glossar.pl (glossary)
- find_glosse.pl (gloss, squib)
- find_interview.pl
- find_katalog.pl (catalog)
- find_linklist.pl (list of links)
- find_literaturliste.pl (list of references)
- find_meetingminutes.pl
- find_nachricht.pl (news)
- find_nothing.pl (nothing)
- find_person.pl (list of persons)
- find_poems.pl
- find_portrait.pl
- find_presentation.pl
- find_reportage.pl
- find_rezension.pl (review)
- find_roman.pl (prose)
- find_statistik.pl (statistics)
- find_timeline.pl
- find_wissenschaft.pl (scientific texts)