Design, Creation and Use of a French-Czech and Czech-French Parallel Corpus
(Czech title: Návrh, vytvoření a využití francouzsko-českého a česko-francouzského paralelního korpusu)

Svášek, Martin

oai:https://dspace.cuni.cz:20.500.11956/12105

Author: Martin Svášek
Publication date: 1 January 2007
Publisher: Univerzita Karlova, Filozofická fakulta (Charles University, Faculty of Arts)

Abstract

The dissertation consists of three parts, corresponding to its title. The author first introduces the concept of a parallel corpus and defines it in general terms as a set of texts in several (at least two) languages, made up of original-translation pairs. Terminology is provided for naming the different kinds of multilingual text collections. To give a general overview of the field, the author also introduces readers to the parallel corpora that currently exist.

Next, the French-Czech and Czech-French parallel corpus Fratchque is defined with a view to its future use in linguistic research. Its main intended use is the search for indeclinable expressions, which are the author's central interest. Fratchque is a parallel corpus of fiction written in French and Czech, and the author explains the difficulties that prevented other text types from being included. The corpus exists only in digital form so that it can be searched by computer. Because it aims to reflect the modern language, it contains only texts dating from after 1945. Its file structure is stored on a hard disk and managed by the ParaConc program, which allows new pairs of French-Czech or Czech-French texts to be added in the future. The corpus is not explicitly marked up with XML tags, since tagging is not currently needed for ParaConc to function properly; this step,...

Institute of the Czech National Corpus (Ústav českého národního korpusu), Faculty of Arts (Filozofická fakulta)

Doctoral dissertation
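The abstract defines a parallel corpus as a set of original-translation pairs in at least two languages. It describes Fratchque as a bidirectional French-Czech and Czech-French corpus of fiction from after 1945 that can be extended with new pairs and searched by computer. The Python sketch below is a minimal illustration of that definition, built on several assumptions:

* The class names, fields, language codes and whole-word concordance search are all hypothetical.
* It does not reproduce ParaConc's file structure or search interface.
* The two sentence pairs are toy examples invented for illustration, not data from Fratchque.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical model of a bidirectional French-Czech / Czech-French parallel
# corpus: each entry pairs an original text with its translation, aligned
# segment by segment. Illustrative only; not ParaConc's file format or API.

CORPUS_LANGS = {"fr", "cs"}  # assumed ISO 639-1 language codes


@dataclass
class TextPair:
    title: str
    source_lang: str  # language of the original text
    target_lang: str  # language of the translation
    year: int  # publication year of the original (assumed selection criterion)
    genre: str
    segments: list[tuple[str, str]]  # aligned (original, translation) segments


@dataclass
class ParallelCorpus:
    pairs: list[TextPair] = field(default_factory=list)

    def add_pair(self, pair: TextPair) -> None:
        """Add a new original-translation pair if it meets the selection criteria."""
        if {pair.source_lang, pair.target_lang} != CORPUS_LANGS:
            raise ValueError("pair must be French-Czech or Czech-French")
        if pair.genre != "fiction":
            raise ValueError("only fiction is admitted")
        # "After 1945": excluding 1945 itself is an assumption.
        if pair.year <= 1945:
            raise ValueError("only texts from after 1945 are admitted")
        self.pairs.append(pair)

    def concordance(self, query: str, lang: str) -> list[tuple[str, str]]:
        """Return (matching segment, aligned counterpart) for whole-word hits in `lang`."""
        pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
        hits: list[tuple[str, str]] = []
        for pair in self.pairs:
            for original, translation in pair.segments:
                # Search whichever side of the pair is written in `lang`.
                if pair.source_lang == lang:
                    text, counterpart = original, translation
                else:
                    text, counterpart = translation, original
                if pattern.search(text):
                    hits.append((text, counterpart))
        return hits


# Toy sentence pairs invented for illustration (not taken from Fratchque).
corpus = ParallelCorpus()
corpus.add_pair(TextPair("toy-fr", "fr", "cs", 1960, "fiction",
                         [("C'est super.", "To je super.")]))
corpus.add_pair(TextPair("toy-cs", "cs", "fr", 1975, "fiction",
                         [("Byl to prima den.", "C'était une belle journée.")]))

print(corpus.concordance("super", "cs"))  # [('To je super.', "C'est super.")]
print(corpus.concordance("prima", "cs"))  # [('Byl to prima den.', "C'était une belle journée.")]
```

The sketch searches only the side of each pair written in the requested language and returns the aligned counterpart, which is the basic idea of a parallel concordance. The checks in `add_pair` encode the language, genre and date criteria stated in the abstract.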

Record last updated on 17 April 2021.

This dissertation was published in the CU Digital Repository.