.. _overview:

Overview
========

GreynirCorrect can be used in three modes, depending on your requirements.

* You can use it as a Python package to proofread text at the
  **token (word) level**, correcting and annotating individual errors
  in the token stream.

* It can also perform more extensive grammar checking at the
  **sentence level**, returning a list of annotations for each sentence.

* Finally, it can be used as a **command-line tool**, at the token level,
  to consume an input file stream (or *stdin*) and write to a corrected
  output file stream (or *stdout*).

These modes are described below. The :ref:`Reference` section
contains further detail.

Token-level correction
----------------------

GreynirCorrect can tokenize text and return an automatically corrected
token stream. This catches token-level errors, such as spelling errors and
erroneous fixed phrases (*?að ýmsu leiti* → *að ýmsu leyti*), but not
grammatical errors. The returned token objects are annotated with
explanations of each correction or suggestion.

Token-level correction is relatively fast.

Full grammar analysis
---------------------

GreynirCorrect can analyze text grammatically by attempting to parse each
sentence in turn, after token-level correction. The parsing is done
according to Greynir's context-free grammar for Icelandic, augmented with
additional production rules for common grammatical errors
(*?Manninum á verkstæðinu vantaði hamar* →
*Manninn á verkstæðinu vantaði hamar*). The analysis returns each sentence
along with a set of annotations (errors and suggestions) that apply to
spans (consecutive tokens) within the sentence, in addition to individual
token annotations.

Full grammar analysis of sentences is slower than token-level correction.

Command-line tool
-----------------

GreynirCorrect can be invoked as a command-line tool to perform token-level
correction.
The command is ``correct infile.txt outfile.txt``, where ``infile.txt``
and ``outfile.txt`` are the input and output filenames, respectively.
The command-line tool is further documented :ref:`here `.

Examples
--------

To perform token-level correction from Python code:

.. code-block:: python

   from reynir_correct import tokenize

   g = tokenize(
       "Af gefnu tilefni fékk fékk daninn vilja sýnum "
       "framgengt í auknu mæli."
   )
   for tok in g:
       print("{0:10} {1}".format(tok.txt or "", tok.error_description))

Output::

   Að         Orðasambandið 'Af gefnu tilefni' var leiðrétt í 'að gefnu tilefni'
   gefnu
   tilefni
   fékk       Endurtekið orð ('fékk') var fellt burt
   Daninn     Orð á að byrja á hástaf: 'daninn'
   vilja      Orðasambandið 'vilja sýnum framgengt' var leiðrétt í 'vilja sínum framgengt'
   sínum
   framgengt
   í          Orðasambandið 'í auknu mæli' var leiðrétt í 'í auknum mæli'
   auknum
   mæli
   .

To perform full spelling and grammar analysis of a sentence from Python code:

.. code-block:: python

   from reynir_correct import check_single

   sent = check_single("Páli, vini mínum, langaði að horfa á sjónnvarpið.")
   for annotation in sent.annotations:
       print("{0}".format(annotation))

Output::

   000-004: P_WRONG_CASE_þgf_þf Á líklega að vera 'Pál, vin minn' / [Pál , vin minn]
   009-009: S004 Orðið 'sjónnvarpið' var leiðrétt í 'sjónvarpið'

.. code-block:: python

   >>> sent.tidy_text

Output::

   'Páli, vini mínum, langaði að horfa á sjónvarpið.'

Note that the ``annotation.start`` and ``annotation.end`` properties
(here ``start`` is 0 and ``end`` is 4) contain the indices of the first and
last tokens to which the annotation applies.
``P_WRONG_CASE_þgf_þf`` and ``S004`` are error codes.
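
Because ``start`` and ``end`` are both inclusive, the span covered by an annotation is one token longer than a plain Python slice ``tokens[start:end]`` would give. The following minimal sketch illustrates this without calling the library. The ``TOKENS`` list is a hand-written stand-in for the tokenizer's output and the ``span_tokens`` helper is hypothetical; neither is part of the GreynirCorrect API.

```python
# Hand-written token texts for the example sentence. This is an assumption
# made for illustration: the library's real token objects carry more
# information than plain strings.
TOKENS = [
    "Páli", ",", "vini", "mínum", ",",
    "langaði", "að", "horfa", "á", "sjónnvarpið", ".",
]


def span_tokens(tokens, start, end):
    """Return the tokens from index start to index end, both inclusive."""
    # Python slices exclude the upper bound, so add 1 to include `end`.
    return tokens[start:end + 1]


# The two spans reported in the example output: 000-004 and 009-009.
print(" ".join(span_tokens(TOKENS, 0, 4)))  # Páli , vini mínum ,
print(" ".join(span_tokens(TOKENS, 9, 9)))  # sjónnvarpið
```

With this token list, the ``000-004`` annotation covers five tokens (including both commas), and ``009-009`` covers the single misspelled word.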