.. _reference: Reference ========= This section describes the following functions and classes: * :py:func:`tokenize()` * :py:func:`check_single()` * :py:func:`check()` * :py:func:`check_with_stats()` * :py:class:`CorrectToken` * :py:class:`Annotation` Importing GreynirCorrect ------------------------ After installing the ``reynir-correct`` package (see :ref:`installation`), import it using: .. code-block:: python import reynir_correct as grc If you only want to do token-level checking, the simplest method is to import only the :py:func:`tokenize()` function (documented below): .. code-block:: python from reynir_correct import tokenize Similarly, if you only want straightforward checking on single sentences, you can import only the :py:func:`check_single()` method (documented below): .. code-block:: python from reynir_correct import check_single The tokenize() function ----------------------- .. py:function:: tokenize(text: Union[str, Iterable[str]], **options) -> Iterator[CorrectToken] Consumes a text stream and returns a generator of instances of :py:class:`CorrectToken`, corrected or annotated as the case may be. :param text: A text string, or an iterator of strings. :param options: Tokenizer options can be passed via keyword arguments, as in ``g = tokenize(text, convert_numbers=True)``. See the documentation for the `Tokenizer `_ package for further information. Two boolean flags directly affect the correction process. Setting ``only_ci=True`` tells the checker to look only for context-independent errors. Setting ``apply_suggestions=True`` makes the checker more aggressive in turning suggestions into corrections. :return: A generator of tokens, where each token is an instance of the :py:class:`CorrectToken` class. Example:: from reynir_correct import tokenize g = tokenize("Maðurin borðaði aldrey Danskan hammborgara") for t in g: if t.txt: print(f"{t.txt:12} {t.error_code:8} {t.error_description}") Output:: Maðurinn S004 Orðið 'Maðurin' var leiðrétt í 'Maðurinn' borðaði aldrei S001 Orðið 'aldrey' var leiðrétt í 'aldrei' danskan Z001 Orð á að byrja á lágstaf: 'Danskan' hamborgara S004 Orðið 'hammborgara' var leiðrétt í 'hamborgara' The check_single() function --------------------------- .. py:function:: check_single(sentence: str) -> _Sentence Analyzes the spelling and grammar of a sentence, returning an instance of the :py:class:`_Sentence` class. The :py:class:`_Sentence` class is described in the `Greynir documentation `__. GreynirCorrect adds the ``annotations`` property to the :py:class:`_Sentence` object, which returns a list of :py:class:`Annotation` instances applying to the sentence. :param sentence: The sentence to analyze, as a string. If the string contains more than one sentence, only the first one is analyzed. :return: A :py:class:`_Sentence` object. Example:: s = check_single("Ég dreimi um að leita af mindinni") for a in s.annotations: print(a) Output (showing token span, error code, error description and suggested replacement):: 000-000: P_WRONG_CASE_nf_þf Á líklega að vera 'Mig' / [Mig] 001-001: S004 Orðið 'dreimi' var leiðrétt í 'dreymi' 004-005: P001 'leita af' á sennilega að vera 'leita að' / [um að leita að myndinni] 006-006: S004 Orðið 'mindinni' var leiðrétt í 'myndinni' The check() function -------------------- .. py:function:: check(text: str, *, split_paragraphs: bool=False) -> Iterable[_Paragraph] Returns a generator of checked paragraphs of text (instances of the :py:class:`_Paragraph` class), with each of those being a generator of checked sentences with annotations. Sentences are parsed and checked "on demand", just before being returned from the generator. :param text: The text to analyze, as a string. It may contain multiple paragraphs and sentences. :param split_paragraphs: If set to ``True``, the text will be split into paragraphs at each newline. :return: A generator of :py:class:`_Paragraph` instances. The check_with_stats() function ------------------------------- .. py:function:: check_with_stats(text: str, *, split_paragraphs: bool=False) -> Dict Returns a dictionary with the results of a grammar and spelling check on the given text. This is a synchronous call, i.e. it does not return until the entire text has been processed. :param text: The text to analyze, as a string. It may contain multiple paragraphs and sentences. :param split_paragraphs: If set to ``True``, the text is automatically split into paragraphs between empty lines. :return: A dictionary with the following keys and values: * ``paragraphs``: A list of lists of :py:class:`_Sentence` objects, each having the ``annotations`` property containing a list of :py:class:`Annotation` objects. * ``num_tokens``: The total number of tokens processed. * ``num_sentences``: The number of sentences found in the text. * ``num_parsed``: The number of sentences that were successfully parsed. * ``ambiguity``: A ``float`` weighted average of the ambiguity of the parsed sentences. Ambiguity is defined as the *n*-th root of the number of possible parse trees for the sentence, where *n* is the number of tokens in the sentence. * ``parse_time``: A ``float`` with the wall clock time, in seconds, spent on tokenizing and parsing the sentences. The CorrectToken class ---------------------- .. py:class:: CorrectToken The :py:class:`CorrectToken` class replaces the default ``tokenizer.Tok`` named tuple normally returned by the `Tokenizer `_. By way of duck typing, it replicates the ``kind``, ``txt`` and ``val`` properties of the ``Tok`` tuple. It then adds a number of properties to access error codes and annotations on the token, as described here: .. py:attribute:: error_description(self) -> str Returns the description of the error associated with the token, or an empty string if there is no error. .. py:attribute:: error_code(self) -> str Returns the code of the error associated with the token, or an empty string if there is no error. .. py:attribute:: error_suggestion(self) -> str Returns the text of a suggested replacement for the text of this token, or an empty string if there is no error. .. py:attribute:: error_span(self) -> int Returns the number of consecutive tokens, starting with this one, that are affected by the same error. In most cases this is 1, meaning that there are no additional affected tokens. The Annotation class -------------------- .. py:class:: Annotation The :py:class:`Annotation` class represents an annotation of a token span within a sentence. An annotation describes a correction that has already been applied to the sentence, or a suggested correction. .. py:method:: __str__(self) -> str Returns a string representation of the annotation. This is intended mainly for debugging and development purposes. .. py:attribute:: start(self) -> int Returns the index of the first token to which the annotation applies. Token indices are 0-based. .. py:attribute:: end(self) -> int Returns the index of the last token to which the annotation applies. Token indices are 0-based. .. py:attribute:: code(self) -> str Returns an error or warning code for the annotation. If the code ends with ``"/w"``, it is a warning. .. py:attribute:: text(self) -> str Returns a brief, human-readable description of the annotation. .. py:attribute:: detail(self) -> str Returns a more detailed, human-readable description of the annotation. .. py:attribute:: suggest(self) -> str Returns a suggested replacement for the text within the token span to which the annotation applies. This only applies for suggested corrections, i.e. if the correction has not been already applied to the sentence. The _Paragraph class -------------------- .. py:class:: _Paragraph The :py:class:`_Paragraph` class is described in the `Greynir documentation `__. The _Sentence class -------------------- .. py:class:: _Sentence The :py:class:`_Sentence` class is described in the `Greynir documentation `__.