Example usage
=============

python-vibrato does not include model files.
Before tokenizing, follow the Vibrato documentation to download a distributed model or train your own.

Check the version number as shown below to choose a compatible model:

.. code-block:: python

    >>> import vibrato
    >>> vibrato.VIBRATO_VERSION
    '0.5.2'

Examples:

.. code-block:: python

    >>> import vibrato

    >>> with open('tests/data/system.dic', 'rb') as fp:
    ...     tokenizer = vibrato.Vibrato(fp.read())

    >>> tokens = tokenizer.tokenize('社長は火星猫だ')

    >>> len(tokens)
    5

    >>> tokens[0]
    Token { surface: "社長", feature: "名詞,普通名詞,一般,*" }

    >>> tokens[0].surface()
    '社長'

    >>> tokens[0].feature()
    '名詞,普通名詞,一般,*'

    >>> tokens[0].start()
    0

    >>> tokens[0].end()
    2

The distributed models are compressed in zstd format.
The API does not decompress them for you, so decompress the data yourself before passing it to ``vibrato.Vibrato``:

.. code-block:: python

    >>> import vibrato
    >>> import zstandard  # zstandard package on PyPI

    >>> dctx = zstandard.ZstdDecompressor()
    >>> with open('tests/data/system.dic.zst', 'rb') as fp:
    ...     with dctx.stream_reader(fp) as dict_reader:
    ...         tokenizer = vibrato.Vibrato(dict_reader.read())
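
If the same code has to accept both plain and zstd-compressed dictionaries, the two loading paths can be combined in a small helper.
The following is a minimal sketch, not part of python-vibrato: the name ``read_dictionary_bytes`` and the constant ``ZSTD_MAGIC`` are hypothetical, and it assumes a compressed file begins with a standard zstd frame, whose first four bytes are ``28 B5 2F FD``.

.. code-block:: python

    import io
    from pathlib import Path

    # Magic number at the start of a standard zstd frame
    # (0xFD2FB528, stored little-endian on disk).
    ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


    def read_dictionary_bytes(path):
        """Return raw dictionary bytes, decompressing zstd input if needed.

        Hypothetical helper for illustration; not provided by python-vibrato.
        """
        data = Path(path).read_bytes()
        if not data.startswith(ZSTD_MAGIC):
            # No zstd frame header: treat the file as already uncompressed.
            return data
        # Imported lazily so plain dictionaries do not require the package.
        import zstandard

        dctx = zstandard.ZstdDecompressor()
        with dctx.stream_reader(io.BytesIO(data)) as reader:
            return reader.read()

The returned bytes can then be passed to the constructor, for example ``vibrato.Vibrato(read_dictionary_bytes('tests/data/system.dic.zst'))``.
Note that this sketch reads the whole file into memory before decompressing, which is simple but uses more memory than streaming from the file object.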