
Comparison to other available Python packages

We benchmarked our code on cs-ud-train-l.conllu from UD v1.2 (68 MiB, 41k sentences, 800k words) and compared it with the other available libraries that provide a Python API. Each benchmark was run 30 times on Ubuntu 20.04 (64-bit Python) and on Windows 10 (32-bit Python), both installed on a machine equipped with an Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz.

| Package | OS | Memory (MiB) | Load (s) | Save (s) | Read (s) | Write (s) | Text (s) | Relchain (s) |
|---|---|---|---|---|---|---|---|---|
| pyconll | Ubuntu | 1683.1 | 12.88 ± 1.07 | 6.32 ± 0.25 | 0.34 ± 0.02 | 0.23 ± 0.01 | NA | 0.47 ± 0.02 |
| | Windows | 876.4 | 10.97 ± 0.63 | 6.23 ± 0.11 | 0.38 ± 0.03 | 0.23 ± 0.02 | NA | 0.54 ± 0.04 |
| conllu | Ubuntu | 1208.7 | 16.83 ± 0.3 | 4.28 ± 0.06 | 0.19 ± 0.0 | 0.1 ± 0.0 | NA | 0.25 ± 0.02 |
| | Windows | 707.2 | 19.11 ± 3.32 | 5.23 ± 1.1 | 0.22 ± 0.04 | 0.09 ± 0.01 | NA | 0.3 ± 0.06 |
| Udapi-Python | Ubuntu | 756.0 | 19.88 ± 1.05 | 6.86 ± 0.12 | 0.19 ± 0.01 | 0.14 ± 0.01 | 0.94 ± 0.03 | 0.16 ± 0.01 |
| | Windows | 421.6 | 19.09 ± 1.08 | 8.51 ± 1.36 | 0.2 ± 0.02 | 0.11 ± 0.01 | 1.01 ± 0.1 | 0.15 ± 0.01 |
| UDon2 | Ubuntu | 772.0 | 3.27 ± 0.07 | 3.34 ± 0.03 | 0.75 ± 0.0 | 0.42 ± 0.0 | 0.24 ± 0.0 | 0.14 ± 0.0 |
| | Windows | 439.7 | 4.44 ± 0.32 | 5.53 ± 0.64 | 0.83 ± 0.07 | 0.42 ± 0.04 | 0.41 ± 0.34 | 0.15 ± 0.01 |
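Each timing cell reports a value over the 30 runs in the form "x ± y". The sketch below shows one way such cells could be produced. It assumes that x is the mean and y the sample standard deviation of the wall-clock time, rounded to two decimals; the actual benchmark code may measure or aggregate differently, and `benchmark` and `format_cell` are hypothetical names.

```python
import statistics
import time
from typing import Callable, Tuple


def benchmark(task: Callable[[], object], runs: int = 30) -> Tuple[float, float]:
    """Time `task` `runs` times (runs >= 2); return (mean, sample stdev) in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


def format_cell(mean: float, stdev: float) -> str:
    """Round both numbers to two decimals, giving cells such as '0.3 ± 0.06'."""
    return f"{round(mean, 2)} ± {round(stdev, 2)}"


if __name__ == "__main__":
    # Any zero-argument callable can be timed; here a stand-in workload.
    mean, stdev = benchmark(lambda: sorted(range(100_000), reverse=True))
    print(format_cell(mean, stdev))
```

Using `round` instead of a fixed `:.2f` format reproduces cells like "0.19 ± 0.0" that appear in the table.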

More detailed descriptions of the benchmarks:

  • Load: loading from a CoNLL-U file;
  • Save: storing to a CoNLL-U file;
  • Read: getting the form and the lemma of every node of every tree;
  • Write: changing the deprel of every node of every tree;
  • Text: computing the textual representation of the subtree induced by each root node of every tree;
  • Relchain: finding the nodes at the end of a relchain in every tree.
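To make the six tasks concrete, here is a minimal sketch of them on a hypothetical plain-Python tree representation. It is not the data model or API of UDon2, pyconll, conllu or Udapi, and it is not the actual benchmark code; `Node`, `Tree`, `load`, `save`, `read`, `write`, `subtree_text` and `relchain` are illustrative names. It assumes that a relchain is a dot-separated sequence of deprels followed downward from the root (e.g. `root.nsubj`), and for brevity it drops multiword tokens and empty nodes and ignores `SpaceAfter=No`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One syntactic word; `cols` holds its 10 CoNLL-U columns as strings."""
    cols: List[str]
    children: List["Node"] = field(default_factory=list)

    @property
    def form(self) -> str:
        return self.cols[1]

    @property
    def lemma(self) -> str:
        return self.cols[2]

    @property
    def deprel(self) -> str:
        return self.cols[7]

    @deprel.setter
    def deprel(self, value: str) -> None:
        self.cols[7] = value


@dataclass
class Tree:
    comments: List[str]  # "# sent_id = ...", "# text = ..." lines
    nodes: List[Node]    # words in surface order
    root: Node           # artificial root with ID 0


def load(text: str) -> List[Tree]:
    """Load: parse CoNLL-U; multiword tokens ("1-2") and empty nodes ("1.1") are dropped."""
    trees = []
    for block in filter(None, re.split(r"\n\s*\n", text.strip())):
        lines = block.splitlines()
        comments = [ln for ln in lines if ln.startswith("#")]
        rows = [ln.split("\t") for ln in lines if not ln.startswith("#")]
        nodes = [Node(cols) for cols in rows if cols[0].isdigit()]
        root = Node(["0", "<ROOT>", "<ROOT>"] + ["_"] * 7)
        by_id = {0: root, **{int(n.cols[0]): n for n in nodes}}
        for n in nodes:
            by_id[int(n.cols[6])].children.append(n)  # HEAD column holds the parent ID
        trees.append(Tree(comments, nodes, root))
    return trees


def save(trees: List[Tree]) -> str:
    """Save: serialize back to CoNLL-U, each sentence followed by a blank line."""
    out: List[str] = []
    for t in trees:
        out += t.comments + ["\t".join(n.cols) for n in t.nodes] + [""]
    return "\n".join(out) + "\n"


def read(trees: List[Tree]) -> List[Tuple[str, str]]:
    """Read: (form, lemma) of every node of every tree."""
    return [(n.form, n.lemma) for t in trees for n in t.nodes]


def write(trees: List[Tree], deprel: str = "dep") -> None:
    """Write: overwrite the deprel of every node of every tree."""
    for t in trees:
        for n in t.nodes:
            n.deprel = deprel


def subtree_text(node: Node) -> str:
    """Text: forms of `node` and all its descendants in word order, space-separated."""
    stack, words = [node], []
    while stack:
        n = stack.pop()
        if n.cols[0] != "0":  # the artificial root has no surface form
            words.append(n)
        stack.extend(n.children)
    return " ".join(w.form for w in sorted(words, key=lambda w: int(w.cols[0])))


def relchain(tree: Tree, chain: str) -> List[Node]:
    """Relchain (assumed meaning): follow dot-separated deprels from the root."""
    frontier = [tree.root]
    for rel in chain.split("."):
        frontier = [c for n in frontier for c in n.children if c.deprel == rel]
    return frontier
```

For example, on a sentence "Dogs bark .", `relchain(tree, "root.nsubj")` first selects "bark" (attached to the artificial root with `root`) and then its `nsubj` child "Dogs", while `subtree_text(tree.root)` returns "Dogs bark .".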

For more details, please refer to the benchmark code available here.

Important

UDon2 performs worse on the Read and Write benchmarks because they use Python for-loops over C++ objects. This is known to result in slow performance (the same holds, for example, for NumPy), which is why we recommend using the pre-defined methods for querying. Optimizing these benchmarks is currently work in progress.

Note

UDon2 compiled for a specific Linux machine performs significantly better than the version available via PyPI, because the PyPI wheel was built with the manylinux2010 tag to stay compatible with many Linux distributions. So if you need a performance boost on Linux, compiling UDon2 for your machine will most likely help. To give an idea of the gain, we present the results of the same benchmarks on the same machine, but with a machine-specific build of UDon2 (marked UDon2*).

| Package | OS | Memory (MiB) | Load (s) | Save (s) | Read (s) | Write (s) | Text (s) | Relchain (s) |
|---|---|---|---|---|---|---|---|---|
| UDon2* | Ubuntu | 549.9 | 1.79 ± 0.11 | 2.42 ± 0.12 | 0.75 ± 0.03 | 0.36 ± 0.02 | 0.2 ± 0.01 | 0.1 ± 0.0 |

Note that machine-specific compilation on Windows will not give any performance benefit, since the Windows wheels were built with the regular win32 and win_amd64 tags.
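To check which of the two cases applies to your interpreter, the standard library can report the platform string and the pointer width of the running Python. The sketch below only restates the note above as code; `build_advice` is a hypothetical helper, and it assumes that any platform string starting with "linux" corresponds to the manylinux2010 wheel. Note that `sysconfig.get_platform()` returns "win-amd64" with a hyphen, while the wheel tag itself is spelled `win_amd64`.

```python
import struct
import sysconfig


def build_advice(platform_tag: str) -> str:
    """Hypothetical helper restating the note above for a given platform string."""
    if platform_tag.startswith("linux"):
        # Assumption: Linux users get the manylinux2010 wheel, built for broad compatibility.
        return "PyPI wheel is manylinux2010: a machine-specific build may be faster"
    if platform_tag in ("win32", "win-amd64"):
        # The Windows wheels use the regular win32 / win_amd64 tags.
        return "PyPI wheel uses a regular Windows tag: a local build is not expected to help"
    return "not covered by the benchmarks above"


if __name__ == "__main__":
    tag = sysconfig.get_platform()   # e.g. 'linux-x86_64', 'win32', 'win-amd64'
    bits = struct.calcsize("P") * 8  # pointer size in bits: 64 or 32
    print(f"{tag} ({bits}-bit Python): {build_advice(tag)}")
```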