Comparison to other available Python packages

We benchmarked our code on cs-ud-train-l.conllu from UD v1.2 (68 MiB, 41k sentences, 800k words) and compared it with other available libraries that provide a Python API. All benchmarks were run 30 times on Ubuntu 20.04 (64-bit Python) and Windows 10 (32-bit Python), both on the same machine with an Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz.

Package	OS	Memory, MiB	Load, s	Save, s	Read, s	Write, s	Text, s	Relchain, s
pyconll	Ubuntu	1683.1	12.88 ± 1.07	6.32 ± 0.25	0.34 ± 0.02	0.23 ± 0.01	NA	0.47 ± 0.02
	Windows	876.4	10.97 ± 0.63	6.23 ± 0.11	0.38 ± 0.03	0.23 ± 0.02	NA	0.54 ± 0.04
conllu	Ubuntu	1208.7	16.83 ± 0.3	4.28 ± 0.06	0.19 ± 0.0	0.1 ± 0.0	NA	0.25 ± 0.02
	Windows	707.2	19.11 ± 3.32	5.23 ± 1.1	0.22 ± 0.04	0.09 ± 0.01	NA	0.3 ± 0.06
Udapi-Python	Ubuntu	756.0	19.88 ± 1.05	6.86 ± 0.12	0.19 ± 0.01	0.14 ± 0.01	0.94 ± 0.03	0.16 ± 0.01
	Windows	421.6	19.09 ± 1.08	8.51 ± 1.36	0.2 ± 0.02	0.11 ± 0.01	1.01 ± 0.1	0.15 ± 0.01
UDon2	Ubuntu	772.0	3.27 ± 0.07	3.34 ± 0.03	0.75 ± 0.0	0.42 ± 0.0	0.24 ± 0.0	0.14 ± 0.0
	Windows	439.7	4.44 ± 0.32	5.53 ± 0.64	0.83 ± 0.07	0.42 ± 0.04	0.41 ± 0.34	0.15 ± 0.01
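
The timing columns above are reported as a value ± a spread over 30 runs. As a rough illustration of how such a cell could be produced, here is a minimal sketch. It assumes the value is the mean and the spread is the sample standard deviation, which the text does not state. The names `time_runs`, `summarize`, and `dummy_workload` are hypothetical and are not taken from the actual benchmark code.

```python
import statistics
import time


def time_runs(func, runs=30):
    """Call func() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Format durations as 'mean ± std' (sample standard deviation, an assumption)."""
    mean = statistics.mean(durations)
    std = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return f"{mean:.2f} ± {std:.2f}"


def dummy_workload():
    # Hypothetical stand-in for a real benchmark step such as loading a treebank.
    return sum(i * i for i in range(10_000))


durations = time_runs(dummy_workload, runs=30)
print(summarize(durations))
```

In a real run, `dummy_workload` would be replaced by the operation being measured, for example loading the CoNLL-U file with one of the packages.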

More detailed descriptions of each benchmark:

Load - loading from a CoNLL-U file;
Save - storing to a CoNLL-U file;
Read - getting a form and a lemma for every node of every tree;
Write - changing a deprel for every node of every tree;
Text - computing a textual representation of a subtree induced by every root node of every tree;
Relchain - finding nodes at the end of a relchain for every tree.
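
To make the six operations above concrete, the following is a minimal pure-Python sketch on a three-token sentence. It does not use the API of UDon2 or of any other benchmarked package, and it is not the benchmark code. All function names are hypothetical. It also assumes that a relchain is a sequence of dependency relations followed downward from a node, which is an interpretation rather than something the list above spells out.

```python
# Column names defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

SAMPLE = "\n".join([
    "# text = The dog barks",
    "\t".join(["1", "The", "the", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "dog", "dog", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "barks", "bark", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
])


def load(text):
    """Load: parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            token = dict(zip(FIELDS, line.split("\t")))
            if token["id"].isdigit():  # skip multiword-token ranges and empty nodes
                current.append(token)
    if current:
        sentences.append(current)
    return sentences


def save(sentences):
    """Save: serialize token dicts back to CoNLL-U text (comment lines are dropped)."""
    blocks = ["\n".join("\t".join(tok[f] for f in FIELDS) for tok in sent) for sent in sentences]
    return "\n\n".join(blocks) + "\n\n"


def subtree_text(sentence, head_id):
    """Text: join the forms of a node and all its descendants in surface order."""
    ids, frontier = {head_id}, [head_id]
    while frontier:
        parent = frontier.pop()
        for tok in sentence:
            if tok["head"] == parent and tok["id"] not in ids:
                ids.add(tok["id"])
                frontier.append(tok["id"])
    return " ".join(tok["form"] for tok in sentence if tok["id"] in ids)


def relchain(sentence, start_id, chain):
    """Relchain (assumed meaning): follow deprels downward, return the nodes reached."""
    current = [start_id]
    for rel in chain:
        current = [t["id"] for t in sentence if t["head"] in current and t["deprel"] == rel]
    return [t for t in sentence if t["id"] in current]


sentences = load(SAMPLE)
# Read: collect the form and lemma of every node.
pairs = [(tok["form"], tok["lemma"]) for sent in sentences for tok in sent]
root = next(tok for tok in sentences[0] if tok["head"] == "0")
text = subtree_text(sentences[0], root["id"])
chain_end = relchain(sentences[0], root["id"], ["nsubj", "det"])
# Write: overwrite the deprel of every node.
for sent in sentences:
    for tok in sent:
        tok["deprel"] = "dep"
```

The benchmarked packages build real tree structures instead of flat lists of dicts, so this sketch only shows what each benchmark asks for, not how any package implements it.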

For more details, please refer to the benchmark code available here.

Important

UDon2 performs worse on the Read and Write benchmarks because they iterate over C++ objects with Python for-loops. This pattern is known to be slow (NumPy shows the same behavior, for example), which is why we recommend using the pre-defined methods for querying. Optimizing these benchmarks is currently a work in progress.
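
The general effect can be seen with the standard library alone. The sketch below is an analogy, not UDon2 code: it compares a Python-level for-loop with a single call to the built-in `sum`, whose loop runs inside compiled code in CPython. The function names are hypothetical, and the measured times depend on the machine and interpreter.

```python
import timeit

values = list(range(200_000))


def python_loop(data):
    """Sum element by element; every iteration is executed by the interpreter."""
    total = 0
    for x in data:
        total += x
    return total


def single_call(data):
    """Sum with one call; the iteration happens inside the built-in function."""
    return sum(data)


loop_time = timeit.timeit(lambda: python_loop(values), number=5)
call_time = timeit.timeit(lambda: single_call(values), number=5)
print(f"for-loop: {loop_time:.3f} s, single call: {call_time:.3f} s")
```

Both functions return the same result. The difference lies only in where the loop runs, which is the reason a library's pre-defined query methods are usually preferable to hand-written loops over its objects.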

Note

UDon2 compiled for a specific Linux machine performs significantly better than the version available via PyPI, since the PyPI wheel was built with the manylinux2010 tag (to ensure compatibility with many Linux distributions). So if you need a performance boost on Linux, a machine-specific compilation will most likely help. To give an idea of the size of the boost, we present results of the same benchmarks on the same machine, but with a machine-specific build of UDon2 (marked UDon2*).

Package	OS	Memory, MiB	Load, s	Save, s	Read, s	Write, s	Text, s	Relchain, s
UDon2*	Ubuntu	549.9	1.79 ± 0.11	2.42 ± 0.12	0.75 ± 0.03	0.36 ± 0.02	0.2 ± 0.01	0.1 ± 0.0

Note that machine-specific compilation on Windows will not give any performance benefit, since the wheels were built with the regular win32 and win_amd64 tags.