Skip to content

Don't build C source if running on PYPY #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

imwhocodes
Copy link

Avoid building C optimised version if running on PYPY and use the plain python implementation

  1. pypy have a penalty cost at the python to C interface
  2. I found this memory leak: memoryview leaking memory pypy/pypy#5100

As implementation I took inspiration from msgpack https://github.com/msgpack/msgpack-python/blob/main/setup.py

Alternative is to use import cobs.cobs._cobs_py as cobs instead of from cobs import cobs but this move the burden to the userspace and it is pretty ugly

Thanks,
Luca

Avoid building C optimised version if running on PYPY and use the plain python implementation

1- pypy have a penalty cost at the python to C interface
2- I found this memory leak: pypy/pypy#5100

As implementation I took inspiration from msgpack https://github.com/msgpack/msgpack-python/blob/main/setup.py
@whitequark
Copy link
Contributor

whitequark commented May 10, 2025

Unfortunately the performance hit from using pure-Python version is massive:

import timeit
import os
if "PURE_PYTHON" in os.environ:
    from cobs.cobs import _cobs_py as cobs
else:
    from cobs import cobs

with open("/vmlinuz", "rb") as f:
    ddata = f.read()
    edata = cobs.encode(ddata)

number = int(os.environ.get("ITERS", "100"))
dtime = timeit.timeit(lambda: cobs.encode(ddata), number=number) / number / len(ddata)
etime = timeit.timeit(lambda: cobs.decode(edata), number=number) / number / len(edata)

print(f"encoding: {1 / (dtime * 1e9)} GB/s")
print(f"decoding: {1 / (etime * 1e9)} GB/s")
$ pdm run python --version
Python 3.13.3
$ pdm run benchmark.py 
encoding: 1.1086298500843723 GB/s
decoding: 2.100975460564308 GB/s
$ ITERS=10 PURE_PYTHON=1 pdm run benchmark.py 
encoding: 0.015219794224444787 GB/s
decoding: 0.04777418311304811 GB/s
$ pdm run --venv pypy python --version
Python 3.11.11 (7.3.19+dfsg-2, Apr 11 2025, 02:57:20)
[PyPy 7.3.19 with GCC 14.2.0]
$ pdm run --venv pypy benchmark.py 
encoding: 0.6770020925247543 GB/s
decoding: 0.8494924667477181 GB/s
$ ITERS=10 PURE_PYTHON=1 pdm run --venv pypy benchmark.py 
encoding: 0.013618624366727946 GB/s
decoding: 0.013457611599891738 GB/s

For many applications, such as mine, this would make the package effectively useless.

@cmcqueen
Copy link
Owner

I wonder if it would be possible to set up the package so that the C extension is an installation option.

I'm only vaguely aware of a possible mechanism: in the qrcode package, it mentions the ability to install it with an optional dependency:

For more image functionality, install qrcode with the pil dependency so that pillow is installed and can be used for generating images:

pip install "qrcode[pil]"

@whitequark
Copy link
Contributor

I wonder if it would be possible to set up the package so that the C extension is an installation option.

I'm only vaguely aware of a possible mechanism: in the qrcode package, it mentions the ability to install it with an optional dependency:

As far as I'm aware, this mechanism can't be used in the way you suggest.

Ultimately, I think it's OK to leave the C extension to be built when installing on PyPy: it does improve performance (if I recall correctly by a factor of about 2x) despite the fact that on PyPy, the Python C API bridge is relatively slow and the use of CFFI is preferred.

@cmcqueen
Copy link
Owner

I would prefer to solve this the "proper way", but I don't know what the proper way is. I've looked at this several times over the years. It is hard to come up with a solution that satisfies all use-cases automatically. Every use-case can be worked out manually, but automatic is preferred.

Some projects in the past have written the setup.py to attempt building a C extension, and fall back to installing the pure Python version. But these days it seems Python is preferring to move away from bespoke setup.py installation methods.

I could split the cobs package into two separate packages, something like cobs_pure and cobs_ext. But that still doesn't provide a mechanism for automatically selecting the ideal package when installing dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants