Showing posts with label lxml. Show all posts

2023-09-09

Notes on building Genomic Data Commons gdc-client

The National Cancer Institute’s Genomic Data Commons (GDC) produces gdc-client, a tool that facilitates data transfer to and from the GDC data repository. It is open source and hosted on GitHub.

My first attempt at building it failed while compiling lxml without Cython:

      building 'lxml.etree' extension
      creating build/temp.linux-x86_64-cpython-311
      creating build/temp.linux-x86_64-cpython-311/src
      creating build/temp.linux-x86_64-cpython-311/src/lxml
      gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DCYTHON_CLINE_IN_TRACEBACK=0 -I/usr/include/libxml2 -Isrc -Isrc/lxml/includes -I/home/chind/Src/gdc-client/venv/include -I/home/chind/opt/include/python3.11 -c src/lxml/etree.c -o build/temp.linux-x86_64-cpython-311/src/lxml/etree.o -w
      src/lxml/etree.c:289:12: fatal error: longintrepr.h: No such file or directory
        289 |   #include "longintrepr.h"
            |            ^~~~~~~~~~~~~~~
      compilation terminated.
      Compile failed: command '/usr/bin/gcc' failed with exit code 1

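The fix was to build and install lxml from source using Cython, so that the C sources are regenerated. The pre-generated etree.c includes "longintrepr.h", a header that Python 3.11 moved into the cpython/ subdirectory of its include directory. Note that Cython < 3 is needed, i.e. Cython 0.29.x.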
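Before rebuilding, it can help to confirm both preconditions from the same Python that will run the build. The following is a minimal sketch, not part of gdc-client or lxml, and the helper names are hypothetical. It checks whether longintrepr.h sits directly in the interpreter's include directory (on Python 3.11 and later it is expected under cpython/ instead, which is why the old generated C fails to compile), and whether the installed Cython is a pre-3 release.

```python
import os
import sysconfig
from importlib.metadata import PackageNotFoundError, version


def has_toplevel_longintrepr(include_dir=None):
    """Return True if longintrepr.h exists directly in the Python include dir.

    C files generated by older Cython releases use #include "longintrepr.h";
    on Python 3.11+ this header lives in include/.../cpython/, so this is
    expected to return False there.
    """
    if include_dir is None:
        include_dir = sysconfig.get_paths()["include"]
    return os.path.exists(os.path.join(include_dir, "longintrepr.h"))


def cython_version_ok(ver):
    """Return True for a Cython release older than 3 (e.g. "0.29.36")."""
    major = int(ver.split(".")[0])
    return major < 3


def installed_cython_version():
    """Return the installed Cython version string, or None if not installed."""
    try:
        return version("Cython")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python include dir:", sysconfig.get_paths()["include"])
    print("top-level longintrepr.h present:", has_toplevel_longintrepr())
    cy = installed_cython_version()
    if cy is None:
        print("Cython: not installed (need Cython < 3)")
    elif cython_version_ok(cy):
        print("Cython:", cy, "(OK)")
    else:
        print("Cython:", cy, "(need Cython < 3)")
```

If the header check prints False and Cython reports a 0.29.x version, the environment matches the situation described above: the shipped C sources will not compile, but regenerating them with that Cython should.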

Once lxml 4.4.2 had been installed manually, the gdc-client build instructions completed successfully and produced the gdc-client script.

For more detail, see this Gist.