pax_global_header 0000666 0000000 0000000 00000000064 14031305437 0014512 g ustar 00root root 0000000 0000000 52 comment=c36468ac609386dafe0e6e93a120cf25951b06be cooler-0.8.11/ 0000775 0000000 0000000 00000000000 14031305437 0013064 5 ustar 00root root 0000000 0000000 cooler-0.8.11/.codecov.yml 0000664 0000000 0000000 00000000345 14031305437 0015311 0 ustar 00root root 0000000 0000000 codecov: notify: require_ci_to_pass: yes coverage: precision: 2 round: down range: 70..100 status: project: default: target: 90% threshold: 1% patch: no changes: no comment: off cooler-0.8.11/.coveragerc 0000664 0000000 0000000 00000000361 14031305437 0015205 0 ustar 00root root 0000000 0000000 [run] source= cooler/ omit= cooler/__main__.py cooler/_version.py cooler/sandbox/* cooler/io.py cooler/cli/csort.py [report] exclude_lines = pragma: no cover return NotImplemented raise NotImplementedError cooler-0.8.11/.gitignore 0000664 0000000 0000000 00000000426 14031305437 0015056 0 ustar 00root root 0000000 0000000 *.swp *.swo *~ *.py[cod] __pycache__ # test and coverage artifacts .cache .pytest_cache .coverage coverage.xml htmlcov/ # setup and build artifacts docs/_* *.egg-info/ dist/ build/ MANIFEST # OS-generated files .DS_Store .Spotlight-V100 .Trashes ehthumbs.db Thumbs.db tmp/ cooler-0.8.11/.readthedocs.yml 0000664 0000000 0000000 00000001071 14031305437 0016151 0 ustar 00root root 0000000 0000000 # .readthedocs.yml # Read the Docs configuration file # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details # Required version: 2 # Build documentation in the docs/ directory with Sphinx sphinx: configuration: docs/conf.py # Build documentation with MkDocs #mkdocs: # configuration: mkdocs.yml # Optionally build your docs in additional formats such as PDF and ePub formats: all # Optionally set the version of Python and requirements required to build your docs python: version: 3.7 install: - requirements: docs/requirements.txt cooler-0.8.11/.travis.yml 0000664 0000000 
0000000 00000002342 14031305437 0015176 0 ustar 00root root 0000000 0000000 language: python python: # We don't actually use the Travis Python, but this keeps it organized. - "2.7" - "3.6" - "3.7" - "3.8" - "3.9" install: - sudo apt-get update - sudo apt-get install -y pigz tabix # We do this conditionally because it saves us some downloading if the version is the same. - if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh; else wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh; fi - bash miniconda.sh -b -p $HOME/miniconda - export PATH="$HOME/miniconda/bin:$PATH" - hash -r - conda config --set always_yes yes --set changeps1 no - conda update -q conda # Useful for debugging any issues with conda - conda info -a # Create test environment and install deps - conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION numpy cython h5py - source activate test-environment - pip install six scipy pandas dask[array,dataframe] - pip install 'biopython<1.77' pysam psutil ipytree matplotlib - pip install mock pytest pytest-flake8 pytest-cov codecov - pip install -e . 
script: - pytest after_success: - codecov cooler-0.8.11/.zenodo.json 0000664 0000000 0000000 00000003764 14031305437 0015345 0 ustar 00root root 0000000 0000000 { "creators": [ { "name": "Abdennur, Nezar", "affiliation": "MIT, Cambridge, MA, USA", "orcid": "0000-0001-5814-0864" }, { "name": "Goloborodko, Anton", "affiliation": "MIT, Cambridge, MA, USA" }, { "name": "Imakaev, Maxim", "affiliation": "MIT, Cambridge, MA, USA" }, { "name": "Kerpedjiev, Peter", "affiliation": "Harvard Medical School" }, { "name": "Fudenberg, Geoffrey", "affiliation": "UCSF" }, { "name": "Oullette, Scott", "affiliation": "Harvard Medical School" }, { "name": "Lee, Soo", "affiliation": "Harvard Medical School" }, { "name": "Strobelt, Hendrik", "affiliation": "Harvard" }, { "name": "Gehlenborg, Nils", "affiliation": "Harvard Medical School" }, { "name": "Mirny, Leonid", "affiliation": "MIT, Cambridge, MA, USA" } ], "keywords": [ "bioinformatics", "genomics", "Hi-C", "sparse", "matrix", "format", "Python", "out-of-core" ], "description": "
Cooler is a Python support library for .cool files: an efficient storage format for high resolution genomic interaction matrices.
\n\nThe cooler package aims to provide the following functionality:
\n\nFollow cooler development on GitHub.
", "access_right": "open", "license": "BSD-3-Clause", "upload_type": "software" } cooler-0.8.11/CHANGES.md 0000664 0000000 0000000 00000040545 14031305437 0014466 0 ustar 00root root 0000000 0000000 # Release notes # ## [v0.8.10](https://github.com/open2c/cooler/compare/v0.8.9...v0.8.10) Date : 2020-09-25 ### Bug fixes * Fixed the new header parsing in `cooler cload pairs` to handle esoteric file stream implementations. Specifically `GzipFile` had stopped working. By @golobor ## [v0.8.9](https://github.com/open2c/cooler/compare/v0.8.8...v0.8.9) Date : 2020-07-17 ### Enhancements * Added single-cell cooler file flavor (.scool) (#201) ## [v0.8.8](https://github.com/open2c/cooler/compare/v0.8.7...v0.8.8) Date : 2020-06-23 ### Maintenance * Improved code coverage * Added missing autodoc for cooler balance * Dropped pysam and biopython as hard dependencies * Officially sunsetting Python 2.7 support ### Enhancements * Added zoom progressions (#203) ### Bug fixes * Allow hashes in read IDs in cload pairs (#193) ## [v0.8.7](https://github.com/open2c/cooler/compare/v0.8.6...v0.8.7) Date: 2020-01-12 ### Maintenance * Code styling with black * Add coverage reporting ### Bug fixes * Replace `json` with `simplejson` to deal with attrs stored as bytes ## [v0.8.6](https://github.com/open2c/cooler/compare/v0.8.5...v0.8.6) Date: 2019-08-12 ### Maintenance * Added contributing guidelines ### Bug fixes * Fixed a related regression that affected selection of the `chrom` column. Post-release `v0.8.6.post0`: requirements files added to MANIFEST.in ## [v0.8.5](https://github.com/open2c/cooler/compare/v0.8.4...v0.8.5) Date: 2019-04-08 ### Bug fixes * Fixed a regression that prevented selection of bins excluding the `chrom` column. 
## [v0.8.4](https://github.com/open2c/cooler/compare/v0.8.3...v0.8.4) Date: 2019-04-04 ### Enhancements * When creating coolers from unordered input, change the default temporary dir to be the same as the output file instead of the system tmp (pass '-' to use the system one). #150 * `cooler ls` and `list_coolers()` now output paths in natural order. #153 * New option in `cooler.matrix()` to handle divisive balancing weight vectors. ### Bug fixes * Restore function of `--count-as-float` option to `cooler load` * Fixed partitioning issue sometimes causing some bins to get split during coarsen * `rename_chroms()` will refresh cached chromosome names #147 * `Cooler.bins()` selector will always properly convert bins/chrom integer IDs to categorical chromosome names when the number of contigs is very large and therefore the HDF5 ENUM header is missing. Before this would only happen when explicitly requesting `convert_enum=True`. ## [v0.8.3](https://github.com/open2c/cooler/compare/v0.8.2...v0.8.3) Date: 2019-02-11 ### Bug fixes * Fixed import bug in `rename_chroms` * `create_cooler` no longer requires a "count" column when specifying custom value columns ## [v0.8.2](https://github.com/open2c/cooler/compare/v0.8.1...v0.8.2) Date: 2019-01-20 ### Enhancements New options for `cooler dump` pixel output: * `--matrix` option: Applies to symmetric-upper coolers; no-op for square coolers. Generates all lower triangular pixels necessary to fill the requested genomic query window. Without this option, `cooler dump` will only return the data explicitly stored in the pixel table (i.e. upper triangle). * `--one-based-ids` and `--one-based-starts` convenience options. ### Bug fixes * A bug was introduced into the matrix-as-pixels selector in 0.8.0 that also affected `cooler dump`. The behavior has been restored to that in 0.7. 
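The effect of the v0.8.2 `--matrix` option can be pictured with a small pure-Python sketch. It is an illustration only, not cooler's implementation: the function name `fill_window` and the half-open `(start, stop)` bin ranges are assumptions made for this example. Given the pixels stored in a symmetric-upper table, it emits every pixel that falls inside a rectangular query window, mirroring stored records where the lower triangle is needed.

```python
def fill_window(upper_pixels, rows, cols):
    """Return all (bin1_id, bin2_id, count) pixels inside a query window.

    upper_pixels: stored records with bin1_id <= bin2_id (upper triangle).
    rows, cols: half-open (start, stop) ranges of bin IDs (hypothetical layout).
    """
    r0, r1 = rows
    c0, c1 = cols
    filled = []
    for i, j, count in upper_pixels:
        # The record as stored, if it lies inside the window.
        if r0 <= i < r1 and c0 <= j < c1:
            filled.append((i, j, count))
        # Its mirror image across the diagonal, if that lies inside the window.
        if i != j and r0 <= j < r1 and c0 <= i < c1:
            filled.append((j, i, count))
    return sorted(filled)


upper = [(0, 0, 2), (0, 1, 1), (1, 2, 3), (2, 2, 4)]
full = fill_window(upper, (0, 3), (0, 3))
off_diagonal = fill_window(upper, (1, 3), (0, 2))
```

Here `off_diagonal` covers a window lying entirely below the diagonal, so every pixel in it comes from a mirrored record; without mirroring, that window would come back empty.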
## [v0.8.1](https://github.com/open2c/cooler/compare/v0.8.0...v0.8.1) Date: 2019-01-02 ### Enhancements * `cooler zoomify` command can take additional base resolutions as input. ### Bug fixes * Fixed regression that slowed down pre-processing during coarsen. * Fixed missing import on handling bad URIs. * Restore but deprecate `cooler.io.ls` for backwards compatibility. ## [v0.8.0](https://github.com/open2c/cooler/compare/v0.7.11...v0.8.0) Date: 2018-12-31 This is a major release from 0.7 and includes an updated format version, and several API changes and deprecations. ### Schema * New schema version: v3 * Adds required `storage-mode` metadata attribute. Two possible values: `"symmetric-upper"` indicates a symmetric matrix encoded as upper triangle (previously the only storage mode); `"square"` indicates no special encoding (e.g. for non-symmetric matrices). ### New features * Support for **non-symmetric** matrices, e.g. RNA-DNA maps. * Create function accepts a boolean `symmetric_upper` option to set the storage mode. Default is `True`. * Creation commands also use `symmetric_upper` by default, which can be overridden with a flag. * All main functionality exposed through top-level functions (create, merge, coarsen, zoomify, balance) * New commands for generic file operations and file inspection. ### API changes * `cooler.annotate()` option `replace` now defaults to `False`. * Submodule renaming. Old names are preserved as aliases but are deprecated. * `cooler.io` -> `cooler.create`. * `cooler.ice` -> `cooler.balance`. * New top level public functions: * `cooler.create_cooler()`. Use instead of `cooler.io.create` and `cooler.io.create_from_unordered`. * `cooler.merge_coolers()` * `cooler.coarsen_cooler()` * `cooler.zoomify_cooler()` * `cooler.balance_cooler()`. Alias: `cooler.balance.iterative_correction()`. * Refactored file operations available in `cooler.fileops`. See the API reference. 
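As a rough illustration of the two `storage-mode` values introduced in v0.8.0, the sketch below encodes a small dense matrix as a list of pixels. It is not cooler's code: `encode_pixels` is a hypothetical helper, and only the `symmetric_upper` flag name is taken from the notes above. With `symmetric_upper=True`, only entries on or above the diagonal are kept, which is lossless for a symmetric matrix; with `symmetric_upper=False` ("square"), every nonzero entry is kept as-is.

```python
def encode_pixels(matrix, symmetric_upper=True):
    """Return (bin1_id, bin2_id, count) records for the nonzero entries.

    matrix: list of equal-length rows (a dense matrix, for illustration only).
    symmetric_upper: if True, keep only entries with bin2_id >= bin1_id.
    """
    pixels = []
    for i, row in enumerate(matrix):
        # In symmetric-upper mode the lower triangle is implied, so skip it.
        first_col = i if symmetric_upper else 0
        for j in range(first_col, len(row)):
            if row[j] != 0:
                pixels.append((i, j, row[j]))
    return pixels


symmetric = [[2, 1, 0], [1, 0, 3], [0, 3, 4]]
asymmetric = [[0, 0], [7, 0]]

upper_only = encode_pixels(symmetric)                          # symmetric-upper
square = encode_pixels(asymmetric, symmetric_upper=False)      # square
lossy = encode_pixels(asymmetric, symmetric_upper=True)        # drops (1, 0)
```

The last call shows why non-symmetric data such as RNA-DNA maps needs the square mode: encoding it as symmetric-upper silently discards the entry below the diagonal.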
### CLI changes * Various output options added to `cooler info`, `cooler dump`, `cooler makebins` and `cooler digest`. * Generic data and attribute hierarchy viewers `cooler tree` and `cooler attrs`. * Generic `cp`, `mv` and `ln` convenience commands. * New verbosity and process info options. ### Maintenance * Unit tests refactored and re-written for pytest. ## [v0.7.11](https://github.com/open2c/cooler/compare/v0.7.10...v0.7.11) Date: 2018-08-17 * Genomic range parser supports humanized units (k/K(b), m/M(b), g/G(b)) * Experimental support for arbitrary aggregation operations in `cooler csort` (e.g. mean, median, max, min) * Documentation updates Bug fixes * Fix newline handling for csort when p1 or p2 is last column. * Fix `--count-as-float` regression in load/cload. ## [v0.7.10](https://github.com/open2c/cooler/compare/v0.7.9...v0.7.10) Date: 2018-05-07 * Fix a shallow copy bug in validate pixels causing records to sometimes flip twice. * Add ignore distance (bp) filter to cooler balance * Start using shuffle filter by default ## [v0.7.9](https://github.com/open2c/cooler/compare/v0.7.8...v0.7.9) Date: 2018-03-30 * Indexed pairs loading commands now provide option for 0- or 1-based positions (1-based by default). #115 * Fixed error introduced into cload pairix in last release. ## [v0.7.8](https://github.com/open2c/cooler/compare/v0.7.7...v0.7.8) Date: 2018-03-18 ### Enhancements * New `cooler cload pairs` command provides index-free loading of pairs. * Changed name of `create_from_unsorted` to more correct `create_from_unordered`. ### Bug fixes * Fixed broken use of single-file temporary store in `create_from_unordered`. * Added heuristic in pairix cload to prevent excessively large chunks. #92 * Added extra checks in `cload pairix` and `cload tabix`. 
#62, #75 ## [v0.7.7](https://github.com/open2c/cooler/compare/v0.7.6...v0.7.7) Date: 2018-03-16 ### Enhancements * Implementation of unsorted (index-free) loading * `cooler.io.create_from_unsorted` takes an iterable of pixel dataframe chunks that need not be properly sorted. * Use input sanitization procedures for pairs `sanitize_records` and binned data `sanitize_pixels` to feed data to `create_from_unsorted`. #87 #108 #109 * The `cooler load` command is now index-free: unsorted `COO` and `BG2` input data can be streamed in. #90. This will soon be implemented as an option for loading pairs as well. * Prevent `cooler balance` command from exiting with non-zero status upon failed convergence using convergence error policies. #93 * Improve the `create` API to support pandas read_csv-style `columns` and `dtype` kwargs to add extra value columns or override default dtypes. #108 * Experimental implementation of trans-only balancing. #56 ### Bug fixes * Fix argmax deprecation. #99 ## [v0.7.6](https://github.com/open2c/cooler/compare/v0.7.5...v0.7.6) Date: 2017-10-31 ### Enhancements * Cooler zoomify with explicit resolutions * Towards standardization of multicooler structure * Support for loading 1-based COO triplet input files ### Bug fixes * Fixed issue of exceeding header limit with too many scaffolds. If header size is exceeded, chrom IDs are stored as raw integers instead of HDF5 enums. There should be no effect at the API level. * Fixed issue of single-column chromosomes files not working in `cload`. * Fixed edge case in performing joins when using both `as_pixels` and `join` options in the matrix selector. Happy Halloween! 
## [v0.7.5](https://github.com/open2c/cooler/compare/v0.7.4...v0.7.5) Date: 2017-07-13 * Fix pandas issue affecting cases when loading single chromosomes * Add transform options to higlass API ## [v0.7.4](https://github.com/open2c/cooler/compare/v0.7.3...v0.7.4) Date: 2017-05-25 * Fix regression in automatic --balance option in cooler zoomify * Fix special cases where cooler.io.create and append would not work with certain inputs ## [v0.7.3](https://github.com/open2c/cooler/compare/v0.7.2...v0.7.3) Date: 2017-05-22 * Added function to print higlass zoom resolutions for a given genome and base resolution. ## [v0.7.2](https://github.com/open2c/cooler/compare/v0.7.1...v0.7.2) Date: 2017-05-09 * Improve chunking and fix pickling issue with aggregating very large text datasets * Restore zoom binsize metadata to higlass files ## [v0.7.1](https://github.com/open2c/cooler/compare/v0.7.0...v0.7.1) Date: 2017-04-29 * `cooler load` command can now accept supplemental pixel fields and custom field numbers * Fix parsing errors with unused pixel fields * Eliminate hard dependence on dask to make pip installs simpler. Conda package will retain dask as a run time requirement. ## [v0.7.0](https://github.com/open2c/cooler/compare/v0.6.6...v0.7.0) Date: 2017-04-27 ### New features * New Cooler URIs: Full support for Cooler objects anywhere in the data hierarchy of a .cool file * Experimental dask support via `cooler.contrib.dask` * New explicit bin blacklist option for `cooler balance` * Various new CLI tools: * `cooler list` * `cooler copy` * `cooler merge` * `cooler csort` now produces Pairix files by default * `cooler load` now accepts two types of matrix text input formats * 3-column sparse matrix * 7-column bg2.gz (2D bedGraph) indexed with Pairix (e.g. 
using csort) * `cooler coarsegrain` renamed `cooler coarsen` * Multi-resolution HiGlass input files can now be generated with the `cooler zoomify` command * More flexible API functions to create and append columns to Coolers in `cooler.io` #### API/CLI changes * `cooler.io.create` signature changed; `chromsizes` argument is deprecated. * `cooler csort` argument order changed ### Bug fixes * Chromosome name length restriction removed * `Cooler.open` function now correctly opens the specific root group of the Cooler and behaves like a proper context manager in all cases ## [v0.6.6](https://github.com/open2c/cooler/compare/v0.6.5...v0.6.6) Date: 2017-03-21 * Chromosome names longer than 32 chars are forbidden for now * Improved pairix and tabix iterators, dropped need for slow first pass over contacts ## [v0.6.5](https://github.com/open2c/cooler/compare/v0.6.4...v0.6.5) Date: 2017-03-18 * Fixed pairix aggregator to properly deal with autoflipping of pairs ## [v0.6.4](https://github.com/open2c/cooler/compare/v0.6.3...v0.6.4) Date: 2017-03-17 * Migrated higlass multires aggregator to `cooler coarsegrain` command * Fixed pairix aggregator to properly deal with autoflipping of pairs ## [v0.6.3](https://github.com/open2c/cooler/compare/v0.6.2...v0.6.3) Date: 2017-02-22 * Merge PairixAggregator patch from Soo. * Update repr string * Return matrix scale factor in balance stats rather than the bias scale factor: #35. ## [v0.6.2](https://github.com/open2c/cooler/compare/v0.6.1...v0.6.2) Date: 2017-02-12 Fixed regressions in * cooler cload tabix/pairix failed on non-fixed sized bins * cooler show ## [v0.6.1](https://github.com/open2c/cooler/compare/v0.6.0...v0.6.1) Date: 2017-02-06 * This fixes stale build used in bdist_wheel packaging that broke 0.6.0. #29 ## [v0.6.0](https://github.com/open2c/cooler/compare/v0.5.3...v0.6.0) Date: 2017-02-03 ### Enhancements * Dropped Python 3.3 support. Added 3.6 support. 
* Added `contrib` subpackage containing utilities for higlass, including multires aggregation. * Fixed various issues with synchronizing read/write multiprocessing with HDF5. * Replacing prints with logging. * Added sandboxed `tools` module to develop utilities for out-of-core algorithms using Coolers. ### New features * Cooler objects have additional convenience properties `chromsizes`, `chromnames`. * New file introspection functions `ls` and `is_cooler` to support nested Cooler groups. * Cooler initializer can accept a file path and path to Cooler group. * `cload` accepts contact lists in hiclib-style HDF5 format, the legacy tabix-indexed format, and new pairix-indexed format. ### API/CLI changes * `create` only accepts a file path and optional group path instead of an open file object. * `Cooler.matrix` selector now returns a balanced dense 2D NumPy array by default. Explicitly set `balance` to False to get raw counts and set `sparse` to True to get a `coo_matrix` as per old behavior. * Command line parameters of `cload` changed significantly ### Bug fixes * Fixed bug in `csort` that led to incorrect triangularity of trans read pairs. ## [v0.5.3](https://github.com/open2c/cooler/compare/v0.5.2...v0.5.3) Date: 2016-09-10 * Check for existence of required external tools in CLI * Fixed `cooler show` incompatibility with older versions of matplotlib * Fixed `cooler.annotate` to work on empty dataframe input * Fixed broken pipe signals not getting suppressed on Python 2 * `cooler cload` raises a warning when bin file lists a contig missing from the contact list ## [v0.5.2](https://github.com/open2c/cooler/compare/v0.5.1...v0.5.2) Date: 2016-08-26 * Fix bug in `cooler csort` parsing of chromsizes file. * Workaround for two locale-related issues on Python 3. Only affects cases where a machine's locale is set to ASCII or Unices which use the ambiguous C or POSIX locales. * Fix typo in setup.py and add pysam to dependencies. 
## [v0.5.1](https://github.com/open2c/cooler/compare/v0.5.0...v0.5.1) Date: 2016-08-24 * Bug fix in input parser to `cooler csort` * Update triu reordering awk template in `cooler csort` * Rename `cooler binnify` to `cooler makebins`. Binnify sounds like "aggregate" which is what `cload` does. ## [v0.5.0](https://github.com/open2c/cooler/compare/v0.4.0...v0.5.0) Date: 2016-08-24 * Most scripts ported over to a new command line interface using the Click framework with many updates. * New `show` and `info` scripts. * Updated Readme. * Minor bug fixes. ## [v0.4.0](https://github.com/open2c/cooler/compare/v0.3.0...v0.4.0) Date: 2016-08-18 ### Schema * Updated file schema: v2 * `/bins/chroms` is now an enum instead of string column ### API changes * Table views are a bit more intuitive: selecting field names on table view objects returns a new view on the subset of columns. * New API function: `cooler.annotate` for doing joins ### New Features * Support for nested Cooler "trees" at any depth in an HDF5 hierarchy * Refactored `cooler.io` to provide "contact readers" that process different kinds of input (aggregate from a contact list, load from an existing matrix, etc.) * Added new scripts for contact aggregation, loading, dumping and balancing ## [v0.3.0](https://github.com/open2c/cooler/compare/v0.2.1...v0.3.0) Date: 2016-02-18 * 2D range selector `matrix()` now provides either rectangular data as coo_matrix or triangular data as a pixel table dataframe. * Added binning support for any genome segmentation (i.e., fixed or variable bin width). * Fixed issues with binning data from mapped read files. * Genomic locus string parser now accepts ENSEMBL-style number-only chromosome names and FASTA-style sequence names containing pipes. 
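To make the last v0.3.0 item concrete, here is a minimal sketch of a locus string parser that tolerates number-only chromosome names and names containing pipes. It is an assumption-laden illustration, not the parser shipped in cooler: the function name `parse_locus`, the `name:start-end` syntax with optional thousands separators, and the error messages are all choices made for this example.

```python
import re

# A name is anything without a colon; the coordinate part is optional.
_LOCUS_RE = re.compile(
    r"(?P<name>[^:]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?"
)


def parse_locus(text):
    """Split 'name' or 'name:start-end' into (name, start, end).

    start and end are None when no coordinates are given.
    """
    match = _LOCUS_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("cannot parse locus string: %r" % text)
    name = match.group("name")
    if match.group("start") is None:
        return name, None, None
    # Allow thousands separators such as 1,000,000.
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if end < start:
        raise ValueError("end precedes start in %r" % text)
    return name, start, end


ensembl_style = parse_locus("2:1,000-2,000")
fasta_style = parse_locus("gi|12345|ref|NC_000001.1|")
ucsc_style = parse_locus("chr1:10-20")
```

Matching the name as "anything up to the first colon" is what lets both the bare `2` and the pipe-delimited FASTA identifier pass without a list of allowed characters.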
## [v0.2.1](https://github.com/open2c/cooler/compare/v0.2...v0.2.1) Date: 2016-02-07 * Fixed bintable region fetcher ## [v0.2](https://github.com/open2c/cooler/compare/v0.1...v0.2) Date: 2016-01-17 * First beta release ## [v0.1](https://github.com/open2c/cooler/releases/tag/v0.1) Date: 2015-11-22 * Working initial prototype. cooler-0.8.11/CONTRIBUTING.md 0000664 0000000 0000000 00000007707 14031305437 0015330 0 ustar 00root root 0000000 0000000 # Contributing ## General guidelines If you haven't contributed to open-source before, we recommend you read [this excellent guide by GitHub on how to contribute to open source](https://opensource.guide/how-to-contribute). The guide is long, so you can gloss over things you're familiar with. If you're not already familiar with it, we follow the [fork and pull model](https://help.github.com/articles/about-collaborative-development-models) on GitHub. Also, check out this recommended [git workflow](https://www.asmeurer.com/git-workflow/). ## Contributing Code This project has a number of requirements for all code contributed. * We follow the [PEP-8 style](https://www.python.org/dev/peps/pep-0008/) convention. * We use [Numpy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html). * It's ideal if user-facing API changes or new features have documentation added. ## Setting up Your Development Environment After forking and cloning the repository, install in "editable" (i.e. development) mode using the `-e` option: ```sh git clone https://github.com/open2c/cooler.git cd cooler pip install -e .[all] ``` Editable mode installs the package by creating a "link" to the working (repo) directory. ## Running/Adding Unit Tests It is best if all new functionality and/or bug fixes have unit tests added with each use-case. We use [pytest](https://docs.pytest.org/en/latest) as our unit testing framework. 
Once you've configured your environment, you can just `cd` to the root of your repository and run ```sh pytest ``` Unit tests are automatically run on Travis CI for pull requests. ## Adding/Building the Documentation If a feature is stable and relatively finalized, it is time to add it to the documentation. If you are adding any private/public functions, it is best to add docstrings, to aid in reviewing code and also for the API reference. We use [Numpy style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html) and [Sphinx](http://www.sphinx-doc.org/en/stable) to document this library. Sphinx, in turn, uses [reStructuredText](http://www.sphinx-doc.org/en/stable/rest.html) as its markup language for adding code. We use the [Sphinx Autosummary extension](http://www.sphinx-doc.org/en/stable/ext/autosummary.html) to generate API references. You may want to look at `docs/api.rst` to see how these files look and where to add new functions, classes or modules. To build the documentation: ```sh make docs ``` After this, you can find an HTML version of the documentation in `docs/_build/html/index.html`. Documentation from `master` and tagged releases is automatically built and hosted thanks to [readthedocs](https://readthedocs.org/). ## Acknowledgments This document is based on the [guidelines from the sparse project](https://github.com/pydata/sparse/blob/master/docs/contributing.rst).