pax_global_header 0000666 0000000 0000000 00000000064 13131213142 0014501 g ustar 00root root 0000000 0000000 52 comment=c583bf2e4e2fa6ba70c88fe88a7c3b39789f8c51
transit-2.1.1/ 0000775 0000000 0000000 00000000000 13131213142 0013166 5 ustar 00root root 0000000 0000000 transit-2.1.1/.gitignore 0000664 0000000 0000000 00000001145 13131213142 0015157 0 ustar 00root root 0000000 0000000 # Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
#atom files
*.nfs*
# Distribution / packaging
.Python
env/
bin/
build/
develop-eggs/
dist/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Mr Developer
.mr.developer.cfg
.project
.pydevproject
# Rope
.ropeproject
# Django stuff:
*.log
*.pot
# Sphinx documentation
docs/_build/
#Data files
*.wig
*.dat
*.prot_table
*.png
*.out
transit-2.1.1/.hgignore 0000664 0000000 0000000 00000000362 13131213142 0014772 0 ustar 00root root 0000000 0000000 ^output/.*
^plugins/.*
^sdk/.*
^src/.*
^build/.*
^dist/.*
^ObjectListView-1.2/.*
^backup/.*
^listctrl_example/.*
^example/.*
^source_
.*\.pyc$
.*\.swp$
.*\.png$
.*\.bat$
.*\.jpeg$
.*\.txt$
.*\.dat$
.*\.exe$
.*\.spec$
.*\.tar.gz$
^.hgignore~$
transit-2.1.1/.travis.yml 0000664 0000000 0000000 00000001057 13131213142 0015302 0 ustar 00root root 0000000 0000000 language: python
matrix:
  include:
    - os: linux
      dist: trusty
      sudo: required
      python: 2.7
      before_install: "sudo apt-get install -y -f python python-dev python-pip pkg-config python-wxgtk2.8 libpng-dev libjpeg8-dev libfreetype6-dev"
      install: "pip install --upgrade pip setuptools numpy scipy pillow matplotlib pytest"
      DISPLAY: 0.0
notifications:
  email:
    on_success: change # default: change
    on_failure: change # default: always
before_script: cd /home/travis/build/mad-lab/transit/tests
script: travis_wait 30 pytest
transit-2.1.1/CHANGELOG.md 0000664 0000000 0000000 00000015240 13131213142 0015001 0 ustar 00root root 0000000 0000000 # Change log
All notable changes to this project will be documented in this file.
## Version 2.1.0 - 2017-06-23
- TRANSIT:
- Added tooltips next to most parameters to explain their functionality.
- Added Quality Control window, with choice for normalization method.
- Added more normalization options to the HMM method.
- Added LOESS correction functionality back to TRANSIT.
- Added ability to scale Track View based on mean-count of the window.
- Added ability to scale individual tracks in Track View.
- Added ability to add tracks of features to Track View.
- New documentation on normalization.
- TPP:
- TPP can now accept empty primer prefix (in case reads have been trimmed).
- TPP can now process reads obtained using Mme1 enzyme and protocol.
- TPP can now pass flags to BWA.
## Version 2.0.2 - 2016-08-19
- TRANSIT:
- Now accepts GFF3 formatted annotations.
- Added ability to specify pseudocounts for resampling.
- Added extra columns to resampling output.
- Fixed bug with some log2FC calculations.
- Export to combined wig format now asks for normalization BEFORE file name.
- Fixed bug preventing Quality Control window from opening.
- Miscellaneous bug fixes.
- Updates to documentation.
- TPP:
- Now accepts custom primer sequences.
- Now reports additional diagnostic statistics for reads mapping to phiMycoMarT7 and Illumina adapters.
- Miscellaneous bug fixes.
## Version 2.0.1 - 2016-07-05
- TRANSIT:
- Fixed crash in TPP.
- Misc changes for outputs.
## Version 2.0.0 - 2016-06-16
- TRANSIT:
- Added new method for datasets created with Tn5 transposons.
- Added label indicating intended transposons for the methods.
- Added textbox with short description of the chosen method.
- Changed methods choices to be in menu (on top).
- Changed the file display window.
- Added Help menu with link to online documentation.
- Added new logo.
- Added option to export (normalized) datasets to IGV or combined wig format.
- Can now select multiple .wig files at the same time (Ctrl + select).
- Lots of changes under the hood.
## Version 1.4.5 - 2016-01-10
- TRANSIT:
- Added Binomial analysis method as an option to TRANSIT.
- Added DE-HMM analysis method as an option to TRANSIT.
## Version 1.4.3.1 - 2016-01-02
- TRANSIT:
- Fixed bug causing TRANSIT not to open on some Windows systems.
## Version 1.4.3 - 2015-12-04
- TRANSIT:
- Precision of resampling p-values in output file now increases with sample size
- Added preliminary Quality Control functionality. Select some datasets and click View -> Quality Control
- In resampling, changed logFC to divide by number of replicates
- Changed plotting of results files to be more versatile
- Fixed bug causing HMM_sites output not to be added to list of files
- Fixed bug causing LOESS correction not to work in HMM
## Version 1.4.2 - 2015-07-29
- TRANSIT:
- Added Total Trimmed Reads normalization (TTR) as the default option. This is the recommended normalization method at this point.
- Added BetaGeometric Correction (betageom) as a normalization option. This is recommended for datasets that are very skewed.
- Fixed bug that caused transit to create histograms when not desired.
- Added a pseudo-count when calculating log-FC to genes without reads.
- Increased size of result windows so that all columns are immediately visible.
## Version 1.4.1 - 2015-06-05
- TRANSIT:
- TRANSIT now accepts read-counts in floating-point precision, not just integers.
- Made transit work with most recent versions of matplotlib.
## Version 1.4.0 - 2015-05-27
- TRANSIT:
- Added option to correct for genomic position bias (using LOESS)
- Added more options for normalization, including zero-inflated negative binomial and quantile normalization.
- TPP:
- Eliminated soft-clipped reads.
- Modified template_counts() to be much more memory efficient (does not need gigabytes of RAM any more to process large datasets)
- Added ability to process Tn5 datasets
## Version 1.3.0 - 2015-03-31
- TRANSIT:
- Fixed threading issue for volcano plot.
- Improved format and quality of the output messages.
- Fixed direction of log-fold change in volcano plots.
- Added log-fold change column to resampling output file.
- Made adaptive resampling work better with custom sample sizes.
- TPP:
- Fixed genomic portion for single ends.
- Added usage help as part of command line arguments.
## Version 1.2.33 - 2015-03-06
- TRANSIT:
- Fixed issue with histograms created using adaptive resampling.
## Version 1.2.32 - 2015-03-05
- TRANSIT:
- Put .pyc files in new src/ directory.
- Fixed error that sometimes occurred when plotting volcano plots.
- Made TRANSIT default to the current working directory when opening file dialogs.
- TPP:
- TPP can now process files with single-end reads.
- TPP can now process *.fasta and compressed files with "*.fastq.gz" extension
## Version 1.2.7 - 2015-02-25
- TRANSIT:
- Fixed error that occurred when displaying graphs after running an analysis.
- Updated datasets included in the data/ directory.
- TPP:
- Removed the requirement for wxPython when running TPP on command-line mode.
## Version 1.1 - 2015-02-20
- TRANSIT:
- Fixed error in HMM results file table, which was not correctly showing breakdown of genes.
- Made TRANSIT work from the command-line, without displaying GUI. See documentation for arguments/flags.
- Added ability to convert annotation files between several formats (.prot_table, ptt.table, gff3).
- TPP:
- User can supply reads in either FastA or FastQ format.
- Added an option to specify number of mismatches (default=1) when looking
for sequence patterns such as the transposon prefix in read 1.
- Added command-line arguments so TPP can be run in batch mode without the GUI.
- Number of mapped reads for R1 and R2 independently is also now reported.
- Modified how barcodes are extracted from read 2. It now looks for specific
sequence patterns, even if they are shifted. This should greatly increase
the number of mapped reads (esp. the genomic part of R2) for certain datasets.
- Properly handle short fragments, i.e. reads where the insert size is shorter
than the read length. In such cases, the adapter from other end appears
at the end of read 1, and this suffix is now stripped off so these reads
will map too.
## Version 1.0 - 2015-02-10
- First limited-release version of TRANSIT
- Released to close collaborators first and presented in teleconference to get feedback.
transit-2.1.1/LICENSE.md 0000664 0000000 0000000 00000104553 13131213142 0014602 0 ustar 00root root 0000000 0000000 GNU GENERAL PUBLIC LICENSE
==========================
Version 3, 29 June 2007
Copyright © 2007 Free Software Foundation, Inc. <>
Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.
## Preamble
The GNU General Public License is a free, copyleft license for software and other
kinds of works.
The licenses for most software and other practical works are designed to take away
your freedom to share and change the works. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change all versions of a
program--to make sure it remains free software for all its users. We, the Free
Software Foundation, use the GNU General Public License for most of our software; it
applies also to any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General
Public Licenses are designed to make sure that you have the freedom to distribute
copies of free software (and charge for them if you wish), that you receive source
code or can get it if you want it, that you can change the software or use pieces of
it in new free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you these rights or
asking you to surrender the rights. Therefore, you have certain responsibilities if
you distribute copies of the software, or if you modify it: responsibilities to
respect the freedom of others.
For example, if you distribute copies of such a program, whether gratis or for a fee,
you must pass on to the recipients the same freedoms that you received. You must make
sure that they, too, receive or can get the source code. And you must show them these
terms so they know their rights.
Developers that use the GNU GPL protect your rights with two steps: (1) assert
copyright on the software, and (2) offer you this License giving you legal permission
to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains that there is
no warranty for this free software. For both users' and authors' sake, the GPL
requires that modified versions be marked as changed, so that their problems will not
be attributed erroneously to authors of previous versions.
Some devices are designed to deny users access to install or run modified versions of
the software inside them, although the manufacturer can do so. This is fundamentally
incompatible with the aim of protecting users' freedom to change the software. The
systematic pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we have designed
this version of the GPL to prohibit the practice for those products. If such problems
arise substantially in other domains, we stand ready to extend this provision to
those domains in future versions of the GPL, as needed to protect the freedom of
users.
Finally, every program is threatened constantly by software patents. States should
not allow patents to restrict development and use of software on general-purpose
computers, but in those that do, we wish to avoid the special danger that patents
applied to a free program could make it effectively proprietary. To prevent this, the
GPL assures that patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and modification follow.
## TERMS AND CONDITIONS
### 0. Definitions.
“This License” refers to version 3 of the GNU General Public License.
“Copyright” also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
“The Program” refers to any copyrightable work licensed under this
License. Each licensee is addressed as “you”. “Licensees” and
“recipients” may be individuals or organizations.
To “modify” a work means to copy from or adapt all or part of the work in
a fashion requiring copyright permission, other than the making of an exact copy. The
resulting work is called a “modified version” of the earlier work or a
work “based on” the earlier work.
A “covered work” means either the unmodified Program or a work based on
the Program.
To “propagate” a work means to do anything with it that, without
permission, would make you directly or secondarily liable for infringement under
applicable copyright law, except executing it on a computer or modifying a private
copy. Propagation includes copying, distribution (with or without modification),
making available to the public, and in some countries other activities as well.
To “convey” a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through a computer
network, with no transfer of a copy, is not conveying.
An interactive user interface displays “Appropriate Legal Notices” to the
extent that it includes a convenient and prominently visible feature that (1)
displays an appropriate copyright notice, and (2) tells the user that there is no
warranty for the work (except to the extent that warranties are provided), that
licensees may convey the work under this License, and how to view a copy of this
License. If the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
### 1. Source Code.
The “source code” for a work means the preferred form of the work for
making modifications to it. “Object code” means any non-source form of a
work.
A “Standard Interface” means an interface that either is an official
standard defined by a recognized standards body, or, in the case of interfaces
specified for a particular programming language, one that is widely used among
developers working in that language.
The “System Libraries” of an executable work include anything, other than
the work as a whole, that (a) is included in the normal form of packaging a Major
Component, but which is not part of that Major Component, and (b) serves only to
enable use of the work with that Major Component, or to implement a Standard
Interface for which an implementation is available to the public in source code form.
A “Major Component”, in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system (if any) on which
the executable work runs, or a compiler used to produce the work, or an object code
interpreter used to run it.
The “Corresponding Source” for a work in object code form means all the
source code needed to generate, install, and (for an executable work) run the object
code and to modify the work, including scripts to control those activities. However,
it does not include the work's System Libraries, or general-purpose tools or
generally available free programs which are used unmodified in performing those
activities but which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for the work, and
the source code for shared libraries and dynamically linked subprograms that the work
is specifically designed to require, such as by intimate data communication or
control flow between those subprograms and other parts of the work.
The Corresponding Source need not include anything that users can regenerate
automatically from other parts of the Corresponding Source.
The Corresponding Source for a work in source code form is that same work.
### 2. Basic Permissions.
All rights granted under this License are granted for the term of copyright on the
Program, and are irrevocable provided the stated conditions are met. This License
explicitly affirms your unlimited permission to run the unmodified Program. The
output from running a covered work is covered by this License only if the output,
given its content, constitutes a covered work. This License acknowledges your rights
of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not convey, without
conditions so long as your license otherwise remains in force. You may convey covered
works to others for the sole purpose of having them make modifications exclusively
for you, or provide you with facilities for running those works, provided that you
comply with the terms of this License in conveying all material for which you do not
control copyright. Those thus making or running the covered works for you must do so
exclusively on your behalf, under your direction and control, on terms that prohibit
them from making any copies of your copyrighted material outside their relationship
with you.
Conveying under any other circumstances is permitted solely under the conditions
stated below. Sublicensing is not allowed; section 10 makes it unnecessary.
### 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological measure under any
applicable law fulfilling obligations under article 11 of the WIPO copyright treaty
adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention
of such measures.
When you convey a covered work, you waive any legal power to forbid circumvention of
technological measures to the extent such circumvention is effected by exercising
rights under this License with respect to the covered work, and you disclaim any
intention to limit operation or modification of the work as a means of enforcing,
against the work's users, your or third parties' legal rights to forbid circumvention
of technological measures.
### 4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you receive it, in any
medium, provided that you conspicuously and appropriately publish on each copy an
appropriate copyright notice; keep intact all notices stating that this License and
any non-permissive terms added in accord with section 7 apply to the code; keep
intact all notices of the absence of any warranty; and give all recipients a copy of
this License along with the Program.
You may charge any price or no price for each copy that you convey, and you may offer
support or warranty protection for a fee.
### 5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to produce it from
the Program, in the form of source code under the terms of section 4, provided that
you also meet all of these conditions:
* **a)** The work must carry prominent notices stating that you modified it, and giving a
relevant date.
* **b)** The work must carry prominent notices stating that it is released under this
License and any conditions added under section 7. This requirement modifies the
requirement in section 4 to “keep intact all notices”.
* **c)** You must license the entire work, as a whole, under this License to anyone who
comes into possession of a copy. This License will therefore apply, along with any
applicable section 7 additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no permission to license the
work in any other way, but it does not invalidate such permission if you have
separately received it.
* **d)** If the work has interactive user interfaces, each must display Appropriate Legal
Notices; however, if the Program has interactive interfaces that do not display
Appropriate Legal Notices, your work need not make them do so.
A compilation of a covered work with other separate and independent works, which are
not by their nature extensions of the covered work, and which are not combined with
it such as to form a larger program, in or on a volume of a storage or distribution
medium, is called an “aggregate” if the compilation and its resulting
copyright are not used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work in an aggregate
does not cause this License to apply to the other parts of the aggregate.
### 6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms of sections 4 and
5, provided that you also convey the machine-readable Corresponding Source under the
terms of this License, in one of these ways:
* **a)** Convey the object code in, or embodied in, a physical product (including a
physical distribution medium), accompanied by the Corresponding Source fixed on a
durable physical medium customarily used for software interchange.
* **b)** Convey the object code in, or embodied in, a physical product (including a
physical distribution medium), accompanied by a written offer, valid for at least
three years and valid for as long as you offer spare parts or customer support for
that product model, to give anyone who possesses the object code either (1) a copy of
the Corresponding Source for all the software in the product that is covered by this
License, on a durable physical medium customarily used for software interchange, for
a price no more than your reasonable cost of physically performing this conveying of
source, or (2) access to copy the Corresponding Source from a network server at no
charge.
* **c)** Convey individual copies of the object code with a copy of the written offer to
provide the Corresponding Source. This alternative is allowed only occasionally and
noncommercially, and only if you received the object code with such an offer, in
accord with subsection 6b.
* **d)** Convey the object code by offering access from a designated place (gratis or for
a charge), and offer equivalent access to the Corresponding Source in the same way
through the same place at no further charge. You need not require recipients to copy
the Corresponding Source along with the object code. If the place to copy the object
code is a network server, the Corresponding Source may be on a different server
(operated by you or a third party) that supports equivalent copying facilities,
provided you maintain clear directions next to the object code saying where to find
the Corresponding Source. Regardless of what server hosts the Corresponding Source,
you remain obligated to ensure that it is available for as long as needed to satisfy
these requirements.
* **e)** Convey the object code using peer-to-peer transmission, provided you inform
other peers where the object code and Corresponding Source of the work are being
offered to the general public at no charge under subsection 6d.
A separable portion of the object code, whose source code is excluded from the
Corresponding Source as a System Library, need not be included in conveying the
object code work.
A “User Product” is either (1) a “consumer product”, which
means any tangible personal property which is normally used for personal, family, or
household purposes, or (2) anything designed or sold for incorporation into a
dwelling. In determining whether a product is a consumer product, doubtful cases
shall be resolved in favor of coverage. For a particular product received by a
particular user, “normally used” refers to a typical or common use of
that class of product, regardless of the status of the particular user or of the way
in which the particular user actually uses, or expects or is expected to use, the
product. A product is a consumer product regardless of whether the product has
substantial commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
“Installation Information” for a User Product means any methods,
procedures, authorization keys, or other information required to install and execute
modified versions of a covered work in that User Product from a modified version of
its Corresponding Source. The information must suffice to ensure that the continued
functioning of the modified object code is in no case prevented or interfered with
solely because modification has been made.
If you convey an object code work under this section in, or with, or specifically for
use in, a User Product, and the conveying occurs as part of a transaction in which
the right of possession and use of the User Product is transferred to the recipient
in perpetuity or for a fixed term (regardless of how the transaction is
characterized), the Corresponding Source conveyed under this section must be
accompanied by the Installation Information. But this requirement does not apply if
neither you nor any third party retains the ability to install modified object code
on the User Product (for example, the work has been installed in ROM).
The requirement to provide Installation Information does not include a requirement to
continue to provide support service, warranty, or updates for a work that has been
modified or installed by the recipient, or for the User Product in which it has been
modified or installed. Access to a network may be denied when the modification itself
materially and adversely affects the operation of the network or violates the rules
and protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided, in accord with
this section must be in a format that is publicly documented (and with an
implementation available to the public in source code form), and must require no
special password or key for unpacking, reading or copying.
### 7. Additional Terms.
“Additional permissions” are terms that supplement the terms of this
License by making exceptions from one or more of its conditions. Additional
permissions that are applicable to the entire Program shall be treated as though they
were included in this License, to the extent that they are valid under applicable
law. If additional permissions apply only to part of the Program, that part may be
used separately under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option remove any
additional permissions from that copy, or from any part of it. (Additional
permissions may be written to require their own removal in certain cases when you
modify the work.) You may place additional permissions on material, added by you to a
covered work, for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you add to a
covered work, you may (if authorized by the copyright holders of that material)
supplement the terms of this License with terms:
* **a)** Disclaiming warranty or limiting liability differently from the terms of
sections 15 and 16 of this License; or
* **b)** Requiring preservation of specified reasonable legal notices or author
attributions in that material or in the Appropriate Legal Notices displayed by works
containing it; or
* **c)** Prohibiting misrepresentation of the origin of that material, or requiring that
modified versions of such material be marked in reasonable ways as different from the
original version; or
* **d)** Limiting the use for publicity purposes of names of licensors or authors of the
material; or
* **e)** Declining to grant rights under trademark law for use of some trade names,
trademarks, or service marks; or
* **f)** Requiring indemnification of licensors and authors of that material by anyone
who conveys the material (or modified versions of it) with contractual assumptions of
liability to the recipient, for any liability that these contractual assumptions
directly impose on those licensors and authors.
All other non-permissive additional terms are considered “further
restrictions” within the meaning of section 10. If the Program as you received
it, or any part of it, contains a notice stating that it is governed by this License
along with a term that is a further restriction, you may remove that term. If a
license document contains a further restriction but permits relicensing or conveying
under this License, you may add to a covered work material governed by the terms of
that license document, provided that the further restriction does not survive such
relicensing or conveying.
If you add terms to a covered work in accord with this section, you must place, in
the relevant source files, a statement of the additional terms that apply to those
files, or a notice indicating where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the form of a
separately written license, or stated as exceptions; the above requirements apply
either way.
### 8. Termination.
You may not propagate or modify a covered work except as expressly provided under
this License. Any attempt otherwise to propagate or modify it is void, and will
automatically terminate your rights under this License (including any patent licenses
granted under the third paragraph of section 11).
However, if you cease all violation of this License, then your license from a
particular copyright holder is reinstated (a) provisionally, unless and until the
copyright holder explicitly and finally terminates your license, and (b) permanently,
if the copyright holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated permanently
if the copyright holder notifies you of the violation by some reasonable means, this
is the first time you have received notice of violation of this License (for any
work) from that copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the licenses of
parties who have received copies or rights from you under this License. If your
rights have been terminated and not permanently reinstated, you do not qualify to
receive new licenses for the same material under section 10.
### 9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or run a copy of the
Program. Ancillary propagation of a covered work occurring solely as a consequence of
using peer-to-peer transmission to receive a copy likewise does not require
acceptance. However, nothing other than this License grants you permission to
propagate or modify any covered work. These actions infringe copyright if you do not
accept this License. Therefore, by modifying or propagating a covered work, you
indicate your acceptance of this License to do so.
### 10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically receives a license
from the original licensors, to run, modify and propagate that work, subject to this
License. You are not responsible for enforcing compliance by third parties with this
License.
An “entity transaction” is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an organization, or
merging organizations. If propagation of a covered work results from an entity
transaction, each party to that transaction who receives a copy of the work also
receives whatever licenses to the work the party's predecessor in interest had or
could give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if the predecessor
has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the rights granted or
affirmed under this License. For example, you may not impose a license fee, royalty,
or other charge for exercise of rights granted under this License, and you may not
initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging
that any patent claim is infringed by making, using, selling, offering for sale, or
importing the Program or any portion of it.
### 11. Patents.
A “contributor” is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The work thus
licensed is called the contributor's “contributor version”.
A contributor's “essential patent claims” are all patent claims owned or
controlled by the contributor, whether already acquired or hereafter acquired, that
would be infringed by some manner, permitted by this License, of making, using, or
selling its contributor version, but do not include claims that would be infringed
only as a consequence of further modification of the contributor version. For
purposes of this definition, “control” includes the right to grant patent
sublicenses in a manner consistent with the requirements of this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free patent license
under the contributor's essential patent claims, to make, use, sell, offer for sale,
import and otherwise run, modify and propagate the contents of its contributor
version.
In the following three paragraphs, a “patent license” is any express
agreement or commitment, however denominated, not to enforce a patent (such as an
express permission to practice a patent or covenant not to sue for patent
infringement). To “grant” such a patent license to a party means to make
such an agreement or commitment not to enforce a patent against the party.
If you convey a covered work, knowingly relying on a patent license, and the
Corresponding Source of the work is not available for anyone to copy, free of charge
and under the terms of this License, through a publicly available network server or
other readily accessible means, then you must either (1) cause the Corresponding
Source to be so available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner consistent with
the requirements of this License, to extend the patent license to downstream
recipients. “Knowingly relying” means you have actual knowledge that, but
for the patent license, your conveying the covered work in a country, or your
recipient's use of the covered work in a country, would infringe one or more
identifiable patents in that country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or arrangement, you
convey, or propagate by procuring conveyance of, a covered work, and grant a patent
license to some of the parties receiving the covered work authorizing them to use,
propagate, modify or convey a specific copy of the covered work, then the patent
license you grant is automatically extended to all recipients of the covered work and
works based on it.
A patent license is “discriminatory” if it does not include within the
scope of its coverage, prohibits the exercise of, or is conditioned on the
non-exercise of one or more of the rights that are specifically granted under this
License. You may not convey a covered work if you are a party to an arrangement with
a third party that is in the business of distributing software, under which you make
payment to the third party based on the extent of your activity of conveying the
work, and under which the third party grants, to any of the parties who would receive
the covered work from you, a discriminatory patent license (a) in connection with
copies of the covered work conveyed by you (or copies made from those copies), or (b)
primarily for and in connection with specific products or compilations that contain
the covered work, unless you entered into that arrangement, or that patent license
was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting any implied
license or other defenses to infringement that may otherwise be available to you
under applicable patent law.
### 12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or otherwise)
that contradict the conditions of this License, they do not excuse you from the
conditions of this License. If you cannot convey a covered work so as to satisfy
simultaneously your obligations under this License and any other pertinent
obligations, then as a consequence you may not convey it at all. For example, if you
agree to terms that obligate you to collect a royalty for further conveying from
those to whom you convey the Program, the only way you could satisfy both those terms
and this License would be to refrain entirely from conveying the Program.
### 13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have permission to link or
combine any covered work with a work licensed under version 3 of the GNU Affero
General Public License into a single combined work, and to convey the resulting work.
The terms of this License will continue to apply to the part which is the covered
work, but the special requirements of the GNU Affero General Public License, section
13, concerning interaction through a network will apply to the combination as such.
### 14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of the GNU
General Public License from time to time. Such new versions will be similar in spirit
to the present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies that
a certain numbered version of the GNU General Public License “or any later
version” applies to it, you have the option of following the terms and
conditions either of that numbered version or of any later version published by the
Free Software Foundation. If the Program does not specify a version number of the GNU
General Public License, you may choose any version ever published by the Free
Software Foundation.
If the Program specifies that a proxy can decide which future versions of the GNU
General Public License can be used, that proxy's public statement of acceptance of a
version permanently authorizes you to choose that version for the Program.
Later license versions may give you additional or different permissions. However, no
additional obligations are imposed on any author or copyright holder as a result of
your choosing to follow a later version.
### 15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
### 16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY
COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS
PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE
OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
### 17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided above cannot be
given local legal effect according to their terms, reviewing courts shall apply local
law that most closely approximates an absolute waiver of all civil liability in
connection with the Program, unless a warranty or assumption of liability accompanies
a copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
## How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest possible use to
the public, the best way to achieve this is to make it free software which everyone
can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them
to the start of each source file to most effectively state the exclusion of warranty;
and each file should have at least the “copyright” line and a pointer to
where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short notice like this
when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type 'show c' for details.
The hypothetical commands 'show w' and 'show c' should show the appropriate parts of
the General Public License. Of course, your program's commands might be different;
for a GUI interface, you would use an “about box”.
You should also get your employer (if you work as a programmer) or school, if any, to
sign a “copyright disclaimer” for the program, if necessary. For more
information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may consider it
more useful to permit linking proprietary applications with the library. If this is
what you want to do, use the GNU Lesser General Public License instead of this
License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
transit-2.1.1/MANIFEST.in 0000664 0000000 0000000 00000000600 13131213142 0014720 0 ustar 00root root 0000000 0000000 include MANIFEST.in
include README.md
include LICENSE.md
include VERSION
recursive-include src/pytransit/data *
recursive-include src/pytransit/genomes *
recursive-include src/pytransit/doc/build/html *.html
recursive-include src/pytransit/doc/build/html/_images *.png
recursive-include src/pytransit/doc/build/html/_modules *
recursive-include src/pytransit/doc/build/html/_static *
transit-2.1.1/README.md 0000664 0000000 0000000 00000005562 13131213142 0014455 0 ustar 00root root 0000000 0000000
# TRANSIT 2.1.0
[Build Status](https://travis-ci.org/mad-lab/transit) [Documentation Status](http://transit.readthedocs.io/en/latest/?badge=latest)
**Version 2.1.0 changes (June, 2017)**
- Added tooltips next to most parameters to explain their functionality.
- Added Quality Control window, with choice for normalization method.
- Added more normalization options to the HMM method.
- Added LOESS correction functionality back to TRANSIT.
- Added ability to scale Track View based on mean-count of the window.
- Added ability to scale individual tracks in Track View.
- Added ability to add tracks of features to Track View.
- Better status messages for Track View.
- TPP can now accept empty primer prefix (in case reads have been trimmed).
- TPP can now process reads obtained using Mme1 enzyme and protocol.
- TPP can now pass flags to BWA.
- Lots of bug fixes.
**Version 2.0.2 changes (August, 2016)**
- Added support for custom primers in TPP.
- Added support for annotations in GFF3 format.
- Ability to specify pseudocounts in resampling.
- Misc. Bug fixes
- **New [mailing list](https://groups.google.com/forum/#!forum/tnseq-transit/join)**
**New in Version 2.0+**
- Support for Tn5 datasets.
- New analysis methods.
- New way to export normalized datasets.
Welcome! This is the distribution for the TRANSIT and TPP tools developed by the Ioerger Lab.
TRANSIT is a tool for the analysis of Tn-Seq data. It provides an easy-to-use graphical interface and access to three different analysis methods that allow the user to determine essentiality in a single condition as well as between conditions.
## Mailing List
You can join our mailing list to get announcements of new versions, discuss any bugs, or request features! Just head over to the following site and enter your email address:
https://groups.google.com/forum/#!forum/tnseq-transit/join
## Instructions
For full instructions on how to install and run TRANSIT (and the optional pre-processor, TPP), please see the documentation included in this distribution ("doc" folder) or visit the following web page:
http://saclab.tamu.edu/essentiality/transit/transit.html
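
As a rough illustration of console mode, the sketch below assembles a TPP command line in Python. The flag names are copied from the `known_flags` set in `src/pytpp/__main__.py` and the `tpp` command name from the entry points in `setup.py`. The single-dash spelling, the helper name `build_tpp_command`, and the file paths are assumptions made for this example, so please check the documentation linked above before relying on them.

```python
# Flag names taken from src/pytpp/__main__.py (known_flags).
KNOWN_FLAGS = {"tn5", "help", "himar1", "protocol", "primer", "reads1",
               "reads2", "bwa", "ref", "maxreads", "output", "mismatches",
               "flags"}


def build_tpp_command(**options):
    """Hypothetical helper: return an argument list for the `tpp` script.

    Only flags that take a value are handled here; the single-dash
    prefix is an assumption.
    """
    unknown = set(options) - KNOWN_FLAGS
    if unknown:
        raise ValueError("unrecognized flags: " + ", ".join(sorted(unknown)))
    cmd = ["tpp"]
    for name in sorted(options):
        cmd.extend(["-" + name, str(options[name])])
    return cmd


# Example paths are placeholders, not files shipped with TRANSIT.
command = build_tpp_command(bwa="/usr/bin/bwa", ref="genome.fna",
                            reads1="reads.fastq", output="wt_run1")
```

The resulting list could be handed to `subprocess.call`, which avoids quoting problems with paths that contain spaces.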
## Datasets
The TRANSIT distribution comes with some example .wig files in the data/ directory, as well as an example annotation file (.prot\_table format) in the genomes/ directory. Additional genomes may be found on the following website:
http://saclab.tamu.edu/essentiality/transit/genomes/
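
To give a feel for the example data, here is a minimal sketch of reading a .wig file and computing the fraction of sites with at least one insertion. It assumes the variableStep wig layout (header lines followed by one "position count" pair per line). The function name `read_wig` and the inline sample data are made up for this example and are not part of the TRANSIT API.

```python
def read_wig(lines):
    """Return (positions, counts) parsed from the lines of a wig file."""
    positions, counts = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and wig header lines.
        if (not line or line.startswith("#")
                or line.startswith("variableStep")
                or line.startswith("track")):
            continue
        fields = line.split()
        positions.append(int(fields[0]))
        counts.append(float(fields[1]))
    return positions, counts


# Tiny made-up dataset standing in for a real file from data/.
sample = """# example dataset
variableStep chrom=example
60 0
72 3
102 0
""".splitlines()

positions, counts = read_wig(sample)
# Fraction of listed sites that have at least one read.
density = sum(1 for c in counts if c > 0) / float(len(counts))
```

For a real file, pass an open file object instead of `sample`, for example `read_wig(open(path))`.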
## Copyright Information
Source code for TRANSIT and TPP is available open source under the terms of the GNU General Public License (Version 3.0) as published by the Free Software Foundation. For more information on this license, please see the included LICENSE.md file or visit the GNU website at:
http://www.gnu.org/licenses/gpl.html
transit-2.1.1/VERSION 0000664 0000000 0000000 00000000034 13131213142 0014233 0 ustar 00root root 0000000 0000000 version: 2.1.0-18-g4192-mod
transit-2.1.1/setup.cfg 0000664 0000000 0000000 00000000050 13131213142 0015002 0 ustar 00root root 0000000 0000000 [metadata]
description-file = README.md
transit-2.1.1/setup.py 0000664 0000000 0000000 00000010545 13131213142 0014705 0 ustar 00root root 0000000 0000000 """A setuptools based setup module.
See:
https://packaging.python.org/en/latest/distributing.html
https://github.com/pypa/sampleproject
"""
# Always prefer setuptools over distutils
from setuptools import setup, find_packages
# To use a consistent encoding
from codecs import open
from os import path
here = path.abspath(path.dirname(__file__))
# Get the long description from the README file
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()
# Get current version
import sys
sys.path.insert(1, "src/")
import pytransit
version = pytransit.__version__[1:] #"2.0.3"
setup(
name='tnseq-transit',
# Versions should comply with PEP440. For a discussion on single-sourcing
# the version across setup.py and the project code, see
# https://packaging.python.org/en/latest/single_source_version.html
version=version,
description='TRANSIT is a tool for the analysis of Tn-Seq data. It provides an easy to use graphical interface and access to three different analysis methods that allow the user to determine essentiality in a single condition as well as between conditions.',
long_description=long_description,
# The project's main homepage.
url='https://github.com/mad-lab/transit',
download_url='https://github.com/mad-lab/transit',
# Author details
author='Michael A. DeJesus',
author_email='mad@cs.tamu.edu',
# Choose your license
license='GNU GPL',
# See https://pypi.python.org/pypi?%3Aaction=list_classifiers
classifiers=[
#'Development Status :: 3 - Alpha',
'Development Status :: 5 - Production/Stable',
'Programming Language :: Python :: 2.7',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
'Operating System :: OS Independent',
'Topic :: Scientific/Engineering :: Bio-Informatics',
],
# What does your project relate to?
keywords=['tnseq', 'analysis', 'biology', 'genome'],
#package_dir = {'tnseq-transit': 'src/pytransit'},
# You can just specify the packages manually here if your project is
# simple. Or you can use find_packages().
packages = find_packages('src', exclude=['contrib', 'tests']),
#packages = ['pytransit'],
package_dir = {'pytransit': 'src/pytransit', 'pytpp': 'src/pytpp'},
include_package_data=True,
#py_modules = ['tpp'],
# Alternatively, if you want to distribute just a my_module.py, uncomment
# this:
# py_modules=["my_module"],
# List run-time dependencies here. These will be installed by pip when
# your project is installed. For an analysis of "install_requires" vs pip's
# requirements files see:
# https://packaging.python.org/en/latest/requirements.html
install_requires=['setuptools', 'numpy', 'scipy', 'pillow', 'matplotlib'],
#dependency_links = [
# "git+https://github.com/wxWidgets/wxPython.git#egg=wxPython"
#],
# List additional groups of dependencies here (e.g. development
# dependencies). You can install these using the following syntax,
# for example:
# $ pip install -e .[dev,test]
#extras_require={
# 'dev': ['check-manifest'],
# 'test': ['coverage'],
#},
# If there are data files included in your packages that need to be
# installed, specify them here. If using Python 2.6 or less, then these
# have to be included in MANIFEST.in as well.
package_data={
# package_data patterns are resolved relative to the package directory
'pytransit': ['data/*', 'doc/*.*', 'doc/images/*', 'genomes/*']
},
#scripts=['src/tpp.py', 'src/transit.py'],
# Although 'package_data' is the preferred approach, in some case you may
# need to place data files outside of your packages. See:
# http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files # noqa
# In this case, 'data_file' will be installed into '/my_data'
#data_files=[('transitdata', ['package_data.dat'])],
# To provide executable scripts, use entry points in preference to the
# "scripts" keyword. Entry points provide cross-platform support and allow
# pip to create the appropriate form of executable for the target platform.
entry_points={
'console_scripts': [
'transit=pytransit.__main__:main',
'tpp=pytpp.__main__:main',
],
},
)
transit-2.1.1/src/ 0000775 0000000 0000000 00000000000 13131213142 0013755 5 ustar 00root root 0000000 0000000 transit-2.1.1/src/pytpp/ 0000775 0000000 0000000 00000000000 13131213142 0015131 5 ustar 00root root 0000000 0000000 transit-2.1.1/src/pytpp/__init__.py 0000664 0000000 0000000 00000000001 13131213142 0017231 0 ustar 00root root 0000000 0000000
transit-2.1.1/src/pytpp/__main__.py 0000664 0000000 0000000 00000006156 13131213142 0017233 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# Copyright 2015.
# Michael A. DeJesus, Chaitra Ambadipudi, and Thomas R. Ioerger.
#
#
# This file is part of TRANSIT.
#
# TRANSIT is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License.
#
#
# TRANSIT is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with TRANSIT. If not, see <http://www.gnu.org/licenses/>.
import sys
import glob
import os
import time
import math
import re
import shutil
import platform
import gzip
from tpp_tools import *
from tpp_gui import *
def main(arguments=[]):
    vars = Globals()
    if len(arguments) <= 1 and hasWx:
        # Too few arguments for console mode: launch the GUI
        app = wx.App(False)
        form = MyForm(vars)
        form.update_dataset_list()
        form.Show()
        app.MainLoop()
        # vars.action not defined, quit...
        if hasattr(vars, 'action'):
            if vars.action=="start":
                verify_inputs(vars)
                if vars.fq2=="": msg = 'running pre-processing on %s' % (vars.fq1)
                else: msg = 'running pre-processing on %s and %s' % (vars.fq1,vars.fq2)
                message(msg)
                message("transposon type: %s" % vars.transposon)
                message("protocol: %s" % vars.protocol)
                save_config(vars)
                driver(vars)
        else:
            pass
    elif len(arguments) <= 1 and not hasWx:
        print "Please install wxPython to run in GUI Mode."
        print "To run in Console Mode please follow these instructions:"
        print ""
        show_help()
    else:
        (args, kwargs) = cleanargs(arguments)
        # Show help if needed
        if "help" in kwargs or "-help" in kwargs:
            show_help()
            sys.exit()
        # Check for strange flags
        known_flags = set(["tn5", "help", "himar1", "protocol", "primer", "reads1",
                           "reads2", "bwa", "ref", "maxreads", "output", "mismatches", "flags"])
        unknown_flags = set(kwargs.keys()) - known_flags
        if unknown_flags:
            print "error: unrecognized flags:", ", ".join(unknown_flags)
            show_help()
            sys.exit()
        # Initialize variables
        initialize_globals(vars, args, kwargs)
        # Check inputs make sense
        verify_inputs(vars)
        # Print some messages (single-end when no second reads file is given)
        if not vars.fq2:
            msg = 'running pre-processing on %s' % (vars.fq1)
        else:
            msg = 'running pre-processing on %s and %s' % (vars.fq1, vars.fq2)
        message(msg)
        message("protocol: %s" % vars.protocol)
        message("transposon type: %s" % vars.transposon)
        # Save configuration file
        save_config(vars)
        # Run TPP
        driver(vars)

if __name__ == "__main__":
    main(sys.argv[1:])
transit-2.1.1/src/pytpp/tpp_gui.py 0000664 0000000 0000000 00000045132 13131213142 0017157 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# Copyright 2017.
# Michael A. DeJesus, Chaitra Ambadipudi, and Thomas R. Ioerger.
#
#
# This file is part of TRANSIT.
#
# TRANSIT is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License.
#
#
# TRANSIT is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with TRANSIT. If not, see <http://www.gnu.org/licenses/>.
import sys
import glob
import os
import time
import math
import re
import shutil
import platform
import gzip
try:
    import wx
    import wx.lib.filebrowsebutton
    hasWx = True
except Exception as e:
    hasWx = False
from tpp_tools import *
if hasWx:
class TPPIcon(wx.StaticBitmap):
def __init__(self, panel, flag, bmp, tooltip=""):
wx.StaticBitmap.__init__(self, panel, flag, bmp)
tp = wx.ToolTip(tooltip)
self.SetToolTip(tp)
class MyForm(wx.Frame):
def __init__(self,vars):
self.vars = vars
initialize_globals(self.vars)
wx.Frame.__init__(self, None, wx.ID_ANY, "TPP: Tn-Seq PreProcessor") # v%s" % vars.version)
# Add a panel so it looks correct on all platforms
panel = wx.ScrolledWindow( self, wx.ID_ANY, wx.DefaultPosition, wx.Size( -1,-1 ), wx.HSCROLL|wx.VSCROLL )
panel.SetScrollRate( 5, 5 )
panel.SetMaxSize( wx.Size( -1, 1000 ) )
sizer = wx.BoxSizer(wx.VERTICAL)
self.list_ctrl = None
self.InitMenu()
self.InitFiles(panel,sizer)
buttonrow = wx.BoxSizer(wx.HORIZONTAL)
btn = wx.Button(panel, label="Start")
btn.Bind(wx.EVT_BUTTON, self.map_reads)
buttonrow.Add(btn,0,0,0,10)
btn = wx.Button(panel, label="Quit")
btn.Bind(wx.EVT_BUTTON, self.OnQuit)
buttonrow.Add(btn,0,0,0,10)
sizer.Add(buttonrow,0,0,0)
self.InitList(panel,sizer)
panel.SetSizer(sizer)
# self.SetSize((1305, 700))
self.SetSize((900, 750))
#self.SetTitle('Simple menu')
self.Centre()
#self.Show(True)
self.pid = None
#
def InitFiles(self,panel,sizer):
vars = self.vars
# Define
bmp = wx.ArtProvider.GetBitmap(wx.ART_INFORMATION, wx.ART_OTHER, (16, 16))
# BWA
sizer0 = wx.BoxSizer(wx.HORIZONTAL)
label0 = wx.StaticText(panel, label='BWA executable:',size=(330,-1))
sizer0.Add(label0,0,wx.ALIGN_CENTER_VERTICAL,0)
self.picker0 = wx.lib.filebrowsebutton.FileBrowseButton(panel, id = wx.ID_ANY, size=(400,30), dialogTitle='Path to BWA', fileMode=wx.OPEN, fileMask='bwa*', startDirectory=os.path.dirname(vars.bwa), initialValue=vars.bwa, labelText='')
sizer0.Add(self.picker0, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer0.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Specify a path to the BWA executable (including the executable)."), flag=wx.CENTER, border=0)
sizer0.Add((10, 1), 0, wx.EXPAND)
sizer.Add(sizer0,0,wx.EXPAND,0)
# REFERENCE
sizer3 = wx.BoxSizer(wx.HORIZONTAL)
label3 = wx.StaticText(panel, label='Choose a reference genome (FASTA):',size=(330,-1))
sizer3.Add(label3,0,wx.ALIGN_CENTER_VERTICAL,0)
self.picker3 = wx.lib.filebrowsebutton.FileBrowseButton(panel, id=wx.ID_ANY, dialogTitle='Please select the reference genome', fileMode=wx.OPEN, fileMask='*.fna;*.fasta;*.fa', size=(400,30), startDirectory=os.path.dirname(vars.ref), initialValue=vars.ref, labelText='')
sizer3.Add(self.picker3, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer3.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Select a reference genome in FASTA format."), flag=wx.CENTER, border=0)
sizer3.Add((10, 1), 0, wx.EXPAND)
sizer.Add(sizer3,0,wx.EXPAND,0)
# READS 1
sizer1 = wx.BoxSizer(wx.HORIZONTAL)
label1 = wx.StaticText(panel, label='Choose the Fastq file for read 1:',size=(330,-1))
sizer1.Add(label1,0,wx.ALIGN_CENTER_VERTICAL,0)
self.picker1 = wx.lib.filebrowsebutton.FileBrowseButton(panel, id=wx.ID_ANY, dialogTitle='Please select the .fastq file for read 1', fileMode=wx.OPEN, fileMask='*.fastq;*.fq;*.reads;*.fasta;*.fa;*.fastq.gz', size=(400,30), startDirectory=os.path.dirname(vars.fq1), initialValue=vars.fq1, labelText='',changeCallback=self.OnChanged2)
sizer1.Add(self.picker1, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer1.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Select a file containing the reads in .FASTQ (or compressed FASTQ) format."), flag=wx.CENTER, border=0)
sizer1.Add((10, 1), 0, wx.EXPAND)
sizer.Add(sizer1,0,wx.EXPAND,0)
# READS 2
sizer2 = wx.BoxSizer(wx.HORIZONTAL)
label2 = wx.StaticText(panel, label='Choose the Fastq file for read 2:',size=(330,-1))
sizer2.Add(label2,0,wx.ALIGN_CENTER_VERTICAL,0)
self.picker2 = wx.lib.filebrowsebutton.FileBrowseButton(panel, id=wx.ID_ANY, dialogTitle='Please select the .fastq file for read 2', fileMode=wx.OPEN, fileMask='*.fastq;*.fq;*.reads;*.fasta;*.fa;*.fastq.gz', size=(400,30), startDirectory=os.path.dirname(vars.fq2), initialValue=vars.fq2, labelText='', changeCallback=self.OnChanged2)
sizer2.Add(self.picker2, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer2.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Select a file containing the paired-end reads in .FASTQ (or compressed FASTQ) format. Optional."), flag=wx.CENTER, border=0)
sizer2.Add((10, 1), 0, wx.EXPAND)
sizer.Add(sizer2,0,wx.EXPAND,0)
# OUTPUT PREFIX
sizer5 = wx.BoxSizer(wx.HORIZONTAL)
label5 = wx.StaticText(panel, label='Prefix to use for output filenames:',size=(340,-1))
sizer5.Add(label5,0,wx.ALIGN_CENTER_VERTICAL,0)
self.base = wx.TextCtrl(panel,value=vars.base,size=(400,30))
sizer5.Add(self.base, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer5.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Select a label prefix that will be used when writing output files, e.g. 'wt_run1'"), flag=wx.CENTER, border=0)
sizer5.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer5,0,wx.EXPAND,0)
# PROTOCOL
sizer_protocol = wx.BoxSizer(wx.HORIZONTAL)
label_protocol = wx.StaticText(panel, label='Protocol used:',size=(340,-1))
sizer_protocol.Add(label_protocol,0,wx.ALIGN_CENTER_VERTICAL,0)
self.protocol = wx.ComboBox(panel,choices=['Sassetti','Mme1', 'Tn5'],size=(400,30))
self.protocol.SetStringSelection(vars.protocol)
sizer_protocol.Add(self.protocol, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
protocol_tooltip_text = """Select which protocol best represents the reads. Default values will populate the fields.
The Sassetti protocol generally assumes the reads include the primer prefix and part of the transposon sequence. It also assumes reads are sequenced in the forward direction.
The Mme1 protocol generally assumes reads do NOT include the primer prefix, and that the reads are sequenced in the reverse direction"""
sizer_protocol.Add(TPPIcon(panel, wx.ID_ANY, bmp, protocol_tooltip_text), flag=wx.CENTER, border=0)
sizer_protocol.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer_protocol,0,wx.EXPAND,0)
self.Bind(wx.EVT_COMBOBOX, self.OnProtocolSelection, id=self.protocol.GetId())
# TRANSPOSON
sizer8 = wx.BoxSizer(wx.HORIZONTAL)
label8 = wx.StaticText(panel, label='Transposon used:',size=(340,-1))
sizer8.Add(label8,0,wx.ALIGN_CENTER_VERTICAL,0)
self.transposon = wx.ComboBox(panel,choices=['Himar1','Tn5', '[Custom]'],size=(400,30))
self.transposon.SetStringSelection(vars.transposon)
sizer8.Add(self.transposon, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer8.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Select the transposon used to construct the TnSeq libraries. This will automatically populate the primer prefix field. Select custom to specify your own sequence."), flag=wx.CENTER, border=0)
sizer8.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer8,0,wx.EXPAND,0)
# PRIMER SEQUENCE
sizer4 = wx.BoxSizer(wx.HORIZONTAL)
label4 = wx.StaticText(panel, label='Primer sequence:',size=(340,-1))
sizer4.Add(label4,0,wx.ALIGN_CENTER_VERTICAL,0)
self.prefix = wx.TextCtrl(panel,value=str(vars.prefix), size=(400,30))
sizer4.Add(self.prefix, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer4.Add(TPPIcon(panel, wx.ID_ANY, bmp, "If present in the reads, specify the primer sequence. If it has been stripped away already, leave this field empty."), flag=wx.CENTER, border=0)
sizer4.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer4,0,wx.EXPAND,0)
self.Bind(wx.EVT_COMBOBOX, self.OnTransposonSelection, id=self.transposon.GetId())
self.prefix.Bind(wx.EVT_TEXT, self.OnChangePrimerPrefix)
# MAX READS
sizer6 = wx.BoxSizer(wx.HORIZONTAL)
label6 = wx.StaticText(panel, label='Max reads (leave blank to use all):',size=(340,-1))
sizer6.Add(label6,0,wx.ALIGN_CENTER_VERTICAL,0)
self.maxreads = wx.TextCtrl(panel,size=(400,30))
sizer6.Add(self.maxreads, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer6.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Maximum reads to use from the reads files. Useful for running only a portion of a very large number of reads. Leave blank to use all the reads."), flag=wx.CENTER, border=0)
sizer6.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer6,0,wx.EXPAND,0)
# MISMATCHES
sizer7 = wx.BoxSizer(wx.HORIZONTAL)
label7 = wx.StaticText(panel, label='Mismatches allowed in Tn prefix:',size=(340,-1))
sizer7.Add(label7,0,wx.ALIGN_CENTER_VERTICAL,0)
self.mismatches = wx.TextCtrl(panel,value=str(vars.mm1),size=(400,30))
sizer7.Add(self.mismatches, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer7.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Number of mismatches allowed in the tn-prefix before discarding the read."), flag=wx.CENTER, border=0)
sizer7.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer7,0,wx.EXPAND,0)
# BWA FLAGS
sizer8 = wx.BoxSizer(wx.HORIZONTAL)
label8 = wx.StaticText(panel, label='BWA flags (Optional)',size=(340,-1))
sizer8.Add(label8,0,wx.ALIGN_CENTER_VERTICAL,0)
self.flags = wx.TextCtrl(panel,value=vars.flags,size=(400,30))
sizer8.Add(self.flags, proportion=1, flag=wx.EXPAND|wx.ALL, border=5)
sizer8.Add(TPPIcon(panel, wx.ID_ANY, bmp, "Use this textbox to enter any desired flags for the BWA alignment. For example, to limit the number of mismatches to 1, type: -k 1. See the BWA documentation for all possible flags."), flag=wx.CENTER, border=0)
sizer8.Add((130, 1), 0, wx.EXPAND)
sizer.Add(sizer8,0,wx.EXPAND,0)
#
def OnTransposonSelection(self, event):
if self.transposon.GetValue()=="Tn5":
self.prefix.SetValue("TAAGAGACAG")
self.transposon.SetStringSelection("Tn5")
self.vars.transposon = "Tn5"
elif self.transposon.GetValue()=="Himar1":
self.prefix.SetValue("ACTTATCAGCCAACCTGTTA")
self.transposon.SetStringSelection("Himar1")
self.vars.transposon = "Himar1"
else:
self.transposon.SetValue("[Custom]")
self.transposon.SetStringSelection("[Custom]")
self.vars.transposon = "[Custom]"
#
def OnProtocolSelection(self, event):
self.vars.protocol = self.protocol.GetValue()
if self.protocol.GetValue()=="Tn5":
self.prefix.SetValue("TAAGAGACAG")
self.transposon.SetStringSelection("Tn5")
self.vars.transposon = "Tn5"
elif self.protocol.GetValue()=="Sassetti":
self.prefix.SetValue("ACTTATCAGCCAACCTGTTA")
self.transposon.SetStringSelection("Himar1")
self.vars.transposon = "Himar1"
elif self.protocol.GetValue()=="Mme1":
self.prefix.SetValue("")
self.transposon.SetStringSelection("Himar1")
self.vars.transposon = "Himar1"
#
def OnChanged(self, str_path):
print "changed"
value = os.path.basename(str_path).split('.')[0]
if '_R1' in value or '_R2' in value:
value = value.split('_')[0]
self.base.SetValue(value)
#
def OnChanged2(self, event):
value2 = os.path.basename(self.picker2.GetValue()).split('.')[0]
value1 = os.path.basename(self.picker1.GetValue()).split('.')[0]
value = os.path.commonprefix([value1, value2])
self.base.SetValue(value)
self.base.Refresh()
#
def OnChangePrimerPrefix(self, event):
self.transposon.SetValue("[Custom]")
#
def InitList(self,panel,sizer):
self.list_ctrl = wx.ListCtrl(panel, size=(500,210), style=wx.LC_REPORT|wx.BORDER_SUNKEN)
self.list_ctrl.InsertColumn(0, 'Dataset (*.tn_stats)',width=300)
self.list_ctrl.InsertColumn(1, 'total reads',wx.LIST_FORMAT_RIGHT,width=125)
self.list_ctrl.InsertColumn(2, 'Tn prefix', wx.LIST_FORMAT_RIGHT,width=125)
self.list_ctrl.InsertColumn(3, 'R1_mapped', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(4, 'R2_mapped', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(5, 'mapped\nreads', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(6, 'template\ncount', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(7, 'TAs hit', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(8, 'insertion\ndensity',wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(9, 'NZmean', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(10, 'maxcount', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(11, 'primer', wx.LIST_FORMAT_RIGHT,width=90)
self.list_ctrl.InsertColumn(12, 'vector',wx.LIST_FORMAT_RIGHT,width=90)
sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 10)
#
def InitMenu(self):
menubar = wx.MenuBar()
fileMenu = wx.Menu()
quit_menuitem = fileMenu.Append(wx.ID_EXIT, 'Quit', 'Quit application')
self.Bind(wx.EVT_MENU, self.OnQuit, quit_menuitem)
menubar.Append(fileMenu, '&File')
self.SetMenuBar(menubar)
#
def addNewDataset(self, event):
dlg = wx.FileDialog(
self, message="Choose a file",
defaultDir=".",
defaultFile="",
wildcard="*.wig",
style=wx.OPEN | wx.MULTIPLE | wx.CHANGE_DIR
)
if dlg.ShowModal() == wx.ID_OK:
paths = dlg.GetPaths()
for path in paths:
print "analyzing dataset:",path
analyze_dataset(path)
dlg.Destroy()
self.update_dataset_list()
#
def update_dataset_list(self):
if self.list_ctrl is None: return
self.list_ctrl.DeleteAllItems()
self.index = 0
datasets = []
for fname in glob.glob("*.tn_stats"):
filedate = os.path.getmtime(fname)
datasets.append((filedate,fname))
datasets.sort(reverse=True)
for (filedate,fname) in datasets:
stats = self.read_stats_file(fname)
ntrim = stats.get("TGTTA_reads","?")
if ntrim=="?": ntrim = stats.get("trimmed_reads","?")
vals = [stats.get("total_reads","?"),ntrim,stats.get("reads1_mapped", "?"),stats.get("reads2_mapped","?"),stats.get("mapped_reads","?"),stats.get("template_count","?"), stats.get("TAs_hit","?"), stats.get("density", "?"), stats.get("NZ_mean", "?"), stats.get("max_count", "?"), stats.get("primer_matches:","?"),stats.get("vector_matches:","?")]
dsname = "[%s] %s" % (time.strftime("%m/%d/%y",time.localtime(filedate)),fname[:fname.rfind('.')])
self.add_data(dsname, vals)
#
def read_stats_file(self,fname):
stats = {}
for line in open(fname):
w = line.rstrip().split()
if len(w)<2: continue # skip blank or malformed lines
val = ""
if len(w)>2: val = w[2]
stats[w[1]] = val
return stats
#
def add_data(self, dataset,vals):
self.list_ctrl.InsertStringItem(self.index, dataset)
for i in range(1, len(vals)+1):
self.list_ctrl.SetStringItem(self.index, i, vals[i-1])
self.index += 1
#
def OnQuit(self, e):
print "Quitting TPP. Good bye."
self.vars.action = "quit"
self.Close()
return 0
#
def map_reads(self,event):
# add bwa path, prefix
bwapath = self.picker0.GetValue()
fq1, fq2, ref, base, prefix, maxreads = self.picker1.GetValue(), self.picker2.GetValue(), self.picker3.GetValue(), self.base.GetValue(), self.prefix.GetValue(), self.maxreads.GetValue()
mm1 = self.mismatches.GetValue()
try: mm1 = int(mm1)
except Exception: mm1 = 1
self.vars.flags = self.flags.GetValue()
self.vars.transposon = self.transposon.GetStringSelection()
self.vars.protocol = self.protocol.GetValue()
self.vars.bwa = bwapath
self.vars.fq1 = fq1
self.vars.fq2 = fq2
self.vars.ref = ref
self.vars.base = base
self.vars.mm1 = mm1
self.vars.prefix = prefix
if maxreads == '': self.vars.maxreads = -1
else: self.vars.maxreads = int(maxreads)
self.vars.action = "start"
self.Close()
return 0
transit-2.1.1/src/pytpp/tpp_tools.py 0000664 0000000 0000000 00000101021 13131213142 0017521 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# Copyright 2015.
# Michael A. DeJesus, Chaitra Ambadipudi, and Thomas R. Ioerger.
#
#
# This file is part of TRANSIT.
#
# TRANSIT is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License.
#
#
# TRANSIT is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with TRANSIT. If not, see <http://www.gnu.org/licenses/>.
import glob,os,sys,time,math
import re, shutil
import platform
import gzip
import subprocess
def cleanargs(rawargs):
#TODO: Write docstring
args = []
kwargs = {}
count = 0
while count < len(rawargs):
if rawargs[count].startswith("-"): #and len(rawargs[count].split(" ")) == 1:
if count + 1 < len(rawargs) and (not rawargs[count+1].startswith("-") or len(rawargs[count+1].split(" ")) > 1):
kwargs[rawargs[count][1:]] = rawargs[count+1]
count += 1
else:
kwargs[rawargs[count][1:]] = True
else:
args.append(rawargs[count])
count += 1
return (args, kwargs)
def analyze_dataset(wigfile):
data = []
TAs,ins,reads = 0,0,0
for line in open(wigfile):
if line[0]=='#': continue
if line[:3]=='var': continue # variableStep
w = line.rstrip().split()
TAs += 1
cnt = int(w[1])
if cnt>0: ins += 1 # count every non-zero site as an insertion
reads += cnt
data.append((cnt,w[0]))
output = open(wigfile+".stats","w")
output.write("total TAs: %d, insertions: %d (%0.1f%%), total reads: %d\n" % (TAs,ins,100*(ins/float(TAs)),reads))
output.write("mean read count per non-zero site: %0.1f\n" % (reads/float(ins)))
output.write("5 highest counts:\n")
data.sort(reverse=True)
for cnt,coord in data[:5]:
output.write("coord=%s, count=%s\n" % (coord,cnt))
output.close()
#############################################################################
def fastq2reads(infile,outfile,maxreads):
output = open(outfile,"w")
cnt,tot = 0,0
for line in open(infile):
if cnt==0 and line[0]=='@':
tot += 1
if tot%1000000==0: message("%s reads processed" % tot)
if maxreads > -1:
if tot > maxreads:
break
if cnt==0:
h = line[1:] # strip off '@'
#h = h.replace(' ','_')
output.write(">%s" % h)
if cnt==1: output.write(line)
cnt = (cnt+1)%4
output.close()
# the headers for each pair must be identical up to /1 and /2 at the ends
# if the variable character with the read number occurs in the middle, move it to the end
def fix_paired_headers_for_bwa(reads1,reads2):
a = open(reads1)
b = open(reads2)
temp1 = reads1+".temp"
temp2 = reads2+".temp"
c = open(temp1,"w")
d = open(temp2,"w")
tot = 0
try:
while True:
e = a.readline().rstrip()
f = b.readline().rstrip()
if len(e)<=2 or len(f)<=2: break
if e[0]=='>':
tot += 1
if tot%1000000==0: message("%s reads processed" % tot)
# find first position where there is a difference
i,n = 0,len(e)
if len(f)!=n: raise Exception('Error: unexpected format of headers in .fastq files')
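while i<n and e[i]==f[i]: i += 1
# NOTE: the lines from here down to the mmfind definition were lost in extraction;
# they are reconstructed from the comment above this function (an assumption, not verified source)
# drop the differing read-number character and append /1 and /2 at the ends
e = e[:i]+e[i+1:]+"/1"
f = f[:i]+f[i+1:]+"/2"
c.write(e+"\n")
d.write(f+"\n")
finally:
a.close(); b.close(); c.close(); d.close()
# replace the original read files with the fixed copies
if os.path.exists(reads1): os.remove(reads1)
if os.path.exists(reads2): os.remove(reads2)
os.rename(temp1,reads1)
os.rename(temp2,reads2)
def mmfind(G,n,H,m,max): # lengths; assume n>m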
a = G[:n].find(H[:m])
if a!=-1: return a # shortcut for perfect matches
for i in range(0,n-m+1): # include the last possible start position
cnt = 0
for k in range(m):
if G[i+k]!=H[k]: cnt += 1
if cnt>max: break
if cnt<=max: return i
return -1
def extract_staggered(infile,outfile,vars):
Tn = vars.prefix
message("prefix sequence: %s" % vars.prefix)
lenTn = len(Tn)
ADAPTER2 = "TACCACGACCA"
lenADAP = len(ADAPTER2)
#P,Q = 5,10 # 1-based inclusive positions to look for start of Tn prefix
P,Q = 0,15
vars.tot_tgtta = 0
vars.truncated_reads = 0
output = open(outfile,"w")
tot = 0
#print infile
for line in open(infile):
#print line
line = line.rstrip()
if not line: continue
if line[0]=='>': header = line; continue
tot += 1
if tot%1000000==0: message("%s reads processed" % tot)
readlen = len(line)
a = mmfind(line,readlen,Tn,lenTn,vars.mm1) # allow some mismatches
b = mmfind(line,readlen,ADAPTER2,lenADAP, 1) # look for end of short frags
if a>=P and a<=Q:
gstart,gend = a+lenTn,readlen
if b!=-1: gend = b; vars.truncated_reads += 1
#if gend-gstart<20: continue # too short
if gend-gstart<5: continue # too short
output.write(header+"\n")
output.write(line[gstart:gend]+"\n")
vars.tot_tgtta += 1
output.close()
if vars.tot_tgtta == 0:
raise ValueError("Error: Input files did not contain any reads matching prefix sequence with %d mismatches" % vars.mm1)
def message(s):
print "[tn_preprocess]",s
sys.stdout.flush()
def get_id(line):
a,b = line.find(":")+1,line.rfind("#")
if b==-1: b = line.rfind("_")
return line[a:b]
# select the reads from infile that have headers occurring in goodreads
def select_reads(goodreads,infile,outfile):
hash = {}
for line in open(goodreads):
if line[0]=='>':
#id = line[line.find(":")+1:line.rfind("#")]
id = get_id(line)
hash[id] = 1
output = open(outfile,"w")
for line in open(infile):
if line[0]=='>':
header = line
id = get_id(line)
else:
if id in hash:
output.write(header)
output.write(line)
output.close()
def replace_ids(infile1,infile2,outfile):
f = open(infile1)
g = open(infile2)
h = open(outfile,"w")
while True:
a = f.readline()
b = g.readline()
if len(a)<2: break
if a[0]=='>': header = a
else:
h.write(header)
h.write(b)
f.close()
g.close()
h.close()
# indexes i and j are 1-based and inclusive (could be -1)
def select_cycles(infile,i,j,outfile):
output = open(outfile,"w")
for line in open(infile):
if line[0]=='>': header = line
else:
output.write(header)
output.write(line[i-1:j]+"\n")
output.close()
def read_genome(filename):
s = ""
for line in open(filename):
if line[0]=='>': continue # skip fasta header
else: s += line[:-1]
return s
# convert to bitstring (8 bits; bit 0 is low-order bit)
#
# Bit Description
# 0 0x1 template having multiple segments in sequencing
# 1 0x2 each segment properly aligned according to the aligner
# 2 0x4 segment unmapped
# 3 0x8 next segment in the template unmapped
# 4 0x10 SEQ being reverse complemented
# 5 0x20 SEQ of the next segment in the template being reversed
# 6 0x40 the first segment in the template
# 7 0x80 the last segment in the template
#
# code[6]=1 means read1
# code[4]=1 means reverse strand
def samcode(num): return bin(int(num))[2:].zfill(8)[::-1]
def template_counts(ref,sam,bcfile,vars):
genome = read_genome(ref)
barcodes = {}
fil1 = open(bcfile)
fil2 = open(sam)
idx=1
for line in fil1:
if idx==1: break
idx+=1
idx=1
for line in fil2:
if idx==2: break
idx+=1
'''
for line in open(bcfile):
line = line.rstrip()
if line[0]=='>': id = line[1:]
else: barcodes[id] = line
'''
hits = {}
vars.tot_tgtta,vars.mapped = 0,0
vars.r1 = vars.r2 = 0
#for line in open(sam):
bcline=''
for line in fil2:
try:
bcline = fil1.next().rstrip()
if bcline[0] !='>': bc = bcline
except StopIteration:
pass
if line[0]=='@': continue
else:
w = line.split('\t')
code = samcode(w[1])
if 'S' in w[5]: continue # eliminate soft-clipped reads
if code[6]=="1": # previously checked for read1 via w[1]<128
vars.tot_tgtta += 1
if code[2]=="0": vars.r1 += 1
if code[7]=="1" and code[2]=="0": vars.r2 += 1
# include "improperly mapped" reads, which might just be short frags
#if w[1]=="99" or w[1]=="83" or w[1]=="97" or w[1]=="81":
if code[6]=="1" and code[2]=="0" and code[3]=="0": # both reads mapped (proper or not)
vars.mapped += 1
readlen = len(w[9])
pos,size = int(w[3]),int(w[8]) # note: size could be negative
strand,delta = 'F',-2
if code[4]=="1": strand,delta = 'R',readlen
pos += delta
#bc = barcodes[w[0]]
if pos not in hits: hits[pos] = []
hits[pos].append((strand,size,bc))
sites = []
for i in range(len(genome)-1):
if genome[i:i+2].upper()=="TA":
pos = i+1
h = hits.get(pos,[])
f = filter(lambda x: x[0]=='F',h)
r = filter(lambda x: x[0]=='R',h)
h.sort()
unique = {}
for (strand,size,bc) in h:
#print strand,bc,size
s = "%s-%s-%s" % (strand,bc,size)
unique[s] = 1
u = unique.keys()
uf = filter(lambda x: x[0]=='F',u)
ur = filter(lambda x: x[0]=='R',u)
data = [pos,len(f),len(uf),len(r),len(ur),len(f)+len(r),len(uf)+len(ur)]
sites.append(data)
return sites # (coord, Fwd_Rd_Ct, Fwd_Templ_Ct, Rev_Rd_Ct, Rev_Templ_Ct, Tot_Rd_Ct, Tot_Templ_Ct)
# pretend that all reads count as unique templates
def increase_counts(pos,sites, strand):
if strand == "F":
sites[pos][1] += 1 # forward-strand read count
sites[pos][2] += 1 # forward-strand template count (each read counted as a unique template)
if strand == "R":
sites[pos][3] += 1 # reverse-strand read count
sites[pos][4] += 1 # reverse-strand template count
sites[pos][5] += 1 # total read count
sites[pos][6] += 1 # total template count
def read_counts(ref,sam,vars):
genome = read_genome(ref)
sites = {}
for i in range(len(genome)-1):
if genome[i:i+2].upper()=="TA" or vars.transposon=='Tn5':
pos = i+1
sites[pos] = [pos,0,0,0,0,0,0]
hits = {}
vars.tot_tgtta,vars.mapped = 0,0
vars.r1 = vars.r2 = 0
for line in open(sam):
if line[0]=='@': continue
else:
w = line.split('\t')
code,icode = samcode(w[1]),int(w[1])
vars.tot_tgtta += 1
if icode==0 or icode==16:
vars.r1 += 1
vars.mapped += 1
readlen = len(w[9])
pos = int(w[3])
if vars.protocol.lower() == "mme1":
strand,delta = 'F',readlen
if code[4]=="1": strand,delta = 'R',1
site1 = pos + delta - 2 #if on + strand, take column 3 position and add 1bp,
site2 = pos + delta - 1 #check one off in case the enzyme chewed too much
if site1 in sites:
increase_counts(site1, sites, strand)
if site2 in sites:
increase_counts(site2, sites, strand)
else:
strand,delta = 'F',-2
if code[4]=="1": strand,delta = 'R',readlen
site1 = pos + delta #if on + strand, take column 3 position and add 1bp
if site1 in sites:
increase_counts(site1, sites, strand)
results = []
for key in sorted(sites.keys()):
results.append(sites[key])
return results # (coord, Fwd_Rd_Ct, Fwd_Templ_Ct, Rev_Rd_Ct, Rev_Templ_Ct, Tot_Rd_Ct, Tot_Templ_Ct)
def driver(vars):
vars.reads1 = vars.base+".reads1"
vars.reads2 = vars.base+".reads2"
vars.trimmed1 = vars.base+".trimmed1"
vars.trimmed2 = vars.base+".trimmed2"
vars.barcodes1 = vars.base+".barcodes1"
vars.barcodes2 = vars.base+".barcodes2"
vars.genomic2 = vars.base+".genomic2"
vars.sai1 = vars.base+".sai1"
vars.sai2 = vars.base+".sai2"
vars.sam = vars.base+".sam"
vars.tc = vars.base+".counts"
vars.wig = vars.base+".wig"
vars.stats = vars.base+".tn_stats"
if not vars.prefix:
if vars.transposon=="Tn5": vars.prefix = "TAAGAGACAG"
elif vars.transposon=="Himar1": vars.prefix = "ACTTATCAGCCAACCTGTTA"
else: vars.prefix = ""
try:
extract_reads(vars)
run_bwa(vars)
generate_output(vars)
except ValueError as err:
message("")
message("%s" % " ".join(err.args))
message("Exiting.")
sys.exit()
except IOError as err:
message("")
message("%s" % " ".join(err.args))
message("Make sure you have read/write access in the directories containing the necessary files.")
message("Note: If TPP cannot find index files for the FASTA sequence (i.e. *.fna.bwt, *.fna.pac, *.fna.ann, *.fna.sa), it will attempt to create them.")
message("Exiting.")
sys.exit()
message("Done.")
def uncompress(filename):
outfil = open(filename[0:-3], "w+")
for line in gzip.open(filename):
outfil.write(line)
outfil.close() # flush to disk before the uncompressed file is read
return filename[0:-3]
def extract_reads(vars):
message("extracting reads...")
flag = ['','']
for idx, name in enumerate([vars.fq1, vars.fq2]):
if idx==1 and vars.single_end==True: continue
fil = open(name)
for line in fil:
if line[0] == '>':
flag[idx] = 'FASTA'
break
flag[idx] = 'FASTQ'
break
fil.close()
if vars.fq1.endswith('.gz'):
vars.fq1 = uncompress(vars.fq1)
if vars.fq2.endswith('.gz'):
vars.fq2 = uncompress(vars.fq2)
if(flag[0] == 'FASTQ'):
message("fastq2reads: %s -> %s" % (vars.fq1,vars.reads1))
fastq2reads(vars.fq1,vars.reads1,vars.maxreads)
else:
shutil.copyfile(vars.fq1, vars.reads1)
if vars.single_end==True:
message("assuming single-ended reads")
message("creating %s" % vars.trimmed1)
extract_staggered(vars.reads1,vars.trimmed1,vars)
return
if(flag[1] == 'FASTQ'):
message("fastq2reads: %s -> %s" % (vars.fq2,vars.reads2))
fastq2reads(vars.fq2,vars.reads2,vars.maxreads)
else:
shutil.copyfile(vars.fq2, vars.reads2)
message("fixing headers of paired reads for bwa...")
fix_paired_headers_for_bwa(vars.reads1,vars.reads2)
message("extracting barcodes and genomic parts of reads...")
message("creating %s" % vars.trimmed1)
extract_staggered(vars.reads1,vars.trimmed1,vars)
message("creating %s" % vars.trimmed2)
select_reads(vars.trimmed1,vars.reads2,vars.trimmed2)
#message("creating %s" % vars.barcodes2)
#select_cycles(vars.trimmed2,22,30,vars.barcodes2)
#message("creating %s" % vars.genomic2)
#select_cycles(vars.trimmed2,43,-1,vars.genomic2)
# instead of using select_cycles, do these both in one shot by looking for constant seqs
message("creating %s" % vars.barcodes2)
message("creating %s" % vars.genomic2)
extract_barcodes(vars.trimmed2,vars.barcodes2,vars.genomic2, vars.mm1)
message("creating %s" % vars.barcodes1)
replace_ids(vars.trimmed1,vars.barcodes2,vars.barcodes1)
# pattern for read 2...
# TAGTGGATGATGGCCGGTGGATTTGTG GTAATTACCA TGGTCGTGGTAT CCCAGCGCGACTTCTTCGGCGCACACACC TAACAGGTTGGCTGATAAGTCCCCG?AGAT AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGT
# -----const1---------------- --barcode- ---const2--- ------genomic---------------- ------const3--------------------------------------------------------------
# const suffix might appear if fragment is shorter than read length; if so, truncate
# if genomic part is too short, just output at least 20bp of const so as not to mess up BWA
# could the start of these be shifted slightly?
def extract_barcodes(fn_tgtta2,fn_barcodes2,fn_genomic2,mm1):
const1 = "GATGGCCGGTGGATTTGTG"
const2 = "TGGTCGTGGTAT"
const3 = "TAACAGGTTGGCTGATAAG"
nconst1,nconst2,nconst3 = len(const1),len(const2),len(const3)
fl_barcodes2 = open(fn_barcodes2,"w")
fl_genomic2 = open(fn_genomic2,"w")
tot,DEBUG = 0,0
for line in open(fn_tgtta2):
line = line.rstrip()
if line[0]=='>': header = line
else:
tot += 1
if tot%1000000==0: message("%s reads processed" % tot)
#a = line.find(const1)
#b = line.find(const2)
#c = line.find(const3)
a = mmfind(line,len(line),const1,nconst1, mm1)
b = mmfind(line,len(line),const2,nconst2, mm1)
c = mmfind(line,len(line),const3,nconst3, mm1)
bstart,bend = a+nconst1,b
gstart,gend = b+nconst2,len(line)
if c!=-1 and c-gstart>20: gend = c
if a==-1 or bend<bstart+5 or bend>bstart+15:
# you can't just reject these, because they are paired with R1
# but setting the genomic part to the first 20 cycles should prevent it from mapping
bstart,bend = 0,10
gstart,gend = 0,20
barcode,genomic = "XXXXXXXXXX","XXXXXXXXXX"
else: barcode,genomic = line[bstart:bend],line[gstart:gend]
if DEBUG==1:
fl_barcodes2.write(header+"\n")
fl_barcodes2.write(line+"\n")
#fl_barcodes2.write((" "*bstart)+line[bstart:bend]+"\n")
fl_barcodes2.write((" "*bstart)+barcode+"\n")
fl_genomic2.write(header+"\n")
fl_genomic2.write(line+"\n")
#fl_genomic2.write((" "*gstart)+line[gstart:gend]+"\n")
fl_genomic2.write((" "*gstart)+genomic+"\n")
else:
fl_barcodes2.write(header+"\n")
#fl_barcodes2.write(line[bstart:bend]+"\n")
fl_barcodes2.write(barcode+"\n")
fl_genomic2.write(header+"\n")
#fl_genomic2.write(line[gstart:gend]+"\n")
fl_genomic2.write(genomic+"\n")
fl_barcodes2.close()
fl_genomic2.close()
if DEBUG==1: sys.exit(0)
def bwa_subprocess(command, outfile):
commandstr = " ".join(command)
if outfile.name != "":
commandstr += " > %s" % outfile.name
message(commandstr)
process = subprocess.Popen(command, stdout=outfile, stderr=subprocess.PIPE)
for line in iter(process.stderr.readline, ''):
if "Permission denied" in line:
raise IOError("Error: BWA encountered a permissions error: \n\n%s" % line)
if "invalid option" in line:
raise ValueError("Error: Unrecognized flag for BWA: %s" % (line.split()[-1]))
sys.stderr.write("%s\n" % line.strip())
def run_bwa(vars):
message("mapping reads using BWA...(this takes a couple of minutes)")
if not os.path.exists(vars.ref+".amb"):
cmd = [vars.bwa, "index", vars.ref]
bwa_subprocess(cmd, sys.stdout)
cmd = [vars.bwa, "aln"]
if vars.flags.strip():
cmd.extend(vars.flags.split()) # split on any whitespace; avoids empty arguments
cmd.extend([vars.ref, vars.trimmed1])
outfile = open(vars.sai1, "w")
bwa_subprocess(cmd, outfile)
if vars.single_end==True:
cmd = [vars.bwa, "samse", vars.ref, vars.sai1, vars.trimmed1]
outfile = open(vars.sam, "w")
bwa_subprocess(cmd, outfile)
else:
cmd = [vars.bwa, "aln"]
if vars.flags.strip():
cmd.extend(vars.flags.split()) # split on any whitespace; avoids empty arguments
cmd.extend([vars.ref, vars.genomic2])
outfile = open(vars.sai2, "w")
bwa_subprocess(cmd, outfile)
cmd = [vars.bwa, "sampe", vars.ref, vars.sai1, vars.sai2, vars.trimmed1, vars.genomic2]
outfile = open(vars.sam, "w")
bwa_subprocess(cmd, outfile)
def stats(vals):
sum,ss = 0,0
for x in vals: sum += x; ss += x*x
N = float(len(vals))
mean = sum/N
var = ss/N-mean*mean
stdev = math.sqrt(var)
return mean,stdev
def corr(X,Y):
muX,sdX = stats(X)
muY,sdY = stats(Y)
if sdX == 0 or sdY == 0:
raise ValueError("Warning: Standard deviation of counts is zero.")
cX = [x-muX for x in X]
cY = [y-muY for y in Y]
s = sum([x*y for (x,y) in zip(cX,cY)])
return s/(float(len(X))*sdX*sdY)
def get_read_length(filename):
fil = open(filename)
i = 0
for line in fil:
if i == 1:
#print "reads1 line: " + line
return len(line.strip())
i+=1
def get_genomic_portion(filename):
fil = open(filename)
i = 0
tot_len = 0.0
n = 0
for line in fil:
if i%2 == 1:
tot_len += len(line.strip())
n += 1
i+=1
fil.close()
return tot_len/n if n>0 else 0.0 # mean length; 0.0 if the file has no reads
def generate_output(vars):
message("tabulating template counts and statistics...")
if vars.single_end==True: counts = read_counts(vars.ref,vars.sam,vars) # return read counts copied as template counts
else: counts = template_counts(vars.ref,vars.sam,vars.barcodes1,vars)
tcfile = open(vars.tc,"w")
tcfile.write('\t'.join("coord Fwd_Rd_Ct Fwd_Templ_Ct Rev_Rd_Ct Rev_Templ_Ct Tot_Rd_Ct Tot_Templ_Ct".split())+"\n")
for data in counts: tcfile.write('\t'.join([str(x) for x in data])+"\n")
tcfile.close()
if vars.mapped == 0:
raise ValueError('Error: BWA was unable to map any reads to the genome.')
message("writing %s" % vars.wig)
output = open(vars.wig,"w")
read1 = os.path.basename(vars.fq1)
read2 = os.path.basename(vars.fq2)
fi = re.split(r'\.', os.path.basename(vars.ref))[0]
output.write("# Generated by tpp from " + read1 + " and " + read2 + "\n")
output.write("variableStep chrom="+ fi + "\n")
for data in counts: output.write("%s %s\n" % (data[0],data[-1]))
output.close()
primer = "CTAGAGGGCCCAATTCGCCCTATAGTGAGT"
vector = "CTAGACCGTCCAGTCTGGCAGGCCGGAAAC"
adapter = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
Himar1 = "ACTTATCAGCCAACCTGTTA"
tot_reads,nprimer,nvector,nadapter,misprimed = 0,0,0,0,0
for line in open(vars.reads1):
if line[0]=='>': tot_reads += 1; continue
if primer in line: nprimer += 1
if vector in line: nvector += 1
if adapter in line: nadapter += 1
if Himar1[:-5] in line and Himar1 not in line: misprimed += 1
rcounts = [x[5] for x in counts]
tcounts = [x[6] for x in counts]
rc,tc = sum(rcounts),sum(tcounts)
ratio = rc/float(tc) if (rc != 0 and tc !=0) else 0
ta_sites = len(rcounts)
tas_hit = len(filter(lambda x: x>0,rcounts))
density = tas_hit/float(ta_sites)
counts.sort(key=lambda x: x[-1])
max_tc = counts[-1][6]
max_coord = counts[-1][0]
NZmean = tc/float(tas_hit)
try:
FR_corr = corr([x[1] for x in counts],[x[3] for x in counts])
except ValueError:
FR_corr = float("nan")
try:
BC_corr = corr([x for x in rcounts if x!=0],[x for x in tcounts if x!=0])
except ValueError:
BC_corr = float("nan")
read_length = get_read_length(vars.base + ".reads1")
mean_r1_genomic = get_genomic_portion(vars.base + ".trimmed1")
if vars.single_end==False: mean_r2_genomic = get_genomic_portion(vars.base + ".genomic2")
output = open(vars.stats,"w")
version = "1.0"
#output.write("# title: Tn-Seq Pre-Processor, version %s\n" % vars.version)
output.write("# title: Tn-Seq Pre-Processor\n")
output.write("# date: %s\n" % time.strftime("%m/%d/%Y %H:%M:%S"))
output.write("# command: python ")
output.write(' '.join(sys.argv)+"\n")
output.write('# transposon type: %s\n' % vars.transposon)
output.write('# protocol type: %s\n' % vars.protocol)
output.write('# bwa flags: %s\n' % vars.flags)
output.write('# read1: %s\n' % vars.fq1)
output.write('# read2: %s\n' % vars.fq2)
output.write('# ref_genome: %s\n' % vars.ref)
output.write("# total_reads %s (or read pairs)\n" % tot_reads)
#output.write("# truncated_reads %s (fragments shorter than the read length; ADAP2 appears in read1)\n" % vars.truncated_reads)
output.write("# trimmed_reads %s (reads with valid Tn prefix, and genomic portion >=5bp)\n" % vars.tot_tgtta)
output.write("# reads1_mapped %s\n" % vars.r1)
output.write("# reads2_mapped %s\n" % vars.r2)
output.write("# mapped_reads %s (both R1 and R2 map into genome)\n" % vars.mapped)
output.write("# read_count %s (TA sites only, for Himar1)\n" % rc)
output.write("# template_count %s\n" % tc)
output.write("# template_ratio %0.2f (reads per template)\n" % ratio)
output.write("# TA_sites %s\n" % ta_sites)
output.write("# TAs_hit %s\n" % tas_hit)
output.write("# density %0.3f\n" % density)
output.write("# max_count %s (among templates)\n" % max_tc)
output.write("# max_site %s (coordinate)\n" % max_coord)
output.write("# NZ_mean %0.1f (among templates)\n" % NZmean)
output.write("# FR_corr %0.3f (Fwd templates vs. Rev templates)\n" % FR_corr)
output.write("# BC_corr %0.3f (reads vs. templates, summed over both strands)\n" % BC_corr)
output.write("# primer_matches: %s reads (%0.1f%%) contain %s (Himar1)\n" % (nprimer,nprimer*100/float(tot_reads),primer))
output.write("# vector_matches: %s reads (%0.1f%%) contain %s (phiMycoMarT7)\n" % (nvector,nvector*100/float(tot_reads),vector))
output.write("# adapter_matches: %s reads (%0.1f%%) contain %s (Illumina/TruSeq index)\n" % (nadapter,nadapter*100/float(tot_reads),adapter))
output.write("# misprimed_reads: %s reads (%0.1f%%) contain Himar1 prefix but don't end in TGTTA\n" % (misprimed,misprimed*100/float(tot_reads)))
output.write("# read_length: %s bp\n" % read_length)
output.write("# mean_R1_genomic_length: %0.1f bp\n" % mean_r1_genomic)
if vars.single_end==False: output.write("# mean_R2_genomic_length: %0.1f bp\n" % mean_r2_genomic)
#output.write("# most_abundant_prefix: %s reads start with %s\n" % (temp[0][1],temp[0][0]))
# since these are reads (with Tn prefix stripped off), I expect ~1/4 to match Tn prefix
vals = [vars.fq1,vars.fq2,tot_reads,vars.tot_tgtta,vars.r1,vars.r2,vars.mapped,rc,tc,ratio,ta_sites,tas_hit,max_tc,density,max_coord,NZmean,FR_corr,BC_corr,nprimer,nvector,nadapter,misprimed]
output.write('\t'.join([str(x) for x in vals])+"\n")
output.close()
message("writing %s" % vars.stats)
#os.system("grep '#' %s" % vars.stats)
infile = open(vars.stats)
for line in infile:
if '#' in line:
print line.rstrip()
infile.close()
#############################################################################
def error(s):
print "error:",s
sys.exit(1) # non-zero exit status signals failure to the caller
def warning(s):
print "warning:",s
def set_defaults(vars, protocol):
#protocol = kwargs.get("protocol", "sassetti")
if protocol == "sassetti":
set_sassetti_defaults(vars)
elif protocol == "mme1":
set_mme1_defaults(vars)
elif protocol == "tn5":
set_tn5_defaults(vars)
else:
set_sassetti_defaults(vars)
def set_attributes(vars, attributes_list, override=False):
for (attr, value) in attributes_list:
if override:
setattr(vars, attr, value)
else:
if not hasattr(vars, attr):
setattr(vars, attr, value)
def set_sassetti_defaults(vars):
attributes_list = []
attributes_list.append(("transposon", "Himar1"))
attributes_list.append(("protocol", "Sassetti"))
attributes_list.append(("prefix", "ACTTATCAGCCAACCTGTTA"))
attributes_list.append(("maxreads", -1))
attributes_list.append(("mm1", 100))
set_attributes(vars, attributes_list)
def set_mme1_defaults(vars):
attributes_list = []
attributes_list.append(("transposon", "Himar1"))
attributes_list.append(("protocol", "Mme1"))
attributes_list.append(("prefix", ""))
attributes_list.append(("maxreads", -1))
attributes_list.append(("mm1", 2))
set_attributes(vars, attributes_list)
def set_tn5_defaults(vars):
attributes_list = []
attributes_list.append(("transposon", "Tn5"))
attributes_list.append(("protocol", "Tn5"))
attributes_list.append(("prefix", ""))
attributes_list.append(("maxreads", -1))
attributes_list.append(("mm1", 2))
set_attributes(vars, attributes_list)
def verify_inputs(vars):
if not os.path.exists(vars.fq1): error("reads1 file not found: "+vars.fq1)
vars.single_end = False
if vars.fq2=="": vars.single_end = True
elif not os.path.exists(vars.fq2): error("reads2 file not found: "+vars.fq2)
if not os.path.exists(vars.ref): error("reference file not found: "+vars.ref)
if vars.base == '': error("prefix cannot be empty")
if vars.fq1 == vars.fq2: error('fastq files cannot be identical')
# If Mme1 protocol, warn that we don't use read2 file
if vars.protocol.lower() == "mme1" and not vars.single_end:
warning("Ignoring Read 2 file. TPP assumes Mme1 protocol runs in single-end mode.")
vars.single_end = True
vars.fq2 = ""
if os.path.isdir(vars.bwa):
bwaexec_unix = os.path.join(vars.bwa, "bwa")
bwaexec_win = os.path.join(vars.bwa, "bwa.exe")
if os.path.exists(bwaexec_unix) and not os.path.isdir(bwaexec_unix):
warning("did not include BWA executable name. Assuming BWA executable is named 'bwa'")
vars.bwa = bwaexec_unix
elif os.path.exists(bwaexec_win) and not os.path.isdir(bwaexec_win):
warning("did not include BWA executable name. Assuming BWA executable is named 'bwa.exe'")
vars.bwa = bwaexec_win
else:
error('cannot find BWA executable. Please include the full executable name as well as its directory.')
elif not os.path.exists(vars.bwa):
error('cannot find BWA executable. Please include the full executable name as well as its directory.')
def initialize_globals(vars, args=[], kwargs={}):
vars.fq1,vars.fq2,vars.ref,vars.bwa,vars.base,vars.maxreads = "","","","","temp",-1
vars.mm1 = 1 # mismatches allowed in Tn prefix
vars.transposon = 'Himar1'
vars.protocol = "Sassetti"
vars.prefix = "ACTTATCAGCCAACCTGTTA"
vars.flags = ""
# Update defaults
protocol = kwargs.get("protocol", "").lower()
if protocol:
set_defaults(vars, protocol)
elif not kwargs:
read_config(vars)
# If running in console mode with flags
if "protocol" in kwargs:
vars.protocol = kwargs["protocol"]
if "himar1" in kwargs:
vars.transposon = "Himar1"
if "tn5" in kwargs:
vars.transposon = "Tn5"
if "primer" in kwargs:
vars.prefix = kwargs["primer"]
if "reads1" in kwargs:
vars.fq1 = kwargs["reads1"]
if "reads2" in kwargs:
vars.fq2 = kwargs["reads2"]
if "bwa" in kwargs:
vars.bwa = kwargs["bwa"]
if "ref" in kwargs:
vars.ref = kwargs["ref"]
if "maxreads" in kwargs:
vars.maxreads = int(kwargs["maxreads"])
if "output" in kwargs:
vars.base = kwargs["output"]
if "mismatches" in kwargs:
vars.mm1 = int(kwargs["mismatches"])
if "flags" in kwargs:
vars.flags = kwargs["flags"]
def read_config(vars):
    # Load saved settings from tpp.cfg in the current directory, if present.
    # Each line is "key value"; lines with fewer than two tokens are skipped.
    # Note: the 'prefix' key holds the output base name (vars.base), while
    # the 'primer' key holds the transposon prefix sequence (vars.prefix).
    if not os.path.exists("tpp.cfg"): return
    with open("tpp.cfg") as f:
        for line in f:
            w = line.split()
            if len(w) < 2: continue
            if w[0]=='reads1': vars.fq1 = w[1]
            if w[0]=='reads2': vars.fq2 = w[1]
            if w[0]=='ref': vars.ref = w[1]
            if w[0]=='bwa': vars.bwa = w[1]
            if w[0]=='prefix': vars.base = w[1]
            if w[0]=='mismatches1': vars.mm1 = int(w[1])
            if w[0]=='transposon': vars.transposon = w[1]
            if w[0]=='protocol': vars.protocol = " ".join(w[1:])
            if w[0]=='primer': vars.prefix = w[1]
            if w[0]=='flags': vars.flags = " ".join(w[1:])
def save_config(vars):
    # Write the current settings to tpp.cfg, one "key value" pair per line.
    # The with-block closes the file even if a write fails.
    with open("tpp.cfg","w") as f:
        f.write("reads1 %s\n" % vars.fq1)
        f.write("reads2 %s\n" % vars.fq2)
        f.write("ref %s\n" % vars.ref)
        f.write("bwa %s\n" % vars.bwa)
        f.write("prefix %s\n" % vars.base)
        f.write("mismatches1 %s\n" % vars.mm1)
        f.write("transposon %s\n" % vars.transposon)
        f.write("protocol %s\n" % vars.protocol)
        f.write("primer %s\n" % vars.prefix)
        f.write("flags %s\n" % vars.flags)
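# Illustrative sketch (not part of TPP): the "key value" line format used by
# read_config and save_config, reduced to two pure functions over strings.
# format_config, parse_config and the multiword parameter are hypothetical
# names chosen for this example.

```python
def format_config(settings):
    """Serialize a dict as 'key value' lines, one setting per line."""
    return "".join("%s %s\n" % (key, settings[key]) for key in sorted(settings))


def parse_config(text, multiword=("protocol", "flags")):
    """Parse 'key value' lines; keys in `multiword` keep all remaining tokens."""
    settings = {}
    for line in text.splitlines():
        w = line.split()
        if len(w) < 2:
            continue  # blank line, or a key with an empty value: skip it
        key = w[0]
        settings[key] = " ".join(w[1:]) if key in multiword else w[1]
    return settings
```

# Two consequences of this format are worth noting: a setting saved with an
# empty value is skipped when read back, and a single-token value such as a
# file path is cut at its first space.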
def show_help():
print 'usage: python PATH/src/tpp.py -bwa -ref -reads1 [-reads2 ] -output [-maxreads ] [-mismatches ] [-flags ""] [-tn5|-himar1] [-primer ]'
class Globals:
pass
# File: transit-2.1.1/src/pytransit/__init__.py
__all__ = ["transit_tools", "tnseq_tools", "norm_tools", "stat_tools"]
__version__ = "v2.1.1"
prefix = "[TRANSIT]"
# File: transit-2.1.1/src/pytransit/__main__.py
import sys
try:
import wx
hasWx = True
#Check if wx is the newest 3.0+ version:
try:
from wx.lib.pubsub import pub
pub.subscribe
newWx = True
except AttributeError as e:
from wx.lib.pubsub import Publisher as pub
newWx = False
except Exception as e:
hasWx = False
import pytransit
import pytransit.transit_tools as transit_tools
import pytransit.analysis
method_wrap_width = 250
methods = pytransit.analysis.methods
export_methods = pytransit.analysis.export_methods
all_methods = {}
all_methods.update(methods)
all_methods.update(export_methods)
wildcard = "Python source (*.py)|*.py|" \
"All files (*.*)|*.*"
transit_prefix = "[TRANSIT]"
def main(args=None):
#If no arguments, show GUI:
DEBUG = "--debug" in sys.argv
if DEBUG:
sys.argv.remove("--debug")
# Check if running in GUI Mode
if len(sys.argv) == 1 and hasWx:
import pytransit.transit_gui as transit_gui
transit_tools.transit_message("Running in GUI Mode")
app = wx.App(False)
#create the main TnSeekFrame window
frame = transit_gui.TnSeekFrame(None, DEBUG)
#show the frame
frame.Show(True)
#start the applications
app.MainLoop()
# Tried GUI mode but has no wxPython
elif len(sys.argv) == 1 and not hasWx:
print "Please install wxPython to run in GUI Mode."
print "To run in Console Mode please follow these instructions:"
print ""
print "Usage: python %s " % sys.argv[0]
print "List of known methods:"
for m in methods:
print "\t - %s" % m
# Running in Console mode
else:
method_name = sys.argv[1]
if method_name not in all_methods:
print "Error: The '%s' method is unknown." % method_name
print "Please use one of the known methods (or see documentation to add a new one):"
for m in all_methods:
print "\t - %s" % m
print "Usage: python %s " % sys.argv[0]
else:
methodobj = all_methods[method_name].method.fromconsole()
methodobj.Run()
if __name__ == "__main__":
main()
# File: transit-2.1.1/src/pytransit/analysis/__init__.py
#__all__ = []
from os.path import dirname, basename, isfile
import glob
modules = glob.glob(dirname(__file__)+"/*.py")
__all__ = [ basename(f)[:-3] for f in modules if isfile(f)]
import base
import gumbel
import example
import tn5gaps
import binomial
import griffin
import resampling
import hmm
import rankproduct
methods = {}
methods["example"] = example.ExampleAnalysis()
methods["gumbel"] = gumbel.GumbelAnalysis()
methods["binomial"] = binomial.BinomialAnalysis()
methods["griffin"] = griffin.GriffinAnalysis()
methods["hmm"] = hmm.HMMAnalysis()
methods["resampling"] = resampling.ResamplingAnalysis()
methods["tn5gaps"] = tn5gaps.Tn5GapsAnalysis()
methods["rankproduct"] = rankproduct.RankProductAnalysis()
#methods["mcce"] = mcce.MCCEAnalysis()
#methods["mcce2"] = mcce2.MCCE2Analysis()
#methods["motifhmm"] = motifhmm.MotifHMMAnalysis()
# EXPORT METHODS
import norm
export_methods = {}
export_methods["norm"] = norm.NormAnalysis()
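# Illustrative sketch (not part of TRANSIT): how a name-to-analysis registry
# like the `methods` dict above can drive console dispatch. ToyAnalysis,
# EchoMethod and dispatch are hypothetical stand-ins; the real entries are
# TransitAnalysis objects whose `method` attribute is a class.

```python
class ToyAnalysis(object):
    """Stand-in for a registry entry: holds a name and a method class."""
    def __init__(self, short_name, method):
        self.short_name = short_name
        self.method = method  # the class itself, not an instance


class EchoMethod(object):
    """Stand-in for an analysis method built from console arguments."""
    def __init__(self, args):
        self.args = args

    @classmethod
    def fromargs(cls, rawargs):
        return cls(list(rawargs))

    def Run(self):
        return "echo: " + " ".join(self.args)


methods = {"echo": ToyAnalysis("echo", EchoMethod)}


def dispatch(argv, registry):
    """Look up argv[1] in the registry, build the method, and run it."""
    name = argv[1]
    if name not in registry:
        return "Error: The '%s' method is unknown." % name
    return registry[name].method.fromargs(argv[2:]).Run()
```

# Registering a new analysis then only requires adding one dictionary entry;
# the dispatch code does not change.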
# File: transit-2.1.1/src/pytransit/analysis/base.py
#__all__ = []
import sys
try:
import wx
hasWx = True
#Check if wx is the newest 3.0+ version:
try:
from wx.lib.pubsub import pub
pub.subscribe
newWx = True
except AttributeError as e:
from wx.lib.pubsub import Publisher as pub
newWx = False
except Exception as e:
hasWx = False
newWx = False
import traceback
import datetime
import pytransit.transit_tools as transit_tools
file_prefix = "[FileDisplay]"
class InvalidArgumentException(Exception):
def __init__(self, message):
# Call the base class constructor with the parameters it needs
super(InvalidArgumentException, self).__init__(message)
if hasWx:
class InfoIcon(wx.StaticBitmap):
def __init__(self, panel, flag, bmp=None, tooltip=""):
if not bmp:
bmp = wx.ArtProvider.GetBitmap(wx.ART_INFORMATION, wx.ART_OTHER, (16, 16))
wx.StaticBitmap.__init__(self, panel, flag, bmp)
tp = wx.ToolTip(tooltip)
self.SetToolTip(tp)
class TransitGUIBase:
def __init__(self):
self.wxobj = None
self.short_name = "TRANSIT"
self.long_name = "TRANSIT"
def status_message(self, text, time=-1):
#TODO: write docstring
if self.wxobj:
if newWx:
wx.CallAfter(pub.sendMessage, "status", msg=(self.short_name, text, time))
else:
wx.CallAfter(pub.sendMessage, "status", (self.short_name, text, time))
wx.Yield()
def console_message(self, text):
#TODO: write docstring
sys.stdout.write("[%s] %s\n" % (self.short_name, text))
def console_message_inplace(self, text):
#TODO: write docstring
sys.stdout.write("[%s] %s \r" % (self.short_name, text) )
sys.stdout.flush()
def transit_message(self, text):
#TODO: write docstring
self.console_message(text)
self.status_message(text)
def transit_message_inplace(self, text):
#TODO: write docstring
self.console_message_inplace(text)
self.status_message(text)
def transit_error(self,text):
self.transit_message(text)
if self.wxobj:
transit_tools.ShowError(text)
def transit_warning(self,text):
self.transit_message(text)
if self.wxobj:
transit_tools.ShowWarning(text)
class TransitFile (TransitGUIBase):
#TODO write docstring
def __init__(self, identifier="#Unknown", colnames=[]):
#TODO write docstring
TransitGUIBase.__init__(self)
self.identifier = identifier
self.colnames = colnames
    def getData(self, path, colnames):
        """Read a tab-separated results file into a list of (row, rowdict) tuples.

        Lines starting with '#' are skipped. If a row has fewer fields than
        colnames, a warning is shown once and the row is filled as far as
        the available fields allow.
        """
        row = 0
        data = []
        shownError = False
        with open(path) as f:
            for line in f:
                if line.startswith("#"): continue
                tmp = line.split("\t")
                tmp[-1] = tmp[-1].strip()
                try:
                    rowdict = dict([(colnames[i], tmp[i]) for i in range(len(colnames))])
                except Exception as e:
                    if not shownError:
                        self.transit_warning("Error reading data! This may be caused by trying to load an old results file whose format has since changed.")
                        shownError = True
                    rowdict = dict([(colnames[i], tmp[i]) for i in range(min(len(colnames), len(tmp)))])
                data.append((row, rowdict))
                row+=1
        return data
def getHeader(self, path):
#TODO write docstring
return "Generic Transit File Type."
def getMenus(self):
menus = [("Display in Track View", self.displayInTrackView)]
return menus
def displayInTrackView(self, displayFrame, event):
#print "Self:", self
#print "Frame:", displayFrame
#print "Event:", event
#print "Frame parent:", displayFrame.parent
try:
gene = displayFrame.grid.GetCellValue(displayFrame.row, 0)
displayFrame.parent.allViewFunc(displayFrame, gene)
except Exception as e:
print file_prefix, "Error occurred: %s" % e
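# Illustrative sketch (not part of TRANSIT): the row-parsing idea behind
# TransitFile.getData, written over an in-memory list of lines. The function
# name rows_from_lines is hypothetical. Using zip() is a swapped-in technique:
# it yields the same partial dictionary for short rows without needing the
# try/except fallback, but it also drops the warning.

```python
def rows_from_lines(lines, colnames):
    """Turn tab-separated lines into (row_index, {column: value}) pairs."""
    data = []
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\r\n").split("\t")
        # zip() stops at the shorter sequence, so a short row gives a partial dict.
        data.append((len(data), dict(zip(colnames, fields))))
    return data
```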
class AnalysisGUI:
def __init__(self):
self.wxobj = None
self.panel = None
self.LABELSIZE = (100,-1)
self.WIDGETSIZE = (100,-1)
def Hide(self):
self.panel.Hide()
def Show(self):
self.panel.Show()
def Enable(self):
self.panel.Enable()
def definePanel(self, wxobj):
#TODO: write docstring
self.wxobj = wxobj
wPanel = wx.Panel( self.wxobj.optionsWindow, wx.ID_ANY, wx.DefaultPosition, wx.DefaultSize, wx.TAB_TRAVERSAL )
Section = wx.BoxSizer( wx.VERTICAL )
Label = wx.StaticText(wPanel, id=wx.ID_ANY, label=str("Options"), pos=wx.DefaultPosition, size=wx.DefaultSize, style=0 )
Label.Wrap( -1 )
Section.Add( Label, 0, wx.ALL|wx.ALIGN_CENTER_HORIZONTAL, 5 )
Sizer1 = wx.BoxSizer( wx.HORIZONTAL )
Section.Add( Sizer1, 1, wx.EXPAND, 5 )
Button = wx.Button( wPanel, wx.ID_ANY, u"Run", wx.DefaultPosition, wx.DefaultSize, 0 )
Section.Add( Button, 0, wx.ALL|wx.ALIGN_CENTER_HORIZONTAL, 5 )
wPanel.SetSizer( Section )
wPanel.Layout()
Section.Fit( wPanel )
#Connect events
Button.Bind( wx.EVT_BUTTON, self.wxobj.RunMethod )
self.panel = wPanel
def defineTextBox(self, panel, labelText="", widgetText="", tooltipText="", labSize=None, widgetSize=None):
if not labSize: labSize = self.LABELSIZE
if not widgetSize: widgetSize = self.WIDGETSIZE
sizer = wx.BoxSizer( wx.HORIZONTAL )
label = wx.StaticText(panel, wx.ID_ANY, labelText, wx.DefaultPosition, labSize, 0)
label.Wrap( -1 )
textBox = wx.TextCtrl( panel, wx.ID_ANY, widgetText, wx.DefaultPosition, widgetSize, 0 )
sizer.Add(label, 0, wx.ALIGN_CENTER_VERTICAL, 5 )
sizer.Add(textBox, 0, wx.ALIGN_CENTER_VERTICAL, 5 )
sizer.Add(InfoIcon(panel, wx.ID_ANY, tooltip=tooltipText), 0, wx.ALIGN_CENTER_VERTICAL, 5 )
return (label, textBox, sizer)
def defineChoiceBox(self, panel, labelText="", widgetChoice=[""], tooltipText="", labSize=None, widgetSize=None):
if not labSize: labSize = self.LABELSIZE
if not widgetSize: widgetSize = self.WIDGETSIZE
sizer = wx.BoxSizer( wx.HORIZONTAL )
label = wx.StaticText(panel, wx.ID_ANY, labelText, wx.DefaultPosition, labSize, 0)
label.Wrap( -1 )
choiceBox = wx.Choice( panel, wx.ID_ANY, wx.DefaultPosition, widgetSize, widgetChoice, 0 )
choiceBox.SetSelection(0)
sizer.Add(label, 0, wx.ALIGN_CENTER_VERTICAL, 5 )
sizer.Add(choiceBox, 0, wx.ALIGN_CENTER_VERTICAL, 5 )
sizer.Add(InfoIcon(panel, wx.ID_ANY, tooltip=tooltipText), 0, wx.ALIGN_CENTER_VERTICAL, 5 )
return (label, choiceBox, sizer)
def defineCheckBox(self, panel, labelText="", widgetCheck=False, tooltipText="", widgetSize=None):
if not widgetSize: widgetSize = self.WIDGETSIZE
sizer = wx.BoxSizer( wx.HORIZONTAL )
checkBox = wx.CheckBox(panel, label = labelText, size=widgetSize)
checkBox.SetValue(widgetCheck)
sizer.Add(checkBox, 0, wx.ALIGN_CENTER_VERTICAL, 5 )
sizer.Add(InfoIcon(panel, wx.ID_ANY, tooltip=tooltipText), 0, wx.ALIGN_CENTER_VERTICAL, 5 )
return (checkBox, sizer)
class AnalysisMethod:
'''
Base class for analysis methods. Inherited by SingleConditionMethod and DualConditionMethod.
'''
def __init__(self, short_name, long_name, description, output, annotation_path, wxobj=None):
self.short_name = short_name
self.long_name = long_name
self.description = description
self.output = output
self.annotation_path = annotation_path
self.newWx = newWx
self.wxobj = wxobj
@classmethod
def fromGUI(self, wxobj):
#TODO: write docstring
raise NotImplementedError
@classmethod
def fromargs(self, rawargs):
#TODO: write docstring
raise NotImplementedError
@classmethod
def fromconsole(self):
#TODO: write docstring
try:
return self.fromargs(sys.argv[2:])
except InvalidArgumentException as e:
print "Error: %s" % str(e)
print self.usage_string()
except IndexError as e:
print "Error: %s" % str(e)
print self.usage_string()
except TypeError as e:
print "Error: %s" % str(e)
traceback.print_exc()
print self.usage_string()
except ValueError as e:
print "Error: %s" % str(e)
traceback.print_exc()
print self.usage_string()
except Exception as e:
print "Error: %s" % str(e)
traceback.print_exc()
print self.usage_string()
sys.exit()
@classmethod
def usage_string(self):
#TODO: write docstring
raise NotImplementedError
def Run(self):
#TODO write docstring
raise NotImplementedError
def print_members(self):
#TODO: write docstring
members = sorted([attr for attr in dir(self) if not callable(getattr(self,attr)) and not attr.startswith("__")])
for m in members:
print "%s = %s" % (m, getattr(self, m))
def add_file(self, path=None, filetype=None):
#TODO: write docstring
if not path:
path = self.output.name
if not filetype:
filetype = self.short_name
data = {"path":path, "type":filetype, "date": datetime.datetime.today().strftime("%B %d, %Y %I:%M%p")}
if self.wxobj:
if newWx:
wx.CallAfter(pub.sendMessage, "file", data=data)
else:
wx.CallAfter(pub.sendMessage, "file", data)
def finish(self):
#TODO: write docstring
if self.wxobj:
if newWx:
wx.CallAfter(pub.sendMessage,"finish", msg=self.short_name.lower())
else:
wx.CallAfter(pub.sendMessage,"finish", self.short_name.lower())
def progress_update(self, text, count):
#TODO: write docstring
if self.wxobj:
if newWx:
wx.CallAfter(pub.sendMessage, "progress", msg=(self.short_name, count))
else:
wx.CallAfter(pub.sendMessage, "progress", (self.short_name, count))
wx.Yield()
def progress_range(self, count):
#TODO: write docstring
if self.wxobj:
if newWx:
wx.CallAfter(pub.sendMessage, "progressrange", msg=count)
else:
wx.CallAfter(pub.sendMessage, "progressrange", count)
wx.Yield()
def status_message(self, text, time=-1):
#TODO: write docstring
if self.wxobj:
if newWx:
wx.CallAfter(pub.sendMessage, "status", msg=(self.short_name, text, time))
else:
wx.CallAfter(pub.sendMessage, "status", (self.short_name, text, time))
wx.Yield()
def console_message(self, text):
#TODO: write docstring
sys.stdout.write("[%s] %s\n" % (self.short_name, text))
def console_message_inplace(self, text):
#TODO: write docstring
sys.stdout.write("[%s] %s \r" % (self.short_name, text) )
sys.stdout.flush()
def transit_message(self, text):
#TODO: write docstring
self.console_message(text)
self.status_message(text)
def transit_message_inplace(self, text):
#TODO: write docstring
self.console_message_inplace(text)
self.status_message(text)
def transit_error(self,text):
self.transit_message(text)
if self.wxobj:
transit_tools.ShowError(text)
def transit_warning(self,text):
self.transit_message(text)
if self.wxobj:
transit_tools.ShowWarning(text)
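# Illustrative sketch (not part of TRANSIT): the alternate-constructor pattern
# used by fromargs and fromconsole above. ToyMethod and UsageError are
# hypothetical. The flag parsing here is a simplified assumption and is not
# the behaviour of transit_tools.cleanargs, which is not shown in this file.
# Returning None instead of calling sys.exit() is also an assumption, made so
# the error path can be tested.

```python
class UsageError(Exception):
    """Hypothetical stand-in for InvalidArgumentException."""


class ToyMethod(object):
    def __init__(self, infile, samples=100):
        self.infile = infile
        self.samples = samples

    @classmethod
    def fromargs(cls, rawargs):
        # Split positional arguments from '-flag value' pairs.
        args, kwargs = [], {}
        i = 0
        while i < len(rawargs):
            if rawargs[i].startswith("-") and i + 1 < len(rawargs):
                kwargs[rawargs[i][1:]] = rawargs[i + 1]
                i += 2
            else:
                args.append(rawargs[i])
                i += 1
        if not args:
            raise UsageError("missing input file")
        return cls(args[0], samples=int(kwargs.get("s", 100)))

    @classmethod
    def fromconsole(cls, argv):
        # argv[0] is the program, argv[1] the method name; the rest are arguments.
        try:
            return cls.fromargs(argv[2:])
        except (UsageError, ValueError) as e:
            print("Error: %s" % e)
            return None
```

# Keeping the parsing in fromargs and the error reporting in fromconsole lets
# the same constructor be reused from scripts without console side effects.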
class SingleConditionMethod(AnalysisMethod):
'''
Class to be inherited by analysis methods that determine essentiality in a single condition (e.g. Gumbel, Binomial, HMM).
'''
def __init__(self, short_name, long_name, description, ctrldata, annotation_path, output, replicates="Sum", normalization=None, LOESS=False, ignoreCodon=True, NTerminus=0.0, CTerminus=0.0, wxobj=None):
AnalysisMethod.__init__(self, short_name, long_name, description, output, annotation_path, wxobj)
self.ctrldata = ctrldata
self.replicates = replicates
self.normalization = normalization
self.LOESS = LOESS
self.ignoreCodon = ignoreCodon
self.NTerminus = NTerminus
self.CTerminus = CTerminus
class DualConditionMethod(AnalysisMethod):
'''
Class to be inherited by analysis methods that determine changes in essentiality between two conditions (e.g. Resampling, DEHMM).
'''
def __init__(self, short_name, long_name, description, ctrldata, expdata, annotation_path, output, normalization, replicates="Sum", LOESS=False, ignoreCodon=True, NTerminus=0.0, CTerminus=0.0, wxobj=None):
AnalysisMethod.__init__(self, short_name, long_name, description, output, annotation_path, wxobj)
self.ctrldata = ctrldata
self.expdata = expdata
self.normalization = normalization
self.replicates = replicates
self.LOESS = LOESS
self.ignoreCodon = ignoreCodon
self.NTerminus = NTerminus
self.CTerminus = CTerminus
class TransitAnalysis:
def __init__(self, sn, ln, desc, tn, method_class=AnalysisMethod, gui_class=AnalysisGUI, filetypes=[TransitFile]):
self.short_name = sn
self.long_name = ln
self.description = desc
self.transposons = tn
self.method = method_class
self.gui = gui_class()
self.filetypes = filetypes
def __str__(self):
return """Analysis Method:
Short Name: %s
Long Name: %s
Description: %s
Method: %s
GUI: %s""" % (self.short_name, self.long_name, self.description, self.method, self.gui)
def fullname(self):
return "[%s] - %s" % (self.short_name, self.long_name)
def getInstructionsText(self):
return ""
def getDescriptionText(self):
return self.description
def getTransposonsText(self):
if len(self.transposons) == 0:
return "Tn attribute missing!"
elif len(self.transposons) == 1:
return "Intended for %s only" % self.transposons[0]
elif len(self.transposons) == 2:
return "Intended for %s or %s" % tuple(self.transposons)
else:
return "Intended for " + ", ".join(self.transposons[:-1]) + ", and " + self.transposons[-1]
if __name__ == "__main__":
pass
# File: transit-2.1.1/src/pytransit/analysis/binomial.py
import sys
try:
import wx
hasWx = True
#Check if wx is the newest 3.0+ version:
try:
from wx.lib.pubsub import pub
pub.subscribe
newWx = True
except AttributeError as e:
from wx.lib.pubsub import Publisher as pub
newWx = False
except Exception as e:
hasWx = False
newWx = False
import os
import time
import math
import random
import numpy
import scipy.stats
import datetime
import base
import pytransit.transit_tools as transit_tools
import pytransit.tnseq_tools as tnseq_tools
import pytransit.norm_tools as norm_tools
import pytransit.stat_tools as stat_tools
#method_name = "binomial"
############# GUI ELEMENTS ##################
short_name = "binomial"
long_name = "Hierarchical binomial model of essentiality with individual frequencies."
description = """Hierarchical Bayesian model of essentiality based on the binomial distribution. Estimates individual probabilities for insertion, leading to more conservative predictions.
Reference: DeJesus and Ioerger (2014; IEEE TCBB)
"""
transposons = ["himar1"]
columns = ["Orf","Name","Description","Mean Insertion","Sites per Replicate","Total Insertions","Total Sites","thetabar", "zbar", "Call"]
############# Analysis Method ##############
class BinomialAnalysis(base.TransitAnalysis):
def __init__(self):
base.TransitAnalysis.__init__(self, short_name, long_name, description, transposons, BinomialMethod, BinomialGUI, [BinomialFile])
################## FILE ###################
class BinomialFile(base.TransitFile):
def __init__(self):
base.TransitFile.__init__(self, "#Binomial", columns)
def getHeader(self, path):
ess=0; unc=0; non=0; short=0
for line in open(path):
if line.startswith("#"): continue
tmp = line.strip().split("\t")
if tmp[-1] == "Essential": ess+=1
if tmp[-1] == "Uncertain": unc+=1
if tmp[-1] == "Non-Essential": non+=1
text = """Results:
Essentials: %s
Uncertain: %s
Non-Essential: %s
""" % (ess, unc, non)
return text
################## GUI ###################
class BinomialGUI(base.AnalysisGUI):
def definePanel(self, wxobj):
self.wxobj = wxobj
binomialPanel = wx.Panel( self.wxobj.optionsWindow, wx.ID_ANY, wx.DefaultPosition, wx.DefaultSize, wx.TAB_TRAVERSAL )
binomialSection = wx.BoxSizer( wx.VERTICAL )
binomialLabel = wx.StaticText( binomialPanel, wx.ID_ANY, u"Binomial Options", wx.DefaultPosition, wx.DefaultSize, 0 )
binomialLabel.Wrap( -1 )
binomialSection.Add( binomialLabel, 0, wx.ALL|wx.ALIGN_CENTER_HORIZONTAL, 5 )
binomialSizer1 = wx.BoxSizer( wx.HORIZONTAL )
binomialSizer2 = wx.BoxSizer( wx.HORIZONTAL )
binomialLabelSizer = wx.BoxSizer( wx.VERTICAL )
binomialControlSizer = wx.BoxSizer( wx.VERTICAL )
#Samples
binomialSampleLabel = wx.StaticText( binomialPanel, wx.ID_ANY, u"Samples", wx.DefaultPosition, wx.DefaultSize, 0 )
binomialSampleLabel.Wrap(-1)
binomialLabelSizer.Add(binomialSampleLabel, 1, wx.ALL, 5)
self.wxobj.binomialSampleText = wx.TextCtrl( binomialPanel, wx.ID_ANY, u"10000", wx.DefaultPosition, wx.DefaultSize, 0 )
binomialControlSizer.Add(self.wxobj.binomialSampleText, 0, wx.ALL|wx.EXPAND, 5)
#Burnin
binomialBurnLabel = wx.StaticText( binomialPanel, wx.ID_ANY, u"Burn-in", wx.DefaultPosition, wx.DefaultSize, 0 )
binomialBurnLabel.Wrap(-1)
binomialLabelSizer.Add(binomialBurnLabel, 1, wx.ALL, 5)
self.wxobj.binomialBurnText = wx.TextCtrl( binomialPanel, wx.ID_ANY, u"500", wx.DefaultPosition, wx.DefaultSize, 0 )
binomialControlSizer.Add(self.wxobj.binomialBurnText, 0, wx.ALL|wx.EXPAND, 5)
binomialSizer2.Add(binomialLabelSizer, 1, wx.EXPAND, 5)
binomialSizer2.Add(binomialControlSizer, 1, wx.EXPAND, 5)
binomialSizer1.Add(binomialSizer2, 1, wx.EXPAND, 5 )
binomialSection.Add( binomialSizer1, 1, wx.EXPAND, 5 )
binomialButton = wx.Button( binomialPanel, wx.ID_ANY, u"Run binomial", wx.DefaultPosition, wx.DefaultSize, 0 )
binomialSection.Add( binomialButton, 0, wx.ALL|wx.ALIGN_CENTER_HORIZONTAL, 5 )
binomialPanel.SetSizer( binomialSection )
binomialPanel.Layout()
binomialSection.Fit( binomialPanel )
#Connect events
binomialButton.Bind( wx.EVT_BUTTON, self.wxobj.RunMethod )
self.panel = binomialPanel
########## CLASS #######################
class BinomialMethod(base.SingleConditionMethod):
"""
binomial
"""
def __init__(self,
ctrldata,
annotation_path,
output_file,
samples=10000,
burnin=500,
replicates="Sum",
normalization=None,
LOESS=False,
ignoreCodon=True,
NTerminus=0.0,
CTerminus=0.0,
pi0=0.5,
pi1=0.5,
M0=1.0,
M1=1.0,
a0=10.0,
a1=10.0,
b0=1.0,
b1=1.0,
alpha_w=0.5,
beta_w=0.5,
wxobj=None):
base.SingleConditionMethod.__init__(self, short_name, long_name, description, ctrldata, annotation_path, output_file, replicates=replicates, normalization=normalization, LOESS=LOESS, ignoreCodon=ignoreCodon, NTerminus=NTerminus, CTerminus=CTerminus, wxobj=wxobj)
self.samples = samples
self.burnin = burnin
self.pi0 = pi0
self.pi1 = pi1
self.M0 = M0
self.M1 = M1
self.a0 = a0
self.a1 = a1
self.b0 = b0
self.b1 = b1
self.alpha_w = alpha_w
self.beta_w = beta_w
@classmethod
def fromGUI(self, wxobj):
""" """
#Get Annotation file
annotationPath = wxobj.annotation
if not transit_tools.validate_annotation(annotationPath):
return None
#Get selected files
ctrldata = wxobj.ctrlSelected()
if not transit_tools.validate_control_datasets(ctrldata):
return None
#Validate transposon types
if not transit_tools.validate_filetypes(ctrldata, transposons):
return None
#Read the parameters from the wxPython widgets
samples = int(wxobj.binomialSampleText.GetValue())
burnin = int(wxobj.binomialBurnText.GetValue())
ignoreCodon = True
NTerminus = float(wxobj.globalNTerminusText.GetValue())
CTerminus = float(wxobj.globalCTerminusText.GetValue())
replicates = "Sum"
normalization = None
LOESS = False
#Get output path
name = transit_tools.basename(ctrldata[0])
defaultFileName = "binomial_output.dat"
defaultDir = os.getcwd()
output_path = wxobj.SaveFile(defaultDir, defaultFileName)
if not output_path: return None
output_file = open(output_path, "w")
return self(ctrldata,
annotationPath,
output_file,
samples=samples,
burnin=burnin,
replicates=replicates,
normalization=normalization,
LOESS=LOESS,
ignoreCodon=ignoreCodon,
NTerminus=NTerminus,
CTerminus=CTerminus,
wxobj=wxobj)
@classmethod
def fromargs(self, rawargs):
(args, kwargs) = transit_tools.cleanargs(rawargs)
ctrldata = args[0].split(",")
annotationPath = args[1]
outpath = args[2]
output_file = open(outpath, "w")
samples = int(kwargs.get("s", 10000))
burnin = int(kwargs.get("b", 500))
replicates = "Sum"
normalization = None
LOESS = False
ignoreCodon = True
NTerminus = float(kwargs.get("iN", 0.0))
CTerminus = float(kwargs.get("iC", 0.0))
pi0 = float(kwargs.get("pi0", 0.5))
pi1 = float(kwargs.get("pi1", 0.5))
M0 = float(kwargs.get("M0", 1.0))
M1 = float(kwargs.get("M1", 1.0))
a0 = float(kwargs.get("a0", 10.0))
a1 = float(kwargs.get("a1", 10.0))
b0 = float(kwargs.get("b0", 1.0))
b1 = float(kwargs.get("b1", 1.0))
alpha_w = float(kwargs.get("aw", 0.5))
beta_w = float(kwargs.get("bw", 0.5))
return self(ctrldata,
annotationPath,
output_file,
samples=samples,
burnin=burnin,
replicates=replicates,
normalization=normalization,
LOESS=LOESS,
ignoreCodon=ignoreCodon,
NTerminus=NTerminus,
CTerminus=CTerminus,
pi0=pi0,
pi1=pi1,
M0=M0,
M1=M1,
a0=a0,
a1=a1,
b0=b0,
b1=b1,
alpha_w=alpha_w,
beta_w=beta_w)
def Run(self):
self.transit_message("Starting Binomial Method")
start_time = time.time()
self.progress_range(self.samples+self.burnin)
#Get orf data
self.transit_message("Getting Data")
G = tnseq_tools.Genes(self.ctrldata, self.annotation_path, ignoreCodon=self.ignoreCodon, nterm=self.NTerminus, cterm=self.CTerminus)
#Parameters
self.transit_message("Setting Parameters")
w1 = 0.15
w0 = 1.0 - w1
mu_c = 0
Ngenes = len(G)
sample_size = self.samples+self.burnin
numReps = len(self.ctrldata)
theta = numpy.zeros((Ngenes, sample_size))
theta[:,0] = 0.10
rho0 = numpy.zeros(sample_size); rho0[0] = 0.5; Kp0 = numpy.zeros(sample_size); Kp0[0] = 10;
rho1 = numpy.zeros(sample_size); rho1[0] = 0.10; Kp1 = numpy.zeros(sample_size); Kp1[0] = 3;
Z = numpy.zeros((Ngenes, sample_size))
pz1 = numpy.zeros(sample_size);
n1 = 0
w1 = scipy.stats.beta.rvs(self.alpha_w, self.beta_w)
W1 = numpy.zeros(sample_size); W1[0] = w1
#
self.transit_message("Setting Initial Values")
K = numpy.array([sum([1 for x in gene.reads.flatten() if x> 0]) for gene in G])
N = numpy.array([len(gene.reads.flatten()) for gene in G])
for g,gene in enumerate(G):
if N[g] == 0: theta[g][0] = 0.5
elif K[g]/float(N[g]) == 0: theta[g][0] = 0.001
elif K[g]/float(N[g]) == 1: theta[g][0] = 0.001
else: theta[g][0] = K[g]/float(N[g])
#print g, ORF[g], K[g], N[g], theta[g][0]
Z[g][0] = scipy.stats.bernoulli.rvs(1-theta[g][0])
acc_p0 = 0; acc_k0 = 0;
acc_p1 = 0; acc_k1 = 0;
rho0c_std = 0.010
kp0c_std = 1.40
rho1c_std = 0.009
kp1c_std = 1.1
numpy.seterr(divide='ignore')
for i in range(1, sample_size):
i0 = Z[:,i-1] == 0; n0 = numpy.sum(i0);
i1 = Z[:,i-1] == 1; n1 = numpy.sum(i1);
theta[i0,i] = scipy.stats.beta.rvs(Kp0[i-1]*rho0[i-1] + K[i0], Kp0[i-1]*(1-rho0[i-1]) + N[i0] - K[i0])
theta[i1,i] = scipy.stats.beta.rvs(Kp1[i-1]*rho1[i-1] + K[i1], Kp1[i-1]*(1-rho1[i-1]) + N[i1] - K[i1])
rho0_c = rho0[i-1] + scipy.stats.norm.rvs(0, rho0c_std)
Kp0_c = Kp0[i-1] + scipy.stats.norm.rvs(0, kp0c_std)
if rho0_c <= 0: rho0[i] = rho0[i-1]
else:
fc = numpy.log(scipy.stats.beta.pdf(rho0_c, self.M0*self.pi0, self.M0*(1.0-self.pi0)))
f0 = numpy.log(scipy.stats.beta.pdf(rho0[i-1], self.M0*self.pi0, self.M0*(1.0-self.pi0)))
fc += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i0,i], Kp0[i-1]*rho0_c, Kp0[i-1]*(1-rho0_c))))
f0 += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i0,i], Kp0[i-1]*rho0[i-1], Kp0[i-1]*(1-rho0[i-1]))))
if numpy.log(scipy.stats.uniform.rvs()) < fc - f0:
rho0[i] = rho0_c
acc_p0+=1
else: rho0[i] = rho0[i-1]
if Kp0_c <= 0: Kp0[i] = Kp0[i-1]
else:
fc = numpy.log(scipy.stats.gamma.pdf(Kp0_c, self.a0, self.b0));
f0 = numpy.log(scipy.stats.gamma.pdf(Kp0[i-1], self.a0, self.b0));
fc += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i0,i], Kp0_c*rho0[i], Kp0_c*(1-rho0[i]))))
f0 += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i0,i], Kp0[i-1]*rho0[i], Kp0[i-1]*(1-rho0[i]))))
if numpy.log(scipy.stats.uniform.rvs()) < fc - f0:
Kp0[i] = Kp0_c
acc_k0+=1
else: Kp0[i] = Kp0[i-1]
rho1_c = rho1[i-1] + scipy.stats.norm.rvs(0, rho1c_std)
Kp1_c = Kp1[i-1] + scipy.stats.norm.rvs(0, kp1c_std)
if rho1_c <= 0:
rho1[i] = rho1[i-1]
else:
fc = numpy.log(scipy.stats.beta.pdf(rho1_c, self.M1*self.pi1, self.M1*(1-self.pi1)))
f1 = numpy.log(scipy.stats.beta.pdf(rho1[i-1], self.M1*self.pi1, self.M1*(1-self.pi1)))
fc += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i1,i], Kp1[i-1]*rho1_c, Kp1[i-1]*(1-rho1_c))))
f1 += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i1,i], Kp1[i-1]*rho1[i-1], Kp1[i-1]*(1-rho1[i-1]))))
if numpy.log(scipy.stats.uniform.rvs()) < fc - f1:
rho1[i] = rho1_c
acc_p1+=1
else: rho1[i] = rho1[i-1]
if Kp1_c <= 0: Kp1[i] = Kp1[i-1]
else:
fc = numpy.log(scipy.stats.gamma.pdf(Kp1_c, self.a1, self.b1));
f1 = numpy.log(scipy.stats.gamma.pdf(Kp1[i-1], self.a1, self.b1));
fc += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i1,i], Kp1_c*rho1[i], Kp1_c*(1-rho1[i]))))
f1 += numpy.sum(numpy.log(scipy.stats.beta.pdf(theta[i1,i], Kp1[i-1]*rho1[i], Kp1[i-1]*(1-rho1[i]))))
if numpy.log(scipy.stats.uniform.rvs()) < fc - f1:
Kp1[i] = Kp1_c
acc_k1+=1
else: Kp1[i] = Kp1[i-1]
g0 = scipy.stats.beta.pdf(theta[:,i], Kp0[i]*rho0[i], Kp0[i]*(1-rho0[i])) * (1-w1)
g1 = scipy.stats.beta.pdf(theta[:,i], Kp1[i]*rho1[i], Kp1[i]*(1-rho1[i])) * (w1)
p1 = g1/(g0+g1)
p1 = numpy.nan_to_num(p1)
try:
Z[:,i] = scipy.stats.bernoulli.rvs(p1)
except:
inan = numpy.isnan(p1)
print >> sys.stderr, "K=\t", K[inan]
print >> sys.stderr, "N=\t", N[inan]
print >> sys.stderr, "theta=", theta[inan,i]
sys.exit()
pz1[i] = p1[0]
i1 = Z[:,i] == 1; n1 = numpy.sum(i1);
#w1 = 0.15
w1 = scipy.stats.beta.rvs(self.alpha_w + n1, self.beta_w + Ngenes - n1)
W1[i] = w1
#Update progress
text = "Running Binomial Method... %2.0f%%" % (100.0*(i+1)/(sample_size))
self.progress_update(text, i)
self.transit_message_inplace(text)
numpy.seterr(divide='warn')
z_bar = numpy.apply_along_axis(numpy.mean, 1, Z[:, self.burnin:])
theta_bar = numpy.apply_along_axis(numpy.mean, 1, theta[:, self.burnin:])
#(ess_threshold, noness_threshold) = stat_tools.fdr_post_prob(z_bar)
(ess_threshold, noness_threshold) = stat_tools.bayesian_ess_thresholds(z_bar)
self.output.write("#Binomial\n")
#output.write("#Command: %s\n" % " ".join(["%s=%s" %(key,val) for (key,val) in kwargs.items()]))
if self.wxobj:
members = sorted([attr for attr in dir(self) if not callable(getattr(self,attr)) and not attr.startswith("__")])
memberstr = ""
for m in members:
memberstr += "%s = %s, " % (m, getattr(self, m))
self.output.write("#GUI with: ctrldata=%s, annotation=%s, output=%s, samples=%s, burnin=%s\n" % (",".join(self.ctrldata).encode('utf-8'), self.annotation_path.encode('utf-8'), self.output.name.encode('utf-8'), self.samples, self.burnin))
else:
self.output.write("#Console: python %s\n" % " ".join(sys.argv))
self.output.write("#Thresholds: (%1.5f, %1.5f)\n" % (ess_threshold,noness_threshold))
self.output.write("#rho0 Acceptance Rate:\t%f%%\n" % ((100.0*acc_p0)/sample_size))
self.output.write("#Kp0 Acceptance Rate:\t%f%%\n" % ((100.0*acc_k0)/sample_size))
self.output.write("#rho1 Acceptance Rate:\t%f%%\n" % ((100.0*acc_p1)/sample_size))
self.output.write("#Kp1 Acceptance Rate:\t%f%%\n" % ((100.0*acc_k1)/sample_size))
self.output.write("#Hyperparameters rho: \t%1.2f\t%3.1f\t%1.2f\t%3.1f\n" % (self.pi0, self.M0, self.pi1, self.M1))
self.output.write("#Hyperparameters Kp: \t%3.1f\t%3.1f\t%3.1f\t%3.1f\n" % (self.a0, self.b0, self.a1, self.b1))
self.output.write("#Hyperparameters W: \t%1.3f\t%1.3f\n" % (self.alpha_w, self.beta_w))
self.output.write("#%s\n" % "\t".join(columns))
data = []
for g,gene in enumerate(G):
c = "Uncertain"
if z_bar[g] > ess_threshold:
c = "Essential"
if z_bar[g] < noness_threshold:
c = "Non-Essential"
data.append("%s\t%s\t%s\t%1.1f\t%d\t%d\t%d\t%f\t%f\t%s" % (gene.orf, gene.name, gene.desc, K[g]/float(numReps), N[g]/numReps, K[g], N[g], theta_bar[g], z_bar[g], c))
data.sort()
for row in data:
self.output.write("%s\n" % row)
self.output.close()
self.transit_message("") # Printing empty line to flush stdout
self.transit_message("Adding File: %s" % (self.output.name))
self.add_file(filetype="Binomial")
self.finish()
self.transit_message("Finished Binomial Method")
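# Illustrative sketch (not part of TRANSIT): the random-walk
# Metropolis-Hastings update that Run() applies to rho0, Kp0, rho1 and Kp1,
# shown for a single parameter on (0, 1) using only the standard library.
# The Beta(2, 5) target, the function names and the default step size are
# assumptions for this example, not the model's actual posterior. The explicit
# upper-bound check is also an addition: Run() only tests candidate <= 0 and
# relies on a zero density to reject values above 1.

```python
import math
import random


def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def mh_step(current, log_target, proposal_std, rng):
    """One random-walk Metropolis-Hastings update; returns (value, accepted)."""
    candidate = current + rng.gauss(0.0, proposal_std)
    if candidate <= 0.0 or candidate >= 1.0:
        return current, False  # outside the support: reject outright
    # The Gaussian proposal is symmetric, so only the target ratio matters.
    log_ratio = log_target(candidate) - log_target(current)
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) < log_ratio:
        return candidate, True
    return current, False


def run_chain(n_steps, start=0.5, proposal_std=0.1, seed=0):
    """Run a toy chain and return (samples, number of accepted proposals)."""
    rng = random.Random(seed)
    target = lambda x: log_beta_pdf(x, 2.0, 5.0)  # toy target density
    chain, accepted = [start], 0
    for _ in range(n_steps):
        value, ok = mh_step(chain[-1], target, proposal_std, rng)
        chain.append(value)
        accepted += ok
    return chain, accepted
```

# Working with log densities, as Run() does, avoids underflow when the
# likelihood is a product over many genes; the acceptance rate reported in the
# output header is the accepted count divided by the number of proposals.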
@classmethod
def usage_string(self):
return """python %s binomial