pax_global_header00006660000000000000000000000064130575226670014527gustar00rootroot0000000000000052 comment=2147cb8465839f42660e5af5168c010375db12ba TFBS-0.7.1/000077500000000000000000000000001305752266700122725ustar00rootroot00000000000000TFBS-0.7.1/BUGS000066400000000000000000000006501305752266700127560ustar00rootroot00000000000000Known Bugs TFBS 0.3.0 ============= * TFBS::PatternGen::Gibbs sometimes emits an "unequal column counts" warning. This is a result of sporadic roundoff errors due to information loss when converting from Gibbs percentage matrices to PFMs. We are working on a more intelligent conversion procedure, while trying to persuade the authors of the Gibbs program to include standard PFMs in the program's output. TFBS-0.7.1/Changes000066400000000000000000000027761305752266700136010ustar00rootroot00000000000000 Changes in 0.3.1: ================ * Available as two distributions 0.3.1s - for use with current stable (0.7.*) release of bioperl 0.3.1d - for use with development (currently 1.0.alpha) release of bioperl; produced using patches kindly provided by Jason Stajich * Added POD documentation for * Iterator method in TFBS::MatrixSet, TFBS::SiteSet and TFBS::SitePairSet * search_seq and search_aln methods in TFBS::MatrixSet * TFBS::Matrix::PWM : fixed bug in handling -seqstring parameter passed to the search_seq method * TFBS::Matrix::* : fixed bug in handling -matrixstring parameter passed to the constructor Changes in 0.3.0: ================ * All aggregate classes (TFBS::MatrixSet, TFBS::SiteSet and TFBS::SitePairSet) have iterators with a uniform interface. * Added search_aln method to TFBS::MatrixSet, making possible phylogenetic footprinting scans with sets of matrices * Removed absolute requirement for the GD.pm module: its import is deferred until the first call of the draw_logo method of TFBS::Matrix subclasses. Package test suite now does not require it, either.
* Changes in Makefile.PL: it now very clearly notifies the user about missing prerequisite modules. * Improved documentation: added README and CHANGES files, and data model information for the JASPAR2 database in TFBS::DB::JASPAR2 POD * More example scripts included in the distribution (see below) * Fixed quite a few bugs, mainly in TFBS::DB::FlatFileDir and aggregate classes TFBS-0.7.1/Ext/000077500000000000000000000000001305752266700130325ustar00rootroot00000000000000TFBS-0.7.1/Ext/MYMETA.json000066400000000000000000000014331305752266700147220ustar00rootroot00000000000000{ "abstract" : "unknown", "author" : [ "unknown" ], "dynamic_config" : 0, "generated_by" : "ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140", "license" : [ "unknown" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "TFBS-Ext-pwmsearch", "no_index" : { "directory" : [ "t", "inc" ] }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "runtime" : { "requires" : {} } }, "release_status" : "stable", "version" : "0.2" } TFBS-0.7.1/Ext/MYMETA.yml000066400000000000000000000006611305752266700145540ustar00rootroot00000000000000--- abstract: unknown author: - unknown build_requires: ExtUtils::MakeMaker: 0 configure_requires: ExtUtils::MakeMaker: 0 dynamic_config: 0 generated_by: 'ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140' license: unknown meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: 1.4 name: TFBS-Ext-pwmsearch no_index: directory: - t - inc requires: {} version: 0.2 TFBS-0.7.1/Ext/Makefile000066400000000000000000000642271305752266700145050ustar00rootroot00000000000000# This Makefile is for the TFBS::Ext::pwmsearch extension to perl. # # It was generated automatically by MakeMaker version # 6.74 (Revision: 67400) from the contents of # Makefile.PL.
Don't edit this file, edit Makefile.PL instead. # # ANY CHANGES MADE HERE WILL BE LOST! # # MakeMaker ARGV: () # # MakeMaker Parameters: # BUILD_REQUIRES => { } # CONFIGURE_REQUIRES => { } # DEFINE => q[] # INC => q[-I. -I./lib] # LIBS => [q[-lm]] # NAME => q[TFBS::Ext::pwmsearch] # PREREQ_PM => { } # TEST_REQUIRES => { } # VERSION_FROM => q[pwmsearch.pm] # --- MakeMaker post_initialize section: # --- MakeMaker const_config section: # These definitions are from config.sh (via /System/Library/Perl/5.12/darwin-thread-multi-2level/Config.pm). # They may have been overridden via Makefile.PL or on the command line. AR = ar CC = clang CCCDLFLAGS = CCDLFLAGS = DLEXT = bundle DLSRC = dl_dlopen.xs EXE_EXT = FULL_AR = /usr/bin/ar LD = clang -mmacosx-version-min=10.8 LDDLFLAGS = -arch i386 -arch x86_64 -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector LDFLAGS = -arch i386 -arch x86_64 -fstack-protector -L/usr/local/lib LIBC = LIB_EXT = .a OBJ_EXT = .o OSNAME = darwin OSVERS = 12.0 RANLIB = /usr/bin/ar s SITELIBEXP = /Library/Perl/5.12 SITEARCHEXP = /Library/Perl/5.12/darwin-thread-multi-2level SO = dylib VENDORARCHEXP = /Network/Library/Perl/5.12/darwin-thread-multi-2level VENDORLIBEXP = /Network/Library/Perl/5.12 # --- MakeMaker constants section: AR_STATIC_ARGS = cr DIRFILESEP = / DFSEP = $(DIRFILESEP) NAME = TFBS::Ext::pwmsearch NAME_SYM = TFBS_Ext_pwmsearch VERSION = 0.2 VERSION_MACRO = VERSION VERSION_SYM = 0_2 DEFINE_VERSION = -D$(VERSION_MACRO)=\"$(VERSION)\" XS_VERSION = 0.2 XS_VERSION_MACRO = XS_VERSION XS_DEFINE_VERSION = -D$(XS_VERSION_MACRO)=\"$(XS_VERSION)\" INST_ARCHLIB = ../blib/arch INST_SCRIPT = ../blib/script INST_BIN = ../blib/bin INST_LIB = ../blib/lib INST_MAN1DIR = ../blib/man1 INST_MAN3DIR = ../blib/man3 MAN1EXT = 1 MAN3EXT = 3pm INSTALLDIRS = site DESTDIR = PREFIX = $(SITEPREFIX) PERLPREFIX = / SITEPREFIX = /usr/local VENDORPREFIX = /usr/local INSTALLPRIVLIB = /Library/Perl/Updates/5.12.4 DESTINSTALLPRIVLIB = 
$(DESTDIR)$(INSTALLPRIVLIB) INSTALLSITELIB = /Library/Perl/5.12 DESTINSTALLSITELIB = $(DESTDIR)$(INSTALLSITELIB) INSTALLVENDORLIB = /Network/Library/Perl/5.12 DESTINSTALLVENDORLIB = $(DESTDIR)$(INSTALLVENDORLIB) INSTALLARCHLIB = /Library/Perl/Updates/5.12.4/darwin-thread-multi-2level DESTINSTALLARCHLIB = $(DESTDIR)$(INSTALLARCHLIB) INSTALLSITEARCH = /Library/Perl/5.12/darwin-thread-multi-2level DESTINSTALLSITEARCH = $(DESTDIR)$(INSTALLSITEARCH) INSTALLVENDORARCH = /Network/Library/Perl/5.12/darwin-thread-multi-2level DESTINSTALLVENDORARCH = $(DESTDIR)$(INSTALLVENDORARCH) INSTALLBIN = /usr/bin DESTINSTALLBIN = $(DESTDIR)$(INSTALLBIN) INSTALLSITEBIN = /usr/local/bin DESTINSTALLSITEBIN = $(DESTDIR)$(INSTALLSITEBIN) INSTALLVENDORBIN = /usr/local/bin DESTINSTALLVENDORBIN = $(DESTDIR)$(INSTALLVENDORBIN) INSTALLSCRIPT = /usr/bin DESTINSTALLSCRIPT = $(DESTDIR)$(INSTALLSCRIPT) INSTALLSITESCRIPT = /usr/local/bin DESTINSTALLSITESCRIPT = $(DESTDIR)$(INSTALLSITESCRIPT) INSTALLVENDORSCRIPT = /usr/local/bin DESTINSTALLVENDORSCRIPT = $(DESTDIR)$(INSTALLVENDORSCRIPT) INSTALLMAN1DIR = /usr/share/man/man1 DESTINSTALLMAN1DIR = $(DESTDIR)$(INSTALLMAN1DIR) INSTALLSITEMAN1DIR = /usr/local/share/man/man1 DESTINSTALLSITEMAN1DIR = $(DESTDIR)$(INSTALLSITEMAN1DIR) INSTALLVENDORMAN1DIR = /usr/local/share/man/man1 DESTINSTALLVENDORMAN1DIR = $(DESTDIR)$(INSTALLVENDORMAN1DIR) INSTALLMAN3DIR = /usr/share/man/man3 DESTINSTALLMAN3DIR = $(DESTDIR)$(INSTALLMAN3DIR) INSTALLSITEMAN3DIR = /usr/local/share/man/man3 DESTINSTALLSITEMAN3DIR = $(DESTDIR)$(INSTALLSITEMAN3DIR) INSTALLVENDORMAN3DIR = /usr/local/share/man/man3 DESTINSTALLVENDORMAN3DIR = $(DESTDIR)$(INSTALLVENDORMAN3DIR) PERL_LIB = /System/Library/Perl/5.12 PERL_ARCHLIB = /System/Library/Perl/5.12/darwin-thread-multi-2level LIBPERL_A = libperl.a FIRST_MAKEFILE = Makefile MAKEFILE_OLD = Makefile.old MAKE_APERL_FILE = Makefile.aperl PERLMAINCC = $(CC) PERL_INC = /System/Library/Perl/5.12/darwin-thread-multi-2level/CORE PERL = /usr/bin/perl FULLPERL 
= /usr/bin/perl ABSPERL = $(PERL) PERLRUN = $(PERL) FULLPERLRUN = $(FULLPERL) ABSPERLRUN = $(ABSPERL) PERLRUNINST = $(PERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" FULLPERLRUNINST = $(FULLPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" ABSPERLRUNINST = $(ABSPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" PERL_CORE = 0 PERM_DIR = 755 PERM_RW = 644 PERM_RWX = 755 MAKEMAKER = /Library/Perl/5.12/ExtUtils/MakeMaker.pm MM_VERSION = 6.74 MM_REVISION = 67400 # FULLEXT = Pathname for extension directory (eg Foo/Bar/Oracle). # BASEEXT = Basename part of FULLEXT. May be just equal FULLEXT. (eg Oracle) # PARENT_NAME = NAME without BASEEXT and no trailing :: (eg Foo::Bar) # DLBASE = Basename part of dynamic library. May be just equal BASEEXT. MAKE = make FULLEXT = TFBS/Ext/pwmsearch BASEEXT = pwmsearch PARENT_NAME = TFBS::Ext DLBASE = $(BASEEXT) VERSION_FROM = pwmsearch.pm INC = -I. -I./lib DEFINE = OBJECT = $(BASEEXT)$(OBJ_EXT) LDFROM = $(OBJECT) LINKTYPE = dynamic BOOTDEP = # Handy lists of source code files: XS_FILES = pwmsearch.xs C_FILES = pwmsearch.c O_FILES = pwmsearch.o H_FILES = MAN1PODS = MAN3PODS = pwmsearch.pm # Where is the Config information that we are using/depend on CONFIGDEP = $(PERL_ARCHLIB)$(DFSEP)Config.pm $(PERL_INC)$(DFSEP)config.h # Where to build things INST_LIBDIR = $(INST_LIB)/TFBS/Ext INST_ARCHLIBDIR = $(INST_ARCHLIB)/TFBS/Ext INST_AUTODIR = $(INST_LIB)/auto/$(FULLEXT) INST_ARCHAUTODIR = $(INST_ARCHLIB)/auto/$(FULLEXT) INST_STATIC = $(INST_ARCHAUTODIR)/$(BASEEXT)$(LIB_EXT) INST_DYNAMIC = $(INST_ARCHAUTODIR)/$(DLBASE).$(DLEXT) INST_BOOT = $(INST_ARCHAUTODIR)/$(BASEEXT).bs # Extra linker info EXPORT_LIST = PERL_ARCHIVE = PERL_ARCHIVE_AFTER = TO_INST_PM = lib/pwm_search.h \ lib/pwm_searchPFF.c \ pwmsearch.pm PM_TO_BLIB = pwmsearch.pm \ $(INST_LIB)/TFBS/Ext/pwmsearch.pm \ lib/pwm_search.h \ ../blib/lib/pwm_search.h \ lib/pwm_searchPFF.c \ ../blib/lib/pwm_searchPFF.c # --- MakeMaker platform_constants section: MM_Unix_VERSION = 6.74 PERL_MALLOC_DEF = 
-DPERL_EXTMALLOC_DEF -Dmalloc=Perl_malloc -Dfree=Perl_mfree -Drealloc=Perl_realloc -Dcalloc=Perl_calloc # --- MakeMaker tool_autosplit section: # Usage: $(AUTOSPLITFILE) FileToSplit AutoDirToSplitInto AUTOSPLITFILE = $(ABSPERLRUN) -e 'use AutoSplit; autosplit($$$$ARGV[0], $$$$ARGV[1], 0, 1, 1)' -- # --- MakeMaker tool_xsubpp section: XSUBPPDIR = /System/Library/Perl/5.12/ExtUtils XSUBPP = $(XSUBPPDIR)$(DFSEP)xsubpp XSUBPPRUN = $(PERLRUN) $(XSUBPP) XSPROTOARG = XSUBPPDEPS = /System/Library/Perl/5.12/ExtUtils/typemap $(XSUBPP) XSUBPPARGS = -typemap /System/Library/Perl/5.12/ExtUtils/typemap XSUBPP_EXTRA_ARGS = # --- MakeMaker tools_other section: SHELL = /bin/sh CHMOD = chmod CP = cp MV = mv NOOP = $(TRUE) NOECHO = @ RM_F = rm -f RM_RF = rm -rf TEST_F = test -f TOUCH = touch UMASK_NULL = umask 0 DEV_NULL = > /dev/null 2>&1 MKPATH = $(ABSPERLRUN) -MExtUtils::Command -e 'mkpath' -- EQUALIZE_TIMESTAMP = $(ABSPERLRUN) -MExtUtils::Command -e 'eqtime' -- FALSE = false TRUE = true ECHO = echo ECHO_N = echo -n UNINST = 0 VERBINST = 0 MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e 'install([ from_to => {@ARGV}, verbose => '\''$(VERBINST)'\'', uninstall_shadows => '\''$(UNINST)'\'', dir_mode => '\''$(PERM_DIR)'\'' ]);' -- DOC_INSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'perllocal_install' -- UNINSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'uninstall' -- WARN_IF_OLD_PACKLIST = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'warn_if_old_packlist' -- MACROSTART = MACROEND = USEMAKEFILE = -f FIXIN = $(ABSPERLRUN) -MExtUtils::MY -e 'MY->fixin(shift)' -- # --- MakeMaker makemakerdflt section: makemakerdflt : all $(NOECHO) $(NOOP) # --- MakeMaker dist section skipped. 
# --- MakeMaker macro section: # --- MakeMaker depend section: # --- MakeMaker cflags section: CCFLAGS = -arch i386 -arch x86_64 -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -fstack-protector -I/usr/local/include OPTIMIZE = -Os PERLTYPE = MPOLLUTE = # --- MakeMaker const_loadlibs section: # TFBS::Ext::pwmsearch might depend on some other libraries: # See ExtUtils::Liblist for details # EXTRALIBS = LDLOADLIBS = -lm BSLOADLIBS = # --- MakeMaker const_cccmd section: CCCMD = $(CC) -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) # --- MakeMaker post_constants section: # --- MakeMaker pasthru section: PASTHRU = LIBPERL_A="$(LIBPERL_A)"\ LINKTYPE="$(LINKTYPE)"\ OPTIMIZE="$(OPTIMIZE)"\ PREFIX="$(PREFIX)"\ PASTHRU_DEFINE="$(PASTHRU_DEFINE)"\ PASTHRU_INC="$(PASTHRU_INC)" # --- MakeMaker special_targets section: .SUFFIXES : .xs .c .C .cpp .i .s .cxx .cc $(OBJ_EXT) .PHONY: all config static dynamic test linkext manifest blibdirs clean realclean disttest distdir # --- MakeMaker c_o section: .c.i: clang -E -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c > $*.i .c.s: $(CCCMD) -S $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .c$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .cpp$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cpp .cxx$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cxx .cc$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cc .C$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.C # --- MakeMaker xs_c section: .xs.c: $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $(XSUBPP_EXTRA_ARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c # --- MakeMaker xs_o section: .xs$(OBJ_EXT): $(XSUBPPRUN) 
$(XSPROTOARG) $(XSUBPPARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c # --- MakeMaker top_targets section: all :: pure_all manifypods $(NOECHO) $(NOOP) pure_all :: config pm_to_blib subdirs linkext $(NOECHO) $(NOOP) subdirs :: $(MYEXTLIB) $(NOECHO) $(NOOP) config :: $(FIRST_MAKEFILE) blibdirs $(NOECHO) $(NOOP) help : perldoc ExtUtils::MakeMaker # --- MakeMaker blibdirs section: blibdirs : $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists $(NOECHO) $(NOOP) # Backwards compat with 6.18 through 6.25 blibdirs.ts : blibdirs $(NOECHO) $(NOOP) $(INST_LIBDIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_LIBDIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_LIBDIR) $(NOECHO) $(TOUCH) $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHLIB) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHLIB) $(NOECHO) $(TOUCH) $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_AUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_AUTODIR) $(NOECHO) $(TOUCH) $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHAUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHAUTODIR) $(NOECHO) $(TOUCH) $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_BIN) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_BIN) $(NOECHO) $(TOUCH) $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_SCRIPT) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_SCRIPT) $(NOECHO) $(TOUCH) $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN1DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN1DIR) 
$(NOECHO) $(TOUCH) $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN3DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN3DIR) $(NOECHO) $(TOUCH) $(INST_MAN3DIR)$(DFSEP).exists # --- MakeMaker linkext section: linkext :: $(LINKTYPE) $(NOECHO) $(NOOP) # --- MakeMaker dlsyms section: # --- MakeMaker dynamic section: dynamic :: $(FIRST_MAKEFILE) $(INST_DYNAMIC) $(INST_BOOT) $(NOECHO) $(NOOP) # --- MakeMaker dynamic_bs section: BOOTSTRAP = $(BASEEXT).bs # As Mkbootstrap might not write a file (if none is required) # we use touch to prevent make continually trying to remake it. # The DynaLoader only reads a non-empty file. $(BOOTSTRAP) : $(FIRST_MAKEFILE) $(BOOTDEP) $(INST_ARCHAUTODIR)$(DFSEP).exists $(NOECHO) $(ECHO) "Running Mkbootstrap for $(NAME) ($(BSLOADLIBS))" $(NOECHO) $(PERLRUN) \ "-MExtUtils::Mkbootstrap" \ -e "Mkbootstrap('$(BASEEXT)','$(BSLOADLIBS)');" $(NOECHO) $(TOUCH) $@ $(CHMOD) $(PERM_RW) $@ $(INST_BOOT) : $(BOOTSTRAP) $(INST_ARCHAUTODIR)$(DFSEP).exists $(NOECHO) $(RM_RF) $@ - $(CP) $(BOOTSTRAP) $@ $(CHMOD) $(PERM_RW) $@ # --- MakeMaker dynamic_lib section: # This section creates the dynamically loadable $(INST_DYNAMIC) # from $(OBJECT) and possibly $(MYEXTLIB). ARMAYBE = : OTHERLDFLAGS = INST_DYNAMIC_DEP = INST_DYNAMIC_FIX = $(INST_DYNAMIC): $(OBJECT) $(MYEXTLIB) $(BOOTSTRAP) $(INST_ARCHAUTODIR)$(DFSEP).exists $(EXPORT_LIST) $(PERL_ARCHIVE) $(PERL_ARCHIVE_AFTER) $(INST_DYNAMIC_DEP) $(RM_F) $@ $(LD) $(LDDLFLAGS) $(LDFROM) $(OTHERLDFLAGS) -o $@ $(MYEXTLIB) \ $(PERL_ARCHIVE) $(LDLOADLIBS) $(PERL_ARCHIVE_AFTER) $(EXPORT_LIST) \ $(INST_DYNAMIC_FIX) $(CHMOD) $(PERM_RWX) $@ # --- MakeMaker static section: ## $(INST_PM) has been moved to the all: target. 
## It remains here for awhile to allow for old usage: "make static" static :: $(FIRST_MAKEFILE) $(INST_STATIC) $(NOECHO) $(NOOP) # --- MakeMaker static_lib section: $(INST_STATIC) : $(OBJECT) $(MYEXTLIB) $(INST_ARCHAUTODIR)$(DFSEP).exists $(RM_RF) $@ $(FULL_AR) $(AR_STATIC_ARGS) $@ $(OBJECT) && $(RANLIB) $@ $(CHMOD) $(PERM_RWX) $@ $(NOECHO) $(ECHO) "$(EXTRALIBS)" > $(INST_ARCHAUTODIR)/extralibs.ld # --- MakeMaker manifypods section: POD2MAN_EXE = $(PERLRUN) "-MExtUtils::Command::MM" -e pod2man "--" POD2MAN = $(POD2MAN_EXE) manifypods : pure_all \ pwmsearch.pm $(NOECHO) $(POD2MAN) --section=3 --perm_rw=$(PERM_RW) \ pwmsearch.pm $(INST_MAN3DIR)/TFBS::Ext::pwmsearch.$(MAN3EXT) # --- MakeMaker processPL section: # --- MakeMaker installbin section: # --- MakeMaker subdirs section: # none # --- MakeMaker clean_subdirs section: clean_subdirs : $(NOECHO) $(NOOP) # --- MakeMaker clean section: # Delete temporary files but do not touch installed files. We don't delete # the Makefile here so a later make realclean still has a makefile to use. clean :: clean_subdirs - $(RM_F) \ *$(LIB_EXT) core \ core.[0-9] $(INST_ARCHAUTODIR)/extralibs.all \ core.[0-9][0-9] $(BASEEXT).bso \ pm_to_blib.ts MYMETA.json \ core.[0-9][0-9][0-9][0-9] MYMETA.yml \ $(BASEEXT).x $(BOOTSTRAP) \ perl$(EXE_EXT) tmon.out \ *$(OBJ_EXT) pm_to_blib \ pwmsearch.c $(INST_ARCHAUTODIR)/extralibs.ld \ blibdirs.ts core.[0-9][0-9][0-9][0-9][0-9] \ *perl.core core.*perl.*.? 
\ $(MAKE_APERL_FILE) $(BASEEXT).def \ perl core.[0-9][0-9][0-9] \ mon.out lib$(BASEEXT).def \ perlmain.c perl.exe \ so_locations $(BASEEXT).exp - $(RM_RF) \ blib $(NOECHO) $(RM_F) $(MAKEFILE_OLD) - $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) $(DEV_NULL) # --- MakeMaker realclean_subdirs section: realclean_subdirs : $(NOECHO) $(NOOP) # --- MakeMaker realclean section: # Delete temporary files (via clean) and also delete dist files realclean purge :: clean realclean_subdirs - $(RM_F) \ $(OBJECT) $(MAKEFILE_OLD) \ $(FIRST_MAKEFILE) - $(RM_RF) \ $(DISTVNAME) # --- MakeMaker metafile section: metafile : create_distdir $(NOECHO) $(ECHO) Generating META.yml $(NOECHO) $(ECHO) '---' > META_new.yml $(NOECHO) $(ECHO) 'abstract: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'author:' >> META_new.yml $(NOECHO) $(ECHO) ' - unknown' >> META_new.yml $(NOECHO) $(ECHO) 'build_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'configure_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'dynamic_config: 1' >> META_new.yml $(NOECHO) $(ECHO) 'generated_by: '\''ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140'\''' >> META_new.yml $(NOECHO) $(ECHO) 'license: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'meta-spec:' >> META_new.yml $(NOECHO) $(ECHO) ' url: http://module-build.sourceforge.net/META-spec-v1.4.html' >> META_new.yml $(NOECHO) $(ECHO) ' version: 1.4' >> META_new.yml $(NOECHO) $(ECHO) 'name: TFBS-Ext-pwmsearch' >> META_new.yml $(NOECHO) $(ECHO) 'no_index:' >> META_new.yml $(NOECHO) $(ECHO) ' directory:' >> META_new.yml $(NOECHO) $(ECHO) ' - t' >> META_new.yml $(NOECHO) $(ECHO) ' - inc' >> META_new.yml $(NOECHO) $(ECHO) 'requires: {}' >> META_new.yml $(NOECHO) $(ECHO) 'version: 0.2' >> META_new.yml -$(NOECHO) $(MV) META_new.yml $(DISTVNAME)/META.yml $(NOECHO) $(ECHO) Generating META.json $(NOECHO) $(ECHO) '{' > META_new.json $(NOECHO) $(ECHO) ' "abstract" : 
"unknown",' >> META_new.json $(NOECHO) $(ECHO) ' "author" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "dynamic_config" : 1,' >> META_new.json $(NOECHO) $(ECHO) ' "generated_by" : "ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140",' >> META_new.json $(NOECHO) $(ECHO) ' "license" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "meta-spec" : {' >> META_new.json $(NOECHO) $(ECHO) ' "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "2"' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "name" : "TFBS-Ext-pwmsearch",' >> META_new.json $(NOECHO) $(ECHO) ' "no_index" : {' >> META_new.json $(NOECHO) $(ECHO) ' "directory" : [' >> META_new.json $(NOECHO) $(ECHO) ' "t",' >> META_new.json $(NOECHO) $(ECHO) ' "inc"' >> META_new.json $(NOECHO) $(ECHO) ' ]' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "prereqs" : {' >> META_new.json $(NOECHO) $(ECHO) ' "build" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : "0"' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "configure" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : "0"' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "runtime" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {}' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "release_status" : "stable",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "0.2"' >> META_new.json $(NOECHO) $(ECHO) '}' >> META_new.json 
-$(NOECHO) $(MV) META_new.json $(DISTVNAME)/META.json # --- MakeMaker signature section: signature : cpansign -s # --- MakeMaker dist_basics section skipped. # --- MakeMaker dist_core section skipped. # --- MakeMaker distdir section skipped. # --- MakeMaker dist_test section skipped. # --- MakeMaker dist_ci section skipped. # --- MakeMaker distmeta section: distmeta : create_distdir metafile $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -e q{META.yml};' \ -e 'eval { maniadd({q{META.yml} => q{Module YAML meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.yml to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -f q{META.json};' \ -e 'eval { maniadd({q{META.json} => q{Module JSON meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.json to MANIFEST: $$$${'\''@'\''}\n"' -- # --- MakeMaker distsignature section: distsignature : create_distdir $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'eval { maniadd({q{SIGNATURE} => q{Public-key signature (added by MakeMaker)}}) }' \ -e ' or print "Could not add SIGNATURE to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(TOUCH) SIGNATURE cd $(DISTVNAME) && cpansign -s # --- MakeMaker install section skipped. # --- MakeMaker force section: # Phony target to force checking subdirectories. 
FORCE : $(NOECHO) $(NOOP) # --- MakeMaker perldepend section: PERL_HDRS = \ $(PERL_INC)/EXTERN.h \ $(PERL_INC)/INTERN.h \ $(PERL_INC)/XSUB.h \ $(PERL_INC)/av.h \ $(PERL_INC)/bitcount.h \ $(PERL_INC)/cc_runtime.h \ $(PERL_INC)/config.h \ $(PERL_INC)/cop.h \ $(PERL_INC)/cv.h \ $(PERL_INC)/dosish.h \ $(PERL_INC)/embed.h \ $(PERL_INC)/embedvar.h \ $(PERL_INC)/fakesdio.h \ $(PERL_INC)/fakethr.h \ $(PERL_INC)/form.h \ $(PERL_INC)/git_version.h \ $(PERL_INC)/gv.h \ $(PERL_INC)/handy.h \ $(PERL_INC)/hv.h \ $(PERL_INC)/intrpvar.h \ $(PERL_INC)/iperlsys.h \ $(PERL_INC)/keywords.h \ $(PERL_INC)/malloc_ctl.h \ $(PERL_INC)/mg.h \ $(PERL_INC)/mydtrace.h \ $(PERL_INC)/nostdio.h \ $(PERL_INC)/op.h \ $(PERL_INC)/opcode.h \ $(PERL_INC)/opnames.h \ $(PERL_INC)/overload.h \ $(PERL_INC)/pad.h \ $(PERL_INC)/parser.h \ $(PERL_INC)/patchlevel.h \ $(PERL_INC)/perl.h \ $(PERL_INC)/perlapi.h \ $(PERL_INC)/perldtrace.h \ $(PERL_INC)/perlio.h \ $(PERL_INC)/perliol.h \ $(PERL_INC)/perlsdio.h \ $(PERL_INC)/perlsfio.h \ $(PERL_INC)/perlvars.h \ $(PERL_INC)/perly.h \ $(PERL_INC)/pp.h \ $(PERL_INC)/pp_proto.h \ $(PERL_INC)/proto.h \ $(PERL_INC)/reentr.h \ $(PERL_INC)/regcharclass.h \ $(PERL_INC)/regcomp.h \ $(PERL_INC)/regexp.h \ $(PERL_INC)/regnodes.h \ $(PERL_INC)/scope.h \ $(PERL_INC)/sv.h \ $(PERL_INC)/thread.h \ $(PERL_INC)/time64.h \ $(PERL_INC)/time64_config.h \ $(PERL_INC)/uconfig.h \ $(PERL_INC)/unixish.h \ $(PERL_INC)/utf8.h \ $(PERL_INC)/utfebcdic.h \ $(PERL_INC)/util.h \ $(PERL_INC)/uudmap.h \ $(PERL_INC)/warnings.h $(OBJECT) : $(PERL_HDRS) pwmsearch.c : $(XSUBPPDEPS) # --- MakeMaker makefile section: $(OBJECT) : $(FIRST_MAKEFILE) # We take a very conservative approach here, but it's worth it. # We move Makefile to Makefile.old here to avoid gnu make looping. $(FIRST_MAKEFILE) : Makefile.PL $(CONFIGDEP) $(NOECHO) $(ECHO) "Makefile out-of-date with respect to $?" $(NOECHO) $(ECHO) "Cleaning current config before rebuilding Makefile..." 
-$(NOECHO) $(RM_F) $(MAKEFILE_OLD) -$(NOECHO) $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) - $(MAKE) $(USEMAKEFILE) $(MAKEFILE_OLD) clean $(DEV_NULL) $(PERLRUN) Makefile.PL $(NOECHO) $(ECHO) "==> Your Makefile has been rebuilt. <==" $(NOECHO) $(ECHO) "==> Please rerun the $(MAKE) command. <==" $(FALSE) # --- MakeMaker staticmake section: # --- MakeMaker makeaperl section --- MAP_TARGET = ../perl FULLPERL = /usr/bin/perl # --- MakeMaker test section: TEST_VERBOSE=0 TEST_TYPE=test_$(LINKTYPE) TEST_FILE = test.pl TEST_FILES = t/*.t TESTDB_SW = -d testdb :: testdb_$(LINKTYPE) test :: $(TEST_TYPE) subdirs-test subdirs-test :: $(NOECHO) $(NOOP) test_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) test_ : test_dynamic test_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) # --- MakeMaker ppd section: # Creates a PPD (Perl Package Description) for a binary distribution. 
ppd : $(NOECHO) $(ECHO) '' > $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) '' >> $(DISTNAME).ppd # --- MakeMaker pm_to_blib section: pm_to_blib : $(FIRST_MAKEFILE) $(TO_INST_PM) $(NOECHO) $(ABSPERLRUN) -MExtUtils::Install -e 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', q[$(PM_FILTER)], '\''$(PERM_DIR)'\'')' -- \ pwmsearch.pm $(INST_LIB)/TFBS/Ext/pwmsearch.pm \ lib/pwm_search.h ../blib/lib/pwm_search.h \ lib/pwm_searchPFF.c ../blib/lib/pwm_searchPFF.c $(NOECHO) $(TOUCH) pm_to_blib # --- MakeMaker selfdocument section: # --- MakeMaker postamble section: # End. TFBS-0.7.1/Ext/Makefile.PL000066400000000000000000000006771305752266700150160ustar00rootroot00000000000000use ExtUtils::MakeMaker; # See lib/ExtUtils/MakeMaker.pm for details of how to influence # the contents of the Makefile that is written. WriteMakefile( 'NAME' => 'TFBS::Ext::pwmsearch', 'VERSION_FROM' => 'pwmsearch.pm', # finds $VERSION 'PREREQ_PM' => {}, # e.g., Module::Name => 1.1 'LIBS' => ['-lm'], # e.g., '-lm' 'DEFINE' => '', # e.g., '-DHAVE_SOMETHING' 'INC' => '-I. -I./lib', # e.g., '-I/usr/include/other' ); TFBS-0.7.1/Ext/Makefile.old000066400000000000000000000644161305752266700152620ustar00rootroot00000000000000# This Makefile is for the TFBS::Ext::pwmsearch extension to perl. # # It was generated automatically by MakeMaker version # 6.68 (Revision: 66800) from the contents of # Makefile.PL. Don't edit this file, edit Makefile.PL instead. # # ANY CHANGES MADE HERE WILL BE LOST! # # MakeMaker ARGV: () # # MakeMaker Parameters: # BUILD_REQUIRES => { } # CONFIGURE_REQUIRES => { } # DEFINE => q[] # INC => q[-I. 
-I./lib] # LIBS => [q[-lm]] # NAME => q[TFBS::Ext::pwmsearch] # PREREQ_PM => { } # TEST_REQUIRES => { } # VERSION_FROM => q[pwmsearch.pm] # --- MakeMaker post_initialize section: # --- MakeMaker const_config section: # These definitions are from config.sh (via /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level/Config.pm). # They may have been overridden via Makefile.PL or on the command line. AR = ar CC = cc CCCDLFLAGS = CCDLFLAGS = DLEXT = bundle DLSRC = dl_dlopen.xs EXE_EXT = FULL_AR = /usr/bin/ar LD = env MACOSX_DEPLOYMENT_TARGET=10.3 cc LDDLFLAGS = -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector LDFLAGS = -fstack-protector -L/usr/local/lib LIBC = LIB_EXT = .a OBJ_EXT = .o OSNAME = darwin OSVERS = 12.2.1 RANLIB = ranlib SITELIBEXP = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3 SITEARCHEXP = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3/darwin-2level SO = dylib VENDORARCHEXP = VENDORLIBEXP = # --- MakeMaker constants section: AR_STATIC_ARGS = cr DIRFILESEP = / DFSEP = $(DIRFILESEP) NAME = TFBS::Ext::pwmsearch NAME_SYM = TFBS_Ext_pwmsearch VERSION = 0.2 VERSION_MACRO = VERSION VERSION_SYM = 0_2 DEFINE_VERSION = -D$(VERSION_MACRO)=\"$(VERSION)\" XS_VERSION = 0.2 XS_VERSION_MACRO = XS_VERSION XS_DEFINE_VERSION = -D$(XS_VERSION_MACRO)=\"$(XS_VERSION)\" INST_ARCHLIB = ../blib/arch INST_SCRIPT = ../blib/script INST_BIN = ../blib/bin INST_LIB = ../blib/lib INST_MAN1DIR = ../blib/man1 INST_MAN3DIR = ../blib/man3 MAN1EXT = 1 MAN3EXT = 3 INSTALLDIRS = site DESTDIR = PREFIX = $(SITEPREFIX) PERLPREFIX = /usr/local/Cellar/perl/5.14.3 SITEPREFIX = /usr/local/Cellar/perl/5.14.3 VENDORPREFIX = INSTALLPRIVLIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3 DESTINSTALLPRIVLIB = $(DESTDIR)$(INSTALLPRIVLIB) INSTALLSITELIB = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3 DESTINSTALLSITELIB = $(DESTDIR)$(INSTALLSITELIB) INSTALLVENDORLIB = DESTINSTALLVENDORLIB = $(DESTDIR)$(INSTALLVENDORLIB) INSTALLARCHLIB = 
/usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level DESTINSTALLARCHLIB = $(DESTDIR)$(INSTALLARCHLIB) INSTALLSITEARCH = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3/darwin-2level DESTINSTALLSITEARCH = $(DESTDIR)$(INSTALLSITEARCH) INSTALLVENDORARCH = DESTINSTALLVENDORARCH = $(DESTDIR)$(INSTALLVENDORARCH) INSTALLBIN = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLBIN = $(DESTDIR)$(INSTALLBIN) INSTALLSITEBIN = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLSITEBIN = $(DESTDIR)$(INSTALLSITEBIN) INSTALLVENDORBIN = DESTINSTALLVENDORBIN = $(DESTDIR)$(INSTALLVENDORBIN) INSTALLSCRIPT = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLSCRIPT = $(DESTDIR)$(INSTALLSCRIPT) INSTALLSITESCRIPT = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLSITESCRIPT = $(DESTDIR)$(INSTALLSITESCRIPT) INSTALLVENDORSCRIPT = DESTINSTALLVENDORSCRIPT = $(DESTDIR)$(INSTALLVENDORSCRIPT) INSTALLMAN1DIR = /usr/local/Cellar/perl/5.14.3/share/man/man1 DESTINSTALLMAN1DIR = $(DESTDIR)$(INSTALLMAN1DIR) INSTALLSITEMAN1DIR = /usr/local/Cellar/perl/5.14.3/share/man/man1 DESTINSTALLSITEMAN1DIR = $(DESTDIR)$(INSTALLSITEMAN1DIR) INSTALLVENDORMAN1DIR = DESTINSTALLVENDORMAN1DIR = $(DESTDIR)$(INSTALLVENDORMAN1DIR) INSTALLMAN3DIR = /usr/local/Cellar/perl/5.14.3/share/man/man3 DESTINSTALLMAN3DIR = $(DESTDIR)$(INSTALLMAN3DIR) INSTALLSITEMAN3DIR = /usr/local/Cellar/perl/5.14.3/share/man/man3 DESTINSTALLSITEMAN3DIR = $(DESTDIR)$(INSTALLSITEMAN3DIR) INSTALLVENDORMAN3DIR = DESTINSTALLVENDORMAN3DIR = $(DESTDIR)$(INSTALLVENDORMAN3DIR) PERL_LIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3 PERL_ARCHLIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level LIBPERL_A = libperl.a FIRST_MAKEFILE = Makefile MAKEFILE_OLD = Makefile.old MAKE_APERL_FILE = Makefile.aperl PERLMAINCC = $(CC) PERL_INC = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level/CORE PERL = /usr/local/bin/perl FULLPERL = /usr/local/bin/perl ABSPERL = $(PERL) PERLRUN = $(PERL) FULLPERLRUN = $(FULLPERL) ABSPERLRUN = $(ABSPERL) PERLRUNINST = $(PERLRUN) 
"-I$(INST_ARCHLIB)" "-I$(INST_LIB)" FULLPERLRUNINST = $(FULLPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" ABSPERLRUNINST = $(ABSPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" PERL_CORE = 0 PERM_DIR = 755 PERM_RW = 644 PERM_RWX = 755 MAKEMAKER = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils/MakeMaker.pm MM_VERSION = 6.68 MM_REVISION = 66800 # FULLEXT = Pathname for extension directory (eg Foo/Bar/Oracle). # BASEEXT = Basename part of FULLEXT. May be just equal FULLEXT. (eg Oracle) # PARENT_NAME = NAME without BASEEXT and no trailing :: (eg Foo::Bar) # DLBASE = Basename part of dynamic library. May be just equal BASEEXT. MAKE = make FULLEXT = TFBS/Ext/pwmsearch BASEEXT = pwmsearch PARENT_NAME = TFBS::Ext DLBASE = $(BASEEXT) VERSION_FROM = pwmsearch.pm INC = -I. -I./lib DEFINE = OBJECT = $(BASEEXT)$(OBJ_EXT) LDFROM = $(OBJECT) LINKTYPE = dynamic BOOTDEP = # Handy lists of source code files: XS_FILES = pwmsearch.xs C_FILES = pwmsearch.c O_FILES = pwmsearch.o H_FILES = MAN1PODS = MAN3PODS = pwmsearch.pm # Where is the Config information that we are using/depend on CONFIGDEP = $(PERL_ARCHLIB)$(DFSEP)Config.pm $(PERL_INC)$(DFSEP)config.h # Where to build things INST_LIBDIR = $(INST_LIB)/TFBS/Ext INST_ARCHLIBDIR = $(INST_ARCHLIB)/TFBS/Ext INST_AUTODIR = $(INST_LIB)/auto/$(FULLEXT) INST_ARCHAUTODIR = $(INST_ARCHLIB)/auto/$(FULLEXT) INST_STATIC = $(INST_ARCHAUTODIR)/$(BASEEXT)$(LIB_EXT) INST_DYNAMIC = $(INST_ARCHAUTODIR)/$(DLBASE).$(DLEXT) INST_BOOT = $(INST_ARCHAUTODIR)/$(BASEEXT).bs # Extra linker info EXPORT_LIST = PERL_ARCHIVE = PERL_ARCHIVE_AFTER = TO_INST_PM = lib/pwm_search.h \ lib/pwm_searchPFF.c \ pwmsearch.pm PM_TO_BLIB = pwmsearch.pm \ $(INST_LIB)/TFBS/Ext/pwmsearch.pm \ lib/pwm_search.h \ ../blib/lib/pwm_search.h \ lib/pwm_searchPFF.c \ ../blib/lib/pwm_searchPFF.c # --- MakeMaker platform_constants section: MM_Unix_VERSION = 6.68 PERL_MALLOC_DEF = -DPERL_EXTMALLOC_DEF -Dmalloc=Perl_malloc -Dfree=Perl_mfree -Drealloc=Perl_realloc -Dcalloc=Perl_calloc # --- 
MakeMaker tool_autosplit section: # Usage: $(AUTOSPLITFILE) FileToSplit AutoDirToSplitInto AUTOSPLITFILE = $(ABSPERLRUN) -e 'use AutoSplit; autosplit($$$$ARGV[0], $$$$ARGV[1], 0, 1, 1)' -- # --- MakeMaker tool_xsubpp section: XSUBPPDIR = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils XSUBPP = $(XSUBPPDIR)$(DFSEP)xsubpp XSUBPPRUN = $(PERLRUN) $(XSUBPP) XSPROTOARG = XSUBPPDEPS = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils/typemap $(XSUBPP) XSUBPPARGS = -typemap /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils/typemap XSUBPP_EXTRA_ARGS = # --- MakeMaker tools_other section: SHELL = /bin/sh CHMOD = chmod CP = cp MV = mv NOOP = $(TRUE) NOECHO = @ RM_F = rm -f RM_RF = rm -rf TEST_F = test -f TOUCH = touch UMASK_NULL = umask 0 DEV_NULL = > /dev/null 2>&1 MKPATH = $(ABSPERLRUN) -MExtUtils::Command -e 'mkpath' -- EQUALIZE_TIMESTAMP = $(ABSPERLRUN) -MExtUtils::Command -e 'eqtime' -- FALSE = false TRUE = true ECHO = echo ECHO_N = echo -n UNINST = 0 VERBINST = 0 MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e 'install([ from_to => {@ARGV}, verbose => '\''$(VERBINST)'\'', uninstall_shadows => '\''$(UNINST)'\'', dir_mode => '\''$(PERM_DIR)'\'' ]);' -- DOC_INSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'perllocal_install' -- UNINSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'uninstall' -- WARN_IF_OLD_PACKLIST = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'warn_if_old_packlist' -- MACROSTART = MACROEND = USEMAKEFILE = -f FIXIN = $(ABSPERLRUN) -MExtUtils::MY -e 'MY->fixin(shift)' -- # --- MakeMaker makemakerdflt section: makemakerdflt : all $(NOECHO) $(NOOP) # --- MakeMaker dist section skipped. 
# --- MakeMaker macro section: # --- MakeMaker depend section: # --- MakeMaker cflags section: CCFLAGS = -fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include OPTIMIZE = -O3 PERLTYPE = MPOLLUTE = # --- MakeMaker const_loadlibs section: # TFBS::Ext::pwmsearch might depend on some other libraries: # See ExtUtils::Liblist for details # EXTRALIBS = LDLOADLIBS = -lm BSLOADLIBS = # --- MakeMaker const_cccmd section: CCCMD = $(CC) -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) # --- MakeMaker post_constants section: # --- MakeMaker pasthru section: PASTHRU = LIBPERL_A="$(LIBPERL_A)"\ LINKTYPE="$(LINKTYPE)"\ OPTIMIZE="$(OPTIMIZE)"\ PREFIX="$(PREFIX)"\ PASTHRU_DEFINE="$(PASTHRU_DEFINE)"\ PASTHRU_INC="$(PASTHRU_INC)" # --- MakeMaker special_targets section: .SUFFIXES : .xs .c .C .cpp .i .s .cxx .cc $(OBJ_EXT) .PHONY: all config static dynamic test linkext manifest blibdirs clean realclean disttest distdir # --- MakeMaker c_o section: .c.i: cc -E -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c > $*.i .c.s: $(CCCMD) -S $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .c$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .cpp$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cpp .cxx$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cxx .cc$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cc .C$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.C # --- MakeMaker xs_c section: .xs.c: $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $(XSUBPP_EXTRA_ARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c # --- MakeMaker xs_o section: .xs$(OBJ_EXT): $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $*.xs > 
$*.xsc && $(MV) $*.xsc $*.c $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c # --- MakeMaker top_targets section: all :: pure_all manifypods $(NOECHO) $(NOOP) pure_all :: config pm_to_blib subdirs linkext $(NOECHO) $(NOOP) subdirs :: $(MYEXTLIB) $(NOECHO) $(NOOP) config :: $(FIRST_MAKEFILE) blibdirs $(NOECHO) $(NOOP) help : perldoc ExtUtils::MakeMaker # --- MakeMaker blibdirs section: blibdirs : $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists $(NOECHO) $(NOOP) # Backwards compat with 6.18 through 6.25 blibdirs.ts : blibdirs $(NOECHO) $(NOOP) $(INST_LIBDIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_LIBDIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_LIBDIR) $(NOECHO) $(TOUCH) $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHLIB) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHLIB) $(NOECHO) $(TOUCH) $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_AUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_AUTODIR) $(NOECHO) $(TOUCH) $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHAUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHAUTODIR) $(NOECHO) $(TOUCH) $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_BIN) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_BIN) $(NOECHO) $(TOUCH) $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_SCRIPT) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_SCRIPT) $(NOECHO) $(TOUCH) $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN1DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN1DIR) $(NOECHO) $(TOUCH) 
$(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN3DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN3DIR) $(NOECHO) $(TOUCH) $(INST_MAN3DIR)$(DFSEP).exists # --- MakeMaker linkext section: linkext :: $(LINKTYPE) $(NOECHO) $(NOOP) # --- MakeMaker dlsyms section: # --- MakeMaker dynamic section: dynamic :: $(FIRST_MAKEFILE) $(INST_DYNAMIC) $(INST_BOOT) $(NOECHO) $(NOOP) # --- MakeMaker dynamic_bs section: BOOTSTRAP = $(BASEEXT).bs # As Mkbootstrap might not write a file (if none is required) # we use touch to prevent make continually trying to remake it. # The DynaLoader only reads a non-empty file. $(BOOTSTRAP) : $(FIRST_MAKEFILE) $(BOOTDEP) $(INST_ARCHAUTODIR)$(DFSEP).exists $(NOECHO) $(ECHO) "Running Mkbootstrap for $(NAME) ($(BSLOADLIBS))" $(NOECHO) $(PERLRUN) \ "-MExtUtils::Mkbootstrap" \ -e "Mkbootstrap('$(BASEEXT)','$(BSLOADLIBS)');" $(NOECHO) $(TOUCH) $@ $(CHMOD) $(PERM_RW) $@ $(INST_BOOT) : $(BOOTSTRAP) $(INST_ARCHAUTODIR)$(DFSEP).exists $(NOECHO) $(RM_RF) $@ - $(CP) $(BOOTSTRAP) $@ $(CHMOD) $(PERM_RW) $@ # --- MakeMaker dynamic_lib section: # This section creates the dynamically loadable $(INST_DYNAMIC) # from $(OBJECT) and possibly $(MYEXTLIB). ARMAYBE = : OTHERLDFLAGS = INST_DYNAMIC_DEP = INST_DYNAMIC_FIX = $(INST_DYNAMIC): $(OBJECT) $(MYEXTLIB) $(BOOTSTRAP) $(INST_ARCHAUTODIR)$(DFSEP).exists $(EXPORT_LIST) $(PERL_ARCHIVE) $(PERL_ARCHIVE_AFTER) $(INST_DYNAMIC_DEP) $(RM_F) $@ $(LD) $(LDDLFLAGS) $(LDFROM) $(OTHERLDFLAGS) -o $@ $(MYEXTLIB) \ $(PERL_ARCHIVE) $(LDLOADLIBS) $(PERL_ARCHIVE_AFTER) $(EXPORT_LIST) \ $(INST_DYNAMIC_FIX) $(CHMOD) $(PERM_RWX) $@ # --- MakeMaker static section: ## $(INST_PM) has been moved to the all: target. 
## It remains here for awhile to allow for old usage: "make static" static :: $(FIRST_MAKEFILE) $(INST_STATIC) $(NOECHO) $(NOOP) # --- MakeMaker static_lib section: $(INST_STATIC) : $(OBJECT) $(MYEXTLIB) $(INST_ARCHAUTODIR)$(DFSEP).exists $(RM_RF) $@ $(FULL_AR) $(AR_STATIC_ARGS) $@ $(OBJECT) && $(RANLIB) $@ $(CHMOD) $(PERM_RWX) $@ $(NOECHO) $(ECHO) "$(EXTRALIBS)" > $(INST_ARCHAUTODIR)/extralibs.ld # --- MakeMaker manifypods section: POD2MAN_EXE = $(PERLRUN) "-MExtUtils::Command::MM" -e pod2man "--" POD2MAN = $(POD2MAN_EXE) manifypods : pure_all \ pwmsearch.pm $(NOECHO) $(POD2MAN) --section=3 --perm_rw=$(PERM_RW) \ pwmsearch.pm $(INST_MAN3DIR)/TFBS::Ext::pwmsearch.$(MAN3EXT) # --- MakeMaker processPL section: # --- MakeMaker installbin section: # --- MakeMaker subdirs section: # none # --- MakeMaker clean_subdirs section: clean_subdirs : $(NOECHO) $(NOOP) # --- MakeMaker clean section: # Delete temporary files but do not touch installed files. We don't delete # the Makefile here so a later make realclean still has a makefile to use. clean :: clean_subdirs - $(RM_F) \ *$(LIB_EXT) core \ core.[0-9] $(INST_ARCHAUTODIR)/extralibs.all \ core.[0-9][0-9] $(BASEEXT).bso \ pm_to_blib.ts MYMETA.json \ core.[0-9][0-9][0-9][0-9] MYMETA.yml \ $(BASEEXT).x $(BOOTSTRAP) \ perl$(EXE_EXT) tmon.out \ *$(OBJ_EXT) pm_to_blib \ pwmsearch.c $(INST_ARCHAUTODIR)/extralibs.ld \ blibdirs.ts core.[0-9][0-9][0-9][0-9][0-9] \ *perl.core core.*perl.*.? 
\ $(MAKE_APERL_FILE) $(BASEEXT).def \ perl core.[0-9][0-9][0-9] \ mon.out lib$(BASEEXT).def \ perlmain.c perl.exe \ so_locations $(BASEEXT).exp - $(RM_RF) \ blib - $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) $(DEV_NULL) # --- MakeMaker realclean_subdirs section: realclean_subdirs : $(NOECHO) $(NOOP) # --- MakeMaker realclean section: # Delete temporary files (via clean) and also delete dist files realclean purge :: clean realclean_subdirs - $(RM_F) \ $(OBJECT) $(MAKEFILE_OLD) \ $(FIRST_MAKEFILE) - $(RM_RF) \ $(DISTVNAME) # --- MakeMaker metafile section: metafile : create_distdir $(NOECHO) $(ECHO) Generating META.yml $(NOECHO) $(ECHO) '---' > META_new.yml $(NOECHO) $(ECHO) 'abstract: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'author:' >> META_new.yml $(NOECHO) $(ECHO) ' - unknown' >> META_new.yml $(NOECHO) $(ECHO) 'build_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'configure_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'dynamic_config: 1' >> META_new.yml $(NOECHO) $(ECHO) 'generated_by: '\''ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.112621'\''' >> META_new.yml $(NOECHO) $(ECHO) 'license: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'meta-spec:' >> META_new.yml $(NOECHO) $(ECHO) ' url: http://module-build.sourceforge.net/META-spec-v1.4.html' >> META_new.yml $(NOECHO) $(ECHO) ' version: 1.4' >> META_new.yml $(NOECHO) $(ECHO) 'name: TFBS-Ext-pwmsearch' >> META_new.yml $(NOECHO) $(ECHO) 'no_index:' >> META_new.yml $(NOECHO) $(ECHO) ' directory:' >> META_new.yml $(NOECHO) $(ECHO) ' - t' >> META_new.yml $(NOECHO) $(ECHO) ' - inc' >> META_new.yml $(NOECHO) $(ECHO) 'requires: {}' >> META_new.yml $(NOECHO) $(ECHO) 'version: 0.2' >> META_new.yml -$(NOECHO) $(MV) META_new.yml $(DISTVNAME)/META.yml $(NOECHO) $(ECHO) Generating META.json $(NOECHO) $(ECHO) '{' > META_new.json $(NOECHO) $(ECHO) ' "abstract" : "unknown",' >> META_new.json 
$(NOECHO) $(ECHO) ' "author" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "dynamic_config" : 1,' >> META_new.json $(NOECHO) $(ECHO) ' "generated_by" : "ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.112621",' >> META_new.json $(NOECHO) $(ECHO) ' "license" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "meta-spec" : {' >> META_new.json $(NOECHO) $(ECHO) ' "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "2"' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "name" : "TFBS-Ext-pwmsearch",' >> META_new.json $(NOECHO) $(ECHO) ' "no_index" : {' >> META_new.json $(NOECHO) $(ECHO) ' "directory" : [' >> META_new.json $(NOECHO) $(ECHO) ' "t",' >> META_new.json $(NOECHO) $(ECHO) ' "inc"' >> META_new.json $(NOECHO) $(ECHO) ' ]' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "prereqs" : {' >> META_new.json $(NOECHO) $(ECHO) ' "build" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : 0' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "configure" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : 0' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "runtime" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {}' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "release_status" : "stable",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "0.2"' >> META_new.json $(NOECHO) $(ECHO) '}' >> META_new.json -$(NOECHO) $(MV) META_new.json 
$(DISTVNAME)/META.json # --- MakeMaker signature section: signature : cpansign -s # --- MakeMaker dist_basics section skipped. # --- MakeMaker dist_core section skipped. # --- MakeMaker distdir section skipped. # --- MakeMaker dist_test section skipped. # --- MakeMaker dist_ci section skipped. # --- MakeMaker distmeta section: distmeta : create_distdir metafile $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -e q{META.yml};' \ -e 'eval { maniadd({q{META.yml} => q{Module YAML meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.yml to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -f q{META.json};' \ -e 'eval { maniadd({q{META.json} => q{Module JSON meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.json to MANIFEST: $$$${'\''@'\''}\n"' -- # --- MakeMaker distsignature section: distsignature : create_distdir $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'eval { maniadd({q{SIGNATURE} => q{Public-key signature (added by MakeMaker)}}) } ' \ -e ' or print "Could not add SIGNATURE to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(TOUCH) SIGNATURE cd $(DISTVNAME) && cpansign -s # --- MakeMaker install section skipped. # --- MakeMaker force section: # Phony target to force checking subdirectories. 
FORCE : $(NOECHO) $(NOOP) # --- MakeMaker perldepend section: PERL_HDRS = \ $(PERL_INC)/EXTERN.h \ $(PERL_INC)/INTERN.h \ $(PERL_INC)/XSUB.h \ $(PERL_INC)/av.h \ $(PERL_INC)/bitcount.h \ $(PERL_INC)/config.h \ $(PERL_INC)/cop.h \ $(PERL_INC)/cv.h \ $(PERL_INC)/dosish.h \ $(PERL_INC)/embed.h \ $(PERL_INC)/embedvar.h \ $(PERL_INC)/fakesdio.h \ $(PERL_INC)/fakethr.h \ $(PERL_INC)/form.h \ $(PERL_INC)/git_version.h \ $(PERL_INC)/gv.h \ $(PERL_INC)/handy.h \ $(PERL_INC)/hv.h \ $(PERL_INC)/intrpvar.h \ $(PERL_INC)/iperlsys.h \ $(PERL_INC)/keywords.h \ $(PERL_INC)/l1_char_class_tab.h \ $(PERL_INC)/malloc_ctl.h \ $(PERL_INC)/metaconfig.h \ $(PERL_INC)/mg.h \ $(PERL_INC)/mydtrace.h \ $(PERL_INC)/nostdio.h \ $(PERL_INC)/op.h \ $(PERL_INC)/op_reg_common.h \ $(PERL_INC)/opcode.h \ $(PERL_INC)/opnames.h \ $(PERL_INC)/overload.h \ $(PERL_INC)/pad.h \ $(PERL_INC)/parser.h \ $(PERL_INC)/patchlevel.h \ $(PERL_INC)/perl.h \ $(PERL_INC)/perlapi.h \ $(PERL_INC)/perlio.h \ $(PERL_INC)/perliol.h \ $(PERL_INC)/perlsdio.h \ $(PERL_INC)/perlsfio.h \ $(PERL_INC)/perlvars.h \ $(PERL_INC)/perly.h \ $(PERL_INC)/pp.h \ $(PERL_INC)/pp_proto.h \ $(PERL_INC)/proto.h \ $(PERL_INC)/reentr.h \ $(PERL_INC)/regcharclass.h \ $(PERL_INC)/regcomp.h \ $(PERL_INC)/regexp.h \ $(PERL_INC)/regnodes.h \ $(PERL_INC)/scope.h \ $(PERL_INC)/sv.h \ $(PERL_INC)/thread.h \ $(PERL_INC)/time64.h \ $(PERL_INC)/time64_config.h \ $(PERL_INC)/uconfig.h \ $(PERL_INC)/unixish.h \ $(PERL_INC)/utf8.h \ $(PERL_INC)/utfebcdic.h \ $(PERL_INC)/util.h \ $(PERL_INC)/uudmap.h \ $(PERL_INC)/warnings.h $(OBJECT) : $(PERL_HDRS) pwmsearch.c : $(XSUBPPDEPS) # --- MakeMaker makefile section: $(OBJECT) : $(FIRST_MAKEFILE) # We take a very conservative approach here, but it's worth it. # We move Makefile to Makefile.old here to avoid gnu make looping. $(FIRST_MAKEFILE) : Makefile.PL $(CONFIGDEP) $(NOECHO) $(ECHO) "Makefile out-of-date with respect to $?" $(NOECHO) $(ECHO) "Cleaning current config before rebuilding Makefile..." 
-$(NOECHO) $(RM_F) $(MAKEFILE_OLD) -$(NOECHO) $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) - $(MAKE) $(USEMAKEFILE) $(MAKEFILE_OLD) clean $(DEV_NULL) $(PERLRUN) Makefile.PL $(NOECHO) $(ECHO) "==> Your Makefile has been rebuilt. <==" $(NOECHO) $(ECHO) "==> Please rerun the $(MAKE) command. <==" $(FALSE) # --- MakeMaker staticmake section: # --- MakeMaker makeaperl section --- MAP_TARGET = ../perl FULLPERL = /usr/local/bin/perl # --- MakeMaker test section: TEST_VERBOSE=0 TEST_TYPE=test_$(LINKTYPE) TEST_FILE = test.pl TEST_FILES = t/*.t TESTDB_SW = -d testdb :: testdb_$(LINKTYPE) test :: $(TEST_TYPE) subdirs-test subdirs-test :: $(NOECHO) $(NOOP) test_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) test_ : test_dynamic test_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) # --- MakeMaker ppd section: # Creates a PPD (Perl Package Description) for a binary distribution. 
ppd : $(NOECHO) $(ECHO) '' > $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) '' >> $(DISTNAME).ppd # --- MakeMaker pm_to_blib section: pm_to_blib : $(FIRST_MAKEFILE) $(TO_INST_PM) $(NOECHO) $(ABSPERLRUN) -MExtUtils::Install -e 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', q[$(PM_FILTER)], '\''$(PERM_DIR)'\'')' -- \ pwmsearch.pm $(INST_LIB)/TFBS/Ext/pwmsearch.pm \ lib/pwm_search.h ../blib/lib/pwm_search.h \ lib/pwm_searchPFF.c ../blib/lib/pwm_searchPFF.c $(NOECHO) $(TOUCH) pm_to_blib # --- MakeMaker selfdocument section: # --- MakeMaker postamble section: # End. TFBS-0.7.1/Ext/lib/000077500000000000000000000000001305752266700136005ustar00rootroot00000000000000TFBS-0.7.1/Ext/lib/pwm_search.h000066400000000000000000000115741305752266700161110ustar00rootroot00000000000000/*--------------------------------------------------------------- * INCLUDES *---------------------------------------------------------------*/ #include #include /*--------------------------------------------------------------- * DECLARATIONS *---------------------------------------------------------------*/ /* extern double atof(); extern double log2(); extern double sqrt(); extern FILE *fopen(); */ void err_log(), err_show(); /*--------------------------------------------------------------- * DEFINES *---------------------------------------------------------------*/ #define __DEBUG__ 0 /* put debug messages on */ #define FNAMELEN 1000 /* max allowed length of file name */ #define MAX_LINE 200 #define MAXCOUNTS 1000 /* max number of counts in count matrix */ #define MAXERR 100 /* max number of errors that err_log can handle */ #define MAXHITS 1000 #define SEQLEN 1000000 /* max sequence length allowed */ #define SEQNAMELEN MAX_LINE /* max allowed sequence name length */ 
/*--------------------------------------------------------------- * GLOBALS *---------------------------------------------------------------*/ static char PANIC[] = "err_log function failure"; static char *__ERR__[MAXERR]; static int NUM_ERRS=0; static char SQCOMP[] = /* calculate base on complementary strand */ { /* ASCII chars; IUPAC conventions */ /* Control characters unchanged */ '\000','\001','\002','\003','\004','\005','\006','\007', '\010','\011','\012','\013','\014','\015','\016','\017', '\020','\021','\022','\023','\024','\025','\026','\027', '\030','\031','\032','\033','\034','\035','\036','\037', /* Punctuation and digits unchanged */ '\040','\041','\042','\043','\044','\045','\046','\047', '\050','\051','\052','\053','\054','\055','\056','\057', '\060','\061','\062','\063','\064','\065','\066','\067', '\070','\071','\072','\073','\074','\075','\076','\077', /* Capitals go to capitals */ '\100', 'T', 'V', 'G', 'H', '?', '?', 'C', /* @,A-G */ 'D', '?', '?', 'M', '?', 'K', 'N', '?', /* H-O */ '?', '?', 'Y', 'S', 'A', '?', 'B', 'W', /* P-W */ '?', 'R', '?','\133','\134','\135','\136','\137', /* X-Z,etc */ /* Lower case goes to lower case */ '\140', 't', 'v', 'g', 'h', '?', '?', 'c', 'd', '?', '?', 'm', '?', 'k', 'n', '?', '?', '?', 'y', 's', 'a', '?', 'b', 'w', '?', 'r', '?','\173','\174','\175','\176','\177' }; static int TRANS[] = /* translate characters to numbers */ { /* A=0; C=1; G=2; T=3; other = 4 */ /* Control characters */ 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, /* Punctuation and digits */ 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, /* Capitals */ 4,0,4,1,4,4,4,2, /* @,A-G */ 4,4,4,4,4,4,4,4, /* H-O */ 4,4,4,4,3,3,4,4, /* P-W */ 4,4,4,4,4,4,4,4, /* X-Z,etc */ /* Lower case */ 4,0,4,1,4,4,4,2, /* @,A-G */ 4,4,4,4,4,4,4,4, /* H-O */ 4,4,4,4,3,3,4,4, /* P-W */ 4,4,4,4,4,4,4,4 /* X-Z,etc */ }; /*--------------------------------------------------------------- * STRUCTURE DEFINITIONS 
*---------------------------------------------------------------*/ /* ARGUMENTS -- Structure to contain shared arguments */ struct arguments { char counts_file[FNAMELEN+1]; /* file name, count matrix */ char mask_file[FNAMELEN+1]; /* file name, masked seq output, "" means none. */ char seq_file[FNAMELEN+1]; /* file name, sequences */ char name[FNAMELEN+1]; /* TF name */ char class[FNAMELEN+1]; /* TF structural class */ int print_all; /* print scores of all hits */ long best_base; /* base for best score on sequence */ int best_only; /* only show best score on each sequence */ double best_score; /* best score on this sequence */ int best_strand; /* strand for best score on sequence */ double max_score; /* max score possible (implied from pwm) */ double min_score; /* min score possible (implied from pwm) */ double threshold; /* print stuff with log score > max_possible - threshold */ int width; /* pattern width (implied from number of counts) */ }; /* HIT - location and score of a site scoring above threshold */ struct HIT { long base; /* location */ int strand; /* 0 forward, 1 complement */ double score; /* score */ }; TFBS-0.7.1/Ext/lib/pwm_searchPFF.c000066400000000000000000000501451305752266700164350ustar00rootroot00000000000000/*-------------------------------------------------------------------- * BUGS or limitations * mask option not yet implemented. * * Extensions/revisions worth considering * pwm_calc that calculates pwm scores for every position; pipe to * selection programs that pull what I want. *------------------------------------------------------------------*/ /*-------------------------------------------------------------------- * This version is a quick and dirty modification of Wyeth Wasserman's * standalone pwm_searchPFF program. 
* * Boris Lenhard, August 2001 * * Read pwm matrix * Figure maximum and minimum possible scores * Read sequences (fasta format) one at a time, and for each: * Window through the sequence and complement * * Find all occurrences of pattern with * matrix score > threshold * * If -a flag is set just print all the values, otherwise: * * If -b flag is not set, * For each find, show seq name, location, find, score * otherwise * just show the best hit for this sequence * If "-m" option is set, write out all input sequences to * filename given, with finds replaced by 'n's. * * Exit: 0 for success, -1 otherwise. *------------------------------------------------------------------*/ #include "pwm_search.h" int do_search(char* matrixfile, char* seqfile, float threshold, char* tfname, char* tfclass, char* outfile) /*was: main int argc; char **argv;*/ { double pwm[2*MAXCOUNTS]; /* for pwm matrix */ /* do own indexing; 5*pos + nt */ int exitval = -1; /* exit value from main */ struct arguments args; /* command line args */ FILE *fp; /* for sequence input file */ FILE *outfp; NUM_ERRS = 0; if (__DEBUG__) fprintf(stderr, "%s %s %f %s %s %s\n", matrixfile, seqfile, threshold, tfname, tfclass, outfile); if ( __DEBUG__ ) announce("+++\nEntering main.\n+++\n"); /* Parse command line arguments */ /*if ( get_cmd_args(argc,argv,&args) ) { err_log( "Usage: pwm_searchPFF pwm_file seq_file threshold [-a][-b]|[-m mask_file] [-n TFname] [-c TFclass]\n" ); }*/ strcpy(args.counts_file, matrixfile); strcpy(args.seq_file, seqfile); args.threshold = threshold; strcpy(args.name, tfname); strcpy(args.class, tfclass); args.print_all = 0; args.best_only= 0; /* Read in the pwm; calculate max/min score */ /* else */ if ( get_matrix(&args,pwm) ) { err_log("MAIN: get_matrix failed."); } /* Open the sequence file */ else if ( (fp=fopen(args.seq_file,"r")) == NULL ) { err_log("MAIN: open_seq_file failed."); } else if ( (outfp=fopen(outfile,"w")) == NULL ) { err_log("MAIN: open_outfile failed."); } /* Loop on 
sequences */ else if ( loop_on_seqs(&args,pwm,fp,outfp) ) { err_log("MAIN: loop_on_seqs failed."); } /* Normal completion */ else { exitval = 0; } /* Clean up and close out */ err_show(); fclose(fp); fclose(outfp); if ( __DEBUG__ ) announce("+++\nLeaving main.\n+++\n"); return(exitval); } /*-------------------------------------------------------------------- * Announce * * Print a debugging message * * Returns 0 *------------------------------------------------------------------*/ int announce(msg) char *msg; { int retval = 0; fprintf(stderr,"%s",msg); /* never pass msg as the format string */ return(retval); } /*-------------------------------------------------------------------- * BEST_SAVE - Save the best score so far * * Called by do_seq * * Returns: 0 *------------------------------------------------------------------*/ int best_save(struct arguments* pargs, long base, int strand, double score) /* pargs: args from command line; base: base where score occurs; strand: strand where score occurs; score: score of hit to save */ { if ( pargs->best_base < 0 || score > pargs->best_score ) { pargs->best_base = base; pargs->best_score = score; pargs->best_strand = strand; } return(0); } /*-------------------------------------------------------------------- * BEST_PULL - Copy back the best score saved * * Called by do_seq * * Returns: 0 *------------------------------------------------------------------*/ int best_pull(pargs,pbase,pstrand,pscore) struct arguments *pargs; /* args from command line */ long *pbase; /* base where score occurs */ int *pstrand; /* strand where score occurs */ double *pscore; /* score of hit to pull back */ { *pbase = pargs->best_base; if ( pargs->best_base >= 0 ) { *pscore = pargs->best_score; *pstrand = pargs->best_strand; } return(0); } /*-------------------------------------------------------------------- * DO_SEQ - Search through the given sequence with the given matrix * * Called by loop_on_seqs * * Returns: 0 for success, -1 for 
failure. *------------------------------------------------------------------*/ int do_seq(pargs,pwm,seqid,seq,outfp) struct arguments *pargs; /* args from command line */ double *pwm; /* pwm from get_matrix */ char *seqid; /* id of sequence to work on */ char *seq; /* the sequence to work on */ FILE *outfp; { double backward_score; double forward_score; double score; long base; int done = 0; int nt; int pos; int retval = 0; int strand; long l; long nhit=0L; struct HIT hits[MAXHITS]; if ( __DEBUG__ ) announce("+++\nEntering do_seq.\n+++\n"); /* first make sure sequence is long enough */ for ( base=0; base < pargs->width; ++base ) { if ( seq[base] == '\0' ) done = 1; } /* loop on windows */ pargs->best_base = -1; for ( base=0; !retval && !done && seq[base+pargs->width-1]; ++base ) { forward_score = 0.0; backward_score = 0.0; for ( pos=0; pos<pargs->width; ++pos ) { nt = TRANS[seq[base+pos]]; forward_score += pwm[5*pos + nt]; nt = ( nt==4 ) ? 4 : 3-nt; backward_score += pwm[5*(pargs->width - pos -1) + nt]; } if ( forward_score > pargs->threshold ) { if ( pargs->print_all ) { if ( save_hit(base,0,forward_score,hits,&nhit) ) { err_log("DO_SEQ: save_hit failed"); retval = -1; } } else if ( pargs->best_only ) { best_save(pargs,base,0,forward_score); } else if ( output(pargs,seqid,base,seq,0,forward_score,outfp) ) { err_log("DO_SEQ: output failed"); retval = -1; } } if ( backward_score > pargs->threshold ) { if ( pargs->print_all ) { if ( save_hit(base,1,backward_score,hits,&nhit) ) { err_log("DO_SEQ: save_hit failed"); retval = -1; } } else if ( pargs->best_only ) { best_save(pargs,base,1,backward_score); } else if ( output(pargs,seqid,base,seq,1,backward_score, outfp) ) { err_log("DO_SEQ: output failed"); retval = -1; } } } if ( pargs->print_all ) { for ( l=0; l=0 ) { if ( output(pargs,seqid,base,seq,strand,score,outfp) ) { err_log("DO_SEQ: output failed"); retval = -1; } } } if ( __DEBUG__ ) announce("+++\nLeaving do_seq.\n+++\n"); return(retval); } 
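/* Illustration only, not part of the original pwm_searchPFF.c: a minimal sketch of the per-window scoring that do_seq performs, assuming the same pwm layout (index 5*pos + nt, with A=0, C=1, G=2, T=3, other=4). The names base_index and score_window are hypothetical, and base_index is a simplified stand-in for the TRANS table (it does not map 'U', for example). */

```c
#include <assert.h>

/* Hypothetical helper: map a nucleotide to 0..3 (A, C, G, T) or 4 otherwise. */
int base_index(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return 4;
    }
}

/* Score one window of `width` bases on both strands.
 * pwm is laid out as 5 values per position: A, C, G, T, other.
 * The backward score reads the matrix from its last column to its first
 * while complementing each base, which is equivalent to scoring the
 * reverse complement of the window against the matrix. */
void score_window(const double *pwm, int width, const char *window,
                  double *forward, double *backward)
{
    int pos;

    assert(width > 0);
    *forward = 0.0;
    *backward = 0.0;
    for (pos = 0; pos < width; ++pos) {
        int nt = base_index(window[pos]);
        *forward += pwm[5 * pos + nt];
        nt = (nt == 4) ? 4 : 3 - nt;   /* complement: A<->T, C<->G */
        *backward += pwm[5 * (width - pos - 1) + nt];
    }
}
```

/* With a two-column matrix that rewards "AC", the window "AC" scores highest on the forward strand and its reverse complement "GT" scores the same value on the backward strand. A caller would compare each score against the threshold, as do_seq does. */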
/*********************************************************************** * ERR_LOG and ERR_SHOW * * A pair of functions for saving up and then printing error messages. * err_log stores away an error message each time it is called. When * err_show is called it prints all the messages saved up so far. * * Neither function returns a value **********************************************************************/ void err_log(msg) char *msg; { if ( __DEBUG__ ) announce("+++\nEntering err_log\n+++\n"); NUM_ERRS++; if ( (__ERR__[NUM_ERRS-1] = (char *) malloc( 1+strlen(msg) ) ) == NULL ) __ERR__[NUM_ERRS - 1] = PANIC; else strcpy( __ERR__[NUM_ERRS - 1],msg ); if ( __DEBUG__ ) announce("+++\nLeaving err_log\n+++\n"); return; } void err_show() { int err_num; for ( err_num=0; err_numcounts_file,argv[1]); strcpy(pargs->seq_file,argv[2]); pargs->threshold = atof(argv[3]); pargs->best_only = 0; pargs->print_all = 0; pargs->mask_file[0] = '\0'; while (arg_count < argc) { if ( argv[arg_count][0]=='-' && argv[arg_count][1]=='b' ) { pargs->best_only = 1; arg_count++; } else if ( argv[arg_count][0]=='-' && argv[arg_count][1]=='a' ) { pargs->print_all = 1; arg_count++; } else if ( arg_countmask_file,argv[arg_count+1]); arg_count = arg_count+2; } else if ( arg_countname,argv[arg_count+1]); arg_count = arg_count+2; } else if ( arg_countclass,argv[arg_count+1]); arg_count = arg_count+2; } else { arg_count++; } } } if ( __DEBUG__ ) announce("+++\nLeaving get_cmd_args\n+++\n"); return( retval ); } /*-------------------------------------------------------------------- * GET_MATRIX - Read in pwm. * * Called by main. * * Returns: 0 for success, -1 for failure. 
*------------------------------------------------------------------*/ int get_matrix(struct arguments* pargs, double* pwm) /* struct arguments *pargs; args from command line double *pwm; array for pwm */ /* do own indexing; 5*pos + nt */ { double counts[2*MAXCOUNTS]; double max_log; double min_log; double scratch[1+MAXCOUNTS]; int done = 0; int nt; int num_counts; int pos; int retval=0; FILE *fp; /* stream for counts file */ if ( __DEBUG__ ) announce("+++\nEntering get_matrix\n+++\n"); /* Open the file */ if ( (fp=fopen(pargs->counts_file,"r")) == NULL ) { err_log("GET_MATRIX: could not open specified file."); retval = -1; } /* Read in the real numbers without regard to dimension */ else { for ( num_counts=0; !done && num_countswidth = num_counts/4; for ( pos=0; pos<pargs->width; ++pos ) { for ( nt=0; nt<4; ++nt ) { pwm[5*pos + nt] = scratch[(pargs->width)*nt + pos]; } pwm[5*pos + 4] = (pwm[5*pos + 0] + pwm[5*pos + 1] + pwm[5*pos + 2] + pwm[5*pos + 3] ) / 4; } /* Next the extreme scores */ pargs->max_score = 0; pargs->min_score = 0; for ( pos=0; pos<pargs->width; ++pos ) { max_log = -10.0; min_log = 10.0; for ( nt=0; nt<4; ++nt ) { max_log = ( max_log>pwm[5*pos+nt] ) ? max_log : pwm[5*pos+nt]; min_log = ( min_log<pwm[5*pos+nt] ) ? min_log : pwm[5*pos+nt]; } pargs->max_score += max_log; pargs->min_score += min_log; } } if ( __DEBUG__ ) announce("+++\nLeaving get_matrix\n+++\n"); return (retval); } /*-------------------------------------------------------------------- * GET_SEQUENCE * * Get the next sequence from the input file (fasta format) * * Called by loop_on_seqs. * * Return 0 normally, -1 on error, 1 if called at EOF.
*------------------------------------------------------------------*/ get_sequence(fp,seq_id,sequence) FILE *fp; /* file to read */ char *seq_id; /* name of sequence */ char *sequence; /* text of sequence */ { char msg[2*MAX_LINE]; int c; int done=0; int position; int retval = 0; int word = 0; int count = 0; long base = 0L; char line[MAX_LINE]; // was static int at_eof = 0; // was static int first_time=1; // was static if ( __DEBUG__ ) { announce("+++\nEntering Get_sequence\n+++\n"); } if ( first_time ) { first_time=0; if ( fgets(line,MAX_LINE,fp)==NULL ) { at_eof = 1; } } if ( at_eof ) /* this time or last time */ { retval = 1; } /* At this point, line should always be the first line of an entry */ /* Pull out the id */ if ( !retval ) { strcpy(seq_id,line+1); seq_id[ strlen(seq_id) -1 ] = '\0'; while (count < strlen(seq_id) && !word) { if (seq_id[count] == ' ') { word++; seq_id[count]= '\0'; } count++; } } /* Read in the sequence */ while ( !retval && !done ) { if ( __DEBUG__ ) { announce("+++\nReading in...\n+++\n"); } if ( fgets(line,MAX_LINE,fp) == NULL ) { at_eof = 1; done = 1; } else if ( line[0] == '>' ) { done = 1; } else { for ( position=0; !retval && line[position]!='\0'; ++position) { c = line[position]; if ( !isdigit( c ) && !isspace( c ) ) { if ( base >= SEQLEN ) { err_log("GET_SEQUENCE: Sequence too long."); retval = -1; } else { sequence[base++] = c; } } } } } sequence[base] = '\0'; if ( __DEBUG__ ) { announce("+++\nLeaving Get_sequence\n+++\n"); sprintf(msg,"seq_id=%s\nlength=%ld\n", seq_id, base ); announce(msg); } return(retval); } /*-------------------------------------------------------------------- * LOOP_ON_SEQS - Loop through the sequences of the input file, * doing the search and output. * * Called by main. * * Returns: 0 for success, -1 for failure. 
*------------------------------------------------------------------*/ int loop_on_seqs(pargs,pwm,fp, outfp) struct arguments *pargs; /* args from command line */ double *pwm; /* pwm, from get_matrix */ FILE *fp; /* sequence file pointer */ FILE *outfp; /* output file pointer */ { char seq[SEQLEN+1]; char seqid[SEQNAMELEN+1]; int done = 0; int retval=0; if ( __DEBUG__ ) announce("+++\nEntering loop_on_seqs\n+++\n"); /* Main loop */ while ( !retval && !done ) { done = get_sequence(fp,seqid,seq); if ( done == -1 ) { err_log("LOOP_ON_SEQS: get_sequence failed."); retval = -1; } else if ( done == 0 ) { if ( do_seq(pargs,pwm,seqid,seq,outfp) ) { err_log("LOOP_ON_SEQS: do_seq failed."); retval = -1; } } } if ( __DEBUG__ ) announce("+++\nLeaving loop_on_seqs\n+++\n"); return (retval); } /*-------------------------------------------------------------------- * MARK - write "width" dashes, to mark strand * * Called by output. * * Returns: 0 for success, -1 for failure. *------------------------------------------------------------------*/ int mark(width) int width; { int pos; for ( pos=0; posmin_score)/(pargs->max_score - pargs->min_score), # pargs->min_score, # pargs->max_score); # printf("\n%ld\n",base+1); */ fprintf(outfp, "%s\tTFBS\t%s\t%s\t",seqid,pargs->name,pargs->class); if (strand) { fprintf(outfp, "-\t"); /* FIXED BY BORIS : 1 is for "-" strand */ } else fprintf(outfp, "+\t"); /* FIXED BY BORIS : 0 is for "+" strand */ fprintf(outfp, "%6.3f\t%6.1f\t", score, 100*(score - pargs->min_score)/(pargs->max_score - pargs->min_score)); fprintf(outfp, "%ld\t%ld\t",base+1,base+pargs->width); for ( pos=0; pos<pargs->width; ++pos ) { putc(seq[base+pos], outfp); } putc('\n', outfp); /* #endif */ if ( __DEBUG__ ) announce("+++\nLeaving output\n+++\n"); return( retval ); } /*-------------------------------------------------------------------- * SAVE_HIT - save location, strand and score of a hit in an array of such * * Called by do_seq. * * Returns: 0 for success, -1 for failure.
*------------------------------------------------------------------*/ int save_hit(base,strand,score,hits,pnhit) long base; int strand; double score; struct HIT *hits; long *pnhit; { int retval = 0; if ( *pnhit >= MAXHITS ) { /* array is full: report it and do not write past the end of hits[] */ err_log("SAVE_HIT: MAXHITS limit reached."); retval = -1; } else { hits[*pnhit].base = base; hits[*pnhit].strand = strand; hits[*pnhit].score = score; *pnhit = *pnhit + 1; } return(retval); } TFBS-0.7.1/Ext/pm_to_blib000066400000000000000000000000001305752266700150510ustar00rootroot00000000000000TFBS-0.7.1/Ext/pwmsearch.bs000066400000000000000000000000001305752266700153370ustar00rootroot00000000000000TFBS-0.7.1/Ext/pwmsearch.c000066400000000000000000000064561305752266700152020ustar00rootroot00000000000000/* * This file was generated automatically by ExtUtils::ParseXS version 2.21 from the * contents of pwmsearch.xs. Do not edit this file, edit pwmsearch.xs instead. * * ANY CHANGES MADE HERE WILL BE LOST! * */ #line 1 "pwmsearch.xs" #include "EXTERN.h" #include "perl.h" #include "XSUB.h" #include "pwm_searchPFF.c" #include #line 18 "pwmsearch.c" #ifndef PERL_UNUSED_VAR # define PERL_UNUSED_VAR(var) if (0) var = var #endif #ifndef PERL_ARGS_ASSERT_CROAK_XS_USAGE #define PERL_ARGS_ASSERT_CROAK_XS_USAGE assert(cv); assert(params) /* prototype to pass -Wmissing-prototypes */ STATIC void S_croak_xs_usage(pTHX_ const CV *const cv, const char *const params); STATIC void S_croak_xs_usage(pTHX_ const CV *const cv, const char *const params) { const GV *const gv = CvGV(cv); PERL_ARGS_ASSERT_CROAK_XS_USAGE; if (gv) { const char *const gvname = GvNAME(gv); const HV *const stash = GvSTASH(gv); const char *const hvname = stash ? HvNAME(stash) : NULL; if (hvname) Perl_croak(aTHX_ "Usage: %s::%s(%s)", hvname, gvname, params); else Perl_croak(aTHX_ "Usage: %s(%s)", gvname, params); } else { /* Pants. I don't think that it should be possible to get here.
*/ Perl_croak(aTHX_ "Usage: CODE(0x%"UVxf")(%s)", PTR2UV(cv), params); } } #undef PERL_ARGS_ASSERT_CROAK_XS_USAGE #ifdef PERL_IMPLICIT_CONTEXT #define croak_xs_usage(a,b) S_croak_xs_usage(aTHX_ a,b) #else #define croak_xs_usage S_croak_xs_usage #endif #endif /* NOTE: the prototype of newXSproto() is different in versions of perls, * so we define a portable version of newXSproto() */ #ifdef newXS_flags #define newXSproto_portable(name, c_impl, file, proto) newXS_flags(name, c_impl, file, proto, 0) #else #define newXSproto_portable(name, c_impl, file, proto) (PL_Sv=(SV*)newXS(name, c_impl, file), sv_setpv(PL_Sv, proto), (CV*)PL_Sv) #endif /* !defined(newXS_flags) */ #line 70 "pwmsearch.c" XS(XS_TFBS__Ext__pwmsearch_search_xs); /* prototype to pass -Wmissing-prototypes */ XS(XS_TFBS__Ext__pwmsearch_search_xs) { #ifdef dVAR dVAR; dXSARGS; #else dXSARGS; #endif if (items != 6) croak_xs_usage(cv, "matrixfile, seqfile, threshold, tfname, tfclass, outfile"); { char* matrixfile = (char *)SvPV_nolen(ST(0)); char* seqfile = (char *)SvPV_nolen(ST(1)); double threshold = (double)SvNV(ST(2)); char* tfname = (char *)SvPV_nolen(ST(3)); char* tfclass = (char *)SvPV_nolen(ST(4)); char* outfile = (char *)SvPV_nolen(ST(5)); int RETVAL; dXSTARG; #line 18 "pwmsearch.xs" do_search(matrixfile, seqfile, threshold, tfname, tfclass, outfile); #line 93 "pwmsearch.c" } XSRETURN(1); } #ifdef __cplusplus extern "C" #endif XS(boot_TFBS__Ext__pwmsearch); /* prototype to pass -Wmissing-prototypes */ XS(boot_TFBS__Ext__pwmsearch) { #ifdef dVAR dVAR; dXSARGS; #else dXSARGS; #endif #if (PERL_REVISION == 5 && PERL_VERSION < 9) char* file = __FILE__; #else const char* file = __FILE__; #endif PERL_UNUSED_VAR(cv); /* -W */ PERL_UNUSED_VAR(items); /* -W */ XS_VERSION_BOOTCHECK ; (void)newXS("TFBS::Ext::pwmsearch::search_xs", XS_TFBS__Ext__pwmsearch_search_xs, file); #if (PERL_REVISION == 5 && PERL_VERSION >= 9) if (PL_unitcheckav) call_list(PL_scopestack_ix, PL_unitcheckav); #endif XSRETURN_YES; } 
TFBS-0.7.1/Ext/pwmsearch.pm000066400000000000000000000073301305752266700153640ustar00rootroot00000000000000
package TFBS::Ext::pwmsearch;

require 5.005_62;
use strict;
use warnings;
use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAGS $VERSION);
use Bio::SeqIO;
use File::Temp qw (:POSIX);

require Exporter;
require DynaLoader;

our @ISA = qw(Exporter DynaLoader);

# Items to export into callers namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.

# This allows declaration use TFBS::Ext::pwmsearch ':all';
# If you do not need this, moving things directly into @EXPORT or @EXPORT_OK
# will save memory.
%EXPORT_TAGS = ( 'all' => [ qw( ) ] );

@EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

@EXPORT = qw( );
$VERSION = '0.2';

bootstrap TFBS::Ext::pwmsearch $VERSION;

# Preloaded methods go here.

sub pwmsearch {
    my ($matrixobj, $seqobj, $threshold, $start, $end) = @_;
    $start = 1 if !defined($start);
    $end = $seqobj->length if !defined($end);

    my $matrixfile = tmpnam();
    open (MATRIX, ">$matrixfile") or die ("Error opening temporary file.");
    print MATRIX $matrixobj->rawprint();
    close MATRIX;
    my $outfile = tmpnam();

    # pwm_search is confused by long descriptions - we delete desc temporarily:
    my $save_desc = $seqobj->desc();
    $seqobj->desc("");
    my $seqfile;
    if ($seqobj->{_fastafile}) {
        $seqfile = $seqobj->{_fastafile};
    }
    else {
        $seqfile = tmpnam();
        my $outstream = Bio::SeqIO->new(-file=>">$seqfile", -format=>"fasta");
        $outstream->write_seq(Bio::Seq->new(-seq =>$seqobj->subseq($start, $end),
                                            -id  =>$seqobj->id));
        $outstream->close();
    }
    $seqobj->desc($save_desc);

    # calculate threshold
    if ($threshold) {
        if ($threshold =~ /(.+)%/) {
            # percentage
            $threshold = $matrixobj->{min_score} +
                ($matrixobj->{max_score} - $matrixobj->{min_score})* $1/100;
        }
        else {
            # absolute value
            # $threshold = $args{-threshold};
        }
    }
    else {
        # no threshold given
        $threshold = $matrixobj->{min_score} -1;
    }

    search_xs($matrixfile, $seqfile, $threshold,
              $matrixobj->name()."", $matrixobj->{'class'}."", $outfile);

    unlink $seqfile unless $seqobj->{'_fastafile'};
    unlink $matrixfile;

    my $hitlist = TFBS::SiteSet->new();
    my ($TFname, $TFclass) = ($matrixobj->{name}, $matrixobj->{class});

    my $save_delim = $/;   # bugfix submitted
    local $/ = "\n";       # by Michal Lapidot
    open (OUTFILE, $outfile) or die("Could not read temporary outfile");
    while (my $line = <OUTFILE>) {
        # print STDERR $line;
        chomp $line;
        $line =~ s/^\s+//;
        $line =~ s/ *\t */\t/g;
        my ($seq_id, $factor, $class, $strand, $score, $pos, $siteseq) =
            (split /\t/, $line)[0, 2, 3, 4, 5, 7, 9];
        my $num_strand = ($strand eq "-")? "-1" : "1";
        my $site = TFBS::Site->new ( -seq_id  => $seqobj->display_id()."",
                                     -seqobj  => $seqobj,
                                     -strand  => $num_strand."",
                                     -pattern => $matrixobj,
                                     -siteseq => $siteseq."",
                                     -score   => $score."",
                                     -start   => $pos +$start -1,
                                     -end     => $pos +$start +length($siteseq) -2 );
        $hitlist->add_site($site);
    }
    close OUTFILE;
    $/ = $save_delim;
    unlink $outfile;
    return $hitlist;
}

1;
__END__

=head1 NAME

TFBS::Ext::pwmsearch - Perl extension for scanning a DNA sequence object with a position weight matrix

=head1 SYNOPSIS

    use TFBS::Ext::pwmsearch;
    my $siteset = TFBS::Ext::pwmsearch::pwmsearch($pwm, $seqobj, "60%");

=head1 DESCRIPTION

C<pwmsearch($matrixobj, $seqobj, $threshold, $start, $end)> scans a sequence
object with a matrix object and returns the hits as a TFBS::SiteSet object.
The matrix is written to a temporary file, and so is the sequence unless the
sequence object carries a C<_fastafile> entry; both files are then passed to
the compiled C routine C<search_xs>.

The threshold is either an absolute score or a percentage string such as
"60%", which is converted to
min_score + (max_score - min_score) * percent / 100.
If no threshold is given, it is set to the matrix's minimum score minus 1.
C<$start> and C<$end> default to 1 and the length of the sequence.

=head2 EXPORT

None by default.

=head1 AUTHOR

A. U. Thor, a.u.thor@a.galaxy.far.far.away

=head1 SEE ALSO

perl(1).

=cut TFBS-0.7.1/Ext/pwmsearch.xs000066400000000000000000000006711305752266700154030ustar00rootroot00000000000000#include "EXTERN.h" #include "perl.h" #include "XSUB.h" #include "pwm_searchPFF.c" #include MODULE = TFBS::Ext::pwmsearch PACKAGE = TFBS::Ext::pwmsearch int search_xs (matrixfile, seqfile, threshold, tfname, tfclass, outfile) char* matrixfile; char* seqfile; double threshold; char* tfname; char* tfclass; char* outfile; CODE: do_search(matrixfile, seqfile, threshold, tfname, tfclass, outfile); TFBS-0.7.1/Ext/t/000077500000000000000000000000001305752266700132755ustar00rootroot00000000000000TFBS-0.7.1/Ext/t/pwmsearch.t000066400000000000000000000014511305752266700154540ustar00rootroot00000000000000use Test; use TFBS::Ext::pwmsearch; use TFBS::Matrix::PFM; plan(tests=>2); my $matrixstring = "0 0 0 0 0 0 0 0\n". "0 12 12 0 12 0 12 12\n". "0 0 0 12 0 12 0 0\n". "12 0 0 0 0 0 0 0"; my $pfm = TFBS::Matrix::PFM->new(-matrix=>$matrixstring, -name=>"MyMatrix"); my $pwm = $pfm->to_PWM; my $seq = Bio::SeqIO->new(-file=>"t/test.fa", -format=>"fasta")->next_seq(); my $siteset1 = TFBS::Ext::pwmsearch::pwmsearch($pwm, $seq, "60%"); my $siteset2 = TFBS::Ext::pwmsearch::pwmsearch($pwm, $seq, "60%"); ok($siteset1->size(), 194); #print STDERR "SIZE::".$siteset1->size()."\n"; my $it = $siteset2->Iterator(); my $startsum = 0; while (my $site = $it->next()) { $startsum += $site->start; } #print STDERR "STARTSUM::".$startsum."\n"; ok($startsum, 457608); TFBS-0.7.1/Ext/t/test.fa000066400000000000000000000117471305752266700145760ustar00rootroot00000000000000>AP000365 GTCGCCCAGGCTGGAGTACAATGGCGCAATCTCGGCTCACCCTCGGCTCACCACAGCCTC TGCCTCCCGGGTTCAAGCAATTCTCTTGCCTCAGCCTCCTGAGTAGCTGGGACTGAGTAG CCATGTGCCACCATGCCCGGCTAATTTTGTGTTTTTAGTAGAGACAGGGTTTCTCCATGT TAGTCAGGCTGGTCTCAAACTCCTGACCTCAGGGGATCCACCCGCCTCGGCCTCCCAAAA GTGCTGGGATTACAGGCGTGTGCCACTGTGCCTGGTCTGTGAGCCACTGTGCCCGGCCTG AGAAATGTTTCTTTTTTTCTTTCTTTTTTTTTTTTTAAGCAGAAACACATTCATTTATTA 
ACCAAAGGGATGATCCTAATGAATCCAACACACTTTGAAATAGCTGCATGTAAAATGTTT GTGATAAAGATAATTGAACACAGTAATGAAAAAAAAAAAAGAAAGAAAGAAACGGTATGG AGATTTGCTCATTGAACTGAGCTTGGTCATTCTCTTAGTTAACTCCTGTCCAAAGTGATG ATGGAATCTTTATTGTACTTTTTCATAGATCCGAGTACAGGCGACATGGTTCATGACACA GTCCACCACTAATTTCCCATCTTTCAATGTTCTTGTTATTGTGCTTTCCTTCCCATCCCA CTCCTGATGCTGAACCAATGCACCATCTGTAAAGTTGCACACAGTCTGAGTTTTTCTGCC ATCAGCTGTGGTTTCTTCAAACTTCTCTCCCAGGGTACAAGAAAACTGTGTTGTTTTCAA AGTGCTCTCAGTTTTTATGGTGAGGTTTTTGCCATCACAAGTGATGATACAATCTGGCTT GGCCATTGCGCCCATTTTTTGCAAAGCTATTTCCTCCTAGCTCCTTCATGTATTCATCAA AGCCTTCGCTGTCCACCAGGCGCCATCTTCCTTCCAGCTGCTGAACTGTGGCCATGGTGG GTGCAGGGGGGCTGGTGTGCAGAGCAGGGTCTGCGTCGGCGTGGCAGCGTGCTGTCGAGA AATGTTTCTAAGGAGATCTTATTTGGTCTGAGAACCATGAATGATTATTTTGAGCACTTT TGATTCTGGAGACTCCATTTGGATCAGGCATGGTCCTCCAAATTCAGGCTTCTGAAAGCC TGTACCTCAGAGTAGGCTTGATGTTCCATAAAAGATGTGGTTATGAGTGCAAAGATGACT TGCCTGTATTGTTATACAAATGTAAAATGTAACAATCAACAAAAATGTAGCAAAGTATGC ATGTATACATTTTCTCTAAAGATACAGTTTCTTTTTTGAAAAAATAAACACATTAGGCAG GTGTGATGGCGGGTGCCTGTTATCCCAGCTACTCCGGAGGCTAAGGCACGAGAATCTCTT GAACCTGGGAGGTGGACAAATTGCAGTGAGCCAAGATTGCGCCACTATACTCCAGCCTGG GCAATAGAGCGAGACTCAGTCTCAAAAAATAAATAAATAAATAAATAAATAAATAAATAA ATAAAATAAACACTACCGGCCAGTGGCCATGGCTCGAGCCTATAATCCCAGCACTTTGGG AGGCCTGAGCCAGGTGGAGTTCAGGCATTCAAGACCAGCTTGGGCAATATGACAAGACCC CTGTCTCTACTAAAAATACAAAACAATAGCCGGCCGTGGTGGTGTGTGCCTGTAGTCAGC TGCTTGGGAGGCTGAGGTGGGAGGATTGCTTGAGCCCTGAAGGTGGAAGTTGCAGTGAGC TGAGATAGTGCCATTGCACTCCAGCCTGGGTGACAGAGTGAGACCCTGTCTCAAAAAATA AAATAAAATAAACACTCCTATAAAGGATCCTCTTAGCTCTTTTTCTAACACCTAATCTAC ATTTTCATATTCATTTCAGTTACCCTACAACTGTTCACTGAGCTGCTGTTGAATAGGGGA AATAAGGCAGATAACTACTGCCATCTCCGCTGGAGGGACGATACAGACATTAATCTGGGC ACTTTGATTACAGGCAATGAGAGCTGTGAGTGGGGAAAGCACAAGGTTGGCAGAAGCATT TAGGGGGACACAGCCATTCTCACGGAGGGCAGAGGTCTAAAGCAAGAGCTGAATAAAAAG TAGGAACTGGCCTCGTGGAAAGGGGAAGGGTGATGGGACAGCCTGGTGGTTTGTAGCCCA CTGGAAGGAGTTCTGAAAACTGGTGGTCAGGTGAGAAGGAAAGCTGGGGAAGAGATGAGC ACGTTCGCCAGAGGGTAGCAGGGGCTCTCCGGACCTAGTGAGTCAAGCCAAGGAATTAAG 
GCTTCAGCCTGCAGGGTGATGAATAGGGCTGTCTATTCCATTTCTTCCTTCTTTCTTTCT TTTCTTTCTTTTTTTGAGACAGCGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGCAC GATCCTGGCTCACTGCAACCTCTGCCTCCCTGATTCAAGCAATTCTCCTGCTTCAGCCTC CAGAATAGCCGGGATTACGGGTGCCTGCTACCACGCCTGGCTAATTTTGTATTTTTAGTA GAGGCGAGGTTTCACCATGTTGGTCAGGCTGGTCTCGAACTCCTGACCTCAAGTGATCTG CCTACCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTAAACCACCGTGCCTGGCCTGAA AATTTCTAGTTTATGATACTTGCCAGCAGAATGTGTTCTGTCACCCTCTTCTGAATAGAT ATGGTTGTCTGCTATGACTTCTCCCACTGCTGCCCTTCCCCCTGAATCCACAGATGCATT TCTTTTAAAACTATGATCTTGTACACAATGGATGTAAATATTTAATCTTTCTATTTGTAT GTTTTTCCATGTTTCTTTTCTTTCTTTCTCTTTTTTTTTTTTTTTTTTTTTTTTTTGGAG GTGGTGTCTGCCTCTATTGCCCACAGGCTGGAGTGCACTGGTACAATCTCGGCTCACTGC ACCCTCCGCCTCCTAGGTTCAAGGGATTCTGCTGCCTGAGCCTCCTGAGTAGCTGGGACT ACAGGTGTGCACCACCACGCCCGGCTAGTTTTTATATTTTTAACAGAGACAGGGTTTCAC CATATTGGCCAGGCTGGTCTCGAACTCCTGACCTCGTGATCCTCTCACCTCGTCCTCCCA AAGTGCTGGGATTACAGGCATGAGCCACCGTGCCCGGCCTCCATGTTTATTTTCTAGTTG CTTACTTGTCCTTTTGTGTTTATCCTTGTTAACTACTACTGCCAGGCTTAAAGTATAGAC CCCTAGAGGGCAAGATTTGTATCTATATAAAATGTACTGCAAAACATCTACTTAAGCCTC ACATTCTTAAACACAAATTACTTTTGAAGATGACTGTTCTGTTTGTTTCCTTCCTGGTTT CTTCCTTTAACTTTTCCACCAAACAGGTACATGATATACTTTACTGAAATAACTTATATA GCAATATGAATTTTTTTTTTGAGGCGGAGTTTCGCTCTTGTTGCCCAGGCTAGAGTGCAA TGGCGTGATCTTGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAACAATTCTCCTGTCTC AGCCTCCAGAATAGCGGGGATTACAGGCGCACACCACCATGCCAGGCTAATTTTTGTATT TTTAGTAGAGACGGGGGTTCACCATGTTGGCCACGCTGGTCTCGAACTCCTGACCTCAGG TGATCCGCCTGCCTTGGCCTCCCAAAGTGCTGGGACTACAGGCATGAGCCACCGTGCCCG GCAAATTTGAGGTGGAGGTTGCAGTGAGCTGAGATCGCATCACTGCACTCTAGCCTAGGT GACAGAGCAAGACTGTCTCCCACTTCAGCCTCCCAAGTAGCTGGGACTACAAGCATGTGC CACCAGACCTGGTTAATTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCCATCACC CAGGCTGGAGTGCAGTGGCGCGATCTCAGCTCACTGCAAGCTCCCCCTCCCGGGTACACG CCACTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCACCTGCCAGCACGCCCG GCTAACTTTTTGCATTTTTAGTAGAGACAGGGTTTCACCGTGTTAGCCAGGATGGTCTCG ATCTCCTGACCTCATGATCCACCTGCCTTGGCCTCTCAAAGTGCTGGGATTATAGGCGTG AGCCACCGCGCCCAGCCAGGCCTGGTTAATTTTCTTTGGTATTTTTTTGTAGAGACGGAG 
GTCTCACTATGTTGCCCAGGCTGGTCTCGAACTCCTGAGCTCAAGTGATCCACCTGCCTT GGCCTTCCAAAGTGCTAGGATTACAGGCATGAGCCACGGTGCCCAGCCTACAGTGCAACT TTAATAATAACAATATGAACACAAAAATTCTAAGATCTAAAATTTAAGCTTTCAGTAGTC CTTCTATAACTGTGAAAGTTTGGTTCCTAAAAAGCCCTGAGGAATTTATGGGAAAACAAG AGAGACAACATTTAGTAGTGAACCTGTGCATTCTAAATAAAGACAATATCAATGACGTGT TATAGGTCTTCAATTAGTAAGAATGAATATTGGACTATGAATTTTTATTCACTGTCACTT GTTTGCTAGATGCTTTGAGAATCTTCCTTGCCTATATTTTCCTGAGATGTTGGTTTTTCT TTGTCACAGATAACAATGCTCATTCCCTCCCCATTAAAAACTAAATATATATATATATAT ATATATGATTAAACGATTACTACATGTGCTTTGAAATATTCAAATATTTTAGACAGTAAA AGTCCCTTGTAATTCAACCCTTTGCAGATGATTGGTTAACAGGTTAGTACACATCTACCT AAATTTAAAATCCCATATTTAACATGTATACTTATTAGAAAGTACACATTCTAATATTTT TCTATTGTATTTGGTACTATTTTCAGATGCTCCTGCCTTTTTCTTTCGTAATTTTGAAGG ACCTCAGCTCCCTGCCTCCTAGATTTTTGCTACTATGGTCTCAGAGCTGTGTAATTTGGA TGACTGAGATGGAAAAACCTC TFBS-0.7.1/License000066400000000000000000000002301305752266700135720ustar00rootroot00000000000000TFBS is distributed under the Mozilla Public License Version 2.0 (MPL-2.0). A copy of MPL-2.0 is available from https://www.mozilla.org/en-US/MPL/2.0/. 
TFBS-0.7.1/MANIFEST000066400000000000000000000044231305752266700134260ustar00rootroot00000000000000Changes MANIFEST Makefile.PL README BUGS TFBS/DB.pm TFBS/_Iterator.pm TFBS/Matrix.pm TFBS/MatrixSet.pm TFBS/PatternGenI.pm TFBS/PatternI.pm TFBS/SitePair.pm TFBS/SitePairSet.pm TFBS/Site.pm TFBS/SiteSet.pm TFBS/_Iterator/_SiteSetIterator.pm TFBS/_Iterator/_MatrixSetIterator.pm TFBS/Matrix/_Alignment.pm TFBS/Matrix/ICM.pm TFBS/Matrix/PFM.pm TFBS/Matrix/PWM.pm TFBS/Word.pm TFBS/Word/Consensus.pm TFBS/DB/FlatFileDir.pm TFBS/DB/JASPAR2.pm TFBS/DB/JASPAR4.pm TFBS/DB/TRANSFAC.pm TFBS/DB/LocalTRANSFAC.pm TFBS/PatternGen.pm TFBS/PatternGen/Gibbs.pm TFBS/PatternGen/SimplePFM.pm TFBS/PatternGen/Motif/Matrix.pm TFBS/PatternGen/Motif/Word.pm TFBS/PatternGen/Gibbs/Motif.pm TFBS/PatternGen/AnnSpec.pm TFBS/PatternGen/Elph.pm TFBS/PatternGen/YMF.pm TFBS/PatternGen/MEME.pm TFBS/PatternGen/MEME/Motif.pm TFBS/PatternGen/Elph/Motif.pm TFBS/PatternGen/YMF/Motif.pm TFBS/PatternGen/AnnSpec/Motif.pm TFBS/PatternGen/Motif/Word.pm TFBS/PatternGen/Motif/Matrix.pm TFBS/Tools/SetOperations.pm Ext/Makefile.PL Ext/lib/pwm_search.h Ext/lib/pwm_searchPFF.c Ext/pwmsearch.pm Ext/pwmsearch.xs Ext/t/pwmsearch.t Ext/t/test.fa examples/script1.pl examples/script2.pl examples/sample_alignment.aln examples/phylofoot.pl examples/list_matrices.pl examples/viewpfm.cgi examples/SAMPLE_FlatFileDir/MA0001.pfm examples/SAMPLE_FlatFileDir/MA0008.pfm examples/SAMPLE_FlatFileDir/MA0015.pfm examples/SAMPLE_FlatFileDir/MA0022.pfm examples/SAMPLE_FlatFileDir/MA0029.pfm examples/SAMPLE_FlatFileDir/MA0036.pfm examples/SAMPLE_FlatFileDir/MA0043.pfm examples/SAMPLE_FlatFileDir/MA0050.pfm examples/SAMPLE_FlatFileDir/MA0057.pfm examples/SAMPLE_FlatFileDir/MA0064.pfm examples/SAMPLE_FlatFileDir/MA0071.pfm examples/SAMPLE_FlatFileDir/MA0078.pfm examples/SAMPLE_FlatFileDir/MA0085.pfm examples/SAMPLE_FlatFileDir/MA0092.pfm examples/SAMPLE_FlatFileDir/MA0099.pfm examples/SAMPLE_FlatFileDir/MA0106.pfm 
examples/SAMPLE_FlatFileDir/matrix_list.txt t/01_Matrix.t t/02_Search.t t/03_DB_FlatFileDir.t t/04_DB_TRANSFAC.t t/05_DB_JASPAR.t t/06_SimplePFM.t t/07_Elph.t t/07_MEME.t t/07_AnnSpec.t t/07_Gibbs.t t/08_DB_LocalTRANSFAC.t t/09_Word_Consensus.t t/10_Tools_SetOperations.t t/test.aln t/test.fa t/test_meme.fa t/test.gibbin t/transfac_old/matrix.dat t/transfac_new/matrix.dat META.yml Module meta-data (added by MakeMaker) TFBS-0.7.1/META.yml000066400000000000000000000004321305752266700135420ustar00rootroot00000000000000# http://module-build.sourceforge.net/META-spec.html #XXXXXXX This is a prototype!!! It will change in the future!!! XXXXX# name: TFBS version: 0.5.0 version_from: installdirs: site requires: distribution_type: module generated_by: ExtUtils::MakeMaker version 6.17 TFBS-0.7.1/MYMETA.json000066400000000000000000000014201305752266700141560ustar00rootroot00000000000000{ "abstract" : "unknown", "author" : [ "unknown" ], "dynamic_config" : 0, "generated_by" : "ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140", "license" : [ "unknown" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "TFBS", "no_index" : { "directory" : [ "t", "inc" ] }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "runtime" : { "requires" : {} } }, "release_status" : "stable", "version" : "v0.5.0" } TFBS-0.7.1/MYMETA.yml000066400000000000000000000006461305752266700140170ustar00rootroot00000000000000--- abstract: unknown author: - unknown build_requires: ExtUtils::MakeMaker: 0 configure_requires: ExtUtils::MakeMaker: 0 dynamic_config: 0 generated_by: 'ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140' license: unknown meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: 1.4 name: TFBS no_index: directory: - t - inc requires: {} version: v0.5.0 
TFBS-0.7.1/Makefile000066400000000000000000001121401305752266700137310ustar00rootroot00000000000000# This Makefile is for the TFBS extension to perl. # # It was generated automatically by MakeMaker version # 6.74 (Revision: 67400) from the contents of # Makefile.PL. Don't edit this file, edit Makefile.PL instead. # # ANY CHANGES MADE HERE WILL BE LOST! # # MakeMaker ARGV: () # # MakeMaker Parameters: # BUILD_REQUIRES => { } # CONFIGURE_REQUIRES => { } # DISTNAME => q[TFBS] # NAME => q[TFBS] # PREREQ_PM => { } # TEST_REQUIRES => { } # VERSION => q[0.5.0] # dist => { DIST_DEFAULT=>q[all tardist], COMPRESS=>q[gzip -9f], SUFFIX=>q[.gz] } # --- MakeMaker post_initialize section: # --- MakeMaker const_config section: # These definitions are from config.sh (via /System/Library/Perl/5.12/darwin-thread-multi-2level/Config.pm). # They may have been overridden via Makefile.PL or on the command line. AR = ar CC = clang CCCDLFLAGS = CCDLFLAGS = DLEXT = bundle DLSRC = dl_dlopen.xs EXE_EXT = FULL_AR = /usr/bin/ar LD = clang -mmacosx-version-min=10.8 LDDLFLAGS = -arch i386 -arch x86_64 -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector LDFLAGS = -arch i386 -arch x86_64 -fstack-protector -L/usr/local/lib LIBC = LIB_EXT = .a OBJ_EXT = .o OSNAME = darwin OSVERS = 12.0 RANLIB = /usr/bin/ar s SITELIBEXP = /Library/Perl/5.12 SITEARCHEXP = /Library/Perl/5.12/darwin-thread-multi-2level SO = dylib VENDORARCHEXP = /Network/Library/Perl/5.12/darwin-thread-multi-2level VENDORLIBEXP = /Network/Library/Perl/5.12 # --- MakeMaker constants section: AR_STATIC_ARGS = cr DIRFILESEP = / DFSEP = $(DIRFILESEP) NAME = TFBS NAME_SYM = TFBS VERSION = 0.5.0 VERSION_MACRO = VERSION VERSION_SYM = 0_5_0 DEFINE_VERSION = -D$(VERSION_MACRO)=\"$(VERSION)\" XS_VERSION = 0.5.0 XS_VERSION_MACRO = XS_VERSION XS_DEFINE_VERSION = -D$(XS_VERSION_MACRO)=\"$(XS_VERSION)\" INST_ARCHLIB = blib/arch INST_SCRIPT = blib/script INST_BIN = blib/bin INST_LIB = blib/lib INST_MAN1DIR = blib/man1 INST_MAN3DIR = 
blib/man3 MAN1EXT = 1 MAN3EXT = 3pm INSTALLDIRS = site DESTDIR = PREFIX = $(SITEPREFIX) PERLPREFIX = / SITEPREFIX = /usr/local VENDORPREFIX = /usr/local INSTALLPRIVLIB = /Library/Perl/Updates/5.12.4 DESTINSTALLPRIVLIB = $(DESTDIR)$(INSTALLPRIVLIB) INSTALLSITELIB = /Library/Perl/5.12 DESTINSTALLSITELIB = $(DESTDIR)$(INSTALLSITELIB) INSTALLVENDORLIB = /Network/Library/Perl/5.12 DESTINSTALLVENDORLIB = $(DESTDIR)$(INSTALLVENDORLIB) INSTALLARCHLIB = /Library/Perl/Updates/5.12.4/darwin-thread-multi-2level DESTINSTALLARCHLIB = $(DESTDIR)$(INSTALLARCHLIB) INSTALLSITEARCH = /Library/Perl/5.12/darwin-thread-multi-2level DESTINSTALLSITEARCH = $(DESTDIR)$(INSTALLSITEARCH) INSTALLVENDORARCH = /Network/Library/Perl/5.12/darwin-thread-multi-2level DESTINSTALLVENDORARCH = $(DESTDIR)$(INSTALLVENDORARCH) INSTALLBIN = /usr/bin DESTINSTALLBIN = $(DESTDIR)$(INSTALLBIN) INSTALLSITEBIN = /usr/local/bin DESTINSTALLSITEBIN = $(DESTDIR)$(INSTALLSITEBIN) INSTALLVENDORBIN = /usr/local/bin DESTINSTALLVENDORBIN = $(DESTDIR)$(INSTALLVENDORBIN) INSTALLSCRIPT = /usr/bin DESTINSTALLSCRIPT = $(DESTDIR)$(INSTALLSCRIPT) INSTALLSITESCRIPT = /usr/local/bin DESTINSTALLSITESCRIPT = $(DESTDIR)$(INSTALLSITESCRIPT) INSTALLVENDORSCRIPT = /usr/local/bin DESTINSTALLVENDORSCRIPT = $(DESTDIR)$(INSTALLVENDORSCRIPT) INSTALLMAN1DIR = /usr/share/man/man1 DESTINSTALLMAN1DIR = $(DESTDIR)$(INSTALLMAN1DIR) INSTALLSITEMAN1DIR = /usr/local/share/man/man1 DESTINSTALLSITEMAN1DIR = $(DESTDIR)$(INSTALLSITEMAN1DIR) INSTALLVENDORMAN1DIR = /usr/local/share/man/man1 DESTINSTALLVENDORMAN1DIR = $(DESTDIR)$(INSTALLVENDORMAN1DIR) INSTALLMAN3DIR = /usr/share/man/man3 DESTINSTALLMAN3DIR = $(DESTDIR)$(INSTALLMAN3DIR) INSTALLSITEMAN3DIR = /usr/local/share/man/man3 DESTINSTALLSITEMAN3DIR = $(DESTDIR)$(INSTALLSITEMAN3DIR) INSTALLVENDORMAN3DIR = /usr/local/share/man/man3 DESTINSTALLVENDORMAN3DIR = $(DESTDIR)$(INSTALLVENDORMAN3DIR) PERL_LIB = /System/Library/Perl/5.12 PERL_ARCHLIB = /System/Library/Perl/5.12/darwin-thread-multi-2level 
LIBPERL_A = libperl.a FIRST_MAKEFILE = Makefile MAKEFILE_OLD = Makefile.old MAKE_APERL_FILE = Makefile.aperl PERLMAINCC = $(CC) PERL_INC = /System/Library/Perl/5.12/darwin-thread-multi-2level/CORE PERL = /usr/bin/perl FULLPERL = /usr/bin/perl ABSPERL = $(PERL) PERLRUN = $(PERL) FULLPERLRUN = $(FULLPERL) ABSPERLRUN = $(ABSPERL) PERLRUNINST = $(PERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" FULLPERLRUNINST = $(FULLPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" ABSPERLRUNINST = $(ABSPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" PERL_CORE = 0 PERM_DIR = 755 PERM_RW = 644 PERM_RWX = 755 MAKEMAKER = /Library/Perl/5.12/ExtUtils/MakeMaker.pm MM_VERSION = 6.74 MM_REVISION = 67400 # FULLEXT = Pathname for extension directory (eg Foo/Bar/Oracle). # BASEEXT = Basename part of FULLEXT. May be just equal FULLEXT. (eg Oracle) # PARENT_NAME = NAME without BASEEXT and no trailing :: (eg Foo::Bar) # DLBASE = Basename part of dynamic library. May be just equal BASEEXT. MAKE = make FULLEXT = TFBS BASEEXT = TFBS PARENT_NAME = DLBASE = $(BASEEXT) VERSION_FROM = OBJECT = LDFROM = $(OBJECT) LINKTYPE = dynamic BOOTDEP = # Handy lists of source code files: XS_FILES = C_FILES = O_FILES = H_FILES = MAN1PODS = MAN3PODS = TFBS/DB/FlatFileDir.pm \ TFBS/DB/JASPAR2.pm \ TFBS/DB/JASPAR4.pm \ TFBS/DB/LocalTRANSFAC.pm \ TFBS/DB/TRANSFAC.pm \ TFBS/Matrix.pm \ TFBS/Matrix/ICM.pm \ TFBS/Matrix/PFM.pm \ TFBS/Matrix/PWM.pm \ TFBS/MatrixSet.pm \ TFBS/PatternGen.pm \ TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Elph.pm \ TFBS/PatternGen/Elph/Motif.pm \ TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/MEME/Motif.pm \ TFBS/PatternGen/SimplePFM.pm \ TFBS/PatternGen/YMF.pm \ TFBS/PatternGen/YMF/Motif.pm \ TFBS/PatternI.pm \ TFBS/Site.pm \ TFBS/SitePair.pm \ TFBS/SitePairSet.pm \ TFBS/SiteSet.pm \ TFBS/Word.pm \ TFBS/Word/Consensus.pm # Where is the Config information that we are using/depend on CONFIGDEP = 
$(PERL_ARCHLIB)$(DFSEP)Config.pm $(PERL_INC)$(DFSEP)config.h # Where to build things INST_LIBDIR = $(INST_LIB) INST_ARCHLIBDIR = $(INST_ARCHLIB) INST_AUTODIR = $(INST_LIB)/auto/$(FULLEXT) INST_ARCHAUTODIR = $(INST_ARCHLIB)/auto/$(FULLEXT) INST_STATIC = INST_DYNAMIC = INST_BOOT = # Extra linker info EXPORT_LIST = PERL_ARCHIVE = PERL_ARCHIVE_AFTER = TO_INST_PM = TFBS/DB.pm \ TFBS/DB/FlatFileDir.pm \ TFBS/DB/JASPAR2.pm \ TFBS/DB/JASPAR4.pm \ TFBS/DB/LocalTRANSFAC.pm \ TFBS/DB/TRANSFAC.pm \ TFBS/Matrix.pm \ TFBS/Matrix/ICM.pm \ TFBS/Matrix/PFM.pm \ TFBS/Matrix/PWM.pm \ TFBS/Matrix/_Alignment.pm \ TFBS/MatrixSet.pm \ TFBS/PatternGen.pm \ TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Elph.pm \ TFBS/PatternGen/Elph/Motif.pm \ TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/MEME/Motif.pm \ TFBS/PatternGen/Motif/Matrix.pm \ TFBS/PatternGen/Motif/Word.pm \ TFBS/PatternGen/SimplePFM.pm \ TFBS/PatternGen/YMF.pm \ TFBS/PatternGen/YMF/Motif.pm \ TFBS/PatternGenI.pm \ TFBS/PatternI.pm \ TFBS/Site.pm \ TFBS/SitePair.pm \ TFBS/SitePairSet.pm \ TFBS/SiteSet.pm \ TFBS/Tools/SetOperations.pm \ TFBS/Word.pm \ TFBS/Word/Consensus.pm \ TFBS/_Iterator.pm \ TFBS/_Iterator/_MatrixSetIterator.pm \ TFBS/_Iterator/_SiteSetIterator.pm PM_TO_BLIB = TFBS/PatternGen/YMF.pm \ $(INST_LIB)/TFBS/PatternGen/YMF.pm \ TFBS/DB/TRANSFAC.pm \ $(INST_LIB)/TFBS/DB/TRANSFAC.pm \ TFBS/DB.pm \ $(INST_LIB)/TFBS/DB.pm \ TFBS/DB/LocalTRANSFAC.pm \ $(INST_LIB)/TFBS/DB/LocalTRANSFAC.pm \ TFBS/PatternGen/MEME.pm \ $(INST_LIB)/TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/Gibbs.pm \ $(INST_LIB)/TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Elph.pm \ $(INST_LIB)/TFBS/PatternGen/Elph.pm \ TFBS/DB/JASPAR4.pm \ $(INST_LIB)/TFBS/DB/JASPAR4.pm \ TFBS/PatternGenI.pm \ $(INST_LIB)/TFBS/PatternGenI.pm \ TFBS/Matrix/ICM.pm \ $(INST_LIB)/TFBS/Matrix/ICM.pm \ TFBS/PatternGen/YMF/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/YMF/Motif.pm \ TFBS/SitePairSet.pm 
\ $(INST_LIB)/TFBS/SitePairSet.pm \ TFBS/PatternGen/Elph/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/Elph/Motif.pm \ TFBS/Site.pm \ $(INST_LIB)/TFBS/Site.pm \ TFBS/_Iterator/_MatrixSetIterator.pm \ $(INST_LIB)/TFBS/_Iterator/_MatrixSetIterator.pm \ TFBS/Tools/SetOperations.pm \ $(INST_LIB)/TFBS/Tools/SetOperations.pm \ TFBS/Matrix/PFM.pm \ $(INST_LIB)/TFBS/Matrix/PFM.pm \ TFBS/PatternI.pm \ $(INST_LIB)/TFBS/PatternI.pm \ TFBS/Word/Consensus.pm \ $(INST_LIB)/TFBS/Word/Consensus.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/SiteSet.pm \ $(INST_LIB)/TFBS/SiteSet.pm \ TFBS/Matrix/_Alignment.pm \ $(INST_LIB)/TFBS/Matrix/_Alignment.pm \ TFBS/PatternGen/AnnSpec.pm \ $(INST_LIB)/TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/Motif/Word.pm \ $(INST_LIB)/TFBS/PatternGen/Motif/Word.pm \ TFBS/SitePair.pm \ $(INST_LIB)/TFBS/SitePair.pm \ TFBS/_Iterator/_SiteSetIterator.pm \ $(INST_LIB)/TFBS/_Iterator/_SiteSetIterator.pm \ TFBS/_Iterator.pm \ $(INST_LIB)/TFBS/_Iterator.pm \ TFBS/PatternGen/MEME/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/MEME/Motif.pm \ TFBS/DB/FlatFileDir.pm \ $(INST_LIB)/TFBS/DB/FlatFileDir.pm \ TFBS/PatternGen.pm \ $(INST_LIB)/TFBS/PatternGen.pm \ TFBS/MatrixSet.pm \ $(INST_LIB)/TFBS/MatrixSet.pm \ TFBS/PatternGen/SimplePFM.pm \ $(INST_LIB)/TFBS/PatternGen/SimplePFM.pm \ TFBS/Word.pm \ $(INST_LIB)/TFBS/Word.pm \ TFBS/DB/JASPAR2.pm \ $(INST_LIB)/TFBS/DB/JASPAR2.pm \ TFBS/Matrix.pm \ $(INST_LIB)/TFBS/Matrix.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Motif/Matrix.pm \ $(INST_LIB)/TFBS/PatternGen/Motif/Matrix.pm \ TFBS/Matrix/PWM.pm \ $(INST_LIB)/TFBS/Matrix/PWM.pm # --- MakeMaker platform_constants section: MM_Unix_VERSION = 6.74 PERL_MALLOC_DEF = -DPERL_EXTMALLOC_DEF -Dmalloc=Perl_malloc -Dfree=Perl_mfree -Drealloc=Perl_realloc -Dcalloc=Perl_calloc # --- MakeMaker tool_autosplit section: # Usage: $(AUTOSPLITFILE) FileToSplit AutoDirToSplitInto AUTOSPLITFILE = $(ABSPERLRUN) 
-e 'use AutoSplit; autosplit($$$$ARGV[0], $$$$ARGV[1], 0, 1, 1)' -- # --- MakeMaker tool_xsubpp section: XSUBPPDIR = /System/Library/Perl/5.12/ExtUtils XSUBPP = $(XSUBPPDIR)$(DFSEP)xsubpp XSUBPPRUN = $(PERLRUN) $(XSUBPP) XSPROTOARG = XSUBPPDEPS = /System/Library/Perl/5.12/ExtUtils/typemap $(XSUBPP) XSUBPPARGS = -typemap /System/Library/Perl/5.12/ExtUtils/typemap XSUBPP_EXTRA_ARGS = # --- MakeMaker tools_other section: SHELL = /bin/sh CHMOD = chmod CP = cp MV = mv NOOP = $(TRUE) NOECHO = @ RM_F = rm -f RM_RF = rm -rf TEST_F = test -f TOUCH = touch UMASK_NULL = umask 0 DEV_NULL = > /dev/null 2>&1 MKPATH = $(ABSPERLRUN) -MExtUtils::Command -e 'mkpath' -- EQUALIZE_TIMESTAMP = $(ABSPERLRUN) -MExtUtils::Command -e 'eqtime' -- FALSE = false TRUE = true ECHO = echo ECHO_N = echo -n UNINST = 0 VERBINST = 0 MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e 'install([ from_to => {@ARGV}, verbose => '\''$(VERBINST)'\'', uninstall_shadows => '\''$(UNINST)'\'', dir_mode => '\''$(PERM_DIR)'\'' ]);' -- DOC_INSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'perllocal_install' -- UNINSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'uninstall' -- WARN_IF_OLD_PACKLIST = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'warn_if_old_packlist' -- MACROSTART = MACROEND = USEMAKEFILE = -f FIXIN = $(ABSPERLRUN) -MExtUtils::MY -e 'MY->fixin(shift)' -- # --- MakeMaker makemakerdflt section: makemakerdflt : all $(NOECHO) $(NOOP) # --- MakeMaker dist section: TAR = COPY_EXTENDED_ATTRIBUTES_DISABLE=1 COPYFILE_DISABLE=1 tar TARFLAGS = cvf ZIP = zip ZIPFLAGS = -r COMPRESS = gzip -9f SUFFIX = .gz SHAR = shar PREOP = $(NOECHO) $(NOOP) POSTOP = $(NOECHO) $(NOOP) TO_UNIX = $(NOECHO) $(NOOP) CI = ci -u RCS_LABEL = rcs -Nv$(VERSION_SYM): -q DIST_CP = best DIST_DEFAULT = all tardist DISTNAME = TFBS DISTVNAME = TFBS-0.5.0 # --- MakeMaker macro section: # --- MakeMaker depend section: # --- MakeMaker cflags section: CCFLAGS = -arch i386 -arch x86_64 -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing 
-fstack-protector -I/usr/local/include OPTIMIZE = -Os PERLTYPE = MPOLLUTE = # --- MakeMaker const_loadlibs section: # TFBS might depend on some other libraries: # See ExtUtils::Liblist for details # # --- MakeMaker const_cccmd section: CCCMD = $(CC) -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) # --- MakeMaker post_constants section: # --- MakeMaker pasthru section: PASTHRU = LIBPERL_A="$(LIBPERL_A)"\ LINKTYPE="$(LINKTYPE)"\ OPTIMIZE="$(OPTIMIZE)"\ PREFIX="$(PREFIX)" # --- MakeMaker special_targets section: .SUFFIXES : .xs .c .C .cpp .i .s .cxx .cc $(OBJ_EXT) .PHONY: all config static dynamic test linkext manifest blibdirs clean realclean disttest distdir # --- MakeMaker c_o section: .c.i: clang -E -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c > $*.i .c.s: $(CCCMD) -S $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .c$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .cpp$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cpp .cxx$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cxx .cc$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cc .C$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.C # --- MakeMaker xs_c section: .xs.c: $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $(XSUBPP_EXTRA_ARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c # --- MakeMaker xs_o section: .xs$(OBJ_EXT): $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c # --- MakeMaker top_targets section: all :: pure_all manifypods $(NOECHO) $(NOOP) pure_all :: config pm_to_blib subdirs linkext $(NOECHO) $(NOOP) subdirs :: $(MYEXTLIB) $(NOECHO) 
$(NOOP) config :: $(FIRST_MAKEFILE) blibdirs $(NOECHO) $(NOOP) help : perldoc ExtUtils::MakeMaker # --- MakeMaker blibdirs section: blibdirs : $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists $(NOECHO) $(NOOP) # Backwards compat with 6.18 through 6.25 blibdirs.ts : blibdirs $(NOECHO) $(NOOP) $(INST_LIBDIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_LIBDIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_LIBDIR) $(NOECHO) $(TOUCH) $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHLIB) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHLIB) $(NOECHO) $(TOUCH) $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_AUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_AUTODIR) $(NOECHO) $(TOUCH) $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHAUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHAUTODIR) $(NOECHO) $(TOUCH) $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_BIN) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_BIN) $(NOECHO) $(TOUCH) $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_SCRIPT) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_SCRIPT) $(NOECHO) $(TOUCH) $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN1DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN1DIR) $(NOECHO) $(TOUCH) $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN3DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN3DIR) $(NOECHO) $(TOUCH) $(INST_MAN3DIR)$(DFSEP).exists # --- MakeMaker linkext section: linkext :: $(LINKTYPE) $(NOECHO) $(NOOP) # --- 
MakeMaker dlsyms section: # --- MakeMaker dynamic section: dynamic :: $(FIRST_MAKEFILE) $(INST_DYNAMIC) $(INST_BOOT) $(NOECHO) $(NOOP) # --- MakeMaker dynamic_bs section: BOOTSTRAP = # --- MakeMaker dynamic_lib section: # --- MakeMaker static section: ## $(INST_PM) has been moved to the all: target. ## It remains here for awhile to allow for old usage: "make static" static :: $(FIRST_MAKEFILE) $(INST_STATIC) $(NOECHO) $(NOOP) # --- MakeMaker static_lib section: # --- MakeMaker manifypods section: POD2MAN_EXE = $(PERLRUN) "-MExtUtils::Command::MM" -e pod2man "--" POD2MAN = $(POD2MAN_EXE) manifypods : pure_all \ TFBS/Matrix/PFM.pm \ TFBS/PatternI.pm \ TFBS/PatternGen/YMF.pm \ TFBS/DB/TRANSFAC.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/Word/Consensus.pm \ TFBS/SiteSet.pm \ TFBS/DB/LocalTRANSFAC.pm \ TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/MEME.pm \ TFBS/SitePair.pm \ TFBS/PatternGen/Elph.pm \ TFBS/DB/FlatFileDir.pm \ TFBS/PatternGen/MEME/Motif.pm \ TFBS/DB/JASPAR4.pm \ TFBS/PatternGen.pm \ TFBS/MatrixSet.pm \ TFBS/PatternGen/SimplePFM.pm \ TFBS/Matrix/ICM.pm \ TFBS/Word.pm \ TFBS/SitePairSet.pm \ TFBS/PatternGen/YMF/Motif.pm \ TFBS/DB/JASPAR2.pm \ TFBS/Matrix.pm \ TFBS/PatternGen/Elph/Motif.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/Site.pm \ TFBS/Matrix/PWM.pm $(NOECHO) $(POD2MAN) --section=3 --perm_rw=$(PERM_RW) \ TFBS/Matrix/PFM.pm $(INST_MAN3DIR)/TFBS::Matrix::PFM.$(MAN3EXT) \ TFBS/PatternI.pm $(INST_MAN3DIR)/TFBS::PatternI.$(MAN3EXT) \ TFBS/PatternGen/YMF.pm $(INST_MAN3DIR)/TFBS::PatternGen::YMF.$(MAN3EXT) \ TFBS/DB/TRANSFAC.pm $(INST_MAN3DIR)/TFBS::DB::TRANSFAC.$(MAN3EXT) \ TFBS/PatternGen/Gibbs/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::Gibbs::Motif.$(MAN3EXT) \ TFBS/Word/Consensus.pm $(INST_MAN3DIR)/TFBS::Word::Consensus.$(MAN3EXT) \ TFBS/SiteSet.pm $(INST_MAN3DIR)/TFBS::SiteSet.$(MAN3EXT) \ TFBS/DB/LocalTRANSFAC.pm $(INST_MAN3DIR)/TFBS::DB::LocalTRANSFAC.$(MAN3EXT) \ TFBS/PatternGen/AnnSpec.pm 
$(INST_MAN3DIR)/TFBS::PatternGen::AnnSpec.$(MAN3EXT) \ TFBS/PatternGen/Gibbs.pm $(INST_MAN3DIR)/TFBS::PatternGen::Gibbs.$(MAN3EXT) \ TFBS/PatternGen/MEME.pm $(INST_MAN3DIR)/TFBS::PatternGen::MEME.$(MAN3EXT) \ TFBS/SitePair.pm $(INST_MAN3DIR)/TFBS::SitePair.$(MAN3EXT) \ TFBS/PatternGen/Elph.pm $(INST_MAN3DIR)/TFBS::PatternGen::Elph.$(MAN3EXT) \ TFBS/DB/FlatFileDir.pm $(INST_MAN3DIR)/TFBS::DB::FlatFileDir.$(MAN3EXT) \ TFBS/PatternGen/MEME/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::MEME::Motif.$(MAN3EXT) \ TFBS/DB/JASPAR4.pm $(INST_MAN3DIR)/TFBS::DB::JASPAR4.$(MAN3EXT) \ TFBS/PatternGen.pm $(INST_MAN3DIR)/TFBS::PatternGen.$(MAN3EXT) \ TFBS/MatrixSet.pm $(INST_MAN3DIR)/TFBS::MatrixSet.$(MAN3EXT) \ TFBS/PatternGen/SimplePFM.pm $(INST_MAN3DIR)/TFBS::PatternGen::SimplePFM.$(MAN3EXT) \ TFBS/Matrix/ICM.pm $(INST_MAN3DIR)/TFBS::Matrix::ICM.$(MAN3EXT) \ TFBS/Word.pm $(INST_MAN3DIR)/TFBS::Word.$(MAN3EXT) \ TFBS/SitePairSet.pm $(INST_MAN3DIR)/TFBS::SitePairSet.$(MAN3EXT) \ TFBS/PatternGen/YMF/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::YMF::Motif.$(MAN3EXT) \ TFBS/DB/JASPAR2.pm $(INST_MAN3DIR)/TFBS::DB::JASPAR2.$(MAN3EXT) \ TFBS/Matrix.pm $(INST_MAN3DIR)/TFBS::Matrix.$(MAN3EXT) \ TFBS/PatternGen/Elph/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::Elph::Motif.$(MAN3EXT) \ TFBS/PatternGen/AnnSpec/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::AnnSpec::Motif.$(MAN3EXT) \ TFBS/Site.pm $(INST_MAN3DIR)/TFBS::Site.$(MAN3EXT) \ TFBS/Matrix/PWM.pm $(INST_MAN3DIR)/TFBS::Matrix::PWM.$(MAN3EXT) # --- MakeMaker processPL section: # --- MakeMaker installbin section: # --- MakeMaker subdirs section: # The default clean, realclean and test targets in this Makefile # have automatically been given entries for each subdir. 
subdirs :: $(NOECHO) cd Ext && $(MAKE) $(USEMAKEFILE) $(FIRST_MAKEFILE) all $(PASTHRU) # --- MakeMaker clean_subdirs section: clean_subdirs : $(ABSPERLRUN) -e 'exit 0 unless chdir '\''Ext'\''; system '\''$(MAKE) clean'\'' if -f '\''$(FIRST_MAKEFILE)'\'';' -- # --- MakeMaker clean section: # Delete temporary files but do not touch installed files. We don't delete # the Makefile here so a later make realclean still has a makefile to use. clean :: clean_subdirs - $(RM_F) \ *$(LIB_EXT) core \ core.[0-9] $(INST_ARCHAUTODIR)/extralibs.all \ core.[0-9][0-9] $(BASEEXT).bso \ pm_to_blib.ts MYMETA.json \ core.[0-9][0-9][0-9][0-9] MYMETA.yml \ $(BASEEXT).x $(BOOTSTRAP) \ perl$(EXE_EXT) tmon.out \ *$(OBJ_EXT) pm_to_blib \ $(INST_ARCHAUTODIR)/extralibs.ld blibdirs.ts \ core.[0-9][0-9][0-9][0-9][0-9] *perl.core \ core.*perl.*.? $(MAKE_APERL_FILE) \ $(BASEEXT).def perl \ core.[0-9][0-9][0-9] mon.out \ lib$(BASEEXT).def perl.exe \ perlmain.c so_locations \ $(BASEEXT).exp - $(RM_RF) \ blib $(NOECHO) $(RM_F) $(MAKEFILE_OLD) - $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) $(DEV_NULL) # --- MakeMaker realclean_subdirs section: realclean_subdirs : - $(ABSPERLRUN) -e 'chdir '\''Ext'\''; system '\''$(MAKE) $(USEMAKEFILE) $(MAKEFILE_OLD) realclean'\'' if -f '\''$(MAKEFILE_OLD)'\'';' -- - $(ABSPERLRUN) -e 'chdir '\''Ext'\''; system '\''$(MAKE) $(USEMAKEFILE) $(FIRST_MAKEFILE) realclean'\'' if -f '\''$(FIRST_MAKEFILE)'\'';' -- # --- MakeMaker realclean section: # Delete temporary files (via clean) and also delete dist files realclean purge :: clean realclean_subdirs - $(RM_F) \ $(MAKEFILE_OLD) $(FIRST_MAKEFILE) - $(RM_RF) \ $(DISTVNAME) # --- MakeMaker metafile section: metafile : create_distdir $(NOECHO) $(ECHO) Generating META.yml $(NOECHO) $(ECHO) '---' > META_new.yml $(NOECHO) $(ECHO) 'abstract: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'author:' >> META_new.yml $(NOECHO) $(ECHO) ' - unknown' >> META_new.yml $(NOECHO) $(ECHO) 'build_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' 
ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'configure_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'dynamic_config: 1' >> META_new.yml $(NOECHO) $(ECHO) 'generated_by: '\''ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140'\''' >> META_new.yml $(NOECHO) $(ECHO) 'license: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'meta-spec:' >> META_new.yml $(NOECHO) $(ECHO) ' url: http://module-build.sourceforge.net/META-spec-v1.4.html' >> META_new.yml $(NOECHO) $(ECHO) ' version: 1.4' >> META_new.yml $(NOECHO) $(ECHO) 'name: TFBS' >> META_new.yml $(NOECHO) $(ECHO) 'no_index:' >> META_new.yml $(NOECHO) $(ECHO) ' directory:' >> META_new.yml $(NOECHO) $(ECHO) ' - t' >> META_new.yml $(NOECHO) $(ECHO) ' - inc' >> META_new.yml $(NOECHO) $(ECHO) 'requires: {}' >> META_new.yml $(NOECHO) $(ECHO) 'version: v0.5.0' >> META_new.yml -$(NOECHO) $(MV) META_new.yml $(DISTVNAME)/META.yml $(NOECHO) $(ECHO) Generating META.json $(NOECHO) $(ECHO) '{' > META_new.json $(NOECHO) $(ECHO) ' "abstract" : "unknown",' >> META_new.json $(NOECHO) $(ECHO) ' "author" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "dynamic_config" : 1,' >> META_new.json $(NOECHO) $(ECHO) ' "generated_by" : "ExtUtils::MakeMaker version 6.74, CPAN::Meta::Converter version 2.132140",' >> META_new.json $(NOECHO) $(ECHO) ' "license" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "meta-spec" : {' >> META_new.json $(NOECHO) $(ECHO) ' "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "2"' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "name" : "TFBS",' >> META_new.json $(NOECHO) $(ECHO) ' "no_index" : {' >> META_new.json $(NOECHO) $(ECHO) ' "directory" : [' >> META_new.json $(NOECHO) 
$(ECHO) ' "t",' >> META_new.json $(NOECHO) $(ECHO) ' "inc"' >> META_new.json $(NOECHO) $(ECHO) ' ]' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "prereqs" : {' >> META_new.json $(NOECHO) $(ECHO) ' "build" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : "0"' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "configure" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : "0"' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "runtime" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {}' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "release_status" : "stable",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "v0.5.0"' >> META_new.json $(NOECHO) $(ECHO) '}' >> META_new.json -$(NOECHO) $(MV) META_new.json $(DISTVNAME)/META.json # --- MakeMaker signature section: signature : cpansign -s # --- MakeMaker dist_basics section: distclean :: realclean distcheck $(NOECHO) $(NOOP) distcheck : $(PERLRUN) "-MExtUtils::Manifest=fullcheck" -e fullcheck skipcheck : $(PERLRUN) "-MExtUtils::Manifest=skipcheck" -e skipcheck manifest : $(PERLRUN) "-MExtUtils::Manifest=mkmanifest" -e mkmanifest veryclean : realclean $(RM_F) *~ */*~ *.orig */*.orig *.bak */*.bak *.old */*.old # --- MakeMaker dist_core section: dist : $(DIST_DEFAULT) $(FIRST_MAKEFILE) $(NOECHO) $(ABSPERLRUN) -l -e 'print '\''Warning: Makefile possibly out of date with $(VERSION_FROM)'\''' \ -e ' if -e '\''$(VERSION_FROM)'\'' and -M '\''$(VERSION_FROM)'\'' < -M '\''$(FIRST_MAKEFILE)'\'';' -- tardist : $(DISTVNAME).tar$(SUFFIX) $(NOECHO) $(NOOP) uutardist : $(DISTVNAME).tar$(SUFFIX) uuencode $(DISTVNAME).tar$(SUFFIX) $(DISTVNAME).tar$(SUFFIX) > 
$(DISTVNAME).tar$(SUFFIX)_uu $(NOECHO) $(ECHO) 'Created $(DISTVNAME).tar$(SUFFIX)_uu' $(DISTVNAME).tar$(SUFFIX) : distdir $(PREOP) $(TO_UNIX) $(TAR) $(TARFLAGS) $(DISTVNAME).tar $(DISTVNAME) $(RM_RF) $(DISTVNAME) $(COMPRESS) $(DISTVNAME).tar $(NOECHO) $(ECHO) 'Created $(DISTVNAME).tar$(SUFFIX)' $(POSTOP) zipdist : $(DISTVNAME).zip $(NOECHO) $(NOOP) $(DISTVNAME).zip : distdir $(PREOP) $(ZIP) $(ZIPFLAGS) $(DISTVNAME).zip $(DISTVNAME) $(RM_RF) $(DISTVNAME) $(NOECHO) $(ECHO) 'Created $(DISTVNAME).zip' $(POSTOP) shdist : distdir $(PREOP) $(SHAR) $(DISTVNAME) > $(DISTVNAME).shar $(RM_RF) $(DISTVNAME) $(NOECHO) $(ECHO) 'Created $(DISTVNAME).shar' $(POSTOP) # --- MakeMaker distdir section: create_distdir : $(RM_RF) $(DISTVNAME) $(PERLRUN) "-MExtUtils::Manifest=manicopy,maniread" \ -e "manicopy(maniread(),'$(DISTVNAME)', '$(DIST_CP)');" distdir : create_distdir distmeta $(NOECHO) $(NOOP) # --- MakeMaker dist_test section: disttest : distdir cd $(DISTVNAME) && $(ABSPERLRUN) Makefile.PL cd $(DISTVNAME) && $(MAKE) $(PASTHRU) cd $(DISTVNAME) && $(MAKE) test $(PASTHRU) # --- MakeMaker dist_ci section: ci : $(PERLRUN) "-MExtUtils::Manifest=maniread" \ -e "@all = keys %{ maniread() };" \ -e "print(qq{Executing $(CI) @all\n}); system(qq{$(CI) @all});" \ -e "print(qq{Executing $(RCS_LABEL) ...\n}); system(qq{$(RCS_LABEL) @all});" # --- MakeMaker distmeta section: distmeta : create_distdir metafile $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -e q{META.yml};' \ -e 'eval { maniadd({q{META.yml} => q{Module YAML meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.yml to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -f q{META.json};' \ -e 'eval { maniadd({q{META.json} => q{Module JSON meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.json to MANIFEST: $$$${'\''@'\''}\n"' -- # --- MakeMaker distsignature section: distsignature : 
create_distdir $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'eval { maniadd({q{SIGNATURE} => q{Public-key signature (added by MakeMaker)}}) }' \ -e ' or print "Could not add SIGNATURE to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(TOUCH) SIGNATURE cd $(DISTVNAME) && cpansign -s # --- MakeMaker install section: install :: pure_install doc_install $(NOECHO) $(NOOP) install_perl :: pure_perl_install doc_perl_install $(NOECHO) $(NOOP) install_site :: pure_site_install doc_site_install $(NOECHO) $(NOOP) install_vendor :: pure_vendor_install doc_vendor_install $(NOECHO) $(NOOP) pure_install :: pure_$(INSTALLDIRS)_install $(NOECHO) $(NOOP) doc_install :: doc_$(INSTALLDIRS)_install $(NOECHO) $(NOOP) pure__install : pure_site_install $(NOECHO) $(ECHO) INSTALLDIRS not defined, defaulting to INSTALLDIRS=site doc__install : doc_site_install $(NOECHO) $(ECHO) INSTALLDIRS not defined, defaulting to INSTALLDIRS=site pure_perl_install :: all $(NOECHO) $(MOD_INSTALL) \ read $(PERL_ARCHLIB)/auto/$(FULLEXT)/.packlist \ write $(DESTINSTALLARCHLIB)/auto/$(FULLEXT)/.packlist \ $(INST_LIB) $(DESTINSTALLPRIVLIB) \ $(INST_ARCHLIB) $(DESTINSTALLARCHLIB) \ $(INST_BIN) $(DESTINSTALLBIN) \ $(INST_SCRIPT) $(DESTINSTALLSCRIPT) \ $(INST_MAN1DIR) $(DESTINSTALLMAN1DIR) \ $(INST_MAN3DIR) $(DESTINSTALLMAN3DIR) $(NOECHO) $(WARN_IF_OLD_PACKLIST) \ $(SITEARCHEXP)/auto/$(FULLEXT) pure_site_install :: all $(NOECHO) $(MOD_INSTALL) \ read $(SITEARCHEXP)/auto/$(FULLEXT)/.packlist \ write $(DESTINSTALLSITEARCH)/auto/$(FULLEXT)/.packlist \ $(INST_LIB) $(DESTINSTALLSITELIB) \ $(INST_ARCHLIB) $(DESTINSTALLSITEARCH) \ $(INST_BIN) $(DESTINSTALLSITEBIN) \ $(INST_SCRIPT) $(DESTINSTALLSITESCRIPT) \ $(INST_MAN1DIR) $(DESTINSTALLSITEMAN1DIR) \ $(INST_MAN3DIR) $(DESTINSTALLSITEMAN3DIR) $(NOECHO) $(WARN_IF_OLD_PACKLIST) \ $(PERL_ARCHLIB)/auto/$(FULLEXT) pure_vendor_install :: all $(NOECHO) $(MOD_INSTALL) \ read $(VENDORARCHEXP)/auto/$(FULLEXT)/.packlist \ write 
$(DESTINSTALLVENDORARCH)/auto/$(FULLEXT)/.packlist \ $(INST_LIB) $(DESTINSTALLVENDORLIB) \ $(INST_ARCHLIB) $(DESTINSTALLVENDORARCH) \ $(INST_BIN) $(DESTINSTALLVENDORBIN) \ $(INST_SCRIPT) $(DESTINSTALLVENDORSCRIPT) \ $(INST_MAN1DIR) $(DESTINSTALLVENDORMAN1DIR) \ $(INST_MAN3DIR) $(DESTINSTALLVENDORMAN3DIR) doc_perl_install :: all $(NOECHO) $(ECHO) Appending installation info to $(DESTINSTALLARCHLIB)/perllocal.pod -$(NOECHO) $(MKPATH) $(DESTINSTALLARCHLIB) -$(NOECHO) $(DOC_INSTALL) \ "Module" "$(NAME)" \ "installed into" "$(INSTALLPRIVLIB)" \ LINKTYPE "$(LINKTYPE)" \ VERSION "$(VERSION)" \ EXE_FILES "$(EXE_FILES)" \ >> $(DESTINSTALLARCHLIB)/perllocal.pod doc_site_install :: all $(NOECHO) $(ECHO) Appending installation info to $(DESTINSTALLARCHLIB)/perllocal.pod -$(NOECHO) $(MKPATH) $(DESTINSTALLARCHLIB) -$(NOECHO) $(DOC_INSTALL) \ "Module" "$(NAME)" \ "installed into" "$(INSTALLSITELIB)" \ LINKTYPE "$(LINKTYPE)" \ VERSION "$(VERSION)" \ EXE_FILES "$(EXE_FILES)" \ >> $(DESTINSTALLARCHLIB)/perllocal.pod doc_vendor_install :: all $(NOECHO) $(ECHO) Appending installation info to $(DESTINSTALLARCHLIB)/perllocal.pod -$(NOECHO) $(MKPATH) $(DESTINSTALLARCHLIB) -$(NOECHO) $(DOC_INSTALL) \ "Module" "$(NAME)" \ "installed into" "$(INSTALLVENDORLIB)" \ LINKTYPE "$(LINKTYPE)" \ VERSION "$(VERSION)" \ EXE_FILES "$(EXE_FILES)" \ >> $(DESTINSTALLARCHLIB)/perllocal.pod uninstall :: uninstall_from_$(INSTALLDIRS)dirs $(NOECHO) $(NOOP) uninstall_from_perldirs :: $(NOECHO) $(UNINSTALL) $(PERL_ARCHLIB)/auto/$(FULLEXT)/.packlist uninstall_from_sitedirs :: $(NOECHO) $(UNINSTALL) $(SITEARCHEXP)/auto/$(FULLEXT)/.packlist uninstall_from_vendordirs :: $(NOECHO) $(UNINSTALL) $(VENDORARCHEXP)/auto/$(FULLEXT)/.packlist # --- MakeMaker force section: # Phony target to force checking subdirectories. FORCE : $(NOECHO) $(NOOP) # --- MakeMaker perldepend section: # --- MakeMaker makefile section: # We take a very conservative approach here, but it's worth it. 
# We move Makefile to Makefile.old here to avoid gnu make looping. $(FIRST_MAKEFILE) : Makefile.PL $(CONFIGDEP) $(NOECHO) $(ECHO) "Makefile out-of-date with respect to $?" $(NOECHO) $(ECHO) "Cleaning current config before rebuilding Makefile..." -$(NOECHO) $(RM_F) $(MAKEFILE_OLD) -$(NOECHO) $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) - $(MAKE) $(USEMAKEFILE) $(MAKEFILE_OLD) clean $(DEV_NULL) $(PERLRUN) Makefile.PL $(NOECHO) $(ECHO) "==> Your Makefile has been rebuilt. <==" $(NOECHO) $(ECHO) "==> Please rerun the $(MAKE) command. <==" $(FALSE) # --- MakeMaker staticmake section: # --- MakeMaker makeaperl section --- MAP_TARGET = perl FULLPERL = /usr/bin/perl $(MAP_TARGET) :: static $(MAKE_APERL_FILE) $(MAKE) $(USEMAKEFILE) $(MAKE_APERL_FILE) $@ $(MAKE_APERL_FILE) : $(FIRST_MAKEFILE) pm_to_blib $(NOECHO) $(ECHO) Writing \"$(MAKE_APERL_FILE)\" for this $(MAP_TARGET) $(NOECHO) $(PERLRUNINST) \ Makefile.PL DIR=Ext \ MAKEFILE=$(MAKE_APERL_FILE) LINKTYPE=static \ MAKEAPERL=1 NORECURS=1 CCCDLFLAGS= # --- MakeMaker test section: TEST_VERBOSE=0 TEST_TYPE=test_$(LINKTYPE) TEST_FILE = test.pl TEST_FILES = t/*.t TESTDB_SW = -d testdb :: testdb_$(LINKTYPE) test :: $(TEST_TYPE) subdirs-test subdirs-test :: $(NOECHO) $(NOOP) subdirs-test :: $(NOECHO) cd Ext && $(MAKE) test $(PASTHRU) test_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) test_ : test_dynamic test_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) # --- MakeMaker ppd section: # Creates a PPD (Perl Package Description) for a 
binary distribution. ppd : $(NOECHO) $(ECHO) '' > $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) '' >> $(DISTNAME).ppd # --- MakeMaker pm_to_blib section: pm_to_blib : $(FIRST_MAKEFILE) $(TO_INST_PM) $(NOECHO) $(ABSPERLRUN) -MExtUtils::Install -e 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', q[$(PM_FILTER)], '\''$(PERM_DIR)'\'')' -- \ TFBS/PatternGen/YMF.pm $(INST_LIB)/TFBS/PatternGen/YMF.pm \ TFBS/DB/TRANSFAC.pm $(INST_LIB)/TFBS/DB/TRANSFAC.pm \ TFBS/DB.pm $(INST_LIB)/TFBS/DB.pm \ TFBS/DB/LocalTRANSFAC.pm $(INST_LIB)/TFBS/DB/LocalTRANSFAC.pm \ TFBS/PatternGen/MEME.pm $(INST_LIB)/TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/Gibbs.pm $(INST_LIB)/TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Elph.pm $(INST_LIB)/TFBS/PatternGen/Elph.pm \ TFBS/DB/JASPAR4.pm $(INST_LIB)/TFBS/DB/JASPAR4.pm \ TFBS/PatternGenI.pm $(INST_LIB)/TFBS/PatternGenI.pm \ TFBS/Matrix/ICM.pm $(INST_LIB)/TFBS/Matrix/ICM.pm \ TFBS/PatternGen/YMF/Motif.pm $(INST_LIB)/TFBS/PatternGen/YMF/Motif.pm \ TFBS/SitePairSet.pm $(INST_LIB)/TFBS/SitePairSet.pm \ TFBS/PatternGen/Elph/Motif.pm $(INST_LIB)/TFBS/PatternGen/Elph/Motif.pm \ TFBS/Site.pm $(INST_LIB)/TFBS/Site.pm \ TFBS/_Iterator/_MatrixSetIterator.pm $(INST_LIB)/TFBS/_Iterator/_MatrixSetIterator.pm \ TFBS/Tools/SetOperations.pm $(INST_LIB)/TFBS/Tools/SetOperations.pm \ TFBS/Matrix/PFM.pm $(INST_LIB)/TFBS/Matrix/PFM.pm \ TFBS/PatternI.pm $(INST_LIB)/TFBS/PatternI.pm \ TFBS/Word/Consensus.pm $(INST_LIB)/TFBS/Word/Consensus.pm \ TFBS/PatternGen/Gibbs/Motif.pm $(INST_LIB)/TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/SiteSet.pm $(INST_LIB)/TFBS/SiteSet.pm \ TFBS/Matrix/_Alignment.pm $(INST_LIB)/TFBS/Matrix/_Alignment.pm \ TFBS/PatternGen/AnnSpec.pm $(INST_LIB)/TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/Motif/Word.pm 
$(INST_LIB)/TFBS/PatternGen/Motif/Word.pm \ TFBS/SitePair.pm $(INST_LIB)/TFBS/SitePair.pm \ TFBS/_Iterator/_SiteSetIterator.pm $(INST_LIB)/TFBS/_Iterator/_SiteSetIterator.pm \ TFBS/_Iterator.pm $(INST_LIB)/TFBS/_Iterator.pm \ TFBS/PatternGen/MEME/Motif.pm $(INST_LIB)/TFBS/PatternGen/MEME/Motif.pm \ TFBS/DB/FlatFileDir.pm $(INST_LIB)/TFBS/DB/FlatFileDir.pm \ TFBS/PatternGen.pm $(INST_LIB)/TFBS/PatternGen.pm \ TFBS/MatrixSet.pm $(INST_LIB)/TFBS/MatrixSet.pm \ TFBS/PatternGen/SimplePFM.pm $(INST_LIB)/TFBS/PatternGen/SimplePFM.pm \ TFBS/Word.pm $(INST_LIB)/TFBS/Word.pm \ TFBS/DB/JASPAR2.pm $(INST_LIB)/TFBS/DB/JASPAR2.pm \ TFBS/Matrix.pm $(INST_LIB)/TFBS/Matrix.pm \ TFBS/PatternGen/AnnSpec/Motif.pm $(INST_LIB)/TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Motif/Matrix.pm $(INST_LIB)/TFBS/PatternGen/Motif/Matrix.pm \ TFBS/Matrix/PWM.pm $(INST_LIB)/TFBS/Matrix/PWM.pm $(NOECHO) $(TOUCH) pm_to_blib # --- MakeMaker selfdocument section: # --- MakeMaker postamble section: # End. TFBS-0.7.1/Makefile.PL000066400000000000000000000072611305752266700142520ustar00rootroot00000000000000require 5.006; use ExtUtils::MakeMaker; my $NAME = 'TFBS'; my $DISTNAME = "TFBS"; my $VERSION = "0.5.0"; get_sql_data(); WriteMakefile( NAME => $NAME, DISTNAME => $DISTNAME, VERSION => $VERSION, 'dist' => { COMPRESS => 'gzip -9f', SUFFIX => '.gz', DIST_DEFAULT => 'all tardist', }, ); sub get_sql_data { my $ans = "abc"; do { print "Do you have write access to a MySQL database server? 
[n] ";
        # Read the answer from standard input.
        $ans = <STDIN>;
        chomp $ans;
    } until $ans =~ /^[yn]/i or $ans eq "";  # accept y/n (any case) or an empty answer
    if (uc(substr($ans,0,1)) eq 'Y') {
        print "\nOK, tell me more about it.\n\n";
        print "\tHost name : [localhost] ";
        my $hostname = <STDIN>;
        chomp $hostname;
        $hostname = 'localhost' unless $hostname;
        print "\tUsername : [none] ";
        my $username = <STDIN>;
        chomp $username;
        $username = '' unless $username;
        print "\tPassword : [none] ";
        my $password = <STDIN>;
        chomp $password;
        $password = '' unless $password;
        # Store the connection details for the test suite.
        open FILE, ">t/MYSQLCONNECT" or die "Can't write to t/ directory, stopped";
        print FILE join("::", $hostname, $username, $password, " ");
        close FILE;
    }
    else {
        unlink "t/MYSQLCONNECT" if -e "t/MYSQLCONNECT";
    }
}

BEGIN {
    my $fail = 0;
    unless (eval "use Bio::Root::RootI;1") {
        $fail = 1;
        print qq!
-------------------------------------------------
WARNING
-------------------------------------------------
Bioperl does not seem to be installed.
Bioperl 1.0 or newer is unconditionally required by TFBS.
Please install Bioperl BEFORE proceeding with TFBS installation.
Go to http://bioperl.org for information on how to obtain and install it.
-------------------------------------------------
!;
    }
    unless (eval "use PDL; 1") {
        $fail = 1;
        print qq!
-------------------------------------------------
WARNING
-------------------------------------------------
PDL (Perl Data Language) does not seem to be installed.
PDL is unconditionally required by TFBS.
Please install PDL BEFORE proceeding with TFBS installation.
Go to http://pdl.perl.org for information on how to obtain and install it.
NOTE FOR LINUX USERS: PDL binary packages (.rpm, .deb) are included
in all major Linux distributions and repositories. Unless you are
an advanced Linux user, it is recommended that you install PDL
from one of these packages, or from CPAN command line.
-------------------------------------------------
!;
    }
    unless (eval "use File::Temp; 1") {
        $fail = 1;
        print qq!
-------------------------------------------------
WARNING
-------------------------------------------------
File::Temp package does not seem to be installed.
File::Temp is unconditionally required by TFBS.
Please install File::Temp BEFORE proceeding with TFBS installation.
The package is available from CPAN (http://cpan.perl.org/).
-------------------------------------------------
!;
    }
    if ($fail) {
        print STDERR "TFBS installation aborted.\n";
        print STDERR "Please install one or more missing modules before proceeding\n\n";
        exit(0);
    }
    unless (eval "use GD; 1") { # do not fail
        print qq!
-------------------------------------------------
WARNING
-------------------------------------------------
GD.pm does not seem to be installed.
GD is required to produce "sequence logos" from information content matrices.
If you need this functionality, please visit http://stein.cshl.org/WWW/software/GD/
for information on obtaining and installing GD.
-------------------------------------------------
!;
    }
};
TFBS-0.7.1/Makefile.old000066400000000000000000001117041305752266700145130ustar00rootroot00000000000000
# This Makefile is for the TFBS extension to perl.
#
# It was generated automatically by MakeMaker version
# 6.68 (Revision: 66800) from the contents of
# Makefile.PL. Don't edit this file, edit Makefile.PL instead.
#
# ANY CHANGES MADE HERE WILL BE LOST!
#
# MakeMaker ARGV: ()
#
# MakeMaker Parameters:
# BUILD_REQUIRES => { }
# CONFIGURE_REQUIRES => { }
# DISTNAME => q[TFBS]
# NAME => q[TFBS]
# PREREQ_PM => { }
# TEST_REQUIRES => { }
# VERSION => q[0.5.0]
# dist => { DIST_DEFAULT=>q[all tardist], COMPRESS=>q[gzip -9f], SUFFIX=>q[.gz] }

# --- MakeMaker post_initialize section:

# --- MakeMaker const_config section:

# These definitions are from config.sh (via /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level/Config.pm).
# They may have been overridden via Makefile.PL or on the command line.
AR = ar CC = cc CCCDLFLAGS = CCDLFLAGS = DLEXT = bundle DLSRC = dl_dlopen.xs EXE_EXT = FULL_AR = /usr/bin/ar LD = env MACOSX_DEPLOYMENT_TARGET=10.3 cc LDDLFLAGS = -bundle -undefined dynamic_lookup -L/usr/local/lib -fstack-protector LDFLAGS = -fstack-protector -L/usr/local/lib LIBC = LIB_EXT = .a OBJ_EXT = .o OSNAME = darwin OSVERS = 12.2.1 RANLIB = ranlib SITELIBEXP = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3 SITEARCHEXP = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3/darwin-2level SO = dylib VENDORARCHEXP = VENDORLIBEXP = # --- MakeMaker constants section: AR_STATIC_ARGS = cr DIRFILESEP = / DFSEP = $(DIRFILESEP) NAME = TFBS NAME_SYM = TFBS VERSION = 0.5.0 VERSION_MACRO = VERSION VERSION_SYM = 0_5_0 DEFINE_VERSION = -D$(VERSION_MACRO)=\"$(VERSION)\" XS_VERSION = 0.5.0 XS_VERSION_MACRO = XS_VERSION XS_DEFINE_VERSION = -D$(XS_VERSION_MACRO)=\"$(XS_VERSION)\" INST_ARCHLIB = blib/arch INST_SCRIPT = blib/script INST_BIN = blib/bin INST_LIB = blib/lib INST_MAN1DIR = blib/man1 INST_MAN3DIR = blib/man3 MAN1EXT = 1 MAN3EXT = 3 INSTALLDIRS = site DESTDIR = PREFIX = $(SITEPREFIX) PERLPREFIX = /usr/local/Cellar/perl/5.14.3 SITEPREFIX = /usr/local/Cellar/perl/5.14.3 VENDORPREFIX = INSTALLPRIVLIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3 DESTINSTALLPRIVLIB = $(DESTDIR)$(INSTALLPRIVLIB) INSTALLSITELIB = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3 DESTINSTALLSITELIB = $(DESTDIR)$(INSTALLSITELIB) INSTALLVENDORLIB = DESTINSTALLVENDORLIB = $(DESTDIR)$(INSTALLVENDORLIB) INSTALLARCHLIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level DESTINSTALLARCHLIB = $(DESTDIR)$(INSTALLARCHLIB) INSTALLSITEARCH = /usr/local/Cellar/perl/5.14.3/lib/site_perl/5.14.3/darwin-2level DESTINSTALLSITEARCH = $(DESTDIR)$(INSTALLSITEARCH) INSTALLVENDORARCH = DESTINSTALLVENDORARCH = $(DESTDIR)$(INSTALLVENDORARCH) INSTALLBIN = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLBIN = $(DESTDIR)$(INSTALLBIN) INSTALLSITEBIN = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLSITEBIN = 
$(DESTDIR)$(INSTALLSITEBIN) INSTALLVENDORBIN = DESTINSTALLVENDORBIN = $(DESTDIR)$(INSTALLVENDORBIN) INSTALLSCRIPT = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLSCRIPT = $(DESTDIR)$(INSTALLSCRIPT) INSTALLSITESCRIPT = /usr/local/Cellar/perl/5.14.3/bin DESTINSTALLSITESCRIPT = $(DESTDIR)$(INSTALLSITESCRIPT) INSTALLVENDORSCRIPT = DESTINSTALLVENDORSCRIPT = $(DESTDIR)$(INSTALLVENDORSCRIPT) INSTALLMAN1DIR = /usr/local/Cellar/perl/5.14.3/share/man/man1 DESTINSTALLMAN1DIR = $(DESTDIR)$(INSTALLMAN1DIR) INSTALLSITEMAN1DIR = /usr/local/Cellar/perl/5.14.3/share/man/man1 DESTINSTALLSITEMAN1DIR = $(DESTDIR)$(INSTALLSITEMAN1DIR) INSTALLVENDORMAN1DIR = DESTINSTALLVENDORMAN1DIR = $(DESTDIR)$(INSTALLVENDORMAN1DIR) INSTALLMAN3DIR = /usr/local/Cellar/perl/5.14.3/share/man/man3 DESTINSTALLMAN3DIR = $(DESTDIR)$(INSTALLMAN3DIR) INSTALLSITEMAN3DIR = /usr/local/Cellar/perl/5.14.3/share/man/man3 DESTINSTALLSITEMAN3DIR = $(DESTDIR)$(INSTALLSITEMAN3DIR) INSTALLVENDORMAN3DIR = DESTINSTALLVENDORMAN3DIR = $(DESTDIR)$(INSTALLVENDORMAN3DIR) PERL_LIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3 PERL_ARCHLIB = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level LIBPERL_A = libperl.a FIRST_MAKEFILE = Makefile MAKEFILE_OLD = Makefile.old MAKE_APERL_FILE = Makefile.aperl PERLMAINCC = $(CC) PERL_INC = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/darwin-2level/CORE PERL = /usr/local/bin/perl FULLPERL = /usr/local/bin/perl ABSPERL = $(PERL) PERLRUN = $(PERL) FULLPERLRUN = $(FULLPERL) ABSPERLRUN = $(ABSPERL) PERLRUNINST = $(PERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" FULLPERLRUNINST = $(FULLPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" ABSPERLRUNINST = $(ABSPERLRUN) "-I$(INST_ARCHLIB)" "-I$(INST_LIB)" PERL_CORE = 0 PERM_DIR = 755 PERM_RW = 644 PERM_RWX = 755 MAKEMAKER = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils/MakeMaker.pm MM_VERSION = 6.68 MM_REVISION = 66800 # FULLEXT = Pathname for extension directory (eg Foo/Bar/Oracle). # BASEEXT = Basename part of FULLEXT. May be just equal FULLEXT. 
(eg Oracle) # PARENT_NAME = NAME without BASEEXT and no trailing :: (eg Foo::Bar) # DLBASE = Basename part of dynamic library. May be just equal BASEEXT. MAKE = make FULLEXT = TFBS BASEEXT = TFBS PARENT_NAME = DLBASE = $(BASEEXT) VERSION_FROM = OBJECT = LDFROM = $(OBJECT) LINKTYPE = dynamic BOOTDEP = # Handy lists of source code files: XS_FILES = C_FILES = O_FILES = H_FILES = MAN1PODS = MAN3PODS = TFBS/DB/FlatFileDir.pm \ TFBS/DB/JASPAR2.pm \ TFBS/DB/JASPAR4.pm \ TFBS/DB/LocalTRANSFAC.pm \ TFBS/DB/TRANSFAC.pm \ TFBS/Matrix.pm \ TFBS/Matrix/ICM.pm \ TFBS/Matrix/PFM.pm \ TFBS/Matrix/PWM.pm \ TFBS/MatrixSet.pm \ TFBS/PatternGen.pm \ TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Elph.pm \ TFBS/PatternGen/Elph/Motif.pm \ TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/MEME/Motif.pm \ TFBS/PatternGen/SimplePFM.pm \ TFBS/PatternGen/YMF.pm \ TFBS/PatternGen/YMF/Motif.pm \ TFBS/PatternI.pm \ TFBS/Site.pm \ TFBS/SitePair.pm \ TFBS/SitePairSet.pm \ TFBS/SiteSet.pm \ TFBS/Word.pm \ TFBS/Word/Consensus.pm # Where is the Config information that we are using/depend on CONFIGDEP = $(PERL_ARCHLIB)$(DFSEP)Config.pm $(PERL_INC)$(DFSEP)config.h # Where to build things INST_LIBDIR = $(INST_LIB) INST_ARCHLIBDIR = $(INST_ARCHLIB) INST_AUTODIR = $(INST_LIB)/auto/$(FULLEXT) INST_ARCHAUTODIR = $(INST_ARCHLIB)/auto/$(FULLEXT) INST_STATIC = INST_DYNAMIC = INST_BOOT = # Extra linker info EXPORT_LIST = PERL_ARCHIVE = PERL_ARCHIVE_AFTER = TO_INST_PM = TFBS/DB.pm \ TFBS/DB/FlatFileDir.pm \ TFBS/DB/JASPAR2.pm \ TFBS/DB/JASPAR4.pm \ TFBS/DB/LocalTRANSFAC.pm \ TFBS/DB/TRANSFAC.pm \ TFBS/Matrix.pm \ TFBS/Matrix/ICM.pm \ TFBS/Matrix/PFM.pm \ TFBS/Matrix/PWM.pm \ TFBS/Matrix/_Alignment.pm \ TFBS/MatrixSet.pm \ TFBS/PatternGen.pm \ TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Elph.pm \ TFBS/PatternGen/Elph/Motif.pm \ TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ 
TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/MEME/Motif.pm \ TFBS/PatternGen/Motif/Matrix.pm \ TFBS/PatternGen/Motif/Word.pm \ TFBS/PatternGen/SimplePFM.pm \ TFBS/PatternGen/YMF.pm \ TFBS/PatternGen/YMF/Motif.pm \ TFBS/PatternGenI.pm \ TFBS/PatternI.pm \ TFBS/Site.pm \ TFBS/SitePair.pm \ TFBS/SitePairSet.pm \ TFBS/SiteSet.pm \ TFBS/Tools/SetOperations.pm \ TFBS/Word.pm \ TFBS/Word/Consensus.pm \ TFBS/_Iterator.pm \ TFBS/_Iterator/_MatrixSetIterator.pm \ TFBS/_Iterator/_SiteSetIterator.pm PM_TO_BLIB = TFBS/PatternGen/YMF.pm \ $(INST_LIB)/TFBS/PatternGen/YMF.pm \ TFBS/DB/TRANSFAC.pm \ $(INST_LIB)/TFBS/DB/TRANSFAC.pm \ TFBS/DB.pm \ $(INST_LIB)/TFBS/DB.pm \ TFBS/DB/LocalTRANSFAC.pm \ $(INST_LIB)/TFBS/DB/LocalTRANSFAC.pm \ TFBS/PatternGen/MEME.pm \ $(INST_LIB)/TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/Gibbs.pm \ $(INST_LIB)/TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Elph.pm \ $(INST_LIB)/TFBS/PatternGen/Elph.pm \ TFBS/DB/JASPAR4.pm \ $(INST_LIB)/TFBS/DB/JASPAR4.pm \ TFBS/PatternGenI.pm \ $(INST_LIB)/TFBS/PatternGenI.pm \ TFBS/Matrix/ICM.pm \ $(INST_LIB)/TFBS/Matrix/ICM.pm \ TFBS/PatternGen/YMF/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/YMF/Motif.pm \ TFBS/SitePairSet.pm \ $(INST_LIB)/TFBS/SitePairSet.pm \ TFBS/PatternGen/Elph/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/Elph/Motif.pm \ TFBS/Site.pm \ $(INST_LIB)/TFBS/Site.pm \ TFBS/_Iterator/_MatrixSetIterator.pm \ $(INST_LIB)/TFBS/_Iterator/_MatrixSetIterator.pm \ TFBS/Tools/SetOperations.pm \ $(INST_LIB)/TFBS/Tools/SetOperations.pm \ TFBS/Matrix/PFM.pm \ $(INST_LIB)/TFBS/Matrix/PFM.pm \ TFBS/PatternI.pm \ $(INST_LIB)/TFBS/PatternI.pm \ TFBS/Word/Consensus.pm \ $(INST_LIB)/TFBS/Word/Consensus.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/SiteSet.pm \ $(INST_LIB)/TFBS/SiteSet.pm \ TFBS/Matrix/_Alignment.pm \ $(INST_LIB)/TFBS/Matrix/_Alignment.pm \ TFBS/PatternGen/AnnSpec.pm \ $(INST_LIB)/TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/Motif/Word.pm \ $(INST_LIB)/TFBS/PatternGen/Motif/Word.pm \ 
TFBS/SitePair.pm \ $(INST_LIB)/TFBS/SitePair.pm \ TFBS/_Iterator/_SiteSetIterator.pm \ $(INST_LIB)/TFBS/_Iterator/_SiteSetIterator.pm \ TFBS/_Iterator.pm \ $(INST_LIB)/TFBS/_Iterator.pm \ TFBS/PatternGen/MEME/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/MEME/Motif.pm \ TFBS/DB/FlatFileDir.pm \ $(INST_LIB)/TFBS/DB/FlatFileDir.pm \ TFBS/PatternGen.pm \ $(INST_LIB)/TFBS/PatternGen.pm \ TFBS/MatrixSet.pm \ $(INST_LIB)/TFBS/MatrixSet.pm \ TFBS/PatternGen/SimplePFM.pm \ $(INST_LIB)/TFBS/PatternGen/SimplePFM.pm \ TFBS/Word.pm \ $(INST_LIB)/TFBS/Word.pm \ TFBS/DB/JASPAR2.pm \ $(INST_LIB)/TFBS/DB/JASPAR2.pm \ TFBS/Matrix.pm \ $(INST_LIB)/TFBS/Matrix.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ $(INST_LIB)/TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/PatternGen/Motif/Matrix.pm \ $(INST_LIB)/TFBS/PatternGen/Motif/Matrix.pm \ TFBS/Matrix/PWM.pm \ $(INST_LIB)/TFBS/Matrix/PWM.pm # --- MakeMaker platform_constants section: MM_Unix_VERSION = 6.68 PERL_MALLOC_DEF = -DPERL_EXTMALLOC_DEF -Dmalloc=Perl_malloc -Dfree=Perl_mfree -Drealloc=Perl_realloc -Dcalloc=Perl_calloc # --- MakeMaker tool_autosplit section: # Usage: $(AUTOSPLITFILE) FileToSplit AutoDirToSplitInto AUTOSPLITFILE = $(ABSPERLRUN) -e 'use AutoSplit; autosplit($$$$ARGV[0], $$$$ARGV[1], 0, 1, 1)' -- # --- MakeMaker tool_xsubpp section: XSUBPPDIR = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils XSUBPP = $(XSUBPPDIR)$(DFSEP)xsubpp XSUBPPRUN = $(PERLRUN) $(XSUBPP) XSPROTOARG = XSUBPPDEPS = /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils/typemap $(XSUBPP) XSUBPPARGS = -typemap /usr/local/Cellar/perl/5.14.3/lib/5.14.3/ExtUtils/typemap XSUBPP_EXTRA_ARGS = # --- MakeMaker tools_other section: SHELL = /bin/sh CHMOD = chmod CP = cp MV = mv NOOP = $(TRUE) NOECHO = @ RM_F = rm -f RM_RF = rm -rf TEST_F = test -f TOUCH = touch UMASK_NULL = umask 0 DEV_NULL = > /dev/null 2>&1 MKPATH = $(ABSPERLRUN) -MExtUtils::Command -e 'mkpath' -- EQUALIZE_TIMESTAMP = $(ABSPERLRUN) -MExtUtils::Command -e 'eqtime' -- FALSE = false TRUE = true ECHO = echo ECHO_N = 
echo -n UNINST = 0 VERBINST = 0 MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e 'install([ from_to => {@ARGV}, verbose => '\''$(VERBINST)'\'', uninstall_shadows => '\''$(UNINST)'\'', dir_mode => '\''$(PERM_DIR)'\'' ]);' -- DOC_INSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'perllocal_install' -- UNINSTALL = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'uninstall' -- WARN_IF_OLD_PACKLIST = $(ABSPERLRUN) -MExtUtils::Command::MM -e 'warn_if_old_packlist' -- MACROSTART = MACROEND = USEMAKEFILE = -f FIXIN = $(ABSPERLRUN) -MExtUtils::MY -e 'MY->fixin(shift)' -- # --- MakeMaker makemakerdflt section: makemakerdflt : all $(NOECHO) $(NOOP) # --- MakeMaker dist section: TAR = COPY_EXTENDED_ATTRIBUTES_DISABLE=1 COPYFILE_DISABLE=1 tar TARFLAGS = cvf ZIP = zip ZIPFLAGS = -r COMPRESS = gzip -9f SUFFIX = .gz SHAR = shar PREOP = $(NOECHO) $(NOOP) POSTOP = $(NOECHO) $(NOOP) TO_UNIX = $(NOECHO) $(NOOP) CI = ci -u RCS_LABEL = rcs -Nv$(VERSION_SYM): -q DIST_CP = best DIST_DEFAULT = all tardist DISTNAME = TFBS DISTVNAME = TFBS-0.5.0 # --- MakeMaker macro section: # --- MakeMaker depend section: # --- MakeMaker cflags section: CCFLAGS = -fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include OPTIMIZE = -O3 PERLTYPE = MPOLLUTE = # --- MakeMaker const_loadlibs section: # TFBS might depend on some other libraries: # See ExtUtils::Liblist for details # # --- MakeMaker const_cccmd section: CCCMD = $(CC) -c $(PASTHRU_INC) $(INC) \ $(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) # --- MakeMaker post_constants section: # --- MakeMaker pasthru section: PASTHRU = LIBPERL_A="$(LIBPERL_A)"\ LINKTYPE="$(LINKTYPE)"\ OPTIMIZE="$(OPTIMIZE)"\ PREFIX="$(PREFIX)" # --- MakeMaker special_targets section: .SUFFIXES : .xs .c .C .cpp .i .s .cxx .cc $(OBJ_EXT) .PHONY: all config static dynamic test linkext manifest blibdirs clean realclean disttest distdir # --- MakeMaker c_o section: .c.i: cc -E -c $(PASTHRU_INC) $(INC) \ 
$(CCFLAGS) $(OPTIMIZE) \ $(PERLTYPE) $(MPOLLUTE) $(DEFINE_VERSION) \ $(XS_DEFINE_VERSION) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c > $*.i .c.s: $(CCCMD) -S $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .c$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c .cpp$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cpp .cxx$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cxx .cc$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.cc .C$(OBJ_EXT): $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.C # --- MakeMaker xs_c section: .xs.c: $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $(XSUBPP_EXTRA_ARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c # --- MakeMaker xs_o section: .xs$(OBJ_EXT): $(XSUBPPRUN) $(XSPROTOARG) $(XSUBPPARGS) $*.xs > $*.xsc && $(MV) $*.xsc $*.c $(CCCMD) $(CCCDLFLAGS) "-I$(PERL_INC)" $(PASTHRU_DEFINE) $(DEFINE) $*.c # --- MakeMaker top_targets section: all :: pure_all manifypods $(NOECHO) $(NOOP) pure_all :: config pm_to_blib subdirs linkext $(NOECHO) $(NOOP) subdirs :: $(MYEXTLIB) $(NOECHO) $(NOOP) config :: $(FIRST_MAKEFILE) blibdirs $(NOECHO) $(NOOP) help : perldoc ExtUtils::MakeMaker # --- MakeMaker blibdirs section: blibdirs : $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists $(NOECHO) $(NOOP) # Backwards compat with 6.18 through 6.25 blibdirs.ts : blibdirs $(NOECHO) $(NOOP) $(INST_LIBDIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_LIBDIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_LIBDIR) $(NOECHO) $(TOUCH) $(INST_LIBDIR)$(DFSEP).exists $(INST_ARCHLIB)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHLIB) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHLIB) $(NOECHO) 
$(TOUCH) $(INST_ARCHLIB)$(DFSEP).exists $(INST_AUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_AUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_AUTODIR) $(NOECHO) $(TOUCH) $(INST_AUTODIR)$(DFSEP).exists $(INST_ARCHAUTODIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_ARCHAUTODIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_ARCHAUTODIR) $(NOECHO) $(TOUCH) $(INST_ARCHAUTODIR)$(DFSEP).exists $(INST_BIN)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_BIN) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_BIN) $(NOECHO) $(TOUCH) $(INST_BIN)$(DFSEP).exists $(INST_SCRIPT)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_SCRIPT) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_SCRIPT) $(NOECHO) $(TOUCH) $(INST_SCRIPT)$(DFSEP).exists $(INST_MAN1DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN1DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN1DIR) $(NOECHO) $(TOUCH) $(INST_MAN1DIR)$(DFSEP).exists $(INST_MAN3DIR)$(DFSEP).exists :: Makefile.PL $(NOECHO) $(MKPATH) $(INST_MAN3DIR) $(NOECHO) $(CHMOD) $(PERM_DIR) $(INST_MAN3DIR) $(NOECHO) $(TOUCH) $(INST_MAN3DIR)$(DFSEP).exists # --- MakeMaker linkext section: linkext :: $(LINKTYPE) $(NOECHO) $(NOOP) # --- MakeMaker dlsyms section: # --- MakeMaker dynamic section: dynamic :: $(FIRST_MAKEFILE) $(INST_DYNAMIC) $(INST_BOOT) $(NOECHO) $(NOOP) # --- MakeMaker dynamic_bs section: BOOTSTRAP = # --- MakeMaker dynamic_lib section: # --- MakeMaker static section: ## $(INST_PM) has been moved to the all: target. 
## It remains here for awhile to allow for old usage: "make static" static :: $(FIRST_MAKEFILE) $(INST_STATIC) $(NOECHO) $(NOOP) # --- MakeMaker static_lib section: # --- MakeMaker manifypods section: POD2MAN_EXE = $(PERLRUN) "-MExtUtils::Command::MM" -e pod2man "--" POD2MAN = $(POD2MAN_EXE) manifypods : pure_all \ TFBS/Matrix/PFM.pm \ TFBS/PatternI.pm \ TFBS/PatternGen/YMF.pm \ TFBS/DB/TRANSFAC.pm \ TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/Word/Consensus.pm \ TFBS/SiteSet.pm \ TFBS/DB/LocalTRANSFAC.pm \ TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/MEME.pm \ TFBS/SitePair.pm \ TFBS/PatternGen/Elph.pm \ TFBS/DB/FlatFileDir.pm \ TFBS/PatternGen/MEME/Motif.pm \ TFBS/DB/JASPAR4.pm \ TFBS/PatternGen.pm \ TFBS/MatrixSet.pm \ TFBS/PatternGen/SimplePFM.pm \ TFBS/Matrix/ICM.pm \ TFBS/Word.pm \ TFBS/SitePairSet.pm \ TFBS/PatternGen/YMF/Motif.pm \ TFBS/DB/JASPAR2.pm \ TFBS/Matrix.pm \ TFBS/PatternGen/Elph/Motif.pm \ TFBS/PatternGen/AnnSpec/Motif.pm \ TFBS/Site.pm \ TFBS/Matrix/PWM.pm $(NOECHO) $(POD2MAN) --section=3 --perm_rw=$(PERM_RW) \ TFBS/Matrix/PFM.pm $(INST_MAN3DIR)/TFBS::Matrix::PFM.$(MAN3EXT) \ TFBS/PatternI.pm $(INST_MAN3DIR)/TFBS::PatternI.$(MAN3EXT) \ TFBS/PatternGen/YMF.pm $(INST_MAN3DIR)/TFBS::PatternGen::YMF.$(MAN3EXT) \ TFBS/DB/TRANSFAC.pm $(INST_MAN3DIR)/TFBS::DB::TRANSFAC.$(MAN3EXT) \ TFBS/PatternGen/Gibbs/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::Gibbs::Motif.$(MAN3EXT) \ TFBS/Word/Consensus.pm $(INST_MAN3DIR)/TFBS::Word::Consensus.$(MAN3EXT) \ TFBS/SiteSet.pm $(INST_MAN3DIR)/TFBS::SiteSet.$(MAN3EXT) \ TFBS/DB/LocalTRANSFAC.pm $(INST_MAN3DIR)/TFBS::DB::LocalTRANSFAC.$(MAN3EXT) \ TFBS/PatternGen/AnnSpec.pm $(INST_MAN3DIR)/TFBS::PatternGen::AnnSpec.$(MAN3EXT) \ TFBS/PatternGen/Gibbs.pm $(INST_MAN3DIR)/TFBS::PatternGen::Gibbs.$(MAN3EXT) \ TFBS/PatternGen/MEME.pm $(INST_MAN3DIR)/TFBS::PatternGen::MEME.$(MAN3EXT) \ TFBS/SitePair.pm $(INST_MAN3DIR)/TFBS::SitePair.$(MAN3EXT) \ TFBS/PatternGen/Elph.pm 
$(INST_MAN3DIR)/TFBS::PatternGen::Elph.$(MAN3EXT) \ TFBS/DB/FlatFileDir.pm $(INST_MAN3DIR)/TFBS::DB::FlatFileDir.$(MAN3EXT) \ TFBS/PatternGen/MEME/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::MEME::Motif.$(MAN3EXT) \ TFBS/DB/JASPAR4.pm $(INST_MAN3DIR)/TFBS::DB::JASPAR4.$(MAN3EXT) \ TFBS/PatternGen.pm $(INST_MAN3DIR)/TFBS::PatternGen.$(MAN3EXT) \ TFBS/MatrixSet.pm $(INST_MAN3DIR)/TFBS::MatrixSet.$(MAN3EXT) \ TFBS/PatternGen/SimplePFM.pm $(INST_MAN3DIR)/TFBS::PatternGen::SimplePFM.$(MAN3EXT) \ TFBS/Matrix/ICM.pm $(INST_MAN3DIR)/TFBS::Matrix::ICM.$(MAN3EXT) \ TFBS/Word.pm $(INST_MAN3DIR)/TFBS::Word.$(MAN3EXT) \ TFBS/SitePairSet.pm $(INST_MAN3DIR)/TFBS::SitePairSet.$(MAN3EXT) \ TFBS/PatternGen/YMF/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::YMF::Motif.$(MAN3EXT) \ TFBS/DB/JASPAR2.pm $(INST_MAN3DIR)/TFBS::DB::JASPAR2.$(MAN3EXT) \ TFBS/Matrix.pm $(INST_MAN3DIR)/TFBS::Matrix.$(MAN3EXT) \ TFBS/PatternGen/Elph/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::Elph::Motif.$(MAN3EXT) \ TFBS/PatternGen/AnnSpec/Motif.pm $(INST_MAN3DIR)/TFBS::PatternGen::AnnSpec::Motif.$(MAN3EXT) \ TFBS/Site.pm $(INST_MAN3DIR)/TFBS::Site.$(MAN3EXT) \ TFBS/Matrix/PWM.pm $(INST_MAN3DIR)/TFBS::Matrix::PWM.$(MAN3EXT) # --- MakeMaker processPL section: # --- MakeMaker installbin section: # --- MakeMaker subdirs section: # The default clean, realclean and test targets in this Makefile # have automatically been given entries for each subdir. subdirs :: $(NOECHO) cd Ext && $(MAKE) $(USEMAKEFILE) $(FIRST_MAKEFILE) all $(PASTHRU) # --- MakeMaker clean_subdirs section: clean_subdirs : $(ABSPERLRUN) -e 'chdir '\''Ext'\''; system '\''$(MAKE) clean'\'' if -f '\''$(FIRST_MAKEFILE)'\'';' -- # --- MakeMaker clean section: # Delete temporary files but do not touch installed files. We don't delete # the Makefile here so a later make realclean still has a makefile to use. 
clean :: clean_subdirs - $(RM_F) \ *$(LIB_EXT) core \ core.[0-9] $(INST_ARCHAUTODIR)/extralibs.all \ core.[0-9][0-9] $(BASEEXT).bso \ pm_to_blib.ts MYMETA.json \ core.[0-9][0-9][0-9][0-9] MYMETA.yml \ $(BASEEXT).x $(BOOTSTRAP) \ perl$(EXE_EXT) tmon.out \ *$(OBJ_EXT) pm_to_blib \ $(INST_ARCHAUTODIR)/extralibs.ld blibdirs.ts \ core.[0-9][0-9][0-9][0-9][0-9] *perl.core \ core.*perl.*.? $(MAKE_APERL_FILE) \ $(BASEEXT).def perl \ core.[0-9][0-9][0-9] mon.out \ lib$(BASEEXT).def perl.exe \ perlmain.c so_locations \ $(BASEEXT).exp - $(RM_RF) \ blib - $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) $(DEV_NULL) # --- MakeMaker realclean_subdirs section: realclean_subdirs : - $(ABSPERLRUN) -e 'chdir '\''Ext'\''; system '\''$(MAKE) $(USEMAKEFILE) $(MAKEFILE_OLD) realclean'\'' if -f '\''$(MAKEFILE_OLD)'\'';' -- - $(ABSPERLRUN) -e 'chdir '\''Ext'\''; system '\''$(MAKE) $(USEMAKEFILE) $(FIRST_MAKEFILE) realclean'\'' if -f '\''$(FIRST_MAKEFILE)'\'';' -- # --- MakeMaker realclean section: # Delete temporary files (via clean) and also delete dist files realclean purge :: clean realclean_subdirs - $(RM_F) \ $(MAKEFILE_OLD) $(FIRST_MAKEFILE) - $(RM_RF) \ $(DISTVNAME) # --- MakeMaker metafile section: metafile : create_distdir $(NOECHO) $(ECHO) Generating META.yml $(NOECHO) $(ECHO) '---' > META_new.yml $(NOECHO) $(ECHO) 'abstract: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'author:' >> META_new.yml $(NOECHO) $(ECHO) ' - unknown' >> META_new.yml $(NOECHO) $(ECHO) 'build_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'configure_requires:' >> META_new.yml $(NOECHO) $(ECHO) ' ExtUtils::MakeMaker: 0' >> META_new.yml $(NOECHO) $(ECHO) 'dynamic_config: 1' >> META_new.yml $(NOECHO) $(ECHO) 'generated_by: '\''ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.112621'\''' >> META_new.yml $(NOECHO) $(ECHO) 'license: unknown' >> META_new.yml $(NOECHO) $(ECHO) 'meta-spec:' >> META_new.yml $(NOECHO) $(ECHO) ' url: 
http://module-build.sourceforge.net/META-spec-v1.4.html' >> META_new.yml $(NOECHO) $(ECHO) ' version: 1.4' >> META_new.yml $(NOECHO) $(ECHO) 'name: TFBS' >> META_new.yml $(NOECHO) $(ECHO) 'no_index:' >> META_new.yml $(NOECHO) $(ECHO) ' directory:' >> META_new.yml $(NOECHO) $(ECHO) ' - t' >> META_new.yml $(NOECHO) $(ECHO) ' - inc' >> META_new.yml $(NOECHO) $(ECHO) 'requires: {}' >> META_new.yml $(NOECHO) $(ECHO) 'version: v0.5.0' >> META_new.yml -$(NOECHO) $(MV) META_new.yml $(DISTVNAME)/META.yml $(NOECHO) $(ECHO) Generating META.json $(NOECHO) $(ECHO) '{' > META_new.json $(NOECHO) $(ECHO) ' "abstract" : "unknown",' >> META_new.json $(NOECHO) $(ECHO) ' "author" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "dynamic_config" : 1,' >> META_new.json $(NOECHO) $(ECHO) ' "generated_by" : "ExtUtils::MakeMaker version 6.68, CPAN::Meta::Converter version 2.112621",' >> META_new.json $(NOECHO) $(ECHO) ' "license" : [' >> META_new.json $(NOECHO) $(ECHO) ' "unknown"' >> META_new.json $(NOECHO) $(ECHO) ' ],' >> META_new.json $(NOECHO) $(ECHO) ' "meta-spec" : {' >> META_new.json $(NOECHO) $(ECHO) ' "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "2"' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "name" : "TFBS",' >> META_new.json $(NOECHO) $(ECHO) ' "no_index" : {' >> META_new.json $(NOECHO) $(ECHO) ' "directory" : [' >> META_new.json $(NOECHO) $(ECHO) ' "t",' >> META_new.json $(NOECHO) $(ECHO) ' "inc"' >> META_new.json $(NOECHO) $(ECHO) ' ]' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "prereqs" : {' >> META_new.json $(NOECHO) $(ECHO) ' "build" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : 0' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) 
$(ECHO) ' "configure" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {' >> META_new.json $(NOECHO) $(ECHO) ' "ExtUtils::MakeMaker" : 0' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "runtime" : {' >> META_new.json $(NOECHO) $(ECHO) ' "requires" : {}' >> META_new.json $(NOECHO) $(ECHO) ' }' >> META_new.json $(NOECHO) $(ECHO) ' },' >> META_new.json $(NOECHO) $(ECHO) ' "release_status" : "stable",' >> META_new.json $(NOECHO) $(ECHO) ' "version" : "v0.5.0"' >> META_new.json $(NOECHO) $(ECHO) '}' >> META_new.json -$(NOECHO) $(MV) META_new.json $(DISTVNAME)/META.json # --- MakeMaker signature section: signature : cpansign -s # --- MakeMaker dist_basics section: distclean :: realclean distcheck $(NOECHO) $(NOOP) distcheck : $(PERLRUN) "-MExtUtils::Manifest=fullcheck" -e fullcheck skipcheck : $(PERLRUN) "-MExtUtils::Manifest=skipcheck" -e skipcheck manifest : $(PERLRUN) "-MExtUtils::Manifest=mkmanifest" -e mkmanifest veryclean : realclean $(RM_F) *~ */*~ *.orig */*.orig *.bak */*.bak *.old */*.old # --- MakeMaker dist_core section: dist : $(DIST_DEFAULT) $(FIRST_MAKEFILE) $(NOECHO) $(ABSPERLRUN) -l -e 'print '\''Warning: Makefile possibly out of date with $(VERSION_FROM)'\''' \ -e ' if -e '\''$(VERSION_FROM)'\'' and -M '\''$(VERSION_FROM)'\'' < -M '\''$(FIRST_MAKEFILE)'\'';' -- tardist : $(DISTVNAME).tar$(SUFFIX) $(NOECHO) $(NOOP) uutardist : $(DISTVNAME).tar$(SUFFIX) uuencode $(DISTVNAME).tar$(SUFFIX) $(DISTVNAME).tar$(SUFFIX) > $(DISTVNAME).tar$(SUFFIX)_uu $(DISTVNAME).tar$(SUFFIX) : distdir $(PREOP) $(TO_UNIX) $(TAR) $(TARFLAGS) $(DISTVNAME).tar $(DISTVNAME) $(RM_RF) $(DISTVNAME) $(COMPRESS) $(DISTVNAME).tar $(POSTOP) zipdist : $(DISTVNAME).zip $(NOECHO) $(NOOP) $(DISTVNAME).zip : distdir $(PREOP) $(ZIP) $(ZIPFLAGS) $(DISTVNAME).zip $(DISTVNAME) $(RM_RF) $(DISTVNAME) $(POSTOP) shdist : distdir $(PREOP) $(SHAR) $(DISTVNAME) > $(DISTVNAME).shar $(RM_RF) $(DISTVNAME) $(POSTOP) # --- MakeMaker 
distdir section: create_distdir : $(RM_RF) $(DISTVNAME) $(PERLRUN) "-MExtUtils::Manifest=manicopy,maniread" \ -e "manicopy(maniread(),'$(DISTVNAME)', '$(DIST_CP)');" distdir : create_distdir distmeta $(NOECHO) $(NOOP) # --- MakeMaker dist_test section: disttest : distdir cd $(DISTVNAME) && $(ABSPERLRUN) Makefile.PL cd $(DISTVNAME) && $(MAKE) $(PASTHRU) cd $(DISTVNAME) && $(MAKE) test $(PASTHRU) # --- MakeMaker dist_ci section: ci : $(PERLRUN) "-MExtUtils::Manifest=maniread" \ -e "@all = keys %{ maniread() };" \ -e "print(qq{Executing $(CI) @all\n}); system(qq{$(CI) @all});" \ -e "print(qq{Executing $(RCS_LABEL) ...\n}); system(qq{$(RCS_LABEL) @all});" # --- MakeMaker distmeta section: distmeta : create_distdir metafile $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -e q{META.yml};' \ -e 'eval { maniadd({q{META.yml} => q{Module YAML meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.yml to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'exit unless -f q{META.json};' \ -e 'eval { maniadd({q{META.json} => q{Module JSON meta-data (added by MakeMaker)}}) }' \ -e ' or print "Could not add META.json to MANIFEST: $$$${'\''@'\''}\n"' -- # --- MakeMaker distsignature section: distsignature : create_distdir $(NOECHO) cd $(DISTVNAME) && $(ABSPERLRUN) -MExtUtils::Manifest=maniadd -e 'eval { maniadd({q{SIGNATURE} => q{Public-key signature (added by MakeMaker)}}) } ' \ -e ' or print "Could not add SIGNATURE to MANIFEST: $$$${'\''@'\''}\n"' -- $(NOECHO) cd $(DISTVNAME) && $(TOUCH) SIGNATURE cd $(DISTVNAME) && cpansign -s # --- MakeMaker install section: install :: pure_install doc_install $(NOECHO) $(NOOP) install_perl :: pure_perl_install doc_perl_install $(NOECHO) $(NOOP) install_site :: pure_site_install doc_site_install $(NOECHO) $(NOOP) install_vendor :: pure_vendor_install doc_vendor_install $(NOECHO) $(NOOP) pure_install :: pure_$(INSTALLDIRS)_install 
$(NOECHO) $(NOOP) doc_install :: doc_$(INSTALLDIRS)_install $(NOECHO) $(NOOP) pure__install : pure_site_install $(NOECHO) $(ECHO) INSTALLDIRS not defined, defaulting to INSTALLDIRS=site doc__install : doc_site_install $(NOECHO) $(ECHO) INSTALLDIRS not defined, defaulting to INSTALLDIRS=site pure_perl_install :: all $(NOECHO) $(MOD_INSTALL) \ read $(PERL_ARCHLIB)/auto/$(FULLEXT)/.packlist \ write $(DESTINSTALLARCHLIB)/auto/$(FULLEXT)/.packlist \ $(INST_LIB) $(DESTINSTALLPRIVLIB) \ $(INST_ARCHLIB) $(DESTINSTALLARCHLIB) \ $(INST_BIN) $(DESTINSTALLBIN) \ $(INST_SCRIPT) $(DESTINSTALLSCRIPT) \ $(INST_MAN1DIR) $(DESTINSTALLMAN1DIR) \ $(INST_MAN3DIR) $(DESTINSTALLMAN3DIR) $(NOECHO) $(WARN_IF_OLD_PACKLIST) \ $(SITEARCHEXP)/auto/$(FULLEXT) pure_site_install :: all $(NOECHO) $(MOD_INSTALL) \ read $(SITEARCHEXP)/auto/$(FULLEXT)/.packlist \ write $(DESTINSTALLSITEARCH)/auto/$(FULLEXT)/.packlist \ $(INST_LIB) $(DESTINSTALLSITELIB) \ $(INST_ARCHLIB) $(DESTINSTALLSITEARCH) \ $(INST_BIN) $(DESTINSTALLSITEBIN) \ $(INST_SCRIPT) $(DESTINSTALLSITESCRIPT) \ $(INST_MAN1DIR) $(DESTINSTALLSITEMAN1DIR) \ $(INST_MAN3DIR) $(DESTINSTALLSITEMAN3DIR) $(NOECHO) $(WARN_IF_OLD_PACKLIST) \ $(PERL_ARCHLIB)/auto/$(FULLEXT) pure_vendor_install :: all $(NOECHO) $(MOD_INSTALL) \ read $(VENDORARCHEXP)/auto/$(FULLEXT)/.packlist \ write $(DESTINSTALLVENDORARCH)/auto/$(FULLEXT)/.packlist \ $(INST_LIB) $(DESTINSTALLVENDORLIB) \ $(INST_ARCHLIB) $(DESTINSTALLVENDORARCH) \ $(INST_BIN) $(DESTINSTALLVENDORBIN) \ $(INST_SCRIPT) $(DESTINSTALLVENDORSCRIPT) \ $(INST_MAN1DIR) $(DESTINSTALLVENDORMAN1DIR) \ $(INST_MAN3DIR) $(DESTINSTALLVENDORMAN3DIR) doc_perl_install :: all $(NOECHO) $(ECHO) Appending installation info to $(DESTINSTALLARCHLIB)/perllocal.pod -$(NOECHO) $(MKPATH) $(DESTINSTALLARCHLIB) -$(NOECHO) $(DOC_INSTALL) \ "Module" "$(NAME)" \ "installed into" "$(INSTALLPRIVLIB)" \ LINKTYPE "$(LINKTYPE)" \ VERSION "$(VERSION)" \ EXE_FILES "$(EXE_FILES)" \ >> $(DESTINSTALLARCHLIB)/perllocal.pod doc_site_install :: all 
$(NOECHO) $(ECHO) Appending installation info to $(DESTINSTALLARCHLIB)/perllocal.pod -$(NOECHO) $(MKPATH) $(DESTINSTALLARCHLIB) -$(NOECHO) $(DOC_INSTALL) \ "Module" "$(NAME)" \ "installed into" "$(INSTALLSITELIB)" \ LINKTYPE "$(LINKTYPE)" \ VERSION "$(VERSION)" \ EXE_FILES "$(EXE_FILES)" \ >> $(DESTINSTALLARCHLIB)/perllocal.pod doc_vendor_install :: all $(NOECHO) $(ECHO) Appending installation info to $(DESTINSTALLARCHLIB)/perllocal.pod -$(NOECHO) $(MKPATH) $(DESTINSTALLARCHLIB) -$(NOECHO) $(DOC_INSTALL) \ "Module" "$(NAME)" \ "installed into" "$(INSTALLVENDORLIB)" \ LINKTYPE "$(LINKTYPE)" \ VERSION "$(VERSION)" \ EXE_FILES "$(EXE_FILES)" \ >> $(DESTINSTALLARCHLIB)/perllocal.pod uninstall :: uninstall_from_$(INSTALLDIRS)dirs $(NOECHO) $(NOOP) uninstall_from_perldirs :: $(NOECHO) $(UNINSTALL) $(PERL_ARCHLIB)/auto/$(FULLEXT)/.packlist uninstall_from_sitedirs :: $(NOECHO) $(UNINSTALL) $(SITEARCHEXP)/auto/$(FULLEXT)/.packlist uninstall_from_vendordirs :: $(NOECHO) $(UNINSTALL) $(VENDORARCHEXP)/auto/$(FULLEXT)/.packlist # --- MakeMaker force section: # Phony target to force checking subdirectories. FORCE : $(NOECHO) $(NOOP) # --- MakeMaker perldepend section: # --- MakeMaker makefile section: # We take a very conservative approach here, but it's worth it. # We move Makefile to Makefile.old here to avoid gnu make looping. $(FIRST_MAKEFILE) : Makefile.PL $(CONFIGDEP) $(NOECHO) $(ECHO) "Makefile out-of-date with respect to $?" $(NOECHO) $(ECHO) "Cleaning current config before rebuilding Makefile..." -$(NOECHO) $(RM_F) $(MAKEFILE_OLD) -$(NOECHO) $(MV) $(FIRST_MAKEFILE) $(MAKEFILE_OLD) - $(MAKE) $(USEMAKEFILE) $(MAKEFILE_OLD) clean $(DEV_NULL) $(PERLRUN) Makefile.PL $(NOECHO) $(ECHO) "==> Your Makefile has been rebuilt. <==" $(NOECHO) $(ECHO) "==> Please rerun the $(MAKE) command. 
<==" $(FALSE) # --- MakeMaker staticmake section: # --- MakeMaker makeaperl section --- MAP_TARGET = perl FULLPERL = /usr/local/bin/perl $(MAP_TARGET) :: static $(MAKE_APERL_FILE) $(MAKE) $(USEMAKEFILE) $(MAKE_APERL_FILE) $@ $(MAKE_APERL_FILE) : $(FIRST_MAKEFILE) pm_to_blib $(NOECHO) $(ECHO) Writing \"$(MAKE_APERL_FILE)\" for this $(MAP_TARGET) $(NOECHO) $(PERLRUNINST) \ Makefile.PL DIR=Ext \ MAKEFILE=$(MAKE_APERL_FILE) LINKTYPE=static \ MAKEAPERL=1 NORECURS=1 CCCDLFLAGS= # --- MakeMaker test section: TEST_VERBOSE=0 TEST_TYPE=test_$(LINKTYPE) TEST_FILE = test.pl TEST_FILES = t/*.t TESTDB_SW = -d testdb :: testdb_$(LINKTYPE) test :: $(TEST_TYPE) subdirs-test subdirs-test :: $(NOECHO) $(NOOP) subdirs-test :: $(NOECHO) cd Ext && $(MAKE) test $(PASTHRU) test_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) test_ : test_dynamic test_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES) testdb_static :: pure_all $(MAP_TARGET) PERL_DL_NONLAZY=1 ./$(MAP_TARGET) $(TESTDB_SW) "-I$(INST_LIB)" "-I$(INST_ARCHLIB)" $(TEST_FILE) # --- MakeMaker ppd section: # Creates a PPD (Perl Package Description) for a binary distribution. 
ppd : $(NOECHO) $(ECHO) '' > $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) ' ' >> $(DISTNAME).ppd $(NOECHO) $(ECHO) '' >> $(DISTNAME).ppd # --- MakeMaker pm_to_blib section: pm_to_blib : $(FIRST_MAKEFILE) $(TO_INST_PM) $(NOECHO) $(ABSPERLRUN) -MExtUtils::Install -e 'pm_to_blib({@ARGV}, '\''$(INST_LIB)/auto'\'', q[$(PM_FILTER)], '\''$(PERM_DIR)'\'')' -- \ TFBS/PatternGen/YMF.pm $(INST_LIB)/TFBS/PatternGen/YMF.pm \ TFBS/DB/TRANSFAC.pm $(INST_LIB)/TFBS/DB/TRANSFAC.pm \ TFBS/DB.pm $(INST_LIB)/TFBS/DB.pm \ TFBS/DB/LocalTRANSFAC.pm $(INST_LIB)/TFBS/DB/LocalTRANSFAC.pm \ TFBS/PatternGen/MEME.pm $(INST_LIB)/TFBS/PatternGen/MEME.pm \ TFBS/PatternGen/Gibbs.pm $(INST_LIB)/TFBS/PatternGen/Gibbs.pm \ TFBS/PatternGen/Elph.pm $(INST_LIB)/TFBS/PatternGen/Elph.pm \ TFBS/DB/JASPAR4.pm $(INST_LIB)/TFBS/DB/JASPAR4.pm \ TFBS/PatternGenI.pm $(INST_LIB)/TFBS/PatternGenI.pm \ TFBS/Matrix/ICM.pm $(INST_LIB)/TFBS/Matrix/ICM.pm \ TFBS/PatternGen/YMF/Motif.pm $(INST_LIB)/TFBS/PatternGen/YMF/Motif.pm \ TFBS/SitePairSet.pm $(INST_LIB)/TFBS/SitePairSet.pm \ TFBS/PatternGen/Elph/Motif.pm $(INST_LIB)/TFBS/PatternGen/Elph/Motif.pm \ TFBS/Site.pm $(INST_LIB)/TFBS/Site.pm \ TFBS/_Iterator/_MatrixSetIterator.pm $(INST_LIB)/TFBS/_Iterator/_MatrixSetIterator.pm \ TFBS/Tools/SetOperations.pm $(INST_LIB)/TFBS/Tools/SetOperations.pm \ TFBS/Matrix/PFM.pm $(INST_LIB)/TFBS/Matrix/PFM.pm \ TFBS/PatternI.pm $(INST_LIB)/TFBS/PatternI.pm \ TFBS/Word/Consensus.pm $(INST_LIB)/TFBS/Word/Consensus.pm \ TFBS/PatternGen/Gibbs/Motif.pm $(INST_LIB)/TFBS/PatternGen/Gibbs/Motif.pm \ TFBS/SiteSet.pm $(INST_LIB)/TFBS/SiteSet.pm \ TFBS/Matrix/_Alignment.pm $(INST_LIB)/TFBS/Matrix/_Alignment.pm \ TFBS/PatternGen/AnnSpec.pm $(INST_LIB)/TFBS/PatternGen/AnnSpec.pm \ TFBS/PatternGen/Motif/Word.pm $(INST_LIB)/TFBS/PatternGen/Motif/Word.pm \ 
	  TFBS/SitePair.pm $(INST_LIB)/TFBS/SitePair.pm \
	  TFBS/_Iterator/_SiteSetIterator.pm $(INST_LIB)/TFBS/_Iterator/_SiteSetIterator.pm \
	  TFBS/_Iterator.pm $(INST_LIB)/TFBS/_Iterator.pm \
	  TFBS/PatternGen/MEME/Motif.pm $(INST_LIB)/TFBS/PatternGen/MEME/Motif.pm \
	  TFBS/DB/FlatFileDir.pm $(INST_LIB)/TFBS/DB/FlatFileDir.pm \
	  TFBS/PatternGen.pm $(INST_LIB)/TFBS/PatternGen.pm \
	  TFBS/MatrixSet.pm $(INST_LIB)/TFBS/MatrixSet.pm \
	  TFBS/PatternGen/SimplePFM.pm $(INST_LIB)/TFBS/PatternGen/SimplePFM.pm \
	  TFBS/Word.pm $(INST_LIB)/TFBS/Word.pm \
	  TFBS/DB/JASPAR2.pm $(INST_LIB)/TFBS/DB/JASPAR2.pm \
	  TFBS/Matrix.pm $(INST_LIB)/TFBS/Matrix.pm \
	  TFBS/PatternGen/AnnSpec/Motif.pm $(INST_LIB)/TFBS/PatternGen/AnnSpec/Motif.pm \
	  TFBS/PatternGen/Motif/Matrix.pm $(INST_LIB)/TFBS/PatternGen/Motif/Matrix.pm \
	  TFBS/Matrix/PWM.pm $(INST_LIB)/TFBS/Matrix/PWM.pm
	$(NOECHO) $(TOUCH) pm_to_blib


# --- MakeMaker selfdocument section:


# --- MakeMaker postamble section:


# End.
TFBS-0.7.1/README.md000066400000000000000000000103741305752266700135560ustar00rootroot00000000000000
# This is the README file for the TFBS modules distribution, Version 0.7.1

**NOTE** The TFBS perl module is no longer under active development. All the functionality can be found in the [TFBSTools](http://bioconductor.org/packages/TFBSTools/) Bioconductor package. Users are highly encouraged to switch to TFBSTools.

## About TFBS

TFBS is a computational framework for transcription factor binding site analysis. It can also be used for analysis involving other DNA patterns representable by matrices, e.g. splice sites.

## Contact info

Author: b.lenhard at imperial.ac.uk

TFBS website: http://tfbs.genereg.net/

Bug reports: https://github.com/ComputationalRegulatoryGenomicsICL/TFBS/issues

Please send bug reports, in particular about documentation which you think is unclear or about problems in installation. The author is also interested in suggestions on directions in which TFBS functionality should be expanded.

## System requirements

* Tested on Linux/i686, Mac OS 10.12.3
* perl 5.10.0 or later
* ANSI C or GNU C compiler for XS extensions
* bioperl 1.0 or later
* Additional perl module and application dependencies listed at http://tfbs.genereg.net/

## Documentation

The modules have reasonably complete POD documentation. After installation, type e.g.

```sh
perldoc TFBS::Matrix::ICM
```

to display the documentation for a particular module. POD documentation for all modules, as well as additional information, can be accessed from the TFBS web page at http://tfbs.genereg.net/POD/.

A limited amount of example code can also be found in the `examples/` directory. The current collection includes scripts for demonstration purposes. The explanations can be found in the source code of the individual scripts.

## Installation

The TFBS modules are distributed as a tar file in standard perl CPAN distribution form. This means that installation is very simple. Once you have unpacked the tar distribution there is a directory called TFBS-xx/, which is where this file is. Move into that directory and issue the following commands:

```sh
perl Makefile.PL   # makes a system-specific makefile
make               # makes the distribution
make test          # runs the test code
make install       # [may need root access for system install. See below for how to get around this.]
```

This should build, test and install the distribution cleanly on your system, provided that all prerequisite modules have been installed. Running `perl Makefile.PL` will ask you for MySQL server write access details if you want to test the TFBS::DB::JASPAR7 module. You can safely skip that test by answering "no" to the first question.

To install you need write permission in the `perl5/site_perl/` source area. Quite often this will require you (or someone else) becoming root, so you will want to talk to your systems manager if you don't have the necessary access.

It is possible to install the package outside of the standard Perl5 location.
See below for details.

### INSTALLING TFBS IN A PERSONAL OR PRIVATE MODULE AREA

If you lack permission to install perl modules into the standard `site_perl/` system area you can configure TFBS to install itself anywhere you choose. Ideally this would be a personal perl directory or standard place where you plan to put all your 'local' or personal perl modules.

*Note*: you _must_ have write permission to this area.

Simply pass a parameter to perl as it builds your system-specific makefile. Example:

```sh
perl Makefile.PL PREFIX=/home/borisl/perllib
make
make test
make install
```

This will cause perl to install the TFBS modules in:

`/home/borisl/perllib/lib/perl5/site_perl/`

And the man pages will go in:

`/home/borisl/perllib/lib/perl5/man/`

To specify a directory other than `lib/perl5/site_perl`, or if there are still permission problems, include an **INSTALLSITELIB** directive along with the PREFIX:

```sh
perl Makefile.PL PREFIX=/home/borisl/perl INSTALLSITELIB=/home/borisl/perl/lib
```

See below for how to use modules that are not installed in the standard Perl5 location.

### USING MODULES NOT INSTALLED IN THE STANDARD PERL LOCATION

You can explicitly tell perl where to look for modules by using the lib module which comes standard with perl. Point it at the directory the modules were actually installed in. Example:

```perl
#!/usr/bin/perl -w
use lib "/home/borisl/perllib/lib/perl5/site_perl";
use TFBS::PatternGen::Gibbs;
# etc...
```
        {},
        alignment    => undef,
        ref_sequence => undef,
        %args
    }, ref $caller || $caller;

    if (!defined($self->{conservation}) or
        ref($self->{conservation}) ne "ARRAY") {
        $self->throw("conservation: argument missing or wrong object type");
    }
    return $self;
}

sub param {
    my ($self, $param, $value) = @_;
    if (!defined $param) {
        return keys %{$self->{parameters}};
    }
    elsif (defined $value) {
        $self->{parameters}->{$param} = $value;
    }
    return $self->{parameters}->{$param};
}

sub alignment { $_[0]->{alignment} }

sub ref_sequence { $_[0]->{ref_sequence} }

sub conserved_regions_as_gff {
    my ($self, %args) = @_;
    # pass the caller's arguments (cutoff, excluded features) through
    my @features = $self->conserved_regions_as_features(%args);
    my $output = "";
    foreach my $f (@features) {
        $output .= $f->gff_string."\n";
    }
    return $output;
}

sub conserved_regions_as_features {
    my ($self, %args) = @_;
    my ($excl_feature_ref, $cutoff) =
        $self->_rearrange([qw(exclude_features cutoff)], %args);
    $cutoff = ($cutoff or $self->param("cutoff") or 0.7);
    print STDERR "CURRENT CUTOFF $cutoff\n";
    my @conservation = $self->conservation;

    # coordinates of the reference sequence
    my ($START, $END);
    if ($self->ref_sequence and $self->ref_sequence->isa("Bio::LocatableSeq")) {
        $START = $self->ref_sequence->start;
        $END   = $self->ref_sequence->end;
    }
    else {
        # fall back to the first sequence of the alignment
        my $ref_seq_nr = ($self->param("ref_seq_nr") or 1);
        $START = $self->alignment->get_seq_by_pos($ref_seq_nr)->start;
        $END   = $self->alignment->get_seq_by_pos($ref_seq_nr)->end;
    }

    # zero the conservation inside the excluded features
    if ($excl_feature_ref and ref($excl_feature_ref) eq "ARRAY") {
        foreach my $f (@$excl_feature_ref) {
            my ($s, $e) = ($f->start, $f->end);
            next if ($e<$START or $s>$END);
            $s = $START if $s<$START;
            $e = $END   if $e>$END;
            $conservation[$_] = 0 foreach ($s-$START..$e-$START);
        }
    }

    # obtain a list of positions whose conservation reaches the cutoff
    my @cons_positions = (
        (grep {$conservation[$_-$START]>=$cutoff} ($START..$#conservation+$START)),
        $END+1000
    );
    # $END+1000 is a dummy position that simplifies the procedure

    my @features;
    my $region_start = $cons_positions[0];
    my $counter = 0;
    # each step compares position $i with $i+1, so stop one before the end
    foreach my $i (0..$#cons_positions-1) {
        if ($cons_positions[$i+1]-$cons_positions[$i]>10) {
            # 10 can be changed to max allowed gap in a conserved region
            $counter++;
            my $f =
                Bio::SeqFeature::Generic->new(-start   => $region_start,
                                              -end     => $cons_positions[$i],
                                              -primary => "CR$counter",
                                              -score   => $cons_positions[$i]-$region_start+1
                                              );
            push @features, $f;
            print STDERR $features[-1]->gff_string."\n";
            $region_start = $cons_positions[$i+1];
        }
    }
    return @features;
}

sub conserved_regions_as_feature_pairs {
}

sub conservation {
    my ($self, $start, $end) = @_;
    if (!defined $start) {
        return @{$self->{conservation}}
    }
    else {
        my ($START, $END);
        if ($self->alignment) {
            my $ref_seq_nr = ($self->param("ref_seq_nr") or 1);
            $START = $self->alignment->get_seq_by_pos($ref_seq_nr)->start;
            $END   = $self->alignment->get_seq_by_pos($ref_seq_nr)->end;
        }
        else {
            # without an alignment, positions are simply 1..N
            $START = 1;
            $END   = scalar @{$self->{conservation}};
        }
        if (defined $end) {
            my ($pad_start, $pad_end) = (0, 0);
            if ($start < $START) {
                $pad_start = $START-$start;
                $start = $START;
            }
            if ($end > $END) {
                $pad_end = $end-$END;
                $end = $END
            }
            # array slice over the whole range, not just its two endpoints
            return (_blank_array($pad_start),
                    @{$self->{conservation}}[$start-$START .. $end-$START],
                    _blank_array($pad_end)
                    );
        }
        elsif ($start< $START) {
            return undef;
        }
        else {
            return $self->{conservation}->[$start-$START];
        }
    }
}

sub _blank_array {
    my ($length) = @_;
    my @arr = map {undef} (0..$length);
    pop @arr;
    return @arr
}

sub conserved_sequences {
}

sub conserved_subalignments {
}

1;
TFBS-0.7.1/TFBS/DB.pm000077500000000000000000000005061305752266700136570ustar00rootroot00000000000000
# This package should hold interface and common database manipulation
# methods, if we decide there are any

package TFBS::DB;
use vars qw(@ISA);
use strict;
use Bio::Root::Root;
use TFBS::Matrix;
@ISA = qw(Bio::Root::Root);

sub new {
}

sub get_MatrixSet {
}

sub get_Matrix_by_ID {
}

# not finished (apparently)
TFBS-0.7.1/TFBS/DB/000077500000000000000000000000001305752266700133155ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/DB/FlatFileDir.pm000077500000000000000000000330031305752266700160020ustar00rootroot00000000000000
# TFBS module for TFBS::DB::FlatFileDir
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl
# itself
#

# POD

=head1 NAME

TFBS::DB::FlatFileDir - interface to a database of pattern matrices
stored as a collection of flat files in a dedicated directory

=head1 SYNOPSIS

=over 4

=item * creating a database object by connecting to the existing directory

    my $db = TFBS::DB::FlatFileDir->connect("/home/boris/MatrixDir");

=item * retrieving a TFBS::Matrix::* object from the database

    # retrieving a PFM by ID
    my $pfm = $db->get_Matrix_by_ID('M00079','PFM');

    # retrieving a PWM by name
    my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM');

=item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria

    # retrieving a set of PWMs from a list of IDs:
    my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101');
    my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist,
                                       -matrixtype => "PWM");

    # retrieving a set of ICMs from a list of names:
    my @namelist = ('p50', 'p53', 'HNF-1', 'GATA-1', 'GATA-2', 'GATA-3');
    my $matrixset = $db->get_MatrixSet(-names => \@namelist,
                                       -matrixtype => "ICM");

    # retrieving a set of all PFMs in the database
    my $matrixset = $db->get_MatrixSet(-matrixtype => "PFM");

=item * creating a new FlatFileDir database in a new directory:

    my $db = TFBS::DB::FlatFileDir->create("/home/boris/NewMatrixDir");

=item * storing a matrix in the database:

    # assuming $pfm is a TFBS::Matrix::PFM object
    $db->store_Matrix($pfm);

=back

=head1 DESCRIPTION

TFBS::DB::FlatFileDir is a read/write database interface module that
retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet
objects in a set of flat files in a dedicated directory. It has a
very simple structure and can be easily set up manually if desired.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut


# The code begins HERE:

package TFBS::DB::FlatFileDir;
use vars qw(@ISA);
use strict;
use Bio::Root::Root;
use TFBS::Matrix::PFM;
use TFBS::Matrix::ICM;
use TFBS::Matrix::PWM;
use TFBS::MatrixSet;
@ISA = qw(TFBS::DB Bio::Root::Root);

=head2 new

 Title   : new
 Usage   : my $db = TFBS::DB::FlatFileDir->new(%args);
 Function: the formal constructor for the TFBS::DB::FlatFileDir object;
           most users will not use it - they will use specialized
           I<connect> or I<create> constructors to create a
           database object
 Returns : a TFBS::DB::FlatFileDir object
 Args    : -dir       # the directory containing flat files

=cut

sub new {
    my $caller = shift;
    my $self = bless {_item => {},
                      _idlist_of_name  => {},
                      _idlist_of_class => {}
                      }, ref ($caller) || $caller;
    if (-d $_[0]) {
        $self->{dir} = $_[0];
    }
    elsif ($_[0] eq '-dir' and -d $_[1]) {
        $self->{dir} = $_[1];
    }
    else {
        $self->throw("Error initializing FlatFileDir database dir: ",
                     ($_[1] or $_[0] or "No directory parameter passed."));
    }
    $self->_load_db_index();
    return $self;
}

=head2 connect

 Title   : connect
 Usage   : my $db = TFBS::DB::FlatFileDir->connect($directory);
 Function: Creates a database object that retrieves TFBS::Matrix::*
           object data from or stores it in an existing directory
 Returns : a TFBS::DB::FlatFileDir object
 Args    : ($directory)
           The name of the directory (possibly with fully qualified
           path).

=cut

sub connect {
    my ($caller, $dir) = @_;
    $caller->new(-dir=>$dir);
}

=head2 create

 Title   : create
 Usage   : my $newdb = TFBS::DB::FlatFileDir->create($new_directory);
 Function: creates a new directory, sets up an empty FlatFileDir
           database in it and returns a database object that
           interfaces the database
 Returns : a TFBS::DB::FlatFileDir object
 Args    : ($new_directory)
           The name of the directory to create
           (possibly with fully qualified path).
=cut sub create { my ($caller, $dir) = @_; if (-d $dir) { die ("Directory $dir exists") ; } mkdir ($dir) or die("Error creating directory $dir, stopped"); open FILE, ">$dir/matrix_list.txt" or die ("Error creating matrix_list.txt"); close FILE; $caller->new(-dir=>$dir); } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. =cut sub get_Matrix_by_ID { my ($self, $ID, $mt) = @_; $self->throw("No ID passed to get_Matrix_by_ID.") unless defined $ID; $mt = defined $mt ? $self->_check_matrixtype($mt) : "PWM"; my $matrixobj; { no strict 'refs'; my $working_mt = $mt = uc $mt; my $matrixstring = $self->_read_file($ID,$mt) # if no desired $mt, is there a PFM? || $self->_read_file($ID,$working_mt="PFM") || return undef; eval("\$matrixobj= TFBS::Matrix::$working_mt->new".' ( -ID => $ID, -name => $self->{_item}->{$ID}->{name} || "", -class => $self->{_item}->{$ID}->{class}|| "", -matrix=> $matrixstring, -tags=> $self->{_item}->{$ID}->{tags} );'. 
"if (\$working_mt ne \$mt) {\$matrixobj = \$matrixobj->to_$mt;}"); if ($@) {$self->throw($@); } } # print "MATRIXOBJ: $matrixobj\n"; return $matrixobj; } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1', 'PWM'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM') Args : (Matrix_name, Matrix_type) Matrix_name is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. Warning : According to the current JASPAR2 data model, name is not necessarily a unique identifier. In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You have been warned. 
=cut sub get_Matrix_by_name { my ($self, $name, $mt) = @_; my $ID=$self->{_idlist_of_name}->{$name}->[0] or return undef; if ((my $L= scalar @{ $self->{_idlist_of_name}->{$name} }) > 1) { $self->warn("There are $L matrices with name '$name'"); } return $self->get_Matrix_by_ID($ID, $mt); } sub get_matrix { # an obsolete method - kept for the time being for backward compatibility my ($self, %args) = @_; my $DIR = $self->{dir}; my $ID; # retrieval from .pwm files in a directory my $mt = ($self->_get_matrixtype_from_args(%args) or $self->throw("No -matrixtype provided.")); if ($args{-ID}) { $ID = $args{-ID}; } elsif (my $name = $args{-name}) { $ID=$self->{_idlist_of_name}->{$name}->[0] or $self->warn("No matrix with name $name found."); if ((my $L= scalar @{ $self->{_idlist_of_name}->{$name} }) > 1) { $self->warn("There are $L matrices with name '$name'"); } } else { $self->throw("No -ID or -name passed to ".ref($self)); } my $matrixobj; { no strict 'refs'; my $ucmt = uc $mt; my $matrixstring =`cat $DIR/$ID.$mt`; eval("\$matrixobj= TFBS::Matrix::$ucmt->new".' ( -ID => $ID, -name => $self->{_item}->{$ID}->{name}, -class => $self->{_item}->{$ID}->{class}, -matrix=> $matrixstring # FIXME - temporary );'); if ($@) {$self->throw($@); } } # print "MATRIXOBJ: $matrixobj\n"; return $matrixobj; } =head2 store_Matrix Title : store_Matrix Usage : $db->store_Matrix($matrixobj); Function: Stores the contents of a TFBS::Matrix::DB object in the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : ($matrixobj) # a TFBS::Matrix::* object =cut sub store_Matrix { my ($self, $matrixobj) = @_; my ($mt) = ($matrixobj =~ /TFBS::Matrix::(\w+)/) or $self->throw("Wrong type of object passed to store_Matrix."); if (defined $self->{_item}->{$matrixobj->ID()}) { $self->throw("ID ".$matrixobj->ID()." 
exists in the database."); } else { my $matrixfile = $self->{dir}."/".$matrixobj->ID().".".lc($mt); open FILE, ">$matrixfile" or $self->throw("Could not write file $matrixfile."); print FILE $matrixobj->rawprint; close FILE; my $ic = ($mt eq "ICM") ? $matrixobj->total_ic : ($mt eq "PFM") ? $matrixobj->to_ICM->total_ic : ""; $self->{_item}->{$matrixobj->ID()} = { 'name' => $matrixobj->name || "", 'ic' => $ic, 'class'=> $matrixobj->class || "" }; my %tags= $matrixobj->all_tags(); foreach my $named_tag (keys %tags){ if ( ref $tags{$named_tag} eq "ARRAY"){ my $val= join (",",@{$tags{$named_tag}}); $tags{$named_tag}=$val; } $self->{_item}->{$matrixobj->ID()}{'tag'}{$named_tag}=$tags{$named_tag}; # print $named_tag , " ", $self->{_item}->{$matrixobj->ID()}{'tag'}{$named_tag}, "\n"; } $self->_update_db_index(); } return 0; } =head2 delete_Matrix_having_ID Title : delete_Matrix_having_ID Usage : $db->delete_Matrix_with_ID('M00045'); Function: Deletes the matrix having the given ID from the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : (ID) A string Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name for a method, but at least it should be obviuos what it does. 

=cut

sub delete_Matrix_having_ID {
    my ($self, $ID) = @_;
    my $DIR = $self->{dir};
    unlink <$DIR/$ID.*>;
    delete $self->{_item}->{$ID};
    $self->_update_db_index();
}

sub _update_db_index {
    my $self = shift;
    rename $self->{dir}."/matrix_list.txt", $self->{dir}."/~matrix_list.txt";
    open FILE, ">".$self->{dir}."/matrix_list.txt";
    foreach my $ID ( keys %{$self->{_item}} ) {
        print FILE join("\t", $ID,
                        $self->{_item}->{$ID}->{ic},
                        $self->{_item}->{$ID}->{name},
                        $self->{_item}->{$ID}->{class}
                        )."\t";
        # add tagged annotation
        # my %tag = $self->{_item}->{$ID}->{'all_tags'};
        foreach my $name(sort keys %{$self->{'_item'}->{$ID}{'tag'}}){
            print FILE "; ", $name, " \"", $self->{'_item'}->{$ID}{'tag'}{$name}, "\"\ ";
        }
        print FILE "\n";
    }
    close FILE;
}

sub _load_db_index {
    my ($self, $field, $value) = @_;
    my $DIR = $self->{dir};
    open (MATRIXLIST, "$DIR/matrix_list.txt")
        or $self->throw("Could not read matrix list $DIR/matrix_list.txt");
    while (my $line = <MATRIXLIST>) {
        chomp $line;
        my ($ID, $ic, $name, $class) = split /\s+/, $line, 4;
        if ($ID =~ /(\w+)\.(\w+)$/) {
            $ID = $1;
        }
        defined($self->{_item}->{$ID})
            and $self->warn("Duplicate entries for ID $ID");
        $self->{_item}->{$ID} = {name=>$name, ic=>$ic, class=>$class};
        push @{ $self->{_idlist_of_name}->{$name} }, $ID;
        push @{ $self->{_idlist_of_class}->{$class} }, $ID;
        # annotation
        my @anno= split(/\s?;\s?/, $line);
        my %tags;
        shift @anno;
        foreach (@anno){
            my ($name, $val)=split(/\s?\"/, $_);
            # print "$name $val\n";
            $self->{_item}->{$ID}->{'tags'}->{$name}=$val;
        }
    }
    close MATRIXLIST;
    return scalar keys %{ $self->{_item} }; # false if list empty
}

sub get_MatrixSet {
    my ($self, %args) = @_;
    my $DIR = $self->{db};
    my $mt = $self->_check_matrixtype($args{-matrixtype})
        || $self->throw("No matrix type provided.");
    delete $args{'-matrixtype'};
    # the remaining argument is the selector and its array reference
    my ($field, $arrayref) = %args;
    unless (defined $field) {
        $field="-IDs";
        $arrayref = [ keys %{ $self->{_item}} ];
    }
    my @IDlist;
    if ($field eq "-IDs") {
        @IDlist = @$arrayref;
    }
    elsif ($field eq "-names")
    {
        foreach (@$arrayref) {
            push @IDlist, @{ $self->{_idlist_of_name}->{$_} };
        }
    }
    elsif ($field eq "-classes") {
        foreach (@$arrayref) {
            push @IDlist, @{ $self->{_idlist_of_class}->{$_} };
        }
    }
    else {
        $self->throw("Unknown matrixset selector: $field.");
    }
    my $matrixset = TFBS::MatrixSet->new();
    foreach my $ID(@IDlist) {
        $matrixset->add_matrix($self->get_Matrix_by_ID($ID, $mt));
    }
    close MATRIXLIST;
    return $matrixset;
}

sub _check_matrixtype {
    my ($self, $mt) = @_;
    $mt = uc $mt;
    return undef unless $mt;
    unless ( $mt eq "PFM" or $mt eq "ICM" or $mt eq "PWM") {
        $self->throw("Unsupported matrix type: ".$mt);
    }
    return $mt;
}

sub _read_file {
    my ($self, $id, $mt) = @_;
    local $/ = undef;
    open FILE, $self->{dir}."/$id.".lc($mt) or return undef;
    my $matrixstring = <FILE>; #slurp;
    close FILE;
    return $matrixstring;
}

1;
TFBS-0.7.1/TFBS/DB/JASPAR.pm000066400000000000000000001624621305752266700146460ustar00rootroot00000000000000
# TFBS module for TFBS::DB::JASPAR
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#
# Maintainers:
# Xiaobei Zhao - JASPAR6 (JASPAR 2014 experimental)
# David Arenillas - JASPAR7 (JASPAR 2016)
#
# This is an update of, and follows from TFBS::DB::JASPAR6 which itself was
# created / modified from JASPAR5 by Xiaobei Zhao. It was created
# by copying the existing TFBS::DB::JASPAR6 and modifying it to reflect the
# changes made to the 2016 update of the JASPAR database / webserver.
#
# As there seemed to be some discrepancy / confusion between the version
# numbers with the JASPAR DB/webserver releases and the perl module, from
# now on the latest (current) version of the perl module will simply be named
# JASPAR.pm (without version number attached). Previous versions will have a
# version number attached to them to indicate which DB/webserver they are
# related to.
#
# JASPAR 5 and 6 are associated with JASPAR 2014 (also known as JASPAR 5.0).
# It appears that the JASPAR6 perl module was an experimental update to # JASPAR5 and did not reflect a version change in the JASPAR DB/webserver. # # Change summary since JASPAR[5/6].pm: # - In the _store_matrix routine, the code was checking for a version number # by checking for a 'version' tag and NOT even checking to see if the version # number was already contained in the matrix ID!?!? Modified to first check # for a version encoded in the matrix ID and then only if it isn't, check for # a version tag. # - Fixed the way the 'acc' tag is stored. # - The DB schema was changed to allow multiple values for a given matrix in # some instances where previously only a single value was allowed. E.g. for # the MATRIX_ANNOTATION table, removed the unique key constrain on ID + TAG # so that multiple VALs with the same TAG can be stored for a given matrix. # As it was, multiple values were already stored as comma separated values # in a single row which is bad DB design (not normalized). The Code was thus # modified to reflect that these values may now be stored in multiple DB # rows. # - Related to this, previously it was assumed that TF class and family could # each only be a single value. However, for heterodimers it is quite possible # that the two TFs making up the dimer have a different class/family so now # the code assumes multiple could exist in the DB for these tags. As a result # the class/family may now be stored in a TFBS::Matrix object as either a # scalar string (for a single value) or as a listref (for multiple values). # This is consistent with the way other tag/values are stored although # it can certainly be argued that it is not good programming style to have # different return types for the same method! This behaviour should be # reconsidered in future versions of the TFBS modules. 
# - Similarly to values stored in MATRIX_ANNOTATION, the MATRIX_SPECIES table # has been modified to allow storage or multiple species for a matrix as # separate rows rather than as comma separated strings and related code # modified accordingly. # - Added some exception calls and/or more informative error messages in some # places they were missing. # - Fixed up (some of) the ugly code formatting, e.g. changed mixture of # leading tabs / space to be consistently spaces and changed indentation # levels to always be 4 spaces. # # Also see specific embedded comments tagged with DJA for more details # # POD =head1 NAME TFBS::DB::JASPAR - interface to MySQL relational database of pattern matrices. Currently status: experimental. =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to the existing JASPAR6-type database my $db = TFBS::DB::JASPAR6->connect("dbi:mysql:JASPAR6:myhost", "myusername", "mypassword"); =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('M0079','PFM'); #retrieving a PWM by name my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM'); =item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria # retrieving a set of PWMs from a list of IDs: my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101'); my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist, -matrixtype => "PWM"); # retrieving a set of ICMs from a list of names: @namelist = ('p50', 'p53', 'HNF-1'. 
'GATA-1', 'GATA-2', 'GATA-3'); my $matrixset = $db->get_MatrixSet(-names => \@namelist, -matrixtype => "ICM"); =item * creating a new JASPAR6-type database named MYJASPAR6: my $db = TFBS::DB::JASPAR4->create("dbi:mysql:MYJASPAR6:myhost", "myusername", "mypassword"); =item * storing a matrix in the database (currently only PFMs): #let $pfm is a TFBS::Matrix::PFM object $db->store_Matrix($pfm); =back =head1 DESCRIPTION TFBS::DB::JASPAR is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a relational database. The interface is nearly identical to the JASPAR2 and JASPAR4 interface, while the underlying data model is different =head1 JASPAR6 DATA MODEL JASPAR6 is working name for a relational database model used for storing transcriptional factor pattern matrices in a MySQL database. It was initially designed (JASPAR2) to store matrices for the JASPAR database of high quality eukaryotic transcription factor specificity profiles by Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself, this data model stores profile ID (unique), name, structural class, basic taxonomic and bibliographic information as well as some additional, and custom, tags. Here goes a moore thorough description on tables and IDs ----------------------- ADVANCED --------------------------------- For the developers and the curious, here is the JASPAR6 data model: MISSING TEXT HEER ON HOW IT WORKS It is our best intention to hide the details of this data model, which we are using on a daily basis in our work, from most TFBS users. Most users should only know the methods to store the data and which tags are supported. ------------------------------------------------------------------------- =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. 
Internal methods are preceded with an underscore. =cut # The code begins HERE: package TFBS::DB::JASPAR7; use vars qw(@ISA $AUTOLOAD); # we need all three matrices due to the redundancy in JASPAR2 data model # which will hopefully be removed in JASPAR3 use TFBS::Matrix::PWM; use TFBS::Matrix::PFM; use TFBS::Matrix::ICM; use TFBS::TFFM; use TFBS::MatrixSet; use Bio::Root::Root; use DBI; # use TFBS::DB; # eventually use strict; @ISA = qw(TFBS::DB Bio::Root::Root); ######################################################################### # CONSTANTS ######################################################################### use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost use constant DEFAULT_USER => ""; use constant DEFAULT_PASSWORD => ""; ######################################################################### # PUBLIC METHODS ######################################################################### =head2 new Title : new Usage : DEPRECATED - for backward compatibility only Use connect() or create() instead =cut sub new { _new(@_); } =head2 connect Title : connect Usage : my $db = TFBS::DB::JASPAR6->connect("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD"); Function: connects to the existing JASPAR6-type database and returns a database object that interfaces the database Returns : a TFBS::DB::JASPAR6 object Args : a standard database connection triplet ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, use the actual values. PASSWORD and USERNAME might be optional, depending on the user's acces permissions for the database server. 

=cut

sub connect {    #DONE
    # a more intuitive syntax for the constructor
    my ($caller, @connection_args) = @_;
    $caller->new(-connect => \@connection_args);
}

=head2 dbh

 Title   : dbh
 Usage   : my $dbh = $db->dbh();
           $dbh->do("UPDATE matrix_data SET name='ADD1' WHERE NAME='SREBP2'");
 Function: returns the DBI database handle of the MySQL database
           interfaced by $db; THIS IS USED FOR WRITING NEW METHODS
           FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you
           have write access AND do not know what you are doing,
           you can severely corrupt the data.
           For documentation about database handle methods, see L<DBI>
 Returns : the database (DBI) handle of the MySQL JASPAR2-type
           relational database associated with the TFBS::DB::JASPAR2
           object
 Args    : none

=cut

sub dbh {    #DONE
    my ($self, $dbh) = @_;
    $self->{'dbh'} = $dbh if $dbh;
    return $self->{'dbh'};
}

=head2 store_Matrix

 Title   : store_Matrix
 Usage   : $db->store_Matrix($matrixobject);
 Function: Stores the contents of a TFBS::Matrix::* object in the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (PFM_object)
           A TFBS::Matrix::PFM, TFBS::Matrix::PWM or TFBS::Matrix::ICM
           object. PFM objects are recommended, as they are easily
           converted to other formats
           # might have to give version and collection here
 Comment : this is an experimental method that is not 100% bulletproof;
           use at your own risk

=cut

sub store_Matrix {    #PROBABLY DONE
    # collection, version are taken from the corresponding tags.
Warn if they are not there ; my ($self, @PFMs) = @_; my $err; foreach my $pfm (@PFMs) { eval { my $int_id = $self->_store_matrix($pfm) ; # needs to have collection and version $self->_store_matrix_data($pfm, $int_id); $self->_store_matrix_annotation($pfm, $int_id); $self->_store_matrix_species($pfm, $int_id); $self->_store_matrix_acc($pfm, $int_id); }; } return $@; } sub create { #done my ($caller, $connectstring, $user, $password) = @_; if ( $connectstring and $connectstring =~ /dbi:mysql:(\w+)(.*)/) { # connect to the server; my $dbh = DBI->connect("dbi:mysql:mysql" . $2, $user, $password) or die("Error connecting to the database"); # create database and open it $dbh->do("create database $1") or die("Error creating database."); $dbh->do("use $1"); # create tables _create_tables($dbh); $dbh->disconnect; # run "new" with new database return $caller->new(-connect => [$connectstring, $user, $password]); } else { die( "Missing or malformed connect string for " . "TFBS::DB::JASPAR2 connection."); } } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix is stored in the database (PFM is default) Args : (Matrix_ID) Matrix_ID id is a string which refers to the stable JASPAR ID (usually something like "MA0001") with or without version numbers. "MA0001" will give the latest version on MA0001, while "MA0001.2" will give the second version, if existing. Warnings will be given for non-existing matrices. =cut sub get_Matrix_by_ID { #DONE. 
MAYBE :) my ($self, $q, $mt) = @_; # q is a stable ID with possible version number # jsp6 $mt = (uc($mt) or "PFM"); unless (defined $q) { $self->throw("No ID passed to get_Matrix_by_ID"); } my $ucmt = uc $mt; # separate stable ID and version number my ($base_ID, $version) = split(/\./, $q); $version = $self->_get_latest_version($base_ID) unless $version; # latest version per default # get internal ID - also a check for validity my $int_id = $self->_get_internal_id($base_ID, $version); # get matrix using internal ID my $m = $self->_get_Matrix_by_int_id($int_id, $ucmt); #warn ref($m); return ($m); } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix object was stored in the database (default PFM)) Args : (Matrix_name) Warning : According to the current JASPAR6 data model, name is not necessarily a unique identifier. Also, names change over time. In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You've been warned. Some matrices have multiple versions. The function will return the latest version. For specific versions, use get_Matrix_by_ID($ID.$version) =cut sub get_Matrix_by_name { #DONE my ($self, $name, $mt) = @_; unless (defined $name) { $self->throw("No name passed to get_Matrix_by_name."); } # sanity check: are there many different stable IDs with same name? my $sth = $self->dbh->prepare( qq!SELECT distinct BASE_ID FROM MATRIX WHERE NAME="$name"! 
    );
    $sth->execute();
    # collect every matching row; a single fetchrow_array() call would
    # only ever return the first stable ID
    my @stable_ids;
    while (my ($stable_id) = $sth->fetchrow_array()) {
        push @stable_ids, $stable_id;
    }
    my $L = scalar @stable_ids;
    $self->warn("There are $L distinct stable IDs with name '$name'")
        if $L > 1;
    return $self->get_Matrix_by_ID($stable_ids[0], $mt);
}

=head2 get_MatrixSet

 Title   : get_MatrixSet
 Usage   : my $matrixset = $db->get_MatrixSet(%args);
 Function: fetches matrix data for all matrices in the database
           matching criteria defined by the named arguments
           and returns a TFBS::MatrixSet object
 Returns : a TFBS::MatrixSet object
 Args    : This method accepts named arguments, corresponding to
           arbitrary tags, and also some utility functions.
           Note that this is different from JASPAR2 and to some
           extent JASPAR4. As any tag is supported for database
           storage, any tag can be used for information retrieval.
           Additionally, arguments such as 'name', 'class' and
           'collection' can be used (even though they are not tags).
           By default, only the latest version of each matrix is
           given. The only ways to get older matrices are to use an
           array of IDs with actual versions like MA0001.1, or to set
           the argument -all_versions=>1, in which case you get all
           versions for each stable ID.

           Examples include:

 Fundamental matrix features
    -all      # gives absolutely all matrix entries, regardless of
              # version and collection. Only useful for backup
              # situations and sanity checks. Takes precedence over
              # everything else

    -ID       # a reference to an array of stable IDs (strings), with
              # or without version, as above. Typically something
              # like "MA0001.2". Takes precedence over everything
              # save -all
    -name     # a reference to an array of
              #  transcription factor names (string). Will only take
              #  the latest version. NOT a preferred way to access
              #  since names change over time
    -collection    # a string corresponding to a JASPAR collection.
                   # CORE by default
    -all_versions  # gives all matrix versions that fit with the rest
                   # of the criteria, including obsolete ones. Off by
                   # default.
                   # Typical usage is in combination with stable IDs
                   # without versions to get all versions of a
                   # particular matrix

 Typical tag queries:
 These can be either a string or a reference to an array of strings.
 If it is an array it will be interpreted as an "or" statement
    -class    # a reference to an array of
              #  structural class names (strings)
    -species  # a reference to an array of
              #   NCBI Taxonomy IDs (integers)
    -taxgroup # a reference to an array of
              #  higher taxonomic categories (string)

 Computed features of the matrices
    -min_ic   # float, minimum total information content
              #   of the matrix.
    -matrixtype    # string describing type of matrix to retrieve.
                   # If left out, the format will revert to the
                   # database format, which is PFM.

The arguments that expect list references are used in database
query formulation: elements within lists are combined with 'OR'
operators, and the lists of different types with 'AND'. For example,

    my $matrixset = $db->get_MatrixSet(-class => ['TRP_CLUSTER', 'FORKHEAD'],
                                       -species => ['Homo sapiens', 'Mus musculus'],
                                       );

gives a set of TFBS::Matrix::PFM objects (given that the matrix models are
stored as such) whose (structural class is 'TRP_CLUSTER' OR 'FORKHEAD') AND
(the species they are derived from is 'Homo sapiens' OR 'Mus musculus').

As above, unless IDs with version numbers are used, only one matrix per
stable ID will be returned: the matrix with the highest version number.

The -min_ic filter is applied after the query, in the sense that
matrix profiles with total information content less than specified
are not included in the set.
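
The OR-within-a-list, AND-between-lists rule described above can be sketched in isolation. This is only an illustration, not the module's query builder: C<combine_criteria> is a hypothetical helper, the column names are made up, and real database code should bind values through DBI placeholders rather than interpolate them into SQL.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): values inside one list are
# joined with OR, and the resulting groups are joined with AND.
sub combine_criteria {
    my (%criteria) = @_;
    my @groups;
    foreach my $tag (sort keys %criteria) {
        my $values = $criteria{$tag};
        # a plain string is treated as a one-element list
        my @values = ref($values) eq 'ARRAY' ? @$values : ($values);
        push @groups,
            '(' . join(' OR ', map { "$tag = '$_'" } @values) . ')';
    }
    return join(' AND ', @groups);
}

print combine_criteria(
    class   => ['TRP_CLUSTER', 'FORKHEAD'],
    species => ['Homo sapiens', 'Mus musculus'],
), "\n";
```

The tags are sorted only to make the output order predictable; the logical meaning of the combined expression does not depend on it.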
=cut # jsp6 sub get_MatrixSet { # IC conetent and matrix stuff is not there yet, rest should work my ($self, %args) = @_; #jsp6 $args{'-collection'} = 'CORE' unless $args{'-collection'}; $args{'-all_versions'} = 0 unless $args{'-all_versions'}; my @IDlist = @{$self->_get_IDlist_by_query(%args)} ; # the IDlist here are INTERNAL ids my $type; my $matrixset = TFBS::MatrixSet->new(); foreach my $int_id (@IDlist) { my $matrix = $self->_get_Matrix_by_int_id($int_id); if (defined $args{'-min_ic'}) { # we assume the matrix IS a PFM, o something in normal space at # least unless it explicitly says otherwise in tag=matrixtype # if so warn and do not use IC content # this is not foolproof in any way # # Fixed up logic to actually check $matrix->isa(TFBS::Matrix::ICM) # before checking the matrixtype tag. Also check that matrixtype # tag is defined before comparison to prevent annoying "Use of # uninitialized value in string eq" messages from perl. # DJA 2012/05/11 # if ($matrix->isa("TFBS::Matrix::ICM") || (defined $matrix->{tags}{matrixtype} && $matrix->{tags}{matrixtype} eq "ICM") ) { next if ($matrix->total_ic() < $args{'-min_ic'}); } elsif ($matrix->isa("TFBS::Matrix::PFM")) { next if ($matrix->to_ICM->total_ic() < $args{'-min_ic'}); } else { warn "Warning: you are assessing information content on matrices that are not in PFM or ICM format. Skipping this criteria"; next; } } # length if (defined $args{'-length'}) { next if ($matrix->length() < $args{'-length'}); } # number of sites within # since column sums MIGHT be slightly different we take the integer of the mean of the columns # or really int( sum of matrix/#columns) if (defined $args{'-sites'}) { my $sum = 0; foreach (1 .. $matrix->length) { $sum += $matrix->column_sum(); } $sum = int($sum / $matrix->length); #warn $matrix->ID, " $sum is $sum"; next if ($sum < $args{'-sites'}); } #ugly code: think about this a bit. 
        if ($args{'-matrixtype'} && $matrix->isa("TFBS::Matrix::PFM")) {
            if ($args{'-matrixtype'} eq 'PWM') {
                $matrix = $matrix->to_PWM();
            }
            if ($args{'-matrixtype'} eq 'ICM') {
                $matrix = $matrix->to_ICM();
            }
        }

        $matrixset->add_Matrix($matrix);
    }

    return $matrixset;
}

sub store_MatrixSet {

    # DONE
    # a wrapper around store_Matrix (which also can take an array of
    # matrices, so utility only)
    my ($self, $matrixset) = @_;

    my $it = $matrixset->Iterator();
    while (my $matrix_object = $it->next) {

        # do whatever you want with individual matrix objects
        $self->store_Matrix($matrix_object);
    }
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045.1');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string. Has to be a matrix ID with version suffix in JASPAR6.
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious what it does.

=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;

    # these have to be versioned IDs
    foreach my $ID (@IDs) {
        my ($base_id, $version) = split(/\./, $ID);
        unless ($version) {
            warn "You have supplied a non-versioned matrix ID to delete. Skipping $ID\n";
            return 0;
        }

        # get relevant internal ID
        my ($int_id) = $self->_get_internal_id($base_id, $version);

        eval {
            my $q_ID = $self->dbh->quote($int_id);
            foreach my $table (
                qw (MATRIX_DATA MATRIX MATRIX_SPECIES MATRIX_PROTEIN MATRIX_ANNOTATION)
              )
            {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        };
    }
    return $@;
}

=head2 get_TFFM_by_ID

 Title   : get_TFFM_by_ID
 Usage   : my $tffm = $db->get_TFFM_by_ID('TFFM0001');
 Function: fetches TFFM data under the given ID from the database and
           returns a TFBS::TFFM object.
Returns : a TFBS::TFFM object Args : TFFM ID TFFM_ID id is a string which refers to the stable JASPAR TFFM ID (usually something like "TFFM0001") with or without version numbers. "TFFM0001" will give the latest version on TFFM0001, while "TFFM0001.2" will give the second version, if existing. Warnings will be given for non-existing TFFMs. =cut sub get_TFFM_by_ID { my ($self, $id) = @_; # id is a stable ID with possible version number unless (defined $id) { $self->throw("No ID passed to get_TFFM_by_ID"); } # separate stable ID and version number my ($base_ID, $version) = split(/\./, $id); # latest version per default $version = $self->_get_TFFM_latest_version($base_ID) unless $version; # get internal ID - also a check for validity my $int_id = $self->_get_TFFM_internal_id($base_ID, $version); # get TFFM using internal ID my $tffm = $self->_get_TFFM_by_int_id($int_id); return ($tffm); } =head2 get_TFFM_by_matrix_ID Title : get_TFFM_by_matrix_ID Usage : my $tffm = $db->get_TFFM_by_matrix_ID('MA0001.1'); Function: fetches TFFM data under related to the given matrix ID from the database and returns a TFBS::TFFM object. Returns : a TFBS::TFFM object Args : Matrix ID Matrix ID id is a string which refers to the stable JASPAR matrix ID (usually something like "MA0148.3"). Note that this *should* be a fully qualified matrix ID (with version) as the TFFM is related to a specific version of a matrix. If no matrix version is given, the latest version of the matrix is retrieved and the corresponding TFFM for that matrix version is retrieved (could be no TFFM). In general only a few matrices have associated TFFMs so in many cases no TFFM will be retrieved. In these cases we return undef. 
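
The version fallback and the undef return described above can be
sketched as follows. This is a minimal illustration under stated
assumptions, not the module's database code: find_tffm_id, the
in-memory rows and the version table are all hypothetical stand-ins for
the TFFM and MATRIX tables.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the TFFM table: each row links a
# TFFM ID to one specific version of a matrix.
my @tffm_rows = (
    { tffm_id => 'TFFM0001.1', matrix_base => 'MA0001', matrix_version => 1 },
    { tffm_id => 'TFFM0002.1', matrix_base => 'MA0148', matrix_version => 3 },
);

# Hypothetical latest matrix versions, as a database lookup might report them.
my %latest_matrix_version = (MA0001 => 2, MA0148 => 3);

sub find_tffm_id {
    my ($matrix_id) = @_;
    my ($base, $version) = split /\./, $matrix_id;

    # No version given: fall back to the latest version of the matrix.
    $version = $latest_matrix_version{$base} unless $version;
    return undef unless defined $version;

    foreach my $row (@tffm_rows) {
        return $row->{tffm_id}
          if $row->{matrix_base} eq $base
          && $row->{matrix_version} == $version;
    }
    return undef;    # no TFFM is linked to this matrix version
}

foreach my $id ('MA0001.1', 'MA0001', 'MA0148') {
    my $tffm_id = find_tffm_id($id);
    print "$id: ", (defined $tffm_id ? $tffm_id : 'no TFFM'), "\n";
}
```

In this sketch the unversioned 'MA0001' resolves to version 2, which has
no TFFM, so callers should test the result with defined before using it.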
=cut sub get_TFFM_by_matrix_ID { # id is a fully qualified matrix stable ID including version number my ($self, $matrix_id) = @_; unless (defined $matrix_id) { $self->throw("No matrix ID passed to get_TFFM_by_matrix_ID"); } # separate matrix stable ID and version number my ($matrix_base_id, $matrix_version) = split(/\./, $matrix_id); # latest matrix version per default $matrix_version = $self->_get_latest_version($matrix_base_id) unless $matrix_version; my $sth = $self->dbh->prepare( qq! SELECT BASE_ID, VERSION, NAME, LOG_P_1ST_ORDER, LOG_P_DETAILED, EXPERIMENT_NAME FROM TFFM WHERE MATRIX_BASE_ID = "$matrix_base_id" AND MATRIX_VERSION = "$matrix_version" ! ); $sth->execute(); my ($base_id, $version, $name, $log_p_1st_order, $log_p_detailed, $exp_name) = $sth->fetchrow_array(); my $tffm; if ($base_id) { eval { $tffm = TFBS::TFFM->new( -ID => "$base_id.$version", -name => $name, -log_p_1st_order => $log_p_1st_order, -log_p_detailed => $log_p_detailed, -experiment_name => $exp_name, # # OR we could retrieve the matrix and set the corresponding # attribute # -matrix_ID => "$matrix_base_id.$matrix_version" ) }; if ($@) { $self->throw($@); } # # Instead of storing the matrix ID, get the related matrix and store in # the matrix attribute. # # my $matrix = $self->get_Matrix_by_ID( # "$matrix_base_id.$matrix_version", 'PFM' # ); # # $tffm->matrix($matrix); # } else { warn "No TFFM exists for matrix '$matrix_base_id.$matrix_version'"; } return $tffm; } =head2 get_TFFM_by_name Title : get_TFFM_by_name Usage : my $tffm = $db->get_TFFM_by_name('HNF-1'); Function: fetches TFFM data under the given name from the database and returns a TFBS::TFFM object Returns : a TFBS::TFFM object Args : A TFFM name - the name of the transcription factor for which this TFFM was modelled. This is the same as the name of the matrix used to train the TFFM. Warning : According to the current JASPAR data model, name is not necessarily a unique identifier. Also, names change over time. 
           In the case where there are several TFFMs with the same name
           in the database, the function fetches the first one and prints
           a warning on STDERR. You've been warned.
           Some matrices have multiple versions. The function will return
           the latest version. For specific versions, use
           get_TFFM_by_ID($ID.$version)

=cut

sub get_TFFM_by_name {
    my ($self, $name) = @_;

    unless (defined $name) {
        $self->throw("No name passed to get_TFFM_by_name.");
    }

    # sanity check: are there many different stable IDs with same name?
    my $sth = $self->dbh->prepare(
        qq!SELECT distinct BASE_ID FROM TFFM WHERE NAME=?!
    );
    $sth->execute($name);

    # fetchrow_array returns one row per call, so collect all rows;
    # otherwise the duplicate-name warning below can never fire
    my @stable_ids;
    while (my ($stable_id) = $sth->fetchrow_array()) {
        push(@stable_ids, $stable_id);
    }

    my $L = scalar @stable_ids;
    $self->warn("There are $L distinct stable IDs with name '$name'")
        if $L > 1;

    return $self->get_TFFM_by_ID($stable_ids[0]);
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new {    #PROBABLY OK
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;

    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    } elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        return $caller->create(@{$args{'-create'}});
    } else {
        ($connectstring, $user, $password) =
          (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh(
        DBI->connect(
            $connectstring, $user, $password, {mysql_enable_utf8 => 1}
        )
    );

    return $self;
}

sub _store_matrix_data {    # DONE
    my ($self, $pfm, $int_id, $ACTION) = @_;
    my @base = qw(A C G T);
    my $matrix = $pfm->matrix();
    my $type;
    my $sth = $self->dbh->prepare(
        q! INSERT INTO MATRIX_DATA VALUES(?,?,?,?) !
    );

    for my $i (0 .. 3) {
        for my $j (0 ..
($pfm->length - 1)) { $sth->execute( $int_id, $base[$i], $j + 1, $matrix->[$i][$j] ) or $self->throw("Error executing query."); } } } sub _store_matrix { #DONE # creation of the matrix will also give an internal unique ID # (incremental int) which will be returned to use for the other tables my ($self, $pfm, $ACTION) = @_; # # Added check that the version is not already stored as part of the ID. # Also added a more informative insertion exception message. # DJA 2016/08/26 # my $id = $pfm->ID; my $base_id; my $version; if ($id =~ /^(\S+)\.(\d+)/) { $base_id = $1; $version = $2; } else { $base_id = $id; } unless ($version) { # Get collection and version from the matrix tags $version = $pfm->{'tags'}{'version'}; } # will warn but not die if version is missing: will assume 1 unless ($version) { warn "WARNING: Lacking version number for " . $pfm->ID . ". Setting version=1"; $version = 1; } my $collection = $pfm->{'tags'}{'collection'}; unless ($collection) { warn "WARNING: Lacking collection name for " . $pfm->ID . ". Setting collection to an empty string. You probably do not want this"; $collection = ''; } # sanity check: do we already have this cobination of base ID and version? # If we do, die my $sth = $self->dbh->prepare( qq! select count(*) from MATRIX where VERSION=$version and BASE_ID= "$base_id" and collection="$collection" ! ); $sth->execute; my ($sanity_count) = $sth->fetchrow_array; if ($sanity_count > 0) { warn "WARNING: Database input inconsistency: You have already have $sanity_count $base_id matrices of version $version in collection $collection. Terminating program"; die; } # insert data $sth = $self->dbh->prepare( q! INSERT INTO MATRIX VALUES(?,?,?,?,?) ! 
); # update next sth with actual version and collection: DO $sth->execute(0, $collection, $pfm->ID, $version, $pfm->name) or $self->throw( sprintf("Error inserting matrix %s as %s.%d to %s collection", $pfm->name, $pfm->ID, $version, $collection) ); # get the actual (new) iternal ID my $int_id = $self->dbh->{q{mysql_insertid}}; return $int_id; } sub _store_matrix_annotation { # DONE # # this is for tag-value items that are not one-to-many (so, not species # and not acc) # # We do need to store multiple values of class and family. In the case # of profiles representing dimers, the TFs making up the dimer may have # different TF classes and families. It also seemed abitrary to have a # primary key constraint on the ID + TAG. This has been removed. # # Added more informative messages to insertion failure exceptions. # Modified split to also handle spaces around comma. # DJA 2015/08/26 # my ($self, $pfm, $int_id, $ACTION) = @_; my $sth = $self->dbh->prepare( q! INSERT INTO MATRIX_ANNOTATION (ID, tag, val) VALUES(?,?,?) ! ); # get all tags # but skip out collection or version as we already have those in the # MATRIX table # special handling for class which mighht have a true slot my %tags = $pfm->all_tags(); if (defined($pfm->{class})) { $tags{class} = $pfm->{class}; } foreach my $tag (keys %tags) { next if $tag eq "collection"; next if $tag eq "version"; next if $tag eq "species"; # # The 'acc' tag was commented out in JASPAR6. Not sure the reasoning # as this stores the protein accession (UniProt IDs) which are # stored in the MATRIX_PROTEIN table and therefore should NOT be # stored in the MATRIX_ANNOTATION table and therefore *should* be # skipped here. # DJA 2015/08/26 # next if $tag eq "acc"; # next if $tag eq "class"; #$sth->execute($int_id, $tag, ($tags{$tag} or ""),) # or $self->throw("Error executing query"); # # Since we may have multiple values stored in some tags we need to # handle this. 
Assume that they may be stored either as array # references or as strings with comma separators. # DJA 2015/08/26 # my $vals = $tags{$tag}; if (ref $vals eq 'ARRAY') { foreach my $val (@$vals) { $val =~ s/^\s+//; $val =~ s/\s+$//; $sth->execute($int_id, $tag, $val) or $self->throw( "Error inserting tag/value pair ('$tag', '$val')" . " into MATRIX_ANNOTATION" ); } } else { foreach my $val (split(/\s*\,\s*/, $vals)) { $val =~ s/^\s+//; $val =~ s/\s+$//; $sth->execute($int_id, $tag, $val) or $self->throw( "Error inserting tag/value pair ('$tag', '$val')" . " into MATRIX_ANNOTATION" ); } } } } sub _store_matrix_species { # DONE #these are for species IDs - can be several # these are taken from the tag "species" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string # # Added exception handling/messages for insertion failure. # Modified split to also handle spaces around comma. # DJA 2015/08/26 # my ($self, $pfm, $int_id, $ACTION) = @_; my $sth = $self->dbh->prepare( q! INSERT INTO MATRIX_SPECIES VALUES(?,?) ! ); #sanity check: are there any species? Its ok not to have it. return () unless $pfm->{'tags'}{'species'}; #is the species a string or an arrayref? 
if (ref($pfm->{'tags'}{'species'}) eq 'ARRAY') { # walkthru array foreach my $species (@{$pfm->{'tags'}{'species'}}) { $sth->execute($int_id, $species) or $self->throw( "Error inserting species '$species' for matrix $int_id)" ); } } else { # split and walk thru foreach my $species (split(/\s*\,\s*/, $pfm->{'tags'}{'species'})) { $species =~ s/^\s+//; $sth->execute($int_id, $species) or $self->throw( "Error inserting species '$species' for matrix $int_id)" ); } } } sub _store_matrix_acc { # DONE #these are for protein accession numbers - can be several # these are taken from the tag "acc" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string # # For some reason the MATRIX_PROTEIN table had a primary key constraint # on the matrix ID field, preventing multiple proteins from being stored # here. That was presumably an oversight or otherwise an out of date table # definition. # # Also added more informative insertion exception messages. # DJA 2016/08/26 # my ($self, $pfm, $int_id, $ACTION) = @_; my $sth = $self->dbh->prepare( q! INSERT INTO MATRIX_PROTEIN VALUES(?,?) ! ); #sanity check: are there any accession numbers? Its ok not to have it. return () unless $pfm->{'tags'}{'acc'}; #is the protein ID a string or an arrayref? if (ref($pfm->{'tags'}{'acc'}) eq 'ARRAY') { # walkthru array foreach my $acc (@{$pfm->{'tags'}{'acc'}}) { $acc =~ s/\s//g; $sth->execute($int_id, $acc) or $self->throw( "Error inserting protein '$acc' for matrix $int_id)" ); } } else { # split and walk thru foreach my $acc (split(/\,/, $pfm->{'tags'}{'acc'})) { $acc =~ s/\s//g; $sth->execute($int_id, $acc) or $self->throw( "Error inserting protein '$acc' for matrix $int_id)" ); } } } #when creating: try to support arbitrary tags sub _create_tables { # DONE # utility function # If you want to change the databse schema, # this is the right place to do it # # Changed the primary key constraint on MATRIX_ANNOTATION to a simple # key. 
    # There are tags for which we do want to have multiple entries, e.g.
    # in the case of dimer profiles the two TFs making up the dimer may
    # have a different TF class and/or family, therefore we need to store
    # more than one record with tag 'class' or tag 'family' for the same
    # matrix ID.
    #
    # Also added key on ID to MATRIX_PROTEIN and MATRIX_SPECIES which seemed
    # to be missing.
    #
    # We may also want to set the charset/collation to utf8/utf8_unicode_ci
    # either at the DB level or to the individual table definitions
    # (particularly for the MATRIX_ANNOTATION table) to handle non-Latin
    # characters correctly.
    #
    # DJA 2015/08/26
    #
    my $dbh = shift;

    my @queries = (
        q!
        CREATE TABLE MATRIX (
            ID INT(11) NOT NULL AUTO_INCREMENT,
            COLLECTION VARCHAR (16) DEFAULT '',
            BASE_ID VARCHAR (16) DEFAULT '' NOT NULL,
            VERSION TINYINT(4) DEFAULT 1 NOT NULL,
            NAME VARCHAR (255) DEFAULT '' NOT NULL,
            PRIMARY KEY (ID))
        !,
        q!
        CREATE TABLE MATRIX_DATA (
            ID INT(11) NOT NULL,
            row VARCHAR(1) NOT NULL,
            col TINYINT(3) UNSIGNED NOT NULL,
            val float(10,3),
            PRIMARY KEY (ID, row, col))
        !,
        q!
        CREATE TABLE MATRIX_ANNOTATION (
            ID INT(11) NOT NULL,
            TAG VARCHAR(255) DEFAULT '' NOT NULL,
            VAL varchar(255) DEFAULT '',
            KEY (ID, TAG))
        !,
        q!
        CREATE TABLE MATRIX_SPECIES (
            ID INT(11) NOT NULL,
            TAX_ID VARCHAR(255) DEFAULT '' NOT NULL,
            KEY (ID))
        !,
        q!
        CREATE TABLE MATRIX_PROTEIN (
            ID INT(11) NOT NULL,
            ACC VARCHAR(255) DEFAULT '' NOT NULL,
            KEY (ID))
        !,
        q!
        CREATE TABLE TFFM (
            ID int(11) NOT NULL auto_increment,
            BASE_ID varchar(16) NOT NULL,
            VERSION tinyint(4) NOT NULL,
            MATRIX_BASE_ID varchar(16) NOT NULL,
            MATRIX_VERSION tinyint(4) NOT NULL,
            NAME varchar(255) NOT NULL,
            LOG_P_1ST_ORDER float default NULL,
            LOG_P_DETAILED float default NULL,
            EXPERIMENT_NAME varchar(255) default NULL,
            PRIMARY KEY (ID),
            KEY BASE_ID (BASE_ID, VERSION),
            KEY MATRIX_BASE_ID (MATRIX_BASE_ID, MATRIX_VERSION))
        !
); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub _get_matrixstring { #DONE my ($self, $ID) = @_; #my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); #unless (defined $dbname{$mt}) { #$self->throw("Unsupported matrix type: ".$mt); #} my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth = $self->dbh->prepare( "SELECT val FROM MATRIX_DATA WHERE ID=$qID AND row='$base' ORDER BY col" ); $sth->execute; $matrixstring .= join(" ", (map {$_->[0]} @{$sth->fetchall_arrayref()})) . "\n"; } $sth->finish; return undef if $matrixstring eq "\n" x 4; return $matrixstring; } sub _get_latest_version { #DONE my ($self, $base_ID) = @_; # SELECT VERSION FROM MATRIX WHERE BASE_ID=? ORDER BY VERSION DESC LIMIT 1 my $sth = $self->dbh->prepare( qq!SELECT VERSION FROM MATRIX WHERE BASE_ID="$base_ID" ORDER BY VERSION DESC LIMIT 1! ); $sth->execute; my ($latest) = $sth->fetchrow_array(); return ($latest); } sub _get_internal_id { #DONE # picks out the internal id for a a stable id+ version. Also checks if this cobo exists or not my ($self, $base_ID, $version) = @_; # SELECT ID FROM MATRIX WHERE BASE_ID=? and VERSION=? my $sth = $self->dbh->prepare( qq!SELECT ID FROM MATRIX WHERE BASE_ID="$base_ID" AND VERSION="$version"! ); $sth->execute; my ($int_id) = $sth->fetchrow_array(); return ($int_id); } sub _get_Matrix_by_int_id { #done my ($self, $int_id, $mt) = @_; my $matrixobj; $mt = 'PFM' unless $mt; # get the matrix as a string my $matrixstring = $self->_get_matrixstring($int_id) || return undef; #get remaining data in the matrix table: name, collection my $sth = $self->dbh->prepare( qq!SELECT BASE_ID,VERSION, COLLECTION,NAME FROM MATRIX WHERE ID="$int_id"! 
); $sth->execute(); my ($base_ID, $version, $collection, $name) = $sth->fetchrow_array(); # jsp6 # get species ##$sth=$self->dbh->prepare(qq!SELECT TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"!); $sth = $self->dbh->prepare( qq!SELECT GROUP_CONCAT(TAX_ID SEPARATOR ', ') as TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"! ); $sth->execute(); my @tax_ids; while (my ($res) = $sth->fetchrow_array()) { my @res_v = split(/,/, $res); my @res_v2 = grep(s/^\s*(.*)\s*$/$1/g, @res_v); push(@tax_ids, @res_v2); } # jsp6 # get acc ##$sth=$self->dbh->prepare(qq!SELECT ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"!); $sth = $self->dbh->prepare( qq!SELECT GROUP_CONCAT(ACC SEPARATOR ', ') as ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"! ); $sth->execute(); my @accs; while (my ($res) = $sth->fetchrow_array()) { my @res_v = split(/,/, $res); my @res_v2 = grep(s/^\s*(.*)\s*$/$1/g, @res_v); push(@accs, @res_v2); } # jsp6 # get remaining annotation as tags, form ANNOTATION table my %tags; $sth = $self->dbh->prepare( qq{SELECT TAG, VAL FROM MATRIX_ANNOTATION WHERE ID = "$int_id" }); $sth->execute(); ## my @key_to_split=("acc", "medline", "pazar_tf_id"); #if acc in MATRIX_ANNOTATION #my @key_to_split=("medline", "pazar_tf_id"); # # Added 'class' and 'family' to keys to split. Since we have dimers that # may have different classes / families. Previously stored in DB as comma # separated lists, now store as separate records. # XXX But this breaks the interface code! FIXME XXX # DJA 2015/09/10 # #my @key_to_split = ("medline", "pazar_tf_id", "tfbs_shape_id", "tfe_id", "class", "family"); my @key_to_split = ("class", "family", "medline", "pazar_tf_id", "tfbs_shape_id", "tfe_id"); # # See FIXME comment below. # DJA 2015/09/14 # #foreach my $key (@key_to_split) { # $tags{$key} = ['-']; #} # # Fixed so that we can have multiple comma separated values, values in # separate rows of the table or a combination thereof. 
We really should # try to get away from having to specify which keys can be split (should # be handled more gracefully). # DJA 2015/09/14 # my $vals; while (my ($tag, $val) = $sth->fetchrow_array()) { $vals = []; if ($tag ~~ @key_to_split) { my @val_v = split(/,/, $val); my @val_v2 = grep(s/^\s*(.*)\s*$/$1/g, @val_v); #push(@$vals, @val_v2); #$tags{$tag} = $vals; push @{$tags{$tag}}, @val_v2; } else { $tags{$tag} = $val; } } # # XXX FIXME # This really doesn't belong here. It is done for the purposes of the web # interface but this is DB code which should not presume how the returned # data is going to be used. That should be handled in the JASPAR web code # modules. JASPAR web code currently expects all keys to split to exist! # DJA 2015/09/14 # XXX FIXME # foreach my $key (@key_to_split) { $tags{$key} = ['-'] unless $tags{$key}; } # jsp6 $tags{'collection'} = $collection; $tags{'species'} = \@tax_ids; # as array reference instead of strigifying $tags{'acc'} = \@accs; # same, if acc MATRIX_PROTEIN # my $class = $tags{'class'}; delete($tags{'class'}); # eval( "\$matrixobj = TFBS::Matrix::PFM->new" . ' ( -ID => "$base_ID.$version", -name => $name, -class => $class, -tags => \%tags, -matrixstring => $matrixstring # FIXME - temporary );' ); if ($@) { $self->throw($@); } # warn $int_id, "\t", ref($matrixobj); return ($matrixobj->to_PWM) if $mt eq "PWM"; return ($matrixobj->to_ICM) if $mt eq "ICM"; return ($matrixobj); # default PFM } ##jsp6 sub _get_IDlist_by_query { #needs cleanup. NOT for the faint-hearted. 
my ($self, %args) = @_; warn '_get_IDlist_by_query | $self || ', $self; warn '_get_IDlist_by_query | %args || ', %args; # called by get_MatrixSet # warn $args{"-collection"}; $args{'-collection'} = 'CORE' unless $args{'-collection'}; # returns a set of internal IDs with whicj to get the actual matrices # current idea: # 1: first catch non-tag things like collection, name and version, species # makw one query for these if they are named and check the IDs for "latest" unless requested not to. # these are AND statements # 2:then do the rest on tag level: # to be able to do this with actual and tattemnet innthe tag table, we do an inner join query, which is kept separate just for convenice # we then intersect 1 and 2 # 3: then do matrix-based features such as ic, with, number of sites etc, for the surviving matrices. This shold happen in the get_matrixset part my @int_ids_to_return; ## jsp6 - autosearch if ($args{'-auto'}) { ##my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); my $sth = $self->dbh->prepare( qq!SELECT U.ID FROM (SELECT ID, BASE_ID as VAL FROM MATRIX UNION ALL SELECT ID, NAME as VAL FROM MATRIX UNION ALL SELECT ID, ACC as VAL FROM MATRIX_PROTEIN UNION ALL SELECT ID, TAX_ID as VAL FROM MATRIX_SPECIES UNION ALL SELECT ID, SPECIES as VAL FROM MATRIX_SPECIES,TAX WHERE MATRIX_SPECIES.TAX_ID=TAX.TAX_ID UNION ALL SELECT ID, NAME as VAL FROM MATRIX_SPECIES,TAX_EXT WHERE MATRIX_SPECIES.TAX_ID=TAX_EXT.TAX_ID AND MATRIX_SPECIES.TAX_ID=9606 UNION ALL SELECT ID, VAL as VAL FROM MATRIX_ANNOTATION) AS U WHERE LOWER(`VAL`) LIKE LOWER(?)! ); warn '_get_IDlist_by_query | $sth || ', $sth; foreach my $stID (@{$args{'-auto'}}) { warn '_get_IDlist_by_query | $stID || ', $stID; my ($stable_ID, $version) = split(/\./, $stID) ; # ignore vesion here, this is a stupidity filter #$sth->execute($stable_ID); $sth->execute("%" . $stable_ID . 
"%"); while (my ($int_id) = $sth->fetchrow_array()) { warn '_get_IDlist_by_query | $int_id || ', $int_id; push(@int_ids_to_return, $int_id); } } return \@int_ids_to_return; } # should redo so that matrix_annotation queries are separate, with an intersect in the end #special case 1: get ALL matrices. Higher priority than all if ($args{'-all'}) { my $sth = $self->dbh->prepare(qq!SELECT ID FROM MATRIX!); $sth->execute(); my @a; while (my ($i) = $sth->fetchrow_array()) { push(@a, $i); } return \@a; } # ids: special case2 which is has higher priority than any other except the above (ignore all others if ($args{'-ID'}) { # these might be either stable IDs or stableid.version. # if just stable ID and if all_versions==1, take all versions, otherwise the latest if ($args{-all_versions}) { my $sth = $self->dbh->prepare( qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); foreach my $stID (@{$args{'-ID'}}) { my ($stable_ID, $version) = split(/\./, $stID) ; # ignore vesion here, this is a stupidity filter $sth->execute($stable_ID); while (my ($int_id) = $sth->fetchrow_array()) { push(@int_ids_to_return, $int_id); } } } else { # only the lastest version, or the requested version foreach my $stID (@{$args{'-ID'}}) { #warn $stID; my ($stable_ID, $version) = split(/\./, $stID); $version = $self->_get_latest_version($stable_ID) unless $version; my $int_id = $self->_get_internal_id($stable_ID, $version); push(@int_ids_to_return, $int_id) if $int_id; } } return \@int_ids_to_return; } my @tables = ("MATRIX M"); my @and; # in matrix table: collection, if ($args{-collection}) { my $q = ' (COLLECTION='; if (ref $args{-collection} eq "ARRAY") { # so, possibly several my @a; foreach (@{$args{-collection}}) { push(@a, "\"$_\""); } $q .= join(" or COLLECTION=", @a); } else { # just one - typical usage $q .= "\"$args{-collection}\""; } $q .= " ) "; push(@and, $q); } # in matrix table: names. 
    # Is something that is basically only used from the web interface
    # typically used by the get_matrix_by_name function instead
    if ($args{-name}) {
        my $q = ' (NAME=';
        if (ref $args{-name} eq "ARRAY") {

            # so, possibly several
            my @a;
            foreach (@{$args{-name}}) {
                push(@a, "\"$_\"");
            }
            $q .= join(" or NAME=", @a);
        } else {

            # just one - typical usage
            $q .= "\"$args{-name}\"";
        }
        $q .= " ) ";
        push(@and, $q);
    }

    # in species table: tax.id: possibly many species with OR in between
    if ($args{-species}) {
        push(@tables, "MATRIX_SPECIES S");
        my $q = " M.ID=S.ID and (TAX_ID= ";
        if (ref $args{-species} eq "ARRAY") {

            # so, possibly several
            my @a;
            foreach (@{$args{-species}}) {
                push(@a, "\"$_\"");
            }
            $q .= join(" or TAX_ID=", @a);
        } else {

            # just one - typical usage
            # (the "TAX_ID=" prefix is already in $q, so no extra "=")
            $q .= "\"$args{-species}\"";
        }
        $q .= ") ";
        push(@and, $q);
    }

    # TAG_BASED
    # an internal join query: should be able to handle up to 26 tag-value
    # combos with ANDs in between
    # Very ugly code ahead:
    my (@inner_tables, @internal_ands1, @internal_ands2);
    my $int_counter = 0;    # for keeping track of names;
    my @alpha = ("a" .. "z");
    my %arrayref;

    foreach my $key (keys %args) {
        next if $key eq "-min_ic";
        next if $key eq "-matrixtype";
        next if $key eq "-species";
        next if $key eq "-collection";
        next if $key eq "-all_versions";
        next if $key eq "-all";
        next if $key eq "-ID";
        next if $key eq "-length";

        # like -length, -sites is a computed filter applied in
        # get_MatrixSet, not an annotation tag
        next if $key eq "-sites";
        next if $key eq "-name";

        my $oldkey = $key;
        $key =~ s/-//;
        $arrayref{$key} = $args{$oldkey};
    }

    if (%arrayref) {

        # get an internal name for the table
        push(@internal_ands2, " M.ID=a.ID ");
        my @a;
        foreach my $key (keys %arrayref) {
            my $tname = $alpha[$int_counter];
            push(@inner_tables, "MATRIX_ANNOTATION $tname");
            push(@internal_ands1,
                $alpha[$int_counter] . ".ID=" . $alpha[$int_counter - 1] .
".ID") unless $int_counter == 0; $int_counter++; # is the thing aupplied an array reference in inteslf: make an "or" query from that if (ref $arrayref{$key} eq "ARRAY") { my @b; foreach (@{$arrayref{$key}}) { push(@b, $self->dbh->quote($_)); } my $orstring = join(" or $tname.VAL=", @b); push(@a, "($tname.TAG=\"$key\" AND ($tname.VAL=$orstring))"); } #or not else { push(@a, "($tname.TAG=\"$key\" AND $tname.VAL=\"$arrayref{$key}\")" ); } } my $s = " ( " . join(" AND ", @a) . ")"; push(@internal_ands2, $s); } my $qq = "SELECT distinct(M.ID) from " . join(",", (@tables, @inner_tables)) . " where" . join(" AND ", (@and, @internal_ands1, @internal_ands2)); # warn $qq; #do actual mammoth query,and check for latest matrix my $sth = $self->dbh->prepare($qq); $sth->execute(); my @r; while (my ($int_id) = $sth->fetchrow_array) { if ($args{-all_versions}) { push(@r, $int_id); } else { # is latest? push(@r, $int_id) if ($self->_is_latest_version($int_id) == 1); } } warn "Warning: Zero matrices returned with current critera" unless scalar @r; return \@r; } ## jsp6 - checkpoint sub _is_latest_version { # is a particular internal ID representingthe latest matrix (collapse on base ids) my ($self, $int_id) = @_; my $sth = $self->dbh->prepare( qq! select count(*) from MATRIX where BASE_ID= (SELECT BASE_ID from MATRIX where ID=$int_id) AND VERSION>(SELECT VERSION from MATRIX where ID=$int_id) ! ); $sth->execute(); my ($count) = $sth->fetchrow_array(); return (1) if $count == 0; # no matrices with higher version ID and same base id return (0); } sub _get_TFFM_latest_version { my ($self, $base_ID) = @_; # SELECT VERSION FROM TFFM WHERE BASE_ID=? ORDER BY VERSION DESC LIMIT 1 my $sth = $self->dbh->prepare( qq!SELECT VERSION FROM TFFM WHERE BASE_ID="$base_ID" ORDER BY VERSION DESC LIMIT 1! ); $sth->execute; my ($latest) = $sth->fetchrow_array(); return ($latest); } sub _get_TFFM_internal_id { # picks out the internal id for a stable id version. 
Also checks if this # combo exists or not my ($self, $base_ID, $version) = @_; # SELECT ID FROM TFFM WHERE BASE_ID=? and VERSION=? my $sth = $self->dbh->prepare( qq!SELECT ID FROM TFFM WHERE BASE_ID="$base_ID" AND VERSION="$version"! ); $sth->execute; my ($int_id) = $sth->fetchrow_array(); return ($int_id); } sub _get_TFFM_by_int_id { #done my ($self, $int_id) = @_; my $sth = $self->dbh->prepare( qq! SELECT BASE_ID, VERSION, MATRIX_BASE_ID, MATRIX_VERSION, NAME, LOG_P_1ST_ORDER, LOG_P_DETAILED, EXPERIMENT_NAME FROM TFFM WHERE ID = "$int_id" ! ); $sth->execute(); my ($base_id, $version, $matrix_base_id, $matrix_version, $name, $log_p_1st_order, $log_p_detailed, $exp_name) = $sth->fetchrow_array(); my $tffm; eval { $tffm = TFBS::TFFM->new( -ID => "$base_id.$version", -name => $name, -log_p_1st_order => $log_p_1st_order, -log_p_detailed => $log_p_detailed, -experiment_name => $exp_name, # # OR we could retrieve the matrix and set the corresponding # attribute # -matrix_ID => "$matrix_base_id.$matrix_version" ) }; if ($@) { $self->throw($@); } # # Instead of storing the matrix ID, get the related matrix and store in # the matrix attribute. # # my $matrix = $self->get_Matrix_by_ID( # "$matrix_base_id.$matrix_version", 'PFM' # ); # # $tffm->matrix($matrix); # return $tffm; # default PFM } sub _is_TFFM_latest_version { my ($self, $int_id) = @_; my $sth = $self->dbh->prepare( qq! select count(*) from TFFM where BASE_ID = (SELECT BASE_ID from TFFM where ID = $int_id) AND VERSION > (SELECT VERSION from TFFM where ID = $int_id) ! 
    );
    $sth->execute();
    my ($count) = $sth->fetchrow_array();

    # no TFFMs with higher version ID and same base id
    return (1) if $count == 0;

    return (0);
}

sub DESTROY {

    #OK
    $_[0]->dbh->disconnect() if $_[0]->dbh;
}

TFBS-0.7.1/TFBS/DB/JASPAR2.pm000077500000000000000000000615141305752266700147270ustar00rootroot00000000000000
# TFBS module for TFBS::DB::JASPAR2
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::DB::JASPAR2 - interface to MySQL relational database of pattern matrices

=head1 SYNOPSIS

=over 4

=item * creating a database object by connecting to the existing JASPAR2-type database

    my $db = TFBS::DB::JASPAR2->connect("dbi:mysql:JASPAR2:myhost",
                                        "myusername",
                                        "mypassword");

=item * retrieving a TFBS::Matrix::* object from the database

    # retrieving a PFM by ID
    my $pfm = $db->get_Matrix_by_ID('M0079','PFM');

    # retrieving a PWM by name
    my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM');

=item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria

    # retrieving a set of PWMs from a list of IDs:
    my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101');
    my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist,
                                       -matrixtype => "PWM");

    # retrieving a set of ICMs from a list of names:
    my @namelist = ('p50', 'p53', 'HNF-1',
'GATA-1', 'GATA-2', 'GATA-3'); my $matrixset = $db->get_MatrixSet(-names => \@namelist, -matrixtype => "ICM"); # retrieving a set of all PFMs in the database # derived from human genes: my $matrixset = $db->get_MatrixSet(-species => ['Homo sapiens'], -matrixtype => "PFM"); =item * creating a new JASPAR2-type database named MYJASPAR2: my $db = TFBS::DB::JASPAR2->create("dbi:mysql:MYJASPAR2:myhost", "myusername", "mypassword"); =item * storing a matrix in the database (currently only PFMs): #let $pfm is a TFBS::Matrix::PFM object $db->store_Matrix($pfm); =back =head1 DESCRIPTION TFBS::DB::JASPAR2 is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a relational database. =head1 JASPAR2 DATA MODEL JASPAR2 is working name for a relational database model used for storing transcriptional factor pattern matrices in a MySQL database. It was initially designed to store matrices for the JASPAR database of high quality eukaryotic transcription factor specificity profiles by Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself, this data model stores profile ID (unique), name, structural class, basic taxonomic and bibliographic information as well as some additional optional tags. Due to its data model, which precedeed the design of the module, TFBS::DB::JASPAR2 cannot store arbitrary tags for a matrix. 
The supported tags are

    'acc'      # (accession number;
               # originally for transcription factor protein seq)
    'seqdb'    # sequence database where 'acc' comes from
    'medline'  # PubMed ID
    'species'  # Species name
    'sysgroup'
    'type'
    'total_ic' # total information content - redundant, present
               # for historical reasons

-----------------------  ADVANCED  ---------------------------------

For the developers and the curious, here is the JASPAR2 data model:

    CREATE TABLE matrix_data (
      ID varchar(16) DEFAULT '' NOT NULL,
      pos_ID varchar(24) DEFAULT '' NOT NULL,
      base enum('A','C','G','T'),
      position tinyint(3) unsigned,
      raw int(3) unsigned,
      info float(7,5) unsigned,        -- calculated
      pwm float(7,5),                  -- calculated
      normalized float(7,5) unsigned,
      PRIMARY KEY (pos_ID),
      KEY id_index (ID)
    );

    CREATE TABLE matrix_info (
      ID varchar(16) DEFAULT '' NOT NULL,
      name varchar(15) DEFAULT '' NOT NULL,
      type varchar(8) DEFAULT '' NOT NULL,
      class varchar(20),
      phylum varchar(32),              -- maps to 'sysgroup' tag
      litt varchar(40),                -- not used by this module
      medline int(12),
      information varchar(20),         -- not used by this module
      iterations varchar(6),
      width int(2),                    -- calculated
      consensus varchar(25),           -- calculated
      IC float(6,4),                   -- maps to 'total_ic' tag
      sites int(3) unsigned,           -- not used by this module
      PRIMARY KEY (ID)
    );

    CREATE TABLE matrix_seqs (
      ID varchar(16) DEFAULT '' NOT NULL,
      internal varchar(8) DEFAULT '' NOT NULL,
      seq_db varchar(15) NOT NULL,
      seq varchar(10) NOT NULL,
      PRIMARY KEY (ID, seq_db, seq)
    );

    CREATE TABLE matrix_species (
      ID varchar(16) DEFAULT '' NOT NULL,
      internal varchar(8) DEFAULT '' NOT NULL,
      species varchar(24) NOT NULL,
      PRIMARY KEY (ID, species)
    );

It is our best intention to hide the details of this data model,
which we are using on a
daily basis in our work, from most TFBS users, simply because for
historical reasons some table column names are confusing at best.
Most users only need to know the methods for storing data and
which tags are supported.

-------------------------------------------------------------------------


=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut


# The code begins HERE:


package TFBS::DB::JASPAR2;
use vars qw(@ISA $AUTOLOAD);

# we need all three matrices due to the redundancy in JASPAR2 data model
# which will hopefully be removed in JASPAR3

use TFBS::Matrix::PWM;
use TFBS::Matrix::PFM;
use TFBS::Matrix::ICM;

use TFBS::MatrixSet;
use Bio::Root::Root;
use DBI;
# use TFBS::DB; # eventually
use strict;

@ISA = qw(TFBS::DB Bio::Root::Root);

#########################################################################
# CONSTANTS
#########################################################################

use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost
use constant DEFAULT_USER          => "";
use constant DEFAULT_PASSWORD      => "";

#########################################################################
# PUBLIC METHODS
#########################################################################

=head2 new

 Title   : new
 Usage   : DEPRECATED - for backward compatibility only
           Use connect() or create() instead

=cut

sub new {
    _new (@_);
}

=head2 connect

 Title   : connect
 Usage   : my $db =
            TFBS::DB::JASPAR2->connect("dbi:mysql:DATABASENAME:HOSTNAME",
                                       "USERNAME",
                                       "PASSWORD");
 Function: connects to the existing JASPAR2-type database and
           returns a database object that interfaces the database
 Returns : a TFBS::DB::JASPAR2 object
 Args    : a standard database connection triplet
           ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD")
           In place of
           DATABASENAME, HOSTNAME, USERNAME and PASSWORD,
           use the actual values. PASSWORD and USERNAME might be
           optional, depending on the user's access permissions for
           the database server.

=cut

sub connect {
    # a more intuitive syntax for the constructor
    my ($caller, @connection_args) = @_;
    $caller->new(-connect => \@connection_args);
}

=head2 create

 Title   : create
 Usage   : my $newdb =
            TFBS::DB::JASPAR2->create("dbi:mysql:NEWDATABASENAME:HOSTNAME",
                                      "USERNAME",
                                      "PASSWORD");
 Function: connects to the database server, creates a new JASPAR2-type
           database and returns a database object that interfaces
           the database
 Returns : a TFBS::DB::JASPAR2 object
 Args    : a standard database connection triplet
           ("dbi:mysql:NEWDATABASENAME:HOSTNAME", "USERNAME", "PASSWORD")
           In place of NEWDATABASENAME, HOSTNAME, USERNAME and
           PASSWORD use the actual values. PASSWORD and USERNAME
           might be optional, depending on the user's access
           permissions for the database server.

=cut

sub create {
    my ($caller, $connectstring, $user, $password) = @_;
    if ($connectstring
        and $connectstring =~ /dbi:mysql:(\w+)(.*)/)
    {
        # save the captures before any other regex can overwrite them
        my ($dbname, $rest) = ($1, $2);

        # connect to the server;
        my $dbh = DBI->connect("dbi:mysql:mysql".$rest, $user, $password)
            or die("Error connecting to the database");

        # create database and open it
        $dbh->do("create database $dbname")
            or die("Error creating database.");
        $dbh->do("use $dbname");

        # create tables
        _create_tables($dbh);

        $dbh->disconnect;

        # run "new" with new database
        return $caller->new(-connect=>[$connectstring, $user, $password]);
    }
    else {
        die("Missing or malformed connect string for ".
"TFBS::DB::JASPAR2 connection."); } } =head2 dbh Title : dbh Usage : my $dbh = $db->dbh(); $dbh->do("UPDATE matrix_data SET name='ADD1' WHERE NAME='SREBP2'"); Function: returns the DBI database handle of the MySQL database interfaced by $db; THIS IS USED FOR WRITING NEW METHODS FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you have write access AND do not know what you are doing, you can severely corrupt the data For documentation about database handle methods, see L Returns : the database (DBI) handle of the MySQL JASPAR2-type relational database associated with the TFBS::DB::JASPAR2 object Args : none =cut sub dbh { my ($self, $dbh) = @_; $self->{'dbh'} = $dbh if $dbh; return $self->{'dbh'}; } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. =cut sub get_Matrix_by_ID { my ($self, $ID, $mt) = @_; $mt = (uc($mt) or "PWM"); unless (defined $ID) { $self->throw("No ID passed to get_Matrix_by_ID"); } my $matrixobj; { no strict 'refs'; my $ucmt = uc $mt; my $matrixstring = $self->_get_matrixstring($ID, $mt) || return undef; eval("\$matrixobj= TFBS::Matrix::$ucmt->new".' 
( -ID => $ID, -name => $self->_get_name($ID)."", -class => $self->_get_class($ID)."", -tags => { "medline" => ($self->_get_medline($ID) or ""), "species" => ($self->_get_species($ID) or ""), "sysgroup"=> ($self->_get_sysgroup($ID) or ""), "type" => ($self->_get_type($ID) or ""), "seqdb" => ($self->_get_seqdb($ID) or ""), "acc" => ($self->_get_acc($ID) or ""), "total_ic"=> ($self->_get_total_ic($ID) or "") }, -matrix=> $matrixstring # FIXME - temporary );'); if ($@) {$self->throw($@); } } return $matrixobj; } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1', 'PWM'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM') Args : (Matrix_name, Matrix_type) Matrix_name is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. Warning : According to the current JASPAR2 data model, name is not necessarily a unique identifier. In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You have been warned. 
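
The first-match-plus-warning behaviour described above can be sketched without a database. This is only an illustration under stated assumptions: pick_first_id() and the %ids_by_name lookup table are hypothetical stand-ins for the database query, and the IDs shown are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first ID stored under a name and
# warn when the name is missing or ambiguous.
sub pick_first_id {
    my ($name, $ids_by_name) = @_;
    my @ids = @{ $ids_by_name->{$name} || [] };
    warn "No matrix with name '$name' found\n" unless @ids;
    warn "There are " . scalar(@ids) . " matrices with name '$name'\n"
        if @ids > 1;
    return $ids[0];    # undef when nothing matched
}

# Made-up example data: one name maps to two IDs.
my %ids_by_name = (
    'HNF-1' => ['M00011', 'M00042'],
    'p53'   => ['M00007'],
);

my $id = pick_first_id('HNF-1', \%ids_by_name);
print "using $id\n";
```

When the name is ambiguous, retrieving the whole set with get_MatrixSet(-names => [...]) and choosing by ID is the safer route.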
=cut

sub get_Matrix_by_name {
    my ($self, $name, $mt) = @_;
    unless (defined $name) {
        $self->throw("No name passed to get_Matrix_by_name.");
    }
    my @IDlist = $self->_get_IDlist_by_query(-names=>[$name]);
    unless (@IDlist) {
        # nothing to fetch: warn and return undef explicitly instead of
        # passing the return value of warn() on as a matrix ID
        $self->warn("No matrix with name $name found.");
        return undef;
    }
    if ((my $L = scalar @IDlist) > 1) {
        $self->warn("There are $L matrices with name '$name'");
    }
    return $self->get_Matrix_by_ID($IDlist[0], $mt);
}

=head2 get_MatrixSet

 Title   : get_MatrixSet
 Usage   : my $matrixset = $db->get_MatrixSet(%args);
 Function: fetches matrix data for all matrices in the database
           matching criteria defined by the named arguments
           and returns a TFBS::MatrixSet object
 Returns : a TFBS::MatrixSet object
 Args    : This method accepts named arguments:
           -IDs        # a reference to an array of IDs (strings)
           -names      # a reference to an array of
                       #  transcription factor names (strings)
           -classes    # a reference to an array of
                       #  structural class names (strings)
           -species    # a reference to an array of
                       #   Latin species names (strings)
           -sysgroups  # a reference to an array of
                       #  higher taxonomic categories (strings)
           -matrixtype # a string, 'PFM', 'ICM' or 'PWM'
           -min_ic     # float, minimum total information content
                       #   of the matrix

The five arguments that expect list references are used in
database query formulation: elements within lists are combined
with 'OR' operators, and the lists of different types with 'AND'.
For example,

    my $matrixset = $db->get_MatrixSet(-classes => ['TRP_CLUSTER', 'FORKHEAD'],
                                       -species => ['Homo sapiens', 'Mus musculus'],
                                       -matrixtype => 'PWM');

gives a set of PWMs whose (structural class is 'TRP_CLUSTER' OR
'FORKHEAD') AND (the species they are derived from is 'Homo sapiens'
OR 'Mus musculus').

The -min_ic filter is applied after the query: matrix profiles
with total information content less than specified are
not included in the set.
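
The OR-within-a-list, AND-between-lists rule can be sketched in plain Perl. This is an illustration only, not the module's own query builder: build_where() and the naive quote_value() helper are hypothetical names, and real code should rely on the quote method of the DBI handle or on placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: naive SQL string quoting, for illustration only.
sub quote_value {
    my ($value) = @_;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Hypothetical helper: values within one list are joined with OR,
# the per-column groups are joined with AND.
sub build_where {
    my (%criteria) = @_;
    my @groups;
    foreach my $column (sort keys %criteria) {
        my @values = @{ $criteria{$column} };
        my $group;
        if (@values) {
            $group = "("
                . join(" OR ", map { "$column=" . quote_value($_) } @values)
                . ")";
        }
        else {
            # an empty list can never match anything
            $group = "(1=0)";
        }
        push @groups, $group;
    }
    return @groups ? "WHERE " . join(" AND ", @groups) : "";
}

print build_where(
    class   => ['TRP_CLUSTER', 'FORKHEAD'],
    species => ['Homo sapiens', 'Mus musculus'],
), "\n";
```

Sorting the column names only makes the generated clause deterministic; the order of AND-ed groups does not change the result of the query.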
=cut

sub get_MatrixSet {
    my ($self, %args) = @_;
    my @IDlist = $self->_get_IDlist_by_query(%args);
    my $mt = ($args{'-matrixtype'} or "PWM");
    my $matrixset = TFBS::MatrixSet->new();
    foreach (@IDlist) {
        next if (defined $args{'-min_ic'}
                 and $self->_get_total_ic($_) < $args{'-min_ic'});
        $matrixset->add_Matrix($self->get_Matrix_by_ID($_, $mt));
    }
    return $matrixset;
}

=head2 store_Matrix

 Title   : store_Matrix
 Usage   : $db->store_Matrix($pfm);
 Function: Stores the contents of a TFBS::Matrix::PFM object in the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (PFM_object)
           A TFBS::Matrix::PFM object
 Comment : this is an experimental method that is not 100% bulletproof;
           use at your own risk

=cut

sub store_Matrix {
    my ($self, @PFMs) = @_;
    my $err = 0;
    foreach my $pfm (@PFMs) {
        eval {
            $self->_store_matrix_data($pfm);
            $self->_store_matrix_info($pfm);
            $self->_store_matrix_seqs($pfm);
            $self->_store_matrix_species($pfm);
        };
        # keep the first failure; $@ is reset by every later eval
        $err ||= $@ if $@;
    }
    return $err;
}

=head2 store_MatrixSet

 Title   : store_MatrixSet
 Usage   : $db->store_MatrixSet($matrixset);
 Function: Stores the TFBS::Matrix::PFM objects that are part of a
           TFBS::MatrixSet object into the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (MatrixSet_object)
           A TFBS::MatrixSet object
 Comment : THIS METHOD IS NOT YET IMPLEMENTED

=cut

sub store_MatrixSet {
    $_[0]->throw ("Method store_MatrixSet not yet implemented.");
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious
           what it does.
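
The "return $@ on failure" convention used by the store and delete methods reports a single message for the whole call. A hedged sketch of an alternative that records one message per failed ID is shown below; delete_each() and its callback are hypothetical and stand in for the real database call.

```perl
use strict;
use warnings;

# Hypothetical helper: run $action for each ID and collect failures.
sub delete_each {
    my ($action, @ids) = @_;
    my %error_for;
    foreach my $id (@ids) {
        # the eval block yields 1 only if $action ran to completion
        my $ok = eval { $action->($id); 1 };
        $error_for{$id} = ($@ || "unknown error") unless $ok;
    }
    return \%error_for;
}

# Stand-in action that fails for one made-up ID.
my $errors = delete_each(
    sub { my ($id) = @_; die "no such matrix: $id\n" if $id eq 'M99999' },
    'M00045', 'M99999',
);
print "$_: $errors->{$_}" for sort keys %$errors;
```

An empty hash means every ID was processed without an exception, which avoids having to tell an empty $@ apart from a literal 0.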
=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;
    eval {
        foreach my $ID (@IDs) {
            my $q_ID = $self->dbh->quote($ID);
            foreach my $table (qw (matrix_data
                                   matrix_info
                                   matrix_seqs
                                   matrix_species)
                              )
            {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        }
    };
    return $@;
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new {
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;

    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    }
    elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        # dereference the array passed under the '-create' key
        return $caller->create(@{$args{'-create'}});
    }
    else {
        ($connectstring, $user, $password) =
            (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh( DBI->connect($connectstring, $user, $password) );

    return $self;
}

sub _get_IDlist_by_query {
    # called by get_MatrixSet
    my ($self, %args) = @_;
    my ($TABLES, %arrayref);
    $args{-names}     and $arrayref{name}   = $args{-names};
    $args{-classes}   and $arrayref{class}  = $args{-classes};
    $args{-sysgroups} and $arrayref{phylum} = $args{-sysgroups};
    $args{-IDs}       and $arrayref{ID}     = $args{-IDs};

    my @andconditions;
    if ($args{-species}) {
        # species live in their own table, joined on the matrix ID
        $TABLES = ' matrix_info, matrix_species ';
        push @andconditions,
            'matrix_info.ID = matrix_species.ID',
            " (".
            join(" OR ",
                 (map {"matrix_species.species=".
                       $self->dbh->quote($_)
                      } @{$args{-species}}
                 )
                ).
            ") ";
    }
    else {
        $TABLES = 'matrix_info ';
    }

    foreach my $key (keys %arrayref) {
        if (scalar @{$arrayref{$key}}) {
            push @andconditions,
                "(".
                join(" OR ",
                     (map {"matrix_info.$key=".
                           $self->dbh->quote($_)
                          } @{$arrayref{$key}}
                     )
                    ).
                ")";
        }
        else {
            # an empty list matches nothing
            push @andconditions, "(1=0)";
        }
    }

    my $WHERE = ((scalar @andconditions) == 0) ? "" : " WHERE ";

    my $query = "SELECT DISTINCTROW matrix_info.id FROM $TABLES $WHERE".
join(" AND ", @andconditions); my $sth = $self->dbh->prepare($query); $sth->execute() or $self->throw("Query failed:\n$query\n"); # collect IDs and return my @IDlist = (); while (my ($id) = $sth->fetchrow_array()) { push @IDlist, $id; } $sth->finish; return @IDlist; } sub _get_matrixstring { my ($self, $ID, $mt) = @_; my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); unless (defined $dbname{$mt}) { $self->throw("Unsupported matrix type: ".$mt); } my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth=$self->dbh->prepare ("SELECT $dbname{$mt} FROM matrix_data WHERE ID=$qID AND base='$base' ORDER BY position"); $sth->execute; $matrixstring .= join (" ", (map {$_->[0]} @{$sth->fetchall_arrayref()}))."\n"; } $sth->finish; return undef if $matrixstring eq "\n"x4; return $matrixstring; } sub _simple_query { my ($self, $table, $retr_field, $search_field, $search_value) = @_; my $q_value = $self->dbh->quote($search_value); my $sth = $self->dbh->prepare ("SELECT DISTINCT $retr_field from $table WHERE $search_field = $q_value and $retr_field <> \"\" ORDER BY $retr_field"); $sth->execute; return (map {$_->[0]} @{$sth->fetchall_arrayref}); } sub _store_matrix_data { my ($self, $pfm, $ACTION) = @_; my @base = qw(A C G T); my $pfmatrix = $pfm->matrix(); my $icmatrix = $pfm->to_ICM()->matrix(); my $pwmatrix = $pfm->to_PWM->matrix(); my $sth = $self->dbh->prepare (q! INSERT INTO matrix_data VALUES(?,?,?,?,?,?,?,?) !); for my $i (0..3) { for my $j (0..($pfm->length-1)) { $sth->execute( $pfm->ID, $pfm->ID.".".$base[$i].".".($j+1), $base[$i], $j+1, $pfmatrix->[$i][$j], $icmatrix->[$i][$j], $pwmatrix->[$i][$j], $pfmatrix->[$i][$j] / $pfm->column_sum()) or $self->throw("Error executing query."); } } } sub _store_matrix_info { my ($self, $pfm, $ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO matrix_info (ID, name, type, class, phylum, width, IC, sites) VALUES(?,?,?,?,?,?,?,?) 
!); $sth->execute($pfm->ID, ($pfm->name or $pfm->ID), ($pfm->{'tags'}->{'type'} or ""), ($pfm->class() or undef), ($pfm->{'tags'}->{'sysgroup'} or undef), $pfm->length(), $pfm->to_ICM->total_ic(), $pfm->column_sum() ) or $self->throw("Error executing query"); } sub _store_matrix_seqs { my ($self, $pfm, $ACTION) = @_; return unless ($pfm->{'tags'}->{'seqdb'} or $pfm->{'tags'}->{'acc'}); my $sth = $self->dbh->prepare (q! INSERT INTO matrix_seqs (ID, seq_db, seq) VALUES(?,?,?) !); $sth->execute($pfm->ID, ($pfm->{'tags'}->{'seqdb'} or ""), ($pfm->{'tags'}->{'acc'} or "") ) or $self->throw("Error executing query"); } sub _store_matrix_species { my ($self, $pfm, $ACTION) = @_; return unless $pfm->{'tags'}->{'species'}; my $sp = $pfm->{'tags'}->{'species'}; my @splist = (ref($sp) ? @$sp : $sp); foreach my $species (@splist) { my $sth = $self->dbh->prepare (q! INSERT INTO matrix_species (ID, species) VALUES(?,?) !); $sth->execute($pfm->ID, $species ) or $self->throw("Error executing query"); } } sub _create_tables { # utility function # If you want to change the databse schema, # this is the right place to do it my $dbh = shift; my @queries = ( q! CREATE TABLE matrix_data ( ID varchar(16) DEFAULT '' NOT NULL, pos_ID varchar(24) DEFAULT '' NOT NULL, base enum('A','C','G','T'), position tinyint(3) unsigned, raw int(3) unsigned, info float(7,5) unsigned, pwm float(7,5), normalized float(7,5) unsigned, PRIMARY KEY (pos_ID), KEY id_index (ID) ) !, q! CREATE TABLE matrix_info ( ID varchar(16) DEFAULT '' NOT NULL, name varchar(15) DEFAULT '' NOT NULL, type varchar(8) DEFAULT '' NOT NULL, class varchar(20), phylum varchar(32), litt varchar(40), medline int(12), information varchar(20), iterations varchar(6), width int(2), consensus varchar(25), IC float(6,4), sites int(3) unsigned, PRIMARY KEY (ID) ) !, q! 
CREATE TABLE matrix_seqs ( ID varchar(16) DEFAULT '' NOT NULL, internal varchar(8) DEFAULT '' NOT NULL, seq_db varchar(15) NOT NULL, seq varchar(10) NOT NULL, PRIMARY KEY (ID, seq_db, seq) ) !, q! CREATE TABLE matrix_species ( ID varchar(16) DEFAULT '' NOT NULL, internal varchar(8) DEFAULT '' NOT NULL, species varchar(24) NOT NULL, PRIMARY KEY (ID, species) ) !); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub AUTOLOAD { my ($self, $ID) = @_; no strict 'refs'; my $TABLE; my %dbname_of = (ID => 'ID', name => 'name', class => 'class', species => 'species', sysgroup => 'phylum', type => 'type', seqdb => 'seq_db', acc => 'seq', total_ic => 'IC', medline => 'medline' ); my ($where_column, $where_value); if ($AUTOLOAD =~ /.*::_{0,1}get_(\w+)_list/) { defined $dbname_of{$1} or $self->throw("$AUTOLOAD: no such method!"); ($where_column, $where_value) = (1,1); } elsif ($AUTOLOAD =~ /.*::_get_(\w+)/) { defined $dbname_of{$1} or $self->throw("$AUTOLOAD: no such method!"); defined $ID or $self->throw("No ID provided for $AUTOLOAD"); ($where_column, $where_value) = ('ID', $ID); } else { $self->throw("$AUTOLOAD: no such method!"); } defined $dbname_of{$1} or $self->throw("$AUTOLOAD: no such method!"); if ($1 eq 'species') { $TABLE = 'matrix_species'; } elsif ($1 eq 'seqdb' or $1 eq 'acc') { $TABLE = 'matrix_seqs', } else { $TABLE = 'matrix_info' ; } my @results = $self->_simple_query ($TABLE, $dbname_of{$1}, $where_column => $where_value); wantarray ? 
return @results : return $results[0]; } sub DESTROY { $_[0]->dbh->disconnect() if $_[0]->dbh; } 1; TFBS-0.7.1/TFBS/DB/JASPAR4.pm000077500000000000000000000534411305752266700147310ustar00rootroot00000000000000# TFBS module for TFBS::DB::JASPAR4 # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::DB::JASPAR4 - interface to MySQL relational database of pattern matrices =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to the existing JASPAR2-type database my $db = TFBS::DB::JASPAR4->connect("dbi:mysql:JASPAR4:myhost", "myusername", "mypassword"); =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('M0079','PFM'); #retrieving a PWM by name my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM'); =item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria # retrieving a set of PWMs from a list of IDs: my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101'); my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist, -matrixtype => "PWM"); # retrieving a set of ICMs from a list of names: my @namelist = ('p50', 'p53', 'HNF-1'. 'GATA-1', 'GATA-2', 'GATA-3'); my $matrixset = $db->get_MatrixSet(-names => \@namelist, -matrixtype => "ICM"); # retrieving a set of all PFMs in the database # derived from human genes: my $matrixset = $db->get_MatrixSet(-species => ['Homo sapiens'], -matrixtype => "PFM"); =item * creating a new JASPAR4-type database named MYJASPAR4: my $db = TFBS::DB::JASPAR4->create("dbi:mysql:MYJASPAR4:myhost", "myusername", "mypassword"); =item * storing a matrix in the database (currently only PFMs): #let $pfm is a TFBS::Matrix::PFM object $db->store_Matrix($pfm); =back =head1 DESCRIPTION TFBS::DB::JASPAR4 is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a relational database. 
The interface is nearly identical to the JASPAR2interface, while the underlying data model is different =head1 JASPAR2 DATA MODEL JASPAR4 is working name for a relational database model used for storing transcriptional factor pattern matrices in a MySQL database. It was initially designed (JASPAR2) to store matrices for the JASPAR database of high quality eukaryotic transcription factor specificity profiles by Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself, this data model stores profile ID (unique), name, structural class, basic taxonomic and bibliographic information as well as some additional opseqdbtional tags. Tags that are commonly used in the actual JASPAR database include 'medline' # PubMed ID 'species' # Species name 'superclass' #Species supergroup, eg 'vertebrate', 'plant' etc 'total_ic' # total information content - redundant, present # for historical 'type' #experimental nethod 'acc' #accession number for TF protein sequence 'seqdb' #corresponding database name but any tag is storable and searchable. ----------------------- ADVANCED --------------------------------- For the developers and the curious, here is the JASPAR4 data model: It is our best intention to hide the details of this data model, which we are using on a daily basis in our work, from most TFBS users. Most users should only know the methods to store the data and which tags are supported. ------------------------------------------------------------------------- =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
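
As a note on the data model mentioned above: the storage code in this module writes every annotation as an (ID, tag, val) row in the MATRIX_ANNOTATION table, with name and class stored the same way as ordinary tags. The sketch below only illustrates that flattening in plain Perl; annotation_rows() is a hypothetical helper and the example values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a matrix's name, class and tags into
# (ID, tag, val) rows; undefined values become empty strings.
sub annotation_rows {
    my ($id, $name, $class, %tags) = @_;
    my @rows = (
        [ $id, 'name',  defined $name  ? $name  : '' ],
        [ $id, 'class', defined $class ? $class : '' ],
    );
    foreach my $tag (sort keys %tags) {
        my $val = defined $tags{$tag} ? $tags{$tag} : '';
        push @rows, [ $id, $tag, $val ];
    }
    return @rows;
}

# Made-up example values.
my @rows = annotation_rows('M00001', 'HNF-1', 'HOMEO',
                           species => 'Homo sapiens',
                           medline => '1234567');
print join("\t", @$_), "\n" for @rows;
```

Because each tag is just a row, a new tag needs no schema change, which is what makes arbitrary tags storable and searchable.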
=cut # The code begins HERE: package TFBS::DB::JASPAR4; use vars qw(@ISA $AUTOLOAD); # we need all three matrices due to the redundancy in JASPAR2 data model # which will hopefully be removed in JASPAR3 use TFBS::Matrix::PWM; use TFBS::Matrix::PFM; use TFBS::Matrix::ICM; use TFBS::MatrixSet; use Bio::Root::Root; use DBI; # use TFBS::DB; # eventually use strict; @ISA = qw(TFBS::DB Bio::Root::Root); ######################################################################### # CONSTANTS ######################################################################### use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost use constant DEFAULT_USER => ""; use constant DEFAULT_PASSWORD => ""; ######################################################################### # PUBLIC METHODS ######################################################################### =head2 new Title : new Usage : DEPRECATED - for backward compatibility only Use connect() or create() instead =cut sub new { _new (@_); } =head2 connect Title : connect Usage : my $db = TFBS::DB::JASPAR4->connect("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD"); Function: connects to the existing JASPAR4-type database and returns a database object that interfaces the database Returns : a TFBS::DB::JASPAR4 object Args : a standard database connection triplet ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, use the actual values. PASSWORD and USERNAME might be optional, depending on the user's acces permissions for the database server. 
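
The connection string in the triplet has the form dbi:mysql:DATABASENAME:HOSTNAME. A minimal sketch of how such a string can be split, using the same pattern that create() applies to it, follows; split_mysql_dsn() is a hypothetical helper and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: split "dbi:mysql:DATABASENAME..." into the
# database name and whatever follows it (for example ":HOSTNAME").
sub split_mysql_dsn {
    my ($dsn) = @_;
    return unless defined $dsn and $dsn =~ /^dbi:mysql:(\w+)(.*)$/;
    return ($1, $2);
}

my ($database, $rest) = split_mysql_dsn("dbi:mysql:JASPAR4:myhost");
print "database=$database rest=$rest\n";
```

A malformed string yields an empty list, so the caller can test the result before attempting to connect.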
=cut sub connect { # a more intuitive syntax for the constructor my ($caller, @connection_args) = @_; $caller->new(-connect => \@connection_args); } =head2 dbh Title : dbh Usage : my $dbh = $db->dbh(); $dbh->do("UPDATE matrix_data SET name='ADD1' WHERE NAME='SREBP2'"); Function: returns the DBI database handle of the MySQL database interfaced by $db; THIS IS USED FOR WRITING NEW METHODS FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you have write access AND do not know what you are doing, you can severely corrupt the data For documentation about database handle methods, see L Returns : the database (DBI) handle of the MySQL JASPAR2-type relational database associated with the TFBS::DB::JASPAR2 object Args : none =cut sub dbh { my ($self, $dbh) = @_; $self->{'dbh'} = $dbh if $dbh; return $self->{'dbh'}; } =head2 store_Matrix Title : store_Matrix Usage : $db->store_Matrix($matrixobject); Function: Stores the contents of a TFBS::Matrix::DB object in the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : (PFM_object) A TFBS::Matrix::PFM, FBS::Matrix::PWM or FBS::Matrix::ICM object. 
PFM object are recommended to use, as they are eaily converted to other formats Comment : this is an experimental method that is not 100% bulletproof; use at your own risk =cut sub store_Matrix { my ($self, @PFMs) = @_; my $err; foreach my $pfm (@PFMs) { eval { $self->_store_matrix_data($pfm); $self->_store_matrix_info($pfm); $self->_store_matrix_annotation($pfm); #$self->_store_matrix_species($pfm); }; } return $@; } sub create { my ($caller, $connectstring, $user, $password) = @_; if ($connectstring and $connectstring =~ /dbi:mysql:(\w+)(.*)/) { # connect to the server; my $dbh=DBI->connect("dbi:mysql:mysql".$2, $user,$password) or die("Error connecting to the database"); # create database and open it $dbh->do("create database $1") or die("Error creating database."); $dbh->do("use $1"); # create tables _create_tables($dbh); $dbh->disconnect; # run "new" with new database return $caller->new(-connect=>[$connectstring, $user, $password]); } else { die("Missing or malformed connect string for ". 
"TFBS::DB::JASPAR2 connection."); } } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix is stored in the database (PFM is default) Args : (Matrix_ID) Matrix_ID is a string; =cut sub get_Matrix_by_ID { my ($self, $ID, $mt) = @_; $mt = (uc($mt) or "PWM"); unless (defined $ID) { $self->throw("No ID passed to get_Matrix_by_ID"); } my $matrixobj; { no strict 'refs'; my $ucmt = uc $mt; my $matrixstring = $self->_get_matrixstring($ID) || return undef; # get type of matrix my $sth=$self->dbh->prepare(qq{SELECT type FROM MATRIX_INFO WHERE ID = '$ID'}); $sth->execute(); my $type=$sth->fetchrow_array(); # get reast of annotation as tags $sth=$self->dbh->prepare(qq{SELECT tag, val FROM MATRIX_ANNOTATION WHERE ID = '$ID' }); $sth->execute(); my %tags; while ( my($tag, $val)= $sth->fetchrow_array()){ $tags{$tag}=$val; } my $name= $tags{'name'}; my $class= $tags{'class'}; delete ($tags{'name'}); delete ($tags{'class'}); eval ("\$matrixobj= TFBS::Matrix::$type->new".' ( -ID => $ID."", -name =>$name, -class => $class, -tags => \%tags, -matrixstring=> $matrixstring # FIXME - temporary );'); #if ($@) {$self->throw($@); } #print "ref:",ref ($matrixobj); } # print $matrixobj->ID(); # print "here\n";print $matrixobj->prettyprint(); return ($matrixobj); } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix object was stored in the database (default PFM)) Args : (Matrix_name) Warning : According to the current JASPAR4 data model, name is not necessarily a unique identifier. 
In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You've been warned. =cut sub get_Matrix_by_name { my ($self, $name, $mt) = @_; unless(defined $name) { $self->throw("No name passed to get_Matrix_by_name."); } my @IDlist = $self->_get_IDlist_by_query(-name=>[$name]); my $ID= ($IDlist[0] or $self->warn("No matrix with name $name found.")); if ((my $L= scalar @IDlist) > 1) { $self->warn("There are $L matrices with name '$name'"); } return $self->get_Matrix_by_ID($ID); } =head2 get_MatrixSet Title : get_MatrixSet Usage : my $matrixset = $db->get_MatrixSet(%args); Function: fetches matrix data under for all matrices in the database matching criteria defined by the named arguments and returns a TFBS::MatrixSet object Returns : a TFBS::MatrixSet object Args : This method accepts named arguments, corresponding to arbitrary tags. Note that this is different from JASPAR2. As any tag is supported for database storage, any tag can be used for information retrieval. Additionally, arguments as 'name' and 'class' can be used (even though they are not tags. As with get_Matrix methods, it is important to realize that any matrix format can be stored in the database: the TFBS::MatrixSet might therefore consist of PFMs, ICMs and PWMS, depending on how matrices are stored, Examples include -ID # a reference to an array of IDs (strings) -name # a reference to an array of # transcription factor names (string) -class # a reference to an array of # structural class names (strings) -species # a reference to an array of # Latin species names (strings) -sysgroup # a reference to an array of # higher taxonomic categories (strings) -min_ic # float, minimum total information content # of the matrix. IMPORTANT:if retrieved matrices are in PWM format there is no way to measureinformation content. -matrixtype #string describing type of matrix to retrieve. 
                       # If left out, the format will revert to the
                       # database format. Note that this option only
                       # works if the database format is PFM

The arguments that expect list references are used in
database query formulation: elements within lists are combined
with 'OR' operators, and the lists of different types with 'AND'.
For example,

    my $matrixset = $db->get_MatrixSet(-class => ['TRP_CLUSTER', 'FORKHEAD'],
                                       -species => ['Homo sapiens', 'Mus musculus'],
                                      );

gives a set of TFBS::Matrix::PFM objects (given that the matrix models
are stored as such) whose (structural class is 'TRP_CLUSTER' OR
'FORKHEAD') AND (the species they are derived from is 'Homo sapiens'
OR 'Mus musculus').

The -min_ic filter is applied after the query: matrix profiles
with total information content less than specified are
not included in the set.

=cut

sub get_MatrixSet {
    my ($self, %args) = @_;
    my @IDlist = $self->_get_IDlist_by_query(%args);
    my $type;
    my $matrixset = TFBS::MatrixSet->new();
    foreach (@IDlist) {
        # print "$_\n";
    }
    foreach (@IDlist) {
        #next if (defined $args{'-min_ic'}
        #         and $_->_get_total_ic($_) < $args{'-min_ic'});

        # evaluating total information content involves actually
        # retrieving the matrix; this is a problem if the matrix is
        # stored as a PWM: throw an error if so
        my $matrix = $self->get_Matrix_by_ID($_);

        # ugly code:
        if (defined $args{'-min_ic'}) {
            if ($matrix->isa("TFBS::Matrix::PFM")) {
                next if ($matrix->to_ICM->total_ic() < $args{'-min_ic'});
            }
            if ($matrix->isa("TFBS::Matrix::ICM")) {
                next if ($matrix->total_ic() < $args{'-min_ic'});
            }
            if ($matrix->isa("TFBS::Matrix::PWM")) {
                $self->throw("Cannot evaluate information content from PWM matrices");
            }
        }

        # ugly code: conversion is only possible from a stored PFM
        if ($args{'-matrixtype'} && $matrix->isa("TFBS::Matrix::PFM")) {
            if ($args{'-matrixtype'} eq 'PWM') {
                # warn "change";
                $matrix = $matrix->to_PWM();
            }
            elsif ($args{'-matrixtype'} eq 'ICM') {
                # warn "change";
                # an ICM request must yield an ICM, not a PWM
                $matrix = $matrix->to_ICM();
            }
        }

        $matrixset->add_Matrix($matrix);
    }
    return $matrixset;
}

sub store_MatrixSet {
    $_[0]->throw
("Method store_MatrixSet not yet implemented.");
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious
           what it does.

=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;
    eval {
        foreach my $ID (@IDs) {
            my $q_ID = $self->dbh->quote($ID);
            foreach my $table (qw (MATRIX_DATA
                                   MATRIX_INFO
                                   MATRIX_ANNOTATION
                                  ) )
            {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        }
    };
    return $@;
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new {
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;

    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    }
    elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        # pass the connection triplet stored under -create on to create()
        return $caller->create(@{$args{'-create'}});
    }
    else {
        ($connectstring, $user, $password) =
            (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh( DBI->connect($connectstring, $user, $password) );

    return $self;
}

sub _store_matrix_data {
    my ($self, $pfm, $ACTION) = @_;
    my @base = qw(A C G T);
    my $matrix = $pfm->matrix();
    my $type;
    my $sth = $self->dbh->prepare
        (q! INSERT INTO MATRIX_DATA VALUES(?,?,?,?)
!); for my $i (0..3) { for my $j (0..($pfm->length-1)) { $sth->execute( $pfm->ID, $base[$i], $j+1, $matrix->[$i][$j] ) or $self->throw("Error executing query."); } } } sub _store_matrix_info { my ($self, $pfm, $ACTION) = @_; my $type; $type= 'PFM' if $pfm->isa("TFBS::Matrix::PFM"); $type= 'PWM' if $pfm->isa("TFBS::Matrix::PWM"); $type= 'ICM' if $pfm->isa("TFBS::Matrix::ICM"); my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_INFO (ID, type) VALUES(?,?) !); $sth->execute($pfm->ID, $type, ) or $self->throw("Error executing query"); } sub _store_matrix_annotation { my ($self, $pfm, $ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_ANNOTATION (ID, tag, val) VALUES(?,?,?) !); $sth->execute($pfm->ID, 'name', ($pfm->name() or ""), ); $sth->execute($pfm->ID, 'class', ($pfm->class() or ""), ); # get all tags my %tags= $pfm->all_tags(); foreach my $tag( keys %tags){ $sth->execute($pfm->ID, $tag, ($tags{$tag} or ""), ) or $self->throw("Error executing query"); } } #when creating: try to support arbitrary tags sub _create_tables { # utility function # If you want to change the databse schema, # this is the right place to do it my $dbh = shift; my @queries = ( q! CREATE TABLE MATRIX_DATA( ID VARCHAR (16) DEFAULT '' NOT NULL, row VARCHAR(1) NOT NULL, col TINYINT(3) UNSIGNED NOT NULL, val FLOAT, PRIMARY KEY (ID, row, col) ) !, q! CREATE TABLE MATRIX_INFO( ID VARCHAR (16) DEFAULT '' NOT NULL PRIMARY KEY , type ENUM ('PFM', 'ICM','PWM') DEFAULT 'PFM' NOT NULL ) !, q! 
CREATE TABLE MATRIX_ANNOTATION( ID VARCHAR (16) DEFAULT '' NOT NULL, tag VARCHAR(255) DEFAULT '' NOT NULL, val TEXT, PRIMARY KEY (ID, tag) ) !, ); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub _get_matrixstring { my ($self, $ID) = @_; #my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); #unless (defined $dbname{$mt}) { #$self->throw("Unsupported matrix type: ".$mt); #} my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth=$self->dbh->prepare ("SELECT val FROM MATRIX_DATA WHERE ID=$qID AND row='$base' ORDER BY col"); $sth->execute; $matrixstring .= join (" ", (map {$_->[0]} @{$sth->fetchall_arrayref()}))."\n"; } $sth->finish; return undef if $matrixstring eq "\n"x4; return $matrixstring; } sub _get_IDlist_by_query { # called by get_MatrixSet # should be able to search for arbitrary tags...hmmm my ($self, %args) = @_; my ($TABLES, %arrayref); my (%intersected_set); foreach my $key(keys %args){ unless ( $key eq "-min_ic" or $key eq "-matrixtype"){ my $oldkey=$key; $key=~s/-//; $arrayref{$key}= $args{$oldkey}; } } my @andconditions; $TABLES = 'MATRIX_ANNOTATION '; #special case: get all matrices unless (keys %arrayref){ my $query = "SELECT DISTINCT ID FROM $TABLES "; my $sth = $self->dbh->prepare($query); $sth->execute() or $self->throw("Query failed:\n$query\n"); my @ary; while (my ($id) = $sth->fetchrow_array()) { push (@ary, $id); } return(@ary); } foreach my $key (keys %arrayref) { #print "key: $key\n"; if ($key eq 'ID'){ push @andconditions, "(". join(" OR ", (map {"MATRIX_ANNOTATION.ID=". $self->dbh->quote($_) } @{$arrayref{$key}} )). ")"; } else{ push @andconditions, "(". join(" OR ", (map {"MATRIX_ANNOTATION.tag=". $self->dbh->quote($key)." AND val=". $self->dbh->quote($_) } @{$arrayref{$key}} )). ")"; } push (@andconditions, 1) unless(@andconditions); my $WHERE = ((scalar @andconditions) == 0) ? 
"" : " WHERE "; my $query = "SELECT DISTINCT ID FROM $TABLES $WHERE". join(" AND ", @andconditions); # warn $query; undef @andconditions; my $sth = $self->dbh->prepare($query); $sth->execute() or $self->throw("Query failed:\n$query\n"); # collect IDs and return my %current_query; while (my ($id) = $sth->fetchrow_array()) { $current_query{$id}=1; } unless (%intersected_set){ %intersected_set= %current_query; next; } # do intersect foreach my $key (keys %intersected_set){ delete $intersected_set{ $key} unless $current_query{$key}; } } my @ary; foreach my $key (keys %intersected_set){ push (@ary, $key); # warn "$key\n"; } return (@ary); } sub DESTROY { $_[0]->dbh->disconnect() if $_[0]->dbh; } TFBS-0.7.1/TFBS/DB/JASPAR5.pm000066400000000000000000001041201305752266700147160ustar00rootroot00000000000000# TFBS module for TFBS::DB::JASPAR5 # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::DB::JASPAR5 - interface to MySQL relational database of pattern matrices. Currently status: experimental. =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to the existing JASPAR5-type database my $db = TFBS::DB::JASPAR5->connect("dbi:mysql:JASPAR5:myhost", "myusername", "mypassword"); =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('M0079','PFM'); #retrieving a PWM by name my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM'); =item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria # retrieving a set of PWMs from a list of IDs: my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101'); my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist, -matrixtype => "PWM"); # retrieving a set of ICMs from a list of names: @namelist = ('p50', 'p53', 'HNF-1'. 
                 'GATA-1', 'GATA-2', 'GATA-3');
    my $matrixset = $db->get_MatrixSet(-name => \@namelist,
                                       -matrixtype => "ICM");

=item * creating a new JASPAR5-type database named MYJASPAR5:

    my $db = TFBS::DB::JASPAR5->create("dbi:mysql:MYJASPAR5:myhost",
                                       "myusername",
                                       "mypassword");

=item * storing a matrix in the database (currently only PFMs):

    # let $pfm be a TFBS::Matrix::PFM object
    $db->store_Matrix($pfm);

=back

=head1 DESCRIPTION

TFBS::DB::JASPAR5 is a read/write database interface module that
retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet
objects in a relational database. The interface is nearly identical
to the JASPAR2 and JASPAR4 interfaces, while the underlying data model
is different.

=head1 JASPAR5 DATA MODEL

JASPAR5 is the working name for a relational database model used
for storing transcription factor pattern matrices in a MySQL database.
It was initially designed (JASPAR2) to store matrices for the JASPAR database of
high quality eukaryotic transcription factor specificity profiles by
Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself,
this data model stores the profile ID (unique), name, structural class,
basic taxonomic and bibliographic information,
as well as some additional custom tags.

Here goes a more thorough description of tables and IDs

-----------------------  ADVANCED  ---------------------------------

For the developers and the curious, here is the JASPAR5 data model:

MISSING TEXT HERE ON HOW IT WORKS

It is our best intention to hide the details of this data model, which we
are using on a daily basis in our work, from most TFBS users.
Most users only need to know the methods to store the data and
which tags are supported.

-------------------------------------------------------------------------

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods.
Internal methods are preceded with an underscore. =cut # The code begins HERE: package TFBS::DB::JASPAR5; use vars qw(@ISA $AUTOLOAD); # we need all three matrices due to the redundancy in JASPAR2 data model # which will hopefully be removed in JASPAR3 use TFBS::Matrix::PWM; use TFBS::Matrix::PFM; use TFBS::Matrix::ICM; use TFBS::MatrixSet; use Bio::Root::Root; use DBI; # use TFBS::DB; # eventually use strict; @ISA = qw(TFBS::DB Bio::Root::Root); ######################################################################### # CONSTANTS ######################################################################### use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost use constant DEFAULT_USER => ""; use constant DEFAULT_PASSWORD => ""; ######################################################################### # PUBLIC METHODS ######################################################################### =head2 new Title : new Usage : DEPRECATED - for backward compatibility only Use connect() or create() instead =cut sub new { _new (@_); } =head2 connect Title : connect Usage : my $db = TFBS::DB::JASPAR5->connect("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD"); Function: connects to the existing JASPAR5-type database and returns a database object that interfaces the database Returns : a TFBS::DB::JASPAR5 object Args : a standard database connection triplet ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, use the actual values. PASSWORD and USERNAME might be optional, depending on the user's acces permissions for the database server. 
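
As a rough illustration of the connection triplet described above, the
following sketch assembles it from its parts. The helper name
connection_triplet is hypothetical and not part of TFBS; only the
"dbi:mysql:DATABASENAME:HOSTNAME" form and the connect() call are taken
from this documentation.

```perl
    use strict;
    use warnings;

    # Hypothetical helper (an assumption, not part of TFBS): builds the
    # (connect string, user, password) triplet that connect() expects.
    sub connection_triplet {
        my ($database, $host, $user, $password) = @_;
        my $dsn = "dbi:mysql:$database";
        # HOSTNAME is appended only when one is given
        $dsn .= ":$host" if defined $host and length $host;
        # USERNAME and PASSWORD may be optional, so default to empty strings
        return ($dsn,
                defined $user     ? $user     : "",
                defined $password ? $password : "");
    }

    my @triplet = connection_triplet("JASPAR5", "myhost",
                                     "myusername", "mypassword");
    print join(", ", @triplet), "\n";

    # The triplet would then be passed on unchanged, for example:
    #   my $db = TFBS::DB::JASPAR5->connect(@triplet);
```

Keeping the triplet in one list makes it easy to reuse the same values
for both connect() and create().
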
=cut

sub connect { #DONE
    # a more intuitive syntax for the constructor
    my ($caller, @connection_args) = @_;
    $caller->new(-connect => \@connection_args);
}

=head2 dbh

 Title   : dbh
 Usage   : my $dbh = $db->dbh();
           $dbh->do("UPDATE MATRIX SET NAME='ADD1' WHERE NAME='SREBP2'");
 Function: returns the DBI database handle of the MySQL database
           interfaced by $db; THIS IS USED FOR WRITING NEW METHODS
           FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you
           have write access AND do not know what you are doing,
           you can severely corrupt the data.
           For documentation about database handle methods, see L<DBI>
 Returns : the database (DBI) handle of the MySQL JASPAR5-type
           relational database associated with the TFBS::DB::JASPAR5
           object
 Args    : none

=cut

sub dbh { #DONE
    my ($self, $dbh) = @_;
    $self->{'dbh'} = $dbh if $dbh;
    return $self->{'dbh'};
}

=head2 store_Matrix

 Title   : store_Matrix
 Usage   : $db->store_Matrix($matrixobject);
 Function: Stores the contents of a TFBS::Matrix::* object in the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (PFM_object)
           A TFBS::Matrix::PFM, TFBS::Matrix::PWM or TFBS::Matrix::ICM object.
           PFM objects are recommended, as they are easily converted to
           other formats
           # might have to give version and collection here
 Comment : this is an experimental method that is not 100% bulletproof;
           use at your own risk

=cut

sub store_Matrix { #PROBABLY DONE
    # collection, version are taken from the corresponding tags.
    # Warn if they are not there
    my ($self, @PFMs) = @_;
    my $err = 0;
    foreach my $pfm (@PFMs) {
        eval {
            my $int_id= $self->_store_matrix($pfm); # needs to have collection and version
            $self->_store_matrix_data($pfm, $int_id);
            $self->_store_matrix_annotation($pfm, $int_id);
            $self->_store_matrix_species($pfm, $int_id);
            $self->_store_matrix_acc($pfm, $int_id);
        };
        $err = $@ if $@; # keep the most recent failure instead of losing it
    }
    return $err;
}

sub create { #done
    my ($caller, $connectstring, $user, $password) = @_;
    if ($connectstring
        and $connectstring =~ /dbi:mysql:(\w+)(.*)/)
    {
        # connect to the server;
        my $dbh=DBI->connect("dbi:mysql:mysql".$2, $user,$password)
            or die("Error connecting to the database");

        # create database and open it
        $dbh->do("create database $1")
            or die("Error creating database.");
        $dbh->do("use $1");

        # create tables
        _create_tables($dbh);

        $dbh->disconnect;

        # run "new" with new database
        return $caller->new(-connect=>[$connectstring, $user, $password]);
    }
    else {
        die("Missing or malformed connect string for ".
            "TFBS::DB::JASPAR5 connection.");
    }
}

=head2 get_Matrix_by_ID

 Title   : get_Matrix_by_ID
 Usage   : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM');
 Function: fetches matrix data under the given ID from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depends on the form in which the matrix is stored
           in the database (PFM is default)
 Args    : (Matrix_ID)
           Matrix_ID is a string which refers to the stable
           JASPAR ID (usually something like "MA0001") with
           or without version numbers. "MA0001" will give the
           latest version of MA0001, while "MA0001.2" will give
           the second version, if it exists. Warnings will be given
           for non-existing matrices.

=cut

sub get_Matrix_by_ID { #DONE.
    # MAYBE :)
    my ($self, $q, $mt) = @_;
    # q is a stable ID with possible version number
    $mt = uc($mt || "PFM"); # default to PFM without calling uc() on undef

    unless (defined $q) {
        $self->throw("No ID passed to get_Matrix_by_ID");
    }
    my $ucmt = uc $mt;
    # separate stable ID and version number
    my ($base_ID, $version)= split (/\./, $q);
    $version=$self->_get_latest_version($base_ID) unless $version; # latest version per default

    # get internal ID - also a check for validity
    my $int_id= $self->_get_internal_id($base_ID, $version);

    # get matrix using internal ID
    my $m= $self->_get_Matrix_by_int_id($int_id, $ucmt);

    # warn ref ($m);
    return ($m);
}

=head2 get_Matrix_by_name

 Title   : get_Matrix_by_name
 Usage   : my $pfm = $db->get_Matrix_by_name('HNF-1');
 Function: fetches matrix data under the given name from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the object
           depends on the form in which the matrix object was stored
           in the database (default PFM)
 Args    : (Matrix_name)
 Warning : According to the current JASPAR5 data model, name is
           not necessarily a unique identifier. Also, names change
           over time.
           In the case where there are several matrices with the
           same name in the database, the function fetches the
           first one and prints a warning on STDERR. You've been
           warned.
           Some matrices have multiple versions. The function will
           return the latest version. For specific versions, use
           get_Matrix_by_ID("$ID.$version")

=cut

sub get_Matrix_by_name { #DONE
    my ($self, $name, $mt) = @_;
    unless(defined $name) {
        $self->throw("No name passed to get_Matrix_by_name.");
    }
    # sanity check: are there many different stable IDs with same name?
my $sth=$self->dbh->prepare(qq!SELECT distinct BASE_ID FROM MATRIX WHERE NAME="$name"!); $sth->execute(); my (@stable_ids)=$sth->fetchrow_array(); my $L =scalar @stable_ids; $self->warn("There are $L distinct stable IDs with name '$name'") if scalar $L>1; return $self->get_Matrix_by_ID($stable_ids[0], $mt); } =head2 get_MatrixSet Title : get_MatrixSet Usage : my $matrixset = $db->get_MatrixSet(%args); Function: fetches matrix data under for all matrices in the database matching criteria defined by the named arguments and returns a TFBS::MatrixSet object Returns : a TFBS::MatrixSet object Args : This method accepts named arguments, corresponding to arbitrary tags, and also some utility functions Note that this is different from JASPAR2 and to some extent JASPAR4. As any tag is supported for database storage, any tag can be used for information retrieval. Additionally, arguments as 'name','class','collection' can be used (even though they are not tags. Per default, only the last version of the matrix is given. The only way to get older matrices out of this to use an array of IDs with actual versions like MA0001.1, or set the argyment -all_versions=>1, in which case you get all versions for each stable ID Examples include: Fundamental matrix features -all # gives absolutely all matrix entry, regardless of versin and collection. Only useful for backup situations and sanity checks. Takes precedence over everything else -ID # a reference to an array of stable IDs (strings), with or without version, as above. tyically something like "MA0001.2" . Takes precedence over everything salve -all -name # a reference to an array of # transcription factor names (string). Will only take latest version. NOT a preferred way to access since names change over time -collection # a string corresponding to a JASPAR collection. Per default CORE -all_versions # gives all matrix versions that fit with rest of criteria, including obsolete ones.Is off per default. 
                     # Typical usage is in combination with stable IDs without versions,
                     # to get all versions of a particular matrix

     Typical tag queries:
     These can be either a string or a reference to an array of strings.
     If it is an array, it will be interpreted as an "or" statement

       -class    # a reference to an array of
                 #  structural class names (strings)
       -species  # a reference to an array of
                 #  NCBI Taxonomy IDs (integers)
       -taxgroup # a reference to an array of
                 #  higher taxonomic categories (string)

     Computed features of the matrices

       -min_ic     # float, minimum total information content
                   #  of the matrix.
       -matrixtype # string describing type of matrix to retrieve.
                   #  If left out, the format will revert to the
                   #  database format, which is PFM.

The arguments that expect list references are used in database
query formulation: elements within lists are combined with 'OR'
operators, and the lists of different types with 'AND'. For example,

    my $matrixset = $db->get_MatrixSet(-class => ['TRP_CLUSTER', 'FORKHEAD'],
                                       -species => [9606, 10090],
                                      );

gives a set of TFBS::Matrix::PFM objects (given that the matrix models are
stored as such) whose (structural class is 'TRP_CLUSTER' OR
'FORKHEAD') AND (the species they are derived from is
Homo sapiens, NCBI Taxonomy ID 9606, OR Mus musculus, ID 10090).

As above, unless IDs with version numbers are used, only one matrix per
stable ID will be returned: the matrix with the highest version number.

The -min_ic filter is applied after the query in the sense that the
matrix profiles with total information content less than specified
are not included in the set.
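
The OR-within-a-list, AND-between-lists rule described above can be
sketched in a few lines of plain Perl. This is only an illustration of
the combination rule under stated assumptions: describe_query is a
hypothetical helper, it is not part of TFBS, and it does not reproduce
the SQL that the module actually generates.

```perl
    use strict;
    use warnings;

    # Hypothetical helper (an assumption, not part of TFBS): renders named
    # arguments as a readable condition, with OR inside each list and AND
    # between the different argument types.
    sub describe_query {
        my (%args) = @_;
        my @groups;
        foreach my $key (sort keys %args) {
            (my $field = $key) =~ s/^-//;          # "-class" becomes "class"
            my $vals   = $args{$key};
            # a plain string is treated as a one-element list
            my @values = ref($vals) eq 'ARRAY' ? @$vals : ($vals);
            push @groups,
                '(' . join(' OR ', map { "$field='$_'" } @values) . ')';
        }
        return join(' AND ', @groups);
    }

    print describe_query(
        -class   => ['TRP_CLUSTER', 'FORKHEAD'],
        -species => [9606, 10090],
    ), "\n";
```

Each argument type narrows the result further, while adding values to a
single list widens it.
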
=cut

sub get_MatrixSet {
    # IC content and matrix stuff is not there yet, rest should work
    my ($self, %args) = @_;
    $args{'-collection'}='CORE' unless $args{'-collection'};
    $args{'-all_versions'}=0 unless $args{'-all_versions'};

    my @IDlist = @{$self->_get_IDlist_by_query(%args)}; # the IDlist here are INTERNAL ids

    my $type;
    my $matrixset = TFBS::MatrixSet->new();

    foreach my $int_id(@IDlist) {
        my $matrix=$self->_get_Matrix_by_int_id($int_id);
        next unless defined $matrix; # no matrix data stored under this internal ID

        if (defined $args{'-min_ic'} ){
            # we assume the matrix IS a PFM, or something in normal space at least
            # unless it explicitly says otherwise in tag=matrixtype
            # if so warn and do not use IC content
            # this is not foolproof in any way
            if ( ($matrix->{tags}{matrixtype} || '') eq "ICM"){
                next if ( $matrix->total_ic() < $args{'-min_ic'});
            }
            elsif ($matrix->isa("TFBS::Matrix::PFM")){
                next if ( $matrix->to_ICM->total_ic() < $args{'-min_ic'});
            }
            else{
                warn "Warning: you are assessing information content on matrices that are not in PFM or ICM format. Skipping this criterion";
                next;
            }
        }

        # length
        if (defined $args{'-length'} ){
            next if ( $matrix->length() < $args{'-length'});
        }

        # number of sites within
        # since column sums MIGHT be slightly different we take the integer of the mean of the columns
        # or really int( sum of matrix/#columns)
        if (defined $args{'-sites'} ){
            my $sum=0;
            foreach ( 1..$matrix->length){
                $sum+=$matrix->column_sum($_); # sum each column, not only the default one
            }
            $sum=int($sum /$matrix->length);
            warn $matrix->ID, " $sum is $sum";
            next if ( $sum < $args{'-sites'});
        }

        #ugly code: think about this a bit.
        if ($args{'-matrixtype'} && $matrix->isa("TFBS::Matrix::PFM")){
            if ( $args{'-matrixtype'} eq ('PWM')) {
                $matrix= $matrix->to_PWM();
            }
            if ( $args{'-matrixtype'} eq ('ICM')) {
                # an ICM was requested, so convert to ICM rather than PWM
                $matrix= $matrix->to_ICM();
            }
        }

        $matrixset->add_Matrix($matrix);
    }
    return $matrixset;
}

sub store_MatrixSet { #DONE
    # a wrapper around store_Matrix (which also can take an array of matrices), so utility only
    my ($self, $matrixset) = @_;
    my $it=$matrixset->Iterator();
    while (my $matrix_object = $it->next) {
        # do whatever you want with individual matrix objects
        $self->store_Matrix($matrix_object);
    }
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045.1');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string. Has to be a matrix ID with version suffix in JASPAR5.
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious
           what it does.

=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;
    # this has to be versioned IDs
    foreach my $ID (@IDs){
        my ($base_id, $version)= split (/\./, $ID);
        unless ($version) {
            warn "You have supplied a non-versioned matrix ID to delete.
 Skipping $ID ";
            return 0;
        }
        # get relevant internal ID
        my($int_id)= $self->_get_internal_id($base_id, $version);

        eval {
            my $q_ID = $self->dbh->quote($int_id);
            foreach my $table (qw (MATRIX_DATA
                                   MATRIX
                                   MATRIX_SPECIES
                                   MATRIX_PROTEIN
                                   MATRIX_ANNOTATION
                                  ) )
            {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        };
    }
    return $@;
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new { #PROBABLY OK
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;

    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    }
    elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        # pass the connection triplet stored under -create on to create()
        return $caller->create(@{$args{'-create'}});
    }
    else {
        ($connectstring, $user, $password) =
            (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh( DBI->connect($connectstring, $user, $password) );

    return $self;
}

sub _store_matrix_data { # DONE
    my ($self, $pfm, $int_id,$ACTION) = @_;
    my @base = qw(A C G T);
    my $matrix = $pfm->matrix();
    my $type;
    my $sth = $self->dbh->prepare
        (q! INSERT INTO MATRIX_DATA VALUES(?,?,?,?) !);

    for my $i (0..3) {
        for my $j (0..($pfm->length-1)) {
            $sth->execute( $int_id,
                           $base[$i],
                           $j+1,
                           $matrix->[$i][$j]
                         )
                or $self->throw("Error executing query.");
        }
    }
}

sub _store_matrix { #DONE
    my ($self, $pfm, $ACTION) = @_;
    # creation of the matrix will also give an internal unique ID (incremental int)
    # which will be returned to use for the other tables

    # Get collection and version from the matrix tags
    my $version= $pfm->{'tags'}{'version'};
    # will warn but not die if version is missing: will assume 1
    unless ($version) {
        warn "WARNING: Lacking version number for ". $pfm->ID. ".
Setting version=1"; $version=1; } my $collection= $pfm->{'tags'}{'collection'}; unless ($collection) { warn "WARNING: Lacking collection name for ". $pfm->ID. ". Setting collection to an empty string. You probably do not want this"; $collection=''; } # sanity check: do we alsready have this cobination of base ID and version? If we do, die my $base_id= $pfm->ID ; my $sth = $self->dbh->prepare (qq! select count(*) from MATRIX where VERSION=$version and BASE_ID= "$base_id"and collection="$collection" !); $sth->execute; my ($sanity_count)= $sth->fetchrow_array; if ($sanity_count >0){ warn "WARNING: Database input inconsistency: You have already have $sanity_count $base_id matrices of version $version in collection $collection. Terminating program"; die; } # insert data $sth = $self->dbh->prepare (q! INSERT INTO MATRIX VALUES(?,?,?,?,?) !); # update next sth with actual version and collection: DO $sth->execute(0, $collection,$pfm->ID,$version,$pfm->name) or $self->throw("Error executing query"); # get the actual (new) iternal ID my $int_id = $self->dbh->{ q{mysql_insertid}}; return $int_id; } sub _store_matrix_annotation { # DONE #this is for tag-value items that are not one-to-many (so, not species and not acc) my ($self, $pfm, $int_id,$ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_ANNOTATION (ID, tag, val) VALUES(?,?,?) 
!); # get all tags # but skip out collection or version as we already have those in the MATRIX table #special handling for class which mighht have a true slot my %tags= $pfm->all_tags(); if (defined ($pfm->{class})){ $tags{class}=$pfm->{class} ; } foreach my $tag( keys %tags){ next if $tag eq "collection"; next if $tag eq "version"; next if $tag eq "species"; next if $tag eq "acc"; # next if $tag eq "class"; $sth->execute($int_id, $tag, ($tags{$tag} or ""), ) or $self->throw("Error executing query"); } } sub _store_matrix_species { # DONE #these are for species IDs - can be several # these are taken from the tag "species" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string my ($self, $pfm, $int_id,$ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_SPECIES VALUES(?,?) !); #sanity check: are there any species? Its ok not to have it. return() unless $pfm->{'tags'}{'species'}; #is the species a string or an arrayref? if ( ref ($pfm->{'tags'}{'species'}) eq 'ARRAY'){ # walkthru array foreach my $species ( @{$pfm->{'tags'}{'species'}}){ $sth->execute($int_id,$species); } } else{ # split and walk thru foreach my $species ( split(/\,/, $pfm->{'tags'}{'species'})){ $species=~s/^\s//g; $sth->execute($int_id,$species); } } } sub _store_matrix_acc { # DONE #these are for protein accession numbers - can be several # these are taken from the tag "acc" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string my ($self, $pfm, $int_id,$ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_PROTEIN VALUES(?,?) !); #sanity check: are there any accession numbers? Its ok not to have it. return() unless $pfm->{'tags'}{'acc'}; #is the species a string or an arrayref? 
if ( ref ($pfm->{'tags'}{'acc'}) eq 'ARRAY'){ # walkthru array foreach my $acc ( @{$pfm->{'tags'}{'acc'}}){ $acc=~s/\s//g; $sth->execute($int_id,$acc); } } else{ # split and walk thru foreach my $acc ( split(/\,/, $pfm->{'tags'}{'acc'})){ $acc=~s/\s//g; $sth->execute($int_id,$acc); } } } #when creating: try to support arbitrary tags sub _create_tables { # DONE # utility function # If you want to change the databse schema, # this is the right place to do it my $dbh = shift; my @queries = ( q! CREATE TABLE MATRIX( ID INT NOT NULL AUTO_INCREMENT, COLLECTION VARCHAR (16) DEFAULT '', BASE_ID VARCHAR (16)DEFAULT '' NOT NULL , VERSION TINYINT DEFAULT 1 NOT NULL , NAME VARCHAR (255) DEFAULT '' NOT NULL, PRIMARY KEY (ID)) !, q! CREATE TABLE MATRIX_DATA( ID INT NOT NULL, row VARCHAR(1) NOT NULL, col TINYINT(3) UNSIGNED NOT NULL, val float(10,3), PRIMARY KEY (ID, row, col) ) !, q! CREATE TABLE MATRIX_ANNOTATION( ID INT NOT NULL, TAG VARCHAR(255)DEFAULT '' NOT NULL, VAL varchar(255) DEFAULT '', PRIMARY KEY (ID, TAG) ) !, q! CREATE TABLE MATRIX_SPECIES( ID INT NOT NULL, TAX_ID VARCHAR(255)DEFAULT '' NOT NULL ) !, q! CREATE TABLE MATRIX_PROTEIN( ID INT NOT NULL, ACC VARCHAR(255)DEFAULT '' NOT NULL ) ! ); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub _get_matrixstring { #DONE my ($self, $ID) = @_; #my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); #unless (defined $dbname{$mt}) { #$self->throw("Unsupported matrix type: ".$mt); #} my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth=$self->dbh->prepare ("SELECT val FROM MATRIX_DATA WHERE ID=$qID AND row='$base' ORDER BY col"); $sth->execute; $matrixstring .= join (" ", (map {$_->[0]} @{$sth->fetchall_arrayref()}))."\n"; } $sth->finish; return undef if $matrixstring eq "\n"x4; return $matrixstring; } sub _get_latest_version { #DONE my ($self, $base_ID) = @_; # SELECT VERSION FROM MATRIX WHERE BASE_ID=? 
ORDER BY VERSION DESC LIMIT 1 my $sth=$self->dbh->prepare (qq!SELECT VERSION FROM MATRIX WHERE BASE_ID="$base_ID" ORDER BY VERSION DESC LIMIT 1!); $sth->execute; my ($latest)=$sth->fetchrow_array(); return($latest); } sub _get_internal_id { #DONE # picks out the internal id for a a stable id+ version. Also checks if this cobo exists or not my ($self, $base_ID, $version) = @_; # SELECT ID FROM MATRIX WHERE BASE_ID=? and VERSION=? my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID="$base_ID" AND VERSION="$version"!); $sth->execute; my ($int_id)=$sth->fetchrow_array(); return($int_id); } sub _get_Matrix_by_int_id { #done my ($self, $int_id, $mt)= @_; my $matrixobj; $mt='PFM' unless $mt; # get the matrix as a string my $matrixstring = $self->_get_matrixstring($int_id) || return undef; #get remaining data in the matrix table: name, collection my $sth=$self->dbh->prepare(qq!SELECT BASE_ID,VERSION, COLLECTION,NAME FROM MATRIX WHERE ID="$int_id"!); $sth->execute(); my ($base_ID, $version,$collection,$name)=$sth->fetchrow_array(); # get species $sth=$self->dbh->prepare(qq!SELECT TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"!); $sth->execute(); my @tax_ids; while (my ($specie)=$sth->fetchrow_array()) { push(@tax_ids, $specie); } # get acc $sth=$self->dbh->prepare(qq!SELECT ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"!); $sth->execute(); my @accs; while (my ($a)=$sth->fetchrow_array()) { push(@accs, $a); } # get remaining annotation as tags, form ANNOTATION table my %tags; $sth=$self->dbh->prepare(qq{SELECT TAG, VAL FROM MATRIX_ANNOTATION WHERE ID = "$int_id" }); $sth->execute(); while ( my($tag, $val)= $sth->fetchrow_array()) { $tags{$tag}=$val; } $tags{'collection'}= $collection; $tags{'species'}=\@tax_ids; # as array reference instead of strigifying $tags{'acc'}=\@accs; # same # my $class= $tags{'class'}; delete ($tags{'class'}); # eval("\$matrixobj= TFBS::Matrix::PFM->new".' 
( -ID => "$base_ID.$version", -name =>$name, -class => $class, -tags => \%tags, -matrixstring=> $matrixstring # FIXME - temporary );' ); if ($@) { $self->throw($@); } # warn $int_id, "\t", ref($matrixobj); return ($matrixobj->to_PWM) if $mt eq "PWM"; return ($matrixobj->to_ICM) if $mt eq "ICM"; return ($matrixobj); # default PFM } sub _get_IDlist_by_query { #needs cleanup. NOT for the faint-hearted. my ($self, %args)=@_; # called by get_MatrixSet # warn $args{"-collection"}; $args{'-collection'}='CORE' unless $args{'-collection'}; # returns a set of internal IDs with whicj to get the actual matrices # current idea: # 1: first catch non-tag things like collection, name and version, species # makw one query for these if they are named and check the IDs for "latest" unless requested not to. # these are AND statements # 2:then do the rest on tag level: # to be able to do this with actual and tattemnet innthe tag table, we do an inner join query, which is kept separate just for convenice # we then intersect 1 and 2 # 3: then do matrix-based features such as ic, with, number of sites etc, for the surviving matrices. This shold happen in the get_matrixset part my @int_ids_to_return; # should redo so that matrix_annotation queries are separate, with an intersect in the end #special case 1: get ALL matrices. Higher priority than all if ($args{'-all'}) { my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX!); $sth->execute(); my @a; while ( my ($i)=$sth->fetchrow_array()) { push (@a, $i); } return \@a; } # ids: special case2 which is has higher priority than any other except the above (ignore all others if ($args{'-ID'}) { # these might be either stable IDs or stableid.version. 
# if just stable ID and if all_versions==1, take all versions, otherwise the latest if ( $args{-all_versions}) { my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); foreach my $stID(@{$args{'-ID'}}) { my ($stable_ID, $version)= split (/\./, $stID); # ignore vesion here, this is a stupidity filter $sth->execute($stable_ID); while( my ($int_id)=$sth->fetchrow_array()) { push (@int_ids_to_return, $int_id); } } } else { # only the lastest version, or the requested version foreach my $stID(@{$args{'-ID'}}) { #warn $stID; my ($stable_ID, $version)= split (/\./, $stID); $version=$self->_get_latest_version($stable_ID) unless $version; my $int_id= $self->_get_internal_id($stable_ID, $version); push (@int_ids_to_return, $int_id) if $int_id; } } return \@int_ids_to_return } my @tables=("MATRIX M"); my @and; # in matrix table: collection, if ( $args{-collection}) { my $q=' (COLLECTION='; if (ref $args{-collection} eq "ARRAY") { # so, possibly several my @a; foreach (@{$args{-collection}}) { push (@a, "\"$_\""); } $q.= join ( " or COLLECTION=", @a); } else {# just one - typical usage $q.="\"$args{-collection}\""; } $q.=" ) "; push (@and, $q); } # in matrix table: names. 
    # Is something that is basically only used from the web interface
    # typically used by the get_matrix_by_name function instead
    if ($args{-name}) {
        my $q=' (NAME=';
        if (ref $args{-name} eq "ARRAY") { # so, possibly several
            my @a;
            foreach (@{$args{-name}}) {
                push (@a, "\"$_\"");
            }
            $q.= join ( " or NAME=", @a);
        }
        else {# just one - typical usage
            $q.="\"$args{-name}\"";
        }
        $q.=" ) ";
        push (@and, $q);
    }

    # in species table: tax.id: possibly many species with OR in between
    if ( $args{-species}) {
        push (@tables , "MATRIX_SPECIES S");
        my $q=" M.ID=S.ID and (TAX_ID=";
        if (ref $args{-species} eq "ARRAY") { # so, possibly several
            my @a;
            foreach (@{$args{-species}}) {
                push (@a, "\"$_\"");
            }
            $q.= join ( " or TAX_ID=", @a);
        }
        else {# just one - typical usage
            # no extra "=" here: the prefix above already ends in "TAX_ID="
            $q.="\"$args{-species}\"";
        }
        $q.=") ";
        push (@and, $q);
    }

    # TAG_BASED
    # an internal join query: should be able to handle up to 26 tag-value combos with ANDs in between
    # Very ugly code ahead:
    my (@inner_tables, @internal_ands1,@internal_ands2 );
    my $int_counter=0; # for keeping track of names;
    my @alpha = ("a" .. "z");
    my %arrayref;
    foreach my $key(keys %args) {
        next if $key eq "-min_ic";
        next if $key eq "-matrixtype";
        next if $key eq "-species";
        next if $key eq "-collection";
        next if $key eq "-all_versions";
        next if $key eq "-all";
        next if $key eq "-ID";
        next if $key eq "-length";
        next if $key eq "-name";
        my $oldkey=$key;
        $key=~s/-//;
        $arrayref{$key}= $args{$oldkey};
    }

    if (%arrayref) {
        # get an internal name for the table
        push (@internal_ands2 , " M.ID=a.ID " );
        my @a;
        foreach my $key (keys %arrayref) {
            my $tname= $alpha[$int_counter];
            push (@inner_tables , "MATRIX_ANNOTATION $tname");
            push (@internal_ands1 , $alpha[$int_counter].".ID=".
$alpha[$int_counter-1].".ID") unless $int_counter==0; $int_counter++; # is the thing aupplied an array reference in inteslf: make an "or" query from that if ( ref $arrayref{$key} eq "ARRAY") { my @b; foreach( @{$arrayref{$key}}) { push (@b, $self->dbh->quote($_)); } my $orstring= join (" or $tname.VAL=" , @b); push (@a, "($tname.TAG=\"$key\" AND ($tname.VAL=$orstring))"); } #or not else { push (@a, "($tname.TAG=\"$key\" AND $tname.VAL=\"$arrayref{$key}\")"); } } my $s= " ( ". join (" AND ", @a). ")"; push (@internal_ands2 , $s); } my $qq= "SELECT distinct(M.ID) from ". join (",", (@tables,@inner_tables)) . " where" . join ( " AND ", (@and,@internal_ands1, @internal_ands2 )); # warn $qq; #do actual mammoth query,and check for latest matrix my $sth=$self->dbh->prepare ($qq); $sth->execute(); my @r; while (my($int_id)= $sth->fetchrow_array) { if ($args{-all_versions}) { push (@r,$int_id); } else { # is latest? push(@r,$int_id) if ( $self->_is_latest_version($int_id) ==1); } } warn "Warning: Zero matrices returned with current critera" unless scalar @r; return \@r; } sub _is_latest_version{ # is a particular internal ID representingthe latest matrix (collapse on base ids) my ($self, $int_id)=@_; my $sth=$self->dbh->prepare( qq! select count(*) from MATRIX where BASE_ID= (SELECT BASE_ID from MATRIX where ID=$int_id) AND VERSION>(SELECT VERSION from MATRIX where ID=$int_id) !); $sth->execute(); my ($count)= $sth->fetchrow_array(); return(1) if $count ==0;# no matrices with higher version ID and same base id return(0); } sub DESTROY { #OK $_[0]->dbh->disconnect() if $_[0]->dbh; } TFBS-0.7.1/TFBS/DB/JASPAR6.pm000077500000000000000000001121301305752266700147220ustar00rootroot00000000000000# TFBS module for TFBS::DB::JASPAR6 # # Copyright Boris Lenhard # Maintainer Xiaobei Zhao # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::DB::JASPAR6 - interface to MySQL relational database of pattern matrices. 
Current status: experimental.

=head1 SYNOPSIS

=over 4

=item * creating a database object by connecting to the existing JASPAR6-type database

    my $db = TFBS::DB::JASPAR6->connect("dbi:mysql:JASPAR6:myhost",
                                        "myusername",
                                        "mypassword");

=item * retrieving a TFBS::Matrix::* object from the database

    # retrieving a PFM by ID
    my $pfm = $db->get_Matrix_by_ID('M0079','PFM');

    # retrieving a PWM by name
    my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM');

=item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria

    # retrieving a set of PWMs from a list of IDs:
    my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101');
    my $matrixset = $db->get_MatrixSet(-ID => \@IDlist,
                                       -matrixtype => "PWM");

    # retrieving a set of ICMs from a list of names:
    my @namelist = ('p50', 'p53', 'HNF-1', 'GATA-1', 'GATA-2', 'GATA-3');
    $matrixset = $db->get_MatrixSet(-name => \@namelist,
                                    -matrixtype => "ICM");

=item * creating a new JASPAR6-type database named MYJASPAR6:

    my $db = TFBS::DB::JASPAR6->create("dbi:mysql:MYJASPAR6:myhost",
                                       "myusername",
                                       "mypassword");

=item * storing a matrix in the database (currently only PFMs):

    # let $pfm be a TFBS::Matrix::PFM object
    $db->store_Matrix($pfm);

=back

=head1 DESCRIPTION

TFBS::DB::JASPAR6 is a read/write database interface module that
retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet
objects in a relational database. The interface is nearly identical
to the JASPAR2 and JASPAR4 interfaces, while the underlying data model is different.

=head1 JASPAR6 DATA MODEL

JASPAR6 is the working name for a relational database model used
for storing transcription factor pattern matrices in a MySQL database.
It was initially designed (JASPAR2) to store matrices for the JASPAR database of
high quality eukaryotic transcription factor specificity profiles by
Albin Sandelin and Wyeth W. Wasserman.
Besides the profile matrix itself, this data
model stores profile ID (unique), name, structural class,
basic taxonomic and bibliographic information
as well as some additional, custom tags.

Here goes a more thorough description of tables and IDs

-----------------------  ADVANCED  ---------------------------------

For the developers and the curious, here is the JASPAR6 data model:

MISSING TEXT HERE ON HOW IT WORKS

It is our best intention to hide the details of this data model, which we
are using on a daily basis in our work, from most TFBS users.
Most users should only need to know the methods to store the data and
which tags are supported.

-------------------------------------------------------------------------

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::DB::JASPAR6;
use vars qw(@ISA $AUTOLOAD);

# we need all three matrices due to the redundancy in JASPAR2 data model
# which will hopefully be removed in JASPAR3

use TFBS::Matrix::PWM;
use TFBS::Matrix::PFM;
use TFBS::Matrix::ICM;

use TFBS::MatrixSet;
use Bio::Root::Root;
use DBI;
# use TFBS::DB; # eventually
use strict;

@ISA = qw(TFBS::DB Bio::Root::Root);

#########################################################################
# CONSTANTS
#########################################################################

use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost
use constant DEFAULT_USER          => "";
use constant DEFAULT_PASSWORD      => "";

#########################################################################
# PUBLIC METHODS
#########################################################################

=head2 new

 Title   : new
 Usage   : DEPRECATED - for backward compatibility only
           Use connect() or create() instead

=cut

sub new {
    _new
(@_);
}

=head2 connect

 Title   : connect
 Usage   : my $db =
            TFBS::DB::JASPAR6->connect("dbi:mysql:DATABASENAME:HOSTNAME",
                                       "USERNAME",
                                       "PASSWORD");
 Function: connects to the existing JASPAR6-type database and
           returns a database object that interfaces the database
 Returns : a TFBS::DB::JASPAR6 object
 Args    : a standard database connection triplet
           ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD")
           In place of DATABASENAME, HOSTNAME, USERNAME and
           PASSWORD, use the actual values. PASSWORD and USERNAME
           might be optional, depending on the user's access permissions
           for the database server.

=cut

sub connect { #DONE
    # a more intuitive syntax for the constructor
    my ($caller, @connection_args) = @_;
    $caller->new(-connect => \@connection_args);
}

=head2 dbh

 Title   : dbh
 Usage   : my $dbh = $db->dbh();
           $dbh->do("UPDATE MATRIX SET NAME='ADD1' WHERE NAME='SREBP2'");
 Function: returns the DBI database handle of the MySQL database
           interfaced by $db; THIS IS USED FOR WRITING NEW METHODS
           FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you
           have write access AND do not know what you are doing,
           you can severely corrupt the data
           For documentation about database handle methods, see L<DBI>
 Returns : the database (DBI) handle of the MySQL JASPAR6-type
           relational database associated with the TFBS::DB::JASPAR6
           object
 Args    : none

=cut

sub dbh { #DONE
    my ($self, $dbh) = @_;
    $self->{'dbh'} = $dbh if $dbh;
    return $self->{'dbh'};
}

=head2 store_Matrix

 Title   : store_Matrix
 Usage   : $db->store_Matrix($matrixobject);
 Function: Stores the contents of a TFBS::Matrix::* object in the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (PFM_object)
           A TFBS::Matrix::PFM, TFBS::Matrix::PWM or TFBS::Matrix::ICM object.
           PFM objects are recommended, as they are easily converted to other formats
           # might have to give version and collection here
 Comment : this is an experimental method that is not 100% bulletproof;
           use at your own risk

=cut

sub store_Matrix { #PROBABLY DONE
    # collection, version are taken from the corresponding tags. Warn if they are not there
    my ($self, @PFMs) = @_;
    my $err = 0;
    foreach my $pfm (@PFMs) {
        eval {
            my $int_id= $self->_store_matrix($pfm); # needs to have collection and version
            $self->_store_matrix_data($pfm, $int_id);
            $self->_store_matrix_annotation($pfm, $int_id);
            $self->_store_matrix_species($pfm, $int_id);
            $self->_store_matrix_acc($pfm, $int_id);
        };
        # remember the failure; a later successful eval would otherwise reset $@
        $err = $@ if $@;
    }
    return $err;
}

sub create { #done
    my ($caller, $connectstring, $user, $password) = @_;
    if ($connectstring
        and $connectstring =~ /dbi:mysql:(\w+)(.*)/)
    {
        # connect to the server;
        my $dbh=DBI->connect("dbi:mysql:mysql".$2, $user,$password)
            or die("Error connecting to the database");

        # create database and open it
        $dbh->do("create database $1")
            or die("Error creating database.");
        $dbh->do("use $1");

        # create tables
        _create_tables($dbh);

        $dbh->disconnect;

        # run "new" with new database
        return $caller->new(-connect=>[$connectstring, $user, $password]);
    }
    else {
        die("Missing or malformed connect string for ".
            "TFBS::DB::JASPAR6 connection.");
    }
}

=head2 get_Matrix_by_ID

 Title   : get_Matrix_by_ID
 Usage   : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM');
 Function: fetches matrix data under the given ID from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depends on the requested matrix type
           (PFM is default)
 Args    : (Matrix_ID)
           Matrix_ID is a string which refers to the stable
           JASPAR ID (usually something like "MA0001") with
           or without version number. "MA0001" will give the
           latest version of MA0001, while "MA0001.2" will give
           the second version, if it exists. Warnings will be
           given for non-existing matrices.
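The ID convention described above can be tried out without a database. The
following is only a sketch: split_matrix_id is a hypothetical helper that is
not part of TFBS, and it assumes that the version is whatever follows the
first dot in the ID.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TFBS: separate a stable ID such as
# "MA0001.2" into its base ID and version. The version comes back as
# undef when the ID carries no version suffix, which is the cue to
# look up the latest version instead.
sub split_matrix_id {
    my ($id) = @_;
    my ($base_id, $version) = split /\./, $id, 2;
    return ($base_id, $version);
}

my ($base_id, $version) = split_matrix_id('MA0001.2');
print "$base_id version $version\n";    # prints "MA0001 version 2"

($base_id, $version) = split_matrix_id('MA0001');
print "$base_id version ", (defined $version ? $version : 'latest'), "\n";
                                        # prints "MA0001 version latest"
```

An undefined version is the case in which the module falls back to the
highest version stored for that base ID.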
=cut

sub get_Matrix_by_ID { #DONE. MAYBE :)
    my ($self, $q, $mt) = @_;
    # q is a stable ID with possible version number
    # jsp6
    $mt = (uc($mt) or "PFM");
    unless (defined $q) {
        $self->throw("No ID passed to get_Matrix_by_ID");
    }
    my $ucmt = uc $mt;

    # separate stable ID and version number
    my ($base_ID, $version)= split (/\./, $q);
    $version=$self->_get_latest_version($base_ID) unless $version; # latest version per default

    # get internal ID - also a check for validity
    my $int_id= $self->_get_internal_id($base_ID, $version);

    # get matrix using internal ID
    my $m= $self->_get_Matrix_by_int_id($int_id, $ucmt);

    # warn only when nothing was found, as promised in the documentation
    $self->warn("No matrix found for ID '$q'") unless defined $m;
    return ($m);
}

=head2 get_Matrix_by_name

 Title   : get_Matrix_by_name
 Usage   : my $pfm = $db->get_Matrix_by_name('HNF-1');
 Function: fetches matrix data under the given name from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the object
           depends on the requested matrix type (default PFM)
 Args    : (Matrix_name)
 Warning : According to the current JASPAR6 data model, name is
           not necessarily a unique identifier. Also, names change
           over time.
           In the case where there are several matrices with the same name in the
           database, the function fetches the first one and prints
           a warning on STDERR. You've been warned.
           Some matrices have multiple versions. The function will
           return the latest version. For specific versions, use
           get_Matrix_by_ID("$ID.$version")

=cut

sub get_Matrix_by_name { #DONE
    my ($self, $name, $mt) = @_;
    unless(defined $name) {
        $self->throw("No name passed to get_Matrix_by_name.");
    }
    # sanity check: are there many different stable IDs with same name?
    my $sth=$self->dbh->prepare(q!SELECT distinct BASE_ID FROM MATRIX WHERE NAME=?!);
    $sth->execute($name);
    # fetchrow_array returns one row per call, so collect all rows in a loop
    my @stable_ids;
    while (my ($stable_id)=$sth->fetchrow_array()) {
        push (@stable_ids, $stable_id);
    }
    my $L =scalar @stable_ids;
    $self->warn("There are $L distinct stable IDs with name '$name'") if $L>1;

    return $self->get_Matrix_by_ID($stable_ids[0], $mt);
}

=head2 get_MatrixSet

 Title   : get_MatrixSet
 Usage   : my $matrixset = $db->get_MatrixSet(%args);
 Function: fetches matrix data for all matrices in the database
           matching criteria defined by the named arguments
           and returns a TFBS::MatrixSet object
 Returns : a TFBS::MatrixSet object
 Args    : This method accepts named arguments, corresponding to arbitrary tags,
           and also some utility arguments.
           Note that this is different from JASPAR2 and to some extent JASPAR4.
           As any tag is supported for database storage, any tag can be used
           for information retrieval.
           Additionally, arguments such as 'name', 'class' and 'collection' can be
           used (even though they are not tags).
           Per default, only the latest version of each matrix is given.
           The only way to get older matrices out of this is to use an array
           of IDs with actual versions like MA0001.1, or to set the argument
           -all_versions=>1, in which case you get all versions for each stable ID.

           Examples include:

           Fundamental matrix features
             -all      # gives absolutely all matrix entries, regardless of version
                       # and collection. Only useful for backup situations and
                       # sanity checks. Takes precedence over everything else
             -ID       # a reference to an array of stable IDs (strings), with or
                       # without version, as above. Typically something like
                       # "MA0001.2". Takes precedence over everything save -all
             -name     # a reference to an array of
                       #  transcription factor names (string). Will only take
                       #  latest version. NOT a preferred way to access since
                       #  names change over time
             -collection # a string corresponding to a JASPAR collection.
                       # Per default CORE
             -all_versions # gives all matrix versions that fit with rest of
                       # criteria, including obsolete ones. Is off per default.
                       # Typical usage is in combination with stable IDs without
                       # versions to get all versions of a particular matrix

           Typical tag queries:
           These can be either a string or a reference to an array of strings.
           If it is an array it will be interpreted as an "or" statement
             -class    # a reference to an array of
                       #  structural class names (strings)
             -species  # a reference to an array of
                       #   NCBI Taxonomy IDs (integers)
             -taxgroup # a reference to an array of
                       #  higher taxonomic categories (string)

           Computed features of the matrices
             -min_ic   # float, minimum total information content
                       #   of the matrix.
             -length   # integer, minimum length (number of columns)
                       #   of the matrix.
             -matrixtype # string describing type of matrix to retrieve. If
                       #  left out, the format will revert to the database
                       #  format, which is PFM.

           The arguments that expect list references are used in database
           query formulation: elements within lists are combined with 'OR'
           operators, and the lists of different types with 'AND'. For example,

               my $matrixset = $db->get_MatrixSet(-class => ['TRP_CLUSTER', 'FORKHEAD'],
                                                  -species => [9606, 10090],
                                                 );

           gives a set of TFBS::Matrix::PFM objects (given that the matrix models
           are stored as such) whose (structural class is 'TRP_CLUSTER' OR
           'FORKHEAD') AND (the species they are derived from is 9606,
           Homo sapiens, OR 10090, Mus musculus).

           As above, unless IDs with version numbers are used, only one matrix
           per stable ID will be returned: the matrix with the highest version number.

           The -min_ic filter is applied after the query in the sense that the
           matrix profiles with total information content less than specified
           are not included in the set.
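The OR-within-a-list, AND-between-lists rule described above can be sketched
without a database connection. This is a minimal illustration, not the code
the module runs: build_where is a hypothetical helper that is not part of
TFBS, the column names are borrowed from the MATRIX and MATRIX_SPECIES tables
purely as examples, and values are passed as bind parameters instead of being
interpolated into the SQL text.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TFBS: turn named criteria into a
# WHERE clause. Values inside one array reference are OR-ed together,
# and the per-column groups are AND-ed with each other.
sub build_where {
    my (%criteria) = @_;
    my (@clauses, @binds);
    foreach my $column (sort keys %criteria) {
        my $value  = $criteria{$column};
        my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
        next unless @values;
        # one "column = ?" placeholder per value, OR-ed inside parentheses
        push @clauses, '(' . join(' OR ', map { "$column = ?" } @values) . ')';
        push @binds, @values;
    }
    return (join(' AND ', @clauses), @binds);
}

my ($where, @binds) = build_where(
    COLLECTION => 'CORE',
    TAX_ID     => [9606, 10090],
);
print "$where\n";                  # (COLLECTION = ?) AND (TAX_ID = ? OR TAX_ID = ?)
print join(', ', @binds), "\n";    # CORE, 9606, 10090
```

Keeping the values in a separate bind list leaves quoting to the database
driver, which avoids hand-escaping each value when the clause is assembled.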
=cut

# jsp6
sub get_MatrixSet {
    # IC content and matrix stuff is not there yet, rest should work
    my ($self, %args) = @_;
    #jsp6
    $args{'-collection'}='CORE' unless $args{'-collection'};
    $args{'-all_versions'}=0 unless $args{'-all_versions'};

    my @IDlist = @{$self->_get_IDlist_by_query(%args)};
    # the IDlist here are INTERNAL ids
    my $type;
    my $matrixset = TFBS::MatrixSet->new();
    foreach my $int_id(@IDlist) {
        my $matrix=$self->_get_Matrix_by_int_id($int_id);

        if (defined $args{'-min_ic'} ){
            # we assume the matrix IS a PFM, or something in normal space at least
            # unless it explicitly says otherwise in tag=matrixtype
            # if so warn and do not use IC content
            # this is not foolproof in any way
            #
            # Fixed up logic to actually check $matrix->isa(TFBS::Matrix::ICM)
            # before checking the matrixtype tag. Also check that matrixtype
            # tag is defined before comparison to prevent annoying "Use of
            # uninitialized value in string eq" messages from perl.
            # DJA 2012/05/11
            if ($matrix->isa("TFBS::Matrix::ICM")
                || ( defined $matrix->{tags}{matrixtype}
                    && $matrix->{tags}{matrixtype} eq "ICM")){
                next if ( $matrix->total_ic() < $args{'-min_ic'});
            }
            elsif ($matrix->isa("TFBS::Matrix::PFM")){
                next if ( $matrix->to_ICM->total_ic() < $args{'-min_ic'});
            }
            else{
                warn "Warning: you are assessing information content on matrices that are not in PFM or ICM format. Skipping this criterion";
                next;
            }
        }

        # length
        if (defined $args{'-length'} ){
            next if ( $matrix->length() < $args{'-length'});
        }

        # number of sites within
        # since column sums MIGHT be slightly different we take the integer of the mean of the columns
        # or really int( sum of matrix/#columns)
        if (defined $args{'-sites'} ){
            my $sum=0;
            foreach ( 1..$matrix->length){
                # pass the column number; without it only the first column is summed
                $sum+=$matrix->column_sum($_);
            }
            $sum=int($sum /$matrix->length);
            next if ( $sum < $args{'-sites'});
        }

        #ugly code: think about this a bit.
        if ($args{'-matrixtype'} && $matrix->isa("TFBS::Matrix::PFM")){
            if ( $args{'-matrixtype'} eq ('PWM')) {
                $matrix= $matrix->to_PWM();
            }
            if ( $args{'-matrixtype'} eq ('ICM')) {
                # an ICM was requested, so convert to ICM (not PWM)
                $matrix= $matrix->to_ICM();
            }
        }

        $matrixset->add_Matrix($matrix);
    }
    return $matrixset;
}

sub store_MatrixSet { #DONE
    # a wrapper around store_Matrix (which also can take an array of matrices, so utility only)
    my ($self, $matrixset) = @_;
    my $it=$matrixset->Iterator();
    while (my $matrix_object = $it->next) {
        # do whatever you want with individual matrix objects
        $self->store_Matrix($matrix_object)
    }
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045.1');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string. Has to be a matrix ID with version suffix in JASPAR6.
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious what it does.

=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;
    # this has to be versioned IDs
    foreach my $ID (@IDs){
        my ($base_id, $version)= split (/\./, $ID);
        unless ($version) {
            warn "You have supplied a non-versioned matrix ID to delete.
Skipping $ID ";
            # skip only this ID; the remaining IDs are still processed
            next;
        }
        # get relevant internal ID
        my($int_id)= $self->_get_internal_id($base_id, $version);

        eval {
            my $q_ID = $self->dbh->quote($int_id);
            foreach my $table (qw (MATRIX_DATA
                                   MATRIX
                                   MATRIX_SPECIES
                                   MATRIX_PROTEIN
                                   MATRIX_ANNOTATION
                                   )
                               )
            {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        };
    }
    return $@;
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new { #PROBABLY OK
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;

    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    }
    elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        # pass the connection triplet stored under -create
        return $caller->create(@{$args{'-create'}});
    }
    else {
        ($connectstring, $user, $password) =
            (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh( DBI->connect($connectstring, $user, $password) );

    return $self;
}

sub _store_matrix_data {# DONE
    my ($self, $pfm, $int_id,$ACTION) = @_;
    my @base = qw(A C G T);
    my $matrix = $pfm->matrix();
    my $type;
    my $sth = $self->dbh->prepare
        (q! INSERT INTO MATRIX_DATA VALUES(?,?,?,?) !);

    for my $i (0..3) {
        for my $j (0..($pfm->length-1)) {
            $sth->execute( $int_id,
                           $base[$i],
                           $j+1,
                           $matrix->[$i][$j]
                         )
                or $self->throw("Error executing query.");
        }
    }
}

sub _store_matrix { #DONE
    my ($self, $pfm, $ACTION) = @_;
    # creation of the matrix will also give an internal unique ID (incremental int)
    # which will be returned to use for the other tables

    # Get collection and version from the matrix tags
    my $version= $pfm->{'tags'}{'version'};
    # will warn but not die if version is missing: will assume 1
    unless ($version) {
        warn "WARNING: Lacking version number for ". $pfm->ID. ". 
Setting version=1"; $version=1; } my $collection= $pfm->{'tags'}{'collection'}; unless ($collection) { warn "WARNING: Lacking collection name for ". $pfm->ID. ". Setting collection to an empty string. You probably do not want this"; $collection=''; } # sanity check: do we alsready have this cobination of base ID and version? If we do, die my $base_id= $pfm->ID ; my $sth = $self->dbh->prepare (qq! select count(*) from MATRIX where VERSION=$version and BASE_ID= "$base_id"and collection="$collection" !); $sth->execute; my ($sanity_count)= $sth->fetchrow_array; if ($sanity_count >0){ warn "WARNING: Database input inconsistency: You have already have $sanity_count $base_id matrices of version $version in collection $collection. Terminating program"; die; } # insert data $sth = $self->dbh->prepare (q! INSERT INTO MATRIX VALUES(?,?,?,?,?) !); # update next sth with actual version and collection: DO $sth->execute(0, $collection,$pfm->ID,$version,$pfm->name) or $self->throw("Error executing query"); # get the actual (new) iternal ID my $int_id = $self->dbh->{ q{mysql_insertid}}; return $int_id; } sub _store_matrix_annotation { # DONE #this is for tag-value items that are not one-to-many (so, not species and not acc) my ($self, $pfm, $int_id,$ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_ANNOTATION (ID, tag, val) VALUES(?,?,?) 
!); # get all tags # but skip out collection or version as we already have those in the MATRIX table #special handling for class which mighht have a true slot my %tags= $pfm->all_tags(); if (defined ($pfm->{class})){ $tags{class}=$pfm->{class} ; } foreach my $tag( keys %tags){ next if $tag eq "collection"; next if $tag eq "version"; next if $tag eq "species"; # next if $tag eq "acc"; # next if $tag eq "class"; $sth->execute($int_id, $tag, ($tags{$tag} or ""), ) or $self->throw("Error executing query"); } } sub _store_matrix_species { # DONE #these are for species IDs - can be several # these are taken from the tag "species" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string my ($self, $pfm, $int_id,$ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_SPECIES VALUES(?,?) !); #sanity check: are there any species? Its ok not to have it. return() unless $pfm->{'tags'}{'species'}; #is the species a string or an arrayref? if ( ref ($pfm->{'tags'}{'species'}) eq 'ARRAY'){ # walkthru array foreach my $species ( @{$pfm->{'tags'}{'species'}}){ $sth->execute($int_id,$species); } } else{ # split and walk thru foreach my $species ( split(/\,/, $pfm->{'tags'}{'species'})){ $species=~s/^\s//g; $sth->execute($int_id,$species); } } } sub _store_matrix_acc { # DONE #these are for protein accession numbers - can be several # these are taken from the tag "acc" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string my ($self, $pfm, $int_id,$ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_PROTEIN VALUES(?,?) !); #sanity check: are there any accession numbers? Its ok not to have it. return() unless $pfm->{'tags'}{'acc'}; #is the species a string or an arrayref? 
if ( ref ($pfm->{'tags'}{'acc'}) eq 'ARRAY'){ # walkthru array foreach my $acc ( @{$pfm->{'tags'}{'acc'}}){ $acc=~s/\s//g; $sth->execute($int_id,$acc); } } else{ # split and walk thru foreach my $acc ( split(/\,/, $pfm->{'tags'}{'acc'})){ $acc=~s/\s//g; $sth->execute($int_id,$acc); } } } #when creating: try to support arbitrary tags sub _create_tables { # DONE # utility function # If you want to change the databse schema, # this is the right place to do it my $dbh = shift; my @queries = ( q! CREATE TABLE MATRIX( ID INT NOT NULL AUTO_INCREMENT, COLLECTION VARCHAR (16) DEFAULT '', BASE_ID VARCHAR (16)DEFAULT '' NOT NULL , VERSION TINYINT DEFAULT 1 NOT NULL , NAME VARCHAR (255) DEFAULT '' NOT NULL, PRIMARY KEY (ID)) !, q! CREATE TABLE MATRIX_DATA( ID INT NOT NULL, row VARCHAR(1) NOT NULL, col TINYINT(3) UNSIGNED NOT NULL, val float(10,3), PRIMARY KEY (ID, row, col) ) !, q! CREATE TABLE MATRIX_ANNOTATION( ID INT NOT NULL, TAG VARCHAR(255)DEFAULT '' NOT NULL, VAL varchar(255) DEFAULT '', PRIMARY KEY (ID, TAG) ) !, q! CREATE TABLE MATRIX_SPECIES( ID INT NOT NULL, TAX_ID VARCHAR(255)DEFAULT '' NOT NULL ) !, q! CREATE TABLE MATRIX_PROTEIN( ID INT NOT NULL, ACC VARCHAR(255)DEFAULT '' NOT NULL ) ! ); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub _get_matrixstring { #DONE my ($self, $ID) = @_; #my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); #unless (defined $dbname{$mt}) { #$self->throw("Unsupported matrix type: ".$mt); #} my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth=$self->dbh->prepare ("SELECT val FROM MATRIX_DATA WHERE ID=$qID AND row='$base' ORDER BY col"); $sth->execute; $matrixstring .= join (" ", (map {$_->[0]} @{$sth->fetchall_arrayref()}))."\n"; } $sth->finish; return undef if $matrixstring eq "\n"x4; return $matrixstring; } sub _get_latest_version { #DONE my ($self, $base_ID) = @_; # SELECT VERSION FROM MATRIX WHERE BASE_ID=? 
ORDER BY VERSION DESC LIMIT 1 my $sth=$self->dbh->prepare (qq!SELECT VERSION FROM MATRIX WHERE BASE_ID="$base_ID" ORDER BY VERSION DESC LIMIT 1!); $sth->execute; my ($latest)=$sth->fetchrow_array(); return($latest); } sub _get_internal_id { #DONE # picks out the internal id for a a stable id+ version. Also checks if this cobo exists or not my ($self, $base_ID, $version) = @_; # SELECT ID FROM MATRIX WHERE BASE_ID=? and VERSION=? my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID="$base_ID" AND VERSION="$version"!); $sth->execute; my ($int_id)=$sth->fetchrow_array(); return($int_id); } sub _get_Matrix_by_int_id { #done my ($self, $int_id, $mt)= @_; my $matrixobj; $mt='PFM' unless $mt; # get the matrix as a string my $matrixstring = $self->_get_matrixstring($int_id) || return undef; #get remaining data in the matrix table: name, collection my $sth=$self->dbh->prepare(qq!SELECT BASE_ID,VERSION, COLLECTION,NAME FROM MATRIX WHERE ID="$int_id"!); $sth->execute(); my ($base_ID, $version,$collection,$name)=$sth->fetchrow_array(); # jsp6 # get species ##$sth=$self->dbh->prepare(qq!SELECT TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"!); $sth=$self->dbh->prepare(qq!SELECT GROUP_CONCAT(TAX_ID SEPARATOR ', ') as TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"!); $sth->execute(); my @tax_ids; while (my ($res)=$sth->fetchrow_array()) { my @res_v=split(/,/,$res); my @res_v2=grep(s/^\s*(.*)\s*$/\1/g, @res_v); push(@tax_ids, @res_v2); } # jsp6 # get acc ##$sth=$self->dbh->prepare(qq!SELECT ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"!); $sth=$self->dbh->prepare(qq!SELECT GROUP_CONCAT(ACC SEPARATOR ', ') as ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"!); $sth->execute(); my @accs; while (my ($res)=$sth->fetchrow_array()) { my @res_v=split(/,/,$res); my @res_v2=grep(s/^\s*(.*)\s*$/\1/g, @res_v); push(@accs, @res_v2); } # jsp6 # get remaining annotation as tags, form ANNOTATION table my %tags; $sth=$self->dbh->prepare(qq{SELECT TAG, VAL FROM MATRIX_ANNOTATION WHERE ID = 
"$int_id" }); $sth->execute(); ## my @key_to_split=("acc", "medline", "pazar_tf_id"); #if acc in MATRIX_ANNOTATION my @key_to_split=("medline", "pazar_tf_id", "tfbs_shape_id", "tfe_id"); #my @key_to_split=("medline", "pazar_tf_id"); foreach my $key(@key_to_split){ $tags{$key}=['-']; } #my @key_to_split=("medline"); my $vals; while ( my($tag, $val)= $sth->fetchrow_array()) { $vals=[]; if ($tag ~~ @key_to_split){ my @val_v=split(/,/,$val); my @val_v2=grep(s/^\s*(.*)\s*$/\1/g, @val_v); push(@$vals, @val_v2); $tags{$tag}=$vals; } else { $tags{$tag}=$val; } # $tags{$tag}=$val; } # jsp6 $tags{'collection'}= $collection; $tags{'species'}=\@tax_ids; # as array reference instead of strigifying $tags{'acc'}=\@accs; # same, if acc MATRIX_PROTEIN # my $class= $tags{'class'}; delete ($tags{'class'}); # eval("\$matrixobj= TFBS::Matrix::PFM->new".' ( -ID => "$base_ID.$version", -name =>$name, -class => $class, -tags => \%tags, -matrixstring=> $matrixstring # FIXME - temporary );' ); if ($@) { $self->throw($@); } # warn $int_id, "\t", ref($matrixobj); return ($matrixobj->to_PWM) if $mt eq "PWM"; return ($matrixobj->to_ICM) if $mt eq "ICM"; return ($matrixobj); # default PFM } ##jsp6 sub _get_IDlist_by_query { #needs cleanup. NOT for the faint-hearted. my ($self, %args)=@_; warn '_get_IDlist_by_query | $self || ', $self; warn '_get_IDlist_by_query | %args || ', %args; # called by get_MatrixSet # warn $args{"-collection"}; $args{'-collection'}='CORE' unless $args{'-collection'}; # returns a set of internal IDs with whicj to get the actual matrices # current idea: # 1: first catch non-tag things like collection, name and version, species # makw one query for these if they are named and check the IDs for "latest" unless requested not to. 
# these are AND statements # 2:then do the rest on tag level: # to be able to do this with actual and tattemnet innthe tag table, we do an inner join query, which is kept separate just for convenice # we then intersect 1 and 2 # 3: then do matrix-based features such as ic, with, number of sites etc, for the surviving matrices. This shold happen in the get_matrixset part my @int_ids_to_return; ## jsp6 - autosearch if ($args{'-auto'}) { ##my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); my $sth=$self->dbh->prepare (qq!SELECT U.ID FROM (SELECT ID, BASE_ID as VAL FROM MATRIX UNION ALL SELECT ID, NAME as VAL FROM MATRIX UNION ALL SELECT ID, ACC as VAL FROM MATRIX_PROTEIN UNION ALL SELECT ID, TAX_ID as VAL FROM MATRIX_SPECIES UNION ALL SELECT ID, SPECIES as VAL FROM MATRIX_SPECIES,TAX WHERE MATRIX_SPECIES.TAX_ID=TAX.TAX_ID UNION ALL SELECT ID, NAME as VAL FROM MATRIX_SPECIES,TAX_EXT WHERE MATRIX_SPECIES.TAX_ID=TAX_EXT.TAX_ID AND MATRIX_SPECIES.TAX_ID=9606 UNION ALL SELECT ID, VAL as VAL FROM MATRIX_ANNOTATION) AS U WHERE LOWER(`VAL`) LIKE LOWER(?)!); warn '_get_IDlist_by_query | $sth || ', $sth; foreach my $stID(@{$args{'-auto'}}) { warn '_get_IDlist_by_query | $stID || ', $stID; my ($stable_ID, $version)= split (/\./, $stID); # ignore vesion here, this is a stupidity filter #$sth->execute($stable_ID); $sth->execute("%" . $stable_ID . "%"); while( my ($int_id)=$sth->fetchrow_array()) { warn '_get_IDlist_by_query | $int_id || ', $int_id; push (@int_ids_to_return, $int_id); } } return \@int_ids_to_return; } # should redo so that matrix_annotation queries are separate, with an intersect in the end #special case 1: get ALL matrices. 
Higher priority than all if ($args{'-all'}) { my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX!); $sth->execute(); my @a; while ( my ($i)=$sth->fetchrow_array()) { push (@a, $i); } return \@a; } # ids: special case 2, which has higher priority than any other except the above (ignore all others) if ($args{'-ID'}) { # these might be either stable IDs or stableid.version. # if just stable ID and if all_versions==1, take all versions, otherwise the latest if ( $args{-all_versions}) { my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); foreach my $stID(@{$args{'-ID'}}) { my ($stable_ID, $version)= split (/\./, $stID); # ignore version here, this is a stupidity filter $sth->execute($stable_ID); while( my ($int_id)=$sth->fetchrow_array()) { push (@int_ids_to_return, $int_id); } } } else { # only the latest version, or the requested version foreach my $stID(@{$args{'-ID'}}) { #warn $stID; my ($stable_ID, $version)= split (/\./, $stID); $version=$self->_get_latest_version($stable_ID) unless $version; my $int_id= $self->_get_internal_id($stable_ID, $version); push (@int_ids_to_return, $int_id) if $int_id; } } return \@int_ids_to_return; } my @tables=("MATRIX M"); my @and; # in matrix table: collection, if ( $args{-collection}) { my $q=' (COLLECTION='; if (ref $args{-collection} eq "ARRAY") { # so, possibly several my @a; foreach (@{$args{-collection}}) { push (@a, "\"$_\""); } $q.= join ( " or COLLECTION=", @a); } else {# just one - typical usage $q.="\"$args{-collection}\""; } $q.=" ) "; push (@and, $q); } # in matrix table: names.
Is something that is basically only used from the web interface # typically used by the get_matrix_by_name function instead if ($args{-name}) { my $q=' (NAME='; if (ref $args{-name} eq "ARRAY") { # so, possibly several my @a; foreach (@{$args{-name}}) { push (@a, "\"$_\""); } $q.= join ( " or NAME=", @a); } else {# just one - typical usage $q.="\"$args{-name}\""; } $q.=" ) "; push (@and, $q); } # in species table: tax.id: possibly many species with OR in between if ( $args{-species}) { push (@tables , "MATRIX_SPECIES S"); my $q=" M.ID=S.ID and (TAX_ID= "; if (ref $args{-species} eq "ARRAY") { # so, possibly several my @a; foreach (@{$args{-species}}) { push (@a, "\"$_\""); } $q.= join ( " or TAX_ID=", @a); } else {# just one - typical usage $q.="\"$args{-species}\""; } $q.=") "; push (@and, $q); } # TAG_BASED # an internal join query:should be able to handle up to 26 tags-value combos with ANDS in between # Very ugly code ahead: my (@inner_tables, @internal_ands1,@internal_ands2 ); my $int_counter=0; # for keeping track of names; my @alpha = ("a" .. "z"); my %arrayref; foreach my $key(keys %args) { next if $key eq "-min_ic"; next if $key eq "-matrixtype"; next if $key eq "-species"; next if $key eq "-collection"; next if $key eq "-all_versions"; next if $key eq "-all"; next if $key eq "-ID"; next if $key eq "-length"; next if $key eq "-name"; my $oldkey=$key; $key=~s/-//; $arrayref{$key}= $args{$oldkey}; } if (%arrayref) { # get an internal name for the table push (@internal_ands2 , " M.ID=a.ID " ); my @a; foreach my $key (keys %arrayref) { my $tname= $alpha[$int_counter]; push (@inner_tables , "MATRIX_ANNOTATION $tname"); push (@internal_ands1 , $alpha[$int_counter].".ID=".
$alpha[$int_counter-1].".ID") unless $int_counter==0; $int_counter++; # is the thing supplied an array reference in itself: make an "or" query from that if ( ref $arrayref{$key} eq "ARRAY") { my @b; foreach( @{$arrayref{$key}}) { push (@b, $self->dbh->quote($_)); } my $orstring= join (" or $tname.VAL=" , @b); push (@a, "($tname.TAG=\"$key\" AND ($tname.VAL=$orstring))"); } #or not else { push (@a, "($tname.TAG=\"$key\" AND $tname.VAL=\"$arrayref{$key}\")"); } } my $s= " ( ". join (" AND ", @a). ")"; push (@internal_ands2 , $s); } my $qq= "SELECT distinct(M.ID) from ". join (",", (@tables,@inner_tables)) . " where" . join ( " AND ", (@and,@internal_ands1, @internal_ands2 )); # warn $qq; #do actual mammoth query,and check for latest matrix my $sth=$self->dbh->prepare ($qq); $sth->execute(); my @r; while (my($int_id)= $sth->fetchrow_array) { if ($args{-all_versions}) { push (@r,$int_id); } else { # is latest? push(@r,$int_id) if ( $self->_is_latest_version($int_id) ==1); } } warn "Warning: Zero matrices returned with current criteria" unless scalar @r; return \@r; } ## jsp6 - checkpoint sub _is_latest_version{ # is a particular internal ID representing the latest matrix (collapse on base ids) my ($self, $int_id)=@_; my $sth=$self->dbh->prepare( qq!
select count(*) from MATRIX where BASE_ID= (SELECT BASE_ID from MATRIX where ID=$int_id) AND VERSION>(SELECT VERSION from MATRIX where ID=$int_id) !); $sth->execute(); my ($count)= $sth->fetchrow_array(); return(1) if $count ==0;# no matrices with higher version ID and same base id return(0); } sub DESTROY { #OK $_[0]->dbh->disconnect() if $_[0]->dbh; } TFBS-0.7.1/TFBS/DB/JASPAR7.pm000066400000000000000000001625011305752266700147270ustar00rootroot00000000000000# TFBS module for TFBS::DB::JASPAR # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # Maintainers: # Xiaobei Zhao - JASPAR6 (JASPAR 2014 experimental) # David Arenillas - JASPAR7 (JASPAR 2016) # # This is an update of, and follows from TFBS::DB::JASPAR6 which itself was # created / modified from JASPAR5 by Xiaobei Zhao. It was created # by copying the existing TFBS::DB::JASPAR6 and modifying it to reflect the # changes made to the 2016 update of the JASPAR database / webserver. # # As there seemed to be some discrepancy / confusion between the version # numbers with the JASPAR DB/webserver releases and the perl module, from # now on the latest (current) version of the perl module will simply be named # JASPAR.pm (without version number attached). Previous versions will have a # version number attached to them to indicate which DB/webserver they are # related to. # # JASPAR 5 and 6 are associated with JASPAR 2014 (also known as JASPAR 5.0). # It appears that the JASPAR6 perl module was an experimental update to # JASPAR5 and did not reflect a version change in the JASPAR DB/webserver. # # Change summary since JASPAR[5/6].pm: # - In the _store_matrix routine, the code was checking for a version number # by checking for a 'version' tag and NOT even checking to see if the version # number was already contained in the matrix ID!?!? Modified to first check # for a version encoded in the matrix ID and then only if it isn't, check for # a version tag. 
# - Fixed the way the 'acc' tag is stored. # - The DB schema was changed to allow multiple values for a given matrix in # some instances where previously only a single value was allowed. E.g. for # the MATRIX_ANNOTATION table, removed the unique key constrain on ID + TAG # so that multiple VALs with the same TAG can be stored for a given matrix. # As it was, multiple values were already stored as comma separated values # in a single row which is bad DB design (not normalized). The Code was thus # modified to reflect that these values may now be stored in multiple DB # rows. # - Related to this, previously it was assumed that TF class and family could # each only be a single value. However, for heterodimers it is quite possible # that the two TFs making up the dimer have a different class/family so now # the code assumes multiple could exist in the DB for these tags. As a result # the class/family may now be stored in a TFBS::Matrix object as either a # scalar string (for a single value) or as a listref (for multiple values). # This is consistent with the way other tag/values are stored although # it can certainly be argued that it is not good programming style to have # different return types for the same method! This behaviour should be # reconsidered in future versions of the TFBS modules. # - Similarly to values stored in MATRIX_ANNOTATION, the MATRIX_SPECIES table # has been modified to allow storage or multiple species for a matrix as # separate rows rather than as comma separated strings and related code # modified accordingly. # - Added some exception calls and/or more informative error messages in some # places they were missing. # - Fixed up (some of) the ugly code formatting, e.g. changed mixture of # leading tabs / space to be consistently spaces and changed indentation # levels to always be 4 spaces. 
# # Also see specific embedded comments tagged with DJA for more details # # POD =head1 NAME TFBS::DB::JASPAR - interface to MySQL relational database of pattern matrices. Current status: experimental. =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to the existing JASPAR6-type database my $db = TFBS::DB::JASPAR7->connect("dbi:mysql:JASPAR6:myhost", "myusername", "mypassword"); =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('M0079','PFM'); #retrieving a PWM by name my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM'); =item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria # retrieving a set of PWMs from a list of IDs: my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101'); my $matrixset = $db->get_MatrixSet(-ID => \@IDlist, -matrixtype => "PWM"); # retrieving a set of ICMs from a list of names: my @namelist = ('p50', 'p53', 'HNF-1', 'GATA-1', 'GATA-2', 'GATA-3'); my $matrixset = $db->get_MatrixSet(-name => \@namelist, -matrixtype => "ICM"); =item * creating a new JASPAR6-type database named MYJASPAR6: my $db = TFBS::DB::JASPAR7->create("dbi:mysql:MYJASPAR6:myhost", "myusername", "mypassword"); =item * storing a matrix in the database (currently only PFMs): # let $pfm be a TFBS::Matrix::PFM object $db->store_Matrix($pfm); =back =head1 DESCRIPTION TFBS::DB::JASPAR is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a relational database. The interface is nearly identical to the JASPAR2 and JASPAR4 interface, while the underlying data model is different. =head1 JASPAR6 DATA MODEL JASPAR6 is the working name for a relational database model used for storing transcription factor pattern matrices in a MySQL database.
It was initially designed (JASPAR2) to store matrices for the JASPAR database of high quality eukaryotic transcription factor specificity profiles by Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself, this data model stores profile ID (unique), name, structural class, basic taxonomic and bibliographic information as well as some additional, and custom, tags. Here goes a more thorough description of tables and IDs ----------------------- ADVANCED --------------------------------- For the developers and the curious, here is the JASPAR6 data model: MISSING TEXT HERE ON HOW IT WORKS It is our best intention to hide the details of this data model, which we are using on a daily basis in our work, from most TFBS users. Most users should only know the methods to store the data and which tags are supported. ------------------------------------------------------------------------- =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt> =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
=cut # The code begins HERE: package TFBS::DB::JASPAR7; use vars qw(@ISA $AUTOLOAD); # we need all three matrices due to the redundancy in JASPAR2 data model # which will hopefully be removed in JASPAR3 use TFBS::Matrix::PWM; use TFBS::Matrix::PFM; use TFBS::Matrix::ICM; use TFBS::TFFM; use TFBS::MatrixSet; use Bio::Root::Root; use DBI; # use TFBS::DB; # eventually use strict; @ISA = qw(TFBS::DB Bio::Root::Root); ######################################################################### # CONSTANTS ######################################################################### use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost use constant DEFAULT_USER => ""; use constant DEFAULT_PASSWORD => ""; ######################################################################### # PUBLIC METHODS ######################################################################### =head2 new Title : new Usage : DEPRECATED - for backward compatibility only Use connect() or create() instead =cut sub new { _new(@_); } =head2 connect Title : connect Usage : my $db = TFBS::DB::JASPAR7->connect("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD"); Function: connects to the existing JASPAR6-type database and returns a database object that interfaces the database Returns : a TFBS::DB::JASPAR7 object Args : a standard database connection triplet ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, use the actual values. PASSWORD and USERNAME might be optional, depending on the user's access permissions for the database server.
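Example : A minimal sketch of connecting and checking the result. The
          database name, host and user name are hypothetical. The check
          relies on dbh() staying undefined when DBI->connect fails,
          since connect() itself does not verify the connection.

          use TFBS::DB::JASPAR7;

          # hypothetical connection values; substitute your own
          my $db = TFBS::DB::JASPAR7->connect(
              "dbi:mysql:JASPAR_2016:localhost", "jaspar_ro", "");

          # no exception is raised on a failed connection, so test the handle
          die "Could not connect: $DBI::errstr" unless $db->dbh;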
=cut sub connect { #DONE # a more intuitive syntax for the constructor my ($caller, @connection_args) = @_; $caller->new(-connect => \@connection_args); } =head2 dbh Title : dbh Usage : my $dbh = $db->dbh(); $dbh->do("UPDATE matrix_data SET name='ADD1' WHERE NAME='SREBP2'"); Function: returns the DBI database handle of the MySQL database interfaced by $db; THIS IS USED FOR WRITING NEW METHODS FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you have write access AND do not know what you are doing, you can severely corrupt the data For documentation about database handle methods, see L<DBI> Returns : the database (DBI) handle of the MySQL JASPAR2-type relational database associated with the TFBS::DB::JASPAR2 object Args : none =cut sub dbh { #DONE my ($self, $dbh) = @_; $self->{'dbh'} = $dbh if $dbh; return $self->{'dbh'}; } =head2 store_Matrix Title : store_Matrix Usage : $db->store_Matrix($matrixobject); Function: Stores the contents of a TFBS::Matrix::* object in the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : (PFM_object) A TFBS::Matrix::PFM, TFBS::Matrix::PWM or TFBS::Matrix::ICM object. PFM objects are recommended, as they are easily converted to other formats # might have to give version and collection here Comment : this is an experimental method that is not 100% bulletproof; use at your own risk =cut sub store_Matrix { #PROBABLY DONE # collection, version are taken from the corresponding tags.
Warn if they are not there ; my ($self, @PFMs) = @_; my $err; foreach my $pfm (@PFMs) { eval { my $int_id = $self->_store_matrix($pfm) ; # needs to have collection and version $self->_store_matrix_data($pfm, $int_id); $self->_store_matrix_annotation($pfm, $int_id); $self->_store_matrix_species($pfm, $int_id); $self->_store_matrix_acc($pfm, $int_id); }; } return $@; } sub create { #done my ($caller, $connectstring, $user, $password) = @_; if ( $connectstring and $connectstring =~ /dbi:mysql:(\w+)(.*)/) { # connect to the server; my $dbh = DBI->connect("dbi:mysql:mysql" . $2, $user, $password) or die("Error connecting to the database"); # create database and open it $dbh->do("create database $1") or die("Error creating database."); $dbh->do("use $1"); # create tables _create_tables($dbh); $dbh->disconnect; # run "new" with new database return $caller->new(-connect => [$connectstring, $user, $password]); } else { die( "Missing or malformed connect string for " . "TFBS::DB::JASPAR2 connection."); } } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix is stored in the database (PFM is default) Args : (Matrix_ID) Matrix_ID id is a string which refers to the stable JASPAR ID (usually something like "MA0001") with or without version numbers. "MA0001" will give the latest version on MA0001, while "MA0001.2" will give the second version, if existing. Warnings will be given for non-existing matrices. =cut sub get_Matrix_by_ID { #DONE. 
MAYBE :) my ($self, $q, $mt) = @_; # q is a stable ID with possible version number # jsp6 $mt = (uc($mt) or "PFM"); unless (defined $q) { $self->throw("No ID passed to get_Matrix_by_ID"); } my $ucmt = uc $mt; # separate stable ID and version number my ($base_ID, $version) = split(/\./, $q); $version = $self->_get_latest_version($base_ID) unless $version; # latest version per default # get internal ID - also a check for validity my $int_id = $self->_get_internal_id($base_ID, $version); # get matrix using internal ID my $m = $self->_get_Matrix_by_int_id($int_id, $ucmt); #warn ref($m); return ($m); } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix object was stored in the database (default PFM)) Args : (Matrix_name) Warning : According to the current JASPAR6 data model, name is not necessarily a unique identifier. Also, names change over time. In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You've been warned. Some matrices have multiple versions. The function will return the latest version. For specific versions, use get_Matrix_by_ID($ID.$version) =cut sub get_Matrix_by_name { #DONE my ($self, $name, $mt) = @_; unless (defined $name) { $self->throw("No name passed to get_Matrix_by_name."); } # sanity check: are there many different stable IDs with same name? my $sth = $self->dbh->prepare( qq!SELECT distinct BASE_ID FROM MATRIX WHERE NAME="$name"! 
); $sth->execute(); my @stable_ids = map { $_->[0] } @{$sth->fetchall_arrayref()}; my $L = scalar @stable_ids; $self->warn("There are $L distinct stable IDs with name '$name'") if $L > 1; return $self->get_Matrix_by_ID($stable_ids[0], $mt); } =head2 get_MatrixSet Title : get_MatrixSet Usage : my $matrixset = $db->get_MatrixSet(%args); Function: fetches matrix data for all matrices in the database matching criteria defined by the named arguments and returns a TFBS::MatrixSet object Returns : a TFBS::MatrixSet object Args : This method accepts named arguments, corresponding to arbitrary tags, and also some utility functions Note that this is different from JASPAR2 and to some extent JASPAR4. As any tag is supported for database storage, any tag can be used for information retrieval. Additionally, arguments such as 'name','class','collection' can be used (even though they are not tags). Per default, only the latest version of the matrix is given. The only way to get older matrices out of this is to use an array of IDs with actual versions like MA0001.1, or set the argument -all_versions=>1, in which case you get all versions for each stable ID Examples include: Fundamental matrix features -all # gives absolutely all matrix entries, regardless of version and collection. Only useful for backup situations and sanity checks. Takes precedence over everything else -ID # a reference to an array of stable IDs (strings), with or without version, as above. typically something like "MA0001.2" . Takes precedence over everything except -all -name # a reference to an array of # transcription factor names (string). Will only take latest version. NOT a preferred way to access since names change over time -collection # a string corresponding to a JASPAR collection. Per default CORE -all_versions # gives all matrix versions that fit with rest of criteria, including obsolete ones. Is off per default.
# Typical usage is in combination with stable IDs without versions to get all versions of a particular matrix Typical tag queries: These can be either a string or a reference to an array of strings. If it is an array it will be interpreted as an "or" statement -class # a reference to an array of # structural class names (strings) -species # a reference to an array of # NCBI Taxonomy IDs (integers) -taxgroup # a reference to an array of # higher taxonomic categories (string) Computed features of the matrices -min_ic # float, minimum total information content # of the matrix. -matrixtype #string describing type of matrix to retrieve. If left out, the format will revert to the database format, which is PFM. The arguments that expect list references are used in database query formulation: elements within lists are combined with 'OR' operators, and the lists of different types with 'AND'. For example, my $matrixset = $db->get_MatrixSet(-class => ['TRP_CLUSTER', 'FORKHEAD'], -species => [9606, 10090], ); gives a set of TFBS::Matrix::PFM objects (given that the matrix models are stored as such) whose (structural class is 'TRP_CLUSTER' OR 'FORKHEAD') AND (the species they are derived from is Homo sapiens, taxonomy ID 9606, OR Mus musculus, taxonomy ID 10090). As above, unless IDs with version numbers are used, only one matrix per stable ID will be returned: the matrix with the highest version number. The -min_ic filter is applied after the query in the sense that the matrix profiles with total information content less than specified are not included in the set.
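A minimal sketch of a combined query, assuming $db is a connected database object. The class name, taxonomy IDs and information content cutoff are illustrative values only; whether they match anything depends on the contents of your database.

    # human OR mouse (NCBI Taxonomy IDs), AND class FORKHEAD, in CORE
    my $matrixset = $db->get_MatrixSet(
        -collection => 'CORE',
        -species    => [9606, 10090],
        -class      => ['FORKHEAD'],
        -min_ic     => 10,       # applied after the database query
        -matrixtype => 'PWM',    # convert the stored PFMs to PWMs
    );

    # walk over the result with the set iterator
    my $it = $matrixset->Iterator();
    while (my $matrix = $it->next) {
        print $matrix->ID, "\t", $matrix->name, "\n";
    }

Only the latest version of each stable ID is returned here, because -all_versions is not set.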
=cut # jsp6 sub get_MatrixSet { # IC content and matrix stuff is not there yet, rest should work my ($self, %args) = @_; #jsp6 $args{'-collection'} = 'CORE' unless $args{'-collection'}; $args{'-all_versions'} = 0 unless $args{'-all_versions'}; my @IDlist = @{$self->_get_IDlist_by_query(%args)} ; # the IDlist here are INTERNAL ids my $type; my $matrixset = TFBS::MatrixSet->new(); foreach my $int_id (@IDlist) { my $matrix = $self->_get_Matrix_by_int_id($int_id); if (defined $args{'-min_ic'}) { # we assume the matrix IS a PFM, or something in normal space at # least unless it explicitly says otherwise in tag=matrixtype # if so warn and do not use IC content # this is not foolproof in any way # # Fixed up logic to actually check $matrix->isa(TFBS::Matrix::ICM) # before checking the matrixtype tag. Also check that matrixtype # tag is defined before comparison to prevent annoying "Use of # uninitialized value in string eq" messages from perl. # DJA 2012/05/11 # if ($matrix->isa("TFBS::Matrix::ICM") || (defined $matrix->{tags}{matrixtype} && $matrix->{tags}{matrixtype} eq "ICM") ) { next if ($matrix->total_ic() < $args{'-min_ic'}); } elsif ($matrix->isa("TFBS::Matrix::PFM")) { next if ($matrix->to_ICM->total_ic() < $args{'-min_ic'}); } else { warn "Warning: you are assessing information content on matrices that are not in PFM or ICM format. Skipping this criteria"; next; } } # length if (defined $args{'-length'}) { next if ($matrix->length() < $args{'-length'}); } # number of sites within # since column sums MIGHT be slightly different we take the integer of the mean of the columns # or really int( sum of matrix/#columns) if (defined $args{'-sites'}) { my $sum = 0; foreach (1 .. $matrix->length) { $sum += $matrix->column_sum($_); } $sum = int($sum / $matrix->length); #warn $matrix->ID, " $sum is $sum"; next if ($sum < $args{'-sites'}); } #ugly code: think about this a bit.
if ($args{'-matrixtype'} && $matrix->isa("TFBS::Matrix::PFM")) { if ($args{'-matrixtype'} eq ('PWM')) { $matrix = $matrix->to_PWM(); } if ($args{'-matrixtype'} eq ('ICM')) { $matrix = $matrix->to_ICM(); } } $matrixset->add_Matrix($matrix); } return $matrixset; } sub store_MatrixSet { #DONE a wrapper around store_Matrix (which also can take an array of matrices, so utility only) my ($self, $matrixset) = @_; my $it = $matrixset->Iterator(); while (my $matrix_object = $it->next) { # do whatever you want with individual matrix objects $self->store_Matrix( $matrix_object); } } =head2 delete_Matrix_having_ID Title : delete_Matrix_having_ID Usage : $db->delete_Matrix_having_ID('M00045.1'); Function: Deletes the matrix having the given ID from the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : (ID) A string. Has to be a matrix ID with version suffix in JASPAR6. Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name for a method, but at least it should be obvious what it does. =cut sub delete_Matrix_having_ID { my ($self, @IDs) = @_; # this has to be versioned IDs foreach my $ID (@IDs) { my ($base_id, $version) = split(/\./, $ID); unless ($version) { warn "You have supplied a non-versioned matrix ID to delete. Skipping $ID "; next; } # get relevant internal ID my ($int_id) = $self->_get_internal_id($base_id, $version); eval { my $q_ID = $self->dbh->quote($int_id); foreach my $table ( qw (MATRIX_DATA MATRIX MATRIX_SPECIES MATRIX_PROTEIN MATRIX_ANNOTATION) ) { $self->dbh->do("DELETE from $table where ID=$q_ID"); } }; } return $@; } =head2 get_TFFM_by_ID Title : get_TFFM_by_ID Usage : my $tffm = $db->get_TFFM_by_ID('TFFM0001'); Function: fetches TFFM data under the given ID from the database and returns a TFBS::TFFM object.
Returns : a TFBS::TFFM object Args : TFFM ID TFFM_ID id is a string which refers to the stable JASPAR TFFM ID (usually something like "TFFM0001") with or without version numbers. "TFFM0001" will give the latest version on TFFM0001, while "TFFM0001.2" will give the second version, if existing. Warnings will be given for non-existing TFFMs. =cut sub get_TFFM_by_ID { my ($self, $id) = @_; # id is a stable ID with possible version number unless (defined $id) { $self->throw("No ID passed to get_TFFM_by_ID"); } # separate stable ID and version number my ($base_ID, $version) = split(/\./, $id); # latest version per default $version = $self->_get_TFFM_latest_version($base_ID) unless $version; # get internal ID - also a check for validity my $int_id = $self->_get_TFFM_internal_id($base_ID, $version); # get TFFM using internal ID my $tffm = $self->_get_TFFM_by_int_id($int_id); return ($tffm); } =head2 get_TFFM_by_matrix_ID Title : get_TFFM_by_matrix_ID Usage : my $tffm = $db->get_TFFM_by_matrix_ID('MA0001.1'); Function: fetches TFFM data under related to the given matrix ID from the database and returns a TFBS::TFFM object. Returns : a TFBS::TFFM object Args : Matrix ID Matrix ID id is a string which refers to the stable JASPAR matrix ID (usually something like "MA0148.3"). Note that this *should* be a fully qualified matrix ID (with version) as the TFFM is related to a specific version of a matrix. If no matrix version is given, the latest version of the matrix is retrieved and the corresponding TFFM for that matrix version is retrieved (could be no TFFM). In general only a few matrices have associated TFFMs so in many cases no TFFM will be retrieved. In these cases we return undef. 
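Example : A minimal sketch of handling the undef return value, assuming
          $db is a connected database object. The matrix ID is
          illustrative, and falling back to the plain matrix is just one
          possible policy, not something the module does for you.

          my $matrix_id = 'MA0148.3';    # fully qualified: base ID + version
          my $tffm = $db->get_TFFM_by_matrix_ID($matrix_id);

          if (defined $tffm) {
              print "Found a TFFM for $matrix_id\n";
          }
          else {
              # most matrices have no TFFM; use the matrix itself instead
              my $pfm = $db->get_Matrix_by_ID($matrix_id, 'PFM');
              print "No TFFM for $matrix_id, using matrix ", $pfm->name, "\n";
          }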
=cut sub get_TFFM_by_matrix_ID { # id is a fully qualified matrix stable ID including version number my ($self, $matrix_id) = @_; unless (defined $matrix_id) { $self->throw("No matrix ID passed to get_TFFM_by_matrix_ID"); } # separate matrix stable ID and version number my ($matrix_base_id, $matrix_version) = split(/\./, $matrix_id); # latest matrix version per default $matrix_version = $self->_get_latest_version($matrix_base_id) unless $matrix_version; my $sth = $self->dbh->prepare( qq! SELECT BASE_ID, VERSION, NAME, LOG_P_1ST_ORDER, LOG_P_DETAILED, EXPERIMENT_NAME FROM TFFM WHERE MATRIX_BASE_ID = "$matrix_base_id" AND MATRIX_VERSION = "$matrix_version" ! ); $sth->execute(); my ($base_id, $version, $name, $log_p_1st_order, $log_p_detailed, $exp_name) = $sth->fetchrow_array(); my $tffm; if ($base_id) { eval { $tffm = TFBS::TFFM->new( -ID => "$base_id.$version", -name => $name, -log_p_1st_order => $log_p_1st_order, -log_p_detailed => $log_p_detailed, -experiment_name => $exp_name, # # OR we could retrieve the matrix and set the corresponding # attribute # -matrix_ID => "$matrix_base_id.$matrix_version" ) }; if ($@) { $self->throw($@); } # # Instead of storing the matrix ID, get the related matrix and store in # the matrix attribute. # # my $matrix = $self->get_Matrix_by_ID( # "$matrix_base_id.$matrix_version", 'PFM' # ); # # $tffm->matrix($matrix); # } else { warn "No TFFM exists for matrix '$matrix_base_id.$matrix_version'"; } return $tffm; } =head2 get_TFFM_by_name Title : get_TFFM_by_name Usage : my $tffm = $db->get_TFFM_by_name('HNF-1'); Function: fetches TFFM data under the given name from the database and returns a TFBS::TFFM object Returns : a TFBS::TFFM object Args : A TFFM name - the name of the transcription factor for which this TFFM was modelled. This is the same as the name of the matrix used to train the TFFM. Warning : According to the current JASPAR data model, name is not necessarily a unique identifier. Also, names change over time. 
In the case where there are several TFFMs with the same name in the database, the function fetches the first one and prints a warning on STDERR. You've been warned. Some matrices have multiple versions. The function will return the latest version. For specific versions, use get_TFFM_by_ID($ID.$version) =cut sub get_TFFM_by_name { my ($self, $name) = @_; unless (defined $name) { $self->throw("No name passed to get_TFFM_by_name."); } # sanity check: are there many different stable IDs with same name? my $sth = $self->dbh->prepare( qq!SELECT distinct BASE_ID FROM TFFM WHERE NAME=?! ); $sth->execute($name); my @stable_ids = map { $_->[0] } @{$sth->fetchall_arrayref()}; my $L = scalar @stable_ids; $self->warn("There are $L distinct stable IDs with name '$name'") if $L > 1; return $self->get_TFFM_by_ID($stable_ids[0]); } ######################################################################### # PRIVATE METHODS ######################################################################### sub _new { #PROBABLY OK my ($caller, %args) = @_; my $class = ref $caller || $caller; my $self = bless {}, $class; my ($connectstring, $user, $password); if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) { ($connectstring, $user, $password) = @{$args{'-connect'}}; } elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) { return $caller->create(@{$args{'-create'}}); } else { ($connectstring, $user, $password) = (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD); } $self->dbh(DBI->connect($connectstring, $user, $password, {mysql_enable_utf8 => 1})); return $self; } sub _store_matrix_data { # DONE my ($self, $pfm, $int_id, $ACTION) = @_; my @base = qw(A C G T); my $matrix = $pfm->matrix(); my $type; my $sth = $self->dbh->prepare(q! INSERT INTO MATRIX_DATA VALUES(?,?,?,?) !); for my $i (0 .. 3) { for my $j (0 ..
($pfm->length - 1)) { $sth->execute( $int_id, $base[$i], $j + 1, $matrix->[$i][$j] ) or $self->throw("Error executing query."); } } } sub _store_matrix { #DONE # creation of the matrix will also give an internal unique ID # (incremental int) which will be returned to use for the other tables my ($self, $pfm, $ACTION) = @_; # # Added check that the version is not already stored as part of the ID. # Also added a more informative insertion exception message. # DJA 2016/08/26 # my $id = $pfm->ID; my $base_id; my $version; if ($id =~ /^(\S+)\.(\d+)/) { $base_id = $1; $version = $2; } else { $base_id = $id; } unless ($version) { # Get collection and version from the matrix tags $version = $pfm->{'tags'}{'version'}; } # will warn but not die if version is missing: will assume 1 unless ($version) { warn "WARNING: Lacking version number for " . $pfm->ID . ". Setting version=1"; $version = 1; } my $collection = $pfm->{'tags'}{'collection'}; unless ($collection) { warn "WARNING: Lacking collection name for " . $pfm->ID . ". Setting collection to an empty string. You probably do not want this"; $collection = ''; } # sanity check: do we already have this combination of base ID and version? # If we do, die my $sth = $self->dbh->prepare( qq! select count(*) from MATRIX where VERSION=$version and BASE_ID= "$base_id" and collection="$collection" ! ); $sth->execute; my ($sanity_count) = $sth->fetchrow_array; if ($sanity_count > 0) { warn "WARNING: Database input inconsistency: You already have $sanity_count $base_id matrices of version $version in collection $collection. Terminating program"; die; } # insert data $sth = $self->dbh->prepare( q! INSERT INTO MATRIX VALUES(?,?,?,?,?) !
); # update next sth with actual version and collection: DO $sth->execute(0, $collection, $pfm->ID, $version, $pfm->name) or $self->throw( sprintf("Error inserting matrix %s as %s.%d to %s collection", $pfm->name, $pfm->ID, $version, $collection) ); # get the actual (new) iternal ID my $int_id = $self->dbh->{q{mysql_insertid}}; return $int_id; } sub _store_matrix_annotation { # DONE # # this is for tag-value items that are not one-to-many (so, not species # and not acc) # # We do need to store multiple values of class and family. In the case # of profiles representing dimers, the TFs making up the dimer may have # different TF classes and families. It also seemed abitrary to have a # primary key constraint on the ID + TAG. This has been removed. # # Added more informative messages to insertion failure exceptions. # Modified split to also handle spaces around comma. # DJA 2015/08/26 # my ($self, $pfm, $int_id, $ACTION) = @_; my $sth = $self->dbh->prepare( q! INSERT INTO MATRIX_ANNOTATION (ID, tag, val) VALUES(?,?,?) ! ); # get all tags # but skip out collection or version as we already have those in the # MATRIX table # special handling for class which mighht have a true slot my %tags = $pfm->all_tags(); if (defined($pfm->{class})) { $tags{class} = $pfm->{class}; } foreach my $tag (keys %tags) { next if $tag eq "collection"; next if $tag eq "version"; next if $tag eq "species"; # # The 'acc' tag was commented out in JASPAR6. Not sure the reasoning # as this stores the protein accession (UniProt IDs) which are # stored in the MATRIX_PROTEIN table and therefore should NOT be # stored in the MATRIX_ANNOTATION table and therefore *should* be # skipped here. # DJA 2015/08/26 # next if $tag eq "acc"; # next if $tag eq "class"; #$sth->execute($int_id, $tag, ($tags{$tag} or ""),) # or $self->throw("Error executing query"); # # Since we may have multiple values stored in some tags we need to # handle this. 
Assume that they may be stored either as array # references or as strings with comma separators. # DJA 2015/08/26 # my $vals = $tags{$tag}; if (ref $vals eq 'ARRAY') { foreach my $val (@$vals) { $val =~ s/^\s+//; $val =~ s/\s+$//; $sth->execute($int_id, $tag, $val) or $self->throw( "Error inserting tag/value pair ('$tag', '$val')" . " into MATRIX_ANNOTATION" ); } } else { foreach my $val (split(/\s*\,\s*/, $vals)) { $val =~ s/^\s+//; $val =~ s/\s+$//; $sth->execute($int_id, $tag, $val) or $self->throw( "Error inserting tag/value pair ('$tag', '$val')" . " into MATRIX_ANNOTATION" ); } } } } sub _store_matrix_species { # DONE #these are for species IDs - can be several # these are taken from the tag "species" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string # # Added exception handling/messages for insertion failure. # Modified split to also handle spaces around comma. # DJA 2015/08/26 # my ($self, $pfm, $int_id, $ACTION) = @_; my $sth = $self->dbh->prepare( q! INSERT INTO MATRIX_SPECIES VALUES(?,?) ! ); #sanity check: are there any species? Its ok not to have it. return () unless $pfm->{'tags'}{'species'}; #is the species a string or an arrayref? 
if (ref($pfm->{'tags'}{'species'}) eq 'ARRAY') { # walkthru array foreach my $species (@{$pfm->{'tags'}{'species'}}) { $sth->execute($int_id, $species) or $self->throw( "Error inserting species '$species' for matrix $int_id)" ); } } else { # split and walk thru foreach my $species (split(/\s*\,\s*/, $pfm->{'tags'}{'species'})) { $species =~ s/^\s+//; $sth->execute($int_id, $species) or $self->throw( "Error inserting species '$species' for matrix $int_id)" ); } } } sub _store_matrix_acc { # DONE #these are for protein accession numbers - can be several # these are taken from the tag "acc" # if that tag is a reference to an array we walk over the array # if it is a comma-separated string we split the string # # For some reason the MATRIX_PROTEIN table had a primary key constraint # on the matrix ID field, preventing multiple proteins from being stored # here. That was presumably an oversight or otherwise an out of date table # definition. # # Also added more informative insertion exception messages. # DJA 2016/08/26 # my ($self, $pfm, $int_id, $ACTION) = @_; my $sth = $self->dbh->prepare( q! INSERT INTO MATRIX_PROTEIN VALUES(?,?) ! ); #sanity check: are there any accession numbers? Its ok not to have it. return () unless $pfm->{'tags'}{'acc'}; #is the protein ID a string or an arrayref? if (ref($pfm->{'tags'}{'acc'}) eq 'ARRAY') { # walkthru array foreach my $acc (@{$pfm->{'tags'}{'acc'}}) { $acc =~ s/\s//g; $sth->execute($int_id, $acc) or $self->throw( "Error inserting protein '$acc' for matrix $int_id)" ); } } else { # split and walk thru foreach my $acc (split(/\,/, $pfm->{'tags'}{'acc'})) { $acc =~ s/\s//g; $sth->execute($int_id, $acc) or $self->throw( "Error inserting protein '$acc' for matrix $int_id)" ); } } } #when creating: try to support arbitrary tags sub _create_tables { # DONE # utility function # If you want to change the databse schema, # this is the right place to do it # # Changed the primary key constraint on MATRIX_ANNOTATION to a simple # key. 
    # There are tags for which we do want to have multiple entries, e.g.
    # in the case of dimer profiles the two TFs making up the dimer may
    # have a different TF class and/or family, therefore we need to store
    # more than one record with tag 'class' or tag 'family' for the same
    # matrix ID.
    #
    # Also added key on ID to MATRIX_PROTEIN and MATRIX_SPECIES which seemed
    # to be missing.
    #
    # We may also want to set the charset/collation to utf8/utf8_unicode_ci
    # either at the DB level or to the individual table definitions
    # (particularly for the MATRIX_ANNOTATION table) to handle non-Latin
    # characters correctly.
    #
    # DJA 2015/08/26
    #
    my $dbh = shift;

    my @queries = (
        q!
        CREATE TABLE MATRIX (
            ID INT(11) NOT NULL AUTO_INCREMENT,
            COLLECTION VARCHAR (16) DEFAULT '',
            BASE_ID VARCHAR (16) DEFAULT '' NOT NULL ,
            VERSION TINYINT(4) DEFAULT 1 NOT NULL ,
            NAME VARCHAR (255) DEFAULT '' NOT NULL,
            PRIMARY KEY (ID))
        !,
        q!
        CREATE TABLE MATRIX_DATA (
            ID INT(11) NOT NULL,
            row VARCHAR(1) NOT NULL,
            col TINYINT(3) UNSIGNED NOT NULL,
            val float(10,3),
            PRIMARY KEY (ID, row, col))
        !,
        q!
        CREATE TABLE MATRIX_ANNOTATION (
            ID INT(11) NOT NULL,
            TAG VARCHAR(255) DEFAULT '' NOT NULL,
            VAL varchar(255) DEFAULT '',
            KEY (ID, TAG))
        !,
        q!
        CREATE TABLE MATRIX_SPECIES (
            ID INT(11) NOT NULL,
            TAX_ID VARCHAR(255) DEFAULT '' NOT NULL,
            KEY (ID))
        !,
        q!
        CREATE TABLE MATRIX_PROTEIN (
            ID INT(11) NOT NULL,
            ACC VARCHAR(255) DEFAULT '' NOT NULL,
            KEY (ID))
        !,
        q!
        CREATE TABLE TFFM (
            ID int(11) NOT NULL auto_increment,
            BASE_ID varchar(16) NOT NULL,
            VERSION tinyint(4) NOT NULL,
            MATRIX_BASE_ID varchar(16) NOT NULL,
            MATRIX_VERSION tinyint(4) NOT NULL,
            NAME varchar(255) NOT NULL,
            LOG_P_1ST_ORDER float default NULL,
            LOG_P_DETAILED float default NULL,
            EXPERIMENT_NAME varchar(255) default NULL,
            PRIMARY KEY (ID),
            KEY BASE_ID (BASE_ID, VERSION),
            KEY MATRIX_BASE_ID (MATRIX_BASE_ID, MATRIX_VERSION))
        !
); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub _get_matrixstring { #DONE my ($self, $ID) = @_; #my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); #unless (defined $dbname{$mt}) { #$self->throw("Unsupported matrix type: ".$mt); #} my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth = $self->dbh->prepare( "SELECT val FROM MATRIX_DATA WHERE ID=$qID AND row='$base' ORDER BY col" ); $sth->execute; $matrixstring .= join(" ", (map {$_->[0]} @{$sth->fetchall_arrayref()})) . "\n"; } $sth->finish; return undef if $matrixstring eq "\n" x 4; return $matrixstring; } sub _get_latest_version { #DONE my ($self, $base_ID) = @_; # SELECT VERSION FROM MATRIX WHERE BASE_ID=? ORDER BY VERSION DESC LIMIT 1 my $sth = $self->dbh->prepare( qq!SELECT VERSION FROM MATRIX WHERE BASE_ID="$base_ID" ORDER BY VERSION DESC LIMIT 1! ); $sth->execute; my ($latest) = $sth->fetchrow_array(); return ($latest); } sub _get_internal_id { #DONE # picks out the internal id for a a stable id+ version. Also checks if this cobo exists or not my ($self, $base_ID, $version) = @_; # SELECT ID FROM MATRIX WHERE BASE_ID=? and VERSION=? my $sth = $self->dbh->prepare( qq!SELECT ID FROM MATRIX WHERE BASE_ID="$base_ID" AND VERSION="$version"! ); $sth->execute; my ($int_id) = $sth->fetchrow_array(); return ($int_id); } sub _get_Matrix_by_int_id { #done my ($self, $int_id, $mt) = @_; my $matrixobj; $mt = 'PFM' unless $mt; # get the matrix as a string my $matrixstring = $self->_get_matrixstring($int_id) || return undef; #get remaining data in the matrix table: name, collection my $sth = $self->dbh->prepare( qq!SELECT BASE_ID,VERSION, COLLECTION,NAME FROM MATRIX WHERE ID="$int_id"! 
); $sth->execute(); my ($base_ID, $version, $collection, $name) = $sth->fetchrow_array(); # jsp6 # get species ##$sth=$self->dbh->prepare(qq!SELECT TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"!); $sth = $self->dbh->prepare( qq!SELECT GROUP_CONCAT(TAX_ID SEPARATOR ', ') as TAX_ID FROM MATRIX_SPECIES WHERE ID="$int_id"! ); $sth->execute(); my @tax_ids; while (my ($res) = $sth->fetchrow_array()) { my @res_v = split(/,/, $res); my @res_v2 = grep(s/^\s*(.*)\s*$/\1/g, @res_v); push(@tax_ids, @res_v2); } # jsp6 # get acc ##$sth=$self->dbh->prepare(qq!SELECT ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"!); $sth = $self->dbh->prepare( qq!SELECT GROUP_CONCAT(ACC SEPARATOR ', ') as ACC FROM MATRIX_PROTEIN WHERE ID="$int_id"! ); $sth->execute(); my @accs; while (my ($res) = $sth->fetchrow_array()) { my @res_v = split(/,/, $res); my @res_v2 = grep(s/^\s*(.*)\s*$/\1/g, @res_v); push(@accs, @res_v2); } # jsp6 # get remaining annotation as tags, form ANNOTATION table my %tags; $sth = $self->dbh->prepare( qq{SELECT TAG, VAL FROM MATRIX_ANNOTATION WHERE ID = "$int_id" }); $sth->execute(); ## my @key_to_split=("acc", "medline", "pazar_tf_id"); #if acc in MATRIX_ANNOTATION #my @key_to_split=("medline", "pazar_tf_id"); # # Added 'class' and 'family' to keys to split. Since we have dimers that # may have different classes / families. Previously stored in DB as comma # separated lists, now store as separate records. # XXX But this breaks the interface code! FIXME XXX # DJA 2015/09/10 # #my @key_to_split = ("medline", "pazar_tf_id", "tfbs_shape_id", "tfe_id", "class", "family"); my @key_to_split = ("class", "family", "medline", "pazar_tf_id", "tfbs_shape_id", "tfe_id"); # # See FIXME comment below. # DJA 2015/09/14 # #foreach my $key (@key_to_split) { # $tags{$key} = ['-']; #} # # Fixed so that we can have multiple comma separated values, values in # separate rows of the table or a combination thereof. 
We really should # try to get away from having to specify which keys can be split (should # be handled more gracefully). # DJA 2015/09/14 # my $vals; while (my ($tag, $val) = $sth->fetchrow_array()) { $vals = []; if ($tag ~~ @key_to_split) { my @val_v = split(/,/, $val); my @val_v2 = grep(s/^\s*(.*)\s*$/\1/g, @val_v); #push(@$vals, @val_v2); #$tags{$tag} = $vals; push @{$tags{$tag}}, @val_v2; } else { $tags{$tag} = $val; } } # # XXX FIXME # This really doesn't belong here. It is done for the purposes of the web # interface but this is DB code which should not presume how the returned # data is going to be used. That should be handled in the JASPAR web code # modules. JASPAR web code currently expects all keys to split to exist! # DJA 2015/09/14 # XXX FIXME # foreach my $key (@key_to_split) { $tags{$key} = ['-'] unless $tags{$key}; } # jsp6 $tags{'collection'} = $collection; $tags{'species'} = \@tax_ids; # as array reference instead of strigifying $tags{'acc'} = \@accs; # same, if acc MATRIX_PROTEIN # my $class = $tags{'class'}; delete($tags{'class'}); # eval( "\$matrixobj = TFBS::Matrix::PFM->new" . ' ( -ID => "$base_ID.$version", -name => $name, -class => $class, -tags => \%tags, -matrixstring => $matrixstring # FIXME - temporary );' ); if ($@) { $self->throw($@); } # warn $int_id, "\t", ref($matrixobj); return ($matrixobj->to_PWM) if $mt eq "PWM"; return ($matrixobj->to_ICM) if $mt eq "ICM"; return ($matrixobj); # default PFM } ##jsp6 sub _get_IDlist_by_query { #needs cleanup. NOT for the faint-hearted. 
my ($self, %args) = @_; warn '_get_IDlist_by_query | $self || ', $self; warn '_get_IDlist_by_query | %args || ', %args; # called by get_MatrixSet # warn $args{"-collection"}; $args{'-collection'} = 'CORE' unless $args{'-collection'}; # returns a set of internal IDs with whicj to get the actual matrices # current idea: # 1: first catch non-tag things like collection, name and version, species # makw one query for these if they are named and check the IDs for "latest" unless requested not to. # these are AND statements # 2:then do the rest on tag level: # to be able to do this with actual and tattemnet innthe tag table, we do an inner join query, which is kept separate just for convenice # we then intersect 1 and 2 # 3: then do matrix-based features such as ic, with, number of sites etc, for the surviving matrices. This shold happen in the get_matrixset part my @int_ids_to_return; ## jsp6 - autosearch if ($args{'-auto'}) { ##my $sth=$self->dbh->prepare (qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); my $sth = $self->dbh->prepare( qq!SELECT U.ID FROM (SELECT ID, BASE_ID as VAL FROM MATRIX UNION ALL SELECT ID, NAME as VAL FROM MATRIX UNION ALL SELECT ID, ACC as VAL FROM MATRIX_PROTEIN UNION ALL SELECT ID, TAX_ID as VAL FROM MATRIX_SPECIES UNION ALL SELECT ID, SPECIES as VAL FROM MATRIX_SPECIES,TAX WHERE MATRIX_SPECIES.TAX_ID=TAX.TAX_ID UNION ALL SELECT ID, NAME as VAL FROM MATRIX_SPECIES,TAX_EXT WHERE MATRIX_SPECIES.TAX_ID=TAX_EXT.TAX_ID AND MATRIX_SPECIES.TAX_ID=9606 UNION ALL SELECT ID, VAL as VAL FROM MATRIX_ANNOTATION) AS U WHERE LOWER(`VAL`) LIKE LOWER(?)! ); warn '_get_IDlist_by_query | $sth || ', $sth; foreach my $stID (@{$args{'-auto'}}) { warn '_get_IDlist_by_query | $stID || ', $stID; my ($stable_ID, $version) = split(/\./, $stID) ; # ignore vesion here, this is a stupidity filter #$sth->execute($stable_ID); $sth->execute("%" . $stable_ID . 
"%"); while (my ($int_id) = $sth->fetchrow_array()) { warn '_get_IDlist_by_query | $int_id || ', $int_id; push(@int_ids_to_return, $int_id); } } return \@int_ids_to_return; } # should redo so that matrix_annotation queries are separate, with an intersect in the end #special case 1: get ALL matrices. Higher priority than all if ($args{'-all'}) { my $sth = $self->dbh->prepare(qq!SELECT ID FROM MATRIX!); $sth->execute(); my @a; while (my ($i) = $sth->fetchrow_array()) { push(@a, $i); } return \@a; } # ids: special case2 which is has higher priority than any other except the above (ignore all others if ($args{'-ID'}) { # these might be either stable IDs or stableid.version. # if just stable ID and if all_versions==1, take all versions, otherwise the latest if ($args{-all_versions}) { my $sth = $self->dbh->prepare( qq!SELECT ID FROM MATRIX WHERE BASE_ID=?!); foreach my $stID (@{$args{'-ID'}}) { my ($stable_ID, $version) = split(/\./, $stID) ; # ignore vesion here, this is a stupidity filter $sth->execute($stable_ID); while (my ($int_id) = $sth->fetchrow_array()) { push(@int_ids_to_return, $int_id); } } } else { # only the lastest version, or the requested version foreach my $stID (@{$args{'-ID'}}) { #warn $stID; my ($stable_ID, $version) = split(/\./, $stID); $version = $self->_get_latest_version($stable_ID) unless $version; my $int_id = $self->_get_internal_id($stable_ID, $version); push(@int_ids_to_return, $int_id) if $int_id; } } return \@int_ids_to_return; } my @tables = ("MATRIX M"); my @and; # in matrix table: collection, if ($args{-collection}) { my $q = ' (COLLECTION='; if (ref $args{-collection} eq "ARRAY") { # so, possibly several my @a; foreach (@{$args{-collection}}) { push(@a, "\"$_\""); } $q .= join(" or COLLECTION=", @a); } else { # just one - typical usage $q .= "\"$args{-collection}\""; } $q .= " ) "; push(@and, $q); } # in matrix table: names. 
    # This is basically only used from the web interface;
    # typically the get_matrix_by_name function is used instead
    if ($args{-name}) {
        my $q = ' (NAME=';
        if (ref $args{-name} eq "ARRAY") {    # so, possibly several
            my @a;
            foreach (@{$args{-name}}) {
                push(@a, "\"$_\"");
            }
            $q .= join(" or NAME=", @a);
        } else {    # just one - typical usage
            $q .= "\"$args{-name}\"";
        }
        $q .= " ) ";
        push(@and, $q);
    }

    # in species table: tax.id: possibly many species with OR in between
    if ($args{-species}) {
        push(@tables, "MATRIX_SPECIES S");
        my $q = " M.ID=S.ID and (TAX_ID= ";
        if (ref $args{-species} eq "ARRAY") {    # so, possibly several
            my @a;
            foreach (@{$args{-species}}) {
                push(@a, "\"$_\"");
            }
            $q .= join(" or TAX_ID=", @a);
        } else {    # just one - typical usage
            # the prefix above already ends in "TAX_ID= "; a second "="
            # here would produce invalid SQL
            $q .= "\"$args{-species}\"";
        }
        $q .= ") ";
        push(@and, $q);
    }

    # TAG_BASED
    # an internal join query: should be able to handle up to 26 tag-value
    # combos with ANDs in between
    # Very ugly code ahead:

    my (@inner_tables, @internal_ands1, @internal_ands2);
    my $int_counter = 0;    # for keeping track of names
    my @alpha = ("a" .. "z");
    my %arrayref;
    foreach my $key (keys %args) {
        next if $key eq "-min_ic";
        next if $key eq "-matrixtype";
        next if $key eq "-species";
        next if $key eq "-collection";
        next if $key eq "-all_versions";
        next if $key eq "-all";
        next if $key eq "-ID";
        next if $key eq "-length";
        next if $key eq "-name";

        my $oldkey = $key;
        $key =~ s/-//;
        $arrayref{$key} = $args{$oldkey};
    }

    if (%arrayref) {    # get an internal name for the table
        push(@internal_ands2, " M.ID=a.ID ");
        my @a;
        foreach my $key (keys %arrayref) {
            my $tname = $alpha[$int_counter];
            push(@inner_tables, "MATRIX_ANNOTATION $tname");
            push(@internal_ands1,
                      $alpha[$int_counter] . ".ID="
                    . $alpha[$int_counter - 1]
                    .
".ID") unless $int_counter == 0; $int_counter++; # is the thing aupplied an array reference in inteslf: make an "or" query from that if (ref $arrayref{$key} eq "ARRAY") { my @b; foreach (@{$arrayref{$key}}) { push(@b, $self->dbh->quote($_)); } my $orstring = join(" or $tname.VAL=", @b); push(@a, "($tname.TAG=\"$key\" AND ($tname.VAL=$orstring))"); } #or not else { push(@a, "($tname.TAG=\"$key\" AND $tname.VAL=\"$arrayref{$key}\")" ); } } my $s = " ( " . join(" AND ", @a) . ")"; push(@internal_ands2, $s); } my $qq = "SELECT distinct(M.ID) from " . join(",", (@tables, @inner_tables)) . " where" . join(" AND ", (@and, @internal_ands1, @internal_ands2)); # warn $qq; #do actual mammoth query,and check for latest matrix my $sth = $self->dbh->prepare($qq); $sth->execute(); my @r; while (my ($int_id) = $sth->fetchrow_array) { if ($args{-all_versions}) { push(@r, $int_id); } else { # is latest? push(@r, $int_id) if ($self->_is_latest_version($int_id) == 1); } } warn "Warning: Zero matrices returned with current critera" unless scalar @r; return \@r; } ## jsp6 - checkpoint sub _is_latest_version { # is a particular internal ID representingthe latest matrix (collapse on base ids) my ($self, $int_id) = @_; my $sth = $self->dbh->prepare( qq! select count(*) from MATRIX where BASE_ID= (SELECT BASE_ID from MATRIX where ID=$int_id) AND VERSION>(SELECT VERSION from MATRIX where ID=$int_id) ! ); $sth->execute(); my ($count) = $sth->fetchrow_array(); return (1) if $count == 0; # no matrices with higher version ID and same base id return (0); } sub _get_TFFM_latest_version { my ($self, $base_ID) = @_; # SELECT VERSION FROM TFFM WHERE BASE_ID=? ORDER BY VERSION DESC LIMIT 1 my $sth = $self->dbh->prepare( qq!SELECT VERSION FROM TFFM WHERE BASE_ID="$base_ID" ORDER BY VERSION DESC LIMIT 1! ); $sth->execute; my ($latest) = $sth->fetchrow_array(); return ($latest); } sub _get_TFFM_internal_id { # picks out the internal id for a stable id version. 
    # Also checks if this combo exists or not
    my ($self, $base_ID, $version) = @_;

    # SELECT ID FROM TFFM WHERE BASE_ID=? and VERSION=?
    my $sth = $self->dbh->prepare(
        qq!SELECT ID FROM TFFM WHERE BASE_ID="$base_ID" AND VERSION="$version"!
    );
    $sth->execute;
    my ($int_id) = $sth->fetchrow_array();
    return ($int_id);
}

sub _get_TFFM_by_int_id {    #done
    my ($self, $int_id) = @_;

    my $sth = $self->dbh->prepare(
        qq!
        SELECT BASE_ID, VERSION, MATRIX_BASE_ID, MATRIX_VERSION, NAME,
               LOG_P_1ST_ORDER, LOG_P_DETAILED, EXPERIMENT_NAME
        FROM TFFM
        WHERE ID = "$int_id"
        !
    );
    $sth->execute();
    my ($base_id, $version, $matrix_base_id, $matrix_version, $name,
        $log_p_1st_order, $log_p_detailed, $exp_name)
        = $sth->fetchrow_array();

    # declared once only; a second "my $tffm" would mask the first
    my $tffm;
    eval {
        $tffm = TFBS::TFFM->new(
            -ID              => "$base_id.$version",
            -name            => $name,
            -log_p_1st_order => $log_p_1st_order,
            -log_p_detailed  => $log_p_detailed,
            -experiment_name => $exp_name,
            #
            # OR we could retrieve the matrix and set the corresponding
            # attribute
            #
            -matrix_ID       => "$matrix_base_id.$matrix_version"
        )
    };

    if ($@) {
        $self->throw($@);
    }

    #
    # Instead of storing the matrix ID, get the related matrix and store in
    # the matrix attribute.
    #
    # my $matrix = $self->get_Matrix_by_ID(
    #     "$matrix_base_id.$matrix_version", 'PFM'
    # );
    #
    # $tffm->matrix($matrix);
    #

    return $tffm;
}

sub _is_TFFM_latest_version {
    my ($self, $int_id) = @_;
    my $sth = $self->dbh->prepare(
        qq!
        select count(*) from TFFM
        where BASE_ID = (SELECT BASE_ID from TFFM where ID = $int_id)
        AND VERSION > (SELECT VERSION from TFFM where ID = $int_id)
        !
); $sth->execute(); my ($count) = $sth->fetchrow_array(); # no TFFMs with higher version ID and same base id return (1) if $count == 0; return (0); } sub DESTROY { #OK $_[0]->dbh->disconnect() if $_[0]->dbh; } TFBS-0.7.1/TFBS/DB/LocalTRANSFAC.pm000077500000000000000000000167241305752266700160440ustar00rootroot00000000000000# TFBS module for TFBS::DB::LocalTRANSFAC # # Copyright Stephen Montgomery smontgom@bcgsc.bc.ca # # Contributors: Boris Lenhard, Leonardo Marino-Ramirez # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::DB::LocalTRANSFAC - interface to local transfac database position frequency matrices (matrix.dat) -------------------------------- NOTICE ---------------------------------- The TRANSFAC database is free for non-commercial use. For commercial use the TRANSFAC databases and programs have to be licensed. Please read the DISCLAIMER at http://transfac.gbf.de/TRANSFAC/disclaimer.htm. ------------------------------------------------------------------------- =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to TRANSFAC data my $db = TFBS::DB::LocalTRANSFAC->connect(-localdir => '/home/someusr'); localdir is the location of the matrix.dat TRANSFAC datafile =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('V$CEBPA_01','PFM'); #retrieving a PWM by TRANSFAC accession number my $pwm = $db->get_Matrix_by_acc('M00116', 'PWM'); =back =head1 DESCRIPTION TFBS::DB::LocalTRANSFAC is a read only database interface that fetches TRANSFAC matrix data from a local TRANSFAC install (matrix.dat) =cut package TFBS::DB::LocalTRANSFAC; use vars qw(@ISA); use strict; use TFBS::DB::TRANSFAC; use TFBS::Matrix::PFM; @ISA = qw(TFBS::DB::TRANSFAC); =head2 connect Title : connect Usage : my $db = TFBS::DB::TRANSFAC->connect(%args); Function: Creates a TRANSFAC database connection object, which can be used to retrieve matrices from a locally 
installed TRANSFAC database Returns : a TFBS::DB::TRANSFAC object Args : -localdir # REQUIRED: the directory of the matrix.dat TRANSFAC # datafile. matrix.dat must have read access. -accept_conditions # OPTIONAL: by setting this to a true # value, you confirm that you # have read and accepted the terms # of use of TRANSFAC at # http://transfac.gbf.de/TRANSFAC/disclaimer.htm; # this also suppresses the annoying # message that is printed to STDERR # upon invoking the method =cut sub connect { my ($caller, %args) = @_; my $self = bless { 'loc' => $args{'-localdir'}}, ref $caller || $caller; unless (defined ($args{-accept_conditions}) and $args{-accept_conditions}) { print STDERR <connect(-accept_conditions => 1); -------------------------------------------------------------------------- ENDNOTICE ; } unless (defined $args{-localdir}) { $self->throw("Need directory of TRANSFAC database"); } return $self; } =head2 get_Matrix_by_acc Title : get_Matrix_by_acc Usage : my $pfm = $db->get_Matrix_by_acc('V$CREB_01', 'PFM'); Function: fetches matrix data under the given TRANSFAC aaccession number from database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PFM is retrieved by default. 
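The lookup described above can be sketched without this module. The snippet below is a minimal illustration, assuming a matrix.dat-style record with AC, ID and NA lines followed by two-digit position lines holding the A, C, G and T counts. The record contents and the helper name parse_block are made up for illustration; they are not part of TFBS or TRANSFAC.

```perl
use strict;
use warnings;

# Hypothetical two-position record; accession, ID, name and counts are invented.
my $record = <<'END';
AC  M99999
ID  V$EXAMPLE_01
NA  Example
01      10      0      0      2      A
02       1      9      1      1      C
//
END

# Collect the identifiers and the four count rows (A, C, G, T) of one record.
sub parse_block {
    my ($text) = @_;
    my (%rec, @A, @C, @G, @T);
    for my $line (split /\n/, $text) {
        $rec{acc}  = $1 if $line =~ /^AC\s+(\S+)/;
        $rec{ID}   = $1 if $line =~ /^ID\s+(\S+)/;
        $rec{name} = $1 if $line =~ /^NA\s+(\S+)/;
        # position lines: two-digit index, then four numeric columns
        if ($line =~ /^\d{2}\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*)/) {
            push @A, $1;
            push @C, $2;
            push @G, $3;
            push @T, $4;
        }
    }
    return unless @A;    # no matrix rows found
    return { %rec, matrix => [ \@A, \@C, \@G, \@T ] };
}

my $parsed = parse_block($record);
print "$parsed->{acc} $parsed->{ID}\n";
print join(" ", @$_), "\n" for @{ $parsed->{matrix} };
```

The four array references are in the row order (A, C, G, T) that the -matrix argument of TFBS::Matrix::PFM->new receives in this module.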
=cut

sub get_Matrix_by_acc {
    my ($self, $acc, $mt) = @_;
    unless (defined $acc) {
        $self->throw("No parameters passed to get_Matrix_by_acc.");
    }
    my $datablock = _get_Matrix_Block ( 'acc' => $acc, 'loc' => $self->{'loc'});
    return $self->_get_Matrix_by_Block($datablock, $mt);
}

=head2 get_Matrix_by_ID

 Title   : get_Matrix_by_ID
 Usage   : my $pfm = $db->get_Matrix_by_ID('V$CREB_01', 'PFM');
 Function: fetches matrix data under the given TRANSFAC ID from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depending on the second argument (allowed
           values are 'PFM', 'ICM', and 'PWM'); returns undef if
           matrix with the given ID is not found
 Args    : (Matrix_ID, Matrix_type)
           Matrix_ID is a string; Matrix_type is one of the
           following:
           'PFM' (raw position frequency matrix),
           'ICM' (information content matrix) or
           'PWM' (position weight matrix)
           If Matrix_type is omitted, a PFM is retrieved by default.

=cut

sub get_Matrix_by_ID {
    my ($self, $ID, $mt) = @_;
    unless (defined $ID) {
        $self->throw("No parameters passed to get_Matrix_by_ID.");
    }
    my $datablock = _get_Matrix_Block ( 'ID' => $ID, 'loc' => $self->{'loc'});
    return $self->_get_Matrix_by_Block($datablock, $mt);
}

sub _get_Matrix_Block {
    my %params = @_;
    my $loc = $params{'loc'};
    my $acc = $params{'acc'};
    my $ID = $params{'ID'};

    $loc = $loc .
"/matrix.dat";
    open(HANDLE, $loc) || die ("File opening failed for matrix.dat: Check file permissions");
    my @raw_data = <HANDLE>;

    my @block = ();
    my $hit = 0;
    foreach my $line (@raw_data) {
        if ($line eq "//\n") {
            foreach my $lineinblock (@block) {
                if (defined $ID) {
                    if ($lineinblock eq "ID $ID\n") { $hit = 1 };
                }
                if (defined $acc) {
                    if ($lineinblock eq "AC $acc\n") { $hit = 1 };
                }
            }
            if ($hit == 0) {
                @block = ();
            }
        }
        if ($hit == 0) {
            push @block, $line;
        }
    }
    close(HANDLE);
    return \@block;
}

sub _get_Matrix_by_Block {
    my ($self, $datablock, $mt) = @_;

    my @datalines = @$datablock;

    my (@As, @Cs, @Gs, @Ts, $name, $ID, $acc);

    foreach my $line (@datalines) {
        if ($line =~ /NA\s+(\S+)\n/) {
            $name = $1;
        }
        if ($line =~ /ID\s+(\S+)\n/) {
            $ID = $1;
        }
        if ($line =~ /AC\s+(\S+)\n/) {
            $acc = $1;
        }
        #if ($line =~ /\d{2}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\n/) {
        # change to allow for both older and newer file format
        # contributed by Leonardo Marino-Ramirez:
        # Updated 2003-09-05 to enable parsing of non-integer entries
        if ($line =~ /^\d{2}\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*).*$/) {
            push @As, $1;
            push @Cs, $2;
            push @Gs, $3;
            push @Ts, $4;
        }
    }
    return undef unless @As;
    my $pfm = TFBS::Matrix::PFM-> new ( -ID => $ID,
                                        -name => $name,
                                        -tags => {acc=>$acc},
                                        -matrix => [ \@As, \@Cs, \@Gs, \@Ts]
                                      );
    if (!defined($mt) or uc($mt) eq "PFM") {return $pfm;}
    elsif (uc($mt) eq "ICM") {return $pfm->to_ICM;}
    elsif (uc($mt) eq "PWM") {return $pfm->to_PWM;}
    else {
        $self->throw("Unrecognized matrix format: $mt");
    }
}

1;
TFBS-0.7.1/TFBS/DB/TRANSFAC.pm000077500000000000000000000165371305752266700150730ustar00rootroot00000000000000
# TFBS module for TFBS::DB::TRANSFAC
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::DB::TRANSFAC - interface to database of TRANSFAC public
position frequency matrices at TESS (http://www.cbil.upenn.edu/tess)

-------------------------------- NOTICE ----------------------------------
The TRANSFAC
database is free for non-commercial use. For commercial use the TRANSFAC databases and programs have to be licensed. Please read the DISCLAIMER at http://transfac.gbf.de/TRANSFAC/disclaimer.htm. ------------------------------------------------------------------------- =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to TRANSFAC data my $db = TFBS::DB::TRANSFAC->connect(); =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('V$CEBPA_01','PFM'); #retrieving a PWM by TRANSFAC accession number my $pwm = $db->get_Matrix_by_acc('M00116', 'PWM'); =back =head1 DESCRIPTION TFBS::DB::TRANSFAC is a read only database interface that fetches TRANSFAC matrix data from TESS web interface (http://www.cbil.upen.edu/TESS) and returns TFBS::Matrix::* objects. =cut package TFBS::DB::TRANSFAC; use vars qw(@ISA $ua); use strict; use Bio::Root::Root; use TFBS::Matrix::PFM; use LWP::Simple qw($ua get); @ISA = qw(TFBS::DB Bio::Root::Root); =head2 connect Title : connect Usage : my $db = TFBS::DB::TRANSFAC->connect(%args); Function: Creates a TRANSFAC database connection object, which can be used to retrieve matrices from public TRANSFAC databases via the web Returns : a TFBS::DB::TRANSFAC object Args : -proxy # OPTIONAL: a http proxy server name, # usually required for accessing TRANSFAC from behind # a firewall -accept_conditions # OPTIONAL: by setting this to a true # value, you confirm that you # have read and accepted the terms # of use of TRANSFAC at # http://transfac.gbf.de/TRANSFAC/disclaimer.htm; # this also suppresses the annoying # message that is printed to STDERR # upon invoking the method =cut sub connect { my ($caller, %args) = @_; my $self = bless {}, ref $caller || $caller; unless (defined ($args{-accept_conditions}) and $args{-accept_conditions}) { print STDERR <connect(-accept_conditions => 1); -------------------------------------------------------------------------- ENDNOTICE ; } 
    if (defined $args{'-proxy'}) {
        $ua->proxy('http',$args{'-proxy'});
    }
    return $self;
}

=head2 new

 Title   : new
 Usage   : my $db = TFBS::DB::TRANSFAC->new(%args);
 Function: Here, I<new> is just a synonym for I<connect>
           (to make the interface consistent with other
           bioperl read-only Bio::DB::* objects)
 Returns : a TFBS::DB::TRANSFAC object
 Args    : -accept_conditions # see explanation at I<connect>

=cut

sub new {
    my ($caller, %args) = @_;
    $caller->connect(%args);
}

=head2 get_Matrix_by_ID

 Title   : get_Matrix_by_ID
 Usage   : my $pfm = $db->get_Matrix_by_ID('V$CREB_01', 'PFM');
 Function: fetches matrix data under the given TRANSFAC ID from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depending on the second argument (allowed
           values are 'PFM', 'ICM', and 'PWM'); returns undef if
           matrix with the given ID is not found
 Args    : (Matrix_ID, Matrix_type)
           Matrix_ID is a string; Matrix_type is one of the
           following:
           'PFM' (raw position frequency matrix),
           'ICM' (information content matrix) or
           'PWM' (position weight matrix)
           If Matrix_type is omitted, a PFM is retrieved by default.
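The handling of the optional Matrix_type argument can be sketched in isolation. This is a minimal illustration, assuming the same three type names and case-insensitive matching as the module's code; the helper name resolve_matrix_type is hypothetical and does not exist in TFBS.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise the optional Matrix_type argument.
my %known_type = map { $_ => 1 } qw(PFM ICM PWM);

sub resolve_matrix_type {
    my ($mt) = @_;
    return 'PFM' unless defined $mt;    # omitted: a PFM is the default
    my $type = uc $mt;                  # 'pwm' and 'PWM' are treated alike
    die "Unrecognized matrix format: $mt\n" unless $known_type{$type};
    return $type;
}

print resolve_matrix_type(), "\n";
print resolve_matrix_type('pwm'), "\n";
```

An unknown type is reported with die here; the module itself raises the same message through its throw method.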
=cut

sub get_Matrix_by_ID {
    my ($self, $ID, $mt) = @_;
    unless (defined $ID) {
        $self->throw("No parameters passed to get_Matrix_by_ID.");
    }
    my $url = "http://www.cbil.upenn.edu/cgi-bin/tess/tess33?request=MTX-DBRTRV-Id&key=$ID";
    return $self->_get_Matrix_by_URL($url, $mt);
}

=head2 get_Matrix_by_acc

 Title   : get_Matrix_by_acc
 Usage   : my $pfm = $db->get_Matrix_by_acc('M00116', 'PFM');
 Function: fetches matrix data under the given TRANSFAC
           accession number from the database and returns a
           TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depending on the second argument (allowed
           values are 'PFM', 'ICM', and 'PWM'); returns undef if
           matrix with the given accession number is not found
 Args    : (Accession, Matrix_type)
           Accession is a string; Matrix_type is one of the
           following:
           'PFM' (raw position frequency matrix),
           'ICM' (information content matrix) or
           'PWM' (position weight matrix)
           If Matrix_type is omitted, a PFM is retrieved by default.
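For readers unfamiliar with the 'ICM' type named above, the following sketch computes the information content of a single matrix column from raw counts. It assumes the plain Shannon form with a uniform background and no small-sample correction; the conversion actually performed by the module's to_ICM method may differ, and the function name column_ic is made up for illustration.

```perl
use strict;
use warnings;

# Information content (in bits) of one column, given counts for A, C, G, T.
# Assumption: uniform background, no small-sample correction.
sub column_ic {
    my @counts = @_;
    my $total = 0;
    $total += $_ for @counts;
    return 0 unless $total > 0;
    my $ic = 2;    # log2(4), the maximum for a four-letter alphabet
    for my $n (@counts) {
        next unless $n > 0;    # a zero count contributes nothing
        my $p = $n / $total;
        $ic += $p * log($p) / log(2);    # add p * log2(p), which is <= 0
    }
    return $ic;
}

printf "%.3f\n", column_ic(10, 0, 0, 0);    # fully conserved column
printf "%.3f\n", column_ic(5, 5, 0, 0);     # two equally likely bases
```

A fully conserved column reaches the 2-bit maximum, and a column with four equal counts carries no information.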
=cut

sub get_Matrix_by_acc {
    my ($self, $acc, $mt) = @_;
    unless (defined $acc) {
        $self->throw("No parameters passed to get_Matrix_by_acc.");
    }
    my $url = "http://www.cbil.upenn.edu/cgi-bin/tess/tess33?request=MTX-DBRTRV-Accno&key=$acc";
    return $self->_get_Matrix_by_URL($url, $mt);
}

sub get_MatrixSet {
    my ($self, %args) = @_;
    # not yet implemented
}

sub _get_Matrix_by_URL {
    my ($self, $url, $mt) = @_;
    # "or" binds more loosely than "=", so a failed fetch really returns undef
    my $HTMLpage = get($url) or return undef;
    my (@As, @Cs, @Gs, @Ts, $name, $ID, $acc);
    my @lines = split "\n", $HTMLpage;
    foreach my $line (@lines) {
        $line =~ s/\r//;
        $line =~ s/<\/{0,1}b>//gi;
        $line =~ s/ //gi;
        if ($line =~ /Name<\/td>([^<]+)([^<]+)([^<]+) new ( -ID => $ID, -name => $name, -tags => {acc=>$acc}, -matrix => [ \@As, \@Cs, \@Gs, \@Ts] );
    if (!defined($mt) or uc($mt) eq "PFM") { return $pfm; }
    elsif (uc($mt) eq "ICM") { return $pfm->to_ICM; }
    elsif (uc($mt) eq "PWM") { return $pfm->to_PWM; }
    else {
        $self->throw("Unrecognized matrix format: $mt");
    }
}

1;
TFBS-0.7.1/TFBS/Matrix.pm000077500000000000000000000252111305752266700146360ustar00rootroot00000000000000
# TFBS module for TFBS::Matrix
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix - base class for matrix patterns, containing methods common
to all

=head1 DESCRIPTION

TFBS::Matrix is a base class consisting of a universal constructor called by
its subclasses (TFBS::Matrix::*), and matrix manipulation methods that are
independent of the matrix type. It is not meant to be instantiated itself.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

Modified by Eivind Valen eivind.valen@gmail.com

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
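One of the input formats handled by the base-class constructor (see set_matrix and the %dna table in the code of this module) is an IUPAC consensus string, where each letter is expanded into a column of counts that sums to 12. The following stand-alone sketch illustrates that expansion under stated assumptions: C<iupac_to_matrix> is a hypothetical function written for this example, it covers only a subset of the IUPAC codes, and unlike the module it rejects unknown letters explicitly.

```perl
use strict;
use warnings;

# Subset of the IUPAC table: counts for A, C, G, T (each column sums to 12).
my %iupac = (
    A => [12, 0, 0, 0], C => [0, 12, 0, 0],
    G => [0, 0, 12, 0], T => [0, 0, 0, 12],
    R => [6, 0, 6, 0],  Y => [0, 6, 0, 6],
    N => [3, 3, 3, 3],
);

# Hypothetical helper: consensus string -> reference to a 4 x N count matrix
# (rows in A, C, G, T order, one column per consensus letter).
sub iupac_to_matrix {
    my ($consensus) = @_;
    my @letters = split //, uc $consensus;
    for my $letter (@letters) {
        die "Unsupported IUPAC code: $letter\n" unless exists $iupac{$letter};
    }
    my @matrix;
    for my $row (0 .. 3) {
        push @matrix, [ map { $iupac{$_}[$row] } @letters ];
    }
    return \@matrix;
}

my $matrix = iupac_to_matrix('TGAYN');
print join(' ', @$_), "\n" for @$matrix;
```

Using 12 as the column total keeps every ambiguity code (two, three or four possible bases) representable with whole-number counts.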
=cut

# The code begins HERE:

package TFBS::Matrix;
use vars '@ISA';
use PDL; # this dependency has to be eliminated in the future versions
use TFBS::PatternI;

use strict;

@ISA = qw(TFBS::PatternI);

# IUPAC codes
my %dna = ("A" => [12,0,0,0],
           "C" => [0,12,0,0],
           "G" => [0,0,12,0],
           "T" => [0,0,0,12],
           "U" => [0,0,0,12],
           "R" => [6,0,6,0],
           "Y" => [0,6,0,6],
           "M" => [6,6,0,0],
           "K" => [0,0,6,6],
           "W" => [6,0,0,6],
           "S" => [0,6,6,0],
           "B" => [0,4,4,4],
           "D" => [4,0,4,4],
           "H" => [4,4,0,4],
           "V" => [4,4,4,0],
           "N" => [3,3,3,3]);

sub new {
    my $class = shift;
    my %args = @_;
    my $self = bless {}, ref($class) || $class;

    # first figure out how it was called
    # we need (-dbh and (-ID or -name) for fetching it from a database
    # or -matrix for direct matrix input

    if (defined $args{'-matrix'}) {
        $self->set_matrix($args{'-matrix'});
    }
    elsif (defined $args{'-matrixstring'}) {
        $self->set_matrix($args{'-matrixstring'});
    }
    elsif (defined $args{-matrixfile}) {
        my $matrixstring;
        open (FILE, $args{-matrixfile})
            or $self->throw("Could not open $args{-matrixfile}");
        {
            local $/ = undef;           # slurp the whole file at once
            $matrixstring = <FILE>;
        }
        $self->set_matrix($matrixstring);
    }
    else {
        $self->throw("No matrix or db object provided.");
    }

    # Set the object data.
    # Parameters specified in constructor call override those
    # fetched from the database.

    $self->{'ID'}     = ($args{-ID}     or $self->{ID}     or "Unknown");
    $self->{'name'}   = ($args{-name}   or $self->{name}   or "Unknown");
    $self->{'class'}  = ($args{-class}  or $self->{class}  or "Unknown");
    $self->{'strand'} = ($args{-strand} or $self->{strand} or "+");
    $self->{'bg_probabilities'} =
        ($args{'-bg_probabilities'} || {A => 0.25, C => 0.25, G => 0.25, T => 0.25});
    $self->{'tags'} = $args{-tags} ? ((ref($args{-tags}) eq "HASH") ?
                                       $args{-tags} : {} ) :{};
    return $self;
}

=head2 matrix

 Title   : matrix
 Usage   : my $matrix = $pwm->matrix();
           $pwm->matrix( [ [12, 3, 0, 0, 4, 0],
                           [ 0, 0, 0,11, 7, 0],
                           [ 0, 9,12, 0, 0, 0],
                           [ 0, 0, 0, 1, 1,12] ]);
 Function: get/set for the matrix data
 Returns : a reference to a 2D array of integers (PFM) or floats (ICM, PWM)
 Args    : none for get;
           a four-line string, a reference to a 2D array, or a 2D piddle for set

=cut

sub matrix {
    my ($self, $matrixdata) = @_;
    $self->set_matrix($matrixdata) if $matrixdata;
    return $self->{'matrix'};
}

=head2 pdl_matrix

 Title   : pdl_matrix
 Usage   : my $pdl = $pwm->pdl_matrix();
 Function: access the PDL matrix used to store the actual
           matrix data directly
 Returns : a PDL object, aka a piddle
 Args    : none

=cut

sub pdl_matrix {
    pdl $_[0]->{'matrix'};
}

sub set_matrix {
    my ($self, $matrixdata) = @_;

    # The input matrix (specified as -matrix => in the constructor call)
    # can either be
    #   * a 2D regular perl array with 4 rows,
    #   * a piddle (FIXME - check for 4 rows), or
    #   * a four-line string of numbers

    # print STDERR "MATRIX>>>".$matrixdata;
    if (ref($matrixdata) eq "ARRAY"
        and ref($matrixdata->[0]) eq "ARRAY"
        and scalar(@{$matrixdata}) == 4)
    {
        # it is a perl array
        $self->{'matrix'} = $matrixdata;
    }
    elsif (ref($matrixdata) eq "PDL") {
        # it's a piddle
        $self->{matrix} = _pdl_to_matrixref($matrixdata);
    }
    elsif (!ref($matrixdata))
           #and (scalar split "\n",$matrixdata) == 4)
    {
        # it's a string then, but what string?
        if ($matrixdata =~ /^DE/) {
            # STAMP string
            $self->{matrix} = $self->_matrix_from_STAMP_string($matrixdata);
        }
        elsif ($matrixdata =~ /[ACGTURYMKWSBDHVNacgturymkwsbdhvn]{3,}/) {
            $self->{matrix} = $self->_matrix_from_IUPAC_string($matrixdata);
        }
        else {
            # Regular string
            $self->{matrix} = $self->_matrix_from_string($matrixdata);
        }
    }
    else {
        $self->throw("Wrong data type/format for -matrix.\n".
                     "Acceptable formats are Array of Arrays (4 rows),\n".
                     "PDL Array, (4 rows),\n".
"or plain string (4 lines)."); } # $self->_set_min_max_score(); # print STDERR $self->prettyprint(); return 1; } sub _matrix_from_IUPAC_string { my ($self, $matrixstring) = @_; $matrixstring =~ s/^\s+//; $matrixstring =~ s/\s+$//; my @str = split(//, $matrixstring); my @matrix; for my $ltr (0..3) { my @l = map {$dna{uc($_)}->[$ltr]} @str; push @matrix, \@l; } return \@matrix; } sub _matrix_from_string { my ($self, $matrixstring) = @_; my @array = (); foreach ((split "\n", $matrixstring)[0..3]) { s/^\s+//; s/\s+$//; push @array, [split]; } return \@array; } sub _matrix_from_STAMP_string { my ($self, $matrixstring) = @_; my @lines = split("\n", $matrixstring); my (@a, @c, @g, @t); # Remove garbage shift(@lines); pop(@lines); for (@lines) { my @l = split; push @a, $l[1]; push @c, $l[2]; push @g, $l[3]; push @t, $l[4]; } return [\@a, \@c, \@g, \@t]; } sub _set_min_max_score { my ($self) = @_; my $transpose = $self->pdl_matrix->xchg(0,1); $self->{min_score} = sum(minimum $transpose); $self->{max_score} = sum(maximum $transpose); } sub _load { my ($self, $field, $value) = @_; if (substr(ref($self->{db}),0,5) eq "DBI::") { # database retrieval } elsif (-d $self->{dbh}) { # retrieval from .pwm files in a directory $self->_lookup_in_matrixlist($field, $value) or do { warn ("Matrix with $field=>$value not found."); return undef; }; my $ID = $self->{ID}; my $DIR = $self->{dbh}; $self->set_matrix(scalar `cat $DIR/$ID.pwm`); # FIXME - temporary } else { $self->throw("-dbh is not a valid database handle or a directory."); } } =head2 revcom Title : revcom Usage : my $revcom_pfm = $pfm->revcom(); Function: create a matrix pattern object which is reverse complement of the current one Returns : a TFBS::Matrix::* object of the same type as the one the method acted upon Args : none =cut sub revcom { my ($self) = @_; my $revcom_matrix = $self->new(-matrix => $self->pdl_matrix->slice('-1:0,-1:0'), # the above line rotates the original matrix 180 deg, -ID => ($self->{ID} or ""), -name => 
                            ($self->{name} or ""),
                            -class  => ($self->{class} or ""),
                            -strand => ($self->{strand} and $self->{strand} eq "-") ? "+" : "-",
                            -tags   => ($self->{tags} or {}) );
    return $revcom_matrix;
}

=head2 rawprint

 Title   : rawprint
 Usage   : my $rawstring = $pfm->rawprint();
 Function: convert matrix data to a simple tab-separated format
 Returns : a four-line string of tab-separated integers or floats
 Args    : none

=cut

sub rawprint {
    my $self = shift;
    my $pwmstring = sprintf ( $self->pdl_matrix );
    $pwmstring =~ s/\[|\]//g;    # lose []
    $pwmstring =~ s/\n /\n/g;    # lose leading spaces
    my @pwmlines = split("\n", $pwmstring); # f
    $pwmstring = join ("\n", @pwmlines[2..5])."\n";
    return $pwmstring;
}

=head2 prettyprint

 Title   : prettyprint
 Usage   : my $prettystring = $pfm->prettyprint();
 Function: convert matrix data to a human-readable string format
 Returns : a four-line string with nucleotides and aligned numbers
 Args    : none

=cut

sub prettyprint {
    my $self = shift;
    my $pwmstring = sprintf ( $self->pdl_matrix );
    $pwmstring =~ s/\[|\]//g;    # lose []
    $pwmstring =~ s/\n /\n/g;    # lose leading spaces
    my @pwmlines = split("\n", $pwmstring); #
    @pwmlines = ("A [$pwmlines[2] ]",
                 "C [$pwmlines[3] ]",
                 "G [$pwmlines[4] ]",
                 "T [$pwmlines[5] ]");
    $pwmstring = join ("\n", @pwmlines)."\n";
    return $pwmstring;
}

=head2 STAMPprint

 Title   : STAMPprint
 Usage   : my $STAMPstring = $pfm->STAMPprint();
 Function: convert the matrix to a STAMP-readable format
 Returns : a string of the matrix in a TRANSFAC-like format for STAMP
 Args    : none

=cut

sub STAMPprint {
    my $self = shift;
    my $pwmstring = sprintf ( $self->pdl_matrix );
    $pwmstring =~ s/\[|\]//g;      # lose []
    $pwmstring =~ s/\n\s+/\n/g;    # lose leading spaces
    my @pwmlines = split(/\n+/, $pwmstring);
    @pwmlines = map { [ split(/\s+/, $_) ] } @pwmlines;

    $pwmstring = "DE ".$self->{'ID'}."\t".$self->{'name'}."\t".$self->{'class'}."\n";
    for my $row (0..$#{ $pwmlines[1] }) {
        $pwmstring .= sprintf "%02d\t", $row;
        $pwmstring .= $pwmlines[$_]->[$row]."\t" for (1..$#pwmlines);
        $pwmstring
.= "\n"; } $pwmstring .= "XX\n"; return $pwmstring; } =head2 length Title : length Usage : my $pattern_length = $pfm->length; Function: gets the pattern length in nucleotides (i.e. number of columns in the matrix) Returns : an integer Args : none =cut sub length { my $self = shift; return $self->pdl_matrix->getdim(0); } sub _pdl_to_matrixref { my ($matrixdata) = @_; unless ($matrixdata->isa("PDL")) { die "A non-PDL object passed to _pdl_to_matrixref"; } my @list = list $matrixdata; my @array; my $matrix_width = scalar(@list) / 4; for (0..3) { push @array, [splice(@list, 0, $matrix_width)]; } return \@array; } =head2 randomize_columns Title : randomize_columns Usage : $pfm->randomize_columns(); Function: Randomizes the columns of a matrix (in place). Returns : Nothing Args : none =cut sub randomize_columns { my $self = shift; my $matrix = $self->{'matrix'}; my $i = 0; # Schwartzian Transform to get random permutation map { ( undef, $$matrix[0][$i], $$matrix[1][$i], $$matrix[2][$i], $$matrix[3][$i] ) = @$_; $i++; } sort { $a->[0] <=> $b->[0] } map { [ rand(), $$matrix[0][$_], $$matrix[1][$_], $$matrix[2][$_], $$matrix[3][$_] ] } ( 0 .. 
($self->length()-1) ); } sub DESTROY { # nothing } 1; TFBS-0.7.1/TFBS/Matrix/000077500000000000000000000000001305752266700142745ustar00rootroot00000000000000TFBS-0.7.1/TFBS/Matrix/.svn/000077500000000000000000000000001305752266700151605ustar00rootroot00000000000000TFBS-0.7.1/TFBS/Matrix/.svn/all-wcprops000077500000000000000000000011061305752266700173470ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 40 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Matrix END ICM.pm K 25 svn:wc:ra_dav:version-url V 47 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Matrix/ICM.pm END _Alignment.pm K 25 svn:wc:ra_dav:version-url V 54 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Matrix/_Alignment.pm END Alignment.pm K 25 svn:wc:ra_dav:version-url V 53 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Matrix/Alignment.pm END PFM.pm K 25 svn:wc:ra_dav:version-url V 47 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Matrix/PFM.pm END PWM.pm K 25 svn:wc:ra_dav:version-url V 47 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Matrix/PWM.pm END TFBS-0.7.1/TFBS/Matrix/.svn/entries000077500000000000000000000014501305752266700165570ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/Matrix http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df ICM.pm file 2009-08-07T13:10:46.000000Z 82d50dd52a35b43eb6db89437ff750a7 2008-01-24T20:21:25.772223Z 8 chrb _Alignment.pm file 2009-08-07T13:10:46.000000Z 62a9aba220bba9fa97a324e7f0105a7d 2008-01-24T20:21:25.772223Z 8 chrb Alignment.pm file 2009-08-07T13:10:46.000000Z 43a5ee65c66ae28efd38e6343eb47c6f 2008-01-24T20:21:25.772223Z 8 chrb PFM.pm file 2009-08-07T13:10:47.000000Z cd3d067545004c1a5df52903395592ca 2008-01-24T20:21:25.772223Z 8 chrb PWM.pm file 2009-08-07T13:10:47.000000Z 5244969e644beb349a298d19d404c9c1 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/Matrix/.svn/format000077500000000000000000000000021305752266700163660ustar00rootroot000000000000008 
TFBS-0.7.1/TFBS/Matrix/.svn/text-base/000077500000000000000000000000001305752266700170545ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/Matrix/.svn/text-base/Alignment.pm.svn-base000077500000000000000000000161101305752266700230470ustar00rootroot00000000000000
# TFBS module for TFBS::Matrix::Alignment
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix::Alignment - class for alignment of PFM objects

=head1 SYNOPSIS

=over 1

=item * Making an alignment:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and
retrieve TFBS::Matrix::* objects from them.)

    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pfm1 = $db_obj->get_Matrix_by_ID("M0001", "PFM");
    my $pfm2 = $db_obj->get_Matrix_by_ID("M0002", "PFM");

    my $alignment = new TFBS::Matrix::Alignment(
                        -pfm1   => $pfm1,
                        -pfm2   => $pfm2,
                        -binary => "/TFBS/Ext/matrix_aligner",
                    );

=back

=head1 DESCRIPTION

TFBS::Matrix::Alignment is a class for representing and performing
pairwise alignments of profiles (in the form of TFBS::Matrix::PFM objects).
Alignments are performed using a semi-global variant of the
Needleman-Wunsch algorithm that only permits the opening of one internal gap.
For reference, the algorithm is described in Sandelin et al.,
Funct Integr Genomics. 2003 Jun 25.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
=cut

# The code starts HERE:

package TFBS::Matrix::Alignment;
use vars '@ISA';

use strict;
use Bio::Root::Root;
use TFBS::Matrix;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

# alignment methods: for making and storing a single matrix alignment

=head2 new

 Title   : new
 Usage   : my $alignment = TFBS::Matrix::Alignment->new(%args)
 Function: constructor for the TFBS::Matrix::Alignment object
 Returns : a new TFBS::Matrix::Alignment object
 Args    : # you must specify:
           -pfm1,         # a TFBS::Matrix::PFM object
           -pfm2,         # another TFBS::Matrix::PFM object
           -binary,       # a valid path to the comparison algorithm (matrixalign)
           #######
           -ext_penalty   # OPTIONAL gap extension penalty in the
                          # Needleman-Wunsch algorithm. Default 0.01
           -open_penalty, # OPTIONAL gap opening penalty in the
                          # Needleman-Wunsch algorithm. Default 3.0

=cut

sub new {
    # defines and creates an alignment
    # args: two pfm objects
    #       binary file
    #       optional scoring penalties

    my ($class, %args) = @_;
    my $self = {
        _pfm1              => $args{'-pfm1'},
        _pfm2              => $args{'-pfm2'},
        _ext_penalty       => $args{'-ext_penalty'}  || 0.01,
        _open_penalty      => $args{'-open_penalty'} || 3.00,
        _strand            => '',
        _align_string      => '',
        _gaps              => '',
        _aligned_positions => '',
        _score             => '',
    };
    bless $self, "TFBS::Matrix::Alignment";

    # errorcheck:

    # save temp files
    # ("or" is used so that die fires when print itself fails)
    my ($fh1, $file1) = tmpnam();
    print $fh1 $args{'-pfm1'}->rawprint()
        or die " Cannot save temporary files for alignment";
    my ($fh2, $file2) = tmpnam();
    print $fh2 $args{'-pfm2'}->rawprint()
        or die " Cannot save temporary files for alignment";

    # align
    my @pfm1_string;
    my @pfm2_string;
    foreach (`$args{'-binary'} $file1 $file2 $self->{'_open_penalty'} $self->{'_ext_penalty'}`) {
        # my @pfm2_string;
        my $max_length = $self->{'_pfm1'}->length();
        $max_length = $self->{'_pfm2'}->length()
            if ( $self->{'_pfm2'}->length() > $self->{'_pfm1'}->length());

        if (/^PFM1/) {
            s/PFM1//;
            s/\t0/\t-/g;
            @pfm1_string = split();
            next;
        }
        if (/^PFM2/) {
            s/PFM2//;
            s/\t0/\t-/g;
            @pfm2_string = split();
            next;
        }
        if (/^INFO/) {
            my @temp = split;
            ($self->{'_score'},
$self->{'_strand'}, $self->{'_aligned_positions'}, $self->{'_gaps'})= ($temp[3], $temp[6], $temp[7],$temp[8]); next; } } my $string= ($self->{'_pfm1'}->name()||$self->{'_pfm1'}->ID()||'PFM1')."\t\t"; my $string2=($self->{'_pfm2'}->name()||$self->{'_pfm2'}->ID()||'PFM2')."\t\t";; if ($pfm1_string[0]==1){ $string.="-\t" x ($pfm2_string[0]-1); foreach (my $j=1; $j< $pfm2_string[0]; $j++){ $string2.="$j\t"; } } if ($pfm2_string[0]==1){ $string2.="-\t" x ($pfm1_string[0]-1); for (my $j=1; $j< $pfm1_string[0]; $j++){ $string.="$j\t"; } } $string.= join("\t", @pfm1_string); $string2.= join("\t", @pfm2_string); if ($pfm1_string[-1]==$self->{'_pfm1'}->length()){ $string.="\t-" x ($self->{'_pfm2'}->length()-$pfm2_string[-1]); for (my $j=$pfm2_string[-1]+1; $j<= $self->{'_pfm2'}->length(); $j++){ $string2.="\t$j"; } } if ($pfm2_string[-1]==$self->{'_pfm2'}->length()){ $string2.="\t-" x ($self->{'_pfm1'}->length()-$pfm1_string[-1]); for (my $j=$pfm1_string[-1]+1; $j<= $self->{'_pfm1'}->length(); $j++){ $string.="\t$j"; } } $self->{'_align_string'}= $string ."\n". 
                              $string2;

    return $self;
}

# access functions

=head2 score

 Title   : score
 Usage   : my $score = $alignmentobject->score();
 Function: access an alignment score (where each aligned position can
           contribute at most 2)
 Returns : a floating point number
 Args    : none

=cut

=head2 gaps

 Title   : gaps
 Usage   : my $nr_of_gaps = $alignmentobject->gaps();
 Function: access the number of gaps in an alignment
 Returns : an integer
 Args    : none

=cut

=head2 length

 Title   : length
 Usage   : my $length = $alignmentobject->length();
 Function: access the length of an alignment (i.e. the number of
           aligned positions)
 Returns : an integer
 Args    : none

=cut

=head2 strand

 Title   : strand
 Usage   : my $strand = $alignmentobject->strand();
 Function: access the orientation of the aligned patterns:
           ++ = oriented as input
           +- = second pattern is reverse-complemented
 Returns : a string
 Args    : none

=cut

=head2 alignment

 Title   : alignment
 Usage   : my $alignment_string = $alignmentobject->alignment();
 Function: access a string describing the alignment
 Returns : a string, where each number refers to a position in the
           respective PFM. Position numbering follows the orientation:
           i.e. if the second profile is reversed, position 1
           corresponds to the last position in the input profile.
           Gaps are denoted as - .
RXR-VDR - 1 2 3 - 4 5 - PPARgamma-RXRal 1 2 3 4 5 6 7 8 Args : none =cut sub gaps{ return $_[0]->{'_gaps'};} sub score{ return $_[0]->{'_score'};} sub length{ return $_[0]->{'_aligned_positions'};} sub strand{ return $_[0]->{'_strand'};} sub alignment{ return $_[0]->{'_align_string'};} 1 TFBS-0.7.1/TFBS/Matrix/.svn/text-base/ICM.pm.svn-base000077500000000000000000000773711305752266700215610ustar00rootroot00000000000000# TFBS module for TFBS::Matrix::ICM # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix::ICM - class for information content matrices of nucleotide patterns =head1 SYNOPSIS =over 4 =item * creating a TFBS::Matrix::ICM object manually: my $matrixref = [ [ 0.00, 0.30, 0.00, 0.00, 0.24, 0.00 ], [ 0.00, 0.00, 0.00, 1.45, 0.42, 0.00 ], [ 0.00, 0.89, 2.00, 0.00, 0.00, 0.00 ], [ 0.00, 0.00, 0.00, 0.13, 0.06, 2.00 ] ]; my $icm = TFBS::Matrix::ICM->new(-matrix => $matrixref, -name => "MyProfile", -ID => "M0001" ); # or my $matrixstring = <new(-matrixstring => $matrixstring, -name => "MyProfile", -ID => "M0001" ); =item * retrieving a TFBS::Matix::ICM object from a database: (See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.) 
    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pfm = $db_obj->get_Matrix_by_ID("M0001", "ICM");
    # or
    my $pfm = $db_obj->get_Matrix_by_name("MyProfile", "ICM");

=item * retrieving list of individual TFBS::Matrix::ICM objects
from a TFBS::MatrixSet object

(see documentation of TFBS::MatrixSet to learn how to create
objects for storage and manipulation of multiple matrices)

    my @icm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * drawing a sequence logo

    $icm->draw_logo(-file=>"logo.png",
                    -full_scale =>2.25,
                    -xsize=>500,
                    -ysize =>250,
                    -graph_title=>"C/EBPalpha binding site logo",
                    -x_title=>"position",
                    -y_title=>"bits");

=back

=head1 DESCRIPTION

TFBS::Matrix::ICM is a class whose instances are objects representing
information content matrices (ICMs). An ICM is normally calculated from a
raw position frequency matrix (see L<TFBS::Matrix::PFM> for the
explanation of position frequency matrices). For example, given the
following position frequency matrix,

    A:[ 12     3     0     0     4     0  ]
    C:[  0     0     0    11     7     0  ]
    G:[  0     9    12     0     0     0  ]
    T:[  0     0     0     1     1    12  ]

the standard computational procedure is applied to convert it into the
following information content matrix:

    A:[2.00  0.30  0.00  0.00  0.24  0.00]
    C:[0.00  0.00  0.00  1.45  0.42  0.00]
    G:[0.00  0.89  2.00  0.00  0.00  0.00]
    T:[0.00  0.00  0.00  0.13  0.06  2.00]

which contains the "weights" associated with the occurrence of each
nucleotide at the given position in a pattern.

A TFBS::Matrix::PWM object is equipped with methods to search nucleotide
sequences and pairwise alignments of nucleotide sequences with the
pattern they represent, and return a set of sites in nucleotide
sequence (a TFBS::SiteSet object for single sequence search, and a
TFBS::SitePairSet for the alignment search).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object methods.
Internal methods are preceded with an underscore.

=cut

# The code starts HERE:

package TFBS::Matrix::ICM;

use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::SeqIO;
use TFBS::Matrix;
BEGIN {
    # this will not fail if the modules are not available
    # but only if the user tries to actually draw a logo
    eval "use SVG";
    eval "use GD";
};
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

#################################################################
# PUBLIC METHODS
#################################################################

=head2 new

 Title   : new
 Usage   : my $icm = TFBS::Matrix::ICM->new(%args)
 Function: constructor for the TFBS::Matrix::ICM object
 Returns : a new TFBS::Matrix::ICM object
 Args    : # you must specify one of the following three:

           -matrix,      # reference to an array of arrays of numbers
                         #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited numbers
                         #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited numbers
           #######

           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # a hash reference, OPTIONAL

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"ICM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_check_ic_validity();
    return $self;
}

=head2 to_PWM

 Title   : to_PWM
 Usage   : my $pwm = $icm->to_PWM()
 Function: converts an information content matrix
           (a TFBS::Matrix::ICM object) to a position weight matrix.
           At present it assumes a uniform background distribution of
           nucleotide frequencies.
 Returns : a new TFBS::Matrix::PWM object
 Args    : none; in future releases, it should be able to accept
           a user-defined background probability of the four
           nucleotides

=cut

sub to_PWM {
    my ($self) = @_;
    $self->throw ("Method to_PWM not yet implemented.");
}

=head2 draw_logo

 Title   : draw_logo
 Usage   : my $gdImageObj = $icm->draw_logo(%args)
 Function: Draws a "sequence logo", a graphical representation
           of a possibly degenerate fixed-width nucleotide
           sequence pattern, from the information content matrix
 Returns : a GD::Image object;
           if you only need the image file you can ignore it
 Args    : -file,       # the name of the output PNG image file
                        # OPTIONAL: default none
           -xsize       # width of the image in pixels
                        # OPTIONAL: default 600
           -ysize       # height of the image in pixels
                        # OPTIONAL: default 5/8 of -xsize
           -startpos    # start position in the logo for x axis
                        # OPTIONAL: default is 1
           -margin      # size of image margins in pixels
                        # OPTIONAL: default 15% of -ysize
           -full_scale  # the maximum value on the y-axis, in bits
                        # OPTIONAL: default 2.25
           -graph_title,# the graph title
                        # OPTIONAL: default none
           -x_title,    # x-axis title; OPTIONAL: default none
           -y_title     # y-axis title; OPTIONAL: default none
           -error_bars  # reference to an array of S.D.
values for each column; OPTIONAL -ps # if true, produces a postscript string instead of a GD::Image object -pdf # if true AND the -file argumant is used, produces an output pdf file =cut sub draw_logo { no strict; my $self = shift; my %args = (-xsize => 600, -full_scale => 2.25, -graph_title=> "", -x_title => "", -y_title => "", -startpos => 1, @_); # Other parameters that can be specified: # -ysize -line_width -margin # do not have a fixed default value # - they are calculated from xsize if not specified # draw postscript logo if asked for if ($args{'-ps'} || $args{'-pdf'}){ return _draw_ps_logo($self, %args); } if ($args{'-svg'} || $args{'-SVG'}){ return _draw_svg_logo($self, %args); } my ($xsize,$FULL_SCALE, $x_title, $y_title) = @args{qw(-xsize -full_scale -x_title y_title)} ; my $PER_PIXEL_LINE = 300; # calculate other parameters if not specified my $line_width = ($args{-line_width} or int ($xsize/$PER_PIXEL_LINE) or 1); my $ysize = ($args{-ysize} or $xsize/1.6); # remark (the line above): 1.6 is a standard screen x:y ratio my $margin = ($args{-margin} or $ysize*0.15); my $image = GD::Image->new($xsize, $ysize); my $white = $image->colorAllocate(255,255,255); my $black = $image->colorAllocate(0,0,0); my $motif_size = $self->pdl_matrix->getdim(0); my $font = ((&GD::gdTinyFont(), &GD::gdSmallFont(), &GD::gdMediumBoldFont(), &GD::gdLargeFont(), &GD::gdGiantFont())[int(($ysize-50)/100)] or &GD::gdGiantFont()); my $title_font = ((&GD::gdSmallFont(), &GD::gdMediumBoldFont(), &GD::gdLargeFont(), &GD::gdGiantFont())[int(($ysize-50)/100)] or &GD::gdGiantFont()); # WRITE LABELS AND TITLE # graph title #&GD::Font::MediumBold $image->string($title_font, $xsize/2-length($args{-graph_title})* $title_font->width() /2, $margin/2 - $title_font->height()/2, $args{-graph_title}, $black); # x_title $image->string($font, $xsize/2-length($args{-x_title})*$font->width()/2, $ysize-( $margin - $font->height()*0 - 5*$line_width)/2 - $font->height()/2*0, $args{-x_title}, $black); # 
y_title $image->stringUp($font, ($margin -$font->width()- 5*$line_width)/2 - $font->height()/2 , $ysize/2+length($args{'-y_title'})*$font->width()/2, $args{'-y_title'}, $black); # DRAW AXES # vertical: (top left to bottom right) $image->filledRectangle($margin-$line_width, $margin-$line_width, $margin-1, $ysize-$margin+$line_width, $black); # horizontal: (ditto) $image->filledRectangle($margin-$line_width, $ysize-$margin+1, $xsize-$margin+$line_width,$ysize-$margin+$line_width, $black); # DRAW VERTICAL TICKS AND LABELS # vertical axis (IC 1 and 2) my $ic_1 = ($ysize - 2* $margin) / $FULL_SCALE; foreach my $i (1..$FULL_SCALE) { $image->filledRectangle($margin-3*$line_width, $ysize-$margin - $i*$ic_1, $margin-1, $ysize-$margin+$line_width - $i*$ic_1, $black); $image->string($font, $margin-5*$line_width - $font->width, $ysize - $margin - $i*$ic_1 - $font->height()/2, $i, $black); } # DRAW HORIZONTAL TICKS AND LABELS, AND THE LOGO ITSELF # define function refs as hash elements my %draw_letter = ( A => \&_png_draw_A, C => \&_png_draw_C, G => \&_png_draw_G, T => \&_png_draw_T ); my $horiz_step = ($xsize -2*$margin) / $motif_size; #this is to avoid clutter on X axis: my $longest_label_length = length("$motif_size"); if (length ($args{-startpos}) > $longest_label_length) { $longest_label_length = length ($args{-startpos}); } if (length ($args{-startpos}+$motif_size) > $longest_label_length) { $longest_label_length = length ($args{-startpos}+$motif_size); } my $draw_every_nth_label = int($longest_label_length*$font->width+2) / $horiz_step + 1; foreach my $i (0..$motif_size) { $image->filledRectangle($margin + $i*$horiz_step, $ysize-$margin+1, $margin + $i*$horiz_step+ $line_width, $ysize-$margin+3*$line_width, $black); last if $i==$motif_size; # get the $i-th column of matrix my %ic; ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i); # sort nucleotides by increasing information content my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T); # draw logo 
column my $xlettersize = $horiz_step /1.1; my $ybottom = $ysize - $margin; foreach my $base (@draw_order) { my $ylettersize = int($ic{$base}*$ic_1 +0.5); next if $ylettersize ==0; # draw letter $draw_letter{$base}->($image, $margin + $i*$horiz_step, $ybottom - $ylettersize, $xlettersize, $ylettersize, $white); $ybottom = $ybottom - $ylettersize-1; } if ($args{'-error_bars'} and ref($args{'-error_bars'}) eq "ARRAY") { my $sd_pix = int($args{'-error_bars'}->[$i]*$ic_1); my $yt = $ybottom - $sd_pix+1; my $yb = $ybottom + $sd_pix-1; my $xpos = $margin + ($i+0.45)*$horiz_step; my $half_width; if ($yb > $ysize-$margin+$line_width) { $yb = $ysize-$margin+$line_width } else { $image->line($xpos - $xlettersize/8, $yb, $xpos + $xlettersize/8, $yb, $black); } $image->line($xpos, $yt, $xpos, $yb, $black); $image->line($xpos - 1 , $ybottom, $xpos+1, $ybottom, $black); $image->line($xpos - $xlettersize/8, $yt, $xpos + $xlettersize/8, $yt, $black); } # print position number on x axis (The if condition is for avoiding clutter) my $xlabel = $i+ $args{-startpos}; if ($args{-startpos}<0 and $xlabel>=0) { $xlabel ++; } if ($xlabel % $draw_every_nth_label == 0) { $image->string($font, $margin + ($i+0.5)*$horiz_step - $font->width()/2, $ysize - $margin +5*$line_width, $xlabel, $black); } } # print $args{-file}; if ($args{-file}) { open (PNGFILE, ">".$args{-file}) or $self->throw("Could not write to ".$args{-file}); print PNGFILE $image->png; close PNGFILE; } return $image; } sub total_ic { return $_[0]->pdl_matrix->sum(); } =head2 _draw_ps_logo Title : _draw_ps_logo Usage : my $postscript_string = $icm->_draw_ps_logo(%args) Internal method, should be accessed using draw_logo() Function: Draws a "sequence logo", a graphical representation of a possibly degenerate fixed-width nucleotide sequence pattern, from the information content matrix Returns : a postscript string; if you only need the image file you can ignore it Args : -file, # the name of the output PNG image file # OPTIONAL: 
                        # default none
           -xsize       # width of the image in pixels
                        # OPTIONAL: default 600
           -ysize       # height of the image in pixels
                        # OPTIONAL: default 5/8 of -xsize
           -full_scale  # the maximum value on the y-axis, in bits
                        # OPTIONAL: default 2.25
           -graph_title,# the graph title
                        # OPTIONAL: default none
           -x_title,    # x-axis title; OPTIONAL: default none
           -y_title     # y-axis title; OPTIONAL: default none

=cut

sub _draw_ps_logo{
    my $self = shift;
    my %args = (-xsize       => 600,
                -full_scale  => 2.25,
                -graph_title => "",
                -x_title     => "",
                -y_title     => "",
                @_);
    my $xsize= $args{'-xsize'};
    my $max_ysize= $args{'-ysize'} ||int 5* $args{'-xsize'}/8;
    my $ysize= $max_ysize*($args{'-full_scale'}-($args{'-full_scale'}-2))/$args{'-full_scale'};
    my $x=100; # internal, for placement on 'paper'
    my $y=100;

    my $out= "%!PS-Adobe-2.0
%%Orientation: Portrait
%%Pages: 1
%%BoundingBox: 0 0 ".($args{'-xsize'}*1.2)." ".( $max_ysize*1.5)."
%%BeginSetup
%%EndSetup
%%Magnification: 1.0000
%%EndProlog
%%end
%%save
gsave\n";

    # colors and correction definitions
    my %color;
    $color{'black'}="0.000 0.000 0.000 setrgbcolor";
    $color{'A'}="0.000 1.000 0.000 setrgbcolor";
    $color{'C'}="0.000 0.000 1.000 setrgbcolor";
    $color{'G'}="1.000 0.860 0.000 setrgbcolor";
    $color{'T'}="1.000 0.000 0.000 setrgbcolor";

    my $fontsize= int $ysize*0.68;
    my $fontwidth=1.5*($xsize/$self->length());

    my %w_correct; # correction of font widths
    $w_correct{'A'}=0.95;
    $w_correct{'T'}=1.05;
    $w_correct{'C'}=0.90;
    $w_correct{'G'}=0.90;

    my %y_next; # correction of font heights
    $y_next{'A'}=1;
    $y_next{'T'}=1;
    $y_next{'C'}=0.94;
    $y_next{'G'}=0.94;

    my %y_correct; # correction of font bounding boxes
    $y_correct{'A'}=0;
    $y_correct{'C'}=0.035*$fontsize;
    $y_correct{'G'}=0.035*$fontsize;
    $y_correct{'T'}=0;

    # define y axis, tickmarks and scaling
    my $font= $fontwidth/5;
    $out.="newpath\n ". ($x-10)." ". ($y+2*$ysize/4 )." moveto\n". "$x ". ($y+2*$ysize/4 ) ." lineto\n stroke\n";
    $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-20). " ".( $y+$ysize/2)." moveto\n";
    $out.=" (1) show\n grestore\n" ;
    $out.="newpath\n ". ($x-10)." ". ($y+$ysize )." moveto\n". "$x ". ($y+$ysize) ." lineto\n stroke\n";
    $out.="newpath\n ". ($x-10)." ". ($y+$max_ysize )." moveto\n". "$x ". ($y+$max_ysize) ." lineto\n stroke\n";
    $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-20). " ".( $y+$ysize)." moveto\n";
    $out.=" (2) show\n grestore\n" ;
    $out.="newpath\n $x $y moveto\n". ($x). " ".($y+$max_ysize) ." lineto\n stroke\n";
    $out.="newpath\n $x $y moveto\n". ($x+$xsize). " ".($y) ." lineto\n stroke\n";

    # draw titles if requested
    if ($args{'-y_title'}){
        $out.= "gsave\n/Times-Italic findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-40). " ".( $y+$ysize/2)." moveto\n";
        $out.=" 90 rotate ($args{'-y_title'}) show\n grestore\n" ;
    }
    if ($args{'-x_title'}){
        $out.= "gsave\n/Times-Italic findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x+$xsize/2.5). " ".( $y*(0.60))." moveto\n";
        $out.=" ($args{'-x_title'}) show\n grestore\n" ;
    }
    # -graph_title is the documented option; -title is still accepted
    # because earlier code only looked at that key
    my $graph_title = $args{'-graph_title'} || $args{'-title'};
    if ($graph_title){
        $out.= "gsave\n/Times-Roman findfont $color{black} [".($font*2)." 0 0 $font 0 0] makefont setfont\n".($x+$xsize/3). " ".( $y+$max_ysize*1.1)." moveto\n";
        $out.=" ($graph_title) show\n grestore\n" ;
    }

    # define x axis and x tickmarks
    my $col_width=($xsize/$self->length()) -0.006*$xsize;
    my $x_now;
    for(my $i=1; $i<=$self->length(); $i++){
        $x_now=$x+$col_width*$i;
        $out.="newpath\n ". ($x_now)." ". ($y)." moveto\n". ($x_now)." ". ($y-$ysize/20 ) ." lineto\n stroke\n";
        $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x_now-$col_width/2). " ".( $y-20)." 
moveto\n";
        $out.=" ($i) show\n grestore\n" ;
    }

    # draw the logo
    foreach my $i (0..$self->length()-1 ) {
        # get the $i-th column of matrix
        my %ic;
        ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i);
        my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T);

        # draw this position
        foreach my $letter (@draw_order){
            $ic{$letter}=0.0000001 if ( $ic{$letter}==0); # some interpreters do not understand size 0
            $out.= "gsave\n/Helvetica-Bold findfont $color{$letter} [".$fontwidth*$w_correct{$letter}." 0 0 ";
            $out.= $ic{$letter}*$fontsize*$y_next{$letter} ;
            $y+=$y_correct{$letter}*$ic{$letter}; # movement that is letter-specific, due to bounding boxes
            $out.= " 0 0] makefont setfont\n$x $y moveto\n";
            $out.= " ($letter) show\n grestore\n" ;
            $y+=$fontsize*$ic{$letter}*0.75; # ic content move
        }
        $x+=$fontwidth/1.6;
        $y=100;
    }

    # save as file if requested
    if ($args{-file}) {
        open (PSFILE, ">".$args{-file})
            or $self->throw("Could not write to ".$args{-file});
        print PSFILE $out;
        close PSFILE;
    }
    if ($args{-pdf}){
        system "ps2pdf $args{-file} ".$args{-file}.".pdf ";
        system " mv $args{-file}.pdf $args{-file}";
    }
    return $out;
}

=head2 _draw_svg_logo

=cut

sub _draw_svg_logo {
    my $self = shift;
    my %args = (-xsize       => 800,
                -full_scale  => 2.25,
                -graph_title => "",
                -x_title     => "",
                -y_title     => "",
                @_);
    my $max_ysize= $args{'-ysize'} ||int 5* $args{'-xsize'}/8;
    # note: the last key needs its leading dash, like the others
    my ($xsize,$FULL_SCALE, $x_title, $y_title) = @args{qw(-xsize -full_scale -x_title -y_title)} ;
    my $PER_PIXEL_LINE = 200;

    # calculate other parameters if not specified
    my $ysize = ($args{-ysize} or $xsize/1.6);
    my $line_width = ($args{-line_width} or $ysize/$PER_PIXEL_LINE);
    # remark (the $ysize line above): 1.6 is a standard screen x:y ratio
    my $margin = ($args{-margin} or $ysize*0.15);

    my $image = SVG->new(width=>$xsize, height=>$ysize);
    my $white = 'rgb(255,255,255)';
    my $black = 'rgb(0,0,0)';
    my $motif_size = $self->pdl_matrix->getdim(0);
    my $fontsize = int ($ysize/25);
    my $title_font = {width=>$fontsize*1.5, height=>$fontsize*1.5};
    my $font
= {width=>$fontsize, height=>$fontsize}; # WRITE LABELS AND TITLE # graph title $image->text(id=>"Title", 'font-size'=>$title_font->{width}, x => $xsize/2, y => 0.6*$margin, 'text-anchor'=>'middle' )->cdata($args{-graph_title}); # x title $image->text(id=>"X_title", 'font-size'=>$font->{width}, x => $xsize/2, y => $ysize -0.3*$margin, 'text-anchor'=>'middle' )->cdata($args{-x_title}); # y title my $g = $image->group; $g->text(id=>"Y_title", 'font-size'=>$font->{width}, x => 0 , 'text-anchor'=>'middle', y => 0, transform => 'rotate(-90) translate(-'.($ysize/2).','.($margin/2).')')->cdata($args{-y_title}); # DRAW AXES # vertical: (top left to bottom right) $image->rectangle(id => "y_axis", style => { #stroke => $black, fill => $black }, x => $margin-$line_width, y => $margin, width => $line_width, height => $ysize -2*$margin ); #$image->filledRectangle($margin-$line_width, $margin-$line_width, # $margin-1, $ysize-$margin+$line_width, # # $black); # horizontal: (ditto) $image->rectangle(id => "x_axis", style => { #stroke => $black, fill => $black }, x => $margin-$line_width, y => $ysize-$margin, width => $xsize-2*$margin+$line_width, height => $line_width ); #$image->filledRectangle($margin-$line_width, $ysize-$margin+1, # $xsize-$margin+$line_width,$ysize-$margin+$line_width, # $black); # DRAW VERTICAL TICKS AND LABELS # vertical axis (IC 1 and 2) my $ic_1 = ($ysize - 2* $margin) / $FULL_SCALE; foreach my $i (1..$FULL_SCALE) { $image->rectangle(x => $margin-3*$line_width, y => $ysize-$margin - $i*$ic_1, width => 3*$line_width, height => $line_width ); $image->text(x => $margin-5*$line_width - $font->{width}, y => $ysize - $margin - $i*$ic_1 +$font->{height}/2, 'font-size'=>$font->{width}, 'text-anchor'=>"right" )->cdata($i); } # DRAW HORIZONTAL TICKS AND LABELS, AND THE LOGO ITSELF # define function refs as hash elements my %draw_letter = ( A => \&_svg_draw_A, C => \&_svg_draw_C, G => \&_svg_draw_G, T => \&_svg_draw_T ); my $horiz_step = ($xsize -2*$margin) / 
$motif_size; #this is to avoid clutter on X axis: my $longest_label_length = length("$motif_size"); if (length ($args{-startpos}) > $longest_label_length) { $longest_label_length = length ($args{-startpos}); } if (length ($args{-startpos}+$motif_size) > $longest_label_length) { $longest_label_length = length ($args{-startpos}+$motif_size); } my $draw_every_nth_label = int(($longest_label_length+0.25)*$font->{width}) / $horiz_step + 1; foreach my $i (0..$motif_size) { my $height = 3*$line_width; if ($i and $i==$args{-startpos}*-1){ $height = 5*$line_width; } $image->rectangle(x => $margin + $i*$horiz_step -$line_width/2, y => $ysize-$margin, width => $line_width, height => $height ); last if $i==$motif_size; # get the $i-th column of matrix my %ic; ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i); # sort nucleotides by increasing information content my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T); # draw logo column my $xlettersize = $horiz_step*0.95; my $ybottom = $ysize - $margin; foreach my $base (@draw_order) { my $ylettersize = $ic{$base}*$ic_1; next if $ylettersize ==0; # draw letter $draw_letter{$base}->($image, $margin + $i*$horiz_step + 0.025* $horiz_step, $ybottom - $ylettersize, $xlettersize, $ylettersize, $white); $ybottom = $ybottom - $ylettersize; } if ($args{'-error_bars'} and ref($args{'-error_bars'}) eq "ARRAY") { my $sd_pix = int($args{'-error_bars'}->[$i]*$ic_1); my $yt = $ybottom - $sd_pix+1; my $yb = $ybottom + $sd_pix-1; my $xpos = $margin + ($i+0.5)*$horiz_step; my $half_width; if ($yb > $ysize-$margin+$line_width) { $yb = $ysize-$margin+$line_width } else { $image->line(x1=>$xpos - $xlettersize/8, y1=> $yb, x2=> $xpos + $xlettersize/8, y2=>$yb, stroke=>$black, 'stroke-width'=>$line_width); } $image->line(x1=>$xpos, y1=>$yt, x2=>$xpos, y2=>$yb, stroke=>$black, 'stroke-width'=>$line_width); $image->line(x1=>$xpos - $line_width , y1=>$ybottom, x2=>$xpos+$line_width, y2=>$ybottom, stroke=>$black, 
'stroke-width'=>$line_width); $image->line(x1=>$xpos - $xlettersize/8, y1=>$yt, x2=>$xpos + $xlettersize/8, y2=>$yt, stroke=>$black, 'stroke-width'=>$line_width); } # print position number on x axis my $xlabel = $i+ $args{-startpos}; if ($args{-startpos}<0 and $xlabel>=0) { $xlabel ++; } if ($xlabel % $draw_every_nth_label == 0) { $image->text(x => $margin + ($i+0.5)*$horiz_step - $font->{width}/2, y => $ysize - $margin +5*$line_width + $font->{width}/2, 'font-size'=>$font->{width}, 'text-anchor'=>"bottom" )->cdata($xlabel); } } # print to $args{-file}; if ($args{-file}) { open (SVGFILE, ">".$args{-file}) or $self->throw("Could not write to ".$args{-file}); my $xml = $image->xmlify; $xml =~ s/\s+<\/text/<\/text/gs; print SVGFILE $xml; close SVGFILE; } return $image; } =head2 name =head2 ID =head2 class =head2 matrix =head2 length =head2 revcom =head2 rawprint =head2 prettyprint The above methods are common to all matrix objects. Please consult L to find out how to use them. =cut ################################################################# # INTERNAL METHODS ################################################################# sub _check_ic_validity { my ($self) = @_; # to do } sub DESTROY { # nothing } ################################################################# # UTILITY FUNCTIONS ################################################################# # letter drawing routines sub _png_draw_A { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $green = $im->colorAllocate(0,255,0); my $outPoly = GD::Polygon->new(); $outPoly->addPt($x, $y+$ysize); $outPoly->addPt($x+$xsize*.42, $y); $outPoly->addPt($x+$xsize*.58, $y); $outPoly->addPt($x+$xsize, $y+$ysize); $outPoly->addPt($x+0.85*$xsize, $y+$ysize); $outPoly->addPt($x+0.725*$xsize, $y+0.75*$ysize); $outPoly->addPt($x+0.275*$xsize, $y+0.75*$ysize); $outPoly->addPt($x+0.15*$xsize, $y+$ysize); $im->filledPolygon($outPoly, $green); if ($ysize>8) { my $inPoly = GD::Polygon->new(); $inPoly->addPt($x+$xsize*.5, 
$y+0.2*$ysize); $inPoly->addPt($x+$xsize*.34, $y+0.6*$ysize-1); $inPoly->addPt($x+$xsize*.64, $y+0.6*$ysize-1); $im->filledPolygon($inPoly, $white); } return 1; } sub _png_draw_C { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $blue = $im->colorAllocate(0,0,255); $im->arc($x+$xsize*0.54, $y+$ysize/2,1.08*$xsize,$ysize,0,360,$blue); $im->fill($x+$xsize/2, $y+$ysize/2, $blue); if ($ysize>12) { $im->arc($x+$xsize*0.53, $y+$ysize/2, 0.75*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize/2, $y+$ysize/4+1, $x+$xsize*1.1, $y+(3*$ysize/4)-1, $white); } elsif ($ysize>3) { $im->arc($x+$xsize*0.53, $y+$ysize/2, (0.75-0.75/$ysize)*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize*0.25, $y+$ysize/2, $x+$xsize*1.1, $y+$ysize/2, $white); } return 1; } sub _png_draw_G { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $yellow = $im->colorAllocate(200,200,0); $im->arc($x+$xsize*0.54, $y+$ysize/2,1.08*$xsize,$ysize,0,360,$yellow); $im->fill($x+$xsize/2, $y+$ysize/2, $yellow); if ($ysize>20) { $im->arc($x+$xsize*0.53, $y+$ysize/2, 0.75*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize/2, $y+$ysize/4+1, $x+$xsize*1.1, $y+$ysize/2-1, $white); } elsif($ysize>3) { $im->arc($x+$xsize*0.53, $y+$ysize/2, (0.75-0.75/$ysize)*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize*0.25, $y+$ysize/2, $x+$xsize*1.1, $y+$ysize/2, $white); } $im->filledRectangle($x+0.85*$xsize, $y+$ysize/2, $x+$xsize,$y+(3*$ysize/4)-1, $yellow); $im->filledRectangle($x+0.6*$xsize, $y+$ysize/2, $x+$xsize,$y+(5*$ysize/8)-1, $yellow); return 1; } sub _png_draw_T { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $red = $im->colorAllocate(255,0,0); $im->filledRectangle($x, $y, $x+$xsize, $y+0.16*$ysize, $red); 
$im->filledRectangle($x+0.42*$xsize, $y, $x+0.58*$xsize, $y+$ysize, $red); return 1; } sub _svg_draw_A { my ($im, $x, $y, $xsize, $ysize) = @_; $im->polygon( points => [$x, $y+$ysize, $x+$xsize*.42, $y, $x+$xsize*.58, $y, $x+$xsize, $y+$ysize, $x+0.85*$xsize, $y+$ysize, $x+0.725*$xsize, $y+0.75*$ysize, $x+0.275*$xsize, $y+0.75*$ysize, $x+0.15*$xsize, $y+$ysize, $x, $y+$ysize], fill => 'rgb(0,255,0)' ); $im->polygon( points => [$x+$xsize*.5, $y+0.2*$ysize, $x+$xsize*.34, $y+0.6*$ysize, $x+$xsize*.64, $y+0.6*$ysize ], fill => 'rgb(255,255,255)'); return 1; } sub _svg_draw_C { my ($im, $x, $y, $xsize, $ysize) = @_; $im->ellipse(cx=>$x+$xsize*0.54, cy=>$y+$ysize/2, rx=>$xsize*0.54, ry=>$ysize/2, fill => 'rgb(0,0,255)'); $im->ellipse( cx=>$x+$xsize*0.53, cy=>$y+$ysize/2, rx=>$xsize*0.375, ry=>$ysize*0.375, fill => 'rgb(255,255,255)'); $im->rectangle(x=>$x+$xsize/2, y=>$y+$ysize/4, width =>$xsize*0.6, height =>$ysize/2, fill=> 'rgb(255,255,255)'); return 1; } sub _svg_draw_G { my ($im, $x, $y, $xsize, $ysize, $white) = @_; $im->ellipse(cx => $x+$xsize*0.54, cy => $y+$ysize/2, rx => 0.54*$xsize, ry => $ysize/2, fill => 'rgb(200,200,0)'); $im->ellipse(cx => $x+$xsize*0.53, cy => $y+$ysize/2, rx => 0.375*$xsize, ry => 0.375*$ysize, fill => 'rgb(255,255,255)'); $im->rectangle(x=>$x+$xsize/2, y=>$y+$ysize/4, width =>$xsize*0.6, height =>$ysize/2, fill=> 'rgb(255,255,255)'); $im->rectangle(x=>$x+0.80*$xsize, y=>$y+$ysize/2, width =>$xsize*0.208, height =>$ysize/4, fill=> 'rgb(200,200,0)'); $im->rectangle(x=>$x+0.6*$xsize, y=>$y+$ysize/2, width =>$xsize*0.408, height =>$ysize/8, fill=> 'rgb(200,200,0)'); return 1; } sub _svg_draw_T { my ($im, $x, $y, $xsize, $ysize, $white) = @_; $im->polygon (points =>[$x, $y, $x+$xsize, $y, $x+$xsize, $y+0.16*$ysize, $x+0.58*$xsize, $y+0.16*$ysize, $x+0.58*$xsize, $y+$ysize, $x+0.42*$xsize, $y+$ysize, $x+0.42*$xsize, $y+0.16*$ysize, $x, $y+0.16*$ysize], fill => 'rgb(255,0,0)'); return 1; } 1; 
TFBS-0.7.1/TFBS/Matrix/.svn/text-base/PFM.pm.svn-base000077500000000000000000000373371305752266700215710ustar00rootroot00000000000000
# TFBS module for TFBS::Matrix::PFM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix::PFM - class for raw position frequency matrix patterns

=head1 SYNOPSIS

=over 4

=item * creating a TFBS::Matrix::PFM object manually:

    my $matrixref = [ [ 12,  3,  0,  0,  4,  0 ],
                      [  0,  0,  0, 11,  7,  0 ],
                      [  0,  9, 12,  0,  0,  0 ],
                      [  0,  0,  0,  1,  1, 12 ]
                    ];
    my $pfm = TFBS::Matrix::PFM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );
    # or

    my $matrixstring =
        "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12";

    my $pfm = TFBS::Matrix::PFM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );

=item * retrieving a TFBS::Matrix::PFM object from a database:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and
retrieve TFBS::Matrix::* objects from them.)

    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pfm = $db_obj->get_Matrix_by_ID("M0001", "PFM");
    # or
    my $pfm = $db_obj->get_Matrix_by_name("MyProfile", "PFM");

=item * retrieving list of individual TFBS::Matrix::PFM objects from a TFBS::MatrixSet object

(See the L<TFBS::MatrixSet> documentation to learn how to create
objects for storage and manipulation of multiple matrices.)

    my @pfm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * convert a raw frequency matrix to other matrix types:

    my $pwm = $pfm->to_PWM(); # convert to position weight matrix
    my $icm = $pfm->to_ICM(); # convert to information content matrix

=back

=head1 DESCRIPTION

TFBS::Matrix::PFM is a class whose instances are objects representing
raw position frequency matrices (PFMs). A PFM is derived from N
nucleotide patterns of fixed size, e.g.
the set of sequences

    AGGCCT
    AAGCCT
    AGGCAT
    AAGCCT
    AAGCCT
    AGGCAT
    AGGCCT
    AGGCAT
    AGGTTT
    AGGCAT
    AGGCCT
    AGGCCT

will give the matrix:

    A:[ 12  3  0  0  4  0 ]
    C:[  0  0  0 11  7  0 ]
    G:[  0  9 12  0  0  0 ]
    T:[  0  0  0  1  1 12 ]

which contains the count of each nucleotide at each position in the
sequence. (If you have a set of sequences as above and want to
create a TFBS::Matrix::PFM object out of them, have a look at
the TFBS::PatternGen::SimplePFM module.)

PFMs are easily converted to other types of matrices, namely
information content matrices and position weight matrices. A
TFBS::Matrix::PFM object has the methods to_ICM and to_PWM which
do just that, returning TFBS::Matrix::ICM and TFBS::Matrix::PWM
objects, respectively.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::Matrix::PFM;
use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::SeqIO;
use TFBS::Matrix;
use TFBS::Matrix::ICM;
use TFBS::Matrix::PWM;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

use constant EXACT_SCHNEIDER_MAX => 30;

#######################################################
# PUBLIC METHODS
#######################################################

=head2 new

 Title   : new
 Usage   : my $pfm = TFBS::Matrix::PFM->new(%args)
 Function: constructor for the TFBS::Matrix::PFM object
 Returns : a new TFBS::Matrix::PFM object
 Args    : # you must specify one of the following three:

           -matrix,      # reference to an array of arrays of integers
              #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited integers
              #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited integers
           #######

           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # a hash reference,
                         # OPTIONAL

 Warnings  : Warns if the matrix provided has columns with different
             sums. Columns with different sums contradict the usual
             origin of matrix data and, unless you are absolutely sure
             that column sums _should_ be different, it would be wise to
             check your matrices.

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PFM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_check_column_sums();
    return $self;
}

=head2 column_sum

 Title   : column_sum
 Usage   : my $nr_sequences = $pfm->column_sum()
 Function: calculates the sum of elements of one column
           (the first one by default) which normally equals the
           number of sequences used to derive the PFM.
 Returns : the sum of elements of one column (an integer)
 Args    : column number (starting from 1), OPTIONAL - you DO NOT
           need to specify it unless you are dealing with a matrix

=cut

sub column_sum {
    # the trailing 1 supplies the default column when none is passed
    my ($self, $column) = (@_,1);
    return $self->pdl_matrix->slice($column-1)->sum;
}

=head2 to_PWM

 Title   : to_PWM
 Usage   : my $pwm = $pfm->to_PWM()
 Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
           to position weight matrix. At present it assumes uniform
           background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::PWM object
 Args    : none; in the future releases, it should be able to accept
           a user defined background probability of the four
           nucleotides

=cut

sub to_PWM {
    my ($self, %args) = @_;
    my $bg = ($args{'-bg_probabilities' } || $self->{'bg_probabilities'});
    my $bg_pdl = transpose pdl ($bg->{'A'}, $bg->{'C'}, $bg->{'G'}, $bg->{'T'});
    my $nseqs = $self->pdl_matrix->sum / $self->length;
    my $q_pdl = ($self->pdl_matrix +$bg_pdl*sqrt($nseqs))
                / ($nseqs + sqrt($nseqs));
    my $pwm_pdl = log2(4*$q_pdl);

    my $PWM = TFBS::Matrix::PWM->new
        ( (map {("-$_", $self->{$_}) } keys %$self),
          # do not want tags to point to the same arrayref as in $self:
          -tags => \%{ $self->{'tags'}},
          -bg_probabilities => \%{ $self->{'bg_probabilities'}},
          -matrix => $pwm_pdl
        );
    return $PWM;
}

=head2 to_ICM

 Title   : to_ICM
 Usage   : my $icm = $pfm->to_ICM()
 Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
           to information content matrix. At present it assumes uniform
           background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::ICM object
 Args    : -small_sample_correction # undef (default), 'schneider' or 'pseudocounts'

How a PFM is converted to ICM:

For a PFM element PFM[i,k], the probability without pseudocounts is
estimated to be simply

  p[i,k] = PFM[i,k] / Z

where

- Z equals the column sum of the matrix, i.e. the number of motifs used
  to construct the PFM.

- i is the column index (position in the motif)

- k is the row index (a letter in the alphabet; here k is one of
  (A,C,G,T))

Here is how one normally calculates the pseudocount-corrected positional
probability p'[i,k]:

  p'[i,k] = (PFM[i,k] + 0.25*sqrt(Z)) / (Z + sqrt(Z))

0.25 is for the flat distribution of nucleotides, and sqrt(Z) is the
recommended pseudocount weight.

In the general case,

  p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)

where q[k] is the background distribution of the letter (nucleotide) k,
and B an arbitrary pseudocount value or expression (for no pseudocounts
B=0).
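
As an illustration of the general pseudocount formula above, the
following sketch computes p'[i,k] for a single column using plain Perl
arrays instead of PDL. The helper name pseudocount_probs and the sample
column are hypothetical and not part of TFBS; B defaults to sqrt(Z), the
weight recommended in the text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): pseudocount-corrected
# probabilities for one PFM column, following
#   p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)
sub pseudocount_probs {
    my ($counts, $bg, $B) = @_;       # array refs in A,C,G,T order
    my $Z = 0;
    $Z += $_ for @$counts;            # column sum
    $B = sqrt($Z) unless defined $B;  # recommended pseudocount weight
    return map { ($counts->[$_] + $bg->[$_] * $B) / ($Z + $B) }
           0 .. $#$counts;
}

# A sample column with counts A=4, C=7, G=0, T=1 and a flat background
my @p = pseudocount_probs([4, 7, 0, 1], [0.25, 0.25, 0.25, 0.25]);
printf "%s: %.4f\n", (qw(A C G T))[$_], $p[$_] for 0 .. 3;
```

Passing 0 as the third argument switches pseudocounts off, so the
result reduces to p[i,k] = PFM[i,k] / Z.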

For a given position i, the deviation from random distribution in bits
is calculated as (Baldi and Brunak eq. 1.9 (2ed) or 1.8 (1ed)):

- for an arbitrary alphabet of A letters:

  D[i] = log2(A) + sum_for_all_k(p[i,k]*log2(p[i,k]))

- special case for nucleotides (A=4)

  D[i] = 2 + sum_for_all_k(p[i,k]*log2(p[i,k]))

D[i] equals the information content of the position i in the motif.
To calculate the entire ICM, you have to calculate the contribution
of each nucleotide at a position i to D[i], i.e.

  ICM[i,k] = p'[i,k] * D[i]

=cut

sub to_ICM {
    my ($self, %args) = @_;
    my $bg = ($args{'-bg_probabilities' } || $self->{'bg_probabilities'});

    # compute ICM
    my $bg_pdl = transpose pdl ($bg->{'A'}, $bg->{'C'}, $bg->{'G'}, $bg->{'T'});
    my $Z_pdl = $self->pdl_matrix->xchg(0,1)->sumover;

    # pseudocount calculation
    my $B = 0;
    if (lc($args{'-small_sample_correction'} or "") eq "pseudocounts") {
        $B = sqrt($Z_pdl);
    }
    else {
        $B = 0; # do not add pseudocounts
    }
    my $p_pdl = ($self->pdl_matrix +$bg_pdl*$B)/ ($Z_pdl + $B);
    my $plog_pdl = $p_pdl*log2($p_pdl);
    $plog_pdl = $plog_pdl->badmask(0);
    my $D_pdl = 2 + $plog_pdl->xchg(0,1)->sumover;
    my $ic_pdl = $p_pdl * $D_pdl;

    # apply Schneider correction if requested
    if (lc($args{'-small_sample_correction'} or "") eq "schneider") {
        my $columnsum_pdl = $ic_pdl->transpose->sumover;
        my $corrected_columnsum_pdl =
            $columnsum_pdl
            + _schneider_correction ($self->pdl_matrix, $bg_pdl);
        $ic_pdl *= $corrected_columnsum_pdl/$columnsum_pdl;
    }

    # construct and return an ICM object
    my $ICM = TFBS::Matrix::ICM->new
        ( (map {("-$_" => $self->{$_})} keys %$self),
          -tags => \%{ $self->{'tags'}},
          -bg_probabilities => \%{ $self->{'bg_probabilities'}},
          -matrix => $ic_pdl
        );
    return $ICM;
}

=head2 draw_logo

 Title   : draw_logo
 Usage   : my $gd_image = $pfm->draw_logo()
 Function: draws a sequence logo; similar to the
           method in TFBS::Matrix::ICM, but can automatically calculate
           error bars for drawing
 Returns : a GD image object (see documentation of GD module)
 Args    : many; PFM-specific
           options are:
           -small_sample_correction # One of
                                    # "Schneider" (uses correction
                                    #   described by Schneider et al.
                                    #   (1986) J. Mol. Biol.)
                                    # "pseudocounts" - standard pseudocount
                                    #   correction, more suitable for
                                    #   PFMs with larger column sums
                                    # If the parameter is omitted, small
                                    # sample correction is not applied

           -draw_error_bars         # if true, adds error bars to each position
                                    # in the logo. To calculate the error bars,
                                    # it uses the -small_sample_correction
                                    # argument if explicitly set,
                                    # or "Schneider" by default

For other args, see draw_logo entry in TFBS::Matrix::ICM documentation

=cut

sub draw_logo {
    my ($self, %args) = @_;
    if ($args{'-draw_error_bars'}) {
        $args{'-small_sample_correction'} ||= "Schneider"; # default Schneider

        my $pdl_no_correction = $self->to_ICM()
                                     ->pdl_matrix->transpose->sumover;
        my $pdl_with_correction =
            $self->to_ICM(-small_sample_correction
                              => $args{'-small_sample_correction'})
                 ->pdl_matrix->transpose->sumover;
        # error bar per position: uncorrected minus corrected column IC
        $args{'-error_bars'} =
            [list ($pdl_no_correction - $pdl_with_correction)];
    }
    $self->to_ICM(%args)->draw_logo(%args);
}

=head2 add_PFM

 Title   : add_PFM
 Usage   : $pfm->add_PFM($another_pfm)
 Function: adds the values of $another_pfm matrix to $pfm
 Returns : reference to the updated $pfm object
 Args    : a TFBS::Matrix::PFM object

=cut

sub add_PFM {
    my ($self, $pfm) = @_;
    $pfm->isa("TFBS::Matrix::PFM")
        or $self->throw("Wrong or no argument passed to add_PFM");
    my $sum = $self->pdl_matrix + $pfm->pdl_matrix;
    $self->set_matrix($sum);
    return $self;
}

=head2 name

=head2 ID

=head2 class

=head2 matrix

=head2 length

=head2 revcom

=head2 rawprint

=head2 prettyprint

The above methods are common to all matrix objects. Please consult
L<TFBS::Matrix> to find out how to use them.
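
To illustrate what add_PFM, documented earlier in this module, does to
the underlying counts, here is a minimal sketch that adds two count
matrices element by element using nested Perl arrays rather than PDL.
The function add_counts and the two small matrices are hypothetical and
not part of the TFBS API; the sketch assumes both matrices have the
same number of rows and columns.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): element-wise sum of two
# count matrices given as references to arrays of rows (A, C, G, T).
sub add_counts {
    my ($m1, $m2) = @_;
    @$m1 == @$m2 or die "matrices differ in number of rows";
    my @sum;
    for my $r (0 .. $#$m1) {
        @{ $m1->[$r] } == @{ $m2->[$r] }
            or die "row $r differs in length";
        push @sum,
            [ map { $m1->[$r][$_] + $m2->[$r][$_] } 0 .. $#{ $m1->[$r] } ];
    }
    return \@sum;
}

my $pfm1 = [ [12, 3,  0], [0, 0, 0], [0, 9, 12], [0, 0, 0] ];
my $pfm2 = [ [ 5, 1,  0], [0, 0, 1], [0, 4,  4], [0, 0, 0] ];
my $sum  = add_counts($pfm1, $pfm2);
print join(" ", @$_), "\n" for @$sum;
```

Each column of the summed matrix adds up to the combined number of
sequences behind the two inputs (12 + 5 in this sketch).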
=cut ############################################### # PRIVATE METHODS ############################################### sub _check_column_sums { my ($self) = @_; my $pdl = $self->pdl_matrix->sever(); my $rowsums = $pdl->xchg(0,1)->sumover(); #if ($rowsums->where($rowsums != $rowsums->slice(0))->getdim(0) > 0) { #$self->warn("PFM for ".$self->{ID}." has unequal column sums"); #} } sub DESTROY { # does nothing } ############################################### # UTILITY FUNCTIONS ############################################### sub log2 { log($_[0]) / log(2); } sub _schneider_correction { my ($pdl, $bg_pdl) = @_; my $Hg = -sum ($bg_pdl*log2($bg_pdl)); my (@Hnbs, %saved_Hnb); my $is_flat = _is_bg_flat(list $bg_pdl); my @factorials = (1); if (min($pdl->transpose->sumover) <= EXACT_SCHNEIDER_MAX) { foreach my $i (1..max($pdl->transpose->sumover)) { $factorials[$i] =$factorials[$i-1] * $i; } } my @column_sums = list $pdl->transpose->sumover; foreach my $colsum (@column_sums) { if (defined($saved_Hnb{$colsum})) { push @Hnbs, $saved_Hnb{$colsum}; } else { my $Hnb; if ($colsum <= EXACT_SCHNEIDER_MAX) { if ($is_flat) { $Hnb = _schneider_Hnb_precomputed($colsum); } else { $Hnb = _schneider_Hnb_exact($colsum, $bg_pdl, \@factorials); } } else { $Hnb = _schneider_Hnb_approx($colsum, $Hg); } $saved_Hnb{$colsum} = $Hnb; push @Hnbs, $Hnb; } } return -$Hg + pdl(@Hnbs); } sub _schneider_Hnb_exact { my ($n, $bg_pdl, $rFactorial) = @_; my $is_flat = _is_bg_flat(list $bg_pdl); return 0 if $n==1; # my @fctrl = (1); # foreach my $i (1..max($pdl->transpose->sumover)) { # $rFactorial->[$i] =$rFactorial->[$i-1] * $i; # } # my @colsum = list $pdl->transpose->sumover; my ($na, $nc, $ng, $nt) = ($n, 0,0,0); # my $n = $colsum[0]; my $E_Hnb=0; while (1) { my $ns_pdl = pdl [$na, $nc, $ng, $nt]; my $Pnb = ($rFactorial->[$n] / ($rFactorial->[$na] *$rFactorial->[$nc] *$rFactorial->[$ng] *$rFactorial->[$nt]) )*prod($bg_pdl->transpose**pdl($na, $nc, $ng, $nt)); my $Hnb = -1 * 
sum(($ns_pdl/$n)*log2($ns_pdl/$n)->badmask(0));
        $E_Hnb += $Pnb*$Hnb;

        # step to the next combination of (na, nc, ng, nt) summing to n
        if ($nt) {
            if ($ng) {
                $ng--; $nt++;
            }
            elsif ($nc) {
                $nc--; $ng = $nt+1; $nt = 0;
            }
            elsif ($na) {
                $na--; $nc = $nt+1; $nt = 0;
            }
            else {
                last;
            }
        }
        else {
            if ($ng) {
                $ng--; $nt++;
            }
            elsif ($nc) {
                $nc--; $ng++;
            }
            else {
                $na--; $nc++; $nt = 0;
            }
        }
    }
    return $E_Hnb;
}

sub _schneider_Hnb_approx {
    my ($colsum, $Hg) = @_;
    return $Hg -3/(2*log(2)*$colsum);
}

sub _schneider_Hnb_precomputed {
    my $i = shift;
    if ($i<1 or $i>30) {
        die "Precomputed params only available for colsums 1 to 30";
    }
    my @precomputed = (
        0,                 # 1
        0.75,              # 2
        1.11090234442608,  # 3
        1.32398964833609,  # 4
        1.46290503577084,  # 5
        1.55922640783176,  # 6
        1.62900374746751,  # 7
        1.68128673969433,  # 8
        1.7215504663901,   # 9
        1.75328193031842,  # 10
        1.77879136615189,  # 11
        1.79965855531179,  # 12
        1.81699248819687,  # 13
        1.8315892710679,   # 14
        1.84403166371213,  # 15
        1.85475371994775,  # 16
        1.86408383599326,  # 17
        1.87227404728809,  # 18
        1.87952034817826,  # 19
        1.88597702438913,  # 20
        1.89176691659196,  # 21
        1.89698887214968,  # 22
        1.90172322434865,  # 23
        1.90603586889234,  # 24
        1.90998133028897,  # 25
        1.91360509239859,  # 26
        1.91694538711761,  # 27
        1.92003457997914,  # 28
        1.92290025302018,  # 29
        1.92556605820924,  # 30
    );
    return $precomputed[$i-1];
}

sub _is_bg_flat {
    my @bg = @_;
    my $ref = shift;
    foreach my $other (@bg) {
        return 0 unless $ref==$other;
    }
    return 1;
}

1;
TFBS-0.7.1/TFBS/Matrix/.svn/text-base/PWM.pm.svn-base000077500000000000000000000362201305752266700216000ustar00rootroot00000000000000
# TFBS module for TFBS::Matrix::PWM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix::PWM - class for position weight matrices of nucleotide
patterns

=head1 SYNOPSIS

=over 4

=item * creating a TFBS::Matrix::PWM object manually:

    my $matrixref = [ [ 0.61, -3.16,  1.83, -3.16,  1.21, -0.06],
                      [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83],
                      [-1.57,  1.85, -2.57, -1.34, -1.57,  1.14],
                      [ 0.31, -3.16, -2.57,  1.76,
 0.24, -0.83]
                    ];
    my $pwm = TFBS::Matrix::PWM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );
    # or

    my $matrixstring = <<ENDMATRIX
     0.61 -3.16  1.83 -3.16  1.21 -0.06
    -0.15 -2.57 -3.16 -3.16 -2.57 -1.83
    -1.57  1.85 -2.57 -1.34 -1.57  1.14
     0.31 -3.16 -2.57  1.76  0.24 -0.83
    ENDMATRIX
    ;
    my $pwm = TFBS::Matrix::PWM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );

=item * retrieving a TFBS::Matrix::PWM object from a database:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and retrieve
TFBS::Matrix::* objects from them.)

    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pwm = $db_obj->get_Matrix_by_ID("M0001", "PWM");
    # or
    my $pwm = $db_obj->get_Matrix_by_name("MyProfile", "PWM");

=item * retrieving list of individual TFBS::Matrix::PWM objects from a TFBS::MatrixSet object

(See the documentation of TFBS::MatrixSet to learn how to create
objects for storage and manipulation of multiple matrices.)

    my @pwm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * scanning a nucleotide sequence with a matrix

    my $siteset = $pwm->search_seq(-file      =>"myseq.fa",
                                   -threshold => "80%");

=item * scanning a pairwise alignment with a matrix

    my $site_pair_set = $pwm->search_aln(-file      =>"myalign.aln",
                                         -threshold => "80%",
                                         -cutoff    => "70%",
                                         -window    => 50);

=back

=head1 DESCRIPTION

TFBS::Matrix::PWM is a class whose instances are objects representing
position weight matrices (PWMs). A PWM is normally calculated from a
raw position frequency matrix (see L<TFBS::Matrix::PFM> for the
explanation of position frequency matrices). For example, given the
following position frequency matrix:

    A:[ 12  3  0  0  4  0 ]
    C:[  0  0  0 11  7  0 ]
    G:[  0  9 12  0  0  0 ]
    T:[  0  0  0  1  1 12 ]

The standard computational procedure is applied to convert it into the
following position weight matrix:

    A:[ 0.61 -3.16  1.83 -3.16  1.21 -0.06]
    C:[-0.15 -2.57 -3.16 -3.16 -2.57 -1.83]
    G:[-1.57  1.85 -2.57 -1.34 -1.57  1.14]
    T:[ 0.31 -3.16 -2.57  1.76  0.24 -0.83]

which contains the "weights" associated with the occurrence of each
nucleotide at the given position in a pattern.
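
As a rough sketch of how such weights can be derived from counts, the
code below follows the formula used by to_PWM in TFBS::Matrix::PFM:
counts are smoothed with sqrt(N) pseudocounts and each weight is log2 of
the smoothed frequency relative to a flat background of 0.25. It uses
plain Perl arrays instead of PDL, the name pfm_column_to_weights is
hypothetical, and the numbers it prints are for the sample column only;
they are not meant to reproduce the matrix shown above.

```perl
use strict;
use warnings;

sub log2 { return log($_[0]) / log(2) }

# Hypothetical helper (not part of TFBS): weights for one PFM column,
#   q = (count + 0.25*sqrt(N)) / (N + sqrt(N)),   w = log2(4*q)
# assuming a flat background distribution.
sub pfm_column_to_weights {
    my @counts = @_;             # counts in A,C,G,T order
    my $N = 0;
    $N += $_ for @counts;        # number of sequences in the column
    my $pseudo = sqrt($N);       # total pseudocount weight
    return map { log2( 4 * ($_ + 0.25 * $pseudo) / ($N + $pseudo) ) }
           @counts;
}

# A sample column in which all 12 sequences carry an A
my @w = pfm_column_to_weights(12, 0, 0, 0);
printf "%s: %.2f\n", (qw(A C G T))[$_], $w[$_] for 0 .. 3;
```

A column with equal counts for all four nucleotides gets a weight of
zero everywhere, since the smoothed frequencies equal the background.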

A TFBS::Matrix::PWM object is equipped with methods to search nucleotide
sequences and pairwise alignments of nucleotide sequences with the
pattern they represent, and return a set of sites in nucleotide
sequence (a TFBS::SiteSet object for single sequence search, and a
TFBS::SitePairSet for the alignment search).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::Matrix::PWM;
use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::Seq;
use Bio::SeqIO;
use TFBS::Matrix;
use TFBS::SiteSet;
use TFBS::Matrix::_Alignment;
use TFBS::Ext::pwmsearch;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

#################################################################
# PUBLIC METHODS
#################################################################

=head2 new

 Title   : new
 Usage   : my $pwm = TFBS::Matrix::PWM->new(%args)
 Function: constructor for the TFBS::Matrix::PWM object
 Returns : a new TFBS::Matrix::PWM object
 Args    : # you must specify one of the following three:

           -matrix,      # reference to an array of arrays of numbers
              #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited numbers
              #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited numbers
           #######

           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # a hash reference, OPTIONAL

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PWM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_set_min_max_score();
    return $self;
}

=head2 search_seq

 Title   : search_seq
 Usage   : my $siteset = $pwm->search_seq(%args)
 Function: scans a nucleotide sequence with the pattern represented
           by the PWM
 Returns : a TFBS::SiteSet object
 Args    : # you must specify either one of the following three:

           -file,       # the name of a fasta file (single sequence)
              #or
           -seqobj      # a Bio::Seq object
                        # (more accurately, a Bio::PrimarySeq object or a
                        #  subclass thereof)
              #or
           -seqstring   # a string containing the sequence

           -threshold,  # minimum score for the hit, either absolute
                        # (e.g. 11.2) or relative (e.g. "75%")
                        # OPTIONAL: default "80%"

           -subpart     # subpart of the sequence to search, given as
                        #   -subpart => { -start => 140,
                        #                 -end   => 180 }
                        # where -start and -end are coordinates in the
                        # sequence; the coordinate range is interpreted
                        # in the BioPerl tradition (1-based, inclusive)
                        # OPTIONAL: by default searches entire sequence

=cut

sub search_seq {
    my ($self, %args) = @_;
    $self->_search(%args);
}

=head2 search_aln

 Title   : search_aln
 Usage   : my $site_pair_set = $pwm->search_aln(%args)
 Function: Scans a pairwise alignment of nucleotide sequences
           with the pattern represented by the PWM: it reports only
           those hits that are present in equivalent positions of both
           sequences and exceed a specified threshold score in both, AND
           are found in regions of the alignment above the specified
           conservation cutoff value.
 Returns : a TFBS::SitePairSet object
 Args    : # you must specify either one of the following three:

           -file,       # the name of the alignment file in Clustal
                        # format
              #or
           -alignobj    # a Bio::SimpleAlign object
                        # (or a subclass thereof)
              #or
           -alignstring # a multi-line string containing the alignment
                        # in clustal format
           #############

           -threshold,  # minimum score for the hit, either absolute
                        # (e.g. 11.2) or relative (e.g.
"75%") # OPTIONAL: default "80%" -window, # size of the sliding window (inn nucleotides) # for calculating local conservation in the # alignment # OPTIONAL: default 50 -cutoff # conservation cutoff (%) for including the # region in the results of the pattern search # OPTIONAL: default "70%" -subpart # subpart of the alignment to search, given as e.g. # -subpart => { relative_to => 1, # start => 140, # end => 180 } # where start and end are coordinates in the # sequence indicated by relative_to (1 for the # 1st sequence in the alignment, 2 for the 2nd) # OPTIONAL: by default searches entire alignment -conservation # conservation profile, a TFBS::ConservationProfile # OPTIONAL: by default the conservation profile is # computed internally on the fly (less efficient) =cut sub search_aln { my ($self, %args) = @_; unless ($args{-alignstring} or $args{-alignobj} or $args{-file}) { $self->throw ("No alignment file, string or object passed to search_aln."); } $args{-pattern_set} = $self; my $aln = ($args{-alignment_setup} or TFBS::Matrix::_Alignment->new(%args)); $aln->do_sitesearch(%args); return $aln->site_pair_set; } sub max_score { $_[0]->{max_score}; } sub min_score { $_[0]->{min_score}; } =head2 name =head2 ID =head2 class =head2 matrix =head2 length =head2 revcom =head2 rawprint =head2 prettyprint The above methods are common to all matrix objects. Please consult L to find out how to use them. 
=cut ################################################################# # PRIVATE METHODS ################################################################# sub _set_min_max_score { my ($self) = @_; my $transpose = $self->pdl_matrix->xchg(0,1); $self->{min_score} = sum(minimum $transpose); $self->{max_score} = sum(maximum $transpose); } sub _search { # this method runs the pwmsearch C extension and parses the data # similarly to _csearch, which will eventually be discontinued my ($self, %args) = @_; my $seqobj = $self->_to_seqobj(%args); my ($subseq_start, $subseq_end) = (1,$seqobj->length); if(my $subpart = $args{-subpart}) { $subseq_start = $subpart->{-start}; $subseq_end = $subpart->{-end}; unless($subseq_start and $subseq_end) { $self->throw("Option -subpart missing suboption -start or -end"); } } return TFBS::Ext::pwmsearch::pwmsearch($self, $seqobj, ($args{-threshold} or 0), $subseq_start, $subseq_end); } sub _csearch { # this is a wrapper around Wyeth Wasserman's's pwm_searchPFF program # until we do a proper extension my ($self) = shift; #the rest of @_ goes to _to_seqob; my %args = @_; my $PWM_SEARCH = $args{'-binary'} || "pwm_searchPFF"; # dump the sequence into a tempfile my $seqobj = $self->_to_seqobj(@_); my ($fastaFH, $fastafile); if (defined $seqobj->{_fastaFH} and defined $seqobj->{_fastafile}) { ($fastaFH, $fastafile) = ($seqobj->{_fastaFH}, $seqobj->{_fastafile}); } else { ($fastaFH, $fastafile) = tmpnam(); my $seqFH = Bio::SeqIO->newFh(-fh =>$fastaFH, -format=>"Fasta"); print $seqFH $seqobj; } # we need $fastafile below # calculate threshold my $threshold; if ($args{-threshold}) { if ($args{-threshold} =~ /(.+)%/) { # percentage $threshold = $self->{min_score} + ($self->{max_score} - $self->{min_score})* $1/100; } else { # absolute value $threshold = $args{-threshold}; } } else { # no threshold given $threshold = $self->{min_score} -1; } # convert piddle to text (there MUST be a better way) my $pwmstring = sprintf ( $self->pdl_matrix ); $pwmstring 
=~ s/\[|\]//g; # lose [] $pwmstring =~ s/\n /\n/g; # lose leading spaces my @pwmlines = split("\n", $pwmstring); # f $pwmstring = join ("\n", @pwmlines[2..5])."\n"; # dump pwm into a tempfile my ($pwmFH, $pwmfile) = tmpnam(); # we need $pwmfile below print $pwmFH $pwmstring; close $pwmFH; # run pwmsearch my $hitlist = TFBS::SiteSet->new(); my ($TFname, $TFclass) = ($self->{name}, $self->{class}); my @search_result_lines = `$PWM_SEARCH $pwmfile $fastafile $threshold -n $TFname -c $TFclass`; foreach (@search_result_lines) { chomp; my ($seq_id, $factor, $class, $strand, $score, $pos, $siteseq) = (split)[0, 2, 3, 4, 5, 7, 9]; my $correct_strand = ($strand eq "+")? "-1" : "1"; my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id()."", -seqobj => $seqobj, -strand => $correct_strand."", -pattern => $self, -siteseq => $siteseq."", -score => $score."", -start => $pos, -end => $pos + length($siteseq) -1 ); $hitlist->add_site($site); } # cleanup unlink $fastafile unless $seqobj->{_fastafile}; unlink $pwmfile; return $hitlist; } sub _bsearch { # this is Perl/PDL only search routine. For experimental purposes only my ($self,%args) = @_; #the rest of @_ goes to _to_seqob; my @PWMs; # prepare the sequence my $seqobj = $self->_to_seqobj(%args); my $seqmatrix = (defined $seqobj->{_pdl_matrix}) ? 
$seqobj->{_pdl_matrix} : _seq_to_pdlmatrix($seqobj); # calculate threshold my $threshold; if ($args{-threshold}) { if ($args{-threshold} =~ /(.+)%/) { # percentage $threshold = $self->{min_score} + ($self->{max_score} - $self->{min_score})* $1/100; } else { # absolute value $threshold = $args{-threshold}; } } else { # no threshold given $threshold = $self->{min_score} -1; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm ($self, $self->revcom()) { my $TFlength = $pwm->pdl_matrix->getdim(0); my $position_score_pdl = zeroes($seqmatrix->getdim(0) - $TFlength + 1); my $position_index_pdl = sequence($seqmatrix->getdim(0) - $TFlength + 1)+1; foreach my $i (0..($TFlength-1)) { my $columnproduct = $seqmatrix * $pwm->pdl_matrix->slice("$i,:"); $position_score_pdl += $columnproduct->xchg(0,1)->sumover->slice($i.":".($i-$TFlength)); } my @hitpositions = list $position_index_pdl->where($position_score_pdl >= $threshold); my @hitscores = list $position_score_pdl->where($position_score_pdl >= $threshold); for my $i(0..$#hitpositions) { my($pos,$score) = ($hitpositions[$i], $hitscores[$i]); my $siteseq = scalar($seqobj->subseq($pos, $pos+$TFlength-1)); my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id(), -seqobj => $seqobj, -strand => $pwm->{strand}, -Matrix => $pwm, -siteseq => $siteseq, -score => $score, -start => $pos); $hitlist->add_site($site); } } return $hitlist; } sub _to_seqobj { my ($self, %args) = @_; my $seq; if ($args{-file}) { # not a Bio::Seq return Bio::SeqIO->new(-file => $args{-file}, -format => 'fasta', -moltype => 'dna')->next_seq(); } elsif ($args{-seqstring} or $args{-seq}) { # I guess it's a string then return Bio::Seq->new(-seq => ($args{-seqstring} or $args{-seq}), -id => ($args{-seq_id} or "undefined"), -moltype => 'dna'); } elsif ($args{'-seqobj'} and ref($args{'-seqobj'}) and $args{'-seqobj'}->can("seq")) { # do nothing (maybe check later) return $args{'-seqobj'}; } #elsif (ref($format) =~ /Bio\:\:Seq/ and !defined $seq) 
{ # if only one parameter passed and it's a Bio::Seq #return $format; #} else { $self->throw ("Wrong parametes passed to search method: ".%args); } } sub _seq_to_pdlmatrix { # called from ?search # not OO - help function for search my $seqobj = shift; my $seqstring = uc($seqobj->seq()); my @perlarray; foreach (qw(A C G T)) { my $seqtobits = $seqstring; eval "\$seqtobits =~ tr/$_/1/"; # curr. letter $_ to 1 eval "\$seqtobits =~ tr/1/0/c"; # non-1s to 0 push @perlarray, [split("", $seqtobits)]; } return byte (\@perlarray); } sub DESTROY { # nothing } 1; TFBS-0.7.1/TFBS/Matrix/.svn/text-base/_Alignment.pm.svn-base000077500000000000000000000273731305752266700232230ustar00rootroot00000000000000package TFBS::Matrix::_Alignment; use vars qw(@ISA $AUTOLOAD); use TFBS::SitePair; use TFBS::SitePairSet; use Bio::Root::Root; use Bio::Seq; use Bio::SimpleAlign; use Bio::AlignIO; use IO::String; use PDL; use strict; @ISA =('Bio::Root::Root'); # CONSTANTS use constant DEFAULT_WINDOW => 50; use constant DEFAULT_CUTOFF => 70; use constant DEFAULT_THRESHOLD => "80%"; sub new { # this is ugly; OK, OK, I'll rewrite it as soon as I can my ($caller, %args) = @_; my $self = bless {}, ref $caller || $caller; $self->window($args{-window} or DEFAULT_WINDOW); $self->_parse_alignment(%args); $self->seq1length(length(_strip_gaps($self->alignseq1()))); $self->seq2length(length(_strip_gaps($self->alignseq2()))); $self->_set_subpart_bounds($args{-subpart}); # # If a conservation profile is provided, no need to compute it again. 
# NOTE: conservation2 never seems to be used anywhere else so don't worry # about the fact we are ignoring it if conservation is passed in :) # my $cp = $args{-conservation}; if ($cp) { $self->conservation1([$cp->conservation()]); } else { $self->conservation1($self->_calculate_conservation($self->window(),1)); $self->conservation2($self->_calculate_conservation($self->window(),2)); } $self->cutoff($args{-cutoff} or DEFAULT_CUTOFF); #$self->threshold($args{-threshold} or DEFAULT_THRESHOLD); #$self->_do_sitesearch #(($args{-pattern_set} or $self->throw("No -matrixset parameter")), # ($args{-threshold} or DEFAULT_THRESHOLD), # ()); # $self->_set_start_end(%args); # Maybe later... return $self; } sub DESTROY { # empty } sub _parse_alignment { my ($self, %args) = @_; my ($seq1, $seq2, $start); my $alignobj; if (defined $args{'-alignstring'}) { $alignobj = _alignstring_to_alignobj($args{'-alignstring'}); } elsif (defined $args{'-file'}) { $alignobj = _alignfile_to_alignobj($args{'-file'}); } elsif (defined $args{-alignobj}) { $alignobj = $args{'-alignobj'}; } else { $self->throw("No -alignstring, -file or -alignobj passed."); } my @match; my ($seqobj1, $seqobj2) = $alignobj->each_seq; ($seq1, $seq2) = ($seqobj1->seq, $seqobj2->seq); $start = 1; $self->seq1name($seqobj1->display_id); $self->seq2name($seqobj2->display_id); $self->alignseq1($seq1); $self->alignseq2($seq2); my @seq1 = ("-", split('', $seq1) ); my @seq2 = ("-", split('', $seq2) ); $self->{alignseq1array} = [@seq1]; $self->{alignseq2array} = [@seq2]; my (@seq1index, @seq2index); my ($i1, $i2) = (0, 0); for my $pos (0..$#seq1) { my ($s1, $s2) = (0, 0); $seq1[$pos] ne "-" and $s1 = ++$i1; $seq2[$pos] ne "-" and $s2 = ++$i2; push @seq1index, $s1; push @seq2index, $s2; } $self->pdlindex( pdl [ [list sequence($#seq1+1)], [@seq1index], [@seq2index], [list zeroes ($#seq1+1)] ]) ; return 1; } sub pdlindex { my ($self, $input, $p1, $p2) = @_ ; # print ("PARAMS ", join(":", @_), "\n"); if (ref($input) eq "PDL") { 
$self->{pdlindex} = $input; } unless (defined $p2) { return $self->{pdlindex}; } else { my @results = list $self->{pdlindex}->xchg(0,1)->slice($p2)->where ($self->{pdlindex}->xchg(0,1)->slice($p1)==$input); wantarray ? return @results : return $results[0]; } } sub lower_pdlindex { my ($self, $input, $p1, $p2) = @_; unless (defined $p2) { $self->throw("Wrong number of parameters passed to lower_pdlindex"); } my $result; my $i = $input; until ($result = $self->pdlindex($i, $p1 => $p2)) { $i--; last if $i==0; } return $result or 1; } sub higher_pdlindex { my ($self, $input, $p1, $p2) = @_; unless (defined $p2) { $self->throw("Wrong number of parameters passed to lower_pdlindex"); } my $result; my $i = $input; until ($result = $self->pdlindex($i, $p1 => $p2)) { $i++; last unless ($self->pdlindex($i, $p1=>0) > 0); } return $result; } sub _calculate_conservation { my ($self, $WINDOW, $which) = @_; my (@seq1, @seq2); if ($which==2) { @seq1 = @{$self->{alignseq2array}}; @seq2 = @{$self->{alignseq1array}}; } else { @seq1 = @{$self->{alignseq1array}}; @seq2 = @{$self->{alignseq2array}}; $which=1; } my @CONSERVATION; my @match; while ($seq1[0] eq "-") { shift @seq1; shift @seq2; } for my $i (0..$#seq1) { push (@match,( uc($seq1[$i]) eq uc($seq2[$i]) ? 1:0)) unless ($seq1[$i] eq "-" or $seq1[$i] eq "."); } my @graph=($match[0]); for my $i (1..($#match+$WINDOW/2)) { $graph[$i] = ($graph[$i-1] or 0) + ($i>$#match ? 0: $match[$i]) - ($i<$WINDOW ? 0: $match[$i-$WINDOW]); } # at this point, the graph values are shifted $WINDOW/2 to the right # i.e. 
the score at a certain position is the score of the window # UPSTREAM of it: To fix it, we shoud discard the first $WINDOW/2 scores: #$self->conservation1 ([]); foreach my $pos (@graph[int($WINDOW/2)..$#graph]) { push @CONSERVATION, 100*$pos/$WINDOW; } # correction foreach my $pos (0..int($WINDOW/2)) { $CONSERVATION[$pos] = $CONSERVATION[$pos]*$WINDOW/(int($WINDOW/2)+$pos); $CONSERVATION[$#CONSERVATION - $pos] = $CONSERVATION[$#CONSERVATION - $pos]*$WINDOW/(int($WINDOW/2)+$pos); } return [@CONSERVATION]; } sub _strip_gaps { # a utility function my $seq = shift; $seq =~ s/\-|\.//g; return $seq; } sub do_sitesearch { my ($self, @args ) = @_; my ($MATRIXSET, $THRESHOLD, $CUTOFF) = $self->_rearrange([qw(PATTERN_SET THRESHOLD CUTOFF)], @args); if (!$MATRIXSET) { $self->throw("No -pattern_set passed to do_sitesearch"); } $CUTOFF = ($CUTOFF or DEFAULT_CUTOFF); $THRESHOLD = ($THRESHOLD or DEFAULT_THRESHOLD); $self->site_pair_set(TFBS::SitePairSet->new()); return if(($self->subpart1 and $self->subpart1->{-start} == 0) or ($self->subpart2 and $self->subpart2->{-start} == 0)); # ^^^ If one of the subparts is a gap, there's no point in searching my $seqobj1 = Bio::Seq->new(-seq=>_strip_gaps($self->alignseq1()), -id => "Seq1"); my $siteset1 = $MATRIXSET->search_seq(-seqobj => $seqobj1, -threshold => $THRESHOLD, -subpart => $self->subpart1); my $siteset1_itr = $siteset1->Iterator(-sort_by => "start"); my $seqobj2 = Bio::Seq->new(-seq=>_strip_gaps($self->alignseq2()), -id => "Seq2"); my $siteset2 = $MATRIXSET->search_seq(-seqobj => $seqobj2, -threshold => $THRESHOLD, -subpart => $self->subpart2); my $siteset2_itr = $siteset2->Iterator(-sort_by => "start"); my $site1 = $siteset1_itr->next(); my $site2 = $siteset2_itr->next(); while (defined $site1 and defined $site2) { my $pos1_in_aln = $self->pdlindex($site1->start(), 1=>0); my $pos2_in_aln = $self->pdlindex($site2->start(), 2=>0); my $cmp = (($pos1_in_aln <=> $pos2_in_aln) or ($site1->pattern->name() cmp $site2->pattern->name()) 
or ($site1->strand() cmp $site2->strand())); if ($cmp==0) { ### match if (# threshold test: $self->conservation1->[$site1->start()] >= $self->cutoff() ) { my $site_pair = TFBS::SitePair->new($site1, $site2); $self->site_pair_set->add_site_pair($site_pair); } $site1 = $siteset1_itr->next(); $site2 = $siteset2_itr->next(); } elsif ($cmp<0) { ### $siteset1 is behind $site1 = $siteset1_itr->next(); } elsif ($cmp>0) { ### $siteset2 is behind $site2 = $siteset2_itr->next(); } } } sub _set_subpart_bounds { my ($self, $subpart) = @_; if(defined $subpart) { my ($relative_to, $start, $end) = ($subpart->{-relative_to}, $subpart->{-start}, $subpart->{-end}); unless(defined($relative_to) and defined($start) and defined($end) ) { $self->throw("Option -subpart missing suboption -relative_to, -start or -end"); } if($relative_to == 1) { my $other_start = $self->higher_pdlindex($start, 1 => 2); my $other_end = $self->lower_pdlindex($end, 1 => 2); ($other_start, $other_end) = (0,0) if($other_start > $other_end); $self->subpart1({ -start => $start, -end => $end }); $self->subpart2({ -start => $other_start, -end => $other_end }); } elsif($relative_to == 2) { my $other_start = $self->higher_pdlindex($start, 2 => 1); my $other_end = $self->lower_pdlindex($end, 2 => 1); ($other_start, $other_end) = (0,0) if($other_start > $other_end); $self->subpart1({ -start => $other_start, -end => $other_end }); $self->subpart2({ -start => $start, -end => $end }); } else { $self->throw("Suboption -relative_to should be 1 or 2"); } } } sub _calculate_cutoff { my ($self) = @_; my $ile = 0.9; my @conservation_array = sort {$a <=> $b} @{$self->conservation1()}; my $perc_90 = $conservation_array[int($ile*scalar(@conservation_array))]; return $perc_90; } sub _alignfile_to_string { # a utility function # DEPRECATED !!! 
    my $alignfile = shift;
    if ($alignfile =~ /\.msf$/i) {
        my $alignobj = Bio::SimpleAlign->new();
        $alignobj->read_MSF($alignfile);
        return _alignobj_to_string($alignobj);
    }
    else {
        # assumed clustalw - no AlignIO import yet
        local $/ = undef;   # slurp the whole file in one read
        open(FILE, '<', $alignfile)
            or die("Could not read alignfile $alignfile, stopped");
        my $alignstring = <FILE>;
        close FILE;
        return $alignstring;
    }
}

sub _alignfile_to_alignobj {
    # a utility function
    my ($alignfile, $format) = @_;
    # guess the format from the file name unless one was given explicitly
    $format ||= ($alignfile =~ /\.msf$/i) ? 'msf' : 'clustalw';
    my $alnio = Bio::AlignIO->new(-file=>$alignfile, -format=>$format);
    return $alnio->next_aln;
}

sub _alignobj_to_string {
    # a utility function
    # DEPRECATED
    my $alignobj = shift;
    my $alignstring;
    my $io = IO::String->new($alignstring);
    my $alnio = Bio::AlignIO->new(-fh=>$io, -format=>"clustalw");
    $alnio->write_aln($alignobj);
    $alnio->close();
    # $io->close;
    return $alignstring;
}

sub _alignstring_to_alignobj {
    # a utility function
    my ($alignstring, $format) = (@_, 'clustalw');
    my $io = IO::String->new($alignstring);
    my $alnio = Bio::AlignIO->new(-fh=>$io, -format=>$format);
    my $alignobj = $alnio->next_aln();
    $alnio->close();
    # $io->close;
    return $alignobj;   # the parsed alignment object, not the input string
}

# uglier than AUTOLOAD, but faster - a quick fix to get rid of Class::MethodMaker

sub cutoff        { $_[0]->{'cutoff'} = $_[1] if exists $_[1]; $_[0]->{'cutoff'}; }
sub window        { $_[0]->{'window'} = $_[1] if exists $_[1]; $_[0]->{'window'}; }
sub alignseq1     { $_[0]->{'alignseq1'} = $_[1] if exists $_[1]; $_[0]->{'alignseq1'}; }
sub alignseq2     { $_[0]->{'alignseq2'} = $_[1] if exists $_[1]; $_[0]->{'alignseq2'}; }
sub site_pair_set { $_[0]->{'site_pair_set'} = $_[1] if exists $_[1]; $_[0]->{'site_pair_set'}; }
sub seq1name      { $_[0]->{'seq1name'} = $_[1] if exists $_[1]; $_[0]->{'seq1name'}; }
sub seq2name      { $_[0]->{'seq2name'} = $_[1] if exists $_[1]; $_[0]->{'seq2name'}; }
sub seq1length    { $_[0]->{'seq1length'} = $_[1] if exists $_[1]; $_[0]->{'seq1length'}; }
sub seq2length    { $_[0]->{'seq2length'} = $_[1] if
exists $_[1]; $_[0]->{'seq2length'}; } sub subpart1 { $_[0]->{'subpart1'} = $_[1] if exists $_[1]; $_[0]->{'subpart1'}; } sub subpart2 { $_[0]->{'subpart2'} = $_[1] if exists $_[1]; $_[0]->{'subpart2'}; } sub conservation1 { $_[0]->{'conservation1'} = $_[1] if exists $_[1]; $_[0]->{'conservation1'};} sub conservation2 { $_[0]->{'conservation2'} = $_[1] if exists $_[1]; $_[0]->{'conservation2'};} sub exclude_orf { $_[0]->{'exclude_orf'} = $_[1] if exists $_[1]; $_[0]->{'exclude_orf'}; } sub start_at { $_[0]->{'start_at'} = $_[1] if exists $_[1]; $_[0]->{'start_at'}; } sub end_at { $_[0]->{'end_at'} = $_[1] if exists $_[1]; $_[0]->{'end_at'}; } 1; TFBS-0.7.1/TFBS/Matrix/Alignment.pm000077500000000000000000000161511305752266700165570ustar00rootroot00000000000000# TFBS module for TFBS::Matrix::ICM # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix::Alignment - class for alignment of PFM objects =head1 SYNOPSIS =over 1 =item * Making an alignment: (See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.) my $db_obj = TFBS::DB::JASPAR2->new (-connect => ["dbi:mysql:JASPAR2:myhost", "myusername", "mypassword"]); my $pfm1 = $db_obj->get_Matrix_by_ID("M0001", "PFM"); my $pfm2 = $db_obj->get_Matrix_by_ID("M0002", "PFM"); my $alignment= new TFBS::Matrix::Alignment( -pfm1=>$pfm1, -pfm2=>$pfm2, -binary=>"/TFBS/Ext/matrix_aligner", ); =head1 DESCRIPTION TFBS::Matrix::Alignment is a class for representing and performing pairwise alignments of profiles (in the form of TFBS::PFM objects) Alignments are preformed using a semi-global variant of the Needleman-Wunsch algorithm that only permits the opening of one internal gap. Fore reference, the algorithm is described in Sandelin et al Funct Integr Genomics. 2003 Jun 25 =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # The code starts HERE: package TFBS::Matrix::Alignment; use vars '@ISA'; use strict; use Bio::Root::Root; use TFBS::Matrix; use File::Temp qw/:POSIX/; @ISA = qw(TFBS::Matrix Bio::Root::Root); #alignment methods: for making and storing a single matrix-alignments =head2 new Title : new Usage : my $alignment = TFBS::Matrix::Alignment->new(%args) Function: constructor for the TFBS::Matrix::Alignment object Returns : a new TFBS::Matrix::Alignment object Args : # you must specify: -pfm1, # a TFBS::Matrix::PFM object -pfm2, # another TFBS::Matrix::PFM object -binary, # a valid path to the comparison algorithm (matrixalign) ####### -ext_penalty #OPTIONAL gap extension penalty in Needleman-Wunsch algorithmstring. Default 0.01 -open_penalty, #OPTIONAL gap opening penalty in Needleman-Wunsch algorithmstring. 
Default 3.0 =cut sub new { #defines and createa an alignment # args: two pfm objects # binary file #optional scoring penalites my ($class, %args) = @_; my $self={ _pfm1=> $args{'-pfm1'}, _pfm2=> $args{'-pfm2'}, _ext_penalty=>$args{'-ext_penalty'}|| 0.01, _open_penalty=> $args{'-open_penalty'}|| 3.00, _strand=>'', _align_string=>'', _gaps=>'', _aligned_positions =>'', _score=>'', }; bless $self, "TFBS::Matrix::Alignment"; # errorcheck: # save temp files my($fh1, $file1) = tmpnam(); print $fh1 $args{'-pfm1'}->rawprint()|| die " Cannot save temporary files for alignment"; my($fh2, $file2) = tmpnam(); print $fh2 $args{'-pfm2'}->rawprint()|| die " Cannot save temporary files for alignment"; #align my @pfm1_string; my @pfm2_string; foreach (`$args{'-binary'} $file1 $file2 $self->{'_open_penalty'} $self->{'_ext_penalty'}`){ # my @pfm2_string; my $max_length=$self->{'_pfm1'}->length(); $max_length=$self->{'_pfm2'}->length() if ( $self->{'_pfm2'}->length() > $self->{'_pfm1'}->length()); if (/^PFM1/){ s/PFM1//; s/\t0/\t-/g; @pfm1_string= split(); next; } if (/^PFM2/){ s/PFM2//; s/\t0/\t-/g; @pfm2_string= split(); next; } if (/^INFO/){ my @temp=split; ($self->{'_score'}, $self->{'_strand'}, $self->{'_aligned_positions'}, $self->{'_gaps'})= ($temp[3], $temp[6], $temp[7],$temp[8]); next; } } my $string= ($self->{'_pfm1'}->name()||$self->{'_pfm1'}->ID()||'PFM1')."\t\t"; my $string2=($self->{'_pfm2'}->name()||$self->{'_pfm2'}->ID()||'PFM2')."\t\t";; if ($pfm1_string[0]==1){ $string.="-\t" x ($pfm2_string[0]-1); foreach (my $j=1; $j< $pfm2_string[0]; $j++){ $string2.="$j\t"; } } if ($pfm2_string[0]==1){ $string2.="-\t" x ($pfm1_string[0]-1); for (my $j=1; $j< $pfm1_string[0]; $j++){ $string.="$j\t"; } } $string.= join("\t", @pfm1_string); $string2.= join("\t", @pfm2_string); if ($pfm1_string[-1]==$self->{'_pfm1'}->length()){ $string.="\t-" x ($self->{'_pfm2'}->length()-$pfm2_string[-1]); for (my $j=$pfm2_string[-1]+1; $j<= $self->{'_pfm2'}->length(); $j++){ $string2.="\t$j"; } } 
    if ($pfm2_string[-1]==$self->{'_pfm2'}->length()){
        $string2.="\t-" x ($self->{'_pfm1'}->length()-$pfm1_string[-1]);
        for (my $j=$pfm1_string[-1]+1; $j<= $self->{'_pfm1'}->length(); $j++){
            $string.="\t$j";
        }
    }

    $self->{'_align_string'}= $string ."\n". $string2;
    unlink $file2;
    unlink $file1;

    return $self;
}

# access functions

=head2 score

 Title   : score
 Usage   : my $score = $alignmentobject->score();
 Function: access an alignment score (where each aligned position can
           contribute max 2)
 Returns : a floating point number
 Args    : none

=cut

=head2 gaps

 Title   : gaps
 Usage   : my $nr_of_gaps = $alignmentobject->gaps();
 Function: access the number of gaps in an alignment
 Returns : an integer
 Args    : none

=cut

=head2 length

 Title   : length
 Usage   : my $length = $alignmentobject->length();
 Function: access the length of an alignment (i.e. the number of
           aligned positions)
 Returns : an integer
 Args    : none

=cut

=head2 strand

 Title   : strand
 Usage   : my $strand = $alignmentobject->strand();
 Function: access the orientation of the aligned patterns:
           ++ = oriented as input
           +- = second pattern is reverse-complemented
 Returns : a string
 Args    : none

=cut

=head2 alignment

 Title   : alignment
 Usage   : my $alignment_string = $alignmentobject->alignment();
 Function: access a string describing the alignment
 Returns : a string, where each number refers to a position in the
           respective PFM. Position numbering is according to
           orientation: i.e. if the second profile is reversed,
           position 1 corresponds to the last position in the input
           profile. Gaps are denoted as - .
RXR-VDR - 1 2 3 - 4 5 - PPARgamma-RXRal 1 2 3 4 5 6 7 8 Args : none =cut sub gaps{ return $_[0]->{'_gaps'};} sub score{ return $_[0]->{'_score'};} sub length{ return $_[0]->{'_aligned_positions'};} sub strand{ return $_[0]->{'_strand'};} sub alignment{ return $_[0]->{'_align_string'};} 1 TFBS-0.7.1/TFBS/Matrix/ICM.pm000077500000000000000000000773721305752266700152650ustar00rootroot00000000000000# TFBS module for TFBS::Matrix::ICM # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix::ICM - class for information content matrices of nucleotide patterns =head1 SYNOPSIS =over 4 =item * creating a TFBS::Matrix::ICM object manually: my $matrixref = [ [ 0.00, 0.30, 0.00, 0.00, 0.24, 0.00 ], [ 0.00, 0.00, 0.00, 1.45, 0.42, 0.00 ], [ 0.00, 0.89, 2.00, 0.00, 0.00, 0.00 ], [ 0.00, 0.00, 0.00, 0.13, 0.06, 2.00 ] ]; my $icm = TFBS::Matrix::ICM->new(-matrix => $matrixref, -name => "MyProfile", -ID => "M0001" ); # or my $matrixstring = <new(-matrixstring => $matrixstring, -name => "MyProfile", -ID => "M0001" ); =item * retrieving a TFBS::Matix::ICM object from a database: (See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.) 
    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $icm = $db_obj->get_Matrix_by_ID("M0001", "ICM");
    # or
    my $icm = $db_obj->get_Matrix_by_name("MyProfile", "ICM");

=item * retrieving list of individual TFBS::Matrix::ICM objects
from a TFBS::MatrixSet object

(see documentation of TFBS::MatrixSet to learn how to create
objects for storage and manipulation of multiple matrices)

    my @icm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * drawing a sequence logo

    $icm->draw_logo(-file        => "logo.png",
                    -full_scale  => 2.25,
                    -xsize       => 500,
                    -ysize       => 250,
                    -graph_title => "C/EBPalpha binding site logo",
                    -x_title     => "position",
                    -y_title     => "bits");

=back

=head1 DESCRIPTION

TFBS::Matrix::ICM is a class whose instances are objects representing
information content matrices (ICMs). An ICM is normally calculated from a
raw position frequency matrix (see L<TFBS::Matrix::PFM>
for the explanation of position frequency matrices). For example, given
the following position frequency matrix,

    A:[ 12     3     0     0     4     0  ]
    C:[  0     0     0    11     7     0  ]
    G:[  0     9    12     0     0     0  ]
    T:[  0     0     0     1     1    12  ]

the standard computational procedure is applied to convert it into the
following information content matrix:

    A:[2.00  0.30  0.00  0.00  0.24  0.00]
    C:[0.00  0.00  0.00  1.45  0.42  0.00]
    G:[0.00  0.89  2.00  0.00  0.00  0.00]
    T:[0.00  0.00  0.00  0.13  0.06  2.00]

which contains the information content, in bits, associated with the
occurrence of each nucleotide at the given position in a pattern.

A TFBS::Matrix::ICM object can draw a sequence logo of the pattern it
represents (see draw_logo below). To search nucleotide sequences and
pairwise alignments of nucleotide sequences, use a TFBS::Matrix::PWM
object, which is equipped with methods that return a set of sites in
nucleotide sequence (a TFBS::SiteSet object for single sequence search,
and a TFBS::SitePairSet for the alignment search).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object methods.
Internal methods are preceded with an underscore. =cut # The code starts HERE: package TFBS::Matrix::ICM; use vars '@ISA'; use PDL; use strict; use Bio::Root::Root; use Bio::SeqIO; use TFBS::Matrix; BEGIN { # this will not fail if the modules are nit available # but only if the user tries to actually draw a logo eval "use SVG"; eval "use GD"; }; use File::Temp qw/:POSIX/; @ISA = qw(TFBS::Matrix Bio::Root::Root); ################################################################# # PUBLIC METHODS ################################################################# =head2 new Title : new Usage : my $icm = TFBS::Matrix::ICM->new(%args) Function: constructor for the TFBS::Matrix::ICM object Returns : a new TFBS::Matrix::ICM object Args : # you must specify either one of the following three: -matrix, # reference to an array of arrays of integers #or -matrixstring,# a string containing four lines # of tab- or space-delimited integers #or -matrixfile, # the name of a file containing four lines # of tab- or space-delimited integers ####### -name, # string, OPTIONAL -ID, # string, OPTIONAL -class, # string, OPTIONAL -tags # an array reference, OPTIONAL =cut sub new { my ($class, %args) = @_; my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"ICM"); my $self = bless $matrix, ref($class) || $class; $self->_check_ic_validity(); return $self; } =head2 to_PWM Title : to_PWM Usage : my $pwm = $icm->to_PWM() Function: converts an information content matrix (a TFBS::Matrix::ICM object) to position weight matrix. At present it assumes uniform background distribution of nucleotide frequencies. 
Returns : a new TFBS::Matrix::PWM object Args : none; in the future releases, it should be able to accept a user defined background probability of the four nucleotides =cut sub to_PWM { my ($self) = @_; $self->throw ("Method to_PWM not yet implemented."); } =head2 draw_logo Title : draw_logo Usage : my $gdImageObj = $icm->draw_logo(%args) Function: Draws a "sequence logo", a graphical representation of a possibly degenerate fixed-width nucleotide sequence pattern, from the information content matrix Returns : a GD::Image object; if you only need the image file you can ignore it Args : -file, # the name of the output PNG image file # OPTIONAL: default none -xsize # width of the image in pixels # OPTIONAL: default 600 -ysize # height of the image in pixels # OPTIONAL: default 5/8 of -x_size -startpos # start position in the logo for x axis # OPTIONAL: default is 1 -margin # size of image margins in pixels # OPTIONAL: default 15% of -y_size -full_scale # the maximum value on the y-axis, in bits # OPTIONAL: default 2.25 -graph_title,# the graph title # OPTIONAL: default none -x_title, # x-axis title; OPTIONAL: default none -y_title # y-axis title; OPTIONAL: default none -error_bars # reference to an array of S.D. 
values for each column; OPTIONAL -ps # if true, produces a postscript string instead of a GD::Image object -pdf # if true AND the -file argumant is used, produces an output pdf file =cut sub draw_logo { no strict; my $self = shift; my %args = (-xsize => 600, -full_scale => 2.25, -graph_title=> "", -x_title => "", -y_title => "", -startpos => 1, @_); # Other parameters that can be specified: # -ysize -line_width -margin # do not have a fixed default value # - they are calculated from xsize if not specified # draw postscript logo if asked for if ($args{'-ps'} || $args{'-pdf'}){ return _draw_ps_logo($self, %args); } if ($args{'-svg'} || $args{'-SVG'}){ return _draw_svg_logo($self, %args); } my ($xsize,$FULL_SCALE, $x_title, $y_title) = @args{qw(-xsize -full_scale -x_title y_title)} ; my $PER_PIXEL_LINE = 300; # calculate other parameters if not specified my $line_width = ($args{-line_width} or int ($xsize/$PER_PIXEL_LINE) or 1); my $ysize = ($args{-ysize} or $xsize/1.6); # remark (the line above): 1.6 is a standard screen x:y ratio my $margin = ($args{-margin} or $ysize*0.15); my $image = GD::Image->new($xsize, $ysize); my $white = $image->colorAllocate(255,255,255); my $black = $image->colorAllocate(0,0,0); my $motif_size = $self->pdl_matrix->getdim(0); my $font = ((&GD::gdTinyFont(), &GD::gdSmallFont(), &GD::gdMediumBoldFont(), &GD::gdLargeFont(), &GD::gdGiantFont())[int(($ysize-50)/100)] or &GD::gdGiantFont()); my $title_font = ((&GD::gdSmallFont(), &GD::gdMediumBoldFont(), &GD::gdLargeFont(), &GD::gdGiantFont())[int(($ysize-50)/100)] or &GD::gdGiantFont()); # WRITE LABELS AND TITLE # graph title #&GD::Font::MediumBold $image->string($title_font, $xsize/2-length($args{-graph_title})* $title_font->width() /2, $margin/2 - $title_font->height()/2, $args{-graph_title}, $black); # x_title $image->string($font, $xsize/2-length($args{-x_title})*$font->width()/2, $ysize-( $margin - $font->height()*0 - 5*$line_width)/2 - $font->height()/2*0, $args{-x_title}, $black); # 
y_title $image->stringUp($font, ($margin -$font->width()- 5*$line_width)/2 - $font->height()/2 , $ysize/2+length($args{'-y_title'})*$font->width()/2, $args{'-y_title'}, $black); # DRAW AXES # vertical: (top left to bottom right) $image->filledRectangle($margin-$line_width, $margin-$line_width, $margin-1, $ysize-$margin+$line_width, $black); # horizontal: (ditto) $image->filledRectangle($margin-$line_width, $ysize-$margin+1, $xsize-$margin+$line_width,$ysize-$margin+$line_width, $black); # DRAW VERTICAL TICKS AND LABELS # vertical axis (IC 1 and 2) my $ic_1 = ($ysize - 2* $margin) / $FULL_SCALE; foreach my $i (1..$FULL_SCALE) { $image->filledRectangle($margin-3*$line_width, $ysize-$margin - $i*$ic_1, $margin-1, $ysize-$margin+$line_width - $i*$ic_1, $black); $image->string($font, $margin-5*$line_width - $font->width, $ysize - $margin - $i*$ic_1 - $font->height()/2, $i, $black); } # DRAW HORIZONTAL TICKS AND LABELS, AND THE LOGO ITSELF # define function refs as hash elements my %draw_letter = ( A => \&_png_draw_A, C => \&_png_draw_C, G => \&_png_draw_G, T => \&_png_draw_T ); my $horiz_step = ($xsize -2*$margin) / $motif_size; #this is to avoid clutter on X axis: my $longest_label_length = length("$motif_size"); if (length ($args{-startpos}) > $longest_label_length) { $longest_label_length = length ($args{-startpos}); } if (length ($args{-startpos}+$motif_size) > $longest_label_length) { $longest_label_length = length ($args{-startpos}+$motif_size); } my $draw_every_nth_label = int($longest_label_length*$font->width+2) / $horiz_step + 1; foreach my $i (0..$motif_size) { $image->filledRectangle($margin + $i*$horiz_step, $ysize-$margin+1, $margin + $i*$horiz_step+ $line_width, $ysize-$margin+3*$line_width, $black); last if $i==$motif_size; # get the $i-th column of matrix my %ic; ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i); # sort nucleotides by increasing information content my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T); # draw logo 
column my $xlettersize = $horiz_step /1.1; my $ybottom = $ysize - $margin; foreach my $base (@draw_order) { my $ylettersize = int($ic{$base}*$ic_1 +0.5); next if $ylettersize ==0; # draw letter $draw_letter{$base}->($image, $margin + $i*$horiz_step, $ybottom - $ylettersize, $xlettersize, $ylettersize, $white); $ybottom = $ybottom - $ylettersize-1; } if ($args{'-error_bars'} and ref($args{'-error_bars'}) eq "ARRAY") { my $sd_pix = int($args{'-error_bars'}->[$i]*$ic_1); my $yt = $ybottom - $sd_pix+1; my $yb = $ybottom + $sd_pix-1; my $xpos = $margin + ($i+0.45)*$horiz_step; my $half_width; if ($yb > $ysize-$margin+$line_width) { $yb = $ysize-$margin+$line_width } else { $image->line($xpos - $xlettersize/8, $yb, $xpos + $xlettersize/8, $yb, $black); } $image->line($xpos, $yt, $xpos, $yb, $black); $image->line($xpos - 1 , $ybottom, $xpos+1, $ybottom, $black); $image->line($xpos - $xlettersize/8, $yt, $xpos + $xlettersize/8, $yt, $black); } # print position number on x axis (The if condition is for avoiding clutter) my $xlabel = $i+ $args{-startpos}; if ($args{-startpos}<0 and $xlabel>=0) { $xlabel ++; } if ($xlabel % $draw_every_nth_label == 0) { $image->string($font, $margin + ($i+0.5)*$horiz_step - $font->width()/2, $ysize - $margin +5*$line_width, $xlabel, $black); } } # print $args{-file}; if ($args{-file}) { open (PNGFILE, ">".$args{-file}) or $self->throw("Could not write to ".$args{-file}); print PNGFILE $image->png; close PNGFILE; } return $image; } sub total_ic { return $_[0]->pdl_matrix->sum(); } =head2 _draw_ps_logo Title : _draw_ps_logo Usage : my $postscript_string = $icm->_draw_ps_logo(%args) Internal method, should be accessed using draw_logo() Function: Draws a "sequence logo", a graphical representation of a possibly degenerate fixed-width nucleotide sequence pattern, from the information content matrix Returns : a postscript string; if you only need the image file you can ignore it Args : -file, # the name of the output PNG image file # OPTIONAL: 
default none
           -xsize       # width of the image in pixels
                        # OPTIONAL: default 600
           -ysize       # height of the image in pixels
                        # OPTIONAL: default 5/8 of -xsize
           -full_scale  # the maximum value on the y-axis, in bits
                        # OPTIONAL: default 2.25
           -graph_title,# the graph title
                        # OPTIONAL: default none
           -x_title,    # x-axis title; OPTIONAL: default none
           -y_title     # y-axis title; OPTIONAL: default none

=cut

sub _draw_ps_logo{
    my $self = shift;
    my %args = (-xsize      => 600,
                -full_scale => 2.25,
                -graph_title=> "",
                -x_title    => "",
                -y_title    => "",
                @_);

    my $xsize= $args{'-xsize'};
    my $max_ysize= $args{'-ysize'} ||int 5* $args{'-xsize'}/8;
    my $ysize= $max_ysize*($args{'-full_scale'}-($args{'-full_scale'}-2))/$args{'-full_scale'};
    my $x=100; # internal, for placement on 'paper'
    my $y=100;

    my $out=
"%!PS-Adobe-2.0
%%Orientation: Portrait
%%Pages: 1
%%BoundingBox: 0 0 ".($args{'-xsize'}*1.2)." ".( $max_ysize*1.5)."
%%BeginSetup
%%EndSetup
%%Magnification: 1.0000
%%EndProlog
%%end
%%save
gsave\n";

    #colors and correction definitions
    my %color;
    $color{'black'}="0.000 0.000 0.000 setrgbcolor";
    $color{'A'}="0.000 1.000 0.000 setrgbcolor";
    $color{'C'}="0.000 0.000 1.000 setrgbcolor";
    $color{'G'}="1.000 0.860 0.000 setrgbcolor";
    $color{'T'}="1.000 0.000 0.000 setrgbcolor";
    my $fontsize= int $ysize*0.68;
    my $fontwidth=1.5*($xsize/$self->length());
    my %w_correct; # correction of font widths
    $w_correct{'A'}=0.95;
    $w_correct{'T'}=1.05;
    $w_correct{'C'}=0.90;
    $w_correct{'G'}=0.90;

    my %y_next; # correction of font heights
    $y_next{'A'}=1;
    $y_next{'T'}=1;
    $y_next{'C'}=0.94;
    $y_next{'G'}=0.94;

    my %y_correct; # correction of font bounding boxes
    $y_correct{'A'}=0;
    $y_correct{'C'}=0.035*$fontsize;
    $y_correct{'G'}=0.035*$fontsize;
    $y_correct{'T'}=0;

    # define y axis, tickmarks and scaling
    my $font= $fontwidth/5;
    $out.="newpath\n ". ($x-10)." ". ($y+2*$ysize/4 )." moveto\n".
          "$x ". ($y+2*$ysize/4 ) ." lineto\n stroke\n";
    $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-20).
" ".( $y+$ysize/2)." moveto\n"; $out.=" (1) show\n grestore\n" ; $out.="newpath\n ". ($x-10)." ". ($y+$ysize )." moveto\n". "$x ". ($y+$ysize) ." lineto\n stroke\n"; $out.="newpath\n ". ($x-10)." ". ($y+$max_ysize )." moveto\n". "$x ". ($y+$max_ysize) ." lineto\n stroke\n"; $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-20). " ".( $y+$ysize)." moveto\n"; $out.=" (2) show\n grestore\n" ; $out.="newpath\n $x $y moveto\n". ($x). " ".($y+$max_ysize) ." lineto\n stroke\n"; $out.="newpath\n $x $y moveto\n". ($x+$xsize). " ".($y) ." lineto\n stroke\n"; # draw titles if requested if ($args{'-y_title'}){ $out.= "gsave\n/Times-Italic findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-40). " ".( $y+$ysize/2)." moveto\n"; $out.=" 90 rotate ($args{'-y_title'}) show\n grestore\n" ; } if ($args{'-x_title'}){ $out.= "gsave\n/Times-Italic findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x+$xsize/2.5). " ".( $y*(0.60))." moveto\n"; $out.=" ($args{'-x_title'}) show\n grestore\n" ; } if ($args{'-title'}){ $out.= "gsave\n/Times-Roman findfont $color{black} [".($font*2)." 0 0 $font 0 0] makefont setfont\n".($x+$xsize/3). " ".( $y+$max_ysize*1.1)." moveto\n"; $out.=" ($args{'-title'}) show\n grestore\n" ; } # define x axis and x tickmarks my $col_width=($xsize/$self->length()) -0.006*$xsize; my $x_now; for(my $i=1; $i<=$self->length(); $i++){ $x_now=$x+$col_width*$i; $out.="newpath\n ". ($x_now)." ". ($y)." moveto\n". ($x_now)." ". ($y-$ysize/20 ) ." lineto\n stroke\n"; $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x_now-$col_width/2). " ".( $y-20)." 
moveto\n";
        $out.=" ($i) show\n grestore\n" ;
    }

    # draw the logo
    foreach my $i (0..$self->length()-1 ) {
        # get the $i-th column of matrix
        my %ic;
        ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i);
        my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T);
        # draw this position
        foreach my $letter (@draw_order){
            # some interpreters do not understand size 0
            $ic{$letter}=0.0000001 if ( $ic{$letter}==0);
            $out.= "gsave\n/Helvetica-Bold findfont $color{$letter} [".$fontwidth*$w_correct{$letter}." 0 0 ";
            $out.= $ic{$letter}*$fontsize*$y_next{$letter} ;
            # movement that is letter-specific, due to bounding boxes
            $y+=$y_correct{$letter}*$ic{$letter};
            $out.= " 0 0] makefont setfont\n$x $y moveto\n";
            $out.= " ($letter) show\n grestore\n" ;
            $y+=$fontsize*$ic{$letter}*0.75; # ic content move
        }
        $x+=$fontwidth/1.6;
        $y=100;
    }

    # save as file if requested
    if ($args{-file}) {
        open (PSFILE, ">".$args{-file})
            or $self->throw("Could not write to ".$args{-file});
        print PSFILE $out;
        close PSFILE;
    }
    if ($args{-pdf}){
        system "ps2pdf $args{-file} ".$args{-file}.".pdf ";
        system " mv $args{-file}.pdf $args{-file}";
    }

    return $out;
}

=head2 _draw_svg_logo

=cut

sub _draw_svg_logo {
    my $self = shift;
    my %args = (-xsize      => 800,
                -full_scale => 2.25,
                -graph_title=> "",
                -x_title    => "",
                -y_title    => "",
                @_);

    my $max_ysize= $args{'-ysize'} ||int 5* $args{'-xsize'}/8;

    my ($xsize,$FULL_SCALE, $x_title, $y_title)
        = @args{qw(-xsize -full_scale -x_title -y_title)} ;

    my $PER_PIXEL_LINE = 200;

    # calculate other parameters if not specified

    my $ysize      = ($args{-ysize} or $xsize/1.6);
    # remark (the line above): 1.6 is a standard screen x:y ratio
    my $line_width = ($args{-line_width} or $ysize/$PER_PIXEL_LINE);
    my $margin     = ($args{-margin} or $ysize*0.15);

    my $image = SVG->new(width=>$xsize, height=>$ysize);
    my $white = 'rgb(255,255,255)';
    my $black = 'rgb(0,0,0)';
    my $motif_size = $self->pdl_matrix->getdim(0);
    my $fontsize = int ($ysize/25);
    my $title_font = {width=>$fontsize*1.5, height=>$fontsize*1.5};
    my $font
= {width=>$fontsize, height=>$fontsize}; # WRITE LABELS AND TITLE # graph title $image->text(id=>"Title", 'font-size'=>$title_font->{width}, x => $xsize/2, y => 0.6*$margin, 'text-anchor'=>'middle' )->cdata($args{-graph_title}); # x title $image->text(id=>"X_title", 'font-size'=>$font->{width}, x => $xsize/2, y => $ysize -0.3*$margin, 'text-anchor'=>'middle' )->cdata($args{-x_title}); # y title my $g = $image->group; $g->text(id=>"Y_title", 'font-size'=>$font->{width}, x => 0 , 'text-anchor'=>'middle', y => 0, transform => 'rotate(-90) translate(-'.($ysize/2).','.($margin/2).')')->cdata($args{-y_title}); # DRAW AXES # vertical: (top left to bottom right) $image->rectangle(id => "y_axis", style => { #stroke => $black, fill => $black }, x => $margin-$line_width, y => $margin, width => $line_width, height => $ysize -2*$margin ); #$image->filledRectangle($margin-$line_width, $margin-$line_width, # $margin-1, $ysize-$margin+$line_width, # # $black); # horizontal: (ditto) $image->rectangle(id => "x_axis", style => { #stroke => $black, fill => $black }, x => $margin-$line_width, y => $ysize-$margin, width => $xsize-2*$margin+$line_width, height => $line_width ); #$image->filledRectangle($margin-$line_width, $ysize-$margin+1, # $xsize-$margin+$line_width,$ysize-$margin+$line_width, # $black); # DRAW VERTICAL TICKS AND LABELS # vertical axis (IC 1 and 2) my $ic_1 = ($ysize - 2* $margin) / $FULL_SCALE; foreach my $i (1..$FULL_SCALE) { $image->rectangle(x => $margin-3*$line_width, y => $ysize-$margin - $i*$ic_1, width => 3*$line_width, height => $line_width ); $image->text(x => $margin-5*$line_width - $font->{width}, y => $ysize - $margin - $i*$ic_1 +$font->{height}/2, 'font-size'=>$font->{width}, 'text-anchor'=>"right" )->cdata($i); } # DRAW HORIZONTAL TICKS AND LABELS, AND THE LOGO ITSELF # define function refs as hash elements my %draw_letter = ( A => \&_svg_draw_A, C => \&_svg_draw_C, G => \&_svg_draw_G, T => \&_svg_draw_T ); my $horiz_step = ($xsize -2*$margin) / 
$motif_size; #this is to avoid clutter on X axis: my $longest_label_length = length("$motif_size"); if (length ($args{-startpos}) > $longest_label_length) { $longest_label_length = length ($args{-startpos}); } if (length ($args{-startpos}+$motif_size) > $longest_label_length) { $longest_label_length = length ($args{-startpos}+$motif_size); } my $draw_every_nth_label = int(($longest_label_length+0.25)*$font->{width}) / $horiz_step + 1; foreach my $i (0..$motif_size) { my $height = 3*$line_width; if ($i and $i==$args{-startpos}*-1){ $height = 5*$line_width; } $image->rectangle(x => $margin + $i*$horiz_step -$line_width/2, y => $ysize-$margin, width => $line_width, height => $height ); last if $i==$motif_size; # get the $i-th column of matrix my %ic; ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i); # sort nucleotides by increasing information content my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T); # draw logo column my $xlettersize = $horiz_step*0.95; my $ybottom = $ysize - $margin; foreach my $base (@draw_order) { my $ylettersize = $ic{$base}*$ic_1; next if $ylettersize ==0; # draw letter $draw_letter{$base}->($image, $margin + $i*$horiz_step + 0.025* $horiz_step, $ybottom - $ylettersize, $xlettersize, $ylettersize, $white); $ybottom = $ybottom - $ylettersize; } if ($args{'-error_bars'} and ref($args{'-error_bars'}) eq "ARRAY") { my $sd_pix = int($args{'-error_bars'}->[$i]*$ic_1); my $yt = $ybottom - $sd_pix+1; my $yb = $ybottom + $sd_pix-1; my $xpos = $margin + ($i+0.5)*$horiz_step; my $half_width; if ($yb > $ysize-$margin+$line_width) { $yb = $ysize-$margin+$line_width } else { $image->line(x1=>$xpos - $xlettersize/8, y1=> $yb, x2=> $xpos + $xlettersize/8, y2=>$yb, stroke=>$black, 'stroke-width'=>$line_width); } $image->line(x1=>$xpos, y1=>$yt, x2=>$xpos, y2=>$yb, stroke=>$black, 'stroke-width'=>$line_width); $image->line(x1=>$xpos - $line_width , y1=>$ybottom, x2=>$xpos+$line_width, y2=>$ybottom, stroke=>$black, 
'stroke-width'=>$line_width); $image->line(x1=>$xpos - $xlettersize/8, y1=>$yt, x2=>$xpos + $xlettersize/8, y2=>$yt, stroke=>$black, 'stroke-width'=>$line_width); } # print position number on x axis my $xlabel = $i+ $args{-startpos}; if ($args{-startpos}<0 and $xlabel>=0) { $xlabel ++; } if ($xlabel % $draw_every_nth_label == 0) { $image->text(x => $margin + ($i+0.5)*$horiz_step - $font->{width}/2, y => $ysize - $margin +5*$line_width + $font->{width}/2, 'font-size'=>$font->{width}, 'text-anchor'=>"bottom" )->cdata($xlabel); } } # print to $args{-file}; if ($args{-file}) { open (SVGFILE, ">".$args{-file}) or $self->throw("Could not write to ".$args{-file}); my $xml = $image->xmlify; $xml =~ s/\s+<\/text/<\/text/gs; print SVGFILE $xml; close SVGFILE; } return $image; } =head2 name =head2 ID =head2 class =head2 matrix =head2 length =head2 revcom =head2 rawprint =head2 prettyprint The above methods are common to all matrix objects. Please consult L to find out how to use them. =cut ################################################################# # INTERNAL METHODS ################################################################# sub _check_ic_validity { my ($self) = @_; # to do } sub DESTROY { # nothing } ################################################################# # UTILITY FUNCTIONS ################################################################# # letter drawing routines sub _png_draw_A { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $green = $im->colorAllocate(0,255,0); my $outPoly = GD::Polygon->new(); $outPoly->addPt($x, $y+$ysize); $outPoly->addPt($x+$xsize*.42, $y); $outPoly->addPt($x+$xsize*.58, $y); $outPoly->addPt($x+$xsize, $y+$ysize); $outPoly->addPt($x+0.85*$xsize, $y+$ysize); $outPoly->addPt($x+0.725*$xsize, $y+0.75*$ysize); $outPoly->addPt($x+0.275*$xsize, $y+0.75*$ysize); $outPoly->addPt($x+0.15*$xsize, $y+$ysize); $im->filledPolygon($outPoly, $green); if ($ysize>8) { my $inPoly = GD::Polygon->new(); $inPoly->addPt($x+$xsize*.5, 
$y+0.2*$ysize); $inPoly->addPt($x+$xsize*.34, $y+0.6*$ysize-1); $inPoly->addPt($x+$xsize*.64, $y+0.6*$ysize-1); $im->filledPolygon($inPoly, $white); } return 1; } sub _png_draw_C { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $blue = $im->colorAllocate(0,0,255); $im->arc($x+$xsize*0.54, $y+$ysize/2,1.08*$xsize,$ysize,0,360,$blue); $im->fill($x+$xsize/2, $y+$ysize/2, $blue); if ($ysize>12) { $im->arc($x+$xsize*0.53, $y+$ysize/2, 0.75*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize/2, $y+$ysize/4+1, $x+$xsize*1.1, $y+(3*$ysize/4)-1, $white); } elsif ($ysize>3) { $im->arc($x+$xsize*0.53, $y+$ysize/2, (0.75-0.75/$ysize)*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize*0.25, $y+$ysize/2, $x+$xsize*1.1, $y+$ysize/2, $white); } return 1; } sub _png_draw_G { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $yellow = $im->colorAllocate(200,200,0); $im->arc($x+$xsize*0.54, $y+$ysize/2,1.08*$xsize,$ysize,0,360,$yellow); $im->fill($x+$xsize/2, $y+$ysize/2, $yellow); if ($ysize>20) { $im->arc($x+$xsize*0.53, $y+$ysize/2, 0.75*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize/2, $y+$ysize/4+1, $x+$xsize*1.1, $y+$ysize/2-1, $white); } elsif($ysize>3) { $im->arc($x+$xsize*0.53, $y+$ysize/2, (0.75-0.75/$ysize)*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize*0.25, $y+$ysize/2, $x+$xsize*1.1, $y+$ysize/2, $white); } $im->filledRectangle($x+0.85*$xsize, $y+$ysize/2, $x+$xsize,$y+(3*$ysize/4)-1, $yellow); $im->filledRectangle($x+0.6*$xsize, $y+$ysize/2, $x+$xsize,$y+(5*$ysize/8)-1, $yellow); return 1; } sub _png_draw_T { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $red = $im->colorAllocate(255,0,0); $im->filledRectangle($x, $y, $x+$xsize, $y+0.16*$ysize, $red); 
$im->filledRectangle($x+0.42*$xsize, $y, $x+0.58*$xsize, $y+$ysize, $red); return 1; } sub _svg_draw_A { my ($im, $x, $y, $xsize, $ysize) = @_; $im->polygon( points => [$x, $y+$ysize, $x+$xsize*.42, $y, $x+$xsize*.58, $y, $x+$xsize, $y+$ysize, $x+0.85*$xsize, $y+$ysize, $x+0.725*$xsize, $y+0.75*$ysize, $x+0.275*$xsize, $y+0.75*$ysize, $x+0.15*$xsize, $y+$ysize, $x, $y+$ysize], fill => 'rgb(0,255,0)' ); $im->polygon( points => [$x+$xsize*.5, $y+0.2*$ysize, $x+$xsize*.34, $y+0.6*$ysize, $x+$xsize*.64, $y+0.6*$ysize ], fill => 'rgb(255,255,255)'); return 1; } sub _svg_draw_C { my ($im, $x, $y, $xsize, $ysize) = @_; $im->ellipse(cx=>$x+$xsize*0.54, cy=>$y+$ysize/2, rx=>$xsize*0.54, ry=>$ysize/2, fill => 'rgb(0,0,255)'); $im->ellipse( cx=>$x+$xsize*0.53, cy=>$y+$ysize/2, rx=>$xsize*0.375, ry=>$ysize*0.375, fill => 'rgb(255,255,255)'); $im->rectangle(x=>$x+$xsize/2, y=>$y+$ysize/4, width =>$xsize*0.6, height =>$ysize/2, fill=> 'rgb(255,255,255)'); return 1; } sub _svg_draw_G { my ($im, $x, $y, $xsize, $ysize, $white) = @_; $im->ellipse(cx => $x+$xsize*0.54, cy => $y+$ysize/2, rx => 0.54*$xsize, ry => $ysize/2, fill => 'rgb(200,200,0)'); $im->ellipse(cx => $x+$xsize*0.53, cy => $y+$ysize/2, rx => 0.375*$xsize, ry => 0.375*$ysize, fill => 'rgb(255,255,255)'); $im->rectangle(x=>$x+$xsize/2, y=>$y+$ysize/4, width =>$xsize*0.6, height =>$ysize/2, fill=> 'rgb(255,255,255)'); $im->rectangle(x=>$x+0.80*$xsize, y=>$y+$ysize/2, width =>$xsize*0.208, height =>$ysize/4, fill=> 'rgb(200,200,0)'); $im->rectangle(x=>$x+0.6*$xsize, y=>$y+$ysize/2, width =>$xsize*0.408, height =>$ysize/8, fill=> 'rgb(200,200,0)'); return 1; } sub _svg_draw_T { my ($im, $x, $y, $xsize, $ysize, $white) = @_; $im->polygon (points =>[$x, $y, $x+$xsize, $y, $x+$xsize, $y+0.16*$ysize, $x+0.58*$xsize, $y+0.16*$ysize, $x+0.58*$xsize, $y+$ysize, $x+0.42*$xsize, $y+$ysize, $x+0.42*$xsize, $y+0.16*$ysize, $x, $y+0.16*$ysize], fill => 'rgb(255,0,0)'); return 1; } 1; 
TFBS-0.7.1/TFBS/Matrix/PFM.pm000077500000000000000000000373341305752266700152710ustar00rootroot00000000000000# TFBS module for TFBS::Matrix::PFM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix::PFM - class for raw position frequency matrix patterns

=head1 SYNOPSIS

=over 4

=item * creating a TFBS::Matrix::PFM object manually:

    my $matrixref = [ [ 12,  3,  0,  0,  4,  0 ],
                      [  0,  0,  0, 11,  7,  0 ],
                      [  0,  9, 12,  0,  0,  0 ],
                      [  0,  0,  0,  1,  1, 12 ]
                    ];
    my $pfm = TFBS::Matrix::PFM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );
    # or

    my $matrixstring =
        "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12";

    my $pfm = TFBS::Matrix::PFM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );

=item * retrieving a TFBS::Matrix::PFM object from a database:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and
retrieve TFBS::Matrix::* objects from them.)

    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pfm = $db_obj->get_Matrix_by_ID("M0001", "PFM");
    # or
    my $pfm = $db_obj->get_Matrix_by_name("MyProfile", "PFM");

=item * retrieving list of individual TFBS::Matrix::PFM objects from a TFBS::MatrixSet object

(See L<TFBS::MatrixSet> to learn how to create
objects for storage and manipulation of multiple matrices.)

    my @pfm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * convert a raw frequency matrix to other matrix types:

    my $pwm = $pfm->to_PWM(); # convert to position weight matrix
    my $icm = $pfm->to_ICM(); # convert to information content matrix

=back

=head1 DESCRIPTION

TFBS::Matrix::PFM is a class whose instances are objects representing
raw position frequency matrices (PFMs). A PFM is derived from N
nucleotide patterns of fixed size, e.g.
the set of sequences

    AGGCCT
    AAGCCT
    AGGCAT
    AAGCCT
    AAGCCT
    AGGCAT
    AGGCCT
    AGGCAT
    AGGTTT
    AGGCAT
    AGGCCT
    AGGCCT

will give the matrix:

    A:[ 12  3  0  0  4  0 ]
    C:[  0  0  0 11  7  0 ]
    G:[  0  9 12  0  0  0 ]
    T:[  0  0  0  1  1 12 ]

which contains the count of each nucleotide at each position in the
sequence. (If you have a set of sequences as above and want to
create a TFBS::Matrix::PFM object out of them, have a look at the
TFBS::PatternGen::SimplePFM module.)

PFMs are easily converted to other types of matrices, namely
information content matrices and position weight matrices. A
TFBS::Matrix::PFM object has the methods to_ICM and to_PWM which do just
that, returning a TFBS::Matrix::ICM and a TFBS::Matrix::PWM object,
respectively.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::Matrix::PFM;
use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::SeqIO;
use TFBS::Matrix;
use TFBS::Matrix::ICM;
use TFBS::Matrix::PWM;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

use constant EXACT_SCHNEIDER_MAX => 30;

#######################################################
# PUBLIC METHODS
#######################################################

=head2 new

 Title   : new
 Usage   : my $pfm = TFBS::Matrix::PFM->new(%args)
 Function: constructor for the TFBS::Matrix::PFM object
 Returns : a new TFBS::Matrix::PFM object
 Args    : # you must specify one of the following three:

           -matrix,      # reference to an array of arrays of integers
              #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited integers
              #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited integers
           #######

           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # an array reference,
OPTIONAL
 Warnings  : Warns if the matrix provided has columns with different
             sums. Columns with different sums contradict the usual
             origin of matrix data and, unless you are absolutely sure
             that column sums _should_ be different, it would be wise to
             check your matrices.

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PFM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_check_column_sums();
    return $self;
}

=head2 column_sum

 Title   : column_sum
 Usage   : my $nr_sequences = $pfm->column_sum()
 Function: calculates the sum of elements of one column
           (the first one by default) which normally equals the
           number of sequences used to derive the PFM.
 Returns : the sum of elements of one column (an integer)
 Args    : column number (starting from 1), OPTIONAL - you DO NOT
           need to specify it unless you are dealing with a matrix
           with unequal column sums

=cut

sub column_sum {
    my ($self, $column) = (@_,1);
    return $self->pdl_matrix->slice($column-1)->sum;
}

=head2 to_PWM

 Title   : to_PWM
 Usage   : my $pwm = $pfm->to_PWM()
 Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
           to position weight matrix. At present it assumes uniform
           background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::PWM object
 Args    : none; in future releases, it should be able to accept
           user-defined background probabilities of the four
           nucleotides

=cut

sub to_PWM {
    my ($self, %args) = @_;
    my $bg = ($args{'-bg_probabilities' } || $self->{'bg_probabilities'});
    my $bg_pdl =
        transpose pdl ($bg->{'A'}, $bg->{'C'}, $bg->{'G'}, $bg->{'T'});
    my $nseqs = $self->pdl_matrix->sum / $self->length;
    my $q_pdl = ($self->pdl_matrix +$bg_pdl*sqrt($nseqs))
                /
                ($nseqs + sqrt($nseqs));
    my $pwm_pdl = log2(4*$q_pdl);

    my $PWM = TFBS::Matrix::PWM->new
        ( (map {("-$_", $self->{$_}) } keys %$self),
          # do not want tags to point to the same arrayref as in $self:
          -tags => \%{ $self->{'tags'}},
          -bg_probabilities => \%{ $self->{'bg_probabilities'}},
          -matrix    => $pwm_pdl
        );
    return $PWM;
}

=head2 to_ICM

 Title   : to_ICM
 Usage   : my $icm = $pfm->to_ICM()
 Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
           to information content matrix. At present it assumes uniform
           background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::ICM object
 Args    : -small_sample_correction # undef (default), 'schneider'
                                    # or 'pseudocounts'

How a PFM is converted to ICM:

For a PFM element PFM[i,k], the probability without
pseudocounts is estimated to be simply

  p[i,k] = PFM[i,k] / Z

where

- Z equals the column sum of the matrix, i.e. the number of motifs
used to construct the PFM.

- i is the column index (position in the motif)

- k is the row index (a letter in the alphabet; here k is one of
A, C, G, T)

Here is how one normally calculates the pseudocount-corrected positional
probability p'[i,k]:

  p'[i,k] = (PFM[i,k] + 0.25*sqrt(Z)) / (Z + sqrt(Z))

0.25 is for the flat distribution of nucleotides, and sqrt(Z) is the
recommended pseudocount weight. In the general case,

  p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)

where q[k] is the background distribution of the letter (nucleotide) k,
and B an arbitrary pseudocount value or expression
(for no pseudocounts B=0).
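
As an illustration only (a minimal plain-Perl sketch, not the PDL code
this method actually runs; the variable names are made up for the
example), the general formula can be applied to a single column such as
A=4, C=7, G=0, T=1, the fifth column of the example matrix in the
SYNOPSIS, assuming a flat background:

    use strict;
    use warnings;

    # one PFM column and an assumed flat background distribution
    my %count = (A => 4, C => 7, G => 0, T => 1);
    my %q     = (A => 0.25, C => 0.25, G => 0.25, T => 0.25);

    my $Z = 0;
    $Z += $_ for values %count;      # column sum Z
    my $B = sqrt($Z);                # pseudocount weight B

    for my $k (qw(A C G T)) {
        # p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)
        my $p_corr = ($count{$k} + $q{$k} * $B) / ($Z + $B);
        printf "%s %.4f\n", $k, $p_corr;
    }

Setting $B to 0 in the sketch gives the uncorrected estimate
p[i,k] = PFM[i,k] / Z.
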
For a given position i, the deviation from random distribution in bits
is calculated as (Baldi and Brunak eq. 1.9 (2ed) or 1.8 (1ed)):

- for an arbitrary alphabet of A letters:

  D[i] = log2(A) + sum_for_all_k(p[i,k]*log2(p[i,k]))

- special case for nucleotides (A=4)

  D[i] = 2 + sum_for_all_k(p[i,k]*log2(p[i,k]))

D[i] equals the information content of the position i in the motif. To
calculate the entire ICM, you have to calculate the contribution
of each nucleotide at a position i to D[i], i.e.

  ICM[i,k] = p'[i,k] * D[i]

=cut

sub to_ICM {
    my ($self, %args) = @_;
    my $bg = ($args{'-bg_probabilities' } || $self->{'bg_probabilities'});

    # compute ICM

    my $bg_pdl =
        transpose pdl ($bg->{'A'}, $bg->{'C'}, $bg->{'G'}, $bg->{'T'});
    my $Z_pdl = $self->pdl_matrix->xchg(0,1)->sumover;

    # pseudocount calculation

    my $B = 0;
    if (lc($args{'-small_sample_correction'} or "") eq "pseudocounts") {
        $B = sqrt($Z_pdl);
    }
    else {
        $B = 0; # do not add pseudocounts
    }

    my $p_pdl = ($self->pdl_matrix +$bg_pdl*$B)/ ($Z_pdl + $B);
    my $plog_pdl = $p_pdl*log2($p_pdl);
    $plog_pdl = $plog_pdl->badmask(0);
    my $D_pdl = 2 + $plog_pdl->xchg(0,1)->sumover;
    my $ic_pdl = $p_pdl * $D_pdl;

    # apply Schneider correction if requested

    if (lc($args{'-small_sample_correction'} or "") eq "schneider") {
        my $columnsum_pdl = $ic_pdl->transpose->sumover;
        my $corrected_columnsum_pdl =
            $columnsum_pdl + _schneider_correction ($self->pdl_matrix, $bg_pdl);
        $ic_pdl *= $corrected_columnsum_pdl/$columnsum_pdl;
    }

    # construct and return an ICM object

    my $ICM = TFBS::Matrix::ICM->new
        ( (map {("-$_" => $self->{$_})} keys %$self),
          -tags => \%{ $self->{'tags'}},
          -bg_probabilities => \%{ $self->{'bg_probabilities'}},
          -matrix    => $ic_pdl
        );
    return $ICM;
}

=head2 draw_logo

 Title   : draw_logo
 Usage   : my $gd_image = $pfm->draw_logo()
 Function: draws a sequence logo; similar to the
           method in TFBS::Matrix::ICM, but can automatically calculate
           error bars for drawing
 Returns : a GD image object (see documentation of GD module)
 Args    : many; PFM-specific
options are:
           -small_sample_correction # One of
                                    # "Schneider" (uses correction
                                    #   described by Schneider et al.
                                    #   (Schneider T et al. (1986) J.Biol.Chem.)
                                    # "pseudocounts" - standard pseudocount
                                    #   correction, more suitable for
                                    #   PFMs with larger column sums
                                    # If the parameter is omitted, small
                                    # sample correction is not applied

           -draw_error_bars         # if true, adds error bars to each position
                                    # in the logo. To calculate the error bars,
                                    # it uses the -small_sample_correction
                                    # argument if explicitly set,
                                    # or "Schneider" by default

For other args, see draw_logo entry in TFBS::Matrix::ICM documentation

=cut

sub draw_logo {
    my ($self, %args) = @_;
    if ($args{'-draw_error_bars'}) {
        $args{'-small_sample_correction'} ||= "Schneider"; # default Schneider

        my $pdl_no_correction = $self->to_ICM()
                                     ->pdl_matrix->transpose->sumover;
        my $pdl_with_correction =
            $self->to_ICM(-small_sample_correction
                              => $args{'-small_sample_correction'})
                 ->pdl_matrix->transpose->sumover;
        $args{'-error_bars'} =
            [list ($pdl_no_correction - $pdl_with_correction)];
    }
    $self->to_ICM(%args)->draw_logo(%args);
}

=head2 add_PFM

 Title   : add_PFM
 Usage   : $pfm->add_PFM($another_pfm)
 Function: adds the values of $another_pfm matrix to $pfm
 Returns : reference to the updated $pfm object
 Args    : a TFBS::Matrix::PFM object

=cut

sub add_PFM {
    my ($self, $pfm) = @_;
    $pfm->isa("TFBS::Matrix::PFM")
        or $self->throw("Wrong or no argument passed to add_PFM");
    my $sum = $self->pdl_matrix + $pfm->pdl_matrix;
    $self->set_matrix($sum);
    return $self;
}

=head2 name

=head2 ID

=head2 class

=head2 matrix

=head2 length

=head2 revcom

=head2 rawprint

=head2 prettyprint

The above methods are common to all matrix objects. Please consult
L<TFBS::Matrix> to find out how to use them.
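
A short usage sketch combining the PFM-specific methods documented
above. It only uses calls and arguments described in this file, and
assumes that the TFBS, PDL and GD modules are installed; the output
file name logo.png and the second matrix ($other, ID "M0002") are made
up for the example:

    use strict;
    use warnings;
    use TFBS::Matrix::PFM;

    my $pfm = TFBS::Matrix::PFM->new(
        -matrixstring =>
            "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12",
        -name => "MyProfile",
        -ID   => "M0001");

    # sum of the first column, i.e. the number of sequences
    print $pfm->column_sum(), "\n";

    # information content matrix with Schneider small sample correction
    my $icm = $pfm->to_ICM(-small_sample_correction => "schneider");

    # logo with automatically calculated error bars, written as PNG
    $pfm->draw_logo(-file => "logo.png", -draw_error_bars => 1);

    # add the counts of a second (hypothetical) PFM of the same width
    my $other = TFBS::Matrix::PFM->new(
        -matrixstring =>
            "6 1 0 0 2 0\n0 0 0 6 4 0\n0 5 6 0 0 0\n0 0 0 0 0 6",
        -ID => "M0002");
    $pfm->add_PFM($other);
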
=cut ############################################### # PRIVATE METHODS ############################################### sub _check_column_sums { my ($self) = @_; my $pdl = $self->pdl_matrix->sever(); my $rowsums = $pdl->xchg(0,1)->sumover(); if ($rowsums->where($rowsums != $rowsums->slice(0))->getdim(0) > 0) { $self->warn("PFM for ".$self->{ID}." has unequal column sums"); } } sub DESTROY { # does nothing } ############################################### # UTILITY FUNCTIONS ############################################### sub log2 { log($_[0]) / log(2); } sub _schneider_correction { my ($pdl, $bg_pdl) = @_; my $Hg = -sum ($bg_pdl*log2($bg_pdl)); my (@Hnbs, %saved_Hnb); my $is_flat = _is_bg_flat(list $bg_pdl); my @factorials = (1); if (min($pdl->transpose->sumover) <= EXACT_SCHNEIDER_MAX) { foreach my $i (1..max($pdl->transpose->sumover)) { $factorials[$i] =$factorials[$i-1] * $i; } } my @column_sums = list $pdl->transpose->sumover; foreach my $colsum (@column_sums) { if (defined($saved_Hnb{$colsum})) { push @Hnbs, $saved_Hnb{$colsum}; } else { my $Hnb; if ($colsum <= EXACT_SCHNEIDER_MAX) { if ($is_flat) { $Hnb = _schneider_Hnb_precomputed($colsum); } else { $Hnb = _schneider_Hnb_exact($colsum, $bg_pdl, \@factorials); } } else { $Hnb = _schneider_Hnb_approx($colsum, $Hg); } $saved_Hnb{$colsum} = $Hnb; push @Hnbs, $Hnb; } } return -$Hg + pdl(@Hnbs); } sub _schneider_Hnb_exact { my ($n, $bg_pdl, $rFactorial) = @_; my $is_flat = _is_bg_flat(list $bg_pdl); return 0 if $n==1; # my @fctrl = (1); # foreach my $i (1..max($pdl->transpose->sumover)) { # $rFactorial->[$i] =$rFactorial->[$i-1] * $i; # } # my @colsum = list $pdl->transpose->sumover; my ($na, $nc, $ng, $nt) = ($n, 0,0,0); # my $n = $colsum[0]; my $E_Hnb=0; while (1) { my $ns_pdl = pdl [$na, $nc, $ng, $nt]; my $Pnb = ($rFactorial->[$n] / ($rFactorial->[$na] *$rFactorial->[$nc] *$rFactorial->[$ng] *$rFactorial->[$nt]) )*prod($bg_pdl->transpose**pdl($na, $nc, $ng, $nt)); my $Hnb = -1 * 
sum(($ns_pdl/$n)*log2($ns_pdl/$n)->badmask(0)); $E_Hnb += $Pnb*$Hnb; if ($nt) { if ($ng) { $ng--; $nt++, } elsif ($nc) { $nc--; $ng = $nt+1; $nt = 0; } elsif ($na) { $na--; $nc = $nt+1; $nt = 0; } else { last; } } else { if ($ng) { $ng--; $nt++, } elsif ($nc) { $nc--; $ng++; } else { $na--; $nc++; $nt = 0; } } } return $E_Hnb; } sub _schneider_Hnb_approx { my ($colsum, $Hg) = @_; return $Hg -3/(2*log(2)*$colsum); } sub _schneider_Hnb_precomputed { my $i = shift; if ($i<1 or $i>30) { die "Precomputed params only available for colsums 1 to 30)"; } my @precomputed = ( 0, # 1 0.75, # 2 1.11090234442608, # 3 1.32398964833609, # 4 1.46290503577084, # 5 1.55922640783176, # 6 1.62900374746751, # 7 1.68128673969433, # 8 1.7215504663901, # 9 1.75328193031842, # 10 1.77879136615189, # 11 1.79965855531179, # 12 1.81699248819687, # 13 1.8315892710679, # 14 1.84403166371213, # 15 1.85475371994775, # 16 1.86408383599326, # 17 1.87227404728809, # 18 1.87952034817826, # 19 1.88597702438913, # 20 1.89176691659196, # 21 1.89698887214968, # 22 1.90172322434865, # 23 1.90603586889234, # 24 1.90998133028897, # 25 1.91360509239859, # 26 1.91694538711761, # 27 1.92003457997914, # 28 1.92290025302018, # 29 1.92556605820924, # 30 ); return $precomputed[$i-1]; } sub _is_bg_flat { my @bg = @_; my $ref = shift; foreach my $other (@bg) { return 0 unless $ref==$other; } return 1; } 1; TFBS-0.7.1/TFBS/Matrix/PWM.pm000077500000000000000000000362221305752266700153050ustar00rootroot00000000000000# TFBS module for TFBS::Matrix::PWM # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix::PWM - class for position weight matrices of nucleotide patterns =head1 SYNOPSIS =over 4 =item * creating a TFBS::Matrix::PWM object manually: my $matrixref = [ [ 0.61, -3.16, 1.83, -3.16, 1.21, -0.06], [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83], [-1.57, 1.85, -2.57, -1.34, -1.57, 1.14], [ 0.31, -3.16, -2.57, 1.76, 0.24, -0.83] ]; my $pwm = 
TFBS::Matrix::PWM->new(-matrix => $matrixref, -name => "MyProfile", -ID => "M0001" ); # or my $matrixstring = <<ENDMATRIX 0.61 -3.16 1.83 -3.16 1.21 -0.06 -0.15 -2.57 -3.16 -3.16 -2.57 -1.83 -1.57 1.85 -2.57 -1.34 -1.57 1.14 0.31 -3.16 -2.57 1.76 0.24 -0.83 ENDMATRIX ; my $pwm = TFBS::Matrix::PWM->new(-matrixstring => $matrixstring, -name => "MyProfile", -ID => "M0001" ); =item * retrieving a TFBS::Matrix::PWM object from a database: (See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.) my $db_obj = TFBS::DB::JASPAR2->new (-connect => ["dbi:mysql:JASPAR2:myhost", "myusername", "mypassword"]); my $pwm = $db_obj->get_Matrix_by_ID("M0001", "PWM"); # or my $pwm = $db_obj->get_Matrix_by_name("MyProfile", "PWM"); =item * retrieving a list of individual TFBS::Matrix::PWM objects from a TFBS::MatrixSet object (see documentation of TFBS::MatrixSet to learn how to create objects for storage and manipulation of multiple matrices) my @pwm_list = $matrixset->all_patterns(-sort_by=>"name"); =item * scanning a nucleotide sequence with a matrix my $siteset = $pwm->search_seq(-file =>"myseq.fa", -threshold => "80%"); =item * scanning a pairwise alignment with a matrix my $site_pair_set = $pwm->search_aln(-file =>"myalign.aln", -threshold => "80%", -cutoff => "70%", -window => 50); =back =head1 DESCRIPTION TFBS::Matrix::PWM is a class whose instances are objects representing position weight matrices (PWMs). A PWM is normally calculated from a raw position frequency matrix (see L<TFBS::Matrix::PFM> for the explanation of position frequency matrices). For example, given the following position frequency matrix: A:[ 12 3 0 0 4 0 ] C:[ 0 0 0 11 7 0 ] G:[ 0 9 12 0 0 0 ] T:[ 0 0 0 1 1 12 ] The standard computational procedure is applied to convert it into the following position weight matrix: A:[ 0.61 -3.16 1.83 -3.16 1.21 -0.06] C:[-0.15 -2.57 -3.16 -3.16 -2.57 -1.83] G:[-1.57 1.85 -2.57 -1.34 -1.57 1.14] T:[ 0.31 -3.16 -2.57 1.76 0.24 -0.83] which contains the "weights" associated with the occurrence of each nucleotide at the given position in a pattern.
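The conversion procedure itself is not spelled out in this section. As a
rough illustration only, the sketch below applies one common log-odds
convention: a uniform background of 0.25 per nucleotide and a pseudocount
of sqrt(N) per column, where N is the column sum. Both choices, and the
helper name pfm_to_pwm, are assumptions made for this example and are not
part of the TFBS API, so the weights it prints need not match the matrix
shown above.

    use strict;
    use warnings;

    # Hypothetical helper, not a TFBS method.
    sub pfm_to_pwm {
        my ($pfm) = @_;    # array ref of four rows (A, C, G, T) of counts
        my $ncols = scalar @{ $pfm->[0] };
        my @pwm;
        for my $col (0 .. $ncols - 1) {
            my $n = 0;
            $n += $pfm->[$_][$col] for 0 .. 3;    # column sum N
            my $pseudo = sqrt($n);                # assumed pseudocount
            for my $row (0 .. 3) {
                # corrected probability of this nucleotide in this column
                my $p = ($pfm->[$row][$col] + 0.25 * $pseudo)
                      / ($n + $pseudo);
                # log2 odds against the assumed uniform background
                $pwm[$row][$col] = log($p / 0.25) / log(2);
            }
        }
        return \@pwm;
    }

    my $pwm = pfm_to_pwm([
        [ 12, 3,  0,  0, 4,  0 ],    # A
        [  0, 0,  0, 11, 7,  0 ],    # C
        [  0, 9, 12,  0, 0,  0 ],    # G
        [  0, 0,  0,  1, 1, 12 ],    # T
    ]);
    printf "%6.2f " x 6 . "\n", @$_ for @$pwm;

Positive weights mark nucleotides that are over-represented relative to the
background at that position, and negative weights mark under-represented
ones.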
A TFBS::Matrix::PWM object is equipped with methods to search nucleotide sequences and pairwise alignments of nucleotide sequences with the pattern it represents, and return a set of sites in nucleotide sequence (a TFBS::SiteSet object for single sequence search, and a TFBS::SitePairSet for the alignment search). =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt> =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # The code begins HERE: package TFBS::Matrix::PWM; use vars '@ISA'; use PDL; use strict; use Bio::Root::Root; use Bio::Seq; use Bio::SeqIO; use TFBS::Matrix; use TFBS::SiteSet; use TFBS::Matrix::_Alignment; use TFBS::Ext::pwmsearch; use File::Temp qw/:POSIX/; @ISA = qw(TFBS::Matrix Bio::Root::Root); ################################################################# # PUBLIC METHODS ################################################################# =head2 new Title : new Usage : my $pwm = TFBS::Matrix::PWM->new(%args) Function: constructor for the TFBS::Matrix::PWM object Returns : a new TFBS::Matrix::PWM object Args : # you must specify one of the following three: -matrix, # reference to an array of arrays of numbers #or -matrixstring,# a string containing four lines # of tab- or space-delimited numbers #or -matrixfile, # the name of a file containing four lines # of tab- or space-delimited numbers ####### -name, # string, OPTIONAL -ID, # string, OPTIONAL -class, # string, OPTIONAL -tags # an array reference, OPTIONAL =cut sub new { my ($class, %args) = @_; my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PWM"); my $self = bless $matrix, ref($class) || $class; $self->_set_min_max_score(); return $self; } =head2 search_seq Title : search_seq Usage : my $siteset = $pwm->search_seq(%args) Function: scans a nucleotide sequence with the pattern represented by the PWM
Returns : a TFBS::SiteSet object Args : # you must specify one of the following three: -file, # the name of a fasta file (single sequence) #or -seqobj # a Bio::Seq object # (more accurately, a Bio::PrimarySeq object or a # subclass thereof) #or -seqstring # a string containing the sequence -threshold, # minimum score for the hit, either absolute # (e.g. 11.2) or relative (e.g. "75%") # OPTIONAL: default "80%" -subpart # subpart of the sequence to search, given as # -subpart => { -start => 140, # -end => 180 } # where -start and -end are coordinates in the # sequence; the coordinate range is interpreted # in the BioPerl tradition (1-based, inclusive) # OPTIONAL: by default searches entire sequence =cut sub search_seq { my ($self, %args) = @_; $self->_search(%args); } =head2 search_aln Title : search_aln Usage : my $site_pair_set = $pwm->search_aln(%args) Function: Scans a pairwise alignment of nucleotide sequences with the pattern represented by the PWM: it reports only those hits that are present in equivalent positions of both sequences and exceed a specified threshold score in both, AND are found in regions of the alignment above the specified conservation cutoff value. Returns : a TFBS::SitePairSet object Args : # you must specify one of the following three: -file, # the name of the alignment file in Clustal format #or -alignobj # a Bio::SimpleAlign object # (or an object of a subclass thereof) #or -alignstring # a multi-line string containing the alignment # in clustal format ############# -threshold, # minimum score for the hit, either absolute # (e.g. 11.2) or relative (e.g.
"75%") # OPTIONAL: default "80%" -window, # size of the sliding window (inn nucleotides) # for calculating local conservation in the # alignment # OPTIONAL: default 50 -cutoff # conservation cutoff (%) for including the # region in the results of the pattern search # OPTIONAL: default "70%" -subpart # subpart of the alignment to search, given as e.g. # -subpart => { relative_to => 1, # start => 140, # end => 180 } # where start and end are coordinates in the # sequence indicated by relative_to (1 for the # 1st sequence in the alignment, 2 for the 2nd) # OPTIONAL: by default searches entire alignment -conservation # conservation profile, a TFBS::ConservationProfile # OPTIONAL: by default the conservation profile is # computed internally on the fly (less efficient) =cut sub search_aln { my ($self, %args) = @_; unless ($args{-alignstring} or $args{-alignobj} or $args{-file}) { $self->throw ("No alignment file, string or object passed to search_aln."); } $args{-pattern_set} = $self; my $aln = ($args{-alignment_setup} or TFBS::Matrix::_Alignment->new(%args)); $aln->do_sitesearch(%args); return $aln->site_pair_set; } sub max_score { $_[0]->{max_score}; } sub min_score { $_[0]->{min_score}; } =head2 name =head2 ID =head2 class =head2 matrix =head2 length =head2 revcom =head2 rawprint =head2 prettyprint The above methods are common to all matrix objects. Please consult L to find out how to use them. 
=cut ################################################################# # PRIVATE METHODS ################################################################# sub _set_min_max_score { my ($self) = @_; my $transpose = $self->pdl_matrix->xchg(0,1); $self->{min_score} = sum(minimum $transpose); $self->{max_score} = sum(maximum $transpose); } sub _search { # this method runs the pwmsearch C extension and parses the data # similarly to _csearch, which will eventually be discontinued my ($self, %args) = @_; my $seqobj = $self->_to_seqobj(%args); my ($subseq_start, $subseq_end) = (1,$seqobj->length); if(my $subpart = $args{-subpart}) { $subseq_start = $subpart->{-start}; $subseq_end = $subpart->{-end}; unless($subseq_start and $subseq_end) { $self->throw("Option -subpart missing suboption -start or -end"); } } return TFBS::Ext::pwmsearch::pwmsearch($self, $seqobj, ($args{-threshold} or 0), $subseq_start, $subseq_end); } sub _csearch { # this is a wrapper around Wyeth Wasserman's's pwm_searchPFF program # until we do a proper extension my ($self) = shift; #the rest of @_ goes to _to_seqob; my %args = @_; my $PWM_SEARCH = $args{'-binary'} || "pwm_searchPFF"; # dump the sequence into a tempfile my $seqobj = $self->_to_seqobj(@_); my ($fastaFH, $fastafile); if (defined $seqobj->{_fastaFH} and defined $seqobj->{_fastafile}) { ($fastaFH, $fastafile) = ($seqobj->{_fastaFH}, $seqobj->{_fastafile}); } else { ($fastaFH, $fastafile) = tmpnam(); my $seqFH = Bio::SeqIO->newFh(-fh =>$fastaFH, -format=>"Fasta"); print $seqFH $seqobj; } # we need $fastafile below # calculate threshold my $threshold; if ($args{-threshold}) { if ($args{-threshold} =~ /(.+)%/) { # percentage $threshold = $self->{min_score} + ($self->{max_score} - $self->{min_score})* $1/100; } else { # absolute value $threshold = $args{-threshold}; } } else { # no threshold given $threshold = $self->{min_score} -1; } # convert piddle to text (there MUST be a better way) my $pwmstring = sprintf ( $self->pdl_matrix ); $pwmstring 
=~ s/\[|\]//g; # lose [] $pwmstring =~ s/\n /\n/g; # lose leading spaces my @pwmlines = split("\n", $pwmstring); # f $pwmstring = join ("\n", @pwmlines[2..5])."\n"; # dump pwm into a tempfile my ($pwmFH, $pwmfile) = tmpnam(); # we need $pwmfile below print $pwmFH $pwmstring; close $pwmFH; # run pwmsearch my $hitlist = TFBS::SiteSet->new(); my ($TFname, $TFclass) = ($self->{name}, $self->{class}); my @search_result_lines = `$PWM_SEARCH $pwmfile $fastafile $threshold -n $TFname -c $TFclass`; foreach (@search_result_lines) { chomp; my ($seq_id, $factor, $class, $strand, $score, $pos, $siteseq) = (split)[0, 2, 3, 4, 5, 7, 9]; my $correct_strand = ($strand eq "+")? "-1" : "1"; my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id()."", -seqobj => $seqobj, -strand => $correct_strand."", -pattern => $self, -siteseq => $siteseq."", -score => $score."", -start => $pos, -end => $pos + length($siteseq) -1 ); $hitlist->add_site($site); } # cleanup unlink $fastafile unless $seqobj->{_fastafile}; unlink $pwmfile; return $hitlist; } sub _bsearch { # this is Perl/PDL only search routine. For experimental purposes only my ($self,%args) = @_; #the rest of @_ goes to _to_seqob; my @PWMs; # prepare the sequence my $seqobj = $self->_to_seqobj(%args); my $seqmatrix = (defined $seqobj->{_pdl_matrix}) ? 
$seqobj->{_pdl_matrix} : _seq_to_pdlmatrix($seqobj); # calculate threshold my $threshold; if ($args{-threshold}) { if ($args{-threshold} =~ /(.+)%/) { # percentage $threshold = $self->{min_score} + ($self->{max_score} - $self->{min_score})* $1/100; } else { # absolute value $threshold = $args{-threshold}; } } else { # no threshold given $threshold = $self->{min_score} -1; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm ($self, $self->revcom()) { my $TFlength = $pwm->pdl_matrix->getdim(0); my $position_score_pdl = zeroes($seqmatrix->getdim(0) - $TFlength + 1); my $position_index_pdl = sequence($seqmatrix->getdim(0) - $TFlength + 1)+1; foreach my $i (0..($TFlength-1)) { my $columnproduct = $seqmatrix * $pwm->pdl_matrix->slice("$i,:"); $position_score_pdl += $columnproduct->xchg(0,1)->sumover->slice($i.":".($i-$TFlength)); } my @hitpositions = list $position_index_pdl->where($position_score_pdl >= $threshold); my @hitscores = list $position_score_pdl->where($position_score_pdl >= $threshold); for my $i(0..$#hitpositions) { my($pos,$score) = ($hitpositions[$i], $hitscores[$i]); my $siteseq = scalar($seqobj->subseq($pos, $pos+$TFlength-1)); my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id(), -seqobj => $seqobj, -strand => $pwm->{strand}, -Matrix => $pwm, -siteseq => $siteseq, -score => $score, -start => $pos); $hitlist->add_site($site); } } return $hitlist; } sub _to_seqobj { my ($self, %args) = @_; my $seq; if ($args{-file}) { # not a Bio::Seq return Bio::SeqIO->new(-file => $args{-file}, -format => 'fasta', -moltype => 'dna')->next_seq(); } elsif ($args{-seqstring} or $args{-seq}) { # I guess it's a string then return Bio::Seq->new(-seq => ($args{-seqstring} or $args{-seq}), -id => ($args{-seq_id} or "undefined"), -moltype => 'dna'); } elsif ($args{'-seqobj'} and ref($args{'-seqobj'}) and $args{'-seqobj'}->can("seq")) { # do nothing (maybe check later) return $args{'-seqobj'}; } #elsif (ref($format) =~ /Bio\:\:Seq/ and !defined $seq) 
{ # if only one parameter passed and it's a Bio::Seq #return $format; #} else { $self->throw ("Wrong parameters passed to search method: ".%args); } } sub _seq_to_pdlmatrix { # called from ?search # not OO - help function for search my $seqobj = shift; my $seqstring = uc($seqobj->seq()); my @perlarray; foreach (qw(A C G T)) { my $seqtobits = $seqstring; eval "\$seqtobits =~ tr/$_/1/"; # curr. letter $_ to 1 eval "\$seqtobits =~ tr/1/0/c"; # non-1s to 0 push @perlarray, [split("", $seqtobits)]; } return byte (\@perlarray); } sub DESTROY { # nothing } 1; TFBS-0.7.1/TFBS/Matrix/_Alignment.pm000077500000000000000000000273741305752266700167270ustar00rootroot00000000000000package TFBS::Matrix::_Alignment; use vars qw(@ISA $AUTOLOAD); use TFBS::SitePair; use TFBS::SitePairSet; use Bio::Root::Root; use Bio::Seq; use Bio::SimpleAlign; use Bio::AlignIO; use IO::String; use PDL; use strict; @ISA =('Bio::Root::Root'); # CONSTANTS use constant DEFAULT_WINDOW => 50; use constant DEFAULT_CUTOFF => 70; use constant DEFAULT_THRESHOLD => "80%"; sub new { # this is ugly; OK, OK, I'll rewrite it as soon as I can my ($caller, %args) = @_; my $self = bless {}, ref $caller || $caller; $self->window($args{-window} or DEFAULT_WINDOW); $self->_parse_alignment(%args); $self->seq1length(length(_strip_gaps($self->alignseq1()))); $self->seq2length(length(_strip_gaps($self->alignseq2()))); $self->_set_subpart_bounds($args{-subpart}); # # If a conservation profile is provided, no need to compute it again. 
# NOTE: conservation2 never seems to be used anywhere else so don't worry # about the fact we are ignoring it if conservation is passed in :) # my $cp = $args{-conservation}; if ($cp) { $self->conservation1([$cp->conservation()]); } else { $self->conservation1($self->_calculate_conservation($self->window(),1)); $self->conservation2($self->_calculate_conservation($self->window(),2)); } $self->cutoff($args{-cutoff} or DEFAULT_CUTOFF); #$self->threshold($args{-threshold} or DEFAULT_THRESHOLD); #$self->_do_sitesearch #(($args{-pattern_set} or $self->throw("No -matrixset parameter")), # ($args{-threshold} or DEFAULT_THRESHOLD), # ()); # $self->_set_start_end(%args); # Maybe later... return $self; } sub DESTROY { # empty } sub _parse_alignment { my ($self, %args) = @_; my ($seq1, $seq2, $start); my $alignobj; if (defined $args{'-alignstring'}) { $alignobj = _alignstring_to_alignobj($args{'-alignstring'}); } elsif (defined $args{'-file'}) { $alignobj = _alignfile_to_alignobj($args{'-file'}); } elsif (defined $args{-alignobj}) { $alignobj = $args{'-alignobj'}; } else { $self->throw("No -alignstring, -file or -alignobj passed."); } my @match; my ($seqobj1, $seqobj2) = $alignobj->each_seq; ($seq1, $seq2) = ($seqobj1->seq, $seqobj2->seq); $start = 1; $self->seq1name($seqobj1->display_id); $self->seq2name($seqobj2->display_id); $self->alignseq1($seq1); $self->alignseq2($seq2); my @seq1 = ("-", split('', $seq1) ); my @seq2 = ("-", split('', $seq2) ); $self->{alignseq1array} = [@seq1]; $self->{alignseq2array} = [@seq2]; my (@seq1index, @seq2index); my ($i1, $i2) = (0, 0); for my $pos (0..$#seq1) { my ($s1, $s2) = (0, 0); $seq1[$pos] ne "-" and $s1 = ++$i1; $seq2[$pos] ne "-" and $s2 = ++$i2; push @seq1index, $s1; push @seq2index, $s2; } $self->pdlindex( pdl [ [list sequence($#seq1+1)], [@seq1index], [@seq2index], [list zeroes ($#seq1+1)] ]) ; return 1; } sub pdlindex { my ($self, $input, $p1, $p2) = @_ ; # print ("PARAMS ", join(":", @_), "\n"); if (ref($input) eq "PDL") { 
$self->{pdlindex} = $input; } unless (defined $p2) { return $self->{pdlindex}; } else { my @results = list $self->{pdlindex}->xchg(0,1)->slice($p2)->where ($self->{pdlindex}->xchg(0,1)->slice($p1)==$input); return wantarray ? @results : $results[0]; } } sub lower_pdlindex { my ($self, $input, $p1, $p2) = @_; unless (defined $p2) { $self->throw("Wrong number of parameters passed to lower_pdlindex"); } my $result; my $i = $input; until ($result = $self->pdlindex($i, $p1 => $p2)) { $i--; last if $i==0; } return ($result or 1); } sub higher_pdlindex { my ($self, $input, $p1, $p2) = @_; unless (defined $p2) { $self->throw("Wrong number of parameters passed to higher_pdlindex"); } my $result; my $i = $input; until ($result = $self->pdlindex($i, $p1 => $p2)) { $i++; last unless ($self->pdlindex($i, $p1=>0) > 0); } return $result; } sub _calculate_conservation { my ($self, $WINDOW, $which) = @_; my (@seq1, @seq2); if ($which==2) { @seq1 = @{$self->{alignseq2array}}; @seq2 = @{$self->{alignseq1array}}; } else { @seq1 = @{$self->{alignseq1array}}; @seq2 = @{$self->{alignseq2array}}; $which=1; } my @CONSERVATION; my @match; while ($seq1[0] eq "-") { shift @seq1; shift @seq2; } for my $i (0..$#seq1) { push (@match,( uc($seq1[$i]) eq uc($seq2[$i]) ? 1:0)) unless ($seq1[$i] eq "-" or $seq1[$i] eq "."); } my @graph=($match[0]); for my $i (1..($#match+$WINDOW/2)) { $graph[$i] = ($graph[$i-1] or 0) + ($i>$#match ? 0: $match[$i]) - ($i<$WINDOW ? 0: $match[$i-$WINDOW]); } # at this point, the graph values are shifted $WINDOW/2 to the right # i.e.
the score at a certain position is the score of the window # UPSTREAM of it: To fix it, we should discard the first $WINDOW/2 scores: #$self->conservation1 ([]); foreach my $pos (@graph[int($WINDOW/2)..$#graph]) { push @CONSERVATION, 100*$pos/$WINDOW; } # correction foreach my $pos (0..int($WINDOW/2)) { $CONSERVATION[$pos] = $CONSERVATION[$pos]*$WINDOW/(int($WINDOW/2)+$pos); $CONSERVATION[$#CONSERVATION - $pos] = $CONSERVATION[$#CONSERVATION - $pos]*$WINDOW/(int($WINDOW/2)+$pos); } return [@CONSERVATION]; } sub _strip_gaps { # a utility function my $seq = shift; $seq =~ s/\-|\.//g; return $seq; } sub do_sitesearch { my ($self, @args ) = @_; my ($MATRIXSET, $THRESHOLD, $CUTOFF) = $self->_rearrange([qw(PATTERN_SET THRESHOLD CUTOFF)], @args); if (!$MATRIXSET) { $self->throw("No -pattern_set passed to do_sitesearch"); } $CUTOFF = ($CUTOFF or DEFAULT_CUTOFF); $THRESHOLD = ($THRESHOLD or DEFAULT_THRESHOLD); $self->site_pair_set(TFBS::SitePairSet->new()); return if(($self->subpart1 and $self->subpart1->{-start} == 0) or ($self->subpart2 and $self->subpart2->{-start} == 0)); # ^^^ If one of the subparts is a gap, there's no point in searching my $seqobj1 = Bio::Seq->new(-seq=>_strip_gaps($self->alignseq1()), -id => "Seq1"); my $siteset1 = $MATRIXSET->search_seq(-seqobj => $seqobj1, -threshold => $THRESHOLD, -subpart => $self->subpart1); my $siteset1_itr = $siteset1->Iterator(-sort_by => "start"); my $seqobj2 = Bio::Seq->new(-seq=>_strip_gaps($self->alignseq2()), -id => "Seq2"); my $siteset2 = $MATRIXSET->search_seq(-seqobj => $seqobj2, -threshold => $THRESHOLD, -subpart => $self->subpart2); my $siteset2_itr = $siteset2->Iterator(-sort_by => "start"); my $site1 = $siteset1_itr->next(); my $site2 = $siteset2_itr->next(); while (defined $site1 and defined $site2) { my $pos1_in_aln = $self->pdlindex($site1->start(), 1=>0); my $pos2_in_aln = $self->pdlindex($site2->start(), 2=>0); my $cmp = (($pos1_in_aln <=> $pos2_in_aln) or ($site1->pattern->name() cmp 
$site2->pattern->name()) or ($site1->strand() cmp $site2->strand())); if ($cmp==0) { ### match if (# threshold test: $self->conservation1->[$site1->start()] >= $self->cutoff() ) { my $site_pair = TFBS::SitePair->new($site1, $site2); $self->site_pair_set->add_site_pair($site_pair); } $site1 = $siteset1_itr->next(); $site2 = $siteset2_itr->next(); } elsif ($cmp<0) { ### $siteset1 is behind $site1 = $siteset1_itr->next(); } elsif ($cmp>0) { ### $siteset2 is behind $site2 = $siteset2_itr->next(); } } } sub _set_subpart_bounds { my ($self, $subpart) = @_; if(defined $subpart) { my ($relative_to, $start, $end) = ($subpart->{-relative_to}, $subpart->{-start}, $subpart->{-end}); unless(defined($relative_to) and defined($start) and defined($end) ) { $self->throw("Option -subpart missing suboption -relative_to, -start or -end"); } if($relative_to == 1) { my $other_start = $self->higher_pdlindex($start, 1 => 2); my $other_end = $self->lower_pdlindex($end, 1 => 2); ($other_start, $other_end) = (0,0) if($other_start > $other_end); $self->subpart1({ -start => $start, -end => $end }); $self->subpart2({ -start => $other_start, -end => $other_end }); } elsif($relative_to == 2) { my $other_start = $self->higher_pdlindex($start, 2 => 1); my $other_end = $self->lower_pdlindex($end, 2 => 1); ($other_start, $other_end) = (0,0) if($other_start > $other_end); $self->subpart1({ -start => $other_start, -end => $other_end }); $self->subpart2({ -start => $start, -end => $end }); } else { $self->throw("Suboption -relative_to should be 1 or 2"); } } } sub _calculate_cutoff { my ($self) = @_; my $ile = 0.9; my @conservation_array = sort {$a <=> $b} @{$self->conservation1()}; my $perc_90 = $conservation_array[int($ile*scalar(@conservation_array))]; return $perc_90; } sub _alignfile_to_string { # a utility function # DEPRECATED !!! 
my $alignfile = shift; if ($alignfile =~ /\.msf$/i) { my $alignobj = Bio::SimpleAlign->new(); $alignobj->read_MSF($alignfile); return _alignobj_to_string($alignobj); } else { #assumed clustalw - no AlignIO import yet local $/ = undef; open(FILE, $alignfile) or die("Could not read alignfile $alignfile, stopped"); my $alignstring = <FILE>; close(FILE); return $alignstring; } } sub _alignfile_to_alignobj { # a utility function my ($alignfile, $format) = (@_,'clustalw'); if (!$format and $alignfile =~ /\.msf$/i) { $format = 'msf' ;} my $alnio = Bio::AlignIO->new(-file=>$alignfile, -format=>$format); return $alnio->next_aln; } sub _alignobj_to_string { # a utility function # DEPRECATED my $alignobj = shift; my $alignstring; my $io = IO::String->new($alignstring); my $alnio = Bio::AlignIO->new(-fh=>$io, -format=>"clustalw"); $alnio->write_aln($alignobj); $alnio->close(); # $io->close; return $alignstring; } sub _alignstring_to_alignobj { # a utility function my ($alignstring, $format) = (@_, 'clustalw'); my $io = IO::String->new($alignstring); my $alnio = Bio::AlignIO->new(-fh=>$io, -format=>$format); my $alignobj = $alnio->next_aln(); $alnio->close(); # $io->close; return $alignobj; } # uglier than AUTOLOAD, but faster - a quick fix to get rid of Class::MethodMaker sub cutoff { $_[0]->{'cutoff'} = $_[1] if exists $_[1]; $_[0]->{'cutoff'}; } sub window { $_[0]->{'window'} = $_[1] if exists $_[1]; $_[0]->{'window'}; } sub alignseq1 { $_[0]->{'alignseq1'} = $_[1] if exists $_[1]; $_[0]->{'alignseq1'}; } sub alignseq2 { $_[0]->{'alignseq2'} = $_[1] if exists $_[1]; $_[0]->{'alignseq2'}; } sub site_pair_set { $_[0]->{'site_pair_set'} = $_[1] if exists $_[1]; $_[0]->{'site_pair_set'};} sub seq1name { $_[0]->{'seq1name'} = $_[1] if exists $_[1]; $_[0]->{'seq1name'}; } sub seq2name { $_[0]->{'seq2name'} = $_[1] if exists $_[1]; $_[0]->{'seq2name'}; } sub seq1length { $_[0]->{'seq1length'} = $_[1] if exists $_[1]; $_[0]->{'seq1length'}; } sub seq2length { $_[0]->{'seq2length'} = $_[1] if 
exists $_[1]; $_[0]->{'seq2length'}; } sub subpart1 { $_[0]->{'subpart1'} = $_[1] if exists $_[1]; $_[0]->{'subpart1'}; } sub subpart2 { $_[0]->{'subpart2'} = $_[1] if exists $_[1]; $_[0]->{'subpart2'}; } sub conservation1 { $_[0]->{'conservation1'} = $_[1] if exists $_[1]; $_[0]->{'conservation1'};} sub conservation2 { $_[0]->{'conservation2'} = $_[1] if exists $_[1]; $_[0]->{'conservation2'};} sub exclude_orf { $_[0]->{'exclude_orf'} = $_[1] if exists $_[1]; $_[0]->{'exclude_orf'}; } sub start_at { $_[0]->{'start_at'} = $_[1] if exists $_[1]; $_[0]->{'start_at'}; } sub end_at { $_[0]->{'end_at'} = $_[1] if exists $_[1]; $_[0]->{'end_at'}; } 1; TFBS-0.7.1/TFBS/MatrixSet.pm000077500000000000000000000505771305752266700153270ustar00rootroot00000000000000# TFBS module for TFBS::MatrixSet # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix::Set - an agregate class representing a set of matrix patterns, containing methods for manipulating the set as a whole =head1 SYNOPSIS # creation of a TFBS::MatrixSet object # let @list_of_matrix_objects be a list of TFBS::Matrix::* objects ################################### # Create a TFBS::MatrixSet object: my $matrixset = TFBS::MatrixSet->new(); # creates an empty set $matrixset->add_Matrix(@list_of_matrix_objects); #add matrix objects to set $matrixset->add_Matrix($matrixobj); # adds a single matrix object to set # or, same as above: my $matrixset = TFBS::MatrixSet->new(@list_of_matrix_objects, $matrixobj); ################################### # =head1 DESCRIPTION TFBS::MatrixSet is an aggregate class storing a set of TFBS::Matrix::* subclass objects, and providing methods form manipulating those sets as a whole. TFBS::MatrixSet objects are created de novo or returned by some database (TFBS::DB::*) retrieval methods. =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Boris Lenhard Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt> Modified by Eivind Valen eivind.valen@gmail.com =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # The code begins HERE: package TFBS::MatrixSet; use vars '@ISA'; use PDL; use Bio::Seq; use Bio::SeqIO; use Bio::Root::Root; use Bio::TreeIO; use File::Temp qw/:POSIX/; use TFBS::Matrix; use TFBS::_Iterator::_MatrixSetIterator; use TFBS::SiteSet; use strict; use constant TRUE => 1; use constant FALSE => 0; @ISA = qw(Bio::Root::Root); # Hash of accepted options and their arguments for the program # STAMP. Reference to empty list means the option takes no arguments # This test for legal arguments is maybe superfluous and can # potentially be removed. my %stamp_opt = ( -tf => [], -sd => [], -cc => [ "PCC", "ALLR", "ALLR_LL", "CS", "KL", "SSD" ], -align => [ "NW", "SW", "SWA", "SWU" ], -go => [], -ge => [], -out => [], -overlapalign => [], -nooverlapalign => [], -extendedoverlap => [], -printpairwise => [], -tree => [ "UPGMA", "SOTA" ], -ch => [], -ma => [ "PPA", "IR" ], -match => [], -matchtop => [], -prot => [], -genrand => [], -genscores => [], -stampdir => [], -tempdir => [], -noclean => [] ); =head2 new =cut sub new { my ($caller, @matrices) = @_; my $self = bless {matrix_list =>[]}, ref($caller) || $caller; $self->add_matrix(@matrices) if @matrices; return $self; } =head2 new2 =cut sub new2 { my $class = shift; my %args = @_; my $self = bless {}, ref($class) || $class; if (defined $args{'-matrices'}) { $self->add_matrix( @{$args{'-matrices'}} ) if @{$args{'-matrices'}}; } if (defined $args{'-matrixfile'}) { my @matrices; open (FILE, $args{-matrixfile}) or $self->throw("Could not open $args{-matrixfile}"); while (<FILE>) { /^\s*$/ && next; if (/^>/) { } } close(FILE); } return $self; } =head2 add_matrix Title : add_matrix Usage : $matrixset->add_matrix(@list_of_matrix_objects); Function: Adds matrix objects to 
matrixset Returns : object reference (usually ignored) Args : one or more TFBS::Matrix::* objects =cut sub add_matrix { my ($self, @matrices) = @_; foreach my $matrix (@matrices) { $self->throw("Argument to add_matrix not a TFBS::Matrix object") unless $matrix->isa("TFBS::Matrix"); } push @{$self->{matrix_list}}, @matrices; return $self; } sub add_Matrix { my $self = shift; return $self->add_matrix(@_); } =head2 add_matrix_set Title : add_matrix_set Usage : $matrixset->add_matrix_set(@list_of_matrixset_objects); Function: Adds to the matrixset matrix objects contained in one or more other matrixsets Returns : object reference (usually ignored) Args : one or more TFBS::MatrixSet objects =cut sub add_matrix_set { my ($self, @sets) = @_; foreach my $matrixset (@sets) { $self->throw("Argument to add_matrix_set not a TFBS::MatrixSet object") unless $matrixset->isa("TFBS::MatrixSet"); push @{$self->{matrix_list}}, @{$matrixset->{matrix_list}}; } return $self; } sub reset { my ($self) = @_; $self->warn("reset: Deprecated method; use Iterator instead."); @{$self->{_iterator_list}} = @{$self->{matrix_list}}; } sub sort_by_name { my ($self) = @_; $self->warn("sort_by_name: Deprecated method; use Iterator instead."); @{$self->{matrix_list}} = sort { uc($a->{name}) cmp uc ($b->{name}) } @{$self->{matrix_list}}; $self->reset(); } sub next { my ($self) = @_; $self->warn("next: Deprecated method; use Iterator instead."); if (my $next_matrix = shift (@{$self->{_iterator_list}})) { return $next_matrix; } else { $self->reset; return undef; } } =head2 search_seq Title : search_seq Usage : my $siteset = $matrixset->search_seq(%args) Function: scans a nucleotide sequence with all patterns stored in $matrixset; It works only if all matrix objects in $matrixset understand search_seq method (currently only TFBS::Matrix::PWM objects do) Returns : a TFBS::SiteSet object Args : # you must specify one of the following three: -file, # the name of a fasta file (single sequence) #or -seqobj # a 
Bio::Seq object # (more accurately, a Bio::PrimarySeqobject or a # subclass thereof) #or -seqstring # a string containing the sequence -threshold, # minimum score for the hit, either absolute # (e.g. 11.2) or relative (e.g. "75%") # OPTIONAL: default "80%" =cut sub search_seq { my ($self, %args) = @_; $self->_search(%args); } =head2 search_aln Title : search_aln Usage : my $site_pair_set = $matrixset->search_aln(%args) Function: Scans a pairwise alignment of nucleotide sequences with the pattern represented by the PWM: it reports only those hits that are present in equivalent positions of both sequences and exceed a specified threshold score in both, AND are found in regions of the alignment above the specified conservation cutoff value. It works only if all matrix object in $matrixset understand search_aln method (currently only TFBS::Matrix::PWM objects do) Returns : a TFBS::SitePairSet object Args : # you must specify either one of the following three: -file, # the name of the alignment file in Clustal format #or -alignobj # a Bio::SimpleAlign object # (more accurately, a Bio::PrimarySeqobject or a # subclass thereof) #or -alignstring # a multi-line string containing the alignment # in clustal format ############# -threshold, # minimum score for the hit, either absolute # (e.g. 11.2) or relative (e.g. "75%") # OPTIONAL: default "80%" -window, # size of the sliding window (inn nucleotides) # for calculating local conservation in the # alignment # OPTIONAL: default 50 -cutoff # conservation cutoff (%) for including the # region in the results of the pattern search # OPTIONAL: default "70%" -subpart # subpart of the alignment to search, given as e.g. 
                        #   -subpart => { relative_to => 1,
                        #                 start       => 140,
                        #                 end         => 180 }
                        # where start and end are coordinates in the
                        # sequence indicated by relative_to (1 for the
                        # 1st sequence in the alignment, 2 for the 2nd)
                        # OPTIONAL: by default searches entire alignment

           -conservation
                        # conservation profile, a TFBS::ConservationProfile
                        # OPTIONAL: by default the conservation profile is
                        # computed internally on the fly (less efficient)

=cut

sub search_aln {
    my ($self, %args) = @_;
    my $mxit = $self->Iterator();
    my $sitepairset = TFBS::SitePairSet->new;
    # set up the alignment once and share it between all matrices
    my $aln = TFBS::Matrix::_Alignment->new(%args);
    while (my $mx = $mxit->next) {
        my $singleset = $mx->search_aln(%args, -alignment_setup => $aln);
        $sitepairset->add_site_pair_set($singleset);
    }
    return $sitepairset;
}

=head2 size

 Title   : size
 Usage   : my $number_of_matrices = $matrixset->size;
 Function: gets the number of matrix objects in the $matrixset
           (i.e. the size of the set)
 Returns : a number
 Args    : none

=cut

sub size {
    scalar @{ $_[0]->{matrix_list} };
}

=head2 Iterator

 Title   : Iterator
 Usage   : my $matrixset_iterator =
                   $matrixset->Iterator(-sort_by =>'total_ic');
           while (my $matrix_object = $matrixset_iterator->next) {
               # do whatever you want with individual matrix objects
           }
 Function: Returns an iterator object that can be used to go through
           all members of the set
 Returns : an iterator object (currently undocumented in TFBS -
           but understands the 'next' method)
 Args    : -sort_by # optional - currently it accepts
                    #   'ID' (alphabetically)
                    #   'name' (alphabetically)
                    #   'class' (alphabetically)
                    #   'total_ic' (numerically, decreasing order)

           -reverse # optional - reverses the default sorting order if true

=cut

sub Iterator {
    my ($self, %args) = @_;
    return TFBS::_Iterator::_MatrixSetIterator->new($self->{matrix_list},
                                                    $args{'-sort_by'},
                                                    $args{'-reverse'}
                                                   );
}

=head2 randomize_columns

 Title   : randomize_columns
 Usage   : $matrixset->randomize_columns();
 Function: Randomizes the columns between all the matrices in the set
           (in place).
Returns : nothing Args : none =cut sub randomize_columns { my $self = shift; my (@lengths, @concat); my ($length, $i) = (-1, 0); # Concatenate to one big matrix for my $matrix (@{$self->{matrix_list}}) { $length += $matrix->length(); push @lengths, $matrix->length(); push @{$concat[$_]}, @{${$matrix->matrix()}[$_]} for (0..3); } # Schwartzian transform to get random permutation map { ( undef, $concat[0][$i], $concat[1][$i], $concat[2][$i], $concat[3][$i] ) = @$_; $i++; } sort { $a->[0] <=> $b->[0] } map { [ rand(), $concat[0][$_], $concat[1][$_], $concat[2][$_], $concat[3][$_] ] } ( 0 .. $length ); # Split it up again my $start = 0; for my $matrix (@{$self->{matrix_list}}) { my $length = shift(@lengths); my $end = $start + $length - 1; $matrix->matrix( [ [ @{$concat[0]}[$start..$end] ], [ @{$concat[1]}[$start..$end] ], [ @{$concat[2]}[$start..$end] ], [ @{$concat[3]}[$start..$end] ] ] ); $start += $length; } } sub _search { my ($self, %args) = @_; # DIRTY - stick tmp file name to seq object my $seqobj = $self->_to_seqobj(%args); ($seqobj->{_fastaFH}, $seqobj->{_fastafile}) = tmpnam(); # we need $fastafile below my $outstream = Bio::SeqIO->new(-file=>">".$seqobj->{_fastafile}, -format=>"Fasta"); my $subseqobj; if(my $subpart = $args{-subpart}) { my $subseq_start = $subpart->{-start}; my $subseq_end = $subpart->{-end}; unless($subseq_start and $subseq_end) { $self->throw("Option -subpart missing suboption -relative_to, -start or -end"); } $subseqobj = Bio::Seq->new(-seq => $seqobj->subseq($subseq_start, $subseq_end), -id => $seqobj->id); } $outstream->write_seq($subseqobj or $seqobj); $outstream->close; # iterate through pwms my @PWMs; my $mxit = $self->Iterator(); while (my $pwm = $mxit->next() ) { push @PWMs,$pwm; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm (@PWMs) { my $threshold = ($args{-threshold} or $pwm->{minscore}); $hitlist->add_siteset($pwm->search_seq(-seqobj=>$seqobj, -threshold =>$threshold, -subpart=>$args{-subpart})); } 
delete $seqobj->{_fastaFH}; unlink $seqobj->{_fastafile}; delete $seqobj->{_fastafile}; return $hitlist; } sub _csearch { my ($self, %args) = @_; my $PWM_SEARCH = '/home/httpd/cgi-bin/CONSITE/bin/pwm_searchPFF'; # DIRTY - stick tmp file name to seq object my $seqobj = $self->_to_seqobj(%args); ($seqobj->{_fastaFH}, $seqobj->{_fastafile}) = tmpnam(); # we need $fastafile below my $seqFH = Bio::SeqIO->newFh(-fh=>$seqobj->{_fastaFH}, -format=>"Fasta"); print $seqFH $seqobj; # iterate through pwms my @PWMs; $self->reset(); while (my $pwm = $self->next() ) { push @PWMs,$pwm; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm (@PWMs) { my $threshold = ($args{-threshold} or $pwm->{minscore}); $hitlist->add_siteset($pwm->search_seq(-seqobj=>$seqobj, -threshold =>$threshold )); } delete $seqobj->{_fastaFH}; delete $seqobj->{_fastafile}; return $hitlist; } sub _bsearch { my ($self,%args) = @_; #the rest of @_ goes to _to_seqob; my @PWMs; # prepare the sequence my $seqobj = $self->_to_seqobj(%args); $seqobj->{_pdl_matrix} = _seq_to_pdlmatrix($seqobj); # prepare the PWMs $self->reset(); while (my $pwm = $self->next() ) { push @PWMs,$pwm; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm (@PWMs) { my $threshold = ($args{-threshold} or $pwm->{minscore}); $hitlist->add_siteset($pwm->bsearch(-seqobj=>$seqobj, -threshold =>$threshold )); } delete $seqobj->{_pdl_matrix}; return $hitlist; } sub _seq_to_pdlmatrix { # not OO - help function for search my $seqobj = shift; my $seqstring = uc($seqobj->seq()); my @perlarray; foreach (qw(A C G T)) { my $seqtobits = $seqstring; eval "\$seqtobits =~ tr/$_/1/"; # curr. 
letter $_ to 1
        eval "\$seqtobits =~ tr/1/0/c";  # non-1s to 0
        push @perlarray, [split("", $seqtobits)];
    }
    return byte (\@perlarray);
}

sub _to_seqobj {
    my ($self, %args) = @_;

    my $seq;
    if ($args{-file}) { # not a Bio::Seq
        return Bio::SeqIO->new(-file => $args{-file},
                               -format => 'fasta',
                               -moltype => 'dna')->next_seq();
    }
    elsif ($args{-seqstring} or $args{-seq}) { # I guess it's a string then
        return Bio::Seq->new(-seq => ($args{-seqstring} or $args{-seq}),
                             -id => ($args{-seq_id} or "undefined"),
                             -moltype => 'dna');
    }
    elsif ($args{'-seqobj'} and ref($args{'-seqobj'}) =~ /Bio\:\:Seq/) {
        # do nothing (maybe check later)
        return $args{'-seqobj'};
    }
    #elsif (ref($format) =~ /Bio\:\:Seq/ and !defined $seq) {
    # if only one parameter passed and it's a Bio::Seq
        #return $format;
    #}
    else {
        # a hash in string context does not list its contents,
        # so spell out the offending arguments explicitly
        $self->throw("Wrong parameters passed to search method: "
                     . join(" ", %args));
    }
    # CONTINUE HERE TOMORROW
}

=head2 remove_Matrix_by_ID

 Title   : remove_Matrix_by_ID
 Usage   : $matrixset->remove_Matrix_by_ID($id);
 Function: Removes a matrix from the set
 Returns : Nothing
 Args    : the ID of the matrix to remove

=cut

sub remove_Matrix_by_ID {
    my ($self, $id) = @_;

    my @list = grep { $_->ID() ne $id } @{$self->{matrix_list}};
    $self->{matrix_list} = \@list;
}

my $error;

sub _check_opt {
    my ($self, $opt, $arg, $list) = @_;

    # Invalid argument
    if (not defined($list)) {
        $error = "Invalid argument: $opt\n";
        return FALSE;
    }

    # Valid flag or switch.
return TRUE if (not scalar(@$list)); # Valid switch, check the argument for (@$list) { return TRUE if ($arg eq $_) ; } # Valid switch, invalid argument $error = "$arg is invalid argument to $opt"; return FALSE; } sub _find_optimal { my ($self, $output) = @_; my ($optimal, $score_best, $in) = (undef, undef, 0); for (@$output) { if (/NumClust/) { $in = 1; next; } last if (/Tree Built/); if ($in) { my (undef, $clusters, $score) = split(/\t/); if ((not defined($score_best)) || $score < $score_best) { $score_best = $score; $optimal = $clusters; } } } return $optimal; } sub _run_STAMP { my ($self, %args) = @_; my $fh; for (keys(%args)) { die $error unless ($self->_check_opt($_, $args{$_}, $stamp_opt{$_})); } # Write matrices to temporary file if (not exists($args{-tf})) { $fh = new File::Temp( TEMPLATE => 'STAMP-XXXXX', DIR => $args{-tempdir} || '/tmp', SUFFIX => '.set'); print $fh $_->STAMPprint() for (@{$self->{matrix_list}}); $args{-tf} = $fh->filename(); } # Set some default options $args{-tree} ||= "UPGMA"; $args{-ma} ||= "IR"; $args{-cc} ||= "PCC"; $args{-align} ||= "SWU"; # Make sure we find all files my $path; if ($args{-stampdir}) { $path = $args{-stampdir}; die "Could not find STAMP at $path\n" if (not -e "$path/STAMP"); } else { $path = (grep {-e "$_/STAMP"} split(/:+/, $ENV{PATH}))[0]; $path || die "Could not find STAMP in path\n"; } $args{-sd} ||= $path."/ScoreDists/JaspRand_".$args{-cc}."_".$args{-align}.".scores"; die "No score distribution file found or not readable at '$args{-sd}'.\n Use -sd.\n" unless (-r $args{-sd}); # Execute STAMP my $args = ""; $args .= "$_ $args{$_} " for (keys(%args)); my @output = `$path/STAMP -ch $args -out $fh`; # Get tree my $treeio = new Bio::TreeIO(-format => 'newick', -file => $fh->filename().".tree"); my $tree = $treeio->next_tree; # Get FBP my $fbp = TFBS::Matrix::PFM->new(-matrixfile => $fh->filename()."FBP.txt"); $fbp->{'filename'} = $fh->filename()."FBP.txt"; print STDERR "::: $fbp->{'filename'} \n"; if (not 
$args{-noclean}) {
        my $deleted = unlink($fh->filename()."FBP.txt",
                             $fh->filename().".tree");
        warn("Couldn't remove temporary files") if ($deleted != 2);
    }

    return ($fh, \@output, $tree, $fbp);
}

sub _build_cluster {
    my ($self, $cluster, $node) = @_;

    if ($node->is_Leaf()) {
        # a leaf corresponds to exactly one matrix in the set
        for (@{$self->{matrix_list}}) {
            if ($_->ID() eq $node->id()) {
                $cluster->add_matrix($_);
                return;
            }
        }
    } else {
        # an internal node: collect the matrices of all its descendants
        $self->_build_cluster($cluster, $_) for ($node->each_Descendent());
    }
}

=head2 cluster

 Title   : cluster
 Usage   : $matrixset->cluster(%args)
 Function: Clusters the matrices in the set
 Returns : A list of three elements:
           the hierarchical clustering tree (as read by Bio::TreeIO),
           an integer specifying the optimal number of clusters,
           a reference to an array of TFBS::MatrixSets, one for each
           cluster.
 Args    : Many:
           -stampdir    Directory where stamp is located. Not necessary if it
                        is in the PATH.
           -tempdir     Directory to put temporary files. Defaults to "/tmp"
           -noclean     0 to clean up temporary files, 1 otherwise
           -tree        Method for constructing tree (UPGMA/SOTA). Def:UPGMA

=cut

sub cluster {
    my ($self, %args) = @_;

    if ($self->size() <= 1) {
        warn("Can't cluster MatrixSet of size less than 2");
        return;
    }

    my ($fh, $output, $tree, $fbp) = $self->_run_STAMP(%args);

    # Find optimal cluster number
    my $optimal = $args{-optimal} || $self->_find_optimal($output);
    my $root = $tree->get_root_node();
    my @nodes = ($root);
    my @leaves;

    # Descend the tree until the optimal cluster number is reached
    while (scalar(@nodes) && (scalar(@nodes) + scalar(@leaves)) < $optimal) {
        my $node = pop @nodes;
        if ($node->is_Leaf()) {
            push @leaves, $node;
        } else {
            @nodes = sort {$a->height() <=> $b->height()}
                          (@nodes, $node->each_Descendent());
        }
    }

    # Build the clusters
    my @clusters;
    for (@leaves, @nodes) {
        my $cluster = $self->new();
        $self->_build_cluster($cluster, $_);
        push @clusters, $cluster;
    }

    return ($tree, $optimal, \@clusters);
}

=head2 fbp

 Title   : fbp
 Usage   : $matrixset->fbp(%args);
 Function: Creates a familial binding profile (FBP) for the set
 Returns : A familial binding profile
represented as a TFBS::Matrix::PFM
 Args    : Many
           -stampdir    Directory where stamp is located. Not necessary if it
                        is in the PATH.
           -tempdir     Directory to put temporary files. Defaults to "/tmp"
           -noclean     0 to clean up temporary files, 1 otherwise
           -align       Alignment method

=cut

sub fbp {
    my ($self, %args) = @_;

    if ($self->size() == 0) {
        warn("Can't create FBP for MatrixSet of size 0");
        return;
    } elsif ($self->size() == 1) {
        # a single matrix is its own familial binding profile
        return $self->{'matrix_list'}->[0];
    }

    my ($fh, $output, $tree, $fbp) = $self->_run_STAMP(%args);

    return $fbp;
}

1;
TFBS-0.7.1/TFBS/PatternGen.pm000077500000000000000000000113031305752266700154360ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen - a base class for pattern generators

=head1 DESCRIPTION

TFBS::PatternGen is a base class providing methods common to all pattern
generating modules. It is meant to be inherited by a concrete pattern
generator, which must have its own constructor.

=cut

package TFBS::PatternGen;

# Object preamble - inherits from TFBS::PatternGenI;
use vars qw(@ISA);
use strict;

use TFBS::PatternGenI;
# use TFBS::PatternGen::_Motif_;
use TFBS::MatrixSet;   # needed by the patternSet method
use Bio::Seq;
use Bio::SeqIO;
use Carp;
@ISA = qw(TFBS::PatternGenI);

sub new {
    confess("TFBS::PatternGen is a base class for particular pattern ".
            "generators and cannot be instantiated itself.");
}

=head2 pattern

 Title   : pattern
 Usage   : my $pattern_obj = $patterngen->pattern()
 Function: retrieves a pattern object produced by the pattern generator
 Returns : a pattern object (currently available pattern generators
           return a TFBS::Matrix::PFM object)
 Args    : none
 Warning : If a pattern generator produces more than one pattern,
           this method call returns only the first one and prints
           a warning on STDERR. In those cases you should use
           I<patternSet> or I<all_patterns> methods.
=cut

sub pattern {
    my ($self, %args) = @_;
    my @PFMs = $self->_motifs_to_patterns(%args);
    if (scalar(@PFMs) > 1) {
        $self->warn("The pattern generator produced multiple patterns. ".
                    "Please use patternSet method to retrieve a set object, ".
                    "or all_patterns method to retrieve an array of patterns");
    }
    return $PFMs[0];
}

=head2 patternSet

 Title   : patternSet
 Usage   : my $patternSet = $patterngen->patternSet()
 Function: retrieves a pattern set object containing all the patterns
           produced by the pattern generator
 Returns : a pattern set object (currently available pattern generators
           return a TFBS::MatrixSet object)
 Args    : none

=cut

sub patternSet {
    my ($self, %args) = @_;
    my @PFMs = $self->_motifs_to_patterns(%args);
    my $set = TFBS::MatrixSet->new();
    $set->add_matrix(@PFMs);
    return $set;
}

=head2 all_patterns

 Title   : all_patterns
 Usage   : my @patterns = $patterngen->all_patterns()
 Function: retrieves an array of pattern objects
           produced by the pattern generator
 Returns : an array of pattern objects (currently available
           pattern generators return an array of
           TFBS::Matrix::PFM objects)
 Args    : none

=cut

sub all_patterns {
    my ($self, %args) = @_;
    my @patterns = $self->_motifs_to_patterns(%args);
    return @patterns;
}

sub _create_seq_set {
    my ($self, %args) = @_;
    my (@raw_set, @final_set);
    if ($args{-seq_list}) {
        @raw_set = @{$args{-seq_list}};
    }
    elsif ($args{-seq_stream} ) {
        while (my $seqobj = $args{-seq_stream}->next_seq()) {
            push @raw_set, $seqobj;
        }
    }
    elsif ($args{-seq_file} ) {
        my $seqstream = Bio::SeqIO->new(-file=>$args{-seq_file},
                                        -format=>"fasta");
        while (my $seqobj = $seqstream->next_seq()) {
            push @raw_set, $seqobj;
        }
    }

    # counter for unnamed sequences; declared outside the loop so that
    # every unnamed sequence gets a distinct ID
    my $i = 1;
    foreach my $seqobj (@raw_set) {
        if (ref($seqobj)) {
            my $seqstring;
            eval { $seqstring = $seqobj->seq() };
            if ($@) {
                $self->throw("Invalid sequence object in the input sequence set.");
            }
            else {
                _validate_seq(uc $seqstring)
                    or $self->throw("Illegal character(s) in sequence: $seqstring");
            }
            push @final_set, $seqobj;
        }
        else {
            my
$seqstring = $seqobj;
            _validate_seq(uc $seqstring)
                or $self->throw("Illegal character(s) in sequence: $seqstring");
            push @final_set, Bio::Seq->new(-seq=>$seqstring,
                                           -ID=>"unnamed".$i++,
                                           -type=>"dna");
        }
    }
    $self->{'seq_set'} = \@final_set;
    return 1;
}

sub _motifs_to_patterns {
    my ($self, %args) = @_;
    my $i = 1;
    my @patterns;
    # defaults, overridden by any caller-supplied arguments
    my %params = ( -name  => "motif",
                   -ID    => "motif",
                   -class => "unknown",
                   %args);
    foreach my $motif (@{ $self->{'motifs'} }) {
        push @patterns, $motif->pattern(-name  => $params{-name}.$i,
                                        -ID    => $params{-ID}."#".$i,
                                        -class => $params{-class});
        $i++;
    }
    return @patterns;
}

sub _validate_seq {
    # a utility function: true if the sequence contains only A, C, G, T or N
    my $sequence = uc $_[0];
    $sequence =~ s/[ACGTN]//g;
    return ($sequence eq "" ? 1 : 0);
}

sub _check_seqs_for_uniform_length {
    my $self = shift;
    my $reflength = $self->{'seq_set'}->[-1]->length();
    foreach my $seqobj ( @{ $self->{'seq_set'} } ) {
        if ($seqobj->length() != $reflength) {
            $self->throw(ref($self).
                         " object has received sequences of unequal length");
        }
    }
}

sub all_motifs {
    return @{$_[0]->{'motifs'}} if $_[0]->{'motifs'};
}
TFBS-0.7.1/TFBS/PatternGen/000077500000000000000000000000001305752266700150775ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/.svn/000077500000000000000000000000001305752266700157635ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/.svn/all-wcprops000077500000000000000000000013031305752266700201510ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 44 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen END Gibbs.pm K 25 svn:wc:ra_dav:version-url V 53 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Gibbs.pm END AnnSpec.pm K 25 svn:wc:ra_dav:version-url V 55 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/AnnSpec.pm END Elph.pm K 25 svn:wc:ra_dav:version-url V 52 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Elph.pm END YMF.pm K 25 svn:wc:ra_dav:version-url V 51 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/YMF.pm END SimplePFM.pm K 25 svn:wc:ra_dav:version-url V 57
/svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/SimplePFM.pm END MEME.pm K 25 svn:wc:ra_dav:version-url V 52 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/MEME.pm END TFBS-0.7.1/TFBS/PatternGen/.svn/entries000077500000000000000000000017451305752266700173710ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/PatternGen http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df Gibbs.pm file 2009-08-07T13:10:58.000000Z 5dfe55dc5e474abf13353d6371bfb056 2008-01-24T20:21:25.772223Z 8 chrb AnnSpec.pm file 2009-08-07T13:10:58.000000Z c12e1394e2c7ed39608ad57960adf7f1 2008-01-24T20:21:25.772223Z 8 chrb MEME dir Elph.pm file 2009-08-07T13:10:58.000000Z b28cb34f5e5efeef3e20b026b5e03f71 2008-01-24T20:21:25.772223Z 8 chrb Gibbs dir YMF.pm file 2009-08-07T13:10:58.000000Z 7531fcf49cc6018822dc6b1c94bf18c3 2008-01-24T20:21:25.772223Z 8 chrb AnnSpec dir SimplePFM.pm file 2009-08-07T13:10:59.000000Z 8b2431fd45966d310acbea1d31f85fd7 2008-01-24T20:21:25.772223Z 8 chrb Elph dir YMF dir Motif dir MEME.pm file 2009-08-07T13:10:59.000000Z 69957d56c278eb8223d648e07f70f27f 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/PatternGen/.svn/format000077500000000000000000000000021305752266700171710ustar00rootroot000000000000008 TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/000077500000000000000000000000001305752266700176575ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/AnnSpec.pm.svn-base000077500000000000000000000145161305752266700232730ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::AnnSpec # # Copyright Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::AnnSpec - a pattern factory that uses the AnnSpec program (version 2.1) =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::AnnSpec->new(-seq_file=>'sequences.fa', -binary => 'ann-spec ' my $pfm = $patterngen->pattern(); # 
$pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::AnnSpec builds position frequency matrices using an external program AnnSpec (Workman, C. and Stormo, G.D. (2000) ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. Proc. Pacific Symposium on Biocomputing 2000). =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.se =cut package TFBS::PatternGen::AnnSpec; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::AnnSpec::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $pattrengen = TFBS::PatternGen::AnnSpec->new(%args); Function: the constructor for the TFBS::PatternGen::AnnSpec object Returns : a TFBS::PatternGen::AnnSpec object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to the 'meme' executable # OPTIONAL: default 'ann-spec' -additional_params # a string containing additional # command-line switches for the # ann-spec program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'filename'} =$args{'-seq_file'}; $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? 
join(' ', @{$args{'-additional_params'}}) : $args{'-additional_params'}) : "" ); $self->{'binary'} = $args{'-binary'} || 'annspec'; $self->_create_seq_set(%args) or die ('Error creating sequence set'); $self->_run_AnnSpec() or $self->throw("Error running AnnSpec."); return $self; } =head2 pattern =head2 all_patterns =head2 patternSet The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see L for details. =cut sub _run_AnnSpec{ my ($self)=shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $command_line = $self->{'binary'}." ". "-f ".$tmp_file." ". # $self->{'motif_length_string'}." ". # $self->{'nr_hits_string'}." ". $self->{'additional_params'}. ""; # print STDERR "$command_line\n"; my $resultstring = `$command_line`; print "$resultstring\n"; # open(TEST, "test.out"); # this sentense is the one I add # my @lines = ;# this sentense is the one I add # my $resultstring = join '', @lines;# this sentense is the one I add # print STDERR $resultstring; $self->_parse_AnnSpec_output($resultstring,$command_line); unlink $tmp_file; return 1 } sub _parse_AnnSpec_output{ my ($self,$resultstring,$command_line)=@_; if ($resultstring eq''){ # warn "Error running AnnSpec\nNo patterns produced"; $self->throw ("Error running AnnSpec using command:\n $command_line"); return; } my ($consensus,$matrix)=$self->_parse_raw_matrix($resultstring); my ($score,$sites)=$self->_parse_sites($resultstring); for(my $x = 0; $x < scalar(@$consensus); $x++){ my $motif =TFBS::PatternGen::AnnSpec::Motif->new ( #-length => $length."", # -bg_probabilities => [split /\s+/, $raw_bp], -tags => {consensus => $consensus->[$x], score=>$score->[$x]}, -nr_hits => 1, -sites=>$sites->[$x], -matrix => $matrix->[$x] ); push @{ $self->{'motifs'} }, $motif; } return } sub _parse_sites{ my 
($self,$string)=@_; my (@hits, @scores); foreach my $substring (split /REPORTING/, $string ){ my @sub_hits; my ($sites)=$substring=~/STR\s+n.*seq\n(.*)RUN\s+ALIGNMENT.*/s; my ($average)=$substring=~/RUN INFORMATION_CONTENT\s+(\d*\.*\d*)/; my ($score)=$substring=~/RUN\s+SCORE\s+(\d*\.*\d*)/; if($sites){ my @sites=split/\n/,$sites; foreach my $site (@sites){ my @site_array=split(/\s+/,$site); my ($seq_id)=$site_array[6]=~/>(.*)/; my $strand=1; $strand=-1 if $site_array[3]=~/\'/;#MEans we have a pattern in the reverse strand my ($start)=$site_array[3]=~/(\d+)/; my $site = Bio::SeqFeature::Generic->new ( -start => $start, -end => $start+(length$site_array[4])-1, -strand => $strand, -source => 'AnnSpec', -score => $site_array[2], ); foreach my $seq(@{$self->{'seq_set'}}){ if ($seq->id eq $seq_id){ $site->attach_seq ($seq); } } push (@sub_hits,$site); } push @scores, $score; push @hits, \@sub_hits; } } return \@scores,\@hits; } sub _parse_raw_matrix{ my ($self,$string)=@_; my (@pfms, @consensus); foreach my $sub_string (split /REPORTING/, $string){ my ($ma)=$sub_string=~/RUN\s+WEIGHTS_CONS.*ALR\s+\/.*ALR\s+\#.*(ALR.*\nALR.*\nALR.*\nALR.*\s+\d+\n)ALR\s+=+.*/s; my ($con)=$sub_string=~/WEIGHTS_CONS\s+(.*)\n/; if($ma){ my @matrix=split("\n",$ma); my @pfm; foreach my $row(@matrix){ # print $row; my @row=split /\s+/, $row; push @pfm, [@row[2..scalar@row-1]]; } push @pfms, \@pfm; push @consensus, $con; } } return \@consensus, \@pfms; } 1; TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/Elph.pm.svn-base000077500000000000000000000202161305752266700226260ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::Elph # # Copyright Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::Elph - a pattern factory that uses the Elph program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::Elph->new(-seq_file=>'sequences.fa', -binary => '/Elph/elph' -motif_length => [8, 9, 10], -additional_params => '-x -r -e'); my 
$pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::Gibbs builds position frequency matrices using an advanced Gibbs sampling algorithm implemented in external I program by Chip Lawrence. The algorithm can produce multiple patterns from a single set of sequences. =cut package TFBS::PatternGen::Elph; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::Elph::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $db = TFBS::PatternGen::Gibbs->new(%args); Function: the constructor for the TFBS::PatternGen::Gibbs object Returns : a TFBS::PatternGen::Gibbs object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to Gibbs executable # OPTIONAL: default 'Gibbs' -nr_hits # a presumed number of pattern occurences in the # sequence set: it can be a single integer, e.g. # -nr_hits => 24 , or a reference to an array of # integers, e.g -nr_hits => [12, 24, 36] -motif_length # an expected length of motif in nucleotides: # it can be a single integer, e.g. # -motif_length => 8 , or a reference to an # array ofintegers, e.g -motif_length => [8..12] -additional_params # a string containing additional # command-line switches for the # Gibbs program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'motif_length_string'} = ($args{'-motif_length'} ? (ref($args{'-motif_length'}) ? join(',', @{$args{'-motif_length'}}) : $args{'-motif_length'}) : 8 ); $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? 
join(' ', @{$args{'-additional_params'}}) : $args{'-additional_params'}) : "" ); $self->{'binary'} = $args{'-binary'} || 'elph'; $self->{'motifs'} = []; $self->_create_seq_set(%args) or die ('Error creating sequence set'); $self->_run_elph() or $self->throw("Error running elph."); return $self; } sub _run_elph { my $self = shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); $self->{'additional_params'}=~s/-b//; #This removes a -b switch. This enables long output containgin info about the sites my $command_line = $self->{'binary'}." ". $tmp_file." ". "LEN=".$self->{'motif_length_string'}." ". $self->{'additional_params'}." 2>/dev/null"; my $resultstring = `$command_line`; $self->_parse_elph_output($resultstring,$command_line); #print STDERR "$command_line\n"; #print STDERR $resultstring; # unlink $tmp_file; return 1 } =head2 pattern =head2 all_patterns =head2 patternSet The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see L for details. 
=cut sub _parse_elph_output { my ($self, $resultstring,$command_line) = @_; #print $resultstring; if ($resultstring=~/^error/){ $self->throw ("Error running elp command:\n $command_line"); return; } #Motif after optimizing #MAP for motif: 46.735 InfoPar=0.098 # #Motif found: # #Background probability model: # a c g t # 0.30 0.20 0.19 0.31 # #Background counts: #a: 1456 #c: 948 #g: 909 #t: 1487 # # #Motif probability model: #Pos: 1 2 3 4 5 6 #a 1.00 0.00 1.00 0.83 0.00 0.00 #c 0.00 0.00 0.00 0.00 0.00 0.17 #g 0.00 1.00 0.00 0.17 1.00 0.83 #t 0.00 0.00 0.00 0.00 0.00 0.00 #------------------------------------------ #Info 1.73 2.42 1.73 1.19 2.42 1.75 # #Motif counts: #a: 6 0 6 5 0 0 #c: 0 0 0 0 0 1 #g: 0 6 0 1 6 5 #t: 0 0 0 0 0 0 # # (my $MAP)=$resultstring=~/MAP for motif: (.*) InfoPar=/; ($resultstring)=~s/.*Motif counts:\n//s; #print STDERR $resultstring; my @array=split "\n",$resultstring; my @matrix; #print $array[0],"\n"; foreach (0..3){ my (@line)=split(/\s+/,$array[$_]); #print "@line\n"; shift @line; push @matrix,\@line; # print "@line\n"; } # print @matrix; #print $resultstring; my $sites=$self->_site_props($resultstring); my $motif =TFBS::PatternGen::Elph::Motif->new ( -tags => {score=>$MAP},#The score in this case is the E-value given in the output -sites=>$sites, -matrix => \@matrix ); # Seq.no Pos ***** Motif ***** Prob D Seq.Id # 1 354 ggatt AGAAGC cgccg 0.1389 -1 GAL1 # 2 636 caaag AGAAGG ttttt 0.6942 -1 GAL10 # 3 456 aaggc AGAAGG cagta 0.6942 -1 GAL2 # 4 444 aaagt AGAGGG ggtaa 0.1388 -1 GAL7 # 5 324 tagag AGAAGG agcaa 0.6942 -1 GAL80 # 6 165 gttac AGAAGG gccgc 0.6942 -1 GCY1 #$resultstring =~ s/.*=== MAP MAXIMIZATION RESULTS ===//s; #my @raw_motifs = split /\-+\n\s+MOTIF \w\n/s, $resultstring; #shift @raw_motifs; # discard the first one #foreach my $raw_motif (@raw_motifs) { # #print $raw_motif; # my $motif =$self->_parse_raw_motif($raw_motif) || next; push @{ $self->{'motifs'} }, $motif; #} #return 1; } sub _site_props{ my ($self,$resultstring)=@_; 
my @sites; # print $resultstring; #($resultstring)=~s/.*Motif counts:\n//s; my @array=split(/Seq\.no/,$resultstring); #print $array[1]; my @sites_array=split "\n", $array[1]; foreach my $line(@sites_array){ # print $line; next if $line=~/Pos/; last if $line eq''; my @site=split(/\s+/,$line); # print $site[1],"\n"; my $nr=0; $nr = 1 if $site[2]==1; #A special case when the site startsat the first base. #Then no preceding quence is given and the site array =shorter by 1 my $motif_seq=$site[4-$nr]; # print $motif_seq,"\n"; my $site = Bio::SeqFeature::Generic->new ( -start => $site[2], -end => $site[2]+(length$motif_seq)-1, -strand => 1, #Always 1 with elph -source => 'Elph', -score => $site[-3], ); foreach my $seq(@{$self->{'seq_set'}}){ if ($seq->id eq $site[-1]){#last element of the array $site->attach_seq ($seq); } } push (@sites,$site); } return \@sites; } 1; TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/Gibbs.pm.svn-base000077500000000000000000000201241305752266700227620ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::Gibbs # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::Gibbs - a pattern factory that uses Chip Lawrences Gibbs program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::Gibbs->new(-seq_file=>'sequences.fa', -binary => '/Programs/Gibbs-1.0/bin/Gibbs' -nr_hits => 24, -motif_length => [8, 9, 10], -additional_params => '-x -r -e'); my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::Gibbs builds position frequency matrices using an advanced Gibbs sampling algorithm implemented in external I program by Chip Lawrence. The algorithm can produce multiple patterns from a single set of sequences. 
=cut package TFBS::PatternGen::Gibbs; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::Gibbs::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $db = TFBS::PatternGen::Gibbs->new(%args); Function: the constructor for the TFBS::PatternGen::Gibbs object Returns : a TFBS::PatternGen::Gibbs object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to Gibbs executable # OPTIONAL: default 'Gibbs' -nr_hits # a presumed number of pattern occurences in the # sequence set: it can be a single integer, e.g. # -nr_hits => 24 , or a reference to an array of # integers, e.g -nr_hits => [12, 24, 36] -motif_length # an expected length of motif in nucleotides: # it can be a single integer, e.g. # -motif_length => 8 , or a reference to an # array ofintegers, e.g -motif_length => [8..12] -additional_params # a string containing additional # command-line switches for the # Gibbs program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'motif_length_string'} = ($args{'-motif_length'} ? (ref($args{'-motif_length'}) ? join(',', @{$args{'-motif_length'}}) : $args{'-motif_length'}) : 8 ); $self->{'nr_hits_string'} = ($args{'-nr_hits'} ? (ref($args{'-nr_hits'}) ? join(',', @{$args{'-nr_hits'}}) : $args{'-nr_hits'}) : "" ); $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? 
              join(' ', @{$args{'-additional_params'}})
            : $args{'-additional_params'})
         : ""
        );
    $self->{'binary'} = $args{'-binary'} || 'Gibbs';
    $self->{'motifs'} = [];
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    #print $self->{'seq_set'}->[0]->seq;
    #$self->_seq_props;
    $self->_run_Gibbs() or $self->throw("Error running Gibbs.");
    return $self;
}

sub _run_Gibbs {
    my $self = shift;
    # Write the sequence set to a temporary fasta file for the Gibbs binary
    my $tmp_file = tmpnam();
    my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta");
    foreach my $seqobj (@{ $self->{'seq_set'} } ) {
        $outstream->write_seq($seqobj);
    }
    $outstream->close();
    my $command_line =
        $self->{'binary'}." ".
        " -PBernoulli ".
        $tmp_file." ".
        $self->{'motif_length_string'}." ".
        $self->{'nr_hits_string'}." ".
        $self->{'additional_params'}." -n";
    my $resultstring = `$command_line`;
    $self->_parse_Gibbs_output($resultstring);
    #print STDERR "$command_line\n";
    #print STDERR $resultstring;
    unlink $tmp_file;
    return 1;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut

sub _parse_Gibbs_output {
    my ($self, $resultstring) = @_;
    #print $resultstring;
    #print"===========END_RESULTSTRING===============================\n";
    $resultstring =~ s/.*=== MAP MAXIMIZATION RESULTS ===//s;
    my @raw_motifs = split /\-+\n\s+MOTIF \w\n/s, $resultstring;
    shift @raw_motifs; # discard the first one
    foreach my $raw_motif (@raw_motifs) {
        #print $raw_motif;
        my $motif =$self->_parse_raw_motif($raw_motif) || next;
        push @{ $self->{'motifs'} }, $motif;
    }
    return 1;
}

sub _site_props{
    my ($self,$raw_motif)=@_;
    my @sites;
    # print $raw_motif;
    $raw_motif=~s/.*Num Motifs:\s+\d+\n//s;
    # print $raw_motif;
    #print "#####################################################\n";
    $raw_motif=~s/\n\s+\*+.*//s;
    #print $raw_motif,"\n";
    my @site_lines=(split("\n", $raw_motif));
    foreach my $site(@site_lines){
        my $start_seq;
        my $end_seq;
        my ($seq_nr,$pattern_nr,$start,$seq,$end,$score,$strand,$desc)=$site=~/\s+(\d+),\s+(\d+)\s+(\d+)\s+([\w,\s]+)\s+(\d+)\s+(\d\.\d+)\s+([F,R])(.*)/;
        # print $seq_nr,$pattern_nr,$start,$seq;
        if ($strand eq "F"){
            $strand=1;
            $start_seq=$start;
            $end_seq=$end;
        }
        else{
            $strand=-1;
            $start_seq=$end;
            $end_seq=$start;
        }
        #print $site;
        my $site = Bio::SeqFeature::Generic->new (
            -start  => $start_seq,
            -end    => $end_seq,
            -strand => $strand,
            -source => 'Gibbs sampler',
            -score  => $score,
        );
        $site->attach_seq ($self->{'seq_set'}->[$seq_nr-1]);
        push (@sites,$site);
    }
    foreach my $site(@sites){
        #print $site->start."\n";
    }
    return \@sites;
}

sub _parse_raw_motif {
    # a utility function
    my ($self,$raw_motif) = @_;
    # print $raw_motif;
    my ($raw_matrix, $raw_bp, $length, $nr_hits, $MAP_score) =
        $raw_motif =~ /Motif model \(residue frequency x 100\)\n(.+)Motif probability model\n.+Background probability model\n\s+(.+?)\n.+\D(\d+) columns\nNum Motifs\: (\d+).+Difference of Logs of Maps = ([\-\.\d]+)\n/s;
    #print $raw_matrix;
    return undef unless $raw_matrix;
    my $sites = $self->_site_props($raw_motif);
    # print STDERR
    #     join ":", ($raw_matrix, $raw_bp, $length, $nr_hits); print "\n";
    my $matrix = _parse_raw_matrix($raw_matrix);
    #print $matrix;
    return TFBS::PatternGen::Gibbs::Motif->new
    # TFBS::PatternGen::Gibbs::Motif does not define a new method itself, so the
    # method is looked up in its ISA packages. The object is still blessed as a
    # TFBS::PatternGen::Gibbs::Motif.
    # The only ISA in the package is TFBS::PatternGen::Motif.pm, which does
    # contain the new method.
        (-length => $length."",
         -bg_probabilities => [split /\s+/, $raw_bp],
         -MAP_score => $MAP_score,
         -tags => {MAP_score => $MAP_score},
         -nr_hits => $nr_hits,
         -sites=>$sites,
         -matrix => $matrix
        );
}

sub _parse_raw_matrix {
    # a utility function
    my $raw_matrix = shift;
    my @lines = split "\n", $raw_matrix;
    my (@A, @C, @G, @T);
    foreach my $line (@lines) {
        my $value_string;
        next unless ($value_string) = $line =~ /\s+\d+\s+\|\s+(.+)/;
        $value_string =~ s/\./0/g;
        # note the column order: a, t, c, g
        my ($a, $t, $c, $g) = split /\s+/, $value_string;
        push @A, $a;
        push @C, $c;
        push @G, $g;
        push @T, $t;
    }
    # print STDERR join(" ",@A, "\n", @C, "\n", @G, "\n", @T, "\n");
    return [\@A, \@C, \@G, \@T];
}

1;
TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/MEME.pm.svn-base000077500000000000000000000140171305752266700224630ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::MEME
#
# Copyright Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::MEME - a pattern factory that uses the MEME program

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::MEME->new(-seq_file=>'sequences.fa',
                                        -binary => 'meme');

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::MEME builds position frequency matrices
using an external program MEME written by Bailey and Elkan.
For information and source code of MEME see

http://www.sdsc.edu/MEME

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::MEME;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::SimplePFM;
use TFBS::PatternGen::MEME::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::MEME->new(%args);
 Function: the constructor for the TFBS::PatternGen::MEME object
 Returns : a TFBS::PatternGen::MEME object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences
           Other arguments are:
            -binary       # a fully qualified path to the 'meme' executable
                          #  OPTIONAL: default 'meme'
            -additional_params  # a string containing additional
                                #   command-line switches for the
                                #   meme program

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'filename'} =$args{'-seq_file'};
    $self->{'additional_params'} =
        ($args{'-additional_params'}
         ? (ref($args{'-additional_params'})
            ? join(' ', @{$args{'-additional_params'}})
            : $args{'-additional_params'})
         : ""
        );
    $self->{'binary'} = $args{'-binary'} || 'meme';
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_meme() or $self->throw("Error running MEME.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut

sub _run_meme{
    my ($self)=shift;
    # Write the sequence set to a temporary fasta file for the meme binary
    my $tmp_file = tmpnam();
    my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta");
    foreach my $seqobj (@{ $self->{'seq_set'} } ) {
        $outstream->write_seq($seqobj);
    }
    $outstream->close();
    my $command_line =
        $self->{'binary'}." ".
        $tmp_file." ".
        "-text ".
        "-dna ".
        $self->{'additional_params'}
        ." 2>/dev/null"
        ;
    # print STDERR "$command_line\n";
    my $resultstring = `$command_line`;
    # print STDERR $resultstring;
    $self->_parse_meme_output($resultstring,$command_line);
    unlink $tmp_file;
    return 1;
}

sub _parse_meme_output{
    my ($self,$resultstring,$command_line)=@_;
    if ($resultstring=~/^error/){
        # warn "Error running AnnSpec\nNo patterns produced";
        $self->throw ("Error running MEME command:\n $command_line");
        return;
    }
    my @motifs=split(/\*\nMOTIF/,$resultstring);
    shift @motifs; # discard the first one
    #print STDERR scalar @motifs,"\n";
    foreach my $raw_motif(@motifs){
        my ($matrix,$sites,$score)=$self->_parse_raw_matrix($raw_motif);
        # print STDERR $matrix;
        my $motif =TFBS::PatternGen::MEME::Motif->new (
            -tags => {score=>$score}, # The score in this case is the E-value given in the output
            -sites=>$sites,
            -matrix => $matrix
        );
        push @{ $self->{'motifs'} }, $motif;
    }
    return;
}
#
#
sub _parse_raw_matrix{
    my ($self,$string)=@_;
    my @sites;
    my @matrix;
    $string=~s/(Motif \d+ block diagrams.*)//s;
    # print STDERR $string;
    my ($width,$e_value)=$string=~/width =\s+(\d+)\s+sites.*E-value =(.*)\n/;
    # print STDERR $e_value,"\n";
    $string=~s/.*Motif \d+ sites sorted by position p-value//s;
    #print STDERR $string;
    my @array=split("\n",$string);
    foreach my $line(@array){
        my $nr=0;
        my $strand=1; # if revcomp is not selected the strand is always 1
        next if $line=~/^-/;
        next if $line=~/P-value\s+Site/;
        my (@properties)=split(/\s+/,$line);
        next if @properties<1;
        # print STDERR "@properties\n";
        # First determine whether the -revcomp switch is used and thus strand info is given
        if ($properties[1] eq "+" or $properties[1] eq "-"){
            $strand=$properties[1];
            $nr=1;
        }
        my
        $site = Bio::SeqFeature::Generic->new (
            -start =>$properties[1+$nr],
            -end =>$properties[1+$nr]+$width-1,
            -strand=>$strand,
            -source=>'MEME',
            -score=>$properties[2+$nr]
        );
        foreach my $seq(@{$self->{'seq_set'}}){
            if ($seq->id eq $properties[0]){
                $site->attach_seq ($seq);
            }
        }
        push @sites,$site;
    }
    # Build the matrix from the sequences of the reported sites
    foreach my $site(@sites){
        push @matrix,$site->seq->seq;
    }
    my $patterngen=TFBS::PatternGen::SimplePFM->new(-seq_list=>\@matrix);
    my $matrix=$patterngen->pattern->rawprint;
    # print STDERR $matrix;
    return ($matrix,\@sites,$e_value);
}

1;
TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/SimplePFM.pm.svn-base000077500000000000000000000063171305752266700235400ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::SimplePFM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::SimplePFM - a simple position frequency matrix factory

=head1 SYNOPSIS

    my @sequences = qw( AAGCCT AGGCAT AAGCCT
                        AAGCCT AGGCAT AGGCCT
                        AGGCAT AGGTTT AGGCAT
                        AGGCCT AGGCCT );
    my $patterngen =
            TFBS::PatternGen::SimplePFM->new(-seq_list=>\@sequences);

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::SimplePFM generates a position frequency matrix
from a set of nucleotide sequences of equal length. The sequences can be
passed either as strings, as Bio::Seq objects or as a fasta file.

This pattern generator always creates only one pattern from a given set
of sequences.
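
The counting that turns equal-length sequences into a position frequency
matrix can be sketched without any TFBS or BioPerl dependency. The snippet
below is only an illustration under stated assumptions: C<count_pfm> is a
hypothetical helper, not part of this module, and the row order A, C, G, T
is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): build a 4 x L count matrix
# with rows in the assumed order A, C, G, T.
sub count_pfm {
    my @seqs  = @_;
    my @bases = qw(A C G T);
    my %row_of;
    @row_of{@bases} = (0 .. $#bases);      # base letter -> row index
    my $length = length $seqs[0];
    my @matrix = map { [ (0) x $length ] } @bases;
    foreach my $seq (@seqs) {
        die "sequences must be of equal length\n"
            unless length($seq) == $length;
        my @seqbase = split //, uc $seq;
        for my $j (0 .. $length - 1) {
            my $i = $row_of{ $seqbase[$j] };
            $matrix[$i][$j]++ if defined $i;   # symbols other than ACGT are skipped
        }
    }
    return \@matrix;
}

my $pfm = count_pfm(qw(AAGCCT AGGCAT AGGTTT));
print join(" ", @$_), "\n" foreach @$pfm;
```

Every column of the resulting matrix sums to the number of input sequences,
provided the sequences contain only A, C, G and T.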

=cut

package TFBS::PatternGen::SimplePFM;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $db = TFBS::PatternGen::SimplePFM->new(%args);
 Function: the constructor for the TFBS::PatternGen::SimplePFM object
 Returns : a TFBS::PatternGen::SimplePFM object
 Args    : This method takes named arguments;
            you must specify one of the following
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_check_seqs_for_uniform_length();
    $self->{'motifs'} = [$self->_create_motif()];
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut

sub _create_motif {
    my $self = shift;
    my $length = $self->{'seq_set'}->[-1]->length();
    # initialize the matrix
    my $matrixref = [];
    for my $i (0..3) {
        for my $j (0..$length-1) {
            $matrixref->[$i][$j] = 0;
        }
    }
    # fill the matrix
    my @base = qw(A C G T);
    foreach my $seqobj ( @{ $self->{seq_set} } ) {
        for my $i (0..3) {
            my $seqstring = $seqobj->seq;
            my @seqbase = split "", uc $seqstring;
            for my $j (0..$length-1) {
                $matrixref->[$i][$j] += ($base[$i] eq $seqbase[$j])?1:0;
            }
        }
    }
    # the number of hits is the sum of the counts in the first column
    my $nrhits =0;
    for my $i (0..3) {$nrhits += $matrixref->[$i][0];}
    my $motif =
        TFBS::PatternGen::Motif::Matrix->new(-matrix => $matrixref,
                                             -nr_hits=> $nrhits);
    return $motif;
}

sub _validate_seq {
    # a utility function
    my ($sequence)=@_;
    $sequence=~ s/[ACGT]//g;
    return ($sequence eq "" ?
            1 : 0);
}

1;
TFBS-0.7.1/TFBS/PatternGen/.svn/text-base/YMF.pm.svn-base000077500000000000000000000123031305752266700223670ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::YMF
#
# Copyright Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::YMF - a pattern factory that uses the YMF program

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::YMF->new(-seq_file=>'sequences.fa',
                                       -length_oligo => $length_oligo,
                                       -length_region => $length_region,
                                       -pathoforganismtables => $path_org,
                                       -config_file => $config_file,
                                       -abs_stats_path => $abs_stats_path);

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::YMF finds motifs using the external program YMF,
which is run through its 'stats' executable. Each motif is stored as a
word together with its number of occurrences, z-score, expectation value
and variance.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::YMF;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::YMF::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
use File::Temp qw/ tempfile tempdir /;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::YMF->new(%args);
 Function: the constructor for the TFBS::PatternGen::YMF object
 Returns : a TFBS::PatternGen::YMF object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences
           Other arguments are:
            -length_oligo          # the oligo length passed to 'stats'
            -length_region         # the region length passed to 'stats'
            -pathoforganismtables  # the path of the organism tables
            -config_file           # the config file passed to 'stats'
                                   #  OPTIONAL: default is 'stats.config'
                                   #  in the directory given by -stats_path
            -stats_path            # the directory holding the example
                                   #   'stats.config' that comes with the
                                   #   installation of YMF
            -abs_stats_path        # the directory where the 'stats'
                                   #   executable is located

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'width'}=$args{'-length_oligo'};
    $self->{'path_org'}=$args{'-pathoforganismtables'};
    $self->{'len_region'}=$args{'-length_region'};
    $self->{'config_file'}=$args{'-config_file'}||$args{'-stats_path'}."/stats.config";
    # The latter is the example config file that comes with the installation of YMF
    $self->{'abs_stats_path'} = $args{'-abs_stats_path'} ;
    # This is the directory where the executable and the results file
    # generated by the program are located
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_stats() or $self->throw("Error running stats.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut

sub _run_stats{
    my ($self)=shift;
    my $tmp_file = tmpnam();
    my $dumpfile = tmpnam();
    my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta");
    foreach my $seqobj (@{ $self->{'seq_set'} } ) {
        $outstream->write_seq($seqobj);
    }
    $outstream->close();
    my $dir = tempdir();
    #print $dir;
    #change directory to directory where the program is located
    #system 'cd $dir.w;';
    # my $command="cd $dir;";
    # print $command;
    # system $command;
    # `$command`;
    # system 'ls -ltr';
    my $command_line =
        $self->{'abs_stats_path'}."/stats ".
        # "stats ".
        $self->{'config_file'}." ".
        $self->{'len_region'}." ".
        $self->{'width'}." ".
        $self->{'path_org'}." ".
        "-sort ". # sorts on z-score
        $tmp_file
        ." >$dumpfile"
        ;
    # print STDERR "cd $dir;$command_line\n";
    my $resultstring = `cd $dir;$command_line`;
    # print STDERR $resultstring;
    $self->_parse_stats_output($resultstring,$command_line,$dumpfile,$dir);
    unlink $tmp_file;
    #unlink $dumpfile;
    return 1;
}
#
sub _parse_stats_output{
    my ($self,$resultstring,$command_line,$dumpfile,$temp_dir)=@_;
    # Scan the dumped program output for error messages
    open DUMP,$dumpfile;
    while(<DUMP>){
        if ((/(^Error.*)/) or /(.*Aborting.*)/){
            # warn "Error running AnnSpec\nNo patterns produced";
            print "YMF Error message: \"$1\"\n";
            unlink $dumpfile;
            $self->throw ("Error running YMF using command:\n $command_line");
            return;
        }
    }
    unlink $dumpfile;
    open RES,"$temp_dir/results";
    my $skip=<RES>; # skip the first line of the results file
    while (<RES>){
        my ($word,$occ,$z_score,$expect,$var)=split;
        #print $word;
        my $motif =TFBS::PatternGen::YMF::Motif->new
            (-word=>$word,
             -tags => {z_score=>$z_score,
                       'occurences'=>$occ,
                       'expectation value'=>$expect,
                       'variance'=>$var}
            );
        push @{ $self->{'motifs'} }, $motif;
    }
    my $command="rm -r $temp_dir";
    #print $command;
    `$command`; # or die "could not unlink $!";
    # return
}
#
1;
TFBS-0.7.1/TFBS/PatternGen/AnnSpec.pm000077500000000000000000000145161305752266700167760ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::AnnSpec
#
# Copyright Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::AnnSpec - a pattern factory that uses the AnnSpec program (version 2.1)

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::AnnSpec->new(-seq_file=>'sequences.fa',
                                           -binary => 'ann-spec');

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::AnnSpec builds position frequency matrices
using an external program AnnSpec (Workman, C. and Stormo, G.D. (2000)
ANN-Spec: A method for discovering transcription factor binding sites with
improved specificity. Proc. Pacific Symposium on Biocomputing 2000).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::AnnSpec;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::AnnSpec::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::AnnSpec->new(%args);
 Function: the constructor for the TFBS::PatternGen::AnnSpec object
 Returns : a TFBS::PatternGen::AnnSpec object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences
           Other arguments are:
            -binary       # a fully qualified path to the AnnSpec executable
                          #  OPTIONAL: default 'annspec'
            -additional_params  # a string containing additional
                                #   command-line switches for the
                                #   ann-spec program

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'filename'} =$args{'-seq_file'};
    $self->{'additional_params'} =
        ($args{'-additional_params'}
         ? (ref($args{'-additional_params'})
            ? join(' ', @{$args{'-additional_params'}})
            : $args{'-additional_params'})
         : ""
        );
    $self->{'binary'} = $args{'-binary'} || 'annspec';
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_AnnSpec() or $self->throw("Error running AnnSpec.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut

sub _run_AnnSpec{
    my ($self)=shift;
    # Write the sequence set to a temporary fasta file for the AnnSpec binary
    my $tmp_file = tmpnam();
    my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta");
    foreach my $seqobj (@{ $self->{'seq_set'} } ) {
        $outstream->write_seq($seqobj);
    }
    $outstream->close();
    my $command_line =
        $self->{'binary'}." ".
        "-f ".$tmp_file." ".
        # $self->{'motif_length_string'}." ".
        # $self->{'nr_hits_string'}." ".
        $self->{'additional_params'}.
        "";
    # print STDERR "$command_line\n";
    my $resultstring = `$command_line`;
    print "$resultstring\n";
    # open(TEST, "test.out");              # this sentence is the one I add
    # my @lines = <TEST>;                  # this sentence is the one I add
    # my $resultstring = join '', @lines;  # this sentence is the one I add
    # print STDERR $resultstring;
    $self->_parse_AnnSpec_output($resultstring,$command_line);
    unlink $tmp_file;
    return 1;
}

sub _parse_AnnSpec_output{
    my ($self,$resultstring,$command_line)=@_;
    if ($resultstring eq''){
        # warn "Error running AnnSpec\nNo patterns produced";
        $self->throw ("Error running AnnSpec using command:\n $command_line");
        return;
    }
    my ($consensus,$matrix)=$self->_parse_raw_matrix($resultstring);
    my ($score,$sites)=$self->_parse_sites($resultstring);
    # one motif per reported consensus
    for(my $x = 0; $x < scalar(@$consensus); $x++){
        my $motif =TFBS::PatternGen::AnnSpec::Motif->new (
            #-length => $length."",
            # -bg_probabilities => [split /\s+/, $raw_bp],
            -tags => {consensus => $consensus->[$x], score=>$score->[$x]},
            -nr_hits => 1,
            -sites=>$sites->[$x],
            -matrix => $matrix->[$x]
        );
        push @{ $self->{'motifs'} }, $motif;
    }
    return;
}

sub _parse_sites{
    my ($self,$string)=@_;
    my (@hits, @scores);
    foreach my $substring (split /REPORTING/, $string ){
        my @sub_hits;
        my ($sites)=$substring=~/STR\s+n.*seq\n(.*)RUN\s+ALIGNMENT.*/s;
        my ($average)=$substring=~/RUN INFORMATION_CONTENT\s+(\d*\.*\d*)/;
        my ($score)=$substring=~/RUN\s+SCORE\s+(\d*\.*\d*)/;
        if($sites){
            my @sites=split/\n/,$sites;
            foreach my $site (@sites){
                my @site_array=split(/\s+/,$site);
                my ($seq_id)=$site_array[6]=~/>(.*)/;
                my $strand=1;
                $strand=-1 if
                    $site_array[3]=~/\'/; # Means we have a pattern in the reverse strand
                my ($start)=$site_array[3]=~/(\d+)/;
                my $site = Bio::SeqFeature::Generic->new (
                    -start  => $start,
                    -end    => $start+(length$site_array[4])-1,
                    -strand => $strand,
                    -source => 'AnnSpec',
                    -score  => $site_array[2],
                );
                foreach my $seq(@{$self->{'seq_set'}}){
                    if ($seq->id eq $seq_id){
                        $site->attach_seq ($seq);
                    }
                }
                push (@sub_hits,$site);
            }
            push @scores, $score;
            push @hits, \@sub_hits;
        }
    }
    return \@scores,\@hits;
}

sub _parse_raw_matrix{
    my ($self,$string)=@_;
    my (@pfms, @consensus);
    foreach my $sub_string (split /REPORTING/, $string){
        my ($ma)=$sub_string=~/RUN\s+WEIGHTS_CONS.*ALR\s+\/.*ALR\s+\#.*(ALR.*\nALR.*\nALR.*\nALR.*\s+\d+\n)ALR\s+=+.*/s;
        my ($con)=$sub_string=~/WEIGHTS_CONS\s+(.*)\n/;
        if($ma){
            my @matrix=split("\n",$ma);
            my @pfm;
            foreach my $row(@matrix){
                # print $row;
                my @row=split /\s+/, $row;
                push @pfm, [@row[2..scalar@row-1]];
            }
            push @pfms, \@pfm;
            push @consensus, $con;
        }
    }
    return \@consensus, \@pfms;
}

1;
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/000077500000000000000000000000001305752266700164265ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/.svn/000077500000000000000000000000001305752266700173125ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/.svn/all-wcprops000077500000000000000000000003141305752266700215010ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 52 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/AnnSpec END Motif.pm K 25 svn:wc:ra_dav:version-url V 61 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/AnnSpec/Motif.pm END
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/.svn/entries000077500000000000000000000005411305752266700207110ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/PatternGen/AnnSpec http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df Motif.pm file 2009-08-07T13:10:53.000000Z 0f5af59c814a45843af3a0954ee4aca7
2008-01-24T20:21:25.772223Z 8 chrb
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/.svn/format000077500000000000000000000000021305752266700205200ustar00rootroot000000000000008
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/.svn/text-base/000077500000000000000000000000001305752266700212065ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/.svn/text-base/Motif.pm.svn-base000077500000000000000000000023111305752266700243370ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::AnnSpec::Motif
#
# Copyright Boris Lenhard and Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores
created by the AnnSpec program

=head1 SYNOPSIS

=head1 DESCRIPTION

TFBS::PatternGen::AnnSpec::Motif is used to store and manipulate unprocessed motifs and
associated numerical scores created by the AnnSpec program. You do not normally want to
create a TFBS::PatternGen::AnnSpec::Motif yourself. They are created by running
TFBS::PatternGen::AnnSpec.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard and Wynand Alkema

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>
Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# the code begins here:

package TFBS::PatternGen::AnnSpec::Motif;
use vars qw(@ISA);
use strict;

use TFBS::Matrix::PFM;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen::Motif::Matrix);
TFBS-0.7.1/TFBS/PatternGen/AnnSpec/Motif.pm000077500000000000000000000023111305752266700200420ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::AnnSpec::Motif
#
# Copyright Boris Lenhard and Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores
created by the AnnSpec program

=head1 SYNOPSIS

=head1 DESCRIPTION

TFBS::PatternGen::AnnSpec::Motif is used to store and manipulate unprocessed motifs and
associated numerical scores created by the AnnSpec program. You do not normally want to
create a TFBS::PatternGen::AnnSpec::Motif yourself. They are created by running
TFBS::PatternGen::AnnSpec.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard and Wynand Alkema

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>
Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# the code begins here:

package TFBS::PatternGen::AnnSpec::Motif;
use vars qw(@ISA);
use strict;

use TFBS::Matrix::PFM;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen::Motif::Matrix);
TFBS-0.7.1/TFBS/PatternGen/Elph.pm000077500000000000000000000202171305752266700163320ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::Elph
#
# Copyright Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::Elph - a pattern factory that uses the Elph program

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::Elph->new(-seq_file=>'sequences.fa',
                                        -binary => '/Elph/elph',
                                        -motif_length => [8, 9, 10],
                                        -additional_params => '-x -r -e');

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::Elph builds position frequency matrices
using a Gibbs sampling algorithm implemented in the external
I<elph> program.

=cut

package TFBS::PatternGen::Elph;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::Elph::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::Elph->new(%args);
 Function: the constructor for the TFBS::PatternGen::Elph object
 Returns : a TFBS::PatternGen::Elph object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences
           Other arguments are:
            -binary       # a fully qualified path to the elph executable
                          #  OPTIONAL: default 'elph'
            -nr_hits      # a presumed number of pattern occurrences in the
                          #   sequence set: it can be a single integer, e.g.
                          #   -nr_hits => 24 , or a reference to an array of
                          #   integers, e.g. -nr_hits => [12, 24, 36]
            -motif_length # an expected length of motif in nucleotides:
                          #   it can be a single integer, e.g.
                          #   -motif_length => 8 , or a reference to an
                          #   array of integers, e.g. -motif_length => [8..12]
            -additional_params  # a string containing additional
                                #   command-line switches for the
                                #   elph program

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'motif_length_string'} =
        ($args{'-motif_length'}
         ? (ref($args{'-motif_length'})
            ? join(',', @{$args{'-motif_length'}})
            : $args{'-motif_length'})
         : 8
        );
    $self->{'additional_params'} =
        ($args{'-additional_params'}
         ? (ref($args{'-additional_params'})
            ? join(' ', @{$args{'-additional_params'}})
            : $args{'-additional_params'})
         : ""
        );
    $self->{'binary'} = $args{'-binary'} || 'elph';
    $self->{'motifs'} = [];
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_elph() or $self->throw("Error running elph.");
    return $self;
}

sub _run_elph {
    my $self = shift;
    # Write the sequence set to a temporary fasta file for the elph binary
    my $tmp_file = tmpnam();
    my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta");
    foreach my $seqobj (@{ $self->{'seq_set'} } ) {
        $outstream->write_seq($seqobj);
    }
    $outstream->close();
    $self->{'additional_params'}=~s/-b//;
    # This removes a -b switch, which enables the long output containing info about the sites
    my $command_line =
        $self->{'binary'}." ".
        $tmp_file." ".
        "LEN=".$self->{'motif_length_string'}." ".
        $self->{'additional_params'}." 2>/dev/null";
    my $resultstring = `$command_line`;
    $self->_parse_elph_output($resultstring,$command_line);
    #print STDERR "$command_line\n";
    #print STDERR $resultstring;
    # unlink $tmp_file;
    return 1;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut

sub _parse_elph_output {
    my ($self, $resultstring,$command_line) = @_;
    #print $resultstring;
    if ($resultstring=~/^error/){
        $self->throw ("Error running elph command:\n $command_line");
        return;
    }
    # Example of the part of the elph output parsed here:
    #
    #Motif after optimizing
    #MAP for motif: 46.735 InfoPar=0.098
    #
    #Motif found:
    #
    #Background probability model:
    # a c g t
    # 0.30 0.20 0.19 0.31
    #
    #Background counts:
    #a: 1456
    #c: 948
    #g: 909
    #t: 1487
    #
    #
    #Motif probability model:
    #Pos: 1 2 3 4 5 6
    #a 1.00 0.00 1.00 0.83 0.00 0.00
    #c 0.00 0.00 0.00 0.00 0.00 0.17
    #g 0.00 1.00 0.00 0.17 1.00 0.83
    #t 0.00 0.00 0.00 0.00 0.00 0.00
    #------------------------------------------
    #Info 1.73 2.42 1.73 1.19 2.42 1.75
    #
    #Motif counts:
    #a: 6 0 6 5 0 0
    #c: 0 0 0 0 0 1
    #g: 0 6 0 1 6 5
    #t: 0 0 0 0 0 0
    #
    #
    (my $MAP)=$resultstring=~/MAP for motif: (.*) InfoPar=/;
    ($resultstring)=~s/.*Motif counts:\n//s;
    #print STDERR $resultstring;
    my @array=split "\n",$resultstring;
    my @matrix;
    #print $array[0],"\n";
    # the first four lines are the count rows for a, c, g and t
    foreach (0..3){
        my (@line)=split(/\s+/,$array[$_]);
        #print "@line\n";
        shift @line; # drop the row label
        push @matrix,\@line;
        # print "@line\n";
    }
    # print @matrix;
    #print $resultstring;
    my $sites=$self->_site_props($resultstring);
    my $motif =TFBS::PatternGen::Elph::Motif->new (
        -tags => {score=>$MAP}, # The score in this case is the MAP value given in the output
        -sites=>$sites,
        -matrix => \@matrix
    );
    # Seq.no Pos ***** Motif ***** Prob D Seq.Id
    # 1 354 ggatt AGAAGC cgccg 0.1389 -1 GAL1
    # 2 636 caaag AGAAGG ttttt 0.6942 -1 GAL10
    # 3 456 aaggc AGAAGG cagta 0.6942 -1 GAL2
    # 4 444 aaagt AGAGGG ggtaa 0.1388 -1 GAL7
    # 5 324 tagag AGAAGG agcaa 0.6942 -1 GAL80
    # 6 165 gttac AGAAGG gccgc 0.6942 -1 GCY1
    #$resultstring =~ s/.*=== MAP MAXIMIZATION RESULTS ===//s;
    #my @raw_motifs = split /\-+\n\s+MOTIF \w\n/s, $resultstring;
    #shift @raw_motifs; # discard the first one
    #foreach my $raw_motif (@raw_motifs) {
    # #print $raw_motif;
    # my $motif =$self->_parse_raw_motif($raw_motif) || next;
    push @{ $self->{'motifs'} }, $motif;
    #}
    #return 1;
}

sub _site_props{
    my ($self,$resultstring)=@_;
    my @sites;
    # print $resultstring;
    #($resultstring)=~s/.*Motif counts:\n//s;
    my @array=split(/Seq\.no/,$resultstring);
    #print $array[1];
    my @sites_array=split "\n", $array[1];
    foreach my $line(@sites_array){
        # print $line;
        next if $line=~/Pos/;
        last if $line eq'';
        my @site=split(/\s+/,$line);
        # print $site[1],"\n";
        my $nr=0;
        $nr = 1 if $site[2]==1;
        # A special case when the site starts at the first base.
        # Then no preceding sequence is given and the site array is shorter by 1.
        my $motif_seq=$site[4-$nr];
        # print $motif_seq,"\n";
        my $site = Bio::SeqFeature::Generic->new (
            -start  => $site[2],
            -end    => $site[2]+(length$motif_seq)-1,
            -strand => 1, # Always 1 with elph
            -source => 'Elph',
            -score  => $site[-3],
        );
        foreach my $seq(@{$self->{'seq_set'}}){
            if ($seq->id eq $site[-1]){ # last element of the array
                $site->attach_seq ($seq);
            }
        }
        push (@sites,$site);
    }
    return \@sites;
}

1;
TFBS-0.7.1/TFBS/PatternGen/Elph/000077500000000000000000000000001305752266700157675ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/PatternGen/Elph/.svn/000077500000000000000000000000001305752266700166535ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/PatternGen/Elph/.svn/all-wcprops000077500000000000000000000003061305752266700210430ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 49 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Elph END Motif.pm K 25 svn:wc:ra_dav:version-url V 58 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Elph/Motif.pm END
TFBS-0.7.1/TFBS/PatternGen/Elph/.svn/entries000077500000000000000000000005361305752266700202560ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/PatternGen/Elph http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df Motif.pm file 2009-08-07T13:10:54.000000Z 9b0c26fe4c28452e8eaa4ff54d4d47cd 2008-01-24T20:21:25.772223Z 8 chrb
TFBS-0.7.1/TFBS/PatternGen/Elph/.svn/format000077500000000000000000000000021305752266700200610ustar00rootroot000000000000008 TFBS-0.7.1/TFBS/PatternGen/Elph/.svn/text-base/000077500000000000000000000000001305752266700205475ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Elph/.svn/text-base/Motif.pm.svn-base000077500000000000000000000021731305752266700237060ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. They are created by running TFBS::PatternGen::MEME =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
=cut # the code begins here: package TFBS::PatternGen::Elph::Motif; use vars qw(@ISA); use strict; use TFBS::Matrix::PFM; use TFBS::PatternGen::Motif::Matrix; @ISA = qw(TFBS::PatternGen::Motif::Matrix); TFBS-0.7.1/TFBS/PatternGen/Elph/Motif.pm000077500000000000000000000021731305752266700174110ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. They are created by running TFBS::PatternGen::MEME =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
=cut # the code begins here: package TFBS::PatternGen::Elph::Motif; use vars qw(@ISA); use strict; use TFBS::Matrix::PFM; use TFBS::PatternGen::Motif::Matrix; @ISA = qw(TFBS::PatternGen::Motif::Matrix); TFBS-0.7.1/TFBS/PatternGen/Gibbs.pm000077500000000000000000000201251305752266700164660ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::Gibbs # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::Gibbs - a pattern factory that uses Chip Lawrences Gibbs program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::Gibbs->new(-seq_file=>'sequences.fa', -binary => '/Programs/Gibbs-1.0/bin/Gibbs' -nr_hits => 24, -motif_length => [8, 9, 10], -additional_params => '-x -r -e'); my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::Gibbs builds position frequency matrices using an advanced Gibbs sampling algorithm implemented in external I program by Chip Lawrence. The algorithm can produce multiple patterns from a single set of sequences. 
=cut package TFBS::PatternGen::Gibbs; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::Gibbs::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $db = TFBS::PatternGen::Gibbs->new(%args); Function: the constructor for the TFBS::PatternGen::Gibbs object Returns : a TFBS::PatternGen::Gibbs object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to Gibbs executable # OPTIONAL: default 'Gibbs' -nr_hits # a presumed number of pattern occurrences in the # sequence set: it can be a single integer, e.g. # -nr_hits => 24 , or a reference to an array of # integers, e.g -nr_hits => [12, 24, 36] -motif_length # an expected length of motif in nucleotides: # it can be a single integer, e.g. # -motif_length => 8 , or a reference to an # array ofintegers, e.g -motif_length => [8..12] -additional_params # a string containing additional # command-line switches for the # Gibbs program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'motif_length_string'} = ($args{'-motif_length'} ? (ref($args{'-motif_length'}) ? join(',', @{$args{'-motif_length'}}) : $args{'-motif_length'}) : 8 ); $self->{'nr_hits_string'} = ($args{'-nr_hits'} ? (ref($args{'-nr_hits'}) ? join(',', @{$args{'-nr_hits'}}) : $args{'-nr_hits'}) : "" ); $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? 
join(' ', @{$args{'-additional_params'}}) : $args{'-additional_params'}) : "" ); $self->{'binary'} = $args{'-binary'} || 'Gibbs'; $self->{'motifs'} = []; $self->_create_seq_set(%args) or die ('Error creating sequence set'); #print $self->{'seq_set'}->[0]->seq; #$self->_seq_props; $self->_run_Gibbs() or $self->throw("Error running Gibbs."); return $self; } sub _run_Gibbs { my $self = shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $command_line = $self->{'binary'}." ". " -PBernoulli ". $tmp_file." ". $self->{'motif_length_string'}." ". $self->{'nr_hits_string'}." ". $self->{'additional_params'}." -n"; my $resultstring = `$command_line`; $self->_parse_Gibbs_output($resultstring); #print STDERR "$command_line\n"; #print STDERR $resultstring; unlink $tmp_file; return 1 } =head2 pattern =head2 all_patterns =head2 patternSet The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see L for details. 
=cut sub _parse_Gibbs_output { my ($self, $resultstring) = @_; #print $resultstring; #print"===========END_RESULTSTRING===============================\n"; $resultstring =~ s/.*=== MAP MAXIMIZATION RESULTS ===//s; my @raw_motifs = split /\-+\n\s+MOTIF \w\n/s, $resultstring; shift @raw_motifs; # discard the first one foreach my $raw_motif (@raw_motifs) { #print $raw_motif; my $motif =$self->_parse_raw_motif($raw_motif) || next; push @{ $self->{'motifs'} }, $motif; } return 1; } sub _site_props{ my ($self,$raw_motif)=@_; my @sites; # print $raw_motif; $raw_motif=~s/.*Num Motifs:\s+\d+\n//s; # print $raw_motif; #print "#####################################################\n"; $raw_motif=~s/\n\s+\*+.*//s; #print $raw_motif,"\n"; my @site_lines=(split("\n", $raw_motif)); foreach my $site(@site_lines){ my $start_seq; my $end_seq; my ($seq_nr,$pattern_nr,$start,$seq,$end,$score,$strand,$desc)=$site=~/\s+(\d+),\s+(\d+)\s+(\d+)\s+([\w,\s]+)\s+(\d+)\s+(\d\.\d+)\s+([F,R])(.*)/; # print $seq_nr,$pattern_nr,$start,$seq; if ($strand eq "F"){ $strand=1; $start_seq=$start; $end_seq=$end; } else{ $strand=-1; $start_seq=$end; $end_seq=$start; } #print $site; my $site = Bio::SeqFeature::Generic->new ( -start => $start_seq, -end => $end_seq, -strand => $strand, -source => 'Gibbs sampler', -score => $score, ); $site->attach_seq ($self->{'seq_set'}->[$seq_nr-1]); push (@sites,$site); } foreach my $site(@sites){ #print $site->start."\n"; } return \@sites; } sub _parse_raw_motif { # a utility function my ($self,$raw_motif) = @_; # print $raw_motif; my ($raw_matrix, $raw_bp, $length, $nr_hits, $MAP_score) = $raw_motif =~ /Motif model \(residue frequency x 100\)\n(.+)Motif probability model\n.+Background probability model\n\s+(.+?)\n.+\D(\d+) columns\nNum Motifs\: (\d+).+Difference of Logs of Maps = ([\-\.\d]+)\n/s; #print $raw_matrix; return undef unless $raw_matrix; my $sites = $self->_site_props($raw_motif); # print STDERR # join ":", ($raw_matrix, $raw_bp, $length, $nr_hits); print "\n"; 
my $matrix = _parse_raw_matrix($raw_matrix); #print $matrix; return TFBS::PatternGen::Gibbs::Motif->new #This object does not contain a new method. Instead the new method is than searched in the first ISA package. Remember that the object is still a TFBS::PatternGen::Gibbs::Motif. #The only ISA in the package is TFBS::PatternGen::Motif.pm. This package indeed contains the new method (-length => $length."", -bg_probabilities => [split /\s+/, $raw_bp], -MAP_score => $MAP_score, -tags => {MAP_score => $MAP_score}, -nr_hits => $nr_hits, -sites=>$sites, -matrix => $matrix ); } sub _parse_raw_matrix { # a utility function my $raw_matrix = shift; my @lines = split "\n", $raw_matrix; my (@A, @C, @G, @T); foreach my $line (@lines) { my $value_string; next unless ($value_string) = $line =~ /\s+\d+\s+\|\s+(.+)/; $value_string =~ s/\./0/g; my ($a, $t, $c, $g) = split /\s+/, $value_string; push @A, $a; push @C, $c; push @G, $g; push @T, $t; } # print STDERR join(" ",@A, "\n", @C, "\n", @G, "\n", @T, "\n"); return [\@A, \@C, \@G, \@T]; } 1; TFBS-0.7.1/TFBS/PatternGen/Gibbs/000077500000000000000000000000001305752266700161255ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Gibbs/.svn/000077500000000000000000000000001305752266700170115ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Gibbs/.svn/all-wcprops000077500000000000000000000003101305752266700211740ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 50 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Gibbs END Motif.pm K 25 svn:wc:ra_dav:version-url V 59 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Gibbs/Motif.pm END TFBS-0.7.1/TFBS/PatternGen/Gibbs/.svn/entries000077500000000000000000000005371305752266700204150ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/PatternGen/Gibbs http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df Motif.pm file 2009-08-07T13:10:51.000000Z 
cbaa7a312912e8cd8d3873ec0eeea71a 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/PatternGen/Gibbs/.svn/format000077500000000000000000000000021305752266700202170ustar00rootroot000000000000008 TFBS-0.7.1/TFBS/PatternGen/Gibbs/.svn/text-base/000077500000000000000000000000001305752266700207055ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Gibbs/.svn/text-base/Motif.pm.svn-base000077500000000000000000000042521305752266700240440ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::Gibbs::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::Gibbs::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::Gibbs::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the Gibbs program. You do not normally want to create a TFBS::PatternGen::Gibbs::Motif yourself. They are created by running TFBS::PatternGen::Gibbs =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard and Wynand Alkema Boris Lenhard EBoris.Lenhard@cgb.ki.seE Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # the code begins here: package TFBS::PatternGen::Gibbs::Motif; use vars qw(@ISA); use strict; use TFBS::Matrix::PFM; use TFBS::PatternGen::Motif::Matrix; @ISA = qw(TFBS::PatternGen::Motif::Matrix); =head2 MAP Title : MAP Usage : my $map_score = $motif->MAP; Function: returns MAP score for the detected motif (This is a backward compatibility method. 
For consistency, you should use $motif->tag('MAP_score') instead Returns : float (a scalar) Args : none =head2 Other methods TFBS::PatterGen::Motif::Gibbs inherits from TFBS::PatternGen::Motif, which inherits from TFBS::Matrix. Please consult the documentation of those modules for additional available methods. =cut sub MAP{ my ($self) = @_; return $self->tag("MAP_score"); } sub _calculate_PFM { my $self = shift; unless ($self->{'nr_hits'}) { $self->throw(ref($self). " objects must be created with a (nonzero)". " -nr_hits parameter in constructor" ); } my @PFM; foreach my $rowref ( @{$self->{'matrix'}} ) { my @PFMrow; foreach my $element (@$rowref) { push @PFMrow, int($self->{'nr_hits'}*$element/100 + 0.5); } push @PFM, [@PFMrow]; } return \@PFM; } TFBS-0.7.1/TFBS/PatternGen/Gibbs/Motif.pm000077500000000000000000000042521305752266700175470ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::Gibbs::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::Gibbs::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::Gibbs::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the Gibbs program. You do not normally want to create a TFBS::PatternGen::Gibbs::Motif yourself. They are created by running TFBS::PatternGen::Gibbs =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard and Wynand Alkema Boris Lenhard EBoris.Lenhard@cgb.ki.seE Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
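
As a rough illustration of the conversion this class performs (a sketch, not the module's own code): Gibbs reports the motif model as percentages, and each percentage is turned into a count by scaling with the number of sites and rounding to the nearest integer. The helper name percent_column_to_counts and the numbers below are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper (not part of TFBS): convert one column of
# percentages (A, C, G, T) into counts for a given number of sites, using
# the same "scale, add 0.5, truncate" rounding as _calculate_PFM.
sub percent_column_to_counts {
    my ($nr_hits, @percent) = @_;
    return map { int($nr_hits * $_ / 100 + 0.5) } @percent;
}

# 6 sites, column reported as 83% A and 17% G
my @counts = percent_column_to_counts(6, 83, 0, 17, 0);
print "@counts\n";    # prints "5 0 1 0" (sums to 6)

# Made-up column where independent rounding overshoots: 7 sites
my @skewed = percent_column_to_counts(7, 36, 36, 28, 0);
print "@skewed\n";    # prints "3 3 2 0" (sums to 8, not 7)
```

Because every cell is rounded independently, the counts in a column do not necessarily add up to the number of sites, which is one way column totals can end up unequal after the conversion.
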
=cut


# the code begins here:

package TFBS::PatternGen::Gibbs::Motif;
use vars qw(@ISA);
use strict;

use TFBS::Matrix::PFM;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen::Motif::Matrix);

=head2 MAP

 Title   : MAP
 Usage   : my $map_score = $motif->MAP;
 Function: returns MAP score for the detected motif
           (This is a backward compatibility method.
           For consistency, you should use
           $motif->tag('MAP_score') instead.)
 Returns : float (a scalar)
 Args    : none

=head2 Other methods

TFBS::PatternGen::Gibbs::Motif inherits from TFBS::PatternGen::Motif::Matrix,
which inherits from TFBS::Matrix. Please consult the documentation of those
modules for additional available methods.

=cut

sub MAP{
    my ($self) = @_;
    return $self->tag("MAP_score");
}

sub _calculate_PFM {
    my $self = shift;
    unless ($self->{'nr_hits'}) {
        $self->throw(ref($self).
                     " objects must be created with a (nonzero)".
                     " -nr_hits parameter in constructor"
                     );
    }
    my @PFM;
    foreach my $rowref ( @{$self->{'matrix'}} ) {
        my @PFMrow;
        foreach my $element (@$rowref) {
            # matrix elements are percentages; round to the nearest count
            push @PFMrow, int($self->{'nr_hits'}*$element/100 + 0.5);
        }
        push @PFM, [@PFMrow];
    }
    return \@PFM;
}
TFBS-0.7.1/TFBS/PatternGen/MEME.pm000077500000000000000000000140171305752266700161660ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::MEME
#
# Copyright Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::MEME - a pattern factory that uses the MEME program

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::MEME->new(-seq_file=>'sequences.fa',
                                        -binary => 'meme');

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::MEME builds position frequency matrices
using an external program MEME written by Bailey and Elkan.
For information and source code of MEME see

http://www.sdsc.edu/MEME

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =cut package TFBS::PatternGen::MEME; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::SimplePFM; use TFBS::PatternGen::MEME::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $pattrengen = TFBS::PatternGen::MEME->new(%args); Function: the constructor for the TFBS::PatternGen::MEME object Returns : a TFBS::PatternGen::MEME object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to the 'meme' executable # OPTIONAL: default 'meme' -additional_params # a string containing additional # command-line switches for the # meme program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'filename'} =$args{'-seq_file'}; $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? join(' ', @{$args{'-additional_params'}}) : $args{'-additional_params'}) : "" ); $self->{'binary'} = $args{'-binary'} || 'meme'; $self->_create_seq_set(%args) or die ('Error creating sequence set'); $self->_run_meme() or $self->throw("Error running MEME."); return $self; } =head2 pattern =head2 all_patterns =head2 patternSet The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see L for details. 
=cut sub _run_meme{ my ($self)=shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $command_line = $self->{'binary'}." ". $tmp_file." ". "-text ". "-dna ". $self->{'additional_params'} ." 2>/dev/null" ; # print STDERR "$command_line\n"; my $resultstring = `$command_line`; # print STDERR $resultstring; $self->_parse_meme_output($resultstring,$command_line); unlink $tmp_file; return 1 } sub _parse_meme_output{ my ($self,$resultstring,$command_line)=@_; if ($resultstring=~/^error/){ # warn "Error running AnnSpec\nNo patterns produced"; $self->throw ("Error running MEME command:\n $command_line"); return; } my @motifs=split(/\*\nMOTIF/,$resultstring); shift @motifs;#discard the first one #print STDERR scalar @motifs,"\n"; foreach my $raw_motif(@motifs){ my ($matrix,$sites,$score)=$self->_parse_raw_matrix($raw_motif); # print STDERR $matrix; my $motif =TFBS::PatternGen::MEME::Motif->new ( -tags => {score=>$score},#The score in this case is the E-value given in the output -sites=>$sites, -matrix => $matrix ); push @{ $self->{'motifs'} }, $motif; } return } # # sub _parse_raw_matrix{ my ($self,$string)=@_; my @sites; my @matrix; $string=~s/(Motif \d+ block diagrams.*)//s; # print STDERR $string; my ($width,$e_value)=$string=~/width =\s+(\d+)\s+sites.*E-value =(.*)\n/; # print STDERR $e_value,"\n"; $string=~s/.*Motif \d+ sites sorted by position p-value//s; #print STDERR $string; my @array=split("\n",$string); foreach my $line(@array){ my $nr=0; my $strand=1;#if revcomp is not selected the strand is always 1 next if $line=~/^-/; next if $line=~/P-value\s+Site/; my (@properties)=split(/\s+/,$line); next if @properties<1; # print STDERR "@properties\n"; #First determine whether -revcomp switch is used and thus strand info is given if ($properties[1] eq "+" or $properties[1] eq "-"){ $strand=$properties[1]; $nr=1; } my 
$site = Bio::SeqFeature::Generic->new ( -start =>$properties[1+$nr], -end =>$properties[1+$nr]+$width-1, -strand=>$strand, -source=>'MEME', -score=>$properties[2+$nr] ); foreach my $seq(@{$self->{'seq_set'}}){ if ($seq->id eq $properties[0]){ $site->attach_seq ($seq); } } push @sites,$site; } foreach my $site(@sites){ push @matrix,$site->seq->seq; } my $patterngen=TFBS::PatternGen::SimplePFM->new(-seq_list=>\@matrix); my $matrix=$patterngen->pattern->rawprint; # print STDERR $matrix; return ($matrix,\@sites,$e_value); } 1;TFBS-0.7.1/TFBS/PatternGen/MEME/000077500000000000000000000000001305752266700156225ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/MEME/.svn/000077500000000000000000000000001305752266700165065ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/MEME/.svn/all-wcprops000077500000000000000000000003061305752266700206760ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 49 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/MEME END Motif.pm K 25 svn:wc:ra_dav:version-url V 58 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/MEME/Motif.pm END TFBS-0.7.1/TFBS/PatternGen/MEME/.svn/entries000077500000000000000000000005361305752266700201110ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/PatternGen/MEME http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df Motif.pm file 2009-08-07T13:10:50.000000Z f3616055bbabb9ac36d4491c9d6be735 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/PatternGen/MEME/.svn/format000077500000000000000000000000021305752266700177140ustar00rootroot000000000000008 TFBS-0.7.1/TFBS/PatternGen/MEME/.svn/text-base/000077500000000000000000000000001305752266700204025ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/MEME/.svn/text-base/Motif.pm.svn-base000077500000000000000000000021731305752266700235410ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris 
Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. They are created by running TFBS::PatternGen::MEME =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # the code begins here: package TFBS::PatternGen::MEME::Motif; use vars qw(@ISA); use strict; use TFBS::Matrix::PFM; use TFBS::PatternGen::Motif::Matrix; @ISA = qw(TFBS::PatternGen::Motif::Matrix); TFBS-0.7.1/TFBS/PatternGen/MEME/Motif.pm000077500000000000000000000021731305752266700172440ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. They are created by running TFBS::PatternGen::MEME =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # the code begins here: package TFBS::PatternGen::MEME::Motif; use vars qw(@ISA); use strict; use TFBS::Matrix::PFM; use TFBS::PatternGen::Motif::Matrix; @ISA = qw(TFBS::PatternGen::Motif::Matrix); TFBS-0.7.1/TFBS/PatternGen/Motif/000077500000000000000000000000001305752266700161555ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/000077500000000000000000000000001305752266700170415ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/all-wcprops000077500000000000000000000004651305752266700212370ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 50 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Motif END Matrix.pm K 25 svn:wc:ra_dav:version-url V 60 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Motif/Matrix.pm END Word.pm K 25 svn:wc:ra_dav:version-url V 58 /svn/lenhard/!svn/ver/8/TFBS/TFBS/PatternGen/Motif/Word.pm END TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/entries000077500000000000000000000007231305752266700204420ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/PatternGen/Motif http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df Matrix.pm file 2009-08-07T13:10:57.000000Z 5f311726eab66b4f40d922e770de5930 2008-01-24T20:21:25.772223Z 8 chrb Word.pm file 2009-08-07T13:10:57.000000Z 05996413e6ae07f300c9a6bae206d4bb 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/format000077500000000000000000000000021305752266700202470ustar00rootroot000000000000008 
TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/text-base/000077500000000000000000000000001305752266700207355ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/text-base/Matrix.pm.svn-base000077500000000000000000000022701305752266700242600ustar00rootroot00000000000000package TFBS::PatternGen::Motif::Matrix; use vars qw(@ISA); use strict; use TFBS::Matrix; use TFBS::Matrix::PFM; @ISA = qw(TFBS::Matrix); sub new { my ($caller, %args) = @_; #my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PFM"); #my $self = bless $matrix, ref($caller) || $caller; my $self = $caller->SUPER::new(%args, -matrixtype=>"PFM"); $self->{'length'} = $args{'-length'} || scalar @{$self->{'matrix'}->[0]}; $self->{'nr_hits'} = ($args{'-nr_hits'} || undef); # || $self->throw("No -nr_hits provided."); # Why was nr_hits required ?? (Boris) $self->{'sites'}=$args{'-sites'}; # $self->{'tags'} = ($args{'-tags'} || {}); return $self; } sub PFM { my ($self, %args) = @_; return TFBS::Matrix::PFM->new (-name => "unknown", -ID => "unknown", -class=> "unknown", -tags => { %{$self->{'tags'} } }, %args, -matrix => $self->_calculate_PFM() ); } sub pattern { my ($self, %args ) = @_; $self->PFM(%args); } sub _calculate_PFM { # simplest case: matrix already IS PFM my $self = shift; return [@{$self->{'matrix'}}]; } sub get_sites{ return @{$_[0]->{'sites'}}; } 1; TFBS-0.7.1/TFBS/PatternGen/Motif/.svn/text-base/Word.pm.svn-base000077500000000000000000000005331305752266700237270ustar00rootroot00000000000000package TFBS::PatternGen::Motif::Word; use vars qw(@ISA); use strict; use TFBS::Word::Consensus; @ISA = qw(TFBS::Word::Consensus); sub new { my ($caller, %args) = @_; my $word = TFBS::Word::Consensus->new(%args); my $self = bless $word, ref($caller) || $caller; return $self; } sub pattern { return $_; } 1; TFBS-0.7.1/TFBS/PatternGen/Motif/Matrix.pm000077500000000000000000000022701305752266700177630ustar00rootroot00000000000000package TFBS::PatternGen::Motif::Matrix; use vars qw(@ISA); use strict; use 
TFBS::Matrix;
use TFBS::Matrix::PFM;
@ISA = qw(TFBS::Matrix);

sub new {
    my ($caller, %args) = @_;
    #my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PFM");
    #my $self = bless $matrix, ref($caller) || $caller;
    my $self = $caller->SUPER::new(%args, -matrixtype=>"PFM");
    $self->{'length'} = $args{'-length'} || scalar @{$self->{'matrix'}->[0]};
    $self->{'nr_hits'} = ($args{'-nr_hits'} || undef);
                        # || $self->throw("No -nr_hits provided.");
                        # Why was nr_hits required ?? (Boris)
    $self->{'sites'}=$args{'-sites'};
    # $self->{'tags'} = ($args{'-tags'} || {});
    return $self;
}

sub PFM {
    my ($self, %args) = @_;
    return TFBS::Matrix::PFM->new
        (-name => "unknown",
         -ID => "unknown",
         -class=> "unknown",
         -tags => { %{$self->{'tags'} } },
         %args,
         -matrix => $self->_calculate_PFM()
        );
}

sub pattern {
    my ($self, %args ) = @_;
    $self->PFM(%args);
}

sub _calculate_PFM {
    # simplest case: matrix already IS PFM
    my $self = shift;
    return [@{$self->{'matrix'}}];
}

sub get_sites{
    return @{$_[0]->{'sites'}};
}

1;
TFBS-0.7.1/TFBS/PatternGen/Motif/Word.pm000077500000000000000000000005331305752266700174320ustar00rootroot00000000000000
package TFBS::PatternGen::Motif::Word;
use vars qw(@ISA);
use strict;

use TFBS::Word::Consensus;
@ISA = qw(TFBS::Word::Consensus);

sub new {
    my ($caller, %args) = @_;
    my $word = TFBS::Word::Consensus->new(%args);
    my $self = bless $word, ref($caller) || $caller;
    return $self;
}

sub pattern {
    # a word motif is its own pattern: return the invocant
    # (the original returned the global $_, not the object)
    return $_[0];
}

1;
TFBS-0.7.1/TFBS/PatternGen/SimplePFM.pm000077500000000000000000000063171305752266700172430ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::SimplePFM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::SimplePFM - a simple position frequency matrix factory

=head1 SYNOPSIS

    my @sequences = qw( AAGCCT AGGCAT AAGCCT
                        AAGCCT AGGCAT AGGCCT
                        AGGCAT AGGTTT AGGCAT
                        AGGCCT AGGCCT );
    my $patterngen =
            TFBS::PatternGen::SimplePFM->new(-seq_list=>\@sequences);

    my $pfm =
        $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::SimplePFM generates a position frequency matrix from a set
of nucleotide sequences of equal length. The sequences can be passed either
as strings, as Bio::Seq objects or as a fasta file.

This pattern generator always creates only one pattern from a given set
of sequences.

=cut

package TFBS::PatternGen::SimplePFM;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::SimplePFM->new(%args);
 Function: the constructor for the TFBS::PatternGen::SimplePFM object
 Returns : a TFBS::PatternGen::SimplePFM object
 Args    : This method takes named arguments;
           you must specify one of the following

               -seq_list    # a reference to an array of strings
                            # and/or Bio::Seq objects
               # or
               -seq_stream  # A Bio::SeqIO object
               # or
               -seq_file    # the name of the fasta file containing
                            # all the sequences

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_check_seqs_for_uniform_length();
    $self->{'motifs'} = [$self->_create_motif()];
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

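
The counting that turns a set of equal-length sequences into a position frequency matrix can be sketched in plain Perl, without any TFBS or BioPerl objects. This is only an illustration under simplifying assumptions: count_matrix is a hypothetical helper, it takes plain strings, and it silently skips symbols other than A, C, G and T.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): count bases per column for
# equal-length sequences. Rows of the result are in A, C, G, T order.
sub count_matrix {
    my @seqs   = @_;
    my $length = length $seqs[0];
    my %row    = (A => 0, C => 1, G => 2, T => 3);

    # start from a 4 x $length matrix of zeros
    my @matrix = map { [ (0) x $length ] } 0 .. 3;

    foreach my $seq (@seqs) {
        die "Sequences must have equal length\n"
            unless length($seq) == $length;
        my @bases = split //, uc $seq;
        for my $j (0 .. $length - 1) {
            my $i = $row{ $bases[$j] };
            next unless defined $i;    # skip non-ACGT symbols such as N
            $matrix[$i][$j]++;
        }
    }
    return \@matrix;
}

my $pfm = count_matrix(qw(AAGCCT AGGCAT AGGTTT));
print join(" ", @$_), "\n" for @$pfm;
# prints:
# 3 1 0 0 1 0
# 0 0 0 2 1 0
# 0 2 3 0 0 0
# 0 0 0 1 1 3
```

Each column sums to the number of input sequences, which is the value the class passes on as the number of hits.
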
=cut sub _create_motif { my $self = shift; my $length = $self->{'seq_set'}->[-1]->length(); # initialize the matrix my $matrixref = []; for my $i (0..3) { for my $j (0..$length-1) { $matrixref->[$i][$j] = 0; } } #fill the matrix my @base = qw(A C G T); foreach my $seqobj ( @{ $self->{seq_set} } ) { for my $i (0..3) { my $seqstring = $seqobj->seq; my @seqbase = split "", uc $seqstring; for my $j (0..$length-1) { $matrixref->[$i][$j] += ($base[$i] eq $seqbase[$j])?1:0; } } } my $nrhits =0; for my $i (0..3) {$nrhits += $matrixref->[$i][0];} my $motif = TFBS::PatternGen::Motif::Matrix->new(-matrix => $matrixref, -nr_hits=> $nrhits); return $motif; } sub _validate_seq { # a utility function my ($sequence)=@_; $sequence=~ s/[ACGT]//g; return ($sequence eq "" ? 1 : 0); } 1;TFBS-0.7.1/TFBS/PatternGen/YMF.pm000077500000000000000000000123031305752266700160720ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::YMF # # Copyright Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::MEME - a pattern factory that uses the MEME program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::MEME->new(-seq_file=>'sequences.fa', -binary => 'meme' my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::MEME builds position frequency matrices using an external program MEME written by Bailey and Elkan. For information and source code of MEME see http://www.sdsc.edu/MEME =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::YMF;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::YMF::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
use File::Temp qw/ tempfile tempdir /;

@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::YMF->new(%args);
 Function: the constructor for the TFBS::PatternGen::YMF object
 Returns : a TFBS::PatternGen::YMF object
 Args    : This method takes named arguments;
            you must specify one of the following three

            -seq_list     # a reference to an array of strings
                          #   and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #   all the sequences

           Other arguments are:

            -length_oligo         # length of the words (oligos)
            -length_region        # length of the sequence regions
            -pathoforganismtables # path to the YMF organism tables
            -config_file          # the YMF configuration file
                                  #   OPTIONAL: default 'stats.config'
                                  #   in the -stats_path directory
            -stats_path           # directory containing the example
                                  #   stats.config that comes with YMF
            -abs_stats_path       # directory where the 'stats'
                                  #   executable is located

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'width'}=$args{'-length_oligo'};
    $self->{'path_org'}=$args{'-pathoforganismtables'};
    $self->{'len_region'}=$args{'-length_region'};
    # The default is the example config file that comes with
    # the installation of YMF
    $self->{'config_file'}=$args{'-config_file'}||$args{'-stats_path'}."/stats.config";
    # This is the directory where the 'stats' executable is located
    $self->{'abs_stats_path'} = $args{'-abs_stats_path'} ;
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_stats() or $self->throw("Error running stats.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.
=cut

sub _run_stats{
    my ($self)=shift;
    my $tmp_file = tmpnam();
    my $dumpfile = tmpnam();
    my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta");
    foreach my $seqobj (@{ $self->{'seq_set'} } ) {
        $outstream->write_seq($seqobj);
    }
    $outstream->close();

    # the program is run from a fresh temporary directory;
    # its 'results' file is later read from that directory
    my $dir = tempdir();
    my $command_line =
        $self->{'abs_stats_path'}."/stats ".
        $self->{'config_file'}." ".
        $self->{'len_region'}." ".
        $self->{'width'}." ".
        $self->{'path_org'}." ".
        "-sort ". # sorts on z-score
        $tmp_file .
        " >$dumpfile" ;
    my $resultstring = `cd $dir;$command_line`;
    $self->_parse_stats_output($resultstring,$command_line,$dumpfile,$dir);
    unlink $tmp_file;
    return 1;
}

sub _parse_stats_output{
    my ($self,$resultstring,$command_line,$dumpfile,$temp_dir)=@_;

    # the dump file holds the standard output of the program;
    # scan it for error messages first
    open(DUMP, $dumpfile)
        or $self->throw("Could not open $dumpfile: $!");
    while(<DUMP>){
        if ((/(^Error.*)/) or /(.*Aborting.*)/){
            print "YMF Error message: \"$1\"\n";
            close DUMP;
            unlink $dumpfile;
            $self->throw ("Error running YMF using command:\n $command_line");
            return;
        }
    }
    close DUMP;
    unlink $dumpfile;

    open(RES, "$temp_dir/results")
        or $self->throw("Could not open $temp_dir/results: $!");
    my $skip=<RES>; # the first line of the results file is not a motif
    while (<RES>){
        my ($word,$occ,$z_score,$expect,$var)=split;
        my $motif =TFBS::PatternGen::YMF::Motif->new
            (-word=>$word,
             -tags => {z_score=>$z_score,
                       'occurences'=>$occ,
                       'expectation value'=>$expect,
                       'variance'=>$var}
            );
        push @{ $self->{'motifs'} }, $motif;
    }
    close RES;

    # remove the temporary directory and the results file in it
    my $command="rm -r $temp_dir";
    `$command`;
    return;
}
1;
TFBS-0.7.1/TFBS/PatternGen/YMF/000077500000000000000000000000001305752266700155325ustar00rootroot00000000000000TFBS-0.7.1/TFBS/PatternGen/YMF/Motif.pm000077500000000000000000000022001305752266700171430ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::YMF::Motif
#
# Copyright Boris Lenhard and Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::YMF::Motif - class for unprocessed motifs and associated
numerical scores created by the YMF program

=head1 SYNOPSIS

=head1 DESCRIPTION

TFBS::PatternGen::YMF::Motif is used to store and manipulate unprocessed
motifs and associated numerical scores created by the ymf program. You do
not normally want to create a TFBS::PatternGen::YMF::Motif yourself.
They are created by running TFBS::PatternGen::YMF

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
=cut

# the code begins here:

package TFBS::PatternGen::YMF::Motif;
use vars qw(@ISA);
use strict;

#use TFBS::Word;
#use TFBS::Word::Consensus;
use TFBS::PatternGen::Motif::Word;
@ISA = qw(TFBS::PatternGen::Motif::Word);

1;
TFBS-0.7.1/TFBS/PatternGenI.pm000077500000000000000000000007751305752266700155620ustar00rootroot00000000000000package TFBS::PatternGenI;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from Bio::Root::Root;
use Bio::Root::Root;
use Carp;
@ISA = qw(Bio::Root::Root);

sub pattern {
    my $self = shift;
    $self->_abstractDeath;
}

sub _abstractDeath {
    # borrowed from BioPerl; with compliments :)
    my $self = shift;
    my $package = ref $self;
    # caller(1) describes the abstract method that called this sub;
    # element 3 is its fully qualified name
    my $caller = (caller(1))[3];
    confess "Abstract method '$caller' defined in interface TFBS::PatternGenI not implemented by package $package";
}

1;
TFBS-0.7.1/TFBS/PatternI.pm000077500000000000000000000060201305752266700151150ustar00rootroot00000000000000# TFBS module for TFBS::PatternI
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternI - interface definition for all pattern objects (currently
includes matrices and words (consensus and regular expressions))

=head1 DESCRIPTION

TFBS::PatternI is a draft class that should contain a general interface for
matrix and other (future) pattern objects. It is not defined and not used
yet, as I need to ponder over certain unresolved issues in general pattern
definition. User feedback is more than welcome.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
=cut

# The code begins HERE:

package TFBS::PatternI;
use vars '@ISA';

use Bio::Root::Root;
use strict;

@ISA = qw(Bio::Root::Root);

#sub new  {
#}

=head2 ID

 Title   : ID
 Usage   : my $ID = $icm->ID()
           $pfm->ID('M00119');
 Function: Get/set on the ID of the pattern (unique in a DB or a set)
 Returns : pattern ID (a string)
 Args    : none for get, string for set

=cut

sub ID {
    my ($self, $ID) = @_;
    $self->{'ID'} = $ID if $ID;
    return $self->{'ID'};
}

=head2 name

 Title   : name
 Usage   : my $name = $pwm->name()
           $pfm->name('PPARgamma');
 Function: Get/set on the name of the pattern
 Returns : pattern name (a string)
 Args    : none for get, string for set

=cut

sub name {
    my ($self, $name) = @_;
    $self->{'name'} = $name if $name;
    return $self->{'name'};
}

=head2 class

 Title   : class
 Usage   : my $class = $pwm->class()
           $pfm->class('forkhead');
 Function: Get/set on the structural class of the pattern
 Returns : class name (a string)
 Args    : none for get, string for set

=cut

sub class {
    my ($self, $class) = @_;
    $self->{'class'} = $class if $class;
    return $self->{'class'};
}

=head2 tag

 Title   : tag
 Usage   : my $acc = $pwm->tag('acc')
           $pfm->tag(source => "Gibbs");
 Function: Get/set on the value of a named tag of the pattern
 Returns : tag value (a scalar/reference)
 Args    : tag name (string) for get,
           tag name (string) and value (any scalar/reference) for set

=cut

sub tag {
    my $self = shift;
    my $tag = shift || return;
    if (scalar @_) {
        $self->{'tags'}->{$tag} =shift;
    }
    return $self->{'tags'}->{$tag};
}

=head2 all_tags

 Title   : all_tags
 Usage   : my %tag = $pfm->all_tags();
 Function: get a hash of all tags for a matrix
 Returns : a hash of all tag values keyed by tag name
 Args    : none

=cut

sub all_tags {
    # fall back to an empty hash, so that the call does not die
    # under 'use strict' when no tag has been set yet
    return %{ $_[0]->{'tags'} || {} };
}

=head2 delete_tag

 Title   : delete_tag
 Usage   : $pfm->delete_tag('score');
 Function: delete a tag and its value from the pattern
 Returns : nothing
 Args    : a string (tag name)

=cut

sub delete_tag {
    my ($self, $tag) = @_;
    delete $self->{'tags'}->{$tag};
}

1;
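
=pod

The accessors documented above can be tried out with a short sketch.
It assumes that this module and BioPerl (Bio::Root::Root) are installed.
The bare object blessed directly into TFBS::PatternI is for illustration
only; real code would work with an object of a concrete pattern class.

    use strict;
    use warnings;
    use TFBS::PatternI;

    # illustration only: a bare object of the interface class
    my $pattern = bless {}, 'TFBS::PatternI';

    $pattern->ID('M00119');
    $pattern->name('PPARgamma');
    $pattern->tag(source => 'Gibbs');   # two arguments: set a tag
    $pattern->tag(score  => 12.5);

    print $pattern->ID(), " ", $pattern->name(), "\n";  # M00119 PPARgamma
    print $pattern->tag('source'), "\n";                # Gibbs

    $pattern->delete_tag('score');
    my %tags = $pattern->all_tags();
    print join(",", sort keys %tags), "\n";             # source

Note that ID(), name() and class() only store true values, so they cannot
be used to set an attribute to "0" or to the empty string.

=cut
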
TFBS-0.7.1/TFBS/README.md000066400000000000000000000000341305752266700143040ustar00rootroot00000000000000TFBS ==== TFBS perl module TFBS-0.7.1/TFBS/Run/000077500000000000000000000000001305752266700135745ustar00rootroot00000000000000TFBS-0.7.1/TFBS/Run/.svn/000077500000000000000000000000001305752266700144605ustar00rootroot00000000000000TFBS-0.7.1/TFBS/Run/.svn/all-wcprops000077500000000000000000000003341305752266700166510ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 37 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Run END ConservationProfileGenerator.pm K 25 svn:wc:ra_dav:version-url V 69 /svn/lenhard/!svn/ver/8/TFBS/TFBS/Run/ConservationProfileGenerator.pm END TFBS-0.7.1/TFBS/Run/.svn/entries000077500000000000000000000005511305752266700160600ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/Run http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df ConservationProfileGenerator.pm file 2009-08-07T13:10:48.000000Z 98de460766383395e70cefa0b313bdd0 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/Run/.svn/format000077500000000000000000000000021305752266700156660ustar00rootroot000000000000008 TFBS-0.7.1/TFBS/Run/.svn/text-base/000077500000000000000000000000001305752266700163545ustar00rootroot00000000000000TFBS-0.7.1/TFBS/Run/.svn/text-base/ConservationProfileGenerator.pm.svn-base000077500000000000000000000125121305752266700262750ustar00rootroot00000000000000package TFBS::Run::ConservationProfileGenerator; use strict; use Bio::Root::Root; use TFBS::ConservationProfile; use Bio::AlignIO; use constant DEFAULT_WINDOW => 50; use constant DEFAULT_CUTOFF => 0.7; use vars qw'@ISA'; @ISA = qw'Bio::Root::Root'; sub new { my ( $caller, %args ) = @_; my $self = bless { alignment => undef, ref_sequence => undef, method => undef, window => DEFAULT_WINDOW, cutoff => DEFAULT_CUTOFF, %args }, ref $caller || $caller; if ( !defined( $self->alignment ) or 
!$self->alignment->isa("Bio::SimpleAlign") ) { $self->throw( "alignment: argument missing or wrong object type: " . ref( $self->alignment ) ); } return $self; } sub run { my ( $self, %args ) = @_; my $method = ( $args{method} or $self->method or "simple" ); my %method_subref = ( simple => \&_run_simple, malin => \&_run_Malins, align_cons => \&_run_align_cons ); if ( !defined( $method_subref{$method} ) ) { $self->throw("method $method not supported"); } $method_subref{$method}->( $self, %args ); } sub alignment { $_[0]->{'alignment'}; } sub ref_sequence { $_[0]->{'ref_sequence'}; } sub method { $_[0]->{'method'}; } sub window { $_[0]->{'window'}; } sub cutoff { $_[0]->{'cutoff'}; } sub _run_simple { my ( $self, %args ) = @_; my ( $window_size, $cutoff, $ref_seq_nr, $other_seq_nr ) = $self->_rearrange( [qw(WINDOW CUTOFF REF_SEQ_NR OTHER_SEQ_NR)], %args ); $window_size = $self->window unless $window_size; $cutoff = $self->cutoff unless $cutoff; $ref_seq_nr = 1 if !$ref_seq_nr; $other_seq_nr = ( $other_seq_nr or 3 - $ref_seq_nr ); my @seq1 = split "", $self->alignment->get_seq_by_pos($ref_seq_nr)->seq; my @seq2 = split "", $self->alignment->get_seq_by_pos($other_seq_nr)->seq; my @CONSERVATION; my @match; while ( $seq1[0] eq "-" or $seq1[0] eq "." ) { shift @seq1; shift @seq2; } for my $i ( 0 .. $#seq1 ) { push( @match, ( uc( $seq1[$i] ) eq uc( $seq2[$i] ) ? 1 : 0 ) ) unless ( $seq1[$i] eq "-" or $seq1[$i] eq "." ); } my @graph = ( $match[0] ); for my $i ( 1 .. ( $#match + $window_size / 2 ) ) { $graph[$i] = $graph[ $i - 1 ] + ( $i > $#match ? 0 : $match[$i] ) - ( $i < $window_size ? 0 : $match[ $i - $window_size ] ); } # at this point, the graph values are shifted $window_size/2 to the right # i.e. the score at a certain position is the score of the window # UPSTREAM of it: To fix it, we shoud discard the first $window_size/2 scores: #$self->conservation1 ([]); foreach my $match_point ( @graph[ int( $window_size / 2 ) .. 
$#graph ] ) { push @CONSERVATION, $match_point / $window_size; } return TFBS::ConservationProfile->new( conservation => \@CONSERVATION, parameters => { window => $window_size, cutoff => $cutoff, ref_seq_nr => $ref_seq_nr, other_seq_nr => $other_seq_nr, method => "simple" }, ref_sequence => $self->ref_sequence, alignment => $self->alignment ); } sub _run_Malins { shift->throw( "Not implemeted, sorry. Pick another method for the time being"); } sub _run_align_cons { my ( $self, %args ) = @_; my ( $window_size, $increment, $cutoff, $stringency, $format, $prog ) = $self->_rearrange( [qw(WINDOW INCREMENT CUTOFF STRINGENCY FORMAT PROGRAM)], %args ); my %params = ( -w => $window_size, -n => $increment, -t => $cutoff, -s => $stringency, -r => "p", -f => ( $format or "c" ) # center by default ); $prog = "align_cons" unless defined $prog; my @cl_args; while ( my ( $param, $value ) = each %params ) { if ( defined $value ) { push @cl_args, $param, $value; } } my $alnstring = $self->_alignment_to_string("fasta"); $alnstring =~ s/[\"\$]/\\$1/gs; # escape things that might confuse echo my $command = join " ", $prog, @cl_args; my @output_lines = `echo "$alnstring" | $command`; # add error checking here!!! 
my @CONSERVATION; foreach my $line (@output_lines) { chomp $line; $line =~ s/^\D+//; my ( $pos, $value ) = split /\s+/, $line; push @CONSERVATION, $value; } return TFBS::ConservationProfile->new( conservation => \@CONSERVATION, parameters => { window => $window_size, cutoff => $cutoff, increment => $increment, stringency => $stringency, method => "align_cons" }, alignment => $self->alignment, ref_sequence => $self->ref_sequence ); } sub _alignment_to_string { my ( $self, $format ) = ( @_, "fasta" ); my $alnstring; my $fh = IO::String->new($alnstring); my $outstream = Bio::AlignIO->new( -fh => $fh, -format => $format ); $outstream->write_aln( $self->alignment ); $outstream->close; return $alnstring; } #sub _UNIT_TESTS { # require Test; # require CONSNP::Test::TestObjects; # my $to = CONSNP::Test::TestObjects->new; # # plan(tests => 5); # # exit(0); # # #} 1; TFBS-0.7.1/TFBS/Run/ConservationProfileGenerator.pm000077500000000000000000000125141305752266700220020ustar00rootroot00000000000000package TFBS::Run::ConservationProfileGenerator; use strict; use Bio::Root::Root; use TFBS::ConservationProfile; use Bio::AlignIO; use constant DEFAULT_WINDOW => 50; use constant DEFAULT_CUTOFF => 0.7; use vars qw'@ISA'; @ISA = qw'Bio::Root::Root'; sub new { my ( $caller, %args ) = @_; my $self = bless { alignment => undef, ref_sequence => undef, method => undef, window => DEFAULT_WINDOW, cutoff => DEFAULT_CUTOFF, %args }, ref $caller || $caller; if ( !defined( $self->alignment ) or !$self->alignment->isa("Bio::SimpleAlign") ) { $self->throw( "alignment: argument missing or wrong object type: " . 
ref( $self->alignment ) ); } return $self; } sub run { my ( $self, %args ) = @_; my $method = ( $args{method} or $self->method or "simple" ); my %method_subref = ( simple => \&_run_simple, malin => \&_run_Malins, align_cons => \&_run_align_cons ); if ( !defined( $method_subref{$method} ) ) { $self->throw("method $method not supported"); } $method_subref{$method}->( $self, %args ); } sub alignment { $_[0]->{'alignment'}; } sub ref_sequence { $_[0]->{'ref_sequence'}; } sub method { $_[0]->{'method'}; } sub window { $_[0]->{'window'}; } sub cutoff { $_[0]->{'cutoff'}; } sub _run_simple { my ( $self, %args ) = @_; my ( $window_size, $cutoff, $ref_seq_nr, $other_seq_nr ) = $self->_rearrange( [qw(WINDOW CUTOFF REF_SEQ_NR OTHER_SEQ_NR)], %args ); $window_size = $self->window unless $window_size; $cutoff = $self->cutoff unless $cutoff; $ref_seq_nr = 1 if !$ref_seq_nr; $other_seq_nr = ( $other_seq_nr or 3 - $ref_seq_nr ); my @seq1 = split "", $self->alignment->get_seq_by_pos($ref_seq_nr)->seq; my @seq2 = split "", $self->alignment->get_seq_by_pos($other_seq_nr)->seq; my @CONSERVATION; my @match; while ( $seq1[0] eq "-" or $seq1[0] eq "." ) { shift @seq1; shift @seq2; } for my $i ( 0 .. $#seq1 ) { push( @match, ( uc( $seq1[$i] ) eq uc( $seq2[$i] ) ? 1 : 0 ) ) unless ( $seq1[$i] eq "-" or $seq1[$i] eq "." ); } my @graph = ( $match[0] ); for my $i ( 1 .. ( $#match + $window_size / 2 ) ) { $graph[$i] = $graph[ $i - 1 ] + ( $i > $#match ? 0 : $match[$i] ) - ( $i < $window_size ? 0 : $match[ $i - $window_size ] ); } # at this point, the graph values are shifted $window_size/2 to the right # i.e. the score at a certain position is the score of the window # UPSTREAM of it: To fix it, we should discard the first $window_size/2 scores: #$self->conservation1 ([]); foreach my $match_point ( @graph[ int( $window_size / 2 ) .. 
$#graph ] ) {
        push @CONSERVATION, $match_point / $window_size;
    }

    return TFBS::ConservationProfile->new(
        conservation => \@CONSERVATION,
        parameters   => {
            window       => $window_size,
            cutoff       => $cutoff,
            ref_seq_nr   => $ref_seq_nr,
            other_seq_nr => $other_seq_nr,
            method       => "simple"
        },
        ref_sequence => $self->ref_sequence,
        alignment    => $self->alignment
    );
}

sub _run_Malins {
    shift->throw(
        "Not implemented, sorry. Pick another method for the time being");
}

sub _run_align_cons {
    my ( $self, %args ) = @_;
    my ( $window_size, $increment, $cutoff, $stringency, $format, $prog ) =
      $self->_rearrange(
        [qw(WINDOW INCREMENT CUTOFF STRINGENCY FORMAT PROGRAM)], %args );
    my %params = (
        -w => $window_size,
        -n => $increment,
        -t => $cutoff,
        -s => $stringency,
        -r => "p",
        -f => ( $format or "c" )    # center by default
    );
    $prog = "align_cons" unless defined $prog;

    # only the parameters that were actually set go on the command line
    my @cl_args;
    while ( my ( $param, $value ) = each %params ) {
        if ( defined $value ) {
            push @cl_args, $param, $value;
        }
    }
    my $alnstring = $self->_alignment_to_string("fasta");

    # escape characters that are special inside a double-quoted shell
    # string; the parentheses capture the character into $1
    $alnstring =~ s/([\\"`\$])/\\$1/g;
    my $command = join " ", $prog, @cl_args;
    my @output_lines = `echo "$alnstring" | $command`;

    # add error checking here!!!
    my @CONSERVATION;
    foreach my $line (@output_lines) {
        chomp $line;
        $line =~ s/^\D+//;
        my ( $pos, $value ) = split /\s+/, $line;
        push @CONSERVATION, $value;
    }
    return TFBS::ConservationProfile->new(
        conservation => \@CONSERVATION,
        parameters   => {
            window     => $window_size,
            cutoff     => $cutoff,
            increment  => $increment,
            stringency => $stringency,
            method     => "align_cons"
        },
        alignment    => $self->alignment,
        ref_sequence => $self->ref_sequence
    );
}

sub _alignment_to_string {
    my ( $self, $format ) = ( @_, "fasta" );
    require IO::String;    # not loaded at the top of this module
    my $alnstring;
    my $fh        = IO::String->new($alnstring);
    my $outstream = Bio::AlignIO->new( -fh => $fh, -format => $format );
    $outstream->write_aln( $self->alignment );
    $outstream->close;
    return $alnstring;
}

#sub _UNIT_TESTS {
#    require Test;
#    require CONSNP::Test::TestObjects;
#    my $to = CONSNP::Test::TestObjects->new;
#
#    plan(tests => 5);
#
#    exit(0);
#
#
#}

1;
TFBS-0.7.1/TFBS/Site.pm000077500000000000000000000170301305752266700142760ustar00rootroot00000000000000# TFBS module for TFBS::Site
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Site - a nucleotide sequence feature object representing a (possibly
putative) transcription factor binding site.

=head1 SYNOPSIS

    # manual creation of site object;
    # for details, see documentation of Bio::SeqFeature::Generic;

    my $site = TFBS::Site->new
                  (-start => $start_pos,     # integer
                   -end   => $end_pos,       # integer
                   -score => $score,         # float
                   -source => "TFBS",        # string
                   -primary => "TF binding site",  # primary tag
                   -strand => $strand,       # -1, 0 or 1
                   -seqobj => $seqobj,       # a Bio::Seq object whose sequence
                                             #         contains the site
                   -pattern => $pattern_obj  # usu. TFBS::Matrix::PWM obj.
                  );

    # Searching sequence with a pattern (PWM) and retrieving individual sites:
    #
    # The following objects should be defined for this example:
    #     $pwm    -   a TFBS::Matrix::PWM object
    #     $seqobj -   a Bio::Seq object
    # Consult the documentation for the above modules if you do not know
    # how to create them.
    # Scanning sequence with $pwm returns a TFBS::SiteSet object:

    my $site_set = $pwm->search_seq(-seqobj => $seqobj,
                                    -threshold => "80%");

    # To retrieve individual sites from $site_set, create an iterator obj:

    my $site_iterator = $site_set->Iterator(-sort_by => "score");

    while (my $site = $site_iterator->next())  {
        # do something with $site
    }

=head1 DESCRIPTION

TFBS::Site object holds data for a (possibly predicted) transcription factor
binding site on a nucleotide sequence (start, end, strand, score, tags, as
well as references to the corresponding sequence and pattern objects).
TFBS::Site is a subclass of Bio::SeqFeature::Generic and has access to all of
its methods. Additionally, it contains the pattern() method, an accessor for
the pattern object associated with the site object.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

TFBS::Site is a class that extends Bio::SeqFeature::Generic.
Please consult Bio::SeqFeature::Generic documentation for other
available methods.

=cut

# The code begins HERE:

package TFBS::Site;
use vars qw(@ISA);
use strict;

use Bio::SeqFeature::Generic;
@ISA = qw(Bio::SeqFeature::Generic);

=head2 new

 Title   : new
 Usage   : my $site = TFBS::Site->new(%args)
 Function: constructor for the TFBS::Site object
 Returns : TFBS::Site object
 Args    : -start,       # integer
           -end,         # integer
           -strand,      # -1, 0 or 1
           -score,       # float
           -source,      # string (method used to detect it)
           -primary,     # string (primary tag)
           -seqobj,      # a Bio::Seq object
           -pattern      # a pattern object, usu.
TFBS::Matrix::PWM =cut sub new { my $class = shift; my %args = (-seq_id => undef, -siteseq => undef, -seqobj => undef, -strand => "0", -source => "TFBS", -primary => "TF binding site", -pattern => undef, -score => undef, -start => undef, -end => undef, -frame => 0, @_); my $obj = Bio::SeqFeature::Generic->new(%args); my $self = bless $obj, ref($class) || $class; if ($args{-seqobj}) { $self->attach_seq($args{-seqobj}) ; $self->add_tag_value('sequence', $self->seq->seq); } # this is only for GFF printing really, and will be moved there soon if (defined $args{'-pattern'}) { $self->pattern($args{'-pattern'}); $self->add_tag_value('TF' => $self->pattern->name()); $self->add_tag_value('class' => $self->pattern->class) if $self->pattern->class; } return $self; } =head2 pattern Title : pattern Usage : my $pattern = $site->pattern(); # gets the pattern $site->pattern($pwm); # sets the pattern to $pwm Function: gets/sets the pattern object associated with the site Returns : pattern object, here TFBS::Matrix::PWM object Args : pattern object (optional, for setting the pattern only) =cut sub pattern { my ($self, $pattern) = @_; if (defined $pattern) { $self->{'pattern'} = $pattern; } return $self->{'pattern'}; } =head2 rel_score Title : rel_score Usage : my $percent_score = $site->rel_score() * 100; # gets the pattern Function: gets relative score (between 0.0 to 1.0) with respect of the score range of the associated pattern (matrix) Returns : floating point number between 0 and 1, or undef if pattern not defined Args : none =cut sub rel_score { my ($self) = @_; return undef unless $self->pattern(); return ($self->score - $self->pattern->min_score)/ ($self->pattern->max_score - $self->pattern->min_score); } =head2 GFF Title : GFF Usage : print $site->GFF(); : print $site->GFF($gff_formatter) Function: returns a "standard" GFF string - the "generic" gff_string method is left untouched for possible customizations Returns : a string (NOT newline terminated! 
) Args : a $gff_formatter function reference (optional) =cut sub GFF { # due to popular demand, GFF is again a legal method, this time # not requiring GFF modules return $_[0]->gff_string($_[1]); } =head2 location =head2 start =head2 end =head2 length =head2 score =head2 frame =head2 sub_SeqFeature =head2 add_sub_SeqFeature =head2 flush_sub_SeqFeature =head2 primary_tag =head2 source_tag =head2 has_tag =head2 add_tag_value =head2 each_tag_value =head2 all_tags =head2 remove_tag =head2 attach_seq =head2 seq =head2 entire_seq =head2 seq_id =head2 annotation =head2 gff_format =head2 gff_string The above methods are inherited from Bio::SeqFeature::Generic. Please see L for details on their usage. =cut ################################################################## # BACKWARD COMPATIBILITY METHODS sub Matrix { my ($self, %args) = @_; $self->pattern(%args); } sub seqobj { } sub siteseq { $_[0]->seq->seq(); } sub site_length { my ($self) = @_; $self->warn("site_length method is present for backward compatibility only. In new code please use the length() method"); return $self->length(); } sub old_GFF { eval "require GFF::GeneFeature;"; if ($@) { print STDERR "Failed to load GFF modules, stopped"; return; } my ($self, %tags) =@_; $self->warn("GFF method is for backward compatibility only, and its use in new code is not recommended. 
Please use Bio::SeqFeature::Generic gff methods if possible."); my $GFFgf = GFF::GeneFeature->new(2); $GFFgf->seqname ( $self->seqname() or "Unknown" ); $GFFgf->source ("TFBS"); $GFFgf->feature ("TFBS"); $GFFgf->start ($self->start()); $GFFgf->end ($self->end()); $GFFgf->score ($self->score()); $GFFgf->strand (("-",".","+")[$self->strand()+1]); # $GFFgf->strand ($self->strand()); %tags = (TF => $self->pattern->{name}, class => $self->pattern->{class}, sequence => $self->seq->seq(), %tags); while (my ($tag, $value) = each %tags) { my @values; if (ref($value) eq "ARRAY") { @values = @$value; } else { @values = ($value); } $GFFgf->attribute($tag, @values); } return $GFFgf; } 1; TFBS-0.7.1/TFBS/SitePair.pm000077500000000000000000000031731305752266700151150ustar00rootroot00000000000000package TFBS::SitePair; use vars qw(@ISA); use strict; use Bio::SeqFeature::FeaturePair; @ISA = qw(Bio::SeqFeature::FeaturePair); # 'new' used to be inherited, but we need it now sub new { my ($caller, $site1, $site2) = @_; #if ($Bio::Root::Root::VERSION < 1.4) { #return $caller->SUPER::new($site1, $site2); #} #else { return $caller->SUPER::new(-feature1 => $site1, -feature2 => $site2); #} # ^ Version check commented out because from BioPerl 1.5.2 # version nrs are represented differently. 
    #   // PE 2007-7-11
}

=head2 pattern

 Title   : pattern
 Usage   : my $pattern = $sitepair->pattern();  # gets the pattern
 Function: gets the pattern object associated with the site pair
 Returns : pattern object, here TFBS::Matrix::PWM object
 Args    : none (get-only method)

=cut

sub pattern {
    $_[0]->feature1->pattern();
}

=head2 GFF

 Title   : GFF
 Usage   : print $sitepair->GFF();
         : print $sitepair->GFF($gff_formatter)
 Function: returns a "standard" multiline GFF string
 Returns : a string (two lines, one per site; NOT newline terminated)
 Args    : a $gff_formatter function reference (optional)

=cut

sub GFF {
    return join "\n", $_[0]->site1->GFF, $_[0]->site2->GFF;
}

=head2 site1

=head2 site2

 Title   : site1
           site2
 Usage   : my $site1 = $sitepair->site1();
 Function: Returns individual TFBS::Site objects, from the site pair
 Returns : a TFBS::Site
 Args    : none

=cut

sub site1 {
    $_[0]->feature1();
}

sub site2 {
    $_[0]->feature2();
}

1;
TFBS-0.7.1/TFBS/SitePairSet.pm000077500000000000000000000142441305752266700155720ustar00rootroot00000000000000# TFBS module for TFBS::SitePairSet
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::SitePairSet - a set of TFBS::SitePair objects

=head1 SYNOPSIS

    my $site_pair_set = TFBS::SitePairSet->new(@list_of_site_pair_objects);

    # add a TFBS::SitePair object to set:

    $site_pair_set->add_site_pair($site_pair_obj);

    # append another TFBS::SitePairSet contents:

    $site_pair_set->add_site_pair_set($another_site_pair_set);

    # create an iterator:

    my $it = $site_pair_set->Iterator(-sort_by => 'start');

=head1 DESCRIPTION

TFBS::SitePairSet is an aggregate class that contains a collection
of TFBS::SitePair objects. It can be created anew and filled with
TFBS::SitePair objects. It is also returned by the search_aln() method call
of TFBS::PatternI subclasses (e.g. TFBS::Matrix::PWM).

=head1 FEEDBACK

Please send bug reports and other comments to the author.
=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::SitePairSet;
use vars qw(@ISA $AUTOLOAD);
use strict;

use Bio::Root::Root;
use TFBS::SitePair;
use TFBS::_Iterator::_SiteSetIterator;
@ISA = qw(Bio::Root::Root);

sub new {
    my ($class, @data) = @_;
    my $self = bless {}, ref($class) || $class;
    $self->{_site_array_ref} = [];
    # when called on an existing object without arguments,
    # copy the contents of that object
    @data = @{$class->{_site_array_ref}} if !@data && ref($class);
    $self->add_site_pair(@data);
    return $self;
}

=head2 size

 Title   : size
 Usage   : my $size = $sitepairset->size()
 Function: returns the number of TFBS::SitePair objects contained in the set
 Returns : a scalar (integer)
 Args    : none

=cut

sub size {
    scalar @{ $_[0]->{_site_array_ref} };
}

=head2 add_site_pair

 Title   : add_site_pair
 Usage   : $sitepairset->add_site_pair($site_pair_object)
           $sitepairset->add_site_pair(@list_of_site_pair_objects)
 Function: adds TFBS::SitePair objects to an existing TFBS::SitePairSet object
 Returns : 1 (usually ignored)
 Args    : A list of TFBS::SitePair objects to add

=cut

sub add_site_pair {
    my ($self, @site_list) = @_;
    foreach my $site (@site_list) {
        $site->isa("TFBS::SitePair")
            or $self->throw("Attempted to add an element ".
                            "of a wrong type.");
        push @{$self->{_site_array_ref}}, $site;
    }
    return 1;
}

=head2 add_site_pair_set

 Title   : add_site_pair_set
 Usage   : $sitepairset->add_site_pair_set($site_pair_set_object)
           $sitepairset->add_site_pair_set(@list_of_site_pair_set_objects)
 Function: adds the contents of other TFBS::SitePairSet objects
           to an existing TFBS::SitePairSet object
 Returns : $sitepairset object (usually ignored)
 Args    : A list of TFBS::SitePairSet objects whose contents should be
           added to $sitepairset

=cut

sub add_site_pair_set {
    my ($self, @sitesets) = @_;
    foreach my $siteset (@sitesets) {
        $siteset->isa("TFBS::SitePairSet")
            or $self->throw("Attempted to add an element ".
"that is not a TFBS::SiteSet object."); push @{$self->{_site_array_ref}}, @{ $siteset->{_site_array_ref} }; } return $self; } =head2 Iterator Title : Iterator Usage : my $it = $sitepairset->Iterator(-sort_by=>'start'); while (my $site_pair = $it->next()) { #... Function: Returns an iterator object, used to iterate thorugh elements (TFBS::SitePair objects) Returns : a TFBS::_Iterator object Args : -sort_by # optional - currently it accepts # (default sort order in parenthetse) # 'name' (pattern name, alphabetically) # 'ID' (pattern/matrix ID, alphabetically) # 'start' (site start in sequence, # numerically,increasing order) # 'end' (site end in sequence, # numerically, increasing order) # 'score' (1st site in pair, # numerically, decreasing order) -reverse # optional - reverses the default sorting order if true =cut sub Iterator { my ($self, %args) = @_; return TFBS::_Iterator::_SiteSetIterator->new($self->{_site_array_ref}, $args{'-sort_by'}, $args{'-reverse'} ); } =head2 set1 =head2 set2 Title : set1 set2 Usage : my $siteset1 = $sitepairset->set1(); : my $siteset2 = $sitepairset->set2() Function: Returns individual TFBS::SiteSet objects, from the site set pair Returns : A TFBS::SiteSet object Args : none =cut sub set1 { $_[0]->_get_set(1); } sub set2 { $_[0]->_get_set(2); } =head2 GFF Title : GFF Usage : print $site->GFF(); : print $site->GFF($gff_formatter) Function: returns a "standard" multiline GFF string Returns : a string (multiline, newline terminated) Args : a $gff_formatter function reference (optional) =cut sub GFF { my ($self, %args) = @_; my $iterator = $self->Iterator(-sort_by=>'start'); my $gff_string = ""; while (my $sitepair = $iterator->next()) { $gff_string .= $sitepair->GFF(%args)."\n"; } return $gff_string; } ############################################################## # PRIVATE AND AUTOMATIC METHODS ############################################################## sub _get_set { my ($self, $set_nr) = @_; my $feature = "feature$set_nr"; my $it = 
        $self->Iterator();
    # Load TFBS::SiteSet on demand; it is not imported at the top of this file.
    require TFBS::SiteSet;
    my $siteset = TFBS::SiteSet->new();
    while (my $site_pair = $it->next()) {
        # Call feature1() or feature2() by name on each pair.
        $siteset->add_site($site_pair->$feature());
    }
    return $siteset;
}

sub AUTOLOAD {
    my ($self) = @_;
    my %discontinued = (sort          => 1,
                        sort_by_name  => 1,
                        sort_reversed => 1,
                        reverse       => 1,
                        next_site     => 1,
                        reset         => 1
                       );
    $AUTOLOAD =~ /.+::(\w+)/;
    if ($discontinued{$1}) {
        $self->_no_more($1);
    }
    else {
        $self->throw("$1: no such method");
    }
}

sub _no_more {
    $_[0]->throw("Method '$_[1]' is no longer available in ".
                 ref($_[0]).". Use the 'Iterator' method instead.");
}

1;
TFBS-0.7.1/TFBS/SiteSet.pm000077500000000000000000000141011305752266700147460ustar00rootroot00000000000000
# TFBS module for TFBS::SiteSet
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::SiteSet - a set of TFBS::Site objects

=head1 SYNOPSIS

    my $site_set = TFBS::SiteSet->new(@list_of_site_objects);

    # add a TFBS::Site object to set:

    $site_set->add_site($site_obj);

    # append another TFBS::SiteSet contents:

    $site_set->add_siteset($other_site_set);

    # create an iterator:

    my $it = $site_set->Iterator(-sort_by => 'start');

=head1 DESCRIPTION

TFBS::SiteSet is an aggregate class that contains a collection
of TFBS::Site objects. It can be created anew and filled with
TFBS::Site objects. It is also returned by the search_seq() method
of some TFBS::PatternI subclasses (e.g. TFBS::Matrix::PWM).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
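
The iterator interface shown in the SYNOPSIS can be illustrated with a
minimal, self-contained sketch. It does not use TFBS: the package name
ToyIterator and the plain hashes standing in for TFBS::Site objects are
assumptions made for this example only, and the real
TFBS::_Iterator::_SiteSetIterator may well be implemented differently.

```perl
use strict;
use warnings;

# ToyIterator is a hypothetical stand-in for the object returned by
# $siteset->Iterator(); it is NOT part of TFBS.
package ToyIterator;

# Default comparison per sort key: positions ascending, score descending.
my %sorter = (
    start => sub { $_[0]{start} <=> $_[1]{start} },
    end   => sub { $_[0]{end}   <=> $_[1]{end}   },
    score => sub { $_[1]{score} <=> $_[0]{score} },
    name  => sub { $_[0]{name}  cmp $_[1]{name}  },
);

sub new {
    my ($class, $items, $sort_by, $reverse) = @_;
    my @list = @$items;    # copy, so the underlying set is not reordered
    if (defined $sort_by) {
        my $cmp = $sorter{$sort_by} or die "Cannot sort by '$sort_by'\n";
        @list = sort { $cmp->($a, $b) } @list;
    }
    @list = reverse @list if $reverse;
    return bless { list => \@list, pos => 0 }, $class;
}

# Returns the next element, or undef once the list is exhausted.
sub next {
    my ($self) = @_;
    return undef if $self->{pos} >= @{ $self->{list} };
    return $self->{list}[ $self->{pos}++ ];
}

package main;

my @sites = (
    { name => 'B', start => 30, end => 40, score => 7.5 },
    { name => 'A', start => 10, end => 20, score => 9.1 },
    { name => 'C', start => 20, end => 30, score => 3.2 },
);

my $it = ToyIterator->new(\@sites, 'start');
my @order;
while (my $site = $it->next) {
    push @order, $site->{name};
}
print join(',', @order), "\n";    # prints "A,C,B"
```

Sorting a copy of the list keeps the set itself untouched, so several
iterators with different sort orders can be used at the same time.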
=cut


# The code begins HERE:

package TFBS::SiteSet;
use vars qw(@ISA $AUTOLOAD);

use TFBS::Site;
use TFBS::_Iterator::_SiteSetIterator;
use strict;

@ISA = qw(Bio::Root::Root);

sub new {
    my ($class, @data) = @_;
    my $self = bless {}, ref($class) || $class;
    $self->{_site_array_ref} = [];
    # When called on an existing object without arguments, copy its contents.
    @data = @{$class->{_site_array_ref}} if !@data && ref($class);
    $self->add_site(@data);
    return $self;
}

=head2 add_site

 Title   : add_site
 Usage   : $siteset->add_site($site_object)
           $siteset->add_site(@list_of_site_objects)
 Function: adds TFBS::Site objects to an existing TFBS::SiteSet object
 Returns : 1 (true) on success; throws an exception if any element
           is of a wrong type
 Args    : A list of TFBS::Site objects to add

=cut

sub add_site {
    my ($self, @site_list) = @_;
    foreach my $site (@site_list) {
        ref($site) =~ /TFBS::Site*/
            or $self->throw("Attempted to add an element ".
                            "of a wrong type.");
        push @{$self->{_site_array_ref}}, $site;
    }
    return 1;
}

=head2 add_site_set

 Title   : add_site_set
 Usage   : $siteset->add_site_set($site_set_object)
           $siteset->add_site_set(@list_of_site_set_objects)
 Function: adds the contents of other TFBS::SiteSet objects
           to an existing TFBS::SiteSet object
 Returns : $siteset object (usually ignored)
 Args    : A list of TFBS::SiteSet objects whose contents should be
           added to $siteset
 Note    : the method is implemented under the name add_siteset;
           add_site_set is an alias for it

=cut

sub add_siteset {
    my ($self, @sitesets) = @_;
    foreach my $siteset (@sitesets) {
        ref($siteset) =~ /TFBS::Site.*Set/
            or $self->throw("Attempted to add an element ".
                            "that is not a TFBS::SiteSet object.");
        push @{$self->{_site_array_ref}}, @{ $siteset->{_site_array_ref} };
    }
    return $self;
}

# Alias matching the documented method name.
sub add_site_set {
    my $self = shift;
    return $self->add_siteset(@_);
}

=head2 size

 Title   : size
 Usage   : my $size = $siteset->size()
 Function: returns the number of TFBS::Site objects contained in the set
 Returns : a scalar (integer)
 Args    : none

=cut

sub size {
    scalar @{ $_[0]->{_site_array_ref} };
}

=head2 Iterator

 Title   : Iterator
 Usage   : my $siteset_iterator =
                   $siteset->Iterator(-sort_by =>'start');
           while (my $site = $siteset_iterator->next) {
               # do whatever you want with individual site objects
           }
 Function: Returns an iterator object that can be used to go through
           all members of the set (TFBS::Site objects)
 Returns : an iterator object (currently undocumented in TFBS -
                               but understands the 'next' method)
 Args    : -sort_by # optional - currently it accepts
                    #   (default sort order in parentheses)
                    #     'name' (pattern name, alphabetically)
                    #     'ID' (pattern/matrix ID, alphabetically)
                    #     'start' (site start in sequence,
                    #              numerically, increasing order)
                    #     'end' (site end in sequence,
                    #            numerically, increasing order)
                    #     'score' (numerically, decreasing order)
           -reverse # optional - reverses the default sorting order if true

=cut

sub Iterator {
    my ($self, %args) = @_;
    return TFBS::_Iterator::_SiteSetIterator->new($self->{_site_array_ref},
                                                  $args{'-sort_by'},
                                                  $args{'-reverse'}
                                                 );
}

sub all_sites {
    my ($self, %args) = @_;
    # Returns an empty list (not 0) when the set is empty.
    return @{$self->{_site_array_ref}};
}

=head2 GFF

 Title   : GFF
 Usage   : print $siteset->GFF();
         : print $siteset->GFF($gff_formatter)
 Function: returns a "standard" multiline GFF string
 Returns : a string (multiline, newline terminated)
 Args    : a $gff_formatter function reference (optional)

=cut

sub GFF {
    my ($self, %args) = @_;
    my $site_iterator = $self->Iterator(-sort_by=>'start');
    my $gff_string = "";
    while (my $site = $site_iterator->next()) {
        $gff_string .= $site->GFF(%args)."\n";
    }
    return $gff_string;
}

########################################################
# OBSOLETE METHODS
########################################################

sub old_GFF {
    eval "require GFF::GeneFeatureSet;";
    if ($@) {
        print STDERR "Failed to load GFF modules, stopped";
        return;
    }
    my ($self) = @_;
    my $site_iterator = $self->Iterator(-sort_by=>'start');
    my $GFFset = GFF::GeneFeatureSet->new(2);
    while (my $site = $site_iterator->next()) {
        $GFFset->addGeneFeature($site->GFF());
    }
    return $GFFset;
}

##############################################################
# PRIVATE AND AUTOMATIC METHODS
##############################################################

sub AUTOLOAD {
    my ($self) = @_;
    my %discontinued = (sort          => 1,
                        sort_by_name  => 1,
                        sort_reversed => 1,
                        reverse       => 1,
                        next_site     => 1,
                        reset         => 1
                       );
    $AUTOLOAD =~ /.+::(\w+)/;
    if ($discontinued{$1}) {
        $self->_no_more($1);
    }
    else {
        $self->throw("$1: no such method");
    }
}

sub _no_more {
    $_[0]->throw("Method '$_[1]' is no longer available in ".
                 ref($_[0]).". Use the 'Iterator' method instead.");
}

1;
TFBS-0.7.1/TFBS/TFFM.pm000077500000000000000000000123201305752266700141230ustar00rootroot00000000000000
# TFBS module for TFBS::TFFM
#
# You may distribute this module under the same terms as perl itself
#
# Date: 2015/10/06
#

# POD

=head1 NAME

TFBS::TFFM - class for Transcription Factor Flexible Models (TFFMs)

=head1 DESCRIPTION

TFBS::TFFM is a class to hold basic information about a TFFM. It was mainly
designed to store the information about a TFFM stored in the TFFM table of
the JASPAR DB, newly introduced in the JASPAR 2016 version. It does NOT
(currently) store the actual XML describing the model, but this would be
simple to add.

At the time of this writing the relationship between JASPAR matrices, as
stored in the MATRIX table, and TFFMs was not completely clear, and the
matrix IDs related to a TFFM are stored in the TFFM table. The relationship
could be 1:n, m:1 or m:n in the future, so this may well be changed and a
joining table created to facilitate this.
=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - David Arenillas

David Arenillas: dave@cmmt.ubc.ca

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::TFFM;

use strict;
use Bio::Root::Root;
use vars qw(@ISA);

# Inherit throw() from Bio::Root::Root, as the other TFBS classes do.
@ISA = qw(Bio::Root::Root);

sub new
{
    my $class = shift;
    my %args = @_;

    my $self = bless {}, ref($class) || $class;

    $self->{'ID'}               = $args{-ID}    || "Unknown";
    $self->{'name'}             = $args{-name}  || "Unknown";
    $self->{'matrix_ID'}        = $args{-matrix_ID};
    $self->{'log_p_1st_order'}  = $args{-log_p_1st_order};
    $self->{'log_p_detailed'}   = $args{-log_p_detailed};
    $self->{'experiment_name'}  = $args{-experiment_name};

    # The JASPAR matrix related to this TFFM
    my $matrix = $args{-matrix};
    if ($matrix) {
        # isa() is the UNIVERSAL class-membership test; method names are
        # case-sensitive.
        if ($matrix->isa('TFBS::Matrix')) {
            $self->{'matrix'} = $matrix;
        } else {
            $self->throw(
                "Provided -matrix argument does not refer to a TFBS::Matrix"
                . " object"
            );
        }
    }

    return $self;
}

=head2 ID

 Title   : ID
 Usage   : my $id = $tffm->ID();
 Function: Get/set the ID of this TFFM.
 Returns : The ID of this TFFM.
 Args    : None for get or a new string ID.

=cut

sub ID
{
    my ($self, $id) = @_;

    if ($id) {
        $self->{ID} = $id;
    }

    return $self->{ID};
}

=head2 name

 Title   : name
 Usage   : my $name = $tffm->name();
 Function: Get/set the name of the transcription factor for which this TFFM
           was modelled.
 Returns : Name of the TF modelled by this TFFM.
 Args    : None for get or a new string TF name.

=cut

sub name
{
    my ($self, $name) = @_;

    if ($name) {
        $self->{name} = $name;
    }

    return $self->{name};
}

=head2 experiment_name

 Title   : experiment_name
 Usage   : my $filename = $tffm->experiment_name();
 Function: Get/set the name of the experimental data (generally ChIP-seq
           peak data) on which this TFFM was trained. Often this is the
           base file name of the ChIP-seq peaks file.
 Returns : Name of the experiment/datafile.
 Args    : None for get or a new experiment/datafile name.
=cut

sub experiment_name
{
    my ($self, $exp_name) = @_;

    if ($exp_name) {
        $self->{experiment_name} = $exp_name;
    }

    return $self->{experiment_name};
}

=head2 log_p_1st_order

 Title   : log_p_1st_order
 Usage   : my $log_p_val = $tffm->log_p_1st_order();
 Function: Get/set the log(p) value for the 1st order model of this TFFM.
 Returns : Log(p) value of the 1st-order model.
 Args    : None for get or a new 1st-order log(p) value.

=cut

sub log_p_1st_order
{
    my ($self, $log_p_val) = @_;

    # Test for definedness so that a log(p) value of 0 can be stored.
    if (defined $log_p_val) {
        $self->{log_p_1st_order} = $log_p_val;
    }

    return $self->{log_p_1st_order};
}

=head2 log_p_detailed

 Title   : log_p_detailed
 Usage   : my $log_p_val = $tffm->log_p_detailed();
 Function: Get/set the log(p) value for the detailed model of this TFFM.
 Returns : Log(p) value of the detailed model.
 Args    : None for get or a new detailed log(p) value.

=cut

sub log_p_detailed
{
    my ($self, $log_p_val) = @_;

    # Test for definedness so that a log(p) value of 0 can be stored.
    if (defined $log_p_val) {
        $self->{log_p_detailed} = $log_p_val;
    }

    return $self->{log_p_detailed};
}

=head2 matrix_ID

 Title   : matrix_ID
 Usage   : my $matrix_id = $tffm->matrix_ID();
 Function: Get/set the ID of the matrix associated to this TFFM.
 Returns : ID of the matrix associated to this TFFM.
 Args    : None for get or a JASPAR matrix ID.

=cut

sub matrix_ID
{
    my ($self, $matrix_id) = @_;

    if ($matrix_id) {
        $self->{matrix_ID} = $matrix_id;
    }

    return $self->{matrix_ID};
}

=head2 matrix

 Title   : matrix
 Usage   : my $matrix = $tffm->matrix();
 Function: Get/set the matrix object related to this TFFM
 Returns : A reference to TFBS::Matrix object which was used to train
           the TFFM.
 Args    : None for get or a new TFBS::Matrix object reference.

=cut

sub matrix
{
    my ($self, $matrix) = @_;

    if ($matrix) {
        # isa() is the UNIVERSAL class-membership test; method names are
        # case-sensitive.
        if ($matrix->isa("TFBS::Matrix")) {
            $self->{matrix} = $matrix;
        } else {
            $self->throw(
                "Provided matrix argument does not refer to a TFBS::Matrix"
                . " object"
            );
        }
    }

    return $self->{'matrix'};
}

1;
TFBS-0.7.1/TFBS/Tools/000077500000000000000000000000001305752266700141305ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/Tools/SetOperations.pm000077500000000000000000000134511305752266700172740ustar00rootroot00000000000000
package TFBS::Tools::SetOperations;
use strict;
use Bio::Root::Root;

use vars qw'@ISA';
@ISA = qw'Bio::Root::Root';


sub new {
    my ($caller, @args) = @_;
    my $self = bless {}, ref $caller || $caller;
    my ($index_by, $strict, $output_type, $pairs) =
        $self->_rearrange([qw'INDEX_BY STRICT OUTPUT_TYPE PAIRS'], @args);
    $self->index_by($index_by);
    $self->strict($strict);
    $self->output_type($output_type);
    $self->pairs($pairs);

    return $self;
}

sub union {
    my ($self, @sets) = @_;
    # Index every set; members with the same index key collapse into one.
    my %union_index = map {$self->_index($_)} $self->_sets_to_arrayrefs(@sets);
    $self->_output(\%union_index);
}

sub intersection {
    my ($self, @sets) = @_;
    my @set_arrayrefs = $self->_sets_to_arrayrefs(@sets);
    # this would be faster, but we might want to retain the exact objects
    # that were present in the first set
    #my @set_arrayrefs = sort {@$a <=> @$b} $self->_sets_to_arrayrefs(@sets);
    my %intersection_index = $self->_index(shift @set_arrayrefs);
    foreach my $set_arrayref (@set_arrayrefs) {
        my %curr_set_index = $self->_index($set_arrayref);
        # Drop every key that is missing from the current set.
        foreach my $key (keys %intersection_index) {
            if (!exists $curr_set_index{$key}) {
                delete $intersection_index{$key};
            }
        }
    }
    $self->_output(\%intersection_index);
}

sub difference {
    # two sets only for now
    my ($self, @sets) = @_;
    my ($set1, $set2) = $self->_sets_to_arrayrefs(@sets);
    if (!defined $set2) {
        $self->throw("'difference' needs exactly two sets as arguments");
    }
    my %diff_index1 = $self->_index($set1);
    my %diff_index2 = $self->_index($set2);
    # Remove the members common to both sets from each index.
    foreach my $key (keys %diff_index1) {
        if (exists $diff_index2{$key}) {
            delete $diff_index1{$key};
            delete $diff_index2{$key};
        }
    }
    # In list context return both one-sided differences, otherwise only
    # the members unique to the first set.
    wantarray ?
        ($self->_output(\%diff_index1), $self->_output(\%diff_index2))
        : $self->_output(\%diff_index1);
}

sub index_by {
    my $self = shift;
    # By default, we are dealing with Bio::SeqFeatureI objects
    my @DEFAULTS = qw(primary_tag source_tag start end score strand);
    if (@_) {
        if (!defined $_[0]) {
            $self->{_index_by} = \@DEFAULTS;
        }
        elsif (ref($_[0]) eq "ARRAY") {
            $self->{_index_by} = $_[0];
        }
        else {
            $self->{_index_by} = [@_];
        }
    }
    return @{$self->{_index_by}};
}

sub strict {
    my $self = shift;
    if (@_) {
        if ($self->{_strict} = shift) {
            $self->{_index_fn} = \&_index_strict;
        }
        else {
            $self->{_index_fn} = \&_index_by_annotation;
        }
    }
    return $self->{_strict};
}

sub output_type {
    my $self = shift;
    if (@_) {
        unless ($self->{_output_type} = shift) {
            $self->{_output_type} = "arrayref"
        }
    }
    return $self->{_output_type};
}

sub pairs {
    my $self = shift;
    if (@_) {
        if ($self->{_pairs} = shift and !$self->strict) {
            $self->{_index_fn} = \&_index_by_pair_annotation;
        }
    }
    return $self->{_pairs};
}

sub _index {
    my ($self) = @_;
    $self->{_index_fn}->(@_);
}

sub _index_strict {
    my ($self, $set_arrayref) = @_;
    my %index_hash = (map {$_, $_} @$set_arrayref);
    return %index_hash;
}

sub _index_by_pair_annotation {
    my ($self, $set_arrayref) = @_;
    my %index_hash;
    foreach my $member (@$set_arrayref) {
        my @index_elements = ($self->_get_index_elements($member->feature1),
                              $self->_get_index_elements($member->feature2));
        $index_hash{join("::", @index_elements)} = $member;
    }
    return %index_hash;
}

sub _index_by_annotation {
    my ($self, $set_arrayref) = @_;
    my %index_hash;
    foreach my $member (@$set_arrayref) {
        my @index_elements = $self->_get_index_elements($member);
        $index_hash{join("::", @index_elements)} = $member;
    }
    return %index_hash;
}

sub _get_index_elements {
    my ($self, $set_member) = @_;
    my @index_elements;
    foreach my $method ($self->index_by) {
        if (ref($method) eq 'CODE') {
            push @index_elements, $method->($set_member);
        }
        else {
            eval {
                push @index_elements, $set_member->$method;
            };
            if ($@) {
                $self->throw(sprintf("Could not use '%s' for indexing a %s object. The original error was:\n",
                                     $method, ref($set_member)).$@)
            }
        }
    }
    return @index_elements;
}

sub _sets_to_arrayrefs {
    my ($self, @sets) = @_;
    my @set_arrayrefs;
    foreach my $set (@sets) {
        if (ref($set) eq "ARRAY") {
            push @set_arrayrefs, $set;
        }
        elsif (ref($set) and $set->can("Iterator")) {
            my @set_elements;
            my $it = $set->Iterator;
            while (my $set_el = $it->next) {
                push @set_elements, $set_el
            }
            push @set_arrayrefs, \@set_elements;
        }
        else {
            $self->throw("Set must be an array reference or have an ".
                         "Iterator method. Got ".(ref($set) || $set).
                         " instead.");
        }
    }
    return @set_arrayrefs;
}

sub _output {
    my ($self, $hashref) = @_;
    if ($self->output_type eq "arrayref") {
        return [values %$hashref];
    }
    elsif ($self->output_type eq "array") {
        # Return only the set members, not the index keys.
        return values %$hashref;
    }
    elsif ($self->output_type eq "matrix_set") {
        require TFBS::MatrixSet;
        my $setobj = TFBS::MatrixSet->new;
        $setobj->add_Matrix(values %$hashref);
        return $setobj;
    }
    elsif ($self->output_type eq "site_set") {
        require TFBS::SiteSet;
        my $setobj = TFBS::SiteSet->new;
        $setobj->add_site(values %$hashref);
        return $setobj;
    }
    else {
        $self->throw($self->output_type." is not a supported output type");
    }
}

1;
TFBS-0.7.1/TFBS/Word.pm000077500000000000000000000042751305752266700143140ustar00rootroot00000000000000
# TFBS module for TFBS::Word
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Word - base class for word-based patterns

=head1 DESCRIPTION

TFBS::Word is a base class consisting of a universal constructor called by
its subclasses (TFBS::Word::*), and word pattern manipulation methods that
are independent of the word type. It is not meant to be instantiated itself.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
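
The division of labour described above, a shared constructor in the base
class with validation left to the subclasses, can be sketched without
TFBS as follows. The package names ToyWord and ToyWord::DNA are
hypothetical and exist only in this example; plain die is used where the
real classes call throw.

```perl
use strict;
use warnings;

# ToyWord and ToyWord::DNA are hypothetical names used only for this
# illustration; they are not TFBS classes.
package ToyWord;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    die "Need a -word argument\n" unless defined $args{-word};
    $self->word($args{-word});
    return $self;
}

# Getter/setter: every new value is checked by the subclass validator.
sub word {
    my ($self, @args) = @_;
    return $self->{word} unless @args;
    my ($word) = @args;
    die "Invalid word: $word\n" unless $self->validate_word($word);
    return $self->{word} = $word;
}

# "Abstract" method: the base class only reports that it is missing.
sub validate_word {
    die "Error: method 'validate_word' not implemented\n";
}

package ToyWord::DNA;
our @ISA = ('ToyWord');

# Accept only the four unambiguous DNA letters (case-insensitive).
sub validate_word {
    my ($self, $word) = @_;
    return $word =~ /\A[ACGT]+\z/i ? 1 : 0;
}

package main;

my $ok = ToyWord::DNA->new(-word => 'TATAAA');
print $ok->word, "\n";

# An invalid word makes the constructor die, so eval returns undef.
my $bad = eval { ToyWord::DNA->new(-word => 'TATAXA') };
print defined $bad ? "accepted\n" : "rejected: $@";
```

Because the constructor goes through the word setter, a subclass only
has to supply validate_word to get validation at construction time and
on every later assignment.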
=cut

# The code begins HERE:

package TFBS::Word;
use vars '@ISA';

use TFBS::PatternI;
use strict;

@ISA = qw(TFBS::PatternI);

=head2 new

=cut

sub new {
    my ($caller, @args) = @_;
    my $self = $caller->SUPER::new(@args);
    my ($id, $name, $class, $word, $tagref) =
        $self->_rearrange([qw(ID NAME CLASS WORD TAGS)], @args);
    if (defined $word) {
        $self->word($word);
    }
    else {
        $self->throw("Need a -word argument");
    }
    $self->name($name);
    $self->ID($id);
    $self->{'tags'} = ($tagref or {});
    return $self;
}

=head2 word

=cut

sub word {
    my ($self, @args) = @_;
    if (scalar(@args) == 0) {
        return $self->{'word'};
    }
    my ($word) = @args;
    if (defined $word and ! $self->validate_word($word)) {
        $self->throw("Trying to set the word to an invalid value: $word");
    }
    else {
        return $self->{'word'} = $word;
    }
}

=head2 validate_word

Required in all subclasses

=cut

sub validate_word {
    shift->throw("Error: method 'validate_word' not implemented");
}

=head2 length

=cut

sub length {
    # word length does not have to be defined, but its subroutine does
    shift->throw("Error: method 'length' not implemented");
}

=head2 search_seq

=cut

sub search_seq {
    shift->throw("Error: method search_seq not implemented");
}

=head2 search_aln

=cut

sub search_aln {
    shift->throw("Error: method search_aln not implemented");
}

1;
TFBS-0.7.1/TFBS/Word/000077500000000000000000000000001305752266700137435ustar00rootroot00000000000000
TFBS-0.7.1/TFBS/Word/Consensus.pm000077500000000000000000000150411305752266700162650ustar00rootroot00000000000000
# TFBS module for TFBS::Word::Consensus
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Word::Consensus - IUPAC DNA consensus word-based pattern class

=head1 DESCRIPTION

TFBS::Word::Consensus is a TFBS::Word subclass that represents a pattern
as a word in the IUPAC degenerate DNA alphabet. Sequences and alignments
are searched by converting the word to a TFBS::Matrix::PWM object (see
to_PWM) and scanning with a threshold equal to the word length minus the
number of allowed mismatches.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
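
The idea of matching a consensus word with a limited number of
mismatches can be sketched in a few lines of plain Perl. This is only an
illustration under stated assumptions: count_mismatches and find_hits
are hypothetical helper names, the scan covers the forward strand only,
and the module itself delegates the search to TFBS::Matrix::PWM rather
than comparing strings directly.

```perl
use strict;
use warnings;

# Bases covered by each IUPAC code. count_mismatches and find_hits are
# hypothetical helper names for this sketch, not TFBS functions.
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',  B => 'CGT', D => 'AGT',
    H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Number of positions at which $site is not covered by the consensus $word.
sub count_mismatches {
    my ($word, $site) = @_;
    my $mismatches = 0;
    for my $i (0 .. length($word) - 1) {
        my $allowed = $iupac{ uc substr($word, $i, 1) }
            or die "Illegal IUPAC character in '$word'\n";
        my $base = uc substr($site, $i, 1);
        $mismatches++ if index($allowed, $base) < 0;
    }
    return $mismatches;
}

# Slide the word along the sequence and report the 1-based start
# positions whose score (number of matching positions) reaches the
# threshold length(word) - max_mismatches.
sub find_hits {
    my ($word, $seq, $max_mismatches) = @_;
    my $len       = length $word;
    my $threshold = $len - ($max_mismatches || 0);
    my @hits;
    for my $start (0 .. length($seq) - $len) {
        my $site  = substr($seq, $start, $len);
        my $score = $len - count_mismatches($word, $site);
        push @hits, $start + 1 if $score >= $threshold;
    }
    return @hits;
}

my $seq = 'GGTATAAAGGCATAAATT';
print join(' ', find_hits('TATAWA', $seq, 0)), "\n";    # prints "3"
print join(' ', find_hits('TATAWA', $seq, 1)), "\n";    # prints "3 11"
```

With no mismatches allowed only the exact site TATAAA at position 3 is
reported; allowing one mismatch also reports CATAAA at position 11.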

=cut

package TFBS::Word::Consensus;
use vars '@ISA';

use TFBS::Word;
use TFBS::Matrix::PWM;
use Carp;    # croak() is called in _consensus2matrixref
use strict;

@ISA = qw(TFBS::Word);

=head2 new

 Title   : new
 Usage   : my $word = TFBS::Word::Consensus->new(%args)
 Function: constructor for the TFBS::Word::Consensus object
 Returns : a new TFBS::Word::Consensus object
 Args    : # you must specify the -word argument:
           -word,    # a string consisting of letters in the
                     # IUPAC degenerate DNA alphabet
                     # (any of ACGTSWKMRYBDHVN)
           #######

           -name,    # string, OPTIONAL
           -ID,      # string, OPTIONAL
           -class,   # string, OPTIONAL
           -tags     # a hash reference, OPTIONAL

=cut

# "new" is inherited from TFBS::Word

=head2 search_seq

 Title   : search_seq
 Usage   : my $siteset = $word->search_seq(%args)
 Function: scans a nucleotide sequence with the pattern represented
           by the word
 Returns : a TFBS::SiteSet object
 Args    : # you must specify exactly one of the following three:

           -file,       # the name of a FASTA file (single sequence)
            #or
           -seqobj      # a Bio::Seq object
                        # (more accurately, a Bio::PrimarySeq object or a
                        # subclass thereof)
            #or
           -seqstring   # a string containing the sequence

           -max_mismatches, # number of allowed positions in the site that do
                            # not match the consensus
                            # OPTIONAL: default 0

=cut

sub search_seq {
    my ($self, @args) = @_;
    my ($max_mismatch) = $self->_rearrange([qw(MAX_MISMATCHES)], @args);
    $max_mismatch = 0 unless defined $max_mismatch;
    # each matching position scores 1 against the 0/1 matrix, so allowing
    # $max_mismatch mismatches means a threshold of (length - $max_mismatch)
    my $pwm = $self->to_PWM;
    my $siteset = $pwm->search_seq(@args,
                                   -threshold => $self->length - $max_mismatch);
    $self->_replace_patterns_in_siteset($siteset);
    return $siteset;
}

=head2 search_aln

 Title   : search_aln
 Usage   : my $site_pair_set = $word->search_aln(%args)
 Function: Scans a pairwise alignment of nucleotide sequences
           with the pattern represented by the word: it reports only
           those hits that are present in equivalent positions of both
           sequences and exceed a specified threshold score in both, AND
           are found in regions of the alignment above the specified
           conservation cutoff value.
 Returns : a TFBS::SitePairSet object
 Args    : # you must specify exactly one of the following three:

           -file,       # the name of the alignment file in Clustal format
            #or
           -alignobj    # a Bio::SimpleAlign object
                        # (more accurately, a Bio::PrimarySeq object or a
                        # subclass thereof)
            #or
           -alignstring # a multi-line string containing the alignment
                        # in Clustal format
           #############

           -max_mismatches, # number of allowed positions in the site that do
                            # not match the consensus
                            # OPTIONAL: default 0

           -window,     # size of the sliding window (in nucleotides)
                        # for calculating local conservation in the
                        # alignment
                        # OPTIONAL: default 50

           -cutoff      # conservation cutoff (%) for including the
                        # region in the results of the pattern search
                        # OPTIONAL: default "70%"

=cut

sub search_aln {
    my ($self, @args) = @_;
    my ($max_mismatch) = $self->_rearrange([qw(MAX_MISMATCHES)], @args);
    $max_mismatch = 0 unless defined $max_mismatch;
    # each matching position scores 1 against the 0/1 matrix, so allowing
    # $max_mismatch mismatches means a threshold of (length - $max_mismatch)
    my $pwm = $self->to_PWM;
    my $sitepairset = $pwm->search_aln(@args,
                                       -threshold => $self->length - $max_mismatch);
    $self->_replace_patterns_in_sitepairset($sitepairset);
    return $sitepairset;
}

=head2 to_PWM

=cut

sub to_PWM {
    my ($self, @args) = @_;
    my $pwm = TFBS::Matrix::PWM->new(-ID     => $self->ID,
                                     -name   => $self->name,
                                     -class  => $self->class,
                                     -matrix => _consensus2matrixref($self->word),
                                     -tags   => {$self->all_tags}
                                    );
    return $pwm;
}

=head2 validate_word

=cut

sub validate_word {
    my ($self, $word) = @_;
    # the word is valid if nothing is left after removing all IUPAC letters
    $word =~ s/[ACGTSWKMRYBDHVN]//gi;
    return ($word eq "");
}

=head2 length

=cut

sub length {
    # CORE:: makes explicit that the builtin is meant, not this method
    return CORE::length($_[0]->word);
}

# private methods

sub _replace_patterns_in_siteset {
    my ($self, $siteset) = @_;
    my $iter = $siteset->Iterator;
    while (my $site = $iter->next) {
        $site->pattern($self);
    }
}

sub _replace_patterns_in_sitepairset {
    my ($self, $sitepairset) = @_;
    my $iter = $sitepairset->Iterator;
    while (my $sitepair = $iter->next) {
        $sitepair->feature1->pattern($self);
        $sitepair->feature2->pattern($self);
    }
}

# utility functions

sub _consensus2matrixref {
    my ($word) = @_;
    my %iupac = (
        # each letter maps to [A, C, G, T] flags
        T => [0,0,0,1],
        G => [0,0,1,0],
        K => [0,0,1,1],
        C => [0,1,0,0],
        Y => [0,1,0,1],
        S => [0,1,1,0],
        B => [0,1,1,1],
        A => [1,0,0,0],
        W => [1,0,0,1],
        R => [1,0,1,0],
        D => [1,0,1,1],
        M => [1,1,0,0],
        H => [1,1,0,1],
        V => [1,1,1,0],
        N => [1,1,1,1]
    );
    my @vert_array;
    foreach my $letter (split '', $word) {
        push @vert_array, ($iupac{uc($letter)}
                           or croak("$letter is not a legal IUPAC DNA character"));
    }
    return _transpose_arrayref(\@vert_array);
}

sub _transpose_arrayref {
    # turns one [A,C,G,T] entry per position into four rows (A, C, G, T)
    my $vert_arrayref = shift;
    my $maxcol = scalar(@$vert_arrayref) - 1;
    my @horiz_array;
    foreach my $row (0..3) {
        push @horiz_array, [ map { $vert_arrayref->[$_][$row] } 0..$maxcol ];
    }
    return \@horiz_array;
}

1;
TFBS-0.7.1/TFBS/_Iterator.pm000077500000000000000000000030071305752266700153210ustar00rootroot00000000000000
package TFBS::_Iterator;
use vars '@ISA';
use strict;
use Carp;
@ISA = qw(Bio::Root::Root);

#############################################################
# PUBLIC METHODS
#############################################################

sub new {
    my ($caller, $arrayref, $sort_by, $reverse) = @_;
    my $class = ref $caller || $caller;
    my $self;
    if ($arrayref) {
        # keep the original order so that reset() can restore it
        $self = bless { _orig_array_ref     => [ @$arrayref ],
                        _iterator_array_ref => [ @$arrayref ],
                        _sort_by            => ($sort_by || undef),
                        _reverse            => ($reverse || 0)
                      }, $class;
    }
    else {
        croak("No valid array ref for Iterator of ".
              (ref($class) || $class)." provided.");
    }
    $self->_sort()    if $sort_by;
    $self->_reverse() if $reverse;
    return $self;
}

sub current {

}

sub reset {
    my ($self) = @_;
    @{$self->{_iterator_array_ref}} = @{$self->{_orig_array_ref}};
    $self->_sort()    if $self->{'_sort_by'};
    # the flag is stored under '_reverse' by new()
    $self->_reverse() if $self->{'_reverse'};
    return $self;
}

sub next {
    my $self = shift;
    return shift @{$self->{_iterator_array_ref}};
}

#################################################################
# PRIVATE METHODS
#################################################################

sub _sort {
    my ($self, $sort_by) = @_;
    $self->throw("Generic iterator cannot sort ".ref($self).
" object by '$sort_by'."); } sub _reverse { my $self = shift; $self->{'_iterator_array_ref'} = [ reverse @{ $self->{'_iterator_array_ref'} } ]; } TFBS-0.7.1/TFBS/_Iterator/000077500000000000000000000000001305752266700147605ustar00rootroot00000000000000TFBS-0.7.1/TFBS/_Iterator/.svn/000077500000000000000000000000001305752266700156445ustar00rootroot00000000000000TFBS-0.7.1/TFBS/_Iterator/.svn/all-wcprops000077500000000000000000000005201305752266700200320ustar00rootroot00000000000000K 25 svn:wc:ra_dav:version-url V 43 /svn/lenhard/!svn/ver/8/TFBS/TFBS/_Iterator END _MatrixSetIterator.pm K 25 svn:wc:ra_dav:version-url V 65 /svn/lenhard/!svn/ver/8/TFBS/TFBS/_Iterator/_MatrixSetIterator.pm END _SiteSetIterator.pm K 25 svn:wc:ra_dav:version-url V 63 /svn/lenhard/!svn/ver/8/TFBS/TFBS/_Iterator/_SiteSetIterator.pm END TFBS-0.7.1/TFBS/_Iterator/.svn/entries000077500000000000000000000007441305752266700172500ustar00rootroot000000000000008 dir 435 http://www.ii.uib.no/svn/lenhard/TFBS/TFBS/_Iterator http://www.ii.uib.no/svn/lenhard 2008-01-24T20:21:25.772223Z 8 chrb svn:special svn:externals svn:needs-lock 92b4b857-2e4f-4894-b4a8-5712848ce9df _MatrixSetIterator.pm file 2009-08-07T13:10:38.000000Z d4d3720758267cb683a5840b30e49bdc 2008-01-24T20:21:25.772223Z 8 chrb _SiteSetIterator.pm file 2009-08-07T13:10:38.000000Z 03a4a343868d81822e866344ae82ee13 2008-01-24T20:21:25.772223Z 8 chrb TFBS-0.7.1/TFBS/_Iterator/.svn/format000077500000000000000000000000021305752266700170520ustar00rootroot000000000000008 TFBS-0.7.1/TFBS/_Iterator/.svn/text-base/000077500000000000000000000000001305752266700175405ustar00rootroot00000000000000TFBS-0.7.1/TFBS/_Iterator/.svn/text-base/_MatrixSetIterator.pm.svn-base000077500000000000000000000027521305752266700254150ustar00rootroot00000000000000package TFBS::_Iterator::_MatrixSetIterator; use vars '@ISA'; use strict; use Carp; use TFBS::_Iterator; @ISA = qw(TFBS::_Iterator); sub _sort { my ($self, $sort_by) = @_; $sort_by or $sort_by = $self->{_sort_by} or 
        $sort_by = 'name';

    # we can sort by name, start, end, score
    my %sort_fn = (class    => sub { $a->class() cmp $b->class()
                                     || $a->name() cmp $b->name()
                                     || $a->ID() cmp $b->ID()
                                   },
                   id       => sub { $a->ID() cmp $b->ID() },
                   ID       => sub { $a->ID() cmp $b->ID() },
                   name     => sub { $a->name() cmp $b->name()
                                     || $a->class() cmp $b->class()
                                     || $a->ID() cmp $b->ID()
                                   },
                   species  => sub { $a->tag('species') cmp $b->tag('species')
                                     || $a->class() cmp $b->class()
                                     || $a->ID() cmp $b->ID()
                                   },
                   total_ic => sub { $b->total_ic() <=> $a->total_ic()
                                     || $a->name() cmp $b->name()
                                   }
                  );
    if (defined (my $sort_function = $sort_fn{lc $sort_by})) {
        $self->{'_iterator_array_ref'} =
            [ sort $sort_function @{$self->{'_orig_array_ref'}} ];
    }
    else {
        #order by tag derived value
        $self->{'_iterator_array_ref'}=
            [ sort { $a->tag($self->{_sort_by}) cmp $b->tag( $self->{_sort_by})
                     || $a->class() cmp $b->class()
                     || $a->ID() cmp $b->ID()
                   } @{$self->{'_orig_array_ref'}} ]
            || $self->throw("Cannot sort ".ref($self)." object by '$sort_by'.");
    }
}
TFBS-0.7.1/TFBS/_Iterator/.svn/text-base/_SiteSetIterator.pm.svn-base000077500000000000000000000025331305752266700250520ustar00rootroot00000000000000
package TFBS::_Iterator::_SiteSetIterator;
use vars '@ISA';
use strict;
use Carp;
use TFBS::_Iterator;
@ISA = qw(TFBS::_Iterator);

sub _sort {
    my ($self, $sort_by) = @_;
    $sort_by or $sort_by = $self->{_sort_by} or $sort_by = 'name';

    # we can sort by name, start, end, score
    my %sort_fn = (start => sub { $a->start() <=> $b->start()
                                  || $a->pattern->name() cmp $b->pattern->name()
                                  || $a->strand() <=> $b->strand()
                                },
                   end   => sub { $a->end() <=> $b->end()
                                  || $a->pattern->name() cmp $b->pattern->name()
                                  || $a->strand() <=> $b->strand()
                                },
                   ID    => sub { $a->pattern->ID() cmp $b->pattern->ID()
                                  || $a->start() <=> $b->start()
                                  || $a->end() <=> $b->end()
                                  || $a->strand() <=> $b->strand()
                                },
                   name  => sub { $a->pattern->name() cmp $b->pattern->name()
                                  || $a->start() <=> $b->start()
                                  || $a->end() <=> $b->end()
                                  || $a->strand() <=> $b->strand()
                                },
                   score => sub {
                                  $b->score() <=> $a->score()
                                  || $a->pattern->name() cmp $b->pattern->name()
                                  || $a->strand() <=> $b->strand()
                                }
                  );
    if (defined (my $sort_function = $sort_fn{lc $sort_by})) {
        $self->{'_iterator_array_ref'} =
            [ sort $sort_function @{$self->{'_orig_array_ref'}} ];
    }
    else {
        $self->throw("Cannot sort ".ref($self)." object by '$sort_by'.");
    }
}
TFBS-0.7.1/TFBS/_Iterator/_MatrixSetIterator.pm000077500000000000000000000040431305752266700211130ustar00rootroot00000000000000
package TFBS::_Iterator::_MatrixSetIterator;
use vars '@ISA';
use strict;
use Carp;
use TFBS::_Iterator;
@ISA = qw(TFBS::_Iterator);

#
# Changed name field sorts to case insensitive which results in a more
# intuitive name sort order, at least for the JASPAR Web site
# DJA 2015/09/16
#
sub _sort {
    my ($self, $sort_by) = @_;
    $sort_by or $sort_by = $self->{_sort_by} or $sort_by = 'name';

    # we can sort by class, ID, name, species or total_ic;
    # any other value is treated as the name of a tag to sort by
    my %sort_fn = (class    => sub { $a->class() cmp $b->class()
                                     || uc $a->name() cmp uc $b->name()
                                     || $a->ID() cmp $b->ID()
                                   },
                   id       => sub { $a->ID() cmp $b->ID() },
                   ID       => sub { $a->ID() cmp $b->ID() },
                   name     => sub { uc $a->name() cmp uc $b->name()
                                     || $a->class() cmp $b->class()
                                     || $a->ID() cmp $b->ID()
                                   },
                   species  => sub { $a->tag('species') cmp $b->tag('species')
                                     || $a->class() cmp $b->class()
                                     || $a->ID() cmp $b->ID()
                                   },
                   total_ic => sub { $b->total_ic() <=> $a->total_ic()
                                     || uc $a->name() cmp uc $b->name()
                                   }
                  );
    # keys are looked up in lower case, so 'ID' and 'id' both use the id entry
    if (defined (my $sort_function = $sort_fn{lc $sort_by})) {
        $self->{'_iterator_array_ref'} =
            [ sort $sort_function @{$self->{'_orig_array_ref'}} ];
    }
    else {
        #order by tag derived value
        $self->{'_iterator_array_ref'}=
            [ sort { $a->tag($self->{_sort_by}) cmp $b->tag( $self->{_sort_by})
                     || $a->class() cmp $b->class()
                     || $a->ID() cmp $b->ID()
                   } @{$self->{'_orig_array_ref'}} ]
            || $self->throw("Cannot sort ".ref($self)."
 object by '$sort_by'.");
    }
}
TFBS-0.7.1/TFBS/_Iterator/_SiteSetIterator.pm000077500000000000000000000027531305752266700205610ustar00rootroot00000000000000
package TFBS::_Iterator::_SiteSetIterator;
use vars '@ISA';
use strict;
use Carp;
use TFBS::_Iterator;
@ISA = qw(TFBS::_Iterator);

#
# Changed name field sorts to case insensitive which results in a more
# intuitive name sort order.
# DJA 2015/09/16
#
sub _sort {
    my ($self, $sort_by) = @_;
    $sort_by or $sort_by = $self->{_sort_by} or $sort_by = 'name';

    # we can sort by start, end, ID, name or score; keys are lower case
    # because they are looked up with lc($sort_by) below
    my %sort_fn = (start => sub { $a->start() <=> $b->start()
                                  || uc $a->pattern->name() cmp uc $b->pattern->name()
                                  || $a->strand() <=> $b->strand()
                                },
                   end   => sub { $a->end() <=> $b->end()
                                  || uc $a->pattern->name() cmp uc $b->pattern->name()
                                  || $a->strand() <=> $b->strand()
                                },
                   id    => sub { $a->pattern->ID() cmp $b->pattern->ID()
                                  || $a->start() <=> $b->start()
                                  || $a->end() <=> $b->end()
                                  || $a->strand() <=> $b->strand()
                                },
                   name  => sub { uc $a->pattern->name() cmp uc $b->pattern->name()
                                  || $a->start() <=> $b->start()
                                  || $a->end() <=> $b->end()
                                  || $a->strand() <=> $b->strand()
                                },
                   score => sub { $b->score() <=> $a->score()
                                  || uc $a->pattern->name() cmp uc $b->pattern->name()
                                  || $a->strand() <=> $b->strand()
                                }
                  );
    if (defined (my $sort_function = $sort_fn{lc $sort_by})) {
        $self->{'_iterator_array_ref'} =
            [ sort $sort_function @{$self->{'_orig_array_ref'}} ];
    }
    else {
        $self->throw("Cannot sort ".ref($self)."
 object by '$sort_by'.");
    }
}
TFBS-0.7.1/blib/000077500000000000000000000000001305752266700132025ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/000077500000000000000000000000001305752266700141175ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/.exists000066400000000000000000000000001305752266700154250ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/000077500000000000000000000000001305752266700150675ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/000077500000000000000000000000001305752266700156255ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/.exists000066400000000000000000000000001305752266700171330ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/Ext/000077500000000000000000000000001305752266700163655ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/Ext/pwmsearch/000077500000000000000000000000001305752266700203565ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/Ext/pwmsearch/.exists000066400000000000000000000000001305752266700216640ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/Ext/pwmsearch/pwmsearch.bs000066400000000000000000000000001305752266700226630ustar00rootroot00000000000000
TFBS-0.7.1/blib/arch/auto/TFBS/Ext/pwmsearch/pwmsearch.bundle000077500000000000000000001427341305752266700235600ustar00rootroot00000000000000
TFBS-0.7.1/blib/bin/000077500000000000000000000000001305752266700137525ustar00rootroot00000000000000
TFBS-0.7.1/blib/bin/.exists000066400000000000000000000000001305752266700152600ustar00rootroot00000000000000
TFBS-0.7.1/blib/lib/000077500000000000000000000000001305752266700137505ustar00rootroot00000000000000
TFBS-0.7.1/blib/lib/.exists000066400000000000000000000000001305752266700152560ustar00rootroot00000000000000
TFBS-0.7.1/blib/lib/TFBS/000077500000000000000000000000001305752266700145065ustar00rootroot00000000000000
TFBS-0.7.1/blib/lib/TFBS/DB.pm000066400000000000000000000005061305752266700153320ustar00rootroot00000000000000
# This package should hold interface and common database manipulation
# methods, if we decide
# there are any

package TFBS::DB;
use vars qw(@ISA);
use strict;
use Bio::Root::Root;
use TFBS::Matrix;
@ISA = qw(Bio::Root::Root);

sub new {
}

sub get_MatrixSet {
}

sub get_Matrix_by_ID {
}

# not finished (apparently)
TFBS-0.7.1/blib/lib/TFBS/DB/000077500000000000000000000000001305752266700147735ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/DB/FlatFileDir.pm000066400000000000000000000326001305752266700174570ustar00rootroot00000000000000
# TFBS module for TFBS::DB::FlatFileDir
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::DB::FlatFileDir - interface to a database of pattern matrices
stored as a collection of flat files in a dedicated directory

=head1 SYNOPSIS

=over 4

=item * creating a database object by connecting to the existing directory

    my $db = TFBS::DB::FlatFileDir->connect("/home/boris/MatrixDir");

=item * retrieving a TFBS::Matrix::* object from the database

    # retrieving a PFM by ID
    my $pfm = $db->get_Matrix_by_ID('M00079','PFM');

    # retrieving a PWM by name
    my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM');

=item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria

    # retrieving a set of PWMs from a list of IDs:
    my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101');
    my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist,
                                       -matrixtype => "PWM");

    # retrieving a set of ICMs from a list of names:
    my @namelist = ('p50', 'p53', 'HNF-1',
                    'GATA-1', 'GATA-2', 'GATA-3');
    my $matrixset = $db->get_MatrixSet(-names => \@namelist,
                                       -matrixtype => "ICM");

    # retrieving a set of all PFMs in the database
    my $matrixset = $db->get_MatrixSet(-matrixtype => "PFM");

=item * creating a new FlatFileDir database in a new directory:

    my $db = TFBS::DB::FlatFileDir->create("/home/boris/NewMatrixDir");

=item * storing a matrix in the database:

    # assume $pfm is a TFBS::Matrix::PFM object
    $db->store_Matrix($pfm);

=back

=head1 DESCRIPTION

TFBS::DB::FlatFileDir is a read/write database interface module that
retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet
objects in a set of flat files in a dedicated directory. It has a
very simple structure and can be easily set up manually if desired.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
=cut

# The code begins HERE:

package TFBS::DB::FlatFileDir;
use vars qw(@ISA);
use strict;
use Bio::Root::Root;
use TFBS::Matrix::PFM;
use TFBS::Matrix::ICM;
use TFBS::Matrix::PWM;
use TFBS::MatrixSet;
@ISA = qw(TFBS::DB Bio::Root::Root);

=head2 new

 Title   : new
 Usage   : my $db = TFBS::DB::FlatFileDir->new(%args);
 Function: the formal constructor for the TFBS::DB::FlatFileDir object;
           most users will not use it - they will use specialized
           I<connect> or I<create> constructors to create a
           database object
 Returns : a TFBS::DB::FlatFileDir object
 Args    : -dir       # the directory containing flat files

=cut

sub new {
    my $caller = shift;
    my $self = bless {_item => {}, _idlist_of_name=>{} , _idlist_of_class=>{} }, ref ($caller) || $caller;
    if (-d $_[0]) {
        $self->{dir} = $_[0];
    }
    elsif ($_[0] eq '-dir' and -d $_[1]) {
        $self->{dir} = $_[1];
    }
    else {
        $self->throw("Error initializing FlatFileDir database dir: ", ($_[1] or $_[0] or "No directory parameter passed."));
    }
    $self->_load_db_index();
    return $self;
}

=head2 connect

 Title   : connect
 Usage   : my $db = TFBS::DB::FlatFileDir->connect($directory);
 Function: Creates a database object that retrieves TFBS::Matrix::*
           object data from or stores it in an existing directory
 Returns : a TFBS::DB::FlatFileDir object
 Args    : ($directory)
           The name of the directory (possibly with fully qualified
           path).

=cut

sub connect {
    my ($caller, $dir) = @_;
    $caller->new(-dir=>$dir);
}

=head2 create

 Title   : create
 Usage   : my $newdb = TFBS::DB::FlatFileDir->create($new_directory);
 Function: creates a new directory, sets up a FlatFileDir database
           in it and returns a database object that interfaces
           the database
 Returns : a TFBS::DB::FlatFileDir object
 Args    : ($new_directory)
           The name of the directory to create
           (possibly with fully qualified path).
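
The steps that create() performs can be sketched with core Perl only. This is an illustration rather than the module's own code: the helper name create_flatfile_dir is hypothetical, and only the index file name matrix_list.txt is taken from the module source.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper (not part of TFBS): refuse to reuse an existing
# directory, create it, then create an empty matrix_list.txt index.
sub create_flatfile_dir {
    my ($dir) = @_;
    die "Directory $dir exists\n" if -d $dir;
    mkdir $dir or die "Error creating directory $dir: $!\n";
    my $index = File::Spec->catfile($dir, 'matrix_list.txt');
    open my $fh, '>', $index or die "Error creating $index: $!\n";
    close $fh;
    return $index;
}

# Work inside a throwaway parent directory so nothing is left behind.
my $parent = tempdir(CLEANUP => 1);
my $newdir = File::Spec->catdir($parent, 'NewMatrixDir');
my $index  = create_flatfile_dir($newdir);
print "empty index created: ", (-z $index ? "yes" : "no"), "\n";
```

A directory prepared this way holds no matrices yet, so loading its index should yield an empty item list.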
=cut sub create { my ($caller, $dir) = @_; if (-d $dir) { die ("Directory $dir exists") ; } mkdir ($dir) or die("Error creating directory $dir, stopped"); open FILE, ">$dir/matrix_list.txt" or die ("Error creating matrix_list.txt"); close FILE; $caller->new(-dir=>$dir); } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. =cut sub get_Matrix_by_ID { my ($self, $ID, $mt) = @_; $self->throw("No ID passed to get_Matrix_by_ID.") unless defined $ID; $mt = defined $mt ? $self->_check_matrixtype($mt) : "PWM"; my $matrixobj; { no strict 'refs'; my $working_mt = $mt = uc $mt; my $matrixstring = $self->_read_file($ID,$mt) # if no desired $mt, is there a PFM? || $self->_read_file($ID,$working_mt="PFM") || return undef; eval("\$matrixobj= TFBS::Matrix::$working_mt->new".' ( -ID => $ID, -name => $self->{_item}->{$ID}->{name} || "", -class => $self->{_item}->{$ID}->{class}|| "", -matrix=> $matrixstring, -tags=> $self->{_item}->{$ID}->{tags} );'. 
"if (\$working_mt ne \$mt) {\$matrixobj = \$matrixobj->to_$mt;}"); if ($@) {$self->throw($@); } } # print "MATRIXOBJ: $matrixobj\n"; return $matrixobj; } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1', 'PWM'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM') Args : (Matrix_name, Matrix_type) Matrix_name is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. Warning : According to the current JASPAR2 data model, name is not necessarily a unique identifier. In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You have been warned. 
=cut sub get_Matrix_by_name { my ($self, $name, $mt) = @_; my $ID=$self->{_idlist_of_name}->{$name}->[0] or return undef; if ((my $L= scalar @{ $self->{_idlist_of_name}->{$name} }) > 1) { $self->warn("There are $L matrices with name '$name'"); } return $self->get_Matrix_by_ID($ID, $mt); } sub get_matrix { # an obsolete method - kept for the time being for backward compatibility my ($self, %args) = @_; my $DIR = $self->{dir}; my $ID; # retrieval from .pwm files in a directory my $mt = ($self->_get_matrixtype_from_args(%args) or $self->throw("No -matrixtype provided.")); if ($args{-ID}) { $ID = $args{-ID}; } elsif (my $name = $args{-name}) { $ID=$self->{_idlist_of_name}->{$name}->[0] or $self->warn("No matrix with name $name found."); if ((my $L= scalar @{ $self->{_idlist_of_name}->{$name} }) > 1) { $self->warn("There are $L matrices with name '$name'"); } } else { $self->throw("No -ID or -name passed to ".ref($self)); } my $matrixobj; { no strict 'refs'; my $ucmt = uc $mt; my $matrixstring =`cat $DIR/$ID.$mt`; eval("\$matrixobj= TFBS::Matrix::$ucmt->new".' ( -ID => $ID, -name => $self->{_item}->{$ID}->{name}, -class => $self->{_item}->{$ID}->{class}, -matrix=> $matrixstring # FIXME - temporary );'); if ($@) {$self->throw($@); } } # print "MATRIXOBJ: $matrixobj\n"; return $matrixobj; } =head2 store_Matrix Title : store_Matrix Usage : $db->store_Matrix($matrixobj); Function: Stores the contents of a TFBS::Matrix::DB object in the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : ($matrixobj) # a TFBS::Matrix::* object =cut sub store_Matrix { my ($self, $matrixobj) = @_; my ($mt) = ($matrixobj =~ /TFBS::Matrix::(\w+)/) or $self->throw("Wrong type of object passed to store_Matrix."); if (defined $self->{_item}->{$matrixobj->ID()}) { $self->throw("ID ".$matrixobj->ID()." 
 exists in the database.");
    }
    else {
        my $matrixfile = $self->{dir}."/".$matrixobj->ID().".".lc($mt);
        open FILE, ">$matrixfile" or $self->throw("Could not write file $matrixfile.");
        print FILE $matrixobj->rawprint;
        close FILE;
        my $ic = ($mt eq "ICM") ? $matrixobj->total_ic
               : ($mt eq "PFM") ? $matrixobj->to_ICM->total_ic
               : "";
        $self->{_item}->{$matrixobj->ID()} = {
            'name' => $matrixobj->name || "",
            'ic'   => $ic,
            'class'=> $matrixobj->class || ""
        };
        my %tags= $matrixobj->all_tags();
        foreach my $named_tag (keys %tags){
            $self->{_item}->{$matrixobj->ID()}{'tag'}{$named_tag}=$tags{$named_tag};
            # print $named_tag , " ", $self->{_item}->{$matrixobj->ID()}{'tag'}{$named_tag}, "\n";
        }
        $self->_update_db_index();
    }
    return 0;
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious what it does.
=cut

sub delete_Matrix_having_ID {
    my ($self, $ID) = @_;
    my $DIR = $self->{dir};
    unlink <$DIR/$ID.*>;
    delete $self->{_item}->{$ID};
    $self->_update_db_index();
}

sub _update_db_index {
    my $self = shift;
    rename $self->{dir}."/matrix_list.txt", $self->{dir}."/~matrix_list.txt";
    open FILE, ">".$self->{dir}."/matrix_list.txt";
    foreach my $ID ( keys %{$self->{_item}} ) {
        print FILE join("\t", $ID,
                        $self->{_item}->{$ID}->{ic},
                        $self->{_item}->{$ID}->{name},
                        $self->{_item}->{$ID}->{class}
                        )."\t";
        # add tagged annotation
        # my %tag = $self->{_item}->{$ID}->{'all_tags'};
        foreach my $name(sort keys %{$self->{'_item'}->{$ID}{'tag'}}){
            print FILE "; ", $name, " \"", $self->{'_item'}->{$ID}{'tag'}{$name}, "\"\ ";
        }
        print FILE "\n";
    }
    close FILE;
}

sub _load_db_index {
    my ($self, $field, $value) = @_;
    my $DIR = $self->{dir};
    open (MATRIXLIST, "$DIR/matrix_list.txt")
        or $self->throw("Could not read matrix list $DIR/matrix_list.txt");
    while (my $line = <MATRIXLIST>) {
        chomp $line;
        my ($ID, $ic, $name, $class) = split /\s+/, $line ;
        if ($ID =~ /(\w+)\.(\w+)$/) {
            $ID = $1;
        }
        defined($self->{_item}->{$ID})
            and $self->warn("Duplicate entries for ID $ID");
        $self->{_item}->{$ID} = {name=>$name, ic=>$ic, class=>$class};
        push @{ $self->{_idlist_of_name}->{$name} }, $ID;
        push @{ $self->{_idlist_of_class}->{$class} }, $ID;
        # annotation
        my @anno= split(/\s?;\s?/, $line);
        my %tags;
        shift @anno;
        foreach (@anno){
            my ($name, $val)=split(/\s?\"/, $_);
            # print "$name $val\n";
            $self->{_item}->{$ID}->{'tags'}->{$name}=$val;
        }
    }
    close MATRIXLIST;
    return scalar keys %{ $self->{_item} }; # false if list empty
}

sub get_MatrixSet {
    my ($self, %args) = @_;
    my $DIR = $self->{db};
    my $arrayref;
    my $mt = $self->_check_matrixtype($args{-matrixtype})
        || $self->throw("No matrix type provided.");
    delete $args{'-matrixtype'};
    my ($field, $value) = %args;
    unless (defined $field) {
        # no selector given: return every matrix in the database
        $field="-IDs";
        $arrayref = [ keys %{ $self->{_item}} ];
    }
    else {
        # the selector value is the array reference to iterate over
        $arrayref = $value;
    }
    my @IDlist;
    if ($field eq "-IDs") {
        @IDlist = @$arrayref;
    }
    elsif ($field eq "-names") {
        foreach (@$arrayref) {
            push @IDlist, @{ $self->{_idlist_of_name}->{$_} };
        }
    }
    elsif ($field eq "-classes") {
        foreach (@$arrayref) {
            push @IDlist, @{ $self->{_idlist_of_class}->{$_} };
        }
    }
    else {
        $self->throw("Unknown matrixset selector: $field.");
    }
    my $matrixset = TFBS::MatrixSet->new();
    foreach my $ID(@IDlist) {
        $matrixset->add_matrix($self->get_Matrix_by_ID($ID, $mt));
    }
    close MATRIXLIST;
    return $matrixset;
}

sub _check_matrixtype {
    my ($self, $mt) = @_;
    $mt = uc $mt;
    return undef unless $mt;
    unless ( $mt eq "PFM" or $mt eq "ICM" or $mt eq "PWM") {
        $self->throw("Unsupported matrix type: ".$mt);
    }
    return $mt;
}

sub _read_file {
    my ($self, $id, $mt) = @_;
    local $/ = undef;
    open FILE, $self->{dir}."/$id.".lc($mt) or return undef;
    my $matrixstring = <FILE>; # slurp
    close FILE;
    return $matrixstring;
}

1;
TFBS-0.7.1/blib/lib/TFBS/DB/JASPAR2.pm000066400000000000000000000615141305752266700164020ustar00rootroot00000000000000
# TFBS module for TFBS::DB::JASPAR2
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::DB::JASPAR2 - interface to MySQL relational database of pattern matrices

=head1 SYNOPSIS

=over 4

=item * creating a database object by connecting to the existing JASPAR2-type database

    my $db = TFBS::DB::JASPAR2->connect("dbi:mysql:JASPAR2:myhost",
                                        "myusername",
                                        "mypassword");

=item * retrieving a TFBS::Matrix::* object from the database

    # retrieving a PFM by ID
    my $pfm = $db->get_Matrix_by_ID('M0079','PFM');

    # retrieving a PWM by name
    my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM');

=item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria

    # retrieving a set of PWMs from a list of IDs:
    my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101');
    my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist,
                                       -matrixtype => "PWM");

    # retrieving a set of ICMs from a list of names:
    my @namelist = ('p50', 'p53', 'HNF-1',
                    'GATA-1', 'GATA-2', 'GATA-3');
    my $matrixset = $db->get_MatrixSet(-names => \@namelist,
                                       -matrixtype => "ICM");

    # retrieving a set of all PFMs in the database
    # derived from human genes:
    my $matrixset = $db->get_MatrixSet(-species => ['Homo sapiens'],
                                       -matrixtype => "PFM");

=item * creating a new JASPAR2-type database named MYJASPAR2:

    my $db = TFBS::DB::JASPAR2->create("dbi:mysql:MYJASPAR2:myhost",
                                       "myusername",
                                       "mypassword");

=item * storing a matrix in the database (currently only PFMs):

    # assume $pfm is a TFBS::Matrix::PFM object
    $db->store_Matrix($pfm);

=back

=head1 DESCRIPTION

TFBS::DB::JASPAR2 is a read/write database interface module that
retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet
objects in a relational database.

=head1 JASPAR2 DATA MODEL

JASPAR2 is the working name for a relational database model used
for storing transcriptional factor pattern matrices in a MySQL database.
It was initially designed to store matrices for the JASPAR database of
high quality eukaryotic transcription factor specificity profiles by
Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself,
this data model stores profile ID (unique), name, structural class,
basic taxonomic and bibliographic information as well as some additional,
optional tags.

Due to its data model, which preceded the design of the
module, TFBS::DB::JASPAR2 cannot store arbitrary tags for a matrix.

The supported tags are

    'acc'      # (accession number;
               # originally for transcription factor protein seq)
    'seqdb'    # sequence database where 'acc' comes from
    'medline'  # PubMed ID
    'species'  # Species name
    'sysgroup'
    'type'
    'total_ic' # total information content - redundant, present
               # for historical reasons

----------------------- ADVANCED ---------------------------------

For the developers and the curious, here is the JASPAR2 data model:

    CREATE TABLE matrix_data (
      ID varchar(16) DEFAULT '' NOT NULL,
      pos_ID varchar(24) DEFAULT '' NOT NULL,
      base enum('A','C','G','T'),
      position tinyint(3) unsigned,
      raw int(3) unsigned,
      info float(7,5) unsigned,       -- calculated
      pwm float(7,5),                 -- calculated
      normalized float(7,5) unsigned,
      PRIMARY KEY (pos_ID),
      KEY id_index (ID)
    );

    CREATE TABLE matrix_info (
      ID varchar(16) DEFAULT '' NOT NULL,
      name varchar(15) DEFAULT '' NOT NULL,
      type varchar(8) DEFAULT '' NOT NULL,
      class varchar(20),
      phylum varchar (32),            -- maps to 'sysgroup' tag
      litt varchar(40),               -- not used by this module
      medline int(12),
      information varchar(20),        -- not used by this module
      iterations varchar(6),
      width int(2),                   -- calculated
      consensus varchar(25),          -- calculated
      IC float(6,4),                  -- maps to 'total_ic' tag
      sites int(3) unsigned,          -- not used by this module
      PRIMARY KEY (ID)
    );

    CREATE TABLE matrix_seqs (
      ID varchar(16) DEFAULT '' NOT NULL,
      internal varchar(8) DEFAULT '' NOT NULL,
      seq_db varchar(15) NOT NULL,
      seq varchar(10) NOT NULL,
      PRIMARY KEY (ID, seq_db, seq)
    );

    CREATE TABLE matrix_species (
      ID varchar(16) DEFAULT '' NOT NULL,
      internal varchar(8) DEFAULT '' NOT NULL,
      species varchar(24) NOT NULL,
      PRIMARY KEY (ID, species)
    );

It is our best intention to hide the details of this data model, which we
are using on a
daily basis in our work, from most TFBS users, simply because for
historical reasons some table column names are confusing
at best. Most users should only know the methods to store the data and
which tags are supported.

-------------------------------------------------------------------------

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::DB::JASPAR2;
use vars qw(@ISA $AUTOLOAD);

# we need all three matrices due to the redundancy in JASPAR2 data model
# which will hopefully be removed in JASPAR3

use TFBS::Matrix::PWM;
use TFBS::Matrix::PFM;
use TFBS::Matrix::ICM;

use TFBS::MatrixSet;
use Bio::Root::Root;
use DBI;
# use TFBS::DB; # eventually
use strict;

@ISA = qw(TFBS::DB Bio::Root::Root);

#########################################################################
# CONSTANTS
#########################################################################

use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost
use constant DEFAULT_USER          => "";
use constant DEFAULT_PASSWORD      => "";

#########################################################################
# PUBLIC METHODS
#########################################################################

=head2 new

 Title   : new
 Usage   : DEPRECATED - for backward compatibility only
           Use connect() or create() instead

=cut

sub new {
    _new (@_);
}

=head2 connect

 Title   : connect
 Usage   : my $db =
            TFBS::DB::JASPAR2->connect("dbi:mysql:DATABASENAME:HOSTNAME",
                                       "USERNAME",
                                       "PASSWORD");
 Function: connects to the existing JASPAR2-type database and
           returns a database object that interfaces the database
 Returns : a TFBS::DB::JASPAR2 object
 Args    : a standard database connection triplet
           ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD")
           In place of
           DATABASENAME, HOSTNAME, USERNAME and PASSWORD, use the actual
           values. PASSWORD and USERNAME might be optional, depending
           on the user's access permissions for the database server.

=cut

sub connect {
    # a more intuitive syntax for the constructor
    my ($caller, @connection_args) = @_;
    $caller->new(-connect => \@connection_args);
}

=head2 create

 Title   : create
 Usage   : my $newdb =
            TFBS::DB::JASPAR2->create("dbi:mysql:NEWDATABASENAME:HOSTNAME",
                                      "USERNAME",
                                      "PASSWORD");
 Function: connects to the database server, creates a new JASPAR2-type
           database and returns a database object that interfaces
           the database
 Returns : a TFBS::DB::JASPAR2 object
 Args    : a standard database connection triplet
           ("dbi:mysql:NEWDATABASENAME:HOSTNAME", "USERNAME", "PASSWORD")
           In place of NEWDATABASENAME, HOSTNAME, USERNAME and
           PASSWORD use the actual values. PASSWORD and USERNAME
           might be optional, depending on the user's access permissions
           for the database server.

=cut

sub create {
    my ($caller, $connectstring, $user, $password) = @_;
    if ($connectstring and $connectstring =~ /dbi:mysql:(\w+)(.*)/) {
        # connect to the server;
        my $dbh=DBI->connect("dbi:mysql:mysql".$2, $user,$password)
            or die("Error connecting to the database");

        # create database and open it
        $dbh->do("create database $1")
            or die("Error creating database.");
        $dbh->do("use $1");

        # create tables
        _create_tables($dbh);

        $dbh->disconnect;

        # run "new" with new database
        return $caller->new(-connect=>[$connectstring, $user, $password]);
    }
    else {
        die("Missing or malformed connect string for ".
            "TFBS::DB::JASPAR2 connection.");
    }
}

=head2 dbh

 Title   : dbh
 Usage   : my $dbh = $db->dbh();
           $dbh->do("UPDATE matrix_info SET name='ADD1' WHERE name='SREBP2'");
 Function: returns the DBI database handle of the MySQL database
           interfaced by $db; THIS IS USED FOR WRITING NEW METHODS
           FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you
           have write access AND do not know what you are doing,
           you can severely corrupt the data
           For documentation about database handle methods, see L<DBI>
 Returns : the database (DBI) handle of the MySQL JASPAR2-type
           relational database associated with the TFBS::DB::JASPAR2
           object
 Args    : none

=cut

sub dbh {
    my ($self, $dbh) = @_;
    $self->{'dbh'} = $dbh if $dbh;
    return $self->{'dbh'};
}

=head2 get_Matrix_by_ID

 Title   : get_Matrix_by_ID
 Usage   : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM');
 Function: fetches matrix data under the given ID from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depending on the second argument (allowed
           values are 'PFM', 'ICM', and 'PWM'); returns undef if
           matrix with the given ID is not found
 Args    : (Matrix_ID, Matrix_type)
           Matrix_ID is a string; Matrix_type is one of the
           following: 'PFM' (raw position frequency matrix),
           'ICM' (information content matrix) or 'PWM' (position
           weight matrix)
           If Matrix_type is omitted, a PWM is retrieved by default.

=cut

sub get_Matrix_by_ID {
    my ($self, $ID, $mt) = @_;
    $mt = (uc($mt) or "PWM");
    unless (defined $ID) {
        $self->throw("No ID passed to get_Matrix_by_ID");
    }
    my $matrixobj;
    {
        no strict 'refs';
        my $ucmt = uc $mt;
        my $matrixstring = $self->_get_matrixstring($ID, $mt) || return undef;
        eval("\$matrixobj= TFBS::Matrix::$ucmt->new".'
( -ID => $ID, -name => $self->_get_name($ID)."", -class => $self->_get_class($ID)."", -tags => { "medline" => ($self->_get_medline($ID) or ""), "species" => ($self->_get_species($ID) or ""), "sysgroup"=> ($self->_get_sysgroup($ID) or ""), "type" => ($self->_get_type($ID) or ""), "seqdb" => ($self->_get_seqdb($ID) or ""), "acc" => ($self->_get_acc($ID) or ""), "total_ic"=> ($self->_get_total_ic($ID) or "") }, -matrix=> $matrixstring # FIXME - temporary );'); if ($@) {$self->throw($@); } } return $matrixobj; } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1', 'PWM'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM') Args : (Matrix_name, Matrix_type) Matrix_name is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PWM is retrieved by default. Warning : According to the current JASPAR2 data model, name is not necessarily a unique identifier. In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You have been warned. 
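
The "first match plus warning" behaviour described above can be illustrated without a database. The sketch below is an assumption-laden stand-in, not the module's code: the hash %idlist_of_name, its contents and the helper first_id_for_name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical index: maps a matrix name to every ID that carries it.
my %idlist_of_name = (
    'HNF-1' => ['M0001'],
    'p50'   => ['M0002', 'M0003'],   # the same name used by two matrices
);

# Return the first ID for a name together with the number of candidates;
# warn on STDERR when the name is ambiguous.
sub first_id_for_name {
    my ($name) = @_;
    my $ids = $idlist_of_name{$name} or return;
    warn "There are " . scalar(@$ids) . " matrices with name '$name'\n"
        if @$ids > 1;
    return ($ids->[0], scalar @$ids);
}

my ($id, $count) = first_id_for_name('p50');
print "picked $id out of $count candidates\n";
```

Callers that need a specific matrix are safer retrieving it by ID, since IDs are unique in this data model.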
=cut

sub get_Matrix_by_name {
    my ($self, $name, $mt) = @_;
    unless(defined $name) {
        $self->throw("No name passed to get_Matrix_by_name.");
    }
    my @IDlist = $self->_get_IDlist_by_query(-names=>[$name]);
    my $ID = $IDlist[0];
    unless (defined $ID) {
        # nothing to fetch: warn and return undef instead of passing
        # the return value of warn() on as an ID
        $self->warn("No matrix with name $name found.");
        return undef;
    }
    if ((my $L= scalar @IDlist) > 1) {
        $self->warn("There are $L matrices with name '$name'");
    }
    return $self->get_Matrix_by_ID($ID, $mt);
}

=head2 get_MatrixSet

 Title   : get_MatrixSet
 Usage   : my $matrixset = $db->get_MatrixSet(%args);
 Function: fetches matrix data for all matrices in the database
           matching criteria defined by the named arguments
           and returns a TFBS::MatrixSet object
 Returns : a TFBS::MatrixSet object
 Args    : This method accepts named arguments:
           -IDs        # a reference to an array of IDs (strings)
           -names      # a reference to an array of
                       # transcription factor names (strings)
           -classes    # a reference to an array of
                       # structural class names (strings)
           -species    # a reference to an array of
                       # Latin species names (strings)
           -sysgroups  # a reference to an array of
                       # higher taxonomic categories (strings)
           -matrixtype # a string, 'PFM', 'ICM' or 'PWM'
           -min_ic     # float, minimum total information content
                       # of the matrix

The five arguments that expect list references are used in
database query formulation: elements within lists are combined
with 'OR' operators, and the lists of different types with 'AND'.
For example,

    my $matrixset = $db->get_MatrixSet(-classes => ['TRP_CLUSTER', 'FORKHEAD'],
                                       -species => ['Homo sapiens', 'Mus musculus'],
                                       -matrixtype => 'PWM');

gives a set of PWMs whose (structural class is 'TRP_CLUSTER' OR
'FORKHEAD') AND (the species they are derived from is 'Homo sapiens'
OR 'Mus musculus').

The -min_ic filter is applied after the query, in the sense that
matrix profiles with total information content less than specified
are not included in the set.
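
The OR-within-a-list, AND-between-lists rule can be sketched as a small WHERE clause builder. This is a simplified illustration under stated assumptions: quote_value is a hypothetical stand-in for the quoting done by the database handle, build_where is not a method of this module, and the sketch ignores the fact that species live in a separate table.

```perl
use strict;
use warnings;

# Hypothetical stand-in for database-side quoting, so that the sketch
# runs without a database connection.
sub quote_value {
    my ($v) = @_;
    $v =~ s/'/''/g;
    return "'$v'";
}

# Values inside one list are ORed; the lists themselves are ANDed.
# An empty list can match nothing, hence the always-false condition.
sub build_where {
    my (%lists) = @_;    # column name => array ref of accepted values
    my @and;
    foreach my $column (sort keys %lists) {
        my @or = map { "$column=" . quote_value($_) } @{ $lists{$column} };
        push @and, @or ? "(" . join(" OR ", @or) . ")" : "(1=0)";
    }
    return @and ? "WHERE " . join(" AND ", @and) : "";
}

print build_where(
    class   => ['TRP_CLUSTER', 'FORKHEAD'],
    species => ['Homo sapiens', 'Mus musculus'],
), "\n";
```

Sorting the column names only makes the generated text deterministic; the order of AND-ed conditions does not affect which rows match.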
=cut

sub get_MatrixSet {
    my ($self, %args) = @_;
    my @IDlist = $self->_get_IDlist_by_query(%args);
    my $mt = ($args{'-matrixtype'} or "PWM");
    my $matrixset = TFBS::MatrixSet->new();
    foreach (@IDlist) {
        next if (defined $args{'-min_ic'}
                 and $self->_get_total_ic($_) < $args{'-min_ic'});
        $matrixset->add_Matrix($self->get_Matrix_by_ID($_, $mt));
    }
    return $matrixset;
}

=head2 store_Matrix

 Title   : store_Matrix
 Usage   : $db->store_Matrix($pfm);
 Function: Stores the contents of a TFBS::Matrix::PFM object in the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (PFM_object)
           A TFBS::Matrix::PFM object
 Comment : this is an experimental method that is not 100% bulletproof;
           use at your own risk

=cut

sub store_Matrix {
    my ($self, @PFMs) = @_;
    my $err;
    foreach my $pfm (@PFMs) {
        eval {
            $self->_store_matrix_data($pfm);
            $self->_store_matrix_info($pfm);
            $self->_store_matrix_seqs($pfm);
            $self->_store_matrix_species($pfm);
        };
        # keep every failure, not only the one from the last matrix
        $err .= $@ if $@;
    }
    return $err || 0;
}

=head2 store_MatrixSet

 Title   : store_MatrixSet
 Usage   : $db->store_MatrixSet($matrixset);
 Function: Stores the TFBS::Matrix::PFM objects that are part of a
           TFBS::MatrixSet object into the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (MatrixSet_object)
           A TFBS::MatrixSet object
 Comment : THIS METHOD IS NOT YET IMPLEMENTED

=cut

sub store_MatrixSet {
    $_[0]->throw ("Method store_MatrixSet not yet implemented.");
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious what it does.
=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;
    eval {
        foreach my $ID (@IDs) {
            my $q_ID = $self->dbh->quote($ID);
            foreach my $table (qw (matrix_data matrix_info matrix_seqs matrix_species) ) {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        }
    };
    return $@;
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new {
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;

    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    }
    elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        # pass the connection triplet stored under -create on to create()
        return $caller->create(@{$args{'-create'}});
    }
    else {
        ($connectstring, $user, $password) =
            (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh( DBI->connect($connectstring, $user, $password) );

    return $self;
}

sub _get_IDlist_by_query {
    # called by get_MatrixSet
    my ($self, %args) = @_;
    my ($TABLES, %arrayref);
    $args{-names}     and $arrayref{name}   = $args{-names} ;
    $args{-classes}   and $arrayref{class}  = $args{-classes} ;
    $args{-sysgroups} and $arrayref{phylum} = $args{-sysgroups};
    $args{-IDs}       and $arrayref{ID}     = $args{-IDs};

    my @andconditions;
    if ($args{-species}) {
        $TABLES = ' matrix_info, matrix_species ';
        push @andconditions,
            'matrix_info.ID = matrix_species.ID',
            " (".
            join(" OR ",
                 (map {"matrix_species.species=".
                       $self->dbh->quote($_) } @{$args{-species}} )).
            ") ";
    }
    else {
        $TABLES = 'matrix_info ';
    }

    foreach my $key (keys %arrayref) {
        if (scalar @{$arrayref{$key}}) {
            push @andconditions,
                "(".
                join(" OR ",
                     (map {"matrix_info.$key=".
                           $self->dbh->quote($_) } @{$arrayref{$key}} )).
                ")";
        }
        else {
            push @andconditions, "(1=0)";
        }
    }
    my $WHERE = ((scalar @andconditions) == 0) ? "" : " WHERE ";
    my $query = "SELECT DISTINCTROW matrix_info.id FROM $TABLES $WHERE".
join(" AND ", @andconditions); my $sth = $self->dbh->prepare($query); $sth->execute() or $self->throw("Query failed:\n$query\n"); # collect IDs and return my @IDlist = (); while (my ($id) = $sth->fetchrow_array()) { push @IDlist, $id; } $sth->finish; return @IDlist; } sub _get_matrixstring { my ($self, $ID, $mt) = @_; my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); unless (defined $dbname{$mt}) { $self->throw("Unsupported matrix type: ".$mt); } my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth=$self->dbh->prepare ("SELECT $dbname{$mt} FROM matrix_data WHERE ID=$qID AND base='$base' ORDER BY position"); $sth->execute; $matrixstring .= join (" ", (map {$_->[0]} @{$sth->fetchall_arrayref()}))."\n"; } $sth->finish; return undef if $matrixstring eq "\n"x4; return $matrixstring; } sub _simple_query { my ($self, $table, $retr_field, $search_field, $search_value) = @_; my $q_value = $self->dbh->quote($search_value); my $sth = $self->dbh->prepare ("SELECT DISTINCT $retr_field from $table WHERE $search_field = $q_value and $retr_field <> \"\" ORDER BY $retr_field"); $sth->execute; return (map {$_->[0]} @{$sth->fetchall_arrayref}); } sub _store_matrix_data { my ($self, $pfm, $ACTION) = @_; my @base = qw(A C G T); my $pfmatrix = $pfm->matrix(); my $icmatrix = $pfm->to_ICM()->matrix(); my $pwmatrix = $pfm->to_PWM->matrix(); my $sth = $self->dbh->prepare (q! INSERT INTO matrix_data VALUES(?,?,?,?,?,?,?,?) !); for my $i (0..3) { for my $j (0..($pfm->length-1)) { $sth->execute( $pfm->ID, $pfm->ID.".".$base[$i].".".($j+1), $base[$i], $j+1, $pfmatrix->[$i][$j], $icmatrix->[$i][$j], $pwmatrix->[$i][$j], $pfmatrix->[$i][$j] / $pfm->column_sum()) or $self->throw("Error executing query."); } } } sub _store_matrix_info { my ($self, $pfm, $ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO matrix_info (ID, name, type, class, phylum, width, IC, sites) VALUES(?,?,?,?,?,?,?,?) 
!); $sth->execute($pfm->ID, ($pfm->name or $pfm->ID), ($pfm->{'tags'}->{'type'} or ""), ($pfm->class() or undef), ($pfm->{'tags'}->{'sysgroup'} or undef), $pfm->length(), $pfm->to_ICM->total_ic(), $pfm->column_sum() ) or $self->throw("Error executing query"); } sub _store_matrix_seqs { my ($self, $pfm, $ACTION) = @_; return unless ($pfm->{'tags'}->{'seqdb'} or $pfm->{'tags'}->{'acc'}); my $sth = $self->dbh->prepare (q! INSERT INTO matrix_seqs (ID, seq_db, seq) VALUES(?,?,?) !); $sth->execute($pfm->ID, ($pfm->{'tags'}->{'seqdb'} or ""), ($pfm->{'tags'}->{'acc'} or "") ) or $self->throw("Error executing query"); } sub _store_matrix_species { my ($self, $pfm, $ACTION) = @_; return unless $pfm->{'tags'}->{'species'}; my $sp = $pfm->{'tags'}->{'species'}; my @splist = (ref($sp) ? @$sp : $sp); foreach my $species (@splist) { my $sth = $self->dbh->prepare (q! INSERT INTO matrix_species (ID, species) VALUES(?,?) !); $sth->execute($pfm->ID, $species ) or $self->throw("Error executing query"); } } sub _create_tables { # utility function # If you want to change the databse schema, # this is the right place to do it my $dbh = shift; my @queries = ( q! CREATE TABLE matrix_data ( ID varchar(16) DEFAULT '' NOT NULL, pos_ID varchar(24) DEFAULT '' NOT NULL, base enum('A','C','G','T'), position tinyint(3) unsigned, raw int(3) unsigned, info float(7,5) unsigned, pwm float(7,5), normalized float(7,5) unsigned, PRIMARY KEY (pos_ID), KEY id_index (ID) ) !, q! CREATE TABLE matrix_info ( ID varchar(16) DEFAULT '' NOT NULL, name varchar(15) DEFAULT '' NOT NULL, type varchar(8) DEFAULT '' NOT NULL, class varchar(20), phylum varchar(32), litt varchar(40), medline int(12), information varchar(20), iterations varchar(6), width int(2), consensus varchar(25), IC float(6,4), sites int(3) unsigned, PRIMARY KEY (ID) ) !, q! 
CREATE TABLE matrix_seqs ( ID varchar(16) DEFAULT '' NOT NULL, internal varchar(8) DEFAULT '' NOT NULL, seq_db varchar(15) NOT NULL, seq varchar(10) NOT NULL, PRIMARY KEY (ID, seq_db, seq) ) !, q! CREATE TABLE matrix_species ( ID varchar(16) DEFAULT '' NOT NULL, internal varchar(8) DEFAULT '' NOT NULL, species varchar(24) NOT NULL, PRIMARY KEY (ID, species) ) !); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub AUTOLOAD { my ($self, $ID) = @_; no strict 'refs'; my $TABLE; my %dbname_of = (ID => 'ID', name => 'name', class => 'class', species => 'species', sysgroup => 'phylum', type => 'type', seqdb => 'seq_db', acc => 'seq', total_ic => 'IC', medline => 'medline' ); my ($where_column, $where_value); if ($AUTOLOAD =~ /.*::_{0,1}get_(\w+)_list/) { defined $dbname_of{$1} or $self->throw("$AUTOLOAD: no such method!"); ($where_column, $where_value) = (1,1); } elsif ($AUTOLOAD =~ /.*::_get_(\w+)/) { defined $dbname_of{$1} or $self->throw("$AUTOLOAD: no such method!"); defined $ID or $self->throw("No ID provided for $AUTOLOAD"); ($where_column, $where_value) = ('ID', $ID); } else { $self->throw("$AUTOLOAD: no such method!"); } defined $dbname_of{$1} or $self->throw("$AUTOLOAD: no such method!"); if ($1 eq 'species') { $TABLE = 'matrix_species'; } elsif ($1 eq 'seqdb' or $1 eq 'acc') { $TABLE = 'matrix_seqs', } else { $TABLE = 'matrix_info' ; } my @results = $self->_simple_query ($TABLE, $dbname_of{$1}, $where_column => $where_value); wantarray ? 
return @results : return $results[0];
}

sub DESTROY {
    $_[0]->dbh->disconnect() if $_[0]->dbh;
}

1;
TFBS-0.7.1/blib/lib/TFBS/DB/JASPAR4.pm000066400000000000000000000534701305752266700164060ustar00rootroot00000000000000
# TFBS module for TFBS::DB::JASPAR4
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::DB::JASPAR4 - interface to MySQL relational database of pattern matrices

=head1 SYNOPSIS

=over 4

=item * creating a database object by connecting to the existing JASPAR4-type database

    my $db = TFBS::DB::JASPAR4->connect("dbi:mysql:JASPAR4:myhost",
                                        "myusername",
                                        "mypassword");

=item * retrieving a TFBS::Matrix::* object from the database

    # retrieving a PFM by ID
    my $pfm = $db->get_Matrix_by_ID('M0079','PFM');

    # retrieving a PWM by name
    my $pwm = $db->get_Matrix_by_name('NF-kappaB', 'PWM');

=item * retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria

    # retrieving a set of PWMs from a list of IDs:
    my @IDlist = ('M0019', 'M0045', 'M0073', 'M0101');
    my $matrixset = $db->get_MatrixSet(-IDs => \@IDlist,
                                       -matrixtype => "PWM");

    # retrieving a set of ICMs from a list of names:
    my @namelist = ('p50', 'p53', 'HNF-1', 'GATA-1', 'GATA-2', 'GATA-3');
    my $matrixset = $db->get_MatrixSet(-names => \@namelist,
                                       -matrixtype => "ICM");

    # retrieving a set of all PFMs in the database
    # derived from human genes:
    my $matrixset = $db->get_MatrixSet(-species => ['Homo sapiens'],
                                       -matrixtype => "PFM");

=item * creating a new JASPAR4-type database named MYJASPAR4:

    my $db = TFBS::DB::JASPAR4->create("dbi:mysql:MYJASPAR4:myhost",
                                       "myusername",
                                       "mypassword");

=item * storing a matrix in the database (currently only PFMs):

    # assuming $pfm is a TFBS::Matrix::PFM object
    $db->store_Matrix($pfm);

=back

=head1 DESCRIPTION

TFBS::DB::JASPAR4 is a read/write database interface module that
retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet
objects in a relational database.
The interface is nearly identical to the JASPAR2 interface, while the
underlying data model is different.

=head1 JASPAR4 DATA MODEL

JASPAR4 is the working name for a relational database model used
for storing transcription factor pattern matrices in a MySQL database.
It was initially designed (as JASPAR2) to store matrices for the JASPAR database of
high quality eukaryotic transcription factor specificity profiles by
Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself,
this data model stores the profile ID (unique), name, structural class,
basic taxonomic and bibliographic information, as well as some additional,
optional tags.

Tags that are commonly used in the actual JASPAR database include

    'medline'     # PubMed ID
    'species'     # species name
    'superclass'  # species supergroup, e.g. 'vertebrate', 'plant' etc.
    'total_ic'    # total information content - redundant, present
                  # for historical reasons
    'type'        # experimental method
    'acc'         # accession number for TF protein sequence
    'seqdb'       # corresponding database name

but any tag is storable and searchable.

-----------------------  ADVANCED ---------------------------------

For the developers and the curious: the JASPAR4 data model is defined
by the CREATE TABLE statements in the _create_tables function of this
module.

It is our best intention to hide the details of this data model, which we
are using on a daily basis in our work, from most TFBS users.
Most users need to know only the methods that store the data and
which tags are supported.

-------------------------------------------------------------------------

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
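The tag-based model described above ("any tag is storable and searchable") can be pictured with a small in-memory sketch. It does not use the module or a database: the rows are made up for illustration, and C<ids_matching> is a hypothetical helper, not a TFBS method. It only assumes that annotations are kept as (ID, tag, value) triples, that values listed for one tag are alternatives, and that different tags must all match.

```perl
use strict;
use warnings;

# Made-up (ID, tag, val) rows standing in for an annotation table.
my @rows = (
    [ 'M1', 'species', 'Homo sapiens' ],
    [ 'M1', 'class',   'FORKHEAD'     ],
    [ 'M2', 'species', 'Mus musculus' ],
    [ 'M2', 'class',   'bZIP'         ],
    [ 'M3', 'species', 'Homo sapiens' ],
    [ 'M3', 'class',   'bZIP'         ],
);

# Hypothetical helper: values within one tag are ORed,
# different tags are ANDed (set intersection of matching IDs).
sub ids_matching {
    my ($rows, %criteria) = @_;
    my %keep = map { ($_->[0] => 1) } @$rows;    # start with every ID
    foreach my $tag (keys %criteria) {
        my %wanted = map { ($_ => 1) } @{ $criteria{$tag} };
        my %hit    = map  { ($_->[0] => 1) }
                     grep { $_->[1] eq $tag and $wanted{ $_->[2] } } @$rows;
        # intersect: drop IDs that did not match this tag
        delete $keep{$_} for grep { !$hit{$_} } keys %keep;
    }
    return sort keys %keep;
}

my @ids = ids_matching(\@rows,
                       species => ['Homo sapiens'],
                       class   => ['FORKHEAD', 'bZIP']);
print "@ids\n";
```

Intersecting one result set per tag keeps each per-tag lookup simple, at the cost of one pass over the rows for every tag given.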
=cut # The code begins HERE: package TFBS::DB::JASPAR4; use vars qw(@ISA $AUTOLOAD); # we need all three matrices due to the redundancy in JASPAR2 data model # which will hopefully be removed in JASPAR3 use TFBS::Matrix::PWM; use TFBS::Matrix::PFM; use TFBS::Matrix::ICM; use TFBS::MatrixSet; use Bio::Root::Root; use DBI; # use TFBS::DB; # eventually use strict; @ISA = qw(TFBS::DB Bio::Root::Root); ######################################################################### # CONSTANTS ######################################################################### use constant DEFAULT_CONNECTSTRING => "dbi:mysql:JASPAR_DEMO"; # on localhost use constant DEFAULT_USER => ""; use constant DEFAULT_PASSWORD => ""; ######################################################################### # PUBLIC METHODS ######################################################################### =head2 new Title : new Usage : DEPRECATED - for backward compatibility only Use connect() or create() instead =cut sub new { _new (@_); } =head2 connect Title : connect Usage : my $db = TFBS::DB::JASPAR4->connect("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD"); Function: connects to the existing JASPAR4-type database and returns a database object that interfaces the database Returns : a TFBS::DB::JASPAR4 object Args : a standard database connection triplet ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, use the actual values. PASSWORD and USERNAME might be optional, depending on the user's acces permissions for the database server. 
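As a rough illustration of the connection triplet, the sketch below splits a connect string of the form shown above into its database name and the remainder. C<split_mysql_dsn> is a hypothetical helper and not part of this module; the anchored pattern is an assumption about what a well-formed string looks like.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): split a connect string of the
# form "dbi:mysql:DATABASENAME:HOSTNAME" into the database name and
# whatever follows it. Returns an empty list for malformed input.
sub split_mysql_dsn {
    my ($dsn) = @_;
    return unless defined $dsn and $dsn =~ /^dbi:mysql:(\w+)(.*)$/;
    return ($1, $2);
}

my ($dbname, $rest) = split_mysql_dsn("dbi:mysql:JASPAR4:myhost");
print "database: $dbname, remainder: $rest\n";
```

Checking the string before handing it to DBI makes it possible to report a malformed connect string separately from a failed connection.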
=cut sub connect { # a more intuitive syntax for the constructor my ($caller, @connection_args) = @_; $caller->new(-connect => \@connection_args); } =head2 dbh Title : dbh Usage : my $dbh = $db->dbh(); $dbh->do("UPDATE matrix_data SET name='ADD1' WHERE NAME='SREBP2'"); Function: returns the DBI database handle of the MySQL database interfaced by $db; THIS IS USED FOR WRITING NEW METHODS FOR DIRECT RELATIONAL DATABASE MANIPULATION - if you have write access AND do not know what you are doing, you can severely corrupt the data For documentation about database handle methods, see L Returns : the database (DBI) handle of the MySQL JASPAR2-type relational database associated with the TFBS::DB::JASPAR2 object Args : none =cut sub dbh { my ($self, $dbh) = @_; $self->{'dbh'} = $dbh if $dbh; return $self->{'dbh'}; } =head2 store_Matrix Title : store_Matrix Usage : $db->store_Matrix($matrixobject); Function: Stores the contents of a TFBS::Matrix::DB object in the database Returns : 0 on success; $@ contents on failure (this is too C-like and may change in future versions) Args : (PFM_object) A TFBS::Matrix::PFM, FBS::Matrix::PWM or FBS::Matrix::ICM object. 
           PFM objects are recommended, as they are easily
           converted to other formats
 Comment : this is an experimental method that is not 100% bulletproof;
           use at your own risk

=cut

sub store_Matrix {
    my ($self, @PFMs) = @_;
    my $err = 0;
    foreach my $pfm (@PFMs) {
        eval {
            $self->_store_matrix_data($pfm);
            $self->_store_matrix_info($pfm);
            $self->_store_matrix_annotation($pfm);
            #$self->_store_matrix_species($pfm);
        };
        # remember the first failure; $@ is reset by every eval
        $err = $@ if $@ and not $err;
    }
    return $err;
}

sub create {
    my ($caller, $connectstring, $user, $password) = @_;
    if ($connectstring
        and $connectstring =~ /dbi:mysql:(\w+)(.*)/)
    {
        my ($dbname, $rest) = ($1, $2);

        # connect to the server
        my $dbh = DBI->connect("dbi:mysql:mysql".$rest, $user, $password)
            or die("Error connecting to the database");

        # create database and open it
        $dbh->do("create database $dbname")
            or die("Error creating database.");
        $dbh->do("use $dbname");

        # create tables
        _create_tables($dbh);

        $dbh->disconnect;

        # run "new" with new database
        return $caller->new(-connect=>[$connectstring, $user, $password]);
    }
    else {
        die("Missing or malformed connect string for ".
"TFBS::DB::JASPAR2 connection."); } } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('M00034', 'PFM'); Function: fetches matrix data under the given ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix is stored in the database (PFM is default) Args : (Matrix_ID) Matrix_ID is a string; =cut sub get_Matrix_by_ID { my ($self, $ID, $mt) = @_; $mt = (uc($mt) or "PWM"); unless (defined $ID) { $self->throw("No ID passed to get_Matrix_by_ID"); } my $matrixobj; { no strict 'refs'; my $ucmt = uc $mt; my $matrixstring = $self->_get_matrixstring($ID) || return undef; # get type of matrix my $sth=$self->dbh->prepare(qq{SELECT type FROM MATRIX_INFO WHERE ID = '$ID'}); $sth->execute(); my $type=$sth->fetchrow_array(); # get reast of annotation as tags $sth=$self->dbh->prepare(qq{SELECT tag, val FROM MATRIX_ANNOTATION WHERE ID = '$ID' }); $sth->execute(); my %tags; while ( my($tag, $val)= $sth->fetchrow_array()){ $tags{$tag}=$val; } my $name= $tags{'name'}; my $class= $tags{'class'}; delete ($tags{'name'}); delete ($tags{'class'}); eval ("\$matrixobj= TFBS::Matrix::$type->new".' ( -ID => $ID."", -name =>$name, -class => $class, -tags => \%tags, -matrixstring=> $matrixstring # FIXME - temporary );'); #if ($@) {$self->throw($@); } #print "ref:",ref ($matrixobj); } # print $matrixobj->ID(); # print "here\n";print $matrixobj->prettyprint(); return ($matrixobj); } =head2 get_Matrix_by_name Title : get_Matrix_by_name Usage : my $pfm = $db->get_Matrix_by_name('HNF-1'); Function: fetches matrix data under the given name from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on what form the matrix object was stored in the database (default PFM)) Args : (Matrix_name) Warning : According to the current JASPAR4 data model, name is not necessarily a unique identifier. 
In the case where there are several matrices with the same name in the database, the function fetches the first one and prints a warning on STDERR. You've been warned. =cut sub get_Matrix_by_name { my ($self, $name, $mt) = @_; unless(defined $name) { $self->throw("No name passed to get_Matrix_by_name."); } my @IDlist = $self->_get_IDlist_by_query(-name=>[$name]); my $ID= ($IDlist[0] or $self->warn("No matrix with name $name found.")); if ((my $L= scalar @IDlist) > 1) { $self->warn("There are $L matrices with name '$name'"); } return $self->get_Matrix_by_ID($ID); } =head2 get_MatrixSet Title : get_MatrixSet Usage : my $matrixset = $db->get_MatrixSet(%args); Function: fetches matrix data under for all matrices in the database matching criteria defined by the named arguments and returns a TFBS::MatrixSet object Returns : a TFBS::MatrixSet object Args : This method accepts named arguments, corresponding to arbitrary tags. Note that this is different from JASPAR2. As any tag is supported for database storage, any tag can be used for information retrieval. Additionally, arguments as 'name' and 'class' can be used (even though they are not tags. As with get_Matrix methods, it is important to realize that any matrix format can be stored in the database: the TFBS::MatrixSet might therefore consist of PFMs, ICMs and PWMS, depending on how matrices are stored, Examples include -ID # a reference to an array of IDs (strings) -name # a reference to an array of # transcription factor names (string) -class # a reference to an array of # structural class names (strings) -species # a reference to an array of # Latin species names (strings) -sysgroup # a reference to an array of # higher taxonomic categories (strings) -min_ic # float, minimum total information content # of the matrix. IMPORTANT:if retrieved matrices are in PWM format there is no way to measureinformation content. -matrixtype #string describing type of matrix to retrieve. 
                     If left out, the format will revert to the
                     database format. Note that this option only works
                     if the database format is PFM.

    The arguments that expect list references are used in database
    query formulation: elements within lists are combined with 'OR'
    operators, and the lists of different types with 'AND'. For example,

        my $matrixset = $db->get_MatrixSet(-class => ['TRP_CLUSTER', 'FORKHEAD'],
                                           -species => ['Homo sapiens', 'Mus musculus'],
                                           );

    gives a set of TFBS::Matrix::PFM objects (given that the matrix models
    are stored as such) whose (structural class is 'TRP_CLUSTER' OR
    'FORKHEAD') AND (the species they are derived from is 'Homo sapiens'
    OR 'Mus musculus').

    The -min_ic filter is applied after the query, in the sense that
    matrix profiles with total information content less than specified
    are not included in the set.

=cut

sub get_MatrixSet {
    my ($self, %args) = @_;
    my @IDlist = $self->_get_IDlist_by_query(%args);
    my $type;
    my $matrixset = TFBS::MatrixSet->new();

    foreach (@IDlist) {
        # print "$_\n";
    }

    foreach (@IDlist) {
        #next if (defined $args{'-min_ic'}
        #         and $_->_get_total_ic($_) < $args{'-min_ic'});

        # evaluating total information content involves actually
        # retrieving the matrix; this is a problem if the matrix is
        # stored as a PWM: throw an error if so
        my $matrix = $self->get_Matrix_by_ID($_);

        #ugly code:
        if (defined $args{'-min_ic'}) {
            if ($matrix->isa("TFBS::Matrix::PFM")) {
                next if ($matrix->to_ICM->total_ic() < $args{'-min_ic'});
            }
            if ($matrix->isa("TFBS::Matrix::ICM")) {
                next if ($matrix->total_ic() < $args{'-min_ic'});
            }
            if ($matrix->isa("TFBS::Matrix::PWM")) {
                $self->throw("Cannot evaluate information content from PWM matrices");
            }
        }

        #ugly code:
        if ($args{'-matrixtype'} && $matrix->isa("TFBS::Matrix::PFM")) {
            if ($args{'-matrixtype'} eq 'PWM') {
                # warn "change";
                $matrix = $matrix->to_PWM();
            }
            elsif ($args{'-matrixtype'} eq 'ICM') {
                # warn "change";
                # was to_PWM(), which returned the wrong matrix type
                $matrix = $matrix->to_ICM();
            }
        }

        $matrixset->add_Matrix($matrix);
    }
    return $matrixset;
}

sub store_MatrixSet {
    $_[0]->throw
    ("Method store_MatrixSet not yet implemented.");
}

=head2 delete_Matrix_having_ID

 Title   : delete_Matrix_having_ID
 Usage   : $db->delete_Matrix_having_ID('M00045');
 Function: Deletes the matrix having the given ID from the database
 Returns : 0 on success; $@ contents on failure
           (this is too C-like and may change in future versions)
 Args    : (ID)
           A string
 Comment : Yeah, yeah, 'delete_Matrix_having_ID' is a stupid name
           for a method, but at least it should be obvious what it does.

=cut

sub delete_Matrix_having_ID {
    my ($self, @IDs) = @_;
    eval {
        foreach my $ID (@IDs) {
            my $q_ID = $self->dbh->quote($ID);
            foreach my $table (qw (MATRIX_DATA MATRIX_INFO MATRIX_ANNOTATION)) {
                $self->dbh->do("DELETE from $table where ID=$q_ID");
            }
        }
    };
    return $@;
}

#########################################################################
# PRIVATE METHODS
#########################################################################

sub _new {
    my ($caller, %args) = @_;
    my $class = ref $caller || $caller;
    my $self = bless {}, $class;
    my ($connectstring, $user, $password);

    if ($args{'-connect'} and (ref($args{'-connect'}) eq "ARRAY")) {
        ($connectstring, $user, $password) = @{$args{'-connect'}};
    }
    elsif ($args{'-create'} and (ref($args{'-create'}) eq "ARRAY")) {
        # was @{-args{'create'}}, which never referred to %args
        return $caller->create(@{$args{'-create'}});
    }
    else {
        ($connectstring, $user, $password) =
            (DEFAULT_CONNECTSTRING, DEFAULT_USER, DEFAULT_PASSWORD);
    }

    $self->dbh( DBI->connect($connectstring, $user, $password) );

    return $self;
}

sub _store_matrix_data {
    my ($self, $pfm, $ACTION) = @_;
    my @base = qw(A C G T);
    my $matrix = $pfm->matrix();
    my $type;
    my $sth = $self->dbh->prepare
        (q! INSERT INTO MATRIX_DATA VALUES(?,?,?,?)
!); for my $i (0..3) { for my $j (0..($pfm->length-1)) { $sth->execute( $pfm->ID, $base[$i], $j+1, $matrix->[$i][$j] ) or $self->throw("Error executing query."); } } } sub _store_matrix_info { my ($self, $pfm, $ACTION) = @_; my $type; $type= 'PFM' if $pfm->isa("TFBS::Matrix::PFM"); $type= 'PWM' if $pfm->isa("TFBS::Matrix::PWM"); $type= 'ICM' if $pfm->isa("TFBS::Matrix::ICM"); my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_INFO (ID, type) VALUES(?,?) !); $sth->execute($pfm->ID, $type, ) or $self->throw("Error executing query"); } sub _store_matrix_annotation { my ($self, $pfm, $ACTION) = @_; my $sth = $self->dbh->prepare (q! INSERT INTO MATRIX_ANNOTATION (ID, tag, val) VALUES(?,?,?) !); $sth->execute($pfm->ID, 'name', ($pfm->name() or ""), ); $sth->execute($pfm->ID, 'class', ($pfm->class() or ""), ); # get all tags my %tags= $pfm->all_tags(); foreach my $tag( keys %tags){ $sth->execute($pfm->ID, $tag, ($tags{$tag} or ""), ) or $self->throw("Error executing query"); } } #when creating: try to support arbitrary tags sub _create_tables { # utility function # If you want to change the databse schema, # this is the right place to do it my $dbh = shift; my @queries = ( q! CREATE TABLE MATRIX_DATA( ID VARCHAR (16) DEFAULT '' NOT NULL, row VARCHAR(1) NOT NULL, col TINYINT(3) UNSIGNED NOT NULL, val float(7,4), PRIMARY KEY (ID, row, col) ) !, q! CREATE TABLE MATRIX_INFO( ID VARCHAR (16) DEFAULT '' NOT NULL PRIMARY KEY , type ENUM ('PFM', 'ICM','PWM') DEFAULT 'PFM' NOT NULL ) !, q! 
CREATE TABLE MATRIX_ANNOTATION( ID VARCHAR (16) DEFAULT '' NOT NULL, tag VARCHAR(255)DEFAULT '' NOT NULL, val varchar(255) DEFAULT '', PRIMARY KEY (ID, tag) ) !, ); foreach my $query (@queries) { $dbh->do($query) or die("Error executing the query: $query\n"); } } sub _get_matrixstring { my ($self, $ID) = @_; #my %dbname = (PWM => 'pwm', PFM => 'raw', ICM => 'info'); #unless (defined $dbname{$mt}) { #$self->throw("Unsupported matrix type: ".$mt); #} my $sth; my $qID = $self->dbh->quote($ID); my $matrixstring = ""; foreach my $base (qw(A C G T)) { $sth=$self->dbh->prepare ("SELECT val FROM MATRIX_DATA WHERE ID=$qID AND row='$base' ORDER BY col"); $sth->execute; $matrixstring .= join (" ", (map {$_->[0]} @{$sth->fetchall_arrayref()}))."\n"; } $sth->finish; return undef if $matrixstring eq "\n"x4; return $matrixstring; } sub _get_IDlist_by_query { # called by get_MatrixSet # should be able to search for arbitrary tags...hmmm my ($self, %args) = @_; my ($TABLES, %arrayref); my (%intersected_set); foreach my $key(keys %args){ unless ( $key eq "-min_ic" or $key eq "-matrixtype"){ my $oldkey=$key; $key=~s/-//; $arrayref{$key}= $args{$oldkey}; } } my @andconditions; $TABLES = 'MATRIX_ANNOTATION '; #special case: get all matrices unless (keys %arrayref){ my $query = "SELECT DISTINCT ID FROM $TABLES "; my $sth = $self->dbh->prepare($query); $sth->execute() or $self->throw("Query failed:\n$query\n"); my @ary; while (my ($id) = $sth->fetchrow_array()) { push (@ary, $id); } return(@ary); } foreach my $key (keys %arrayref) { #print "key: $key\n"; if ($key eq 'ID'){ push @andconditions, "(". join(" OR ", (map {"MATRIX_ANNOTATION.ID=". $self->dbh->quote($_) } @{$arrayref{$key}} )). ")"; } else{ push @andconditions, "(". join(" OR ", (map {"MATRIX_ANNOTATION.tag=". $self->dbh->quote($key)." AND val=". $self->dbh->quote($_) } @{$arrayref{$key}} )). ")"; } push (@andconditions, 1) unless(@andconditions); my $WHERE = ((scalar @andconditions) == 0) ? 
"" : " WHERE "; my $query = "SELECT DISTINCT ID FROM $TABLES $WHERE". join(" AND ", @andconditions); # warn $query; undef @andconditions; my $sth = $self->dbh->prepare($query); $sth->execute() or $self->throw("Query failed:\n$query\n"); # collect IDs and return my %current_query; while (my ($id) = $sth->fetchrow_array()) { $current_query{$id}=1; } unless (%intersected_set){ %intersected_set= %current_query; next; } # do intersect foreach my $key (keys %intersected_set){ delete $intersected_set{ $key} unless $current_query{$key}; } } my @ary; foreach my $key (keys %intersected_set){ push (@ary, $key); # warn "$key\n"; } return (@ary); } sub DESTROY { $_[0]->dbh->disconnect() if $_[0]->dbh; } TFBS-0.7.1/blib/lib/TFBS/DB/LocalTRANSFAC.pm000066400000000000000000000167241305752266700175170ustar00rootroot00000000000000# TFBS module for TFBS::DB::LocalTRANSFAC # # Copyright Stephen Montgomery smontgom@bcgsc.bc.ca # # Contributors: Boris Lenhard, Leonardo Marino-Ramirez # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::DB::LocalTRANSFAC - interface to local transfac database position frequency matrices (matrix.dat) -------------------------------- NOTICE ---------------------------------- The TRANSFAC database is free for non-commercial use. For commercial use the TRANSFAC databases and programs have to be licensed. Please read the DISCLAIMER at http://transfac.gbf.de/TRANSFAC/disclaimer.htm. 
------------------------------------------------------------------------- =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to TRANSFAC data my $db = TFBS::DB::LocalTRANSFAC->connect(-localdir => '/home/someusr'); localdir is the location of the matrix.dat TRANSFAC datafile =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('V$CEBPA_01','PFM'); #retrieving a PWM by TRANSFAC accession number my $pwm = $db->get_Matrix_by_acc('M00116', 'PWM'); =back =head1 DESCRIPTION TFBS::DB::LocalTRANSFAC is a read only database interface that fetches TRANSFAC matrix data from a local TRANSFAC install (matrix.dat) =cut package TFBS::DB::LocalTRANSFAC; use vars qw(@ISA); use strict; use TFBS::DB::TRANSFAC; use TFBS::Matrix::PFM; @ISA = qw(TFBS::DB::TRANSFAC); =head2 connect Title : connect Usage : my $db = TFBS::DB::TRANSFAC->connect(%args); Function: Creates a TRANSFAC database connection object, which can be used to retrieve matrices from a locally installed TRANSFAC database Returns : a TFBS::DB::TRANSFAC object Args : -localdir # REQUIRED: the directory of the matrix.dat TRANSFAC # datafile. matrix.dat must have read access. 
-accept_conditions # OPTIONAL: by setting this to a true # value, you confirm that you # have read and accepted the terms # of use of TRANSFAC at # http://transfac.gbf.de/TRANSFAC/disclaimer.htm; # this also suppresses the annoying # message that is printed to STDERR # upon invoking the method =cut sub connect { my ($caller, %args) = @_; my $self = bless { 'loc' => $args{'-localdir'}}, ref $caller || $caller; unless (defined ($args{-accept_conditions}) and $args{-accept_conditions}) { print STDERR <connect(-accept_conditions => 1); -------------------------------------------------------------------------- ENDNOTICE ; } unless (defined $args{-localdir}) { $self->throw("Need directory of TRANSFAC database"); } return $self; } =head2 get_Matrix_by_acc Title : get_Matrix_by_acc Usage : my $pfm = $db->get_Matrix_by_acc('V$CREB_01', 'PFM'); Function: fetches matrix data under the given TRANSFAC aaccession number from database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PFM is retrieved by default. 
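The retrieval described above amounts to finding one record of matrix.dat and collecting its count rows. The following is a minimal sketch of that idea, not the module's own code: the record is made up for illustration (the counts are not taken from TRANSFAC), and C<parse_record> is a hypothetical helper that assumes the two-letter line codes AC, ID and NA followed by numbered count lines.

```perl
use strict;
use warnings;

# Made-up record in the general layout of matrix.dat; the values are
# illustrative only and are not taken from TRANSFAC.
my @record = (
    "AC  M99999\n",
    "ID  X\$EXAMPLE_01\n",
    "NA  Example\n",
    "01  4  0  0  0  A\n",
    "02  0  3  1  0  C\n",
    "03  0  0  0  4  T\n",
    "//\n",
);

# Hypothetical helper: collect accession, ID, name and the per-base
# count rows (A, C, G, T) from the lines of one record.
sub parse_record {
    my (@lines) = @_;
    my (%rec, @As, @Cs, @Gs, @Ts);
    foreach my $line (@lines) {
        $rec{acc}  = $1 if $line =~ /^AC\s+(\S+)/;
        $rec{ID}   = $1 if $line =~ /^ID\s+(\S+)/;
        $rec{name} = $1 if $line =~ /^NA\s+(\S+)/;
        if ($line =~ /^\d{2}\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*)/) {
            push @As, $1;
            push @Cs, $2;
            push @Gs, $3;
            push @Ts, $4;
        }
    }
    return unless @As;    # no count lines: nothing to build a matrix from
    $rec{matrix} = [ \@As, \@Cs, \@Gs, \@Ts ];
    return \%rec;
}

my $rec = parse_record(@record);
print "$rec->{ID} ($rec->{acc}): A row = @{ $rec->{matrix}[0] }\n";
```

The four row arrays are in the order a position frequency matrix constructor would typically expect (A, C, G, T), so the result could be passed on to build a PFM and then converted to the requested matrix type.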

=cut

sub get_Matrix_by_acc {
    my ($self, $acc, $mt) = @_;
    unless (defined $acc) {
        $self->throw("No parameters passed to get_Matrix_by_acc.");
    }
    my $datablock = _get_Matrix_Block ( 'acc' => $acc, 'loc' => $self->{'loc'});
    return $self->_get_Matrix_by_Block($datablock, $mt);
}

=head2 get_Matrix_by_ID

 Title   : get_Matrix_by_ID
 Usage   : my $pfm = $db->get_Matrix_by_ID('V$CREB_01', 'PFM');
 Function: fetches matrix data under the given TRANSFAC ID from the
           database and returns a TFBS::Matrix::* object
 Returns : a TFBS::Matrix::* object; the exact type of the
           object depending on the second argument (allowed
           values are 'PFM', 'ICM', and 'PWM'); returns undef if
           matrix with the given ID is not found
 Args    : (Matrix_ID, Matrix_type)
           Matrix_ID is a string; Matrix_type is one of the
           following: 'PFM' (raw position frequency matrix),
           'ICM' (information content matrix) or 'PWM' (position
           weight matrix)
           If Matrix_type is omitted, a PFM is retrieved by default.

=cut

sub get_Matrix_by_ID {
    my ($self, $ID, $mt) = @_;
    unless (defined $ID) {
        $self->throw("No parameters passed to get_Matrix_by_ID.");
    }
    my $datablock = _get_Matrix_Block ( 'ID' => $ID, 'loc' => $self->{'loc'});
    return $self->_get_Matrix_by_Block($datablock, $mt);
}

sub _get_Matrix_Block {
    my %params = @_;
    my $loc = $params{'loc'};
    my $acc = $params{'acc'};
    my $ID = $params{'ID'};

    $loc = $loc .
"/matrix.dat";

    open(HANDLE, '<', $loc)
        || die ("File opening failed for matrix.dat: Check file permissions");
    my @raw_data = <HANDLE>;

    my @block = ();
    my $hit = 0;
    foreach my $line (@raw_data) {
        if ($line eq "//\n") {
            # end of a record: check whether it is the one we want
            foreach my $lineinblock (@block) {
                if (defined $ID) {
                    # tolerate any amount of whitespace after the line code
                    if ($lineinblock =~ /^ID\s+\Q$ID\E\s*$/) { $hit = 1 };
                }
                if (defined $acc) {
                    if ($lineinblock =~ /^AC\s+\Q$acc\E\s*$/) { $hit = 1 };
                }
            }
            if ($hit == 0) {
                @block = ();
            }
        }
        if ($hit == 0) {
            push @block, $line;
        }
    }

    close(HANDLE);
    return \@block;
}

sub _get_Matrix_by_Block {
    my ($self, $datablock, $mt) = @_;
    my @datalines = @$datablock;
    my (@As, @Cs, @Gs, @Ts, $name, $ID, $acc);
    foreach my $line (@datalines) {
        if ($line =~ /NA\s+(\S+)\n/) {
            $name = $1;
        }
        if ($line =~ /ID\s+(\S+)\n/) {
            $ID = $1;
        }
        if ($line =~ /AC\s+(\S+)\n/) {
            $acc = $1;
        }
        #if ($line =~ /\d{2}\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\n/) {
        # change to allow for both older and newer file format
        # contributed by Leonardo Marino-Ramirez:
        # Updated 2003-09-05 to enable parsing of non-integer entries
        if ($line =~ /^\d{2}\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*)\s+(\d+\.?\d*).*$/) {
            push @As, $1;
            push @Cs, $2;
            push @Gs, $3;
            push @Ts, $4;
        }
    }
    return undef unless @As;

    my $pfm = TFBS::Matrix::PFM->new ( -ID => $ID,
                                       -name => $name,
                                       -tags => {acc=>$acc},
                                       -matrix => [ \@As, \@Cs, \@Gs, \@Ts]
                                       );
    if (!defined($mt) or uc($mt) eq "PFM") { return $pfm; }
    elsif (uc($mt) eq "ICM") { return $pfm->to_ICM; }
    elsif (uc($mt) eq "PWM") { return $pfm->to_PWM; }
    else {
        $self->throw("Unrecognized matrix format: $mt");
    }
}

1;
TFBS-0.7.1/blib/lib/TFBS/DB/TRANSFAC.pm000066400000000000000000000165311305752266700165400ustar00rootroot00000000000000
# TFBS module for TFBS::DB::TRANSFAC
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::DB::TRANSFAC - interface to database of TRANSFAC public position
frequency matrices at TESS (http://www.cbil.upenn.edu/tess)

-------------------------------- NOTICE ----------------------------------
  The
TRANSFAC database is free for non-commercial use. For commercial use the TRANSFAC databases and programs have to be licensed. Please read the DISCLAIMER at http://transfac.gbf.de/TRANSFAC/disclaimer.htm. ------------------------------------------------------------------------- =head1 SYNOPSIS =over 4 =item * creating a database object by connecting to TRANSFAC data my $db = TFBS::DB::TRANSFAC->connect(); =item * retrieving a TFBS::Matrix::* object from the database # retrieving a PFM by ID my $pfm = $db->get_Matrix_by_ID('V$CEBPA_01','PFM'); #retrieving a PWM by TRANSFAC accession number my $pwm = $db->get_Matrix_by_acc('M00116', 'PWM'); =back =head1 DESCRIPTION TFBS::DB::TRANSFAC is a read only database interface that fetches TRANSFAC matrix data from TESS web interface (http://www.cbil.upen.edu/TESS) and returns TFBS::Matrix::* objects. =cut package TFBS::DB::TRANSFAC; use vars qw(@ISA $ua); use strict; use Bio::Root::Root; use TFBS::Matrix::PFM; use LWP::Simple qw($ua get); @ISA = qw(TFBS::DB Bio::Root::Root); =head2 connect Title : connect Usage : my $db = TFBS::DB::TRANSFAC->connect(%args); Function: Creates a TRANSFAC database connection object, which can be used to retrieve matrices from public TRANSFAC databases via the web Returns : a TFBS::DB::TRANSFAC object Args : -proxy # OPTIONAL: a http proxy server name, # usually required for accessing TRANSFAC from behind # a firewall -accept_conditions # OPTIONAL: by setting this to a true # value, you confirm that you # have read and accepted the terms # of use of TRANSFAC at # http://transfac.gbf.de/TRANSFAC/disclaimer.htm; # this also suppresses the annoying # message that is printed to STDERR # upon invoking the method =cut sub connect { my ($caller, %args) = @_; my $self = bless {}, ref $caller || $caller; unless (defined ($args{-accept_conditions}) and $args{-accept_conditions}) { print STDERR <connect(-accept_conditions => 1); -------------------------------------------------------------------------- 
ENDNOTICE ; } if (defined $args{'-proxy'}) { $ua->proxy('http',$args{'-proxy'}); } return $self; } =head2 new Title : connect Usage : my $db = TFBS::DB::TRANSFAC->connect(%args); Function: Here, I is just a synonim for I (to make the interface consistent with other bioperl read-obly Bio::DB::* objects) Returns : a TFBS::DB::TRANSFAC object Args : -accept_conditions # see explanation at I =cut sub new { my ($caller, %args) = @_; $caller->connect(%args); } =head2 get_Matrix_by_ID Title : get_Matrix_by_ID Usage : my $pfm = $db->get_Matrix_by_ID('V$CREB_01', 'PFM'); Function: fetches matrix data under the given TRANSFAC ID from the database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PFM is retrieved by default. 
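TRANSFAC IDs such as 'V$CREB_01' contain characters that are not safe in a URL query string. The sketch below shows one way to percent-encode the key before appending it to the request URL used by this module. C<escape_key> is a hypothetical helper, and whether the TESS server requires the encoding is an assumption; the module itself passes the ID through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every character outside the
# unreserved set (letters, digits, '-', '_', '.', '~').
sub escape_key {
    my ($key) = @_;
    $key =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $key;
}

my $base = "http://www.cbil.upenn.edu/cgi-bin/tess/tess33";
my $url  = $base . "?request=MTX-DBRTRV-Id&key=" . escape_key('V$CREB_01');
print "$url\n";
```

Keys made only of letters, digits and underscores, such as accession numbers, pass through unchanged.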
=cut sub get_Matrix_by_ID { my ($self, $ID, $mt) = @_; unless (defined $ID) { $self->throw("No parameters passed to get_Matrix_by_ID."); } my $url = "http://www.cbil.upenn.edu/cgi-bin/tess/tess33?request=MTX-DBRTRV-Id&key=$ID"; # my $url = "http://www.cbil.upenn.edu/cgi-bin/tess/tess33?request=MTX-DBRTRV-Id&key=$ID"; return $self->_get_Matrix_by_URL($url, $mt); } =head2 get_Matrix_by_acc Title : get_Matrix_by_acc Usage : my $pfm = $db->get_Matrix_by_acc('V$CREB_01', 'PFM'); Function: fetches matrix data under the given TRANSFAC aaccession number from database and returns a TFBS::Matrix::* object Returns : a TFBS::Matrix::* object; the exact type of the object depending on the second argument (allowed values are 'PFM', 'ICM', and 'PWM'); returns undef if matrix with the given ID is not found Args : (Matrix_ID, Matrix_type) Matrix_ID is a string; Matrix_type is one of the following: 'PFM' (raw position frequency matrix), 'ICM' (information content matrix) or 'PWM' (position weight matrix) If Matrix_type is omitted, a PFM is retrieved by default. 
=cut sub get_Matrix_by_acc { my ($self, $acc, $mt) = @_; unless (defined $acc) { $self->throw("No parameters passed to get_Matrix_by_ID."); } my $url = "http://www.cbil.upenn.edu/cgi-bin/tess/tess33?request=MTX-DBRTRV-Accno&key=$acc"; return $self->_get_Matrix_by_URL($url, $mt); } sub get_MatrixSet { my ($self, %args) = @_; # not yet implemented } sub _get_Matrix_by_URL { my ($self, $url, $mt) = @_; my $HTMLpage = get $url || return undef; my (@As, @Cs, @Gs, @Ts, $name, $ID, $acc); my @lines = split "\n", $HTMLpage; foreach my $line (@lines) { $line =~ s/\r//; $line =~ s/<\/{0,1}b>//gi; $line =~ s/ //gi; if ($line =~ /Name<\/td>([^<]+)([^<]+)([^<]+) new ( -ID => $ID, -name => $name, -tags => {acc=>$acc}, -matrix => [ \@As, \@Cs, \@Gs, \@Ts] ); if (!defined($mt) or uc($mt) eq "PFM") {return $pfm;} elsif (uc($mt) eq "ICM") {return $pfm->to_ICM;} elsif (uc($mt) eq "PWM") {return $pfm->to_PWM;} else { $self->throw("Unrecognized matrix format: $mt"); } } 1; TFBS-0.7.1/blib/lib/TFBS/Ext/000077500000000000000000000000001305752266700152465ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/Ext/.exists000066400000000000000000000000001305752266700165540ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/Ext/pwmsearch.pm000066400000000000000000000073301305752266700176000ustar00rootroot00000000000000package TFBS::Ext::pwmsearch; require 5.005_62; use strict; use warnings; use vars qw(@ISA @EXPORT @EXPORT_OK %EXPORT_TAGS $VERSION); use Bio::SeqIO; use File::Temp qw (:POSIX); require Exporter; require DynaLoader; our @ISA = qw(Exporter DynaLoader); # Items to export into callers namespace by default. Note: do not export # names by default without a very good reason. Use EXPORT_OK instead. # Do not simply export all your public functions/methods/constants. # This allows declaration use TFBS::Ext::pwmsearch ':all'; # If you do not need this, moving things directly into @EXPORT or @EXPORT_OK # will save memory. 
%EXPORT_TAGS = ( 'all' => [ qw(

) ] );

@EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

@EXPORT = qw(

);
$VERSION = '0.2';

bootstrap TFBS::Ext::pwmsearch $VERSION;

# Preloaded methods go here.

sub pwmsearch {
    my ($matrixobj, $seqobj, $threshold, $start, $end) = @_;
    $start = 1 if !defined($start);
    $end = $seqobj->length if !defined($end);
    my $matrixfile = tmpnam();
    open (MATRIX, ">$matrixfile") or die ("Error opening temporary file.");
    print MATRIX $matrixobj->rawprint();
    close MATRIX;
    my $outfile = tmpnam();

    # pwm_search is confused by long descriptions - we delete desc temporarily:
    my $save_desc = $seqobj->desc();
    $seqobj->desc("");

    my $seqfile;
    if ($seqobj->{_fastafile}) {
        $seqfile = $seqobj->{_fastafile};
    }
    else {
        $seqfile = tmpnam();
        my $outstream = Bio::SeqIO->new(-file=>">$seqfile", -format=>"fasta");
        $outstream->write_seq(Bio::Seq->new(-seq =>$seqobj->subseq($start, $end),
                                            -id  =>$seqobj->id));
        $outstream->close();
    }
    $seqobj->desc($save_desc);

    # calculate threshold
    if ($threshold) {
        if ($threshold =~ /(.+)%/) {
            # percentage
            $threshold = $matrixobj->{min_score} +
                ($matrixobj->{max_score} - $matrixobj->{min_score})* $1/100;
        }
        else {
            # absolute value
            # $threshold = $args{-threshold};
        }
    }
    else {
        # no threshold given
        $threshold = $matrixobj->{min_score} -1;
    }

    search_xs($matrixfile, $seqfile, $threshold,
              $matrixobj->name()."",
              $matrixobj->{'class'}."", $outfile);

    unlink $seqfile unless $seqobj->{'_fastafile'};
    unlink $matrixfile;

    my $hitlist = TFBS::SiteSet->new();
    my ($TFname, $TFclass) = ($matrixobj->{name}, $matrixobj->{class});

    my $save_delim = $/;   # bugfix submitted
    local $/ = "\n";       # by Michal Lapidot
    open (OUTFILE, $outfile) or die("Could not read temporary outfile");
    while (my $line = <OUTFILE>) {
        # print STDERR $line;
        chomp $line;
        $line =~ s/^\s+//;
        $line =~ s/ *\t */\t/g;
        my ($seq_id, $factor, $class, $strand, $score, $pos, $siteseq) =
            (split /\t/, $line)[0, 2, 3, 4, 5, 7, 9];
        my $num_strand = ($strand eq "-")?
"-1" : "1"; my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id()."", -seqobj => $seqobj, -strand => $num_strand."", -pattern => $matrixobj, -siteseq => $siteseq."", -score => $score."", -start => $pos +$start -1, -end => $pos +$start +length($siteseq) -2 ); $hitlist->add_site($site); } close OUTFILE; $/ = $save_delim; unlink $outfile; return $hitlist; } 1; __END__ =head1 NAME TFBS::Ext::pwmsearch - Perl extension for scanning a DNA sequence object with a position weight matrix =head1 SYNOPSIS use TFBS::Ext::pwmsearch; pwmsearch =head1 DESCRIPTION Stub documentation for TFBS::Ext::pwmsearch, created by h2xs. It looks like the author of the extension was negligent enough to leave the stub unedited. Blah blah blah. =head2 EXPORT None by default. =head1 AUTHOR A. U. Thor, a.u.thor@a.galaxy.far.far.away =head1 SEE ALSO perl(1). =cut TFBS-0.7.1/blib/lib/TFBS/Matrix.pm000066400000000000000000000174401305752266700163160ustar00rootroot00000000000000# TFBS module for TFBS::Matrix # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix - base class for matrix patterns, containing methods common to all =head1 DESCRIPTION TFBS::Matrix is a base class consisting of universal constructor called by its subclasses (TFBS::Matrix::*), and matrix manipulation methods that are independent of the matrix type. It is not meant to be instantiated itself. =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
=cut

# The code begins HERE:

package TFBS::Matrix;
use vars '@ISA';

use PDL; # this dependency has to be eliminated in the future versions
use TFBS::PatternI;

use strict;

@ISA = qw(TFBS::PatternI);

sub new {
    my $class = shift;
    my %args = @_;
    my $self = bless {}, ref($class) || $class;

    # first figure out how it was called
    # we need (-dbh and (-ID or -name) for fetching it from a database
    # or -matrix for direct matrix input

    if (defined $args{'-matrix'}) {
        $self->set_matrix($args{'-matrix'});
    }
    elsif (defined $args{'-matrixstring'}) {
        $self->set_matrix($args{'-matrixstring'});
    }
    elsif (defined $args{-matrixfile}) {
        my $matrixstring;
        open (FILE, $args{-matrixfile})
            or $self->throw("Could not open $args{-matrixfile}");
        {
            # slurp the whole file in one read
            local $/ = undef;
            $matrixstring = <FILE>;
        }
        close FILE;
        $self->set_matrix($matrixstring);
    }
    else {
        $self->throw("No matrix or db object provided.");
    }

    # Set the object data.
    # Parameters specified in constructor call override those
    # fetched from the database.

    $self->{'ID'}     = ($args{-ID}     or $self->{ID}     or "Unknown");
    $self->{'name'}   = ($args{-name}   or $self->{name}   or "Unknown");
    $self->{'class'}  = ($args{-class}  or $self->{class}  or "Unknown");
    $self->{'strand'} = ($args{-strand} or $self->{strand} or "+");
    $self->{'bg_probabilities'} =
        ($args{'-bg_probabilities'} || {A => 0.25, C => 0.25, G => 0.25, T => 0.25});
    $self->{'tags'} = $args{-tags} ? ((ref($args{-tags}) eq "HASH") ?
$args{-tags} : {} ) :{};
    return $self;
}

=head2 matrix

 Title   : matrix
 Usage   : my $matrix = $pwm->matrix();
           $pwm->matrix( [ [12, 3, 0, 0, 4, 0],
                           [ 0, 0, 0,11, 7, 0],
                           [ 0, 9,12, 0, 0, 0],
                           [ 0, 0, 0, 1, 1,12]
                         ]);
 Function: get/set for the matrix data
 Returns : a reference to 2D array of integers (PFM) or floats (ICM, PWM)
 Args    : none for get;
           a four line string, reference to 2D array, or a 2D piddle for set

=cut

sub matrix {
    my ($self, $matrixdata) = @_;
    $self->set_matrix($matrixdata) if $matrixdata;
    return $self->{'matrix'};
}

=head2 pdl_matrix

 Title   : pdl_matrix
 Usage   : my $pdl = $pwm->pdl_matrix();
 Function: access the PDL matrix used to store the actual
           matrix data directly
 Returns : a PDL object, aka a piddle
 Args    : none

=cut

sub pdl_matrix {
    pdl $_[0]->{'matrix'};
}

sub set_matrix {
    my ($self, $matrixdata) = @_;

    # The input matrix (specified as -matrix => in the constructor call)
    # can either be
    #  * a 2D regular perl array with 4 rows,
    #  * a piddle (FIXME - check for 4 rows), or
    #  * a four-line string of numbers

    # print STDERR "MATRIX>>>".$matrixdata;
    if (ref($matrixdata) eq "ARRAY"
        and ref($matrixdata->[0]) eq "ARRAY"
        and scalar(@{$matrixdata}) == 4)
    {
        # it is a perl array
        $self->{'matrix'} = $matrixdata;
    }
    elsif (ref($matrixdata) eq "PDL") {
        # it's a piddle
        $self->{matrix} = _pdl_to_matrixref($matrixdata);
    }
    elsif (!ref($matrixdata))
           #and (scalar split "\n",$matrixdata) == 4)
    {
        # it's a string then
        $self->{matrix} = $self->_matrix_from_string($matrixdata);
    }
    else {
        $self->throw("Wrong data type/format for -matrix.\n".
                     "Acceptable formats are Array of Arrays (4 rows),\n".
                     "PDL Array, (4 rows),\n".
"or plain string (4 lines)."); } # $self->_set_min_max_score(); return 1; } sub _matrix_from_string { my ($self, $matrixstring) = @_; my @array = (); foreach ((split "\n", $matrixstring)[0..3]) { s/^\s+//; s/\s+$//; push @array, [split]; } return \@array; } sub _set_min_max_score { my ($self) = @_; my $transpose = $self->pdl_matrix->xchg(0,1); $self->{min_score} = sum(minimum $transpose); $self->{max_score} = sum(maximum $transpose); } sub _load { my ($self, $field, $value) = @_; if (substr(ref($self->{db}),0,5) eq "DBI::") { # database retrieval } elsif (-d $self->{dbh}) { # retrieval from .pwm files in a directory $self->_lookup_in_matrixlist($field, $value) or do { warn ("Matrix with $field=>$value not found."); return undef; }; my $ID = $self->{ID}; my $DIR = $self->{dbh}; $self->set_matrix(scalar `cat $DIR/$ID.pwm`); # FIXME - temporary } else { $self->throw("-dbh is not a valid database handle or a directory."); } } =head2 revcom Title : revcom Usage : my $revcom_pfm = $pfm->revcom(); Function: create a matrix pattern object which is reverse complement of the current one Returns : a TFBS::Matrix::* object of the same type as the one the method acted upon Args : none =cut sub revcom { my ($self) = @_; my $revcom_matrix = $self->new(-matrix => $self->pdl_matrix->slice('-1:0,-1:0'), # the above line rotates the original matrix 180 deg, -ID => ($self->{ID} or ""), -name => ($self->{name} or ""), -class => ($self->{class} or ""), -strand => ($self->{strand} and $self->{strand} eq "-") ? 
"+" : "-", -tags => ($self->{tags} or {}) ); return $revcom_matrix; } =head2 rawprint Title : rawprint Usage : my $rawstring = $pfm->rawprint); Function: convert matrix data to a simple tab-separated format Returns : a four-line string of tab-separated integers or floats Args : none =cut sub rawprint { my $self = shift; my $pwmstring = sprintf ( $self->pdl_matrix ); $pwmstring =~ s/\[|\]//g; # lose [] $pwmstring =~ s/\n /\n/g; # lose leading spaces my @pwmlines = split("\n", $pwmstring); # f $pwmstring = join ("\n", @pwmlines[2..5])."\n"; return $pwmstring; } =head2 prettyprint Title : prettyprint Usage : my $prettystring = $pfm->prettyprint(); Function: convert matrix data to a human-readable string format Returns : a four-line string with nucleotides and aligned numbers Args : none =cut sub prettyprint { my $self = shift; my $pwmstring = sprintf ( $self->pdl_matrix ); $pwmstring =~ s/\[|\]//g; # lose [] $pwmstring =~ s/\n /\n/g; # lose leading spaces my @pwmlines = split("\n", $pwmstring); # @pwmlines = ("A [$pwmlines[2] ]", "C [$pwmlines[3] ]", "G [$pwmlines[4] ]", "T [$pwmlines[5] ]"); $pwmstring = join ("\n", @pwmlines)."\n"; return $pwmstring; } =head2 length Title : length Usage : my $pattern_length = $pfm->length; Function: gets the pattern length in nucleotides (i.e. 
           number of columns in the matrix)
 Returns : an integer
 Args    : none

=cut

sub length {
    my $self = shift;
    return $self->pdl_matrix->getdim(0);
}

sub _pdl_to_matrixref {
    my ($matrixdata) = @_;
    unless ($matrixdata->isa("PDL")) {
        die "A non-PDL object passed to _pdl_to_matrixref";
    }
    my @list = list $matrixdata;
    my @array;
    my $matrix_width = scalar(@list) / 4;
    for (0..3) {
        push @array, [splice(@list, 0, $matrix_width)];
    }
    return \@array;
}

sub DESTROY {
    # nothing
}

1;
TFBS-0.7.1/blib/lib/TFBS/Matrix/000077500000000000000000000000001305752266700157525ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/Matrix/ICM.pm000066400000000000000000000527701305752266700167330ustar00rootroot00000000000000
# TFBS module for TFBS::Matrix::ICM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix::ICM - class for information content matrices of nucleotide
patterns

=head1 SYNOPSIS

=over 4

=item * creating a TFBS::Matrix::ICM object manually:

    my $matrixref = [ [ 2.00, 0.30, 0.00, 0.00, 0.24, 0.00 ],
                      [ 0.00, 0.00, 0.00, 1.45, 0.42, 0.00 ],
                      [ 0.00, 0.89, 2.00, 0.00, 0.00, 0.00 ],
                      [ 0.00, 0.00, 0.00, 0.13, 0.06, 2.00 ]
                    ];
    my $icm = TFBS::Matrix::ICM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );
    # or

    my $matrixstring =
        "2.00 0.30 0.00 0.00 0.24 0.00\n0.00 0.00 0.00 1.45 0.42 0.00\n0.00 0.89 2.00 0.00 0.00 0.00\n0.00 0.00 0.00 0.13 0.06 2.00";

    my $icm = TFBS::Matrix::ICM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );

=item * retrieving a TFBS::Matrix::ICM object from a database:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and retrieve
TFBS::Matrix::* objects from them.)
    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $icm = $db_obj->get_Matrix_by_ID("M0001", "ICM");
    # or
    my $icm = $db_obj->get_Matrix_by_name("MyProfile", "ICM");

=item * retrieving list of individual TFBS::Matrix::ICM objects
from a TFBS::MatrixSet object

(see documentation of TFBS::MatrixSet to learn how to create
objects for storage and manipulation of multiple matrices)

    my @icm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * drawing a sequence logo

    $icm->draw_logo(-file=>"logo.png",
                    -full_scale =>2.25,
                    -xsize=>500,
                    -ysize =>250,
                    -graph_title=>"C/EBPalpha binding site logo",
                    -x_title=>"position",
                    -y_title=>"bits");

=back

=head1 DESCRIPTION

TFBS::Matrix::ICM is a class whose instances are objects representing
information content matrices (ICMs). An ICM is normally calculated from a
raw position frequency matrix (see L<TFBS::Matrix::PFM> for the explanation
of position frequency matrices). For example, given the following
position frequency matrix,

    A:[ 12     3     0     0     4     0  ]
    C:[  0     0     0    11     7     0  ]
    G:[  0     9    12     0     0     0  ]
    T:[  0     0     0     1     1    12  ]

the standard computational procedure is applied to convert it into the
following information content matrix:

    A:[2.00  0.30  0.00  0.00  0.24  0.00]
    C:[0.00  0.00  0.00  1.45  0.42  0.00]
    G:[0.00  0.89  2.00  0.00  0.00  0.00]
    T:[0.00  0.00  0.00  0.13  0.06  2.00]

which contains the "weights" associated with the occurrence of each
nucleotide at the given position in a pattern.

A TFBS::Matrix::PWM object is equipped with methods to search nucleotide
sequences and pairwise alignments of nucleotide sequences with the pattern
they represent, and return a set of sites in nucleotide sequence
(a TFBS::SiteSet object for single sequence search, and a TFBS::SitePairSet
for the alignment search).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object methods.
Internal methods are preceded with an underscore.

=cut

# The code starts HERE:

package TFBS::Matrix::ICM;

use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::SeqIO;
use TFBS::Matrix;
#use GD;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

#################################################################
# PUBLIC METHODS
#################################################################

=head2 new

 Title   : new
 Usage   : my $icm = TFBS::Matrix::ICM->new(%args)
 Function: constructor for the TFBS::Matrix::ICM object
 Returns : a new TFBS::Matrix::ICM object
 Args    : # you must specify either one of the following three:

           -matrix,      # reference to an array of arrays of numbers
              #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited numbers
              #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited numbers
           #######

           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # a hash reference, OPTIONAL

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"ICM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_check_ic_validity();
    return $self;
}

=head2 to_PWM

 Title   : to_PWM
 Usage   : my $pwm = $icm->to_PWM()
 Function: converts an information content matrix
           (a TFBS::Matrix::ICM object) to position weight matrix.
           At present it assumes uniform background distribution of
           nucleotide frequencies.
 Returns : a new TFBS::Matrix::PWM object
 Args    : none; in the future releases, it should be able to accept
           a user defined background probability of the four
           nucleotides

=cut

sub to_PWM {
    my ($self) = @_;
    $self->throw ("Method to_PWM not yet implemented.");
}

=head2 draw_logo

 Title   : draw_logo
 Usage   : my $gdImageObj = $icm->draw_logo(%args)
 Function: Draws a "sequence logo", a graphical representation
           of a possibly degenerate fixed-width nucleotide
           sequence pattern, from the information content matrix
 Returns : a GD::Image object;
           if you only need the image file you can ignore it
 Args    : -file,       # the name of the output PNG image file
                        # OPTIONAL: default none
           -xsize       # width of the image in pixels
                        # OPTIONAL: default 600
           -ysize       # height of the image in pixels
                        # OPTIONAL: default 5/8 of -xsize
           -margin      # size of image margins in pixels
                        # OPTIONAL: default 15% of -ysize
           -full_scale  # the maximum value on the y-axis, in bits
                        # OPTIONAL: default 2.25
           -graph_title,# the graph title
                        # OPTIONAL: default none
           -x_title,    # x-axis title; OPTIONAL: default none
           -y_title     # y-axis title; OPTIONAL: default none
           -error_bars  # reference to an array of S.D.
                        # values for each column; OPTIONAL
           -ps          # if true, produces a postscript string
                        # instead of a GD::Image object
           -pdf         # if true AND the -file argument is used,
                        # produces an output pdf file

=cut

sub draw_logo {
    no strict;
    my $self = shift;
    my %args = (-xsize      => 600,
                -full_scale => 2.25,
                -graph_title=> "",
                -x_title    => "",
                -y_title    => "",
                @_);
    # Other parameters that can be specified:
    #       -ysize -line_width -margin
    # do not have a fixed default value
    #   - they are calculated from xsize if not specified

    # draw postscript logo if asked for
    if ($args{'-ps'} || $args{'-pdf'}){
        return _draw_ps_logo($self, @_);
    }

    require GD;

    my ($xsize,$FULL_SCALE, $x_title, $y_title)
        = @args{qw(-xsize -full_scale -x_title -y_title)} ;

    my $PER_PIXEL_LINE = 300;

    # calculate other parameters if not specified

    my $line_width = ($args{-line_width} or int ($xsize/$PER_PIXEL_LINE) or 1);
    my $ysize      = ($args{-ysize} or $xsize/1.6);
    # remark (the line above): 1.6 is a standard screen x:y ratio
    my $margin     = ($args{-margin} or $ysize*0.15);

    my $image = GD::Image->new($xsize, $ysize);
    my $white = $image->colorAllocate(255,255,255);
    my $black = $image->colorAllocate(0,0,0);
    my $motif_size = $self->pdl_matrix->getdim(0);
    my $font = ((&GD::gdTinyFont(), &GD::gdSmallFont(), &GD::gdMediumBoldFont(),
                 &GD::gdLargeFont(), &GD::gdGiantFont())[int(($ysize-50)/100)]
                or &GD::gdGiantFont());
    my $title_font = ((&GD::gdSmallFont(), &GD::gdMediumBoldFont(),
                       &GD::gdLargeFont(), &GD::gdGiantFont())[int(($ysize-50)/100)]
                      or &GD::gdGiantFont());

    # WRITE LABELS AND TITLE

    # graph title   #&GD::Font::MediumBold
    $image->string($title_font,
                   $xsize/2-length($args{-graph_title})* $title_font->width() /2,
                   $margin/2 - $title_font->height()/2,
                   $args{-graph_title}, $black);

    # x_title
    $image->string($font,
                   $xsize/2-length($args{-x_title})*$font->width()/2,
                   $ysize-( $margin - $font->height()*0 - 5*$line_width)/2
                       - $font->height()/2*0,
                   $args{-x_title},
                   $black);
    # y_title
    $image->stringUp($font,
                     ($margin -$font->width()- 5*$line_width)/2 -
$font->height()/2 , $ysize/2+length($args{'-y_title'})*$font->width()/2, $args{'-y_title'}, $black); # DRAW AXES # vertical: (top left to bottom right) $image->filledRectangle($margin-$line_width, $margin-$line_width, $margin-1, $ysize-$margin+$line_width, $black); # horizontal: (ditto) $image->filledRectangle($margin-$line_width, $ysize-$margin+1, $xsize-$margin+$line_width,$ysize-$margin+$line_width, $black); # DRAW VERTICAL TICKS AND LABELS # vertical axis (IC 1 and 2) my $ic_1 = ($ysize - 2* $margin) / $FULL_SCALE; foreach my $i (1..$FULL_SCALE) { $image->filledRectangle($margin-3*$line_width, $ysize-$margin - $i*$ic_1, $margin-1, $ysize-$margin+$line_width - $i*$ic_1, $black); $image->string($font, $margin-5*$line_width - $font->width, $ysize - $margin - $i*$ic_1 - $font->height()/2, $i, $black); } # DRAW HORIZONTAL TICKS AND LABELS, AND THE LOGO ITSELF # define function refs as hash elements my %draw_letter = ( A => \&draw_A, C => \&draw_C, G => \&draw_G, T => \&draw_T ); my $horiz_step = ($xsize -2*$margin) / $motif_size; foreach my $i (0..$motif_size) { $image->filledRectangle($margin + $i*$horiz_step, $ysize-$margin+1, $margin + $i*$horiz_step+ $line_width, $ysize-$margin+3*$line_width, $black); last if $i==$motif_size; # get the $i-th column of matrix my %ic; ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i); # sort nucleotides by increasing information content my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T); # draw logo column my $xlettersize = $horiz_step /1.1; my $ybottom = $ysize - $margin; foreach my $base (@draw_order) { my $ylettersize = int($ic{$base}*$ic_1 +0.5); next if $ylettersize ==0; # draw letter $draw_letter{$base}->($image, $margin + $i*$horiz_step, $ybottom - $ylettersize, $xlettersize, $ylettersize, $white); $ybottom = $ybottom - $ylettersize-1; } if ($args{'-error_bars'} and ref($args{'-error_bars'}) eq "ARRAY") { my $sd_pix = int($args{'-error_bars'}->[$i]*$ic_1); my $yt = $ybottom - $sd_pix+1; my $yb = 
$ybottom + $sd_pix-1; my $xpos = $margin + ($i+0.45)*$horiz_step; my $half_width; if ($yb > $ysize-$margin+$line_width) { $yb = $ysize-$margin+$line_width } else { $image->line($xpos - $xlettersize/8, $yb, $xpos + $xlettersize/8, $yb, $black); } $image->line($xpos, $yt, $xpos, $yb, $black); $image->line($xpos - 1 , $ybottom, $xpos+1, $ybottom, $black); $image->line($xpos - $xlettersize/8, $yt, $xpos + $xlettersize/8, $yt, $black); } # print position number on x axis $image->string($font, $margin + ($i+0.5)*$horiz_step - $font->width()/2, $ysize - $margin +5*$line_width, $i+1, $black); } # print $args{-file}; if ($args{-file}) { open (PNGFILE, ">".$args{-file}) or $self->throw("Could not write to ".$args{-file}); print PNGFILE $image->png; close PNGFILE; } return $image; } sub total_ic { return $_[0]->pdl_matrix->sum(); } =head2 _draw_ps_logo Title : _draw_ps_logo Usage : my $postscript_string = $icm->_draw_ps_logo(%args) Internal method, should be accessed using draw_logo() Function: Draws a "sequence logo", a graphical representation of a possibly degenerate fixed-width nucleotide sequence pattern, from the information content matrix Returns : a postscript string; if you only need the image file you can ignore it Args : -file, # the name of the output PNG image file # OPTIONAL: default none -xsize # width of the image in pixels # OPTIONAL: default 600 -ysize # height of the image in pixels # OPTIONAL: default 5/8 of -x_size -full_scale # the maximum value on the y-axis, in bits # OPTIONAL: default 2.25 -graph_title,# the graph title # OPTIONAL: default none -x_title, # x-axis title; OPTIONAL: default none -y_title # y-axis title; OPTIONAL: default none =cut sub _draw_ps_logo{ my $self = shift; my %args = (-xsize => 600, -full_scale => 2.25, -graph_title=> "", -x_title => "", -y_title => "", @_); my $xsize= $args{'-xsize'}; my $max_ysize= $args{'-ysize'} ||int 5* $args{'-xsize'}/8; my $ysize= 
$max_ysize*($args{'-full_scale'}-($args{'-full_scale'}-2))/$args{'-full_scale'}; my $x=100; # nternal, for placement on 'paper' my $y=100; my $out= "%!PS-Adobe-2.0 %%Orientation: Portrait %%Pages: 1 %%BoundingBox: 0 0 ".($args{'-xsize'}*1.2)." ".( $max_ysize*1.5)." %%BeginSetup %%EndSetup %%Magnification: 1.0000 %%EndProlog %%end %%save gsave\n"; #colors and correction definitions my %color; $color{'black'}="0.000 0.000 0.000 setrgbcolor"; $color{'A'}="0.000 1.000 0.000 setrgbcolor"; $color{'C'}="0.000 0.000 1.000 setrgbcolor"; $color{'G'}="1.000 0.860 0.000 setrgbcolor"; $color{'T'}="1.000 0.000 0.000 setrgbcolor"; my $fontsize= int $ysize*0.68; my $fontwidth=1.5*($xsize/$self->length()); my %w_correct; # correction of font widths $w_correct{'A'}=0.95; $w_correct{'T'}=1.05; $w_correct{'C'}=0.90; $w_correct{'G'}=0.90; my %y_next;#correction of font heights $y_next{'A'}=1; $y_next{'T'}=1; $y_next{'C'}=0.94; $y_next{'G'}=0.94; my %y_correct; #correction of font bounding boxes $y_correct{'A'}=0; $y_correct{'C'}=0.035*$fontsize; $y_correct{'G'}=0.035*$fontsize; $y_correct{'T'}=0; #define y axis,tickmarks and scaling my $font= $fontwidth/5; $out.="newpath\n ". ($x-10)." ". ($y+2*$ysize/4 )." moveto\n". "$x ". ($y+2*$ysize/4 ) ." lineto\n stroke\n"; $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-20). " ".( $y+$ysize/2)." moveto\n"; $out.=" (1) show\n grestore\n" ; $out.="newpath\n ". ($x-10)." ". ($y+$ysize )." moveto\n". "$x ". ($y+$ysize) ." lineto\n stroke\n"; $out.="newpath\n ". ($x-10)." ". ($y+$max_ysize )." moveto\n". "$x ". ($y+$max_ysize) ." lineto\n stroke\n"; $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-20). " ".( $y+$ysize)." moveto\n"; $out.=" (2) show\n grestore\n" ; $out.="newpath\n $x $y moveto\n". ($x). " ".($y+$max_ysize) ." lineto\n stroke\n"; $out.="newpath\n $x $y moveto\n". ($x+$xsize). " ".($y) ." 
lineto\n stroke\n"; # draw titles if requested if ($args{'-y_title'}){ $out.= "gsave\n/Times-Italic findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x-40). " ".( $y+$ysize/2)." moveto\n"; $out.=" 90 rotate ($args{'-y_title'}) show\n grestore\n" ; } if ($args{'-x_title'}){ $out.= "gsave\n/Times-Italic findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x+$xsize/2.5). " ".( $y*(0.60))." moveto\n"; $out.=" ($args{'-x_title'}) show\n grestore\n" ; } if ($args{'-title'}){ $out.= "gsave\n/Times-Roman findfont $color{black} [".($font*2)." 0 0 $font 0 0] makefont setfont\n".($x+$xsize/3). " ".( $y+$max_ysize*1.1)." moveto\n"; $out.=" ($args{'-title'}) show\n grestore\n" ; } # define x axis and x tickmarks my $col_width=($xsize/$self->length()) -0.006*$xsize; my $x_now; for(my $i=1; $i<=$self->length(); $i++){ $x_now=$x+$col_width*$i; $out.="newpath\n ". ($x_now)." ". ($y)." moveto\n". ($x_now)." ". ($y-$ysize/20 ) ." lineto\n stroke\n"; $out.= "gsave\n/Times-Bold findfont $color{black} [$font 0 0 $font 0 0] makefont setfont\n".($x_now-$col_width/2). " ".( $y-20)." moveto\n"; $out.=" ($i) show\n grestore\n" ; } # draw the logo foreach my $i (0..$self->length()-1 ) { # get the $i-th column of matrix my %ic; ($ic{A}, $ic{C}, $ic{G}, $ic{T}) = list $self->pdl_matrix->slice($i); my @draw_order = sort {$ic{$a}<=>$ic{$b}} qw(A C G T); #draw this position foreach my $letter (@draw_order){ $ic{$letter}=0.0000001 if ( $ic{$letter}==0); # some interpretors do not uderstand size 0 $out.= "gsave\n/Helvetica-Bold findfont $color{$letter} [".$fontwidth*$w_correct{$letter}." 
0 0 "; $out.= $ic{$letter}*$fontsize*$y_next{$letter} ; $y+=$y_correct{$letter}*$ic{$letter}; #movement that isletter specific, due to bounding boxes $out.= " 0 0] makefont setfont\n$x $y moveto\n"; $out.= " ($letter) show\n grestore\n" ; $y+=$fontsize*$ic{$letter}*0.75; #ic content move } $x+=$fontwidth/1.6; $y=100; } # save as file if requested if ($args{-file}) { open (PSFILE, ">".$args{-file}) or $self->throw("Could not write to ".$args{-file}); print PSFILE $out; close PSFILE; } if ($args{-pdf}){ system "ps2pdf $args{-file} ".$args{-file}.".pdf "; system " mv $args{-file}.pdf $args{-file}"; } return $out; } =head2 name =head2 ID =head2 class =head2 matrix =head2 length =head2 revcom =head2 rawprint =head2 prettyprint The above methods are common to all matrix objects. Please consult L to find out how to use them. =cut ################################################################# # INTERNAL METHODS ################################################################# sub _check_ic_validity { my ($self) = @_; # to do } sub DESTROY { # nothing } ################################################################# # UTILITY FUNCTIONS ################################################################# # letter drawing routines sub draw_A { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $green = $im->colorAllocate(0,255,0); my $outPoly = GD::Polygon->new(); $outPoly->addPt($x, $y+$ysize); $outPoly->addPt($x+$xsize*.42, $y); $outPoly->addPt($x+$xsize*.58, $y); $outPoly->addPt($x+$xsize, $y+$ysize); $outPoly->addPt($x+0.85*$xsize, $y+$ysize); $outPoly->addPt($x+0.725*$xsize, $y+0.75*$ysize); $outPoly->addPt($x+0.275*$xsize, $y+0.75*$ysize); $outPoly->addPt($x+0.15*$xsize, $y+$ysize); $im->filledPolygon($outPoly, $green); if ($ysize>8) { my $inPoly = GD::Polygon->new(); $inPoly->addPt($x+$xsize*.5, $y+0.2*$ysize); $inPoly->addPt($x+$xsize*.34, $y+0.6*$ysize-1); $inPoly->addPt($x+$xsize*.64, $y+0.6*$ysize-1); $im->filledPolygon($inPoly, $white); } return 1; } sub draw_C { 
my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $blue = $im->colorAllocate(0,0,255); $im->arc($x+$xsize*0.54, $y+$ysize/2,1.08*$xsize,$ysize,0,360,$blue); $im->fill($x+$xsize/2, $y+$ysize/2, $blue); if ($ysize>12) { $im->arc($x+$xsize*0.53, $y+$ysize/2, 0.75*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize/2, $y+$ysize/4+1, $x+$xsize*1.1, $y+(3*$ysize/4)-1, $white); } elsif ($ysize>3) { $im->arc($x+$xsize*0.53, $y+$ysize/2, (0.75-0.75/$ysize)*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize*0.25, $y+$ysize/2, $x+$xsize*1.1, $y+$ysize/2, $white); } return 1; } sub draw_G { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $yellow = $im->colorAllocate(200,200,0); $im->arc($x+$xsize*0.54, $y+$ysize/2,1.08*$xsize,$ysize,0,360,$yellow); $im->fill($x+$xsize/2, $y+$ysize/2, $yellow); if ($ysize>20) { $im->arc($x+$xsize*0.53, $y+$ysize/2, 0.75*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize/2, $y+$ysize/4+1, $x+$xsize*1.1, $y+$ysize/2-1, $white); } elsif($ysize>3) { $im->arc($x+$xsize*0.53, $y+$ysize/2, (0.75-0.75/$ysize)*$xsize, (0.725-0.725/$ysize)*$ysize, 0,360,$white); $im->fill($x+$xsize/2, $y+$ysize/2, $white); $im->filledRectangle($x+$xsize*0.25, $y+$ysize/2, $x+$xsize*1.1, $y+$ysize/2, $white); } $im->filledRectangle($x+0.85*$xsize, $y+$ysize/2, $x+$xsize,$y+(3*$ysize/4)-1, $yellow); $im->filledRectangle($x+0.6*$xsize, $y+$ysize/2, $x+$xsize,$y+(5*$ysize/8)-1, $yellow); return 1; } sub draw_T { my ($im, $x, $y, $xsize, $ysize, $white) = @_; my $red = $im->colorAllocate(255,0,0); $im->filledRectangle($x, $y, $x+$xsize, $y+0.16*$ysize, $red); $im->filledRectangle($x+0.42*$xsize, $y, $x+0.58*$xsize, $y+$ysize, $red); return 1; } 1; 
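The DESCRIPTION of TFBS::Matrix::ICM only says that "the standard computational procedure" turns a PFM into an ICM. The sketch below shows one plausible reading of that procedure, assuming a uniform background and no small-sample correction: each cell is the observed frequency multiplied by the information content of its column (2 bits minus the Shannon entropy). The helper name `column_to_ic` is hypothetical and not part of TFBS; the input is the example PFM from the documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (not a TFBS method): takes the four counts of one
# PFM column (A, C, G, T) and returns the four ICM entries in bits.
# Assumes a uniform background and applies no small-sample correction.
sub column_to_ic {
    my @counts = @_;
    my $total  = 0;
    $total += $_ for @counts;
    return (0) x @counts if $total == 0;

    my @p = map { $_ / $total } @counts;

    # Shannon entropy of the column in bits; 0*log2(0) is taken as 0
    my $entropy = 0;
    for my $p (@p) {
        $entropy -= $p * log($p) / log(2) if $p > 0;
    }

    my $ic = 2 - $entropy;          # log2(4) = 2 bits is the DNA maximum
    return map { $_ * $ic } @p;
}

# The example PFM from the documentation (rows A, C, G, T)
my @pfm = (
    [ 12, 3,  0,  0, 4,  0 ],
    [  0, 0,  0, 11, 7,  0 ],
    [  0, 9, 12,  0, 0,  0 ],
    [  0, 0,  0,  1, 1, 12 ],
);

my $width = scalar @{ $pfm[0] };
my @icm   = ([], [], [], []);
for my $col (0 .. $width - 1) {
    my @ic = column_to_ic(map { $pfm[$_][$col] } 0 .. 3);
    $icm[$_][$col] = $ic[$_] for 0 .. 3;
}

# Print the ICM with two decimals, one row per nucleotide
my @bases = qw(A C G T);
for my $row (0 .. 3) {
    print "$bases[$row]:[",
          join("  ", map { sprintf "%.2f", $_ } @{ $icm[$row] }), "]\n";
}
```

Rounded to two decimals, the result agrees with the information content matrix shown in the DESCRIPTION, which suggests that the documented example was produced without pseudocounts.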
TFBS-0.7.1/blib/lib/TFBS/Matrix/PFM.pm000066400000000000000000000373341305752266700167440ustar00rootroot00000000000000
# TFBS module for TFBS::Matrix::PFM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Matrix::PFM - class for raw position frequency matrix patterns

=head1 SYNOPSIS

=over 4

=item * creating a TFBS::Matrix::PFM object manually:

    my $matrixref = [ [ 12,  3,  0,  0,  4,  0 ],
                      [  0,  0,  0, 11,  7,  0 ],
                      [  0,  9, 12,  0,  0,  0 ],
                      [  0,  0,  0,  1,  1, 12 ]
                    ];
    my $pfm = TFBS::Matrix::PFM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );
    # or

    my $matrixstring =
        "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12";

    my $pfm = TFBS::Matrix::PFM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );

=item * retrieving a TFBS::Matrix::PFM object from a database:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and
retrieve TFBS::Matrix::* objects from them.)

    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pfm = $db_obj->get_Matrix_by_ID("M0001", "PFM");
    # or
    my $pfm = $db_obj->get_Matrix_by_name("MyProfile", "PFM");

=item * retrieving list of individual TFBS::Matrix::PFM objects
from a TFBS::MatrixSet object

(See the L<TFBS::MatrixSet> documentation to learn how to create
objects for storage and manipulation of multiple matrices.)

    my @pfm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * convert a raw frequency matrix to other matrix types:

    my $pwm = $pfm->to_PWM(); # convert to position weight matrix
    my $icm = $pfm->to_ICM(); # convert to information content matrix

=back

=head1 DESCRIPTION

TFBS::Matrix::PFM is a class whose instances are objects representing
raw position frequency matrices (PFMs). A PFM is derived from N
nucleotide patterns of fixed size, e.g.
the set of sequences

    AGGCCT
    AAGCCT
    AGGCAT
    AAGCCT
    AAGCCT
    AGGCAT
    AGGCCT
    AGGCAT
    AGGTTT
    AGGCAT
    AGGCCT
    AGGCCT

will give the matrix:

    A:[ 12  3  0  0  4  0 ]
    C:[  0  0  0 11  7  0 ]
    G:[  0  9 12  0  0  0 ]
    T:[  0  0  0  1  1 12 ]

which contains the count of each nucleotide at each position in the
sequence. (If you have a set of sequences as above and want to
create a TFBS::Matrix::PFM object out of them, have a look at
TFBS::PatternGen::SimplePFM module.)

PFMs are easily converted to other types of matrices, namely
information content matrices and position weight matrices. A
TFBS::Matrix::PFM object has the methods to_ICM and to_PWM which
do just that, returning TFBS::Matrix::ICM and TFBS::Matrix::PWM
objects, respectively.

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::Matrix::PFM;
use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::SeqIO;
use TFBS::Matrix;
use TFBS::Matrix::ICM;
use TFBS::Matrix::PWM;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

use constant EXACT_SCHNEIDER_MAX => 30;

#######################################################
# PUBLIC METHODS
#######################################################

=head2 new

 Title   : new
 Usage   : my $pfm = TFBS::Matrix::PFM->new(%args)
 Function: constructor for the TFBS::Matrix::PFM object
 Returns : a new TFBS::Matrix::PFM object
 Args    : # you must specify either one of the following three:

           -matrix,      # reference to an array of arrays of integers
              #or
           -matrixstring,# a string containing four lines
                         # of tab- or space-delimited integers
              #or
           -matrixfile,  # the name of a file containing four lines
                         # of tab- or space-delimited integers
           #######

           -name,        # string, OPTIONAL
           -ID,          # string, OPTIONAL
           -class,       # string, OPTIONAL
           -tags         # a hash reference,
OPTIONAL

 Warnings : Warns if the matrix provided has columns with different
            sums. Columns with different sums contradict the usual
            origin of matrix data and, unless you are absolutely sure
            that column sums _should_ be different, it would be wise
            to check your matrices.

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PFM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_check_column_sums();
    return $self;
}

=head2 column_sum

 Title   : column_sum
 Usage   : my $nr_sequences = $pfm->column_sum()
 Function: calculates the sum of elements of one column
           (the first one by default) which normally equals the
           number of sequences used to derive the PFM.
 Returns : the sum of elements of one column (an integer)
 Args    : column number (starting from 1), OPTIONAL - you DO NOT
           need to specify it unless you are dealing with a matrix
           whose columns have different sums

=cut

sub column_sum {
    my ($self, $column) = (@_,1);
    return $self->pdl_matrix->slice($column-1)->sum;
}

=head2 to_PWM

 Title   : to_PWM
 Usage   : my $pwm = $pfm->to_PWM()
 Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
           to a position weight matrix. At present it assumes uniform
           background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::PWM object
 Args    : -bg_probabilities # OPTIONAL: reference to a hash of background
                             # probabilities with the keys A, C, G and T;
                             # defaults to the background stored in the
                             # PFM object. At present it is used for the
                             # pseudocounts only: the weights themselves
                             # are still computed against a uniform
                             # background.

=cut

sub to_PWM {
    my ($self, %args) = @_;
    my $bg = ($args{'-bg_probabilities' } || $self->{'bg_probabilities'});
    my $bg_pdl =
        transpose pdl ($bg->{'A'}, $bg->{'C'}, $bg->{'G'}, $bg->{'T'});
    my $nseqs = $self->pdl_matrix->sum / $self->length;
    my $q_pdl = ($self->pdl_matrix +$bg_pdl*sqrt($nseqs))
                /
                ($nseqs + sqrt($nseqs));
    my $pwm_pdl = log2(4*$q_pdl);

    my $PWM = TFBS::Matrix::PWM->new
        ( (map {("-$_", $self->{$_}) } keys %$self),
          # do not want tags to point to the same arrayref as in $self:
          -tags             => \%{ $self->{'tags'}},
          -bg_probabilities => \%{ $self->{'bg_probabilities'}},
          -matrix           => $pwm_pdl
        );
    return $PWM;
}

=head2 to_ICM

 Title   : to_ICM
 Usage   : my $icm = $pfm->to_ICM()
 Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
           to an information content matrix. At present it assumes
           uniform background distribution of nucleotide frequencies.
 Returns : a new TFBS::Matrix::ICM object
 Args    : -small_sample_correction # undef (default),
                                    # 'schneider' or 'pseudocounts'

How a PFM is converted to ICM:

For a PFM element PFM[i,k], the probability without pseudocounts is
estimated to be simply

    p[i,k] = PFM[i,k] / Z

where

- Z equals the column sum of the matrix, i.e. the number of motifs used
to construct the PFM.

- i is the column index (position in the motif)

- k is the row index (a letter in the alphabet; here k is one of
A, C, G, T)

Here is how one normally calculates the pseudocount-corrected positional
probability p'[i,k]:

    p'[i,k] = (PFM[i,k] + 0.25*sqrt(Z)) / (Z + sqrt(Z))

0.25 is for the flat distribution of nucleotides, and sqrt(Z) is the
recommended pseudocount weight.

In the general case,

    p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)

where q[k] is the background distribution of the letter (nucleotide) k,
and B an arbitrary pseudocount value or expression (for no pseudocounts
B=0).
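
As a rough illustration of the pseudocount formula above (a minimal
sketch in plain Perl rather than the module's own PDL code, with
illustrative variable names), take a single PFM column with the counts
A=3, C=0, G=9, T=0, so that Z=12, and assume a flat background
q[k]=0.25 with B=sqrt(Z):

    use strict;
    use warnings;

    my %count = (A => 3, C => 0, G => 9, T => 0);  # one PFM column
    my $Z = 0;
    $Z += $_ for values %count;                    # column sum
    my $B = sqrt($Z);                              # pseudocount weight
    foreach my $k (qw(A C G T)) {
        # p'[i,k] = (PFM[i,k] + q[k]*B) / (Z + B), with q[k] = 0.25
        my $p = ($count{$k} + 0.25 * $B) / ($Z + $B);
        printf "%s %.4f\n", $k, $p;
    }

The four corrected probabilities still sum to 1, and the zero counts
for C and T become small positive values instead of 0.
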
For a given position i, the deviation from random distribution in bits
is calculated as (Baldi and Brunak eq. 1.9 (2ed) or 1.8 (1ed)):

- for an arbitrary alphabet of A letters:

    D[i] = log2(A) + sum_for_all_k(p[i,k]*log2(p[i,k]))

- special case for nucleotides (A=4)

    D[i] = 2 + sum_for_all_k(p[i,k]*log2(p[i,k]))

D[i] equals the information content of the position i in the motif. To
calculate the entire ICM, you have to calculate the contribution of
each nucleotide at a position i to D[i], i.e.

    ICM[i,k] = p'[i,k] * D[i]

=cut

sub to_ICM {
    my ($self, %args) = @_;
    my $bg = ($args{'-bg_probabilities' } || $self->{'bg_probabilities'});

    # compute ICM

    my $bg_pdl =
        transpose pdl ($bg->{'A'}, $bg->{'C'}, $bg->{'G'}, $bg->{'T'});
    my $Z_pdl = $self->pdl_matrix->xchg(0,1)->sumover;

    # pseudocount calculation

    my $B = 0;
    if (lc($args{'-small_sample_correction'} or "") eq "pseudocounts") {
        $B = sqrt($Z_pdl);
    }
    else {
        $B = 0; # do not add pseudocounts
    }

    my $p_pdl = ($self->pdl_matrix +$bg_pdl*$B)/ ($Z_pdl + $B);
    my $plog_pdl = $p_pdl*log2($p_pdl);
    $plog_pdl = $plog_pdl->badmask(0);
    my $D_pdl = 2 + $plog_pdl->xchg(0,1)->sumover;
    my $ic_pdl = $p_pdl * $D_pdl;

    # apply Schneider correction if requested

    if (lc($args{'-small_sample_correction'} or "") eq "schneider") {
        my $columnsum_pdl = $ic_pdl->transpose->sumover;
        my $corrected_columnsum_pdl =
            $columnsum_pdl
            + _schneider_correction ($self->pdl_matrix, $bg_pdl);
        $ic_pdl *= $corrected_columnsum_pdl/$columnsum_pdl;
    }

    # construct and return an ICM object

    my $ICM = TFBS::Matrix::ICM->new
        ( (map {("-$_" => $self->{$_})} keys %$self),
          -tags             => \%{ $self->{'tags'}},
          -bg_probabilities => \%{ $self->{'bg_probabilities'}},
          -matrix           => $ic_pdl
        );
    return $ICM;
}

=head2 draw_logo

 Title   : draw_logo
 Usage   : my $gd_image = $pfm->draw_logo()
 Function: draws a sequence logo; similar to the
           method in TFBS::Matrix::ICM, but can automatically calculate
           error bars for drawing
 Returns : a GD image object (see documentation of GD module)
 Args    : many; PFM-specific
options are:

           -small_sample_correction # One of
                                    # "Schneider" (uses the correction
                                    #   described by Schneider T. et al.
                                    #   (1986) J.Biol.Chem.)
                                    # "pseudocounts" - standard pseudocount
                                    #   correction, more suitable for
                                    #   PFMs with larger column sums
                                    # If the parameter is omitted, small
                                    # sample correction is not applied

           -draw_error_bars         # if true, adds error bars to each
                                    # position in the logo. To calculate
                                    # the error bars, it uses the
                                    # -small_sample_correction
                                    # argument if explicitly set,
                                    # or "Schneider" by default

For other args, see draw_logo entry in TFBS::Matrix::ICM documentation

=cut

sub draw_logo {
    my ($self, %args) = @_;
    if ($args{'-draw_error_bars'}) {
        $args{'-small_sample_correction'} ||= "Schneider"; # default Schneider

        my $pdl_no_correction =
            $self->to_ICM()
                 ->pdl_matrix->transpose->sumover;
        my $pdl_with_correction =
            $self->to_ICM(-small_sample_correction
                              => $args{'-small_sample_correction'})
                 ->pdl_matrix->transpose->sumover;

        $args{'-error_bars'} =
            [list ($pdl_no_correction - $pdl_with_correction)];
    }
    $self->to_ICM(%args)->draw_logo(%args);
}

=head2 add_PFM

 Title   : add_PFM
 Usage   : $pfm->add_PFM($another_pfm)
 Function: adds the values of $another_pfm matrix to $pfm
 Returns : reference to the updated $pfm object
 Args    : a TFBS::Matrix::PFM object

=cut

sub add_PFM {
    my ($self, $pfm) = @_;
    $pfm->isa("TFBS::Matrix::PFM")
        or $self->throw("Wrong or no argument passed to add_PFM");
    my $sum = $self->pdl_matrix + $pfm->pdl_matrix;
    $self->set_matrix($sum);
    return $self;
}

=head2 name

=head2 ID

=head2 class

=head2 matrix

=head2 length

=head2 revcom

=head2 rawprint

=head2 prettyprint

The above methods are common to all matrix objects. Please consult
L<TFBS::Matrix> to find out how to use them.

=cut ############################################### # PRIVATE METHODS ############################################### sub _check_column_sums { my ($self) = @_; my $pdl = $self->pdl_matrix->sever(); my $rowsums = $pdl->xchg(0,1)->sumover(); if ($rowsums->where($rowsums != $rowsums->slice(0))->getdim(0) > 0) { $self->warn("PFM for ".$self->{ID}." has unequal column sums"); } } sub DESTROY { # does nothing } ############################################### # UTILITY FUNCTIONS ############################################### sub log2 { log($_[0]) / log(2); } sub _schneider_correction { my ($pdl, $bg_pdl) = @_; my $Hg = -sum ($bg_pdl*log2($bg_pdl)); my (@Hnbs, %saved_Hnb); my $is_flat = _is_bg_flat(list $bg_pdl); my @factorials = (1); if (min($pdl->transpose->sumover) <= EXACT_SCHNEIDER_MAX) { foreach my $i (1..max($pdl->transpose->sumover)) { $factorials[$i] =$factorials[$i-1] * $i; } } my @column_sums = list $pdl->transpose->sumover; foreach my $colsum (@column_sums) { if (defined($saved_Hnb{$colsum})) { push @Hnbs, $saved_Hnb{$colsum}; } else { my $Hnb; if ($colsum <= EXACT_SCHNEIDER_MAX) { if ($is_flat) { $Hnb = _schneider_Hnb_precomputed($colsum); } else { $Hnb = _schneider_Hnb_exact($colsum, $bg_pdl, \@factorials); } } else { $Hnb = _schneider_Hnb_approx($colsum, $Hg); } $saved_Hnb{$colsum} = $Hnb; push @Hnbs, $Hnb; } } return -$Hg + pdl(@Hnbs); } sub _schneider_Hnb_exact { my ($n, $bg_pdl, $rFactorial) = @_; my $is_flat = _is_bg_flat(list $bg_pdl); return 0 if $n==1; # my @fctrl = (1); # foreach my $i (1..max($pdl->transpose->sumover)) { # $rFactorial->[$i] =$rFactorial->[$i-1] * $i; # } # my @colsum = list $pdl->transpose->sumover; my ($na, $nc, $ng, $nt) = ($n, 0,0,0); # my $n = $colsum[0]; my $E_Hnb=0; while (1) { my $ns_pdl = pdl [$na, $nc, $ng, $nt]; my $Pnb = ($rFactorial->[$n] / ($rFactorial->[$na] *$rFactorial->[$nc] *$rFactorial->[$ng] *$rFactorial->[$nt]) )*prod($bg_pdl->transpose**pdl($na, $nc, $ng, $nt)); my $Hnb = -1 * 
sum(($ns_pdl/$n)*log2($ns_pdl/$n)->badmask(0)); $E_Hnb += $Pnb*$Hnb; if ($nt) { if ($ng) { $ng--; $nt++, } elsif ($nc) { $nc--; $ng = $nt+1; $nt = 0; } elsif ($na) { $na--; $nc = $nt+1; $nt = 0; } else { last; } } else { if ($ng) { $ng--; $nt++, } elsif ($nc) { $nc--; $ng++; } else { $na--; $nc++; $nt = 0; } } } return $E_Hnb; } sub _schneider_Hnb_approx { my ($colsum, $Hg) = @_; return $Hg -3/(2*log(2)*$colsum); } sub _schneider_Hnb_precomputed { my $i = shift; if ($i<1 or $i>30) { die "Precomputed params only available for colsums 1 to 30)"; } my @precomputed = ( 0, # 1 0.75, # 2 1.11090234442608, # 3 1.32398964833609, # 4 1.46290503577084, # 5 1.55922640783176, # 6 1.62900374746751, # 7 1.68128673969433, # 8 1.7215504663901, # 9 1.75328193031842, # 10 1.77879136615189, # 11 1.79965855531179, # 12 1.81699248819687, # 13 1.8315892710679, # 14 1.84403166371213, # 15 1.85475371994775, # 16 1.86408383599326, # 17 1.87227404728809, # 18 1.87952034817826, # 19 1.88597702438913, # 20 1.89176691659196, # 21 1.89698887214968, # 22 1.90172322434865, # 23 1.90603586889234, # 24 1.90998133028897, # 25 1.91360509239859, # 26 1.91694538711761, # 27 1.92003457997914, # 28 1.92290025302018, # 29 1.92556605820924, # 30 ); return $precomputed[$i-1]; } sub _is_bg_flat { my @bg = @_; my $ref = shift; foreach my $other (@bg) { return 0 unless $ref==$other; } return 1; } 1; TFBS-0.7.1/blib/lib/TFBS/Matrix/PWM.pm000066400000000000000000000362221305752266700167600ustar00rootroot00000000000000# TFBS module for TFBS::Matrix::PWM # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Matrix::PWM - class for position weight matrices of nucleotide patterns =head1 SYNOPSIS =over 4 =item * creating a TFBS::Matrix::PWM object manually: my $matrixref = [ [ 0.61, -3.16, 1.83, -3.16, 1.21, -0.06], [-0.15, -2.57, -3.16, -3.16, -2.57, -1.83], [-1.57, 1.85, -2.57, -1.34, -1.57, 1.14], [ 0.31, -3.16, -2.57, 1.76, 0.24, -0.83] ]; my 
$pwm = TFBS::Matrix::PWM->new(-matrix => $matrixref,
                                     -name   => "MyProfile",
                                     -ID     => "M0001"
                                    );

    # or

    my $matrixstring = <<ENDMATRIX
     0.61 -3.16  1.83 -3.16  1.21 -0.06
    -0.15 -2.57 -3.16 -3.16 -2.57 -1.83
    -1.57  1.85 -2.57 -1.34 -1.57  1.14
     0.31 -3.16 -2.57  1.76  0.24 -0.83
    ENDMATRIX
    ;
    my $pwm = TFBS::Matrix::PWM->new(-matrixstring => $matrixstring,
                                     -name         => "MyProfile",
                                     -ID           => "M0001"
                                    );

=item * retrieving a TFBS::Matrix::PWM object from a database:

(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and retrieve
TFBS::Matrix::* objects from them.)

    my $db_obj = TFBS::DB::JASPAR2->new
                    (-connect => ["dbi:mysql:JASPAR2:myhost",
                                  "myusername", "mypassword"]);
    my $pwm = $db_obj->get_Matrix_by_ID("M0001", "PWM");
    # or
    my $pwm = $db_obj->get_Matrix_by_name("MyProfile", "PWM");

=item * retrieving list of individual TFBS::Matrix::PWM objects from a TFBS::MatrixSet object

(see documentation of TFBS::MatrixSet to learn how to create
objects for storage and manipulation of multiple matrices)

    my @pwm_list = $matrixset->all_patterns(-sort_by=>"name");

=item * scanning a nucleotide sequence with a matrix

    my $siteset = $pwm->search_seq(-file =>"myseq.fa",
                                   -threshold => "80%");

=item * scanning a pairwise alignment with a matrix

    my $site_pair_set = $pwm->search_aln(-file =>"myalign.aln",
                                         -threshold => "80%",
                                         -cutoff => "70%",
                                         -window => 50);

=back

=head1 DESCRIPTION

TFBS::Matrix::PWM is a class whose instances are objects representing
position weight matrices (PWMs). A PWM is normally calculated from a
raw position frequency matrix (see L<TFBS::Matrix::PFM> for the
explanation of position frequency matrices). For example, given the
following position frequency matrix:

    A:[ 12  3  0  0  4  0 ]
    C:[  0  0  0 11  7  0 ]
    G:[  0  9 12  0  0  0 ]
    T:[  0  0  0  1  1 12 ]

The standard computational procedure is applied to convert it into the
following position weight matrix:

    A:[ 0.61 -3.16  1.83 -3.16  1.21 -0.06]
    C:[-0.15 -2.57 -3.16 -3.16 -2.57 -1.83]
    G:[-1.57  1.85 -2.57 -1.34 -1.57  1.14]
    T:[ 0.31 -3.16 -2.57  1.76  0.24 -0.83]

which contains the "weights" associated with the occurrence of each
nucleotide at the given position in a pattern.
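
As a hedged sketch of one such conversion, the plain-Perl snippet below
applies the formula used by the to_PWM method of TFBS::Matrix::PFM
(uniform background of 0.25 and sqrt(N) pseudocounts, where N is the
column sum) to the position frequency matrix above. The variable names
are illustrative only, and the printed weights are not claimed to
reproduce the example position weight matrix shown above.

    use strict;
    use warnings;

    my @pfm = (
        [ 12, 3,  0,  0, 4,  0 ],   # A
        [  0, 0,  0, 11, 7,  0 ],   # C
        [  0, 9, 12,  0, 0,  0 ],   # G
        [  0, 0,  0,  1, 1, 12 ],   # T
    );
    my $n  = 12;          # number of sequences (the column sum)
    my $pc = sqrt($n);    # total pseudocount weight
    foreach my $row (@pfm) {
        my @weights = map {
            my $q = ($_ + 0.25 * $pc) / ($n + $pc);  # corrected frequency
            log(4 * $q) / log(2);                    # log2(q / 0.25)
        } @$row;
        print join(" ", map { sprintf "%6.2f", $_ } @weights), "\n";
    }

Because of the pseudocounts, nucleotides that never occur at a position
get a finite negative weight rather than minus infinity.
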
A TFBS::Matrix::PWM object is equipped with methods to search nucleotide
sequences and pairwise alignments of nucleotide sequences with the
pattern it represents, and return a set of sites in nucleotide sequence
(a TFBS::SiteSet object for single sequence search, and a
TFBS::SitePairSet for the alignment search).

=head1 FEEDBACK

Please send bug reports and other comments to the author.

=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::Matrix::PWM;
use vars '@ISA';
use PDL;
use strict;
use Bio::Root::Root;
use Bio::Seq;
use Bio::SeqIO;
use TFBS::Matrix;
use TFBS::SiteSet;
use TFBS::Matrix::_Alignment;
use TFBS::Ext::pwmsearch;
use File::Temp qw/:POSIX/;
@ISA = qw(TFBS::Matrix Bio::Root::Root);

#################################################################
# PUBLIC METHODS
#################################################################

=head2 new

 Title   : new
 Usage   : my $pwm = TFBS::Matrix::PWM->new(%args)
 Function: constructor for the TFBS::Matrix::PWM object
 Returns : a new TFBS::Matrix::PWM object
 Args    : # you must specify either one of the following three:

           -matrix,       # reference to an array of arrays of numbers
              #or
           -matrixstring, # a string containing four lines
                          # of tab- or space-delimited numbers
              #or
           -matrixfile,   # the name of a file containing four lines
                          # of tab- or space-delimited numbers
           #######

           -name,         # string, OPTIONAL
           -ID,           # string, OPTIONAL
           -class,        # string, OPTIONAL
           -tags          # an array reference, OPTIONAL

=cut

sub new {
    my ($class, %args) = @_;
    my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PWM");
    my $self = bless $matrix, ref($class) || $class;
    $self->_set_min_max_score();
    return $self;
}

=head2 search_seq

 Title   : search_seq
 Usage   : my $siteset = $pwm->search_seq(%args)
 Function: scans a nucleotide sequence with the pattern represented by the PWM
 Returns : a TFBS::SiteSet object
 Args    : # you must specify either one of the following three:

           -file,       # the name of a fasta file (single sequence)
              #or
           -seqobj      # a Bio::Seq object
                        # (more accurately, a Bio::PrimarySeq object
                        # or a subclass thereof)
              #or
           -seqstring   # a string containing the sequence

           -threshold,  # minimum score for the hit, either absolute
                        # (e.g. 11.2) or relative (e.g. "75%")
                        # OPTIONAL: default "80%"

           -subpart     # subpart of the sequence to search, given as
                        # -subpart => { -start => 140,
                        #               -end   => 180 }
                        # where -start and -end are coordinates in the
                        # sequence; the coordinate range is interpreted
                        # in the BioPerl tradition (1-based, inclusive)
                        # OPTIONAL: by default searches entire sequence

=cut

sub search_seq {
    my ($self, %args) = @_;
    $self->_search(%args);
}

=head2 search_aln

 Title   : search_aln
 Usage   : my $site_pair_set = $pwm->search_aln(%args)
 Function: Scans a pairwise alignment of nucleotide sequences
           with the pattern represented by the PWM: it reports only
           those hits that are present in equivalent positions of both
           sequences and exceed a specified threshold score in both, AND
           are found in regions of the alignment above the specified
           conservation cutoff value.
 Returns : a TFBS::SitePairSet object
 Args    : # you must specify either one of the following three:

           -file,         # the name of the alignment file in Clustal format
              #or
           -alignobj      # a Bio::SimpleAlign object
                          # (or an object of a subclass thereof)
              #or
           -alignstring   # a multi-line string containing the alignment
                          # in clustal format
           #############

           -threshold,    # minimum score for the hit, either absolute
                          # (e.g. 11.2) or relative (e.g. "75%")
                          # OPTIONAL: default "80%"

           -window,       # size of the sliding window (in nucleotides)
                          # for calculating local conservation in the
                          # alignment
                          # OPTIONAL: default 50

           -cutoff        # conservation cutoff (%) for including the
                          # region in the results of the pattern search
                          # OPTIONAL: default "70%"

           -subpart       # subpart of the alignment to search, given as e.g.
                          # -subpart => { -relative_to => 1,
                          #               -start       => 140,
                          #               -end         => 180 }
                          # where -start and -end are coordinates in the
                          # sequence indicated by -relative_to (1 for the
                          # 1st sequence in the alignment, 2 for the 2nd)
                          # OPTIONAL: by default searches entire alignment

           -conservation  # conservation profile, a TFBS::ConservationProfile
                          # OPTIONAL: by default the conservation profile is
                          # computed internally on the fly (less efficient)

=cut

sub search_aln {
    my ($self, %args) = @_;
    unless ($args{-alignstring} or $args{-alignobj} or $args{-file}) {
        $self->throw
            ("No alignment file, string or object passed to search_aln.");
    }
    $args{-pattern_set} = $self;
    my $aln = ($args{-alignment_setup} or TFBS::Matrix::_Alignment->new(%args));
    $aln->do_sitesearch(%args);
    return $aln->site_pair_set;
}

sub max_score { $_[0]->{max_score}; }

sub min_score { $_[0]->{min_score}; }

=head2 name

=head2 ID

=head2 class

=head2 matrix

=head2 length

=head2 revcom

=head2 rawprint

=head2 prettyprint

The above methods are common to all matrix objects. Please consult
L<TFBS::Matrix> to find out how to use them.

=cut ################################################################# # PRIVATE METHODS ################################################################# sub _set_min_max_score { my ($self) = @_; my $transpose = $self->pdl_matrix->xchg(0,1); $self->{min_score} = sum(minimum $transpose); $self->{max_score} = sum(maximum $transpose); } sub _search { # this method runs the pwmsearch C extension and parses the data # similarly to _csearch, which will eventually be discontinued my ($self, %args) = @_; my $seqobj = $self->_to_seqobj(%args); my ($subseq_start, $subseq_end) = (1,$seqobj->length); if(my $subpart = $args{-subpart}) { $subseq_start = $subpart->{-start}; $subseq_end = $subpart->{-end}; unless($subseq_start and $subseq_end) { $self->throw("Option -subpart missing suboption -start or -end"); } } return TFBS::Ext::pwmsearch::pwmsearch($self, $seqobj, ($args{-threshold} or 0), $subseq_start, $subseq_end); } sub _csearch { # this is a wrapper around Wyeth Wasserman's's pwm_searchPFF program # until we do a proper extension my ($self) = shift; #the rest of @_ goes to _to_seqob; my %args = @_; my $PWM_SEARCH = $args{'-binary'} || "pwm_searchPFF"; # dump the sequence into a tempfile my $seqobj = $self->_to_seqobj(@_); my ($fastaFH, $fastafile); if (defined $seqobj->{_fastaFH} and defined $seqobj->{_fastafile}) { ($fastaFH, $fastafile) = ($seqobj->{_fastaFH}, $seqobj->{_fastafile}); } else { ($fastaFH, $fastafile) = tmpnam(); my $seqFH = Bio::SeqIO->newFh(-fh =>$fastaFH, -format=>"Fasta"); print $seqFH $seqobj; } # we need $fastafile below # calculate threshold my $threshold; if ($args{-threshold}) { if ($args{-threshold} =~ /(.+)%/) { # percentage $threshold = $self->{min_score} + ($self->{max_score} - $self->{min_score})* $1/100; } else { # absolute value $threshold = $args{-threshold}; } } else { # no threshold given $threshold = $self->{min_score} -1; } # convert piddle to text (there MUST be a better way) my $pwmstring = sprintf ( $self->pdl_matrix ); $pwmstring 
=~ s/\[|\]//g; # lose [] $pwmstring =~ s/\n /\n/g; # lose leading spaces my @pwmlines = split("\n", $pwmstring); # f $pwmstring = join ("\n", @pwmlines[2..5])."\n"; # dump pwm into a tempfile my ($pwmFH, $pwmfile) = tmpnam(); # we need $pwmfile below print $pwmFH $pwmstring; close $pwmFH; # run pwmsearch my $hitlist = TFBS::SiteSet->new(); my ($TFname, $TFclass) = ($self->{name}, $self->{class}); my @search_result_lines = `$PWM_SEARCH $pwmfile $fastafile $threshold -n $TFname -c $TFclass`; foreach (@search_result_lines) { chomp; my ($seq_id, $factor, $class, $strand, $score, $pos, $siteseq) = (split)[0, 2, 3, 4, 5, 7, 9]; my $correct_strand = ($strand eq "+")? "-1" : "1"; my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id()."", -seqobj => $seqobj, -strand => $correct_strand."", -pattern => $self, -siteseq => $siteseq."", -score => $score."", -start => $pos, -end => $pos + length($siteseq) -1 ); $hitlist->add_site($site); } # cleanup unlink $fastafile unless $seqobj->{_fastafile}; unlink $pwmfile; return $hitlist; } sub _bsearch { # this is Perl/PDL only search routine. For experimental purposes only my ($self,%args) = @_; #the rest of @_ goes to _to_seqob; my @PWMs; # prepare the sequence my $seqobj = $self->_to_seqobj(%args); my $seqmatrix = (defined $seqobj->{_pdl_matrix}) ? 
$seqobj->{_pdl_matrix} : _seq_to_pdlmatrix($seqobj); # calculate threshold my $threshold; if ($args{-threshold}) { if ($args{-threshold} =~ /(.+)%/) { # percentage $threshold = $self->{min_score} + ($self->{max_score} - $self->{min_score})* $1/100; } else { # absolute value $threshold = $args{-threshold}; } } else { # no threshold given $threshold = $self->{min_score} -1; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm ($self, $self->revcom()) { my $TFlength = $pwm->pdl_matrix->getdim(0); my $position_score_pdl = zeroes($seqmatrix->getdim(0) - $TFlength + 1); my $position_index_pdl = sequence($seqmatrix->getdim(0) - $TFlength + 1)+1; foreach my $i (0..($TFlength-1)) { my $columnproduct = $seqmatrix * $pwm->pdl_matrix->slice("$i,:"); $position_score_pdl += $columnproduct->xchg(0,1)->sumover->slice($i.":".($i-$TFlength)); } my @hitpositions = list $position_index_pdl->where($position_score_pdl >= $threshold); my @hitscores = list $position_score_pdl->where($position_score_pdl >= $threshold); for my $i(0..$#hitpositions) { my($pos,$score) = ($hitpositions[$i], $hitscores[$i]); my $siteseq = scalar($seqobj->subseq($pos, $pos+$TFlength-1)); my $site = TFBS::Site->new ( -seq_id => $seqobj->display_id(), -seqobj => $seqobj, -strand => $pwm->{strand}, -Matrix => $pwm, -siteseq => $siteseq, -score => $score, -start => $pos); $hitlist->add_site($site); } } return $hitlist; } sub _to_seqobj { my ($self, %args) = @_; my $seq; if ($args{-file}) { # not a Bio::Seq return Bio::SeqIO->new(-file => $args{-file}, -format => 'fasta', -moltype => 'dna')->next_seq(); } elsif ($args{-seqstring} or $args{-seq}) { # I guess it's a string then return Bio::Seq->new(-seq => ($args{-seqstring} or $args{-seq}), -id => ($args{-seq_id} or "undefined"), -moltype => 'dna'); } elsif ($args{'-seqobj'} and ref($args{'-seqobj'}) and $args{'-seqobj'}->can("seq")) { # do nothing (maybe check later) return $args{'-seqobj'}; } #elsif (ref($format) =~ /Bio\:\:Seq/ and !defined $seq) 
{ # if only one parameter passed and it's a Bio::Seq #return $format; #} else { $self->throw ("Wrong parameters passed to search method: ".%args); } } sub _seq_to_pdlmatrix { # called from ?search # not OO - help function for search my $seqobj = shift; my $seqstring = uc($seqobj->seq()); my @perlarray; foreach (qw(A C G T)) { my $seqtobits = $seqstring; eval "\$seqtobits =~ tr/$_/1/"; # curr. letter $_ to 1 eval "\$seqtobits =~ tr/1/0/c"; # non-1s to 0 push @perlarray, [split("", $seqtobits)]; } return byte (\@perlarray); } sub DESTROY { # nothing } 1; TFBS-0.7.1/blib/lib/TFBS/Matrix/_Alignment.pm000066400000000000000000000273741305752266700204020ustar00rootroot00000000000000package TFBS::Matrix::_Alignment; use vars qw(@ISA $AUTOLOAD); use TFBS::SitePair; use TFBS::SitePairSet; use Bio::Root::Root; use Bio::Seq; use Bio::SimpleAlign; use Bio::AlignIO; use IO::String; use PDL; use strict; @ISA =('Bio::Root::Root'); # CONSTANTS use constant DEFAULT_WINDOW => 50; use constant DEFAULT_CUTOFF => 70; use constant DEFAULT_THRESHOLD => "80%"; sub new { # this is ugly; OK, OK, I'll rewrite it as soon as I can my ($caller, %args) = @_; my $self = bless {}, ref $caller || $caller; $self->window($args{-window} or DEFAULT_WINDOW); $self->_parse_alignment(%args); $self->seq1length(length(_strip_gaps($self->alignseq1()))); $self->seq2length(length(_strip_gaps($self->alignseq2()))); $self->_set_subpart_bounds($args{-subpart}); # # If a conservation profile is provided, no need to compute it again. 
# NOTE: conservation2 never seems to be used anywhere else so don't worry # about the fact we are ignoring it if conservation is passed in :) # my $cp = $args{-conservation}; if ($cp) { $self->conservation1([$cp->conservation()]); } else { $self->conservation1($self->_calculate_conservation($self->window(),1)); $self->conservation2($self->_calculate_conservation($self->window(),2)); } $self->cutoff($args{-cutoff} or DEFAULT_CUTOFF); #$self->threshold($args{-threshold} or DEFAULT_THRESHOLD); #$self->_do_sitesearch #(($args{-pattern_set} or $self->throw("No -matrixset parameter")), # ($args{-threshold} or DEFAULT_THRESHOLD), # ()); # $self->_set_start_end(%args); # Maybe later... return $self; } sub DESTROY { # empty } sub _parse_alignment { my ($self, %args) = @_; my ($seq1, $seq2, $start); my $alignobj; if (defined $args{'-alignstring'}) { $alignobj = _alignstring_to_alignobj($args{'-alignstring'}); } elsif (defined $args{'-file'}) { $alignobj = _alignfile_to_alignobj($args{'-file'}); } elsif (defined $args{-alignobj}) { $alignobj = $args{'-alignobj'}; } else { $self->throw("No -alignstring, -file or -alignobj passed."); } my @match; my ($seqobj1, $seqobj2) = $alignobj->each_seq; ($seq1, $seq2) = ($seqobj1->seq, $seqobj2->seq); $start = 1; $self->seq1name($seqobj1->display_id); $self->seq2name($seqobj2->display_id); $self->alignseq1($seq1); $self->alignseq2($seq2); my @seq1 = ("-", split('', $seq1) ); my @seq2 = ("-", split('', $seq2) ); $self->{alignseq1array} = [@seq1]; $self->{alignseq2array} = [@seq2]; my (@seq1index, @seq2index); my ($i1, $i2) = (0, 0); for my $pos (0..$#seq1) { my ($s1, $s2) = (0, 0); $seq1[$pos] ne "-" and $s1 = ++$i1; $seq2[$pos] ne "-" and $s2 = ++$i2; push @seq1index, $s1; push @seq2index, $s2; } $self->pdlindex( pdl [ [list sequence($#seq1+1)], [@seq1index], [@seq2index], [list zeroes ($#seq1+1)] ]) ; return 1; } sub pdlindex { my ($self, $input, $p1, $p2) = @_ ; # print ("PARAMS ", join(":", @_), "\n"); if (ref($input) eq "PDL") { 
        $self->{pdlindex} = $input;
    }
    unless (defined $p2) {
        return $self->{pdlindex};
    }
    else {
        my @results = list
            $self->{pdlindex}->xchg(0,1)->slice($p2)->where
                ($self->{pdlindex}->xchg(0,1)->slice($p1)==$input);
        wantarray ? return @results : return $results[0];
    }
}

sub lower_pdlindex {
    my ($self, $input, $p1, $p2) = @_;
    unless (defined $p2) {
        $self->throw("Wrong number of parameters passed to lower_pdlindex");
    }
    my $result;
    my $i = $input;
    until ($result = $self->pdlindex($i, $p1 => $p2)) {
        $i--;
        last if $i==0;
    }
    # NOTE: "or" binds more loosely than "return", so $result is always
    # returned here and the "or 1" fallback is never evaluated.
    return $result or 1;
}

sub higher_pdlindex {
    my ($self, $input, $p1, $p2) = @_;
    unless (defined $p2) {
        $self->throw("Wrong number of parameters passed to higher_pdlindex");
    }
    my $result;
    my $i = $input;
    until ($result = $self->pdlindex($i, $p1 => $p2)) {
        $i++;
        last unless ($self->pdlindex($i, $p1=>0) > 0);
    }
    return $result;
}

sub _calculate_conservation {
    my ($self, $WINDOW, $which) = @_;
    my (@seq1, @seq2);
    if ($which==2) {
        @seq1 = @{$self->{alignseq2array}};
        @seq2 = @{$self->{alignseq1array}};
    }
    else {
        @seq1 = @{$self->{alignseq1array}};
        @seq2 = @{$self->{alignseq2array}};
        $which=1;
    }

    my @CONSERVATION;
    my @match;

    # skip leading alignment columns where the reference sequence has a gap
    while ($seq1[0] eq "-") {
        shift @seq1;
        shift @seq2;
    }

    # 1 for a matching column, 0 otherwise; gaps in the reference are skipped
    for my $i (0..$#seq1) {
        push (@match,( uc($seq1[$i]) eq uc($seq2[$i]) ? 1:0))
            unless ($seq1[$i] eq "-" or $seq1[$i] eq ".");
    }

    # running count of matches within a sliding window of size $WINDOW
    my @graph=($match[0]);
    for my $i (1..($#match+$WINDOW/2)) {
        $graph[$i] = ($graph[$i-1] or 0)
                     + ($i>$#match ? 0: $match[$i])
                     - ($i<$WINDOW ? 0: $match[$i-$WINDOW]);
    }

    # at this point, the graph values are shifted $WINDOW/2 to the right
    # i.e.
the score at a certain position is the score of the window # UPSTREAM of it: To fix it, we should discard the first $WINDOW/2 scores: #$self->conservation1 ([]); foreach my $pos (@graph[int($WINDOW/2)..$#graph]) { push @CONSERVATION, 100*$pos/$WINDOW; } # correction foreach my $pos (0..int($WINDOW/2)) { $CONSERVATION[$pos] = $CONSERVATION[$pos]*$WINDOW/(int($WINDOW/2)+$pos); $CONSERVATION[$#CONSERVATION - $pos] = $CONSERVATION[$#CONSERVATION - $pos]*$WINDOW/(int($WINDOW/2)+$pos); } return [@CONSERVATION]; } sub _strip_gaps { # a utility function my $seq = shift; $seq =~ s/\-|\.//g; return $seq; } sub do_sitesearch { my ($self, @args ) = @_; my ($MATRIXSET, $THRESHOLD, $CUTOFF) = $self->_rearrange([qw(PATTERN_SET THRESHOLD CUTOFF)], @args); if (!$MATRIXSET) { $self->throw("No -pattern_set passed to do_sitesearch"); } $CUTOFF = ($CUTOFF or DEFAULT_CUTOFF); $THRESHOLD = ($THRESHOLD or DEFAULT_THRESHOLD); $self->site_pair_set(TFBS::SitePairSet->new()); return if(($self->subpart1 and $self->subpart1->{-start} == 0) or ($self->subpart2 and $self->subpart2->{-start} == 0)); # ^^^ If one of the subparts is a gap, there's no point in searching my $seqobj1 = Bio::Seq->new(-seq=>_strip_gaps($self->alignseq1()), -id => "Seq1"); my $siteset1 = $MATRIXSET->search_seq(-seqobj => $seqobj1, -threshold => $THRESHOLD, -subpart => $self->subpart1); my $siteset1_itr = $siteset1->Iterator(-sort_by => "start"); my $seqobj2 = Bio::Seq->new(-seq=>_strip_gaps($self->alignseq2()), -id => "Seq2"); my $siteset2 = $MATRIXSET->search_seq(-seqobj => $seqobj2, -threshold => $THRESHOLD, -subpart => $self->subpart2); my $siteset2_itr = $siteset2->Iterator(-sort_by => "start"); my $site1 = $siteset1_itr->next(); my $site2 = $siteset2_itr->next(); while (defined $site1 and defined $site2) { my $pos1_in_aln = $self->pdlindex($site1->start(), 1=>0); my $pos2_in_aln = $self->pdlindex($site2->start(), 2=>0); my $cmp = (($pos1_in_aln <=> $pos2_in_aln) or ($site1->pattern->name() cmp 
$site2->pattern->name()) or ($site1->strand() cmp $site2->strand())); if ($cmp==0) { ### match if (# threshold test: $self->conservation1->[$site1->start()] >= $self->cutoff() ) { my $site_pair = TFBS::SitePair->new($site1, $site2); $self->site_pair_set->add_site_pair($site_pair); } $site1 = $siteset1_itr->next(); $site2 = $siteset2_itr->next(); } elsif ($cmp<0) { ### $siteset1 is behind $site1 = $siteset1_itr->next(); } elsif ($cmp>0) { ### $siteset2 is behind $site2 = $siteset2_itr->next(); } } } sub _set_subpart_bounds { my ($self, $subpart) = @_; if(defined $subpart) { my ($relative_to, $start, $end) = ($subpart->{-relative_to}, $subpart->{-start}, $subpart->{-end}); unless(defined($relative_to) and defined($start) and defined($end) ) { $self->throw("Option -subpart missing suboption -relative_to, -start or -end"); } if($relative_to == 1) { my $other_start = $self->higher_pdlindex($start, 1 => 2); my $other_end = $self->lower_pdlindex($end, 1 => 2); ($other_start, $other_end) = (0,0) if($other_start > $other_end); $self->subpart1({ -start => $start, -end => $end }); $self->subpart2({ -start => $other_start, -end => $other_end }); } elsif($relative_to == 2) { my $other_start = $self->higher_pdlindex($start, 2 => 1); my $other_end = $self->lower_pdlindex($end, 2 => 1); ($other_start, $other_end) = (0,0) if($other_start > $other_end); $self->subpart1({ -start => $other_start, -end => $other_end }); $self->subpart2({ -start => $start, -end => $end }); } else { $self->throw("Suboption -relative_to should be 1 or 2"); } } } sub _calculate_cutoff { my ($self) = @_; my $ile = 0.9; my @conservation_array = sort {$a <=> $b} @{$self->conservation1()}; my $perc_90 = $conservation_array[int($ile*scalar(@conservation_array))]; return $perc_90; } sub _alignfile_to_string { # a utility function # DEPRECATED !!! 
    my $alignfile = shift;
    if ($alignfile =~ /\.msf$/i) {
        my $alignobj = Bio::SimpleAlign->new();
        $alignobj->read_MSF($alignfile);
        return _alignobj_to_string($alignobj);
    }
    else { # assumed clustalw - no AlignIO import yet
        local $/ = undef;
        open FILE, $alignfile
            or die("Could not read alignfile $alignfile, stopped");
        my $alignstring = <FILE>;   # slurp the whole file ($/ is undef)
        close FILE;
        return $alignstring;
    }
}

sub _alignfile_to_alignobj {
    # a utility function
    my ($alignfile, $format) = (@_,'clustalw');
    if (!$format and $alignfile =~ /\.msf$/i) { $format = 'msf' ;}
    my $alnio = Bio::AlignIO->new(-file=>$alignfile, -format=>$format);
    return $alnio->next_aln;
}

sub _alignobj_to_string {
    # a utility function
    # DEPRECATED
    my $alignobj = shift;
    my $alignstring;
    my $io = IO::String->new($alignstring);
    my $alnio = Bio::AlignIO->new(-fh=>$io, -format=>"clustalw");
    $alnio->write_aln($alignobj);
    $alnio->close();
    # $io->close;
    return $alignstring;
}

sub _alignstring_to_alignobj {
    # a utility function
    my ($alignstring, $format) = (@_, 'clustalw');
    my $io = IO::String->new($alignstring);
    my $alnio = Bio::AlignIO->new(-fh=>$io, -format=>$format);
    my $alignobj = $alnio->next_aln();
    $alnio->close();
    # $io->close;
    return $alignobj;   # return the parsed alignment object, not the input string
}

# uglier than AUTOLOAD, but faster - a quick fix to get rid of Class::MethodMaker

sub cutoff { $_[0]->{'cutoff'} = $_[1] if exists $_[1]; $_[0]->{'cutoff'}; }
sub window { $_[0]->{'window '} = $_[1] if exists $_[1]; $_[0]->{'window '}; }
sub alignseq1 { $_[0]->{'alignseq1'} = $_[1] if exists $_[1]; $_[0]->{'alignseq1'}; }
sub alignseq2 { $_[0]->{'alignseq2'} = $_[1] if exists $_[1]; $_[0]->{'alignseq2'}; }
sub site_pair_set { $_[0]->{'site_pair_set'} = $_[1] if exists $_[1]; $_[0]->{'site_pair_set'};}
sub seq1name { $_[0]->{'seq1name'} = $_[1] if exists $_[1]; $_[0]->{'seq1name'}; }
sub seq2name { $_[0]->{'seq2name'} = $_[1] if exists $_[1]; $_[0]->{'seq2name'}; }
sub seq1length { $_[0]->{'seq1length'} = $_[1] if exists $_[1]; $_[0]->{'seq1length'}; }
sub seq2length { $_[0]->{'seq2length'} = $_[1] if
exists $_[1]; $_[0]->{'seq2length'}; }
sub subpart1 { $_[0]->{'subpart1'} = $_[1] if exists $_[1]; $_[0]->{'subpart1'}; }
sub subpart2 { $_[0]->{'subpart2'} = $_[1] if exists $_[1]; $_[0]->{'subpart2'}; }
sub conservation1 { $_[0]->{'conservation1'} = $_[1] if exists $_[1]; $_[0]->{'conservation1'};}
sub conservation2 { $_[0]->{'conservation2'} = $_[1] if exists $_[1]; $_[0]->{'conservation2'};}
sub exclude_orf { $_[0]->{'exclude_orf'} = $_[1] if exists $_[1]; $_[0]->{'exclude_orf'}; }
sub start_at { $_[0]->{'start_at'} = $_[1] if exists $_[1]; $_[0]->{'start_at'}; }
sub end_at { $_[0]->{'end_at'} = $_[1] if exists $_[1]; $_[0]->{'end_at'}; }

1;
TFBS-0.7.1/blib/lib/TFBS/MatrixSet.pm000066400000000000000000000310041305752266700167620ustar00rootroot00000000000000
# TFBS module for TFBS::MatrixSet
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::MatrixSet - an aggregate class representing a set of matrix patterns,
containing methods for manipulating the set as a whole

=head1 SYNOPSIS

    # creation of a TFBS::MatrixSet object
    # let @list_of_matrix_objects be a list of TFBS::Matrix::* objects
    ###################################

    # Create a TFBS::MatrixSet object:

    my $matrixset = TFBS::MatrixSet->new(); # creates an empty set
    $matrixset->add_Matrix(@list_of_matrix_objects); #add matrix objects to set
    $matrixset->add_Matrix($matrixobj); # adds a single matrix object to set

    # or, same as above:

    my $matrixset = TFBS::MatrixSet->new(@list_of_matrix_objects, $matrixobj);

    ###################################
    #

=head1 DESCRIPTION

TFBS::MatrixSet is an aggregate class storing a set of TFBS::Matrix::* subclass
objects, and providing methods for manipulating those sets as a whole.
TFBS::MatrixSet objects are created de novo or returned by some database
(TFBS::DB::*) retrieval methods.

=head1 FEEDBACK

Please send bug reports and other comments to the author.
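
As a minimal sketch of how the set-level methods documented below fit
together (assuming @pwms already holds TFBS::Matrix::PWM objects obtained
elsewhere; the sequence string is made up for illustration only):

    use TFBS::MatrixSet;

    my $matrixset = TFBS::MatrixSet->new();
    $matrixset->add_matrix(@pwms);

    # walk through the set, sorted alphabetically by matrix name
    my $iterator = $matrixset->Iterator(-sort_by => 'name');
    while (my $pwm = $iterator->next) {
        print $pwm->name(), "\n";
    }

    # scan a single sequence with every matrix in the set;
    # the result is a TFBS::SiteSet object
    my $siteset = $matrixset->search_seq(-seqstring => 'ACGTTGCATGTCACGTGACCTA',
                                         -threshold => '80%');
    print "scanned with ", $matrixset->size, " matrices\n";
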
=head1 AUTHOR - Boris Lenhard

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# The code begins HERE:

package TFBS::MatrixSet;
use vars '@ISA';

use PDL;
use Bio::Seq;
use Bio::SeqIO;
use Bio::Root::Root;
use File::Temp qw/:POSIX/;
use TFBS::Matrix;
use TFBS::_Iterator::_MatrixSetIterator;
use TFBS::SiteSet;
use strict;

@ISA = qw(Bio::Root::Root);

=head2 new

=cut

sub new {
    # take the whole argument list: "= shift" would leave @matrices empty
    my ($caller, @matrices) = @_;
    my $self = bless {matrix_list =>[]}, ref($caller) || $caller;
    $self->add_matrix(@matrices) if @matrices;
    return $self;
}

=head2 add_matrix

 Title   : add_matrix
 Usage   : $matrixset->add_matrix(@list_of_matrix_objects);
 Function: Adds matrix objects to matrixset
 Returns : object reference (usually ignored)
 Args    : one or more TFBS::Matrix::* objects

=cut

sub add_matrix {
    my ($self, @matrices) = @_;
    foreach my $matrix (@matrices) {
        $self->throw("Argument to add_matrix not a TFBS::Matrix object")
            unless $matrix->isa("TFBS::Matrix");
    }
    push @{$self->{matrix_list}}, @matrices;
    return $self;
}

=head2 add_matrix_set

 Title   : add_matrix_set
 Usage   : $matrixset->add_matrix_set(@list_of_matrixset_objects);
 Function: Adds to the matrixset matrix objects contained in one or
           more other matrixsets
 Returns : object reference (usually ignored)
 Args    : one or more TFBS::MatrixSet objects

=cut

sub add_matrix_set {
    my ($self, @sets) = @_;
    foreach my $matrixset (@sets) {
        $self->throw("Argument to add_matrix_set not a TFBS::MatrixSet object")
            unless $matrixset->isa("TFBS::MatrixSet");
        push @{$self->{matrix_list}}, @{$matrixset->{matrix_list}};
    }
    return $self;   # as documented above
}

sub reset {
    my ($self) = @_;
    $self->warn("reset: Deprecated method, use Iterator instead.");
    @{$self->{_iterator_list}} = @{$self->{matrix_list}};
}

sub sort_by_name {
    my ($self) = @_;
    $self->warn("sort_by_name: Deprecated method, use Iterator instead.");
    @{$self->{matrix_list}} = sort { uc($a->{name}) cmp uc ($b->{name}) }
@{$self->{matrix_list}}; $self->reset(); } sub next { my ($self) = @_; $self->warn("next: Deprecated method use Iterator instead."); if (my $next_matrix = shift (@{$self->{_iterator_list}})) { return $next_matrix; } else { $self->reset; return undef; } } =head2 search_seq Title : search_seq Usage : my $siteset = $matrixset->search_seq(%args) Function: scans a nucleotide sequence with all patterns represented stored in $matrixset; It works only if all matrix objects in $matrixset understand search_seq method (currently only TFBS::Matrix::PWM objects do) Returns : a TFBS::SiteSet object Args : # you must specify either one of the following three: -file, # the name od a fasta file (single sequence) #or -seqobj # a Bio::Seq object # (more accurately, a Bio::PrimarySeqobject or a # subclass thereof) #or -seqstring # a string containing the sequence -threshold, # minimum score for the hit, either absolute # (e.g. 11.2) or relative (e.g. "75%") # OPTIONAL: default "80%" =cut sub search_seq { my ($self, %args) = @_; $self->_search(%args); } =head2 search_aln Title : search_aln Usage : my $site_pair_set = $matrixset->search_aln(%args) Function: Scans a pairwise alignment of nucleotide sequences with the pattern represented by the PWM: it reports only those hits that are present in equivalent positions of both sequences and exceed a specified threshold score in both, AND are found in regions of the alignment above the specified conservation cutoff value. 
It works only if all matrix object in $matrixset understand search_aln method (currently only TFBS::Matrix::PWM objects do) Returns : a TFBS::SitePairSet object Args : # you must specify either one of the following three: -file, # the name of the alignment file in Clustal format #or -alignobj # a Bio::SimpleAlign object # (more accurately, a Bio::PrimarySeqobject or a # subclass thereof) #or -alignstring # a multi-line string containing the alignment # in clustal format ############# -threshold, # minimum score for the hit, either absolute # (e.g. 11.2) or relative (e.g. "75%") # OPTIONAL: default "80%" -window, # size of the sliding window (inn nucleotides) # for calculating local conservation in the # alignment # OPTIONAL: default 50 -cutoff # conservation cutoff (%) for including the # region in the results of the pattern search # OPTIONAL: default "70%" -subpart # subpart of the alignment to search, given as e.g. # -subpart => { relative_to => 1, # start => 140, # end => 180 } # where start and end are coordinates in the # sequence indicated by relative_to (1 for the # 1st sequence in the alignment, 2 for the 2nd) # OPTIONAL: by default searches entire alignment -conservation # conservation profile, a TFBS::ConservationProfile # OPTIONAL: by default the conservation profile is # computed internally on the fly (less efficient) =cut sub search_aln { my ($self, %args) = @_; my $mxit = $self->Iterator(); my $sitepairset = TFBS::SitePairSet->new; my $aln = TFBS::Matrix::_Alignment->new(%args); while (my $mx = $mxit->next) { my $singleset = $mx->search_aln(%args, -alignment_setup => $aln); $sitepairset->add_site_pair_set($singleset); } return $sitepairset; } =head2 size Title : size Usage : my $number_of_matrices = $matrixset->size; Function: gets the number of matrix objects in the $matrixset (i.e. 
the size of the set) Returns : a number Args : none =cut sub size { scalar @{ $_[0]->{matrix_list} }; } =head2 Iterator Title : Iterator Usage : my $matrixset_iterator = $matrixset->Iterator(-sort_by =>'total_ic'); while (my $matrix_object = $matrix_iterator->next) { # do whatever you want with individual matrix objects } Function: Returns an iterator object that can be used to go through all members of the set Returns : an iterator object (currently undocumentened in TFBS - but understands the 'next' method) Args : -sort_by # optional - currently it accepts # 'ID' (alphabetically) # 'name' (alphabetically) # 'class' (alphabetically) # 'total_ic' (numerically, decreasing order) -reverse # optional - reverses the default sorting order if true =cut sub Iterator { my ($self, %args) = @_; return TFBS::_Iterator::_MatrixSetIterator->new($self->{matrix_list}, $args{'-sort_by'}, $args{'-reverse'} ); } sub _search { my ($self, %args) = @_; # DIRTY - stick tmp file name to seq object my $seqobj = $self->_to_seqobj(%args); ($seqobj->{_fastaFH}, $seqobj->{_fastafile}) = tmpnam(); # we need $fastafile below my $outstream = Bio::SeqIO->new(-file=>">".$seqobj->{_fastafile}, -format=>"Fasta"); my $subseqobj; if(my $subpart = $args{-subpart}) { my $subseq_start = $subpart->{-start}; my $subseq_end = $subpart->{-end}; unless($subseq_start and $subseq_end) { $self->throw("Option -subpart missing suboption -relative_to, -start or -end"); } $subseqobj = Bio::Seq->new(-seq => $seqobj->subseq($subseq_start, $subseq_end), -id => $seqobj->id); } $outstream->write_seq($subseqobj or $seqobj); $outstream->close; # iterate through pwms my @PWMs; my $mxit = $self->Iterator(); while (my $pwm = $mxit->next() ) { push @PWMs,$pwm; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm (@PWMs) { my $threshold = ($args{-threshold} or $pwm->{minscore}); $hitlist->add_siteset($pwm->search_seq(-seqobj=>$seqobj, -threshold =>$threshold, -subpart=>$args{-subpart})); } delete 
$seqobj->{_fastaFH}; unlink $seqobj->{_fastafile}; delete $seqobj->{_fastafile}; return $hitlist; } sub _csearch { my ($self, %args) = @_; my $PWM_SEARCH = '/home/httpd/cgi-bin/CONSITE/bin/pwm_searchPFF'; # DIRTY - stick tmp file name to seq object my $seqobj = $self->_to_seqobj(%args); ($seqobj->{_fastaFH}, $seqobj->{_fastafile}) = tmpnam(); # we need $fastafile below my $seqFH = Bio::SeqIO->newFh(-fh=>$seqobj->{_fastaFH}, -format=>"Fasta"); print $seqFH $seqobj; # iterate through pwms my @PWMs; $self->reset(); while (my $pwm = $self->next() ) { push @PWMs,$pwm; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm (@PWMs) { my $threshold = ($args{-threshold} or $pwm->{minscore}); $hitlist->add_siteset($pwm->search_seq(-seqobj=>$seqobj, -threshold =>$threshold )); } delete $seqobj->{_fastaFH}; delete $seqobj->{_fastafile}; return $hitlist; } sub _bsearch { my ($self,%args) = @_; #the rest of @_ goes to _to_seqob; my @PWMs; # prepare the sequence my $seqobj = $self->_to_seqobj(%args); $seqobj->{_pdl_matrix} = _seq_to_pdlmatrix($seqobj); # prepare the PWMs $self->reset(); while (my $pwm = $self->next() ) { push @PWMs,$pwm; } # do the analysis my $hitlist = TFBS::SiteSet->new(); foreach my $pwm (@PWMs) { my $threshold = ($args{-threshold} or $pwm->{minscore}); $hitlist->add_siteset($pwm->bsearch(-seqobj=>$seqobj, -threshold =>$threshold )); } delete $seqobj->{_pdl_matrix}; return $hitlist; } sub _seq_to_pdlmatrix { # not OO - help function for search my $seqobj = shift; my $seqstring = uc($seqobj->seq()); my @perlarray; foreach (qw(A C G T)) { my $seqtobits = $seqstring; eval "\$seqtobits =~ tr/$_/1/"; # curr. 
letter $_ to 1 eval "\$seqtobits =~ tr/1/0/c"; # non-1s to 0 push @perlarray, [split("", $seqtobits)]; } return byte (\@perlarray); } sub _to_seqobj { my ($self, %args) = @_; my $seq; if ($args{-file}) { # not a Bio::Seq return Bio::SeqIO->new(-file => $args{-file}, -format => 'fasta', -moltype => 'dna')->next_seq(); } elsif ($args{-seqstring} or $args{-seq}) { # I guess it's a string then return Bio::Seq->new(-seq => ($args{-seqstring} or $args{-seq}), -id => ($args{-seq_id} or "undefined"), -moltype => 'dna'); } elsif ($args{'-seqobj'} and ref($args{'-seqobj'}) =~ /Bio\:\:Seq/) { # do nothing (maybe check later) return $args{'-seqobj'}; } #elsif (ref($format) =~ /Bio\:\:Seq/ and !defined $seq) { # if only one parameter passed and it's a Bio::Seq #return $format; #} else { $self->throw ("Wrong parameters passed to search method: ".%args); } # CONTINUE HERE TOMORROW } sub add_Matrix { my ($self, @matrixlist) = @_; foreach (@matrixlist) { ref($_) =~ /TFBS::Matrix::/ or $self->throw("Attempted to add an element ". "that is not a TFBS::Matrix object."); push @{$self->{matrix_list}}, $_; push @{$self->{_iterator_list}}, $_; } return 1; } 1; TFBS-0.7.1/blib/lib/TFBS/PatternGen.pm000066400000000000000000000113021305752266700171100ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen - a base class for pattern generators =head1 DESCRIPTION TFBS::PatternGen is a base class providing methods common to all pattern generating modules. It is meant to be inherited by a concrete pattern generator, which must have its own constructor. 
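
A minimal sketch of what such a concrete generator might look like is given
here. The package name TFBS::PatternGen::MyTool is hypothetical and used for
illustration only; the sketch assumes that the motif objects stored under
the 'motifs' key provide a pattern() method, as the retrieval methods of this
base class expect.

    package TFBS::PatternGen::MyTool;   # hypothetical subclass name
    use strict;
    use vars qw(@ISA);
    use TFBS::PatternGen;
    @ISA = qw(TFBS::PatternGen);

    sub new {
        my ($caller, %args) = @_;
        my $self = bless {}, ref($caller) || $caller;

        # accepts -seq_list, -seq_stream or -seq_file and
        # stores the validated sequences in $self->{'seq_set'}
        $self->_create_seq_set(%args)
            or $self->throw("Error creating sequence set");

        # a real generator would run its external program here and
        # fill this list with one motif object per discovered pattern
        $self->{'motifs'} = [];

        return $self;
    }

    1;
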
=cut

package TFBS::PatternGen;

# Object preamble - inherits from TFBS::PatternGenI;

use vars qw(@ISA);
use strict;

use TFBS::PatternGenI;
# use TFBS::PatternGen::_Motif_;
use Bio::Seq;
use Bio::SeqIO;
use Carp;

@ISA = qw(TFBS::PatternGenI);

sub new {
    confess("TFBS::PatternGen is a base class for particular pattern generators ".
            "and cannot be instantiated itself.");
}

=head2 pattern

 Title   : pattern
 Usage   : my $pattern_obj = $patterngen->pattern()
 Function: retrieves a pattern object produced by the pattern generator
 Returns : a pattern object (currently available pattern generators
           return a TFBS::Matrix::PFM object)
 Args    : none
 Warning : If a pattern generator produces more than one pattern,
           this method call returns only the first one and prints
           a warning on STDERR. In those cases you should use the
           I<all_patterns> or I<patternSet> methods.

=cut

sub pattern {
    my ($self, %args) =@_;
    my @PFMs = $self->_motifs_to_patterns(%args);
    if (scalar(@PFMs) > 1) {
        $self->warn("The pattern generator produced multiple patterns. ".
                    "Please use patternSet method to retrieve a set object, ".
                    "or all_patterns method to retrieve an array of patterns");
    }
    return $PFMs[0];
}

=head2 patternSet

 Title   : patternSet
 Usage   : my $patternSet = $patterngen->patternSet()
 Function: retrieves a pattern set object containing all the patterns
           produced by the pattern generator
 Returns : a pattern set object (currently available pattern generators
           return a TFBS::MatrixSet object)
 Args    : none

=cut

sub patternSet {
    my ($self, %args) = @_;
    my @PFMs = $self->_motifs_to_patterns(%args);
    my $set = TFBS::MatrixSet->new();
    $set->add_matrix(@PFMs);
    return $set;
}

=head2 all_patterns

 Title   : all_patterns
 Usage   : my @patterns = $patterngen->all_patterns()
 Function: retrieves an array of pattern objects
           produced by the pattern generator
 Returns : an array of pattern objects (currently available
           pattern generators return an array of
           TFBS::Matrix::PFM objects)
 Args    : none

=cut

sub all_patterns {
    my ($self, %args) = @_;
    my @patterns = $self->_motifs_to_patterns(%args);
    return @patterns;
}

sub _create_seq_set {
    my ($self, %args) = @_;
    my (@raw_set, @final_set);
    if ($args{-seq_list}) {
        @raw_set = @{$args{-seq_list}};
    }
    elsif ($args{-seq_stream} ) {
        while (my $seqobj = $args{-seq_stream}->next_seq()) {
            push @raw_set, $seqobj;
        }
    }
    elsif ($args{-seq_file} ) {
        my $seqstream = Bio::SeqIO->new(-file=>$args{-seq_file},
                                        -format=>"fasta");
        while (my $seqobj = $seqstream->next_seq()) {
            push @raw_set, $seqobj;
        }
    }
    # counter for unnamed sequences; declared outside the loop so that
    # successive unnamed sequences get distinct IDs
    my $i = 1;
    foreach my $seqobj (@raw_set) {
        if (ref($seqobj)) {
            my $seqstring;
            eval { $seqstring = $seqobj->seq() };
            if ($@) {
                $self->throw("Invalid sequence object passed in -seq_set.");
            }
            else {
                _validate_seq(uc $seqstring)
                    or $self->throw("Illegal character(s) in sequence: $seqstring");
            }
            push @final_set, $seqobj;
        }
        else {
            my $seqstring = $seqobj;
            _validate_seq(uc $seqstring)
                or $self->throw("Illegal character(s) in sequence: $seqstring");
            push @final_set, Bio::Seq->new(-seq=>$seqstring,
                                           -ID=>"unnamed".$i++,
                                           -type=>"dna");
        }
    }
    $self->{'seq_set'} = \@final_set;
    return 1;
}

sub _motifs_to_patterns {
    my ($self, %args) = @_;
    my $i = 1;
    my @patterns;
    my %params = ( -name => "motif",
                   -ID => "motif",
                   -class => "unknown",
                   %args);
    foreach my $motif (@{ $self->{'motifs'} }) {
        push @patterns, $motif->pattern(-name => $params{-name}.$i,
                                        -ID => $params{-ID}."#".$i,
                                        -class => $params{-class});
        $i++;
    }
    return @patterns;
}

sub _validate_seq {
    # a utility function
    my $sequence = uc $_[0];
    $sequence=~ s/[ACGT]//g;
    return ($sequence eq "" ? 1 : 0);
}

sub _check_seqs_for_uniform_length {
    my $self = shift;
    my $reflength = $self->{'seq_set'}->[-1]->length();
    foreach my $seqobj ( @{ $self->{'seq_set'} } ) {
        if ($seqobj->length() != $reflength) {
            $self->throw(ref($self).
                         " object has received sequences of unequal length");
        }
    }
}

sub all_motifs {
    return @{$_[0]->{'motifs'}} if $_[0]->{'motifs'};
}
TFBS-0.7.1/blib/lib/TFBS/PatternGen/000077500000000000000000000000001305752266700165555ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/PatternGen/AnnSpec.pm000066400000000000000000000140051305752266700204420ustar00rootroot00000000000000
# TFBS module for TFBS::PatternGen::AnnSpec
#
# Copyright Wynand Alkema
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::AnnSpec - a pattern factory that uses the AnnSpec program

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::AnnSpec->new(-seq_file=>'sequences.fa',
                                           -binary => 'ann-spec');

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::AnnSpec builds position frequency matrices
using an external program AnnSpec (Workman, C. and Stormo, G.D. (2000)
ANN-Spec: A method for discovering transcription factor binding sites with
improved specificity. Proc. Pacific Symposium on Biocomputing 2000).

=head1 FEEDBACK

Please send bug reports and other comments to the author.
=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::AnnSpec;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;

use TFBS::PatternGen;
use TFBS::PatternGen::AnnSpec::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;   # used by _parse_sites

@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::AnnSpec->new(%args);
 Function: the constructor for the TFBS::PatternGen::AnnSpec object
 Returns : a TFBS::PatternGen::AnnSpec object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #  and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #  all the sequences
           Other arguments are:
            -binary       # a fully qualified path to the 'ann-spec' executable
                          #  OPTIONAL: default 'ann-spec'
            -additional_params  # a string containing additional
                                #   command-line switches for the
                                #   ann-spec program

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'filename'} =$args{'-seq_file'};
    $self->{'additional_params'} =
        ($args{'-additional_params'}
         ? (ref($args{'-additional_params'})
            ? join(' ', @{$args{'-additional_params'}})
            : $args{'-additional_params'})
         : "" );
    $self->{'binary'} = $args{'-binary'} || 'ann-spec';
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_AnnSpec() or $self->throw("Error running AnnSpec.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.
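
A minimal sketch of the three retrieval calls is given here. It assumes that
an 'ann-spec' executable is on the search path, that 'sequences.fa' is a
placeholder file name, and that the calling script loads TFBS::MatrixSet
itself, since patternSet() builds its result with that class.

    use TFBS::MatrixSet;
    use TFBS::PatternGen::AnnSpec;

    my $patterngen =
        TFBS::PatternGen::AnnSpec->new(-seq_file => 'sequences.fa',
                                       -binary   => 'ann-spec');

    my $pfm      = $patterngen->pattern();       # first pattern only
    my @patterns = $patterngen->all_patterns();  # every pattern, as a list
    my $set      = $patterngen->patternSet();    # a TFBS::MatrixSet object
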
=cut sub _run_AnnSpec{ my ($self)=shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $command_line = $self->{'binary'}." ". "-p ".$tmp_file." ". # $self->{'motif_length_string'}." ". # $self->{'nr_hits_string'}." ". $self->{'additional_params'}. ""; # print STDERR "$command_line\n"; my $resultstring = `$command_line`; # print STDERR $resultstring; $self->_parse_AnnSpec_output($resultstring,$command_line); unlink $tmp_file; return 1 } sub _parse_AnnSpec_output{ my ($self,$resultstring,$command_line)=@_; if ($resultstring eq''){ # warn "Error running AnnSpec\nNo patterns produced"; $self->throw ("Error running AnnSpec using command:\n $command_line"); return; } my ($consensus,$matrix)=$self->_parse_raw_matrix($resultstring); my ($score,$sites)=$self->_parse_sites($resultstring); my $motif =TFBS::PatternGen::AnnSpec::Motif->new ( #-length => $length."", # -bg_probabilities => [split /\s+/, $raw_bp], -tags => {consensus => $consensus, score=>$score}, -nr_hits => 1, -sites=>$sites, -matrix => $matrix ); push @{ $self->{'motifs'} }, $motif; return } sub _parse_sites{ my ($self,$string)=@_; # print $raw_motif; my @hits; my ($sites)=$string=~/STR BEST_SITES\n(.*)STR ave\(S\)/s; my ($average)=$string=~/STR ave\(S\)\s+(\d*\.*\d*)/; my ($score)=$string=~/STR ln\(ave\(sum\(exp\(S\)\)\)\)\s+(\d*\.*\d*)/; # print STDERR $score,"\n"; my @sites=split/\n/,$sites; shift @sites; # print "@sites\n"; foreach my $site (@sites){ my @site_array=split(/\s+/,$site); # print "$site_array[3]\n"; # print "$site_array[5]\n"; my ($seq_id)=$site_array[5]=~/>(.*)/; my $strand=1; $strand=-1 if $site_array[3]=~/\'/;#MEans we have a pattern in the reverse strand my ($start)=$site_array[3]=~/(\d+)/; my $site = Bio::SeqFeature::Generic->new ( -start => $start, -end => $start+(length$site_array[4])-1, -strand => $strand, -source => 'AnnSpec', 
-score => $site_array[2], ); # foreach my $seq(@{$self->{'seq_set'}}){ if ($seq->id eq $seq_id){ $site->attach_seq ($seq); } } push (@hits,$site); } return $score,\@hits; } sub _parse_raw_matrix{ my ($self,$string)=@_; my ($matrix)=$string=~/ALR ALIGNMENT_MATRIX.*ALR\s+-+(.*)ALR CONSENSUS/s; my ($consensus)=$string=~/ALR CONSENSUS (.*)\n/; #print $consensus; my @matrix=split("\n",$matrix); shift @matrix; my @pfm; foreach my $row(@matrix){ # print $row; my @row=split /\s+/, $row; push @pfm, [@row[2..scalar@row-1]]; } return $consensus, \@pfm; } 1; TFBS-0.7.1/blib/lib/TFBS/PatternGen/AnnSpec/000077500000000000000000000000001305752266700201045ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/PatternGen/AnnSpec/Motif.pm000066400000000000000000000023111305752266700215150ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::AnnSpec::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the AnnSpec program. You do not normally want to create a TFBS::PatternGen::AnnSpec::Motif yourself. They are created by running TFBS::PatternGen::AnnSpec =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard and Wynand Alkema Boris Lenhard EBoris.Lenhard@cgb.ki.seE Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
=cut # the code begins here: package TFBS::PatternGen::AnnSpec::Motif; use vars qw(@ISA); use strict; use TFBS::Matrix::PFM; use TFBS::PatternGen::Motif::Matrix; @ISA = qw(TFBS::PatternGen::Motif::Matrix); TFBS-0.7.1/blib/lib/TFBS/PatternGen/Elph.pm000066400000000000000000000202171305752266700200050ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::Elph # # Copyright Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::Elph - a pattern factory that uses the Elph program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::Elph->new(-seq_file=>'sequences.fa', -binary => '/Elph/elph' -motif_length => [8, 9, 10], -additional_params => '-x -r -e'); my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::Gibbs builds position frequency matrices using an advanced Gibbs sampling algorithm implemented in external I program by Chip Lawrence. The algorithm can produce multiple patterns from a single set of sequences. =cut package TFBS::PatternGen::Elph; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::Elph::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $db = TFBS::PatternGen::Gibbs->new(%args); Function: the constructor for the TFBS::PatternGen::Gibbs object Returns : a TFBS::PatternGen::Gibbs object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to Gibbs executable # OPTIONAL: default 'Gibbs' -nr_hits # a presumed number of pattern occurrences in the # sequence set: it can be a single integer, e.g. 
# -nr_hits => 24 , or a reference to an array of # integers, e.g -nr_hits => [12, 24, 36] -motif_length # an expected length of motif in nucleotides: # it can be a single integer, e.g. # -motif_length => 8 , or a reference to an # array ofintegers, e.g -motif_length => [8..12] -additional_params # a string containing additional # command-line switches for the # Gibbs program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'motif_length_string'} = ($args{'-motif_length'} ? (ref($args{'-motif_length'}) ? join(',', @{$args{'-motif_length'}}) : $args{'-motif_length'}) : 8 ); $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? join(' ', @{$args{'-additional_params'}}) : $args{'-additional_params'}) : "" ); $self->{'binary'} = $args{'-binary'} || 'elph'; $self->{'motifs'} = []; $self->_create_seq_set(%args) or die ('Error creating sequence set'); $self->_run_elph() or $self->throw("Error running elph."); return $self; } sub _run_elph { my $self = shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); $self->{'additional_params'}=~s/-b//; #This removes a -b switch. This enables long output containgin info about the sites my $command_line = $self->{'binary'}." ". $tmp_file." ". "LEN=".$self->{'motif_length_string'}." ". $self->{'additional_params'}." 2>/dev/null"; my $resultstring = `$command_line`; $self->_parse_elph_output($resultstring,$command_line); #print STDERR "$command_line\n"; #print STDERR $resultstring; # unlink $tmp_file; return 1 } =head2 pattern =head2 all_patterns =head2 patternSet The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see L for details. 
=cut sub _parse_elph_output { my ($self, $resultstring,$command_line) = @_; #print $resultstring; if ($resultstring=~/^error/){ $self->throw ("Error running elp command:\n $command_line"); return; } #Motif after optimizing #MAP for motif: 46.735 InfoPar=0.098 # #Motif found: # #Background probability model: # a c g t # 0.30 0.20 0.19 0.31 # #Background counts: #a: 1456 #c: 948 #g: 909 #t: 1487 # # #Motif probability model: #Pos: 1 2 3 4 5 6 #a 1.00 0.00 1.00 0.83 0.00 0.00 #c 0.00 0.00 0.00 0.00 0.00 0.17 #g 0.00 1.00 0.00 0.17 1.00 0.83 #t 0.00 0.00 0.00 0.00 0.00 0.00 #------------------------------------------ #Info 1.73 2.42 1.73 1.19 2.42 1.75 # #Motif counts: #a: 6 0 6 5 0 0 #c: 0 0 0 0 0 1 #g: 0 6 0 1 6 5 #t: 0 0 0 0 0 0 # # (my $MAP)=$resultstring=~/MAP for motif: (.*) InfoPar=/; ($resultstring)=~s/.*Motif counts:\n//s; #print STDERR $resultstring; my @array=split "\n",$resultstring; my @matrix; #print $array[0],"\n"; foreach (0..3){ my (@line)=split(/\s+/,$array[$_]); #print "@line\n"; shift @line; push @matrix,\@line; # print "@line\n"; } # print @matrix; #print $resultstring; my $sites=$self->_site_props($resultstring); my $motif =TFBS::PatternGen::Elph::Motif->new ( -tags => {score=>$MAP},#The score in this case is the E-value given in the output -sites=>$sites, -matrix => \@matrix ); # Seq.no Pos ***** Motif ***** Prob D Seq.Id # 1 354 ggatt AGAAGC cgccg 0.1389 -1 GAL1 # 2 636 caaag AGAAGG ttttt 0.6942 -1 GAL10 # 3 456 aaggc AGAAGG cagta 0.6942 -1 GAL2 # 4 444 aaagt AGAGGG ggtaa 0.1388 -1 GAL7 # 5 324 tagag AGAAGG agcaa 0.6942 -1 GAL80 # 6 165 gttac AGAAGG gccgc 0.6942 -1 GCY1 #$resultstring =~ s/.*=== MAP MAXIMIZATION RESULTS ===//s; #my @raw_motifs = split /\-+\n\s+MOTIF \w\n/s, $resultstring; #shift @raw_motifs; # discard the first one #foreach my $raw_motif (@raw_motifs) { # #print $raw_motif; # my $motif =$self->_parse_raw_motif($raw_motif) || next; push @{ $self->{'motifs'} }, $motif; #} #return 1; } sub _site_props{ my ($self,$resultstring)=@_; 
my @sites; # print $resultstring; #($resultstring)=~s/.*Motif counts:\n//s; my @array=split(/Seq\.no/,$resultstring); #print $array[1]; my @sites_array=split "\n", $array[1]; foreach my $line(@sites_array){ # print $line; next if $line=~/Pos/; last if $line eq''; my @site=split(/\s+/,$line); # print $site[1],"\n"; my $nr=0; $nr = 1 if $site[2]==1; #A special case when the site startsat the first base. #Then no preceding quence is given and the site array =shorter by 1 my $motif_seq=$site[4-$nr]; # print $motif_seq,"\n"; my $site = Bio::SeqFeature::Generic->new ( -start => $site[2], -end => $site[2]+(length$motif_seq)-1, -strand => 1, #Always 1 with elph -source => 'Elph', -score => $site[-3], ); foreach my $seq(@{$self->{'seq_set'}}){ if ($seq->id eq $site[-1]){#last element of the array $site->attach_seq ($seq); } } push (@sites,$site); } return \@sites; } 1; TFBS-0.7.1/blib/lib/TFBS/PatternGen/Elph/000077500000000000000000000000001305752266700174455ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/PatternGen/Elph/Motif.pm000066400000000000000000000021731305752266700210640ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. They are created by running TFBS::PatternGen::MEME =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. 
Internal methods are preceded with an underscore.

=cut

# the code begins here:

package TFBS::PatternGen::Elph::Motif;
use vars qw(@ISA);
use strict;

use TFBS::Matrix::PFM;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen::Motif::Matrix);

TFBS-0.7.1/blib/lib/TFBS/PatternGen/Gibbs.pm

# TFBS module for TFBS::PatternGen::Gibbs
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::Gibbs - a pattern factory that uses Chip Lawrence's Gibbs program

=head1 SYNOPSIS

    my $patterngen =
            TFBS::PatternGen::Gibbs->new(-seq_file=>'sequences.fa',
                                         -binary => '/Programs/Gibbs-1.0/bin/Gibbs',
                                         -nr_hits => 24,
                                         -motif_length => [8, 9, 10],
                                         -additional_params => '-x -r -e');

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::Gibbs builds position frequency matrices
using an advanced Gibbs sampling algorithm implemented in the external
Gibbs program by Chip Lawrence. The algorithm can produce
multiple patterns from a single set of sequences.

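Gibbs reports its motif model as percentages (residue frequency x 100),
so the values have to be scaled back to counts before a PFM can be built.
The following is a minimal standalone sketch of that step, assuming the
rounding rule used by _calculate_PFM in TFBS::PatternGen::Gibbs::Motif.
The matrix values, the hit count and the helper names percent_to_counts
and column_sums are made up for illustration and are not part of the
TFBS API.

```perl
use strict;
use warnings;

# Hypothetical percentage matrix (rows A, C, G, T; one column per motif
# position). The numbers are invented for illustration only.
my @percent = (
    [ 51,   0 ],    # A
    [ 49,   0 ],    # C
    [  0, 100 ],    # G
    [  0,   0 ],    # T
);
my $nr_hits = 150;  # assumed number of sites behind the percentages

# Scale each percentage by the hit count and round to the nearest
# integer: int(hits * value / 100 + 0.5).
sub percent_to_counts {
    my ($matrix, $hits) = @_;
    return [ map { [ map { int($hits * $_ / 100 + 0.5) } @$_ ] } @$matrix ];
}

# Sum every column; after rounding, a sum can differ from $hits.
sub column_sums {
    my ($matrix) = @_;
    my @sums;
    for my $row (@$matrix) {
        $sums[$_] += $row->[$_] for 0 .. $#$row;
    }
    return @sums;
}

my $counts = percent_to_counts(\@percent, $nr_hits);
my @sums   = column_sums($counts);

print join(" ", @$_), "\n" for @$counts;
print "column sums: @sums\n";
```

With these made-up numbers the first column sums to 151 and the second
to 150. That is the kind of rounding mismatch that can appear when
integer percentages are converted back to counts.
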
=cut package TFBS::PatternGen::Gibbs; use vars qw(@ISA); use strict; # Object preamble - inherits from TFBS::PatternGen; use TFBS::PatternGen; use TFBS::PatternGen::Gibbs::Motif; use File::Temp qw(:POSIX); use Bio::Seq; use Bio::SeqIO; @ISA = qw(TFBS::PatternGen); =head2 new Title : new Usage : my $db = TFBS::PatternGen::Gibbs->new(%args); Function: the constructor for the TFBS::PatternGen::Gibbs object Returns : a TFBS::PatternGen::Gibbs object Args : This method takes named arguments; you must specify one of the following three -seq_list # a reference to an array of strings # and/or Bio::Seq objects # or -seq_stream # A Bio::SeqIO object # or -seq_file # the name of the fasta file containing # all the sequences Other arguments are: -binary # a fully qualified path to Gibbs executable # OPTIONAL: default 'Gibbs' -nr_hits # a presumed number of pattern occurrences in the # sequence set: it can be a single integer, e.g. # -nr_hits => 24 , or a reference to an array of # integers, e.g -nr_hits => [12, 24, 36] -motif_length # an expected length of motif in nucleotides: # it can be a single integer, e.g. # -motif_length => 8 , or a reference to an # array ofintegers, e.g -motif_length => [8..12] -additional_params # a string containing additional # command-line switches for the # Gibbs program =cut sub new { my ($caller, %args) = @_; my $self = bless {}, ref($caller) || $caller; $self->{'motif_length_string'} = ($args{'-motif_length'} ? (ref($args{'-motif_length'}) ? join(',', @{$args{'-motif_length'}}) : $args{'-motif_length'}) : 8 ); $self->{'nr_hits_string'} = ($args{'-nr_hits'} ? (ref($args{'-nr_hits'}) ? join(',', @{$args{'-nr_hits'}}) : $args{'-nr_hits'}) : "" ); $self->{'additional_params'} = ($args{'-additional_params'} ? (ref($args{'-additional_params'}) ? 
join(' ', @{$args{'-additional_params'}}) : $args{'-additional_params'}) : "" ); $self->{'binary'} = $args{'-binary'} || 'Gibbs'; $self->{'motifs'} = []; $self->_create_seq_set(%args) or die ('Error creating sequence set'); #print $self->{'seq_set'}->[0]->seq; #$self->_seq_props; $self->_run_Gibbs() or $self->throw("Error running Gibbs."); return $self; } sub _run_Gibbs { my $self = shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $command_line = $self->{'binary'}." ". " -PBernoulli ". $tmp_file." ". $self->{'motif_length_string'}." ". $self->{'nr_hits_string'}." ". $self->{'additional_params'}." -n"; my $resultstring = `$command_line`; $self->_parse_Gibbs_output($resultstring); #print STDERR "$command_line\n"; #print STDERR $resultstring; unlink $tmp_file; return 1 } =head2 pattern =head2 all_patterns =head2 patternSet The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see L for details. 
=cut sub _parse_Gibbs_output { my ($self, $resultstring) = @_; #print $resultstring; #print"===========END_RESULTSTRING===============================\n"; $resultstring =~ s/.*=== MAP MAXIMIZATION RESULTS ===//s; my @raw_motifs = split /\-+\n\s+MOTIF \w\n/s, $resultstring; shift @raw_motifs; # discard the first one foreach my $raw_motif (@raw_motifs) { #print $raw_motif; my $motif =$self->_parse_raw_motif($raw_motif) || next; push @{ $self->{'motifs'} }, $motif; } return 1; } sub _site_props{ my ($self,$raw_motif)=@_; my @sites; # print $raw_motif; $raw_motif=~s/.*Num Motifs:\s+\d+\n//s; # print $raw_motif; #print "#####################################################\n"; $raw_motif=~s/\n\s+\*+.*//s; #print $raw_motif,"\n"; my @site_lines=(split("\n", $raw_motif)); foreach my $site(@site_lines){ my $start_seq; my $end_seq; my ($seq_nr,$pattern_nr,$start,$seq,$end,$score,$strand,$desc)=$site=~/\s+(\d+),\s+(\d+)\s+(\d+)\s+([\w,\s]+)\s+(\d+)\s+(\d\.\d+)\s+([F,R])(.*)/; # print $seq_nr,$pattern_nr,$start,$seq; if ($strand eq "F"){ $strand=1; $start_seq=$start; $end_seq=$end; } else{ $strand=-1; $start_seq=$end; $end_seq=$start; } #print $site; my $site = Bio::SeqFeature::Generic->new ( -start => $start_seq, -end => $end_seq, -strand => $strand, -source => 'Gibbs sampler', -score => $score, ); $site->attach_seq ($self->{'seq_set'}->[$seq_nr-1]); push (@sites,$site); } foreach my $site(@sites){ #print $site->start."\n"; } return \@sites; } sub _parse_raw_motif { # a utility function my ($self,$raw_motif) = @_; # print $raw_motif; my ($raw_matrix, $raw_bp, $length, $nr_hits, $MAP_score) = $raw_motif =~ /Motif model \(residue frequency x 100\)\n(.+)Motif probability model\n.+Background probability model\n\s+(.+?)\n.+\D(\d+) columns\nNum Motifs\: (\d+).+Difference of Logs of Maps = ([\-\.\d]+)\n/s; #print $raw_matrix; return undef unless $raw_matrix; my $sites = $self->_site_props($raw_motif); # print STDERR # join ":", ($raw_matrix, $raw_bp, $length, $nr_hits); print "\n"; 
my $matrix = _parse_raw_matrix($raw_matrix); #print $matrix; return TFBS::PatternGen::Gibbs::Motif->new #This object does not contain a new method. Instead the new method is than searched in the first ISA package. Remember that the object is still a TFBS::PatternGen::Gibbs::Motif. #The only ISA in the package is TFBS::PatternGen::Motif.pm. This package indeed contains the new method (-length => $length."", -bg_probabilities => [split /\s+/, $raw_bp], -MAP_score => $MAP_score, -tags => {MAP_score => $MAP_score}, -nr_hits => $nr_hits, -sites=>$sites, -matrix => $matrix ); } sub _parse_raw_matrix { # a utility function my $raw_matrix = shift; my @lines = split "\n", $raw_matrix; my (@A, @C, @G, @T); foreach my $line (@lines) { my $value_string; next unless ($value_string) = $line =~ /\s+\d+\s+\|\s+(.+)/; $value_string =~ s/\./0/g; my ($a, $t, $c, $g) = split /\s+/, $value_string; push @A, $a; push @C, $c; push @G, $g; push @T, $t; } # print STDERR join(" ",@A, "\n", @C, "\n", @G, "\n", @T, "\n"); return [\@A, \@C, \@G, \@T]; } 1; TFBS-0.7.1/blib/lib/TFBS/PatternGen/Gibbs/000077500000000000000000000000001305752266700176035ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/PatternGen/Gibbs/Motif.pm000066400000000000000000000042521305752266700212220ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::Gibbs::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::Gibbs::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::Gibbs::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the Gibbs program. You do not normally want to create a TFBS::PatternGen::Gibbs::Motif yourself. They are created by running TFBS::PatternGen::Gibbs =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Boris Lenhard and Wynand Alkema

Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt>
Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.

=cut

# the code begins here:

package TFBS::PatternGen::Gibbs::Motif;
use vars qw(@ISA);
use strict;

use TFBS::Matrix::PFM;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen::Motif::Matrix);

=head2 MAP

 Title   : MAP
 Usage   : my $map_score = $motif->MAP;
 Function: returns the MAP score for the detected motif
           (This is a backward compatibility method. For consistency,
           you should use $motif->tag('MAP_score') instead.)
 Returns : float (a scalar)
 Args    : none

=head2 Other methods

TFBS::PatternGen::Gibbs::Motif inherits from TFBS::PatternGen::Motif::Matrix,
which inherits from TFBS::Matrix. Please consult the documentation of those
modules for additional available methods.

=cut

sub MAP {
    my ($self) = @_;
    return $self->tag("MAP_score");
}

sub _calculate_PFM {
    my $self = shift;
    unless ($self->{'nr_hits'}) {
        $self->throw(ref($self).
                     " objects must be created with a (nonzero)".
" -nr_hits parameter in constructor" ); } my @PFM; foreach my $rowref ( @{$self->{'matrix'}} ) { my @PFMrow; foreach my $element (@$rowref) { push @PFMrow, int($self->{'nr_hits'}*$element/100 + 0.5); } push @PFM, [@PFMrow]; } return \@PFM; } TFBS-0.7.1/blib/lib/TFBS/PatternGen/MEME.pm000066400000000000000000000140171305752266700176410ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::MEME # # Copyright Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::MEME - a pattern factory that uses the MEME program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::MEME->new(-seq_file=>'sequences.fa', -binary => 'meme' my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::MEME builds position frequency matrices using an external program MEME written by Bailey and Elkan. For information and source code of MEME see http://www.sdsc.edu/MEME =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::MEME;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::SimplePFM;
use TFBS::PatternGen::MEME::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::MEME->new(%args);
 Function: the constructor for the TFBS::PatternGen::MEME object
 Returns : a TFBS::PatternGen::MEME object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences
           Other arguments are:
            -binary       # a fully qualified path to the 'meme' executable
                          #  OPTIONAL: default 'meme'
            -additional_params  # a string (or a reference to an array
                                # of strings) containing additional
                                # command-line switches for the
                                # meme program

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'filename'} = $args{'-seq_file'};
    $self->{'additional_params'} =
        ($args{'-additional_params'}
         ? (ref($args{'-additional_params'})
            ? join(' ', @{$args{'-additional_params'}})
            : $args{'-additional_params'})
         : "" );
    $self->{'binary'} = $args{'-binary'} || 'meme';
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_meme() or $self->throw("Error running MEME.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

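The matrix behind a MEME motif is not read from the MEME output as
numbers. Instead, _parse_raw_matrix collects the site sequences reported
by MEME and hands them to TFBS::PatternGen::SimplePFM, which counts the
bases column by column. The following is a minimal standalone sketch of
that counting step, not the module's own code. The site strings and the
helper name count_matrix are hypothetical and chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical aligned sites of equal length (illustration only).
my @sites = qw(AGAAGC AGAAGG AGAAGG AGAGGG AGAAGG AGAAGG);

# Build a 4 x length count matrix with rows in the order A, C, G, T.
sub count_matrix {
    my @seqs   = map { uc $_ } @_;
    my $length = length $seqs[0];
    my %row    = (A => 0, C => 1, G => 2, T => 3);
    my @matrix = map { [ (0) x $length ] } 0 .. 3;

    for my $seq (@seqs) {
        die "sequences must have equal length\n"
            unless length($seq) == $length;
        my @bases = split //, $seq;
        for my $j (0 .. $length - 1) {
            my $i = $row{ $bases[$j] };
            $matrix[$i][$j]++ if defined $i;   # symbols other than ACGT are not counted
        }
    }
    return \@matrix;
}

my $matrix = count_matrix(@sites);
print join(" ", @$_), "\n" for @$matrix;
```

Each printed row holds the counts for one base across all positions.
Every column adds up to the number of sites as long as the sites contain
only A, C, G and T.
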
=cut sub _run_meme{ my ($self)=shift; my $tmp_file = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $command_line = $self->{'binary'}." ". $tmp_file." ". "-text ". "-dna ". $self->{'additional_params'} ." 2>/dev/null" ; # print STDERR "$command_line\n"; my $resultstring = `$command_line`; # print STDERR $resultstring; $self->_parse_meme_output($resultstring,$command_line); unlink $tmp_file; return 1 } sub _parse_meme_output{ my ($self,$resultstring,$command_line)=@_; if ($resultstring=~/^error/){ # warn "Error running AnnSpec\nNo patterns produced"; $self->throw ("Error running MEME command:\n $command_line"); return; } my @motifs=split(/\*\nMOTIF/,$resultstring); shift @motifs;#discard the first one #print STDERR scalar @motifs,"\n"; foreach my $raw_motif(@motifs){ my ($matrix,$sites,$score)=$self->_parse_raw_matrix($raw_motif); # print STDERR $matrix; my $motif =TFBS::PatternGen::MEME::Motif->new ( -tags => {score=>$score},#The score in this case is the E-value given in the output -sites=>$sites, -matrix => $matrix ); push @{ $self->{'motifs'} }, $motif; } return } # # sub _parse_raw_matrix{ my ($self,$string)=@_; my @sites; my @matrix; $string=~s/(Motif \d+ block diagrams.*)//s; # print STDERR $string; my ($width,$e_value)=$string=~/width =\s+(\d+)\s+sites.*E-value =(.*)\n/; # print STDERR $e_value,"\n"; $string=~s/.*Motif \d+ sites sorted by position p-value//s; #print STDERR $string; my @array=split("\n",$string); foreach my $line(@array){ my $nr=0; my $strand=1;#if revcomp is not selected the strand is always 1 next if $line=~/^-/; next if $line=~/P-value\s+Site/; my (@properties)=split(/\s+/,$line); next if @properties<1; # print STDERR "@properties\n"; #First determine whether -revcomp switch is used and thus strand info is given if ($properties[1] eq "+" or $properties[1] eq "-"){ $strand=$properties[1]; $nr=1; } my 
$site = Bio::SeqFeature::Generic->new ( -start =>$properties[1+$nr], -end =>$properties[1+$nr]+$width-1, -strand=>$strand, -source=>'MEME', -score=>$properties[2+$nr] ); foreach my $seq(@{$self->{'seq_set'}}){ if ($seq->id eq $properties[0]){ $site->attach_seq ($seq); } } push @sites,$site; } foreach my $site(@sites){ push @matrix,$site->seq->seq; } my $patterngen=TFBS::PatternGen::SimplePFM->new(-seq_list=>\@matrix); my $matrix=$patterngen->pattern->rawprint; # print STDERR $matrix; return ($matrix,\@sites,$e_value); } 1;TFBS-0.7.1/blib/lib/TFBS/PatternGen/MEME/000077500000000000000000000000001305752266700173005ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/PatternGen/MEME/Motif.pm000066400000000000000000000021731305752266700207170ustar00rootroot00000000000000# TFBS module for TFBS::PatternGen::AnnSpec::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::AnnSpec::Motif - class for unprocessed motifs and associated numerical scores created by the Gibbs program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. They are created by running TFBS::PatternGen::MEME =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
=cut

# the code begins here:

package TFBS::PatternGen::MEME::Motif;
use vars qw(@ISA);
use strict;

use TFBS::Matrix::PFM;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen::Motif::Matrix);

TFBS-0.7.1/blib/lib/TFBS/PatternGen/Motif/Matrix.pm

package TFBS::PatternGen::Motif::Matrix;
use vars qw(@ISA);
use strict;

use TFBS::Matrix;
use TFBS::Matrix::PFM;
@ISA = qw(TFBS::Matrix);

sub new {
    my ($caller, %args) = @_;
    #my $matrix = TFBS::Matrix->new(%args, -matrixtype=>"PFM");
    #my $self = bless $matrix, ref($caller) || $caller;
    my $self = $caller->SUPER::new(%args, -matrixtype=>"PFM");
    $self->{'length'} = $args{'-length'} || scalar @{$self->{'matrix'}->[0]};
    $self->{'nr_hits'} = ($args{'-nr_hits'} || undef);
    # || $self->throw("No -nr_hits provided.");
    # Why was nr_hits required ?? (Boris)
    $self->{'sites'} = $args{'-sites'};
    # $self->{'tags'} = ($args{'-tags'} || {});
    return $self;
}

sub PFM {
    my ($self, %args) = @_;
    return TFBS::Matrix::PFM->new (-name  => "unknown",
                                   -ID    => "unknown",
                                   -class => "unknown",
                                   -tags  => { %{$self->{'tags'} } },
                                   %args,
                                   -matrix => $self->_calculate_PFM()
                                  );
}

sub pattern {
    my ($self, %args ) = @_;
    $self->PFM(%args);
}

sub _calculate_PFM {
    # simplest case: matrix already IS PFM
    my $self = shift;
    return [@{$self->{'matrix'}}];
}

sub get_sites {
    # return an empty list when no sites were stored
    return @{ $_[0]->{'sites'} || [] };
}

1;

TFBS-0.7.1/blib/lib/TFBS/PatternGen/Motif/Word.pm

package TFBS::PatternGen::Motif::Word;
use vars qw(@ISA);
use strict;

use TFBS::Word::Consensus;
@ISA = qw(TFBS::Word::Consensus);

sub new {
    my ($caller, %args) = @_;
    my $word = TFBS::Word::Consensus->new(%args);
    my $self = bless $word, ref($caller) || $caller;
    return $self;
}

sub pattern {
    # a word motif is its own pattern: return the object itself
    # ($_[0]), not the unrelated global topic variable $_
    return $_[0];
}

1;

TFBS-0.7.1/blib/lib/TFBS/PatternGen/SimplePFM.pm

# TFBS module for TFBS::PatternGen::SimplePFM
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::PatternGen::SimplePFM - a simple position frequency matrix factory

=head1 SYNOPSIS

    my @sequences = qw( AAGCCT AGGCAT AAGCCT
                        AAGCCT AGGCAT AGGCCT
                        AGGCAT AGGTTT AGGCAT
                        AGGCCT AGGCCT );
    my $patterngen =
            TFBS::PatternGen::SimplePFM->new(-seq_list=>\@sequences);

    my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object

=head1 DESCRIPTION

TFBS::PatternGen::SimplePFM generates a position frequency matrix from a set
of nucleotide sequences of equal length. The sequences can be passed either
as strings, as Bio::Seq objects or as a fasta file.

This pattern generator always creates only one pattern from a given set
of sequences.

=cut

package TFBS::PatternGen::SimplePFM;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::Motif::Matrix;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::SimplePFM->new(%args);
 Function: the constructor for the TFBS::PatternGen::SimplePFM object
 Returns : a TFBS::PatternGen::SimplePFM object
 Args    : This method takes named arguments;
            you must specify one of the following
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_check_seqs_for_uniform_length();
    $self->{'motifs'} = [$self->_create_motif()];
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please see L for details. =cut sub _create_motif { my $self = shift; my $length = $self->{'seq_set'}->[-1]->length(); # initialize the matrix my $matrixref = []; for my $i (0..3) { for my $j (0..$length-1) { $matrixref->[$i][$j] = 0; } } #fill the matrix my @base = qw(A C G T); foreach my $seqobj ( @{ $self->{seq_set} } ) { for my $i (0..3) { my $seqstring = $seqobj->seq; my @seqbase = split "", uc $seqstring; for my $j (0..$length-1) { $matrixref->[$i][$j] += ($base[$i] eq $seqbase[$j])?1:0; } } } my $nrhits =0; for my $i (0..3) {$nrhits += $matrixref->[$i][0];} my $motif = TFBS::PatternGen::Motif::Matrix->new(-matrix => $matrixref, -nr_hits=> $nrhits); return $motif; } sub _validate_seq { # a utility function my ($sequence)=@_; $sequence=~ s/[ACGT]//g; return ($sequence eq "" ? 1 : 0); } 1;TFBS-0.7.1/blib/lib/TFBS/PatternGen/YMF.pm000066400000000000000000000123031305752266700175450ustar00rootroot00000000000000 # TFBS module for TFBS::PatternGen::YMF # # Copyright Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternGen::MEME - a pattern factory that uses the MEME program =head1 SYNOPSIS my $patterngen = TFBS::PatternGen::MEME->new(-seq_file=>'sequences.fa', -binary => 'meme' my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object =head1 DESCRIPTION TFBS::PatternGen::MEME builds position frequency matrices using an external program MEME written by Bailey and Elkan. For information and source code of MEME see http://www.sdsc.edu/MEME =head1 FEEDBACK Please send bug reports and other comments to the author. 
=head1 AUTHOR - Wynand Alkema

Wynand Alkema E<lt>Wynand.Alkema@cgb.ki.seE<gt>

=cut

package TFBS::PatternGen::YMF;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from TFBS::PatternGen;
use TFBS::PatternGen;
use TFBS::PatternGen::YMF::Motif;
use File::Temp qw(:POSIX);
use Bio::Seq;
use Bio::SeqIO;
use File::Temp qw/ tempfile tempdir /;
@ISA = qw(TFBS::PatternGen);

=head2 new

 Title   : new
 Usage   : my $patterngen = TFBS::PatternGen::YMF->new(%args);
 Function: the constructor for the TFBS::PatternGen::YMF object
 Returns : a TFBS::PatternGen::YMF object
 Args    : This method takes named arguments;
            you must specify one of the following three
            -seq_list     # a reference to an array of strings
                          #      and/or Bio::Seq objects
              # or
            -seq_stream   # A Bio::SeqIO object
              # or
            -seq_file     # the name of the fasta file containing
                          #      all the sequences
           Other arguments are:
            -abs_stats_path        # the directory in which the 'stats'
                                   # executable is located
            -config_file           # the configuration file passed to stats
                                   #  OPTIONAL: defaults to stats.config
                                   #  in the -stats_path directory
            -stats_path            # the directory holding the example
                                   # stats.config; used only when
                                   # -config_file is not given
            -length_region         # the region length passed to stats
            -length_oligo          # the oligo (word) length passed to stats
            -pathoforganismtables  # the path to the organism tables
                                   # passed to stats

=cut

sub new {
    my ($caller, %args) = @_;
    my $self = bless {}, ref($caller) || $caller;
    $self->{'width'}      = $args{'-length_oligo'};
    $self->{'path_org'}   = $args{'-pathoforganismtables'};
    $self->{'len_region'} = $args{'-length_region'};
    $self->{'config_file'} =
        $args{'-config_file'} || $args{'-stats_path'}."/stats.config";
    # The latter is the example config file that comes with the
    # installation of YMF
    $self->{'abs_stats_path'} = $args{'-abs_stats_path'};
    # This is the directory where the executable is located
    $self->_create_seq_set(%args) or die ('Error creating sequence set');
    $self->_run_stats() or $self->throw("Error running stats.");
    return $self;
}

=head2 pattern

=head2 all_patterns

=head2 patternSet

The three methods listed above are used for the retrieval of patterns,
and are common to all TFBS::PatternGen::* classes. Please
see L<TFBS::PatternGen> for details.

=cut sub _run_stats{ my ($self)=shift; my $tmp_file = tmpnam(); my $dumpfile = tmpnam(); my $outstream = Bio::SeqIO->new(-file=>">$tmp_file", -format=>"fasta"); foreach my $seqobj (@{ $self->{'seq_set'} } ) { $outstream->write_seq($seqobj); } $outstream->close(); my $dir = tempdir(); #print $dir; #change directory to directory where the program is located #system 'cd $dir.w;'; # my $command="cd $dir;"; # print $command; # system $command; # `$command`; # system 'ls -ltr'; my $command_line = $self->{'abs_stats_path'}."/stats ". # "stats ". $self->{'config_file'}." ". $self->{'len_region'}." ". $self->{'width'}." ". $self->{'path_org'}." ". "-sort ".#sorts on z-score $tmp_file ." >$dumpfile" ; # print STDERR "cd $dir;$command_line\n"; my $resultstring = `cd $dir;$command_line`; # print STDERR $resultstring; $self->_parse_stats_output($resultstring,$command_line,$dumpfile,$dir); unlink $tmp_file; #unlink $dumpfile; return 1 } # sub _parse_stats_output{ my ($self,$resultstring,$command_line,$dumpfile,$temp_dir)=@_; open DUMP,$dumpfile; while(){ if ((/(^Error.*)/) or /(.*Aborting.*)/){ # warn "Error running AnnSpec\nNo patterns produced"; print "YMF Error message: \"$1\"\n"; unlink $dumpfile; $self->throw ("Error running YMF using command:\n $command_line"); return; } } unlink $dumpfile; open RES,"$temp_dir/results"; my $skip=; while (){ my ($word,$occ,$z_score,$expect,$var)=split; #print $word; my $motif =TFBS::PatternGen::YMF::Motif->new (-word=>$word, -tags => {z_score=>$z_score, 'occurences'=>$occ, 'expectation value'=>$expect, 'variance'=>$var} ); push @{ $self->{'motifs'} }, $motif; } my $command="rm -r $temp_dir"; #print $command; `$command`;# or die "could not unlink $!"; # return } # 1;TFBS-0.7.1/blib/lib/TFBS/PatternGen/YMF/000077500000000000000000000000001305752266700172105ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/PatternGen/YMF/Motif.pm000066400000000000000000000022001305752266700206160ustar00rootroot00000000000000# TFBS module for 
TFBS::PatternGen::YMF::Motif # # Copyright Boris Lenhard and Wynand Alkema # # You may distribute this module under the same terms as perl itself # # POD # POD =head1 NAME TFBS::PatternGen::YMF::Motif - class for unprocessed motifs and associated numerical scores created by the YMF program =head1 SYNOPSIS =head1 DESCRIPTION TFBS::PatternGen::YMF::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the ymf program. You do not normally want to create a TFBS::PatternGen::YMF::Motif yourself. They are created by running TFBS::PatternGen::YMF =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Wynand Alkema Wynand Alkema EWynand.Alkema@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # the code begins here: package TFBS::PatternGen::YMF::Motif; use vars qw(@ISA); use strict; #use TFBS::Word; #use TFBS::Word::Consensus; use TFBS::PatternGen::Motif::Word; @ISA = qw(TFBS::PatternGen::Motif::Word); 1; TFBS-0.7.1/blib/lib/TFBS/PatternGenI.pm000066400000000000000000000007751305752266700172350ustar00rootroot00000000000000package TFBS::PatternGenI; use vars qw(@ISA); use strict; # Object preamble - inherits from Bio::RootI; use Bio::Root::Root; use Carp; @ISA = qw(Bio::Root::Root); sub pattern { my $self = shift; $self->_abstractDeath; } sub _abstractDeath { # borrowed from BioPerl; with compliments :) my $self = shift; my $package = ref $self; my $caller = (caller())[1]; confess "Abstract method '$caller' defined in interface TFBS::PatternGenI not implemented by package $package"; } TFBS-0.7.1/blib/lib/TFBS/PatternI.pm000066400000000000000000000060201305752266700165700ustar00rootroot00000000000000# TFBS module for TFBS::PatternI # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::PatternI - interface definition for all 
pattern objects (currently includes matrices and word (consensus and regular expressions ) =head1 DESCRIPTION TFBS::PatternI is a draft class that should contain general interface for matrix and other (future) pattern objects. It is not defined and not used yet, as I need to ponder over certain unresolved issues in general pattern definition. User feedback is more than welcome. =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # The code begins here: # The code begins HERE: package TFBS::PatternI; use vars '@ISA'; use Bio::Root::Root; use strict; @ISA = qw(Bio::Root::Root); #sub new { #} =head2 ID Title : ID Usage : my $ID = $icm->ID() $pfm->ID('M00119'); Function: Get/set on the ID of the pattern (unique in a DB or a set) Returns : pattern ID (a string) Args : none for get, string for set =cut sub ID { my ($self, $ID) = @_; $self->{'ID'} = $ID if $ID; return $self->{'ID'}; } =head2 name Title : name Usage : my $name = $pwm->name() $pfm->name('PPARgamma'); Function: Get/set on the name of the pattern Returns : pattern name (a string) Args : none for get, string for set =cut sub name { my ($self, $name) = @_; $self->{'name'} = $name if $name; return $self->{'name'}; } =head2 class Title : class Usage : my $class = $pwm->class() $pfm->class('forkhead'); Function: Get/set on the structural class of the pattern Returns : class name (a string) Args : none for get, string for set =cut sub class { my ($self, $class) = @_; $self->{'class'} = $class if $class; return $self->{'class'}; } =head2 tag Title : tag Usage : my $acc = $pwm->tag('acc') $pfm->tag(source => "Gibbs"); Function: Get/set on the structural class of the pattern Returns : tag value (a scalar/reference) Args : tag name (string) for get, tag name (string) and value (any 
            scalar/reference) for set

=cut

sub tag {
    my $self = shift;
    my $tag = shift || return;
    if (scalar @_) {
        $self->{'tags'}->{$tag} = shift;
    }
    return $self->{'tags'}->{$tag};
}

=head2 all_tags

 Title   : all_tags
 Usage   : my %tag = $pfm->all_tags();
 Function: get a hash of all tags for a matrix
 Returns : a hash of all tag values keyed by tag name
 Args    : none

=cut

sub all_tags {
    return %{$_[0]->{'tags'}};
}

=head2 delete_tag

 Title   : delete_tag
 Usage   : $pfm->delete_tag('score');
 Function: delete the tag with the given name from a matrix
 Returns : nothing
 Args    : a string (tag name)

=cut

sub delete_tag {
    my ($self, $tag) = @_;
    delete $self->{'tags'}->{$tag};
}

1;

TFBS-0.7.1/blib/lib/TFBS/Site.pm

# TFBS module for TFBS::Site
#
# Copyright Boris Lenhard
#
# You may distribute this module under the same terms as perl itself
#

# POD

=head1 NAME

TFBS::Site - a nucleotide sequence feature object representing a (possibly putative) transcription factor binding site.

=head1 SYNOPSIS

    # manual creation of site object;
    # for details, see documentation of Bio::SeqFeature::Generic;

    my $site = TFBS::Site->new
                  (-start => $start_pos,     # integer
                   -end   => $end_pos,       # integer
                   -score => $score,         # float
                   -source => "TFBS",        # string
                   -primary => "TF binding site",  # primary tag
                   -strand => $strand,       # -1, 0 or 1
                   -seqobj => $seqobj,       # a Bio::Seq object whose sequence
                                             #         contains the site
                   -pattern => $pattern_obj  # usu. TFBS::Matrix::PWM obj.
                  );

    # Searching sequence with a pattern (PWM) and retrieving individual sites:
    #
    # The following objects should be defined for this example:
    #     $pwm    - a TFBS::Matrix::PWM object
    #     $seqobj - a Bio::Seq object
    # Consult the documentation for the above modules if you do not know
    # how to create them.
# Scanning sequence with $pwm returns a TFBS::SiteSet object: my $site_set = $pwm->search_seq(-seqobj => $seqobj, -threshold => "80%"); # To retrieve individual sites from $site_set, create an iterator obj: my $site_iterator = $site_set->Iterator(-sort_by => "score"); while (my $site = $site_iterator->next()) { # do something with $site } =head1 DESCRIPTION TFBS::Site object holds data for a (possibly predicted) transcription factor binding site on a nucleotide sequence (start, end, strand, score, tags, as well as references to the corresponding sequence and pattern objects). TFBS::Site is a subclass of Bio::SeqFeature::Generic and has acces to all of its method. Additionally, it contains the pattern() method, an accessor for pattern object associated with the site object. =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. TFBS::Site is a class that extends Bio::SeqFeature::Generic. Please consult Bio::SeqFeature::Generic documentation for other available methods. =cut # The code begins HERE: package TFBS::Site; use vars qw(@ISA); use strict; use Bio::SeqFeature::Generic; @ISA = qw(Bio::SeqFeature::Generic); =head2 new Title : new Usage : my $site = TFBS::Site->new(%args) Function: constructor for the TFBS::Site object Returns : TFBS::Site object Args : -start, # integer -end, # integer -strand, # -1, 0 or 1 -score, # float -source, # string (method used to detect it) -primary, # string (primary tag) -seqobj, # a Bio::Seq object -pattern # a pattern object, usu. 
TFBS::Matrix::PWM =cut sub new { my $class = shift; my %args = (-seq_id => undef, -siteseq => undef, -seqobj => undef, -strand => "0", -source => "TFBS", -primary => "TF binding site", -pattern => undef, -score => undef, -start => undef, -end => undef, -frame => 0, @_); my $obj = Bio::SeqFeature::Generic->new(%args); my $self = bless $obj, ref($class) || $class; if ($args{-seqobj}) { $self->attach_seq($args{-seqobj}) ; $self->add_tag_value('sequence', $self->seq->seq); } # this is only for GFF printing really, and will be moved there soon if (defined $args{'-pattern'}) { $self->pattern($args{'-pattern'}); $self->add_tag_value('TF' => $self->pattern->name()); $self->add_tag_value('class' => $self->pattern->class) if $self->pattern->class; } return $self; } =head2 pattern Title : pattern Usage : my $pattern = $site->pattern(); # gets the pattern $site->pattern($pwm); # sets the pattern to $pwm Function: gets/sets the pattern object associated with the site Returns : pattern object, here TFBS::Matrix::PWM object Args : pattern object (optional, for setting the pattern only) =cut sub pattern { my ($self, $pattern) = @_; if (defined $pattern) { $self->{'pattern'} = $pattern; } return $self->{'pattern'}; } =head2 rel_score Title : rel_score Usage : my $percent_score = $site->rel_score() * 100; # relative score as a percentage Function: gets the relative score (between 0.0 and 1.0) with respect to the score range of the associated pattern (matrix) Returns : floating point number between 0 and 1, or undef if pattern not defined Args : none =cut sub rel_score { my ($self) = @_; return undef unless $self->pattern(); return ($self->score - $self->pattern->min_score)/ ($self->pattern->max_score - $self->pattern->min_score); } =head2 GFF Title : GFF Usage : print $site->GFF(); : print $site->GFF($gff_formatter) Function: returns a "standard" GFF string - the "generic" gff_string method is left untouched for possible customizations Returns : a string (NOT newline terminated!
) Args : a $gff_formatter function reference (optional) =cut sub GFF { # due to popular demand, GFF is again a legal method, this time # not requiring GFF modules return $_[0]->gff_string($_[1]); } =head2 location =head2 start =head2 end =head2 length =head2 score =head2 frame =head2 sub_SeqFeature =head2 add_sub_SeqFeature =head2 flush_sub_SeqFeature =head2 primary_tag =head2 source_tag =head2 has_tag =head2 add_tag_value =head2 each_tag_value =head2 all_tags =head2 remove_tag =head2 attach_seq =head2 seq =head2 entire_seq =head2 seq_id =head2 annotation =head2 gff_format =head2 gff_string The above methods are inherited from Bio::SeqFeature::Generic. Please see L<Bio::SeqFeature::Generic> for details on their usage. =cut ################################################################## # BACKWARD COMPATIBILITY METHODS sub Matrix { my ($self, @args) = @_; return $self->pattern(@args); } sub seqobj { } sub siteseq { $_[0]->seq->seq(); } sub site_length { my ($self) = @_; $self->warn("site_length method is present for backward compatibility only. In new code please use the length() method"); return $self->length(); } sub old_GFF { eval "require GFF::GeneFeature;"; if ($@) { print STDERR "Failed to load GFF modules, stopped"; return; } my ($self, %tags) =@_; $self->warn("GFF method is for backward compatibility only, and its use in new code is not recommended.
Please use Bio::SeqFeature::Generic gff methods if possible."); my $GFFgf = GFF::GeneFeature->new(2); $GFFgf->seqname ( $self->seqname() or "Unknown" ); $GFFgf->source ("TFBS"); $GFFgf->feature ("TFBS"); $GFFgf->start ($self->start()); $GFFgf->end ($self->end()); $GFFgf->score ($self->score()); $GFFgf->strand (("-",".","+")[$self->strand()+1]); # $GFFgf->strand ($self->strand()); %tags = (TF => $self->pattern->{name}, class => $self->pattern->{class}, sequence => $self->seq->seq(), %tags); while (my ($tag, $value) = each %tags) { my @values; if (ref($value) eq "ARRAY") { @values = @$value; } else { @values = ($value); } $GFFgf->attribute($tag, @values); } return $GFFgf; } 1; TFBS-0.7.1/blib/lib/TFBS/SitePair.pm000066400000000000000000000027651305752266700165760ustar00rootroot00000000000000package TFBS::SitePair; use vars qw(@ISA); use strict; use Bio::SeqFeature::FeaturePair; @ISA = qw(Bio::SeqFeature::FeaturePair); # 'new' used to be inherited, but we need it now sub new { my ($caller, $site1, $site2) = @_; if ($Bio::Root::Root::VERSION < 1.4) { return $caller->SUPER::new($site1, $site2); } else { return $caller->SUPER::new(-feature1 => $site1, -feature2 => $site2); } } =head2 pattern Title : pattern Usage : my $pattern = $sitepair->pattern(); # gets the pattern # sets the pattern to $pwm Function: gets the pattern object associated with the site pair Returns : pattern object, here TFBS::Matrix::PWM object Args : none (get-only method) =cut sub pattern { $_[0]->feature1->pattern(); } =head2 GFF Title : GFF Usage : print $site->GFF(); : print $site->GFF($gff_formatter) Function: returns a "standard" multiline GFF string Returns : a string (multiline, newline terminated) Args : a $gff_formatter function reference (optional) =cut sub GFF { return join "\n", $_[0]->site1->GFF, $_[0]->site2->GFF; } =head2 site1 =head2 site2 Title : site1 site2 Usage : my $site1 = $sitepair->site1(); Function: Returns individual TFBS::Site objects, from the site pair Returns : a 
TFBS::Site Args : none =cut sub site1 { $_[0]->feature1(); } sub site2 { $_[0]->feature2(); } 1; TFBS-0.7.1/blib/lib/TFBS/SitePairSet.pm000066400000000000000000000142441305752266700172450ustar00rootroot00000000000000# TFBS module for TFBS::SitePairSet # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::SitePairSet - a set of TFBS::SitePair objects =head1 SYNOPSIS my $site_pair_set = TFBS::SitePairSet->new(@list_of_site_pair_objects); # add a TFBS::SitePair object to set: $site_pair_set->add_site_pair($site_pair_obj); # append the contents of another TFBS::SitePairSet: $site_pair_set->add_site_pair_set($other_site_pair_set); # create an iterator: my $it = $site_pair_set->Iterator(-sort_by => 'start'); =head1 DESCRIPTION TFBS::SitePairSet is an aggregate class that contains a collection of TFBS::SitePair objects. It can be created anew and filled with TFBS::SitePair objects. It is also returned by the search_aln() method call of TFBS::PatternI subclasses (e.g. TFBS::Matrix::PWM). =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt> =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
=cut # The code begins HERE: package TFBS::SitePairSet; use vars qw(@ISA $AUTOLOAD); use strict; use TFBS::SitePair; use TFBS::_Iterator::_SiteSetIterator; @ISA = qw(Bio::Root::Root); sub new { my ($class, @data) = @_; my $self = bless {}, ref($class) || $class; $self->{_site_array_ref} = []; @data = @{$class->{_site_array_ref}} if !@data && ref($class); $self->add_site_pair(@data); return $self; } =head2 size Title : size Usage : my $size = $sitepairset->size() Function: returns the number of TFBS::SitePair objects contained in the set Returns : a scalar (integer) Args : none =cut sub size { scalar @{ $_[0]->{_site_array_ref} }; } =head2 add_site_pair Title : add_site_pair Usage : $sitepairset->add_site_pair($site_pair_object) $sitepairset->add_site_pair(@list_of_site_pair_objects) Function: adds TFBS::SitePair objects to an existing TFBS::SitePairSet object Returns : 1 (usually ignored) Args : A list of TFBS::SitePair objects to add =cut sub add_site_pair { my ($self, @site_list) = @_; foreach my $site (@site_list) { $site->isa("TFBS::SitePair") or $self->throw("Attempted to add an element ". "of a wrong type."); push @{$self->{_site_array_ref}}, $site; } return 1; } =head2 add_site_pair_set Title : add_site_pair_set Usage : $sitepairset->add_site_pair_set($site_pair_set_object) $sitepairset->add_site_pair_set(@list_of_site_pair_set_objects) Function: adds the contents of other TFBS::SitePairSet objects to an existing TFBS::SitePairSet object Returns : $sitepairset object (usually ignored) Args : A list of TFBS::SitePairSet objects whose contents should be added to $sitepairset =cut sub add_site_pair_set { my ($self, @sitesets) = @_; foreach my $siteset (@sitesets) { $siteset->isa("TFBS::SitePairSet") or $self->throw("Attempted to add an element ".
"that is not a TFBS::SiteSet object."); push @{$self->{_site_array_ref}}, @{ $siteset->{_site_array_ref} }; } return $self; } =head2 Iterator Title : Iterator Usage : my $it = $sitepairset->Iterator(-sort_by=>'start'); while (my $site_pair = $it->next()) { #... Function: Returns an iterator object, used to iterate thorugh elements (TFBS::SitePair objects) Returns : a TFBS::_Iterator object Args : -sort_by # optional - currently it accepts # (default sort order in parenthetse) # 'name' (pattern name, alphabetically) # 'ID' (pattern/matrix ID, alphabetically) # 'start' (site start in sequence, # numerically,increasing order) # 'end' (site end in sequence, # numerically, increasing order) # 'score' (1st site in pair, # numerically, decreasing order) -reverse # optional - reverses the default sorting order if true =cut sub Iterator { my ($self, %args) = @_; return TFBS::_Iterator::_SiteSetIterator->new($self->{_site_array_ref}, $args{'-sort_by'}, $args{'-reverse'} ); } =head2 set1 =head2 set2 Title : set1 set2 Usage : my $siteset1 = $sitepairset->set1(); : my $siteset2 = $sitepairset->set2() Function: Returns individual TFBS::SiteSet objects, from the site set pair Returns : A TFBS::SiteSet object Args : none =cut sub set1 { $_[0]->_get_set(1); } sub set2 { $_[0]->_get_set(2); } =head2 GFF Title : GFF Usage : print $site->GFF(); : print $site->GFF($gff_formatter) Function: returns a "standard" multiline GFF string Returns : a string (multiline, newline terminated) Args : a $gff_formatter function reference (optional) =cut sub GFF { my ($self, %args) = @_; my $iterator = $self->Iterator(-sort_by=>'start'); my $gff_string = ""; while (my $sitepair = $iterator->next()) { $gff_string .= $sitepair->GFF(%args)."\n"; } return $gff_string; } ############################################################## # PRIVATE AND AUTOMATIC METHODS ############################################################## sub _get_set { my ($self, $set_nr) = @_; my $feature = "feature$set_nr"; my $it = 
$self->Iterator(); require TFBS::SiteSet; my $siteset = TFBS::SiteSet->new(); while (my $site_pair = $it->next()) { $siteset->add_site($site_pair->$feature()); } return $siteset; } sub AUTOLOAD { my ($self) = @_; my %discontinued = (sort => 1, sort_by_name => 1, sort_reversed => 1, reverse => 1, next_site => 1, reset => 1 ); $AUTOLOAD =~ /.+::(\w+)/; if ($discontinued{$1}) { $self->_no_more($1); } else { $self->throw("$1: no such method"); } } sub _no_more { $_[0]->throw("Method '$_[1]' is no longer available in ". ref($_[0]).". Use the 'Iterator' method instead."); } 1; TFBS-0.7.1/blib/lib/TFBS/SiteSet.pm000066400000000000000000000141011305752266700164210ustar00rootroot00000000000000# TFBS module for TFBS::SiteSet # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::SiteSet - a set of TFBS::Site objects =head1 SYNOPSIS my $site_set = TFBS::SiteSet->new(@list_of_site_objects); # add a TFBS::Site object to set: $site_set->add_site($site_obj); # append the contents of another TFBS::SiteSet: $site_set->add_site_set($other_site_set); # create an iterator: my $it = $site_set->Iterator(-sort_by => 'start'); =head1 DESCRIPTION TFBS::SiteSet is an aggregate class that contains a collection of TFBS::Site objects. It can be created anew and filled with TFBS::Site objects. It is also returned by the search_seq() method call of some TFBS::PatternI subclasses (e.g. TFBS::Matrix::PWM). =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt> =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
=cut # The code begins HERE: package TFBS::SiteSet; use vars qw(@ISA $AUTOLOAD); use TFBS::Site; use TFBS::_Iterator::_SiteSetIterator; use strict; @ISA = qw(Bio::Root::Root); sub new { my ($class, @data) = @_; my $self = bless {}, ref($class) || $class; $self->{_site_array_ref} = []; @data = @{$class->{_site_array_ref}} if !@data && ref($class); $self->add_site(@data); return $self; } =head2 add_site Title : add_site Usage : $siteset->add_site($site_object) $siteset->add_site(@list_of_site_objects) Function: adds TFBS::Site objects to an existing TFBS::SiteSet object Returns : 1 (usually ignored) Args : A list of TFBS::Site objects to add =cut sub add_site { my ($self, @site_list) = @_; foreach my $site (@site_list) { ref($site) =~ /TFBS::Site*/ or $self->throw("Attempted to add an element ". "of a wrong type."); push @{$self->{_site_array_ref}}, $site; } return 1; } =head2 add_site_set Title : add_site_set Usage : $siteset->add_site_set($site_set_object) $siteset->add_site_set(@list_of_site_set_objects) Function: adds the contents of other TFBS::SiteSet objects to an existing TFBS::SiteSet object Returns : $siteset object (usually ignored) Args : A list of TFBS::SiteSet objects whose contents should be added to $siteset =cut sub add_site_set { my $self = shift; return $self->add_siteset(@_); } sub add_siteset { my ($self, @sitesets) = @_; foreach my $siteset (@sitesets) { ref($siteset) =~ /TFBS::Site.*Set/ or $self->throw("Attempted to add an element ".
"that is not a TFBS::SiteSet object."); push @{$self->{_site_array_ref}}, @{ $siteset->{_site_array_ref} }; } return $self; } =head2 size Title : size Usage : my $size = $siteset->size() Function: returns a number of TFBS::Site objects contained in the set Returns : a scalar (integer) Args : none =cut sub size { scalar @{ $_[0]->{_site_array_ref} }; } =head2 Iterator Title : Iterator Usage : my $siteset_iterator = $siteset->Iterator(-sort_by =>'start'); while (my $site = $siteset_iterator->next) { # do whatever you want with individual matrix objects } Function: Returns an iterator object that can be used to go through all members of the set (TFBS::Site objects) Returns : an iterator object (currently undocumentened in TFBS - but understands the 'next' method) Args : -sort_by # optional - currently it accepts # (default sort order in parenthetse) # 'name' (pattern name, alphabetically) # 'ID' (pattern/matrix ID, alphabetically) # 'start' (site start in sequence, # numerically,increasing order) # 'end' (site end in sequence, # numerically, increasing order) # 'score' (numerically, decreasing order) -reverse # optional - reverses the default sorting order if true =cut sub Iterator { my ($self, %args) = @_; return TFBS::_Iterator::_SiteSetIterator->new($self->{_site_array_ref}, $args{'-sort_by'}, $args{'-reverse'} ); } sub all_sites { my ($self,%args) = @_; return @{$self->{_site_array_ref}} if @{$self->{_site_array_ref}}; } =head2 GFF Title : GFF Usage : print $siteset->GFF(); : print $siteset->GFF($gff_formatter) Function: returns a "standard" multiline GFF string Returns : a string (multiline, newline terminated) Args : a $gff_formatter function reference (optional) =cut sub GFF { my ($self, %args) = @_; my $site_iterator = $self->Iterator(-sort_by=>'start'); my $gff_string = ""; while (my $site = $site_iterator->next()) { $gff_string .= $site->GFF(%args)."\n"; } return $gff_string; } ######################################################## # OBSOLETE METHODS 
######################################################## sub old_GFF { eval "require GFF::GeneFeatureSet;"; if ($@) { print STDERR "Failed to load GFF modules, stopped"; return; } my ($self) = @_; my $site_iterator = $self->Iterator(-sort_by=>'start'); my $GFFset = GFF::GeneFeatureSet->new(2); while (my $site = $site_iterator->next()) { $GFFset->addGeneFeature($site->GFF()); } return $GFFset; } ############################################################## # PRIVATE AND AUTOMATIC METHODS ############################################################## sub AUTOLOAD { my ($self) = @_; my %discontinued = (sort => 1, sort_by_name => 1, sort_reversed => 1, reverse => 1, next_site => 1, reset => 1 ); $AUTOLOAD =~ /.+::(\w+)/; if ($discontinued{$1}) { $self->_no_more($1); } else { $self->throw("$1: no such method"); } } sub _no_more { $_[0]->throw("Method '$_[1]' is no longer available in ". ref($_[0]).". Use the 'Iterator' method instead."); } 1; TFBS-0.7.1/blib/lib/TFBS/Tools/000077500000000000000000000000001305752266700156065ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/Tools/SetOperations.pm000066400000000000000000000134511305752266700207470ustar00rootroot00000000000000package TFBS::Tools::SetOperations; use strict; use Bio::Root::Root; use vars qw'@ISA'; @ISA = qw'Bio::Root::Root'; sub new { my ($caller, @args) = @_; my $self = bless {}, ref $caller || $caller; my ($index_by, $strict, $output_type, $pairs) = $self->_rearrange([qw'INDEX_BY STRICT OUTPUT_TYPE PAIRS'], @args); $self->index_by($index_by); $self->strict($strict); $self->output_type($output_type); $self->pairs($pairs); return $self; } sub union { my ($self, @sets) = @_; my %union_index = map {$self->_index($_)} $self->_sets_to_arrayrefs(@sets); $self->_output(\%union_index); } sub intersection { my ($self, @sets) = @_; my @set_arrayrefs = $self->_sets_to_arrayrefs(@sets); #this would be faster, but we might want to retain the exact objects # that were present in #my @set_arrayrefs = sort {@$a <=> 
@$b} $self->_sets_to_arrayrefs(@sets); my %intersection_index = $self->_index(shift @set_arrayrefs); foreach my $set_arrayref (@set_arrayrefs) { my %curr_set_index = $self->_index($set_arrayref); my @help_array = %curr_set_index; foreach my $key (keys %intersection_index) { if (!exists $curr_set_index{$key}) { delete $intersection_index{$key} ; } } } $self->_output(\%intersection_index); } sub difference { # pairs only for now my ($self, @sets) = @_; my ($set1, $set2) = $self->_sets_to_arrayrefs(@sets); if (!defined $set2) { $self->throw ("'difference' needs exactly two sets as arguments"); } my %diff_index1 = $self->_index($set1); my %diff_index2 = $self->_index($set2); foreach my $key (keys %diff_index1) { if (exists $diff_index2{$key}) { delete $diff_index1{$key}; delete $diff_index2{$key}; } } wantarray ? ($self->_output(\%diff_index1), $self->_output(\%diff_index2)) : $self->_output(\%diff_index1); } sub index_by { my $self = shift; # By default, we are dealing with Bio::SeqFeatureI objects my @DEFAULTS = qw(primary_tag source_tag start end score strand); if (@_) { if(!defined $_[0]) { $self->{_index_by} = \@DEFAULTS; } elsif (ref($_[0]) eq "ARRAY") { $self->{_index_by} = $_[0]; } else { $self->{_index_by} = [@_]; } } return @{$self->{_index_by}}; } sub strict { my $self = shift; if (@_) { if ($self->{_strict} = shift) { $self->{_index_fn} = \&_index_strict; } else { $self->{_index_fn} = \&_index_by_annotation; } } return $self->{_strict}; } sub output_type { my $self = shift; if (@_) { unless ($self->{_output_type} = shift) { $self->{_output_type} = "arrayref" } } return $self->{_output_type}; } sub pairs { my $self = shift; if (@_) { if ($self->{_pairs} = shift and !$self->strict) { $self->{_index_fn} = \&_index_by_pair_annotation; } } return $self->{_pairs}; } sub _index { my ($self) = @_; $self->{_index_fn}->(@_); } sub _index_strict { my ($self, $set_arrayref) = @_; my %index_hash = (map {$_, $_} @$set_arrayref); return %index_hash; } sub 
_index_by_pair_annotation { my ($self, $set_arrayref) = @_; my %index_hash; foreach my $member (@$set_arrayref) { my @index_elements = ($self->_get_index_elements($member->feature1), $self->_get_index_elements($member->feature2)); $index_hash{join("::", @index_elements)} = $member; } return %index_hash; } sub _index_by_annotation { my ($self, $set_arrayref) = @_; my %index_hash; foreach my $member (@$set_arrayref) { my @index_elements = $self->_get_index_elements($member); $index_hash{join("::", @index_elements)} = $member; } return %index_hash; } sub _get_index_elements { my ($self, $set_member) = @_; my @index_elements; foreach my $method ($self->index_by) { if (ref($method) eq 'CODE') { push @index_elements, $method->($set_member); } else { eval { push @index_elements, $set_member->$method; }; if ($@) { $self->throw(sprintf("Could not use '%s' for indexing a %s object. The original error was:\n", $method, ref($set_member)).$@) } } } return @index_elements; } sub _sets_to_arrayrefs { my ($self, @sets) = @_; my @set_arrayrefs; foreach my $set (@sets) { if (ref($set) eq "ARRAY") { push @set_arrayrefs, $set; } elsif(ref($set) and $set->can("Iterator")) { my @set_elements; my $it = $set->Iterator; while (my $set_el = $it->next) { push @set_elements, $set_el } push @set_arrayrefs, \@set_elements; } else { $self->throw("Set must be an array reference or have an ". "Iterator method. Got ".(ref($set) || $set). " instead."); } } return @set_arrayrefs; } sub _output { my ($self, $hashref) = @_; if ($self->output_type eq "arrayref") { return [values %$hashref]; } elsif ($self->output_type eq "array") { return values %$hashref; } elsif ($self->output_type eq "matrix_set") { my $setobj = TFBS::MatrixSet->new; $setobj->add_Matrix(values %$hashref); return $setobj; } elsif ($self->output_type eq "site_set") { my $setobj = TFBS::SiteSet->new; $setobj->add_site(values %$hashref); return $setobj; } else { $self->throw($self->output_type."
is not a supported output type"); } } 1; TFBS-0.7.1/blib/lib/TFBS/Word.pm000066400000000000000000000042751305752266700157670ustar00rootroot00000000000000# TFBS module for TFBS::Word # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Word - base class for word-based patterns =head1 DESCRIPTION TFBS::Word is a base class consisting of universal constructor called by its subclasses (TFBS::Matrix::*), and word pattern manipulation methods that are independent of the word type. It is not meant to be instantiated itself. =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard EBoris.Lenhard@cgb.ki.seE =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. =cut # The code begins HERE: package TFBS::Word; use vars '@ISA'; use TFBS::PatternI; use strict; @ISA = qw(TFBS::PatternI); =head2 new =cut sub new { my ($caller, @args) = @_; my $self = $caller->SUPER::new(@args); my ($id, $name, $class, $word, $tagref) = $self->_rearrange([qw(ID NAME CLASS WORD TAGS)], @args); if (defined $word) { $self->word($word); } else { $self->throw("Need a -word argument"); } $self->name($name); $self->ID($id); $self->{'tags'} = ($tagref or {}); return $self; } =head2 word =cut sub word { my ($self, @args) = @_; if(scalar(@args) == 0) { return $self->{'word'}; } my ($word) = @args; if (defined $word and ! 
$self->validate_word($word)) { $self->throw("Trying to set the word to an invalid value: $word"); } else { return $self->{'word'} = $word; } } =head2 validate_word Required in all subclasses =cut sub validate_word { shift->throw("Error: method 'validate_word' not implemented"); } =head2 length =cut sub length { # word length does not have to be defined, but its subroutine does shift->throw("Error: method 'length' not implemented"); } =head2 search_seq =cut sub search_seq { shift->throw("Error: method search_seq not implemented"); } =head2 search_aln =cut sub search_aln { shift->throw("Error: method search_aln not implemented"); } 1;TFBS-0.7.1/blib/lib/TFBS/Word/000077500000000000000000000000001305752266700154215ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/Word/Consensus.pm000066400000000000000000000150411305752266700177400ustar00rootroot00000000000000# TFBS module for TFBS::Word::Consensus # # Copyright Boris Lenhard # # You may distribute this module under the same terms as perl itself # # POD =head1 NAME TFBS::Word::Consensus - IUPAC DNA consensus word-based pattern class =head1 DESCRIPTION TFBS::Word::Consensus is a subclass of TFBS::Word that represents a pattern written as a word in the IUPAC degenerate DNA alphabet. Its search_seq and search_aln methods convert the word to a TFBS::Matrix::PWM object (see to_PWM) and delegate the search to it. =head1 FEEDBACK Please send bug reports and other comments to the author. =head1 AUTHOR - Boris Lenhard Boris Lenhard E<lt>Boris.Lenhard@cgb.ki.seE<gt> =head1 APPENDIX The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
=cut package TFBS::Word::Consensus; use vars '@ISA'; use Carp; use TFBS::Word; use TFBS::Matrix::PWM; use strict; @ISA = qw(TFBS::Word); =head2 new Title : new Usage : my $consensus = TFBS::Word::Consensus->new(%args) Function: constructor for the TFBS::Word::Consensus object Returns : a new TFBS::Word::Consensus object Args : # you must specify the -word argument: -word, # a string consisting of letters in # IUPAC degenerate DNA alphabet # (any of ACGTSWKMRYBDHVN) ####### -name, # string, OPTIONAL -ID, # string, OPTIONAL -class, # string, OPTIONAL -tags # a hash reference, OPTIONAL =cut # "new" is inherited from TFBS::Word =head2 search_seq Title : search_seq Usage : my $siteset = $consensus->search_seq(%args) Function: scans a nucleotide sequence with the pattern represented by the word Returns : a TFBS::SiteSet object Args : # you must specify either one of the following three: -file, # the name of a fasta file (single sequence) #or -seqobj # a Bio::Seq object # (more accurately, a Bio::PrimarySeq object or a # subclass thereof) #or -seqstring # a string containing the sequence -max_mismatches, # number of allowed positions in the site that do # not match the consensus # OPTIONAL: default 0 =cut sub search_seq { my ($self, @args) = @_; my ($max_mismatch) = $self->_rearrange([qw(MAX_MISMATCHES)], @args); $max_mismatch = 0 unless defined $max_mismatch; my $pwm = $self->to_PWM; my $siteset = $pwm->search_seq(@args, -threshold => $self->length - $max_mismatch); $self->_replace_patterns_in_siteset($siteset); return $siteset; } =head2 search_aln Title : search_aln Usage : my $site_pair_set = $consensus->search_aln(%args) Function: Scans a pairwise alignment of nucleotide sequences with the pattern represented by the word: it reports only those hits that are present in equivalent positions of both sequences and exceed a specified threshold score in both, AND are found in regions of the alignment above the specified conservation cutoff value.
Returns : a TFBS::SitePairSet object Args : # you must specify either one of the following three: -file, # the name of the alignment file in Clustal format #or -alignobj # a Bio::SimpleAlign object #or -alignstring # a multi-line string containing the alignment # in clustal format ############# -max_mismatches, # number of allowed positions in the site that do # not match the consensus # OPTIONAL: default 0 -window, # size of the sliding window (in nucleotides) # for calculating local conservation in the # alignment # OPTIONAL: default 50 -cutoff # conservation cutoff (%) for including the # region in the results of the pattern search # OPTIONAL: default "70%" =cut sub search_aln { my ($self, @args) = @_; my ($max_mismatch) = $self->_rearrange([qw(MAX_MISMATCHES)], @args); $max_mismatch = 0 unless defined $max_mismatch; my $pwm = $self->to_PWM; my $sitepairset = $pwm->search_aln(@args, -threshold => $self->length - $max_mismatch); $self->_replace_patterns_in_sitepairset($sitepairset); return $sitepairset; } =head2 to_PWM =cut sub to_PWM { my ($self, @args) = @_; my $pwm = TFBS::Matrix::PWM->new(-ID => $self->ID, -name => $self->name, -class => $self->class, -matrix => _consensus2matrixref($self->word), -tags => {$self->all_tags} ); return $pwm; } =head2 validate_word =cut sub validate_word { my ($self, $word) = @_; $word =~ s/[ACGTSWKMRYBDHVN]//gi; return ($word eq ""); } =head2 length =cut sub length { return CORE::length($_[0]->word); } # private methods sub _replace_patterns_in_siteset { my ($self, $siteset) = @_; my $iter = $siteset->Iterator; while (my $site = $iter->next) { $site->pattern($self); } } sub _replace_patterns_in_sitepairset { my ($self, $sitepairset) = @_; my $iter = $sitepairset->Iterator; while (my $sitepair = $iter->next) { $sitepair->feature1->pattern($self); $sitepair->feature2->pattern($self); } } # utility functions sub _consensus2matrixref { my ($word) = @_; my %iupac = (
T => [0,0,0,1], G => [0,0,1,0], K => [0,0,1,1], C => [0,1,0,0], Y => [0,1,0,1], S => [0,1,1,0], B => [0,1,1,1], A => [1,0,0,0], W => [1,0,0,1], R => [1,0,1,0], D => [1,0,1,1], M => [1,1,0,0], H => [1,1,0,1], V => [1,1,1,0], N => [1,1,1,1] ); my @vert_array; foreach my $letter (split '', $word) { push @vert_array, ($iupac{uc($letter)} or croak ("$letter is not a legal IUPAC DNA character")); } return _transpose_arrayref(\@vert_array); } sub _transpose_arrayref { my $vert_arrayref = shift; my $maxcol = scalar(@$vert_arrayref) - 1; my @horiz_array; foreach my $row (0..3) { push @horiz_array, [ map { $vert_arrayref->[$_][$row] } 0..$maxcol ]; } return \@horiz_array; } 1; TFBS-0.7.1/blib/lib/TFBS/_Iterator.pm000066400000000000000000000030071305752266700167740ustar00rootroot00000000000000package TFBS::_Iterator; use vars '@ISA'; use strict; use Carp; @ISA = qw(Bio::Root::Root); ############################################################# # PUBLIC METHODS ############################################################# sub new { my ($caller, $arrayref, $sort_by, $reverse) = @_; my $class = ref $caller || $caller; my $self; if ($arrayref) { $self = bless { _orig_array_ref => [ @$arrayref ], _iterator_array_ref => [ @$arrayref ], _sort_by => ($sort_by || undef), _reverse => ($reverse || 0) }, $class; } else { croak("No valid array ref for Iterator of ". (ref($class) || $class)." 
 provided:"); } $self->_sort() if $sort_by; $self->_reverse() if $reverse; return $self; } sub current { } sub reset { my ($self) = @_; @{$self->{_iterator_array_ref}} = @{$self->{_orig_array_ref}}; $self->_sort() if $self->{'_sort_by'}; $self->_reverse() if $self->{'_reverse'}; return $self; } sub next { my $self = shift; return shift @{$self->{_iterator_array_ref}}; } ################################################################# # PRIVATE METHODS ################################################################# sub _sort { my ($self, $sort_by) = @_; $self->throw("Generic iterator cannot sort ".ref($self). " object by '$sort_by'."); } sub _reverse { my $self = shift; $self->{'_iterator_array_ref'} = [ reverse @{ $self->{'_iterator_array_ref'} } ]; } TFBS-0.7.1/blib/lib/TFBS/_Iterator/000077500000000000000000000000001305752266700164365ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/TFBS/_Iterator/_MatrixSetIterator.pm000066400000000000000000000027521305752266700225730ustar00rootroot00000000000000package TFBS::_Iterator::_MatrixSetIterator; use vars '@ISA'; use strict; use Carp; use TFBS::_Iterator; @ISA = qw(TFBS::_Iterator); sub _sort { my ($self, $sort_by) = @_; $sort_by or $sort_by = $self->{_sort_by} or $sort_by = 'name'; # we can sort by class, ID, name, species, total_ic, or any other tag my %sort_fn = (class => sub { $a->class() cmp $b->class() || $a->name() cmp $b->name() || $a->ID() cmp $b->ID() }, id => sub { $a->ID() cmp $b->ID() }, ID => sub { $a->ID() cmp $b->ID() }, name => sub { $a->name() cmp $b->name() || $a->class() cmp $b->class() || $a->ID() cmp $b->ID() }, species => sub { $a->tag('species') cmp $b->tag('species') || $a->class() cmp $b->class() || $a->ID() cmp $b->ID() }, total_ic => sub { $b->total_ic() <=> $a->total_ic() || $a->name() cmp $b->name() } ); if (defined (my $sort_function = $sort_fn{lc $sort_by})) { $self->{'_iterator_array_ref'} = [ sort $sort_function @{$self->{'_orig_array_ref'}} ]; } else { #order by tag derived value $self->{'_iterator_array_ref'}= [
sort { $a->tag($sort_by) cmp $b->tag($sort_by) || $a->class() cmp $b->class() || $a->ID() cmp $b->ID() } @{$self->{'_orig_array_ref'}} ] || $self->throw("Cannot sort ".ref($self)." object by '$sort_by'."); } } TFBS-0.7.1/blib/lib/TFBS/_Iterator/_SiteSetIterator.pm000066400000000000000000000025331305752266700222300ustar00rootroot00000000000000package TFBS::_Iterator::_SiteSetIterator; use vars '@ISA'; use strict; use Carp; use TFBS::_Iterator; @ISA = qw(TFBS::_Iterator); sub _sort { my ($self, $sort_by) = @_; $sort_by or $sort_by = $self->{_sort_by} or $sort_by = 'name'; # we can sort by name, ID, start, end, score; keys are lower case because the lookup below uses lc my %sort_fn = (start => sub { $a->start() <=> $b->start() || $a->pattern->name() cmp $b->pattern->name() || $a->strand() <=> $b->strand() }, end => sub { $a->end() <=> $b->end() || $a->pattern->name() cmp $b->pattern->name() || $a->strand() <=> $b->strand() }, id => sub { $a->pattern->ID() cmp $b->pattern->ID() || $a->start() <=> $b->start() || $a->end() <=> $b->end() || $a->strand() <=> $b->strand() }, name => sub { $a->pattern->name() cmp $b->pattern->name() || $a->start() <=> $b->start() || $a->end() <=> $b->end() || $a->strand() <=> $b->strand() }, score => sub { $b->score() <=> $a->score() || $a->pattern->name() cmp $b->pattern->name() || $a->strand() <=> $b->strand() } ); if (defined (my $sort_function = $sort_fn{lc $sort_by})) { $self->{'_iterator_array_ref'} = [ sort $sort_function @{$self->{'_orig_array_ref'}} ]; } else { $self->throw("Cannot sort ".ref($self)."
object by '$sort_by'."); } } TFBS-0.7.1/blib/lib/auto/000077500000000000000000000000001305752266700147205ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/auto/TFBS/000077500000000000000000000000001305752266700154565ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/auto/TFBS/.exists000066400000000000000000000000001305752266700167640ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/auto/TFBS/Ext/000077500000000000000000000000001305752266700162165ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/auto/TFBS/Ext/pwmsearch/000077500000000000000000000000001305752266700202075ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/auto/TFBS/Ext/pwmsearch/.exists000066400000000000000000000000001305752266700215150ustar00rootroot00000000000000TFBS-0.7.1/blib/lib/pwm_search.h000066400000000000000000000115741305752266700162610ustar00rootroot00000000000000/*--------------------------------------------------------------- * INCLUDES *---------------------------------------------------------------*/ #include #include /*--------------------------------------------------------------- * DECLARATIONS *---------------------------------------------------------------*/ /* extern double atof(); extern double log2(); extern double sqrt(); extern FILE *fopen(); */ void err_log(), err_show(); /*--------------------------------------------------------------- * DEFINES *---------------------------------------------------------------*/ #define __DEBUG__ 0 /* put debug messages on */ #define FNAMELEN 1000 /* max allowed length of file name */ #define MAX_LINE 200 #define MAXCOUNTS 1000 /* max number of counts in count matrix */ #define MAXERR 100 /* max number of errors that err_log can handle */ #define MAXHITS 1000 #define SEQLEN 1000000 /* max sequence length allowed */ #define SEQNAMELEN MAX_LINE /* max allowed sequence name length */ /*--------------------------------------------------------------- * GLOBALS *---------------------------------------------------------------*/ static char PANIC[] = 
"err_log function failure"; static char *__ERR__[MAXERR]; static int NUM_ERRS=0; static char SQCOMP[] = /* calculate base on complementary strand */ { /* ASCII chars; IUPAC conventions */ /* Control characters unchanged */ '\000','\001','\002','\003','\004','\005','\006','\007', '\010','\011','\012','\013','\014','\015','\016','\017', '\020','\021','\022','\023','\024','\025','\026','\027', '\030','\031','\032','\033','\034','\035','\036','\037', /* Punctuation and digits unchanged */ '\040','\041','\042','\043','\044','\045','\046','\047', '\050','\051','\052','\053','\054','\055','\056','\057', '\060','\061','\062','\063','\064','\065','\066','\067', '\070','\071','\072','\073','\074','\075','\076','\077', /* Capitals go to capitals */ '\100', 'T', 'V', 'G', 'H', '?', '?', 'C', /* @,A-G */ 'D', '?', '?', 'M', '?', 'K', 'N', '?', /* H-O */ '?', '?', 'Y', 'S', 'A', '?', 'B', 'W', /* P-W */ '?', 'R', '?','\133','\134','\135','\136','\137', /* X-Z,etc */ /* Lower case goes to lower case */ '\140', 't', 'v', 'g', 'h', '?', '?', 'c', 'd', '?', '?', 'm', '?', 'k', 'n', '?', '?', '?', 'y', 's', 'a', '?', 'b', 'w', '?', 'r', '?','\173','\174','\175','\176','\177' }; static int TRANS[] = /* translate characters to numbers */ { /* A=0; C=1; G=2; T=3; other = 4 */ /* Control characters */ 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, /* Punctuation and digits */ 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, 4,4,4,4,4,4,4,4, /* Capitals */ 4,0,4,1,4,4,4,2, /* @,A-G */ 4,4,4,4,4,4,4,4, /* H-O */ 4,4,4,4,3,3,4,4, /* P-W */ 4,4,4,4,4,4,4,4, /* X-Z,etc */ /* Lower case */ 4,0,4,1,4,4,4,2, /* @,A-G */ 4,4,4,4,4,4,4,4, /* H-O */ 4,4,4,4,3,3,4,4, /* P-W */ 4,4,4,4,4,4,4,4 /* X-Z,etc */ }; /*--------------------------------------------------------------- * STRUCTURE DEFINITIONS *---------------------------------------------------------------*/ /* ARGUMENTS -- Structure to contain shared arguments */ struct arguments { char counts_file[FNAMELEN+1]; /* file 
name, count matrix */ char mask_file[FNAMELEN+1]; /* file name, masked seq output, "" means none. */ char seq_file[FNAMELEN+1]; /* file name, sequences */ char name[FNAMELEN+1]; /* TF name */ char class[FNAMELEN+1]; /* TF structural class */ int print_all; /* print scores of all hits */ long best_base; /* base for best score on sequence */ int best_only; /* only show best score on each sequence */ double best_score; /* best score on this sequence */ int best_strand; /* strand for best score on sequence */ double max_score; /* max score possible (implied from pwm) */ double min_score; /* min score possible (implied from pwm) */ double threshold; /* print stuff with log score > max_possible - threshold */ int width; /* pattern width (implied from number of counts) */ }; /* HIT - location and score of a site scoring above threshold */ struct HIT { long base; /* location */ int strand; /* 0 forward, 1 complement */ double score; /* score */ }; TFBS-0.7.1/blib/lib/pwm_searchPFF.c000066400000000000000000000501451305752266700166050ustar00rootroot00000000000000/*-------------------------------------------------------------------- * BUGS or limitations * mask option not yet implemented. * * Extensions/revisions worth considering * pwm_calc that calculates pwm scores for every position; pipe to * selection programs that pull what I want. *------------------------------------------------------------------*/ /*-------------------------------------------------------------------- * This version is a quick and dirty modification of Wyeth Wasserman's * standalone pwm_searchPFF program. 
* * Boris Lenhard, August 2001 * * Read pwm matrix * Figure maximum and minimum possible scores * Read sequences (fasta format) one at a time, and for each: * Window through the sequence and complement * * Find all occurrences of pattern with * matrix score > threshold * * If -a flag is set just print all the values, otherwise: * * If -b flag is not set, * For each find, show seq name, location, find, score * otherwise * just show the best hit for this sequence * If "-m" option is set, write out all input sequences to * filename given, with finds replaced by 'n's. * * Exit: 0 for success, -1 otherwise. *------------------------------------------------------------------*/ #include "pwm_search.h" int do_search(char* matrixfile, char* seqfile, float threshold, char* tfname, char* tfclass, char* outfile) /*was: main int argc; char **argv;*/ { double pwm[2*MAXCOUNTS]; /* for pwm matrix */ /* do own indexing; 5*pos + nt */ int exitval = -1; /* exit value from main */ struct arguments args; /* command line args */ FILE *fp; /* for sequence input file */ FILE *outfp; NUM_ERRS = 0; if (__DEBUG__) fprintf(stderr, "%s %s %f %s %s %s\n", matrixfile, seqfile, threshold, tfname, tfclass, outfile); if ( __DEBUG__ ) announce("+++\nEntering main.\n+++\n"); /* Parse command line arguments */ /*if ( get_cmd_args(argc,argv,&args) ) { err_log( "Usage: pwm_searchPFF pwm_file seq_file threshold [-a][-b]|[-m mask_file] [-n TFname] [-c TFclass]\n" ); }*/ strcpy(args.counts_file, matrixfile); strcpy(args.seq_file, seqfile); args.threshold = threshold; strcpy(args.name, tfname); strcpy(args.class, tfclass); args.print_all = 0; args.best_only= 0; /* Read in the pwm; calculate max/min score */ //else if ( get_matrix(&args,pwm) ) { err_log("MAIN: get_matrix failed."); } /* Open the sequence file */ else if ( (fp=fopen(args.seq_file,"r")) == NULL ) { err_log("MAIN: open_seq_file failed."); } else if ( (outfp=fopen(outfile,"w")) == NULL ) { err_log("MAIN: open_outfile failed."); } /* Loop on 
sequences */ else if ( loop_on_seqs(&args,pwm,fp,outfp) ) { err_log("MAIN: loop_on_seqs failed."); } /* Normal completion */ else { exitval = 0; } /* Clean up and close out */ err_show(); fclose(fp); fclose(outfp); if ( __DEBUG__ ) announce("+++\nLeaving main.\n+++\n"); return(exitval); } /*-------------------------------------------------------------------- * Announce * * Print a debugging message * * Returns 0 *------------------------------------------------------------------*/ int announce(msg) char *msg; { int retval = 0; fprintf(stderr,msg); return(retval); } /*-------------------------------------------------------------------- * BEST_SAVE - Save the best score so far * * Called by do_seq * * Returns: 0 *------------------------------------------------------------------*/ int best_save(struct arguments* pargs, long base, int strand, double score) //struct arguments *pargs; /* args from command line */ //long base; /* base where score occurs */ //int strand; /* strand where score occurs */ //double score; /* score of hit to save */ { if ( pargs->best_base < 0 || score > pargs->best_score ) { pargs->best_base = base; pargs->best_score = score; pargs->best_strand = strand; } return(0); } /*-------------------------------------------------------------------- * BEST_PULL - Copy back the best score saved * * Called by do_seq * * Returns: 0 *------------------------------------------------------------------*/ best_pull(pargs,pbase,pstrand,pscore) struct arguments *pargs; /* args from command line */ long *pbase; /* base where score occurs */ int *pstrand; /* strand where score occurs */ double *pscore; /* score of hit to pull back */ { *pbase = pargs->best_base; if ( pargs->best_base >= 0 ) { *pscore = pargs->best_score; *pstrand = pargs->best_strand; } return(0); } /*-------------------------------------------------------------------- * DO_SEQ - Search through the given sequence with the given matrix * * Called by loop_on_seqs * * Returns: 0 for success, -1 for 
failure. *------------------------------------------------------------------*/ int do_seq(pargs,pwm,seqid,seq,outfp) struct arguments *pargs; /* args from command line */ double *pwm; /* pwm from get_matrix */ char *seqid; /* id of sequence to work on */ char *seq; /* the sequence to work on */ FILE *outfp; { double backward_score; double forward_score; double score; long base; int done = 0; int nt; int pos; int retval = 0; int strand; long l; long nhit=0L; struct HIT hits[MAXHITS]; if ( __DEBUG__ ) announce("+++\nEntering do_seq.\n+++\n"); /* first make sure sequence is long enough */ for ( base=0; base < pargs->width; ++base ) { if ( seq[base] == '\0' ) done = 1; } /* loop on windows */ pargs->best_base = -1; for ( base=0; !retval && !done && seq[base+pargs->width-1]; ++base ) { forward_score = 0.0; backward_score = 0.0; for ( pos=0; poswidth; ++pos ) { nt = TRANS[seq[base+pos]]; forward_score += pwm[5*pos + nt]; nt = ( nt==4 ) ? 4 : 3-nt; backward_score += pwm[5*(pargs->width - pos -1) + nt]; } if ( forward_score > pargs->threshold ) { if ( pargs->print_all ) { if ( save_hit(base,0,forward_score,hits,&nhit) ) { err_log("DO_SEQ: save_hit failed"); retval = -1; } } else if ( pargs->best_only ) { best_save(pargs,base,0,forward_score); } else if ( output(pargs,seqid,base,seq,0,forward_score,outfp) ) { err_log("DO_SEQ: output failed"); retval = -1; } } if ( backward_score > pargs->threshold ) { if ( pargs->print_all ) { if ( save_hit(base,1,backward_score,hits,&nhit) ) { err_log("DO_SEQ: save_hit failed"); retval = -1; } } else if ( pargs->best_only ) { best_save(pargs,base,1,backward_score); } else if ( output(pargs,seqid,base,seq,1,backward_score, outfp) ) { err_log("DO_SEQ: output failed"); retval = -1; } } } if ( pargs->print_all ) { for ( l=0; l=0 ) { if ( output(pargs,seqid,base,seq,strand,score,outfp) ) { err_log("DO_SEQ: output failed"); retval = -1; } } } if ( __DEBUG__ ) announce("+++\nLeaving do_seq.\n+++\n"); return(retval); } 
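/*--------------------------------------------------------------------
 * SKETCH (illustration only, not part of the original program)
 *
 * The window loop in do_seq can be read as the following minimal,
 * self-contained sketch. It assumes the same matrix layout that
 * do_seq and get_matrix use (pwm[5*pos + nt], with nt = 4 for any
 * non-ACGT character). The names sketch_nt_index,
 * sketch_score_window and sketch_count_hits are hypothetical
 * helpers introduced here; they do not exist in pwm_searchPFF.c.
 * The sketch bounds the loop with strlen rather than testing for
 * the terminating '\0' as do_seq does.
 *------------------------------------------------------------------*/

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the TRANS[] table: A=0, C=1, G=2, T/U=3, other=4. */
static int sketch_nt_index(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': case 'U': case 'u': return 3;
    default:            return 4;
    }
}

/* Score one window of `width` bases starting at seq[base] on both strands.
 * The forward score reads the matrix left to right; the backward score
 * complements each base (3 - nt) and reads the matrix right to left,
 * which amounts to scoring the reverse complement of the window. */
static void sketch_score_window(const double *pwm, int width,
                                const char *seq, long base,
                                double *forward, double *backward)
{
    int pos;

    assert(width > 0);
    *forward = 0.0;
    *backward = 0.0;
    for (pos = 0; pos < width; ++pos) {
        int nt = sketch_nt_index(seq[base + pos]);
        *forward += pwm[5 * pos + nt];
        nt = (nt == 4) ? 4 : 3 - nt;            /* complement; "other" stays 4 */
        *backward += pwm[5 * (width - pos - 1) + nt];
    }
}

/* Count the windows (on either strand) that score above `threshold`. */
static long sketch_count_hits(const double *pwm, int width,
                              const char *seq, double threshold)
{
    long len = (long) strlen(seq);
    long base;
    long nhit = 0;

    for (base = 0; base + width <= len; ++base) {
        double fwd, bwd;
        sketch_score_window(pwm, width, seq, base, &fwd, &bwd);
        if (fwd > threshold) ++nhit;
        if (bwd > threshold) ++nhit;
    }
    return nhit;
}
```

/* Usage note: with a two-column matrix that rewards "AC", the sequence
 * "ACGT" gives one forward hit at base 0 and one backward hit at base 2,
 * because the reverse complement of "GT" is "AC". */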
/*********************************************************************** * ERR_LOG and ERR_SHOW * * A pair of functions for saving up and then printing error messages. * err_log stores away an error message each time it is called. When * err_show is called it prints all the messages saved up so far. * * Neither function returns a value **********************************************************************/ void err_log(msg) char *msg; { if ( __DEBUG__ ) announce("+++\nEntering err_log\n+++\n"); NUM_ERRS++; if ( (__ERR__[NUM_ERRS-1] = (char *) malloc( 1+strlen(msg) ) ) == NULL ) __ERR__[NUM_ERRS - 1] = PANIC; else strcpy( __ERR__[NUM_ERRS - 1],msg ); if ( __DEBUG__ ) announce("+++\nLeaving err_log\n+++\n"); return; } void err_show() { int err_num; for ( err_num=0; err_numcounts_file,argv[1]); strcpy(pargs->seq_file,argv[2]); pargs->threshold = atof(argv[3]); pargs->best_only = 0; pargs->print_all = 0; pargs->mask_file[0] = '\0'; while (arg_count < argc) { if ( argv[arg_count][0]=='-' && argv[arg_count][1]=='b' ) { pargs->best_only = 1; arg_count++; } else if ( argv[arg_count][0]=='-' && argv[arg_count][1]=='a' ) { pargs->print_all = 1; arg_count++; } else if ( arg_countmask_file,argv[arg_count+1]); arg_count = arg_count+2; } else if ( arg_countname,argv[arg_count+1]); arg_count = arg_count+2; } else if ( arg_countclass,argv[arg_count+1]); arg_count = arg_count+2; } else { arg_count++; } } } if ( __DEBUG__ ) announce("+++\nLeaving get_cmd_args\n+++\n"); return( retval ); } /*-------------------------------------------------------------------- * GET_MATRIX - Read in pwm. * * Called by main. * * Returns: 0 for success, -1 for failure. 
*------------------------------------------------------------------*/ int get_matrix(struct arguments* pargs, double* pwm) /* struct arguments *pargs; args from command line double *pwm; array for pwm */ /* do own indexing; 5*pos + nt */ { double counts[2*MAXCOUNTS]; double max_log; double min_log; double scratch[1+MAXCOUNTS]; int done = 0; int nt; int num_counts; int pos; int retval=0; FILE *fp; /* stream for counts file */ if ( __DEBUG__ ) announce("+++\nEntering get_matrix\n+++\n"); /* Open the file */ if ( (fp=fopen(pargs->counts_file,"r")) == NULL ) { err_log("GET_MATRIX: could not open specified file."); retval = -1; } /* Read in the real numbers without regard to dimension */ else { for ( num_counts=0; !done && num_countswidth = num_counts/4; for ( pos=0; poswidth; ++pos ) { for ( nt=0; nt<4; ++nt ) { pwm[5*pos + nt] = scratch[(pargs->width)*nt + pos]; } pwm[5*pos + 4] = (pwm[5*pos + 0] + pwm[5*pos + 1] + pwm[5*pos + 2] + pwm[5*pos + 3] ) / 4; } /* Next the extreme scores */ pargs->max_score = 0; pargs->min_score = 0; for ( pos=0; poswidth; ++pos ) { max_log = -10.0; min_log = 10.0; for ( nt=0; nt<4; ++nt ) { max_log = ( max_log>pwm[5*pos+nt] ) ? max_log : pwm[5*pos+nt]; min_log = ( min_logmax_score += max_log; pargs->min_score += min_log; } } if ( __DEBUG__ ) announce("+++\nLeaving get_matrix\n+++\n"); return (retval); } /*-------------------------------------------------------------------- * GET_SEQUENCE * * Get the next sequence from the input file (fasta format) * * Called by loop_on_seqs. * * Return 0 normally, -1 on error, 1 if called at EOF. 
*------------------------------------------------------------------*/ get_sequence(fp,seq_id,sequence) FILE *fp; /* file to read */ char *seq_id; /* name of sequence */ char *sequence; /* text of sequence */ { char msg[2*MAX_LINE]; int c; int done=0; int position; int retval = 0; int word = 0; int count = 0; long base = 0L; char line[MAX_LINE]; // was static int at_eof = 0; // was static int first_time=1; // was static if ( __DEBUG__ ) { announce("+++\nEntering Get_sequence\n+++\n"); } if ( first_time ) { first_time=0; if ( fgets(line,MAX_LINE,fp)==NULL ) { at_eof = 1; } } if ( at_eof ) /* this time or last time */ { retval = 1; } /* At this point, line should always be the first line of an entry */ /* Pull out the id */ if ( !retval ) { strcpy(seq_id,line+1); seq_id[ strlen(seq_id) -1 ] = '\0'; while (count < strlen(seq_id) && !word) { if (seq_id[count] == ' ') { word++; seq_id[count]= '\0'; } count++; } } /* Read in the sequence */ while ( !retval && !done ) { if ( __DEBUG__ ) { announce("+++\nReading in...\n+++\n"); } if ( fgets(line,MAX_LINE,fp) == NULL ) { at_eof = 1; done = 1; } else if ( line[0] == '>' ) { done = 1; } else { for ( position=0; !retval && line[position]!='\0'; ++position) { c = line[position]; if ( !isdigit( c ) && !isspace( c ) ) { if ( base >= SEQLEN ) { err_log("GET_SEQUENCE: Sequence too long."); retval = -1; } else { sequence[base++] = c; } } } } } sequence[base] = '\0'; if ( __DEBUG__ ) { announce("+++\nLeaving Get_sequence\n+++\n"); sprintf(msg,"seq_id=%s\nlength=%ld\n", seq_id, base ); announce(msg); } return(retval); } /*-------------------------------------------------------------------- * LOOP_ON_SEQS - Loop through the sequences of the input file, * doing the search and output. * * Called by main. * * Returns: 0 for success, -1 for failure. 
*------------------------------------------------------------------*/ int loop_on_seqs(pargs,pwm,fp, outfp) struct arguments *pargs; /* args from command line */ double *pwm; /* pwm, from get_matrix */ FILE *fp; /* sequence file pointer */ FILE *outfp; /* output file pointer */ { char seq[SEQLEN+1]; char seqid[SEQNAMELEN+1]; int done = 0; int retval=0; if ( __DEBUG__ ) announce("+++\nEntering loop_on_seqs\n+++\n"); /* Main loop */ while ( !retval && !done ) { done = get_sequence(fp,seqid,seq); if ( done == -1 ) { err_log("LOOP_ON_SEQS: get_sequence failed."); retval = -1; } else if ( done == 0 ) { if ( do_seq(pargs,pwm,seqid,seq,outfp) ) { err_log("LOOP_ON_SEQS: do_seq failed."); retval = -1; } } } if ( __DEBUG__ ) announce("+++\nLeaving loop_on_seqs\n+++\n"); return (retval); } /*-------------------------------------------------------------------- * MARK - write "width" dashes, to mark strand * * Called by output. * * Returns: 0 for success, -1 for failure. *------------------------------------------------------------------*/ int mark(width) int width; { int pos; for ( pos=0; posmin_score)/(pargs->max_score - pargs->min_score), # pargs->min_score, # pargs->max_score); # printf("\n%ld\n",base+1); */ fprintf(outfp, "%s\tTFBS\t%s\t%s\t",seqid,pargs->name,pargs->class); if (strand) { fprintf(outfp, "-\t"); /* FIXED BY BORIS : 1 is for "-" strand */ } else fprintf(outfp, "+\t"); /* FIXED BY BORIS : 0 is for "+" strand */ fprintf(outfp, "%6.3f\t%6.1f\t", score, 100*(score - pargs->min_score)/(pargs->max_score - pargs->min_score)); fprintf(outfp, "%ld\t%ld\t",base+1,base+pargs->width); for ( pos=0; poswidth; ++pos ) { putc(seq[base+pos], outfp); } putc('\n', outfp); /* #endif */ if ( __DEBUG__ ) announce("+++\nLeaving output\n+++\n"); return( retval ); } /*-------------------------------------------------------------------- * SAVE_HIT - save location, strand and score of a hit in an array of such * * Called by do_seq. * * Returns: 0 for success, -1 for failure. 
*------------------------------------------------------------------*/ int save_hit(base,strand,score,hits,pnhit) long base; int strand; double score; struct HIT *hits; long *pnhit; { int retval = 0; if ( *pnhit == MAXHITS ) { err_log("SAVE_HIT: MAXHITS limit reached."); retval = -1; } hits[*pnhit].base = base; hits[*pnhit].strand = strand; hits[*pnhit].score = score; *pnhit = *pnhit + 1; return(retval); } TFBS-0.7.1/blib/man1/000077500000000000000000000000001305752266700140365ustar00rootroot00000000000000TFBS-0.7.1/blib/man1/.exists000066400000000000000000000000001305752266700153440ustar00rootroot00000000000000TFBS-0.7.1/blib/man3/000077500000000000000000000000001305752266700140405ustar00rootroot00000000000000TFBS-0.7.1/blib/man3/.exists000066400000000000000000000000001305752266700153460ustar00rootroot00000000000000TFBS-0.7.1/blib/man3/TFBS::DB::FlatFileDir.3pm000066400000000000000000000247011305752266700200470ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . 
ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::DB::FlatFileDir 3" .TH TFBS::DB::FlatFileDir 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::DB::FlatFileDir \- interface to a database of pattern matrices stored as a collection of flat files in a dedicated directory .SH "SYNOPSIS" .IX Header "SYNOPSIS" .IP "\(bu" 4 creating a database object by connecting to the existing directory .Sp .Vb 1 \& my $db = TFBS::DB::FlatFileDir\->connect("/home/boris/MatrixDir"); .Ve .IP "\(bu" 4 retrieving a TFBS::Matrix::* object from the database .Sp .Vb 2 \& # retrieving a PFM by ID \& my $pfm = $db\->get_Matrix_by_ID(\*(AqM00079\*(Aq,\*(AqPFM\*(Aq); \& \& #retrieving a PWM by name \& my $pwm = $db\->get_Matrix_by_name(\*(AqNF\-kappaB\*(Aq, \*(AqPWM\*(Aq); .Ve .IP "\(bu" 4 retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria .Sp .Vb 4 \& # retrieving a set of PWMs from a list of IDs: \& my @IDlist = (\*(AqM0019\*(Aq, \*(AqM0045\*(Aq, \*(AqM0073\*(Aq, \*(AqM0101\*(Aq); \& my $matrixset = $db\->get_MatrixSet(\-IDs => \e@IDlist, \& \-matrixtype => "PWM"); \& \& # retrieving a set of ICMs from a list of names: \& my @namelist = (\*(Aqp50\*(Aq, \*(Aqp53\*(Aq, \*(AqHNF\-1\*(Aq. 
\*(AqGATA\-1\*(Aq, \*(AqGATA\-2\*(Aq, \*(AqGATA\-3\*(Aq);
\& my $matrixset = $db\->get_MatrixSet(\-names => \e@namelist,
\&                                     \-matrixtype => "ICM");
\&
\& # retrieving a set of all PFMs in the database
\& my $matrixset = $db\->get_MatrixSet(\-matrixtype => "PFM");
.Ve
.IP "\(bu" 4
creating a new FlatFileDir database in a new directory:
.Sp
.Vb 1
\& my $db = TFBS::DB::FlatFileDir\->create("/home/boris/NewMatrixDir");
.Ve
.IP "\(bu" 4
storing a matrix in the database:
.Sp
.Vb 2
\& # let $pfm be a TFBS::Matrix::PFM object
\& $db\->store_Matrix($pfm);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::DB::FlatFileDir is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a set of flat files in a dedicated directory. It has a very simple structure and can be easily set up manually if desired.
.SH "FEEDBACK"
.IX Header "FEEDBACK"
Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard"
.IX Header "AUTHOR - Boris Lenhard"
Boris Lenhard
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
.SS "new"
.IX Subsection "new"
.Vb 8
\& Title   : new
\& Usage   : my $db = TFBS::DB::FlatFileDir\->new(%args);
\& Function: the formal constructor for the TFBS::DB::FlatFileDir object;
\&           most users will not use it \- they will use specialized
\&           I<connect> or I<create> constructors to create a
\&           database object
\& Returns : a TFBS::DB::FlatFileDir object
\& Args    : \-dir # the directory containing flat files
.Ve
.SS "connect"
.IX Subsection "connect"
.Vb 8
\& Title   : connect
\& Usage   : my $db = TFBS::DB::FlatFileDir\->connect($directory);
\& Function: Creates a database object that retrieves TFBS::Matrix::*
\&           object data from or stores it in an existing directory
\& Returns : a TFBS::DB::FlatFileDir object
\& Args    : ($directory)
\&           The name of the directory (possibly with fully qualified
\&           path).
.Ve
.SS "create"
.IX Subsection "create"
.Vb 9
\& Title   : create
\& Usage   : my $newdb = TFBS::DB::FlatFileDir\->create($new_directory);
\& Function: creates a new directory, sets up a FlatFileDir
\&           database in it and returns a database object that
\&           interfaces the database
\& Returns : a TFBS::DB::FlatFileDir object
\& Args    : ($new_directory)
\&           The name of the directory to create
\&           (possibly with fully qualified path).
.Ve
.SS "get_Matrix_by_ID"
.IX Subsection "get_Matrix_by_ID"
.Vb 10
\& Title   : get_Matrix_by_ID
\& Usage   : my $pfm = $db\->get_Matrix_by_ID(\*(AqM00034\*(Aq, \*(AqPFM\*(Aq);
\& Function: fetches matrix data under the given ID from the
\&           database and returns a TFBS::Matrix::* object
\& Returns : a TFBS::Matrix::* object; the exact type of the
\&           object depends on the second argument (allowed
\&           values are \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq); returns undef if
\&           a matrix with the given ID is not found
\& Args    : (Matrix_ID, Matrix_type)
\&           Matrix_ID is a string; Matrix_type is one of the
\&           following: \*(AqPFM\*(Aq (raw position frequency matrix),
\&           \*(AqICM\*(Aq (information content matrix) or \*(AqPWM\*(Aq (position
\&           weight matrix)
\&           If Matrix_type is omitted, a PWM is retrieved by default.
.Ve
.SS "get_Matrix_by_name"
.IX Subsection "get_Matrix_by_name"
.Vb 10
\& Title   : get_Matrix_by_name
\& Usage   : my $pwm = $db\->get_Matrix_by_name(\*(AqHNF\-1\*(Aq, \*(AqPWM\*(Aq);
\& Function: fetches matrix data under the given name from the
\&           database and returns a TFBS::Matrix::* object
\& Returns : a TFBS::Matrix::* object; the exact type of the object
\&           depends on the second argument (allowed values are
\&           \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq)
\& Args    : (Matrix_name, Matrix_type)
\&           Matrix_name is a string; Matrix_type is one of the
\&           following:
\&           \*(AqPFM\*(Aq (raw position frequency matrix),
\&           \*(AqICM\*(Aq (information content matrix) or
\&           \*(AqPWM\*(Aq (position weight matrix)
\&           If Matrix_type is omitted, a PWM is retrieved by default.
\& Warning : According to the current JASPAR2 data model, name is
\&           not necessarily a unique identifier. In the case where
\&           there are several matrices with the same name in the
\&           database, the function fetches the first one and prints
\&           a warning on STDERR. You have been warned.
.Ve
.SS "store_Matrix"
.IX Subsection "store_Matrix"
.Vb 6
\& Title   : store_Matrix
\& Usage   : $db\->store_Matrix($matrixobj);
\& Function: Stores the contents of a TFBS::Matrix::* object in the database
\& Returns : 0 on success; $@ contents on failure
\&           (this is too C\-like and may change in future versions)
\& Args    : ($matrixobj) # a TFBS::Matrix::* object
.Ve
.SS "delete_Matrix_having_ID"
.IX Subsection "delete_Matrix_having_ID"
.Vb 9
\& Title   : delete_Matrix_having_ID
\& Usage   : $db\->delete_Matrix_having_ID(\*(AqM00045\*(Aq);
\& Function: Deletes the matrix having the given ID from the database
\& Returns : 0 on success; $@ contents on failure
\&           (this is too C\-like and may change in future versions)
\& Args    : (ID)
\&           A string
\& Comment : Yeah, yeah, \*(Aqdelete_Matrix_having_ID\*(Aq is a stupid name
\&           for a method, but at least it should be obvious what it does.
.Ve TFBS-0.7.1/blib/man3/TFBS::DB::JASPAR2.3pm000066400000000000000000000446111305752266700167660ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . 
ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::DB::JASPAR2 3" .TH TFBS::DB::JASPAR2 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l .nh .SH "NAME" TFBS::DB::JASPAR2 \- interface to a MySQL relational database of pattern matrices .SH "SYNOPSIS" .IX Header "SYNOPSIS" .IP "\(bu" 4 creating a database object by connecting to an existing JASPAR2\-type database .Sp .Vb 3 \& my $db = TFBS::DB::JASPAR2\->connect("dbi:mysql:JASPAR2:myhost", \& "myusername", \& "mypassword"); .Ve .IP "\(bu" 4 retrieving a TFBS::Matrix::* object from the database .Sp .Vb 2 \& # retrieving a PFM by ID \& my $pfm = $db\->get_Matrix_by_ID(\*(AqM0079\*(Aq,\*(AqPFM\*(Aq); \& \& # retrieving a PWM by name \& my $pwm = $db\->get_Matrix_by_name(\*(AqNF\-kappaB\*(Aq, \*(AqPWM\*(Aq); .Ve .IP "\(bu" 4 retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria .Sp .Vb 4 \& # retrieving a set of PWMs from a list of IDs: \& my @IDlist = (\*(AqM0019\*(Aq, \*(AqM0045\*(Aq, \*(AqM0073\*(Aq, \*(AqM0101\*(Aq); \& my $matrixset = $db\->get_MatrixSet(\-IDs => \e@IDlist, \& \-matrixtype => "PWM"); \& \& # retrieving a set of ICMs from a list of names: \& my @namelist = (\*(Aqp50\*(Aq, \*(Aqp53\*(Aq, \*(AqHNF\-1\*(Aq, \*(AqGATA\-1\*(Aq, \*(AqGATA\-2\*(Aq, \*(AqGATA\-3\*(Aq); \& my $matrixset = $db\->get_MatrixSet(\-names => \e@namelist, \& \-matrixtype => "ICM"); \& \& # retrieving a set of all PFMs in the database \& # derived from human genes: \& my $matrixset = $db\->get_MatrixSet(\-species => [\*(AqHomo sapiens\*(Aq], \& \-matrixtype => "PFM"); .Ve .IP "\(bu" 4 creating a new JASPAR2\-type database named \s-1MYJASPAR2:\s0 .Sp .Vb 3 \& my $db = TFBS::DB::JASPAR2\->create("dbi:mysql:MYJASPAR2:myhost", \& "myusername", \& "mypassword"); .Ve .IP "\(bu" 4 storing a matrix in the database (currently only PFMs): .Sp .Vb 2 \& # assuming $pfm is a TFBS::Matrix::PFM object \& $db\->store_Matrix($pfm); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" \&\s-1TFBS::DB::JASPAR2\s0 is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a relational database.
.SH "JASPAR2 DATA MODEL" .IX Header "JASPAR2 DATA MODEL" \&\s-1JASPAR2\s0 is the working name for a relational database model used for storing transcription factor pattern matrices in a MySQL database. It was initially designed to store matrices for the \s-1JASPAR\s0 database of high quality eukaryotic transcription factor specificity profiles by Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself, this data model stores profile \s-1ID\s0 (unique), name, structural class, basic taxonomic and bibliographic information as well as some additional optional tags. .PP Due to its data model, which preceded the design of the module, \s-1TFBS::DB::JASPAR2\s0 cannot store arbitrary tags for a matrix. .PP The supported tags are 'acc' (accession number, originally for the transcription factor protein sequence), 'seqdb' (the sequence database that 'acc' comes from), 'medline' (PubMed \s-1ID\s0), 'species' (species name), 'sysgroup', and 'total_ic' (total information content; redundant, present for historical reasons). .PP \&\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \s-1ADVANCED\s0 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- .PP For the developers and the curious, here is the \s-1JASPAR2\s0 data model: .PP .Vb 12 \& CREATE TABLE matrix_data ( \& ID varchar(16) DEFAULT \*(Aq\*(Aq NOT NULL, \& pos_ID varchar(24) DEFAULT \*(Aq\*(Aq NOT NULL, \& base enum(\*(AqA\*(Aq,\*(AqC\*(Aq,\*(AqG\*(Aq,\*(AqT\*(Aq), \& position tinyint(3) unsigned, \& raw int(3) unsigned, \& info float(7,5) unsigned, \-\- calculated \& pwm float(7,5) unsigned, \-\- calculated \& normalized float(7,5) unsigned, 
\& PRIMARY KEY (pos_ID), \& KEY id_index (ID) \& ); \& \& \& CREATE TABLE matrix_info ( \& ID varchar(16) DEFAULT \*(Aq\*(Aq NOT NULL, \& name varchar(15) DEFAULT \*(Aq\*(Aq NOT NULL, \& type varchar(8) DEFAULT \*(Aq\*(Aq NOT NULL, \& class varchar(20), \& phylum varchar (32), \-\- maps to \*(Aqsysgroup\*(Aq tag \& litt varchar(40), \-\- not used by this module \& medline int(12), \& information varchar(20), \-\- not used by this module \& iterations varchar(6), \& width int(2), \-\- calculated \& consensus varchar(25), \-\- calculated \& IC float(6,4), \-\- maps to \*(Aqtotal_ic\*(Aq tag \& sites int(3) unsigned, \-\- not used by this module \& PRIMARY KEY (ID) \& ) \& \& \& CREATE TABLE matrix_seqs ( \& ID varchar(16) DEFAULT \*(Aq\*(Aq NOT NULL, \& internal varchar(8) DEFAULT \*(Aq\*(Aq NOT NULL, \& seq_db varchar(15) NOT NULL, \& seq varchar(10) NOT NULL, \& PRIMARY KEY (ID, seq_db, seq) \& ) \& \& \& CREATE TABLE matrix_species ( \& ID varchar(16) DEFAULT \*(Aq\*(Aq NOT NULL, \& internal varchar(8) DEFAULT \*(Aq\*(Aq NOT NULL, \& species varchar(24) NOT NULL, \& PRIMARY KEY (ID, species) \& ) .Ve .PP It is our best intention to hide the details of this data model, which we are using on a daily basis in our work, from most \s-1TFBS\s0 users, simply because for historical reasons some table column names are confusing at best. Most users should only know the methods to store the data and which tags are supported. .PP \&\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
.SS "new" .IX Subsection "new" .Vb 3 \& Title : new \& Usage : DEPRECATED \- for backward compatibility only \& Use connect() or create() instead .Ve .SS "connect" .IX Subsection "connect" .Vb 10 \& Title : connect \& Usage : my $db = \& TFBS::DB::JASPAR2\->connect("dbi:mysql:DATABASENAME:HOSTNAME", \& "USERNAME", \& "PASSWORD"); \& Function: connects to an existing JASPAR2\-type database and \& returns a database object that interfaces the database \& Returns : a TFBS::DB::JASPAR2 object \& Args : a standard database connection triplet \& ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") \& In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, \& use the actual values. PASSWORD and USERNAME might be \& optional, depending on the user\*(Aqs access permissions for \& the database server. .Ve .SS "create" .IX Subsection "create" .Vb 10 \& Title : create \& Usage : my $newdb = \& TFBS::DB::JASPAR2\->create("dbi:mysql:NEWDATABASENAME:HOSTNAME", \& "USERNAME", \& "PASSWORD"); \& Function: connects to the database server, creates a new JASPAR2\-type database and returns a database \& object that interfaces the database \& Returns : a TFBS::DB::JASPAR2 object \& Args : a standard database connection triplet \& ("dbi:mysql:NEWDATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") \& In place of NEWDATABASENAME, HOSTNAME, USERNAME and \& PASSWORD use the actual values. PASSWORD and USERNAME \& might be optional, depending on the user\*(Aqs access permissions \& for the database server. 
.Ve .SS "dbh" .IX Subsection "dbh" .Vb 10 \& Title : dbh \& Usage : my $dbh = $db\->dbh(); \& $dbh\->do("UPDATE matrix_info SET name=\*(AqADD1\*(Aq WHERE name=\*(AqSREBP2\*(Aq"); \& Function: returns the DBI database handle of the MySQL database \& interfaced by $db; THIS IS USED FOR WRITING NEW METHODS \& FOR DIRECT RELATIONAL DATABASE MANIPULATION \- if you \& have write access AND do not know what you are doing, \& you can severely corrupt the data \& For documentation about database handle methods, see the DBI documentation \& Returns : the database (DBI) handle of the MySQL JASPAR2\-type \& relational database associated with the TFBS::DB::JASPAR2 \& object \& Args : none .Ve .SS "get_Matrix_by_ID" .IX Subsection "get_Matrix_by_ID" .Vb 10 \& Title : get_Matrix_by_ID \& Usage : my $pfm = $db\->get_Matrix_by_ID(\*(AqM00034\*(Aq, \*(AqPFM\*(Aq); \& Function: fetches matrix data under the given ID from the \& database and returns a TFBS::Matrix::* object \& Returns : a TFBS::Matrix::* object; the exact type of the \& object depends on the second argument (allowed \& values are \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq); returns undef if \& no matrix with the given ID is found \& Args : (Matrix_ID, Matrix_type) \& Matrix_ID is a string; Matrix_type is one of the \& following: \*(AqPFM\*(Aq (raw position frequency matrix), \& \*(AqICM\*(Aq (information content matrix) or \*(AqPWM\*(Aq (position \& weight matrix) \& If Matrix_type is omitted, a PWM is retrieved by default. 
.Ve .SS "get_Matrix_by_name" .IX Subsection "get_Matrix_by_name" .Vb 10 \& Title : get_Matrix_by_name \& Usage : my $pfm = $db\->get_Matrix_by_name(\*(AqHNF\-1\*(Aq, \*(AqPWM\*(Aq); \& Function: fetches matrix data under the given name from the \& database and returns a TFBS::Matrix::* object \& Returns : a TFBS::Matrix::* object; the exact type of the object \& depending on the second argument (allowed values are \& \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq) \& Args : (Matrix_name, Matrix_type) \& Matrix_name is a string; Matrix_type is one of the \& following: \& \*(AqPFM\*(Aq (raw position frequency matrix), \& \*(AqICM\*(Aq (information content matrix) or \& \*(AqPWM\*(Aq (position weight matrix) \& If Matrix_type is omitted, a PWM is retrieved by default. \& Warning : According to the current JASPAR2 data model, name is \& not necessarily a unique identifier. In the case where \& there are several matrices with the same name in the \& database, the function fetches the first one and prints \& a warning on STDERR. You have been warned. 
.Ve .SS "get_MatrixSet" .IX Subsection "get_MatrixSet" .Vb 10 \& Title : get_MatrixSet \& Usage : my $matrixset = $db\->get_MatrixSet(%args); \& Function: fetches matrix data for all matrices in the database \& matching criteria defined by the named arguments \& and returns a TFBS::MatrixSet object \& Returns : a TFBS::MatrixSet object \& Args : This method accepts named arguments: \& \-IDs # a reference to an array of IDs (strings) \& \-names # a reference to an array of \& # transcription factor names (strings) \& \-classes # a reference to an array of \& # structural class names (strings) \& \-species # a reference to an array of \& # Latin species names (strings) \& \-sysgroups # a reference to an array of \& # higher taxonomic categories (strings) \& \& \-matrixtype # a string, \*(AqPFM\*(Aq, \*(AqICM\*(Aq or \*(AqPWM\*(Aq \& \-min_ic # float, minimum total information content \& # of the matrix .Ve .PP The five arguments that expect list references are used in database query formulation: elements within lists are combined with '\s-1OR\s0' operators, and the lists of different types with '\s-1AND\s0'. For example, .PP .Vb 3 \& my $matrixset = $db\->get_MatrixSet(\-classes => [\*(AqTRP_CLUSTER\*(Aq, \*(AqFORKHEAD\*(Aq], \& \-species => [\*(AqHomo sapiens\*(Aq, \*(AqMus musculus\*(Aq], \& \-matrixtype => \*(AqPWM\*(Aq); .Ve .PP gives a set of PWMs whose (structural class is '\s-1TRP_CLUSTER\s0' \s-1OR\s0 \&'\s-1FORKHEAD\s0') \s-1AND\s0 (the species they are derived from is 'Homo sapiens' \&\s-1OR\s0 'Mus musculus'). .PP The \-min_ic filter is applied after the query: matrix profiles whose total information content is below the specified value are not included in the set.
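.PP The OR-within-a-list, AND-between-lists rule described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's actual query code: build_where and the %COLUMN_FOR mapping are hypothetical names, the column names are loosely taken from the data model shown earlier, and table joins (for example with matrix_species) are ignored.

```perl
use strict;
use warnings;

# Hypothetical mapping from get_MatrixSet list arguments to column names.
# Assumed for illustration only; the real module may query differently.
my %COLUMN_FOR = (
    -IDs       => 'ID',
    -names     => 'name',
    -classes   => 'class',
    -species   => 'species',
    -sysgroups => 'phylum',
);

# Hypothetical helper: returns a WHERE clause with placeholders,
# followed by the values to bind to those placeholders.
sub build_where {
    my (%args) = @_;
    my (@groups, @binds);
    for my $key (sort keys %COLUMN_FOR) {
        my $values = $args{$key} or next;
        next unless @$values;
        # Values within one list are alternatives: combine with OR.
        my $group = join ' OR ', map { "$COLUMN_FOR{$key} = ?" } @$values;
        push @groups, "($group)";
        push @binds, @$values;
    }
    # Lists of different types must all match: combine with AND.
    return (join(' AND ', @groups), @binds);
}

my ($where, @binds) = build_where(
    -classes => ['TRP_CLUSTER', 'FORKHEAD'],
    -species => ['Homo sapiens', 'Mus musculus'],
);
print "$where\n";
print join(', ', @binds), "\n";
```

Using placeholders and a separate bind list keeps user-supplied names out of the SQL text; the resulting pair could be handed to a DBI statement handle.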
.SS "store_Matrix" .IX Subsection "store_Matrix" .Vb 9 \& Title : store_Matrix \& Usage : $db\->store_Matrix($pfm); \& Function: Stores the contents of a TFBS::Matrix::PFM object in the database \& Returns : 0 on success; $@ contents on failure \& (this is too C\-like and may change in future versions) \& Args : (PFM_object) \& A TFBS::Matrix::PFM object \& Comment : this is an experimental method that is not 100% bulletproof; \& use at your own risk .Ve .SS "store_MatrixSet" .IX Subsection "store_MatrixSet" .Vb 9 \& Title : store_MatrixSet \& Usage : $db\->store_MatrixSet($matrixset); \& Function: Stores the TFBS::Matrix::PFM objects that are part of a \& TFBS::MatrixSet object into the database \& Returns : 0 on success; $@ contents on failure \& (this is too C\-like and may change in future versions) \& Args : (MatrixSet_object) \& A TFBS::MatrixSet object \& Comment : THIS METHOD IS NOT YET IMPLEMENTED .Ve .SS "delete_Matrix_having_ID" .IX Subsection "delete_Matrix_having_ID" .Vb 9 \& Title : delete_Matrix_having_ID \& Usage : $db\->delete_Matrix_having_ID(\*(AqM00045\*(Aq); \& Function: Deletes the matrix having the given ID from the database \& Returns : 0 on success; $@ contents on failure \& (this is too C\-like and may change in future versions) \& Args : (ID) \& A string \& Comment : Yeah, yeah, \*(Aqdelete_Matrix_having_ID\*(Aq is a stupid name \& for a method, but at least it should be obvious what it does. .Ve TFBS-0.7.1/blib/man3/TFBS::DB::JASPAR4.3pm000066400000000000000000000362541305752266700167740ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. 
\*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::DB::JASPAR4 3" .TH TFBS::DB::JASPAR4 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l .nh .SH "NAME" TFBS::DB::JASPAR4 \- interface to a MySQL relational database of pattern matrices .SH "SYNOPSIS" .IX Header "SYNOPSIS" .IP "\(bu" 4 creating a database object by connecting to an existing JASPAR4\-type database .Sp .Vb 3 \& my $db = TFBS::DB::JASPAR4\->connect("dbi:mysql:JASPAR4:myhost", \& "myusername", \& "mypassword"); .Ve .IP "\(bu" 4 retrieving a TFBS::Matrix::* object from the database .Sp .Vb 2 \& # retrieving a PFM by ID \& my $pfm = $db\->get_Matrix_by_ID(\*(AqM0079\*(Aq,\*(AqPFM\*(Aq); \& \& # retrieving a PWM by name \& my $pwm = $db\->get_Matrix_by_name(\*(AqNF\-kappaB\*(Aq, \*(AqPWM\*(Aq); .Ve .IP "\(bu" 4 retrieving a set of matrices as a TFBS::MatrixSet object according to various criteria .Sp .Vb 4 \& # retrieving a set of PWMs from a list of IDs: \& my @IDlist = (\*(AqM0019\*(Aq, \*(AqM0045\*(Aq, \*(AqM0073\*(Aq, \*(AqM0101\*(Aq); \& my $matrixset = $db\->get_MatrixSet(\-IDs => \e@IDlist, \& \-matrixtype => "PWM"); \& \& # retrieving a set of ICMs from a list of names: \& my @namelist = (\*(Aqp50\*(Aq, \*(Aqp53\*(Aq, \*(AqHNF\-1\*(Aq, \*(AqGATA\-1\*(Aq, \*(AqGATA\-2\*(Aq, \*(AqGATA\-3\*(Aq); \& my $matrixset = $db\->get_MatrixSet(\-names => \e@namelist, \& \-matrixtype => "ICM"); \& \& # retrieving a set of all PFMs in the database \& # derived from human genes: \& my $matrixset = $db\->get_MatrixSet(\-species => [\*(AqHomo sapiens\*(Aq], \& \-matrixtype => "PFM"); .Ve .IP "\(bu" 4 creating a new JASPAR4\-type database named \s-1MYJASPAR4:\s0 .Sp .Vb 3 \& my $db = TFBS::DB::JASPAR4\->create("dbi:mysql:MYJASPAR4:myhost", \& "myusername", \& "mypassword"); .Ve .IP "\(bu" 4 storing a matrix in the database (currently only PFMs): .Sp .Vb 2 \& # assuming $pfm is a TFBS::Matrix::PFM object \& $db\->store_Matrix($pfm); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" \&\s-1TFBS::DB::JASPAR4\s0 is a read/write database interface module that retrieves and stores TFBS::Matrix::* and TFBS::MatrixSet objects in a relational database.
The interface is nearly identical to the \s-1JASPAR2\s0 interface, while the underlying data model is different. .SH "JASPAR2 DATA MODEL" .IX Header "JASPAR2 DATA MODEL" \&\s-1JASPAR4\s0 is the working name for a relational database model used for storing transcription factor pattern matrices in a MySQL database. It was initially designed (as \s-1JASPAR2\s0) to store matrices for the \s-1JASPAR\s0 database of high quality eukaryotic transcription factor specificity profiles by Albin Sandelin and Wyeth W. Wasserman. Besides the profile matrix itself, this data model stores profile \s-1ID\s0 (unique), name, structural class, basic taxonomic and bibliographic information as well as some additional optional tags. .PP Tags that are commonly used in the actual \s-1JASPAR\s0 database include 'medline' (PubMed \s-1ID\s0), 'species' (species name), 'superclass' (species supergroup, e.g. 'vertebrate' or 'plant'), 'total_ic' (total information content; redundant, present for historical reasons), 'type' (experimental method), 'acc' (accession number for the \s-1TF\s0 protein sequence) and 'seqdb' (the corresponding database name), but any tag is storable and searchable. .PP \&\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \s-1ADVANCED\s0 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- .PP For the developers and the curious, here is the \s-1JASPAR4\s0 data model: .PP It is our best intention to hide the details of this data model, which we are using on a daily basis in our work, from most \s-1TFBS\s0 users. Most users should only know the methods to store the data and which tags are supported. .PP \&\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. .SS "new" .IX Subsection "new" .Vb 3 \& Title : new \& Usage : DEPRECATED \- for backward compatibility only \& Use connect() or create() instead .Ve .SS "connect" .IX Subsection "connect" .Vb 10 \& Title : connect \& Usage : my $db = \& TFBS::DB::JASPAR4\->connect("dbi:mysql:DATABASENAME:HOSTNAME", \& "USERNAME", \& "PASSWORD"); \& Function: connects to an existing JASPAR4\-type database and \& returns a database object that interfaces the database \& Returns : a TFBS::DB::JASPAR4 object \& Args : a standard database connection triplet \& ("dbi:mysql:DATABASENAME:HOSTNAME", "USERNAME", "PASSWORD") \& In place of DATABASENAME, HOSTNAME, USERNAME and PASSWORD, \& use the actual values. PASSWORD and USERNAME might be \& optional, depending on the user\*(Aqs access permissions for \& the database server. 
.Ve .SS "dbh" .IX Subsection "dbh" .Vb 10 \& Title : dbh \& Usage : my $dbh = $db\->dbh(); \& $dbh\->do("UPDATE matrix_data SET name=\*(AqADD1\*(Aq WHERE NAME=\*(AqSREBP2\*(Aq"); \& Function: returns the DBI database handle of the MySQL database \& interfaced by $db; THIS IS USED FOR WRITING NEW METHODS \& FOR DIRECT RELATIONAL DATABASE MANIPULATION \- if you \& have write access AND do not know what you are doing, \& you can severely corrupt the data \& For documentation about database handle methods, see the DBI documentation \& Returns : the database (DBI) handle of the MySQL JASPAR4\-type \& relational database associated with the TFBS::DB::JASPAR4 \& object \& Args : none .Ve .SS "store_Matrix" .IX Subsection "store_Matrix" .Vb 11 \& Title : store_Matrix \& Usage : $db\->store_Matrix($matrixobject); \& Function: Stores the contents of a TFBS::Matrix::* object in the database \& Returns : 0 on success; $@ contents on failure \& (this is too C\-like and may change in future versions) \& Args : (Matrix_object) \& A TFBS::Matrix::PFM, TFBS::Matrix::PWM or TFBS::Matrix::ICM object. 
\& PFM objects are recommended, as they are easily converted to \& other formats \& Comment : this is an experimental method that is not 100% bulletproof; \& use at your own risk .Ve .SS "get_Matrix_by_ID" .IX Subsection "get_Matrix_by_ID" .Vb 9 \& Title : get_Matrix_by_ID \& Usage : my $pfm = $db\->get_Matrix_by_ID(\*(AqM00034\*(Aq, \*(AqPFM\*(Aq); \& Function: fetches matrix data under the given ID from the \& database and returns a TFBS::Matrix::* object \& Returns : a TFBS::Matrix::* object; the exact type of the \& object depends on the form in which the matrix is stored \& in the database (PFM is default) \& Args : (Matrix_ID) \& Matrix_ID is a string; .Ve .SS "get_Matrix_by_name" .IX Subsection "get_Matrix_by_name" .Vb 8 \& Title : get_Matrix_by_name \& Usage : my $pfm = $db\->get_Matrix_by_name(\*(AqHNF\-1\*(Aq); \& Function: fetches matrix data under the given name from the \& database and returns a TFBS::Matrix::* object \& Returns : a TFBS::Matrix::* object; the exact type of the object \& depends on the form in which the matrix object was stored in \& the database (default PFM) \& Args : (Matrix_name) \& \& Warning : According to the current JASPAR4 data model, name is \& not necessarily a unique identifier. In the case where \& there are several matrices with the same name in the \& database, the function fetches the first one and prints \& a warning on STDERR. You\*(Aqve been warned. .Ve .SS "get_MatrixSet" .IX Subsection "get_MatrixSet" .Vb 10 \& Title : get_MatrixSet \& Usage : my $matrixset = $db\->get_MatrixSet(%args); \& Function: fetches matrix data for all matrices in the database \& matching criteria defined by the named arguments \& and returns a TFBS::MatrixSet object \& Returns : a TFBS::MatrixSet object \& Args : This method accepts named arguments, corresponding to arbitrary tags. \& Note that this is different from JASPAR2. As any tag is supported for \& database storage, any tag can be used for information retrieval. 
\& Additionally, arguments such as \*(Aqname\*(Aq and \*(Aqclass\*(Aq can be used (even though \& they are not tags). \& As with the get_Matrix methods, it is important to realize that any matrix \& format can be stored in the database: the TFBS::MatrixSet might therefore \& consist of PFMs, ICMs and PWMs, depending on how the matrices are stored. \& \& Examples include \& \-ID # a reference to an array of IDs (strings) \& \-name # a reference to an array of \& # transcription factor names (strings) \& \-class # a reference to an array of \& # structural class names (strings) \& \-species # a reference to an array of \& # Latin species names (strings) \& \-sysgroup # a reference to an array of \& # higher taxonomic categories (strings) \& \& \& \-min_ic # float, minimum total information content \& # of the matrix. IMPORTANT: if retrieved matrices are in PWM \& format there is no way to measure information content. \& \-matrixtype # string describing the type of matrix to retrieve. If left out, the format \& will revert to the database format. Note that this option only works \& if the database format is PFM .Ve .PP The arguments that expect list references are used in database query formulation: elements within lists are combined with '\s-1OR\s0' operators, and the lists of different types with '\s-1AND\s0'. For example, .PP .Vb 3 \& my $matrixset = $db\->get_MatrixSet(\-class => [\*(AqTRP_CLUSTER\*(Aq, \*(AqFORKHEAD\*(Aq], \& \-species => [\*(AqHomo sapiens\*(Aq, \*(AqMus musculus\*(Aq], \& ); .Ve .PP gives a set of TFBS::Matrix::PFM objects (given that the matrix models are stored as such) whose (structural class is '\s-1TRP_CLUSTER\s0' \s-1OR\s0 '\s-1FORKHEAD\s0') \s-1AND\s0 (the species they are derived from is 'Homo sapiens' \s-1OR\s0 'Mus musculus'). .PP The \-min_ic filter is applied after the query: matrix profiles whose total information content is below the specified value are not included in the set.
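.PP A post-query filter of the kind \-min_ic describes can be sketched as follows. This is a simplified illustration, not the module's implementation: total_ic and filter_by_min_ic are hypothetical names, matrices are represented as plain hashes with a 'counts' field (an assumption made for this example), and the information content uses a uniform background without any small-sample correction, so the numbers may differ from what TFBS itself reports. It needs raw counts, which is consistent with the note that information content cannot be measured for matrices stored as PWMs.

```perl
use strict;
use warnings;

# Total information content (in bits) of a count matrix, given as a
# reference to an array of columns, each column being [A, C, G, T] counts.
# Simplified: uniform background, no small-sample correction.
sub total_ic {
    my ($columns) = @_;
    my $total = 0;
    for my $col (@$columns) {
        my $sum = 0;
        $sum += $_ for @$col;
        next unless $sum > 0;
        my $ic = 2;    # log2(4), the maximum for a four-letter alphabet
        for my $count (@$col) {
            next unless $count > 0;
            my $p = $count / $sum;
            $ic += $p * log($p) / log(2);
        }
        $total += $ic;
    }
    return $total;
}

# Post-query filter: keep only entries whose total IC reaches the threshold.
sub filter_by_min_ic {
    my ($min_ic, @matrices) = @_;
    return grep { total_ic($_->{counts}) >= $min_ic } @matrices;
}

# Hypothetical example data: one fully conserved and one uninformative matrix.
my @candidates = (
    { name => 'sharp', counts => [ [10, 0, 0, 0], [0, 10, 0, 0] ] },
    { name => 'flat',  counts => [ [5, 5, 5, 5],  [5, 5, 5, 5]  ] },
);
print "$_->{name}\n" for filter_by_min_ic(2.0, @candidates);
```

Filtering after retrieval keeps the database query simple, at the cost of fetching matrices that are then discarded.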
.SS "delete_Matrix_having_ID" .IX Subsection "delete_Matrix_having_ID" .Vb 9 \& Title : delete_Matrix_having_ID \& Usage : $db\->delete_Matrix_having_ID(\*(AqM00045\*(Aq); \& Function: Deletes the matrix having the given ID from the database \& Returns : 0 on success; $@ contents on failure \& (this is too C\-like and may change in future versions) \& Args : (ID) \& A string \& Comment : Yeah, yeah, \*(Aqdelete_Matrix_having_ID\*(Aq is a stupid name \& for a method, but at least it should be obvious what it does. .Ve TFBS-0.7.1/blib/man3/TFBS::DB::LocalTRANSFAC.3pm000066400000000000000000000177311305752266700201030ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. 
.ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . 
ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::DB::LocalTRANSFAC 3" .TH TFBS::DB::LocalTRANSFAC 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::DB::LocalTRANSFAC \- interface to local transfac database position frequency matrices (matrix.dat) .PP .Vb 5 \& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- NOTICE \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \& The TRANSFAC database is free for non\-commercial use. For commercial use \& the TRANSFAC databases and programs have to be licensed. Please read \& the DISCLAIMER at http://transfac.gbf.de/TRANSFAC/disclaimer.htm. \& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- .Ve .SH "SYNOPSIS" .IX Header "SYNOPSIS" .IP "\(bu" 4 creating a database object by connecting to \s-1TRANSFAC\s0 data .Sp .Vb 1 \& my $db = TFBS::DB::LocalTRANSFAC\->connect(\-localdir => \*(Aq/home/someusr\*(Aq); \& \& localdir is the location of the matrix.dat TRANSFAC datafile .Ve .IP "\(bu" 4 retrieving a TFBS::Matrix::* object from the database .Sp .Vb 2 \& # retrieving a PFM by ID \& my $pfm = $db\->get_Matrix_by_ID(\*(AqV$CEBPA_01\*(Aq,\*(AqPFM\*(Aq); \& \& #retrieving a PWM by TRANSFAC accession number \& my $pwm = $db\->get_Matrix_by_acc(\*(AqM00116\*(Aq, \*(AqPWM\*(Aq); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::DB::LocalTRANSFAC is a read only database interface that fetches \&\s-1TRANSFAC\s0 matrix data from a local \s-1TRANSFAC\s0 install (matrix.dat) .SS "connect" .IX Subsection "connect" .Vb 10 \& Title : connect \& Usage : my $db = 
TFBS::DB::LocalTRANSFAC\->connect(%args);
\& Function: Creates a TRANSFAC database connection object, which can be used
\&           to retrieve matrices from a locally installed TRANSFAC database
\& Returns : a TFBS::DB::LocalTRANSFAC object
\& Args    : \-localdir # REQUIRED: the directory of the matrix.dat TRANSFAC
\&                     # datafile. matrix.dat must be readable.
\&           \-accept_conditions # OPTIONAL: by setting this to a true
\&                              # value, you confirm that you
\&                              # have read and accepted the terms
\&                              # of use of TRANSFAC at
\&                              # http://transfac.gbf.de/TRANSFAC/disclaimer.htm;
\&                              # this also suppresses the annoying
\&                              # message that is printed to STDERR
\&                              # upon invoking the method
.Ve
.SS "get_Matrix_by_acc"
.IX Subsection "get_Matrix_by_acc"
.Vb 10
\& Title   : get_Matrix_by_acc
\& Usage   : my $pfm = $db\->get_Matrix_by_acc(\*(AqM00116\*(Aq, \*(AqPFM\*(Aq);
\& Function: fetches matrix data under the given TRANSFAC accession number
\&           from the database and returns a TFBS::Matrix::* object
\& Returns : a TFBS::Matrix::* object; the exact type of the
\&           object depending on the second argument (allowed
\&           values are \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq); returns undef if
\&           matrix with the given accession number is not found
\& Args    : (Matrix_ID, Matrix_type)
\&           Matrix_ID is a string; Matrix_type is one of the
\&           following: \*(AqPFM\*(Aq (raw position frequency matrix),
\&           \*(AqICM\*(Aq (information content matrix) or \*(AqPWM\*(Aq (position
\&           weight matrix)
\&           If Matrix_type is omitted, a PFM is retrieved by default.
.Ve .SS "get_Matrix_by_ID" .IX Subsection "get_Matrix_by_ID" .Vb 10 \& Title : get_Matrix_by_ID \& Usage : my $pfm = $db\->get_Matrix_by_ID(\*(AqV$CREB_01\*(Aq, \*(AqPFM\*(Aq); \& Function: fetches matrix data under the given TRANSFAC ID from the \& database and returns a TFBS::Matrix::* object \& Returns : a TFBS::Matrix::* object; the exact type of the \& object depending on the second argument (allowed \& values are \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq); returns undef if \& matrix with the given ID is not found \& Args : (Matrix_ID, Matrix_type) \& Matrix_ID is a string; Matrix_type is one of the \& following: \*(AqPFM\*(Aq (raw position frequency matrix), \& \*(AqICM\*(Aq (information content matrix) or \*(AqPWM\*(Aq (position \& weight matrix) \& If Matrix_type is omitted, a PFM is retrieved by default. .Ve TFBS-0.7.1/blib/man3/TFBS::DB::TRANSFAC.3pm000066400000000000000000000205141305752266700171210ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . 
ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::DB::TRANSFAC 3" .TH TFBS::DB::TRANSFAC 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::DB::TRANSFAC \- interface to database of TRANSFAC public position frequency matrices at TESS (http://www.cbil.upenn.edu/tess) .PP .Vb 5 \& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- NOTICE \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \& The TRANSFAC database is free for non\-commercial use. For commercial use \& the TRANSFAC databases and programs have to be licensed. Please read \& the DISCLAIMER at http://transfac.gbf.de/TRANSFAC/disclaimer.htm. 
\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.IP "\(bu" 4
creating a database object by connecting to \s-1TRANSFAC\s0 data
.Sp
.Vb 1
\&    my $db = TFBS::DB::TRANSFAC\->connect();
.Ve
.IP "\(bu" 4
retrieving a TFBS::Matrix::* object from the database
.Sp
.Vb 2
\&    # retrieving a PFM by ID
\&    my $pfm = $db\->get_Matrix_by_ID(\*(AqV$CEBPA_01\*(Aq,\*(AqPFM\*(Aq);
\&
\&    # retrieving a PWM by TRANSFAC accession number
\&    my $pwm = $db\->get_Matrix_by_acc(\*(AqM00116\*(Aq, \*(AqPWM\*(Aq);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\s-1TFBS::DB::TRANSFAC\s0 is a read-only database interface that fetches
\&\s-1TRANSFAC\s0 matrix data from the \s-1TESS\s0 web interface (http://www.cbil.upenn.edu/TESS)
and returns TFBS::Matrix::* objects.
.SS "connect"
.IX Subsection "connect"
.Vb 10
\& Title   : connect
\& Usage   : my $db = TFBS::DB::TRANSFAC\->connect(%args);
\& Function: Creates a TRANSFAC database connection object, which can be used
\&           to retrieve matrices from public TRANSFAC databases via the web
\& Returns : a TFBS::DB::TRANSFAC object
\& Args    : \-proxy # OPTIONAL: a http proxy server name,
\&                  # usually required for accessing TRANSFAC from behind
\&                  # a firewall
\&           \-accept_conditions # OPTIONAL: by setting this to a true
\&                              # value, you confirm that you
\&                              # have read and accepted the terms
\&                              # of use of TRANSFAC at
\&                              # http://transfac.gbf.de/TRANSFAC/disclaimer.htm;
\&                              # this also suppresses the annoying
\&                              # message that is printed to STDERR
\&                              # upon invoking the method
.Ve
.SS "new"
.IX Subsection "new"
.Vb 7
\& Title   : new
\& Usage   : my $db = TFBS::DB::TRANSFAC\->new(%args);
\& Function: Here, new is just a synonym for connect
\&           (to make the interface consistent with other
\&           bioperl read\-only Bio::DB::* objects)
\& Returns : a TFBS::DB::TRANSFAC object
\& Args    : \-accept_conditions # see explanation at connect
.Ve
.SS "get_Matrix_by_ID"
.IX Subsection "get_Matrix_by_ID"
.Vb 10
\& Title   : get_Matrix_by_ID
\& Usage   : my $pfm = $db\->get_Matrix_by_ID(\*(AqV$CREB_01\*(Aq, \*(AqPFM\*(Aq);
\& Function: fetches matrix data under the given TRANSFAC ID from the
\&           database and returns a TFBS::Matrix::* object
\& Returns : a TFBS::Matrix::* object; the exact type of the
\&           object depending on the second argument (allowed
\&           values are \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq); returns undef if
\&           matrix with the given ID is not found
\& Args    : (Matrix_ID, Matrix_type)
\&           Matrix_ID is a string; Matrix_type is one of the
\&           following: \*(AqPFM\*(Aq (raw position frequency matrix),
\&           \*(AqICM\*(Aq (information content matrix) or \*(AqPWM\*(Aq (position
\&           weight matrix)
\&           If Matrix_type is omitted, a PFM is retrieved by default.
.Ve
.SS "get_Matrix_by_acc"
.IX Subsection "get_Matrix_by_acc"
.Vb 10
\& Title   : get_Matrix_by_acc
\& Usage   : my $pfm = $db\->get_Matrix_by_acc(\*(AqM00116\*(Aq, \*(AqPFM\*(Aq);
\& Function: fetches matrix data under the given TRANSFAC accession number
\&           from the database and returns a TFBS::Matrix::* object
\& Returns : a TFBS::Matrix::* object; the exact type of the
\&           object depending on the second argument (allowed
\&           values are \*(AqPFM\*(Aq, \*(AqICM\*(Aq, and \*(AqPWM\*(Aq); returns undef if
\&           matrix with the given accession number is not found
\& Args    : (Matrix_ID, Matrix_type)
\&           Matrix_ID is a string; Matrix_type is one of the
\&           following: \*(AqPFM\*(Aq (raw position frequency matrix),
\&           \*(AqICM\*(Aq (information content matrix) or \*(AqPWM\*(Aq (position
\&           weight matrix)
\&           If Matrix_type is omitted, a PFM is retrieved by default.
.Ve TFBS-0.7.1/blib/man3/TFBS::Ext::pwmsearch.3pm000066400000000000000000000104731305752266700201670ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . 
ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "pwmsearch 3" .TH pwmsearch 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l .nh .SH "NAME" TFBS::Ext::pwmsearch \- Perl extension for scanning a DNA sequence object with a position weight matrix .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 2 \& use TFBS::Ext::pwmsearch; \& pwmsearch .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" Stub documentation for TFBS::Ext::pwmsearch, created by h2xs. It looks like the author of the extension was negligent enough to leave the stub unedited. .PP Blah blah blah. .SS "\s-1EXPORT\s0" .IX Subsection "EXPORT" None by default. .SH "AUTHOR" .IX Header "AUTHOR" A. U. Thor, a.u.thor@a.galaxy.far.far.away .SH "SEE ALSO" .IX Header "SEE ALSO" \&\fIperl\fR\|(1). TFBS-0.7.1/blib/man3/TFBS::Matrix.3pm000066400000000000000000000145761305752266700166050ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. 
.ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . 
ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "TFBS::Matrix 3"
.TH TFBS::Matrix 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
TFBS::Matrix \- base class for matrix patterns, containing methods common to all
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::Matrix is a base class consisting of a universal constructor, which is called by
its subclasses (TFBS::Matrix::*), and matrix manipulation methods that are
independent of the matrix type. It is not meant to be instantiated itself.
.SH "FEEDBACK"
.IX Header "FEEDBACK"
Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard"
.IX Header "AUTHOR - Boris Lenhard"
Boris Lenhard
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object
methods. Internal methods are preceded with an underscore.
.SS "matrix"
.IX Subsection "matrix"
.Vb 7
\& Title   : matrix
\& Usage   : my $matrix = $pwm\->matrix();
\&           $pwm\->matrix( [ [12, 3, 0, 0, 4, 0],
\&                          [ 0, 0, 0,11, 7, 0],
\&                          [ 0, 9,12, 0, 0, 0],
\&                          [ 0, 0, 0, 1, 1,12]
\&                        ]);
\&
\& Function: get/set for the matrix data
\& Returns : a reference to a 2D array of integers (PFM) or floats (ICM, PWM)
\& Args    : none for get;
\&           a four\-line string, reference to 2D array, or a 2D piddle for set
.Ve
.SS "pdl_matrix"
.IX Subsection "pdl_matrix"
.Vb 6
\& Title   : pdl_matrix
\& Usage   : my $pdl = $pwm\->pdl_matrix();
\& Function: access the PDL matrix used to store the actual
\&           matrix data directly
\& Returns : a PDL object, aka a piddle
\& Args    : none
.Ve
.SS "revcom"
.IX Subsection "revcom"
.Vb 7
\& Title   : revcom
\& Usage   : my $revcom_pfm = $pfm\->revcom();
\& Function: create a matrix pattern object which is the reverse complement
\&           of the current one
\& Returns : a TFBS::Matrix::* object of the same type as the one
\&           the method acted upon
\& Args    : none
.Ve
.SS "rawprint"
.IX Subsection "rawprint"
.Vb 5
\& Title   : rawprint
\& Usage   : my $rawstring = $pfm\->rawprint();
\& Function: convert matrix data to a simple tab\-separated format
\& Returns : a four\-line string of tab\-separated integers or floats
\& Args    : none
.Ve
.SS "prettyprint"
.IX Subsection "prettyprint"
.Vb 5
\& Title   : prettyprint
\& Usage   : my $prettystring = $pfm\->prettyprint();
\& Function: convert matrix data to a human\-readable string format
\& Returns : a four\-line string with nucleotides and aligned numbers
\& Args    : none
.Ve
.SS "length"
.IX Subsection "length"
.Vb 6
\& Title   : length
\& Usage   : my $pattern_length = $pfm\->length;
\& Function: gets the pattern length in nucleotides
\&           (i.e.
number of columns in the matrix) \& Returns : an integer \& Args : none .Ve TFBS-0.7.1/blib/man3/TFBS::Matrix::ICM.3pm000066400000000000000000000303261305752266700173110ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . 
ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::Matrix::ICM 3" .TH TFBS::Matrix::ICM 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
TFBS::Matrix::ICM \- class for information content matrices of nucleotide
patterns
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.IP "\(bu" 4
creating a TFBS::Matrix::ICM object manually:
.Sp
.Vb 9
\&    my $matrixref = [ [ 2.00, 0.30, 0.00, 0.00, 0.24, 0.00 ],
\&                      [ 0.00, 0.00, 0.00, 1.45, 0.42, 0.00 ],
\&                      [ 0.00, 0.89, 2.00, 0.00, 0.00, 0.00 ],
\&                      [ 0.00, 0.00, 0.00, 0.13, 0.06, 2.00 ]
\&                    ];
\&    my $icm = TFBS::Matrix::ICM\->new(\-matrix => $matrixref,
\&                                     \-name   => "MyProfile",
\&                                     \-ID     => "M0001"
\&                                    );
\&
\&    # or
\&
\&    my $matrixstring =
\&        "2.00 0.30 0.00 0.00 0.24 0.00\en"
\&      . "0.00 0.00 0.00 1.45 0.42 0.00\en"
\&      . "0.00 0.89 2.00 0.00 0.00 0.00\en"
\&      . "0.00 0.00 0.00 0.13 0.06 2.00";
\&
\&    my $icm = TFBS::Matrix::ICM\->new(\-matrixstring => $matrixstring,
\&                                     \-name         => "MyProfile",
\&                                     \-ID           => "M0001"
\&                                    );
.Ve
.IP "\(bu" 4
retrieving a TFBS::Matrix::ICM object from a database:
.Sp
(See documentation of individual TFBS::DB::* modules to learn
how to connect to different types of pattern databases and retrieve
TFBS::Matrix::* objects from them.)
.Sp
.Vb 6
\&    my $db_obj = TFBS::DB::JASPAR2\->new
\&                    (\-connect => ["dbi:mysql:JASPAR2:myhost",
\&                                  "myusername", "mypassword"]);
\&    my $icm = $db_obj\->get_Matrix_by_ID("M0001", "ICM");
\&    # or
\&    my $icm = $db_obj\->get_Matrix_by_name("MyProfile", "ICM");
.Ve
.IP "\(bu" 4
retrieving a list of individual TFBS::Matrix::ICM objects
from a TFBS::MatrixSet object
.Sp
(See the documentation of TFBS::MatrixSet to learn how to create
objects for storage and manipulation of multiple matrices.)
.Sp
.Vb 1
\&    my @icm_list = $matrixset\->all_patterns(\-sort_by=>"name");
.Ve
.IP "\(bu" 4
drawing a sequence logo
.Sp
.Vb 7
\&    $icm\->draw_logo(\-file=>"logo.png",
\&                    \-full_scale =>2.25,
\&                    \-xsize=>500,
\&                    \-ysize =>250,
\&                    \-graph_title=>"C/EBPalpha binding site logo",
\&                    \-x_title=>"position",
\&                    \-y_title=>"bits");
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::Matrix::ICM is a class whose instances are objects representing
information content matrices (ICMs). An \s-1ICM\s0 is normally calculated from a
raw position frequency matrix (see TFBS::Matrix::PFM
for the explanation of position frequency matrices).
For example, given the following position frequency matrix, .PP .Vb 4 \& A:[ 12 3 0 0 4 0 ] \& C:[ 0 0 0 11 7 0 ] \& G:[ 0 9 12 0 0 0 ] \& T:[ 0 0 0 1 1 12 ] .Ve .PP the standard computational procedure is applied to convert it into the following information content matrix: .PP .Vb 4 \& A:[2.00 0.30 0.00 0.00 0.24 0.00] \& C:[0.00 0.00 0.00 1.45 0.42 0.00] \& G:[0.00 0.89 2.00 0.00 0.00 0.00] \& T:[0.00 0.00 0.00 0.13 0.06 2.00] .Ve .PP which contains the \*(L"weights\*(R" associated with the occurrence of each nucleotide at the given position in a pattern. .PP A TFBS::Matrix::PWM object is equipped with methods to search nucleotide sequences and pairwise alignments of nucleotide sequences with the pattern they represent, and return a set of sites in nucleotide sequence (a TFBS::SiteSet object for single sequence search, and a TFBS::SitePairSet for the alignment search). .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
.SS "new" .IX Subsection "new" .Vb 5 \& Title : new \& Usage : my $icm = TFBS::Matrix::ICM\->new(%args) \& Function: constructor for the TFBS::Matrix::ICM object \& Returns : a new TFBS::Matrix::ICM object \& Args : # you must specify either one of the following three: \& \& \-matrix, # reference to an array of arrays of integers \& #or \& \-matrixstring,# a string containing four lines \& # of tab\- or space\-delimited integers \& #or \& \-matrixfile, # the name of a file containing four lines \& # of tab\- or space\-delimited integers \& ####### \& \& \-name, # string, OPTIONAL \& \-ID, # string, OPTIONAL \& \-class, # string, OPTIONAL \& \-tags # an array reference, OPTIONAL .Ve .SS "to_PWM" .IX Subsection "to_PWM" .Vb 9 \& Title : to_PWM \& Usage : my $pwm = $icm\->to_PWM() \& Function: converts an information content matrix (a TFBS::Matrix::ICM object) \& to position weight matrix. At present it assumes uniform \& background distribution of nucleotide frequencies. \& Returns : a new TFBS::Matrix::PWM object \& Args : none; in the future releases, it should be able to accept \& a user defined background probability of the four \& nucleotides .Ve .SS "draw_logo" .IX Subsection "draw_logo" .Vb 10 \& Title : draw_logo \& Usage : my $gdImageObj = $icm\->draw_logo(%args) \& Function: Draws a "sequence logo", a graphical representation \& of a possibly degenerate fixed\-width nucleotide \& sequence pattern, from the information content matrix \& Returns : a GD::Image object; \& if you only need the image file you can ignore it \& Args : \-file, # the name of the output PNG image file \& # OPTIONAL: default none \& \-xsize # width of the image in pixels \& # OPTIONAL: default 600 \& \-ysize # height of the image in pixels \& # OPTIONAL: default 5/8 of \-x_size \& \-margin # size of image margins in pixels \& # OPTIONAL: default 15% of \-y_size \& \-full_scale # the maximum value on the y\-axis, in bits \& # OPTIONAL: default 2.25 \& \-graph_title,# the graph title \& 
# OPTIONAL: default none
\&           \-x_title,    # x\-axis title; OPTIONAL: default none
\&           \-y_title     # y\-axis title; OPTIONAL: default none
\&           \-error_bars  # reference to an array of S.D. values for each column; OPTIONAL
\&           \-ps          # if true, produces a postscript string instead of a GD::Image object
\&           \-pdf         # if true AND the \-file argument is used, produces an output pdf file
.Ve
.SS "_draw_ps_logo"
.IX Subsection "_draw_ps_logo"
.Vb 10
\& Title   : _draw_ps_logo
\& Usage   : my $postscript_string = $icm\->_draw_ps_logo(%args)
\&           Internal method, should be accessed using draw_logo()
\& Function: Draws a "sequence logo", a graphical representation
\&           of a possibly degenerate fixed\-width nucleotide
\&           sequence pattern, from the information content matrix
\& Returns : a postscript string;
\&           if you only need the image file you can ignore it
\& Args    : \-file,       # the name of the output PNG image file
\&                        # OPTIONAL: default none
\&           \-xsize       # width of the image in pixels
\&                        # OPTIONAL: default 600
\&           \-ysize       # height of the image in pixels
\&                        # OPTIONAL: default 5/8 of \-x_size
\&           \-full_scale  # the maximum value on the y\-axis, in bits
\&                        # OPTIONAL: default 2.25
\&           \-graph_title,# the graph title
\&                        # OPTIONAL: default none
\&           \-x_title,    # x\-axis title; OPTIONAL: default none
\&           \-y_title     # y\-axis title; OPTIONAL: default none
.Ve
.SS "name"
.IX Subsection "name"
.SS "\s-1ID\s0"
.IX Subsection "ID"
.SS "class"
.IX Subsection "class"
.SS "matrix"
.IX Subsection "matrix"
.SS "length"
.IX Subsection "length"
.SS "revcom"
.IX Subsection "revcom"
.SS "rawprint"
.IX Subsection "rawprint"
.SS "prettyprint"
.IX Subsection "prettyprint"
The above methods are common to all matrix objects. Please consult
TFBS::Matrix to find out how to use them.
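.PP
The conversion from a position frequency matrix to an information content
matrix mentioned in the DESCRIPTION section can be sketched in plain Perl.
This is only a minimal illustration, not the code used inside TFBS: it
assumes a uniform background, applies no small sample correction, and the
variable names (@pfm, @icm) are made up for the example. For the example
matrix it should print the same values as the information content matrix
shown in the DESCRIPTION section.
.PP
.Vb 10
\&    use strict;
\&    use warnings;
\&
\&    # example PFM from the DESCRIPTION section (rows: A, C, G, T)
\&    my @pfm = ( [ 12, 3,  0,  0, 4,  0 ],
\&                [  0, 0,  0, 11, 7,  0 ],
\&                [  0, 9, 12,  0, 0,  0 ],
\&                [  0, 0,  0,  1, 1, 12 ] );
\&    my @icm;
\&    for my $col ( 0 .. $#{ $pfm[0] } ) {
\&        my $total = 0;
\&        $total += $_\->[$col] for @pfm;    # column sum
\&        my $ic = 2;                       # log2(4), the maximum in bits
\&        for my $row (@pfm) {
\&            my $p = $row\->[$col] / $total;
\&            # adding p*log2(p) subtracts the column entropy
\&            $ic += $p * log($p) / log(2) if $p > 0;
\&        }
\&        # scale each nucleotide frequency by the column information content
\&        $icm[$_][$col] = $ic * $pfm[$_][$col] / $total for 0 .. $#pfm;
\&    }
\&    for my $row (@icm) {
\&        print join( " ", map { sprintf "%.2f", $_ } @$row ), "\en";
\&    }
.Ve
.PP
With real data, prefer the to_ICM method of TFBS::Matrix::PFM, which may
apply corrections that this sketch leaves out.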
TFBS-0.7.1/blib/man3/TFBS::Matrix::PFM.3pm000066400000000000000000000327001305752266700173210ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . 
ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::Matrix::PFM 3" .TH TFBS::Matrix::PFM 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
TFBS::Matrix::PFM \- class for raw position frequency matrix patterns
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.IP "\(bu" 4
creating a TFBS::Matrix::PFM object manually:
.Sp
.Vb 10
\&    my $matrixref = [ [ 12,  3,  0,  0,  4,  0 ],
\&                      [  0,  0,  0, 11,  7,  0 ],
\&                      [  0,  9, 12,  0,  0,  0 ],
\&                      [  0,  0,  0,  1,  1, 12 ]
\&                    ];
\&    my $pfm = TFBS::Matrix::PFM\->new(\-matrix => $matrixref,
\&                                     \-name   => "MyProfile",
\&                                     \-ID     => "M0001"
\&                                    );
\&    # or
\&
\&    my $matrixstring =
\&        "12 3 0 0 4 0\en0 0 0 11 7 0\en0 9 12 0 0 0\en0 0 0 1 1 12";
\&
\&    my $pfm = TFBS::Matrix::PFM\->new(\-matrixstring => $matrixstring,
\&                                     \-name         => "MyProfile",
\&                                     \-ID           => "M0001"
\&                                    );
.Ve
.IP "\(bu" 4
retrieving a TFBS::Matrix::PFM object from a database:
.Sp
(See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.)
.Sp
.Vb 6
\&    my $db_obj = TFBS::DB::JASPAR2\->new
\&                    (\-connect => ["dbi:mysql:JASPAR2:myhost",
\&                                  "myusername", "mypassword"]);
\&    my $pfm = $db_obj\->get_Matrix_by_ID("M0001", "PFM");
\&    # or
\&    my $pfm = $db_obj\->get_Matrix_by_name("MyProfile", "PFM");
.Ve
.IP "\(bu" 4
retrieving a list of individual TFBS::Matrix::PFM objects from a TFBS::MatrixSet object
.Sp
(See the TFBS::MatrixSet documentation to learn how to create objects for storage and manipulation of multiple matrices.)
.Sp
.Vb 1
\&    my @pfm_list = $matrixset\->all_patterns(\-sort_by=>"name");
.Ve
.IP "\(bu" 4
convert a raw frequency matrix to other matrix types:
.Sp
.Vb 2
\&    my $pwm = $pfm\->to_PWM(); # convert to position weight matrix
\&    my $icm = $pfm\->to_ICM(); # convert to information content matrix
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::Matrix::PFM is a class whose instances are objects representing raw position frequency matrices (PFMs). A \s-1PFM\s0 is derived from N nucleotide patterns of fixed size, e.g.
the set of sequences
.PP
.Vb 12
\&    AGGCCT
\&    AAGCCT
\&    AGGCAT
\&    AAGCCT
\&    AAGCCT
\&    AGGCAT
\&    AGGCCT
\&    AGGCAT
\&    AGGTTT
\&    AGGCAT
\&    AGGCCT
\&    AGGCCT
.Ve
.PP
will give the matrix:
.PP
.Vb 4
\&    A:[ 12  3  0  0  4  0 ]
\&    C:[  0  0  0 11  7  0 ]
\&    G:[  0  9 12  0  0  0 ]
\&    T:[  0  0  0  1  1 12 ]
.Ve
.PP
which contains the count of each nucleotide at each position in the sequence. (If you have a set of sequences as above and want to create a TFBS::Matrix::PFM object out of them, have a look at the TFBS::PatternGen::SimplePFM module.)
.PP
PFMs are easily converted to other types of matrices, namely information content matrices and position weight matrices. A TFBS::Matrix::PFM object has the methods to_ICM and to_PWM which do just that, returning TFBS::Matrix::ICM and TFBS::Matrix::PWM objects, respectively.
.SH "FEEDBACK"
.IX Header "FEEDBACK"
Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard"
.IX Header "AUTHOR - Boris Lenhard"
Boris Lenhard
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
.SS "new"
.IX Subsection "new"
.Vb 5
\& Title   : new
\& Usage   : my $pfm = TFBS::Matrix::PFM\->new(%args)
\& Function: constructor for the TFBS::Matrix::PFM object
\& Returns : a new TFBS::Matrix::PFM object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-matrix,      # reference to an array of arrays of integers
\&              #or
\&           \-matrixstring,# a string containing four lines
\&                         # of tab\- or space\-delimited integers
\&              #or
\&           \-matrixfile,  # the name of a file containing four lines
\&                         # of tab\- or space\-delimited integers
\&           #######
\&
\&           \-name,        # string, OPTIONAL
\&           \-ID,          # string, OPTIONAL
\&           \-class,       # string, OPTIONAL
\&           \-tags         # an array reference, OPTIONAL
\&Warnings : Warns if the matrix provided has columns with different
\&           sums.
Columns with different sums contradict the usual
\&           origin of matrix data and, unless you are absolutely sure
\&           that column sums _should_ be different, it would be wise to
\&           check your matrices.
.Ve
.SS "column_sum"
.IX Subsection "column_sum"
.Vb 8
\& Title   : column_sum
\& Usage   : my $nr_sequences = $pfm\->column_sum()
\& Function: calculates the sum of elements of one column
\&           (the first one by default) which normally equals the
\&           number of sequences used to derive the PFM.
\& Returns : the sum of elements of one column (an integer)
\& Args    : column number (starting from 1), OPTIONAL \- you DO NOT
\&           need to specify it unless you are dealing with a matrix
\&           whose column sums differ
.Ve
.SS "to_PWM"
.IX Subsection "to_PWM"
.Vb 9
\& Title   : to_PWM
\& Usage   : my $pwm = $pfm\->to_PWM()
\& Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
\&           to a position weight matrix. At present it assumes uniform
\&           background distribution of nucleotide frequencies.
\& Returns : a new TFBS::Matrix::PWM object
\& Args    : none; in future releases, it should be able to accept
\&           a user defined background probability of the four
\&           nucleotides
.Ve
.SS "to_ICM"
.IX Subsection "to_ICM"
.Vb 7
\& Title   : to_ICM
\& Usage   : my $icm = $pfm\->to_ICM()
\& Function: converts a raw frequency matrix (a TFBS::Matrix::PFM object)
\&           to an information content matrix. At present it assumes uniform
\&           background distribution of nucleotide frequencies.
\& Returns : a new TFBS::Matrix::ICM object
\& Args    : \-small_sample_correction # undef (default), \*(Aqschneider\*(Aq or \*(Aqpseudocounts\*(Aq
.Ve
.PP
How a \s-1PFM\s0 is converted to \s-1ICM:\s0
.PP
For a \s-1PFM\s0 element PFM[i,k], the probability without pseudocounts is estimated to be simply
.PP
.Vb 1
\&  p[i,k] = PFM[i,k] / Z
.Ve
.PP
where
\&\- Z equals the column sum of the matrix i.e. the number of motifs used to construct the \s-1PFM\s0.
\&\- i is the column index (position in the motif)
\&\- k is the row index (a letter in the alphabet; here k is one of A, C, G, T)
.PP
Here is how one normally calculates the pseudocount-corrected positional probability p'[i,k]:
.PP
.Vb 1
\&  p\*(Aq[i,k] = (PFM[i,k] + 0.25*sqrt(Z)) / (Z + sqrt(Z))
.Ve
.PP
0.25 is for the flat distribution of nucleotides, and sqrt(Z) is the recommended pseudocount weight. In the general case,
.PP
.Vb 1
\&  p\*(Aq[i,k] = (PFM[i,k] + q[k]*B) / (Z + B)
.Ve
.PP
where q[k] is the background distribution of the letter (nucleotide) k, and B an arbitrary pseudocount value or expression (for no pseudocounts B=0).
.PP
For a given position i, the deviation from random distribution in bits is calculated as (Baldi and Brunak eq. 1.9 (2ed) or 1.8 (1ed)):
.PP
\&\- for an arbitrary alphabet of A letters:
.PP
.Vb 1
\&  D[i] = log2(A) + sum_for_all_k(p[i,k]*log2(p[i,k]))
.Ve
.PP
\&\- special case for nucleotides (A=4)
.PP
.Vb 1
\&  D[i] = 2 + sum_for_all_k(p[i,k]*log2(p[i,k]))
.Ve
.PP
D[i] equals the information content of the position i in the motif. To calculate the entire \s-1ICM\s0, you have to calculate the contribution of each nucleotide at a position i to D[i], i.e.
.PP
ICM[i,k] = p'[i,k] * D[i]
.SS "draw_logo"
.IX Subsection "draw_logo"
.Vb 10
\& Title   : draw_logo
\& Usage   : my $gd_image = $pfm\->draw_logo()
\& Function: draws a sequence logo; similar to the
\&           method in TFBS::Matrix::ICM, but can automatically calculate
\&           error bars for drawing
\& Returns : a GD image object (see documentation of GD module)
\& Args    : many; PFM\-specific options are:
\&       \-small_sample_correction # One of
\&                     # "Schneider" (uses correction
\&                     # described by Schneider et al.
\&                     # (Schneider T. et al. (1986) J.Biol.Chem.
\&                     # "pseudocounts" \- standard pseudocount
\&                     # correction, more suitable for
\&                     # PFMs with larger column sums
\&                     # If the parameter is omitted, small
\&                     # sample correction is not applied
\&
\&       \-draw_error_bars # if true, adds error bars to each position
\&                     # in the logo. To calculate the error bars,
\&                     # it uses the \-small_sample_correction
\&                     # argument if explicitly set,
\&                     # or "Schneider" by default
\&For other args, see draw_logo entry in TFBS::Matrix::ICM documentation
.Ve
.SS "add_PFM"
.IX Subsection "add_PFM"
.Vb 5
\& Title   : add_PFM
\& Usage   : $pfm\->add_PFM($another_pfm)
\& Function: adds the values of $another_pfm matrix to $pfm
\& Returns : reference to the updated $pfm object
\& Args    : a TFBS::Matrix::PFM object
.Ve
.SS "name"
.IX Subsection "name"
.SS "\s-1ID\s0"
.IX Subsection "ID"
.SS "class"
.IX Subsection "class"
.SS "matrix"
.IX Subsection "matrix"
.SS "length"
.IX Subsection "length"
.SS "revcom"
.IX Subsection "revcom"
.SS "rawprint"
.IX Subsection "rawprint"
.SS "prettyprint"
.IX Subsection "prettyprint"
The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them.
TFBS-0.7.1/blib/man3/TFBS::Matrix::PWM.3pm000066400000000000000000000320471305752266700173460ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.
\*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::Matrix::PWM 3" .TH TFBS::Matrix::PWM 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
TFBS::Matrix::PWM \- class for position weight matrices of nucleotide patterns
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.IP "\(bu" 4
creating a TFBS::Matrix::PWM object manually:
.Sp
.Vb 10
\&    my $matrixref = [ [ 0.61, \-3.16,  1.83, \-3.16,  1.21, \-0.06],
\&                      [\-0.15, \-2.57, \-3.16, \-3.16, \-2.57, \-1.83],
\&                      [\-1.57,  1.85, \-2.57, \-1.34, \-1.57,  1.14],
\&                      [ 0.31, \-3.16, \-2.57,  1.76,  0.24, \-0.83]
\&                    ];
\&    my $pwm = TFBS::Matrix::PWM\->new(\-matrix => $matrixref,
\&                                     \-name   => "MyProfile",
\&                                     \-ID     => "M0001"
\&                                    );
\&    # or
\&
\&    my $matrixstring = <<ENDMATRIX;
\& 0.61 \-3.16  1.83 \-3.16  1.21 \-0.06
\&\-0.15 \-2.57 \-3.16 \-3.16 \-2.57 \-1.83
\&\-1.57  1.85 \-2.57 \-1.34 \-1.57  1.14
\& 0.31 \-3.16 \-2.57  1.76  0.24 \-0.83
\&ENDMATRIX
\&
\&    my $pwm = TFBS::Matrix::PWM\->new(\-matrixstring => $matrixstring,
\&                                     \-name         => "MyProfile",
\&                                     \-ID           => "M0001"
\&                                    );
.Ve
.IP "\(bu" 4
retrieving a TFBS::Matrix::PWM object from a database:
.Sp
(See documentation of individual TFBS::DB::* modules to learn how to connect to different types of pattern databases and retrieve TFBS::Matrix::* objects from them.)
.Sp
.Vb 6
\&    my $db_obj = TFBS::DB::JASPAR2\->new
\&                    (\-connect => ["dbi:mysql:JASPAR2:myhost",
\&                                  "myusername", "mypassword"]);
\&    my $pwm = $db_obj\->get_Matrix_by_ID("M0001", "PWM");
\&    # or
\&    my $pwm = $db_obj\->get_Matrix_by_name("MyProfile", "PWM");
.Ve
.IP "\(bu" 4
retrieving a list of individual TFBS::Matrix::PWM objects from a TFBS::MatrixSet object
.Sp
(see documentation of TFBS::MatrixSet to learn how to create objects for storage and manipulation of multiple matrices)
.Sp
.Vb 1
\&    my @pwm_list = $matrixset\->all_patterns(\-sort_by=>"name");
.Ve
.IP "\(bu" 4
scanning a nucleotide sequence with a matrix
.Sp
.Vb 2
\&    my $siteset = $pwm\->search_seq(\-file      => "myseq.fa",
\&                                   \-threshold => "80%");
.Ve
.IP "\(bu" 4
scanning a pairwise alignment with a matrix
.Sp
.Vb 4
\&    my $site_pair_set = $pwm\->search_aln(\-file      => "myalign.aln",
\&                                         \-threshold => "80%",
\&                                         \-cutoff    => "70%",
\&                                         \-window    => 50);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::Matrix::PWM is a class whose instances are objects representing position weight matrices (PWMs).
A \s-1PWM\s0 is normally calculated from a raw position frequency matrix (see TFBS::Matrix::PFM for the explanation of position frequency matrices). For example, given a position frequency matrix such as:
.PP
.Vb 4
\&    A:[ 12  3  0  0  4  0 ]
\&    C:[  0  0  0 11  7  0 ]
\&    G:[  0  9 12  0  0  0 ]
\&    T:[  0  0  0  1  1 12 ]
.Ve
.PP
a conversion (see the to_PWM method of TFBS::Matrix::PFM) yields a position weight matrix that looks like this:
.PP
.Vb 4
\&    A:[ 0.61 \-3.16  1.83 \-3.16  1.21 \-0.06]
\&    C:[\-0.15 \-2.57 \-3.16 \-3.16 \-2.57 \-1.83]
\&    G:[\-1.57  1.85 \-2.57 \-1.34 \-1.57  1.14]
\&    T:[ 0.31 \-3.16 \-2.57  1.76  0.24 \-0.83]
.Ve
.PP
which contains the \*(L"weights\*(R" associated with the occurrence of each nucleotide at the given position in a pattern.
.PP
A TFBS::Matrix::PWM object is equipped with methods to search nucleotide sequences and pairwise alignments of nucleotide sequences with the pattern they represent, and return a set of sites in nucleotide sequence (a TFBS::SiteSet object for single sequence search, and a TFBS::SitePairSet for the alignment search).
.SH "FEEDBACK"
.IX Header "FEEDBACK"
Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard"
.IX Header "AUTHOR - Boris Lenhard"
Boris Lenhard
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
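.PP
The conversion outlined in the DESCRIPTION can be sketched in plain Perl. This is a minimal sketch, not the code used by TFBS: it assumes a uniform background of 0.25 and the sqrt(Z) pseudocount weight described in the TFBS::Matrix::PFM documentation, and the names @pfm, @pwm, $bg, $pc and log2 are made up for the example. Its output need not reproduce the weight matrix printed in the DESCRIPTION.
.PP
.Vb 10
\&    use strict;
\&    use warnings;
\&
\&    # Counts for A, C, G, T (rows) at six positions (columns).
\&    my @pfm = ( [ 12, 3,  0,  0, 4,  0 ],
\&                [  0, 0,  0, 11, 7,  0 ],
\&                [  0, 9, 12,  0, 0,  0 ],
\&                [  0, 0,  0,  1, 1, 12 ] );
\&
\&    my $bg = 0.25;    # assumption: uniform background frequency
\&
\&    sub log2 { return log( $_[0] ) / log(2) }
\&
\&    my @pwm;
\&    for my $col ( 0 .. $#{ $pfm[0] } ) {
\&        my $z = 0;
\&        $z += $_\->[$col] for @pfm;    # column sum Z
\&        my $pc = sqrt($z);             # assumption: pseudocount weight sqrt(Z)
\&        for my $row ( 0 .. $#pfm ) {
\&            my $p = ( $pfm[$row][$col] + $bg * $pc ) / ( $z + $pc );
\&            $pwm[$row][$col] = log2( $p / $bg );    # log\-odds weight in bits
\&        }
\&    }
\&
\&    # Print one row per nucleotide, two decimals per weight.
\&    for my $row (@pwm) {
\&        print join( " ", map { sprintf( "%5.2f", $_ ) } @$row ), "\en";
\&    }
.Ve
.PP
Because of the pseudocounts, a nucleotide that was never observed at a position still receives a finite negative weight instead of minus infinity.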
.SS "new"
.IX Subsection "new"
.Vb 5
\& Title   : new
\& Usage   : my $pwm = TFBS::Matrix::PWM\->new(%args)
\& Function: constructor for the TFBS::Matrix::PWM object
\& Returns : a new TFBS::Matrix::PWM object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-matrix,      # reference to an array of arrays of numbers
\&              #or
\&           \-matrixstring,# a string containing four lines
\&                         # of tab\- or space\-delimited numbers
\&              #or
\&           \-matrixfile,  # the name of a file containing four lines
\&                         # of tab\- or space\-delimited numbers
\&           #######
\&
\&           \-name,        # string, OPTIONAL
\&           \-ID,          # string, OPTIONAL
\&           \-class,       # string, OPTIONAL
\&           \-tags         # an array reference, OPTIONAL
.Ve
.SS "search_seq"
.IX Subsection "search_seq"
.Vb 6
\& Title   : search_seq
\& Usage   : my $siteset = $pwm\->search_seq(%args)
\& Function: scans a nucleotide sequence with the pattern represented
\&           by the PWM
\& Returns : a TFBS::SiteSet object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-file,        # the name of a FASTA file (single sequence)
\&              #or
\&           \-seqobj       # a Bio::Seq object
\&                         # (more accurately, a Bio::PrimarySeq object or a
\&                         # subclass thereof)
\&              #or
\&           \-seqstring    # a string containing the sequence
\&
\&           \-threshold,   # minimum score for the hit, either absolute
\&                         # (e.g. 11.2) or relative (e.g.
"75%")
\&                         # OPTIONAL: default "80%"
\&
\&           \-subpart      # subpart of the sequence to search, given as
\&                         #   \-subpart => { start => 140,
\&                         #                 end   => 180 }
\&                         # where start and end are coordinates in the
\&                         # sequence; the coordinate range is interpreted
\&                         # in the BioPerl tradition (1\-based, inclusive)
\&                         # OPTIONAL: by default searches entire sequence
.Ve
.SS "search_aln"
.IX Subsection "search_aln"
.Vb 10
\& Title   : search_aln
\& Usage   : my $site_pair_set = $pwm\->search_aln(%args)
\& Function: Scans a pairwise alignment of nucleotide sequences
\&           with the pattern represented by the PWM: it reports only
\&           those hits that are present in equivalent positions of both
\&           sequences and exceed a specified threshold score in both, AND
\&           are found in regions of the alignment above the specified
\&           conservation cutoff value.
\& Returns : a TFBS::SitePairSet object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-file,        # the name of the alignment file in Clustal
\&                         # format
\&              #or
\&           \-alignobj     # a Bio::SimpleAlign object
\&                         # (more accurately, a Bio::PrimarySeq object or a
\&                         # subclass thereof)
\&              #or
\&           \-alignstring  # a multi\-line string containing the alignment
\&                         # in clustal format
\&           #############
\&
\&           \-threshold,   # minimum score for the hit, either absolute
\&                         # (e.g. 11.2) or relative (e.g. "75%")
\&                         # OPTIONAL: default "80%"
\&
\&           \-window,      # size of the sliding window (in nucleotides)
\&                         # for calculating local conservation in the
\&                         # alignment
\&                         # OPTIONAL: default 50
\&
\&           \-cutoff       # conservation cutoff (%) for including the
\&                         # region in the results of the pattern search
\&                         # OPTIONAL: default "70%"
\&
\&           \-subpart      # subpart of the alignment to search, given as e.g.
\& # \-subpart => { relative_to => 1, \& # start => 140, \& # end => 180 } \& # where start and end are coordinates in the \& # sequence indicated by relative_to (1 for the \& # 1st sequence in the alignment, 2 for the 2nd) \& # OPTIONAL: by default searches entire alignment \& \& \-conservation \& # conservation profile, a TFBS::ConservationProfile \& # OPTIONAL: by default the conservation profile is \& # computed internally on the fly (less efficient) .Ve .SS "name" .IX Subsection "name" .SS "\s-1ID\s0" .IX Subsection "ID" .SS "class" .IX Subsection "class" .SS "matrix" .IX Subsection "matrix" .SS "length" .IX Subsection "length" .SS "revcom" .IX Subsection "revcom" .SS "rawprint" .IX Subsection "rawprint" .SS "prettyprint" .IX Subsection "prettyprint" The above methods are common to all matrix objects. Please consult TFBS::Matrix to find out how to use them. TFBS-0.7.1/blib/man3/TFBS::MatrixSet.3pm000066400000000000000000000253071305752266700172530ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . 
ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "TFBS::MatrixSet 3"
.TH TFBS::MatrixSet 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
TFBS::MatrixSet \- an aggregate class representing a set of matrix patterns, containing methods for manipulating the set as a whole
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&    # creation of a TFBS::MatrixSet object
\&    # let @list_of_matrix_objects be a list of TFBS::Matrix::* objects
\&
\&    ###################################
\&    # Create a TFBS::MatrixSet object:
\&
\&    my $matrixset = TFBS::MatrixSet\->new(); # creates an empty set
\&    $matrixset\->add_Matrix(@list_of_matrix_objects); # adds matrix objects to set
\&    $matrixset\->add_Matrix($matrixobj); # adds a single matrix object to set
\&
\&    # or, same as above:
\&
\&    my $matrixset = TFBS::MatrixSet\->new(@list_of_matrix_objects, $matrixobj);
\&
\&    ###################################
\&    #
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
TFBS::MatrixSet is an aggregate class storing a set of TFBS::Matrix::* subclass objects, and providing methods for manipulating those sets as a whole. TFBS::MatrixSet objects are created de novo or returned by some database (TFBS::DB::*) retrieval methods.
.SH "FEEDBACK"
.IX Header "FEEDBACK"
Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard"
.IX Header "AUTHOR - Boris Lenhard"
Boris Lenhard
.SH "APPENDIX"
.IX Header "APPENDIX"
The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
.SS "new"
.IX Subsection "new"
.SS "add_matrix"
.IX Subsection "add_matrix"
.Vb 5
\& Title   : add_matrix
\& Usage   : $matrixset\->add_matrix(@list_of_matrix_objects);
\& Function: Adds matrix objects to matrixset
\& Returns : object reference (usually ignored)
\& Args    : one or more TFBS::Matrix::* objects
.Ve
.SS "add_matrix_set"
.IX Subsection "add_matrix_set"
.Vb 6
\& Title   : add_matrix_set
\& Usage   : $matrixset\->add_matrix_set(@list_of_matrixset_objects);
\& Function: Adds to the matrixset matrix objects contained in one or
\&           more other matrixsets
\& Returns : object reference (usually ignored)
\& Args    : one or more TFBS::MatrixSet objects
.Ve
.SS "search_seq"
.IX Subsection "search_seq"
.Vb 4
\& Title   : search_seq
\& Usage   : my $siteset = $matrixset\->search_seq(%args)
\& Function: scans a nucleotide sequence with all patterns
\&           stored in $matrixset;
\&
\&           It works only if all matrix objects in $matrixset understand
\&           the search_seq method (currently only TFBS::Matrix::PWM objects do)
\& Returns : a TFBS::SiteSet object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-file,        # the name of a FASTA file (single sequence)
\&              #or
\&           \-seqobj       # a Bio::Seq object
\&                         # (more accurately, a Bio::PrimarySeq object or a
\&                         # subclass thereof)
\&              #or
\&           \-seqstring    # a string containing the sequence
\&
\&           \-threshold,   # minimum score for the hit, either absolute
\&                         # (e.g. 11.2) or relative (e.g.
"75%")
\&                         # OPTIONAL: default "80%"
.Ve
.SS "search_aln"
.IX Subsection "search_aln"
.Vb 10
\& Title   : search_aln
\& Usage   : my $site_pair_set = $matrixset\->search_aln(%args)
\& Function: Scans a pairwise alignment of nucleotide sequences
\&           with the patterns stored in $matrixset: it reports only
\&           those hits that are present in equivalent positions of both
\&           sequences and exceed a specified threshold score in both, AND
\&           are found in regions of the alignment above the specified
\&           conservation cutoff value.
\&           It works only if all matrix objects in $matrixset understand
\&           the search_aln method (currently only TFBS::Matrix::PWM objects do)
\&
\& Returns : a TFBS::SitePairSet object
\& Args    : # you must specify exactly one of the following three:
\&
\&           \-file,        # the name of the alignment file in Clustal
\&                         # format
\&              #or
\&           \-alignobj     # a Bio::SimpleAlign object
\&                         # (more accurately, a Bio::PrimarySeq object or a
\&                         # subclass thereof)
\&              #or
\&           \-alignstring  # a multi\-line string containing the alignment
\&                         # in clustal format
\&           #############
\&
\&           \-threshold,   # minimum score for the hit, either absolute
\&                         # (e.g. 11.2) or relative (e.g. "75%")
\&                         # OPTIONAL: default "80%"
\&
\&           \-window,      # size of the sliding window (in nucleotides)
\&                         # for calculating local conservation in the
\&                         # alignment
\&                         # OPTIONAL: default 50
\&
\&           \-cutoff       # conservation cutoff (%) for including the
\&                         # region in the results of the pattern search
\&                         # OPTIONAL: default "70%"
\&
\&           \-subpart      # subpart of the alignment to search, given as e.g.
\&                         #   \-subpart => { relative_to => 1,
\&                         #                 start       => 140,
\&                         #                 end         => 180 }
\&                         # where start and end are coordinates in the
\&                         # sequence indicated by relative_to (1 for the
\&                         # 1st sequence in the alignment, 2 for the 2nd)
\&                         # OPTIONAL: by default searches entire alignment
\&
\&           \-conservation
\&                         # conservation profile, a TFBS::ConservationProfile
\&                         # OPTIONAL: by default the conservation profile is
\&                         # computed internally on the fly (less efficient)
.Ve
.SS "size"
.IX Subsection "size"
.Vb 6
\& Title   : size
\& Usage   : my $number_of_matrices = $matrixset\->size;
\& Function: gets the number of matrix objects in the $matrixset
\&           (i.e. the size of the set)
\& Returns : a number
\& Args    : none
.Ve
.SS "Iterator"
.IX Subsection "Iterator"
.Vb 10
\& Title   : Iterator
\& Usage   : my $matrixset_iterator =
\&               $matrixset\->Iterator(\-sort_by =>\*(Aqtotal_ic\*(Aq);
\&           while (my $matrix_object = $matrixset_iterator\->next) {
\&               # do whatever you want with individual matrix objects
\&           }
\& Function: Returns an iterator object that can be used to go through
\&           all members of the set
\& Returns : an iterator object (currently undocumented in TFBS \-
\&           but understands the \*(Aqnext\*(Aq method)
\& Args    : \-sort_by # optional \- currently it accepts
\&                    #   \*(AqID\*(Aq (alphabetically)
\&                    #   \*(Aqname\*(Aq (alphabetically)
\&                    #   \*(Aqclass\*(Aq (alphabetically)
\&                    #   \*(Aqtotal_ic\*(Aq (numerically, decreasing order)
\&
\&           \-reverse # optional \- reverses the default sorting order if true
.Ve
TFBS-0.7.1/blib/man3/TFBS::PatternGen.3pm000066400000000000000000000126071305752266700174010ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . 
ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen 3" .TH TFBS::PatternGen 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen \- a base class for pattern generators .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen is a base class providing methods common to all pattern generating modules. It is meant to be inherited by a concrete pattern generator, which must have its own constructor. 
.SS "pattern" .IX Subsection "pattern" .Vb 10 \& Title : pattern \& Usage : my $pattern_obj = $patterngen\->pattern() \& Function: retrieves a pattern object produced by the pattern generator \& Returns : a pattern object (currently available pattern generators \& return a TFBS::Matrix::PFM object) \& Args : none \& Warning : If a pattern generator produces more than one pattern, \& this method call returns only the first one and prints \& a warning on STDERR. In those cases you should use the \& all_patterns or patternSet methods. .Ve .SS "patternSet" .IX Subsection "patternSet" .Vb 7 \& Title : patternSet \& Usage : my $patternSet = $patterngen\->patternSet() \& Function: retrieves a pattern set object containing all the patterns \& produced by the pattern generator \& Returns : a pattern set object (currently available pattern generators \& return a TFBS::MatrixSet object) \& Args : none .Ve .SS "all_patterns" .IX Subsection "all_patterns" .Vb 8 \& Title : all_patterns \& Usage : my @patterns = $patterngen\->all_patterns() \& Function: retrieves an array of pattern objects \& produced by the pattern generator \& Returns : an array of pattern objects (currently available \& pattern generators return an array of \& TFBS::Matrix::PFM objects) \& Args : none .Ve TFBS-0.7.1/blib/man3/TFBS::PatternGen::AnnSpec.3pm000066400000000000000000000137511305752266700210360ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++.
Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::AnnSpec 3" .TH TFBS::PatternGen::AnnSpec 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::AnnSpec \- a pattern factory that uses the AnnSpec program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 3 \& my $patterngen = \& TFBS::PatternGen::AnnSpec\->new(\-seq_file=>\*(Aqsequences.fa\*(Aq, \& \-binary => \*(Aqann\-spec \*(Aq \& \& \& my $pfm = $patterngen\->pattern(); # $pfm is now a TFBS::Matrix::PFM object .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::AnnSpec builds position frequency matrices using an external program AnnSpec (Workman, C. and Stormo, G.D. (2000) ANN-Spec: A method for discovering transcription factor binding sites with improved specificity. Proc. Pacific Symposium on Biocomputing 2000). 
.SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Wynand Alkema" .IX Header "AUTHOR - Wynand Alkema" Wynand Alkema .SS "new" .IX Subsection "new" .Vb 10 \& Title : new \& Usage : my $patterngen = TFBS::PatternGen::AnnSpec\->new(%args); \& Function: the constructor for the TFBS::PatternGen::AnnSpec object \& Returns : a TFBS::PatternGen::AnnSpec object \& Args : This method takes named arguments; \& you must specify one of the following three \& \-seq_list # a reference to an array of strings \& # and/or Bio::Seq objects \& # or \& \-seq_stream # A Bio::SeqIO object \& # or \& \-seq_file # the name of the fasta file containing \& # all the sequences \& Other arguments are: \& \-binary # a fully qualified path to the \*(Aqann\-spec\*(Aq executable \& # OPTIONAL: default \*(Aqann\-spec\*(Aq \& \-additional_params # a string containing additional \& # command\-line switches for the \& # ann\-spec program .Ve .SS "pattern" .IX Subsection "pattern" .SS "all_patterns" .IX Subsection "all_patterns" .SS "patternSet" .IX Subsection "patternSet" The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see TFBS::PatternGen for details. TFBS-0.7.1/blib/man3/TFBS::PatternGen::AnnSpec::Motif.3pm000066400000000000000000000112461305752266700222160ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++.
Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::AnnSpec::Motif 3" .TH TFBS::PatternGen::AnnSpec::Motif 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::AnnSpec::Motif \- class for unprocessed motifs and associated numerical scores created by the Gibbs program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::AnnSpec::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the AnnSpec program. You do not normally want to create a TFBS::PatternGen::AnnSpec::Motif yourself. They are created by running TFBS::PatternGen::AnnSpec .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. 
.SH "AUTHOR \- Boris Lenhard and Wynand Alkema" .IX Header "AUTHOR - Boris Lenhard and Wynand Alkema" Boris Lenhard Wynand Alkema .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. TFBS-0.7.1/blib/man3/TFBS::PatternGen::Elph.3pm000066400000000000000000000147061305752266700204000ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . 
de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::Elph 3" .TH TFBS::PatternGen::Elph 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. 
Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::Elph \- a pattern factory that uses the Elph program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 5 \& my $patterngen = \& TFBS::PatternGen::Elph\->new(\-seq_file=>\*(Aqsequences.fa\*(Aq, \& \-binary => \*(Aq/Elph/elph\*(Aq, \& \-motif_length => [8, 9, 10], \& \-additional_params => \*(Aq\-x \-r \-e\*(Aq); \& \& my $pfm = $patterngen\->pattern(); # $pfm is now a TFBS::Matrix::PFM object .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::Elph builds position frequency matrices using the external \&\fIElph\fR program. The program can produce multiple patterns from a single set of sequences. .SS "new" .IX Subsection "new" .Vb 10 \& Title : new \& Usage : my $patterngen = TFBS::PatternGen::Elph\->new(%args); \& Function: the constructor for the TFBS::PatternGen::Elph object \& Returns : a TFBS::PatternGen::Elph object \& Args : This method takes named arguments; \& you must specify one of the following three \& \-seq_list # a reference to an array of strings \& # and/or Bio::Seq objects \& # or \& \-seq_stream # A Bio::SeqIO object \& # or \& \-seq_file # the name of the fasta file containing \& # all the sequences \& Other arguments are: \& \-binary # a fully qualified path to Gibbs executable \& # OPTIONAL: default \*(AqGibbs\*(Aq \& \-nr_hits # a presumed number of pattern occurrences in the \& # sequence set: it can be a single integer, e.g. \& # \-nr_hits => 24 , or a reference to an array of \& # integers, e.g. \-nr_hits => [12, 24, 36] \& \-motif_length # an expected length of motif in nucleotides: \& # it can be a single integer, e.g.
\& # \-motif_length => 8 , or a reference to an \& # array of integers, e.g. \-motif_length => [8..12] \& \-additional_params # a string containing additional \& # command\-line switches for the \& # Elph program .Ve .SS "pattern" .IX Subsection "pattern" .SS "all_patterns" .IX Subsection "all_patterns" .SS "patternSet" .IX Subsection "patternSet" The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see TFBS::PatternGen for details. TFBS-0.7.1/blib/man3/TFBS::PatternGen::Elph::Motif.3pm000066400000000000000000000111101305752266700215450ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . 
ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::Elph::Motif 3" .TH TFBS::PatternGen::Elph::Motif 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::Elph::Motif \- class for unprocessed motifs and associated numerical scores created by the Elph program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::Elph::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the Elph program. You do not normally want to create a TFBS::PatternGen::Elph::Motif yourself. They are created by running TFBS::PatternGen::Elph. .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Wynand Alkema" .IX Header "AUTHOR - Wynand Alkema" Wynand Alkema .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. TFBS-0.7.1/blib/man3/TFBS::PatternGen::Gibbs.3pm000066400000000000000000000150461305752266700205340ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote.
\*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::Gibbs 3" .TH TFBS::PatternGen::Gibbs 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::Gibbs \- a pattern factory that uses Chip Lawrences Gibbs program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 6 \& my $patterngen = \& TFBS::PatternGen::Gibbs\->new(\-seq_file=>\*(Aqsequences.fa\*(Aq, \& \-binary => \*(Aq/Programs/Gibbs\-1.0/bin/Gibbs\*(Aq \& \-nr_hits => 24, \& \-motif_length => [8, 9, 10], \& \-additional_params => \*(Aq\-x \-r \-e\*(Aq); \& \& my $pfm = $patterngen\->pattern(); # $pfm is now a TFBS::Matrix::PFM object .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::Gibbs builds position frequency matrices using an advanced Gibbs sampling algorithm implemented in external \&\fIGibbs\fR program by Chip Lawrence. 
The algorithm can produce multiple patterns from a single set of sequences. .SS "new" .IX Subsection "new" .Vb 10 \& Title : new \& Usage : my $patterngen = TFBS::PatternGen::Gibbs\->new(%args); \& Function: the constructor for the TFBS::PatternGen::Gibbs object \& Returns : a TFBS::PatternGen::Gibbs object \& Args : This method takes named arguments; \& you must specify one of the following three \& \-seq_list # a reference to an array of strings \& # and/or Bio::Seq objects \& # or \& \-seq_stream # A Bio::SeqIO object \& # or \& \-seq_file # the name of the fasta file containing \& # all the sequences \& Other arguments are: \& \-binary # a fully qualified path to Gibbs executable \& # OPTIONAL: default \*(AqGibbs\*(Aq \& \-nr_hits # a presumed number of pattern occurrences in the \& # sequence set: it can be a single integer, e.g. \& # \-nr_hits => 24 , or a reference to an array of \& # integers, e.g. \-nr_hits => [12, 24, 36] \& \-motif_length # an expected length of motif in nucleotides: \& # it can be a single integer, e.g. \& # \-motif_length => 8 , or a reference to an \& # array of integers, e.g. \-motif_length => [8..12] \& \-additional_params # a string containing additional \& # command\-line switches for the \& # Gibbs program .Ve .SS "pattern" .IX Subsection "pattern" .SS "all_patterns" .IX Subsection "all_patterns" .SS "patternSet" .IX Subsection "patternSet" The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see TFBS::PatternGen for details. TFBS-0.7.1/blib/man3/TFBS::PatternGen::Gibbs::Motif.3pm000066400000000000000000000123471305752266700217200ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 ..
.de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . 
ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::Gibbs::Motif 3" .TH TFBS::PatternGen::Gibbs::Motif 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::Gibbs::Motif \- class for unprocessed motifs and associated numerical scores created by the Gibbs program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::Gibbs::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the Gibbs program. You do not normally want to create a TFBS::PatternGen::Gibbs::Motif yourself. They are created by running TFBS::PatternGen::Gibbs .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. 
.SH "AUTHOR \- Boris Lenhard and Wynand Alkema" .IX Header "AUTHOR - Boris Lenhard and Wynand Alkema" Boris Lenhard Wynand Alkema .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. .SS "\s-1MAP\s0" .IX Subsection "MAP" .Vb 7 \& Title : MAP \& Usage : my $map_score = $motif\->MAP; \& Function: returns MAP score for the detected motif \& (This is a backward compatibility method. For consistency, \& you should use $motif\->tag(\*(AqMAP_score\*(Aq) instead.) \& Returns : float (a scalar) \& Args : none .Ve .SS "Other methods" .IX Subsection "Other methods" TFBS::PatternGen::Gibbs::Motif inherits from TFBS::PatternGen::Motif, which inherits from TFBS::Matrix. Please consult the documentation of those modules for additional available methods. TFBS-0.7.1/blib/man3/TFBS::PatternGen::MEME.3pm000066400000000000000000000135721305752266700202330ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" .
ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::MEME 3" .TH TFBS::PatternGen::MEME 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::MEME \- a pattern factory that uses the MEME program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 3 \& my $patterngen = \& TFBS::PatternGen::MEME\->new(\-seq_file=>\*(Aqsequences.fa\*(Aq, \& \-binary => \*(Aqmeme\*(Aq); \& \& \& my $pfm = $patterngen\->pattern(); # $pfm is now a TFBS::Matrix::PFM object .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::MEME builds position frequency matrices using the external program \s-1MEME\s0, written by Bailey and Elkan. For information about \s-1MEME\s0 and its source code, see .PP http://www.sdsc.edu/MEME .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author.
.SH "AUTHOR \- Wynand Alkema" .IX Header "AUTHOR - Wynand Alkema" Wynand Alkema .SS "new" .IX Subsection "new" .Vb 10 \& Title : new \& Usage : my $patterngen = TFBS::PatternGen::MEME\->new(%args); \& Function: the constructor for the TFBS::PatternGen::MEME object \& Returns : a TFBS::PatternGen::MEME object \& Args : This method takes named arguments; \& you must specify one of the following three \& \-seq_list # a reference to an array of strings \& # and/or Bio::Seq objects \& # or \& \-seq_stream # A Bio::SeqIO object \& # or \& \-seq_file # the name of the fasta file containing \& # all the sequences \& Other arguments are: \& \-binary # a fully qualified path to the \*(Aqmeme\*(Aq executable \& # OPTIONAL: default \*(Aqmeme\*(Aq \& \-additional_params # a string containing additional \& # command\-line switches for the \& # meme program .Ve .SS "pattern" .IX Subsection "pattern" .SS "all_patterns" .IX Subsection "all_patterns" .SS "patternSet" .IX Subsection "patternSet" The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see TFBS::PatternGen for details. TFBS-0.7.1/blib/man3/TFBS::PatternGen::MEME::Motif.3pm000066400000000000000000000111101305752266700214000ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available.
\*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::MEME::Motif 3" .TH TFBS::PatternGen::MEME::Motif 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::MEME::Motif \- class for unprocessed motifs and associated numerical scores created by the MEME program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::MEME::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the meme program. You do not normally want to create a TFBS::PatternGen::MEME::Motif yourself. Such objects are created by running TFBS::PatternGen::MEME. .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author.
.SH "AUTHOR \- Wynand Alkema" .IX Header "AUTHOR - Wynand Alkema" Wynand Alkema .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. TFBS-0.7.1/blib/man3/TFBS::PatternGen::SimplePFM.3pm000066400000000000000000000130211305752266700212710ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. 
.\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::SimplePFM 3" .TH TFBS::PatternGen::SimplePFM 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. 
Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternGen::SimplePFM \- a simple position frequency matrix factory .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 6 \& my @sequences = qw( AAGCCT AGGCAT AAGCCT \& AAGCCT AGGCAT AGGCCT \& AGGCAT AGGTTT AGGCAT \& AGGCCT AGGCCT ); \& my $patterngen = \& TFBS::PatternGen::SimplePFM\->new(\-seq_list=>\e@sequences); \& \& my $pfm = $patterngen\->pattern(); # $pfm is now a TFBS::Matrix::PFM object .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::SimplePFM generates a position frequency matrix from a set of nucleotide sequences of equal length. The sequences can be passed as strings, as Bio::Seq objects, or as a fasta file. .PP This pattern generator always creates exactly one pattern from a given set of sequences. .SS "new" .IX Subsection "new" .Vb 10 \& Title : new \& Usage : my $patterngen = TFBS::PatternGen::SimplePFM\->new(%args); \& Function: the constructor for the TFBS::PatternGen::SimplePFM \& object \& Returns : a TFBS::PatternGen::SimplePFM object \& Args : This method takes named arguments; \& you must specify one of the following \& \-seq_list # a reference to an array of strings \& # and/or Bio::Seq objects \& # or \& \-seq_stream # A Bio::SeqIO object \& # or \& \-seq_file # the name of the fasta file containing \& # all the sequences .Ve .SS "pattern" .IX Subsection "pattern" .SS "all_patterns" .IX Subsection "all_patterns" .SS "patternSet" .IX Subsection "patternSet" The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see TFBS::PatternGen for details.
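As an illustration of what a position frequency matrix built from equal-length sequences contains, here is a minimal stand-alone sketch in plain Perl. It does not use TFBS and is not the module's actual implementation: the helper name build_pfm and the returned hash-of-arrays layout are assumptions made for this example only. It counts, for each position, how often each of A, C, G and T occurs across the input strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): count nucleotides per column.
# Returns a hash reference keyed by nucleotide; each value is an array
# reference holding one count per sequence position.
sub build_pfm {
    my @seqs = @_;
    die "no sequences given\n" unless @seqs;
    my $len = length $seqs[0];
    my %pfm = map { ($_ => [ (0) x $len ]) } qw(A C G T);
    for my $seq (@seqs) {
        die "sequences must have equal length\n"
            unless length($seq) == $len;
        my @bases = split //, uc $seq;
        for my $pos (0 .. $len - 1) {
            my $base = $bases[$pos];
            die "unexpected character '$base'\n"
                unless exists $pfm{$base};
            $pfm{$base}[$pos]++;
        }
    }
    return \%pfm;
}

# The same eleven sequences as in the SYNOPSIS.
my @sequences = qw( AAGCCT AGGCAT AAGCCT
                    AAGCCT AGGCAT AGGCCT
                    AGGCAT AGGTTT AGGCAT
                    AGGCCT AGGCCT );
my $pfm = build_pfm(@sequences);

# Print one row per nucleotide, one column per position.
print join(' ', $_, @{ $pfm->{$_} }), "\n" for qw(A C G T);
```

Every column of the printed matrix sums to the number of input sequences, which is a quick sanity check on any frequency matrix. In real use, the TFBS::Matrix::PFM object returned by the pattern method takes the place of this hash.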
TFBS-0.7.1/blib/man3/TFBS::PatternGen::YMF.3pm000066400000000000000000000135671305752266700201470ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . 
ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::YMF 3" .TH TFBS::PatternGen::YMF 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l .nh .SH "NAME" TFBS::PatternGen::YMF \- a pattern factory that uses the YMF program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 3 \& my $patterngen = \& TFBS::PatternGen::MEME\->new(\-seq_file=>\*(Aqsequences.fa\*(Aq, \& \-binary => \*(Aqmeme\*(Aq); \& \& \& my $pfm = $patterngen\->pattern(); # $pfm is now a TFBS::Matrix::PFM object .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::MEME builds position frequency matrices using an external program \s-1MEME\s0 written by Bailey and Elkan. For information and source code of \s-1MEME\s0 see .PP http://www.sdsc.edu/MEME .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Wynand Alkema" .IX Header "AUTHOR - Wynand Alkema" Wynand Alkema .SS "new" .IX Subsection "new" .Vb 10 \& Title : new \& Usage : my $patterngen = TFBS::PatternGen::YMF\->new(%args); \& Function: the constructor for the TFBS::PatternGen::YMF object \& Returns : a TFBS::PatternGen::YMF object \& Args : This method takes named arguments; \& you must specify one of the following three \& \-seq_list # a reference to an array of strings \& # and/or Bio::Seq objects \& # or \& \-seq_stream # A Bio::SeqIO object \& # or \& \-seq_file # the name of the fasta file containing \& # all the sequences \& Other arguments are: \& \-binary # a fully qualified path to the \*(Aqmeme\*(Aq executable \& # OPTIONAL: default \*(Aqmeme\*(Aq \& \-additional_params # a string containing additional \& # command\-line switches for the \& # meme program .Ve .SS "pattern" .IX Subsection "pattern" .SS "all_patterns" .IX Subsection "all_patterns" .SS "patternSet" .IX Subsection "patternSet" The three methods listed above are used for the retrieval of patterns, and are common to all TFBS::PatternGen::* classes. Please see TFBS::PatternGen for details.
TFBS-0.7.1/blib/man3/TFBS::PatternGen::YMF::Motif.3pm000066400000000000000000000110741305752266700213210ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . 
ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternGen::YMF::Motif 3" .TH TFBS::PatternGen::YMF::Motif 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l .nh .SH "NAME" TFBS::PatternGen::YMF::Motif \- class for unprocessed motifs and associated numerical scores created by the YMF program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternGen::YMF::Motif is used to store and manipulate unprocessed motifs and associated numerical scores created by the ymf program. You do not normally want to create a TFBS::PatternGen::YMF::Motif yourself. They are created by running TFBS::PatternGen::YMF .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Wynand Alkema" .IX Header "AUTHOR - Wynand Alkema" Wynand Alkema .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. TFBS-0.7.1/blib/man3/TFBS::PatternI.3pm000066400000000000000000000142431305752266700170560ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . 
ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::PatternI 3" .TH TFBS::PatternI 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::PatternI \- interface definition for all pattern objects (currently includes matrices and word (consensus and regular expressions ) .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::PatternI is a draft class that should contain general interface for matrix and other (future) pattern objects. It is not defined and not used yet, as I need to ponder over certain unresolved issues in general pattern definition. User feedback is more than welcome. .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
.SS "\s-1ID\s0" .IX Subsection "ID" .Vb 6 \& Title : ID \& Usage : my $ID = $icm\->ID() \& $pfm\->ID(\*(AqM00119\*(Aq); \& Function: Get/set the ID of the pattern (unique in a DB or a set) \& Returns : pattern ID (a string) \& Args : none for get, string for set .Ve .SS "name" .IX Subsection "name" .Vb 6 \& Title : name \& Usage : my $name = $pwm\->name() \& $pfm\->name(\*(AqPPARgamma\*(Aq); \& Function: Get/set the name of the pattern \& Returns : pattern name (a string) \& Args : none for get, string for set .Ve .SS "class" .IX Subsection "class" .Vb 6 \& Title : class \& Usage : my $class = $pwm\->class() \& $pfm\->class(\*(Aqforkhead\*(Aq); \& Function: Get/set the structural class of the pattern \& Returns : class name (a string) \& Args : none for get, string for set .Ve .SS "tag" .IX Subsection "tag" .Vb 7 \& Title : tag \& Usage : my $acc = $pwm\->tag(\*(Aqacc\*(Aq) \& $pfm\->tag(source => "Gibbs"); \& Function: Get/set the value of a named tag of the pattern \& Returns : tag value (a scalar/reference) \& Args : tag name (string) for get, \& tag name (string) and value (any scalar/reference) for set .Ve .SS "all_tags" .IX Subsection "all_tags" .Vb 5 \& Title : all_tags \& Usage : my %tag = $pfm\->all_tags(); \& Function: get a hash of all tags for a pattern \& Returns : a hash of all tag values keyed by tag name \& Args : none .Ve .SS "delete_tag" .IX Subsection "delete_tag" .Vb 5 \& Title : delete_tag \& Usage : $pfm\->delete_tag(\*(Aqscore\*(Aq); \& Function: delete the named tag from the pattern \& Returns : nothing \& Args : a string (tag name) .Ve TFBS-0.7.1/blib/man3/TFBS::Site.3pm000066400000000000000000000223471305752266700162400ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp ..
.de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . 
ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::Site 3" .TH TFBS::Site 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::Site \- a nucleotide sequence feature object representing (possibly putative) transcription factor binding site. 
.SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 2 \& # manual creation of site object; \& # for details, see documentation of Bio::SeqFeature::Generic; \& \& my $site = TFBS::Site\->new \& (\-start => $start_pos, # integer \& \-end => $end_pos, # integer \& \-score => $score, # float \& \-source => "TFBS", # string \& \-primary => "TF binding site", # primary tag \& \-strand => $strand, # \-1, 0 or 1 \& \-seqobj => $seqobj, # a Bio::Seq object whose sequence \& # contains the site \& \-pattern => $pattern_obj # usu. TFBS::Matrix::PWM obj. \& ); \& \& \& # Searching sequence with a pattern (PWM) and retrieving individual sites: \& # \& # The following objects should be defined for this example: \& # $pwm \- a TFBS::Matrix::PWM object \& # $seqobj \- a Bio::Seq object \& # Consult the documentation for the above modules if you do not know \& # how to create them. \& \& # Scanning sequence with $pwm returns a TFBS::SiteSet object: \& \& my $site_set = $pwm\->search_seq(\-seqobj => $seqobj, \& \-threshold => "80%"); \& \& # To retrieve individual sites from $site_set, create an iterator obj: \& \& my $site_iterator = $site_set\->Iterator(\-sort_by => "score"); \& \& while (my $site = $site_iterator\->next()) { \& # do something with $site \& } .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" A TFBS::Site object holds data for a (possibly predicted) transcription factor binding site on a nucleotide sequence (start, end, strand, score, tags, as well as references to the corresponding sequence and pattern objects). TFBS::Site is a subclass of Bio::SeqFeature::Generic and has access to all of its methods. Additionally, it contains the \fIpattern()\fR method, an accessor for the pattern object associated with the site object. .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author.
.SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. .PP TFBS::Site is a class that extends Bio::SeqFeature::Generic. Please consult Bio::SeqFeature::Generic documentation for other available methods. .SS "new" .IX Subsection "new" .Vb 12 \& Title : new \& Usage : my $site = TFBS::Site\->new(%args) \& Function: constructor for the TFBS::Site object \& Returns : TFBS::Site object \& Args : \-start, # integer \& \-end, # integer \& \-strand, # \-1, 0 or 1 \& \-score, # float \& \-source, # string (method used to detect it) \& \-primary, # string (primary tag) \& \-seqobj, # a Bio::Seq object \& \-pattern # a pattern object, usu. TFBS::Matrix::PWM .Ve .SS "pattern" .IX Subsection "pattern" .Vb 6 \& Title : pattern \& Usage : my $pattern = $site\->pattern(); # gets the pattern \& $site\->pattern($pwm); # sets the pattern to $pwm \& Function: gets/sets the pattern object associated with the site \& Returns : pattern object, here TFBS::Matrix::PWM object \& Args : pattern object (optional, for setting the pattern only) .Ve .SS "rel_score" .IX Subsection "rel_score" .Vb 7 \& Title : rel_score \& Usage : my $percent_score = $site\->rel_score() * 100; # gets the score as a percentage \& Function: gets the relative score (between 0.0 and 1.0) with respect to the score \& range of the associated pattern (matrix) \& Returns : floating point number between 0 and 1, \& or undef if pattern not defined \& Args : none .Ve .SS "\s-1GFF\s0" .IX Subsection "GFF" .Vb 7 \& Title : GFF \& Usage : print $site\->GFF(); \& : print $site\->GFF($gff_formatter) \& Function: returns a "standard" GFF string \- the "generic" gff_string \& method is left untouched for possible customizations \& Returns : a string (NOT newline terminated!
) \& Args : a $gff_formatter function reference (optional) .Ve .SS "location" .IX Subsection "location" .SS "start" .IX Subsection "start" .SS "end" .IX Subsection "end" .SS "length" .IX Subsection "length" .SS "score" .IX Subsection "score" .SS "frame" .IX Subsection "frame" .SS "sub_SeqFeature" .IX Subsection "sub_SeqFeature" .SS "add_sub_SeqFeature" .IX Subsection "add_sub_SeqFeature" .SS "flush_sub_SeqFeature" .IX Subsection "flush_sub_SeqFeature" .SS "primary_tag" .IX Subsection "primary_tag" .SS "source_tag" .IX Subsection "source_tag" .SS "has_tag" .IX Subsection "has_tag" .SS "add_tag_value" .IX Subsection "add_tag_value" .SS "each_tag_value" .IX Subsection "each_tag_value" .SS "all_tags" .IX Subsection "all_tags" .SS "remove_tag" .IX Subsection "remove_tag" .SS "attach_seq" .IX Subsection "attach_seq" .SS "seq" .IX Subsection "seq" .SS "entire_seq" .IX Subsection "entire_seq" .SS "seq_id" .IX Subsection "seq_id" .SS "annotation" .IX Subsection "annotation" .SS "gff_format" .IX Subsection "gff_format" .SS "gff_string" .IX Subsection "gff_string" The above methods are inherited from Bio::SeqFeature::Generic. Please see Bio::SeqFeature::Generic for details on their usage. TFBS-0.7.1/blib/man3/TFBS::SitePair.3pm000066400000000000000000000113361305752266700170500ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. 
\*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::SitePair 3" .TH TFBS::SitePair 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l .nh .SS "pattern" .IX Subsection "pattern" .Vb 5 \& Title : pattern \& Usage : my $pattern = $sitepair\->pattern(); # gets the pattern \& Function: gets the pattern object associated with the site pair \& Returns : pattern object, here TFBS::Matrix::PWM object \& Args : none (get\-only method) .Ve .SS "\s-1GFF\s0" .IX Subsection "GFF" .Vb 6 \& Title : GFF \& Usage : print $sitepair\->GFF(); \& : print $sitepair\->GFF($gff_formatter) \& Function: returns a "standard" multiline GFF string \& Returns : a string (multiline, newline terminated) \& Args : a $gff_formatter function reference (optional) .Ve .SS "site1" .IX Subsection "site1" .SS "site2" .IX Subsection "site2" .Vb 3 \& Title : site1 \& site2 \& Usage : my $site1 = $sitepair\->site1(); \& my $site2 = $sitepair\->site2(); \& Function: Returns individual TFBS::Site objects from the site pair \& Returns : a TFBS::Site \& Args : none .Ve TFBS-0.7.1/blib/man3/TFBS::SitePairSet.3pm000066400000000000000000000174031305752266700175250ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch .
if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . 
\" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::SitePairSet 3" .TH TFBS::SitePairSet 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::SitePairSet \- a set of TFBS::SitePair objects .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& my $site_pair_set = TFBS::SitePairSet\->new(@list_of_site_pair_objects); \& \& # add a TFBS::SitePair object to set: \& \& $site_pair_set\->add_site_pair($site_pair_obj); \& \& # append the contents of another TFBS::SitePairSet: \& \& $site_pair_set\->add_site_pair_set($another_site_pair_set); \& \& # create an iterator: \& \& my $it = $site_pair_set\->Iterator(\-sort_by => \*(Aqstart\*(Aq); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::SitePairSet is an aggregate class that contains a collection of TFBS::SitePair objects. It can be created anew and filled with TFBS::SitePair objects. It is also returned by the \fIsearch_aln()\fR method call of TFBS::PatternI subclasses (e.g. TFBS::Matrix::PWM). .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore.
.SS "size" .IX Subsection "size" .Vb 5 \& Title : size \& Usage : my $size = $sitepairset\->size() \& Function: returns the number of TFBS::SitePair objects contained in the set \& Returns : a scalar (integer) \& Args : none .Ve .SS "add_site_pair" .IX Subsection "add_site_pair" .Vb 6 \& Title : add_site_pair \& Usage : $sitepairset\->add_site_pair($site_pair_object) \& $sitepairset\->add_site_pair(@list_of_site_pair_objects) \& Function: adds TFBS::SitePair objects to an existing TFBS::SitePairSet object \& Returns : $sitepairset object (usually ignored) \& Args : A list of TFBS::SitePair objects to add .Ve .SS "add_site_pair_set" .IX Subsection "add_site_pair_set" .Vb 8 \& Title : add_site_pair_set \& Usage : $sitepairset\->add_site_pair_set($site_pair_set_object) \& $sitepairset\->add_site_pair_set(@list_of_site_pair_set_objects) \& Function: adds the contents of other TFBS::SitePairSet objects \& to an existing TFBS::SitePairSet object \& Returns : $sitepairset object (usually ignored) \& Args : A list of TFBS::SitePairSet objects whose contents should be \& added to $sitepairset .Ve .SS "Iterator" .IX Subsection "Iterator" .Vb 10 \& Title : Iterator \& Usage : my $it = $sitepairset\->Iterator(\-sort_by=>\*(Aqstart\*(Aq); \& while (my $site_pair = $it\->next()) { #...
\& Function: Returns an iterator object, used to iterate through elements \& (TFBS::SitePair objects) \& Returns : a TFBS::_Iterator object \& Args : \-sort_by # optional \- currently it accepts \& # (default sort order in parentheses) \& # \*(Aqname\*(Aq (pattern name, alphabetically) \& # \*(AqID\*(Aq (pattern/matrix ID, alphabetically) \& # \*(Aqstart\*(Aq (site start in sequence, \& # numerically, increasing order) \& # \*(Aqend\*(Aq (site end in sequence, \& # numerically, increasing order) \& # \*(Aqscore\*(Aq (1st site in pair, \& # numerically, decreasing order) \& \-reverse # optional \- reverses the default sorting order if true .Ve .SS "set1" .IX Subsection "set1" .SS "set2" .IX Subsection "set2" .Vb 7 \& Title : set1 \& set2 \& Usage : my $siteset1 = $sitepairset\->set1(); \& : my $siteset2 = $sitepairset\->set2() \& Function: Returns individual TFBS::SiteSet objects, from the site set pair \& Returns : A TFBS::SiteSet object \& Args : none .Ve .SS "\s-1GFF\s0" .IX Subsection "GFF" .Vb 6 \& Title : GFF \& Usage : print $sitepairset\->GFF(); \& : print $sitepairset\->GFF($gff_formatter) \& Function: returns a "standard" multiline GFF string \& Returns : a string (multiline, newline terminated) \& Args : a $gff_formatter function reference (optional) .Ve TFBS-0.7.1/blib/man3/TFBS::SiteSet.3pm000066400000000000000000000166461305752266700167210ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++.
Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::SiteSet 3" .TH TFBS::SiteSet 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::SiteSet \- a set of TFBS::Site objects .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& my $site_set = TFBS::SiteSet\->new(@list_of_site_objects); \& \& # add a TFBS::Site object to set: \& \& $site_set\->add_site($site_obj); \& \& # append the contents of another TFBS::SiteSet: \& \& $site_set\->add_site_set($another_site_set); \& \& # create an iterator: \& \& my $it = $site_set\->Iterator(\-sort_by => \*(Aqstart\*(Aq); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::SiteSet is an aggregate class that contains a collection of TFBS::Site objects. It can be created anew and filled with TFBS::Site objects. It is also returned by the \fIsearch_seq()\fR method call of some TFBS::PatternI subclasses (e.g.
TFBS::Matrix::PWM). .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. .SS "add_site" .IX Subsection "add_site" .Vb 6 \& Title : add_site \& Usage : $siteset\->add_site($site_object) \& $siteset\->add_site(@list_of_site_objects) \& Function: adds TFBS::Site objects to an existing TFBS::SiteSet object \& Returns : $siteset object (usually ignored) \& Args : A list of TFBS::Site objects to add .Ve .SS "add_site_set" .IX Subsection "add_site_set" .Vb 8 \& Title : add_site_set \& Usage : $siteset\->add_site_set($site_set_object) \& $siteset\->add_site_set(@list_of_site_set_objects) \& Function: adds the contents of other TFBS::SiteSet objects \& to an existing TFBS::SiteSet object \& Returns : $siteset object (usually ignored) \& Args : A list of TFBS::SiteSet objects whose contents should be \& added to $siteset .Ve .SS "size" .IX Subsection "size" .Vb 5 \& Title : size \& Usage : my $size = $siteset\->size() \& Function: returns the number of TFBS::Site objects contained in the set \& Returns : a scalar (integer) \& Args : none .Ve .SS "Iterator" .IX Subsection "Iterator" .Vb 10 \& Title : Iterator \& Usage : my $siteset_iterator = \& $siteset\->Iterator(\-sort_by =>\*(Aqstart\*(Aq); \& while (my $site = $siteset_iterator\->next) { \& # do whatever you want with individual site objects \& } \& Function: Returns an iterator object that can be used to go through \& all members of the set (TFBS::Site objects) \& Returns : an iterator object (currently undocumented in TFBS \- \& but understands the \*(Aqnext\*(Aq method) \& Args : \-sort_by # optional \- currently it accepts \& # (default sort order in parentheses) \& # \*(Aqname\*(Aq (pattern name, alphabetically) \& # \*(AqID\*(Aq (pattern/matrix
ID, alphabetically) \& # \*(Aqstart\*(Aq (site start in sequence, \& # numerically,increasing order) \& # \*(Aqend\*(Aq (site end in sequence, \& # numerically, increasing order) \& # \*(Aqscore\*(Aq (numerically, decreasing order) \& \& \-reverse # optional \- reverses the default sorting order if true .Ve .SS "\s-1GFF\s0" .IX Subsection "GFF" .Vb 6 \& Title : GFF \& Usage : print $siteset\->GFF(); \& : print $siteset\->GFF($gff_formatter) \& Function: returns a "standard" multiline GFF string \& Returns : a string (multiline, newline terminated) \& Args : a $gff_formatter function reference (optional) .Ve TFBS-0.7.1/blib/man3/TFBS::Word.3pm000066400000000000000000000112201305752266700162330ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. 
.ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . 
ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::Word 3" .TH TFBS::Word 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::Word \- base class for word\-based patterns .SH "DESCRIPTION" .IX Header "DESCRIPTION" TFBS::Word is a base class consisting of universal constructor called by its subclasses (TFBS::Matrix::*), and word pattern manipulation methods that are independent of the word type. It is not meant to be instantiated itself. .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. .SS "new" .IX Subsection "new" .SS "word" .IX Subsection "word" .SS "validate_word" .IX Subsection "validate_word" Required in all subclasses .SS "length" .IX Subsection "length" .SS "search_seq" .IX Subsection "search_seq" .SS "search_aln" .IX Subsection "search_aln" TFBS-0.7.1/blib/man3/TFBS::Word::Consensus.3pm000066400000000000000000000173671305752266700203420ustar00rootroot00000000000000.\" Automatically generated by Pod::Man 2.23 (Pod::Simple 3.14) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. 
\*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . 
\" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "TFBS::Word::Consensus 3" .TH TFBS::Word::Consensus 3 "2005-01-04" "perl v5.12.4" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" TFBS::Word \- IUPAC DNA consensus word\-based pattern class =head1 DESCRIPTION .PP TFBS::Word is a base class consisting of universal constructor called by its subclasses (TFBS::Matrix::*), and word pattern manipulation methods that are independent of the word type. It is not meant to be instantiated itself. .SH "FEEDBACK" .IX Header "FEEDBACK" Please send bug reports and other comments to the author. .SH "AUTHOR \- Boris Lenhard" .IX Header "AUTHOR - Boris Lenhard" Boris Lenhard .SH "APPENDIX" .IX Header "APPENDIX" The rest of the documentation details each of the object methods. Internal methods are preceded with an underscore. 
.SS "new"
.IX Subsection "new"
.Vb 8
\& Title   : new
\& Usage   : my $word = TFBS::Word::Consensus\->new(%args)
\& Function: constructor for the TFBS::Word::Consensus object
\& Returns : a new TFBS::Word::Consensus object
\& Args    : # you must specify the \-word argument:
\&            \-word,   # a string consisting of letters in the
\&                     # IUPAC degenerate DNA alphabet
\&                     # (any of ACGTSWKMPYBDHVN)
\&
\&            #######
\&
\&            \-name,   # string, OPTIONAL
\&            \-ID,     # string, OPTIONAL
\&            \-class,  # string, OPTIONAL
\&            \-tags    # a hash reference, OPTIONAL
.Ve
.SS "search_seq"
.IX Subsection "search_seq"
.Vb 6
\& Title   : search_seq
\& Usage   : my $siteset = $word\->search_seq(%args)
\& Function: scans a nucleotide sequence with the pattern represented
\&           by the word
\& Returns : a TFBS::SiteSet object
\& Args    : # you must specify one of the following three:
\&
\&            \-file,       # the name of a FASTA file (single sequence)
\&               #or
\&            \-seqobj      # a Bio::Seq object
\&                         # (more accurately, a Bio::PrimarySeq object or a
\&                         # subclass thereof)
\&               #or
\&            \-seqstring   # a string containing the sequence
\&
\&            \-max_mismatches,  # number of allowed positions in the site that do
\&                              # not match the consensus
\&                              # OPTIONAL: default 0
.Ve
.SS "search_aln"
.IX Subsection "search_aln"
.Vb 10
\& Title   : search_aln
\& Usage   : my $site_pair_set = $word\->search_aln(%args)
\& Function: Scans a pairwise alignment of nucleotide sequences
\&           with the pattern represented by the word: it reports only
\&           those hits that are present in equivalent positions of both
\&           sequences and exceed a specified threshold score in both, AND
\&           are found in regions of the alignment above the specified
\&           conservation cutoff value.
\& Returns : a TFBS::SitePairSet object \& Args : # you must specify either one of the following three: \& \& \-file, # the name of the alignment file in Clustal \& format \& #or \& \-alignobj # a Bio::SimpleAlign object \& # (more accurately, a Bio::PrimarySeqobject or a \& # subclass thereof) \& #or \& \-alignstring # a multi\-line string containing the alignment \& # in clustal format \& ############# \& \& \-max_mismatches, # number of allowed positions in the site that do \& # not match the consensus \& # OPTIONAL: default 0 \& \& \-window, # size of the sliding window (inn nucleotides) \& # for calculating local conservation in the \& # alignment \& # OPTIONAL: default 50 \& \& \-cutoff # conservation cutoff (%) for including the \& # region in the results of the pattern search \& # OPTIONAL: default "70%" .Ve .SS "to_PWM" .IX Subsection "to_PWM" .SS "validate_word" .IX Subsection "validate_word" .SS "length" .IX Subsection "length" TFBS-0.7.1/blib/script/000077500000000000000000000000001305752266700145065ustar00rootroot00000000000000TFBS-0.7.1/blib/script/.exists000066400000000000000000000000001305752266700160140ustar00rootroot00000000000000TFBS-0.7.1/examples/000077500000000000000000000000001305752266700141105ustar00rootroot00000000000000TFBS-0.7.1/examples/SAMPLE_FlatFileDir/000077500000000000000000000000001305752266700172765ustar00rootroot00000000000000TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0001.pfm000066400000000000000000000001701305752266700205760ustar00rootroot00000000000000 0 3 79 40 66 48 65 11 65 0 94 75 4 3 1 2 5 2 3 3 1 0 3 4 1 0 5 3 28 88 2 19 11 50 29 47 22 81 1 6 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0008.pfm000066400000000000000000000001401305752266700206020ustar00rootroot00000000000000 3 21 25 0 0 24 1 0 13 1 0 0 5 0 0 0 4 0 0 0 0 1 0 2 5 3 0 25 20 0 24 23 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0015.pfm000066400000000000000000000001701305752266700206030ustar00rootroot0000000000000025 1 74 0 78 1 41 2 53 12 13 2 2 9 2 4 2 9 3 29 40 1 4 
4 0 1 36 1 20 15 2 76 0 67 0 74 1 68 4 24 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0022.pfm000066400000000000000000000002201305752266700205750ustar00rootroot00000000000000 0 0 0 0 1 1 1 0 0 1 1 3 5 0 1 0 1 0 0 0 3 9 9 5 6 12 11 10 3 2 0 0 0 0 1 5 2 1 1 3 8 10 12 13 10 3 2 0 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0029.pfm000066400000000000000000000002501305752266700206070ustar00rootroot0000000000000014 20 0 27 1 27 26 0 27 0 24 23 6 15 2 1 1 0 10 0 0 0 0 3 1 0 7 6 6 2 25 0 0 0 1 27 0 0 0 4 7 3 5 4 1 0 16 0 0 0 0 24 2 0 7 3 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0036.pfm000066400000000000000000000000741305752266700206110ustar00rootroot0000000000000013 0 52 0 25 13 5 0 0 7 18 48 1 0 15 9 0 0 53 6 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0043.pfm000066400000000000000000000002201305752266700206000ustar00rootroot00000000000000 1 6 1 0 13 0 6 0 13 15 2 5 4 0 0 0 1 15 0 9 4 0 3 5 8 12 0 3 2 1 12 0 1 1 1 3 5 0 17 15 2 2 0 9 0 2 12 5 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0050.pfm000066400000000000000000000002201305752266700205760ustar00rootroot00000000000000 6 19 19 20 5 0 1 20 19 20 1 1 4 0 0 0 3 10 1 0 1 0 13 13 10 1 0 0 11 0 18 0 0 0 6 1 0 0 1 0 1 10 0 0 0 0 0 5 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0057.pfm000066400000000000000000000001701305752266700206110ustar00rootroot00000000000000 1 2 15 0 0 0 0 3 10 8 4 0 1 0 0 2 0 1 0 2 7 7 0 11 15 14 14 8 4 4 4 7 0 5 1 0 2 4 2 2 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0064.pfm000066400000000000000000000000741305752266700206120ustar00rootroot0000000000000016 16 16 0 1 0 0 0 0 9 0 0 0 16 1 0 0 0 0 5 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0071.pfm000066400000000000000000000001701305752266700206050ustar00rootroot0000000000000015 9 6 11 21 0 0 0 0 25 1 1 12 2 0 0 0 0 25 0 2 0 4 5 4 25 25 0 0 0 7 15 3 7 0 0 0 25 0 0 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0078.pfm000066400000000000000000000001541305752266700206160ustar00rootroot00000000000000 7 8 3 30 0 0 0 0 0 9 8 18 0 1 0 0 0 17 6 4 1 0 0 0 31 2 10 9 11 9 1 30 31 0 29 4 
TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0085.pfm000066400000000000000000000003001305752266700206050ustar00rootroot00000000000000 1 0 0 0 0 3 0 10 10 5 1 1 4 3 4 2 3 4 0 0 0 0 0 0 0 5 7 2 2 1 3 1 3 0 10 0 10 7 10 0 0 0 1 4 2 4 2 2 3 6 0 10 0 0 0 0 0 0 1 3 2 2 1 5 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0092.pfm000066400000000000000000000001701305752266700206100ustar00rootroot00000000000000 4 10 2 0 0 0 0 9 16 5 8 0 2 28 0 0 3 14 0 4 10 15 1 0 0 29 25 1 3 4 7 4 24 1 29 0 1 5 10 16 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0099.pfm000066400000000000000000000001401305752266700206140ustar00rootroot00000000000000 5 0 0 18 1 1 6 18 2 0 1 1 6 1 13 0 12 0 18 0 9 0 0 0 0 19 0 0 3 17 0 1 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/MA0106.pfm000066400000000000000000000003601305752266700206050ustar00rootroot00000000000000 5 3 4 5 13 0 17 0 0 0 0 0 1 1 4 1 15 2 1 1 8 7 0 0 0 17 0 0 0 11 16 16 0 0 0 14 0 0 1 2 2 6 13 12 4 0 0 0 17 0 0 0 15 14 13 2 0 0 14 1 2 1 0 0 0 0 0 17 0 6 1 1 1 2 0 0 2 15 1 13 TFBS-0.7.1/examples/SAMPLE_FlatFileDir/matrix_list.txt000066400000000000000000000012161305752266700223760ustar00rootroot00000000000000MA0022 12.9045842260907 Dorsal_1 REL MA0015 10.9770931714416 CF2-II ZN-FINGER, C2H2 MA0008 11.8821478649142 Athb-1 HOMEO-ZIP MA0106 26.2394386341062 p53 P53 MA0050 16.0080101497756 Irf-1 TRP-CLUSTER MA0043 11.1469251071435 HLF bZIP MA0036 5.68777359118543 GATA-2 ZN-FINGER, GATA MA0029 17.9085403297222 Evi-1 ZN-FINGER, C2H2 MA0071 13.1897301896459 RORalfa-1 NUCLEAR RECEPTOR MA0064 8.5086853429636 PBF ZN-FINGER, DOF MA0057 9.40029161122265 MZF_5-13 ZN-FINGER, C2H2 MA0092 10.1435823386149 Thing1-E47 bHLH MA0085 16.6733068362852 SU_h IPT/TIG domain MA0078 10.5018372361999 SOX17 HMG MA0099 10.6698488649404 c-FOS bZIP MA0001 10.5882147138157 AGL3 MADS TFBS-0.7.1/examples/list_matrices.pl000066400000000000000000000066341305752266700173200ustar00rootroot00000000000000#!/usr/bin/env perl -w # list_matrices.pl # by Boris Lenhard # # See POD documentation for this 
# script at the end of the file
#

use strict;

use Getopt::Long; # for parsing command line arguments
use Pod::Usage;
use TFBS::DB::FlatFileDir;

# Get command line options - if you are curious how this
# works, check the Getopt::Long module documentation.

my ($database_dir, $id_only, $verbose, $help);
GetOptions('help'       => \$help,
           'database=s' => \$database_dir,
           'id-only'    => \$id_only,
           'verbose'    => \$verbose
          );

if ($help) {
    pod2usage(-exitstatus=>0, -verbose=>2);
}
elsif (!$database_dir) {
    pod2usage(1);
}

# connect to FlatFileDir matrix database
# (there is a sample FlatFileDir matrix database directory
# examples/SAMPLE_FlatFileDir in the TFBS distribution package)
# Change this line if you want to use a different type of database
# (e.g. TFBS::DB::JASPAR2)

my $db = TFBS::DB::FlatFileDir->connect($database_dir);

# get all matrices (TFBS::Matrix::PFM objects) into a TFBS::MatrixSet object

my $matrixset = $db->get_MatrixSet(-matrixtype=>"PFM");

# print heading if normal output

unless ($id_only or $verbose) {
    printf("\n %-10s%-15s%-20s%10s%10s\n",
           'MatrixID', 'Name', 'Class','Length', 'Total IC');
}

# print line if normal or verbose output

unless ($id_only) {
    print ("-"x70,"\n");
}

# Iterate through the set and display ID and name
# (aggregate classes in TFBS - TFBS::MatrixSet, TFBS::SiteSet,
# TFBS::SitePairSet - are equipped with iterators that all follow
# the same syntax)

my $mx_iterator = $matrixset->Iterator(-sort_by=>'ID');

while (my $pfm = $mx_iterator->next()) { # for each matrix in the set
    if ($verbose) {
        print ("\n","-"x65);
        print ("\nMatrix ID : ", $pfm->ID);
        print ("\nTranscription factor name : ", $pfm->name);
        print ("\nStructural class : ", $pfm->class);
        print ("\nTotal information content : ",
               sprintf("%2.2f",$pfm->to_ICM->total_ic));
        print ("\nMatrix:\n", $pfm->prettyprint);
        print ("","-"x65,"\n\n");
    }
    elsif ($id_only) {
        print ($pfm->ID, "\t", $pfm->name, "\n");
    }
    else {
        printf(" %-10s%-15s%-20s%10s%10.2f\n",
               $pfm->ID, $pfm->name, $pfm->class, $pfm->length,
               $pfm->to_ICM->total_ic);
    }
}

# print the line for normal and verbose output

unless ($id_only) {
    print ("-"x70, "\nTotal ", $matrixset->size, " matrices.\n\n");
}

# The rest is usage message if the user requests help
# or fails to provide required parameters

__END__

=head1 NAME

list_matrices.pl - List info on matrix patterns stored in a flat file directory

=head1 SYNOPSIS

./list_matrices.pl -d [other_options]

=head1 OPTIONS

=over 8

=item B<-d or --database>

REQUIRED: Name of the FlatFileDir database directory to use for retrieving
matrices. A sample database directory examples/SAMPLE_FlatFileDir is
available in the TFBS distribution.

=item B<-i or --id-only>

OPTIONAL: Prints only a list of matrix IDs.

=item B<-v or --verbose>

OPTIONAL: Prints the full record (matrix and info). Overrides -i if both
are set.

=back

=head1 DESCRIPTION

This is an example script that displays information about matrix patterns
stored in a flat file directory-type database. Its source code is meant to
be studied by bioinformaticians who wish to learn how to use TFBS modules.

=cut

TFBS-0.7.1/examples/phylofoot.pl000066400000000000000000000116111305752266700164700ustar00rootroot00000000000000
#!/usr/bin/env perl -w
# phylofoot.pl
# by Boris Lenhard
#
# See POD documentation for this script at the end of the file
#

use strict;

use Getopt::Long; # for parsing command line arguments
use Pod::Usage;
use TFBS::DB::FlatFileDir;

# DEFAULTS - presumes scripts are run from the examples/ directory
# in the TFBS distribution

# default values for optional parameters

my $CONSERVATION_PERCENT = 70;
my $MATRIX_THRESHOLD_PERCENT = 80;
my $SLIDING_WINDOW = 50;

# Get command line options - if you are curious how this
# works, check the Getopt::Long module documentation.
my ($database_dir, $alignment_file, $help);
my @matrix_IDs = ();
my $conservation = $CONSERVATION_PERCENT;
my $threshold = $MATRIX_THRESHOLD_PERCENT;
my $window = $SLIDING_WINDOW;

GetOptions('help'              => \$help,
           'alignment-file=s'  => \$alignment_file,
           'database=s'        => \$database_dir,
           'matrix-id:s'       => \@matrix_IDs,
           'conservation:f'    => \$conservation,
           'threshold-score:f' => \$threshold,
           'window-size:i'     => \$window
          );

if ($help) {
    pod2usage(-exitstatus=>0, -verbose=>2);
}
elsif (!($alignment_file and $database_dir)) {
    pod2usage(1);
}

# parse both the comma-separated lists and individually specified
# matrix IDs:

@matrix_IDs = split (",", join(',',@matrix_IDs));

# connect to FlatFileDir matrix database
# (there is a sample FlatFileDir matrix database directory
# examples/SAMPLE_FlatFileDir in the TFBS distribution package)
# Change this line if you want to use a different type of database
# (e.g. TFBS::DB::JASPAR2)

my $db = TFBS::DB::FlatFileDir->connect($database_dir);

# slurp the matrices into a TFBS::MatrixSet

my $matrixset;
unless (scalar @matrix_IDs) {
    # if none are selected, we use the whole set:
    $matrixset = $db->get_MatrixSet(-matrixtype=>"PWM");
    # retrieves all matrices from the db
}
else {
    # otherwise we retrieve only the requested ones
    $matrixset = $db->get_MatrixSet(-IDs => \@matrix_IDs,
                                    -matrixtype=>"PWM");
}

# do the phylogenetic footprinting search of the sites

my $sitepairset = $matrixset->search_aln(-file=>$alignment_file,
                                         -cutoff=>"$conservation",
                                         -threshold=>"$threshold\%",
                                         -window => $window);

# print it out in GFF format

print $sitepairset->GFF;

# And that's it. Instead of simply printing GFF, you can e.g.
# - iterate through individual detected binding site pairs and extract
#   the information you need
# - attach individual sites as Bio::SeqFeature subclass objects
#   to bioperl's Bio::Seq objects
# - collect the hit sequences and produce derived pattern matrices
#   using TFBS::Pa
# - you tell me...
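
# One way to act on the first suggestion above, offered as a sketch rather
# than as part of the original script: walk the site pairs with the set's
# iterator and collect the start coordinate of each hit in the first
# sequence. Iterator(), next() and feature1->start are the calls exercised
# in t/02_Search.t; the helper name site_pair_starts is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS): takes an iterator such as the
# one returned by $sitepairset->Iterator() and returns the start position
# of feature1 for every site pair, in iteration order.
sub site_pair_starts {
    my ($iterator) = @_;
    my @starts;
    while (my $sitepair = $iterator->next) {
        push @starts, $sitepair->feature1->start;
    }
    return @starts;
}
```

# Usage, assuming $sitepairset was produced by search_aln as above:
#   print join("\n", site_pair_starts($sitepairset->Iterator())), "\n";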
# The rest is usage message if the user requests help
# or fails to provide required parameters

__END__

=head1 NAME

phylofoot.pl - Phylogenetic footprinting example script

=head1 SYNOPSIS

./phylofoot.pl -a -d [other_options]

=head1 OPTIONS

=over 8

=item B<-d or --database>

REQUIRED: Name of the FlatFileDir database directory to use for retrieving
matrices. A sample database directory examples/SAMPLE_FlatFileDir is
available in the TFBS distribution.

=item B<-a or --alignment-file>

REQUIRED: Name of the pairwise alignment file in Clustal format. A sample
alignment file examples/sample_alignment.aln is available in the TFBS
distribution.

=item B<-m or --matrix-id>

OPTIONAL: ID of the matrix from the database to scan the alignment with.
You can specify multiple matrices using multiple -m switches or a single
comma-separated list of IDs (NO spaces - e.g. -m M00001,M00021,N01921 ).
You can use a script called examples/list_matrices.pl in the TFBS
distribution to list information for all matrices in a matrix database of
the FlatFileDir type.

DEFAULT: If no matrix IDs are specified, all matrices in the database are
used for the search.

=item B<-w or --window-size>

OPTIONAL: The width of the sliding window for calculating the conservation
profile of the submitted pairwise alignment.

DEFAULT: If not specified, the default value is 50 (nucleotides).

=item B<-c or --conservation>

OPTIONAL: Conservation cutoff (in percent) that a region of the alignment
must reach for conserved sites detected in it to be included in the output.

DEFAULT: If not specified, the default value is 70 (%).

=item B<-t or --threshold-score>

OPTIONAL: Threshold score (in percent) for a matrix match to a subsequence.

DEFAULT: If not specified, the default value is 80 (%).

=back

=head1 DESCRIPTION

This is an example script that scans conserved regions of a pairwise DNA
sequence alignment with a set of matrices from a flat file database and
produces GFF output.
Its source code is meant to be studied by bioinformaticians who wish to learn how to use TFBS modules. =cut TFBS-0.7.1/examples/sample_alignment.aln000066400000000000000000000202371305752266700201270ustar00rootroot00000000000000CLUSTAL alignment of sequences Human_IR and Mouse_IR Human_IR TCAGAGATGTCCACCTGCGCCCTATTCGAGGTCTCCGGCGTCTTCTTTGGCGTCGTCTTT Mouse_IR ------------------------------------------------------------ Human_IR GCCCTTTCAGAAGCGTCTGCACATTTTTCCAGGTGTCATTTCTCCAACTTGAACACAGGG Mouse_IR ------------------------------------------------------------ Human_IR AGCGCACTGGGCACGCGGGCACGTGGCTGTCCCCAGGGGCCTGGCTTGGGTCTCGCCCCT Mouse_IR ------------------------------------------------------------ Human_IR GGGCCGGGGCGCACGCGCGGGCGGGACATCTGGGGGCGCCCACGCGCTCTGGGACGAGTG Mouse_IR ------------------------------------------------------------ Human_IR TCGCTGGCCAGGCCCGGACTGAGGAAAGGCGAGTGAGACACTACTCGCCTGGGGTGCAAA Mouse_IR ------------------------------------------------------------ Human_IR ATTTAAGGGAGTGAAAAAAAAAAAAAAAGAAAGAAACCAAAACCACCTCGAGTCACCAAA Mouse_IR ------------------------------------------------------------ Human_IR ATAAACATTTTAATGCAGTATTTTTTAAAAAATCAACAGGAATCCTCCAAAGCCCACTAT Mouse_IR --------------------------------------------CCACGGGAGTCAAAAA * * ** * Human_IR GAACAAAATAGCAAAATGGTAGAGAAAGGATCTGTGCCGCTGCGTCGGGCCTGT------ Mouse_IR AATAGTAAAAAGGAGTAAAGAAAGACAAAATGTGCTTGGTTGAATCTGGCATCTAGGGCA * ** * * * *** * ** ** * ** ** *** * * Human_IR ------------------------------------------------------------ Mouse_IR TATTGGGGTGTTTATGTCCTGGGGTATCTGGCGGCATTTCCAGACTTTAAAATTGTTATT Human_IR --G--------------------------------------------------------- Mouse_IR TCGGGGTAATTTGGGGTGTCTCGCTCTGGCGGAGATCTAGAGTGTGCTCCAGACCTTTAA * Human_IR ------------------------------------------------------------ Mouse_IR ATTGGAACCTCGGTGTTACTTGTGTTTGCTGCAGTCAGGAGGGGACCTGGGGGCGTCCCT Human_IR ------------------------------------------------------------ Mouse_IR 
AGGCTTTTAAACTGGAACCTCAGGCCTGTTTAGTGGTTTTCGAGTCAGGCGGGGATCTGG Human_IR ------------------------------------------------------------ Mouse_IR GGACCCCTAGGTTTTAAAATCGGAACTTTGGGACTCTTTGGGGGCGTTCGTGTCAGGTGC Human_IR ------GGGCGCCTCCGGGGGTCTGAAACTGGAG-------------------------- Mouse_IR GGATATGGGCAGCTCCGGACGTTTGACAGGTGAGTCTTAGAGTTGTTTGGGTGTTTATGT **** ****** ** *** * *** Human_IR ------------------------------------------------------------ Mouse_IR TACACTGGTATCTGAGAACGCTTACAAGCTTTTAAATCGTAATCTTTCTAAGCCAGCAGT Human_IR --------------------------------------------------------GAGA Mouse_IR TCGGGGTGTTGGGATTGGAAGAATCTGAAGTGAGCCTCGGGTCTCTGAAGTTGCGGGTTT * Human_IR CTCGGGGCTGTAGGGCGCGCGGATCTGGGGCGCGCCCTCGGTCCCGGCGCGCCCAGGGCC Mouse_IR TACGGGGC---------------------------------------------------- ****** Human_IR TCCCGCGCGGGGCCCGGCACAGGGAGGCGGGGAGGCGGGCGGGGCGGGGCG-----GGAC Mouse_IR ----AAGGGGGAAAGGGGCCGAGGAGGCGAGGAGGCGAG-GAGGCGTGGCGTTGCCAGCT * *** ** * ******* ******* * * **** **** * Human_IR CGGGCGGCACCTCCC--TCCCCTGCAAGCTTT----CCCTCCCTCTCC--TGGGCCTCTC Mouse_IR CTGGCCACGCCTCCCCGAGCTCTGCTGGCTTTCTCCCCCTCCCCCCCCAGCGGGCCTCAC * *** * ****** * **** ***** ******* * ** ******* * Human_IR CCGGGCGCAGAGTCCCTTCCTAGGCCAGATCCGCGCCGCCTTTTCCCGCGGCCCGCACGG Mouse_IR TGGAGCGCACAGTCCCTTCCCTGTCTTAATCCGCGCCACCTTTTCCCTTGGCCCGCGCGG * ***** ********** * * ********* ********* ******* *** Human_IR GGCCCAGCTGACGGGCCGCGTTGTTTACGGGCCGGAG-C-AGCCCTCTCTCCCGCCGCCC Mouse_IR G-CCCAGCTGACGGGCTGCGTTGTTTACGGGCTGGAGTCGAGCCCTAGCTCCCGCCGCC- * ************** *************** **** * ****** *********** Human_IR GCCCGCCACCCGCCAGCCCAGGTGCCCGCCCGCCAGTCAGCTAGTCCGTCGGTCCGCGCG Mouse_IR --------ACCACCAGCCCAGGTGCCC--------------------TCCAGTGCGCGCG ** *************** * ** ****** Human_IR TCCCTCTGTCCCGGAGCCCGCAGATCGCGACCCAGAGCGCGCGGGGCCGAGAGCCGAGAG Mouse_IR TCCCTCTGTCTCAGACTCCACACAACGCGACCCA-------------------------- ********** * ** ** ** * ********* Human_IR ACAGTCCCG----------------------------------------GGCGCAGCGCG Mouse_IR 
------CCCAGAGCTGCACGGAGGGCAGAAGCAGGAGACCCGGACAGGAGACGCACCGCG ** * **** **** Human_IR GAG--CTCCGGGCCCCGAGATCCTGGGACGGGGCCCGGGCCGCAGCGGCCGGGGGGTCGG Mouse_IR GTGAGCTCTGGACTCTGGGATTGCGGGCAC----------------------------GG * * *** ** * * * *** *** ** Human_IR GGCC-------ACCACCGCAAGGGCCTCCGCTCAGTATTTGTAGCTGGCGAAGCCGCGCG Mouse_IR GACCGGGCCTGGGTGACCTGCGGGCCGCGCCACGGTGTTGCTTGCGGCCGAGGCCTCT-G * ** * ***** * * * ** ** * ** * *** *** * * Human_IR CGCCCTTCCCGGGGCTG-CCTCTGGGCCCTCCCCGGCAGGGGGGCTGCGGCCCGCGGGTC Mouse_IR TGCTCTTCCCGGGACTGTCCCCAGGGCCCT-CTAGGCTGGAGAGCTGCGGCCTGTGAGCC ** ********* *** ** * ******* * *** ** * ********* * * * * Human_IR GCGGGCGTGGAAGAGAAGGACGCGCGGCCCCCAGCGCCTCTTGGGTGGCCG-C-CTCGGA Mouse_IR ACGGGCGTGGAAGAGAAGGACGTGCGGCCCCGAGCGCCTCTCCAGAGACCTTCTCACGGA ********************* ******** ********* * * ** * * **** Human_IR GCATGACCCCCGCGGGCCAGCGCCGCGCGCTCTGATCCGAGGAGACCCCGCGCTCCCGCA Mouse_IR GTATGTCCCCAGTAGGCCGGCGTGGCGTGCTCTGATCGCCGGGGTCCCAGCACTCCTACT * *** **** * **** *** *** ********* ** * *** ** **** * Human_IR GCCATGGGCACCGGGGGCCGGCGGGGAG-CGGCGGCCGCGCCGCTGCTGGTGGCGGTGGC Mouse_IR GCTATGGGCTTCGGGAGA-GGATGTGAGACGACGGCTGTGCCATTGCTGGTGGCCGTGGC ** ****** **** * ** * *** ** **** * *** ********** ***** Human_IR CGCGCTGCTACTGGGCGCCGCGGGCCACCTGTACCCCGGAGAGGGTGAGTCTGGGGGCGC Mouse_IR CGCGTTGCTGGTGGGCACAGCCGGCCACCTGTACCCTGGAGAGGGTAAGTCTGG------ **** **** ***** * ** ************** ********* ******* Human_IR GGGCGTGGGCGGGGAGCGCCGCGATGGGGAGAGGACCCCACCCAAGCCAAAATCGATCCC Mouse_IR ------------------------------------------------------------ Human_IR CCGCTTGTGGACTGAGAACCCTCCCCAGGGGCGGGGGGCGGTGGCCAGGACGGTAGCTCC Mouse_IR ------------------------------------------------------------ Human_IR TGCATCGCGTAGGGGGAGCGGGAAGC Mouse_IR -------------------------- TFBS-0.7.1/examples/script1.pl000066400000000000000000000005021305752266700160270ustar00rootroot00000000000000#!/usr/bin/env perl -w use Bio::DB::GenBank; use TFBS::DB::TRANSFAC; my $seq = 
Bio::DB::GenBank->new()->get_Seq_by_acc('AF100993'); my $db = TFBS::DB::TRANSFAC->connect(); my $pwm = $db->get_Matrix_by_ID('V$CEBPA_01','PWM'); my $siteset = $pwm->search_seq(-seqobj=>$seq, -threshold=>"80%"); print $siteset->GFF(); TFBS-0.7.1/examples/script2.pl000066400000000000000000000004301305752266700160300ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::DB::FlatFileDir; use TFBS::PatternGen::Gibbs; my $gibbs = TFBS::PatternGen::Gibbs->new (-seq_file=>'sequences.fa', -motif_length=>10); my $db = TFBS::DB::FlatFileDir->create('NewPatterns'); $db->store_Matrix($gibbs->all_patterns()); TFBS-0.7.1/examples/viewpfm.cgi000066400000000000000000000065571305752266700162660ustar00rootroot00000000000000#!/usr/bin/perl -w # viewpfm.cgi # by Boris Lenhard # # See POD documentation for this CGI script at the end of the file # use strict; use CGI (); use TFBS::Matrix::PFM; use TFBS::DB::FlatFileDir; use constant DATABASE_DIR => "examples/SAMPLE_FlatFileDir"; # The directory to store created logo image: we need an absolute # path for access by the script, and a relative path for # access by the web browser use constant ABSOLUTE_IMAGE_DIR => "/var/www/html/TEMP"; use constant RELATIVE_IMAGE_DIR => "/TEMP"; use constant THIS_SCRIPT => "/cgi-bin/viewpfm.cgi"; # IMPORTANT NOTE: this script does not delete image files it creates # page # connect to FlatFileDir matrix database # (there is a sample FlatFileDir matrix database directory # examples/SAMPLE_FlatFileDir in the TFBS distribution package) # Change this line if you want to use a different type of database # (e.g. 
# TFBS::DB::JASPAR2)

my $db = TFBS::DB::FlatFileDir->connect(DATABASE_DIR);

if (CGI::param("matrix_id")) {
    # show the matrix entry and its sequence logo
    matrix_info($db, CGI::param("matrix_id"));
}
else {
    # list all matrices in the database
    matrix_list_page($db);
}

sub matrix_list_page {
    my ($db) = @_;
    my $q = CGI->new;

    # get all matrices (TFBS::Matrix::PFM objects) into a TFBS::MatrixSet object
    my $matrixset = $db->get_MatrixSet(-matrixtype=>"PFM");

    print $q->header, $q->start_html;
    print $q->h1("Matrices in the database");

    my $matrix_iterator = $matrixset->Iterator(-sort_by=>"ID");
    my @table_rows = ($q->Tr($q->th([ 'MatrixID', 'Name', 'Class','Length', 'Total IC'])));
    while (my $pfm = $matrix_iterator->next) {
        push @table_rows,
             $q->Tr($q->td([$q->a({-href=>THIS_SCRIPT."?matrix_id=".$pfm->ID},
                                  $pfm->ID),
                            $pfm->name,
                            $pfm->class,
                            $pfm->length,
                            $pfm->to_ICM->total_ic]));
    }
    print $q->table({-border=>1}, @table_rows);
    print $q->end_html;
}

sub matrix_info {
    my ($db, $matrix_id) = @_;
    my $q = CGI->new;
    my $pfm = $db->get_Matrix_by_ID($matrix_id, "PFM");
    if (defined $pfm) {

        # first we draw a sequence logo and store it in a .png file
        my $logofile = $pfm->ID.".png";

        # we want image size to vary with motif length:
        my $xsize = 60+20*$pfm->length();

        # ...but it should not be too narrow for short motifs:
        $xsize=278 if($pfm->length()<10);

        $pfm->draw_logo(-file =>ABSOLUTE_IMAGE_DIR."/$logofile",
                        -full_scale =>2.25,
                        -xsize =>$xsize,
                        -ysize =>190,
                        -graph_title=> $pfm->name,
                        -x_title=>"Nucleotide position",
                        -y_title=>"ic [bits]");

        # then we output the page
        print $q->header, $q->start_html;
        print $q->div("Matrix ID : ".$pfm->ID);
        print $q->div("Transcription factor name : ", $pfm->name);
        print $q->div("Structural class : ", $pfm->class);
        print $q->div("Total information content : ",
                      sprintf("%2.2f",$pfm->to_ICM->total_ic));
        print $q->div("Matrix:");
        print $q->div($q->pre($pfm->prettyprint));
        print $q->div("Sequence logo:");
        print $q->img({-src=>RELATIVE_IMAGE_DIR."/$logofile"});
        print $q->div($q->a({-href=>THIS_SCRIPT}, "Back to matrix
list")); print $q->end_html; } else { # matrix not found print $q->header, $q->start_html; print $q->h2("Matrix $matrix_id not found in the database"); print $q->a({-href=>THIS_SCRIPT}, "Back to matrix list"); print $q->end_html; } } TFBS-0.7.1/pm_to_blib000066400000000000000000000000001305752266700143110ustar00rootroot00000000000000TFBS-0.7.1/t/000077500000000000000000000000001305752266700125355ustar00rootroot00000000000000TFBS-0.7.1/t/01_Matrix.t000066400000000000000000000011321305752266700144630ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::Matrix::PFM; use Test; plan(tests => 4); # print STDERR join("\n", @INC); my $matrixstring = "2 2 2 2 2 2 2 2\n0 0 0 0 0 0 0 0\n0 0 0 0 0 0 0 0\n0 0 0 0 0 0 0 0"; my $pfm = TFBS::Matrix::PFM->new(-matrix=>$matrixstring, -name=>"MyMatrix"); my $pfmstring = $pfm->rawprint; my $icmstring = $pfm->to_ICM(-add_pseudocounts=>0)->rawprint; ok($pfmstring, $icmstring); ok(1, defined $pfm->to_ICM); my $pwmstring = $pfm->to_PWM->rawprint; my @pwmlines = split "\n", $pwmstring; ok ($pwmlines[1], $pwmlines[3]); ok (1, ($pwmlines[0] ne $pwmlines[1])); TFBS-0.7.1/t/02_Search.t000066400000000000000000000013471305752266700144350ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::Matrix::PFM; use strict; use Test; plan(tests => 2); my $matrixstring = "0 0 0 0 0 0 0 0\n". "0 12 12 0 12 0 12 12\n". "0 0 0 12 0 12 0 0\n". 
"12 0 0 0 0 0 0 0"; my $pfm = TFBS::Matrix::PFM->new(-matrix=>$matrixstring, -name=>"MyMatrix"); my $siteset = $pfm->to_PWM->search_seq(-file=>'t/test.fa',-threshold=>"70%"); ok($siteset->size(), 20); print $siteset->GFF(); my $sitepairset = $pfm->to_PWM->search_aln(-file=>'t/test.aln', -window=>50, -cutoff=>50, -threshold=>"70%"); my $It = $sitepairset->Iterator(); my $startsum = 0; while (my $sitepair = $It->next) { $startsum += $sitepair->feature1->start; } ok($startsum, 3013); TFBS-0.7.1/t/03_DB_FlatFileDir.t000066400000000000000000000014221305752266700157150ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::Matrix::PFM; use TFBS::DB::FlatFileDir; use Test; plan(tests => 2); my @dbparams; # set up a matrix my $matrixstring = "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12"; my $pfm = TFBS::Matrix::PFM->new(-matrix=>$matrixstring, -ID=>"TEST001"); my $rawstring1 = $pfm->rawprint(); my $db; # write/read test $db = TFBS::DB::FlatFileDir->create ("t/FlatFileDir"); $db->store_Matrix($pfm); my $pfm2= $db->get_Matrix_by_ID("TEST001", "PFM"); my $rawstring2 = $pfm2->rawprint; ok ($rawstring1, $rawstring2); # delete test $db->delete_Matrix_having_ID('TEST001'); my $nopfm = $db->get_Matrix_by_ID("TEST001", "PFM"); ok(undef, $nopfm); END { -d "t/FlatFileDir" && unlink ; rmdir "t/FlatFileDir"; } TFBS-0.7.1/t/04_DB_TRANSFAC.t000066400000000000000000000010511305752266700147700ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::DB::TRANSFAC; use Test; plan(tests => 4); my $db = TFBS::DB::TRANSFAC->new(-accept_conditions=>1); # get a pfm by acc my $pfm1 = $db->get_Matrix_by_acc('M00039'); my $icm1 = $db->get_Matrix_by_acc('M00039',"icm"); my $pfm2 = $db->get_Matrix_by_ID('V$CREB_01', "PFM"); my $icm2 = $db->get_Matrix_by_ID('V$CREB_01', "ICM"); print STDERR $icm1->name()."######".$icm1->tag('acc'); ok($pfm1->ID,'V$CREB_01'); ok($pfm1->rawprint, $pfm2->rawprint); ok($icm1->rawprint, $icm2->rawprint); ok($icm1->tag('acc'), "M00039"); 
TFBS-0.7.1/t/05_DB_JASPAR.t000066400000000000000000000020401305752266700145470ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::Matrix::PFM; use TFBS::DB::JASPAR2; use Test; plan(tests => 2); my @dbparams; if (-e 't/MYSQLCONNECT') { open FILE, 't/MYSQLCONNECT'; my $line = <FILE>; @dbparams = split "::", $line; close FILE; $skip = 0; } else { print "ok # Skip (MySQL server not set up)\n"x2; exit(0); } # set up a matrix my $matrixstring = "12 3 0 0 4 0\n0 0 0 11 7 0\n0 9 12 0 0 0\n0 0 0 1 1 12"; my $pfm = TFBS::Matrix::PFM->new(-matrix=>$matrixstring, -ID=>"TEST001"); my $rawstring1 = $pfm->rawprint(); my $db; # write/read test $db = TFBS::DB::JASPAR2->create ("dbi:mysql:JASPAR2TEST:$dbparams[0]",$dbparams[1], $dbparams[2]); $db->store_Matrix($pfm); my $pfm2= $db->get_Matrix_by_ID("TEST001", "PFM"); my $rawstring2 = $pfm2->rawprint; ok ($rawstring1, $rawstring2); # delete test $db->delete_Matrix_having_ID('TEST001'); my $nopfm = $db->get_Matrix_by_ID("TEST001", "PFM"); ok(undef, $nopfm); END { $db && $db->dbh && $db->dbh->do("drop database if exists JASPAR2TEST"); } TFBS-0.7.1/t/06_SimplePFM.t000066400000000000000000000011141305752266700150200ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::PatternGen::SimplePFM; use Test; plan(tests => 1); my $matrixstring = <new(-matrixstring=>$matrixstring); my @sequences = qw( AAGCCT AGGCAT AAGCCT AAGCCT AGGCAT AGGCCT AGGCAT AGGTTT AGGCAT AGGCCT AGGCCT ); my $patterngen = TFBS::PatternGen::SimplePFM->new(-seq_list=>\@sequences); my $pfm = $patterngen->pattern(); # $pfm is now a TFBS::Matrix::PFM object ok($manpfm->rawprint, $pfm->rawprint); TFBS-0.7.1/t/07_AnnSpec.t000066400000000000000000000024451305752266700145640ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::PatternGen::AnnSpec; use TFBS::PatternGen::SimplePFM; use Test; plan(tests => 7); my $annspecpath; eval {$annspecpath = `which ann-spec 2> /dev/null`;}; if (!$annspecpath or ($annspecpath =~ / /)) { # if space, then error message :) print
"ok # Skipped: (no AnnSpec executable found)\n"x7; exit(0); } my $fastafile = "t/test_meme.fa"; #my $fastafile=$ARGV[0]; for (1..5) { my $patterngen=TFBS::PatternGen::AnnSpec->new(-seq_file=>$fastafile, #-binary=>'ann-spec', -additional_params=>'-P 5 -c' ); my @pfms = $patterngen->all_patterns(); my @motifs=$patterngen->all_motifs; if (@motifs>0){ ok(1); my @sites=$motifs[0]->get_sites; ok(1,($sites[0]->seq->seq ne '')); my @seqs; foreach my $site(@sites){ push @seqs,$site->seq->seq; } print $pfms[0]->rawprint; my $seq=$sites[0]->seq->seq; my $patt=TFBS::PatternGen::SimplePFM->new(-seq_list=>\@seqs); my $col_sum=$pfms[0]->matrix->[0]->[0]; my $check_sum=$patt->pattern->matrix->[0]->[0]; ok(1,$col_sum==$check_sum); } if (@pfms>0) { ok(1); ok(1,($pfms[0]->tag("score")>0)); last; } } ok(1); TFBS-0.7.1/t/07_Elph.t000066400000000000000000000022241305752266700141200ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::PatternGen::Elph; use TFBS::PatternGen::SimplePFM; use Test; plan(tests => 5); my $gibbspath; eval {$gibbspath = `which elph 2> /dev/null`;}; if (!$gibbspath or ($gibbspath =~ / /)) { # if space, then error message :) print "ok # Skipped: (no elph executable found)\n"x5; exit(0); } my $fastafile = "t/test_meme.fa"; #my $fastafile = $ARGV[0]; for (1..5) { my $elph = TFBS::PatternGen::Elph->new (-motif_length=>7, -seq_file=>$fastafile, -additional_params=>'-l -x -b -g -v' ); # print Dumper $elph; my $pfm_elph = $elph->pattern(); my @pfms=$elph->all_patterns; my @motifs=$elph->all_motifs; if (@motifs>0){ ok(1); my @sites=$motifs[0]->get_sites; ok(1,($sites[0]->seq->seq ne '')); my @matrix; foreach my $site(@sites){ push @matrix,$site->seq->seq; } my $patt=TFBS::PatternGen::SimplePFM->new(-seq_list=>\@matrix); my $col_sum=$elph->pattern->matrix->[0]->[0]; my $check_sum=$patt->pattern->matrix->[0]->[0]; ok(1,$col_sum==$check_sum); } # } if (@pfms>0) { ok(1); # ok(1,($pfms[0]->tag("MAP_score")>0)); last; } } ok(1); 
TFBS-0.7.1/t/07_Gibbs.t000066400000000000000000000014061305752266700142570ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::PatternGen::Gibbs; use Test; plan(tests => 5); my $gibbspath; eval {$gibbspath = `which Gibbs 2> /dev/null`;}; if (!$gibbspath or ($gibbspath =~ / /)) { # if space, then error message :) print "ok # Skipped: (no Gibbs executable found)\n"x5; exit(0); } my $fastafile = "t/test.gibbin"; for (1..5) { my $gibbs = TFBS::PatternGen::Gibbs->new (-nr_hits=>10, -motif_length=>[10..12], -seq_file=>$fastafile); my @pfms = $gibbs->all_patterns(); my @motifs=$gibbs->all_motifs; if (@motifs>0){ ok(1); my @sites=$motifs[0]->get_sites; # my $seq=$sites[0]->seq->seq; ok(1,($sites[0]->seq->seq ne '')); } if (@pfms>0) { ok(1); ok(1,($pfms[0]->tag("MAP_score")>0)); last; } } ok(1); TFBS-0.7.1/t/07_MEME.t000066400000000000000000000015431305752266700137560ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::PatternGen::MEME; use Test; plan(tests => 5); my $memepath; eval {$memepath = `which meme 2> /dev/null`;}; if (!$memepath or ($memepath =~ / /)) { # if space, then error message :) print "ok # Skipped: (no meme executable found)\n"x5; exit(0); } my $fastafile = "t/test_meme.fa"; for (1..1) { my $patterngen=TFBS::PatternGen::MEME->new(-seq_file=>$fastafile, -additional_params=>' -revcomp -nmotifs 2 -w 10', ); my @motifs=$patterngen->all_motifs; if (@motifs>0){ ok(1); my @sites=$motifs[0]->get_sites; # my $seq=$sites[0]->seq->seq; ok(1,($sites[0]->seq->seq ne '')); } my @pfms = $patterngen->all_patterns(); if (@pfms>0) { ok(1); ok(1,($pfms[0]->tag("score")>0)); last; } } ok(1); TFBS-0.7.1/t/08_DB_LocalTRANSFAC.t000066400000000000000000000015141305752266700157530ustar00rootroot00000000000000#!/usr/bin/env perl -w use strict; use Bio::SeqIO; use TFBS::DB::LocalTRANSFAC; use Test; plan(tests => 6); my $seq = Bio::SeqIO->new(-file=>'t/test.fa')->next_seq; my $db = TFBS::DB::LocalTRANSFAC->connect(-accept_conditions=>1, 
-localdir=>'t/transfac_old'); ok(("TFBS::DB::LocalTRANSFAC" eq ref($db)),1); my $pwm = $db->get_Matrix_by_ID('V$CEBPA_01','PWM'); ok($pwm->length,14); my $siteset = $pwm->search_seq(-seqobj=>$seq, -threshold=>"80%"); #print $siteset->GFF(), ok($siteset->size,31); $db = TFBS::DB::LocalTRANSFAC->connect(-accept_conditions=>1, -localdir=>'t/transfac_new'); ok(("TFBS::DB::LocalTRANSFAC" eq ref($db)),1); $pwm = $db->get_Matrix_by_ID('V$MYOD_01','PWM'); ok($pwm->length,12); $siteset = $pwm->search_seq(-seqobj=>$seq, -threshold=>"80%"); print $siteset->GFF(), ok($siteset->size,22); TFBS-0.7.1/t/09_Word_Consensus.t000066400000000000000000000015261305752266700162110ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::Word::Consensus; use Test; plan(tests => 2); # print STDERR join("\n", @INC); my $word = "AGGTCMNNNNKGACCT"; my $word_obj = TFBS::Word::Consensus->new(-word=>$word, -name=>"MyConsensus"); print $word_obj->to_PWM->prettyprint; my $siteset = $word_obj->search_seq(-file=>'t/test.fa', -threshold=>"70%", -max_mismatches => 4); ok($siteset->size(), 6); #print $siteset->GFF."\n\n"; my $sitepairset = $word_obj->search_aln(-file=>'t/test.aln', -window=>50, -cutoff=>50, -max_mismatches=>6); #print $sitepairset->GFF; my $It = $sitepairset->Iterator(); my $startsum = 0; while (my $sitepair = $It->next) { $startsum += $sitepair->feature1->start; } ok($sitepairset->size, 24); TFBS-0.7.1/t/10_Tools_SetOperations.t000066400000000000000000000013221305752266700171770ustar00rootroot00000000000000#!/usr/bin/env perl -w use TFBS::Matrix::PFM; use TFBS::Tools::SetOperations; use strict; use Test; plan(tests=>2); my $matrixstring = "0 0 0 0 0 0 0 0\n". "0 12 12 0 12 0 12 12\n". "0 0 0 12 0 12 0 0\n". 
"12 0 0 0 0 0 0 0"; my $pfm = TFBS::Matrix::PFM->new(-matrix=>$matrixstring, -name=>"MyMatrix"); my $siteset = $pfm->to_PWM->search_seq(-file=>'t/test.fa',-threshold=>"70%"); ok($siteset->size(), 20); my $siteset2 = $pfm->to_PWM->search_seq(-file => 't/test.fa', -threshold=>"60%"); ok ($siteset2->size>20, 1); my $sop = TFBS::Tools::SetOperations->new; my $i = $sop->intersection($siteset, $siteset2); my $u = $sop->union($siteset, $siteset2); exit(0); TFBS-0.7.1/t/test.aln000066400000000000000000000202371305752266700142140ustar00rootroot00000000000000CLUSTAL alignment of sequences Human_IR and Mouse_IR Human_IR TCAGAGATGTCCACCTGCGCCCTATTCGAGGTCTCCGGCGTCTTCTTTGGCGTCGTCTTT Mouse_IR ------------------------------------------------------------ Human_IR GCCCTTTCAGAAGCGTCTGCACATTTTTCCAGGTGTCATTTCTCCAACTTGAACACAGGG Mouse_IR ------------------------------------------------------------ Human_IR AGCGCACTGGGCACGCGGGCACGTGGCTGTCCCCAGGGGCCTGGCTTGGGTCTCGCCCCT Mouse_IR ------------------------------------------------------------ Human_IR GGGCCGGGGCGCACGCGCGGGCGGGACATCTGGGGGCGCCCACGCGCTCTGGGACGAGTG Mouse_IR ------------------------------------------------------------ Human_IR TCGCTGGCCAGGCCCGGACTGAGGAAAGGCGAGTGAGACACTACTCGCCTGGGGTGCAAA Mouse_IR ------------------------------------------------------------ Human_IR ATTTAAGGGAGTGAAAAAAAAAAAAAAAGAAAGAAACCAAAACCACCTCGAGTCACCAAA Mouse_IR ------------------------------------------------------------ Human_IR ATAAACATTTTAATGCAGTATTTTTTAAAAAATCAACAGGAATCCTCCAAAGCCCACTAT Mouse_IR --------------------------------------------CCACGGGAGTCAAAAA * * ** * Human_IR GAACAAAATAGCAAAATGGTAGAGAAAGGATCTGTGCCGCTGCGTCGGGCCTGT------ Mouse_IR AATAGTAAAAAGGAGTAAAGAAAGACAAAATGTGCTTGGTTGAATCTGGCATCTAGGGCA * ** * * * *** * ** ** * ** ** *** * * Human_IR ------------------------------------------------------------ Mouse_IR TATTGGGGTGTTTATGTCCTGGGGTATCTGGCGGCATTTCCAGACTTTAAAATTGTTATT Human_IR --G--------------------------------------------------------- Mouse_IR 
TCGGGGTAATTTGGGGTGTCTCGCTCTGGCGGAGATCTAGAGTGTGCTCCAGACCTTTAA * Human_IR ------------------------------------------------------------ Mouse_IR ATTGGAACCTCGGTGTTACTTGTGTTTGCTGCAGTCAGGAGGGGACCTGGGGGCGTCCCT Human_IR ------------------------------------------------------------ Mouse_IR AGGCTTTTAAACTGGAACCTCAGGCCTGTTTAGTGGTTTTCGAGTCAGGCGGGGATCTGG Human_IR ------------------------------------------------------------ Mouse_IR GGACCCCTAGGTTTTAAAATCGGAACTTTGGGACTCTTTGGGGGCGTTCGTGTCAGGTGC Human_IR ------GGGCGCCTCCGGGGGTCTGAAACTGGAG-------------------------- Mouse_IR GGATATGGGCAGCTCCGGACGTTTGACAGGTGAGTCTTAGAGTTGTTTGGGTGTTTATGT **** ****** ** *** * *** Human_IR ------------------------------------------------------------ Mouse_IR TACACTGGTATCTGAGAACGCTTACAAGCTTTTAAATCGTAATCTTTCTAAGCCAGCAGT Human_IR --------------------------------------------------------GAGA Mouse_IR TCGGGGTGTTGGGATTGGAAGAATCTGAAGTGAGCCTCGGGTCTCTGAAGTTGCGGGTTT * Human_IR CTCGGGGCTGTAGGGCGCGCGGATCTGGGGCGCGCCCTCGGTCCCGGCGCGCCCAGGGCC Mouse_IR TACGGGGC---------------------------------------------------- ****** Human_IR TCCCGCGCGGGGCCCGGCACAGGGAGGCGGGGAGGCGGGCGGGGCGGGGCG-----GGAC Mouse_IR ----AAGGGGGAAAGGGGCCGAGGAGGCGAGGAGGCGAG-GAGGCGTGGCGTTGCCAGCT * *** ** * ******* ******* * * **** **** * Human_IR CGGGCGGCACCTCCC--TCCCCTGCAAGCTTT----CCCTCCCTCTCC--TGGGCCTCTC Mouse_IR CTGGCCACGCCTCCCCGAGCTCTGCTGGCTTTCTCCCCCTCCCCCCCCAGCGGGCCTCAC * *** * ****** * **** ***** ******* * ** ******* * Human_IR CCGGGCGCAGAGTCCCTTCCTAGGCCAGATCCGCGCCGCCTTTTCCCGCGGCCCGCACGG Mouse_IR TGGAGCGCACAGTCCCTTCCCTGTCTTAATCCGCGCCACCTTTTCCCTTGGCCCGCGCGG * ***** ********** * * ********* ********* ******* *** Human_IR GGCCCAGCTGACGGGCCGCGTTGTTTACGGGCCGGAG-C-AGCCCTCTCTCCCGCCGCCC Mouse_IR G-CCCAGCTGACGGGCTGCGTTGTTTACGGGCTGGAGTCGAGCCCTAGCTCCCGCCGCC- * ************** *************** **** * ****** *********** Human_IR GCCCGCCACCCGCCAGCCCAGGTGCCCGCCCGCCAGTCAGCTAGTCCGTCGGTCCGCGCG Mouse_IR --------ACCACCAGCCCAGGTGCCC--------------------TCCAGTGCGCGCG ** 
*************** * ** ****** Human_IR TCCCTCTGTCCCGGAGCCCGCAGATCGCGACCCAGAGCGCGCGGGGCCGAGAGCCGAGAG Mouse_IR TCCCTCTGTCTCAGACTCCACACAACGCGACCCA-------------------------- ********** * ** ** ** * ********* Human_IR ACAGTCCCG----------------------------------------GGCGCAGCGCG Mouse_IR ------CCCAGAGCTGCACGGAGGGCAGAAGCAGGAGACCCGGACAGGAGACGCACCGCG ** * **** **** Human_IR GAG--CTCCGGGCCCCGAGATCCTGGGACGGGGCCCGGGCCGCAGCGGCCGGGGGGTCGG Mouse_IR GTGAGCTCTGGACTCTGGGATTGCGGGCAC----------------------------GG * * *** ** * * * *** *** ** Human_IR GGCC-------ACCACCGCAAGGGCCTCCGCTCAGTATTTGTAGCTGGCGAAGCCGCGCG Mouse_IR GACCGGGCCTGGGTGACCTGCGGGCCGCGCCACGGTGTTGCTTGCGGCCGAGGCCTCT-G * ** * ***** * * * ** ** * ** * *** *** * * Human_IR CGCCCTTCCCGGGGCTG-CCTCTGGGCCCTCCCCGGCAGGGGGGCTGCGGCCCGCGGGTC Mouse_IR TGCTCTTCCCGGGACTGTCCCCAGGGCCCT-CTAGGCTGGAGAGCTGCGGCCTGTGAGCC ** ********* *** ** * ******* * *** ** * ********* * * * * Human_IR GCGGGCGTGGAAGAGAAGGACGCGCGGCCCCCAGCGCCTCTTGGGTGGCCG-C-CTCGGA Mouse_IR ACGGGCGTGGAAGAGAAGGACGTGCGGCCCCGAGCGCCTCTCCAGAGACCTTCTCACGGA ********************* ******** ********* * * ** * * **** Human_IR GCATGACCCCCGCGGGCCAGCGCCGCGCGCTCTGATCCGAGGAGACCCCGCGCTCCCGCA Mouse_IR GTATGTCCCCAGTAGGCCGGCGTGGCGTGCTCTGATCGCCGGGGTCCCAGCACTCCTACT * *** **** * **** *** *** ********* ** * *** ** **** * Human_IR GCCATGGGCACCGGGGGCCGGCGGGGAG-CGGCGGCCGCGCCGCTGCTGGTGGCGGTGGC Mouse_IR GCTATGGGCTTCGGGAGA-GGATGTGAGACGACGGCTGTGCCATTGCTGGTGGCCGTGGC ** ****** **** * ** * *** ** **** * *** ********** ***** Human_IR CGCGCTGCTACTGGGCGCCGCGGGCCACCTGTACCCCGGAGAGGGTGAGTCTGGGGGCGC Mouse_IR CGCGTTGCTGGTGGGCACAGCCGGCCACCTGTACCCTGGAGAGGGTAAGTCTGG------ **** **** ***** * ** ************** ********* ******* Human_IR GGGCGTGGGCGGGGAGCGCCGCGATGGGGAGAGGACCCCACCCAAGCCAAAATCGATCCC Mouse_IR ------------------------------------------------------------ Human_IR CCGCTTGTGGACTGAGAACCCTCCCCAGGGGCGGGGGGCGGTGGCCAGGACGGTAGCTCC Mouse_IR ------------------------------------------------------------ Human_IR 
TGCATCGCGTAGGGGGAGCGGGAAGC Mouse_IR -------------------------- TFBS-0.7.1/t/test.fa000066400000000000000000000121441305752266700140260ustar00rootroot00000000000000>AP000365[21811:17811:16811]|NM_001444[3=1001]|NM_001444 Homo sapiens fatty acid binding protein 5, psoriasis-associated (FABP5) mRNA. GTCGCCCAGGCTGGAGTACAATGGCGCAATCTCGGCTCACCCTCGGCTCACCACAGCCTC TGCCTCCCGGGTTCAAGCAATTCTCTTGCCTCAGCCTCCTGAGTAGCTGGGACTGAGTAG CCATGTGCCACCATGCCCGGCTAATTTTGTGTTTTTAGTAGAGACAGGGTTTCTCCATGT TAGTCAGGCTGGTCTCAAACTCCTGACCTCAGGGGATCCACCCGCCTCGGCCTCCCAAAA GTGCTGGGATTACAGGCGTGTGCCACTGTGCCTGGTCTGTGAGCCACTGTGCCCGGCCTG AGAAATGTTTCTTTTTTTCTTTCTTTTTTTTTTTTTAAGCAGAAACACATTCATTTATTA ACCAAAGGGATGATCCTAATGAATCCAACACACTTTGAAATAGCTGCATGTAAAATGTTT GTGATAAAGATAATTGAACACAGTAATGAAAAAAAAAAAAGAAAGAAAGAAACGGTATGG AGATTTGCTCATTGAACTGAGCTTGGTCATTCTCTTAGTTAACTCCTGTCCAAAGTGATG ATGGAATCTTTATTGTACTTTTTCATAGATCCGAGTACAGGCGACATGGTTCATGACACA GTCCACCACTAATTTCCCATCTTTCAATGTTCTTGTTATTGTGCTTTCCTTCCCATCCCA CTCCTGATGCTGAACCAATGCACCATCTGTAAAGTTGCACACAGTCTGAGTTTTTCTGCC ATCAGCTGTGGTTTCTTCAAACTTCTCTCCCAGGGTACAAGAAAACTGTGTTGTTTTCAA AGTGCTCTCAGTTTTTATGGTGAGGTTTTTGCCATCACAAGTGATGATACAATCTGGCTT GGCCATTGCGCCCATTTTTTGCAAAGCTATTTCCTCCTAGCTCCTTCATGTATTCATCAA AGCCTTCGCTGTCCACCAGGCGCCATCTTCCTTCCAGCTGCTGAACTGTGGCCATGGTGG GTGCAGGGGGGCTGGTGTGCAGAGCAGGGTCTGCGTCGGCGTGGCAGCGTGCTGTCGAGA AATGTTTCTAAGGAGATCTTATTTGGTCTGAGAACCATGAATGATTATTTTGAGCACTTT TGATTCTGGAGACTCCATTTGGATCAGGCATGGTCCTCCAAATTCAGGCTTCTGAAAGCC TGTACCTCAGAGTAGGCTTGATGTTCCATAAAAGATGTGGTTATGAGTGCAAAGATGACT TGCCTGTATTGTTATACAAATGTAAAATGTAACAATCAACAAAAATGTAGCAAAGTATGC ATGTATACATTTTCTCTAAAGATACAGTTTCTTTTTTGAAAAAATAAACACATTAGGCAG GTGTGATGGCGGGTGCCTGTTATCCCAGCTACTCCGGAGGCTAAGGCACGAGAATCTCTT GAACCTGGGAGGTGGACAAATTGCAGTGAGCCAAGATTGCGCCACTATACTCCAGCCTGG GCAATAGAGCGAGACTCAGTCTCAAAAAATAAATAAATAAATAAATAAATAAATAAATAA ATAAAATAAACACTACCGGCCAGTGGCCATGGCTCGAGCCTATAATCCCAGCACTTTGGG AGGCCTGAGCCAGGTGGAGTTCAGGCATTCAAGACCAGCTTGGGCAATATGACAAGACCC 
CTGTCTCTACTAAAAATACAAAACAATAGCCGGCCGTGGTGGTGTGTGCCTGTAGTCAGC TGCTTGGGAGGCTGAGGTGGGAGGATTGCTTGAGCCCTGAAGGTGGAAGTTGCAGTGAGC TGAGATAGTGCCATTGCACTCCAGCCTGGGTGACAGAGTGAGACCCTGTCTCAAAAAATA AAATAAAATAAACACTCCTATAAAGGATCCTCTTAGCTCTTTTTCTAACACCTAATCTAC ATTTTCATATTCATTTCAGTTACCCTACAACTGTTCACTGAGCTGCTGTTGAATAGGGGA AATAAGGCAGATAACTACTGCCATCTCCGCTGGAGGGACGATACAGACATTAATCTGGGC ACTTTGATTACAGGCAATGAGAGCTGTGAGTGGGGAAAGCACAAGGTTGGCAGAAGCATT TAGGGGGACACAGCCATTCTCACGGAGGGCAGAGGTCTAAAGCAAGAGCTGAATAAAAAG TAGGAACTGGCCTCGTGGAAAGGGGAAGGGTGATGGGACAGCCTGGTGGTTTGTAGCCCA CTGGAAGGAGTTCTGAAAACTGGTGGTCAGGTGAGAAGGAAAGCTGGGGAAGAGATGAGC ACGTTCGCCAGAGGGTAGCAGGGGCTCTCCGGACCTAGTGAGTCAAGCCAAGGAATTAAG GCTTCAGCCTGCAGGGTGATGAATAGGGCTGTCTATTCCATTTCTTCCTTCTTTCTTTCT TTTCTTTCTTTTTTTGAGACAGCGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGCAC GATCCTGGCTCACTGCAACCTCTGCCTCCCTGATTCAAGCAATTCTCCTGCTTCAGCCTC CAGAATAGCCGGGATTACGGGTGCCTGCTACCACGCCTGGCTAATTTTGTATTTTTAGTA GAGGCGAGGTTTCACCATGTTGGTCAGGCTGGTCTCGAACTCCTGACCTCAAGTGATCTG CCTACCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTAAACCACCGTGCCTGGCCTGAA AATTTCTAGTTTATGATACTTGCCAGCAGAATGTGTTCTGTCACCCTCTTCTGAATAGAT ATGGTTGTCTGCTATGACTTCTCCCACTGCTGCCCTTCCCCCTGAATCCACAGATGCATT TCTTTTAAAACTATGATCTTGTACACAATGGATGTAAATATTTAATCTTTCTATTTGTAT GTTTTTCCATGTTTCTTTTCTTTCTTTCTCTTTTTTTTTTTTTTTTTTTTTTTTTTGGAG GTGGTGTCTGCCTCTATTGCCCACAGGCTGGAGTGCACTGGTACAATCTCGGCTCACTGC ACCCTCCGCCTCCTAGGTTCAAGGGATTCTGCTGCCTGAGCCTCCTGAGTAGCTGGGACT ACAGGTGTGCACCACCACGCCCGGCTAGTTTTTATATTTTTAACAGAGACAGGGTTTCAC CATATTGGCCAGGCTGGTCTCGAACTCCTGACCTCGTGATCCTCTCACCTCGTCCTCCCA AAGTGCTGGGATTACAGGCATGAGCCACCGTGCCCGGCCTCCATGTTTATTTTCTAGTTG CTTACTTGTCCTTTTGTGTTTATCCTTGTTAACTACTACTGCCAGGCTTAAAGTATAGAC CCCTAGAGGGCAAGATTTGTATCTATATAAAATGTACTGCAAAACATCTACTTAAGCCTC ACATTCTTAAACACAAATTACTTTTGAAGATGACTGTTCTGTTTGTTTCCTTCCTGGTTT CTTCCTTTAACTTTTCCACCAAACAGGTACATGATATACTTTACTGAAATAACTTATATA GCAATATGAATTTTTTTTTTGAGGCGGAGTTTCGCTCTTGTTGCCCAGGCTAGAGTGCAA TGGCGTGATCTTGGCTCACTGCAACCTCCGCCTCCTGGGTTCAAACAATTCTCCTGTCTC 
AGCCTCCAGAATAGCGGGGATTACAGGCGCACACCACCATGCCAGGCTAATTTTTGTATT TTTAGTAGAGACGGGGGTTCACCATGTTGGCCACGCTGGTCTCGAACTCCTGACCTCAGG TGATCCGCCTGCCTTGGCCTCCCAAAGTGCTGGGACTACAGGCATGAGCCACCGTGCCCG GCAAATTTGAGGTGGAGGTTGCAGTGAGCTGAGATCGCATCACTGCACTCTAGCCTAGGT GACAGAGCAAGACTGTCTCCCACTTCAGCCTCCCAAGTAGCTGGGACTACAAGCATGTGC CACCAGACCTGGTTAATTTTTTTTTTTTTTTTTTTTGAGACGGAGTCTCGCTCCATCACC CAGGCTGGAGTGCAGTGGCGCGATCTCAGCTCACTGCAAGCTCCCCCTCCCGGGTACACG CCACTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCACCTGCCAGCACGCCCG GCTAACTTTTTGCATTTTTAGTAGAGACAGGGTTTCACCGTGTTAGCCAGGATGGTCTCG ATCTCCTGACCTCATGATCCACCTGCCTTGGCCTCTCAAAGTGCTGGGATTATAGGCGTG AGCCACCGCGCCCAGCCAGGCCTGGTTAATTTTCTTTGGTATTTTTTTGTAGAGACGGAG GTCTCACTATGTTGCCCAGGCTGGTCTCGAACTCCTGAGCTCAAGTGATCCACCTGCCTT GGCCTTCCAAAGTGCTAGGATTACAGGCATGAGCCACGGTGCCCAGCCTACAGTGCAACT TTAATAATAACAATATGAACACAAAAATTCTAAGATCTAAAATTTAAGCTTTCAGTAGTC CTTCTATAACTGTGAAAGTTTGGTTCCTAAAAAGCCCTGAGGAATTTATGGGAAAACAAG AGAGACAACATTTAGTAGTGAACCTGTGCATTCTAAATAAAGACAATATCAATGACGTGT TATAGGTCTTCAATTAGTAAGAATGAATATTGGACTATGAATTTTTATTCACTGTCACTT GTTTGCTAGATGCTTTGAGAATCTTCCTTGCCTATATTTTCCTGAGATGTTGGTTTTTCT TTGTCACAGATAACAATGCTCATTCCCTCCCCATTAAAAACTAAATATATATATATATAT ATATATGATTAAACGATTACTACATGTGCTTTGAAATATTCAAATATTTTAGACAGTAAA AGTCCCTTGTAATTCAACCCTTTGCAGATGATTGGTTAACAGGTTAGTACACATCTACCT AAATTTAAAATCCCATATTTAACATGTATACTTATTAGAAAGTACACATTCTAATATTTT TCTATTGTATTTGGTACTATTTTCAGATGCTCCTGCCTTTTTCTTTCGTAATTTTGAAGG ACCTCAGCTCCCTGCCTCCTAGATTTTTGCTACTATGGTCTCAGAGCTGTGTAATTTGGA TGACTGAGATGGAAAAACCTC TFBS-0.7.1/t/test.gibbin000066400000000000000000000273361305752266700147030ustar00rootroot00000000000000>CE0001hum 2051-2077 AGGTCACTTTATAAGGGTCTGGGGGGG >CE0001hum 1663-1674 TGGGGACTGGAT >CE0001hum 1953-1967 AAGCCAATTAGGCCC >CE0001hum 1531-1540 AATTCCAATT >CE0001hum 1504-1514 AATGTTTAATG >CE0001hum 1812-1826 AGGCCACCAGACTGA >CE0001hum 1932-1949 TGTCACCTTGGCCCCTCT >CE0001mus 2051-2077 AGGTCACTTTATAAGGGTCTGGGGGGG >CE0001mus 1663-1674 TGGGGACTGGAT >CE0001mus 1953-1967 AAGCCAATTAGGCCC >CE0001mus 1531-1540 
AATTCCAATT >CE0001mus 1504-1514 AATGTTTAATG >CE0001mus 1812-1826 AGGCCACCAGACTGA >CE0001mus 1932-1949 TGTCACCTTGGCCCCTCT >CE0002hum 1861-1871 AGATGCAATGT >CE0002hum 1277-1286 CTGAGGCTGA >CE0002hum 854-863 AGCCAGTCAG >CE0002hum 1957-1969 GAGCTGGAAGCCT >CE0002hum 589-601 AGACCACCTTCTC >CE0002hum 413-428 GACCATATGTCACTTG >CE0002hum 504-515 TTGGTAAACATC >CE0002hum 1879-1897 GGGTCAAACCACCCTGGCC >CE0002hum 517-526 TCAGATAATG >CE0002hum 383-396 TTCCATTGGCTGCA >CE0002hum 437-484 ACAAATCCTAATGAGCTAAAAATATGTTTGTTTTAGCTAATTGACCTC >CE0002hum 1922-1953 ATGTGCTTGAATTAGACAGGATTAAAGGCTTA >CE0002mus 1861-1871 AGATGCAATGT >CE0002mus 1277-1286 CTGAGGCTGA >CE0002mus 854-863 AGCCAGTCAG >CE0002mus 1957-1969 GAGCTGGAAGCCT >CE0002mus 589-601 AGACCACCTTCTC >CE0002mus 413-428 GACCATATGTCACTTG >CE0002mus 504-515 TTGGTAAACATC >CE0002mus 1879-1897 GGGTCAAACCACCCTGGCC >CE0002mus 517-526 TCAGATAATG >CE0002mus 383-396 TTCCATTGGCTGCA >CE0002mus 437-484 ACAAATCCTAATGAGCTAAAAATATGTTTGTTTTAGCTAATTGACCTC >CE0002mus 1922-1953 ATGTGCTTGAATTAGACAGGATTAAAGGCTTA >CE0003hum 1672-1684 TCTATTTCCAGAA >CE0003hum 1546-1563 TTTCTCCTTGACCTTTTT >CE0003hum 1848-1860 TAGAACTGCTGAA >CE0003hum 1688-1699 AAAGAGGTTTAG >CE0003mus 1672-1684 TCTATTTCCAGAA >CE0003mus 1546-1563 TTTCTCCTTGACCTTTTT >CE0003mus 1848-1860 TAGAACTGCTGAA >CE0003mus 1688-1699 AAAGAGGTTTAG >CE0004hum 1055-1064 CCCACCACCT >CE0004hum 1121-1133 GGACCAACCCAAA >CE0004hum 231-242 ACAGGCAGCTGC >CE0004mus 1055-1064 CCCACCACCT >CE0004mus 1121-1133 GGACCAACCCAAA >CE0004mus 231-242 ACAGGCAGCTGC >CE0008hum 1951-1962 TGCATGTGTGTG >CE0008hum 3420-3433 CAGGTACCTGGGGA >CE0008hum 300-313 AGAGAAGAGCTAAG >CE0008hum 1417-1428 AGCCCTGCTGGC >CE0008hum 5424-5435 TGTCTGTCCCCA >CE0008hum 2858-2872 ACCTAAAAAGCTCTC >AH0008mus 1951-1962 TGCATGTGTGTG >AH0008mus 3420-3433 CAGGTACCTGGGGA >AH0008mus 300-313 AGAGAAGAGCTAAG >AH0008mus 1417-1428 AGCCCTGCTGGC >AH0008mus 5424-5435 TGTCTGTCCCCA >AH0008mus 2858-2872 ACCTAAAAAGCTCTC >AH0010hum 404-413 GGGCTGATCC >AH0010hum 346-370 
GGCCCAATTAAGAGATCAGATGGTG >AH0010mus 404-413 GGGCTGATCC >AH0010mus 346-370 GGCCCAATTAAGAGATCAGATGGTG >CE0011hum 836-850 GCCTTTAATCACCAG >CE0011hum 820-831 TTGAAGGGGAAA >CE0011mus 836-850 GCCTTTAATCACCAG >CE0011mus 820-831 TTGAAGGGGAAA >AH0013hum 11842-11851 GAAAGAGATG >AH0013hum 12348-12359 TTGCTTTGCAGA >AH0013hum 12177-12190 CCATCCATGTGACT >AH0013hum 12439-12458 CCATGTGCCATGGAATCATG >AH0013hum 11629-11638 TGCCCTGGTG >AH0013hum 11794-11803 TCCTACCCTC >AH0013hum 11969-11979 TTCCTGAGGCT >AH0013hum 8110-8125 TGGCTCCTCTGACAGG >AH0013hum 8002-8013 TTGTTTACTAAC >AH0013hum 2216-2230 GCTAAGAGGGGATGA >AH0013hum 6501-6513 CATCGAGTAGCGG >AH0013hum 6700-6728 CGCAGCAACAGGTGTGAGCAGGTGGGGGA >AH0013hum 8051-8064 GGCAGGGCTGGGGG >AH0013hum 8502-8512 AGAGGCTCCCT >AH0013hum 2157-2181 TGAGTTAAAGGCACAGAGATGAAAA >AH0013hum 9151-9170 CCATTCTCAGTTCATTTAGT >AH0013hum 6551-6563 AGGTGGGTGCATC >AH0013hum 6354-6366 CCCCCTCCCTGGG >AH0013hum 6381-6394 GGAGGGCGGGGCCC >AH0013hum 6626-6636 TGGAGAGGGGA >AH0013hum 6680-6694 AGGTGTGCCAGCCCT >AH0013hum 7922-7932 CCTGCTGACCC >AH0013hum 10206-10216 TGCTTCTGAGC >AH0013hum 11153-11170 TGTGTGTCTGCTTCCAGA >AH0013hum 11081-11090 AGAAGTCAAT >AH0013hum 11830-11839 CTGGAGGAGA >AH0013hum 11642-11651 GGGACTCTGA >AH0013hum 11922-11935 TGCCATGGCAACCA >CE0013mus 11842-11851 GAAAGAGATG >CE0013mus 12348-12359 TTGCTTTGCAGA >CE0013mus 12177-12190 CCATCCATGTGACT >CE0013mus 12439-12458 CCATGTGCCATGGAATCATG >CE0013mus 11629-11638 TGCCCTGGTG >CE0013mus 11794-11803 TCCTACCCTC >CE0013mus 11969-11979 TTCCTGAGGCT >CE0013mus 8110-8125 TGGCTCCTCTGACAGG >CE0013mus 8002-8013 TTGTTTACTAAC >CE0013mus 2216-2230 GCTAAGAGGGGATGA >CE0013mus 6501-6513 CATCGAGTAGCGG >CE0013mus 6700-6728 CGCAGCAACAGGTGTGAGCAGGTGGGGGA >CE0013mus 8051-8064 GGCAGGGCTGGGGG >CE0013mus 8502-8512 AGAGGCTCCCT >CE0013mus 2157-2181 TGAGTTAAAGGCACAGAGATGAAAA >CE0013mus 9151-9170 CCATTCTCAGTTCATTTAGT >CE0013mus 6551-6563 AGGTGGGTGCATC >CE0013mus 6354-6366 CCCCCTCCCTGGG >CE0013mus 6381-6394 GGAGGGCGGGGCCC >CE0013mus 
6626-6636 TGGAGAGGGGA >CE0013mus 6680-6694 AGGTGTGCCAGCCCT >CE0013mus 7922-7932 CCTGCTGACCC >CE0013mus 10206-10216 TGCTTCTGAGC >CE0013mus 11153-11170 TGTGTGTCTGCTTCCAGA >CE0013mus 11081-11090 AGAAGTCAAT >CE0013mus 11830-11839 CTGGAGGAGA >CE0013mus 11642-11651 GGGACTCTGA >CE0013mus 11922-11935 TGCCATGGCAACCA >CE0014hum 959-979 TCAGGCCAGTGGGAGGAGCTG >CE0014hum 2403-2422 ATTGAGTTCCTCAAGAAGCT >CE0014hum 3412-3428 GAGTACCAGGAGCTCCT >CE0014hum 2207-2221 GGTGGCATCGGTGGG >CE0014hum 3135-3148 CAGATCCAGAGTCT >CE0014hum 2046-2060 ACCCCGCAGTTCAGC >CE0014hum 2129-2139 GTGCTCTTCCG >CE0014hum 3813-3822 GTGGGAGCCT >CE0014hum 3174-3184 GGCACGGTGAG >CE0014hum 1635-1656 GCGGCGCTCAAGCAGAGGTCAG >CE0014hum 1275-1289 CTGGGCAGCTTCCGT >CE0014hum 2925-2935 CTGCAGGAGGC >CE0014hum 1593-1612 GACCGGGTGCAGGTGGAGCG >CE0014hum 1476-1498 GCCCTGCGCGGGGAGCTGAGCCA >CE0014hum 1918-1928 AGTCCGTTAGA >CE0014hum 3099-3112 CGCCAGGCCAAGCA >CE0014hum 3937-3948 ATTGAGACCCGG >CE0014hum 642-652 CTCTCCCTCCA >CE0014hum 1200-1219 TTCTCCTACTCGTCCAGCTC >CE0014hum 1120-1129 CGTCGGGCCT >CE0014hum 1006-1026 GAGACCGCAGGGCTATAAAGC >CE0014hum 2332-2347 CGAAGGACGTGGACGA >CE0014hum 1324-1342 TGCCCTCGGAGCGCCTCGA >CE0014hum 2373-2401 GAGCGCAAGATTGAGTCTCTGATGGATGA >CE0014hum 2817-2845 GAGAGCCAGCAGGTGCAGCAGGTGGAGGT >CE0014hum 2259-2270 CAGACAGGGAAG >CE0014hum 4718-4732 CTGGACAAGTCTTCT >CE0014hum 2576-2586 GACTCCCACCC >CE0014hum 2099-2112 CGCAAGCGGGAGGA >CE0014hum 2937-2947 GAGGAGTGGTA >CE0014hum 4994-5004 CAAATCTACTC >CE0014hum 451-461 CACCAGCCCTT >CE0014hum 2898-2907 CAGTACGAGA >CE0014hum 336-357 GTTTGGGGGCTGTGTCTTTAAG >CE0014hum 491-504 AGCTCTTCCCCAGC >CE0014hum 1410-1450 CAGGAGCTCAACGACCGCTTCGCCAACTTCATCGAGAAGGT >CE0014hum 4004-4016 TTTCTGAGCCCCA >CE0014hum 3114-3127 GAGATGAACGAGTC >CE0014hum 3150-3163 ACGTGCGAGGTGGA >CE0014hum 3313-3332 GAGCAGTTCGCCCTGGAGGC >CE0014hum 3430-3458 AACGTCAAGATGGCCCTGGACATCGAGAT >CE0014hum 1253-1269 GTCCCCGAGCTCCTCGG >CE0014hum 1371-1384 GAGTTCCTGGCCAC >CE0014hum 1614-1633 GACGGGCTGGCGGAGGACCT 
>CE0014hum 1176-1186 CCGCCCTCACT >CE0014hum 1536-1564 TGCCAGCAGGAGCTGCGCGAGCTGCGGCG >CE0014hum 3635-3653 TAAAGACGACTGGTGAGTC >CE0014hum 3950-3959 ATGGGGAGGT >CE0014hum 4680-4704 AGGTGGTGACAGAGTCCCAGAAGGA >CE0014hum 2349-2365 GCCACTCTGTCCCGCCT >CE0014hum 1459-1474 TGGAGCAGCAGAACGC >CE0014hum 2865-2875 GAGCTGACGGC >CE0014hum 3874-3885 TCAGTGCCTGAG >CE0014hum 2588-2598 TGCGCCACCTG >CE0014hum 468-477 GCCCTGGGGA >CE0014hum 1221-1249 CGCTTCTCCAGCAGCCGCCTGCTGGGCTC >CE0014hum 1141-1153 TCAGCTCCACCTC >CE0014hum 1060-1069 CCCGGCCTAG >CE0014hum 2114-2127 GCGGAGCACAACCT >CE0014hum 1503-1534 CGGGGCCAGGAGCCGGCGCGCGCCGACCAGCT >CE0014hum 1344-1369 TTCTCCATGGCCGAGGCCCTCAACCA >CE0014hum 3280-3293 GAGGCGCTGCTCAG >CE0014hum 3921-3931 GGTTCTGATCA >CE0014hum 1188-1198 TCCCCCGGGGC >CE0014hum 1389-1402 AGCAACGAGAAGCA >CE0014hum 3487-3502 GAGGAGAGCCGGTGAG >CE0014hum 2796-2812 CGAGACCTGCAGGTGAG >CE0014hum 2877-2893 GCGCTGAGGGACATCCG >CE0014hum 434-448 GCAGAGCTGGCGCCA >CE0014hum 915-936 GGGTGGGGCATCCCCCTCCCCA >CE0014mus 959-979 TCAGGCCAGTGGGAGGAGCTG >CE0014mus 2403-2422 ATTGAGTTCCTCAAGAAGCT >CE0014mus 3412-3428 GAGTACCAGGAGCTCCT >CE0014mus 2207-2221 GGTGGCATCGGTGGG >CE0014mus 3135-3148 CAGATCCAGAGTCT >CE0014mus 2046-2060 ACCCCGCAGTTCAGC >CE0014mus 2129-2139 GTGCTCTTCCG >CE0014mus 3813-3822 GTGGGAGCCT >CE0014mus 3174-3184 GGCACGGTGAG >CE0014mus 1635-1656 GCGGCGCTCAAGCAGAGGTCAG >CE0014mus 1275-1289 CTGGGCAGCTTCCGT >CE0014mus 2925-2935 CTGCAGGAGGC >CE0014mus 1593-1612 GACCGGGTGCAGGTGGAGCG >CE0014mus 1476-1498 GCCCTGCGCGGGGAGCTGAGCCA >CE0014mus 1918-1928 AGTCCGTTAGA >CE0014mus 3099-3112 CGCCAGGCCAAGCA >CE0014mus 3937-3948 ATTGAGACCCGG >CE0014mus 642-652 CTCTCCCTCCA >CE0014mus 1200-1219 TTCTCCTACTCGTCCAGCTC >CE0014mus 1120-1129 CGTCGGGCCT >CE0014mus 1006-1026 GAGACCGCAGGGCTATAAAGC >CE0014mus 2332-2347 CGAAGGACGTGGACGA >CE0014mus 1324-1342 TGCCCTCGGAGCGCCTCGA >CE0014mus 2373-2401 GAGCGCAAGATTGAGTCTCTGATGGATGA >CE0014mus 2817-2845 GAGAGCCAGCAGGTGCAGCAGGTGGAGGT >CE0014mus 2259-2270 CAGACAGGGAAG 
>CE0014mus 4718-4732 CTGGACAAGTCTTCT >CE0014mus 2576-2586 GACTCCCACCC >CE0014mus 2099-2112 CGCAAGCGGGAGGA >CE0014mus 2937-2947 GAGGAGTGGTA >CE0014mus 4994-5004 CAAATCTACTC >CE0014mus 451-461 CACCAGCCCTT >CE0014mus 2898-2907 CAGTACGAGA >CE0014mus 336-357 GTTTGGGGGCTGTGTCTTTAAG >CE0014mus 491-504 AGCTCTTCCCCAGC >CE0014mus 1410-1450 CAGGAGCTCAACGACCGCTTCGCCAACTTCATCGAGAAGGT >CE0014mus 4004-4016 TTTCTGAGCCCCA >CE0014mus 3114-3127 GAGATGAACGAGTC >CE0014mus 3150-3163 ACGTGCGAGGTGGA >CE0014mus 3313-3332 GAGCAGTTCGCCCTGGAGGC >CE0014mus 3430-3458 AACGTCAAGATGGCCCTGGACATCGAGAT >CE0014mus 1253-1269 GTCCCCGAGCTCCTCGG >CE0014mus 1371-1384 GAGTTCCTGGCCAC >CE0014mus 1614-1633 GACGGGCTGGCGGAGGACCT >CE0014mus 1176-1186 CCGCCCTCACT >CE0014mus 1536-1564 TGCCAGCAGGAGCTGCGCGAGCTGCGGCG >CE0014mus 3635-3653 TAAAGACGACTGGTGAGTC >CE0014mus 3950-3959 ATGGGGAGGT >CE0014mus 4680-4704 AGGTGGTGACAGAGTCCCAGAAGGA >CE0014mus 2349-2365 GCCACTCTGTCCCGCCT >CE0014mus 1459-1474 TGGAGCAGCAGAACGC >CE0014mus 2865-2875 GAGCTGACGGC >CE0014mus 3874-3885 TCAGTGCCTGAG >CE0014mus 2588-2598 TGCGCCACCTG >CE0014mus 468-477 GCCCTGGGGA >CE0014mus 1221-1249 CGCTTCTCCAGCAGCCGCCTGCTGGGCTC >CE0014mus 1141-1153 TCAGCTCCACCTC >CE0014mus 1060-1069 CCCGGCCTAG >CE0014mus 2114-2127 GCGGAGCACAACCT >CE0014mus 1503-1534 CGGGGCCAGGAGCCGGCGCGCGCCGACCAGCT >CE0014mus 1344-1369 TTCTCCATGGCCGAGGCCCTCAACCA >CE0014mus 3280-3293 GAGGCGCTGCTCAG >CE0014mus 3921-3931 GGTTCTGATCA >CE0014mus 1188-1198 TCCCCCGGGGC >CE0014mus 1389-1402 AGCAACGAGAAGCA >CE0014mus 3487-3502 GAGGAGAGCCGGTGAG >CE0014mus 2796-2812 CGAGACCTGCAGGTGAG >CE0014mus 2877-2893 GCGCTGAGGGACATCCG >CE0014mus 434-448 GCAGAGCTGGCGCCA >CE0014mus 915-936 GGGTGGGGCATCCCCCTCCCCA >CE0016hum 2801-2815 CCAGCCCAGCTCCTC >CE0016hum 1390-1400 GTGGCTGAGAG >CE0016hum 2741-2759 GGTTGAGATTCTTTATTCT >CE0016hum 2761-2774 GAGGTAGGAAGGGG >CE0016hum 99-109 GGCTCCCGGGG >CE0016hum 1684-1693 CCTTTGCAGC >CE0016hum 121-131 GGGCCGTCCGG >CE0016hum 2838-2853 CCCCAATAAGTTACCC >CE0016hum 233-244 TCCCTGTCCGGG 
>CE0016hum 144-158 GCAGCACAAACAGGC >CE0016hum 2779-2793 GCATGCTCAGGTGGG >CE0016hum 571-601 CAGGGTATTGGGGTCAGGGTGGCATTAGCCC >CE0016hum 617-637 GGGCTGACTCAGCATCCTGCC >CE0016hum 671-680 CACTCCCTTG >CE0016hum 447-458 GGGCGGGGTCAC >CE0016hum 429-439 CACTTCCGCGA >AH0016mus 2801-2815 CCAGCCCAGCTCCTC >AH0016mus 1390-1400 GTGGCTGAGAG >AH0016mus 2741-2759 GGTTGAGATTCTTTATTCT >AH0016mus 2761-2774 GAGGTAGGAAGGGG >AH0016mus 99-109 GGCTCCCGGGG >AH0016mus 1684-1693 CCTTTGCAGC >AH0016mus 121-131 GGGCCGTCCGG >AH0016mus 2838-2853 CCCCAATAAGTTACCC >AH0016mus 233-244 TCCCTGTCCGGG >AH0016mus 144-158 GCAGCACAAACAGGC >AH0016mus 2779-2793 GCATGCTCAGGTGGG >AH0016mus 571-601 CAGGGTATTGGGGTCAGGGTGGCATTAGCCC >AH0016mus 617-637 GGGCTGACTCAGCATCCTGCC >AH0016mus 671-680 CACTCCCTTG >AH0016mus 447-458 GGGCGGGGTCAC >AH0016mus 429-439 CACTTCCGCGA >AH0017hum 4471-4488 GGTGCTGAGAACTTGCTC >AH0017hum 5385-5405 TTACATGGTAGGTTTAGGGGA >AH0017hum 392-405 TCCCTGGACCCCTC >AH0017mus 4471-4488 GGTGCTGAGAACTTGCTC >AH0017mus 5385-5405 TTACATGGTAGGTTTAGGGGA >AH0017mus 392-405 TCCCTGGACCCCTC >CE0022hum 1175-1187 GCGAGGGAGCGCC >AH0022mus 1175-1187 GCGAGGGAGCGCC >AH0024hum 368-380 TGGACAAGGGCAG >AH0024hum 172-208 GACTTGATCTTCTGTTAGCCCTAATCATCAATTAGCA >AH0024mus 368-380 TGGACAAGGGCAG >AH0024mus 172-208 GACTTGATCTTCTGTTAGCCCTAATCATCAATTAGCA TFBS-0.7.1/t/test_meme.fa000066400000000000000000000113711305752266700150320ustar00rootroot00000000000000>GAL1 
CAGGTTATCAGCAACAACACAGTCATATCCATTCTCAATTAGCTCTACCACAGTGTGTGAACCAATGTATCCAGCACCACCTGTAACCAAAACAATTTTAGAAGTACTTTCACTTTGTAACTGAGCTGTCATTTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATA >GAL10 CGGTTTAGCATCATAAGCGCTTATAAATTTCTTAATTATGCTCGGGCACTTTTCGGCCAATGGTCTTGGTAATTCCTTTGCGCTAGAATTGAACTCAGGTACAATCACTTCTTCTGAATGAGATTTAGTCATTATAGTTTTTTCTCCTTGACGTTAAAGTATAGAGGTATATTAACAATTTTTTGTTGATACTTTTATGACATTTGAATAAGAAGTAATACAAACTGAAAATGTTGAAAGTATTAGTTAAAGTGGTTATGCAGCTTTTCCATTTATATATCTGTTAATAGATCAAAAATCATCGCTTCGCTGATTAATTACCCCAGAAATAAGGCTAAAAAACTAATCGCATTATCATCCTATGGTTGTTAATTTGATTCGTTAATTTGAAGGTTTGTGGGGCCAGGTTACTGCCAATTTTTCCTCTTCATAACCATAAAAGCTAGTATTGTAGAATCTTTATTGTTCGGAGCAGTGCGGCGCGAGGCACATCTGCGTTTCAGGAACGCGACCGGTGAAGACGAGGACGCACGGAGGAGAGTCTTCCGTCGGAGGGCTGTCGCCCGCTCGGCGGCTTCTAATCCGTACTTCAATATAGCAATGAGCAGTTAAGCGTATTACTGAAAGTTCCAAAGAGAAGGTTTTTTTAGGCTAAGATAATGGGGCTCTTTACATTTCCACAACATATAAGTAAGATTAGATATGGATATGTATATGGTGGTAATGCCATGTAATATGATTATTAAACTTCTTTGCGTCCATCCAAAAAAAAAGTAAGAATTTTTGAAAATTCAATATAA >GAL2 
CATTAATTTTGCTTCCAAGACGACAGTAATATGTCTCCTACAATACCAGTTTCGCTGCAGAAGGCACATCTATTACATTTACTGAGCATAACGGGCTGTACTAATCCAAGGAGGTTTACGGACCAGGGGAACTTTCCAGATTCAGATCACAGCAATATAGGACTAGAAAATATCAGGTAGCCGCACTCAACTTGTAACTGGCAACTACTTTGCATCAAACTCCAATTAAATGCGGTAGAATCTTTTCACAAAAGGTACTCAACGTCAATTCGGAAAGCTTCCTTCCGGAATGGCTTAAGTAGGTTGCAATTTCTTTTTCTATTAGTAGCTAAAAATGGGTCACGTGATCTATATTCGAAAGGGGCGGTTGCCTCAGGAAGGCACCGGCGGTCTTTCGTCCGTGCGGAGATATCTGCGCCGTTCAGGGGTCCATGTGCCTTGGACGATATTAAGGCAGAAGGCAGTATCGGGGCGGATCACTCCGAACCGAGATTAGTTAAGCCCTTCCCATCTCAAGATGGGGAGCAAATGGCATTATACTCCTGCTAGAAAGTTAACTGTGCACATATTCTTAAATTATACAACATTCTGGAGAGCTATTGTTCAAAAAACAAACATTTCGCAGGCTAAAATGTGGAGATAGGATAAGTTTTGTAGACATATATAAACAATCAGTAATTGGATTGAAAATTTGGTGTTGTGAATTGCTCTTCATTATGCACCTTATTCAATTATCATCAAGAATAGTAATAGTTAAGTAAACACAAGATTAACATAATAAAAAAAATAATTCTTTCATA >GAL7 GAGAACTGGAAAGATTGTGTAACCTTGAAAAACGGTGAAACTTACGGGTCCAAGATTGTCTACAGATTTTCCTGATTTGCCAGCTTACTATCCTTCTTGAAAATATGCACTCTATATCTTTTAGTTCTTAATTGCAACACATAGATTTGCTGTATAACGAATTTTATGCTATTTTTTAAATTTGGAGTTCAGTGATAAAAGTGTCACAGCGAATTTCCTCACATGTAGGGACCGAATTGTTTACAAGTTCTCTGTACCACCATGGAGACATCAAAAATTGAAAATCTATGGAAAGATATGGACGGTAGCAACAAGAATATAGCACGAGCCGCGGAGTTCATTTCGTTACTTTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAAGGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGGGTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTCTCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAGATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAAAAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGTGTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGTTTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATTATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA >GAL80 
TATCCTTTACGTTTTGACTTGGTGCTCGAAGATGCTTTCAGAGATGGTGCTTATCCTCATGTCTTTTGGGTTTGTCTTCAATACGGCAGCCGTTGTCTTGCAAACGGCCGCCTCTGCCATGGCAAAGAATGCTTTCCATGACGATCATCGTAGTGCCCAATTGGGTGCCTCTATGATGGGTATGGCTTGGGCAAGTGTCTTTTTATGTATCGTGGAATTTATCCTGCTGGTCTTCTGGTCTGTTAGGGCAAGGTTGGCCTCTACTTACTCCATCGACAATTCAAGATACAGAACCTCCTCCAGATGGAATCCCTTCCATAGAGAGAAGGAGCAAGCAACTGACCCAATATTGACTGCCACTGGACCTGAAGACATGCAACAAAGTGCAAGCATAGTGGGGCCTTCTTCCAATGCTAATCCGGTCACTGCCACTGCTGCTACGGAAAACCAACCTAAAGGTATTAACTTCTTCACTATAAGAAAATCACACGAGCGCCCGGACGATGTCTCTGTTTAAATGGCGCAAGTTTTCCGCTTTGTAATATATATTTATACCCCTTTCTTCTCTCCCCTGCAATATAATAGTTTAATTCTAATATTAATAATATCCTATATTTTCTTCATTTACCGGCGCACTCTCGCCCGAACGACCTCAAAATGTCTGCTACATTCATAATAACCAAAAGCTCATAACTTTTTTTTTTGAACCTGAATATATATACATCACATATCACTGCTGGTCCTTGCCGACCAGCGTATACAATCTCGATAGTTGGTTTCCCGTTCTTTCCACTCCCGTC >GCY1 GTCTTAGTATCTCATCTCATCTCAATTTCTATATTCCACTATAAAATTTTTCACTCTTTC TGCGCGCGCCAATGTCCCCGCAACTACTCAATAGGTAACATGAGAATATTTCAGTTCGTA AGAGAGAAGAGATGAAGTTATTTGGGCTCTTTGCTCGAGGTTACAGAAGGGCCGCATTAG AGTGAATGAGCTGATGATATTTCGCCCAGTTCTACATTTTTTTTTTTTTGGAAGTATGAC CTCTGTTAAATTTTTTTTTTTTTAAATTTCACTTTCTAAAGTCCCAGAAATCCGCTTGAA TGTCTTACATATTGCAATGGATATGCTTGGGTGATCATACTTCCTGGCTTTAGATATTTG AAACTTAACTCTTGTCAACAAACTTCCTATGGAGTGTATAAGAATTGTAAGTTATAACAC CGGCGAACAATCGGGGCAGACTATTCCGGGGAAGAACAAGGAAGGGCGGTCTTTTCTCCC TCATTGTCATAGCAAGGTCATTTCGCCTTCTCAGAAAGGGGTAGAATCAATCTAGCACGC AGATTGCAAACACGGCTTAATAATATGCCTATCAGGCATTCACCCGTGTGACGAATCGCA CACCGCTGCTCTCCTTAATTCCCTAGAGTAGAAACCGAGCTTTCAGGAAAAGACTACGGC AGTAAAGAATTGCTTTACTGGGCGTATAAAACCGGGAGAATCAAGACATTCTAATGACTT GATTCAGGATGAGAGCTTAATAGGTGCATCTTAGCAAGCTAAAATTTGGACAGCTCTCATTACTAAATTAAGATAGAAAA TFBS-0.7.1/t/transfac_new/000077500000000000000000000000001305752266700152075ustar00rootroot00000000000000TFBS-0.7.1/t/transfac_new/matrix.dat000066400000000000000000000015521305752266700172100ustar00rootroot00000000000000 VV  TRANSFAC MATRIX TABLE, Release 7.1 - licensed - 2003-04-08, (C) Biobase GmbH XX // AC M00001 XX ID V$MYOD_01 XX DT 19.10.1992 (created); ewi. 
DT 22.10.1997 (updated); dbo. CO Copyright (C), Biobase GmbH. XX NA MyoD XX DE myoblast determination gene product XX BF T00526; MyoD; Species: mouse, Mus musculus. XX P0 A C G T 01 1 2 2 0 S 02 2 1 2 0 R 03 3 0 1 1 A 04 0 5 0 0 C 05 5 0 0 0 A 06 0 0 4 1 G 07 0 1 4 0 G 08 0 0 0 5 T 09 0 0 5 0 G 10 0 1 2 2 K 11 0 2 0 3 Y 12 1 0 3 1 G XX BA 5 functional elements in 3 genes XX CC no comment XX // TFBS-0.7.1/t/transfac_old/000077500000000000000000000000001305752266700151745ustar00rootroot00000000000000TFBS-0.7.1/t/transfac_old/matrix.dat000066400000000000000000005002341305752266700171760ustar00rootroot00000000000000VV TRANSFAC MATRIX TABLE, V.2.424-08-1995 XX // AC M00001 XX ID V$MYOD_01 XX NA MyoD XX DT EWI (created); 19.10.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 1 2 2 0 04 2 1 2 0 05 3 0 1 1 06 0 5 0 0 07 5 0 0 0 08 0 0 4 1 09 0 1 4 0 10 0 0 0 5 11 0 0 5 0 12 0 1 2 2 13 0 2 0 3 14 1 0 3 1 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00526; MyoD ; mouse XX BA 5 functional elements in 3 genes XX XX // AC M00002 XX ID V$E47_01 XX NA E47 XX DT EWI (created); 19.10.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 4 4 3 0 04 2 5 4 0 05 3 2 4 2 06 2 0 9 0 07 0 11 0 0 08 11 0 0 0 09 0 0 11 0 10 1 2 8 0 11 0 0 0 11 12 0 0 11 0 13 0 0 4 7 14 1 4 3 3 15 1 6 2 2 16 1 4 4 2 17 1 4 2 3 XX BF T00207; E47 ; human XX BA 11 selected strong binding sites for E47, E47-MyoD, E12+MyoD and (weak) for E12 XX CC Group I in [903]; 5 sites selected in vitro for binding to E12N CC (=N-terminally truncated E12); matrix corrected according to CC the published sequences XX RN [1] RA Sun X.-H., Baltimore D. RT An inhibitory domain of E12 transcription factor prevents DNA RT binding in E12 homodimers but not in E12 heterodimers RL Cell 64:459-470 (1991). XX // AC M00003 XX ID V$VMYB_01 XX NA v-Myb XX DT EWI (created); 19.10.92. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 0 0 0 0 04 14 3 4 3 05 13 3 5 3 06 2 7 1 14 07 24 0 0 0 08 24 0 0 0 09 0 24 0 0 10 1 0 17 6 11 0 1 22 1 12 12 5 3 4 13 11 5 2 6 14 0 0 0 0 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00895; v-Myb ; AMV XX BA 21 sites isolated from total chromatin + 3 sites from mim-1 promoter XX RN [1] RA Biedenkapp H., Borgmeyer U., Sippel A., Klempnauer K.-H. RT Viral myb oncogene encodes a sequence-specific DNA-binding activity RL Nature 335:835-837 (1988). RN [2] RA Ness S. A., Marknell A., Graf T. RT The v-myb oncogene product binds to and activates the promyelocyte-specific RT mim-1 gene RL Cell 59:1115-1125 (1989). XX // AC M00004 XX ID V$CMYB_01 XX NA c-Myb XX DT EWI (created); 19.10.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 12 6 8 02 6 15 5 1 03 6 10 9 5 04 16 5 14 3 05 16 11 8 2 06 9 16 8 16 07 8 4 34 5 08 20 1 25 5 09 4 49 2 5 10 18 18 20 4 11 0 0 60 0 12 0 0 0 60 13 0 0 0 60 14 11 1 47 1 15 6 6 21 5 16 1 7 15 11 17 2 2 12 4 18 1 0 11 1 XX BF T00138; c-Myb ; mouse XX BA 9 c-Myb binding sites in SV40 DNA + 51 selected single binding sites XX CC Matrix of [905] has been inverted to be compatible XX RN [1] RA Nakagoshi H., Nagase T., Kanei-Ishii C., Ueno Y., Ishii S. RT Binding of the c-myb proto-oncogene product to the simian virus RT 40 enhancer stimulates transcription RL J. Biol. Chem. 265:3479-3483 (1990). RN [2] RA Howe K. M., Watson R. J. RT Nucleotide preferences in sequence-specific recognition of DNA RT by c-myb protein RL Nucleic Acids Res. 19:3913-3919 (1991). XX // AC M00005 XX ID V$AP4_01 XX NA AP-4 XX DT EWI (created); 19.10.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 0 0 2 02 1 1 3 0 03 3 1 1 0 04 2 1 2 0 05 1 2 0 2 06 0 5 0 0 07 5 0 0 0 08 0 0 5 0 09 0 5 0 0 10 0 0 1 4 11 0 1 4 0 12 0 2 1 2 13 1 0 3 1 14 0 0 5 0 15 1 1 1 2 16 1 4 0 0 17 2 1 1 1 18 0 0 3 2 XX BF T00036; AP-4 ; human XX BA 5 elements from 5 genes XX CC compiled sequences XX RN [1] RA Mermod N., Williams T. J., Tjian R. 
RT Enhancer binding factors AP-4 and AP-1 act in concert to activate RT SV40 late transcription in vitro RL Nature 332:557-561 (1988). XX // AC M00006 XX ID V$MEF2_01 XX NA MEF-2 XX DT EWI (created); 19.10.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 3 1 1 02 0 0 1 4 03 0 4 0 1 04 0 0 0 5 05 5 0 0 0 06 5 0 0 0 07 5 0 0 0 08 5 0 0 0 09 5 0 0 0 10 0 0 0 5 11 5 0 0 0 12 5 0 0 0 13 0 5 0 0 14 0 2 0 3 15 0 4 0 1 16 0 2 0 3 17 0 0 0 0 XX BF T00505; MEF-2 ; mouse BF T01004; MEF-2 ; rat BF T01005; MEF-2 ; human XX BA 5 elements from 2 different genes (mouse/human/rat mck; rat/chicken mlc) XX CC compiled sequences XX RN [1] RA Gossett L. A., Kelvin D. J., Sternberg E. A., Olson E. N. RT A new myocyte-specific enhancer-binding factor that recognizes RT a conserved element associated with multiple muscle-specific genes RL Mol. Cell. Biol. 9:5022-5033 (1989). XX // AC M00007 XX ID V$ELK1_01 XX NA Elk-1 XX DT EWI (created); 27.10.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 0 02 1 1 1 1 03 2 0 1 1 04 2 1 1 0 05 3 0 1 0 06 0 3 1 0 07 2 2 0 0 08 0 0 4 0 09 0 0 4 0 10 4 0 0 0 11 3 0 0 1 12 0 0 4 0 13 0 1 0 3 14 1 1 1 1 15 1 3 0 0 16 1 1 2 0 17 1 1 0 2 XX BF T00250; Elk-1 ; human XX BA 4 functional and binding elements from 4 genes XX CC compiled sequences XX RN [1] RA Dalton S., Treisman R. RT Characterization of SAP-1, a protein recruited by serum response RT factor to the c-fos serum response element RL Cell 68:597-612 (1992). RN [2] RA Rao V. N., Reddy E. Sh. P. RT A divergent ets-related protein, Elk-1, recognizes similar c-ets-1 RT proto-oncogene target sequences and acts as a transcriptional RT activator RL Oncogene 7:65-70 (1992). XX // AC M00008 XX ID V$SP1_01 XX NA Sp1 XX DT EWI (created); 27.10.92. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 0 0 0 0 04 2 1 6 2 05 3 1 6 1 06 0 0 11 0 07 0 0 11 0 08 0 8 2 1 09 3 0 6 2 10 0 1 7 3 11 1 0 8 2 12 1 2 7 1 13 3 2 0 6 14 0 0 0 0 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00759; Sp1 ; human XX BA 11 sites selected by TDA XX RN [1] RA Thiesen H.-J., Bach Ch. RT Target detection assay (TDA): a versatile procedure to determine RT DNA binding sites as demonstrated on SP1 protein RL Nucleic Acids Res. 18:3202- (1990). XX // AC M00009 XX ID I$TTK69_01 XX NA Ttk 69K XX DT EWI (created); 03.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 0 0 0 0 04 0 0 0 0 05 0 0 4 0 06 0 1 3 0 07 0 0 0 4 08 0 4 0 0 09 0 4 0 0 10 0 0 0 4 11 0 0 4 0 12 0 3 1 0 13 0 0 0 0 14 0 0 0 0 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00843; Ttk 69K ; fruit fly XX BA 4 elements of the eve gene XX CC compiled sequences XX RN [1] RA Read D., Manley J. L. RT Alternatively spliced transcripts of the Drosophila tramtrack RT gene encode zinc finger proteins with distinct DNA binding sp RT ecificities RL EMBO J. 11:1035-1044 (1992). XX // AC M00010 XX ID P$O2_01 XX NA O2 (Opaque-2) XX DT EWI (created); 09.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 8 02 0 8 0 0 03 8 0 0 0 04 0 0 0 7 05 0 0 0 8 06 0 8 0 0 07 0 8 0 0 08 8 0 0 0 09 1 7 0 0 10 1 0 7 0 11 0 0 0 8 12 8 0 0 0 13 0 0 7 1 14 8 0 0 0 15 0 0 0 8 16 4 0 3 1 17 0 0 0 0 XX BF T00668; opaque-2 ; maize XX BA 8 O2 target sites out of 8 maize 22-kD zein genes XX CC compiled sequences XX RN [1] RA Schmidt R. J., Ketudat M., Aukerman M. J., Hoschek G. RT Opaque-2 is a transcriptional activator that recognizes a specific RT target site in 22-kD zein genes RL Plant Cell 4:689-700 (1992). XX // AC M00011 XX ID V$EVI1_06 XX NA Evi-1 XX DT EWI (created); 09.11.92. DT ewi (updated); 27.04.95. 
XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 0 0 0 0 04 16 0 0 0 05 0 16 0 0 06 15 0 0 1 07 16 0 0 0 08 0 0 16 0 09 16 0 0 0 10 0 0 0 16 11 15 0 0 1 12 12 1 3 0 13 0 0 0 0 14 0 0 0 0 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00273; Evi-1 ; mouse XX BA 16 sites selected from random oligonucleotides XX RN [1] RA Perkins A. S., Fishel R., Jenkins N. A., Copeland N. G. RT Evi-1, a murine zinc finger proto-oncogene, encodes a sequence-specific RT DNA-binding protein RL Mol. Cell. Biol. 11:2665-2674 (1991). XX // AC M00012 XX ID I$CF2II_01 XX NA CF2-II XX DT EWI (created); 10.11.92. DT ewi (updated); 27.04.95. XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 0 0 0 0 04 0 0 0 0 05 33 15 51 1 06 0 6 0 94 07 94 0 6 0 08 0 8 4 88 09 94 2 4 0 10 1 4 0 95 11 51 3 41 5 12 2 13 2 83 13 63 6 26 5 14 0 0 0 0 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00120; CF2-II ; fruit fly XX BA 80 binding sites selected from random oligonucleotides; 4th cycle XX RN [1] RA Gogos J. A., Hsu T., Bolton J., Kafatos F. C. RT Sequence discrimination by alternatively spliced isoforms of RT a DNA binding Zinc finger domain RL Science 257:1951-1954 (1992). XX // AC M00013 XX ID I$CF2II_02 XX NA CF2-II XX DT EWI (created); 10.11.92. DT ewi (updated); 27.04.95. XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 0 0 0 0 04 0 0 0 0 05 26 4 67 0 06 0 7 0 93 07 94 0 6 0 08 0 9 1 90 09 96 1 3 0 10 0 4 0 96 11 70 3 26 1 12 3 7 0 90 13 81 0 16 3 14 0 0 0 0 15 0 0 0 0 16 0 0 0 0 17 0 0 0 0 XX BF T00120; CF2-II ; fruit fly XX BA 70 binding sites selected from random oligonucleotides; 5th cycle XX RN [1] RA Gogos J. A., Hsu T., Bolton J., Kafatos F. C. RT Sequence discrimination by alternatively spliced isoforms of RT a DNA binding Zinc finger domain RL Science 257:1951-1954 (1992). XX // AC M00014 XX ID F$REPCAR1_01 XX NA (repressor of CAR1 expression) XX DT EWI (created); 13.11.92. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 2 3 3 4 02 5 2 2 3 03 2 5 4 4 04 1 0 2 12 05 14 0 0 1 06 2 0 13 0 07 0 15 0 0 08 0 15 0 0 09 2 0 12 1 10 0 14 1 0 11 0 13 2 0 12 5 2 7 1 13 10 2 2 1 14 6 3 5 1 15 6 2 7 0 16 4 4 5 1 17 4 5 4 1 XX BF T00726; repressor of CAR1 expression ; yeast XX BA 15 elements; 1 repressing element (CAR1 gene) and 14 homologous binding sites XX CC compiled sequences XX RN [1] RA Luche R. M., Sumrada R., Cooper T. G. RT A cis-acting element present in multiple genes serves as a repressor RT protein binding site for the yeast CAR1 gene RL Mol. Cell. Biol. 10:3884-3895 (1990). XX // AC M00015 XX ID F$ABF1_01 XX NA ABF1 XX DT EWI (created); 16.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 7 6 3 7 02 6 10 4 4 03 13 0 4 7 04 4 6 4 10 05 15 0 9 0 06 0 0 1 23 07 1 23 0 0 08 18 0 5 1 09 0 10 0 14 10 1 3 4 16 11 11 4 3 6 12 4 4 1 15 13 13 2 4 5 14 6 5 5 8 15 24 0 0 0 16 0 24 0 0 17 0 0 24 0 18 10 0 6 8 19 12 3 5 4 20 9 5 3 6 21 6 7 6 4 22 7 4 2 10 XX BF T00056; BAF1 ; yeast XX BA 24 identified genomic binding sites for ABF1-related factors XX CC compiled sequences XX RN [1] RA Della Seta F., Treich I., Buhler J.-M., Sentenac A. RT ABF1 binding sites in yeast RNA polymerase genes RL J. Biol. Chem. 265:15168-15175 (1990). XX // AC M00016 XX ID I$E74A_01 XX NA E74A XX DT EWI (created); 17.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 4 5 2 3 02 3 3 6 2 03 11 1 2 1 04 11 0 2 3 05 2 7 2 6 06 2 12 2 1 07 5 12 0 0 08 0 0 17 0 09 0 0 17 0 10 17 0 0 0 11 17 0 0 0 12 5 1 10 1 13 1 2 1 11 14 3 2 7 3 15 5 3 4 3 16 2 1 6 5 17 2 3 6 1 XX BF T00208; E74A ; fruit fly XX BA 17 recognition sequences selected from random oligonucleotides XX RN [1] RA Urness L. D., Thummel C. S. RT Molecular interactions within the ecdysone regulatory hierarchy: RT DNA binding properties of the Drosophila ecdysone-inducible RT E74A protein RL Cell 63:47-61 (1990). XX // AC M00017 XX ID V$ATF_01 XX NA ATF XX DT EWI (created); 17.11.92. DT ewi (updated); 30.05.95. 
XX PO A C G T 01 0 0 0 0 02 0 0 0 0 03 5 14 2 4 04 2 6 8 9 05 3 10 10 2 06 0 0 0 25 07 0 0 25 0 08 25 0 0 0 09 0 25 0 0 10 0 0 25 0 11 4 2 1 18 12 7 11 3 4 13 9 1 7 8 14 6 7 7 5 15 1 11 4 9 16 2 13 5 5 17 0 0 0 0 XX BA 25 known binding sites in 15 genes XX CC slightly differing from E4 consensus XX RN [1] RA Rooney R. J., Raychaudhuri P., Nevins J. R. RT E4F and ATF, two transcription factors that recognize the same RT site, can be distinguished both physically and functionally: RT a role for E4F in E1A trans activation RL Mol. Cell. Biol. 10:5138-5149 (1990). XX // AC M00018 XX ID I$UBX_01 XX NA Ubx XX DT EWI (created); 30.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 4 4 12 7 02 9 5 8 8 03 9 7 14 4 04 7 7 20 7 05 12 9 18 6 06 9 18 16 12 07 9 13 8 32 08 3 0 0 76 09 74 3 3 6 10 85 0 2 1 11 0 3 3 82 12 8 4 41 21 13 13 3 41 2 14 9 23 17 8 15 6 17 11 20 16 11 11 4 22 17 9 14 7 11 18 6 5 15 4 19 4 3 7 10 XX BF T00863; Ubx ; fruit fly XX BA 88 selected binding sequences XX CC 3 rounds of binding and amplification of random dodekamers bound CC to the bacterially expressed Ubx homeodomain XX RN [1] RA Ekker S. C., Young K. E., von Kessler D. P., Beachy P. A. RT Optimal DNA sequence recognition by the Ultrabithorax homeodomain RT of Drosophila RL EMBO J. 10:1179-1186 (1991). XX // AC M00019 XX ID I$DFD_01 XX NA Dfd XX DT EWI (created); 30.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 0.99 1.03 0.99 0.99 02 1.01 0.97 1.01 1.01 03 1.13 0.93 1.05 0.89 04 0.98 0.90 1.23 0.89 05 0.80 1.09 1.12 0.99 06 1.42 0.61 1.40 0.57 07 4.00 0.00 0.00 0.00 08 0.00 0.00 0.00 4.00 09 0.00 0.00 0.00 4.00 10 4.00 0.00 0.00 0.00 11 1.25 1.79 0.45 0.51 12 0.33 1.32 0.26 2.09 13 0.97 0.83 1.02 1.17 14 0.97 0.82 1.37 0.85 15 1.01 0.99 1.03 0.97 16 0.99 1.01 0.98 1.03 XX BF T00193; Dfd ; fruit fly XX BA 57 selected sequences, normalized to a sum of 4 for all 4 bases BA in each position; central ATTA was fixed XX RN [1] RA Ekker St. C., von Kessler D. P., Beachy Ph. A. 
RT Differential DNA sequence recognition is a determinant of specificity RT in homeotic gene action RL EMBO J. 11:4059-4072 (1992). XX // AC M00020 XX ID I$FTZ_01 XX NA Ftz XX DT EWI (created); 30.11.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 5 2 2 0 02 4 2 2 1 03 4 0 2 3 04 2 2 5 0 05 0 6 0 3 06 7 0 1 1 07 9 0 0 0 08 0 0 0 9 09 0 0 0 9 10 9 0 0 0 11 9 0 0 0 12 0 0 9 0 XX BF T00295; Ftz ; fruit fly XX BA 9 selected sequences bound by the Ftz homeo domain, ATTAAG was fixed ! XX CC only sequences with KD < 4xKD(consensus) were taken XX RN [1] RA Florence B., Handrow R., Laughon A. RT DNA-binding specificity of th fushi tarazu homeodomain RL Mol. Cell. Biol. 11:3613-3623 (1991). XX // AC M00021 XX ID I$KR_01 XX NA Kr (Krueppel) XX DT EWI (created); 03.12.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 5 0 1 0 02 3 2 0 1 03 1 3 0 2 04 0 0 6 0 05 0 0 6 0 06 1 0 5 0 07 1 0 0 5 08 0 0 0 6 09 4 0 1 1 10 3 1 0 2 XX BF T00456; Kr ; fruit fly XX BA 6 Krueppel binding sites within the eve promoter XX CC compiled sequences XX RN [1] RA Stanojevic D., Hoey T., Levine M. RT Sequence-specific DNA-binding activities of the gap proteins RT encoded by hunchback and Krueppel in Drosophila RL Nature 341:331-335 (1989). XX // AC M00022 XX ID I$HB_01 XX NA Hb (Hunchback) XX DT EWI (created); 03.12.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 5 8 2 02 6 8 2 0 03 9 3 4 0 04 4 3 1 8 05 13 1 0 2 06 16 0 0 0 07 16 0 0 0 08 14 0 2 0 09 15 1 0 0 10 9 2 2 3 XX BF T00395; Hb ; fruit fly XX BA 16 Hb binding sites within the eve promoter XX CC compiled sequences XX RN [1] RA Stanojevic D., Hoey T., Levine M. RT Sequence-specific DNA-binding activities of the gap proteins RT encoded by hunchback and Krueppel in Drosophila RL Nature 341:331-335 (1989). XX // AC M00023 XX ID V$HOX13_01 XX NA Hox-1.3 XX DT EWI (created); 03.12.92. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 1 2 1 6 02 0 2 7 1 03 1 7 1 1 04 3 4 1 2 05 4 3 0 3 06 4 2 1 3 07 1 5 2 2 08 3 2 0 5 09 1 4 1 4 10 1 9 0 0 11 1 6 1 2 12 1 5 0 4 13 2 7 0 1 14 10 0 0 0 15 0 0 0 10 16 0 0 0 10 17 10 0 0 0 18 0 2 5 3 19 0 2 0 8 20 2 2 5 1 21 2 3 3 2 22 3 0 3 4 23 2 5 1 2 24 3 2 4 1 25 4 4 1 1 26 2 4 3 1 27 3 3 0 4 28 1 5 0 4 29 1 7 1 1 30 4 2 2 2 XX BF T00377; Hox-1.3 ; mouse XX BA 10 Hox-1.3 in vitro binding sites of 10 different genes XX CC compiled sequences XX RN [1] RA Odenwald W. F., Garbern J., Arnheiter H., Tournier-Lasserve RA E., Lazzarini R. A. RT The HOX-1.3 homeo box protein is a sequence-specific DNA-binding RT phosphoprotein RL Genes Dev. 3:158-172 (1989). XX // AC M00024 XX ID V$E2F_01 XX NA E2F XX DT EWI (created); 04.12.92. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 0 0 4 02 2 1 0 2 03 1 2 2 0 04 0 1 4 0 05 0 5 0 0 06 0 0 5 0 07 0 4 1 0 08 0 0 5 0 09 5 0 0 0 10 5 0 0 0 11 5 0 0 0 12 5 0 0 0 13 1 2 0 2 14 0 0 2 3 15 2 0 3 0 XX BF T00219; E2F ; mouse BF T00220; E2F ; monkey BF T00221; E2F ; human XX BA 5 E2F binding sites in 3 E1A-inducible promoters XX CC compiled sequences XX RN [1] RA Thalmeier K., Synovzik H., Mertz R., Winnacker E.-L., Lipp M. RT Nuclear factor E2F mediates basic transcription and trans-activation RT by E1a of the human MYC promoter RL Genes Dev. 3:527-536 (1989). XX // AC M00025 XX ID V$ELK1_02 XX NA Elk-1 XX DT EWI (created); 31.01.94. DT ewi (updated); 27.04.95. XX PO A C G T 01 3 12 5 11 02 5 11 10 5 03 10 9 7 5 04 12 3 11 5 05 6 20 3 2 06 5 24 1 1 07 0 0 31 0 08 0 0 31 0 09 31 0 0 0 10 27 0 0 4 11 11 3 17 0 12 2 8 5 16 13 8 10 6 7 14 8 11 6 6 15 0 0 0 0 XX BF T00250; Elk-1 ; human XX BA 31 selected sites XX CC single binding sites in variable distances to SRF-binding sites XX RN [1] RA Treisman R., Marais R., Wynne J. RT Spatial flexibility in ternary complexes between SRF and its RT accessory proteins RL EMBO J. 11:4631-4640 (1992). XX // AC M00026 XX ID V$RSRFC4_01 XX NA RSRFC4 XX DT EWI (created); 31.01.94. 
DT ewi (updated); 22.06.95. XX PO A C G T 01 17 4 12 5 02 10 2 13 13 03 3 3 12 20 04 0 35 0 3 05 0 0 0 38 06 38 0 0 0 07 1 0 0 37 08 5 0 0 33 09 8 0 0 30 10 23 0 0 15 11 0 0 0 38 12 38 0 0 0 13 4 1 33 0 14 14 20 0 4 15 12 6 2 18 16 5 9 5 19 XX BF T01009; RSRFC4 ; human XX BA 38 selected binding sites XX RN [1] RA Pollock R., Treisman R. RT Human SRF-related proteins: DNA-binding properties and potential RT regulatory targets RL Genes Dev. 5:2327-2341 (1991). XX // AC M00027 XX ID F$ABAA_01 XX NA AbaA XX DT EWI (created); 23.07.94. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 10 2 7 02 5 9 6 2 03 2 6 5 9 04 0 6 6 10 05 0 10 2 10 06 1 8 1 12 07 7 2 7 6 08 0 22 0 0 09 22 0 0 0 10 0 0 0 22 11 0 0 0 22 12 0 22 0 0 13 0 15 0 7 14 7 5 2 8 15 5 4 7 6 16 4 7 3 8 17 3 6 8 5 18 6 9 4 3 19 1 7 5 9 XX BF T01085; abaA ; Aspergillus nidulans XX BA 22 elements from 5 developmentally regulated genes XX CC compiled sequences XX RN [1] RA Adrianopoulos A., Timberlake W. E. RT The Aspergillus nidulans abaA gene encodes a transcriptional RT activator that acts as a genetic switch to control development RL Mol. Cell. Biol. 14:2503-2515 (1994). XX // AC M00028 XX ID I$HSF_01 XX NA HSF (Drosophila) XX DT EWI (created); 18.10.94. DT ewi (updated); 22.06.95. XX PO A C G T 01 31 14 4 1 02 0 0 50 0 03 48 2 0 0 04 43 1 5 1 05 23 1 14 12 XX BF T00386; HSTF ; fruit fly XX BA 50 functional genomic HSEs XX CC not included are sequences with more than 2 mismatches at positions CC 2 to 4; the matrix only describes the properties of the basic CC 5-bp unit of which three have to be present to constitute a minimal HSE XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00029 XX ID F$HSF_01 XX NA HSF (yeast) XX DT EWI (created); 18.10.94. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 28 6 12 4 02 0 0 48 2 03 46 0 2 2 04 46 0 2 2 05 12 19 8 11 XX BF T00385; HSTF ; yeast XX BA 50 functional genomic HSEs XX CC the matrix only describes the properties of the basic 5-bp unit CC of which three have to be present to constitute a minimal HSE; CC not included are sequences with more than 2 mismatches at positions 2 to 4 XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00030 XX ID F$MATA1_01 XX NA MATa1 XX DT EWI (created); 18.10.94. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 1 1 12 02 0 0 14 0 03 14 0 0 0 04 0 0 0 14 05 0 0 14 0 06 1 2 0 11 07 10 0 3 1 08 6 2 4 2 09 5 4 1 4 10 2 1 1 10 XX BF T00486; MATalpha1 ; yeast XX BA a1 half-sites of 14 hsg operators of 4 genes XX CC compiled sequences XX RN [1] RA Goutte C., Johnson A. D. RT Recognition of a DNA operator by a dimer composed of two different RT homeodomain proteins RL EMBO J. 13:1434-1442 (1994). XX // AC M00031 XX ID F$MATALPHA2_01 XX NA MATalpha2 XX DT EWI (created); 18.10.94. DT ewi (updated); 22.06.95. XX PO A C G T 01 2 2 3 7 02 2 10 1 1 03 8 1 4 1 04 1 1 0 12 05 0 0 13 1 06 0 1 0 13 07 5 3 2 4 08 7 2 2 3 09 7 1 0 6 10 6 3 1 4 XX BF T00487; MATalpha2 ; yeast XX BA alpha2 half-sites of 14 hsg operators of 4 genes XX CC compiled sequences XX RN [1] RA Goutte C., Johnson A. D. RT Recognition of a DNA operator by a dimer composed of two different RT homeodomain proteins RL EMBO J. 13:1434-1442 (1994). XX // AC M00032 XX ID V$CETS1P54_01 XX NA c-Ets-1(p54) XX DT EWI (created); 10.12.94. DT ewi (updated); 27.04.95. 
XX PO A C G T 01 7 2 3 3 02 1 14 0 0 03 5 9 0 1 04 0 0 15 0 05 0 0 15 0 06 15 0 0 0 07 8 2 0 5 08 4 1 10 0 09 1 6 0 8 10 5 4 4 2 XX BF T00111; c-Ets-1 ; mouse BF T00114; c-Ets-1 54 ; chick XX BA 15 selected binding sites for bacterially expressed murine factor XX CC Dissociation constants range between 0.038 nM and >3 nM XX RN [1] RA Nye J. A., Petersen J. M., Gunther C. V., Jonsen M. D., Graves B. J. RT Interaction of murine Ets-1 with GGA-binding sites establishes RT the Ets domain as a new DNA-binding motif RL Genes Dev. 6:975-990 (1992). XX // AC M00033 XX ID V$P300_01 XX NA p300 XX DT ewi (created); 10.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 1 3 3 02 3 5 2 2 03 5 3 4 1 04 4 1 8 2 05 2 0 14 0 06 2 1 10 3 07 13 1 1 1 08 0 0 14 0 09 1 0 0 15 10 3 1 8 4 11 5 4 3 4 12 3 3 7 3 13 4 4 2 5 14 1 5 6 2 XX BF T01427; p300 ; human XX BA 16 selected p300 binding oligonucleotides XX RN [1] RA Rikitake Y., Moran E. RT DNA-binding properties of the E1A-associated 300-Kilodalton protein RL Mol. Cell. Biol. 12:2826-2836 (1992). XX // AC M00034 XX ID V$P53_01 XX NA p53 XX DT ewi (created); 10.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 4 0 13 0 02 5 0 12 0 03 15 0 2 0 04 0 17 0 0 05 17 0 0 0 06 0 0 0 17 07 0 0 17 0 08 0 13 0 4 09 0 17 0 0 10 0 17 0 0 11 0 0 17 0 12 0 0 17 0 13 2 0 15 0 14 0 17 0 0 15 17 0 0 0 16 0 0 0 17 17 0 0 17 0 18 0 2 0 15 19 0 13 0 4 20 0 7 2 7 XX BF T00671; p53 ; human XX BA 17 selected binding sequences XX CC six CASTing cycles with p53 (overexpressed in murine C2C12 cells) CC using random 35-mers XX RN [1] RA Funk W. D., Pak D. T., Karas R. H., Wright W. E., Shay J. W. RT A transcriptionally active DNA-binding site for human p53 protein RT complexes RL Mol. Cell. Biol. 12:2866-2871 (1992). XX // AC M00035 XX ID V$VMAF_01 XX NA v-Maf XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 17 3 8 11 02 12 7 8 12 03 19 7 6 7 04 5 6 1 27 05 2 0 36 1 06 9 29 0 1 07 2 2 0 35 08 1 2 33 3 09 37 0 2 0 10 0 34 0 5 11 5 5 6 23 12 8 21 7 3 13 23 1 7 8 14 1 0 36 2 15 4 29 4 2 16 24 5 7 3 17 7 18 6 8 18 12 11 9 7 19 12 10 9 8 XX BF T01430; v-Maf ; AS42 XX BA 39 selected v-Maf binding sites of TRE-type XX CC PCR amplification of random eikosamers with fixed flanks binding CC to recombinant bZIP domain of v-Maf fused to maltose-binding CC protein and expressed in E. coli XX RN [1] RA Kataoka K., Noda M., Nishizawa M. RT Maf nuclear oncoprotein recognizes sequences related to an RT AP-1 site and forms heterodimers with both Fos and Jun RL Mol. Cell. Biol. 14:700-712 (1994). XX // AC M00036 XX ID V$VJUN_01 XX NA v-Jun XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 6 8 5 5 02 2 11 3 8 03 6 0 15 3 04 20 0 4 0 05 0 0 0 24 06 0 0 24 0 07 24 0 0 0 08 0 24 0 0 09 0 0 24 0 10 0 0 0 24 11 0 24 0 0 12 24 0 0 0 13 0 2 1 21 14 5 8 4 7 15 3 13 4 4 16 5 10 0 9 XX BF T00893; v-Jun ; ASV XX BA 24 selected binding sites XX CC PCR amplification of random eikosamers with fixed flanks binding CC to recombinant bZIP domain of v-Jun fused to maltose-binding CC protein and expressed in E. coli XX XX // AC M00037 XX ID V$NFE2_01 XX NA NF-E2 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 2 2 0 5 02 2 0 7 0 03 1 8 0 0 04 0 0 0 9 05 0 0 9 0 06 9 0 0 0 07 0 3 6 0 08 0 0 1 8 09 0 8 0 1 10 8 1 0 0 11 1 4 0 4 XX BF T00557; NF-E2 ; mouse BF T00558; NF-E2 ; human BF T01141; NF-E2 ; chick BF T01440; NF-E2 p45 ; human BF T01441; NF-E2 p45 ; mouse XX BA 9 synthetic binding sites XX CC compiled oligonucleotides which compete in gel shift experiments CC for binding to purified NF-E2 (from mouse MEL cells) XX RN [1] RA Andrews N. C., Erdjument-Bromage H., Davidson M. B., Tempst RA P., Orkin S. H. RT Erythroid transcription factor NF-E2 is a haematopoietic-specific RT basic-leucine zipper protein RL Nature 362:722-728 (1993). 
XX // AC M00038 XX ID F$GCN4 XX NA GCN4 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 4 6 2 2 02 2 7 5 1 03 7 6 2 3 04 2 7 9 2 05 3 6 6 8 06 5 4 10 6 07 5 11 7 4 08 4 12 9 5 09 6 9 11 11 10 19 6 15 2 11 0 0 0 43 12 0 0 43 0 13 43 0 0 0 14 0 43 0 0 15 2 0 0 41 16 1 42 0 0 17 43 0 0 0 18 1 11 5 26 19 12 16 2 8 20 4 14 13 3 21 12 10 4 8 22 6 9 11 7 23 7 10 4 10 24 4 8 11 5 25 5 10 6 6 26 8 7 3 7 27 6 7 4 6 XX BF T00321; GCN4 ; yeast XX BA 43 selected GCN4-binding sites XX CC oligonucleotides were affinity-selected on a GCN4-column containing CC bacterially expressed factor XX RN [1] RA Oliphant A. R., Brandl C. J., Struhl K. RT Defining the sequence specificity of DNA-binding proteins by RT selecting binding sites from random-sequence oligonucleotides: RT analysis of yeast GCN4 protein RL Mol. Cell. Biol. 9:2944-2949 (1989). XX // AC M00039 XX ID V$CREB_01 XX NA CREB XX DT ewi (created); 12.04.95. DT ewi (updated); 12.05.95. XX PO A C G T 01 0 0 0 29 02 0 0 28 1 03 29 0 0 0 04 0 29 0 0 05 0 1 28 0 06 0 1 1 27 07 12 16 1 0 08 17 1 4 7 XX BF T00163; CREB ; human BF T00164; CREB ; rat BF T00165; deltaCREB ; rat BF T00166; deltaCREB ; human BF T00989; CREB ; mouse BF T01311; deltaCREB ; mouse XX BA 29 selected binding oligonucleotides XX CC sequences were selected by CASTing technique using deltaCREB CC translated in vitro in reticulocyte lysate and binding as homodimer XX RN [1] RA Benbrook D. M., Jones N. C. RT Different binding specificities and transcactivation of variant RT CRE's by CREB complexes RL Nucleic Acids Res. 22:1463-1469 (1994). XX // AC M00040 XX ID V$CREBP1_01 XX NA CRE-BP1 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 47 02 0 0 8 39 03 47 0 0 0 04 0 47 0 0 05 4 0 43 0 06 0 0 0 47 07 39 7 0 1 08 45 2 0 0 XX BF T00167; CRE-BP1 ; human XX BA 47 selected binding sequences XX CC sequences were selected by CASTing technique using CRE-BP1 XX RN [1] RA Benbrook D. M., Jones N. C. 
RT Different binding specificities and transcactivation of variant RT CRE's by CREB complexes RL Nucleic Acids Res. 22:1463-1469 (1994). XX // AC M00041 XX ID V$CREBP1CJUN_01 XX NA CRE-BP1/c-Jun XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 1 0 42 02 2 0 41 0 03 43 0 0 0 04 0 43 0 0 05 2 1 40 0 06 0 0 0 43 07 0 20 3 20 08 40 1 0 2 XX BF T00131; c-Jun ; mouse BF T00132; c-Jun ; rat BF T00133; c-Jun ; human BF T00134; c-Jun ; chick BF T00167; CRE-BP1 ; human XX BA 43 selected binding sequences XX CC sequences were selected by CASTing technique using co-translated CC CRE-BP1 and c-Jun binding to DNA as heterodimers XX RN [1] RA Benbrook D. M., Jones N. C. RT Different binding specificities and transcactivation of variant RT CRE's by CREB complexes RL Nucleic Acids Res. 22:1463-1469 (1994). XX // AC M00042 XX ID V$SOX5_01 XX NA Sox-5 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 4 6 3 9 02 7 4 3 8 03 21 0 1 1 04 22 0 1 0 05 0 22 0 1 06 23 0 0 0 07 22 1 0 0 08 0 0 0 23 09 10 3 6 4 10 5 7 6 5 XX BF T01429; Sox-5 ; mouse XX BA 23 selected binding sequences XX CC oligonucleotides binding to bacterially expressed GST-Sox-5 CC fusion protein were selected by 5 selection/amplification rounds XX RN [1] RA Denny P., Swift S., Connor F., Ashworth A. RT An SRY-related gene expressed during spermatogenesis in the RT mouse encodes a sequence-specific DNA-binding protein RL EMBO J. 11:3705-3712 (1992). XX // AC M00043 XX ID I$DL_01 XX NA dl XX DT ewi (created); 12.04.95. DT ewi (updated); 18.05.95. XX PO A C G T 01 0 2 18 2 02 0 0 22 0 03 0 0 22 0 04 4 2 0 16 05 4 1 0 17 06 1 0 0 12 07 0 0 0 22 08 0 0 0 22 09 1 21 0 0 10 0 22 0 0 11 5 8 7 2 XX BF T00196; dl ; fruit fly XX BA 22 selected binding sequences XX CC SAAB analysis of oligonucleotides binding to bacterially expressed CC dorsal (dl) protein; alignment produces many gaps in position 6 XX RN [1] RA Pan D., Courey A. J. 
RT The same dorsal binding site mediates both activation and repression RT in a context-dependent manner RL EMBO J. 11:1837-1842 (1992). XX // AC M00044 XX ID I$SN_02 XX NA Sn XX DT ewi (created); 12.04.95. DT ewi (updated); 27.04.95. XX PO A C G T 01 5 2 1 1 02 1 4 3 1 03 2 5 1 1 04 8 0 1 0 05 1 5 2 1 06 0 5 2 2 07 0 0 0 9 08 0 0 9 0 09 2 2 0 5 10 0 0 0 9 11 2 1 2 4 12 3 2 3 1 13 1 5 2 1 14 6 1 0 2 XX BF T00751; Sn ; fruit fly XX BA 9 binding sites from Drosophila sim gene XX CC bacterially expressed Snail (Sna, Sn) protein XX RN [1] RA Kasai Y., Nambu J. R., Lieberman P. M., Crews S. T. RT Dorsal-ventral patterning in Drosophila: DNA binding of snail RT protein to the single-minded gene RL Proc. Natl. Acad. Sci. USA 89:3414-3418 (1992). XX // AC M00045 XX ID V$E4BP4_01 XX NA E4BP4 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 6 9 3 5 02 9 3 11 0 03 1 0 0 22 04 0 0 2 21 05 23 0 0 0 06 0 8 0 15 07 1 0 22 0 08 0 0 0 23 09 23 0 0 0 10 23 0 0 0 11 0 11 4 8 12 7 4 8 4 XX BF T01428; E4BP4 ; human XX BA 23 selected binding sequences XX CC high-affinity binding sites for in vitro translated E4BP4 XX RN [1] RA Cowell I. G., Skinner A., Hurst H. C. RT Transcriptional repression by a novel member of the bZIP family RT of transcription factors RL Mol. Cell. Biol. 12:3070-3077 (1992). XX // AC M00046 XX ID F$GCR1_01 XX NA GCR1 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 2 0 4 0 02 1 0 4 1 03 0 6 0 0 04 1 0 0 5 05 0 0 0 6 06 0 6 0 0 07 0 6 0 0 08 3 0 0 3 09 1 4 1 0 XX BF T00322; GCR1 ; yeast XX BA 6 genomic binding sites from 5 genes XX CC proven by gel shift assays XX RN [1] RA Huie M. A., Scott E. W., Drazinic C. M., Lopez M. C., Hornstra RA I. K., Yang T. P., Baker H. V. RT Characterization of the DNA-binding activity of GCR1: In vivo RT evidence for two GCR1-binding sites in the upstream activating RT sequence of TPI of Saccharomyces cerevisiae RL Mol. Cell. Biol. 12:2690-2700 (1992).
XX // AC M00047 XX ID F$QA1F_01 XX NA qa-1F XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 7 0 3 02 2 6 1 4 03 0 1 12 0 04 0 0 13 0 05 4 3 4 2 06 1 0 1 11 07 10 1 2 0 08 12 0 0 1 09 5 0 4 4 10 1 5 2 5 11 5 2 5 1 12 2 6 0 5 13 1 3 0 9 14 1 3 0 9 15 11 1 0 1 16 3 1 0 9 17 1 10 1 1 18 2 10 1 0 19 1 3 5 4 20 0 5 5 3 XX BF T00709; qa-1F ; Neurospora crassa XX BA 13 binding sites from 5 genes XX CC identified by footprinting XX RN [1] RA Baum J. A., Geever R., Giles N. H. RT Expression of qa-1F activator protein: identification of upstream RT binding sites in the qa gene cluster and localization of the RT DNA-binding domain RL Mol. Cell. Biol. 7:1256-1266 (1987). XX // AC M00048 XX ID F$ADR1_01 XX NA ADR1 XX DT ewi (created); 12.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 5 2 6 02 0 0 16 0 03 0 0 16 0 04 6 0 10 0 05 0 0 15 0 06 0 1 3 4 XX BF T00011; ADR1 ; yeast XX BA 16 selected binding sequences XX CC random octameric oligonucleotides with fixed flanks bound to CC bacterially synthesized ADR1 XX RN [1] RA Cheng C., Kacherovsky N., Dombek K. M., Camier S., Thukral S. RA K., Rhim E., Young E. T. RT Identification of potential target genes for Adr1p through characterization RT of essential nucleotides in UAS1 RL Mol. Cell. Biol. 14:3842-3852 (1994). XX // AC M00049 XX ID F$GAL4_01 XX NA GAL4 XX DT ewi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 5 3 2 02 5 2 1 3 03 3 2 1 5 04 1 10 0 0 05 0 0 10 1 06 0 1 10 0 07 4 3 3 1 08 1 3 4 3 09 2 4 4 1 10 7 0 2 2 11 1 8 2 0 12 4 1 0 6 13 1 3 5 2 14 0 2 1 8 15 1 6 2 2 16 1 5 4 1 17 2 1 1 7 18 0 10 1 0 19 0 11 0 0 20 0 0 11 0 21 8 0 0 3 22 7 0 4 0 23 2 6 3 0 XX BF T00302; GAL4 ; yeast XX BA 11 genomic binding sites from 6 genes XX CC compiled sequences XX RN [1] RA Bram R. J., Lue N. F., Kornberg R. D. RT A GAL family of upstream activating sequences in yeast: roles RT in both induction and repression of transcription RL EMBO J. 5:603-608 (1986). 
XX // AC M00050 XX ID V$E2F_02 XX NA E2F XX DT ewi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 0 12 02 0 0 0 12 03 0 0 0 12 04 0 6 6 0 05 0 2 10 0 06 0 12 0 0 07 0 0 12 0 08 0 11 1 0 XX BA 12 binding sites from 7 cellular and 1 viral genes XX CC compiled sequences XX RN [1] RA Nevins R. RT E2F: A link between the Rb tumor suppressor protein and viral RT oncoproteins RL Science 258:424-429 (1992). XX // AC M00051 XX ID V$NFKAPPAB50_01 XX NA NF-kappaB (p50) XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 18 0 02 0 0 18 0 03 0 0 18 0 04 2 0 16 0 05 16 1 0 1 06 0 0 3 15 07 0 7 1 10 08 0 16 0 2 09 0 18 0 0 10 0 17 1 0 XX BF T00593; NF-kappaB1 ; human XX BA 18 selected binding sequences XX CC oligonucleotides binding to bacterially expressed NF-kappaB CC (p50) were selected (gel shift) and amplified (PCR) in 3 cycles XX RN [1] RA Kunsch C., Ruben S. M., Rosen C. A. RT Selection of optimal kappaB/Rel DNA-binding motifs: interaction RT of both subunits of NF-kappaB with DNA is required for transcriptional RT activation RL Mol. Cell. Biol. 12:4412-4421 (1992). XX // AC M00052 XX ID V$NFKAPPAB65_01 XX NA NF-kappaB (p65) XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 4 10 4 02 0 0 18 0 03 0 0 18 0 04 11 0 7 0 05 10 3 3 2 06 1 0 0 17 07 0 0 0 18 08 0 3 0 15 09 0 18 0 0 10 0 18 0 0 XX BF T00594; RelA ; human BF T00595; RelA ; mouse XX BA 18 selected binding sequences XX CC oligonucleotides binding to bacterially expressed NF-kappaB CC (p65) were selected (gel shift) and amplified (PCR) in 3 cycles XX RN [1] RA Kunsch C., Ruben S. M., Rosen C. A. RT Selection of optimal kappaB/Rel DNA-binding motifs: interaction RT of both subunits of NF-kappaB with DNA is required for transcriptional RT activation RL Mol. Cell. Biol. 12:4412-4421 (1992). XX // AC M00053 XX ID V$CREL_01 XX NA c-Rel XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 0 5 8 4 02 0 0 16 1 03 1 0 15 1 04 5 1 9 2 05 6 5 3 3 06 5 1 1 10 07 1 0 0 16 08 2 0 0 15 09 0 15 0 2 10 1 16 0 0 XX BF T00168; c-Rel ; human BF T00169; c-Rel ; mouse BF T01154; c-Rel ; chick XX BA 17 selected binding sequences XX CC oligonucleotides binding to bacterially expressed c-Rel were CC selected (gel shift) and amplified (PCR) in 3 cycles XX RN [1] RA Kunsch C., Ruben S. M., Rosen C. A. RT Selection of optimal kappaB/Rel DNA-binding motifs: interaction RT of both subunits of NF-kappaB with DNA is required for transcriptional RT activation RL Mol. Cell. Biol. 12:4412-4421 (1992). XX // AC M00054 XX ID V$NFKAPPAB_01 XX NA NF-kappaB XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 33 0 02 0 0 33 0 03 0 0 33 0 04 23 0 10 0 05 15 10 3 5 06 9 2 1 21 07 0 5 3 25 08 1 15 0 17 09 2 31 0 0 10 0 33 0 0 XX BF T00587; NF-kappaB ; rat BF T00588; NF-kappaB ; mouse BF T00590; NF-kappaB ; human XX BA 33 binding sites from 18 genes (15 cellular genes and 3 viral genomes) XX CC compiled sequences XX RN [1] RA Zabel U., Schreck R., Baeuerle P. A. RT DNA binding of purified transcription factor NF-kappaB RL J. Biol. Chem. 266:252-260 (1991). XX // AC M00055 XX ID V$NMYC_01 XX NA N-Myc XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 7 11 6 16 02 10 15 6 9 03 9 20 6 5 04 0 39 1 0 05 32 0 4 4 06 0 39 0 1 07 2 0 38 0 08 0 3 0 37 09 0 0 40 0 10 6 11 8 15 11 11 16 5 8 12 11 11 11 7 XX BF T01445; N-Myc ; mouse XX BA 40 selected binding sequences XX CC oligonucleotides bound to bacterially expressed GST-N-Myc fusion protein XX RN [1] RA Alex R., Sözeri O., Meyer S., Dildrop R. RT Determination of the DNA sequence recognized by the bHLH-zip RT domain of the N-Myc protein RL Nucleic Acids Res. 20:2257-2263 (1992). XX // AC M00056 XX ID V$NF1_01 XX NA NF-1 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 1 5 0 2 02 5 0 3 0 03 0 4 3 1 04 1 6 0 1 05 1 1 1 5 06 0 2 5 1 07 0 1 0 7 08 0 2 2 4 09 0 3 2 3 10 3 1 1 3 11 3 1 2 2 12 2 0 1 5 13 0 2 0 6 14 0 0 1 7 15 0 0 7 1 16 0 0 8 0 17 1 5 0 1 18 5 0 2 1 19 1 5 1 1 20 0 2 4 2 21 0 3 4 1 22 3 1 1 3 23 0 1 7 0 24 0 7 0 1 25 1 7 0 0 26 6 0 1 1 27 4 0 3 1 28 1 4 1 2 29 3 2 0 3 XX BF T00533; NF-1 ; cat BF T00534; NF-1 ; monkey BF T00535; NF-1 ; rat BF T00536; NF-1 ; clawed frog BF T00537; NF-1 ; mouse BF T00538; NF-1 ; domestic pig BF T00539; NF-1 ; human BF T00544; NF-1A1 ; chick BF T00551; NF-1B1 ; chick BF T00552; NF-1B2 ; chick BF T00554; NF-1C2 ; chick BF T00599; NF-1/L ; rat BF T00604; NF-1/Red1 ; hamster BF T00610; NF-1/X ; hamster BF T01036; NF-1 ; hamster BF T01298; NF-1 ; sheep XX BA 8 selected binding sequences XX CC 6 CASTing cycles for oligonucleotides binding to a member of the NF-1 family XX RN [1] RA Funk W. D., Wright W. E. RT Cyclic amplification and selection of targets for multicomponent RT complexes: Myogenin interacts with factors recognizing binding RT sites for basic helix--loop--helix, nuclear factor 1, myocyte-specific RT enhancer-binding factor 2 , and COMP1 factor RL Proc. Natl. Acad. Sci. USA 89:9484-9488 (1992). XX // AC M00057 XX ID V$COMP1_01 XX NA COMP1 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 1 2 3 02 2 2 3 0 03 1 1 1 4 04 1 1 2 3 05 3 0 0 4 06 1 0 2 4 07 0 2 5 0 08 6 0 0 1 09 1 0 0 6 10 1 0 0 6 11 1 0 6 0 12 4 1 2 0 13 0 6 1 0 14 3 1 2 1 15 4 0 2 1 16 0 4 2 1 17 4 0 2 1 18 5 1 1 0 19 3 2 2 0 20 4 2 0 0 21 2 1 3 0 22 2 1 3 0 23 0 2 2 2 24 1 2 2 1 XX BF T01446; COMP1 ; mouse XX BA 7 selected binding sequences XX CC 6 CASTing cycles for oligonucleotides binding to a factor that CC cooperates with myogenic proteins XX RN [1] RA Funk W. D., Wright W. E. 
RT Cyclic amplification and selection of targets for multicomponent RT complexes: Myogenin interacts with factors recognizing binding RT sites for basic helix--loop--helix, nuclear factor 1, myocyte-specific RT enhancer-binding factor 2 , and COMP1 factor RL Proc. Natl. Acad. Sci. USA 89:9484-9488 (1992). XX // AC M00058 XX ID V$HEN1_02 XX NA HEN1 XX DT hiwi (created); 13.04.95. DT ewi (updated); 27.04.95. XX PO A C G T 01 33 16 27 24 02 29 12 31 27 03 19 7 52 22 04 5 0 78 16 05 5 2 91 2 06 21 29 30 20 07 25 72 4 0 08 5 14 64 17 09 0 100 0 0 10 100 0 0 0 11 3 5 91 0 12 2 88 9 2 13 0 0 0 100 14 0 0 100 0 15 7 67 12 14 16 5 3 67 25 17 11 30 23 36 18 5 80 5 9 19 16 68 5 11 20 13 57 9 20 21 22 41 10 26 22 27 16 24 33 XX BF T01447; HEN1 ; rat XX BA 54 selected binding sequences XX CC oligonucleotides bound to in vitro-translated HEN1 and isolated CC by CASTing; figures are percentages XX RN [1] RA Brown L., Baer R. RT HEN1 encodes a 20-kilodalton phosphoprotein that binds an extended RT E-box motif as a homodimer RL Mol. Cell. Biol. 14:12 (1994). XX // AC M00059 XX ID V$YY1_01 XX NA YY1 XX DT hiwi (created); 13.04.95. DT ewi (updated); 27.04.95. XX PO A C G T 01 3 4 6 5 02 7 5 5 1 03 6 3 3 6 04 3 7 5 3 05 6 2 4 6 06 1 13 1 3 07 0 18 0 0 08 17 1 0 0 09 0 0 2 16 10 5 3 3 7 11 1 2 2 13 12 5 2 1 9 13 3 4 5 6 14 4 4 6 4 15 6 4 6 2 16 9 2 2 5 17 3 6 5 4 XX BF T00278; delta factor ; mouse BF T00915; YY1 ; human XX BA 18 repressing binding sites from 7 cellular genes and 6 viral genomes XX CC compiled sequences XX RN [1] RA Shrivastava A., Calame K. RT An analysis of genes regulated by the multi-functional transcriptional RT regulator Yin Yang-1 RL Nucleic Acids Res. 22:5151-5155 (1994). XX // AC M00060 XX ID I$SN_01 XX NA Sn XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.05.95. 
XX PO A C G T 01 7 6 7 2 02 9 5 4 7 03 13 2 13 1 04 27 0 8 3 05 0 39 1 0 06 39 0 0 1 07 3 0 37 0 08 2 0 38 0 09 0 0 0 40 10 0 0 37 3 11 3 19 3 12 12 18 8 2 4 13 7 11 7 5 XX BA 40 selected binding sequences XX CC selection from random 17-mers by binding to bacterially expressed CC beta-Gal-Snail fusion protein XX RN [1] RA Mauhin V., Lutz Y., Dennefeld C., Alberga A. RT Definition of the DNA-binding site repertoire for the Drosophila RT transcription factor SNAIL RL Nucleic Acids Res. 21:3951-3957 (1993). XX // AC M00061 XX ID F$MIG1_01 XX NA MIG1 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 1 6 6 02 8 2 1 3 03 6 3 1 4 04 7 0 2 5 05 7 0 0 7 06 9 0 0 5 07 9 0 0 5 08 10 0 2 2 09 2 3 1 8 10 0 7 7 0 11 0 6 0 8 12 0 0 14 0 13 0 0 14 0 14 0 0 14 0 15 0 0 14 0 16 5 1 2 6 17 7 2 3 2 XX BF T00509; MIG1 ; yeast XX BA 14 binding sites out of 11 genes XX CC compiled sequences XX RN [1] RA Lundin M., Nehlin J. O., Ronne H. RT Importance of a flanking AT-rich region in target site recognition RT by the GC box-binding zinc finger protein MIG1 RL Mol. Cell. Biol. 14:1979-1985 (1994). XX // AC M00062 XX ID V$IRF1_01 XX NA IRF-1 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 8 12 1 02 9 0 0 0 03 21 0 0 0 04 21 0 0 0 05 21 0 0 0 06 1 4 12 1 07 0 10 0 11 08 1 0 19 0 09 21 0 0 0 10 20 1 0 0 11 21 0 0 0 12 1 14 6 0 13 1 13 1 6 XX BF T00422; IRF-1 ; mouse BF T00423; IRF-1 ; human BF T00424; IRF-1 ; rat XX BA 21 selected binding sequences XX CC 5 rounds of binding and PCR amplification from random 26-mers XX RN [1] RA Tanaka N., Kawakami T., Taniguchi T. RT Recognition DNA sequences of interferon regulatory factor 1 RT (IRF-1) and IRF-2, regulators of cell growth and the interferon system RL Mol. Cell. Biol. 13:4531-4538 (1993). XX // AC M00063 XX ID V$IRF2_01 XX NA IRF-2 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 0 1 14 0 02 4 0 0 0 03 15 0 0 0 04 15 0 0 0 05 15 0 0 0 06 0 1 12 0 07 0 7 0 8 08 0 0 15 0 09 15 0 0 0 10 15 0 0 0 11 15 0 0 0 12 0 8 7 0 13 0 8 2 5 XX BF T00425; IRF-2 ; mouse XX BA 15 selected binding sequences XX CC 5 rounds of binding and PCR amplification from random 26-mers XX RN [1] RA Tanaka N., Kawakami T., Taniguchi T. RT Recognition DNA sequences of interferon regulatory factor 1 RT (IRF-1) and IRF-2, regulators of cell growth and the interferon system RL Mol. Cell. Biol. 13:4531-4538 (1993). XX // AC M00064 XX ID F$PHO4_01 XX NA PHO4 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 2 1 4 02 3 2 2 1 03 2 3 3 0 04 0 8 0 0 05 8 0 0 0 06 0 8 0 0 07 0 0 8 0 08 0 0 0 8 09 0 0 5 3 10 0 2 4 2 11 1 0 5 2 12 2 2 2 2 XX BF T00690; PHO4 ; yeast XX BA 8 binding sites from 4 genes XX CC compiled sequences XX RN [1] RA Fisher F., Goding C. R. RT Single amino acid substitutions alter helix--loop--helix protein RT specificity for bases flanking the core CANNTG motif RL EMBO J. 11:4103-4109 (1992). XX // AC M00065 XX ID V$TAL1BETAE47_01 XX NA Tal-1beta/E47 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 23 35 28 14 02 15 30 23 32 03 14 23 44 19 04 66 9 14 11 05 61 30 3 6 06 0 100 0 0 07 100 0 0 0 08 0 0 100 0 09 86 1 10 3 10 0 0 0 100 11 0 0 100 0 12 1 3 61 35 13 3 17 1 79 14 20 43 17 20 15 27 23 31 19 16 23 26 32 19 XX BF T00207; E47 ; human BF T01448; Tal-1beta ; human XX BA 44 selected binding sequences XX CC random 35-mers bound by in vitro co-translated Tal-1beta and CC E47 after 6 CASTing cycles; figures are percentages XX RN [1] RA -Hsu H.-L., Huang L., Tsan J. T., Funk W., Wright W. E., Hu RA J.-S., Kingston R. E., Baer R. RT Preferred sequences for DNA recognition by the TAL1 helix-loop-helix RT proteins RL Mol. Cell. Biol. 14:1256-1265 (1994). XX // AC M00066 XX ID V$TAL1ALPHAE47_01 XX NA Tal-1alpha/E47 XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 18 10 35 37 02 14 40 17 29 03 12 26 41 21 04 59 8 27 6 05 65 26 9 0 06 0 100 0 0 07 100 0 0 0 08 0 0 95 4 09 86 6 8 0 10 0 0 0 100 11 0 0 100 0 12 1 3 56 40 13 0 17 4 79 14 19 34 35 12 15 34 24 19 23 16 17 39 30 14 XX BF T00207; E47 ; human BF T00790; Tal-1 ; human XX BA 35 selected binding sequences XX CC random 35-mers bound by in vitro co-translated Tal-1alpha and CC E47 after 6 CASTing cycles; figures are percentages XX RN [1] RA -Hsu H.-L., Huang L., Tsan J. T., Funk W., Wright W. E., Hu RA J.-S., Kingston R. E., Baer R. RT Preferred sequences for DNA recognition by the TAL1 helix-loop-helix RT proteins RL Mol. Cell. Biol. 14:1256-1265 (1994). XX // AC M00067 XX ID I$HAIRY_01 XX NA hairy XX DT hiwi (created); 13.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 20 30 35 15 02 26 30 26 17 03 12 16 56 16 04 19 15 58 8 05 0 96 0 4 06 81 12 0 8 07 0 100 0 0 08 3 0 93 3 09 0 83 7 10 10 0 12 71 17 11 36 41 18 5 12 5 62 24 10 13 47 21 11 21 14 15 15 31 38 XX BF T00345; H ; fruit fly XX BA 29 selected binding sequences XX CC random tetradekamers bound by bacterially expressed hairy protein; CC figures are percentages XX RN [1] RA Van Doren M., Bailey A. M., Esnayra J., Ede K., Posakony J. W. RT Negative regulation of proneural gene activity: hairy is a direct RT transcriptional repressor of achaete RL Genes Dev. 8:2729-2742 (1994). XX // AC M00068 XX ID V$HEN1_01 XX NA HEN1 XX DT ewi (created); 13.04.95. DT ewi (updated); 27.04.95. XX PO A C G T 01 31 21 18 31 02 18 18 21 44 03 22 17 44 17 04 7 0 84 9 05 7 0 82 11 06 18 20 35 27 07 21 78 0 0 08 11 29 33 27 09 0 100 0 0 10 100 0 0 0 11 2 8 83 8 12 13 75 12 0 13 0 0 0 100 14 0 0 100 0 15 12 58 17 13 16 0 2 71 27 17 21 29 25 25 18 2 77 4 17 19 12 71 4 14 20 19 58 8 15 21 15 30 20 35 22 20 30 23 27 XX BF T01447; HEN1 ; rat XX BA 51 selected binding sequences XX CC oligonucleotides bound to in vivo-produced HEN1 (COS1 cells) CC isolated by CASTing; figures are percentages XX RN [1] RA Brown L., Baer R. 
RT HEN1 encodes a 20-kilodalton phosphoprotein that binds an extended RT E-box motif as a homodimer RL Mol. Cell. Biol. 14:12 (1994). XX // AC M00069 XX ID V$YY1_02 XX NA YY1 XX DT ewi (created); 13.04.95. DT ewi (updated); 27.04.95. XX PO A C G T 01 2 4 2 3 02 2 5 2 2 03 2 2 4 3 04 1 7 3 0 05 1 2 6 2 06 2 0 9 0 07 0 11 0 0 08 0 11 0 0 09 11 0 0 0 10 0 0 0 11 11 0 8 2 1 12 1 1 1 8 13 2 0 2 7 14 0 3 7 1 15 2 2 4 3 16 2 6 1 2 17 1 1 1 8 18 1 4 5 1 19 2 4 3 2 20 4 0 2 5 XX BF T00278; delta factor ; mouse BF T00915; YY1 ; human XX BA 11 activating binding sites from 8 cellular genes, 1 viral genome BA and 2 retrotransposon elements XX CC compiled sequences XX RN [1] RA Shrivastava A., Calame K. RT An analysis of genes regulated by the multi-functional transcriptional RT regulator Yin Yang-1 RL Nucleic Acids Res. 22:5151-5155 (1994). XX // AC M00070 XX ID V$TAL1BETAITF2_01 XX NA Tal-1beta/ITF-2 XX DT ewi (created); 18.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 10 32 43 15 02 12 28 35 25 03 19 23 37 21 04 74 0 14 12 05 72 21 5 2 06 0 100 0 0 07 100 0 0 0 08 0 0 98 2 09 95 2 3 0 10 0 0 0 100 11 0 0 100 0 12 0 3 63 34 13 0 30 0 70 14 17 39 20 24 15 25 21 14 40 16 16 35 26 23 XX BF T00433; ITF-2 ; human BF T01448; Tal-1beta ; human XX BA 59 selected binding sequences XX CC random 35-mers bound by in vitro co-translated Tal-1beta and CC ITF-2 (E2-2) after 6 CASTing cycles; figures are percentages XX RN [1] RA -Hsu H.-L., Huang L., Tsan J. T., Funk W., Wright W. E., Hu RA J.-S., Kingston R. E., Baer R. RT Preferred sequences for DNA recognition by the TAL1 helix-loop-helix RT proteins RL Mol. Cell. Biol. 14:1256-1265 (1994). XX // AC M00071 XX ID V$E47_02 XX NA E47 XX DT ewi (created); 18.04.95. DT ewi (updated); 27.04.95. 
XX PO A C G T 01 41 24 17 17 02 31 24 14 31 03 17 17 30 36 04 50 26 7 17 05 50 14 33 3 06 0 100 0 0 07 100 0 0 0 08 0 5 92 3 09 0 16 84 0 10 0 0 0 100 11 0 0 100 0 12 0 24 13 63 13 20 3 8 69 14 34 46 11 9 15 34 27 24 15 16 25 37 22 16 XX BF T00207; E47 ; human XX BA 38 selected binding sequences XX CC random 35-mers bound by in vitro translated E47 after 6 CASTing CC cycles; figures are percentages XX RN [1] RA -Hsu H.-L., Huang L., Tsan J. T., Funk W., Wright W. E., Hu RA J.-S., Kingston R. E., Baer R. RT Preferred sequences for DNA recognition by the TAL1 helix-loop-helix RT proteins RL Mol. Cell. Biol. 14:1256-1265 (1994). XX // AC M00072 XX ID V$CP2_01 XX NA CP2 XX DT ewi (created); 21.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 1 5 0 02 0 5 1 0 03 2 2 0 2 04 2 4 0 0 05 2 0 2 2 06 5 1 0 0 07 2 3 1 0 08 0 6 0 0 09 2 4 0 0 10 6 0 0 0 11 6 0 0 0 XX BF T00152; CP2 ; mouse XX BA 6 binding sites from the murine alpha-globin gene promoter XX CC compiled sequences XX RN [1] RA Kim C. G., Swendeman S. L., Barnhart K. M., Sheffery M. RT Promoter elements and erythroid cell nuclear factors that regulate RT alpha-globin gene transcription in vitro RL Mol. Cell. Biol. 10:5958-5966 (1990). XX // AC M00073 XX ID V$DELTAEF1_01 XX NA deltaEF1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 32 22 14 32 02 20 39 7 34 03 15 24 15 46 04 0 83 3 14 05 97 0 0 3 06 0 100 0 0 07 0 97 3 0 08 0 0 0 100 09 24 5 39 32 10 54 12 22 12 11 39 20 27 12 XX BF T01467; deltaEF1 ; chick XX BA 41 selected binding sequences XX CC 3 rounds of selection of random 15-mers bound to bacterially CC expressed C-terminal zinc finger domain of GST-deltaEF1 fusion CC protein; figures are percentages XX RN [1] RA Sekido R., Murai K., Funahashi J.-i., Kamachi Y., Fujisawa-Sehara RA A., Nabeshima Y.-I., Kondoh H. RT The delta-crystallin enhancer-binding protein deltaEF1 is a RT repressor of E2-box-mediated gene activation RL Mol. Cell. Biol. 14:5692-5700 (1994). 
XX // AC M00074 XX ID V$CETS1P54_02 XX NA c-Ets-1(p54) XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 9 11 9 4 02 9 11 6 9 03 19 4 4 9 04 10 20 6 2 05 13 21 1 5 06 0 0 40 0 07 0 0 40 0 08 40 0 0 0 09 23 0 0 17 10 15 5 15 4 11 10 8 1 20 12 9 5 7 15 13 12 9 7 7 XX BF T00114; c-Ets-1 54 ; chick XX BA 40 selected binding sequences XX CC 3 rounds of selection of random oligonucleotides XX RN [1] RA Woods D. B., Ghysdael J., Owen M. J. RT Identification of nucleotide preferences in DNA sequences recognised RT specifically by c-Ets-1 protein RL Nucleic Acids Res. 20:699-704 (1992). XX // AC M00075 XX ID V$GATA1_01 XX NA GATA-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 5 22 20 6 02 11 17 20 5 03 11 15 10 17 04 0 0 53 0 05 50 1 0 2 06 1 1 1 50 07 18 6 18 11 08 6 14 24 9 09 6 14 18 14 10 7 11 19 15 XX BF T00305; GATA-1 ; mouse XX BA 53 selected binding sequences XX RN [1] RA Merika M., Orkin S. H. RT DNA-binding specificity of GATA family transcription factors RL Mol. Cell. Biol. 13:3999-4010 (1993). XX // AC M00076 XX ID V$GATA2_01 XX NA GATA-2 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 6 16 18 13 02 15 18 18 2 03 13 12 18 10 04 0 4 49 0 05 52 0 0 1 06 0 1 0 52 07 27 6 14 6 08 7 13 23 10 09 5 20 15 13 10 7 13 19 14 XX BF T00308; GATA-2 ; human XX BA 53 selected binding sequences XX RN [1] RA Merika M., Orkin S. H. RT DNA-binding specificity of GATA family transcription factors RL Mol. Cell. Biol. 13:3999-4010 (1993). XX // AC M00077 XX ID V$GATA3_01 XX NA GATA-3 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 16 14 24 9 02 26 13 9 15 03 0 0 63 0 04 62 0 0 1 05 1 0 3 59 06 35 2 10 16 07 17 5 33 8 08 10 11 26 16 09 15 7 34 7 XX BF T00311; GATA-3 ; human XX BA 63 selected binding sequences XX RN [1] RA Merika M., Orkin S. H. RT DNA-binding specificity of GATA family transcription factors RL Mol. Cell. Biol. 13:3999-4010 (1993).
XX // AC M00078 XX ID V$EVI1_01 XX NA Evi-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 7 0 1 5 02 0 0 13 0 03 13 0 0 0 04 0 5 1 7 05 12 0 0 1 06 13 0 0 0 07 0 0 13 0 08 13 0 0 0 09 0 3 0 10 10 11 0 0 2 11 13 0 0 0 12 0 0 13 0 13 13 0 0 0 14 0 1 1 11 15 11 1 0 1 16 11 1 0 1 XX BF T00273; Evi-1 ; mouse XX BA 13 selected binding sequences XX CC 6 rounds of selection of random 35-mers bound to bacterially CC expressed fusion protein of GST and the N-terminal zinc finger CC domain of Evi-1; sequences that yield to a 15-Nt consensus XX RN [1] RA Delwel R., Funabiki T., Kreider B. L., Morishita K., Ihle J. N. RT Four of the seven zinc fingers of the Evi-1 myeloid-transforming RT gene are required for sequence-specific binding to GA(C/T)AAG RT A(T/C)AAGATAA RL Mol. Cell. Biol. 13:4291-4300 (1993). XX // AC M00079 XX ID V$EVI1_02 XX NA Evi-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 8 1 2 3 02 0 0 13 1 03 14 0 0 0 04 0 7 0 7 05 14 0 0 0 06 14 0 0 0 07 0 0 14 0 08 14 0 0 0 09 0 3 0 11 10 13 1 0 0 11 11 0 3 0 XX BF T00273; Evi-1 ; mouse XX BA 13 selected binding sequences XX CC 6 rounds of selection of random 35-mers bound to bacterially CC expressed fusion protein of GST and the N-terminal zinc finger CC domain of Evi-1; sequences that yield to a 10-Nt consensus XX RN [1] RA Delwel R., Funabiki T., Kreider B. L., Morishita K., Ihle J. N. RT Four of the seven zinc fingers of the Evi-1 myeloid-transforming RT gene are required for sequence-specific binding to GA(C/T)AAG RT A(T/C)AAGATAA RL Mol. Cell. Biol. 13:4291-4300 (1993). XX // AC M00080 XX ID V$EVI1_03 XX NA Evi-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. 
XX PO A C G T 01 23 0 0 4 02 0 0 27 0 03 27 0 0 0 04 0 2 0 25 05 26 0 0 1 06 27 0 0 0 07 0 0 27 0 08 27 0 0 0 09 0 0 0 27 10 26 0 0 1 11 19 1 4 3 XX BF T00273; Evi-1 ; mouse XX BA 27 selected binding sequences XX CC random 15-mers bound to bacterially expressed GST-fusion protein CC of Evi-1 zinc fingers 5 to 7 XX RN [1] RA Delwel R., Funabiki T., Kreider B. L., Morishita K., Ihle J. N. RT Four of the seven zinc fingers of the Evi-1 myeloid-transforming RT gene are required for sequence-specific binding to GA(C/T)AAG RT A(T/C)AAGATAA RL Mol. Cell. Biol. 13:4291-4300 (1993). XX // AC M00081 XX ID V$EVI1_04 XX NA Evi-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 3 0 3 2 02 2 0 6 0 03 6 1 0 1 04 1 1 0 6 05 7 0 0 1 06 3 0 2 3 07 2 0 6 0 08 6 0 0 2 09 2 2 0 4 10 5 0 0 3 11 5 0 2 1 12 2 0 6 0 13 6 0 0 2 14 2 0 0 6 15 6 1 0 1 XX BF T00273; Evi-1 ; mouse XX BA 8 selected binding sequences XX CC random 15-mers bound to bacterially expressed GST-fusion protein CC of Evi-1 zinc fingers 4 to 6; sequences yielding a 14-Nt consensus XX RN [1] RA Delwel R., Funabiki T., Kreider B. L., Morishita K., Ihle J. N. RT Four of the seven zinc fingers of the Evi-1 myeloid-transforming RT gene are required for sequence-specific binding to GA(C/T)AAG RT A(T/C)AAGATAA RL Mol. Cell. Biol. 13:4291-4300 (1993). XX // AC M00082 XX ID V$EVI1_05 XX NA Evi-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 14 0 1 4 02 1 0 18 0 03 18 1 0 0 04 0 4 0 15 05 18 0 0 1 06 15 0 1 3 07 1 0 18 0 08 18 0 0 1 09 1 0 0 18 10 18 0 0 1 11 7 1 6 4 XX BF T00273; Evi-1 ; mouse XX BA 19 selected binding sequences XX CC random 15-mers bound to bacterially expressed GST-fusion protein CC of Evi-1 zinc fingers 4 to 6; sequences giving rise to a 10-Nt consensus XX RN [1] RA Delwel R., Funabiki T., Kreider B. L., Morishita K., Ihle J. N. 
RT Four of the seven zinc fingers of the Evi-1 myeloid-transforming RT gene are required for sequence-specific binding to GA(C/T)AAG RT A(T/C)AAGATAA RL Mol. Cell. Biol. 13:4291-4300 (1993). XX // AC M00083 XX ID V$MZF1_01 XX NA MZF1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 8 3 6 3 02 4 2 11 3 03 3 3 5 9 04 0 0 20 0 05 1 0 18 1 06 0 0 19 1 07 0 0 20 0 08 18 1 1 0 XX BF T00529; MZF-1 ; human XX BA 20 selected binding sequences XX CC 14-mers bound to bacterially expressed MZF1 zinc fingers 1-4 XX RN [1] RA Morris J. F., Hromas R., Rauscher III F. J. RT Characterization of the DNA-binding properties of the myeloid RT zinc finger protein MZF1: two independent DNA-binding domains RT recognize two DNA consensus sequences with a common G-rich core RL Mol. Cell. Biol. 14:1786-1795 (1994). XX // AC M00084 XX ID V$MZF1_02 XX NA MZF1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 2 1 7 6 02 3 2 7 4 03 4 1 8 3 04 2 4 4 6 05 0 0 10 6 06 13 0 3 0 07 1 0 12 3 08 0 0 15 1 09 1 1 14 0 10 1 0 13 2 11 3 4 7 2 12 11 0 2 3 13 10 3 2 1 XX BA 16 selected binding sequences XX CC random 14-mers bound to bacterially expressed MZF1 zinc fingers 5-13 XX RN [1] RA Morris J. F., Hromas R., Rauscher III F. J. RT Characterization of the DNA-binding properties of the myeloid RT zinc finger protein MZF1: two independent DNA-binding domains RT recognize two DNA consensus sequences with a common G-rich core RL Mol. Cell. Biol. 14:1786-1795 (1994). XX // AC M00085 XX ID V$ZID_01 XX NA ZID XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 3 12 9 11 02 2 5 28 0 03 1 0 31 3 04 0 35 0 0 05 1 0 0 34 06 0 35 0 0 07 3 15 1 16 08 32 1 1 1 09 0 3 8 24 10 1 23 7 4 11 23 0 7 5 12 0 15 0 20 13 4 26 2 3 XX BF T01468; ZID ; human XX BA 35 selected binding sequences XX RN [1] RA Bardwell V. J., Treisman R. RT The POZ domain: A conserved protein--protein interaction motif RL Genes Dev. 8:1664-1677 (1994). 
XX // AC M00086 XX ID V$IK1_01 XX NA Ik-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 7 5 7 5 02 6 9 0 9 03 3 3 7 11 04 2 6 1 15 05 2 0 22 0 06 0 0 24 0 07 0 0 24 0 08 24 0 0 0 09 22 0 0 2 10 3 4 2 15 11 11 2 8 3 12 3 13 2 6 13 3 14 3 4 XX BF T01469; Ik-1 ; mouse XX BA 24 selected binding sequences XX CC 5 rounds of selection of 15-mers bound to bacterially expressed CC GST-Ik-1 fusion protein XX RN [1] RA Molnár Á., Georgopoulos K. RT The Ikaros gene encodes a family of functionally diverse zinc RT finger DNA-binding proteins RL Mol. Cell. Biol. 14:8292-8303 (1994). XX // AC M00087 XX ID V$IK2_01 XX NA Ik-2 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 9 10 12 5 02 11 5 7 13 03 5 7 7 17 04 6 10 0 20 05 6 0 30 0 06 0 0 36 0 07 0 0 36 0 08 36 0 0 0 09 18 5 1 12 10 10 7 6 13 11 12 8 9 7 12 9 18 5 4 XX BF T01470; Ik-2 ; mouse XX BA 36 selected binding sequences XX CC 5 rounds of selection of 15-mers bound to bacterially expressed CC GST-Ik-2 fusion protein XX RN [1] RA Molnár Á., Georgopoulos K. RT The Ikaros gene encodes a family of functionally diverse zinc RT finger DNA-binding proteins RL Mol. Cell. Biol. 14:8292-8303 (1994). XX // AC M00088 XX ID V$IK3_01 XX NA Ik-3 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 3 0 7 15 02 8 8 5 4 03 0 13 3 9 04 6 5 0 14 05 5 0 20 0 06 0 0 25 0 07 0 0 25 0 08 25 0 0 0 09 25 0 0 0 10 0 7 0 18 11 16 0 5 4 12 1 18 0 6 13 1 17 0 7 XX BF T01471; Ik-3 ; mouse XX BA 25 selected binding sequences XX CC 5 rounds of selection of 15-mers bound to bacterially expressed CC GST-Ik-3 fusion protein XX RN [1] RA Molnár Á., Georgopoulos K. RT The Ikaros gene encodes a family of functionally diverse zinc RT finger DNA-binding proteins RL Mol. Cell. Biol. 14:8292-8303 (1994). XX // AC M00089 XX ID P$ATHB1_01 XX NA Athb-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 7 3 3 4 02 1 5 3 10 03 4 5 6 4 04 3 13 4 3 05 22 1 0 1 06 25 0 0 0 07 0 0 0 25 08 0 5 0 20 09 25 0 0 0 10 0 0 0 25 11 0 0 0 25 12 3 1 12 0 13 2 8 4 1 14 8 1 3 1 XX BF T01474; Athb-1 ; mouse-ear cress XX BA 24 selected binding sequences XX CC 4 rounds of selection of 15-mers bound to bacterially expressed CC GST-Athb-1 fusion protein XX RN [1] RA Sessa G., Morelli G., Ruberti I. RT The Athb-1 and -2 HD-Zip domains homodimerize forming complexes RT of different DNA binding specificities RL EMBO J. 12:3507-3517 (1993). XX // AC M00090 XX ID I$ABDB_01 XX NA Abd-B XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 8 5 8 3 02 3 12 11 3 03 9 9 9 7 04 0 1 7 28 05 4 2 4 29 06 11 2 8 24 07 36 2 0 7 08 6 7 1 31 09 2 2 30 9 10 10 4 24 4 11 1 32 6 1 12 8 10 11 7 13 11 8 8 4 14 5 9 7 6 XX BF T01476; Abd-B ; fruit fly XX BA 45 selected binding sequences XX CC 2 rounds of selection of 18-mers bound to bacterially expressed Abd-B protein XX RN [1] RA Ekker S. C., Jackson D. G., von Kessler D. P., Sun B. I., Young RA K. E., Beachy P. A. RT The degree of variation in DNA sequence recognition among four RT Drosophila homeotic proteins RL EMBO J. 13:3551-3560 (1994). XX // AC M00091 XX ID I$BRCZ1_01 XX NA BR-C Z1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 5 0 0 4 02 3 1 2 3 03 3 1 4 1 04 1 1 1 6 05 5 2 1 1 06 7 0 1 1 07 3 0 0 6 08 6 0 0 3 09 4 1 4 0 10 7 0 1 1 11 1 8 0 0 12 9 0 0 0 13 8 0 1 0 14 5 0 3 1 15 4 0 0 5 16 2 3 2 2 17 3 2 0 4 18 3 3 2 1 XX BF T01477; BR-C Z1 ; fruit fly XX BA 9 binding sites within the Pig-1/Sgs-4 intergenic region XX CC compiled sequences, bound in vitro by bacterially expressed protein XX RN [1] RA von Kalm L., Crossgrove K., Von Seggern D., Guild G. M., Beckendorf RA S. K. RT The Broad-Complex directly controls a tissue-specific response RT to the steroid hormone ecdysone at the onset of Drosophila metamorpho RT sis RL EMBO J. 13:3505-3516 (1994). 
XX // AC M00092 XX ID I$BRCZ2_01 XX NA BR-C Z2 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 1 3 5 02 2 3 2 5 03 0 5 3 4 04 2 1 2 7 05 2 1 3 6 06 3 1 1 7 07 5 2 1 4 08 0 10 0 2 09 0 1 0 11 10 12 0 0 0 11 1 1 2 8 12 2 0 1 9 13 1 2 1 8 14 2 2 3 5 15 1 0 3 8 16 1 3 1 7 XX BF T01478; BR-C Z2 ; fruit fly XX BA 12 binding sites within the Pig-1/Sgs-4 intergenic region XX CC compiled sequences, bound in vitro by bacterially expressed protein XX RN [1] RA von Kalm L., Crossgrove K., Von Seggern D., Guild G. M., Beckendorf RA S. K. RT The Broad-Complex directly controls a tissue-specific response RT to the steroid hormone ecdysone at the onset of Drosophila metamorpho RT sis RL EMBO J. 13:3505-3516 (1994). XX // AC M00093 XX ID I$BRCZ3_01 XX NA BR-C Z3 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 2 3 4 02 6 1 3 2 03 4 3 3 2 04 3 1 1 7 05 9 2 1 0 06 10 0 0 2 07 12 0 0 0 08 0 10 1 1 09 4 0 0 8 10 10 0 0 2 11 6 0 4 2 12 6 1 2 3 13 4 3 2 3 14 2 3 5 2 15 5 3 1 3 XX BF T01479; BR-C Z3 ; fruit fly XX BA 12 binding sites within the Pig-1/Sgs-4 intergenic region XX CC compiled sequences, bound in vitro by bacterially expressed protein XX RN [1] RA von Kalm L., Crossgrove K., Von Seggern D., Guild G. M., Beckendorf RA S. K. RT The Broad-Complex directly controls a tissue-specific response RT to the steroid hormone ecdysone at the onset of Drosophila metamorpho RT sis RL EMBO J. 13:3505-3516 (1994). XX // AC M00094 XX ID I$BRCZ4_01 XX NA BR-C Z4 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 3 0 0 3 02 2 1 0 3 03 3 0 1 2 04 3 0 2 1 05 0 0 2 4 06 6 0 0 0 07 6 0 0 0 08 6 0 0 0 09 0 3 2 1 10 4 0 1 1 11 3 1 0 2 12 5 0 0 1 13 4 0 0 2 XX BF T01480; BR-C Z4 ; fruit fly XX BA 6 binding sites within the Pig-1/Sgs-4 intergenic region XX CC compiled sequences, bound in vitro by bacterially expressed protein XX RN [1] RA von Kalm L., Crossgrove K., Von Seggern D., Guild G. M., Beckendorf RA S. K. 
RT The Broad-Complex directly controls a tissue-specific response RT to the steroid hormone ecdysone at the onset of Drosophila metamorpho RT sis RL EMBO J. 13:3505-3516 (1994). XX // AC M00095 XX ID V$CDP_01 XX NA CDP XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 1 16 0 1 02 0 17 0 1 03 17 0 0 1 04 17 1 0 0 05 0 0 0 18 06 18 0 0 0 07 17 0 0 1 08 0 3 0 15 09 0 14 2 2 10 5 0 13 0 11 14 0 1 3 12 1 1 2 14 XX BF T00100; CDP ; human XX BA 18 selected binding sequences with anchored ATA trinucleotide XX CC 5 rounds of selection with bacterially expressed GST-CDP CR1 CC (cut repeat 1) fusion protein XX RN [1] RA Aufiero B., Neufeld E. J., Orkin S. H. RT Sequence-specific DNA binding of individual cut repeats of the RT human CCAAT displacement/ cut homeodomain protein RL Proc. Natl. Acad. Sci. USA 91:7757-7761 (1994). XX // AC M00096 XX ID V$PBX1_01 XX NA Pbx-1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 12 1 0 3 02 3 4 1 8 03 1 9 2 4 04 13 1 0 2 05 14 0 0 2 06 2 0 0 14 07 4 10 0 2 08 14 0 1 1 09 10 0 0 6 XX BF T01481; Pbx-1 ; human XX BA 16 selected binding sequences XX RN [1] RA Van Dijk M. A., Voorhoeve M., Murre C. RT Pbx1 is converted into a transcriptional activator upon acquiring RT the N-terminal region of E2A in pre-B-cell acute lymphoblastoid RT leukemia RL Proc. Natl. Acad. Sci. USA 90:6061-6065 (1993). XX // AC M00097 XX ID V$PAX6_01 XX NA Pax-6 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. 
XX PO A C G T 01 14 7 6 10 02 21 9 3 10 03 10 9 10 18 04 8 14 9 16 05 3 2 4 38 06 2 0 1 44 07 3 29 1 14 08 40 5 1 1 09 3 39 0 5 10 1 0 44 2 11 1 36 7 2 12 23 2 1 21 13 1 4 0 42 14 2 13 26 3 15 40 1 6 0 16 14 11 15 7 17 2 4 3 37 18 1 0 20 25 19 13 17 9 4 20 14 8 4 6 21 4 12 3 9 XX BF T00681; Pax-6 ; mouse BF T01122; Pax-6 ; human XX BA 47 selected binding sequences XX CC selection from random 25-mers bound to bacterially expressed CC GST-hPax-6 fusion protein XX RN [1] RA Epstein J., Cai J., Glaser T., Jepeal L., Maas R. RT Identification of a Pax paired domain recognition sequence and RT evidence for DNA-dependent conformational changes RL J. Biol. Chem. 269:8355-8361 (1994). XX // AC M00098 XX ID V$PAX2_01 XX NA Pax-2 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 7 10 4 6 02 5 10 7 9 03 7 8 2 15 04 8 10 6 8 05 8 1 19 4 06 5 2 0 25 07 1 20 3 8 08 21 3 3 5 09 6 13 3 10 10 3 3 20 6 11 3 13 7 7 12 14 1 11 4 13 4 5 3 18 14 4 3 14 8 15 18 5 2 4 16 3 8 5 9 17 0 5 7 6 18 3 3 5 2 19 3 3 2 2 XX BF T00678; Pax-2 ; mouse XX BA 32 selected binding sequences XX CC selection from random 25-mers bound to bacterially expressed CC GST-mPax-2 fusion protein XX RN [1] RA Epstein J., Cai J., Glaser T., Jepeal L., Maas R. RT Identification of a Pax paired domain recognition sequence and RT evidence for DNA-dependent conformational changes RL J. Biol. Chem. 269:8355-8361 (1994). XX // AC M00099 XX ID V$S8_01 XX NA S8 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 6 2 2 7 02 12 7 4 15 03 11 7 13 14 04 25 4 8 10 05 14 14 12 10 06 6 20 5 20 07 1 27 0 25 08 52 2 3 0 09 59 0 0 0 10 0 0 0 59 11 0 0 0 59 12 59 0 0 0 13 16 3 14 8 14 1 10 3 5 15 2 6 4 3 16 4 1 2 5 XX BF T01483; S8 ; mouse XX BA 59 selected binding sequences XX CC 3 rounds of selection from random dodekamers bound to the bacterially CC expressed S8 homeodomain XX RN [1] RA de Jong R., van der Heijden J., Meijlink F. 
RT DNA-binding specificity of the S8 homeodomain RL Nucleic Acids Res. 21:4711-4720 (1993). XX // AC M00100 XX ID V$CDXA_01 XX NA CdxA XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 9 6 2 2 02 4 0 1 14 03 1 0 0 18 04 4 0 2 13 05 11 3 4 1 06 4 3 2 10 07 6 3 9 1 XX BF T01484; CdxA ; chick XX BA 19 selected binding sequences XX CC 3 rounds of selection from random 20-mers bound to bacterially CC expressed GST-CdxA fusion protein XX RN [1] RA Margalit Y., Yarus S., Shapira E., Gruenbaum Y., Fainsod A. RT Isolation and characterization of target sequences of the chicken RT CdxA homeobox gene RL Nucleic Acids Res. 21:4915-4922 (1993). XX // AC M00101 XX ID V$CDXA_02 XX NA CdxA XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 20 3 3 10 02 16 0 2 18 03 9 0 0 27 04 17 0 2 17 05 22 14 0 0 06 2 7 3 24 07 17 0 16 3 XX BF T01484; CdxA ; chick XX BA 18 selected synthetic and genomic binding sequences XX CC 3 rounds of selection from random 20-mers bound to bacterially CC expressed GST-CdxA fusion protein plus sequences from 4 rounds CC of selection of genomic binding sites XX RN [1] RA Margalit Y., Yarus S., Shapira E., Gruenbaum Y., Fainsod A. RT Isolation and characterization of target sequences of the chicken RT CdxA homeobox gene RL Nucleic Acids Res. 21:4915-4922 (1993). XX // AC M00102 XX ID V$CDP_02 XX NA CDP XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 33 8 17 42 02 46 11 10 33 03 19 8 29 44 04 100 0 0 0 05 0 0 0 100 06 4 77 0 19 07 0 0 100 0 08 100 0 0 0 09 0 0 0 100 10 15 25 2 58 11 90 2 6 2 12 19 9 25 46 13 18 32 2 48 14 31 8 21 40 15 29 15 10 46 XX BA 86 selected binding sequences XX CC 9 rounds of selection from random 15-mers bound to CDP overexpressed CC in COS cells; figures are percentages XX RN [1] RA Andrés V., Chiara M. D., Mahdavi V. 
RT A new bipartite DNA-binding domain: cooperative interaction RT between the cut repeat and homeo domain of the cut homeo proteins RL Genes Dev. 8:245-257 (1994). XX // AC M00103 XX ID V$CLOX_01 XX NA Clox XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 27 11 21 41 02 45 16 12 27 03 17 6 24 53 04 100 0 0 0 05 0 0 0 100 06 1 73 0 26 07 0 0 100 0 08 100 0 0 0 09 0 0 0 100 10 8 20 2 70 11 86 2 4 8 12 18 11 23 48 13 11 32 6 51 14 21 10 27 42 15 37 10 12 41 XX BF T01485; Clox ; dog XX BA 138 selected binding sequences XX CC 9 rounds of selection from random 15-mers bound to CDP overexpressed CC in COS cells; figures are percentages XX RN [1] RA Andrés V., Chiara M. D., Mahdavi V. RT A new bipartite DNA-binding domain: cooperative interaction RT between the cut repeat and homeo domain of the cut homeo proteins RL Genes Dev. 8:245-257 (1994). XX // AC M00104 XX ID V$CDPCR1_01 XX NA CDP CR1 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 9 7 8 8 02 31 1 0 0 03 2 3 1 26 04 0 22 3 7 05 4 1 26 1 06 30 1 0 1 07 0 1 1 30 08 2 18 5 6 09 1 5 22 2 10 3 12 8 2 XX BF T00100; CDP ; human XX BA 32 selected binding sequences XX CC 5 rounds of selection from random 15-mers bound to bacterially CC expressed GST-CDP CR1 (cut repeat 1) fusion protein XX RN [1] RA Harada R., Bérubé G., Tamplin O. J., Denis-Larose C., Nepveu A. RT DNA-binding specificity of the cut repeats from the human cut-like RT protein RL Mol. Cell. Biol. 15:129-140 (1995). XX // AC M00105 XX ID V$CDPCR3_01 XX NA CDP CR3 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. 
XX PO A C G T 01 1 21 1 0 02 20 2 1 0 03 1 20 2 0 04 6 14 4 0 05 12 2 9 1 06 21 0 3 0 07 1 3 3 17 08 17 2 4 1 09 5 7 4 8 10 6 4 11 3 11 0 1 0 23 12 24 0 0 0 13 0 0 0 24 14 0 8 9 7 15 0 0 24 0 XX BF T00100; CDP ; human XX BA 24 selected binding sequences XX CC 5 rounds of selection from random 15-mers bound to bacterially CC expressed GST-CDP CR3 (cut repeat 3) fusion protein XX RN [1] RA Harada R., Bérubé G., Tamplin O. J., Denis-Larose C., Nepveu A. RT DNA-binding specificity of the cut repeats from the human cut-like RT protein RL Mol. Cell. Biol. 15:129-140 (1995). XX // AC M00106 XX ID V$CDPCR3HD_01 XX NA CDP CR3+HD XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 8 11 12 2 02 28 0 4 1 03 3 0 0 30 04 3 15 3 12 05 0 3 30 0 06 33 0 0 0 07 0 0 0 33 08 0 18 10 5 09 1 13 12 6 10 1 14 13 2 XX BF T00100; CDP ; human XX BA 33 selected binding sequences XX CC 5 rounds of selection from random 15-mers bound to bacterially CC expressed GST-CDP CR3 (cut repeat 3) plus HD (homeodomain) fusion protein XX RN [1] RA Harada R., Bérubé G., Tamplin O. J., Denis-Larose C., Nepveu A. RT DNA-binding specificity of the cut repeats from the human cut-like RT protein RL Mol. Cell. Biol. 15:129-140 (1995). XX // AC M00107 XX ID V$E2_01 XX NA E2 XX DT hiwi (created); 24.04.95. DT ewi (updated); 30.05.95. XX PO A C G T 01 4 4 8 1 02 5 7 3 2 03 17 0 0 0 04 3 14 0 0 05 0 17 0 0 06 6 2 8 1 07 5 3 4 5 08 5 5 1 6 09 9 4 3 1 10 4 8 2 3 11 2 12 0 3 12 0 0 17 0 13 0 0 15 2 14 0 0 0 17 15 8 1 5 3 16 4 7 3 3 XX BF T00205; E2 ; BPV-1 XX BA 17 binding sites from BPV genome XX CC compiled sequences XX RN [1] RA Li R., Knight J., Bream G., Stenlund A., Botchan M. RT Specific recognition nucleotides and their DNA context determine RT the affinity of E2 protein for 17 binding sites in the BPV-1 genome RL Genes Dev. 3:510-526 (1989). XX // AC M00108 XX ID V$NRF2_01 XX NA NRF-2 XX DT ewi (created); 05.05.95. DT ewi (updated); 22.06.95. 
XX PO A C G T 01 5 1 1 0 02 1 5 1 0 03 1 6 0 0 04 0 0 7 0 05 0 0 7 0 06 7 0 0 0 07 7 0 0 0 08 0 0 7 0 09 3 1 1 2 10 0 2 4 1 XX BF T00975; NRF-2 ; human BF T01198; NRF-2 ; monkey BF T01199; NRF-2 ; rat XX BA 7 binding sites from 3 genes XX CC compiled sequences XX RN [1] RA Virbasius J. V., Virbasius C.-M. A., Scarpulla R. C. RT Identity of GABP with NRF-2, a multisubunit activator of cytochrome RT oxidase expression, reveals a cellular role for an Ets domain RT activator of viral promoters RL Genes Dev. 7:380-392 (1993). XX // AC M00109 XX ID V$CEBPB_01 XX NA C/EBPbeta XX DT ewi (created); 05.05.95. DT ewi (updated); 16.05.95. XX PO A C G T 01 9 1 8 3 02 5 3 7 6 03 9 3 8 1 04 1 0 0 20 05 1 0 7 13 06 7 0 8 6 07 5 2 5 9 08 0 0 21 0 09 8 9 0 4 10 20 1 0 0 11 21 0 0 0 12 3 1 6 11 13 5 6 7 3 14 3 3 6 9 XX BF T00017; AGP/EBP ; mouse BF T00459; LAP ; rat BF T00581; NF-IL6 ; human XX BA 21 binding sites from 14 cellular genes and 3 viral genomes XX CC compiled sequences XX RN [1] RA Akira S., Isshiki H., Sugita T., Tanabe O., Kinoshita S., Nishio RA Y., Nakajima T., Hirano T., Kishimoto T. RT A nuclear factor for IL-6 expression (NF-IL6) is a member of RT a C/EBP family RL EMBO J. 9:1897-1906 (1990). XX // AC M00110 XX ID I$ELF1_01 XX NA Elf-1 XX DT ewi (created); 05.05.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 1 1 2 02 1 1 2 1 03 1 0 2 2 04 2 0 1 2 05 2 1 1 1 06 0 2 1 2 07 0 0 4 1 08 0 0 5 0 09 0 0 0 5 10 1 0 1 3 11 0 0 0 5 12 1 1 0 3 13 1 0 3 1 14 2 1 0 2 15 3 1 1 0 16 1 1 1 2 XX BF T01019; Elf-1 ; fruit fly XX BA 5 binding sites from 2 genes XX CC compiled sequences XX RN [1] RA Dynlacht B. D., Attardi L. D., Admon A., Freeman M., Tjian R. RT Functional analysis of NTF-1, a developmentally regulated Drosophila RT transcription factor that binds neuronal cis elements RL Genes Dev. 3:1677-1688 (1989). XX // AC M00111 XX ID I$CF1_01 XX NA CF1 / USP XX DT ewi (created); 11.05.95. DT ewi (updated); 11.05.95. 
XX PO A C G T 01 15 4 80 1 02 25 6 66 3 03 1 3 93 3 04 2 4 90 4 05 1 0 7 92 06 0 84 10 6 07 80 5 14 1 08 11 47 11 31 09 12 29 49 10 XX BF T00117; CF1 ; fruit fly XX BA 71 selected binding sequences XX CC 3 rounds of selection and amplification from random 20-mers; CC figures are percentages XX RN [1] RA Christianson A. M. K., King D. L., Hatzivassiliou E., Casas RA J. E., Hallenbeck P. L., Nikodem V. M., Mitsialis S. A., Kafatos F. C. RT DNA binding and heteromerization of the Drosophila transcription RT factor chorion factor 1/ultraspiracle RL Proc. Natl. Acad. Sci. USA 89:11503-11507 (1992). XX // AC M00112 XX ID I$CF1_02 XX NA CF1 / USP XX DT ewi (created); 11.05.95. DT ewi (updated); 11.05.95. XX PO A C G T 01 3 3 94 0 02 5 0 95 0 03 0 0 100 0 04 0 0 100 0 05 0 0 0 100 06 0 94 3 3 07 95 0 5 0 08 11 68 5 16 09 13 24 55 8 XX BF T00117; CF1 ; fruit fly XX BA 38 selected binding sequences XX CC 5 rounds of selection and amplification from random 20-mers; CC figures are percentages XX RN [1] RA Christianson A. M. K., King D. L., Hatzivassiliou E., Casas RA J. E., Hallenbeck P. L., Nikodem V. M., Mitsialis S. A., Kafatos F. C. RT DNA binding and heteromerization of the Drosophila transcription RT factor chorion factor 1/ultraspiracle RL Proc. Natl. Acad. Sci. USA 89:11503-11507 (1992). XX // AC M00113 XX ID V$CREB_02 XX NA CREB XX DT ewi (created); 12.05.95. DT ewi (updated); 12.05.95. XX PO A C G T 01 2 5 4 5 02 1 3 6 6 03 2 2 12 0 04 5 1 7 3 05 0 0 0 16 06 0 0 15 1 07 16 0 0 0 08 0 15 1 0 09 0 0 15 1 10 1 5 0 10 11 4 7 3 2 12 5 6 3 2 XX BA 16 selected binding sequences XX CC 5 cycles of selection and amplification from random 15-mers XX RN [1] RA Paca-Uccaralertkun S., Zhao L.-J., Adya N., Cross J. V., Cullen RA B. R., Boros I. M., Giam C.-Z. RT In vitro selection of DNA elements highly responsive to the RT human T-cell lymphotropic virus type I transcriptional activator, Tax RL Mol. Cell. Biol. 14:456-462 (1994). 
XX // AC M00114 XX ID V$TAXCREB_01 XX NA Tax/CREB XX DT ewi (created); 12.05.95. DT ewi (updated); 12.05.95. XX PO A C G T 01 2 2 9 4 02 2 2 10 3 03 0 2 14 1 04 0 2 15 0 05 1 0 16 0 06 1 2 4 10 07 0 0 0 17 08 0 0 14 3 09 17 0 0 0 10 0 17 0 0 11 0 0 17 0 12 1 7 2 7 13 11 5 1 0 14 4 3 5 5 15 9 2 3 3 XX BF T00163; CREB ; human BF T00164; CREB ; rat BF T00165; deltaCREB ; rat BF T00166; deltaCREB ; human BF T00793; Tax ; HTLV-I BF T00989; CREB ; mouse BF T01311; deltaCREB ; mouse XX BA 17 different selected binding sequences XX CC 5 cycles of selection and amplification from random 15-mers CC using bacterially expressed Tax and CREB; only distinct sequences CC were counted; consensus G XX RN [1] RA Paca-Uccaralertkun S., Zhao L.-J., Adya N., Cross J. V., Cullen RA B. R., Boros I. M., Giam C.-Z. RT In vitro selection of DNA elements highly responsive to the RT human T-cell lymphotropic virus type I transcriptional activator, Tax RL Mol. Cell. Biol. 14:456-462 (1994). XX // AC M00115 XX ID V$TAXCREB_02 XX NA Tax/CREB XX DT ewi (created); 12.05.95. DT ewi (updated); 12.05.95. XX PO A C G T 01 3 0 2 0 02 0 0 0 5 03 0 0 5 0 04 5 0 0 0 05 0 5 0 0 06 0 0 5 0 07 0 5 0 0 08 5 0 0 0 09 0 1 0 4 10 4 1 0 0 11 0 3 0 2 12 0 5 0 0 13 0 5 0 0 14 0 4 0 1 15 1 4 0 0 XX BF T00163; CREB ; human BF T00164; CREB ; rat BF T00165; deltaCREB ; rat BF T00166; deltaCREB ; human BF T00793; Tax ; HTLV-I BF T00989; CREB ; mouse BF T01311; deltaCREB ; mouse XX BA 5 different selected binding sequences XX CC 5 cycles of selection and amplification from random 15-mers CC using bacterially expressed Tax and CREB; only distinct sequences CC were counted; consensus C XX RN [1] RA Paca-Uccaralertkun S., Zhao L.-J., Adya N., Cross J. V., Cullen RA B. R., Boros I. M., Giam C.-Z. RT In vitro selection of DNA elements highly responsive to the RT human T-cell lymphotropic virus type I transcriptional activator, Tax RL Mol. Cell. Biol. 14:456-462 (1994). 
XX // AC M00116 XX ID V$CEBPA_01 XX NA C/EBPalpha XX DT ewi (created); 16.05.95. DT ewi (updated); 16.05.95. XX PO A C G T 01 16 10 7 10 02 9 10 11 13 03 23 5 8 7 04 0 0 0 43 05 0 0 1 42 06 14 2 25 2 07 4 28 8 3 08 13 11 8 11 09 15 13 2 13 10 29 9 2 3 11 30 6 5 2 12 11 11 9 12 13 10 17 7 9 14 16 8 7 12 XX BF T00104; C/EBP ; mouse BF T00105; C/EBP ; human BF T00107; C/EBP ; chick BF T00108; C/EBP ; rat BF T01388; C/EBP ; clawed frog XX BA 43 binding sites from 29 genes expressed in the liver but not BA increasing during acute phase response XX CC compiled sequences XX RN [1] RA Johnson P. F., Williams S. C. RT CCAAT/enhancer binding (C/EBP) proteins RL Liver Gene Expression, F. Tronche and M. Yaniv (eds.), R. G. Landes Comp., Austin 1994 0:231-258 (1994). XX // AC M00117 XX ID V$CEBPB_02 XX NA C/EBPbeta XX DT ewi (created); 16.05.95. DT ewi (updated); 16.05.95. XX PO A C G T 01 6 2 4 5 02 2 2 6 7 03 6 4 5 2 04 0 0 1 16 05 0 0 0 17 06 3 0 13 1 07 1 14 2 0 08 6 2 6 3 09 4 8 0 5 10 12 4 1 0 11 15 1 0 1 12 2 7 2 6 13 3 4 2 8 14 4 5 5 3 XX BF T00017; AGP/EBP ; mouse BF T00459; LAP ; rat BF T00581; NF-IL6 ; human XX BA 17 binding sites from 14 genes expressed during acute phase response XX CC compiled sequences XX RN [1] RA Johnson P. F., Williams S. C. RT CCAAT/enhancer binding (C/EBP) proteins RL Liver Gene Expression, F. Tronche and M. Yaniv (eds.), R. G. Landes Comp., Austin 1994 0:231-258 (1994). XX // AC M00118 XX ID V$MYCMAX_01 XX NA c-Myc/Max XX DT ewi (created); 18.05.95. DT ewi (updated); 18.05.95. 
XX PO A C G T 01 25 31 13 31 02 28 7 47 18 03 70 3 21 6 04 10 69 19 2 05 0 100 0 0 06 100 0 0 0 07 0 100 0 0 08 0 0 100 0 09 0 0 0 100 10 0 0 100 0 11 2 19 69 10 12 6 21 3 70 13 18 47 7 28 14 31 13 31 25 XX BF T00140; c-Myc ; human BF T00141; c-Myc ; chick BF T00142; c-Myc ; rat BF T00143; c-Myc ; mouse BF T00489; Max ; human XX BA 34 selected binding sequences for Myc/Max dimers XX CC cumulative analysis of bound sequences after 7 and 8 rounds CC of selection and amplification; only sequences with CACGTG core CC were taken, the flanking regions were analysed as 68 NNNNCAG CC half-sites and are mirrored for this matrix; figures are percentages XX RN [1] RA Solomon D. L. C., Amati B., Land H. RT Distinct DNA binding preferences for the c-Myc/Max and Max/Max dimers RL Nucleic Acids Res. 21:5372-5376 (1993). XX // AC M00119 XX ID V$MAX_01 XX NA Max XX DT ewi (created); 18.05.95. DT ewi (updated); 18.05.95. XX PO A C G T 01 38 16 26 20 02 42 10 33 15 03 62 3 30 5 04 30 32 15 23 05 0 100 0 0 06 100 0 0 0 07 0 100 0 0 08 0 0 100 0 09 0 0 0 100 10 0 0 100 0 11 23 15 32 30 12 5 30 3 62 13 15 33 10 42 14 20 26 16 38 XX BF T00489; Max ; human XX BA 37 selected binding sequences XX CC cumulative analysis of bound sequences after 4 and 5 rounds CC of selection and amplification; only sequences with CACGTG core CC were taken, the flanking regions were analysed as 74 NNNNCAG CC half-sites and are mirrored for this matrix; figures are percentages XX RN [1] RA Solomon D. L. C., Amati B., Land H. RT Distinct DNA binding preferences for the c-Myc/Max and Max/Max dimers RL Nucleic Acids Res. 21:5372-5376 (1993). XX // AC M00120 XX ID I$DL_02 XX NA dl XX DT ewi (created); 18.05.95. DT ewi (updated); 18.05.95. 
XX PO A C G T 01 3 4 0 5 02 0 3 9 0 03 5 2 5 0 04 0 0 12 0 05 12 0 0 0 06 11 0 1 0 07 12 0 0 0 08 11 0 0 1 09 3 4 1 4 10 0 8 1 3 11 4 5 3 0 XX BF T00196; dl ; fruit fly XX BA 12 binding sites from 2 genes XX CC compiled sequences XX RN [1] RA Thisse C., Perrin-Schmitt F., Stoetzel C., Thisse B. RT Sequence-specific transactivation of the Drosophila twist gene RT by the dorsal gene product RL Cell 65:1191-1201 (1991). XX // AC M00121 XX ID V$USF_01 XX NA USF XX DT ewi (created); 19.05.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 13 7 11 5 02 7 6 17 10 03 19 3 17 3 04 6 16 4 20 05 0 58 0 0 06 56 1 1 0 07 0 55 0 3 08 3 0 55 0 09 0 1 1 56 10 0 0 58 0 11 20 4 16 6 12 3 17 3 19 13 10 17 6 7 14 5 11 7 13 XX BF T00870; USF ; hamster BF T00872; USF ; chick BF T00874; USF ; human BF T00875; USF ; rat BF T00876; USF ; duck BF T00877; USF ; mouse XX BA 58 selected half-site sequences XX CC 2 cycles of binding selection and amplification in the presence CC of Mg2+ ions using purified USF (HeLa cells); figures have been CC mirrored to obtain the complete matrix XX RN [1] RA Bendall A. J., Molloy P. L. RT Base preference for DNA binding by the bHLH-Zip protein USF: RT effects of magnesiumchloride on specificity and comparison with RT binding of Myc family members RL Nucleic Acids Res. 22:2801-2810 (1994). XX // AC M00122 XX ID V$USF_02 XX NA USF XX DT ewi (created); 19.05.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 17 7 9 15 02 16 7 15 16 03 27 7 22 2 04 15 17 10 25 05 0 74 1 0 06 69 2 3 4 07 1 52 13 15 08 15 13 52 1 09 4 3 2 69 10 0 1 74 0 11 25 10 17 15 12 2 22 7 27 13 16 15 7 16 14 15 9 7 17 XX BF T00870; USF ; hamster BF T00872; USF ; chick BF T00874; USF ; human BF T00875; USF ; rat BF T00876; USF ; duck BF T00877; USF ; mouse XX BA 81 selected half-site sequences XX CC 2 cycles of binding selection and amplification in the absence CC of Mg2+ ions using purified USF (HeLa cells); figures have been CC mirrored to obtain the complete matrix XX RN [1] RA Bendall A. 
J., Molloy P. L. RT Base preference for DNA binding by the bHLH-Zip protein USF: RT effects of magnesiumchloride on specificity and comparison with RT binding of Myc family members RL Nucleic Acids Res. 22:2801-2810 (1994). XX // AC M00123 XX ID V$MYCMAX_02 XX NA c-Myc/Max XX DT ewi (created); 19.05.95. DT ewi (updated); 19.05.95. XX PO A C G T 01 7 7 5 10 02 21 3 1 4 03 3 11 9 6 04 0 29 0 0 05 29 0 0 0 06 0 27 0 2 07 9 0 20 0 08 2 4 0 23 09 0 0 27 2 10 7 3 6 13 11 4 12 7 6 12 14 4 0 10 XX BF T00140; c-Myc ; human BF T00141; c-Myc ; chick BF T00142; c-Myc ; rat BF T00143; c-Myc ; mouse BF T00489; Max ; human XX BA 29 selected binding sequences XX CC 3 rounds of SAAB using Myc/Max complexes immunoprecipitated CC from K562 cells; 3 matrices have been combined XX RN [1] RA Blackwell T. K., Huang J., Ma A., Kretzner L., Alt F. W., Eisenman RA R. N., Weintraub H. RT Binding of Myc proteins to canonical and noncanonical DNA sequences RL Mol. Cell. Biol. 13:5216-5224 (1993). XX // AC M00124 XX ID V$PBX1_02 XX NA Pbx1b XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 8 16 5 4 02 16 9 2 10 03 10 20 4 5 04 36 1 3 0 05 0 1 0 39 06 0 40 0 0 07 37 3 0 0 08 40 0 0 0 09 0 0 0 40 10 0 40 0 0 11 36 2 1 1 12 34 1 1 4 13 16 4 6 14 14 9 7 6 17 15 11 6 2 18 XX BA 40 selected binding sequences XX CC 6 rounds of selection from random 30-mers by binding to bacterially CC expressed Pbx1b and amplification XX RN [1] RA Lu Q., Wright D. D., Kamps M. P. RT Fusion with E2A converts the Pbx1 homeodomain protein into a RT constitutive transcriptional activator in human leukemias carrying RT the t(1;19) translocation RL Mol. Cell. Biol. 14:3938-3948 (1994). XX // AC M00125 XX ID F$MCM1_01 XX NA MCM1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. 
XX PO A C G T 01 8 3 2 10 02 3 4 4 12 03 6 0 5 12 04 0 23 0 0 05 0 23 0 0 06 0 10 0 13 07 18 1 1 3 08 19 0 1 3 09 11 0 0 12 10 5 10 4 4 11 8 4 6 5 12 5 0 18 0 13 1 0 21 1 14 6 2 0 15 15 14 6 2 1 16 15 0 2 6 XX BA 23 selected binding sequences XX CC 4 rounds of selection / amplification using in vitro transcribed/translated CC MCM1[1-112]T XX RN [1] RA Wynne J., Treisman R. RT SRF and MCM1 have related but distinct DNA binding specificities RL Nucleic Acids Res. 20:3297-3303 (1992). XX // AC M00126 XX ID V$GATA1_02 XX NA GATA-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 3 3 7 6 02 5 4 2 8 03 8 5 4 2 04 5 4 5 5 05 9 4 1 5 06 0 0 20 0 07 20 0 0 0 08 0 0 0 20 09 20 0 0 0 10 6 4 9 1 11 2 1 11 6 12 4 3 11 2 13 2 4 10 4 14 6 4 4 6 XX BA 20 selected binding sequences XX CC 3 rounds of selection / amplification from random 26-mers binding CC to murine GATA-1 in crude MEL extracts and using mGATA-1 antibodies; CC only sites containing GATA consensus XX RN [1] RA Whyatt D. J., deBoer E., Grosveld F. RT The two zinc finger-like domains of GATA-1 have different DNA RT binding specificities RL EMBO J. 12:4993-5005 (1993). XX // AC M00127 XX ID V$GATA1_03 XX NA GATA-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 4 1 2 0 02 1 1 3 2 03 1 2 4 0 04 2 2 2 1 05 3 0 2 2 06 0 0 12 0 07 12 0 0 0 08 0 0 0 12 09 12 0 0 0 10 8 1 3 0 11 1 4 4 3 12 3 4 3 2 13 3 1 7 1 14 2 4 4 2 XX BA 12 selected binding sequences XX CC 3 rounds of selection / amplification from random 26-mers binding CC to murine GATA-1 in crude MEL extracts and using mGATA-1 antibodies; CC only sequences with GATT consensus XX RN [1] RA Whyatt D. J., deBoer E., Grosveld F. RT The two zinc finger-like domains of GATA-1 have different DNA RT binding specificities RL EMBO J. 12:4993-5005 (1993). XX // AC M00128 XX ID V$GATA1_04 XX NA GATA-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95.
XX PO A C G T 01 15 1 11 10 02 10 11 12 15 03 11 25 9 3 04 25 2 2 19 05 0 0 48 0 06 48 0 0 0 07 0 0 0 48 08 48 0 0 0 09 27 1 16 4 10 11 8 24 5 11 12 8 18 10 12 15 13 16 4 13 8 13 14 12 XX BA 48 putative in vivo GATA-1 sites XX CC compiled genomic sequences XX RN [1] RA Whyatt D. J., deBoer E., Grosveld F. RT The two zinc finger-like domains of GATA-1 have different DNA RT binding specificities RL EMBO J. 12:4993-5005 (1993). XX // AC M00129 XX ID V$HFH1_01 XX NA HFH-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 2 3 3 6 02 10 1 2 1 03 4 2 0 8 04 2 0 0 12 05 0 0 14 0 06 0 0 0 14 07 0 0 0 14 08 0 0 0 14 09 13 0 1 0 10 0 1 3 10 11 5 0 2 7 12 1 2 0 11 XX BA 14 selected binding sequences XX CC 4 cycles of SAAB using bacterially expressed HFH-1 and random 12-mers XX RN [1] RA Overdier D. G., Porcella A., Costa R. H. RT The DNA-binding specificity of the hepatocyte nuclear factor RT 3/forkhead domain is influenced by amino acid residues adjacent RT to the recognition helix RL Mol. Cell. Biol. 14:2755-2766 (1994). XX // AC M00130 XX ID V$HFH2_01 XX NA HFH-2 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 9 3 14 6 02 21 2 0 9 03 15 1 0 16 04 1 2 0 29 05 10 0 21 1 06 0 0 0 32 07 1 0 0 31 08 5 0 5 22 09 11 0 14 7 10 5 5 1 21 11 1 2 4 25 12 1 10 0 21 XX BA 32 selected binding sequences XX CC 4 cycles of SAAB using bacterially expressed HFH-2 and random CC 12-mers; joint matrix for 3 groups of sequences XX RN [1] RA Overdier D. G., Porcella A., Costa R. H. RT The DNA-binding specificity of the hepatocyte nuclear factor RT 3/forkhead domain is influenced by amino acid residues adjacent RT to the recognition helix RL Mol. Cell. Biol. 14:2755-2766 (1994). XX // AC M00131 XX ID V$HNF3B_01 XX NA HNF-3beta XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. 
XX PO A C G T 01 7 8 4 4 02 9 8 1 5 03 7 5 2 9 04 0 0 0 23 05 12 0 11 0 06 0 1 0 22 07 0 1 0 22 08 1 0 5 17 09 8 0 15 0 10 1 14 0 8 11 2 3 1 17 12 0 9 1 13 XX BA 23 binding sites from 15 genes XX CC compiled sequences XX XX // AC M00132 XX ID V$HNF1_01 XX NA HNF-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 6 0 16 3 02 2 0 24 0 03 1 0 0 25 04 2 0 0 24 05 25 0 0 1 06 20 2 1 3 07 2 0 0 24 08 7 4 8 7 09 16 0 1 9 10 3 1 1 21 11 1 4 0 21 12 17 1 5 3 13 9 11 3 3 14 7 14 0 5 15 10 7 4 4 XX BA 26 binding sites from 20 genes XX CC compiled sequences XX RN [1] RA Tronche F., Yaniv M. RT HNF1, a homeoprotein member of the hepatic transcription regulatory RT network RL BioEssays 14:579-587 (1992). XX // AC M00133 XX ID V$TST1_01 XX NA Tst-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 1 1 3 1 02 2 1 2 1 03 0 1 3 2 04 1 1 4 0 05 5 0 0 1 06 4 0 0 2 07 0 0 0 6 08 2 0 0 4 09 4 1 0 1 10 2 2 2 0 11 4 1 1 0 12 2 2 2 0 13 0 1 1 4 14 1 0 2 2 15 1 2 2 1 XX BA 6 binding sites in the P0 promoter XX CC compiled sequences XX RN [1] RA He X., Gerrero R., Simmons D. M., Park R. E., Lin C. R., Swanson RA L. W., Rosenfeld M. G. RT Tst-1, a member of the POU domain gene family, binds the promoter RT of the gene encoding the cell surface adhesion molecule po RL Mol. Cell. Biol. 11:1739-1744 (1991). XX // AC M00134 XX ID V$HNF4_01 XX NA HNF-4 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 10 4 4 6 02 6 9 7 5 03 12 6 7 6 04 12 3 14 3 05 2 0 29 1 06 5 2 17 8 07 3 8 10 11 08 1 23 1 7 09 27 1 3 1 10 29 0 3 0 11 26 0 5 1 12 3 0 28 1 13 3 1 16 12 14 2 6 6 18 15 0 24 1 7 16 22 4 4 2 17 9 9 6 6 18 7 5 13 5 19 8 3 6 7 XX BF T00372; HNF-4 ; rat BF T00373; HNF-4 ; human XX BA 32 binding sites from 24 genes XX CC compiled sequences XX XX // AC M00135 XX ID V$OCT1_01 XX NA Oct-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95.
XX PO A C G T 01 9 8 4 9 02 6 8 11 14 03 13 14 12 5 04 24 11 7 9 05 26 2 9 17 06 1 1 0 54 07 56 0 0 0 08 0 0 0 56 09 0 0 56 0 10 0 53 1 2 11 43 0 0 13 12 56 0 0 0 13 55 0 1 0 14 3 0 1 52 15 14 7 9 24 16 4 11 6 32 17 8 19 14 9 18 13 11 14 10 19 12 9 14 10 XX BF T00641; Oct-1 ; human BF T00642; Oct-1 ; clawed frog BF T00643; Oct-1 ; rat BF T00644; Oct-1 ; mouse BF T00959; Oct-1 ; monkey BF T01031; Oct-1 ; chick BF T01157; Oct-1 ; gibbon ape BF T01466; Oct-1 ; hamster XX BA 56 selected binding sequences XX CC 4 rounds of selection and amplification from random 24-mers CC using the whole Oct-1 POU domain; protein was expressed in HeLa CC cells using a vaccinia expression system XX RN [1] RA Verrijzer C. P., Alkema M. J., van Weperen W. W., Van Leeuwen RA H. C., Strating M. J. J., van der Vliet P. C. RT The DNA binding specificity of the bipartite POU domain and RT its subdomains RL EMBO J. 11:4993-5003 (1992). XX // AC M00136 XX ID V$OCT1_02 XX NA Oct-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 13 9 7 4 02 10 4 8 14 03 3 5 28 5 04 33 5 4 2 05 36 2 2 4 06 2 1 0 41 07 44 0 0 0 08 0 1 0 43 09 0 1 23 20 10 0 43 0 1 11 33 3 5 3 12 12 10 6 13 13 11 7 11 11 14 10 11 6 13 15 15 7 6 12 XX BF T00641; Oct-1 ; human BF T00642; Oct-1 ; clawed frog BF T00643; Oct-1 ; rat BF T00644; Oct-1 ; mouse BF T00959; Oct-1 ; monkey BF T01031; Oct-1 ; chick BF T01157; Oct-1 ; gibbon ape BF T01466; Oct-1 ; hamster XX BA 44 selected binding sequences XX CC 4 rounds of selection and amplification from random 24-mers CC using the POU-specific domain of Oct-1; protein was expressed CC in HeLa cells using a vaccinia expression system XX RN [1] RA Verrijzer C. P., Alkema M. J., van Weperen W. W., Van Leeuwen RA H. C., Strating M. J. J., van der Vliet P. C. RT The DNA binding specificity of the bipartite POU domain and RT its subdomains RL EMBO J. 11:4993-5003 (1992). XX // AC M00137 XX ID V$OCT1_03 XX NA Oct-1 XX DT hiwi (created); 22.05.95. 
DT ewi (updated); 22.05.95. XX PO A C G T 01 7 11 10 10 02 8 7 12 14 03 8 16 11 9 04 18 7 22 4 05 1 8 0 42 06 50 0 0 1 07 50 0 0 1 08 8 5 2 36 09 8 11 18 10 10 31 3 8 7 11 6 8 13 10 12 11 4 13 8 13 11 5 6 13 XX BF T00641; Oct-1 ; human BF T00642; Oct-1 ; clawed frog BF T00643; Oct-1 ; rat BF T00644; Oct-1 ; mouse BF T00959; Oct-1 ; monkey BF T01031; Oct-1 ; chick BF T01157; Oct-1 ; gibbon ape BF T01466; Oct-1 ; hamster XX BA 51 selected binding sequences XX CC 4 rounds of selection and amplification from random 24-mers CC using the isolated homeodomain domain of Oct-1; protein was CC expressed in HeLa cells using a vaccinia expression system XX RN [1] RA Verrijzer C. P., Alkema M. J., van Weperen W. W., Van Leeuwen RA H. C., Strating M. J. J., van der Vliet P. C. RT The DNA binding specificity of the bipartite POU domain and RT its subdomains RL EMBO J. 11:4993-5003 (1992). XX // AC M00138 XX ID V$OCT1_04 XX NA Oct-1 XX DT hiwi (created); 22.05.95. DT ewi (updated); 22.05.95. XX PO A C G T 01 6 2 6 6 02 6 6 2 7 03 12 1 6 6 04 6 6 6 13 05 12 2 12 7 06 15 7 7 10 07 11 6 9 16 08 11 7 3 23 09 41 3 2 1 10 0 4 4 39 11 3 1 34 9 12 6 27 4 10 13 36 3 0 8 14 30 0 4 13 15 44 1 2 0 16 10 1 3 29 17 11 7 6 18 18 13 9 4 12 19 16 5 5 10 20 13 3 4 11 21 9 8 4 6 22 4 7 11 4 23 12 1 3 6 XX BF T00641; Oct-1 ; human BF T00642; Oct-1 ; clawed frog BF T00643; Oct-1 ; rat BF T00644; Oct-1 ; mouse BF T00959; Oct-1 ; monkey BF T01031; Oct-1 ; chick BF T01157; Oct-1 ; gibbon ape BF T01466; Oct-1 ; hamster XX BA 49 selected binding sequences XX CC 3 rounds of selection and amplification using the HeLa fractionated CC extract and Oct-1-specific antibodies XX RN [1] RA Bendall A. J., Sturm R. A., Danoy P. A. C., Molloy P. L. RT Broad binding-site specificity and affinity properties of octamer RT 1 and brain octamer-binding proteins RL Eur. J. Biochem. 217:799-811 (1993). XX // AC M00139 XX ID V$AHR_01 XX NA AhR XX DT ewi (created); 26.05.95. DT ewi (updated); 26.05.95.
XX PO A C G T 01 0 6 2 1 02 0 6 0 3 03 0 5 1 3 04 0 7 2 0 05 2 3 3 1 06 4 0 5 0 07 3 1 4 1 08 0 5 4 0 09 0 0 0 9 10 3 2 1 3 11 0 0 9 0 12 0 9 0 0 13 0 0 9 0 14 0 0 0 9 15 0 0 9 0 16 6 1 1 1 17 1 3 5 0 18 6 0 0 3 XX BF T00018; AhR ; mouse BF T00019; AhR ; rat BF T00194; dioxin receptor ; mouse BF T00195; dioxin receptor ; rat XX BA 9 binding sites in rat and mouse cytochrome P450 genes XX CC sequences compiled by TRANSFAC XX RN [1] RA Denison M. S., Fisher J. M., Whitlock J. P. RT Protein-DNA Interactions at Recognition Sites for the Dioxin-Ah RT Receptor Complex RL J. Biol. Chem. 264:16478-16482 (1989). RN [2] RA Neuhold L. A., Shirayoshi Y., Ozato K., Jones J. E., Nebert D. W. RT Regulation of Mouse CYP1A1 Gene Expression by Dioxin: Requirement RT of Two cis-Acting Elements during Induction RL Mol. Cell. Biol. 9:2378-2386 (1989). RN [3] RA Hapgood J., Cuthill S., Denis M., Poellinger L., Gustafsson J.-A. RT Specific protein-DNA interactions at a xenobiotic-responsive RT element: Copurification of dioxin receptor and DNA-binding affinity RL Proc. Natl. Acad. Sci. USA 86:60-64 (1989). RN [4] RA Hapgood J., Cuthill S., Soederkvist P., Wilhelmsson A., Pongratz RA I., Tukey R. H., Johnson E. F., Gustafsson J.-A., Poellinger L. RT Liver cells contain constitutive DNase I-hypersensitive sites RT at the xenobiotic response elements 1 and 2 (XRE1 and -2) of RT the rat cytochrome P-450IA1 gene and a constitutive, nuclear RT XRE-binding factor that is distinct from the dioxin receptor RL Mol. Cell. Biol. 11:4314-4323 (1991). RN [5] RA Hoffman E. C., Reyes H., Chu F.-F., Sander F., Conley L. H., RA Brooks B. A., Hankinson O. RT Cloning of a factor required for activity of the Ah (dioxin) receptor RL Science 252:954-958 (1991). XX // AC M00140 XX ID I$BCD_01 XX NA Bcd XX DT ewi (created); 26.05.95. DT ewi (updated); 26.05.95. 
XX PO A C G T 01 0 4 7 1 02 2 1 8 1 03 0 0 12 0 04 12 0 0 0 05 0 0 0 12 06 0 0 0 12 07 12 0 0 0 08 6 1 2 3 XX BF T00063; Bcd ; fruit fly XX BA 12 binding sites from 3 genes XX CC sequences compiled in TRANSFAC XX XX // AC M00141 XX ID V$LYF1_01 XX NA Lyf-1 XX DT ewi (created); 29.05.95. DT ewi (updated); 29.05.95. XX PO A C G T 01 0 1 2 8 02 0 3 0 8 03 1 1 0 9 04 2 0 9 0 05 1 0 10 0 06 0 1 10 0 07 10 0 1 0 08 3 0 7 1 09 5 0 6 0 XX BF T00479; LyF-1 ; mouse XX BA 11 binding sites from 5 genes XX CC compiled sequences XX RN [1] RA Lo K., Landau N. R., Smale S. T. RT LyF-1, a transcriptional regulator that interacts with a novel RT class of promoters for lymphocytes-specific genes RL Mol. Cell. Biol. 11:5229-5243 (1991). XX // AC M00142 XX ID F$NIT2_01 XX NA NIT2 XX DT ewi (created); 29.05.95. DT ewi (updated); 29.05.95. XX PO A C G T 01 0 0 0 16 02 16 0 0 0 03 0 0 0 16 04 0 16 0 0 05 4 0 3 9 06 7 6 0 3 XX BA 16 binding sites from 2 genes of Neurospora crassa and Aspergillus nidulans XX CC compiled sequences XX RN [1] RA Fu Y.-H., Marzluf G. A. RT nit-2, the major positive-acting nitrogen regulatory gene of RT Neurospora crassa, encodes a sequence-specific DNA-binding protein RL Proc. Natl. Acad. Sci. USA 87:5331-5335 (1990). XX // AC M00143 XX ID V$PAX5_01 XX NA BSAP XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 0 2 2 3 02 1 5 0 1 03 1 2 3 1 04 3 1 1 2 05 2 1 3 1 06 3 0 4 0 07 1 3 2 1 08 0 1 4 2 09 0 7 0 0 10 7 0 0 0 11 2 2 2 1 12 0 2 2 3 13 0 1 6 0 14 3 1 1 2 15 2 0 1 4 16 0 0 7 0 17 2 3 1 1 18 3 0 4 0 19 1 0 2 4 20 4 1 2 0 21 1 1 5 0 22 2 5 0 0 23 1 3 3 0 24 2 0 4 1 25 1 4 2 0 26 2 3 0 2 27 1 3 2 1 28 0 3 2 2 XX BF T00070; BSAP ; human BF T01201; BSAP ; mouse XX BA 7 binding sites from 7 genes XX CC compiled class I sequences which are bound by the full-length CC protein, but not by the truncated BSAP(1-107) XX RN [1] RA Czerny T., Schaffner G., Busslinger M. 
RT DNA sequence recognition by Pax proteins: bipartite structure RT of the paired domain and its binding site RL Genes Dev. 7:2048-2061 (1993). XX // AC M00144 XX ID V$PAX5_02 XX NA BSAP XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 2 1 2 0 02 3 0 2 0 03 2 2 0 1 04 1 2 2 0 05 2 0 1 2 06 1 0 4 0 07 3 0 1 1 08 1 1 1 2 09 2 0 1 2 10 0 2 1 2 11 0 3 1 1 12 0 1 0 4 13 1 1 2 1 14 3 0 2 0 15 4 0 1 0 16 0 0 5 0 17 0 5 0 0 18 0 0 5 0 19 0 0 2 3 20 2 0 3 0 21 4 0 0 1 22 0 5 0 0 23 1 2 2 0 24 3 0 2 0 25 0 2 0 3 26 2 1 1 1 27 1 2 2 0 28 2 2 0 1 XX BF T00070; BSAP ; human BF T01201; BSAP ; mouse XX BA 5 binding sites from 5 genes XX CC compiled class II sequences which are bound by the full-length CC protein as well as by the truncated BSAP(1-107) XX RN [1] RA Czerny T., Schaffner G., Busslinger M. RT DNA sequence recognition by Pax proteins: bipartite structure RT of the paired domain and its binding site RL Genes Dev. 7:2048-2061 (1993). XX // AC M00145 XX ID V$BRN2_01 XX NA Brn-2 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 2 3 4 3 02 2 4 2 4 03 1 9 1 1 04 11 0 0 1 05 1 0 0 11 06 1 3 3 3 07 1 5 3 1 08 2 0 2 1 09 6 0 0 6 10 12 0 0 0 11 12 0 0 0 12 1 0 0 11 13 4 2 5 1 14 6 4 1 1 15 5 0 5 2 16 2 5 1 4 XX BF T00630; N-Oct-3 ; human BF T01524; Brn-2 ; rat XX BA 12 binding sites from 9 vertebral and insect genes XX CC compiled sequences XX RN [1] RA Li P., He X., Gerrero M. R., Mok M., Aggarwal A., Rosenfeld M. G. RT Spacing and orientation of bipartite DNA-binding motifs as potential RT functional determinants for POU domain factors RL Genes Dev. 7:2483-2496 (1993). XX // AC M00146 XX ID V$HSF1_01 XX NA HSF1 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. 
XX PO A C G T 01 24 4 12 5 02 0 0 45 0 03 38 1 2 4 04 36 1 2 6 05 7 14 9 15 06 13 2 21 8 07 6 4 6 29 08 5 4 5 31 09 0 45 0 0 10 6 7 21 11 XX BF T01042; HSF1 ; human BF T01044; HSF1 ; chick BF T01525; HSF1 ; mouse XX BA 45 selected binding sequences XX CC 5 rounds of selection from random 27-mers XX RN [1] RA Kroeger P. E., Morimoto R. I. RT Selection of new HSF1 and HSF2 DNA-binding sites reveals differences RT in trimer cooperativity RL Mol. Cell. Biol. 14:7592-7603 (1994). XX // AC M00147 XX ID V$HSF2_01 XX NA HSF2 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 9 6 8 3 02 0 0 27 0 03 24 1 2 0 04 20 1 2 4 05 4 6 7 10 06 8 7 8 4 07 8 2 2 15 08 4 0 2 20 09 0 27 0 0 10 2 4 12 9 XX BF T00972; HSF2 ; mouse BF T01043; HSF2 ; human BF T01045; HSF2 ; chick XX BA 27 selected binding sequences XX CC 5 rounds of selection from random 27-mers XX RN [1] RA Kroeger P. E., Morimoto R. I. RT Selection of new HSF1 and HSF2 DNA-binding sites reveals differences RT in trimer cooperativity RL Mol. Cell. Biol. 14:7592-7603 (1994). XX // AC M00148 XX ID V$SRY_01 XX NA SRY XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 19 0 0 4 02 22 0 0 1 03 16 0 1 6 04 0 17 6 0 05 14 0 1 8 06 13 5 2 3 07 12 8 2 1 XX BF T00996; SRY ; mouse BF T00997; SRY ; human XX BA 23 selected binding sequences XX CC 5 rounds of selection from random dekamers XX RN [1] RA Pontiggia A., Rimini R., Harley V. R., Goodfellow P. N., Lovell-Badge RA R., Bianchi M. E. RT Sex-reversing mutations affect the architecture of SRY--DNA complexes RL EMBO J. 13:6115-6124 (1994). XX // AC M00149 XX ID P$SBF1_01 XX NA SBF-1 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 0 1 2 3 02 2 1 0 3 03 2 0 3 1 04 1 1 0 4 05 1 1 3 1 06 1 1 4 0 07 0 0 0 6 08 0 0 0 6 09 6 0 0 0 10 5 1 0 0 11 3 0 0 3 12 4 0 0 2 13 3 0 0 3 14 1 1 2 2 XX BF T00739; SBF-1 ; french bean XX BA 6 binding sites from 2 genes XX CC compiled sequences XX RN [1] RA Lawton M. A., Dean S. 
M., Dron M., Kooter J. M., Kragh D. M., RA Harrison M. J., Yu L., Tanguay L., Dixon R. A., Lamb C. J. RT Silencer region of a chalcone synthase promoter contains multiple RT binding sites for a factor, SBF-1, closely related to GT-1 RL Plant Mol. Biol. 16:235-249 (1991). XX // AC M00150 XX ID V$BRACH_01 XX NA Brachyury XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 15 17 3 5 02 5 3 15 17 03 15 14 0 11 04 5 12 16 7 05 1 2 0 37 06 2 19 19 0 07 30 0 7 3 08 0 40 0 0 09 38 0 2 0 10 2 34 2 2 11 2 28 8 2 12 1 1 0 38 13 40 0 0 0 14 0 0 40 0 15 0 0 40 0 16 0 0 0 40 17 0 0 40 0 18 0 2 0 38 19 1 7 28 4 20 40 0 0 0 21 31 5 0 4 22 33 0 3 4 23 5 5 1 29 24 1 5 8 26 XX BF T01526; Brachyury ; mouse XX BA 40 selected binding sequences XX CC 4 rounds of selection from random 26-mers by binding to in vitro CC transcribed and translated factor XX RN [1] RA Kispert A., Herrmann B. G. RT The Brachyury gene encodes a novel DNA binding protein RL EMBO J. 12:3211-3220 (1993). XX // AC M00151 XX ID P$AG_01 XX NA AG XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 21 20 6 19 02 9 3 3 51 03 10 0 1 55 04 29 8 8 21 05 0 66 0 0 06 0 65 0 1 07 31 3 6 26 08 47 2 0 17 09 52 0 1 13 10 25 0 1 40 11 17 15 11 23 12 19 8 20 19 13 7 0 57 2 14 2 0 54 10 15 22 17 5 22 16 45 4 5 12 17 40 6 9 11 18 15 10 16 25 XX BF T01007; AG ; mouse-ear cress XX BA 66 selected binding sequences XX CC 5 rounds of selection from random 26-mers bound to bacterially expressed protein XX RN [1] RA Huang H., Mizukami Y., Hu Y., Ma H. RT Isolation and characterization of the binding sequences for RT the product of the Arabidopsis floral homeotic gene AGAMOUS RL Nucleic Acids Res. 21:4769-4776 (1993). XX // AC M00152 XX ID V$SRF_01 XX NA SRF XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. 
XX PO A C G T 01 28 1 2 2 02 1 0 0 32 03 1 0 31 1 04 7 25 0 1 05 0 33 0 0 06 0 33 0 0 07 23 0 0 10 08 2 0 0 31 09 33 0 0 0 10 0 0 0 33 11 31 0 0 2 12 10 0 0 23 13 0 0 33 0 14 0 0 33 0 15 12 4 3 14 16 13 9 8 3 17 11 6 7 9 18 1 4 8 20 XX BA 33 selected binding sequences XX CC 4 rounds of selection from random 26-mers using overexpressed CC SRF present in crude 3T3 cell extracts and specific antibodies CC for immunoprecipitation XX RN [1] RA Pollock R., Treisman R. RT A sensitive method for the determination of protein-DNA binding RT specificities RL Nucleic Acids Res. 18:6197-6204 (1990). XX // AC M00154 XX ID F$STRE_01 XX NA STRE XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 3 3 1 10 02 6 7 1 3 03 17 0 0 0 04 0 0 17 0 05 0 0 17 0 06 0 0 17 0 07 0 0 17 0 08 4 5 7 1 XX BA 17 stress-response element-related sequences from 7 stress-induced yeast genes XX CC compiled sequences XX RN [1] RA Schüller C., Brewster J. L., Alexander M. R., Gustin M. C., Ruis H. RT The HOG pathway controls osmotic regulation of transcription RT via the stress response element (STRE) of the Saccharomyces RT cerevisiae CTT1 gene RL EMBO J. 13:4382-4389 (1994). XX // AC M00155 XX ID V$ARP1_01 XX NA ARP-1 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 0 1 0 6 02 0 1 6 0 03 6 1 0 0 04 1 0 1 0 05 0 5 1 1 06 0 6 0 1 07 0 4 0 3 08 1 1 0 5 09 0 0 0 7 10 0 0 7 0 11 4 0 1 0 12 2 3 1 0 13 2 5 0 0 14 0 5 0 2 15 0 4 1 2 16 3 0 0 4 XX BF T00045; ARP-1 ; human XX BA 7 binding sites from 6 genes XX CC compiled sequences XX RN [1] RA Ladias J. A. A., Karathanasis S. K. RT Regulation of the apolipoprotein A1 gene by ARP-1, a novel member RT of the steroid receptor superfamily RL Science 251:561-565 (1991). XX // AC M00156 XX ID V$RORA1_01 XX NA RORalpha1 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. 
XX PO A C G T 01 10 2 7 6 02 9 3 3 10 03 15 1 2 7 04 9 1 0 15 05 6 12 4 3 06 11 2 5 7 07 21 0 4 0 08 0 0 25 0 09 0 0 25 0 10 0 0 0 25 11 0 25 0 0 12 25 0 0 0 13 7 6 2 10 XX BF T01527; RORalpha1 ; human XX BA 25 selected binding sequences XX CC 4 rounds of selection from random 30-mers by binding to in vitro-synthesized CC factor XX RN [1] RA Giguère V., Tini M., Flock G., Ong E., Evans R. M., Otulakowski G. RT Isoform-specific amino-terminal domains dictate DNA-binding RT properties of RORalpha, a novel family of orphan hormone nuclear RT receptors RL Genes Dev. 8:538-553 (1994). XX // AC M00157 XX ID V$RORA2_01 XX NA RORalpha2 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 17 2 7 10 02 15 0 4 17 03 36 0 0 0 04 22 0 0 14 05 2 12 13 9 06 0 0 0 36 07 28 0 8 0 08 0 0 36 0 09 0 0 36 0 10 0 0 0 36 11 0 36 0 0 12 36 0 0 0 13 14 7 10 5 XX BF T01528; RORalpha2 ; human XX BA 36 selected binding sequences XX CC 6 rounds of selection from random 30-mers by binding to in vitro-synthesized CC factor XX RN [1] RA Giguère V., Tini M., Flock G., Ong E., Evans R. M., Otulakowski G. RT Isoform-specific amino-terminal domains dictate DNA-binding RT properties of RORalpha, a novel family of orphan hormone nuclear RT receptors RL Genes Dev. 8:538-553 (1994). XX // AC M00158 XX ID V$COUP_01 XX NA COUP-TF / HNF-4 XX DT hiwi (created); 08.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 0 0 2 11 02 1 0 12 0 03 12 0 1 0 04 6 7 0 0 05 0 13 0 0 06 0 3 0 10 07 0 2 0 11 08 1 0 0 12 09 2 0 11 0 10 6 4 3 0 11 6 5 1 1 12 1 10 1 1 13 3 6 0 4 14 0 3 3 7 XX BF T00147; COUP ; rat BF T00148; COUP ; chick BF T00149; COUP ; human BF T00372; HNF-4 ; rat BF T00373; HNF-4 ; human BF T00994; COUP-TF ; mouse XX BA 13 genomic binding sites for COUP-TF and HNF-4 XX CC compiled sequences XX RN [1] RA Kimura A., Nishiyori A., Murakami T., Tsukamoto T., Hata S., RA Osumi T., Okamura R., Mori M., Takiguchi M.
RT Chicken ovalbumin upstream promoter-transcription factor (COUP-TF) RT represses transcription from the promoter of the gene for ornithine RT transcarbamylase in a manner antagonistic to hepatocyte nuclear RT factor-4 (HNF-4) RL J. Biol. Chem. 268:11125-11133 (1993). XX // AC M00159 XX ID V$CEBP_01 XX NA C/EBP XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 5 3 4 10 02 7 6 7 2 03 5 0 3 14 04 3 0 8 11 05 2 0 2 18 06 0 0 22 0 07 2 5 12 3 08 8 1 2 11 09 10 2 4 6 10 16 0 0 6 11 6 5 7 4 12 4 4 7 7 13 4 7 4 7 XX BF T00104; C/EBP ; mouse BF T00105; C/EBP ; human BF T00107; C/EBP ; chick BF T00108; C/EBP ; rat BF T01388; C/EBP ; clawed frog XX BA 22 binding sites from 9 viral and cellular genes XX CC compiled sequences XX RN [1] RA Grange T., Roux J., Rigaud G., Pictet R. RT Cell-type specific activity of two glucocorticoid responsive RT units of rat tyrosine aminotransferase gene is associated with RT multiple binding sites for C/EBP and a novel liver-specific RT nuclear factor RL Nucleic Acids Res. 19:131-139 (1991). XX // AC M00160 XX ID V$SRY_02 XX NA SRY XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. XX PO A C G T 01 6 6 10 7 02 10 4 3 12 03 15 1 2 11 04 21 2 1 5 05 25 0 0 4 06 0 29 0 0 07 28 0 0 1 08 27 0 0 2 09 10 0 1 18 10 16 3 5 5 11 7 6 13 3 12 11 2 10 6 XX BF T00996; SRY ; mouse BF T00997; SRY ; human XX BA 29 selected binding sequences XX CC 9 rounds of selection from random 26-mers by binding to bacterially CC expressed factor XX RN [1] RA Harley V. R., Lovell-Badge R., Goodfellow P. N. RT Definition of a consensus DNA binding site for SRY RL Nucleic Acids Res. 22:1500-1501 (1994). XX // AC M00161 XX ID V$OCT1_05 XX NA Oct-1 XX DT hiwi (created); 08.06.95. DT ewi (updated); 14.06.95. 
XX PO A C G T 01 3 3 1 0 02 0 0 3 4 03 3 2 2 0 04 6 1 0 0 05 0 0 0 7 06 0 0 0 7 07 2 0 0 5 08 0 1 5 1 09 0 7 0 0 10 7 0 0 0 11 0 0 0 7 12 5 0 1 1 13 1 2 0 4 14 0 2 1 4 XX BF T00641; Oct-1 ; human BF T00642; Oct-1 ; clawed frog BF T00643; Oct-1 ; rat BF T00644; Oct-1 ; mouse BF T00959; Oct-1 ; monkey BF T01031; Oct-1 ; chick BF T01157; Oct-1 ; gibbon ape BF T01466; Oct-1 ; hamster XX BA 7 binding sites from 6 genes XX CC compiled sequences XX RN [1] RA Groenen M. A. M., Dijkhof R. J. M., van der Poel J. J., van RA Diggelen R., Verstege E. RT Multiple octamer binding sites in the promoter region of the RT bovine alphas2-casein gene RL Nucleic Acids Res. 20:4311-4318 (1992). XX // AC M00162 XX ID V$OCT1_06 XX NA Oct-1 XX DT hiwi (created); 08.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 1 4 1 0 02 3 0 1 2 03 2 1 1 2 04 4 1 0 1 05 3 1 0 2 06 0 0 0 6 07 0 0 3 3 08 3 0 1 2 09 1 3 2 0 10 6 0 0 0 11 0 0 0 6 12 2 0 3 1 13 0 3 0 3 14 2 1 1 2 XX BF T00641; Oct-1 ; human BF T00642; Oct-1 ; clawed frog BF T00643; Oct-1 ; rat BF T00644; Oct-1 ; mouse BF T00959; Oct-1 ; monkey BF T01031; Oct-1 ; chick BF T01157; Oct-1 ; gibbon ape BF T01466; Oct-1 ; hamster XX BA 6 binding sites of TAATGARAT type from 3 genes XX CC compiled sequences XX RN [1] RA Groenen M. A. M., Dijkhof R. J. M., van der Poel J. J., van RA Diggelen R., Verstege E. RT Multiple octamer binding sites in the promoter region of the RT bovine alphas2-casein gene RL Nucleic Acids Res. 20:4311-4318 (1992). XX // AC M00163 XX ID I$HSF_02 XX NA HSF (Drosophila) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 31 14 4 1 02 0 0 50 0 03 48 2 0 0 04 43 1 5 1 05 23 1 14 12 06 31 14 4 1 07 0 0 50 0 08 48 2 0 0 09 43 1 5 1 10 23 1 14 12 11 31 14 4 1 12 0 0 50 0 13 48 2 0 0 14 43 1 5 1 15 23 1 14 12 XX BF T00386; HSTF ; fruit fly XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00028) in NGAANNGAANNGAAN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. 
RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00164 XX ID I$HSF_03 XX NA HSF (Drosophila) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 31 14 4 1 02 0 0 50 0 03 48 2 0 0 04 43 1 5 1 05 23 1 14 12 06 31 14 4 1 07 0 0 50 0 08 48 2 0 0 09 43 1 5 1 10 23 1 14 12 11 12 14 1 23 12 1 5 1 43 13 0 0 2 48 14 0 50 0 0 15 1 4 14 31 XX BF T00386; HSTF ; fruit fly XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00028) in NGAANNGAANNTTCN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00165 XX ID I$HSF_04 XX NA HSF (Drosophila) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 31 14 4 1 02 0 0 50 0 03 48 2 0 0 04 43 1 5 1 05 23 1 14 12 06 12 14 1 23 07 1 5 1 43 08 0 0 2 48 09 0 50 0 0 10 1 4 14 31 11 31 14 4 1 12 0 0 50 0 13 48 2 0 0 14 43 1 5 1 15 23 1 14 12 XX BF T00386; HSTF ; fruit fly XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00028) in NGAANNTTCNNGAAN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00166 XX ID I$HSF_05 XX NA HSF (Drosophila) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 12 14 1 23 02 1 5 1 43 03 0 0 2 48 04 0 50 0 0 05 1 4 14 31 06 31 14 4 1 07 0 0 50 0 08 48 2 0 0 09 43 1 5 1 10 23 1 14 12 11 31 14 4 1 12 0 0 50 0 13 48 2 0 0 14 43 1 5 1 15 23 1 14 12 XX BF T00386; HSTF ; fruit fly XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00028) in NTTCNNGAANNGAAN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. 
RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00167 XX ID F$HSF_02 XX NA HSF (yeast) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 28 6 12 4 02 0 0 48 2 03 46 0 2 2 04 46 0 2 2 05 12 19 8 11 06 28 6 12 4 07 0 0 48 2 08 46 0 2 2 09 46 0 2 2 10 12 19 8 11 11 28 6 12 4 12 0 0 48 2 13 46 0 2 2 14 46 0 2 2 15 12 19 8 11 XX BF T00385; HSTF ; yeast XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00029) in NGAANNGAANNGAAN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00168 XX ID F$HSF_03 XX NA HSF (yeast) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 28 6 12 4 02 0 0 48 2 03 46 0 2 2 04 46 0 2 2 05 12 19 8 11 06 28 6 12 4 07 0 0 48 2 08 46 0 2 2 09 46 0 2 2 10 12 19 8 11 11 11 8 19 12 12 2 2 0 46 13 2 2 0 46 14 2 48 0 0 15 4 12 6 28 XX BF T00385; HSTF ; yeast XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00029) in NGAANNGAANNTTCN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00169 XX ID F$HSF_04 XX NA HSF (yeast) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 28 6 12 4 02 0 0 48 2 03 46 0 2 2 04 46 0 2 2 05 12 19 8 11 06 11 8 19 12 07 2 2 0 46 08 2 2 0 46 09 2 48 0 0 10 4 12 6 28 11 28 6 12 4 12 0 0 48 2 13 46 0 2 2 14 46 0 2 2 15 12 19 8 11 XX BF T00385; HSTF ; yeast XX BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00029) in NGAANNTTCNNGAAN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T.
RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). XX // AC M00170 XX ID F$HSF_05 XX NA HSF (yeast) XX DT ewi (created); 22.06.95. DT ewi (updated); 22.06.95. XX PO A C G T 01 11 8 19 12 02 2 2 0 46 03 2 2 0 46 04 2 48 0 0 05 4 12 6 28 06 28 6 12 4 07 0 0 48 2 08 46 0 2 2 09 46 0 2 2 10 12 19 8 11 11 28 6 12 4 12 0 0 48 2 13 46 0 2 2 14 46 0 2 2 15 12 19 8 11 XX BF T00385; HSTF ; yeast BA 50 functional genomic HSEs XX CC triple repeat of the 5-bp unit (M00028) in NTTCNNGAANNGAAN orientation XX RN [1] RA Fernandes M., Xiao H., Lis J. T. RT Fine structure analyses of the Drosophila and Saccharomyces RT heat shock factor-heat shock element interactions RL Nucleic Acids Res. 22:167-173 (1994). //